Python Bracket Parser: Handle Custom Brackets ( ) [ ] < > - +

by Lucas 62 views

Hey everyone! 👋 I'm gonna walk you through creating a bracket grammar parser using Python. The coolest part? We'll be able to handle a custom set of brackets like < >, [ ], ( ), and even - +. This article is for you if you're interested in Python, algorithms, parsing, finite state machines, or grammars, so let's dive in and see how to make it happen! I made this parser out of pure curiosity, just to see if I could do it. It's still basic and doesn't have any fancy error checking, but it gets the job done. I'm gonna explain the code and the different parts of how it works.

The Core Idea: Parsing Bracket Grammar 🤓

Alright, so what's the deal with bracket grammar parsing? Basically, we want to take a string that has brackets in it (like <abc[def(ghi)]jkl>) and figure out if the brackets are properly matched and nested. For example, <[()]> is valid, but <[)> is not. The key is to make sure every opening bracket has a matching closing bracket in the right order. Our parser will need to understand the relationships between the brackets. I'm talking about knowing which bracket is opening and which one is closing and make sure that they match up correctly. Our simple parser will be a cool tool for tasks like validating code syntax, processing mathematical expressions, or checking the structure of data formats.

Now, there are many ways to do this, but we'll stick with a straightforward approach: using a stack data structure. A stack is like a pile of plates – you can only add or remove plates from the top. In our case, we'll use the stack to keep track of the opening brackets we encounter. Every time we find an opening bracket, we'll put it on the stack. When we see a closing bracket, we'll check if it matches the bracket at the top of the stack. If they match, we remove the top bracket from the stack (because we've found a matching pair). If the stack is empty or the brackets don't match, then there's an error.

So, the input is gonna be a string containing brackets (like <[()]>). The output will be a boolean: True if the brackets are balanced and False if not. We'll also need to define our bracket pairs: We'll support < >, [ ], ( ), and - +. The logic is pretty simple, right? This will be easy to implement using Python. The code will walk through each character of the input string and make sure that the opening and closing brackets are in the correct order. This is gonna be a fun, and informative journey, so hang tight!

Implementing the Parser in Python 🐍

Let's get our hands dirty and write some Python code! Here's a basic implementation of a bracket grammar parser. We'll start with a simple function that takes a string and a dictionary of bracket pairs as input. The dictionary will let us define which brackets are opening and which are closing. Here's the code:

def is_balanced(text, bracket_pairs):
    stack = []
    for char in text:
        if char in bracket_pairs.keys():
            stack.append(char)
        elif char in bracket_pairs.values():
            if not stack:
                return False
            top = stack.pop()
            if char != bracket_pairs.get(top):
                return False
    return not stack

# Example usage:
bracket_pairs = {
    '<' : '>',
    '[' : ']',
    '(' : ')',
    '-' : '+'
}

text1 = '<[()]>'
text2 = '<[)>'

print(f'"{text1}" is balanced: {is_balanced(text1, bracket_pairs)}')
print(f'"{text2}" is balanced: {is_balanced(text2, bracket_pairs)}')

So, let's break this down bit by bit, shall we? The is_balanced function is the heart of the parser. It takes two arguments: the text we want to check and a dictionary that specifies the bracket pairs. We create an empty stack called stack. This is gonna be used to store the opening brackets as we go through the text. Then, we loop through each character in the input text. If the character is an opening bracket (i.e., a key in the bracket_pairs dictionary), we add it to the stack. On the other hand, if the character is a closing bracket (i.e., a value in the bracket_pairs dictionary), we check if the stack is empty. If it is, it means we've encountered a closing bracket without a corresponding opening bracket, so the brackets aren't balanced, and we return False. If the stack isn't empty, we pop the top element from the stack (which should be the last opening bracket we saw). We then check if the current closing bracket matches the opening bracket we just popped. If they don't match (using bracket_pairs.get(top)), it means the brackets are mismatched, and we return False. Finally, after we've processed the entire text, we check if the stack is empty. If it is, it means that all opening brackets have been matched with closing brackets, and the brackets are balanced, so we return True. If the stack isn't empty, it means there are some opening brackets that haven't been closed, so we return False.

Expanding the Functionality 🚀

Now, what if we want to make our parser a little more robust? Here are a few ways we can expand our parser to make it more useful:

  • Error Reporting: Right now, the parser just returns True or False. It would be much more helpful if it could tell us where the errors are. We could modify the function to return the position of the first unmatched bracket or the position where a mismatch occurs. This is gonna be useful for debugging.
  • Supporting Different Bracket Types: Our current parser is very flexible because it uses a dictionary to define bracket pairs. However, we could make it even more flexible by allowing different types of brackets to be defined as parameters to the function.
  • Ignoring Non-Bracket Characters: Right now, the parser will treat any character that isn't a bracket as an error. We could modify the function to ignore any character that's not an opening or closing bracket. This would make the parser more tolerant of input that includes other characters.
  • Nested Structures: This basic parser handles nested brackets correctly. It can handle situations like <[()]> without any problems. But the structure is still kinda simplistic.

Adding Error Reporting in Python 📝

Let's improve our parser by adding error reporting. Instead of just returning True or False, we'll return a more detailed result that includes the position of the first unmatched bracket, or the position where a mismatch occurs, and an error message. Here's the updated code:

def is_balanced_with_error(text, bracket_pairs):
    stack = []
    error_position = -1
    for i, char in enumerate(text):
        if char in bracket_pairs.keys():
            stack.append((char, i))
        elif char in bracket_pairs.values():
            if not stack:
                error_position = i
                return False, f'Unmatched closing bracket at position {i}'
            top, pos = stack.pop()
            if char != bracket_pairs.get(top):
                error_position = i
                return False, f'Mismatched brackets at position {i}'
    if stack:
        top, pos = stack.pop()
        error_position = pos
        return False, f'Unmatched opening bracket at position {pos}'
    return True, 'Balanced'

# Example usage:
bracket_pairs = {
    '<' : '>',
    '[' : ']',
    '(' : ')',
    '-' : '+'
}

text1 = '<[()]>'
text2 = '<[)>'
text3 = '<[('

result1, message1 = is_balanced_with_error(text1, bracket_pairs)
result2, message2 = is_balanced_with_error(text2, bracket_pairs)
result3, message3 = is_balanced_with_error(text3, bracket_pairs)

print(f'"{text1}" is balanced: {result1}, {message1}')
print(f'"{text2}" is balanced: {result2}, {message2}')
print(f'"{text3}" is balanced: {result3}, {message3}')

We've added error reporting to our parser. We’ve included the position of the error. In this new version, we're not just returning True or False. We're returning a tuple. The first element is a boolean that indicates whether the brackets are balanced. The second element is a string that contains an error message if the brackets aren't balanced. If there's an error, the error message will specify the type of error and the position in the input string where the error occurred. We've also added a error_position variable to keep track of the position of the first error. The error position is the index in the text where we detected the error. This improvement helps pinpoint the exact location of any bracket issues. The function's logic remains mostly the same, but now, if it finds an error, it returns False along with an error message indicating the position of the error. If the loop completes without finding any errors and the stack is empty, the function returns True along with the message "Balanced".

Using Finite State Machines for Parsing 🤖

Now, let's talk about another approach to parsing: Finite State Machines (FSMs). An FSM is a computational model that can be in one of a finite number of states. Based on the input, it transitions from one state to another. For our bracket grammar parser, we can design an FSM with states representing the current context of bracket nesting. However, for this particular problem, the stack-based approach is more straightforward and efficient, so we'll stick with that. The FSM approach could become more relevant if our grammar becomes more complex, requiring the parser to keep track of different types of structures, not just brackets.

With the stack approach, we're essentially using the stack as a memory to remember the current state of bracket nesting. This is a more natural fit for this problem. So, while FSMs are super useful for a lot of parsing problems, for our simple bracket grammar, the stack is the way to go.

Conclusion: Wrapping Up the Bracket Grammar Parser 🥳

We did it, guys! We successfully built a bracket grammar parser in Python that supports custom bracket sets and gives us error reporting. We explored how to use a stack data structure to keep track of opening brackets, how to check for matching pairs, and how to add helpful error messages. You've learned how to validate bracket syntax using Python and handle various types of brackets and their positions in the input string.

Remember, this is just the beginning! You can extend this parser to handle more complex grammars or use it as a building block for other projects. Keep experimenting, and happy coding!