A state machine for a non-regular language
Categories: none
Following on from the section on state machines, we will now look at an example of a language that isn't regular, and see what sort of problems you might have designing a state machine.
As it turns out, most programming languages are not regular languages, as we will see at the end of this section.
A non regular language
We will look at a simple example of a language that isn't regular. The language contains only the characters X and Y, and consists of all the strings that consist of n
Xs followed by n
Ys, where n
is any integer. Examples of valid strings are:
XY or XXYY or XXXYYY etc
Invalid strings include:
XXY or XXYYY
because the number of Xs and Ys are not equal. There are lots of other invalid strings, of course.
The state machine
Here is our state machine. For clarity we have excluded all the error paths and just show the correct paths through the machine:
Here is how it works for the string XY
:
- Starting in state 1
X
moves us to state 2Y
moves us to state 9, which is the valid end state
Here is how it works for the string XXYY
:
- Starting in state 1
X
moves us to state 2X
moves us to state 3Y
moves us to state 8Y
moves us to state 9, which is the valid end state
It should be clear how this works for other strings. Each X
takes us one step further to the right, along the top row. We then need exactly the same number of Y
characters to get us back to the valid end state, 9.
The problem
Here is the problem. The state machine shown for only works for 4 or less incoming X
characters.
Of course, we could easily add extra states to allow us to handle 5 Xs. Or 10. Or 100.
But however many states you add, someone could come along with a string that was too long and the state machine would fail. To be capable of handling a string of any length you would need a state machine with an infinite number of states.
But we are dealing with finite state machines, which by definition cannot have an infinite number of states (the clue's in the name). A finite state machine cannot handle this particular language.
What does this mean?
Just because this languages isn't regular doesn't mean a computer can't parse it, of course. It is a very simple language - it would clearly be possible to write a program to check a string to see whether it was valid or not. You just need to count the number of Xs and then count the number of Ys, and check if they are the same.
But you can't use a finite state machine to do that, so our simple language isn't regular.
We can't apply some other methods of parsing that only apply to regular languages.
Programming languages
Are programming languages like Python or Java regular? Well it turns out most of them are not.
A simple way to see that is to look at a basic maths expression:
x = a * (b + c * (d + e * (f + g)))
This expression uses brackets. Brackets can be nested, as shown. One of the things that has to be true of any valid expression is that each opening bracket must be matched by a closing bracket.
Can a state machine do this? well we already know that it can't. Even if we ignore all the other parts of the expression, the very least we need to do is check a string like this:
((()))
for any number of brackets. That is exactly the same problem as we looked at before for strings like XXXYYY
, so we know a finite state machine can't parse a mathematical expression, so any language that allows nested brackets can't be regular.
If fact you get a similar problem with other types of nesting. Nested if statements, nested for loops, nested function definitions, nested classes - any language that supports those things (and most langauges do) cannot be a regular language.
See also
Sign up to the Creative Coding Newletter
Join my newsletter to receive occasional emails when new content is added, using the form below:
Popular tags
555 timer abstract data type abstraction addition algorithm and gate array ascii ascii85 base32 base64 battery binary binary encoding binary search bit block cipher block padding byte canvas colour coming soon computer music condition cryptographic attacks cryptography decomposition decryption deduplication dictionary attack encryption file server flash memory hard drive hashing hexadecimal hmac html image insertion sort ip address key derivation lamp linear search list mac mac address mesh network message authentication code music nand gate network storage none nor gate not gate op-amp or gate pixel private key python quantisation queue raid ram relational operator resources rgb rom search sort sound synthesis ssd star network supercollider svg switch symmetric encryption truth table turtle graphics yenc