Following on from the previous section on state machines, we will look at how we can check the syntax of a hex string such as "0xAA55", how to implement a state machine in code, and how to use it to convert a hex string to an integer.

The syntax of our hex number is as follows:

- A
`0`

character, followed by - An
`x`

character, followed by - One or more pairs of hex characters (
`0-9`

or`A-F`

)

Examples of valid hex strings are `0xAB`

, `0x1234`

, `0x8BADF00D`

.

Our scheme doesn't allow an odd number of hex digits, so `0xC5C`

would be invalid as it has 3 digits. Note that most languages *do* allow an odd number of digits. We have added the requirements for an even number of digits to make the state machine a bit more interesting.

Here is our state machine:

We have used names, rather than numbers, for the states, because we will also be creating code for this state machine. It is better to use meaningful names rather than "magic numbers" in code.

The transitions are labelled as follows:

`0`

and`x`

represent the individual characters`hex`

represents any single character is the set`0-9`

or`A-F`

`*`

represents*any*character that isn't matched by any other branch

Let's see how we could use the state machine to check a valid string `0x1234`

:

- We start in the initial state,
`start`

. - The first symbol is a
`0`

, which leads us to state`waitx`

- in this state we are waiting for an`x`

character. - The next symbol is an
`x`

, which leads us to state`digit1`

- in this state we are waiting for the first hex digit. - The next symbol is a
`1`

, a valid hex digit, so we go to state`odd`

- in this state we have an odd number of digits. - The next symbol is a
`2`

, a valid hex digit, so we go to state`even`

- in this state we have an even number of digits. This is a valid end state, because the string so far (`0x12`

) is a valid hex string. - The next symbol
`3`

takes us back to the`odd`

state. Then the symbol`4`

takes us back to the`even`

state, where the string ends.

The string is now finished, and the state machine is left is state `even`

, which is the valid end state (marked in green).

Notice that if we add an extra character the state machine goes back to state `odd`

(not a valid end state). If we add another character it goes to state `even`

- this ensure that the string is only marked as valid if there are an even number of hex digits.

Each state looks for specific characters, and will revert straight to the `error`

state is an unexpected character is encountered.

So for example if we tried to decode 0123:

- We start in the initial state,
`start`

. - The first symbol is a
`0`

, which leads us to state`waitx`

. - The next symbol is a
`1`

, but we expected an`x`

. We go to state`error`

. - In
`error`

state, any input character leads back to`error`

.

To implement a state machine in code, first we define constants for our states. We could give our states number values, but we will use strings here because it will make it easier to follow when we print out the results:

START = 'start' WAITX = 'waitx' DIGIT1 = 'digit1' ODD = 'odd' EVEN = 'even' ERROR = 'error'

Next we define a function `next_state`

that accepts a current state and an input value, and returns a next state. Here is a start:

def next_state(state, value): if state == START: if value == '0': return WAITX else: return ERROR else: return ERROR

Inside the function we check the `state`

. If it is `START`

we check the value. If it is `0`

we return a next state of `WAITX`

. If the value is not `0`

, it is incorrect so we return a state of `ERROR`

.

If the state is not `START`

we return a next state of `ERROR`

.

Of course, this function isn't complete. We need to add if clauses for the other states:

def next_state(state, value): if state == START: if value == '0': return WAITX else: return ERROR elif state == WAITX: if value == 'x': return DIGIT1 else: return ERROR elif state == DIGIT1: if value in '0123456789ABCDEF': return ODD else: return ERROR elif state == ODD: if value in '0123456789ABCDEF': return EVEN else: return ERROR elif state == EVEN: if value in '0123456789ABCDEF': return ODD else: return ERROR else: return ERROR

The extra `elif`

clauses deal with the other possible input states. For example, if the input state is `WAITX`

we check for an `x`

and return either `DIGIT1`

if it is, or `ERROR`

if not.

For the `DIGIT1`

case, we need to check if `value`

is any valid hex digit. We do this by checking if it is **in** the string `0123456789ABCDEF`

. Similar for `ODD`

and `EVEN`

.

One final point of note, we end with an `if`

statement. This deals with the case of `state`

being equal to `ERROR`

or *any other value* that isn't a valid state. So if for some reason our code gets called with a nonsense string like `XYZ`

we will just treat it as an error state (this is an example of *defensive coding* - make sure your code behaves sensibly for unexpected inputs).

The net thing we need to do is create a *main loop* that requests an input string, and feedd it into the state machine one character at a time, and prints the result:

s = input('Type in a hex value ') state = START for c in s: state = next_state(state, c) print(state) if state == EVEN: print('VALID') else: print('INVALID')

To help us understand what is going on, the loop prints out that state after each stage.

Here is the complete code:

START = 'start' WAITX = 'waitx' DIGIT1 = 'digit1' ODD = 'odd' EVEN = 'even' ERROR = 'error' def next_state(state, value): if state == START: if value == '0': return WAITX else: return ERROR elif state == WAITX: if value == 'x': return DIGIT1 else: return ERROR elif state == DIGIT1: if value in '0123456789ABCDEF': return ODD else: return ERROR elif state == ODD: if value in '0123456789ABCDEF': return EVEN else: return ERROR elif state == EVEN: if value in '0123456789ABCDEF': return ODD else: return ERROR else: return ERROR s = input('Type in a hex value ') state = START for c in s: state = next_state(state, c) print(state) if state == EVEN: print('VALID') else: print('INVALID')

So far our code can only analyse a string and tell us if it is correct or not. But what if we wanted to use the state machine to convert a hex string into an integer value?

First lets look at the conversion. Imagine the string is 0x1234. We will be given the digits in order, from left to right (that is, the most significant number first). So assuming with have a variable `total`

that starts off at 0.

- The first digit is a 1, so our total is 0x1
- The next digit is a 2, so our total is 0x12
- The next digit is a 3, so our total is 0x123
- The next digit is a 4, so our total is 0x1234

In other words, each time a new digit comes in, we shift the `total`

value to the left by one digit and add the new number. Since hex is base 16, shifting the value left by one digit is the same as multiplying it by 16.

Here is a simple function that accumulates a digit into an existing total:

total = 0 def accumulate(c): global total v = '0123456789ABCDEF'.index(c) total = total*16 + v

We use the `global`

keyword because `total`

is a global variable.

To convert the character `0`

to `F`

into a value 0 to 15, we look up its index in the string `0123456789ABCDEF`

. Character `0`

appears at position 0 so index returns 0. Character `F`

is at position 15 so index returns 15.

Now we just need to call the accumulate function each time we pass through states `DIGIT1`

, `ODD`

or `EVEN`

. Here is the modified full code:

START = 'start' WAITX = 'waitx' DIGIT1 = 'digit1' ODD = 'odd' EVEN = 'even' ERROR = 'error' total = 0 def accumulate(c): global total v = '0123456789ABCDEF'.index(c) total = total*16 + v def next_state(state, value): if state == START: if value == '0': return WAITX else: return ERROR elif state == WAITX: if value == 'x': return DIGIT1 else: return ERROR elif state == DIGIT1: if value in '0123456789ABCDEF': accumulate(value) return ODD else: return ERROR elif state == ODD: if value in '0123456789ABCDEF': accumulate(value) return EVEN else: return ERROR elif state == EVEN: if value in '0123456789ABCDEF': accumulate(value) return ODD else: return ERROR else: return ERROR s = input('Type in a hex value ') total = 0 state = START for c in s: state = next_state(state, c) print(state) if state == EVEN: print('VALID') print(total) else: print('INVALID')

In the next section we will look at a non-regular language, and see why it is impossible to implement it as a state machine.

Copyright (c) Axlesoft Ltd 2021