Converting hexadecimal
Categories: none
Following on from the previous section on state machines, we will look at how we can check the syntax of a hex string such as "0xAA55", how to implement a state machine in code, and how to use it to convert a hex string to an integer.
Defining the syntax
The syntax of our hex number is as follows:
- A
0
character, followed by - An
x
character, followed by - One or more pairs of hex characters (
0-9
orA-F
)
Examples of valid hex strings are 0xAB
, 0x1234
, 0x8BADF00D
.
Our scheme doesn't allow an odd number of hex digits, so 0xC5C
would be invalid as it has 3 digits. Note that most languages do allow an odd number of digits. We have added the requirements for an even number of digits to make the state machine a bit more interesting.
The state machine
Here is our state machine:
We have used names, rather than numbers, for the states, because we will also be creating code for this state machine. It is better to use meaningful names rather than "magic numbers" in code.
The transitions are labelled as follows:
0
andx
represent the individual charactershex
represents any single character is the set0-9
orA-F
*
represents any character that isn't matched by any other branch
Checking a valid string
Let's see how we could use the state machine to check a valid string 0x1234
:
- We start in the initial state,
start
. - The first symbol is a
0
, which leads us to statewaitx
- in this state we are waiting for anx
character. - The next symbol is an
x
, which leads us to statedigit1
- in this state we are waiting for the first hex digit. - The next symbol is a
1
, a valid hex digit, so we go to stateodd
- in this state we have an odd number of digits. - The next symbol is a
2
, a valid hex digit, so we go to stateeven
- in this state we have an even number of digits. This is a valid end state, because the string so far (0x12
) is a valid hex string. - The next symbol
3
takes us back to theodd
state. Then the symbol4
takes us back to theeven
state, where the string ends.
The string is now finished, and the state machine is left is state even
, which is the valid end state (marked in green).
Notice that if we add an extra character the state machine goes back to state odd
(not a valid end state). If we add another character it goes to state even
- this ensure that the string is only marked as valid if there are an even number of hex digits.
Invalid strings
Each state looks for specific characters, and will revert straight to the error
state is an unexpected character is encountered.
So for example if we tried to decode 0123:
- We start in the initial state,
start
. - The first symbol is a
0
, which leads us to statewaitx
. - The next symbol is a
1
, but we expected anx
. We go to stateerror
. - In
error
state, any input character leads back toerror
.
A state machine in code
To implement a state machine in code, first we define constants for our states. We could give our states number values, but we will use strings here because it will make it easier to follow when we print out the results:
START = 'start' WAITX = 'waitx' DIGIT1 = 'digit1' ODD = 'odd' EVEN = 'even' ERROR = 'error'
Next we define a function next_state
that accepts a current state and an input value, and returns a next state. Here is a start:
def next_state(state, value): if state == START: if value == '0': return WAITX else: return ERROR else: return ERROR
Inside the function we check the state
. If it is START
we check the value. If it is 0
we return a next state of WAITX
. If the value is not 0
, it is incorrect so we return a state of ERROR
.
If the state is not START
we return a next state of ERROR
.
Of course, this function isn't complete. We need to add if clauses for the other states:
def next_state(state, value): if state == START: if value == '0': return WAITX else: return ERROR elif state == WAITX: if value == 'x': return DIGIT1 else: return ERROR elif state == DIGIT1: if value in '0123456789ABCDEF': return ODD else: return ERROR elif state == ODD: if value in '0123456789ABCDEF': return EVEN else: return ERROR elif state == EVEN: if value in '0123456789ABCDEF': return ODD else: return ERROR else: return ERROR
The extra elif
clauses deal with the other possible input states. For example, if the input state is WAITX
we check for an x
and return either DIGIT1
if it is, or ERROR
if not.
For the DIGIT1
case, we need to check if value
is any valid hex digit. We do this by checking if it is in the string 0123456789ABCDEF
. Similar for ODD
and EVEN
.
One final point of note, we end with an if
statement. This deals with the case of state
being equal to ERROR
or any other value that isn't a valid state. So if for some reason our code gets called with a nonsense string like XYZ
we will just treat it as an error state (this is an example of defensive coding - make sure your code behaves sensibly for unexpected inputs).
The net thing we need to do is create a main loop that requests an input string, and feedd it into the state machine one character at a time, and prints the result:
s = input('Type in a hex value ') state = START for c in s: state = next_state(state, c) print(state) if state == EVEN: print('VALID') else: print('INVALID')
To help us understand what is going on, the loop prints out that state after each stage.
Here is the complete code:
START = 'start' WAITX = 'waitx' DIGIT1 = 'digit1' ODD = 'odd' EVEN = 'even' ERROR = 'error' def next_state(state, value): if state == START: if value == '0': return WAITX else: return ERROR elif state == WAITX: if value == 'x': return DIGIT1 else: return ERROR elif state == DIGIT1: if value in '0123456789ABCDEF': return ODD else: return ERROR elif state == ODD: if value in '0123456789ABCDEF': return EVEN else: return ERROR elif state == EVEN: if value in '0123456789ABCDEF': return ODD else: return ERROR else: return ERROR s = input('Type in a hex value ') state = START for c in s: state = next_state(state, c) print(state) if state == EVEN: print('VALID') else: print('INVALID')
Hex conversion
So far our code can only analyse a string and tell us if it is correct or not. But what if we wanted to use the state machine to convert a hex string into an integer value?
First lets look at the conversion. Imagine the string is 0x1234. We will be given the digits in order, from left to right (that is, the most significant number first). So assuming with have a variable total
that starts off at 0.
- The first digit is a 1, so our total is 0x1
- The next digit is a 2, so our total is 0x12
- The next digit is a 3, so our total is 0x123
- The next digit is a 4, so our total is 0x1234
In other words, each time a new digit comes in, we shift the total
value to the left by one digit and add the new number. Since hex is base 16, shifting the value left by one digit is the same as multiplying it by 16.
Here is a simple function that accumulates a digit into an existing total:
total = 0 def accumulate(c): global total v = '0123456789ABCDEF'.index(c) total = total*16 + v
We use the global
keyword because total
is a global variable.
To convert the character 0
to F
into a value 0 to 15, we look up its index in the string 0123456789ABCDEF
. Character 0
appears at position 0 so index returns 0. Character F
is at position 15 so index returns 15.
Now we just need to call the accumulate function each time we pass through states DIGIT1
, ODD
or EVEN
. Here is the modified full code:
START = 'start' WAITX = 'waitx' DIGIT1 = 'digit1' ODD = 'odd' EVEN = 'even' ERROR = 'error' total = 0 def accumulate(c): global total v = '0123456789ABCDEF'.index(c) total = total*16 + v def next_state(state, value): if state == START: if value == '0': return WAITX else: return ERROR elif state == WAITX: if value == 'x': return DIGIT1 else: return ERROR elif state == DIGIT1: if value in '0123456789ABCDEF': accumulate(value) return ODD else: return ERROR elif state == ODD: if value in '0123456789ABCDEF': accumulate(value) return EVEN else: return ERROR elif state == EVEN: if value in '0123456789ABCDEF': accumulate(value) return ODD else: return ERROR else: return ERROR s = input('Type in a hex value ') total = 0 state = START for c in s: state = next_state(state, c) print(state) if state == EVEN: print('VALID') print(total) else: print('INVALID')
In the next section we will look at a non-regular language, and see why it is impossible to implement it as a state machine.
See also
Sign up to the Creative Coding Newletter
Join my newsletter to receive occasional emails when new content is added, using the form below:
Popular tags
555 timer abstract data type abstraction addition algorithm and gate array ascii ascii85 base32 base64 battery binary binary encoding binary search bit block cipher block padding byte canvas colour coming soon computer music condition cryptographic attacks cryptography decomposition decryption deduplication dictionary attack encryption file server flash memory hard drive hashing hexadecimal hmac html image insertion sort ip address key derivation lamp linear search list mac mac address mesh network message authentication code music nand gate network storage none nor gate not gate op-amp or gate pixel private key python quantisation queue raid ram relational operator resources rgb rom search sort sound synthesis ssd star network supercollider svg switch symmetric encryption truth table turtle graphics yenc