Comparison of encoding schemes

Martin McBride, 2017-04-09

Of the four main binary-to-text encoding schemes, Hex (also called Base16) and Base64 are the most commonly used. Base32 is less common, and ASCII85 is really only used in the context of PostScript and PDF.

Hex encoding has several major plus points. It is very easy to understand and to implement: each byte is encoded as a separate pair of characters. Looking at hex-encoded data in a text editor is exactly like looking at binary data in a hex editor. You can search it, edit it, and, if you have spent enough time working with hex files, you might even be able to read it, decoding it in your head as you go along. The price of these advantages is efficiency. Hex is the least efficient of the four schemes: it doubles the size of the data, an increase of 100%.
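To show just how simple hex encoding is, here is a minimal Python sketch. The function names hex_encode and hex_decode are hypothetical; in practice Python's built-in bytes.hex() and bytes.fromhex() do the same job, and the sketch checks itself against them.

```python
HEX_DIGITS = "0123456789abcdef"

def hex_encode(data: bytes) -> str:
    """Encode each byte as two hex digits: high nibble first, then low nibble."""
    return "".join(HEX_DIGITS[b >> 4] + HEX_DIGITS[b & 0x0F] for b in data)

def hex_decode(text: str) -> bytes:
    """Turn each pair of hex digits back into one byte."""
    return bytes(int(text[i:i + 2], 16) for i in range(0, len(text), 2))

data = b"Hi!"
encoded = hex_encode(data)
print(encoded)                   # 486921
print(len(encoded) / len(data))  # 2.0, i.e. the output is twice the size of the input
assert encoded == data.hex()
assert hex_decode(encoded) == data == bytes.fromhex(encoded)
```

Each output character carries only 4 bits of the original data, which is exactly why the size doubles.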

Base64 uses a larger character set to achieve a more efficient encoding. It is not intended to be in any way human readable, but it is designed to be compatible with as many systems as possible. Its 64 characters (A to Z, a to z, 0 to 9, + and /) are represented identically in standard ASCII, in the older national variants of ASCII, and in EBCDIC (IBM's mainframe character encoding, an alternative to ASCII rather than a predecessor of it). The algorithm is more complex than hex encoding, but because every 3 bytes become 4 characters, the data size only increases by 33%.
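To show where the extra complexity comes from, here is a minimal sketch of standard Base64 encoding in Python. The function name base64_encode is hypothetical; the standard library's base64.b64encode is used only to check the result. Each group of 3 bytes (24 bits) is split into four 6-bit values using shifts and masks, each value indexes the 64-character alphabet, and a short final group is padded with '='.

```python
import base64

# Standard Base64 alphabet: A-Z, a-z, 0-9, then '+' and '/'
ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def base64_encode(data: bytes) -> bytes:
    out = bytearray()
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        n = len(chunk)
        # Pack up to three bytes into a 24-bit integer, zero-padded on the right.
        value = int.from_bytes(chunk.ljust(3, b"\0"), "big")
        # Split the 24 bits into four 6-bit alphabet indices, most significant first.
        chars = bytes(ALPHABET[(value >> shift) & 0x3F] for shift in (18, 12, 6, 0))
        # A final group of n bytes keeps n + 1 characters and is padded with '='.
        out += chars[:n + 1] + b"=" * (3 - n)
    return bytes(out)

print(base64_encode(b"Man"))  # b'TWFu'
for sample in (b"", b"M", b"Ma", b"Man", bytes(range(256))):
    assert base64_encode(sample) == base64.b64encode(sample)
```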

Base32 uses a more restricted character set than Base64, and is therefore less efficient. The standard alphabet uses only the upper case letters A to Z and the digits 2 to 7. The digits 0 and 1 are excluded to avoid confusion with the letters O and I. Because it does not depend on letter case, Base32 can also be used in place of Base64 if there is a danger that the case of letters might be altered. Every 5 bytes become 8 characters, so the data size increases by 60%.
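The same shift-and-mask idea works for Base32, only with 5-bit groups. The following Python sketch (the function name base32_encode is hypothetical) packs each group of 5 bytes into a 40-bit number, splits it into eight 5-bit indices into the RFC 4648 alphabet, and pads a short final group with '='. The last lines use the standard library's base64.b32decode with casefold=True to show that decoding can ignore letter case.

```python
import base64

# RFC 4648 Base32 alphabet: upper case A-Z followed by the digits 2-7
ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"

def base32_encode(data: bytes) -> bytes:
    out = bytearray()
    for i in range(0, len(data), 5):
        chunk = data[i:i + 5]
        n = len(chunk)
        # Pack up to five bytes (40 bits) into one integer, zero-padded on the right.
        value = int.from_bytes(chunk.ljust(5, b"\0"), "big")
        # Split the 40 bits into eight 5-bit alphabet indices, most significant first.
        chars = bytes(ALPHABET[(value >> shift) & 0x1F] for shift in range(35, -1, -5))
        # A final group of n bytes needs ceil(8n / 5) characters; the rest become '='.
        keep = (8 * n + 4) // 5
        out += chars[:keep] + b"=" * (8 - keep)
    return bytes(out)

print(base32_encode(b"foobar"))  # b'MZXW6YTBOI======'
assert base32_encode(b"foobar") == base64.b32encode(b"foobar")
# Lower case input decodes to the same bytes when casefold=True.
assert base64.b32decode(b"mzxw6ytboi======", casefold=True) == b"foobar"
```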

Base32 falls somewhere between Hex and Base64. It lacks the simplicity of Hex, and it is less efficient than Base64. In most cases, if you are opting for a more complex algorithm than Hex, you might as well go for Base64, which produces smaller output.
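To put numbers on this comparison, the following Python snippet measures encoded sizes using the standard library (bytes.hex, base64.b32encode and base64.b64encode). The function name encoded_sizes is hypothetical, and the 3000-byte input is an arbitrary choice: it is a multiple of both 3 and 5, so no padding characters distort the figures.

```python
import base64

def encoded_sizes(data: bytes) -> dict[str, int]:
    """Length in characters of the Hex, Base32 and Base64 encodings of data."""
    return {
        "Hex": len(data.hex()),
        "Base32": len(base64.b32encode(data)),
        "Base64": len(base64.b64encode(data)),
    }

data = bytes(3000)  # the byte values do not affect the size for these three schemes
for name, size in encoded_sizes(data).items():
    increase = (size - len(data)) * 100 // len(data)
    print(f"{name}: {size} characters (+{increase}%)")
# Hex: 6000 characters (+100%)
# Base32: 4800 characters (+60%)
# Base64: 4000 characters (+33%)
```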

There is one area where Base32 is quite useful. If you need a user to manually enter a binary key, for example a product activation code, Base32 is worth considering. It is case insensitive and uses only letters and digits. For manual entry this is less confusing than Base64 (which is case sensitive and uses punctuation symbols), but more compact than Hex.
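As a sketch of the activation code idea, the Python below generates a random key, encodes it with the standard library's base64.b32encode, and splits it into hyphen-separated groups of four characters. The function names, the default 10-byte key length and the grouping format are all illustrative assumptions, not part of any standard. The parser tolerates lower case, spaces and hyphens, and restores any stripped '=' padding so that other key lengths also work.

```python
import base64
import secrets

def make_activation_code(num_bytes: int = 10) -> tuple[bytes, str]:
    """Return a random key and its Base32 form grouped as XXXX-XXXX-... (hypothetical format)."""
    key = secrets.token_bytes(num_bytes)
    text = base64.b32encode(key).decode("ascii").rstrip("=")
    return key, "-".join(text[i:i + 4] for i in range(0, len(text), 4))

def parse_activation_code(entered: str) -> bytes:
    """Recover the key from user input, ignoring whitespace, hyphens and letter case."""
    cleaned = "".join(entered.split()).replace("-", "")
    cleaned += "=" * (-len(cleaned) % 8)  # restore the padding stripped above
    return base64.b32decode(cleaned, casefold=True)

key, code = make_activation_code()
print(code)  # random each run: four groups of four characters
assert parse_activation_code(code.lower().replace("-", " ")) == key
```

With a 10-byte key the Base32 text is exactly 16 characters with no padding; the same key in hex would need 20 characters.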

ASCII85 is the most efficient of the four schemes: every 4 bytes become 5 characters, so the data size increases by just 25%. It has a couple of minor disadvantages. It uses a larger character set, so it is only compatible with ASCII (unlike Base64, which also works with various close relatives of ASCII). It is also slightly more demanding computationally, since it uses division rather than bit shifting. However, these factors are becoming increasingly irrelevant on modern computer systems. The main reason that Base64 continues to be used more than ASCII85 is probably simply that it has been around for longer.
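The division-based approach can be seen in a minimal Python sketch of ASCII85 encoding. Each 4-byte group is treated as a 32-bit number and divided repeatedly by 85, and each remainder becomes a character starting at '!' (code 33). The function name ascii85_encode is hypothetical. It follows the same conventions as the standard library's base64.a85encode with default arguments, which it is checked against: no <~ ~> delimiters, 'z' as shorthand for a group of four zero bytes, and a short final group of n bytes emitted as n + 1 characters.

```python
import base64

def ascii85_encode(data: bytes) -> bytes:
    out = bytearray()
    for i in range(0, len(data), 4):
        chunk = data[i:i + 4]
        n = len(chunk)
        # Treat the group (zero-padded to 4 bytes) as a 32-bit big-endian number.
        value = int.from_bytes(chunk.ljust(4, b"\0"), "big")
        if value == 0 and n == 4:
            out += b"z"  # shorthand for a full group of four zero bytes
            continue
        digits = []
        for _ in range(5):
            value, rem = divmod(value, 85)  # division, not bit shifting
            digits.append(33 + rem)         # '!' (code 33) represents digit 0
        digits.reverse()                    # most significant digit first
        # A final group of n bytes keeps only n + 1 characters.
        out += bytes(digits[:n + 1])
    return bytes(out)

print(ascii85_encode(b"Man "))  # b'9jqo^' (4 bytes become 5 characters)
for sample in (b"", b"M", b"Man", b"Man is", bytes(8), bytes(range(256))):
    assert ascii85_encode(sample) == base64.a85encode(sample)
```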

Copyright (c) Axlesoft Ltd 2021