yEnc encoding

By Martin McBride, 2017-04-09
Tags: binary encoding yenc
Categories: binary encoding data formats

yEncode (yEnc) is a binary encoding format designed for sending binary data via email and newsgroups. It differs from other methods (such as Base64 encoding) in that its output is still 8-bit binary data rather than printable text, modified only so that certain "critical" byte values are substituted.

The key to yEnc is to recognise that most modern news and email systems can transfer 8-bit binary data, except for a few problems with certain specific byte values. A null (hex 0x00) can be misinterpreted as the end of the data and can cause problems in some systems. Similarly hex values 0x0A and 0x0D can be misinterpreted as end-of-line characters, and may sometimes be automatically substituted in certain systems. The main purpose of yEnc is to ensure that these byte values never appear in the encoded data.

Key Characteristics

yEnc is not a true binary-to-text encoding method - it does not produce printable ASCII data. It is therefore not suitable for ASCII-only networks or protocols, for user entry of binary keys, or for including binary strings in filenames or URLs.

yEnc is useful in specific situations such as email and newsgroups, where the main requirement is to escape problem characters such as null, CR or LF. The encoded size depends on the data, but in most cases the encoding is very efficient, typically increasing the data size by only 1 or 2%. In the worst case the data size increases by about 100%, but this is unlikely to happen in practice.
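As a rough illustration of those figures, the sketch below counts how many bytes would need escaping, using the rule described in the Encoding section (add 42, then escape the four critical values). The function name `encoded_size` and the two sample inputs are made up for this example, and line breaks, header and trailer are deliberately ignored.

```python
# Critical byte values that must never appear unescaped in the output.
CRITICAL = {0x00, 0x0A, 0x0D, 0x3D}


def encoded_size(data: bytes) -> int:
    """Payload size after yEnc escaping, ignoring line breaks, header and trailer."""
    escapes = sum(1 for byte in data if (byte + 42) % 256 in CRITICAL)
    return len(data) + escapes


# Hypothetical sample inputs, chosen only to show the two extremes.
uniform = bytes(range(256)) * 100  # every byte value occurs equally often
worst = bytes([214]) * 25600       # 214 + 42 = 256, which wraps to 0x00

print(encoded_size(uniform) / len(uniform))  # 1.015625 (about 1.6% larger)
print(encoded_size(worst) / len(worst))      # 2.0 (100% larger)
```

Only 4 of the 256 possible byte values need an escape, which is where the low typical overhead comes from. The CRLF pairs inserted at the end of each line add a little more on top of this.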

It is worth noting that this algorithm has not been adopted by any official standards organisation. It is, however, supported by many Usenet newsreaders and some email clients.

Encoding

The encoding algorithm operates as follows:

Escape character is "=", hex 0x3D.

Critical characters are 0x00, 0x0A, 0x0D and 0x3D. 0x3D is included because it is the escape character, and therefore cannot appear "as itself" in the encoded data.

n is the output line length. A value of 128 is typically used.

  1. Get a byte from the input stream.

  2. Add 42 (decimal) to the byte, modulo 256.

  3. If the result is a critical character, output an escape character and add a further 64 to the result (modulo 256).

  4. Output the byte.

This is repeated until all the input bytes are used up. To ensure that the data can be transmitted by most standard protocols, a CRLF pair should be inserted every n output bytes. However, a 2-byte escape sequence must never be split across two lines: if the escape character is the nth byte of a line, the escaped byte follows it on the same line. In that case only, the line is permitted to be n+1 bytes long.
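The steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the names `yenc_encode` and `yenc_decode` are invented here, and the decoder is not described in this article but is simply inferred as the inverse of the encoding steps. Header and trailer lines are not handled.

```python
ESCAPE = 0x3D                        # "="
CRITICAL = {0x00, 0x0A, 0x0D, 0x3D}  # values that must be escaped


def yenc_encode(data: bytes, line_length: int = 128) -> bytes:
    """Encode data as yEnc lines (no header or trailer)."""
    out = bytearray()
    column = 0
    for byte in data:
        value = (byte + 42) % 256                 # step 2
        if value in CRITICAL:                     # step 3
            out.append(ESCAPE)
            out.append((value + 64) % 256)
            column += 2                           # may reach line_length + 1
        else:
            out.append(value)                     # step 4
            column += 1
        if column >= line_length:                 # never splits an escape pair
            out += b"\r\n"
            column = 0
    if column:
        out += b"\r\n"                            # terminate the final line
    return bytes(out)


def yenc_decode(encoded: bytes) -> bytes:
    """Inverse of yenc_encode (inferred, not taken from the article)."""
    out = bytearray()
    escaped = False
    for value in encoded:
        if value in (0x0D, 0x0A):                 # line breaks carry no data
            continue
        if escaped:
            out.append((value - 64 - 42) % 256)
            escaped = False
        elif value == ESCAPE:
            escaped = True
        else:
            out.append((value - 42) % 256)
    return bytes(out)


print(yenc_encode(b"\x00"))        # b'*\r\n'  (0 + 42 = 0x2A)
print(yenc_encode(bytes([214])))   # b'=@\r\n' (214 + 42 wraps to 0x00)
```

Because the escaped values (0x40, 0x4A, 0x4D and 0x7D) are never CR or LF, the decoder can safely discard every CR and LF it meets.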

If you are curious about the step of adding 42 to each byte of the input data, the reason is quite simple. Binary data often contains a disproportionate number of zero bytes, which would all need to be escaped, doubling their size. Adding 42 to every byte removes this source of inefficiency. Almost any value could have been chosen. Presumably, 42 was chosen for the obvious reason.
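A small experiment makes the effect of the offset visible. The helper `escaped_count` and the sample data below are hypothetical, chosen only to mimic a file with a long run of zero bytes.

```python
CRITICAL = {0x00, 0x0A, 0x0D, 0x3D}


def escaped_count(data: bytes, offset: int) -> int:
    """Number of bytes that would need an escape for a given offset."""
    return sum(1 for byte in data if (byte + offset) % 256 in CRITICAL)


# Made-up sample: 1000 zero bytes followed by one of each byte value.
sample = bytes(1000) + bytes(range(256))

print(escaped_count(sample, 0))   # 1004 (every zero byte is escaped)
print(escaped_count(sample, 42))  # 4 (the zeros become 0x2A and pass through)
```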

Header and Trailer

The binary data is immediately preceded by the header line:

=ybegin line=128 size=123456 name=mybinary.dat

The 3 parameters must all be present, and the name must be the final parameter. These parameters indicate the typical line length (i.e. n), the number of bytes in the unencoded data, and the name of the original binary file. The following trailer must follow immediately after the data:

=yend size=123456 crc32=abcdef12

size must have the same value as in the header. The crc32 value is optional, but if present it must contain the 32-bit CRC of the original data. One thing which is not made entirely explicit in the yEnc specification is that the CRC value is an 8-digit hexadecimal number, whereas all other parameters are decimal.
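The header and trailer lines might be produced and read along the following lines. The function names are invented for this sketch, and it assumes that the CRC is the standard CRC-32 computed by Python's `zlib.crc32`, which should be checked against the yEnc specification before relying on it.

```python
import zlib


def yenc_header(name: str, size: int, line_length: int = 128) -> str:
    """Build the =ybegin line; name is placed last, as required."""
    return f"=ybegin line={line_length} size={size} name={name}"


def yenc_trailer(data: bytes) -> str:
    """Build the =yend line for the original (unencoded) data."""
    crc = zlib.crc32(data) & 0xFFFFFFFF  # assumed to be the CRC yEnc expects
    return f"=yend size={len(data)} crc32={crc:08x}"  # CRC as 8 hex digits


def parse_header(line: str) -> dict:
    """Split a =ybegin line; everything after ' name=' is taken as the name."""
    fields, name = line.split(" name=", 1)
    params = dict(item.split("=", 1) for item in fields.split()[1:])
    return {"line": int(params["line"]), "size": int(params["size"]), "name": name}


print(yenc_header("mybinary.dat", 123456))
# =ybegin line=128 size=123456 name=mybinary.dat
print(yenc_trailer(b"123456789"))
# =yend size=9 crc32=cbf43926
```

Treating everything after `name=` as the filename is one plausible reason for the rule that the name comes last, since it lets the name contain spaces without any quoting.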

Multipart Encodings

yEnc supports multipart encoded binaries. The specification also contains recommendations for subject line conventions when posting multipart encoded binaries to newsgroups. These are not described here; refer to the yEnc website for more details.
