Next: Uuencoding Up: The UUDeview Decoding Library Previous: Bibliography

Encoding Formats

The following sections describe the four most widely used formats for encoding binary data into plain text: uuencoding, xxencoding, Base64 and BinHex. A further section briefly covers Quoted-Printable encoding.

Other formats exist, such as btoa and ship, but they are not described here. btoa is much less efficient than the others. ship is slightly more efficient and will probably be supported in the future.

Uuencoding, xxencoding and Base64 all work in essentially the same way. They are "three in four" encodings, which means that they take three octets from the input file and encode them into four characters.


Table: Bit mapping for three-in-four encoding

Input octet 1
  Input bit        7  6  5  4  3  2  1  0
  Output data #1   5  4  3  2  1  0
  Output data #2                     5  4

Input octet 2
  Input bit        7  6  5  4  3  2  1  0
  Output data #2   3  2  1  0
  Output data #3               5  4  3  2

Input octet 3
  Input bit        7  6  5  4  3  2  1  0
  Output data #3   1  0
  Output data #4         5  4  3  2  1  0


Three bytes are 24 bits, which are divided into four groups of 6 bits each. The table above shows in detail how the input bits are copied into the output data bits. Six bits can hold values from 0 to 63, so each of the "three in four" encodings uses a character table with 64 entries that maps each possible value to a specific character.
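The bit mapping can be written out with a few shifts and masks. The following Python sketch is only an illustration of the table; the function name split_three_octets is made up for this example and is not part of the UUDeview library.

```python
def split_three_octets(o1, o2, o3):
    """Split three 8-bit octets into four 6-bit values (0..63)."""
    d1 = o1 >> 2                          # octet 1, bits 7..2
    d2 = ((o1 & 0x03) << 4) | (o2 >> 4)   # octet 1, bits 1..0 + octet 2, bits 7..4
    d3 = ((o2 & 0x0F) << 2) | (o3 >> 6)   # octet 2, bits 3..0 + octet 3, bits 7..6
    d4 = o3 & 0x3F                        # octet 3, bits 5..0
    return d1, d2, d3, d4


# The first three octets of the sample text, "Thi":
print(split_three_octets(*b"Thi"))  # (21, 6, 33, 41)
```

Each of the four resulting values is then looked up in the 64-entry character table of the respective encoding.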

The advantage of three in four encodings is their simplicity, as encoding and decoding can be done by mere bit shifting and two simple tables (one for encoding, mapping values to characters, and one for decoding, with the reverse mapping). The disadvantage is that the encoded data is 33% larger than the input (not counting line breaks and other information added to the encoded data).
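As a rough sketch of the two-table approach, the following Python code encodes and decodes complete groups of three octets. It assumes the standard Base64 alphabet as the character table; uuencoding and xxencoding use different tables. Line breaks, padding of incomplete groups and any framing are deliberately left out, and the function names are hypothetical.

```python
import string

# Assumption: the standard Base64 alphabet serves as the 64-entry table.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
ENCODE = ALPHABET                                  # value -> character
DECODE = {ch: v for v, ch in enumerate(ALPHABET)}  # character -> value


def encode_groups(data):
    """Encode bytes whose length is a multiple of three."""
    if len(data) % 3:
        raise ValueError("this sketch handles complete 3-octet groups only")
    out = []
    for i in range(0, len(data), 3):
        n = int.from_bytes(data[i:i + 3], "big")   # 24 bits
        out.extend(ENCODE[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0))
    return "".join(out)


def decode_groups(text):
    """Reverse encode_groups for text whose length is a multiple of four."""
    if len(text) % 4:
        raise ValueError("this sketch handles complete 4-character groups only")
    out = bytearray()
    for i in range(0, len(text), 4):
        n = 0
        for ch in text[i:i + 4]:
            n = (n << 6) | DECODE[ch]
        out += n.to_bytes(3, "big")
    return bytes(out)


encoded = encode_groups(b"This is a te")
print(len(b"This is a te"), len(encoded))  # 12 16
```

Twelve input octets become sixteen characters, which is the 33% growth mentioned above.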

The aforementioned ship format is more efficient; it is a so-called Base 85 encoding. Base 85 encodings take four input bytes (32 bits) and encode them into five characters. Each of these characters encodes a value from 0 to 84; five characters can therefore encode a value from 0 to 85^5 - 1 = 4,437,053,124, which covers the complete 32-bit range (0 to 4,294,967,295). Base 85 encodings need more "complicated" math and a larger character table, but the encoded files are only 25% larger than the input.
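The arithmetic behind a Base 85 encoding can be sketched as repeated division by 85. The Python code below only computes the five digit values for one group of four bytes. The character table and the byte order ship actually uses are not described here, so big-endian order, most significant digit first, and the function names are assumptions made for illustration.

```python
def to_base85_digits(four_bytes):
    """Return five values in 0..84 for one group of four bytes."""
    if len(four_bytes) != 4:
        raise ValueError("expected exactly four bytes")
    n = int.from_bytes(four_bytes, "big")  # assumption: big-endian
    digits = []
    for _ in range(5):
        n, d = divmod(n, 85)
        digits.append(d)
    return digits[::-1]                    # most significant digit first


def from_base85_digits(digits):
    """Reverse to_base85_digits."""
    n = 0
    for d in digits:
        n = n * 85 + d
    return n.to_bytes(4, "big")


# Five digits suffice because 85**5 exceeds 2**32.
print(85**5 - 1, 2**32 - 1)                       # 4437053124 4294967295
print(to_base85_digits(b"\xff\xff\xff\xff"))      # [82, 23, 54, 12, 0]
```

Unlike the three-in-four encodings, this needs multiplication and division on 32-bit values instead of plain bit shifting.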

To illustrate the encodings with some actual data, the following text is shown encoded in each of the formats:

This is a test file for illustrating the various
encoding methods. Let's make this text longer than
57 bytes to wrap lines with Base64 data, too.
Greetings, Frank Pilhofer



2005-01-11