
Base64 encoding

Base64 is part of the MIME (Multipurpose Internet Mail Extensions) standard, described in [RFC1521], section 5.2. It is sometimes incorrectly referred to as ``MIME encoding''; however, the MIME documents specify much more than just how to encode binary data: they define a complete framework for attachments within e-mail messages. Being part of a widely accepted standard, Base64 has the advantage of being the best-specified type of encoding.


Table: Encoding Table for Base64 Encoding

Data Value   +0  +1  +2  +3  +4  +5  +6  +7
     0        A   B   C   D   E   F   G   H
     8        I   J   K   L   M   N   O   P
    16        Q   R   S   T   U   V   W   X
    24        Y   Z   a   b   c   d   e   f
    32        g   h   i   j   k   l   m   n
    40        o   p   q   r   s   t   u   v
    48        w   x   y   z   0   1   2   3
    56        4   5   6   7   8   9   +   /


The general concept of three-in-four encoding is the same as with the previous two types; only a new character table is needed to represent the values (see the table above). Note that this table uses the same set of characters as the xxencoding table except for a single one (`/' versus `-'), although the characters are assigned to the values in a different order. If a line of encoded data contains neither character, it may be difficult to tell which encoding was used on the line.
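
The three-in-four step can be sketched in a few lines of Python. This is only an illustration of the table lookup, not code taken from any particular decoder; the names ALPHABET and encode_group are made up for this example.

```python
# Base64 alphabet, indexed by 6-bit value (same order as the encoding table).
# ALPHABET and encode_group are illustrative names, not part of any library.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_group(three_octets: bytes) -> str:
    """Encode exactly three octets into four Base64 characters."""
    if len(three_octets) != 3:
        raise ValueError("expected exactly three octets")
    # Combine the three octets into one 24-bit number.
    bits = int.from_bytes(three_octets, "big")
    # Split it into four 6-bit values, most significant first,
    # and look each value up in the table.
    return "".join(ALPHABET[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0))


print(encode_group(b"Thi"))  # prints "VGhp"
```

The octets `T', `h' and `i' (0x54, 0x68, 0x69) split into the 6-bit values 21, 6, 33 and 41, which the table maps to `V', `G', `h' and `p'.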

Base64 encoding does not have ``begin'' and ``end'' lines; such a concept is not needed, because the framework of a MIME message defines the beginning and end of a part. The encoded data is defined to be a ``stream'' of characters, and the decoder is supposed to ignore any ``illegal'' characters in the stream (such as line breaks or other whitespace). [RFC1521] requires each encoded line to be no longer than 76 characters and to be terminated with a CRLF sequence. No particular line length below that limit is enforced, but most implementations encode 57 octets into 76 encoded characters per line. Shorter lines are legal, and their length need not even be a multiple of four, although this would violate the rule of thumb that each line encodes an integral number of octets.
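
As a rough sketch of the stream model, the following Python code writes the common 57-octets-per-line layout and decodes a stream after discarding every character that is not part of the Base64 alphabet. The function names encode_lines and decode_stream and the constant LEGAL are invented for this example; only base64.b64encode and base64.b64decode come from the Python standard library.

```python
import base64

# Characters a decoder keeps; everything else (CR, LF, blanks, ...) is dropped.
# LEGAL, encode_lines and decode_stream are illustrative names.
LEGAL = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/="
)


def encode_lines(data: bytes, octets_per_line: int = 57) -> str:
    """Encode data into CRLF-terminated lines.

    octets_per_line must be a multiple of three so that only the
    last line can carry '=' padding.
    """
    lines = []
    for start in range(0, len(data), octets_per_line):
        chunk = data[start:start + octets_per_line]
        lines.append(base64.b64encode(chunk).decode("ascii"))
    return "".join(line + "\r\n" for line in lines)


def decode_stream(text: str) -> bytes:
    """Decode a Base64 stream, ignoring characters outside the alphabet."""
    cleaned = "".join(ch for ch in text if ch in LEGAL)
    return base64.b64decode(cleaned)


data = bytes(range(200))
encoded = encode_lines(data)
print(max(len(line) for line in encoded.split("\r\n")))  # prints 76
print(decode_stream(encoded) == data)                    # prints True
```

With 57 octets per line, every full line is exactly 76 characters long, and the decoder does not need to know where the line breaks were.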

End-of-file handling when the length of the input data is not a multiple of three octets is slightly different in Base64 encoding than it is in uuencoding. If one octet is left at the end of the input stream, the data is padded with 4 zero bits (giving a total of 12 bits) and encoded into two characters. After that, two equal signs `=' are written to complete the four-character sequence. If two octets are left, the data is padded with 2 zero bits (giving a total of 18 bits) and encoded into three characters, after which a single equal sign `=' is written.
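
The padding rule can be written out as follows. This is a minimal Python sketch of the two cases described above; ALPHABET and encode_tail are names chosen for this example only.

```python
# Base64 alphabet, indexed by 6-bit value (same order as the encoding table).
# ALPHABET and encode_tail are illustrative names, not part of any library.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_tail(tail: bytes) -> str:
    """Encode the last one or two octets of the input, with '=' padding."""
    if len(tail) == 1:
        # 8 data bits + 4 zero bits = 12 bits -> two characters, then "==".
        bits = tail[0] << 4
        chars = [ALPHABET[(bits >> shift) & 0x3F] for shift in (6, 0)]
        return "".join(chars) + "=="
    if len(tail) == 2:
        # 16 data bits + 2 zero bits = 18 bits -> three characters, then "=".
        bits = int.from_bytes(tail, "big") << 2
        chars = [ALPHABET[(bits >> shift) & 0x3F] for shift in (12, 6, 0)]
        return "".join(chars) + "="
    raise ValueError("tail must be one or two octets long")


print(encode_tail(b"\n"))  # prints "Cg=="
print(encode_tail(b"hi"))  # prints "aGk="
```

A single trailing line feed (0x0A) therefore becomes `Cg==', which is the form a file takes at its end when one octet is left over.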

Here's our sample file in Base64. Note that this text is only the encoded data. It is not a valid MIME message. Without the required framework, no proper MIME software will read it.

VGhpcyBpcyBhIHRlc3QgZmlsZSBmb3IgaWxsdXN0cmF0aW5nIHRoZSB2YXJpb3VzCmVuY29kaW5n
IG1ldGhvZHMuIExldCdzIG1ha2UgdGhpcyB0ZXh0IGxvbmdlciB0aGFuCjU3IGJ5dGVzIHRvIHdy
YXAgbGluZXMgd2l0aCBCYXNlNjQgZGF0YSwgdG9vLgpHcmVldGluZ3MsIEZyYW5rIFBpbGhvZmVy
Cg==
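
To check the sample, the bare data can be fed to a plain Base64 decoder that does not expect a MIME framework. The sketch below uses base64.b64decode from the Python standard library, which in its default mode discards characters outside the alphabet, such as the line breaks; the variable names are made up for this example.

```python
import base64

# The four lines of encoded data from the sample above.
SAMPLE = """\
VGhpcyBpcyBhIHRlc3QgZmlsZSBmb3IgaWxsdXN0cmF0aW5nIHRoZSB2YXJpb3VzCmVuY29kaW5n
IG1ldGhvZHMuIExldCdzIG1ha2UgdGhpcyB0ZXh0IGxvbmdlciB0aGFuCjU3IGJ5dGVzIHRvIHdy
YXAgbGluZXMgd2l0aCBCYXNlNjQgZGF0YSwgdG9vLgpHcmVldGluZ3MsIEZyYW5rIFBpbGhvZmVy
Cg==
"""

# Line breaks are not in the Base64 alphabet and are skipped by the decoder.
decoded = base64.b64decode(SAMPLE)

print(len(decoded))  # prints 172 (three lines of 57 octets plus one octet)
print(decoded.decode("ascii"), end="")
```

The three full lines carry 57 octets each, and the final `Cg==' carries the single remaining octet, a line feed.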

For more detailed documentation of Base64 encoding and of the MIME framework, I suggest reading [RFC1521].

The MIME standard also defines a way to split a message into multiple parts so that the parts can easily be reassembled on the remote end. For details, see section 7.3.2, ``The Message/Partial subtype'', of the standard.


2005-01-26