
BinHex encoding

The BinHex encoding originates from the Macintosh environment, and it takes the special properties of a Macintosh file into account. There, a file has two parts or ``forks'': the ``resource'' fork holds machine code, and the ``data'' fork holds arbitrary data. For files from other systems, the resource fork is usually empty.

I have not found a ``definitive'' definition of the format. My knowledge is based on two descriptions I found, one from Yves Lempereur and another from Peter Lewis. A similar description can be found in [RFC1741].


Table: Encoding Table for BinHex Encoding
Data Value +0 +1 +2 +3 +4 +5 +6 +7
0 ! " # $ % & ' (
8 ) * + , - 0 1 2
16 3 4 5 6 8 9 @ A
24 B C D E F G H I
32 J K L M N P Q R
40 S T U V X Y Z [
48 ` a b c d e f h
56 i j k l m p q r


A BinHex file is a stream of characters, beginning and ending with a colon `:'; intermediate line breaks are to be ignored by the decoder. Each line but the last should be exactly 64 characters in length. The last line may be shorter, and in a special case can also be 65 characters long: the trailing colon must not stand alone, so if the encoded data ends on an output line boundary, the colon is appended to this line as the 65th character. Thus a BinHex file begins with a colon in the first column and ends with a colon not in the first column.
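
The line-breaking rule can be sketched in Python as follows. This is only a minimal sketch, not taken from any BinHex implementation: the name wrap_binhex and its parameters are made up for illustration, and it assumes that the leading colon counts toward the 64 characters of the first line, as it does in the sample file at the end of this section.

```python
def wrap_binhex(encoded, width=64):
    """Frame already-encoded BinHex characters with colons and break lines.

    `encoded` is the string of encoded characters without any colons.
    Assumption: the leading colon is part of the first 64-character line.
    """
    stream = ":" + encoded
    # Cut the stream into full-width lines; only the last one may be shorter.
    lines = [stream[i:i + width] for i in range(0, len(stream), width)]
    # The closing colon always joins the last line, so it never stands alone.
    # If that line was already full, it becomes 65 characters long.
    lines[-1] += ":"
    return "\n".join(lines)
```

With 63 encoded characters the single output line is 65 characters long; with 64 encoded characters the last character moves to a second line and is followed by the colon.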

The line before the beginning of encoded data (before the initial `:') should contain the following verbatim text:

(This file must be converted with BinHex 4.0)

BinHex is another three-in-four encoding, and not surprisingly, yet another character table is used (see the encoding table above). The documentation does not explicitly mention what is supposed to happen if the length of the original input data is not a multiple of three octets. But from reading between the lines, it looks like ``unnecessary'' characters (those that would result in equal signs in Base64 encoding) are not printed.
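
To illustrate the three-in-four scheme, here is a minimal decoding sketch in Python. The alphabet string is the encoding table written out in order of data value; the function name binhex_chars_to_bytes is hypothetical, and the sketch assumes that the first colon in the text is the start of the encoded data and that leftover bits at the end are simply dropped.

```python
# The 64 BinHex characters in order of their data value (0..63).
BINHEX_ALPHABET = (
    "!\"#$%&'()*+,-012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr"
)
DECODE = {ch: value for value, ch in enumerate(BINHEX_ALPHABET)}


def binhex_chars_to_bytes(text):
    """Turn the characters between the two colons into (still RLE-compressed) octets."""
    start = text.index(":") + 1      # assumption: first colon opens the data
    end = text.index(":", start)     # the next colon closes it
    acc = 0                          # bit accumulator
    nbits = 0                        # number of valid bits in acc
    out = bytearray()
    for ch in text[start:end]:
        if ch in "\r\n \t":
            continue                 # line breaks are ignored by the decoder
        if ch not in DECODE:
            raise ValueError("not a BinHex character: %r" % ch)
        acc = (acc << 6) | DECODE[ch]
        nbits += 6
        if nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1  # keep only the bits not yet emitted
    return bytes(out)                # incomplete trailing bits are dropped
```

Applied to the first eight characters of the sample file below, `binhex_chars_to_bytes(":#&4&8e3Z:")` yields the length octet 8 followed by the first five characters of the filename.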


Table: BinHex RLE decoding
Compressed Data      Uncompressed Data
00 11 22 33 44 55 -> 00 11 22 33 44 55
11 22 90 04 33    -> 11 22 22 22 22 33
11 22 90 00 33 44 -> 11 22 90 33 44
2B 90 00 90 04 55 -> 2B 90 90 90 90 55


The encoded characters decode into an RLE-compressed byte stream, which must be handled in the next step (of course, decoding and decompressing are usually handled at the same time). Run-length encoding replaces multiple consecutive occurrences of one octet with that octet, a special marker, and the repetition count. BinHex uses the marker 0x90 (octal 0220, decimal 144). The octet sequence 0xff 0x90 0x04 would decompress into four times 0xff. If the marker itself occurs in the data, it must be ``escaped'' by the special sequence 0x90 0x00 (the marker with a repetition count of 0). The RLE decoding table above shows four more examples. Note the last example, where the marker itself is repeated.
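
The decompression step might be sketched like this in Python. The function name rle_decompress is made up for illustration; the behaviour follows the rules above, and the error handling for malformed input is my own assumption.

```python
RLE_MARKER = 0x90


def rle_decompress(data):
    """Expand BinHex run-length encoding in a bytes object."""
    out = bytearray()
    i = 0
    while i < len(data):
        octet = data[i]
        i += 1
        if octet != RLE_MARKER:
            out.append(octet)        # ordinary octet, copied as is
            continue
        if i >= len(data):
            raise ValueError("marker without a repetition count")
        count = data[i]
        i += 1
        if count == 0:
            out.append(RLE_MARKER)   # 0x90 0x00 is an escaped literal 0x90
        elif not out:
            raise ValueError("run marker with no preceding octet")
        else:
            # The octet before the marker is already in the output once,
            # so count - 1 further copies are needed.
            out.extend(out[-1:] * (count - 1))
    return bytes(out)
```

For example, `rle_decompress(bytes.fromhex("2B9000900455"))` reproduces the last row of the table, because the escaped marker is itself the octet that the following run repeats.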

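Figure: BinHex file structure
[Figure not reproduced: the header section, the data fork and the resource fork, with the size in octets above each item.]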

The decompression results in a data stream which consists of three parts: the header section, the data fork and the resource fork. The figure ``BinHex file structure'' shows how the sections are composed. The numbers above each item indicate its size in octets. The header has the following items:

n
The length of the filename in octets. This is a single octet, so the maximum length of a filename is 255.
Name
The filename, n octets in length. The length does not include the final nullbyte (which is actually the next item).
0
This single nullbyte terminates the previous filename.
Type
The Macintosh file type.
Auth
The Macintosh ``creator'', the program which wrote the original file. This and the previous item are used to start the right program to edit or display a file. I have no idea what common values are.
Flags
Macintosh file flags. No idea what they are.
Dlen
The number of octets in the data fork.
Rlen
The number of octets in the resource fork.
HC
CRC checksum of the header data.

After the header, at offset n+22, follow the Dlen octets of the data fork and a CRC checksum of the data fork (offset n+22+Dlen), then Rlen octets of the resource fork (offset n+24+Dlen) and a CRC checksum of the resource fork (offset n+24+Dlen+Rlen). Note that the CRCs are present even if the forks are empty.
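
Splitting the decompressed stream into its parts could look like the following Python sketch. The field sizes are an assumption taken from [RFC1741]: four octets each for Type and Auth, two for Flags, four each for Dlen and Rlen, two for every CRC, all multi-octet numbers in big-endian order. The function name and the dictionary keys are hypothetical, and the CRCs are only extracted here, not verified.

```python
import struct

# Type, Auth, Flags, Dlen, Rlen, HC: 4 + 4 + 2 + 4 + 4 + 2 = 20 octets.
HEADER_TAIL = ">4s4sHIIH"


def parse_binhex_stream(stream):
    """Split a decoded and decompressed BinHex stream into header and forks."""
    n = stream[0]                                # length of the filename
    name = stream[1:1 + n]
    pos = 1 + n + 1                              # skip the nullbyte after the name
    ftype, creator, flags, dlen, rlen, hcrc = struct.unpack_from(
        HEADER_TAIL, stream, pos)
    pos += struct.calcsize(HEADER_TAIL)          # data fork starts at n + 22
    data = stream[pos:pos + dlen]
    pos += dlen
    (dcrc,) = struct.unpack_from(">H", stream, pos)
    pos += 2
    resource = stream[pos:pos + rlen]
    pos += rlen
    (rcrc,) = struct.unpack_from(">H", stream, pos)
    if len(data) != dlen or len(resource) != rlen:
        raise ValueError("stream is shorter than the header announces")
    return {
        "name": name, "type": ftype, "creator": creator, "flags": flags,
        "data": data, "resource": resource,
        "header_crc": hcrc, "data_crc": dcrc, "resource_crc": rcrc,
    }
```

Because both CRC fields are read unconditionally, the sketch also handles the case where one or both forks are empty.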

The three CRC checksums are calculated as described in the following text, taken from Peter Lewis' description:

BinHex 4.0 uses a 16-bit CRC with a 0x1021 seed. The general algorithm is to take data 1 bit at a time and process it through the following:
  1. Take the old CRC (use 0x0000 if there is no previous CRC) and shift it to the left by 1.
  2. Put the new data bit in the least significant position (right bit).
  3. If the bit shifted out in (1) was a 1 then xor the CRC with 0x1021.
  4. Loop back to (1) until all the data has been processed.
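
The four steps translate almost literally into Python. This is a minimal sketch: the name binhex_crc is made up, and I assume that the bits of each octet are processed most significant bit first. When the octets of the message are followed by two zero octets, the result agrees with binascii.crc_hqx from the Python standard library, which the block uses as a cross-check.

```python
import binascii


def binhex_crc(data, crc=0x0000):
    """Feed `data` bit by bit through the CRC steps quoted above."""
    for octet in data:
        for shift in range(7, -1, -1):         # assumption: MSB first
            bit = (octet >> shift) & 1
            carry = crc & 0x8000               # bit that step 1 shifts out
            crc = ((crc << 1) & 0xFFFF) | bit  # steps 1 and 2
            if carry:
                crc ^= 0x1021                  # step 3
    return crc


# Cross-check: two trailing zero octets give the library's CRC of the message.
message = b"123456789"
assert binhex_crc(message + b"\x00\x00") == binascii.crc_hqx(message, 0)
```

The optional crc argument allows a long fork to be processed in several pieces, passing the previous result into the next call.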

This is the sample file in BinHex. However, the encoder I used replaced the LF characters from the original file with CR characters. It probably noticed that the input file was plain text and reformatted it to Mac-style text, but I consider this a software bug. The assigned filename is ``test.txt''.

(This file must be converted with BinHex 4.0)
:#&4&8e3Z9&K8!&4&@&4dG(Kd!!!!!!#X!!!!!+3j9'KTFb"TFb"K)(4PFh3JCQP
XC5"QEh)JD@aXGA0dFQ&dD@jR)(4SC5"fBA*TEh9c$@9ZBfpND@jR)'ePG'K[C(-
Z)%aPG#Gc)'eKDf8JG'KTFb"dCAKd)'a[EQGPFL"dD'&Z$68h)'*jG'9c)(4[)(G
bBA!JE'PZCA-JGfPdD#"#BA0P0M3JC'&dB5`JG'p[,Je(FQ9PG'PZCh-X)%CbB@j
V)&"TE'K[CQ9b$B0A!!!!:


2005-01-26