Spline Font Database

This page is often grievously out of date. It was a fair approximation to reality on 30 June 2003. Even if out of date it should be helpful, but if you really need to know the current format look at sfd.c and see what it parses.

PfaEdit's sfd files are ASCII files (so they can be copied easily across the internet and so that diffs are somewhat meaningful). They contain a full description of your font.

They are vaguely modeled on bdf files. The first few lines contain general font properties, then there's a section for each character, then a section for each bitmap font.

Font Header

Here is an example of what the first few lines look like:

SplineFontDB: 1.0
FontName: Ambrosia
FullName: Ambrosia
FamilyName: Ambrosia
Weight: Medium
Copyright: Copyright (C) 1995-2000 by George Williams
Comments: This is a funny font.
Version: 001.000
ItalicAngle: 0
UnderlinePosition: -133
UnderlineWidth: 20
Ascent: 800
Descent: 200
DisplaySize: -24
AntiAlias: 1
WinInfo: 64 16 4
FitToEm: 1
XUID: 3 18 21
Encoding: unicode
Order2: 1
OnlyBitmaps: 0
TeXData: 1 10485760 0 269484 134742 89828 526385 1048576 89828

The first line just identifies the file as an sfd file. The next few lines give the various names that PostScript allows fonts to have. Then come some fairly self-explanatory items (if they don't make sense, look them up in the Font Info dialog). A few things need some explanation:

TeXData
These are the TeX font parameters (and some similar info). The first number is 1, 2 or 3 and indicates that the font is a text, math or math extension font. The next number is the design point size (times (1<<20)). Then follow the font parameters. These values are usually in TeX fix_word format, a fixed-point format with 20 bits after the binary point (so to get the number divide by (1<<20)).
DisplaySize
This is the number of pixels per em that will be used by default to display the font in fontviews (it may be changed of course). Negative numbers mean to rasterize the display from the outlines, positive numbers mean to use a prebuilt bitmap font of that size.
AntiAlias
Whether the fontview should display the font antialiased or in black and white. (Antialiased looks better, but will be slower.)
FitToEm
Controls whether Fit to Em is checked by default in a fontview that displays this font.
WinInfo
Has three pieces of data on the default display of windows containing this font. The first datum says that the window should be scrolled so that the glyph at encoding 64 is visible, the second that the window should have 16 character columns horizontally, and the last that there should be 4 character rows vertically.
Encoding
For normal fonts this will be one of the names (or a close approximation thereto) that appears in the Encoding pulldown list. CID keyed fonts will not have encodings. Instead they'll have something like:
Registry: Adobe
Ordering: japan1
Supplement: 4
CIDVersion: 1.2
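
To make the TeXData scaling described above concrete, here is a minimal Python sketch that splits a TeXData line and divides by (1<<20). The function name parse_texdata is made up for this illustration, and the sketch scales every parameter as a fix_word even though the text only says they are usually in that format.

```python
FIX_WORD_ONE = 1 << 20  # fix_word values carry 20 fractional bits


def parse_texdata(line):
    """Split a 'TeXData:' line into (font type, design size, parameters).

    Hypothetical helper for illustration only. Assumption: every value
    after the font type is scaled by (1<<20).
    """
    key, _, rest = line.partition(":")
    if key.strip() != "TeXData":
        raise ValueError("not a TeXData line: %r" % line)
    numbers = [int(token) for token in rest.split()]
    font_type = numbers[0]                     # 1 = text, 2 = math, 3 = math ext
    design_size = numbers[1] / FIX_WORD_ONE    # design size in points
    params = [n / FIX_WORD_ONE for n in numbers[2:]]
    return font_type, design_size, params


font_type, design_size, params = parse_texdata(
    "TeXData: 1 10485760 0 269484 134742 89828 526385 1048576 89828")
print(font_type, design_size, params)
```

For the header shown earlier this gives a text font with a design size of 10 points.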

Some fonts will have some TrueType information in them too (look at the truetype spec for the meanings of these, they usually live in the OS/2 table).

FSType: 4
PFMFamily: 17
TTFWeight: 400
TTFWidth: 5
Panose: 2 0 5 3 0 0 0 0 0 0
LineGap: 252
VLineGap: 0

If loaded from a font with an OS/2 table there may also be the following fields (there is no UI to set these fields, but PfaEdit preserves them):

HheadAscent: 892
HheadDescent: -200
OS2TypoAscent: 892
OS2TypoDescent: -200
OS2WinAscent: 892
OS2WinDescent: -200

These represent different definitions of ascent and descent that are stored in various places in the truetype file (Horizontal header and OS/2 tables).

Some fonts will have PostScript-specific information contained in the Private dictionary:

BeginPrivate: 1
BlueValues 23 [-19 0 502 517 750 768]
EndPrivate

If the font has any opentype features it will have a list of script language entries:

ScriptLang: 3
 1 latn 1 ROM   
 1 latn 4 DEU  ROM  VIT  dflt 
 3 cyrl 1 dflt grek 1 dflt latn 5 DEU  ROM  TRK  VIT  dflt

The first line says that there are 3 sets of script lang entries. The next line says that this entry is only active for one script ('latn') and then only if the language is 'ROM '. The next line says the entry is again only active for one script, but that this time any of the four languages 'DEU ', 'ROM ', 'VIT ' or the default language apply to it. And the last line gives an example of an entry that is active for several scripts with several languages.
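
The layout of one script lang entry can be sketched in Python as follows. This is only an illustration: parse_scriptlang_entry is a made-up name, and the sketch assumes that tags never contain interior spaces, so it can split on whitespace and then pad language tags back to four characters.

```python
def parse_scriptlang_entry(line):
    """Parse one entry line into a list of (script, [languages]).

    Hypothetical helper. Assumption: tokens are separated by whitespace
    and a language tag such as 'ROM ' is padded to 4 characters on the right.
    """
    tokens = line.split()
    script_count = int(tokens[0])
    pos = 1
    entry = []
    for _ in range(script_count):
        script = tokens[pos]
        lang_count = int(tokens[pos + 1])
        langs = [t.ljust(4) for t in tokens[pos + 2:pos + 2 + lang_count]]
        entry.append((script, langs))
        pos += 2 + lang_count
    return entry


print(parse_scriptlang_entry(
    " 3 cyrl 1 dflt grek 1 dflt latn 5 DEU  ROM  TRK  VIT  dflt"))
```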

If your font has any kerning classes, each will look like this:

KernClass: 31 64 5 0
 1 F
 41 L Lacute glyph78 Lcommaaccent Ldot Lslash
 1 P
...
 6 hyphen
 5 space
...
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -152 -195 -152 -225 0 0 0 0 0 0 0 0 0 0 0 0 0 -145 -145 -130
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -130 0 0 0 0 0 0 0 0 0 0 -145 0 -115 0 0 0 0 -65 0 -140 -120 -120
...

The first line says that this Kerning Class has 31 different classes for the first character, and 64 for the second. It is active for entry 5 in the script lang list above. The OpenType flags will be 0. The next line says that class 1 of the first character (class 0 is reserved and never present) consists of only one character, "F" (the number in front is the string length of the line; it speeds up processing the sfd file but has no semantic content). The next line is for class 2 of the first character; it has more characters in it and a longer string length. After 30 entries we start on the classes for the second character. They look exactly like classes for the first character. After all the second character classes have been defined we have an array of numbers, <char1 class cnt>*<char2 class cnt> of them in fact. This specifies the amount of kerning that should be placed between characters of the given left and right classes (i.e. if char1 was in left class 2 and char2 was in right class 4 then we would index this array with 2*<char2 class cnt> + 4).
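
As a rough illustration of the header and the length-prefixed class lines, here is a small Python sketch. The function names are invented for this example, and the length check simply assumes the leading number counts the characters of the name list that follows it, which is what the sample lines suggest.

```python
def parse_kernclass_header(line):
    """Return (first class count, second class count, script lang index, flags)."""
    key, _, rest = line.partition(":")
    if key.strip() != "KernClass":
        raise ValueError("not a KernClass line: %r" % line)
    first_cnt, second_cnt, sli, flags = (int(t) for t in rest.split())
    return first_cnt, second_cnt, sli, flags


def parse_class_line(line):
    """Return the glyph names of one class line, checking the length prefix."""
    text = line.lstrip().rstrip("\r\n")
    length_text, _, names = text.partition(" ")
    if int(length_text) != len(names):
        raise ValueError("length prefix does not match: %r" % line)
    return names.split()


print(parse_kernclass_header("KernClass: 31 64 5 0"))
print(parse_class_line(" 41 L Lacute glyph78 Lcommaaccent Ldot Lslash"))
```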

The order of opentype features within GPOS/GSUB tables may be stored:

TableOrder: GPOS 4
	'subs'
	'sinf'
	'mark'
	'kern'
TableOrder: GSUB 2
	'frac'
	'nutf'

If the font contains ttf hinting, then the file may contain unparsed truetype tables (understanding ttf hinting is beyond PfaEdit's abilities, but it can at least preserve the data):

TtfTable: prep 4360
5S;o3()It?eJ8r@H[HSJH[H^@!b&BQ*?Vcm@'XSh+1MACZ>Up/\,o1+Ca't2!<ocH+Wn2p"@,t&
+Wo+[()Is6G8:u7D/^7,*,KO/(E=N5!=s)LCMjn,:Mp3:DSL&j05dG#cY`hLCN"!<CBO$@s(_ZX
...

The first line says that the 'prep' table is 4360 bytes long. Subsequent lines provide 4360 bytes of data, packed with Adobe's ASCII85 encoding, which takes binary data and packs it into ASCII without too much expansion (see the PostScript Reference Manual for a description of this packing).
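
A hedged sketch of unpacking such a block in Python, using the standard library's base64.a85decode. The helper name decode_sfd_binary is invented here. Whether sfd.c pads the final group out to a full five characters is not stated above, so the sketch decodes everything and then truncates to the declared byte count, which works either way.

```python
import base64


def decode_sfd_binary(text, byte_count):
    """Decode ASCII85 text from an sfd file and keep byte_count bytes.

    Hypothetical helper. Assumption: the data uses plain ASCII85 with the
    'z' shortcut for four zero bytes and no <~ ~> delimiters.
    """
    packed = "".join(text.split()).encode("ascii")  # drop line breaks
    raw = base64.a85decode(packed)
    if len(raw) < byte_count:
        raise ValueError("expected %d bytes, got %d" % (byte_count, len(raw)))
    return raw[:byte_count]


# Round trip on made-up data, since the table data above is elided.
sample = b"\x00\x00\x00\x00PfaEdit"
encoded = base64.a85encode(sample).decode("ascii")
print(encoded, decode_sfd_binary(encoded, len(sample)))
```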

The LangName entries represent the TrueType 'name' table. The number identifies the language and is followed by a list of strings encoded in UTF-7. The first string corresponds to ID=0 (Copyright), the second to ID=1 (Family), and so on; trailing empty strings will be omitted. In the American English language (1033) section, if one of these names exactly matches the equivalent PostScript item then that name will be omitted (this makes it easier to handle updates: users only have to change the copyright in one place).

LangName: 1033 "" "" "Regular" "GWW:Caliban Regular: Version 1.0" "" "Version 1.0"
LangName: 1032 "" "" "+A5oDsQ09A78DvQ05A7oDrAAA"
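
The mapping from string position to name ID can be sketched like this in Python. The name parse_langname is made up, and the sketch assumes the strings never contain an embedded double quote; it uses Python's built-in "utf-7" codec for decoding.

```python
import re


def parse_langname(line):
    """Return (language id, {name id: decoded string}) for a LangName line.

    Hypothetical helper. Empty strings are skipped, so the dictionary only
    holds the names that are actually present.
    """
    match = re.match(r'LangName:\s*(\d+)\s*(.*)', line)
    if match is None:
        raise ValueError("not a LangName line: %r" % line)
    language = int(match.group(1))
    strings = re.findall(r'"([^"]*)"', match.group(2))
    names = {}
    for name_id, text in enumerate(strings):
        if text:
            names[name_id] = text.encode("ascii").decode("utf-7")
    return language, names


language, names = parse_langname(
    'LangName: 1033 "" "" "Regular" "GWW:Caliban Regular: Version 1.0" "" "Version 1.0"')
print(language, names)
```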

If your font has any anchor classes:

AnchorClass: "top" mark 4 7 1 "bottom" mark 4 7 1 "Anchor-2" mark 4 7 1 "Anchor-3" mark 4 7 2 "Anchor-4" mark 4 7 2 "Anchor-5" mark 4 7 3 "Anchor-6" mark 4 7 4 

There is an Anchor Class named "top" which has feature tag 'mark', OpenType flags 4, a script lang index of 7, and it should be merged with other anchor classes marked with "1". The next class is named "bottom", the next "Anchor-2" and so forth. (Anchor class names are output in UTF-7.)
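
Since all the anchor classes sit on one line, a regular expression is a convenient way to pull them apart. The following Python sketch is illustrative only: parse_anchor_classes is an invented name, and it assumes each class is exactly a quoted name followed by four whitespace-separated fields, as in the sample.

```python
import re

ANCHOR_RE = re.compile(r'"([^"]*)"\s+(\S+)\s+(\d+)\s+(\d+)\s+(\S+)')

ANCHOR_LINE = ('AnchorClass: "top" mark 4 7 1 "bottom" mark 4 7 1 '
               '"Anchor-2" mark 4 7 1 "Anchor-3" mark 4 7 2 '
               '"Anchor-4" mark 4 7 2 "Anchor-5" mark 4 7 3 "Anchor-6" mark 4 7 4 ')


def parse_anchor_classes(line):
    """Return a list of (name, tag, flags, script lang index, merge id)."""
    classes = []
    for name, tag, flags, sli, merge in ANCHOR_RE.findall(line):
        decoded = name.encode("ascii").decode("utf-7")  # names are UTF-7
        classes.append((decoded, tag, int(flags), int(sli), merge))
    return classes


anchor_classes = parse_anchor_classes(ANCHOR_LINE)
print(anchor_classes)
```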

There may be a Grid entry near the top of the font. This specifies the splines to be drawn in the grid layer for the font (see below for a description of the splineset format):

Grid
678 -168 m 5
 -40 -168 l 5
-678 729 m 1
 1452 729 l 1
-678 525 m 1
 1452 525 l 1
EndSplineSet

Outline Character Data

Then for non-CID fonts:

BeginChars: 285 253

This means that the font has room for 285 characters and that there are a total of 253 defined (usually control characters are not defined). A character looks like:

StartChar: exclam
Encoding: 33 33
Width: 258
Flags: 
HStem: 736 13<39 155>  -14 88<162 180>
VStem: 71 84<49 396> 
Fore
195 742 m 0
 195 738 193 736 189 736 c 0
 175 736 155 743 155 682 c 0
 155 661 130 249 130 131 c 0
 130 100 96 99 96 131 c 0
 96 149 71 662 71 682 c 0
 71 731 51 736 37 736 c 0
 33 736 31 738 31 742 c 0
 31 748 36 747 38 749 c 1
 188 749 l 1
 190 747 195 748 195 742 c 0
80 32 m 0
 81 53 95 75 116 74 c 0
 137 73 150 53 150 32 c 0
 150 10 137 -14 115 -14 c 0
 93 -14 79 10 80 32 c 0
EndSplineSet
EndChar

The first line names the character, the next line gives the encoding, first in the current font, then in unicode. Then the advance width.

Then a set of flags (there are currently four flags: "H" => the character has been changed since last hinted, "M" => the character's hints have been adjusted manually, "W" => the width has been set explicitly, and "O" => the character was open when last saved).

Then horizontal and vertical (PostScript) stem hints. Each is a set of number pairs: the first number in the pair is the location of the stem, the next number is the width of the stem, and the numbers in angle brackets (<>) indicate where the hint is valid.
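
A stem hint line can be picked apart with a regular expression. This Python sketch is an illustration under one assumption taken from the sample: each hint has exactly one <start end> range after its location and width. The name parse_stems is invented for the example.

```python
import re

NUM = r'-?\d+(?:\.\d+)?'
HINT_RE = re.compile(r'(%s)\s+(%s)\s*<(%s)\s+(%s)>' % (NUM, NUM, NUM, NUM))


def parse_stems(line):
    """Return (keyword, [(location, width, valid_from, valid_to), ...])."""
    key, _, rest = line.partition(":")
    hints = [tuple(float(g) for g in m.groups())
             for m in HINT_RE.finditer(rest)]
    return key.strip(), hints


print(parse_stems("HStem: 736 13<39 155>  -14 88<162 180>"))
print(parse_stems("VStem: 71 84<49 396> "))
```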

For fonts with vertical metrics there may also be a

VWidth: 1000

specifying the vertical advance width.

The entry Fore starts the foreground splines. They are encoded as PostScript commands with moveto abbreviated to m, curveto to c and lineto to l (lower case el). The digit after the letter indicates whether the point is a curve (0), corner (1) or tangent (2) point. A set of splines in the background is similar; it will be introduced by a Back entry.
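
One way to read a single splineset line, sketched in Python. parse_spline_line is an invented name. The Grid example earlier shows a trailing 5, which the description above does not explain, so this sketch reports any value other than 0, 1 or 2 as "unknown" rather than guessing.

```python
POINT_TYPES = {0: "curve", 1: "corner", 2: "tangent"}
OPERAND_COUNT = {"m": 2, "l": 2, "c": 6}  # moveto, lineto, curveto


def parse_spline_line(line):
    """Return (operator, [coordinates], point type name) for one line."""
    tokens = line.split()
    op = tokens[-2]
    coords = [float(t) for t in tokens[:-2]]
    if op not in OPERAND_COUNT or len(coords) != OPERAND_COUNT[op]:
        raise ValueError("malformed spline line: %r" % line)
    point_type = POINT_TYPES.get(int(tokens[-1]), "unknown")
    return op, coords, point_type


for text in ("195 742 m 0", " 195 738 193 736 189 736 c 0", " 188 749 l 1"):
    print(parse_spline_line(text))
```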

A background image is stored in the following horrible format:

StartChar: A
...
Image: 167 301 0 21 2 1 23 753 2.53892 2.53892
J:N0SYd"0-qu?]szzz!!#7`s7cQozzz!!!!(s8Viozzzz"98E!zzzz!!3-"rVuouzzz!!!'"
s8N'!zzz!!!!$s8W,7zzzz"98E$huE`WzJ+s!Dz!"],0s6p!g!!!!"s8W-!n,NFg!!!Q0s8Vio
z5QCc`s82is!!!!`s8W,gz!WW3"s8W&uzJ,fQKp](9o!!iQ(s8W-!z!<<*!s7cQo!!",@s8W-!
...
EndImage
EndChar

The numbers on the Image line mean, respectively: width (of the image in pixels), height, image type (0=>mono, 1=>indexed, 2=>true color), bytes per line, number of color entries in the color table, the index in the color table of the transparent color (or for true color images the transparent color itself), the x and y coordinates of the upper left corner of the image, and the x and y scale factors to convert image pixels into character units. Then follows a bunch of binary data encoded using Adobe's Encode85 filter (see the PostScript Reference Manual for a description). These data contain all the colors in the color table followed by a dump of the image pixel data.
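
To keep the ten fields straight, they can be given names. The Python sketch below is illustrative: ImageHeader and parse_image_header are invented names, and the transparent field is kept as text because the description does not say in which radix it is written.

```python
from typing import NamedTuple


class ImageHeader(NamedTuple):
    width: int            # pixels
    height: int           # pixels
    image_type: int       # 0 = mono, 1 = indexed, 2 = true color
    bytes_per_line: int
    color_count: int      # entries in the color table
    transparent: str      # kept as text; its radix is not stated above
    x: float              # upper left corner, character units
    y: float
    x_scale: float        # image pixels to character units
    y_scale: float


def parse_image_header(line):
    """Parse an uncompressed 'Image:' line into named fields."""
    key, _, rest = line.partition(":")
    if key.strip() != "Image":
        raise ValueError("not an Image line: %r" % line)
    t = rest.split()
    return ImageHeader(int(t[0]), int(t[1]), int(t[2]), int(t[3]), int(t[4]),
                       t[5], float(t[6]), float(t[7]), float(t[8]), float(t[9]))


header = parse_image_header("Image: 167 301 0 21 2 1 23 753 2.53892 2.53892")
print(header)
```

In the sample, a 167 pixel wide mono image needs 21 bytes per line, which matches one bit per pixel rounded up to whole bytes.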

Bitmap data will be compressed by run length encoding. I'm not going to go into that in detail; if you want to understand it, look at the file sfd.c and search for image2rle to see how it is done. The image is compressed using RLE and then output as above, only now there is one more parameter on the "Image:" line, which gives the number of bytes to be read from the data stream.

A character need not contain any splines:

StartChar: semicolon
Encoding: 59 59
Width: 264
Flags: 
HStem: 
VStem: 
Ref: 44 N 1 0 0 1 0 0
Ref: 46 N 1 0 0 1 0 414
EndChar

Above is one with just references to other characters (a semi-colon is drawn here by drawing a comma and stacking a period on top of it). The first number is the local encoding of the character being referred to, the N says the reference is not selected, and the remaining 6 numbers are a PostScript transformation matrix. The one for comma (44) is the identity matrix, while the one for period (46) just translates it vertically 414 units.
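
A PostScript matrix [a b c d tx ty] maps a point (x, y) to (a*x + c*y + tx, b*x + d*y + ty). The Python sketch below applies that to the references above; parse_ref and apply_matrix are names invented for the illustration, and the selection flag is kept as raw text since only the N value is described.

```python
def parse_ref(line):
    """Return (local encoding, selection flag, 6-number matrix)."""
    tokens = line.split()
    if tokens[0] != "Ref:" or len(tokens) != 9:
        raise ValueError("not a Ref line: %r" % line)
    matrix = tuple(float(t) for t in tokens[3:9])
    return int(tokens[1]), tokens[2], matrix


def apply_matrix(matrix, x, y):
    """Apply a PostScript transformation matrix [a b c d tx ty] to (x, y)."""
    a, b, c, d, tx, ty = matrix
    return a * x + c * y + tx, b * x + d * y + ty


encoding, flag, matrix = parse_ref("Ref: 46 N 1 0 0 1 0 414")
# A point at (80, 32) in the period ends up 414 units higher in the semicolon.
print(encoding, flag, apply_matrix(matrix, 80, 32))
```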

If a font has been loaded from a TrueType file it may contain hinting information (PfaEdit does not attempt to understand TrueType hints, just to preserve them):

TtfInstrs: 107
5Xtqo&gTLA(_S)TQj!Kq"UP8<!<rr:&$QcW!"K,K&kWe?(^pl]#mUY<!s\f7"U>G:!%\s-3WRec
$pP.r$uZOWNsl$t"H>'?EW%CM&Cer:&f3P>eEnad5<Qq=rQYuk3AE2g>q7E*
EndTtf

This is 107 bytes of ASCII85-encoded binary data.

If the character contains Anchor Points these will be included:

AnchorPoint: "bottom" 780 -60 basechar 0
AnchorPoint: "top" 803 1487 basechar 0

Each entry names the anchor class the point belongs to (in UTF-7), its location, what type of point it is (basechar, mark, baselig, basemark, entry, exit), and for ligatures a number indicating which ligature component it refers to.

If the character is the first in any kern pairs (not a pair defined by a kern class, however), they are listed like this:

KernsSLIF: 114 -100 1 0 117 -92 1 0 101 -123 1 0 97 -107 1 0 111 -107 1 0

Each kern pair is represented by 4 numbers. The first is the encoding of the second character (using the current encoding for the font), the next is the horizontal kerning amount, then the script lang index, and finally the OpenType flags. Then we start over with the next kern pair.
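
Grouping the numbers four at a time is all it takes to read such a line. A minimal Python sketch, with the invented name parse_kerns:

```python
def parse_kerns(line):
    """Return a list of (second char encoding, kerning, script lang index, flags)."""
    key, _, rest = line.partition(":")
    if key.strip() != "KernsSLIF":
        raise ValueError("not a KernsSLIF line: %r" % line)
    numbers = [int(t) for t in rest.split()]
    if len(numbers) % 4 != 0:
        raise ValueError("expected groups of 4 numbers")
    return [tuple(numbers[i:i + 4]) for i in range(0, len(numbers), 4)]


pairs = parse_kerns(
    "KernsSLIF: 114 -100 1 0 117 -92 1 0 101 -123 1 0 97 -107 1 0 111 -107 1 0")
print(pairs)
```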

Data that are to go into other GPOS, GSUB or GDEF sub-tables are stored like this:

Position: 12 0 'sinf' dx=0 dy=-900 dh=0 dv=0
Ligature: 12 0 'frac' one slash four
Substitution: 12 3 'smcp' agrave.sc
AlternateSubs: 12 0 'swsh' glyph490 A.swash
MultipleSubs: 12 0 'ccmp' a grave
Ligature: 12 4 'liga' f f
LCarets: 0 0 '    ' 1 650 

In most of these lines the first two numbers provide a script lang index and a set of OpenType flags (except for LCarets where they are ignored). This is followed by an OpenType tag (also ignored for ligature carets). A simple position change is expressed by the amount of movement of the glyph and of the glyph's advance width. A ligature contains the names of the characters that make it up. A simple substitution contains the name of the character that it will become. An alternate sub contains the list of characters that the user may choose from. A multiple substitution contains the characters the current glyph is to be decomposed into. A ligature caret contains a count of the number of carets defined, and then a list of the locations of those carets.
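
These lines share a common prefix, which the following Python sketch splits off. The name parse_feature_line is invented, the sketch assumes the tag is always exactly four characters between single quotes (as in the samples, including the blank LCarets tag), and it leaves the rest of the line as a list of uninterpreted tokens.

```python
import re

FEATURE_RE = re.compile(r"(\w+):\s+(\d+)\s+(\d+)\s+'(.{4})'\s*(.*)")


def parse_feature_line(line):
    """Return (keyword, script lang index, flags, tag, [remaining tokens])."""
    match = FEATURE_RE.match(line)
    if match is None:
        raise ValueError("not a feature line: %r" % line)
    keyword, sli, flags, tag, payload = match.groups()
    return keyword, int(sli), int(flags), tag, payload.split()


print(parse_feature_line("Ligature: 12 0 'frac' one slash four"))
print(parse_feature_line("Position: 12 0 'sinf' dx=0 dy=-900 dh=0 dv=0"))
print(parse_feature_line("LCarets: 0 0 '    ' 1 650 "))
```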

A glyph may have an arbitrary comment associated with it; this will be output in UTF-7:

Comment: Hi

or a color

Colour: ff0000

Bitmap Fonts

After all the outline characters have been described there is an EndChars entry and then follow any bitmap fonts:

EndChars
BitmapFont: 12 285 10 2 1
BDFChar: 32 3 0 0 0 0
z
BDFChar: 33 3 0 1 0 9
^d(.M5X7S"!'gMa

The bitmap font line contains the following numbers: the pixel size of the font, the number of potential characters in the font, the ascent and the descent of the font, and the depth of the font (number of bits in a pixel). This is followed by a list of bitmap characters. The bitmap character line contains the following numbers: the encoding (local), the width, the minimum x value, the maximum x value, the minimum y value and the maximum y value. This is followed by another set of binary data encoded as above. There will be (ymax-ymin+1)*((xmax-xmin+8)/8) (unencoded) bytes, using integer division. There is no color table here (the high order bit comes first in the image; set bits should be colored black, clear bits transparent).
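
The byte count formula and the bit order can be checked against the sample '!' glyph with a short Python sketch. The function names are invented. One assumption needs flagging loudly: the sketch reads the four bounding box numbers as xmin, xmax, ymin, ymax. That order is inferred from the sample data, which decodes to ten meaningful rows that are two pixels wide, so treat it as a guess to verify against sfd.c.

```python
import base64


def bitmap_byte_count(xmin, xmax, ymin, ymax):
    """(ymax-ymin+1) rows of ((xmax-xmin+8)/8) bytes each, integer division."""
    return (ymax - ymin + 1) * ((xmax - xmin + 8) // 8)


def decode_bdfchar(header, data):
    """Return the glyph as text rows, '#' for set bits and '.' for clear ones.

    Assumptions: 1 bit per pixel, and the header fields after the width
    are xmin, xmax, ymin, ymax (inferred from the sample, see above).
    """
    enc, width, xmin, xmax, ymin, ymax = (int(t) for t in header.split()[1:7])
    bytes_per_row = (xmax - xmin + 8) // 8
    raw = base64.a85decode(data.strip().encode("ascii"))
    raw = raw[:bitmap_byte_count(xmin, xmax, ymin, ymax)]
    rows = []
    for r in range(ymax - ymin + 1):
        row = raw[r * bytes_per_row:(r + 1) * bytes_per_row]
        bits = "".join(format(b, "08b") for b in row)[:xmax - xmin + 1]
        rows.append(bits.replace("1", "#").replace("0", "."))
    return rows


exclam = decode_bdfchar("BDFChar: 33 3 0 1 0 9", "^d(.M5X7S\"!'gMa")
print("\n".join(exclam))
```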

A bitmap font is ended by:

EndBitmapFont
BitmapFont: 17 285 14 3 1
BDFChar: 0 17 0 0 0 0
z
...
EndBitmapFont
EndSplineFont

CID keyed fonts

A CID font is saved slightly differently. It begins with the normal font header which contains the information in the top level CID font dictionary. As mentioned above this will include special keys that specify the CID charset (registry, ordering, supplement). It will also include:

CIDVersion: 2.0
BeginSubFonts: 5 8318

The CIDVersion is self-explanatory. The BeginSubFonts line says that there are 5 subfonts, the largest of which contains slots for 8318 characters (again some of these may not be defined). This will be followed by a list of the subfonts (dumped out just like normal fonts) and their characters. Only the top level font will contain any bitmap characters, anchor classes, etc.

Autosave Format

Error recovery files are saved in ~/.PfaEdit/autosave. They have quite random-looking names and end in .asfd. They look very similar to the .sfd files described above.

If an asfd file starts with a line:

Base: /home/gww/myfonts/pfaedit/Ambrosia.sfd

Then it is assumed to be a list of changes applied to that file (which may be an sfd file or a font file). If it does not start with a "Base:" line then it is assumed to be a new font. The next line contains the encoding, as above. The next line is a BeginChars line. The number given on the line is not the number of characters in the file, but is the maximum number that could appear in the font. Then follows a list of all changed characters in the font (in the format described above).

Bitmaps are not preserved. Grid changes are not preserved.
