International Internet Application Development with iMIME

Internationalizing Big Backend Web & Mail Applications
Adrian D. Havill

original edition by Red Hat, Inc.

Copyright ©2001 Red Hat, Inc. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with the Invariant Sections being just "UTF-8 for ASCII Hackers", with one Front-Cover Text: "original edition by Red Hat, Inc." and with one Back-Cover Text: "Additional documentation and support for iMIME can be obtained from Red Hat, Inc.".


Description

iMIME is a C library (with some other language interfaces) that is used to input and query mail messages and HTTP submissions that have non-ASCII and non-text input.

Features

orthogonal
Old HTML FORMs, old file uploads, and new HTML FORMs can all be processed with the exact same interface, loosening the link between the web page and the server-side interface. Web programmer and content designer productivity increases because one generic interface handles all character sets and all form styles, making it easier to translate forms or change their structure (adding a file upload, for example) without changing the backend. This facilitates easy forward and backward migration between legacy and modern forms, or you can support both, should you negotiate with the client and discover an older browser that can't handle new form submission styles.
robust
iMIME is designed to handle malformed messages with relative grace, and can tolerate slight deviations from standard syntax.
portable
While developed with gcc and glibc, the library is written in Standard C (C99) with POSIX, Single UNIX Specification, and Unix98 calls, making porting to other standard environments easy.
open source
This code is free software, and can be used in accordance with the GNU General Public License. The documentation is also free, and may be used in accordance with the GNU Free Documentation License.
internationalization
iMIME's strongest point is its near complete support for Standards Track MIME RFCs related to internationalization, providing a parser which can decode the complexities without requiring extra effort by the host application.
single pass parser
iMIME can parse a non-rewindable stream, such as a TCP stream, without requiring the entire message to be read into memory first nor requiring the application to deliver the stream with certain size chunks or lines or maintain its own buffers.
reentrant
iMIME uses no global/static variables and uses only thread-safe library functions, allowing it to be used by threaded applications.
dynamic
No objects are stored in fixed-size arrays, including strings, so the size of any object is limited only by the environment. While certainly no guarantee of safety, fixed-size arrays (especially strings) are the most commonly exploited buffer overrun/stack smashing attack for internet applications.
Unicode/ISO-10646
Text information is automatically converted when appropriate, making code in the calling library for handling multiple character sets and encodings unnecessary. The system's iconv library is used, keeping this code small and free of tables.

Supported Standards and RFCs

iMIME was written to correctly parse messages using the syntax defined in most popular MIME and HTTP related RFCs. Special emphasis was placed on decoding I18N encoded messages painlessly and supporting as many I18N related standards and specifications as possible.

RFC 822
multiline header lines, quoted header values are understood, free form text headers and headers with are parsed appropriately. Dates and timezones are parsed and decoded. Received headers are rewritten in a form compatible with value and parameter parsing, with the focus being the timestamp.
RFC 1867
iMIME understands HTML file upload forms and treats them as MIME messages.
RFC 2045
RFC 2046
RFC 2047
quoted-printable, base64, and multipart bodies, as well as headers with non-ASCII encoded words are decoded and converted automatically to Unicode.
RFC 2183
Content-Disposition headers are used to decide whether a body is considered to be a separate file or kept in memory. It will decode and use the time and date information for files based on this header.
RFC 2231
This extends the ways used to encode non-ASCII text in headers by specifying the format for extremely long parameters, parameter values with non-ASCII characters, and language information in parameter values as well as encoded words.
RFC 2279
UTF-8 is used to multibyte encode ISO-10646 into C strings, which allows for easy use and manipulation of I18N strings by applications which are 8-bit clean but weren't designed for wide Unicode characters.
RFC 2388
multipart/form-data will not just be used exclusively for file uploads in future HTML, but will be used for I18N FORMs and large forms in general. multipart/form-data is especially good for non-ASCII, multiple files per form-control, large amounts of data, and base64 and quoted-printable encoded form controls.
RFC 2616
Header "Q" values for language and content negotiation are parsed. If the parser is used for CGI nph (modern Apache 1.3+ nph that provides an unbuffered stream and translates transport details such as encryption and chunking), the parser will recognize the HTTP command if it's the first line and transform it into a header line similar to "Request: "URI"; method=METHOD; http=HTTP-VERSION".
HTML 4.01
HTML 4.01 gives recommendations for FORM submissions and extended multipart/form-data format, and defines how to handle non-ASCII form control names as well as multiple files per file upload control. iMIME also implements HTML 4.01's recommendation that servers recognize the semicolon as a form control separator in addition to the ampersand for application/x-www-form-urlencoded, making it easier to pass data to a form processor from an HTML <A>nchor, where the ampersand must be escaped.
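
The separator rule for application/x-www-form-urlencoded can be illustrated with a minimal sketch. The function form_split() below is a hypothetical helper written for this document, not part of the iMIME API; it only shows what treating both '&' and ';' as separators means, and does no percent-decoding.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of iMIME: counts the non-empty name=value
 * pairs in a form-urlencoded string, accepting both '&' and ';' as
 * separators, and copies pair number `want` (0-based) into out, truncating
 * it if out is too small. Returns the number of pairs found. */
size_t form_split(const char *query, size_t want, char *out, size_t outsize)
{
    size_t count = 0;
    const char *p = query;

    if (outsize > 0)
        out[0] = '\0';
    while (*p != '\0') {
        size_t len = strcspn(p, "&;");  /* bytes before the next separator */

        if (len > 0) {                  /* skip empty pairs such as ";;" */
            if (count == want && outsize > 0) {
                size_t n = len < outsize - 1 ? len : outsize - 1;

                memcpy(out, p, n);
                out[n] = '\0';
            }
            count++;
        }
        p += len;
        if (*p != '\0')
            p++;                        /* step over the separator */
    }
    return count;
}
```

With the input "a=1;b=2&c=3", the sketch finds three pairs, so a link can be written with semicolons and needs no escaped ampersands.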

While not yet standard, many message submitting web clients have extensions related to I18N that iMIME supports.

SGML NCRs and HTML General Entity Encoding
Some web tools, such as Internet Explorer, will encode Unicode characters that are not in the target character set by representing them as SGML NCRs (numeric character references; ex. &#32;, &#x20;). Text streams that contain these references, in either the old base 10 form or the new hexadecimal form, will be post-processed so the characters are normalized. HTML 4 general entities (ex. &ocirc;, &euro;) are also recognized, allowing users to enter characters that their software is not capable of generating. Interpretation is loose in that invalid sequences are ignored and the ampersand is not consumed, making it work well with text files that haven't properly transformed '&' to "&amp;". Note that the following characters will not be normalized, as the application may want SGML/HTML input from the form (many web forms know that the users are HTML savvy and allow/encourage the input of raw HTML) and normalizing these characters would cause them to interfere with post-processing by the application: If you need these characters to be processed, you will need to use the convert with sgml_safe=false. Note that these characters shouldn't be automatically generated by browsers as they are ASCII/ISO-646.
application/x-www-form-urlencoded; charset=encoding
Some newer browsers such as Mozilla attempt to correct the design flaw in older form submissions where the character encoding was not specified for a submission, meaning that the coders had to work with the web designers and either agree on a common character set for each page, or add a hidden control that specified the character set. This extension eliminates that kludge.
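
To make the loose handling of numeric character references concrete, here is a minimal sketch of decoding a single reference. The function ncr_decode() is a hypothetical name used only for illustration and is not claimed to be how iMIME implements the feature; it covers the decimal and hexadecimal forms and returns 0 for anything else, so a caller can leave the ampersand in place.

```c
#include <stddef.h>

/* Hypothetical helper, not part of iMIME: if s starts with a well-formed
 * numeric character reference such as "&#32;" or "&#x20;", store the code
 * point in *cp and return the number of bytes consumed. Otherwise return 0
 * and leave *cp untouched, so the caller can copy the '&' through as is. */
size_t ncr_decode(const char *s, unsigned long *cp)
{
    const char *digits, *p;
    unsigned long value = 0;
    unsigned base = 10;

    if (s[0] != '&' || s[1] != '#')
        return 0;
    digits = s + 2;
    if (*digits == 'x' || *digits == 'X') {
        base = 16;                      /* new hexadecimal form */
        digits++;
    }
    for (p = digits; ; p++) {
        unsigned d;

        if (*p >= '0' && *p <= '9')
            d = (unsigned) (*p - '0');
        else if (base == 16 && *p >= 'a' && *p <= 'f')
            d = (unsigned) (*p - 'a') + 10;
        else if (base == 16 && *p >= 'A' && *p <= 'F')
            d = (unsigned) (*p - 'A') + 10;
        else
            break;
        value = value * base + d;
        if (value > 0x10FFFF)           /* beyond the Unicode range */
            return 0;
    }
    if (p == digits || *p != ';')       /* no digits, or no terminator */
        return 0;
    *cp = value;
    return (size_t) (p - s) + 1;
}
```

A general entity such as "&amp;" is not a numeric reference, so this sketch returns 0 for it; entity names would need a separate lookup table.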

Current Limitations

iMIME isn't yet designed to do everything. In particular, the following known limitations exist.

License & Copyright

The code is ©2001 by Red Hat, Inc. You may use the code, binary and source, in accordance with the GPL. Like most free software, the library has no warranty or support. Then again, there's a lot of non-free software out there that refuses to provide support or a warranty! Red Hat support and service products are available for open source software.

This documentation is ©2001 by Red Hat, Inc. You may use this documentation in accordance with the FDL.

Obtaining the Source Code

The most current source code and documentation for iMIME is distributed via anonymous FTP from <URL:ftp://people.redhat.com/havill/imime.tar.gz>.

RPMs are available in the same directory for more convenient package oriented installation of the source and IA-32 binaries.

The same directory should also hold older versions of iMIME.

Quick Start

The following program reads one MIME message from standard input, then runs header queries against the header information. For simplicity, no error checking is done, but proper memory deallocation is shown.

#include <stdio.h>
#include <stdlib.h>
#include "mime.h"

int main(int argc, char *argv[]) {
  mime_state *state = mime_init(NULL, mime_mime, NULL, 1, NULL, stdin);
  mime_msg *messages = mime_parse(NULL, state);
  
  while (--argc > 0) {
    char *s = mime_get_header_info(messages->headers, argv[argc]);
    mime_fputs(s, stdout);
    puts("");
    free(s);
  }
  mime_free_msg_list(messages);
  mime_free(state);
  return 0;
}
  

To parse MIME messages with iMIME, most applications will need to go through the steps shown above.

  1. Initialize the state machine with mime_init(). If unsuccessful, the function returns NULL.
  2. Parse the message with mime_parse(). The returned value will be the head of the linked list of messages passed to it, or a new list. If unsuccessful, it will return NULL.
  3. Look at the contents of the mime_msg returned and work with the data. In this example, we run header queries passed from the command line and print the UTF-8 string.
  4. Release the allocated resources by calling mime_free_msg_list() to free the message list and everything it references, then free the dynamically allocated state information with mime_free(). Note that mime_free_msg_list() will also attempt to remove the temporary files the parser created, so to keep such a file you must either change its filename in the message to a string of zero length, or copy or move the file to a different location first. If you copy or move the temporary file, be sure to preserve the file access and modification timestamps which the parser may set.

Feed the following message (make sure the newlines are CRLF and no other space precedes the headers or blank lines) to the above program on standard input. You will also need to be using a Standard C environment where text input streams are treated verbatim, to keep the CRLF newlines from getting converted (the MS Windows Visual C++ CRT (C Runtime Library) converts CRLF to '\n'; Cygnus/Red Hat's cygwin can leave it as is).

To: =?iso-8859-8?b?7eXs+SDv4SDp7Oj08A==?= <santa@nowhere.com>
From: =?ISO-8859-1?Q?Olle_J=E4rnefors?= <havill@redhat.com>
Subject: =?ISO-2022-JP?B?N0obJEIyfkR7SEckTkZiTUYbKEI=?=
Comments: (=?ISO-8859-1?Q?Patrik_F=E4ltstr=F6m?=)
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=ISO-2022-JP

This is a test=
 to see how it works =41=42=43

H=65llo!
  

When run with the argument Subject, the program should output a string that resembles a C/C++/Java wide string literal.

L"7J\u6539\u8A02\u7248\u306E\u5185\u5BB9"

This is "7J改訂版の内容".
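
The escaped form above can be reproduced from a UTF-8 string with a short routine. The sketch below is an assumption about what such escaping involves, not the source of mime_fputs(); the names utf8_next() and utf8_escape() are hypothetical. It decodes each UTF-8 sequence and writes non-ASCII characters as \uXXXX, without the surrounding L"..." quoting.

```c
#include <stddef.h>
#include <stdio.h>

/* Decode one UTF-8 sequence at s into *cp. Returns the number of bytes
 * used, or 0 if the lead byte or a continuation byte is invalid. This
 * minimal version does not reject overlong forms or surrogates. */
static size_t utf8_next(const unsigned char *s, unsigned long *cp)
{
    size_t len, i;

    if (s[0] < 0x80) {
        *cp = s[0];
        len = 1;
    } else if (s[0] >= 0xC2 && s[0] <= 0xDF) {
        *cp = s[0] & 0x1F;
        len = 2;
    } else if (s[0] >= 0xE0 && s[0] <= 0xEF) {
        *cp = s[0] & 0x0F;
        len = 3;
    } else if (s[0] >= 0xF0 && s[0] <= 0xF4) {
        *cp = s[0] & 0x07;
        len = 4;
    } else {
        return 0;
    }
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)      /* also stops at an early '\0' */
            return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return len;
}

/* Hypothetical helper: write s to out as ASCII, replacing each non-ASCII
 * character with \uXXXX (or \UXXXXXXXX above U+FFFF). Returns 0 on success,
 * -1 on invalid input or if out is too small. */
int utf8_escape(const char *s, char *out, size_t outsize)
{
    const unsigned char *p = (const unsigned char *) s;
    size_t used = 0;

    if (outsize == 0)
        return -1;
    out[0] = '\0';
    while (*p != '\0') {
        unsigned long cp;
        size_t n = utf8_next(p, &cp);
        int w;

        if (n == 0)
            return -1;
        if (cp < 0x80)
            w = snprintf(out + used, outsize - used, "%c", (int) cp);
        else if (cp <= 0xFFFF)
            w = snprintf(out + used, outsize - used, "\\u%04lX", cp);
        else
            w = snprintf(out + used, outsize - used, "\\U%08lX", cp);
        if (w < 0 || (size_t) w >= outsize - used)
            return -1;
        used += (size_t) w;
        p += n;
    }
    return 0;
}
```

Passing the UTF-8 bytes of the decoded subject to utf8_escape() yields the same sequence of escapes as the string shown above.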

Data Structures

To use the library, you need to manipulate the contents of data structures defined in mime.h. The query interface can manipulate the header and message structures, and large scale software engineering principles frown on directly accessing the contents of abstract data types (because the contents could very well change when iMIME is improved). Even so, it can be beneficial to scan the structures directly for performance, or when you want to convert the data structures to a format used by the host application.

Strings

Strings are multibyte to make it easier for applications that don't use wide characters. Strings are also stored and transferred between internal memory and external files, which are normally 8-bit in modern environments, so a multibyte form makes moving text objects from memory to disk and back easier. Functions are provided to transform UTF strings to wide strings.

UTF-8 for ASCII Hackers

Note: MIME standards often use the parameter name "charset" when they really mean the "encoding of a character set". Microsoft and IBM use the terminology "codepage" for the encoding of a character set. EUC uses "codeset", which actually does refer to our meaning of character set (EUC can encode up to four separate codesets). Also, Unicode characters within this document are either prefixed with U+ (if 0xFFFF or less) or U- (if greater than 0xFFFF).

UTF-8 is an 8-bit octet transformation of 21-bit per character Unicode. If you never cared about Unicode and want to know as little as possible because you're monolingual, then all you need to know about Unicode (also known as ISO-10646) the character set and UTF-8 the encoding can be summarized by the following points.

Occasionally you will hear criticism from a minority of Han/CJK character (aka Kangxi/Chinese, Kanji/Japanese, Hanja/Korean) users who claim their language cannot be properly represented by Unicode. As a general rule of thumb, Unicode can round-trip encode/decode all national standard characters. If they're using computers now with their language, they can use it with Unicode, and there's plenty of room for expansion with Unicode should additional characters be discovered to be useful.

Clarifying the confusion regarding Han characters and Unicode involves understanding that Unicode, unlike some other Han character sets, tries to not mix characters and glyphs (font variants of the same character), unless a national standard also encoded the glyphs (so that round-trip conversion is possible). This is a kludge which acknowledges that Unicode never could have become accepted unless it supported legacy charset conversion, no matter how broken the legacy set was. Unicode tries not to encode glyph variants except for compatibility because, while doing so makes rendering (questionably) easier, the processing of character information (text searching and normalizing) becomes unscalable as the glyph variant count rises.

Most of the characters they point out that are allegedly not in Unicode are in fact in Unicode but not the base glyph variant.

Variants of the same character are properly handled by a higher level. Although it will rarely be needed except by scholars and polyglots, the MIME decoders provided allow the specification of language to select a set of font glyphs for the Unicode characters, and Unicode allows for language tags via special characters in plane 14 should the higher level protocol not support language specification or the stream must be plain text.

Separator String

If one value has multiple string values (for example, if a header contains two "X-Subject" lines), these strings will be concatenated together under the one header "X-Subject". In between the two strings will be the byte 0xFF, which never appears in UTF-8 strings.
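
Because the byte 0xFF cannot occur inside valid UTF-8, a concatenated string can be taken apart again with ordinary byte searching. The following is a minimal sketch; nth_value() and the VALUE_SEPARATOR macro are hypothetical names chosen for this example and are not taken from mime.h.

```c
#include <stddef.h>
#include <string.h>

/* The separator byte described above. The macro name is an assumption
 * made for this sketch. */
#define VALUE_SEPARATOR '\xFF'

/* Hypothetical helper: copy value number `index` (0-based) of a
 * 0xFF-separated string into out, truncating it if out is too small.
 * Returns 1 if that value exists, 0 otherwise. */
int nth_value(const char *s, size_t index, char *out, size_t outsize)
{
    if (outsize == 0)
        return 0;
    for (;;) {
        const char *sep = strchr(s, VALUE_SEPARATOR);
        size_t len = sep != NULL ? (size_t) (sep - s) : strlen(s);

        if (index == 0) {
            if (len >= outsize)
                len = outsize - 1;
            memcpy(out, s, len);
            out[len] = '\0';
            return 1;
        }
        if (sep == NULL)                /* ran out of values */
            return 0;
        s = sep + 1;
        index--;
    }
}
```

For two "X-Subject" lines joined this way, index 0 gives the first line's text and index 1 the second.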

Multibyte UTF Magic Prefix

Strings are either free-form (and may include null characters) or text. If the text was converted from a character set to UTF-8 format, the first three bytes of the string will be 0xEF 0xBB 0xBF. These three bytes are the UTF-8 encoding of the Unicode character U+FEFF, which is a zero width, non-breaking space.

In a renderer that understands Unicode, this character when printed should do nothing. However, applications using the library should test for the presence of this character and remove it if it is at the front of the string (but not further occurrences of the character within the same string), as it is a magic character that is not part of the conversion. If the string itself began with a U+FEFF character, the first six bytes will be 0xEF 0xBB 0xBF 0xEF 0xBB 0xBF, and the first three bytes should be ignored, but not the second three bytes.

This magic is put at the front of the strings because application environments that are not internationalized will need to determine which multibyte strings are Unicode and which are in other encodings. Also, MIME messages sometimes do not contain character set/encoding information, and one needs to know which text strings are ambiguous and which are definitely UTF-8.

You can modify the library behavior to make UTF-8 strings non-prefixed with a U+FEFF if your application does not need to distinguish between UTF-8 and non-UTF-8 strings.
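
U+FEFF is encoded in UTF-8 as the three bytes 0xEF 0xBB 0xBF, so the test for the magic prefix is a three-byte comparison. A minimal sketch follows; skip_utf8_prefix() is a hypothetical helper name, not a function provided by the library.

```c
#include <string.h>

/* Hypothetical helper: if s begins with the UTF-8 encoding of U+FEFF,
 * set *was_utf8 to 1 and return a pointer just past that one prefix.
 * Otherwise set *was_utf8 to 0 and return s unchanged. Only the first
 * occurrence is skipped, so a second U+FEFF that belongs to the text
 * itself is preserved. */
const char *skip_utf8_prefix(const char *s, int *was_utf8)
{
    static const char prefix[] = "\xEF\xBB\xBF";

    if (strncmp(s, prefix, 3) == 0) {
        *was_utf8 = 1;
        return s + 3;
    }
    *was_utf8 = 0;
    return s;
}
```

An application that mixes encodings could use the flag to decide whether the rest of the string is known to be UTF-8 or is in an unknown encoding.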

Headers

/* typedef struct mime_header */ Headers are the series of colon separated Header: information pairs at the top of the message and continuing to the first blank line.

If a header is non-structured (The freeform field evaluates to true), the text to the right of the colon is not interpreted as structured and the string is put inside of a mime_header with no pointer to a mime_data. Otherwise, the comma separated values are stored in a mime_data list.

Each value can have zero or more parameters (separated by semicolons) associated with it to the right of the value. These are stored in the mime_param list. A parameter can have an attribute and optionally a value. If the value is present, it is separated from the name by an equals sign.

Messages

/* typedef struct mime_msg */ Messages hold the body of the message, whether it be a pointer to the disk (the body.filename field), a pointer to an encapsulated message (the body.multipart field), or a string in memory (the body.s field). As the body can have embedded '\0' characters in it, the length field holds the actual length in bytes.

The pointers to strings in the info structure are convenience fields which point to various header values and parameters which are often used. If the information is not available, they are set to a constant string of length zero, not NULL, so it's always safe to pass one of these to a Standard C <string.h> function. Common time & date header values and parameters are also parsed and decoded for ease-of-use.

Finally, the headers field points to a linked list of headers associated with this message.

Parse State

In general, the state machine should be considered opaque. Helper functions for options should be used to manipulate the structure so applications will work with future versions of iMIME.

Query Interface

Although the message data type and the header data type are not opaque and the contents visible, generic convenient query mechanisms exist to retrieve data from these objects as if they were an abstract type. This interface also makes it easier to build wrappers and bridges from other languages that may not interface easily to native C types and functions.

Syntax

A simple syntax for retrieving header information and message bodies exists. The message body query syntax includes the header query syntax.

Header Query Syntax

The EBNF for query strings used by mime_get_header_info() is as follows.

hdr ::= (empty)
      | str
      | str.val
      | str.val.att

Given the header Content-Type: text/plain; charset=iso-2022-jp, str would be Content-Type, val would be text/plain, and att would be charset.

If no att is specified, all the parameter names are returned in one string, separated by the string delimiter. If no val is specified, all values are returned in one string. If no str is specified, all headers are returned, likewise separated by the string delimiter.
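
The three levels of a header query can be pictured by splitting the query string at its dots. The sketch below is only an illustration of the grammar; struct hdr_query and split_hdr_query() are hypothetical and say nothing about how mime_get_header_info() parses its argument internally (for example, whether a dot may be quoted inside a component is not covered here).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the components of a header query. */
struct hdr_query {
    char str[64];   /* header name, e.g. "Content-Type" */
    char val[64];   /* value, e.g. "text/plain" */
    char att[64];   /* parameter name, e.g. "charset" */
};

/* Hypothetical helper: split "str.val.att" at the dots. Returns the number
 * of components found (0 to 3), or -1 if there are more than three or a
 * component does not fit. Components that are absent are left empty. */
int split_hdr_query(const char *query, struct hdr_query *q)
{
    char *parts[3];
    int n = 0;

    parts[0] = q->str;
    parts[1] = q->val;
    parts[2] = q->att;
    q->str[0] = q->val[0] = q->att[0] = '\0';
    if (*query == '\0')
        return 0;                       /* empty query: all headers */
    for (;;) {
        size_t len = strcspn(query, ".");

        if (n == 3 || len >= sizeof q->str)
            return -1;
        memcpy(parts[n], query, len);
        parts[n][len] = '\0';
        n++;
        query += len;
        if (*query == '\0')
            return n;
        query++;                        /* step over the dot */
    }
}
```

For "Content-Type.text/plain.charset" the sketch reports three components, matching the str, val, and att of the example header above.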

Body Query Syntax

The EBNF for query strings used by mime_get_msg() is as follows, with the return type, if any, in the last column. An allocated pointer to the return type is returned.

symbol  production          meaning                                               returns
cmd     ref^hdr             queries the headers                                   char*
cmd     ref?                gets content type                                     mime_body_type*
cmd     ref=                gets content body                                     mime_msg* or char*
cmd     ref#                content length                                        size_t*
cmd     ref                 get message                                           mime_msg*
cmd     ref:cmd             dereference multipart contents
ref     <id>                part with Message-ID or Content-ID of id
ref     !ref                do not recurse/descend into multipart
ref     index               #index */*
ref     {index}             #index text/*
ref     [index]             #index text/plain
ref     (index)             #index file (determined by the Content-Disposition)
ref     "name" or 'name'    part labeled with name in Content-Disposition
index   integer             a whole number referring to the message parts, descending into multipart children

When a message is parsed, the multipart segments are appended in order to the linked list of messages. Assuming that the linked-list consists of the following contents:

Message-ID: <0123.4567@imime> 
MIME-Version: 1.0 (This is a test)
Content-Type: multipart/mixed; boundary=a
Content-Disposition: inline; name="a"

--a
Content-Type: multipart/alternate; boundary=b
Content-Disposition: inline; name="b"
Content-ID: < 89AB.CDEF@imime>

This is the prologue. It will be ignored.
--b
Content-Type: text/plain
Content-Description: this is the content for non-rich text viewers
Content-ID:  <FEDC.BA98@imime>

This is a test
--b
Content-Type: text/html
Content-ID: < DEAD.BEAF@imime >
Content-Description: this is the content for browser e-mail clients

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
<TITLE>Test</TITLE>
<P>This is a test
--b--

This is the epilogue. It will be ignored.
--a
Content-Type: application/octet-stream
Content-ID: <0000.0000@imime >
Content-Disposition: attachment; filename=test.sh; name="a2"
Content-Description: This is a bash shell script

#!/bin/bash
echo "This is a test!"

--a--
   

The following example commands show which part would be returned.

auto-descending into multipart      not descending into multipart
query  result                       query      result
0      multipart/mixed              !0         multipart/mixed
1      multipart/alternate          !0:0       multipart/alternate
2      text/plain                   !0:!0:!0   text/plain
3      text/html                    !0:!0:!1   text/html
4      application/octet-stream     !0:!1      application/octet-stream
5      NULL                         !1         NULL
{0}    text/plain                   {!0}       NULL
{1}    text/html                    {!1}       NULL
{2}    NULL                         {!2}       NULL
[0]    text/plain                   [!0]       NULL
[1]    NULL                         [!1]       NULL
(0)    application/octet-stream     (!0)       NULL
(1)    NULL                         (!1)       NULL

The following is a complete program for allowing one to test the syntax of queries. It accepts zero or more messages stored in files as arguments to the program, then reads query commands from standard input until the end of file is reached (usually with Control-D on Linux and Control-Z on Windows).

#include <ctype.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "mime.h"

int main(int argc, char *argv[]) {
        int i;
        mime_msg *msgs = NULL; 

        for (i = 1; i < argc; i++) {
                FILE *file = fopen(argv[i], "rb");
                mime_state *state; 
        
                if (file == NULL) { 
                        perror(argv[i]);
                        exit(EXIT_FAILURE);
                }
                state = mime_init(NULL, mime_mime, NULL, 1, NULL, file);
                msgs = mime_parse(msgs, state); 
                mime_free(state);
                if (fclose(file) == EOF) {
                        perror(argv[i]);
                        exit(EXIT_FAILURE);
                }
        }
        while (!feof(stdin)) {
                char line[132], *s;
                size_t n;

                if (fgets(line, sizeof(line), stdin) == NULL) { 
                        if (ferror(stdin)) {
                                perror(NULL);
                                mime_free_msg_list(msgs);
                                exit(EXIT_FAILURE);
                        }
                        else break;
                }
                n = strlen(line);
                while (n != 0 && isspace((unsigned char) line[--n])) {
                        line[n] = '\0'; 
                }
                s = mime_get_msg(msgs, line);
                if (s == NULL) {
                        puts("NULL");
                        continue;
                }
                switch (line[n]) {
                        mime_body_type *type;

                case '#':
                        printf("length is %zu\n", *((size_t *) s));
                        break;
                case '?':
                        switch (*((mime_body_type *) s)) {
                        case mime_in_memory:
                                puts("type is in_memory");
                                break;
                        case mime_filename:
                                puts("type is filename");
                                break;
                        case mime_multipart:
                                puts("type is multipart");
                                break;
                        default:
                                puts("type is UNKNOWN");
                        }
                        break;
                case '=':
                        line[n] = '?';
                        type = mime_get_msg(msgs, line);
                        if (type == NULL) {     /* type query failed */
                                puts("NULL");
                                break;
                        }
                        switch (*type) {
                        case mime_in_memory:
                                fputs("obj is ", stdout);
                                mime_fputs(*((char **) s), stdout);
                                puts("");
                                break;
                        case mime_filename:
                                printf("file is %s\n", *((char **) s));
                                break;
                        case mime_multipart:
                                printf("multi is %p\n", *((void **) s));
                                break;
                        default:printf("????? is %p\n", *((void **) s));
                        }
                        free(type);
                        break;
                default:if (strchr(line, '^') != NULL) {
                                if (s == NULL)
                                        puts("NULL");
                                else {  mime_fputs((char *) s, stdout);
                                        puts("");
                                }
                        }
                        else printf("body is %p\n", *((void **) s));
                }
                free(s);
        }
        mime_free_msg_list(msgs);
        return 0;
}
  

Wildcards

For convenience, query strings allow one or more wildcards per identifier. Note that the wildcard has many restrictions regarding its use.

{
  char *s = mime_get_msg(msgs, "'submit*'^content-type.*.charset");
  if (s != NULL) {  /* NULL means nothing matched the query */
    printf("The encoding for the 1st submit form control is %s\n", s);
    free(s);
  }
}
  

Doing More

While the combination of mime_init() then mime_parse() is enough for most basic message decoding tasks, additional functions exist for controlling the parser behavior.

Controlling Message Sizes

mime_limit *mime_get_limits(mime_state *state);

When a state is initialized, size limits are disabled and thresholds are set to obvious behavior. These limits and thresholds can be modified at any time after initializing the state object and using it to parse messages.

Setting Thresholds

iMIME differentiates between internal "in memory" objects and objects that are considered to be external "attachments" or files, and this judgement is made based on the Content-Type and Content-Disposition.

The default settings within the state cause iMIME to save all external attachments to disk and all internal objects are kept in memory, but you can modify this behavior. For example, you may want small file based objects to be kept in memory for performance. Or you may want to write excessively large "internal" objects to disk so excessive amounts of RAM memory are not allocated.

The threshold is the cutoff point that decides where an object is stored, whether it is classified as an attachment or as internal. Until the number of bytes in the body of the object reaches or exceeds the threshold, the object is kept in memory. Once the threshold is reached, the object is written to a temporary file. If information about the modification date and/or the access date is present in the Content-Disposition, the temporary file will have these set accordingly.

To change the thresholds of a mime_state *, modify the threshold fields.

{
  mime_state *state = mime_init(NULL, mime_mime, NULL, 1, NULL, stdin);
  mime_limit *limits = mime_get_limits(state);

  limits->write_alloc_threshold = 10240;
  limits->write_file_threshold = 64;
}
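
The threshold rule can be modeled in a few lines. This is a sketch of the behavior described above and not code from the library: the types and the function choose_storage() are hypothetical, and it assumes that one threshold applies to internal objects and the other to attachments, which is an interpretation of the two fields set in the example.

```c
#include <stddef.h>

enum storage { STORE_IN_MEMORY, STORE_IN_FILE };

/* Hypothetical stand-in for the two threshold settings. */
struct thresholds {
    size_t internal_threshold;    /* applies to internal objects */
    size_t attachment_threshold;  /* applies to attachments */
};

/* An object stays in memory until its body size reaches or exceeds the
 * threshold that applies to its class; from then on it goes to a file. */
enum storage choose_storage(int is_attachment, size_t body_bytes,
                            const struct thresholds *t)
{
    size_t limit = is_attachment ? t->attachment_threshold
                                 : t->internal_threshold;

    return body_bytes >= limit ? STORE_IN_FILE : STORE_IN_MEMORY;
}
```

Under this model, thresholds of 10240 and 64 keep internal objects in memory up to 10239 bytes and attachments up to 63 bytes.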
  

Setting Limits

When using iMIME to parse objects coming from an unknown and/or untrusted source, the size of the objects being sent cannot be known in advance. To reduce DoS problems where a third person sends an object intentionally or unintentionally of an unreasonable size, or sends an unreasonable amount of objects, iMIME allows you to configure the state machine to stop accepting input after certain limits are reached.

The limits set can be per object, all objects combined, and you can control the amount of bytes and the amount of message bodies received. The library can also separate the limits between external objects and internal objects, as the handling and body resource consumption depends on thresholds and whether they are considered to be external or internal.

{
  mime_state *state = mime_init(NULL, mime_mime, NULL, 1, NULL, stdin);
  mime_limit *limits = mime_get_limits(state);

  limits->max_total_size = 32768;
  limits->max_total_objects = 5;

  limits->max_total_file_size = 16384;
  limits->max_file_size = 8192;
  limits->max_file_objects = 3;

  limits->max_total_alloc_size = 8192;
  limits->max_alloc_size = 4096;
  limits->max_alloc_objects = 3;
}
  

You can retrieve the default limits that are set during state initialization with the function void mime_set_default_limits(mime_limit *limits), which stores the default values in the struct pointed to by limits.
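
How per-object, total-size, and object-count limits interact can be shown with a small accounting sketch. Everything here is hypothetical (struct size_limits, struct usage, account_object()) and only models the idea; in particular, treating a limit as exceeded when a value goes above it, rather than when it is reached, is an assumption.

```c
#include <stddef.h>

/* Hypothetical limits for one class of object (internal or external). */
struct size_limits {
    size_t max_total_size;  /* bytes allowed across all objects */
    size_t max_size;        /* bytes allowed in a single object */
    size_t max_objects;     /* number of objects allowed */
};

/* Running totals for what has been accepted so far. */
struct usage {
    size_t total_size;
    size_t objects;
};

/* Returns 1 and updates the totals if the object fits within every limit,
 * or 0 (leaving the totals unchanged) if accepting it would break one. */
int account_object(struct usage *u, const struct size_limits *lim,
                   size_t object_size)
{
    if (object_size > lim->max_size)
        return 0;
    if (u->objects >= lim->max_objects)
        return 0;
    /* Written as a subtraction so the check cannot overflow. */
    if (u->total_size > lim->max_total_size ||
        object_size > lim->max_total_size - u->total_size)
        return 0;
    u->total_size += object_size;
    u->objects++;
    return 1;
}
```

A parser built on this model would stop accepting input as soon as account_object() returns 0 and report the condition to the caller.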

Specifying File Locations

Objects with sizes over certain thresholds will be saved to a file whose name begins with the prefix given as the fifth parameter to mime_init().

#include <stdio.h>
#include "mime.h"

int main(int argc, char *argv[]) {
  mime_state *state = mime_init(NULL, mime_mime, NULL, 1, "data.", stdin);
  mime_msg *messages = mime_parse(NULL, state);
  mime_free_msg_list(messages);
  mime_free(state);
  return 0;
}
  

The above program will save attachments and files from the message input into the program to the current working directory, with names similar to "data.jd5hYt", "data.65hTrd", and "data.JH76ya".
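
Names of this shape are what the POSIX function mkstemp() produces from a template ending in six X characters. Whether iMIME uses mkstemp() internally is an assumption; the sketch below, with the hypothetical helper create_prefixed_file(), only shows how a prefix such as "data." can turn into a unique file name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: create a uniquely named, empty file whose name is
 * prefix followed by six generated characters. On success the name is left
 * in name and an open file descriptor is returned; on failure -1. */
int create_prefixed_file(const char *prefix, char *name, size_t namesize)
{
    int n = snprintf(name, namesize, "%sXXXXXX", prefix);

    if (n < 0 || (size_t) n >= namesize)
        return -1;                      /* template did not fit */
    return mkstemp(name);               /* replaces the X's in place */
}
```

The caller owns the returned descriptor and the file, and would normally close() the descriptor and eventually unlink() the name when the data is no longer needed.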

Strings Instead of Files

If the message to be parsed is already in memory, you can specify that the parse occur with data from memory rather than an open readable binary stream.

Because the string may contain embedded zeros (if it includes MIME attachments with Content-Transfer-Encoding: binary), you must also specify the length of the string.

#include <stdio.h>
#include <string.h>  /* for strlen() */
#include "mime.h"

const char *msg =
 "Content-Type: text/plain\r\n"
 "Content-Disposition: attachment; filename=test.txt\r\n"
 "\r\n"
 "This is a test file.\r\n";

int main(int argc, char *argv[]) {
  mime_state *state = mime_init(NULL, mime_mime, NULL, 0, NULL, msg, strlen(msg));
  mime_msg *messages = mime_parse(NULL, state);
  if (messages != NULL && messages->type == mime_filename) {
    printf("data saved to '%s'\n", messages->body.filename);
    /* don't let mime_free_msg_list() remove the temp */ 
    messages->body.filename[0] = '\0'; 
  }
  mime_free_msg_list(messages);
  mime_free(state);
  return 0;
}
  

The above example should create a file in the temporary directory with the contents "This is a test file.".

Parsing from a string received is useful for CGI applications handling form data passed with GET as the data will be in memory, retrieved by C code similar to getenv("QUERY_STRING").

Checking for Errors

typedef enum { mime_no_error, mime_limits_error } mime_error;
mime_error mime_get_error(const mime_state *state);

mime_parse() reports one type of error that may be useful to report to the submitter of the message: whether or not the message was greater than the set limits (mime_limits_error). If mime_no_error is returned by mime_get_error(), no error related to message size occurred.

For absolute robustness, you will want to test for other errors that can occur.

To test for these errors, set errno to 0 before mime_init(), and test for a non-zero value after both mime_init() and mime_parse().

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include "mime.h"

int main(int argc, char *argv[]) {
        mime_state *state; 
        mime_msg *messages;

        errno = 0;
        state = mime_init(NULL, mime_mime, NULL, 1, NULL, stdin); 
        if (errno) {
                perror(NULL);   /* out of memory */
                exit(EXIT_FAILURE);
        }
        else {  mime_limit *limits = mime_get_limits(state);
                limits->max_total_size = 32768; /* <= 32K of body */ 
                messages = mime_parse(NULL, state); 
                if (errno) {
                        perror(NULL);   /* no memory or io problem */
                        exit(EXIT_FAILURE);
                }
        }
        switch (mime_get_error(state)) {
        case mime_no_error:             break;
        case mime_limits_error: puts("message too large");
                                break;
        default:                puts("unknown error!");
        }
        mime_free_msg_list(messages);
        mime_free(state);
        return 0;
}
  

Security

All data that comes from an untrusted source needs to be carefully evaluated by the application program. Because iMIME was designed to process information coming from outside sources, you should always regard this data as untrusted unless it comes from an authenticated trusted source.

iMIME may occasionally return a NULL pointer in certain data structures when the data could not be converted to UTF-8, an illegal UTF-8 sequence is found, or the message is malformed or prematurely terminated. It will also attempt to gracefully bail out of error situations without leaking memory to reduce (not eliminate) the chance of DoS exploits.

The application using the library will need to perform additional checks should it be accepting files and non-text data to determine if they are valid.

The library tries not to crash on bad input, but it does nothing to reject bad data that does not crash the parser and obeys the limits. Rejecting such data is the responsibility of the library user.

If the library creates files, these files may have access and modification timestamps that were set from information received from the MIME data. They cannot be trusted as accurate. As always, the contents of files received from unknown sources should not be trusted. This includes not just the obvious executable binary, but scripts and less obvious sources (PostScript®) that have executable properties.

The security notes in the RFCs that iMIME supports should be consulted and taken into account when using iMIME.

Usage Examples

iMIME is designed to help software applications with the following needs.

  1. Automatically parsing complicated mail messages for mail daemons, such as mailing list managers. Most automated e-mail responders require a plain text message and are not very tolerant of complicated messages. On the other hand, it's often hard to get modern e-mail programs to send a simple plain text message! This library makes things a little easier for programs designed to take e-mail messages as input and act on them. Mail daemon software can use it to understand commands that are MIME encoded.
  2. Few e-mail clients and existing libraries support all the standards related to I18N and MIME messages. iMIME decodes all of it, and serves as a competitor to existing MIME libraries.
  3. CGI and web server side scripts rarely have sophisticated client submission parsing. This library understands all the new standards track submission standards as well as the legacy standards with browser extensions such as Mozilla's charset parameter with the application/x-www-form-urlencoded type.
  4. People trying to migrate their web server form parsing, especially FORMs that handle non-ASCII and binary file uploads, to the new multipart/form-data style. The legacy format is supported and emulated as multipart/form-data, making migration easy.
  5. Anyone that needs to parse anything non-English in messages, but doesn't want to worry about character sets and just wants to see a uniform character string no matter what was initially spit at the parser.

"I18Ning" Mail Responders

Mailing list software and other automated mail responders proliferate around the internet. For interactive clients, where a special command is sent to an address, the mail daemon returns a specific reply. (ex. sending a "help" or "subscribe" command to a mailing list daemon). There are a few problems with current software that can be solved by modifying mailing list software to use this library to preprocess input.

Using iMIME with CGI

iMIME was designed especially for handling new generation web form submissions. HTML 4 goes into great detail about the multipart/form-data format and its recommended use for non-ASCII submissions, file uploads, and large amounts of data.

Unfortunately, the sheer complexity of the standard, plus the fact that existing libraries handle the multipart/form-data with a completely different API means that web designers had to maintain two different code bases for web submissions.

iMIME can handle the most complicated web form submissions, as well as the older application/x-www-form-urlencoded with the same API, making it easy to migrate from the older system and/or be backwards compatible with older pages and older browsers that can't support the new type.

Data Passed with GET

<FORM action=example.cgi>
  
  1. Always set the first parameter to NULL.
  2. Set the second parameter of mime_init() to mime_urlencoded, saying that no header information is present and that we will be parsing the body immediately.
  3. Set the third parameter to the character encoding that the web page is in. The default character set for web pages is ISO-8859-1, also known as "Latin-1" and incorrectly referred to as "8-bit ASCII" or "Extended ASCII." If this parameter is set to NULL, the library will query the run-time locale to determine the character set/encoding. While standards dictate that the lack of a "charset" for a web page indicates that the page defaults to using ISO-8859-1, in reality this is abused, and web browsers may send non-Latin-1 data if the encoding is not set explicitly by the HTTP server. If a charset parameter has been set in the Content-Type (some newer browsers such as Mozilla can do this), it overrides both this parameter and the locale's character encoding.
  4. Set the fourth parameter to false (zero), indicating that the parser is not to read from a stream.
  5. The fifth parameter should be a pointer to the string after the "?" in the URL. Using C with CGI/1.1, this is usually done with getenv("QUERY_STRING").
  6. The sixth parameter is a size_t unsigned integer type that will be the length of the fifth parameter. If the fifth parameter contains no embedded '\0' characters (which it shouldn't if it's valid), this will be the same as the value returned by strlen(url_encoded_string).

The data parsed with mime_parse() will resemble a MIME multipart message. Ampersands and semicolons will look like MIME multipart boundaries. The data to the left of the equals sign will look like the data to the right of "Content-Disposition; name=".
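To make the mapping concrete, the sketch below shows how a urlencoded string could be cut into controls at '&' and ';' and how the "+" and "%XX" escapes could be decoded. It is only an illustration in Standard C; url_decode() and control_length() are hypothetical names, not iMIME functions, and the library's own parser may work differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(int c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Decode "+" and "%XX" escapes from src[0..n) into dst, which must have
 * room for n bytes.  Returns the decoded length.  The result may contain
 * zero bytes, so it is not NUL-terminated.  Malformed escapes are copied
 * unchanged. */
size_t url_decode(char *dst, const char *src, size_t n) {
  size_t i, out = 0;

  assert(dst != NULL && src != NULL);
  for (i = 0; i < n; i++) {
    if (src[i] == '+') {
      dst[out++] = ' ';
    } else if (src[i] == '%' && i + 2 < n &&
               hex_value((unsigned char) src[i + 1]) >= 0 &&
               hex_value((unsigned char) src[i + 2]) >= 0) {
      dst[out++] = (char) (hex_value((unsigned char) src[i + 1]) * 16 +
                           hex_value((unsigned char) src[i + 2]));
      i += 2;
    } else {
      dst[out++] = src[i];
    }
  }
  return out;
}

/* Length of the control that starts at s: everything up to the next '&'
 * or ';', or up to the end of the n available bytes. */
size_t control_length(const char *s, size_t n) {
  size_t i = 0;

  assert(s != NULL);
  while (i < n && s[i] != '&' && s[i] != ';')
    i++;
  return i;
}
```

Each control found this way can then be split at its first '=' into a name and a value, and both halves decoded separately.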

POSTed Forms

<FORM method=post action=example.cgi>
  

Because the data type is the same, iMIME handles a form submitted with method="post" the same way as a form submitted with GET, except that the application/x-www-form-urlencoded data is read from a stream (usually stdin with CGI) and not from a string.

  1. Set the fourth parameter to true (non-zero), indicating that the next parameter specifies a <stdio.h> style FILE pointer.
  2. Set the fifth parameter to a FILE pointer that has been fopen()ed with a binary reading mode such as "rb". This is not important with glibc based systems because binary files are the same as text files, but on other platforms failing to set to binary will cause linefeed information which is critical to iMIME to be altered. stdin is a text stream, not a binary stream. The library does not fclose() the stream.

Note that if the CGI is working in the obsolete "nph" mode and is receiving the headers, you should use the standard parsing, which is the same as handling a file upload. The Content-Type will cause iMIME to process the data properly.

Handling File Uploads

<FORM method=post enctype="multipart/form-data" action=example.cgi>
  

Handling a file upload is identical to a POSTed form or converting mail from the perspective of the application.

  1. Set the second parameter of mime_init() to mime_mime, saying that header information is present.

In the case of multipart/form-data POSTs, iMIME behaves slightly differently than a regular MIME mail.

If the browser supports uploading multiple files within one control as per HTML 4, the multiple files will be in a multipart/mixed object.

non-ASCII Legacy FORMs

For new forms, you should always use multipart/form-data instead of application/x-www-form-urlencoded, because MIME has mechanisms built in for specifying the character encoding for every control, where the older method must either hardcode/coordinate the encoding between the web page and the script, or rely on a kludge where the encoding is passed in as a hidden control.

In the past, the new form was frowned upon due to lack of browser support, but now all modern browsers support new form submissions.

iMIME can use the following methods for determining the character encoding for older forms:

Encoding Info from Agent

iMIME supports User-Agents (web browsers) such as Mozilla that can specify the encoding in the charset parameter in the Content-Type for old forms.

Content-Type: application/x-www-form-urlencoded; charset=iso-8859-2

name=noone&submit=ok
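As an illustration of what reading the charset parameter from a header like the one above involves, here is a minimal sketch in Standard C. content_type_charset() is a hypothetical helper, not an iMIME function; it takes only the header value (the text after "Content-Type:") and does not handle quoted parameter values.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Compare the first n bytes of a and b, ignoring case as defined by
 * tolower() in the current locale (ASCII in the default "C" locale). */
static int ascii_ncasecmp(const char *a, const char *b, size_t n) {
  size_t i;

  for (i = 0; i < n; i++) {
    int ca = tolower((unsigned char) a[i]);
    int cb = tolower((unsigned char) b[i]);

    if (ca != cb) return ca - cb;
    if (ca == '\0') break;
  }
  return 0;
}

/* Copy the value of the charset parameter into out (truncated to fit
 * outsize bytes including the terminator).  Returns 1 if the parameter
 * was found, 0 otherwise. */
int content_type_charset(const char *value, char *out, size_t outsize) {
  const char *p = value;

  assert(value != NULL && out != NULL);
  if (outsize == 0)
    return 0;
  while ((p = strchr(p, ';')) != NULL) {
    size_t n = 0;

    p++;                                  /* skip the ';' */
    while (*p == ' ' || *p == '\t')
      p++;
    if (ascii_ncasecmp(p, "charset=", 8) != 0)
      continue;                           /* some other parameter */
    p += 8;
    while (p[n] != '\0' && p[n] != ';' && p[n] != ' ' && p[n] != '\t' &&
           n + 1 < outsize) {
      out[n] = p[n];
      n++;
    }
    out[n] = '\0';
    return 1;
  }
  return 0;
}
```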
  

This method has the following advantages:

This method has the following disadvantages:

Submit & Page Encoding

When the accept-charset attribute is set to UNKNOWN (the default value), a form is submitted in the same encoding that the web page containing the form is in.

If the HTML 4 accept-charset attribute is set, the character set used should be one from the specified list. Few browsers currently support this very new feature, but as it is standard, support is expected to increase.

<FORM action=example.cgi accept-charset="utf-8, euc-kr" method=post>
  

If accept-charset=UNKNOWN (the default) and the page is not in ISO-8859-1 (Latin-1, the Western European character set), then the character encoding of the page must be set by the web server. Web pages that assume the locale default (such as Japanese pages with no charset parameter) are wrong according to standards and are not guaranteed to work correctly. This also includes the updated version of Latin-1, ISO-8859-15, which includes the Euro currency symbol ("€") and some French characters missing from Latin-1. There are three ways to tell the User-Agent (web browser) what character encoding the page is in, and thus in what character set it should submit the form when accept-charset=UNKNOWN.

  1. The way that gives the international web contents/translation team control is to have them emulate an HTTP header with a meta tag.
    <META http-equiv=content-type content="text/html; charset=big5">
        

    This has the following advantages:

    The <META> tag solution has the following disadvantages:

  2. The canonical way to tell browsers what character encoding the page is in is to have the web server itself return the information in the Content-Type within the HTTP response. Apache allows the charset parameter to be modified in many different ways.
  3. XHTML, which is XML, can use the XML mechanisms for specifying the encoding.

Form Encoding Info

iMIME has a special variable that may be passed in form controls called "charset-enc". When a form control of this name appears, the value is used as the character encoding for all subsequent controls. Thus you must make sure that this control appears before all text controls.

<FORM method=post action=example.cgi>
 <INPUT type=hidden name=charset-enc value=iso-8859-2>
 <!-- the above must be the first control -->
 <INPUT name=full-name>
</FORM>
  

The advantages of this method are:

The disadvantages of this method are:

It is possible to have more than one charset-enc in a form. The subsequent variables will override and replace the previous set value, but will not cause reconversion of the previous text controls.

Doing this doesn't make much sense with legacy forms, though, as only one character encoding is used for all the form controls. In general, you don't want to do this, but one special case comes to mind: you want some characters not to be converted when using a 7-bit stateful encoding. In this case, the normal character set would be something similar to ISO-2022-JP, ISO-2022-KR, or ISO-2022-CN, and a few controls would be set to decode as US-ASCII so the escape/shift sequences will be ignored.

Script Encoding Info

The simple method is to leave the encoding to the web backend script team, and tell the international web contents and translation team what character set the pages with forms must be in.

{
  mime_state *state;

  state = mime_init(NULL, mime_urlencoded, "koi8-r", 1, NULL, stdin);
}
  

The above sets the parser to expect the form data to be in a popular encoding for Russian Cyrillic. The page submitting the form must also be in KOI8-R.

The advantages of this method are:

The disadvantages of this method are:

Multiple Encoding Info

All three techniques for determining the character encoding can be used at once. When two or more sources are available for determining the character encoding, precedence is set as follows, with the first listed method having the highest precedence.

  1. character encoding returned by the browser in the charset parameter
  2. character encoding returned within a form control via the charset-enc variable
  3. character encoding set by the backend application explicitly or implicitly via the mime_init() function

It's a good idea not to rely on the character encoding being set by the browser alone as most older browsers do not send this information.
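The precedence rule amounts to taking the first source that actually supplied a value. A minimal sketch, assuming each source is represented as a string that is NULL or empty when unavailable; first_nonempty() is a hypothetical helper for illustration and not part of iMIME.

```c
#include <assert.h>
#include <stddef.h>

/* Return the first candidate that is neither NULL nor empty, or NULL if
 * there is none.  Candidates are listed from highest to lowest
 * precedence: browser charset parameter, charset-enc control, and the
 * encoding given by the application. */
const char *first_nonempty(const char *const *candidates, size_t n) {
  size_t i;

  assert(candidates != NULL || n == 0);
  for (i = 0; i < n; i++)
    if (candidates[i] != NULL && candidates[i][0] != '\0')
      return candidates[i];
  return NULL;
}
```

A NULL result corresponds to the case where no source names an encoding and the run-time locale would have to be consulted.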

Extra Features

iMIME contains some extra routines not directly related to parsing or querying messages, but to help with the debugging and conversion of UTF-8, HTML ampersand escapes, and wide strings.

Debugging Strings

mime_fputs(const char *s, FILE *stream);
mime_fputws(const wchar_t *s, FILE *stream);

Most environments still do not have UTF-8 consoles that can be used to view raw UTF-8 data. Also, the sheer number of characters in Unicode and the complexity of Unicode mean that you often want to see the raw hex codes for each character instead of the actual representation. iMIME provides two helper functions that can display UTF-8 encoded and wide strings on an ASCII terminal.

mime_fputs(MIME_UTF_BOM "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E\x0A", stdout);
  

produces the string:

L"\u65E5\u672C\u8A9E\n"

which is 日本語 with a newline at the end. (The string means "Japanese language")
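For readers curious how such escaping can be done, the following is a rough sketch in Standard C of turning UTF-8 into printable ASCII with \uXXXX escapes. utf8_escape() is a hypothetical function, not the implementation of mime_fputs(); it does not strip a leading BOM, does not add the surrounding L"..." quoting, and does not reject overlong forms or surrogates.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Decode one UTF-8 sequence starting at s into *cp.  Returns the number
 * of bytes consumed, or 0 if the sequence is malformed or truncated. */
static size_t utf8_decode(const unsigned char *s, unsigned long *cp) {
  size_t len, i;

  if (s[0] < 0x80) { *cp = s[0]; return 1; }
  else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; len = 2; }
  else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; len = 3; }
  else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; len = 4; }
  else return 0;
  for (i = 1; i < len; i++) {
    if ((s[i] & 0xC0) != 0x80)    /* also stops at the '\0' terminator */
      return 0;
    *cp = (*cp << 6) | (s[i] & 0x3F);
  }
  return len;
}

/* Write the UTF-8 string s to out as printable ASCII.  Returns 0 on
 * success, or -1 if s is malformed or out is too small. */
int utf8_escape(const char *s, char *out, size_t outsize) {
  const unsigned char *p = (const unsigned char *) s;
  size_t used = 0;

  assert(s != NULL && out != NULL);
  if (outsize == 0)
    return -1;
  out[0] = '\0';
  while (*p != '\0') {
    unsigned long cp;
    size_t n = utf8_decode(p, &cp);
    int w;

    /* the longest piece written is "\UXXXXXXXX" plus the terminator */
    if (n == 0 || outsize - used < 11)
      return -1;
    if (cp == '\n')
      w = sprintf(out + used, "\\n");
    else if (cp >= 0x20 && cp < 0x7F && cp != '"' && cp != '\\')
      w = sprintf(out + used, "%c", (int) cp);
    else if (cp <= 0xFFFF)
      w = sprintf(out + used, "\\u%04lX", cp);
    else
      w = sprintf(out + used, "\\U%08lX", cp);
    used += (size_t) w;
    p += n;
  }
  return 0;
}
```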

Using Wide Characters

wchar_t *mime_utf_conv(const char *s);

mbstowcs() is provided by Standard C to convert from multibyte strings to wide strings, but suffers from some problems that make it inadequate for use with this library.

#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include "mime.h"

int main(int argc, char *argv[]) {
        wchar_t *s;
        
        setlocale(LC_CTYPE, "ja_JP.eucJP");
        
        /* convert UTF-8 char * to wide even though LC_CTYPE is set
         * so that char * should be EUC-JP. 
         */      
        s = mime_utf_conv(MIME_UTF_BOM "\xe6\xbc\xa2\xe5\xad\x97\x0a");
        if (s != NULL) {
                fputws(s, stdout);      /* output EUC-JP */
                
                /* stdout has been changed to wide orientation so output the
                 * non-wide string to stderr. 
                 */
                mime_fputws(s, stderr); /* output escaped ASCII */
                fputc('\n', stderr);
                free(s);
        }
        else exit(EXIT_FAILURE);
        return 0;
}
  

mime_utf_conv() will convert from UTF-8 to a wchar_t string where each character contains an ISO-10646 (Unicode) code.

漢字
/*0x8055180*/ L"\u6F22\u5B57\n" /*3*/
  

If you are using a terminal which can display Japanese, you should see the word "kanji" ("Chinese Character") in Japanese style ideographs followed by an escaped ASCII version. This is prefixed by the pointer location on your system and suffixed by the character (not the byte) count if debugging is enabled.

Parsing NCR and HTML Entities

void mime_init_html(mime_html_state *state, int sgml_safe);
char *mime_decode_html(const char *s, mime_html_state *state);

iMIME post-processes all strings converted to UTF-8 through an HTML entity processor and numeric character reference processor. It understands all general entities in HTML 4.01 (including &euro;), and understands decimal as well as hexadecimal character references.

mime_decode_html() is called multiple times as characters become available and are processed by a state machine.

#include <stdio.h>
#include <stdlib.h>
#include "mime.h"

int main(int argc, char *argv[]) {
	int i;

	for (i = 1; i < argc; i++) {
		mime_html_state state;
		char *s;

		mime_init_html(&state, 0);
		s = mime_decode_html(argv[i], &state);
		mime_fputs(s, stdout); puts(""); free(s);
		s = mime_decode_html(NULL, &state);
		mime_fputs(s, stdout); puts(""); free(s);
	}
	return 0;
}
  

You must initialize the HTML conversion state machine with mime_init_html() before calling mime_decode_html().

Hello&#x20;World&#33; &hearts;
  

as the first argument produces the following output:

"Hello World! \u2665"
""

... which is "Hello World! ♥". Note that although mime_decode_html() outputs UTF-8, it does not prefix the string with MIME_UTF_BOM because the magic is added for converted encodings only.
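The numeric character reference half of this processing can be sketched in a few lines of Standard C. decode_ncr() below is a hypothetical, non-streaming illustration and not the code behind mime_decode_html(): it handles only "&#NNN;" and "&#xHHH;", leaves named entities such as &hearts; untouched, and does not reject surrogate code points.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encode cp as UTF-8 into out (room for 4 bytes).  Returns the number of
 * bytes written, or 0 if cp is beyond U+10FFFF. */
static size_t utf8_encode(unsigned long cp, char *out) {
  if (cp < 0x80) {
    out[0] = (char) cp;
    return 1;
  } else if (cp < 0x800) {
    out[0] = (char) (0xC0 | (cp >> 6));
    out[1] = (char) (0x80 | (cp & 0x3F));
    return 2;
  } else if (cp < 0x10000) {
    out[0] = (char) (0xE0 | (cp >> 12));
    out[1] = (char) (0x80 | ((cp >> 6) & 0x3F));
    out[2] = (char) (0x80 | (cp & 0x3F));
    return 3;
  } else if (cp < 0x110000) {
    out[0] = (char) (0xF0 | (cp >> 18));
    out[1] = (char) (0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char) (0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char) (0x80 | (cp & 0x3F));
    return 4;
  }
  return 0;
}

/* Value of digit c in the given base (10 or 16), or -1. */
static int digit_value(int c, int base) {
  int v;

  if (c >= '0' && c <= '9') v = c - '0';
  else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
  else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
  else return -1;
  return v < base ? v : -1;
}

/* Copy src to dst, replacing numeric character references with UTF-8.
 * Malformed references and "&#0;" are copied unchanged.  A reference is
 * never shorter than its UTF-8 form, so dst needs strlen(src) + 1 bytes. */
void decode_ncr(const char *src, char *dst) {
  assert(src != NULL && dst != NULL);
  while (*src != '\0') {
    if (src[0] == '&' && src[1] == '#') {
      int base = (src[2] == 'x' || src[2] == 'X') ? 16 : 10;
      const char *digits = src + (base == 16 ? 3 : 2);
      const char *p = digits;
      unsigned long cp = 0;
      size_t n;
      int d;

      while ((d = digit_value((unsigned char) *p, base)) >= 0) {
        if (cp < 0x110000)            /* stop growing once out of range */
          cp = cp * (unsigned long) base + (unsigned long) d;
        p++;
      }
      if (p != digits && *p == ';' && cp != 0 &&
          (n = utf8_encode(cp, dst)) > 0) {
        dst += n;
        src = p + 1;
        continue;
      }
    }
    *dst++ = *src++;
  }
  *dst = '\0';
}
```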

When the library is compiled with debugging enabled through macros, the following additional general entities will be available.

&rscid;
expands to the RCS/CVS id for the source file mime.c
&package;
usually expands to "imime"
&version;
expands to the version of this library (ex. "1.0.0")

Some people who are conservative about security believe it is not wise to let outsiders know what version of software you're running, lest script kiddies use known exploits for older versions. Following that thinking, live production code with the debugging code removed disables this entity.

Parsing Time Strings

time_t mime_parse_time(const char *date_time);

iMIME understands both asctime() style time strings ("Sun Nov 6 08:49:37 1994") and RFC 850 style time strings. It also understands most of the U.S. timezone, military zone (A to Z), UTC/GMT zone, and some far east timezone (Japan and South Korea) labels.

#include <stdio.h>
#include <time.h>
#include "mime.h"

int main(int argc, char *argv[]) {
  int i;

  for (i = 1; i < argc; i++) {
    time_t tod = mime_parse_time(argv[i]);
    fputs(asctime(gmtime(&tod)), stdout);
    fputs(asctime(localtime(&tod)), stdout);
  }
  return 0;
}
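To illustrate what parsing an asctime() style string involves, here is a minimal sketch in Standard C that converts such a string to a time_t, treating the time as UTC. parse_asctime_utc() is a hypothetical function, not the implementation of mime_parse_time(); it assumes time_t counts seconds since 1970-01-01 (as on POSIX systems), ignores timezone labels, and does not check that the weekday name matches the date.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Days between 1970-01-01 and the given Gregorian date. */
static long days_from_civil(long y, int m, int d) {
  long era, yoe, doy, doe;

  y -= (m <= 2);                        /* treat the year as starting in March */
  era = (y >= 0 ? y : y - 399) / 400;
  yoe = y - era * 400;                  /* year within the 400-year era */
  doy = (153L * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
  doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
  return era * 146097 + doe - 719468;
}

/* Parse "Sun Nov  6 08:49:37 1994" as UTC.  Returns (time_t) -1 if the
 * string does not match the expected layout. */
time_t parse_asctime_utc(const char *s) {
  static const char *const months[12] = {
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
  };
  char wday[4], mon[4];
  int mday, hh, mm, ss, year, m;

  assert(s != NULL);
  if (sscanf(s, "%3s %3s %d %d:%d:%d %d",
             wday, mon, &mday, &hh, &mm, &ss, &year) != 7)
    return (time_t) -1;
  for (m = 0; m < 12 && strcmp(mon, months[m]) != 0; m++)
    ;
  if (m == 12 || mday < 1 || mday > 31 || hh < 0 || hh > 23 ||
      mm < 0 || mm > 59 || ss < 0 || ss > 60)
    return (time_t) -1;
  return (time_t) days_from_civil(year, m + 1, mday) * 86400 +
         hh * 3600L + mm * 60 + ss;
}
```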
  

MIME Transfer Encodings

typedef enum { mime_binary, mime_8bit, mime_7bit, mime_base64, mime_quoted_printable } mime_transfer;
void mime_init_enc(mime_enc_state *decoder, mime_transfer type);
size_t mime_qp_decode(char *dst, const char *src, mime_enc_state *state);
size_t mime_64_decode(char *dst, const char *src, mime_enc_state *state);

The last thing the world needs is another Quoted-Printable and Base64 decoder, but in case you do need it, iMIME's internal routines are exported for application use.

The parsers use a state object so you don't have to buffer the lines yourself. Lines can be arbitrarily long. To initialize the state machine, pass a pointer to an initialized or uninitialized structure to mime_init_enc().

Characters after the '=' padding are permitted in Base64 mode.

#include <stdio.h>
#include "mime.h"
   
int main(int argc, char *argv[]) {
  char s[] = "SGVsbG8gV29ybGQh"; /* "Hello World!" */
  mime_enc_state state;
  size_t n;

  mime_init_enc(&state, mime_base64);
  n = mime_64_decode(s, s, &state);
  s[n] = '\0'; 
  puts(s);
  return 0;
}
  

The example above should print "Hello World!" on a line.
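For comparison, a self-contained Base64 decoder without a state object can be sketched as follows. b64_decode() is a hypothetical function written for illustration, not iMIME's mime_64_decode(): it works on one complete NUL-terminated string, stops at the first '=' or at the end of the input, and skips characters outside the Base64 alphabet such as line breaks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value (0..63) of a Base64 alphabet character, or -1 for anything else. */
static int b64_value(int c) {
  static const char alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  const char *p = (c != '\0') ? strchr(alphabet, c) : NULL;

  return p != NULL ? (int) (p - alphabet) : -1;
}

/* Decode src into dst and return the number of bytes written.  The output
 * is never longer than the input consumed so far, so dst may be the same
 * buffer as src.  The result is not NUL-terminated. */
size_t b64_decode(char *dst, const char *src) {
  unsigned long acc = 0;      /* only the low 'bits' bits are meaningful */
  int bits = 0;
  size_t out = 0;

  assert(dst != NULL && src != NULL);
  for (; *src != '\0' && *src != '='; src++) {
    int v = b64_value((unsigned char) *src);

    if (v < 0)
      continue;               /* skip line breaks and other noise */
    acc = (acc << 6) | (unsigned long) v;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      dst[out++] = (char) ((acc >> bits) & 0xFF);
    }
  }
  return out;
}
```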

Content Negotiation

char *mime_negotiate_content(mime_header *msg, const char *available);

Content negotiation is a standard feature of HTTP/1.1 and is supported by most modern servers such as Apache. However, applications other than web servers may wish to conveniently figure out which language is most appropriate to return, given a set of available languages of varying quality and the user's desired languages.

Suppose we received a mail or HTTP request with the following header:

X-Accept-Language: fr-CA, fr; q=0.999, en; q=0.8, de; q=0.500, *; q=0
  

This could mean: "I am from Montreal and Canadian French is my native language. Given a choice, I prefer this dialect, but "standard" French is almost just as acceptable. I am bilingual though and understand English as well. I studied German and will take that if you have it, but only if you can't deliver in my two comfortable languages. I don't know any other languages, and don't even think about sending these to me."

{
  const char *have = "Content-Language: en, de; q=0.9, ja; q=5, fr; q=0.1";
  char *negotiated = mime_negotiate_content((mime_msg *) msg, have);
  if (negotiated == NULL)
    puts("NULL");
  else {  puts(negotiated);
          free(negotiated);
  }
}
  

This example could be interpreted as follows: "The web page was written in English, but one of our web team members is German, knows our products and is familiar with our marketing pitch, and we trust his translations. Every once in a while we send the English page to a Japanese translation firm, and this page is not up-to-date, and the translator is not a technical person and not familiar with computer vocabulary. We received a French translation of our page once from a college student studying French translation."

   fr
  

The above code will return "fr", even though it is the least well translated page. This is because the default macroed negotiation algorithm always gives preference to what the client wants to accept, and only uses the quality of the content when the client prefers two or more available resources equally.
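The rule just described (client preference first, content quality only as a tie-breaker) can be sketched in Standard C as follows. q_for_tag() and negotiate() are hypothetical functions for illustration, not the code behind mime_negotiate_content(). The sketch compares tags case-sensitively, ignores the "*" wildcard and prefix matching, and takes the available languages as two parallel arrays rather than a header string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Look up tag in a list such as "fr-CA, fr; q=0.999, en; q=0.8".
 * Returns its q value (1.0 if none is given), or -1.0 if not listed. */
double q_for_tag(const char *list, const char *tag) {
  size_t taglen = strlen(tag);
  const char *p = list;

  while (*p != '\0') {
    const char *end = strchr(p, ',');
    size_t len = end != NULL ? (size_t) (end - p) : strlen(p);

    while (len > 0 && (*p == ' ' || *p == '\t')) {
      p++;
      len--;
    }
    if (len >= taglen && strncmp(p, tag, taglen) == 0 &&
        (len == taglen || p[taglen] == ';' || p[taglen] == ' ')) {
      const char *s = p + taglen;

      while (s < p + len && (*s == ' ' || *s == ';' || *s == '\t'))
        s++;
      if (s + 2 <= p + len && s[0] == 'q' && s[1] == '=')
        return strtod(s + 2, NULL);
      return 1.0;
    }
    if (end == NULL)
      break;
    p = end + 1;
  }
  return -1.0;
}

/* Choose among n available tags.  The client's q value decides; the
 * content quality is consulted only when two client q values are equal.
 * Returns NULL when the client accepts none of the tags. */
const char *negotiate(const char *want, const char *const tags[],
                      const double quality[], size_t n) {
  const char *best = NULL;
  double best_want = 0.0, best_have = 0.0;
  size_t i;

  assert(want != NULL && (n == 0 || (tags != NULL && quality != NULL)));
  for (i = 0; i < n; i++) {
    double w = q_for_tag(want, tags[i]);

    if (w <= 0.0)
      continue;                         /* unlisted or explicitly refused */
    if (best == NULL || w > best_want ||
        (w == best_want && quality[i] > best_have)) {
      best = tags[i];
      best_want = w;
      best_have = quality[i];
    }
  }
  return best;
}
```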

Markup Language Handling

As iMIME will allow an application to easily receive and decode files, chances are that an interactive web application will want to immediately use that data, whether it be an image (ex. face shot of the user submitted through a registration form), or the pronunciation of one's name.

Handling image, sound, and video is complicated as there are numerous standard formats. Many libraries exist to handle these files and iMIME does not bother to duplicate the functionality.

Modern GUI mail programs often default to outputting MIME formatted text/html unless modified to output text/plain. To deal with this common format, a convenience conversion filter is provided.

mime_markup_state *mime_init_markup(const char *alt);
void mime_free_markup(mime_markup_state *decoder);
char *mime_transform_markup(const char *sgml, mime_markup_state *state);

Data received from mail and web clients nowadays is often in some SGML application, such as DocBook, HTML, or XML, and automated processing has to cope with it. All of these formats generally mark up normal text with tags such as <SAMPLE>. iMIME converts marked up text to plain text via the following methods:

With most markup languages, the removal of tags will cause the loss of too much information to be useful to humans, but the information could be useful for a text search engine or some other processor that needs only the unformatted raw text stream.

#include <stdlib.h>
#include <stdio.h>
#include "mime.h"

int main(int argc, char *argv[]) {
        int i, result = EXIT_SUCCESS;
        mime_markup_state *state; 

        if ((state = mime_init_markup("alt, longdesc")) == NULL)
                exit(EXIT_FAILURE);
        for (i = 1; i <= argc; i++) {
                const char *sgml = i == argc ? NULL : argv[i];
                char *s = mime_transform_markup(sgml, state); 

                if (s != NULL) { 
                        printf("%s%c", s, i == argc ? '\n' : ' '); 
                        free(s);
                }
                else {  result = EXIT_FAILURE;
                        break;
                }
        }
        mime_free_markup(state);
        return result; 
}
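The simplest form of tag removal mentioned above can be sketched in Standard C as follows. strip_tags() is a hypothetical function for illustration only and is far cruder than mime_transform_markup(): it knows nothing about comments, CDATA sections, entities, attribute text such as alt, or a '>' inside a quoted attribute value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy sgml to out, dropping everything from a '<' up to and including
 * the next '>'.  out must have room for strlen(sgml) + 1 bytes. */
void strip_tags(const char *sgml, char *out) {
  int in_tag = 0;

  assert(sgml != NULL && out != NULL);
  for (; *sgml != '\0'; sgml++) {
    if (*sgml == '<')
      in_tag = 1;
    else if (*sgml == '>' && in_tag)
      in_tag = 0;
    else if (!in_tag)
      *out++ = *sgml;
  }
  *out = '\0';
}
```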
  

Interfaces other than C

The library is written in Standard C and works best when called by C or C++ routines, but wrapper APIs have been written for popular scripting languages, especially scripting languages used for server side dynamic web content parsing, as the library is designed to work well with HTTP client data.

C++

[ Under Construction ]

Java

[ Under Construction ]

PHP

[ Under Construction ]

Perl

[ Under Construction ]

CGI.pm

[ Under Construction ]

cgi-lib

[ Under Construction ]

Python

[ Under Construction ]

TCL/Tk

[ Under Construction ]

Ruby

[ Under Construction ]

Customizing the Source

As this library is open source, you can modify it in accordance with the license. The following areas have been set up to allow for trivial modification.

Macros

The C source code has plenty of macros which are designed to allow the developer to customize the operation of the library. Define and set these in your Makefile to override default behavior.

Note that changing these macros will probably cause the library to misbehave and not function properly as-is; additional hacking on the code will be necessary.

DMALLOC
When true, attempts to replace the common memory and pointer allocation related functions with debug-use versions. You need Dmalloc to be installed on your system.
MIME_REJECT_IDENT_ICONV
If true, will cause the iconv_open() to not be called and an error returned when you try to convert from UTF-8 to UTF-8 (which does nothing except error check the stream).
MIME_PARSE_KEYWORDS
If true, the header Keywords will be parsed as a structured comma separated list, instead of freeform text and #phrases which is what RFC 822 says Keywords consists of.
MIME_UNPARSED_RCVD
If true, the Received header will not be preprocessed into a comma, semicolon structured form.
NDEBUG and DEBUG
If NDEBUG is defined, the pointer address and character count will not be output for each mime_fputs(). Alternatively, you can enable debugging code by setting the macro DEBUG to true (non-zero).
MIME_BACKSLASH_QUOTE
If true, double-quote marks printed by mime_fputs() and mime_fputws() will be printed as \" instead of \u0022.
MIME_USE_UFFFF
If true, illegal UTF-8 bytes found during conversion will be converted to the Unicode character U+FFFF (Not a Character) instead of being ignored and discarded.
MIME_HTML_2BYTE_UTF_U0000
If true, U+0000 will be encoded in UTF-8 as two bytes instead of the standard one byte so Standard C <string.h> routines will differentiate between the '\0' string terminator and the same character in Unicode that is actually part of the string and not a terminator.
MIME_CASE_INSENSITIVE_NAME_QUERY
If true, queries for messages by the "name" given in the Content-Disposition will be case insensitive for ASCII values. Non-ASCII (Latin-1 and other Unicode ranges) in the name will not be converted.
MIME_USE_GLOBAL_LOCALE
If true, C functions that rely on LC_CTYPE will be used instead of thread-safe substitutes that are hard-coded to the "C" locale. Since message data from the internet does not vary in format, there is no reason to use locale sensitive parsing.
MIME_PREFER_HAVE_OVER_WANT
If true, the negotiation priority will be reversed in that the contents quality will be given the priority over the acceptable quality.

The following macros are available to applications that #include "mime.h".

MIME_UTF
A string containing the encoding that strings are converted into. Must be supported by the system's iconv_open(). The wide character converter and debug routines only understand UTF-8.
MIME_UTF_BOM
The prefix sequence prepended to converted strings. Usually U+FEFF converted to the MIME_UTF encoding. If you don't want a prefix, define this to be "".
MIME_UTF_DELIM
The string that separates multiple strings. Usually set to "\xFF", which never appears in UTF-8.
MIME_WIDE_DELIM
The same as MIME_UTF_DELIM, except it is used by mime_utf_conv() and it is only one character and not a string. Usually set to U+FFFC, which is a Unicode Object Replacement character.
MIME_TMP_BASE
A path separator followed by a filename prefix. The file will usually be put in /tmp or /var/tmp depending on the system. The filename will be suffixed with a random string of up to six filename-safe characters.
MIME_CHARSET_ENC
A HTML/SGML token designed to be the "name" attribute in a <INPUT name="charset-enc" type="hidden" value="encoding"> control. All controls after this are converted to MIME_UTF from encoding. Used to let the form itself set the encoding.

String Buffer Increment

Strings of type mime_string are dynamically preallocated with a set amount of characters. When all the preallocated space is used up, a realloc() is called with a size either twice the previous (if the inc field is zero), or a size that is increased by the value of the field inc.

Allocate too little space, and performance degrades due to continuous unnecessary realloc() calls. Allocate too much space, and the library consumes more free store than it needs to.

The const size_t global variables at the top of mime.c may be tuned for a specific application.
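The growth policy described above can be sketched as follows. The growbuf type and growbuf_append() are hypothetical and are not the definition of mime_string; only the meaning of the inc field (zero means double, otherwise grow by that many bytes) is taken from the text, and the initial size of 16 bytes is an arbitrary assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer used to illustrate the policy. */
typedef struct {
  char *data;   /* allocated storage, or NULL when cap is 0 */
  size_t len;   /* bytes in use */
  size_t cap;   /* bytes allocated */
  size_t inc;   /* growth step in bytes; 0 means double the capacity */
} growbuf;

/* Append n bytes from s.  Returns 0 on success, or -1 if memory could
 * not be obtained, in which case the buffer is left unchanged. */
int growbuf_append(growbuf *b, const char *s, size_t n) {
  size_t cap;

  assert(b != NULL && (s != NULL || n == 0));
  cap = b->cap;
  while (cap - b->len < n) {
    size_t step = b->inc != 0 ? b->inc : (cap != 0 ? cap : 16);

    if (cap > (size_t) -1 - step)
      return -1;                        /* size would overflow */
    cap += step;
  }
  if (cap != b->cap) {
    char *p = realloc(b->data, cap);

    if (p == NULL)
      return -1;
    b->data = p;
    b->cap = cap;
  }
  if (n > 0)
    memcpy(b->data + b->len, s, n);
  b->len += n;
  return 0;
}
```

With inc set to zero the number of realloc() calls grows only logarithmically with the final size, at the cost of up to half the buffer being unused; a fixed inc wastes less memory but reallocates more often for long strings.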

Source Code Keywords

Within the source code, the following keywords are embedded in the comments to point out parts that you may wish to consider fixing or modifying.

XXX
The section of code may or may not need modification depending on the opinion of the user of the code.
FIXME
The section of the code is known to not do something in an ideal way, but it works well enough and/or the fix is non-trivial enough that nobody has bothered to fix it.
TODO
Functionality that has not yet been implemented.

Happy Hacking!


[  ] Additional documentation and support for iMIME can be obtained from Red Hat, Inc.

$Id: en.html,v 1.38 2001/04/16 04:06:12 havill Exp havill $