utf16.h File Reference

C API: 16-bit Unicode handling macros.

#include "unicode/utf.h"

Defines

#define U16_IS_SINGLE(c)   !U_IS_SURROGATE(c)
 Does this code unit alone encode a code point (BMP, not a surrogate)?

#define U16_IS_LEAD(c)   (((c)&0xfffffc00)==0xd800)
 Is this code unit a lead surrogate (U+d800..U+dbff)?

#define U16_IS_TRAIL(c)   (((c)&0xfffffc00)==0xdc00)
 Is this code unit a trail surrogate (U+dc00..U+dfff)?

#define U16_IS_SURROGATE(c)   U_IS_SURROGATE(c)
 Is this code unit a surrogate (U+d800..U+dfff)?

#define U16_IS_SURROGATE_LEAD(c)   (((c)&0x400)==0)
 Assuming c is a surrogate code point (U16_IS_SURROGATE(c)), is it a lead surrogate?

#define U16_SURROGATE_OFFSET   ((0xd800<<10UL)+0xdc00-0x10000)
 Helper constant for U16_GET_SUPPLEMENTARY.

#define U16_GET_SUPPLEMENTARY(lead, trail)   (((UChar32)(lead)<<10UL)+(UChar32)(trail)-U16_SURROGATE_OFFSET)
 Get a supplementary code point value (U+10000..U+10ffff) from its lead and trail surrogates.

#define U16_LEAD(supplementary)   (UChar)(((supplementary)>>10)+0xd7c0)
 Get the lead surrogate (0xd800..0xdbff) for a supplementary code point (0x10000..0x10ffff).

#define U16_TRAIL(supplementary)   (UChar)(((supplementary)&0x3ff)|0xdc00)
 Get the trail surrogate (0xdc00..0xdfff) for a supplementary code point (0x10000..0x10ffff).

#define U16_LENGTH(c)   ((uint32_t)(c)<=0xffff ? 1 : 2)
 How many 16-bit code units are used to encode this Unicode code point (1 or 2)? The result is not defined if c is not a Unicode code point (U+0000..U+10ffff).

#define U16_MAX_LENGTH   2
 The maximum number of 16-bit code units per Unicode code point (U+0000..U+10ffff).

#define U16_GET_UNSAFE(s, i, c)
 Get a code point from a string at a random-access offset, without changing the offset.

#define U16_GET(s, start, i, length, c)
 Get a code point from a string at a random-access offset, without changing the offset.

#define U16_NEXT_UNSAFE(s, i, c)
 Get a code point from a string at a code point boundary offset, and advance the offset to the next code point boundary.

#define U16_NEXT(s, i, length, c)
 Get a code point from a string at a code point boundary offset, and advance the offset to the next code point boundary.

#define U16_APPEND_UNSAFE(s, i, c)
 Append a code point to a string, overwriting 1 or 2 code units.

#define U16_APPEND(s, i, capacity, c, isError)
 Append a code point to a string, overwriting 1 or 2 code units.

#define U16_FWD_1_UNSAFE(s, i)
 Advance the string offset from one code point boundary to the next.

#define U16_FWD_1(s, i, length)
 Advance the string offset from one code point boundary to the next.

#define U16_FWD_N_UNSAFE(s, i, n)
 Advance the string offset from one code point boundary to the n-th next one, i.e., move forward by n code points.

#define U16_FWD_N(s, i, length, n)
 Advance the string offset from one code point boundary to the n-th next one, i.e., move forward by n code points.

#define U16_SET_CP_START_UNSAFE(s, i)
 Adjust a random-access offset to a code point boundary at the start of a code point.

#define U16_SET_CP_START(s, start, i)
 Adjust a random-access offset to a code point boundary at the start of a code point.

#define U16_PREV_UNSAFE(s, i, c)
 Move the string offset from one code point boundary to the previous one and get the code point between them.

#define U16_PREV(s, start, i, c)
 Move the string offset from one code point boundary to the previous one and get the code point between them.

#define U16_BACK_1_UNSAFE(s, i)
 Move the string offset from one code point boundary to the previous one.

#define U16_BACK_1(s, start, i)
 Move the string offset from one code point boundary to the previous one.

#define U16_BACK_N_UNSAFE(s, i, n)
 Move the string offset from one code point boundary to the n-th one before it, i.e., move backward by n code points.

#define U16_BACK_N(s, start, i, n)
 Move the string offset from one code point boundary to the n-th one before it, i.e., move backward by n code points.

#define U16_SET_CP_LIMIT_UNSAFE(s, i)
 Adjust a random-access offset to a code point boundary after a code point.

#define U16_SET_CP_LIMIT(s, start, i, length)
 Adjust a random-access offset to a code point boundary after a code point.


Detailed Description

C API: 16-bit Unicode handling macros.

This file defines macros to deal with 16-bit Unicode (UTF-16) code units and strings. utf16.h is included by utf.h after unicode/umachine.h and some common definitions.

For more information see utf.h and the ICU User Guide Strings chapter (http://icu.sourceforge.net/userguide/strings.html).

Usage: ICU coding guidelines for if() statements should be followed when using these macros. Compound statements (curly braces {}) must be used for if-else-while... bodies, and every macro statement should be terminated with a semicolon.
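
The brace rule follows from how the multi-statement macros on this page are defined: each expands to a { ... } block, so the semicolon after an unbraced macro call ends the if statement and orphans a following else. A minimal sketch, using local copies of two definitions from this page so that it compiles without ICU; the typedefs and the helper name step are assumptions made for illustration only:

```c
#include <stdint.h>

typedef uint16_t UChar;   /* stand-in for ICU's UChar */

/* Copied from the definitions on this page. */
#define U16_IS_LEAD(c) (((c)&0xfffffc00)==0xd800)
#define U16_FWD_1_UNSAFE(s, i) { \
    if(U16_IS_LEAD((s)[(i)++])) { \
        ++(i); \
    } \
}

/* Hypothetical helper: advance by one code point when byCodePoint is
   nonzero, otherwise by one code unit. Returns the new offset. */
static int32_t step(const UChar *s, int32_t i, int byCodePoint) {
    /* Without braces, "if(byCodePoint) U16_FWD_1_UNSAFE(s, i); else ++i;"
       expands to "if(byCodePoint) { ... }; else ++i;" and does not compile,
       because the stray ';' separates the else from its if. */
    if(byCodePoint) {
        U16_FWD_1_UNSAFE(s, i);
    } else {
        ++i;
    }
    return i;
}
```

With the braces in place, the trailing semicolon after the macro call is just an empty statement inside the compound body.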

Definition in file utf16.h.


Define Documentation

#define U16_APPEND(s, i, capacity, c, isError)

Value:

{ \
    if((uint32_t)(c)<=0xffff) { \
        (s)[(i)++]=(uint16_t)(c); \
    } else if((uint32_t)(c)<=0x10ffff && (i)+1<(capacity)) { \
        (s)[(i)++]=(uint16_t)(((c)>>10)+0xd7c0); \
        (s)[(i)++]=(uint16_t)(((c)&0x3ff)|0xdc00); \
    } else   { \
        (isError)=TRUE; \
    } \
}
Append a code point to a string, overwriting 1 or 2 code units.

The offset points to the current end of the string contents and is advanced (post-increment). "Safe" macro, checks for a valid code point. If a surrogate pair is written, checks for sufficient space in the string. If the code point is not valid or a trail surrogate does not fit, then isError is set to TRUE.

Parameters:
s UChar * string buffer
i string offset, must be i<capacity
capacity size of the string buffer
c code point to append
isError output UBool set to TRUE if an error occurs, otherwise not modified
See also:
U16_APPEND_UNSAFE

Stable:
ICU 2.4

Definition at line 319 of file utf16.h.

Referenced by UnicodeString::lastIndexOf().
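
A minimal sketch of calling U16_APPEND in a loop, assuming local stand-ins for UChar, UChar32, UBool, TRUE and FALSE so that it compiles without ICU. The helper encode_utf16 is hypothetical. It shows the two obligations the macro leaves to the caller: isError must be initialized, because the macro never clears it, and i<capacity must hold before each call, because the single-unit branch does not check it.

```c
#include <stdint.h>

typedef uint16_t UChar;   /* stand-in for ICU's UChar */
typedef int32_t UChar32;  /* stand-in for ICU's UChar32 */
typedef int8_t UBool;     /* stand-in for ICU's UBool */
#ifndef TRUE
#define TRUE 1
#endif
#ifndef FALSE
#define FALSE 0
#endif

/* Copied from the definition on this page. */
#define U16_APPEND(s, i, capacity, c, isError) { \
    if((uint32_t)(c)<=0xffff) { \
        (s)[(i)++]=(uint16_t)(c); \
    } else if((uint32_t)(c)<=0x10ffff && (i)+1<(capacity)) { \
        (s)[(i)++]=(uint16_t)(((c)>>10)+0xd7c0); \
        (s)[(i)++]=(uint16_t)(((c)&0x3ff)|0xdc00); \
    } else { \
        (isError)=TRUE; \
    } \
}

/* Hypothetical helper: encode n code points into dest.
   Returns the number of code units written, or -1 on error. */
static int32_t encode_utf16(const UChar32 *cps, int32_t n,
                            UChar *dest, int32_t capacity) {
    int32_t i = 0;
    UBool isError = FALSE;  /* the macro only ever sets this to TRUE */
    for(int32_t k = 0; k < n; ++k) {
        if(i >= capacity) { /* precondition of the macro: i<capacity */
            return -1;
        }
        U16_APPEND(dest, i, capacity, cps[k], isError);
        if(isError) {
            return -1;
        }
    }
    return i;
}
```

Going by the definition shown above, a value in the surrogate range such as 0xd800 takes the first branch and is written unchanged.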

#define U16_APPEND_UNSAFE(s, i, c)

Value:

{ \
    if((uint32_t)(c)<=0xffff) { \
        (s)[(i)++]=(uint16_t)(c); \
    } else { \
        (s)[(i)++]=(uint16_t)(((c)>>10)+0xd7c0); \
        (s)[(i)++]=(uint16_t)(((c)&0x3ff)|0xdc00); \
    } \
}
Append a code point to a string, overwriting 1 or 2 code units.

The offset points to the current end of the string contents and is advanced (post-increment). "Unsafe" macro, assumes a valid code point and sufficient space in the string. Otherwise, the result is undefined.

Parameters:
s UChar * string buffer
i string offset
c code point to append
See also:
U16_APPEND

Stable:
ICU 2.4

Definition at line 292 of file utf16.h.

#define U16_BACK_1(s, start, i)

Value:

{ \
    if(U16_IS_TRAIL((s)[--(i)]) && (i)>(start) && U16_IS_LEAD((s)[(i)-1])) { \
        --(i); \
    } \
}
Move the string offset from one code point boundary to the previous one.

(Pre-decrementing backward iteration.) The input offset may be the same as the string length. "Safe" macro, handles unpaired surrogates and checks for string boundaries.

Parameters:
s const UChar * string
start starting string offset (usually 0)
i string offset, must be start<i
See also:
U16_BACK_1_UNSAFE

Stable:
ICU 2.4

Definition at line 543 of file utf16.h.

#define U16_BACK_1_UNSAFE(s, i)

Value:

{ \
    if(U16_IS_TRAIL((s)[--(i)])) { \
        --(i); \
    } \
}
Move the string offset from one code point boundary to the previous one.

(Pre-decrementing backward iteration.) The input offset may be the same as the string length. "Unsafe" macro, assumes well-formed UTF-16.

Parameters:
s const UChar * string
i string offset
See also:
U16_BACK_1

Stable:
ICU 2.4

Definition at line 524 of file utf16.h.

#define U16_BACK_N(s, start, i, n)

Value:

{ \
    int32_t __N=(n); \
    while(__N>0 && (i)>(start)) { \
        U16_BACK_1(s, start, i); \
        --__N; \
    } \
}
Move the string offset from one code point boundary to the n-th one before it, i.e., move backward by n code points.

(Pre-decrementing backward iteration.) The input offset may be the same as the string length. "Safe" macro, handles unpaired surrogates and checks for string boundaries.

Parameters:
s const UChar * string
start start of string
i string offset, must be start<i
n number of code points to skip
See also:
U16_BACK_N_UNSAFE

Stable:
ICU 2.4

Definition at line 586 of file utf16.h.

#define U16_BACK_N_UNSAFE(s, i, n)

Value:

{ \
    int32_t __N=(n); \
    while(__N>0) { \
        U16_BACK_1_UNSAFE(s, i); \
        --__N; \
    } \
}
Move the string offset from one code point boundary to the n-th one before it, i.e., move backward by n code points.

(Pre-decrementing backward iteration.) The input offset may be the same as the string length. "Unsafe" macro, assumes well-formed UTF-16.

Parameters:
s const UChar * string
i string offset
n number of code points to skip
See also:
U16_BACK_N

Stable:
ICU 2.4

Definition at line 563 of file utf16.h.

#define U16_FWD_1(s, i, length)

Value:

{ \
    if(U16_IS_LEAD((s)[(i)++]) && (i)<(length) && U16_IS_TRAIL((s)[i])) { \
        ++(i); \
    } \
}
Advance the string offset from one code point boundary to the next.

(Post-incrementing iteration.) "Safe" macro, handles unpaired surrogates and checks for string boundaries.

Parameters:
s const UChar * string
i string offset, must be i<length
length string length
See also:
U16_FWD_1_UNSAFE

Stable:
ICU 2.4

Definition at line 359 of file utf16.h.

#define U16_FWD_1_UNSAFE(s, i)

Value:

{ \
    if(U16_IS_LEAD((s)[(i)++])) { \
        ++(i); \
    } \
}
Advance the string offset from one code point boundary to the next.

(Post-incrementing iteration.) "Unsafe" macro, assumes well-formed UTF-16.

Parameters:
s const UChar * string
i string offset
See also:
U16_FWD_1

Stable:
ICU 2.4

Definition at line 341 of file utf16.h.

#define U16_FWD_N(s, i, length, n)

Value:

{ \
    int32_t __N=(n); \
    while(__N>0 && (i)<(length)) { \
        U16_FWD_1(s, i, length); \
        --__N; \
    } \
}
Advance the string offset from one code point boundary to the n-th next one, i.e., move forward by n code points.

(Post-incrementing iteration.) "Safe" macro, handles unpaired surrogates and checks for string boundaries.

Parameters:
s const UChar * string
i string offset, must be i<length
length string length
n number of code points to skip
See also:
U16_FWD_N_UNSAFE

Stable:
ICU 2.4

Definition at line 400 of file utf16.h.
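
As an illustration of U16_FWD_N, the following sketch converts a code point index into a code unit offset. It uses local copies of the definitions from this page and a stand-in typedef for UChar so that it compiles without ICU; the helper name offset_of_code_point is hypothetical.

```c
#include <stdint.h>

typedef uint16_t UChar;   /* stand-in for ICU's UChar */

/* Copied from the definitions on this page. */
#define U16_IS_LEAD(c) (((c)&0xfffffc00)==0xd800)
#define U16_IS_TRAIL(c) (((c)&0xfffffc00)==0xdc00)
#define U16_FWD_1(s, i, length) { \
    if(U16_IS_LEAD((s)[(i)++]) && (i)<(length) && U16_IS_TRAIL((s)[i])) { \
        ++(i); \
    } \
}
#define U16_FWD_N(s, i, length, n) { \
    int32_t __N=(n); \
    while(__N>0 && (i)<(length)) { \
        U16_FWD_1(s, i, length); \
        --__N; \
    } \
}

/* Hypothetical helper: code unit offset of the n-th code point (0-based),
   or length if the string has fewer than n code points. */
static int32_t offset_of_code_point(const UChar *s, int32_t length, int32_t n) {
    int32_t i = 0;
    U16_FWD_N(s, i, length, n);
    return i;
}
```

Because the loop stops at length, asking for more code points than the string holds leaves the offset at the end of the string instead of reading past it. An unpaired surrogate counts as one code point.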

#define U16_FWD_N_UNSAFE(s, i, n)

Value:

{ \
    int32_t __N=(n); \
    while(__N>0) { \
        U16_FWD_1_UNSAFE(s, i); \
        --__N; \
    } \
}
Advance the string offset from one code point boundary to the n-th next one, i.e., move forward by n code points.

(Post-incrementing iteration.) "Unsafe" macro, assumes well-formed UTF-16.

Parameters:
s const UChar * string
i string offset
n number of code points to skip
See also:
U16_FWD_N

Stable:
ICU 2.4

Definition at line 378 of file utf16.h.

#define U16_GET(s, start, i, length, c)

Value:

{ \
    (c)=(s)[i]; \
    if(U16_IS_SURROGATE(c)) { \
        uint16_t __c2; \
        if(U16_IS_SURROGATE_LEAD(c)) { \
            if((i)+1<(length) && U16_IS_TRAIL(__c2=(s)[(i)+1])) { \
                (c)=U16_GET_SUPPLEMENTARY((c), __c2); \
            } \
        } else { \
            if((i)-1>=(start) && U16_IS_LEAD(__c2=(s)[(i)-1])) { \
                (c)=U16_GET_SUPPLEMENTARY(__c2, (c)); \
            } \
        } \
    } \
}
Get a code point from a string at a random-access offset, without changing the offset.

"Safe" macro, handles unpaired surrogates and checks for string boundaries.

The offset may point to either the lead or trail surrogate unit for a supplementary code point, in which case the macro will read the adjacent matching surrogate as well. If the offset points to a single, unpaired surrogate, then that itself will be returned as the code point. Iteration through a string is more efficient with U16_NEXT_UNSAFE or U16_NEXT.

Parameters:
s const UChar * string
start starting string offset (usually 0)
i string offset, must be start<=i<length
length string length
c output UChar32 variable
See also:
U16_GET_UNSAFE

Stable:
ICU 2.4

Definition at line 201 of file utf16.h.

Referenced by UnicodeString::endsWith().
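
A sketch of random access with U16_GET, assuming local stand-ins for the ICU types. U_IS_SURROGATE is not defined on this page (it comes from utf.h); the definition used here is an assumption based on that header. The helper name code_point_at is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t UChar;   /* stand-in for ICU's UChar */
typedef int32_t UChar32;  /* stand-in for ICU's UChar32 */

/* Assumed to match utf.h; not shown on this page. */
#define U_IS_SURROGATE(c) (((c)&0xfffff800)==0xd800)

/* Copied from the definitions on this page. */
#define U16_IS_LEAD(c) (((c)&0xfffffc00)==0xd800)
#define U16_IS_TRAIL(c) (((c)&0xfffffc00)==0xdc00)
#define U16_IS_SURROGATE(c) U_IS_SURROGATE(c)
#define U16_IS_SURROGATE_LEAD(c) (((c)&0x400)==0)
#define U16_SURROGATE_OFFSET ((0xd800<<10UL)+0xdc00-0x10000)
#define U16_GET_SUPPLEMENTARY(lead, trail) \
    (((UChar32)(lead)<<10UL)+(UChar32)(trail)-U16_SURROGATE_OFFSET)
#define U16_GET(s, start, i, length, c) { \
    (c)=(s)[i]; \
    if(U16_IS_SURROGATE(c)) { \
        uint16_t __c2; \
        if(U16_IS_SURROGATE_LEAD(c)) { \
            if((i)+1<(length) && U16_IS_TRAIL(__c2=(s)[(i)+1])) { \
                (c)=U16_GET_SUPPLEMENTARY((c), __c2); \
            } \
        } else { \
            if((i)-1>=(start) && U16_IS_LEAD(__c2=(s)[(i)-1])) { \
                (c)=U16_GET_SUPPLEMENTARY(__c2, (c)); \
            } \
        } \
    } \
}

/* Hypothetical helper: code point covering code unit offset i. */
static UChar32 code_point_at(const UChar *s, int32_t length, int32_t i) {
    UChar32 c;
    assert(0 <= i && i < length);  /* documented precondition: start<=i<length */
    U16_GET(s, 0, i, length, c);
    return c;
}
```

Offsets on either half of a pair yield the same supplementary code point, and the offset itself is left unchanged.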

#define U16_GET_SUPPLEMENTARY(lead, trail)   (((UChar32)(lead)<<10UL)+(UChar32)(trail)-U16_SURROGATE_OFFSET)

Get a supplementary code point value (U+10000..U+10ffff) from its lead and trail surrogates.

The result is undefined if the input values are not lead and trail surrogates.

Parameters:
lead lead surrogate (U+d800..U+dbff)
trail trail surrogate (U+dc00..U+dfff)
Returns:
supplementary code point (U+10000..U+10ffff)

Stable:
ICU 2.4

Definition at line 109 of file utf16.h.
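
The arithmetic behind U16_GET_SUPPLEMENTARY can be checked by hand. The sketch below wraps the macro in a hypothetical function, combine, and works one pair through in a comment; the typedefs are local stand-ins so that it compiles without ICU.

```c
#include <stdint.h>

typedef uint16_t UChar;   /* stand-in for ICU's UChar */
typedef int32_t UChar32;  /* stand-in for ICU's UChar32 */

/* Copied from the definitions on this page. */
#define U16_SURROGATE_OFFSET ((0xd800<<10UL)+0xdc00-0x10000)
#define U16_GET_SUPPLEMENTARY(lead, trail) \
    (((UChar32)(lead)<<10UL)+(UChar32)(trail)-U16_SURROGATE_OFFSET)

/* Worked example for the pair 0xD83D 0xDE00:
 *   offset       = 0x3600000 + 0xDC00 - 0x10000 = 0x35FDC00
 *   0xD83D << 10 = 0x360F400
 *   + 0xDE00     = 0x361D200
 *   - 0x35FDC00  = 0x1F600
 */
static UChar32 combine(UChar lead, UChar trail) {
    return U16_GET_SUPPLEMENTARY(lead, trail);
}
```

The single precomputed offset replaces three separate steps: subtracting 0xd800 from the lead, subtracting 0xdc00 from the trail, and adding 0x10000 to the result.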

#define U16_GET_UNSAFE(s, i, c)

Value:

{ \
    (c)=(s)[i]; \
    if(U16_IS_SURROGATE(c)) { \
        if(U16_IS_SURROGATE_LEAD(c)) { \
            (c)=U16_GET_SUPPLEMENTARY((c), (s)[(i)+1]); \
        } else { \
            (c)=U16_GET_SUPPLEMENTARY((s)[(i)-1], (c)); \
        } \
    } \
}
Get a code point from a string at a random-access offset, without changing the offset.

"Unsafe" macro, assumes well-formed UTF-16.

The offset may point to either the lead or trail surrogate unit for a supplementary code point, in which case the macro will read the adjacent matching surrogate as well. The result is undefined if the offset points to a single, unpaired surrogate. Iteration through a string is more efficient with U16_NEXT_UNSAFE or U16_NEXT.

Parameters:
s const UChar * string
i string offset
c output UChar32 variable
See also:
U16_GET

Stable:
ICU 2.4

Definition at line 169 of file utf16.h.

#define U16_IS_LEAD(c)   (((c)&0xfffffc00)==0xd800)

Is this code unit a lead surrogate (U+d800..U+dbff)?

Parameters:
c 16-bit code unit
Returns:
TRUE or FALSE

Stable:
ICU 2.4

Definition at line 60 of file utf16.h.

#define U16_IS_SINGLE(c)   !U_IS_SURROGATE(c)

Does this code unit alone encode a code point (BMP, not a surrogate)?

Parameters:
c 16-bit code unit
Returns:
TRUE or FALSE

Stable:
ICU 2.4

Definition at line 51 of file utf16.h.

#define U16_IS_SURROGATE(c)   U_IS_SURROGATE(c)

Is this code unit a surrogate (U+d800..U+dfff)?

Parameters:
c 16-bit code unit
Returns:
TRUE or FALSE

Stable:
ICU 2.4

Definition at line 78 of file utf16.h.

#define U16_IS_SURROGATE_LEAD(c)   (((c)&0x400)==0)

Assuming c is a surrogate code point (U16_IS_SURROGATE(c)), is it a lead surrogate?

Parameters:
c 16-bit code unit
Returns:
TRUE or FALSE

Stable:
ICU 2.4

Definition at line 88 of file utf16.h.
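
The precondition on U16_IS_SURROGATE_LEAD matters because the macro tests a single bit. A sketch, assuming a stand-in typedef for UChar and a definition of U_IS_SURROGATE that is taken to match utf.h (it is not shown on this page); the classifier classify is hypothetical:

```c
#include <stdint.h>

typedef uint16_t UChar;   /* stand-in for ICU's UChar */

/* Assumed to match utf.h; not shown on this page. */
#define U_IS_SURROGATE(c) (((c)&0xfffff800)==0xd800)

/* Copied from the definitions on this page. */
#define U16_IS_LEAD(c) (((c)&0xfffffc00)==0xd800)
#define U16_IS_SURROGATE(c) U_IS_SURROGATE(c)
#define U16_IS_SURROGATE_LEAD(c) (((c)&0x400)==0)

/* Hypothetical classifier: 0 = not a surrogate, 1 = lead, 2 = trail.
   The surrogate test comes first, so the one-bit test is only applied
   to values for which it is meaningful. */
static int classify(UChar c) {
    if(!U16_IS_SURROGATE(c)) {
        return 0;
    }
    return U16_IS_SURROGATE_LEAD(c) ? 1 : 2;
}
```

Applied to a non-surrogate such as 0x0041, U16_IS_SURROGATE_LEAD evaluates to true because bit 10 happens to be clear, whereas U16_IS_LEAD correctly evaluates to false.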

#define U16_IS_TRAIL(c)   (((c)&0xfffffc00)==0xdc00)

Is this code unit a trail surrogate (U+dc00..U+dfff)?

Parameters:
c 16-bit code unit
Returns:
TRUE or FALSE

Stable:
ICU 2.4

Definition at line 69 of file utf16.h.

#define U16_LEAD(supplementary)   (UChar)(((supplementary)>>10)+0xd7c0)

Get the lead surrogate (0xd800..0xdbff) for a supplementary code point (0x10000..0x10ffff).

Parameters:
supplementary 32-bit code point (U+10000..U+10ffff)
Returns:
lead surrogate (U+d800..U+dbff) for supplementary

Stable:
ICU 2.4

Definition at line 121 of file utf16.h.

#define U16_LENGTH(c)   ((uint32_t)(c)<=0xffff ? 1 : 2)

How many 16-bit code units are used to encode this Unicode code point (1 or 2)? The result is not defined if c is not a Unicode code point (U+0000..U+10ffff).

Parameters:
c 32-bit code point
Returns:
1 or 2

Stable:
ICU 2.4

Definition at line 141 of file utf16.h.
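
One use of U16_LENGTH is sizing a buffer before encoding. A sketch, assuming a stand-in typedef for UChar32 and a hypothetical helper named utf16_length; every input is assumed to be a valid code point, since the result is not defined otherwise:

```c
#include <stdint.h>

typedef int32_t UChar32;  /* stand-in for ICU's UChar32 */

/* Copied from the definition on this page. */
#define U16_LENGTH(c) ((uint32_t)(c)<=0xffff ? 1 : 2)

/* Hypothetical helper: number of 16-bit code units needed to encode
   n code points, each assumed to be in U+0000..U+10ffff. */
static int32_t utf16_length(const UChar32 *cps, int32_t n) {
    int32_t total = 0;
    for(int32_t k = 0; k < n; ++k) {
        total += U16_LENGTH(cps[k]);
    }
    return total;
}
```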

#define U16_MAX_LENGTH   2
 

The maximum number of 16-bit code units per Unicode code point (U+0000..U+10ffff).

Returns:
2

Stable:
ICU 2.4

Definition at line 149 of file utf16.h.

Referenced by UnicodeString::lastIndexOf().

#define U16_NEXT(s, i, length, c)

Value:

{ \
    (c)=(s)[(i)++]; \
    if(U16_IS_LEAD(c)) { \
        uint16_t __c2; \
        if((i)<(length) && U16_IS_TRAIL(__c2=(s)[(i)])) { \
            ++(i); \
            (c)=U16_GET_SUPPLEMENTARY((c), __c2); \
        } \
    } \
}
Get a code point from a string at a code point boundary offset, and advance the offset to the next code point boundary.

(Post-incrementing forward iteration.) "Safe" macro, handles unpaired surrogates and checks for string boundaries.

The offset may point to the lead surrogate unit for a supplementary code point, in which case the macro will read the following trail surrogate as well. If the offset points to a trail surrogate or to a single, unpaired lead surrogate, then that itself will be returned as the code point.

Parameters:
s const UChar * string
i string offset, must be i<length
length string length
c output UChar32 variable
See also:
U16_NEXT_UNSAFE

Stable:
ICU 2.4

Definition at line 267 of file utf16.h.
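
A sketch of forward iteration with U16_NEXT that counts code points and, separately, unpaired surrogates. It relies on the behavior described above: a well-formed pair is returned as a supplementary code point, so any result still in the surrogate range must have been unpaired. The typedefs are local stand-ins, U_IS_SURROGATE is assumed to match utf.h, and count_code_points is a hypothetical helper.

```c
#include <stdint.h>

typedef uint16_t UChar;   /* stand-in for ICU's UChar */
typedef int32_t UChar32;  /* stand-in for ICU's UChar32 */

/* Assumed to match utf.h; not shown on this page. */
#define U_IS_SURROGATE(c) (((c)&0xfffff800)==0xd800)

/* Copied from the definitions on this page. */
#define U16_IS_LEAD(c) (((c)&0xfffffc00)==0xd800)
#define U16_IS_TRAIL(c) (((c)&0xfffffc00)==0xdc00)
#define U16_SURROGATE_OFFSET ((0xd800<<10UL)+0xdc00-0x10000)
#define U16_GET_SUPPLEMENTARY(lead, trail) \
    (((UChar32)(lead)<<10UL)+(UChar32)(trail)-U16_SURROGATE_OFFSET)
#define U16_NEXT(s, i, length, c) { \
    (c)=(s)[(i)++]; \
    if(U16_IS_LEAD(c)) { \
        uint16_t __c2; \
        if((i)<(length) && U16_IS_TRAIL(__c2=(s)[(i)])) { \
            ++(i); \
            (c)=U16_GET_SUPPLEMENTARY((c), __c2); \
        } \
    } \
}

/* Hypothetical helper: returns the number of code points in s and stores
   the number of unpaired surrogates among them in *unpaired. */
static int32_t count_code_points(const UChar *s, int32_t length,
                                 int32_t *unpaired) {
    int32_t i = 0, n = 0;
    *unpaired = 0;
    while(i < length) {      /* keeps the precondition i<length */
        UChar32 c;
        U16_NEXT(s, i, length, c);
        if(U_IS_SURROGATE(c)) {
            ++*unpaired;     /* a paired surrogate would be >= 0x10000 */
        }
        ++n;
    }
    return n;
}
```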

#define U16_NEXT_UNSAFE(s, i, c)

Value:

{ \
    (c)=(s)[(i)++]; \
    if(U16_IS_LEAD(c)) { \
        (c)=U16_GET_SUPPLEMENTARY((c), (s)[(i)++]); \
    } \
}
Get a code point from a string at a code point boundary offset, and advance the offset to the next code point boundary.

(Post-incrementing forward iteration.) "Unsafe" macro, assumes well-formed UTF-16.

The offset may point to the lead surrogate unit for a supplementary code point, in which case the macro will read the following trail surrogate as well. If the offset points to a trail surrogate, then that itself will be returned as the code point. The result is undefined if the offset points to a single, unpaired lead surrogate.

Parameters:
s const UChar * string
i string offset
c output UChar32 variable
See also:
U16_NEXT

Stable:
ICU 2.4

Definition at line 239 of file utf16.h.

#define U16_PREV(s, start, i, c)

Value:

{ \
    (c)=(s)[--(i)]; \
    if(U16_IS_TRAIL(c)) { \
        uint16_t __c2; \
        if((i)>(start) && U16_IS_LEAD(__c2=(s)[(i)-1])) { \
            --(i); \
            (c)=U16_GET_SUPPLEMENTARY(__c2, (c)); \
        } \
    } \
}
Move the string offset from one code point boundary to the previous one and get the code point between them.

(Pre-decrementing backward iteration.) "Safe" macro, handles unpaired surrogates and checks for string boundaries.

The input offset may be the same as the string length. If the offset is behind a trail surrogate unit for a supplementary code point, then the macro will read the preceding lead surrogate as well. If the offset is behind a lead surrogate or behind a single, unpaired trail surrogate, then that itself will be returned as the code point.

Parameters:
s const UChar * string
start starting string offset (usually 0)
i string offset, must be start<i
c output UChar32 variable
See also:
U16_PREV_UNSAFE

Stable:
ICU 2.4

Definition at line 501 of file utf16.h.
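
A sketch of backward iteration with U16_PREV: it reverses a string code point by code point, so surrogate pairs stay in lead-trail order. The output is written with U16_APPEND_UNSAFE, which is acceptable here only because the destination is assumed to hold at least length code units. The typedefs are local stand-ins and reverse_code_points is a hypothetical helper.

```c
#include <stdint.h>

typedef uint16_t UChar;   /* stand-in for ICU's UChar */
typedef int32_t UChar32;  /* stand-in for ICU's UChar32 */

/* Copied from the definitions on this page. */
#define U16_IS_LEAD(c) (((c)&0xfffffc00)==0xd800)
#define U16_IS_TRAIL(c) (((c)&0xfffffc00)==0xdc00)
#define U16_SURROGATE_OFFSET ((0xd800<<10UL)+0xdc00-0x10000)
#define U16_GET_SUPPLEMENTARY(lead, trail) \
    (((UChar32)(lead)<<10UL)+(UChar32)(trail)-U16_SURROGATE_OFFSET)
#define U16_PREV(s, start, i, c) { \
    (c)=(s)[--(i)]; \
    if(U16_IS_TRAIL(c)) { \
        uint16_t __c2; \
        if((i)>(start) && U16_IS_LEAD(__c2=(s)[(i)-1])) { \
            --(i); \
            (c)=U16_GET_SUPPLEMENTARY(__c2, (c)); \
        } \
    } \
}
#define U16_APPEND_UNSAFE(s, i, c) { \
    if((uint32_t)(c)<=0xffff) { \
        (s)[(i)++]=(uint16_t)(c); \
    } else { \
        (s)[(i)++]=(uint16_t)(((c)>>10)+0xd7c0); \
        (s)[(i)++]=(uint16_t)(((c)&0x3ff)|0xdc00); \
    } \
}

/* Hypothetical helper: write the code points of src to dest in reverse
   order. dest must have room for length code units. */
static void reverse_code_points(const UChar *src, int32_t length, UChar *dest) {
    int32_t i = length;  /* the input offset may equal the string length */
    int32_t j = 0;
    while(i > 0) {       /* keeps the precondition start<i */
        UChar32 c;
        U16_PREV(src, 0, i, c);
        U16_APPEND_UNSAFE(dest, j, c);
    }
}
```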

#define U16_PREV_UNSAFE(s, i, c)

Value:

{ \
    (c)=(s)[--(i)]; \
    if(U16_IS_TRAIL(c)) { \
        (c)=U16_GET_SUPPLEMENTARY((s)[--(i)], (c)); \
    } \
}
Move the string offset from one code point boundary to the previous one and get the code point between them.

(Pre-decrementing backward iteration.) "Unsafe" macro, assumes well-formed UTF-16.

The input offset may be the same as the string length. If the offset is behind a trail surrogate unit for a supplementary code point, then the macro will read the preceding lead surrogate as well. If the offset is behind a lead surrogate, then that itself will be returned as the code point. The result is undefined if the offset is behind a single, unpaired trail surrogate.

Parameters:
s const UChar * string
i string offset
c output UChar32 variable
See also:
U16_PREV

Stable:
ICU 2.4

Definition at line 472 of file utf16.h.

#define U16_SET_CP_LIMIT(s, start, i, length)

Value:

{ \
    if((start)<(i) && (i)<(length) && U16_IS_LEAD((s)[(i)-1]) && U16_IS_TRAIL((s)[i])) { \
        ++(i); \
    } \
}
Adjust a random-access offset to a code point boundary after a code point.

If the offset is behind the lead surrogate of a surrogate pair, then the offset is incremented. Otherwise, it is not modified. The input offset may be the same as the string length. "Safe" macro, handles unpaired surrogates and checks for string boundaries.

Parameters:
s const UChar * string
start starting string offset (usually 0)
i string offset, start<=i<=length
length string length
See also:
U16_SET_CP_LIMIT_UNSAFE

Stable:
ICU 2.4

Definition at line 630 of file utf16.h.

#define U16_SET_CP_LIMIT_UNSAFE(s, i)

Value:

{ \
    if(U16_IS_LEAD((s)[(i)-1])) { \
        ++(i); \
    } \
}
Adjust a random-access offset to a code point boundary after a code point.

If the offset is behind the lead surrogate of a surrogate pair, then the offset is incremented. Otherwise, it is not modified. The input offset may be the same as the string length. "Unsafe" macro, assumes well-formed UTF-16.

Parameters:
s const UChar * string
i string offset
See also:
U16_SET_CP_LIMIT

Stable:
ICU 2.4

Definition at line 608 of file utf16.h.

#define U16_SET_CP_START(s, start, i)

Value:

{ \
    if(U16_IS_TRAIL((s)[i]) && (i)>(start) && U16_IS_LEAD((s)[(i)-1])) { \
        --(i); \
    } \
}
Adjust a random-access offset to a code point boundary at the start of a code point.

If the offset points to the trail surrogate of a surrogate pair, then the offset is decremented. Otherwise, it is not modified. "Safe" macro, handles unpaired surrogates and checks for string boundaries.

Parameters:
s const UChar * string
start starting string offset (usually 0)
i string offset, must be start<=i
See also:
U16_SET_CP_START_UNSAFE

Stable:
ICU 2.4

Definition at line 443 of file utf16.h.
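
A typical use of U16_SET_CP_START is truncating a string to a maximum number of code units without cutting a surrogate pair in half. A sketch under stated assumptions: the typedef is a local stand-in, safe_truncate is a hypothetical helper, and the macro is only invoked with i<length because it reads (s)[i].

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t UChar;   /* stand-in for ICU's UChar */

/* Copied from the definitions on this page. */
#define U16_IS_LEAD(c) (((c)&0xfffffc00)==0xd800)
#define U16_IS_TRAIL(c) (((c)&0xfffffc00)==0xdc00)
#define U16_SET_CP_START(s, start, i) { \
    if(U16_IS_TRAIL((s)[i]) && (i)>(start) && U16_IS_LEAD((s)[(i)-1])) { \
        --(i); \
    } \
}

/* Hypothetical helper: largest length <= maxUnits at which s can be cut
   without separating a lead surrogate from its trail surrogate. */
static int32_t safe_truncate(const UChar *s, int32_t length, int32_t maxUnits) {
    assert(maxUnits >= 0);
    if(maxUnits >= length) {
        return length;       /* nothing to cut */
    }
    int32_t i = maxUnits;    /* i<length, so (s)[i] may be read */
    U16_SET_CP_START(s, 0, i);
    return i;
}
```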

#define U16_SET_CP_START_UNSAFE(s, i)

Value:

{ \
    if(U16_IS_TRAIL((s)[i])) { \
        --(i); \
    } \
}
Adjust a random-access offset to a code point boundary at the start of a code point.

If the offset points to the trail surrogate of a surrogate pair, then the offset is decremented. Otherwise, it is not modified. "Unsafe" macro, assumes well-formed UTF-16.

Parameters:
s const UChar * string
i string offset
See also:
U16_SET_CP_START

Stable:
ICU 2.4

Definition at line 422 of file utf16.h.

#define U16_SURROGATE_OFFSET   ((0xd800<<10UL)+0xdc00-0x10000)
 

Helper constant for U16_GET_SUPPLEMENTARY.

Internal:
Do not use. This API is for internal use only.

Definition at line 95 of file utf16.h.

#define U16_TRAIL(supplementary)   (UChar)(((supplementary)&0x3ff)|0xdc00)

Get the trail surrogate (0xdc00..0xdfff) for a supplementary code point (0x10000..0x10ffff).

Parameters:
supplementary 32-bit code point (U+10000..U+10ffff)
Returns:
trail surrogate (U+dc00..U+dfff) for supplementary

Stable:
ICU 2.4

Definition at line 131 of file utf16.h.
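
U16_LEAD and U16_TRAIL together split a supplementary code point into its surrogate pair. The sketch below wraps them in a hypothetical helper, split, with local stand-in typedefs. The constant 0xd7c0 in U16_LEAD equals 0xd800 - (0x10000>>10), which folds the subtraction of 0x10000 and the addition of the lead surrogate base into one step.

```c
#include <stdint.h>

typedef uint16_t UChar;   /* stand-in for ICU's UChar */
typedef int32_t UChar32;  /* stand-in for ICU's UChar32 */

/* Copied from the definitions on this page. */
#define U16_LEAD(supplementary) (UChar)(((supplementary)>>10)+0xd7c0)
#define U16_TRAIL(supplementary) (UChar)(((supplementary)&0x3ff)|0xdc00)

/* Worked example for U+1F600:
 *   lead  = (0x1F600 >> 10) + 0xD7C0  = 0x7D + 0xD7C0    = 0xD83D
 *   trail = (0x1F600 & 0x3FF) | 0xDC00 = 0x200 | 0xDC00  = 0xDE00
 */

/* Hypothetical helper: c must be in 0x10000..0x10ffff. */
static void split(UChar32 c, UChar *lead, UChar *trail) {
    *lead = U16_LEAD(c);
    *trail = U16_TRAIL(c);
}
```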


Generated on Mon Jul 14 00:41:54 2008 for ICU 3.6 by doxygen 1.3.5