com.ibm.icu.text
Interface UForwardCharacterIterator
All Known Implementing Classes: UCharacterIterator
public interface UForwardCharacterIterator
Interface that defines an API for forward-only iteration
on text objects.
This is a minimal interface for iteration without random access
or backwards iteration. It is especially useful for wrapping
streams with converters into an object for collation or
normalization.
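As a rough illustration of the stream-wrapping idea, the sketch below uses a java.io.Reader in the role of a stream with a converter. ReaderCodeUnits and countUnits are hypothetical names that are not part of ICU, and the DONE sentinel is assumed here to be -1, the value Reader.read() returns at end of stream.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of ICU: forward-only access to a Reader's code units.
final class ReaderCodeUnits {
    static final int DONE = -1; // assumed end-of-text sentinel, same as Reader.read()

    private final Reader reader;

    ReaderCodeUnits(Reader reader) {
        this.reader = reader;
    }

    // Returns the next UTF-16 code unit, or DONE once the stream is exhausted.
    int next() {
        try {
            return reader.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Counts code units by draining the iterator; it cannot be rewound afterwards.
    static int countUnits(ReaderCodeUnits it) {
        int n = 0;
        while (it.next() != DONE) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        ReaderCodeUnits it = new ReaderCodeUnits(new StringReader("abc"));
        System.out.println(countUnits(it));
    }
}
```

Because the wrapper exposes only next(), a consumer can make a single pass over the text, which is all a forward-only interface promises.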
Characters can be accessed in two ways: as code units or as
code points.
Unicode code points are 21-bit integers and are the scalar values
of Unicode characters. ICU uses the type int for them.
Unicode code units are the storage units of a given
Unicode/UCS Transformation Format (a character encoding scheme).
With UTF-16, every code point is represented by either a single
code unit or a pair of code units (a "surrogate pair").
String storage is typically based on code units, while properties
of characters are typically determined using code point values.
Some processes may be designed to work with sequences of code units,
or it may be known that all characters that are important to an
algorithm can be represented with single code units.
Other processes will need to use the code point access functions.
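The difference between the two views can be seen with plain java.lang.String methods, without any ICU classes. This is only a sketch; the class and method names (CodeUnitsVsCodePoints, codeUnitCount, codePointCount) are made up for the example.

```java
// Illustrative only: compares code unit and code point views of one string.
final class CodeUnitsVsCodePoints {
    // Number of UTF-16 code units (Java chars) in s.
    static int codeUnitCount(String s) {
        return s.length();
    }

    // Number of Unicode code points in s.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // 'a' followed by U+1D400 MATHEMATICAL BOLD CAPITAL A (a surrogate pair)
        String s = "a\uD835\uDC00";

        System.out.println(codeUnitCount(s));  // storage length in code units
        System.out.println(codePointCount(s)); // number of characters as code points

        // Property lookup on a single code unit sees only a lone surrogate.
        System.out.println(Character.isLetter(s.charAt(1)));
        // Property lookup on the code point sees the actual character.
        System.out.println(Character.isLetter(s.codePointAt(1)));
    }
}
```

A process that only compares or copies storage can work with code units, while one that asks about character properties needs the code point view.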
UForwardCharacterIterator provides next() to access
a code unit and advance an internal position into the text object,
similar to return text[position++].
It provides nextCodePoint() to access a code point and advance an internal
position.
nextCodePoint() assumes that the current position is that of
the beginning of a code point, i.e., of its first code unit.
After nextCodePoint(), this will be true again.
In general, access to code units and code points in the same
iteration loop should not be mixed. In UTF-16, if the current position
is on a second code unit (Low Surrogate), then only that code unit
is returned even by nextCodePoint().
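The behavior described above can be sketched over a plain String. This is not the ICU implementation; StringForwardIterator is a hypothetical class written for illustration, and its DONE value of -1 is an assumption.

```java
// Illustrative sketch only; not the ICU implementation.
final class StringForwardIterator {
    static final int DONE = -1; // assumed sentinel value

    private final String text;
    private int position = 0;

    StringForwardIterator(String text) {
        this.text = text;
    }

    // Code unit access: the equivalent of text[position++].
    int next() {
        return position < text.length() ? text.charAt(position++) : DONE;
    }

    // Code point access: consumes two code units for a valid surrogate pair,
    // otherwise one (an unpaired surrogate is returned unchanged).
    int nextCodePoint() {
        if (position >= text.length()) {
            return DONE;
        }
        int cp = text.codePointAt(position);
        position += Character.charCount(cp);
        return cp;
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE00x"; // U+1F600 followed by 'x'

        StringForwardIterator clean = new StringForwardIterator(s);
        System.out.printf("%X%n", clean.nextCodePoint()); // prints 1F600

        StringForwardIterator mixed = new StringForwardIterator(s);
        mixed.next(); // consumes only the high surrogate
        System.out.printf("%X%n", mixed.nextCodePoint()); // prints DE00
    }
}
```

The second iterator shows why mixing the two access styles is discouraged: after next() splits the pair, nextCodePoint() can only return the leftover low surrogate.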
Usage:
public void function1(UForwardCharacterIterator it) {
    int c;
    // Read code units until the end of the text is reached.
    while ((c = it.next()) != UForwardCharacterIterator.DONE) {
        // use c
    }
}
static int | DONE | Indicator that the end of the UTF-16 text has been reached.
int | next() | Returns the UTF-16 code unit at the current index and advances to the next code unit (post-increment semantics).
int | nextCodePoint() | Returns the code point at the current index and advances to the next code point (post-increment semantics).
DONE
public static final int DONE
Indicator that the end of the UTF-16 text has been reached.
next
public int next()
Returns the UTF-16 code unit at the current index and advances to the next
code unit (post-increment semantics). If the index is out of
range, DONE is returned and the iterator is reset to the limit
of the text.
Returns: the next UTF-16 code unit, or DONE if the index is at the limit
of the text.
nextCodePoint
public int nextCodePoint()
Returns the code point at the current index and advances to the next code
point (post-increment semantics). If the index does not point to a
valid surrogate pair, the behavior is the same as next().
Otherwise the iterator is advanced past
the surrogate pair, and the code point represented by the pair
is returned.
Returns: the next code point in the text, or DONE if the index is at
the limit of the text.
Copyright (c) 2006 IBM Corporation and others.