com.ibm.icu.text

Class UnicodeDecompressor

Implemented Interfaces:
com.ibm.icu.text.SCSU

public final class UnicodeDecompressor
extends Object
implements com.ibm.icu.text.SCSU

A decompression engine implementing the Standard Compression Scheme for Unicode (SCSU) as outlined in Unicode Technical Report #6.

USAGE

The static methods on UnicodeDecompressor may be used in a straightforward manner to decompress simple strings:

  byte [] compressed = ... ; // get compressed bytes from somewhere
  String result = UnicodeDecompressor.decompress(compressed);
 

The static methods have a fairly large memory footprint. For finer-grained control over memory usage, UnicodeDecompressor offers more powerful APIs allowing iterative decompression:

  // Decompress an array "bytes" of length "len" using a buffer of 512 chars
  // to the Writer "out"

  UnicodeDecompressor myDecompressor         = new UnicodeDecompressor();
  final int           BUFSIZE                = 512;
  char []             charBuffer             = new char [ BUFSIZE ];
  int                 charsWritten           = 0;
  int []              bytesRead              = new int [1];
  int                 totalBytesDecompressed = 0;
  int                 totalCharsWritten      = 0;

  do {
    // do the decompression
    charsWritten = myDecompressor.decompress(bytes, totalBytesDecompressed, 
                                             len, bytesRead,
                                             charBuffer, 0, BUFSIZE);

    // do something with the current set of chars
    out.write(charBuffer, 0, charsWritten);

    // update the no. of bytes decompressed
    totalBytesDecompressed += bytesRead[0];

    // update the no. of chars written
    totalCharsWritten += charsWritten;

  } while(totalBytesDecompressed < len);

  myDecompressor.reset(); // reuse decompressor
 

Decompression is performed according to the standard set forth in Unicode Technical Report #6.

Author:
Stephen F. Booth
See Also:
UnicodeCompressor

Fields inherited from interface com.ibm.icu.text.SCSU

ARMENIANINDEX, COMPRESSIONOFFSET, GREEKINDEX, HALFWIDTHKATAKANAINDEX, HIRAGANAINDEX, INVALIDCHAR, INVALIDWINDOW, IPAEXTENSIONINDEX, KATAKANAINDEX, LATININDEX, MAXINDEX, NUMSTATICWINDOWS, NUMWINDOWS, RESERVEDINDEX, SCHANGE0, SCHANGE1, SCHANGE2, SCHANGE3, SCHANGE4, SCHANGE5, SCHANGE6, SCHANGE7, SCHANGEU, SDEFINE0, SDEFINE1, SDEFINE2, SDEFINE3, SDEFINE4, SDEFINE5, SDEFINE6, SDEFINE7, SDEFINEX, SINGLEBYTEMODE, SQUOTE0, SQUOTE1, SQUOTE2, SQUOTE3, SQUOTE4, SQUOTE5, SQUOTE6, SQUOTE7, SQUOTEU, SRESERVED, UCHANGE0, UCHANGE1, UCHANGE2, UCHANGE3, UCHANGE4, UCHANGE5, UCHANGE6, UCHANGE7, UDEFINE0, UDEFINE1, UDEFINE2, UDEFINE3, UDEFINE4, UDEFINE5, UDEFINE6, UDEFINE7, UDEFINEX, UNICODEMODE, UQUOTEU, URESERVED, sOffsetTable, sOffsets

Constructor Summary

UnicodeDecompressor()
Create a UnicodeDecompressor.

Method Summary

static String
decompress(byte[] buffer)
Decompress a byte array into a String.
static char[]
decompress(byte[] buffer, int start, int limit)
Decompress a byte array into a Unicode character array.
int
decompress(byte[] byteBuffer, int byteBufferStart, int byteBufferLimit, int[] bytesRead, char[] charBuffer, int charBufferStart, int charBufferLimit)
Decompress a byte array into a Unicode character array.
void
reset()
Reset the decompressor to its initial state.

Constructor Details

UnicodeDecompressor

public UnicodeDecompressor()
Create a UnicodeDecompressor. Sets all windows to their default values.
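
To illustrate what "default values" means for the windows, the following is a minimal sketch of SCSU single-byte mode in its initial state. It is not the ICU implementation: the class name ScsuWindowSketch and the method decodeSingleByteMode are hypothetical, and only literal bytes and the SC0..SC7 "change window" tags are handled. The window offsets are the initial dynamic window offsets listed in Unicode Technical Report #6.

```java
// Sketch only: literal bytes plus the SC0..SC7 tags, nothing else from SCSU.
final class ScsuWindowSketch {
    // Initial dynamic window offsets as listed in UTR #6.
    static final int[] DEFAULT_OFFSETS = {
        0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00
    };

    // Hypothetical helper: decodes a byte run starting from the initial state.
    static String decodeSingleByteMode(byte[] in) {
        int active = 0; // dynamic window 0 is active initially
        StringBuilder out = new StringBuilder();
        for (byte raw : in) {
            int b = raw & 0xFF;
            if (b >= 0x10 && b <= 0x17) {
                // SC0..SC7: make dynamic window (b - 0x10) the active one
                active = b - 0x10;
            } else if (b >= 0x80) {
                // High bytes index into the active dynamic window
                out.append((char) (DEFAULT_OFFSETS[active] + (b - 0x80)));
            } else if (b >= 0x20 || b == 0x00 || b == 0x09 || b == 0x0A || b == 0x0D) {
                // Printable ASCII and NUL, TAB, LF, CR pass through unchanged
                out.append((char) b);
            } else {
                throw new IllegalArgumentException(
                    "tag not handled by this sketch: 0x" + Integer.toHexString(b));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // SC2 selects the window at 0x0400, then six bytes index into it.
        byte[] russian = { 0x12, (byte) 0x9C, (byte) 0xBE, (byte) 0xC1,
                           (byte) 0xBA, (byte) 0xB2, (byte) 0xB0 };
        System.out.println(decodeSingleByteMode(russian));
    }
}
```

Because dynamic window 0 starts at U+0080, Latin-1 text decodes byte-for-byte in the initial state, while other scripts need a window change such as the SC2 tag above.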

Method Details

decompress

public static String decompress(byte[] buffer)
Decompress a byte array into a String.
Parameters:
buffer - The byte array to decompress.
Returns:
A String containing the decompressed characters.
See Also:
decompress(byte [], int, int)

decompress

public static char[] decompress(byte[] buffer,
                                int start,
                                int limit)
Decompress a byte array into a Unicode character array.
Parameters:
buffer - The byte array to decompress.
start - The index of the first byte of the run to decompress.
limit - The index one past the last byte of the run to decompress.
Returns:
A character array containing the decompressed characters.
See Also:
decompress(byte [])

decompress

public int decompress(byte[] byteBuffer,
                      int byteBufferStart,
                      int byteBufferLimit,
                      int[] bytesRead,
                      char[] charBuffer,
                      int charBufferStart,
                      int charBufferLimit)
Decompress a byte array into a Unicode character array. This function will either completely fill the output buffer, or consume the entire input.
Parameters:
byteBuffer - The byte buffer to decompress.
byteBufferStart - The index of the first byte of the run to decompress.
byteBufferLimit - The index one past the last byte of the run to decompress.
bytesRead - A one-element array. If not null, on return its first element holds the number of bytes read from byteBuffer.
charBuffer - A buffer to receive the decompressed data. This buffer must be at minimum two characters in size.
charBufferStart - The starting offset to which to write decompressed data.
charBufferLimit - The limiting offset for writing decompressed data.
Returns:
The number of Unicode characters written to charBuffer.
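
The "fill the output buffer or consume the entire input" contract is what makes a simple draining loop terminate. The sketch below shows that calling pattern against a stand-in, not against ICU itself: decodeChunk is a hypothetical method with the same parameter shape that copies each byte to one char, and drain, ChunkedLoopSketch, and the StringBuilder sink are likewise assumptions made for illustration.

```java
import java.util.Arrays;

final class ChunkedLoopSketch {
    // Hypothetical stand-in for the instance decompress method. It is NOT an
    // SCSU decoder: it copies each byte to one char, and stops when either the
    // input run is consumed or the output range is full.
    static int decodeChunk(byte[] byteBuffer, int byteStart, int byteLimit,
                           int[] bytesRead,
                           char[] charBuffer, int charStart, int charLimit) {
        int bytePos = byteStart;
        int charPos = charStart;
        while (bytePos < byteLimit && charPos < charLimit) {
            charBuffer[charPos++] = (char) (byteBuffer[bytePos++] & 0xFF);
        }
        if (bytesRead != null) {
            bytesRead[0] = bytePos - byteStart; // bytes consumed by this call
        }
        return charPos - charStart;             // chars produced by this call
    }

    // Drains bytes[0..len) through a fixed-size buffer; returns the call count.
    static int drain(byte[] bytes, int len, StringBuilder out, int bufSize) {
        char[] charBuffer = new char[bufSize];
        int[] bytesRead = new int[1];
        int totalBytes = 0;
        int calls = 0;
        do {
            // Resume at the first unread byte; the limit stays at len.
            int charsWritten = decodeChunk(bytes, totalBytes, len, bytesRead,
                                           charBuffer, 0, bufSize);
            out.append(charBuffer, 0, charsWritten);
            totalBytes += bytesRead[0];
            calls++;
        } while (totalBytes < len);
        return calls;
    }

    public static void main(String[] args) {
        byte[] input = new byte[1000];
        Arrays.fill(input, (byte) 'a');
        StringBuilder out = new StringBuilder();
        int calls = drain(input, input.length, out, 512);
        System.out.println(calls + " calls, " + out.length() + " chars"); // prints "2 calls, 1000 chars"
    }
}
```

Note that the start index advances by bytesRead[0] on each pass while the limit stays fixed, so the limit is an absolute index into the byte array, not a remaining length.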

reset

public void reset()
Reset the decompressor to its initial state.

Copyright (c) 2006 IBM Corporation and others.