it.unimi.dsi.fastutil.bytes
Class ByteArrayFrontCodedList

java.lang.Object
  extended byit.unimi.dsi.fastutil.objects.AbstractObjectCollection
      extended byit.unimi.dsi.fastutil.objects.AbstractObjectList
          extended byit.unimi.dsi.fastutil.bytes.ByteArrayFrontCodedList
All Implemented Interfaces:
Cloneable, Collection, Comparable, List, ObjectCollection, ObjectList, Serializable, Stack

public class ByteArrayFrontCodedList
extends AbstractObjectList
implements Serializable, Cloneable

Compact storage of lists of arrays using front coding.

This class stores immutably a list of arrays in a single large array using front coding (of course, the compression will be reasonable only if the list is sorted lexicographically—see below). It implements an immutable type-specific list that returns the i-th array when calling get(i). The returned array may be freely modified.

Front coding is based on the idea that if the i-th and the (i+1)-th array have a common prefix, we might store the length of the common prefix, and then the rest of the second array.

This approach, of course, requires that once in a while an array is stored entirely. The ratio of a front-coded list defines how often this happens (once every ratio() arrays). A higher ratio means more compression, but means also a longer access time, as more arrays have to be probed to build the result. Note that we must build an array every time get(int) is called, but this class provides also methods that extract one of the stored arrays in a given array, reducing garbage collection. See the documentation of the family of get() methods.

By setting the ratio to 1 we actually disable front coding: however, we still have a data structure storing large list of arrays with a reduced overhead (just one integer per array, plus the space required for lengths).

Note that the typical usage of front-coded lists is under the form of serialized objects; usually, the data that has to be compacted is processed offline, and the resulting structure is stored permanently. Since the pointer array is not stored, the serialized format is very small.

Implementation Details

All arrays are stored in a large array. A separate array of pointers indexes arrays whose position is a multiple of the ratio: thus, a higher ratio means also less pointers.

More in detail, an array whose position is a multiple of the ratio is stored as the array length, followed by the elements of the array. The array length is coded by a simple variable-length list of k-1 bit blocks, where k is the number of bits of the underlying primitive type. All other arrays are stored as follows: let common the length of the maximum common prefix between the array and its predecessor. Then we store the array length decremented by common, followed by common, followed by the array elements whose index is greater than or equal to common. For instance, if we store foo, foobar, football and fool in a front-coded character-array list with ratio 3, the character array will contain

 3 f o o 3 3 b a r 5 3 t b a l l 4 f o o l 
 

Limitations

All arrays are stored in a large array: thus, the compressed list must not exceed Integer.MAX_VALUE elements. Moreover, iterators are less efficient when they move back, as previous() cannot build incrementally the previous array (whereas (next() can).

See Also:
Serialized Form

Field Summary
static long serialVersionUID
           
 
Constructor Summary
ByteArrayFrontCodedList(Collection c, int ratio)
          Creates a new front-coded list containing the arrays in the given collection.
ByteArrayFrontCodedList(Iterator arrays, int ratio)
          Creates a new front-coded list containing the arrays returned by the given iterator.
 
Method Summary
 int arrayLength(int index)
          Computes the length of the array at the given index.
 Object clone()
          Returns a copy of this list.
 Object get(int index)
           
 int get(int index, byte[] a)
          Stores in the given array an array stored in this front-coded list.
 int get(int index, byte[] a, int offset, int length)
          Stores in the given array elements from an array stored in this front-coded list.
 byte[] getArray(int index)
           
 ObjectListIterator objectListIterator(int start)
          Returns a type-specific list iterator on the list starting at a given index.
 int ratio()
          Returns the ratio of this list.
 int size()
           
 String toString()
           
 
Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectList
add, addAll, addAll, addAll, addAll, addAll, addAll, addElements, addElements, compareTo, contains, equals, getElements, hashCode, indexOf, lastIndexOf, listIterator, listIterator, objectIterator, objectListIterator, objectSubList, peek, pop, push, rem, remove, removeElements, set, size, subList, top
 
Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectCollection
add, clear, containsAll, isEmpty, iterator, remove, removeAll, retainAll, toArray, toArray
 
Methods inherited from class java.lang.Object
getClass, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface java.util.List
add, clear, containsAll, isEmpty, iterator, remove, removeAll, retainAll, toArray, toArray
 
Methods inherited from interface it.unimi.dsi.fastutil.Stack
isEmpty
 

Field Detail

serialVersionUID

public static final long serialVersionUID
See Also:
Constant Field Values
Constructor Detail

ByteArrayFrontCodedList

public ByteArrayFrontCodedList(Iterator arrays,
                               int ratio)
Creates a new front-coded list containing the arrays returned by the given iterator.

Parameters:
arrays - an iterator returning arrays.
ratio - the desired ratio.

ByteArrayFrontCodedList

public ByteArrayFrontCodedList(Collection c,
                               int ratio)
Creates a new front-coded list containing the arrays in the given collection.

Parameters:
c - a collection containing arrays.
ratio - the desired ratio.
Method Detail

ratio

public int ratio()
Returns the ratio of this list.

Returns:
the ratio of this list.

arrayLength

public int arrayLength(int index)
Computes the length of the array at the given index.

Parameters:
index - an index.
Returns:
the length of the index-th array.

get

public Object get(int index)
Specified by:
get in interface List

getArray

public byte[] getArray(int index)
See Also:
get(int)

get

public int get(int index,
               byte[] a,
               int offset,
               int length)
Stores in the given array elements from an array stored in this front-coded list.

Parameters:
index - an index.
a - the array that will store the result.
offset - an offset into a where elements will be store.
length - a maximum number of elements to store in a.
Returns:
if a can hold the extracted elements, the number of extracted elements; otherwise, the number of remaining elements with the sign changed.

get

public int get(int index,
               byte[] a)
Stores in the given array an array stored in this front-coded list.

Parameters:
index - an index.
a - the array that will store the content of the result (we assume that it can hold the result).
Returns:
if a can hold the extracted elements, the number of extracted elements; otherwise, the number of remaining elements with the sign changed.

size

public int size()
Specified by:
size in interface List

objectListIterator

public ObjectListIterator objectListIterator(int start)
Description copied from interface: ObjectList
Returns a type-specific list iterator on the list starting at a given index.

The iterator returned by the List.listIterator() method and by this method are identical; however, using this method you can save a type casting.

Specified by:
objectListIterator in interface ObjectList
See Also:
List.listIterator(int)

clone

public Object clone()
Returns a copy of this list.

Returns:
a copy of this list.

toString

public String toString()
Overrides:
toString in class AbstractObjectList