Cryptix logo

Implementing LOKI'91 in Java

(Cryptix Application Note #3 --July 1997)

 Table of Contents

Return to the Application Notes Index or the FAQ or the Overview


Overview

In this note, I discuss the Java implementation of LOKI'91 --an Australian symmetric block cipher algorithm. The code is ported from a 'C' version written by the principal designers of the algorithm, Matthew Kwan and Lawrence Brown. It can be obtained from the ftp.replay.com archives.

The objectives of this implementation project can be summarised in these simple points:

 
Coding to the I/JCE API

Coding a cipher algorithm to comply with the I/JCE involves the following:

The engineBlockSize() method

This method should return the size in bytes of the cipher's block. LOKI'91 being a 64-bit block cipher, we define a constant we call BLOCK_SIZE set to 8 which we return when this method is invoked.

public int engineBlockSize () {
    return BLOCK_SIZE;
}

The engineInitEncrypt() and engineInitDecrypt() methods

These methods respectively generate an encryption or decryption session-key from a user-supplied cryptographic data in a java.security.Key object. They also set the state of the cipher instance to ENCRYPT or DECRYPT respectively.

In LOKI'91, both encryption and decryption use the same session-key. What differs is the order of the cryptographic formulae applied to the data using this key. Our implementation centralises this code in a makeKey() method:

public void engineInitEncrypt (Key key)
throws KeyException {
 
    makeKey(key);
    state = ENCRYPT;
}
 
public void engineInitDecrypt (Key key)
throws KeyException {
 
    makeKey(key);
    state = DECRYPT;
}

The engineUpdate() method

This method is the heart of the IJCE API. It allows a caller to input a byte subarray and obtain the result in an output subarray.

The JCE specifications indicate that the caller may use the same or overlapping subarray locations for the input and output arguments. How the cipher knows that it's encryption or decryption we need to carry on is function of the cipher instance's state value.

Below is the code we use for the Loki implementation. It is also basically the same as what we use for all the other symmetric ciphers in the Cryptix library. The method returns the number of bytes effectively en/de-crypted, the formal parameters have the following meaning:

in

the input data.

inOff

the offset into in specifying where the data starts.

inLen

the length of the subarray. It's the responsibility of the caller to ensure that inLen = n * 8 bytes.

out

the output array.

outOff

the offset indicating where to start writing into the out array.

public int engineUpdate (byte[] in, int inOff, int inLen,
                         byte[] out, int outOff) {
 
    int blockCount = inLen / BLOCK_SIZE;
    byte[] outs = new byte[blockCount * BLOCK_SIZE];
    int outsOff = 0;
 
    if (state == Cipher.ENCRYPT)
        for (int i = 0; i < blockCount; i++) {
            lokiEncrypt(in, inOff, outs outsOff);
            inOff += BLOCK_SIZE;
            outsOff += BLOCK_SIZE;
        }
    else
        for (int i = 0; i < blockCount; i++) {
            lokiDecrypt(in, inOff, outs, outsOff);
            inOff += BLOCK_SIZE;
            outsOff += BLOCK_SIZE;
        }
    System.arraycopy(outs, 0, out, outOff, outs.length);
    return outs.length;
}


What about algorithm specific features?

Loki has four weak keys and 12 semi-weak keys that if used will reduce the strength of the cipher against differential cryptanalysis. It is then desireable to disallow the use of such keys, or for the sake of compatibility with other implementations, allow them if explicitely instructed to do so.

To offer the user such control over the choice of using or not weak and semi-weak keys, I resorted to the use of an algorithm-specific property. This property, referred to as USE_WEAK_KEYS, is to be set in the Security Provider properties file. If present it can be true or false, with true meaning allow the use of weak and semi-weak keys, and false meaning do not allow such use.

The line in the properties file looks as follows:

Alg.USE_WEAK_KEYS.LOKI91=true

In the code a corresponding private boolean variable named usingWeakKeys is used to reflect the value of this property. Here is now the complete code of the LOKI91 constructor:

public LOKI91 () {
    super(false, false, "Cryptix");
    try {
        String ps = Security.getAlgorithmProperty("LOKI91",
            "USE_WEAK_KEYS");
        usingWeakKeys = Boolean.valueOf(ps).booleanValue();
    } catch (Exception e) {
        usingWeakKeys = false;
    }
}

To allow more flexibility for the user of the implementation, some accessors to this variable are defined:

/**
 *    Returns the current value in use for the USE_WEAK_KEYS property.
 */
public boolean isUsingWeakKeys () {
    return usingWeakKeys;
}
 
/**
 *    Sets the current value for this instance's USE_WEAK_KEYS property.
 *
 *    @param    b    the new value for the property.
 */
public void setUsingWeakKeys (boolean b) {
    usingWeakKeys = b;
}


Does it work?

Now that we coded the algorithm we need to make sure that it does what it's supposed to do; otherwise we might discover that we have inadvertently invented a new cipher ;-)

All respectable cryptographic algorithms come with a set of test data, detailing some input vectors, and the respective output given specific conditions (mode of operations, key data, initialisation vector, etc...). LOKI'91 is the same. Included in the source code distribution is a set of test data: a single vector and a set of 'triplets.'

The test class source code is available in TestLOKI91.java. Here are the top lines from the class output.

Test vector (single):
Encrypting:
  computed: C86CAEC1E3B7B17E
 certified: C86CAEC1E3B7B17E
 *** GOOD
 
Decrypting:
  computed: 126898D55E911500
 certified: 126898D55E911500
 *** GOOD
 
Test vectors (triplets):
Key              Plain-text       Cipher-text      Enc.   Dec.
................ ................ ................ ...... ......
0000000000000000 0000000000000000 BD84A2085EF609C7 GOOD   GOOD
0000000000000000 BD84A2085EF609C7 0000000000000000 GOOD   GOOD
FFFFFFFFFFFFFFFF 0000000000000000 5C77E002D1991C4D GOOD   GOOD
FFFFFFFFFFFFFFFF 5C77E002D1991C4D 0000000000000000 GOOD   GOOD
...

If now we set the USE_WEAK_KEY property to false, or not define it at all, the test output for the triplets will show the following:

...
Test vectors (triplets):
Key              Plain-text       Cipher-text      Enc.   Dec.
................ ................ ................ ...... ......
0000000000000000 *** java.security.KeyException: User key is a LOKI91 [Semi-]Weak Key
0000000000000000 *** java.security.KeyException: User key is a LOKI91 [Semi-]Weak Key
FFFFFFFFFFFFFFFF *** java.security.KeyException: User key is a LOKI91 [Semi-]Weak Key
...
 

The complete source code for the first take of the implementation is given in LOKI91_1.java.


Can we make it better?

So far, so good. But can we make it better? Better than what you might ask. Well, since we have all the other cipher implementations, let's first have a look at how this version compares to the other symmetric block ciphers. For this we'll use the TestBCs class used in Application Note #2.

The speed test results for the first cut of LOKI'91 are as follows, when run on a Pentium 133MHz. with the JDK 1.1.3

Table-1: Encrypting 1MB speed comparison (slow LOKI91)

Without JIT compiler

With JIT compiler

Algorithm

Time
(millis.)

Rate
(Kbits/sec)

Time
(millis.)

Rate
(Kbits/sec)

Blowfish

10,820

757

1,710

4,790

LOKI91 (slow)

1,532,590

5

101,940

80

RC2

19,550

419

2,250

3,640

CAST5

13,020

629

2,470

3,316

IDEA

21,200

385

3,190

2,568

SAFER

27,350

299

3,380

2,445

So what does this mean? Simply put, that we have a perfectly correct but unusable implementation of the LOKI'91 cipher algorithm!

To make it faster let's have a closer look at the encryption method:

private void
lokiEncrypt (byte[] in, int off, byte[] out, int outOff) {
    ...
    for (int i = 0; i < ROUNDS; ) {
        a = R ^ sKey[i++];
        b = s(a)                       |
            s(a >>>  8)          <<  8 |
            s(a >>> 16)          << 16 |
            s(a >>> 24 | a << 8) << 24;
        a = 0;
        for (int j = 0; j < 32; j++)
            a |= ((b >>> P[j]) & 0x1) << (31 - j);
        L ^= a;
 
        a = L ^ sKey[i++];
        b = s(a)                       |
            s(a >>>  8)          <<  8 |
            s(a >>> 16)          << 16 |
            s(a >>> 24 | a << 8) << 24;
        a = 0;
        for (int j = 0; j < 32; j++)
            a |= ((b >>> P[j]) & 0x1) << (31 - j);
        R ^= a;
    }
    ...
}

For each half-block (32-bit), the algorithm invokes the method s four times to generate a new 32-bit block that is then permutated before being XORed with a value. The s method itself is:

private static int s (int i) {
    i &= 0xFFF;
    int r = ((i >>> 8) & 0xC) | (i & 0x3);
    int c =  (i >>> 2) & 0xFF;
    int t = (c + ((r * 17) ^ 0xFF)) & 0xFF;
    return exp8(t, fExp[r], fGen[r]) & 0xFF;
}

As we can see the input to s is a 12-bit value and the output is a byte (8-bit). If then we can pre-compute all possible values of s, we can then use an array of bytes with the input as its index. This will replace the four method invocations in each 32-bit block to four accesses of a byte array. The source code in LOKI91_2.java does just that. Here are the code fragment and the results of the test using this new version:

private void
lokiEncrypt (byte[] in, int off, byte[] out, int outOff) {
    ...
    for (int i = 0; i < ROUNDS; ) {
        a = R ^ sKey[i++];
        b =  S[ a                  & 0xFFF] & 0xFF |
            (S[(a >>>  8)          & 0xFFF] & 0xFF) <<  8 |
            (S[(a >>> 16)          & 0xFFF] & 0xFF) << 16 |
            (S[(a >>> 24 | a << 8) & 0xFFF] & 0xFF) << 24;
        a = 0;
        for (int j = 0; j < 32; j++)
            a |= ((b >>> P[j]) & 0x1) << (31 - j);
        L ^= a;
 
        a = L ^ sKey[i++];
        b =  S[ a                  & 0xFFF] & 0xFF |
            (S[(a >>>  8)          & 0xFFF] & 0xFF) <<  8 |
            (S[(a >>> 16)          & 0xFFF] & 0xFF) << 16 |
            (S[(a >>> 24 | a << 8) & 0xFFF] & 0xFF) << 24;
        a = 0;
        for (int j = 0; j < 32; j++)
            a |= ((b >>> P[j]) & 0x1) << (31 - j);
        R ^= a;
    }
    ...
}

Table-2: Encrypting 1MB speed comparison (medium LOKI91)

Without JIT compiler

With JIT compiler

Algorithm

Time
(millis.)

Rate
(Kbits/sec)

Time
(millis.)

Rate
(Kbits/sec)

Blowfish

10,820

757

1,710

4,790

LOKI91 (medium)

92,600

88

12,350

663

RC2

19,550

419

2,250

3,640

CAST5

13,020

629

2,470

3,316

IDEA

21,200

385

3,190

2,568

SAFER

27,350

299

3,380

2,445

That's much better, but still no cigar! Looking again to the code above we're tempted to use the same technique with the Permutation arrays. For this we used four arrays of ints, giving the result of the permutation for a byte depending on its position in the 32-bit chunk. The source code in LOKI91_3.java does just that. Here are the code fragment and the results of the test using this new version:

private void
lokiEncrypt (byte[] in, int off, byte[] out, int outOff) {
    ...
    for (int i = 0; i < ROUNDS; ) {
        a = R ^ sKey[i++];
        L ^= P0[S[ a                  & 0xFFF] & 0xFF] |
             P1[S[(a >>>  8)          & 0xFFF] & 0xFF] |
             P2[S[(a >>> 16)          & 0xFFF] & 0xFF] |
             P3[S[(a >>> 24 | a << 8) & 0xFFF] & 0xFF];
 
        a = L ^ sKey[i++];
        R ^= P0[S[ a                  & 0xFFF] & 0xFF] |
             P1[S[(a >>>  8)          & 0xFFF] & 0xFF] |
             P2[S[(a >>> 16)          & 0xFFF] & 0xFF] |
             P3[S[(a >>> 24 | a << 8) & 0xFFF] & 0xFF];
    }
    ...
}

Table-3: Encrypting 1MB speed comparison (fast LOKI91)

Without JIT compiler

With JIT compiler

Algorithm

Time
(millis.)

Rate
(Kbits/sec)

Time
(millis.)

Rate
(Kbits/sec)

Blowfish

10,820

757

1,710

4,790

LOKI91 (fast)

14,440

567

2,190

3,740

RC2

19,550

419

2,250

3,640

CAST5

13,020

629

2,470

3,316

IDEA

21,200

385

3,190

2,568

SAFER

27,350

299

3,380

2,445

We're almost there ;-) Time-wise, now we have a workable implementation, but what about Space? Looking at the size of the LOKI91.class file with the latest version you'll see that it grew to 45KB. That doesn't cut it

But why 45KB? Our S-Box occupies 4,096 bytes and the four Pi permutation arrays occupy each 256 ints or also a total of 4KB. Yes but that's only the run-time space, we shouldn't forget the source code itself which lists all the values of these variables. Clearly, compiling a java source into it's class code doesn't reduce the size to what it's going to be at run-time.

Looking at the values of the Pi permutation arrays, you'll notice that the permutation operation

As a consequence, we don't need to pre-compute all four groups of permutations. Only the first is enough! This translates into a run-time space savings of at least 3 * 256 * ints or 3KB. The code for the encrypt routine now looks as follows:

private void
lokiEncrypt (byte[] in, int off, byte[] out, int outOff) {
    ...
    for (int i = 0; i < ROUNDS; ) {
        a = R ^ sKey[i++];
        L ^= P[S[ a                  & 0xFFF] & 0xFF] |
             P[S[(a >>>  8)          & 0xFFF] & 0xFF] << 1 |
             P[S[(a >>> 16)          & 0xFFF] & 0xFF] << 2 | 
             P[S[(a >>> 24 | a << 8) & 0xFFF] & 0xFF] << 3;
 
        a = L ^ sKey[i++];
        R ^= P[S[ a                  & 0xFFF] & 0xFF] |
             P[S[(a >>>  8)          & 0xFFF] & 0xFF] << 1 |
             P[S[(a >>> 16)          & 0xFFF] & 0xFF] << 2 | 
             P[S[(a >>> 24 | a << 8) & 0xFFF] & 0xFF] << 3;
    }
    ...
}

Finally, to really reduce the size with a negligeable deterioration in speed, we generate both the S and P arrays values at class instantiation. For this we used a java 1.1 new feature known as: Blank Finals. A java Blank Final is a final field that doesn't have an initial value when it's first defined. Blank Finals though must have a value assigned to them before they are ever used. Once a value has been assigned to such a field, it can never be changed.

private static final byte[] S = new byte[0x1000];
private static final int[] P = new int[256];
 
static {
 
    int[] sGen = {
        375, 379, 391, 395, 397, 415, 419, 425,
        433, 445, 451, 463, 471, 477, 487, 499};
    int r;
    int c;
    int t;
 
    for (int i = 0; i < 0x1000; i++) {
        r = ((i >>> 8) & 0x0C) | (i & 0x3);
        c =  (i >>> 2) & 0xFF;
        t = (c + ((r * 17) ^ 0xFF)) & 0xFF;
        S[i] = exp31(t, sGen[r]);
    }
 
    int[] Po = {
        31, 23, 15, 7, 30, 22, 14, 6,
        29, 21, 13, 5, 28, 20, 12, 4,
        27, 19, 11, 3, 26, 18, 10, 2,
        25, 17,  9, 1, 24, 16,  8, 0};
    int s;
 
    for (int i = 0; i < 0x100; i++) {
        s = 0;
        for (int j = 0; j < 32; j++)
            s |= ((i >>> Po[j]) & 0x1) << (31 - j);
        P[i] = s;
    }
}

The size of the class file now drops to just 5KB! The source code for this final version is given in LOKI91.java.


Cryptix
Raif S. Naffah
Chatswood, 14 July 1997.

Copyright © 1996-1997 Systemics Ltd
on behalf of the Cryptix Development Team.
All rights reserved.
Cryptix is a trademark of Systemics Ltd.