(Cryptix Application Note #3 --July 1997)
Table of Contents
Return to the Application Notes Index or the FAQ or the Overview
In this note, I discuss the Java implementation of LOKI'91 --an Australian symmetric block cipher algorithm. The code is ported from a 'C' version written by the principal designers of the algorithm, Matthew Kwan and Lawrence Brown. It can be obtained from the ftp.replay.com archives.
The objectives of this implementation project can be summarised in these simple points:
Coding a cipher algorithm to comply with the I/JCE involves the following:
import java.security.Cipher; import java.security.Key; import java.security.KeyException; public class LOKI91 extends Cipher ...
public int engineBlockSize () { ... } public void engineInitEncrypt (Key key) throws KeyException { ... } public void engineInitDecrypt (Key key) throws KeyException { ... } public int engineUpdate (byte[] in, int inOff, int inLen, byte[] out, int outOff) { ...
public LOKI91 () { super(false, false, "Cryptix"); ... }
The first argument in the super() invocation tells the superclass whether or not this implementation handles Buffering. The second tells it whether or not it handles Padding. The last argument is the Security Provider for this algorithm. For more information about what a Security Provider is, see the Java Cryptography Architecture document at Sun's java security page.
In our case, LOKI'91 being a symmetric cipher we chose the first one. Our class declaration is now:
import java.security.Cipher; import java.security.Key; import java.security.KeyException; import java.security.SymmetricCipher; public class LOKI91 extends Cipher implements SymmetricCipher { ... }
This method should return the size in bytes of the cipher's block. LOKI'91 being a 64-bit block cipher, we define a constant we call BLOCK_SIZE set to 8 which we return when this method is invoked.
public int engineBlockSize () { return BLOCK_SIZE; }
These methods respectively generate an encryption or decryption session-key from a user-supplied cryptographic data in a java.security.Key object. They also set the state of the cipher instance to ENCRYPT or DECRYPT respectively.
In LOKI'91, both encryption and decryption use the same session-key. What differs is the order of the cryptographic formulae applied to the data using this key. Our implementation centralises this code in a makeKey() method:
public void engineInitEncrypt (Key key) throws KeyException { makeKey(key); state = ENCRYPT; } public void engineInitDecrypt (Key key) throws KeyException { makeKey(key); state = DECRYPT; }
This method is the heart of the IJCE API. It allows a caller to input a byte subarray and obtain the result in an output subarray.
The JCE specifications indicate that the caller may use the same or overlapping subarray locations for the input and output arguments. How the cipher knows that it's encryption or decryption we need to carry on is function of the cipher instance's state value.
Below is the code we use for the Loki implementation. It is also basically the same as what we use for all the other symmetric ciphers in the Cryptix library. The method returns the number of bytes effectively en/de-crypted, the formal parameters have the following meaning:
in |
the input data. |
inOff |
the offset into in specifying where the data starts. |
inLen |
the length of the subarray. It's the responsibility of the caller to ensure that inLen = n * 8 bytes. |
out |
the output array. |
outOff |
the offset indicating where to start writing into the out array. |
public int engineUpdate (byte[] in, int inOff, int inLen, byte[] out, int outOff) { int blockCount = inLen / BLOCK_SIZE; byte[] outs = new byte[blockCount * BLOCK_SIZE]; int outsOff = 0; if (state == Cipher.ENCRYPT) for (int i = 0; i < blockCount; i++) { lokiEncrypt(in, inOff, outs outsOff); inOff += BLOCK_SIZE; outsOff += BLOCK_SIZE; } else for (int i = 0; i < blockCount; i++) { lokiDecrypt(in, inOff, outs, outsOff); inOff += BLOCK_SIZE; outsOff += BLOCK_SIZE; } System.arraycopy(outs, 0, out, outOff, outs.length); return outs.length; }
Loki has four weak keys and 12 semi-weak keys that if used will reduce the strength of the cipher against differential cryptanalysis. It is then desireable to disallow the use of such keys, or for the sake of compatibility with other implementations, allow them if explicitely instructed to do so.
To offer the user such control over the choice of using or not weak and semi-weak keys, I resorted to the use of an algorithm-specific property. This property, referred to as USE_WEAK_KEYS, is to be set in the Security Provider properties file. If present it can be true or false, with true meaning allow the use of weak and semi-weak keys, and false meaning do not allow such use.
The line in the properties file looks as follows:
Alg.USE_WEAK_KEYS.LOKI91=true
In the code a corresponding private boolean variable named usingWeakKeys is used to reflect the value of this property. Here is now the complete code of the LOKI91 constructor:
public LOKI91 () { super(false, false, "Cryptix"); try { String ps = Security.getAlgorithmProperty("LOKI91", "USE_WEAK_KEYS"); usingWeakKeys = Boolean.valueOf(ps).booleanValue(); } catch (Exception e) { usingWeakKeys = false; } }
To allow more flexibility for the user of the implementation, some accessors to this variable are defined:
/** * Returns the current value in use for the USE_WEAK_KEYS property. */ public boolean isUsingWeakKeys () { return usingWeakKeys; } /** * Sets the current value for this instance's USE_WEAK_KEYS property. * * @param b the new value for the property. */ public void setUsingWeakKeys (boolean b) { usingWeakKeys = b; }
Now that we coded the algorithm we need to make sure that it does what it's supposed to do; otherwise we might discover that we have inadvertently invented a new cipher ;-)
All respectable cryptographic algorithms come with a set of test data, detailing some input vectors, and the respective output given specific conditions (mode of operations, key data, initialisation vector, etc...). LOKI'91 is the same. Included in the source code distribution is a set of test data: a single vector and a set of 'triplets.'
The test class source code is available in TestLOKI91.java. Here are the top lines from the class output.
Test vector (single):
Encrypting:
computed: C86CAEC1E3B7B17E
certified: C86CAEC1E3B7B17E
*** GOOD
Decrypting:
computed: 126898D55E911500
certified: 126898D55E911500
*** GOOD
Test vectors (triplets):
Key Plain-text Cipher-text Enc. Dec.
................ ................ ................ ...... ......
0000000000000000 0000000000000000 BD84A2085EF609C7 GOOD GOOD
0000000000000000 BD84A2085EF609C7 0000000000000000 GOOD GOOD
FFFFFFFFFFFFFFFF 0000000000000000 5C77E002D1991C4D GOOD GOOD
FFFFFFFFFFFFFFFF 5C77E002D1991C4D 0000000000000000 GOOD GOOD
...
If now we set the USE_WEAK_KEY property to false, or not define it at all, the test output for the triplets will show the following:
... Test vectors (triplets): Key Plain-text Cipher-text Enc. Dec. ................ ................ ................ ...... ...... 0000000000000000 *** java.security.KeyException: User key is a LOKI91 [Semi-]Weak Key 0000000000000000 *** java.security.KeyException: User key is a LOKI91 [Semi-]Weak Key FFFFFFFFFFFFFFFF *** java.security.KeyException: User key is a LOKI91 [Semi-]Weak Key ...
The complete source code for the first take of the implementation is given in LOKI91_1.java.
So far, so good. But can we make it better? Better than what you might ask. Well, since we have all the other cipher implementations, let's first have a look at how this version compares to the other symmetric block ciphers. For this we'll use the TestBCs class used in Application Note #2.
The speed test results for the first cut of LOKI'91 are as follows, when run on a Pentium 133MHz. with the JDK 1.1.3
|
Without JIT compiler |
With JIT compiler | ||
---|---|---|---|---|
Algorithm |
Time |
Rate |
Time |
Rate |
Blowfish |
10,820 |
757 |
1,710 |
4,790 |
LOKI91 (slow) |
1,532,590 |
5 |
101,940 |
80 |
RC2 |
19,550 |
419 |
2,250 |
3,640 |
CAST5 |
13,020 |
629 |
2,470 |
3,316 |
IDEA |
21,200 |
385 |
3,190 |
2,568 |
SAFER |
27,350 |
299 |
3,380 |
2,445 |
So what does this mean? Simply put, that we have a perfectly correct but unusable implementation of the LOKI'91 cipher algorithm!
To make it faster let's have a closer look at the encryption method:
private void lokiEncrypt (byte[] in, int off, byte[] out, int outOff) { ... for (int i = 0; i < ROUNDS; ) { a = R ^ sKey[i++]; b = s(a) | s(a >>> 8) << 8 | s(a >>> 16) << 16 | s(a >>> 24 | a << 8) << 24; a = 0; for (int j = 0; j < 32; j++) a |= ((b >>> P[j]) & 0x1) << (31 - j); L ^= a; a = L ^ sKey[i++]; b = s(a) | s(a >>> 8) << 8 | s(a >>> 16) << 16 | s(a >>> 24 | a << 8) << 24; a = 0; for (int j = 0; j < 32; j++) a |= ((b >>> P[j]) & 0x1) << (31 - j); R ^= a; } ... }
For each half-block (32-bit), the algorithm invokes the method s four times to generate a new 32-bit block that is then permutated before being XORed with a value. The s method itself is:
private static int s (int i) { i &= 0xFFF; int r = ((i >>> 8) & 0xC) | (i & 0x3); int c = (i >>> 2) & 0xFF; int t = (c + ((r * 17) ^ 0xFF)) & 0xFF; return exp8(t, fExp[r], fGen[r]) & 0xFF; }
As we can see the input to s is a 12-bit value and the output is a byte (8-bit). If then we can pre-compute all possible values of s, we can then use an array of bytes with the input as its index. This will replace the four method invocations in each 32-bit block to four accesses of a byte array. The source code in LOKI91_2.java does just that. Here are the code fragment and the results of the test using this new version:
private void lokiEncrypt (byte[] in, int off, byte[] out, int outOff) { ... for (int i = 0; i < ROUNDS; ) { a = R ^ sKey[i++]; b = S[ a & 0xFFF] & 0xFF | (S[(a >>> 8) & 0xFFF] & 0xFF) << 8 | (S[(a >>> 16) & 0xFFF] & 0xFF) << 16 | (S[(a >>> 24 | a << 8) & 0xFFF] & 0xFF) << 24; a = 0; for (int j = 0; j < 32; j++) a |= ((b >>> P[j]) & 0x1) << (31 - j); L ^= a; a = L ^ sKey[i++]; b = S[ a & 0xFFF] & 0xFF | (S[(a >>> 8) & 0xFFF] & 0xFF) << 8 | (S[(a >>> 16) & 0xFFF] & 0xFF) << 16 | (S[(a >>> 24 | a << 8) & 0xFFF] & 0xFF) << 24; a = 0; for (int j = 0; j < 32; j++) a |= ((b >>> P[j]) & 0x1) << (31 - j); R ^= a; } ... }
|
Without JIT compiler |
With JIT compiler | ||
---|---|---|---|---|
Algorithm |
Time |
Rate |
Time |
Rate |
Blowfish |
10,820 |
757 |
1,710 |
4,790 |
LOKI91 (medium) |
92,600 |
88 |
12,350 |
663 |
RC2 |
19,550 |
419 |
2,250 |
3,640 |
CAST5 |
13,020 |
629 |
2,470 |
3,316 |
IDEA |
21,200 |
385 |
3,190 |
2,568 |
SAFER |
27,350 |
299 |
3,380 |
2,445 |
That's much better, but still no cigar! Looking again to the code above we're tempted to use the same technique with the Permutation arrays. For this we used four arrays of ints, giving the result of the permutation for a byte depending on its position in the 32-bit chunk. The source code in LOKI91_3.java does just that. Here are the code fragment and the results of the test using this new version:
private void lokiEncrypt (byte[] in, int off, byte[] out, int outOff) { ... for (int i = 0; i < ROUNDS; ) { a = R ^ sKey[i++]; L ^= P0[S[ a & 0xFFF] & 0xFF] | P1[S[(a >>> 8) & 0xFFF] & 0xFF] | P2[S[(a >>> 16) & 0xFFF] & 0xFF] | P3[S[(a >>> 24 | a << 8) & 0xFFF] & 0xFF]; a = L ^ sKey[i++]; R ^= P0[S[ a & 0xFFF] & 0xFF] | P1[S[(a >>> 8) & 0xFFF] & 0xFF] | P2[S[(a >>> 16) & 0xFFF] & 0xFF] | P3[S[(a >>> 24 | a << 8) & 0xFFF] & 0xFF]; } ... }
|
Without JIT compiler |
With JIT compiler | ||
---|---|---|---|---|
Algorithm |
Time |
Rate |
Time |
Rate |
Blowfish |
10,820 |
757 |
1,710 |
4,790 |
LOKI91 (fast) |
14,440 |
567 |
2,190 |
3,740 |
RC2 |
19,550 |
419 |
2,250 |
3,640 |
CAST5 |
13,020 |
629 |
2,470 |
3,316 |
IDEA |
21,200 |
385 |
3,190 |
2,568 |
SAFER |
27,350 |
299 |
3,380 |
2,445 |
We're almost there ;-) Time-wise, now we have a workable implementation, but what about Space? Looking at the size of the LOKI91.class file with the latest version you'll see that it grew to 45KB. That doesn't cut it
But why 45KB? Our S-Box occupies 4,096 bytes and the four Pi permutation arrays occupy each 256 ints or also a total of 4KB. Yes but that's only the run-time space, we shouldn't forget the source code itself which lists all the values of these variables. Clearly, compiling a java source into it's class code doesn't reduce the size to what it's going to be at run-time.
Looking at the values of the Pi permutation arrays, you'll notice that the permutation operation
As a consequence, we don't need to pre-compute all four groups of permutations. Only the first is enough! This translates into a run-time space savings of at least 3 * 256 * ints or 3KB. The code for the encrypt routine now looks as follows:
private void lokiEncrypt (byte[] in, int off, byte[] out, int outOff) { ... for (int i = 0; i < ROUNDS; ) { a = R ^ sKey[i++]; L ^= P[S[ a & 0xFFF] & 0xFF] | P[S[(a >>> 8) & 0xFFF] & 0xFF] << 1 | P[S[(a >>> 16) & 0xFFF] & 0xFF] << 2 | P[S[(a >>> 24 | a << 8) & 0xFFF] & 0xFF] << 3; a = L ^ sKey[i++]; R ^= P[S[ a & 0xFFF] & 0xFF] | P[S[(a >>> 8) & 0xFFF] & 0xFF] << 1 | P[S[(a >>> 16) & 0xFFF] & 0xFF] << 2 | P[S[(a >>> 24 | a << 8) & 0xFFF] & 0xFF] << 3; } ... }
Finally, to really reduce the size with a negligeable deterioration in speed, we generate both the S and P arrays values at class instantiation. For this we used a java 1.1 new feature known as: Blank Finals. A java Blank Final is a final field that doesn't have an initial value when it's first defined. Blank Finals though must have a value assigned to them before they are ever used. Once a value has been assigned to such a field, it can never be changed.
private static final byte[] S = new byte[0x1000]; private static final int[] P = new int[256]; static { int[] sGen = { 375, 379, 391, 395, 397, 415, 419, 425, 433, 445, 451, 463, 471, 477, 487, 499}; int r; int c; int t; for (int i = 0; i < 0x1000; i++) { r = ((i >>> 8) & 0x0C) | (i & 0x3); c = (i >>> 2) & 0xFF; t = (c + ((r * 17) ^ 0xFF)) & 0xFF; S[i] = exp31(t, sGen[r]); } int[] Po = { 31, 23, 15, 7, 30, 22, 14, 6, 29, 21, 13, 5, 28, 20, 12, 4, 27, 19, 11, 3, 26, 18, 10, 2, 25, 17, 9, 1, 24, 16, 8, 0}; int s; for (int i = 0; i < 0x100; i++) { s = 0; for (int j = 0; j < 32; j++) s |= ((i >>> Po[j]) & 0x1) << (31 - j); P[i] = s; } }
The size of the class file now drops to just 5KB! The source code for this final version is given in LOKI91.java.
Copyright © 1996-1997 Systemics Ltd
on behalf of the Cryptix Development Team.
All rights reserved.
Cryptix is a trademark of Systemics Ltd.