Cryptography 101

(This is based upon a book chapter from the Handbook of Information Security Management, publishes by Auerbach Press.)

No security professional today can get away from it. The one thing many people dread is the complexities associated with cryptography. Upon hearing the word cryptography, we typically think of intensely complex mathematical problems. Linear algebra, elliptic curves, trigonometry and calculus still bring shivers to many people.

There are several sample programs used to illustrate the ciphers discussed in this article. These are written in PERL, although the PERL language is not discussed. There is also limited mathematics presented in this chapter. The PERL language source code for the example programs discussed in the article can be obtained from several websites as listed at the end of the article.

This article discusses what cryptography is, along with stream and block based ciphers. While it includes a brief discussion of symmetric and asymmetric cryptography, the primary area of discussion is around symmetric cryptography methods.

Cryptography Defined

Before entering into the discussion, the security professional should be familiar with a series of terms. Plaintext is the original message the sender wants to communicate and when disguised it is called ciphertext. Encryption itself is the process of manipulating and transforming the original message in such a way as to make it difficult, if not impossible, for someone without the appropriate information to determine the original message. Decryption then is the reverse, or the manipulation and transformation of the ciphertext to recover the plaintext.

The encryption and decryption process often use a key, which is a sequence used to perform the encryption. Cryptographers often refer to the key as the algorithm itself. Symmetric encryption systems use the same key to encrypt and decrypt the message, while asymmetric systems use different keys.

Cryptography itself is a diverse science, devoted to finding new ways of disguising or hiding the plaintext. Cryptographers attempt to define new algorithms or protocols to disguise the original methods. Cryptanalysis is the reverse of cryptography, as it is the science of breaking cryptographic algorithms. The better analyzed a cryptographic scheme through cryptanalysis, the more confidence the security community has in the cryptographic algorithm. The process and benefit of cryptanalysis is discussed later in this chapter.

The people who create cryptographic algorithms and ciphers are called cryptographers. Cryptanalysts are people who work to defeat the cipher created by the cryptographer. Cryptanalysts do not have the key associated with the cipher, and may not even know the algorithm in use. Finally, cryptology is the study of both cryptography and cryptanalysis.

Why do we need Cryptography?

Cryptography has been used in many applications, particularly military or within governments to prevent spies from other governments or agencies from determining what is actually occurring. Today, we use cryptography for similar reasons including the protection of passwords (Identification and Authentication), secure communications, secret sharing, electronic commerce, message certification, electronic mail, financial transactions, remote access and simply the protection of information we do not want others to see. Generally, the less complex or predominately used applications are enabled to use cryptography before others.

Identification and Authentication

Cryptography is often used to identify and authenticate the person sending the message. Identification consist of identifying the person sending the message, while authentication verifies the identity of sender. Authorization is neither of these, but the process of determining if a person or system is authorized to perform the requested function. We commonly see digital signatures used to identify and authentication a message or provide access to a service such as a bank account, information stored on a disk drive or television.

Secure Communication

This is the most common use for cryptography, in that it provides a method for two or more people to protect the contents of a message by preventing unauthorized people from seeing it.

Electronic Commerce

Many people make use of cryptography without even realizing it. Each time you access a secured web site to buy an article, you are using cryptography to protect the information exchange between the user, the browser and the merchant’s web server. A web browser and server exchange information to encrypt the information transmitted from the web browser to where it is decrypted on the server side of the connection.

Remote Access

Organizations use secured remote access as a method of allowing their employees or other authorized people access to their intranet or protected systems using cryptography. Secured remote access prevents the compromise of passwords and user names thorough a variety of methods, some with very high security.

Other Applications

Cryptography is used in places outside of the computer-processing world. We see cryptography used to prevent the theft of information to steal cellular phone usage, for example. Through cryptography we can develop complicated protection schemes for financial, medical and entertainment uses.

The Alphabet

Cryptography is not limited to simply working with the 26 letters of the alphabet. While many cryptographic algorithms have only concentrated on the alphabetic letters, leaving punctuation in the message can provide information increasing the chance of defeating the cryptographic algorithm used. Consequently, many ciphers remove the punctuation or it is not included in the original message.

However, the alphabet as defined in the American Standard Code for Information Interchange (ASCII) defines 94 printable characters. (There are in fact 127 characters in the ASCII character set, but not all of them are printable.) The alphabet we are working with consists of the following letters:

ASCII   TEXT    ASCII   TEXT    ASCII   TEXT
32              33      !       34      "
35      #       36      $       37      %
38      &       39      '       40      (
41      )       42      *       43      +
44      ,       45      -       46      .
47      /       48      0       49      1
50      2       51      3       52      4
53      5       54      6       55      7
56      8       57      9       58      :
59      ;       60      <       61      =
62      >       63      ?       64      @
65      A       66      B       67      C
68      D       69      E       70      F
71      G       72      H       73      I
74      J       75      K       76      L
77      M       78      N       79      O
80      P       81      Q       82      R
83      S       84      T       85      U
86      V       87      W       88      X
89      Y       90      Z       91      [
92      \       93      ]       94      ^
95      _       96      `       97      a
98      b       99      c       100     d
101     e       102     f       103     g
104     h       105     i       106     j
107     k       108     l       109     m
110     n       111     o       112     p
113     q       114     r       115     s
116     t       117     u       118     v
119     w       120     x       121     y
122     z       123     {       124     |
125     }       126     ~

Some cryptographic methods process only the alphabet, which is not a good way of doing things anymore. Consequently, most of the sample programs presented with this material process the entire range of printable ASCII characters. However, as ciphers like ROT-13 are intended to only deal with the alphabet letters, only these are encrypted in the sample program.

Introducing the Key

As mentioned earlier, the key is often considered the algorithm itself. The key however, is often an arbitrary or random value provided to alter the operation of the cryptographic algorithm. The larger the key, the better, as a larger key provides better protection against a brute force attack (trying all possible keys) than a shorter key.

The key, therefore, can be thought of as a component, which behaves much like a key does for a lock. If you have the right physical key, you can open the lock. The same is true with data. You lock your data with an encryption algorithm and open it with the corresponding key.

The intent is to provide a key space large enough to make brute force attacks impractical. However, as we have seen over the last few years, advances in technology often redefine the definition of impractical. The Data Encryption Standard, DES, has typically been considered a very strong algorithm. The DES keyspace (56 bit key) has approximately 72 quadrillion possible keys. A brute force attack on this keyspace was considered impractical due to the limitations on available computing capacity. However, changes in computing technology have allowed for computers to be clustered together or specific equipment developed to attack DES.

Two additional words are important when considering keyspace. These are entropy and unicity.

Entropy refers to the degree of randomness in cryptography. This means, the greater the degree of randomness, the higher the entropy. To illustrate, the English language has a low entropy. Consider the sequence "th" in English. The next possible character is likely one of "a", "e", "i", "o", or "u", with varying probability, depending on the context. The entropy of this sequence is low since there are five likely candidates following "th". So you can easily see that long passages of English text have an entropy far lower than one would expect for a random bit-sequence of equal length. For perfectly random data, the entropy, in bits, is exactly the length of the data, in bits.

The second is the term unicity. When a cryptographical system has a large keyspace, it is said to have a large unicity distance. Unicity is by modern standards, a largely theoretical measurement specifying the number of characters an unbounded attacker would need to obtain an unambiguous decryption of the ciphertext.

While the discussion to this point has used the term “key”, the remainder of this chapter uses the term cryptovariable for consistency with modern terminology.

With an understanding of keys and how they affect cryptographic systems and can make it both easier and harder for a cryptanalyst to retrieve the plaintext, we can now turn our discussion to some cryptographic ciphers.

The Substitution Ciphers

The substitution cipher is something many of us have used, even as kids. This involves the simple process of substituting one letter for another based upon a cryptovariable. Typically, substitution involves shifting positions in the alphabet a defined number of characters. Many old ciphers were based upon substitution, including the Caesar cipher and ROT-13.

To examine the substitution cipher, lets look at an example.

Assuming we are using the entire range of ASCII characters, our basic alphabet consists of

!"#$%&'()*+,./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

This is the entire printable ASCII alphabet, except for the space character, which is typically preserved in the substitution cipher. The cryptovariable defines the number of positions to “shift” in the alphabet. Using a cryptovariable of 5, we shift 5 positions to the right when encrypting, making the letter ‘A’ the letter ‘F’ after the substitution. If the new letter is greater than the last letter in the alphabet, the cipher wraps to the beginning of the alphabet. With a substitution cryptovariable of 5, the original and substitution alphabets look like:

Original Alphabet
!"#$%&'()*+,./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

Substitution Alphabet
&'()*+,./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~!"#$%

The program subst.pl found at the end of the chapter, illustrates the substitution cipher. When running the subst.pl program, it must know if you want to encrypt or decrypt, and the cryptovariable, or positions to shift when performing the substitution. For example, the plaintext

Auerbach Press

When encrypted using a cryptovariable of 33 looks like

$ subst.pl --encrypt --key 33
Shifting right by 33 positions
Auerbach Press
b8(5%$&+ q5(66
$

The program prints the number of positions shifted for informational purposes only. The recipient of the message must know the cryptovariable to decrypt the message. Without knowing the correct cryptovariable, the plaintext message cannot be recovered from the ciphertext:

$ echo "b8(5%$&+ q5(66" | subst.pl --decrypt --key 17
Shifting left by 17 positions
Q'u$rqsx `$u%%
$ echo "b8(5%$&+ q5(66" | subst.pl --decrypt --key 33
Shifting left by 33 positions
Auerbach Press
$

There are several well-known versions of the substitution cipher including the Caesar and ROT-13 ciphers.

The Caesar Cipher

It is rumored that Julius Caesar of the Roman Empire used a substitution cipher when communicating with his military or government department. This substitution cipher uses a rotation of 3 positions. This makes the letter ‘A’ the letter ‘D’ after shifting three characters to the right.

The reader can use either the substitution cipher illustrated earlier, or the caesar.pl script at the end of the chapter.

$ echo 'cryptography is fun!' | caesar.pl --encrypt
fu|swrjudsk| lv ixq$
$

This is a very simple cipher as it is not a permutation of the alphabet, but a simple rotation.

ROT-13

The ROT13 cipher was written to satisfy a requirement on the Internet Usenet news system to protect information from people. Using ROT-13 allowed people to conceal potentially offensive news postings with a warning message. Those people who wanted to access the article had to make the conscious effort to decrypt it. It is an excellent example of the benefits of sharing a common cryptovariable.

The ROT-13 cipher performs its substitution only on the alphabet letters A through Z and a through z. The cipher uses a fixed cryptovariable, which rotates 13 positions in the alphabet, making an ‘A’ an ‘N’, a ‘C’ an ‘O’, etc. For example, the text

Cryptography is fun

When processed through ROT-13 becomes

$ echo "cryptography is fun" | rot13.pl --encrypt
pelcgbtencul vf sha
$

With ROT-13, encrypting the ciphertext again restores the original plaintext:

$ echo "pelcgbtencul vf sha" | rot13.pl --encrypt
cryptography is fun
$

A secnd example program, rot13b.pl uses the Crypt:ROT13 PERL module and is included as an alternate method of using ROT-13.

These substitution ciphers are easily defeated because the cipher text does not hide the frequency distribution of the plaintext. This means a good cryptanalysis can defeat them with approximately 25 characters.

Security professionals do not depend upon substitution ciphers to protect their data. While ROT-13 is still used in Internet Usenet newsgroups, most modern ciphers deal with all printable characters and not just those which are found in the alphabet.

The Polyalphabetic Cipher

Leon Battista invented the Polyalphabetic substitution cipher in 1568. With this substitution cipher, there are multiple cryptovariables, with each cryptovariables being used to encrypt one letter in the message. Once the last cryptovariable is used, the first is used again. For example, the message

I have a headache

processed through a polyalphabetic cipher with the cryptovariables

abcdefghijklmnopqrst
ahcoeuejwnsklwqpm
gurtelakqmwpcjry
192h3j4l5auqmeugh

becomes a (i) h (h) r (a) h (v) e (e) u (a) a (h) l (e) i (a) n (d) w (a) q (c) m (h) w (e)

As you can see, there are several cases where the same letter in the ciphertext maps to a different letter in the plaintext. The appearance of the same ciphertext letter representing multiple plaintext letters makes the use of frequency analysis a little more difficult.

This cipher introduces the concept of a period. The period is the number of cycles it takes to reuse the same cryptovariable. Our example here has a period of four, meaning the first row of letters is used to encrypt every fourth letter. The sample program at the end of the chapter, poly.pl, has a period of five.

Relatively simple computer programs can easily defeat this cipher even when there is a large period in use.

The Transposition Cipher

There are several transposition ciphers. With a transposition, you are not typically altering the characters, although you can, but transposing them or changing their order to hide the message.

One form of transposition cipher breaks the plaintext into 5 character groups and then transposes the letters in a defined sequence. The program trans.pl illustrates this form of cipher.

$ ./trans.pl --encrypt
everything is coming up roses
acdeb eyver tghin  cis  ogmin  rup  o.ses
$

The trans.pl program generates the cryptovariable and saves it as the first five characters of the message. Alternatively, the cryptovariable could be eliminated from the message and communicated separately to the message recipient.

The text everything is coming up roses

is split into five character groups. The trans.pl program adds periods in the event the last group doesn’t contain five characters. So, the plaintext becomes

everything is coming up roses
every|thing|is co|ming| up r|oses

The | symbol indicates where the groups are, as the spaces are retained. The cryptovariable is then applied to change the ordering of the letters. The cryptovariable could be supplied by the sender or generated by the program as our illustration does. The cryptovariable is acdeb or 13452, meaning the first character of the plaintext is the first character of the ciphertext, the second plaintext character is now the fifth ciphertext character and so on. We can better see this through the use of a table.

Plain Text	Cryptovariable	Cipher Text
e	1	e
v	3	y
e	4	v
r	5	e
y	2	r

This doesn’t make the transposition cipher very strong. For instance, it is also susceptible to frequency analysis. However, if combined with a substitution cipher, there is great value in it. This means we can transpose the characters and then perform the substitution, or we can do the substitution first and then transpose the characters. DES does both transposition and substitution (16 rounds) on the data being encrypted.

The Columnar Transposition Cipher

Another example of a transposition cipher is a columnar transposition. With this cipher, the letters are arranged in a defined group and the first character of each line then forms the first group of characters in the cipher.

While it sounds fairly simple, it is a relatively good cipher and not very obvious. For example, assuming we are using the plain text

Now is the time for all good men to come to the aid of the party!

and a cryptovariable of 7, the columnar transposition cipher first splits the input stream in lines of seven characters, making our cipher look like

Now is
the tim
e for a
ll good
 men to
 come t
o the a
id of t
he part
y!

The resulting ciphertext takes the letter in each column to build a line of data. Reading the resulting text across the line left to right, you see the result looks like ciphertext. However, if you read the columns, you see the “ciphertext” still reads as the original plaintext:

Ntel  oihy
oh lmc de!
wef eot
  ognmhop
itro eefa
si ot   r
 madotatt

Consequently, the strength in the columnar transposition is in performing two transformations: one to the columnar format and then a character transposition. Our previous example, trans.pl, already performs the transposition. Since that program also works on five character groups, there is a higher level of abstraction in the encryption:

adcbe nlet   hioy
dcbae l hom ed c!
eadcb ee fw t.  o
adcbe  go n mpoh.
cabde trio  efea.
aedcb sto i  .r
edabc adom  tt.at

Now, the downfall in this example is the cryptovariable is the first part of the ciphertext, so it stands out in this example. If the characters representing the cryptovariable were more randomly chosen, it would not be so obvious. However, if the cryptovariable was consistent across all lines of data, it could be shared between the sender and recipient, making it unnecessary, and inadvisable, to include in the message. However, it illustrates the discussion.

To recap however, performing multiple substitutions and transformations for a single encryption process provides a very high degree of abstraction, making frequency analysis and other common attacks against these ciphers very difficult.

Streaming Ciphers

There are basically two types of stream ciphers: the One Time Pad and the keystream. Stream ciphers work by operating on continuous streams of data and encrypting them using a highly random cryptovariable. Key stream ciphers often use hardware based cryptovariable generators to get this high entropy cryptovariable.

The Pseudo Random Number Generator (PRNG) found in computer systems are not random enough for use as a cryptovariable generator for this type of cipher. That is to say, the PRNG is not statistically unbiased. However, the same program stream.pl, uses the PRNG to illustrate the concept.

Like the other ciphers presented thus far, the stream is a symmetric cipher. It typically works at the bit level. The keystream is the cryptovariable used to encrypt the plaintext, usually done using the XOR function on the bit level. The generation of the keystream can be totally independent of the plaintext and resulting ciphertext, and is called a synchronous stream cipher. Alternatively, the plain and cipher text can be related to the cryptovariable, which is called a self-synchronizing stream cipher.

The One Time Pad

The one time pad uses a long series of random letters as the cryptovariable. The cryptovariable is the same length as the message and is only used once. Major Joseph Mauborgne and Gilbert Vernam of AT&T originally developed it. Large sets of random, non-repeating characters were written on sheets of paper, which were then glued together to form a pad. When a message was encrypted, the sheets were removed from the pad to avoid sending a message with the same cryptovariable. The receiver must have the same pad and use the same sheets to correctly decrypt the message.

One-time pads work by adding the alphabetic positions of the plaintext and the cryptovariable and subtracting 26. The modulus, or remainder is the ciphertext. For example, the plaintext message

REDALERT

Combined with the cryptovariable

IPEBVYTM

Results in the ciphertext

AUICHDLG

It works this way because

R + I mod 26 = A since R is the 18th letter and I is the 9th. 18 + 9 = 27 mod 26 = 1, which means the letter A. (Remember, modulus calculates the remainder.)

The secret with the one time pad is to make it as random as possible for the entire cryptovariable length. However, the generation of a truly random pad is very difficult, and using a pseudo-random number generator (PRNG) is not sufficient, since these have a certain element of non-randomness.

This is, theoretically, an example of an unbreakable (by brute force) encryption system. The theoretical part depends upon the secrecy of the cryptovariable and remembering you can never use the same cryptovariable to send another message. Ever.

There are potential problems with the one time pad. The sender and receiver must be perfectly synchronized, as being off a bit or two may completely invalidate the message. While this method is practical for the protection of small messages, it is not practical for the protection of a communications channel or a large volume of data. The second problem is with the storage of the pad. The pad itself must be perfectly secure and have the ability to remove the used bits. Failure to protect the pad invalidates the cryptovariable.

Finally, the pad can be applied to binary data. Instead of using modulo 26, the process uses an exclusive OR (XOR) function. The decryption process otherwise works the same and is presented in the section “The XOR Cipher”.

The program pad.pl provides and example of a one-time pad, however, since it does not use a true random generator, it is not advisable to use it for protecting your government secrets!

The XOR Cipher

The XOR cipher, or simple XOR is a standard operation on bits of data. The cipher works by using a cryptovariable to perform the bit-level operation on the plaintext. However, because this is a bit-level operation, encrypting the ciphertext with the cryptovariable results in the plaintext.

While this cipher is considered to be an embarrassment, many commercial software programs use this approach to “protect” the data stored or processed by them. Many software vendors claim to have implemented an encryption scheme as secure but much faster than DES, while in fact they have likely only implemented an XOR function.

The XOR cipher is an example of a streaming key cipher. Like the one time pad, it requires a high degree of randomness in the key. The key is generated as a highly random sequence of bits. The ratio of zeroes and ones should be statistically balanced, otherwise the resulting ciphertext may be skewed and easier to defeat.

Assuming we have the plain text

This is good

The XOR cipher typically converts the ASCII text to a binary representation of each character, which in this case is 104 bits (13 characters with 8 bits per character). We are assuming in this discussion the plain text is already ASCII. If the data is binary, then the conversion to ASCII is not required. For example, the binary representation looks like

01010100011010000110100101110011001000000110100101110011001000000110011011011110110111101100100

applied with a key

001100100111000100011001010011000100011100011001011100110000110101001000010101000111001001100001

A random key is generated, also consisting of 104 bits. As the cipher processes each bit in the plaintext, it is XOR’d with the corresponding bit from the key. Remember, the XOR function returns

0 if both bits are the same (e.g. 0 and 0 or 1 and 1)
1 if the bits are different (e.g. 0 and 1 )

As the bits are processed, they are reassembled into characters and the resulting ciphertext is displayed. The key must also be provided for the receiver to decrypt the message. Ideally, the key is provided out of band from the message to protect both the key and the ciphertext.

The stream.pl script demonstrates a key stream cipher using the XOR function. When using the stream.pl script, the user does not provide a key. It is randomly generated using the pseudo-random number generator (PRNG). As mentioned earlier in the article, this is not ideal, but it serves our purposes. The provided plain text is written to a file, as is the key used to encrypt the plain text. The following example demonstrates using stream.pl:

The file to be encrypted in our example is called ‘c’ and contains the following text:

[chare@linux crypto]$ cat c
This is more fun
than a barrel of monkeys

We encrypt it using the stream.pl command to illustrate the stream cipher

[chare@linux crypto]$ perl stream.pl --encrypt c

The resulting output consists of the ciphertext in the file c.cip, and the key in the file c.key. As with any cryptographic system, the key must be closely guarded and ideally should never be stored with the plain or cipher text. The following command shows the files in the directory:

[chare@linux crypto]$ ls
c   c.cip   c.key  stream.pl

The stream.pl script stores the ciphertext as a sequence of hexadecimal values corresponding to each line in the plain text.

[chare@linux crypto]$ cat c.cip
70 17 CA 9E 84 E0 5E 83 77 3C 3E FF FC 24 A4 FE
F8 98 60 01 16 E5 AF 65 73 F2 96 94 5B 73 A1 41 F9 4E A9 A2 B4 DC 1A F4

Finally, the key itself is stored in a separate file so it can be communicated to the recipient separately, as a series of binary digits, one sequence for each line of plain text.

[chare@linux crypto]$ cat c.key
00100100011111111010001111101101101001001000100100101101101000110001101001010011010011001001101011011100010000101101000110010000
100011001111000000000001011011110011011010000100100011110000011100010010100000001110010011110001001101110101001111001110001001111101100100100011110001101100110011011111101110010110001110000111

Decrypting the data requires both the cipher text and the key file:

[chare@linux crypto]$ perl stream.pl --decrypt --key c.key c.cip
This is more fun
than a barrel of monkeys
[chare@linux crypto]$

The streaming cipher is often implemented in hardware to get a truly random and statistically balanced number of ones and zeroes. The hardware implementation also includes compensators to account for loss of synchronization between the two key generators – they must be in sync.

However, the XOR function on its own, which is typically used in this operation, is not cryptographically strong. Many software vendors use this as their method on encryption and claim their encryption method is as strong as DES without the time or processing power requirements. Don’t be fooled.

Despite the apparent weaknesses of the some of the preceding ciphers, their combined strength can be formidable. As we shall see shortly, DES is a combination of substitution and transposition. For example, one could take some plaintext, perform a simple substitution on it, transpose it, perform an XOR, and then repeat this a defined number of times using the key as the input to the various functions. Consequently, there is strength in numbers (pardon the pun).

The next major group of ciphers are the block ciphers. These are illustrated using publicly available PERL modules; and there are many block cipher implementations available including Blowfish, Data Encryption Standard (DES), Triple DES, Twofish and Advanced Encryption Standard (AES).

Block Ciphers

Unlike the previous group of ciphers, which typically operate on a character at a time, the block ciphers assemble a fixed size block of text and perform the encryption on the entire block.

Block ciphers are symmetric: they use the same cryptovariable to encrypt and decrypt. An important item to remember – the resulting ciphertext is the same length as the plain text. The encryption algorithm is modified using the user provided cryptovariable. The block size for most ciphers of this type is 64 bits, however, we can expect this block size to increase with additional advantages in computing technology.

Symmetric vs. Asymmetric

Despite common understanding, both symmetric and asymmetric cryptography uses block cipher methods. The primary difference is in key usage. Symmetric cryptography uses the same cryptovariable for both the encryption and decryption operations. IBM developed the most popular cryptographic cipher, Data Encryption Standard, in the middle 1970’s. DES was established as a United States Federal standard in 1976, and was only replaced in 2000 by the Advanced Encryption Standard (and triple DES).

Asymmetric cryptography uses two cryptovariables: one to encrypt and one to decrypt. The keys are related mathematically and are based upon Elliptic Curve Cryptography, factoring large prime numbers or discrete logarithms. The nature of mathematics behind these complex problems is beyond this discussion. Asymmetric cryptography is commonly known as public key cryptography. The most popular form of public-key or asymmetric cryptography is RSA, invented by Rivest, Shamir and Adleman.

As discussed elsewhere in this article, the primary advantage of asymmetric cryptography is key management. While the sender and receiver must exchange the secret key when using symmetric cryptography, in asymmetric cryptography the private key is kept private and never revealed to the other. Asymmetric cryptography addresses the major concern with symmetric key distribution: the accidental discovery of the key by an unauthorized person.

Symmetric cryptography has significant computational advantages over asymmetric cryptography due to differences in the algorithm implementations. However, what is not often remembered is these two cryptographic methods are often combined into hybrid systems. The asymmetric method is used to exchange a shared cryptovariable, which is then used by the symmetric method to perform both the encryption and decryption operations.

There is a vast array of block cipher cryptographic functions including (alphabetically):

Advanced Encryption Standard (AES) - Rijandel
Blowfish
CAST
Data Encryption Standard (DES)
FEAL
IDEA
KHUFU and KHAFRE
LOKI
Lucifer
Madryga
RC2
RC5
REDOC
SKIPJACK
SAFER
Triple DES
Twofish

(The highlighted ciphers are specifically discussed in this chapter.) Of course, you get the point: there are a large number of block oriented ciphers, some of which are protected by patents or copyrights, while others are not. This chapter discusses some of them and includes a sample program illustrating their use.

Operation Modes

Most block ciphers have four modes of operation:

Electronic code Book
Cipher Block Chaining
Cipher Feedback
Output Feedback

The electronic Code Book and Cipher Block chaining are block modes, while the Cipher and Output feedback modes are stream oriented within the cipher. This may sound like a contradiction, but it is not.

Electronic Code Book (ECB)

ECB is considered one of the more obvious ways of using cryptography, and is certainly the easiest. Since any block cipher in ECB mode always results in the same ciphertext for the same plaintext, it is possible to create a “codebook” of often-used phrases or messages. These can be stored and used when needed to send a message without having to perform the encryption time and time again.

In ECB mode, each block is encrypted independently, allowing randomly accessed files to be encrypted and still accessed without having to process the file in a linear encryption fashion.

ECB does have problems. Since the same plaintext always produces the same ciphertext, a cryptanalysis obtaining both can start to create a codebook of their own, can obtain the actual message without the need to launch an attack to find the key.

The ability of the cryptanalyst to create their own codebook is not a problem with the cipher per se, but is an inherent problem with the mode of operation. The program ecb.pl demonstrates the Electronic Code Book operation.

Cipher Block Chaining (CBC)

Using a chain of blocks resolves the problem with the Electronic code Book. In CBC mode, the result of encrypting one block of data is fed back into the process to encrypt the next block of data. Unlike ECB, CBC forces each plaintext message to encrypt to a different value by using an initialization vector. (Because many of the sample programs do not use an initialization vector, they can be said to be operating in ECB mode.)

The initialization vector, or IV, is the value fed into the first encryption process, since there is no previous encryption result to use. The IV itself has no meaning; it is just there to make each encryption process unique and to hide any common message formating patterns.

Before actually performing the encryption, CBC mode uses the XOR function with either the IV or the previous encryption feedback on the plaintext. The result is then processed through the encryption algorithm. This is best illustrated using a picture:

Figure 1 - Illustrating CBC

The encryption process illustrated in Figure 1 is the same for decryption. Unlike the cryptovariable, it is not necessary for the IV to be kept secret: it can be transmitted with the ciphertext.

The principle features with CBC are the IV and the next encryption process is affected by the result of the previous encryption. This prevents two identical messages from ever being equivalent.

Cipher Feedback (CFB)

The cipher feedback mode implements the block cipher as a self-synchronizing stream cipher. In this mode, each bit produced in the keystream is the result of a predetermined number of fixed ciphertext bits as explained by Schneier. Since the keystream is the result of a series of ciphertext bits, it is possible to synchronize two disparate keystream generators after the recipient system has received the corresponding number of ciphertext bits. Incidentally, the self-synchronizing stream is known in the military is ciphertext auto key (CTAK) due to the synchronization properties.

Now, with CBC mode, the encryption process cannot begin until a complete block of text is received and is ready to process. For some applications this is difficult if not impractical due to user delays, network latency etc. Consequently, CFB allows encryption on units smaller than the block size. This makes it easier for handling issues such as network latency or delays in receiving the data to process.

Like CBC mode, CFB uses an Initialization vector to start the process. The strength and weakness of CFB is when errors occur. Errors in the encryption process are corrected at decryption. However, when an error is introduced during decryption, some of the ciphertext is incorrectly decrypted, resulting in garbage. For example, if a 1-bit error is introduced, it affects the following 9 bytes as the incorrect bit is introduced and processed in the keystream.

Output Feedback (OFB)

The other perspective is output feedback, where the block cipher is operated as a synchronized stream. Here, the keystream is generated independently of the message and is known as Key Auto-Key (KAK) in military terminology. Output Feedback is essentially the same as cipher feedback with some differences in the internals of how the ciphertext is generated.

AES

The Advanced Encryption System, also known as Rijndael (pronounced “Rhine Dahl” was named as the replacement for DES by the United States National Institute of Science and Technology (NIST) in October 2000 and adopted as the new U.S. standard in 2001. The AES algorithm supports key sizes of 128, 192 and 256 bits.

Despite the relatively recent selection of Rijndael, there are already implementations of this block cipher in use. As with other block mode ciphers, AES supports the traditional block mode operations previously described (ECB, CBC, CFB and OFB).

Blowfish

The Blowfish algorithm was developed by Bruce Schneier and has been implemented in several products including FolderBot security products, PGPfone and Nautilus. While there have been successful cryptanalytic attacks against Blowfish, the discovered weak keys have not been successfully used to compromise the actual plain text.

Blowfish is a fast block cipher algorithm, which is compact, secure and simple to implement. It is best used in situations where the key does not change often. This makes Blowfish impractical for packet switched encryption or applications where the key changes frequently. Blowfish has been implemented in PERL and a sample program called blowfish.pl. Further details on the inner workings of Blowfish are described in Scheier’s book, Applied Cryptography.

Data Encryption Standard (DES)

As mentioned earlier, IBM developed DES in the mid-1970’s. It was the follow-on development activity from the Lucifer cipher. DES became the United States Federal encryption standard in 1976, and is the most popular form of cryptographic cipher developed.

DES is a block mode cipher, working on blocks of 64 bits (8 bytes) using a cryptovariable with 56 bits and 8 parity bits. The 56 bits form the real key, and the parity bits are used as check bits for the true key. DES works by performing 16 rounds of encryption, using both substitution and transposition. Some reference texts use the term permutation instead of transposition, but they are for all intents, equivalent terms. Decryption of the ciphertext uses the same algorithm, except the operations are performed in reverse.

IDEA

IDEA was first introduced in 1990 and is known as the International Data Encryption Algorithm. There have been several revisions to IDEA after Rivest and Shamir demonstrated weaknesses in the algorithm using differential cryptanalysis. It is considered by some cryptologists as the most secure block cipher available to the public.

IDEA uses a 64 bit block size with a key length of 128 bits. It uses a combination of XOR and modulo addition and multiplication. The algorithm divides each 64 bit block into 16 bit sub-blocks, where the operation is performed. There are eight rounds of processing on each sub-block using sub-keys derived from the 128 bit key.

While the IDEA cipher can be easily implemented in software, the future is not clear given the IDEA cipher is patented and must be licensed for commercial use. Despite the current reluctance to use the IDEA cipher for these reasons, it is included in the Pretty Good Privacy (PGP) implementation.

Triple DES

There are other modes to DES aside from the one previously described. These are double and triple DES. Combining multiple encryptions, either with the same or different algorithms is known as “multiple encryption”.

Double DES, which is not going to be discussed here, has been proven to be no more secure than single DES.

Triple DES on the other hand, has several operating modes which make it more secure and more difficult to attack Although DES is most commonly associated with multiple encryption, it is important to remember, any cipher could be used in this mode, or you can link multiple different ciphers together.

The triple DES modes can use two or three cryptovariables. The two cryptovariable modes are

DES-EEE2 – Encrypt-Encrypt-Encrypt
DES-EDE2 – Encrypt-Decrypt-Encrypt

and the three key modes are

DES-EEE3 – Encrypt-Encrypt-Encrypt
DES-EDE3 – Encrypt-Decrypt-Encrypt

Encrypt-Encrypt-Encrypt

This mode of operation involves three rounds of encryption using two or three cryptovariables. Each round processes the ciphertext from the first round. Using independent and separate cryptovariables decreases the probability of a successful attack. An illustration is suitable to demonstrate:

Figure 2 - EEE2 Figure 2 illustrates using EEE mode with two cryptovariables and is called EEE2. The first and third encryption operations use the same cryptovariable. If using three cryptovariables, the operation is the same, except each encryption uses a separate cryptovariable and is called EEE3.

Encrypt-Decrypt-Encrypt

This operation is similar except the process is to encrypt with one cryptovariable, decrypt with another, and then encrypt again. Like EEE, each subsequent operation uses the ciphertext from the previous operation:

Figure 3 - EDE2 Figure 3 illustrates the EDE operation. The decryption operation in the second step uses a different cyptovariable, resulting in garbled plaintext. The final encryption operation encrypts the garbled plaintext, resulting in the final ciphertext.

There is no magic with the EDE scheme. The method was designed by IBM to maintain compatibility with conventional implementations of the DES algorithm. Encrypting twice with the same key adds no additional security value.

Twofish

Counterpane Labs, founded by Bruce Schneier, developed the twofish cipher. The cipher was one of the five AES finalists, although it eventually lost to Rijndael. Twofish is a freely available, unpatented and uncopyrighted cipher, so it can be used almost anywhere. It uses a 128 bit block with a 128, 192 or 256 bit key and works in all of the standard modes presented in this chapter.

The algorithm is efficient on a variety of platforms including smart cards, software and hardware. The twofish algorithm has been heavily analyzed by cryptologists worldwide and is considered cryptographically strong.

Cryptanalysis

In order to design a robust encryption algorithm or cryptographic protocol, one should use cryptanalysis to find and correct any weaknesses. This is precisely the reason why the best (most trusted) encryption algorithms are ones that have been made available to public scrutiny. For example, DES has been exposed to public scrutiny for years, and is, therefore, well trusted, while Skipjack is secret and less well-trusted. It is a basic tenet of cryptology that the security of an algorithm should not rely on its secrecy. Inevitably, the algorithm will be discovered and its weaknesses (if any) will be exploited.

The various techniques in cryptanalysis attempting to compromise cryptosystems are referred to as attacks. Some attacks are general, whereas others apply only to certain types of cryptosystems. However, cryptanalysis uses mathematics and linguistics elements to determine the key and, therefore, the plaintext of the original message.

Differential

Differential cryptanalysis compares specific pieces of ciphertext to determine the key. This is done by the cryptanalyst by using two pieces of plaintext which have specific differences. The cryptanalyst watches the progress of the encryption process through each step. Once the encryption process is completed, the differences in the ciphertext are used to determine the probabilities for specific keys. As more and more pairs of ciphertext are compared, one key will emerge as the most probable.

Linear

Linear cryptanalysis is a generic analysis method that applies to a large class of ciphers, of which DES is a member. It is a probabilistic approach in which an output bit from the round function bears some linear relationship to some subset of input and key bits, with some useful probability. It's more powerful than differential cryptanalysis, since less known/chosen plaintext is required to mount the attack.

Related Key

Related-key cryptanalysis is based on weaknesses in certain ciphers in which two keys with a low hamming-weight difference can be used to mount some useful attack on the key. For example, even if you don't know what the key, K, is, but you can force the system to use key K' such that K' is a low distance from K, you may be able to use the fact that K'=K+1 to your advantage. Modern ciphers are generally resistant to related-key attacks, but not always.

Cryptographic Attacks

There are several principle attack types against an algorithm, with varying degrees of difficulty and success. The cryptanalyst uses these attack methods when attempting to determine the cryptovariable and algorithm used. The cryptanalysis techniques are:

Ciphertext Only Attack (COA)
Known Plaintext Attack
Chosen Plaintext Attack
Adaptive Chosen Plaintext Attack
Chosen Ciphertext Attack
Brute Force Attack
“Purchase key” Attack
“Rubber Hose” Attack

Our discussion will examine and define ach of these attack methods.

Ciphertext Only Attack

In the Ciphertext Only attack, the cryptanalyst only has several examples of ciphertext encrypted with the same algorithm, but no plaintext. In this situation, the cryptanalyst wants to recover the plaintext of as many messages as possible, ideally recovering the keys used as well. Making use of this type of attack requires a fairly large sample of ciphertext and is considered generally difficult.

Known Plaintext Attack

The cryptanalyst in this attack mode, has the plaintext and matching ciphertext, but does not know the key and possibly the algorithm used. Consequently, the cryptanalyst wants to determine the algorithm used, and ideally the keys to decrypt future messages.

Chosen Plaintext Attack

In the Chosen Plaintext Attack, the cryptanalyst has access to the plaintext and ciphertext. Additionally, the cryptanalyst gets to choose the plaintext to encrypt. Like the known plaintext attack, the cryptanalyst is trying to deduce the algorithm and/or the key for decrypting future messages. This is considered a better attack method than the known plaintext attack as the cryptanalyst can choose the blocks of plaintext.

Adaptive Chosen Plaintext Attack

This is similar to the Chosen Plaintext Attack, but in adaptive mode, the cryptanalyst can choose to adjust the plaintext to encrypt based upon the results of the previous encryption.

Chosen Ciphertext Attack

Here the cryptanalyst chooses the ciphertext to be decrypted and has access to the decrypted plaintext. This attack is largely used against asymmetric encryption, although it can be used against symmetric systems as well. When a chosen plaintext and a chosen ciphertext attack are used in combination, it is often referred to as a “chosen text attack”.

Brute Force Attack

The brute-force attack is exactly that – the process of trying every possible key to decrypt the ciphertext and obtain the message. This is not typically practical, but is sometimes the best method of obtaining the key.

“Purchase key” Attack

The “purchase key” attack consists of the cryptanalyst, or attacker purchasing or bribing the key holder into providing the key.

“Rubber Hose” Attack

Here, the attacker uses bribery, extortion or torture to obtain the key from the key holder. Both this and the “purchase key” attack are very powerful and sometimes the best way to obtain the key.

Summary

The art, or science, of cryptography is often considered a world only the mathematical geniuses can access. In reality, all of us use cryptography on a daily basis, generally without knowing it.

The security professional requires a good understanding of cryptographic systems and their uses, even though they may not understand the mathematics or theory behind them unless they are developing or implementing cryptography at the code level. For most, understanding when and where to use it, as well as what options are available, is enough.

This chapter has presented several stream and block based symmetric ciphers, attack methods and provided an example of attacking a cipher using frequency distribution. The text, along with the sample programs, provides even the security professional with no cryptography background, the elements to be able to discuss when, where, and some different types of cryptography with their peers.

References

Some of these references were used in the preparation of this chapter, while others provide useful historical or additional information on cryptography.

Sir Francis Bacon, De Augmentis Scientarum'', Book 6, Chapter i. [as quoted in C. Stopes,Bacon-Shakspere Question'', 1889]

Sir Richard F. Burton trans., ``The Kama Sutra of Vatsayana'', Arkana/Penguin, 1991.

Cipher A. Deavours and Louis Kruh, ``Machine Cryptography and Modern Cryptanalysis'', Artech House, 1985.

Whitfield Diffie and Martin Hellman, ``New Directions in Cryptography'', IEEE Transactions on Information Theory, Nov 1976.

Horst Feistel, ``Cryptographic Coding for Data-Bank Privacy'', IBM Research Report RC2827.

Simson Garfinkel, PGP: Pretty Good Privacy'', O'Reilly & Associates, Inc., 1995. Proceedings, EUROCRYPT '90; Springer Verlag. David Kahn,The Codebreakers'', Macmillan, 1967.

Derek J. Price, ``The Equatorie of the Planetis'', edited from Peterhouse MS 75.I, Cambridge University Press, 1955.

Ronald L. Rivest, ``The RC5 Encryption Algorithm'', document made available by FTP and World Wide Web, 1994.

Steve Bellovin and Marcus Ranum, individual personal communications, July 1995.

Rivest, Shamir and Adleman, ``A method for obtaining digital signatures and public key cryptosystems'', Communications of the ACM, Feb. 1978, pp. 120-126.

Adi Shamir, ``Myths and Realities'', invited talk at CRYPTO '95, Santa Barbara, CA; August 1995.

Bruce Schneier, “Applied Cryptography”, Wiley, 1996.

Steve Burnett and Stephen Paine, “RSA Security’s Official Guide to Cryptography”, RSA Press 2001.

H. X. Mel, Doris Baker, Cryptography Decrypted. Addisson-Wesley, 2001.

Source Code

The following section provides sample PERL code illustrating a number of the ciphers discussed in this chapter. The code was written using PERL 5.6.0 on a Linux system. The source code illustrates:

Printing the ASCII character set (alphabet.pl)
Obtaining a frequency distribution of the characters in a message (freq.pl)
The Caesar cipher (caesar.pl)
A Substitution cipher (subst.pl)
A Transposition cipher (trans.pl)
A Columnar Transposition cipher (col.pl)
The ROT-13 cipher (rot13.pl)
The Polyalphabetic cipher (poly.pl)
A Keyed Stream cipher (stream.pl)
A One Time Pad cipher (pad.pl)
The Blowfish cipher (blowfish.pl)
The DES cipher (des.pl)