Secret and Urgent - The Story of Codes and Ciphers - Anon
IV
Simple substitution is the ABC and arithmetic of cryptography, and a complete understanding of it and of the method of breaking it is essential to any knowledge of the art. Explanations of the method, surrounded by very good stories, are given both in Poe’s Gold Bug and Conan Doyle’s Adventure of the Dancing Men, but as ninety per cent of all cipher messages are in some form of simple substitution it is worth giving another here.
Perhaps the simplest form of all is that which assigns a number to each letter of the alphabet in order of occurrence (A = 1, B = 2, C = 3, and so on), or reverses the process, making Z = 1, Y = 2, X = 3. Along with it goes the alphabet of Julius Caesar, which replaces each letter by the one that follows it two, three or more places down the alphabet. Both are still occasionally met with, usually among school children or pairs of secretive and romantic lovers. Most simple substitution ciphers go beyond this to introduce some slight complication, such as writing the message in conventional signs instead of letters, or using a key-word, a word in which no letter is repeated. The key-word is written down, with the rest of the alphabet following, and the clear alphabet is written below:
In enciphering, the letters of the top line are substituted for those of the lower line. Example: All cats are grey at night becomes NFF WNQP NMO KMOX NQ HBKAQ, or, making the message up into five-letter groups and adding nulls at the end to fill out the last group, as is frequently done:
NFFWN QPNMO KMOXN QHBKA QVCDZ
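The key-word construction described above can be sketched in a few lines of Python. The key-word NEWYORK used here is an assumption: it is not stated in the text, but it is the shortest key consistent with the sample encipherment of "All cats are grey at night". The function names are illustrative only.

```python
import string

def keyed_alphabet(keyword: str) -> str:
    """Key-word (repeated letters dropped) followed by the unused letters in order."""
    seen = []
    for ch in keyword.upper() + string.ascii_uppercase:
        if ch.isalpha() and ch not in seen:
            seen.append(ch)
    return "".join(seen)

def encipher(plaintext: str, keyword: str) -> str:
    """Substitute the keyed (top) line for the clear (lower) line."""
    table = str.maketrans(string.ascii_uppercase, keyed_alphabet(keyword))
    return plaintext.upper().translate(table)

def five_letter_groups(text: str, nulls: str = "") -> str:
    """Drop spaces, append nulls, and split into groups of five letters."""
    letters = "".join(ch for ch in text if ch.isalpha()) + nulls
    return " ".join(letters[i:i + 5] for i in range(0, len(letters), 5))

# NEWYORK is inferred from the worked example, not given by the author.
cipher = encipher("All cats are grey at night", "NEWYORK")
print(cipher)                                    # NFF WNQP NMO KMOX NQ HBKAQ
print(five_letter_groups(cipher, nulls="VCDZ"))  # NFFWN QPNMO KMOXN QHBKA QVCDZ
```

The nulls VCDZ are the ones shown in the grouped example; any letters would serve to fill out the last group.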
Suppose now that the cryptographer is faced with a message of unknown content. (The groups are numbered for convenience in referring to them.)
The first step in decipherment is to count the frequency with which each letter appears and to draw up a table of the result. In the present case it is:
The first observation to be made from this table is that the message cannot be in a transposition cipher. There are not enough vowels; and though in a short message frequencies do not always correspond to those given by the tables, P, Q and X come altogether too often.
Therefore the cipher is substitution; and if substitution, then simple substitution, since double substitution (for reasons that will be given later) would very likely have more than twenty letters represented, and would show no such violent variations in frequency as the drop from 13 P’s to a single M, H, and V.
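The frequency count that opens the analysis is easy to mechanize. The sketch below is illustrative: it uses the short example cryptogram from earlier in the chapter as stand-in input, and the function names are hypothetical. It reports the three quantities the argument leans on: the count for each letter, the number of distinct letters, and the share of vowels.

```python
from collections import Counter

def frequency_table(ciphertext: str) -> list[tuple[str, int]]:
    """Letter counts, most frequent first (ties keep order of first appearance)."""
    letters = [ch for ch in ciphertext.upper() if ch.isalpha()]
    return Counter(letters).most_common()

def vowel_share(text: str) -> float:
    """Fraction of letters that are A, E, I, O or U."""
    letters = [ch for ch in text.upper() if ch.isalpha()]
    return sum(ch in "AEIOU" for ch in letters) / len(letters)

# Stand-in input: the earlier example cryptogram, not the message under study.
sample = "NFFWN QPNMO KMOXN QHBKA QVCDZ"
table = frequency_table(sample)

for letter, count in table:
    print(letter, count)
print("distinct letters:", len(table))
print("vowel share:", round(vowel_share(sample), 2))
```

A transposition cipher only rearranges the letters of the clear text, so its vowel share and letter counts would look like ordinary English. A low vowel share and rare letters at the top of the table point to substitution instead.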
Having cleared the track by identifying the cipher according to type, the cryptographer now turns to his table of letter frequencies (Table I). Here he finds that E is the most frequent letter in English, with T next. It seems likely, therefore, that P is E and S is T. In a message as short as this the order of the first two may be reversed; but it will be noted that the P’s are scattered fairly evenly through the message while the S’s tend to bunch, which would indicate the correctness of the P = E solution. Consonants group themselves; vowels invariably scatter. The message is accordingly rewritten with the provisional values P = E and S = T in the proper places below:
This seems very reasonable; it contains no impossible linguistic combinations and the spacing of the T’s and E’s appears what it should be in normal text. Following T and E the next letters in the alphabet in order of frequency are A, O, N, R, I and S. In the message under consideration this corresponds very well with the high frequencies of the letters K, Q, R, Z, C, and E; but both in message and in frequency table these six letters are so closely grouped that it would be very difficult to tell which was which without extensive experiment along trial and error lines.
The cryptographer therefore takes a short cut by consulting the table of trigrams, or three-letter groups (Table XII). These show that THE is overwhelmingly the most frequent three-letter combination in the language; and further, that it is very rare to find any letter but H standing between T and E. In the present message the combination T-blank-E occurs twice, in groups 1 and 5. In both cases the blank is represented by the same letter of the cipher (Z), and on frequency Z can well represent H.
But if Z = H, then Q is probably R or S; for with the insertion of the H, group 1 reads THE-blank-E, which is a strong possibility for THERE or THESE. Q, which fills the blank, could be, on frequency, either R or S. However, in group 10 the combination ZPQ occurs and the ZP has been solved as HE; and the same combination is repeated in groups 19–20. Reference to the trigram table shows that HER is one of the most common in the language, while HES is relatively rare. The balance of the probabilities thus favors the hypothesis that:
Q = R
and it is accordingly filled in that way.
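The filling-in procedure used so far can be sketched as follows. The ten-letter sample is a stand-in consistent with the values the chapter assigns to groups 1 and 2, not a quotation of the full message, and the helper names are hypothetical. The code writes provisional values under the cipher letters, leaves a blank for every unsolved letter, and looks for the T-blank-E combination.

```python
import re

BLANK = "-"

def apply_values(ciphertext: str, known: dict[str, str]) -> str:
    """Replace solved cipher letters by their values; leave a blank for the rest."""
    return "".join(
        known.get(ch, BLANK) if ch.isalpha() else ch for ch in ciphertext
    )

def letters_between(ciphertext: str, known: dict[str, str],
                    first: str, last: str) -> list[str]:
    """Cipher letters standing where the partial text reads first-blank-last."""
    partial = apply_values(ciphertext, known)
    probe = re.compile("(?=" + re.escape(first + BLANK + last) + ")")
    return [ciphertext[m.start() + 1] for m in probe.finditer(partial)]

# Stand-in for groups 1 and 2, rebuilt from the chapter's solved values.
sample = "SZPQPERJKQ"

known = {"P": "E", "S": "T"}
print(apply_values(sample, known))               # T-E-E-----
print(letters_between(sample, known, "T", "E"))  # ['Z']

known.update({"Z": "H", "Q": "R"})
print(apply_values(sample, known))               # THERE----R
```

Each accepted equation is simply added to the dictionary and the text is rewritten, which mirrors the repeated filling-in by hand.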
In groups 17–18 occurs the combination SESTSE. This has been partially solved to read T-blank-T-blank-T-blank, which, with the repeated E’s, constitutes a pattern word. The cryptographer therefore looks at his table of pattern-words (Table XI) and discovers that this pattern usually means TITUTI or TETATE. Since P = E in this cipher, the E of the cipher cannot also stand for E, so the pattern must represent the first of these two combinations, which yields the values:
E = I; T = U
both of which check very well by the frequency table for the message. These values are now filled in, and it becomes evident that the cipher is near complete solution.
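The pattern-word test can be expressed compactly. In this sketch the candidate list holds only the two combinations named above, standing in for Table XI, and the function names are illustrative. A candidate is accepted if it has the same repetition pattern as the cipher fragment and does not contradict any value already fixed.

```python
def pattern(word: str) -> str:
    """Repetition pattern of a word: first new letter is A, second is B, etc."""
    codes: dict[str, str] = {}
    return "".join(
        codes.setdefault(ch, chr(ord("A") + len(codes))) for ch in word.upper()
    )

def consistent(cipher_word: str, candidate: str, known: dict[str, str]) -> bool:
    """True if the candidate fits the pattern and the values already solved."""
    if pattern(cipher_word) != pattern(candidate):
        return False
    for c, p in zip(cipher_word, candidate):
        if c in known:
            if known[c] != p:          # contradicts a solved cipher letter
                return False
        elif p in known.values():      # clear letter already has another cipher letter
            return False
    return True

known = {"P": "E", "S": "T", "Z": "H", "Q": "R"}
candidates = ["TITUTI", "TETATE"]  # the two readings named in the text

matches = [w for w in candidates if consistent("SESTSE", w, known)]
print(pattern("SESTSE"))  # ABACAB
print(matches)            # ['TITUTI']

new_values = {c: p for c, p in zip("SESTSE", matches[0]) if c not in known}
print(new_values)         # {'E': 'I', 'T': 'U'}
```

TETATE is rejected because clear E is already represented by P, which is the same reasoning applied by hand.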
Of the little group of letters that showed high frequencies in the message there now remain unsolved R, C, K and J; of high-frequency letters for which no values in the message have been found there remain A, O, N and S. Two of these drop into place with the acceptance of the TITUTI combination, which can hardly end in anything but ON, yielding the equations:
K = O; J = N
If this be correct, groups 1–2 now read THERE I-NOR, or, dividing it into words along the obvious lines, THERE I- NO R, which makes it apparent that:
R = S
This leaves only one letter in the high-frequency group (A) to be accounted for and only one letter of high frequency in the message (C); and unless there be some strong reason to the contrary the cryptographer can assume that:
C = A
Once more filling in, with the obvious word divisions indicated, the following result is obtained:
It is apparent how nearly this finishes the task. Obviously nothing will do at the end of group 11 but the letters L and D, to complete the word SHOULD, which gives the equations:
F = L; M = D
Similarly replacing the G of the message in group 6 with M yields a satisfactory result, and the U’s in groups 4 and 14 work out nicely as V’s. LON-blank in groups 13–14 now becomes clear as LONG, and H = B is required in group 17. The remainder can now be filled in:
V = W; X = Y; L = P; I = C.
The message is solved and the cryptographer now draws up his table of equivalents:
The key-word was evidently “chimpanzee” with the final E dropped off (repetitions are not permissible) and it is now possible to fill in the whole table and wait for the appearance of the next message written with the same key.
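The last step, completing the table from the recovered key-word, can be checked mechanically. The sketch below rebuilds the cipher alphabet from "chimpanzee" and confirms that it agrees with every equation derived in the solution. The helper name is hypothetical, and the ten-letter test string is a stand-in rebuilt from the solved values for groups 1 and 2.

```python
import string

def keyed_alphabet(keyword: str) -> str:
    """Key-word (repeated letters dropped) followed by the unused letters in order."""
    seen = []
    for ch in keyword.upper() + string.ascii_uppercase:
        if ch.isalpha() and ch not in seen:
            seen.append(ch)
    return "".join(seen)

cipher_alphabet = keyed_alphabet("chimpanzee")
# Cipher letter -> clear letter, pairing the keyed line with the clear alphabet.
decipher_table = dict(zip(cipher_alphabet, string.ascii_uppercase))

# Equations as derived step by step in the text (cipher letter: clear letter).
solved = {
    "P": "E", "S": "T", "Z": "H", "Q": "R", "E": "I", "T": "U", "K": "O",
    "J": "N", "R": "S", "C": "A", "F": "L", "M": "D", "G": "M", "U": "V",
    "H": "B", "V": "W", "X": "Y", "L": "P", "I": "C",
}
agrees = all(decipher_table[c] == p for c, p in solved.items())

# Letters the message never pinned down are supplied by the key-word.
filled_in = {c: p for c, p in decipher_table.items() if c not in solved}

print(cipher_alphabet)  # CHIMPANZEBDFGJKLOQRSTUVWXY
print(agrees)           # True
print("SZPQPERJKQ".translate(str.maketrans(decipher_table)))  # THEREISNOR
```

The `filled_in` dictionary holds the equivalents that come from the key-word alone, which is what makes the next message in the same key readable at sight.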