February Meeting Notes: cryptography and simple histograms

For this month’s meeting, Joel and I talked about simple cryptography theory, and followed it up with discussing how a histogram can be used to help analyze and break certain types of cryptography schemes.

Histograms are just a way to graphically represent data. This can be color data from an image, or data in a text or binary file. Really, histograms are just simple bar graphs.

Read on for the rest of the details.

Without posting everything that Joel and I said in the meeting, it’d pretty difficult for me to convey exactly how histograms can be used for cryptanalysis. Simply put, the histogram shows the number of times each character appears in a file. In a simple letter-substitution scheme, it would be easy to see what letters show up most often in natural language and in the encrypted text. There’s a fairly good chance that you can start replacing letters that have similar frequency. Once you’ve accurately substituted enough letters in the encrypted text to form a few whole words or easily-guessed partial words, it becomes no more difficult to completely decrypt the message than playing a game of hangman that’s already half-solved.

Here are a few histograms I generated for large text files. This is useful for analyzing the frequency that certain characters appear in a file:

As you can see, the charts both top out at the same place. That’s a space character. Spaces are easily the most common character found in written text. All the bars to the left of the tall bar are “control characters” such as carriage returns. Directly to the right of the tall bar are symbols, numbers, upper-case and lower-case letters respectively.

Notice the whole right side of the above graphs are empty, because those are called “high ascii” characters that aren’t commonly found in written text, but are common in binary files.

This is a histogram of a file containing only random data:

And finally, a histogram of an OpenBSD binary executable file (which has a lot of nulls on the far left) throwing off the curve. Nulls are very common on executable files on almost any platform.

Finally, you can take a look at my code. It was a pretty quick hack for personal research reasons, but I decided to bring it up in the meeting today. I made sure to document most of the important logic in the code.

http://churchofpuffy.com:7080/histogram.phps

Thanks to everyone who showed up for this month’s meeting!