A Clojure kata: Diceware password generator (part II)

Continuing the discussion from our last installment, we are implementing a password generator that uses Diceware.

The Word List

We took care of creating a function that generates a random five-digit key, but now we need something to do with it. Diceware uses a word list located at http://world.std.com/~reinhold/diceware.wordlist.asc . That URL returns a text file with a format that looks like this:

...
31533	grand
31534	grant
31535	grape
31536	graph
31541	grasp
31542	grass
...

:thought_balloon: Aside: What I love about this password generation system is that it’s a good example of a very un-secret procedure that provides a provably secret result. The word list at that URL is open and static, so an attacker knows exactly which words could be in your password. That’s not an issue, though, because there are 7776 words in there and you chose yours at random. A brute-force attack has to search up to 7776^n combinations, where n is the number of words in your password. You just have to make sure there are enough words to make it impractical for someone to try them all. I think they recommend 6 now, which yields a password space of about 2.2 × 10^23 possibilities. That tells you how powerful today’s computers are. :thought_balloon:
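
The arithmetic in that aside is easy to check at the REPL. This is only a sketch: keyspace and entropy-bits are names made up here for illustration, and the only number taken from the text is the 7776-word list size.

```clojure
;; Hypothetical helpers, not part of the kata's code.
(def wordlist-size 7776) ; 6^5, one word per five-dice roll

(defn keyspace
  "Number of possible n-word passphrases. *' promotes to BigInt on overflow."
  [n]
  (reduce *' (repeat n wordlist-size)))

(defn entropy-bits
  "Bits of entropy in an n-word passphrase chosen uniformly at random."
  [n]
  (* n (/ (Math/log wordlist-size) (Math/log 2))))

(keyspace 1)          ;; => 7776
(double (keyspace 6)) ;; roughly 2.2e23
(entropy-bits 6)      ;; a little under 78 bits
```

Each extra word multiplies the search space by 7776, which is why adding a word helps far more than swapping in an unusual one.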

So, this file looks suspiciously like a hashmap: a number for the key and a word for the value. That suggests that we need to convert a sequence of lines from that file into a Clojure hash-map.

First things first, though. Let’s create a var for that URL. We’ll make it a Java URL object.

(def wordlist-url (java.net.URL. "http://world.std.com/~reinhold/diceware.wordlist.asc"))

We’re also going to need to use the I/O library so let’s bring that in

(require '[clojure.java.io :as io])

Now we can create a convenience function that grabs the file and makes it into a sequence of lines

(defn get-wordlist []
  (filter (fn [line] (re-matches #"[1-6]{5}.*" line)) 
          (line-seq (io/reader wordlist-url))))

The line-seq function takes a Java BufferedReader object and returns a lazy sequence of lines. We filter that sequence with a regular expression (#"[1-6]{5}.*"), which gets rid of any lines in the file that aren’t parseable as key/value pairs (comments, the digital signature, blank lines, etc.).
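
One thing to keep in mind is that line-seq is lazy and the reader opened in get-wordlist is never explicitly closed. A possible variant, sketched here under the assumption that reading the whole file eagerly is acceptable, wraps the reader in with-open and realizes the filtered lines before the reader closes. The name wordlist-lines and the sample text are made up for illustration; the source argument can be anything io/reader accepts, so a StringReader stands in for the URL.

```clojure
(require '[clojure.java.io :as io])

(defn wordlist-lines
  "Eagerly reads source and keeps only lines that start with a five-dice key."
  [source]
  (with-open [rdr (io/reader source)]
    ;; filterv is eager, so every line is read before with-open closes rdr
    (filterv #(re-matches #"[1-6]{5}.*" %) (line-seq rdr))))

;; A tiny in-memory stand-in for the real file
(def sample-text
  (str "-----BEGIN PGP SIGNED MESSAGE-----\n"
       "\n"
       "31533\tgrand\n"
       "31534\tgrant\n"
       "-----BEGIN PGP SIGNATURE-----\n"))

(wordlist-lines (java.io.StringReader. sample-text))
;; => ["31533\tgrand" "31534\tgrant"]
```

The header, the blank line, and the signature line are all dropped because none of them starts with five digits in the range 1 to 6.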

Once we have one of these lines, we need to split it up into the key and the value. To do that we’ll haul out another regular expression and the re-seq function

(defn split-line [line]
  ;; when-let returns nil (instead of throwing) when the line doesn't match
  (when-let [[_ k-string val] (first (re-seq #"([1-6]{5}).(.*)" line))]
    [(Integer/parseInt k-string) val]))

To understand this we should look at that re-seq call.

user=> (re-seq #"([1-6]{5}).(.*)" "11111 hello")
(["11111 hello" "11111" "hello"])

This returns a sequence containing a vector of three elements. The first element is the entire matched text. The remaining elements are the capture groups from the regular expression: the key (as a string) and the value.
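
Since only the first match is ever used, re-find would give the same vector without the wrapping sequence. This is just an alternative worth knowing about, not what the code in this post uses; line-pattern is a name introduced here for the sketch.

```clojure
(def line-pattern #"([1-6]{5}).(.*)")

;; re-seq wraps every match in a sequence
(re-seq line-pattern "11111 hello")
;; => (["11111 hello" "11111" "hello"])

;; re-find returns only the first match: whole match, then each group
(re-find line-pattern "11111 hello")
;; => ["11111 hello" "11111" "hello"]

;; with no match, re-find returns nil and re-seq returns an empty sequence
(re-find line-pattern "bogus")          ;; => nil
(first (re-seq line-pattern "bogus"))   ;; => nil
```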

The binding vector takes this result apart via Clojure’s destructuring. The pattern on the left, [_ k-string val], is bound to the elements of the value returned by the function call on the right. The underscore is used, by convention, as a throwaway name.
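
Destructuring is easier to see in isolation. A minimal sketch, with a literal vector standing in for the regex result (the names k and v are arbitrary):

```clojure
;; Positional destructuring: each name on the left takes the element
;; at the same position on the right.
(let [[_ k v] ["11111 hello" "11111" "hello"]]
  [k v])
;; => ["11111" "hello"]

;; Destructuring nil doesn't fail; every name is simply bound to nil.
(let [[_ k v] nil]
  [k v])
;; => [nil nil]
```

The second case is worth remembering: when the regex finds nothing, the names are bound to nil rather than an error being raised at the binding.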

So now we have two local bindings, k-string and val, both strings. We return a vector of these two values after parsing k-string into an integer. It looks like this

user=> (split-line "11111 hello")
[11111 "hello"]
user=> (split-line "bogus")
nil

This is something we can work with.

Making the Map

We now have the word list file expressed as a sequence of lines and we have a function that converts a line into a key/value vector. It should be easy to make a hashmap out of that

(def wordlist-map (into {} (map split-line (get-wordlist))))

Let’s see what we get

user=> wordlist-map
{25535 "fix", 26651 "gavel", 41235 "luke", 55614 "strewn", 55121 "sound", 64525 "wyner", ...
user=> (wordlist-map 25535)
"fix"
user=> (wordlist-map 66666)
"@"
user=> (wordlist-map 11111)
"a"
user=> (wordlist-map 11117)
nil
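
The same pipeline can be tried offline on a few hand-written lines. In this sketch line->pair is a made-up stand-in for split-line, and the sample lines are copied from the excerpt of the word list shown earlier.

```clojure
(defn line->pair
  "Hypothetical stand-in for split-line: turns one line into a [key word] pair."
  [line]
  (when-let [[_ k word] (re-find #"([1-6]{5}).(.*)" line)]
    [(Integer/parseInt k) word]))

;; into pours the [key word] pairs into an empty map
(def sample-map
  (into {} (map line->pair ["31533\tgrand" "31534\tgrant" "31535\tgrape"])))

sample-map                          ;; => {31533 "grand", 31534 "grant", 31535 "grape"}
(sample-map 31534)                  ;; => "grant"
(sample-map 11117)                  ;; => nil, since 7 is not a face on a die
(get sample-map 11117 :no-such-key) ;; => :no-such-key
```

A map can be called like a function of its key, which is what the REPL session above relies on; get with a default is handy when nil would be ambiguous.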

Seems like it works. There is one minor change we might make. When the code that defines wordlist-map is loaded, there is a noticeable delay. This is because it has to make a network call, pull back a file, and then parse 7776 lines of text into a map. It’s not horrible, but if we’d rather load that map lazily we can do this:

(def wordlist-map (delay (into {} (map split-line (get-wordlist)))))

We just wrapped the original into call in a delay. A delay postpones the evaluation of its body until it is explicitly dereferenced. With this delay in place, wordlist-map looks like this

user=> wordlist-map
#object[clojure.lang.Delay 0x1a4e0f43 {:status :pending, :val nil}]

It now returns immediately when it is loaded but you get this object thing with a status of :pending. That just means that the delay hasn’t been realized yet.

Here, see?

user=> (realized? wordlist-map)
false
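
A toy delay that counts how often its body runs makes the whole lifecycle visible, including the dereferencing with @ that is described next. The names calls and slow-value are invented for this sketch:

```clojure
(def calls (atom 0))

;; The body runs on the first dereference only.
(def slow-value
  (delay
    (swap! calls inc)   ; side effect, so we can count evaluations
    {11111 "a"}))

(realized? slow-value) ;; => false
(@slow-value 11111)    ;; => "a"  (body runs here)
(@slow-value 11111)    ;; => "a"  (cached, body does not run again)
(realized? slow-value) ;; => true
@calls                 ;; => 1
```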

To dereference wordlist-map you add the @ character to the front of the var. A call to @wordlist-map will execute the download and make the map and return it. It’ll also cache the result so that it won’t have to do it all again the next time you want to use it. The only change is that you now always have to dereference the var to get the value.

user=> (@wordlist-map 25535)
"fix"

Looks like we’ve got one more chapter in our little saga. We’ll be putting it all together, so you’ll definitely want to tune in for that. Let’s end by reprinting the code we wrote here.

(require '[clojure.java.io :as io])
(def wordlist-url (java.net.URL. "http://world.std.com/~reinhold/diceware.wordlist.asc"))

(defn get-wordlist []
  (filter (fn [line] (re-matches #"[1-6]{5}.*" line)) 
          (line-seq (io/reader wordlist-url))))

(defn split-line [line]
  ;; when-let returns nil (instead of throwing) when the line doesn't match
  (when-let [[_ k-string val] (first (re-seq #"([1-6]{5}).(.*)" line))]
    [(Integer/parseInt k-string) val]))

(def wordlist-map (delay (into {} (map split-line (get-wordlist)))))