Even after the Snowden revelations, there remained one big mystery about what the NSA was doing and how. The NSA’s classified 2013 budget request mentioned, as a priority item, “groundbreaking cryptanalytic capabilities to defeat adversarial cryptography and exploit internet traffic.” There was a requested increase, of several hundred million dollars, for “cryptanalytic IT services” and “cryptanalysis and exploitation services program C” (whatever that was). And a classified presentation slide showed encrypted data being passed to a high-performance computing system called “TURMOIL,” and decrypts coming out. But whatever was going on inside TURMOIL seemed to be secret even *within* NSA; someone at Snowden’s level wouldn’t have had access to the details.

So, what was (or is) inside the NSA’s cryptanalytic black box? A quantum computer? Maybe even one that they bought from D-Wave? (Rimshot.) A fast classical factoring algorithm? A proof of P=NP? Commentators on the Internet rushed to suggest each of these far-reaching possibilities. Some of us tried to pour cold water on these speculations—pointing out that one could envision many scenarios that were a little more prosaic, a little more tied to the details of how public-key crypto is actually used in the real world. Were we just naïve?

This week, a new bombshell 14-author paper (see also the website) advances an exceedingly plausible hypothesis about what *may* have been the NSA’s greatest cryptanalytic secret of recent years. One of the authors is J. Alex Halderman of the University of Michigan, my best friend since junior high school, who I’ve blogged about before. Because of that, I had some advance knowledge of this scoop, and found myself having to do what regular *Shtetl-Optimized* readers will know is the single hardest thing in the world for me: *bite my tongue and not say anything.* Until now, that is.

Besides Alex, the other authors are David Adrian, Karthikeyan Bhargavan, Zakir Durumeric, Pierrick Gaudry, Matthew Green, Nadia Heninger, Drew Springall, Emmanuel Thomé, Luke Valenta, Benjamin VanderSloot, Eric Wustrow, Santiago Zanella-Béguelink, and Paul Zimmermann (two of these, Green and Heninger, have previously turned up on *Shtetl-Optimized*).

These authors study vulnerabilities in Diffie-Hellman key exchange, the “original” (but still widely-used) public-key cryptosystem, the one that predates even RSA. Diffie-Hellman is the thing where Alice and Bob first agree on a huge prime number p and a number g, then Alice picks a secret a and sends Bob g^{a} (mod p), and Bob picks a secret b and sends Alice g^{b} (mod p), and then Alice and Bob can both compute (g^{a})^{b}=(g^{b})^{a}=g^{ab} (mod p), but an eavesdropper who’s listening in only knows p, g, g^{a} (mod p), and g^{b} (mod p), and one can plausibly conjecture that it’s hard from those things alone to get g^{ab} (mod p). So then Alice and Bob share a secret unknown to the eavesdropper, which they didn’t before, and they can use that secret to start doing cryptography.

As far as anyone knows today, the best way to break Diffie-Hellman is simply by calculating discrete logarithms: that is, solving the problem of recovering a given only g and h=g^{a} (mod p). At least on a classical computer, the fastest known algorithm for discrete logarithms (over fields of prime order) is the number field sieve (NFS). Under plausible conjectures about the distribution of “smooth” numbers, NFS uses time that grows like exp((1.923+o(1))(log p)^{1/3}(log log p)^{2/3}), where the exp and logs are base e (and yes, even the lower-order stuff like (log log p)^{2/3} makes a big difference in practice). Of course, once you know the running time of the best-known algorithm, you can then try to choose a key size (that is, a value of log(p)) that’s out of reach for that algorithm on the computing hardware of today.

(Note that the recent breakthrough of Antoine Joux, solving discrete log in in quasipolynomial time in fields of small characteristic, also relied heavily on sieving ideas. But there are no improvements from this yet for the “original” discrete log problem, over prime fields.)

But there’s one crucial further fact, which has been understood for at least a decade by theoretical cryptographers, but somehow was slow to filter out to the people who deploy practical cryptosystems. The further fact is that in NFS, *you can arrange things so that almost all the discrete-logging effort depends only on the prime number p, and not at all on the specific numbers g and h for which you’re trying to take the discrete log*. After this initial “precomputation” step, you then have a massive database that you can use to speed up the “descent” step: the step of solving of g^{a}=h (mod p), for any (g,h) pair that you want.

It’s a little like the complexity class P/poly, where a single, hard-to-compute “advice string” unlocks exponentially many inputs once you have it. (Or a bit more precisely, one could say that NFS reveals that exponentiation modulo a prime number is *sort of* a trapdoor one-way function, except that the trapdoor information is subexponential-size, and given the trapdoor, inverting the function is still subexponential-time, but a milder subexponential than before.)

The kicker is that, in practice, a large percentage of all clients and servers that use Diffie-Hellman key exchange use the same few prime numbers p. This means that, if you wanted to decrypt a large fraction of all the traffic encrypted with Diffie-Hellman, you wouldn’t need to do NFS over and over: you could just do it for a few p’s and cache the results. This fact can singlehandedly change the outlook for breaking Diffie-Hellman.

The story is different depending on the key size, log(p). In the 1990s, the US government insisted on “export-grade” cryptography for products sold overseas (what a quaint concept!), which meant that the key size was restricted to 512 bits. For 512-bit keys, Adrian et al. were able to implement NFS and use it to do the precomputation step in about 7 days on a cluster with a few thousand cores. After this initial precomputation step (which produced 2.5GB of data), doing the descent, to find the discrete log for a specific (g,h) pair, took only about 90 seconds on a 24-core machine.

OK, but no one *still* uses 512-bit keys, do they? The first part of Adrian et al.’s paper demonstrates that, because of implementation issues, even today you can force many servers to “downgrade” to the 512-bit, export-grade keys—and then, having done so, you can stall for time for 90 seconds as you figure out the session key, and then do a man-in-the-middle attack and take over and impersonate the server. It’s an impressive example of the sort of game computer security researchers have been playing for a long time—but it’s really just a warmup to the main act.

As you’d expect, many servers today are configured more intelligently, and will only agree to 1024-bit keys. But even there, Adrian et al. found that a large fraction of servers rely on just a single 1024-bit prime (!), and many of the ones that don’t rely on just a few other primes. Adrian et al. estimate that, for a single 1024-bit prime, doing the NFS precomputation would take about 45 million years using a single core—or to put it more ominously, 1 year using 45 million cores. If you built special-purpose hardware, that could go down by almost two orders of magnitude, putting the monetary cost at a few hundred million dollars, completely within the reach of a sufficiently determined nation-state. Once the precomputation was done, and the terabytes of output stored in a data center somewhere, computing a particular discrete log would then take about 30 days using 1 core, or mere minutes using a supercomputer. Once again, none of this is assuming any algorithmic advances beyond what’s publicly known. (Of course, it’s possible that the NSA *also* has some algorithmic advances; even modest ones could obviate the need for special-purpose hardware.)

While writing this post, I did my own back-of-the-envelope, and got that using NFS, calculating a 1024-bit discrete log should be about 7.5 million times harder than calculating a 512-bit discrete log. So, extrapolating from the 7 days it took Adrian et al. to do it for 512 bits, this suggests that it might’ve taken them about 143,840 years to calculate 1024-bit discrete logs with the few thousand cores they had, or 1 year if they had 143,840 times as many cores (since almost all this stuff is extremely parallelizable). Adrian et al. mention optimizations that they expect would improve this by a factor of 3, giving us about 100 million core-years, very similar to Adrian et al.’s estimate of 45 million core-years (the lower-order terms in the running time of NFS might account for some of the remaining discrepancy).

Adrian et al. mount a detailed argument in their paper that all of the details about NSA’s “groundbreaking cryptanalytic capabilities” that we learned from the Snowden documents match what *would* be true if the NSA were doing something like the above. The way Alex put it to me is that, sure, the NSA might not have been doing this, but if not, then he would like to understand *why* not—for it would’ve been completely feasible within the cryptanalytic budget they had, and the NSA would’ve known that, and it would’ve been a very good codebreaking value for the money.

Now that we know about this weakness of Diffie-Hellman key exchange, what can be done?

The most obvious solution—but a good one!—is just to use longer keys. For decades, when applied cryptographers would announce some attack like this, theorists like me would say with exasperation: “dude, why don’t you fix *all* these problems in one stroke by just, like, increasing the key sizes by a factor of 10? when it’s an exponential against a polynomial, we all know the exponential will win eventually, so why not just go out to where it does?” The applied cryptographers explain to us, with equal exasperation in their voices, that there are all sorts of reasons why not, from efficiency to (maybe the biggest thing) *backwards-compatibility.* You can’t unilaterally demand 2048-bit keys, if millions of your customers are using browsers that only understand 1024-bit keys. On the other hand, given the new revelations, it looks like there really will be a big push to migrate to larger key sizes, as the theorists would’ve suggested from their ivory towers.

A second, equally-obvious solution is to *stop relying so much on the same few prime numbers* in Diffie-Hellman key exchange. (Note that the reason RSA isn’t vulnerable to this particular attack is that it inherently requires a different composite number N for each public key.) In practice, generating a new huge random prime number tends to be expensive—taking, say, a few minutes—which is why people so often rely on “standard” primes. At the least, we could use libraries of millions of “safe” primes, from which a prime for a given session is chosen randomly.

A third solution is to migrate to elliptic-curve cryptography (ECC), which as far as anyone knows today, is much less vulnerable to descent attacks than the original Diffie-Hellman scheme. Alas, there’s been a lot of understandable distrust of ECC after the DUAL_EC_DBRG scandal, in which it came out that the NSA backdoored some of NIST’s elliptic-curve-based pseudorandom generators by choosing particular elliptic curves that it knew how handle. But maybe the right lesson to draw is mod-p groups and elliptic-curve groups *both* seem to be pretty good for cryptography, but the mod-p groups are less good if everyone is using the same few prime numbers p (*and* those primes are “within nation-state range”), and the elliptic-curve groups are less good if everyone is using the same few elliptic curves. (A lot of these things do seem pretty predictable with hindsight, but how many did *you* predict?)

Many people will use this paper to ask political questions, like: hasn’t the NSA’s codebreaking mission once again usurped its mission to ensure the nation’s information security? Doesn’t the 512-bit vulnerability that many Diffie-Hellman implementations still face, as a holdover from the 1990s export rules, illustrate why encryption should never be deliberately weakened for purposes of “national security”? How can we get over the issue of backwards-compatibility, and get everyone using strong crypto? People absolutely should be asking such questions.

But for readers of this blog, there’s one question that probably looms even larger than those of freedom versus security, openness versus secrecy, etc.: namely, the question of theory versus practice. Which “side” should be said to have “won” this round? Some will say: those useless theoretical cryptographers, they didn’t even know how their coveted Diffie-Hellman system could be broken in the real world! The theoretical cryptographers might reply: *of course* we knew about the ability to do precomputation with NFS! This wasn’t some NSA secret; it’s something we discussed openly for years. And if someone told us how Diffie-Hellman was actually being used (with much of the world relying on the same few primes), we could’ve immediately spotted the potential for such an attack. To which others might reply: then why didn’t you spot it?

Perhaps the right lesson to draw is how silly such debates really are. In the end, piecing this story together took a team that was willing to do everything from learning some fairly difficult number theory to coding up simulations to poring over the Snowden documents for clues about the NSA’s budget. Clear thought doesn’t respect the boundaries between disciplines, or between theory and practice.

*(Thanks very much to Nadia Heninger for reading this post and correcting a few errors in it. For more about this, see Bruce Schneier’s post or Matt Green’s post.)*

by Scott at May 23, 2015 01:41 AM UTC