When Random Isn't Random Enough: Lessons from an Online Poker Exploit (lauradhamilton.com)
185 points by lauradhamilton on Feb 9, 2014 | hide | past | favorite | 84 comments


As recently as 2010 we were finding major flaws in online poker security. Here are a couple of videos I made of us sniffing hole cards out of the air because sites were lying about their use of SSL. They were using XOR encryption. Insane.

http://www.youtube.com/watch?v=4HBUe8Fb73Q http://www.youtube.com/watch?v=AAQDEXJdbQc


Ouch, I suppose the moral of the story is don't play poker for money using a wireless connection.


The moral of the story is don't play poker for money where you suspect MITM to be in effect, because the connection is not secure.


It wasn't a MITM, although I've done that too: ARP-flood the router and redirect the traffic through myself. That only works on sites that didn't validate the peer's SSL cert.

These were just packet dumps; I wasn't even associated with the WAP. It's hard to remember the exact details, but I believe I was dumping the packets, decrypting them with the WEP key, then piping them into a C program which simply applied the decryption key to the packets.


Why would they even send that information to the client before it was needed anyway? Keep it on the server and push it when, and only when, the client needs to see it.


It's catching packets heading to other clients to grab the information they don't send to you.

Or it would be if it were put to use for evil.


Yeah, I remember that. :)


The solution here, which the article fails to mention, and which every security expert will undoubtedly tell you, is to make sure you use super random numbers (that's the technical term, for the layperson) by adding two random numbers together.


Hah, nice one. For people not getting the joke: adding two uniform random numbers doesn't make the result more random; it concentrates the distribution around the mean.
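The effect is easy to see with a quick simulation; a minimal Python sketch, with dice standing in for any uniform source:

```python
import random
from collections import Counter

rng = random.Random(0)
trials = 100_000

# One uniform draw vs. the sum of two independent uniform draws.
one = Counter(rng.randint(1, 6) for _ in range(trials))
two = Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(trials))

# A single die is flat; the sum piles up in the middle around 7.
print(max(two, key=two.get))   # 7
print(two[7] / two[2])         # roughly 6, since P(7)=6/36 while P(2)=1/36
```

A single die's counts come out nearly equal; the two-dice sum hits 7 about six times as often as 2 or 12.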

Interestingly, the perception that adding random numbers together produces even more random numbers is behind the popularity of the scam game "razzle". In razzle there's usually a board containing an array of depressions, each lined up with a different value, onto which a number of marbles is dropped (there are other ways to play as well, including dice). The important part is the scoring board. After each toss the values where the marbles landed are added up, and then a board is consulted to see how many "points" that total scores. The game looks easy: get to 10 points and you win.

However, there are two tricks. First, the scoring board is arranged in non-sequential order. This conceals the fact that the group of middle totals wins no points at all. In actuality it is very difficult to win any points, since the probabilities are all concentrated in the middle. Second, because of the scattered layout of the scoring board, it's very easy for the person running the game to cheat in your favor by "accidentally" giving you points you haven't earned.

The scam then works fairly simply. People pay money for each throw for the opportunity to win a high-value prize. For the early throws the operator goes quickly and fudges the score lookups, building up points the player hasn't actually earned and giving them unwarranted confidence in the game. Once the player gets within a point or so of winning, the operator stops cheating and lets them play completely fairly on their own, at which point their odds of winning are worse than a thousand to one (keeping in mind that every throw costs money).


No! You call a blocking rand function bound by available entropy.

If your random source is compromised, adding two numbers from the same broken source does nothing. What you can do though, is XOR numbers from independent random sources to improve the entropy of the final output. (not sure if that's what you meant by adding random numbers together)


Oh, I get it. These are both joke comments that are commenting on the weird number of urban legends surrounding randomness.


No actually the follow-on was serious.

You can prove to yourself that XOR-ing independent sources of entropy works:

Just remember that XOR is commutative, so you can rearrange the terms. Then recall that an XOR is how you implement a one-time pad.

So suppose you XOR together 5 sources of "entropy": #1 is super broken and outputs all zeros, #2 is secure and independent, #3 is an NSA program, and #4 and #5 are weak, broken RNGs. It doesn't matter, because the XORs can be rearranged in your mind so that it's clear #2 is still acting as an OTP on the rest (it comes last). As long as it is uniform and independent (the key really does get thrown away), you are good to go, by the definition of an OTP. If the result had less than full entropy, that would mean an OTP leaks recoverable information.

Meanwhile, of course, the sources DO need to be independent. If #3 knows the stream that #2 is outputting, it can come later in the chain and undo it by simply copying #2's output.

This is why independence is important.

So in summary, each XOR independently implements a one-time pad on all the rest of the XOR results, whilst throwing away the key. If even one source out of a hundred is actually random, no matter which one, then the result is just as good as if you were using that source directly. As long as they are independent sources.
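A quick empirical check of this argument (a sketch; random.Random stands in for the independent sources, and the source names are illustrative):

```python
import random
from collections import Counter

rng = random.Random(42)

def broken():   # source 1: completely broken, outputs all zeros
    return 0

def biased():   # source 2: weak, only ever emits 0..3
    return rng.randrange(4)

def good():     # source 3: uniform over the full byte range
    return rng.randrange(256)

# XOR the three together; the one good source acts as a pad on the rest.
counts = Counter(broken() ^ biased() ^ good() for _ in range(100_000))

print(len(counts))   # 256: every byte value shows up, roughly uniformly
```

Despite two of the three sources being worthless, the combined output covers all 256 byte values with near-uniform frequency.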


Really? I would have expected better from you, given your reputation. :-)

Read carefully what I said. Explained more clearly, XORing is at worst a ceiling function of the entropy of two random sources. If it weren't, that would imply one-time pad is insecure.


I was reacting to "blocking function bound by available entropy".


If you care about having good random bytes, you should call a blocking random function, i.e. one that doesn't return random data until enough entropy has been collected (e.g. /dev/random).

If you don't care about security, you can certainly call one that always returns, using a PRNG (/dev/urandom). However, since that is based on a much smaller random seed, using it to generate keys massively reduces the key space an attacker would need to search.

In summary, reading from /dev/random is a blocking function bound by available entropy, and it's more secure than its /dev/urandom counterpart, which is not bound by entropy.

I'm not sure what your problem is there. Care to elaborate?


Doing nothing but an XOR is usually fine but one thing to keep in mind is that an attacker that can see one stream and control the other could completely eliminate any entropy yet make the data look completely random.

This sounds like some unlikely scenario but for instance the Linux kernel uses a method like this for /dev/random. Entropy is collected and mixed in an entropy pool from many sources but at the end it is XORed with the output of Rdrand on processors that support it. The NSA could force Intel to sign a malicious microcode update that changes Rdrand to AES_encrypt(i++, NSA_KEY) ^ entropy_pool and then any random data coming from the kernel is completely predictable and without NSA_KEY it wouldn't be discernible from truly random data.
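A toy model of that attack (SHA-256 stands in for the hypothetical AES_encrypt(i++, NSA_KEY) stream; the key name is illustrative):

```python
import os
import hashlib

def attacker_stream(i):
    # Stand-in for AES_encrypt(i, NSA_KEY): indistinguishable from random
    # to everyone except whoever holds the key.
    return hashlib.sha256(b"NSA_KEY" + str(i).encode()).digest()[:16]

pool = os.urandom(16)   # honest entropy pool

# A malicious last-stage RNG that can *observe* the pool cancels it out:
evil = bytes(a ^ b for a, b in zip(attacker_stream(0), pool))

# The kernel's final XOR of the pool with the hardware RNG output:
out = bytes(a ^ b for a, b in zip(pool, evil))

# The output looks random, but the attacker can regenerate it exactly.
print(out == attacker_stream(0))   # True
```

The point is that XOR-last is only safe if the last source cannot see what it is being XORed with.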


That's not really right. For that scheme to work the attacker would also need to leak the entropy_pool itself. That's why the Linux kernel's random source IS safe from bad hardware: even if the hardware source is completely known to a third party, they can't distinguish the output from random data, because it has been XORed with random data.

If it helps to think about it, imagine that the evil hardware always outputs 00000000 for a byte of randomness. The kernel then XORs that with a byte from the entropy_pool, which is unknown to the adversary. The output byte is still completely unknown to the adversary, even though it knows one of the inputs.


I believe that chops was being sarcastic in ascribing a typical naive and useless attempt at "improving" randomness to "experts". Come on, "super random numbers"??


Yes, it was indeed a joke. And one which has apparently fallen flat.


No no, I got it, much enjoyed.


No, it worked perfectly.

If you have to explain a joke to someone, that means the joke is making fun of them!


I guarantee you these guys were not using test-driven development; they needed to TEST that their random number generator is random enough.

With TDD they would never have had these problems.


Strangely, the name of the submitter's company is "Additive Analytics".


I understand that "swap with entire deck" can't possibly be uniform because it has 52^n input possibilities, which is not divisible by 52! (and that the correct Fisher-Yates having 52! input possibilities and being able to generate every possible outcome is one way to prove that it is uniform). However, I'm not sure I can come up with an intuition for why any particular bias should exist, or why there is a discontinuity that makes it much more likely for a card to end up a short distance after its starting position:

http://en.wikipedia.org/wiki/File:Orderbias.png

Anyone have a good explanation?


I'm not entirely sure, but I have been able to figure a few things out.

By running some simulations by shuffling the range from 1 to n (with n going from 3 to 7), I found that at least one of the most common permutations had always started with 2. I'm unable to come up with a reason to explain this, but I was able to figure out something else interesting.

Imagining that the deck is vertical and going through each card and swapping it with a random card in the deck, the random card that has been swapped will never move further up the deck while the other card can still possibly move further up the deck. This implies that cards at the top of the deck should stay near the top and the cards at the bottom should stay near the bottom. I tested this by taking the sum of the sums of the first half of all the permutations and the sum of the sums of the second half of all the permutations and found that the total sum of the first halves was slightly smaller than the total sum of the second halves.

  n first halves | second halves
  3           53 |            54
  4         1265 |          1295
  5        18322 |         18976
  6       461683 |        498093
  7      9638931 |      10051128
So it seems that cards starting near the top are more likely to end up near the top and cards starting near the bottom are more likely to end up near the bottom.

Note: for odd numbers I threw out the middle number.
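The n = 3 row of the table can be reproduced exactly by enumerating all 3^3 equally likely swap sequences instead of sampling (a sketch):

```python
from itertools import product

def naive_shuffle(n, choices):
    """Flawed shuffle: swap each position i with the given position j."""
    deck = list(range(1, n + 1))
    for i, j in enumerate(choices):
        deck[i], deck[j] = deck[j], deck[i]
    return deck

n = 3
first_half = second_half = 0
for choices in product(range(n), repeat=n):   # all n**n swap sequences
    deck = naive_shuffle(n, choices)
    first_half += deck[0]     # middle card discarded for odd n
    second_half += deck[-1]

print(first_half, second_half)   # 53 54
```

The exact totals, 53 and 54, match the table: the top of the deck is biased toward small (early) cards.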


Here's my attempt at an intuitive explanation:

After the first swap, the distribution of numbers at the first position is uniform. What happens in the second swap? Well, the second position contains 1 with probability 1/n, and 2 with probability (n-1)/n. With probability 1/n, this position that is highly biased towards 2 (for n > 2) will get swapped with the first position. So 2 is more likely to be found in the first position.

Of course, it gets messy to carry this out for the n swaps and I still haven't really answered your question yet. To do that let's make the following assumption: after the ith swap, the distribution of numbers at position i is uniform. (I think you can prove this inductively but I haven't tried working it out - I was studying for a midterm before I got nerd-sniped.)

Before the ith swap, position i is biased towards number i. By our assumption, position i-1 is uniform. By making position i uniform we are essentially "smoothing out" the bias at position i to the other positions. Hence, as in the example I started with, it is more likely after swap i that position i-1 will contain number i, at the expense of the other numbers. This then reduces the likelihood of finding number k in positions 1, 2, etc. as opposed to position k-1: 2 is more likely in 1, 3 is more likely in 2, and so on.


There's a decent explanation here:

http://www.codinghorror.com/blog/2007/12/the-danger-of-naive...

There's also a more complete story about the online poker exploit here:

http://www.cigital.com/papers/download/developer_gambling.ph...


It seems to me that the only major issue here is using a seed which can be trivially brute forced. Even if you don't look around the expected server time in order to guess the seed more quickly, 32 bits is really not hard at all to brute force these days.

I don't believe the number of bits the PRNG can generate is an issue here since we only need to uniformly get a number between 1 and 52, though what may be questionable is the cycle length of the PRNG if it weren't using an easily brute forced seed.

I'm not entirely convinced the off-by-1 is substantial, nor the fact that the shuffle produces duplicate shuffles (I can't intuit a significant bias, so I may well be wrong here).

So to summarize: never seed a PRNG with a small and easily brute forced value.


"I'm not entirely convinced the off-by-1 is substantial".

What? You don't think always putting ace of spades at the bottom of the deck is substantial? (given that's what "52" represents)


That wouldn't be the result. The result is that the final card is swapped only once, and can never be swapped with itself, meaning it is always somewhere else in the deck. If the final card is the ace of spades, then the ace of spades can never be the last card in the shuffled deck.

In most casino card games, shuffles happen before this knowledge becomes particularly useful, which is why I don't think it's necessarily the major cause for concern here. Being able to easily guess the seed is, however, a pretty massive issue.
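The property is easy to confirm by simulating the flawed shuffle (0-indexed here, so randrange(51) plays the role of the Pascal random(51)+1):

```python
import random

def flawed_shuffle(rng):
    deck = list(range(1, 53))        # card 52 starts in the last position
    for i in range(52):
        j = rng.randrange(51)        # indices 0..50 only: the last slot
        deck[i], deck[j] = deck[j], deck[i]   # is never a swap target
    return deck

rng = random.Random(1)
last_cards = {flawed_shuffle(rng)[51] for _ in range(10_000)}

print(52 in last_cards)   # False: the original last card never ends up last
```

Since position 52 is only touched on the final iteration, where it is swapped with one of positions 1..51, the card that started there always ends up somewhere else.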


> Being able to easily guess the seed is, however, a pretty massive issue.

Particularly since they initialise the deck each time (it'd be a massive issue anyway, but this just makes it worse).

Starting with 2^32 states, and knowing their algorithm, you have ~4 billion possible shuffles. However, each card you see drastically reduces the search space. My reasoning is that each card you know reduces the possible number by about how many cards it could have been (so no knowledge is 2^32, one card is 2^32 / 52 ...)

Knowing just one card brings you to one of 83 million possible decks. One more card and you're down to just 1.6 million.

Once the flop is down, you know 5 cards and that means there are only about 15 possible decks.

I think this reasoning is right, but it does seem rather dramatic. I'll have to write a simulation, I've been looking for a good blog post to work through & to work on my visualisations.
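A toy version of that simulation, shrunk to a 16-bit seed space so it runs in about a second (Python's random.shuffle stands in for the site's shuffle; the "secret" seed is made up):

```python
import random

SEEDS = range(2 ** 16)   # toy seed space; the real flaw had 2^32

def shuffle_for_seed(seed):
    deck = list(range(52))
    random.Random(seed).shuffle(deck)
    return deck

secret = 54_321                           # the "server's" hidden seed
observed = shuffle_for_seed(secret)[:2]   # we see our two hole cards

# Filter the seed space against each observed card in turn.
one_card = [s for s in SEEDS if shuffle_for_seed(s)[0] == observed[0]]
two_cards = [s for s in one_card if shuffle_for_seed(s)[1] == observed[1]]

# Each card divides the candidate pool by roughly the number of cards
# it could have been (~52, then ~51).
print(len(one_card), len(two_cards))
```

With 2^16 seeds, one observed card leaves about 1,260 candidates and two leave about 25, the true seed always among them, which matches the divide-by-52-per-card reasoning above.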


> I can't intuit a significant bias, so I may well be wrong here

Assume you have a deck of 3 cards. If you swap each of the 3 positions with a randomly chosen position, there are 3^3 = 27 equally likely outcomes. But there are only 3! (= 6) permutations. Obviously 27 isn't evenly divisible by 6, so some permutations are represented more often than others.

It's not as big an issue, but it is an issue, and it's a mistake many people are likely to make.
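Enumerating the 27 outcomes of the 3-card case directly shows the skew (a sketch):

```python
from itertools import product
from collections import Counter

def naive_shuffle(choices):
    """Swap each position i with the chosen position j."""
    deck = [1, 2, 3]
    for i, j in enumerate(choices):
        deck[i], deck[j] = deck[j], deck[i]
    return tuple(deck)

# All 3**3 = 27 equally likely swap sequences, tallied by final order.
counts = Counter(naive_shuffle(c) for c in product(range(3), repeat=3))

print(sorted(counts.values()))   # [4, 4, 4, 5, 5, 5]
```

Three of the six orderings occur 5 times out of 27 and the other three only 4 times, so half the permutations are 25% likelier than the rest.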


Direct link to the full article with a detailed explanation of the exploit: http://www.cigital.com/papers/download/developer_gambling.ph...


Ignoring that some of the variables don't match up properly (the arrays: card and Card), it seems like the explanation of the first flaw may also be flawed.

Flaw #1: An Off-by-One Error

The algorithm above tries to iterate over each card in the deck, swapping each card with another randomly chosen card in the deck. However—every programmer has made this mistake before—there's an off-by-one error. The function random(n) returns a number between 0 and (n-1), not between 1 and n as the programmer intends. As a result, the algorithm will never swap the 52nd card with itself; the 52nd card can never end up in the 52nd place. So that is the first reason the "random" card shuffling isn't really random.

The comment refers to the Pascal code:

  random_number := random(51)+1;
If the programmer really thought that random(n) was between 1 and n, then the random_number variable would be a number between 2 and 52 (1+1 to 51+1). A better explanation is that they may have thought random(n) produced a random number between 0 and n, hence the need to increment by one. Another explanation is that they just messed up, using 51 instead of 52.

The point being that in the writer's explanation of the flaw they actually make the same mistake.

Funnily enough, googling "pascal random" points to a Stack Overflow question where the top answer makes the same error.

https://stackoverflow.com/questions/4965863/how-to-get-a-ran...


I think the programmer was trying to be defensive by avoiding an edge case. Either he was lazy about finding the documentation (this was 1999 or earlier, remember) and tried to be over-clever to avoid an "out of range" exception in test/production, or he never considered looking at the documentation at all. He may have just run it several times to see whether it ever actually generated the number passed as an argument.


This was an interesting article. (Font size is tiny using Chrome on iOS).

> If your business or technology depends on using random numbers, your best bet is to use a hardware random number generator.

Some hardware RNGs would be hopeless for this task. It'd be scary to have to buy one of these things and trust the output.


News/YC is my favorite iOS HackerNews client, it's free and beautiful, and comes with Readability so I never run into this problem. So many sites are either not responsive or do it badly, so it's a lifesaver.


I think you got downvoted because your comment reads like it was written by a marketing plant.


My b, I just get really enthusiastic about products that improve my life.


I haven't seen this link posted yet http://www.idquantique.com/random-number-generators/products...

Note they claim: "QUANTIS has also been approved by national authorities and can be used for gaming applications."

If I were implementing this for a casino, I'd do what other posters have already suggested and use at least two independent hardware sources for my random numbers and XOR them together. IMO Intel's on-chip RNG would probably be a good source to use, but only in conjunction with others.


> IMO Intel's on-chip RNG would probably be a good source to use

Only if NSA contractors don't play online poker in their spare time.


I'm curious how random the current generators in online poker actually are. I mean, some rather subtle patterns: situations that would generate larger pots and therefore more rake, or "helping" a new player win 50/50 situations to get him addicted.

I am not talking about 100% of the time dealing someone pocket kings, and someone else pocket aces and king on the flop.

Something subtle and very rare would be enough to account for large amounts of money at the end of the year, given the volume of the major poker sites. On the other hand, if someone leaked it, that might ruin the business for good.


The major sites don't do this. We know because many people out there collect literally millions of poker hands observed on these sites and mine the data for every kind of statistic you can think of. If anything significant was out of whack they would have picked it up. Look at the 'online poker' section of the twoplustwo forums for example.

The random number generators used by these sites are hardware systems that use micro fluctuations in ambient temperature (for example) as a source of entropy and they are very careful to use enough bits of entropy for every card shuffled.


> The random number generators used by these sites are hardware systems that use micro fluctuations in ambient temperature (for example) as a source of entropy and they are very careful to use enough bits of entropy for every card shuffled.

It's amusing to realize that they could just read from /dev/urandom with zero risk. They're probably not running Linux, but still.

So, for anyone who's wondering if you need this, or if this adds any extra security: probably not. There's no reason not to use the extremely well-tested and well-understood /dev/urandom.
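In application code this usually means not opening the device file at all: most languages expose the OS CSPRNG directly. A minimal Python sketch using the standard secrets module:

```python
import secrets

# secrets draws from the OS CSPRNG (the same pool /dev/urandom exposes on Linux).
card = secrets.randbelow(52)    # uniform in 0..51, no modulo bias
key = secrets.token_bytes(16)   # 128 bits of key material

print(card, key.hex())
```

randbelow avoids the modulo-bias trap of `os.urandom(1)[0] % 52`, which would favor low card values slightly.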


But their devs can't charge as much if they say "I implemented the random number generator; it's one line of bash" rather than "I created a thermal system connected to our server whose data we use as seeds to generate cryptographically random numbers".

Capitalism, we all play it ;)


Actually for something cryptographic or sensitive in nature you would want to read from /dev/random. The "u" in urandom stands for unlimited, basically if the entropy pool runs dry, reads from /dev/urandom will still return data but that data doesn't necessarily have a significant amount of entropy in it. Reads from /dev/random however will block and wait for more entropy if the entropy pool runs dry.


No, this is an urban myth perpetuated by a broken man page. Cryptographic software should use urandom, not random.


tptacek has addressed this concern many times before: https://hn.algolia.com/?q=author:tptacek%20urandom#!/comment...

The summary is that it's a myth.


FWIW, Windows has an equivalent of /dev/urandom. http://msdn.microsoft.com/en-us/library/windows/desktop/aa37...


Is CryptGenRandom equally secure as /dev/urandom, or are there ways to use it wrong?


Even the bad poker sites are using HW RNGs. I worked for a hybrid meatspace/online casino and we were using IDQ cards. Except when we were not. The owner got squirrely and went bottom-dollar to implement a standalone version of a carefully engineered system I had developed. The developers ended up using /dev/random instead of at least building a 1 GB RNG lookup table. The ticketing system was also unencrypted, so you could print whatever you wanted on a barcode and feed it to the redemption machine to clean it out. And the games were never tested statistically: literally, a guy sat there and played them until they "felt" right.

I tried to help them out, but eventually I washed my hands of everything and walked away. Too many weird things going on. From talking to others in the industry it is not that different elsewhere.

TL;DR - If you do work for casinos be ready to walk away when things get weird.


Since the odds of getting certain hands are known, and there are a lot of professionals with very large databases of hands, any manipulation like that would stand out pretty quickly as a statistical anomaly.


Professionals monitor the play of other players and identify statistical abnormalities. You would have to be clever and conservative enough to evade those kinds of checks, which, while I'm sure is possible, is not trivial.


Flaw #3 seems flawed to me. There are 52! possible ways to shuffle a deck of cards, but a game only uses a small subset. Suppose there are 4 players; then you need 2 times 4 plus 5, i.e. 13 cards. The remaining 39 cards can be ordered in 39! ways without affecting the game, and those possibilities are all folded into the 52! total. So with 4 players there are only 52!/39! distinguishable games. That's still larger than the 4 billion mentioned in the article, but it doesn't dwarf the 4 billion the way 8*10^67 does.
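Checking the arithmetic for the 4-player case (13 cards in play):

```python
import math

total_orderings = math.factorial(52)                       # ~8.07e67
distinct_games = math.factorial(52) // math.factorial(39)  # only 13 cards matter
seed_space = 2 ** 32

print(f"{distinct_games:.2e}")        # ~3.95e21
print(distinct_games // seed_space)   # still ~9e11 distinguishable games per seed
```

So 52!/39! is about 4 * 10^21: vastly smaller than 52!, but still roughly a trillion times the 32-bit seed space, so the "dwarfs 4 billion" point survives even under this correction.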


I admit I am a total noob here, but couldn't you make something with a TV tuned to a station with just static? I have often wondered about this but lack the 'propriate schoolin'.


In engineering terms, it's easier to use a reverse-biased diode as a noise source. An input circuit would transfer the diode's random waveform into a shift register as zeros and ones, until the desired word size has been assembled.

It's really quite simple, and it could produce a very high degree of randomness. It would differ from typical PRNGs in that the binary sequence could not be reproduced, no matter how much you knew about the circuit.

> I have often wondered about this but lack the 'propriate schoolin'.

That's an easily remedied problem. Remember what Mark Twain said: "I have never let my schooling interfere with my education".


Yeah, but AI is taking most of my free cycles at the moment. Still learning, but not much formal computer science. Thank you for responding.


> Yeah, but AI is taking most of my free cycles at the moment.

Given the present state of technology, that's probably a better use of your time. Someone else can produce a white noise source and mass-market it.


In engineering terms it would be easier to cover a shitty webcam with a bag, take a photo, and then use the lowest bit of each pixel as the source of entropy.

You can use other things, like microphone input, or CPU temperature.

No need to buy expensive hardware generators.

Then you can use something like Fortuna to generate more random numbers from that seed:

http://en.wikipedia.org/wiki/Fortuna_(PRNG)


> You can use other things, like microphone input,

There's now been reported a way to hack a computer by way of its microphone, so that particular method is out.

> or CPU temperature.

CPU temperature doesn't change enough to serve as a secure source of entropy.

> No need to buy expensive hardware generators.

In fact, there are plenty of projects meant to locate sources of entropy to provide more randomness for computer security purposes. The idea of a hardware source like a reverse-biased diode is just one example.

> Then you can use something like Fortuna ...

Interestingly, the Fortuna scheme relies on system sources of entropy other than its own resources to assure security.


> There's now been reported a way to hack a computer by way of its microphone, so that particular method is out.

No, it's not. It's in some server in a datacenter. If you can get to the server, you can just change the Poker code, no need to do any crazy microphone hacking. Plus you'd need to know exactly when and how the microphone was sampled.

> CPU temperature doesn't change enough to serve as a secure source of entropy.

Doesn't matter how much it changes, the lowest bits are random on a large enough time scale.

You can just run the sampler until you get enough bits, it will take a few minutes. Or you can run it constantly and add more and more entropy as you go.


> > There's now been reported a way to hack a computer by way of its microphone, so that particular method is out.

> No, it's not.

Excuse me? Here's the source:

http://www.scientificamerican.com/article/computers-can-be-h...

Here's a quote: "Computers Can Be Hacked Using High-Frequency Sound -- A computer's microphone and speakers can covertly send and receive data"

Which word didn't you understand?

>> CPU temperature doesn't change enough to serve as a secure source of entropy.

> Doesn't matter how much it changes, the lowest bits are random on a large enough time scale.

False. Computer temperatures aren't a decent source of entropy, because they're too likely to remain stable for long periods. And the temperature can be predicted on a daily basis, a dangerous property for a secure source. That's why this source isn't used.


"by distances of up to about 65 feet (20 meters)"

Good luck getting that close to a server in a data center.


> That's where this program is for: adding entropy-data to the kernel-driver. It does that by fetching 2 images from a video4linux-device (with a random delay in between), calculating the difference between those two and then calculating the number of information-bits in that data. After that, the data with the number-of-entropy-bits is submitted to the kernel-random-driver.

http://www.vanheusden.com/ved/

There's also an "audio" version of, basically, the same thing [0] that I intend on using in the near future. It's as simple as tuning an FM radio to "nothing" and connecting its headphone/line out to your sound card's input line.

[0]: http://www.vanheusden.com/aed/


http://www.random.org/ uses atmospheric radio noise to generate random numbers, similar to what you are proposing.


A similar idea was done by silicon graphics using a lava lamp - http://en.wikipedia.org/wiki/Lavarand


Someone had a go at that exact idea https://github.com/pwarren/rtl-entropy


I'm not a pro (not even an amateur, actually), but the very premise of "shuffling the deck" bewilders me. Shuffling the whole deck is so obviously bug-prone. Why not just pick random cards from the deck instead? If I ever wrote a deck simulator I'd never shuffle anything, just pick 1 out of the n <= 52 remaining cards when needed. Is this well-known approach too naive, or is it somehow flawed as well?


You're simulating a full game of poker. Once a player has been given card 'i', you have to ensure that card 'i' isn't drawn again during the game. You could maintain a set containing all cards that have already been drawn, and re-select your random number if you draw a duplicate, but that's going to get awfully laggy once large numbers of cards have been drawn.
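The lag is avoidable without a full shuffle: draw from the set of cards still in the deck, so duplicates are impossible by construction and no rejection loop is needed. A minimal sketch:

```python
import random

rng = random.SystemRandom()   # OS CSPRNG; a stand-in for a proper dealer RNG
deck = list(range(52))        # cards still available this hand

def deal(n):
    """Draw n distinct cards by indexing into the remaining deck --
    no re-select-on-duplicate loop, so cost never grows as cards run out."""
    hand = []
    for _ in range(n):
        i = rng.randrange(len(deck))
        hand.append(deck.pop(i))   # remove it so it can't be dealt again
    return hand

hole, flop = deal(2), deal(3)
print(len(set(hole + flop)), len(deck))   # 5 47
```

This is effectively an incremental Fisher-Yates: each draw is uniform over the remaining cards, so the dealt sequence has the same distribution as dealing from a correctly shuffled deck.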


I keep track of which cards have been drawn this round. Every time I need a new card drawn I use, say, one of 52 predefined generators, picking 1 out of n <= 52 depending on a millisecond number.

(I could even publish these: an attacker would still need to know which millisecond every single card in this round had been drawn to exploit it.)

I don't reselect random numbers; every time it's called, the DrawCard function excludes the cards currently in play from the deck and then applies one of those 52 predefined generators to the result of that ComplementarySet call (OK, only 51 of them are nontrivial, but it doesn't matter). The buffer of drawn cards in the current round is about 15 cards or so. It becomes useless once the round is over and is collected by the GC, or cleared in some other way. Nothing accumulates.


(52 predefined generators is probably an overkill but when you run an online casino then maybe it's not. :-)


Another lesson to be learned here is it's generally not a good idea to publish the code to a vulnerable, in-production system unless you are very, very sure there are no bugs.


I like to use this as an example of a problem that secure multiparty computation can solve, i.e. removing a buggy or malicious central dealer from a system.



Did this exploit actually get exploited? Or did they notify the site and give them an opportunity to fix it before they released their findings?


This is one of those times you'd really want to use an actual random number generator, rather than a pseudo-random number generator.


PRNG would be fine if you could ensure that there's no leakage or reversible information.


A CSPRNG is fine (just use any decent stream cipher), but as the article suggests, properly seeding it is crucial.


My exact point was that this is non-trivial to ensure.


Many regulatory licensing authorities of online gambling companies mandate the use of certified Hardware RNGs. Example: Alderney Gaming Commission. Certification includes Monte Carlo testing of the output, amongst other things.


true random based on atmospheric noise: random.org


That's a bad idea. You shouldn't be trusting random.org with your random data (what if they get hacked?). Also, if it's sent over HTTP, an attacker could listen in on the random data you were being sent (either at your end or at random.org's). Ultimately I think you'd do best to use several software methods and two hardware methods and just XOR them all together into a single secure source of random numbers. If you're doing this as a business, the small cost is well worth not having your random source become an issue.


> "...use several software methods and 2 hardware methods and just xor them all together..."

An xor is only okay if you ensure that all of your RNGs are completely independent - unable to affect or observe each other, and do not draw any of their input from any shared or correlated sources. If you've got two machines seeding their entropy pools with packet timings from the same network, then XORing their PRNG output is as likely to decrease entropy as increase it.


I do however think I am correct that provided any one of your sources is "true" random and independent, the XORed binary string will be completely random. You cannot XOR in an unknown random binary string and make the result less random.



