Shuffling arrays

Details: Last Updated: 05 September 2021

Surprisingly, today's accepted algorithm for shuffling a list was arrived at by Ronald Fraser and Frank Yates in 1938. Their version used paper and pencil, but it has since been updated for computers and popularised by Donald Knuth. Here's how to shuffle things properly.

Shuffling, the wrong way

Before we get to looking at the right way to do this, let's look at the wrong way. Here's a simple shuffle written in PHP:

// WARNING! This algorithm is known to be faulty. It is placed here for demonstration purposes only.
function naiveShuffle(array $arr):array {
    for ($i = 0; $i < count($arr);   $i++) {
        $ex = rand(0,count($arr)-1);
        [$arr[$i], $arr[$ex]] = [$arr[$ex], $arr[$i]] ;
    }
    return $arr;
}

At first it seems simple enough: iterate over the array, swapping each element with a randomly selected element from the list. What could go wrong?

Let's take a closer look. Take a three element list - [1,2,3]. There are six possible combinations, so we'll run the shuffle and look at the distribution of results.

This is what we get with a thousand iterations:

321	************************
123	***************************
231	*******************************
312	********************************
213	*************************************
132	****************************************

There's a definite bias over just a thousand iterations, but this is a random result, so it could be alright. Let's move on...

Not so fast!

Running the same code for a hundred thousand iterations yields this result:

312	*******************************
321	*******************************
123	*******************************
231	***************************************
132	***************************************
213	****************************************

We're clearly seeing certain outcomes favoured over others. There's obviously a problem here. If you're still not convinced, the code used to produce these results can be downloaded to play with.

The Fisher-Yates algorithm

This is the Fisher-Yates solution implemented in PHP:

function FYshuffle(array $arr):array {
    for ($i = sizeof($arr)-1;$i>0; $i--) {
        $ex = random_int(0, $i);
        [$arr[$ex],$arr[$i]] =[$arr[$i],$arr[$ex]];
    }
    return $arr;
}

At first glance it looks very similar. In fact, you might think it's the same, but look closely.

Here's the result of a hundred thousand iterations of a three-element list:

213	***********************************
312	***********************************
132	***********************************
123	***********************************
321	***********************************
231	***********************************

What's different?

There's no discernible bias in the Fisher Yates result, so how is this different?

Look closely at the two functions. The first iterates over the entire array, swapping every element with another randomly selected element. Fisher-Yates also iterates over the entire array, but at each iteration selects from only those elements that have not yet been visited.

Maths tells us that there are 3! ways to arrange 3 elements. That's six combinations. The incorrect algorithm selects from every element at every swap. That's three potential combination for the first swap, three for the second and three for the third. That's a total of 27 outcomes! 27 is not divisible by 6, so some outcomes must be favoured.

The Fisher-Yates solution generates three possible outcomes at the first swap, but only two outcomes on the second. Thus it produces exactly six outcomes.

The moral

This is the article I had intended to write, but the conclusion isn't what I expected. Yes, there's a right way and a wrong way to shuffle a list, but that's not all. A simple and poorly tested solution may exhibit subtly unexpected behaviour in production. As a professional developer it's important to remember that. The academics covered this ground long ago, and good libraries exist for many tasks. Use them.

The JavaScript version

For completeness, here's the Fisher Yates shuffle implemented in JavaScript

function shuffle(arr) {
    let randomNumbers =  new Uint32Array(arr.length);
    window.crypto.getRandomValues(randomNumbers);
    for (let i = arr.length-1; i>0; i--) {
        let ex = Math.floor((randomNumbers[i]/4294967295)*(i+1));
        [arr[ex],arr[i]] = [arr[i],arr[ex]];
    }
    return arr;
}

Notes

A poor algorithm isn't the only potential source of bias. Random number generators can also inject a bias where there would otherwise be none. See RANDU for a description of a particularly notorious implementation.

PHP includes a shuffle() function but it uses the Mersenne Twister random number generator which is not cyptographically secure. The function shown here uses the cryptographically secure random_int() function.

Similarly, Javascript's Math.random() function is not cryptographically secure. This implementation uses Crypto.getRandomValues() instead. The constant 4294967295 appears as the largest value that can be stored in an Unsigned 32-bit integer. It's used to scale the random number to the range of the size of the array.

Get the code!

The download includes this PHP and JavaScript implementations of the Fisher Yates algorithm, along with the incorrect function and the short PHP script usesd to generate the histograms above.

Questions or comments?

Contact me!

Pragma, not Dogma