Random shuffle Algorithm in python

Sample from a deck of cards, source: http://playingcards.wikidot.com/bicycle

Motivation

Recently I took the online course "Design of Computer Program" taught by professor Peter Norvig.
The course is simply great for any one who want to level up his programming skills.
In the first assignment Peter talked about random shuffle algorithm and how to write a correct one.

 What's a random shuffle algorithm?

Given a collection of objects, (list of user identifiers, a 52-card deck) the random shuffle algorithm should re-arrange the objects in a random way.
The end result is a permutation of the objects in the collection, but the algorithm must guarantee that all permutations are equally likely to appear, put simply the algorithm should not be biased toward any particular set of permutations.

Example

Let C = ["A", "B", "C"]
All possible permutation of C are ["ABC", "ACB", "BAC", "BCA", "CAB" , "CBA"]

If S is a random shuffle algorithm, then:
Probability( S(C) = "ABC" ) = 1/6
Probability( S(C) = "ACB" ) = 1/6
Probability( S(C) = "BAC" ) = 1/6
Probability( S(C) = "BCA" ) = 1/6
Probability( S(C) = "CAB" ) = 1/6
Probability( S(C) = "CBA" ) = 1/6

In the following we will try to come up with an efficient shuffler and make sure it's a fair one.

Bad shuffle algorithm

Suppose we have the following snippet of code.

The algorithm implemented through the function naive_shuffle works as following:
  1.  line 7: create a variable n to store the length of the array to be shuffled.
  2. Line 8: initialize a list named swapped with False values having size n.
  3. Line 10: while not all element in swapped are True:
  • Choose 2 indices i and j at random between 0 and n.
  • In the array to be shuffled swap element at indices i and j.
  • In the swapped array mark elements at indices i and j to True.
You may think that this algorithm is not the perfect but at least it works, frankly I have the same intuition when I first saw the above code.
Let me explain why this algorithm is not correct even if it shuffles the input array in a certain way.
First this algorithm may or may not quit the while loop depending on the implementation of the random.randrange function.
Second the algorithm do not guarantee that the different possible permutations of the input array are equally likely to show up in other words this algorithm show no signs to be fair.

Good shuffle algorithm

 As an example of a good and by good I mean correct and efficient shuffle algorithm, let's examine Knuth's P algorithm also know as Fisher-Yates shuffle algorithm.


The algorithm implemented through the function shuffle in plain English:
  1. Line 7: n denotes the size of the input array.
  2. Line 8: For every index i of the input array going from 0 up to n-2.
  3. Line 9: Choose a random index j between i and n-1, then swap element at i with element at j.

Is the algorithm fair (unbiased) ?
Yes, in fact for every position i in the input array we have exactly n-i possible swap candidates.
For position 0, there is n-0 = n possible swap candidates.
For position 1, there is n-1 possible swap candidates.
.
.
.
For position n-2, there is n - (n-2) = 2 possible candidates.
Thereof the total number of possible permutations generated by this algorithm is:

n * (n-1) * (n-2) * ... * 2 * 1 = n!, or n factorial which is exactly the number of permutations of n items.

 Simulation

Another way to prove that Knuth's P algorithm is unbiased in other words it generate all the possible permutations with equal probability and that the naive shuffle is biased is to run a simulation.

Result

fairly random: nope
shuffler:  naive_shuffler
input:  abc
permutations's distribution [('bca', '26.61%'), ('cab', '26.448%'), ('acb', '13.988%'), ('bac', '14.076%'), ('cba', '13.825%'), ('abc', '5.053%')]
------------------------------------
fairly random: yep
shuffler:  knuth_shuffler
input:  abc
permutations's distribution [('cab', '16.585%'), ('acb', '16.738%'), ('bca', '16.578%'), ('abc', '16.681%'), ('bac', '16.553%'), ('cba', '16.865%')]

On the one hand the result strongly suggest that the naive shuffler is a biased and therefore it's not a correct shuffler, on the other hand it give us more confidence on the fairness of Knuth's P algorithm.

Takeaway

  • When algorithms involve randomness then correctness is more than just getting some of the expected out puts.
  • Simulation is a very handy way to prove the fairness of algorithms that depend on randomness.

Comments