Random shuffle Algorithm in python
Motivation
Recently I took the online course "Design of Computer Program" taught by professor Peter Norvig.
The course is simply great for any one who want to level up his programming skills.
In the first assignment Peter talked about random shuffle algorithm and how to write a correct one.
What's a random shuffle algorithm?
Given a collection of objects, (list of user identifiers, a 52-card deck) the random shuffle algorithm should re-arrange the objects in a random way.
The end result is a permutation of the objects in the collection, but the algorithm must guarantee that all permutations are equally likely to appear, put simply the algorithm should not be biased toward any particular set of permutations.
Example
Let C = ["A", "B", "C"]
All possible permutation of C are ["ABC", "ACB", "BAC", "BCA", "CAB" , "CBA"]
If S is a random shuffle algorithm, then:
Probability( S(C) = "ABC" ) = 1/6
Probability( S(C) = "ACB" ) = 1/6
Probability( S(C) = "BAC" ) = 1/6
Probability( S(C) = "BCA" ) = 1/6
Probability( S(C) = "CAB" ) = 1/6
Probability( S(C) = "CBA" ) = 1/6
In the following we will try to come up with an efficient shuffler and make sure it's a fair one.
Bad shuffle algorithm
Suppose we have the following snippet of code.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import random | |
def swap(array, i, j): | |
array[i], array[j] = array[j], array[i] | |
def shuffle(array): | |
n = len(array) | |
swapped = [False]*n | |
while not all(swapped): | |
i, j = random.randrange(n), random.randrange(n) | |
swap(array, i, j) | |
swapped[i] = swapped[j] = True |
The algorithm implemented through the function naive_shuffle works as following:
- line 7: create a variable n to store the length of the array to be shuffled.
- Line 8: initialize a list named swapped with False values having size n.
- Line 10: while not all element in swapped are True:
- Choose 2 indices i and j at random between 0 and n.
- In the array to be shuffled swap element at indices i and j.
- In the swapped array mark elements at indices i and j to True.
Let me explain why this algorithm is not correct even if it shuffles the input array in a certain way.
First this algorithm may or may not quit the while loop depending on the implementation of the random.randrange function.
Second the algorithm do not guarantee that the different possible permutations of the input array are equally likely to show up in other words this algorithm show no signs to be fair.
Good shuffle algorithm
As an example of a good and by good I mean correct and efficient shuffle algorithm, let's examine Knuth's P algorithm also know as Fisher-Yates shuffle algorithm.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import random | |
def swap(array, i, j): | |
array[i], array[j] = array[j], array[i] | |
def shuffle(array): | |
n = len(array) | |
for i in range(n-1): | |
swap(array, i, random.randrange(i,n)) |
The algorithm implemented through the function shuffle in plain English:
- Line 7: n denotes the size of the input array.
- Line 8: For every index i of the input array going from 0 up to n-2.
- Line 9: Choose a random index j between i and n-1, then swap element at i with element at j.
Is the algorithm fair (unbiased) ?
Yes, in fact for every position i in the input array we have exactly n-i possible swap candidates.
For position 0, there is n-0 = n possible swap candidates.
For position 1, there is n-1 possible swap candidates.
.
.
.
For position n-2, there is n - (n-2) = 2 possible candidates.
Thereof the total number of possible permutations generated by this algorithm is:
n * (n-1) * (n-2) * ... * 2 * 1 = n!, or n factorial which is exactly the number of permutations of n items.
Simulation
Another way to prove that Knuth's P algorithm is unbiased in other words it generate all the possible permutations with equal probability and that the naive shuffle is biased is to run a simulation.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
def fact(n): | |
return 1 if n == 1 else n * fact(n-1) | |
def test_shufller(shuffler, items, n=100000): | |
counts = {} | |
for _ in range(n): | |
array = list(items) | |
shuffler(array) | |
p = ''.join(array) | |
counts[p] = counts.get(p, 0) + 1 | |
# exptected counts | |
expected = n / fact(len(deck)) | |
ok = all([0.99 <= val/expected <= 1.1 for val in counts.values()]) | |
if ok: | |
print('fairly random: yep') | |
else: | |
print('fairly random: nope') | |
print('shuffler: ', shuffler.__name__) | |
print('input: ', array) | |
print('permutations\'s distribution', [(key,val*100/n) for key, val in counts.items()]) | |
print('------------------------------------') | |
test_input = 'abc' | |
test_shufller(naive_shuffler, test_input) | |
test_shufller(knuth_shuffler, test_input) |
Result
fairly random: nope
shuffler: naive_shuffler
input: abc
permutations's distribution [('bca', '26.61%'), ('cab', '26.448%'), ('acb', '13.988%'), ('bac', '14.076%'), ('cba', '13.825%'), ('abc', '5.053%')]
------------------------------------
fairly random: yep
shuffler: knuth_shuffler
input: abc
permutations's distribution [('cab', '16.585%'), ('acb', '16.738%'), ('bca', '16.578%'), ('abc', '16.681%'), ('bac', '16.553%'), ('cba', '16.865%')]
shuffler: naive_shuffler
input: abc
permutations's distribution [('bca', '26.61%'), ('cab', '26.448%'), ('acb', '13.988%'), ('bac', '14.076%'), ('cba', '13.825%'), ('abc', '5.053%')]
------------------------------------
fairly random: yep
shuffler: knuth_shuffler
input: abc
permutations's distribution [('cab', '16.585%'), ('acb', '16.738%'), ('bca', '16.578%'), ('abc', '16.681%'), ('bac', '16.553%'), ('cba', '16.865%')]
On the one hand the result strongly suggest that the naive shuffler is a biased and therefore it's not a correct shuffler, on the other hand it give us more confidence on the fairness of Knuth's P algorithm.
Takeaway
- When algorithms involve randomness then correctness is more than just getting some of the expected out puts.
- Simulation is a very handy way to prove the fairness of algorithms that depend on randomness.
Comments
Post a Comment