I was recently posed a question by a psychologist friend of mine. She needed to enumerate all possible answers to a set of 20 questions. Each individual question could have three answers. She then planned to count the actual frequency of answers from the test sample (say, N~=100) and run this result through some other statistical software. The problem was that the statistical software’s input format required the permutations of answers, even if they had a frequency of zero.
I could easily write a procedural iterative program to enumerate all possible binary choices for 20 questions. (Wait, can I?) Wait – what about three possible choices?
The difficulty is deciding just what type of problem this actually is. Is it a permutation thing (i.e., order matters)? Combination (i.e., order doesn’t matter)? Enumeration (iterate each possible permutation)? Using what number base (3)? Should I use an iterative or recursive solution, or perhaps a clever workaround? How long would a reasonably efficient program take?
In any event, I knew that the output set was potentially large, and it wasn’t immediately apparent how to write a piece of C code to enumerate all possible permutations of answers for 20 questions. Let’s define what I’m looking at: I want all possible combinations (i.e., I don’t care about order specifically) of answer strings to a list of 20 questions where each question could be answered in 1 of 3 ways. If I were just interested in permuting the test questions themselves, I would have a set of 20! or 2,432,902,008,176,640,000 or 2.4×1018 (as Unix `bc’ tells me). But I’m not; I don’t care what order the questions are in, but rather producing all possible ways to answer the test itself for one arbitrary ordering.
For 1 question, there are three possible answers: 0, 1, 2 (let’s say they correspond to “yes”, “maybe”, “no”). You might also consider a null answer in addition to this set, but for now, let’s ignore that possibility. So, there are three possible ways that this one question test could be answered. What if we added more questions?
Q1: ?
0, 1, 2
Q2: ?
0, 1, 2
So we have all possible answer strings (of length 2) as:
00
01
02
10
11
12
20
21
22
With just 2 questions, there are 9 possible answer permutations.
Thus, the number of questions becomes the length of our string. The number of response options become the possible values of each location in the bitstring. In binary (base 2) number systems, the possibilities proceed in powers of 2 (e.g., for a 100 question test with only two possible response options, the total number of answer combinations is 2^100). With base 3 and 20 questions, the total number of unique answer combinations is 3^20; that is, someone could answer the test itself in any one of 3^20 ways. For those of us who are computationally challenged, this number has a value of 3,486,784,401 or about 3.4 billion (as an aside, even if 10,000 people responded to this test, the sets of answers they produce would very sparsely populate this space of 3.4 billion strings).
So how do you generate 3.4 billion unique strings of length 20 where each position could be one of three values? Actually, how do you write a program to do that for you?
It turns out that we can take a couple of interesting detours into mathematics [1,2], but I’d prefer to stay on the CS side of the house for now. Let’s think about symbol representation. Above, I abandoned ‘y’, ‘n’, and ‘m’ for representing the choices because those three characters have no directly mathematical relationship. Representing the answers as 0, 1, and 2 (on a binary machine) really doesn’t seem to have much advantage either.
There is in fact a ternary number system with some interesting properties. According to the American Scientist article “Third Base” (Nov/Dec 2001), the ternary number system “is the most efficient of all integer bases; it offers the most economical way of representing numbers.”
Perhaps I could simply write a program to count to 3,486,784,401, converting each integer value to a bitstring and then write a routine for converting that bitstring to a tritstring (ternary string) and printing it to stdout. I’ll try that, and see what happens. But let’s consider the heart of that approach: the translation of binary bitstrings to tritstrings. Let’s translate the binary representation of decimal 12 to ternary.
12 contains two powers of 2: 8 + 4. So in a 4-bit string, let’s turn on the leftmost two bits to get: 1100 (i.e., 1*23 + 1*22 + 0*21 + 0*20, or 1*8 + 1*4 + 0 + 0).
Now how do we convert from base 2 to base 3? How do you convert from one base to another? Well, I just did it for base 10 to base 2, albiet informally. You keep dividing the number to be translated by the target base and keep track of the remainder. The sequence of remainders becomes the digits of your translated number. To wit:
12 / 2 = 6 remainder 0
6 / 2 = 3 remainder 0
3 /2 = 1 remainder 1
1 / 2 = 0 remainder 1
Once the division results in a zero value, we have all the remainders we need (read that right most column of remainders upwards…see how it equals 1100?).
Armed with this knowledge, need we convert from decimal to binary to ternary at all, or can we just directly convert?
12 / 3 = 4 remainder 0
4 / 3 = 1 remainder 1
1 / 3 = 0 remainder 1
Is 110 (ternary) really equal to 12? Yes: 1*3^2 + 1*3^1 + 0 or 1*9 + 1*3 or 12.
Code
Ternary Counting
Postscript
Along the way to investigating this, I stumbled across two other interesting American Scientist articles: “E Pluribus Confusion“, about “fair” seat allocation in the US House of Representatives (a process with a series of varied hacky tie-breaking rules over the years) and “The Bootstrap” about the limits of our common statistical assumptions.
References
[1] http://lin-ear-th-inking.blogspot.com/2012/11/enumerating-permutations-using.html
[2] http://www.roard.com/docs/cookbook/cbsu2.html