Archive for January, 2011

Enumerating Three Choices

I was recently posed a question by a psychologist friend of mine. She needed to enumerate all possible answers to a set of 20 questions. Each individual question could have three answers. She then planned to count the actual frequency of answers from the test sample (say, N~=100) and run this result through some other statistical software. The problem was that the statistical software’s input format required the permutations of answers, even if they had a frequency of zero.

I could easily write a procedural iterative program to enumerate all possible binary choices for 20 questions. (Wait, can I?) Wait – what about three possible choices?

The difficulty is deciding just what type of problem this actually is. Is it a permutation thing (i.e., order matters)? Combination (i.e., order doesn’t matter)? Enumeration (iterate each possible permutation)? Using what number base (3)? Should I use an iterative or recursive solution, or perhaps a clever workaround? How long would a reasonably efficient program take?

In any event, I knew that the output set was potentially large, and it wasn’t immediately apparent how to write a piece of C code to enumerate all possible permutations of answers for 20 questions. Let’s define what I’m looking at: I want all possible combinations (i.e., I don’t care about order specifically) of answer strings to a list of 20 questions where each question could be answered in 1 of 3 ways. If I were just interested in permuting the test questions themselves, I would have a set of 20! or 2,432,902,008,176,640,000 or 2.4×1018 (as Unix `bc’ tells me). But I’m not; I don’t care what order the questions are in, but rather producing all possible ways to answer the test itself for one arbitrary ordering.

For 1 question, there are three possible answers: 0, 1, 2 (let’s say they correspond to “yes”, “maybe”, “no”). You might also consider a null answer in addition to this set, but for now, let’s ignore that possibility. So, there are three possible ways that this one question test could be answered. What if we added more questions?

Q1: ?
0, 1, 2
Q2: ?
0, 1, 2

So we have all possible answer strings (of length 2) as:

00
01
02
10
11
12
20
21
22

With just 2 questions, there are 9 possible answer permutations.

Thus, the number of questions becomes the length of our string. The number of response options become the possible values of each location in the bitstring. In binary (base 2) number systems, the possibilities proceed in powers of 2 (e.g., for a 100 question test with only two possible response options, the total number of answer combinations is 2^100). With base 3 and 20 questions, the total number of unique answer combinations is 3^20; that is, someone could answer the test itself in any one of 3^20 ways. For those of us who are computationally challenged, this number has a value of 3,486,784,401 or about 3.4 billion (as an aside, even if 10,000 people responded to this test, the sets of answers they produce would very sparsely populate this space of 3.4 billion strings).

So how do you generate 3.4 billion unique strings of length 20 where each position could be one of three values? Actually, how do you write a program to do that for you?

It turns out that we can take a couple of interesting detours into mathematics [1,2], but I’d prefer to stay on the CS side of the house for now. Let’s think about symbol representation. Above, I abandoned ‘y’, ‘n’, and ‘m’ for representing the choices because those three characters have no directly mathematical relationship. Representing the answers as 0, 1, and 2 (on a binary machine) really doesn’t seem to have much advantage either.

There is in fact a ternary number system with some interesting properties. According to the American Scientist article “Third Base” (Nov/Dec 2001), the ternary number system “is the most efficient of all integer bases; it offers the most economical way of representing numbers.”

Perhaps I could simply write a program to count to 3,486,784,401, converting each integer value to a bitstring and then write a routine for converting that bitstring to a tritstring (ternary string) and printing it to stdout. I’ll try that, and see what happens. But let’s consider the heart of that approach: the translation of binary bitstrings to tritstrings. Let’s translate the binary representation of decimal 12 to ternary.

12 contains two powers of 2: 8 + 4. So in a 4-bit string, let’s turn on the leftmost two bits to get: 1100 (i.e., 1*23 + 1*22 + 0*21 + 0*20, or 1*8 + 1*4 + 0 + 0).

Now how do we convert from base 2 to base 3? How do you convert from one base to another? Well, I just did it for base 10 to base 2, albiet informally. You keep dividing the number to be translated by the target base and keep track of the remainder. The sequence of remainders becomes the digits of your translated number. To wit:

12 / 2 = 6 remainder 0
6 / 2 = 3 remainder 0
3 /2 = 1 remainder 1
1 / 2 = 0 remainder 1

Once the division results in a zero value, we have all the remainders we need (read that right most column of remainders upwards…see how it equals 1100?).

Armed with this knowledge, need we convert from decimal to binary to ternary at all, or can we just directly convert?

12 / 3 = 4 remainder 0
4 / 3 = 1 remainder 1
1 / 3 = 0 remainder 1

Is 110 (ternary) really equal to 12? Yes: 1*3^2 + 1*3^1 + 0 or 1*9 + 1*3 or 12.

Code

Ternary Counting

Postscript

Along the way to investigating this, I stumbled across two other interesting American Scientist articles: “E Pluribus Confusion“, about “fair” seat allocation in the US House of Representatives (a process with a series of varied hacky tie-breaking rules over the years) and “The Bootstrap” about the limits of our common statistical assumptions.

References

[1] http://lin-ear-th-inking.blogspot.com/2012/11/enumerating-permutations-using.html
[2] http://www.roard.com/docs/cookbook/cbsu2.html

Comments off

Cloud, Pocket, and Desk

I just received some spam from the ACM touting some material about Cloud Computing. The email says, in part:

“Cloud computing promises to radically change the way computer
applications and services are constructed, delivered, and managed.
It is a fundamental new paradigm in which computing is migrating from
personal computers sitting on a desk to large, centrally managed
datacenters.”

This got me to thinking: there are actually two paradigm shifts happening in computing, and the other is the move to smaller, powerful, mobile personal devices. Each seems to be heading in the opposite direction, and the desk looks like the loser.

But of course these paradigm shifts are complementary; there is no reason why a smaller, more natural human interface in the form of a mobile device cannot take advantage of the compute and storage power of the cloud while retaining the ability to perform some computationally heavy tasks if called upon (e.g., if the cloud is unavailable, or if you don’t trust the cloud with the data/inputs to perform the calculation).

In fact, we can think of these two shifts as a factoring of the traditional desktop role; you still have a personal computing device, but it is now mobile, and its power is greatly enhanced by remote storage and compute capabilities. Of course, shortcomings exist: in interfaces (thumb typing? come on), in power (battery life, anyone?), in security (store my medical info in the cloud???), in display size (watching movies on 4 inches…eye strain), etc.

Comments off

Debugging an OS X Kernel Panic With Speculation and Innuendo

Debugging the OS, even with full source, full debugging symbols, and a framework like kprobes or DTrace is a very hard exercise.

Debugging OS X from a user perspective (without access to source, and with skimpy information in a crash report) is almost impossible.

I’m willing to speculate a bit, though.

My recent kernel panics are likely the result of two possible conditions: (1) an actual race condition bug in the kernel; or (2) interference with the kernel data structures that manage locking by some 3rd-party software loaded into the OS as a module. On my system, the only “foreign” code loaded into the OS are drivers for my Microsoft Mouse and Keyboard (let’s face it, Apple keyboard suck, and the one-button-trackpad-click-multi-touch interface is just a nightmare as a pointing device) and VMware:

loaded kexts:
com.microsoft.driver.MicrosoftMouseUSB 6.2.2
com.microsoft.driver.MicrosoftMouse 6.2.2
com.microsoft.driver.MicrosoftKeyboardUSB 6.2.2
com.microsoft.driver.MicrosoftKeyboard 6.2.2
com.vmware.kext.vmnet 2.0.4
com.vmware.kext.vmioplug 2.0.4
com.vmware.kext.vmci 2.0.4
com.vmware.kext.vmx86 2.0.4

I’m willing to believe VMware may be the culprit here, particularly since I’ve started using it heavily since Migrating to the new notebook. My copy of VMware Fusion is a few revisions behind (I think), and OS X is up to date. I didn’t have problems with it on my previous Macbook Pro (although I did have other problems with that notebook). I’ll try updating VMware Fusion and see if that resolves the problem (although it is naturally tough to test the absence of something).

Comments off

Slow Curtain of Death

I should rename this blog “Complaints about Apple Mac OS X”.

Apple has their own alternative to Microsoft’s “Blue Screen of Death.” I call it the “Slow Curtain of Death” — the machine suddenly stops responding, and a line proceeds from the top of the screen to the bottom, dimming the display and displaying a typically sexy-looking Apple message box saying “You must restart your computer. Press…”

My new Macbook Pro crashed (i.e., the kernel had a “panic” — yes, that’s a technical term for when the underlying operating software encounters an unrecoverable error) for the fourth time in a week from the exact same line of code (it looks like an error in handling thread locking conditions in the kernel scheduler). I have sent all four of these reports to Apple via their automated reporting mechanism.

I suspect the culprit is some incompatibility between my version of VMWare Fusion and Snow Leopard (but this is complete speculation). I append the start of the most recent crash report below. Note a few things. First, the error output line itself is quite detailed, even including a possible cause (unlocking an already unlocked mutex or spinlock? why would that cause a problem?) It even includes the file sched_prim.c and line number of the error. Too bad I don’t have Mac OS X source code. The report also includes a dump of the stack, but it lacks reporting the standard dump of CPU registers.

Interval Since Last Panic Report: 27770 sec
Panics Since Last Report: 1
Anonymous UUID: B22098C3-CA08-4230-A115-AFDF431CCB8B

Tue Jan 18 14:50:12 2011
panic(cpu 2 caller 0x226b53): “thread_invoke: preemption_level -1, possible cause: unlocking an unlocked mutex or spinlock”@/SourceCache/xnu/xnu-1504.9.26/osfmk/kern/sched_prim.c:1471
Backtrace (CPU 2), Frame : Return Address (4 potential args on stack)
0xbe56be18 : 0x21b50c (0x5d4438 0xbe56be4c 0×223974 0×0)
0xbe56be68 : 0x226b53 (0x58babc 0xffffffff 0x58ba54 0×226714)
0xbe56bee8 : 0×227259 (0xde9db98 0×0 0x6bc9f000 0×1)
0xbe56bf58 : 0x2272c4 (0x22fc20 0x863ea0 0×0 0x2a358d)
0xbe56bf78 : 0x22fdba (0x22fc20 0x863ea0 0×0 0×0)
0xbe56bfc8 : 0x2a06cc (0x863ea0 0×0 0×10 0xe1933c0)

BSD process name corresponding to current thread: kernel_task

Mac OS version:
10J567

Kernel version:
Darwin Kernel Version 10.6.0: Wed Nov 10 18:13:17 PST 2010; root:xnu-1504.9.26~3/RELEASE_I386
System model name: MacBookPro6,2 (Mac-F22586C8)

System uptime in nanoseconds: 38460313523585

Comments off