My Pal SCSI

My son Alex turned one a couple of weeks ago. (If you are reading this, Alex, happy birthday, and congratulations on learning to read at such an early age!) He picked up quite a stash of loot as a result.

Moore’s law means that the processing power of his toy box very likely exceeds that of my first computer by a large factor. As one example, he received a My Pal Scout talking stuffed animal as a gift. A toy that comes with a USB cable — this is progress! You can customize the toy to say and spell your child’s name, and pick different tunes for it to play.

My inner geek has been wondering what’s inside ever since, but of course I cannot take apart or otherwise ruin my kid’s toy in the name of science. I did, however, plug it into my Linux box while he was napping. A quick dmesg showed the device implements USB storage, but always responds with ‘Medium Not Present’ when accessed. I guessed (incorrectly) that some extra magic might make the internal flash appear as a disk, so that files could simply be copied to a FAT filesystem stored therein. The toy is relatively inexpensive, and coming up with too much special sauce would likely be prohibitively costly.

USB storage is a successful example of taking an existing protocol (SCSI command set) and wholesale wrapping it in a different wire protocol. Each USB storage transfer is initiated by the host sending a Command Block Wrapper (CBW) — a 31-byte USB packet starting with ‘USBC’, typically containing a SCSI command as a payload. Next, a block of data is transferred if this command represents a read or write. Finally, the device completes the transaction by sending a Command Status Wrapper (CSW), a 13-byte packet beginning with the string ‘USBS’.
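To make the two wrappers concrete, here is a rough sketch of their layouts in C. The field names are my own shorthand rather than anything taken from a header file, the packed attribute is a GCC/Clang extension, and the 32-bit fields only match the little-endian wire format when built on a little-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Command Block Wrapper: 31 bytes, sent from host to device. */
struct cbw {
    uint8_t  signature[4];   /* 'U','S','B','C' */
    uint32_t tag;            /* echoed back in the matching CSW */
    uint32_t data_length;    /* bytes expected in the data phase */
    uint8_t  flags;          /* bit 7 set = device-to-host (a read) */
    uint8_t  lun;
    uint8_t  cb_length;      /* number of valid bytes in cb[], 1..16 */
    uint8_t  cb[16];         /* the SCSI command itself */
} __attribute__((packed));

/* Command Status Wrapper: 13 bytes, sent from device to host. */
struct csw {
    uint8_t  signature[4];   /* 'U','S','B','S' */
    uint32_t tag;
    uint32_t data_residue;   /* expected bytes that were not transferred */
    uint8_t  status;         /* 0 = passed, 1 = failed, 2 = phase error */
} __attribute__((packed));

_Static_assert(sizeof(struct cbw) == 31, "CBW must be 31 bytes");
_Static_assert(sizeof(struct csw) == 13, "CSW must be 13 bytes");

/* Wrap a SCSI command of cdb_len bytes in a CBW (illustrative helper). */
static void cbw_init(struct cbw *w, uint32_t tag, uint32_t data_length,
                     int is_read, const uint8_t *cdb, uint8_t cdb_len)
{
    assert(cdb_len >= 1 && cdb_len <= sizeof(w->cb));
    memset(w, 0, sizeof(*w));
    memcpy(w->signature, "USBC", 4);
    w->tag = tag;
    w->data_length = data_length;
    w->flags = is_read ? 0x80 : 0x00;
    w->cb_length = cdb_len;
    memcpy(w->cb, cdb, cdb_len);
}
```

In practice the kernel builds these for you; the sketch is only meant to help when reading a raw capture.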

One can get a feel for the flavor of the protocol by using usbmon. Much like an ethernet sniffer, usbmon provides a simple mechanism under Linux to capture USB traffic. A simple session might look like:

    # cat /sys/kernel/debug/usb/usbmon/4u > usbmon.txt

One might even potentially run usbmon on a host OS while some other OS is running as a guest in a virtual machine with USB pass-through.

The upshot of the layered approach to USB storage is that Linux creates a generic SCSI device (/dev/sgX) for any USB storage device. Using the generic device, one can directly send SCSI commands to the USB device, and the kernel will take care of wrapping them in USB commands. I believe something similar is possible in Windows land.
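Opening the generic device might look something like the sketch below. The helper name open_sg_device is made up for illustration; the SG_GET_VERSION_NUM check is a common way to confirm that a file descriptor really supports the sg interface before sending commands to it.

```c
#include <assert.h>
#include <fcntl.h>
#include <scsi/sg.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Open path and confirm it speaks the sg interface.
 * Returns a file descriptor, or -1 on any failure. */
static int open_sg_device(const char *path)
{
    int version = 0;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    /* The ioctl fails on files that do not support the sg interface;
     * SG_IO needs driver version 3.0 (30000) or later. */
    if (ioctl(fd, SG_GET_VERSION_NUM, &version) < 0 || version < 30000) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Which /dev/sgX node belongs to the toy varies from machine to machine, so the path has to come from dmesg or similar.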

As it turns out, Scout is even simpler than I imagined. The internal flash has no controller or filesystem; instead it appears to be a raw NAND flash written a page at a time. It is a simple matter to read the flash using the Linux sg device. One merely opens the device file, and then issues a vendor-specific SCSI command on the file descriptor:

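#include <stdint.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

typedef uint8_t u8;
typedef uint32_t u32;

/* Read one 4k page of flash at addr. Returns 0 on success, -1 on failure. */
static int read_page(int fd, u32 addr)
{
    u8 cmdblk[] = {
        0xfd,               /* access flash */
        0x28,               /* read it (0x20 = write) */
        0, 0, 0, 0,         /* address, big-endian, filled in below */
        0x06, 0, 0x08, 0,   /* no idea what the rest is */
        0, 0, 0, 0,
        0x47, 0x50
    };
    u8 response_buf[4096];
    u8 sense_buffer[32];

    cmdblk[2] = (addr >> 24) & 0xff;
    cmdblk[3] = (addr >> 16) & 0xff;
    cmdblk[4] = (addr >> 8) & 0xff;
    cmdblk[5] = addr & 0xff;

    sg_io_hdr_t io_hdr = {
        .interface_id = 'S',
        .cmd_len = sizeof(cmdblk),
        .mx_sb_len = sizeof(sense_buffer),
        .dxfer_direction = SG_DXFER_FROM_DEV,
        .dxfer_len = sizeof(response_buf),
        .dxferp = response_buf,
        .cmdp = cmdblk,
        .sbp = sense_buffer,
        .timeout = 20000    /* milliseconds */
    };

    /* the ioctl itself can fail (bad fd, permissions, ...) */
    if (ioctl(fd, SG_IO, &io_hdr) < 0)
        return -1;
    /* the ioctl can succeed while the command fails; see sense_buffer */
    if ((io_hdr.info & SG_INFO_OK_MASK) != SG_INFO_OK)
        return -1;

    /* do something with response_buf here */
    return 0;
}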

Reading addresses 0x01000 through 0x10000, 4k at a time, seems to yield the customizable data on the device. The flash is tiny: this is just 64k, yet you can upload a digital audio file of your child’s name plus ten songs.
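A dump loop over that range could be sketched as follows. I am assuming the range is inclusive at both ends, which gives sixteen 4k pages and matches the 64k figure. The page_reader callback is hypothetical: it stands in for a variant of read_page that copies the page into a caller-supplied buffer, and fake_reader exists only so the loop can be tried without the toy attached.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SCOUT_PAGE_SIZE   0x1000u
#define SCOUT_FIRST_PAGE  0x01000u
#define SCOUT_LAST_PAGE   0x10000u  /* assumed inclusive: 16 pages, 64k */

/* Hypothetical reader: fills buf with the page at addr, returns 0 if ok. */
typedef int (*page_reader)(void *ctx, uint32_t addr, uint8_t *buf);

/* Stand-in reader that fills each page with its page number. */
static int fake_reader(void *ctx, uint32_t addr, uint8_t *buf)
{
    (void)ctx;
    memset(buf, (int)(addr / SCOUT_PAGE_SIZE), SCOUT_PAGE_SIZE);
    return 0;
}

/* Read every page in the range and append it to out.
 * Returns the number of pages written, or -1 on the first error. */
static int dump_pages(page_reader reader, void *ctx, FILE *out)
{
    uint8_t buf[SCOUT_PAGE_SIZE];
    int pages = 0;

    assert(reader != NULL && out != NULL);
    for (uint32_t addr = SCOUT_FIRST_PAGE; addr <= SCOUT_LAST_PAGE;
         addr += SCOUT_PAGE_SIZE) {
        if (reader(ctx, addr, buf) != 0)
            return -1;
        if (fwrite(buf, 1, sizeof(buf), out) != sizeof(buf))
            return -1;
        pages++;
    }
    return pages;
}
```

With a real reader plugged in, ctx would carry the sg file descriptor.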

The data format is rather simple: there is a 30-byte header starting at address 0x1000, containing 16-bit, little-endian pointers for the customized files. Address 0x1008 holds a pointer to the spelling of your child’s name, address 0x100e holds a pointer to the audio file pronouncing your child’s name, and so on. Armed with the Leapfrog software and a packet sniffer, one can verify that these files do indeed match the individual binary files that the software downloads over HTTP when syncing the puppy.
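Pulling a pointer out of the header is a two-byte affair. A minimal sketch, assuming the 30-byte header has already been read into a buffer; the constant and function names are mine, and I have not worked out what base address the pointer values are relative to, so the helper only decodes the raw number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCOUT_HEADER_LEN   30    /* header size in bytes */
#define OFF_NAME_SPELLING  0x08  /* header offset for address 0x1008 */
#define OFF_NAME_AUDIO     0x0e  /* header offset for address 0x100e */

/* Decode the 16-bit little-endian value at offset off in the header. */
static uint16_t header_ptr(const uint8_t *hdr, size_t off)
{
    assert(hdr != NULL && off + 2 <= SCOUT_HEADER_LEN);
    return (uint16_t)(hdr[off] | (hdr[off + 1] << 8));
}
```

For example, header bytes 0x34 0x12 at offset 0x08 decode to 0x1234.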

I believe the digital audio files are some flavor of raw 8 kHz PCM, but I could not find the right combination of parameters to sox to make sense out of them. The song files are all apparently compressed with TTComp, some compression program from 1995. Running ttdecomp.exe from within dosemu did successfully decompress them. My guess is these files are some sort of sequencer format rather than sampled audio, given their tiny size.
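One way to try out guesses about the audio format is to put a WAV header in front of the raw bytes and see whether a player makes sense of the result. The sketch below builds the standard 44-byte header for mono PCM. The 8 kHz rate and the sample width are exactly the parameters I am unsure of, so they are arguments rather than constants, and the function names are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WAV_HEADER_LEN 44

static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff;
    p[3] = (v >> 24) & 0xff;
}

/* Fill hdr with a canonical WAV header for data_len bytes of mono PCM.
 * WAV treats 8-bit samples as unsigned and 16-bit samples as signed. */
static void wav_header(uint8_t hdr[WAV_HEADER_LEN], uint32_t data_len,
                       uint32_t rate, uint16_t bits)
{
    uint16_t channels = 1;
    uint16_t block_align = (uint16_t)(channels * bits / 8);

    assert(bits == 8 || bits == 16);
    memcpy(hdr, "RIFF", 4);
    put_le32(hdr + 4, 36 + data_len);     /* size of everything after this */
    memcpy(hdr + 8, "WAVEfmt ", 8);
    put_le32(hdr + 16, 16);               /* size of the fmt chunk */
    put_le16(hdr + 20, 1);                /* format 1 = uncompressed PCM */
    put_le16(hdr + 22, channels);
    put_le32(hdr + 24, rate);
    put_le32(hdr + 28, rate * block_align);  /* bytes per second */
    put_le16(hdr + 32, block_align);
    put_le16(hdr + 34, bits);
    memcpy(hdr + 36, "data", 4);
    put_le32(hdr + 40, data_len);
}
```

Calling wav_header(hdr, len, 8000, 8) and writing hdr followed by the raw file would test the 8-bit unsigned guess; if the samples are signed or compressed, the result will still be noise.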

This is, I think, as far as I wish to investigate the toy. Obvious exercises for the interested reader are to discern the individual file formats, and have the toy play Metallica. It’s pretty incredible how much technology can be cheaply packed into a child’s plaything today. But now I have my eye on dissecting that (non-electronic) toy inchworm — is there a spring in there or what?