Excising FAT Tumors

I recently built a new fileserver for home. It’s pretty nice, with 3.7TB of RAID5 storage. I’ll never have to worry about losing data again (mostly).

Unfortuantely, in the process of transferring data, I found that, well, I had already lost data.  My old backup drive (an external, FAT32 disk)  contained a directory that pointed off into gobbledygook, containing garbage as “files” and “directories.”  I noticed this when my rsync went off the deep end.  This “broken copy problem” is a simple problem to fix: rsync has a –exclude option.  To get the copy to finish, I just –exclude the broken directory and recopy.  Thankfully, the directory was junk (eclipse metadata), so I could ignore it, and all was well.

But.  Once I finish the copy, I’d like to continue using this drive for moving data around, etc.  With this corrupt directory, though, I can’t quite trust it.  When you realize what causes this corrupt, it becomes clear that no off-the-shelf tools will repair this without potentially mangling other data.  So, there’s no way to fix it short of formatting it.  But, understanding the corruption pretty intimately, I saw that it might be possible to fix this, if I hand-edit the disk.

So I did.  Knowing what the underlying problem was, I wrote some tools to inspect the filesystem manually, then went in and excised this directory.  Once that was done, the FS came up “clean enough,” and worked.  It might be fun to go through the process of building these tools, so here’s a quick overview.

First off, let’s identify the problem more precisely.  The big symptom is that ‘find’ returns tons of the following:

./density_home/jbm/workspace/.metadata/.plugins/org.eclipse.core.resources/.history/8f/ ?<A2>md`<B5>?.?<EE>?
./density_home/jbm/workspace/.metadata/.plugins/org.eclipse.core.resources/.history/8f/<F4><B0>q<EA>^\@??.?s<FC>
./density_home/jbm/workspace/.metadata/.plugins/org.eclipse.core.resources/.history/8f/<E6>q^]^\^K^G^N%.<E6>?0
./density_home/jbm/workspace/.metadata/.plugins/org.eclipse.core.resources/.history/8f/??^]v<B2>?<ED>?.?b^\
./density_home/jbm/workspace/.metadata/.plugins/org.eclipse.core.resources/.history/8f/?<E2>^^8<A5>^?<F4>?.u<F2>`
./density_home/jbm/workspace/.metadata/.plugins/org.eclipse.core.resources/.history/8f/<E8>???^X<DF>?<EE>.<F6>8^T

./density_home/jbm/workspace/.metadata/.plugins/org.eclipse.core.resources/.history/8f/ ?<A2>md`<B5>?.?<EE>?./density_home/jbm/workspace/.metadata/.plugins/org.eclipse.core.resources/.history/8f/<F4><B0>q<EA>^\@??.?s<FC>./density_home/jbm/workspace/.metadata/.plugins/org.eclipse.core.resources/.history/8f/<E6>q^]^\^K^G^N%.<E6>?0./density_home/jbm/workspace/.metadata/.plugins/org.eclipse.core.resources/.history/8f/??^]v<B2>?<ED>?.?b^\./density_home/jbm/workspace/.metadata/.plugins/org.eclipse.core.resources/.history/8f/?<E2>^^8<A5>^?<F4>?.u<F2>`./density_home/jbm/workspace/.metadata/.plugins/org.eclipse.core.resources/.history/8f/<E8>???^X<DF>?<EE>.<F6>8^T

It’s clearly “corrupt,” but what does that mean?  First, we’ll need to learn a little bit about how FAT works.  To do that, you can read any of a myriad of difficult-to-follow documents, all of which talk a lot about things that are only relevant on floppy disks (no, really).  Examples: Microsoft’s Description of the FAT File System (which is incomplete: it doesn’t cover FAT32), Wikipedia’s File Allocation Table (good, but tl;dr), Andries Brouwer’s The FAT filesystem (great, a little incomplete), and Paul Stoffregen’s Understanding FAT32 Filesystems (aha, finally helpful!  But lots of disk-ish stuff, and a few missing bits).

Once you’ve digested all of those, you’ve got a good idea of how FAT works.  Or you can sit down and listen to Josh talk story.

First and foremost: FAT was designed for floppies.  There’s lots of cruft in there, and a bit of weirdness, from that heritage.  A lot of things we think of as constants are variables in the land of 3.5″ vs 5.25″, DS vs SS, DD or not… all that great floppy crap.  Just be prepared for this.

So, without further ado, the structure of a FAT filesystem.  I’m going to assume you’re working with a partition on a hard disk, or a ‘dd’ copy of one.  The first chunk of the disk is the Boot Sector.  This is where the CPU boots from, so it’s got some actual code in it.  This is important for floppies.  We don’t care.  What we do care about are the following fields stored in the Boot Sector:

  • bytes per sector (0×0b-0×0c)
  • sectors per cluster (0×0d)
  • number of FAT copies (0×10)
  • number of sectors per FAT (0×24-0×27)
  • first cluster of FS (0×3C~0×3F)
  • FS type (0×52-0×59)

If you’re playing the home game, begin by checking the FS Type.  It should be “FAT32 “.  If it’s not, you’re in trouble, because I’m only explaining FAT32.

This is just the header structure, which lets us determine the geometry of the drive, and calculate where data goes.  After this, there’s the eponymous File Allocation Table, then a second copy of the FAT, and then we get to the actual data.  Let’s talk a little about how the data is organized, then come back to the FAT.

The actual directories and files in FAT32 are stored in “clusters.” A cluster is a contiguous block of sectors; a sector is a contiguous block of bytes (corresponding to a floppy disk sector, once upon a time).  All of these things are variables, though it’s really typical that sectors are 512 bytes, and clusters are 32Kbytes.  You should read in the appropriate variables above, and compute them for your disk, though.  The cluster is the base unit of allocation of your filesystem.  When you store a file, its content gets put into a set of clusters.  If the file is under 32K long, it goes into a single cluster; if it’s more than 32K, it goes into multiple clusters.  Similarly, directories are stored in clusters.  Most directories fit within one cluster, but sometimes they, too, span multiple clusters.

The content of files are stored “raw”: a file cluster is just a 32K chunk of that file.  All the metadata about the file is stored in a directory cluster.  Directory clusters, on the other hand, are nothing but metadata.  They’re a list of entries with names, types (directory or file), and cluster numbers.  So, you start out at the root cluster (“first cluster of FS”), and read it as a directory.  Walking through this, you find the name of all files and directories in the root of the FS, and pointers to the clusters containing those files and directories.  From there, you can go to those clusters and read their contents.

But what if a file (or directory) spans more than one cluster?  Normally, you’d assume the OS would allocate a contiguous range of clusters to any file, and just write the file there.  However, let’s assume you’re the OS, and you’re working on a disk that’s been in use for a while.  What if the user wants to write a file that’s bigger than any contiguous range of clusters, but fits into the free space fo the disk?  You could shuffle things around to make that work, but, haven’t I heard this before?  Oh, yeah, the bin packing problem.  It’s hard.  So, instead, we fragment the file into multiple clusters, and spread them to the four corners of the disk.  This lets us use all the free space on the disk, without having to solve NP problems every time the user writes a new WordPerfect document. (Yay floppies!)

So, now we need a way to keep track of all these clusters.  Putting them into the directory entry itself is tricky: we’d really prefer it if our directory tables were made of constant-length entries.  We could use some space at the end of each cluster to point at its next entry, but there must be some good reason not to do it that way, which probably involves floppies (it smells like a DMA/alignment thing to me, but I dunno.  If you do, please comment).  Why must there be a good reason not to do it that way?  Because MS chose to solve it in a way that makes absolutely no sense otherwise.

They created the File Allocation Table.  If you looked at the Boot Sector of your disk, you might’ve noticed that the “number of sectors per FAT” is a kinda big number.  On my 320GB drive, it comes out to 37MB.  If all the file contents are in clusters, and the directories and filesystem metadata are in other clusters, what the heck is in the FAT?

Naturally, it’s the linked list of what cluster comes after the current one.  Every cluster on the disk gets a 32 bit number entry in the FAT (hence FAT32), which is the index of the next cluster after it in the current file. These are coindexed, so cluster 2 gets the 2′th FAT entry: whatever number is in there is the next cluster in this entry.  There’s a few special values: anything over 0×0ffffff7 is an “END” marker.  So, let’s assume we have a directory that points to “NIBBLE.BAS” of length 35K at cluster 13, and that FAT[13] = 37, and FAT[37] = 0×0ffffff.  Then NIBBLE.BAS is made up of all 32K of cluster 13, and then the first 3K of cluster 37.  The remainder of 37 gets wasted, which is a real shame on a 720K floppy.

So, to recap: file metadata is stored in directory tables, which are stuck in clusters.  File contents are also stuck in clusters.  If any one entity spans more than one cluster, you start at the first cluster in its chain, then go to the FAT entry for that cluster to find out the next one.  If the next one is greater than 0×0fffff7, you’ve hit the end of the chain.

How is that metadata stored?  Well, it’s not too bad until we get to long file names.  It’s just a packed struct, of a fixed length.  Every directory table entry is 0×20 bytes long.  The fields you care about:

  • FilenameExt: (0×00-0×0b): 8.3, minus the dot
  • Attributes: directory/readonly/system/etc.  (0×0f is a wacky VFAT entry, if 0×20 is set, it’s a directory, see Andries for a complete bitmask)
  • Cluster high: 0×14-0×15
  • Cluster low: 0×1a-0×1b
  • Filesize: 0×1c-0×1f

Notice that the cluster is split up?  The “high” part is in a previously reserved part of the directory entry structure.  They used to only have the low part.  Floppies.

So, you read in those bits, multiple high by 256 and add it to low, and bob’s your uncle.  The filenames are probably awful (BOBSY~12TXT), but, for the purposes of “Find the place to excise a tumor from my FS,” I can deal with ugly path components.

At this point, we can see what went wrong above: the directory entry for …/8f/ got mangled somehow, and its clusterhi+clusterlo now points off into the middle of file.  The contents of that file are being dutifully interpreted as a directory.  Better still, the file goes on for a reasonable number of clusters, and, best of all: since those clusters contain random stuff, some of it is being interpreted as pointers to subdirectories, which point off to random clusters.  Which are probably other files, and so on, and so forth.

This explains the behavior I was seeing quite nicely.  So, what do I do to fix it?  If I tried to delete it recursively (rm -rf and/or deltree), it could be Very Bad: it’s going to go through and crap all over the clusters randomly pointed to, breaking the chains they’re in, and potentially mangling the whole FS.  That would be pretty tragic.

Knowing what we know above, though, there’s a pretty obvious entry: let’s just remove the bogus directory entry by hand!  Since the contents of 8f are a lost cause anyway, I could just go in and nuke its directory entries.  But, honestly, I kinda hate eclipse.  I only care about that workspace directory for the code in it; the eclipse metadata can go to hell for all I care.  So, I’m going to cut the tree of there.  Now I just need to find the 32 bytes on the disk to overwrite to make this happen.

To do that, we need to find the cluster on disk.  We know the first cluster comes right after the FAT, so that’s easy: len(boot sector) + len(FAT) + N * sizeof(cluster), right?  No, that would make sense, and this is FAT32.  Remember the variable “first cluster of FS” in the boot sector?  Didn’t it seem like that should be zero?  Yeah, it’s two.  It’s always two.  Not sure why, it just is, but, remember, it’s stored in a variable.  So, that’s the cluster you go to read the root filesystem, but it’s actually the first cluster immediately after the FATs.  Yes, FATs, because there’s probably two of them.  So, in the end, you wind up with:

offset_of_cluster(N) = 512 + 2*sectors_per_fat*bytes_per_sector + (N – first_cluster) * (sectors_per_cluster * bytes_per_sector)

Ain’t FAT grand?  This is the most widely-deployed FS in the world, folks.

With all that in hand, we can finally write code to walk down these things and find out where they live on disk.  Once you’ve done that, you can find the offending entry in the directory table and pluck it out by writing zeros over it.  Well, assuming it’s after anything else you might want in that directory.  If the zeroth byte of a filename is 0×00, it’s assumed to be the end of the directory listing.  Supposedly.  So, if you want to access stuff beyond the excised directory, you might be in trouble.  Good luck with that.  Happily (and oddly), Linux’s implementation of vfat skips over zeroed out entries, probably assuming that they’re problematic in this kind of a way.  Actually, it only does that when I mount the disk I modified; the disk image treats this as a directory terminator.  There’s an awkward question I don’t want to ask there, so I won’t.

Also note that the FAT is probably wonky at this point.  We pruned the directory entry, but it might still have clusters allocated to it.  Since we didn’t (and can’t) clean those up, there are probably a bunch of clusters marked as used which aren’t actually connected to the current FS tree.  There’s not any good way to identify these, short of walking the whole FS and keeping track of which clusters are part of something.  At that point, we’ve written our own baby fsck (the stock fsck.vfat totally fails on drives this size for me, and I’m definitely not hooking this thing up to a windows machine at this point).

After all was said and done, I got my drive back.  It’s probably got a bunch of wasted clusters, but, you know what?  I don’t care.  There’s a lot of stuff in this process that, when it comes down to it, I don’t care.  All that matters is that, at the end of the day, I got my drive back.

Project Bacchus Catch-up Post

Actually, since we don’t currently have a public blog or wiki, I thought it’d be nice to collect a bunch of Project Bacchus stuff in one place.

We started Project Bacchus in late October, 2009.  My friend Shannon approached me with a question: “What do you think about sending a camera to the edge of space?”  My answer was quick: “That’s why I got my ham radio license!”  We pulled in a few more people, and a plan emerged through weekly meetings, at which we always have beer. Always.

Our project was kept quiet for a while, in part to give ourselves room to fail (which we did), and in part to keep it coherent. The more people you have, the more ideas/goals you have, the less you can focus. By keeping it to just five guys, we stayed on track, though we did probably bite off too much for our first few launches.

The biggest challenge has actually been scheduling flights, with weather a close second. Everyone wanted to have the whole team present at our first big launch, which we did. Getting there was a challenge, especially in December: between the holidays and the rain, we didn’t have a single launch window. We’ve since moved to a much nimbler group, able and willing to fly with only three guys in the field. That’s worked out really well for us, as you’ll see.

Without further ado, a quick recap of most of Project Bacchus up to the present:

Bacchus I: 2009, November 21; abject failure.

My apartment, the night before Bacchus I

Adam has a flickr set of our attempt here, 20091121 Project Bacchus I.  Note that all these pictures were taken on November 21.  The ones in my apartment were just after midnight, the rest were at about 7am, an hour and a half drive from San Francisco.  This is on top of an all-nighter on Thursday by Michael and myself, which resulted in a nice, solid payload, but slightly frazzled nerves.

Bacchus I's Flight Computer: Spaghetti (Ted Scharff)

That’s a great shot of the flight computer’s (totally spaghetti) guts by Ted; he’s got more electronics porn in his Bacchus I Gallery.  There’s other documentary evidence of this trip, but it’s mostly depressing, so I’ll pretend it doesn’t exist.  On the positive side, I got an inadvertently free margarita at brunch after our failure.

Bacchus I, which never left the ground (Ted Scharff)

In summary, we learned a lot from our first failure.

Bacchus II: 2010, January 10; payload lost at launch.

The balloon, tethered between our cars (Adam Fritzler)

Adam’s the most photo-happy member of the group (but Michael and Ted are both pretty photo-happy themselves), so I’ll link to his set first again: 20100111 Bacchus II Launch.  This launch was in a valley that’s socked in with fog, underneath otherwise crystal-clear skies.  Launching through the fog was technically against FAA regulations, but there was no behavioral difference between us launching in clear skys and through the fog: pilots couldn’t have seen our balloon until the same altitude either way.

(L to R) Ted, myself, Michael, and Shannon (hidden) rig Bacchus II (Adam)

But maybe the FAA regs aren’t there for the pilots, because we immediately lost visual contact with the payload, and the ham radio rig only got three check-ins (up to 9,000′!) before it went silent.  We had a backup cell phone in the payload as well, but it never checked in.  The assumption is a catastrophic failure shortly after launch, but we’ll never know.  For now, the official story is pterodactyls.

Bacchus II: It's About Quality™ (Ted Scharff)

Again, we learned a lot from our failure here. For instance, our new fill method is nice and elegant; the fill method used in Bacchus II… less than elegant. There’s a lot of good photos in Ted’s Bacchus II Picasa Gallery.

Bacchus III: 2010, February 14; payload retrieved.

A camera, dangling from a balloon, took this.

After throwing away several hundreds of dollars of electronics on Bacchus II, we went back to basics for Bacchus III.  One tortilla warmer, one cell phone, and one camera.  We didn’t put on a chute, since we had a very light payload and a giant streamer, which worked out really well.  A couple hours after launch, we came upon Bacchus III lying, undignified, in a fallow field, nary a scratch on it.  The telemetry from the phone’s GPS shows it coming down at about 40mph, which is a little zippy, but not dangerous with as much foam as we had.

Bacchus III Away Team: Shannon, Adam, myself, Michael (Michael Toren)

As part of our “lean and nimble” goal for this launch, we went with just four of the five team members.  Ted had other plans that weekend, so we ran off without him.  We did have his hiking GPS, though, which, once we figured out the UI, was invaluable.

Bacchus III, on the ground (Michael Toren)

We retrieved the payload and got a bunch of photos off of it.  Adam went through and edited them down to the best ones: 20100214 Bacchus III (payload camera).  We thank him greatly for that, as it’s kind of a huge pain, and he’s really good at it.

Based on the data we collected here, we’ve been able to understand the dynamics of the system, predict future behavior, and better plan future flights.  We’re working towards a payload for Bacchus IV in a couple weeks, which is part of why I was reviewing battery holders in the last post.

Bacchus Kite Tests: 2010, February 20; way fun.

Myself, Ted, and Michael rig a "test" payload (Adam Fritzler)

To test some of our descent gear, we decided to use a kite as a flying platform.  We had a few things to test this way, but only really got to one of them this time.  Mostly, this was a good excuse to go out into the park and drop stuff from kites.  It’s really, really fun, try it sometime.  Also, it gave me an excuse to buy a 200′ measuring tape, which is probably one of the more specialized pieces of equipment I own.

Mission accomplished! (Ted Scharff)

After we finished our actual goals (breaking the hell out of some styrofoam), we went ahead and did something silly: dropped the camera, recording video, from 100′ with a Wal-Mart bag as a parachute.  It’s really, incredibly disorienting, but damned fun. Video as soon as wordpress stops being stupid about it.

And that’s it. You’re now caught up on Project Bacchus!

AA Battery Holder Reviews

A lot of my free time has gone into Project Bacchus lately.  Some friends and I decided to try and get a picture of the edge of space in late November, and we’ve been working at it ever since.  A few weeks ago, we had our first successful flight, Bacchus III.

White, blue, black: the edge of space

Bacchus III's view of the horizon near apogee (Courtesy Adam Fritzler)

We’re now getting a little more ambitious, and working on engineering things a little better.  We captured a bunch of pictures, but our camera stopped right after this one:

A nice shot of Bacchus III's impact point (Courtesy Adam Fritzler)

This is the field Bacchus III landed in.  Taken on the way down, literally seconds before impact: talk about good timing!  It landed surprisingly gently, and all the equipment survived intact.  In fact, our styrofoam cooler doesn’t even have a dent from the impact.  It was, quite frankly, a perfect landing.  Despite this, the camera suddenly stopped after the above photo.  Our conclusion is that the batteries came “loose”: the springs that hold the AAs in place inside the camera were compressed by touchdown.  This disconnected power for an instant, resetting the camera.

So, we’re now very interested in battery holders.  So interested, in fact, that I spent an hour the other evening playing “Consumer Reports for Near-Space Hackers.”  Andy (of the eSATA help) purchased some candidate holders from Jameco, our neighborhood electronics warehouse.

Here’s my notes, if you’re in the market for battery holders.

Jameco 216152: 4-AA, 1×4, no case, $1.15/unit

This is a boring 4-AA holder, like you had in the bottom of R/C cars as a kid, yadda yadda. Except that it doesn’t have a door, like your cars always did.

The springs feel good, pretty firm. A longitudinal landing will probably cause disconnect, similar to that seen in the camera on Bacchus III.

There is no case, so it’s really easy for me to imagine bouncing the batteries out unless they’re retained somehow. Given that it costs us $2.50 for two of them, and we’re already spending $20 on the batteries, we could consider these consumables and just replace them every flight. The real challenge there is that we’re resoldering the most important solder junction in the system every flight, which is going to be bad for the PCB eventually.

Another alternative is to construct a little sleeve to go around the batteries, which could be as simple as cardstock (read: cereal box) and fiberglass tape (read: duct tape).

Jameco 216216: 6-AA, 2×3, no case, $0.75/unit

This a little 2 by 3 battery pack.  In layout, it’s the same as the 216152, except that it’s only got three batteries in a row, and there are two of them back-to-back. It feels reasonably constructed, same as the last, but still not the totally solid feeling I’d like. The springs are less solid than the 216152, by a noticeable amount.

Most significantly, the middle battery in each row has the exact same “could get jostled out” problem as the middle two batteries had in the 216152.

We could resort to the same “consider it consumable” strategy here. This is cheaper than the previous solution, and I’m much more okay with burning $0.75 per battery load if it means we don’t have to worry about our power supply.

A sleeve is also an option here.

Jameco 2095453: 4-AA, 1×4, case, $0.99.unit

You know the battery cases Mitch uses for his kits? This is one of those, without the switch. It has the little hole for the switch, just no switch.

It’s 1×4, with a case around it. It snaps closed, and then is retained by a screw. It generally feels really solid, which is nice.

The one big concern I have here is the quality of the springs. They’re significantly weaker than the ones in the 216-series cases.

Conclusion:

I’d go for the 2095453 vs the 216152. They’re both 4-AA, which is not ideal for our current payload, but they both feel really solid in their own ways. With a bit of testing, I’d feel confident that the 2065453 can handle the impact of landing, and then I’d be totally sold. Barring that, I’d prefer to make a dapper little sleeve for the 216152 and work with that a bit.

AMS Venus ES5 eSATA Enclosure Performance

I’ve been looking for an eSATA enclosure to put on an Atom board for a home NAS.  My primary goal is to have 4TB of disk space that’s in a RAID.  The secondary goal is to be cheap, which means low power.  Assuming I keep this thing for four years, and that PG&E continues to charge $0.12/kwh, every extra watt of power consumption costs me a little over $4.

So, I’m looking for low power.  This means an Atom processor and motherboard, which barely sip power.  Unfortunately, none of them have both a reasonably recent Atom and more than two SATA ports onboard, so the goal was to use an eSATA enclosure to connect three 2TB drives in a RAID5.  The first enclosure I came across was the AMS Venus ES5 (AMS DS-315SES), $170 on sale at Central Computers.  I picked it up (after inquiring about restocking policy), along with three Hitachi 2TB drives (HITACHI 0F10311).  I have been trying to get it to work with a PCI SATA board in my old desktop.  That didn’t work: the SIL3512 chipset doesn’t support port mulitpliers.  I borrowed another card from my friend Andy, a VIA chip this time.  Also didn’t support port multipliers.

Andy was then kind enough to meet me at a Peet’s in downtown SF, where we plugged it in to his Dell laptop, with an Intel GS45 (ICH9M) chipset.  This worked like a charm, and we proceeded to do some tests on the array.  At a coffee shop.  It drew surprisingly little attention from passersby: this is San Francisco, after all.

On the positive side, it just worked with linux.  It came up a little curiously the first time, but that’s typical for older RAID enclosures (scanning disk by disk).  Hotswap also seemed to mostly work, though it did cause the enclosure to reset its entire link every time you unplugged or replugged a drive.  This makes hotswap a lot less useful, but it is still, technically, hotswap.

We tried reads and writes of 1GB with ‘dd’, varying the number of simultaneous spindles.  We also tried buffered and unbuffered modes for  these operations, to try and get raw performance of the array’s drives and chipset.

For writes, the enclosure does okay: it looks like it scales out nicely, though doesn’t nearly saturate a 3Gbps link.  At peak, we got just under 210MBps, or 1.68Gbps, or 56% of the theoretical peak.  More typical would be 1.2Gbps, or 40% of peak.

Reads were far more disappointing: buffered reading off three drives gave us 76MBps, or 20% of the theoretical peak.  Note that a single drive reads at 130MBps, so it’s apparently just bus contention slowing this down.  Since this is roughly my use case, I decided that I wasn’t keen on using this enclosure.

Total bandwidth 1 2 3
Read Buffered 130 99.4 76
Unbuffered 131.2
Write Buffered 178.6
Unbuffered 126 208.5 175

A very large thank-you to Andy, for sharing both his hardware and expertise.  He’s been testing spindle performance a lot lately; he may post results somewhere, and I’ll be sure to link to it here.

Also, thanks to the staff of that Peet’s, for not kicking us out, even though we unboxed a weird device and plugged it into the wall in your shop.

NB: We didn’t get to do mixed reads and writes, nor did we get to complete the table below.  Please recall that these tests were performed in a coffee shop; we ran out of latte, and didn’t feel like ordering another, given the way these turned out.

Roku Channel SDK: No Free Software

Short version: the Roku Channel SDK License includes language that prohibits GPL-bearing applications.  Given that their device runs a GPL’d OS with a bunch of other OSS infrastructure, this is disappointing, and F/OSS authors should be careful in accepting the Roku SDK’s terms of service.

If you read my old blog, you may remember that I found a way to  Download the Roku Firmware a while ago. It was clear that one could load custom firmware via a man-in-the-middle technique. In my case, I wanted to place my ripped movies into the Netflix Play-it-Now queue. It was technically feasible, but sort of a pain (it required reversing the Netflix protocol, which required adding custom CA certs to the image, and and and…), so I dropped it.

Fast forward to now-ish, with the introduction of the Channel Store on Roku.  I just noticed Roku’s Channel Developer SDK in an email announcement they sent out.  Exciting!  With a little elbow grease, I could watch my ripped DVDs, use YouTube on my TV (Google TechTalks and MIT OpenCourseWare!), or maybe even have a nice weather channel.  I was so stoked, I got out of bed an hour earlier than usual to look into it.  No, really.

Part of my excitement was knowing it would be familiar.  Having seen their firmware, I know the Roku is a Linux box, running X windows, and using QT for its UI.  All of this is Open Source Software, and I’ve used it all before: this is great!  I can bring my former experience to bear, and a lot of the Open Source stuff I’ve been using in the past should integrate pretty flawlessly.

I went and started signing up for the process.  Before downloading the SDK, though, you have to agree to The Roku Channel Developer Agreement. Instead of blindly clicking, I decided to read it.

5.A.iii:   Subject to the Grace Period, Your Channel Application must at all times: … not contain any open source code or other restricted code that could require Roku to publicly post or display any third party notices or any modifications to such code.

Disappointment.  That is: this platform, built almost entirely on Open Source, won’t allow me to use the same tools they did.  iPhone developers face a similar problem, for much the same reason: the iPhone Store’s TOS disallow GPL code.  The big difference is: the iPhone is a phone.  It’s not too alluring to run Linux on there, or make it do new tricks.  Moreover, Apple used almost no off-the-shelf Open Source in the iPhone device.  The Roku, however, is built entirely on F/OSS, and is a nice little box that sits on your TV.  It’s a fantastic little box, and deserves to run totally custom software.  Unfortunately, with their current Terms of Service, anyone who develops Roku channels can’t legally take part in that creative reuse of an excellent device, which is too bad.

Inbox Zero Party

Recently @edrabbit had a great idea: “I want to install a cache of balloons and confetti over my desk for when I achieve inbox zero.”

It was brilliant, so I kludged together the first step: a ruby script that takes an action when you hit inbox zero.  Here’s the code, if you want to play along at home.  You’ll need to edit your username and password; if you’re on Linux, “gnome-open” is probably the right thing to have there.  On OS X, you’ll want “open.”  If you’re on Windows, you have your own little set of special problems.

(Note that this program exits 0 when you’ve got inbox zero, and 1 otherwise, so you can easily put it into a shell script.  In addition to the exec() command in the script, this could be used to trigger a program that pings an arduino to unlatch a box of balloons and confetti.  For my sake, I’ll be thrilled to have “Peanut Butter Jelly Time” pop up when I hit inbox zero.)

#!/usr/bin/env ruby
# You need to add your username and password below!
require 'net/imap'

config = {
  'server' => "imap.gmail.com",
  'port' => 993,
  'username' => "josh.myer@gmail.com",
  'password' => "",
  'action' => "gnome-open http://bit.ly/pbj_time",
}

imap = Net::IMAP.new(config['server'], config['port'], true)
imap.login(config['username'], config['password'])
imap.select('INBOX')
n = 0
imap.search(["NOT", "DELETED"]).each do |message_id|
  n += 1
end

imap.logout()
imap.disconnect()

puts "Your inbox contains #{n} not-deleted messages."

if n == 0
  puts "SUCCESS!  INBOX ZERO ACHIEVED!"
  exec(config['action'])
  exit 0 # superfluous, I know
else
  exit 1
end

Adding unit tests to Write Yourself A Scheme in 48 Hours

I’ve recently been working my way through Write Yourself a Scheme in 48 Hours. This is a guided tutorial through implementing a Scheme interpreter in Haskell. It’s been really fun, and a nice change from my current contract (which involves a lot of screwdriver turning and other sysadmin duties).

That said, there is one big change that I’d like to see made in the text: build it with unit tests! I’m not a total test-driven disciple, but I do believe they’re a fantastic tool, and this is a great place to use them.

Parsing is incredibly test-friendly: you have an input string, and an expected output parse tree. It’s totally deterministic, with no random numbers, no ambiguities, and no pesky humans. There are a lot of interacting components, especially in terms of nesting, which are hard to keep in mind when manually “testing” things.

In addition, the “here’s code, extend it to do X” pattern used in the text benefits from having comprehensive tests. In working the homework exercises, you’re going to refactor big chunks of code. As you do so, you should be worried about breaking the hell out of your existing code (if you’re not worried, you’re either brilliant, inexperienced, or inattentive). There’s no better way to assuage those worries than unit tests, and, as you find broken things, adding regression tests. It gets even better if you add code coverage, guaranteeing that you’re exercising all codepaths via your tests.

Another wrinkle that makes you want tests: as you work through the book, you merge your existing code into the next snippet of example code. This is often an error-prone process, and having tests to catch mistakes is great. This is the reason I broke down and refactored the code for testing. I started getting into the interesting stuff at the end of chapter two, and suddenly my strings weren’t parsing properly. Upon examination, I found I’d merged my code into the next sample code incorrectly, and left my parseString implementation behind.

So, with all that in mind, let’s make Write Yourself a Scheme more test-friendly. The first step is to factor the parser implementation out into its own module. This is really easy: we create WYAS.hs, containing everything but main, and change the module line to:

module WYAS where

We’ll also need to extend LispVal a little bit, too. We want to be able to compare them, so we’ll add deriving(Show, Eq) immediately after the definition:

data LispVal = Atom String
             | List [LispVal]
             | DottedList [LispVal] LispVal
             | Number Integer
             | Float Float
             | String String
             | Char Char
             | Bool Bool
             deriving (Show, Eq)

Don’t worry about what “deriving()” means here; it’s just some Haskell typeclass voodoo that lets us now show (Char 'x') and have it do something useful, as well as allowing us to compare two LispVals for equality. After adding deriving(Show), you can also make readExpr’s Right case return the actual parse tree as a string:

    Right val -> "Found value: " ++ show val

Handy, no?

Then, we add main.hs, containing:

module Main where
import WYAS
import System.Environment

main :: IO ()
main = do args <- getArgs
    putStrLn (readExpr (args !! 0))

Next, let’s compile this, and make sure we get something that acts like our old parser:

ghc --make -o main main.hs
./main 42
./main "#t"

Once we’re convinced that’s all working, save it and commit it to your version control. (You are using version control, right? svn, git, cvs, hg… they’re all great, and totally worth using here!)

Now, we add a test framework. First and foremost: make sure you have HUnit installed (on ubuntu and debian, it should be ‘apt-get install libghc6-hunit-dev libghc6-hunit-doc‘).

Then, we’ll get to using HUnit to write unit tests. I don’t really want to explain HUnit in too much detail, in part because I still don’t fully grok it. Suffice it to say, the following works, and gets good results. If you have better options, please leave them in the comments, as this is one of those things that could always use refinement.

For my purposes, I’m only really interested in running tests on each parse type. That is: I want to put together a bunch of strings representing possible Number inputs, along with their parses, and run them one after the other. I’d like to have a string to tell me the name of the grouping, but, beyond that, I’m not too picky on how much debug output I get. When an input string fails (and it will), I can use the ./main binary to examine the output parse. So, I’m going to write a small templating facility in my test setup. I’ll define a ParseTestSuite, which is a String identifying the group with a list of (String, LispVal) pairs. The String is the input, and the LispVal is the expected parse. Once I have that, add some code around it to turn them into HUnit’s internal Test representation. At the very end, we collect all these Tests, and run them. Here’s the resultant test.hs:

module Main where
import WYAS
import Test.HUnit
import Text.ParserCombinators.Parsec hiding (spaces)

-- A ParseTestSuite is a name, then a list of strings, and their expected parses
-- For instance:
--   ParseTestSuite "testNumbers" [("32", Number 32), ("3.4", Float 3.4), ..]
--
data ParseTestSuite = ParseTestSuite String [(String, LispVal)] deriving (Show)

doParse :: String -> LispVal

doParse input = case parse parseExpr "lisp" input of
    Left err -> Atom ("No Match " ++ show err)
    Right val -> val

-- Convert a ParseTestSuite into an HUnit test suite.
-- This requires extracting the boolean assert, then turning
-- it into a test case.
suiteEntryToAssert :: String -> LispVal -> Assertion
suiteEntryToAssert s p =
assertBool s (p == doParse s)

parseTestSuiteToTest :: ParseTestSuite -> Test
parseTestSuiteToTest (ParseTestSuite name tests) =
    TestLabel name bodies_t where
    bodies_t = TestList $ map (\(s,p) -> TestLabel s (TestCase (suiteEntryToAssert s p))) tests

tests_Number :: ParseTestSuite
tests_Number = ParseTestSuite "tests_Number"
    [
    ("32", Number 32),
    ("3.2", Float 3.2),
    ("0", Number 0),
    ("#d1", Number 1),
    ("#b10", Number 2),
    ("#x10", Number 16),
    ("#o10", Number 8)
    ]

tests_Bool :: ParseTestSuite
tests_Bool = ParseTestSuite "tests_Bool"
    [
    ("#t", Bool True),
    ("#f", Bool False)
    ]

-- lots of other tests elided

all_tests :: [ParseTestSuite]
all_tests = [tests_Number, tests_Bool, tests_Atom, tests_String, tests_Char, tests_List, tests_DottedList, tests_Quoted]

main :: IO ()
main = do
    runTestTT $ TestList (map parseTestSuiteToTest all_tests)
    return ()

Then, we compile this:

ghc --make -o test test.hs
./test

and its output:

0] jbm@density:~/src/wyas/ch2 $ ./test
Cases: 22  Tried: 22  Errors: 0  Failures: 0
0] jbm@density:~/src/wyas/ch2 $

From here, it’s just a matter of adding more of these ParseTestSuite entries and inserting it into all_tests. As we do this, we gain more confidence in the parser we’re building in this process.

There’s one more key consideration in confidence, though: code coverage. Unit tests are great, but they might not test all your code, leaving bugs lurking for some poor soul to find later. The best way to ensure this is to run your unit tests under a coverage tool. GHC comes with one of these, handily enough: hpc.

As with all coverage tools, we need to recompile the program with some new options. And, like many of them, we need to clean up the project a bit first. So, in the name of being lazy, I wrote a Makefile:

# NB: you're going to get this mangled.
# The things under the target names MUST be tabs, not spaces.
# The joys of ancient Unix programs.
#
all: test main

coverage: clean WYAS.hs test.hs
        ghc --make -fhpc -o test test.hs
        ./test
        hpc markup --include=WYAS ./test

test: WYAS.hs test.hs
        ghc --make -o test test.hs
        ./test

main: WYAS.hs main.hs
        ghc --make -o main main.hs

clean:
        rm -f *.tix *.hi *.o main test

distclean: clean
        rm -f *.html

This lets us run ‘make coverage‘ and get an HTML file showing the coverage of WYAS.hs:

0] jbm@density:~/src/wyas/ch2 $ make coverage
rm -f *.tix *.hi *.o main test
ghc --make -fhpc -o test test.hs
[1 of 2] Compiling WYAS             ( WYAS.hs, WYAS.o )
[2 of 2] Compiling Main             ( test.hs, test.o )
Linking test ...
./test
Cases: 22  Tried: 22  Errors: 0  Failures: 0
hpc markup --include=WYAS ./test
Writing: WYAS.hs.html
Writing: hpc_index.html
Writing: hpc_index_fun.html
Writing: hpc_index_alt.html
Writing: hpc_index_exp.html

To get a quick view of how we’re doing, we can use ‘hpc report‘:

0] jbm@density:~/src/wyas/ch2 $ hpc report test.tix
91% expressions used (416/457)
100% boolean coverage (1/1)
100% guards (0/0)
100% 'if' conditions (1/1)
100% qualifiers (0/0)
62% alternatives used (22/35)
100% local declarations used (6/6)
96% top-level declarations used (32/33)
0] jbm@density:~/src/wyas/ch2 $ hpc report test.tix  WYAS
91% expressions used (233/256)
100% boolean coverage (1/1)
100% guards (0/0)
100% 'if' conditions (1/1)
100% qualifiers (0/0)
63% alternatives used (21/33)
100% local declarations used (5/5)
95% top-level declarations used (19/20)
0] jbm@density:~/src/wyas/ch2 $

As you can see, my “alternatives used” is rather low. Looking at the highlighted output, I need to finish up the tests around my R5RS character decoding implementation. Time to go in and add more cases to my unit tests!

I hope this post helps explain the benefits of testing for this project, and helps you convert your copy of Write Yourself a Scheme into something more test-driven. If it saves you time, or you have any other ways to help the rest of us save some time, leave a comment. WYAS is a great resource, and I’d love to collect all the time-savings we can and fold them into the original text.

Heated Fingerless Gloves FTW

Noisebridge is getting pretty chilly lately.  It’s better than it was, but you still want to bundle up a bit to hack there.  Fingerless gloves are a pretty standard part of my wardrobe six months of the year.  Err, to be totally honest, eight months, here in San Francisco.  Anyway, they’re great: I can type, but my fingers are still insulated.  Unfortunately, Noisebridge is so chilly that insulation isn’t quite enough.  Heck, so is my apartment.
That’s where USB comes in. Yeah, USB-powered, heated fingerless gloves.  Of course they exist!  And they’re pretty awesome.  They are reasonably nice fingerless gloves (see above), with little heating pads velcroed in.  The heating pad goes on the back of your hand, and plugs into the USB cord they come with.
Of course, I had to see exactly what the heating pad was made of, so I did a little bit of minor disassembly.  Looking into it: it’s just a little bit of printed (stamped?) resistive foil, which hooks up to the power and gets hot.  Hunh.  It seems pretty simple.
These things are great, and are keeping my moneymakers nice and warm, without having to heat my whole apartment.  If you’d like a pair, let me know.  I got some extras, and would be happy to sell them for $10/pair.  Happy (and warm) hacking!

Morse decoding using Arduino

Here’s a quick hack I keep meaning to release in a meaningful way: a morse code decoder in arduino.  Hook up a button to pin 2 (like you would for the button examples), and then watch the serial monitor.  If you key in SOS (…/—/…), it will light up the LED on pin 13.  My intention here was to use this to buzzy myself into my apartment building’s door, but I still haven’t gotten around to actually putting it inside the box yet(!).

Give it a go, and see what you think of it.  I’d also like to hook it up to an LCD panel and make a proper morse training device out of it, but that’s for another day, I think.

Leave comments here if you make any use of this, I’m keen on seeing what happens with it!

73 de KJ6ANM

Here’s the file: arduino_morse_decoder.pde(.txt to make things happier).

A community funding experiment

You might remember muralizer, a mural drawing robot I was working on at Noisebridge earlier this year.  Things went awry, and the project never really achieved fruition.

I’m starting it back up again; more deatils over at its own blog: Muralizer Project Blog.  In particular, I’m trying to make ends meet for a few weeks with just Muralizer work.  To that end, I’ve started a pledge drive over at kickstarter: Muralizer kickstarter page.  This is something like a PBS pledge drive, with me handing out prizes for different pledge levels, with a big twist: if the funding goal isn’t met, no money changes hands.   This is great, but means that I need to be getting the community behind me.

So, if you’re reading this and can afford to kick in a few bucks to make a really cool open hardware project happen, please, pledge at kickstarter.  I’d really appreciate it, as would the community of artists who will benefit from a friendly kit version of this hardware.  Thanks in advance for your support!

ὑπόμνηματα: memoranda, notes, minutes