Holiday Hack: reading Java class files

by atc on March 24, 2013

I spent a week by the pool in Egypt with my girlfriend, relaxing and…hacking. The lack of WiFi meant I had to rely on instinct and intellect to get me through my pet project and it proved a worthy exercise in problem solving and perseverance…

I had the crazy idea of practicing my C programming skills by reading the binary .class file format and…doing something with it. A friend even suggested writing my own Java Virtual Machine, which I’m probably going to do at some point.

Anyhow, a week by the pool meant no WiFi, lots of food and drink, sunburn and…boredom. I took it upon myself to rely purely on the documentation I had to hand (at first just a cached Wikipedia page, later the JVM specification) to read .class files and practice C.

This basic task turned out to be a useful exercise not only in practicing C but in perseverance, and in relying on one’s intuition and experience to solve a problem rather than just “googling it” and Stack Overflow.

Lesson #1: having the source makes things easier

Though I’ve learned this many times before in other languages, having the source to hand means you can truly debug an issue.

I happened to be using uthash.h for a hash table as part of my data structures. My mistake was to define my id type as a uint16_t (a 2-byte unsigned integer type) and then use the HASH_ADD_INT and HASH_FIND_INT convenience macros to insert into and search the hash table. Of course, an int is 4 bytes on my Debian x86 laptop, so when the macro expanded to something along the lines of HASH_FIND(hh, table, id, sizeof(int), ptr); the key was hashed and compared over the wrong number of bytes, and the generated memcmp call never matched, so I never found my items. A rookie error, but a lesson nonetheless.

Lesson #2: don’t quit

Building on lesson #1, I was reminded that I am limited only by time, my own stamina and patience, and my ability to think around a problem. My limited experience with C and the advanced nature of uthash.h’s source code made it daunting to debug, to understand what the code was doing and, ultimately, to work out where I was going wrong. Still, by the end of the debugging exercise I was much the wiser about the workings of C and its true power. Reading other people’s source code is always worthwhile, and had I quit at the sight of that wall of #defines I wouldn’t have come to better understand macros, C, or the inner workings of data types in C. The lesson? Keep chugging away. Even when a bug is frustrating you and you think you’re lost, a 5-minute swim in the pool and a coffee can work wonders before you tackle it again.

Lesson #3: DVCS is really useful

The only time I had WiFi was when we ventured up to the dedicated area near reception. Our room and most of the facilities were nowhere near it, which meant dedicating half an hour to trekking up to reception, hoping the connection was working (Egypt’s infrastructure isn’t what we’re used to here in the west!) and frantically downloading what I could whilst the bandwidth wasn’t saturated by Facebook pokes.

The point, though, is that I had version control available all the time. With Git I didn’t need a remote server to save my small changes, improvements or fixes; the full history was right where I needed it, on my hard drive. That meant I could push to the GitHub repository only occasionally and still commit, diff and roll back whenever I needed to.

Lesson #4: the UNIX mentality a.k.a. “do one thing and do it well”

I try to make a habit of concentrating on one programming task and sticking to it. I do stray from it at times, hyper-focusing for a while and changing several things at once until I’ve confused myself, but one is almost always better off following the UNIX mantra of doing one thing and doing it well: in my case, implementing and committing the parsing code in one step and then the code that uses that feature in the next.

Lesson #5: stop the features and tidy the code

It doesn’t always have to be a frantic race to the finish line to implement every feature. In my process of learning by trial and error I wrote some crappy code. Once I felt I had a handle on what I was working on, and had confirmed my understanding of the structure of .class files, I spent time removing bits I hated, simplifying the complex parts of the codebase and satisfying my pedantic nature. The added benefit is that stepping away breaks your momentum on how you’ve been implementing features, so when the refactoring break is over you may well change your approach to a better solution, and thus better code.

Lesson #6: cut the WiFi

Seriously: stop relying on Google so much and solve it yourself. Your skills, knowledge and ego will thank you.

My code’s over on GitHub if you’re interested.

