Tag Archives: programming

Holiday Hack: reading Java class files

I spent a week by the pool in Egypt with my girlfriend, relaxing and…hacking. The lack of WiFi meant I had to rely on instinct and intellect to get me through my pet project and it proved a worthy exercise in problem solving and perseverance…

I had the crazy idea of practicing my C programming skills by reading the binary .class file format and…doing something with it. A friend even suggested writing my own Java Virtual Machine, which I’m probably going to do at some point.

Anyhow, a week by the pool meant no WiFi, lots of food and drink, sunburn and…boredom. I took it upon myself to rely purely on the documentation I had to hand — at first just a cached Wikipedia page, later on the JVM specification — to read .class files and practice C.

It turns out this basic task proved to be a useful exercise not only in practicing C but in perseverance and in relying on one’s intuition and experience to solve a problem, rather than just “googling it” or turning to Stack Overflow.

Lesson #1: having the source makes things easier

Though I’ve learned this many times before in other languages, having the source to hand means you can truly debug an issue.

I happened to be using uthash.h for a hash table as part of my data structures. The mistake I’d made was to define my id type as a uint16_t — i.e. a 2-byte unsigned integer type — and then use the HASH_ADD_INT and HASH_FIND_INT convenience macros for manipulating/searching the hash table. Of course an int is 4 bytes on my Debian x86 laptop, so when the macro expanded to something along the lines of HASH_FIND(hh, table, id, sizeof(int), ptr); I was never finding items because the generated memcmp call was comparing the wrong number of bytes. A rookie error, but a lesson nonetheless.

Lesson #2: don’t quit

Building on lesson #1, I was reminded that I am only limited by time, my own stamina/patience and ability to think around a problem. My limited experience with C and the advanced nature of uthash.h’s source code meant it was daunting trying to debug and understand what it was doing and ultimately where I was going wrong. Alas, by the end of the debugging exercise I was much the wiser as to the workings of C and its true power. Reading the source code of others is always a useful task, and had I quit when I saw the wall of #defines I’d never have come to understand macros, C or the inner workings of its data types as well as I now do. The lesson? Keep chugging away – even when it’s frustrating you and you think you’re lost, a five-minute swim in the pool and a coffee can work wonders before you re-tackle a bug.

Lesson #3: DVCS is really useful

The only time I had WiFi was when we ventured up to the dedicated area near reception. Our room and the majority of facilities were nowhere near it, which meant we’d have to dedicate half an hour to trekking up to reception, hoping the connection was working (Egypt’s infrastructure isn’t what we’re used to here in the west!) and frantically downloading what I could whilst the bandwidth wasn’t saturated by Facebook pokes.

The point, though, is that I had version control on all the time. I didn’t have to commit to a remote server to save my small changes, improvements or fixes; I had it right where I needed it: on my hard drive. This meant I could push to the Github repository irregularly but still have the power of version control whenever I needed it, in this case in the form of Git.

Lesson #4: the UNIX mentality a.k.a. “do one thing and do it well”

I often get into the habit of concentrating on one programming task and sticking to it. Though I sometimes stray from the task — hyper-focusing for a bit and changing several things at once, ultimately confusing myself — one is almost always better off following the UNIX mantra of doing one thing and doing it well: in my case, implementing and committing the parsing code in one step and then the utilisation of that feature in the next.

Lesson #5: stop the features and tidy the code

It doesn’t always have to be a frantic race to the finish line to implement all the features. In my process of learning and trial and error I wrote some crappy code. Once I felt I had a handle on what I was working on and had ensured my understanding of the structure of the .class files, I spent time removing bits I hated, simplifying the complex parts of the codebase and satisfying my pedantic nature. The added benefit is that pausing the head-down rush of implementing features gives you room, once the refactoring break is over, to change your approach for a better solution and thus better code.

Lesson #6: cut the WiFi

Seriously: stop relying on Google so much and solve it yourself. Your skills, knowledge and ego will thank you.

My code’s over on Github if you’re interested.

Clojure’s Concurrency: easy atoms

Clojure’s atoms let one store values and mutate them atomically. When one uses an atom, Clojure manages the mutation of the value and guarantees atomicity. Such a feature is very useful in a highly concurrent application. Much like Java’s Atomic* classes, but somewhat more powerful.

This is a brief introduction.

Atomic Values
To define an atom, one simply invokes the atom function with an initial value (just like agents; see also agents and futures):
(def a (atom 0)) ; create an atom and let me refer to it by the identifier a

Now a is our reference to an atom which is managed by Clojure.

To get the value held in a, we simply dereference it:
user=> @a ; or (deref a)
0

Applying state changes to atoms can be achieved in two ways.

swap!
Calling swap! on an atom and passing a function will synchronously and atomically swap in the result of applying that function to the atom’s current value, using compare-and-set under the hood. If we wanted to increase the value held in a we’d do the following:
(swap! a inc)

At which point a would now hold 1:
user=> @a
1

Note: most of the state-mutation functions in Clojure’s concurrency features return the new/current state of the target. For atoms, calling swap! will return the new value should it succeed.

The compare-and-set nature is very useful because atoms have another powerful element: validators. When defining an atom, you can optionally pass the :validator key and a function. This function is called with any proposed new value during the compare-and-set “phase”; if it returns false or throws, the change is rejected.

Let’s redefine a with our new validator:
(def a (atom 0 :validator even?))

This basically says “let a reference an atom whose initial value is 0, and call the function even? on any proposed new value before accepting it”.

If we then called swap! with the inc function on an initial value of 0, we’d expect it to fail (1 isn’t even):
user=> (swap! a inc)
IllegalStateException Invalid reference state clojure.lang.ARef.validate (ARef.java:33)

which is awesome: we get the atomicity of atoms without code dotted around the place performing prerequisite checks on new values, all while supporting high concurrency.

reset!
Another way of mutating an atom is by using the reset! function to assign the atom a new value:
(reset! a 4)

If an atom has a validator assigned, this will still execute when calling reset!:
user=> (reset! a 1)
IllegalStateException Invalid reference state clojure.lang.ARef.validate (ARef.java:33)

Clojure’s Concurrency: Futures and Agents in Harmony

I’ve previously written on the wonders of Clojure’s agents, which give the programmer a wonderfully easy way of writing asynchronous code with very little effort.

Here’s a slightly more complex example for those wanting more context.

Combining Futures and Agents
We’ll use this (deliberately poor) inefficient find-primes in a (future) to allow for asynchronous processing in a separate thread — which will also write to our agent — and continue with other tasks as necessary.

Futures are Clojure’s way of syphoning off some calculation in the background so you can retrieve it later on. It’s fork/join, but a hell of a lot simpler.

To define a future, all we have to do is assign it to some variable:
(def f (future (some-complex-long-running-function)))

and when we’re ready to get the value from the future, we just dereference it:
user=> @f
1

If the future is still processing when you dereference it, the dereference will block. Agents differ here: dereferencing an agent never blocks; it just gives you the current value (use (await) if you need to wait for queued actions to finish).

Our inefficient (find-primes) function is a prime (HAH) candidate for asynchronous execution: knowing that it’s slow means we can let it run in the background while we favour other, more pressing tasks in our main thread.

(def f (future (find-primes 60000)))

So f’s our reference to a new future that is happily running in the background. Calculating all primes up to 60,000 will take a while with the poor (find-primes) implementation (around 14 seconds). Let’s do the user a favour and present the results in a GUI. Here’s the full function:
(import '(javax.swing JFrame JLabel JTextArea JScrollPane)
        '(java.awt BorderLayout Dimension))

(defn show-primes
  "Find all primes up to i inclusive and present them in a GUI"
  [i]
  (let [f    (future (find-primes i)) ; asynchronously find the primes while we set up the GUI
        fr   (JFrame. "Prime Numbers")
        lbl  (JLabel. (str "Here are all the prime numbers for " i ":"))
        ta   (JTextArea.)
        sp   (JScrollPane. ta)
        pane (.getContentPane fr)]
    (.setPreferredSize lbl (Dimension. 410 20))
    (.setPreferredSize sp (Dimension. 410 190))
    (.setLineWrap ta true)
    (.setSize fr 410 210)
    (.add pane lbl BorderLayout/PAGE_START)
    (.add pane sp BorderLayout/CENTER)
    (.pack fr)
    (.setVisible fr true)
    (doseq [p @f] ; dereferencing blocks until the primes are ready
      (.setText ta (str (.getText ta) p "\n")))))

As you can see at the future line, we let Clojure asynchronously execute the prime generation function while we set up the GUI, then we add to the text area by dereferencing the future — which will block if its work isn’t complete — and finally present the results.

It’s lovely: with one function we can fork calculation and get on with something else, retrieving the results at a later stage. Simple, easy.

There’s nothin’ wrong with the Findbugs Maven Plugin

I made a schoolboy error. I ventured down the road named “Thinking your tools are broken instead of your code”. I haven’t done that in years.

There’s nothing wrong with the Findbugs plugin for Maven. If you want to scan sub-packages by using the onlyAnalyze option, simply end your package declaration with .-. The manual says so.
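For reference, here’s roughly how that looks in the plugin configuration (a sketch only; the com.example package name is made up):

```xml
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>findbugs-maven-plugin</artifactId>
  <configuration>
    <!-- the trailing ".-" tells Findbugs to analyse the package
         and all of its sub-packages -->
    <onlyAnalyze>com.example.-</onlyAnalyze>
  </configuration>
</plugin>
```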

Modern Service Oriented Architecture: the basics

Just some of my notes on looking over SOA related theory.

What is “SOA”?
Service Oriented Architecture is a means of exposing business services
separately from their platform and codebase in order to provide local
and/or remote invocation of said services through abstracted data types
and signatures.

Put simply: SOA allows you to talk to alternate platforms by lifting
bespoke, language-specific features and data types into a higher,
specified representation, providing interoperability between platforms
without the need for them to be aware of each other.

As a real world example, an organisation can provide interoperation
between legacy and new systems by abstracting the manner of how they
talk and the underlying content: new code can use technologies like XML
to define the data and the corresponding types and the two systems can
work together through that abstraction. This scenario is commonplace
among businesses where different streams are responsible for different
systems, and so a common format for data interchange is necessary to
allow the two to co-exist separately.
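To make that concrete, the shared representation might be nothing more than an agreed document shape (a made-up example; the element names are entirely hypothetical):

```xml
<!-- a hypothetical, platform-neutral order record: both the legacy
     and the new system agree on these element names and types -->
<order id="1042">
  <customer>ACME Ltd</customer>
  <item sku="BL-7" quantity="3"/>
  <total currency="GBP">42.50</total>
</order>
```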

SOA also provides a more modular approach to service integration: if a
web service can “proxy” requests from one system to another, the two can
change independently without the need for related changes to the
services themselves. Such a scenario – one of proxying – can encourage
an ecosystem of services allowing each to communicate with seemingly
unrelated counterparts through a common set of data types and methods.

Benefits of SOA
Typically, the “promises” of SOA are numerous.

SOA allows for system agility
As alluded to previously, the ability for bespoke systems – those written
for specific purposes in a language of choice – to communicate with each
other allows for greater flexibility and scalability with an
organisation’s data. For example, if you can have your sales systems
talking to your suppliers through an aggregation service to warn of
unusual sales volumes, out-of-stock items and so-on, you can better
support the volume of a growing business and provide real-time response
to customer needs.

Consider an organisation where a sales system cannot speak with the MI
system to ascertain whether a customer on the phone holds an account
with you.
Crazy, no? In scenarios like this where legacy, decades-old and
difficult to support systems hold the key to sales success but are the
hindrance to it, SOA can be leveraged to quickly and easily have the two
communicating and providing front-line departments with the information
they need.

SOA can encourage innovation
In the post-Google era, notions of service “mash-ups” are commonplace.
These newfound, new-age businesses popularised the developer by
harnessing open source technologies and building public-facing APIs: all
built on service oriented architecture. They encouraged interoperation
between seemingly unrelated entities, which built a platform for
innovation and a revolution in the software industry.

If you reduce the scale and manage to foster this very nature within an
organisation, then new products and alternatives to existing ones
spring from the innovative, organic attitudes of developers and system
integrators.

Convert LaTeX to any output format easily

I use LaTeX to take notes, keep TODO lists, write the shopping list and everything in between.

Oftentimes, I’ll need to copy my notes to somewhere: a wiki, this blog, or to put in an email. It’s at this point I need to quickly convert to my target format.

So, to satisfy the above, I wrote a wrapper script that takes a file name and corresponding target format for output (according to Pandoc):

#!/bin/bash
# convert a LaTeX file ($1) to any pandoc-supported output format ($2)
if [ $# -lt 2 ]; then
    echo "Please provide a filename and output format compatible with pandoc"
    exit 1
fi
pandoc -f latex -t "$2" "$1"

Very simple. Like I said, it takes a file name and the pandoc output format, then passes it all to pandoc to convert the LaTeX original to a format of my choosing. (Note the quoted "$1" and "$2" — without them, file names containing spaces would break the script.) Very handy!

You wouldn’t build a palace with breeze blocks

I strongly believe that if you want to build awesome software, then every aspect of it needs to be as good as you want the final product to be.

It is often the case that established software shops will have put in place aspects of their software that they intend to “revisit later on”, which then never happens.

It gets lost to the crowd of “more important things to do”.

One of the most common neglects I see is how a shop builds, deploys and distributes its software. At best, the solution is amateurish. It’s usually home-baked and rigged with esoteric and often superfluous tasks and features that are “necessary” due to some poor design decision made previously. The build system — be it continuous integration or some poor sod in the corner — has to do 30 different things and copy 11 thousand files, just to compile a few.

And so the problem persists; the rot festers; the software continues to be sub-par and so do the profit margins.

I feel that if you start from the ground up and build consistently and efficiently, then your software becomes consistent and efficient too. If it is easy for a developer to build and deploy or distribute your software, it’s easy for them to do their job. Any developer knows the pain of making a change that requires a 15-minute rebuild and a 5-minute deployment wait just to check the first-name field doesn’t fall over on a null pointer.

With poor practice comes poor software comes poor profit.

So if you’re heading up a development shop and your developers are crying out for a more efficient, simpler, consistent build process, then heed their recommendations and concerns. Let them spend the time and money streamlining how they work, because you’ll get reliability, efficiency and fewer coffee breaks if you do.

Would you pay for that shiny new tablet PC if it came packaged in A4 paper?

How disappointed do you feel with a product when you hold it and you “know” its build quality is poor?

Why is software any different?

You wouldn’t build a palace with breeze blocks.

What’s an “Interface” in Java?

I was browsing reddit this afternoon and came across this post. It’s an interesting concept to explain. I mean, what is an interface in Java?

I thought through the best way to explain this to someone who doesn’t understand OO or Java in general, and I came up with the following.

The Letterbox Analogy
Think of a letterbox.

Doesn’t matter who makes the letterbox, it’ll always operate the same way. Doesn’t matter what country you go to, if you want to post a letter, you just put it through a letter box.

public interface Letterbox {
    void post(Letter letter);
}

If I’m a manufacturer, I might want to make exquisite letterboxes. I’ll make ’em out of gold, and they’ll be sturdy and look outstanding. The shape, though, will remain the same as any other. So I’ll implement the Letterbox interface:

public class GoldLetterBox implements Letterbox {
    public void post(Letter letter) {
        // some code...
    }
}

and then people will buy them, and they’ll put them in doors, and their postal employees will put their letters and packages through the door. It’ll still be Super Special Gold Letterbox Co’s letterbox, but everyone will know how to use it.

Linux + svn + ssh + Samba/NTFS: Operation Not Permitted!

My fancy-dancy and super-awesome SheevaPlug has certainly settled in at home.

It exposes my media from my NAS so I can access it anywhere; runs transmission-daemon headless for all the ISO downloading of open source software I do; runs subversion/svn for my source code versioning needs; cleans the cat when it runs in with…OK, no. Honestly: it’s simply brilliant.

That last function — subversion — was suffering from a teething problem or two. I run it via ssh login, so I configured openssh-server to match a group and force running svnserve as follows:

Match Group svn
ForceCommand /usr/bin/svnserve -t -r /mnt/code/scm/svn/repo/

which allows me to login using ssh users and be super-secure over-the-wire. Great.

I then mounted my NAS share (still on the SheevaPlug) that holds the (yet to be created) svn repo using the following line in /etc/fstab:

//192.168.1.90/code /mnt/code smbfs username=alex,password=moo,uid=1002,gid=1002 0 0

all good.

I then create a repository and try to commit some code, but am presented with:

Transmitting file data .svn: Commit failed (details follow):
svn: Can't chmod '/mnt/code/scm/svn/repo/atc/db/tempfile.8.tmp': Operation not permitted

darn.

Turns out the uid option in the fstab line was referring to a non-existent user. So I corrected it, added umask=000 to the fstab line, and it was all back up-and-running.

Sweet.

Off to code more bugs.