Category Archives: Java

Holiday Hack: reading Java class files

I spent a week by the pool in Egypt with my girlfriend, relaxing and…hacking. The lack of WiFi meant I had to rely on instinct and intellect to get me through my pet project and it proved a worthy exercise in problem solving and perseverance…

I had the crazy idea of practicing my C programming skills by reading the binary .class file format and…doing something with it. A friend even suggested writing my own Java Virtual Machine, which I’m probably going to do at some point.

Anyhow, a week by the pool meant no WiFi, lots of food and drink, sunburn and…boredom. I took it upon myself to rely purely on the documentation I had to hand — at first just a cached Wikipedia page, later on the JVM specification — to read .class files and practice C.

It turns out this basic task proved to be a useful exercise not only in practicing C but in perseverance, and in relying on one’s intuition and experience to solve a problem rather than just “googling it” or turning to Stack Overflow.

Lesson #1: having the source makes things easier

Though I’ve learned this many times before in other languages, having the source to hand means you can truly debug an issue.

I happened to be using uthash.h for a hash table as part of my data structures. The mistake I’d made was to define my id type as a uint16_t — i.e. a 2-byte unsigned integer type — and then use the HASH_ADD_INT and HASH_FIND_INT convenience macros for manipulating/searching the hash table. Of course an int is 4 bytes on my Debian x86 laptop, so when the macro expanded to something along the lines of HASH_FIND(hh, table, id, sizeof(int), ptr); I was never finding items because the generated memcmp call was comparing the wrong number of bytes. A rookie error, but a lesson nonetheless.
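
The fix is to use uthash’s general-purpose macros and pass the key’s true size. A sketch of the corrected usage (the struct and names here are mine, not the project’s):

#include <stdint.h>
#include "uthash.h"

struct entry {
    uint16_t id;        /* 2-byte key: sizeof(int) would compare too many bytes */
    UT_hash_handle hh;  /* makes this structure hashable by uthash */
};

static struct entry *table = NULL;

static void put(struct entry *e) {
    /* generic macro with the key's real size, instead of HASH_ADD_INT */
    HASH_ADD(hh, table, id, sizeof(uint16_t), e);
}

static struct entry *get(uint16_t id) {
    struct entry *out = NULL;
    /* generic macro again, instead of HASH_FIND_INT */
    HASH_FIND(hh, table, &id, sizeof(uint16_t), out);
    return out;
}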

Lesson #2: don’t quit

Building on lesson #1, I was reminded that I am only limited by time, my own stamina/patience and my ability to think around a problem. My limited experience with C and the advanced nature of uthash.h‘s source code made it daunting to debug and understand what it was doing and ultimately where I was going wrong. Yet by the end of the debugging exercise I was much the wiser as to the workings of C and its true power. Reading the source code of others is always a useful task, and had I quit upon seeing the wall of #defines I’d have come away understanding far less about macros, C and the inner workings of its data types. The lesson? Keep chugging away – even when it’s frustrating you and you think you’re lost, it can pay wonders to take a five-minute swim in the pool and a coffee before re-tackling a bug.

Lesson #3: DVCS is really useful

The only time I had WiFi was when we ventured up to the dedicated area near reception. Our room and the majority of facilities were nowhere near it, which meant we’d have to dedicate half an hour to trekking up to reception, hoping the connection was working (Egypt’s infrastructure isn’t what we’re used to here in the west!) and frantically downloading what I could whilst the bandwidth wasn’t saturated by Facebook pokes.

The point, though, is that I had version control on all the time. I didn’t have to commit to a remote server to save my small changes, improvements or fixes; I had it right where I needed it: on my hard drive. This meant I could push to the GitHub repository irregularly but still have the power of versioning whenever I needed it: in this case in the form of Git.

Lesson #4: the UNIX mentality a.k.a. “do one thing and do it well”

I often get into the habit of concentrating on one programming task and sticking to it. Though I sometimes stray — hyper-focusing for a bit and changing several things at once, ultimately confusing myself — one is almost always better off following the UNIX mantra of doing one thing and doing it well: in my case, implementing and committing the parsing code in one step and then the utilisation of that feature in the next.

Lesson #5: stop the features and tidy the code

It doesn’t always have to be a frantic race to the finish line to implement all the features. In my process of learning and trial and error I wrote some crappy code. Once I felt I had a handle on what I was working on and had cemented my understanding of the structure of .class files, I spent time removing bits I hated, simplifying the complex parts of the codebase and satisfying my pedantic nature. The added benefit is that you pause the headlong rush of feature work; when the refactoring break is over you may find yourself changing your approach, arriving at a better solution and thus better code.

Lesson #6: cut the WiFi

Seriously: stop relying on Google so much and solve it yourself. Your skills, knowledge and ego will thank you.

My code’s over on GitHub if you’re interested.

The final keyword in Java

Whilst many will frown at the use of the final keyword in Java code, I find it a breath of fresh air. Perhaps it’s that I tend to lean on the side of caution, conservatism and safety when it comes to code, immutability being one of many examples. In this post I argue that using it is better than not, much as the avoidance of doubt lends itself to stronger solutions.

Immutability is therefore the strongest reason I cite when asked why I declare as much as possible final in my code, but as with many things the use of final does have its caveats.

final Variables
According to the Java Language Specification:

A variable can be declared final. A final variable may only be assigned to once. Declaring a variable final can serve as useful documentation that its value will not change and can help avoid programming errors.

It is a compile-time error if a final variable is assigned to unless it is definitely unassigned (§16) immediately prior to the assignment.

which means that whether a variable has local or class scope, if you declare it final it must be assigned exactly once (according to certain rules) and the compiler complains on subsequent attempts.

To me, this is useful because if something cannot change then why let it? If I can let the compiler do something for me reliably and consistently, then why shouldn’t I? Again: if a value shouldn’t change then make sure it doesn’t.
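
A minimal sketch of both rules in action (the class and variable names are my own):

public class FinalDemo {
    public static void main(final String[] args) {
        final int limit = 10;
        // limit = 20;      // compile-time error: limit may not be reassigned

        final String label; // a "blank" final is legal while definitely unassigned...
        if (args.length > 0) {
            label = args[0]; // ...and may be assigned exactly once on each path
        } else {
            label = "default";
        }
        System.out.println(label + ": " + limit);
    }
}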

Readability
Some might find the following code more readable than an equivalent with the use of final:

public float calculateAverageAge(Collection<Person> personCol) {
    float ageSum = 0;
    for (Person p : personCol) {
        ageSum += p.getAge();
    }
    return ageSum / personCol.size();
}

Yet, when we compare, there’s little difference:

[Image: diff of the method above with and without final]
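
In other words, the final-annotated equivalent looks like this; note that ageSum must stay non-final, since it is reassigned in the loop:

public float calculateAverageAge(final Collection<Person> personCol) {
    float ageSum = 0;
    for (final Person p : personCol) {
        ageSum += p.getAge();
    }
    return ageSum / personCol.size();
}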

With that said, the old adage that “code is read many more times than it’s written” is a strong case against; though I personally feel it’s not actually *that* unreadable: but I risk venturing into an argument of subjectivity, which we all know is futile. Perhaps it’s only less readable because you’re not used to it? Perhaps if it were commonplace we’d never have a problem. If it’s a screen real estate issue: final has the same number of letters as the C keyword const, and there are those who argue that using const is good practice too; not to mention that we’re no longer in the days of 80-character-wide terminals.

Immutability
I think it’s fair to state that software is complex. Reducing complexity where possible makes it easier to reason about solutions. Solutions that are easier to reason about are therefore easier to implement at a programming level.

One source of complexity in any given codebase is the mutation of state: changing properties of some entity that the system’s logic relies on leads to a growth in what has to be considered. One could argue, then, that reducing state mutation reduces complexity and therefore leads to an easier solution. It is here that final can aid. My reference cannot change. Recursively, the properties within that object reference cannot change, which in turn means I have less surprise and less to reason about when using those objects. It is a solution that can cascade.* If you don’t agree: replace “easier” with “simpler” and reconsider.

Again: if something cannot change then why let it? In a block of code that declares five references of which only one can change (be reassigned), it is that one we have to worry about. It is that one the unit test should cover in more cases. It is that one the new programmer reading the code six months after it was written has to watch.

Functional Programming
In the functional programming world the idea of purity is a fundamental tenet. Functions are pure if they have no side effects: the same input always produces the same output. The OO world, in contrast, has no such practice, at least not in the same way. To satisfy encapsulation we have mutators, which provide an interface to some property of an object. Coupled with abstraction, these allow the internal structure of that object to change without forcing change on its clients; but herein lies the problem. Compile-time changes are but one fraction of change. Of equal concern are the semantics of that dependency. If the code doesn’t have to be recompiled then great, but what about the actual intent behind that link? Has the logic changed? What impact does that have? How can I tell? The answer in either case is that you cannot tell without testing or without analysing the code. I don’t see that as a huge problem: with change we need to assert its validity. If we can encourage code to “do what it says on the tin” then we have simpler solutions. If side effects are non-existent then we have another string to our bow of simplicity. K.I.S.S., right?
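
Java can borrow a little of this purity by combining final fields with methods that return new instances rather than mutating. A small sketch of my own, not from the post:

public final class Point {
    private final int x;
    private final int y;

    public Point(final int x, final int y) {
        this.x = x;
        this.y = y;
    }

    // No setters: "moving" a point yields a new instance, leaving the original untouched.
    public Point translate(final int dx, final int dy) {
        return new Point(x + dx, y + dy);
    }

    public int getX() { return x; }
    public int getY() { return y; }
}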

Conclusion
It’s never easy to argue a case for doing something absolute in the professional software development “arena”. One learns either the hard way or through teamwork that these absolute rules are few and far between. Similarly, applying a concept or pattern blindly or where it is inappropriate for the solution leads to problems. Whilst I’d argue — as I have above — that reducing complexity is always something a solution should head towards, sometimes it’s unavoidable. Why then, would we not reduce it where we can, letting us spend energy on the elements that are complex, on the components that cannot be diluted further?

* Update: as stated on reddit, the final keyword does not extend to the fields of an object instance, unlike C’s const on a struct. I omitted this deliberately for a follow up post ;)

Back to IntelliJ IDEA

Like hundreds of others, I bet, I purchased an upgrade personal licence for IntelliJ IDEA 12 in their “end of world” sale. I’d heard good things about this latest version, most importantly significant improvements in performance.

I used IntelliJ for about two years a while back and was really happy with it until I had to pay again for an upgrade. A somewhat off-putting pricing model.

Now, having used NetBeans, Eclipse and IntelliJ, I’d happily say that IntelliJ is now my favourite again. It’s really fast now. Joyfully so. I’ll have to revisit once the honeymoon period is out. It doesn’t play nice with Xmonad yet, but most Java apps don’t.

Clojure’s Concurrency: easy atoms

Clojure’s atoms let one store values that are updated atomically. When one uses an atom, Clojure manages the mutation of the value and guarantees atomicity. Such a feature is very useful in a highly concurrent application. Atoms are much like Java’s Atomic* classes, but somewhat more powerful.

This is a brief introduction.

Atomic Values
To define an atom, one simply invokes the atom function with an initial value (just like agents; see also agents and futures):
(def a (atom 0)) ; create an atom and let me refer to it by the identifier a

Now a is our reference to an atom which is managed by Clojure.

To get the value held in a, we simply dereference it:
user=> @a ; or (deref a)
0

Applying state changes to atoms can be achieved in two ways.

swap!
Calling swap! on an atom and passing a function will synchronously and atomically compare-and-set the atom’s value. If we wanted to increase the value held in a we’d do the following:
(swap! a inc)

At which point a would now hold 1:
user=> @a
1

Note: most of the state-mutation functions in Clojure’s concurrency features return the new/current state of the target. For atoms, calling swap! will return the new value should it succeed.
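
As an aside, here is roughly how the steps so far would look with Java’s Atomic* classes. It is a loose analogue only; as we’ll see next, atoms offer more:

import java.util.concurrent.atomic.AtomicInteger;

public class AtomDemo {
    public static void main(String[] args) {
        AtomicInteger a = new AtomicInteger(0); // like (def a (atom 0))
        a.incrementAndGet();                    // like (swap! a inc), also returns the new value
        System.out.println(a.get());            // like @a, prints 1
    }
}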

The compare-and-set nature is very useful because atoms have another powerful element: validators. When defining an atom, you can optionally pass the :validator key and a function. This function is used to check each proposed new value during the compare-and-set “phase”.

Let’s redefine a with our new validator:
(def a (atom 0 :validator even?))

This basically says “let a reference an atom whose initial value is 0 and call the function even? before setting any new values”.

If we then called swap! with the inc function on an initial value of 0, we’d expect it to fail:
user=> (swap! a inc)
IllegalStateException Invalid reference state clojure.lang.ARef.validate (ARef.java:33)

which is awesome, as we get the atomicity of atoms without code dotted around the place performing prerequisite checks on new values, all whilst supporting high concurrency.

reset!
Another way of mutating an atom is by using the reset! function to assign the atom a new value:
(reset! a 4)

If an atom has a validator assigned, this will still execute when calling reset!:
user=> (reset! a 1)
IllegalStateException Invalid reference state clojure.lang.ARef.validate (ARef.java:33)

There’s nothin’ wrong with the Findbugs Maven Plugin

I made a schoolboy error. I ventured down the road named “Thinking your tools are broken instead of your code”. I haven’t done that in years.

There’s nothing wrong with the Findbugs plugin for Maven. If you want to scan sub-packages by using the onlyAnalyze option, simply end your package declaration with .-. The manual says so.
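
Using the acme example from the post below, that one-character change to the plugin configuration looks like this:

<configuration>
    <onlyAnalyze>acme.-</onlyAnalyze> <!-- analyses acme and all of its sub-packages -->
</configuration>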

Is findbugs-maven-plugin onlyAnalyze broken?

I cannot seem to get onlyAnalyze working for my multi-module project: regardless of what package (or pattern) I set, findbugs-maven-plugin either analyses everything or nothing, and it doesn’t evaluate sub-packages as I’d expect from passing it packagename.*.

To prove either myself or the plugin at fault (though I always assume it’s the former!), I set up a small Maven project with the following structure:


pom.xml
src/
  main/java/acme/App.java
  main/java/acme/moo/App.java
  main/java/no_detect/App.java

which is very simple!

The POM has the following findbugs configuration:

    <build>
        <plugins>
            <plugin>
                <groupId>org.codehaus.mojo</groupId>
                <artifactId>findbugs-maven-plugin</artifactId>
                <version>2.4.0</version>
                <executions>
                    <execution>
                        <phase>verify</phase>
                        <goals><goal>findbugs</goal><goal>check</goal></goals>
                    </execution>
                </executions>
                <configuration>
                    <debug>true</debug>
                    <effort>Max</effort>
                    <threshold>Low</threshold>
                    <onlyAnalyze>acme.*</onlyAnalyze>
                </configuration>
            </plugin>
        </plugins>
    </build>

and every App.java has the following code with two obvious violations:


package acme;
import java.io.Serializable;

public class App implements Serializable
{
    private static final class NotSer {
        private String meh = "meh";
    }

    private static final NotSer ns = new NotSer(); // Violation: non-serializable field

    public static void main( String[] args )
    {
        ns.meh = "hehehe"; // Violation: field is written but never read
        System.out.println( "Hello World!" );
    }
}

Note that no_detect.App has the same content as above, but my expectation is that it wouldn’t be evaluated by findbugs because I have the “onlyAnalyze” option set to acme.*, which I assume would evaluate acme.App and acme.moo.App and nothing else.

I now execute mvn clean install to clean, build, test, run findbugs, package and install, which produces the following findbugs report (snipped for brevity) and results in a build failure, which is expected because acme.App and acme.moo.App contain violations:


<BugInstance category='BAD_PRACTICE' type='SE_NO_SERIALVERSIONID' instanceOccurrenceMax='0'>
<ShortMessage>Class is Serializable, but doesn't define serialVersionUID</ShortMessage>
<LongMessage>acme.App is Serializable; consider declaring a serialVersionUID</LongMessage>
<Details>
<p> This field is never read.&nbsp; Consider removing it from the class.</p>
</Details>
<BugPattern category='BAD_PRACTICE' abbrev='SnVI' type='SE_NO_SERIALVERSIONID'><ShortDescription>Class is Serializable, but doesn't define serialVersionUID</ShortDescription><Details>
<BugCode abbrev='UrF'><Description>Unread field</Description></BugCode><BugCode abbrev='SnVI'><Description>Serializable class with no Version ID</Description></BugCode>

To summarise: only acme.App is analysed; acme.moo.App isn’t (bad) and neither is no_detect.App (good).

I tried with two wildcards in the onlyAnalyze option, but that produces a successful build with a findbugs error (Dangling meta character '*' etc.).

I tried with onlyAnalyze set to acme.*,acme.moo.*, which analyses all the expected classes (acme.App and acme.moo.App). That means it “works”, but not as I expect: I have to explicitly declare all parent packages of the classes I want analysed, which could get large and difficult to maintain on a multi-module project!

Must. Keep. Trying.

Notes on Maven vs Ant

The Problem

Do you lack consistency in, and understanding of, how your software is built? Does this mean varying degrees of success – both in each build and in changing or adding new features – and a lack of confidence from your developers?

Did you roll your own build process that’s "the best thing since sliced bread", but only a select few developers really know how to use it? Do you dedicate resources to tweaking or fixing your build scripts?

To put it another way: are all your developers building the software using the same scripts your continuous integration environment is using?

What’s Wrong with the Above?

I believe building software should not be a complex process. It should be rare for a developer to delve into the build system to tweak it.

I also believe that because there are common parts to building – source code, tests, packaging – there is little need for a "shop" to go it alone and build something of its own just because it can.

Essentially, building software comes down to performing a few tasks regularly and consistently; a developer will need to build, test and package regularly. Rarely do they need to work on the build system (unless something’s broken…).

With this in mind, shouldn’t developers be able to perform these tasks without thinking about them? Wouldn’t it be beneficial to us all – as an industry, even – to be able to go anywhere and build most software the same way, because the requirements for building don’t differ?

I’m not making a case for never "rolling your own", I’m merely trying to clarify that those who do may not need to do so. Granted, some scenarios warrant custom build scripts and the like.

The Ant Perspective

With Ant, building means code. It means logic, and that invariably doesn’t get the same strictness and quality applied to it as your everyday Java and its counterparts. The build scripts quickly become convoluted and poorly maintained thanks to a "get it working" attitude. Unless a good/strong developer is dedicated to maintaining and scaling the build scripts, the job gets shared around and each developer’s unique perspective warps the overall goal.

Most software building is just a set of tasks that are repeated for each build. Compile this. Package this. Distribute that. It doesn’t make sense to have a human do that: computers are here to do the mundane stuff.

The Maven Perspective

With Maven, you just provide metadata. The framework provides the logic in a consistent and predictable manner. Don’t you want developers to be concentrating on writing code: fixing bugs, implementing new functionality and improving the quality of the product?
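
To illustrate, a minimal POM is metadata and nothing more (the coordinates below are made up):

<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.example</groupId>
    <artifactId>my-app</artifactId>
    <version>1.0.0-SNAPSHOT</version>
    <packaging>jar</packaging>
    <!-- no build logic required: compiling, testing and packaging
         all come from Maven's standard lifecycle -->
</project>

With nothing but those few lines, mvn package will compile src/main/java, run the tests under src/test/java and produce a JAR.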

Maven allows one to have artefacts produced properly – with correct structure, metadata and naming conventions – meaning a developer doesn’t have to think to do it. These artefacts are of production quality, always.

Consistency

Consistency means one does not have to think about how or what they’re going to do when it comes to the build process.

If you’re a foreman building me a house and I say "build it this way please" and hand you a 3000-page instruction manual on how the Tudors and Stuarts built their houses prior to the Great Fire of London, you don’t say "Yeah, sure!" – you recoil, because years of history and experience have taught you otherwise.

Is it fair to say that developers should spend their time fixing bugs and implementing new features, not tweaking the bugs in the build system? Would you be happy to pay a plumber to spend time fixing his toolbox instead of correcting that leak? If this isn’t acceptable, then you can understand why businesses might not want to spend the time (money) correcting your poor housekeeping.

Quality Across the Board

Consistency breeds quality. An environment where best practice is common and reliable, and well-established norms are second nature, encourages going the extra mile to write better software. In contrast, an unreliable, poorly designed and maintained mode of practice breeds contempt. Building your software is part of all this and is a contributor to success or failure.

I’m by no means saying "blame your tools", just that better-quality tools can contribute to improved output.

Environments are a Different Beast

Deployment environments have numerous issues even when development seems fine; "works for me" is far too common a response when someone has to investigate a problem in a given deployment target.

The big wins with Maven

  • Consistency across the board. It shouldn’t matter where you go or what customer project you’re on: you build, deploy, package and distribute artefacts the same way. Imagine a situation where developers can go anywhere and "pick up where they left off": start developing within minutes.

  • JUnit testing – Maven bakes testing in from the beginning, which brings it to the developer’s mind at the right points (i.e. always)

  • Greatly simplifies build processes: it doesn’t become a matter of how, but when. Developers learn a simple set of commands that never changes (see the list of goals after these bullets)

  • Dependency management is excellent: it simplifies project set-up, so developers can get up and running within minutes

  • Packaging is excellent: multiple formats are supported – JARs, EARs, Zips, TARs etc. – and it is not left to the understanding and implementation of the developer writing the build script

  • Industry recognised: we don’t have to have developers "get up to speed" because Maven works the same everywhere; developers just remember a few goals to tell Maven what to do – compile, clean, test, package – and they’re ready to start coding

  • Have your developers building code and understanding how it’s built, not programming by coincidence. Don’t be in a position where your developers rely on their IDE to build whilst having a separate process entirely for your CI!
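
For reference, that simple set of commands is Maven’s standard lifecycle goals, which behave identically on any Maven project:

mvn clean    # remove previous build output
mvn compile  # compile the main sources
mvn test     # compile and run the unit tests
mvn package  # produce the JAR/WAR under target/
mvn install  # install the artefact into the local repository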

You wouldn’t build a palace with breeze blocks

I strongly believe that if you want to build awesome software, then every aspect of it needs to be as good as you want the final product to be.

It is often the case that aged software shops will have put in place aspects of their software that they’ll “revisit later on”, which then never happens.

It gets lost to the crowd of “more important things to do”.

One of the most common neglects I see is how a shop will build, deploy and distribute its software. At best their solution is amateur. It’s usually home-baked and rigged with esoteric and often superfluous tasks and features that are “necessary” due to some poor design decision made previously. The build system — be it continuous integration or some poor sod in the corner — has to do 30 different things, copy 11 thousand files, just to compile the few.

And so the problem persists; the rot festers; the software continues to be sub-par and so do the profit margins.

I feel that if you start from the ground up, building consistently and efficiently, then your software becomes such. If it is easy for a developer to build and deploy or distribute your software, it’s then easy for them to do their job. Any developer knows the pain they’ll go through day-to-day when making a change that requires a 15-minute rebuild and a 5-minute deployment wait just to check the first name field doesn’t fall over on a null pointer.

With poor practice comes poor software comes poor profit.

So if you’re heading up a development shop and your developers are crying out for a more efficient, simpler, consistent build process, heed their recommendations and concerns. Let them spend the time and money streamlining how they work, because you’ll get reliability, efficiency and fewer coffee breaks if you do.

Would you pay for that shiny new tablet PC if it came packaged in A4 paper?

How disappointed do you feel with a product when you hold it and you “know” its build quality is poor?

Why is software any different?

You wouldn’t build a palace with breeze blocks.