25 July 2009

setting SVN autoprops

I've just enabled the auto-props feature for my local svn installation (see here how to do it). Seriously, this is the first thing you should do after you install a Subversion client, as cleaning up later, when you have already added hundreds of files to your repository, is a tedious task :-(
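For reference, auto-props are switched on in the client's runtime configuration file (~/.subversion/config on Unix-like systems, %APPDATA%\Subversion\config on Windows). A minimal sketch of the two sections involved:

```ini
[miscellany]
# auto-props are ignored unless this switch is on
enable-auto-props = yes

[auto-props]
# file patterns and their properties go here
```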

As usual I googled a little for some advice on how best to do it, and to my surprise I usually found something like this:

[auto-props]
*.java = svn:mime-type=text/plain;svn:eol-style=native;svn:keywords=Date Revision
*.sh = svn:executable;svn:eol-style=native;svn:keywords=Date Revision
(e.g. here)

So what's the problem with that list (apart from svn:mime-type missing for .sh)?
svn:eol-style is set to native, which means that files are automatically converted to the platform's line-ending style when they are checked out, e.g. to CRLF on Windows.

Why is this a problem?
Repeatable builds! If you check out the same revision once on Windows and once on Linux, you get different artifacts when you build it with e.g. Maven. For Java files this may be no big deal, as they are usually compiled into class files, but shell scripts (which get packaged in your Maven artifact) may stop working correctly if they suddenly have CRLF instead of LF line endings.

So my auto-props configuration looks like this:

*.sh = svn:eol-style=LF;svn:executable
*.txt = svn:eol-style=LF
*.java = svn:mime-type=text/plain;svn:eol-style=LF;svn:keywords=Date Revision Author Id HeadURL
*.xml = svn:mime-type=text/xml;svn:eol-style=LF
*.properties = svn:mime-type=text/plain;svn:eol-style=LF
*.png = svn:mime-type=image/png
*.jpg = svn:mime-type=image/jpeg

Come on guys, there is really no need to use native style on Windows anymore! If your editor/IDE cannot handle LF line endings, get a better one!


Note: unfortunately there is currently no way to define auto-props at the repository level (http://subversion.tigris.org/issues/show_bug.cgi?id=1974), so I now have to repeat this procedure on my Linux box.

05 April 2009

Ten Pieces of Advice on Concurrency (in Java)

I've been working with all kinds of threading and synchronization mechanisms in Java for many years now. Though obviously much progress has been made in the common understanding of concurrency (e.g. through such wonderful books as Java Concurrency in Practice by Brian Goetz), I've seen the same errors made again and again.

The problem is, I suppose, that the matter is much too complicated for the average programmer. Therefore I tried to compile ten simple pieces of advice for concurrency in Java:
  1. if 2 or more threads access the same data, synchronization is necessary. Period.
  2. synchronized methods are not expensive (at least not as expensive as the time needed to debug the data inconsistencies you get otherwise)
  3. iterating over a synchronized collection is not safe (i.e. is not protected against ConcurrentModificationExceptions)
  4. many classes in the Standard Java library, which you think of as thread-safe, are not. (e.g. DateFormat)
  5. don't try to invent your own clever solutions to avoid synchronization (they are broken most of the time)
  6. use the classes from java.util.concurrent, but read the javadoc carefully!
  7. check-and-act and read-modify-write operations (e.g. incrementing a counter) are not atomic and thus inherently not thread-safe
  8. learn the basics:
    • Thread.start/join
    • synchronized-blocks
    • Object.wait/notify/notifyAll
    • volatile
  9. programming in a concurrent scenario is not intuitive!
  10. When multiple locks are involved in a single operation, be aware of lock ordering or deadlocks will occur sooner or later.
Of course, most of these points cannot be left uncommented:

1.) By synchronization I mean, in this context, all kinds of inter-thread communication. Java synchronized blocks are just the easiest to use. For example, in rare cases volatile may be enough to synchronize.
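One of those rare cases is a simple stop flag that one thread writes and another only reads. A minimal sketch, assuming a hypothetical StoppableWorker class (all names here are illustrative):

```java
// Hypothetical worker; the class and method names are illustrative only.
class StoppableWorker implements Runnable {
    // volatile guarantees that the write in requestStop() becomes visible
    // to the reading thread in run().
    private volatile boolean stopRequested = false;

    void requestStop() {
        stopRequested = true;
    }

    @Override
    public void run() {
        long iterations = 0;
        // Without volatile this loop might never observe the write.
        while (!stopRequested) {
            iterations++;
        }
    }

    // Starts a worker, asks it to stop, and reports whether it terminated in time.
    static boolean stopsWhenAsked() {
        StoppableWorker worker = new StoppableWorker();
        Thread thread = new Thread(worker);
        thread.start();
        worker.requestStop();
        try {
            thread.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !thread.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + stopsWhenAsked());
    }
}
```

This only works because the flag is a single value that is written without looking at its previous state. As soon as the new value depends on the old one, volatile alone is no longer enough.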

2.) Synchronization is expensive compared to non-synchronized methods, but it is far cheaper than it used to be in early Java VMs. VM vendors are working very hard to make it even cheaper in future VMs, or to remove synchronization altogether on the fly if the VM can prove that it is (currently) not needed.

3.) I'm speaking about the usual Collections.synchronizedXyz and Vector/Hashtable. There are, of course, some collection classes which are declared to be thread-safe (see java.util.concurrent), but they usually sacrifice some other guarantees (e.g. iterator may skip some elements)
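For the Collections.synchronizedXyz wrappers, the javadoc asks the caller to hold the wrapper's lock for the whole iteration. A minimal sketch of that pattern (the class SynchronizedIteration and its methods are made up for illustration):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SynchronizedIteration {
    // Sums the list while holding the wrapper's own lock, so no other thread
    // can modify the list in the middle of the iteration.
    static int sum(List<Integer> syncList) {
        int total = 0;
        synchronized (syncList) {
            for (int value : syncList) {
                total += value;
            }
        }
        return total;
    }

    // One thread appends 1..1000 while the calling thread keeps iterating.
    static int sumWhileWriting() {
        List<Integer> numbers = Collections.synchronizedList(new ArrayList<>());
        Thread writer = new Thread(() -> {
            for (int i = 1; i <= 1000; i++) {
                numbers.add(i);
            }
        });
        writer.start();
        while (writer.isAlive()) {
            sum(numbers); // safe: add() and the iteration use the same lock
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum(numbers);
    }

    public static void main(String[] args) {
        System.out.println(sumWhileWriting());
    }
}
```

Dropping the synchronized block in sum() leaves every single call on the list thread-safe, but the loop as a whole can then fail with a ConcurrentModificationException.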

5.) And be careful when you use non-standard concurrency mechanisms invented by others (see e.g. DCL, double-checked locking).
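Where double-checked locking is used for lazy initialization, one well-known alternative is the initialization-on-demand holder idiom, which relies only on the JVM's class-initialization guarantees. A sketch with a hypothetical LazyService class (the creation counter exists only to make the behaviour observable):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical lazily created singleton; all names are illustrative.
class LazyService {
    private static final AtomicInteger CREATED = new AtomicInteger();

    private LazyService() {
        CREATED.incrementAndGet();
    }

    // The JVM initializes Holder on first access, exactly once and thread-safely,
    // so no explicit locking or volatile field is needed.
    private static class Holder {
        static final LazyService INSTANCE = new LazyService();
    }

    static LazyService getInstance() {
        return Holder.INSTANCE;
    }

    static int instancesCreated() {
        return CREATED.get();
    }

    public static void main(String[] args) {
        LazyService first = getInstance();
        LazyService second = getInstance();
        System.out.println("same instance: " + (first == second));
        System.out.println("instances created: " + instancesCreated());
    }
}
```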

7.) atomic compare-and-set operations are a nice way to implement thread-safe check-and-set operations, but they are non-trivial to use
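The usual shape of such code is a retry loop around compareAndSet. A minimal sketch, assuming a hypothetical BoundedCounter that may only be incremented while it is below a limit:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical bounded counter; class and method names are illustrative.
class BoundedCounter {
    private final AtomicInteger value = new AtomicInteger();

    // Check ("below the limit?") and act ("increment") as one atomic step.
    boolean incrementIfBelow(int limit) {
        while (true) {
            int current = value.get();
            if (current >= limit) {
                return false;
            }
            if (value.compareAndSet(current, current + 1)) {
                return true;
            }
            // Another thread changed the value between get() and compareAndSet(): retry.
        }
    }

    int get() {
        return value.get();
    }

    // Lets several threads compete for the counter and returns the final value.
    static int hammer(int threads, int attemptsPerThread, int limit) {
        BoundedCounter counter = new BoundedCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < attemptsPerThread; n++) {
                    counter.incrementIfBelow(limit);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(hammer(4, 1000, 2500));
    }
}
```

The non-trivial part is the retry: the check must be repeated on every attempt, because the value read before a failed compareAndSet is already stale.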

10.) Try to avoid inner-outer-lock constructs where sometimes first the outer and then the inner lock is acquired and sometimes vice versa. If you must lock two similar objects, try to order them based on an id (popular example: order locks on bank accounts based on the account number).
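The bank account example can be sketched as follows. BankAccount is a hypothetical class, and the sketch assumes that account numbers are unique:

```java
// Hypothetical account class; account numbers are assumed to be unique.
class BankAccount {
    private final long number;
    private long balance;

    BankAccount(long number, long balance) {
        this.number = number;
        this.balance = balance;
    }

    synchronized long balance() {
        return balance;
    }

    // Always locks the account with the lower number first, whatever the
    // direction of the transfer, so two opposing transfers cannot deadlock.
    static void transfer(BankAccount from, BankAccount to, long amount) {
        BankAccount first = from.number < to.number ? from : to;
        BankAccount second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }

    // Two threads transfer in opposite directions; returns the combined balance.
    static long opposingTransfers(int rounds) {
        BankAccount a = new BankAccount(1, 1000);
        BankAccount b = new BankAccount(2, 1000);
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                transfer(a, b, 1);
            }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                transfer(b, a, 1);
            }
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return a.balance() + b.balance();
    }

    public static void main(String[] args) {
        System.out.println(opposingTransfers(10000));
    }
}
```

Had transfer() simply locked from and then to, the two threads above could each grab one lock and wait forever for the other.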

PS: Obviously, I do not agree with this guy :)

About the usefulness of unit tests for bug fixing

I've seen this one done wrong (IMHO) so often that I thought I'd just write down my recommended practice.

Using Unit tests for bugfixing
When you encounter a bug in your application you should do the following:
  1. Write a unit test which reproduces the bug
  2. fix the bug
  3. rerun the unit test to check the bug is fixed
  4. if the unit test still fails, check if either your bugfix or your unit test is flawed and go back to 2 or 3
  5. if you've changed your unit test in 4, then temporarily undo your bugfix and check that the unit test still fails in that case
  6. deliver fix
What is the advantage of this approach?
For one, this helps in fixing the bug, as you can test without any manual steps (e.g. running the whole application) that your bugfix really fixes the bug.
Additionally, the unit test serves as a regression test to prevent the bug from creeping back into later releases.
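As an illustration, here is a sketch of such a bug-reproducing test. The bug is invented for the example: a hypothetical FileNames.extensionOf that returned the whole file name when the name contained no dot. In a real project this would be a JUnit test; plain Java is used here only to keep the sketch self-contained.

```java
// Regression test written first (step 1); it fails until the fix is in place.
class FileNamesRegressionTest {
    static void extensionOfNameWithoutDotIsEmpty() {
        String actual = FileNames.extensionOf("README");
        if (!actual.isEmpty()) {
            throw new AssertionError("expected empty extension but got: " + actual);
        }
    }

    public static void main(String[] args) {
        extensionOfNameWithoutDotIsEmpty();
        System.out.println("regression test passed");
    }
}

// Hypothetical code under test.
class FileNames {
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // The fix (step 2): the buggy version lacked this check, so
        // substring(dot + 1) returned the whole name when dot was -1.
        if (dot < 0) {
            return "";
        }
        return fileName.substring(dot + 1);
    }
}
```

Temporarily removing the dot < 0 check makes the test fail again, which is the cross-check described in step 5.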

'Legacy' applications
Often you have applications which are not easily unit-testable. I can tell you a story or two about an application in which not even the tiniest parts were testable without db access (damn singletons!)
Even in that case, I try by all means to set up a unit test by refactoring small parts of the code base to make it testable. However, sometimes the bug fix is so urgent that you don't have the time to write the unit test before the bugfix. So it looks more like this:
  1. fix the bug
  2. test the fix manually
  3. deliver fix
  4. refactor code
  5. write unit test
  6. temporarily roll back the fix to test that the unit test fails
  7. repeat 4, 5 and 6, if needed
  8. deliver fix again
Why all this work (4 to 8), if the bug was already fixed?
Apart from serving as a regression test, this also:
  • improves the test coverage of your legacy application, which in turn enables you to make changes with greater confidence
  • improves the code structure, e.g. by replacing singletons with dependency injection
  • gives you more insight into the legacy code base
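A sketch of what such a refactoring can look like. All names are hypothetical; the assumption is that the legacy code called a database singleton directly and that this call has been extracted behind a small interface:

```java
// A check that needs no database: the store is replaced by a stub.
class GreetingServiceCheck {
    public static void main(String[] args) {
        CustomerStore stub = id -> "Ada";
        String greeting = new GreetingService(stub).greet(42);
        if (!greeting.equals("Hello, Ada!")) {
            throw new AssertionError(greeting);
        }
        System.out.println("ok");
    }
}

// Hypothetical seam: replaces the hard-wired call to a database singleton.
interface CustomerStore {
    String nameOf(int customerId);
}

class GreetingService {
    private final CustomerStore store;

    // The dependency is handed in instead of being looked up via a singleton.
    GreetingService(CustomerStore store) {
        this.store = store;
    }

    String greet(int customerId) {
        return "Hello, " + store.nameOf(customerId) + "!";
    }
}
```

In production the same constructor would receive an implementation backed by the real database.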

27 April 2008

What I don't like about type inference

These days everyone seems to be writing about some of the "new" languages on top of the JVM - or podcasting about how absolutely awesome it is what A does with Groovy or B does with Scala. So I decided to take a look at these "new" languages, too.

In the last few weeks I read a lot about Scala, worked through the examples and started to play around a little on my own. Basically I like the language very much, mainly because it blends the best of Java with functional programming - and some memories from my college Scheme classes are coming back.

However, there are a few things I don't like. Type inference, at least how it is implemented in Scala, is one of them. I can perfectly understand that

Map<String,List<Map<String,Long>>> map =
    new HashMap<String,List<Map<String,Long>>>();

is awful. In Scala you use type inference and write

val map = new HashMap[String,List[Map[String,Long]]]();

which is much shorter. However, this has one drawback: map is inferred as being of type HashMap and NOT Map. This violates one of the basic programming principles I learned and am defending passionately: Program against interfaces and NOT against concrete classes.
(BTW: Neal Gafter made a proposal how to combine type inference with programming against interfaces: http://gafter.blogspot.com/2007/07/constructor-type-inference.html)
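The same trade-off can be shown in Java itself, which later gained both the diamond operator (Java 7) and local type inference with var (Java 10). A sketch that needs Java 10 or newer; the class InferenceDemo and its methods are made up for illustration:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class InferenceDemo {
    // Declared against the interface: the diamond infers the type arguments,
    // and the implementation can be swapped freely.
    static Map<String, List<Map<String, Long>>> declaredAgainstInterface() {
        Map<String, List<Map<String, Long>>> map = new HashMap<>();
        map = new TreeMap<>(); // fine: the static type is Map
        return map;
    }

    // With var the static type is the concrete class on the right-hand side.
    static HashMap<String, List<Map<String, Long>>> inferred() {
        var map = new HashMap<String, List<Map<String, Long>>>();
        // map = new TreeMap<String, List<Map<String, Long>>>(); // does not compile
        return map;
    }

    public static void main(String[] args) {
        System.out.println(declaredAgainstInterface().getClass().getSimpleName());
        System.out.println(inferred().getClass().getSimpleName());
    }
}
```

The first form keeps the declaration short while still programming against the interface; the second is shorter still but ties the variable to HashMap.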

On a side note, did you notice how an immutable value is declared in Scala? With val. Guess how a mutable variable is declared? Right, with var. How stupid is that? Just one character difference between two fundamentally different types of variables? The difference is quite important, as the first type allows for the purely functional share-nothing approach, which makes concurrent programming so much easier, while var stands for the "traditional" approach of shared variables which must potentially be protected against concurrent access from different threads.

05 April 2008

My blog

Hi,

this is my first blog and my first post. So, 'Hello World'.

I'm going to post here whenever I think I have something interesting to write.
So prepare for very irregular updates and articles mainly about software development in general and Java in particular.

Christoph