25 July 2009

setting SVN autoprops

I've just enabled the auto-props feature for my local svn installation (see here how to do it). Seriously, this is the 1st thing you should do after you install a subversion client as cleaning up your later, when you have already added hundreds of files to your repository is a tedious task :-(

As usual I googled a little bit for some advise how to do it best and to my suprise I usually found something like this:

[auto-props]
*.java = svn:mime-type=text/plain;svn:eol-style=native;svn:keywords=Date Revision
*.sh = svn:executable;svn:eol-style=native;svn:keywords=Date Revision
(e.g. here)

So what's the problem with that list (except that svn:mime-type is missing for .sh)?
svn:eol-style is set to native which means that files are automatically converted to the platform line end style when they are checked out - e.g. to CRLF on Windows.

Why is this a problem?
Repeatable builds! If you check out the same revision once on Windows and once on Linux you get different artifacts when you build it with e.g. Maven. For Java files this may be no big deal as they are usually compiled into class files, but shell scripts (which get packaged in your Maven artifact) may stop to work correctly if they have suddenly CRLF instead of LF line ends.

So my auto-props configuration looks like this:

*.sh = svn:eol-style=LF;svn:executable
*.txt = svn:eol-style=LF
*.java = svn:mime-type=text/plain;svn:eol-style=LF;svn:keywords=Date Revision Author Id HeadURL
*.xml = svn:mime-type=text/xml;svn:eol-style=LF
*.properties = svn:mime-type=text/plain;svn:eol-style=LF
*.png = svn:mime-type=image/png
*.jpg = svn:mime-type=image/jpeg

Come on guys, there is really no need to use native style on Windows anymore! If your editor/IDE cannot handle LF ends, get a better one!


Note: unfortunately there is currently no way to define auto-props on a repository level (http://subversion.tigris.org/issues/show_bug.cgi?id=1974), so I have to repeat this procedure on my Linux box now

05 April 2009

Ten Advises on Concurrency (in Java)

I've been working with all kinds of threading and synchronization mechanisms in Java for many years now. Though obviously much progress has been made in the common understanding of concurrency (e.g. through such wonderful books like Java Concurrency in Practice by Brian Goetz), I've seen the same errors made again and again.

The problem is, I suppose, that the matter is much to complicated for the average programmer. Therefore I tried to compile ten simple advises for concurrency in Java:
  1. if 2 or more threads access the same data, synchronization is necessary. Period.
  2. synchronized methods are not expensive (at least not as expensive as the time needed to debug the data inconsistencies you get otherwise)
  3. iterating over a synchronized collection is not safe (i.e. is not protected against ConcurrentModificationExceptions)
  4. many classes in the Standard Java library, which you think of as thread-safe, are not. (e.g. DateFormat)
  5. don't try to invent your own clever solutions to avoid synchronization (they are broken most of the time)
  6. use the classes from java.util.concurrent, but read the javadoc carefully!
  7. check-and-act operations (e.g. incrementing a counter) are not atomic and thus inherently not thread-safe
  8. learn the basics:
    • Thread.start/join
    • synchronized-blocks
    • Object.wait/notify/notifyAll
    • volatile
  9. programming in a concurrent scenario is not intuitive!
  10. When multiple locks are involved in a single operation, be aware of lock ordering or deadlocks will occur sooner or later.
Of course, most of these points cannot be left uncommented:

1.) with synchronization in this context I mean all kind of inter-thread-communication. Java synchronized-blocks are just the easiest to use. For example, in rare cases volatile may be enough to synchronize.

2.) Synchronization is expensive compared to non-synchronized methods. But it is far cheaper as it used to be in early Java VMs. VM vendors are working very hard to make it even cheaper in future VMs resp. to remove synchronization alltogether on-the-fly if the VM can prove that it is (currently) not needed.

3.) I'm speaking about the usual Collections.synchronizedXyz and Vector/Hashtable. There are, of course, some collection classes which are declared to be thread-safe (see java.util.concurrent), but they usually sacrifice some other guarantees (e.g. iterator may skip some elements)

5.) And be carefully when you use non-standard concurrency mechanisms invented by others (see e.g. DCL)

7.) atomic compare-and-set operations are a nice way to implement thread-safe check-and-set operations, but they are non-trivial to use

10.) Try to avoid inner-outer-lock constructs where sometimes first the outer than then inner lock is acquired and sometimes vice versa. If you must lock two similar objects try to order them based on id (popular example: order locks on bank accounts based on the account number)

PS: I do obviously not agree with this guy :)

About the usefulness of unit tests for bug fixing

I've seen this one done wrong (IMHO) so often, so I thought I just write down my recommended practice.

Using Unit tests for bugfixing
When you encouter a bug in your application you should do the following:
  1. Write a unit test which reproduces the bug
  2. fix the bug
  3. rerun the unit test to check the bug is fixed
  4. if the unit test still fails, check if either your bugfix or your unit test is flawed and go back to 2 or 3
  5. if you've changed your unit test in 4, then temporarily undo your bugfix and check that the unit test still fails iun that case
  6. deliver fix
What is the advantage of this approach?
For one this helps in fixing the bug, as you can test without any manual steps (e.g. running the whole application) that your bugfix really fixes the bug.
Additionally, the unit test serves as a regression test to prevent the bug from creeping again into later releases.

'Legacy' applications
Often you have applications which are not easily unit-testable. I can tell you a story or two about an application in which not even the tiniest parts were testable without db access (damn singletons!)
Even in that case, I try by all means to set up a unit test by refactoring small parts of the code base to make it testable. However, sometimes the bug fix is so urgent that you don't have the time to write the unit test before the bugfix. So it looks more like this:
  1. fix the bug
  2. test the fix manually
  3. deliver fix
  4. refactor code
  5. write unit test
  6. temporarily roll back the fix to test that the unit test fails
  7. repeat 4, 5 and 6, if needed
  8. deliver fix again
Why all this work (4 to 8), if the bug was already fixed?
Apart from the function as a regression test, this also helps in:
  • improving the test coverage of your legacy application which in turn enables you to do changes with greater confidence
  • improves code structure by e.g. replacing singletons with dependency injection
  • gives you more insight into the legacy code base