http://www.javacoffeebreak.com/
Allen Holub is the author of "Taming Java Threads", a new book recently published by Apress. In this exclusive interview, he talks to us about the tricky problems of multi-threading in Java, and about the programming language in general.
Q: What do you see as the biggest change affecting the Java community in the last year?
A: I don't see that there has been a *big* change in the last year.
The 1.3 JDK also adds some other truly useful, but less important, stuff: the java.awt.Robot class finally makes it possible to test GUIs in an automated fashion, and the Runtime.addShutdownHook() method is essential for implementing the Singleton design pattern.
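The interview doesn't spell out how a shutdown hook fits into a Singleton. One plausible reading, sketched here as an assumption, is that a Singleton which owns a resource has no natural point at which to release it, and Runtime.addShutdownHook() gives it one. The class name LogManager and its in-memory message buffer are hypothetical, and the code assumes Java 8 or later (it uses a method reference, which JDK 1.3 did not have).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Singleton that buffers log messages in memory and relies on a
// shutdown hook to flush them when the JVM shuts down (normal exit or a
// termination signal, but not a hard kill).
public final class LogManager {
    // Eager creation: class initialization is thread-safe, so no locking is needed here.
    private static final LogManager INSTANCE = new LogManager();

    private final List<String> buffer = new ArrayList<>();

    private LogManager() {
        // Registered exactly once, because the constructor runs only once.
        Runtime.getRuntime().addShutdownHook(new Thread(this::flush, "LogManager-flush"));
    }

    public static LogManager getInstance() {
        return INSTANCE;
    }

    public synchronized void log(String message) {
        buffer.add(message);
    }

    public synchronized int pendingCount() {
        return buffer.size();
    }

    // Runs on the shutdown-hook thread; synchronized so it sees every logged message.
    private synchronized void flush() {
        for (String message : buffer) {
            System.out.println("flushed: " + message);
        }
        buffer.clear();
    }

    public static void main(String[] args) {
        LogManager.getInstance().log("application started");
        LogManager.getInstance().log("application stopping");
        // No explicit cleanup call: the hook prints both messages as the JVM exits.
    }
}
```

The point of the design is that callers never have to remember to "close" the Singleton; the cleanup is tied to the lifetime of the VM itself.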
None of this is exactly revolutionary, which I see as a good thing. Java finally seems to be settling down enough that I don't have to spend my life continually chasing after the new feature of the week. Frankly, I think that the language has enough features (and this has been the case for some time). Microsoft adds features as a strategy to prevent others from cloning the operating system, but Java doesn't need to protect itself from clones in this way. My heartfelt wish is that Sun would proclaim a new-feature moratorium and concentrate on fixing the myriad bugs that permeate all the packages (and the compiler and VM, for that matter).
Q: What do you see as the main advantages of Java, compared to other languages like C++?
A: Ease of programming. I programmed in C++ for eight years, but I never looked back once I started programming in Java. C++ is hideously complicated. I once witnessed a roundtable discussion where most of the big C++ gurus were asked (I'm paraphrasing) what percentage of the language they really understood.
If my memory serves, the highest number came from Stroustrup, who put himself at 70%. I think most C++ programmers are closer to 7%. I'm convinced that one of the reasons that C++ programs are so buggy is that the language is so complex that it's just not possible to write a correct program in it.
It is possible, on the other hand, for a single person to understand all of Java, including the packages. Java does everything C++ does, including multiple inheritance, but is a lot easier to program. (Parameterized types are missing, but that's coming.) The sheer wealth of libraries that accompany the language (both in the java.* packages and provided by third parties) is also an enormous advantage. You can just get more done, faster.
This is not to say that I think that Java is perfect; it's just better than the realistic alternatives. (I say "realistic" because I like Eiffel a lot, but I don't expect to be programming in it any time soon.) Java has enormous problems with its threading model, as is discussed in "Taming Java Threads." A lot of the libraries are surprisingly bad (Swing's text controls are an abomination, for example, and EJB is way too complicated for what it does). Moreover, the language is still hideously buggy in some places: it's inexcusable that printing is still screwed up, and that you can't interrupt() out of a blocking I/O operation.
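Because interrupt() doesn't wake a thread that is blocked in a classic stream read, a common workaround (not described in the interview, and shown here only as a sketch) is to close the socket from another thread, which makes the blocked read() fail with a SocketException. The class name UnblockRead and the 200 ms delay are illustrative assumptions. Later JDKs also added interruptible NIO channels (java.nio.channels.InterruptibleChannel), which this sketch doesn't use.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;
import java.util.concurrent.atomic.AtomicReference;

public final class UnblockRead {

    /** Blocks a reader thread on a socket, then closes the socket; returns how the read ended. */
    public static String readUntilClosed() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);     // any free local port
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket peer = server.accept()) {                             // peer never writes, so read() blocks
            AtomicReference<String> outcome = new AtomicReference<>("not finished");
            Thread reader = new Thread(() -> {
                try {
                    InputStream in = client.getInputStream();
                    int b = in.read();                                    // blocks here
                    outcome.set(b == -1 ? "end of stream" : "data");
                } catch (SocketException e) {
                    outcome.set("unblocked by close");                    // expected path
                } catch (IOException e) {
                    outcome.set("other I/O error: " + e);
                }
            });
            reader.start();
            Thread.sleep(200);   // crude: give the reader time to block (demo only)
            reader.interrupt();  // sets the flag, but a platform thread stays blocked in read()
            client.close();      // the workaround: the blocked read() now fails with SocketException
            reader.join();
            return outcome.get();
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readUntilClosed());
    }
}
```

The cost of this workaround is that the connection is gone afterward; closing the socket is a way to abandon an operation, not to pause it.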
On the plus side, the Java Community Process does leave the way open for the language to evolve. It's been a persistent joke that a programming language becomes obsolete as soon as the official specification is released. The reason for this phenomenon, I think, is that a programming language can't be static: it has to evolve to meet the needs of the programmers and the "business" requirements of the users. The fact that Java is not fixed, and that there's even an official evolutionary mechanism, is likely to make Java much more long-lived than other languages.
Q: What type of applications use multi-threading? What type of programmer needs to understand multi-threaded programming?
A: Virtually all applications use multithreading (or should). On the client side, Swing/AWT uses a single thread both to handle all OS-level events and to dispatch notifications to the listeners. This architecture means that the user interface is locked---completely unresponsive---while your program is in the process of servicing a UI event like a button press or a menu-item selection. If you want a cancel button to work, for example, you *must* use a thread to implement the operation that you intend to cancel. Most books on Swing conveniently ignore this problem.
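Here is a minimal sketch of that advice, assuming a cancellation scheme based on Thread.interrupt(); the class CancellableOperation is hypothetical, and Swing itself is left out so the code runs without a display. In a real UI, one button's listener would call start() and the Cancel button's listener would call cancel(); both return immediately, so the event-dispatch thread stays free to process the Cancel click.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: run a long operation on its own worker thread so that a "cancel"
// request (e.g., from a Cancel button's listener on the UI thread) can stop it.
public final class CancellableOperation implements Runnable {
    private final int totalSteps;
    private final AtomicInteger stepsDone = new AtomicInteger();
    private volatile Thread worker;

    public CancellableOperation(int totalSteps) {
        this.totalSteps = totalSteps;
    }

    // Called from the event handler that starts the operation; returns at once.
    public void start() {
        Thread t = new Thread(this, "long-operation");
        worker = t;
        t.start();
    }

    // Called from the Cancel button's handler; also returns at once.
    public void cancel() {
        Thread t = worker;
        if (t != null) {
            t.interrupt();
        }
    }

    @Override
    public void run() {
        for (int i = 0; i < totalSteps; i++) {
            if (Thread.currentThread().isInterrupted()) {
                return;                       // cancelled between steps
            }
            try {
                Thread.sleep(10);             // stand-in for one unit of real work
            } catch (InterruptedException e) {
                return;                       // cancelled while working
            }
            stepsDone.incrementAndGet();
        }
    }

    // Waits for the worker to stop (demo/test helper; never call this on the UI thread).
    public void awaitFinish() {
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public int stepsDone() {
        return stepsDone.get();
    }
}
```

Any updates to Swing components from run() would still have to be handed back to the event thread (for example with SwingUtilities.invokeLater()), since Swing components are not thread-safe.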
On the server side, there are typically one or more threads per client connection, with many other threads on the scene doing things like talking to databases. A Servlet, in fact, is something of a worst-case synchronization scenario: there is only one instance of the Servlet, but each client connection is handled on its own thread, and all the client-connection threads talk to the same Servlet object simultaneously.
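The servlet situation can be imitated without a servlet container: one shared object, many threads calling it at once. This is a sketch under stated assumptions; HitCounter is a hypothetical stand-in for a servlet's mutable state, and the plain threads stand in for client connections. Without the synchronized keyword, concurrent hits++ updates could interleave and lose counts.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: one shared handler object (as with a servlet instance) called
// concurrently from several "connection" threads.
public final class SharedHandlerDemo {

    // Hypothetical stand-in for a servlet: a single instance with mutable state.
    static final class HitCounter {
        private long hits; // shared by every connection thread

        // hits++ is a read-modify-write; synchronized keeps concurrent calls from losing updates.
        synchronized void handleRequest() {
            hits++;
        }

        synchronized long hits() {
            return hits;
        }
    }

    // Simulates 'connections' threads, each sending 'requestsPerConnection' requests.
    public static long simulate(int connections, int requestsPerConnection) {
        HitCounter shared = new HitCounter();   // the only instance, as with a servlet
        List<Thread> threads = new ArrayList<>();
        for (int c = 0; c < connections; c++) {
            Thread t = new Thread(() -> {
                for (int r = 0; r < requestsPerConnection; r++) {
                    shared.handleRequest();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return shared.hits();
    }

    public static void main(String[] args) {
        System.out.println(simulate(8, 100_000)); // prints 800000
    }
}
```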
To make matters worse, the behavior of threaded systems, particularly in a multiprocessor environment, is counterintuitive. I'm writing an article on this subject for JavaWorld right now; in short, the problem has to do with the way that the hardware works. In effect, virtually none of the clever tricks that people try to use to get around the overhead of synchronization actually work. (The "double-checked locking" idiom for singleton access is a case in point: it just doesn't work. There's a good article that explains why at http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html).
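For a static singleton, a commonly recommended alternative to double-checked locking is to let class initialization do the locking, as in the lazy "holder class" below. The broken idiom appears only as a comment for contrast; the names Singletons, Resource, and Holder are hypothetical.

```java
// Sketch contrasting double-checked locking with a lazy holder class.
public final class Singletons {

    // BROKEN: double-checked locking, shown for contrast only.
    //
    //   private static Resource instance;
    //   static Resource getInstance() {
    //       if (instance == null) {                  // unsynchronized read
    //           synchronized (Resource.class) {
    //               if (instance == null) instance = new Resource();
    //           }
    //       }
    //       return instance;  // another thread may see a partially constructed object
    //   }

    static final class Resource {
        final int[] table = new int[1024];  // state another thread could otherwise see half-built
    }

    // SAFE: the JVM initializes Holder only when getInstance() first touches it,
    // and class initialization is guaranteed to be thread-safe, so no explicit
    // locking (and no unsynchronized fast path) is needed.
    private static final class Holder {
        static final Resource INSTANCE = new Resource();
    }

    public static Resource getInstance() {
        return Holder.INSTANCE;
    }
}
```

The design choice here is to avoid the "clever trick" altogether: every read after initialization is an ordinary static-field access, with the visibility guarantee coming from class initialization rather than from hand-written locking.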
The upshot of all this is that it's simply not possible to write ANY production-quality Java without understanding multi-threaded programming. All Java programmers need to know this stuff.
Q: What are the differences between multi-process programming, which our Unix readers who remember the days of fork() will appreciate, and multi-threading?
A: I'm hoping that you're joking about Unix readers using multi-process programming. All Unix systems support threading at this juncture; in fact, the Solaris and POSIX threading models are (in my completely unbiased opinion :-) vastly better than the Microsoft models, for example. Any Unix program that's still implementing concurrency using fork() is hopelessly obsolete.
A process is effectively an address space, plus all the overhead and data structures (such as virtual-memory tables) that are required to implement the address space. In Java you can think of the VM and a process as rough equivalents. (I'm simplifying: since the VM is implemented as a DLL or shared library, it's possible for multiple instances of the VM to be running in a single process, but that's rare.)
A thread is a thread of execution---a sequence of memory locations that contain executable instructions, visited by the CPU in some order defined by the instructions themselves. The thread data structure keeps track only of execution-related things, like the register set and runtime stack.
Swapping one process with another is a big deal, since you have to mess with large data structures (such as virtual-memory tables) and may even have to swap memory to disk. Swapping a thread is extremely efficient: push a few registers onto one thread's runtime stack, then pop new values from another thread's runtime stack. In C, you can implement a simple cooperative threading model using setjmp() and longjmp() calls.
Because of the overhead issues, threads are better than multiple processes in all but one situation: multiple processes give you multiple address spaces, so memory-intensive operations that won't fit in a single address space might have to be spread across multiple processes.
Q: From a performance perspective, what's the practical limit on the number of threads? How does one determine whether you're using too many, or too few?
A: The practical limit is set by the OS, which is one of many reasons why Java's threading model can't be architecture neutral. NT (the OS itself) tends to get unstable if you have more than a couple hundred threads running. (I'm not sure about Win2K; I haven't tried to crash it yet.) Solaris is happy with many thousands of threads running.
That being said, it never makes sense for more threads to be *running* at a given moment than you have processors. That is, an operation performed by two threads sharing a single CPU will run more slowly than the same operation rewritten to be single threaded because of the time wasted doing thread-context swaps.
Threads don't make sense, from a performance point of view, unless they can run on their own CPUs. This is not to say that you don't want more than 16 threads on a 16-CPU box, but it does mean that you want only 16 of those threads to be running at a given moment. The others should be blocked (suspended) waiting for something to do. For example, they could be waiting for a client to connect to a socket, or waiting for a DMA-based disk-I/O operation to complete.
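One way to act on this advice, sketched here as an assumption rather than as the author's method, is to size a pool of CPU-bound worker threads to Runtime.getRuntime().availableProcessors() and queue any extra work instead of giving each task its own thread. The class CpuBoundPool and its sum-of-squares task are hypothetical placeholders for real computation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: cap the number of runnable CPU-bound threads at the processor count.
public final class CpuBoundPool {

    // CPU-bound stand-in task: sum of i*i over [from, to).
    static long work(long from, long to) {
        long sum = 0;
        for (long i = from; i < to; i++) {
            sum += i * i;
        }
        return sum;
    }

    // Computes the sum of i*i for 0 <= i < n using one pool thread per CPU.
    public static long sumOfSquares(long n) {
        int cpus = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cpus); // at most 'cpus' workers
        try {
            int chunks = cpus * 4;                  // more tasks than threads; extras wait in the queue
            long step = (n + chunks - 1) / chunks;  // ceiling division so every i is covered
            List<Future<Long>> parts = new ArrayList<>();
            for (long start = 0; start < n; start += step) {
                final long from = start, to = Math.min(n, start + step);
                parts.add(pool.submit(() -> work(from, to)));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Threads that spend most of their time blocked, like the socket and disk-I/O waiters described above, don't compete for a CPU while they wait, so a pool for that kind of work can reasonably be larger than the processor count.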
On the other hand, it's sometimes worthwhile to use multiple threads as an organizational tool, provided that you have sufficient mastery of multithreaded-programming techniques that you don't cause problems simply by introducing threading into the mix. It's not possible to program naively in a multithreaded environment.
Q: While your code examples are excellent, one of the restrictions on their use is that their source isn't redistributed, which has caused some confusion amongst readers. Can your code be used in open-source projects?
A: One concern that I do have is that I want the source code distributed from my web site rather than somewhere else; that way I can keep it up to date, fix bugs, and generally monitor the state of the code. There's no problem using the code in open-source software, but I'd prefer for my portion of the source to be downloaded from my web site rather than being bundled into a CD-ROM or .zip distribution or equivalent (and I'd prefer changes to the code to be run through me so that I can keep the implementation coherent). Now that web distribution is so commonplace, this desire on my part doesn't seem so awful to me.
Q: Looking to the future, where do you see Java heading? Is there a particularly dominant technology (e.g. J2EE, CORBA, Jini) that you feel will change the way we look at Java?
A: That's quite a question. I'm hoping that Java will head more in the direction of OO systems than away from it. A lot of Java is very procedural in structure. The threading model is a case in point; it's not in the least bit OO. (I talk about this issue quite a bit in "Taming Java Threads.") EJB is also pretty miserable as it stands now. The separation of the session bean and the entity bean is very procedural, not to mention the fact that people don't leave the entity beans on the server, as was, I believe, the intent of the designers, but ship the things around.
I suppose what I just did in the last paragraph was to talk myself into believing that one of the technologies that you mentioned is indeed more important than the others---Jini. I agree with Don Norman (who wrote "The Design of Everyday Things" and "The Invisible Computer," both good books), who believes that computers as we know them will gradually disappear in favor of smart appliances, and Jini is the enabling technology for a smart appliance.
In the original telephone systems, the onus of getting connected to the person you wanted to talk to was entirely on the end user of the system. All you had were party lines. Eventually, someone came up with the idea of a central switchboard, which is basically where we are now with server-based architectures. One of the objections to widespread use of the telephone was that it was impossible because everybody would have to be an operator. That's exactly what happened, though: when we dial a phone number, we are acting as an operator. The reason I bring this up is that there will come a time, I think, when everybody will be a programmer. Not in the sense of writing programs in a language like Java, but in the sense of being able to communicate to the machine what you want it to do. This communication, however, can be done effectively only through a specialized UI, in the sense of a piece of gear specialized for performing a single task. These pieces of gear will, of course, need to talk to each other.
I'll give you an example:
Probably the most usable complex computer system that I use daily is my car. In fact, the UI is so good that I'm not even aware that I'm using a computer. But I am. With the exception of the steering and the emergency backup system on the brakes, there's not a single mechanical control in the car that is physically connected to the thing that it controls. The controls literally comprise a mechanical user interface to a computer, which is actually running the automobile. In fact, the car isn't even a single computer. It's literally a network of computers distributed throughout the chassis, actively communicating with each other.
Now consider the notion of word processing. I imagine that eventually I'll do my writing on a piece of intelligent "paper" that I can fold up and put in my pocket. This "paper" will let me correct what I write on it (perhaps using pen-style idioms, perhaps with some other metaphor), and will internally store many virtual pages. For the paper to be useful, though, it needs to be able to plug into a network of similarly specialized devices and talk to them. I might want to send a piece of virtual paper with a shopping list on it to a supermarket.
Jini, of course, takes care of only the inter-device communication part of this puzzle, but that's a pretty big part. Eventually, I imagine, similar enabling technologies will emerge to take care of the other parts of the puzzle. Java, because of its platform independence and vendor neutrality, seems like a good vehicle for this technology.
Q: Well, Allen, we appreciate that look into the world of Java and multi-threading. For readers who want to know more, Allen's book is called "Taming Java Threads", and is published by Apress.