Garbage In/Garbage Out

Lots of programmers have moved from languages that mostly don’t do Garbage Collection to languages that mostly do. In fact, I’m probably a latecomer to using it seriously. Sure, I’ve used a bit of Java on the side over the past few years, enough to be dangerous at least. But I haven’t used it enough to really care how the GC was working, or even to notice bugs that the GC was masking for me.

In the good old C++ days, every major programming effort that I’ve been involved with employed lots of memory allocator debugging techniques. We’d use macros for malloc/free, override new/delete, use Purify, zero out memory when it’s deallocated, create safety zones on each side of the buffers, etc. After you’d done it for a while, these techniques served you pretty well, and with very little effort you could debug all your memory usage patterns.

Now, fast forward to the land of Garbage Collection. With the language naturally figuring out what you intended to free and not free, you shouldn’t need any of these tools, right? Well, sort of. So far, in my short experience with GC’d languages, it seems pretty common that you need to reference *something* that isn’t written in the GC’d language. For example, Java calling out to C++. In this case, you are passing objects back and forth. Sometimes pointers, sometimes not. But either way, you’ve got objects that are GC’d holding references to objects that are not going to be GC’d. Unless you have a perfectly neat little program that can be 100% Java, you may run into this. And debugging it is a pain!

Why is it hard to debug? Well, in C and C++, you can employ all sorts of tricks to allocate and deallocate memory differently. But in the GC’d world, once you drop your references to the object, it’s going to get cleaned up eventually. And you don’t know when! When does the GC run? When does it not run? There’s not much you can do to control it.

Finally, I found one trick which helped a bit. That was to create a simple thread that sits in the background (development mode only) and kicks off a GC run every second or so. This way, if I’ve got some dangling reference somewhere, the GC will collect the object, and I’ll notice the bug a *lot* sooner than I would have otherwise.
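
Here is a rough sketch of what such a background thread could look like in Java. It is not the exact code I used; the class name `GcPoker`, the method `start`, and the thread name `gc-poker` are made up for illustration. Keep in mind that `System.gc()` is only a request to the JVM, so a collection is not guaranteed on every call, and some JVM configurations ignore explicit GC requests entirely.

```java
// Hypothetical development-only helper: periodically asks the JVM to collect.
public final class GcPoker {
    private GcPoker() {}

    /**
     * Starts a daemon thread that requests a GC run every intervalMillis
     * milliseconds. Interrupt the returned thread to stop it.
     */
    public static Thread start(long intervalMillis) {
        Thread t = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    System.gc();                 // a hint to the JVM, not a guarantee
                    Thread.sleep(intervalMillis);
                }
            } catch (InterruptedException e) {
                // Interrupted while sleeping: fall through and let the thread end.
            }
        }, "gc-poker");
        t.setDaemon(true);                       // never keeps the JVM alive on its own
        t.start();
        return t;
    }
}
```

In a development build you would call something like `GcPoker.start(1000)` once at startup and leave it out of production builds, since forcing collections this often is likely to hurt performance.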

Anyway, this probably isn’t interesting to most folks, but I found it an interesting problem. I like the benefits of GC and not having to worry about freeing memory. But my stodgy old C++ side really likes understanding exactly when my objects are coming and going. Maybe I’m a control freak.