Executor: Executed

A section on our big office whiteboard has this mysterious series of markings on it:

Last bear trap date

How did it get there? What does it mean?

Crossposted from https://technology.lmax.com

The bear trap

Perhaps we would like to check a queue for new work each minute. Perhaps there are a series of metrics we would like to scrape and publish each second. In java land, the natural solution to this is to reach for java.util.concurrent.ScheduledExecutorService; more specifically, its scheduleAtFixedRate and scheduleWithFixedDelay methods.

interface ScheduledExecutorService {
  ScheduledFuture<?> scheduleAtFixedRate(
          Runnable command,
          long initialDelay,
          long period,
          TimeUnit unit);

  ScheduledFuture<?> scheduleWithFixedDelay(
          Runnable command,
          long initialDelay,    
          long delay,
          TimeUnit unit);
}

Assuming we don’t overload the underlying executor, this is a cheap, reliable and well understood way to achieve this goal. Unfortunately though, a bear trap awaits. People have known about this bear trap for ages but, as we see from our whiteboard picture, we’re still falling into it (I suspect that java 8’s method reference sugar is partially responsible, but that’s a different story).

What is the trap? Well, hidden in plain sight in the javadoc of those two methods, we find the following:

The sequence of task executions continues indefinitely until one of the following exceptional completions occur:

  The task is explicitly cancelled via the returned future.
  The executor terminates, also resulting in task cancellation.
  An execution of the task throws an exception. In this case calling get on the returned future will throw ExecutionException, holding the exception as its cause.

Subsequent executions are suppressed.

The third clause is the trap - if the task passed in ever throws, that’s the last time our task will be run. Worse - if nothing ever invokes get on the returned ScheduledFuture, we’ll never see the problematic stack trace, either.

To recap, we’ve got two errors:

  • Things stop working
  • We don’t get told that they’ve stopped

Typically the way we tend to find that someone’s fallen into a bear trap is that a rare but possible condition occurs and breaks their program. At some point later, we notice the application is broken, but no errors have been logged. The application’s become a ghost town - all of the code is there, but there is no longer a thread of execution visiting it. Without the stack trace of the original error, knowing how to fix it is going to involve unpleasant amounts of imagination.

Notably this isn’t just a problem with the scheduler based executors - if we execute a Runnable on a plain old Executor and it throws, it will also not work and not tell us why. This is not great!

Finally: Why is it on the board? To remind me that I still haven’t eradicated this bug, and to do something about it.

Is the API wrong?

There’s been a lot of vitriol directed at this interface over the years. My personal view has veered from the extreme of “someone is wrong on the internet” to “I can sort of see how we ended up here”.

Here’s one possible story. Java is exceptional (ho ho). It leans heavily on exceptions as a communication mechanism, however much we might want it not to. Let’s say our Runnable has encountered a problem which means it can never again succeed - it can communicate this to the executor above merely by throwing an exception. How convenient! If it encounters a transient problem that triggers an Exception, well, maybe it should catch that Exception itself! In the case where ScheduledExecutorService catches it for us, we either lose the ‘convenient’ stop mechanism altogether, or have to specify and then have our user pass in more equipment to decide which Exceptions truly are exceptional.

I’m not totally convinced by that argument - it feels like it would perhaps be better if the interface made it clearer that some Exception danger is on the cards, rather than leaving that behaviour implicit enough to require that bit of documentation. I hope that if I were the author now, writing that in the docs (rather than in the types) would make me reconsider.

I feel the same way about this API as I do about much of the rest of java - that it’s perhaps half a level less opinionated than I’d have liked. Having discovered this, wrapping it up in something that matches one’s own tastes is the way forward. That’s perhaps a good thing if you think taste is personal (java === freedom of expression) or a meh thing if you think your taste is universally better than giving everyone the option to screw up. No guesses as to which side I’m on here!

Conclusion: if being too unopinionated is a fault, this code exhibits that.

This is not the fix

We (eventually; there’s a backlog of historical usages to be converted over) want to ban the direct import of java.util.concurrent.Executors anywhere outside of our own wrapper. Here was our first attempt at this.

package com.lmax.commons.threading;

public final class Executors 
{
    public static ScheduledExecutorService newSingleThreadScheduledExecutor(final String threadName) 
    {
        final CatchAllThreadFactory threadFactory = new CatchAllThreadFactory(threadName);
        return java.util.concurrent.Executors.newSingleThreadScheduledExecutor(threadFactory);
    }

    private static final class CatchAllThreadFactory implements java.util.concurrent.ThreadFactory
    {
        private final String threadName;
        private final ThreadFactory delegate;
        /* constructors elided */

        @Override
        public Thread newThread(final Runnable r)
        {
            final Thread thread = delegate.newThread(r);

            thread.setName(threadName);
            thread.setUncaughtExceptionHandler((t, e) -> LOGGER.error("Uncaught exception on thread " + t.getName(), e));

            return thread;
        }
    }
}

This works for Executor::execute, but the uncaught exception handler isn’t invoked in either of the scheduler methods!

To get the behaviour we want we can’t just parametrise the executor; that really is an indictment of the API.

Now we’re faced with a borderline interesting decision to make. We’re going to wrap up the ScheduledExecutorService instance, but should our wrapper implement ScheduledExecutorService? We chose not to - while we keep the same type signatures, our behaviour is going to be subtly different to what the javadoc suggests. We want to make it obvious that we’re different.

Evicting the bear trap

(…and the whole ScheduledExecutorService interface, for the caller!)

public class ScheduledExecutor {
    private final ScheduledExecutorService scheduledExecutorService;
    /* constructor elided */

    public ScheduledFuture<?> scheduleAtFixedRate(
            final Runnable command,
            final long initialDelay,
            final long period,
            final TimeUnit unit) {
        return scheduledExecutorService.scheduleAtFixedRate(
                () -> runWithExceptionHandling(command),
                initialDelay,
                period,
                unit);
    }

    private static void runWithExceptionHandling(final Runnable command)
    {
        try
        {
            command.run();
        }
        catch (final Exception e)
        {
            LOGGER.error("Error while running scheduled job", e);
        }
    }
}

And now we have only…155 problems to fix - now I remember why I wrote it on the board rather than fixing it straight away!

Oof, that’s a lot of code

Absent an automated refactoring tool, and possessed with the desire to know exactly how many of those usages could potentially set off the bear trap, we set off on an exciting tour of scheduling across our monorepo. A day or so later, we’re down to 0 imports, and we’ve helped 36 places not silently fail at some point in the future. Not a bad haul, all told.

One spotbugs rule later (and a few extra instances that managed to use a scheduler without the import) and we can finally clean that bit of the whiteboard.

public class ExecutorBearTrapDetector extends IllegalInvocationDetector
{
    public ExecutorBearTrapDetector(final BugReporter bugReporter)
    {
        super(
                bugReporter,
                new IllegalMethod(Executors.class, "newScheduledThreadPool", "EXECUTOR_BEAR_TRAP"),
                new IllegalMethod(Executors.class, "newSingleThreadScheduledExecutor", "EXECUTOR_BEAR_TRAP"));
    }
}

Bears are now safe to roam (in the woods, presumably)

Ok, I admit, I couldn’t quite bring myself to clean it off.