Thread.sleep(0) is not free

In this short blog post, I want to clear up a potential misconception about java.lang.Thread.sleep: calling Thread.sleep(0) is not free. True, the official documentation would allow a fast path for millis == 0 that only checks the interrupted status of the current thread and throws an InterruptedException if it is set. The actual implementation, however, first calls into the native method Thread.sleepNanos0(long nanos), which is defined in Thread.c and implemented by JVM_SleepNanos. The native code then checks for interrupts and does contain a special case for nanos == 0, but that special case looks like this:

  if (nanos == 0) {
    os::naked_yield();
  } else {
    // ...
  }

This means that calling Thread.sleep(0) is more or less the same as checking for interrupts and then calling Thread.yield(). On Linux, os::naked_yield() is implemented by the POSIX function sched_yield, whose documentation clearly states:

Avoid calling sched_yield() unnecessarily or inappropriately (e.g., when resources needed by other schedulable threads are still held by the caller), since doing so will result in unnecessary context switches, which will degrade system performance.
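
To make that equivalence concrete, here is a minimal sketch of what Thread.sleep(0) effectively boils down to, based on the native snippet above. The class name Sleep0Equivalent and the helper checkInterruptAndYield are made up for illustration and are not part of the JDK. The sketch also assumes that Thread.yield() ends up in the same os::naked_yield() call, which should hold on HotSpot but is not promised by the Javadoc.

```java
public class Sleep0Equivalent {

    /**
     * Hypothetical helper (not a JDK method): approximates the observable
     * behaviour of Thread.sleep(0) as described by the native code above.
     */
    static void checkInterruptAndYield() throws InterruptedException {
        // Thread.interrupted() reads AND clears the interrupt flag,
        // just like sleep does when it throws.
        if (Thread.interrupted()) {
            throw new InterruptedException("sleep interrupted");
        }
        // A hint to the scheduler that this thread is willing to give up the CPU.
        Thread.yield();
    }

    public static void main(String[] args) throws InterruptedException {
        // Not interrupted: returns after (at most) a yield.
        checkInterruptAndYield();

        // Interrupted: throws, and the flag is cleared afterwards.
        Thread.currentThread().interrupt();
        try {
            checkInterruptAndYield();
            System.out.println("not reached");
        } catch (InterruptedException e) {
            System.out.println("interrupted, flag now: "
                    + Thread.currentThread().isInterrupted());
        }
    }
}
```

The point of spelling it out like this is that the expensive part, the yield, is plainly visible, whereas a sleep(0) call hides it.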

Indeed, with my local setup, calling Thread.sleep(0) is roughly as expensive as calling ThreadLocalRandom.nextBytes with byte[128], according to this JMH benchmark. Moreover, Thread.sleep(0) is especially expensive when you need it least, that is, when the CPU is already overloaded, since yielding is more likely to result in context switches if there are many threads starving for CPU. Let me demonstrate this using a small JMH benchmark:

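import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

@Fork(value = 1)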
@Warmup(iterations = 3, time = 100, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 5, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.Throughput)
@State(Scope.Benchmark)
public class ThreadSleep0Benchmark {
    @Param("85")
    private int cpuTokens;

    @Benchmark
    public void burnCpu() {
        Blackhole.consumeCPU(cpuTokens);
    }

    @Benchmark
    public void burnCpuAndSleep0() throws InterruptedException {
        Thread.sleep(0);
        Blackhole.consumeCPU(cpuTokens);
    }
}

It contains two methods that both burn the same amount of CPU, but one of them additionally calls Thread.sleep(0). I ran both benchmarks on my laptop with 1, 4, 10 and 20 threads. Microsoft Copilot was kind enough to assemble this graph visualizing the results:

image showing the impact of sleep0 depending on thread count

As you can see, burnCpu scales more or less linearly until it plateaus at 10 threads. This makes perfect sense, since my CPU has 4 performance and 6 efficiency cores.

burnCpuAndSleep0, though, exhibits very different behaviour. With 10 threads, the throughput is only slightly better than with 4 threads, and with 20 threads, it’s worse than with 1 thread.

To cut a long story short:

If you have a piece of code that, under certain conditions, uses sleep to back off, for example after an error or in an overload situation, replace

int delay = allGood ? 0 : waitShortly;
Thread.sleep(delay);

with

if (!allGood) {
    Thread.sleep(waitShortly);
}

or use TimeUnit.sleep, which contains a similar, clearly documented fast path. If you really rely on the undocumented yield implied by Thread.sleep(0), make that explicit by calling Thread.yield() directly.
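
One side effect of skipping the call is worth keeping in mind: the interrupt check is skipped too. The following small sketch makes the difference observable. The class SleepZeroFastPath and its helper methods are hypothetical names chosen for this example. It relies on TimeUnit.sleep doing nothing at all for a timeout of zero or less, as its Javadoc describes.

```java
import java.util.concurrent.TimeUnit;

public class SleepZeroFastPath {

    // Hypothetical helper mirroring the "if (!allGood)" pattern above.
    static void backOff(boolean allGood, long waitMillis) throws InterruptedException {
        if (!allGood) {
            Thread.sleep(waitMillis);
        }
    }

    // Returns true if TimeUnit.MILLISECONDS.sleep(0) threw InterruptedException.
    static boolean timeUnitSleepZeroThrows() {
        try {
            TimeUnit.MILLISECONDS.sleep(0);
            return false;
        } catch (InterruptedException e) {
            return true;
        }
    }

    // Returns true if Thread.sleep(0) threw InterruptedException.
    static boolean threadSleepZeroThrows() {
        try {
            Thread.sleep(0);
            return false;
        } catch (InterruptedException e) {
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // All good: returns without ever calling Thread.sleep.
        backOff(true, 50);

        // Set the interrupt flag, then compare the two zero-length sleeps.
        Thread.currentThread().interrupt();
        System.out.println("TimeUnit.sleep(0) threw: " + timeUnitSleepZeroThrows());
        System.out.println("Thread.sleep(0) threw:   " + threadSleepZeroThrows());
    }
}
```

If your code depends on sleep to notice interrupts even when there is nothing to wait for, add an explicit Thread.interrupted() check next to the guarded sleep.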