Thread.sleep(0) is not free
In this short blog post, I want to clear up a common misconception about java.lang.Thread.sleep.
Calling Thread.sleep(0) is not free. True, the official documentation would allow a fast path for millis == 0
that only checks the interrupted status of the current thread to potentially throw an InterruptedException.
The actual implementation, however, first calls the native method Thread.sleepNanos0(long nanos),
which is declared in Thread.c and implemented by JVM_SleepNanos.
The native code then checks for interrupts and does indeed contain a special case for nanos == 0.
However, it looks like this:
if (nanos == 0) {
    os::naked_yield();
} else {
    // ...
}
This means that calling Thread.sleep(0) is more or less the same as checking for interrupts and then calling Thread.yield().
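The interrupt check can be observed from plain Java. The following is a minimal sketch; the class and method names are hypothetical and only serve the illustration. It sets the interrupted status of the current thread and then calls Thread.sleep(0), which, per the Javadoc of Thread.sleep, should throw an InterruptedException and clear the interrupted status.

```java
// Hypothetical demo class; name and structure are illustrative only.
class SleepZeroInterruptDemo {

    /**
     * Interrupts the current thread, then calls Thread.sleep(0).
     * Returns true if the call threw an InterruptedException.
     */
    static boolean sleepZeroThrowsWhenInterrupted() {
        Thread.currentThread().interrupt(); // set the interrupted status
        try {
            Thread.sleep(0);
            // No exception: clear the status again so the caller is not affected.
            Thread.interrupted();
            return false;
        } catch (InterruptedException e) {
            // The interrupted status is cleared when the exception is thrown.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("sleep(0) threw InterruptedException: "
                + sleepZeroThrowsWhenInterrupted());
        System.out.println("interrupted status afterwards: "
                + Thread.currentThread().isInterrupted());
    }
}
```

The yield itself cannot be observed this way; the sketch only shows that Thread.sleep(0) is not a no-op with respect to interrupts.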
On Linux, os::naked_yield() is implemented via the POSIX function sched_yield, whose documentation clearly states:
Avoid calling sched_yield() unnecessarily or inappropriately (e.g., when resources needed by other schedulable threads are still held by the caller), since doing so will result in unnecessary context switches, which will degrade system performance.
Indeed, on my local setup, calling Thread.sleep(0) is roughly as expensive as calling
ThreadLocalRandom.nextBytes with a byte[128], according to this JMH benchmark.
Moreover, Thread.sleep(0) is especially expensive when you need it the least, that is, when the CPU is already overloaded,
since yielding is more likely to result in context switches if many threads are starving for CPU.
Let me demonstrate this using a small JMH benchmark:
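import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@Fork(value = 1)
@Warmup(iterations = 3, time = 100, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 5, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.Throughput)
@State(Scope.Benchmark)
public class ThreadSleep0Benchmark {

    // Amount of simulated CPU work per invocation, in JMH "tokens".
    @Param("85")
    private int cpuTokens;

    // Baseline: burn CPU only.
    @Benchmark
    public void burnCpu() {
        Blackhole.consumeCPU(cpuTokens);
    }

    // Same CPU work, preceded by Thread.sleep(0).
    @Benchmark
    public void burnCpuAndSleep0() throws InterruptedException {
        Thread.sleep(0);
        Blackhole.consumeCPU(cpuTokens);
    }
}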
It contains two methods that burn the same amount of CPU; one of them additionally calls Thread.sleep(0).
I ran both benchmarks on my laptop with 1, 4, 10, and 20 threads. Microsoft Copilot was kind enough to assemble this graph for me,
visualizing the results:
As you can see, burnCpu scales more or less linearly until it plateaus at 10 threads. This makes perfect sense, since my
CPU has 4 performance and 6 efficiency cores.
burnCpuAndSleep0, though, exhibits very different behaviour. With 10 threads, the throughput is only slightly better than with
4 threads, and with 20 threads, it’s worse than with 1 thread.
To cut a long story short:
If you have a piece of performance-critical code that, under certain very unlikely conditions, uses sleep to
back off, for example after an error or in an overload situation, replace
int delay = allGood ? 0 : waitShortly;
Thread.sleep(delay);
with
if (!allGood) {
    Thread.sleep(waitShortly);
}
or use TimeUnit.sleep, which contains a similar, clearly documented fast path. If you really rely on the undocumented yield
implied by Thread.sleep(0), make that explicit by calling Thread.yield() directly.
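The two replacements can be wrapped in small helpers. This is a minimal sketch under stated assumptions: the class BackOff, its method names, and the parameters allGood and waitMillis are hypothetical and merely mirror the snippets above. The second helper relies on the documented behaviour of TimeUnit.sleep, which does not sleep at all for timeouts less than or equal to zero.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper class; names are illustrative only.
class BackOff {

    /** Guarded variant: Thread.sleep is only called when a back-off is requested. */
    static void backOffIfNeeded(boolean allGood, long waitMillis) throws InterruptedException {
        if (!allGood) {
            Thread.sleep(waitMillis);
        }
    }

    /** TimeUnit variant: TimeUnit.sleep does not sleep for timeouts <= 0. */
    static void backOffWithTimeUnit(boolean allGood, long waitMillis) throws InterruptedException {
        long delay = allGood ? 0 : waitMillis;
        TimeUnit.MILLISECONDS.sleep(delay);
    }

    public static void main(String[] args) throws InterruptedException {
        backOffIfNeeded(true, 10);      // fast path, no sleep and no yield
        backOffWithTimeUnit(true, 10);  // fast path, no sleep and no yield
        backOffIfNeeded(false, 10);     // sleeps for roughly 10 ms
        System.out.println("done");
    }
}
```

Note that, unlike Thread.sleep(0), neither helper checks the interrupted status on the fast path. Code that depends on that check should test it explicitly, for example with Thread.interrupted().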