What the Hell is getOpaque in Java
In this blog post, which can be seen as a follow-up to my last post about acquire/release semantics, I want to shine some light on getOpaque and setOpaque.
I’ll show how they are defined and what that means in practice, as well as how getOpaque and setOpaque relate to the weaker getPlain and setPlain and the stronger getAcquire and setRelease.
Javadocs and their Interpretation
Unfortunately, the Javadocs of getOpaque and setOpaque are somewhat vague, stating
public final Object getOpaque(Object… args)
Returns the value of a variable, accessed in program order, but with no assurance of memory ordering effects with respect to other threads.
public final void setOpaque(Object… args)
Sets the value of a variable to the newValue, in program order, but with no assurance of memory ordering effects with respect to other threads.
Rephrased, this means that the runtime or hardware must not reorder opaque operations with other opaque operations on the same variable. It is, however, perfectly legal to reorder opaque operations with opaque operations on different variables.
To illustrate what that means, I’ve put together a small JCStress test here:
import java.util.concurrent.atomic.AtomicInteger;

import org.openjdk.jcstress.annotations.*;
import org.openjdk.jcstress.infra.results.III_Result;

@JCStressTest
@Description("Demonstrates the behaviour of opaque mode")
@Outcome(id = "0, 0, 0", expect = Expect.ACCEPTABLE)
@Outcome(id = "0, 0, 1", expect = Expect.ACCEPTABLE)
@Outcome(id = "0, 0, 2", expect = Expect.ACCEPTABLE)
@Outcome(id = "0, 1, 0", expect = Expect.ACCEPTABLE)
@Outcome(id = "0, 1, 1", expect = Expect.ACCEPTABLE)
@Outcome(id = "0, 1, 2", expect = Expect.ACCEPTABLE)
@Outcome(id = "1, 0, 0", expect = Expect.FORBIDDEN, desc = "b=1, then b=0")
@Outcome(id = "1, 0, 1", expect = Expect.ACCEPTABLE_INTERESTING, desc = "b=1, then a=0")
@Outcome(id = "1, 0, 2", expect = Expect.ACCEPTABLE_INTERESTING, desc = "b=1, then a=0")
@Outcome(id = "1, 1, 0", expect = Expect.FORBIDDEN, desc = "b=1, then b=0")
@Outcome(id = "1, 1, 1", expect = Expect.ACCEPTABLE)
@Outcome(id = "1, 1, 2", expect = Expect.ACCEPTABLE)
@Outcome(id = "2, 0, 0", expect = Expect.FORBIDDEN, desc = "b=2, then b=0")
@Outcome(id = "2, 0, 1", expect = Expect.FORBIDDEN, desc = "b=2, then b=1")
@Outcome(id = "2, 0, 2", expect = Expect.ACCEPTABLE_INTERESTING, desc = "b=2, then a=0")
@Outcome(id = "2, 1, 0", expect = Expect.FORBIDDEN, desc = "b=2, then b=0")
@Outcome(id = "2, 1, 1", expect = Expect.FORBIDDEN, desc = "b=2, then b=1")
@Outcome(id = "2, 1, 2", expect = Expect.ACCEPTABLE)
@State
public class AtomicIntegerSetGetOpaqueTest {
    final AtomicInteger a = new AtomicInteger();
    final AtomicInteger b = new AtomicInteger();

    @Actor
    public void actor1() {
        a.setOpaque(1);
        b.setOpaque(1);
        b.setOpaque(2);
    }

    @Actor
    public void actor2(III_Result r) {
        r.r1 = b.getOpaque();
        r.r2 = a.getOpaque();
        r.r3 = b.getOpaque();
    }
}
The test contains two atomics, a and b, and two actors that execute concurrently. Note that actor 1 writes a first, but actor 2 reads b first. The list of @Outcome annotations attached to the test class enumerates all imaginable 3 * 2 * 3 = 18 outcomes and qualifies them as acceptable, interesting, or forbidden. To execute this test after checking out this branch, you need to run
$ mvn package -pl tests-custom -am -DskipTests
$ java -jar tests-custom/target/jcstress.jar -v -t AtomicIntegerSetGetOpaqueTest
Executing this on a Google Axion ARM VM, I get
Results across all configurations:
RESULT SAMPLES FREQ EXPECT DESCRIPTION
0, 0, 0 467,556,008 48.46% Acceptable
0, 0, 1 54,242 <0.01% Acceptable
0, 0, 2 710,132 0.07% Acceptable
0, 1, 0 801,328 0.08% Acceptable
0, 1, 1 181,784 0.02% Acceptable
0, 1, 2 2,070,255 0.21% Acceptable
1, 0, 0 0 0.00% Forbidden b=1, then b=0
1, 0, 1 3,270 <0.01% Interesting b=1, then a=0
1, 0, 2 12 <0.01% Interesting b=1, then a=0
1, 1, 0 0 0.00% Forbidden b=1, then b=0
1, 1, 1 149,163 0.02% Acceptable
1, 1, 2 346,645 0.04% Acceptable
2, 0, 0 0 0.00% Forbidden b=2, then b=0
2, 0, 1 0 0.00% Forbidden b=2, then b=1
2, 0, 2 339,578 0.04% Interesting b=2, then a=0
2, 1, 0 0 0.00% Forbidden b=2, then b=0
2, 1, 1 0 0.00% Forbidden b=2, then b=1
2, 1, 2 492,645,876 51.06% Acceptable
Note the “interesting” results, where the writes to b are observed before the writes to a, though a is updated before b in the Java code. This is legal, since opaque mode gives “no assurance of memory ordering effects with respect to other threads”. To rule out the “interesting” cases, you can either switch to the more strongly ordered X86 architecture (see Chapter 10.2 in Volume 3 of the Combined Volume Set of the Intel® 64 and IA-32 Architectures Software Developer’s Manuals for details), or upgrade to acquire-release mode (see my last blog post), like this:
@Actor
public void actor1() {
    a.setRelease(1);
    b.setRelease(1);
    b.setRelease(2);
}

@Actor
public void actor2(III_Result r) {
    r.r1 = b.getAcquire();
    r.r2 = a.getAcquire();
    r.r3 = b.getAcquire();
}
Additional Documentation
Unfortunately, I could not find any additional official documentation about getOpaque and setOpaque apart from the Javadocs. There is, however, a very well written and comprehensive guide about JDK 9 Memory Order Modes by Doug Lea, the author of most of the classes in java.util.concurrent, that contains a paragraph on “opaque mode”. According to this document, opaque mode is usable in the same contexts as the C++ atomic memory_order_relaxed, which is identical to relaxed ordering in Rust. Opaque mode is also mentioned and explained in a very interesting workshop by Aleksey Shipilev about Java Concurrency Stress (JCStress).
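By the way, the Javadoc snippets quoted above come from VarHandle, which makes the same opaque access modes available on arbitrary fields, not just on the Atomic* classes. A minimal sketch, with class and field names of my own choosing:

import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class OpaqueCounter {
    int value; // accessed only through the VarHandle below

    static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup()
                    .findVarHandle(OpaqueCounter.class, "value", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void set(int newValue) {
        VALUE.setOpaque(this, newValue); // opaque write
    }

    int get() {
        return (int) VALUE.getOpaque(this); // opaque read
    }
}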
Use Cases for getOpaque and setOpaque
Opaque mode can be used whenever you want to share an update to a single value with other threads. This includes primitive types like boolean, int, and long, as well as references to immutable objects.
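For instance, publishing a reference to an immutable object like a String through opaque mode is safe, because final field semantics guarantee that whoever sees the reference also sees its final fields fully initialized. A minimal sketch (the names are mine):

final AtomicReference<String> status = new AtomicReference<>("starting");

void publishStatus(String newStatus) {
    status.setOpaque(newStatus); // safe: String stores its contents in a final field
}

String readStatus() {
    return status.getOpaque();
}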
Publishing references to mutable objects using getOpaque and setOpaque, however, is unsafe, since reading a reference with getOpaque that has been published with setOpaque does not establish a happens-before relationship with writes to non-final fields before the publication. Let me emphasize this essential point with another JCStress test:
import java.util.concurrent.atomic.AtomicReference;

import org.openjdk.jcstress.annotations.*;
import org.openjdk.jcstress.infra.results.I_Result;

@JCStressTest
@Description("Shows why opaque mode must not be used to publish objects with non-final fields")
@Outcome(id = "42", expect = Expect.ACCEPTABLE)
@Outcome(id = "0", expect = Expect.ACCEPTABLE_INTERESTING, desc = "observed partially constructed object")
@State
public class AtomicReferenceSetGetOpaqueTest {
    static class Holder {
        int x = 42;
    }

    final AtomicReference<Holder> ref = new AtomicReference<>();

    @Actor
    public void actor1() {
        ref.setOpaque(new Holder());
    }

    @Actor
    public void actor2(I_Result r) {
        while (true) {
            Holder h = ref.getOpaque();
            if (h == null) {
                Thread.onSpinWait();
                continue;
            }
            r.r1 = h.x;
            return;
        }
    }
}
Here one actor publishes a reference to a Holder object that contains a non-final int, initialized to 42 in the constructor. The other actor waits until the reference shows up and then reads the int. Executing this on the same Google Axion ARM VM I mentioned before via
$ mvn package -pl tests-custom -am -DskipTests
$ java -jar tests-custom/target/jcstress.jar -v -t AtomicReferenceSetGetOpaqueTest
gives me
Results across all configurations:
RESULT SAMPLES FREQ EXPECT DESCRIPTION
0 51,636 <0.01% Interesting observed partially constructed object
42 1,081,532,417 100.00% Acceptable
As you can see, even in this basic example, there is a tiny probability of things going south when using opaque mode to publish object references. Getting rid of the “interesting” outcome requires any of
- sticking to X86
- upgrading to at least acquire-release mode
- making x final, as sketched below.
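For the last option, a sketch of the fixed Holder; with x declared final, final field semantics guarantee that every thread that sees the published reference also reads x == 42:

static class Holder {
    final int x = 42; // final: visible to any thread that sees the reference
}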
Having said all that, let’s come back to valid use cases.
Broadcasting a Stop Signal
Let’s assume that you have one or more worker threads running code like
while (!stopSignalReceived()) {
    doSomeWork();
}
and one thread that at some point should stop the workers. Then opaque mode is an option, like in
final AtomicBoolean stop = new AtomicBoolean();

void startWorkerLoop() {
    while (!stop.getOpaque()) {
        doSomeWork();
    }
    out.println("Stopped");
}

void sendStopSignal() {
    stop.setOpaque(true);
}
Does it buy you something over just using a volatile boolean, or AtomicBoolean.get and AtomicBoolean.set? Probably not. On X86, reads are compiled to a single mov instruction, regardless of their type. On ARM, plain and opaque reads are compiled to ldr instructions, whereas acquire and volatile reads are implemented using ldar instructions. To gain a better understanding of the performance impact implied by the different access modes, I assembled this JMH benchmark, which calculates Fibonacci numbers while checking a stop flag.
The benchmark relies on the memory ordering enum I introduced in my last blog post and looks like
private final AtomicBoolean stop = new AtomicBoolean();

@Param({"VOLATILE", "ACQUIRE_RELEASE", "OPAQUE"})
private MemoryOrdering memoryOrdering;

@Param({"1", "10"})
private int batchSize;

@Param("100000")
private int limit;

@Benchmark
public int fibTillStop() {
    final var mod = 1_000_000_007;
    var fib0 = 0;
    var fib1 = 1;
    for (int i = 0; !memoryOrdering.get(stop) && i < limit; i += batchSize) {
        for (int j = 0; j < batchSize; j++) {
            var fib2 = fib0 + fib1;
            if (fib2 >= mod) fib2 -= mod;
            fib0 = fib1;
            fib1 = fib2;
        }
    }
    return fib1;
}
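In case you skipped that post, here is a minimal sketch of what such a MemoryOrdering enum could look like (the actual enum from the last post may differ in detail):

enum MemoryOrdering {
    VOLATILE {
        @Override boolean get(AtomicBoolean flag) { return flag.get(); } // volatile read
    },
    ACQUIRE_RELEASE {
        @Override boolean get(AtomicBoolean flag) { return flag.getAcquire(); } // acquire read
    },
    OPAQUE {
        @Override boolean get(AtomicBoolean flag) { return flag.getOpaque(); } // opaque read
    };

    abstract boolean get(AtomicBoolean flag);
}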
Executing the benchmark on a Google Axion ARM c4a-highcpu-4 instance with a corretto-24.0.1 JVM, I get
A similar picture can be observed on X86. Therefore, if you want to improve the performance of a loop like the one above, using bigger batches rather than tweaking the memory ordering is the way to go.
If you read the last paragraphs carefully, you might wonder why we couldn’t use plain reads and writes, that is, plain mode, at least on X86 and ARM, to broadcast a stop signal. After all, the CPU instructions used for opaque access on ARM and X86 are ordinary loads and stores. However, CPU instructions only matter if they are executed, which might not be the case in plain mode, since the JIT will optimize the check for the stop flag away if it can prove that the loop body never modifies it. This proof typically succeeds if the loop body can be fully inlined. Let me illustrate this point with two examples:
static long brokenBroadcast() throws InterruptedException {
    var stop = new AtomicBoolean();
    var job = CompletableFuture.supplyAsync(() -> {
        var spins = 0L;
        while (!stop.getPlain()) {
            spins++;
        }
        return spins;
    });
    Thread.sleep(100);
    stop.setPlain(true);
    return job.join();
}

public static void main() throws InterruptedException {
    var spins = brokenBroadcast();
    out.printf("Done after %s spins%n", spins);
}
This program will almost certainly never terminate, because the check for the stop flag will be optimized away by the JIT compiler. This variation, however, probably will:
static long brokenButAccidentallyWorkingBroadcast() throws InterruptedException {
    var stop = new AtomicBoolean();
    var job = CompletableFuture.supplyAsync(() -> {
        var spins = 0L;
        while (!stop.getPlain()) {
            if (++spins % 1_000_000_000 == 0) {
                out.printf("%s spins already...%n", spins);
            }
        }
        return spins;
    });
    Thread.sleep(100);
    stop.setPlain(true);
    return job.join();
}

public static void main() throws InterruptedException {
    var spins = brokenButAccidentallyWorkingBroadcast();
    out.printf("Done after %s spins%n", spins);
}
For me this reliably prints
1000000000 spins already...
Done after 1000000000 spins
Am I always that lucky to terminate after exactly 1000000000 spins? Not quite: if you look into the generated assembly code (how to do this would be worth a blog post of its own; for the time being, please have a look at the Developers disassemble! Use Java and hsdis to see it all blog post and check out the -XX:CompileCommand=print option), you’ll see that luck isn’t involved at all, since what is actually executed is:
long run() {
    if (stop.getPlain()) {
        return goBackToInterpreter();
    }
    var spins = 0L;
    do {
        spins++;
        pollForSafePoint(); // <-- https://shipilev.net/jvm/anatomy-quarks/22-safepoint-polls/
    } while (spins % 1_000_000_000 != 0);
    return goBackToInterpreter();
}
Let’s summarize the important parts:
- The generated code does not check the stop flag in the loop at all.
- As soon as it hits the if block with the printf, it goes back to the interpreter.
The interpreter then invokes printf, and then continues execution by checking the stop flag, which at this point is already true. Thus, the loop terminates after exactly 1_000_000_000 iterations.
So isn’t the JIT going a bit over the top when simply removing the check for the stop flag? Maybe in this very particular case; normally, however, this kind of optimization is exactly what you want, and it is far more common than you might initially think. Let me give you an example: every time you iterate over an ArrayList using for (var elem : arrayList), you theoretically have to pay for
final void checkForComodification() {
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();
}
again and again. This would incur a significant performance penalty for tight loops if the JIT wasn’t allowed to optimize this check away as long as it can prove that the loop body doesn’t modify the list. The Java Memory Model has been specifically designed to allow these kinds of optimizations. The only reliable way to make sure that updates to a variable from one thread are eventually picked up by other threads without additional synchronization is to use opaque or a stronger memory ordering mode for both reads and writes.
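Applied to the brokenBroadcast example from above, this means switching both the read and the write to opaque mode. A minimal sketch of the fixed version:

static long workingBroadcast() throws InterruptedException {
    var stop = new AtomicBoolean();
    var job = CompletableFuture.supplyAsync(() -> {
        var spins = 0L;
        while (!stop.getOpaque()) { // opaque read: must not be hoisted out of the loop
            spins++;
        }
        return spins;
    });
    Thread.sleep(100);
    stop.setOpaque(true); // opaque write: eventually visible to the worker
    return job.join();
}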
Broadcasting Progress
Another legitimate use case for opaque mode is publishing progress information in a scenario where one thread performs some long-running task, and another thread monitors its progress. Simplified to its bare minimum, it could look like this:
final AtomicInteger progress = new AtomicInteger(0);

void workerThread() {
    while (!done()) {
        work();
        progress.setOpaque(progress.getPlain() + 1);
    }
}

void monitoringThread() throws InterruptedException {
    while (!done()) {
        out.printf("progress=%s%n", progress.getOpaque());
        Thread.sleep(10);
    }
}
When using plain writes, that is, progress.setPlain(progress.getPlain() + 1), the JIT might be tempted to “optimize” workerThread() into
void workerThread() {
    var progressInRegister = 0;
    while (!done()) {
        work();
        progressInRegister++;
    }
    progress.setPlain(progressInRegister);
}
which is probably not what you intended. Opaque mode, as well as any stronger mode, prevents this optimization.
Summary & Recommendation
Opaque mode can be used to share a single value between threads, but it is poorly documented, and it comes with absolutely no guarantees for other memory locations, which makes it too weak for many use cases. Even where it’s safe, sticking with volatile is probably the better choice.