Demystifying getAcquire and setRelease in Java

In this blog post I want to take a close look at the subtle difference between using

getAcquire with setRelease
and using getVolatile with setVolatile, or equivalently, volatile reads and writes.

I will start with the basic concepts, using simple examples. Then I want to discuss a classical mutual exclusion algorithm, and explain why it relies on volatile semantics, and acquire-release is not enough. Hopefully this will clarify the difference between these two concepts for you. It certainly did so for me.

getAcquire and setRelease

Before going into details, I want to avoid a potential source of confusion from the start. The reason why you don’t typically see the usage of getAcquire and setRelease in Java code, is that they are implied when acquiring/releasing locks, volatile reads/writes and by most operations available on the various atomic classes. The examples below would therefore also work if a volatile flag, or atomic get and set was used.

Having said that, let’s check the documentation: According to the Javadocs, getAcquire “returns the value of a variable, and ensures that subsequent loads and stores are not reordered before this access.” setRelease “sets the value of a variable to the newValue, and ensures that prior loads and stores are not reordered after this access.”

Thus, getAcquire acts like a one-way fence for all loads and stores that follow. I like to imagine it having thorns pointing down that prevent operations from being reordered before it.

This diagram shows getAcquire with thorns pointing downward. Operations can be reordered after it, but not before.

Likewise, setRelease acts as a one-way fence in the other direction as is illustrated below:

This diagram shows setRelease with thorns pointing upward. Operations can be reordered before it, but not after.

This means that getAcquire and setRelease, like volatile reads and writes, can be used to establish happens-before-relationships across thread boundaries: If thread A, after an elaborate procedure finally uses setRelease on a boolean done flag, all other threads that observe the flag to be true when reading it using getAcquire, can safely assume that A is indeed done. Translated to Java, this means

AtomicBoolean done = new AtomicBoolean();
State state = null;

void threadA() {
  state = initialze();
  done.setRelease(true); // (1)
}

void threadB() {
  if (done.getAcquire()) {
    // Can safely assume that state is initialized because a 
    // happens-before-relationship has been established with (1)
    assert state != null;
  }
}

Note that the following code is broken and is just waiting to fail when you least expect it, since no happens-before-relationship is established by plain reads and writes:

boolean done;
State state = null;

void threadA() {
  state = initialize();
  done = true;
}

void threadB() {
  if (done) {
    // This assertion can fail, since the compiler 
    // or the hardware might have reordered instructions
    assert state != null;
  }
}

Due to the strong memory model of the X86 architecture (see Chapter 10.2 in Volume 3 in the Combined Volume Set of Intel® 64 and IA-32 Architectures Software Developer’s Manuals) actually broken code like this has a high chance of working on X86 CPUs unless JIT interferes, but might start to fail randomly on ARM, which has a weakly-ordered memory architecture. This is not just a theoretical possibility, but can easily be demonstrated in tests.

Note that exactly the same applies to publishing objects with non-final fields:

class HasNonFinalField { 
  int nonFinalField;
    
  HasNonFinalField() {
      this.nonFinalField = initializeWithPositiveValue();
  }
}

HasNonFinalField published;

void threadA() {
  published = new HasNonFinalField();  
}

void threadB() {
    if (published != null) {
        // This assertion can fail, since the compiler 
        // or the hardware might have reordered instructions.
        assert published.nonFinalField > 0;
    }
}

To fix this, we could transform it into

AtomicReference<HasNonFinalField> published = new AtomicReference<>();

void threadA() {
  published.setRelease(new HasNonFinalField()); // (1)
}

void threadB() {
  HasNonFinalField obj;  
  if ((obj = published.getAcquire()) != null) {
    // Can't fail, since a happens-before-relationship has been established
    // with (1)
    assert obj.nonFinalField > 0;
  }
}

Although often sufficient, relying only on the happens-before order established by getAcquire and setRelease can lead to surprising results, as in the following example:

class ReleaseAcquireRace {
  final AtomicBoolean started1 = new AtomicBoolean();
  final AtomicBoolean started2 = new AtomicBoolean();

  boolean first1;
  boolean first2;

  void run1() {
    started1.setRelease(true);

    if (!started2.getAcquire()) {
      first1 = true;
    }
  }

  void run2() {
    started2.setRelease(true);

    if (!started1.getAcquire()) {
      first2 = true;
    }
  }

  public static void main(String[] args) {
    for (long tries = 1; tries < Long.MAX_VALUE; tries++) {
      var race = new ReleaseAcquireRace();

      var run1 = CompletableFuture.runAsync(race::run1);
      var run2 = CompletableFuture.runAsync(race::run2);

      run1.join();
      run2.join();

      if (race.first1 && race.first2) {
        out.printf("Both threads won after %s tries%n", tries);
        break;
      }

      if (tries % (1 << 20) == 0) {
        out.printf("Never saw both threads winning after %s tries%n", tries);
      }
    }
  }
}

Here, two potentially concurrently executing racers first announce that they have started, and then check if the other racer has already started. If not, they set a flag announcing that they were first, and exit. Studying the code carefully, it’s not too hard to see that these three outcomes are possible:

first1	first2	explanation
false	false	both threads start at the same time
true	false	thread 1 wins
false	true	thread 2 wins

However, if you execute the code from above, you’ll probably see a result like

Both threads won after 7021 tries

indicating that there is a fourth possibility

first1	first2	explanation
true	true	both threads win

How can that be the case? To understand what happened, remember that setRelease and getAcquire only act as one way fences, as illustrated with thorns in the diagram visualizing the implementation of run1 below:

This diagram shows the implementation of run1.

Therefore, the following reordering is legal, and explains the strange result, where both threads declare themselves as winners: This diagram shows a legal out-of-order execution of run1.

The discussed scenario is in fact explicitly mentioned in the Combined Volume Set of Intel® 64 and IA-32 Architectures Software Developer’s Manuals:

Intel 10.2 The Intel SD manual, Volume 3, Chapter 10.2 Memory Ordering

You can verify that the above explanation is indeed the root cause of the unexpected result, by inserting

VarHandle.fullFence();

between setRelease and getAcquire in both run1 and run2 which prevents the problematic reordering.

Another way to exclude the possibility of both threads winning is to use volatile semantics, as implemented by AtomicBoolean.get and AtomicBoolean.set, which brings us to the next section.

Volatile reads and writes

As already mentioned, a volatile read implies a getAcquire and a volatile write a setRelease. In addition, every execution of your program must be explainable by performing volatile reads and writes in a certain global order that is consistent with the order of these operations in your program code. A more formal definition can be found in the Java Language Specification. Note that this does not imply determinism or the absence of races between volatile operations. It just excludes counter-intuitive executions that are not explainable without reorderings, like both threads winning the race in the last example.

Let’s have a closer look as to how volatile semantics are relied on in a real algorithm.

The Peterson Algorithm

The Peterson algorithm is a classical concurrent algorithm for mutual exclusion. Unlike its modern counterparts, it does not rely on atomic compare and swap operations implemented at the hardware level by modern machine architectures. It is therefore significantly outperformed by even the most basic lock implementations that do, but of theoretical interest for the very same reason. First and foremost, it’s a good example for a concurrent algorithm relying on volatile semantics that is less artificial than the ReleaseAcquireRace from the last paragraph.

Although the Peterson Algorithm can be generalized to work with n threads, I want to focus on the two-thread version only. To conveniently switch between different memory ordering modes, I’m using a small helper enum:

enum MemoryOrdering {
    VOLATILE, ACQUIRE_RELEASE, PLAIN;

    int get(AtomicInteger atomic) {
        return switch (this) {
            case PLAIN -> atomic.getPlain();
            case ACQUIRE_RELEASE -> atomic.getAcquire();
            case VOLATILE -> atomic.get();
        };
    }

    void set(AtomicInteger atomic, int value) {
        switch (this) {
            case PLAIN -> atomic.setPlain(value);
            case ACQUIRE_RELEASE -> atomic.setRelease(value);
            case VOLATILE -> atomic.set(value);
        }
    }

    // other operations....
}

The implementation of the Peterson Algorithm requires threads to have uniquely assigned indices, which must be either 0 or 1 and are accessed by calling ThreadIndex.current. After these technicalities, here is an implementation of the Peterson Algorithm for mutual exclusion:

class PetersonLock {
    private final MemoryOrdering memoryOrdering;
    private final AtomicIntegerArray interested = new AtomicIntegerArray(2);
    private final AtomicInteger turn = new AtomicInteger();

    PetersonLock(MemoryOrdering memoryOrdering) {
        this.memoryOrdering = memoryOrdering;
    }

    void lock() {
        var myIdx = ThreadIndex.current();
        var oIdx = 1 - myIdx;

        memoryOrdering.set(interested, myIdx, 1); // (1)
        memoryOrdering.set(turn, oIdx); // (2)

        while (memoryOrdering.get(interested, oIdx) == 1 
               && memoryOrdering.get(turn) != myIdx) { // (3)
            Thread.onSpinWait();
        }
    }

    void unlock() {
        memoryOrdering.set(interested, ThreadIndex.current(), 0);
    }
}

To enter the critical section, a thread first announces its interest at (1), and cedes the turn to the other thread at (2). Then it waits at (3) until either the other thread has no interest in the lock or cedes the turn to this thread.

Although succinct and elegant, this algorithm contains more nuance than meets the eye initially, and if you try to reimplement it from the top of your head, you’ll probably discover that there are many ways to get it wrong. Most notably, the tests I have written fail unless MemoryOrdering.VOLATILE is used. To understand why, remember that with MemoryOrdering.ACQUIRE_RELEASE, a reordering of the lines (1) and (2) is allowed. This, however, opens up the following possibility for both threads to enter the critical section:

Time	Thread 0	Thread 1
0		turn = 0
1	turn = 1
2	interested[0] = 1
3	observe interested[1] == 0
4	enter critical section
5	in critical section	interested[1] = 1
6	in critical section	observe interested[1] == 0
7	in critical section	observe turn == 1
8	in critical section	enter critical section
9	in critical section	in critical section

To exclude this problematic reordering from happening, we can either use MemoryOrdering.VOLATILE, or insert

VarHandle.fullFence();

between (1) and (2).

Before coming to an end, let me say a few words about performance.

Performance Implications

I did some benchmarking, comparing different variations of the Peterson Algorithm with each other on ARM and X86, with inconclusive results. While the version with MemoryOrdering.ACQUIRE_RELEASE and an explicit fence is slightly faster on my X86 laptop, MemoryOrdering.VOLATILE without VarHandle.fullFence() performed better on ARM for me. However, I wouldn’t be surprised if you get exactly opposite results, since differences in execution speeds depend on the exact model of your CPU and the specific JVM being used.

The only thing that can be said with certainty is that switching between MemoryOrdering.VOLATILE and MemoryOrdering.ACQUIRE_RELEASE can have an impact on performance in either way.

Conclusion

getAcquire and setRelease can often be used instead of get and set with volatile semantics, but knowing where this is possible and where it might introduce subtle bugs requires a deep understanding of the implications and a careful, case by case analysis. My recommendation is therefore to avoid these methods except for the most performance critical parts or your application.