Detailed Explanation of AQS (AbstractQueuedSynchronizer) in Java

Detailed Explanation of AQS (AbstractQueuedSynchronizer) in Java

AQS is the core framework of the Java concurrency package java.util.concurrent.locks. It provides a low-level, high-performance, and extensible foundation for building locks and synchronizers. Commonly used synchronization tools such as ReentrantLock, Semaphore, and CountDownLatch are all built on top of AQS.

1. Core Idea of AQS

The core idea of AQS is very simple: it maintains a volatile int synchronization state (state) and a FIFO (First-In-First-Out) thread waiting queue (a variant of the CLH queue).

Synchronization State (state): This is a variable that can be accessed concurrently, and its specific meaning is defined by subclasses. For example:
- In ReentrantLock, state indicates the number of times the lock is held (0 means unlocked, 1 means locked by one thread, >1 indicates reentrancy by the same thread).
- In Semaphore, state represents the number of currently available permits.
- In CountDownLatch, state indicates the count that still needs to be waited for.
Waiting Queue: This is a doubly linked list used to store all threads waiting to acquire the resource. When a thread fails to acquire the resource, it is encapsulated into a node (Node) and added to the tail of the queue, entering a waiting state. When the thread holding the resource releases it, it will wake up the next (or all) waiting thread(s) in the queue.

2. Template Method Pattern in AQS

AQS uses the Template Method Pattern. It defines a series of "skeleton" methods for acquiring and releasing resources (such as acquire and release), leaving the specific implementation of some key steps to subclasses. Subclasses need to override the following protected methods to implement specific synchronization semantics:

tryAcquire(int arg): Attempts to acquire the resource in exclusive mode. Returns true on success, false on failure.
tryRelease(int arg): Attempts to release the resource in exclusive mode. Returns true on success, false on failure.
tryAcquireShared(int arg): Attempts to acquire the resource in shared mode. Returns a negative value on failure; 0 on success, but subsequent shared-mode acquisitions might fail; a positive value on success, and subsequent shared-mode acquisitions might succeed.
tryReleaseShared(int arg): Attempts to release the resource in shared mode.
isHeldExclusively(): Whether the synchronizer is currently held exclusively by a thread.

3. Deep Dive into the AQS Workflow: Exclusive Mode Example

Let's take the most typical exclusive mode (like ReentrantLock) as an example to break down the AQS workflow in detail.

Scenario: Threads A, B, and C compete for a lock implemented based on AQS.

Step One: Thread A attempts to acquire the lock (lock.lock() -> acquire(1))

Calls tryAcquire(1): AQS first calls the subclass-overridden tryAcquire method. Assume the lock is free at this moment (state is 0).
Acquires successfully: In the tryAcquire method, the subclass attempts to change state from 0 to 1 via a CAS operation. Thread A's CAS operation succeeds, state becomes 1, and AQS records that the current lock-holding thread is thread A (exclusiveOwnerThread = threadA).
Process ends: tryAcquire returns true, thread A successfully acquires the lock and continues executing its critical section code. The entire process is very fast and involves no queue operations.

Step Two: Thread B attempts to acquire the lock (while the lock is held by thread A)

Calls tryAcquire(1): Similarly, AQS calls tryAcquire. At this point state is 1, and the holding thread is not B, so the tryAcquire method returns false, indicating failure to acquire.
Joins the waiting queue: Since tryAcquire fails, AQS executes the addWaiter(Node.EXCLUSIVE) method.
- Encapsulates thread B into a Node (mode EXCLUSIVE).
- If the queue already exists (the tail node is not null), it quickly sets the new node as the tail node via a CAS operation.
- If the queue does not exist (e.g., this is the first waiting thread) or the CAS fails, it enters a "spin" loop until successfully adding the new node to the tail of the queue via CAS.
Spins and waits in the queue: After the node enqueues, the acquireQueued method is called. This is a core spinning process:
- Checks if the current node's predecessor node (prev) is the head node. If so, it means it is the first waiter in the queue and is eligible to try acquiring the lock again (calls tryAcquire).
- At this point, thread A has not released the lock yet, so tryAcquire still fails.
- After failing to acquire, thread B checks if it needs to be suspended (park). It sets its predecessor node's waitStatus to SIGNAL, meaning "When you release the lock, you need to wake me up."
- Finally, it calls LockSupport.park(this) to suspend thread B (entering WAITING state), waiting to be awakened.

Step Three: Thread C also attempts to acquire the lock

The process is exactly the same as for thread B. Thread C is encapsulated into a Node, added to the waiting queue after thread B, and then suspended. The queue structure is now: head -> [Node-B] -> [Node-C] (head is a dummy node).

Step Four: Thread A releases the lock (lock.unlock() -> release(1))

Calls tryRelease(1): AQS calls the subclass-overridden tryRelease method. This method decrements state by 1. If state becomes 0, it means the lock is fully released.
Wakes up the successor node: If tryRelease returns true (release successful), AQS checks if there are any waiting threads in the queue (i.e., the head node's waitStatus is not 0).
Finds the node to wake up: AQS finds the next node after the head node (i.e., the node containing thread B).
Executes wake-up unparkSuccessor: Calls LockSupport.unpark(node.thread) to wake up thread B.

Step Five: Thread B, after being awakened, continues competition

Thread B resumes execution from where it was suspended (LockSupport.park).
It re-enters the spin loop of acquireQueued.
This time, it finds its predecessor node is the head node, so it calls tryAcquire(1) again.
The lock is free now, thread B's CAS operation succeeds, changing state from 0 to 1, and records itself as the lock holder.
Thread B sets itself as the new head node (dequeuing the old head node).
Thread B successfully acquires the lock and begins executing its critical section code.

Summary

AQS ingeniously manages thread access to shared resources through a state variable and a FIFO queue. Its core workflow can be summarized as:

Acquiring Resources: First attempt tryAcquire. If successful, return immediately. If failed, add itself to the tail of the queue, and spin/wait until awakened by its predecessor and successfully acquire the resource.
Releasing Resources: Execute tryRelease. Upon success, wake up the next valid waiting thread in the queue.

This design allows AQS to efficiently handle thread queuing, blocking, and waking, forming the foundation for building powerful concurrency tools. Understanding AQS is a key step towards mastering Java concurrent programming.