The Scheduler in Go: Principles and Work Stealing Mechanism
The Go scheduler is a core component of the Go runtime system, responsible for managing the execution of Goroutines on operating system threads (M). Its design goals are to efficiently utilize multi-core CPUs while ensuring low latency and high throughput. Below, we break down the core principles and working mechanisms of the scheduler step by step.
1. Why is a Scheduler Needed?
- Goroutine vs. Thread: Goroutines are lightweight user-space threads with very low creation and switching costs (initially only 2KB stack, dynamically expandable/shrinkable), whereas OS threads are costly (typically with MB-level stacks).
- Core Problem: Directly assigning one OS thread per Goroutine would waste resources; using only one OS thread cannot leverage multiple cores.
- Solution: The scheduler is responsible for fairly distributing a large number of Goroutines onto a small number of OS threads for execution and handling task switching during blocking (e.g., I/O).
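The cheapness of goroutines is easy to see in practice. The sketch below (illustrative only; `spawnAndWait` is a name chosen here, not a runtime API) launches ten thousand goroutines, which the runtime multiplexes onto a handful of OS threads:

```go
package main

import (
	"fmt"
	"sync"
)

// spawnAndWait launches n goroutines and waits for all of them.
// The runtime, not the OS, schedules them onto a few threads.
func spawnAndWait(n int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	count := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			count++ // shared counter guarded by a mutex
			mu.Unlock()
		}()
	}
	wg.Wait()
	return count
}

func main() {
	// 10,000 goroutines; creating 10,000 OS threads would be prohibitive.
	fmt.Println(spawnAndWait(10000))
}
```

Creating the same number of OS threads directly would consume gigabytes of stack space; here each goroutine starts with a ~2KB stack.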
2. Basic Structure of the Scheduler: The G-M-P Model
The scheduler consists of three core entities:
- G (Goroutine): Represents a Go coroutine, storing its execution stack, state, and program counter.
- M (Machine): Corresponds to an OS thread, scheduled by the operating system, and actually executes G's code.
- P (Processor): A virtual processor, acting as an intermediary between G and M. P maintains a local G queue (Local Run Queue, LRQ), and an M must be bound to a P to execute a G.
Design Advantages:
- The number of P's defaults to the number of CPU cores (adjustable via GOMAXPROCS), preventing thread thrashing due to too many M's.
- Each P's local queue avoids the lock contention associated with a single global queue.
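The number of P's can be inspected and adjusted at runtime. A minimal sketch (the helper name `currentProcs` is ours; `runtime.GOMAXPROCS` is the real API, and calling it with 0 queries without changing the value):

```go
package main

import (
	"fmt"
	"runtime"
)

// currentProcs reads the number of P's without changing it:
// GOMAXPROCS(0) is a query, not a set.
func currentProcs() int {
	return runtime.GOMAXPROCS(0)
}

func main() {
	fmt.Println("P count:", currentProcs())

	prev := runtime.GOMAXPROCS(2) // temporarily limit to 2 P's
	fmt.Println("was:", prev, "now:", currentProcs())

	runtime.GOMAXPROCS(prev) // restore the previous setting
}
```

By default the P count equals `runtime.NumCPU()`, so on most programs there is no need to touch it.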
3. Core Workflow of the Scheduler
Step 1: Goroutine Creation and Queuing
- When a G is created with go func(), it is preferentially placed at the tail of the current P's local queue.
- If the P's local queue is full (capacity 256), half of the G's from the local queue are moved to the tail of the global queue.
Step 2: M Acquires and Executes a G
- After an M binds to a P, it fetches a G from the head of the P's local queue for execution (if the local queue is empty, it attempts to steal G's from the global queue or from other P's).
- If the M's execution of G leads to a blocking system call (e.g., file I/O):
- The scheduler detaches the M from the P, and the P then binds to an idle M (or creates a new one) to continue executing other G's.
- After the blocking system call ends, the M attempts to bind to a P to resume execution; if no idle P is available, the G is placed into the global queue, and the M goes to sleep.
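The effect of this handoff is observable from user code: goroutines blocked in file I/O do not stall CPU-bound work. A rough sketch (the function `ioAndCompute` and its counts are ours, for illustration; the M/P detachment itself happens invisibly inside the runtime):

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

// ioAndCompute starts n goroutines that each perform a blocking
// file read while the caller does CPU-bound work. The runtime
// detaches a blocked M from its P, so the computation proceeds.
func ioAndCompute(n int) (reads int, sum int) {
	f, err := os.CreateTemp("", "sched-demo-*")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	f.WriteString("hello")
	f.Close()

	var wg sync.WaitGroup
	var mu sync.Mutex
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Blocking system call: this G's M may be handed off.
			if _, err := os.ReadFile(f.Name()); err == nil {
				mu.Lock()
				reads++
				mu.Unlock()
			}
		}()
	}
	// CPU-bound work proceeds while the readers block.
	for i := 1; i <= 100; i++ {
		sum += i
	}
	wg.Wait()
	return reads, sum
}

func main() {
	r, s := ioAndCompute(50)
	fmt.Println("reads:", r, "sum:", s)
}
```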
Step 3: Cooperative Scheduling of Goroutines
- Scheduling points (opportunities for scheduling) include:
- Voluntary Yielding: runtime.Gosched() explicitly yields the CPU.
- Channel Operations: a G is suspended when blocked on a channel send or receive.
- System Calls: handled as described in the blocking scenario above.
- Garbage Collection (GC): during the stop-the-world phases of GC, all G's are paused, and the runtime may adjust execution order.
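Two of these scheduling points can be exercised in a few lines. In this sketch (`pingPong` is an illustrative name), the receiver blocks on a channel and is suspended, while the sender voluntarily yields first with `runtime.Gosched()`:

```go
package main

import (
	"fmt"
	"runtime"
)

// pingPong demonstrates two scheduling points: runtime.Gosched()
// voluntarily yields the CPU, and a blocked channel receive
// suspends the waiting goroutine until a sender is ready.
func pingPong() string {
	ch := make(chan string)
	go func() {
		runtime.Gosched() // voluntary yield: a scheduling point
		ch <- "pong"      // channel send: another scheduling point
	}()
	return <-ch // receiver is parked here until the send happens
}

func main() {
	fmt.Println(pingPong())
}
```

Neither goroutine busy-waits: the parked receiver consumes no CPU until the scheduler wakes it.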
4. Work Stealing Mechanism
When a P's local queue is empty, its M does not immediately go to sleep; instead it looks for G's to steal, in the following order:
- Acquire from the Global Queue: Periodically checks the global queue (once every 61 scheduler ticks) to prevent starvation of the global queue.
- Steal from Other P's: Randomly selects another P and steals half of the G's from the tail of its local queue (reduces lock contention and avoids conflict with the G currently being executed by that P).
- Acquire from the Netpoller: If no G's can be stolen, checks for any ready G's related to network I/O.
Significance of the Stealing Strategy:
- Balances the load across P's, preventing some P's from being idle while others are backlogged.
- Stealing from the tail reduces contention with the local P's execution of the G at the head.
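The idea can be sketched in user space. The toy below is an analogy only, NOT the runtime's implementation (the runtime uses lock-free per-P ring buffers, not channels): each worker drains its own queue first, then sweeps its peers' queues, so no worker idles while tasks remain.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// worker holds a pre-filled, closed queue of unit tasks.
type worker struct {
	queue chan int
}

// runStealing runs nWorkers goroutines; each drains its own queue
// and then "steals" leftovers from every peer's queue in turn.
func runStealing(nWorkers, tasksPerWorker int) int64 {
	workers := make([]*worker, nWorkers)
	for i := range workers {
		w := &worker{queue: make(chan int, tasksPerWorker)}
		for t := 0; t < tasksPerWorker; t++ {
			w.queue <- 1
		}
		close(w.queue) // no more tasks will arrive
		workers[i] = w
	}

	var done int64
	var wg sync.WaitGroup
	for i := range workers {
		wg.Add(1)
		go func(self int) {
			defer wg.Done()
			// Own queue first (off == 0), then each peer in order.
			for off := 0; off < nWorkers; off++ {
				victim := workers[(self+off)%nWorkers]
				for range victim.queue {
					atomic.AddInt64(&done, 1) // each task runs once
				}
			}
		}(i)
	}
	wg.Wait()
	return done
}

func main() {
	// 4 workers x 100 tasks: every task executes exactly once.
	fmt.Println(runStealing(4, 100))
}
```

Channels guarantee each task is received exactly once even when several workers sweep the same queue, which is the property the runtime achieves with atomic operations on its ring buffers.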
5. Optimization Features of the Scheduler
- Preemptive Scheduling:
- Before Go 1.14, a G could only be preempted during a function call, potentially allowing a G performing long computations to monopolize a thread.
- Go 1.14+ introduced signal-based (asynchronous) preemption: the runtime sends a signal (SIGURG on Unix-like systems) to the thread running a long-running G, and the signal handler forces that G to yield the CPU, preventing starvation.
- Spinning Threads:
- If an M has no local work but runnable G's may soon appear, the M "spins" (busy-loops searching for work) briefly instead of sleeping, avoiding the cost of frequent sleep/wake transitions on the OS thread.
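Asynchronous preemption can be demonstrated directly. In this sketch (`preemptionWorks` is an illustrative name), a goroutine runs a tight loop containing no function calls, which before Go 1.14 had no cooperative preemption point; with a single P, the main goroutine still gets scheduled because the runtime forcibly preempts the loop:

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// preemptionWorks limits the program to one P and starts a tight
// loop with no call sites. On Go 1.14+, asynchronous preemption
// lets the main goroutine keep running anyway; before 1.14 this
// pattern could hang with GOMAXPROCS=1.
func preemptionWorks() bool {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var stop int32
	go func() {
		// atomic loads compile to inline instructions, so this
		// loop body contains no preemption point of its own.
		for atomic.LoadInt32(&stop) == 0 {
		}
	}()

	// Each Gosched hands the single P to the spinner; only
	// forced preemption can hand it back to us.
	for i := 0; i < 3; i++ {
		runtime.Gosched()
	}
	atomic.StoreInt32(&stop, 1)
	return true
}

func main() {
	fmt.Println("preempted:", preemptionWorks())
}
```

If this program returns at all under GOMAXPROCS=1, preemption fired: the spinning goroutine could never have yielded voluntarily.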
6. Example: How the Scheduler Handles High-Concurrency Scenarios
Assume a 4-core CPU (GOMAXPROCS=4) with 1000 G's created:
- Initially, the 4 P's distribute the G's relatively evenly (each P gets about 250 G's).
- If P1's G's finish first, it will steal G's from the queues of P2, P3, and P4, ensuring all CPU cores stay busy.
- If a G blocks on a network request, its P immediately switches to execute another G, avoiding thread wastage.
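The last point is cheap to verify: goroutines parked on a channel (standing in here for a blocked network request) tie up no OS thread, and runnable work completes normally alongside them. A sketch, with `blockedDoesNotStall` as an illustrative name:

```go
package main

import (
	"fmt"
	"sync"
)

// blockedDoesNotStall parks 100 goroutines on a channel that is
// never sent to, then runs 10 small tasks to completion. The
// parked G's free their P's immediately, so the tasks finish.
func blockedDoesNotStall() int {
	never := make(chan struct{}) // nobody sends: permanent block
	for i := 0; i < 100; i++ {
		go func() { <-never }() // parked G: no thread consumed
	}

	var wg sync.WaitGroup
	results := make(chan int, 10)
	for i := 1; i <= 10; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			results <- n // runnable work still makes progress
		}(i)
	}
	wg.Wait()
	close(results)

	sum := 0
	for n := range results {
		sum += n
	}
	return sum // 1 + 2 + ... + 10
}

func main() {
	fmt.Println(blockedDoesNotStall())
}
```

The 100 parked goroutines simply vanish when the program exits; in a long-running server, such G's would resume the moment their channel (or network socket, via the netpoller) became ready.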
Summary
Through the G-M-P model and the work stealing mechanism, the Go scheduler achieves:
- High Concurrency: Lightweight Goroutines multiplexed onto a small number of OS threads.
- Load Balancing: Work stealing prevents idleness.
- Low Blocking: Detaching M from P during system calls ensures other G's continue execution.
Understanding the scheduler's principles helps in writing more efficient concurrent code (e.g., avoiding G's that occupy the CPU for long periods).