The Scheduler in Go: Principles and Work Stealing Mechanism
The Go scheduler is a core component of the Go runtime system, responsible for managing the execution of Goroutines on operating system threads (M). Its design goals are to efficiently utilize multi-core CPUs while ensuring low latency and high throughput. Below, we break down the core principles and working mechanisms of the scheduler step by step.
1. Why is a Scheduler Needed?
- Goroutine vs. Thread: Goroutines are lightweight user-space threads with very low creation and switching costs (initially only 2KB stack, dynamically expandable/shrinkable), whereas OS threads are costly (typically with MB-level stacks).
- Core Problem: Directly assigning one OS thread per Goroutine would waste resources; using only one OS thread cannot leverage multiple cores.
- Solution: The scheduler is responsible for fairly distributing a large number of Goroutines onto a small number of OS threads for execution and handling task switching during blocking (e.g., I/O).
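The cheapness of goroutines is easy to see in practice. The sketch below (illustrative only; `spawnAndWait` is a name chosen here, not a runtime API) launches ten thousand goroutines, which the runtime multiplexes onto a handful of OS threads:

```go
package main

import (
	"fmt"
	"sync"
)

// spawnAndWait launches n goroutines and waits for all of them.
// The runtime, not the OS, schedules them onto a few threads.
func spawnAndWait(n int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	count := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			count++ // shared counter guarded by a mutex
			mu.Unlock()
		}()
	}
	wg.Wait()
	return count
}

func main() {
	// 10,000 goroutines; creating 10,000 OS threads would be prohibitive.
	fmt.Println(spawnAndWait(10000))
}
```

Creating the same number of OS threads directly would consume gigabytes of stack space; here each goroutine starts with a ~2KB stack.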
2. Basic Structure of the Scheduler: The G-M-P Model
The scheduler consists of three core entities:
- G (Goroutine): Represents a Go coroutine, storing its execution stack, state, and program counter.
- M (Machine): Corresponds to an OS thread, scheduled by the operating system, and actually executes G's code.
- P (Processor): A virtual processor, acting as an intermediary between G and M. P maintains a local G queue (Local Run Queue, LRQ), and an M must be bound to a P to execute a G.
Design Advantages:
- The number of P's defaults to the number of CPU cores (adjustable via GOMAXPROCS), preventing thread thrashing due to too many M's.
- Each P's local queue avoids the lock contention associated with a single global queue.
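The number of P's can be inspected and adjusted at runtime. A minimal sketch (the helper name `currentProcs` is ours; `runtime.GOMAXPROCS` is the real API, and calling it with 0 queries without changing the value):

```go
package main

import (
	"fmt"
	"runtime"
)

// currentProcs reads the number of P's without changing it:
// GOMAXPROCS(0) is a query, not a set.
func currentProcs() int {
	return runtime.GOMAXPROCS(0)
}

func main() {
	fmt.Println("P count:", currentProcs())

	prev := runtime.GOMAXPROCS(2) // temporarily limit to 2 P's
	fmt.Println("was:", prev, "now:", currentProcs())

	runtime.GOMAXPROCS(prev) // restore the previous setting
}
```

By default the P count equals `runtime.NumCPU()`, so on most programs there is no need to touch it.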
3. Core Workflow of the Scheduler
Step 1: Goroutine Creation and Queuing
- When a G is created with go func(), it is preferentially placed at the tail of the current P's local queue.
- If the P's local queue is full (capacity 256), half of the G's from the local queue are moved to the tail of the global queue.
Step 2: M Acquires and Executes a G
- After an M binds to a P, it fetches a G from the head of the P's local queue for execution (if the local queue is empty, it attempts to steal G's from the global queue or from other P's).
- If the M's execution of G leads to a blocking system call (e.g., file I/O):
- The scheduler detaches the M from the P, and the P then binds to an idle M (or creates a new one) to continue executing other G's.
- After the blocking system call ends, the M attempts to bind to a P to resume execution; if no idle P is available, the G is placed into the global queue, and the M goes to sleep.
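The effect of this handoff is observable from user code: goroutines blocked in file I/O do not stall CPU-bound work. A rough sketch (the function `ioAndCompute` and its counts are ours, for illustration; the M/P detachment itself happens invisibly inside the runtime):

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

// ioAndCompute starts n goroutines that each perform a blocking
// file read while the caller does CPU-bound work. The runtime
// detaches a blocked M from its P, so the computation proceeds.
func ioAndCompute(n int) (reads int, sum int) {
	f, err := os.CreateTemp("", "sched-demo-*")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	f.WriteString("hello")
	f.Close()

	var wg sync.WaitGroup
	var mu sync.Mutex
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Blocking system call: this G's M may be handed off.
			if _, err := os.ReadFile(f.Name()); err == nil {
				mu.Lock()
				reads++
				mu.Unlock()
			}
		}()
	}
	// CPU-bound work proceeds while the readers block.
	for i := 1; i <= 100; i++ {
		sum += i
	}
	wg.Wait()
	return reads, sum
}

func main() {
	r, s := ioAndCompute(50)
	fmt.Println("reads:", r, "sum:", s)
}
```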
Step 3: Cooperative Scheduling of Goroutines
- Scheduling points (opportunities for scheduling) include:
- Voluntary Yielding: runtime.Gosched() explicitly yields the CPU.
- Channel Operations: a G is suspended when blocked on a channel send or receive.
- System Calls: handled as described in the blocking scenario above.
- Garbage Collection (GC): during the stop-the-world phases of GC, all G's are paused, and the runtime may adjust execution order.
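Two of these scheduling points can be exercised in a few lines. In this sketch (`pingPong` is an illustrative name), the receiver blocks on a channel and is suspended, while the sender voluntarily yields first with `runtime.Gosched()`:

```go
package main

import (
	"fmt"
	"runtime"
)

// pingPong demonstrates two scheduling points: runtime.Gosched()
// voluntarily yields the CPU, and a blocked channel receive
// suspends the waiting goroutine until a sender is ready.
func pingPong() string {
	ch := make(chan string)
	go func() {
		runtime.Gosched() // voluntary yield: a scheduling point
		ch <- "pong"      // channel send: another scheduling point
	}()
	return <-ch // receiver is parked here until the send happens
}

func main() {
	fmt.Println(pingPong())
}
```

Neither goroutine busy-waits: the parked receiver consumes no CPU until the scheduler wakes it.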
4. Work Stealing Mechanism
When a P's local queue is empty, its M does not immediately go to sleep; instead it looks for G's to steal, in the following order:
- Acquire from the Global Queue: Periodically checks the global queue (once every 61 scheduler ticks) to prevent starvation of the global queue.
- Steal from Other P's: Randomly selects another P and steals half of the G's from the tail of its local queue (reduces lock contention and avoids conflict with the G currently being executed by that P).
- Acquire from the Netpoller: If no G's can be stolen, checks for any ready G's related to network I/O.
Significance of the Stealing Strategy:
- Balances the load across P's, preventing some P's from being idle while others are backlogged.
- Stealing from the tail reduces contention with the local P's execution of the G at the head.
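The idea can be sketched in user space. The toy below is an analogy only, NOT the runtime's implementation (the runtime uses lock-free per-P ring buffers, not channels): each worker drains its own queue first, then sweeps its peers' queues, so no worker idles while tasks remain.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// worker holds a pre-filled, closed queue of unit tasks.
type worker struct {
	queue chan int
}

// runStealing runs nWorkers goroutines; each drains its own queue
// and then "steals" leftovers from every peer's queue in turn.
func runStealing(nWorkers, tasksPerWorker int) int64 {
	workers := make([]*worker, nWorkers)
	for i := range workers {
		w := &worker{queue: make(chan int, tasksPerWorker)}
		for t := 0; t < tasksPerWorker; t++ {
			w.queue <- 1
		}
		close(w.queue) // no more tasks will arrive
		workers[i] = w
	}

	var done int64
	var wg sync.WaitGroup
	for i := range workers {
		wg.Add(1)
		go func(self int) {
			defer wg.Done()
			// Own queue first (off == 0), then each peer in order.
			for off := 0; off < nWorkers; off++ {
				victim := workers[(self+off)%nWorkers]
				for range victim.queue {
					atomic.AddInt64(&done, 1) // each task runs once
				}
			}
		}(i)
	}
	wg.Wait()
	return done
}

func main() {
	// 4 workers x 100 tasks: every task executes exactly once.
	fmt.Println(runStealing(4, 100))
}
```

Channels guarantee each task is received exactly once even when several workers sweep the same queue, which is the property the runtime achieves with atomic operations on its ring buffers.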
5. Optimization Features of the Scheduler
- Preemptive Scheduling:
- Before Go 1.14, a G could only be preempted during a function call, potentially allowing a G performing long computations to monopolize a thread.
- Go 1.14+ introduced signal-based (asynchronous) preemption: the runtime sends a signal (SIGURG on Unix-like systems) to the thread running a long-running G, and the signal handler forces that G to yield the CPU, preventing starvation.
- Spinning Threads:
- If an M has no local work but runnable G's may soon appear, the M "spins" (busy-loops searching for work) briefly instead of sleeping, avoiding the cost of frequent sleep/wake transitions on the OS thread.
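Asynchronous preemption can be demonstrated directly. In this sketch (`preemptionWorks` is an illustrative name), a goroutine runs a tight loop containing no function calls, which before Go 1.14 had no cooperative preemption point; with a single P, the main goroutine still gets scheduled because the runtime forcibly preempts the loop:

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// preemptionWorks limits the program to one P and starts a tight
// loop with no call sites. On Go 1.14+, asynchronous preemption
// lets the main goroutine keep running anyway; before 1.14 this
// pattern could hang with GOMAXPROCS=1.
func preemptionWorks() bool {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var stop int32
	go func() {
		// atomic loads compile to inline instructions, so this
		// loop body contains no preemption point of its own.
		for atomic.LoadInt32(&stop) == 0 {
		}
	}()

	// Each Gosched hands the single P to the spinner; only
	// forced preemption can hand it back to us.
	for i := 0; i < 3; i++ {
		runtime.Gosched()
	}
	atomic.StoreInt32(&stop, 1)
	return true
}

func main() {
	fmt.Println("preempted:", preemptionWorks())
}
```

If this program returns at all under GOMAXPROCS=1, preemption fired: the spinning goroutine could never have yielded voluntarily.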
6. Example: How the Scheduler Handles High-Concurrency Scenarios
Assume a 4-core CPU (GOMAXPROCS=4) with 1000 G's created:
- Initially, the 4 P's distribute the G's relatively evenly (each P gets about 250 G's).
- If P1's G's finish first, it will steal G's from the queues of P2, P3, and P4, ensuring all CPU cores stay busy.
- If a G blocks on a network request, its P immediately switches to execute another G, avoiding thread wastage.
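The last point is cheap to verify: goroutines parked on a channel (standing in here for a blocked network request) tie up no OS thread, and runnable work completes normally alongside them. A sketch, with `blockedDoesNotStall` as an illustrative name:

```go
package main

import (
	"fmt"
	"sync"
)

// blockedDoesNotStall parks 100 goroutines on a channel that is
// never sent to, then runs 10 small tasks to completion. The
// parked G's free their P's immediately, so the tasks finish.
func blockedDoesNotStall() int {
	never := make(chan struct{}) // nobody sends: permanent block
	for i := 0; i < 100; i++ {
		go func() { <-never }() // parked G: no thread consumed
	}

	var wg sync.WaitGroup
	results := make(chan int, 10)
	for i := 1; i <= 10; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			results <- n // runnable work still makes progress
		}(i)
	}
	wg.Wait()
	close(results)

	sum := 0
	for n := range results {
		sum += n
	}
	return sum // 1 + 2 + ... + 10
}

func main() {
	fmt.Println(blockedDoesNotStall())
}
```

The 100 parked goroutines simply vanish when the program exits; in a long-running server, such G's would resume the moment their channel (or network socket, via the netpoller) became ready.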
Summary
Through the G-M-P model and the work stealing mechanism, the Go scheduler achieves:
- High Concurrency: Lightweight Goroutines multiplexed onto a small number of OS threads.
- Load Balancing: Work stealing prevents idleness.
- Low Blocking: Detaching M from P during system calls ensures other G's continue execution.
Understanding the scheduler's principles helps in writing more efficient concurrent code (e.g., avoiding G's that occupy the CPU for long periods).