The Scheduler in Go: Principles and Work Stealing Mechanism

The Go scheduler is a core component of the Go runtime system, responsible for managing the execution of Goroutines on operating system threads (M). Its design goals are to efficiently utilize multi-core CPUs while ensuring low latency and high throughput. Below, we break down the core principles and working mechanisms of the scheduler step by step.


1. Why is a Scheduler Needed?

  • Goroutine vs. Thread: Goroutines are lightweight user-space threads with very low creation and switching costs (initially only 2KB stack, dynamically expandable/shrinkable), whereas OS threads are costly (typically with MB-level stacks).
  • Core Problem: Directly assigning one OS thread per Goroutine would waste resources; using only one OS thread cannot leverage multiple cores.
  • Solution: The scheduler is responsible for fairly distributing a large number of Goroutines onto a small number of OS threads for execution and handling task switching during blocking (e.g., I/O).

2. Basic Structure of the Scheduler: The G-M-P Model

The scheduler consists of three core entities:

  1. G (Goroutine): Represents a Go coroutine, storing its execution stack, state, and program counter.
  2. M (Machine): Corresponds to an OS thread, scheduled by the operating system, and actually executes G's code.
  3. P (Processor): A virtual processor, acting as an intermediary between G and M. P maintains a local G queue (Local Run Queue, LRQ), and an M must be bound to a P to execute a G.

Design Advantages:

  • The number of P's defaults to the number of CPU cores (adjustable via GOMAXPROCS), preventing thread thrashing due to too many M's.
  • Each P's local queue avoids lock contention associated with a global queue.

3. Core Workflow of the Scheduler

Step 1: Goroutine Creation and Queuing

  • When creating a G using go func(), the G is preferentially placed into the current P's local queue (at the tail).
  • If the P's local queue is full (capacity 256), half of the G's from the local queue are moved to the global queue (at the tail).

Step 2: M Acquires and Executes a G

  • After an M binds to a P, it fetches a G from the head of the P's local queue for execution (if the local queue is empty, it falls back to the global queue or steals G's from other P's).
  • If the M's execution of G leads to a blocking system call (e.g., file I/O):
    • The scheduler detaches the M from the P, and the P then binds to an idle M (or creates a new one) to continue executing other G's.
    • After the blocking system call ends, the M attempts to bind to a P to resume execution; if no idle P is available, the G is placed into the global queue, and the M goes to sleep.

Step 3: Cooperative Scheduling of Goroutines

  • Scheduling points (opportunities for scheduling) include:
    • Voluntary Yielding: runtime.Gosched() explicitly yields the CPU.
    • Channel Operations: A G is suspended when blocked on channel send/receive.
    • System Calls: Handled as described in the blocking scenario above.
    • Garbage Collection (GC): During GC's stop-the-world phases, running G's are paused at safe points and execution order is adjusted.

4. Work Stealing Mechanism

When a P's local queue is empty, its M does not immediately go to sleep; instead, it looks for runnable G's in the following order:

  1. Acquire from the Global Queue: checks the global queue first (independently, the scheduler also polls the global queue once every 61 ticks during normal scheduling, to prevent its starvation).
  2. Steal from Other P's: Randomly selects another P and steals half of the G's from the tail of its local queue (reduces lock contention and avoids conflict with the G currently being executed by that P).
  3. Acquire from the Netpoller: If no G's can be stolen, checks for any ready G's related to network I/O.

Significance of the Stealing Strategy:

  • Balances the load across P's, preventing some P's from being idle while others are backlogged.
  • Stealing from the tail reduces contention with the local P's execution of the G at the head.

5. Optimization Features of the Scheduler

  • Preemptive Scheduling:
    • Before Go 1.14, a G could only be preempted during a function call, potentially allowing a G performing long computations to monopolize a thread.
    • Go 1.14+ introduced signal-based (asynchronous) preemption: the runtime sends a signal to the M running a long-executing G, and the signal handler forces that G to yield the CPU, preventing starvation.
  • Spinning Threads:
    • If an M runs out of G's while more work may soon arrive, the M "spins" (busy-polls for runnable G's for a short time instead of parking), avoiding the cost of frequent sleep/wake transitions.

6. Example: How the Scheduler Handles High-Concurrency Scenarios

Assume a 4-core CPU (GOMAXPROCS=4) with 1000 G's created:

  1. Initially, the 4 P's distribute the G's relatively evenly (each P gets about 250 G's).
  2. If P1's G's finish first, it will steal G's from the queues of P2, P3, and P4, ensuring all CPU cores stay busy.
  3. If a G blocks on a network request, the netpoller parks that G and its P immediately switches to execute another G, avoiding thread wastage.

Summary

Through the G-M-P model and the work stealing mechanism, the Go scheduler achieves:

  • High Concurrency: Lightweight Goroutines multiplexed onto a small number of OS threads.
  • Load Balancing: Work stealing prevents idleness.
  • Low Blocking: Detaching an M from its P during blocking system calls ensures other G's continue execution.

Understanding the scheduler's principles helps in writing more efficient concurrent code (e.g., avoiding G's that occupy the CPU for long periods).