Principles of Graph Convolutional Networks (GCN) and the Message Passing Mechanism

Problem Description
The Graph Convolutional Network (GCN) is a foundational deep learning model for processing graph-structured data. Its core idea is to update each node's representation by aggregating the features of its neighbors. This problem calls for a detailed explanation of GCN's message passing mechanism, the derivation of the layer-wise propagation formula, and how stacking layers lets the network capture deeper relationships between nodes in a graph.

Step-by-Step Analysis of Key Concepts

  1. Special Characteristics of Graph Data

    • Traditional CNNs operate on Euclidean, grid-structured data (such as images and text sequences), whereas graph data is non-Euclidean: each node has an arbitrary number of unordered neighbors.
    • Key Challenge: How to define a convolution operation that is permutation invariant and can integrate neighbor information.
  2. Basic Idea of Message Passing

    • Analogy to social networks: Each node gathers information from its neighbors and updates its own state.
    • Mathematical essence: For node \(i\), its new feature \(h_i'\) is determined by its own feature \(h_i\) and the features \(h_j\) of its neighbors (\(j \in \mathcal{N}(i)\)).
  3. Derivation of the Single-Layer GCN Propagation Formula

    • Step 1: Naive Sum Aggregation
      Initial idea: \(h_i' = \sigma \left( W \cdot \sum_{j \in \mathcal{N}(i) \cup \{i\}} h_j \right)\)
      Problem: it ignores node degree, so the aggregated features of high-degree nodes grow with their number of neighbors, destabilizing training.

    • Step 2: Degree Normalization
      Introduce the degree matrix \(D\) (diagonal elements \(D_{ii} = \sum_j A_{ij}\), where \(A\) is the adjacency matrix):
      \(h_i' = \sigma \left( W \cdot \sum_{j \in \mathcal{N}(i) \cup \{i\}} \frac{h_j}{\sqrt{D_{jj}D_{ii}}} \right)\)
      Principle: normalizing each neighbor's contribution by the square roots of both endpoints' degrees keeps feature magnitudes comparable across nodes of different degrees and stabilizes gradients during training.

    • Step 3: Matrix Form
      Let \(\tilde{A} = A + I\) (adjacency matrix with added self-loops), \(\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}\), then the final formula is:

\[ H^{(l+1)} = \sigma \left( \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)} \right) \]

   where \(H^{(l)}\) is the node feature matrix at layer \(l\) and \(W^{(l)}\) is the layer's trainable weight matrix.
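
The propagation rule above is a few lines of linear algebra. The following is a minimal NumPy sketch of a single layer, assuming ReLU for \(\sigma\) and dense matrices (library implementations typically use sparse operations); the identity weight matrix in the demo is purely illustrative.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: sigma(D~^{-1/2} A~ D~^{-1/2} H W)."""
    A_tilde = A + np.eye(A.shape[0])           # add self-loops: A~ = A + I
    d = A_tilde.sum(axis=1)                    # degrees of A~ (diagonal of D~)
    D_inv_sqrt = np.diag(d ** -0.5)            # D~^{-1/2}
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt  # symmetric normalization
    return np.maximum(A_hat @ H @ W, 0.0)      # sigma = ReLU (an assumption here)

# Tiny demo: 3 nodes (edges 0-1 and 0-2), 2 input features, 2 output features.
A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
H = np.array([[1., 0.], [0., 1.], [1., 1.]])
W = np.eye(2)                                  # identity weights for readability
print(gcn_layer(A, H, W))
```
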
  4. Physical Meaning of the Formula

    • \(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}\) is the symmetrically normalized adjacency matrix with self-loops (a close relative of the symmetrically normalized Laplacian); its eigenvalues fall within \([-1, 1]\), which keeps repeated propagation numerically stable and aids training.
    • Calculation example: for node \(i\), its new feature is a weighted sum of its own feature (weight \(1/\tilde{D}_{ii}\)) and its neighbors' features (weight \(1/\sqrt{\tilde{D}_{ii}\tilde{D}_{jj}}\) for neighbor \(j\)).
  5. Multi-Layer GCN and Deep Relationship Capture

    • A single-layer GCN only aggregates information from 1-hop neighbors. Stacking \(K\) layers allows aggregation of information from \(K\)-hop neighbors.
    • Over-smoothing in deep GCNs: after many layers, the representations of all nodes converge toward one another and become indistinguishable. Common remedies include residual (skip) connections and concatenating a node's own features with the aggregated neighborhood, as GraphSAGE does; the sketch below illustrates the effect.
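
The over-smoothing effect is easy to reproduce. The sketch below stacks propagation steps on a small path graph; to isolate the effect it assumes identity weight matrices and no nonlinearity, so \(k\) layers reduce to multiplying by the normalized adjacency \(k\) times. The printed gap between direction-normalized node features shrinks toward zero as depth grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# 6-node path graph; identity weights and no sigma are simplifying assumptions.
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

A_tilde = A + np.eye(n)
d = A_tilde.sum(axis=1)
A_hat = np.diag(d ** -0.5) @ A_tilde @ np.diag(d ** -0.5)

H = rng.normal(size=(n, 4))                    # random initial node features
for k in (1, 2, 4, 8, 16, 32):
    Hk = np.linalg.matrix_power(A_hat, k) @ H  # k propagation steps
    Hn = Hk / np.linalg.norm(Hk, axis=1, keepdims=True)  # compare directions only
    spread = max(np.linalg.norm(Hn[i] - Hn[j])
                 for i in range(n) for j in range(n))
    print(f"{k:2d} layers: max pairwise direction gap = {spread:.4f}")
```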

Example Illustration
Assume a social network graph:

  • Nodes: users A, B, and C, with edges A–B and A–C (so A has degree 2, while B and C each have degree 1)
  • Initial features: \(h_A=[1,0], h_B=[0,1], h_C=[1,1]\)
  • Single-layer GCN calculation (simplified: raw degrees without self-loops, with the weight matrix \(W\) and activation \(\sigma\) omitted):
    • A's new feature = \(\frac{h_A}{2} + \frac{h_B}{\sqrt{2 \times 1}} + \frac{h_C}{\sqrt{2 \times 1}} = [0.5,0] + [0,0.71] + [0.71,0.71] \approx [1.21,1.42]\)
    • The result integrates the interest features of friends B and C.
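
To check the arithmetic, the example can be reproduced with the short script below, under the same simplifications stated above (raw degrees without self-loops, \(W\) and \(\sigma\) omitted).

```python
import numpy as np

# Edges A-B and A-C; node order is [A, B, C].
A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
H = np.array([[1., 0.],    # h_A
              [0., 1.],    # h_B
              [1., 1.]])   # h_C

# Same simplification as the text: self term weighted 1/D_ii,
# neighbors weighted 1/sqrt(D_ii * D_jj), using raw degrees.
d = A.sum(axis=1)
h_A_new = H[0] / d[0] + H[1] / np.sqrt(d[0] * d[1]) + H[2] / np.sqrt(d[0] * d[2])
print(h_A_new)   # [1.2071..., 1.4142...], i.e. about [1.21, 1.42]
```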

Summary
The core of GCN is the weighted aggregation of neighbor features through the symmetrically normalized adjacency matrix. This message passing mechanism lets the model exploit graph structural information effectively, and it is widely applied to tasks such as node classification and link prediction.