设计Twitter的时间线功能

字数 819 2025-11-13 23:48:23

设计Twitter的时间线功能

题目描述
设计一个简化版的Twitter，支持用户发布推文、关注/取消关注其他用户，以及查看用户时间线。时间线功能需要返回用户和其关注用户的最新推文，按时间倒序排列。

核心功能需求

postTweet(userId, tweetId): 发布新推文
getNewsFeed(userId): 获取用户时间线（最近10条推文）
follow(followerId, followeeId): 关注用户
unfollow(followerId, followeeId): 取消关注用户

解题步骤

步骤1：数据结构设计
我们需要设计高效的数据结构来存储推文和用户关系：

推文存储：
- 每个推文需要包含：推文ID、发布时间戳、用户ID
- 使用哈希表：key=用户ID，value=该用户发布的推文列表（按时间倒序）
关注关系：
- 使用哈希表：key=用户ID，value=该用户关注的用户集合（使用Set去重）
时间戳生成：
- 使用全局自增ID或时间戳来确保推文的时间顺序

步骤2：核心操作实现

发布推文：

class Twitter:
    def __init__(self):
        self.timestamp = 0
        self.user_tweets = {}  # userId -> list of (timestamp, tweetId)
        self.following = {}    # userId -> set of followeeId

    def postTweet(self, userId, tweetId):
        if userId not in self.user_tweets:
            self.user_tweets[userId] = []
        self.timestamp += 1
        self.user_tweets[userId].append((self.timestamp, tweetId))

关注/取消关注：

def follow(self, followerId, followeeId):
    if followerId not in self.following:
        self.following[followerId] = set()
    self.following[followerId].add(followeeId)

def unfollow(self, followerId, followeeId):
    if followerId in self.following and followeeId in self.following[followerId]:
        self.following[followerId].remove(followeeId)

步骤3：时间线获取的朴素解法
最简单的实现是合并所有相关用户的推文，然后排序：

def getNewsFeed_naive(self, userId):
    # 获取所有需要关注的用户（包括自己）
    users = {userId}
    if userId in self.following:
        users |= self.following[userId]
    
    # 收集所有推文
    all_tweets = []
    for user in users:
        if user in self.user_tweets:
            all_tweets.extend(self.user_tweets[user])
    
    # 按时间倒序排序，取前10条
    all_tweets.sort(reverse=True)
    return [tweetId for _, tweetId in all_tweets[:10]]

步骤4：优化解法 - 多路归并
朴素解法在用户关注很多人时效率低下。优化方案：

为每个用户维护一个最大堆（优先队列）
使用多路归并思想，每次从各个推文流中取最新的

import heapq

def getNewsFeed(self, userId):
    # 获取关注用户列表（包括自己）
    users = {userId}
    if userId in self.following:
        users |= self.following[userId]
    
    # 为每个用户的推文流创建指针（指向最新推文）
    heap = []
    for user in users:
        if user in self.user_tweets and self.user_tweets[user]:
            # 获取该用户最新的推文（最后发布的）
            tweets = self.user_tweets[user]
            last_index = len(tweets) - 1
            timestamp, tweetId = tweets[last_index]
            # 将(-timestamp, tweetId, user, last_index-1)入堆
            # 使用负时间戳实现最大堆
            heapq.heappush(heap, (-timestamp, tweetId, user, last_index - 1))
    
    # 多路归并
    result = []
    while heap and len(result) < 10:
        neg_timestamp, tweetId, user, index = heapq.heappop(heap)
        result.append(tweetId)
        
        # 如果该用户还有更多推文，将下一条推文加入堆
        if index >= 0:
            next_timestamp, next_tweetId = self.user_tweets[user][index]
            heapq.heappush(heap, (-next_timestamp, next_tweetId, user, index - 1))
    
    return result

步骤5：进一步优化考虑

推文存储优化：
- 限制每个用户的推文存储数量（如只存最近1000条）
- 使用链表存储推文，便于插入和删除
时间线预计算：
- 为活跃用户预计算时间线
- 使用推拉结合模型（写时扩散+读时扩展）
分布式考虑：
- 用户数据分片存储
- 使用消息队列异步更新时间线

复杂度分析

发布推文：O(1)
关注/取消关注：O(1)
获取时间线：O(10logK)，其中K是关注的用户数

这种设计在工程实践中平衡了读写性能，是实际系统中常用的解决方案。