Designing Twitter's Timeline Feature

Designing Twitter's Timeline Feature

Problem Description
Design a timeline feature similar to Twitter that supports users posting tweets, following/unfollowing other users, and retrieving the user's home timeline (returns the 10 most recent tweets from the user and the users they follow, sorted in reverse chronological order). The operations of posting a tweet, following/unfollowing must be efficient, and retrieving the timeline should return results quickly.

Solution Approach

Core Requirements Analysis
- Post Tweet: A user can post a tweet (containing content and a timestamp).
- Follow/Unfollow: A user can dynamically follow or unfollow other users (no mutual follow required).
- Get Timeline: Return the most recent 10 tweets from the user and the users they follow, sorted from newest to oldest.
Key Challenges
- The timeline needs to aggregate tweets from multiple users and sort them to get the top 10. If a user follows a large number of people (e.g., 1000), real-time merging and sorting of all tweets every time the timeline is fetched would be very slow.
- Need to balance tweet posting efficiency and timeline query efficiency.
Design Ideas
- Tweet Storage: Each user's tweets are stored in a linked list (sorted by timestamp in descending order), with new tweets inserted at the head of the list.
- Follow Relationships: Use a hash table to maintain a user's follow list.
- Timeline Generation:
  - Option 1 (Real-time Merging): When fetching the timeline, merge the latest 10 tweets from the tweet linked lists of the user and all users they follow (similar to multi-way merge).
  - Option 2 (Pre-aggregation): Maintain a timeline cache (e.g., a priority queue) for each user, actively pushing tweets to followers' caches upon posting. However, cache updates become complex with unfollows or a large number of followers.
- Trade-off: Twitter initially used real-time merging (due to a smaller number of followers) and later switched to a pre-aggregation + caching strategy. This problem focuses on the real-time merging approach.

Detailed Implementation Steps

Data Structure Design

class Tweet:
    def __init__(self, tweet_id, timestamp):
        self.tweet_id = tweet_id
        self.timestamp = timestamp  # Globally increasing timestamp
        self.next = None  # Linked list pointer to the next tweet

class User:
    def __init__(self, user_id):
        self.user_id = user_id
        self.followees = set()  # People followed (including self)
        self.tweet_head = None  # Head node of the tweet linked list

        # Initially follow oneself to include own tweets in the timeline
        self.followees.add(user_id)

    def post_tweet(self, tweet_id, timestamp):
        """Post a tweet: insert at the head of the linked list"""
        new_tweet = Tweet(tweet_id, timestamp)
        new_tweet.next = self.tweet_head
        self.tweet_head = new_tweet

    def follow(self, followee_id):
        self.followees.add(followee_id)

    def unfollow(self, followee_id):
        if followee_id != self.user_id:  # Cannot unfollow oneself
            self.followees.discard(followee_id)

System Core Class

class Twitter:
    def __init__(self):
        self.timestamp = 0  # Global timestamp
        self.user_map = {}  # user_id -> User object

    def _get_user(self, user_id):
        """Get a user, create if not exists"""
        if user_id not in self.user_map:
            self.user_map[user_id] = User(user_id)
        return self.user_map[user_id]

    def postTweet(self, user_id, tweet_id):
        """Post a tweet"""
        user = self._get_user(user_id)
        self.timestamp += 1
        user.post_tweet(tweet_id, self.timestamp)

    def getNewsFeed(self, user_id):
        """Get timeline: merge latest tweets from user and followees"""
        if user_id not in self.user_map:
            return []

        user = self.user_map[user_id]
        # Use a max heap (priority queue) to merge tweets by timestamp
        heap = []
        for followee_id in user.followees:
            followee = self._get_user(followee_id)
            if followee.tweet_head:
                # Store (-timestamp, tweet) in heap to simulate a max heap
                heapq.heappush(heap, (-followee.tweet_head.timestamp, followee.tweet_head))

        result = []
        while heap and len(result) < 10:
            _, tweet = heapq.heappop(heap)
            result.append(tweet.tweet_id)
            # Add the next tweet from this user to the heap
            if tweet.next:
                heapq.heappush(heap, (-tweet.next.timestamp, tweet.next))
        return result

    def follow(self, follower_id, followee_id):
        """Follow operation"""
        follower = self._get_user(follower_id)
        follower.follow(followee_id)

    def unfollow(self, follower_id, followee_id):
        """Unfollow operation"""
        if follower_id in self.user_map:
            self.user_map[follower_id].unfollow(followee_id)

Key Optimization Points
- Tweet Linked List: Each user's tweets are stored in descending order by time, avoiding reversal during retrieval.
- Multi-way Merge: Use a max heap (priority queue) to merge K sorted linked lists, taking the latest tweet each time, ensuring time complexity of O(10 * logK) (where K is the number of followees).
- Follow Self: Users initially follow themselves to simplify timeline logic.

Complexity Analysis

Post Tweet: O(1), insertion at the head of a linked list.
Follow/Unfollow: O(1), hash table operations.
Get Timeline: O(10 * logK), where K is the number of followees. The heap stores at most K tweets, and we retrieve 10 items.

Practical Extensions

When a user follows a massive number of accounts, strategies like tweet sharding, cached pre-generation (e.g., updating the timeline every 5 minutes) can be introduced.
Industrial-grade systems also consider architectural optimizations like database sharding, read-write separation, asynchronous message queue processing, etc.