Designing Twitter's Timeline Feature
Designing Twitter's Timeline Feature
Problem Description
Design a timeline feature similar to Twitter that supports users posting tweets, following/unfollowing other users, and retrieving the user's home timeline (returns the 10 most recent tweets from the user and the users they follow, sorted in reverse chronological order). The operations of posting a tweet, following/unfollowing must be efficient, and retrieving the timeline should return results quickly.
Solution Approach
-
Core Requirements Analysis
- Post Tweet: A user can post a tweet (containing content and a timestamp).
- Follow/Unfollow: A user can dynamically follow or unfollow other users (no mutual follow required).
- Get Timeline: Return the most recent 10 tweets from the user and the users they follow, sorted from newest to oldest.
-
Key Challenges
- The timeline needs to aggregate tweets from multiple users and sort them to get the top 10. If a user follows a large number of people (e.g., 1000), real-time merging and sorting of all tweets every time the timeline is fetched would be very slow.
- Need to balance tweet posting efficiency and timeline query efficiency.
-
Design Ideas
- Tweet Storage: Each user's tweets are stored in a linked list (sorted by timestamp in descending order), with new tweets inserted at the head of the list.
- Follow Relationships: Use a hash table to maintain a user's follow list.
- Timeline Generation:
- Option 1 (Real-time Merging): When fetching the timeline, merge the latest 10 tweets from the tweet linked lists of the user and all users they follow (similar to multi-way merge).
- Option 2 (Pre-aggregation): Maintain a timeline cache (e.g., a priority queue) for each user, actively pushing tweets to followers' caches upon posting. However, cache updates become complex with unfollows or a large number of followers.
- Trade-off: Twitter initially used real-time merging (due to a smaller number of followers) and later switched to a pre-aggregation + caching strategy. This problem focuses on the real-time merging approach.
Detailed Implementation Steps
-
Data Structure Design
class Tweet: def __init__(self, tweet_id, timestamp): self.tweet_id = tweet_id self.timestamp = timestamp # Globally increasing timestamp self.next = None # Linked list pointer to the next tweet class User: def __init__(self, user_id): self.user_id = user_id self.followees = set() # People followed (including self) self.tweet_head = None # Head node of the tweet linked list # Initially follow oneself to include own tweets in the timeline self.followees.add(user_id) def post_tweet(self, tweet_id, timestamp): """Post a tweet: insert at the head of the linked list""" new_tweet = Tweet(tweet_id, timestamp) new_tweet.next = self.tweet_head self.tweet_head = new_tweet def follow(self, followee_id): self.followees.add(followee_id) def unfollow(self, followee_id): if followee_id != self.user_id: # Cannot unfollow oneself self.followees.discard(followee_id) -
System Core Class
class Twitter: def __init__(self): self.timestamp = 0 # Global timestamp self.user_map = {} # user_id -> User object def _get_user(self, user_id): """Get a user, create if not exists""" if user_id not in self.user_map: self.user_map[user_id] = User(user_id) return self.user_map[user_id] def postTweet(self, user_id, tweet_id): """Post a tweet""" user = self._get_user(user_id) self.timestamp += 1 user.post_tweet(tweet_id, self.timestamp) def getNewsFeed(self, user_id): """Get timeline: merge latest tweets from user and followees""" if user_id not in self.user_map: return [] user = self.user_map[user_id] # Use a max heap (priority queue) to merge tweets by timestamp heap = [] for followee_id in user.followees: followee = self._get_user(followee_id) if followee.tweet_head: # Store (-timestamp, tweet) in heap to simulate a max heap heapq.heappush(heap, (-followee.tweet_head.timestamp, followee.tweet_head)) result = [] while heap and len(result) < 10: _, tweet = heapq.heappop(heap) result.append(tweet.tweet_id) # Add the next tweet from this user to the heap if tweet.next: heapq.heappush(heap, (-tweet.next.timestamp, tweet.next)) return result def follow(self, follower_id, followee_id): """Follow operation""" follower = self._get_user(follower_id) follower.follow(followee_id) def unfollow(self, follower_id, followee_id): """Unfollow operation""" if follower_id in self.user_map: self.user_map[follower_id].unfollow(followee_id) -
Key Optimization Points
- Tweet Linked List: Each user's tweets are stored in descending order by time, avoiding reversal during retrieval.
- Multi-way Merge: Use a max heap (priority queue) to merge K sorted linked lists, taking the latest tweet each time, ensuring time complexity of O(10 * logK) (where K is the number of followees).
- Follow Self: Users initially follow themselves to simplify timeline logic.
Complexity Analysis
- Post Tweet: O(1), insertion at the head of a linked list.
- Follow/Unfollow: O(1), hash table operations.
- Get Timeline: O(10 * logK), where K is the number of followees. The heap stores at most K tweets, and we retrieve 10 items.
Practical Extensions
- When a user follows a massive number of accounts, strategies like tweet sharding, cached pre-generation (e.g., updating the timeline every 5 minutes) can be introduced.
- Industrial-grade systems also consider architectural optimizations like database sharding, read-write separation, asynchronous message queue processing, etc.