Top K Problem
Problem Description
The Top K problem refers to finding the top K largest or smallest elements in a dataset containing n elements. For example, counting popular search terms in a search engine, or finding the top K IP addresses with the highest access frequency in massive logs. This type of problem typically requires efficient processing of large-scale data with time complexity as low as possible.
Solution Approach
Solving the Top K problem requires choosing different strategies based on the data volume, memory constraints, and time complexity requirements. The following analysis of common solutions progresses from simple to efficient.
Method 1: Global Sorting (Brute Force)
- Approach: Directly sort all n elements (e.g., using quicksort), then take the first K elements.
- Time Complexity:
- Sorting requires O(n log n), taking the first K requires O(K), overall O(n log n).
- Disadvantages:
- Low efficiency when n is extremely large, and the data may not fit entirely into memory.
- Applicable Scenarios: Small data volume that can be loaded into memory at once.
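The brute-force approach can be sketched in Python in a couple of lines (the function name is illustrative):

```python
def top_k_smallest_sorted(nums, k):
    """Sort the entire list, then slice off the first k elements: O(n log n)."""
    return sorted(nums)[:k]

print(top_k_smallest_sorted([5, 1, 4, 2, 3], 3))  # [1, 2, 3]
```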
Method 2: Partial Sorting (Bubble/Selection Sort Optimization)
- Approach: Sort only the first K elements instead of the whole array.
- For example, use a variant of selection sort or bubble sort: each pass finds one maximum (or minimum), and the process runs for K passes.
- Time Complexity:
- Each iteration traverses n elements, K iterations total, O(nK).
- Disadvantages:
- Degenerates to O(n²) when K is close to n.
- Applicable Scenarios: K is much smaller than n, and efficiency requirements are not high.
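A minimal Python sketch of the selection-sort variant described above (the function name is illustrative):

```python
def top_k_largest_partial(nums, k):
    """K passes of selection sort: each pass moves one maximum to the front. O(nK)."""
    a = list(nums)  # copy so the caller's list is not mutated
    n = len(a)
    for i in range(min(k, n)):
        # find the index of the largest element in the unsorted tail a[i:]
        max_idx = i
        for j in range(i + 1, n):
            if a[j] > a[max_idx]:
                max_idx = j
        a[i], a[max_idx] = a[max_idx], a[i]
    return a[:k]

print(top_k_largest_partial([3, 1, 5, 2, 4], 2))  # [5, 4]
```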
Method 3: Heap (Priority Queue)
This is the most commonly used efficient solution, divided into two cases:
3.1 Finding the Top K Smallest Elements
- Approach: Use a Max-Heap to maintain the current K smallest elements.
- Steps:
a. Create a max-heap with capacity K.
b. Traverse the first K elements and add them directly to the heap.
c. Starting from the (K+1)-th element, if the current element is smaller than the heap top (i.e., smaller than the current maximum among the K elements), replace the heap top and adjust the heap.
d. After traversal, the heap contains the top K smallest elements.
- Time Complexity:
- Heap construction O(K), each heap adjustment O(log K), performed at most n−K times, overall O(n log K).
- Space Complexity: O(K).
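The steps above can be sketched in Python. The standard `heapq` module only provides a min-heap, so a common trick is to simulate the max-heap by storing negated values (the function name is illustrative):

```python
import heapq

def top_k_smallest_heap(nums, k):
    """Max-heap of capacity k, simulated by negating values; heap[0] is -(current max)."""
    heap = []
    for x in nums:
        if len(heap) < k:
            heapq.heappush(heap, -x)
        elif x < -heap[0]:                 # x is smaller than the largest of the k kept
            heapq.heapreplace(heap, -x)    # pop the heap top and push -x in one step
    return sorted(-v for v in heap)

print(top_k_smallest_heap([7, 2, 9, 1, 5, 3], 3))  # [1, 2, 3]
```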
3.2 Finding the Top K Largest Elements
- Approach: Use a Min-Heap to maintain the current K largest elements.
- Steps:
a. Create a min-heap with capacity K.
b. Traverse the first K elements and add them directly to the heap.
c. Starting from the (K+1)-th element, if the current element is larger than the heap top (i.e., larger than the current minimum among the K elements), replace the heap top and adjust the heap.
d. After traversal, the heap contains the top K largest elements.
- Time Complexity: Same as above, O(n log K).
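This case maps directly onto Python's `heapq`, which is already a min-heap (the function name is illustrative):

```python
import heapq

def top_k_largest_heap(nums, k):
    """Min-heap of capacity k; heap[0] is the smallest of the k elements kept so far."""
    heap = []
    for x in nums:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:               # x beats the current minimum of the k kept
            heapq.heapreplace(heap, x)
    return sorted(heap, reverse=True)

print(top_k_largest_heap([7, 2, 9, 1, 5, 3], 3))  # [9, 7, 5]
```

For one-off calls, `heapq.nlargest(k, nums)` does the same job; the explicit loop shows why a data stream can be consumed one element at a time.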
Why is the heap efficient?
- Each heap adjustment touches only O(log K) elements rather than re-sorting everything, which makes the heap well suited to massive data streams (elements arrive one at a time, and the full dataset never needs to fit in memory at once).
Method 4: QuickSelect Algorithm
- Approach: Based on the partition operation of quicksort.
- Steps:
a. Randomly select a pivot element, partitioning the array into left and right parts.
b. If the pivot lands at index K, then the left side (the first K elements) contains the top K smallest elements (note: they are not themselves sorted).
c. If the pivot index is greater than K, recurse on the left part; if less than K, recurse on the right part.
- Time Complexity:
- Average O(n), worst-case O(n²) (can be avoided through randomization).
- Disadvantages:
- Requires modifying the original array and is not suitable for data stream scenarios.
- Optimization: Combine with the BFPRT algorithm (median of medians) to guarantee worst-case O(n), though less used in practical engineering.
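A Python sketch of QuickSelect for the top K smallest, using a random pivot to avoid the O(n²) worst case (function names are illustrative):

```python
import random

def top_k_smallest_quickselect(nums, k):
    """Partition until index k-1 is in its final position; then a[:k] holds the k smallest (unordered)."""
    a = list(nums)  # copy: quickselect reorders the array in place

    def partition(lo, hi):
        p = random.randint(lo, hi)      # random pivot avoids adversarial O(n^2) inputs
        a[p], a[hi] = a[hi], a[p]
        pivot, store = a[hi], lo
        for i in range(lo, hi):
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]
        return store                    # pivot's final sorted position

    lo, hi, target = 0, len(a) - 1, k - 1
    while True:
        idx = partition(lo, hi)
        if idx == target:               # element k-1 fixed => a[:k] are the k smallest
            break
        elif idx > target:
            hi = idx - 1                # answer lies in the left part
        else:
            lo = idx + 1                # answer lies in the right part
    return a[:k]

print(sorted(top_k_smallest_quickselect([9, 4, 7, 1, 3, 8, 2], 3)))  # [1, 2, 3]
```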
Method 5: Divide and Conquer (MapReduce Approach)
- Approach: Suitable for distributed systems or massive data.
- Steps:
a. Shard the data across multiple machines, each computing a local Top K.
b. Aggregate all local Top K results and compute the final global Top K.
- Advantages:
- Reduces single-machine pressure through parallel processing.
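The two-phase idea can be sketched in single-process Python, with each shard standing in for one machine's data (function and variable names are illustrative):

```python
import heapq

def global_top_k(shards, k):
    """Map: each shard computes its local top k. Reduce: take the top k of all candidates."""
    local_results = [heapq.nlargest(k, shard) for shard in shards]  # per-"machine" top k
    candidates = [x for part in local_results for x in part]        # aggregate m*k candidates
    return heapq.nlargest(k, candidates)                            # global top k

shards = [[1, 9, 3], [8, 2, 7], [5, 6, 4]]
print(global_top_k(shards, 3))  # [9, 8, 7]
```

This is correct because any element in the global top K must also be in the top K of its own shard, so the merge step never misses a candidate.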
Summary and Comparison
| Method | Time Complexity | Space Complexity | Applicable Scenarios |
|---|---|---|---|
| Global Sorting | O(n log n) | O(n) | Small data volume, sufficient memory |
| Partial Sorting | O(nK) | O(1) | K is very small |
| Heap | O(n log K) | O(K) | Massive data streams, limited memory |
| QuickSelect | O(n) average | O(1) | Original array can be modified, complete order not required |
| Divide and Conquer | Depends on parallelism | O(mK), m = number of shards | Distributed systems, extremely large data |
Practical Application Suggestions
- In interviews, the Heap solution is usually preferred for its balance of efficiency and generality.
- If data can be loaded at once and array modification is allowed, consider QuickSelect.
- When dealing with massive data, match the strategy to the data's characteristics — for example, bucket sort when duplicates are frequent, or divide and conquer when the data is distributed.