Quickselect Algorithm and BFPRT Algorithm
Problem Description
The Quickselect algorithm is used to find the k-th smallest/largest element in an unsorted array, with an average time complexity of O(n). The BFPRT algorithm is its deterministic version, guaranteeing O(n) time complexity even in the worst case. I will explain the principles, differences, and implementations of these two algorithms in detail.
Knowledge Explanation
-
Problem Definition
- Input: An array
arrcontaining n elements, and an integer k (1 ≤ k ≤ n). - Output: The k-th smallest element in the array (to find the k-th largest, you can convert it to finding the (n - k + 1)-th smallest).
- Requirement: Efficient implementation, avoiding the O(n log n) complexity of full sorting.
- Input: An array
-
Principle of the Quickselect Algorithm
- Based on the divide-and-conquer idea of Quicksort, but only needs to process the side containing the target.
- Steps:
a. Randomly select a pivot.
b. Partition: Divide the array into three parts: elements less than, equal to, and greater than the pivot.
c. Judgment: If k falls in the left segment of the pivot, recursively process the left segment; if in the right segment, recursively process the right segment. - Example: Finding the 3rd smallest in [3, 2, 1, 5, 4]:
- Choose pivot 3 → Partition to get [1, 2 | 3 | 5, 4].
- Left segment length 2 < 3, meaning the target is in the right segment → Find the 1st smallest in [5, 4] (since 3 - 2 - 1 = 1).
-
Improvements of the BFPRT Algorithm
- Solves the worst-case O(n²) problem of Quickselect (e.g., when the extreme value is chosen as the pivot each time).
- Core: Deterministically select a pivot to ensure at least 30% of elements are eliminated each time.
- Median of Five Grouping:
a. Group the array into sets of five elements each; any remaining elements form the last group.
b. Sort each group internally and take the median to form a median array.
c. Recursively call BFPRT to find the median of the median array as the pivot.
Code Implementation
import random
def quick_select(arr, k):
"""Quickselect Algorithm"""
if len(arr) == 1:
return arr[0]
pivot = random.choice(arr) # Randomly select pivot
left = [x for x in arr if x < pivot]
mid = [x for x in arr if x == pivot]
right = [x for x in arr if x > pivot]
if k <= len(left):
return quick_select(left, k)
elif k <= len(left) + len(mid):
return pivot
else:
return quick_select(right, k - len(left) - len(mid))
def bfprt_select(arr, k):
"""BFPRT Algorithm"""
if len(arr) <= 5: # For small scales, directly sort and return
return sorted(arr)[k-1]
# Median of five grouping
medians = []
for i in range(0, len(arr), 5):
group = arr[i:i+5]
medians.append(sorted(group)[len(group)//2])
# Recursively find the median of medians as the pivot
pivot = bfprt_select(medians, len(medians)//2 + 1)
# Partition operation
left = [x for x in arr if x < pivot]
mid = [x for x in arr if x == pivot]
right = [x for x in arr if x > pivot]
if k <= len(left):
return bfprt_select(left, k)
elif k <= len(left) + len(mid):
return pivot
else:
return bfprt_select(right, k - len(left) - len(mid))
Complexity Analysis
- Quickselect:
- Average Case: Problem size halves after each partition, T(n) = T(n/2) + O(n) → O(n).
- Worst Case: T(n) = T(n-1) + O(n) → O(n²).
- BFPRT:
- Recurrence: T(n) ≤ T(n/5) + T(7n/10) + O(n).
- Using the Master Theorem, T(n) = O(n), with a coefficient less than 20.
Application Scenarios
- Quickselect: High efficiency when data is randomly distributed; simple implementation.
- BFPRT: Suitable for scenarios with strict time complexity requirements (e.g., real-time systems).
Through comparison, it is evident that BFPRT trades a slightly larger constant coefficient for stable linear time complexity, making it suitable for scenarios requiring guaranteed worst-case performance.