Quickselect Algorithm (Quickselect)

Quickselect Algorithm (Quickselect)

Quickselect is an efficient algorithm for finding the k-th smallest (or k-th largest) element in an unsorted list. It can be seen as a variant of the quicksort algorithm. Unlike quicksort, which recursively processes both subarrays, quickselect only recurses into the subarray containing the target element, thereby reducing the average time complexity from O(n log n) to O(n).

Core Idea and Steps

The core idea of quickselect is the same as quicksort: divide and conquer. It rearranges the array through a "partition" operation, placing a chosen "pivot" element into its final correct position. At this point, all elements to the left of the pivot are less than or equal to the pivot, and all elements to the right are greater than or equal to the pivot. This final position of the pivot becomes the key to deciding which side to search next.

Specific steps are as follows:

Choose a Pivot: Select an element from the array as the pivot. The pivot selection strategy affects the algorithm's worst-case performance, but for simplicity, we typically choose randomly, or select the first/last element.
Partition Operation: Rearrange the array so that all elements smaller than the pivot are moved to its left, and all elements larger than the pivot are moved to its right. The pivot element is then in its final correct position. We call this position pivot_index.
Compare and Recurse:
- Compare pivot_index with k:
  - If k == pivot_index: Then the pivot element itself is exactly the k-th smallest element we are looking for (since array indexing starts from 0, the 0-th smallest is the minimum, the 1st smallest is the second smallest, and so on). Return this pivot element directly.
  - If k < pivot_index: This means the k-th smallest element we are looking for is located in the left subarray of the pivot. We only need to recursively call the quickselect algorithm on the left subarray (arr[left...pivot_index-1]).
  - If k > pivot_index: This means the k-th smallest element we are looking for is located in the right subarray of the pivot. We only need to recursively call the quickselect algorithm on the right subarray (arr[pivot_index+1...right]). Note that because there are pivot_index elements smaller than the pivot to the left, in the right subarray we are looking for the (k - pivot_index - 1)-th smallest element.

Step-by-Step Example

Suppose we want to find the 2nd smallest element (i.e., k=2, because indexing starts from 0, so this is the third smallest number, which is the number 3) in the array arr = [3, 2, 1, 5, 6, 4].

Initial State: arr = [3, 2, 1, 5, 6, 4], left = 0, right = 5, k = 2.

First Recursion:

Choose Pivot: We select the last element 4 as the pivot.
Partition Operation: Rearrange the array so that elements less than 4 are on the left and greater than 4 are on the right.
- Partition process (Lomuto partition scheme):
  - Initialize a "smaller element boundary" index i = left - 1 = -1.
  - Traverse j from left to right-1 (i.e., from 0 to 4):
    - j=0: Element 3 < 4, move i right by one to 0, swap arr[i] and arr[j] (swap with itself), array becomes [3, 2, 1, 5, 6, 4].
    - j=1: Element 2 < 4, move i right by one to 1, swap arr[1] and arr[1], array remains unchanged.
    - j=2: Element 1 < 4, move i right by one to 2, swap arr[2] and arr[2], array remains unchanged.
    - j=3: Element 5 > 4, no swap.
    - j=4: Element 6 > 4, no swap.
  - End of loop. Swap the pivot 4 (at position 5) with the element at i+1 (position 3). After swapping, the array becomes [3, 2, 1, 4, 6, 5].
- At this point, the pivot 4 is at index 3 (pivot_index = 3).
- Partition result: [3, 2, 1] (all <4), 4, [6, 5] (all >4).
Compare and Recurse:
- Compare k=2 and pivot_index=3.
- k (2) < pivot_index (3), so the 2nd smallest element must be in the left subarray [3, 2, 1].
- We only need to continue recursion on the left subarray. The right subarray is completely discarded, which is the key to efficiency improvement.

Second Recursion:

Current array range: indices 0 to 2 of arr, i.e., subarray [3, 2, 1].
left = 0, right = 2, k = 2 (the target's global rank remains unchanged).

Choose Pivot: Select the last element 1 as the pivot.
Partition Operation:
- i = -1.
- Traverse j from 0 to 1:
  - j=0: Element 3 > 1, no swap.
  - j=1: Element 2 > 1, no swap.
- End of loop. Swap the pivot 1 (at position 2) with the element at i+1 (position 0), which is 3. After swapping, the array (within this subarray) becomes [1, 2, 3].
- At this point, pivot_index = 0.
Compare and Recurse:
- Compare k=2 and pivot_index=0.
- k (2) > pivot_index (0), so the 2nd smallest element is in the right subarray.
- Next, search in the right subarray [2, 3] (corresponding to original array indices 1 to 2). Note that because the left subarray ([1]) has only one element (index 0), meaning one element is already smaller than the target, in the new right subarray we are looking for the k - pivot_index - 1 = 2 - 0 - 1 = 1-th smallest element.

Third Recursion:

Current array range: indices 1 to 2 of arr, i.e., subarray [2, 3].
left = 1, right = 2, k_new = 1.

Choose Pivot: Select the last element 3 as the pivot.
Partition Operation:
- i = 0.
- Traverse j from 1 to 1:
  - j=1: Element 2 < 3, move i right by one to 1, swap arr[1] and arr[1] (swap with itself), array remains unchanged.
- End of loop. Swap the pivot 3 (at position 2) with the element at i+1 (position 2) (swap with itself).
- At this point, pivot_index = 2.
Compare:
- Compare k_new=1 and pivot_index=2.
- k_new (1) < pivot_index (2), so the target is in the left subarray.
- The left subarray is [2] (indices 1 to 1), left = 1, right = 1, k_new remains 1.

Fourth Recursion:

Current array range: only one element [2].
left = 1, right = 1.
At this point, no partition is needed, directly judge: because left == right, this single element is what we are looking for.
Return arr[1], which is 2.

Result Verification: Sorting the original array gives [1, 2, 3, 4, 5, 6]. The element at index 2 (the third smallest) is 3. Our calculation returned 2? There's an error here, let's backtrack and check.

Error Investigation:
The problem lies in the judgment of the third recursion. After the second recursion, we got the subarray [2, 3], and we are looking for the 1-th smallest element (k_new=1) in this subarray.

After partition, the pivot 3 is at index 2.
Compare k_new=1 and pivot_index=2, 1 < 2, so we should continue searching in the left subarray [2] (indices 1 to 1), and k_new remains unchanged at 1.
In an array with only one element [2], the 1st smallest element is 2.

But our goal is the 2nd smallest element in the original array. After sorting the original array [1, 2, 3, 4, 5, 6], the 0th smallest is 1, the 1st smallest is 2, the 2nd smallest is 3. So finding 2 is incorrect.

Root Cause of Error: It lies in the transfer of the k value from the second to the third recursion. In the second recursion, our subarray is [1, 2, 3] (corresponding to original array indices 0,1,2), and k=2 refers to the rank within this subarray. After partition, the pivot 1 is at position 0.

The left subarray is empty.
The right subarray is [2, 3] (indices 1,2).
Because k (2) > pivot_index (0), the target is in the right subarray.
In the right subarray, we already know there are (pivot_index + 1) = 1 elements (i.e., the left subarray's [1] and the pivot itself? This is where the concept gets confused) that are smaller than all elements in the right subarray. Actually, in the partition operation, pivot_index indicates the final position of the pivot in the current array. The current array has 3 elements, indices 0,1,2. The pivot 1 at index 0 means 0 elements are smaller than it (left subarray is empty), and 2 elements are larger than it (right subarray [2,3]).
So, when we enter the right subarray [2,3] (starting from index 1), what we are looking for is no longer the global k-th smallest, but the (k - (pivot_index + 1))-th smallest in this new array. Because in the current array [1,2,3], the first (pivot_index + 1) elements (i.e., the element at index 0, 1) are already smaller than all elements in the right subarray. So in the new array [2,3], we are looking for the k - (pivot_index + 1) = 2 - (0 + 1) = 1-th smallest element. This calculation in the example is correct.

So why is the result wrong? Let's manually simulate finding the 2nd smallest element (index 2) in [1,2,3].

Pivot is 1, after partition it becomes [1, 2, 3], pivot_index=0.
k=2 > pivot_index=0, enter the right subarray [2,3], looking for the 2 - (0+1) = 1-th smallest element.
In [2,3], choose pivot 3.
After partition it becomes [2, 3], pivot_index=2 (index in the overall large array, but relative to the starting index of subarray [2,3] which is 1, the local index within this subarray is 2-1=1? This is a misunderstanding. The pivot_index in the algorithm is always the absolute index relative to the entire array segment of the current recursion level).
- More accurately, during the third recursion, left=1, right=2. The partition operation is performed on arr[1] to arr[2]. After partition, the pivot 3 is placed in its correct position among these two elements. Because 2 < 3, after partition it is [2, 3], with pivot 3 at index 2. So pivot_index=2.
Compare k_new=1 and pivot_index=2. 1 < 2, so enter the left subarray. The left subarray's range is left=1 to pivot_index-1=1, i.e., [2].
In [2], we look for the k_new=1-th smallest element. But in a single-element array, the 1st smallest element index is 0, and our array segment has only one position arr[1], whose value is 2. So we return 2.

Conclusion: The algorithm logic is correct. The error lies in the initial statement: "the 2nd smallest element (i.e., k=2, because indexing starts from 0, so this is the third smallest number, which is the number 3)". This is a conceptual error. In programming, usually "the k-th smallest element" means k starts counting from 0, i.e.:

k=0: smallest element (first smallest)
k=1: second smallest element
k=2: third smallest element

So, in our example [3,2,1,5,6,4] sorted as [1,2,3,4,5,6]:

k=0 -> 1
k=1 -> 2
k=2 -> 3

Our algorithm input k=2 returned 2. Is this an incorrect result? No, if we follow the convention of k starting from 0, the algorithm returning 2 is wrong because it should return 3. But if we follow k starting from 1 (i.e., 1st smallest, 2nd smallest), then should the input k be 1 (to find the second smallest number 2) or 2 (to find the third smallest number 3)? This needs to be clearly defined.

Correction: To align with most programming contexts (like C++'s nth_element) and avoid confusion, we explicitly define: k is 0-indexed. That is, k=0 means the first smallest (minimum), k=1 means the second smallest, and so on.

Then, we rerun the algorithm to find the k=2-th smallest element (i.e., the third smallest, should be number 3) in [3,2,1,5,6,4].

After the first recursion, pivot_index=3, element is 4. Because k=2 < 3, we enter the left subarray [3,2,1], k remains unchanged at 2 (because we are looking for the global 2nd smallest; if it's in this left subarray, its rank within this left subarray is also the 2nd smallest).

Second recursion, find the 2nd smallest element in [3,2,1].

Choose pivot 1, after partition becomes [1, 2, 3], pivot_index=0.
Compare k=2 and pivot_index=0. 2 > 0, so enter the right subarray.
The right subarray is [2,3] (indices 1 to 2). In the right subarray, we are looking for the k - (pivot_index + 1) = 2 - (0 + 1) = 1-th smallest element.

Third recursion, find the 1st smallest element in [2,3].

Choose pivot 3, after partition becomes [2,3], pivot_index=2.
Compare k_new=1 and pivot_index=2. 1 < 2, so enter the left subarray [2] (indices 1 to 1), k_new remains unchanged at 1.

Fourth recursion, find the 1st smallest element in [2]. Only one element, return arr[1]=2.

The result is still 2, but we expect 3. This indicates a logical error in the algorithm implementation.

Correct Logic Correction:
The key point is: when k > pivot_index, we enter the right subarray to search. At this point, the starting index of the right subarray is pivot_index+1. What should be the "local rank" new_k of the element we are looking for in this right subarray?

In the current array, there are already pivot_index + 1 elements (i.e., all elements from left to pivot_index) that are less than or equal to the pivot, and these elements are all smaller than any element in the right subarray.
Therefore, in the right subarray, the original target (global k-th smallest) is the (k - (pivot_index + 1))-th smallest element in the right subarray.

This calculation is correct. So where is the error? It lies in the implementation of the partition operation and the meaning of pivot_index. In the second recursion processing [3,2,1], we incorrectly partitioned it into [1,2,3]. Actually, the initial array is [3,2,1]. If we choose the last element 1 as the pivot, the correct result of partitioning should be placing all elements less than 1 on the left and greater than 1 on the right. But both 3 and 2 are greater than 1, so should the result after partition be [1, 2, 3]? No, not exactly. The partition operation requires elements smaller than the pivot to be placed on the left, and larger on the right. Here all elements (3,2) are larger than pivot 1, so after partition, pivot 1 should be on the far left, then all elements greater than 1 maintain their original order? No, the partition operation does not guarantee the order of the part larger than the pivot. But regardless, pivot 1 should be placed in a position such that all elements to its left are smaller than it. Since no element is smaller than 1, pivot 1 should be placed at index 0. Where should the original element 3 at index 0 be swapped to? It should be swapped to index 2 (the original position of element 1)? Let's correctly simulate using Lomuto partition:

Correct Partition Simulation for Second Recursion:
Array segment: arr[0]=3, arr[1]=2, arr[2]=1. left=0, right=2. Pivot pivot = arr[2] = 1.

i = left - 1 = -1.
j traverses from left=0 to right-1=1:
- j=0: Is arr[0]=3 <= pivot=1? No. No operation.
- j=1: Is arr[1]=2 <= pivot=1? No. No operation.
End of traversal. i remains -1.
Swap the pivot arr[2]=1 with arr[i+1] = arr[0]=3.
After swapping, the array segment becomes: arr[0]=1, arr[1]=2, arr[2]=3.
pivot_index = i+1 = 0.

This partition result is correct: [1, 2, 3], pivot 1 at index 0. Now, k=2 (we are looking for the 2nd smallest element in the current array segment [1,2,3], which is 3).
Compare k=2 and pivot_index=0. 2 > 0, so enter the right subarray.
The right subarray is arr[pivot_index+1] to arr[right], i.e., arr[1] to arr[2], which is [2, 3].
In the new right subarray, we are looking for the k_new-th smallest element, where k_new = k - (pivot_index + 1) = 2 - (0 + 1) = 1.

Now enter the third recursion: find the 1st smallest element in [2,3].
Array segment: arr[1]=2, arr[2]=3. left=1, right=2. Pivot pivot = arr[2] = 3.

i = left - 1 = 0.
j traverses from left=1 to right-1=1:
- j=1: Is arr[1]=2 <= pivot=3? Yes.
  - i++ => i=1.
  - Swap arr[i]=2 and arr[j]=2 (swap with itself).
End of traversal.
Swap the pivot arr[2]=3 with arr[i+1] = arr[2]=3 (swap with itself).
pivot_index = i+1 = 2.

Now, compare k_new=1 and pivot_index=2. 1 < 2, so enter the left subarray.
The left subarray is arr[left] to arr[pivot_index-1], i.e., arr[1] to arr[1], which is [2].
In the new left subarray, we are looking for the k_new-th smallest element, where k_new remains unchanged, because k_new < pivot_index, we are searching on the left side, and the relative order of elements in the left part globally has not changed, so the rank k_new remains the same. That is, we are still looking for the 1-st smallest element in this left subarray.

Fourth recursion: find the 1st smallest element in [2].
Only one element, directly return arr[1] = 2.

The result is still 2, but expected is 3.

Root Cause: I finally found the problem! It lies in the correspondence between the definition of "k-th smallest" and array indexing. We have been saying "k is 0-indexed", i.e.:

k=0: smallest
k=1: second smallest
k=2: third smallest

In our example, the sorted array is [1,2,3,4,5,6]. The 2nd smallest element (k=2) is 3.
However, during algorithm execution, when we have a subarray, this "k" refers to the rank in the current entire array, not the rank within the subarray. When we enter a subarray, we need to adjust k.

The problem occurs in the third recursion:

We are searching in the subarray [2,3]. This subarray corresponds to original array indices 1 and 2.
We are looking for the global 2nd smallest element.
In this subarray [2,3], what is the global rank of element 2? Before it, there is subarray [1] (index 0), which is the global smallest. So element 2 is the global second smallest (k=1).
Element 3 is the global third smallest (k=2).
We are looking for the global 2nd smallest (k=2) element, which is 3.
So, in the subarray [2,3], should we be looking for the (2 - 1) = 1-th smallest element in this subarray? Not exactly, because the first element 2 of subarray [2,3] is already the global 1st smallest (counting from 0), so:
- The 0-th smallest element of subarray [2,3] is 2, corresponding to global 1st smallest.
- The 1st smallest element of subarray [2,3] is 3, corresponding to global 2nd smallest.
Therefore, in the subarray [2,3], what we are looking for is precisely the 1st smallest element (k_new=1).

Then, in the third recursion, we are looking for the 1st smallest element (k_new=1) in [2,3]. We chose pivot 3, after partition pivot_index=2. Then we compare k_new=1 and pivot_index=2, because 1 < 2, we enter the left subarray [2], and k_new remains unchanged at 1.

In [2], we look for the 1st smallest element. But [2] has only one element, its 0-th smallest element is 2, the 1st smallest element does not exist! Because the index is out of range. This is the root of the error.

Correct Logic: When k < pivot_index, we enter the left subarray, and k remains unchanged. When k > pivot_index, we enter the right subarray, and k becomes k - (pivot_index + 1). This logic is correct.

The error lies in: In the third recursion, our subarray is [2,3], only two elements, indices 1 and 2. When we perform partition on this subarray, the obtained pivot_index is the absolute index relative to the entire array (2), not the index relative to the subarray. This is correct because k is also a global index.

However, when we enter the left subarray [2], this left subarray contains only one element at index 1. For this single-element array, the only valid k value is 0 (to find its minimum). Our k_new=1 is already beyond the range of this single-element array.

What does this mean? It means our partition operation or pivot selection may have led to searching in an empty or invalid subarray. In this case, shouldn't the algorithm return an incorrect result? Actually, in a correct implementation, when k < pivot_index, the left subarray we search is [left, pivot_index-1]. In the third recursion, left=1, pivot_index=2, so the left subarray is [1, 1], i.e., only one element 2. Then we look for the k_new=1-th smallest element in this single-element array. Since the array has only one element, its index is 0, so the 1st smallest element does not exist. This should cause an error.

Yet, in the standard implementation of quickselect, when k < pivot_index, we recursively call quickselect(arr, left, pivot_index-1, k). Note that the k value here is not changed. So, in the third recursion, we are calling quickselect(arr, 1, 1, 1). Meaning: in the array segment [2] (index 1), find the 1st smallest element. This array segment has only one element, its 0-th smallest element is 2, the 1st smallest element does not exist. So will the algorithm access out of bounds? No, in the recursive function, when left == right, if k == 0, we return arr[left]. But if k > 0, like k=1, and the array has only one element, then k is invalid.