Last updated: 12 December 2022
The master method describes the time complexity of recursive divide and conquer algorithms. It assumes that each subproblem of the recursion is of the same size. Here \(a\) is the number of subproblems, \(\dfrac{n}{b}\) is the size of each subproblem, and \(O(n^d)\) is the work done in each call outside the recursive calls (splitting and combining).
See Stanford lectures.
\[T(n) = a \times T \left(\dfrac{n}{b}\right) + O(n^d)\] \[T(n) = \begin{cases} O(n^d \log(n)),& \text{if } a = b^d \\ O(n^d), & \text{if } a \lt b^d \\ O(n^{\log_b a}), & \text{if } a \gt b^d \end{cases}\]The three scenarios above correspond to the recursion tree doing the same amount of work at every level, the root dominating, and the leaves dominating, respectively.
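As a quick sanity check, here is a tiny helper (my own sketch, not from the Stanford material) that classifies a recurrence given \(a\), \(b\) and \(d\):

import math

def master_method(a: int, b: int, d: int) -> str:
    """Return the big-O bound for T(n) = a*T(n/b) + O(n^d)."""
    if a == b ** d:
        return f"O(n^{d} log n)"
    elif a < b ** d:
        return f"O(n^{d})"
    else:
        return f"O(n^{math.log(a, b):.2f})"

print(master_method(a=2, b=2, d=1))  # merge sort: O(n^1 log n)
print(master_method(a=1, b=2, d=0))  # binary search: O(n^0 log n) = O(log n)
print(master_method(a=4, b=2, d=1))  # O(n^2.00)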
def merge_sort(arr: list[int]) -> None:
    """Sorts the array in place."""
    n = len(arr)
    # Mergesort requires O(n) extra space
    temp = [None] * n
    _merge_sort_rec(lt=0, rt=n - 1, arr=arr, temp=temp)
def _merge_sort_rec(lt, rt, arr, temp):
    if lt >= rt:
        return
    mid = (lt + rt) // 2
    _merge_sort_rec(lt=lt, rt=mid, arr=arr, temp=temp)
    _merge_sort_rec(lt=mid + 1, rt=rt, arr=arr, temp=temp)
    _merge(lt=lt, rt=rt, mid=mid, arr=arr, temp=temp)

def _merge(lt, rt, mid, arr, temp):
    i, j = lt, mid + 1
    for k in range(lt, rt + 1):
        if i > mid:
            temp[k] = arr[j]
            j += 1
        elif j > rt:
            temp[k] = arr[i]
            i += 1
        elif arr[i] <= arr[j]:
            temp[k] = arr[i]
            i += 1
        else:
            # arr[j] < arr[i]
            temp[k] = arr[j]
            j += 1
    arr[lt:rt + 1] = temp[lt:rt + 1]
arr = [3, 1, -1, 7, 0, 4, 5, 3, 0]
merge_sort(arr)
print(arr) # [-1, 0, 0, 1, 3, 3, 4, 5, 7]
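Tying this back to the master method: merge sort makes \(a = 2\) recursive calls on halves (\(b = 2\)) and the merge does linear work (\(d = 1\)), so \(a = b^d\) and \[T(n) = 2\,T\!\left(\dfrac{n}{2}\right) + O(n) = O(n \log n).\]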
Quicksort has an average running time of \(O(n \log n)\) and a worst case running time of \(O(n^2)\). The running time of quicksort depends on the choice of the pivot element. For instance, if the pivot chosen is always the first element and the array is already sorted, the running time will be \(O(n^2)\): in each step of the recursion the array is split into a left (smaller) subarray of size 0 and a right subarray of size \(n-i\), where \(i\) is the index of the pivot. The left recursion does no work, whereas the right recursions work through \(n-1, n-2, \ldots, 1\) elements, which is quadratic time.
If at every iteration we pick the median as the pivot element, then the array gets split perfectly into two subarrays of size \(n/2\). This matches the mergesort recurrence exactly, so quicksort runs in \(\Theta(n \log n)\) time.
Even if we can only choose a pivot that splits the problem into, say, a 25-75 split, the running time will still be \(O(n \log n)\). We can prove this with the recursion tree method and the binomial theorem.
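A sketch of the recursion-tree part of that argument (my own filling in, assuming a guaranteed 25-75 split): \[T(n) \le T\!\left(\dfrac{n}{4}\right) + T\!\left(\dfrac{3n}{4}\right) + O(n).\] Each level of the tree does at most \(cn\) total work and the tree has depth at most \(\log_{4/3} n\), so \(T(n) = O(n \log n)\).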
import random
def quicksort(arr) -> None:
    """Sorts in-place"""
    qs_rec(arr=arr, lt=0, rt=len(arr) - 1)

def qs_rec(arr, lt, rt):
    if lt >= rt:
        return
    # This partitions and places the pivot in its rightful place
    pivot_ix = partition(arr=arr, lt=lt, rt=rt)
    qs_rec(arr=arr, lt=lt, rt=pivot_ix - 1)
    qs_rec(arr=arr, lt=pivot_ix + 1, rt=rt)

def partition(arr, lt, rt):
    """Choose a random pivot and partition the array such that all elements less than
    the pivot are to its left and all elements greater than it are to its right.
    """
    # choose a random pivot and move it to the front
    pivot_ix = random.randint(lt, rt)
    pivot = arr[pivot_ix]
    arr[pivot_ix], arr[lt] = arr[lt], arr[pivot_ix]
    i = lt + 1  # boundary: first index holding an element > pivot
    for j in range(lt + 1, rt + 1):
        if arr[j] <= pivot:  # if arr[j] > pivot: do nothing
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    # swap the pivot into its correct position
    arr[lt], arr[i - 1] = arr[i - 1], arr[lt]
    return i - 1
arr = [4, 5, 1, 1, 3, 1, 2, 0, 5, 7]
quicksort(arr)
print(arr) # [0, 1, 1, 1, 2, 3, 4, 5, 5, 7]
To get the time complexity of randomised algorithms:
Define a random variable \(Y\) for the quantity you want to bound (e.g., the number of comparisons made).
Express \(Y\) as a sum of indicator random variables \(Y = \sum_{e=0}^m X_e\).
Apply linearity of expectation: \(E[Y] = \sum_{e=0}^m E[X_e] = \sum_{e=0}^m P(X_e = 1)\).
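The canonical worked example (quicksort's comparison count; this derivation is my filling-in, not spelled out in the original notes): let \(X_{ij}\) indicate whether the \(i\)-th and \(j\)-th smallest elements are ever compared. They are compared exactly when one of them is the first pivot chosen among the elements between them, so \(P(X_{ij} = 1) = \dfrac{2}{j - i + 1}\) and \[E[Y] = \sum_{i < j} P(X_{ij} = 1) = \sum_{i < j} \dfrac{2}{j - i + 1} \le 2n \sum_{k=2}^{n} \dfrac{1}{k} = O(n \log n).\]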
Given an array A with n distinct numbers and a number k, return the k-th order statistic, i.e., the k-th smallest number.
10, 8, 2, 4 -> 3rd order statistic is 8. First order statistic is 2.
The solution is \(O(n)\). Using a reduction to sorting, we could instead sort the array in \(O(n \log n)\) and return the element at index \(k-1\).
Randomised selection follows quicksort almost verbatim.
Algorithm: partition the array around a random pivot exactly as in quicksort. If the pivot lands at the position of the k-th order statistic, return it; if it lands to the right of that position, recurse on the left part; otherwise recurse on the right part, looking for the appropriately shifted order statistic. Unlike quicksort, we only recurse on one side (see the sketch below).
Time complexity: this is a linear time algorithm in expectation. The proof involves using indicator random variables and linearity of expectation, as outlined above.
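A minimal sketch of that randomised selection (my code, not from the notes), reusing the partition routine from the quicksort above; k is 1-indexed:

def rselect(arr, k):
    """Return the k-th order statistic of arr in O(n) expected time."""
    return _rselect_rec(arr=arr, lt=0, rt=len(arr) - 1, k=k)

def _rselect_rec(arr, lt, rt, k):
    if lt == rt:
        return arr[lt]
    # Partition around a random pivot, exactly as in quicksort
    pivot_ix = partition(arr=arr, lt=lt, rt=rt)
    rank = pivot_ix - lt + 1  # rank of the pivot within arr[lt..rt]
    if rank == k:
        return arr[pivot_ix]
    elif rank > k:
        return _rselect_rec(arr=arr, lt=lt, rt=pivot_ix - 1, k=k)
    else:
        # unlike quicksort, we only recurse on one side
        return _rselect_rec(arr=arr, lt=pivot_ix + 1, rt=rt, k=k - rank)

print(rselect([10, 8, 2, 4], k=3))  # 8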
The algorithm is precisely the same as the randomised approach except for how we choose our pivots. This algorithm is deterministically guaranteed to be \(O(n)\). However, we pay in terms of very large constants hidden in the big-O notation and the need for extra storage of length \(O\left(\frac{n}{5}\right)\).
Algorithm: break the array into groups of 5, take the median of each group (sorting a group of 5 is constant work), recursively compute the median of these \(\frac{n}{5}\) medians, and use it as the pivot; then partition and recurse on one side as in the randomised version.
Time complexity: the median of medians is guaranteed to be larger than roughly 30% of the elements and smaller than roughly another 30%, so \(T(n) \le T\left(\frac{n}{5}\right) + T\left(\frac{7n}{10}\right) + O(n)\), which solves to \(O(n)\) (the master method doesn't apply here since the two subproblems have different sizes).
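A sketch of this deterministic selection, a.k.a. median of medians (my code, not from the notes; it is not in-place and assumes distinct elements, as in the problem statement):

def dselect(arr, k):
    """Return the k-th order statistic (k is 1-indexed) in worst-case O(n) time."""
    if len(arr) <= 5:
        return sorted(arr)[k - 1]
    # Medians of groups of five: needs O(n/5) extra storage
    medians = [sorted(arr[i:i + 5])[len(arr[i:i + 5]) // 2]
               for i in range(0, len(arr), 5)]
    # Recursively find the median of the medians and use it as the pivot
    pivot = dselect(medians, (len(medians) + 1) // 2)
    less = [x for x in arr if x < pivot]
    greater = [x for x in arr if x > pivot]
    rank = len(less) + 1  # rank of the pivot (elements assumed distinct)
    if k == rank:
        return pivot
    elif k < rank:
        return dselect(less, k)
    else:
        return dselect(greater, k - rank)

print(dselect(list(range(100, 0, -1)), k=7))  # 7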
A balanced binary search tree can be thought of as a dynamic sorted list: it supports insertions and deletions in \(O(\log(n))\) time, where a flat sorted list would take \(O(n)\).
Binary Search Tree Property (balanced or otherwise): all the nodes in the left subtree of a node n are <= n.val and all the nodes in the right subtree of n are > n.val, i.e., \(n_l.val \le n.val < n_r.val\). This is true for every node in the BST.
The height/depth of a BST can be anywhere between \(O(\log_2(n))\) (balanced) and \(O(n)\) (a chain), depending on whether it's balanced. So most operations are \(\Theta(\text{height})\).
     3
   /   \
  1     5
   \   /
    2 4
To find the predecessor of 4 above, we go up to 5, but it's larger than 4, so we continue up to 3, which is smaller than 4, and that is indeed 4's predecessor. Another way to think about this: we find the predecessor at the first ancestor we reach by turning left, i.e., the first ancestor that has the current node in its right subtree. If you can never turn left, then you are the smallest element; think of 1 above.
To support order statistics, store size(n) in each node n, where size(n) = 1 + size(left) + size(right). The size of a null node is considered zero. The size is essentially the number of nodes in the subtree rooted at n.
To find the i-th order statistic, compare i against size(left) at the current node:
If size(left) = i-1, then the current node is the i-th order statistic.
If size(left) >= i, we recurse left looking for the i-th order statistic.
Otherwise (size(left) < i-1), we recurse right looking for the (i - (size(left)+1))-th order statistic.
Rank of a node n: we recurse down from the root to find the node n. Every time you move right, you add size(left)+1 of the node you are leaving to the running count, until you reach the node you're looking for; at that node, add its size(left)+1 as well. In the example below, say I want to find the rank of 5. I start at the root 2 and move right to 6, so my running count becomes r += size(1) + 1 = 2. Then I recurse left to 4; nothing is added to r. Then I recurse right to 5 and add r += size(3) + 1 = 4. Finally, since 5 has no left child, we return r + 1 = 5.
    2
   / \
  1   6
     /
    4
   / \
  3   5
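A sketch of select and rank on a size-augmented node (the SNode class, its hand-filled sizes and the helper below are my own illustration; the Node class further down does not carry a size field):

from dataclasses import dataclass
from typing import Optional

@dataclass
class SNode:
    val: float
    lt: Optional['SNode'] = None
    rt: Optional['SNode'] = None
    size: int = 1  # number of nodes in the subtree rooted here

def subtree_size(node):
    return node.size if node is not None else 0

def select(node, i):
    """Return the i-th order statistic (1-indexed) in the subtree rooted at node."""
    left = subtree_size(node.lt)
    if left == i - 1:
        return node.val
    elif left >= i:
        return select(node.lt, i)
    else:
        return select(node.rt, i - (left + 1))

def rank(node, val, r=0):
    """Return the rank of val (assumed present, keys distinct) in the subtree."""
    if val == node.val:
        return r + subtree_size(node.lt) + 1
    elif val < node.val:
        return rank(node.lt, val, r)
    else:
        # moving right: the left subtree and this node are all smaller than val
        return rank(node.rt, val, r + subtree_size(node.lt) + 1)

# The rank example tree from above, with sizes filled in by hand
root = SNode(2, size=6, lt=SNode(1),
             rt=SNode(6, size=4, lt=SNode(4, size=3, lt=SNode(3), rt=SNode(5))))
print(select(root, 5))  # 5, the 5th smallest key
print(rank(root, 5))    # 5, matching the walkthrough above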
Deletion has 3 cases:
No children (a leaf): simply remove the node by setting its parent's pointer to None.
One child: splice the node out; its child takes its place under the parent.
Two children: This is quite cheeky. We run the predecessor operation, which will find the rightmost node of the left subtree (guaranteed to exist since this node has 2 children). Then we swap this node with the predecessor node (copying over children and parent pointers). This temporarily breaks the BST invariant, but the node we want to delete now has at most one child, which reduces it to the earlier cases.
Delete 6 in the BST below
    3
   / \
  1   6
     / \
    5   7
   /
  4
Step 1: Swap with predecessor (temporarily break BST invariant)
    3
   / \
  1   5
     / \
    6   7
   /
  4
Step 2: Delete 6; we know how to deal with deleting nodes with a single
child. Simply move 4 up to be 5's left child.
    3
   / \
  1   5
     / \
    4   7
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    val: float
    lt: Optional['Node'] = None
    rt: Optional['Node'] = None

def inorder(node: Optional[Node]) -> None:
    """Print the keys in sorted order."""
    if node is None:
        return
    inorder(node.lt)
    print(node.val)
    inorder(node.rt)
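For completeness, here is a sketch of the deletion described above using this Node class. Rather than splicing child and parent pointers, it copies the predecessor's value into the node, which achieves the same result for these notes:

def delete(node, val):
    """Delete val from the subtree rooted at node and return the new subtree root."""
    if node is None:
        return None
    if val < node.val:
        node.lt = delete(node.lt, val)
    elif val > node.val:
        node.rt = delete(node.rt, val)
    else:
        # No children or one child: the (possibly None) child takes this node's place
        if node.lt is None:
            return node.rt
        if node.rt is None:
            return node.lt
        # Two children: the predecessor is the rightmost node of the left subtree
        pred = node.lt
        while pred.rt is not None:
            pred = pred.rt
        node.val = pred.val                  # step 1: "swap" with the predecessor
        node.lt = delete(node.lt, pred.val)  # step 2: delete it from the left subtree
    return node

# The deletion example from above: delete 6
root = Node(3, lt=Node(1), rt=Node(6, lt=Node(5, lt=Node(4)), rt=Node(7)))
root = delete(root, 6)
inorder(root)  # prints 1 3 4 5 7, one per line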
def insort(a, x):
    """Insert x into the sorted list a, keeping it sorted (like bisect.insort).
    The binary search is O(log n), but list.insert shifts elements, so the
    insertion is O(n) overall."""
    n = len(a)
    lt, rt = 0, n - 1
    while lt <= rt:
        mid = (lt + rt) // 2
        if a[mid] == x:
            rt = mid
            break
        elif a[mid] > x:
            rt = mid - 1
        else:
            # a[mid] < x
            lt = mid + 1
    a.insert(rt + 1, x)
    return a

a = [1, 2, 3, 4, 5]
insort(a, 2.3)
insort(a, 88)
insort(a, 0)
insort(a, 4.5)
print(a)  # [0, 1, 2, 2.3, 3, 4, 4.5, 5, 88]