Welcome, Maxime! Your 4-Day Exam Plan
You just added DSC 40B and exam 1 is in 4 days. That sounds scary, but here is the good news: this guide covers EVERYTHING you need. The material is about how fast code runs — and you can learn it!
What is DSC 40B About? (30-Second Version)
In DSC 40A, you learned the MATH (like: "the answer is the mean"). In DSC 40B, you learn: How does a computer actually CALCULATE that answer? And how FAST can it do it?
Your 4-Day Study Schedule
| Day | What to Study | Time |
|---|---|---|
| Day 1 (Today) | Sections 1-4: Algorithms, Loop Invariants, Counting Operations | ~2 hours |
| Day 2 | Sections 5-7: Big-Theta proofs, Best/Worst case, Expected time | ~2.5 hours |
| Day 3 | Sections 8-10: Sorting, Log properties, Homework 2 walkthrough | ~2 hours |
| Day 4 | Section 11: Full practice exam, then review mistakes | ~2 hours |
How to Use This Guide (ADHD-Friendly Tips)
Read out loud. When you see a "SAY IT OUT LOUD" box, actually say the words. Saying things aloud (psychologists call this the "production effect") genuinely helps you remember them.
Take breaks. When you see a break box, stand up, stretch, get water. Your brain needs 5-minute breaks every 25 minutes.
Do every quiz. Click the answers, don't just read. Active recall beats passive reading.
Use the points system. Try to get as many points as possible. The sidebar tracks your score!
Lecture 1: What is an Algorithm?
The Big Picture
In DSC 40A, you learned to find answers using math. For example, "the number that minimizes the total absolute error is the median." But 40A stopped there. DSC 40B asks the NEXT question:
To find the median, you need to: (1) sort all the numbers, (2) pick the middle one. Sorting is the hard part — it is an algorithm. And different sorting algorithms have different speeds!
What is an Algorithm?
An algorithm is a step-by-step set of instructions to solve a problem. Like a recipe for cooking.
The 4 Parts of a Data Science Problem
Example: Computing the Mean
This algorithm is correct (it always gives the right answer) and efficient (it only looks at each number once).
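The example code itself isn't reproduced here, so here is a minimal sketch of what such a `mean()` might look like (the course's exact version may differ):

```python
def mean(data):
    """Compute the average of a non-empty list of numbers."""
    total = 0
    for x in data:        # one pass: each element is touched exactly once
        total += x
    return total / len(data)
```

One loop over n elements, each doing constant work, is exactly the "looks at each number once" efficiency claim above.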
Two Big Questions in DSC 40B
Question 1: CORRECTNESS — Does the algorithm always give the right answer? For EVERY possible input?
Question 2: EFFICIENCY — How fast is it? If you have 1,000 numbers, how many steps? What about 1,000,000 numbers?
How Fast is Fast? A Taste of What's Coming
Imagine 3 algorithms processing n = 1,000,000 data points:
| Algorithm | Speed Type | Operations | Time |
|---|---|---|---|
| A (Linear) | n | 1,000,000 | 1 second |
| B (Quadratic) | n² | 1,000,000,000,000 | 11.57 days! |
| C (Cubic) | n³ | 10¹⁸ | 31,709 years! |
See the difference? A slow algorithm on big data is USELESS. This is why we study efficiency!
Quiz: Lecture 1 (10 points each)
Q1: What does DSC 40B focus on that 40A did not?
Q2: Which is NOT part of a data science problem?
Lecture 2: Loop Invariants (Proving Algorithms Work)
The Problem: How Do We PROVE an Algorithm is Correct?
Testing with a few examples is not enough. Maybe your algorithm works on [1,2,3] but fails on [999, -5, 0]. We need PROOF that it works on ALL inputs.
What is a Loop Invariant?
A loop invariant is a statement (a fact) that is TRUE:
• Before the loop starts (we call this "after iteration 0")
• After every single iteration of the loop
• After the loop ends (and this tells us the algorithm is correct!)
The 3-Step Proof Method
Worked Example: Proving mean() is Correct
Invariant: "After the kth iteration, total contains the sum of the first k elements of data."
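The real proof is done on paper, but as an illustration you can check the invariant mechanically with `assert` statements. A sketch, assuming the index-based version of `mean()`:

```python
def mean(data):
    total = 0
    # Invariant: after the kth iteration, total == sum of the first k elements.
    for k in range(len(data)):
        total += data[k]
        # Illustration only -- a real proof argues this for ALL inputs,
        # not just the ones we happen to run.
        assert total == sum(data[:k + 1])
    return total / len(data)
```

If the invariant holds after every iteration, then after the last one total is the sum of ALL n elements, which is exactly what the correctness proof needs.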
Quiz: Loop Invariants (10 points each)
Q1: What is a loop invariant?
Q2: In the mean() proof, what is the initialization step?
Lecture 3: How We Measure Speed (Time Complexity)
Why Can't We Just Use a Stopwatch?
You might think: run the code and time it! But this doesn't work because:
• A fast computer gives different times than a slow computer
• Other apps running in the background change the time
• Different inputs give different times
What is a "Basic Operation"?
A basic operation takes the same amount of time no matter what. We call this constant time or Θ(1).
| Basic Operations Θ(1) | NOT Basic (takes more time) |
|---|---|
arr[i] — access one element | sum(arr) — adds ALL n elements = Θ(n) |
x + y — add two numbers | max(arr) — checks ALL n elements = Θ(n) |
x == y — compare two numbers | sorted(arr) — sorts = Θ(n log n) |
x = 5 — assign a value | arr.copy() — copies ALL n elements = Θ(n) |
len(arr) — Python stores this* | "hello" in arr — searches ALL = Θ(n) |
*In Python, len() is constant time because the length is stored. But in this class, follow what the professor says.
T(n): The Total Operation Count
T(n) = the total number of basic operations when input size is n.
The Table Method (Step by Step)
For each line of code, ask: (1) How much work does ONE execution cost? (2) How many times does this line execute? (3) Multiply them.
Simple Loop = Θ(n)
Nested Independent Loops = Θ(n²)
Nested Dependent Loops = Still Θ(n²)!
| When i = | Inner loop runs |
|---|---|
| 0 | 0 times |
| 1 | 1 time |
| 2 | 2 times |
| ... | ... |
| n-1 | n-1 times |
The formula 1 + 2 + 3 + ... + k = k(k+1)/2 is called the arithmetic series formula. MEMORIZE IT for the exam!
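You can convince yourself of the formula by literally counting the inner-loop executions. A quick sketch (the function name is mine):

```python
def count_inner_iterations(n):
    """Count how many times the body of a dependent nested loop runs."""
    count = 0
    for i in range(n):
        for j in range(i):   # inner loop runs i times on iteration i
            count += 1
    return count

# Matches the arithmetic series: 0 + 1 + ... + (n-1) = n(n-1)/2
# e.g. count_inner_iterations(10) == 10*9/2 == 45
```

Counting like this is a great way to double-check your table-method answer on practice problems.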
for j in range(i) inside for i in range(n): the total is always n(n-1)/2 = Θ(n²). This pattern appears A LOT on exams.
Quiz: Time Complexity (10 points each)
Q1: What is T(n) for this code?
Q2: What is 1+2+3+...+100?
Lectures 4-5: Big-Θ, Big-O, Big-Ω
Why Do We Need Special Notation?
When we count operations, we get things like T(n) = 5n² + 3n + 42. But the "5" and the "+3n+42" don't really matter when n is HUGE. We want to say "this grows like n²" without worrying about details.
The Three Notations: A Simple Comparison
Formal Definitions (You MUST Know These for the Exam)
Θ (Theta) — The Sandwich
for all n ≥ N: c₁ · g(n) ≤ f(n) ≤ c₂ · g(n)
Think of it as: when n is big enough, f(n) is always SANDWICHED between two scaled copies of g(n).
O (Big-O) — The Ceiling
for all n ≥ N: f(n) ≤ c · g(n)
Ω (Omega) — The Floor
for all n ≥ N: c · g(n) ≤ f(n)
Critical Rules for the Exam
Rule 1: Drop Constants
5n² = Θ(n²) — the "5" disappears because for big n, constants don't change the growth rate.
Rule 2: Keep Only the Biggest Term
n² + n + 100 = Θ(n²) — when n = 1,000,000: n² = 10¹² but n = 10⁶. The n² term completely dominates!
Rule 3: Θ Implies Both O and Ω
If f = Θ(n²) then f = O(n²) AND f = Ω(n²).
Rule 4: O Can Be Looser Upward
If f = Θ(n²), then f = O(n²), f = O(n³), f = O(n¹⁰⁰) — all TRUE! (Saying "at most n³" is true even if it's really n².)
Rule 5: Ω Can Be Looser Downward
If f = Θ(n²), then f = Ω(n²), f = Ω(n), f = Ω(1) — all TRUE!
• Θ(n²) YES • O(n²) YES • Ω(n²) YES • O(n³) YES (looser upper) • Ω(n) YES (looser lower)
• Θ(n) NO • O(n) NO • Θ(n³) NO • Ω(n³) NO
The Growth Rate Ladder (Memorize This!)
constant → logarithmic → linear → linearithmic → quadratic → cubic
(FAST on the left, SLOW on the right)
Tricky True/False (Exam Favorites!)
Trap 1: Can You Divide Θ by O?
Q: If f₁ = Θ(g₁) and f₂ = O(g₂), is f₁/f₂ = Θ(g₁/g₂)?
A: FALSE! O only gives an upper bound. f₂ could be MUCH smaller than g₂, making the division MUCH bigger.
Trap 2: Does Θ(n²) Always Beat Θ(n³) at n=100?
A: FALSE! Θ hides constants. Algorithm A could be 1,000,000·n² and Algorithm B could be 0.001·n³. At n=100, B is faster!
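Trap 2 in actual numbers. These constants are illustrative (made up for this demo), not from the course:

```python
def ops_A(n):
    return 1_000_000 * n ** 2   # Θ(n²), but with a HUGE hidden constant

def ops_B(n):
    return 0.001 * n ** 3       # Θ(n³), but with a tiny hidden constant

# At n = 100: A does 10^10 operations, B does only 1,000.
# B (the "slower" class) wins! A only pulls ahead once n passes 10^9,
# where 1_000_000 * n² = 0.001 * n³.
```

Moral: Θ tells you what happens eventually, for big enough n. It says nothing about any one specific n.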
Trap 3: If f = Ω(n⁵) and f = O(n⁷), must f be Θ(n⁵), Θ(n⁶), or Θ(n⁷)?
A: FALSE! f could be Θ(n^5.3) or Θ(n^6.7) or any power between 5 and 7. Or f might not even HAVE a Θ bound!
Quiz: Big-Theta, Big-O, Big-Omega (10 points each)
Q1: If f(n) = 3n³ + 2n + 1, what is f(n) in Θ notation?
Q2: If f = Θ(n²), which is TRUE?
Q3: TRUE or FALSE: If f₁=Θ(g₁) and f₂=O(g₂), then f₁/f₂=Θ(g₁/g₂)
Proving Θ with Constants (Chain of Inequalities)
What the Exam Asks You to Do
The exam gives you a function like f(n) = 3n² - 8n + 4 and asks you to prove it is Θ(n²) by finding actual numbers for c₁, c₂, and N.
c₁ · n² ≤ 3n² - 8n + 4 ≤ c₂ · n² for all n ≥ N
Step 1: Prove the Upper Bound (f(n) ≤ c&sub2; · g(n))
The trick: make f(n) BIGGER until it looks like c·n².
Two tools to make something bigger:
• "Promote" lower terms: replace smaller powers with bigger powers. Example: replace +4 with +4n² (since n² ≥ 1 when n ≥ 1, so 4n² ≥ 4)
• "Drop" negative terms: removing a subtraction makes things bigger. Example: 3n² - 8n ≤ 3n² (because we removed -8n)
Example: Upper bound for 3n² - 8n + 4
Step 2: Prove the Lower Bound (c&sub1; · g(n) ≤ f(n))
The trick: make f(n) SMALLER until it looks like c·n².
Two tools to make something smaller:
• "Drop" positive terms: removing an addition makes things smaller.
• "Absorb" negative terms: split the leading term to "eat" the negative parts.
Example: Lower bound for 3n² - 8n + 4
Putting It Together
Another Full Example: 4n³ - 5n² + 50 = Θ(n³)
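The proofs are done on paper, but before the exam you can sanity-check your chosen constants numerically. A sketch (the constants below are one valid choice, not necessarily the ones from lecture):

```python
def check_theta(f, g, c1, c2, N, upto=2000):
    """Numerically verify c1*g(n) <= f(n) <= c2*g(n) for N <= n < upto.
    NOT a proof -- just a quick check that your constants are plausible."""
    return all(c1 * g(n) <= f(n) <= c2 * g(n) for n in range(N, upto))

# 3n² - 8n + 4 = Θ(n²): one valid choice is c1 = 1, c2 = 7, N = 4
assert check_theta(lambda n: 3 * n**2 - 8 * n + 4, lambda n: n**2, 1, 7, 4)

# 4n³ - 5n² + 50 = Θ(n³): one valid choice is c1 = 3, c2 = 4, N = 5
assert check_theta(lambda n: 4 * n**3 - 5 * n**2 + 50, lambda n: n**3, 3, 4, 5)
```

If an assert fails, your constants are wrong and your paper proof would be too, so this is a cheap way to catch mistakes while practicing.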
Quiz: Proving Theta (10 points each)
Q1: To prove f(n) ≤ c₂·n² (upper bound), what can you do?
Q2: For f(n)=2n²+3n, which is a valid upper bound proof?
Lecture 6: Best Case, Worst Case & Expected Time
Why Do Some Algorithms Have Different Speeds?
Some code has if statements or early exits that can make it stop early. Depending on the input, the SAME code can be fast or slow!
Insertion Sort — The Classic Example
Best Case: Array Already Sorted [1, 2, 3, 4, 5]
Each new element is already bigger than everything before it. The while loop condition x < arr[j] is FALSE immediately every time. While loop runs 0 times!
Worst Case: Array in Reverse [5, 4, 3, 2, 1]
Each new element is SMALLER than everything before it. The while loop runs the MAXIMUM number of times:
i=1: 1 time, i=2: 2 times, ..., i=n-1: n-1 times
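For reference, a standard insertion sort matching the description above (same `x < arr[j]` comparison):

```python
def insertion_sort(arr):
    """Sort arr in place and return it."""
    for i in range(1, len(arr)):
        x = arr[i]
        j = i - 1
        # Shift larger elements right until x's spot is found.
        # Already sorted input: condition is False immediately -> best case Θ(n).
        # Reverse-sorted input: shifts all i elements -> worst case Θ(n²).
        while j >= 0 and x < arr[j]:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = x
    return arr
```

The worst-case while-loop counts 1 + 2 + ... + (n-1) = n(n-1)/2, which is the same arithmetic series from the nested-loops section.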
The foo() Function — Another Exam Favorite
CRITICAL: sum([x,y]) adds only 2 numbers = Θ(1). But sum(arr) adds ALL n numbers = Θ(n)!
Best case: The if-condition is true on the FIRST check. We do 1 outer + 1 inner + return sum(arr) which is Θ(n). Answer: Θ(n)
Worst case: The if-condition is NEVER true. We check all n×n pairs, each taking Θ(1), then return False. Answer: Θ(n²)
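The exact body of foo() isn't reproduced here, so this is a hypothetical reconstruction matching the description (all n×n pairs checked, sum(arr) returned on a hit); the real condition from class may differ:

```python
def foo(arr):
    n = len(arr)
    for i in range(n):
        for j in range(n):
            # HYPOTHETICAL condition: a duplicate pair. The point is only
            # that the check itself is Θ(1).
            if i != j and arr[i] == arr[j]:
                return sum(arr)   # sums ALL n elements -> Θ(n)
    return False
```

Best case: the condition fires on an early pair, but the return still pays for sum(arr), so Θ(n). Worst case: no pair ever matches, so all n² checks run, Θ(n²).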
Expected Time (Probability-Weighted Average)
When code uses RANDOMNESS (like coin flips or random numbers), we compute the expected time:
Example: Coin Toss Code
Example: Random Integer Code
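The original examples aren't reproduced here, so here is a sketch of the coin-toss shape (the branch bodies are illustrative):

```python
import random

def coin_toss_work(arr):
    n = len(arr)
    if random.random() < 0.5:   # "heads" with probability 1/2
        return sum(arr)          # Θ(n) branch
    else:                        # "tails" with probability 1/2
        total = 0
        for x in arr:            # Θ(n²) branch
            for y in arr:
                total += x * y
        return total

# Expected time = (1/2)·Θ(n) + (1/2)·Θ(n²) = Θ(n²).
# The slow branch happens with CONSTANT probability, so it dominates.
```

Template: multiply each branch's cost by its probability, add them up, then keep the biggest surviving term.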
Quiz: Best/Worst Case (10 points each)
Q1: Best case of insertion_sort?
Q2: What is the best case of foo(arr)?
Lecture 7: Brute Force & Selection Sort
Brute Force = Try Everything
A brute force algorithm tries ALL possible answers and picks the best one. It's always correct but often SLOW.
Linear Search
Best case: target is first element → Θ(1)
Worst case: target is not in array → Θ(n)
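A minimal sketch of linear search matching the two cases above (the course's version may differ in details):

```python
def linear_search(arr, target):
    """Return the index of target in arr, or None if absent."""
    for i in range(len(arr)):
        if arr[i] == target:
            return i        # best case: found at index 0 -> Θ(1)
    return None             # worst case: scanned all n elements -> Θ(n)
```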
Selection Sort
Idea: find the smallest element, put it first. Then find the second smallest, put it second. Repeat.
Time complexity: The inner loop runs (n-1) + (n-2) + ... + 1 + 0 = n(n-1)/2 times. ALWAYS Θ(n²), no matter what the input is! (No early exit possible.)
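A standard selection sort for reference. Notice there is no `return` or `break` anywhere in the loops, which is exactly why there is no early exit:

```python
def selection_sort(arr):
    """Sort arr in place and return it."""
    n = len(arr)
    for i in range(n):
        min_idx = i
        for j in range(i + 1, n):     # ALWAYS scans the rest of the array
            if arr[j] < arr[min_idx]:
                min_idx = j
        arr[i], arr[min_idx] = arr[min_idx], arr[i]   # put the min in place
    return arr
```

Compare with insertion sort: insertion sort's while loop can stop early (best case Θ(n)), but selection sort's inner for loop always runs to the end, so best = worst = Θ(n²).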
Theoretical Lower Bounds
A lower bound answers: "What is the MINIMUM work ANY algorithm must do for this problem?"
Quiz: Sorting (10 points each)
Q1: Selection sort time complexity?
Log Properties Cheat Sheet
What is a Logarithm? (Simple Explanation)
log₂(n) asks: "2 to WHAT power gives me n?"
In computer science, "log" usually means log&sub2; unless stated otherwise.
The 5 Properties You Need
| # | Property | Example |
|---|---|---|
| 1 | log(a·b) = log(a) + log(b) | log₂(8·4) = log₂(8)+log₂(4) = 3+2 = 5 |
| 2 | log(aᵏ) = k·log(a) VERY IMPORTANT! | log₂(8³) = 3·log₂(8) = 3·3 = 9 |
| 3 | a^(log_a(n)) = n | 2^(log₂(8)) = 2³ = 8 |
| 4 | log_b(c) = 1/log_c(b) (given as hint on exam!) | 1/log₂(n) = log_n(2) |
| 5 | All logs grow at the same rate: log₂(n) = Θ(log₁₀(n)) = Θ(ln(n)) | For Θ, the base doesn't matter! |
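All five properties can be spot-checked numerically with Python's math module (this illustrates them, it doesn't prove them):

```python
import math

# 1: log of a product = sum of logs  (5 = 3 + 2)
assert math.isclose(math.log2(8 * 4), math.log2(8) + math.log2(4))
# 2: log of a power = power times log  (9 = 3·3)
assert math.isclose(math.log2(8 ** 3), 3 * math.log2(8))
# 3: base raised to its own log gives back n
assert math.isclose(2 ** math.log2(8), 8)
# 4: log_b(c) = 1/log_c(b); here 1/log₂(10) = log₁₀(2)
assert math.isclose(1 / math.log2(10), math.log(2, 10))
# 5: any two logs differ only by a CONSTANT factor (here log₂(10) ≈ 3.32)
assert math.isclose(math.log2(1000) / math.log10(1000), math.log2(10))
```

Property 5 is why the base never matters inside Θ: a constant factor is exactly what Θ ignores.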
Exam Trick: Simplify n^(1/log₂(n)) × n
Exam Trick: Simplify 12·log₂(3n² - 2n)
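The worked solutions aren't shown here, but here is one way to check both simplifications numerically (the reasoning is in the comments):

```python
import math

# Trick 1: take log₂ of n^(1/log₂(n)): you get (1/log₂(n))·log₂(n) = 1,
# so n^(1/log₂(n)) is ALWAYS exactly 2, and the whole expression is 2n.
n = 1000
assert math.isclose(n ** (1 / math.log2(n)), 2)

# Trick 2: log₂(3n² - 2n) = log₂(n²·(3 - 2/n)) = 2·log₂(n) + log₂(3 - 2/n),
# so 12·log₂(3n² - 2n) = 24·log₂(n) + Θ(1) = Θ(log n).
big = 10 ** 9
ratio = 12 * math.log2(3 * big**2 - 2 * big) / math.log2(big)
assert abs(ratio - 24) < 1   # the ratio approaches 24 as n grows
```

Both tricks are the same move: use the power rule (property 2) to pull the exponent out, then let Θ eat the constants.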
Quick Quiz (10 points)
Q1: What is log₂(2n) simplified?
Homework 2 Walkthrough
Due: Wednesday April 15 (TODAY!) These are the types of problems and how to solve them.
Problem Type 1: Prove Θ-notation
Given f(n) = 3n² - 8n + 4, prove f(n) = Θ(n²).
Practice: f(n) = (n² + 2n - 5)/(n - 10)
Problem Type 2: Combining Runtime Bounds
Given T₁(n) = Θ(n^2.5), T₂(n) = O(n log n), etc.
Problem Type 3: Best/Worst Case of Code
Problem Type 4: Expected Time
Problem Type 5: Theoretical Lower Bounds
Practice Exam (20 Questions)
Time yourself! Try to finish in 50 minutes. Check answers AFTER attempting each one.
PE-1: What is an algorithm? (5 pts)
PE-2: What are the 3 steps to prove a loop invariant? (5 pts)
PE-3: sum([x, y]) where x and y are single numbers is: (5 pts)
PE-4: sum(arr) where arr has n elements is: (5 pts)
PE-5: 1 + 2 + 3 + ... + n = ? (5 pts)
PE-6: 7n³ + 2n² - 5n + 100 = Θ(?) (5 pts)
PE-7: If f = Θ(n²), is f = O(n³) true? (5 pts)
PE-8: If f = Θ(n²), is f = Ω(n³) true? (5 pts)
PE-9: Best case of insertion sort? (5 pts)
PE-10: Worst case of insertion sort? (5 pts)
PE-11: Selection sort is always: (5 pts)
PE-12: log₂(aᵏ) = ? (5 pts)
PE-13: Θ(n²) + Θ(n) = ? (5 pts)
PE-14: Θ(n) × Θ(n) = ? (5 pts)
PE-15: TRUE or FALSE: Θ(n²) always beats Θ(n³) at n=100 (5 pts)
PE-16: Coin flip: heads → Θ(n), tails → Θ(n²). Expected time? (5 pts)
PE-17: Best case of linear_search? (5 pts)
PE-18: Lower bound for deciding whether True or False appears more often among n items? (5 pts)
PE-19: What does the "N" in the Θ definition represent? (5 pts)
PE-20: If f=Ω(n⁵) and f=O(n⁷), must f be Θ(n⁵), Θ(n⁶), or Θ(n⁷)? (5 pts)
Quick Reference & Cheat Sheet
Print this page or screenshot it for last-minute review!
Growth Rate Ladder (Fast to Slow)
| Notation | Name | Example Code Pattern | n=1M time |
|---|---|---|---|
| Θ(1) | Constant | arr[0], x+y | instant |
| Θ(log n) | Logarithmic | binary search | ~20 steps |
| Θ(n) | Linear | single for loop | 1 second |
| Θ(n log n) | Linearithmic | merge sort | ~20 seconds |
| Θ(n²) | Quadratic | nested for loops | ~12 days |
| Θ(n³) | Cubic | triple nested loops | ~31,709 years |
Key Formulas
Definitions
Θ: c₁·g(n) ≤ f(n) ≤ c₂·g(n) | O: f(n) ≤ c·g(n) | Ω: c·g(n) ≤ f(n)
Algorithm Complexities
| Algorithm | Best Case | Worst Case |
|---|---|---|
| Linear Search | Θ(1) | Θ(n) |
| Insertion Sort | Θ(n) | Θ(n²) |
| Selection Sort | Θ(n²) | Θ(n²) |
Proving Θ Cheat Sheet
| For Upper Bound (f ≤ c₂·g) | For Lower Bound (c₁·g ≤ f) |
|---|---|
| Drop negative terms (makes bigger) | Drop positive lower-order terms (makes smaller) |
| Promote small positives: replace 5 with 5n² | Split leading term: 4n² = 3n² + n² |
| Add together: 3n² + 5n² = 8n² | Show leftover ≥ 0: n²-8n ≥ 0 when n ≥ 8 |
Expected Time Template
Common Traps
• Cannot divide Θ by O — only do arithmetic when BOTH are Θ
• Θ(n²) does NOT always beat Θ(n³) at specific n — constants matter!
• sum([x,y]) = Θ(1) but sum(arr) = Θ(n)
• Ω(n⁵) and O(n⁷) does NOT force Θ to be n⁵, n⁶, or n⁷