Scalar vs Vector Operations
Core Concept
Scalar = One value at a time
Vector = Multiple values at once (in this architecture, 8 values)
Why it matters: A single vector operation does in 1 cycle what would take 8 cycles of scalar operations.
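The difference can be sketched in Python, used here purely as a model. The names `scalar_add` and `vector_add` and the one-operation-per-cycle accounting are assumptions made for illustration, not part of any real instruction set.

```python
VLEN = 8  # lanes per vector in this architecture

def scalar_add(a, b):
    """Add element by element; each addition counts as one operation."""
    out, ops = [], 0
    for x, y in zip(a, b):
        out.append(x + y)
        ops += 1
    return out, ops

def vector_add(a, b):
    """Model one vector instruction: all VLEN lanes are added in one operation."""
    assert len(a) == len(b) == VLEN
    return [x + y for x, y in zip(a, b)], 1

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10] * VLEN

scalar_result, scalar_ops = scalar_add(a, b)
vector_result, vector_ops = vector_add(a, b)

print(scalar_result == vector_result)  # both paths compute the same values
print(scalar_ops, vector_ops)          # operation counts under the assumed model
```

Both paths produce the same sums; only the number of operations issued differs.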
Real-World Analogy
- Scalar (cashier): One customer, scan items one by one
- Vector (warehouse): Forklift picks up 8 boxes at once
Common SIMD Instruction Patterns
| Type | Operation | What it does |
|---|---|---|
| Scalar | add dest, a, b | dest = a + b (one value) |
| Vector | vadd dest, a, b | dest[0:N] = a[0:N] + b[0:N] (N values) |
| Scalar Load | load dest, addr | Load 1 word from memory |
| Vector Load | vload dest, addr | Load N consecutive words |
Note: N is the vector length (commonly 4, 8, or 16 depending on architecture).
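As a rough illustration of the table, the four patterns can be modeled as small Python functions over a toy word-addressed memory. The function names mirror the mnemonics in the table; the memory contents and the choice of N = 8 are assumptions made only for this sketch.

```python
N = 8  # vector length; assumed 8 here, other architectures may use 4 or 16

# Toy word-addressed memory: the word at address addr holds 1000 + addr.
memory = list(range(1000, 1032))

def add(a, b):
    """Scalar add: one value plus one value."""
    return a + b

def vadd(a, b):
    """Vector add: lane i of the result is a[i] + b[i]."""
    return [x + y for x, y in zip(a, b)]

def load(addr):
    """Scalar load: read 1 word from memory."""
    return memory[addr]

def vload(addr):
    """Vector load: read N consecutive words starting at addr."""
    return memory[addr:addr + N]

s = add(load(0), load(1))     # one scalar result
v = vadd(vload(0), vload(8))  # N results from a single vector add
```

Here `s` is a single number, while `v` holds N sums produced by one `vadd`.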
Visual Example
Adding 10 to values at memory addresses 100-107:
Scalar approach (8 cycles minimum)
Vector approach (3 cycles)
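The two approaches can be compared with a small Python model. This is a sketch under stated assumptions: the three vector cycles are taken to be a vector load, a vector add, and a vector store; a vector of 10s is assumed to be available already; the scalar count includes only the adds, which is why 8 is a minimum; and the data at addresses 100-107 is made up.

```python
VLEN = 8

def make_memory():
    mem = [0] * 128
    mem[100:108] = [1, 2, 3, 4, 5, 6, 7, 8]  # made-up sample data
    return mem

def scalar_approach(mem):
    """One add per element: at least 8 cycles for 8 values."""
    cycles = 0
    for addr in range(100, 108):
        mem[addr] = mem[addr] + 10
        cycles += 1  # counts only the add; scalar loads and stores would add more
    return cycles

def vector_approach(mem, tens):
    """Assumed sequence of vector load, vector add, vector store: 3 cycles."""
    cycles = 0
    v = mem[100:100 + VLEN]               # vector load
    cycles += 1
    v = [x + t for x, t in zip(v, tens)]  # vector add
    cycles += 1
    mem[100:100 + VLEN] = v               # vector store
    cycles += 1
    return cycles

mem_scalar = make_memory()
mem_vector = make_memory()
scalar_cycles = scalar_approach(mem_scalar)
vector_cycles = vector_approach(mem_vector, [10] * VLEN)

print(scalar_cycles, vector_cycles)
print(mem_scalar[100:108] == mem_vector[100:108])  # same final memory contents
```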
Broadcasting
The problem: Vector ALUs operate on whole vectors, not on a mix of scalar and vector operands, so you can't directly add a single number to a vector.
The solution: "Broadcast" the scalar - replicate it into all slots of a vector first.
Why it's still fast: You broadcast once, then reuse the broadcast vector for many operations. If you're adding 10 to 1000 different vectors, you broadcast once and use the result 1000 times.
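A minimal sketch of broadcast-once, reuse-many in Python. The names `vbroadcast` and `vadd` are hypothetical stand-ins for whatever the target architecture provides, and the 1000 input vectors are made-up data.

```python
VLEN = 8

def vbroadcast(scalar):
    """Replicate one scalar into every lane of a vector."""
    return [scalar] * VLEN

def vadd(a, b):
    """Lane-wise add of two vectors."""
    return [x + y for x, y in zip(a, b)]

# 1000 made-up input vectors; vector i holds the value i in every lane.
vectors = [[i] * VLEN for i in range(1000)]

tens = vbroadcast(10)  # the only broadcast in the program
results = [vadd(v, tens) for v in vectors]  # the same vector is reused 1000 times

print(tens)
print(len(results), results[3])
```

The broadcast happens outside the loop, so its cost is paid once no matter how many vectors are processed.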
Key Takeaway
Vectorization gives up to an 8x speedup on this architecture by processing 8 elements in the time a scalar operation takes to process 1.