Instruction Set Reference

Common instruction patterns in VLIW SIMD architectures. Specific syntax varies by architecture, but the underlying concepts carry over.


Instruction Format

Every instruction is a tuple: (engine, (operation, ...operands))

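Most operands are scratch addresses, not values: an instruction names the scratch locations it reads and writes. The exceptions are immediates, such as the value in const and the target_pc of a jump. The scratch space acts like a large register file.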
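To make the tuple format concrete, here is a minimal sketch that unpacks a few instructions into engine, operation, and operands. The helper name `split_instruction` is hypothetical and exists only for illustration.

```python
# Hypothetical helper: unpack instruction tuples of the form
# (engine, (operation, *operands)). Not part of any real API.

def split_instruction(instr):
    """Return (engine, operation, operands) for one instruction tuple."""
    engine, (operation, *operands) = instr
    return engine, operation, tuple(operands)

add_instr = ("alu", ("+", 0, 1, 2))       # scratch[0] = scratch[1] + scratch[2]
const_instr = ("load", ("const", 5, 42))  # scratch[5] = 42 (42 is an immediate)
halt_instr = ("flow", ("halt",))          # no operands at all

for instr in (add_instr, const_instr, halt_instr):
    print(split_instruction(instr))
```

Note that the operand count depends on the operation, so code that consumes instructions should unpack per operation rather than assume a fixed arity.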


ALU Engine (12 slots/cycle)

Scalar arithmetic and logic operations.

("alu", (op, dest, src1, src2))
# scratch[dest] = scratch[src1] OP scratch[src2]

Available Operations

| Op | Description | Example |
| --- | --- | --- |
| + | Addition | scratch[0] = scratch[1] + scratch[2] |
| - | Subtraction | scratch[0] = scratch[1] - scratch[2] |
| * | Multiplication | scratch[0] = scratch[1] * scratch[2] |
| // | Integer division | scratch[0] = scratch[1] // scratch[2] |
| % | Modulo | scratch[0] = scratch[1] % scratch[2] |
| & | Bitwise AND | scratch[0] = scratch[1] & scratch[2] |
| \| | Bitwise OR | scratch[0] = scratch[1] \| scratch[2] |
| ^ | Bitwise XOR | scratch[0] = scratch[1] ^ scratch[2] |
| << | Left shift | scratch[0] = scratch[1] << scratch[2] |
| >> | Right shift | scratch[0] = scratch[1] >> scratch[2] |
| < | Less than (returns 0 or 1) | scratch[0] = 1 if scratch[1] < scratch[2] else 0 |
| == | Equal (returns 0 or 1) | scratch[0] = 1 if scratch[1] == scratch[2] else 0 |
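The ALU operations can be modeled with a small dispatch table. This is a sketch under stated assumptions: scratch is a Python list, values are unbounded Python integers (a real machine may wrap to a fixed word size), and `ALU_OPS` and `run_alu` are illustrative names, not part of the instruction set.

```python
import operator

# Assumed mapping from op symbol to Python semantics. Word-size wraparound
# and division-by-zero behavior are not specified here and are ignored.
ALU_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "//": operator.floordiv,
    "%": operator.mod,
    "&": operator.and_,
    "|": operator.or_,
    "^": operator.xor,
    "<<": operator.lshift,
    ">>": operator.rshift,
    "<": lambda a, b: int(a < b),    # comparisons return 0 or 1
    "==": lambda a, b: int(a == b),
}

def run_alu(scratch, op, dest, src1, src2):
    """scratch[dest] = scratch[src1] OP scratch[src2]."""
    scratch[dest] = ALU_OPS[op](scratch[src1], scratch[src2])

scratch = [0] * 16
scratch[1], scratch[2] = 7, 3
run_alu(scratch, "+", 0, 1, 2)   # scratch[0] = 7 + 3
run_alu(scratch, "<", 4, 2, 1)   # scratch[4] = 1 because 3 < 7
print(scratch[0], scratch[4])    # 10 1
```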

VALU Engine (6 slots/cycle)

Vector operations on VLEN=8 contiguous elements.

("valu", (op, dest, src1, src2))
# scratch[dest:dest+8] = scratch[src1:src1+8] OP scratch[src2:src2+8]

Available Operations

The same operations as the ALU (+, -, *, //, %, &, |, ^, <<, >>, <, ==), applied elementwise to 8 contiguous elements.

Special: Broadcast

("valu", ("vbroadcast", dest, src))
# scratch[dest:dest+8] = [scratch[src], scratch[src], ..., scratch[src]]
# Copies one scalar value into all 8 vector slots
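Because both sources of a vector operation are 8-element ranges, a scalar typically has to be broadcast before it can be combined with a vector. The sketch below scales a vector by a scalar that way. It assumes scratch is a Python list; `vbroadcast` and `valu_mul` are illustrative function names, not real APIs.

```python
VLEN = 8  # vector length stated in this reference

def vbroadcast(scratch, dest, src):
    """Copy the scalar at scratch[src] into scratch[dest:dest+VLEN]."""
    scratch[dest:dest + VLEN] = [scratch[src]] * VLEN

def valu_mul(scratch, dest, src1, src2):
    """Elementwise multiply of two VLEN-wide scratch ranges."""
    a = scratch[src1:src1 + VLEN]
    b = scratch[src2:src2 + VLEN]
    scratch[dest:dest + VLEN] = [x * y for x, y in zip(a, b)]

scratch = [0] * 32
scratch[0] = 3                   # scalar factor
scratch[8:16] = list(range(8))   # vector operand 0..7
vbroadcast(scratch, 16, 0)       # scratch[16:24] = [3] * 8
valu_mul(scratch, 24, 8, 16)     # scratch[24:32] = vector * 3
print(scratch[24:32])            # [0, 3, 6, 9, 12, 15, 18, 21]
```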

Load Engine (2 slots/cycle)

Read from memory or load constants.

Load Constant

("load", ("const", dest, value))
# scratch[dest] = value (immediate constant)

Load from Memory (Scalar)

("load", ("load", dest, addr_scratch))
# scratch[dest] = memory[scratch[addr_scratch]]
# Note: addr_scratch is a scratch location that holds the memory ADDRESS, not the value to load

Load from Memory (Vector)

("load", ("vload", dest, addr_scratch))
# scratch[dest:dest+8] = memory[addr:addr+8]
# where addr = scratch[addr_scratch]
# Loads 8 consecutive memory words
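The three load forms above can be sketched as plain functions over two Python lists, one for scratch and one for memory. This is an illustrative model only; the function names are made up, and bounds checking and timing are ignored.

```python
VLEN = 8  # vector length stated in this reference

def load_const(scratch, dest, value):
    """("load", ("const", dest, value)): write an immediate into scratch."""
    scratch[dest] = value

def load(scratch, memory, dest, addr_scratch):
    """("load", ("load", dest, addr_scratch)): indirect scalar load."""
    scratch[dest] = memory[scratch[addr_scratch]]

def vload(scratch, memory, dest, addr_scratch):
    """("load", ("vload", dest, addr_scratch)): load VLEN consecutive words."""
    addr = scratch[addr_scratch]
    scratch[dest:dest + VLEN] = memory[addr:addr + VLEN]

memory = [100 + i for i in range(32)]
scratch = [0] * 32
load_const(scratch, 0, 4)       # scratch[0] = 4, used as a memory address
load(scratch, memory, 1, 0)     # scratch[1] = memory[4]
vload(scratch, memory, 8, 0)    # scratch[8:16] = memory[4:12]
print(scratch[1], scratch[8:16])
```

A memory address always goes through scratch first, so a load from a fixed address takes a const followed by a load.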

Store Engine (2 slots/cycle)

Write to memory.

Store to Memory (Scalar)

("store", ("store", addr_scratch, src))
# memory[scratch[addr_scratch]] = scratch[src]

Store to Memory (Vector)

("store", ("vstore", addr_scratch, src))
# memory[addr:addr+8] = scratch[src:src+8]
# where addr = scratch[addr_scratch]
# Stores 8 consecutive memory words
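A matching sketch for the two store forms, again modeling scratch and memory as Python lists. The function names are illustrative. Note the operand order: the scratch location holding the address comes first, the source second.

```python
VLEN = 8  # vector length stated in this reference

def store(scratch, memory, addr_scratch, src):
    """("store", ("store", addr_scratch, src)): indirect scalar store."""
    memory[scratch[addr_scratch]] = scratch[src]

def vstore(scratch, memory, addr_scratch, src):
    """("store", ("vstore", addr_scratch, src)): store VLEN consecutive words."""
    addr = scratch[addr_scratch]
    memory[addr:addr + VLEN] = scratch[src:src + VLEN]

memory = [0] * 32
scratch = [0] * 32
scratch[0] = 20                            # address for the scalar store
scratch[1] = 99                            # value for the scalar store
scratch[2] = 4                             # address for the vector store
scratch[8:16] = [1, 2, 3, 4, 5, 6, 7, 8]   # vector to store

store(scratch, memory, 0, 1)    # memory[20] = 99
vstore(scratch, memory, 2, 8)   # memory[4:12] = 1..8
print(memory[20], memory[4:12])
```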

Flow Engine (1 slot/cycle)

Control flow operations.

Select (Conditional Move)

("flow", ("select", dest, cond, true_val, false_val))
# scratch[dest] = scratch[true_val] if scratch[cond] else scratch[false_val]

Vector Select

("flow", ("vselect", dest, cond, true_val, false_val))
# For each i in 0..7:
#   scratch[dest+i] = scratch[true_val+i] if scratch[cond+i] else scratch[false_val+i]
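One common use of vector select is a branch-free elementwise maximum: compute a 0/1 mask with `<`, then pick per lane. The sketch below assumes scratch is a Python list; `vless` and `vselect` are illustrative helpers standing in for a valu `<` and a flow `vselect`.

```python
VLEN = 8  # vector length stated in this reference

def vless(scratch, dest, src1, src2):
    """Elementwise <, producing 0 or 1 per lane (models a valu "<")."""
    result = [int(scratch[src1 + i] < scratch[src2 + i]) for i in range(VLEN)]
    scratch[dest:dest + VLEN] = result

def vselect(scratch, dest, cond, true_val, false_val):
    """Per lane: take true_val's lane if cond's lane is nonzero."""
    result = [
        scratch[true_val + i] if scratch[cond + i] else scratch[false_val + i]
        for i in range(VLEN)
    ]
    scratch[dest:dest + VLEN] = result

A, B, MASK, OUT = 0, 8, 16, 24   # hypothetical scratch layout
scratch = [0] * 32
scratch[A:A + VLEN] = [1, 9, 3, 7, 5, 5, 0, 2]
scratch[B:B + VLEN] = [4] * VLEN

vless(scratch, MASK, A, B)         # mask lane = 1 where a < b
vselect(scratch, OUT, MASK, B, A)  # pick b where a < b, else a
print(scratch[OUT:OUT + VLEN])     # [4, 9, 4, 7, 5, 5, 4, 4]
```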

Jump (Unconditional)

("flow", ("jump", target_pc))
# PC = target_pc (jump to instruction at index target_pc)

Conditional Jump

("flow", ("cond_jump", cond_scratch, target_pc))
# if scratch[cond_scratch]: PC = target_pc
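The jump instructions are enough to build a loop. The toy interpreter below is a sketch under simplifying assumptions that are not stated in this reference: it executes one instruction per step rather than a multi-slot bundle, and the PC falls through to the next index when no jump is taken. The `run` function and the scratch layout are hypothetical.

```python
def run(program, scratch_size=8):
    """Execute a tiny subset of the instruction set, one instruction per step."""
    scratch = [0] * scratch_size
    pc = 0
    while True:
        engine, (op, *args) = program[pc]
        pc += 1  # assumed fall-through to the next instruction
        if op == "const":
            dest, value = args
            scratch[dest] = value
        elif op == "+":
            dest, a, b = args
            scratch[dest] = scratch[a] + scratch[b]
        elif op == "-":
            dest, a, b = args
            scratch[dest] = scratch[a] - scratch[b]
        elif op == "jump":
            (pc,) = args
        elif op == "cond_jump":
            cond, target = args
            if scratch[cond]:
                pc = target
        elif op == "halt":
            return scratch
        else:
            raise ValueError(f"unsupported op in this sketch: {op}")

# Sum 3 + 2 + 1 with a countdown loop.
# scratch[0] = counter, scratch[1] = constant 1, scratch[2] = accumulator
program = [
    ("load", ("const", 0, 3)),
    ("load", ("const", 1, 1)),
    ("load", ("const", 2, 0)),
    ("alu", ("+", 2, 2, 0)),           # index 3: acc += counter
    ("alu", ("-", 0, 0, 1)),           # counter -= 1
    ("flow", ("cond_jump", 0, 3)),     # loop back while counter != 0
    ("flow", ("halt",)),
]
print(run(program)[2])  # 6
```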

Pause

("flow", ("pause",))
# Pause execution (for debugging synchronization)

Halt

("flow", ("halt",))
# Stop execution

Important Notes

  1. All effects apply at end of cycle - reads happen before writes

     # This works! Both read old values, then both write
     { "alu": [("swap", a, b, b), ("swap", b, a, a)] }  # Not actual syntax, just concept

  2. Scratch addresses must be allocated - Track which addresses are in use

  3. Vector operations use contiguous addresses - dest means dest through dest+7

  4. Memory addresses come from scratch - Load/store take a scratch address containing the memory address, not the memory address directly
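The read-before-write rule from the notes above can be modeled by collecting every result first and committing them afterwards. The sketch below swaps two scratch values in one cycle using two real `+` operations against a scratch location that holds zero. The bundle representation (a list of instruction tuples) and the `run_cycle` helper are assumptions made for illustration.

```python
def run_cycle(scratch, bundle):
    """Execute one cycle of ALU "+" slots: read all inputs, then write."""
    pending = []
    for engine, (op, dest, src1, src2) in bundle:
        if engine != "alu" or op != "+":
            raise NotImplementedError("sketch only models ALU addition")
        # Every read sees the values from before this cycle.
        pending.append((dest, scratch[src1] + scratch[src2]))
    # All writes land together at the end of the cycle.
    for dest, value in pending:
        scratch[dest] = value

A, B, ZERO = 0, 1, 2      # hypothetical scratch layout
scratch = [10, 20, 0]
run_cycle(scratch, [
    ("alu", ("+", A, B, ZERO)),   # a = old b + 0
    ("alu", ("+", B, A, ZERO)),   # b = old a + 0
])
print(scratch[:2])  # [20, 10]
```

Run sequentially, the second slot would see the already-overwritten a and both locations would end up with 20. Deferring the writes is what makes the swap work.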