tANS: precomputing rANS
In Also-rANS, I walked through rANS: pack a stream of symbols into a single integer via reversible arithmetic, hit the Shannon limit asymptotically, no rounding to whole bits.
The cost shows up at runtime. Encoding takes a division ($\lfloor x / f_s \rfloor$); decoding takes another ($\lfloor x / M \rfloor$). On modern hardware integer division takes tens of cycles, whereas multiplication takes a few. For a compressor that wants to push gigabytes per second through a single core, division on the hot path is too expensive.
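For reference, here is a minimal sketch of the two rANS steps (renormalization omitted; the function and argument names are mine, with f, c, and M the per-symbol frequency, cumulative frequency, and total as set up below); the point is the division in each direction:

def rans_encode_step(x, s, f, c, M):
    # x' = floor(x / f_s) * M + (x mod f_s) + c_s   -- one division, by f_s
    return (x // f[s]) * M + x % f[s] + c[s]

def rans_decode_step(x, slot_owner, f, c, M):
    # slot_owner maps a slot index to the symbol that owns it (the slot table below)
    slot = x % M
    s = slot_owner[slot]
    # x = f_s * floor(x / M) + slot - c_s            -- another division, by M
    return s, f[s] * (x // M) + slot - c[s]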
tANS removes the division. We tighten rANS’s renormalization invariant until there are only finitely many possible states, then precompute every transition and store it in a table. Per-symbol work collapses to a table lookup and a shift.
The compression is the same as rANS in the limit — same Shannon bound, same fractional-bit costs — but the implementation looks completely different. Encoding becomes a state machine driven by a precomputed transition table; decoding becomes a table walk that reads bits from a stream and emits symbols. tANS is what powers Yann Collet’s FSE library and Zstandard’s entropy stage.
The slot picture, revisited
ANS coders share a common shape. There’s a state — a single integer $x$ that evolves as we encode each symbol. There’s a frequency table over the alphabet, giving each symbol $s$ a count $f_s$, with total $M = \sum_s f_s$. And there’s renormalization, which keeps $x$ inside some chosen range $[L, bL)$ by spilling low digits to (or refilling from) an output stream as the state grows or shrinks. $L$ is a lower bound; $b$ is the base of the stream, typically $2$ (bits) or $256$ (bytes).
The state and the alphabet meet in a slot table. Allocate the integers $0, 1, \dots, M-1$ to symbols, giving each $s$ exactly $f_s$ slots. For our running alphabet ($f_A = 4$, $f_B = 3$, $f_C = 1$, so $M = 8$), one valid allocation is
[A, A, A, A, B, B, B, C]
The decoder consults this table at every step. Compute the slot index $x \bmod M$, look up which symbol owns it. That’s the next emitted symbol.
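As a sketch in code (slot_table and symbol_for are illustrative names):

M = 8
slot_table = ["A", "A", "A", "A", "B", "B", "B", "C"]

def symbol_for(x):
    return slot_table[x % M]    # slot index, then owner lookup

print(symbol_for(21))           # 21 mod 8 = 5, and slot 5 belongs to B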
tANS’s tightening choice is to set $b = 2$ and $L = M$. The state now lives in $[M, 2M)$, an interval containing exactly $M$ integers — the same size as the slot space.
In rANS, the state space is much bigger than the slot space. Many different states share the same slot, and $x \bmod M$ extracts the slot from the state as a feature. In tANS, the two are in bijection: the eight states $8, 9, \dots, 15$ map one-to-one onto the eight slots $0, 1, \dots, 7$, with $x - M$ giving the slot index directly.
This changes what the state is. There are now only $M$ possible states, and each is permanently associated with a single symbol — the one that owns its slot. State $9$ is an A state. State $15$ is a C state. The state isn’t a number that contains a slot; it is a slot.
The decoder no longer computes $x \bmod M$ to find the next symbol — the state already is one. The interesting question is what happens to the state next, after a symbol is emitted.
Decoding via a table
The decoder has two jobs at each step: emit the symbol associated with the current state, and transition to the next state. The first is a lookup in the slot table. The second needs work.
In rANS, the decode transition is
$$x' = f_s \cdot \lfloor x / M \rfloor + (x \bmod M) - c_s,$$
where $c_s$ is $s$’s cumulative frequency (the index of the first slot $s$ owns).
For tANS, $x$ sits in $[M, 2M)$, so $\lfloor x / M \rfloor = 1$ and $x \bmod M = x - M$ always. The formula collapses to
$$x' = f_s + (x - M) - c_s.$$
The expression $(x - M) - c_s$ is the offset of $x$’s slot inside symbol $s$’s range — a number in $[0, f_s)$. So $x'$ lands in $[f_s, 2f_s)$.
For our running alphabet, that means decoding from an A state lands at an $x'$ in $[4, 8)$; a B state, in $[3, 6)$; the C state, in $[1, 2)$. All of these are below $M = 8$, so the transition needs renormalization: pull bits from the stream into the low end of $x'$ until it’s back in $[8, 16)$.
Crucially, the number of refill bits is fully determined by which state we’re decoding from. Decoding from state $15$ produces C and lands at $x' = 1$, which needs three refills to reach $[8, 16)$. State $12$ produces B and lands at $3$, which needs two. States $13$ and $14$ (also B) land at $4$ and $5$, each needing one refill. All four A-states need exactly one.
Everything the decoder needs — symbol, pre-refill state, refill count — depends only on the current state. So precompute it:
state symbol x_prev refills
8 A 4 1
9 A 5 1
10 A 6 1
11 A 7 1
12 B 3 2
13 B 4 1
14 B 5 1
15 C 1 3
A decode step is now: look up the row for the current state, emit the symbol, pull refills bits off the stream into the low end of x_prev, set the state to the result. No division.
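As a sketch, here is one way those eight rows could be derived mechanically from the slot table (freq, cum, and decode_table are illustrative names):

M = 8
slot_table = ["A", "A", "A", "A", "B", "B", "B", "C"]
freq = {"A": 4, "B": 3, "C": 1}
cum = {"A": 0, "B": 4, "C": 7}            # index of each symbol's first slot

decode_table = {}
for x in range(M, 2 * M):                 # states 8..15
    slot = x - M                          # state and slot are in bijection
    s = slot_table[slot]
    x_prev = freq[s] + (slot - cum[s])    # the collapsed rANS transition
    refills = 0
    while (x_prev << refills) < M:        # bits needed to climb back into [M, 2M)
        refills += 1
    decode_table[x] = (s, x_prev, refills)

for x, row in sorted(decode_table.items()):
    print(x, *row)                        # reproduces the eight rows above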
A worked example. We’ll construct the encoded form properly in the next section; for now take it on faith that the message ABC encodes to starting state 10 with bit stream 000110, rightmost bit popped first.
Decode at state 10: A, x_prev 6, 1 refill. Pop 0 → state 12.
Decode at state 12: B, x_prev 3, 2 refills. Pop 1 → state 7. Pop 1 → state 15.
Decode at state 15: C, x_prev 1, 3 refills. Pop 0 → 2, pop 0 → 4, pop 0 → 8.
The stream is empty and the state is 8 — the seed. Decoded message: ABC.
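The same walk as a few lines of Python over the table above. (The loop runs until the stream is empty, which works here because the example consumes every bit; a real decoder tracks the symbol count instead.)

decode_table = {8: ("A", 4, 1), 9: ("A", 5, 1), 10: ("A", 6, 1), 11: ("A", 7, 1),
                12: ("B", 3, 2), 13: ("B", 4, 1), 14: ("B", 5, 1), 15: ("C", 1, 3)}

x, stream, out = 10, [0, 0, 0, 1, 1, 0], []
while stream:
    s, x_prev, refills = decode_table[x]
    out.append(s)
    for _ in range(refills):
        x_prev = (x_prev << 1) | stream.pop()    # rightmost bit first
    x = x_prev

print(out, x)                                    # ['A', 'B', 'C'] 8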
Encoding via a table
Encoding inverts decoding. Given the current state $x$ and a symbol $s$ to encode, find the unique $x'$ such that the decoder, starting at $x'$, emits $s$ and lands back at $x$ after refilling from the stream.
The decoder at $x'$ ran two steps: compute $x_{\mathrm{prev}} = f_s + (x' - M) - c_s$, then refill bits from the stream into the low end of $x_{\mathrm{prev}}$ until it’s back in $[M, 2M)$. Inverting, the encoder:
- Spills enough low bits of $x$ to drop it into $[f_s, 2f_s)$. The spill count $k$ is the smallest non-negative integer with $\lfloor x / 2^k \rfloor < 2 f_s$.
- Sets $x \leftarrow \lfloor x / 2^k \rfloor$.
- Computes $x' = M + c_s + (x - f_s)$.
Step 3 inverts the decoder’s slot lookup: $x - f_s$ is the offset within $s$’s range, adding $c_s$ shifts it to the absolute slot index, and adding $M$ converts slot to state.
Both the spill count and the next state depend only on $x$ and $s$, so encoding is also table-driven. Implementations store a (state, symbol) → (spill count, next state) table; the spilled bits themselves are just x mod 2^k, appended low-bit-first.
For our running alphabet, encoding A always spills 1 bit; encoding C always spills 3. Both match Shannon: $-\log_2(4/8) = 1$ bit and $-\log_2(1/8) = 3$ bits. Encoding B spills 1 bit from low states (8–11) and 2 bits from high states (12–15) — averaging closer to 1.5 bits than Shannon’s $\log_2(8/3) \approx 1.415$. The gap is the slot allocation’s fault; the next section shows how a smarter spread closes it.
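Before moving on, here is a sketch of that spill arithmetic for the running alphabet; the spill counts and next states it computes are exactly what a (state, symbol) table would store, and encode_step is an illustrative name rather than a real implementation:

M = 8
freq = {"A": 4, "B": 3, "C": 1}
cum = {"A": 0, "B": 4, "C": 7}

def encode_step(x, s):
    f = freq[s]
    k = 0
    while (x >> k) >= 2 * f:              # smallest k with floor(x / 2^k) < 2*f_s
        k += 1
    bits = x & ((1 << k) - 1)             # the spilled low bits
    return M + cum[s] + ((x >> k) - f), k, bits   # (next state, spill count, spilled bits)

for x in range(M, 2 * M):
    print(x, {s: encode_step(x, s)[1] for s in freq})   # A: always 1, C: always 3, B: 1 or 2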
A worked example. Encode the message ABC from seed state 8. Encoding processes the message in reverse, so encode C first.
Encode C from 8: $k = 3$. Append the three low bits of 8: 0, 0, 0. $x$ drops to 1. New state $8 + 7 + (1 - 1) = 15$.
Encode B from 15: $k = 2$, since a single spill leaves 7, which still overshoots $[3, 6)$. Append the two low bits of 15: 1, 1. $x$ drops to 3. New state $8 + 4 + (3 - 3) = 12$.
Encode A from 12: $k = 1$. Append the low bit of 12: 0. $x$ drops to 6. New state $8 + 0 + (6 - 4) = 10$.
Final state 10, stream 000110 in append order — the same encoded form decoded in the previous section.
Building the tables: spreading symbols
The slot table has been a free parameter throughout. The contiguous allocation was an arbitrary choice; the previous section showed the cost — encoding B averages about 1.5 bits per symbol against Shannon’s $\approx 1.415$.
The layout matters because the encode spill count depends on the current state. With contiguous slots, the B-states are 12, 13, and 14 — all in the upper half of $[8, 16)$, where encoding B costs 2 bits instead of 1. Whenever the encoder lands in a B-state and the next symbol is also B, we pay the extra bit. Clustering symbols together creates state correlations that hurt amortized cost.
The fix: spread each symbol’s slots roughly uniformly across the slot table. A symbol with $f_s$ slots should land roughly once every $L / f_s$ positions, regardless of which symbol came before.
Yann Collet’s heuristic does this with one hash-like walk. Set the stride $\sigma = (L \gg 1) + (L \gg 3) + 3$, about $5L/8$. Walk a cursor around $[0, L)$, stepping by $\sigma$ modulo $L$ each time, and place each symbol $f_s$ times in succession before moving to the next. The intuition: a step of roughly $5/8$ of the table is far from any small rational fraction of it, so the orbit doesn’t cluster; the $+3$ keeps the stride coprime to $L$ for power-of-two sizes, so every position is visited exactly once before the cursor returns.
To see the spread do anything interesting, bump our running example up to $L = 16$ — same probabilities, more states ($f_A = 8$, $f_B = 6$, $f_C = 2$). The stride is $\sigma = 8 + 2 + 3 = 13$. Starting from cursor 0 and stepping by 13, the first eight visits go to A, the next six to B, the last two to C:
A: 0, 13, 10, 7, 4, 1, 14, 11
B: 8, 5, 2, 15, 12, 9
C: 6, 3
Sorted by position, the slot table reads
position: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
symbol: A A B C A B C A B B A A B A A B
Each symbol is sprinkled across the full range with no long contiguous runs. With this spread, the encoder’s bit costs hit Shannon much more closely. B’s amortized cost drifts toward $\log_2(8/3) \approx 1.415$ bits, and the slack continues to close as $L$ grows.
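That claim is easy to check empirically. The sketch below is mine and self-contained (contiguous_layout, spread_layout, and bits_per_symbol are illustrative names): it builds an encode table by pairing the r-th state owned by a symbol, in increasing order, with post-spill value f_s + r (for the contiguous layout this reduces to the formula from the encoding section), then encodes a long pseudo-random message under both layouts at L = 16. Against a source entropy of about 1.406 bits per symbol, it should report roughly 1.44 for the contiguous layout and roughly 1.41 for the spread:

import math
import random

L = 16
freq = {"A": 8, "B": 6, "C": 2}               # same probabilities as above

def contiguous_layout(freq, L):
    out = []
    for s, f in freq.items():
        out += [s] * f
    return out

def spread_layout(freq, L):
    sigma = (L >> 1) + (L >> 3) + 3           # the stride from above
    out, cursor = [None] * L, 0
    for s, f in freq.items():
        for _ in range(f):
            out[cursor] = s
            cursor = (cursor + sigma) % L
    return out

def bits_per_symbol(message, layout, freq, L):
    owned = {s: [] for s in freq}             # r-th state owned by s, in increasing order
    for i, s in enumerate(layout):
        owned[s].append(L + i)
    x, bits = L, 0
    for s in reversed(message):
        f = freq[s]
        while x >= 2 * f:                     # spill
            bits += 1
            x >>= 1
        x = owned[s][x - f]                   # next state for post-spill x = f + r
    return bits / len(message)

random.seed(0)
msg = random.choices(list(freq), weights=list(freq.values()), k=100_000)
entropy = -sum(f / L * math.log2(f / L) for f in freq.values())
print("entropy   ", round(entropy, 4))
print("contiguous", round(bits_per_symbol(msg, contiguous_layout(freq, L), freq, L), 4))
print("spread    ", round(bits_per_symbol(msg, spread_layout(freq, L), freq, L), 4))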
Putting it together
Here is complete tANS in pseudocode. Pick a state range $[L, 2L)$ with the frequencies normalized so that $\sum_s f_s = L$, build the spread and the encode/decode tables, and run the loops.
def build_spread(freq, L):
    """Yann's hash-like walk: place each symbol f_s times, stepping by sigma."""
    sigma = (L >> 1) + (L >> 3) + 3
    spread = [None] * L
    cursor = 0
    for s, (f, _) in freq.items():
        for _ in range(f):
            spread[cursor] = s
            cursor = (cursor + sigma) % L
    return spread

def encode(message, freq, L, encode_table):
    x = L                                # seed state
    stream = []
    for s in reversed(message):          # process LIFO
        f, _ = freq[s]
        while x >= 2 * f:                # spill low bits until x in [f, 2f)
            stream.append(x & 1)
            x >>= 1
        x = encode_table[s][x - f]       # look up next state
    return x, stream

def decode(x, stream, decode_table, L, n):
    message = []
    for _ in range(n):
        s, x_prev, refills = decode_table[x - L]
        message.append(s)
        for _ in range(refills):         # refill bits until back in [L, 2L)
            x_prev = (x_prev << 1) | stream.pop()
        x = x_prev
    return message
The encode_table and decode_table are built once from the spread and reused for every message. encode_table[s][r] is the next state when encoding s from the post-spill value x = f_s + r. decode_table[i] is the (symbol, x_prev, refills) triple for state L + i.
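For completeness, here is a sketch of one way those tables could be built from the spread (pairing the r-th state owned by a symbol, in increasing order, with post-spill value x = f_s + r), followed by a round trip through the functions above. build_tables is an illustrative name, and the freq format matches the pseudocode's (frequency first; the second element is unused here):

def build_tables(freq, L, spread):
    encode_table = {s: [] for s in freq}
    decode_table = [None] * L
    for i, s in enumerate(spread):
        f, _ = freq[s]
        r = len(encode_table[s])              # rank of state L+i among s's states
        x_prev = f + r
        refills = 0
        while (x_prev << refills) < L:        # bits needed to climb back into [L, 2L)
            refills += 1
        encode_table[s].append(L + i)
        decode_table[i] = (s, x_prev, refills)
    return encode_table, decode_table

L = 16
freq = {"A": (8, None), "B": (6, None), "C": (2, None)}   # second element unused here
spread = build_spread(freq, L)
encode_table, decode_table = build_tables(freq, L, spread)
x, stream = encode("ABC", freq, L, encode_table)
print(x, stream)                              # 26 [0, 0, 0, 1, 1]
print(decode(x, stream, decode_table, L, 3))  # ['A', 'B', 'C']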
The decode inner loop is what makes tANS fast. Per symbol: one table lookup, one append, a handful of shifts and bitwise-ors, a stack pop. No division, no symbol search, no variable-length prefix code. Encoding is similarly cheap, with the spill loop usually running for one or two iterations. This is why tANS sits at the heart of FSE and Zstandard’s entropy stage: all the per-symbol work happens in tight loops over precomputed data.
That is tANS. A finite state machine with each state labelled by a symbol, encode and decode tables that turn every step into a few lookups and bit shifts, and a spread that distributes symbols evenly across the state space. Same Shannon-bounded compression as rANS, at table-driven speed.
Last modified: 29 Apr 2026