Comparing InferenceMAX to the hardware limits
Benchmarking the Doubleword Control Layer
We're releasing our AI Gateway. Why AI gateways are hard to do right, and why we think we've done it right.
An idea on how to use LLMs to help with scheduling
A discussion of paged attention
Speculative decoding speeds up LLM inference, but using a separate draft model works poorly.
A look into how inference engines choose which requests to process