Fergus's blog

LLM inference thoughts
https://fergusfinn.com/

Control Layer Benchmarking
https://fergusfinn.com/blog/control-layer-benchmarking/
Benchmarking the Doubleword Control Layer
Tue, 21 Oct 2025

The Doubleword Control Layer
https://fergusfinn.com/blog/control-layer/
We're releasing our AI Gateway. Why are AI gateways hard to do right, and why do we think we've done it right?
Tue, 21 Oct 2025

How fast can LLM inference go?
https://fergusfinn.com/blog/inference-arithmetic/
Comparing InferenceMAX to the hardware limits
Wed, 22 Oct 2025

LLM guided scheduling
https://fergusfinn.com/blog/llm-guided-scheduling/
An idea on how to use LLMs to help with scheduling
Tue, 23 Sep 2025

Paged attention
https://fergusfinn.com/blog/paged-attention/
A discussion of paged attention
Mon, 22 Sep 2025

Using caching for fast speculative decoding
https://fergusfinn.com/blog/spacelike-speculative-decoding/
Speculative decoding speeds up LLM inference, but using another model works poorly.
Mon, 22 Sep 2025

Scheduling in inference engines
https://fergusfinn.com/blog/scheduling-in-inference-engines/
A look into how inference engines choose which requests to process
Mon, 22 Sep 2025