<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Fergus&apos;s blog</title><description>LLM inference thoughts</description><link>https://fergusfinn.com/</link><item><title>Control Layer Benchmarking</title><link>https://fergusfinn.com/blog/control-layer-benchmarking/</link><guid isPermaLink="true">https://fergusfinn.com/blog/control-layer-benchmarking/</guid><description>Benchmarking the Doubleword Control Layer
</description><pubDate>Tue, 21 Oct 2025 00:00:00 GMT</pubDate><mdUrl>https://fergusfinn.com/blog/control-layer-benchmarking/md</mdUrl></item><item><title>The Doubleword Control Layer</title><link>https://fergusfinn.com/blog/control-layer/</link><guid isPermaLink="true">https://fergusfinn.com/blog/control-layer/</guid><description>We&apos;re releasing our AI Gateway: why AI gateways are hard to do right, and why we think we&apos;ve done it right.
</description><pubDate>Tue, 21 Oct 2025 00:00:00 GMT</pubDate><mdUrl>https://fergusfinn.com/blog/control-layer/md</mdUrl></item><item><title>How fast can an LLM go?</title><link>https://fergusfinn.com/blog/inference-arithmetic/</link><guid isPermaLink="true">https://fergusfinn.com/blog/inference-arithmetic/</guid><description>Comparing InferenceMAX to the hardware limits
</description><pubDate>Wed, 22 Oct 2025 00:00:00 GMT</pubDate><mdUrl>https://fergusfinn.com/blog/inference-arithmetic/md</mdUrl></item><item><title>LLM guided scheduling</title><link>https://fergusfinn.com/blog/llm-guided-scheduling/</link><guid isPermaLink="true">https://fergusfinn.com/blog/llm-guided-scheduling/</guid><description>An idea on how to use LLMs to help with scheduling
</description><pubDate>Tue, 23 Sep 2025 00:00:00 GMT</pubDate><mdUrl>https://fergusfinn.com/blog/llm-guided-scheduling/md</mdUrl></item><item><title>Paged attention</title><link>https://fergusfinn.com/blog/paged-attention/</link><guid isPermaLink="true">https://fergusfinn.com/blog/paged-attention/</guid><description>A discussion of paged attention
</description><pubDate>Mon, 22 Sep 2025 00:00:00 GMT</pubDate><mdUrl>https://fergusfinn.com/blog/paged-attention/md</mdUrl></item><item><title>Scheduling in inference engines</title><link>https://fergusfinn.com/blog/scheduling-in-inference-engines/</link><guid isPermaLink="true">https://fergusfinn.com/blog/scheduling-in-inference-engines/</guid><description>A look into how inference engines choose which requests to process
</description><pubDate>Mon, 22 Sep 2025 00:00:00 GMT</pubDate><mdUrl>https://fergusfinn.com/blog/scheduling-in-inference-engines/md</mdUrl></item><item><title>Using caching for fast speculative decoding</title><link>https://fergusfinn.com/blog/spacelike-speculative-decoding/</link><guid isPermaLink="true">https://fergusfinn.com/blog/spacelike-speculative-decoding/</guid><description>Speculative decoding speeds up LLM inference, but using a second draft model works poorly.
</description><pubDate>Mon, 22 Sep 2025 00:00:00 GMT</pubDate><mdUrl>https://fergusfinn.com/blog/spacelike-speculative-decoding/md</mdUrl></item><item><title>Scaling Curation with LLM Comparisons</title><link>https://fergusfinn.com/blog/llm-powered-content-discovery/</link><guid isPermaLink="true">https://fergusfinn.com/blog/llm-powered-content-discovery/</guid><description>Building a content discovery system using parallel primitives and BST-based ranking with LLM comparisons
</description><pubDate>Fri, 16 Jan 2026 00:00:00 GMT</pubDate><mdUrl>https://fergusfinn.com/blog/llm-powered-content-discovery/md</mdUrl></item><item><title>Parallel Primitives for Multi-Agent Workflows</title><link>https://fergusfinn.com/blog/parallel-primitives-blog/</link><guid isPermaLink="true">https://fergusfinn.com/blog/parallel-primitives-blog/</guid><description>Exploring coordination patterns from parallel computing for multi-agent LLM systems
</description><pubDate>Wed, 31 Dec 2025 00:00:00 GMT</pubDate><mdUrl>https://fergusfinn.com/blog/parallel-primitives-blog/md</mdUrl></item><item><title>LLM powered data structures: A concurrent, lock-free binary search tree</title><link>https://fergusfinn.com/blog/bst-expensive-comparisons/</link><guid isPermaLink="true">https://fergusfinn.com/blog/bst-expensive-comparisons/</guid><description>A lock-free binary search tree optimized for expensive async comparisons, with threaded linked list for O(1) sorted iteration
</description><pubDate>Tue, 13 Jan 2026 00:00:00 GMT</pubDate><mdUrl>https://fergusfinn.com/blog/bst-expensive-comparisons/md</mdUrl></item><item><title>Large-Scale Semantic Search Without Embeddings</title><link>https://fergusfinn.com/blog/arxiv-llm-search/</link><guid isPermaLink="true">https://fergusfinn.com/blog/arxiv-llm-search/</guid><description>Applying parallel primitives to search and rank 2.4 million arXiv papers using LLM judgments
</description><pubDate>Thu, 01 Jan 2026 00:00:00 GMT</pubDate><mdUrl>https://fergusfinn.com/blog/arxiv-llm-search/md</mdUrl></item><item><title>Weighted random fallback flattens to uniform under high error rates</title><link>https://fergusfinn.com/blog/weighted-fallback-flattening/</link><guid isPermaLink="true">https://fergusfinn.com/blog/weighted-fallback-flattening/</guid><description>When a weighted-random fallback rejects samples and retries without replacement, high error rates cause low-weight models to be selected far more often than their weights suggest.
</description><pubDate>Tue, 17 Feb 2026 00:00:00 GMT</pubDate><mdUrl>https://fergusfinn.com/blog/weighted-fallback-flattening/md</mdUrl></item></channel></rss>