Blog Posts

CorpusBench

A new agentic customer service benchmark focused on using historical business context to infer policy

March 26, 2026

Why isn’t AI diffusing faster?

A six-month check-in on automating a small ecommerce business with frontier models

March 7, 2026

Building an Agent for Scheduling

I needed an agent to do workforce scheduling. One-shot prompting failed, so I built a hierarchical system instead. This post walks through the architecture and the tradeoffs.

March 26, 2025

First-party software

The economics of building software are changing

March 22, 2025

Can LLMs solve complex scheduling problems? (custom eval)

An analysis of LLMs' ability to construct constraint-optimized employee schedules

November 2, 2024

Improving eval performance

A simple framework for iterating quickly to identify what works when building with LLMs

September 21, 2024

Executive leverage

Why many executives spend time on 'minor' details, and why it is rational to do so

July 6, 2024

[Project] Building an Agentic Chatbot

Improving LLM capabilities through increased runtime compute and better specialization

June 18, 2024

Autonomy in chess vs markets

Most market-driven activity differs from chess in ways that will make it less human-centric

May 26, 2024

Finetuning vs in-context learning

Fixed-cost economics mean it is unlikely that dramatically longer context windows will replace fine-tuning

April 2, 2024

Getting things done: fast or efficient?

Speed and efficiency are often at odds in project management; it is worth considering which you are optimizing for

January 3, 2024

Do consultants add value?

Explaining the gap between the perception and the reality of what consulting firms sell

June 6, 2023

'The Nature of Technology'

Review of Brian Arthur's fascinating theory of the true nature of technology and the resulting implications

May 11, 2022