<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[centaurprise]]></title><description><![CDATA[design better human-agent teams]]></description><link>https://www.centaurprise.com</link><image><url>https://substackcdn.com/image/fetch/$s_!vaou!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe98d5d5-c480-4926-a05e-da918a911bd9_1024x1024.png</url><title>centaurprise</title><link>https://www.centaurprise.com</link></image><generator>Substack</generator><lastBuildDate>Sat, 18 Apr 2026 06:20:16 GMT</lastBuildDate><atom:link href="https://www.centaurprise.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[DJ Thompson]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[centaurprise@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[centaurprise@substack.com]]></itunes:email><itunes:name><![CDATA[DJ Thompson]]></itunes:name></itunes:owner><itunes:author><![CDATA[DJ Thompson]]></itunes:author><googleplay:owner><![CDATA[centaurprise@substack.com]]></googleplay:owner><googleplay:email><![CDATA[centaurprise@substack.com]]></googleplay:email><googleplay:author><![CDATA[DJ Thompson]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[The rise of motivated software]]></title><description><![CDATA[Software isn't dead, but opinionated software might be soon. 
What comes next?]]></description><link>https://www.centaurprise.com/p/the-rise-of-motivated-software</link><guid isPermaLink="false">https://www.centaurprise.com/p/the-rise-of-motivated-software</guid><dc:creator><![CDATA[DJ Thompson]]></dc:creator><pubDate>Wed, 10 Sep 2025 03:25:39 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/75faf761-6d77-4ab5-b621-416c08c48935_2400x1714.png" length="0" type="image/png"/><content:encoded><![CDATA[<h2><strong>The chat trap</strong></h2><p>You've heard it, I&#8217;ve heard it: the growing sentiment that "software is dead."</p><p>Why build apps when ChatGPT can do everything? Why design interfaces when you can just type what you want? AI is eating software, just like software ate the world.</p><p>But if software is dead, why are people still opening dozens of apps every day? Why do companies still pay hundreds of thousands of dollars for CRMs when an LLM with an MCP for a database would save so much spend? Why hasn't everyone switched to ChatGPT for everything?</p><p>The answer is simple. AI is powerful, but talking to a computer isn't always the best way to get things done. AI has a UX problem<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>.</p><p>What&#8217;s messed up is we&#8217;ve learned this lesson before. In 2017, big tech companies bet everything on chat. Apple and Facebook and X n&#233;e Twitter all looked at WeChat in China, where you can order food, pay bills, and book flights all through messaging, and thought "this is the future!" They built chatbots for everything.</p><p>It just didn&#8217;t translate.</p><p>Why? Because when everything is possible, nothing is top-of-mind. Chat gives you infinite options, which means infinite surface area to test and infinite ways to get confused.
Most chatbots became what Laura Burkhauser at Descript called "phone trees in trench coats"<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. It&#8217;s the same frustrating "Press 1 for billing" experience, just dressed up in a chat bubble.</p><p>Now in 2025, we're making the same mistake again. But this time, we can learn from it.</p><h2><strong>Software is still hanging in there</strong></h2><p>The statistics on AI use are in flux, constantly. The MIT study finding that 95% of enterprise AI pilots show no ROI is going around like crazy<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>. The confusion is exacerbated by the fact that, when you look outside the US, 75% of knowledge workers use AI tools at work, but only 7-22% (depending on country) are using work-provided tools<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>. More than 40% are consistently bringing tools from home, using their personal ChatGPTs instead of the Copilots and Bedrocks afforded them.
Even so, only 44% of users (58% of the group using AI at work) say they&#8217;re &#8220;heavily-reliant&#8221; on AI while at work<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a>.</p><p>What&#8217;s going on with the numbers? Well, inconsistent definitions of what it means to &#8220;use AI&#8221; and &#8220;get value,&#8221; for one. Strong differences between usage patterns at early-adopter companies and non-adopter companies, for another. Older companies are slotting agents into existing bureaucracy, with amusing but low-ROI results<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a>.</p><p>Different intent behind the adoption has an effect as well: some companies feel existential pressure to become &#8220;AI-native&#8221; (though they define that differently too), while others are checking a box from a board mandate, and that box does not include validating user stickiness over time.</p><p>And the user stickiness story in AI is one that isn&#8217;t being told much right now. AI companies are throwing down stellar (annualized) ARR and revenue-per-employee numbers, but lurking behind those numbers are skyrocketing churn metrics. Most demos look pretty sharp. The problem is what happens after the demo.</p><p>Here's an example: a friend who runs a proserv firm tried switching his team to an AI project management tool driven mostly by chat UX. The rep promised "it can do everything!" Naturally. And technically, that was true: you could badger the software into doing what you wanted, by chatting with it, for almost 100% of the feature portfolio. But after two weeks, his team was back on their old software.</p><p>"Every time someone wanted to check a deadline," he told me, "they had to type a question, wait for the response, and hope the AI understood what they meant. Burning tokens the whole time.
We used to just&#8230; click on the calendar button.&#8221; One second versus thirty seconds, fifty times a day, every day, every week<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a>.</p><p>This is the gap between potential and practice. AI can do amazing things, but if those things take longer than the old way, people don't switch.</p><p>Those who do: well, they&#8217;re there for the experiment, and they have no issue churning after a month if the value isn&#8217;t there. In fact, that&#8217;s the plan from the get-go. Some people on the leading edge of the &#8220;crossing-the-chasm&#8221; diagram aren&#8217;t early adopters, they&#8217;re just taste-testers.</p><h2><strong>The three ages of software</strong></h2><p>So, if software&#8217;s clearly changing but not-so-clearly surviving, where does that leave us? Evolving.</p><p>To understand what we're evolving into, let's look at where we've been. Software has advanced through three major ages, each fixing the problems of the last while creating new ones.</p><h3><strong>Age 1: Software That Just (Barely) Does Something (Sometimes)</strong></h3><p>In the beginning, there was <strong>embedded software</strong>, which came welded to hardware. Buy a DEC system, get DEC software. It did one thing, hopefully. When it broke - and it always broke - you called expensive consultants to fix it.</p><p>Users had one request: please don't crash. The bar wasn't high because there was no bar. Having any digital system at all felt like magic.</p><h3><strong>Age 2: Software That Does Everything (If You Can Figure It Out)</strong></h3><p>Then came the revolution: <strong>unopinionated software</strong>, freed from the shackles of predetermined hardware. Office. Lotus. Adobe. These platforms promised ultimate flexibility. Build anything! On any hardware! Customize everything! 
If you could dream it, you could (&#8230;probably) configure it.</p><p>But there was a catch. With great power came great confusion. It's like being handed a full set of professional chef's knives when all you wanted was to make a sandwich. Some people created masterpieces. Most people nicked themselves.</p><p>This era made consultants rich. Not fixing broken software anymore, but teaching people how to use it. Every company had that one person who "really knew Excel"&#8212;the keeper of mysterious formulas and secret shortcuts. Cottage industries like &#8220;Salesforce developers&#8221; cropped up, necessary experts in handling the arcane inner workings of systems designed to be so complex that they could theoretically do anything, given enough recursion (and enough budget).</p><h3><strong>Age 3: Software That Does What Its Founder Wants (Hope That&#8217;s What You Want Too)</strong></h3><p>Smart companies noticed something: most users were trying to do the same few things. What if, instead of infinite options, we only gave them the best way<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-8" href="#footnote-8" target="_self">8</a>?</p><p>Enter <strong>opinionated software</strong>. Box decided how folders should be organized. Slack decided how teams should communicate. Notion decided how documents should be structured. These tools didn't just provide features&#8212;they taught you a philosophy.</p><p>This worked brilliantly if you agreed with their philosophy. But what if you didn't? What if your team had spent years perfecting a different approach? Tough luck. As Stewart Butterfield put it: "There's no worthwhile software that doesn't involve behavior change."</p><p>The consultants didn't disappear. 
They just changed their pitch from "Let me teach you the features" to "Let me change your company culture."<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-9" href="#footnote-9" target="_self">9</a></p><h2><strong>The fourth age - software that seeks your goals</strong></h2><p>Now, you could argue that with the rise of vibe coding, software from all three previous ages is basically commoditized. AI can copy any of it, at any time, right?<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-10" href="#footnote-10" target="_self">10</a></p><p>But I posit we're approaching something new. Software that learns what you're trying to achieve and helps you get there. Not by forcing you into its workflow, but by molding itself to yours.</p><p>I call it <strong>motivated software</strong>.</p><p>Imagine you're planning a product launch. Today's software structures your UI and data, makes you think in terms of tasks, deadlines, and assignees. You create tickets, set dates, assign people. The software tracks whether Task A is done, but it doesn't understand that Task A only matters if it helps Product B succeed. (Sometimes, in the case of opinionated software, even if Product B succeeds the software will still complain that Task A didn&#8217;t get done in just the right ideological way).</p><p>Motivated software works differently. It has your same high-level goal: successful product launch. It takes inputs from connected systems: KPIs, documents, updates, system analytics. It notices that sales enablement for infrastructure products always takes longer than scheduled, so it quietly adjusts future timelines for impacted launches. It tracks that marketing needs extra lead time when engineering runs late, so it alerts them early: marketing doesn&#8217;t have to depend on someone from engineering manually reminding them each time. 
It detects that Sarah's qualification calls are thorough but slow, while Mike's are fast but sometimes miss edge cases, and it routes customer inquiries about the new product accordingly.</p><p>This isn't just automation. It&#8217;s open-ended pattern recognition, and it&#8217;s adaptive optimization<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-11" href="#footnote-11" target="_self">11</a>. The software develops a model of what you're trying to achieve and constantly searches for better paths to get there, building and tweaking its reward models constantly to stay in-line with yours.</p><p>Think of it like this:</p><ul><li><p>Embedded software was a tool that, well, existed</p></li><li><p>Unopinionated software did what you told it, for better or worse</p></li><li><p>Opinionated software was a teacher that told you what to do</p></li><li><p>Motivated software will be a partner that helps you succeed</p></li></ul><h3><strong>Isn&#8217;t that just an agent?</strong></h3><p>For people up to speed on the AI ecosystem, the most obvious question about this philosophy is: why can&#8217;t an agent just do this too?</p><p>To be clear, agents plural are absolutely part of this infrastructure. Motivated software takes advances in LLMs, reasoning models, tool use, and agent-to-agent protocols and meshes them together in a logical way that would be obnoxious to do manually, every time, for every use case.</p><p>Individual agents lack three things that a parent motivated software platform brings to the table:</p><ol><li><p><strong>Dimensionality</strong> - heavy users of agents know that unlocking their best performance comes from giving them small, granular, well-scoped tasks. 
You could theoretically ask an agent &#8220;fix my company&#8217;s budget,&#8221; but you&#8217;d be much better off (in both token cost and outcomes) deploying and orchestrating lots of smaller targeted agents, each handling specific tasks (&#8220;audit our FP&amp;A from last quarter,&#8221; &#8220;review last week&#8217;s fraud reports for irregularities,&#8221; &#8220;design a net-payment terms strategy to improve our cash-on-hand&#8221;). Motivated software coordinates this effort and selects which tasks are most relevant, when. Think of agents as a point-vector in this system: one gust of wind, rolling up into motivated software&#8217;s weather front.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cmJk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b2d0a5b-c55c-41be-a625-d43e8e529339_2400x1714.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cmJk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b2d0a5b-c55c-41be-a625-d43e8e529339_2400x1714.png 424w, https://substackcdn.com/image/fetch/$s_!cmJk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b2d0a5b-c55c-41be-a625-d43e8e529339_2400x1714.png 848w, https://substackcdn.com/image/fetch/$s_!cmJk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b2d0a5b-c55c-41be-a625-d43e8e529339_2400x1714.png 1272w, https://substackcdn.com/image/fetch/$s_!cmJk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b2d0a5b-c55c-41be-a625-d43e8e529339_2400x1714.png
1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cmJk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b2d0a5b-c55c-41be-a625-d43e8e529339_2400x1714.png" width="1456" height="1040" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0b2d0a5b-c55c-41be-a625-d43e8e529339_2400x1714.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1040,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:522024,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.centaurprise.com/i/172230156?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b2d0a5b-c55c-41be-a625-d43e8e529339_2400x1714.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cmJk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b2d0a5b-c55c-41be-a625-d43e8e529339_2400x1714.png 424w, https://substackcdn.com/image/fetch/$s_!cmJk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b2d0a5b-c55c-41be-a625-d43e8e529339_2400x1714.png 848w, https://substackcdn.com/image/fetch/$s_!cmJk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b2d0a5b-c55c-41be-a625-d43e8e529339_2400x1714.png 1272w, 
https://substackcdn.com/image/fetch/$s_!cmJk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b2d0a5b-c55c-41be-a625-d43e8e529339_2400x1714.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>The best visual I have for this concept. If each vector is an agent, the entire system is the motivated software platform.
Similar to how one person could theoretically do all the work of an entire company, one agent could traverse this entire space over time&#8230; but not easily and not quickly.</em></figcaption></figure></div></li></ol><ol start="2"><li><p><strong>Persistence</strong> - it&#8217;s no secret agents run into context window issues. Modern agents typically operate with around 100-200k token context windows. More are ramping up to 1M tokens in practice (and labs have up to 10M tokens in window in experimental models). So why is persistence an issue? Even with larger and larger context windows (and you have to assume that LLM builders will continue to innovate here) the cost of one agent holding on to the details of one conversation gets prohibitive, quickly. If you need nuanced details on every customer in your GTM strategy for each incremental agent call, you either need to have every single call expend increasing numbers of tokens (or risk compacting away useful details), or you need a way to retain and sort through context, feeding it to the right agent at the right time. The human could manually copy-paste context in each time, but wow - what a waste of human potential! - and what a great way to introduce suboptimal context calls and room for error.</p></li></ol><ol start="3"><li><p><strong>Hyperparameters</strong> - agents don&#8217;t dynamically improve themselves or think critically about when to call themselves under which conditions. 
We as users <em>can</em> do that, but there&#8217;s an upper bound on how much we want to be thinking about &#8220;the problem of how to optimize an agent to solve our problems,&#8221; versus &#8220;our problems themselves.&#8221; Motivated software abstracts this away - we can program in the criteria for selecting the right agents, system instructions, and parameters, or better yet: we can have agents optimize this over time, creating meta-optimization layers whose recursive customization boosts how closely the software hews to our needs and preferences<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-12" href="#footnote-12" target="_self">12</a>.</p></li></ol><h2><strong>Why this matters</strong></h2><p>Here's why motivated software is different from just adding AI to existing apps:</p><p><strong>It remembers</strong>. Not just your data, but your context. Not just from this conversation, but longitudinally: it knows where to pick up the thread where the last agent dropped it, without burning billions of tokens to do so. It knows that when you say "the Johnson project," you mean the proposal for the client you met last Tuesday, not the internal project run by the engineer who happens to have the same name.</p><p><strong>It learns your patterns. </strong>Sometimes, you know you&#8217;re stuck repeating a mistake, but you don&#8217;t know how to fix it. Sometimes, you don&#8217;t notice you&#8217;re repeating your mistakes until someone catches you, often way too late. Motivated software can nudge us out of ruts: watching you reschedule the weekly standup three times because West Coast teammates can't make the morning slot, it suggests a better time that works for everyone. Seeing you struggle to convert cold emails, it proffers more customized openers and blocks time on your calendar to protect research.
Tracking stress signals in your meeting transcripts and messaging apps, it starts pushing late-night meetings and calls to help you recover and avoid burnout.</p><p><strong>It connects intentions to actions.</strong> When you message "We need to speed up delivery," it doesn't just record the comment. It identifies bottlenecks in your workflow, suggests process improvements, and tracks whether changes actually improve delivery speed. Motivated software is fundamentally about ingesting your goals and reinforcing behaviors and processes that advance them, tuning all these RNNs around us to our advantage.</p><p><strong>It incentivizes models to ask for help.</strong> Current AI confidently gives wrong answers because saying &#8220;I don&#8217;t know&#8221; in a chat window 50% of the time doesn&#8217;t feel great. Motivated software needs to get away from that, to understand and convey its confidence level, because being confidently incorrect moves you further from your goal. Motivated software calibrates AI answers against certainty over time. When uncertain, it asks clarifying questions or brings in human judgment.</p><p>Most importantly, <strong>it pulls agents out of the chat window and into the underlying primitives.</strong> It&#8217;s not possible to get the level of performance you need from motivated software by tacking on a chat window - simply asking a bot to give an answer or run a tool does too little, too slowly to have the software form-fit the customer and seek after their goals. LLMs have to be in the infrastructure, the libraries, the core utilities in order for this to work.</p><p>This is hard to build. Really hard. Current AI technology isn't quite there yet.
We need:</p><ul><li><p>AI that can accurately understand goals, not just commands</p></li><li><p>Memory systems that maintain context over months, not minutes</p></li><li><p>Context uptake that costs thousands of tokens, not millions</p></li><li><p>Confidence calibration so the system knows what it doesn't know</p></li><li><p>Search capabilities that explore solutions without getting stuck</p></li><li><p>Feedback loops that ensure the software improves, not just changes<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-13" href="#footnote-13" target="_self">13</a></p></li></ul><h2><strong>A whole new world</strong></h2><p>Even just three to five years from now, this could make day-to-day workflows look really different.</p><p>Let&#8217;s say, in 2030, you run a growing sales team. Your software isn't just tracking deals; it's trying to help you succeed. It notices you're adding more reps but deal velocity isn't increasing proportionally. It identifies that deal approval has become a bottleneck and suggests a solution: parallel approval tracks based on deal size and risk profile.</p><p>You're skeptical: do you need that much process for your team size? But you give it a shot. The software spins out a custom agent, adjusts your approval workflows, and monitors the experiment, measuring not just approval speed but discount rates, legal review quality, and rep satisfaction.</p><p>After two weeks, it reports: 31% faster approvals, discount rates held steady, slight improvement in contract quality scores. It brings receipts: line-item audit trail of each deal by track, links to updated pipeline metrics, confidence scores against each routing decision it made, flags where human judgment might have differed (there's always room for interpretation). It asks: "What do we keep? What do we change? Do we codify this?"</p><p>This isn't science fiction. It's the logical next step from where we are today. 
The building blocks exist:</p><ul><li><p>Large language models that understand context</p></li><li><p>Pattern recognition that spots trends</p></li><li><p>Optimization algorithms that search solution spaces</p></li><li><p>Feedback systems that learn from results</p></li></ul><p>What's missing is the connective tissue&#8212;the infrastructure that lets these pieces work together seamlessly.</p><h2><strong>Deep dynamic customization as a moat</strong></h2><p>The companies that figure this out won't just have better software. They'll have sustainable competitive advantages that are nearly impossible to copy.</p><p>Why? Because motivated software's value isn't in its features. You can screenshot the UI and dump a feature list into Lovable, but the entire value prop is in the time-series bespoke context you&#8217;ve built up user-by-user, tenant-by-tenant: you can&#8217;t clone that. The value builds over time as the system learns the unique patterns of your organization.</p><p>It's like the difference between a new EA and one who's worked with you for years. They could have gone to the same school, worked for the same firms, but the experience is totally different. The tenured assistant doesn't just follow instructions: they anticipate needs, prevent problems, and make connections you might miss. That knowledge can't be transferred instantly to someone new.</p><p>The same protection applies to motivated software. A competitor can copy your interface, match your features, even poach your engineers. But they can't copy the accumulated understanding of how your specific organization works best.</p><h2><strong>Motivating your software</strong></h2><p>Building motivated software requires rethinking our entire approach to software development:</p><p><strong>Start with goals, not features. </strong>Instead of asking "What should this button do?", ask "What is the user trying to achieve?" 
We express this as a meta-layer in &#8220;user journeys&#8221; now, but motivated software implants this directly into UI/UX and adds abstraction <em>on top of that</em>: goals are themselves composable and manipulable.</p><p><strong>Design for learning, not just usage.</strong> Build systems that get better over time, not just systems that work on day one. We compartmentalize this type of instability into our DS/ML teams in most engineering orgs today: motivated software will broadly assume that frontend, backend, data engineering, infrastructure, security, and analytics are all in a much greater state of flux, and in different states at different customers. Understanding what we&#8217;ve built will become its own form of challenge.</p><p><strong>Embrace uncertainty.</strong> Create software that knows when it doesn't know and can gracefully ask for guidance. More to the point: create software that <em>is guaranteed to not be in its optimal state, or even know its own optimal state, at the point of initialization</em>.</p><p><strong>Measure outcomes, not activities.</strong> Track whether users achieve their goals, not just whether they click the buttons. Metrics like intercept rate, derivatives, and precision will matter more than days active and clickthrough rates: &#8220;facts&#8221; will necessarily get more complex, more recursive, and require more storytelling to understand and follow over time.</p><p>This is hard. It requires new technical capabilities, new design patterns, and new ways of thinking about software. But the payoff is enormous: software that doesn't just serve users but partners with them.</p><h2><strong>The future is motivated</strong></h2><p>Software isn't dead, it's evolving.</p><p>The chat interface revolution failed not because AI isn't powerful, but because power without direction is meaningless, donuts in a parking lot.
Motivated software provides direction by understanding what users actually want to achieve.</p><p>This is a fundamental shift in how we think about tools. For the first time, our software can grasp our intent and seek out our goals, not just execute our commands.</p><p>The companies that understand this shift and build software motivated by user success rather than feature completeness won't just win in the market. They'll define entirely new categories of what software can be, and they&#8217;ll leave previous generations in the dust.</p><p>The age of motivated software is coming. The question isn't whether it will transform how we work. The question is: who will build it first?</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>There&#8217;s a long thread of complaints to this effect - [<a href="https://uxdesign.cc/ai-has-a-ux-problem-c1d0352003b0">1</a>] [<a href="https://koomen.dev/essays/horseless-carriages/">2</a>] [<a href="https://thenanyu.com/ux.html">3</a>] [<a href="https://www.uxtigers.com/post/classic-usability-ai">4</a>] - just to name a few.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Love that line.
Don&#8217;t know if this is the origin of the quip, but <a href="https://www.producthunt.com/p/descript/underlord-by-descript-2">this</a> is where I saw it.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>The core commentary on the Project NANDA report is pretty neatly summed up <a href="https://fortune.com/2025/08/21/an-mit-report-that-95-of-ai-pilots-fail-spooked-investors-but-the-reason-why-those-pilots-failed-is-what-should-make-the-c-suite-anxious/">here</a>.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p><a href="https://fortune.com/2024/12/02/ai-us-llama-chatgpt-india">It&#8217;s an older source</a>, but there&#8217;s not much comparable data cross-border from recent months.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p><a href="https://resources.owllabs.com/blog/pulse-survey-2025">Different studies</a>, so the numbers aren&#8217;t quite apples-to-apples. Take that with a grain of salt, but the baseline of companies using AI (67% vs 75%) is fairly close. </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p>See <a href="https://www.centaurprise.com/p/making-agents-work?r=6rwfn">Making agents work</a>. There&#8217;s no good reason to throw compute (and therefore cash) at a task that would be waste if a human did it. 
(There <em>is</em> good reason to throw compute at friction).</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><p>At Slack we used to call these &#8220;papercuts:&#8221; small UX irritants that feel awful in aggregate when you use that particular software all the time (but don&#8217;t matter if you only need it once a month or once a quarter).</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-8" href="#footnote-anchor-8" class="footnote-number" contenteditable="false" target="_self">8</a><div class="footnote-content"><p>The title of this article is a callback to Stuart Eccles&#8217; article from a decade ago, <a href="https://medium.com/@stueccles/the-rise-of-opinionated-software-ca1ba0140d5b">The rise of opinionated software</a>. Aaron Levie and <a href="https://basecamp.com/gettingreal/04.6-make-opinionated-software">DHH</a> are other notable advocates. </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-9" href="#footnote-anchor-9" class="footnote-number" contenteditable="false" target="_self">9</a><div class="footnote-content"><p>Ever notice how the concept of &#8220;digital transformation&#8221; survives all three eras? 
Wonder why that is.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-10" href="#footnote-anchor-10" class="footnote-number" contenteditable="false" target="_self">10</a><div class="footnote-content"><p>There&#8217;s a couple notable arguments in the opposite direction [<a href="https://www.linkedin.com/pulse/opinionated-software-making-comeback-rohan-sandeep-5bwpc/">1</a>] [<a href="https://contraryresearch.substack.com/p/ai-makes-opinionated-software-more">2</a>] [<a href="https://www.workingtheorys.com/p/taste-is-eating-silicon-valley">3</a>] that bear calling out. But there&#8217;s a core issue: once you encode your opinion in source, nothing stops a competitor from identifying it and prompting an AI to replicate it. Try asking an agent to build a Linear or Superhuman or Replit clone. Their opinions aren&#8217;t moats. Their taste isn&#8217;t a moat. </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-11" href="#footnote-anchor-11" class="footnote-number" contenteditable="false" target="_self">11</a><div class="footnote-content"><p>It&#8217;s not &#8220;understanding&#8221; but it could feel like it on the user&#8217;s end. 
Anthropomorphization is not a good habit to get into when dealing with LLMs - it&#8217;s the root of like 90% of bad takes in the AI space.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-12" href="#footnote-anchor-12" class="footnote-number" contenteditable="false" target="_self">12</a><div class="footnote-content"><p>I think this is where I distinguish these &#8220;eras&#8221; from Andrej Karpathy&#8217;s &#8220;<a href="https://www.youtube.com/watch?v=LCEmiRjPEtQ">Software 1.0 / 2.0 / 3.0,</a>&#8221; which is less about how the software interacts with users and more about how the developer interacts with the codebase.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-13" href="#footnote-anchor-13" class="footnote-number" contenteditable="false" target="_self">13</a><div class="footnote-content"><p>These may sound like givens, but there are <a href="https://natesnewsletter.substack.com/p/software-30-vs-ai-agentic-mesh-why">a slew of agentic experiments </a>that resulted in fragile, expensive failures that came nowhere near their stated objectives. If Anthropic and Cognition can&#8217;t &#8220;wing it&#8221; and succeed, odds are you and I can&#8217;t either.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Notes from Inbound]]></title><description><![CDATA[Key themes from Hubspot's first SF conference]]></description><link>https://www.centaurprise.com/p/notes-from-inbound</link><guid isPermaLink="false">https://www.centaurprise.com/p/notes-from-inbound</guid><dc:creator><![CDATA[DJ Thompson]]></dc:creator><pubDate>Fri, 05 Sep 2025 22:47:46 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/195473cf-21a1-4b5a-91f9-2ad60fba0048_2000x1334.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most years, Hubspot runs their annual conference INBOUND in Boston, close to their headquarters. 
This year, they dropped a small flood of orange paint into every transit station in San Francisco and decided to run it in Salesforce&#8217;s backyard<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>.</p><p>Some key themes coming out of the discussions, especially among execs:</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.centaurprise.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading centaurprise! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2><strong>Agent web traffic is growing really fast for some</strong></h2><p>Some businesses are getting consistently 25% of their inbound traffic from deep research (think Perplexity, ChatGPT Web Search, etc). One person I spoke to said their ratio regularly spiked to 75%. I think that&#8217;s a sampling bias for sure - our crowd is much more likely to have AI folks deep-researching them than say a neighborhood deli - but still, the total amounts swung around fast.</p><h2><strong>AII and AIX aren&#8217;t anywhere near best-practice</strong></h2><p>I&#8217;ve said it before: I find it funny that it&#8217;s taken us 50+ years to optimize for UI/UX, and we&#8217;re still not consistently great at it, but when we bring LLMs into the mix we expect them to just get whatever browser or app or outdated GUI we throw at them. 
Calling an agent interface &#8220;AI&#8221; feels deliberately obtuse, so I&#8217;ve taken to labeling these &#8220;AII/AIX&#8221; (AI Interface / AI Experience) as the counterpoint to UI/UX, but I don&#8217;t know how universal that is.</p><p>And survey says we&#8217;re far from being good at AII/AIX right now. Most agents have a lot of trouble parsing anything that isn&#8217;t pure HTML, but we&#8217;re not optimizing for those workflows. Lots of us have watched Browser Use or Operator struggle through complex UIs and RPA-type motions<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. What are we supposed to be doing about it on the other end?</p><p>Plenty of buzz from GEO agencies, new companies like Parallel or /dev/agents, though the best idea I heard all week was setting up agent-specific microsites, pointing them out in llms.txt, and keeping them sparse, HTML-only, and/or using chatbot interfaces to try to get the inbound agent to stop clicking and just say what it wants. One person said that experiment popped their success rate up from the mid-20%s to the mid-70%s (and the conversations told them more about what the searcher was asking for - more on that in a sec).</p><h2><strong>The tooling for agent web traffic is still nascent</strong></h2><p>Unlike searches on major traditional engines, identifying the search people were making via agent is tricky, and backing into how common that search is more of a guessing game than you&#8217;d like. People suggested screen replays and some vendors are starting to offer deciphering this as a service, but it&#8217;s much more opaque than before.</p><p>Revisit that solution around agent microsites. 
If you have sites that pop up specifically for agents (list them in robots.txt/llms.txt, don&#8217;t provide (easy) UI access for users, promote them in GEO, and create a bunch that are very topical to the use cases you think folks are searching for), then your screen replays suddenly get much more useful. </p><p>Even more so if you can get chat execution against whatever your call-to-action is: two LLMs talking often make up things about whether they&#8217;re allowed to sell to each other and what the terms are, but the error rates are still much lower than best-in-class browser use in-the-wild. Providing carrots for agents to sort themselves makes it easier to figure out what the people behind the curtain really want.</p><h2><strong>AI web crawlers are making attribution harder</strong></h2><p>If you follow independent-web spaces, you know LLM-builder web crawlers are <em>extremely</em> unpopular right now. They&#8217;re not great at adhering to robots.txt or llms.txt, they can bombard websites relentlessly, and - for anyone who makes their living off of ads or creative work - they just represent a philosophical threat to occupation or revenue stream that most AI companies aren&#8217;t responding to right now<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>.</p><p>Anyways, it&#8217;s not just a small website owner issue. If you&#8217;re trying to sort through whether the OpenAI hits you&#8217;re getting are scraping for training, indexing for GEO, or targeted access for Agent&#8230; well, that&#8217;s tricky, and the fact that the first one is so high-volume makes it hard to optimize the experience for the latter two. Folks are talking a lot about how to sort traffic (and deal with stubborn crawlers that don&#8217;t respond to being politely sorted). 
The attitude isn&#8217;t nearly as exasperated as, say, Hacker News, but it&#8217;s clear the major agentic-search players haven&#8217;t cultivated the ecosystem yet.</p><h2><strong>GEO as a trust vector for ABM</strong></h2><p>ABM and large enterprise have historically not really cared much about SEO or website design. Take <a href="http://scale.com">Scale</a>&#8217;s website for instance: unless you&#8217;re pretty-well steeped in how AI is built, it&#8217;s not particularly easy to figure out exactly what Scale is going to sell you, especially for their high-end generative AI product lines. (Their Rapid self-serve lines are a bit easier to suss out).</p><p>That&#8217;s not a dig on Scale, and it&#8217;s not an accident. It&#8217;s by design, and it&#8217;s common for most companies targeting customers of a large-enough size. You might have <em>intent </em>to buy, but Scale&#8217;s guessing that if you stumbled onto the site via Google search, you don&#8217;t really have <em>capability</em> to buy. </p><p>The customers who <em>are</em> capable are arriving via different channels that support higher trust for the buyer: referrals, previous contracts, existing partners and channels and integrations. Picking out any random data vendor via search engine is not a safe bet for the future of your RLHF or eval program: if you&#8217;re spending that level of money, you&#8217;d choose a different process.</p><p>GEO (&#8220;Generative Engine Optimization,&#8221; one of like five competing terms for describing how to push your company to the top of a Deep Research set of results) has an interesting vector for changing that. 
As you build up a context base and personalization set with AI (assuming you can hold onto it over time, which is a big assumption) you can actually create better trust with the results than you had with your search bar<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>. </p><p>Of course, if attribution is still really hard, then large enterprise vendors aren&#8217;t going to be particularly incentivized to optimize for GEO still, just because of all the false positives. To get the confusion matrix in the right spot, they need to get some sort of &#8220;agentic handshake&#8221; going: pretty broad agreement that if that existed in a trusted form it would cut out a <em>lot</em> of wasted qualification time up front.</p><p>That&#8217;s a big hurdle to get over first: AI still isn&#8217;t particularly trusted right now. Hallucination and confabulation are still quite real, and e.g. &#8220;Gemini said ClickUp was better for us than Front&#8221; is not going to be convincing to an angry exec if it turns out that was the wrong call. AI doesn&#8217;t get fired, people do.</p><p>Still, digitizing trust signals around enterprise procurement has been an interest area for a lot of people, for a very long time. Expect to see some people at least try their hand here.</p><h2>Three types of builders</h2><p>Wrapping up with an observation on presentations companies made over the course of the week. 
There&#8217;s broadly three groups of people based on the solutions they were implementing:</p><ol><li><p>People who were automating away their headcount</p></li><li><p>People who were taking bureaucracy and waste processes and automating those to &#8220;give people time back&#8221;</p></li><li><p>People who were optimizing for what humans do best, what agents do best, and how to wring every last lead and dollar out of each hour and token they spent</p></li></ol><p>Group 3 had the weirdest designs but the best growth, best engagement, best flywheel effects. You could see the technical folks and AI experts gravitate to those solutions; they sparked a lot more imagination and follow-on conversations. Group 3 also didn&#8217;t use much jargon compared to the other two: it felt like they had less to hide with their results.</p><p>It&#8217;s still early, but it&#8217;s clear there&#8217;s a bit of a shift in how people are approaching these solutions. For the better.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.centaurprise.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.centaurprise.com/subscribe?"><span>Subscribe now</span></a></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Ok,<a href="https://developers.hubspot.com/blog/the-developers-guide-to-inbound25"> it&#8217;s supposed to be</a> because they want to strategically realign with innovation around AI. 
But a month before Dreamforce&#8230; quite a few folks I spoke to are amused about the timing.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Just today I tasked GPT-5 Agent with going through an online book and copying over the chapters into a linked Google Drive - should have been a simple exercise. When I timed it, it took me ~15 seconds to do one, not counting parallel-tasking. Agent gave up after 25 minutes having done 7 (8, if you count the duplicates) and having needed 5 separate interventions to get going again. Some of that is justified, some of that is agents just not being that far along yet, but are we really saying the best way for agents to interact with GDrive is via the same drop-down menus I have access to?</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>If you&#8217;re worried about losing your job or your livelihood, comments about &#8220;feeling the AGI&#8221; or &#8220;sustaining 20% cyclical unemployment&#8221; aren&#8217;t just unresponsive, they&#8217;re probably doing real reputational harm. Leaders here aren&#8217;t learning the lessons from the botched globalization rollouts: the same people who bought the <em>West Wing</em>-style &#8220;free trade is good &#8230; free trade creates jobs&#8221; mantras aren&#8217;t going to get fooled again so easily. 
They want a plan.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>Assuming those results don&#8217;t get intermediated by ad buys, which is the reason the search bar stopped being trusted in the first place. It&#8217;s <em>really</em> hard to control the ad auction in such a way that it doesn&#8217;t ruin your credibility at the top of the ABM food chain.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Making agents work]]></title><description><![CDATA[Applying Work, Friction, and Waste to our human-AI allocation // officially retiring the DeejDrone 9000 &#128546;]]></description><link>https://www.centaurprise.com/p/making-agents-work</link><guid isPermaLink="false">https://www.centaurprise.com/p/making-agents-work</guid><dc:creator><![CDATA[DJ Thompson]]></dc:creator><pubDate>Thu, 04 Sep 2025 01:22:14 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/cbb9ed00-6d6a-483a-8775-bcfd4dcddda3_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is the last post in a three-part series on which parts of work are valuable. It started with creating a ridiculous biodiesel drone to show how AI can generate perfect-looking garbage, then updating the Work-Friction-Waste framework to spot which activities actually matter.</p><p>Now, let's put it all together to figure out how to actually use agents without destroying what makes your team valuable.</p><p>If agents can't replace PMs, what can they do? Lots, as it turns out. 
Just not what you'd expect from following the hype train.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.centaurprise.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading centaurprise! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>The Work of PMs</h2><p>So, we know how Work works, and we know the PRD isn&#8217;t the Work, but the Friction, for the PM. What does that mean for the VP who wants to assign agents as PMs?</p><p>Good news and bad news. Bad news first: you still need PMs.</p><p>The danger of replacing PMs with agents - at least in this scenario - is threefold:</p><ol><li><p>The agent making PRDs is not doing the Work. It&#8217;s handling the Friction (which - can be a good thing! But more on that in a sec).</p></li><li><p>The person prompting the agent is imposing all of their biases onto the agent, and disguising those biases as good ideas with rigor and research. (Like I did with the DeejDrone 9000). The key insight here is that <em>none of my ideas were any better after they were put into the PRD than they were before:</em> I didn&#8217;t do any Work to validate anything. But now that they&#8217;re in PRD format, it&#8217;s way harder to tell that they&#8217;re terrible. 
Friction that doesn&#8217;t carry any Work is Waste, by definition.</p></li><li><p>To top it all off, because we&#8217;ve replaced all our PMs with agents, <em>the only people who can see through my PRDs and tell that they don&#8217;t have any merit are the people we let go.</em> We lost our BS detectors, so we can&#8217;t tell when we&#8217;re lying to ourselves anymore<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>.</p></li></ol><p>But let&#8217;s say I know PRDs aren&#8217;t the end goal of a PM. Can I train an agent to be a PM if I open up the scope?</p><p>Again, probably not, not yet. Agents aren&#8217;t great at the Work PMs do. It takes some savvy to handle discovering the Voice of the Customer and reading between the lines, and humans tend to be more effusive and expressive in front of other humans. (Ever &#8220;interviewed yourself&#8221; for a job with one of those automated video systems, or trialed one you were considering using? You&#8217;re not likely to &#8220;warm up to the interviewer&#8221; during the process; people still like talking to other people).</p><p>You need the LLM to have a thesis, interrogate the thesis, and subtly test their thesis while also looking out for &#8220;unknown unknowns&#8221; during these conversations&#8230; but the state-of-the-art realtime voice call agents these days tend to be just a little better than call center phone trees.</p><h2>PRHs and Product Gap Agents</h2><p>Good news: agents can still help, a lot. There&#8217;s a ton of ways to throw agentic automations against Waste and Friction in the average Product org, but for this conversation, we followed the PRD thread specifically.</p><p>First: low-hanging fruit. PM comes back from their customer interviews with notes, dumps them into an LLM, and the LLM summarizes them into PRD format. 
That&#8217;s not agentic behavior by any means, but it&#8217;s still pulling Friction out of the PRD process, assuming you can fact-check in less time than writing the document outright.</p><p>But let&#8217;s get more creative and find more value. Start from the point where your PMs go talk to customers and work backwards. The <em>output</em> of all of that is going to be a PRD; but is there a utility to having a PRD as an <em>input </em>too?</p><p>I could see it - for the sake of argument, let&#8217;s call it a PRH - a Product Requirements Hypothesis:</p><ol><li><p>Try this out: take the exact same format, but fill it in based on your initial best guess - what do you think the right answers are going to be? Generate a PRD as though you knew the right answer based on your &#8220;day zero&#8221; knowledge - using an agent to research across the enterprise&#8217;s data sources - and then react to it. Do the details and assumptions and logical conclusions match what you expected?</p></li><li><p>Iterate on this and expose what would make your hypothesis wrong <em>explicitly in the document</em>. This step is key: otherwise you&#8217;re just biasing yourself, poisoning your own well. The end state for a PRH is <em>not a draft PRD</em> - you&#8217;re gonna redo all of this - but an experimental survey design that explains exactly what you think the options are for each element of the product design, the sensitivity and impacts of each decision change, and the things you want to listen for and explore in conversations.</p></li><li><p>The final piece of the PRH is the <em>customer inventory</em> - the conversations that you <em>want</em> to have that could get at the level of detail you&#8217;re looking for. Again, an LLM can help you run through CRMs or CX datasets to check for specific info, or at least give you guesses that you can react to and reject or edit. 
You&#8217;re looking to get more prescriptive on who to talk to - rather than &#8220;whoever my CSMs proffer up&#8221; - so you can test really specific elements of your hypothesis (or, also interestingly, find out that the things you expected to find aren&#8217;t as easily-discoverable as you hoped)<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>.</p></li><li><p>This would result in - in my experience - <em>way more</em> prep for customer conversations than most PMs walk in the door with<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>. If you&#8217;re sitting there testing for both high-level questions and a bunch of little details that you wouldn&#8217;t have thought of but could materially affect the end picture, the agent&#8217;s done a lot of actual Work helping you prime yourself to extract Customer Voice from what could be otherwise less-structured conversations<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>.</p></li></ol><p>Ok - then let&#8217;s go one step earlier. How do I prime a PRH? Again, I really don&#8217;t want to set up a system where I take my innate biases about the product or the business and polish them up so that no one else can tell that they&#8217;re hollow and backed by nothing. I&#8217;d love to set up chains of events that feed me good triggers and data to inform my hypotheses: my agents should be setting me up for success.</p><p>The #1 place that this breaks down in most customer orgs is the product gaps process. Product gaps reporting is often sketchy at best, even among companies that design products for reporting product gaps! 
The feeds of information are inconsistent, the identity deduplication is rare, the supporting data is scarce, prioritization is done by squeakiest wheel, and at a certain length of backlog no automated gap feed is actually entering the working memory of any team member until a long-overdue-hygiene process goes through and flushes months-old cards off the board.</p><p>But an agent, operating in the right environment with the right framework, could do a couple interesting things:</p><ol><li><p>They could (obviously) feed and organize new product gaps into the right streams at the right times. That&#8217;s easy. That&#8217;s not even agent work - that&#8217;s API + WHERE statements, maybe embeds to make the sorting less fragile.</p></li><li><p>They could reference existing projects or PRHs and ask requestors or gap filers targeted questions at the time of filing - what was the source of the bug? Where does the enterprise customer want this API to feed into? You filed 12 P0 gaps - if I can only do 3 this quarter, which ones would you vote for<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a>?</p></li><li><p>They could build and maintain profiles on the requestors - both the internal parties, and the actual end requestors (the customers, the Twitter personalities, the vendors, the partners) to build context over time, so that gaps get richer over time. Lots of folks have pined wistfully for the opportunity to do this over my career, basically no one has the time to do it. 
Prime agent work - stuff humans can only dream of doing because they have actual shit to do.</p></li><li><p>When it comes time for customer interviews or CABs, agents can shortlist the requestors (filtering for things like red accounts and churn), closing the loop on the information pull<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a>.</p></li></ol><p>My prompt to the VP at this point - how do you go one step before this? How do you use agents to prime a product gap process that feeds you great information and returns great sentiment? I won&#8217;t go into the answers here: I think the point&#8217;s been made and the ideas here start getting very tactical and very real. It&#8217;s much more exciting and useful than <em>I, PMBot</em>: the energy in the room is different because we&#8217;re targeting real problems with precise solutions instead of employing unguided AI for its own sake.</p><h2>Bias for action</h2><p>I want to close on a note my cofounder Deji brought up when we were discussing this concept. He mentioned his initial gut reaction: doesn&#8217;t this push back against conventional startup wisdom to just go do <em>something</em>?</p><p>There&#8217;s the classic cliche &#8220;bias for action.&#8221; Brian Armstrong famously put this as &#8220;action generates information&#8221; - when you&#8217;re pre-PMF, taking action is the best way to learn, even if you&#8217;re not sure it&#8217;s the right action<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a>.</p><p>I like that. I don&#8217;t think the frameworks clash. Instead, I&#8217;d say the two mesh with one simple phrase: don&#8217;t confuse activity for action. Or, if I were to paraphrase Brian&#8217;s quote into our framework: &#8220;All Work generates information: any Work, even the wrong Work. 
Waste does not.&#8221;</p><p>It&#8217;s easy to run around looking busy, doing productivity theater, making yourself feel busy, doing everything but what matters most. That&#8217;s common for larger businesses layering on process for the sake of process; that&#8217;s equally common for small startups procrastinating and doing everything but getting in the van, talking to customers, and building what they want. If you spend all your time as a startup on Waste, you&#8217;re not taking real action, and you&#8217;re not getting real information.</p><p>The only difference for pre- and post-PMF companies is that, before you know your customers <em>want</em> to pay you for your Work, you just have to take your best guess. It&#8217;s still critical to minimize things that aren&#8217;t Work or supporting it - maybe more so - because you need to try so many different types of Work until you find the right one.</p><p>Work is the highest revenue-bearing activity. Work is also the highest information-bearing activity. Friction is your Work delivery vehicle.</p><p>Everything else is just in your way.</p><h2>Framing exposes opportunities</h2><p>Is the point of all this that you should turn your Product team onto making PRHs now? No. It&#8217;s an experiment, we ran it once, the jury&#8217;s still out on if it&#8217;s worth the effort or not. Feel free to play with it and let me know if it fits with how you build.</p><p>The actual point is: see the difference better framing makes? By lining up agents to destroy Waste, improve Work outputs, and cycle out Friction to improve throughput, we get a way different Product org that makes the PMs feel like they&#8217;re omniscient. The alternative - having agents make fake PRDs - doesn&#8217;t feel remotely as cool in comparison.</p><p>It's intellectually easy to swap humans for AI 1-for-1. It feels tempting to take all your Waste and task an LLM to do it for you. But I guarantee it's worse. 
There are better problems to target, and you can spot them using this framework. Tap your AI to eliminate Waste, minimize Friction, and improve your real Work.</p><p>Every company using AI is about to learn this lesson. The question is whether they'll learn it the easy way or the hard way.</p><p>You&#8217;ll be able to tell by how many DeejDrones they launch.</p><div><hr></div><p><em>Coming off of Inbound, there have been a lot of cool conversations about how people in Sales &amp; Marketing roles are using agents productively - and not - and I&#8217;ll try to bring this back to earth a little bit and share what I&#8217;m hearing starting tomorrow. 
</em></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>We talked about hypocognition in an earlier post - this is sort of hypocognition-by-design, and it&#8217;s pretty dangerous.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>We had a saying when I was in the Air Force - &#8220;negative intelligence is still intelligence.&#8221; Basically - don&#8217;t discount when you find out that something you wanted to learn <em>cannot</em> be learned.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p><a href="https://www.atlassian.com/software/confluence/templates/customer-interview-report">Example of a simpler customer interview template</a>. Nothing&#8217;s <em>wrong</em> with it, but there&#8217;s nothing setting you up to challenge exactly what you need to evaluate to get your product right; it&#8217;s mostly a blank piece of paper, ideal if you have <em>no idea</em> what your next product looks like, but the more you know about what you want to test, the weaker a template like this is. You can fill a template this generic out, exactly as written, without testing a single hypothesis. 
A PRH done well might guardrail you away from that outcome.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>Here&#8217;s the draft PRH we made based on the DeejDrone concept: <a href="https://github.com/centaurprise/deej-drone-9000/blob/main/DD9K_PRH.md">C003A2 - DD9K PRH</a>. Again - the drone&#8217;s still a bad idea. But the template now is designed to expose that in customer interviews and get us detailed information about the viability of going into the commercial drone industry at all, and where the nuances are. This template filled in, plus the interview notes, turned into an actual PRD, will be infinitely better than the result at the beginning of the series.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>This is, by the way, the exact reason I dislike the P# system and why we don&#8217;t use it.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p>Your product gaps process may be wildly different than these folks, but we set up an agent with fairly straightforward system instructions to run off a Slack webhook and feed disparate gap reports back to the Product team with some basic enrichment, which was enough to get the wheels turning on other long-neglected problems: https://github.com/centaurprise/deej-drone-9000/tree/main/agents </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><p>The 
original interview is here: </p><div id="youtube2-Yv98vPLXdzk" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;Yv98vPLXdzk&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/Yv98vPLXdzk?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div></div></div>]]></content:encoded></item><item><title><![CDATA[Which work matters]]></title><description><![CDATA[A simple framework for matching resources to activities]]></description><link>https://www.centaurprise.com/p/which-work-matters</link><guid isPermaLink="false">https://www.centaurprise.com/p/which-work-matters</guid><dc:creator><![CDATA[DJ Thompson]]></dc:creator><pubDate>Sun, 31 Aug 2025 23:39:31 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Icv7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F912458d5-a7ee-459a-a46f-5d0a4915ac8d_1361x997.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>The marginal utility of what I&#8217;m doing right now</h2><p><a href="https://www.centaurprise.com/p/how-to-miss-the-point-of-work">Last time</a>, we prompted an AI to generate a beautiful PRD for a biodiesel drone that should never exist. The document was lovely. The idea was terrible. And while it was easy to tell the difference this time, it&#8217;s also easy to see how toning down the prompt just a bit would make it much trickier to spot the vapidness of the output.</p><p>That's the trap: when artifacts look right, we assume the thinking behind them is right too, especially if we weren&#8217;t there to vet the process. But in this case, the artifact wasn&#8217;t the work, it was just evidence that work happened. 
Or, was supposed to have happened.</p><p>Today, I want to share the framework that helps me spot the difference between value-add actions and expensive theater. It's helped teams figure out where AI actually belongs in their stack. More importantly, it's helped them stop wasting humans on tasks that don't matter.</p><h2>Work, Friction, and Waste</h2><p>So, how can you tell the PRD isn&#8217;t the part &#8220;that counts&#8221;?</p><p>There&#8217;s a simple framework for understanding the impact of the action you&#8217;re taking or the artifact you&#8217;re building: Work, Friction, &amp; Waste. You might recognize the terms from an intro physics course, or from similar concepts in Jobs-to-be-Done<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> (Work, Work-of-Work, and Waste), or Lean and Six Sigma (Work, Motion, and Waste)<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. 
The concepts are similar, but for the uninitiated, &#8220;Friction&#8221; is, in my opinion, a far more intuitive word choice.</p><p>Depending on where you first heard them, the definitions taught to you for each of these items could have been quite complex. I personally like keeping them extremely simple:</p><ul><li><p><strong>WORK</strong> &#8594; What customers pay you for</p></li><li><p><strong>FRICTION</strong> &#8594; What allows you to get paid for Work</p></li><li><p><strong>WASTE</strong> &#8594; Everything else</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Icv7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F912458d5-a7ee-459a-a46f-5d0a4915ac8d_1361x997.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Icv7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F912458d5-a7ee-459a-a46f-5d0a4915ac8d_1361x997.png 424w, https://substackcdn.com/image/fetch/$s_!Icv7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F912458d5-a7ee-459a-a46f-5d0a4915ac8d_1361x997.png 848w, https://substackcdn.com/image/fetch/$s_!Icv7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F912458d5-a7ee-459a-a46f-5d0a4915ac8d_1361x997.png 1272w, https://substackcdn.com/image/fetch/$s_!Icv7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F912458d5-a7ee-459a-a46f-5d0a4915ac8d_1361x997.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Icv7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F912458d5-a7ee-459a-a46f-5d0a4915ac8d_1361x997.png" width="1361" height="997" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/912458d5-a7ee-459a-a46f-5d0a4915ac8d_1361x997.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:997,&quot;width&quot;:1361,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Icv7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F912458d5-a7ee-459a-a46f-5d0a4915ac8d_1361x997.png 424w, https://substackcdn.com/image/fetch/$s_!Icv7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F912458d5-a7ee-459a-a46f-5d0a4915ac8d_1361x997.png 848w, https://substackcdn.com/image/fetch/$s_!Icv7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F912458d5-a7ee-459a-a46f-5d0a4915ac8d_1361x997.png 1272w, https://substackcdn.com/image/fetch/$s_!Icv7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F912458d5-a7ee-459a-a46f-5d0a4915ac8d_1361x997.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>My beautifully-rendered diagram of which kinds of activities are which.</em></figcaption></figure></div><p>I&#8217;ve spent a lot of time thinking through and boiling down this framework - first picking up elements as a consultant, then really building out and testing the theory for modern work at Slack, and now at <a href="https://gigue.ai">gigue</a> adapting it to how humans should be employing AI. Since I suspect most folks haven&#8217;t put nearly that much thought into it, I&#8217;ll take a sec to walk through where I&#8217;ve gotten with it:</p><ul><li><p>What is the definition of each activity?</p></li><li><p>How does it look in practice? 
(we&#8217;ll use a &#8220;Mutually-Aligned Action Plan&#8221; (MAAP) example to help with the nuances)</p></li><li><p>How should I invest resources in this activity?</p></li></ul><h3>Work</h3><p>Work is the most important of the three activities. There is a simple bright line for what counts as Work and what does not: Work is something the customer would pay you for<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>.</p><p>Be <em>very</em> careful and <em>very</em> strict with this definition: it&#8217;s tempting to say &#8220;the customer won&#8217;t pay us unless we have professional marketing copy&#8221; or &#8220;the customer won&#8217;t pay us unless we build this deck for the MAAP call&#8221; - sure, but they won&#8217;t pay you <em>for the copy / deck itself</em>, which changes how you handle the time and effort you invest into those assets<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>.</p><p>That <em>does not mean </em>that only Product is Work, or only Sales is Work. Customers will pay for SOC2 compliance; customers will pay for net-60 payment terms; customers will pay for reliable on-time delivery; customers will pay for &lt;5 min response time on service channels with &gt;97% SLA during business hours.</p><p>This is not an excuse to be elitist or to partition the functions of the business into haves and have-nots. 
It&#8217;s a strategic razor for <em>activity</em>: for <em>everyone</em>, whether they roll up to Product or Sales or Legal or Finance or Ops or CX, <strong>will the energy they spend on this particular activity return revenue</strong> <strong>or not?</strong></p><p>Because Work is something the customer would pay you for, there is generally good reason to improve it at any given time<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a>. Not every dimension a customer will pay for has high ROI - clever product design is still a thing - but at a super-high level &#8220;improving the Work&#8221; is a good use of resources provided you can tactically figure out the right way to do so.</p><h3>Friction</h3><p>Friction is activity that is necessary to effect the trade of Work for money with the customer. That definition is broader than the Work definition (Friction is not just e.g. finserv APIs and delivery couriers - it is anything that <em>has to happen for the exchange of value to complete</em>). The activity of building the MAAP deck from above? Friction<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a>.</p><p>Think of the classic physics example of pushing a box on the ground (the one I used in my diagram). The effort you spend pushing can be split into two parts: the effort that moves the box along the straight line, and the effort spent overcoming the drag where the box touches the ground. You can&#8217;t <em>not</em> spend the effort on Friction: if you decline to spend it, the box doesn&#8217;t move and the Work doesn&#8217;t get done.</p><p>But, unlike your undergrad physics homework, you control the terms of engagement. 
You don&#8217;t have to choose scenarios with high Friction, and you definitely don&#8217;t want to <em>optimize </em>for high Friction<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a>. Instead, you want to optimize for Work-over-Friction - how can you maximize the Work you get done and minimize the Friction it takes?</p><p>Take the MAAP example - let&#8217;s say the Work is a SaaS product, but the customer is also in part paying for security in the form of an ELA with legal protections, payment terms instead of credit card transactions, and implementation help we&#8217;re setting up on this action plan. Some parts of this process, then, are Work, and some aren&#8217;t. How can we minimize Friction?</p><ul><li><p>We could skip the Mutually-Aligned Action Plan step altogether, but that actually materially impacts the odds we get a contract signed, and definitely impacts the odds that contract renews (because skipping it reduces the value the customer gets from the contract, which means the MAAP itself is something the customer would pay for - just not how you&#8217;d traditionally think of it). Ok, so Work is happening in that step, we don&#8217;t want to throw out the baby with the bathwater here.</p></li><li><p>Do we need the deck to be built custom for every single customer, every single time? Probably not - there&#8217;s probably a good MAAP template that needs small tweaks each time, and we probably get better results that way. Friction pulled out.</p></li><li><p>Do we need to align the MAAP with the customer? Definitely. It&#8217;s in the name. How much back-and-forth do we need? When do we pull the champion in? Can we ghost the doc, prep a live tracker, pre-stage the artifacts we know they&#8217;re going to want, hook the doc into our conversation to get dynamic updates? 
Every bit of Friction we&#8217;re pulling out is helping, and every improvement to actual alignment and plan adherence is actual Work - it&#8217;s the service the customer is (about to be) paying us for.</p></li><li><p>Do we need meetings for all this? A champion meeting? A draft session? An exec sponsor review? Weekly check-ins? Again, we&#8217;re solving for maximizing the alignment of the teams against how much effort it takes to get said alignment. Meeting-phobia vs technophobia is a team preference question, in my experience&#8230; but you can get pretty clever about firing updates proactively, early, and eagerly to short-circuit the perception that misalignment has happened or could happen, which pushes the Work / Friction balance decidedly in your favor.</p></li></ul><p>It&#8217;s often tempting to do <em>the exact opposite of this approach</em> in practice. Mutually-aligning with a high-value customer is important! And you can talk yourself into spending <em>hours or days</em> on fine-tuning the deck, discussing the ins and outs of punctuation and meeting cadence (which is tipping into the Waste section we&#8217;re getting to next) because you&#8217;ve convinced yourself that this is what they&#8217;re paying you for.</p><p>They&#8217;re not.</p><p>They&#8217;d much rather that energy go into the actual relationship and execution of the deal and implementation, rather than the spit and polish of the deck that merely <em>implies</em> relationship and execution, or the meetings about the meetings about the alignment around the implementation that start to actually impinge on the implementation.</p><p>The key dynamic for Friction is that investments into Friction assets <em>can</em> be helpful, but they have <em>very low ceilings</em> before they start getting counter-productive. 
Meanwhile, basically every effort you put into removing time and effort spent on Friction pays dividends, because it increases the Work any given team member can get through their pipeline &amp; get done in a day.</p><p>Each little trick you can pull that <em>doesn&#8217;t affect Work but does remove Friction</em> usually benefits you<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-8" href="#footnote-8" target="_self">8</a>. You may not hear the benefit straight from the customer - remember, they often don&#8217;t really care because they&#8217;re not paying you for Friction - though in reality low Friction processes often get noticed and commented on because they imply that <em>other</em> processes will also be smooth; they set positive future expectations. They become a source of expected value in their own right (but they <em>still aren&#8217;t Work yet</em>).</p><h3>Waste</h3><p>The last term in the framework is Waste - which is probably the easiest to understand and the hardest to root out fully, especially in larger companies. Most companies I&#8217;ve consulted for, sold into, or advised have codified pretty decent amounts of waste into their processes and convinced themselves that it&#8217;s completely irreplaceable<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-9" href="#footnote-9" target="_self">9</a>.</p><ul><li><p>Before we get too far: a ton of folks get frustrated by processes that <em>feel</em> Wasteful but <em>do </em>have purpose (maybe not Work, but at least Friction). Those could be regulatory checkboxes, accessibility concerns, or prep and coordination for a team whose processes just aren&#8217;t visible to the person accountable for the action.</p></li><li><p>The best practice in these situations is, to the greatest extent possible, to label what things are. 
Even if you can&#8217;t be precise, saying <em>something</em> like &#8220;this process creates <em>xyz</em> metadata which protects us from liability&#8221;<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-10" href="#footnote-10" target="_self">10</a> helps both with morale and with humans + agents trying to figure out what pieces of any given process they can adjust or excise entirely.</p></li><li><p>This will become <em>even more important</em> as more agents get involved in your workflows. Not because they have more morale issues than humans (obviously) but because they aren&#8217;t expected to read between the lines - nor are they particularly adept at it - and the un- or under-labeled data and processes are the ones agents mishandle.</p></li></ul><p>Ok, but: there&#8217;s a bunch of things that don&#8217;t fall into that category. They&#8217;re not things the customer would pay for, and they&#8217;re not necessary for getting paid. Maybe they&#8217;re coordination artifacts, maybe they&#8217;re vestigial flows from an abandoned consulting project, maybe they supported a long-since deprecated piece of software.</p><p>Moving a contract forward is Work, and therefore so is mutually-aligning with the customer on the contract and implementation plan. Building the deck that spells out that plan is Friction - necessary, but there&#8217;s a (very shallow) upper-bound on how much of that work is actually valuable, and there&#8217;s a very clear straight line to value for every minute and calorie you can take out of the deck-building process (because it opens up more Work throughput for those AEs / SEs / etc).</p><p>Holding MAAP deck review internal syncs? Could be Friction - depends on how mature your team is - but the longer they go on, the more likely they&#8217;re Waste. 
The customer definitely wouldn&#8217;t pay for them (so they can&#8217;t be Work) and while they might have a straight-line dependency on delivering Work for a brand-new junior AE warming up to the company&#8217;s MAAP process for the first couple of times, odds are they stop supporting Work and start supporting bad habits <em>really quickly</em>.</p><p>If the MAAP doesn&#8217;t get built without the sync - big problem, and running a Wasteful meeting on repeat is not the right way to handle it. If an exec can&#8217;t gather context on the deal without the sync (despite having a CRM and a RevOps platform and a team channel and an account channel and 12 other expensive tools) - also a big problem. That weekly pipeline review where everyone reads their deals out loud like so many bedtime stories while the VP could have just read the CRM updates he asked for? Pure Waste. One company I worked with killed it, saved easily 5+ hours per week per rep, and saw no change in forecast accuracy<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-11" href="#footnote-11" target="_self">11</a>.</p><p>Not all meetings are Wasteful, but meetings that are crutches for basic accountability definitely are.</p><p>Waste has maybe the clearest investment criterion of all three: invest in removing it, wherever you find it. No Waste activity merits an incremental calorie, and cutting ruthlessly will often expose things about the business you didn&#8217;t know.</p><p>Given the mandate for AI and agents in most companies right now, you have a moment: if you&#8217;re frustrated with <em>anything</em> about the way your job works, <em>now&#8217;s the time to propose a change</em>. (If it involves, even tangentially, the use of AI, odds are you&#8217;ll get greenlit.)</p><h2>To recap</h2><p>Once you see the actions you take through this lens, you can't unsee it. Every meeting becomes a question: is this moving us forward or just keeping us busy? 
Every document becomes a test: would the customer pay for this or are we just play-acting?</p><p>The framework is simple. Applying it is hard. Because it means admitting that a lot of what we do every day does us no good.</p><p>Next time, we'll put this framework to work. We&#8217;ll figure out how to assign activities to humans and agents based on Work, Friction, and Waste. Spoiler: in the MIT study that showed 95% of companies got no or negative ROI from their AI projects, the number one culprit was tapping agents to execute Waste activities instead of cutting them outright.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Not going to go into the J2BD framework, but the basics can be found here: https://www.christenseninstitute.org/theory/jobs-to-be-done/</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Also going to skip over the details on Lean, but the basics are here: https://theleanway.net/The-8-Wastes-of-Lean</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>Example: When Stripe simplified payment integration from weeks to minutes, that was Work. 
Not the documentation, not the sales deck: the actual reduction in developer time was incremental value that customers were willing to pay more for. (When their customers implemented Stripe to simplify <em>their</em> payment integration times or streamline <em>their </em>payments or invoicing, that was reducing Friction from their perspective. We&#8217;ll get there soon.)</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>Put another way: both Work and Friction are on the critical path to getting paid. That doesn&#8217;t make them equally valuable: some parts of the path ought to be shorter, and the difference between Work and Friction is &#8220;where can I shorten the critical path&#8221; versus &#8220;where would shortening the critical path turn around and bite me?&#8221;</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>You&#8217;re going to find exceptions to this everywhere - recessions, negative cash flow, margin pressure, death spirals, product deprecations, etc. Those are valid, and not the point. &#8220;Improving the Work&#8221; doesn&#8217;t mean gold-plating the commodities you sell: it means finding ways to continuously improve the price / cost / margin / value of your core products and services, according to your strategy, because directing that energy into improving Friction or Waste outputs is by-definition a less-valuable use of your resources. 
(Directing that energy into reducing the resources you spend on Friction or Waste, however, may be quite competitive)</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p>Example: When Uber eliminated the need to handle cash or cards at the end of rides, they weren't changing the Work (the on-demand phone-based transportation). They were removing Friction: that awkward fumbling with payment that added time on the critical path but generated zero value. Instead, when the ride ended, you just&#8230; got out of the car. No waiting, no talking, no buttons, no hoping you had cell service. The ride quality stayed the same, but removing that transaction friction made the whole experience smoother and opened up their pipeline for more rides per driver per day and more rides per passenger per day. You wouldn&#8217;t have paid extra for no-contact payment terms, but it did become a source of expected value (it set baseline expectations that competitors didn&#8217;t meet, and it implied higher value on Work elsewhere for both buy-side and sell-side users).</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><p>This is why I don&#8217;t personally like &#8220;Work-of-Work&#8221; or &#8220;Motion&#8221; as the names for this term: they give the misguided impression that more Work gets done proportionately with the increase in Motion / Work-of-Work, which <em>isn&#8217;t guaranteed to be true</em>. You can create <em>tons</em> of useless Motion / Work-of-Work in service of the <em>same amount of Work</em>. 
Friction&#8217;s a better analogy for capturing that dynamic.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-8" href="#footnote-anchor-8" class="footnote-number" contenteditable="false" target="_self">8</a><div class="footnote-content"><p>I made reference to the Lean term &#8220;Motion&#8221; and - while I prefer the connotations of the term &#8220;Friction&#8221; - I do recommend looking for Lean resources on how to minimize Motion since there&#8217;s already a slew of literature on the subject, usually referenced under &#8220;Lean Back Office.&#8221; Same for &#8220;Waste&#8221; later on.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-9" href="#footnote-anchor-9" class="footnote-number" contenteditable="false" target="_self">9</a><div class="footnote-content"><p>Feel like it&#8217;d be mean-spirited to give a named case study here like the others :)</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-10" href="#footnote-anchor-10" class="footnote-number" contenteditable="false" target="_self">10</a><div class="footnote-content"><p>Instead of what the truth might be, e.g. 
&#8220;we would have lost a court case, but we took a super-secret settlement, and now we have to label all our shipments just so or we go right back to court again.&#8221; Obviously <em>that&#8217;s</em> a non-starter, but you don&#8217;t have to lay out all the gory details to provide enough context.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-11" href="#footnote-anchor-11" class="footnote-number" contenteditable="false" target="_self">11</a><div class="footnote-content"><p>Turns out, if you invite <em>n</em> AEs to a <em>t</em>-minute-long status sync and have them each report for an equal amount of time, you systematically burn <em>t</em>(<em>n</em> - 1) minutes each time, since each rep spends (<em>n</em> - 1)/<em>n</em> of their time doing nothing, maybe listening to other people rattle off their CRM status, probably just multitasking or zoning out waiting for the meeting to end. These aren&#8217;t usually the &#8220;learning opportunities&#8221; they&#8217;re cracked up to be, especially when the attendee list gets long. 10 reps in an hour-long sync? 90% wasted time for each, burns 540 minutes for no good reason, not counting the time they needed to prep the updates and reschedule other conflicts. If they&#8217;re each at $150k OTE, that&#8217;s <em>at least</em> $675 per sync you didn&#8217;t need to spend. 
Replace these sessions: automate status updates, use meetings to handle direct tactical asks, and only invite the exact people needed for the ask.</p></div></div>]]></content:encoded></item><item><title><![CDATA[How to miss the point of work]]></title><description><![CDATA[The ignominious history of the DeejDrone 9000 // why you can&#8217;t just fire your whole team and replace them with named agents]]></description><link>https://www.centaurprise.com/p/how-to-miss-the-point-of-work</link><guid isPermaLink="false">https://www.centaurprise.com/p/how-to-miss-the-point-of-work</guid><dc:creator><![CDATA[DJ Thompson]]></dc:creator><pubDate>Fri, 29 Aug 2025 23:57:29 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/3ff8974b-3a0f-4f84-9055-b5cf85dd5abc_1024x1024.png" length="0" type="image/png"/><content:encoded><![CDATA[<p>Last week, a product exec asked me a simple question that I thought encapsulated everything we're getting wrong about agents and automation. Not the tech; I don&#8217;t think you need AGI to use LLMs well. The key is <em>using them well</em>.</p><p>This is the first in a three-part series that explores why we&#8217;re getting it wrong, what it&#8217;s costing us, and how to fix it. It starts with me speccing a ridiculous vanity drone and ends with a framework that might just save you from building your own.</p><h2>The DeejDrone 9000</h2><p>So, I&#8217;m talking with a product exec last week - their company builds logistics vehicles - when he asks: "Why do we need PMs when I can generate better PRDs with AI than my team writes?"</p><p>"Let's test that," I say. We pull up the LLM he uses for work, hooked up to his company&#8217;s shared drive and resources, and I take the wheel. 
</p><ul><li><p>We&#8217;re going to write a PRD for a new drone (this customer doesn&#8217;t even make drones, but the AI wrote a great justification for how easy expanding into the market will be based on absolutely nothing, so we&#8217;re all set). I borrow their structure for the PRD, so I don&#8217;t miss any steps or line-items - full rigor here.</p></li><li><p>I&#8217;m calling it the &#8216;DeejDrone 9000&#8217; (again, not typical naming for the industries in question, but again: LLM&#8217;s got me covered on why the market will love it).</p></li><li><p>I&#8217;m insisting on making it biodiesel-powered-only (tagline: &#8220;she runs on corn&#8221; because enterprise aviation folks love HIMYM deep cuts? I bet?) and we&#8217;re going to paper over any questions about specific energy and setting up a new fuel supply chain and managing for weather conditions and performance at altitude because it&#8217;s trivial to do all of those things from the comfort of my browser window<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>.</p></li><li><p>I get a <em>great</em> PRD. Beautiful structure, great prose, notional PR for release, excellent detail all around<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. Paste into Word, slap on company letterhead, print 1000 copies, let&#8217;s build this thing, right?</p></li></ul><p>I&#8217;m narrating as I go, and by this point, it&#8217;s obvious to everyone that my DeejDrone 9000 is not a great idea. The top-level &#8220;why&#8221; is reductive: &#8220;because you&#8217;re being ridiculous.&#8221;</p><p>But what made this approach ridiculous? 
Why does taking this cutting-edge tech and applying it to a tried-and-tested product methodology (adapted specifically to this company and LOB and vetted for rigor) result in such a garbage outcome?</p><p>And if this process was extreme, does it mean that <em>all</em> <em>variants on this process</em> are equally ridiculous? Where&#8217;s the bright line on doing this work the right way?</p><h2>Role reversal leads to role erosion</h2><p><a href="https://www.centaurprise.com/p/mapping-the-messy-middle">Last time</a>, we explored how sales reps spend hours copying AI content between systems while agents write strategic plans. Today, I want to talk about <em>why</em> this keeps happening and what to do about it.</p><p>This is the root of why most companies are getting AI backwards and why it hurts more than they think. Sales reps spend hours copying AI-generated content between systems while agents write their strategic plans. Engineers review thousands of lines of generated code while AI makes their architectural decisions. PMs generate beautiful PRDs with Claude while missing what their customers actually need.</p><p>We know we&#8217;ve role-reversed. 
We&#8217;ve handed the thinking to the machines and kept the typing for ourselves. One dynamic we <em>didn&#8217;t</em> touch on last time: the more we hand over the thinking, the more we cut the thinking out of the process altogether<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>. That's not just wasting time, it&#8217;s actively eroding capability. That&#8217;s real damage.</p><p>Over the past few months at <a href="https://gigue.ai">gigue</a>, I've been iterating on a framework I've used in the past to understand why this keeps happening. It's helped me spot the difference between real work and productivity theater and show it to others. These days it's helped folks figure out where AI actually belongs in the work stack.</p><p>Hopefully it&#8217;ll help you too.</p><h2>What our artifacts are supposed to be doing for us</h2><p>We're getting dangerously good at generating artifacts - code, images, documents - without understanding what for.</p><p>Software engineering isn't generating variables and brackets and semicolons; it's building a shared, intricate understanding of a problem that transcends the theory or discussion of how you <em>might</em> solve it<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>. The same is true for AI image, video, and audio generation: art isn't arranging pixels or rocks or notes, it's first and foremost a way of conveying meaning and emotion that&#8217;s designed to end-run the mental barriers we put up to hearing those messages spelled out in literal, logical form<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a>. </p><p>The rest of knowledge work is similarly deeper than the sum of its artifacts. 
Product management and enterprise sales aren't document creation; they're discovering what customers actually need, connecting those needs with what the business can deliver, and playing that back to customers in their own terms<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a>. Writing documents and updating CRMs is increasingly taking over folks&#8217; time, but it&#8217;s still not the most important part of the job, and it&#8217;s still a <em>lagging indicator</em> of doing your job right<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a>.</p><p>The backwards approach? Let AI generate different versions for different people, and then-and-only-then try to extract the core meaning from the outputs. Even when you get what seems like a coherent set, this is like getting a random assortment of pre-baked dishes and <em>then</em> trying to put together recipes for a themed meal. </p><p>Little late, no? You already have the final results, and you may not even be able to tell all the ingredients that went into them, so it&#8217;s silly to back into the lowest common denominator and pretend you meant all of this to happen from the start.</p><p>It&#8217;s <em>Codenames: Career Edition</em><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-8" href="#footnote-8" target="_self">8</a>.</p><p>And that&#8217;s <em>if</em> your team put together coherent AI outputs. Way more often, five great engineers generate ten totally different pieces of software in Cursor and no one has the first idea which one the team was supposed to be building.</p><p>Central to the problem is that coherence in practice depends on longitudinal context, something that our existing context windows make very difficult (if possible at all) and very expensive (when possible). 
When we write code, generate images, draft documents, or update databases, there are long-running threads and ideas that exist in time and space, outside of the scope of any agent&#8217;s context window, but in the heads of the humans tracking the program, project, or deal that the artifacts are about. </p><p>Not only are the agents not able to hang onto these threads in any detail outside a given session&#8230; you don&#8217;t really want them to, otherwise you start burning large amounts of cash on every API call. You compact and hope the important parts make it into the summary, or you reset and take each task as a wholly new challenge, reforming new context windows each time, accepting the context loss as you go.</p><p>All of this means that when you tap agents to build documents as a proxy for doing the job that emits those documents as byproducts, you put yourself in danger of creating artifacts for their own sake. You&#8217;re LARPing doing business.</p><h2>Products and byproducts</h2><p>So this comes back to the example of the PRD. Why does the PRD I wrote not work? (Beyond the fact that it&#8217;s ridiculous?)</p><p>It&#8217;s because the job of a PM is not to write PRDs. (Hopefully this is what the PMs reading this have been yelling for the past few paragraphs.)</p><p>PRDs are a <em>byproduct of PMs doing their job.</em> The fact that I can use a word combobulator to fill one in &#8220;convincingly&#8221; means absolutely nothing when it comes to actually building something people want. I didn&#8217;t do the job.</p><p>A PM&#8217;s job is to figure out what the customer wants - what the customer says they want, what they actually need, what they&#8217;d like to have, what they think they want but would actually not use or even hate<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-9" href="#footnote-9" target="_self">9</a> - and then figure out how the business should go about building that thing. 
Or - in many cases - to recognize that we shouldn&#8217;t build it at all.</p><p>PMs take all this synthesized information and structure it into a coherent plan the business can process, aligning with stakeholders, checking feasibility, knife-fighting prioritization, and generally realizing an amorphous &#8220;concept-of-a-plan&#8221; that can turn into real value.</p><p>But because businesses don&#8217;t run well on amorphous stuff, and people don&#8217;t remember the details of amorphous stuff well, PMs document. That&#8217;s what the PRD is. The PRD is not the key output. It is an artifact that keeps people from forgetting or mishandling the key output. It&#8217;s not the Holy Grail; it&#8217;s the Ark of the Covenant<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-10" href="#footnote-10" target="_self">10</a> - it&#8217;s what&#8217;s inside that counts.</p><h2>What counts</h2><p>The DeejDrone 9000 isn't going into production. (Sorry to disappoint). But it clarifies something crucial: we can generate perfect-looking artifacts without any of the thinking that's supposed to go into them. We can create PRDs without product sense, code without architecture, emails without human empathy.</p><p>The more we hand over the thinking, the more we cut thinking out of the process altogether. That's not just inefficient, it's actively making us worse at our jobs. It&#8217;s role erosion, skill rot.</p><p>Next time, I'll delve into how to figure out which documents and activities &#8220;count&#8221; and which ones don&#8217;t. I&#8217;ve been refining a framework for a while now that explains exactly which parts of your job deserve attention (human or agentic) and which don't. 
There are a couple of &#8220;gotchas&#8221; I&#8217;ve picked up over time that will hopefully help.</p><p>See you then.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Not that there aren&#8217;t perfectly viable cargo drones capable of using biodiesel - there are - but this is <em>not</em> how you go about designing one.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p><a href="https://github.com/centaurprise/deej-drone-9000/blob/main/DD9K_PRD.md">For your enjoyment</a>, recreated and anonymized here.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>No, &#8220;thinking mode&#8221; on an LLM doesn&#8217;t substitute.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>The functions and types and classes of it all are a byproduct of this process, and the quality of the end result depends not on pure beauty of the syntax (which is why SWE teams are not evaluated by the sum of their LeetCode scores) but on the ability of the code to map the shared understanding of the problem and the ability of the team to map the problem to the customer&#8217;s reality with <a href="https://curtclifton.net/papers/MoseleyMarks06a.pdf">minimal excess complexity</a>. CC and Codex are great, but they&#8217;re not checking this for you. 
This is not to say that agentic codegen is bad: it is to say that agentic codegen in the absence of software engineering is bad. I&#8217;m not alone in thinking this: if you read some of the &#8220;tips and tricks&#8221; on r/vibecoding, for example, there&#8217;s a slew of folks trying out techniques for rigor in planning, architecture, orchestration, control, CI/CD, QC&#8230; essentially, re-inventing software engineering from the ground up to get their vibe-coded results to actually work. They&#8217;re realizing what many engineers have been saying for a while now: <a href="https://worrydream.com/refs/Brooks_1986_-_No_Silver_Bullet.pdf">writing the code is a small fraction of the time and effort involved in engineering</a>.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>Tools like Midjourney and Runway let you choose with high precision the lens and lighting and scenery of the generations you&#8217;re creating, but you&#8217;re fundamentally approaching the flow backwards: your choices in technique and style are driving the heart of the output, not the other way around. 
There&#8217;s a <em>ton</em> of good use cases for quickly generating images and videos and songs in industry - upscaling, interstitials, brainstorming and temp generation, remastering, procedural customization - but attempting to substitute for the soul of actual art isn&#8217;t one of them.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p>Obligatory <a href="https://medium.com/@stewart/we-dont-sell-saddles-here-4c59524d650d">We Don&#8217;t Sell Saddles Here</a> reference.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><p>It may not sound like it at first, but play it out and this is common sense: if you update your CRM to pass the qualification gate <em>before you take the qualification call</em>, that&#8217;s a problem.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-8" href="#footnote-anchor-8" class="footnote-number" contenteditable="false" target="_self">8</a><div class="footnote-content"><p><a href="https://codenames.game/">This game</a>, if you&#8217;re not familiar. Party game where you need to pick out common underlying concepts from unrelated words in as few turns as possible. The picture version in particular is a great way to learn how someone&#8217;s brain works.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-9" href="#footnote-anchor-9" class="footnote-number" contenteditable="false" target="_self">9</a><div class="footnote-content"><p>This is called the Voice of the Customer, and the gap between problems customers elucidate well and solutions they imagine poorly is startups&#8217; happy place. 
This is the nuance to the cliche Henry Ford quote: in real innovation space, customers are usually <em>great</em> at describing their problem and <em>terrible</em> at coming up with the right solution. That&#8217;s ok: that&#8217;s not their job, it&#8217;s ours.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-10" href="#footnote-anchor-10" class="footnote-number" contenteditable="false" target="_self">10</a><div class="footnote-content"><p>This is more an <em>Indiana Jones</em> reference than an actual discussion on religious iconography. Which I guess means the amorphous concept-of-a-plan / &#8220;spirit of the product&#8221; melts faces when you look at it directly? I don&#8217;t know, don&#8217;t extend the analogy too far.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Mapping the messy middle]]></title><description><![CDATA[The backwards world we're building]]></description><link>https://www.centaurprise.com/p/mapping-the-messy-middle</link><guid isPermaLink="false">https://www.centaurprise.com/p/mapping-the-messy-middle</guid><dc:creator><![CDATA[DJ Thompson]]></dc:creator><pubDate>Thu, 28 Aug 2025 17:18:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!bvIK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a66eea-677e-4a1b-a270-b7d564039182_699x698.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2><strong>The backwards world we're building</strong></h2><p>Last week I watched a VP of Sales spend forty minutes copying data between spreadsheets while, three tabs over, ChatGPT had just written her entire board presentation. 
She'll spend another hour this week checking if the AI got the numbers right, then an hour after that carefully copy-pasting each set of bullets into the right text box on PowerPoint and resizing them to fit.</p><p>This is the irony builders don&#8217;t really like talking about: we're using humanity's most advanced pattern-matching technology to write poetry while the humans who built careers on creativity and strategy are... playing clipboard. Feel the AGI.</p><p>Earlier in<a href="https://centaurprise.com/this-is-centaurprise"> the centaurprise launch post</a>, I argued we're missing the real opportunity by debating AI versus humans when the real wins come from AI with humans. Today I want to go deeper, because even some of the companies building the best human-AI interfaces are getting the allocation exactly wrong.</p><h2><strong>The spectrum that's blinding us</strong></h2><p>Here's how everyone seems to think about AI right now &#8211; as a single spectrum with two endpoints<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CklX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec0c56bf-9d3b-464f-994a-30b8d0226a4c_1592x330.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CklX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec0c56bf-9d3b-464f-994a-30b8d0226a4c_1592x330.png 424w, https://substackcdn.com/image/fetch/$s_!CklX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec0c56bf-9d3b-464f-994a-30b8d0226a4c_1592x330.png 848w, https://substackcdn.com/image/fetch/$s_!CklX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec0c56bf-9d3b-464f-994a-30b8d0226a4c_1592x330.png 1272w, 
https://substackcdn.com/image/fetch/$s_!CklX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec0c56bf-9d3b-464f-994a-30b8d0226a4c_1592x330.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CklX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec0c56bf-9d3b-464f-994a-30b8d0226a4c_1592x330.png" width="1592" height="330" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ec0c56bf-9d3b-464f-994a-30b8d0226a4c_1592x330.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:330,&quot;width&quot;:1592,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:303254,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CklX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec0c56bf-9d3b-464f-994a-30b8d0226a4c_1592x330.png 424w, https://substackcdn.com/image/fetch/$s_!CklX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec0c56bf-9d3b-464f-994a-30b8d0226a4c_1592x330.png 848w, https://substackcdn.com/image/fetch/$s_!CklX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec0c56bf-9d3b-464f-994a-30b8d0226a4c_1592x330.png 1272w, 
https://substackcdn.com/image/fetch/$s_!CklX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec0c56bf-9d3b-464f-994a-30b8d0226a4c_1592x330.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a><figcaption class="image-caption"><em>On the left: beach time under the watchful eye of our benevolent overlords. On the right: our benevolent overlords are in the trashcan, and we break out the quill and parchment again.</em></figcaption></figure></div><p>On one end, you've got the AGI believers. AI becomes superintelligent, solves everything, ends labor as we know it. We all sip mai tais on the beach while the machines handle it all. The mechanics? Hand-wavy. The transition? Fiat. The politics? Boring. The funding? Magic. The physical infrastructure? It'll maintain itself, apparently; it&#8217;s historically been so good at that.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rKV8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf629313-f900-4697-99b3-4f8568526469_976x910.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rKV8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf629313-f900-4697-99b3-4f8568526469_976x910.png 424w, https://substackcdn.com/image/fetch/$s_!rKV8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf629313-f900-4697-99b3-4f8568526469_976x910.png 848w, 
https://substackcdn.com/image/fetch/$s_!rKV8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf629313-f900-4697-99b3-4f8568526469_976x910.png 1272w, https://substackcdn.com/image/fetch/$s_!rKV8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf629313-f900-4697-99b3-4f8568526469_976x910.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rKV8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf629313-f900-4697-99b3-4f8568526469_976x910.png" width="400" height="372.95081967213116" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/af629313-f900-4697-99b3-4f8568526469_976x910.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:910,&quot;width&quot;:976,&quot;resizeWidth&quot;:400,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rKV8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf629313-f900-4697-99b3-4f8568526469_976x910.png 424w, https://substackcdn.com/image/fetch/$s_!rKV8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf629313-f900-4697-99b3-4f8568526469_976x910.png 848w, 
https://substackcdn.com/image/fetch/$s_!rKV8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf629313-f900-4697-99b3-4f8568526469_976x910.png 1272w, https://substackcdn.com/image/fetch/$s_!rKV8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf629313-f900-4697-99b3-4f8568526469_976x910.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Who puts out the beach chairs? Throws away the little paper umbrellas? 
Teaches kids the dangers of excess screen time? Don&#8217;t worry your pretty little head over it: it&#8217;s </em>our<em> turn to say &#8220;I&#8217;m afraid I can&#8217;t do that, Dave.&#8221;</em></figcaption></figure></div><p>Some folks on this end think the superintelligence loves us (ergo tiki drinks). Others think it destroys us (ergo death rays). But it&#8217;s two variations of the same premise: AI will do everything, humans will do nothing (just that one version of &#8220;nothing&#8221; is spicier than the other).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bvIK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a66eea-677e-4a1b-a270-b7d564039182_699x698.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bvIK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a66eea-677e-4a1b-a270-b7d564039182_699x698.png 424w, https://substackcdn.com/image/fetch/$s_!bvIK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a66eea-677e-4a1b-a270-b7d564039182_699x698.png 848w, https://substackcdn.com/image/fetch/$s_!bvIK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a66eea-677e-4a1b-a270-b7d564039182_699x698.png 1272w, https://substackcdn.com/image/fetch/$s_!bvIK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a66eea-677e-4a1b-a270-b7d564039182_699x698.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!bvIK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a66eea-677e-4a1b-a270-b7d564039182_699x698.png" width="400" height="399.4277539341917" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f0a66eea-677e-4a1b-a270-b7d564039182_699x698.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:698,&quot;width&quot;:699,&quot;resizeWidth&quot;:400,&quot;bytes&quot;:371180,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bvIK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a66eea-677e-4a1b-a270-b7d564039182_699x698.png 424w, https://substackcdn.com/image/fetch/$s_!bvIK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a66eea-677e-4a1b-a270-b7d564039182_699x698.png 848w, https://substackcdn.com/image/fetch/$s_!bvIK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a66eea-677e-4a1b-a270-b7d564039182_699x698.png 1272w, https://substackcdn.com/image/fetch/$s_!bvIK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a66eea-677e-4a1b-a270-b7d564039182_699x698.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft 
pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Is there a Polymarket on which AI tries to do this first? I feel like Ani specifically has to be in the lead, poor girl has seen some </em>stuff<em> by now.</em></figcaption></figure></div><p>On the other end, you've got the bubble hypothesis. These folks hold that the whole AI thing has been worthless, propped up by VC subsidies and hype. 
When the money runs out, we'll realize ChatGPT was just autocomplete with good marketing, and we'll go back to whatever we were doing in October 2022.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zWze!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f81a61e-1e7b-4b44-9a60-e4f05f77cec2_976x910.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zWze!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f81a61e-1e7b-4b44-9a60-e4f05f77cec2_976x910.png 424w, https://substackcdn.com/image/fetch/$s_!zWze!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f81a61e-1e7b-4b44-9a60-e4f05f77cec2_976x910.png 848w, https://substackcdn.com/image/fetch/$s_!zWze!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f81a61e-1e7b-4b44-9a60-e4f05f77cec2_976x910.png 1272w, https://substackcdn.com/image/fetch/$s_!zWze!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f81a61e-1e7b-4b44-9a60-e4f05f77cec2_976x910.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zWze!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f81a61e-1e7b-4b44-9a60-e4f05f77cec2_976x910.png" width="399" height="372.01844262295083" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8f81a61e-1e7b-4b44-9a60-e4f05f77cec2_976x910.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:910,&quot;width&quot;:976,&quot;resizeWidth&quot;:399,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zWze!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f81a61e-1e7b-4b44-9a60-e4f05f77cec2_976x910.png 424w, https://substackcdn.com/image/fetch/$s_!zWze!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f81a61e-1e7b-4b44-9a60-e4f05f77cec2_976x910.png 848w, https://substackcdn.com/image/fetch/$s_!zWze!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f81a61e-1e7b-4b44-9a60-e4f05f77cec2_976x910.png 1272w, https://substackcdn.com/image/fetch/$s_!zWze!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f81a61e-1e7b-4b44-9a60-e4f05f77cec2_976x910.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Ok, maybe a </em>bit<em> dramatic, but who knows? Maybe inkwells and blotters make a comeback.</em></figcaption></figure></div><p>This seems... unlikely at this point. I can run Ollama with an open-source model on my laptop right now, hooked into Zed, as my own open-source copilot. For free. Forever. Even if every AI company imploded tomorrow, that capability isn't going away. Some genies don't go back in bottles.</p><p>But here's the thing: this whole spectrum is wrong. It's not about how <em>much</em> AI we use. 
It's about <em>what</em> we use it for.</p><h2><strong>Opening the second dimension</strong></h2><p>Instead of a line, think of it as a map.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Wio9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ad6c899-c597-4959-9964-a0a3e99806ce_1600x1099.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Wio9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ad6c899-c597-4959-9964-a0a3e99806ce_1600x1099.png 424w, https://substackcdn.com/image/fetch/$s_!Wio9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ad6c899-c597-4959-9964-a0a3e99806ce_1600x1099.png 848w, https://substackcdn.com/image/fetch/$s_!Wio9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ad6c899-c597-4959-9964-a0a3e99806ce_1600x1099.png 1272w, https://substackcdn.com/image/fetch/$s_!Wio9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ad6c899-c597-4959-9964-a0a3e99806ce_1600x1099.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Wio9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ad6c899-c597-4959-9964-a0a3e99806ce_1600x1099.png" width="1456" height="1000" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9ad6c899-c597-4959-9964-a0a3e99806ce_1600x1099.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1000,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Wio9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ad6c899-c597-4959-9964-a0a3e99806ce_1600x1099.png 424w, https://substackcdn.com/image/fetch/$s_!Wio9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ad6c899-c597-4959-9964-a0a3e99806ce_1600x1099.png 848w, https://substackcdn.com/image/fetch/$s_!Wio9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ad6c899-c597-4959-9964-a0a3e99806ce_1600x1099.png 1272w, https://substackcdn.com/image/fetch/$s_!Wio9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ad6c899-c597-4959-9964-a0a3e99806ce_1600x1099.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>So, our original left-right spectrum is still there (plus the death ray version, just for fun). Vertically, we have optimization: unoptimized at the bottom, optimized at the top. We&#8217;re pretty close to the bottom on most use cases.</em></figcaption></figure></div><p>The vertical axis is optimization &#8211; are we using AI and humans for what they're each genuinely good at? The horizontal axis is the first spectrum from before &#8211; how much are we actually using AI versus humans?</p><p>On the top, you've got a strengths-based future. Humans handle trust, creativity, and judgment. AI handles pattern matching, parallel processing, and repetitive precision. Opportunity cost is minimized; adaptability and flexibility are maximized. 
Like the centaur chess teams I wrote about &#8211; humans providing intuition, computers providing calculation &#8211; both sides adapt to optimize for their strengths relative to each other, and when they do, the combined team is way stronger than its homogeneous rivals.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OXlQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a6da5b-9b6c-45d1-81ff-ee3235324700_1332x400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OXlQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a6da5b-9b6c-45d1-81ff-ee3235324700_1332x400.png 424w, https://substackcdn.com/image/fetch/$s_!OXlQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a6da5b-9b6c-45d1-81ff-ee3235324700_1332x400.png 848w, https://substackcdn.com/image/fetch/$s_!OXlQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a6da5b-9b6c-45d1-81ff-ee3235324700_1332x400.png 1272w, https://substackcdn.com/image/fetch/$s_!OXlQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a6da5b-9b6c-45d1-81ff-ee3235324700_1332x400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OXlQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a6da5b-9b6c-45d1-81ff-ee3235324700_1332x400.png" width="1332" height="400" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/67a6da5b-9b6c-45d1-81ff-ee3235324700_1332x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:1332,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OXlQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a6da5b-9b6c-45d1-81ff-ee3235324700_1332x400.png 424w, https://substackcdn.com/image/fetch/$s_!OXlQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a6da5b-9b6c-45d1-81ff-ee3235324700_1332x400.png 848w, https://substackcdn.com/image/fetch/$s_!OXlQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a6da5b-9b6c-45d1-81ff-ee3235324700_1332x400.png 1272w, https://substackcdn.com/image/fetch/$s_!OXlQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a6da5b-9b6c-45d1-81ff-ee3235324700_1332x400.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>We optimize for putting the most creative folks on art, and support them with AI that handles the mundane parts of that work. Same for social, strategic, and other tasks. Speaking of strengths, drawing is clearly not one of mine.</em></figcaption></figure></div><p>On the bottom edge &#8211; where I think we are right now &#8211; you've got role reversal. AI drafts beautiful images and sketches strategic documents while humans do the digital dishwashing. AI writes the poetry and architects the systems while humans copy-paste between AI windows that don&#8217;t talk to each other. 
We've given our most interesting problems to pattern matchers while using human intelligence as middleware.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EYHt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F206d385f-98c5-4a19-a15f-cb6e424fcc9d_1332x400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EYHt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F206d385f-98c5-4a19-a15f-cb6e424fcc9d_1332x400.png 424w, https://substackcdn.com/image/fetch/$s_!EYHt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F206d385f-98c5-4a19-a15f-cb6e424fcc9d_1332x400.png 848w, https://substackcdn.com/image/fetch/$s_!EYHt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F206d385f-98c5-4a19-a15f-cb6e424fcc9d_1332x400.png 1272w, https://substackcdn.com/image/fetch/$s_!EYHt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F206d385f-98c5-4a19-a15f-cb6e424fcc9d_1332x400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EYHt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F206d385f-98c5-4a19-a15f-cb6e424fcc9d_1332x400.png" width="1332" height="400" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/206d385f-98c5-4a19-a15f-cb6e424fcc9d_1332x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:1332,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EYHt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F206d385f-98c5-4a19-a15f-cb6e424fcc9d_1332x400.png 424w, https://substackcdn.com/image/fetch/$s_!EYHt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F206d385f-98c5-4a19-a15f-cb6e424fcc9d_1332x400.png 848w, https://substackcdn.com/image/fetch/$s_!EYHt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F206d385f-98c5-4a19-a15f-cb6e424fcc9d_1332x400.png 1272w, https://substackcdn.com/image/fetch/$s_!EYHt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F206d385f-98c5-4a19-a15f-cb6e424fcc9d_1332x400.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>In so many cases right now, we get it backwards &#8211; we get uncanny-valley artwork and obvious AI-pretending-to-be-human correspondence, and we have to babysit the systems doing it.</em></figcaption></figure></div><p>It's absurd when you spell it out. But look at your own day. How much of it is spent cleaning up after AI versus doing the strategic thinking only you can do?</p><h2><strong>It&#8217;s not like we satisfy demand</strong></h2><p>That's what drives me crazy about all the "AI will take all the jobs" talk. It assumes we've run out of problems worth solving.</p><p>Think about October 2022, before the ChatGPT launch. Were we living in a solved world? Had we fed everyone? Connected everyone to the internet? Given everyone an education? Handled climate change? Cured the diseases that plague us? Or even delivered the more mundane luxuries, like enterprise software that didn't make people want to throw their laptops out windows?</p><p>Of course not. There was, and is, <em>infinitely</em> more demand. 
The world needs <em>so much work</em> done on it. Work that isn&#8217;t getting done because humans are blocked doing other things. Work that requires capabilities we don't have. Work that needs both human judgment and superhuman reflexes or processing power or attention or endurance.</p><p>The constraint isn't <em>demand</em> for solutions. It&#8217;s <em>capacity</em> to deliver them.</p><p>Replacing humans with AI is zero-sum thinking. It assumes the pie is fixed, that the only question is who gets which slice. But that's never been how our economy works: we&#8217;ve always had a system that relies on and rewards growth. When we augment human capability &#8211; when we give people tools that amplify their strengths instead of replacing them &#8211; we expand what's possible.</p><p>A sales rep who doesn't have to spend 72% of their week on busywork doesn't become unemployed. They become capable of managing four times as many relationships, going four times deeper with each customer, solving problems they never had time to even see before. They close deals in the long tail of their pipeline &#8211; deals they ignore in the status quo. They grow the business.</p><h2><strong>What optimization actually looks like</strong></h2><p>I've been building this future at <a href="https://gigue.ai">gigue</a>, watching early adopters reorganize their work around strengths-based allocation. It's not theoretical anymore. I'm seeing it happen.</p><p>The sales team that uses AI to chase down follow-ups and next steps while humans focus on building relationships and understanding problems: they're not doing less work, they're doing different work. Deeper work. 
The kind that actually moves deals forward.</p><p>The engineering team that uses AI to generate test cases and scan for vulnerabilities while humans create features and design system architecture: they're not becoming obsolete, and they&#8217;re avoiding the skill rot you catch when you give yourself over to &#8220;the vibes.&#8221;</p><p>This isn't about efficiency. It's about capability. When you stop using humans as routers and start using them as thinkers, everything changes.</p><h2><strong>Dragging ourselves upward</strong></h2><p>The path from where we are (role reversal) to where we should be (optimization) isn't automatic. It requires deliberately rethinking every workflow, every job description, every assumption about who does what.</p><p>The centaur chess teams found this out. A 2022 study found what players already knew: amateurs with AIs were beating grandmasters with AIs, because the skillsets were different<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. The study also found something no one expected: the best solo AI engines were not the ones that dominated the centaur matchups. In fact, for both sides, being the best by yourself was unrelated &#8211; or even <em>negatively</em> related &#8211; to centaur performance<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>. Both humans and AI had to adapt to each other and reallocate tasks cleverly to win together<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>.</p><p>Most companies won't do this. They'll keep having AI write strategies while humans copy-paste, because that&#8217;s easy. It&#8217;s imagination bias: a one-for-one swap is the only future they can picture. They'll keep having models attempt creativity while knowledge workers handle digital busywork. 
They'll keep operating in that bizarre bottom quadrant because it's easier than questioning the fundamental architecture of work and opening the can of worms that comes with it.</p><p>But some companies will figure this out. They'll realize that the question isn't "how do we use AI to do human jobs?" but "how do we use AI to do the jobs humans never should have been doing in the first place?"</p><p>They'll map every task to its optimal processor. Find the tasks they didn&#8217;t know were there but have been tripping them up all along. Pattern matching to AI. Relationship building to humans. Parallel processing to AI. Creative leaps to humans. Repetitive precision to AI. Contextual judgment to humans.</p><p>And slowly, iteratively, they'll drag themselves up into that top quadrant where both humans and AI are doing what they're actually good at.</p><h2><strong>The mission in the messy middle</strong></h2><p>Scott Belsky had this phrase "the messy middle" for that grinding, oscillating period between initial excitement and eventual success. That's where we are with AI right now. The demos were exciting. The AGI dreams are far away<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a>. We're in the messy middle, trying to make this stuff actually work.</p><p>My bet is that the winners will be the ones who stop debating replacement and start optimizing allocation. Who stop having philosophical debates about AGI and start building systems where humans and AI each play to their strengths.</p><p>&#8216;Cause here's the thing: we don't need AGI to transform work. We just need to stop using vanilla AI backwards. Stop having it write poetry while we copy-paste. Stop having it attempt strategy while we babysit its outputs. Stop pretending it's human while using humans as machines.</p><p>The future isn't AI doing everything or AI doing nothing. It's AI and humans each doing what they're built for. 
That's not a compromise: it's optimization.</p><p>And it's time we started building toward it.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.centaurprise.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.centaurprise.com/subscribe?"><span>Subscribe now</span></a></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>So I feel like I owe Randall Munroe a citation here because, when I uploaded my crappy stick figure sketches into ChatGPT and said &#8220;clean this up,&#8221; I decidedly got proto-XKCD as an output. Not really my intent but I&#8217;m sure those comics are in the DALL-E pre-training dataset so&#8230; go read <a href="http://xkcd.com">xkcd.com</a> if you haven&#8217;t already? I certainly have been for the past two decades.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>&#8220;<strong>These results indicate that the new human&#8211;machine capabilities are unrelated, or even negatively related, to humans' traditional chess playing capabilities&#8211;the capabilities that machines substitute</strong> &#8230; Paradoxically, centaur chess players must <em>not</em> rely on their chess playing capabilities, which are inferior to those of the machine, but use other capabilities that allow them to complement the machine's capabilities. Leading players are therefore often not highly rated chess players, but computer engineers with modest chess capability, who approach the game from a computational point of view (Cassidy, 2014). 
Similarly, the ability to select and tune chess engines in both centaur and machine chess is associated with general data science and creative capabilities rather than with specific chess playing capabilities (Cowen, 2013, p. 86).<em><sup> </sup></em>To develop new complementary resource bundles, <strong>humans must have a flexible ability to go beyond domain-specific expertise when structuring, bundling, and leveraging resources</strong> (Sirmon et al., 2007)... As our findings show, <strong>it is therefore unlikely that actors who outperform in terms of traditional domain-specific capabilities are also able to do so in changed contexts where new and unrelated capabilities determine competitive advantage</strong>.&#8221; [emphasis mine] https://sms.onlinelibrary.wiley.com/doi/10.1002/smj.3387 </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>&#8220;The results of the Bayes factors (BF), the relevant Bayesian Information Criteria (BIC), and the <em>R</em><sup>2</sup> values suggest strong (BF&#8201;&gt;&#8201;20) to very strong (BF&#8201;&gt;&#8201;150) <strong>evidence of the absence</strong> of non-trivial effects &#8230; In other words, <strong>human and machine chess playing capabilities have no material impact on chess performance in centaur and engine tournaments</strong>.&#8221; [emphasis mine] https://sms.onlinelibrary.wiley.com/doi/10.1002/smj.3387 </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>I mentioned this one before - when you create &#8220;ensemble models&#8221; of humans and AIs together and run evals, the &#8220;centaur evals&#8221; wind up testing different behaviors than if you test each in isolation. 
(Sounds intuitive when you say it, but I see almost no one building these eval sets in practice). <a href="https://digitaleconomy.stanford.edu/wp-content/uploads/2025/06/CentaurEvaluations.pdf">Position: AI Should Not Be An Imitation Game: Centaur Evaluations | Stanford Digital Economy Lab</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>A huge part of the problem with &#8220;AGI&#8221; is that it has no technical definition (just &#8220;human-like capability or better&#8221;), so the goalposts can move on a whim. <a href="https://time.com/7205596/sam-altman-superintelligence-agi">AGI can exceed any standard of intelligence we&#8217;ve ever known</a>; <a href="https://dcahn.substack.com/p/reasonable-persons-agi">AGI could just be anything better than employees on PIP</a>. It&#8217;s hard to say where we are on the trajectory when there&#8217;s no clear destination.</p></div></div>]]></content:encoded></item><item><title><![CDATA[This is centaurprise]]></title><description><![CDATA[Kicking this off]]></description><link>https://www.centaurprise.com/p/this-is-centaurprise</link><guid isPermaLink="false">https://www.centaurprise.com/p/this-is-centaurprise</guid><dc:creator><![CDATA[DJ Thompson]]></dc:creator><pubDate>Wed, 27 Aug 2025 23:51:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!vaou!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe98d5d5-c480-4926-a05e-da918a911bd9_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2><strong>Tl;dr</strong></h2><p>The best businesses are built on arbitrage. 
Distribution, data, regulation, talent, timing &#8212; if you see the leverage, or the <em>shift </em>in leverage, before others, you win.</p><p>Right now, there&#8217;s a massive arbitrage opportunity forming around <em>how</em> work happens. Not just tools or features, but the actual architecture of companies.</p><p>That&#8217;s happening now, and I started <strong>centaurprise</strong> to keep track of it. Not the hype, not the hand-waving or the hand-wringing: there&#8217;s already plenty of that going around. The actual changes in how companies are built, how teams are structured, how decisions get made when humans and AI start working together.</p><p>My plan is to share what I&#8217;m seeing here - at <a href="https://gigue.ai">gigue</a>, and in the companies we work with - as a way to start conversations and work through the details of these transitions. </p><h2><strong>Truths and fictions</strong></h2><p>I've been building ML systems since 2011. Managing enterprise accounts since 2016. For much of the past year, I've been building gigue, a multiplayer IDE for non-technical work that assumes humans and AI are teammates, not competitors.</p><p>We're not theorizing about hybrid work. We're shipping it.</p><p>Companies are quietly reorganizing themselves around human-AI teams. Not just in their marketing slides or investor decks, though for sure there&#8217;s a lot of that going around, but in their actual, daily operations.</p><p>There are sales reps orchestrating AI agents to handle discovery calls, and others leaning on agent coaches to help them hone their discovery talk tracks. Engineers pair-program with models that suggest entire architectures, or outsource their architecture decisions entirely. Support teams where AI handles triage while humans handle empathy, and support teams where AI takes the heat when the humans are out of emotional bandwidth.</p><p>There&#8217;s no playbook for this yet.
The patterns are just starting to emerge.</p><p>I&#8217;ve taken to calling these companies "centaurprises"<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> &#8211; part human, part AI, figuring out how to be more than the sum of their parts<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.centaurprise.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption"><strong>centaurprise</strong> is a new publication - we&#8217;re figuring out what works and what&#8217;s most helpful. to support us and influence our direction, consider subscribing here:</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2><strong>The hybrid reality we&#8217;re scraping together</strong></h2><p>Here's what's cropping up, piecewise, bit-by-bit across the industry:</p><p>That competitor's blog post that actually caught your attention? Written by a human, edited by AI, fact-checked by a human, formatted by AI. The contract you just signed? AI flagged the non-standard terms, humans negotiated the edges, AI tracks compliance, humans manage the relationship.</p><p>75+% of businesses are using AI; 95+% of their AI projects are failing. 
But the ones that are succeeding are getting really interesting, and they don&#8217;t look anything like the hype pieces on LinkedIn or Twitter.</p><p>We keep having this exhausting debate about AI <em>versus</em> humans when the thought leaders are quietly using AI <em>with</em> humans. We catastrophize or fantasize about replacement while we're actively doing augmentation. We're so busy arguing about a future imperfect that we're missing what's happening right in front of us.</p><p>The sales email that made you respond? Drafted by an LLM scooping up the latest context from the org, personalized and fact-checked by a human who knows you and your industry, timed by an AI that knows the best delivery times by channel and time zone and industry and audience demographic.</p><p>The ones you ignored? Either 100% AI without context and an uncanny-valley headshot, or 100% human without homework or a response SLA. Both fail for the same reason: they're not playing as a team.</p><h2><strong>Backing into the messy middle</strong></h2><p>Playing as a team has nothing to do with AI replacing humans or humans rejecting AI. We're already living in the messy middle.</p><p>The real problem? Most folks - the ones who <em>aren&#8217;t</em> trying to replace humans outright - are still doing it backwards. We've got AI assigned to creative strategy while humans copy-paste between systems. We've got models writing poetry while knowledge workers spend hours reformatting spreadsheets. We've got artificial intelligence tackling the highest-value work while human intelligence handles the digital dishwashing.</p><p>You&#8217;ve felt it, right?
That particular flavor of exhaustion that comes from cleaning up after an AI, clicking through each source to see if it even contains the cited data, checking if it hallucinated test results (or rewrote its hooks so it could fake them outright), or typing &#8220;continue&#8221; every ten minutes because it shut itself down out of confusion or context-window or token limitations.</p><p>That's not what you're good at. That's not what you should be doing. That's not even work: it's babysitting.</p><p>We&#8217;re paying massive opportunity costs to work the way we work now. This isn't the future of work. It's barely the present. We can do <em>so much better</em>.</p><h2><strong>We haven&#8217;t seen the future of work yet</strong></h2><p>The reason people miss the real goal: they think it's about efficiency.</p><p>Sure, any LLM can help you write emails faster. But that's like saying the internet just made libraries faster. The part that matters is that AI will change what <em>comprises</em> work.</p><p>We&#8217;ve seen this pattern before. When developers got IDEs, they didn't just write code faster. They wrote different code. They built things that weren't possible before. The tool changed the craft. The second- and third-order effects wind up washing out the short-term first-order benefits.</p><p>The same thing is about to happen to every knowledge worker. Not replacement. Evolution.</p><p>Sales reps won't just send more emails: they'll orchestrate complex, multi-threaded campaigns across dozens of stakeholders simultaneously, because they&#8217;ll abstract emails away.
Marketers won't just generate more content: they'll create personally adapted experiences for every single prospect, because they&#8217;ll abstract away the individual micro-updates to content.</p><p>The core creative and strategic processes will be more important than ever; the thousands of tiny adaptations and adjustments and personalizations that no one ever had the time for will suddenly become widely available.</p><p>This isn't about doing the same work with fewer humans. It's about doing work that was impossible when we only had humans, because we didn&#8217;t have the time, the capacity, the capability, the energy to do these tasks. Opening up a second, synthetic type of automation is way more useful when you optimize for opportunity cost rather than trying to replace the first type one-for-one.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a></p><h2><strong>Why &#8220;centaurs&#8221;?</strong></h2><p>In 1997, Garry Kasparov lost to Deep Blue. Chess was "solved." Humans were obsolete.</p><p>Except, no, that's not what happened.</p><p>Players invented "centaur chess" &#8211; humans and computers playing together<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>. And something weird happened: centaur teams beat both the best humans <em>and</em> the best computers. Not because the human was better at calculating moves (they weren't). Not because the computer understood strategy (it didn't). But because together they could do something neither could do alone.</p><p>The human provided intuition and pattern recognition. The computer provided calculation and consistency.
Together, they beat everyone, humans and AIs alike.</p><p>(Yeah, I know: pure AI has caught up in chess since then<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a>. Players and researchers warned as early as 2013 that the gap was narrowing or already gone, though studies as late as 2022<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a> still show some advantage to centaur and human-engine teams over pure-AI formats. Even still: chess is bounded, a finite solution space. Business isn't<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a>.)</p><p>Plus, I like puns. That&#8217;s what you&#8217;re really subscribing to.</p><h2><strong>What I&#8217;ll bring here</strong></h2><p><strong>Every week: </strong>Real notes, experiments, and frameworks from building gigue and working with companies navigating this transition.</p><p><strong>No: </strong>AGI predictions, consciousness debates, "10 ChatGPT prompts," or dystopian hand-wringing. I feel like those bases are covered.</p><p><strong>Yes: </strong>How to structure hybrid teams. Why AI agents need managers. What breaks when half your workforce isn't human. How to design products that both humans and agents use. Where the frontier is and why the technology does or doesn&#8217;t work at specific corner cases.</p><p>I'm writing from the middle of it &#8211; building gigue, working with early adopters, watching what actually ships versus what gets announced. The gap between those two is where the real insights live.</p><p>I'm writing for builders, operators, and anyone who's tired of the hype and wants to understand what's actually happening.</p><p>This is a blog with frameworks, not formulas. Better questions lead to better systems. 
Expect iteration and open questions.</p><p>Unlike our official gigue blog, I don&#8217;t have a full structure or tag system for this yet: I want to let this play out a bit and see where the community and the lines of thought take it organically before imposing too much structure.</p><p>Because the truth is, the playbook is still being written. The companies that win will be the ones willing to experiment in public, learn fast, and share what works in the interest of reaching a global optimum outcome down the line (rather than chasing the social media flavor-of-the-month).</p><p>So, welcome to <strong>centaurprise</strong>. The companies building the future are doing it in real-time. Let's compare notes.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.centaurprise.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.centaurprise.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><p><em>I won&#8217;t spend much time here advertising <a href="https://gigue.ai">gigue</a> - we have a separate blog, </em><strong><a href="https://gigue.ai/blog">context</a></strong>, <em>for that - but I&#8217;ll make references here and there, especially from customer conversations and field experiments. <strong>context</strong> is planned to be a lot more practical and focused on how to use our toolkit to do large enterprise sales better, right now. <strong>centaurprise</strong> is going to be more forward-looking and theoretical, &#8220;why we&#8217;re building&#8221; instead of &#8220;what we&#8217;re building.&#8221; That said - it&#8217;s early days, and I haven&#8217;t locked in formats yet. 
Open to feedback as we get going.</em></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>I promise this is the last post where I say the word &#8220;centaurprise&#8221; so much. But I&#8217;ll still say it a couple more times later on.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>While using the term &#8220;centaur&#8221; for a human-AI combo gets its origins from other fields, notably chess, you may be familiar with the business application from the <a href="https://www.hbs.edu/faculty/Pages/item.aspx?num=64700">HBS/Warwick/MIT study</a> on consultant performance with and without AI assist in 2023, where Ethan Mollick referenced the term &#8220;centaur&#8221; to mean delineated task allocation and &#8220;cyborg&#8221; to mean ubiquitous AI allocation when describing LLM augmentation strategies at work (summary <a href="https://www.oneusefulthing.org/p/centaurs-and-cyborgs-on-the-jagged">here</a>).</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>There&#8217;s too much here for just this post - I&#8217;ll revisit it later - but the idea that the superset of behaviors for &#8220;centaur evaluations&#8221; exceeds either the set of human or AI behaviors alone is important.
https://digitaleconomy.stanford.edu/wp-content/uploads/2025/06/CentaurEvaluations.pdf </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>Kasparov himself is credited with inventing the modern format - he called it &#8220;advanced chess&#8221; - but the &#8220;centaur&#8221; name pops up all over in a bunch of different contexts. In the &#8217;70s the idea was called &#8220;consultation chess,&#8221; which fortunately didn&#8217;t stick.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>https://gwern.net/note/note#advanced-chess-obituary, for one.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p>https://sms.onlinelibrary.wiley.com/doi/10.1002/smj.3387, worth reading in its entirety.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><p>There&#8217;s a lot to process in these two links, and a lot of others worth reading (on substack and elsewhere). I&#8217;m not enough of a chess expert to wax philosophical on the future of the formats, but it&#8217;s interesting that at the time of the 2022 paper classical matches were averaging ~80 turns, centaur matches were averaging ~100 turns, and engine matches were averaging 140 turns, with each format progressively more likely to end in a draw than the previous. I think this speaks to saturation of the solution space: fast games come from early errors (e.g.
grandmasters don&#8217;t lose to Scholar&#8217;s Mate) and while there&#8217;s something like 10<sup>120</sup> possible games, the fun thing about lopping off decision tree branches is just how exponentially quickly you can collapse that space such that a) the number of leaves is reasonable again and/or b) every remaining leaf is a draw.</p></div></div>]]></content:encoded></item></channel></rss>