Most years, HubSpot runs their annual conference INBOUND in Boston, close to their headquarters. This year, they dropped a small flood of orange paint into every transit station in San Francisco and ran it in Salesforce’s backyard1.
Some key themes coming out of the discussions, especially among execs:
Agent web traffic is growing really fast for some
Some businesses are consistently getting 25% of their inbound traffic from deep research (think Perplexity, ChatGPT Web Search, etc.). One person I spoke to said their ratio regularly spiked to 75%. That’s a sampling bias for sure - our crowd is much more likely to have AI folks deep-researching them than, say, a neighborhood deli - but still, the totals swung around fast.
AII and AIX aren’t anywhere near best practice
I’ve said it before: I find it funny that it’s taken us 50+ years to optimize for UI/UX, and we’re still not consistently great at it, but when we bring LLMs into the mix we expect them to just get whatever browser or app or outdated GUI we throw at them. Calling an agent interface “AI” feels deliberately obtuse, so I’ve taken to labeling these “AII/AIX” (AI Interface / AI Experience) as the counterpoint to UI/UX, but I don’t know how universal that is.
And survey says we’re far from being good at AII/AIX right now. Most agents have a lot of trouble parsing anything that isn’t pure HTML, but we’re not optimizing for those workflows. Lots of us have watched Browser Use or Operator struggle through complex UIs and RPA-type motions2. What are we supposed to be doing about it on the other end?
Plenty of buzz from GEO agencies and new companies like Parallel or /dev/agents, but the best idea I heard all week was setting up agent-specific microsites: point agents at them in llms.txt, keep them sparse and HTML-only, and/or front them with chatbot interfaces that get the inbound agent to stop clicking and just say what it wants. One person said that experiment popped their success rate from the mid-20%s to the mid-70%s (and the conversations told them more about what the searcher was asking for - more on that in a sec).
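For reference, llms.txt is just a markdown file served at the site root. A minimal sketch of what pointing agents at a microsite could look like - every name and URL here is invented for illustration:

```
# Acme Analytics
> Agent-friendly entry points for Acme’s products and pricing. Sparse, static HTML only.

## Agent microsites
- [Pricing](https://agents.acme-example.com/pricing.html): flat pricing table, no JavaScript
- [Product overview](https://agents.acme-example.com/products.html): one page per product line
- [Just ask](https://agents.acme-example.com/chat): skip the clicking - state what you need and get links back
```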
The tooling for agent web traffic is still nascent
Unlike searches on the major traditional engines, identifying the search someone made via agent is tricky, and backing into how common that search is turns into more of a guessing game than you’d like. People suggested screen replays, and some vendors are starting to offer deciphering this as a service, but it’s all much more opaque than before.
Revisit that solution around agent microsites. If you have sites that exist specifically for agents (list them in robots.txt/llms.txt, don’t provide (easy) UI access for users, promote them in GEO, and create a bunch that are very topical to the use cases you think folks are searching for), then your screen replays suddenly get much more useful.
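On the robots.txt side, you can at least express different rules for each of OpenAI’s published crawlers - GPTBot (training), OAI-SearchBot (search indexing), ChatGPT-User (live agent fetches). The paths below are invented, and as the next section covers, compliance is far from guaranteed:

```
# Illustrative robots.txt: different rules per OpenAI crawler
User-agent: GPTBot
Disallow: /                 # keep the training scraper out entirely

User-agent: OAI-SearchBot
Allow: /agents/             # index the agent microsite for GEO
Disallow: /app/             # not the JS-heavy app

User-agent: ChatGPT-User
Allow: /agents/             # live agent visits go to the microsite

Sitemap: https://www.acme-example.com/agents/sitemap.xml
```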
Even more so if you can get chat execution against whatever your call-to-action is: two LLMs talking will often make things up about whether they’re allowed to sell to each other and what the terms are, but the error rates are still much lower than best-in-class browser use in the wild. Providing carrots for agents to sort themselves makes it easier to figure out what the people behind the curtain really want.
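As a sketch of that carrot, here’s roughly what an intent-capture endpoint could look like - a minimal FastAPI service, with every field name and URL invented for illustration:

```python
# Minimal intent-capture endpoint for inbound agents (illustrative).
# Instead of making the agent click through a UI, ask it to state what
# it wants in one structured message, then log that for attribution.
from datetime import datetime, timezone

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class AgentIntent(BaseModel):
    agent_name: str                  # self-reported, so treat it as a hint
    task: str                        # free text: what the agent is trying to do
    on_behalf_of: str | None = None  # who the agent claims to represent

@app.post("/agent/intent")
def capture_intent(intent: AgentIntent):
    # In production this goes to your analytics pipeline; print stands in.
    print(f"{datetime.now(timezone.utc).isoformat()} {intent.model_dump()}")
    # Answer with the few links an agent actually needs - no clicking.
    return {
        "pricing": "https://agents.acme-example.com/pricing.html",
        "docs": "https://agents.acme-example.com/docs.html",
        "next_step": "POST /agent/quote with your requirements",
    }
```

The logged `task` strings are the real payoff: they tell you what searchers were actually asking for, which is exactly the attribution data that’s otherwise so hard to come by.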
AI web crawlers are making attribution harder
If you follow independent-web spaces, you know LLM-builder web crawlers are extremely unpopular right now. They’re not great at adhering to robots.txt or llms.txt, they can bombard websites relentlessly, and - for anyone who makes their living off of ads or creative work - they represent a philosophical threat to a whole occupation or revenue stream, one that most AI companies aren’t responding to right now3.
Anyways, it’s not just a small-website-owner issue. If you’re trying to sort through whether the OpenAI hits you’re getting are scraping for training, indexing for GEO, or targeted access for an agent… well, that’s tricky, and the fact that the first one is so high-volume makes it hard to optimize the experience for the latter two. Folks are talking a lot about how to sort traffic (and how to deal with stubborn crawlers that don’t respond to being politely sorted). The attitude isn’t nearly as exasperated as, say, Hacker News, but it’s clear the major agentic-search players haven’t cultivated the ecosystem yet.
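A first-pass sort can at least key off the user agents OpenAI publishes for each traffic type. Here’s a toy Python version - the log format is made up, and the big caveat is that a user-agent string is self-reported and trivially spoofable:

```python
# Bucket "OpenAI" hits by published user agent (illustrative).
# Real verification needs reverse-DNS or published-IP-range checks on top;
# a user-agent string is a claim, not proof.
BUCKETS = {
    "GPTBot": "training-scrape",   # bulk crawling for model training
    "OAI-SearchBot": "geo-index",  # indexing for ChatGPT search results
    "ChatGPT-User": "agent-visit", # a user's agent fetching on demand
}

def classify(user_agent: str) -> str:
    for token, bucket in BUCKETS.items():
        if token in user_agent:
            return bucket
    return "other"

# Toy log lines (path, user-agent); the format is invented for the example.
log = [
    ("/pricing", "Mozilla/5.0; compatible; GPTBot/1.2; +https://openai.com/gptbot"),
    ("/agents/pricing.html", "ChatGPT-User/1.0; +https://openai.com/bot"),
    ("/blog/post", "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"),
]
for path, ua in log:
    print(f"{classify(ua):>15}  {path}")
```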
GEO as a trust vector for ABM
ABM and large enterprise sales have historically not cared much about SEO or website design. Take Scale’s website, for instance: unless you’re pretty well steeped in how AI is built, it’s not particularly easy to figure out exactly what Scale is going to sell you, especially for their high-end generative AI product lines. (Their Rapid self-serve lines are a bit easier to suss out.)
That’s not a dig on Scale, and it’s not an accident. It’s by design, and it’s common for most companies targeting customers of a large-enough size. You might have intent to buy, but Scale’s guessing that if you stumbled onto the site via Google search, you don’t really have capability to buy.
The customers who are capable are arriving via different channels that support higher trust for the buyer: referrals, previous contracts, existing partners and channels and integrations. Picking out any random data vendor via search engine is not a safe bet for the future of your RLHF or eval program: if you’re spending that level of money, you’d choose a different process.
GEO (“Generative Engine Optimization,” one of like five competing terms for how to push your company to the top of a Deep Research set of results) offers an interesting vector for changing that. As you build up a context base and personalization set with an AI (assuming you can hold onto it over time, which is a big assumption), you can actually create better trust with the results than you had with your search bar4.
Of course, if attribution stays this hard, large enterprise vendors still aren’t going to be particularly incentivized to optimize for GEO, just because of all the false positives. To get the confusion matrix in the right spot, they need some sort of “agentic handshake” going: there was pretty broad agreement that if that existed in a trusted form, it would cut out a lot of wasted qualification time up front.
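Nothing standardized exists yet, so purely as a sketch of one plausible shape: the agent signs a statement of identity and intent with a key the vendor can look up in some trusted registry. The registry is the hard, entirely hypothetical part - the Python below just shows the signing mechanics:

```python
# Sketch of an "agentic handshake": the agent signs identity + intent,
# the vendor verifies against a key from a registry it trusts.
# No such registry exists today - that's the unsolved part.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# --- agent side: sign a claim about who it is and why it's here --------
agent_key = Ed25519PrivateKey.generate()
claim = b"agent=acme-procurement-bot;intent=request-quote;ts=2025-09-12T17:00Z"
signature = agent_key.sign(claim)

# --- vendor side: verify with the agent's public key -------------------
public_key = agent_key.public_key()  # in reality: fetched from the registry
try:
    public_key.verify(signature, claim)
    print("handshake OK - skip the qualification dance, route to sales")
except InvalidSignature:
    print("unverified agent - treat as anonymous traffic")
```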
That’s a big hurdle to get over first: AI still isn’t particularly trusted right now. Hallucination and confabulation are still quite real, and e.g. “Gemini said ClickUp was better for us than Front” is not going to be convincing to an angry exec if it turns out that was the wrong call. AI doesn’t get fired, people do.
Still, digitizing trust signals around enterprise procurement has been an interest area for a lot of people, for a very long time. Expect to see some people at least try their hand here.
Three types of builders
Wrapping up with an observation on the presentations companies made over the course of the week. There were broadly three groups of people, based on the solutions they were implementing:
People who were automating away their headcount
People who were taking bureaucracy and waste processes and automating those to “give people time back”
People who were optimizing for what humans do best, what agents do best, and how to wring every last lead and dollar out of each hour and token they spent
Group 3 had the weirdest designs but the best growth, best engagement, best flywheel effects. You could see the technical folks and AI experts gravitate to those solutions; they sparked a lot more imagination and follow-on conversations. Group 3 also didn’t use much jargon compared to the other two: it felt like they had less to hide with their results.
It’s still early, but it’s clear there’s a bit of a shift in how people are approaching these solutions. For the better.
1. Ok, it’s supposed to be because they want to strategically realign with innovation around AI. But a month before Dreamforce… quite a few folks I spoke to are amused about the timing.
2. Just today I tasked GPT-5 Agent with going through an online book and copying the chapters over into a linked Google Drive - it should have been a simple exercise. When I timed myself, it took me ~15 seconds to do one chapter by hand, not counting parallel-tasking. Agent gave up after 25 minutes, having done 7 (8, if you count the duplicates) and having needed 5 separate interventions to get going again. Some of that is justified, some of that is agents just not being that far along yet, but are we really saying the best way for agents to interact with GDrive is via the same drop-down menus I have access to?
3. If you’re worried about losing your job or your livelihood, comments about “feeling the AGI” or “sustaining 20% cyclical unemployment” aren’t just unresponsive, they’re probably doing real reputational harm. Leaders here aren’t learning the lessons from the botched globalization rollouts: the same people who bought the West Wing-style “free trade is good … free trade creates jobs” mantras aren’t going to get fooled again so easily. They want a plan.
4. Assuming those results don’t get intermediated by ad buys, which is the reason the search bar stopped being trusted in the first place. It’s really hard to control the ad auction in such a way that it doesn’t ruin your credibility at the top of the ABM food chain.