In Person!!! Agent Payments, Friendly Acquisitions, GPT-5, Endless Model Names, and Many Free Ideas

Matthew Carey (00:00)
What's up? How are you, Matt? So we'll just start this by saying, by the way, I'm going to make actual eye contact with you — because we're in person. Yeah, I think it'll make the video worse, but we are an audio-first podcast, isn't that right? Yes. Until we decide to pivot to focus on Gen Z, in which case we'll be short-form-video first. Yeah. Sorry, sorry — I shouldn't touch my laptop.

Yeah — apparently you can put videos directly into Spotify. True. I haven't worked that one out yet. I don't think anyone actually cares, though. So if you do care, pop us an email. You know what's most important? What we care about. Yeah. Because this is a podcast for us, by us, with us — just like the constitution. The back of Will's laptop has amazing stickers on it. I just want to read some out, because

we don't normally get the chance to talk about this stuff. So it has a USA Triathlon sticker, which I would say takes up about a third of the back of his laptop. It's a big sticker. It's a huge sticker. Everything's bigger in America. Yeah. And then there's a massive sticker for a coffee company. There's a Cloudflare sticker. There's an MCP Everything sticker. That's apt. And then my personal favorite — there are loads more, but we're not going to read them all out — my personal favorite is a 'Beware, I'm Tapering' sticker,

which is another triathlon reference, which is very funny. Do you want to explain that? No, not really. I think people can Google that one. Really? Okay, alright, we won't explain it then. But yeah, I'm in London this week for a couple of things: friends getting married, another friend's 30th birthday — two friends' 30th birthdays, actually. And then I was also going to do a triathlon here on Sunday. The big London triathlon was on, which is maybe one of the biggest triathlons in the world — it has 6,000 people taking part.

Yeah, it's the T100, right? Which is huge — yeah, the T100. And I was planning to race it. The problem was, Jess and I were going to a festival in SF on Friday, and the next flight after that to London was on the Saturday — the following Saturday — because the festival is an all-day thing; you're done by 11 PM. And then the best flight I could get was one that would arrive at 7:30 AM.

Because all the flights from America back to London are red-eyes, right? They're always overnight. There's no flight leaving San Francisco at like 10 a.m. SF time that gets into London at, I don't know, midnight or the middle of the night or whatever. Anyway, the best flight I could get lands at Heathrow at 7:25 a.m. The final wave of the triathlon sets off at 10:10 a.m. It's about an hour from Heathrow

to the race. I had a friend ready to go and help me assemble the bike on the Lizzie line. So it could have totally worked out — but everything had to go right. And then everything started going right, because the flight actually got in at 6:55 AM — half an hour early. Perfect for me. I thought, hell yeah, let's go. I go to the bathroom and put on my tri suit. Then I walk out. I'm just imagining you putting your tri suit on in the Heathrow luggage collection lounge.

Full tri suit, guys — aero helmet, I'm ready, I'm prepped. I was wearing my calf sleeves as well, which is a very triathlon-recognized look. That's a look, that's a look. It is quite a look. Then I look at the tracking app — United has one — and it just said: yeah, we loaded your bike onto the next flight. So that was the end of that dream. And then I just waited in Heathrow for three hours. Not a fan of United, by the way. The staff are so unfriendly,

and it's just not great service. Although I will say, the best thing about United is they do an ice cream sundae after dinner. And it's a big ice cream. But guys, it's not the cookie. What do they get? You get afternoon tea on Virgin. Yeah, Virgin is the way. The afternoon tea on Virgin was so good. Wow, we are yuppies. But yeah, very nice to see you in person. We're at this incredible office today, which you've kind of organized, haven't you? Not really.

I've tagged onto something else that was organized. Yeah — Zoe from Dawn runs these co-working days, and they're running a hackathon this evening also. She's doing bits, actually, to get the dev community together in London. Every time I come to one of these things, I probably know half the people here, and the other half is completely new — and that's every time, which I think is relatively impressive. Yeah.

Never bet against a VC network. It's kind of amazing. Yeah, shout out to Zoe and Dawn Capital, who by the way have an incredible office. I forget how fancy VC offices can get, but I think this is actually one of the fanciest. The Accel office in London is fancy as well, for example, but not this fancy. There's wood paneling. And what are we drinking? Some cucumber drink, right? This is amazing, actually — it's a cucumber premium soda made by a company called Something and Nothing.

And I think it is both something and nothing. I would agree with them. The bathroom is warm like a spa. It's marble — marble everywhere. It's incredible. The plants are incredible. The coffee machines. They have some very, very incredible art here. I will tell you afterwards, because I think if I tell you what the art is right now, it's liable to cause a break-in — it's really crazy.

But yeah, they have some art in a certain area of the building that is wild. It's a good reminder. Yeah. It's interesting. I mean — lovely place. Lovely place to be, and a lovely day to hang out. Great roof terrace. Big roof terrace. Great views. We came here for AI Demo Days, I think two months ago, for the last one... Was it the last one? It might be the last one. Wow, I'm slacking. Yeah. Anyway. You're a busy man. And we were out on the roof terrace

at like 10 PM at night. Normally everyone goes home at this sort of time, but Dawn were super happy to let us stay for a bit longer. And the sun's going down, beautiful sunset, everyone's just chatting. It was amazing. We had like 50 people here until about 10 o'clock at night, which — I mean, that doesn't happen at any other event. You normally have like 10 or so stragglers who are still getting wasted. But here, that wasn't the case. It was just really good chats. Yeah, yeah. That's awesome.

Yeah, to be a VC. And by the way, I think the people here — I mean, it's only like 10 of us or something at this co-working day, but it's such a fun vibe. Good people, everyone's building something or whatever. It's one of the best things — you often get this on group holidays. So it's incredible that it's already happened in just the five or six hours of the day: you have this group dynamic where these shared memes start developing.

And we have this one guy here who joined some WhatsApp group chat or whatever, and for some reason was removed from the group chat — unclear why. And it turns out one of the admins of the group chat is here as well. And there's now this meme that he has to prove he's a real developer or whatever to get back into the group chat. I think it's actually just the story of: what did he do? Yeah, what did he do? What did he do to get kicked off this group chat? I think there are like 500 people in this group chat.

Right, right. So what did he do that was so bad? He was removed without recourse. Yeah. So yeah, I think we're all wondering about that. We've asked everybody. The poor guy has to defend himself, even though it's probably just some mistake. So yeah, it's just a really nice vibe — fun, very friendly. Yeah. And how's it been going with you, man? How's building? How's Colo going? Very good. Let's start with the lesser one —

actually, let's start with the one that's been on the back burner that we haven't spoken about for a while. How's SimplePoll doing? SimplePoll. SimplePoll is doing well, yeah. Plodding along. Yeah, I mean, there are not that many crazy updates. I added an API. I want to add an MCP server to it as well, so you can create polls. I have a funny story about SimplePoll. Go on. My Slack MCP server needs a team ID from Slack, which is actually a very awkward thing to find in Slack.

And I could never recall how to get it, but I knew that if I wrote /simplepoll debug, I would get a printout of my team ID. So I used SimplePoll to get the team ID. I'm glad it's useful in that way. It's so useful. Yeah — it's my Slack debugging tool. I'm happy to hear that. If you use Slack on the web, the team ID is also in the URL. Okay. Yeah. But I don't do that. That would be helpful. That would be good to know. Yeah. For sure.

But no, there's actually a bunch of stuff I want to build. I mean, there's for sure some way to build AI into SimplePoll. I was hacking on a thing, actually, where I was using Gemini 2.5 Flash to classify — based on the team name and a few other things — whether a workspace is a community or a business. Because

there are so many communities using Slack, and so many of them using SimplePoll, and we're never going to get New Jersey Pickleball to pay our business prices. So it'd be nice to let them use SimplePoll for free. But SimplePoll is installed on hundreds of thousands of workspaces, so it's way too much effort to manually classify them. If we do it with AI, though, we could give it to those people for free.
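As a rough sketch of what that classification could look like — assuming the google-generativeai SDK and a gemini-2.5-flash model id; the prompt, labels, and workspace signals here are illustrative, not SimplePoll's actual code:

```python
# Hypothetical sketch: classify a Slack workspace as community vs business
# with a cheap Gemini call. Assumes the google-generativeai SDK; the prompt
# and signals are illustrative only.
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")
model = genai.GenerativeModel("gemini-2.5-flash")

def classify_workspace(team_name: str, domain: str, member_count: int) -> str:
    prompt = (
        "Classify this Slack workspace as COMMUNITY or BUSINESS.\n"
        f"Team name: {team_name}\n"
        f"Email domain: {domain}\n"
        f"Member count: {member_count}\n"
        "Answer with exactly one word: COMMUNITY or BUSINESS."
    )
    answer = model.generate_content(prompt).text.strip().upper()
    return answer if answer in {"COMMUNITY", "BUSINESS"} else "UNKNOWN"

# e.g. classify_workspace("New Jersey Pickleball", "gmail.com", 4200)
# would presumably come back as "COMMUNITY"
```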

When they sign up to SimplePoll originally — they have a free trial sort of thing, don't they? Yeah, yeah, there is a free trial. Is there not a button that says 'if you're a community, press here', and that sends you a form? Then you could classify from that, rather than trying to classify a team name. I'm just thinking. I think that could also work, yeah. There is a way to specify that you are a community, but you're thinking: tell us about your community, or something like that. And then you can just...

Not manually — but then you can put a row in the database that says community: true. Yeah, yeah, yeah. No, that feels like another way of doing it as well. You were trying to just... Yeah. Are you thinking it's going to be too hard to classify based on just a name? Based on a name — so for instance, I'm part of a bunch of communities... no, I'm part of a bunch of

business communities. For instance — I mean, it's not very active now, but the LangChain community on Slack was essentially LangChain's internal Slack with a bunch of random people added for beta testing and stuff. Would you call that a business? Because I would call that a business, but 'community' is literally in the name of the team. Yeah. I mean, it's a good point. I think

I would almost... SimplePoll charges based on how large your company — your workspace — is. And I think even in those cases where it's a community run by a business, primarily to serve a business, there are just so many people in it that it becomes kind of unaffordable. Because they pay per user. Yeah — there are buckets you pay for. The cheapest is if you're under 20 people or something. And yeah, there are some workspaces on it with 100,000-plus people or whatever.

But then also, communities are great because they can spread across company boundaries, right? People see: wow, I can do this poll in Slack. So it's kind of a good funnel. So even though it's run by a business, it is kind of in our interest to maybe make it free there. Yeah. Okay. So then, can you make SimplePoll free for some channels and not others? Because the Hugging Face one — I was actually invited to their

personal Slack. I see, I see. Yeah — that is also theoretically possible; not something we have right now. Okay, but the LangChain one — you just join. Yeah, the LangChain one is like: they publish a URL somewhere and you just join with an invite link. Yeah. SimplePoll — always more to do there. It's kind of fun to work on, and it obviously fits into Colo really well, because it's a big Python code base, and Colo is all about big Python code bases.

Go on then — how's Colo? Colo, what's going on? Colo's good, yeah. I don't know, I think working with AI is just so different. One of the tropes that we go on about on this pod is that working with AI is a skill — a skill that you need to really build and learn. Some people can get so much more out of Claude Code, and other people really struggle with it. And I'm not sure I'm particularly good at it yet.

And I think a big reason for that is that it's a skill that needs to be learned — context engineering or whatever you want to call it. But I do think it makes building things with AI a little bit more frustrating than the old, pre-AI world of building software. If you had a basic understanding of programming and relational databases, you could have an idea and then just build it,

and you knew it was possible to build it. Maybe you'd hit some stumbling blocks along the way. But with AI, it's like: okay, I think this should be possible. And then you try it, and maybe the AI is not very reliable, or it doesn't do what you expect it to do. And then you're like: okay, I'm not sure how to make my prompt better, or how to get this working. Is this the analogy of: before you had cars, you had horses, and with horses you knew you could go 25 to 30 miles a day?

And you knew that was how far you could go. But now you have a car — maybe you can go a hundred miles, but maybe you go three miles and it breaks down at the end of the road. That kind of thing, yeah. Maybe. And you don't know how to fix the car, whereas the horse, you just fed it, right? Right. Oh, interesting. I think that does kind of work. I saw an analogy somewhere comparing the horse to

a very complicated spaceship that can go at light speed, but you don't know what any of the buttons do — though sometimes, if you press the right buttons, it goes somewhere useful. I don't know, I think that's exaggerating the AI thing. I mean, for sure. My point mostly is not that — obviously, I think this stuff is the future — but it's just more frustrating to build with. And I think that's kind of... Maybe it's like a train. You need rails.

You need signals. Maybe that's a better analogy. Let me tell you what I've been up to with Colo recently. As a reminder, the headline idea is for it to be Python vibe debugging. And if you go one level deeper, it's not just debugging. Sure, Colo can tell you what's wrong here — but it can also give you an understanding of the code base. Or specifically, give the AI an understanding of the code base. And it does that by doing some very deep Python tracing.
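For a flavor of what tracing like that means at the Python level, here's a minimal sketch using the standard library's sys.settrace hook — illustrative only, not Colo's actual implementation, which is assumed to be far more elaborate:

```python
# Minimal function-call tracer using the standard sys.settrace hook.
# Illustrative only — not Colo's implementation.
import sys

def tracer(frame, event, arg):
    if event == "call":
        # at 'call' time, frame.f_locals holds the function's arguments
        print(f"call   {frame.f_code.co_name} args={dict(frame.f_locals)}")
    elif event == "return":
        print(f"return {frame.f_code.co_name} -> {arg!r}")
    return tracer  # keep tracing nested calls in this frame

def add(a, b):
    return a + b

sys.settrace(tracer)
add(1, 2)
sys.settrace(None)
# prints:
# call   add args={'a': 1, 'b': 2}
# return add -> 3
```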

You enable it, and then as you execute your code, you see every single function that was called, every return value, the local variables — a lot of useful stuff. It's a lot of information, but AI is kind of good at going through a lot of information and seeing what's relevant. So the workflow I'm building towards is: you plug the Colo MCP server into your agentic loop — into your Amp or your Zed agent or your Cursor or

any client, any agent, that supports MCP servers — and then your agent is just better informed about the code that it's executing, because it can call the Colo MCP server. That all makes sense so far, right? Yeah. And then there's a lot to be figured out with this. It's still somewhat early in this workflow, and in making the workflow reliable. But one of the frustrating things has been that,

ideally, the way I'd want this to work is that as a user, I don't have to say every time: use the Colo MCP server to get the trace, then look at the trace. Just to name some of the tools: one tool we have is list recent traces, which shows you a little summary of what each trace was about — for example, you made a POST request to /api. And then you can get the detailed

version of that trace with another tool call. Or you can just say: give me the five most recent traces in detail. And you can go even one level deeper than that. Obviously the agent will do it if you say 'use the Colo MCP server', but that's kind of annoying. It should just work in the background, and the agent should just be able to use the data and see what's relevant.
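Tools like that could be defined roughly like this with the FastMCP helper from the official MCP Python SDK — the tool names and the in-memory trace store here are made up for illustration, not Colo's real API:

```python
# A rough sketch of Colo-style MCP tools, using FastMCP from the official
# MCP Python SDK. Tool names and the trace store are illustrative.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("colo")

TRACES = {1: "POST /api -> 200 (12 calls, 38ms)"}  # stand-in trace store

@mcp.tool()
def list_recent_traces() -> str:
    """Summarize the most recent execution traces."""
    return "\n".join(f"{tid}: {summary}" for tid, summary in TRACES.items())

@mcp.tool()
def get_trace_detail(trace_id: int) -> str:
    """Return the full call/return detail for one trace."""
    return TRACES.get(trace_id, "unknown trace id")

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport
```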

I guess the discoverability problem is very similar to the one Bun is facing at the moment. For anyone using Bun, the first thing Claude Code does is go: oh, you're in a TypeScript repo — npm install. So I guess I would follow what they're doing, and they just started with a really nifty rules file, I think. But I don't know — you're saying you need more than just a nifty rules file. Well, OK. In SimplePoll there is a bunch of rules around this, and

it works a little bit, but just not super reliably. But in theory, with MCP, you shouldn't need rules files to go along with your MCP server, right? The tool descriptions should kind of be enough. That is the promise. I don't think that's where we are today. Yeah, I think that's the promise. But in your case — so, I have a way for you. If you instrument the code base, do you even need MCP? Can't you — talk about how a CLI would work instead.

That's what Bun is trying to get around. Right — Bun is not an MCP at all. It's just a CLI. No, it's just a package. I mean, the way I think about it is that MCP — especially local MCP — is just wrapping CLI tools, but you get to give context about when to use it and how to use it and all that stuff. I saw Geoffrey, one of the Sourcegraph people, complaining about MCP and asking why it wasn't just a CLI. And I think we're going to see a lot of this.

And the big MCP proponent at the moment is David Cramer from Sentry. And I know they're releasing — well, they've released a bunch of MCP support, which is still very much in beta, but that'll be really cool when it works for you as well, right? OK, this is going down a different rabbit hole, maybe, but how do you feel about — Colo is something that someone's going to ship into their own code base, right?

If I'm using Colo, I install Colo into my code base. How do you feel about people putting telemetry into those packages — telemetry tracking? Some people I know — SpeakEasy, I know they use PostHog inside their SDK. Right. Or inside their CLI. Would you use something like that to understand the performance of your MCP server? And the telemetry goes into their...

Probably into yours — into your telemetry service. Oh, I see, I see. So you could understand the usage of an open-source package. Yeah. Well, Colo actually has kind of a concept of plugins. The way I've thought about this previously is: Colo does what I said earlier, in that it can see every function call and every function call return, and it shows you that. And it's very much focused on showing that in your own code.

But sometimes interesting shit happens in a library, right? So we have this idea of execution highlights. For example, when you make a SQL query or an outbound request or something, that's highlighted, even though it doesn't happen in your own code — at the end of the day, the actual connection is made somewhere else. That's cool. And then we have this plugin system where any library in which an interesting thing worth highlighting happens can bubble it up into the main trace, which can then be consumed by you, or by the AI.

But let me get — this thing I was thinking about, the MCP tracking. Because I imagine, if Colo starts being used by people and they're complaining that their MCP tools are not being used — you might have instrumented things so they work really well in SimplePoll, but then it doesn't work very well for other people. Yeah. How would you do that continuous improvement over time? I see. Okay. Well, I have a solution list for Colo

for how to get it to be used all the time, I think. And I'm building it out, and it's kind of working a little bit. So the question is: how do you get the agent to call the MCP tool every time? Because really, what I would want to happen is: as soon as the agent decides to execute some code — to test the code that it's written, or that it's trying to write — it calls Colo straight after that, right? Every time. What if, instead —

I'll give a simple example to start, I guess. What if, instead of the agent — let's say Claude Code — running a test and then checking Colo for the trace of that test, it just runs it through Colo? Yeah — it invokes Colo directly. It's like colo run, and then pytest.

Okay. And then Colo instruments pytest, obviously. Yeah. And then, on the return, it doesn't just give you the test results — it also gives you the trace of the test. Okay, yeah, that makes sense. So then the agent doesn't have to call the MCP server, because you've instrumented the code base properly for Colo. Well, yes. But all your scripts would be colo run — the same way that uv works, uv run and then... Exactly. Yeah,

the Python package. I think the thing that's novel about this is just that you get the trace back as well at the end, inline. Yeah, the wrapper. And what's cool about this is that the agent doesn't have to decide: oh, do I call another tool now, after the run? Yeah — you just connect it together. Yeah, because the agent is always going to want to run the code. So why does it have to be an MCP server at all? In that case it doesn't, does it? Yeah — I mean, OK, so far it doesn't. We've gone back to the CLI.

Yeah — it'd still wrap all the commands it was going to write in Colo. Exactly. Sweet. And now it's just a rules file, I guess, that says: instead of doing pytest, do colo run pytest; or instead of doing curl, do colo curl, or whatever. The cool thing about Claude is that you could actually enforce that, because you could say the only allowed command is colo run. Right, right, right. Yeah. And that's it.
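The wrapper itself could be as small as something like this — a hypothetical sketch, assuming the traced program writes its trace to a known file; the file location and format are invented here:

```python
# Hypothetical sketch of a `colo run` wrapper: run the wrapped command,
# then echo the latest trace inline so the agent never needs a second
# tool call. The trace file location and format are invented.
import subprocess
import sys
from pathlib import Path

TRACE_FILE = Path(".colo/latest-trace.txt")  # assumed location

def main() -> int:
    cmd = sys.argv[1:]            # e.g. `colo run pytest` -> ["pytest"]
    result = subprocess.run(cmd)  # the wrapped command runs normally
    if TRACE_FILE.exists():
        print("\n--- colo trace (inline, for the agent) ---")
        print(TRACE_FILE.read_text())
    return result.returncode

if __name__ == "__main__":
    sys.exit(main())
```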

The thing where MCP becomes interesting again is: not only can I output the trace information inline, so it goes straight back to the agent when it does colo run, but I can also put in further instructions to call MCP tools, right? So: if you want to dig into this line of the trace, you can just call the detail tool. There's something so much bigger there. There's something so much bigger there, because if all of your commands in a code base are prefixed with colo run, then Colo has actual knowledge of how Claude

is using the code base. If you instrument this in a repo, it could do that across every Claude Code instance running on that repo — so many developers, potentially. Now there's your enterprise offering. That's cool. Yeah, I like that. So you can also track — you can track how Claude is using the code base. That's really cool. Yeah, I like that. Which commands are being used, and which ones are potentially

not safe — and you could bubble that up in some enterprise-y way for a PM, like: colo run uv sync was run a hundred times this past week; that must mean we're, I don't know, pulling more branches or something. I don't know. That's interesting. Yeah, I think that could be pretty nice for getting deeper usage insight, I guess. Yeah.

I love how all of your tools are focused on a very similar idea, which is getting out the information that's locked up in people's brains. Yeah, man. Oh yeah, that's true. It's all about tightening the feedback loop. Yeah. I was always wondering — I thought they were so different, but that actually makes more sense now. And then the triathlon brand is really just about tightening the feedback loop of people knowing you're a triathlete. It doesn't have to come up 30 minutes into the conversation —

you just see it straight away. What do you mean by that? Says the guy wearing his calf sleeves around the office. That's slander. Slander. Oh, dear. So, I have a bit of news for you, actually. Oh yeah, go on. Did you see — I know we don't like talking about news on this. No, no, no, this is fine. But this one's very relevant. We haven't even talked about GPT-5 and all of that because... Oh, GPT-5 — later, later. Okay: Humanloop is joining the Anthropic team.

When they say 'joining', I'm always like: did they get acquired? Did they sell? Did they merge? I get very confused. But 'Humanloop is joining Anthropic' were the words Raza wrote. And this is very interesting for us, because Humanloop came out of UCL, where you're an alumnus. That's correct. Yeah. In London. Yep. And

one of my colleagues, actually — one of his professors was one of the co-founders. Interesting. And I think it was really interesting that they sold to Anthropic. I think it was also maybe Anthropic's first acquisition, if it was an acquisition. Yeah — or an acqui-hire. They're quite large to be an acqui-hire. They raised money at a pretty hefty valuation,

not that long ago. Yeah. I haven't followed super closely, but I like Jordan, the other co-founder besides Raza. I actually did EF with him back in 2019. Wow. And he was fun, because he was talking about AGI back then, and everyone was like: AGI? That seems kind of outlandish. And I think that was actually the first time, in that EF cohort, that I'd heard about GPT-2. Okay. And I was like: GPT-2? Okay, interesting, let me look into this. And I was like: oh wow, this is kind of novel. Yeah. But yeah...

I wasn't expecting to see this acquisition at all. This was the one I thought they were going to go to the grave with, because I really like how the Humanloop founders speak. I've heard Raza on a podcast — I'm pretty sure he has his own. I went to an event where he convinced me not to do a PhD, which was quite funny. Having done a PhD himself, he was like: no, no, no, no, no.

They're quite nice people, and so it's fun when they seemingly have good outcomes. Obviously, you never know how these deals go down. Yeah, I agree. I mean, I was never a user of their product, but it's cool to see. There's actually another company — two founders who we hung out with at the last OpenAI Dev Day, Context.ai — they went to OpenAI. Yeah, Henry Scott-Green. And Alex Gamble. Also out of UCL. They sold...

They joined OpenAI, in a similar wording of events. No idea if for a similar outcome. Yeah — Henry is doing bits at OpenAI now. He's creating a whole evaluation suite product inside OpenAI. So interesting. Evals have been a huge pain in building MCP servers. Yeah. And I think this is actually something I'm going to focus on next week: just building a bunch of really good

evals, I think, for Colo. We're actually working on that at StackOne at the moment — Will, who we hired, another PhD actually, is very into his evals. And I do find it funny that — the labs don't make many big acquisitions, or many acquisitions at all; whether these were big ones or not, I don't know. But they don't make many acquisitions at all, and it's interesting that the two we know about are both very eval-related acquisitions. And I find it interesting that,

if they are buying them for talent, they don't have that type of talent internally — or they don't have people who think they're that type of talent internally — which makes me think that everyone finds this really hard. And so these people get acquired because maybe they're selling a solution to a problem the labs have. Do they actually have the solution? We don't really know. Yeah — it's obviously a need that even the labs feel. I know, I know.

Yeah, I know we feel it quite strongly. I'm going to give you another theory for this. I think it's always so hard to tell what's going on in other people's companies. Impossible. It's like relationships: you have literally no idea what goes on behind closed doors. And it can be so comically bad. Have I ever told you this story? At GitHub, I was working on the integrations team, building integrations into GitHub. Integrations — they sound fun. And I was mostly responsible for the

Slack integration, but someone else on the team was responsible for the Jira integration — an official GitHub Jira integration. It was mostly replacing an existing, really inefficient integration that was just hammering the GitHub API; the new one was more webhook-based, just a better way to do it. And I think it was announced reasonably quietly, but then overnight it blew up on Hacker News. And there was a massive discussion about how

GitHub is embracing Jira, and maybe shutting down GitHub Issues, and how this is the worst news ever for developers. So there's just this speculation, especially on Hacker News. And it was the same when GitHub was acquired — the HN comments were just wild. And you read this stuff as an outsider and think: yeah, that's pretty reasonable; it's a reasonable concern that GitHub would be building a Jira integration. But in reality, the motivation —

all it takes is one missing piece of info, right? And then suddenly the whole story changes. So, okay, here's another theory on these deals. Both those deals were announced on the founders' own LinkedIn feeds and the acquired companies' Twitter feeds or whatever — but not on the main blogs of OpenAI or Anthropic. It's not announced there, which does happen for bigger acquisitions, right? Where both sides like to

talk about it — like the Cognition–Windsurf deal, where both sides were very much talking about it, right? Yeah. So maybe there are just lots of these small ones, and we don't see them. Oh — we only know about these because we're friends with the people. So maybe there are a lot. I mean, the talent war is real right now. The other day we talked about hundred-million-dollar offers, right? Yeah. Which, by the way — it sounds like it's not just

Nat Friedman or whoever getting a hundred million. Did you hear about the billion-dollar one? No. Seriously, did you not hear about the billion-dollar one? I saw one where a researcher was offered, I don't know, 125 million or whatever, and he declined, and then Zuck doubled it. Yeah — and then he took it? No, the billion-dollar one declined. Whether it was real or not, I don't know, but it was a guy from Thinking Machines. And then it was really funny, because then... yeah, I think —

okay, so the story goes — whether it's true or not, I don't know, and I don't know who it is — but it's a guy from Thinking Machines, Mira Murati's lab, and he'd worked at Meta for like 11 years. Meta is, I would say, a breeder of this talent: if you went through Meta in the late 2010s — 2010s to now — you can seemingly demand massive, massive remuneration packages. But —

so he was working there, then I think he was at OpenAI for a bit, and now he's at Thinking Machines, which is Mira Murati's lab. And supposedly he got offered a billion dollars. But he's a co-founder of Thinking Machines. Of compensation — total comp over four years or something like that. Said no. And then someone thought it was really funny, because he wasn't on Twitter.

So someone made him a Twitter account and was replying, pretending to be this guy, and amassed a fair few thousand followers — maybe like 20,000 followers — and then started promoting a meme coin. It wasn't him, and eventually it got reported. You've got to admire that. The hustle to basically shitpost on everyone's stuff, like: huh, yeah, I mean, I wouldn't do it — I'm a co-founder of Thinking Machines, we're going to do way more than this.

And then it was like psychosis took over. Over a few hours it was like: people keep telling me to start a meme coin, should I do it? And then: here it is. And then he got blocked. I do like Twitter for that. You've got to admire — you've got to respect the hustle. Yeah. So, what do you think of GPT-5? Wow — one mention of OpenAI and it's straight to 'what do you think of GPT-5?'. I think GPT-5...

is useful as a naming convention. I think exactly what Sunil said when he was on:

would OpenAI researchers have left if GPT-5 was that good? And the answer is probably not — but some of them did leave. So I think he was right. I think he called it: maybe it wasn't as groundbreaking as everyone hyped it up to be. A few thoughts on this. A few thoughts on this. I think that,

yeah, the naming was necessary to unify the o-series of models and the GPT series of models. Which seemed like the main goal. Yeah — it was seriously annoying for people who didn't understand the difference between 4o and o4, which I can understand. Yeah, for sure. 4o was a stupid name, but it was fun — I remember watching the announcement and going: my God, 4o. And then they broke it. If they'd just gone straight to 5, it would have been fine.

Then o4 and 4o would never have overlapped, and it would have been fine. And the o-series of models didn't make sense either. Why not a T-series of models — a thinking series of models? What would the O mean? Or reasoning — an R-series of models. That would have made so much more sense to me. Why is it called o? Dude, I have no idea what the o stands for. The first o was meant to be Omni — that was 4o, which could natively do all

modalities: natively audio, natively image, natively text. By the way, when you open ChatGPT now, you see GPT-5 at the top, right? If you click on voice mode at the bottom right, is that still 4o? I've no idea. But I think they've made it more confusing now, because GPT-5 — so they've gone back to their old series of naming — is not a straight GPT model. It's not just a singular transformer model.

Right. The guess — and this is still speculation, obviously — is that it's GPT-4.5, which was the largest model they ever made, distilled, with some reasoning traces. So it's a smaller version of that, which has had some sort of fine-tuning, some sort of extra training, to enable it to output reasoning traces. And that's why they have different versions of it: they'll have stopped training at different points,

and they'll have some that are more distilled and some that are less distilled — some that are smaller than others. They'll have various versions of that model, and then they have some sort of router — or you can actually pick. Although the router broke, I think, which is silly. You can pick whether it's GPT-5 high or GPT-5 low, and I never know what the 'high' stands for. Is it high reasoning, or is it high effort, or —

I'm assuming it's high effort: high compute. Anyway, that all makes no sense to me. The only thing I know is that I think it's a distillation, because — and this is what I was telling you earlier; let me run on a little bit more — when you move to a new series of models,

everyone's at compute capacity all the time. They have to be: they raise the money, they spend it on compute. They're always at compute capacity, because if they're not, they're leaving compute on the table — and even potential intelligence on the table. This came to a head when I was chatting with the guys from Cohere, a couple of years ago. They're always at capacity. And I was like: why? Because we need to train the biggest model we can, because we need to get the best thing we can.

The difference between ours and something else is that we have to do the biggest training run. We have to do that YOLO run, that massive run. And then — what I thought was quite sus was Anthropic releasing Opus 4.1 straight after Opus 4, seemingly not that long after, and then immediately making it the default in Claude Code. And I was like: holy shit, come on guys. You say Sonnet is your best everyday model,

but then you release Opus 4 and you're like: no, Opus 4 is just for special occasions. It's just for your birthday, you know. Special occasions, holidays and weekends. And then Opus 4.1 — no, now that's back to being your everyday model. What happened there? So what I think happened is: they released Opus 4, they got some good data from the usage, and then they did some sort of distillation to make Opus 4 smaller.

And now they call it 4.1, and they can serve it to many more people with less compute — less compute constraint. That just makes a lot of sense to me as what they did. Distillation is this crazy phenomenon where you can make the model smaller, continue its training on the bigger model's outputs, and often get very similar performance. I didn't know that's how that worked. Okay, that's fascinating. So the idea is that everyone trains the biggest, best model they can,

and then you take a much smaller model and you tune it — well, you carry on training it — on the outputs of the bigger model, and the smaller model develops into the little brother of the bigger model. It retains almost similar performance. There's an incredible scaling law with distillation, where you retain almost similar performance at many times smaller model size. So that's what I think is going on: they're playing with that type of methodology.
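The textbook version of that idea — Hinton-style knowledge distillation; whether the labs do exactly this here is speculation — trains the student to match the teacher's softened output distribution:

```python
# Classic knowledge-distillation loss (Hinton et al.): the student is
# trained to match the teacher's softened token distribution. Whether
# the labs distill exactly this way is speculation.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # scale by T^2 so gradients keep the same magnitude as a hard-label loss
    return F.kl_div(log_probs, soft_targets, reduction="batchmean") * temperature**2
```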

One thing that seemed like a big goal for the GPT-5 change as well is that now everyone — I don't know, our mothers or whoever in our lives who use default ChatGPT on free mode, not paying — they would start having opinions: this is what the AI can't do, this is what the AI can do. And I would always be like: well, but have you tried it in

a reasoning model or whatever? And the goal of this seems to be that now, because it auto-selects the appropriate model to use even for free users, people can get a bit more insight into what's actually possible. Do you think they've achieved this? I have no idea if they've achieved it. I think for consumer brands, that makes a lot of sense. For ChatGPT, a consumer product, it makes a lot of sense that the consumer shouldn't have to pick the model. We get messages in our sales group chats all the time —

we have some internal copilots that we let customers use, and the chat is always: which model should we be using? And I think most people who aren't on the money here should just be given a routed model, and it will work for, I don't know, the majority of queries. And it will also save a lot of compute if it does well. If the classifier does well, they'll be serving a much cheaper model for simple questions.

That's why I was really bullish on the router companies a few years ago, and I'm interested to check in with them — there was one called Not Diamond. Because these could also be a candidate for a sneaky acqui-hire by one of the big labs: to serve something like that in ChatGPT, you could free up so much more compute for training. Right, right, right. Yeah. And on the flip side of this,

I think as professional model users — I have never been bothered by having multiple models to choose from. Some people really make this a big deal: oh my God, there are so many to pick from. But they all have different strengths, and I know when I want to use o3, and I know when I want to use a non-reasoning model. Interestingly, you have to give them a name, and then you have to impute that name with some characteristics. You can't say: I want the fast model, I want the slow model. Right —

that doesn't work. You have to give it some sort of personality, because the user has to be like: I'm going to make a judgment now to use the slow model, because I think I'll get better performance with reasoning traces. I found it quite helpful to have the distinction that, okay, the reasoning models — o1, o3, o3-pro, whatever — have really good reasoning.

But when they announced this back in December or whenever, they were saying these models perform better at maths and coding, but they're worse creatively. And then they released GPT-4.5 and said: this is our biggest model that is not a reasoning model, and it's actually the best at creative tasks. So I know if I'm going to write a birthday card to a friend — no, I'm just kidding, I don't just use AI for that. But for a task of that nature, I would go to GPT-4.5, right? Yeah. Or your WhatsApp MCP.

So yeah, I don't know. I just think some people have been playing up this whole 'oh my God, there are so many models, which one am I going to pick?'. And the routing is then supposedly quite simple, right? A classification task: if it's a general-knowledge task, send it to the model with the latest knowledge cutoff; if it's a coding task, send it to the coding range of models; if it's a writing task, send it to GPT-4.5.
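That routing idea really is just a lookup behind a cheap classifier — a toy sketch, with hypothetical model names and a stand-in for whatever classifier you'd actually use:

```python
# Toy sketch of model routing: a cheap classifier picks a label,
# the label picks a model. Model names here are hypothetical.
ROUTES = {
    "general": "latest-knowledge-model",
    "coding": "coding-model",
    "writing": "creative-model",
}

def route(query: str, classify) -> str:
    """`classify` is any cheap classifier (a small LLM call, a fine-tuned
    model, ...) that returns one of the labels in ROUTES."""
    label = classify(query)
    return ROUTES.get(label, ROUTES["general"])

# route("Write a birthday card for my friend", my_classifier)
# -> "creative-model"
```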

That's only if the router actually works, of course — and if you trust it. Yeah — well, I heard Sam Altman had to make a big tweet about how the router didn't work, so everyone was getting served shit models. Please try GPT-5 again, we promise it's amazing. That is a bit wild, isn't it? Yeah. So I just looked again, by the way. They changed the UI, at least on my end, again — so now the models are called GPT-5 Auto, Fast, Thinking Mini, Thinking, and Pro.

What is Thinking Mini? Wait — Thinking Mini, Thinking, and Pro? What the fuck is Pro? So, Pro, yeah. The thing is: did you ever use o3-pro? I don't think I used it once, because I was so burned by o1-pro taking forever. Yeah, no, I never used it. I used o3 a fair bit, even still, for coding. It was my most-used model, I think, besides Claude. I still use it when

I need to get some more insight into something — when maybe I've exhausted all my knowledge and I'm just really stuck on something. Then I go to o3, and normally I stick with o3 for the rest of the session. It's the one I get quite excited about. But yeah, I never used o3-pro. There was — yeah, I think I had one that started spouting Chinese. I don't know which one that was. I was sure it was o3-pro. Sometimes you just think a bit better in Chinese.

Yeah — when all of its reasoning traces go Chinese, something is wrong. Related question to this. I think one workflow that's worked very well for me, manually, is having different model families reviewing each other's work. And that's, I guess, one of the theoretical downsides of Claude Code — I think they've said they will always just use Anthropic models.

But it makes so much sense to me to have an o3 or a Gemini 2.5 Pro review the work of a Sonnet. Do you do that, or do you have a workflow for that? We did a lot of work on evals in the last year, and there are some really interesting findings. The really bad one is getting a GPT-class model

to review another GPT-class model. They all like each other. They have this sycophantic energy when it comes to reviewing their own family's code: oh my God, this is the best thing I've ever seen, exactly as I would have done it. And it's like: come on, dude, we know. But I always feel like Sonnet is better — I always feel like Claude is better at doing something like that. I think you can

put the model into a reviewing latent space, which helps — being like: you're a reviewer now; immediately distrust everything that you're looking at. Can you actually get it into that if you have all the conversation history? Oh, no, probably not. This is a restart situation. I do think Claude can be a capable reviewer in that way. Claude tends to be quite — which one was it? Claude tends to be quite —

gosh, I think... I can't remember. We did tests with all of the Gemini models, all of the Claude models, all of the OpenAI models, in about December last year, when we were trying to work out how to do LLM-as-a-judge. And we also did some Llama models, because there was a company in London called Atla. Yeah, they demoed at the... Atla or Alta? Atla. Atla, Atla AI. Yeah — Maurice and Roman, I think.

Yeah — it's actually really bad, because I met Roman again last weekend, or the weekend before, because we were both judging a hackathon. Oh nice. Yeah, he's really cool. He's really big into his triathlon as well. Is he? Yeah, massively into it. No wait, remind me of his name?

I'm not going to get his last name right. I think it begins with an E. I think he spent time in Zurich. Roman Engler. Yeah, Engler. Did I meet him at AI Demo Days? No, his co-founder demoed. Ah, right, okay. Yeah.

That's awesome. Okay, I'll need to look into that — I think he's a triathlete. I was moderating a panel he was on, organized by Zoe from Dawn, actually. And he was on it — I'm not going to name everyone else who was on it, but it was quite a fun panel. And the questions we were given were all very PC, I would say. They were all like:

how do you think about finding a co-founder? How did you find the incubator you were in? They'd all been through an incubator. It was 'to accelerate or not to accelerate' — basically, should you take VC money? That was the panel. And I wanted to throw in a little curveball, so I started with: if you were a fruit, what fruit would you be? And the answers were really fun, actually. I think Roman said he'd be a banana, because he likes doing sport and bananas are very good for sporting energy —

which made me think: yes, you're definitely a triathlete, because they love bananas. That's awesome. OK, very interesting. Atla, yes — they fine-tuned a Llama model. Yeah, yeah, yeah. There must be an MCP server out there that's just a bit like the Oracle in Amp, basically. That's kind of what I want: an MCP that is basically just — I probably want Gemini 2.5 Pro, I think, instead of o3.

I feel like it's the best model out there at the moment. For reviewing? In general. It's obviously awful at tool calling — it still can't tool call — but otherwise it's fine. I used both o3 and Gemini 2.5 Pro to do my UK tax return a few weeks ago. Nice. Or at least to provide — there was some stuff I had to calculate, some capital gains stuff, to give to the accountant. And I think they then review it all again. I tried to give them just the

ChatGPT summary — the summary — and they were like: can you just send me the raw data? But it was interesting, because I had o3 review it, and I gave it all the transaction data for all the stocks bought and sold. And it gave me a result, and it was like: you actually lost some money this year on your stock sales. And I was like: okay, maybe that's kind of fine — you can carry the loss forward or whatever, not a big deal. And then

I was like: just in case, let me give this whole conversation to Gemini to review. And it gave me a completely different answer — that I'd actually made a bunch of money. Like: okay, what's going on? So I gave that back to o3 to review, and I also kicked off a fresh o3 one and a fresh Gemini one from scratch. And yeah, it turns out that in that initial o3 conversation, it had just got a minus in the wrong place or something. It had missed something: one of the stocks I sold was

Shopify, and they'd had a stock split. Ah, okay. So it thought I had lost a bunch of money on Shopify, when actually I'd gained a little bit or something. Yeah. It just completely missed that, even though it's true, public information — whereas Gemini didn't. Now, obviously, maybe that's just a random thing, but I don't know — it made me trust o3 a little bit less, and Gemini has been pretty reliable. I was going to say: you kind of want verification as a service.

And since we're on the topic of free ideas — I love free ideas — I very heavily spec'd out a business idea, which is all around the thought that, as the cost of intelligence goes to zero, the amount of shit that just gets pumped out by these computers goes to infinity. How do you tell what's true and what's not in this world of ours? Oh wait, we talked about this ages ago. We've probably talked about this. And this —

it's been an ongoing thought in the back of my mind for ages: how one might do that. And I think someone could probably start a very successful company on this. But it's kind of a weird company, because you're kind of a dev tool, but maybe you also start with a lot of human data labelers. It depends how you phrase what your MVP is going to look like. I also thought it was kind of interesting for journalists.

I just really love Community Notes on X, and I love the fact that Elon Musk also gets community-noted. Yeah. And people are kind of using Grok as community notes now sometimes. Someone will make a statement, and then someone will reply: Grok, is this true? Or: Perplexity, is this true? And I think that's actually amazing, because you immediately get called out if you're chatting shit — and it's cited and so on.

That's awesome, and I think we're going to need way more of that in this world. Yeah — this crazy world in the future. The Community Notes thing especially is such an awesome concept — I think I had a conversation about it after we last chatted. This idea of fact-checking not through an independent panel of fact checkers, but through community notes, is so, so, so smart, because of how Community Notes works. A community note isn't just a note someone esteemed wrote, right? It's

both provided by a community member and voted on by community members. And a community note is only shown if enough people rated it helpful who have previously disagreed on other notes. Is it? Yeah — so it basically needs not just a lot of votes; it needs to be uncontroversial across viewpoints, otherwise it's not shown. Interesting. I think that is the genius of it, right?
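That mechanism is worth sketching. X's published Community Notes algorithm fits a matrix factorization in which a viewpoint factor absorbs partisan agreement, so only the note's leftover intercept counts as helpfulness — here's a toy version of that idea, simplified well beyond the real thing:

```python
# Toy sketch of the bridging idea behind Community Notes: model each rating as
#   rating ≈ note_bias + rater_bias + rater_viewpoint · note_polarity
# and fit by SGD. A note only scores as helpful if its bias term stays high
# after the viewpoint·polarity term has absorbed partisan agreement.
# Heavily simplified compared with X's open-sourced algorithm.
import numpy as np

def fit_note_helpfulness(ratings, n_notes, n_raters, dim=1,
                         lr=0.05, reg=0.1, epochs=200, seed=0):
    rng = np.random.default_rng(seed)
    note_bias = np.zeros(n_notes)
    rater_bias = np.zeros(n_raters)
    note_vec = rng.normal(scale=0.1, size=(n_notes, dim))
    rater_vec = rng.normal(scale=0.1, size=(n_raters, dim))
    for _ in range(epochs):
        for note, rater, y in ratings:  # y: 1 = helpful, 0 = not helpful
            pred = note_bias[note] + rater_bias[rater] + rater_vec[rater] @ note_vec[note]
            err = y - pred
            # compute both factor gradients before updating either one
            grad_note = err * rater_vec[rater] - reg * note_vec[note]
            grad_rater = err * note_vec[note] - reg * rater_vec[rater]
            note_bias[note] += lr * (err - reg * note_bias[note])
            rater_bias[rater] += lr * (err - reg * rater_bias[rater])
            note_vec[note] += lr * grad_note
            rater_vec[rater] += lr * grad_rater
    return note_bias  # high bias ≈ rated helpful across disagreeing raters
```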

Because the problem with anything now is that there are different contexts, and people have different opinions on things that maybe, in the past, passed as facts — and maybe for good reason, who knows. But facts are subjective, and we all write our own history, whether we choose to admit it or not. Subjective, you said — or objective? We all write — I would say we all write — our own history, whether we choose to acknowledge it or not. You see this with startups all the time:

a startup has an exit, or gets further down the line, and they talk about how impressive their vision was at the beginning — when in reality they were doing Uber for cats, you know? And then they pivoted 12 times until they finally found some space, some niche, in the market. I just think calling people out on bullshit is probably important, and it's going to get more important when those people are not people —

when they're maybe even malicious language models or whatever. And all of this data that we're creating on the internet is currently being used for the next generation of language models. So I think we do have some sort of duty to make sure that data is accurate and correct — or at least aligned with as much of humanity as you can align it with. Yeah, yeah, yeah. No, it's

fascinating. Yeah. So, free ideas — someone should build that. I should just start telling all my startup ideas on here; I really do think someone should build that. I'm tempted to just do it — you know how Pieter Levels has all of these subdomains where he has different little apps? I've got loads that I want to do, and I think one of them should be like truth at matzakary.com: an MCP server where you can send it something — is this correct? That is really cool. And I'd make my own MVP of that. Yeah.

Man, someone should really build an MCP interface for a human to use to look at MCP servers. And I don't mean the MCP Inspector. I mean probably something that will generate the inputs as well. Like a functioning — like a tester — an MCP client, but not for testing, just for exploring the MCP server. Like a playground. A playground.

A good playground — a really good playground. I know some people have made initial efforts. Cloudflare had one; it's very bare-bones. But I think it needs to have good inputs, basically. The Muppet guy — the guy who made Muppet, who works at Sentry now — he also made one for Muppet when they were doing the Muppet MCP stuff. I think we're going to make one for Disco. That's in my plan for next week — to make one. We actually made one previously and then removed it,

because we couldn't get the UI quite right or work out how it fit in the constant user flow. So we'll probably revitalize that. I do think, yeah, I always thought it was really fun if every MCP server could ship with its own playground that turned it into an agent. Yeah, oh my God. mean, I guess some... Not that many people vibe on that idea. I've I've said it to a lot of people, but I want every MCP server to ship where you go to the URL,

you can like sign in with OpenAI maybe or sign in with Anthropic. they have some sort of... But signing with OpenAI is a thing now? I don't know if you can do sign with OpenAI. know you should be able to do sign in with Anthropic if you hack the code code. Right. Like user flow. That should be possible. I'm not sure if it's like allowed or not, but if you do that sign in with Anthropic, someone could... I think it'd be really fun because then someone could...

go to the URL of your, go to the webpage of a server that's now no longer a server, it's just an agent. They can sign in, they can have running tasks that just run in the background continuously. Maybe even allow like these MCP servers to have cron jobs. And then now you've made like the beginnings of like an agent platform all based on the idea of an MCP server. I like it. I think for sure this needs to be built.

Someone will build it. And I think that's what Disco will probably become eventually. I just wanted to have good examples, right? Like if I'm testing out the context of an MCP server, don't make me come up with Tailwind or with whatever library, right? Like I needed to just be click, click, click. Just show me what you can do. And I think the registries will... This is where the alignment needs to come in. Like this is why I think Smith3 maybe are going to have a problem because they're not... They take third party MCP servers, right? And they host them for you in a container.

And when you do that, you're not overly invested in the individual servers. Like for you to make a change to the underlying server, to the underlying tools, like that would be like a, an effort. You'd have to pull the server, you'd have to make a change. You'd have to get the original contributor to commit it back. Like all of that stuff would feel like an effort was there. You need a platform that cares about the servers themselves and wants to make a good experience in using those servers.

Yeah, I agree. I don't think that exists yet. Yeah. The registry stuff is going to be interesting. I think it feels like it's happening soon. It is. We started this whole chat with... I'm going back to VS Code. Oh, you are? From Cursor? Well, just for developing MCP servers. Oh yeah, I mean, the MCP support in VS Code is incredible. Amazing. Amazing. They've done such a good job.

The guys who are working on it are really active on Twitter. I was just shitposting, like, I'm going back to VS Code, and they were like, oh, if there's anything we can do, message me, message us. I think it's really smart. We're going to try to stay up to date. Yeah, Microsoft, super community-oriented. And not just that: they support everything. Because if you look at the example clients list in MCP, MCP supports like six different things, right? It's like

prompts, resources, tools, sampling, there's elicitation, there's roots. And VS Code is the only one that supports them all. Yeah, not even Claude Desktop. Yeah, Claude Desktop doesn't do sampling; it doesn't do a bunch of stuff. Actually, I went to MCP Night last week, MCP Night 2.0, which was hilarious. I sent you some pictures from it, actually. They ended with these two guys rapping.

And their rapper name was MC Protocol. Okay, San Francisco is a really weird place. Do we have any listeners from San Francisco? Do you know? Surely. Yeah, I guess we might do. We should check the stats. They were wearing big gold chains that just said MCP. I think we should get some of those. Should we get some made? We should get some made. I was thinking we should get a sponsor, and then all of these things where we're like, we should get merch, we could actually do.

So if anyone wants to sponsor us... actually, I'm not asking. We could just pop some emails off. Should we do that? Yeah, sure. Will's like, no, not the business I have to run. No, man, this is super fun. Wait, I was going somewhere at some point with one of these threads. You started with... I fully cut you off from the thing we originally started on. We're going back to the start. No, I just think it's a very smart thing to be, yeah...

VS Code: not only do they build all the stuff that's in the spec, but they also, I think, are co-maintainers of MCP now. So they actually build stuff into VS Code that's part of MCP, or about to become part of the spec, before it's even released. It's very ahead of the curve. Smart move by them. I think it's a smart move. I still think MCP is going to be much, much, much bigger than it is right now. I think it's going to be huge. I think you won't even notice it. I think it will just be the

protocol that agents communicate over. Maybe not that agents communicate with each other, but how agents get access to external tools. And in two, three, four years we'll look back and think, how did AI access external stuff before? You won't even think about it. It'll be in all of your web frameworks. You'll declare it like a robots.txt, or...

not even a TXT file; you'll declare it like your API endpoints. Your website will have API endpoints for browsers, and it'll have MCP endpoints. And I'm really looking forward to the first AI-native web frameworks. That's an interesting thought. I'm really looking forward to it. I think there are going to be some people who are able to do really mad stuff, where they can integrate the infrastructure with the framework.
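
A rough sketch of what that could look like today, assuming Express and the official TypeScript MCP SDK (the route, tool name, and product data are invented for illustration): the same site serving a JSON endpoint for browsers and an MCP endpoint for agents, side by side.

```typescript
import express from "express";
import { z } from "zod";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";

const app = express();
app.use(express.json());

// Classic API endpoint for browsers.
app.get("/api/products", (_req, res) => {
  res.json([{ id: 1, name: "Example mattress" }]); // placeholder data
});

// The same capability declared as an MCP endpoint for agents.
// Stateless mode: build a fresh server and transport per request.
app.post("/mcp", async (req, res) => {
  const server = new McpServer({ name: "shop", version: "0.1.0" });
  server.tool(
    "search_products", // hypothetical tool name
    { query: z.string() },
    async ({ query }) => ({
      content: [{ type: "text", text: `Results for: ${query}` }],
    })
  );
  const transport = new StreamableHTTPServerTransport({
    sessionIdGenerator: undefined, // stateless: no session tracking
  });
  await server.connect(transport);
  await transport.handleRequest(req, res, req.body);
});

app.listen(3000);
```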

Like a certain orange cloud. Okay, while we're talking about the orange cloud: what do you think about them trying to charge model companies for accessing publishers' websites? Yeah, it's interesting. Because, just for context, they have a very good bot-blocking

product, and that is one of Cloudflare's main value propositions. And they got into a bit of hot water recently. Well, maybe not hot water... Controversy. Controversy, where they've done this big negotiation between all the model companies and some publishers, and they're allowing the publishers to get paid when the model companies scrape their websites. Some model companies are feeling a little bit left out of this deal, potentially. And then some people who we would normally consider publishers

didn't want to be considered publishers. I think Hacker News was there: why have you turned this on for our website, so now we're not getting scraped by all of the bots? I think this gets super complicated as soon as you dig into it even slightly. To be honest, I have mixed feelings about it. I think there's a core idea which makes sense to me, but then there are a lot of parts of it that make less sense to me. I think if you're...

There is something to be said for the narrative the CEO announced this with, which is, I guess, the core framework, the core tacit agreement of the web: if you publish content, you are rewarded by people coming to your website to read that content. And by them coming to the website, they buy a subscription, or you can serve them ads, or...

you can sell them related products, or whatever. But if that information is now just summarized or copied into an AI chat and the user never goes to the website, then why is anyone... You lose out on the value. Yeah. If no one comes to your website, where is the incentive to create content? I think that's a very real concern if you're a publisher of content, if you're a media organization or...

Well, they're getting absolutely screwed right now. Yeah, exactly. Whereas if you're making a physical product... Like, say you're making a great mattress. Or an incubator for startup founders. It really doesn't matter: you're not making money from people coming to your website. If someone types into the chat, give me the best mattress, then you're making money by them buying it through chat. Exactly. And they can't not use your product. Yeah, this thermostat is lying to us. It's getting so hot in here.

What does it go up to? Yeah, I wonder if even in VC offices the AC shuts off after 5 or 6 p.m. or whatever. Actually, maybe, yeah. That must be what it is, yeah. So yeah, I have mixed feelings about it. I don't know what the solution is. Also, you have to differentiate between training and usage, right? It's one thing to train your model on everything that's published on the internet. It's another thing for a user to instruct an agent to...

visit a site to retrieve some information. I don't know. Do you think? Yeah, for sure. I think that's kind of a different thing. So training, I think, is about IP. Sure. And that IP costs. But... Okay, the only thing I can be sure of in this is that AI companies are going to need a continual stream of data. Right. The reason why they're training on

reasoning traces is because the very high-quality data they haven't used yet is getting scarcer and scarcer. So to build those corpuses, they need more data. If they can pay publishers to create more data, then it seems to me they would do that in a heartbeat. They are already paying companies like Scale AI.

OpenAI is on board with this Cloudflare deal, as far as I'm aware. They're pretty much all on board. The ones that got very angry were, like you said, the ones saying they were going to use an agent at runtime, scraping data just for runtime retrieval. The thing is also, maybe I want to build a scraper at some point that scrapes something just for my own use case. If I have to pay... I don't know, I guess if it's affordable.
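
For context, Cloudflare's pay-per-crawl idea is built on the HTTP 402 Payment Required status code. A hedged sketch of what the crawler's side of that handshake might look like; the header names and retry flow here are hypothetical, not Cloudflare's actual API.

```typescript
// Hypothetical sketch of a 402-based pay-per-crawl handshake.
// Header names and the retry flow are assumptions for illustration.
async function fetchWithCrawlBudget(url: string, maxPriceUsd: number): Promise<Response> {
  // First attempt: an identified crawler requests the page normally.
  const probe = await fetch(url, {
    headers: { "user-agent": "example-crawler/1.0" },
  });
  if (probe.status !== 402) return probe; // free to crawl, or just blocked

  // 402 Payment Required: the edge quotes a price for this fetch.
  const quoted = Number(probe.headers.get("x-crawl-price-usd")); // hypothetical header
  if (!Number.isFinite(quoted) || quoted > maxPriceUsd) {
    throw new Error(`Crawl priced at $${quoted}, over budget of $${maxPriceUsd}`);
  }

  // Retry, signalling agreement to be billed up to the stated maximum.
  return fetch(url, {
    headers: {
      "user-agent": "example-crawler/1.0",
      "x-crawl-max-price-usd": String(maxPriceUsd), // hypothetical header
    },
  });
}
```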

Maybe it's worth it. I mean, it costs like $5 a month to run a very successful website on Cloudflare; I can't imagine it's not going to be affordable. Yeah. I mean, I guess it depends. If you have to pay a dollar to scrape the New York Times, because otherwise you're blocked by Cloudflare, that feels like less than it would cost to read it. What do you mean?

If you were going to buy the New York Times and read it, what's that, $5? Yeah, I guess. Yeah. So if you could get that data, but you miss the niceties of having it on a nice website or having it printed out, and you paid less money, that makes sense to me. It's like a fair trade. Does that not make sense to you? I don't... Or do you just have this fundamental belief that everything someone puts on the internet should be free? Because I fully believe in an open web, but I

don't believe that companies should be able to monetize that content extensively without... Yeah. Yeah. I'm not sure. I'm not sure exactly what I believe here. I think it gets murky quickly. Like an open-access web. Yeah. I think... yeah, I don't know. I've listened to Matthew Prince talk about this several times now.

I guess I'm quite excited. I'm quite excited because I think... I'm excited for you. No, no, no. If they do manage to create an economy here where it is sustainable for publishers to keep producing content, then we're going to optimize for higher-quality content again. And if we optimize for higher-quality content, we're going to get better AI models. And Cloudflare earns a pretty penny along the way, which,

as a shareholder, I'm very excited about. Personally. No, I think it's a good point. I agree. Well, on that bombshell, I think it's probably time to end. I was going to say one last thing, just a funny thing to wrap us up. We talked about group chats at the beginning. Yeah. I was added to a group chat recently. And I don't know if this is becoming a trend or if this is the one group chat, but the group chat is called

1 Million Beers. And every message in the group chat is someone posting a beer and then incrementing a counter. Wait, let me see what it's at at the moment. At the moment: 931. Okay. How many people are in this group chat? Let me see... it says 291. Okay. And then, yeah. Is it on WhatsApp? It's on WhatsApp, yeah. Because at some point WhatsApp will disable the group chat. Do you want to be added?

So here, here's the thing. Well, also, they are ruthless. I saw a guy post a cider. Oh no. Kicked out, message deleted. That's very funny. Which is brutal. But yeah, every message is just a picture of a beer and nothing else, counter going up. Any other message is deleted. That's quite draconian. I like it. It's quite different. The only issue is, right: WhatsApp has a

max group size of 1,024. That sounds like a very particular number. Yeah, yeah, it's like a kibibyte, 2 to the 10. That sounds like it's integral to their architecture. Which means, even if you add a thousand people in here, they will have to drink a lot of beers. If everyone posted one per day... no, you would have to do a thousand beers per person.

Yeah, to get to a million. Which is like three years. So it's three years if everyone does one per day. So it's gonna take... This is gonna be on for a while. It's gonna be on for a while, yeah. The moderators, they... That's like starting a startup. They're like, we're in this for like ten years, guys. Let's go. It's a big thing, yeah.
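
The napkin math holds up; a quick sketch, assuming the group actually fills to WhatsApp's 1,024-member cap:

```typescript
// How long does 1 Million Beers take if the group hits WhatsApp's cap?
const goal = 1_000_000;
const members = 1_024; // WhatsApp's maximum group size
const beersEach = goal / members; // ~977 beers per person
const yearsAtOnePerDay = beersEach / 365; // ~2.7 years

console.log(beersEach.toFixed(0)); // "977"
console.log(yearsAtOnePerDay.toFixed(1)); // "2.7"
```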

So if anyone wants to be added, and you're a heavy drinker, send me a message. You have to be a heavy drinker, because you have to pull your weight in there. Exactly. Yeah. I think I'm probably not even worthy of it, but I'll stay until it fills up. Yeah. I'm not sure you can add me, because I'm not doing that many. I guess... okay, so the one a day, this is also a misnomer, because, at least in London, what's going to happen is: on Wednesday they're going to drink four, on Thursday they're going to drink five or six, on Friday they're going to drink six or seven,

on Saturday maybe seven, eight, or nine, and then on Sunday you'll maybe have a day off. I clearly hang around in the wrong circles. That's like London finance bros, isn't it? You see them all outside in the white shirts and the puffer jackets, sat outside the pub. I remember the first time I was in London, for a summer internship in 2014, and I saw a guy in a suit but also wearing shorts. That's a look. Having a pint at like 12 p.m. in Leadenhall Market.

And were you just like, what is this? I was just amazed. I was like, wow, yeah. Having a pint at midday, in the heat, in the summer. That's amazing. But I just thought of one thing we have to shout out. We can't get through this without shouting it out. And that is that some people in this building did really well in a hackathon that went down in Sweden. Oh yeah, I don't know much about this. Okay. So it was like a Lovable hackathon. Lovable. Lovable. Creandum.

20VC, basically all of Lovable's investors, organized a hackathon in Sweden. Massive prize, maybe a 20,000 euro prize. It was 24 hours; they flew you out there. It was on Monday this week. And I think these guys managed to make a company and sell it. That was their party trick: in 20 hours. They called it the 20-hour company, and they made it and sold it. I think the product was a little bit lame, but they managed to do incredible outreach.

They got people like Cluely on board at like 2 a.m. San Francisco time. They got someone from Cluely, like a marketing intern, on board to do a video. And then they got acquired the next day, before the end of the hackathon. And I thought that was pretty impressive. It was pretty cool. Yeah, yeah. I mean, did they share how much they were acquired for? Or what was the product, do you know? It was some automated outreach product. Oh, I see, I see. Interesting.

I have no idea how... It makes sense to develop a product that is also your distribution channel, one you can reach people through. Yeah, that's kind of cool. Yes. I mean, their distribution channel was the fact that they were doing this hackathon. They hyped up the fact that they were doing a hackathon, that they were going to be the first people to do it. But it's quite an audacious goal. I thought that was a wildly audacious goal: we're going to make a company and sell it in 24 hours. Because they did say upfront that they were going to sell it in 24 hours. Yeah, that's very cool. That's very impressive. Anyway, should we leave it? Let's wrap it.

Thanks, boss. Lovely to do it in person. I can't believe this is our first one in person. But yeah, if you've made it one hour and 15 minutes in, I'm very impressed. And we shall see you next week. We've never done this before in person. So: like, comment, subscribe. Follow the show on Apple Podcasts and Spotify. Give us a review on Apple; I think that's very good for podcasts.

Review on Apple, review on Spotify, review wherever you get your podcasts. Also, it's on YouTube now, so if you search the title of any podcast episode, or Bad Age at Podcast, you should find us on YouTube. And yeah, ping us a message if there's something in particular you want us to cover, or if you've got any cool guests you'd like us to have on. I think we've had some awesome guests, amazing dev-tool guests. Yeah, and more coming up to be revealed. We've got some really good ones, actually. Some really good ones. I'm not going to say the words.

Just in case it happens. I don't want to jinx it. Bye bye. Bye, big love.
