AI Engineering is Dead? Hectic Stag Dos, Event FOMO or Not to Go and Agent Evals are HARRDDD
Matt Carey (00:00)
Dude! Lovely to see you.
Wilhelm Klopp (00:02)
Are we live?
Matt Carey (00:02)
Mate, we're always live.
Wilhelm Klopp (00:04)
Wow, I just did a big stretch and it felt so nice. It's great to see you.
Matt Carey (00:07)
My...
Yes, great to see you too. Dude, my back actually kind of hurts. I went on a stag do this weekend and I came back with painted nails and an absolutely broken back, and those two things actually happened at two very different points. Yeah, I just wanted to clarify. No, they both happened on the stag do. They just happened on different occasions. We were, like, climbing through a forest.
Wilhelm Klopp (00:14)
Oh my god, look at them. They look incredible.
Wait, the nails happened on the stag do too? Or the back didn't happen? Okay.
I also went to a stag do actually
last weekend.
Matt Carey (00:31)
Oh yeah? You're in Peru, right?
Wilhelm Klopp (00:33)
I was in Peru and then the stag do was in Las Vegas. First time in Vegas. It's a crazy place. I'm not quite sure I want to go back, to be honest, because it's just so over the top. We were in Caesars Palace and they have this huge lobby, like very over... it's just so over-the-top ornamental. Like, it's just so much. And I'm like, is this what people think Europe is like?
Matt Carey (00:35)
wow.
Mm-hmm.
Wilhelm Klopp (00:54)
It's just, like, ornaments over the top everywhere and marble fountains and all this stuff. It's just mental. But yeah, it's just, I don't know. And there's lots of smoking indoors. Actually, I'll tell you what most annoyed me. And I would even use the term, I found it borderline offensive, which is not a term I often use. I feel like, yeah, taking offense to things is often not the most productive thing or whatever, but
I think like you and I, like, you know, we spend our lives like trying to build products and getting people to use them. And we like want more people to use them because we put all of our like love and blood and sweat and tears into these products. And it's like, why don't more people use them? Or like, you know, we'd love if more people use them. And so that's like the back, the backdrop. And then you walk through like the Las Vegas casino floor and you see all these people sitting at slot machines, just like clicking the button and you're just like...
Matt Carey (01:42)
All the bright lights though,
all the bright lights, man.
Wilhelm Klopp (01:46)
These slot machines,
they're designed to rip you off. Like they're designed to just take your money. There is no value. They just bling, bling, bling, waste your time and money. And yet there's like hundreds of people all hours of the day at the slot machines. It's like, what?
Matt Carey (02:01)
No, but you could win. No, I'm kidding. I actually hate that as well.
Wilhelm Klopp (02:02)
Yeah, I see.
So that was just a bit like, whoa, like, clearly it's like a bit of a reset. Like, you know, maybe the world is not how I think it works. Anyway, how was your stag do?
Matt Carey (02:13)
Funny, funny. Oh, it was good. It was in, like, a cottage in Wales and we went for, like, I don't know, a mad old hike and stuff. It was pretty fun. It was quite relaxed. It was quite nice. Just, like, a group of people. Really nice to meet all of them before the wedding as well, because I didn't really know any of them. So it was pretty good. It was pretty good.
Wilhelm Klopp (02:19)
Nice.
That's awesome.
Totally. Yeah. That's a good point. I didn't realize I would see
all those guys again for the wedding. That's a good point, exactly. So when is the wedding for you, for this one?
Matt Carey (02:37)
Yeah. You didn't realize you'd see them all again? Wait, were you not invited to the wedding? Just the stag do?
Personality hire.
It's in, like, two weeks.
Wilhelm Klopp (02:49)
Okay, cool, cool. So it's, like, pretty tight. Yeah, this one isn't until August. So I think that's why I wasn't thinking about the wedding. So wait, how did the painted nails come into the picturesque Welsh hike?
Matt Carey (02:54)
Fair.
Oh, it was just fun. Yeah, I don't know.
Wilhelm Klopp (03:00)
But did
you like go out or something and it happened at a salon or it was at home?
Matt Carey (03:04)
No, we like really didn't... No, we really... No,
dude, it's so bad. I was like quite drunk when we did them. Anyway, we can get off this topic. We have so much to talk about. Go on.
Wilhelm Klopp (03:11)
I think it's great. Actually, it feeds
into one of the topics I wanted to bring up. So it fits very well here, because I was listening to another podcast, a podcast I've really started to like. It's by CGP Grey and another guy called Mike. It's called the Cortex podcast. Do you know CGP Grey? A huge YouTuber, made all these, like, animated videos about how to become Pope, the history of the City of London and how it's different to London.
Matt Carey (03:29)
Hmm?
yeah.
Wilhelm Klopp (03:39)
He had a huge video in 2015 called Humans Need Not Apply, about AI, predicting basically that self-driving cars and robots would be imminent. Obviously that didn't quite happen. But anyway, huge YouTuber, I really like his stuff. He has this podcast and it's just the two of them chatting, and they had their 10 year anniversary. They did a little review and he said one thing that I thought was really interesting about...
Matt Carey (03:49)
Yeah.
Wilhelm Klopp (04:05)
his theory about how podcasts work. And he said, a podcast must eat one of three things. Either your podcast eats the news, so like every week you talk about, here's what happened. So it eats the news, it eats the guests, or it eats the lives of the hosts.
Matt Carey (04:21)
What about all three? I think I want to do all three. I want to have the most buff podcast ever.
Wilhelm Klopp (04:21)
And I think that's so good, right? You want to do all three? ⁓
That's great. So I don't know. I think that's interesting because, and he was saying like, the news is obviously the easiest one, right? Because you like, there's just always news and you can just react to the news. But, and then the lives of the hosts is probably like the hardest one or like, it's like the one that where you have to really make sure that there is like content or whatever.
But I think it works really well for us because I think we both live very interesting lives, right? Like we just spent five minutes talking about something we hadn't even put on the agenda. Even though we have like 20 things on the agenda. But yeah, I agree. Like we probably want to do some of everything. But I think the lives of the hosts is a great one for us, to be honest.
Matt Carey (05:00)
Well, we can do our best. We can do our best. I don't know. I think you live a very interesting life. You're in Peru, aren't you?
Wilhelm Klopp (05:05)
Yeah, yeah, then after Vegas, we were in Peru for a week, which was cool. Yeah, like, beautiful. Do you wanna talk about it or, like, should we talk about it later? Well, let's talk about it.
Matt Carey (05:10)
Hopefully.
I mean
you go for it. Just just tell
Wilhelm Klopp (05:17)
It's, yeah, so we did one of the Machu Picchu hiking trails. So it was like a five day, four night hike. And let me say, incredible nature, stunning, right? Like you start the hike, so the city you fly into, Cusco, is at like 3,500 meters elevation, super high up. Like, I think...
Matt Carey (05:22)
Hmm?
Yeah, it's really high up. It's really high up, right? Yeah.
Wilhelm Klopp (05:39)
Like where we ski, like in France, that's not at 3,500 meters. Like maybe some of the, like that's lower. Like I think Val Thorens is like maybe 3,000 meters or something like that. So it's super high already. And then I think at the peak we were at like 4,200, no, 4,600 meters. So like.
Matt Carey (05:59)
Yeah, I mean, that's pretty
hectic. Like, I start feeling quite ill over, like, about 4,000 meters. Because I've been up there a couple of times and every time it's like, this is tough. Yeah, it's actually horrific, isn't it? Yeah.
Wilhelm Klopp (06:04)
Same. Yep.
Dude, my altitude headaches were awful. It was, yeah.
And I also realized that I really don't enjoy camping. Like, I thought, you know, four nights, how bad could it be? Turns out I can... Yeah, it was cool though, because we were gradually sleeping in, like, slightly nicer setups, but the first night was rough, man. Like, it was freezing cold. So we were in, like, a little bit of a dome.
Matt Carey (06:20)
Four nights is quite a lot, four nights is quite a lot. Yeah.
Were you tenting?
Okay.
Wilhelm Klopp (06:32)
Like
a tiny dome that someone had built, but it had no ventilation and it was, like, glass. So first of all, the air gets really bad really quickly and you get a headache, which I did on top of the altitude headache. But then the condensation starts dripping from the ceiling on you, like, while you're in there, and it's freezing cold. Like, we were in two sleeping bags, wearing a beanie. Man, it was like relying on body heat to make things happen.
Matt Carey (06:52)
You
Wilhelm Klopp (06:57)
So, like, that was tough. And I don't know, man, I think some people love this stuff. I just kept thinking, like, you know, seeing this nature is beautiful, but also getting to write code every day is beautiful, you know?
Matt Carey (07:08)
Oh my
god, Will, you can't compare writing code to going to see Machu Picchu. You actually can't do that, it's not okay. I'm reading The Hippie by Paulo Coelho, I think, the guy who wrote The Alchemist, and Machu Picchu features quite a lot. It's pretty interesting. It's cool. Anyway, should we move on? We have a... Oh dude, you know what we were talking about after the last podcast?
Wilhelm Klopp (07:12)
Ha
Okay. ⁓ cool.
Let's do it, yeah.
yeah.
Matt Carey (07:30)
Because I was like, oh, we should edit this one, maybe add a jingle, and you were like, no, we need to release it. And then I still took about a day to release it. Well, in that day, and we were like, we need to release it so it stays current and so it stays interesting. And then the day after, Sonnet 4 came out. And Opus 4, yeah, Opus 4 came out, like the new Claude models. So yeah, I mean, I've been finding them pretty good. I've been finding them pretty good.
Wilhelm Klopp (07:34)
yeah.
Mm.
big model release day. Yeah, things just happen every day.
I
was away most of the time. I haven't used it a ton. I actually was using it a lot yesterday. I was using Sonnet 4 and I was using Claude Code heavily yesterday. Yeah, I was thinking, man, I wish I had, like, I feel like the cool kids have their own really hard evals, and, you know, with any current model, maybe 50% of their evals work.
Matt Carey (08:02)
Mmm.
Was it good?
Yeah.
Wilhelm Klopp (08:18)
And then the new one comes out and you can be like, oh yeah, 60%. Cool, it's better.
Matt Carey (08:21)
It's better. No, but dude, that's not
your job. That's actually not your job. Like, your job is to use the model to, like, eval and to just decide which one you think is better based on some usage time. Like, the evals are for the labs to game and things like that. I think the evals are so hard. You can't expect a consumer to start playing with evals. Like, even on your own tasks, your own tasks must vary a lot.
Wilhelm Klopp (08:35)
Sure, sure. But, you know.
Right, it's true.
Exactly. Yeah. Yeah, so that's the thing. I feel like it's slightly hard to tell how good they are. I mean, I haven't used Opus 4 a ton, but...
Matt Carey (08:55)
Opus
has been quite nice for... I think it's the only one that can... I think that there is an eval in making a good name with a pun or with something that feels like nice and I think Opus is the first one that can make a decent name.
Wilhelm Klopp (08:57)
Mm.
cool. yeah.
Matt Carey (09:09)
Because you know how if you ever tried to make a name with GPT, it would say something like two normal words just stuck together and it'd be like, here is a catchy name. And you're like, that's not a catchy name. I think, well, the original GPT-4 was a killer for that. But I think Opus 4 is the first one that can make something that's fairly reasonable as a name. Yeah, but I don't know how you programmatically do that, but I think that's quite a nice eval actually.
Wilhelm Klopp (09:10)
Nice.
Yeah, yeah, yeah, yeah, yeah, yeah, I've seen that, yep.
That's.
and like creative and yeah, that's cool.
Yeah, yeah, that's interesting. You need like a human reviewer or something for.
Matt Carey (09:35)
Yeah, because I always find naming super hard. Everyone does. If you can make a model that's got a better name, yeah, maybe they could name themselves
at that point. They could, like, work out what generation they should be, whether they should be 4o or 4.0 or o4. Yeah, anyway. 4o, o4.
Wilhelm Klopp (09:46)
What they should call themselves. That's cool. 4o or o4. Yeah, interesting.
I don't know, I just wish, like, yeah, so you're right. I think the evals are very hard. Whenever I tried to make an eval, I just couldn't make it hard enough. Or even, like, I had a thing the other day, because Aider has its evals, right? Its own benchmark. And I was playing around with them.
Matt Carey (10:01)
Yeah.
Wilhelm Klopp (10:12)
And those ones, they top out at like 70% or something, right? No.
Matt Carey (10:17)
Yeah, but I heard, sorry to interrupt, I heard Sonnet,
I heard Sonnet 4 is worse than Sonnet 3.7 on the Aider benchmark. Am I, like, am I reading something awful there? Because I think that's just completely saturated.
Wilhelm Klopp (10:26)
really?
fascinating.
Well, OK, so I'm just looking at the thing now. So Sonnet 3.7, with thinking, gets 65% on the Aider benchmark. And the peak of the Aider benchmark is o3 at like 80%.
Matt Carey (10:43)
No, it's
Opus that gets worse than Sonnet 4. Opus 4 gets worse than Sonnet 4. Which is weird, because Opus 4 is a better model in a lot of respects.
Wilhelm Klopp (10:48)
⁓
I see.
So the thing is, I was playing around quite a lot with the Python part of the Aider benchmark. And what I found, even when testing with 3.7, is that not all of the test cases were passing
Matt Carey (10:58)
Aya.
Wilhelm Klopp (11:14)
when run under the constraints that the Aider benchmark itself enforces, which is pass at two, so it gets two attempts to solve the test. But then if you just take that away and let it go 10 attempts, it just solved all of them, even 3.7. It took, like, six attempts in one of them, I think, but it just did all of them. So, like, I don't know. Like, to me, like, I...
Matt Carey (11:21)
Okay.
Wilhelm Klopp (11:36)
It's not that interesting to me if the model can do it in one or two attempts. Like it's more interesting if it can do it like at all, you know?
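The difference between a two-attempt budget and a ten-attempt budget can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Aider benchmark's actual harness: the function name `solve_rate` and the per-task numbers are made up for the example, chosen so that one task needs six attempts, as in the anecdote above.

```python
from typing import Optional, Sequence


def solve_rate(first_pass_attempt: Sequence[Optional[int]], max_attempts: int) -> float:
    """Fraction of tasks solved within `max_attempts` tries.

    Each entry is the 1-based attempt on which a task's tests first
    passed, or None if it never passed. (Hypothetical data format.)
    """
    if not first_pass_attempt:
        raise ValueError("need at least one task")
    solved = sum(
        1
        for attempt in first_pass_attempt
        if attempt is not None and attempt <= max_attempts
    )
    return solved / len(first_pass_attempt)


# Made-up results for five tasks: most pass quickly, one needs six tries.
results = [1, 2, 6, 1, 3]

rate_at_2 = solve_rate(results, max_attempts=2)    # two-attempt budget
rate_at_10 = solve_rate(results, max_attempts=10)  # generous budget
print(f"within 2 attempts: {rate_at_2:.0%}, within 10 attempts: {rate_at_10:.0%}")
```

With these invented numbers the same model looks like a 60% model under a two-attempt cap and a 100% model under a ten-attempt cap, which is the gap being described.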
Matt Carey (11:40)
Yeah, I had
something today which it 100% couldn't do, which I think should be a benchmark at some rate, is working out JavaScript packaging.
Wilhelm Klopp (11:52)
Totally, yes.
Matt Carey (11:53)
like for like really
old libraries that haven't been packaged for ESM, and then, like, getting them to work in multiple different runtimes, I don't know. Absolute pain, and, like, getting the tsconfig set up in a monorepo with, like, do you use composite? Do you not use composite? What even is composite? I don't know. Anyway, this is getting super, super nerdy. I've got a new watch.
Wilhelm Klopp (12:01)
that's cool. I like that.
Yep.
No, it's good man. This
is what we, this is what it's all about. Wait, before we talk about the watch, can I just say one other thing? Because it's also very related. I'm excited to hear about the watch. So, we are the bad agent podcast, I think we can talk about this agent stuff. So I was using Claude Code yesterday quite a lot for what in theory should be, like, a...
good task for an agent. I was doing some work in Simple Poll to migrate from an unmaintained mocking library, which has all kinds of issues and is annoying, to a fully maintained mocking library. This is to mock HTTP requests in tests.
In theory, it's like the perfect task for an agent, I think, because it's a closed-loop thing. The agent can keep running the tests. And also, it's very clear what the end goal is. All the tests migrated, the old mocking library uninstalled, all the tests passing. But it just absolutely wasn't working. I think I spent the whole day on it
with Claude Code and Sonnet 4, and it just couldn't quite figure it out. It kept getting off track a lot. It made all kinds of scripts to update the code. It had all kinds of summaries, which were not really correct. It kept not running the tests. So I don't know. It just felt like...
Matt Carey (13:19)
Yeah, the summary thing
I really hate, like, it keeps on making documentation of itself, but then they immediately go out of date and then you're left with loads of random markdown files. Yeah, so I think we need, I might add that to Shippy actually. Add, like, if it finds random markdown files in the code base, it like checks whether they're correct or not and then updates like an actual cursor rule or an actual Claude agent rule or something.
Wilhelm Klopp (13:24)
Yeah.
Yeah, yeah, yeah.
Right. Yeah, exactly.
Mmm.
Yep.
Yeah, that's really interesting.
I love the shine too because at any point there's like five different ways we could continue going. And then, so well at the end of this, just to finish this, like I... ⁓
Matt Carey (13:53)
Yeah, go for it.
Wilhelm Klopp (13:54)
I was like, okay, cool. I think I have learned a few things, like, watching Claude do this, and we've seen a few things and tried a few things. And now let me push all the changes it made up, keep them as a reference as a branch, but then try again, fresh, based on what we've learned. And then it was actually kind of more fun, you know, like being a bit more in the driver's seat. And man, I wrote, like, my own if statement that was solving a problem that Claude hadn't quite figured out. And I was just like, dude, this is so fun. This is what programming is.
be like...
Matt Carey (14:21)
You've rediscovered programming! Yeah, no, I do agree, it's fun.
Wilhelm Klopp (14:28)
Anyway.
Matt Carey (14:28)
It's cool and there's
some stuff at the moment that I've been playing with that 100 % I just know the model is just gonna mess up like anything to do with I've been playing with a lot of durable objects and Like the actor model like anything around that the model they have no idea what's going on. They're like what you're referencing something like
Wilhelm Klopp (14:34)
Right.
Nice.
What's the actor model? Is
that a Cloudflare thing?
Matt Carey (14:49)
No, I don't know where it comes from. It's like a programming model with static objects. I'm really probably not the best person to describe what it is. But yeah, anything surrounding durable objects, I find it gets very confused. It's amazing at React. Well, in my uninitiated eyes, it's amazing at React. Dude, we have so much to talk about. Okay, okay.
Wilhelm Klopp (14:56)
Okay.
Mm.
Let's talk about the watch, okay?
Let's talk about the watch. ⁓ Well, what's the watch, bro? Okay.
Matt Carey (15:11)
No, no, no, we can talk about the watch. I got a new watch, cool. End of story. No, no, no. It's actually awesome. Dude, it's actually awesome. It's a Tactix, so it has
weird tactical stuff that I've had to turn off, because one of the shortcuts is, like, a kill switch that wipes the watch. Which doesn't sound very fun. It has, like, ballistics mode in it as well, which, I dunno. Dude.
Wilhelm Klopp (15:30)
Wait, what?
Like what? What is-
Okay wait, back up. I thought you would get, like, a new Garmin or something, like the Fenix 8.
Matt Carey (15:39)
I did, I did. It's
a Garmin Tactix. No, it's a Tactix 7. Like, I got it because it was crazy discounted because the 8 just came out, but it was cheaper than some of the Fenixes and I wanted one with good battery life and a solid screen. And the Tactix has great battery life and a solid screen. So I was like, let's do it.
Wilhelm Klopp (15:42)
What's a Garmin Tactix?
Nice.
Wait, so who is the Tactix for? By the way, first of all, I love how Garmin as a company is fascinating as well. There was a really cool Substack about its rise. And also it has so many product lines. It's fascinating. And I think the whole Substack was about how Garmin had to reinvent itself again and again and again, which I think is just like a, yeah, exactly, exactly, exactly. And I think they had a phase where they were just doing stuff for boats or whatever, or I don't know. It's a fascinating story.
Matt Carey (16:10)
Hmm.
from the sat nav era.
Yeah, yeah, they do.
They do, like, the heads up, not the heads-up displays, but, like, the displays and the navigation for boats, because they just did navigation for a while. And now they do, like, smartwatches, which is obviously related.
Wilhelm Klopp (16:31)
Right, right, right.
Yeah, yeah,
it's easy to forget that the whole world isn't just like startups and like VC funding, right? Because this is like a very mature company.
Matt Carey (16:40)
Yeah, I think it's really easy
to forget that the whole world is not static. Like when you think about, I might do this, I might do this. And then you forget that everyone else is also doing, I might do this. And sometimes those cross and sometimes they don't, but everyone's always doing something or they're not doing anything. Some companies, but mostly people are doing stuff. ⁓
Wilhelm Klopp (16:47)
Right.
Mm-mm-mm-mm.
That's true, yeah. I'll send you the link,
oh, the Substack post for anyone listening is called Garmin's $40 billion pivot.
Matt Carey (17:07)
That's nuts. Cool company though. Cool company. I just, I feel like the whole fitness industry though, there's some, yeah, there's some really strange stuff. The first logo I saw when I got off the plane in San Francisco was Strava's, because if you're driving from the airport to the center of San Fran, they're, like, on the outskirts. They have a building there. And I was like, mad. Cool. Cool, cool. And they bought our friends at Runna recently. Did you see that?
Wilhelm Klopp (17:20)
interesting.
just think. cool. Yeah, it is wild, like.
Runna, and then
they also bought a cycling coaching app, I think, more recently than the Runna thing. Yeah.
Matt Carey (17:37)
I did not know that. Like, more
recently, because they bought FATMAP ages ago, or, like, two years ago. And that was something that was kind of controversial if you were into trail running and outdoors, because they bought it and binned it, basically. And that was kind of sad. And it was from Chamonix. So Juliet's friends were all just, like, up in arms about that. Yeah. Some of them worked there, but then some of them didn't, obviously. OK. So go on.
Wilhelm Klopp (17:49)
Mmm.
That sucks. ⁓
no way.
I just want Strava to have a
good triathlon mode, man. Like it's annoying. Like... Five, bro! It's five because it... Yeah, yeah, yeah. Yeah.
Matt Carey (18:05)
Yeah, the fact you have to do three separate activities. yeah, shit, you have the changeovers. my god. And then you have to go in and make all the changeovers private so
that no one else can see them.
Wilhelm Klopp (18:18)
And
you have to edit the run. It's just like, what man? And then it doesn't show your whole time and you have to like edit the run activity to like say, my run was obviously just the final part. My whole thing, my whole like time was this because like the way you compare yourself to everyone else is like not through the run, it's through everything. Yeah.
Matt Carey (18:33)
Yeah, the whole time. Yeah, the whole time. Yeah. No,
that's a good point there. Dude, anyone from Strava listening, make a triathlon mode.
Wilhelm Klopp (18:41)
Yeah, I mean, I don't know.
I feel like at these companies, right? Like, you don't really care about this stuff. You care about just what everyone uses. Like Strava, the vast majority of people use it for running and cycling. So that's what they care about. I don't know. My hope at some point with all of this AI stuff was that it'll become way easier to create code. So we can have all of these incredible, like, long-tail features, like a great triathlon mode for Strava, even though just, like, a million people want it or whatever.
Matt Carey (18:49)
Mm-hmm.
Yeah, but then it's the...
Dude, it's definitely limited by some random... You've seen the microservices video? It's limited because of some random... Yeah, the KRAZAM video. Amazing video. No, I feel like it's got to be limited by that. Dude, okay. So, I have massive FOMO at the moment because it's the AI Engineer World's Fair in San Fran. Are you gonna go to that?
Wilhelm Klopp (19:11)
Yeah, yeah, yeah, yeah. The KRAZAM.
So ⁓
yeah, yeah, ⁓
so I went, look, this is my badge. I went there. I went, so I bought a ticket, dude, the ticket is a thousand dollars, a thousand dollars, which is very hard to compare also because at the Stripe conference a few weeks ago, I think most people had free tickets and that was a top, top, top notch conference, right? Like
Matt Carey (19:29)
you are going? Okay of course you're going.
Wow. Mate, that's printing money.
⁓
That was where Johnny Ive
spoke, right?
Wilhelm Klopp (19:49)
Yeah, exactly. Jony Ive, Mark Zuckerberg, the Collison brothers, obviously, like, top-tier food. I went to this yesterday. I got up a bit... I wasn't feeling great, so I had a bit of a late start to the day. I think I got there at 11 a.m. Super hungry, super desperate for coffee. The check-in takes forever because, like, half the printers aren't working or something. Anyway, I eventually get through it. I asked them, like, where is the coffee? And he's like, okay, just go two floors down. I tried to go two floors down.
Matt Carey (19:53)
Yeah, yeah.
Wilhelm Klopp (20:14)
I'm told, your pass doesn't get you in here. I'm like, I paid a thousand dollars for this and my pass doesn't get me in here? And then also, it wasn't super clear where everything else was. I looked at the map later and I think I just missed a floor. But I was just so desperate for coffee that I just left. And then a friend messaged saying, hey, do you want to get lunch? And then I got lunch with them.
Matt Carey (20:18)
⁓ no!
and you can't get coffee
Wilhelm Klopp (20:38)
And then just went home and did this Claude Code mega session. So I haven't even really been at the event yet.
Matt Carey (20:43)
Oh my god, well, I hope you get to go today. Yeah, no, I have mega FOMO, but it's good they put them on... No, they put them on... I don't know, they put them on YouTube, so I'll, like, wait a month and then see what's the crazy trending one on YouTube and watch it, basically. I think. I think that's how I'm gonna do it.
Wilhelm Klopp (20:45)
Yeah, well tell me more about the FOMO. Should I check out anything that you want to see?
Yeah, yeah. I mostly got it because
I wanted to hang out with the Zed guys. They're all there. Yeah, yeah, they have a booth, I think, maybe. So I just wanted to, like, spend some time with them. And I saw Steve Ruiz. He was at the conference. No, I was too in need of other things, like the coffee. And he was chatting to someone else. But yeah, I think lots of people are in town for this. And there's another big event as well. There's Snowflake Summit,
Matt Carey (21:02)
Yeah, they rant.
Okay, nice. Yeah. I met the Modal... Did you say hi?
Wilhelm Klopp (21:23)
which actually I've seen a lot more badges for that one. So I wonder if that's the big event in SF this week.
Matt Carey (21:29)
I don't know. Wouldn't be the big event in my circles. Although my mate's mum works with Snowflake, so I reckon she's there, yeah. Yeah, no, I don't know, FOMO, who else? I saw the, you know the Modal CEO was in town this week? Yeah, Erik, he took us all out for dinner. It was really lovely. Yeah, it was awesome. It was so good to meet him. Company.
Wilhelm Klopp (21:33)
no way. I'll say hi.
yeah, Eric.
No way. that's awesome. Yeah, I've, I've, that's great. Yeah.
Really? Yeah.
Matt Carey (21:55)
company that I love really. It's like very cool company,
amazing dev tool, very innovative. Yeah, big fan, big fan of them.
Wilhelm Klopp (22:01)
So
how do you square your love of Modal with your dislike of Python?
Matt Carey (22:05)
That's hard, actually.
See, I love Modal, right? And they've made some really cool stuff. But I have still had horrible Python bugs. I saw the best one in the Modal Slack channel, which was like...
I'm getting this, like, really nondescript error about vLLM. I don't know what's causing it. It works on my machine. Like, why doesn't it work in the cloud? Like, what's going on? And the reply was just like, did you call your script vllm by any chance? And it's like, the script had the same name as the package. So something was failing and, like, but Python, my God, so horrible. Like the fact, yeah.
Wilhelm Klopp (22:27)
Mm-hmm.
no.
Matt Carey (22:42)
The fact that nothing is scoped, anyway, I'm not gonna rant on about Python, because we only have like 10 minutes or 20 minutes left and I could rant the whole time about Python. ⁓ Yeah.
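The script-named-like-the-package bug comes from how Python resolves imports: when you run a script, its directory goes at the front of `sys.path`, so a local file can win over an installed package of the same name. Below is a minimal sketch of that effect. It is an illustration only, not what happened in the Modal Slack thread: it uses the standard-library module `colorsys` as a stand-in for the package, and simulates "the script's directory comes first" by inserting a temporary directory at the front of `sys.path`. The helper name `shadowed_by_local_file` and the `SHADOWED` marker are invented for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def shadowed_by_local_file(module_name: str) -> bool:
    """Return True if a same-named local file is imported instead of the real module.

    Simulates running a script from a directory that contains
    `<module_name>.py` by putting that directory first on sys.path.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # The "script" that happens to share its name with a real module.
        Path(tmp, f"{module_name}.py").write_text("SHADOWED = True\n")

        saved = sys.modules.pop(module_name, None)  # force a fresh import
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(module_name)
            return getattr(module, "SHADOWED", False)
        finally:
            # Undo every change so the real module is importable again.
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)
            if saved is not None:
                sys.modules[module_name] = saved


print(shadowed_by_local_file("colorsys"))
```

The usual fix is the boring one from the anecdote: rename the script so it no longer collides with the package it imports.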
Wilhelm Klopp (22:50)
That's so sad. We should do
a Lex Fridman megapod, which is just four hours, five hours. No, no, we don't have to talk about Python for five hours.
Matt Carey (22:57)
of a Python rant. I could do a Python rant for five hours, I think. I think I could cover, I could probably cover
most of my grievances in five hours.
Wilhelm Klopp (23:05)
You know what? I think that would be quite a fun future episode. We should do that. It's interesting to me because for me it's always like whenever I hang out with the Python crowd, they all complain about JavaScript and how it's awful. And whenever I hang out with the JavaScript crowd, they all complain about Python and how it's awful.
Matt Carey (23:08)
I don't think it would.
Mmm.
Yeah, I mean, I've been complaining about JavaScript all morning. I think actual JavaScript just does just suck as well. Like I'm using this library. I'm using this library that's super old and I just need to update to a newer version of something. ⁓ But it was, it hasn't been touched since, no, 2020.
Wilhelm Klopp (23:28)
Mm-hmm.
Mm-mm.
Matt Carey (23:37)
And so it's got no support for ES modules. It's got no support for anything modern in JavaScript. So it doesn't work in the browser anymore. It doesn't work in... It works in Node, but just Node if you set it to something old. So basically it doesn't work in a worker. And that's a pain in the ass for me. Although, with esbuild, it works in a worker. It's JSONPath.
Wilhelm Klopp (23:52)
Right. Yep.
What's the library?
Matt Carey (23:59)
the original JSONPath JavaScript library. It does actually work in a worker if you bundle it with esbuild like normal workers are, but it doesn't work in a worker which uses the Vite plugin because it's got a React app as well. Or at least I can't work out the config for it. So pain in the ass anyway.
Wilhelm Klopp (24:01)
Cheers.
Mm-hmm.
This
sounds like typical JavaScript stuff, doesn't it?
Matt Carey (24:18)
Yeah,
this is like three of the most finicky things all connected. It's like old JavaScript, Vite plugins, and brand new things from Cloudflare. They're all amazing when they work, but together I'm playing with fire, I think. Yeah. Yeah.
Wilhelm Klopp (24:27)
Mmm.
Right, yeah, yeah. Yeah, yeah.
Totally, It's a bummer how that stuff happens. yeah, mean,
has AI been any help with figuring this out or not really?
Matt Carey (24:42)
literally
zero. The most help it's been is a deep research thread telling me what I should use instead of JSONPath.
Wilhelm Klopp (24:47)
Oh, dude, deep research. I actually had a cool moment with deep research the other day. Oh man, this is actually interesting. Deep research, for ages.
Matt Carey (24:51)
Mmm.
Not like the rest
of our podcast. This bit is actually interesting.
Wilhelm Klopp (24:58)
Yeah.
No, no, I'm just I'm always thinking like is the story I'm about to tell actually how interesting is it? But I think this one's a good one. Well, for ages we were trying to figure out.
Matt Carey (25:06)
Go on. Go on, tell, tell.
Wilhelm Klopp (25:11)
So in Kolo, right, we try and trace everything and we have these different entry points. That's really the trick: as soon as our Kolo activation entry point gets hit, we can trace a lot of things, but how does it get hit? We have a Kolo run and then your script thing, a bit like
ddtrace-run, so basically whatever is in your script we can trace. We have a middleware, so anything in the request. We have a little decorator thing, so anything that's decorated can get hit. But one thing that was always a bit of a challenge is, if you do a subprocess or something like that in your code, how do we trace that?
Matt Carey (25:29)
Yeah, I am.
Wilhelm Klopp (25:46)
And it seemed like other tools could do it, but we'd looked into it previously and we just couldn't quite figure it out. And I just had a deep research task spin up that's like, hey, can you look into how other libraries, especially ddtrace or New Relic, these ones, how do they trace everything? Because it seems like they do. And it pointed me at this awesome Python feature called sitecustomize. Have you ever heard of sitecustomize? It sounds like, so like what is, you know,
Matt Carey (26:08)
No. ⁓
Wilhelm Klopp (26:11)
What does the site part of that mean? But it's a thing. It's like a file where if it's on your Python path, Python will just load and run that code before the rest of the program starts up. Yeah.
Matt Carey (26:22)
What? Mate, that sounds like a
horrific security feature.
Wilhelm Klopp (26:27)
So
you can just put your, or we can just put our own Kolo sitecustomize thing onto the path, onto the Python path or whatever. And then it'll just always activate. So I'm just like, damn, I don't know how I would have Googled for that. You know, that's a deep research thing.
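As a rough illustration of the sitecustomize mechanism described here, the sketch below writes a throwaway sitecustomize.py into a temporary directory, puts that directory on PYTHONPATH, and starts a fresh interpreter. The helper name `run_with_sitecustomize`, the `TRACING_ACTIVE` variable and the hook body are all made up for the example; this is not how Kolo, ddtrace or New Relic actually wire things up.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical hook body: a real tracer would install its tracing here.
HOOK_SOURCE = 'import os\nos.environ["TRACING_ACTIVE"] = "1"\n'

# The program under test knows nothing about the hook.
PROGRAM = 'import os; print(os.environ.get("TRACING_ACTIVE"))'


def run_with_sitecustomize(hook_source: str, program: str) -> str:
    """Run `program` in a fresh interpreter with a sitecustomize.py on its path."""
    with tempfile.TemporaryDirectory() as hook_dir:
        with open(os.path.join(hook_dir, "sitecustomize.py"), "w") as handle:
            handle.write(hook_source)

        # Prepend our directory so this sitecustomize is found first.
        env = dict(os.environ)
        existing = env.get("PYTHONPATH")
        env["PYTHONPATH"] = hook_dir + (os.pathsep + existing if existing else "")

        result = subprocess.run(
            [sys.executable, "-c", program],
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout
```

Calling `run_with_sitecustomize(HOOK_SOURCE, PROGRAM)` should show that the hook ran before the program's own code. Because environment variables are inherited by child processes by default, a Python subprocess started by the program would normally pick up the same PYTHONPATH, which is what makes this useful for the subprocess case. Interpreters started with `-S` or `-I` skip it.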
Matt Carey (26:36)
Mmm.
No, no, 100 % no. No, it is
really good. And for me today, like less.
crazily, it ranked all of the JSONPath libraries by how large they were, any benchmarks that had been done on Reddit or anything like that. Just which was the best one, which would be the easiest to implement, which had feature parity with the one that I was using. And all of that, I can just send that to someone and be like, guys, we need to change. And the PR is going to take me like 20 minutes, right? But the rest of that would have taken me quite a long time. So this was really good.
Wilhelm Klopp (26:51)
Nice.
Yep.
That's cool. Yep.
You know what would be hilarious?
The opposite of Shippy or any kind of PR review bot, where someone else opens a PR.
Matt Carey (27:19)
Mm.
Wilhelm Klopp (27:22)
And then you kick off a deep research task that just absolutely like slags off the PR gathers all this research about why it's a poor approach. And then it's like, not approved, rejected. Here is a 20 page report on why your code sucks.
Matt Carey (27:37)
Yeah, which would then obviously be ingested by another model to get down to like one line. Yeah, obviously.
Wilhelm Klopp (27:42)
Of course, yeah.
Okay, wait, I was just looking at the JSONPath thing. It looks a bit like XPath. I'm not sure I've ever used this. What are you using it for?
Matt Carey (27:50)
We use this a lot in StackOne. It's all about how we define where fields exist in a request and a response. And so it's how we do all of our transformations to insert values in certain places and read from certain places. We use JSONPath and we use something called Jexl as well, which is another one of those query languages. And then we actually combine them both together in this crazy monstrosity.
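To make the "read from certain places, insert values in certain places" idea concrete, here is a toy sketch that supports only a tiny JSONPath-like subset (`$`, `.key` and `[index]`). It is not the JSONPath library discussed above and not StackOne's implementation; `parse_path`, `read_path` and `write_path` are hypothetical names invented for the example.

```python
import re
from typing import Any, List, Union

# Matches either ".key" or "[0]"; anything else is rejected as unsupported.
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def parse_path(path: str) -> List[Union[str, int]]:
    """Split a path such as '$.data.items[0].name' into keys and indexes."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    rest = path[1:]
    steps: List[Union[str, int]] = []
    pos = 0
    while pos < len(rest):
        match = _STEP.match(rest, pos)
        if match is None:
            raise ValueError(f"unsupported path syntax at: {rest[pos:]!r}")
        key, index = match.group(1), match.group(2)
        steps.append(key if key is not None else int(index))
        pos = match.end()
    return steps


def read_path(document: Any, path: str) -> Any:
    """Return the value found at `path` inside `document`."""
    current = document
    for step in parse_path(path):
        current = current[step]
    return current


def write_path(document: Any, path: str, value: Any) -> None:
    """Set the value at `path`, mutating `document` in place."""
    steps = parse_path(path)
    if not steps:
        raise ValueError("cannot replace the document root")
    parent = document
    for step in steps[:-1]:
        parent = parent[step]
    parent[steps[-1]] = value
```

A transformation spec in this style is then just pairs of paths: read a field from a provider response with one path and write it into a unified request with another.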
Wilhelm Klopp (27:57)
Mmm. it's sick.
Gotcha.
Matt Carey (28:15)
I'll show you some of the query strings sometime. They look mad. Yeah, they look actually mad.
Wilhelm Klopp (28:16)
interesting.
That makes sense. Yeah, of course,
because so much of StackOne is like transformations and yeah.
Matt Carey (28:25)
Yeah, and it's like writing a spec
that does a transformation at runtime. Dude, I'm actually building like stack mode, StackOne's alter ego, like StackTwo. The MCP thing that I'm building at the moment, we're not aiming for feature parity with StackOne because it's an MCP approach rather than an API approach, and we're going for direct to customer rather than B2B. But in some avenues, we...
Wilhelm Klopp (28:34)
Nice.
Matt Carey (28:47)
have feature parity and in some avenues we have surpassed feature parity which is really quite funny like it's mostly it's very broken but in some very small areas we have like some cool stuff which obviously if you prioritize from the outset it's a lot easier to do than if you're like two and a half years into a mature product.
Wilhelm Klopp (28:51)
No way.
That's wild. Yep.
Totally, and this is like using foreign MCP servers for the connections.
Matt Carey (29:08)
This is using, this is hosting MCP servers. man, we've gone to MCP now. This is like hosting MCP servers for other people. man, we did well, we did well, we did well. Yeah, no, this is like creating servers for different use cases, chaining tools together, generating new tools, all in like a UI that hopefully my grandma can use. That's the plan. Yeah, that's the plan. But we're doing our best, we're doing our best.
Wilhelm Klopp (29:13)
I was gonna say we've gone 30 minutes without talking about MCP. It's a new record.
That's awesome.
But you are using other people's MCP servers as well or just...
Matt Carey (29:36)
No, so
the cool thing is I'm actually just generating them. So there's going to be some attribution. Attribution, that's the right word. Yeah. Because we are generating our own spec from open-source, MIT-licensed servers. Servers that are on GitHub, we just scraped a bunch of them. And then we're generating our own spec from that.
Wilhelm Klopp (29:43)
Hmm.
I should say.
Matt Carey (29:58)
And our own spec means that they can be customized by the user and they can do a bunch more fun stuff. But we'll open source all of the servers and let people write their own. Yeah, it's going to be cool. It's going to be cool. I probably got about a month left before we do a big launch. But we'll do a demo. Not a demo, we'll do. I did a demo at Tinkerer's last week. It was a super early demo. But we'll do more of a private beta demo.
Wilhelm Klopp (30:19)
nice.
Damn, I wanna see the
demo. I wanna see, dude, you've been up to so much stuff actually. You had some crazy hackathon build thing that you were doing as well. What was that?
Matt Carey (30:24)
in a couple weeks. I'll show you, I'll show you.
I guess me and my friend Thomas. Have I introduced you to Thomas? No, maybe not.
Wilhelm Klopp (30:37)
from Cloudflare.
Matt Carey (30:38)
Yeah, baseline Thomas.
Wilhelm Klopp (30:40)
Yeah, I think I met him briefly at demo days once, yeah.
Matt Carey (30:42)
Okay. Yeah,
okay. So he...
So me and him had like not a disagreement about something that we were thinking about building. We both wanted to build a word processor and we were both like chatting about like what would be a cool fun thing to build. Oh, I've never built a word processor. I've never built a word processor. That sounds like fun. And then we were like had a massive disagreement on how we should use AI because obviously AI should be in our word processor. And his thought was he wants AI to make you a better programmer, make you a better writer. Like he wants to intrinsically become better at writing through using AI.
Wilhelm Klopp (31:04)
Mm-hmm.
Matt Carey (31:14)
AI
so afterwards once he's like and he wants this to be like training almost so his thoughts were you could never copy and paste the AI suggestions the AI was like a critique almost like a Grammarly but on more like on steroids and without like an easy apply you know like Grammarly has that like easy apply all like he didn't want that he wanted the writing to come from you and for the AI to give you like some tips and tricks and like interesting ways of maybe rephrasing something along the
Wilhelm Klopp (31:31)
Yeah, yeah, yeah, yeah.
Mm-hmm.
Matt Carey (31:40)
And I was like, that sounds cool, but no one will buy it. And he said, I don't care. I'm going to build it. It's going to be even cooler than you think it is. And I was like, okay, right. The way I would, the way I want it is I want something where you write, you just brain dump what you're thinking, like in no format, so to speak. And...
then at the end you have an option to be like, I want this to be a job description for work. I want this to be my CV. I want this to be like a product design review. I want this to be like a request for comment. I want this to be like some structure or an email to a colleague. I want this to be like some structure that in human society we recognize as a structure, but I don't want to write it. I want to write just my thoughts out in a horrible, no.
Wilhelm Klopp (32:05)
Right.
Matt Carey (32:27)
punctuation English and then I want that to be transformed, almost Granola-esque, where they transform your horrible notes into decent notes. I want to do that. And so, armed with our individual ideas, we went off and, well, I got back from the pub on Friday night and between like 10 p.m. and 2 a.m. I made a little prototype. I'd never made an Electron app before. It was really good fun.
Wilhelm Klopp (32:33)
Mmm.
Mm-hmm.
Yeah, that's awesome.
Yeah?
that's great.
Matt Carey (32:51)
Yeah,
really, really good fun. So yeah, that's out. It's called thinkynote.com. Yeah, you can try it. There's some hacky stuff you have to do to get it working. You have to get around the quarantine from Apple because I don't have an Apple dev license. So you have to paste this horrible script. Yeah, you have to paste this horrible script into your terminal. And then the transformation bit is still like...
Wilhelm Klopp (32:56)
Yeah, amazing. Wait, can I try it?
keynote.
You cheeky boy.
Matt Carey (33:16)
behind a feature flag, so that doesn't work. But what you do get is you get, as you're writing, this was the other part for mine, is I want to continuously brain dump. I never want to like leave my page. So I want a long, continuously scrolling page I never leave. But a lot of the time when I'm writing stuff down, sorry, I've gone a bit on monologue, but a lot of the time when I write stuff down, yeah, a lot of the time, I find that I need to know some external piece of information to like make my next thought.
Wilhelm Klopp (33:27)
Mmm.
Nice. Yep.
Matt Carey (33:43)
And so what actually happens is as you write, there's a model that's continuously running in the background. And the question that's being asked is, what should I know for my next line that I don't currently know? And it just replies with like little suggestions of things that you might not know. So like little random stats and like just like random stuff. It's really funny. So I did this horrible demo that I put on LinkedIn just because I thought it was hysterical. And it was like, ⁓ hi, does this work? And the model went check the console logs.
Wilhelm Klopp (33:54)
Hmm.
man, that's awesome.
That's awesome.
Matt Carey (34:10)
And then I was like, no, but seriously, does it work? And it was like, I'm speaking. And then I was like, what's my name? And it also stores memory. If you say something about yourself while you're writing, it will store that. And because it's an Electron app, it's not stored in some database somewhere. It's just literally a text file on your computer, which is really fun. Yeah, so I've been enjoying that. I need to get the transformation thing working. That's the next step. And then, yeah, like, dude, I'm well stoked about this.
Wilhelm Klopp (34:11)
Ha
Hahaha
Nice.
⁓
Yep, yep, yep, yep, yep. That's sick. Nice.
Sounds like, so
did Thomas build something as well? Or did you just straight up win?
Matt Carey (34:39)
Did Thomas? Yeah.
Thomas built like a beautiful markdown editor and then didn't get any of the AI working, but it's such a nice markdown editor, from scratch in the browser. It's just really nice. was like, I was so impressed with us. And then, and that was really funny because I posted about it on Twitter. And then we got like four more. Well, we got like two other people just randomly said, yeah, I'll build something. So we got Jan from
Wilhelm Klopp (34:55)
Mm-hmm.
⁓ sweet. ⁓ man, that's
Matt Carey (35:04)
from the Netherlands and he built
some like context area thing, like a text area, but it automatically scrapes like URLs and adds it to context. It was really funky. And then we had, I think his name is Charlie. He built or he started, he was doing like designs for like a nutrition app. Yeah, he did like a...
Wilhelm Klopp (35:12)
⁓ cool.
Matt Carey (35:24)
It was actually just sick. I was like, this is a complete redesign of MyFitnessPal crossed with a bit of Strava.
Wilhelm Klopp (35:32)
Nice.
That's cool.
Matt Carey (35:32)
I really liked it because Strava
has horrible UX. I know we're talking back about Strava but it is awful. Like every time you want to find a person, like Strava is a social network right? Why do they hide like where the people are? Like I'm always like where is the search box again? wait I have to click this button then this button and now I'm in the search box okay now I can find you. Yeah. dude I've never used the browser. I've never used the browser, the web one.
Wilhelm Klopp (35:36)
Yeah, yeah, man, there's so much they could improve.
Hahaha.
Mm-hmm. And that's on the mobile app, which is the better UX than the web. man,
you're in for a treat. This is really cool. I would have loved to join. think we were messaging and I was like, yeah, I'm traveling all of this weekend so I can't join. Yeah, yeah, yeah. Right, right.
Matt Carey (36:08)
Yeah, I think you were in Peru, ⁓ no, you were in Vegas at that moment, I think.
Wilhelm Klopp (36:13)
I have actually kind of been looking for, like I take a lot of like little notes all the time, like in Google Keep or sometimes in Apple Notes. They serve slightly different purposes in my workflow. And then I have a do a lot of like writing and thinking and whatever in Reflect, which is like this.
writing. It's actually kind of like what you described. It's like one long page, but it's broken up into days, but it's just infinite scroll in all directions. And it's really pleasant. I've really liked it. And actually, I think it kind of came a little bit from the era of Roam. If you remember Roam, you have like the bi-directional linking and stuff, which I don't...
Matt Carey (36:35)
Okay.
yeah, I know Roam, I know Roam. dude, you were the right person
to ask about this, because you love productivity tools, don't you?
Wilhelm Klopp (36:53)
I'm glad I have that reputation. I think it's a bit of a midwit place to be, loving productivity tools.
Matt Carey (36:55)
Yeah, I think you do. No, you definitely do.
I don't think it's a midwit, I think it's
like a pursuit of some sort of self-excellence. I don't know. Yeah, people do it. Genuinely, you're not the only one, I know.
Wilhelm Klopp (37:09)
that will go with that.
No, it's a big military industrial complex, the productivity improvement genre. But the thing that I'm missing at the moment is, I think I've written lots of thoughts and notes and whatever, but none of these tools actually have APIs, or APIs that you can really use.
Google Keep doesn't, or I think it only does in a strange enterprise context. But otherwise, no API any normal person can use. Reflect is all in on end-to-end encryption. So they have an API, but only to add notes or add text to an existing one. It's like an append-only API.
Matt Carey (37:41)
Mm-hmm.
Nice.
dude, mine will probably just have an MCP server
eventually.
Wilhelm Klopp (37:52)
Yeah, because, okay, so the thing that I actually want is I would love a workflow where like overnight for like, I don't know, an hour or whatever.
o3 or whatever the smartest model is looks at all of my notes from the day and all of the problems and conundrums I wrote down and then gives me, when I wake up, a nice little summary of like, hey, have you thought about this? You might not have realized that there is a creative solution to this problem that you haven't at least written down. I think that would be a really cool addition to the workflow. But none of my current note apps, I think, really support it.
Matt Carey (38:24)
So
I wasn't going to support the async process. I was going to do just the real time one while you're in the flow. Because I use this thing called FreeWrite every now and again. And it's the most basic note taking app. But its killer feature is the Pomodoro timer.
Wilhelm Klopp (38:36)
Hmm.
Matt Carey (38:42)
That's it. That's its killer feature is and it's beautiful when you when you you can scroll to increase the time and it makes a clicking noise like and it's like haptic and everything. it's super nice. Anyway, that's basically the only feature of this thing and so you write for a certain period of time and the idea is once you're done you feel a bit like relieved and it's like there's some weight lifted off your shoulders, you feel free, your skin's better, your back hurts less, like it's all everything's good in the world, you know.
Wilhelm Klopp (38:42)
no way.
That sounds great.
Mm-hmm.
⁓ You're
moisturized, flourishing in your lane. You're just immediately moisturized.
Matt Carey (39:11)
Yeah, you're moisturised. You're just immediately moisturised. Did you see, did you
see Louis from Bloop wrote a post called AI engineering is dead? This is like massive pivot, but.
Wilhelm Klopp (39:23)
⁓
I saw this on the agenda. I think I saw it in passing. And I actually don't know him very well at all. I don't think we've ever really met. But yeah, what's the gist?
Matt Carey (39:33)
The gist is, first of all, it's a really good post and everyone should go and have a look at it. The gist is, previously, the role of the AI engineer came around when we were trying to scaffold around models, make workflows, do all of this prompt hacking, context window hacking, making everything just work, you know?
Wilhelm Klopp (39:46)
Mm. Mm-hmm, mm-hmm. Right, RAG.
Matt Carey (39:53)
Yeah, RAG, all of this stuff, and we were chaining together lots of LLM calls, each one with a very particular purpose. We were setting up evals for each one. And I'm saying "we were", but this was like a year ago max. I would say this is like six months ago, to be fair. But he has a really funny opening few lines, which is like: first they came for XML, then they came for JSON, then they came for...
Wilhelm Klopp (39:59)
Right.
Matt Carey (40:16)
chain of thought prompting, and now they're coming for the AI engineer. And it's like, previously you had to do all of this stuff, and now it's just a model with tools. Tools are regular software engineering, and the model you don't control. Everything else has just been squished back into the model. And I like this because it's a similar take to some of the other pieces out recently that I think are directionally very correct.
Wilhelm Klopp (40:19)
Hahaha
Mm-hmm, mm-hmm, mm-hmm.
Mm-hmm.
Matt Carey (40:41)
So one of them is The Model is the Product by, he goes by Alexander Doria on Twitter, but I can't remember his real name, actually, that's really bad. He's a French lecturer, he runs a small AI lab, but yeah, it's called The Model is the Product, and I think the website is vintagedata.com if I remember right. But amazing post where he basically said that,
Wilhelm Klopp (40:53)
Cool.
nice.
Matt Carey (41:02)
as these companies get bigger, their models get better and their models will include more of the engineering that previously was needed around them. And basically it all falls into the model eventually. And the only companies that will be able to sustain their value over a long period of time will be AI companies. So he points out that Windsurf were training their own models. Cursor, their main selling point for ages was they had this apply model.
Wilhelm Klopp (41:15)
Yeah.
Mm-hmm.
Matt Carey (41:28)
that automatically applied your code and enabled that agentic loop. If you didn't have that, it didn't work, right? Like the magic didn't happen. And so the magic always comes from a special model. But now the applying is just a tool call. So Cursor maybe doesn't have that value proposition. And so they're maybe more on the fringe. I don't know. They were saying everything falls into the model and.
Wilhelm Klopp (41:31)
Right, right, right, right, Yep. ⁓
Yeah. But now the applying is just a tool call.
Yup.
Matt Carey (41:53)
Yeah, I mean, Nico, at my work, wrote a blog post, very similar. It's called Tools Not Rules.
Wilhelm Klopp (41:54)
Interesting. nice, I actually, that's on my reading list, the one from Nico, yeah, yeah, yeah.
Matt Carey (42:02)
And it came out a couple of weeks ago, basically talking about how we've re-architected a lot of our internal tooling to instead of make these very strict rules and strict workflows about if this, do this, if this, do this, workflows, basically chain models together, model calls together. We now have this much more freeform approach with the agentic loop and plus tools.
Wilhelm Klopp (42:14)
Mm-hmm, mm-hmm, mm-hmm, mm-hmm.
Totally.
I mean, it's interesting. I'm totally on board with the model plus tools being the new meta or whatever. That definitely was a big unlock when we realized this building the Zed agentic editing. But I also still think, I mean, I think it makes sense as a high-level strategy to focus on models, tools, MCP, as opposed to on like...
Matt Carey (42:30)
Yeah.
Hmm.
Wilhelm Klopp (42:44)
weird custom models that apply code or like overly fancy rag.
Matt Carey (42:48)
But it's what they needed to get
their springboard. They have to do that next bit. So I think, imagine the models get much better. And we currently have models plus tools. But a lot of our tools are just wrapping API endpoints. That's what most people do. Imagine we just end up with models plus fetch. Models plus fetch plus bash. Or even like model plus curl, model plus bash, model plus git. And you're done.
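The "model plus one generic tool" idea can be sketched as a very small loop. Everything here is an assumption made for illustration: `scripted_model` is a hard-coded stand-in for a real LLM call, and `run_agent` and `shell` are hypothetical names. A real agent would sandbox the shell tool rather than run arbitrary commands.

```python
import subprocess
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (role, content)
Model = Callable[[List[Message]], Tuple[str, str]]  # returns (action, argument)


def shell(command: str) -> str:
    """The single generic tool: run a shell command and return its output."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=10
    )
    return result.stdout + result.stderr


def run_agent(
    model: Model,
    tools: Dict[str, Callable[[str], str]],
    task: str,
    max_steps: int = 5,
) -> str:
    """Ask the model for an action, run the tool, feed the result back, repeat."""
    history: List[Message] = [("user", task)]
    for _ in range(max_steps):
        action, argument = model(history)
        if action == "final":
            return argument
        if action not in tools:
            history.append(("tool", f"unknown tool: {action}"))
            continue
        history.append(("tool", tools[action](argument)))
    raise RuntimeError("agent did not finish within max_steps")


def scripted_model(history: List[Message]) -> Tuple[str, str]:
    """Stand-in for an LLM: call the shell once, then answer with its output."""
    role, content = history[-1]
    if role == "user":
        return ("shell", "echo hello")
    return ("final", content.strip())
```

With this stub, `run_agent(scripted_model, {"shell": shell}, "say hello")` makes one tool call and then finishes. The point of the sketch is that the loop itself stays tiny; swapping the tool dictionary for fetch, curl or git would not change it.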
Wilhelm Klopp (42:52)
Totally. Yeah, yeah,
Right.
Yeah, yeah, yeah.
I
mean, I think this makes sense to me from a high-level perspective, but I think if you're building an actual working end-user experience, there's what I think Thorsten called elbow grease, all of the small little things that make an actual nice, pleasant experience. I think that's what engineering always has been a lot about.
Matt Carey (43:23)
Yeah.
That's the software engineering. I think a year ago we were saying, agents or models aren't good enough to be left alone. Putting them in a loop is silly. Why would you do that? They descend into madness. All of these ideas that, I mean, I was saying, a lot of people were saying, and now we're all like, oh yeah, you need a tool for each individual use case. Each individual action is a tool. And then,
Wilhelm Klopp (43:39)
I see.
Yeah.
Yeah, yeah,
So, right.
Matt Carey (43:57)
So I think it's like
Wilhelm Klopp (43:58)
Yeah.
Matt Carey (43:58)
one step away from, yeah, we just need an interface. And so whether the interface is like a bash command for a local terminal or whether the interface is curl for like a remote thing or whether the interface is, I don't know. mean, those are the two interfaces, right? What else do you have? A WebSocket interface? I don't know.
Wilhelm Klopp (44:08)
Mm-hmm.
Mm-hmm.
Curl and bash, yeah.
No, it's fair, although I don't know. I think I see the point that it is just all software engineering, but maybe it always was just software engineering and it's just about picking the right tool for the job or whatever.
Matt Carey (44:30)
No, there was some
ML stuff, like definitely with the evals and trying to make each individual workflow step. You do, but you're having a singular eval, like it's an eval for a task. It's like a model eval rather than before. I was doing like loads of task specific evals, like switching models between them. I guess you're still doing that now. I was given some really good advice about evals for Shippy, which is pick some use cases where you know.
Wilhelm Klopp (44:35)
But you still kind of want evals now, right? Like ideally you still want eva-
I see.
Matt Carey (44:57)
what the next tool that needs to be called is, and then that is your eval. And that's the only eval you can do in an agentic use case. Create some scenarios where there is one tool that should be called next, and then that is all you can do.
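A minimal sketch of that "which tool gets called next" eval might look like the following. The scenarios, the tool names and `keyword_agent` are invented for illustration; in practice `pick_next_tool` would wrap a real model call, and none of this reflects how Shippy is actually evaluated.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Scenario:
    name: str
    conversation: List[str]  # everything the agent has seen so far
    expected_tool: str  # the one tool that should be called next


def next_tool_accuracy(
    scenarios: Sequence[Scenario],
    pick_next_tool: Callable[[List[str]], str],
) -> float:
    """Fraction of scenarios where the agent picks the expected next tool."""
    if not scenarios:
        return 0.0
    hits = sum(
        1 for s in scenarios if pick_next_tool(s.conversation) == s.expected_tool
    )
    return hits / len(scenarios)


def keyword_agent(conversation: List[str]) -> str:
    """Stand-in for a model: choose a tool from keywords in the last message."""
    last = conversation[-1].lower()
    if "diff" in last:
        return "read_diff"
    if "test" in last:
        return "run_tests"
    return "post_comment"


SCENARIOS = [
    Scenario("fresh PR", ["A new PR was opened; fetch the diff."], "read_diff"),
    Scenario("after reading", ["Changes loaded. Check that the tests pass."], "run_tests"),
    Scenario("wrap up", ["Everything passed. Summarise for the author."], "post_comment"),
    # Deliberately tricky: the keyword stand-in gets this one wrong.
    Scenario("ambiguous", ["The diff touches tests."], "run_tests"),
]
```

`next_tool_accuracy(SCENARIOS, keyword_agent)` gives a single number to track as prompts, tools or models change. The last scenario is there to show a miss, which is the kind of case worth collecting.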
Wilhelm Klopp (45:01)
Right, right, right, right, right, yep.
What's currently happening with Shippy? What's the status?
Matt Carey (45:17)
What do we do at the moment? So Shippy is, I mean, it's in GA, people use it, it's really cool. It does autonomous code review, does a bit of QA as well. I guess the next thing to work on is the whole rules thing. There's a lot of different Cursor rules, Windsurf rules. Can we combine those, can we keep them up to date, can we do all of that in CI?
Wilhelm Klopp (45:31)
Yep, automatically suggesting rules,
Yeah, I think that's so
powerful, so powerful.
Matt Carey (45:41)
Yeah, I think that will be awesome and I haven't thought
about that too much yet. But I think when I think about that a bit more, that'd be really good. And then I guess the next step after that is, I keep on getting sidetracked by thinking of ways to monetize it, which I think probably is not the best plan, but I'm going to make some... It's already an MCP client. So when I run it, I give it access to the Context7 MCP server. But I want to give it access to a browser rendering MCP server. So...
Wilhelm Klopp (46:01)
Nice. Yep.
Matt Carey (46:07)
That's the next thing to do. And then I think like a really cool thing that no one really does when they're doing PRs, but they should do. And so it would be nice if this was just done automatically is benchmark the old code and the new code. And so I'm going to give access to a code interpreter, which can do that like as an MCP server. And like if you've changed a function.
Wilhelm Klopp (46:19)
Mmm.
Which one do you use for
that one?
Matt Carey (46:24)
I'm gonna go a little bit custom, maybe Pyodide or something. Maybe, no, sorry, I'm thinking about something else. No, I'm gonna go with Modal sandbox.
Wilhelm Klopp (46:34)
cool, nice,
Matt Carey (46:34)
Yeah, I'm gonna go with the Modal sandbox.
And then if that works, I'm gonna do some custom thing on ECS probably, but to start with, 100% Modal sandbox.
Wilhelm Klopp (46:44)
Ooh,
I did not expect you would go that way. Nice. I quite like ECS, actually.
Matt Carey (46:47)
Yeah, probably. Yeah,
I'll probably do something on ECS, but that's if I have usage of this. To begin with, it'll be Modal sandbox, super easy to set up. And I just want to see if I can get the model to call an MCP server with the old code and the new code reliably. And if it can work out which change is relevant to be benchmarked, because there's a lot of things, like if you change some filtering logic, you can benchmark it, but you need to create some
Wilhelm Klopp (46:54)
Yeah.
Yeah.
Matt Carey (47:14)
good inputs and outputs on the fly. So it would be nice to see if the model could do that. Because if it can, then that's a really good use case that we haven't really been able to do before because of context windows and model cost and all of that stuff.
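The benchmarking step itself is ordinary code once the old function, the new function and some inputs exist. The sketch below assumes a changed filtering function, as in the example mentioned above; `filter_old`, `filter_new`, `make_inputs` and `compare` are hypothetical, and in the setup described the inputs would be generated by the model and the code run in a sandbox rather than locally.

```python
import random
import timeit
from typing import Callable, Dict, List, Tuple

Inputs = Tuple[List[int], List[int]]


def filter_old(items: List[int], allowed: List[int]) -> List[int]:
    """Old version: membership test against a list, rescanned for every item."""
    return [x for x in items if x in allowed]


def filter_new(items: List[int], allowed: List[int]) -> List[int]:
    """New version: build a set once so each membership test is a hash lookup."""
    allowed_set = set(allowed)
    return [x for x in items if x in allowed_set]


def make_inputs(size: int, seed: int = 0) -> Inputs:
    """Generate reproducible inputs on the fly."""
    rng = random.Random(seed)
    items = [rng.randrange(size) for _ in range(size)]
    allowed = [rng.randrange(size) for _ in range(size // 2)]
    return items, allowed


def compare(
    old: Callable[..., List[int]],
    new: Callable[..., List[int]],
    inputs: Inputs,
    repeats: int = 5,
) -> Dict[str, float]:
    """Check both versions agree, then report the best-of-N wall time for each."""
    if old(*inputs) != new(*inputs):
        raise ValueError("old and new disagree; timing them would be meaningless")
    old_seconds = min(timeit.repeat(lambda: old(*inputs), number=1, repeat=repeats))
    new_seconds = min(timeit.repeat(lambda: new(*inputs), number=1, repeat=repeats))
    speedup = old_seconds / new_seconds if new_seconds > 0 else float("inf")
    return {"old_seconds": old_seconds, "new_seconds": new_seconds, "speedup": speedup}
```

`compare(filter_old, filter_new, make_inputs(2000))` returns the two timings and their ratio. Checking that the outputs match before timing matters, because a faster function that returns something different is a behaviour change and not an optimisation.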
Wilhelm Klopp (47:27)
That's awesome. Love it, man. Really, really exciting. There is so much, yeah, you wanted to wrap this up a bit sooner, right? Am I gonna see you in London this weekend?
Matt Carey (47:29)
Dude, there's so much stuff to work on. ⁓
yeah,
of course. We're gonna do something. Dude, I've got so many side projects. Yeah, yeah, if you wanna go out on Saturday, I'm probably keen. I'm going for a big bike ride on Saturday, so then I'm probably keen. But then on Sunday, I'm gonna go to the Granola cafe coworking thing.
Wilhelm Klopp (47:40)
You want to go out on Saturday?
Okay, nice.
Granola Cafe?
Matt Carey (47:56)
Mate, Shrey from Granola is running co-working sessions on a Sunday, which is ridiculous because it's a Sunday, but I've got so many side projects and some of them need to get done soon, so I'm gonna go and finish something.
Wilhelm Klopp (48:01)
Amazing.
100%. No, that sounds very good. Yeah, my schedule is a bit mad because I get in only Friday morning doing a bunch of work stuff during the day. And then Saturday is like my 10-year, sixth-form reunion. Which is all... Yeah.
Matt Carey (48:21)
yeah, you were talking about this. Dude, you're not making it
out. You're all gonna be so done by the time you wanna go out. You're gonna be like, get me away from people by now.
Wilhelm Klopp (48:27)
Yeah, yeah, yeah, yeah. It's possible. But then I
also have like a 7 a.m. flight from Heathrow on the Sunday. So the plan was to like, yeah, to go straight from, yeah, exactly. Head, head, maybe a shower and then head to the airport like straight, but don't sleep.
Matt Carey (48:35)
⁓ wait, that's on Sunday? ⁓ so your plan is to stay up?
The only
time I've ever wanted to do that, I've missed my flight.
Wilhelm Klopp (48:50)
Hopefully that doesn't happen.
Matt Carey (48:51)
The only time I've ever tried,
I got back home at like 2am, 2.30am and was like, I'm gonna have a shower, get ready, got a cab at four and then about, I'd done all my stuff around three and then I sat down and the next thing I knew it was 9am. And I was like, aww no. Yeah, it didn't work.
Wilhelm Klopp (48:57)
Mm-hmm.
Mm-hmm.
god, yeah. That's a nerd. I think I've done it once
before to go to a... It was actually not planned. It was like an assignment that ended up being much harder than I thought. And then I had to go to a hackathon.
the same weekend in Barcelona. And then, my god, also this was like a student times, right? But then also, I think the hackathon, it was supposed to have showers, but then it didn't, or they closed much earlier than anyone expected. And so I didn't shower for like three days. And I still remember the look on the guy's face on the plane that I was catching back from Barcelona to London when he's...
Matt Carey (49:20)
That sounds like a wild, a wild evening. What?
Do you think? Go on.
He'd just smell you coming from the end.
Wilhelm Klopp (49:47)
Yeah,
he saw me. He smelled me coming. He was just like, oh fuck. But hopefully it was a quick flight.
Matt Carey (49:50)
that's horrific. dude, I have too been
on a plane when I felt like I was gonna throw up after like a night out or something and I just feel sorry for everyone next to me. Anyway, on that wonderful note, I really need to go. But dude, it's been a pleasure.
Wilhelm Klopp (50:04)
It has been a pleasure. All right. See you next week or see you in London, man. Let's figure something out. Yeah. Big love. Bye.
Matt Carey (50:08)
Yeah, yeah, yeah, let's do it. do it. Big love. Bye.
