#197 Vibe Coding Gotchas to Watch Out For

The Humanizing Work Show

Episode 197 | September 8, 2025 | 00:19:25

Hosted by: Richard Lawrence, Peter Green, Angie Ham

Show Notes

Vibe coding prototypes can feel magical. With just a few prompts, an AI builds a working app you can click around and test. But when something looks real, it’s easy to fall into common product traps.

In this episode of the Humanizing Work Show, Peter shares his positive experience vibe coding a drag-and-drop helper app for a NY Times word game, while Richard highlights the hidden risks. Together they explore how confirmation bias, anchoring, sunk cost fallacy, precision/accuracy bias, and optimism bias sneak in when prototypes start looking like products.

The big lesson: don’t use vibe coding to prove your idea — use it to learn.

If you want help learning how to validate ideas systematically — with or without AI — join us in an upcoming CSPO or A-CSPO workshop at Humanizing Work.

Show notes, links, and transcript: https://www.humanizingwork.com/vibe-coding-prototypes-advice/

Share your challenges or episode ideas: [email protected]

Connect with us on LinkedIn: https://www.linkedin.com/company/humanizingwork


Episode Transcript

Peter Green: Welcome to the Humanizing Work Show. I'm Peter Green here with Richard Lawrence, and I'll start today's episode by saying I'm an AI optimist. I worked at Adobe for 15 years, and it's been really fun to watch what they've been doing with generative AI, the features they're building, and their approach to trying to do generative AI ethically.
 In fact, there's a feature called Enhance Speech that's one of the most magical tools I've ever used, and that's coming from someone who spent decades building and using tools to enhance speech. 
Now, I'm not a software developer, though I've worked with and around software developers for the past 20 years. And I've been reading reports of product managers and entrepreneurs doing what's been labeled vibe coding. If you've been on a no-technology mindfulness retreat for the past year and missed it, vibe coding is essentially prompting an LLM like ChatGPT, Gemini, or Claude to build an app for you. The AI writes the code and builds the app, with no requirement that you understand what it's doing under the hood. You're just vibing with the AI, not really worrying too much about system architecture, scalability, reliability, et cetera.
There are many reports of people finding that vibe coding a prototype feels faster and more effective than writing traditional documents like PRDs or user stories, then using that vibe-coded app in customer tests and handing it off to developers to productize it. I was interested, so I decided to try vibe coding a little app, and I had a pretty good experience with it.
I play a bunch of the New York Times word games, and one of those is called Connections. All of our family is on a text thread where we share our results from these games every day. Connections gives you 16 tiles each day that form four groups of four, and you have to figure out how the tiles go together. The groupings range from pretty obvious to more esoteric. I always wished the game would let you drag and drop the tiles so I could test things out, because usually they lay the game out so the first row is a red herring. The first four tiles might say NEVER GONNA GIVE YOU UP, and you might say, oh, well those go together, right? And just submit 'em. But it doesn't work out. You have to play with it a little and learn some of the patterns. So I thought, ah, if I could just drag and drop the tiles... so I decided, why don't I vibe code that?
So I did. I got an app that actually works, and it was fun. There's a community of people who play this game online, so I thought about making it production-ready and sharing it with that community. But I'm not a developer and Richard is, so I asked for his advice. Richard took it in a pretty different direction from what I expected, which gave me a lot of clarity about how to proceed, and I thought it would be useful to share that with our audience.
 So whether you're just thinking about trying vibe coding or if you're well on your way to writing a book about prompt engineering for app development, in today's episode, we'll share some advice about how to get the most out of vibe coding software while mitigating some risks that may not be so obvious at first. 
Richard Lawrence: Yeah. Peter, when you asked for my advice, I noticed that you shared the code with me. I think you were expecting me to do a code review and tell you whether the code was any good or not, and I think you were surprised when my response was, "I don't think that's the biggest risk for you." I don't really care how good the code is. From what I've heard, a lot of these tools can produce good code, especially if you're prompting them in a skilled way, and they're getting better at it. So that's probably not your biggest problem.
 The biggest risk with anything like this is product market fit. So my first question with any new product like this, and the one I started with for your app is, have you tested this with other people? And if so, how? 
 Peter: Yeah, and my response was, "sort of?" 
Richard: Like a good startup entrepreneur: "My mom likes it!"
 Peter: Yeah, precisely. I've got this little text thread with our family for people that play the New York Times games, and I shared it with that text thread, and I was hoping for a response like, "wow, this is really cool. Wait, you vibe coded this?" 
But the response was a mix of "Wait, this isn't working on my phone," and then just crickets, no acknowledgment that I'd just built this app for them.
 So.. 
 Richard: Even your mom? 
Peter: Uh, my mom was like, "What's this supposed to do? I don't see it doing anything." So she was the "this isn't working for me" response.
Richard: Uh, that's rough. She's normally positive about all your stuff.
Peter: Yeah. Yeah. I think The Mom Test book may have been written just for me, 'cause she's so supportive. So I started to wonder: maybe that text thread isn't representative, and if I shared this more broadly with the online community of people who talk about these games every day, the more avid Connections community, maybe they would see the value.

So the real answer to your question is: I shared it with some family members, and I got some, uh, mild input on its usefulness.
 Richard: And so it sounds like you're about to walk into a really common problem for startup founders, which is confirmation bias. And this is not just for startup founders, actually. This is a really common problem for product people in any kind of context where our brains show us information that confirms what we believe. And so we tend to seek out confirming information without really thinking about it. And I could hear that in your answer a moment ago, "uh, my family didn't like it, but I assume other people do, so I need to find those people that like it." 
 Peter: Right. 
 Richard: They're probably out there. 
Peter: Yeah. This is like the most common response: "well, you're just not testing it with the right people."
 Richard: Right. Surely there's a whole market out there. I don't know why you keep picking the ones that don't fit into the market. 
And it may be true, it is sometimes true, that we ask the wrong people. This would be availability bias, where we get input from the people who are easiest for us to get input from. And a lot of times those two go together, because the people who are talking with you about your thing already like your thing, and the mistake is to assume they're representative.
The way to avoid confirmation bias would be to write down your assumptions as hypotheses: "I believe that avid Connections players would really value this drag-and-drop capability that's missing from the app," and then go test that. And I wouldn't test that with a prototype.
 Peter: Mm-hmm. 
 Richard: But that's maybe a different conversation. Before we go there, I'm curious what your response was to the people who said it didn't work on their phone. 
Peter: Uh, I just assumed the code wasn't good. I had gone through a whole bunch of iterations over the course of a day or so with ChatGPT, which is what I was using in this case, and it would regularly break stuff on my phone. It worked on all three of my devices, but it didn't work on hers, so I assumed it was the code again.
It turns out, though, that it was just a UI issue. The way I built this is you upload a screenshot of the game board, and that approach just wasn't how she would think about doing it. She was expecting to type things in, and the original version of the app actually had that in it. And I thought, "I don't wanna type stuff. I want this to be magical. Upload the screenshot and suddenly the game board appears where you can drag and drop, right?"
 Richard: Yeah. 
Peter: Um, so I cut that feature of entering the text by hand.
Richard: Okay. Your response to the feedback sort of gets at the bias I was concerned you'd run into if you started testing this using your prototype, which is anchoring bias. If you haven't really validated the problem and you just start testing the solution, you're going to fine-tune the solution. You're gonna stick with that. So you took the first solution the AI presented, it worked for you, and now you're fine-tuning that and making it work.
 If you showed this to other people, you might get feedback about how the solution works, but you might not actually learn what their underlying pain and hope is for their experience with the Connections app. They would give you feedback about your solution. 
Peter: Yeah, I could totally imagine even sharing this with avid players, and them saying, "oh, that's a really cool solution," and me tweaking the UI to make it a little bit better, and then they go away and never actually use it because that's not how they like to play the game.
Richard: Right. So my advice to product people is, "be careful not to start with solutions. Start by validating a problem first." My favorite way to do that, and we've mentioned The Mom Test a few times, which is Rob Fitzpatrick's book on problem interviews, is to start with customer problem interviews: getting people to talk about their experience, maybe playing this Connections game. What's it like? What do they enjoy about it? What's frustrating?
 And you might discover that they actually do have this problem. They may not name it as a wish for drag and drop, or they may have a different problem that you could solve just as well. 
 So we validate the problem first, then move towards a solution, and if you validated the problem that people wish they could arrange the tiles differently and come up with more creative solutions, then yeah, definitely test your solution and see how it works for them. That's great. 
I'm curious, how was the experience of vibe coding? Was it pretty quick to do as a non-developer?
 Peter: Um, not really. It was sort of slow and frustrating. I didn't know what to expect. Uh, like in my optimistic self, I thought this is gonna be great, like within an hour or so, I'm gonna have a killer app. 
And what I found was that it regularly wrote bugs. I would have to go fix those bugs, then I could sort of test it out. Then I would have ideas for how to make it better, 'cause it would interpret what I said in ways that were kind of surprising to me, and it would add features I didn't ask for and I'd have to go cut those back out.
And occasionally I'd go back and forth between ChatGPT and Claude and make them competitors and say, "ChatGPT can't figure out how to fix this bug with OCR, what would your advice be?" And it would always give me some advice and say, "oh, well, you should probably refactor it this way," and I'd take that back.
And so anyway, it took way longer than I expected. Most of this was running as a secondary task while I was doing other things that day, but it really did take the better part of a day. And from time to time I've continued to tweak it. Even last night, the puzzle had a few words where it was like, what was it? FOUR-WORD PROBLEM, I think it was, and FOUR-WORD was hyphenated, and it interpreted those in a weird way. So I had to go fix the hyphen problem. It took a lot longer than I expected.
Richard: As you described that, it sounds like the worst coding experience ever. So as a developer...
 Peter: It could be for you. 
Richard: That sounds so frustrating. And some of the best examples I've heard of people experiencing what Kent Beck calls augmented coding, like a good experience with this sort of thing, they're definitely not doing it like this. They're doing something like TDD pairing with the AI.
Uh, did you ever consider giving up in the middle of that? Like, did you get to a point where you were thinking, "I don't think I can fix this bug. I'm out"?
Peter: Uh, there was definitely a part related to the OCR functionality, where it was using a library called Tesseract, and it just wouldn't load. It would error out, and I couldn't figure out how to fix it.

It seemed like some other change we had made caused that bug, but ChatGPT was saying, "no, Tesseract is just down, that's why it's doing this, and that happens." I'm like, what do you mean that just happens? That's like the whole core of my app. So, uh, I went back and forth on that.

That's where I started going back to Claude. I was like, "Would you use Tesseract? Are there other libraries that are better? How would you safeguard against this? Is there a fallback?" Blah, blah, blah.

I considered bailing on it at that point. It was right on the teetering edge, and then I got a suggestion from Claude, took it back in, and that actually seemed to fix it. But it was pretty rough going there, about two thirds of the way through.
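For readers curious what this OCR step looks like in practice: the episode doesn't show any of Peter's code, but here is a minimal, hypothetical sketch of reading tile text from an uploaded screenshot with tesseract.js, a common JavaScript wrapper around the Tesseract OCR engine mentioned above. The function name and the naive token-splitting are illustrative assumptions, not what the actual app does.

```typescript
// Hypothetical sketch (not the app's actual code): OCR on a Connections
// screenshot using tesseract.js.
import Tesseract from "tesseract.js";

async function tilesFromScreenshot(image: File | string): Promise<string[]> {
  // Run OCR on the screenshot; "eng" loads the English language data.
  const { data } = await Tesseract.recognize(image, "eng");

  // Naively split the recognized text into candidate tile labels. A real app
  // would need smarter segmentation based on tile positions, plus handling
  // for hyphenated tiles like FOUR-WORD, which tripped up this version.
  return data.text
    .split(/\s+/)
    .map((token) => token.trim().toUpperCase())
    .filter((token) => token.length > 0)
    .slice(0, 16); // a Connections board has 16 tiles
}
```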
 Richard: Yeah, it's hard to know if you should push through in the middle of these things, whether you're doing it with an LLM or you're doing it as a programmer. You can bump into sunk cost fallacy in this where you actually are hitting a dead end or you're working on something that's not actually all that useful, but you can tell yourself, "I've already put in all this time, I should keep going." 
 Peter: For sure I was there. It had worked, like I had had a few instances of it working and I was dragging and dropping tiles and it made that way faster. I was like, "ah, I know it can work. I gotta figure it out." 
Richard: And in your case, pushing through led to a good solution that it sounds like you're using, and you're getting the original benefit you wanted. But I've seen a lot of cases where people get attached to their first idea or their first draft and pursue it even in the face of feedback that says, "this is probably a bad idea, or it's gonna be more expensive than the original ROI calculations we made, and maybe it's not worth it anymore."
And people keep going, "well, I'm already in there. I should finish this thing." You see this on Scrum teams all the time: carrying work over from one sprint to another and spending time and money on it that may not produce a positive ROI.
 Peter: Yeah, it's difficult because like I said, the app was working. I actually really like the layout of it. I love the UI of it. And when it works, it's like magic. I played it last night. I solved the puzzle in like 20 seconds. It's like, "wow, this is so great. It's so much faster than what I had before." 
 Richard: Well, congratulations on making your first app. That's pretty exciting. Uh, what are you thinking about doing next? 
 Peter: It's a good question. Like I don't intend to make any money with this little app. That's not the goal. I would say my wildest dream is that I publish this somewhere, people start using it, and the New York Times developers say, okay, we should build that into the app, and I don't have to worry about maintaining this app anymore, they just integrate my ideas into the app. 
 So it's, it might be a tool that others in the community would benefit from. I have considered doing a little bit more testing and polish, maybe going into the ChatGPT and saying, "Hey, if I were to build this into a website that was a little more scalable, what should I do?" 
 Uh, if people like it, great. If not, it's no big deal. It's free. Right. So I've, I've considered putting it out there. I'm a little nervous to do that, but I don't think it would be too much work to do that. 
Richard: It seems like you may be running into optimism bias there. It looks like it's done; it's probably not too much work to get to production.
Peter: Uh, that would be a shocker, Richard, if I was running into optimism bias. I feel like optimism bias is a good definition of my life.
Richard: So if you were about to hand this off to another developer, let's pretend this is a prototype and you're saying, "I've proven out this concept here. Team, turn this into a production-ready product." I think optimism bias is one of two biases you'd probably run into with your team at that point.
One is precision bias. This is where, if something looks real, looks complete, has lots of detail, whether it's a specification or a prototype, that can signal that it's more baked than it actually is, more correct than it might actually be. So your team would probably find details in your prototype that you didn't even think about, that are just however ChatGPT chose to do them. You might not even have picked them, like maybe the font it uses or a particular arrangement of the chrome in the UI or something.
 Developers are gonna treat that as just as important as anything else because it looks good, it looks done, and so you have to be really careful to say, this is what to focus on. This is a prototype of this behavior. Here's what I care about. Here are some things that are not as baked, where we can keep experimenting around those, or you have freedom to try something else on those. 
 And then of course, optimism bias, which is our tendency to look at a thing and think the best case. It's not gonna be that bad. I mean, we need to do a handful of things to get to production, but it looks pretty close. I can take it the rest of the way. 
And even people who are really skilled at this sort of thing run into it, whether that's a smart developer, or, there's a story Daniel Kahneman tells about a group of experts writing a book about cognitive biases running into optimism bias when discussing how long it was gonna take them to make that textbook. They were off by a factor of eight, underestimating what it actually took.
 And the same thing happens with smart people of all kinds. We think, uh, I've got this, I've done this before. It'll just take a few steps and then we get it done. So the way around optimism bias is to look back at history. How long does this sort of thing usually take? What are the issues we usually run into? And then build that into your prediction of how it's gonna go in the future. 
Peter: Yeah, I love that. It's an example of reference class forecasting. That's the technique we use, and this is a great example of what Bent Flyvbjerg talks about in software development specifically: if you're doing something for the first time, his advice is don't. But his second piece of advice, if you have to do it for the first time, is to try and find something similar. I don't know that anybody else has vibe coded a drag-and-drop tile app before, and I don't know if I could find data on how long that takes to turn into a product. But it doesn't matter if it's sort of different, and it doesn't matter if it wasn't vibe coded. It's really "go find a little app that has this type of UI and ask how long that took." That's your best guess, and you're gonna get a much more accurate forecast based on it.
So Richard, if I could summarize your advice here: I think it's that vibe coding can give you something really fast, and that feels exciting. That was definitely my experience. It's kind of fun to have this little app. I definitely don't think of it as "I coded an app," but I did make it.
Uh, the trap, I think, is thinking that because it looks real, or because it's working for me since I made it, it must be ready, or at least very close to ready. The real risk may not be in the code, although there might be some risks there. The bigger risk is probably in skipping the more important work of testing our assumptions and validating whether the product actually solves a problem for customers. So keep an eye out for those biases. Those were confirmation bias, anchoring bias, sunk cost fallacy, precision bias, and optimism bias that we focused on in this episode. Right?
 And then don't use vibe coding to prove your idea. Use it to learn. Does that sound about right? 
Richard: Yeah, I think that's a good synthesis. And if anyone checking out this episode wants our help in figuring out how to do this kind of validation, with or without vibe coding or AI-augmented software development or whatever, these are human things that aren't really about the technology.
We love helping product leaders get better results by taking a systematic, human-centric, complexity-aware approach to their work. You could join me in an upcoming CSPO or A-CSPO workshop, where we work on some of the capabilities around this, or you can contact us at humanizingwork.com to discuss a custom engagement.
 Peter: And if you get value from the show and you wanna support it, the best thing you can do if you're watching on YouTube is subscribe, like the episode, and click the bell icon to get notified when new episodes come out. And drop us a comment with your experience with Vibe coding. 
Richard: And if you're listening on the podcast, a five-star review makes a huge difference in whether other people who would benefit from the show find it or not. And it's super encouraging to us to continue putting out this content for free every week. So thanks for tuning into this episode of The Humanizing Work Show, and we will see you next time.
