StackOverflow or LLM? - vibe coding a simple webapp
June 2025
Coding is being disrupted. Can I keep up? Is the code still maintainable? Am I a better employee if I use agentic tools?
TLDR Verdict
A very promising tool, likely only going to get stronger in the future.
Guiding principles if I’d use this in my day-to-day:
- develop in a way that lets you verify early and often that what is generated is correct. Nothing special - this is part of “shifting left”: unit tests or live code reloading ensure that what you generate actually works as you go
- pre-prompting seems very promising
- avoid shifting the burden of review onto the PR reviewer - make sure the team agrees on whether standards will be lower or just different. Different does not mean bad
- reducing the surface area of the changes helps with the above problem - if you’re refactoring or generating a targeted set of changes, it’s less likely things will go wrong. Using modules or microservices goes a long way here - teams can have relaxed standards and cognitive load can be eased in other ways.
The above “principles” are not conclusions but mostly just working thoughts. I invite anyone to chime in and help me shape those…
Making code design standards explicit as a team helps. We never take shortcuts with things like security. We can take shortcuts with maintainability. Getting on the same page about the impact of those standards is easier said than done and they diverge from project to project.
A demo is available at /stack-or-llm/ and the code can be found on my GitHub.
Starting out
I used the Pro version of Cursor and spent about 8 hours in total on this. I created a new GitHub repo, checked it out and threw spaghetti at the wall:
Scaffold a Svelte 5 project that does an API call to a weather API and include tests. (or something along those lines)
- Generated code worked out of the box
- Seemed to work
I was impressed by the speed and by its ability to generate multiple files at once.
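To illustrate what “include tests” can mean for a scaffold like this, here is a minimal TypeScript sketch. It is not the code Cursor generated: the endpoint URL, the Forecast shape and the function names are all hypothetical. The point is the structure: parsing is a pure function and the HTTP call takes an injectable fetcher, so both can be verified early without touching the network.

```typescript
// Hypothetical payload shape; a real weather API will return something different.
interface Forecast {
  city: string;
  temperatureC: number;
}

// Pure parsing step: trivial to unit test, and it fails loudly on unexpected data.
function parseForecast(city: string, json: unknown): Forecast {
  const temperature = (json as { temperature?: unknown } | null)?.temperature;
  if (typeof temperature !== "number") {
    throw new Error(`Unexpected weather payload for ${city}`);
  }
  return { city, temperatureC: temperature };
}

// The subset of fetch() we rely on, so tests can pass in a fake.
type Fetcher = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Hypothetical endpoint, for illustration only.
const BASE_URL = "https://weather.example.com/current";

async function fetchForecast(city: string, fetcher: Fetcher): Promise<Forecast> {
  const res = await fetcher(`${BASE_URL}?city=${encodeURIComponent(city)}`);
  if (!res.ok) {
    throw new Error(`Weather request failed for ${city}`);
  }
  return parseForecast(city, await res.json());
}
```

In the app the same function would be called with the global fetch, e.g. fetchForecast("Ljubljana", fetch); in tests, a fake fetcher returns canned JSON instead.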
Getting into the groove
In my day-to-day I already use Copilot pretty heavily - both for creating commands to run in the terminal and for all things code-related. The major difference from my current way of working is scope: I manually give context and almost always stick to selecting 1-30 lines of code to work on. At most, I have it scaffold a test harness for a small bit of existing code to better understand it.
So the major change for me was the many-changes-at-once approach.
I don’t mind it; I think it’s well integrated and the shortcuts are intuitive. Undo works kind of OK, though I haven’t completely cracked it yet.
What did happen, though, is that the microarchitecture of the code did not match my mental models. As an example, it generated coupled code - it used StackAnswer to test what was supposed to be a generic component. Not terrible, just going against my instincts.
I started creating more prescriptive prompts.
Remove references to StackAnswer.svelte to make the CardStack.svelte component more generic
Not having to ClickOps the context together is nice here as well - it searches for and includes the mentioned files automatically.
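The coupling is easier to see outside of Svelte. Below is a minimal sketch, assuming the stack’s logic can be separated from its markup: a hypothetical CardStackState class that only knows it holds items of some type T and records swipes. The class, SwipeDirection and Swipe are illustrative names, not the code in the repo.

```typescript
type SwipeDirection = "left" | "right";

interface Swipe<T> {
  item: T;
  direction: SwipeDirection;
}

// Generic stack logic: it knows nothing about StackAnswer or any other card type.
class CardStackState<T> {
  private readonly items: T[];
  private readonly history: Swipe<T>[] = [];

  constructor(items: readonly T[]) {
    // Copy the input so callers cannot mutate the stack from the outside.
    this.items = [...items];
  }

  // The card currently on top, or undefined when the stack is empty.
  top(): T | undefined {
    return this.items[0];
  }

  get remaining(): number {
    return this.items.length;
  }

  // Remove the top card and record which way it was swiped.
  swipe(direction: SwipeDirection): Swipe<T> | undefined {
    if (this.items.length === 0) return undefined;
    const item = this.items.shift() as T;
    const swipe: Swipe<T> = { item, direction };
    this.history.push(swipe);
    return swipe;
  }

  swipes(): readonly Swipe<T>[] {
    return this.history;
  }
}
```

The same class works with StackAnswer data, plain strings or any other card type; the markup for each card would then come from the parent component instead of being hard-coded in the stack.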
Pain points and limitations
I was lucky that I picked a project that was nothing innovative. Tinder-like clones are probably a dime a dozen on GitHub. Moreover, I used a library the model seemed to be trained on (I’m using “the model” loosely here - admittedly I just let Cursor pick whichever models it wanted) PLUS a design system it also seemed to be trained on - https://stackoverflow.design.
Use stackoverflow design system (stacks) to add a header
It spewed out valid header code that I would have used at my job (thanks to parts of it being open source and used on https://stackoverflow.com, yay). It also spewed out the SVG for our logo.
Svelte 5
I wanted to use what we use at my job - the latest Svelte 5. That’s where things broke down, and I had to prod it or fall back to resolving the issues that popped up manually. There is still some legacy code in SwipableCard.svelte. I also wasn’t able to properly fix the reactivity of the CardStack.svelte component - remember when I mentioned earlier that it was coupled to StackAnswer? I wanted to use other card types, something that should be easily achievable through a <slot> (or a snippet, its Svelte 5 replacement). The LLM could not solve that problem for me.
Actually, because I’m no Svelte expert, I was also not able to solve that problem using the docs or https://stackoverflow.com in a timely manner. Follow-up: use @docs!
Tests
I have multiple ways of working in my day-to-day, sometimes using tests as documentation and other times using them to drive the shape of my code and development. Most importantly, I use tests to get feedback as fast as possible. So I set out to make my solution testable from the get-go.
Due to the nature of some components (SwipableCard relies on many user interactions), the generated tests were either broken or bad (meaning they used sleep to wait for callbacks/timers to resolve). Having been mostly a backend developer for the last X years, I did not find those issues easy to resolve, and I gave up in several cases - I wanted a finished demo project first and foremost.
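The usual alternative to sleeping in tests is to control time explicitly. Test runners ship fake timers for exactly this (vi.useFakeTimers() in Vitest, jest.useFakeTimers() in Jest); the dependency-free sketch below shows the same idea by injecting a scheduler. SwipeController, its 300 ms animation delay and ManualClock are hypothetical and not taken from SwipableCard.svelte.

```typescript
// Minimal scheduler interface so production code and tests can supply different clocks.
interface Scheduler {
  setTimeout(callback: () => void, ms: number): void;
}

// Hypothetical: a swipe only counts once its exit animation has finished.
class SwipeController {
  readonly committed: string[] = [];
  private readonly scheduler: Scheduler;
  private readonly animationMs: number;

  constructor(scheduler: Scheduler, animationMs = 300) {
    this.scheduler = scheduler;
    this.animationMs = animationMs;
  }

  swipe(cardId: string): void {
    this.scheduler.setTimeout(() => this.committed.push(cardId), this.animationMs);
  }
}

// Test clock: time only moves when the test advances it, so nothing ever sleeps.
class ManualClock implements Scheduler {
  private now = 0;
  private pending: { at: number; callback: () => void }[] = [];

  setTimeout(callback: () => void, ms: number): void {
    this.pending.push({ at: this.now + ms, callback });
  }

  // Move time forward and run every callback that is now due, in order.
  advance(ms: number): void {
    this.now += ms;
    const due = this.pending.filter((t) => t.at <= this.now).sort((a, b) => a.at - b.at);
    this.pending = this.pending.filter((t) => t.at > this.now);
    due.forEach((t) => t.callback());
  }
}
```

In the browser the controller would get a thin wrapper around the real timer, e.g. new SwipeController({ setTimeout: (cb, ms) => { window.setTimeout(cb, ms); } }). A test calls clock.advance(300) where the generated tests slept, so it runs instantly and cannot flake on a slow CI machine.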
Styling and attention to detail
Here is where pre-prompting would be useful. I would always want to use semantic classes from https://stackoverflow.design, but the generated code defaulted to new, custom classes. It was also very easy to end up in a loop of increasingly complex DOM and styles if I simply kept asking the LLM to do more and more.
I learned to stop and actually review the code as I went. I don’t want to shift that burden onto the PR reviewer down the line.
There were things the LLM was not able to solve for me - at least with my prompts and the way I used it. The one that stood out was forcing the viewport to span 100% of the screen while keeping a fixed height for the bottom buttons. It mostly worked - but not on my phone.
Admittedly, the LLM did generate a correct hint for me that led me to a few targeted searches and a Stack Overflow answer that solved my issue.
Conclusion
The verdict is at the top. The code is deployed at https://gcd.si/stack-or-llm if you want to give it a go.
Next on my TODO:
- Try pre-prompting
- Try using @docs to upgrade from Svelte 4 concepts - it’s clear that the LLMs are weaker on concepts they were not trained on
- Try using it in a larger codebase, with a language and framework I have more experience in