The Monte-Carlo method of coding: Fitness functions for codebases

AlphaPhoenix (Dr. Brian Haidet) has a lovely YouTube channel that I follow. His Algorithmic Redesign video has a great analogy for solving problems with randomness and inherent drift - how to untangle a pair of headphones.

Always Tangled When the Skype Rings by Quinn Dombrowski, via Wikimedia Commons (CC BY 2.0).

Paraphrased:

  1. Jiggle the headphones - "boil" the thing, i.e. apply temperature (randomness)
  2. Gently pull them apart - apply a driving force (a fitness function, a forcing function)

And that’s a great analogy for my day-to-day - how do I ensure the codebases in my domain remain healthy, at scale, without micro-managing? Every new feature request introduces new changes (new randomness). The answer: by setting good feedback loops, of course!
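The jiggle-and-pull loop is essentially simulated annealing: random perturbations, a fitness function that pulls toward better states, and a temperature that cools over time. A minimal sketch in Python (the "tangle" here is a toy scalar to minimize, purely illustrative):

```python
import math
import random

def anneal(score, neighbor, x0, temp=1.0, cooling=0.995, steps=2000, seed=42):
    """Minimize `score` by jiggling (random neighbors) while the
    fitness function `score` supplies the driving force."""
    rng = random.Random(seed)
    x, best = x0, x0
    for _ in range(steps):
        candidate = neighbor(x, rng)  # jiggle: a small random perturbation
        delta = score(candidate) - score(x)
        # pull: always accept improvements; accept regressions with
        # probability exp(-delta / temp), which shrinks as temp cools
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            x = candidate
        if score(x) < score(best):
            best = x
        temp *= cooling  # cool down so late-stage jiggles settle
    return best

# Toy "tangle": distance from 3.0; neighbors jiggle by a small random step.
result = anneal(score=lambda x: (x - 3.0) ** 2,
                neighbor=lambda x, rng: x + rng.uniform(-0.5, 0.5),
                x0=-10.0)
print(round(result, 1))
```

Swap the toy scalar for any codebase health metric and the loop reads the same: incoming changes are the jiggle, the fitness function is the pull.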

Reinventing the wheel

I feel strongly that the incoming changes caused by LLMs benefit from the same good old engineering practices. There is the canonical “No Silver Bullet” (1986) paper, but more recently, similar concepts were covered in Clean Architecture (2017) and Building Evolutionary Architectures (2022). The best-performing LLM coding tools (citation needed) have strong feedback loops built into their harnesses.

Tools to (re)evaluate

The point of this rambling is to go beyond unit testing and tackle the old problem of codebase evolution and drift without relying on gut feel. Maybe it's a question of engineering maturity. Here is a shortlist of tools I believe can provide that forcing function and keep the codebase from grinding to a halt - prevent our earbuds from becoming tangled in the first place, so we can put them in our pocket without worrying.

  • SonarQube and other static analysis tools, like Dependabot, are a great first stop. They include checks on some of the softer measures that contribute to maintainability. See their Clean Code page.
  • ArchUnitNET and friends let you codify high-level rules about how the architecture of the codebase should look. Think - these classes should not depend on those.
  • (tangential) Deployment tracking - while not as close to the codebase, it still provides important signals and quality gates
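For flavor, here is a hand-rolled, ArchUnit-style dependency rule sketched in Python (this is not the ArchUnitNET API, and the `myapp.ui` / `myapp.db` module names are hypothetical): fail the build if a UI module imports from the DB layer.

```python
import ast

def forbidden_imports(source, module_prefix="myapp.db"):
    """Return names imported from `module_prefix` in the given source.
    An empty result means the architecture rule holds."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # `import myapp.db.session` style
            hits += [a.name for a in node.names if a.name.startswith(module_prefix)]
        elif isinstance(node, ast.ImportFrom) and node.module:
            # `from myapp.db.session import X` style
            if node.module.startswith(module_prefix):
                hits.append(node.module)
    return hits

# Hypothetical UI module that violates the rule by reaching into the DB layer
ui_source = """
import myapp.db.session
from myapp.ui.widgets import Button
"""
violations = forbidden_imports(ui_source)
print("rule violated by:", violations)
```

Run over every file under `ui/` in CI, this becomes exactly the kind of automated forcing function the tools above provide out of the box.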

We have plenty of existing forcing functions to pick from and add to our automations to scale ourselves (or maybe even automate ourselves out of a job?). I’ll look for opportunities to have Cursor and others take some of these higher-level metrics into account and see how the code evolves. Reusing things that were well tested in the past seems like a powerful shortcut.

Back to home