On Legacy

Wenqi He · Research Software Engineer, NCSA, University of Illinois

As software people, we are in the idea production business, and a codebase is a body of knowledge. It always starts from a vague and seemingly simple idea and eventually grows into a monstrosity that nobody understands. This is the real meaning of the softness of software: not that it is easy to change, but that it accumulates, and there is a limit to how much it can accumulate before it becomes so brittle that a minor fall can shatter every bone in the body.

We call this “legacy software,” and we tend to think of it as a matter of age. But legacy has nothing to do with when code was written, and everything to do with why. If an addition to a codebase contributes to the reader’s understanding of the problem domain, it is not legacy, and if it confuses rather than elucidates, it is legacy regardless of when it was written. A line of code written a hundred years ago that makes you go “I see” is not legacy. A line of code written a second ago that makes you go “huh?” is. Legacy is a cognitive condition, not a temporal one.

And cognitive conditions can accelerate. There is a positive feedback loop at the heart of the legacy problem: pretending to understand produces confusion, confusion produces confusing software, and confusing software necessitates further pretending to understand, which produces more confusion, which settles into the software, and the loop continues. Each iteration moves the codebase further from comprehensibility while the confidence to say “I do not understand this” erodes, because the whole culture around you is pretending, and admitting incomprehension looks like weakness when everyone else is performing certainty. AI-assisted development does not introduce this loop but spins it faster, so that what used to take years of accumulated confusion can now be produced in an afternoon. That is the defining pathology of software development in our moment: accelerated aging.

The symptom everyone recognizes is brittleness: you change one thing and ten others break. Its cause is not complexity per se, not size, not the number of contributors, but the missing pieces in understanding. To fill in the missing pieces is what we call “rigor.” It is only when you can make an unassailable argument for every part of the system that you can claim to truly understand it, and it is only then that changing one thing does not break ten others, because you can see exactly what everything depends on and why. Code can be accidentally correct from a false or incomplete understanding of the problem, but that correctness does not survive perturbation, because the missing pieces are still there underneath it, and any change that touches them will find them. Correctness of ideas is what makes correctness of code stable, and the rejection of rigor is therefore not an aesthetic preference but the rejection of comprehension itself, which is, in the most literal sense, the production of confusion.
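A small illustration of accidental correctness, offered as a sketch rather than a claim about any real codebase: the function names below are hypothetical, and the leap-year rule is chosen only because it is familiar. The first function encodes an incomplete idea ("every fourth year is a leap year") and happens to be right for every year from 1901 through 2099. The second encodes the full Gregorian rule.

```python
def is_leap_naive(year: int) -> bool:
    # Hypothetical example: written from an incomplete understanding,
    # "every fourth year is a leap year."
    return year % 4 == 0


def is_leap(year: int) -> bool:
    # The Gregorian rule: divisible by 4, except century years,
    # unless the century year is also divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Inside 1901..2099 the two functions agree on every year, because the only
# century year in that range (2000) is divisible by 400.
agree_in_window = all(is_leap_naive(y) == is_leap(y) for y in range(1901, 2100))

# Perturb the input range and the missing piece surfaces.
disagreements = [y for y in range(1800, 2201) if is_leap_naive(y) != is_leap(y)]

print(agree_in_window)  # True
print(disagreements)    # [1800, 1900, 2100, 2200]
```

No test confined to the window in which the naive version happens to work can tell the two apart. Only the idea behind the code does.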

This is why the comparison to mathematics is so instructive, and so damning. Mathematics is not, as it is commonly taught and commonly understood, a discipline of numbers and equations, but rather the discipline of understanding formal structures, and its defining cultural commitment is to never pretend to understand, which means you cannot get away with a gap in a proof. A group is a group and a ring is a ring, and there is no escape hatch, no edge case where you have to reach below the abstraction to reason about an object you have previously treated without knowledge of its underlying details, and so a mathematician can prove a theorem and forget the proof, because the abstraction holds without their continued attention. We cannot build a system and forget how it works, because software abstractions are never that tight and there is always an edge case, always a failure mode that lives below the contract. That is not a fact about computers but about a culture that has drifted so far from the mathematical spirit that it keeps repeating, as a point of pride, that you do not need math to write code. The claim is dangerous precisely because what mathematics actually offers is not its content but its pattern of thinking, and that pattern of thinking is exactly what software development needs and does not have.

Why does software lack it? The honest answer is that the culture selects against it. Silicon Valley did not invent this tendency, but it perfected it by making the performance of understanding more valuable than understanding itself, since the demo, the pitch, the launch, the sprint review are not occasions for exposition but for persuasion, and for persuasion you do not need rigor but showmanship, so that the discipline demands not a musing philosopher but a quick-witted talk show host, and that is what it produces, generation after generation, until the gap between what is understood and what is claimed to be understood becomes the air that everyone breathes and almost nobody can see.

The gap is also institutional. Computer science programs treat mathematics as a service course: discrete math is nominally required but underemphasized and poorly integrated into the rest of the curriculum, and the parts of the field that come closest to the mathematical spirit (formal methods, type theory, program verification) are treated as advanced electives for the theoretically inclined, which in practice means that unless you already think that way, you will never encounter them. Mathematics programs treat programming as a supplemental practical skill, a concession to employability, and never quite ask what it would mean to bring the mathematical habit of mind fully to bear on the act of programming itself. In the end, nobody is institutionally trained to think about software the way you would need to think about it to resist the pathology. What is needed, and does not exist, is a software curriculum taught from within the mathematical tradition, as a liberal arts degree concerned not with the production of software but with understanding.

So what would it look like to take the other path? The primary deliverable of a pull request would be increased clarity and collective understanding of the problem domain, not the accumulation of functioning code. Reading such a pull request would help the reader understand the problem better, so it would be a pleasure and not a chore. And code review would be the verification of intellectual coherence and integrity, not a judgment of behavioral correctness and apparent plausibility.

How do you irrigate a field with a cup of water? You do not pour what little you have in the cup, because the solution is never in the cup. What you need is a river, one that never stops flowing, and if you hold your cup in it, it fills to the brim so that you cannot carry it without spilling to the ground, and it is the river that irrigates the field, not the cup. The cup never mattered.