Deterministic Programming with LLMs
Introduction
If, like me, you regularly read programming blogs (and you do, or you wouldn't be seeing these words), then you are well aware that our industry is deep in the throes of a drastic change: more than half of those blog articles are currently debating the best way for our industry to adapt to the advent of LLMs (large language models) that are capable of writing code.
Much has been said on the ethics of LLM coding, on the best approaches for using it, and on how to use AI agents effectively. In this essay, I would like to make one contribution to that discussion: a look at how LLMs can be used in a deterministic way. I am not suggesting this is the only way to use them, but I do think it is worth considering as one possible tool.
Mathematical Proof and LLMs
Before I look at my own industry, I want to examine what people are doing in a different field that is also affected by LLMs and AI, and try to learn something from it. Mathematicians have been confronting questions about how to use AI, and what they have done is illuminating.
LLMs are actually quite capable at writing things that look like mathematical proofs. In September 2024, Terence Tao (a Fields-medal-winning mathematician who has been active in exploring the use of AI in math) wrote that supervising an LLM was like "trying to advise a mediocre, but not completely incompetent, (static simulation of a) graduate student." Very few human beings are capable of acting like even a mediocre graduate student in math, and the LLMs have only gotten better in the past year-and-a-half.
But ultimately, LLMs are a device for producing what looks very much like other documents they have seen, and they are subject to hallucinations. This is especially dangerous in math proofs. Proofs often hinge on very subtle distinctions, and given a plausible-sounding argument it is easy to fall into the trap of believing it. So it is not safe to accept LLM-written proofs as correct, nor is it particularly safe to ask mathematicians to read an LLM-written proof and expect them to catch every error.
So mathematicians have turned to another computer-based tool: Lean (and other proof systems). In principle, mathematical proofs take starting axioms and definitions, apply logical inferences, and thereby reach a proven conclusion. Actual mathematicians don't generally operate at this level; if they did, then even a simple 5-page proof would run to thousands of pages, with too many steps for any human to follow and little hope of gaining insight from reading it. But computers can follow steps at that scale. Lean and similar proof systems produce just such a rigorous step-by-step proof, yet they are not widely used by professional mathematicians, because writing proofs in Lean is genuinely challenging.
You can see where this is going. In January of 2026, a team of people successfully got an LLM to create a proof of a meaningfully difficult problem never previously solved by humans. The problem was one of a large set of problems posed by Paul Erdős, but by accident this one had been written down incorrectly, which was only discovered a few months earlier.
The approach had a few steps: first, an LLM (ChatGPT) was asked to create an outline of the proof. (This was an essential step, but — as predicted above — it turned out that the argument actually had a few flaws.) Then another AI tool, Aristotle, was able to patch up the logical flaws in the argument and express the proof within Lean, so it could be verified. Finally, ChatGPT was used again to read the Lean proof and write it in the format of a published math proof, complete with references to existing literature.
Deterministic Tools in Programming
Let's get back to my own industry of software development, also known as "programming". Today we have agents like Claude Code or Gemini Code Assist that are capable of "vibecoding" moderately complex applications with minimal supervision. They are, arguably, at the level of a mediocre junior developer. The existence of these tools is changing nearly everything in our industry at a shocking pace as we try to figure out what software development will look like after the invention of these coding agents. I want to talk specifically about LLMs and determinism.
It is widely agreed that automated deployment scripts are better than deploying manually. Writing a deployment script takes longer than doing a single deployment by hand, but over many deployments the script will pay back that time spent, so scripted deployments save time (in the long run). But that's not the most important reason to use automated deployments.
What is more important is that automated deployments are more reliable. Every time a deployment is done by hand there is a chance of human error. The script may be hard to get right in the first place, but it will consistently give the same results every time it is run. Computer programs are incredibly good at being deterministic, at producing the exact same result every time. Humans are less so.
And in this respect, LLMs lie somewhere between humans and computer programs. Unlike humans, LLMs don't get bored or tired or impatient. But like humans — and unlike computer programs — they do not produce the exact same results every time they are used. This is fundamental to the way that LLMs operate: based on the "weights" derived from their training data, they calculate the likelihood of possible next words to output, then randomly select one (in proportion to its likelihood). This produces results that are based on the sum total of the training data (most of human knowledge for the frontier models) but which varies slightly each time it is used.
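To make that sampling step concrete, here is a minimal Python sketch of weighted next-token selection. The token scores and the temperature parameter are illustrative inventions, not taken from any real model; real LLMs do the same kind of weighted random draw over tens of thousands of tokens.

```python
import math
import random

def sample_next_token(scores: dict[str, float], temperature: float = 1.0) -> str:
    """Pick the next token at random, weighted by the model's scores.

    `scores` maps candidate tokens to raw scores; a softmax turns them
    into probabilities, and random.choices samples in proportion.
    """
    scaled = {tok: s / temperature for tok, s in scores.items()}
    max_s = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - max_s) for tok, s in scaled.items()}
    total = sum(weights.values())
    probs = [w / total for w in weights.values()]
    return random.choices(list(weights.keys()), weights=probs, k=1)[0]

# Two calls with the exact same scores can return different tokens:
scores = {"deploy": 2.0, "delete": 0.5, "debug": 1.0}
print(sample_next_token(scores))
print(sample_next_token(scores))
```

Because the selection is random (in proportion to likelihood), running the same prompt twice can produce different outputs, which is exactly the non-determinism the rest of this essay is concerned with.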
Asking an LLM to do deployments would be easier than writing and debugging a script, and it would probably execute nearly as quickly as one. But it would be a poor way to do deployments because it would not be deterministic: it might work 99 times out of 100, but still fail occasionally.
When Determinism is Needed
Some tasks in programming are done just once, and others are done over and over again. Determinism is important for the tasks that are done repeatedly. The simplest kind of work done just once is a one-off task: migrate the data from system A to system B, import the data from the spreadsheet, generate charts for the presentation, anything that you will do once and then never need again. There is no need for determinism to guarantee the job will be done identically every time if we only plan to do it once.
Nearly all of the code we create is written once in order to run many times. When we create a login service for a web application, we only have to write the service once, and we expect it will be used for every customer every time they log in. In this sense, ordinary development is "write once, use many times". The login itself needs to be deterministic, but the process of writing it doesn't. But there IS a more complex kind of "work done over and over": maintaining standards within the codebase.
A good example of this would be protecting against injection attacks. Before using a user-supplied string within something like an SQL query, an HTML page, or a command line argument it is essential that the string be properly escaped. And this needs to be done EVERY time that you insert a string. "Protect against injection attacks by escaping user-supplied strings before using them" is not a one-and-done task, it is one that needs to be done over and over.
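As a concrete illustration of the SQL case, here is a minimal Python sketch using the standard-library sqlite3 module. A parameterized query (the usual way to get escaping right) neutralizes a classic injection payload; the table and the payload are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

user_input = "alice' OR '1'='1"  # a classic injection payload

# Unsafe: string interpolation would let the payload rewrite the query:
#   query = f"SELECT * FROM users WHERE name = '{user_input}'"

# Safe: a parameterized query treats the input purely as data.
rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", (user_input,)
).fetchall()
print(rows)  # prints [] because the payload matches no user
```

The point is not this one query; it is that every single query in the codebase must be written this way, every time.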
We know (from decades of experience trying to protect against injection attacks) that humans (even experienced developers) aren't reliable enough to get this right 100% of the time. And unfortunately, LLMs also cannot be considered reliable enough. We can include sample code that properly escapes input strings, we can insert a reminder about injection attacks into AGENTS.md or create a Claude Skill specifying how to escape strings, but because of the stochastic nature of LLMs, that will never give us deterministic confidence that ALL of our strings will be properly sanitized. Better prompting cannot change the fundamental nature of LLMs. And there are many practices that require consistent behavior: following naming conventions, ensuring that every log message includes a stack trace, closing every file in a finally block. There are lots of global practices we want to enforce in our code.
The Solution is Code-Checking Code
Even the best humans are not fully deterministic, and over decades the software industry has invented techniques for enforcing policies when we want them to be universal. Since LLMs share the same limitation, we can use the same solution! Unlike humans and LLMs, programs are extremely deterministic, so I recommend relying on them when we need consistent, reliable behavior.
There are a number of ways to do this. We could encode the policy into the type system: if we had two different types, "UserString" and "SanitizedString" we could get the compiler to enforce the requirement that UserStrings must be sanitized when combining them into a SanitizedString. Or, we could write a "lint" to enforce the use of our naming conventions or to prefer our new logging framework rather than the one we are slowly replacing. We could write a unit test which would scan the code to ensure that only approved libraries are used. And because linters, tests, and compiler-enforced policies are run every single time the code is built, there would be no risk that an LLM or a human programmer would accidentally miss a case.
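Here is a minimal sketch of the "UserString"/"SanitizedString" idea in Python, using typing.NewType so that a type checker such as mypy (rather than the runtime) enforces the policy. The names come from the paragraph above; html.escape stands in for whatever sanitizer your context actually requires.

```python
from typing import NewType
import html

UserString = NewType("UserString", str)            # raw, untrusted input
SanitizedString = NewType("SanitizedString", str)  # safe to embed in output

def sanitize(raw: UserString) -> SanitizedString:
    # By convention, this is the ONLY place a SanitizedString is created,
    # so the type checker forces all user input through it.
    return SanitizedString(html.escape(raw))

def render_greeting(name: SanitizedString) -> str:
    return f"<p>Hello, {name}!</p>"

raw = UserString("<script>alert('hi')</script>")
print(render_greeting(sanitize(raw)))  # the markup arrives escaped
# render_greeting(raw)  # a type checker such as mypy rejects this line
```

A static type checker run as part of the build turns "remember to sanitize" from a habit into a compile-time guarantee, which is exactly the determinism we are after.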
Creating code aids like this is great for determinism, but it requires extra work. Writing new lint rules or transforming the code to use the NewType pattern may not require a lot of creativity, but it does take a bit of time. And that is where the LLM comes in handy, because agentic programming LLMs are very good at creating exactly this kind of tool. When consistency is important, instead of asking your LLM to follow rules each time, ask your LLM to build a program to enforce the rules, and incorporate it into your build chain.
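For instance, a rule like "prefer our new logging framework" can be enforced by a small test that scans the source tree on every build. This sketch assumes a hypothetical legacy module named oldlog being replaced by a hypothetical newlog; both names are made up for the example.

```python
import pathlib
import re

# Policy (hypothetical): the legacy `oldlog` module is banned in favor
# of `newlog`. Because this test runs on every build, no file that
# imports the old module can slip through unnoticed.
BANNED = re.compile(r"^\s*(import|from)\s+oldlog\b", re.MULTILINE)

def find_violations(root: str = "src") -> list[str]:
    """Return the paths of all Python files that import the banned module."""
    return [
        str(path)
        for path in pathlib.Path(root).rglob("*.py")
        if BANNED.search(path.read_text(encoding="utf-8"))
    ]

def test_no_legacy_logging():
    assert find_violations() == [], "use newlog, not oldlog"
```

Writing a check like this is a one-time task (a fine job for an LLM, reviewed once by a human); enforcing the policy afterward is done deterministically by the test runner.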
This recommendation applies whenever there is a policy, something that needs to be done consistently at many places within the code base. When something only needs to be done once (whether it is creating the login screen or writing a lint to enforce some policy), asking the LLM to write it is reasonable. (Just what level of human scrutiny it needs afterward is a topic of intense debate. I review every line of code before committing it; I know not everyone does so.)
Summary (In Case You Skipped the Rest)
LLMs may not get bored like human programmers, but they aren't fully deterministic. Whenever there is a standard practice or policy that needs to be followed every time a certain type of code is written, LLM coding agents might not get it right 100% of the time. You can prevent that by writing lints, tests, or other non-stochastic programs to verify that the policy is being followed and incorporating those into your build process; an LLM agent can help you write them.
Posted Tue 24 February 2026 by mcherm in Programming