Preamble
Me, at the end of a much longer thread:
Should LLMs be used in workflows where accurate data representation is paramount?
ChatGPT 5.2's answer (high reasoning):
Yes—but only if the workflow is designed so the LLM is not the final authority and can’t silently introduce errors.
- If “accurate data representation is paramount” means errors are unacceptable, don’t use an LLM as the source of record, calculator, classifier-of-last-resort, or sole extractor.
- Use LLMs safely in such workflows mainly as assistive components: propose candidates, draft summaries, map fields, generate queries/code, explain anomalies—then require verification.
- The key requirement is independent checks: ground to authoritative data (retrieval), enforce schemas, run deterministic validation, reconcile against sources, require citations/quotes where possible, and add human review for anything that can’t be automatically verified.
- If you can’t add those guardrails (or can’t tolerate residual error), LLMs are the wrong tool for that part of the workflow.
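To make the "enforce schemas, run deterministic validation" point concrete, here is a minimal sketch in Python. The record layout and rules (invoice fields, an "INV-" prefix) are invented for illustration; the point is that the LLM's output is treated as an untrusted proposal, and plain code decides whether it passes or goes to human review.

```python
# A minimal sketch of deterministic validation of LLM-extracted data.
# The field names and rules are hypothetical, not from any real pipeline.

from datetime import date

def validate_invoice_record(record: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the record passes."""
    errors = []
    # Schema check: exactly the expected fields, all present.
    expected = {"invoice_id", "amount_eur", "issue_date"}
    if set(record) != expected:
        errors.append(f"unexpected fields: {sorted(set(record) ^ expected)}")
        return errors
    # Deterministic rules a human reviewer would otherwise have to apply.
    if not str(record["invoice_id"]).startswith("INV-"):
        errors.append("invoice_id must start with 'INV-'")
    if not (isinstance(record["amount_eur"], (int, float)) and record["amount_eur"] > 0):
        errors.append("amount_eur must be a positive number")
    try:
        date.fromisoformat(record["issue_date"])
    except (TypeError, ValueError):
        errors.append("issue_date must be an ISO date (YYYY-MM-DD)")
    return errors

good = {"invoice_id": "INV-001", "amount_eur": 99.5, "issue_date": "2026-01-15"}
bad = {"invoice_id": "001", "amount_eur": -3, "issue_date": "someday"}
print(validate_invoice_record(good))  # []
print(validate_invoice_record(bad))   # three errors, routed to human review
```

Anything that fails goes to a human instead of the system of record; the LLM never becomes the final authority.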
Parrots with hands
In a recent post on his blog, Andrea Amani (aka The Pirate) deconstructed the inner workings of LLMs and showed how they are nowhere near intelligent, being instead just stochastic parrots. However, he also said the following (translated from the Italian):
"The hyena (*) doesn't understand why companies keep asking for AI systems. To her it's a fad, something that gets APIs wrong and that will never replace anyone. She can't explain why everyone wants agents, automations, chatbots. The answer is simple: because in 2026 there are still people entering data by hand into management software, passing Excel sheets around as if they were cutting-edge technology, automating nothing. And this is absurd. It's absurd to sit on mountains of data and not use them to predict, to optimize, to eliminate the repetitive work that an agent would do in seconds. Those six Parrhesepstein agents, each one a weighted die, lined up with the right documents in front of them, the right rules in the system prompt, and automatic verification at the end, produce reports that a single human researcher would take years to compile. Not because they are intelligent. Because they are fast, tireless, and have the data right under their noses. The parrot with hands. And the parrot with hands, in 2026, should be in every company. Not to replace people, but to stop making them work as if we were in the Stone Age."
(*) It seems that Andrea lovingly refers to his wife as "the hyena".
While I wholeheartedly agree with the sentiment, and I am spending a good part of my working hours trying to automate away reporting, I am not as positive about the use of stochastic parrots, at least not in an entirely automated fashion (and I think he would agree as well). Let me explain.
Critical VS non-critical (or casual) workflows
In every business, and often in life in general, there comes a time when one has to deal with numbers and dive into the "scary" world of accounting (thunder rumbles, the screen shakes, and spooky chords play). Accounting is not something to be taken lightly because it needs to be as accurate as possible: wrong numbers or categorizations may give a false understanding of the situation, and may also put you or your company in legal trouble.
Accounting, because of its nature, is something that I consider a critical workflow. Critical workflows can be automated, however they should always undergo non-automated review.
Other kinds of workflows, say renaming a personal audio-video collection of files to satisfy certain naming requirements, are what I would consider non-critical workflows. If something goes wrong, the area of impact is limited to the owner of the collection, and it is not a big problem if errors are corrected at a later time instead of being caught in the moment.
In my view, non-critical workflows are the kind of stuff that is perfect for full automation. It is important to understand, however, that whether a workflow should be considered critical depends on its owner: one person might be OK with having a few tens of cents off in their personal accounting, another might not. What does not change is that all critical workflows need the utmost care in verifying the output.
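As an illustration, the renaming workflow above can be fully automated with a few lines of deterministic code, no model required. The naming convention below is an invented example, and the sketch only plans the renames, so the owner can still review them before applying anything.

```python
# Deterministic renaming sketch: a fixed rule plus a dry-run "plan" step.
# The lowercase-and-dots convention is an invented example.

import re

def normalized_name(filename: str) -> str:
    """Lowercase, spaces to dots: 'My Show S01E02.mkv' -> 'my.show.s01e02.mkv'."""
    stem, dot, ext = filename.rpartition(".")
    name = stem if dot else ext
    name = re.sub(r"\s+", ".", name.strip().lower())
    return f"{name}.{ext.lower()}" if dot else name

def plan_renames(filenames):
    """Return (old, new) pairs for files that would change; review before applying."""
    return [(f, normalized_name(f)) for f in filenames if normalized_name(f) != f]

print(plan_renames(["My Show S01E02.mkv", "already.fine.mkv"]))
# [('My Show S01E02.mkv', 'my.show.s01e02.mkv')]
```

The same rule produces the same result on every run, and any mistake in the rule is corrected once, in code, rather than showing up randomly in some fraction of the outputs.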
Deterministic VS non-deterministic automation
Automation is nothing new; in fact, it is even older than computers. The goal has always been the same: do repetitive work either faster, more precisely (with a higher degree of repeatability and lower variance), or both. For example, electronic calculators will always be faster and more precise than humans, as long as they don't break and are operated within the specification of the system. Also, if they "fail" (if they encounter a condition outside of their specification), they will fail predictably.
This is known as determinism, and it is absolutely fundamental to most automation. I am quite sure that nobody would accept a calculator that, when asked to compute 1+1, has even the smallest chance of returning anything but 2.
However, somehow, these days there is a very worrying trend: using models that are fundamentally non-deterministic in nature (e.g. LLMs), and that amount to weighted dice rolls, to perform tasks that could just as easily be automated deterministically. Why anyone would leave to chance that which can be done repeatably is beyond my comprehension. Not only that, but most of the workflows that can be solved with LLMs or agents can be solved more quickly, efficiently, and cost-effectively with traditional deterministic techniques.
It is infinitely better to use language models to help generate deterministic systems for workflow management, instead of using them directly.
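For instance, instead of calling an LLM on every document to pull out dates and amounts, one can have it help write (once, under review) a deterministic extractor that then behaves identically on every run. The patterns below are illustrative, not taken from any real system:

```python
# A deterministic extractor of the kind an LLM can help *write* once,
# rather than being called on every document. Patterns are illustrative.

import re

DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")          # ISO dates
AMOUNT_RE = re.compile(r"\bEUR\s?(\d+(?:\.\d{2})?)\b")  # amounts like 'EUR 99.50'

def extract_fields(text: str) -> dict:
    """Pull dates and amounts out of free text with fixed, auditable rules."""
    return {
        "dates": DATE_RE.findall(text),
        "amounts": [float(m) for m in AMOUNT_RE.findall(text)],
    }

sample = "Invoice dated 2026-01-15, total EUR 99.50, due 2026-02-15."
print(extract_fields(sample))
# {'dates': ['2026-01-15', '2026-02-15'], 'amounts': [99.5]}
```

If the extractor misses a case, the fix is a reviewable change to a regex, not a prompt tweak whose effect on other inputs nobody can predict.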
The one reason to use stochastic parrots
There is really one good reason to use stochastic parrots: tasks that require natural language processing, such as the Parrhesepstein application built by Andrea. However, notice how even he knows that the produced output needs to be double-checked, and assigns a separate agent just to that task. That agent produces a score indicating how well a given output from the previous stages of the "agentic pipeline" (note: I truly hate the word "agentic") is grounded in the source material.
Here's my issue with the final step: it is still a weighted die, and it might spit out a score that does not actually reflect how well the output is grounded in actual sources and references. And the worst part is that you will never be able to tell, unless you check the results yourself, one by one.
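Part of that final step can often be replaced with a check that cannot roll the dice at all: require the pipeline to emit verbatim quotes alongside its claims, then verify each quote against the source text deterministically. A simplified sketch (real documents would need more normalization, e.g. unicode handling):

```python
# Deterministic grounding check: a claim counts as verified only if its
# supporting quote occurs verbatim in the source. Claim layout is invented.

def grounded(claims: list[dict], source: str) -> list[dict]:
    """Flag each claim by whether its supporting quote really occurs in the source."""
    normalized_source = " ".join(source.split())  # collapse whitespace
    results = []
    for claim in claims:
        quote = " ".join(claim["quote"].split())
        results.append({**claim, "verified": quote in normalized_source})
    return results

source = "The plant opened in 1987. It employs 300 people."
claims = [
    {"text": "Opened in 1987", "quote": "The plant opened in 1987."},
    {"text": "Employs 3000 people", "quote": "It employs 3000 people."},
]
for r in grounded(claims, source):
    print(r["text"], "->", r["verified"])
# Opened in 1987 -> True
# Employs 3000 people -> False
```

This catches the fabricated quote every single time, which no judge-LLM can promise.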
Now, what I described above might not be an issue in many cases. In the case of Parrhesepstein, it probably doesn't matter much. However, imagine if a journalist trusted the contents of Parrhesepstein as-is, without due diligence, and published an article based on what a stochastic parrot had deemed to be well-grounded output of another stochastic parrot... which then turns out not to be. Don't you think that, all of a sudden (not really: it is perfectly foreseeable), the problem becomes much bigger?
And that is the crux of the question: most people do not understand LLMs for what they are, thus are unaware of the possible pitfalls, and on top of that Sam Altman, Dario Amodei and co. keep on lying about the capabilities of these systems. Because if they don't, next year they might be bankrupt.
A final anecdote and a small token of love for spreadsheets
Recently I was using OpenAI's Codex with ChatGPT 5.3, and even with access to the web (which was recently introduced). I wanted to write a Notion integration in Python. I had given it as much context as possible, and even pointed it to the necessary documentation. This "genius level at everything intelligence" (directly from the lying mouth of Sam Altman himself) managed to pass the wrong kind of parameters to a certain REST endpoint. Was it able to correct the mistake later? Sure, but that's beside the point. The point is that if LLMs were actually intelligent, and actually better than humans, then I shouldn't have gotten the wrong output in the first place.
Furthermore, the much-maligned spreadsheet still has a very important place even in today's world. Spreadsheets have the distinct advantage of explicitly showing both the data and the process, and are extremely useful for modeling processes on small-scale data. Not only that, but modern spreadsheet software has powerful programming language bindings that allow, among other things, automated ingestion of data from a variety of sources. "Try it and tweak it first in a spreadsheet, then build a more robust production system with the insight you have gained" is a perfectly viable strategy.
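As a small example of that loop, using only the Python standard library: ingest raw records, aggregate them, and write a CSV that any spreadsheet opens directly, keeping both the data and the process visible. The record layout here is invented for illustration.

```python
# Prototype an aggregation in code, export as CSV for spreadsheet inspection.
# The records are made-up sample data.

import csv
import io
from collections import defaultdict

records = [
    ("2026-01", "office", 120.0),
    ("2026-01", "travel", 80.0),
    ("2026-02", "office", 95.0),
]

totals = defaultdict(float)
for month, category, amount in records:
    totals[(month, category)] += amount

# Write to an in-memory buffer here; in practice this would be a .csv file
# that the spreadsheet then opens for eyeballing and tweaking.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["month", "category", "total"])
for (month, category), total in sorted(totals.items()):
    writer.writerow([month, category, total])

print(buf.getvalue())
```

Once the numbers look right in the spreadsheet, the same aggregation logic can be hardened into the production pipeline.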
Understand the tools you use, and understand how things can go wrong.