Blog

Analyse backwards

A scientific approach to data promises to let us take decisions on the basis of empirical evidence. That evidence ought to be objective, and so more reliable than managing our work according to opinions.

But how do we make sure we’ve got the right data? How do we ensure our analytical approach will enable us to take the right decision?

If the data is lacking then we’ll inevitably find ourselves falling back to opinions.

I believe the secret to data-driven decision making is to start at the end…

LLMs aren't databases

People have built up intuitions for computers as we’ve lived with them over the decades. Our mental model is that they behave something like a database. We expect computers to record data for us precisely. Inaccuracy is usually attributed to the human system surrounding the computer.

This intuition is at odds with the reality of our new AI oracles. Large language models don’t store facts like databases. They are prone to hallucination (the tendency to make things up during generation). They contain ambiguity and uncertainty very much like the human systems surrounding them.

This post demonstrates that uncertainty, explains how it arises, and discusses some implications to help you revise your expectations. We conclude with recommendations to address hallucination by returning to the strengths of structured data.

Functional Reproducibility

Have you ever written the perfect program? Did it still run unchanged six months later? Can your colleagues run it without you?

Just because your analysis is executable, it doesn’t mean the results are reproducible.

Data ages. Libraries change. Machines differ. Servers fail. Bits rot.

Entropy is inescapable.

We can learn how to engineer reproducibility by drawing on techniques from functional programming and software development.
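
As a rough illustration of the functional idea, here is a minimal Python sketch. It assumes the analysis can be written as a pure function of its input and that the input is identified by a content hash. The names fingerprint, mean_by_group and run, and the sample data, are hypothetical and chosen only for this example; this is one way to approach the problem, not a complete solution.

```python
import hashlib
import json


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 digest that identifies this exact input."""
    return hashlib.sha256(data).hexdigest()


def mean_by_group(rows: list[dict]) -> dict[str, float]:
    """Pure function: the result depends only on the rows passed in."""
    values: dict[str, list[float]] = {}
    for row in rows:
        values.setdefault(row["group"], []).append(float(row["value"]))
    return {group: sum(vals) / len(vals) for group, vals in sorted(values.items())}


def run(data: bytes) -> dict:
    """Bundle the result with the fingerprint of the input that produced it."""
    rows = json.loads(data)
    return {"input_sha256": fingerprint(data), "result": mean_by_group(rows)}


# Hypothetical input, made up for illustration.
raw = b'[{"group": "a", "value": 1}, {"group": "a", "value": 3}, {"group": "b", "value": 10}]'
first = run(raw)
second = run(raw)
```

Because run reads no clock, no network and no global state, the two calls return identical results, and the recorded hash lets a colleague check that they are starting from the same bytes. Pinning library versions and the machine environment is a separate concern that this sketch does not cover.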

CSV on the Web

CSV on the Web (CSVW) is a standard for describing and clarifying the content of CSV tables.
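
To give a flavour of what such a description looks like, the sketch below builds a small CSVW metadata document in Python and serialises it to JSON. The file name observations.csv and its three columns are hypothetical, invented for this example; the @context, url, tableSchema and columns keys come from the CSVW vocabulary.

```python
import json

# Hypothetical table: "observations.csv" and its columns are made up for illustration.
metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "observations.csv",
    "tableSchema": {
        "columns": [
            {"name": "area", "titles": "Area", "datatype": "string"},
            {"name": "year", "titles": "Year", "datatype": "gYear"},
            {"name": "population", "titles": "Population", "datatype": "integer"},
        ]
    },
}

# Serialise to JSON; by convention this would sit next to the CSV,
# for example as "observations.csv-metadata.json".
metadata_json = json.dumps(metadata, indent=2)
```

With the datatypes declared, a CSVW-aware tool can read the population column as integers rather than guessing from the raw text.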

Robin Gower
