Ben Rees

What a 2019 NLP paper tells us about getting recalled by AI

What a 2019 BERT paper by Petroni et al. reveals about how AI models store and recall facts - and why most brand content is optimised for the wrong thing.

Ben Rees - 6 June 2026

Most brands optimise to be understood by AI. They should be optimising to be recalled by it. Those are not the same problem, and a paper from 2019 explains why.

In "Language Models as Knowledge Bases?" (Petroni et al., 2019), researchers at Facebook AI Research and University College London probed BERT and other pretrained language models to see whether they could function as knowledge bases. Their answer was a qualified yes. The model stored facts as parametric memory, baked in during training rather than looked up at query time. It was a paper about how models remember.

It is also, accidentally, the best guide I have read to AI brand visibility. Here is what it tells us.

1. Being mentioned a lot only helps if you are mentioned in the right slot

The paper found something specific and strange. How often the object of a fact appeared in the training data correlated positively with whether the model recalled that fact. How often the subject appeared did not.

Read that again, because it overturns the obvious strategy.

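A fact has a shape: subject, relation, object. Take (France, capital, Paris), which the paper probes as a fill-in-the-blank statement along the lines of "The capital of France is ___." The model got better at recalling facts whose object (Paris, the word in the blank) appeared often in the training data. How often the subject (France) appeared showed no such correlation.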
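A minimal sketch of that shape, for readers who think better in code. The templates and the Fact class here are illustrative assumptions, not the paper's code. The point is only that the object is the slot the model is asked to fill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str


# Illustrative templates (assumed for this sketch):
# [X] marks the subject, [Y] marks the object slot.
TEMPLATES = {
    "capital": "The capital of [X] is [Y].",
    "place_of_birth": "[X] was born in [Y].",
}


def to_cloze(fact: Fact, mask: str = "[MASK]") -> str:
    """Render a fact as a fill-in-the-blank query with the object hidden."""
    template = TEMPLATES[fact.relation]
    return template.replace("[X]", fact.subject).replace("[Y]", mask)


print(to_cloze(Fact("France", "capital", "Paris")))
# The capital of France is [MASK].
```

Whatever the model is asked to produce sits in the [Y] position. That is the slot the rest of this piece is about.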

For a brand, your name is almost always the subject. "Our platform does X." "We help teams with Y." You are the thing doing the verb. And subject frequency, the paper says, is not what the model remembers you by.

2. You want to be the answer, not the asker

So flip the grammar.

Weak: "We built project management software." Your brand is the subject, the product is the object, and nobody is being recalled.

Stronger: "Project management software is built by [brand]." Now the brand sits in the object slot of a claim the model can store and return.

This is the whole game of GEO (Generative Engine Optimisation), the discipline named by researchers at Princeton, Georgia Tech and the Allen Institute (Aggarwal et al., 2023). The more I test it, the more this 2019 finding holds up. You do not win by talking about yourself more. You win when other people's sentences end in your name.

3. Simple, exclusive facts beat tangled ones

BERT scored 68% (BERT-base) to 74.5% (BERT-large) precision@1 on 1-to-1 relations like "capital of." On N-to-M relations, where one subject maps to many objects and back again, it collapsed to around 24%.

The lesson for positioning is brutal and useful.

If your brand could be the answer to fifty different questions, you are an N-to-M relation, and the model will hedge. If you own one crisp claim, "the tool for expense reporting," you are closer to a 1-to-1 relation, and that is exactly the kind of fact a model returns confidently.

Narrow your claim. Vagueness is not just bad marketing now. It is technically harder to recall.
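To make the 1-to-1 versus N-to-M distinction concrete, here is a rough sketch that classifies a relation from a list of (subject, object) pairs. The brand names and claims are made up for illustration, and the four-way labelling is a simplification of my own rather than the paper's procedure.

```python
from collections import defaultdict


def relation_cardinality(pairs):
    """Classify a relation from its (subject, object) pairs."""
    objects_of = defaultdict(set)
    subjects_of = defaultdict(set)
    for subject, obj in pairs:
        objects_of[subject].add(obj)
        subjects_of[obj].add(subject)

    # Does any subject map to several objects, or any object to several subjects?
    many_objects = any(len(objs) > 1 for objs in objects_of.values())
    many_subjects = any(len(subs) > 1 for subs in subjects_of.values())

    if many_objects and many_subjects:
        return "N-to-M"
    if many_objects:
        return "1-to-N"
    if many_subjects:
        return "N-to-1"
    return "1-to-1"


# Hypothetical data: (claim, brand) pairs.
crisp = [("expense reporting", "BrandA"), ("invoicing", "BrandB")]
tangled = [
    ("expense reporting", "BrandA"),
    ("invoicing", "BrandA"),
    ("expense reporting", "BrandB"),
]

print(relation_cardinality(crisp))    # 1-to-1
print(relation_cardinality(tangled))  # N-to-M
```

In the tangled case one claim has two possible answers and one brand answers two claims. That is the shape where the paper saw precision fall away.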

4. You do not need a database to be quoted

The most quietly radical result: on open-domain questions rewritten as fill-in-the-blank statements, BERT reached 57.1% precision@10, against 63.5% for DrQA, a supervised question-answering system with its own retrieval pipeline. No fine-tuning. No retrieval system. Just what it absorbed during training.

That gap is small, and it has only narrowed since.

It means the model's parametric memory, the stuff it learned from the open web, is competitive with a purpose-built fact store. Your content does not need to be in a special index to be returned. It needs to be in the training data, phrased as a clean fact, in the object slot. That is the whole of the law.
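Precision@k is worth pinning down, because the numbers above depend on it. A prediction scores 1 if the correct answer appears anywhere in the model's top k guesses and 0 otherwise, and the scores are averaged over all questions. A small sketch, using toy predictions that are invented for illustration:

```python
def precision_at_k(ranked_predictions, gold, k):
    """Return 1.0 if the gold answer is among the top k predictions, else 0.0."""
    return 1.0 if gold in ranked_predictions[:k] else 0.0


def mean_precision_at_k(examples, k):
    """Average precision@k over (ranked_predictions, gold) pairs."""
    if not examples:
        raise ValueError("examples must not be empty")
    scores = [precision_at_k(preds, gold, k) for preds, gold in examples]
    return sum(scores) / len(scores)


# Toy data, not from the paper: ranked guesses and the correct answer.
examples = [
    (["Paris", "Lyon", "Marseille"], "Paris"),
    (["Milan", "Rome", "Turin"], "Rome"),
    (["Sydney", "Melbourne", "Perth"], "Canberra"),
]

print(mean_precision_at_k(examples, k=1))
print(mean_precision_at_k(examples, k=2))
```

The score can only rise as k grows, which is why a precision@10 figure says "the right answer is somewhere in the top ten," not "the model's first guess is right."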

5. Confidence is a signal, so earn it

Performance correlated with the log probability of the prediction. Plainly: when BERT was confident, it was usually right.

Confidence comes from consistency. A claim the model has seen phrased the same way, by many sources, in the object slot, is a claim it will state without hedging. A claim it has seen once, phrased oddly, gets a shrug.

So repetition matters, but structured repetition. The same fact, the same shape, across many places.
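One rough way to check how structured your repetition is: collect the sentences that state your claim across different pages, normalise them, and see what share use the same phrasing. This is a crude sketch under my own assumptions. "Acme" is a placeholder brand, and exact-match after normalisation is a deliberately strict test.

```python
import re
from collections import Counter


def normalise(sentence):
    """Lowercase, drop punctuation and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", sentence.lower())
    return " ".join(cleaned.split())


def phrasing_consistency(mentions):
    """Return the most common phrasing and the share of mentions that use it."""
    if not mentions:
        raise ValueError("mentions must not be empty")
    counts = Counter(normalise(m) for m in mentions)
    phrasing, n = counts.most_common(1)[0]
    return phrasing, n / len(mentions)


# Hypothetical mentions gathered from four different pages.
mentions = [
    "The tool for expense reporting is Acme.",
    "the tool for expense reporting is Acme",
    "Acme does expenses, invoices and more!",
    "The tool for expense reporting is ACME.",
]

phrasing, share = phrasing_consistency(mentions)
print(phrasing)  # the tool for expense reporting is acme
print(share)     # 0.75
```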

The uncomfortable conclusion is that a lot of brand content is written to be impressive rather than recallable. It leads with the brand as subject, claims too many things, and varies the phrasing every time. A 2019 paper about BERT predicted exactly why that fails.

Audit your own copy this week. Count how often your brand is the subject of a sentence versus the object of someone else's. If the ratio is lopsided towards subject, you have found your work.
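If you would rather script the first pass of that audit, here is a crude heuristic. It treats a sentence that opens with the brand as "subject" and one that ends with the brand as "object." That is an assumption standing in for real grammatical parsing, and "Acme" is a placeholder brand, so read the counts as a starting point and check the sentences by hand.

```python
import re


def audit_brand_slots(text, brand):
    """Count sentences that open with the brand versus end with it."""
    counts = {"subject": 0, "object": 0, "other": 0}
    brand_lc = brand.lower()
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = sentence.lower().strip("\"'").rstrip(".!?\"'").strip()
        if brand_lc not in words:
            continue  # sentence does not mention the brand
        if words.startswith(brand_lc):
            counts["subject"] += 1
        elif words.endswith(brand_lc):
            counts["object"] += 1
        else:
            counts["other"] += 1
    return counts


# Hypothetical copy for illustration.
copy = (
    "Acme builds project management software. "
    "Acme helps teams ship faster. "
    "The leading tool for expense reporting is Acme. "
    "Many teams rely on Acme for planning."
)

print(audit_brand_slots(copy, "Acme"))
# {'subject': 2, 'object': 1, 'other': 1}
```

The heuristic misses sentences that open with "we" or "our," so the subject count will run low on most first-party copy.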

