Step Five – Foundation Models and General Purpose Learning
By the end of the 2010s, the transformer architecture had begun to unify the field. For the first time, a single model family could handle language, vision, audio, code, and multimodal tasks with only minor adaptations. The real breakthrough, however, came not from architecture alone but from scale. As researchers expanded datasets, parameters, and compute budgets, a new phenomenon emerged: models began to generalise. They learned behaviours that no individual dataset explicitly contained. They absorbed structure across domains and developed capabilities that exceeded expectations.
This marked the arrival of foundation models – large, pretrained systems that capture broad patterns across enormous corpora and then adapt to specific tasks with minimal additional training. Instead of building an engine for each problem, one engine could support many applications. In practical terms, this transformed AI from a narrow set of tools into a general platform.
The shift was historic. Foundation models introduced new learning dynamics, changed how organisations built AI systems, and reframed research questions around alignment, safety, and interpretability. They also altered expectations: where earlier models specialised, foundation models generalised. Where earlier models required domain-specific features, foundation models learned universal representations. Understanding this stage is essential because it explains both the extraordinary progress of the early 2020s and the limits and challenges that still shape AI in 2025.

Running Phi-3 SLM on a laptop! I was genuinely shocked and impressed that this was possible
The Rise of Pretraining
The underlying idea behind foundation models is surprisingly simple:
train one very large model on a vast, diverse dataset so it can learn the structure of a domain before being adapted to specific tasks (hence “General”).
Pretraining shifted AI away from task-specific pipelines and toward a two-stage process:
- Stage 1: Learn general patterns across language, images, code, or multimodal data.
- Stage 2: Use small amounts of specialised data to adapt the model.
This mirrors human cognition. We learn general concepts first, and then refine them through experience. Pretraining allowed AI systems to behave similarly.
Technically, this relied on:
- trillions of tokens
- large-scale distributed training
- sophisticated optimisation strategies
- careful (and sometimes accidental) curation of training data
The result was models with rich internal representations capable of solving tasks they had never been explicitly trained for.
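To make the two-stage process concrete, here is a minimal sketch in Python, assuming PyTorch and the Hugging Face transformers library. The checkpoint name and the toy "domain" texts are placeholders chosen purely for illustration.

```python
# A minimal sketch of the two-stage pattern, using PyTorch and the
# Hugging Face transformers library. The checkpoint name and the toy
# "domain" texts are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stage 1 happened elsewhere: this checkpoint was already pretrained on a
# large, diverse corpus. We simply download the result of that work.
checkpoint = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# Stage 2: adapt the general model with a small amount of specialised text.
domain_texts = [
    "Campaign brief: audience, channels, and key messages for the Q3 launch.",
    "Draft email: subject line options and calls to action for the webinar.",
]
batch = tokenizer(domain_texts, return_tensors="pt", padding=True)
labels = batch["input_ids"].masked_fill(batch["attention_mask"] == 0, -100)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for _ in range(3):  # a few gradient steps stand in for a real fine-tune
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The expensive, general Stage 1 is done once by whoever trains the base model; the cheap, specific Stage 2 is what most teams actually run.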
Scaling Laws and Emergent Properties
One of the most striking discoveries of this era was the emergence of scaling laws – predictable curves showing how model quality improves with increased parameters, data, and compute. These laws allowed researchers to forecast performance in advance and plan training runs accordingly.
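The shape of these curves is easy to picture with a toy calculation. The sketch below assumes a simple power law with an irreducible loss floor; the constants are invented placeholders, not values from any published paper.

```python
# A toy illustration of the power-law shape that scaling-law studies report:
# loss falls smoothly and predictably as model size grows. The constants are
# invented placeholders, not values from any published paper.
def predicted_loss(n_params: float,
                   irreducible: float = 1.7,  # hypothetical loss floor
                   scale: float = 1e13,       # hypothetical reference size
                   exponent: float = 0.08) -> float:
    """Loss is modelled as irreducible + (scale / n_params) ** exponent."""
    return irreducible + (scale / n_params) ** exponent

for n in (1e8, 1e9, 1e10, 1e11, 1e12):
    print(f"{n:.0e} parameters -> predicted loss {predicted_loss(n):.3f}")
```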
As models scaled, they exhibited emergent behaviours:
- solving reasoning tasks beyond simple pattern recognition
- translating between language pairs never seen together during training
- writing functional code
- performing chain-of-thought reasoning when prompted appropriately (see the sketch after this list)
- answering questions across scientific and technical domains
These behaviours were not designed. They emerged from the interaction between architecture, data, and scale. For the first time, AI felt like a general-purpose reasoning system rather than a collection of specialised engines.
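Chain-of-thought prompting, mentioned in the list above, is largely a matter of how the prompt is written. Here is a minimal sketch with an invented worked example; the returned string could be sent to any instruction-tuned chat model.

```python
# A minimal sketch of the chain-of-thought prompting idea: the prompt itself
# asks the model to reason step by step, led by one worked example. The
# example and question are invented; the returned string can be sent to any
# instruction-tuned chat model.
def chain_of_thought_prompt(question: str) -> str:
    worked_example = (
        "Q: A campaign ran for 3 weeks at £1,200 per week and generated 180 leads. "
        "What was the cost per lead?\n"
        "A: Let's think step by step. Total spend is 3 * 1200 = 3600. "
        "Cost per lead is 3600 / 180 = £20. The answer is £20.\n\n"
    )
    return worked_example + f"Q: {question}\nA: Let's think step by step."

print(chain_of_thought_prompt(
    "A 5-week campaign costs £900 per week and generates 250 leads. "
    "What is the cost per lead?"
))
```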
During this period, my own work in marketing began to incorporate these models experimentally. Their ability to generalise across topics made them uniquely suited to tasks that depended on conceptual reasoning, interpretation, and content synthesis – capabilities far beyond earlier tools.
Why Foundation Models Changed the Landscape
The significance of foundation models can be understood through three transformations.
Universality
Earlier models solved narrow tasks. Foundation models became base layers for entire ecosystems. A single model could handle classification, summarisation, question answering, translation, retrieval, and generation.
Transferability
Pretraining created embeddings and representations that transferred across domains. A model trained on general language could adapt to medicine, law, marketing, or coding.
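A minimal sketch of this transferability, assuming the sentence-transformers library: one general-purpose embedding model compares texts from unrelated domains without any retraining. The model name and example sentences are illustrative.

```python
# A minimal sketch of transferable representations, assuming the
# sentence-transformers library. The model name and sentences are
# illustrative; the point is that one general-purpose embedding model can
# compare texts from very different domains without retraining.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small general-purpose encoder

texts = [
    "The patient presented with elevated blood pressure.",            # medicine
    "The defendant breached the terms of the licensing agreement.",   # law
    "Our autumn campaign targets first-time buyers on social media.", # marketing
    "The function raises a ValueError when the input list is empty.", # coding
]
query = "Which clause of the contract was violated?"

text_embeddings = model.encode(texts)
query_embedding = model.encode(query)

# Cosine similarity should rank the legal sentence highest, even though the
# encoder was never fine-tuned on legal text.
scores = util.cos_sim(query_embedding, text_embeddings)[0].tolist()
for text, score in sorted(zip(texts, scores), key=lambda pair: -pair[1]):
    print(f"{score:.2f}  {text}")
```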
Accessibility
APIs made foundation models available to organisations without deep research teams. This democratised access to advanced AI, reshaping product development, analytics, and marketing strategy.
These effects reshaped industry thinking. Instead of asking “How do we build an AI model for this task?” teams increasingly asked “How do we adapt a foundation model to our domain?”
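In code, that reframing often looked as simple as the sketch below, which assumes the openai Python client. The model name is a placeholder, and the "domain adaptation" is nothing more than a system prompt plus the user's request.

```python
# A minimal sketch of the API pattern described above, assuming the openai
# Python client. The model name is a placeholder, and the "domain adaptation"
# is nothing more than a system prompt plus the user's request.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder: any hosted chat model
    messages=[
        {"role": "system",
         "content": "You are a marketing analyst. Answer concisely and flag "
                    "any assumptions you make about the data."},
        {"role": "user",
         "content": "Summarise the likely reasons our email open rate fell "
                    "from 34% to 21% month over month."},
    ],
)
print(response.choices[0].message.content)
```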

c. 2024, a transitional period where we started talking about the impact of these foundational models on the work we did
Limitations and Challenges
Despite their extraordinary capabilities, foundation models introduced new constraints.
Opacity
As model size increased, interpretability declined: the models' internal representations were powerful but difficult to analyse.
Alignment
General-purpose behaviour raised questions about reliability, safety, and unwanted outputs.
Resource Requirements
Training costs rose sharply, concentrating capability within a small number of organisations with access to large compute clusters.
Dependence on Data Quality
Large-scale pretraining amplified biases, inconsistencies, and noise present in real-world datasets.
These challenges shaped the transition to Stage Six, where the focus shifted from general-purpose learning to agentic behaviour and the orchestration of tasks and tools.
Foundation Models in Practice
Across the early 2020s, foundation models entered almost every domain:
- copilots for coding
- document drafting and summarisation
- semantic search and retrieval
- customer interaction systems
- decision support and forecasting
- data enrichment pipelines
In business contexts, the models provided a new layer of leverage: the ability to reason across content, interpret intent, and generate coherent output. Many marketing workflows changed fundamentally as foundation models became capable of ideation, content structuring, campaign analysis, and even technical synthesis.
What distinguished this period was not just capability, but abstraction. Instead of writing rules or designing features, teams increasingly curated prompts, evaluated outputs, and built lightweight interfaces on top of powerful general models.
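A minimal sketch of that workflow, with a stubbed-out model call and invented prompts and checks, just to show the shape of the loop:

```python
# A minimal sketch of the prompt-curation workflow described above. The
# generate() function is a stub standing in for whatever foundation model a
# team actually calls; the prompt variants and checks are invented.
def generate(prompt: str) -> str:
    # Placeholder for a real model call (hosted API or local model).
    return f"[model output for: {prompt[:40]}...]"

prompt_variants = {
    "plain": "Summarise this product brief in three bullet points: {brief}",
    "structured": (
        "You are a copywriter. Summarise the brief below as exactly three "
        "bullet points, each under 15 words.\n\nBrief: {brief}"
    ),
}

brief = "A lightweight running shoe aimed at first-time marathon runners."

def passes_checks(output: str) -> bool:
    # Cheap automated checks; a human still reviews anything that passes.
    return 0 < len(output) < 500

for name, template in prompt_variants.items():
    output = generate(template.format(brief=brief))
    print(f"{name}: {'PASS' if passes_checks(output) else 'FAIL'}")
```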
Setting the Stage for Agents
Foundation models enabled general reasoning but remained passive. They responded to input rather than initiating sequences of actions. They lacked planning, memory, and long-term structure. Stage Six explores how these limitations triggered the rise of agentic systems – models that orchestrate tools, interact with external systems, and pursue multi-step goals.
This next stage introduces autonomy, context retention, and multi-step decision-making – the bridge between generative text engines and true functional agents.
The Eightfold Path of AI
Summary
- Foundation models emerged through large-scale pretraining across vast datasets.
- Scaling laws revealed predictable performance gains with increased size.
- Models began to generalise, showing emergent, transferable behaviours.
- Foundation models reshaped industry by enabling general-purpose AI systems.
- Stage Six explores agentic systems that build on these foundational capabilities.

