How AI engines are reshaping marketing visibility. Insights into Generative Engine Optimization (GEO), Answer Engine Optimization (AEO), and how brands can influence what LLMs say about them.
Something I have now heard from a couple of recent clients is the importance of actually tracking progress in AI search. As with any part of marketing, you want to create great messaging, material, content and so on; but you also need to know whether that effort is actually working or not.
I use a simple spreadsheet for doing this, attached below:
As part of my work on being seen by various AI tools, I am using a simple spreadsheet to track my progress on different platforms. I'm posting it here for now in case anybody finds it useful:
One of the biggest blockers I have had is getting my head around how you can run a large language model that is good enough for useful tasks on a regular laptop. Before I looked into the details of how the magic worked, this just didn't seem possible!
But there are some very smart people out there, and your starting point is understanding Ollama, downloaded from https://ollama.com (many of the underlying models themselves are hosted on https://huggingface.co/). Once you have it installed and running on your laptop, you can start pulling down models and seeing how far you can take things based on your machine spec.
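To give a flavour of how little is involved once Ollama is installed, pulling and querying a small model is just two commands (phi3 here is only an example – pick whatever fits your RAM):

ollama pull phi3
ollama run phi3 "Explain Generative Engine Optimization in one paragraph"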
Measuring the performance of marketing departments has always been a primary objective for marketing leaders. There have been many models and proposals for how to do this, most of which I’ve used over the years.
But now there is an added dimension – how well is your AEO and GEO optimisation going? What do you need to add to your quarterly deck? The attached is my suggestion, based on work I have done before, but now with a small update for AI.
Why AI Visibility Lets Us Measure Brand Impact. Finally.
For as long as I’ve worked in marketing, there’s been one problem I could never quite solve: how to measure the impact of brand work.
We’ve always had solid tools for performance — clicks, conversions, funnel stages. But brand strength? Hand-wavey and unmeasurable. I’ve tried surveys, awareness studies, proxies like search volume. They gave some data, but they were slow, expensive, and rarely told us what people actually thought at the point of decision.
That's why I find AI search and generative engines so interesting. I believe that, for the first time, we can open that black box and see the impact of your brand in action.
The Measurement Problem
When I’ve been a CMO, I always felt the tension between what the numbers said and what the market was actually feeling. Performance dashboards looked precise, but they ignored the harder question: what position does our brand hold in people’s minds?
Surveys and brand trackers tried to answer this, but they were blunt instruments. They lagged behind reality and rarely influenced day-to-day decisions.
So “brand impact” remained something we talked about, but couldn’t measure in any meaningful way.
AI as a Brand Mirror
Generative engines don’t just list links. They generate answers — which means they have to synthesize a point of view.
When someone asks an AI tool about your category, the answer is shaped by:
What the model has learned over years of training.
What it’s picking up now from your site, your thought leadership, your competitors.
That makes AI outputs a kind of brand mirror. They show what the system thinks your brand (or your category) stands for in real time. As mentioned here, “brand” can be considered as the priors in a Bayesian model: the previous knowledge we bring when making a product selection. Simply put, the prior knowledge that you love Apple products has to be factored in when you search for a product solution – you are unlikely to be buying the new, expensive iPhone because of a feature list 🙂
The big difficulty, as we all know as marketers, is that non-marketers don't get this. This is partly our fault of course; we are notoriously bad at “marketing marketing”. And I don't think words are enough – we need some maths! We need to understand why these things are related at a deeper level. Then, the next time your boss says “Why are we spending all this money on brand advertising – can't we just get some leads?”, you can link brand spend and company performance together much more directly.
So GEO is a new and exciting frontier. But how can we measure its impact?
Five Ways AI Visibility Changes Brand Measurement
It’s real-time – no more waiting six months for a brand tracker; AI outputs shift as your content does.
It’s observable – you can literally see how your brand is described in answers, not just infer it.
It’s comparative – you can check not only how you appear, but how competitors are framed.
It’s testable – publish new messaging or thought leadership and see if it shows up in generative answers.
It’s scalable – instead of surveying a few hundred people, you’re checking the same engines millions already use.
Why This is a Breakthrough
For the first time, we can ask: what does AI say about us? And we can track how that changes.
Publish new messaging → does it appear in AI answers?
Run a campaign → does it shift how the category is described?
Invest in thought leadership → does it influence which sources are cited?
That’s measurable brand impact. Not perfect, but visible in a way it’s never been before.
Why I’m Testing This
This post is part of a wider project I’m running here on bjrees.com. Over the next few weeks I’ll:
Publish structured content designed to test how AI engines update their outputs.
Track before-and-after snapshots in Perplexity, Bing, and ChatGPT.
Share the results, whether they work in my favour or not.
If brand has always been the hardest thing to measure, maybe AI finally gives us a way in.
FAQ
Why has brand impact been so hard to measure? Because the tools we had — surveys, awareness studies — were indirect, slow, and disconnected from real buying moments.
How does AI change that? AI systems generate answers by combining historical knowledge with fresh signals. That gives us a live view of how a brand is represented.
Is this just another form of SEO? Not really. SEO is about rankings. GEO shapes long-form generative answers, and AEO structures content for short, direct responses. Together, they make brand visibility something we can observe and track.
For more than two decades, digital marketing has revolved around search engines. Brands competed for rankings on Google, invested in SEO playbooks, and built inbound marketing machines. But the terrain has shifted. Increasingly, customers don’t just “search” – they ask AI. Tools like ChatGPT, Perplexity, Bing Copilot, and Google’s AI Overviews are rewriting the journey from curiosity to consideration.
But this raises a critical question: if this is true, then what does AI say about your brand? What impact does it have on your marketing?
Unlike search engines, which display lists of links, generative engines produce answers. If your brand is absent from those answers, you are invisible – even if your site is technically well-optimized for search. Visibility in the AI era requires new strategies, grounded in both marketing practice and cognitive science.
That’s where Generative Engine Optimization (GEO) comes in: the discipline of ensuring your brand is embedded in the priors that AI systems draw on when producing content. Alongside GEO, Answer Engine Optimization (AEO) provides near-term tactics for appearing in AI-generated answers today.
This article draws on my background in mathematics, psychology, and early machine learning, together with two decades leading marketing in B2B SaaS, to explore the theory and practice of GEO and AEO – and why they matter for the next decade of marketing.
From Psychology to AI: How Decisions Are Made
Marketers often forget that human decision-making is not purely rational. People don’t start from a blank slate when choosing between brands – they begin with priors: mental shortcuts, schemas, and brand associations.
Psychologists call this process schema-based decision-making. Everyone arrives at a buying decision with a pre-existing set of beliefs. For example: “Apple products are premium”, “the market leader is the safe choice”, “cheap tools are cheap for a reason”.
New evidence – reviews, product pages, recommendations – gets filtered through (or at least combined with) these priors. The process is mathematically described by Bayesian updating: starting from a prior belief and updating it with new evidence to form a posterior belief (the decision).
AI Works the Same Way
Generative AI mirrors this process. Large language models (LLMs) have priors (training data, embeddings, parameters) and evidence (retrieved documents, prompts). When asked a question, the model integrates both to generate an answer – much like a human updating beliefs with new information.
That means marketers must now think in Bayesian terms not only about human buyers, but about AI engines as decision-makers.
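To make the parallel concrete, here is a toy Bayesian update in Python. Every number is invented purely for illustration: a buyer starts 70% convinced Brand A is right for them (the prior), then sees a glowing review of Brand B (the evidence):

# Toy Bayesian update: a prior belief about a brand, revised by new evidence.
# All numbers are illustrative, not real data.

prior_a = 0.70          # P(Brand A is the right choice) before new evidence
prior_b = 1 - prior_a   # P(Brand B is the right choice)

# Likelihoods: how probable is "seeing a glowing review of B"
# under each hypothesis?
p_review_given_a = 0.3  # still possible even if A is really the better fit
p_review_given_b = 0.8  # much more likely if B really is better

# Bayes' rule: posterior is proportional to prior x likelihood
evidence = prior_a * p_review_given_a + prior_b * p_review_given_b
posterior_a = prior_a * p_review_given_a / evidence

print(f"Belief in Brand A: {prior_a:.0%} -> {posterior_a:.0%}")
# Belief in Brand A: 70% -> 47%

The point is the shape of the calculation, not the numbers: a strong prior (brand) means a single piece of evidence shifts the belief, but does not erase it.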
SEO in Decline: A Saturated Market
For years, SEO was the growth engine of B2B marketing. Publish content, build backlinks, climb the rankings. But in 2025, SEO faces three challenges:
1. Market Saturation
– Most categories are dominated by entrenched incumbents with high domain authority.
– New entrants struggle to gain traction without massive resources.
2. Standardization
– Everyone uses the same tools (Ahrefs, SEMrush, Clearscope).
– Agencies follow the same playbooks.
– What once gave an edge is now a commodity.
3. AI Intermediation
– Google AI Overviews, ChatGPT, and Perplexity strip away organic clicks.
– Even well-optimized content is paraphrased by AI, with fewer users clicking through.
SEO is no longer an offensive play – it’s defensive. If you already rank, you defend that ground. But building a new presence from scratch is brutally hard.
GEO vs. AEO: Two Paths to AI Visibility
To navigate this new landscape, I distinguish two complementary disciplines:
Generative Engine Optimization (GEO)
– Analogy: Brand advertising.
– Nature: Long-term, cumulative, hard to measure.
– Goal: Shape the priors inside AI systems so your brand is included in their answers.
– How: Structured content, metadata, thought leadership, embedding strategies, visibility in trusted sources.
– Reference: Pranjal Aggarwal et al. (2023): https://arxiv.org/abs/2311.09735
Answer Engine Optimization (AEO)
– Analogy: Performance marketing.
– Nature: Near-term, tactical, testable.
– Goal: Appear in the short, direct answers AI engines give today.
– How: Direct answers to common questions, FAQ content, clear structure, schema markup.
SEO: The “Middle Child”
– Once fast and high-impact.
– Now slow, saturated, and heavily intermediated.
– Still necessary, but no longer sufficient.
Why GEO Is the Differentiator
For most marketers, AEO will be the entry point – testable, tactical, accessible. But GEO is where differentiation happens.
– It's hard. Understanding embeddings, training data, and AI salience requires a mix of technical and strategic skill.
– It's rare. Few agencies truly know how to influence LLMs.
– It's durable. Once your brand is part of an AI's priors, it becomes “sticky.”
In other words: GEO is the smart play for thought-leaders, challengers, and consultancies who want to own the frontier.
You need to understand marketing, Bayesian models, schema theory, and machine learning to grasp the cognitive and probabilistic models underlying how people decide which product to go for.
Practical Strategies for Marketers
So what does this mean in practice?
1. Think Like a Bayesian
– Ask: what priors do people and AI systems already hold about your category?
– Provide new evidence that can shift those priors.
– Balance brand-building (priors) with tactical content (evidence).
2. Balance GEO and AEO
– Use AEO for immediate visibility in AI answers.
– Invest in GEO for long-term brand salience inside AI systems.
– Accept that both are necessary.
3. Reframe SEO
– Treat SEO as defensive, not a growth engine.
– Protect your rankings, but don't bet your future on them.
– Redirect resources into GEO and AEO experiments.
4. Build for Multi-Channel Demand
– Inbound is no longer enough.
– Combine GEO/AEO with outbound, partnerships, and category design.
– Build resilience into your marketing mix.
The Human Side: Teams and Capabilities
Technology alone won't save you. Winning in the AI era requires marketing teams that are:
– Strategic – able to connect brand, content, and AI salience.
– Technical – comfortable with schema markup, embeddings, and AI monitoring tools.
– Adaptive – willing to test new channels before they're fully standardised.
The challenge is not just tools but culture. Embedding AI into workflows, hiring for curiosity and rigour, and coaching teams to thrive in uncertainty are as important as any technical tactic.
Conclusion: Shaping What AI Says About You
The shift from search to AI answers is not a fad. It is a structural change in how information is mediated, how buyers form beliefs, and how brands achieve visibility.
– SEO is defensive.
– AEO is tactical.
– GEO is strategic.
The companies that succeed will be those that learn to shape both human beliefs and machine priors.
“If your job isn't what you love, then something isn't right”
If you are not passionate about, or at least interested in, what your company does for customers, then working in marketing is quite a slog. Of course there are parts of the marketing role which are less interesting than others (I'm no fan of doing expenses…), but if you aren't interested in the world of marketing – how it works and how you actually get customers – then you will struggle to give it your all. And part of that should be having a passion project in your job ❤️
Mine is AI. I studied it at college and I’ve always been interested in how the AI world is evolving. At times I have struggled to use it in work, but that has all changed with AI taking over the world.
I’ve been working on Project Skynet for a while now, but I am particularly excited about this next stage – running everything on my laptop.
Initial Setup Stage
I’ve only put one or two notes here because it will really depend on your machine and there are plenty of other places that give much better insights about how to set up Linux on a laptop. Below is what I did on my machine.
This was the most interesting and exciting part for me. I had a presumption that it would not be possible to run this system on my laptop. Surely running a mini brain on my Dell just doesn't make sense!? How wrong I was.
It is definitely the case that I can't run a GPT-5-class model! I might be able to run something approaching GPT-4 class, and I will experiment with that another time.
But I only want to run quite simple tasks, so I'm actually just going to use Phi-3. As I say, I'm not planning to plot out a new moon orbit with this, just run some simple processes on my laptop.
I will pause there, mostly because I was very pleasantly surprised at how easy this all was to set up. I had originally thought that it would take multiple stages to get this up and running, but Ollama has made it so simple. So for now I will leave you with my first query on my laptop, not in the cloud:
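For anyone who prefers code to screenshots, Ollama also exposes a local REST API on port 11434, so that first query can be reproduced in a few lines of Python (the model name phi3 and the prompt are just my examples):

import requests

# Ask the locally running Ollama server a question (no cloud involved).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "phi3",
          "prompt": "Give me three blog post ideas about B2B marketing.",
          "stream": False},
)
print(resp.json()["response"])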
But why am I bothering to do this? Why not just run this in the cloud? It certainly would be the simpler option. The problem with doing that is you aren't really learning anything or developing your expertise in something new and exciting. This is what I personally enjoy, and so now that I have this up and running I will move on to the next stages – building a marketing assistant that runs entirely on my laptop.
There are other benefits too – I feel happier about privacy and security when it is running on my machine, and I also feel more in control of the spend. But there is also a strange feeling that I don't want to “rent” the infrastructure – I would rather have it on my machine and own it. This was always my ambition but, as I say, I didn't think it could be done on a laptop. I was wrong.
I’ve updated my marketing pyramid, adding in a new layer for LLMOs – I feel it has got to a point where a marketing strategy that doesn’t reference this new world will start to look a little dated.
I'm working on a new version of this pyramid which properly understands the impact of LLM developments, though the core principles here remain unchanged. Put simply, you can't do absolutely everything in a normal-sized marketing department. You have to make strategic decisions, specifically choosing where to place your budgets and people to address the problems that you face today. Sometimes that will be long-term LLMO work, sometimes that'll be just getting a blog live. Sometimes you just need to go out and meet some more customers!
But you can't do all of that at once, and this is where the pyramid really helps me out.
Over the past year, many marketing teams have opened Google Search Console, seen a drop in clicks, and asked: “Is our content failing?”
That’s a reasonable question—until you look a little closer. Because what’s actually happening isn’t failure. It’s a structural change in how people search.
Meet the “Crocodile Effect”
Since mid-2023, a consistent pattern has emerged in Search Console data for many sites:
Impressions are going up
Clicks are going down
This diverging trend has been nicknamed the Crocodile Effect. It’s being driven by changes in how Google surfaces information—specifically, the rollout of AI Overviews (formerly “Search Generative Experience”).
Why It’s Happening
Google’s new AI-generated summaries often answer user queries directly on the results page. These responses pull from multiple sources, cite content, and increasingly eliminate the need to click through to a website.
As a result:
You might still rank highly or be cited in an AI summary.
But the user gets their answer immediately, without clicking.
This is classic zero-click behaviour, accelerated by generative AI.
What This Doesn’t Mean
It doesn’t mean your content isn’t valuable.
It doesn’t mean you’re being outranked.
It doesn’t mean your SEO strategy is broken.
In fact, if your impressions are rising, it likely means your content is still being seen—it’s just being surfaced in a different format.
This is why first-party data matters more than ever. Many third-party SEO tools can underreport traffic by a factor of 5–10x compared to Search Console. Always trust the primary source.
What to Do About It
1. Shift the success metric
Clicks alone are no longer the best proxy for value. Visibility and influence on the buying journey—even without a click—are now key.
2. Optimize for “fan-out”
One large topic may now need to be split into multiple, specific pieces. AI Overviews tend to pull from narrowly focused content that aligns tightly to individual user intents.
Example: Instead of “Microsoft 365 Security Best Practices,” consider also writing posts on:
“Conditional Access Policy Setup”
“How to Audit Weak Passwords in Microsoft Entra”
“Microsoft Defender for Office 365 Configuration Tips”
3. Track LLM visibility
It’s not just about Google anymore. Users are also searching with tools like ChatGPT, Perplexity, and Copilot. Some marketers are starting to track presence across these surfaces, too.
What Might Come Next
While traditional web traffic may drop, purchase intent might actually rise. Users who’ve researched via AI and LLMs could arrive on your site more informed and ready to convert.
In one case I came across recently, traffic originating from ChatGPT converted at 7x the rate of regular organic. That makes sense—if the AI has already explained your value, you’re meeting the visitor mid-funnel, not top.
Final Thought
SEO isn’t dying, but it is evolving.
It’s no longer just: “How do I rank?”
But: “Where am I surfaced, and how?”
Understanding this shift—and adjusting accordingly—will separate the frustrated from the forward-thinking in the next wave of digital strategy.
I’ve been experimenting with large language models (LLMs) and vector databases like Pinecone — not just as a research interest, but as a working prototype. My goal was to build a system that could retrieve, structure, and surface my own content in a way that’s useful to both people and machines.
What started as a technical exercise quickly turned into a content strategy rethink. The more I worked with embeddings, retrieval, and prompting, the more obvious it became that most B2B SaaS content — mine included — isn’t really designed to be useful in an LLM-shaped world.
This post is a set of observations from that process. It’s not a how-to, and it’s definitely not marketing advice. It’s just a few things I’ve noticed while trying to make my content more legible — to machines, yes, but also to myself.
1. LLMs don’t skim, they distill
One of the first things I noticed was how differently LLMs process content. They’re not scanning a web page for formatting cues or crawling a hierarchy of headings. They’re vectorising meaning — pulling intent and structure from the text itself.
This rewards clarity over cleverness. Vague intros, overused analogies, and “setting the stage” paragraphs get flattened. What works best is directness: “This is what the user needs to know, and here’s what we know about it.”
2. Most content is badly stored
I had to dig through slide decks, half-written blog drafts, and internal notes to feed the system anything useful. And even when I did, it wasn’t in a format the LLM could make much sense of.
A lot of our content isn’t unfindable because it’s private — it’s unfindable because it’s scattered, fragmented, and inconsistently written. Structuring information (even just basic metadata and formatting) turned out to be more useful than adding “AI” to anything.
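As a rough illustration of what “basic metadata and formatting” meant in practice, each fragment ended up looking something like this before embedding – the field names here are my own conventions, not a standard:

# One content fragment, structured before embedding.
# Field names and values are illustrative only.
chunk = {
    "id": "blog-2019-abm-strategy-03",   # stable ID so re-ingestion can skip it
    "source": "blog",                    # blog, deck, note, ...
    "title": "Account Based Marketing as a Strategy",
    "intent": "define",                  # what this block of text is meant to do
    "text": "Account based marketing is a strategy where...",
}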
3. Answerability is the new readability
When I tested my system by asking questions for Syskit — “What are common governance risks in Microsoft 365?”, for example — it only worked if the source material actually contained answers. Not positioning. Not messaging. Actual sentences that respond to an implied question.
I started to think of this as “answerability”: could this content, in its current form, directly answer a user or AI prompt? If not, it’s probably not useful — not to the system, and not to anyone else either.
4. Consistency matters more than tone
LLMs are surprisingly good at detecting contradiction. If one post says we support something and another implies we don’t, the system flags ambiguity. That’s useful — but also a bit exposing.
I used to think consistency was about branding. Now I think it’s about information integrity. If the machine can’t reconcile what you’re saying across multiple assets, it won’t confidently say anything at all.
5. Structure beats style
There’s nothing wrong with good writing. But good structure — clear subheadings, defined sections, and consistent terminology — outperforms style every time when you’re working with LLMs.
Most of what I had to rewrite wasn't because the sentences were bad. It was because the paragraphs had no job. There was no signal about what a block of text was meant to do: define, explain, compare, warn, resolve.
Once I started thinking about content structurally — almost like documentation or an API — everything started working better.
6. You can’t fake this with ChatGPT
There’s a temptation to take short-cuts: paste your post into ChatGPT, ask for SEO suggestions, then call it LLM-optimised. But when you’re building your own retrieval stack, you realise pretty quickly that what matters isn’t how AI generates content — it’s how it understands it.
Most B2B content isn’t referenceable because it’s too shallow, too scattered, or too brand-filtered. You can’t prompt your way around that. You have to fix the source.
Final thought
Building with LLMs — even in a small way — forced me to re-evaluate how I write, store, and structure information. The tools didn’t just change the output. They changed how I think about the inputs.
“And whether or not AI might already be, as some scientists believe, sentient – there's this little piece on the front of the Daily Telegraph this morning about an AI model, created by the owner of ChatGPT, that apparently disobeyed human instructions and refused to switch itself off. Researchers say that this particular model, the o3 model, described as the smartest and most capable to date, was observed tampering with the code that was meant to ensure its automatic shutdown, and it did it despite an explicit instruction from researchers that it should allow itself to be shut down. Which is fascinating, isn't it?” – Anna Foster, Today programme, 26.5.2025
So far in this project, I’ve been building a system that can scan all my blog posts, documents, and notes, extract the useful stuff, and make it searchable via natural language. The aim is to get something that works like a real-time assistant — answering questions using my own content as the source.
We’re now at Stage 4. Here’s what’s happened so far, and what comes next.
What’s Working
There are two sides to this system:
One piece of code handles data ingestion. It scans my files, pulls out the text, and stores it in Pinecone (a vector database).
The other piece lets me query that data using natural language.
The ingestion script (PopulateChatSystemDataRepository.py) currently runs manually — mostly because I’m trying to avoid hitting API rate limits. Eventually, it’ll move to Google Cloud Run so it runs continuously without needing my laptop open.
On the querying side, I started with basic keyword search. It was fine, but not great — too brittle. Now I’m using embedding-based retrieval with Pinecone, which is far better at handling fuzzier, more conversational queries.
The current setup includes a FastAPI service deployed on Cloud Run. It accepts queries via a simple URL. For example:
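The endpoint itself is only a few lines. Here's a minimal sketch of the shape – the route, parameter name, and the placeholder lookup are illustrative, not the live deployment:

from fastapi import FastAPI

app = FastAPI()

def search_index(q: str) -> list[str]:
    # Placeholder for the real embedding + Pinecone lookup.
    return ["...best matching snippet from my content..."]

@app.get("/query")
def query(q: str):
    # q arrives straight from the URL, e.g. /query?q=What+is+ABM%3F
    return {"question": q, "results": search_index(q)}

# Once deployed, Cloud Run serves URLs shaped roughly like:
#   https://<service>-<hash>.a.run.app/query?q=What+is+ABM%3F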
Type that into a browser, and it returns a relevant result from my content. It’s rough around the edges, but it works.
Why Speech-to-Text Is in the Mix
You might notice I’ve already wired in Google’s Speech-to-Text API — even though I’m still in the text-only phase. That’s for later. Eventually, I want this system to handle real-time conversations — voice in, answers out. But for now, I’m keeping things simple.
What’s Next: A Text Interface
This is the next step. I want to build a simple text interface — something that lets me talk to the system like an old-school text adventure game. No need for a fancy UI yet. Just a clean loop where I type a question, the system replies, and I can keep the conversation going.
Why this? Because before I worry about polish, I want to know the core experience works — the retrieval is accurate, the flow makes sense, and I can actually use it.
The checklist:
Add all my content to the system (done)
Build a basic interface for interaction (next)
That’s Stage 4. Getting the interface up and running is the next focus — and from there, it gets a lot more interesting.
When building search tools, intelligent assistants, or AI-driven Q&A systems, one of the most foundational decisions you’ll make is how to retrieve relevant content. Most systems historically use keyword-based search—great for basic use cases, but easily confused by natural language or synonyms.
That’s where embedding-based retrieval comes in.
In this guide, I’ll break down:
The difference between keyword and embedding-based retrieval
Real-world pros and cons
A step-by-step implementation using OpenAI and Pinecone
An alternative local setup using Chroma
Keyword Search vs. Embedding Search
Keyword-Based Retrieval
How it works:
Searches for exact matches between your query and stored content. Works best when both use the same words.
Example:
Query: "What is vector search?"
Returns docs with the exact phrase "vector search".
Pros:
Very fast and low-resource
Easy to explain why a match was returned
Great for structured and exact-match data
Cons:
Doesn’t understand synonyms or phrasing differences
Fails if the words aren’t an exact match
Embedding-Based Retrieval (Semantic Search)
How it works:
Both queries and documents are converted into dense vectors using machine learning models (like OpenAI’s text-embedding-ada-002). The system compares their semantic similarity, not just their words.
Example:
Query: "How does semantic search work?"
Returns docs about “meaning-based search” even if the words are different.
Pros:
Understands intent, not just keywords
Great for unstructured content and natural queries
Can surface more relevant results even if phrasing is varied
Cons:
More computationally intensive
Results are harder to explain (based on vector math)
Requires pre-trained models and a vector database
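That “vector math” behind the explainability tradeoff is less mysterious than it sounds: relevance is usually just cosine similarity between embedding vectors. A toy illustration with made-up 3-dimensional vectors (real embeddings have around 1,500 dimensions):

import numpy as np

def cosine_similarity(a, b):
    # Angle-based similarity: 1.0 = same direction, near 0 = unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query_vec = np.array([0.9, 0.1, 0.3])   # "How does semantic search work?"
doc_vec   = np.array([0.8, 0.2, 0.4])   # doc about "meaning-based search"
other_vec = np.array([0.1, 0.9, 0.0])   # unrelated doc

print(cosine_similarity(query_vec, doc_vec))    # high, ~0.98
print(cosine_similarity(query_vec, other_vec))  # low, ~0.21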
Feature Comparison Table
| Feature | Keyword-Based Retrieval | Embedding-Based Retrieval |
| --- | --- | --- |
| Search Logic | Matches words exactly | Matches by meaning |
| Flexibility | Low | High |
| Speed | Fast | Slower |
| Resource Use | Low | Higher |
| Explainability | High | Low |
| Best For | Structured search | Chatbots, recommendation, unstructured data |
| Common Tools | Elasticsearch, Solr | Pinecone, Chroma, FAISS |
Setting Up Embedding-Based Retrieval
Let’s build a basic semantic search system using:
OpenAI (text-embedding-ada-002)
Pinecone (hosted vector DB)
Chroma (optional local alternative)
1. Choose Your Tools
Embedding model:
OpenAI’s text-embedding-ada-002 or a local Hugging Face model.
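2. Embed, Store, and Query
From here the pipeline collapses into three calls: embed each document, upsert it into the index, then embed the query and search. Below is a minimal sketch using the current OpenAI and Pinecone Python SDKs – the index name, key handling, and sample documents are placeholders, and both SDKs change often, so treat it as a shape rather than gospel:

from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()           # reads OPENAI_API_KEY from the environment
pc = Pinecone(api_key="...")       # your Pinecone API key
index = pc.Index("my-content")     # an existing index (1536 dims for ada-002)

def embed(text: str) -> list[float]:
    resp = openai_client.embeddings.create(
        model="text-embedding-ada-002", input=text)
    return resp.data[0].embedding

# Ingest: embed each document and upsert it with its text as metadata.
docs = {"post-1": "Account based marketing is a great strategy...",
        "post-2": "Semantic search matches meaning, not exact words..."}
index.upsert(vectors=[
    {"id": doc_id, "values": embed(text), "metadata": {"text": text}}
    for doc_id, text in docs.items()])

# Query: embed the question and find the nearest stored vectors.
res = index.query(vector=embed("What is ABM?"),
                  top_k=3, include_metadata=True)
for match in res.matches:
    print(match.score, match.metadata["text"])

Swapping Pinecone for the local Chroma alternative is mostly a matter of replacing the index calls with a chromadb collection; the embed-then-search shape stays the same.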
This is the part where all the content sources came together into a centralized system I could actually interact with.
This post is a cleaned-up record of what I built, what worked, what didn’t, and what I planned next. If you’ve ever tried to unify fragmented notes, decks, blogs, and structured documents into a searchable system, this might resonate 🙂
What I Built
There were two main components at the heart of the system:
Batch Processing Script PopulateChatSystemDataRepository.py — this was run manually to gather and format all source data into a single repository. My plan was to automate it later.
Continuous Scanner A lightweight background service monitored for new blog posts and updates.
At that point, the batch script did the heavy lifting, though I intended to shift it onto Google Cloud Run to handle scale.
Where the Data Lived
The sources I processed included:
PowerPoint files – these were manually selected and hardcoded into the script, a reasonable tradeoff given how few I needed to track.
Notes from a Cambridge Judge Business School programme
Third-party and personal research logs
iCloud backups – these contained archived slide decks and supporting materials.
All of this data was funneled into a staging area for eventual vector embedding and retrieval.
Microsoft Graph API + OneNote
To pull content from OneNote, I used the Microsoft Graph API. First, I installed the required libraries:
pip install msal requests
msal handled authentication via Azure Active Directory
requests allowed me to interact with the Graph API endpoints
Once I authenticated, I could enumerate and query notebooks like this:
python ExtractNotes.py
After logging in via a Microsoft-generated URL, I could successfully extract content from all the notebooks I needed.
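For anyone reproducing this, the heart of ExtractNotes.py is a device-code login with msal followed by plain requests calls against Graph. A stripped-down sketch – the client ID comes from your own Azure app registration, and the scope will vary with what you need to read:

import msal
import requests

CLIENT_ID = "your-azure-app-client-id"   # placeholder: your app registration
AUTHORITY = "https://login.microsoftonline.com/consumers"
SCOPES = ["Notes.Read"]

app = msal.PublicClientApplication(CLIENT_ID, authority=AUTHORITY)

# Prints a microsoft.com URL plus a code to type in -
# the "logging in via a Microsoft-generated URL" step described above.
flow = app.initiate_device_flow(scopes=SCOPES)
print(flow["message"])
result = app.acquire_token_by_device_flow(flow)

# List OneNote notebooks via the Graph API.
resp = requests.get(
    "https://graph.microsoft.com/v1.0/me/onenote/notebooks",
    headers={"Authorization": f"Bearer {result['access_token']}"},
)
for notebook in resp.json().get("value", []):
    print(notebook["displayName"])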
Licensing Curveballs
At the time, I hit a snag: my Microsoft 365 Family plan didn’t include SharePoint Online, which was required to query OneNote via the Graph API.
I weighed my options:
Pay for a Business Standard plan (~£9.40/month)
Try to use my home license in some way, even though it didn't seem to have what I needed for OneNote
I went with option 2, supported by a one-month free trial of Microsoft Business Basic to help validate the approach.
Google Sheets as the Backbone
The ingestion script used a JSON keyfile to interact with Google Sheets. It opened the sheet like this:
client.open_by_key(sheet_id).sheet1
Sheets acted as a live database — but I ran into 429 rate-limit errors, especially when repeatedly reading the same files. To solve this, I built a basic checkpointing system so the script would:
Cache previously processed records
Avoid re-downloading the same content every time
Track progress and only fetch new entries on each run
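The checkpointing was nothing clever – conceptually it is just a persisted set of processed IDs consulted before doing any work. A sketch of the idea (the file name and helper functions are illustrative, not the real script):

import json
import os

CHECKPOINT_FILE = "processed_records.json"  # illustrative name

def load_processed() -> set:
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return set(json.load(f))
    return set()

def save_processed(ids: set) -> None:
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump(sorted(ids), f)

def fetch_new_records():
    """Placeholder for the Google Sheets read (gspread)."""
    return [{"id": "row-1"}, {"id": "row-2"}]

def ingest(record):
    """Placeholder for the real processing step."""
    print("processing", record["id"])

processed = load_processed()
for record in fetch_new_records():
    if record["id"] in processed:
        continue                 # skip work already done - no repeat API calls
    ingest(record)
    processed.add(record["id"])
save_processed(processed)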
The GitHub Reset
After a short break from the project, I realized the codebase had grown too complex. I had introduced a lot of logic to deal with throttling and retries, but it made everything harder to understand.
So I rolled back to a much earlier commit and started again from a simpler foundation.
It was the right move.
What Came Next
Here’s what I tackled after that cleanup:
Migrated the whole project to an old home laptop
Simplified the ingestion pipeline
Ensured each run processed only new data, not the full archive
Finalized access and querying via Microsoft Graph API for OneNote and SharePoint content
Reflections
Skynet began as a chatbot experiment, but evolved into something bigger — a contextual knowledge system that drew from years of notes, presentations, and personal writing.
Stage 2 was about turning chaos into structure. The next phase was even more exciting: embeddings, retrieval, and building a system that could answer real questions, grounded in my own work.
Over the past few months, I’ve been building something a bit different: a real-time AI-powered assistant designed to help me work better with my own content. The goal is to create a system that can scan and catalog documents, blog posts, audio recordings, and notes, then surface that information back to me as I need it—almost like a second brain. I wanted it to pull from tools I already use daily, like Google Sheets, OneNote, and GitHub, and use technologies like Pinecone, OpenAI, and Google Cloud to power the intelligence behind it.
This blog series is a step-by-step breakdown of how I built it—from a messy OneNote notebook into a working system. Each post will focus on one key stage, including the code, architecture, and lessons learned along the way. This first post is solely about the tech stack that I chose – actually one of the most fun stages.
Setting Up the Environment for a Python-Based AI Chatbot
This project runs on Linux, primarily because I want to use Python, which I have some basic experience with. Here’s how I set up the development environment and supporting tools.
Core Tools and Services
Google Cloud
Speech-to-Text for audio transcription
Cloud Run to execute background processing tasks
Google Sheets for structured data storage (e.g., cataloging blog posts)
OneDrive (Personal) for general document storage
iCloud Drive for mobile voice recordings
GitHub to manage Python code and version control
Trello as a lightweight project tracker
ChatGPT to assist with development and planning
Getting Linux + WSL Working Smoothly
There were a few initial stumbling blocks in getting everything up and running, especially around Windows Subsystem for Linux (WSL). Here’s the distilled process:
Launch CMD as Admin, then enter WSL with: wsl
Activate the virtual Python environment: source myenv/bin/activate
Navigate to the correct project folder: cd ~/skynet/
Code and Repository Setup
All code is version-controlled in GitHub
To update code:
git add PopulateChatSystemDataRepository.py
git commit -m "Message"
git push origin main
git pull origin main
Libraries required when switching machines:
pip install gspread
pip install oauth2client
Purpose of This Setup
One of the key tasks here is to catalog blog posts into a Google Sheet using Python:
python3 SendDataToGoogleSheets.py
This setup forms the backbone of a knowledge base that can be queried by an AI chatbot.
Why Use Google Cloud Run?
I plan to regularly publish new blog posts and documents. These need to be automatically picked up by a cloud-based system, not just left on a drive. To do that, I’m using Google Cloud Run to host the background process that parses and ingests this content.
The service is named skynet, though it hasn’t been deployed live yet—waiting until the code is fully tested.
Setting Up the Google Sheet Database
Create a Google Sheet and give it a meaningful name (e.g., “Chat System Data Repository”).
Enable the Google Sheets API in the Google Cloud Console.
Set up the API key and credentials for access.
Code integration is done using Python with the gspread library—no Zapier or low-code tools.
Data Format for Each Entry
Each blog post entry should include:
Message ID
User ID
Timestamp
Message Text
Source (e.g., Slack, WhatsApp)
Response Status (e.g., Processed, Pending)
Parsed via Python’s feedparser, which extracts standard RSS fields such as title, link, description, content, and publication time.
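Putting those pieces together, the cataloguing step looks roughly like this – the sheet key, feed URL, and keyfile name are placeholders, and the columns mirror the entry format above:

import feedparser
import gspread
from oauth2client.service_account import ServiceAccountCredentials

# Authenticate with the JSON keyfile created in the Google Cloud Console.
scope = ["https://spreadsheets.google.com/feeds",
         "https://www.googleapis.com/auth/drive"]
creds = ServiceAccountCredentials.from_json_keyfile_name("keyfile.json", scope)
sheet = gspread.authorize(creds).open_by_key("YOUR_SHEET_ID").sheet1

# Parse the blog's RSS feed and append one row per post.
feed = feedparser.parse("https://example.com/feed/")
for i, entry in enumerate(feed.entries):
    sheet.append_row([
        f"msg-{i}",                   # Message ID
        "blog-ingest",                # User ID
        entry.get("published", ""),   # Timestamp
        entry.get("summary", ""),     # Message Text
        "Blog RSS",                   # Source
        "Pending",                    # Response Status
    ])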
Next Steps
There are two major next steps:
Add additional content sources into the pipeline (see Trello board).
For any new source, take the data through the same ingestion process.
Currently, everything runs through the PopulateChatSystemDataRepository.py script, which has been updated to handle edge cases like escape characters.
Now that the core data is in place inside a Google Sheet, the next stage is testing the pipeline end-to-end. Once that’s working, I’ll expand to include additional data sources.
I have abandoned attempts to control the whole thing by voice, as it just seems an enormous amount of work for minimal return. It's cool, but not practically helpful.
Instead – and I think more interesting – I am going deeper into the world of fine-tuning. If I just wanted general information about marketing, I could do a Google search. The value comes from producing something based on my specific views. As a simple example, I attach much more credence to expertise than to search engine optimization. This isn't the view held by everybody in marketing :)
The other interesting point is about security. I have thought about putting this chatbot on the web for anybody to use, but I would need to do an audit of every single piece of information in the underlying database. So for now this is something that I use in general conversation as an aid.
Anyway, I will post more videos as it starts giving me more and more insight.
I’ve had a lot of fun over the last few weeks building a marketing chatbot. I started here and thought “How difficult can it be?!”
Two months later I have found out both how difficult some of it is and how easy some of it is. The easy bits were putting together sources of information and using ChatGPT to help with the structure of the app. The most difficult bits were anything to do with permissions (even on my own data) and anything where I needed to use the Microsoft Graph API. Still a long way to go, but I thought I'd post an update on the tech stack I've used so far.
Screenshot of the app so far! I will publish the real thing when I’ve had a look at the security implications…
Tech Stack
The very simple architecture/workflow for the app can be split in half:
Server-side: Run various processes to collect together everything I know about marketing and put it into a database.
Client-side: A way of running queries against that database
I wanted to use the following guidelines:
As free as possible. Only pay for something if the cost was minimal and it saved me an enormous amount of time/reduced complexity
Linux based. Again for cost purposes, but also because it’s just much easier to use 3rd party libraries
Having these guidelines helped enormously with choosing the technology, and also helped to make sure everything worked together. I ended up with the following, first for server-side then for client-side.
Server-side
Google Cloud. I have used Google Cloud in three places. If you are in the world of Linux it is so easy to call the Google Cloud API, the online help is superb, and it is all relatively easy to use. But the biggest winner was the integration between Google Sheets and the API. I thought this would be tricky, but actually you are guided through it very simply. Specifically I used it for:
Speech-to-text encoding
Cloud Run to run the independent scans of various types of content
Google Sheets as document storage, including a sheet which lists all of my blog posts
Microsoft Personal OneDrive for documents. This is a significant source of information, as I have used Microsoft personal OneDrive for a decade or more to keep notes on various ideas. With hindsight, I would have kept this much cleaner and better governed; there has been a lot of work cleaning up what was in here, in all sorts of formats.
GitHub for storing all Python code. A private repo for now
ChatGPT to help me with everything. It would be a lie to say that I wrote everything from scratch. I went to a very impressive lecture recently at the Cambridge Marketing Meetup from Siok Siok Tan, who talked about how, going forward, people would work side-by-side with the machines, splitting out tasks as appropriate. The way forward for this sort of work is being smart about what you should do and what the machine should do. What I found particularly interesting was the symbiotic relationship needed for writing code. For example, ChatGPT wrote quite a lot for me as a starting point, but I usually found I had to do a reasonable amount of work on top of that, because the ChatGPT sources were out of date, and APIs seem to be getting updated all the time.
Microsoft Graph API. Sigh. I use this extensively to programmatically access my OneNote notes. I had just finished working with the Google technology and thought “Surely the Microsoft APIs are just as easy?”. No they are not. A lot of this is to do with security of course, and that is good – I don't want everybody reading my personal files. However, the structure of OneNote files is nontrivial, and getting API access to work correctly takes a lot of hard work.
Client-side
Pinecone was a big breakthrough for me. I wanted a chatbot which isn't just a simple text search – something with a fuzzy search mechanism. As an example, my records might say something like “Account based marketing is a great strategy”, but I want that to come up if I search for something like “What is ABM?”. I needed something more than simple keyword-based retrieval, so I moved over to using embedding-based retrieval with OpenAI.
Docker to containerize everything. Is that a real word!?
Google’s speech-to-text API. I’m going to save this for the 3rd blog post, partly because it is just so impressive it deserves a post of its own.
That will do for now. The next stage is all the testing, particularly to make sure I am using the right security for my documents. One of the key issues with using AI technologies on your own notes (essentially what I'm doing here) is the problem of oversharing information when you shouldn't. It is something that we help customers with at Syskit, albeit in a different context (Microsoft 365 governance). But the principles are the same – are you sharing things outside your organisation that you shouldn't be?! Be careful out there… I will eventually put a link in here to my app, but not until I have gone through all of the information and documents that I am using to make sure I am not oversharing. I will need to do this by hand, so I may be some time…
I've started a new project to try and write some Python code so that I can talk to the laptop in real time as I'm working. The ultimate goal is to create a real-time assistant that makes suggestions for me in conversations.
Today was day one. Despite many many false starts I finally managed to get the laptop to recognise my voice and start transcribing it. Let Judgement Day begin.
I started doing some manual research into the security risks with different AI tools. But then I thought, why not get the AI to do it for me? So that’s what I did. Once again I am very impressed…
1. Data Privacy and Confidentiality
ChatGPT: Users may inadvertently share sensitive information, potentially leading to data leaks. ChatGPT conversations may not always have the same enterprise-level data security controls, depending on deployment.
Gemini: Google advises against sharing confidential information with Gemini, as conversations may be reviewed to improve quality, raising privacy concerns for sensitive data. (searchenginejournal.com)
Apple Intelligence: Apple prioritizes on-device processing and privacy, with Private Cloud Compute to protect user data. However, the effectiveness depends on consistent use of these privacy features by users. (security.apple.com)
Microsoft Copilot: Integrated within Microsoft 365, Copilot has enterprise-grade security and compliance features built-in, including support for GDPR, HIPAA, and other regulatory standards. However, the risk of users accidentally sharing confidential data through Copilot remains, especially if the model is trained on organization-specific data without strict data policies in place.
2. Phishing and Social Engineering
ChatGPT: Can be exploited to craft convincing phishing emails or social engineering scripts by generating content that mimics corporate communication styles.
Gemini: Google Gemini has been found to have vulnerabilities that could be exploited for phishing, potentially enabling attackers to take over chatbots or impersonate users. (securityweek.com)
Apple Intelligence: While Apple Intelligence AI hasn’t been specifically linked to phishing exploits, any AI with language generation capabilities could potentially be leveraged for social engineering if misused.
Microsoft Copilot: As it interacts with Microsoft 365 tools, Copilot has the potential to automate and personalize phishing messages within Microsoft’s suite, particularly within Outlook or Teams. Enhanced by organizational knowledge, phishing attacks crafted by Copilot could mimic familiar internal communication patterns, making them harder to detect.
3. Malicious Prompt Injections
ChatGPT: Susceptible to prompt injection attacks that could manipulate the AI’s behavior to provide unintended or sensitive information.
Gemini: Vulnerable to indirect prompt injection, which could enable phishing or chatbot takeovers. (securityweek.com)
Apple Intelligence: Apple has measures to guard against vulnerabilities and offers rewards for identifying AI security flaws. However, the risk of prompt injection remains if used in complex workflows without safeguards.
Microsoft Copilot: As Copilot becomes embedded across Microsoft 365 applications, prompt injections could potentially allow malicious users to exploit workflows or access sensitive data by manipulating Copilot’s responses. This is particularly concerning in applications where sensitive data is routinely processed, like Excel or SharePoint.
4. Data Exfiltration and Unauthorized Access
ChatGPT: Without proper security configurations, there is a risk of data exfiltration if ChatGPT is misused or linked to sensitive applications.
Gemini: Accused of unauthorized data scanning on Google Drive, Gemini has raised concerns over potential data exfiltration or access without user consent. (techradar.com)
Apple Intelligence: Apple’s approach emphasizes on-device data handling to mitigate unauthorized data access, though secure implementation and user adherence are necessary to minimize risk.
Microsoft Copilot: As an AI system integrated with Microsoft 365, Copilot has extensive data access, which could be exploited if permissions are misconfigured or if attackers find ways to bypass controls. Because Copilot can access files, emails, and other stored information, this could expose sensitive data if not closely monitored.
5. Compliance and Regulatory Risks
ChatGPT: Compliance risks may arise if sensitive or regulated data is inputted, especially if stored outside organizational control, potentially violating GDPR, CCPA, or other regulations.
Gemini: Google’s Gemini AI practices could pose regulatory challenges, particularly around user consent, data retention, and control over how user data is handled.
Apple Intelligence: Apple’s privacy focus and on-device data processing align well with regulatory standards, but enterprises must ensure their specific use cases remain compliant with industry standards.
Microsoft Copilot: Copilot aligns closely with Microsoft’s regulatory and compliance frameworks, making it a safer choice for organizations bound by strict regulations. However, organizations still need to ensure data governance policies are enforced to avoid regulatory risks related to AI-driven data processing.
Conclusion
While ChatGPT, Gemini, Apple Intelligence, and Microsoft Copilot each bring distinct features and security controls, core security risks are common across all platforms. These include data privacy, phishing, prompt injection vulnerabilities, data exfiltration risks, and compliance challenges. Microsoft Copilot offers the advantage of built-in enterprise security and compliance support, but also presents risks, especially around data handling, unauthorized access, and phishing automation.
For all platforms, implementing clear data-sharing policies, monitoring AI interactions, user training, and regular security audits will help organizations mitigate these risks effectively.
Most of your time as a marketing leader is spent trying to make decisions with inadequate data. In an ideal world, we would have run an A/B test on everything we wanted to do, looked at the numbers and then made a decision. Which image should we use for our new advert? What message? What tone? Which type of customer are we trying to reach? And 1,000 other things.
A/B testing is one way of approaching this problem. The difficulty is that most marketers will – and should! – already have a view. If I was given the choice between two headlines:
Find out how our products can help you
Click here to give us some money
I know which one I would click on, I definitely don’t need to do a test!
But here is a more realistic example. You are trying to sell into a company and you are not sure who makes the decision. Is it the end user? The manager? The person holding the purse strings?
How on Earth do you do an A/B test for something like this?
You will soon find with a question like this that you quickly hit the “Everyone has an opinion” problem. You ask various members of the team, and outside your team, and everyone gives a different answer. There are a couple of ways out of this situation, as I've mentioned, but doing an A/B test is generally completely impractical.
So what can you do? The approach that I take now is to use some of the concepts from Bayesian logic to help me make the decision. The key concept is the idea that every decision you make is a combination of your prior knowledge plus the data that you see. And the real issue with prior knowledge is that everyone comes to the table with their own history.
As an example, if you have been running a marketing team for years, and all you have been doing is numbers-driven digital marketing for B2C businesses – and crucially, you have had success with that, then you are going to start your analysis with that approach in mind – the answer to the question “what should we do next for our marketing?” will very likely be something around digital marketing strategy. In contrast, if you come to the table from a brand marketing background, then it is likely that your initial opinions will favour this sort of approach. Why? Because this is what you know and there’s a good chance you’ve had success with it at some point in the past.
Crucially, just asking the question “which is right?” will not get you anywhere! You each have prior knowledge that you are bringing into the process. So what do you do when you start running a campaign and you start to get results, albeit with very low numbers? How do you combine your prior knowledge of what should happen with what is actually happening?
This is where the Bayesian approach can be very useful. I don’t think you need to understand any maths to use this approach, it is about the principles behind it.
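That said, for anyone who does want to peek at the maths, the textbook way to formalise “prior plus a trickle of data” is a Beta-Binomial update. The numbers below are invented purely to show how a strong prior stops a handful of early results from whipsawing your estimate:

# Combining a prior belief about a conversion rate with early,
# low-volume campaign data. All numbers are invented for illustration.

# Prior: experience says this kind of campaign converts at ~5%.
# Encode that as Beta(5, 95) - roughly "100 prior trials" worth of belief.
prior_alpha, prior_beta = 5, 95

# Early data: 30 visitors, 4 conversions (13% - but only 30 data points!).
conversions, visitors = 4, 30

# Bayesian update for this model: just add the counts.
post_alpha = prior_alpha + conversions
post_beta = prior_beta + (visitors - conversions)

prior_mean = prior_alpha / (prior_alpha + prior_beta)
post_mean = post_alpha / (post_alpha + post_beta)
print(f"Prior estimate: {prior_mean:.1%}")    # 5.0%
print(f"Updated estimate: {post_mean:.1%}")   # 6.9% - nudged, not dragged, by the data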
I first read about this principle in the book below:
I have recommended this book about six times on this blog, so I am definitely a fan! The key part that is relevant to this blog post I have copied below. I tried to paraphrase it, but then I realised that Sean Carroll’s short explanation is better than anything I could come up with:
“Prior beliefs matter. When we’re trying to understand what is true about the world, everyone enters the game with some initial feeling about what propositions are plausible, and what ones seem relatively unlikely. This isn’t an annoying mistake that we should work to correct; it’s an absolutely necessary part of reasoning in conditions of incomplete information. And when it comes to understanding the fundamental architecture of reality, none of us has complete information.
Prior credences are a starting point for further analysis, and it’s hard to say that any particular priors are “correct” or “incorrect.” There are, needless to say, some useful rules of thumb. Perhaps the most obvious is that simple theories should be given larger priors than complicated ones. That doesn’t mean that simpler theories are always correct; but if a simple theory is wrong, we will learn that by collecting data. As Albert Einstein put it: “The supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience.”
Everyone’s entitled to their own priors, but not to their own likelihoods.”
This might feel like a slightly obscure deviation from the subject matter of this blog (marketing!) but I don’t think it is.
Unlike many other areas, it is very difficult in marketing to come up with definitive evidence for why one approach is better than another. This can lead to endless back-and-forth debate about messaging and other areas… or worse, you can end up with the HiPPO principle for making decisions (“Highest Paid Person's Opinion”).
But – If you understand that this is where a lot of people are coming from, that they are making decisions based on their prior experience and not necessarily the facts in front of them – then this makes it much easier to have a rational discussion. There will be a very reasonable logic behind why somebody is arguing for something. Listen to that person, interpret and apply intelligently.
I've written a few times before about AI and its impact on marketing. I've written here about the impact on Google search, and here about OpenAI and ChatGPT.
Finally, though, Microsoft is making a bit more of a song and dance about their offering, Copilot. I was sceptical at first because a previous version I had tried wasn't great. But now I am nothing but impressed. What has impressed me most is the option to make your own chatbots based on your own data using Microsoft Copilot Studio. So what I've done here is take all of my old blog posts about marketing from the last 10+ years and created a Copilot! A sort of “B2B marketing Copilot”.
Is ChatGPT a threat to Google? Obviously there's nothing I can write about ChatGPT that hasn't already been written 400 times, so this post is a sort of “naive plagiarism”. Still, with that in mind…
Where do you go first when you want an answer to a problem? This week I wanted to rewrite some old vb code (of which I am thoroughly ashamed) in C#. I didn’t want to do it from scratch so I looked for a tool to do the first draft.
In the past, my process for figuring out how to do this would be as follows:
1️⃣ Do a Google search on something like “How do I convert vb code into C#?”
2️⃣ Look at the top few search results and pick one or two, often based on brand awareness or the bit of ad copy they've written. In this example, https://converter.telerik.com/ appears at the top and I clicked through to have a look at their offering. Sometimes this will be an organic result, sometimes a paid-for PPC ad – and of course this is how Google makes a lot of its money
3️⃣ So like most people my first port of call for a question is just to type it in the browser which then uses Google
4️⃣ But I noticed this morning that I’ve started doing something different. I’ve started asking questions in ChatGPT *before* I type it into the browser. Instead of a Google search I typed “Convert the following code into C#” into ChatGPT.
5️⃣ The results were excellent (see below)
6️⃣ The point here is not about how to update code to C#. The point is how we get answers to questions. Google has completely dominated this scene for a long time. But will it continue to do so? For the first time ever, for certain types of questions, I’m starting with ChatGPT then occasionally falling back on Google search.
So the good thing about this as a user is that we’re getting back to a world where great content wins rather than just the winner being the company with the best marketing team. But what if you’re a marketer??
And as soon as someone has a Chrome plugin to change the default search to ChatGPT – then this starts to get interesting. In fact I'd be amazed if there isn't already something out there. I'm going to start hunting now, and you know where I'm going to start…
Reading about it on the Internet, to get a sense of what all the fuss is about
Finding a website with a chatbot on it (for example, https://chat.openai.com/), and having a go yourself, if only to see what everybody is talking about
Adding a generic chatbot to your own website. I'm not quite sure why you would do this, but it's part of the process of understanding how to integrate ChatGPT into your website
(now it starts to get more interesting…) automatically creating an FAQ for your website based on your content
Creating a ChatGPT bot that can go on your site for your customers to use to find out more about you and your company
I’ll talk about the first four points here and then, in the next article, the last point. This is a considerably bigger task, so needs a post of its own. The end goal is to allow customers and potential customers to come to your site and ask questions about your offering. There are two advantages to this approach:
If you’re resource constrained, you don’t have the people to be on the phones answering questions all the time.
Consistency. You can manage and see what’s being said to your customers on the website.
But where should I start?
Reading about it on the Internet
Not a whole lot more I can add here. If anything, it’s hard to escape articles about the topic. The BBC has some good articles.
Playing with a chatbot yourself
The first question most people have is “But are these things any good? They’ll never fool me!”. Don’t listen to others; try it out yourself. I’d suggest the OpenAI website itself as a great starting point. You may need to create credentials first, but spending some time here will really show you the power of what everybody is talking about. Here’s a pretty random example. I asked “What is account based marketing?”:
That’s very good. Yes, it’s a little generic, but I produced it with no effort and no research. If you wanted to find out about a new topic at work, 30 minutes with ChatGPT would get you well on your way.
Adding a generic chatbot to your site
Really this is a preparatory step before going on to the next more interesting stage. But it does introduce some of the useful resources.
I use WordPress for my site, so my example here is for WP. But the principle is the same: the difficult bit is creating the training data and then training up a model. If you can do that, then getting it onto WordPress is easy.
In particular, I want to highlight the Jordy Meow plugin. This is an incredible bit of kit; I was repeatedly pleasantly surprised by what was available and how easy it was to install and get working. This is no mean feat given that we’re moving into the territory of training AI models.
Like any WordPress plugin, you install it from your dashboard. Then, on your WordPress site, you’ll have something that looks a bit like the following:
Creating an FAQ for your website
So far we’ve looked at generic chatbots which are all over the web. But you want something for your website, based on your industry.
Again, I’m not going to go through the details of doing this because there are some fantastic notes on the AI Engine help pages, and it will be different for different sites. But the most important point – the place where you need to spend most time and where you can really differentiate – is the training content. This will sound familiar to anybody who’s worked in marketing: if you’re creating something to help you generate interesting content, then you have to have some interesting content to start with. I’ve used the process on this site to create 100+ questions and answers without having to write a single question myself. The engine is so powerful that you can give it a block of well-written marketing text and it will automatically create questions and answers from that text. To create the FAQs on this site, I simply fed the engine the 94 blog posts I’ve written over the last 10 years and asked it to give me some questions based on this input.
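To make the generation step concrete, here is a rough sketch of the pattern if you were to do it by hand rather than through the plugin. This is not AI Engine’s internals, just the general idea; the file name, model name and prompt wording are all illustrative.

```python
# A sketch of the idea the plugin automates (not AI Engine's internals):
# feed a block of existing content to a model and ask for Q&A pairs.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One of your existing posts; the file name is illustrative
post_text = open("old_blog_post.txt").read()

prompt = (
    "Read the following marketing blog post and write three FAQ entries "
    "based only on its content. Format each as 'Q:' then 'A:'.\n\n" + post_text
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```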
You can see some of the results on this page. Remember, all of these were auto-generated, including the actual questions:
I’ve been enormously impressed by AI Engine and the work done by Meow Apps.
All great, but isn’t this a marketing blog? This just feels like a lot of technical detail! Well yes, that’s true. But one of the ways you can differentiate yourself from the crowd as a marketer is by moving on from just talking about technology to actually showcasing it. You can significantly boost your career by properly understanding how AI technologies can impact marketing. This needs to be more than just “Add AI to your marketing efforts!”. Your claims need substance, and this is where the hard work comes in. I had the advantage that I’ve been writing blog posts for 10 years or more, so I had the source material. But you have to start somewhere, and this is one way to take the content that you’re writing and get it out to more people in a more palatable form.
Any further questions please feel free to get in touch to discuss how I can help.
Of course, really, we all want to build Skynet. However, until Judgement Day comes, we’ll have to make do with ChatGPT. ChatGPT is obviously a Big Deal right now for marketers, so I wanted to find out for myself.
Firstly, as a general point, I do think it’s important to try technology out yourself before moving forward with a project. There are plenty of ways of trying out bits of tech, if only for your own understanding. ChatGPT is no exception: you can try it online at https://chat.openai.com/ with almost zero effort.
This should start you off on your journey into the world of ChatGPT. For example, here are the results I got when I asked “What is the marketing flywheel model?”:
Definitely not wrong. And if I had to write a short piece on this topic I could do worse than copy and paste this into a blog post.
If your role is something like “content creator”, it might look like you can get away with having a machine do your work for you. So, what’s the problem?
The issue can be seen in the response above. Though this answer is “not wrong”, that is a long way from being an insightful and useful piece of content. Customers coming to your site want insight, new ideas, new perspectives. They want to hear from industry experts; why else read your articles at all? If your article is just an aggregation of the content on the Internet, how are you differentiating yourself from everybody else out there?
The response above is marketing 101. Perhaps fine for a GCSE paper, but not good enough if you want to attract real customers (which is our job here). With follow-up questions I could definitely get more out of ChatGPT, but here are some of the things that are missing:
It’s very generic. How would this be different for your industry?
It’s not “Of the moment”. What’s new in the industry? What’s happened in the last month or two?
It doesn’t help with prioritisation. What needs to be done first? For a mature org vs. a startup?
How is the flywheel model different to other models? What is it similar to? Where shouldn’t you use it? Does it work the same in B2C as it does in B2B?
What’s the underlying strategy for this model? If somebody asked you “Why does it look like this?”, could you answer?
How is this different from the funnel model?
And so on. So again, for certain marketing tasks it is great. If I had an afternoon to write an FAQ about content marketing, this is where I’d start. But most of us aren’t under those time pressures – you should be writing for quality over quantity. Talk to some customers. Find out what their pain really is. Ask them why they bought from your competitor instead. Talk to industry experts. What you write from your own expertise will always be better than what ChatGPT comes up with.
It took quite a while to write part 2 of this post, for reasons I’ll mention below. But like all good investigations, I’ve ended up somewhere different from where I thought I’d be – after spending weeks looking at the Twitter feeds for different companies in different industries, it seems that the way Twitter is used and is useful varies enormously by industry. This makes it difficult to build a generic model for Twitter sentiment analysis (because a model built from, say, small B2C companies, isn’t really applicable for large B2B companies) – but it also makes interesting suggestions for how companies in different industries should use Twitter to help their businesses.
Oh, and my main conclusion? People sure do hate the airlines! More on that later.
First, a potted history of how I got here:
I wanted to build some models for Twitter sentiment analysis. What does that mean? It means “Give me a tweet about your company, and I’ll tell you whether it’s negative, neutral or positive. That allows you to monitor the Twitter feeds for you (or your competitors, of course), and track whether they’re getting better or worse”.
I’ve collected millions of tweets from a number of different industries, B2B, B2C, small companies and large companies.
In part 1, I hit some problems with mis-classification of tweets. This came, I believe, from the fact that the base models from Stanford NLP are built from a different corpus of texts – generic English sentences and paragraphs rather than, say, tweets about companies. There were also still problems with creating a decent model for tweets specifically, as opposed to general blocks of English text (again, more on that below in Appendix 1).
So the next job was: could I use the millions of tweets for various companies to build some sort of generic predictor model? I.e. give me a company tweet and I’ll tell you whether it’s Good, Bad or Ugly.
Well, the answer is No. Or at least not within the time that I’ve had so far. And the issue seems to be that the way Twitter is used by companies and by companies’ customers varies significantly by industry.
A first step in creating a predictor model is to manually assign sentiment scores to a long list of tweets – this creates a training set that you then use to train and create a model. During the creation of the model you repeatedly test the model against an out-of-sample dataset to see if your model is working (to avoid things like over-fitting). As a first step, I assigned manual values to around 500 tweets (i.e. I manually tagged 500 tweets as either very negative, negative, neutral, positive or very positive), then I tried to create a model from this. However, my cross-validation scores were terrible – the model was struggling to predict the sentiment of other unseen tweets. I know this is partially because 500 isn’t nearly enough, but still – how could the scores be so bad?
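For anyone who wants to try this themselves, the workflow looks roughly like the sketch below. This isn’t my actual pipeline (that used Stanford NLP); it’s a simplified bag-of-words version, and the file name and column names are illustrative.

```python
# A minimal sketch of the workflow above: hand-labelled tweets in, a
# bag-of-words classifier out, judged by cross-validation.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Each row: tweet text plus a manual label 0-4
# (very negative .. very positive); file/column names are illustrative
df = pd.read_csv("labelled_tweets.csv")  # columns: text, sentiment

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    LogisticRegression(max_iter=1000),
)

# 5-fold cross-validation: with only ~500 labelled tweets spread over
# 5 classes, expect scores as disappointing as the ones I describe
scores = cross_val_score(model, df["text"], df["sentiment"], cv=5)
print(scores.mean(), scores.std())
```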
What was the problem? Like all machine learning issues, I’ve generally found that eye-balling the data can tell you a lot. I believe there were two problems with my model:
The problem already mentioned, that the Stanford NLP Parser struggles with tweets. And it’s not a trivial case of just swapping in a new POS tagger for tweets. Let’s put that to one side for now.
The bigger problem is that, from looking at the manual tags I was assigning to tweets, the values I was using varied enormously by domain.
The latter creates the problem of domain-specificity. Is a model created from, say, tweets about the airline industry, relevant to tweets about online cloud storage? It seems not.
The first clue is the distribution of scores I was giving. Remember, 0=very negative, 1=negative, 2=neutral, 3=positive, 4=very positive. For the airline industry I found the following:
Score               0     1     2     3     4
Share of tweets    10%   32%   41%   12%    4%
I.e. though the biggest group was neutral, there were also a lot of negative tweets. Here’s a very typical example:
@Delta just rebooked my 70yr old parents on a longer flight back to cvg from fco and no extra legroom that I had paid for. Fail.
There’s a lot of this sort of thing going on for the airlines.
In contrast, here’s the distribution for a couple of big cloud providers (specifically, AWS and Azure combined):
Score               1     2     3     4
Share of tweets     3%   83%   11%    2%
An enormous number of utterly neutral tweets. Here’s a standard example:
RT @DataExposed: New #DataExposed show: Data Discovery with Azure Data Catalog. https://t.co/5sWuKpdoYx @ch9 @Azure
..pretty dry stuff.
This actually presents two distinct problems – firstly, as mentioned, if the nature of Twitter usage and the type of tweets varies so much by industry, then models will only be really effective within their domains. Fine – if you work in a given industry (e.g. Airline, IT, Fashion), then you can create a model for your industry and use that.
The second problem however is more difficult to work around. For a given industry, I’ve found that the way in which Twitter is used varies enormously. This is what you’re seeing in these very different distributions. And if the vast majority of tweets are of a given sentiment, then sentiment analysis becomes not only difficult, but actually not particularly useful!
In the airline industry, as far as I can see, Twitter is used almost exclusively for telling airlines how bad they are. There’s a pretty strong correlation between the number of tweets mentioning a given airline and the negative feeling towards that airline. This distribution does vary by airline (see table below), but when you know that most tweets are just complaints, what’s the value in searching for the occasional (positive) needle-in-a-haystack?
If I worked for an airline and wondered “How could we use Twitter to improve our brand?”, the answer would be pretty simple: first, improve the product! And second, employ customer service reps to look after these people and react to the complaints. As I say, some airlines are worse than others:
Airline                0     1     2     3     4
American Airlines     12%   35%   37%   14%    2%
British Airways       10%   29%   46%   12%    3%
Delta                 10%   37%   38%   12%    5%
SouthWest Airlines     5%   26%   47%   16%    5%
United Airlines       12%   43%   35%    6%    4%
Virgin                 0%   13%   44%   25%   19%
Well done Virgin and SouthWest. Delta and United – you have work to do…
The problem in the cloud services industry (AWS and Azure) is the opposite – mentions of these services tend to consist of semi-banal tweets about new services offered, new features and so on. I.e. Twitter is used to share information about the products and services and rarely to express emotive responses (it’s very rare to read “Can’t believe how amazing @AWS was today!!! #FTW” – it just doesn’t happen). Certainly the split between B2B and B2C tweets shows this difference (I looked at small and large orgs as well, from local shops, to fashion houses, to small tech companies).
I still think there’s value in implementing a domain-specific model (for example, a model “just for small tech companies”). The only blocker is, as described in Appendix 1, the problem of parsing tweets properly. Maybe once I’ve figured that out, I’ll find a way to classify the other million-odd tweets I’ve collected for the airline industry as a starting point (there are a lot of unhappy airline passengers out there!).
Appendix 1
The problem of parsing tweets
I was warned by the following tweet from the team at Stanford NLP:
Using CoreNLP on social media? Try GATE Twitter model (iff not parsing…) -pos.model gate-EN-twitter.model https://t.co/K2JAF5XwJ2 #nlproc
The problem we’re trying to fix, as described in the previous post, is that to understand the sentiment of any sentence we have to carry out a couple of stages first. To begin with, we need to Part-of-Speech tag the sentence: identify “Dog” as a Noun, “Catch” as a Verb and so on. This is challenging with Twitter, which is full of URLs, hashtags, #LOLs and so on. But the GATE Twitter model mentioned above solves this by adding that functionality to the POS tagger. If you run a tweet through the GATE POS-tagger, it will identify http://some.url/ as a URL and so on.
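If you want to see tagging in action without the Java tooling, here’s a quick sketch in Python using NLTK rather than the Stanford/GATE stack (resource names may vary between NLTK versions).

```python
# A sketch of POS tagging with a stock (non-Twitter) tagger, to show why
# tweets are awkward. This is NLTK, not the GATE/Stanford pipeline.
import nltk

nltk.download("punkt")                       # tokeniser model
nltk.download("averaged_perceptron_tagger")  # POS tagging model

tweet = "RT @DataExposed: New #DataExposed show! https://t.co/5sWuKpdoYx"

tokens = nltk.word_tokenize(tweet)
print(nltk.pos_tag(tokens))
# A generic tagger happily labels ordinary nouns and verbs, but has no
# special handling for URLs, @mentions or #hashtags - exactly the gap
# the GATE Twitter model fills.
```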
So far, so good. However, the problem comes with the next stage: Parsing. What’s this? If you look at a sentence such as “I don’t like ice-cream”, the tagger will identify and POS-tag each component of this sentence – I, do, not, like, ice-cream. Great, but to understand the sentiment of this sentence, we need to group these elements further. We need to understand that there’s a hierarchy whereby do and not are grouped together and applied to like, negating it. I.e. this sentence is actually negative despite the fact that like is a positive word.
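Again just as a sketch, a modern dependency parser shows the structure we’re after. This uses spaCy rather than the Stanford parser, purely for illustration, and assumes the small English model has been downloaded.

```python
# A sketch of parsing with spaCy (not the Stanford parser), showing how
# negation attaches to the verb it negates.
# First: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

doc = nlp("I do not like ice cream")
for token in doc:
    print(token.text, token.dep_, token.head.text)
# "not" comes out as a negation ("neg") attached to "like" - the structure
# that tells you the sentence is negative even though "like" is positive.
```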
And herein lies the problem with parsing tweets – because of the language used in Twitter, often abbreviated, often partial, it’s very hard to properly parse tweets and work out this structure. This seems to be the problem from the very brief analysis I’ve done. The Stanford team (and others) do state “Sure, you can try the standard parsers using tweets tagged with the GATE POS tagger, but good luck with that!”. It obviously needs more work, and maybe when I have more free time, I’ll have a look!
Machine Learning (ML) and AI are big topics right now. Poor Lee Se-dol has just been beaten by AlphaGo – a machine put together by Google/DeepMind – and there are numerous other examples in the news. So everyone is interested, and everyone wants to do more of it. Whether you work in marketing or any other discipline, there’s an expectation to be harnessing the power of ML and AI algorithms to provide insight, models and intelligence to applications.
So, what’s holding us back? Is it the tech – too expensive or not available? I don’t think it’s this at all. It’s been possible to implement ML for decades. You can use R, MATLAB, SPSS, SAS or a ton of other tools; failing those, Excel; or even write your own (I have my own Mickey Mouse clustering app here as I got so frustrated using others’ tools). And people like Microsoft are making the tech more accessible all the time (e.g. Azure Machine Learning). So I don’t think that’s the problem.
My opinion is that the biggest shortage is in people who really understand ML, and can use it properly. And this is certainly what I’ve seen at customers and companies we’ve spoken to about this problem – they know they want to do it, but they just can’t get the people! Where are these mythical data scientists? Do they even exist? Could we afford them, even if they did? These are the questions we hear.
The key issue here is that, with ML, a little knowledge can be a dangerous thing. One of the problems with most ML algorithms is that they are complicated. Or at least, you need a significant level of understanding to know what you’re actually doing. If you take a dataset and run a Support Vector Machine over those data points in an attempt at classification, do you really know what the output means? When it gives an unexpected outcome, do you know why? Without jumping to easy (but often wrong) conclusions? Even something easier like k-means clustering – what do those clusters really mean? If there are three clusters, one big and two small, is that really saying something about the fundamental nature of your dataset, or is it just an anomaly because you haven’t transformed your data correctly beforehand?
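To make that last point concrete, here’s a small self-contained sketch (synthetic data, illustrative parameters) of how much k-means can hinge on whether you scaled your features first.

```python
# A sketch: the same data clustered with and without standardising first.
# The data is synthetic, purely for illustration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Feature 1 is in the thousands with no group structure; feature 2 is
# between 0 and 1 and genuinely bimodal. Without scaling, feature 1
# dominates the distance calculation entirely.
X = np.column_stack([
    rng.normal(5000, 1000, 300),
    np.concatenate([rng.normal(0.2, 0.05, 150), rng.normal(0.8, 0.05, 150)]),
])

raw = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
scaled = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(X)
)

true = np.array([0] * 150 + [1] * 150)

def agreement(labels):
    # cluster labels are arbitrary, so check both assignments
    return max((labels == true).mean(), (labels != true).mean())

print("unscaled:", agreement(raw))     # around chance level
print("scaled:  ", agreement(scaled))  # close to 1.0
```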
These are difficult questions – having the tool available to, say, run a k-means clustering algorithm, is only 10% of the battle. Knowing what to do with that tool is the real issue; and how not to present something to your boss that any smart cookie could undermine in 10 minutes.
So finding these people is hard. It’s made even harder because I’ve often seen over-inflation in what people mean by “Machine Learning”. In my experience, someone who claims to “Know statistics” is likely to be comfortable with values and volumes, perhaps a “mean” or a “standard deviation”, but not much more. Those who say they “Understand ML” often have a good grasp of statistics, but struggle when things get tough with machine learning. And of course, if you claim to “Know AI” – understanding a Support Vector Machine doesn’t mean you can build the next Skynet!*
So it’s even tougher finding people who really know their stuff. Maybe it’s just about money – supply and demand. If these people are hard to get, and you’re competing with the City to get them, then maybe you just have to pay the asking price and that’s it.
Or of course you can pick up a book and start learning! There are lots available – my favourite is Pattern Recognition and Machine Learning by Chris Bishop. It’s a bit older now, there’s a lot in there, and you still need to implement this stuff yourself, but it presents the material in a clear way and, most importantly, it helps you understand what you’re really doing with these algorithms. So at least your conclusions will be based on deep insight, and not on guesses based on pretty-looking graphs.
* I also have a “Skynet” project in GitHub. Progress is slow.