Synthetic data: new hype or new tool?

I’ve worked in the niche of turning data into data products for marketing, advertising and automation businesses for nearly two decades, and I can see a huge shift on its way. We’re entering an age where the scarcest resource isn’t data but permission.

Consumers click “reject all,” regulators tighten the screws, and large platforms ring‑fence their walled gardens. But strategy still runs on insight, right? 1st party data identity (where the business that has the customer relationship fully owns the data signals that identify that customer) is incredibly important wherever businesses and customers have a long‑standing relationship. Alongside it, another topic is emerging: synthetic data.

Synthetic data offers a third way: fabricate statistically faithful stand‑ins for the audiences we’re losing, then experiment without fear to create even better media, marketing and advertising plans. As with most data capabilities, advertisers are adopting it first, with larger business cost centres likely to follow.

What is synthetic data?

At its core, synthetic data is algorithmically generated information that mirrors the patterns in an original data set while severing the link to any identifiable individual. Methods range from generative machine learning that can ‘hallucinate’ new rows into your data set, to agent‑based simulators that replay full decision journeys.

This is why natural, organic human data is so important: it is the training data from which machine learning generates the synthetic data. Ask any synthetic data provider exactly how its data is generated, and insist on a plain‑English answer so you can judge its validity.

An example: Subconscious AI.

Subconscious AI positions itself as a “synthetic respondent” engine. The team trained language‑model agents on academic choice‑modeling research and a corpus covering 3.5 million real behaviours. Users log in and ask how one factor affects another (e.g., “how do consumers trade off between price, brand reputation, and specific ingredient claims when choosing a daily facial moisturizer?”). They then tweak causal levers such as price or brand ethos and watch simulated respondents debate, decide and surface quantitative insights, often in under 15 minutes.

In this example, synthetic data generation flips the customer research sequence: hypothesize, simulate, then verify with a lean real‑world sample instead of the other way around.
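A choice model is one common way to build “synthetic respondents” for a trade‑off question like the moisturiser example. The Python sketch below is not Subconscious AI’s method; its internals aren’t described here. It is a textbook multinomial‑logit simulation, and the product profiles and preference weights are invented for illustration. Each synthetic respondent gets a noisy copy of the average weights for price, brand reputation and an ingredient claim. It then picks a product with probability proportional to exp(utility).

```python
import math
import random

# Hypothetical product profiles: (price in $, brand reputation 0-1, ingredient claim 0/1).
PRODUCTS = {
    "budget": (8.0, 0.4, 0),
    "premium": (25.0, 0.9, 1),
    "clean_label": (15.0, 0.6, 1),
}

# Hypothetical average part-worths for (price, reputation, claim); price is negative.
MEAN_WEIGHTS = (-0.15, 2.0, 0.8)

def make_respondent(rng):
    """One synthetic respondent: the average weights plus individual noise (30% spread)."""
    return tuple(w + rng.gauss(0.0, abs(w) * 0.3) for w in MEAN_WEIGHTS)

def choose(weights, products, rng):
    """Multinomial logit: pick a product with probability proportional to exp(utility)."""
    names = list(products)
    utils = [sum(w * a for w, a in zip(weights, products[name])) for name in names]
    top = max(utils)
    expu = [math.exp(u - top) for u in utils]  # subtract the max for numerical stability
    return rng.choices(names, weights=expu, k=1)[0]

def simulate_shares(products, n=5000, seed=1):
    """Share of n synthetic respondents choosing each product."""
    rng = random.Random(seed)
    counts = {name: 0 for name in products}
    for _ in range(n):
        counts[choose(make_respondent(rng), products, rng)] += 1
    return {name: c / n for name, c in counts.items()}

base = simulate_shares(PRODUCTS)
# Hypothesis to test: what if the premium product's price rises to $40?
after = simulate_shares({**PRODUCTS, "premium": (40.0, 0.9, 1)})
```

This covers only the “hypothesize, simulate” half of the sequence. The simulated share shift stays a hypothesis until a lean real‑world sample confirms it.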

Where it creates value for the marketing chain.

Use‑case	Practical win	Example
Cohort expansion	Build privacy‑compliant look‑alikes	Generate 10k synthetic “mid‑funnel DIY dads” for a home‑improvement retailer
Campaign sandboxing	Stress‑test creative, copy, offers	Run 50 price points on synthetic shoppers before launching the real promo
Martech data hygiene	Fill sparse tables, predict missing IDs	Model cookie‑less Safari visitors so attribution models don’t collapse
Market research turbo	Instant focus groups & longitudinal panels	Ask synthetic gamers how a battle‑pass tweak changes the retention curve
Product roadmaps	Explore edge‑case adoption paths	Simulate a VR headset launch with niche accessibility tweaks
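The “Campaign sandboxing” row can be sketched in a few lines of Python. Every input is hypothetical. Each synthetic shopper has a willingness‑to‑pay drawn from a log‑normal distribution with a median of about $20, and buys if the price is at or below that amount. The code sweeps 50 price points and reports the conversion rate and revenue per shopper for each, so a promo can be stress‑tested before it launches.

```python
import random

def synthetic_shoppers(n, seed=7):
    """Hypothetical willingness-to-pay (in $) for n synthetic shoppers; median about e**3, i.e. ~$20."""
    rng = random.Random(seed)
    return [rng.lognormvariate(3.0, 0.4) for _ in range(n)]

def sandbox_prices(wtp, prices):
    """For each candidate price, return (price, conversion_rate, revenue_per_shopper)."""
    n = len(wtp)
    results = []
    for price in prices:
        buyers = sum(1 for w in wtp if w >= price)  # shopper buys if price <= willingness-to-pay
        results.append((price, buyers / n, price * buyers / n))
    return results

prices = [5 + 0.5 * i for i in range(50)]  # 50 price points: $5.00 to $29.50
results = sandbox_prices(synthetic_shoppers(10_000), prices)
best_price, best_conversion, best_revenue = max(results, key=lambda r: r[2])
```

The winning price is only as good as the assumed willingness‑to‑pay curve. Treat it as the candidate to take into a real micro‑test, not the launch price.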

What’s good about synthetic data?

Ethical speed: no NDAs or weeks of recruitment, whereas standard research takes months.
Cost efficiency: pennies per respondent versus $300k for a traditional platinum‑tier research piece.
Edge cases captured: rare events such as churn triggers or fraud behaviours can be surfaced and studied in volume, rather than being too infrequent to analyse.
Continual learning loop: as real‑world results come in from tests designed on synthetic data, they can be fed back into the engine to spin fresher synthetic sets and improve the outputs.
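Here is one simple way to picture the continual learning loop, assuming the engine’s belief about a single conversion rate is all that matters. The synthetic estimate is treated as a Beta prior worth a fixed number of pseudo‑observations. Real test results are added as they arrive, and the next synthetic set is drawn from the updated belief. This Beta‑Binomial update is a standard Bayesian technique chosen for illustration, not a description of any vendor’s engine, and all numbers are hypothetical.

```python
import random

def prior_from_synthetic(rate, weight):
    """Encode a synthetic estimate as Beta(alpha, beta) worth `weight` pseudo-observations."""
    return rate * weight, (1 - rate) * weight

def update(alpha, beta, conversions, trials):
    """Fold real-world test results into the belief (conjugate Beta-Binomial update)."""
    return alpha + conversions, beta + (trials - conversions)

def fresh_synthetic_set(alpha, beta, n, seed=11):
    """Spin a new synthetic set: draw a plausible rate, then simulate n shoppers (1 = converts)."""
    rng = random.Random(seed)
    rate = rng.betavariate(alpha, beta)
    return [1 if rng.random() < rate else 0 for _ in range(n)]

a, b = prior_from_synthetic(rate=0.10, weight=100)  # hypothetical: synthetic run said ~10% convert
a, b = update(a, b, conversions=60, trials=400)     # hypothetical: real micro-test saw 15%
posterior_mean = a / (a + b)                        # (10 + 60) / (100 + 400) = 0.14
next_set = fresh_synthetic_set(a, b, n=1_000)
```

The `weight` parameter decides how much the synthetic prior counts against real evidence. Setting it low lets real data take over quickly.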

What are the limitations & banana skins?

Bias amplification: if your seed data under‑represents a group or is otherwise skewed, the synthetic twin will be too, just faster. Garbage in, garbage out.
False confidence: high‑resolution graphics and fancy dashboards can mask low‑resolution data; always validate synthetic findings with a real‑world litmus test (more below).
IP leakage: if you’re ingesting your own proprietary customer logs, make sure your legal agreements allow this sort of modelling, and that the customer data is 1st party and, where possible, anonymized.
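Bias amplification is easy to demonstrate. In the hypothetical Python sketch below, the seed data under‑represents one region: 10% of rows, against an assumed 30% share of the real customer base. A generator that simply resamples the seed’s category frequencies reproduces that 10% faithfully, at whatever volume you ask for. Supplying known population shares instead (invented here) is a simple form of post‑stratification. It is one mitigation, but it only fixes the margins you know to check.

```python
import random
from collections import Counter

def generate_regions(seed_regions, n, target_shares=None, seed=3):
    """Sample n synthetic region labels.

    By default the generator mirrors the seed's own frequencies, skew included.
    If target_shares is given, those (assumed known) population shares are used instead.
    """
    rng = random.Random(seed)
    if target_shares is None:
        counts = Counter(seed_regions)
        labels, weights = list(counts), list(counts.values())
    else:
        labels, weights = list(target_shares), list(target_shares.values())
    return rng.choices(labels, weights=weights, k=n)

def share(labels, value):
    """Fraction of labels equal to value."""
    return labels.count(value) / len(labels)

# Hypothetical seed: North is 10% of the sample but assumed to be 30% of real customers.
seed_regions = ["North"] * 100 + ["South"] * 900
skewed = generate_regions(seed_regions, 50_000)
reweighted = generate_regions(seed_regions, 50_000, target_shares={"North": 0.3, "South": 0.7})
```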

Things we need to think more about in synthetic data in our industry

Validate, even manually: treat synthetic results as directional until they are validated against real‑world data. Run your own fairness checks (gender, ethnicity, region) before activating segments.
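Here is a minimal Python sketch of such a fairness check. It assumes each synthetic record carries a group attribute (here, region) and a flag saying whether a model selected it for the segment. It compares selection rates across groups and flags any group whose rate falls below 80% of the best‑served group’s rate. That threshold is the “four‑fifths” rule of thumb from US employment‑selection guidance. The threshold and records are illustrative; agree the actual criteria with your legal and data‑ethics teams.

```python
from collections import defaultdict

def selection_rates(records, group_key):
    """Share of records selected into the segment, per group."""
    selected, total = defaultdict(int), defaultdict(int)
    for r in records:
        g = r[group_key]
        total[g] += 1
        selected[g] += r["selected"]
    return {g: selected[g] / total[g] for g in total}

def flag_disparities(records, group_key, threshold=0.8):
    """Return {group: ratio} for groups whose rate is below threshold x the highest rate."""
    rates = selection_rates(records, group_key)
    top = max(rates.values())
    if top == 0:
        return {}  # nobody selected at all; nothing to compare
    return {g: rate / top for g, rate in rates.items() if rate / top < threshold}

# Hypothetical scored synthetic records (selected is 1 or 0).
records = (
    [{"region": "North", "selected": 1}] * 30 + [{"region": "North", "selected": 0}] * 70
    + [{"region": "South", "selected": 1}] * 50 + [{"region": "South", "selected": 0}] * 50
)
flags = flag_disparities(records, "region")  # North: 0.30 / 0.50 = 0.6, below 0.8
```

The same function can be run per attribute (gender, region and so on) on the synthetic segment and again on the real‑world validation sample, so the two sets of numbers can be compared side by side.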

Trace histories and check seed data: businesses should keep a “data birth certificate” linking each synthetic set to its source vintage and generation method. I expect tools for tracing data lineage to appear within the next year or so, but for now, ask where the seed data comes from.
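Until dedicated lineage tools arrive, a “data birth certificate” can be as simple as a small record stored alongside each synthetic set. The Python sketch below shows one hypothetical shape for it; the field names, vintage label and method name are illustrative. SHA‑256 fingerprints of the seed and synthetic files let anyone confirm later that neither has changed since generation.

```python
import hashlib
import json
from datetime import datetime, timezone

def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(data).hexdigest()

def birth_certificate(seed_bytes, seed_vintage, method, params, synthetic_bytes):
    """Hypothetical provenance record linking a synthetic set to its source and generator."""
    return {
        "seed_sha256": fingerprint(seed_bytes),
        "seed_vintage": seed_vintage,       # e.g. the period the seed data covers
        "generation_method": method,        # plain-English name plus version
        "generation_params": params,
        "synthetic_sha256": fingerprint(synthetic_bytes),
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }

def verify(cert, seed_bytes, synthetic_bytes):
    """True only if both files still match the fingerprints recorded at generation time."""
    return (cert["seed_sha256"] == fingerprint(seed_bytes)
            and cert["synthetic_sha256"] == fingerprint(synthetic_bytes))

seed_file = b"customer_id,age,spend\n1,34,52.0\n"  # stand-in for a real seed export
synthetic_file = b"age,spend\n33.1,49.8\n"          # stand-in for the generated set
cert = birth_certificate(seed_file, "2024-Q4", "bivariate-normal v0.1",
                         {"n": 1, "random_seed": 0}, synthetic_file)
print(json.dumps(cert, indent=2))
```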

Stakeholder education: explain to your execs that synthetic cohorts are not crystal balls; they’re an additive capability that lets you run and rerun far more experiments, far faster. If you know how the panel you paid $300k for, and waited six months on, was actually put together, this should be an easy win.

Three practical ways to audit a synthetic data product you’re sold in on.

To summarize

It’s easy to be cynical about new technologies like this, but if you’ve been in MarTech or AdTech long enough, you’ll know that research panels and audience segments from the last 50 years are nothing to write home about when you scratch deep enough. Synthetic data offers a tangible improvement, in the same way that live data collection improved on in‑person panels.

Marketers who mastered lookalike modeling a decade ago will find synthetic audiences the natural next step, only now the levers and dials are far better: motivations, context and even causality. As synthetic data continues to transform customer insight, expect a new insights‑to‑action workflow: simulate the outcome → deploy a micro‑test → scale. Done right, we move from guessing what customers will do to rehearsing it frame by frame before the curtain rises, and that is where genuine business impact comes from.

In short, synthetic data is worth the effort: it can turn every campaign into a controlled experiment and every strategist into a flight‑test engineer for the new world of marketing, branding and advertising.

Keep Reading

Agentic Browsers: v5 of the internet, and a new era for the open web

WTF is "Multimodal Data"?

30 Predictions for Martech, CX, Data, Identity, Commerce

Part 1: The Publisher's Dilemma: Understanding the Structural Shift in Digital Media Monetization

Visit www.krishraja.com.