Teaching an AI to Make a Negroni (and Why It Took Thousands of Tasting Notes to Get There)
We tried asking an LLM to create cocktail recipes. The results were technically plausible and completely wrong. Eight months and a structured tasting database later, we're at 95% accuracy.
The first attempt, about eight months ago, was a cold prompt: no context, just a straightforward request to build us a spring serves menu using our product range. The results were technically plausible and completely wrong. Proportions that would have produced something undrinkable. Flavour pairings that showed zero understanding of how bitterness and sweetness actually interact in a glass. The model knew what a Negroni was in the same way someone who'd only ever read about swimming knows how to do a front crawl.
That failure started a project that's now reached 95% accuracy on recipes that produce genuinely delicious drinks. But the path from zero to here wasn't about better prompts or fancier models. It was about building a structured database of our own knowledge and feeding it in systematically.
Why cold-prompting fails for niche spirits
LLMs understand flavour conceptually but not experientially. They can tell you that gentian is bitter and orange peel is aromatic. What they cannot do, at least not out of the box, is grasp the nuances of how those elements behave in combination. The viscosity of an amaro, the tannin structure of a particular vermouth, the way bitterness length changes when you introduce a citrus modifier. These are sensory realities that barely exist in the training data.
For mainstream spirits this matters less. There's a mountain of published cocktail literature about vodka and gin and whisky. But for niche products like craft vermouth and amaro, the published reference material is thin. Research into fine-tuning language models for recipe generation confirms that domain-specific training data dramatically outperforms general-purpose models, even when those general models are much larger. The detailed production and tasting data that would let a model reason properly about our products simply doesn't exist in the public domain. If we wanted an AI that could work intelligently with our range, we had to provide that data ourselves.
Building the structured tasting database
We started by creating structured tasting notes for every product in our range. Not the marketing descriptions you'd put on a shelf talker, but proper analytical breakdowns: aroma profile, palate character, bitterness intensity, sweetness level, tannin, length, viscosity. All captured in a consistent schema so the data was machine-readable from the start.
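A consistent, machine-readable schema might look something like the sketch below. The field names follow the attributes listed above, but the exact scales and the example product record are illustrative assumptions, not our actual data format:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class TastingNote:
    """One structured tasting record for a product.
    Intensity scales are assumed to run 0-10; adapt to your own scheme."""
    product: str
    aroma: list[str] = field(default_factory=list)    # aroma profile descriptors
    palate: list[str] = field(default_factory=list)   # palate character descriptors
    bitterness: float = 0.0   # 0 = none, 10 = intensely bitter
    sweetness: float = 0.0
    tannin: float = 0.0
    length: float = 0.0       # finish duration
    viscosity: float = 0.0

# A hypothetical record; because every product uses the same schema,
# notes can be serialised straight into a model's context or training set.
note = TastingNote(
    product="example amaro",
    aroma=["bitter orange", "cola nut"],
    palate=["dark caramel", "gentian"],
    bitterness=7.5, sweetness=4.0, tannin=2.0, length=8.0, viscosity=5.5,
)
record = asdict(note)  # plain dict, ready for JSON
```

The point of the dataclass is less the code than the discipline: every product gets the same fields, so no note is ever "just prose".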
Then we cross-referenced each product against drink styles, seasonal contexts, and serving scenarios. On-trade Martini and Negroni variations. Consumer entertaining. Dinner party serves. Picnic drinks. Budget considerations. The database isn't just what our products taste like. It's how they behave in different contexts with different companion ingredients.
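In code terms, the cross-reference layer is essentially a lookup from serving context to suitable products. The mapping below is a made-up miniature, not our real database, but it shows the shape of the idea:

```python
# Hypothetical cross-reference: which contexts each product suits.
CONTEXTS = {
    "sweet vermouth": {"negroni", "martini-variation", "dinner-party"},
    "amaro": {"negroni", "after-dinner", "winter-warmer"},
    "dry vermouth": {"martini-variation", "picnic", "summer-aperitif"},
}

def products_for(context: str) -> list[str]:
    """Return every product cross-referenced against a serving scenario."""
    return sorted(p for p, ctx in CONTEXTS.items() if context in ctx)

products_for("negroni")  # ["amaro", "sweet vermouth"]
```

Because the same product appears under many contexts, the database captures behaviour ("works in a picnic serve") rather than just flavour.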
We also fed the model reference material from cocktail literature, amaro and vermouth reference books, and published research on botanical ingredients and flavour profiles. Major flavour houses like Symrise are already combining flavorist expertise with machine learning to predict optimal formulations. We're doing something similar at a much smaller, more niche scale.
The bitterness problem (and why subjectivity is the real challenge)
This is where it got genuinely interesting from a food science perspective. Bitterness perception is highly subjective. Research into taste receptor genetics confirms that individual differences in how people experience bitter compounds are profound, driven by a combination of genetic variation, experience, and learned associations.
For a craft amaro producer, this isn't academic. It's the central challenge of the product category. It's rare that a majority of people trying a complex bitter amaro will reach the same conclusion about its taste profile. That subjectivity makes it incredibly difficult to build training data, because the "correct" description of a product's bitterness depends partly on who's doing the tasting.
Our advantage, and the reason we could eventually make this work, is that we have thousands of reference points. Professional feedback from bartenders and sommeliers, consumer tasting data from events and trade shows, structured notes from our own production team. Enough data points to build flavour profiles that work in an almost universal way, accounting for the natural spread of perception rather than trying to pretend it doesn't exist.
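One simple way to account for that spread rather than ignore it is to collapse many subjective ratings into a central value plus a dispersion measure. This is a minimal sketch of the idea using the standard library, with invented ratings; our actual calibration is richer than a median and an interquartile range:

```python
import statistics

def bitterness_profile(ratings: list[float]) -> dict:
    """Collapse many subjective bitterness ratings (0-10 scale assumed)
    into a calibrated profile: a central value plus the natural spread,
    rather than pretending a single 'true' score exists."""
    q1, _, q3 = statistics.quantiles(sorted(ratings), n=4)  # quartiles
    return {
        "median": statistics.median(ratings),
        "spread": q3 - q1,  # interquartile range across tasters
    }

# Ratings pooled from bartenders, trade-show tastings, and the
# production team (illustrative numbers).
profile = bitterness_profile([6.0, 7.5, 8.0, 5.5, 7.0, 9.0, 6.5])
```

A product with a median of 7 and a wide spread is handled differently from one with the same median and near-universal agreement; the spread itself becomes a feature the recipe engine can reason about.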
Where the recipe engine stands now
The system generates recipes for specific occasions, budgets, and taste profiles. You can ask it for a low-bitterness aperitif serve for a summer garden party and get something genuinely well-constructed. Not just technically valid, but something we'd be happy to serve.
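Under the hood, a request like that is largely a retrieval problem before it is a generation problem. The sketch below shows the pre-filtering step in hypothetical form: narrow the range to products matching the brief, then hand only those candidates (with their full tasting notes) to the LLM. Product names and thresholds here are invented:

```python
# Illustrative product index with context tags and bitterness scores (0-10).
PRODUCTS = [
    {"name": "dry vermouth", "bitterness": 2.0,
     "contexts": {"summer-aperitif", "garden-party"}},
    {"name": "amaro", "bitterness": 7.5,
     "contexts": {"after-dinner"}},
    {"name": "bianco vermouth", "bitterness": 3.0,
     "contexts": {"garden-party", "picnic"}},
]

def shortlist(context: str, max_bitterness: float) -> list[str]:
    """Candidates matching the brief; the model then drafts serves
    only from this shortlist rather than the whole range."""
    return [p["name"] for p in PRODUCTS
            if context in p["contexts"] and p["bitterness"] <= max_bitterness]

# A low-bitterness aperitif serve for a summer garden party:
picks = shortlist("garden-party", max_bitterness=4.0)
# picks -> ["dry vermouth", "bianco vermouth"]
```

Filtering first keeps the generation step grounded: the model can only propose combinations built from products the database says actually fit the occasion.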
| Metric | Early results (month 1) | Current (month 8) |
|---|---|---|
| Usable recipes without adjustment | ~0% | ~95% |
| Data source | Cold LLM prompt | Custom structured tasting database |
| Flavour accuracy | Conceptual only | Sensory-informed |
| Bitterness handling | Generic | Calibrated across perception range |
| Scenario awareness | None | Occasion, budget, season, serve style |
Every recipe still gets tasted before it goes anywhere public. The model has no palate and it never will. But as a hypothesis engine, narrowing the field of possibilities so we can focus our actual tasting time on the most promising combinations, it's become a genuine part of how we develop serves.
The bigger realisation has been about data as competitive advantage. Because we built the tasting note database ourselves, because the detailed interaction data for niche products like ours simply doesn't exist anywhere else, we've created something that's genuinely proprietary. Any producer could train a model on their own products this way. The question is whether they're willing to invest the time in building the structured data that makes it work. The LLM is the easy part. The knowledge architecture underneath it is where the real work lives.
Frequently asked questions
Can AI create cocktail recipes for craft spirits?
Yes, but not by cold-prompting a general LLM. Effective AI recipe generation requires structured training data specific to your products, including detailed tasting notes, flavour profiles, and interaction data. With proper data architecture, we reached 95% accuracy on recipe generation for craft amaro and vermouth serves.
Why do LLMs struggle with niche product flavour profiles?
LLMs understand flavour conceptually but not experientially. For mainstream spirits there is extensive published data, but for niche products like craft vermouth and amaro the detailed sensory information barely exists in public training data. Producers need to build and supply their own structured tasting databases.
How subjective is bitterness perception in spirits?
Extremely. Research in genetics and taste perception confirms profound individual differences in how people experience bitter compounds. It is rare for a majority of tasters to agree on a complex bitter amaro's taste profile. This subjectivity makes AI training challenging and requires large numbers of reference points.
What data do you need to train an AI recipe engine?
Structured tasting notes for each product (aroma, palate, bitterness, sweetness, tannin, length, viscosity), cross-referenced with drink styles, seasonal contexts, serving scenarios, and companion ingredients. Reference material from cocktail literature and published flavour science research also improves results significantly.
Robert Berry is co-founder of Asterley Bros, a London-based premium aperitivo company, and Absolution Labs, an AI automation consultancy for drinks businesses.