How it connects

One engine, one continuous loop

The stages form a single loop, designed so that no component carries more inferential weight than it can support. Consumer voice identifies what people want and what fails them. The molecular layer ranks the pairings most compatible with those preferences. The survey closes the loop by testing whether the compatibility signal translates into real preference, and the GREEN gate confirms that it does for this panel and these concepts. The same loop can be re-run on new data and new flavour territories.
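The gate step can be pictured as a simple pass/fail check on the survey statistics. The sketch below is illustrative only: the source does not state the gate's criteria, so the class `SurveyResult`, the function `gate`, the "NOT GREEN" label and all three thresholds are hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurveyResult:
    """Summary statistics from one survey run (hypothetical container)."""
    spearman_rho: float    # rank correlation, compatibility vs mean liking
    p_value: float         # p-value for that correlation
    high_tier_mean: float  # mean liking of HIGH-tier concepts (9-point scale)
    low_tier_mean: float   # mean liking of LOW-tier concepts (9-point scale)


# ASSUMPTION: these thresholds are placeholders, not the study's actual criteria.
MIN_RHO = 0.7
MAX_P = 0.05
MIN_TIER_GAP = 1.0


def gate(result: SurveyResult) -> str:
    """Return "GREEN" only if every check passes, otherwise "NOT GREEN"."""
    checks = (
        result.spearman_rho >= MIN_RHO,
        result.p_value < MAX_P,
        result.high_tier_mean - result.low_tier_mean >= MIN_TIER_GAP,
    )
    return "GREEN" if all(checks) else "NOT GREEN"


# Figures reported in the metrics table of this section.
print(gate(SurveyResult(0.90, 0.037, 6.9, 4.3)))
```

Keeping the gate as a pure function of the summary statistics means the same check can be applied unchanged when the loop is re-run on a new panel or flavour territory.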

Data pipeline summary

Component	Details
English tweets	5,021 across 6 query groups
China posts	1,649 across 3 platforms (Xiaohongshu, Weibo, Douyin)
FlavorGraph nodes	8,298 nodes; 8,279 screened for Variant C
Survey respondents	n = 34 (APAC urban, aged 25 to 38)

AI performance metrics

Metric	Value	Notes
NLP inter-model agreement (English)	0.635	Mean confidence, VADER vs TextBlob
NLP inter-model agreement (Chinese)	0.731	Mean confidence, RoBERTa-JD vs SnowNLP
Clustering k selection	k = 5	Silhouette score maximised over k = 2 to 7
FlavorGraph HIGH-tier compatibility	0.73 to 0.79	Variant A and Variant B
FlavorGraph LOW-tier compatibility	0.26	Strawberry discriminative baseline
Survey validation (Spearman rho)	0.90 (p = 0.037)	Compatibility vs mean liking per concept; respondent panel n = 34
Survey tier separation	HIGH 6.9 / LOW 4.3	9-point hedonic scale
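The survey validation figure can be reproduced in outline as follows. This is a minimal sketch, not the study's analysis code: the two input lists are placeholder values, not survey data, and the assumption that five concepts were tested is not stated in the source. It is suggested only because a rank correlation of 0.90 over five points gives a two-sided p of about 0.037 under the usual t approximation, which matches the reported value. The closed-form p-value helper is valid for 3 degrees of freedom only, that is, for five points.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def t_statistic(rho, n):
    """t approximation for testing rho against zero, with n - 2 df."""
    return rho * math.sqrt((n - 2) / (1 - rho ** 2))


def two_sided_p_df3(t):
    """Two-sided p-value for Student's t with exactly 3 degrees of freedom."""
    u = abs(t) / math.sqrt(3)
    cdf = 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi
    return 2 * (1 - cdf)


# PLACEHOLDER values for five hypothetical concepts (not the study's data).
compatibility = [0.26, 0.45, 0.60, 0.73, 0.79]
mean_liking = [4.3, 5.1, 5.8, 6.9, 6.5]

rho = spearman(compatibility, mean_liking)
p = two_sided_p_df3(t_statistic(rho, len(compatibility)))
print(f"rho = {rho:.2f}, p = {p:.3f}")  # prints "rho = 0.90, p = 0.037"
```

With so few points, a single swapped pair of ranks moves rho from 1.0 to 0.9, so the result is best read as directional support for this panel and these concepts.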