Chapter 01
The audience already left, and the optimization didn't follow
For most of the last two years, the entire GEO conversation has been conducted in English, about English-language engines, citing English-language sources, measured against English-language prompts. That made sense when AI search was a coastal-US novelty. It does not make sense anymore, because the audience has globalized faster than almost anyone optimizing for it has adapted.
The clearest single number is platform geography. According to DemandSage's 2026 ChatGPT statistics, the United States now accounts for roughly 31% of ChatGPT's user base, down from 38% in 2024, which means about 69% of ChatGPT users sit outside the US. The same data puts India at roughly 48 million users, the second-largest single-country market behind the US, and shows Asia-Pacific climbing from 19% to an estimated 24-28% of global traffic between 2024 and 2026. ChatGPT crossed 1 billion monthly active users in June 2026 (DemandSage), and the marginal new user is far more likely to be in Mumbai, São Paulo, or Jakarta than in San Francisco.
Adoption is not just large abroad; it is more intense abroad. Visual Capitalist's 2026 mapping of AI adoption, drawing on survey data, found that several Global South economies are adopting AI faster than developed nations: worker AI usage reached 92% in India, the highest of any surveyed country, with Brazil third globally at 76%. Rest of World's 2026 reporting describes the same pattern: generative AI is breaking out fastest in markets the US-centric GEO playbook has barely modeled.
Sixty-nine percent of the audience is not in the United States, and the fastest-growing slice of it does not run its first query in English. A GEO strategy tuned only for English-language US prompts is now optimizing for a shrinking minority of the people asking.
The mismatch is the story. Brands poured effort into being cited by an English-speaking machine, while the machine quietly became the default research tool for a planet that is more than 80% non-English-speaking. The visibility gap that opens here is not a rounding error. It is most of the market.
Chapter 02
The English-source bias is structural, not incidental
The reason non-English visibility is so hard is baked into how these models were built. They are English-native systems wearing a multilingual interface, and the imbalance starts in the training corpus.
Common Crawl, the web scrape underlying most foundation models, is itself skewed: its CC-MAIN-2025-47 release was 42% English, with the next language, Russian, at 6.5%, and languages like Hindi, Turkish, and Malay each under 1%. But the final training mix is far more lopsided than the raw web. Published figures put English at roughly 90% of Llama 2's training data, and analyses of GPT-class models estimate over 90% of training tokens are English, leaving the entire rest of human language to share the remainder. Worse, research summarized by Nature in 2025 notes that a large share of what non-English training data exists is machine-translated from English rather than natively written, which means the model often learns a language through a distorted, anglocentric mirror of itself.
Set that against the actual distribution of people. English is the largest language online by speakers, at about 1.19 billion internet users, roughly 26% of the online population (Statista), and English is published on close to half of all websites (Statista, October 2025). Yet fewer than 20% of people on earth speak English at all (LingoBright, 2026). The training data over-represents English by a factor of four-plus relative to the world's speakers.
- 42% of Common Crawl is English; 6.5% Russian; Hindi/Turkish/Malay each under 1% (Common Crawl, 2025).
- ~90% of Llama 2 training data is English; 90%+ of GPT-class tokens are English (multiple analyses).
- English is ~26% of online users but appears on ~half of all websites (Statista).
- Under 20% of the world speaks English (LingoBright, 2026).
The models are not neutral readers of the global web. They were trained on a corpus where English outweighs every other language combined, and where much of the "non-English" data is English in translation. The default gravity of every answer pulls toward an English source.
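The "factor of four-plus" can be checked with simple arithmetic. The sketch below uses only the shares quoted in this chapter; the function name `overrepresentation` is illustrative, and treating "fewer than 20%" as exactly 20% is an assumption that makes the result a lower bound.

```python
# Shares quoted in this chapter, written as fractions.
ENGLISH_SHARE_TRAINING = 0.90      # ~90% of Llama 2 training data
ENGLISH_SHARE_COMMON_CRAWL = 0.42  # CC-MAIN-2025-47
ENGLISH_SHARE_ONLINE_USERS = 0.26  # ~26% of internet users (Statista)
ENGLISH_SHARE_WORLD = 0.20         # "fewer than 20%", so this is an upper bound


def overrepresentation(corpus_share: float, population_share: float) -> float:
    """Ratio of a language's share of a corpus to its share of a population."""
    if population_share <= 0:
        raise ValueError("population_share must be positive")
    return corpus_share / population_share


# Training mix versus the world's speakers. Because the true speaker share
# is below 20%, the real ratio is at least this large.
training_vs_world = overrepresentation(ENGLISH_SHARE_TRAINING, ENGLISH_SHARE_WORLD)

# Raw Common Crawl versus online users: skewed, but far less than the training mix.
crawl_vs_online = overrepresentation(ENGLISH_SHARE_COMMON_CRAWL, ENGLISH_SHARE_ONLINE_USERS)

print(f"training mix vs. world speakers: {training_vs_world:.1f}x")
print(f"Common Crawl vs. online users:   {crawl_vs_online:.1f}x")
```

Under these assumptions the training mix over-represents English by about 4.5x against the world's speakers, while the raw crawl over-represents it by about 1.6x against online users. That difference is the gap between "skewed" and "far more lopsided".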
This is why the contrast at the heart of FancyAI's thesis matters even more across borders. AI does not rank, it recommends, and the mention is the signal. But the corpus that decides which mention surfaces was assembled with a heavy English thumb on the scale. In a non-English market, being seen is not the problem. Being selected, when the model's instinct is to reach for an English domain, is the entire problem.
Chapter 03
How AI engines cite differently across languages and markets
The bias is structural, but it is not uniform. The most important practical finding of 2026 is that AI engines localize wildly differently from one another, so your visibility in any given country depends heavily on which engine your customers happen to open.
The sharpest measurement comes from xfunnel's analysis of 56,223 citations across four countries and six AI engines. It found a 53-percentage-point gap between the best and worst localizers. Perplexity led at 56.5% non-global citations, with Copilot close behind at 56.0%. The middle pack included Grok at 36.2% and ChatGPT at 29.7%. At the bottom, Gemini sourced just 5.3% of its citations from non-global domains, all but ignoring local web ecosystems. Across the board, 66.5% of top citations came from global, mostly US-based domains, while local ccTLDs like .de and .nl represented only 17.6%, and localized subdomains a near-invisible 0.9%.
Language, not location, is the dominant trigger. Evertune's 2026 testing found that AI responses key primarily off the language of the query rather than the user's location settings: an English prompt surfaces English sources, a Spanish prompt prioritizes Spanish content from local publications. And the engines that do localize do so most aggressively for their top recommendation: the single most regionally adapted citation is usually the #1 result, which is exactly the slot that gets named in a recommendation.
Glenn Gabe's cross-platform testing (GSQI, 2026) showed the same divide at the mechanical level. Querying in French and Italian:
- Copilot consistently returned correct-language URLs, leveraging Bing's multilingual systems.
- ChatGPT answered in the right language but cited US English source URLs.
- Perplexity usually returned the US English version, occasionally the correct one.
- Claude defaulted to US English versions when asked for sources.
- Gemini and Google's AI Mode more often returned the correct language version.
Your AI visibility in Germany or Mexico is not one number. It is six different numbers, one per engine, separated by up to 53 points. Win on Perplexity and you may be invisible on Gemini in the same market, for the same query, on the same day.
The strategic consequence is that "which engine" becomes a market-entry decision, not just a tracking detail. The engine your local buyers use determines whether your domain is even in the candidate pool.
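To run a similar audit on your own citation exports, you could bucket each cited URL into the three categories used above: global domain, ccTLD, or localized subdomain. The following is a minimal sketch, not xfunnel's method. The lookup tables, the function names, and the `example.*` URLs are all hypothetical, and the classification rules are an assumption about how such buckets might be defined.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical, deliberately small lookup tables. A real audit would need a
# fuller list and special handling for ccTLDs used globally (such as .io or .co).
COUNTRY_TLDS = {"de", "nl", "fr", "mx", "es", "it", "br", "in", "id"}
LOCALE_LABELS = {"de", "nl", "fr", "mx", "es", "it", "br"}


def classify_citation(url: str) -> str:
    """Bucket a cited URL as 'cctld', 'localized_subdomain', 'global', or 'unknown'."""
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    if len(labels) < 2:
        return "unknown"  # no scheme, or not a usable hostname
    if labels[-1] in COUNTRY_TLDS:
        return "cctld"
    if len(labels) > 2 and labels[0] in LOCALE_LABELS:
        return "localized_subdomain"
    return "global"


def citation_mix(urls):
    """Return each bucket's share of the given citations."""
    counts = Counter(classify_citation(u) for u in urls)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()} if total else {}


# Hypothetical citations collected for one engine in one market.
sample = [
    "https://www.example.com/pricing",
    "https://example.de/preise",
    "https://de.example.com/preise",
    "https://example.nl/prijzen",
]
print(citation_mix(sample))
```

Running the same tally once per engine and per market gives the set of numbers described above, rather than one blended visibility score.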
Chapter 04
Where AI search is growing fastest, and it isn't the US
The English-source bias would be a manageable footnote if AI search were still concentrated in English-speaking markets. It is the opposite. The growth is overwhelmingly in markets where the model's default instinct works against local brands.
On the surface where Google injects AI, the country skew is dramatic. Semrush data reported in 2026 found that AI Overviews trigger on 37.2% of keywords in Indonesia, 29.1% in the Philippines and Mexico, 26.8% in India, and 26.4% in Nigeria, while the United States ranks 13th at just 20.5%. The AI answer layer is denser in emerging non-English markets than in the US.
Population-level adoption tells the same story. Visual Capitalist's 2026 data shows India leading worker AI adoption at 92% and Brazil third at 76%, and student-adoption rankings led by Brazil (11.6%) and India (11.5%). India's traffic on ChatGPT roughly doubled within a single month after the launch of the lower-cost ChatGPT Go tier at $4.50/month (DemandSage), a price designed precisely for high-volume, non-US markets.
- AI Overviews fire on 37.2% of Indonesian keywords vs. 20.5% in the US (Semrush, 2026).
- India: 92% worker AI adoption, the world's highest (Visual Capitalist, 2026).
- Brazil: 76% worker adoption, third globally; ~5.7% of ChatGPT traffic (Visual Capitalist; DemandSage).
- Asia-Pacific is an estimated 24-28% of ChatGPT traffic, up from 19% in 2024, and the fastest-growing region (DemandSage, 2026).
The fastest AI-search growth on earth is happening in Indonesia, India, the Philippines, Brazil, and Nigeria, in languages the models under-trained on, on engines that default to US domains. That is the gap. The demand is exploding exactly where the supply of local citations is thinnest.
For a US brand with any international ambition, this reframes the opportunity. The markets adopting AI search most aggressively are also the markets where the competitive field of well-optimized local content is emptiest. The first competent multilingual GEO program in a category often faces almost no one.
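One way to make the country skew concrete is to express each market's AI Overview trigger rate as a multiple of the US rate. The sketch below uses only the Semrush figures quoted in this chapter; the function name `relative_density` is illustrative, and the list covers just the countries named here, not the full ranking in which the US places 13th.

```python
# AI Overview trigger rates (% of keywords), as quoted above from Semrush's 2026 data.
# Only the countries named in this chapter are included.
TRIGGER_RATE = {
    "Indonesia": 37.2,
    "Philippines": 29.1,
    "Mexico": 29.1,
    "India": 26.8,
    "Nigeria": 26.4,
    "United States": 20.5,
}


def relative_density(rates, baseline="United States"):
    """Sort markets by trigger rate and express each as a multiple of the baseline."""
    base = rates[baseline]
    ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
    return [(country, rate, round(rate / base, 2)) for country, rate in ranked]


for country, rate, multiple in relative_density(TRIGGER_RATE):
    print(f"{country:<14} {rate:>5.1f}%  {multiple:.2f}x the US rate")
```

On these figures Indonesia's rate is about 1.8 times the US rate. Trigger rate is only one input: a fuller prioritization would also weigh the engine mix and how thin the local citation field is, and this sketch has no data on either.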
Chapter 05
Translation is not localization, and AI can tell the difference
The instinct, once a brand sees the gap, is to run its English pages through machine translation and call the market covered. That is the single most common and most costly mistake in international GEO, because the evidence shows translation moves the needle but localization is what actually wins selection.
The translation half is real and measurable. Weglot's 2026 study, analyzing 1.3 million citations across Google AI Overviews and ChatGPT for Spanish-language markets, found that translated websites received up to 327% more visibility in AI Overviews on non-English queries than untranslated ones. Translated sites pulled 24% more total citations per query, and critically, untranslated sites showed a 431% gap in citations between Spanish and English queries, versus only a 22% gap for translated sites. Weglot's blunt summary: untranslated means invisible. If your page does not exist in the query's language, the model treats you as if you do not exist for that query.
But translation only opens the door. Localization decides whether the model trusts you enough to name you. The same body of research found that geography compounds language: in US-based testing, Spanish queries still returned predominantly English sources, with only about 32% of citations from Spanish content, but run through a Mexico City connection the Spanish share jumped to roughly 63%. In localized Mexican-market testing, 96% of AI Overview citations came from Spanish sources, and English sources were pushed out of the top five entirely when a Spanish option existed. Language tells the model what to read. Geography and local authority tell it whom to believe.
This is where generic translation fails. The KnowledgeBase research and 2026 practitioner testing converge on a consistent finding: regional slang and idiom outperform "generic Spanish." Content written for Mexico beats content written in neutral textbook Spanish for Mexican queries, and the same holds for Spain versus Latin America, or Brazilian versus European Portuguese. AI systems can detect thin, machine-translated text and are measurably less likely to cite it. The pages that win carry local examples, local expert quotes, local publication mentions, and region-specific data: the regional E-E-A-T signals a machine reads as genuine local authority.
Translation gets you into the language. Localization gets you selected within it. A machine-translated page is visible the way a tourist speaking phrasebook Spanish is audible: technically present, obviously foreign, and rarely the voice anyone trusts.
The practical line is clean. Machine translation is a starting point, never the finish. Every page that matters needs a human editor fluent in the regional variant, regional keyword research instead of translated English keywords, and local proof the model can verify off your domain.
Chapter 06
Regional platforms and the hreflang blind spot
Two technical realities sit underneath everything above, and both are routinely missed by US teams. The first is that large parts of the world do not run on Western engines at all. The second is that the standard signal for serving the right language version is largely ignored by the AI layer.
Start with the platform map. In China, Western AI engines are functionally absent, and the field has consolidated around domestic models. Search Engine Land's 2026 reporting describes a fragmented Chinese ecosystem where Baidu's ERNIE, ByteDance's Doubao, DeepSeek, Kimi, and Alibaba's Qwen dominate. Baidu released ERNIE 5.1 in May 2026, landing at #4 on the LMArena Search Arena leaderboard, and is folding DeepSeek into its search product. Optimizing for these systems is a separate discipline with separate signals, separate hosting and indexing realities, and separate content norms. Korea (Naver) and Japan carry their own platform-and-language dynamics where local-language content is essentially mandatory for local queries. Treating "Asia" as one market, or assuming a ChatGPT strategy transfers to Baidu, is a category error.
The second reality is the hreflang blind spot. Hreflang tags are the web standard that tells a search engine which language and regional version of a page to serve. Traditional Google search honors them. But Glenn Gabe's 2026 testing (GSQI) found that most AI chat platforms largely ignore hreflang, with the consistent exception of Bing-powered Copilot. ChatGPT, Perplexity, and Claude did not reliably use hreflang to pick the correct language URL. Google's AI Mode and Bing-powered Copilot did. So the signal you would normally lean on to route a French user to your French page does almost nothing inside most AI chat platforms today.
- China runs on ERNIE, Doubao, DeepSeek, Kimi, and Qwen, not Western engines (Search Engine Land, 2026).
- ERNIE 5.1 (May 2026) ranked #4 on the LMArena Search Arena leaderboard (Codersera/LMArena).
- Hreflang is ignored by most AI chat platforms except Bing-powered Copilot (GSQI, 2026).
- ccTLDs are 17.6% of top citations; localized subdomains just 0.9% (xfunnel).
The signal you trust to serve the right language is invisible to the engines that matter most. Most AI chat does not read hreflang. It reads the content, the language it is written in, and the local authority around it. You cannot route your way into the answer. You have to localize into it.
The takeaway is not to abandon hreflang (it still serves traditional search and may matter more as AI platforms mature) but to stop relying on it as your multilingual AI strategy. The durable signal is genuinely localized content with local authority, because that is what every engine, Western or regional, actually evaluates.
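For teams keeping hreflang in place for traditional search, the annotations themselves are simple to generate. The sketch below builds the `<link rel="alternate">` tags for one page's language variants. The helper name `hreflang_links`, the locale codes, and the `example.com` URLs are hypothetical, and it assumes the common convention that every variant carries the same full set of tags, including one pointing to itself.

```python
from html import escape
from typing import Dict, List, Optional


def hreflang_links(variants: Dict[str, str], default_url: Optional[str] = None) -> List[str]:
    """Build <link rel="alternate"> tags for a page's language/region variants.

    variants maps an hreflang code (e.g. "fr-fr") to that variant's absolute URL.
    default_url, if given, becomes the x-default fallback for unmatched users.
    """
    links = [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        for code, url in variants.items()
    ]
    if default_url:
        links.append(
            f'<link rel="alternate" hreflang="x-default" href="{escape(default_url)}" />'
        )
    return links


# Hypothetical variants of a single pricing page.
pricing_variants = {
    "en-us": "https://example.com/pricing",
    "fr-fr": "https://example.com/fr/tarifs",
    "es-mx": "https://example.com/mx/precios",
}

# The same block of tags would go in the <head> of every variant listed.
for tag in hreflang_links(pricing_variants, default_url="https://example.com/pricing"):
    print(tag)
```

Generating the tags from one shared mapping keeps the variants reciprocal, a common failure point when pages are annotated by hand. It does nothing for the AI chat platforms that skip hreflang, which is why the localized content itself still has to carry the signal.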
Chapter 07
The execution playbook for multilingual AI visibility
International GEO is not a translation project bolted onto an SEO team. It is a market-by-market selection problem, and it runs on a sequence.
1. Pick markets by AI demand, not legacy revenue. Prioritize where AI search is growing and your category is under-served. The Semrush AI Overview trigger rates and the adoption data point to India, Indonesia, Brazil, Mexico, the Philippines, and Nigeria as high-AI-density, low-competition openings. Rank candidate markets by AI search adoption, engine mix, and how empty the local citation field is.
2. Map the engine mix per market before you write a word. Visibility is six different numbers separated by up to 53 points (xfunnel). Identify which engines your buyers actually use in each market (ChatGPT and Perplexity in much of the West, Copilot where Microsoft is entrenched, Baidu/ERNIE and peers in China, Naver in Korea) and weight effort accordingly.
3. Localize, do not translate. Machine translation is the first draft. Every page that matters gets a regional human editor, regional keyword research, regional idiom, and local examples. Generic Spanish loses to Mexican Spanish; phrasebook output loses to native voice. The translation lift of up to 327% is real, but the 96% local-source dominance only comes with genuine localization.
4. Build local-language authority off your own domain. The mention is the signal in every language. Earn brand mentions and citations on the local-language publications, review sites, forums, and directories the regional engines defer to. Local editorial verification outweighs any amount of self-description, and it is the regional E-E-A-T that pushes English defaults out of the top five.
5. Use ccTLDs and local hosting where you can, but don't bank on routing. ccTLDs (17.6% of top citations) outperform subdomains (0.9%) as a localization signal (xfunnel). Implement hreflang for traditional search, but treat localized content and local authority, not hreflang, as the engine of AI visibility.
6. For regional platforms, build a separate program. Baidu, Naver, and peers are not a translation of your Western strategy. They need native content, local hosting and indexing, and platform-specific structure. Budget for them as distinct workstreams or not at all.
7. Measure selection per market and per engine. Rank tracking by country is not enough. Track share of voice in AI answers per language, per engine, the prompts you appear and do not appear in, and which local competitors get named when you do not. Weglot's data shows pickup is fast where the work is real: roughly 21% of properly translated pages were referenced by AI within 60 days, so the feedback loop is measurable in weeks, not years.
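The measurement in step 7 can be sketched as a small tally over logged AI answers. Everything here is illustrative: the `Observation` record, the function names, the brand names, and the sample prompts are hypothetical, and the sketch assumes you already have a way to record which brands each engine named for each prompt.

```python
from collections import defaultdict
from typing import NamedTuple, Tuple


class Observation(NamedTuple):
    """One logged AI answer: where it was asked, on which engine, and who was named."""
    market: str
    language: str
    engine: str
    prompt: str
    brands_named: Tuple[str, ...]


def share_of_voice(observations, brand):
    """Fraction of answers naming `brand`, keyed by (market, language, engine)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for obs in observations:
        key = (obs.market, obs.language, obs.engine)
        totals[key] += 1
        if brand in obs.brands_named:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


def missed_prompts(observations, brand):
    """Prompts where `brand` was absent, with the competitors named instead."""
    return [
        (obs.market, obs.engine, obs.prompt, obs.brands_named)
        for obs in observations
        if brand not in obs.brands_named
    ]


# Hypothetical log entries; a real program would collect many prompts per cell.
SAMPLE = [
    Observation("MX", "es", "ChatGPT", "mejor software de nómina", ("MarcaLocal",)),
    Observation("MX", "es", "ChatGPT", "software de nómina para pymes", ("OurBrand", "MarcaLocal")),
    Observation("MX", "es", "Perplexity", "mejor software de nómina", ("OurBrand",)),
    Observation("DE", "de", "Copilot", "beste Lohnsoftware", ()),
]

for key, share in share_of_voice(SAMPLE, "OurBrand").items():
    print(key, f"{share:.0%}")
for miss in missed_prompts(SAMPLE, "OurBrand"):
    print("missed:", miss)
```

Keying the tally by market, language, and engine together keeps a strong result on one engine from hiding an absence on another in the same market.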
The old job was being seen in a market. The new job is being selected in it, in the local language, on the engine local buyers use, against an English default the model reaches for first. You cannot translate your way there. You localize, you build local authority, and you measure selection one market at a time.
Chapter 08
The first-mover window is open and closing
Every disruption has a window where the work is cheap because almost no one is doing it. International GEO is in that window right now, and the asymmetry is unusually favorable.
Three forces compound. First, demand is exploding in exactly the markets US brands have ignored, with AI Overviews firing on more than a third of Indonesian keywords and worker adoption above 90% in India (Semrush; Visual Capitalist). Second, the competitive field of well-localized content in those markets is nearly empty, because most global brands stopped at machine translation or never localized at all, leaving the local citation ecosystem thin. Third, the structural English bias means the brands that do localize properly are not fighting other localized competitors; they are fighting the model's lazy default toward US domains, which a single well-built local presence can displace, as the 96% Spanish-source dominance in localized Mexican testing shows.
The brands that move now get to define the local answer in their category before the field fills in. The ones that wait will face the same problem they face in English at home: a crowded field, established local authorities, and a model that already has a trusted answer that is not them.
The first-mover window in international GEO is wider than it ever was in SEO, because the demand is bigger, the competition is thinner, and the default the model reaches for is beatable. The brands that localize for selection now will own the answer in markets where their rivals are still translating.
The audience already globalized. The optimization has not. That gap is the opportunity, and it is the one most US brands are still ignoring.