Context: Today, James Berry, CEO & Founder of LLMrefs, publicly shared the internal system prompt for ChatGPT 4o, dated June 2025, on LinkedIn. When asked about it, ChatGPT itself confirmed the document’s authenticity.
GPT Insights has analyzed the leaked material in detail, focusing specifically on how ChatGPT uses — or more precisely, limits — web search. The document provides rare, explicit insight into when web search is triggered, how it works, and what it means for publishers, SEO professionals, and website owners.
1. When Is Web Search Activated?
The system prompt defines strict, narrow conditions under which ChatGPT 4o is allowed to use the web tool:
"Use the web tool to access up-to-date information from the web or when responding to the user requires information about their location."
Explicit examples show web search is only permitted for:
- Real-time or current information (e.g., weather, sports results)
- Location-specific information (e.g., local businesses, events)
- Hard-to-find, niche topics not covered in the model’s training data
- Situations where outdated information could be harmful, such as: "…using an outdated version of a software library or not knowing the date of the next game for a sports team…"
When Search Is Prohibited
The document clearly states that web search is not activated if ChatGPT can answer based on internal knowledge:
"Avoid using the web tool for information already known to be available internally unless the user explicitly indicates preference for public sources."
Additionally, the older browser tool is completely disabled:
"IMPORTANT: Do not attempt to use the old browser tool or generate responses from the browser tool anymore, as it is now deprecated or disabled."
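Restated as code, these rules amount to a narrow gating decision. The following Python sketch is a hypothetical illustration of the logic described above; the leak contains prose rules, not code, and the trait names and function here are our own.

```python
# Hypothetical illustration only: the leaked prompt states these gating
# conditions in prose, not code. This sketch merely restates them.

from enum import Enum, auto


class QueryTrait(Enum):
    REAL_TIME = auto()               # e.g. weather, sports results
    LOCATION_SPECIFIC = auto()       # e.g. local businesses, events
    NICHE_UNCOVERED = auto()         # hard-to-find topics outside training data
    STALENESS_HARMFUL = auto()       # e.g. outdated software library versions
    PREFERS_PUBLIC_SOURCES = auto()  # user explicitly asks for public sources


def should_use_web_tool(traits: set[QueryTrait], answerable_internally: bool) -> bool:
    """Return True only under the narrow conditions the system prompt names."""
    triggers = {QueryTrait.REAL_TIME, QueryTrait.LOCATION_SPECIFIC,
                QueryTrait.NICHE_UNCOVERED, QueryTrait.STALENESS_HARMFUL}
    if answerable_internally and QueryTrait.PREFERS_PUBLIC_SOURCES not in traits:
        return False  # "Avoid using the web tool for information already known..."
    return bool(traits & triggers)


# Example: the next game of a sports team is time-sensitive, so search is allowed.
print(should_use_web_tool({QueryTrait.REAL_TIME}, answerable_internally=False))  # True
```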
2. How Does ChatGPT Search When Allowed?
The leaked prompt outlines clear technical controls for how web search works:
- Parallel Queries: ChatGPT can generate up to five distinct search queries per request.
- Boosting Key Terms: Relevant terms are marked with + to improve search result targeting.
- Freshness Control: The --QDF parameter adjusts how strongly fresh results are prioritized, on a scale from 0 (timeless facts) to 5 (critical, time-sensitive information, where results from the past 30 days or newer are preferred).
- Multilingual Redundancy: If the user’s query is not in English, ChatGPT sends search queries in both English and the original language.
Example:
"User: Find the latest results from Wimbledon 2025"
Assistant (to=web.search): search("Wimbledon 2025 latest results")
Note: While the system prompt shows how the web tool is triggered in this example, it does not explicitly mention the use of "+" boost operators or the --QDF freshness parameter for web search. These mechanisms are explained earlier in the document for query generation in general, but whether they are consistently applied to public web search remains unspecified in the leak.
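To make these mechanisms concrete, here is a minimal Python sketch of how a request like the Wimbledon example could be expanded into boosted, freshness-tagged queries. It is an illustration under the assumptions above; the data structure, function, and expansion step are hypothetical, and nothing in the leak confirms that public web search works this way.

```python
# Illustrative sketch only, NOT the leaked implementation. It assumes the
# query-generation mechanisms described in the prompt (parallel queries,
# "+" term boosting, a --QDF freshness flag, multilingual redundancy).

from dataclasses import dataclass


@dataclass
class SearchQuery:
    text: str       # query string with "+" boosts on key terms
    qdf: int        # freshness level: 0 (timeless) .. 5 (last ~30 days preferred)
    language: str   # language the query is issued in


def expand_request(user_request: str, boosted_terms: list[str],
                   qdf: int, user_language: str = "en",
                   max_queries: int = 5) -> list[SearchQuery]:
    """Build up to `max_queries` boosted queries; add an English variant
    if the user's language is not English (multilingual redundancy)."""
    base = user_request
    for term in boosted_terms:
        base = base.replace(term, f"+{term}")

    queries = [SearchQuery(base, qdf, user_language)]
    if user_language != "en":
        # Hypothetical: a real system would translate the request here rather
        # than reuse the original string.
        queries.append(SearchQuery(base, qdf, "en"))
    return queries[:max_queries]


if __name__ == "__main__":
    for q in expand_request("Wimbledon 2025 latest results",
                            boosted_terms=["Wimbledon", "2025"], qdf=5):
        print(f'search("{q.text}") --QDF={q.qdf} lang={q.language}')
```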
3. What the Prompt Says About Links and Source Visibility
The system prompt contains no indication that ChatGPT stores URLs or maintains a persistent web index. Instead, links in responses appear only if they are surfaced through live search. Given how rarely search is activated, this means:
- Links in ChatGPT answers are the exception, not the rule.
- There is no crawling or storage of web pages comparable to search engines like Google.
Website visibility in ChatGPT answers can occur via real-time search results when live search is triggered. However, given how rarely search is activated and the lack of persistent URL storage, this form of visibility remains the exception.
For SEO and content visibility, the implication is clear: If your website isn’t ranked prominently in Bing real-time search snippets at the moment ChatGPT triggers search, you are unlikely to be cited or generate clicks.
4. Conclusion: Rare Search Means Rare Links, the Same Pattern as the Claude Leak
The ChatGPT 4o leak confirms what was already exposed in the recent Claude system prompt leak, analyzed by GPT Insights:
- ChatGPT only triggers live search when strictly necessary.
- The system prompt does not suggest that ChatGPT stores URLs or maintains a structured web index comparable to a search engine.
- Links in AI answers mostly appear via live search. When generated from model knowledge, they tend to point to homepages. Longer subpage links are often broken, as they are probabilistically reconstructed and frequently inaccurate.
- For most questions, responses come purely from the model’s internal knowledge.
Unlike search engines, LLMs like ChatGPT do not maintain a structured, queryable index of URLs. The model has no direct knowledge of which web pages exist at any given time. Instead, when a URL appears in an answer, it is not retrieved from a reliable source list but generated probabilistically — reconstructed token by token from statistical patterns within the model’s training data.
This probabilistic reconstruction is inherently unreliable. It often produces broken links, outdated page references, or fictional URLs that never existed — a well-known source of 404 errors in AI-generated content.
To avoid these risks, LLMs rarely include links in their answers unless they are grounded via a live search. Only through such a real-time search can the model retrieve an actual, up-to-date link from an external index and cite it with confidence.
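For publishers, this failure mode is easy to verify: take the links an AI answer produces and check whether they actually resolve. The short Python sketch below uses only the standard library; the URLs shown are placeholders, not real examples from ChatGPT output.

```python
# Minimal sketch: check whether AI-generated links actually resolve.
# The URLs below are placeholders; only Python's standard library is used.

import urllib.request
import urllib.error

candidate_urls = [
    "https://example.com/",                                       # placeholder: homepage-style link
    "https://example.com/some/deep/subpage-that-may-not-exist",   # placeholder: deep link
]

for url in candidate_urls:
    req = urllib.request.Request(url, method="HEAD",
                                 headers={"User-Agent": "link-check/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            print(f"{resp.status}  {url}")
    except urllib.error.HTTPError as e:
        # 404s here correspond to the probabilistically reconstructed links described above.
        print(f"{e.code}  {url}")
    except urllib.error.URLError as e:
        print(f"ERR  {url}  ({e.reason})")
```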
For a detailed analysis of why LLMs fundamentally lack a persistent URL index — and the consequences for SEO — see the German-language article: Why LLMs Don’t Store URLs – And What That Means for SEO.
As stated in the Claude analysis:
"Claude does not store URLs as a fixed part of its knowledge representation. Links are only surfaced if live search is activated and the system deems it necessary to show sources."
What this means for website owners and SEOs:
- You should not expect consistent links or traffic from LLMs like ChatGPT. A recent analysis by SISTRIX confirms this: ChatGPT links to external sources in only 6.3% of its answers. For comparison, Gemini does so in 23% of cases and DeepSeek in 11.3%, with an overall average of just 13.95% across all tested AI chatbots.
- Visibility depends entirely on appearing in real-time search results at the exact moment live search is triggered.
- The vast majority of AI answers are generated without querying the web, meaning there is no opportunity for your site to be cited.
In short: LLMs do not "crawl the web." They rely on internal model knowledge by default and only activate live search when strictly necessary.
GPT Insights will continue monitoring system leaks and publishing transparent, fact-based updates to help content creators and SEO professionals understand the true impact of AI on web visibility.