PRISM converts hundreds of simulated buyer queries into five weighted pillar scores and one AEO health score, with every figure traceable to a real AI response.
Why one number needs a defensible method
A single AEO health score is only useful if the method behind it can be examined. The score in question comes from PRISM, Aicadium’s AEO tool. It measures how your brand appears when buyers ask an answer engine such as ChatGPT, Claude, Gemini, or Qwen about your category, rather than typing a query into a search engine. Whether you appear in the synthesised answer, how you are described, and what the model cites as its authority are becoming as consequential as search rankings once were. This piece explains how PRISM calculates the number.
It covers how the queries are generated, how each of its five scoring pillars is measured, how they are combined, and where the edge cases lie. Nothing in the pipeline is a black box.
Measurement principle: prioritising distributions over individual responses
The first principle is that no single query can be trusted, because answer engines are non-deterministic. The same model, asked the same question twice, can return materially different answers. Measuring one response would capture noise rather than signal.
PRISM therefore runs at scale. For one company, it issues several hundred simulated prompts. These span multiple buyer personas, multiple stages of the buying journey, and multiple models, including ChatGPT, Claude, Gemini, and Qwen. Each prompt runs three times per model. The unit of measurement is the distribution of outcomes rather than any individual response.
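A minimal sketch of what measuring a distribution rather than a single response can look like. The outcome labels ("primary", "runner_up", "absent") and the function name are illustrative assumptions, not PRISM's internal schema.

```python
from collections import Counter

# Hypothetical outcome labels for one prompt on one model, run three times.
runs = ["primary", "absent", "primary"]


def outcome_distribution(outcomes):
    """Return each outcome's share of the runs instead of trusting any single run."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return {label: count / total for label, count in counts.items()}


distribution = outcome_distribution(runs)
```

Reading only the second run here would suggest the brand is absent; the distribution shows that it was the primary recommendation in two of the three runs.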
How the queries are built without bias
The prompts are generated from the business domain rather than the brand name, which keeps the result honest. Feeding a company name into a model lets the model’s existing knowledge of that company shape the prompts, which flatters the company. Instead, PRISM crawls the product space to understand the domain objectively, then generates prompts a real buyer might ask.
Coverage comes from a four-dimensional taxonomy PRISM calls the Context Grid: Market Domain by Query Type by Persona by Region. Each cell produces three prompts. Realism comes from an independent check: every prompt goes through an LLM-as-judge gate that marks it pass, revise, or drop. A prompt that a real buyer would not ask is discarded before it reaches the engine.
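The grid and the gate can be sketched roughly as follows. The axis values, function names, and the handling of "revise" verdicts are assumptions for illustration; the text only states that there are four dimensions, three prompts per cell, and three possible verdicts.

```python
from itertools import product

PROMPTS_PER_CELL = 3  # from the text: each cell produces three prompts

# Hypothetical axis values; a real grid would be derived from the domain crawl.
grid_axes = {
    "market_domain": ["payroll", "benefits administration"],
    "query_type": ["comparison", "how-to"],
    "persona": ["finance lead", "HR manager"],
    "region": ["APAC", "EMEA"],
}


def build_cells(axes):
    """Enumerate every cell of the grid as a dict mapping axis name to value."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*(axes[n] for n in names))]


def gate(prompts_with_verdicts):
    """Keep only prompts the judge marked 'pass'.

    Assumption: 'revise' prompts are held back for rewriting rather than sent
    as they are; 'drop' prompts are discarded.
    """
    return [prompt for prompt, verdict in prompts_with_verdicts if verdict == "pass"]


cells = build_cells(grid_axes)
planned_prompts = len(cells) * PROMPTS_PER_CELL
```

With two values on each of the four axes, the grid has 16 cells and 48 candidate prompts before the gate is applied.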
The five pillars, and how each is scored
Four of the five pillars track a buyer’s encounter with the answer: whether your brand appears at all (Presence), how prominently (Ranking), how accurately it is described (Insight), and whom the model cites as the authority (Sourcing). The fifth, Mapping, covers the infrastructure beneath them all, which dictates whether AI crawlers can read your site in the first place.
Presence carries the highest weight at 25% because absence makes the other pillars irrelevant. Presence combines three measures. Mention rate is the share of responses that name your brand. Cross-platform consistency is whether you appear across all models or only on some. Cross-run consistency is whether you reappear reliably when the same question is repeated.
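One way the three Presence measures could be computed from raw run records is sketched below. The exact formulas are assumptions, since the text defines the measures only in words: here cross-platform consistency is the share of models that mention the brand at least once, and cross-run consistency is the share of mentioning prompt-model pairs in which every run mentions it.

```python
from collections import defaultdict

# Hypothetical records: (prompt_id, model, run_index, brand_mentioned)
responses = [
    ("p1", "model_a", 0, True), ("p1", "model_a", 1, True), ("p1", "model_a", 2, True),
    ("p1", "model_b", 0, True), ("p1", "model_b", 1, False), ("p1", "model_b", 2, False),
    ("p1", "model_c", 0, False), ("p1", "model_c", 1, False), ("p1", "model_c", 2, False),
]


def presence_measures(records):
    """Compute assumed versions of the three Presence measures."""
    by_model = defaultdict(list)
    by_pair = defaultdict(list)
    for prompt_id, model, _run, mentioned in records:
        by_model[model].append(mentioned)
        by_pair[(prompt_id, model)].append(mentioned)

    # Share of all responses that name the brand.
    mention_rate = sum(r[3] for r in records) / len(records)
    # Share of models with at least one mention (assumed definition).
    cross_platform = sum(any(flags) for flags in by_model.values()) / len(by_model)
    # Among prompt-model pairs with any mention, share where all runs mention it.
    mentioned_pairs = [flags for flags in by_pair.values() if any(flags)]
    cross_run = (
        sum(all(flags) for flags in mentioned_pairs) / len(mentioned_pairs)
        if mentioned_pairs
        else 0.0
    )
    return {
        "mention_rate": mention_rate,
        "cross_platform": cross_platform,
        "cross_run": cross_run,
    }


measures = presence_measures(responses)
```

How the three measures are blended into a single Presence score is not specified in the text, so the sketch stops at the individual measures.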
Ranking carries 20% and reflects how prominent you are when you are mentioned. It measures your share of voice in responses: how often you are the primary recommendation rather than a runner-up, and how often competitors overshadow you. Position still carries psychological weight, because the first-named brand receives implicit endorsement.
Insight carries 20% and measures the accuracy of the description. Insight scores sentiment, whether the model qualifies and categorises you correctly, and how often it pivots away from you mid-answer. Inaccurate framing in an answer is more damaging than in a blog post, because the buyer trusts the answer more.
Sourcing carries 15% and assesses whom the model treats as the authority on you. PRISM evaluates the citation mix by source quality, across your own content and government, educational, non-governmental, social, and blog sources. A company can be the most-mentioned brand in its category and still score low here if the model cites third-party sources rather than the company’s own pages.
Mapping carries 20% and assesses whether AI crawlers can access and read your site. Unlike the other four pillars, Mapping is based on a direct inspection of your website rather than on responses to queries. The inspection checks whether crawlers are blocked, whether your content is structured for machine parsing, and whether the technical fundamentals are in place. Mapping can look similar to Presence, but the two are distinct, and confusing them is common; the edge cases below return to this distinction.
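As an illustration of one Mapping check, the sketch below tests whether a robots.txt file blocks particular crawlers, using Python's standard urllib.robotparser. The crawler tokens, the sample file, and the URL are assumptions chosen for the example; this covers only the "are crawlers blocked" question and is not a description of PRISM's own inspection.

```python
from urllib.robotparser import RobotFileParser

# Assumed crawler tokens for illustration; confirm each against vendor documentation.
AI_CRAWLERS = ["GPTBot", "ClaudeBot"]

# Hypothetical robots.txt that blocks one crawler and allows everyone else.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def blocked_crawlers(robots_txt, url, crawlers=AI_CRAWLERS):
    """Return the crawlers that the given robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in crawlers if not parser.can_fetch(agent, url)]


blocked = blocked_crawlers(SAMPLE_ROBOTS_TXT, "https://example.com/pricing")
```

Parsing the file from a string keeps the example offline; against a live site, the same parser can load the file with set_url and read.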
How the pillars roll into one score
The AEO health score is the weighted sum of the five pillars. Presence carries 25%; Ranking 20%; Insight 20%; Sourcing 15%; and Mapping 20%. The weights sum to 100%. A pillar score of 60 on Presence therefore contributes 15 points to the health score, while the same 60 on Sourcing contributes 9 points.
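The arithmetic can be checked with a short sketch. The weights are those stated above; the function names and the assumption that each pillar is scored on a 0 to 100 scale are illustrative.

```python
# Pillar weights in percentage points, as stated in the text.
WEIGHTS = {"Presence": 25, "Ranking": 20, "Insight": 20, "Sourcing": 15, "Mapping": 20}


def contributions(pillar_scores):
    """Points each pillar contributes, assuming pillar scores on a 0-100 scale."""
    return {pillar: pillar_scores[pillar] * weight / 100 for pillar, weight in WEIGHTS.items()}


def health_score(pillar_scores):
    """Weighted sum of the five pillar scores."""
    return sum(contributions(pillar_scores).values())


example = {"Presence": 60, "Ranking": 60, "Insight": 60, "Sourcing": 60, "Mapping": 60}
points = contributions(example)  # Presence -> 15.0, Sourcing -> 9.0
total = health_score(example)    # 60.0
```

Because the weights sum to 100, a company scoring 60 on every pillar has a health score of 60, with Presence supplying 15 of those points and Sourcing 9.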
The weighting is deliberate rather than a matter of arithmetic convenience. Presence leads because a company absent from the answer cannot benefit from accuracy or citation. Ranking and Insight sit at the same weight by design: one measures your standing in the race, the other the accuracy of your framing, and there is no principled reason to rank one above the other. Mapping joins them at 20% as a judgment call. It is the infrastructure that decides whether a model can read you at all, so it is weighted alongside the answer-level pillars even though it sits beneath the answer rather than in it. Sourcing carries the lowest weight not because it matters least, but because it is a lever on the other pillars rather than an outcome in itself.
Edge cases that change the reading
The first edge case is the distinction between visibility and infrastructure, that is, between Presence and Mapping. Presence is whether your brand is mentioned in answers; Mapping is whether AI crawlers can access and read your site. A company can score well on Presence while struggling with Mapping: the model recognises the company from training data or third-party sources even though its own pages are not readable by crawlers. Presence is a visibility issue and Mapping an infrastructure issue, and the fixes for each differ.
The second edge case is high mention with low authority. In one pilot audit, a company ranked at the top of its category for mention rate but received an F for citation authority. The model treated a review site and a user listing as the authorities on the company, while its own pages went uncited.
The third edge case is core strength with adjacent invisibility. A company may feature strongly in answers to queries about its core category yet disappear completely from adjacent categories that buyers also consider. For example, a payroll platform may be recommended prominently when buyers ask about tools to manage payroll, yet go unmentioned when they ask about benefits administration, a need evaluated alongside payroll in the same purchasing decision. Strong core performance can still produce a healthy overall score while a competitor dominates the adjacent category, so the headline number can obscure the gap.
Why nothing is a black box
Every figure in PRISM’s health score traces to a real response. Mentions are first flagged by pattern matching. A model then parses each response to extract the named brands, sentiment, and qualifying language. The metrics are computed from that structured data. A synthesis step turns the scores into a narrative in which each claim links to the response that supports it.
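The first stage, flagging mentions by pattern matching, might look something like the sketch below. The brand aliases and function names are hypothetical, and the later model-based parsing step is not shown.

```python
import re


def build_mention_pattern(aliases):
    """Compile a case-insensitive pattern matching any alias as a whole term."""
    # Longest aliases first, so "Acme Payroll" is preferred over "Acme".
    escaped = sorted((re.escape(alias) for alias in aliases), key=len, reverse=True)
    # Lookarounds stop matches inside longer words such as "Acmeology".
    return re.compile(r"(?<!\w)(?:" + "|".join(escaped) + r")(?!\w)", re.IGNORECASE)


def flag_mention(response_text, pattern):
    """Return True if the response names the brand under any alias."""
    return pattern.search(response_text) is not None


# Hypothetical brand aliases for illustration.
pattern = build_mention_pattern(["Acme Payroll", "Acme"])
```

A cheap pattern pass like this only decides which responses mention the brand at all; extracting sentiment and qualifying language is left to the parsing step described above.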
A person reviews the findings and the recommended actions before PRISM delivers them. The method is reproducible because each stage produces a structured artefact for the next. For the strategic case behind the health score, see the full briefing; for the plain-language version, see the briefing for non-technical leaders.


