HTML Accessibility
Checks whether AI search engines like ChatGPT and Claude can access your website content
Every scan runs 55 checks (36 of them scored across 8 categories), measuring how well AI search engines can find, extract, and cite your content.
Can AI crawlers find and access your content?
Checks whether AI bots can reach the page by parsing robots.txt, X-Robots-Tag, and meta robots.
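As a rough illustration of the robots.txt part of this check, the sketch below uses Python's standard `urllib.robotparser` to ask whether a few AI crawlers may fetch a URL. The robots.txt body, the bot list, and the `ai_bot_access` helper are assumptions made for the example, not the scanner's actual implementation, and the sketch does not cover X-Robots-Tag or meta robots.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; a real check would fetch it from the site root.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

# Example AI crawler user-agent tokens (an assumed list for illustration).
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot"]


def ai_bot_access(robots_txt: str, url: str, bots=AI_BOTS) -> dict:
    """Return {bot: allowed} according to the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


result = ai_bot_access(ROBOTS_TXT, "https://example.com/blog/post")
for bot, allowed in result.items():
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

In this example GPTBot is blocked by its own rule group, while the other bots fall back to the wildcard group and are allowed on the blog path.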
Validates canonical tags, robots directives, and crawler access.
Compares raw HTML vs browser-rendered HTML to measure JavaScript dependency.
Checks whether critical GEO signals (H1, structured data, key passages, and entity mentions) appear within Google's 2 MB HTML processing window. Pages exceeding this limit risk having their most important AI-readable content cut off before crawlers finish parsing.
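One simple way to approximate this check is to look at the byte offset where each signal first appears in the raw HTML. The sketch below assumes the window is 2 * 1024 * 1024 bytes and uses two hypothetical marker patterns (an opening `<h1>` tag and a JSON-LD script type); the real check may define both differently.

```python
import re

# Assumed interpretation of the 2 MB window named above.
WINDOW_BYTES = 2 * 1024 * 1024

# Hypothetical markers for two of the signals the check mentions.
SIGNAL_PATTERNS = {
    "h1": re.compile(rb"<h1[\s>]", re.IGNORECASE),
    "json_ld": re.compile(rb"application/ld\+json", re.IGNORECASE),
}


def signals_in_window(html: bytes, window: int = WINDOW_BYTES) -> dict:
    """Report whether each signal first starts before the byte limit."""
    report = {}
    for name, pattern in SIGNAL_PATTERNS.items():
        match = pattern.search(html)
        report[name] = match is not None and match.start() < window
    return report
```

Working on bytes rather than decoded text matters here, because the limit applies to the size of the HTML as transferred, not to the character count.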
Counts how many of your owned domains appear in Google results for your brand.
Is your content structured for AI extraction and citation?
Evaluates heading structure quality for AI extractability.
Checks whether content leads with conclusions (answer-first format).
Measures data-richness of content: facts per 100 words.
Checks whether paragraphs are citable in isolation.
Checks whether HTML converts cleanly to Markdown.
Measures the ratio of AI-visible content to total page content. Hero metric: 'AI sees X% of your page.'
Measures Flesch-Kincaid grade level and sentence complexity.
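For reference, the Flesch-Kincaid grade level is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below applies that formula with a deliberately rough syllable counter (vowel groups); the tokenization and the `count_syllables` heuristic are assumptions for illustration, so scores will differ slightly from tools that use a pronunciation dictionary.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level; returns 0.0 for empty input."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


simple = flesch_kincaid_grade("The cat sat. The dog ran.")
dense = flesch_kincaid_grade(
    "Organizational interdependencies necessitate comprehensive coordination."
)
print(simple < dense)
```

Short sentences with short words push the grade down; long, polysyllabic sentences push it up.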
Detects critical data trapped in HTML tables without prose restatement. Trafilatura degrades tables during extraction; jusText removes them outright. Tables without prose fallbacks are high-risk content under AI extraction pipelines.
Measures content substance: word count, content ratio, section coverage.
Measures how effectively your primary keywords are placed across the zones that search engines and AI systems prioritize.
Extracts and scores keywords automatically, showing which keywords your page targets and where they appear.
Runs an AI-powered keyword gap analysis that suggests missing high-value keywords for your page.
Does the AI knowledge graph recognize and trust your brand?
Checks Wikidata entity presence and richness.
Evaluates Experience, Expertise, Authoritativeness, and Trustworthiness signals.
Checks consistency of brand entity references within the page (title, headings, body, meta). Does not verify cross-platform consistency.
Detects whether E-E-A-T signals (author credentials, expertise, affiliations) appear in article prose where AI extraction preserves them, or only in bio blocks that extraction pipelines discard (DR-097 Section 4c).
Measures whether internal links point to topically related content using anchor text and URL slug overlap as a proxy signal.
Queries major AI providers to check how a brand appears in AI-generated responses.
Is your content formatted so AI can cite it accurately?
Validates title tag, meta description, viewport, charset, and lang.
Validates structured data (JSON-LD, Microdata, RDFa) for AI system comprehension.
Validates Open Graph and Twitter Card tags.
Detects content age via dateModified and datePublished metadata.
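A minimal sketch of how content age could be read from JSON-LD, assuming the dates are ISO 8601 strings on top-level objects and that `dateModified` takes priority over `datePublished`. The `content_age_days` helper and its regex-based script extraction are hypothetical simplifications; a production check would use a real HTML parser and also handle nested `@graph` structures.

```python
import json
import re
from datetime import datetime, timezone

JSON_LD_RE = re.compile(
    r'<script[^>]+type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)


def content_age_days(html: str, now: datetime):
    """Return age in days from dateModified/datePublished, or None."""
    for block in JSON_LD_RE.findall(html):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed JSON-LD blocks
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict):
                continue
            stamp = item.get("dateModified") or item.get("datePublished")
            if not isinstance(stamp, str):
                continue
            try:
                # Normalize a trailing "Z" for older Python versions.
                parsed = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
            except ValueError:
                continue
            if parsed.tzinfo is None:
                # Assumption: treat naive timestamps as UTC.
                parsed = parsed.replace(tzinfo=timezone.utc)
            return (now - parsed).days
    return None
```

Passing `now` explicitly keeps the function deterministic, which makes it easier to test against fixed dates.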
Detects content density, vocabulary diversity, and repeated blocks.
Baseline technical signals that support AI visibility.
Evaluates internal/external link quality and anchor text.
Checks alt text coverage/quality, dimensions, and loading optimization.
Checks HTML-level performance: size, resource hints, blocking resources.
Validates HTTPS, HSTS, CSP, and other security headers. While not directly related to AI visibility, security headers contribute to overall site trust signals that search engines evaluate.
Validates URL length, slug quality, query params, path depth, and trailing slash consistency.
Evaluates accessibility signals: lang, landmarks, skip nav, form labels.
Detects HTTP redirect chains, type misuse, and collapsible normalization hops.
Checks FineWeb-style quality indicators that determine whether a page would survive AI training data filters. Evaluates terminal punctuation ratio, line length distribution, duplicate line ratio, and prose density.
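To make two of these indicators concrete, the sketch below computes a terminal punctuation ratio and a duplicate line ratio over the non-empty lines of extracted text. The set of terminal characters and the `quality_indicators` helper are assumptions for illustration; the thresholds a real filter applies to these ratios are not specified here.

```python
# Assumed set of characters that count as ending a sentence.
TERMINAL = (".", "!", "?", '"')


def quality_indicators(text: str) -> dict:
    """Compute simple line-level quality ratios for extracted page text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return {
            "terminal_punct_ratio": 0.0,
            "duplicate_line_ratio": 0.0,
            "mean_line_length": 0.0,
        }
    terminal = sum(ln.endswith(TERMINAL) for ln in lines)
    duplicates = len(lines) - len(set(lines))  # repeats beyond the first copy
    return {
        "terminal_punct_ratio": terminal / len(lines),
        "duplicate_line_ratio": duplicates / len(lines),
        "mean_line_length": sum(len(ln) for ln in lines) / len(lines),
    }


sample = "Home\nAbout\nHome\nThis is a full sentence.\n"
print(quality_indicators(sample))
```

Navigation-heavy text like the sample scores poorly on both ratios, because most lines are short labels that repeat and do not end in punctuation.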
Forward-looking signals tracked for informational value.
Measures the ratio of named entities (people, brands, products, places) in your content. Research across 18,000 verified AI citations found entity-dense content is selected at significantly higher rates than generic prose.
Measures where your citable signals (definitions, entities, data points) concentrate across your content. Research based on 3 million ChatGPT responses found 44.2% of citations reference the first 30% of a page.
Detects clear definitional patterns ('X is,' 'X refers to,' 'X is defined as') that AI systems can extract and attribute with confidence. Content with definitional language is cited at roughly twice the rate of content without it.
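The three cue phrases listed above can be matched with a small regular expression. This is a sketch under simplifying assumptions: sentences are split on terminal punctuation, a definition must start the sentence with a capitalized term, and the `find_definitions` helper is hypothetical rather than the scanner's own detector.

```python
import re

# Term at sentence start, then one of the cue phrases from the text above.
# "is defined as" is listed first so it wins over the bare "is".
DEFINITION_RE = re.compile(
    r"^(?P<term>[A-Z][^.!?]{0,60}?)\s+"
    r"(?P<cue>is defined as|refers to|is)\s+(?P<rest>.+)$"
)


def find_definitions(text: str) -> list:
    """Return (term, cue) pairs for sentences that look like definitions."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = []
    for sentence in sentences:
        match = DEFINITION_RE.match(sentence)
        if match:
            hits.append((match.group("term"), match.group("cue")))
    return hits


text = (
    "Generative engine optimization is the practice of shaping content "
    "for AI answers. We love it! GEO refers to that practice."
)
print(find_definitions(text))
```

A pattern this loose will also match non-definitional sentences such as "The sky is blue", so a real detector would need additional filtering.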
Estimates whether your content tone falls in the optimal range for AI citation. Research found content with balanced subjectivity (~0.47, similar to industry analysis) is cited more often than purely promotional or purely dry content.
Checks for an llms.txt file at your site root, the emerging standard for giving AI systems a structured guide to your key pages. Over 844,000 websites have adopted it, though no major AI provider has confirmed parsing it in production.
Checks whether your site provides complete documentation as a single AI-ingestion file. AI coding assistants like Cursor already read llms-full.txt to answer developer questions without crawling page by page.
Detects the CMS, framework, and key plugins powering the scanned page. Technology detection provides context for interpreting your other results rather than being a signal to optimize directly.
Surfaces security-relevant observations from HTTP headers and HTML content collected during your scan. Informational only.
Detects which structural taxonomies are present on the site's URL graph. Informational only.
Would AI stake its reputation on recommending you?
Evaluates whether service/product pages describe real, specific offerings.
Detects whether the site is a living, active business.
Measures how long the business has been operating via RDAP and on-page signals.
Detects SEO-optimized content thin on actual substance.
Checks for privacy policy, terms of service, and ethics/accessibility pages.
Detects case studies, testimonials, or review evidence on the site.
Informational signal: transparent pricing can improve AI trust but its absence does not count against your score.
Evaluates whether the business provides complete, verifiable contact information.
Informational signal: estimates the likelihood that page content was generated by AI. Does not affect your trust score.
Compares the business identity details on your site against the matched Google Maps listing.
Enter any URL. Get your score in 60 seconds. Free.