Structured Data and Unstructured Data in AI (and Real Life)

From our first day of school to our first day at a new job, we're conditioned to learn within structure. School days are subdivided into classes, units, and lessons. Work training is broken down into modules to hone a particular skill or grasp a specific function. Our understanding grows as we recognize how individual parts relate to each other, and how those parts fit within the greater whole. 

Assigning categories and attributes to the items we perceive helps us compartmentalize and streamline our thoughts, allowing us to respond accordingly. The large language models (LLMs) that provide the framework for generative AI also tend to operate better (or at least more reliably) within structure. Structured data follows a well-defined, predictable logic and format. 

The problem, as it relates to generative engine optimization (GEO), is that the vast majority of information on the web is unstructured. While LLMs can train on and parse unstructured data, doing so tends to be less efficient. Think of the difference between mentally processing and summarizing a scholarly essay vs. referencing a visual aid that tidily summarizes its most important concepts.

As more people turn to ChatGPT, Perplexity, and Google AI Overviews for answers, it's becoming increasingly important to add that layer of context so your content can be understood "at a glance" by LLMs. Exploring and applying structured data markup is a must if you want to achieve an edge over your competitors in AI search. 

Why AI Systems Love Structured Data

The AI Perspective: How Machines "Read" Your Content 

Understanding how AI systems "read" content is crucial for AI search optimization.

1. Text Tokenization and Parsing

AI systems don't read words like humans do. Instead, they:

  • Break text into tokens: Words, subwords, or even characters
  • Identify sentence boundaries and paragraph structures
  • Recognize HTML tags and formatting elements
  • Extract metadata from page headers and structured markup

2. Natural Language Processing (NLP) Layers

The AI then applies multiple NLP analysis layers:

  • Lexical Analysis: Identifies parts of speech, verb tenses, and proper nouns
  • Syntactic Analysis: Understands sentence structure and grammar
  • Semantic Analysis: Determines word meanings in context
  • Pragmatic Analysis: Attempts to grasp implied meaning and intent

3. Entity Recognition and Extraction

AI systems scan for:

  • Named entities: People, places, organizations, dates
  • Relationships: Who works for whom, what happened when
  • Attributes: Prices, ratings, specifications, contact info
  • Topics and themes: What the content is fundamentally about

Applying structured data markup (aka schema markup) greatly simplifies this task. You're labeling all the key entities beforehand, leaving little room for misinterpretation. 

4. Knowledge Graph Matching

The AI compares extracted information against:

  • Existing knowledge bases (Google's Knowledge Graph, Wikidata, etc.)
  • Previously processed content for consistency
  • Authoritative sources for fact verification

It also uses these references to disambiguate entities (which "Apple" are we talking about, the tech company or the crunchy fruit?).
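One way to help with disambiguation on your own pages is the schema.org sameAs property, which points to other pages that identify the same entity. Here is a minimal sketch for the "Apple" example, assuming the company is the entity being described; the two URLs are the Wikidata and Wikipedia entries for Apple Inc.

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Apple",
  "sameAs": [
    "https://www.wikidata.org/wiki/Q312",
    "https://en.wikipedia.org/wiki/Apple_Inc."
  ]
}
```

For your own business, the sameAs list would typically hold your official social profiles and directory listings instead.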

5. Context and Relevance Scoring

Finally, AI systems assign confidence scores based on:

  • Source authority: Domain reputation, author credentials
  • Content freshness: Publication and update dates
  • Information clarity: How unambiguous the facts are
  • Corroboration: Whether other sources confirm the information

How Structured Data Aids AI Reading Comprehension

Let's take a simple sentence: 

"Epic Web Studios in Erie, PA has provided web development, SEO, and digital marketing services since 2009."

As it is, AI would have to parse this sentence for entities, determine how they relate to one another, and verify their factuality against existing sources.

With structured data, we can label all relevant entities and their relationships beforehand, leaving far less guesswork for the AI. The example below uses JSON-LD (JavaScript Object Notation for Linked Data), the JSON-based format Google recommends for structured data. A full reference for the available types and properties can be found at Schema.org.

{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Epic Web Studios",
  "foundingDate": "2009",
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "Erie",
    "addressRegion": "PA",
    "addressCountry": "US"
  },
  "makesOffer": [
    {
      "@type": "Offer",
      "itemOffered": {
        "@type": "Service",
        "name": "Web Development",
        "serviceType": "Web Design and Development"
      }
    },
    {
      "@type": "Offer",
      "itemOffered": {
        "@type": "Service",
        "name": "Search Engine Optimization",
        "serviceType": "SEO Services"
      }
    },
    {
      "@type": "Offer",
      "itemOffered": {
        "@type": "Service",
        "name": "Digital Marketing",
        "serviceType": "Digital Marketing Services"
      }
    }
  ],
  "areaServed": {
    "@type": "State",
    "name": "Pennsylvania"
  }
}

Because the key facts arrive already labeled, AI can process this format faster and with fewer errors than free-form prose, giving it greater confidence to cite Epic Web Studios in queries related to web development, SEO, or digital marketing services in Erie, PA.

Priority Schema Types for AI Optimization

Foundation Schema (Implement First)

Your absolute priority is schema types that identify yourself as a business, professional, or expert in a specific subject area. 

  1. Organization Schema: Includes essential company information (such as contact details, location, offerings, and areas served) as well as authority signals (credentials, memberships, awards, reviews/ratings, professional pages and profiles).

  2. Article/BlogPosting Schema: Can help signal Experience, Expertise, Authoritativeness, and Trustworthiness (E-E-A-T) around articles and their authorship. You can use schema to literally spell out a high-quality résumé and reputation (with backing references) in code!

  3. FAQ Schema (the FAQPage type on Schema.org): Including frequently asked questions (FAQs) on your web pages and blogs has always been useful. For AI search optimization, it is mandatory. That's because generative AI is engineered for personalized Q&A sessions, and AI Overviews are designed to put answers right in front of search users.
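To make the FAQ item concrete, here is a minimal FAQPage sketch. The two questions are illustrative only, written from the Epic Web Studios sentence used earlier in this post; real markup should repeat the questions and answers exactly as they appear on the visible page.

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What services does Epic Web Studios offer?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Epic Web Studios provides web development, SEO, and digital marketing services."
      }
    },
    {
      "@type": "Question",
      "name": "Where is Epic Web Studios located?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Epic Web Studios is located in Erie, PA."
      }
    }
  ]
}
```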

Content-Specific Schema (Industry Dependent)

  1. Product Schema: An obvious priority for e-commerce businesses; with schema markup, you can spell out product ratings, features/attributes, and pricing and availability.

  2. How-To Schema: Step-by-step instructions make a ton of sense for entities such as social services organizations (how to get help or locate resources), sellers of equipment or appliances (user guides), and recipe creators. 

  3. Event Schema: If you regularly host or participate in events, you can use structured data markup to specify the event's date, time, and location, as well as any ticketing or registration information.
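As one example from the list above, an Event block might look something like the sketch below. The event name, venue, dates, price, and URL are all placeholders invented for illustration, and the properties shown are just a sensible starting set rather than a complete list.

```json
{
  "@context": "https://schema.org",
  "@type": "Event",
  "name": "Example Small Business SEO Workshop",
  "startDate": "2030-05-14T18:00:00-04:00",
  "endDate": "2030-05-14T20:00:00-04:00",
  "eventAttendanceMode": "https://schema.org/OfflineEventAttendanceMode",
  "eventStatus": "https://schema.org/EventScheduled",
  "location": {
    "@type": "Place",
    "name": "Example Conference Center",
    "address": {
      "@type": "PostalAddress",
      "streetAddress": "123 Example Street",
      "addressLocality": "Erie",
      "addressRegion": "PA",
      "postalCode": "16501",
      "addressCountry": "US"
    }
  },
  "offers": {
    "@type": "Offer",
    "url": "https://www.example.com/workshop/register",
    "price": "25.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock",
    "validFrom": "2030-03-01T09:00:00-05:00"
  }
}
```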

Advanced Schema for Authority Building

  1. Person Schema: Looking to land that big job, partnership, contract, or engagement? Person schema can be used to list credentials and qualifications, professional association memberships, experiences, and more.

  2. Dataset Schema: Helps AI parse out key findings in original research and studies, such as survey results and statistics. If you've authored or compiled that research, this markup can provide a huge boost to authority.
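For original research, a Dataset block could be sketched roughly as follows. The survey, organization, URLs, and measured variables are hypothetical stand-ins; swap in the details of your own study and the license you actually publish under.

```json
{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "name": "Example Local Search Habits Survey",
  "description": "Placeholder description of a survey on how consumers use AI tools to find local businesses.",
  "url": "https://www.example.com/research/local-search-survey",
  "creator": {
    "@type": "Organization",
    "name": "Example Research Group"
  },
  "datePublished": "2030-01-15",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "variableMeasured": ["AI tool usage", "Search frequency"],
  "distribution": {
    "@type": "DataDownload",
    "encodingFormat": "text/csv",
    "contentUrl": "https://www.example.com/research/local-search-survey.csv"
  }
}
```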

Schema Implementation Strategy: From Planning to Launch

Step 1: Assess Content

First, assess your existing content within the framework of purpose. How well does it answer your target audience's questions in terms of clarity, depth, accuracy, and authority? Are all your bases covered? Is it valuable?

Keyword research helps identify high-volume phrases, but don't neglect personal experience and context. By anticipating and speaking to more specific/specialized client situations and scenarios, you can position yourself for unique AI citation opportunities (especially through FAQs, case studies, and blog posts).

Step 2: Prioritize Updates

Next, rank pages and blogs by traffic (which of them get the most visitors?) and value (which of them address your audience's biggest challenges or pain points?). You'll want to prioritize core products or services, identity pages (such as the homepage or staff profiles), and your most popular blog posts. Chances are they're the most popular because they're already the most helpful.

Step 3: Choose Your Schema 

Once that's accomplished, it's time to select the right schema type for the job. Certain content management systems, such as epicPlatform, will automatically detect and apply schema (FAQ schema, for example) depending on page template and content structure.

But even if your CMS doesn't have this perk, there are plenty of free tools out there to help.

Another tool that is pretty adept at generating structured data? Unironically, generative AI! Focus on one primary entity at a time (e.g., a single business or organization, a specific product or service, or a distinct process or set of instructions) for optimal results, providing only details that contribute to a deeper fundamental understanding of that entity.

Step 4: Test Your Schema

Last but not least, it's vital to test your schema, because improperly structured data will be either misinterpreted by AI or not recognized at all. The top tools for testing schema markup include Google's Rich Results Test and the Schema Markup Validator.

The Rich Results Test also checks your content's eligibility for display in Google's rich results (specially formatted results that stand out from the standard set of blue links). Those are increasingly being displaced by the AI Overview, but they're still worth checking into!

Schema Focuses by Business Model

Local Business Optimization

If you own a local business, it's obvious that LocalBusiness schema should be your starting point. This structured data markup type contains essential NAPW (Name, Address, Phone Number, and Website) information, geo-coordinates, and hours of operation, and can be amended to mention core service/product offerings. It should mirror your Google Business Profile (GBP). The goal here is to be cited in AI Overviews for "near me" or "geo-specific" searches.
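As a rough sketch of what that might look like, here is a LocalBusiness block for a made-up bakery. Every value (name, phone number, street address, coordinates, hours) is a placeholder rather than a real listing, and the properties shown are one reasonable starting set, not an exhaustive one.

```json
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "Example Bakery",
  "url": "https://www.example.com",
  "telephone": "+1-814-555-0100",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "123 Example Street",
    "addressLocality": "Erie",
    "addressRegion": "PA",
    "postalCode": "16501",
    "addressCountry": "US"
  },
  "geo": {
    "@type": "GeoCoordinates",
    "latitude": 42.1292,
    "longitude": -80.0851
  },
  "openingHoursSpecification": [
    {
      "@type": "OpeningHoursSpecification",
      "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
      "opens": "08:00",
      "closes": "17:00"
    }
  ]
}
```

On a live page, this JSON would sit inside a script tag whose type is application/ld+json, and the name, address, and phone number should match your Google Business Profile character for character.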

After building out, testing, and applying LocalBusiness schema (usually by inserting it into the <head> section of your homepage's HTML), you should turn your attention to Review and AggregateRating schema, which come in handy when AI is citing the "best" local businesses, those that are superlative in some respect. Of course, the more reviews and better ratings your business has, the more useful this exercise will be.

For multi-location and service area businesses, creating individualized location or service area pages with their own schema is a must. These should align with and verify the information in the corresponding Google Business Profiles to enhance AI confidence scores for relevant local searches.

If you'd like to delve deeper into this subject, the Local Falcon blog is a great resource for setting yourself up for success in the local SEO arena.

E-commerce Applications

For e-commerce businesses, it's all about the products. That unambiguity is a blessing, but the task of AI optimization at scale can feel like a curse.

You can lighten that load through prioritization and automation:

  1. Start with Organization schema to help push identity and reputation signals (ratings and reviews) for your e-commerce business as a whole.

  2. Follow with database-driven Product schema generation of all core products. Before doing this, ensure all essential product information is filled out and consistently formatted in the database, such as product name, description, and pricing.

  3. Apply additional properties to popular, high-margin, and/or promoted products for enhanced visibility in AI Overviews. Review and AggregateRating schema are tremendously beneficial on a product level (think again of searches for superlatives: "biggest," "fastest," "best," etc.).

    Applying properties detailing physical attributes or pointing to similar products boosts relevance, especially for more specific ("long-tail") search queries, e.g., "What is the highest rated winter tire for my 2024 Honda Accord at $200 or less?"

  4. Product FAQs (paired with FAQ schema) can be a nice bonus for complex items, those that have a lot of features or special considerations, or those that are meant to appeal to a very passionate or niche audience.
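To illustrate steps 2 and 3, here is a sketch of a Product block with pricing, an aggregate rating, a physical attribute, and a pointer to a similar product. The tire, brand, SKU, rating figures, and URLs are invented placeholders; in a database-driven setup, each value would be filled in from the matching field of your product record.

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Winter Tire",
  "description": "Placeholder description of a studless winter tire for passenger cars.",
  "sku": "EX-WT-001",
  "brand": {
    "@type": "Brand",
    "name": "Example Tire Co."
  },
  "additionalProperty": [
    {
      "@type": "PropertyValue",
      "name": "Season",
      "value": "Winter"
    }
  ],
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.7",
    "reviewCount": "128"
  },
  "offers": {
    "@type": "Offer",
    "url": "https://www.example.com/tires/example-winter-tire",
    "price": "189.99",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  },
  "isSimilarTo": {
    "@type": "Product",
    "name": "Example All-Season Tire",
    "url": "https://www.example.com/tires/example-all-season-tire"
  }
}
```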

B2B and Professional Services

B2B and professional services companies often cater to diverse market segments across different industries and/or geographies. When it comes to incorporating structured data for AI, let's begin with the familiar foundation of Organization schema featuring company details, your logo, contact info, and links to social profiles on major platforms (applied to your homepage). 

After that, there are three areas where you'll want to focus your efforts:

    1. Geographic Targeting: Create location-specific landing pages (linking to and from a corresponding Google Business Profile) with market insights, case studies, FAQs, and other content unique to that locale. Pair with the corresponding LocalBusiness schema. And don't neglect citations in pertinent local, professional, and/or industry directories – these help support and corroborate the data on your website and GBP.

    2. Industry Authority Building: Optimize website service pages with "Industries We Serve" sections – or better yet, create dedicated industry pages that speak to their concerns, needs, and challenges, positioning yourself as a solution. Service schema can enhance AI's understanding of what you offer, while FAQ schema can provide potential pathways into your site from AI Overview citations and "People Also Ask" rich results.

    3. Establishing Credibility Signals: This can be on an individual (Person schema with "memberOf," "hasCredential," "alumniOf" properties), organization-wide (years of experience quantified through a "foundingDate" property in Organization schema, industry certifications spelled out in Service schema), or publication level (Article and CreativeWork schema). Naturally, Review and AggregateRating schema are vital for credibility as well.
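The individual-level signals in item 3 could be sketched along these lines. The person, employer, school, association, and certification are all hypothetical; only the property names (memberOf, hasCredential, alumniOf) come from the text above.

```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Jane Example",
  "jobTitle": "Senior Consultant",
  "worksFor": {
    "@type": "Organization",
    "name": "Example Consulting Group"
  },
  "alumniOf": {
    "@type": "CollegeOrUniversity",
    "name": "Example State University"
  },
  "memberOf": {
    "@type": "Organization",
    "name": "Example Professional Association"
  },
  "hasCredential": {
    "@type": "EducationalOccupationalCredential",
    "name": "Example Industry Certification",
    "credentialCategory": "certification",
    "recognizedBy": {
      "@type": "Organization",
      "name": "Example Certification Board"
    }
  }
}
```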

How to Maximize Structured Data in AI: Do's and Don'ts

Do: Write for Both Humans and AI

People use search engines to find answers to questions; AI is engineered to deliver those answers quickly, clearly, and directly. Thus, it is mutually beneficial to both your human AND robot audiences to:

  • Plan content around prevailing questions, concerns, or curiosities

  • Chunk out and structure content for improved scannability and citability: utilize bulleted or numbered lists, comparison tables, infographics, and a logical header hierarchy (H1, H2, H3) as you work from generalities to the more nuanced facets of a topic.

  • Use direct and conversational language, as this mirrors how people search and "talk to" generative AI.

  • Treat structured data markup as a complement to, not a replacement for, good content. It exists to make relationships between entities clearer and aid LLMs in extracting data more efficiently.

Do: Utilize Topic Clusters

It has always paid to demonstrate topical authority to search engines – topic clustering is the practice of covering a subject from all angles, from more general (a pillar page or post) to more specific (supporting pages or posts). Try visually mapping or diagramming topic coverage during content planning; the relationships should be obvious to anyone who looks at it. 

You can support topic clusters with the following forms of structured data:

  • Breadcrumb Schema (the BreadcrumbList type): Gives AI an understanding of content hierarchies, or how a user would follow a topic from its basics to its nuances across pages or posts.

  • Article Schema: Use the about and mentions properties to state what a post covers, and isPartOf to tie a supporting post back to its pillar page. At the page level, WebPage markup offers mainEntity (the article the page is mainly about) and relatedLink (the related posts you're linking to) to show relationships.

  • Organization Schema: The knowsAbout property helps demonstrate areas of expertise; hasOfferCatalog demonstrates your organization's breadth of services or products.

  • FAQ Schema: FAQs are a great place to dive into more specific or niche angles of a particular topic or concern, and reference related content in the same cluster. Once again, an FAQ section on every page is almost mandatory in the AI search era.
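Here is one way a supporting post in a cluster might tie these pieces together, sketched with the @graph keyword so several related items can share a single block. The site, post titles, and URLs are placeholders, and the choice to describe the pillar page through isPartOf is an assumption about how you organize your own cluster.

```json
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "BreadcrumbList",
      "itemListElement": [
        {
          "@type": "ListItem",
          "position": 1,
          "name": "Blog",
          "item": "https://www.example.com/blog"
        },
        {
          "@type": "ListItem",
          "position": 2,
          "name": "Structured Data Guide",
          "item": "https://www.example.com/blog/structured-data-guide"
        },
        {
          "@type": "ListItem",
          "position": 3,
          "name": "FAQ Schema Basics"
        }
      ]
    },
    {
      "@type": "BlogPosting",
      "headline": "FAQ Schema Basics",
      "datePublished": "2030-02-01",
      "about": {
        "@type": "Thing",
        "name": "Structured data"
      },
      "isPartOf": {
        "@type": "WebPage",
        "@id": "https://www.example.com/blog/structured-data-guide"
      }
    },
    {
      "@type": "Organization",
      "name": "Example Agency",
      "knowsAbout": ["Structured data", "Search engine optimization"]
    }
  ]
}
```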

Don't: Overuse Markup

We've established that structured data can be a beautiful thing for AI search visibility, but it can just as quickly backfire. You never want to "throw the kitchen sink" at a single page or post, as too much markup can trigger search engine penalties, bog down site performance, or even overwhelm the AI trying to parse it all out.

The "Goldilocks Zone" for Structured Data:

Per-Page Guidelines

  1. Relevance is key: Only add schema types that directly describe the visible content and primary purpose of the page.
  2. One main type per page: E.g., Article, Product, FAQ, HowTo, Event, etc. (You can nest supporting types, but don't overload.)
  3. Supplement, don't stuff: Use supporting properties (e.g., author, datePublished, about, image) to enrich the main type, but avoid adding every possible property just because you can.
  4. Avoid duplicating information: Don't mark up the same content with multiple, conflicting schema types.

Per-Site Guidelines

  1. Consistency: Use the same schema types and property conventions across similar pages (e.g., all blog posts use Article, all products use Product).
  2. Coverage: Aim for all major page types to have appropriate markup, but don't force schema onto pages where it doesn't fit (e.g., Contact or Privacy Policy pages).
  3. Hierarchy: Use Organization or LocalBusiness schema on your homepage and About page to establish site-wide context.

Summary Table

| Page Type | Main Schema Type | Supporting Properties/Types | Typical # of Markup Types |
|---|---|---|---|
| Blog Article | Article | Author, Image, FAQ | 1-2 |
| Product Page | Product | Offer, Review, Image | 1-3 |
| Service Page | Service | LocalBusiness, FAQ | 1-2 |
| Homepage/About | Organization | Logo, ContactPoint | 1-2 |

Rule of thumb: If a human reviewer would find your markup logical and directly relevant to the page's content, you're in the Goldilocks zone.

Don't: Just Set It and Forget It

A short while ago, the terms "Generative Engine Optimization," "Search Everywhere Optimization," and "AI Optimization" didn't even exist. We fully expect generative AI and search technologies to grow together and intertwine in new and unexpected ways, which means you can't just "set and forget" any related strategy. 

Testing, tracking, measurement, and refinement are the cornerstones of any SEO or digital marketing strategy, and they are here too.

  1. Test both existing and newly created schema with Google Rich Results Test, Schema Markup Validator, or a supported browser extension. Google Search Console and third-party tools like SEMrush and Ahrefs can be used to monitor for and alert you to errors.

  2. Track visibility in AI Overviews through Google Search Console (see "AI Overviews/SGE") and third-party tools like Local Falcon (equipped with both Google AI Overview and ChatGPT visibility tracking features). Keep an eye on competitor performance in AI search, as this can help guide your own schema strategy.

  3. Measure the incidence and frequency of AI citations per page/post, keyword, and structured data type. Keeping a log of your schema updates and their associated pages can help you troubleshoot and adapt.

  4. Refine and adjust schema as your content changes (probable), AI search requirements change (inevitable), and new schema types and properties become available and recognized. A/B test structured data variations on similar pages and make note of which resonates the best. 

Structure a Better Future in AI Search

Does keeping up with AI-preferred data structures and continually shifting SEO practices keep you up at night? Here at Epic, we stay up to date, so you can stay up with your business.
