Bringing AI to the Context Graph

How AI actually uses a knowledge graph

I’ve been talking a lot about building the graph — indexing documents, normalizing entities, building the ontology. But the whole point of building this thing is so that AI can use it. How does that actually work?

The primary interface for AI consumption is MCP — the Model Context Protocol, an open standard introduced by Anthropic for giving LLMs access to tools. In this architecture, the MCP tools are thin wrappers around the Java backend’s query endpoints: the LLM calls a tool, the tool hits the backend, the backend runs the query (with or without inference, depending on what’s needed), and the results come back to the LLM as structured data.

Here are four sequence diagrams illustrating the different ways an AI can query the Context Graph depending on the type of information needed:

1. Querying Unstructured Information

When the user asks a question about facts extracted from documents, the AI uses the basic SPARQL endpoint to query the pre-materialized assertions.

sequenceDiagram
    actor User
    participant AI as LLM
    participant MCP as MCP Tool
    participant CG as Context Graph<br/>(Asserted + Inferred Graph)
    
    User->>AI: Ask question (Natural Language)
    AI->>MCP: Call tool to query unstructured data
    MCP->>CG: Execute SPARQL
    CG-->>MCP: Return triples & provenance
    MCP-->>AI: Return context
    AI-->>User: Answer question with citations
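
The raw-route behaviour above can be sketched in a few lines of Python, modeling the triplestore as plain tuples. The data, IRIs, and the `facts_with_provenance` helper are illustrative stand-ins, not the actual backend API.

```python
# Toy model of the asserted graph: (subject, predicate, object) triples,
# plus a provenance map linking each assertion to its source chunk.
ASSERTED = {
    ("ex:KingCounty", "ex:population", "2269675"),
    ("ex:KingCounty", "rdfs:label", "King County, WA"),
}
PROVENANCE = {
    ("ex:KingCounty", "ex:population", "2269675"): "ex:doc42#chunk7",
    ("ex:KingCounty", "rdfs:label", "King County, WA"): "ex:doc42#chunk2",
}

def facts_with_provenance(entity):
    """Mimic the raw SPARQL route: return asserted (predicate, object,
    source-chunk) rows for an entity, with no reasoning applied."""
    return sorted(
        (p, o, PROVENANCE.get((s, p, o)))
        for (s, p, o) in ASSERTED
        if s == entity
    )
```

Every row carries its source chunk, which is what lets the LLM answer with citations.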

2. Querying with Ontological Reasoning

When the user asks a complex or conceptual question that requires domain logic (like class inheritance or transitive relationships), the AI queries the inference endpoint.

sequenceDiagram
    actor User
    participant AI as LLM
    participant MCP as MCP Tool
    participant CG as Context Graph<br/>(Inferred Graph / Backward Chaining)
    
    User->>AI: Ask complex ontological question
    AI->>MCP: Call tool for reasoned query
    MCP->>CG: Execute SPARQL (with Reasoning)
    Note over CG: Applies backward chaining<br/>rules at query time
    CG-->>MCP: Return inferred triples
    MCP-->>AI: Return context
    AI-->>User: Answer question
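
A minimal sketch of what the reasoned route adds, assuming a toy TBox: the `instances_of` helper walks the subclass chain upward at query time instead of reading materialized triples. The class names and data are hypothetical.

```python
# Toy TBox: child class -> parent class.
SUBCLASS = {
    "ex:Pediatrician": "ex:Doctor",
    "ex:Doctor": "ex:HealthcareProfessional",
}
# Asserted instance types (what extraction actually stored).
TYPES = {("ex:DrSmith", "ex:Pediatrician")}

def instances_of(cls):
    """Backward-chaining flavour: when asked '?x a cls', climb the
    subclass chain from each asserted type at query time."""
    result = set()
    for inst, asserted_type in TYPES:
        current = asserted_type
        while current is not None:
            if current == cls:
                result.add(inst)
                break
            current = SUBCLASS.get(current)
    return result
```

Nothing about `ex:HealthcareProfessional` is stored for the instance; the answer is derived when the query runs.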

3. Querying Structured Information

When the user asks for aggregations or measurements that live in a relational database, the AI uses the SPARQL-to-SQL translation layer (Ontop) to bridge the semantic gap.

sequenceDiagram
    actor User
    participant AI as LLM
    participant MCP as MCP Tool
    participant Ontop as SPARQL-to-SQL<br/>(Ontop)
    participant DB as Relational Database
    
    User->>AI: Ask question needing structured data
    AI->>MCP: Call tool to translate SPARQL
    MCP->>Ontop: Send SPARQL query
    Ontop-->>MCP: Return translated SQL
    MCP-->>AI: Return SQL
    
    alt AI modifies SQL
        AI->>AI: Review schema and modify SQL
    end
    
    AI->>MCP: Call tool to execute SQL
    MCP->>DB: Run SQL query
    DB-->>MCP: Return tabular results
    MCP-->>AI: Return results
    AI-->>User: Answer question

4. Querying Vector Documents Directly

When the user asks a broad question that requires searching through raw document text rather than extracted entities, the AI performs a semantic vector search directly against the document embeddings stored in Postgres.

sequenceDiagram
    actor User
    participant AI as LLM
    participant MCP as MCP Tool
    participant Embed as Embedding Model
    participant DB as Postgres<br/>(pgvector)
    
    User->>AI: Ask broad conceptual question
    AI->>MCP: Call tool for semantic text search
    MCP->>Embed: Embed search query
    Embed-->>MCP: Return query vector
    MCP->>DB: Execute vector similarity search
    DB-->>MCP: Return top matching document chunks
    MCP-->>AI: Return chunks & provenance
    AI-->>User: Answer question using source text
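
The vector-search leg can be illustrated with a pure-Python cosine-similarity top-k, standing in for a pgvector query such as `ORDER BY embedding <=> :query LIMIT k`. The chunk IDs, vectors, and text below are made up for the sketch.

```python
import math

# Toy stand-ins: in the real pipeline the vectors come from an embedding
# model and live in Postgres/pgvector; here they are hand-written.
CHUNKS = {
    "doc1#c0": ([0.9, 0.1, 0.0], "County budget overview ..."),
    "doc1#c1": ([0.1, 0.9, 0.0], "Healthcare staffing report ..."),
    "doc2#c0": ([0.0, 0.2, 0.9], "Transit ridership trends ..."),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, k=2):
    """Return the k chunk IDs most similar to the query vector."""
    scored = sorted(
        ((cosine(query_vec, vec), cid) for cid, (vec, _text) in CHUNKS.items()),
        reverse=True,
    )
    return [cid for _score, cid in scored[:k]]
```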

The query routing architecture

The Java backend exposes a few query endpoints with different reasoning characteristics. I won’t go into too much detail here, but the gist is that we have separate routes depending on what we need:

  • A raw route for bypassing the reasoner entirely (useful for provenance lookups).
  • A reasoned route that runs queries through the InfModel (which applies both forward-chained and backward-chained rules).
  • A TBox route for querying the ontology itself.
  • A text search route.
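
As a rough sketch of how a tool layer might dispatch across these routes (the endpoint paths below are invented for illustration; the real backend's URLs are not described in this post):

```python
def route_for(query_kind):
    """Map what the tool needs to a backend route.
    The paths here are hypothetical placeholders."""
    routes = {
        "raw":      "/query/raw",       # bypass the reasoner (provenance lookups)
        "reasoned": "/query/inferred",  # SPARQL through the InfModel
        "tbox":     "/query/tbox",      # query the ontology itself
        "text":     "/search/text",     # Lucene-backed label search
    }
    return routes[query_kind]
```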

Text and Vector Search

When the LLM needs to find an entity or a concept based on a user’s natural language query, we actually have two distinct options:

  1. Text Search in RDF: We can hit the Jena-text (Apache Lucene) index built over rdfs:label literals in the RDF store. This returns matching entity IRIs along with relevance scores. Because these IRIs might be variant (non-canonical) entities, we then do a second hop through the InfModel to resolve the canonical entity identity via the owl:sameAs closure.
  2. Vector Search in Postgres: Because we stored the document chunks and their embeddings in Postgres (using pgvector) during the indexing pipeline, we can also perform a semantic vector search directly against the database.

Having both options means we can use exact/fuzzy keyword matching when we know the entity name, or semantic vector search when we are looking for broader concepts or paragraphs of text.

owl:sameAs transparency

One of the nicest properties of this architecture is that owl:sameAs normalization is invisible to MCP tool consumers. The tool asks for an entity, and the reasoner automatically includes all properties from all co-referent variants. You don’t have to know that the data was extracted across three different documents with three slightly different labels. From the tool’s perspective, there’s just one entity with a complete set of properties.

This is the payoff for all of the normalization work in part 5. Without it, the LLM would have to manually figure out that “King County, WA”, “King County, Washington”, and “King County (WA)” are the same thing, and manually aggregate their properties. With it, that’s the reasoner’s job.
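
Here is a toy version of that transparency, assuming three variant IRIs for King County: a small fixed-point loop computes the symmetric-transitive `owl:sameAs` closure and unions the properties, which is conceptually what the reasoner does for the tool.

```python
# Pairwise sameAs assertions produced by entity normalization (toy data).
SAME_AS = [("ex:KC_1", "ex:KC_2"), ("ex:KC_2", "ex:KC_3")]
# Properties scattered across the variants, as if extracted from
# three different documents.
PROPS = {
    "ex:KC_1": {"rdfs:label": "King County, WA"},
    "ex:KC_2": {"ex:fipsCode": "53033"},
    "ex:KC_3": {"ex:population": "2269675"},
}

def merged_view(entity):
    """Union the properties of every co-referent variant, the way the
    reasoner's owl:sameAs closure presents one entity to consumers."""
    members = {entity}
    changed = True
    while changed:  # fixed point over symmetric + transitive closure
        changed = False
        for a, b in SAME_AS:
            if a in members and b not in members:
                members.add(b); changed = True
            if b in members and a not in members:
                members.add(a); changed = True
    merged = {}
    for m in members:
        merged.update(PROPS.get(m, {}))
    return merged
```

Asking about any one of the three variant IRIs returns the same complete property set.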

Reasoning as a query-time enrichment

Backward chaining rules make the knowledge graph more powerful than just a lookup table. A backward rule is essentially a derived predicate — a fact that can be derived from other facts at query time.

Here are a few patterns of query-time enrichment that we can do:

1. Reflexive and Transitive Closure

If we have a relationship like “A is a part of B” and “B is a part of C”, we don’t need to explicitly store “A is a part of C”. We can write a backward chaining rule that infers this transitive relationship at query time. If the LLM asks “What is inside C?”, the reasoner will automatically traverse the hierarchy and return A, B, and anything else inside them.

2. Class and Subclass Inheritance

Suppose we define an ontology where Pediatrician is a subclass of Doctor, and Doctor is a subclass of HealthcareProfessional. If we extract an entity typed as Pediatrician, we don’t need to explicitly assert that it is also a HealthcareProfessional; the reasoner handles class inheritance dynamically. If the LLM queries for all HealthcareProfessionals in a region, the pediatrician will be returned.

3. Property-based Classification

In this system, I have rules for entity classification based on properties. An entity with a fipsCode property and certain other features can be classified as a County even if it wasn’t explicitly typed as County during extraction. The backward rule fires when the reasoner tries to evaluate ?entity a County — it checks whether the evidence pattern matches, and if so returns true.

This means the graph doesn’t need to explicitly store every derivable fact. The rules encode the domain logic. Adding a new rule changes what can be inferred without touching the stored data. This is the “ontology as business logic” principle from the design posts.
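
Both the transitive-closure and property-based-classification patterns can be sketched in a few lines. The rules below are simplified Python stand-ins for the actual Jena backward-chaining rules, and all of the data is invented.

```python
# Asserted partOf pairs: (part, whole). The closure is never stored.
PART_OF = {("ex:A", "ex:B"), ("ex:B", "ex:C")}
# Asserted properties used as classification evidence.
FACTS = {
    ("ex:KingCounty", "ex:fipsCode", "53033"),
    ("ex:KingCounty", "ex:inState", "ex:Washington"),
}

def contained_in(container):
    """Backward rule for transitive partOf: answer 'what is inside X?'
    at query time by walking the hierarchy."""
    inside, frontier = set(), {container}
    while frontier:
        nxt = {a for a, b in PART_OF if b in frontier and a not in inside}
        inside |= nxt
        frontier = nxt
    return inside

def is_county(entity):
    """Property-based classification: 'entity a County' holds when the
    evidence pattern (a FIPS code plus a state link) matches."""
    preds = {p for s, p, _o in FACTS if s == entity}
    return {"ex:fipsCode", "ex:inState"} <= preds
```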

The reasoning playground I described in the visualization post was built specifically so I could test these rules in isolation — write a rule, write a SPARQL query, see what the reasoner derives. The playground runs the query against the base data and against the data augmented by the rules side-by-side, so you can see exactly what the rules contributed.

The zero hallucination claim

In the first post, I mentioned “0% hallucination from LLMs, meaning every assertion must be backed up by sources.” This is a strong claim and I want to be precise about what it means.

It means: for every factual assertion that the system returns to an LLM, there is a provenance trail — you can follow it to a specific source document, a specific chunk of text, and a specific extraction event with a confidence score and timestamp. The LLM can cite its sources because the graph requires sources for everything.
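
For concreteness, a provenance-backed assertion might look like the following record. The field names and values are illustrative, not the system's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Assertion:
    """Hypothetical shape of one provenance-backed assertion."""
    subject: str
    predicate: str
    obj: str
    source_document: str   # the named source the fact was extracted from
    source_chunk: str      # the specific chunk of text
    confidence: float      # extraction confidence score
    extracted_at: str      # ISO-8601 timestamp of the extraction event

a = Assertion("ex:KingCounty", "ex:population", "2269675",
              "ex:doc42", "ex:doc42#chunk7", 0.92, "2026-04-11T00:00:00Z")
```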

This is different from saying the extractions are always correct. An LLM can extract a wrong fact from a document and the system will faithfully store and return that wrong fact with high confidence. The system doesn’t validate semantic correctness — it tracks origin. If the origin document is wrong or if the extraction was wrong, the provenance chain leads to the wrong source. You can then audit, retract, and correct.

The guarantee is not “everything in the graph is true”. The guarantee is “everything in the graph was explicitly extracted from a named source, and you can look it up”. That’s a meaningful guarantee — it’s the difference between a hallucinating AI and an AI that makes verifiable claims. Verifiable claims can be wrong and then corrected. Hallucinations can’t be corrected because they have no ground truth to check against.

In the future, we could take this even further with alignment scoring that verifies each extraction strictly matches its source text. And because the LLM answers questions using assertions pulled directly from the RDF store, its output stays grounded in the source material. Together, this makes the pipeline a much more robust way to approach the zero-hallucination problem.

Bridging to SQL with Ontop

One direction I explored and successfully implemented is SPARQL-to-SQL query rewriting. The idea is that a query expressed in SPARQL against the ontology should be automatically translatable into an equivalent SQL query against the original relational database, using the R2RML binding layer as the translation map.

The benefit is that you get a single unified semantic query interface. An AI can write a SPARQL query using the ontology terms and get answers from either the RDF store or the SQL database transparently, depending on where the data lives.

This is currently done via Ontop integration. Ontop is a tool that implements this kind of virtual knowledge graph — it takes SPARQL queries and rewrites them to SQL using R2RML mappings. I implemented this for the insurance benchmark dataset (a structured set of policy, claim, and customer tables), and it works beautifully. My benchmark showed that Ontop could handle many of the standard SPARQL patterns, though complex joins and aggregations involving multi-hop patterns can get tricky fast.
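
To make the idea concrete, here is a drastically simplified toy of the rewriting step: a one-property mapping table and a one-line translation. This is nothing like Ontop's real rewriting engine, and the property, table, and column names are invented.

```python
# Toy R2RML-style mapping: ontology property -> (table, column).
MAPPING = {
    "ins:claimAmount": ("claims", "amount"),
    "ins:policyHolder": ("policies", "holder_name"),
}

def rewrite(sparql_property):
    """Rewrite a single-property SPARQL lookup into SQL using the mapping.
    Real rewriters handle joins, filters, and aggregations; this does not."""
    table, column = MAPPING[sparql_property]
    return f"SELECT {column} FROM {table}"
```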

For the BigQuery datasets, I ended up going with the semantic binding approach I described in part 6 instead — giving the LLM enough context to write SQL directly. It’s less formally elegant than SPARQL-to-SQL but more practical for the LLM-first world where the AI generates queries anyway. If the AI is already generating SQL, having a SPARQL translation layer in between doesn’t add much.

In a traditional enterprise setting where you’d have existing SPARQL tooling and BI tools wanting to query through the ontology, the formal SPARQL-to-SQL approach via Ontop is essential. For AI-native consumption, the semantic binding layer plus LLM-generated SQL is probably simpler, but having both options means we can adapt to whatever the use case demands.

What I think about this

Here is a quick summary of the design decisions I made for AI query unification and my assessment of them:

| Decision / Role | Choice made | Assessment |
| --- | --- | --- |
| Search modality | Dual-track (RDF text + Postgres vector) | Excellent. Giving the LLM the ability to do exact/fuzzy keyword searches in the graph or semantic vector searches in the database covers all bases. |
| Query-time enrichment | Backward chaining rules | Very powerful. It allows us to derive classifications, transitive closures, and class inheritance on the fly, keeping the triplestore clean and pushing domain logic into the ontology. |
| Zero hallucination | Strict provenance tracking | Essential. By tracking every assertion back to its source document and chunk, we turn hallucinations into auditable, correctable errors. Future alignment scoring will make this even stronger. |
| SPARQL-to-SQL translation | Ontop integration | Great for traditional BI tools and formal semantic queries, though complex joins can be tricky. For AI-native workflows, the semantic binding layer (from Part 6) is often more practical. |


Written on April 11, 2026