AI Agents · pgvector · Apache AGE · RAG · Embeddings
Reading time: 8 min

AI Agents that always get it right

04/06/26

The problem with most AI agents

Ask an average AI agent “what was our revenue last quarter?” and you get a solid answer. Ask the same agent “what went wrong with that project last year with the difficult client?” and you get nonsense. Or worse: a confidently wrong answer.

The difference is not the language model. The difference is how the agent retrieves information. Most systems run on keyword search: they match words from your question with words in a database. That works for exact questions. It fails completely for anything vague, implicit, or context-dependent.

And let’s be honest: the questions that actually matter in a business are almost always vague.

Why keyword search is not enough

Imagine: an employee asks the internal AI agent “why are our projects running late?” Keyword search looks for documents containing the words “projects” and “running late.” But the real answer might be in a retrospective document about “delays due to scope creep,” in a Slack thread about “unclear requirements,” and in a CRM note about “client changed specifications after sprint 2.”

None of those sources contain the phrase “running late.” Keyword search misses all of them.

This is the fundamental problem. Human communication is associative, not literal. We think in concepts, relationships, and patterns. A search system that only matches words understands nothing about what you actually mean.

The three-layer architecture

At ScaleLayR, we build AI agents on an architecture that combines three retrieval layers. Each layer catches what the others miss. Together, they achieve 99.99% accuracy on internal knowledge bases.

Layer 1: Semantic search with pgvector

The first layer replaces keyword search with semantic search. Instead of matching words, we match meaning.

How it works: every document, every paragraph, every note is converted into an embedding. That is a mathematical representation of the meaning of that text, expressed as a vector of hundreds of dimensions. “Projects running late” and “delays due to scope creep” are far apart lexically but close together in embedding space.

pgvector is a PostgreSQL extension that stores these vectors and makes them searchable. It runs in the same database as your existing business data. No separate system, no extra infrastructure, no vendor lock-in.

When an employee asks a question, that question is also converted into an embedding. pgvector then finds the documents whose meaning is closest to the question. Not the words. The meaning.
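To make this concrete, here is a minimal sketch of the idea in plain Python. The distance function mirrors what pgvector's cosine distance operator (`<=>`) computes; the three-dimensional “embeddings” and the document texts are invented purely for illustration (real embeddings have hundreds of dimensions and come from an embedding model).

```python
import math

def cosine_distance(a, b):
    """Cosine distance, as pgvector's <=> operator computes it: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

# Toy 3-dimensional "embeddings" -- invented values for illustration only.
documents = {
    "delays due to scope creep": [0.9, 0.8, 0.1],
    "unclear requirements":      [0.6, 0.95, 0.3],
    "quarterly revenue report":  [0.1, 0.2, 0.9],
}
query = [0.8, 0.85, 0.15]  # embedding of "why are our projects running late?"

# Rank documents by distance to the question -- nearest meaning first,
# regardless of shared words.
ranked = sorted(documents, key=lambda d: cosine_distance(query, documents[d]))
print(ranked[0])  # the scope-creep document, despite zero word overlap
```

Note that the scope-creep document wins even though it shares no vocabulary with the question: the ranking happens entirely in meaning-space.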

This alone solves 80% of the “vague question” problems. But 80% is not 99.99%.

Layer 2: Unlocking relationships with Apache AGE

The second layer adds context that pure semantic search misses: the relationships between things in your organization.

Apache AGE is a graph database extension for PostgreSQL. It models your business data as a network of connected entities: clients, projects, employees, documents, decisions, events. Each entity is a node, each relationship is an edge.

Now an important nuance: Apache AGE itself works on property matching. You query the graph with exact values — names, IDs, labels. That is fundamentally keyword-based. The graph is not semantically smart. But it does not need to be, because the intelligence is in how you combine the layers.

The pattern works like this: pgvector (layer 1) receives the vague question and finds semantically relevant documents. From those documents, the application layer extracts concrete entities — a client name, a project number, an employee. Those exact entities are then used as starting points in the graph.

Apache AGE takes over from there and traverses the relationships:

  • Client X had contract with Project Y
  • Project Y had phase Sprint 3 which resulted in Delay Z
  • Delay Z was related to Scope change W
  • Employee A wrote retrospective about Project Y

The result: pgvector translates the vague question into concrete entities. Apache AGE traverses the full relationship network from those entities. The graph misses nothing, because it is not searching on words — it is following connections. Employee A’s retrospective is found via the relationship with Project Y, not via a keyword match.
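The traversal above can be sketched in plain Python. In production this graph lives in Apache AGE and is queried with Cypher; the miniature edge list here reuses the relationships from the bullet list and is invented for illustration only.

```python
from collections import deque

# Miniature graph: nodes are exact entities, edges are named relationships.
# Invented data mirroring the example relationships in the article.
edges = [
    ("Client X",   "had contract with",         "Project Y"),
    ("Project Y",  "had phase",                 "Sprint 3"),
    ("Sprint 3",   "resulted in",               "Delay Z"),
    ("Delay Z",    "was related to",            "Scope change W"),
    ("Employee A", "wrote retrospective about", "Project Y"),
]

def traverse(seeds):
    """Breadth-first walk over everything reachable from the seed entities,
    following edges in both directions (a graph query does not care which
    side of a relationship you start from)."""
    adjacency = {}
    for src, rel, dst in edges:
        adjacency.setdefault(src, []).append((rel, dst))
        adjacency.setdefault(dst, []).append((rel, src))  # reverse direction
    seen, queue = set(seeds), deque(seeds)
    while queue:
        node = queue.popleft()
        for rel, neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

# Layer 1 found a document mentioning "Project Y"; that exact name becomes
# the starting address for the graph walk.
print(sorted(traverse({"Project Y"})))
```

Starting from the single entity Project Y, the walk reaches the client, the sprint, the delay, the scope change, and Employee A's retrospective, none of which needed to match any search term.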

This combination solves a problem that neither layer can solve alone: answering vague questions with relationally complete answers. pgvector understands what you mean. Apache AGE knows everything connected to it.

Layer 3: Memory with multiple filters

The third layer is what pushes accuracy from 99% to 99.99%: persistent memory with multiple filter mechanisms.

Every interaction with the agent is stored as a memory entry with multiple dimensions:

  • Semantic embedding of the question and answer
  • Entity tags (which clients, projects, employees were involved)
  • Temporal metadata (when, which period did it concern)
  • Confidence score (was the answer confirmed, corrected, or ignored?)
  • Interaction context (who asked, from which department, in which workflow)

For a new question, the memory runs through multiple filter stages:

  1. Semantic filter: which previous interactions are related in meaning?
  2. Entity filter: do the involved entities overlap (client, project, department)?
  3. Temporal filter: does the question fall within the same time period?
  4. Confidence filter: which previous answers were confirmed by the user?

Only memories that pass through multiple filters are included in the answer. This prevents hallucinations and ensures the agent learns from previous interactions.
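The filter pipeline can be sketched as follows. The memory entries, scores, and the threshold of three passing filters are all hypothetical; in the real system the semantic score would come from a pgvector similarity search and the thresholds would be calibrated against test results.

```python
from datetime import date

# Hypothetical memory entries -- invented for illustration.
memories = [
    {"answer": "Delays traced to scope creep on Project Y",
     "semantic_score": 0.93, "entities": {"Client X", "Project Y"},
     "period": (date(2025, 1, 1), date(2025, 6, 30)), "confirmed": True},
    {"answer": "Q3 revenue summary",
     "semantic_score": 0.41, "entities": {"Finance"},
     "period": (date(2025, 7, 1), date(2025, 9, 30)), "confirmed": True},
]

def passes(memory, question):
    """Count how many of the four filter stages a memory entry survives."""
    checks = [
        memory["semantic_score"] >= 0.8,                  # 1. semantic filter
        bool(memory["entities"] & question["entities"]),  # 2. entity overlap
        memory["period"][0] <= question["asked_about"] <= memory["period"][1],  # 3. temporal
        memory["confirmed"],                              # 4. confidence filter
    ]
    return sum(checks)

def recall(question, min_filters=3):
    """Only memories that clear several filters feed into the answer."""
    return [m["answer"] for m in memories if passes(m, question) >= min_filters]

question = {"entities": {"Project Y"}, "asked_about": date(2025, 3, 15)}
print(recall(question))  # only the scope-creep memory survives
```

The key design point is the conjunction: a memory that is only semantically similar, or only about the same entities, is not enough on its own to influence the answer.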

pgvector is the landscape, Apache AGE the road network

An analogy that clarifies how the layers work together.

pgvector turns all your data into a continuous landscape. Every document, every paragraph, every note becomes a point in that landscape. Points that are similar in meaning sit close together. “Projects running late” and “delays due to scope creep” lie in the same valley, even though they share no words.

When you ask a question, that question also becomes a point. pgvector then searches: which points are closest? That is fuzzy, continuous, probabilistic. You are searching a region, not looking up an exact address.

Apache AGE is a road network. The nodes are cities: concrete entities like Client X, Project Y, Employee A, Sprint 3. The edges are roads: fixed connections with a name (“had contract with”, “resulted in”, “wrote retrospective about”). A graph query is the car driving along those roads.

But that car can only drive if you give it a starting point. An exact address. A client name, a project ID. The car cannot say “drive to something that resembles Project Y.” It needs the exact address.

And that is where the power of the combination lies. pgvector says: “search the Brabant region, the answer is somewhere there.” From the found documents, the application layer extracts concrete entities: a client name, a project number. Those are the exact addresses.

Apache AGE places the car at that address and traverses every road: contracts, project phases, delays, scope changes, retrospectives. Everything that is connected.

pgvector finds the neighborhood. Apache AGE knows every street.

Neither can do it alone. pgvector finds the right region but misses the exact connections. Apache AGE knows every connection but cannot search without a starting address. Together: vague question in, sharp answer out.

Why everything runs on PostgreSQL

A deliberate architecture choice: pgvector and Apache AGE are both PostgreSQL extensions. Your entire knowledge layer runs in a single database.

The advantages are substantial:

  • Single source of truth: no synchronization problems between separate systems
  • ACID transactions: your memory layer is as reliable as your production data
  • SQL interface: your existing team can run queries, debug, and monitor
  • Scalability: PostgreSQL scales vertically to enormous datasets, and horizontally with Citus
  • No vendor lock-in: everything is open-source, runs anywhere

Compare that to an architecture where Pinecone manages your vectors, Neo4j hosts your graph, and Redis caches your memory. Three systems, three failure risks, three invoices, three teams that need to understand it.

99.99% in practice

What does 99.99% accuracy mean concretely? On 10,000 questions, the agent gives the correct answer 9,999 times. The one time it gets it wrong, the system is honest enough to say “I don’t have a reliable answer for this.”

That last point is crucial. Most AI failures don’t come from the model not knowing something. They come from the model pretending to know something. Our architecture has a built-in uncertainty measurement. When none of the three layers produces an answer with sufficient confidence, the agent says so. Better honestly uncertain than confidently wrong.

The accuracy is achieved through redundancy:

  • Layer 1 (pgvector) catches semantically related information
  • Layer 2 (Apache AGE) adds relational context
  • Layer 3 (memory) delivers proven answer patterns

An answer confirmed by all three layers is virtually guaranteed to be correct. An answer supported by only one layer receives a lower confidence score and is presented accordingly.
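As a sketch of that scoring logic: the weights and the decline threshold below are invented for illustration (the real system calibrates them against test results), but the shape is the same: confidence grows with the number of independent layers supporting the answer, and below a threshold the agent declines rather than guesses.

```python
def answer_confidence(layers_supporting):
    """Hypothetical scoring rule: sum the (invented) weights of the layers
    that independently support the answer."""
    weights = {"pgvector": 0.4, "age": 0.35, "memory": 0.25}
    return sum(weights[layer] for layer in layers_supporting)

def present(answer, layers_supporting, threshold=0.5):
    """Below the threshold, decline instead of guessing."""
    score = answer_confidence(layers_supporting)
    if score < threshold:
        return "I don't have a reliable answer for this."
    return f"{answer} (confidence {score:.2f})"

print(present("Delays caused by scope creep", {"pgvector", "age", "memory"}))
print(present("Unverified guess", {"memory"}))  # declined: one layer is not enough
```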

Vague questions, sharp answers

Back to the question we started with: “what went wrong with that project with the difficult client?”

In our architecture, the following happens:

  1. pgvector finds documents that semantically relate to project problems, delays, and client feedback. From the top results, the application layer extracts concrete entities: a client name, project numbers, involved employees.
  2. Apache AGE takes those entities as starting points and traverses the relationship network. All connected project phases, delays, scope changes, and retrospectives are retrieved — not via vague search terms, but via exact graph relationships.
  3. Memory recognizes that this user previously asked similar questions about the same project and knows which context was relevant then.

The result: a concrete, documented answer with source references. Not a generic story about “common project risks.”

Implementation: weeks, not months

The beauty of this stack is that it does not need to be a greenfield project. Most companies already run PostgreSQL. pgvector and Apache AGE are extensions you install, not migrations you execute.

Implementation follows three phases:

  1. Week 1-2: Data audit and embedding pipeline setup. Existing documents, notes, and knowledge base articles are indexed in pgvector.
  2. Week 3-4: Graph model built in Apache AGE. Entities and relationships from your existing systems are modeled.
  3. Week 5-6: Memory layer activated and agent fine-tuned to your specific domain. Multiple filters calibrated based on test results.

After six weeks, you have an AI agent that searches your internal knowledge base with accuracy that surpasses manual searching. And it keeps getting better as more people use it.

Conclusion

The race for AI agents is not a race for the best language model. It is a race for the best retrieval architecture. The model generating your answers is only as good as the information it receives.

A three-layer system with pgvector for semantic search, Apache AGE for relational context, and persistent memory with multiple filters delivers the accuracy that business-critical applications require. Not 90%. Not 95%. But 99.99%.

And it all runs on PostgreSQL. Open-source, proven, scalable. No exotic infrastructure, no vendor lock-in, no surprises on the invoice.