Ateneo — Dossier (English)

§ 02 — Definition

What is Ateneo

An AI-assisted editorial research instrument that distinguishes between solid ground and open reading. Not a chatbot. Not a wrapper. An architecture designed so that every claim has an explicit confidence level.

Level	Mode	Nature	Function
Solid ground	CITA	Deterministic, no LLM	Verifiable literal citability. Pure documentary search with cache. The reader can go to the text and judge for themselves.
Documentary finding	FUENTE	Deterministic + semantic rescue	Passage location. FTS headline + trigram fallback + semantic rescue with page matching.
Knotted reading	MAPA	Framed probabilistic	Open but verifiable interpretation. Guardrails + gap detection + auditing. It does not hallucinate — it rereads.

In Lacan we are not looking for a certainty machine, but a system that distinguishes between verifiable citation and anchored assisted reading. Ateneo does not repeat — it rereads. And it does so anchored in the text.

Master Context · Ateneo · 2026

§ 03 — Architecture

How it works: layer-by-layer verification

An LLM is a probabilistic generator: it produces fluent text, but does not guarantee that text is faithful to the corpus. Ateneo interposes six verification layers between the model and the user.

🚪

Pre-LLM

1. Whitelist — Only admitted evidence

The LLM never sees data from outside the curated corpus. Suspension of all external judgment: "you can only speak about what is in these books."
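A minimal sketch of the whitelist idea, assuming fragments carry the identifier of the work they come from. The names `ALLOWED_WORKS`, `Fragment`, `admit_evidence`, and `build_context` are hypothetical and not taken from Ateneo's code; the point is only that filtering happens before any prompt is assembled.

```python
from dataclasses import dataclass

# Hypothetical corpus identifiers; the real whitelist is not shown in the source.
ALLOWED_WORKS = {"SEM XI", "SEM XX", "Meditations"}


@dataclass(frozen=True)
class Fragment:
    work: str  # identifier of the book the fragment comes from
    page: int
    text: str


def admit_evidence(fragments, allowed_works=ALLOWED_WORKS):
    """Keep only fragments whose source work belongs to the curated corpus."""
    return [f for f in fragments if f.work in allowed_works]


def build_context(fragments):
    """Render the admitted fragments as the only evidence the model will see."""
    admitted = admit_evidence(fragments)
    return "\n".join(f"[{f.work}, p.{f.page}] {f.text}" for f in admitted)
```

Anything that does not pass `admit_evidence` never reaches the prompt, so the model cannot quote it.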

✂️

Post-LLM

2. stripDisallowedCitations — Sanitization

Reviews the model's output and strikes out any citation outside the allowed list. "Meditations, Book XV" → does not exist (there are only 12). Removed.
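The sanitization step can be sketched as follows, assuming citations use the `[Book, p.N]` bracket form mentioned elsewhere in this dossier. The regular expression and the `(work, page)` allow-list shape are assumptions for illustration; the real `stripDisallowedCitations` may work differently.

```python
import re

# Matches citations such as [SEM XI, p.45]. The pattern is an assumption
# based on the "[Book, p.N]" convention, not Ateneo's actual parser.
CITATION_RE = re.compile(r"\[([^\[\],]+),\s*p\.\s*(\d+)\]")


def strip_disallowed_citations(text, allowed):
    """Remove every [Work, p.N] citation whose (work, page) pair is not in `allowed`."""

    def keep_or_drop(match):
        key = (match.group(1).strip(), int(match.group(2)))
        return match.group(0) if key in allowed else ""

    cleaned = CITATION_RE.sub(keep_or_drop, text)
    # Collapse the double spaces left behind by removed citations.
    return re.sub(r"[ \t]{2,}", " ", cleaned).strip()
```

Usage: with `allowed = {("SEM XI", 45)}`, a sentence citing both `[SEM XI, p.45]` and `[SEM XXX, p.9]` keeps the first citation and loses the second.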

⚖️

Rule

3. Quote Contract — Explicit contract

Prompt with formal instruction: "If you cannot find the citation, say you cannot find it. Do NOT invent." Sets the framework; if it fails, layers 4–6 catch it.

🔍

Logic

4. Gap detection — Suspicious jumps

Detects unjustified temporal or conceptual jumps. Citation from Seminar 2 (1955) alongside a reference to Seminar 20 (1972) without explanation → alarm.
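A rough sketch of the temporal half of this check, assuming each cited seminar maps to a year. The two years come from the example above; `SEMINAR_YEAR`, `find_temporal_gaps`, and the 10-year threshold are hypothetical. Deciding whether a jump is actually explained in the prose is left to a later step and is not modelled here.

```python
# Years taken from the example in the text; a real table would cover every seminar.
SEMINAR_YEAR = {2: 1955, 20: 1972}


def find_temporal_gaps(cited_seminars, max_gap_years=10, years=SEMINAR_YEAR):
    """Return (first, second, gap) for consecutive citations more than max_gap_years apart."""
    gaps = []
    for first, second in zip(cited_seminars, cited_seminars[1:]):
        gap = abs(years[second] - years[first])
        if gap > max_gap_years:
            gaps.append((first, second, gap))
    return gaps
```

A non-empty result would raise the alarm described above; an empty list means no suspicious jump was found between consecutive citations.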

📁

Classification

5. ANCLA / NÚCLEO / VECINDAD / RUIDO

Each retrieved fragment is classified by proximity to the citation. Only ANCLA (anchor: the exact citation) and NÚCLEO (core: its immediate context) reach the LLM. VECINDAD (vicinity) and RUIDO (noise) are discarded.
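One way to picture the four classes, assuming proximity is measured as page distance from the cited page. The distance metric and the radii are assumptions made for this sketch; the source does not say how Ateneo measures proximity.

```python
from enum import Enum


class Proximity(Enum):
    ANCLA = "anchor"      # the exact citation
    NUCLEO = "core"       # immediate context
    VECINDAD = "vicinity"
    RUIDO = "noise"


def classify(fragment_page, cited_page, core_radius=1, vicinity_radius=5):
    """Classify a fragment by its page distance from the cited page (hypothetical metric)."""
    distance = abs(fragment_page - cited_page)
    if distance == 0:
        return Proximity.ANCLA
    if distance <= core_radius:
        return Proximity.NUCLEO
    if distance <= vicinity_radius:
        return Proximity.VECINDAD
    return Proximity.RUIDO


def select_for_llm(fragment_pages, cited_page):
    """Keep only the pages classified as ANCLA or NUCLEO."""
    keep = {Proximity.ANCLA, Proximity.NUCLEO}
    return [p for p in fragment_pages if classify(p, cited_page) in keep]
```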

👥

Cross-check

6. Cross-validation — Consensus across sources

Cross-checks sources against each other. In Marcus Aurelius: 4 languages (Greek, English, Spanish, French). If "confirmed" in English but without a correlate in Greek → inconsistency.
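The consensus rule in the example can be sketched like this, assuming each language reports whether it found a correlate for the citation. The status labels and the choice of Greek as the required language are illustrative assumptions, not Ateneo's actual vocabulary.

```python
def cross_validate(confirmed_by_language, required=("greek",)):
    """Combine per-language results into a single verdict.

    `confirmed_by_language` maps a language name to True or False.
    A citation confirmed somewhere but missing in a required language
    is reported as inconsistent rather than confirmed.
    """
    confirmed = {lang for lang, ok in confirmed_by_language.items() if ok}
    if not confirmed:
        return "not_found"
    if any(lang not in confirmed for lang in required):
        return "inconsistent"
    return "confirmed"
```

Usage: `cross_validate({"greek": False, "english": True, "spanish": True, "french": True})` reports an inconsistency, which matches the case described above.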

§ 03b — Hybrid search

Three methods, complementary coverage

Each search method has a different blind spot. Combined, they compensate for each other.

🔍

Trigram (pg_trgm)

Form-based search

Splits text into overlapping 3-character chunks and compares how many of them two strings share. Tolerant of typos and imperfect OCR.

✓ Detects typos, variants

✗ Does not understand meaning
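To make the mechanism concrete, here is a pure-Python approximation of trigram similarity. It follows the padding convention documented for pg_trgm (two spaces before each word, one after) and scores shared trigrams over total distinct trigrams. It is a teaching sketch, not the extension itself, and its word splitting may differ from PostgreSQL in edge cases.

```python
import re


def trigrams(text):
    """Set of 3-character chunks, padding each word with two leading and one trailing space."""
    grams = set()
    for word in re.findall(r"\w+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

A typo such as "verifcation" still shares most of its trigrams with "verification", which is why this method survives OCR noise while knowing nothing about meaning.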

🧠

Vector / HNSW

Meaning-based search

Each text → vector of 1,536 numbers. «The impediment to action...» and «τὸ ἐμποδίζον τῇ πράξει...» → nearly identical vectors.

✓ Crosses languages, synonyms

✗ Can confuse ambiguous concepts
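"Nearly identical vectors" is usually measured with cosine similarity. The sketch below computes it exactly for two plain lists of numbers; an HNSW index serves the same comparison approximately and at scale. Whether Ateneo uses cosine distance specifically is an assumption.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 means same direction)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Two passages in different languages score close to 1.0 when their embeddings point in the same direction, regardless of the words used.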

📖

FTS (Full-Text Search)

Word-based search

Analytical index with morphological flexibility. "verificar" finds "verificación" and "verificado".

✓ Morphological roots, precision

✗ Does not cross languages

4 formal contracts. 80 cases. 80/80 live. End-to-end runtime auditing. The difference is not the system's volume, but its discipline: verifiable contracts, checkers, CI, and traceability persistence when the model responds.

§ 04 — Evidence of reality

Production, not prototype

Ateneo has been in production since February 2026 operating on real humanities corpora. These are the verified metrics.

80/80

Static and live validation — 20 cases per mode, 4 modes, 0 failures in the current battery

CI · GitHub Actions · production

4

Formal contracts — CITA, FUENTE, MAPA and Stoic Verify

Contracts · Python checkers · CI

Runtime

Persisted auditing — claims, evidence, and policy snapshots confirmed end-to-end

Production · per-response traceability

Stoic citation verification in Ateneo — result CONFIRMED 95/100 with multilingual contrast — Ateneo Stoic Edition · Production verification · Citation cross-checked against original Greek text (Leopold 1908) in 4 languages

Lacan Corpus ↗

Seminars 1–23 in Spanish and French

Curated corpus of Lacan's 23 Seminars. OCR corrected, pagination aligned with Paidós editions. The hard case: multiple editions, translations with variance, oral seminars transcribed.

Stoic Corpus ↗

Marcus Aurelius + Seneca + Epictetus in 4 languages

Critical apparatus with original Greek text (Leopold 1908, Long 1862), English, Spanish, and French. Cross-verification across four languages. Public domain editions with stable textual tradition.

How do I know it doesn't invent citations? Three levels: (1) citations [Book, p.N] are validated against the deterministic database — if it didn't come from there, it is removed; (2) quoted literals are verified character by character against original snippets; (3) runtime auditing cross-checks languages to detect inconsistencies. The cita_fast contract is 100% deterministic, no LLM.
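Level (2) can be illustrated with a small check, assuming quoted literals are delimited by straight double quotes or guillemets and must appear verbatim in at least one retrieved snippet. The function name and the quote-detection regex are hypothetical; curly quotes and normalization of whitespace are not handled in this sketch.

```python
import re

# Captures text inside "..." or «...»; an assumption about how literals are marked.
QUOTED_RE = re.compile(r'"([^"]+)"|«([^»]+)»')


def unverified_literals(answer, snippets):
    """Return the quoted literals in `answer` that match no snippet character for character."""
    literals = [straight or guillemet for straight, guillemet in QUOTED_RE.findall(answer)]
    return [quote for quote in literals if not any(quote in s for s in snippets)]
```

An empty result means every quoted literal was found verbatim; anything returned is a candidate for removal or abstention.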

§ 05 — State of the art

What the field proposes and what Ateneo has

Papers from the last 6 months on citation verification in LLMs, cross-referenced against Ateneo's architecture.

FACTUM

Johns Hopkins / DARPA · Jan 2026

Mechanistic theory: hallucination is an Attention vs Feed-Forward failure in the LLM.

Provides theoretical framework for why Ateneo's guards work. Not competition — justification.

Diagnostic

SemanticCite

U. Sydney · Nov 2025

Verify whether the citation actually supports what is said, not just whether it exists.

Ateneo verifies existence. Shared pending gap: verifying interpretive prose.

Partial

CiteGuard

U. Waterloo + U. Illinois · Oct 2025

Attribution alignment with retrieval-augmented validation. 65.4% accuracy on CiteME, close to human level (69.7%).

Ateneo follows this same scheme: retrieval → LLM → verification. In specialized humanities, precision tends to drop due to corpus complexity.

✅ Yes

CheckIfExist

— · Jan 2026

Verify that bibliographic references actually exist before including them.

stripDisallowedCitations: if the citation [SEM XI, p.N] did not come from the DB, it is automatically removed.

✅ Yes

HalluCitation Matters

NAIST, Japan · 2025

Documents fabricated references in papers accepted at ACL, NAACL, and EMNLP — top conferences.

The problem is so severe that it contaminates peer-reviewed academic publications.

Diagnostic

Citation Failure / CITENTION

TU Darmstadt · Sep 2025

Distinguishes citation failure (incomplete citation) from response failure (incorrect response).

cita_fast: no response failure possible (deterministic). FUENTE separates citation verification from response quality.

✅ Arch.

Deployment Constraints & Citation Hallucination

NYU / CMU / Boston U · Mar 2026

Empirical study on how production prompting constraints aggravate citation hallucination, evaluated on Claude Sonnet, GPT-4o, LLaMA, and Qwen with a deterministic verification pipeline.

cita_fast is 100% deterministic and LLM-free, which is exactly what this paper shows to work. arXiv:2603.07287

✅ Yes

PaperAsk

arXiv · Oct 2025

Quantitative benchmark: citation retrieval fails 48–98% on GPT-4o, GPT-5, and Gemini.

Not anecdote but reproducible numbers on the best models under real conditions.

Benchmark

OpenScholar

Allen Institute · 2024

Real system: scientific literature synthesis with RAG and anchored citations. Paper in Nature, public demo.

Operates in STEM, not humanities. Single model (LLaMA 3.1 8B).

Product

VeriCite

arXiv · Oct 2025

Pipeline: generation → evidence selection → refinement with verified citations.

Prototype with no visible production deployment. Does not verify interpretive prose between citations.

Prototype

4 of 9 papers describe mechanisms comparable to components already implemented in Ateneo. The pending gap converges on one point: interpretive prose needs visible grounding in the text. That is precisely the territory where Ateneo is working.

Epistemological note: Quattrociocchi, Capraro, and Perc (arXiv 2512.19466, with empirical validation in PNAS) formalize this intuition under the concept of Epistemia: LLMs are not epistemic agents but stochastic completion systems where linguistic plausibility replaces epistemic evaluation. Ateneo starts from the same premise: reliability is not assumed from the model — it is designed into the contract, traceability, and abstention.

§ 05b — Scite findings · March 2026

A territory still to be explored

A — Documentary gap

0 papers on LLM citation verification in specialized humanities corpora, across 280 million indexed sources

Adjacent work is beginning to appear — such as citation attribution in novels (NAACL 2025, Michel et al.) — but the terrain of verification for specialized humanities corpora remains largely unoccupied.

Scite · 2 searches · March 10, 2026

B — SciRAG profiled

100% STEM

SciRAG (Nov 2025): operates on indexed scientific literature in English with DOIs. Different domain from Ateneo.

DOI: 10.48550/arxiv.2511.14362

C — CiteGuard unrefuted

0 contrasting citations in Scite for CiteGuard. The 65.4% on CiteME is the best available STEM result.

Scite tally · March 10, 2026

Across 280 million indexed sources, I found no papers on LLM citation verification in specialized humanities corpora. There is adjacent work — citation attribution in novels is already at top NLP conferences — but the central piece is missing: traceability and explicit judgment about what can be cited and what cannot. That is precisely the territory where Ateneo operates.

Scite Smart Citations search · March 12, 2026

Search updated March 12, 2026 — the gap in humanities corpora remains across 280M indexed sources.

§ 06 — Strategic relevance

Why humanities is the hard case

Most research on citation verification is concentrated in STEM. Ateneo operates where that research has not yet reached: specialized humanities corpora.

STEM — The easy case

Digital crutches available

Stable DOIs. Structured APIs (PubMed, Semantic Scholar). Single-language corpus (English). Factual citations: "X causes Y." Atomic verification possible and adequate.

Humanities — The hard case

No crutches, with constitutive complexity

Edition variants. Translations with editorial variance. Disputed attribution. Oral circulation. And the fundamental point: in humanities, citing is already interpreting.

What others do

Suppress the probabilistic

The defensive narrative: "we have mitigated hallucinations." It puts you in the same race as everyone else: who can best suppress what the LLM does naturally. Race to the bottom.

What Ateneo does

Frame the probabilistic

What others try to suppress, Ateneo frames with real evidence. The ground is hard. The reading is open but anchored. In humanities, this is not a compromise — it is the right approach.

Academic backing · Scite, March 2026

Interpretation as a task, not as a defect

Gadamer & Derrida · Utrecht U. · 2024 · Open Access

«Doing Justice to Poetry»

Formalizes in peer-reviewed literature the central thesis: in humanities, interpretation cannot and should not be closed. DOI: 10.33391/jgjh.171

Openness as a task

Excerpts cited in Scite

"The task of philosophical hermeneutics is 'to leave the undecidable undecided,' because 'no one knows, and no one has the power to decide,' not even Gadamer."

The formula: Ateneo is not "an LLM we patched to stop hallucinating." It is a reading device where the deterministic (CITA, FUENTE) provides solid ground and the probabilistic (MAPA) opens reading. The guardrails do not eliminate probability — they frame it. The valuable thesis is not "we turned the bug into the feature," but something more precise: we designed an architecture adequate to the domain. Recent literature is beginning to converge on the same point via another route: EviBound (arXiv 2511.05524) demonstrates through autonomous agent governance that integrity does not emerge from model size, but from explicit architectural guards. Ateneo carries that same thesis into the editorial domain.

Who it serves today

Current use cases

Ateneo currently operates on two production corpora. These are the profiles that already use it or that directly fit its architecture.

Researchers and educators

Citation verification in curated corpora

A researcher who needs to check whether a Lacan citation (Seminar, page, edition) is literal, approximate, or nonexistent. Ateneo resolves it in seconds with full traceability.

Publishers and critical edition projects

Quality control on manuscripts

An editor reviewing a manuscript with dozens of citations from Marcus Aurelius' Meditations who needs to verify them against multiple editions and languages. Ateneo cross-checks 4 languages automatically.

Research centers and universities

Validation infrastructure for applied AI

A center that wants to integrate AI into its processes but needs to guarantee that generated references are auditable. Ateneo's architecture is portable to new corpora.

Technology transfer

Pilot on a specific corpus

An institutional or technology transfer partner who wants to evaluate Ateneo on their own corpus: legal, philosophical, philological, or textual heritage.

Ateneo: an editorial
research
instrument

Leading LLMs keep
failing at citation verification

What is Ateneo

How it works:
layer-by-layer verification

1. Whitelist — Only admitted evidence

2. stripDisallowedCitations — Sanitization

3. Quote Contract — Explicit contract

4. Gap detection — Suspicious jumps

5. ANCLA / NÚCLEO / VECINDAD / RUIDO

6. Cross-validation — Consensus across sources

Three methods, complementary coverage

Trigram (pg_trgm)

Vector / HNSW

FTS (Full-Text Search)

Production, not prototype

What the field proposes
and what Ateneo has

A territory still to be explored

Why humanities
is the hard case

Interpretation as a task, not as a defect

Current use cases

A concrete conversation
