Technical and strategic dossier · March 2026

Ateneo: an editorial research instrument

A system that distinguishes between solid ground and open reading. This document presents the problem it solves, how its architecture works, the evidence that it works, and why the right solution for the humanities is not the same as for the sciences.

Updated 22.03.2026 · State of the art and bibliographic findings cross-checked with Scite Smart Citations

Dicendum est, sed ita ut nihil affirmem, quaeram omnia, dubitans plerumque et mihi diffidens
"I must speak, but without affirming anything; I shall investigate everything, doubting most of the time and distrusting myself."1
1 Marcus Tullius Cicero, De divinatione, II, 3, 8, cited in Michel de Montaigne, Los ensayos (ed. 1595), trans. J. Bayod Brau (Barcelona: Acantilado, 2007), 587.

Leading LLMs keep failing at citation verification

PaperAsk (arXiv, Oct 2025) evaluated GPT-4o, GPT-5, and Gemini-2.5-Flash on real-world academic tasks. Not just in controlled benchmarks, but under real working conditions.

48–98%
Failure rate in citation retrieval with multi-reference queries
PaperAsk · citation retrieval task
72–91%
Failure in content extraction by section
PaperAsk · content extraction task
<0.32
F1 in paper discovery — over 60% of relevant papers are missed
PaperAsk · paper discovery task
Chart: citation retrieval failure ranges from 48% (best case) to 98% (worst case); content extraction failure is 72–91%.
AstaBench mean accuracy (57 agents): 38.78%
Reliability is not assumed — it is designed. These data are not anecdote — they are published and reproducible benchmarks on the best available models under real conditions. The problem is structural.

What is Ateneo

An AI-assisted editorial research instrument that distinguishes between solid ground and open reading. Not a chatbot. Not a wrapper. An architecture designed so that every claim has an explicit confidence level.

Level | Mode | Nature | Function
Solid ground | CITA | Deterministic, no LLM | Verifiable literal citability. Pure documentary search with cache. The reader can go to the text and judge for themselves.
Documentary finding | FUENTE | Deterministic + semantic rescue | Passage location. FTS headline + trigram fallback + semantic rescue with page matching.
Knotted reading | MAPA | Framed probabilistic | Open but verifiable interpretation. Guardrails + gap detection + auditing. It does not hallucinate: it rereads.

In Lacan we are not looking for a certainty machine, but a system that distinguishes between verifiable citation and anchored assisted reading. Ateneo does not repeat — it rereads. And it does so anchored in the text.

Master Context · Ateneo · 2026

Two interactive diagrams to read Ateneo's architecture at a glance before entering the technical layers.

Solid ground / open reading
The two floors of Ateneo
CITA / FUENTE / MAPA
Three contracts with the reader

How it works: layer-by-layer verification

An LLM is a probabilistic generator: it produces fluent text, but does not guarantee that text is faithful to the corpus. Ateneo interposes six verification layers between the model and the user.

🚪
Pre-LLM
1. Whitelist — Only admitted evidence

The LLM never sees data from outside the curated corpus. Suspension of all external judgment: "you can only speak about what is in these books."

✂️
Post-LLM
2. stripDisallowedCitations — Sanitization

Reviews the model's output and strikes out any citation outside the allowed list. "Meditations, Book XV" → does not exist (there are only 12). Removed.
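
A post-LLM sanitizer of this kind can be sketched as follows. This is a minimal illustration, not Ateneo's implementation: the bracketed format `[Book, p.N]`, the Python name `strip_disallowed_citations`, and the shape of the allowed list (a set of work/page pairs) are all assumptions made for the example.

```python
import re

# Hypothetical citation format: "[SEM XI, p.45]". The real format may differ.
CITATION_RE = re.compile(r"\[([^\[\],]+),\s*p\.\s*(\d+)\]")


def strip_disallowed_citations(text, allowed):
    """Remove every bracketed citation that is not in the allowed set.

    `allowed` is a set of (work, page) tuples built from the retrieval step.
    Returns the sanitized text and the list of citations that were removed.
    """
    removed = []

    def _check(match):
        key = (match.group(1).strip(), int(match.group(2)))
        if key in allowed:
            return match.group(0)  # keep citations that came from the corpus
        removed.append(match.group(0))
        return ""  # strike out anything else

    cleaned = CITATION_RE.sub(_check, text)
    # Collapse the extra whitespace left behind by removed citations.
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()
    return cleaned, removed
```

Returning the removed citations alongside the cleaned text is a design choice of this sketch: it lets a later auditing step record what was struck out instead of discarding it silently.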

⚖️
Rule
3. Quote Contract — Explicit contract

Prompt with formal instruction: "If you cannot find the citation, say you cannot find it. Do NOT invent." Sets the framework; if it fails, layers 4–6 catch it.

🔍
Logic
4. Gap detection — Suspicious jumps

Detects unjustified temporal or conceptual jumps. Citation from Seminar 2 (1955) alongside a reference to Seminar 20 (1972) without explanation → alarm.
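
The temporal half of this check can be sketched in a few lines. The example below is an assumption-laden illustration: the `Cited` record, the `temporal_gaps` function, and the 10-year threshold are hypothetical, and the sketch covers neither conceptual jumps nor the test for whether the text explains the jump.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Cited:
    """A source cited in a response, with the year it is dated to."""
    label: str
    year: int


def temporal_gaps(cited, max_gap_years=10):
    """Return every pair of cited sources dated further apart than the threshold.

    Each alarm is a (label_a, label_b, gap_in_years) tuple.
    """
    alarms = []
    for a, b in combinations(cited, 2):
        gap = abs(a.year - b.year)
        if gap > max_gap_years:
            alarms.append((a.label, b.label, gap))
    return alarms
```

With the example from the text, `temporal_gaps([Cited("Seminar 2", 1955), Cited("Seminar 20", 1972)])` raises one alarm for a 17-year jump.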

📁
Classification
5. ANCLA / NÚCLEO / VECINDAD / RUIDO (anchor / core / vicinity / noise)

Each retrieved fragment is classified by proximity. Only ANCLA (exact citation) and NÚCLEO (immediate context) reach the LLM. VECINDAD and RUIDO are discarded.
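
One way to picture this filter is as a classification by distance from the exact hit. The sketch below assumes that proximity is measured in paragraphs and that the radii are 1 and 3; both are illustrative assumptions, as are the names `Tier`, `classify`, and `fragments_for_llm`.

```python
from enum import Enum


class Tier(Enum):
    ANCLA = "anchor"      # the exact citation
    NUCLEO = "core"       # immediate context
    VECINDAD = "vicinity"
    RUIDO = "noise"


def classify(distance, core_radius=1, vicinity_radius=3):
    """Classify a fragment by its distance (in paragraphs) from the exact hit."""
    distance = abs(distance)
    if distance == 0:
        return Tier.ANCLA
    if distance <= core_radius:
        return Tier.NUCLEO
    if distance <= vicinity_radius:
        return Tier.VECINDAD
    return Tier.RUIDO


def fragments_for_llm(fragments):
    """Keep only anchor and core fragments; vicinity and noise are dropped.

    `fragments` is a list of (text, distance) pairs.
    """
    keep = {Tier.ANCLA, Tier.NUCLEO}
    return [text for text, distance in fragments if classify(distance) in keep]
```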

👥
Cross-check
6. Cross-validation — Consensus across sources

Cross-checks sources against each other. In Marcus Aurelius: 4 languages (Greek, English, Spanish, French). If "confirmed" in English but without a correlate in Greek → inconsistency.
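
The consensus rule described here can be sketched as a small decision function. The status labels, the function name `cross_validate`, and the input shape (one boolean per language saying whether a correlate was found) are assumptions for illustration only.

```python
def cross_validate(found, original="greek"):
    """Compare per-language lookup results for a single citation.

    `found` maps a language name to True if a matching passage was located
    in that language's edition.
    """
    if not any(found.values()):
        return "NOT_FOUND"
    if all(found.values()):
        return "CONFIRMED"
    if not found.get(original, False):
        # Present in a translation but with no correlate in the original text.
        return "INCONSISTENT"
    return "PARTIAL"
```

A citation found in English, Spanish, and French but not in Greek would come back as `"INCONSISTENT"` under these assumptions.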

Three methods, complementary coverage

Each search method has a different blind spot. Combined, they compensate for each other.

🔍
Trigram (pg_trgm)
Form-based search

Divides text into 3-character chunks and compares. Tolerant of typos and imperfect OCR.

✓ Detects typos, variants
✗ Does not understand meaning
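
To make the idea concrete, here is a pure-Python approximation of trigram similarity. It follows the general scheme pg_trgm documents (lowercase, pad each word with two leading spaces and one trailing space, compare sets of 3-character chunks), but it is a sketch for illustration and is not guaranteed to match the extension's output in every case.

```python
import re


def trigrams(text):
    """Return the set of 3-character chunks for each word in the text."""
    grams = set()
    for word in re.findall(r"\w+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

A one-letter typo such as "verifcation" still shares most of its trigrams with "verification", which is why this method tolerates OCR noise while knowing nothing about meaning.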
🧠
Vector / HNSW
Meaning-based search

Each text → vector of 1,536 numbers. «The impediment to action...» and «τὸ ἐμποδίζον τῇ πράξει...» → nearly identical vectors.

✓ Crosses languages, synonyms
✗ Can confuse ambiguous concepts
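
"Nearly identical vectors" is usually measured with cosine similarity. The sketch below shows that comparison on toy vectors; real embeddings would have 1,536 dimensions and come from an embedding model, and an HNSW index would find near neighbours approximately instead of comparing every pair. The function name is an assumption.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_u * norm_v)
```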
📖
FTS (Full-Text Search)
Word-based search

Analytical index with morphological flexibility. "verificar" finds "verificación" and "verificado".

✓ Morphological roots, precision
✗ Does not cross languages
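
How the three methods might be combined can be sketched as a fallback chain, in the spirit of the FUENTE description (FTS first, trigram fallback, semantic rescue). The ordering, the function name `locate_passage`, and the searcher interface (a callable that returns a passage or `None`) are assumptions for this illustration.

```python
def locate_passage(query, searchers):
    """Try each search method in order and return the first hit.

    `searchers` is a sequence of (method_name, search_function) pairs; each
    search function takes the query and returns a passage or None.
    Returns (method_name, passage), or (None, None) if every method misses.
    """
    for name, search in searchers:
        hit = search(query)
        if hit is not None:
            return name, hit
    return None, None
```

Returning the method name with the passage keeps the provenance of each hit visible, so a reader can tell a word-level match from a semantic rescue.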
4 formal contracts. 80 cases. 80/80 live. End-to-end runtime auditing. The difference is not the system's volume, but its discipline: verifiable contracts, checkers, CI, and traceability persistence when the model responds.

Production, not prototype

Ateneo has been in production since February 2026 operating on real humanities corpora. These are the verified metrics.

80/80
Static and live validation — 20 cases per mode, 4 modes, 0 failures in the current battery
CI · GitHub Actions · production
4
Formal contracts — CITA, FUENTE, MAPA and Stoic Verify
Contracts · Python checkers · CI
Runtime
Persisted auditing — claims, evidence, and policy snapshots confirmed end-to-end
Production · per-response traceability
Stoic citation verification in Ateneo — result CONFIRMED 95/100 with multilingual contrast
Ateneo Stoic Edition · Production verification · Citation cross-checked against original Greek text (Leopold 1908) in 4 languages
Seminars 1–23 in Spanish and French

Curated corpus of Lacan's 23 Seminars. OCR corrected, pagination aligned with Paidós editions. The hard case: multiple editions, translations with variance, oral seminars transcribed.

Marcus Aurelius + Seneca + Epictetus in 4 languages

Critical apparatus with original Greek text (Leopold 1908, Long 1862), English, Spanish, and French. Cross-verification across four languages. Public domain editions with stable textual tradition.

How do I know it doesn't invent citations? Three levels: (1) citations [Book, p.N] are validated against the deterministic database — if it didn't come from there, it is removed; (2) quoted literals are verified character by character against original snippets; (3) runtime auditing cross-checks languages to detect inconsistencies. The cita_fast contract is 100% deterministic, no LLM.
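
The second level, literal verification, can be sketched as a substring check of every quoted span against the retrieved snippets. This is an illustration under stated assumptions: the quote styles recognized, the whitespace and Unicode normalization, and the name `unverified_literals` are hypothetical, and the normalization makes the check slightly more lenient than a strict character-by-character comparison.

```python
import re
import unicodedata

# Straight quotes, guillemets, and curly quotes (an assumed set of styles).
QUOTED_RE = re.compile(r'"([^"]+)"|«([^»]+)»|“([^”]+)”')


def normalize(text):
    """Apply Unicode NFC and collapse runs of whitespace to single spaces."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def unverified_literals(response, snippets):
    """Return the quoted literals in `response` that appear in no snippet."""
    haystacks = [normalize(s) for s in snippets]
    missing = []
    for match in QUOTED_RE.finditer(response):
        literal = normalize(next(g for g in match.groups() if g is not None))
        if not any(literal in h for h in haystacks):
            missing.append(literal)
    return missing
```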

What the field proposes and what Ateneo has

Papers from the last 6 months on citation verification in LLMs, cross-referenced against Ateneo's architecture.

FACTUM
Johns Hopkins / DARPA · Jan 2026
Mechanistic theory: hallucination is an Attention vs Feed-Forward failure in the LLM.
Provides theoretical framework for why Ateneo's guards work. Not competition — justification.
Diagnostic
SemanticCite
U. Sydney · Nov 2025
Verify whether the citation actually supports what is said, not just whether it exists.
Ateneo verifies existence. Shared pending gap: verifying interpretive prose.
Partial
CiteGuard
U. Waterloo + U. Illinois · Oct 2025
Attribution alignment with retrieval-augmented validation. 65.4% accuracy on CiteME, close to human level (69.7%).
Ateneo follows this same scheme: retrieval → LLM → verification. In specialized humanities, precision tends to drop due to corpus complexity.
✅ Yes
CheckIfExist
— · Jan 2026
Verify that bibliographic references actually exist before including them.
stripDisallowedCitations: if the citation [SEM XI, p.N] did not come from the DB, it is automatically removed.
✅ Yes
HalluCitation Matters
NAIST, Japan · 2025
Documents fabricated references in papers accepted at ACL, NAACL, and EMNLP — top conferences.
The problem is so severe that it contaminates peer-reviewed academic publications.
Diagnostic
Citation Failure / CITENTION
TU Darmstadt · Sep 2025
Distinguishes citation failure (incomplete citation) from response failure (incorrect response).
cita_fast: no response failure possible (deterministic). FUENTE separates citation verification from response quality.
✅ Arch.
Deployment Constraints & Citation Hallucination
NYU / CMU / Boston U · Mar 2026
Empirical study on how production prompting constraints aggravate citation hallucination, evaluated on Claude Sonnet, GPT-4o, LLaMA, and Qwen with a deterministic verification pipeline.
cita_fast is 100% deterministic and LLM-free: this is exactly what this paper demonstrates works. arXiv:2603.07287
✅ Yes
PaperAsk
arXiv · Oct 2025
Quantitative benchmark: citation retrieval fails 48–98% on GPT-4o, GPT-5, and Gemini.
Not anecdote but reproducible numbers on the best models under real conditions.
Benchmark
OpenScholar
Allen Institute · 2024
Real system: scientific literature synthesis with RAG and anchored citations. Paper in Nature, public demo.
Operates in STEM, not humanities. Single model (LLaMA 3.1 8B).
Product
VeriCite
arXiv · Oct 2025
Pipeline: generation → evidence selection → refinement with verified citations.
Prototype with no visible production deployment. Does not verify interpretive prose between citations.
Prototype
4 of the 10 papers listed describe mechanisms comparable to components already implemented in Ateneo. The pending gap converges on one point: interpretive prose should have visible grounding in the text. That is precisely the territory where Ateneo is working.

Epistemological note: Quattrociocchi, Capraro, and Perc (arXiv 2512.19466, with empirical validation in PNAS) formalize this intuition under the concept of Epistemia: LLMs are not epistemic agents but stochastic completion systems where linguistic plausibility replaces epistemic evaluation. Ateneo starts from the same premise: reliability is not assumed from the model — it is designed into the contract, traceability, and abstention.

A territory still to be explored

A — Documentary gap
0
Papers on LLM citation verification in specialized humanities corpora — across 280 million indexed sources
Adjacent work is beginning to appear — such as citation attribution in novels (NAACL 2025, Michel et al.) — but the terrain of verification for specialized humanities corpora remains largely unoccupied.
Scite · 2 searches · March 10, 2026
B — SciRAG profiled
100%
STEM
SciRAG (Nov 2025): operates on indexed scientific literature in English with DOIs. Different domain from Ateneo.
DOI: 10.48550/arxiv.2511.14362
C — CiteGuard unrefuted
0
Contrasting citations in Scite for CiteGuard. The 65.4% on CiteME is the best available STEM result.
Scite tally · March 10, 2026

Across 280 million indexed sources, I found no papers on LLM citation verification in specialized humanities corpora. There is adjacent work — citation attribution in novels is already at top NLP conferences — but the central piece is missing: traceability and explicit judgment about what can be cited and what cannot. That is precisely the territory where Ateneo operates.

Scite Smart Citations search · March 12, 2026

Search updated March 12, 2026 — the gap in humanities corpora remains across 280M indexed sources.

Why humanities is the hard case

Most research on citation verification is concentrated in STEM. Ateneo operates where that research has not yet reached: specialized humanities corpora.

STEM — The easy case
Digital crutches available

Stable DOIs. Structured APIs (PubMed, Semantic Scholar). Single-language corpus (English). Factual citations: "X causes Y." Atomic verification possible and adequate.

Humanities — The hard case
No crutches, with constitutive complexity

Edition variants. Translations with editorial variance. Disputed attribution. Oral circulation. And the fundamental point: in humanities, citing is already interpreting.

What others do
Suppress the probabilistic

The defensive narrative: "we have mitigated hallucinations." It puts you in the same race as everyone else: who can best suppress what the LLM does naturally. Race to the bottom.

What Ateneo does
Frame the probabilistic

What others try to suppress, Ateneo frames with real evidence. The ground is hard. The reading is open but anchored. In humanities, this is not a compromise — it is the right approach.

Interpretation as a task, not as a defect

Gadamer & Derrida · Utrecht U. · 2024 · Open Access
«Doing Justice to Poetry»

Formalizes in peer-reviewed literature the central thesis: in humanities, interpretation cannot and should not be closed. DOI: 10.33391/jgjh.171

Openness as a task
Excerpts cited in Scite

"The task of philosophical hermeneutics is 'to leave the undecidable undecided,' because 'no one knows, and no one has the power to decide,' not even Gadamer."

The formula: Ateneo is not "an LLM we patched to stop hallucinating." It is a reading device where the deterministic (CITA, FUENTE) provides solid ground and the probabilistic (MAPA) opens reading. The guardrails do not eliminate probability — they frame it. The valuable thesis is not "we turned the bug into the feature," but something more precise: we designed an architecture adequate to the domain. Recent literature is beginning to converge on the same point via another route: EviBound (arXiv 2511.05524) demonstrates through autonomous agent governance that integrity does not emerge from model size, but from explicit architectural guards. Ateneo carries that same thesis into the editorial domain.

Current use cases

Ateneo currently operates on two production corpora. These are the profiles that already use it or that directly fit its architecture.

Researchers and educators
Citation verification in curated corpora

A researcher who needs to check whether a Lacan citation (Seminar, page, edition) is literal, approximate, or nonexistent. Ateneo resolves it in seconds with full traceability.

Publishers and critical edition projects
Quality control on manuscripts

An editor reviewing a manuscript with dozens of citations from Marcus Aurelius' Meditations who needs to verify them against multiple editions and languages. Ateneo cross-checks 4 languages automatically.

Research centers and universities
Validation infrastructure for applied AI

A center that wants to integrate AI into its processes but needs to guarantee that generated references are auditable. Ateneo's architecture is portable to new corpora.

Technology transfer
Pilot on a specific corpus

An institutional or technology transfer partner who wants to evaluate Ateneo on their own corpus: legal, philosophical, philological, or textual heritage.

A concrete conversation

What we propose: a private 30-minute session where we show Ateneo running live on the Lacan or Stoic corpus. No slide decks — directly on the tool. If there is a proprietary corpus on which to evaluate portability, we can explore a scoped pilot.

Web: ateneo.pablomartinezsamper.com
Contact: pablo@pablomartinezsamper.com
Format
Private demo + open conversation

Video call or in person. No commitment. The goal is for the interlocutor to see the system operating and judge for themselves.

Exploration
Pilot on proprietary corpus

If there is a humanities or textual heritage corpus that requires verification, we can jointly evaluate the technical feasibility of a scoped adaptation.