A system that distinguishes between solid ground and open reading. This document presents the problem it solves, how its architecture works, the evidence that it works, and why the right solution for the humanities is not the same as for the sciences.
Updated 22.03.2026 · State of the art and bibliographic findings cross-checked with Scite Smart Citations
PaperAsk (arXiv, Oct 2025) evaluated GPT-4o, GPT-5, and Gemini-2.5-Flash on real-world academic tasks. Not just in controlled benchmarks, but under real working conditions.
An AI-assisted editorial research instrument that distinguishes between solid ground and open reading. Not a chatbot. Not a wrapper. An architecture designed so that every claim has an explicit confidence level.
| Level | Mode | Nature | Function |
|---|---|---|---|
| Solid ground | CITA | Deterministic, no LLM | Verifiable literal citability. Pure documentary search with cache. The reader can go to the text and judge for themselves. |
| Documentary finding | FUENTE | Deterministic + semantic rescue | Passage location. FTS headline + trigram fallback + semantic rescue with page matching. |
| Knotted reading | MAPA | Framed probabilistic | Open but verifiable interpretation. Guardrails + gap detection + auditing. It does not hallucinate — it rereads. |
With Lacan, we are not looking for a certainty machine but for a system that distinguishes between verifiable citation and anchored assisted reading. Ateneo does not repeat; it rereads. And it does so anchored in the text.
Two interactive diagrams to read Ateneo's architecture at a glance before entering the technical layers.
An LLM is a probabilistic generator: it produces fluent text, but does not guarantee that text is faithful to the corpus. Ateneo interposes six verification layers between the model and the user.
The LLM never sees data from outside the curated corpus. Suspension of all external judgment: "you can only speak about what is in these books."
Reviews the model's output and strikes out any citation outside the allowed list. "Meditations, Book XV" → does not exist (there are only 12). Removed.
Prompt with formal instruction: "If you cannot find the citation, say you cannot find it. Do NOT invent." Sets the framework; if it fails, layers 4–6 catch it.
Detects unjustified temporal or conceptual jumps. Citation from Seminar 2 (1955) alongside a reference to Seminar 20 (1972) without explanation → alarm.
Each retrieved fragment is classified by proximity. Only ANCHOR (exact citation) and CORE (immediate context) reach the LLM. VICINITY and NOISE are discarded.
Cross-checks sources against each other. In Marcus Aurelius: 4 languages (Greek, English, Spanish, French). If "confirmed" in English but without a correlate in Greek → inconsistency.
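The cross-language check in the last layer can be pictured with a minimal sketch. It assumes each passage lookup yields one boolean per language and that Greek is the reference text; the language codes, the function name `find_inconsistencies`, and the input shape are illustrative assumptions, not Ateneo's actual interface.

```python
from typing import Dict, List

# Hypothetical language codes for the four languages named in the text:
# Greek, English, Spanish, French.
LANGUAGES = ("grc", "en", "es", "fr")


def find_inconsistencies(matches: Dict[str, bool], reference: str = "grc") -> List[str]:
    """Return the languages where a passage is confirmed although the
    reference-language text (assumed here to be Greek) has no correlate."""
    if matches.get(reference, False):
        # The reference text confirms the passage: nothing to flag.
        return []
    return [
        lang
        for lang in LANGUAGES
        if lang != reference and matches.get(lang, False)
    ]


# Confirmed in English, no Greek correlate: flagged as inconsistent.
flagged = find_inconsistencies({"grc": False, "en": True, "es": False, "fr": False})
```

An empty list means the translations and the reference text agree; any non-empty result names the languages to be raised as inconsistencies.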
Each search method has a different blind spot. Combined, they compensate for each other.
Divides text into 3-character chunks and compares. Tolerant of typos and imperfect OCR.
Each text → vector of 1,536 numbers. «The impediment to action...» and «τὸ ἐμποδίζον τῇ πράξει...» → nearly identical vectors.
Analytical index with morphological flexibility. "verificar" finds "verificación" and "verificado".
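To make the trigram method concrete, here is a minimal sketch of 3-character chunking with a plain Jaccard overlap as the comparison. The function names and the choice of Jaccard are assumptions made for illustration; the production system may normalize text and score matches differently.

```python
from typing import Set


def trigrams(text: str) -> Set[str]:
    """Split text into overlapping 3-character chunks after lowercasing
    and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + 3] for i in range(len(normalized) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two trigram sets, between 0.0 and 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# An OCR-style error ("m" read as "rn") still shares most chunks with the original.
ocr_score = trigram_similarity("impediment", "impedirnent")
unrelated_score = trigram_similarity("impediment", "advancement")
```

Because a single misread character only disturbs the few chunks that contain it, the damaged word keeps a clearly higher score than an unrelated one, which is what makes this method tolerant of typos and imperfect OCR.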
Ateneo has been in production since February 2026 operating on real humanities corpora. These are the verified metrics.
Curated corpus of Lacan's 23 Seminars. OCR corrected, pagination aligned with Paidós editions. The hard case: multiple editions, translations with variance, oral seminars transcribed.
Critical apparatus with original Greek text (Leopold 1908), English (Long 1862), Spanish, and French. Cross-verification across four languages. Public domain editions with stable textual tradition.
The cita_fast contract is 100% deterministic, with no LLM involved.
Papers from the last 6 months on citation verification in LLMs, cross-referenced against Ateneo's architecture.
stripDisallowedCitations: if the citation [SEM XI, p.N] did not come from the DB, it is automatically removed.
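A minimal sketch of this kind of filter follows. It is a Python rendering under stated assumptions, not the actual `stripDisallowedCitations` code: the citation pattern is modeled only on the `[SEM XI, p.N]` example, and the allowed set stands in for whatever the database lookup returns.

```python
import re
from typing import Set, Tuple

# Assumed citation shape, modeled on the "[SEM XI, p.N]" example in the text.
CITATION_RE = re.compile(r"\[SEM\s+([IVXLC]+),\s*p\.\s*(\d+)\]")


def strip_disallowed_citations(text: str, allowed: Set[Tuple[str, int]]) -> str:
    """Remove every citation whose (seminar, page) pair is not in `allowed`.

    `allowed` is a hypothetical stand-in for the citations actually
    retrieved from the database for this answer.
    """
    def keep_or_drop(match: "re.Match[str]") -> str:
        key = (match.group(1), int(match.group(2)))
        return match.group(0) if key in allowed else ""

    cleaned = CITATION_RE.sub(keep_or_drop, text)
    # Collapse the double spaces left behind by removed citations.
    return re.sub(r"[ \t]{2,}", " ", cleaned).strip()


allowed_citations = {("XI", 45)}
draft = "The gaze is split [SEM XI, p.45] and returns [SEM XI, p.999] later."
result = strip_disallowed_citations(draft, allowed_citations)
```

The filter is purely deterministic: it never asks the model whether a citation is plausible, only whether it appears in the allowed set.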
cita_fast is 100% deterministic and LLM-free, which is exactly the approach this paper shows to work. arXiv:2603.07287
Across 280 million indexed sources, I found no papers on LLM citation verification in specialized humanities corpora. There is adjacent work — citation attribution in novels is already at top NLP conferences — but the central piece is missing: traceability and explicit judgment about what can be cited and what cannot. That is precisely the territory where Ateneo operates.
Search updated March 12, 2026 — the gap in humanities corpora remains across 280M indexed sources.
Most research on citation verification is concentrated in STEM. Ateneo operates where that research has not yet reached: specialized humanities corpora.
Stable DOIs. Structured APIs (PubMed, Semantic Scholar). Single-language corpus (English). Factual citations: "X causes Y." Atomic verification possible and adequate.
Edition variants. Translations with editorial variance. Disputed attribution. Oral circulation. And the fundamental point: in the humanities, citing is already interpreting.
The defensive narrative: "we have mitigated hallucinations." It puts you in the same race as everyone else: who can best suppress what the LLM does naturally. A race to the bottom.
What others try to suppress, Ateneo frames with real evidence. The ground is hard. The reading is open but anchored. In humanities, this is not a compromise — it is the right approach.
Formalizes in peer-reviewed literature the central thesis: in humanities, interpretation cannot and should not be closed. DOI: 10.33391/jgjh.171
"The task of philosophical hermeneutics is 'to leave the undecidable undecided,' because 'no one knows, and no one has the power to decide,' not even Gadamer."
Ateneo currently operates on two production corpora. These are the profiles that already use it or that directly fit its architecture.
A researcher who needs to check whether a Lacan citation (Seminar, page, edition) is literal, approximate, or nonexistent. Ateneo resolves it in seconds with full traceability.
An editor reviewing a manuscript with dozens of citations from Marcus Aurelius' Meditations who needs to verify them against multiple editions and languages. Ateneo cross-checks 4 languages automatically.
A center that wants to integrate AI into its processes but needs to guarantee that generated references are auditable. Ateneo's architecture is portable to new corpora.
An institutional or technology transfer partner who wants to evaluate Ateneo on their own corpus: legal, philosophical, philological, or textual heritage.
Video call or in person. No commitment. The goal is for the interlocutor to see the system operating and judge for themselves.
If there is a humanities or textual heritage corpus that requires verification, we can jointly evaluate the technical feasibility of a scoped adaptation.