Methodology

How this works

Transparency means showing our work. Here is exactly how documents enter Epstein Suite, how they're indexed, and how every answer is grounded and cited.

1. Where the documents come from

We index only official, public releases: records unsealed by courts, disclosed by the Department of Justice, and released by congressional committees. We do not source from leaks, hacks, or private collections. Every release we ingest is listed — with its source organization, official URL, and date — on the Sources page, and every document links back to that source.

2. Ingestion & OCR

Official releases arrive as scanned PDFs. For each release we:

Download the files from their official location.
Split them into individual documents and record the source reference (docket entry, Bates range, or release page number).
Run optical character recognition (OCR) to extract machine-readable text from the scans.
Store both the text and a pointer back to the original file's official URL.

OCR is imperfect. Scanned, redacted, and handwritten pages produce errors. Wherever we show document text we label it as machine OCR and link to the original so you can verify it yourself. The original record always governs.
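As an illustration only, and not a description of our production code, the record kept for each document and the way its text is labeled could be sketched as follows. The class name, field names, label wording, and example URL are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentRecord:
    """One document split out of an official release (hypothetical schema)."""
    release_id: str    # assumed identifier for the release on the Sources page
    source_ref: str    # docket entry, Bates range, or release page number
    official_url: str  # pointer back to the original file's official location
    ocr_text: str      # machine-extracted text; may contain OCR errors


def display_text(record: DocumentRecord) -> str:
    """Render OCR text with a machine-OCR label and a link to the original."""
    return (
        "[Machine OCR; the original record governs]\n"
        f"{record.ocr_text}\n"
        f"Source: {record.source_ref}\n"
        f"Original: {record.official_url}"
    )
```

Keeping the official URL on the record itself, rather than looking it up separately, means no text can be displayed without its link back to the original.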

3. Search & the name index

The extracted text is indexed for full-text search, so you can search for any phrase across every document, subject to the accuracy of the OCR. We also detect the people and organizations named in the records and build a page for each one. A name page collects the verbatim passages that mention that entity, each with its source reference and a link to the document and its official source. It is a finding aid, a way to navigate public records, and not a dossier.
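To make the idea concrete, here is a minimal sketch of a phrase search over an inverted index and of collecting verbatim passages for a name page. It is an assumption-laden toy, not our actual search engine: the function names are hypothetical, the tokenizer is deliberately simple, and name detection is reduced to a case-insensitive string match where a real system would use proper entity recognition.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of document ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search_phrase(phrase: str, docs: dict[str, str],
                  index: dict[str, set[str]]) -> list[str]:
    """Return ids of documents containing the phrase as consecutive tokens."""
    tokens = tokenize(phrase)
    if not tokens:
        return []
    # Candidate documents must contain every token of the phrase.
    candidates = set.intersection(*(index.get(t, set()) for t in tokens))
    # Confirm adjacency; padding with spaces avoids partial-token matches.
    needle = " " + " ".join(tokens) + " "
    return sorted(
        doc_id for doc_id in candidates
        if needle in " " + " ".join(tokenize(docs[doc_id])) + " "
    )


def name_page(name: str, docs: dict[str, str],
              source_refs: dict[str, str]) -> list[dict[str, str]]:
    """Collect verbatim sentences mentioning a name, with source references."""
    passages = []
    for doc_id, text in docs.items():
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if name.lower() in sentence.lower():
                passages.append({
                    "passage": sentence,  # quoted verbatim, never paraphrased
                    "doc_id": doc_id,
                    "source_ref": source_refs[doc_id],
                })
    return passages
```

Note that the name page stores the sentence exactly as it appears in the text, alongside its source reference, and adds no ranking or scoring.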

4. How the AI answers are grounded

The "Ask the documents" box uses retrieval-augmented generation. When you ask a question, we:

Search the indexed documents for the passages most relevant to your question.
Pass only those passages to a language model, with instructions to answer using the provided text and nothing else.
Require the model to attach a citation — shown as S1, S2 — to each claim, pointing at the source passage it came from.
List those sources beneath the answer, each linking to the document page and to the original on its official release.
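The labeling and prompting steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the `Passage` type, the function names, the bracketed `[S1]` citation format, and the instruction wording are all hypothetical, and the retrieval step and the call to the language model are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    """A retrieved passage and where it came from (hypothetical schema)."""
    doc_id: str
    text: str
    official_url: str


def label_passages(passages: list[Passage]) -> dict[str, Passage]:
    """Assign citation labels S1, S2, ... in retrieval order."""
    return {f"S{i}": p for i, p in enumerate(passages, start=1)}


def build_prompt(question: str, labeled: dict[str, Passage]) -> str:
    """Build a prompt that contains only the retrieved passages."""
    context = "\n".join(f"[{label}] {p.text}" for label, p in labeled.items())
    return (
        "Answer using only the passages below and nothing else. "
        "Attach a citation such as [S1] to each claim. "
        "If the passages do not support an answer, say so.\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {question}"
    )
```

Because the same label-to-passage mapping is used to build the prompt and to render the source list, each citation in an answer can be resolved back to a document page and its official URL.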

If the retrieved passages don't support an answer, the system is built to say so rather than speculate: the answer is marked as not grounded, and we point you back to the search and the source documents. We never present a model's guess as if it were in the record.
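One way such a safeguard could work is a structural check on the model's output before it is shown. The sketch below is an assumption, not our actual implementation: it treats an answer as grounded only if every sentence carries at least one citation and every citation points at a passage that was really supplied. It assumes citations are written as `[S1]` before the sentence's final punctuation, and it does not verify that the cited passage actually supports the claim, which would need a further check.

```python
import re

CITATION = re.compile(r"\[(S\d+)\]")

NOT_GROUNDED_MESSAGE = (
    "The retrieved passages do not support an answer. "
    "Try the search and the source documents."
)


def is_grounded(answer: str, valid_labels: set[str]) -> bool:
    """True if every sentence cites at least one supplied passage label."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return False
    for sentence in sentences:
        cited = CITATION.findall(sentence)
        # Reject uncited sentences and citations to passages never supplied.
        if not cited or any(label not in valid_labels for label in cited):
            return False
    return True


def present(answer: str, valid_labels: set[str]) -> dict[str, object]:
    """Return the answer if grounded; otherwise a not-grounded notice."""
    if is_grounded(answer, valid_labels):
        return {"grounded": True, "answer": answer}
    return {"grounded": False, "answer": NOT_GROUNDED_MESSAGE}
```

The check fails closed: an empty answer, a single uncited sentence, or a single unknown label is enough to mark the whole answer as not grounded.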

5. The not-an-accusation policy

People appear in these documents for many reasons: as witnesses, as names in an address book, as people mentioned by others, as third parties, or simply in passing. Appearing in a record says nothing, by itself, about a person's conduct.

Being named in these documents is not an accusation or evidence of wrongdoing. Our name pages report only what the public records literally contain, with citations. We do not rank, score, or imply guilt; we do not aggregate "allegations"; and where a page notes a legally documented status (for example, a public charge, conviction, or civil settlement), it is stated neutrally and factually, with the presumption of innocence intact.

6. Corrections

If we have mis-attributed a passage, confused two people with the same name, or otherwise misrepresented what a record says, we want to fix it. Email [email protected] with the page URL and the specifics. We review corrections promptly. We index public records and cite them, so we generally cannot remove the underlying public facts — but we will correct any error in how we present them.

7. What we deliberately don't do

We don't host the original files — we link to the official source.
We don't publish private, leaked, or unredacted material.
We don't editorialize, sensationalize, or draw conclusions about individuals.
We don't let the AI answer beyond what the retrieved passages support.

Methodology last reviewed June 2026. Questions: [email protected].