Why reversing binarization
changes everything
Conventional OCR has a fundamental assumption baked into every pipeline: the background is noise. Characters are dark marks on a light surface — or light marks on a dark surface — and the job of binarization is to separate them cleanly. Background pixels are discarded. Only the foreground survives.
Entreacte inverts this assumption. The background is not noise. The background is signal.
The mathematics of the void
Consider a line of printed Latin text. Between each character, there is a gap. Between words, a larger gap. Between lines, the largest gap of all. These gaps are not random — they are structurally constrained by the typography, the language, the writer, and the medium. They form a lattice of intervals with measurable properties: width, height, aspect ratio, proximity to neighbouring intervals.
In Entreacte, we call these gaps différance objects — borrowing Derrida's term for the productive difference that makes meaning possible. A page of French prose yields 1,675 différance objects. A page of Arabic calligraphy yields a different distribution. A page of CJK ideographs yields another. The distribution is a fingerprint of the script, the language, and the document.
Why this produces speedups
Standard OCR engines spend most of their computation on foreground processing: detecting character boundaries, normalising scale, feeding patches through deep neural networks trained on millions of labelled examples. This is expensive, especially for dense ideographic scripts where a single CJK character can contain dozens of strokes.
Reversed binarization skips the character entirely. We process the intervals — and intervals are structurally simpler than characters. A différance object is a connected component of white (or near-white) pixels. Extracting connected components is O(n) in image size. The resulting lattice encodes line structure, word boundaries, and document layout without ever looking at a single character glyph.
On the Derrida benchmark (De la grammatologie p. 11, 1517×2492px), the interval extraction stage runs in 1.27 seconds at 150dpi — comparable to Tesseract's 0.58 seconds, but producing not just a text string but a full geometric representation of the document's structure.
What the representation contains
The output of reversed binarization is richer than text. Each différance object carries:
- Bounding box coordinates and area
- Aspect ratio and compactness
- Proximity graph edges to neighbouring objects
- Line band membership (which text line it belongs to)
- Cellular automata entropy contribution
This representation feeds directly into the GNN layer, the steganographic fingerprint, and the dual-channel CNN — all without any language-specific training data. The same pipeline processes Latin, Arabic, CJK, Devanagari, Hebrew, Greek, and Cyrillic without modification.
That universality is the core claim of the patent. Not that we read characters better — but that we read documents without reading characters at all.