Fichero transcribes and auto-catalogues historical archives using vision large language models and artificial intelligence, running locally or in the cloud. It is in early development. To learn more about Fichero, please read the Fichero development blog and the frequently asked questions.
Releases
0.0.2 – May 25, 2025
- Added support for running on multiple processors in parallel and for asynchronous transcription calls (Fichero_director.py).
0.0.1 – May 9, 2025
- Initial development release, which crops, splits, rotates, enhances, removes background, and transcribes.
Frequently Asked Questions
Q: What does Fichero do?
A: Fichero processes and transcribes large collections (hundreds to tens of thousands) of scanned or photographed documents. It crops, splits notebooks, cleans, segments, and runs OCR (Optical Character Recognition) or HTR (Handwritten Text Recognition) on the pages using AI, then outputs the results in formats like Word, Markdown, or PDF, automatically and at scale.
You can customize each step or run it end-to-end with a single command. Fichero is designed for researchers and archivists who need to transform thousands of archival pages into clean, searchable text.
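To make the idea of customizable steps and an end-to-end run concrete, here is a minimal sketch in Python. It is not Fichero's actual code: the step functions (`crop`, `enhance`), the `transcribe` stub, and `run_pipeline` are hypothetical names invented for illustration, and the model call is replaced by a placeholder. It only shows one way a list of steps could be chained per page while transcription calls run asynchronously with a cap on concurrency.

```python
import asyncio

# Hypothetical step functions (not Fichero's API): each takes a page
# record (a dict) and returns an updated copy that logs the step.
def crop(page):
    return {**page, "steps": page["steps"] + ["crop"]}

def enhance(page):
    return {**page, "steps": page["steps"] + ["enhance"]}

async def transcribe(page):
    # Placeholder for a request to a local or cloud vision model.
    await asyncio.sleep(0)
    return {
        **page,
        "steps": page["steps"] + ["transcribe"],
        "text": f"text of {page['name']}",
    }

async def process_page(page, steps, limit):
    # Run the chosen image steps in order, then transcribe.
    for step in steps:
        page = step(page)
    async with limit:  # cap the number of simultaneous model calls
        return await transcribe(page)

async def run_pipeline(names, steps=(crop, enhance), max_concurrent=4):
    limit = asyncio.Semaphore(max_concurrent)
    pages = [{"name": name, "steps": []} for name in names]
    # gather returns results in the same order as the input pages.
    return await asyncio.gather(
        *(process_page(page, steps, limit) for page in pages)
    )

results = asyncio.run(run_pipeline(["page_001.jpg", "page_002.jpg"]))
```

Passing a different `steps` tuple changes which stages run, and `max_concurrent` bounds how many transcription requests are in flight at once.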
Q: How do I use Fichero?
A: Carefully. Fichero is in development on GitHub. If you are comfortable using the terminal, installing Python apps, setting up environments, and running LM Studio or signing up for an Alibaba, OpenAI, or Claude API key, then follow the instructions on GitHub.
Otherwise, wait. I’m working on improving Fichero and on making it easier to use by packaging it as an app with Briefcase and Toga.
Follow along on my development notes page, or stay tuned here.
I’m happy to hear your thoughts. Email daniel@tubb.ca.