Making Fichero: A Diary of App Making (1/31)

Spent yesterday refactoring Fichero to get it to build on iOS. Some issues: Toga does not support Commands on iOS, and iOS doesn’t support multiple Windows. I removed the Document management code, which never worked. Next, I’ll refactor the windows so that their views can be used in the Main Window.

This is a screenshot of it running on iOS. Not bad, for what began (and remains) a Typer CLI app.

iOS screenshot.

This morning, I did some design thinking with ChatGPT about the Library structure, and then the user interface on Mobile versus Desktop. I’ve been resisting doing this as a Library app. But I think it will make it more widely used.

For the library structure, this makes sense:

Library → Collection → Folder → Folder … → File → Page

Something like this:

Library
├── Chocó Mining Reports (Collection)

├── 1930s (Folder)
│ ├── Labor Disputes (Folder)
│ │ ├── Ibargüen v. Bonilla (File)
│ │ │ ├── Fiches (File-level, auto):
│ │ │ │ - auto catalogue
│ │ │ │ - auto translation
│ │ │ │ - auto keywords
│ │ │ │ - auto named entities
│ │ │ ├── Page 1
│ │ │ │ └── Fiches:
│ │ │ │ - OCR text (auto)
│ │ │ │ - original image
│ │ │ │ - enhanced image
│ │ │ ├── Page 2
│ │ │ │ └── Fiches:
│ │ │ │ - OCR text (auto)
│ │ │ │ - original image
│ │ │ │ - enhanced image
│ │ │ └── Page 3
│ │ │ └── Fiches:
│ │ │ - OCR text (auto)
│ │ │ - original image
│ │ │ - enhanced image

│ ├── Mining Rights (Folder)
│ │ ├── Domínguez v. Garrido (File)
│ │ │ ├── Fiches:
│ │ │ │ - auto catalogue
│ │ │ │ - auto named entities
│ │ │ │ - auto keywords
│ │ │ ├── Page 1
│ │ │ │ └── Fiches:
│ │ │ │ - OCR text (auto)
│ │ │ │ - enhanced image
│ │ │ └── Page 2
│ │ │ └── Fiches:
│ │ │ - OCR text (auto)
│ │ │ - original image

│ ├── Criminal Proceedings (Folder)
│ │ ├── Ibargüen v. Bonilla (Criminal) (File)
│ │ │ ├── Fiches:
│ │ │ │ - auto catalogue
│ │ │ │ - auto translation
│ │ │ │ - auto named entities
│ │ │ ├── Page 1
│ │ │ │ └── Fiches:
│ │ │ │ - OCR text (auto)
│ │ │ │ - original image
│ │ │ ├── Page 2
│ │ │ │ └── Fiches:
│ │ │ │ - OCR text (auto)
│ │ │ │ - enhanced image
│ │ │ └── Page 3
│ │ │ └── Fiches:
│ │ │ - OCR text (auto)
│ │ │ - original image

├── 1910s (Folder)
│ ├── Mining Rights (Folder)
│ │ ├── Anglo-Colombian Dev. Co. v. Ismael Rodríguez (File)
│ │ │ ├── Fiches:
│ │ │ │ - auto catalogue
│ │ │ │ - auto translation
│ │ │ │ - auto named entities
│ │ │ ├── Page 1
│ │ │ │ └── Fiches:
│ │ │ │ - OCR text (auto)
│ │ │ │ - original image
│ │ │ ├── Page 2
│ │ │ │ └── Fiches:
│ │ │ │ - OCR text (auto)
│ │ │ │ - enhanced image
│ │ │ └── Page 3
│ │ │ └── Fiches:
│ │ │ - OCR text (auto)
│ │ │ - original image

├── 1940s (Folder)
│ ├── Labor Disputes (Folder)
│ │ ├── Mosquera v. Chocó Pacífico (File)
│ │ │ ├── Fiches:
│ │ │ │ - auto catalogue
│ │ │ │ - auto keywords
│ │ │ │ - auto named entities
│ │ │ ├── Page 1
│ │ │ │ └── Fiches:
│ │ │ │ - OCR text (auto)
│ │ │ │ - original image
│ │ │ ├── Page 2
│ │ │ │ └── Fiches:
│ │ │ │ - OCR text (auto)
│ │ │ │ - enhanced image
│ │ │ └── Page 3
│ │ │ └── Fiches:
│ │ │ - OCR text (auto)
│ │ │ - original image
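
A minimal sketch of how this hierarchy might look as Python dataclasses. The class and field names are assumptions made for illustration, not Fichero's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Fiche:
    kind: str            # e.g. "OCR text", "auto catalogue", "original image"
    auto: bool = False   # True when a tool generated it


@dataclass
class Page:
    number: int
    fiches: list[Fiche] = field(default_factory=list)


@dataclass
class File:
    title: str
    fiches: list[Fiche] = field(default_factory=list)  # file-level fiches
    pages: list[Page] = field(default_factory=list)


@dataclass
class Folder:
    name: str
    folders: list["Folder"] = field(default_factory=list)  # folders can nest
    files: list[File] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    folders: list[Folder] = field(default_factory=list)


@dataclass
class Library:
    collections: list[Collection] = field(default_factory=list)


def count_pages(folder: Folder) -> int:
    """Count pages in a folder, including all nested folders."""
    own = sum(len(f.pages) for f in folder.files)
    return own + sum(count_pages(sub) for sub in folder.folders)


# Rebuild a small corner of the tree above.
case = File(
    title="Ibargüen v. Bonilla",
    fiches=[Fiche("auto catalogue", auto=True)],
    pages=[Page(1, [Fiche("OCR text", auto=True), Fiche("original image")])],
)
decade = Folder("1930s", folders=[Folder("Labor Disputes", files=[case])])
library = Library([Collection("Chocó Mining Reports", folders=[decade])])
```

Because folders can contain folders, anything that walks the Library (counting pages, building the Folder Tree view) ends up recursive, as in `count_pages`.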

I also wireframed the user interface, on Desktop and Mobile, and made some notes.

Desktop (macOS, Windows, Linux)

  • Main Window = Library View
    • 3-column layout:
    • Collections (left, 20%)
    • Folder Tree (middle, 20%)
    • File and Content Viewer (right, 60%) (25%/75% vertical split)
  • Utility Windows:
    • Settings, Prompt Library, Plan Library, Activity, About, Help
    • Each is a separate Window or View opened via menu or toolbar commands

Mobile (iOS / Android)

Structure

  • No separate windows — use stacked navigation
  • Navigation handled via:
    • Hamburger menu
    • Tab bar
    • Back navigation

Example Navigation Flow

  • Home screen = Library
    • Tap through:
    • Collection → Folder Tree → File Fiches (auto catalogues etc.) and Pages (images, etc.) Viewer/Editor
  • Tap menu or tab for:
    • Settings
    • Prompt Library
    • Plan Library
    • Activity
    • Help
    • About

Fichero Update and Working Towards 0.1.0-dev

Where am I in Fichero?

I spent the two weeks at the end of June doing a pretty major rewrite. So what did that involve?

I had a director.py module that handled all the parallel scripts. It was a mammoth: 1,200 lines of disorganized spaghetti code. It ran on Celery/Redis, as a way of spreading the work across separate processes. But how to share data between them? As a hack, it just duplicated the folders to process and ran the tools on multiple sets of folders simultaneously. I liked it because it worked well enough. The problem was that director.py was super complex, very interconnected, and hard to maintain. So, I spent a week in June refactoring it.

Now, there are various modules of a director backend that each document window can talk to. “Take this folder,” and it processes it. The director.py sends files to various modules, and keeps track of various things.

It prepares the folders, copying them into place.

Then a workflow executor loads the plan files, which are stored either in the application’s resources folder, or in the default file locations for Linux, macOS, or Windows settings. The plan is a modified YAML version of the Weasel format, but with the Weasel-specific parts removed and simplified for the workflow executor. The executor communicates with a backend for multiprocessing, which then calls the tools.
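
As a rough sketch, the plan lookup could work something like this. The `plans` subfolder, the `.yml` extension, the `fichero` directory name, and the rule that a user's copy wins over the bundled one are all assumptions for illustration. Parsing the YAML itself is left out.

```python
import os
import sys
from pathlib import Path
from typing import Optional

APP_NAME = "fichero"  # assumed directory name


def user_config_dir(platform: str = sys.platform) -> Path:
    """Return the conventional per-user settings folder for each OS."""
    home = Path.home()
    if platform == "darwin":
        return home / "Library" / "Application Support" / APP_NAME
    if platform.startswith("win"):
        base = os.environ.get("APPDATA")
        return (Path(base) if base else home / "AppData" / "Roaming") / APP_NAME
    base = os.environ.get("XDG_CONFIG_HOME")
    return (Path(base) if base else home / ".config") / APP_NAME


def find_plan(
    name: str, resources_dir: Path, user_dir: Optional[Path] = None
) -> Optional[Path]:
    """Look for a plan in the user folder first, then in the app resources."""
    user_dir = user_dir or user_config_dir()
    for folder in (user_dir / "plans", resources_dir / "plans"):
        candidate = folder / f"{name}.yml"
        if candidate.is_file():
            return candidate
    return None  # caller decides what to do when no plan is found
```

Checking the user folder first would let someone override a bundled plan without touching the application itself.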

The multiprocessing backend is either Celery/Redis, or a Python-native backend using concurrent.futures and threads.

I’ve got the Python-native backend working well, but haven’t tested Celery/Redis recently, so I have turned it off.

For both, we have two kinds of workers:
• CPU workers, for CPU-intensive tasks (like image manipulation), and
• I/O workers, which are much less intensive and handle tasks that involve waiting—e.g., writing to disk or waiting on network access (especially inference from the big language models).

At present, the Python backend only works with the CPU workers, and you can set how many it uses.

The way it will work is: we spin up four CPU workers to handle image processing, and once that’s complete, the images get passed to the I/O workers. Ideally, the four CPU workers can keep things moving efficiently. This works well enough. But I need to get the I/O workers working again for the Python backend.
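
The handoff between the two kinds of workers might be sketched like this with concurrent.futures. The functions `enhance_image` and `transcribe` are hypothetical stand-ins for the real tools, and the worker counts are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def enhance_image(path: str) -> str:
    # Stand-in for a CPU-heavy image step.
    return path.replace(".jpg", "_enhanced.jpg")


def transcribe(path: str) -> str:
    # Stand-in for an I/O-bound step, such as waiting on an LLM.
    return f"transcript of {path}"


def run_pipeline(paths, cpu_workers=4, io_workers=8):
    """Run the CPU step on one pool, then pass each result to the I/O pool."""
    with ThreadPoolExecutor(max_workers=cpu_workers) as cpu_pool:
        with ThreadPoolExecutor(max_workers=io_workers) as io_pool:
            cpu_futures = [cpu_pool.submit(enhance_image, p) for p in paths]
            # Results are handed over in submission order; each result()
            # call blocks until that image's CPU step has finished.
            io_futures = [
                io_pool.submit(transcribe, f.result()) for f in cpu_futures
            ]
            return [f.result() for f in io_futures]


results = run_pipeline(["page1.jpg", "page2.jpg"])
```

One caveat with threads: pure-Python CPU work does not run in parallel because of the GIL, so threaded CPU workers only pay off when the image library releases the GIL while it works.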

The tools can also run their own multithreading, e.g. for LLM calls, or when they are only running on one folder. This setup works well, for example, when sending requests to AI models—doing so using threads and concurrent.futures.

Why concurrent.futures and threads, and not async or subprocesses? Because Toga uses async, and I was running into issues trying to do async communication with the Python backend. Subprocesses were not happy in a macOS build.

In short, refactoring involved breaking out all the utilities of director into smaller files. It’s now much clearer how it works, and it’s easier to fix bugs. But it ends up being way more code. I suspect Cursor.ai has over-engineered it, and I’m slowly going through each module to make sure it’s minimal. But that can wait for a later revision. I’m sure there are some things that are stupid or don’t make much sense, but as time allows, I’ll refactor each module to get it tighter with less code.

Speaking of other changes, I have put together a Toga GUI, synced as much as possible with the CLI. It now builds on a Mac, and should run in Briefcase dev mode on Linux and Windows, though I haven’t gotten around to testing that yet, as I don’t have a PC.

There are big issues to address in the future. One is the backend data model. Currently, each step creates its own manifest.jsonl file, so we generate a new one each time. I wonder about revising it so each folder has a single manifest file, which gets updated and edited in place. The nice thing about the current process is that it’s reliable with multiple threads, but I think we could do better. Still, troubleshooting tools in the current setup is easy. I can go into the Finder and make changes to the files directly. If it’s abstracted too much, it might be easier to code, but harder to understand.
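
For reference, a per-step manifest.jsonl is just one JSON object per line. A minimal sketch of writing and reading one follows. The record fields (`file`, `status`) and the step folder name are hypothetical, not Fichero's actual schema.

```python
import json
import tempfile
from pathlib import Path


def write_manifest(step_dir: Path, records: list[dict]) -> Path:
    """Write one JSON object per line to <step_dir>/manifest.jsonl."""
    step_dir.mkdir(parents=True, exist_ok=True)
    manifest = step_dir / "manifest.jsonl"
    with manifest.open("w", encoding="utf-8") as fh:
        for record in records:
            # ensure_ascii=False keeps names like "Ibargüen" readable on disk
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return manifest


def read_manifest(manifest: Path) -> list[dict]:
    """Read a manifest back, skipping blank lines."""
    with manifest.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    path = write_manifest(
        Path(tmp) / "02_enhance",
        [{"file": "Ibargüen_p1.jpg", "status": "ok"}],
    )
    records = read_manifest(path)
```

Since every step writes its own file, no two threads ever write to the same manifest, and each one stays easy to open and fix by hand.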

Fichero is for researchers. It doesn’t have to be super slick. In fact, the clearer it is to the people using it how it works, the less “magic” it will feel. I hope the transparency makes it more useful, and it gives users the ability to use it as a tool, not a replacement for their work.

I also added more reliable error checking. It used to be that if we hit an error in one process, it would just keep going. Now it stops.
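
One way to get this stop-on-error behavior with concurrent.futures is sketched below. It is a generic pattern, not necessarily how Fichero does it, and the name `run_fail_fast` is made up for the example.

```python
from concurrent.futures import FIRST_EXCEPTION, ThreadPoolExecutor, wait


def run_fail_fast(func, items, max_workers=4):
    """Run func over items in threads and stop scheduling work on an error."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(func, item) for item in items]
        # Returns as soon as one task raises, or when all tasks finish.
        done, not_done = wait(futures, return_when=FIRST_EXCEPTION)
        for future in not_done:
            future.cancel()  # only affects tasks that have not started yet
        for future in done:
            future.result()  # re-raises a worker's exception, if there is one
        return [f.result() for f in futures]
```

Tasks that are already running cannot be interrupted this way. They finish, but nothing new starts, and the error reaches the caller instead of being swallowed.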

Along with a GUI, inspired by Hyperspace and BBEdit, I spent some time working on a task monitoring interface. That means having a clean display that shows what the processors are doing, what errors have occurred, etc.

Anyway, what’s left? Lots. But, I’m hoping to get a build anyone can run for 0.1.0-dev.

This requires:

  • Building the macOS version, and testing it and the CLI version: That is, making sure everything runs smoothly on macOS, both the GUI and the CLI.
  • Testing the Windows version and CLI version: Run the build on Windows. Make sure the GUI launches, folders can be selected, and things process properly. Same for the CLI—it should run clean.

  • Testing the Linux version and CLI version: Try it out on a couple of Linux distros. Make sure GTK stuff works, files save in the right place, and multiprocessing behaves. Both UI and CLI.

  • Double-check the whole UI is translated. I added some internationalization. Go through the interface and make sure all the text is showing up in the right language. Add any missing translations. I think we can add a function to figure out the language from the OS at launch.

  • Build: Get builds working and ready to share. Signed if possible. Test install and run on all three platforms.

  • Update the documentation: Rewrite or clean up the README, dev notes, and any how-to guides. They should reflect where the app is now—not where it was a month ago.

  • Disable unused UI stuff. I hid buttons and features that aren’t wired up yet—Plans, Prompts, Commands like Save, Open, etc. No need to confuse people with stuff that doesn’t work yet. I need to double check that.

  • Create a workflow that goes straight to transcription: Not everyone wants their images changed. If I can pull it off, I’d like users to be able to edit workflows directly in the app. Doesn’t have to be fancy, just a way to open, tweak, and save a plan.
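
For the language-detection item in the list above, a function along these lines might do, using only the standard library. The supported languages (`en`, `es`) and the English fallback are assumptions. GUI apps launched outside a terminal may not see the environment variables at all, so this is a starting point, not a complete answer.

```python
import locale
import os

SUPPORTED = ("en", "es")  # assumed set of available translations


def pick_language(locale_name, supported=SUPPORTED, default="en"):
    """Map a locale name such as 'es_CO.UTF-8' to a supported language code."""
    if not locale_name:
        return default
    code = locale_name.replace("-", "_").split("_")[0].split(".")[0].lower()
    return code if code in supported else default


def detect_os_language(supported=SUPPORTED, default="en"):
    """Guess the UI language from the environment, then from the locale."""
    for var in ("LC_ALL", "LC_MESSAGES", "LANG"):
        value = os.environ.get(var)
        if value:
            return pick_language(value, supported, default)
    try:
        # May be (None, None), or a long name such as "English_..." on Windows
        name = locale.getlocale()[0]
    except ValueError:
        name = None
    return pick_language(name, supported, default)
```

Anything the function cannot recognize falls back to the default, so a strange locale string never leaves the interface without a language.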

fichero_director.py: Running Fichero on Multiple Processors

What is Fichero? It is a Python-based pipeline for large-scale document processing—think thousands of scanned pages, each requiring cropping, enhancement, OCR, and transcription.

I’ve had versions of Fichero working for six months. But it was slow: Fichero processed one step at a time, then the next. The challenge was taking advantage of computing power and multiple processors without getting bogged down.

This week, I got fichero_director.py working, which uses Celery to run Fichero “workers” on multiple processes, each doing different steps at the same time.

Imagine one cook making 100 pizzas, one after the other, versus 8 cooks making 8 pizzas at a time.

How does it work? fichero_director.py breaks down workflows into CPU-intensive and I/O-intensive steps. For example, image tasks are CPU-heavy, while transcription using language models or converting to Word documents is mostly I/O-bound, waiting on disk or LLM inference. Tasks are sent to Celery queues based on script type.
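
The routing idea can be sketched in plain Python. The script names and the queue names `cpu` and `io` are hypothetical, and sending unknown scripts to the I/O queue is an assumption.

```python
CPU_QUEUE = "cpu"
IO_QUEUE = "io"

# Hypothetical script names mapped to the kind of work they do.
SCRIPT_QUEUES = {
    "crop": CPU_QUEUE,
    "enhance": CPU_QUEUE,
    "transcribe": IO_QUEUE,
    "export_docx": IO_QUEUE,
}


def queue_for(script: str) -> str:
    """Pick a queue by script type; unknown scripts go to the I/O queue."""
    return SCRIPT_QUEUES.get(script, IO_QUEUE)


def group_by_queue(steps):
    """Split a workflow's steps into per-queue lists, keeping their order."""
    routed = {CPU_QUEUE: [], IO_QUEUE: []}
    for step in steps:
        routed[queue_for(step)].append(step)
    return routed


routed = group_by_queue(["crop", "enhance", "transcribe", "export_docx"])
```

With Celery, the chosen name would be passed as the `queue` option of `apply_async`, and each worker would be started with `-Q` so that it listens only to its own queue.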

fichero_director.py tries to tune for the host system. On M1 Macs, for example, the hope is that CPU workers use the performance cores, while more numerous I/O workers handle slower operations on the efficiency cores.

On my machine, I have 8 cores, and fichero_director.py uses them all.
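
A simple way to size the two pools from the core count is sketched here. The ratios (half the cores for CPU workers, two I/O workers per core) are assumptions for illustration, not the values fichero_director.py uses. Which physical cores the workers land on is left to the operating system.

```python
import os


def plan_workers(cores=None, io_per_core=2):
    """Return (cpu_workers, io_workers) for a machine with `cores` cores."""
    cores = cores or os.cpu_count() or 1  # cpu_count() can return None
    cpu_workers = max(1, cores // 2)      # e.g. 4 on an 8-core machine
    io_workers = max(1, cores * io_per_core)
    return cpu_workers, io_workers


cpu_workers, io_workers = plan_workers(8)
```

I/O workers spend most of their time waiting, so running more of them than there are cores costs little.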

To follow along, there is a simple dashboard that tracks real-time progress, showing each folder’s status and current step. Each folder is processed independently, with logs written per folder and per step.