Where am I in Fichero?
I spent the two weeks at the end of June doing a pretty major rewrite. So what did that involve?
I had a director.py module that handled all the parallel scripts. It was a mammoth: 1200 lines of disorganized spaghetti code. It ran on Celery/Redis as a way of doing multiprocessing in separate processes, which left open the question of how to share data. I liked it because it worked, as a hack. Basically, it just duplicated the folders to process and ran the tools on multiple sets of folders simultaneously. It worked well enough. The problem was that director.py was super complex, very interconnected, and hard to maintain. So, I spent a week in June refactoring it.
Now, the director backend is split into various modules that each document window can talk to. A window says “Take this folder,” and the backend processes it. director.py sends files to the various modules and keeps track of various things.
It prepares the folders, copying them into place.
Then a workflow executor loads the plan files, which are stored either in the application’s resources folder, or in the default file locations for Linux, macOS, or Windows settings. The plan is a modified YAML version of the Weasel format, but with the Weasel-specific parts removed and simplified for the workflow executor. The executor communicates with a backend for multiprocessing, which then calls the tools.
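A minimal sketch of how that plan lookup might go. Everything here is an assumption for illustration: the `plans` folder name, the per-OS paths (which just follow each platform's usual conventions), the search order, and the helper names `user_plan_dir` and `find_plan`. It only locates the file; parsing the YAML is left out.

```python
from __future__ import annotations

import os
import sys
from pathlib import Path

APP_NAME = "Fichero"  # used only to build illustrative folder names


def user_plan_dir(platform: str = sys.platform, home: Path | None = None) -> Path:
    """Return a conventional per-user settings folder for each OS (assumed layout)."""
    home = home or Path.home()
    if platform.startswith("win"):
        base = Path(os.environ.get("APPDATA", home / "AppData" / "Roaming"))
        return base / APP_NAME / "plans"
    if platform == "darwin":
        return home / "Library" / "Application Support" / APP_NAME / "plans"
    # Linux and other Unix-likes: follow the XDG convention.
    base = Path(os.environ.get("XDG_CONFIG_HOME", home / ".config"))
    return base / APP_NAME.lower() / "plans"


def find_plan(name: str, search_dirs: list[Path]) -> Path | None:
    """Return the first existing plan file called `name`, or None if there is none."""
    for folder in search_dirs:
        candidate = folder / name
        if candidate.is_file():
            return candidate
    return None
```

Calling `find_plan("default.yaml", [user_plan_dir(), resources_dir])` would let a user's copy override the bundled one; whether that is the right precedence is a design choice, not something the current code is known to do.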
The multiprocessing backend is either Celery/Redis, or a Python-native backend using concurrent.futures and threads.
I’ve got the Python-native backend working well, but haven’t tested Celery/Redis recently. So, I’ve turned it off.
For both, we have two kinds of workers:
• CPU workers, for CPU-intensive tasks (like image manipulation), and
• I/O workers, which are much less intensive and handle tasks that involve waiting—e.g., writing to disk or waiting on network access (especially inference from the big language models).
At present, the Python backend only works with the CPU workers, and you can set how many of them to run.
The way it will work is: we spin up four CPU workers to handle image processing, and once that’s complete, the images get passed to the I/O workers. Ideally, the four CPU workers can keep things moving efficiently. This works well enough. But, I need to get the I/O workers working again for the Python backend.
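The two-pool idea can be sketched with concurrent.futures and threads. This is an illustration, not the actual backend: `process_image`, `save_result`, and `run_pipeline` are hypothetical stand-ins, and the I/O pool size of eight is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def process_image(name: str) -> str:
    """Stand-in for a CPU-heavy step such as image manipulation."""
    return name.upper()


def save_result(name: str) -> str:
    """Stand-in for an I/O-bound step such as writing to disk."""
    return f"saved:{name}"


def run_pipeline(names, cpu_workers=4, io_workers=8):
    """Run every item through the CPU pool, then hand it to the I/O pool."""
    with ThreadPoolExecutor(max_workers=cpu_workers) as cpu_pool, \
            ThreadPoolExecutor(max_workers=io_workers) as io_pool:
        # Stage 1: queue all the CPU-bound work on the CPU pool.
        cpu_futures = [cpu_pool.submit(process_image, n) for n in names]
        # Stage 2: as each CPU result is collected (in input order),
        # submit it to the I/O pool.
        io_futures = [io_pool.submit(save_result, f.result()) for f in cpu_futures]
        # Results come back in the same order as the input.
        return [f.result() for f in io_futures]
```

Keeping the pools separate means a slow write or network call does not occupy one of the four CPU slots.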
The tools can also run their own multithreading, e.g. for LLM calls, or when they are only running on one folder. This setup works well, for example, when sending requests to AI models—doing so using threads and concurrent.futures.
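Inside a single tool, the same pattern might look like the sketch below. `fake_model_call` is a hypothetical placeholder that only sleeps to simulate network wait; no real model API is called, and `transcribe_all` is an invented name.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_model_call(prompt: str) -> str:
    """Placeholder for a request to a language model (simulated wait only)."""
    time.sleep(0.01)
    return f"response to {prompt}"


def transcribe_all(prompts, max_threads=4):
    """Send several requests at once and collect the answers keyed by prompt."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        future_to_prompt = {pool.submit(fake_model_call, p): p for p in prompts}
        # as_completed yields each future as soon as it finishes,
        # so a slow request does not hold up the others.
        for future in as_completed(future_to_prompt):
            results[future_to_prompt[future]] = future.result()
    return results
```

Because the threads spend most of their time waiting on the network, a small pool is usually enough to keep several requests in flight.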
Why concurrent.futures and threads (and not async or subprocesses)? Because Toga uses async, and I was running into issues trying to do async communication with the Python backend. Subprocesses were not happy in a macOS build.
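One common way for an async event loop to hand blocking work to a thread pool is `loop.run_in_executor`. The sketch below uses plain asyncio rather than Toga, so `on_button_press` only stands in for an async GUI handler, and `blocking_job` is a hypothetical backend call. It is not necessarily how Fichero wires things together.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def blocking_job(folder: str) -> str:
    """Stand-in for backend work that blocks (no async inside)."""
    return f"processed {folder}"


async def on_button_press(folder: str, pool: ThreadPoolExecutor) -> str:
    """Async handler: offload the blocking call so the event loop stays free."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, blocking_job, folder)


def demo() -> str:
    """Run the handler once outside any GUI, just to show the flow."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        return asyncio.run(on_button_press("scans", pool))
```

The backend code stays ordinary synchronous Python; only the thin handler layer needs to know about async.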
In short, the refactoring involved breaking out all the utilities of director into smaller files. It’s now much clearer how it works, and it’s easier to fix bugs. But, it ends up being way more code. I suspect Cursor.ai has over-engineered it, and I’m slowly going through each module to make sure it’s minimal. But, that can wait for a later revision. I’m sure there are some things that are stupid or don’t make much sense, but as time allows, I’ll refactor each module to get it tighter with less code.
Speaking of other changes, I have put together a Toga GUI, synced as much as possible with the CLI. It now builds on a Mac, and it should run in Briefcase dev mode on Linux and Windows, though I haven’t gotten around to testing that yet, as I don’t have a PC.
There are big issues to address in the future. One is the backend data model. Currently, each step creates its own manifest.jsonl file; we generate a new one each time. I wonder about revising this so each folder has a single manifest file, which gets updated and edited in place. The nice thing about the current process is that it’s reliable with multiple threads, but I think we could do better. Still, troubleshooting tools in the current setup is easy. I can go into the Finder and make changes to the files directly. If it’s abstracted too much, it might be easier to code, but harder to understand.
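For reference, a per-step manifest.jsonl can be written and read back with a few lines of Python. This is a sketch under assumptions: the helper names and the record fields (`file`, `step`, `status`) are hypothetical, not Fichero’s real schema.

```python
import json
from pathlib import Path


def write_manifest(step_dir: Path, records: list) -> Path:
    """Write one JSON object per line into <step_dir>/manifest.jsonl."""
    step_dir.mkdir(parents=True, exist_ok=True)
    path = step_dir / "manifest.jsonl"
    with path.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


def read_manifest(path: Path) -> list:
    """Read a manifest back, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Since each step writes a fresh file in its own folder, no two workers touch the same manifest, and the file remains plain text that can be opened and fixed by hand.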
Fichero is for researchers; it doesn’t have to be super slick. In fact, the clearer it is to the people using it how it works, the less “magic” it will feel. I hope the transparency makes it more useful, and it gives users the ability to use it as a tool, not a replacement for their work.
I also added more reliable error checking. It used to be that if we hit an error in one process, it would just keep going. Now it stops.
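With concurrent.futures, stop-on-first-error behavior can be sketched as below. `run_until_error` is a hypothetical helper, not the function Fichero uses.

```python
from concurrent.futures import FIRST_EXCEPTION, ThreadPoolExecutor, wait


def run_until_error(func, items, max_workers=4):
    """Run func over items in threads; stop and re-raise at the first failure."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(func, item) for item in items]
        # Returns as soon as any task raises, or when all tasks have finished.
        done, not_done = wait(futures, return_when=FIRST_EXCEPTION)
        # Only tasks still waiting in the queue can be cancelled;
        # tasks that are already running will finish.
        for future in not_done:
            future.cancel()
        for future in done:
            error = future.exception()
            if error is not None:
                raise error
        # No errors: return results in input order.
        return [f.result() for f in futures]
```

A thread that is already running cannot be interrupted this way, so a long task may still complete after the error is detected; only the queued work is dropped.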
Along with a GUI, inspired by Hyperspace and BBEdit, I spent some time working on a task monitoring interface. The goal is a clean display that shows what the processors are doing, what errors have occurred, etc.
Anyway, what’s left? Lots. But, I’m hoping to get a build anyone can run for 0.1.0-dev.
This requires:
- Building the macOS version, and testing it and the CLI version: That is, making sure everything runs smoothly on macOS, both the GUI and the CLI.
- Testing the Windows version and CLI version: Run the build on Windows. Make sure the GUI launches, folders can be selected, and things process properly. Same for the CLI: it should run clean.
- Testing the Linux version and CLI version: Try it out on a couple of Linux distros. Make sure GTK stuff works, files save in the right place, and multiprocessing behaves. Both UI and CLI.
- Double-check the whole UI is translated: I added some internationalization. Go through the interface and make sure all the text is showing up in the right language. Add any missing translations. I think we can add a function to figure out language from the OS at launch.
- Build: Get builds working and ready to share. Signed if possible. Test install and run on all three platforms.
- Update the documentation: Rewrite or clean up the README, dev notes, and any how-to guides. They should reflect where the app is now, not where it was a month ago.
- Disable unused UI stuff: I hid buttons and features that aren’t wired up yet (Plans, Prompts, Commands like Save, Open, etc.). No need to confuse people with stuff that doesn’t work yet. I need to double check that.
- Create a workflow that goes straight to transcription: Not everyone wants their images changed. If I can pull it off, I’d like users to be able to edit workflows directly in the app. Doesn’t have to be fancy, just a way to open, tweak, and save a plan.
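For the translation item in the list above, a function that guesses the UI language from the OS at launch could start from something like this. It is a rough sketch: the `SUPPORTED` tuple and the helper names are hypothetical, and it relies only on the usual POSIX locale environment variables plus `locale.getlocale()`. A GUI app on macOS or Windows may not see those values, so platform-specific handling would likely be needed on top.

```python
from __future__ import annotations

import locale
import os

SUPPORTED = ("en", "es", "fr")  # hypothetical list of shipped translations


def normalize(tag: str | None, supported=SUPPORTED, default: str = "en") -> str:
    """Reduce a tag like 'es_MX.UTF-8' or 'fr-CA' to a supported language code."""
    if not tag:
        return default
    code = tag.replace("-", "_").split("_")[0].split(".")[0].lower()
    return code if code in supported else default


def detect_language(supported=SUPPORTED, default: str = "en") -> str:
    """Guess the UI language from the environment, falling back to `default`."""
    # POSIX locale variables, in their usual order of precedence.
    for var in ("LC_ALL", "LC_MESSAGES", "LANG"):
        value = os.environ.get(var)
        if value and value not in ("C", "POSIX"):
            return normalize(value, supported, default)
    try:
        tag = locale.getlocale()[0]
    except (ValueError, TypeError):
        tag = None
    return normalize(tag, supported, default)
```

Anything unrecognized falls back to the default language, so a wrong guess never leaves the interface untranslated or broken.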