Fichero Update and Working Towards 0.1.0-dev

Where am I in Fichero?

I spent the two weeks at the end of June doing a pretty major rewrite. So what did that involve?

I had a director.py module that handled all the parallel scripts. It was a mammoth: 1,200 lines of disorganized spaghetti code. It ran on Celery/Redis as a way of doing multiprocessing in separate processes. But how to share data between them? My answer was a hack, and I liked it because it worked. Basically, it just duplicated the folders to process and ran the tools on multiple sets of folders simultaneously. It worked well enough. The problem was that director.py was super complex, very interconnected, and hard to maintain. So, I spent a week in June refactoring it.

Now the director backend is split into various modules that each document window can talk to. A window says, “Take this folder,” and the backend processes it. director.py sends files to the various modules and keeps track of the work.

It prepares the folders, copying them into place.

Then a workflow executor loads the plan files, which are stored either in the application’s resources folder, or in the default file locations for Linux, macOS, or Windows settings. The plan is a modified YAML version of the Weasel format, but with the Weasel-specific parts removed and simplified for the workflow executor. The executor communicates with a backend for multiprocessing, which then calls the tools.

The multiprocessing backend is either Celery/Redis, or a Python-native backend using concurrent.futures and threads.

I’ve got the Python-native backend working well, but haven’t tested Celery/Redis recently. So, for now, I have turned it off.

For both, we have two kinds of workers:
• CPU workers, for CPU-intensive tasks (like image manipulation), and
• I/O workers, which are much less intensive and handle tasks that involve waiting—e.g., writing to disk or waiting on network access (especially inference from the big language models).

At present, the Python backend only works with the CPU workers, and you can set how many it uses.

The way it will work is: we spin up four CPU workers to handle image processing, and once that’s complete, the images get passed to the I/O workers. Ideally, the four CPU workers can keep things moving efficiently. This works well enough. But I need to get the I/O workers working again for the Python backend.

The tools can also run their own multithreading, e.g. for LLM calls, or when they are only running on one folder. This setup works well, for example, when sending requests to AI models—doing so using threads and concurrent.futures.
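A minimal sketch of the CPU-worker to I/O-worker hand-off described above, using only concurrent.futures and threads. This is not Fichero’s actual code: the function names, the worker counts, and the stand-in tasks are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_image(name: str) -> str:
    """Hypothetical stand-in for a CPU-heavy step such as image manipulation."""
    return name.upper()


def transcribe(image: str) -> str:
    """Hypothetical stand-in for a step that mostly waits (disk, network, LLM)."""
    return f"transcript of {image}"


def run_pipeline(names, cpu_workers=4, io_workers=8):
    """Run every file through the CPU pool, then hand it to the I/O pool."""
    results = {}
    with ThreadPoolExecutor(max_workers=cpu_workers) as cpu_pool, \
            ThreadPoolExecutor(max_workers=io_workers) as io_pool:
        cpu_futures = {cpu_pool.submit(process_image, n): n for n in names}
        io_futures = {}
        # Pass each image to the I/O pool as soon as its CPU step finishes.
        for future in as_completed(cpu_futures):
            name = cpu_futures[future]
            io_futures[io_pool.submit(transcribe, future.result())] = name
        # Collect the I/O results, keyed by the original file name.
        for future in as_completed(io_futures):
            results[io_futures[future]] = future.result()
    return results
```

Calling run_pipeline(["page1.jpg", "page2.jpg"]) returns a dictionary keyed by file name. Because both pools are thread pools, nothing here depends on async or on subprocesses.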

Why concurrent.futures and threads, and not async or subprocesses? Because Toga uses async, and I was running into issues trying to do async communication with the Python backend. Subprocesses were not happy in a macOS build.

In short, refactoring involved breaking out all the utilities of director into smaller files. It’s now much clearer how it works, and it’s easier to fix bugs. But it ends up being way more code. I suspect Cursor.ai has over-engineered it, and I’m slowly going through each module to make sure it’s minimal. But that can wait for a later revision. I’m sure there are some things that are stupid or don’t make much sense, but as time allows, I’ll refactor each module to get it tighter with less code.

Speaking of other changes, I have put together a Toga GUI, synced as much as possible with the CLI. It now builds on a Mac, and it should run in Briefcase dev mode on Linux and Windows, though I haven’t gotten around to testing that yet, as I don’t have a PC.

There are big issues to address in the future. One is the backend data model. Currently, each step creates its own manifest.jsonl file, so we generate a new one each time. I wonder about revising this so each folder has a single manifest file that gets updated and edited in place. The nice thing about the current process is that it’s reliable with multiple threads, but I think we could do better. Still, troubleshooting tools in the current setup is easy. I can go into the Finder and make changes to the files directly. If it’s abstracted too much, it might be easier to code, but harder to understand.
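To make the current one-manifest-per-step approach concrete, here is a small sketch of writing and reading a manifest.jsonl file: one JSON object per line, in a plain file that can still be opened and edited by hand. The record fields and folder names are hypothetical, not Fichero’s actual schema.

```python
import json
from pathlib import Path


def write_manifest(step_dir: Path, records: list[dict]) -> Path:
    """Write one JSON object per line to <step_dir>/manifest.jsonl."""
    step_dir.mkdir(parents=True, exist_ok=True)
    path = step_dir / "manifest.jsonl"
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


def read_manifest(path: Path) -> list[dict]:
    """Read the manifest back, skipping any blank lines."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Since every step writes a fresh file in its own folder, two threads never write to the same manifest, which is presumably why this layout holds up well with multiple workers.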

Fichero is for researchers; it doesn’t have to be super slick. In fact, the clearer it is to the people using it how it works, the less “magic” it will feel. I hope the transparency makes it more useful, and it gives users the ability to use it as a tool, not a replacement for their work.

I also added more reliable error checking. It used to be that if we hit an error in one process, it would just keep going. Now it stops.
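One way to get that stop-on-error behaviour with concurrent.futures is sketched below. It is an illustration under my own assumptions (the function name and the list-of-callables interface are hypothetical), not the code Fichero uses.

```python
from concurrent.futures import FIRST_EXCEPTION, ThreadPoolExecutor, wait


def run_all_or_stop(tasks, max_workers=4):
    """Run callables in a thread pool; stop at the first failure.

    Returns the results in submission order, or re-raises the first error
    found after cancelling any tasks that have not started yet.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(task) for task in tasks]
        # Returns as soon as one task raises, or when all have finished.
        done, pending = wait(futures, return_when=FIRST_EXCEPTION)
        for future in pending:
            future.cancel()  # only affects tasks still waiting in the queue
        for future in done:
            future.result()  # re-raises the stored exception, if there is one
        return [future.result() for future in futures]
```

Tasks that are already running cannot be interrupted this way. They finish in the background, while queued tasks are dropped and the error surfaces to the caller.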

Along with a GUI inspired by Hyperspace and BBEdit, I spent some time working on a task monitoring interface: a clean display that shows what the processors are doing, what errors have occurred, and so on.

Anyway, what’s left? Lots. But, I’m hoping to get a build anyone can run for 0.1.0-dev.

This requires:

  • Building the macOS version, and testing it and the CLI version: That is, making sure everything runs smoothly on macOS, both the GUI and the CLI.
  • Testing the Windows version and CLI version: Run the build on Windows. Make sure the GUI launches, folders can be selected, and things process properly. Same for the CLI—it should run clean.

  • Testing the Linux version and CLI version: Try it out on a couple of Linux distros. Make sure GTK stuff works, files save in the right place, and multiprocessing behaves. Both UI and CLI.

  • Double-check the whole UI is translated. I added some internationalization. Go through the interface and make sure all the text is showing up in the right language. Add any missing translations. I think we can add a function to figure out the language from the OS at launch.

  • Build: Get builds working and ready to share. Signed if possible. Test install and run on all three platforms.

  • Update the documentation: Rewrite or clean up the README, dev notes, and any how-to guides. They should reflect where the app is now—not where it was a month ago.

  • Disable unused UI stuff. I hid buttons and features that aren’t wired up yet—Plans, Prompts, Commands like Save, Open, etc. No need to confuse people with stuff that doesn’t work yet. I need to double check that.

  • Create a workflow that goes straight to transcription: Not everyone wants their images changed. If I can pull it off, I’d like users to be able to edit workflows directly in the app. It doesn’t have to be fancy, just a way to open, tweak, and save a plan.
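For the translation item in the list above, a function that figures out the language from the OS at launch might look something like this sketch. The function names and the set of supported languages are hypothetical. It leans on the POSIX environment variables and on locale.getlocale(), which may return nothing useful inside a packaged GUI app, so treat it as a starting point rather than a tested solution.

```python
import locale
import os


def language_from_locale(code, supported, default="en"):
    """Map a locale string such as 'fr_CA.UTF-8' to a supported language."""
    if not code:
        return default
    # Drop the encoding, accept 'es-CO' as well as 'es_CO', keep the language.
    language = code.split(".")[0].replace("-", "_").split("_")[0].lower()
    return language if language in supported else default


def detect_os_locale():
    """Best-effort lookup: environment variables first, then the locale module."""
    for variable in ("LC_ALL", "LC_MESSAGES", "LANG"):
        value = os.environ.get(variable)
        if value and value.split(".")[0] not in ("C", "POSIX"):
            return value
    return locale.getlocale()[0]
```

At launch the app could call language_from_locale(detect_os_locale(), {"en", "es", "fr"}) and fall back to English whenever the answer is missing or unsupported.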

Writer’s Diary #58: Getting Back to It

I’ve not done much book writing in the last month. Too many theses to read, proposals to edit, self-evaluations to write, and emails to send off. That, and I have been hard at work getting Fichero working with some reliability. I had to rewrite the multiprocessing back end in pure Python, with the help of Cursor. I’m not totally convinced it’s as efficient as the other option, which is Celery and Redis. But the goal is a Mac, Windows, and Linux app without dependencies, in which case having Redis as an external dependency isn’t ideal. In any case, we have a Mac app, using BeeWare’s totally brilliant Briefcase packaging tool and Toga to make a Mac/Windows/Linux app. I’m quite proud of it. Not perfect, lots of bugs, but it runs, and does its magic of taking old documents and cataloguing them.

Today, I want to think about getting back to writing. I have lots of projects, lots of text, and no finished book. I’m going to try to be more regular again, and one way to do that is to have achievable goals. The goals are flexible though. What am I working on? I am coding, organizing, revising old text, writing new text, and polishing. I propose therefore:

A win is:

Daydream: 50 minutes
Code: 25,000 words
Organize: 5,000 words
Revise: 1,000 words
Write: 300 words
Polish: 3,000 words

Totally arbitrary, I know. But, sometimes achievable goals are important.

Today, a win is coding 25,000 words. (By coding, I mean tagging text with themes and topics using Structur.)

Update: 11:43 am. Today I coded 25,000 words!

Writer’s Diary #57: A Room, A Purse, and No Phone?

I seem to have spent a few days reading and reacting to Virginia Woolf’s A Room of One’s Own. It’s my third reading now. First thing. Four fifty-minute chunks. As I re-read it and my notes, I respond to it, and my notes, and I (re)write.

What to make of her description of a rather unsuccessful morning in the British Museum, where all she finds is men writing about women? Obsessed, it seems. She’s angry.

But then she goes for lunch.

A nice lunch. She has coffee too. And finds a newspaper. To pay, she reaches into her purse. Five shillings and ninepence. Her purse produces ten-shilling notes. Her aunt died, giving her 500 pounds a year. In perpetuity. (It was only 2500 pounds, in total, in reality.)

That’s 39.10 CAD for a lunch! A good lunch. Expensive, I imagine? How can she afford it?

Her aunt’s money. She doesn’t have to work.

It hits me in a flash.

When I was a student, I had no money, but I did have some. Not like my students today. I had scholarships and loans and easy student jobs.

First year, I lived with friends.

Later, I parlayed that money into cheap rent and cheap food through global arbitrage. (I was a librarian, making Ontario minimum wage, working in Ecuador.)

I lived frugally, but rent and wine and food in Spain in 2002 and Ecuador in 2005 were far cheaper than in Ontario.

Rent in Colombia as a graduate student doing ethnographic fieldwork, when I spent my days walking, thinking, eating good food, and reading, was something else. Less still, when I went to the gold mines.

That fieldwork was funded by the Canadian taxpayer and Carleton University. At great expense. It was decent money. And my accidental exercise in global arbitrage made my purchasing power much higher.

Not deliberately, but accidentally. I moved where my money went further, and the scholarship and grants allowed me to spend two years wandering around Bogotá and the Chocó and letting my mind wander and writing.

(When the purse ended, I experienced the opposite. Moving to Yale, with a shrinking scholarship as the Canadian dollar collapsed.) I walked, and wrote, but was far more stressed about money. Anxious. I spent those three years trying to find a job. Which I did. Then, a decade worrying about money as one salary only goes so far.

But then, and maybe now again as Mercedes works, I have the equivalent for lunches.

(I used to go to the library, then walk and spend 40,000 pesos on a lunch in Bogotá now. That is one day’s minimum wage.)

But, for the last decade, my phone would have been in the way. Making the mind wandering impossible.

But not as a student. As a student, I had no phone. So, many times, I did what Woolf describes: the thinking and daydreaming and writing and letting the mind wander and making connections. It’s this that Woolf’s famous essay is an exercise in. (It does it by showing, not telling.)

It’s fiction, to be sure. But there is an element of autobiography. What about calling it a fictionalized auto-ethnographic account of writing? The daydreaming by the river, the flash of insight lost by walking on turf, the lunch, and a walk into the evening thinking about the gold that went into the college, and a walk to the library and then the next day at the British Museum and then lunch paid for by her aunt’s inheritance.

I think it is.

But, of course, she had no iPhone, computer, or Internet. Has all of this connection robbed us of our ability to let our mind wander and make connections?

Yes.

But, need it?

No.

Zadie Smith, another famous English novelist, essayist, and short-story writer, doesn’t have a phone.

These two facts might be connected.

Writer’s Diary #56: A Room of One’s Own

In the draft, I had an aside about the importance of a room of one’s own with a lock. My thought? A room of one’s own, with a lock, and without a computer, phone, or interruptions. But, yesterday morning, and again this morning, I re-read Virginia Woolf’s famous essay. It is not only a feminist critique of the materiality of artistic creation and the ways in which women have been excluded for centuries. In Woolf’s words, and in the story she weaves, you can also begin to see the glimmers of a method to fiction.

The walking and daydreaming, the trespassing on lawns, the lunches, the attempts to go to libraries, the walks before dinner and the remembering of snippets of ideas, of poems. The walks and strolls across Oxbridge.

But it’s also the next day, and other days, of being in the room and taking books down and putting them back on the shelves, of going to the library and reading with a notebook, and of misremembered lines and lost quotations, and the concentration that goes into the work.

Even as I was doing that, I had my daughter behind me, asleep on the couch at 4 o’clock in the morning because she couldn’t sleep. She woke up early. I cuddled her. She fell back into bed.

And as I was writing, there were four or five messages. Running partners. Dentist appointments. Concentration.

But there’s also a bricolage in there.

Pulling down books. Looking at shelves. Going to the library for ideas.

I read in the introduction to the 2000 Penguin edition that on the day she gave the lecture she wrote:

“My ambition is, from this very moment—eight minutes to six, on Saturday evening—to attain complete concentration again.”

Total concentration! It takes a room and money (CAD$70,000, I’m guessing), and I know it helps to be white and a man.

But a walk, and lunch, and time, and concentration are best achieved without the technology in my pocket. Which is its own difficulty.

Writer’s Diary #55: Just do it

Today’s update:

I met a drummer friend yesterday, along with another artist, a goldsmith. My drummer friend has been trying to work steadily every day. Four hours. He writes it down. He inspired me to try again. If he can drum for four hours a day, maybe I can find time to write? Just do it. “Do it,” he said.

He works at night; I am a morning person. So, I did my four hours in four 50 minute chunks this morning. It’s nice to be done by 9:30.

The other friend, the goldsmith, said, “I only let myself start on something new once I’ve finished something.”

That is my challenge: always starting something new. Perhaps I can finish something before starting on another big project.

Anyway, a nice morning on the bricolage chapter. Re-read the whole draft. Lots to do.

I ended on a side tangent re-reading Woolf’s A Room of One’s Own. She’s talking about women and fiction; I think I’m talking about distraction and anthropology.

But anyway, I have to finish reading it again tomorrow.

For now, I’m done and off to a maple syrup sugar shack with the kids.

Writer’s Diary #51: In Praise of Mellel

Mellel 6.3 just came out. I bought my first copy of Mellel as a student in 2006 or so, although there have been long periods when I haven’t used it and worked in Word. Every long-form thesis, dissertation, or book I’ve ever written has been finished in Mellel. Yet most of my daily writing is done in Markdown in Tinderbox.

Markdown is a text‑to‑HTML conversion tool for web writers. It allows you to write using an easy‑to‑read, easy‑to‑write plain text format, which can then be converted to HTML. Since John Gruber of Daring Fireball designed the spec in 2004, it has taken over the Internet. Perhaps it should be required learning for students?

Tinderbox is a tool for notes—a place to put down ideas, move them around, edit them, and revise them. It’s a personal information toolbox, a piece of software that I find indispensable for my scattered writing process.

But at some point, the messy notes and ideas have to turn into drafts and manuscripts. By the time a draft goes to a reviewer or publisher, it has to be perfect.

For the last half decade, I’ve been enamoured with the idea of writing in Markdown in Tinderbox and then using tools like Pandoc and CiteProc to take that output and convert Markdown into blog posts, websites, and Word manuscripts. Indeed, I’ve even written a few scripts and a contextual menu in Finder that convert a Markdown file with citations into a DOCX file, and vice versa.

But as an academic writer, there is, of course, a challenge with citations. I’ve used Bookends as a citation manager since the early 2000s. It’s a powerful app for keeping track of thousands of articles. It’s fast, unlike Zotero. It plays well with Mellel, but also lets you sync with a BibTeX file, which can be used by CiteProc.

So, I can write in Markdown in Tinderbox using MultiMarkdown formatting for footnotes, and then send it to Pandoc to convert to Word or wherever with citations. It works well. I love it.
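As a rough illustration of that last step, here is how a small Python wrapper around Pandoc might look. It assumes a recent Pandoc with the built-in --citeproc option and a BibTeX file exported from the citation manager. The file names and function names are hypothetical, and this is not the script described above.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(source: Path, bibliography: Path, output: Path) -> list[str]:
    """Build a Pandoc call that resolves citations while converting."""
    return [
        "pandoc", str(source),
        "--citeproc",                         # process citations in the text
        "--bibliography", str(bibliography),  # e.g. a BibTeX export
        "--output", str(output),              # format is inferred from .docx
    ]


def convert(source: Path, bibliography: Path, output: Path) -> None:
    """Run Pandoc, failing loudly if it is missing or returns an error."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on the PATH")
    subprocess.run(pandoc_command(source, bibliography, output), check=True)
```

For example, convert(Path("draft.md"), Path("refs.bib"), Path("draft.docx")) would produce a Word file with the citations filled in, provided the Markdown uses Pandoc-style citation keys.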

Yet, for every truly long-form article or project, I find myself turning back to Mellel for the final step, because one thing that an academic writer knows is that by the time it goes to peer review, it must be perfect. What you send to the press, the editor, or the peer reviewers will first be reviewed, and if you get the subtle things wrong in the writing or the formatting, you can be prejudged as sloppy.

Writing a book or article is, in part, an exercise in getting it right. Perhaps it’s premature perception. Yet, as I work on finishing a manuscript, I again turn back to Mellel, and it shines as a beautiful word processor designed for print and for writing documents.

There’s a cognitive relief in not writing in an abstraction, even an elegant simple one like Markdown.

Mellel bills itself as

“a word processor designed from the ground up to be the ultimate writing tool for academics, technical writers, scholars, and students. Mellel is powerful, stable, and reliable; it is the ideal companion for writing documents that are long and complex, short and simple, and anything in between.”

It’s all of these things. Worth a look.

I turned back to it because the recent 6.3 version has a new notes feature that allows you to put notes at the end of a section or page range (that is, at the end of a chapter). This means I can write the way that my corner of anthropology likes to write, with notes at the end of chapters.

These new notes are notes done right.

A subtle further addition would be to let chapter-end notes be situated arbitrarily, for example in a notes chapter at the end of a manuscript, as publishers in the humanities do it.

For now, I want to say that the thing I love about Mellel, as opposed to Markdown, is that there is a much reduced cognitive load in getting it right, and then moving on.

As I move into it once again and get my book set up, I feel a relief that once I get it right in Mellel, it stays right. There are no further processing steps where errors can be introduced. It’s done. Tinderbox makes it easy to make radical changes. But at some point, one has to stop and get it done.

Rather than writing in a markup language, Mellel lets you just work on the final form.

It’s nice.

I write by picking at things, tinkering, changing, and cobbling. At times, Markdown’s abstraction and portability are best. At the end, Mellel is best.

Scheduling a syllabus

It’s time to design a syllabus again. As each semester comes to an end and a new one begins, I always find myself redesigning and rethinking my syllabi. Why? A course is an opportunity to brush up on material, to think through new ideas, and to return to interests.

However, there is always the challenge of working out what to assign, what works, and what to think about. I often assign too much reading, for example. So, before I dive deep into designing my first-year course on economics and anthropology for this semester, I wanted to put down some thoughts on what I know works.

First, go slower. I look back at my syllabi from my first few years teaching, and they are far too ambitious. I know students don’t always read, but part of the trick is working out the structure and the framing of a course. The reading schedule matters, not because it won’t change, but because its rhythms and speed shape the experience of the semester, for myself and the students. There need to be moments of working hard, but also moments of slowing down and relaxing.

At the end of the last semester, it seemed that five four-class modules (M, W, F, M), with a couple of classes in between (W, F), offer a way to frame the semester. The classes in between might be an in-class exam, an in-class reading reflection, a few films, or an ethnography.

For an upper-level course, we could just read five books. Reading five books is too much for lower-level students. But, we can read, think, talk, and discuss bits of books.

I’m going to try this module framework this coming semester.

iPad Mini + Tailscale + Screens + Better Display = Reliable screen sharing to MacBook

Here’s the problem. I have my MacBook at the office on campus. I want it to be running through some long image processing tasks that are taking days. It requires an external hard drive. I don’t want to disconnect it. However, sometimes, I need to log in to do work, like post grades over the holiday.

What I want is to be able to log into the Mac remotely and use it to do other work if I’m at home. The problem: I don’t have another computer. I tried to get an old one, but I can’t.

I do have an iPad Mini, though.

However, the first time I tried to connect the iPad Mini to the Mac via screen sharing, I couldn’t get a reliable connection, the resolution of the display was off, I couldn’t get the keyboard mapping to work well, and it was just clunky.

I’ve solved it now.

What works?

First, to get a reliable connection between the two devices, I use Tailscale. It lets me create a virtual network between the iPad and the MacBook, even though one is on a university network and the other is at home, so I can connect the two devices as if they were on the same local network.

To actually make the shared screen connection, I use Screens by Edovia. It works well and is a native iPad and Mac app.

For a while, I did have a hard time working from an external keyboard connected to the iPad Mini because iPadOS was grabbing the Command keys. If I tried to Command+Tab on the shared Mac, iPadOS would Command+Tab to another iPad app. Perhaps this is logical, but as someone with a lot of muscle memory on the Mac, it was super annoying, almost a show stopper, to be working on the Mac, then suddenly in a different OS.

By inverting the Command and Control in the Keyboard settings on the iPad and then inverting it again in the settings of the Screens app on the iPad, it worked perfectly on the shared Mac display, which is my goal.

Next issue: the display resolution on the Mac did not match the resolution of the iPad Mini.

The solution: Better Display, a $20 Mac app that can create lots of custom resolutions and, crucially, has an option to create a virtual screen the size of the iPad Mini. I connect the iPad to that virtual display on the Mac, which matches the iPad’s physical dimensions, so the shared desktop fills the iPad Mini’s full screen.

So, after an hour: the resolution matches the iPad Mini, the keyboard keys are perfectly mapped, the connection is reliable, and I can log in to the Mac from home with just an iPad.

Not bad. Now to post some grades.

Writer’s Diary #53: Clean Up Repeated Text

What makes a book different from an article? What makes a book different from an essay? One thing: it’s long. In my case, what I thought would be one book seems to be becoming two or maybe three. Over the years I’ve been working on it, it has grown to almost a million words. This creates a writing challenge of its own: how to organize, cut, edit, and work with the detritus of various versions, drafts, initial starts, and notes on an idea. It all piles up in a chaotic, Escher-like jumble that is so overwhelming it leaves me lost. One problem is the order. One is duplication.

Order can be solved by coding: putting things about the same topic together. This solves an issue that comes from writing different iterations of these books over quite a long time. The ideas have sometimes flourished in different drafts, different versions of the same text, living in different places. I’ve long since lost track of where things are.

Pragmatically, they’re in a big Tinderbox file. But, how to turn that into a book?

When it comes to the final steps of editing, tightening, and turning rough ideas into book form, one step that is both banal and annoying is the general cutting of duplicate text.

There are different ways of doing this.

For a long time I’ve done it by hand. It’s not very efficient when dealing with so much text.

More recently, I have been using a script I wrote called structur.py. With structur.py I code text and send paragraphs and pieces of text to the right place. I put ideas together that are about similar topics. However, once structur.py has worked its magic, there is still a problem: I still have a lot of duplicate text. One way to deal with this is to keep coding until all the similar text is in the same place, then delete the repeats. This is what I did. But it takes a long time just to read 10,000 words, let alone 30,000 or 100,000, which is my problem. What I want is one copy of each piece of text, plus any good sentences that I might want to put together.

A few days ago, I put together a script called DejaText.py that flags files, paragraphs, sentences and even words that are duplicated. This is super useful for identifying where text is being reused in multiple places. The result was somewhat shocking. My drafts were full of duplicates.

This morning, as I was trying to code six files to deal with the most egregious case of duplicated paragraphs and sentences, I realised that I was coding the same text over and over again. Why not write a script that deletes the second and subsequent instances of repeated paragraphs and sentences? It’s dangerous. But point it at a temporary folder, and it works. I call it dejatext_cleanup.py. It’s on GitHub.
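The core idea, keeping the first copy of a paragraph and dropping every later repeat, can be sketched in a few lines of Python. This is not dejatext_cleanup.py itself, only an illustration at the paragraph level. The function names and the normalization rule (ignore case and extra whitespace) are my own assumptions.

```python
import re


def normalize(paragraph: str) -> str:
    """Collapse whitespace and case so trivial variations still match."""
    return re.sub(r"\s+", " ", paragraph).strip().lower()


def drop_repeated_paragraphs(texts: list[str]) -> list[str]:
    """Keep the first instance of each paragraph across all texts, in order."""
    seen = set()
    cleaned = []
    for text in texts:
        kept = []
        # Paragraphs are assumed to be separated by blank lines.
        for paragraph in re.split(r"\n\s*\n", text):
            key = normalize(paragraph)
            if not key or key in seen:
                continue  # empty, or already kept somewhere earlier
            seen.add(key)
            kept.append(paragraph.strip())
        cleaned.append("\n\n".join(kept))
    return cleaned
```

The function works on strings and returns new strings, so nothing is deleted on disk unless the caller writes the result back. That is one way to keep a power tool like this pointed at a temporary copy.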

This combination of deleting duplicate text, and then coding with structur.py, allowed me to cut 35,000 words down to 13,000 words. The scripts will speed things up considerably. How to proceed? The way forward is to work at the level of what I’ve already organised. Take a section, say a section on pencils. Delete duplicate text. Code the remainder. Put the codes in order. Then, I’ll have a complete section on pencils. Revise it a couple times, and then I’ll have a draft.

These are power tools. They’re dangerous. But they will speed up turning a folder of notes into a book.