Weaving and Atarraya: A Diary of Book Making (6/31)

Did a lot of coding late last night: tagging and categorizing things I have written on bricolage.

Then, in the corner of a window at the bottom of my screen, I found a piece on fieldwork and participant observation. I realized it would fit well with the section I wrote a while ago on makeshift. So, I read that piece quickly, made a couple of minor edits, connected it to the piece on makeshift, and sent it out on 789 Serialized.

I’m still concerned about focus and structure and the whole book. But, that’s not a problem that can be solved today. And it’s probably not best solved at all in the abstract. Rather, it’ll be best solved as I get pieces tightened and written. And as I keep the momentum going.

A book is written not in one day.

I’ve got other things to do.

Weaving and Atarraya: A Diary of Book Making (5/31)

Today was a slow day to get going. E-mail. Office demands. Tomorrow, no e-mail before noon.

But I did get to coding the book for a few hours in the evening, and then edited a section on Ferdinand Cheval’s Palais Idéal. I’m not sure where it will go in the end. But it feels like it’s connected to the workshops. So I put it there.

The method worked. I had written pieces a few months ago, but I went into the archive and found all the text related to Ferdinand Cheval’s grotto, which he built by hand over 30 years.

I’ve not watched the movie yet. I might do that tonight ([L’Incroyable Histoire du facteur Cheval](https://www.imdb.com/fr/title/tt7248884/), 2019).

How does it fit into the big picture? I’m not sure. Does it matter? 1,473 words. 684 more than I needed.

Weaving and Atarraya: A Diary of Book Making (4/31)

Just posted my second 789 Serialized post. I’m not super happy with how I’ve set it up on Substack. But, no matter. I can move to something else in the future. The part I’m not sure about is a platform for multiple newsletters. But, anyway.

The piece is on the Greek concept of practical knowledge, mētis. I sat down to code some text, and it was the first file I opened. Rather than code it, I edited it, and made it much better.

It’s nice to send something off, as I fell behind the last two days. Too many other writing tasks.

I also spent a bit of time updating my logging for this project. It’s a markdown file that a shell script updates when I add a file to the filed folder. I could set that up to work with Folder Actions, so when I make a change in the Finder, the script runs and the website is updated.
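The real version is a shell script, but the idea is simple enough to sketch in Python. This is only a sketch: it assumes coded, filed, and to_code folders and the column order used in the “Coded Notes” table further down this page; the paths are hypothetical.

```python
# log_update.py — a sketch of the logging step, not the actual shell script.
# Assumes coded/, filed/, and to_code/ folders and the column order used in
# the "Coded Notes" table below; the root path is hypothetical.
from datetime import date
from pathlib import Path

def word_count(folder: Path) -> int:
    """Count words across all .txt and .md files in a folder tree."""
    return sum(
        len(p.read_text(errors="ignore").split())
        for p in folder.rglob("*")
        if p.is_file() and p.suffix in {".txt", ".md"}
    )

def append_log(root: Path, log_file: Path, task: str = "coding") -> None:
    """Append one row: date, coded, filed, to_code, total_words, task."""
    coded = word_count(root / "coded")
    filed = word_count(root / "filed")
    to_code = word_count(root / "to_code")
    total = coded + filed + to_code
    with log_file.open("a", encoding="utf-8") as f:
        f.write(f"{date.today()} {coded} {filed} {to_code} {total} {task}\n")

if __name__ == "__main__":
    book = Path("~/book").expanduser()          # hypothetical project root
    append_log(book, book / "log.md")
```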

In the meantime, I’ll post it manually.

Also, as I filed this piece, I realized it fit right after the section on the atarraya and fishing in Mompox, which is the centre of the book. An accident, but I think a nice serendipitous place to put it.

Weaving and Atarraya: A Diary of Book Making (3/31)

David Graeber once tweeted, at least I remember reading it but can’t find it, that he used to have two projects to work on. The A project, the thing he had to do, and the B project, the thing he used to procrastinate on the A project with.

My B project seems to be what I’m working on this morning, rather than 789 Serialized. An article with a colleague about artificial intelligence and the Dawn of MIDI. We’re calling it the Dawn of AI. Its argument is at once a critique of how artificial intelligence is being used by students, and a call to consider AI as enabling new kinds of research, research that perhaps didn’t really need AI in the first place.

Re-reading the article, it was clear it had to do two things differently. It needs to be concise, and it needs a home.

Reviewing the draft we had, there were just too many examples. We had five, but space for one or two. Thinking about it, I can see two examples that fit together: vibe coding, on the one hand, and using AI to read archives on the other.

On a home: today, before revising, we settled on an audience. A few journals come to mind. In both cases, Chicago Manual of Style and 8,000 words seem appropriate.

So, 2 points, 8,000 words, CMOS, and it seems more doable.

This morning, I also drafted some notes on rules for writing.

Also, I missed this and the newsletter for yesterday. Hah. So much for Mad Rules for Writers.

Weaving and Atarraya: A Diary of Book Making (2/31)

I ran this morning, 7.3 km. I changed the route and headed north from the village. It has a hill! I ran downhill for 2 km, then uphill for 2 km. It felt good to be out. I came back, showered, shaved (!), and sat down at the computer.

I edited a prologue section. It will become today’s 789 Serialized. It actually became three or four. I need to decide: is 789 words the minimum or the maximum? It feels too little, this morning.

On the run, I thought about arbitrary rules for writing. I have had many over the years. I realized, yesterday, the point is that I set them. They work for me, for a time. The particulars don’t matter.

Craig Mod talks about his rules. A portrait on his walks by 10 a.m. A pop-up newsletter post a day. Mad rules. But stick to them.

Mad rules for me, are not for anyone else. What is 789 Serialized but some mad rules for me to write every day for a month? At least 789 words a day. But, maybe more.

On the run, I took notes for my self-evaluation. It was due a month ago, one of those academic chores I resent. I need to get it in. But I wonder if I think of these chores the wrong way. Why not see them as something to share? To publish as a newsletter. Online, publicly. Online, privately. I’m not sure. But think of an audience as more than an overworked administrator who may or may not read it.

For this book-writing diary newsletter, my thoughts turn to the problem of coding huge quantities of text—field notes, morning pages, book drafts, articles, essays, old chapters, pieces of text, blog posts. All of this is my archive. My fichero, as I’ve come to think of it. (Fichero, from the Colombian Spanish for a box of index cards.) It is now about 500,000 words coded, and another 3,000,000 words uncoded. How did I write that much? Why did I write that much?

One reason is that I’ve accumulated text files of repeated text. Different versions of the same thing. Different drafts. Cruft that has duplicated. How to deal with it? Cut, and don’t make more.

On cutting: six months ago, I wrote a script to delete second and subsequent instances of the same words, sentences, or paragraphs. Dejatext. I use it all the time.
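I won’t reproduce Dejatext here, but the core move, keeping only the first occurrence of a repeated chunk, fits in a few lines. A minimal sketch at the paragraph level; the real script also handles words and sentences, and its normalization and folder names are surely different.

```python
# dedupe_paragraphs.py — the idea behind Dejatext, sketched at paragraph level.
# Not the real script: the folder name and normalization are assumptions.
from pathlib import Path

def dedupe_file(path: Path, seen: set) -> None:
    """Drop any paragraph already seen in an earlier file (or earlier in this one)."""
    paragraphs = path.read_text(encoding="utf-8").split("\n\n")
    kept = []
    for para in paragraphs:
        key = " ".join(para.split()).lower()   # normalize whitespace and case
        if not key or key not in seen:
            kept.append(para)
            seen.add(key)
    path.write_text("\n\n".join(kept), encoding="utf-8")

if __name__ == "__main__":
    seen = set()
    for p in sorted(Path("archive").rglob("*.txt")):   # hypothetical archive folder
        dedupe_file(p, seen)
```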

On not making any more duplicated text, the issue is how to stop creating it in the first place. I have another script I use to code text. It takes text I’ve coded, splits it into new text files, and brings together passages with the same code. It works well. I run it, and organize my notes. It’s good. But, one of the mistakes I’ve made over the years is letting repeated text pile up. Drafts and drafts, and then I forget which is the most recent.
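Roughly, the splitting step works like this. A sketch only: the {code} tag convention below is an assumption for illustration, and structur.py’s actual format may differ.

```python
# split_by_code.py — a sketch of splitting coded text into one file per code.
# Assumes passages are preceded by a {code} tag on its own line (hypothetical).
import re
from collections import defaultdict
from pathlib import Path

TAG = re.compile(r"^\{([\w-]+)\}\s*$")

def split_by_code(source: Path, out_dir: Path) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    buckets = defaultdict(list)
    current = None
    for line in source.read_text(encoding="utf-8").splitlines():
        m = TAG.match(line)
        if m:
            current = m.group(1)        # start collecting under this code
        elif current is not None:
            buckets[current].append(line)
    for code, lines in buckets.items():
        out_file = out_dir / f"{code}.txt"
        out_file.write_text("\n".join(lines).strip() + "\n", encoding="utf-8")
```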

One issue: a reticence to delete. I leave words in the archive, even as I use them. Why? Why not take the text, code it, move it out of the archive, synthesize it, publish it, and move it along?

How should this workflow function? What’s the actual process? Should I edit the original text file, or the coded version? It seems most logical to edit the coded text—then discard the original. That way, I can iterate toward order over time.

The key question is whether structur.py supports that approach. I may need to revise it.

How would this work? An inbox of notebooks and other sources; a to-code folder with raw uncoded text files; a coded folder with text organized by code; a codes.txt file listing possible codes; and a synthesis folder where ideas are developed. This way, I wouldn’t have duplicated text.

Right now, I’m not sure if structur.py can append new material to existing code files. That’s what it needs to do: allow new material to be added to existing coded text. The coded folder should be able to receive more material over time.

Another issue: how does it handle text that’s already been processed? One possibility is that the system always regenerates empty coded files from codes.txt. If a code file is moved or deleted, it’s recreated empty—ready to receive newly coded text. That might make sense.

In that model, codes.txt acts as the master list. The coded folder always mirrors those codes. Structur recreates empty code files if needed, letting me keep feeding in new material. This would then support an iterative process of coding and synthesis.
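A sketch of that model, using the folder and file names proposed above; the function names are mine, not structur.py’s.

```python
# A sketch of codes.txt as the master list, with coded/ mirroring it and
# receiving appended material over time. Function names are hypothetical.
from pathlib import Path

def sync_code_files(codes_file: Path, coded_dir: Path) -> None:
    """Ensure every code listed in codes.txt has a (possibly empty) file."""
    coded_dir.mkdir(parents=True, exist_ok=True)
    for code in codes_file.read_text(encoding="utf-8").split():
        target = coded_dir / f"{code}.txt"
        if not target.exists():
            target.touch()              # recreated empty, ready for new text

def append_to_code(coded_dir: Path, code: str, passage: str) -> None:
    """Append a newly coded passage to the existing file for that code."""
    with (coded_dir / f"{code}.txt").open("a", encoding="utf-8") as f:
        f.write(passage.rstrip() + "\n\n")
```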

A project for after finishing my 789 Serialized post for today.

Update, 8:30 PM: I fixed structur.py, squashed a bunch of bugs, and posted it to GitHub. I also added unit tests to dejatext. Structur now works more reliably, and I deleted some duplicates and pulled out some malformed text. Long day. But, good changes.

Weaving and Atarraya: A Diary of Book Making (1/31)

I need a creative boost. I’ve been talking about finishing a book for, I don’t know, five years. What was supposed to be my second book, a quick and dirty book about writing, has become a third one. It was supposed to be done in 2019. I edited a different one. It’s taken forever. This happens to academics. We get busy in the middle of our careers. We say yes to many things. My daughter was born. I got diagnosed with ADHD, recently. The pandemic. Buying a house. Teaching, too much. A sabbatical, with new projects, also unfinished. Recently, a side hustle programming with AI. But, the book. Written and rewritten, and revised and never finished.

So where is it at? The book is chaos. It was three books, then four, then one. It’s a pile of notes.

This morning, I decided to do a 31-day pop-up newsletter, à la Craig Mod. I’m calling it 789 Serialized. Mod has done many pop-up newsletters. He’s also had a long-running book diary on making Things Become Other Things. It’s private. A diary of the work, sent as a newsletter. It’s called Nightingalingale. A nightingale in a gale, I think. I’ve not asked Craig. It’s a working diary, a log, of the making of Things Become Other Things. I joined his Special Projects to get access. Why not help someone creative make things?

For my first book, I did something similar. I called it DiMoWriMo, Dissertation to Monograph Writing Month. Inspired by NaNoWriMo. I started in January 2016, with a friend. I wrote for a year. I posted to Twitter and Facebook. I felt like I was spamming other people. The book took eighteen months, but it started as a 30-day challenge.

Simple rules. 1,000 words a day. Written or revised. If I failed, $50 to Stephen Harper’s election campaign. It feels like a lifetime ago. I didn’t fail. Harper got no money. I finished the book. I was 33. I used it to get tenure, to become a Professor in a small town. I am now Chair. My work? Giving young people a chance to do other things.

Yet, then as now, I feel shy about being online, even though I’ve had a website and this domain since the late 1990s.

When the book draft was sent off, I stopped the daily spreadsheet, and I stopped the logging. That was a mistake. I kept writing, but I didn’t finish.

With this book diary newsletter, I’m going to document the book process. A book takes time. This is a diary. I’ll write it to mark the end of the day’s writing. I used to sometimes post to my DiMoWriMo at 9 am. It felt good.

Where does Mod start with his Nightingalingale? It was September 10, 2021. High pandemic. He’s lost. He’s printed it all out. He’s made index cards. He’s trying to find structure. Is it a letter? Is it short stories? It’s the same dark morass I’ve been in at times. I’m not sure if he would call it a morass. I would. John McPhee had the same experience on a picnic table.

Mod cut up his pop-up newsletter draft into pieces. Got off the computer, with printouts and scissors. I’ve done that too, with a big table. I tried to do it all. That was the mistake. Too much.

But, for months now I’ve been coding. Organizing. Putting things in boxes. Bringing pieces of text together. Where it goes, I don’t know. But, it’s been a way of making sense of lots of unpublished writing. Finding themes. Putting them together.

Here’s my log on that:

Coded Notes

Date coded filed to_code total_words task
2025-05-03 3770 4960 255191 263921 coding
2025-05-03 4279 4588 254962 263829 coding
2025-05-03 4590 4588 254571 263749 coding
2025-05-03 4699 4588 254463 263750 coding
2025-05-03 5609 4588 253472 263750 coding
2025-05-04 6861 4588 252301 263750 coding
2025-05-04 8211 4588 250951 263750 coding
2025-05-09 12253 4588 246788 263629 coding
2025-05-31 13788 0 396922 410710 coding
2025-06-01 123040 0 280708 403748 coding
2025-06-05 136760 0 261858 398618 coding
2025-06-08 206784 0 151596 358380 coding
2025-06-09 314729 0 0 314837 coding
2025-06-10 333403 0 218899 552302 coding
2025-06-11 377253 0 163775 541028 coding
2025-06-26 434884 0 113255 548139 coding
2025-06-27 474447 0 72555 547002 coding
2025-07-03 524497 0 69287 593784 coding
2025-07-04 592109 0 0 592109 coding

The form of the book I have in mind? I’m not sure. It was once many books, now it’s many bits of books coded into different sections. 900 codes. Too many. The last few days, I added ethnographic field notes, morning pages, and so on.

My notes are in text files. There are too many words to print out. Too many aborted projects. Too many moments I’ve never used. But, also so many good, useful, insightful things to draw on. Notes from the field. From readings. From diaries. From notebooks. Things written down and never used. Lots to draw on.

Where to go from here? I’m going to do a newsletter. Daily. For a month. Edit pieces into a draft. Today I wrote a description of that: 789 Serialized. The form I have in mind is an atarraya. A cast net used by fishers in the Colombian Caribbean to fish the shallow wetlands. Fishing in my notes. I’m going to write about the atarraya and the fichero tomorrow for 789 Serialized.

For now, I’m going to go for a swim.

789 Serialized: A Newsletter

Don’t I know that finishing is hard? There’s always a crisis. Something more urgent. Something that gets in the way of finishing. If writing is easy for me, finishing is hard. Finishing is what I procrastinate on. It’s what I put off. I don’t send it off. I don’t publish. It’s publishing that is brutal. I’ve long had a writing habit, but not a publishing one. But if writing is the accumulation of small acts, can publishing be the same?

With this newsletter, I will publish, in instalments, in serialized form, a draft of a book. My inspiration? Dickens’s 19th-century serialized novels. Dickens wrote novels in instalments. Anna Gibson, Adam Grener, and Frankie Goodenough describe how difficult it is for us to imagine reading a novel in short instalments, each month, over a year or two. But, the serial form shaped Dickens’s practice—he wrote to deadline, finishing each instalment just as it came out. It wasn’t abnormal. Rosamund Bartlett’s brilliant biography of Tolstoy (Tolstoy: A Russian Life) describes how Tolstoy had a similar practice with his novels. He published War and Peace in instalments. And then, he edited it into a revised manuscript. In the podcast Serial, journalist Sarah Koenig and her producer Julie Snyder finished episodes of their hugely successful true-crime podcast weekly, just in time, as they came out.

Letters from the Future was born out of a series of short letters I co-edited and published on the New Brunswick Media Co-op’s website in the summer of 2018. I edited a letter a week over the summer. Later, I continued, with co-editors, to turn them into a book. We solicited, edited, revised, and published a letter a week. It was a book, in serial form.

The challenge with tackling an academic book is where to publish. Finding a place to publish feels like a reason to stop. So, why not self-publish? Because editors and peer-reviewers improve the work. And, it’s harder to publish if a draft is already online. So how do you solve that publishing conundrum? How do you publish, in a serialized way, something that doesn’t live online forever?

Craig Mod stumbled on a solution: the pop-up newsletter. Mod moved to Japan in his 20s, and has made a career writing, photographing, and self-publishing on his website. He’s well known for taking long walks, writing about them, and making beautiful books. Mod’s most recent book, a memoir, Things Become Other Things, began as a newsletter, written daily, 30 to 40 miles at a time, over a month-long, 300-mile walk on a peninsula in Japan. Mod photographed, took notes, dictated into his iPhone, and each evening spent hours working the material up into a daily pop-up newsletter. While it became, over many more years, a book, it began as a temporary and impermanent newsletter. A pop-up newsletter.

Mod calls pop-up newsletters the greatest newsletters because they don’t live permanently online. They’re not archived. They are temporary. He describes the genre to John Gruber (00:14:22):

If I do a big walk and I’d make a newsletter out of that, that doesn’t need to be archived online because ideally I’m going to take that. I’m going to collate it. I’m going to edit it. I’m going to squeeze it. I’m going to take the best of it. I’m going to put the best version of whatever that was into a book and I’m going to distribute thousands of books around the world. And then that’s the archive.

Pop-up newsletters are e-mailed, they’re not published on the web, and are only available to subscribers. The archive comes later, as a book.

Mod advises:

Set a limit of three to six months and pick a frequency — perhaps once or twice weekly? — and a word limit — maybe 500 words? (something you can’t shirk away from, that you could refine in an hour if hard pressed) — and stick to your rules like a madperson.

My rules for this book, a serialized, pop-up newsletter?

789 words, daily, for a month.

Why seven hundred and eighty-nine words? The number is a silly sequence: seven, eight, nine. 789 is the highest consecutive-digit number under 1,000. It just feels good. It’s why my friend drives at 123 km/h on the highway. It’s a number I can hit, daily, without too much effort. It’s arbitrary. Silly. But, it’s also longer than 500 words. More achievable than 1,000.

Why a month? It’s the summer, a month is doable, and it will be over before academic chores begin.

789 Serialized is a book, serialized daily over a month, in instalments of 789 words published as a pop-up newsletter.

Subscribe on Substack.

Fichero Update and Working Towards 0.1.0-dev

Where am I in Fichero?

I spent the two weeks at the end of June doing a pretty major rewrite. So what did that involve?

I had a director.py module that handled all the parallel scripts. It was a mammoth—1,200 lines of spaghetti code. It was disorganized. It used Celery/Redis as a way of doing multiprocessing on separate processes. But, how to share data? I liked it because it worked, as a hack. Basically, it just duplicated folders to process and ran the tools on multiple sets of folders simultaneously. It worked well enough. The problem was, director.py was super complex, very interconnected, and hard to maintain. So, I spent a week in June refactoring it.

Now, there are various modules in a director backend that each document window can talk to. “Take this folder,” and it processes it. director.py sends files to the various modules, and keeps track of various things.

It prepares the folders, copying them into place.

Then a workflow executor loads the plan files, which are stored either in the application’s resources folder or in the default settings locations for Linux, macOS, or Windows. The plan is a modified YAML version of the Weasel format, with the Weasel-specific parts removed and simplified for the workflow executor. The executor communicates with a multiprocessing backend, which then calls the tools.
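Something like this, as a sketch; the lookup order, file names, and the use of PyYAML are illustrative, not Fichero’s actual code.

```python
# A sketch of loading a plan: check the user's settings folder first, then the
# bundled resources folder. Paths and the plan shape are hypothetical.
from pathlib import Path
import yaml  # PyYAML

def load_plan(name: str, resources_dir: Path, settings_dir: Path) -> dict:
    for folder in (settings_dir, resources_dir):   # user settings win
        candidate = folder / f"{name}.yaml"
        if candidate.exists():
            return yaml.safe_load(candidate.read_text(encoding="utf-8"))
    raise FileNotFoundError(f"no plan named {name!r}")

# A stripped-down plan might reduce to something like:
#   steps:
#     - name: crop
#       kind: cpu
#     - name: transcribe
#       kind: io
```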

The multiprocessing backend is either Celery/Redis, or a Python-native backend using concurrent.futures and threads.

I’ve got the Python-native backend working well, but haven’t tested Celery/Redis recently. So, I’ve turned it off.

For both, we have two kinds of workers:
• CPU workers, for CPU-intensive tasks (like image manipulation), and
• I/O workers, which are much less intensive and handle tasks that involve waiting—e.g., writing to disk or waiting on network access (especially inference from large language models).

At present, the Python backend only works with the CPU workers, whose number you can set.

The way it will work is: we spin up four CPU workers to handle image processing, and once that’s complete, the images get passed to the I/O workers. Ideally, the four CPU workers can keep things moving efficiently. This works well enough. But, I need to get the I/O workers working again for the Python backend.
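As a sketch, the intended flow looks roughly like this; the worker counts and function names are illustrative, and the real backend does more bookkeeping.

```python
# A sketch of the Python-native backend idea: a small pool of CPU workers and
# a larger pool of I/O workers, both threads (because Toga's async and
# subprocesses caused trouble, as noted below). Sizes are illustrative.
from concurrent.futures import ThreadPoolExecutor, wait

cpu_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="cpu")
io_pool = ThreadPoolExecutor(max_workers=16, thread_name_prefix="io")

def process_folder(folder, cpu_steps, io_steps):
    # CPU-heavy steps first (e.g. image manipulation)...
    wait([cpu_pool.submit(step, folder) for step in cpu_steps])
    # ...then hand the folder to the I/O workers (disk, network, LLM calls).
    wait([io_pool.submit(step, folder) for step in io_steps])
```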

The tools can also run their own multithreading, e.g. for LLM calls, or when they are only running on one folder. This setup works well, for example, when sending requests to AI models using threads and concurrent.futures.
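Within a single tool, that fan-out looks something like this; call_model is a stand-in for whatever client the tool actually uses.

```python
# A sketch of tool-level threading for LLM requests. call_model and the
# worker count are placeholders, not Fichero's actual API.
from concurrent.futures import ThreadPoolExecutor, as_completed

def transcribe_pages(pages, call_model, max_workers=8):
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(call_model, page): page for page in pages}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()   # re-raises if a call failed
    return results
```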

Why concurrent.futures and threads (and not async or subprocesses)? Because Toga uses async, and I was running into issues trying to do async communication with the Python backend. Subprocesses were not happy in a macOS build.

In short, refactoring involved breaking out all the utilities of director into smaller files, and it’s now much clearer how it works. It’s also easier to fix bugs. But, it ends up being way more code. I suspect Cursor.ai has over-engineered it, and I’m slowly going through each module to make sure it’s minimal. That can wait for a later revision. I’m sure there are some things that are stupid or don’t make much sense, but as time allows, I’ll refactor each module to get it tighter, with less code.

Speaking of other changes, I have put together a Toga GUI, synced as much as possible with the CLI. It now builds on a Mac, and runs in Briefcase dev mode on Linux and Windows, which I haven’t gotten around to testing fully yet, as I don’t have a PC.

Big issues to address in the future. One is the backend data model. Currently, each step creates its own manifest.jsonl file—we generate a new one each time. I wonder about revising it so each folder has a single manifest file, which gets updated and edited in place. The nice thing about the current process is that it’s reliable with multiple threads, but I think we could do better. Still, troubleshooting tools in the current setup is easy. I can go into the Finder and make changes to the files directly. If it’s abstracted too much, it might be easier to code, but harder for users to understand.
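The single-manifest idea might look something like this; the field names and helpers are hypothetical, and appending from multiple threads would still need a lock.

```python
# A sketch of one manifest.jsonl per folder, appended to by each step.
# Field names are made up; a real version would need locking across threads.
import json
import time
from pathlib import Path

def record_step(folder: Path, step: str, status: str, **extra) -> None:
    """Append one JSON line describing a completed (or failed) step."""
    entry = {"step": step, "status": status, "time": time.time(), **extra}
    with (folder / "manifest.jsonl").open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

def read_manifest(folder: Path) -> list:
    """Read back the folder's history as a list of dicts."""
    path = folder / "manifest.jsonl"
    if not path.exists():
        return []
    return [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines() if line.strip()]
```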

Fichero is for researchers—it doesn’t have to be super slick. In fact, the clearer it is to the people using it how it works, the less “magic” it will feel. I hope the transparency makes it more useful, and gives users the ability to use it as a tool, not a replacement for their work.

I also added more reliable error checking. It used to be that if we hit an error in one process, it would just keep going. Now it stops.

Along with the GUI, inspired by Hyperspace and BBEdit, I spent some time working on a task-monitoring interface: a clean display that shows what the processors are doing, what errors have occurred, and so on.

Anyway, what’s left? Lots. But, I’m hoping to get a build anyone can run for 0.1.0-dev.

This requires:

  • Building the macOS version, and testing it and the CLI version: That is, making sure everything runs smoothly on macOS, both the GUI and the CLI.
  • Testing the Windows version and CLI version: Run the build on Windows. Make sure the GUI launches, folders can be selected, and things process properly. Same for the CLI—it should run clean.

  • Testing the Linux version and CLI version: Try it out on a couple of Linux distros. Make sure GTK stuff works, files save in the right place, and multiprocessing behaves. Both UI and CLI.

  • Double-check the whole UI is translated. I added some internationalization. Go through the interface and make sure all the text is showing up in the right language. Add any missing translations. I think we can add a function to figure out language from the OS at launch.

  • Build: Get builds working and ready to share. Signed if possible. Test install and run on all three platforms.

  • Update the documentation: Rewrite or clean up the README, dev notes, and any how-to guides. They should reflect where the app is now—not where it was a month ago.

  • Disable unused UI stuff. I hid buttons and features that aren’t wired up yet—Plans, Prompts, Commands like Save, Open, etc. No need to confuse people with stuff that doesn’t work yet. I need to double check that.

  • Create a workflow that goes straight to transcription: Not everyone wants their images changed. If I can pull it off, I’d like users to be able to edit workflows directly in the app. Doesn’t have to be fancy—just a way to open, tweak, and save a plan.

Writer’s Diary #58: Getting Back to It

I’ve not done much book writing in the last month. Too many theses to read, proposals to edit, self-evaluations to write, and emails to send off. That, and I have been hard at work getting Fichero running with some reliability. I had to rewrite the multiprocessing back end in pure Python, with the help of Cursor. I’m not totally convinced it’s as efficient as the other option, which is Celery and Redis. But, the goal is a Mac, Windows, and Linux app without dependencies, in which case having Redis as an external dependency isn’t ideal. In any case, we have a Mac app, using BeeWare’s totally brilliant Briefcase packaging tool and Toga to make a Mac/Windows/Linux app. I’m quite proud of it. Not perfect, lots of bugs, but it runs, and does its magic of taking old documents and cataloguing them.

Today, I want to think about getting back to writing. I have lots of projects, lots of text, and no finished book. I’m going to try to be more regular again, and one way to do that is to have achievable goals. The goals are flexible though. What am I working on? I am coding, organizing, revising old text, writing new text, and polishing. I propose therefore:

A win is:

Daydream: 50 minutes
Code: 25,000 words
Organize: 5,000 words
Revise: 1,000 words
Write: 300 words
Polish: 3,000 words

Totally arbitrary, I know. But, sometimes achievable goals are important.

Today, a win is coding 25,000 words. (By coding, I mean tagging text with themes and topics using Structur.)

Update: 11:43 am. Today I coded 25,000 words!

fichero_director.py: Running Fichero on Multiple Processors

What is Fichero? It is a Python-based pipeline for large-scale document processing—think thousands of scanned pages, each requiring cropping, enhancement, OCR, and transcription.

I’ve had versions of Fichero working for six months. But, it was slow. Fichero processed one step at a time, then the next. The challenge: taking advantage of computing power and multiple processors without getting bogged down.

This week, I got fichero_director.py working, which uses Celery to run Fichero “workers” on multiple processes, each doing different steps at the same time.

Imagine one cook making 100 pizzas, one after the other. Versus eight cooks making eight pizzas at a time.

How does it work? fichero_director.py breaks workflows down into CPU-intensive and I/O-intensive steps. For example, image tasks are CPU-heavy, while transcription using language models or converting to Word documents is mostly I/O-bound, limited by disk or LLM inference. Tasks are sent to Celery queues based on script type.
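In outline, the queue split looks like this; the task bodies, names, and worker counts are illustrative, not the actual fichero_director.py code.

```python
# A sketch of routing tasks to Celery queues by type, assuming a local Redis
# broker. Task names and bodies are placeholders.
from celery import Celery

app = Celery("fichero", broker="redis://localhost:6379/0")

# CPU-heavy image steps go to one queue, I/O-bound steps to another.
app.conf.task_routes = {
    "fichero.crop_images": {"queue": "cpu"},
    "fichero.transcribe": {"queue": "io"},
}

@app.task(name="fichero.crop_images")
def crop_images(folder: str) -> str:
    # placeholder for the real image-processing step
    return folder

@app.task(name="fichero.transcribe")
def transcribe(folder: str) -> str:
    # placeholder for the real LLM transcription step
    return folder

# Workers can then be started per queue, with different concurrency, e.g.:
#   celery -A fichero_director worker -Q cpu -c 4
#   celery -A fichero_director worker -Q io -c 16
```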

fichero_director tries to tune itself for the host system. On M1 Macs, for example, the hope is that CPU workers use the performance cores, while more numerous I/O workers handle slower operations on the efficiency cores.

On my machine, I have 8 cores, and fichero_director.py uses them all.

To follow along, there is a simple dashboard that tracks real-time progress, showing each folder’s status and current step. Each folder is processed independently, with logs written per folder and per step.