Of Cookbooks, LLMs, and Close Reading
The other night, as I was standing in my kitchen staring at the contents of my fridge, I was faced with a very common problem: I wanted to find something interesting to cook with what I had.
I have a whole shelf of cookbooks (actually, make it three shelves) that cover the cuisines of several continents and pay due respect to multiple styles. I knew that somewhere in one of them there had to be an interesting recipe that would make good use of the fresh bunch of kale I was holding in my hand. But how could I find it? I could have looked online, where I was sure to be confronted with a long list of kale suggestions, but I could never be sure that I could trust a source I had never used before, or that I would like the style of an unknown chef, whereas I know I can trust Ottolenghi, Tamimi, Che, Al-Lopez, or Ada Boni. Wouldn’t it be nice if I had an app that would search only my cookbooks and return a list of recipes using kale, with author, title, and page number (and format it in Chicago style on top of it)? A quick search online showed that no such app existed. Surely, I thought, as I sautéed my kale the same way I have for years, it shouldn’t be too difficult to hack one together by leveraging the wealth of tools that AI-based tech has been flooding us with in the last few years.
The following day (it was a Saturday) I started experimenting, did some reading, and even built a test project using the Large Language Models you can now easily access through … and added a … (RAG) backend focused on the digitized versions of my cookbooks. Even though the natural language interface behaved well (no surprises there), the search results were very disappointing. A couple of chats with people who actually build systems of this kind for a living made clear that the problem I was facing was not as trivial as I had hoped. It is easy enough to build a system that can parse even vaguely stated questions such as,
List all the recipes in my cookbooks only using kale and provide source titles, authors, and page numbers
or even less structured ones like, for instance,
Provide a list of all recipes from my cookbooks only suitable as a main course that include a green vegetable similar to kale as one of the main ingredients (and of course give title, author, and page number).
But it is not easy to turn those parsed questions into effective searches, unless, that is, you have converted a collection of digitized books into a structured database, effectively parsing your knowledge base by hand instead of having your tools do it.
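To make the contrast concrete, here is a minimal sketch of what the "structured database" route looks like once the hard part (extracting recipes, ingredients, and page numbers from the books) has already been done by hand. Everything in it is an assumption made for illustration: the table layout, the column names, the helper functions `build_db` and `recipes_with`, and the placeholder rows, which do not come from any real cookbook. It is not the test project described above.

```python
import sqlite3

# Hypothetical placeholder rows: (recipe, book, author, page, ingredient, course).
# None of these are real recipes or references.
ROWS = [
    ("Placeholder soup", "Book A", "Author A", 12, "kale", "starter"),
    ("Placeholder soup", "Book A", "Author A", 12, "beans", "starter"),
    ("Placeholder gratin", "Book B", "Author B", 87, "kale", "main"),
    ("Placeholder salad", "Book B", "Author B", 140, "chard", "main"),
]


def build_db(rows):
    """Load the hand-extracted rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE recipe_ingredients ("
        "recipe TEXT, book TEXT, author TEXT, page INTEGER, "
        "ingredient TEXT, course TEXT)"
    )
    conn.executemany(
        "INSERT INTO recipe_ingredients VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    return conn


def recipes_with(conn, ingredient, course=None):
    """Return every matching recipe (not just the top few), with its source."""
    sql = (
        "SELECT DISTINCT recipe, book, author, page "
        "FROM recipe_ingredients WHERE ingredient = ?"
    )
    params = [ingredient]
    if course is not None:
        sql += " AND course = ?"
        params.append(course)
    sql += " ORDER BY book, page"
    return conn.execute(sql, params).fetchall()


conn = build_db(ROWS)
for recipe, book, author, page in recipes_with(conn, "kale"):
    print(f"{author}, {book}, p. {page}: {recipe}")
```

With the data in this shape the query is exhaustive by construction, since the `WHERE` clause returns every row that matches. The cost is hidden in `ROWS`: someone, or something, had to read each book and fill in that table first.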
And thus the idle query prompted by a solitary bunch of kale has turned into a side project. Its primary goal is to assess how well different approaches (RAG, semantic search, and others) can deal with a corpus of documents that are semantically fairly homogeneous, and therefore all more or less relevant to any possible query, when the results need to be exhaustive, and to do so without resorting to extensive preprocessing of the knowledge base into a structured database.
As soon as I rephrased the problem in general terms, it became apparent that the most likely reason I even considered it a legitimate question was that querying a corpus of homogeneous documents (under some definition of “homogeneous”) is what a philosopher, an art historian, or a literary critic does every day. In fact, we even have a name for this kind of activity: we call it “close reading.” I have been doing it all my adult life by solving the problem manually, namely by painstakingly reading the books in my “corpus” and disassembling them into often hundreds of 5x8 cards reporting the passages relevant to the topic inscribed in their titles.1
Could the insights gained from searching cookbooks be translated into tools to help scholars in their daily close reading activities? Some thought experiments will be given in a forthcoming post. Stay tuned!
(And comment below…)
-
Like many European scholars of my generation, I was bred into the so-called Zettelkasten method (“card boxes,” literally) from an early age (professionally speaking), and received strong reinforcement from all my professors. Umberto Eco’s delightful little book, Come si fa una tesi di laurea, now finally translated into English as How to Write a Thesis, gives an exhaustive account of a method that another great European scholar, Niklas Luhmann, credited as the secret weapon behind his productivity. I was delighted to discover that a digital version of the Zettelkasten approach has been created by Soren Biornstad, and I have been using it ever since. You can check it out on Soren’s GitHub repo and see it in action in his publicly available notes.