Recent projects with LLMs

Published on September 7, 2024

Often I find myself talking to friends and trying to convince them that ‘this AI thing’ is progressing faster than they realize. They inevitably express skepticism. I tell them to focus on the rate, not the level. Said differently, models are improving faster than they think. For the most part, I am not successful. Even if they accept the premise that these models will have huge impact, they don’t seem to increase the time they spend learning to use and work with them. So here are a few things I have done with LLMs recently. Since much of this is technical work, it’s worth noting I have a non-technical background, and can only do these things (without huge upfront costs) because LLMs are already so capable.

Agentic Chatbot (link)

I wrote about this previously but, over a few months, in my spare time, I built a web-app that lets you build custom workflows with various LLMs. It doesn’t use any off-the-shelf provider like LangChain. This project has many thousands of lines of code and I wrote <10% of them. Probably <5%. And this was mostly before Claude 3.5 Sonnet, which was a jump in programming capabilities. We know 3.5 Opus is coming before the end of the year, and probably a new model from OpenAI and Gemini, so the scope of what you can do is growing fast. A side benefit of this tool is that I can use it whenever I hit my Claude.ai rate-limits.

Custom reporting for a shopify store

My mum runs a Shopify store selling girls clothing. For the most part she doesn’t do much in the way of reporting and analysis. In ~4 hours, I wrote an integration that syncs all orders and products to a google sheet. From there, I built her a bunch of reporting for things like best and worst selling products, revenue forecasts, and inventory forecasting to help her with ordering. I wrote 0 lines of code for this.

Standalone web app for e-commerce inventory management

After the reporting was done, I built a basic tool that lets my mum connect to her shopify store, select a reference period for sales, apply a growth adjustment, and output the exact order she should give to her supplier for the next X months. It supports filtering products by size, category, etc. It uses historical sales to help with growth expectations, and it saves her hours every month and probably increases revenue vs her doing this ‘manually’.

LLM email manager

I got frustrated with the number of emails I receive so I hooked up Claude 3.5 Sonnet to my Gmail account via oauth, pulled in a bunch of emails, and then calculated things like:

  • The number of emails received from this sender in the last 12 months

  • The number of opens and replies to this sender

  • The subject line and body of the email

I then had Claude take that data and categorize the emails into ‘must read now’, ‘should read sometime, ‘backlog’ (for things like newsletters that I read sporadically), and ‘don’t read’. I spot checked the categorization, adjusted the prompt, and then ran it on my backlogged emails. This is great and saves me a bunch of time now. I have not seen it categorize something important as ‘don’t read’.

Learning new things

I use Claude projects extensively for anything I’m learning about. I grab PDFs or epubs for books I’ve bought, and load compatible versions into a project, along with podcast transcripts, blog posts, etc. Then I just discuss things with Claude, or have it quiz me to judge where my gaps are and help focus on those. For quick questions about somethig I’m reading I use ChatGPT, because it’s faster and for practical purposes it has no rate limits. If the source content I’m querying is really big I use Gemini via Google’s AI Studio for the 2M token context window.

When I’m driving or walking, I’ll use ChatGPTs voice mode. I don’t have the new advanced version but it’s still useful and often better than listening to an audiobook in my opinion. And this only stands to become more true based on reviews I’ve seen of advanced voice mode, which should be widely available later this year. I suspect that when I get access, if there is a rate limit, I will find it. I’m incredibly excited for access.

LevineGPT

While not successful, I did try to build a fine-tuned version of gpt-4o-mini that would answer basic finance questions like Matt Levine. I authed into my gmail account, grabbed his last 500 emails, and then wrote a prompt that used 4o-mini and gemini-flash to take the raw content and turn it into single-turn question and answer pairs. This part was fairly successful after some prompt tweaking. I then ran this on a hundred or so articles, produced ~1000 training examples, and used OpenAIs fine-tuning UI since they were offering free fine-tuning of 4o-mini for a couple of million tokens per day.

The end result just wasn’t that great. I think the main reason was that I didn’t spend enough time on the prompt to produce good training examples, nor did I produce enough of them. I still think this idea has potential and would be very cool, but I implemented this in an evening after work, and it needs more attention to be done right.

Holiday / activity planning

I’ve travelled a bit recently, and on a couple of occasions, both overseas and in NYC, I’ve wanted help planning what to do. I usually jump into the ChatGPT app and use voice transcription to describe where I am, how long I have, and some things I’ve enjoyed elsewhere or in the same city, and have it give me ideas. In London, ChatGPT helped me plan a day-trip to Dover, find a spot for lunch, and go for a hike. Something about this UX of just verbalizing a bunch of context in ~60 seconds and having the model provide full recommendations, including how to get there, travel times, ticket information, and recommendations for what else is nearby, is just easier than spending an hour googling around for thoughts.