Are you an unimaginative person like me, and your name is not Gordon Ramsay? Fear not, ChatGPT-style Large Language Models (LLMs) to the rescue! Let's see how we can run our own LLM on a local computer, without any cloud, and enrich its responses with Retrieval Augmented Generation (RAG). All recipe data used in this example is provided by the public Rewe Rezeptsammlung, but you could plug in any kind of data source.
Hello Ollama and Mistral
First we need to pick a runtime for our LLM. I decided to use Ollama due to the ease of setup and the good support for my M1 MacBook hardware, including GPU acceleration. The next important choice is the model we want to use. I first tried llama2-based models but got poor results with German language support (since the recipes are in German, I want to generate German responses as well). After some trial and error, the models from the EM German family performed quite well, specifically the LeoLM Mistral model.
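Once Ollama is running, you can talk to the model through its local REST API. Here is a minimal smoke test; the model tag is a placeholder, use whatever tag you pulled or built locally:

```python
import requests

# Quick smoke test against Ollama's local REST API.
# "em_german_leo_mistral" is a placeholder tag for the locally installed model.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "em_german_leo_mistral",
        "prompt": "Antworte auf Deutsch: Was ist ein Pesto?",
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```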
Create embeddings
With that sorted, the next challenge is the RAG pipeline. A common technique is to create embeddings for your data and run a semantic search over them. In our context this means (just a quick overview; a code sketch follows the list):
- Create embeddings (basically large vectors) for all recipes with SentenceTransformers
- Insert those embeddings and recipes into a PostgreSQL database
- When you run a query (e.g. “pesto nudeln mit fisch”, pesto pasta with fish), you calculate the embedding for the query and run a vector search
- The database returns the recipes with the nearest vectors/embeddings
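Here is a minimal sketch of those steps, assuming the pgvector extension for PostgreSQL, psycopg2, and a multilingual SentenceTransformers model; the concrete model, schema, and connection string are assumptions, not necessarily what the project uses:

```python
import psycopg2
from pgvector.psycopg2 import register_vector
from sentence_transformers import SentenceTransformer

# Schema assumed by this sketch (384 dims matches the model below):
#   CREATE EXTENSION vector;
#   CREATE TABLE recipes (
#       id serial PRIMARY KEY, title text, body text, embedding vector(384));

# Assumption: any multilingual embedding model will do for German queries.
encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

conn = psycopg2.connect("dbname=recipes")  # assumed connection string
register_vector(conn)  # teaches psycopg2 to pass numpy arrays as vectors

def index_recipe(title: str, body: str) -> None:
    # Embed the full recipe text and store it next to the raw data.
    embedding = encoder.encode(f"{title}\n{body}")
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO recipes (title, body, embedding) VALUES (%s, %s, %s)",
            (title, body, embedding),
        )

def search_recipes(query: str, limit: int = 3) -> list[tuple[str, str]]:
    # Embed the query and let pgvector return the nearest neighbours
    # (<-> is the L2 distance operator).
    embedding = encoder.encode(query)
    with conn.cursor() as cur:
        cur.execute(
            "SELECT title, body FROM recipes ORDER BY embedding <-> %s LIMIT %s",
            (embedding, limit),
        )
        return cur.fetchall()
```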
How does it work in practice? Let's check an example:
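Reusing the `search_recipes` sketch from above (the exact query wording here is an assumption):

```python
# Hypothetical transcript of a search run; the second hit below is the one
# discussed in the text.
for title, _ in search_recipes("Pasta mit Garnelen"):
    print(title)

# Among the returned titles:
#   ...
#   Chili-Nudeln mit Flusskrebsen
```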
The response looks reasonable. A second recipe, “Chili-Nudeln mit Flusskrebsen”, is returned even though it doesn't contain the string “Garnelen”. Apparently the semantic similarity between “Garnele” (shrimp) and “Flusskrebs” (crayfish) was picked up.
Query the LLM
Finally we can merge the results from the vector search into the LLM prompt and run queries. Prompt engineering is a tricky topic of its own, but after a couple of tries I got a reasonably well-working solution. You can check the whole project and prompts on GitHub: recipe llm. Never be hungry again:
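A sketch of the merge step, building on the `search_recipes` function from above; the prompt wording and the model tag are assumptions, the real prompts live in the repo:

```python
import requests

def build_prompt(question: str, recipes: list[tuple[str, str]]) -> str:
    # Assumption: a plain German instruction prompt; the real prompts are in the repo.
    context = "\n\n".join(f"## {title}\n{body}" for title, body in recipes)
    return (
        "Du bist ein hilfreicher Kochassistent. Beantworte die Frage "
        "ausschließlich anhand der folgenden Rezepte.\n\n"
        f"{context}\n\nFrage: {question}\nAntwort:"
    )

def ask(question: str) -> str:
    # Retrieve the nearest recipes, merge them into the prompt, query the LLM.
    prompt = build_prompt(question, search_recipes(question))
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "em_german_leo_mistral", "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(ask("Schlage mir drei Nudelgerichte mit Garnelen vor."))
```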
The model picked up the recipes from REWE and correctly returned three of them.
I followed up with a request for the shopping list for one of the recipes.
In this last example I requested a drink for small humans. Interestingly, the embedding matched “small humans” with children, so I got instructions for a Kinderpunsch (a non-alcoholic children's punch).
I’m stuffed
That's it for a quick overview of my system; as you can tell, there are a lot of parameters and models you can tune, and everything affects the text generation. I only skimmed over the SentenceTransformers details (the component that creates the vectors): you can pick from many models, or even train your own, to fine-tune your search results! Maybe that will be a follow-up for a more in-depth article. The whole demo project can be found on GitHub.
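Swapping the embedding model in the sketch above is a one-line change; the model name below is just another example from the sentence-transformers model hub:

```python
# Any other SentenceTransformers model can be dropped in; remember to adjust
# the vector(...) column dimension to the new model's output size.
encoder = SentenceTransformer("distiluse-base-multilingual-cased-v2")  # 512-dim output
```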
For now…back to the kitchen…after this silly joke!