Are you an unimaginative person like me, and your name is not Gordon Ramsay? Fear not, ChatGPT-style Large Language Models (LLMs) to the rescue! Let's see how we can run our own LLM on a local computer, without any cloud, and enrich its responses with Retrieval Augmented Generation (RAG). All recipe data used in this example is provided by the public Rewe Rezeptsammlung, but you could plug in any kind of data source.
Hello Ollama and Mistral
First we need to pick a runtime for our LLM. I decided to use Ollama due to the ease of setup and the good support for my M1 MacBook hardware, including GPU acceleration. The next important choice is the model we want to use. I first tried llama2-based models but got poor results with German language support (since the recipes are in German, I want to generate German responses as well). After some trial and error, the models from the EM German family performed quite well, specifically the LeoLM Mistral model.
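Once Ollama is running, you can talk to the model through its local REST API. Here is a minimal smoke test; the model tag is a placeholder, use whatever tag you pulled or built locally:

```python
import requests

# Quick smoke test against Ollama's local REST API.
# "em_german_leo_mistral" is a placeholder tag for the locally installed model.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "em_german_leo_mistral",
        "prompt": "Antworte auf Deutsch: Was ist ein Pesto?",
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```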
Create embeddings
With that sorted, the next challenge is the RAG pipeline. A common technique is to create embeddings for your data and run a semantic search over them. In our context this means (just a quick overview; a code sketch follows the list):
- Create embeddings (basically large vectors) for all recipes with SentenceTransformers
- Insert those embeddings and recipes into a PostgreSQL database
- When you run a query (e.g. “pesto nudeln mit fisch”, pesto pasta with fish), you calculate the embedding for the query and run a vector search
- The database returns the recipes with the nearest vectors/embeddings
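Here is a minimal sketch of those steps, assuming the pgvector extension for PostgreSQL, psycopg2, and a multilingual SentenceTransformers model; the concrete model, schema, and connection string are assumptions, not necessarily what the project uses:

```python
import psycopg2
from pgvector.psycopg2 import register_vector
from sentence_transformers import SentenceTransformer

# Schema assumed by this sketch (384 dims matches the model below):
#   CREATE EXTENSION vector;
#   CREATE TABLE recipes (
#       id serial PRIMARY KEY, title text, body text, embedding vector(384));

# Assumption: any multilingual embedding model will do for German queries.
encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

conn = psycopg2.connect("dbname=recipes")  # assumed connection string
register_vector(conn)  # teaches psycopg2 to pass numpy arrays as vectors

def index_recipe(title: str, body: str) -> None:
    # Embed the full recipe text and store it next to the raw data.
    embedding = encoder.encode(f"{title}\n{body}")
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO recipes (title, body, embedding) VALUES (%s, %s, %s)",
            (title, body, embedding),
        )

def search_recipes(query: str, limit: int = 3) -> list[tuple[str, str]]:
    # Embed the query and let pgvector return the nearest neighbours
    # (<-> is the L2 distance operator).
    embedding = encoder.encode(query)
    with conn.cursor() as cur:
        cur.execute(
            "SELECT title, body FROM recipes ORDER BY embedding <-> %s LIMIT %s",
            (embedding, limit),
        )
        return cur.fetchall()
```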
How does it work in practice? Let's check an example:
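Reusing the `search_recipes` sketch from above (the exact query wording here is an assumption):

```python
# Hypothetical transcript of a search run; the second hit below is the one
# discussed in the text.
for title, _ in search_recipes("Pasta mit Garnelen"):
    print(title)

# Among the returned titles:
#   ...
#   Chili-Nudeln mit Flusskrebsen
```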
The response looks reasonable. A second recipe, “Chili-Nudeln mit Flusskrebsen”, is returned even though it doesn't contain the string “Garnelen”. Apparently the semantic similarity between “Garnele” (shrimp) and “Flusskrebs” (crayfish) was picked up.
Query the LLM
Finally we can merge the results from the vector search into the LLM prompt and run queries. Prompt engineering is a tricky topic of its own, but after a couple of tries I got a reasonably well-working solution. You can check the whole project and prompts on GitHub: recipe llm. Never be hungry again:
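A sketch of the merge step, building on the `search_recipes` function from above; the prompt wording and the model tag are assumptions, the real prompts live in the repo:

```python
import requests

def build_prompt(question: str, recipes: list[tuple[str, str]]) -> str:
    # Assumption: a plain German instruction prompt; the real prompts are in the repo.
    context = "\n\n".join(f"## {title}\n{body}" for title, body in recipes)
    return (
        "Du bist ein hilfreicher Kochassistent. Beantworte die Frage "
        "ausschließlich anhand der folgenden Rezepte.\n\n"
        f"{context}\n\nFrage: {question}\nAntwort:"
    )

def ask(question: str) -> str:
    # Retrieve the nearest recipes, merge them into the prompt, query the LLM.
    prompt = build_prompt(question, search_recipes(question))
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "em_german_leo_mistral", "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(ask("Schlage mir drei Nudelgerichte mit Garnelen vor."))
```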
The model picked up the recipes from REWE and correctly returned three of them.
I followed up with a request for the shopping list for one of the recipes.
In this last example I requested a drink for small humans. Interestingly, the embedding matched “small humans” with children, so I got instructions for a Kinderpunsch (a non-alcoholic children's punch).
I’m stuffed
That's it for a quick overview of my system; as you can tell, there are a lot of parameters and models you can tune, and everything affects the text generation. I only skimmed over the SentenceTransformers details (the component that creates the vectors): you can pick from many models, or even train your own, to fine-tune your search results! Maybe that will be a follow-up for a more in-depth article. The whole demo project can be found on GitHub.
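Swapping the embedding model in the sketch above is a one-line change; the model name below is just another example from the sentence-transformers model hub:

```python
# Any other SentenceTransformers model can be dropped in; remember to adjust
# the vector(...) column dimension to the new model's output size.
encoder = SentenceTransformer("distiluse-base-multilingual-cased-v2")  # 512-dim output
```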
For now…back to the kitchen…after this silly joke!