A local RAG chatbot

local platform for LLMs

Mick McQuaid

University of Texas at Austin

21 Mar 2025

Week Ten

Agenda

  • Presentations: Yuan, Siddharth
  • News (two weeks' worth!)
  • Review WhatIAlreadyKnow (Ishwari)
  • Today’s plan
  • Ollama
  • DeepSeek
  • Local RAG chatbot
  • Work time

Presentations

News

AI and diagrams

It’s worth looking at the AI and diagrams article that was posted to Hacker News, along with the HN comments, which focus more generally on prompting techniques. The article discusses the use of AI to generate diagrams, and concludes that it’s good for simple diagrams and brainstorming, but struggles with complex diagrams and with diagrams of systems whose key insights are not well-documented.

https://news.ycombinator.com/item?id=43398434

The Batch

Last week’s edition:

  • Agentic AI was a strong theme at AI Dev 25
  • Demos of Astra and Deep Research illustrated embedding AI in everything
  • Andrew Ng thinks many companies fine-tune when they should be prompting, but fine-tuning is coming along
  • Aya Vision is a new model that has been deployed on WhatsApp and can translate text and images in 23 languages (good for under-resourced languages)
  • Google introduced AI co-scientist, a general multi-agent system designed to generate in-depth research proposals within user-specified constraints (It’s enjoyed some success already)

More of Batch

  • AI-generated works now qualify for copyright protection if a human being contributed enough creative input, according to a determination by the US Copyright Office
  • Microsoft and Shenzhen Institute of Advanced Technology (SIAT) partnered to create MatterGen, a diffusion model that generates a material’s chemical composition (crystals so far) from a prompt specifying a desired property

Still more of Batch

⟨pause to look at this week’s edition⟩

WhatIAlreadyKnow (Ishwari)

Today’s plan

Today we’ll create a local RAG chatbot, using Ollama, DeepSeek R1, LangChain, ChromaDB, and Gradio.

We’ll have to install a lot of stuff and we’ll have to use the localRAGchatbot.qmd file after we do some installations.

Ollama

Ollama is an open-source framework designed to facilitate the deployment of large language models on local environments. It aims to simplify the complexities involved in running and managing these models, providing a seamless experience for users across different operating systems. (source: nixos wiki)

Download and Install Ollama

Visit Ollama’s website for detailed installation instructions, or install directly via Homebrew on macOS:

brew install ollama

For Windows and Linux, follow the platform-specific steps provided on the Ollama website.

DeepSeek R1

DeepSeek R1 is a large language model (LLM) developed by DeepSeek, a Chinese AI company. It is designed to provide advanced natural language processing capabilities, including text generation, question answering, and more. It was released on 10 January 2025. It is incredibly popular but also controversial due to its adherence to Chinese government censorship policies. We can run it locally, which is a big attraction: your data never leaves your machine.

Fetch DeepSeek R1

Pull the DeepSeek R1 model onto your machine:

ollama pull deepseek-r1

By default, this downloads the 7B DeepSeek R1 model (which is 4.4GB). If you’re interested in a specific distilled variant (e.g., 1.5B, 7B, 14B) or the full 671B-parameter model (tag 671b, which is 404GB), just specify its tag, like:

ollama pull deepseek-r1:1.5b

Run DeepSeek R1

Do this in a separate terminal tab or a new terminal window:

ollama serve

You must keep this terminal window open. In other words, you must keep Ollama running while you’re using DeepSeek R1.

Start using DeepSeek R1

Once installed, you can interact with the model right from your terminal:

ollama run deepseek-r1

Or, to run the 1.5B distilled model:

ollama run deepseek-r1:1.5b

Or, to prompt the model:

ollama run deepseek-r1 "What is the latest news on Rust programming language trends?"
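Besides the terminal, the Ollama server exposes a REST API on localhost port 11434, so you can prompt DeepSeek R1 from Python. A minimal stdlib-only sketch, assuming the default port and the deepseek-r1 model pulled above (the request is only sent if the server is actually running):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port


def build_request(model: str, prompt: str) -> dict:
    """Build the JSON payload for Ollama's /api/generate endpoint.

    stream=False requests a single JSON response instead of a token stream.
    """
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str) -> str:
    """Send the prompt to the local Ollama server and return the response text."""
    payload = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    try:
        print(generate("deepseek-r1", "Say hello in one sentence."))
    except OSError:
        print("Ollama server not reachable -- start it with: ollama serve")
```

This is the same endpoint LangChain’s Ollama integration talks to under the hood, which is why ollama serve must stay running while the chatbot is in use.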

Model location

On my machine, the model manifest is located at

~/.ollama/models/manifests/registry.ollama.ai/library/deepseek-r1/latest

The model itself is located at

~/.ollama/models/blobs

divided across several files. The name deepseek-r1 matters because it is the name Ollama uses to refer to the model. Any other models you pull are stored in the same structure and accessed by name.

You can verify that you have the 7B model by saying

ls -lh ~/.ollama/models/blobs

One of the blob files should be about 4.4GB.

The chatbot

We now turn our attention to the localRAGchatbot.qmd file. You must download that file from Canvas, along with Xiao2025.pdf. This latter file is a book, Xiao and Zhu (2025).

The first thing to do is to copy and paste the pip install ... commands into a terminal window. Then you can try to render the localRAGchatbot.qmd file in RStudio.

If you can’t render the file, there may be several reasons. The first thing to try is to download the localRAGchatbot.py file and try to run that in Python by saying python localRAGchatbot.py at a terminal prompt. It takes about seven minutes to run on my machine. If that doesn’t work, the problem may be in your Python installation. If it does work, the problem may be in your RStudio installation.
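For intuition about what the .qmd file is doing, the RAG part boils down to: split the PDF text into chunks, index them, retrieve the chunks most similar to the question, and prepend them to the prompt sent to DeepSeek R1. Here is a stdlib-only sketch of that retrieve-and-stuff step, with toy word-overlap scoring standing in for the real embeddings that ChromaDB stores:

```python
def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows, as a text splitter would."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def score(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query words that appear in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks with the highest toy relevance score."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Stuff the retrieved context into the prompt, as the chatbot does."""
    context = "\n\n".join(retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

In the real chatbot, score is replaced by cosine similarity over embedding vectors, and the final prompt goes to DeepSeek R1 via Ollama, with Gradio providing the chat interface.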

END

References

Xiao, Tong, and Jingbo Zhu. 2025. “Foundations of Large Language Models.” https://arxiv.org/abs/2501.09223.

Colophon

This slideshow was produced using Quarto

Fonts are Roboto Light, Roboto Bold, and JetBrains Mono Nerd Font