Raspberry Pi 5 Coding Assistant with Ollama and Continue
Here’s how you can run a large language model (LLM) on a Raspberry Pi. The goal of this article is to find out whether a locally hosted LLM is a reasonable replacement for tools like ChatGPT and GitHub Copilot.
You can watch the video below for a more detailed view of what you should expect from running an LLM on your Raspberry Pi (as a coding assistant). The video also shows how the Raspberry Pi 5 experience compares to other single-board computers with a built-in NPU, like the Radxa Rock 5C.
There are many benefits to running your local instance of an LLM:
- You are not dependent on multi-billion-dollar companies that can change their terms of service on a whim.
- Information never leaves your computer/network, which gives you the best possible privacy.
- You have the flexibility to try different LLMs that are not available through paid subscription services like ChatGPT.
Installing LLMs on a Raspberry Pi
Ollama makes installing and running LLMs locally on your Raspberry Pi super easy. You only need an internet connection to download the models; once they are downloaded, everything runs locally. To install Ollama, run the official install script:
curl -fsSL https://ollama.com/install.sh | sh
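Once the script finishes, a quick sanity check is to print the installed version; this assumes the installer completed successfully and put the ollama binary on your PATH (it also registers Ollama as a systemd service, which we will adjust next).
ollama --version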
If you are going to be connecting to Ollama from a different machine, you have to set a couple of environment variables to make it work. The first environment variable binds the service to all the IP addresses associated with your Pi. The second environment variable is a little trickier because it depends on how you will be accessing the Ollama server from another device. I chose to use the IP address, but you could use the hostname instead; make changes accordingly.
Make sure to replace <IP or Hostname> with the actual IP address or hostname of your Pi. I’ll use the IP address for the remainder of this article.
Edit the file /etc/systemd/system/ollama.service and add the two lines below under the [Service] section.
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_ORIGINS=http://<IP or Hostname>:11434"
Let’s restart Ollama so it can pick up the settings we just changed.
sudo systemctl daemon-reload
sudo systemctl restart ollama
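To confirm the service came back up cleanly, you can inspect its status and recent logs with standard systemd tooling (assuming the installer created the ollama service as described above):
sudo systemctl status ollama
journalctl -u ollama -n 20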
Open a browser window and navigate to http://<IP address>:11434, and you should see a message that Ollama is running. Double-check the environment variables configured above if you get a connection error or page not found.
![](https://i0.wp.com/dphacks.com/wp-content/uploads/2025/02/Ollama_Running.png?resize=809%2C329&ssl=1)
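If you prefer the terminal over a browser, you can run the same check with curl from another machine on your network; replace <IP address> with your Pi’s actual address, and you should get the same “Ollama is running” message back.
curl http://<IP address>:11434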
Open WebUI (more on this below) allows you to download models directly from the web interface, but I prefer to use the terminal. Any of the models listed on Ollama’s page can be downloaded to the Raspberry Pi.
I recommend starting with 1.5B-parameter models, as they provide a good balance between accuracy and computational efficiency given the limited resources a Raspberry Pi offers.
![](https://i0.wp.com/dphacks.com/wp-content/uploads/2025/02/Ollama-Qwen25.png?resize=1024%2C509&ssl=1)
Copy the run command from Ollama’s website and paste it into your terminal window.
ollama run qwen2.5-coder:1.5b
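Because we will later point VS Code at this Ollama server over the network, it is worth confirming that the model also answers over the HTTP API, not just in the terminal chat. The sketch below assumes the default Ollama REST endpoint on port 11434; replace <IP address> with your Pi’s address and adjust the prompt as you like.
curl http://<IP address>:11434/api/generate -d '{
  "model": "qwen2.5-coder:1.5b",
  "prompt": "Write a Python function that reverses a string.",
  "stream": false
}'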
Chat Server
Once Ollama is ready to go, you can have a chat session directly from the terminal. But that’s clunky and not very user-friendly. Open WebUI provides a more modern way of interacting with the LLM running on Ollama, giving you an experience similar to a chatbot like ChatGPT.
Open WebUI is offered as a Python package, so installing it is straightforward. Starting with Raspberry Pi OS Bookworm, you have to create a virtual environment before you can install Python packages with pip.
python -m venv myenv
The command above will create a folder called myenv that will house all the files required for a Python environment. To source (activate) the environment, type the command below.
source myenv/bin/activate
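Your shell prompt should now be prefixed with (myenv). If you want to double-check that the environment is active, the python command should resolve to the copy inside the folder you just created:
which python
It should print a path ending in myenv/bin/python rather than the system-wide Python.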
Now, install the Open WebUI package using pip.
pip install open-webui
Once the install is completed, start the Open WebUI server.
open-webui serve
The server starts on the default port, 8080. To access your shiny new LLM chatbot, enter your Raspberry Pi’s IP address in a web browser followed by the port number (for example, http://<IP address>:8080). Open WebUI should load and present you with a login screen. Go ahead and create an account for yourself. This is a local account, and the information is not sent over the internet.
![](https://i0.wp.com/dphacks.com/wp-content/uploads/2025/02/OpenWebUI-Login.png?resize=1024%2C853&ssl=1)
Once logged in, you can use the chatbox to help you with coding tasks.
![](https://i0.wp.com/dphacks.com/wp-content/uploads/2025/02/OpenWebUI-Qwen25.png?resize=1024%2C559&ssl=1)
If all you are looking for is a chatbot experience, you can stop here and enjoy your new assistant. But if you want to integrate Ollama directly into your code editor, like VS Code, more setup is needed.
Continue VS Code Plugin
The Continue VS Code plugin connects directly to the Ollama server running on the Raspberry Pi 5 and can provide automatic code completion. There are a few settings you should be aware of to make this work properly.
By default, Continue will send more context data than the Raspberry Pi can handle. This just causes the Pi 5 to run at full tilt for a few minutes until the request times out without returning a coding suggestion.
Open Continue’s settings and add the block below to the config.json file.
"tabAutocompleteOptions": {
"maxPromptTokens": 300
}
Next, make sure the tab autocomplete settings match the model you are using and the Pi’s IP address. Here’s an example of how I set it up to use the Qwen2.5-Coder model.
"tabAutocompleteModel": {
"title": "Qwen2.5-Coder",
"provider": "ollama",
"model": "qwen2.5-coder:1.5b",
"apiBase": "http://10.0.0.41:11434"
}
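For reference, here is roughly how the two blocks above fit together as top-level entries in config.json; the model name and IP address are just the example values from this setup, so adjust them to match your own.
{
  "tabAutocompleteModel": {
    "title": "Qwen2.5-Coder",
    "provider": "ollama",
    "model": "qwen2.5-coder:1.5b",
    "apiBase": "http://10.0.0.41:11434"
  },
  "tabAutocompleteOptions": {
    "maxPromptTokens": 300
  }
}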