Build Your Own Private, Customizable and Self-hosted AI GPT using Llama2 and Open WebUI with RAG

What if you could build your own private GPT and connect it to your own knowledge base: technical solution description documents, design documents, technical manuals, RFC documents, configuration files, source code, scripts, MOPs (Methods of Procedure), reports, notes, journals, log files, technical specification documents, technical guides, Root Cause Analysis (RCA) documents, and more? Well, it’s possible, and I’m about to show you how.

In this article, we are going to build a private GPT using a popular, free and open-source AI model called Llama2. We shall then connect Llama2 to a dockerized open-source graphical user interface (GUI) called Open WebUI to allow us to interact with the AI model via a professional-looking web interface. Using Open WebUI with RAG (Retrieval-Augmented Generation), we shall create a “Knowledge Base” and upload the documents we want Llama2 to consult when answering our questions. All of this will be hosted locally on your PC or server, running on CPU or GPU. Please note that having a GPU is an added advantage, because AI models perform better on servers with GPUs than on servers that only have CPUs. To successfully accomplish this task, here are the basic requirements. Surprisingly simple, but it works!

  1. A server or PC with a CPU/GPU (GPU preferred) at 1.30 to 1.50 GHz, 16 GB RAM, and 500 GB storage (see the commands after this list to check your machine).
  2. An internet connection; only during installation, setup, and upgrades. No internet connection is needed to run and use your private AI.
  3. An OS; Linux or macOS is preferred, but the setup can also run on Windows when combined with WSL (Windows Subsystem for Linux). I will be running this demo on Ubuntu Linux 22.04.
  4. Ollama; an open-source platform that simplifies installing and running large language models (LLMs) locally on consumer hardware. It acts as a bridge between your hardware and the complex requirements of running LLMs, making the process more accessible.
  5. Docker; a lightweight, standalone packaging technology that bundles everything needed to run a piece of software: code, runtime, system tools, system libraries, and settings. Docker containers simplify deployment and ensure consistency across different environments.
  6. Llama2; a large language model (LLM) developed by Meta and released in partnership with Microsoft. It is open-source and available for commercial use, allowing developers to build upon it and create custom applications.
  7. Open WebUI; a user-friendly interface for Llama2, which we shall deploy via Docker.
  8. A collection of your documents; PDFs, Word, txt, etc. These will form your knowledge base.
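
If you want to quickly check whether a Linux machine meets these specs, the following commands (a quick sketch; assuming standard Linux utilities are present) report CPU details, memory, and disk capacity:

# lscpu

# free -h

# df -h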

But before we dive deep into the details of this demo, why build a private AI server when you can access public AI like ChatGPT, Copilot, or Gemini?

  1. Data privacy and security; a local, self-hosted AI gives you unparalleled privacy and control. When you are working with private and confidential information, for example proprietary company data, a private AI puts you in control of your data.
  2. Private AI is customizable and adaptable; through a process known as fine-tuning, you can adapt a pre-trained AI model like Llama2 to accomplish specific tasks by training it on new, specialized data, opening up endless possibilities. And through a technique known as RAG (Retrieval-Augmented Generation), you can connect the LLM to your personal and private knowledge sources, such as documents or databases, to enhance the model’s responses and provide more accurate information.
  3. Empowering individuals and businesses; by leveraging local LLMs and RAG, individuals and businesses can create private, customized knowledge bases and question-answering chatbots based on personal documents and company data, without relying on external services. This is a truly empowering experience.
  4. No internet connection needed; yes, you can run a private AI offline, without connecting to external servers. This enables enterprises to run and manage their own private AI infrastructure, including deploying and fine-tuning LLMs on their on-premises hardware.

Let’s also spend some time and understand the most commonly used terms in AI and the different building blocks of private AI before we get on with the demo.

What is an AI Model?

An AI model is a computer program trained on a large dataset to recognize patterns and make predictions or decisions. These models can perform tasks like text generation, image generation, image recognition, language translation, games, chat, etc. Think of an AI model like a sniffer dog trained to detect specific scents: an AI model is similarly trained to identify patterns in data. Understanding what AI models are is fundamental to understanding how AI works and what its limitations are.

What is RAG?

RAG (Retrieval-Augmented Generation) lets your AI access your internal data for accurate and relevant answers; it’s like having a research assistant built into your AI. RAG gives you real-time accuracy with your personal, private and confidential data by connecting the AI to your knowledge base. RAG largely eliminates the need for model retraining, because it allows the pre-trained model to consult your private data before answering, which maintains accuracy and enhances responses.
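
To make the idea concrete, here is a minimal sketch of what “augmenting” a prompt looks like, using Ollama’s /api/generate endpoint (which we set up later in this article). The context passage and question are placeholders; in practice, Open WebUI retrieves the relevant passages from your knowledge base and injects them into the prompt for you:

# curl http://127.0.0.1:11434/api/generate -d '{
  "model": "llama2",
  "stream": false,
  "prompt": "Context: <passage retrieved from your documents>. Using only the context above, answer: <your question>"
}'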

What is Hugging Face?

Hugging Face is a platform and community for sharing and collaborating on machine learning models, datasets, and related resources. It hosts a vast collection of pre-trained models (over one million and counting at the time of this article) that can be used for various tasks (e.g. Llama2, which we shall be using in this demo). Think of Hugging Face as a GitHub for AI, where developers share and collaborate on models the way they do on code. Hugging Face makes it easy to access and use existing AI models, lowering the barrier to entry for developing AI applications. Hugging Face is the home of AI models.

With this background information, we are now ready to set up and run Llama2 with Open WebUI. Follow the detailed steps below (you will need an active internet connection during this setup to access the online software repos, after which you can run your private AI without internet):

A) Install a compatible OS

Install a compatible OS (macOS or Linux); Windows is supported when combined with WSL (Windows Subsystem for Linux). For this demo, we shall use Ubuntu Linux 22.04, which belongs to the Debian distro family. Also remember to update/upgrade your OS; this ensures it has the latest software updates and patches before you add applications.

These commands are used to update and upgrade your OS to the latest software patches:

# apt update

# apt upgrade -y
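
If you want to confirm which OS release you are on before proceeding (assuming an Ubuntu/Debian system, as in this demo), run:

# lsb_release -a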

B) Download and install Ollama API Service

Ollama, from ollama.com, is a tool for managing and running large language models locally. It facilitates the download and execution of models like Llama2, ensuring they are readily available for use within Open WebUI.

The following command will download, install, and set up Ollama on your server:

# curl -fsSL https://ollama.com/install.sh | sh

The following command will test and verify your Ollama installation.

# curl http://127.0.0.1:11434
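
If the service is up, the command returns a short confirmation message:

Ollama is running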

C) Download and install Llama2 AI model

Llama2 will be the base of our private AI project, and installing it on Linux is surprisingly easy with Ollama. Just one command and you’re ready to start interacting with the Llama2 AI model.

The following command will download and install Llama2.

# ollama pull llama2

Run and test the AI model with the command below:

# ollama run llama2

If the test is successful, you are now interacting with the Llama2 AI model, but only via the CLI. The next step is to install Open WebUI and interact with the model using a graphical interface.
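
Two quick tips for the Ollama CLI, assuming its default behavior: type /bye inside the interactive session to exit, and back at the shell you can confirm which models are installed locally with:

# ollama list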

D) Download and install docker engine

Because we want to dockerize Open WebUI (run it in a Docker container), we shall first install and set up the Docker environment before installing Open WebUI. For detailed information on installing Docker Engine on Linux, please refer to the official Docker docs. For this project, I’m using Ubuntu, and here is a detailed guide on installing Docker Engine on Ubuntu using the “apt” repo.

i) Set up Docker’s “apt” repo

# Add Docker’s official GPG key:

sudo apt-get update

sudo apt-get install ca-certificates curl

sudo install -m 0755 -d /etc/apt/keyrings

sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc

sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:

echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt-get update
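
As a quick sanity check, you can confirm the repo entry was written (the path matches the tee command above):

# cat /etc/apt/sources.list.d/docker.list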

ii) Install Docker Engine and its goodies:

The following command will download and install docker engine and its dependencies.

# apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

iii) Test Docker Engine installation by running the hello-world image. When the container runs, it prints a confirmation message and exits.

The following command downloads a test image and runs it in a container.

# docker run hello-world
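
A successful run prints a confirmation that includes lines like these:

Hello from Docker!
This message shows that your installation appears to be working correctly.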

E) Download and install Open WebUI Container

Open WebUI will be your private AI control panel and provides a beautiful GUI for Llama2 in this project. It is an extensible, feature-rich, easy-to-navigate, self-hosted WebUI designed to operate entirely offline. It supports various LLM runners, including Ollama and OpenAI-compatible APIs. You can chat, upload files, and manage settings using this GUI. For detailed information, be sure to check out the Open WebUI documentation on their official website or GitHub. We shall run Open WebUI as a single container and connect it to the Ollama API service we installed in step B, allowing for a streamlined setup via a single command.

The following command will spin up an Open WebUI container instance, connect it to the Ollama API service, and make the GUI accessible on your server’s IP address on port 8080.

# docker run -d --network=host -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://127.0.0.1:11434 --name open-webui --restart always ghcr.io/open-webui/open-webui:main

The following command will verify that the Open WebUI container is running:

# docker ps
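
If the container is listed but the portal does not load, a good first step is to inspect the container logs (open-webui is the container name we set above):

# docker logs open-webui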

F) Access the Open WebUI Portal, Configure Admin Settings, Upload Documents, and Enable RAG (if needed)

Open a web browser and navigate to your server’s IP address on port 8080, e.g. http://a.b.c.d:8080. If you have a firewall enabled, make sure you have created firewall policies to allow access from external IPs. Explore Open WebUI’s features, including model selection, chat, file uploads, and admin settings. Begin interacting with Llama2 through Open WebUI: ask questions and, once you have uploaded documents, verify that the model retrieves and generates responses based on them.
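
For example, if your server uses ufw (Ubuntu’s default firewall front-end), a minimal sketch of opening the portal’s port would be:

# ufw allow 8080/tcp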

To create an account, click “Sign up”; the first account created automatically becomes the admin account. Once logged in, you can change the model in the top left corner from the default “Arena Model” to “Llama2”:

Click on the account icon in the top right corner to access the portal settings.

To create your first knowledge base, click the three-line menu in the top left corner and select “Workspace”. In the Workspace menu, select “Knowledge” and click the “+” in the top right corner.

Give your knowledge base a name and a good description, then click “Create Knowledge”.

Select the knowledge base you created, and click the “+” to the right of the “Search Collection” menu. In the pop-up menu, select “Upload files”, browse to the file you want, and click upload.

To interact with the files you uploaded, click “New Chat” in the top left corner; this opens a new chat with Llama2. Begin your chat with a “#” sign to pop up your knowledge bases, select yours to add it to the chat box, and start asking questions. You can add more files to the chat and ask questions about your own documents, personal notes, journals, company documents, technical guides, and more, empowering personal knowledge management. This is the true power of private AI when combined with Open WebUI and RAG. Private AI is the future, putting you in control of your data and your AI.

You can also connect your private AI to Obsidian for enhanced note-taking. The BMO Chatbot plugin for Obsidian brings AI-powered assistance into your vault and is another game-changer for productivity; it can give you contextual help and generate content within your notes. Another very interesting and powerful application of private AI is local image generation with Stable Diffusion: you can run Stable Diffusion with the Automatic1111 UI, or integrate it with Open WebUI for seamless image generation within the chat interface. It’s incredibly fast and easy to use, but that will be a demo for another day. I hope this article has inspired you to build your own private AI; let me know in the comments section by sharing and suggesting new project ideas. Let’s explore the world of AI together.

About the Author

Joshua Makuru Nomwesigwa is a seasoned Telecommunications Engineer with vast experience in IP technologies; he eats, drinks, and dreams IP packets. He is a passionate evangelist of the fourth industrial revolution (4IR), a.k.a. Industry 4.0, and all the technologies that it brings: 5G, Cloud Computing, Big Data, Artificial Intelligence (AI), Machine Learning (ML), Internet of Things (IoT), Quantum Computing, etc. Basically, anything techie, because a normal life is boring.
