Introduction to Local AI Infrastructure
In recent years, the landscape of artificial intelligence has been rapidly evolving, with open-source models like Llama reaching performance levels that can compete with closed-source alternatives such as GPT. This progress has made running your own AI infrastructure not only feasible but increasingly attractive for many organizations and individuals.
This article will guide you through setting up a comprehensive local AI tech stack using a package developed by the n8n team. We'll cover everything from installation to creating a functional RAG (Retrieval-Augmented Generation) AI agent using locally hosted services.
Components of the Local AI Stack
The self-hosted AI starter kit we'll be using includes the following key components:
- Ollama: For running large language models (LLMs)
- Qdrant: A vector database for efficient similarity search
- PostgreSQL: A robust SQL database
- n8n: A workflow automation tool to tie everything together
This combination provides a solid foundation for building AI-powered applications entirely on your local machine.
Setting Up the Local AI Infrastructure
Prerequisites
Before we begin, make sure you have the following installed on your system:
- Git
- Docker Desktop (which includes Docker Compose)
Installation Steps
1. Clone the repository:
git clone https://github.com/n8n-io/self-hosted-ai-starter-kit.git
cd self-hosted-ai-starter-kit
2. Open the project in your preferred code editor (e.g., Visual Studio Code):
code .
3. Configure the environment variables:
- Open the .env file
- Set up your PostgreSQL username, password, and database name
- Configure the n8n secrets (use long alphanumeric strings); a sample .env layout is sketched after these steps
4. Modify the Docker Compose file:
- Expose the PostgreSQL port by adding these lines under the postgres service:
ports:
  - "5432:5432"
- Add an Ollama embedding model by including this line in the ollama service command: ollama pull nomic-embed-text
5. Start the services:
- For most users: docker compose --profile cpu up -d
- For Mac users: docker compose --profile mac up -d
- For NVIDIA GPU users: docker compose --profile nvidia up -d
6. Wait for all containers to start and for Ollama to download the necessary models.
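For reference, a filled-in environment file might look something like the sketch below. The exact variable names can vary between versions of the starter kit, so treat these as placeholders and check the .env (or .env.example) that ships with the repository:
POSTGRES_USER=n8n_user
POSTGRES_PASSWORD=<long random string>
POSTGRES_DB=n8n
N8N_ENCRYPTION_KEY=<long random alphanumeric string>
N8N_USER_MANAGEMENT_JWT_SECRET=<another long random alphanumeric string>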
Verifying the Setup
Once the installation is complete, you can verify that everything is running correctly:
- Open Docker Desktop
- Look for the "self-hosted-ai-starter-kit" group
- Expand it to see all running containers
- Click on each container to view logs or execute commands
You can access the n8n interface by navigating to http://localhost:5678 in your web browser.
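Beyond the Docker Desktop checks, you can also confirm that the Ollama and Qdrant APIs are reachable from your host machine. The snippet below is a minimal Node.js sketch (Node 18+ for the built-in fetch) and assumes the starter kit's default host port mappings of 11434 for Ollama and 6333 for Qdrant:
// check-services.js - quick reachability check for Ollama and Qdrant
async function main() {
  // Ollama: list the models that have been pulled so far
  const tags = await fetch("http://localhost:11434/api/tags").then((r) => r.json());
  console.log("Ollama models:", tags.models.map((m) => m.name));

  // Qdrant: list existing collections (empty until the ingestion workflow has run)
  const collections = await fetch("http://localhost:6333/collections").then((r) => r.json());
  console.log("Qdrant collections:", collections.result.collections.map((c) => c.name));
}

main().catch((err) => {
  console.error("A service is not reachable yet:", err.message);
});
If the model list does not yet include everything you expect, Ollama is most likely still downloading.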
Building a RAG AI Agent in n8n
Now that we have our local AI infrastructure set up, let's create a RAG AI agent using n8n workflows.
Setting Up the Agent
- Access your n8n instance at http://localhost:5678
- Create a new workflow
- Add a "Chat Trigger" node as the entry point
Configuring the AI Agent
1. Add an "AI Agent" node
2. Configure the Ollama chat model:
- Model: llama2:latest
- Base URL: http://host.docker.internal:11434
3. Set up PostgreSQL for chat memory:
- Table Name: Choose any name (n8n will create it automatically)
- Host: host.docker.internal
- Database, User, Password: Use the values from your .env file
- Port: 5432
4. Configure the Qdrant vector store:
- API Key: Use the n8n password (should be pre-filled)
- URL: http://host.docker.internal:6333
5. Set up Ollama embeddings:
- Model: nomic-embed-text
- Base URL: http://host.docker.internal:11434
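Before relying on these settings inside n8n, you can check that Ollama actually serves both models. The sketch below assumes the model names configured above (llama2:latest and nomic-embed-text) and calls Ollama from the host, where it is reachable on localhost rather than host.docker.internal:
// ollama-smoke-test.js - verify the chat and embedding models respond (Node 18+)
const OLLAMA = "http://localhost:11434";

async function main() {
  // Chat model: one short, non-streaming generation
  const gen = await fetch(`${OLLAMA}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama2:latest", // whichever chat model you configured in the AI Agent node
      prompt: "Reply with the single word: ready",
      stream: false,
    }),
  }).then((r) => r.json());
  console.log("Chat model said:", gen.response);

  // Embedding model: request a vector for a short piece of text
  const emb = await fetch(`${OLLAMA}/api/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", prompt: "hello world" }),
  }).then((r) => r.json());
  console.log("Embedding dimensions:", emb.embedding.length); // nomic-embed-text produces 768-dimensional vectors
}

main().catch(console.error);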
Creating the Document Ingestion Workflow
To populate our knowledge base, we'll create a workflow that ingests documents from Google Drive:
- Add triggers for file creation and updates in a specific Google Drive folder
- Fetch file metadata
- Download the file
- Extract text from the file
- Split the text into chunks (a conceptual sketch follows this list)
- Delete existing vectors for the file (if any)
- Insert new vectors into Qdrant
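The chunking step is handled by a text splitter node in n8n, but conceptually it amounts to something like the sketch below, which cuts the text into fixed-size pieces with a small overlap so neighbouring chunks share context (the sizes here are arbitrary examples, not the values used in the video):
// Sketch of fixed-size chunking with overlap; in the workflow a text splitter node does this for you
function splitIntoChunks(text, chunkSize = 1000, overlap = 200) {
  const chunks = [];
  // Advance by chunkSize - overlap so consecutive chunks share some context
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // the last chunk reached the end of the text
  }
  return chunks;
}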
Here's a crucial step often missed in RAG tutorials:
// Custom code to delete existing vectors before insertion
const { QdrantClient } = require("@qdrant/js-client-rest");

const client = new QdrantClient({
  url: "http://host.docker.internal:6333",
});

// The Google Drive file ID passed along by the previous node
const fileId = $input.all()[0].json.fileId;

// Find every point in the "documents" collection tagged with this file ID
// (scroll returns at most `limit` points per call; raise the limit or paginate for very large files)
const response = await client.scroll("documents", {
  filter: {
    must: [
      {
        key: "metadata.file_id",
        match: {
          value: fileId,
        },
      },
    ],
  },
  limit: 100,
});

const pointIds = response.points.map((point) => point.id);

// Remove the stale points so the re-ingested document does not create duplicates
if (pointIds.length > 0) {
  await client.delete("documents", {
    points: pointIds,
  });
}

return { pointIds };
This code ensures that we don't have duplicate vectors for updated documents, maintaining the integrity of our knowledge base.
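That delete filter only finds something to delete if the inserted points carry the file ID in their payload under metadata.file_id, which is why the workflow attaches a file_id metadata field before the Qdrant insert. Purely as an illustration of what that insert amounts to, here is a standalone sketch that embeds chunks with Ollama and upserts them with a matching payload layout (it assumes the documents collection already exists and mirrors the collection and field names used above):
// Illustrative sketch of the insert side: embed chunks and upsert them with metadata.file_id
// so the deletion code above can find them the next time the file changes.
const { QdrantClient } = require("@qdrant/js-client-rest");
const crypto = require("crypto");

const qdrant = new QdrantClient({ url: "http://host.docker.internal:6333" });
const OLLAMA = "http://host.docker.internal:11434";

// Get an embedding vector from the local nomic-embed-text model
async function embed(text) {
  const res = await fetch(`${OLLAMA}/api/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", prompt: text }),
  }).then((r) => r.json());
  return res.embedding;
}

async function upsertChunks(fileId, chunks) {
  const points = [];
  for (const [i, chunk] of chunks.entries()) {
    points.push({
      id: crypto.randomUUID(), // Qdrant accepts UUIDs or unsigned integers as point IDs
      vector: await embed(chunk),
      payload: {
        content: chunk,
        metadata: { file_id: fileId, chunk_index: i }, // matches the metadata.file_id filter above
      },
    });
  }
  // wait: true blocks until the points are persisted
  await qdrant.upsert("documents", { wait: true, points });
}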
Testing the RAG AI Agent
With everything set up, we can now test our locally hosted RAG AI agent:
- Save the workflow
- Open the chat widget
- Ask a question related to the ingested document
For example, if you've ingested a document about a company selling robotic pets, you might ask:
"What is the ad campaign focusing on?"
The agent should respond with relevant information extracted from the ingested document, demonstrating that it's successfully using the local LLM, vector database, and PostgreSQL for chat memory.
Extending the Local AI Stack
While this setup provides a solid foundation for local AI development, there are several ways to enhance and expand the infrastructure:
- Implement Redis for caching to improve response times (a minimal sketch follows this list)
- Replace vanilla PostgreSQL with a self-hosted Supabase instance for added features like authentication
- Develop a custom front-end interface for easier interaction with the AI agent
- Incorporate best practices for prompt engineering and LLM interactions
- Create template workflows for common AI tasks to accelerate development
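As an illustration of the first idea, a thin caching layer can sit in front of the LLM call and return a stored answer when the same prompt comes in again. This is only a sketch: it assumes a Redis container added alongside the stack and uses the redis npm package, neither of which ships with the starter kit:
// llm-cache.js - sketch of prompt-level caching with Redis (hypothetical addition to the stack)
const { createClient } = require("redis");
const crypto = require("crypto");

const redis = createClient({ url: "redis://localhost:6379" });

async function cachedGenerate(prompt) {
  if (!redis.isOpen) await redis.connect();

  // Key the cache on a hash of the prompt so long prompts stay within key-size limits
  const key = "llm:" + crypto.createHash("sha256").update(prompt).digest("hex");
  const cached = await redis.get(key);
  if (cached) return cached; // cache hit: skip the model call entirely

  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "llama2:latest", prompt, stream: false }),
  }).then((r) => r.json());

  // Cache the answer for an hour; tune the TTL to how often your knowledge base changes
  await redis.set(key, res.response, { EX: 3600 });
  return res.response;
}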
Conclusion
Setting up a local AI infrastructure using open-source tools like Ollama, Qdrant, and n8n opens up a world of possibilities for AI development and experimentation. By following this guide, you've established a powerful foundation that can be customized and expanded to suit your specific needs.
The ability to run advanced AI models locally not only provides greater control over your data and processes but also allows for faster iteration and development. As open-source models continue to improve, the gap between local and cloud-based AI solutions narrows, making self-hosted AI infrastructures an increasingly attractive option for businesses and individuals alike.
Remember to keep your local AI stack updated and secure, and don't hesitate to explore new models and tools as they become available. The field of AI is rapidly evolving, and maintaining a flexible, locally hosted infrastructure puts you in an excellent position to adapt to new developments and leverage cutting-edge AI capabilities.
By mastering the setup and management of your local AI tech stack, you're not just following a trend – you're positioning yourself at the forefront of the AI revolution, ready to harness the full potential of artificial intelligence on your own terms.
Article created from: https://youtu.be/V_0dNE-H2gw?si=Kao3v6mJRoziNMAr