Introduction to Local AI Infrastructure
In recent years, the landscape of artificial intelligence has been rapidly evolving, with open-source models like Llama reaching performance levels that can compete with closed-source alternatives such as GPT. This progress has made running your own AI infrastructure not only feasible but increasingly attractive for many organizations and individuals.
This article will guide you through setting up a comprehensive local AI tech stack using a package developed by the n8n team. We'll cover everything from installation to creating a functional RAG (Retrieval-Augmented Generation) AI agent using locally hosted services.
Components of the Local AI Stack
The self-hosted AI starter kit we'll be using includes the following key components:
- Ollama: For running large language models (LLMs)
- Qdrant: A vector database for efficient similarity search
- PostgreSQL: A robust SQL database
- n8n: A workflow automation tool to tie everything together
This combination provides a solid foundation for building AI-powered applications entirely on your local machine.
Setting Up the Local AI Infrastructure
Prerequisites
Before we begin, make sure you have the following installed on your system:
- Git
- Docker Desktop (which includes Docker Compose)
Installation Steps
1. Clone the repository:
git clone https://github.com/n8n-io/self-hosted-ai-starter-kit.git
cd self-hosted-ai-starter-kit
2. Open the project in your preferred code editor (e.g., Visual Studio Code):
code .
3. Configure the environment variables:
- Open the .env file
- Set up your PostgreSQL username, password, and database name
- Configure the n8n secrets (use long alphanumeric strings); a sample .env layout is sketched after these steps
4. Modify the Docker Compose file:
- Expose the PostgreSQL port by adding these lines under the postgres service:
ports:
  - "5432:5432"
- Add an Ollama embedding model by including this line in the ollama service command: ollama pull nomic-embed-text
5. Start the services:
- For most users: docker compose --profile cpu up -d
- For Mac users: docker compose --profile mac up -d
- For NVIDIA GPU users: docker compose --profile nvidia up -d
6. Wait for all containers to start and for Ollama to download the necessary models.
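For reference, a filled-in environment file might look something like the sketch below. The exact variable names can vary between versions of the starter kit, so treat these as placeholders and check the .env (or .env.example) that ships with the repository:
POSTGRES_USER=n8n_user
POSTGRES_PASSWORD=<long random string>
POSTGRES_DB=n8n
N8N_ENCRYPTION_KEY=<long random alphanumeric string>
N8N_USER_MANAGEMENT_JWT_SECRET=<another long random alphanumeric string>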
Verifying the Setup
Once the installation is complete, you can verify that everything is running correctly:
- Open Docker Desktop
- Look for the "self-hosted-ai-starter-kit" group
- Expand it to see all running containers
- Click on each container to view logs or execute commands
You can access the n8n interface by navigating to http://localhost:5678 in your web browser.
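Beyond the Docker Desktop checks, you can also confirm that the Ollama and Qdrant APIs are reachable from your host machine. The snippet below is a minimal Node.js sketch (Node 18+ for the built-in fetch) and assumes the starter kit's default host port mappings of 11434 for Ollama and 6333 for Qdrant:
// check-services.js - quick reachability check for Ollama and Qdrant
async function main() {
  // Ollama: list the models that have been pulled so far
  const tags = await fetch("http://localhost:11434/api/tags").then((r) => r.json());
  console.log("Ollama models:", tags.models.map((m) => m.name));

  // Qdrant: list existing collections (empty until the ingestion workflow has run)
  const collections = await fetch("http://localhost:6333/collections").then((r) => r.json());
  console.log("Qdrant collections:", collections.result.collections.map((c) => c.name));
}

main().catch((err) => {
  console.error("A service is not reachable yet:", err.message);
});
If the model list does not yet include everything you expect, Ollama is most likely still downloading.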
Building a RAG AI Agent in n8n
Now that we have our local AI infrastructure set up, let's create a RAG AI agent using n8n workflows.
Setting Up the Agent
- Access your n8n instance at http://localhost:5678
- Create a new workflow
- Add a "Chat Trigger" node as the entry point
Configuring the AI Agent
1. Add an "AI Agent" node
2. Configure the Ollama chat model:
- Model: llama2:latest
- Base URL: http://host.docker.internal:11434
3. Set up PostgreSQL for chat memory:
- Table Name: Choose any name (n8n will create it automatically)
- Host: host.docker.internal
- Database, User, Password: Use the values from your .env file
- Port: 5432
4. Configure the Qdrant vector store:
- API Key: Use the n8n password (should be pre-filled)
- URL: http://host.docker.internal:6333
5. Set up Ollama embeddings:
- Model: nomic-embed-text
- Base URL: http://host.docker.internal:11434
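Before relying on these settings inside n8n, you can check that Ollama actually serves both models. The sketch below assumes the model names configured above (llama2:latest and nomic-embed-text) and calls Ollama from the host, where it is reachable on localhost rather than host.docker.internal:
// ollama-smoke-test.js - verify the chat and embedding models respond (Node 18+)
const OLLAMA = "http://localhost:11434";

async function main() {
  // Chat model: one short, non-streaming generation
  const gen = await fetch(`${OLLAMA}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama2:latest", // whichever chat model you configured in the AI Agent node
      prompt: "Reply with the single word: ready",
      stream: false,
    }),
  }).then((r) => r.json());
  console.log("Chat model said:", gen.response);

  // Embedding model: request a vector for a short piece of text
  const emb = await fetch(`${OLLAMA}/api/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", prompt: "hello world" }),
  }).then((r) => r.json());
  console.log("Embedding dimensions:", emb.embedding.length); // nomic-embed-text produces 768-dimensional vectors
}

main().catch(console.error);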
Creating the Document Ingestion Workflow
To populate our knowledge base, we'll create a workflow that ingests documents from Google Drive:
- Add triggers for file creation and updates in a specific Google Drive folder
- Fetch file metadata
- Download the file
- Extract text from the file
- Split the text into chunks (a conceptual sketch follows this list)
- Delete existing vectors for the file (if any)
- Insert new vectors into Qdrant
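The chunking step is handled by a text splitter node in n8n, but conceptually it amounts to something like the sketch below, which cuts the text into fixed-size pieces with a small overlap so neighbouring chunks share context (the sizes here are arbitrary examples, not the values used in the video):
// Sketch of fixed-size chunking with overlap; in the workflow a text splitter node does this for you
function splitIntoChunks(text, chunkSize = 1000, overlap = 200) {
  const chunks = [];
  // Advance by chunkSize - overlap so consecutive chunks share some context
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // the last chunk reached the end of the text
  }
  return chunks;
}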
Here's a crucial step often missed in RAG tutorials:
// Custom code to delete existing vectors before insertion
const { QdrantClient } = require("@qdrant/js-client-rest");

const client = new QdrantClient({
  url: "http://host.docker.internal:6333",
});

// The Google Drive file ID passed along by the previous node
const fileId = $input.all()[0].json.fileId;

// Find every point in the "documents" collection tagged with this file ID
// (scroll returns at most `limit` points per call; raise the limit or paginate for very large files)
const response = await client.scroll("documents", {
  filter: {
    must: [
      {
        key: "metadata.file_id",
        match: {
          value: fileId,
        },
      },
    ],
  },
  limit: 100,
});

const pointIds = response.points.map((point) => point.id);

// Remove the stale points so the re-ingested document does not create duplicates
if (pointIds.length > 0) {
  await client.delete("documents", {
    points: pointIds,
  });
}

return { pointIds };
This code ensures that we don't have duplicate vectors for updated documents, maintaining the integrity of our knowledge base.
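That delete filter only finds something to delete if the inserted points carry the file ID in their payload under metadata.file_id, which is why the workflow attaches a file_id metadata field before the Qdrant insert. Purely as an illustration of what that insert amounts to, here is a standalone sketch that embeds chunks with Ollama and upserts them with a matching payload layout (it assumes the documents collection already exists and mirrors the collection and field names used above):
// Illustrative sketch of the insert side: embed chunks and upsert them with metadata.file_id
// so the deletion code above can find them the next time the file changes.
const { QdrantClient } = require("@qdrant/js-client-rest");
const crypto = require("crypto");

const qdrant = new QdrantClient({ url: "http://host.docker.internal:6333" });
const OLLAMA = "http://host.docker.internal:11434";

// Get an embedding vector from the local nomic-embed-text model
async function embed(text) {
  const res = await fetch(`${OLLAMA}/api/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", prompt: text }),
  }).then((r) => r.json());
  return res.embedding;
}

async function upsertChunks(fileId, chunks) {
  const points = [];
  for (const [i, chunk] of chunks.entries()) {
    points.push({
      id: crypto.randomUUID(), // Qdrant accepts UUIDs or unsigned integers as point IDs
      vector: await embed(chunk),
      payload: {
        content: chunk,
        metadata: { file_id: fileId, chunk_index: i }, // matches the metadata.file_id filter above
      },
    });
  }
  // wait: true blocks until the points are persisted
  await qdrant.upsert("documents", { wait: true, points });
}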
Testing the RAG AI Agent
With everything set up, we can now test our locally hosted RAG AI agent:
- Save the workflow
- Open the chat widget
- Ask a question related to the ingested document
For example, if you've ingested a document about a company selling robotic pets, you might ask:
"What is the ad campaign focusing on?"
The agent should respond with relevant information extracted from the ingested document, demonstrating that it's successfully using the local LLM, vector database, and PostgreSQL for chat memory.
Extending the Local AI Stack
While this setup provides a solid foundation for local AI development, there are several ways to enhance and expand the infrastructure:
- Implement Redis for caching to improve response times (a minimal sketch follows this list)
- Replace vanilla PostgreSQL with a self-hosted Supabase instance for added features like authentication
- Develop a custom front-end interface for easier interaction with the AI agent
- Incorporate best practices for prompt engineering and LLM interactions
- Create template workflows for common AI tasks to accelerate development
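As an illustration of the first idea, a thin caching layer can sit in front of the LLM call and return a stored answer when the same prompt comes in again. This is only a sketch: it assumes a Redis container added alongside the stack and uses the redis npm package, neither of which ships with the starter kit:
// llm-cache.js - sketch of prompt-level caching with Redis (hypothetical addition to the stack)
const { createClient } = require("redis");
const crypto = require("crypto");

const redis = createClient({ url: "redis://localhost:6379" });

async function cachedGenerate(prompt) {
  if (!redis.isOpen) await redis.connect();

  // Key the cache on a hash of the prompt so long prompts stay within key-size limits
  const key = "llm:" + crypto.createHash("sha256").update(prompt).digest("hex");
  const cached = await redis.get(key);
  if (cached) return cached; // cache hit: skip the model call entirely

  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "llama2:latest", prompt, stream: false }),
  }).then((r) => r.json());

  // Cache the answer for an hour; tune the TTL to how often your knowledge base changes
  await redis.set(key, res.response, { EX: 3600 });
  return res.response;
}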
Conclusion
Setting up a local AI infrastructure using open-source tools like Ollama, Qdrant, and n8n opens up a world of possibilities for AI development and experimentation. By following this guide, you've established a powerful foundation that can be customized and expanded to suit your specific needs.
The ability to run advanced AI models locally not only provides greater control over your data and processes but also allows for faster iteration and development. As open-source models continue to improve, the gap between local and cloud-based AI solutions narrows, making self-hosted AI infrastructures an increasingly attractive option for businesses and individuals alike.
Remember to keep your local AI stack updated and secure, and don't hesitate to explore new models and tools as they become available. The field of AI is rapidly evolving, and maintaining a flexible, locally hosted infrastructure puts you in an excellent position to adapt to new developments and leverage cutting-edge AI capabilities.
By mastering the setup and management of your local AI tech stack, you're not just following a trend – you're positioning yourself at the forefront of the AI revolution, ready to harness the full potential of artificial intelligence on your own terms.
Article created from: https://youtu.be/V_0dNE-H2gw?si=Kao3v6mJRoziNMAr