Imagine you're a developer at a startup, tasked with building a custom AI chatbot that can answer questions about your company's data. You need it to run securely and privately, without sending sensitive data to third-party APIs. You also want a solution that's ready for production use – containerized, scalable, and easy to maintain. This is exactly the scenario that led to OllamaStack – a modern, production-ready boilerplate combining Ollama, LangChain, Next.js, and FastAPI for privacy-first AI applications. In this blog post, we'll explore what OllamaStack is, how it works, and how you can use and extend it to rapidly build your own AI chat applications.
OllamaStack is an open-source template that provides everything you need to get an AI chatbot application up and running. It's designed to be beginner-friendly yet packed with powerful features for serious development. Whether you're prototyping a new product or building an internal tool, OllamaStack aims to save you months of work by providing a solid foundation out of the box. Let's dive into why you might want to use OllamaStack for your next AI project.
Beyond the core advantages of privacy, local control, and a production-ready setup, OllamaStack comes pre-loaded with features that make a chat app shine. It supports real-time streaming responses, conversation history with memory, and dynamic tool integration to extend the AI's capabilities. For example, the chatbot can use a calculator tool for math or a text analyzer tool when needed – and you can add your own tools as well. And because it runs on Ollama (an open-source LLM runtime), you're not limited to one model – you can run multiple models (Llama 2, CodeLlama, Mistral, etc.) and choose the right one for the task.
High-level architecture of OllamaStack. The stack has a Next.js frontend, a FastAPI backend, and an Ollama server hosting the LLM. The backend leverages LangChain agents (via LangGraph) to orchestrate calls to the local LLM and integrate tool capabilities.
Under the hood, OllamaStack follows a clean three-tier architecture for separation of concerns and scalability. The frontend is a Next.js 14 application (React 18 + TypeScript) that provides a modern chat UI. It's styled with Tailwind CSS for rapid design and is fully responsive, so users can chat from desktop or mobile. The frontend communicates with the backend via REST API calls (using Axios on the client side).
The backend is a FastAPI server (Python 3) which acts as the brain of the application. When the frontend sends a user's query, the FastAPI backend handles it asynchronously and uses LangChain to manage the AI conversation logic. LangChain (with the help of a library called LangGraph) enables advanced agent behavior – the AI can maintain conversation context (memory), and invoke various tools to perform tasks. For example, out of the box the stack includes agent tools for math calculations and text analysis. The backend orchestrates these tools and the language model to produce a useful answer for the user.
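To make that concrete, here's a minimal sketch of how a tool-calling agent backed by a local Ollama model can be assembled with LangChain and LangGraph. It's illustrative only – the model name, example tool, and module choices are assumptions, not OllamaStack's actual source:
# Illustrative sketch – not OllamaStack's actual backend code
from langchain_core.tools import tool
from langchain_ollama import ChatOllama
from langgraph.prebuilt import create_react_agent

@tool
def calculator(expression: str) -> str:
    """Evaluate a simple arithmetic expression."""
    return str(eval(expression))  # demo only; never eval untrusted input in production

# Point LangChain at the local Ollama server (model name is an example;
# tool calling requires a model that supports it)
llm = ChatOllama(model="llama3.1", base_url="http://localhost:11434")

# Build an agent that can decide when to call the calculator tool
agent = create_react_agent(llm, [calculator])

result = agent.invoke({"messages": [("user", "What is 17 * 24?")]})
print(result["messages"][-1].content)
A minimal LangGraph agent wired to a local Ollama model and one example tool.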
The LLM (Large Language Model) itself runs locally via Ollama. Ollama is an engine for serving LLMs on your own hardware (like Llama 2 or other models), so you don't have to rely on cloud APIs. In OllamaStack, the LLM is hosted in a separate Docker container provided by Ollama, typically listening on port 11434. The FastAPI backend sends the user's prompt to the Ollama service (through the LangChain integration) and streams back the model's response. Because the model is local, responses come with low latency and your data never leaves your server. Ollama supports multiple model types – you can run a standard chatbot model, a code-focused model like CodeLlama, or others as needed. The system is designed so that swapping or adding models is straightforward.
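If you're curious what actually happens on port 11434, you can talk to the Ollama container directly. Here's a rough sketch using Ollama's REST API – it sends a prompt and prints the streamed tokens as they arrive (the model name is just an example):
# Rough sketch: calling the local Ollama server's REST API directly
import json
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Explain Docker in one sentence."},
    stream=True,
)

# Ollama streams one JSON object per line until "done" is true
for line in response.iter_lines():
    if not line:
        continue
    chunk = json.loads(line)
    print(chunk.get("response", ""), end="", flush=True)
    if chunk.get("done"):
        break
In OllamaStack you normally never call this API by hand – the LangChain integration handles it – but it's handy for sanity-checking the Ollama container in isolation.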
To summarize the flow: the user interacts with the Next.js frontend, which calls the FastAPI backend. The backend uses LangChain agents (with possible tool calls) and queries the Ollama LLM server to get a response. This response is then streamed back through the backend to the frontend, enabling a smooth, real-time chat experience. All three components are decoupled and communicate over well-defined interfaces (HTTP for frontend-backend, and Ollama's API for backend-LLM), making it easy to scale or modify parts independently.
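On the backend, the streaming piece can be as simple as an async endpoint that forwards tokens as the model produces them. The snippet below is a simplified sketch – the route name, request schema, and Docker service hostname are assumptions, and the real OllamaStack endpoint may look different:
# Simplified sketch of a streaming chat endpoint – not the actual OllamaStack route
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from langchain_ollama import ChatOllama

app = FastAPI()
# "ollama" is assumed to be the docker-compose service name for the LLM container
llm = ChatOllama(model="llama2", base_url="http://ollama:11434")

class ChatRequest(BaseModel):
    message: str

@app.post("/api/chat/stream")
async def chat_stream(req: ChatRequest):
    async def token_generator():
        # astream yields chunks as the model generates them
        async for chunk in llm.astream(req.message):
            yield chunk.content
    return StreamingResponse(token_generator(), media_type="text/plain")
The frontend can then consume this stream and render tokens as they arrive, which is what gives the chat its real-time feel.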
One of the goals of OllamaStack is to let developers get a working AI app running within minutes on their machine. The project provides a Docker-based development setup, so you don't have to install a bunch of dependencies manually. Here's how you can get started:
# 1. Clone the repository
git clone https://github.com/saki007ster/OllamaStack.git
cd OllamaStack
# 2. Start all services (frontend, backend, LLM) with Docker Compose
docker-compose up --build
# 3. Pull an LLM model for Ollama to use (e.g., Llama2)
docker exec ollamastack-ollama-1 ollama pull llama2
# 4. Your AI chat app is now running!
# Frontend UI: http://localhost:3000
# Backend API: http://localhost:8000
# API Docs (Swagger UI): http://localhost:8000/docs
Yes, it's that simple – clone the repo and run one command to spin up everything. Docker will fetch and build the necessary images (frontend, backend, and the Ollama runtime). In step 3, we use Ollama's CLI to download a local model. In this example we pulled Llama 2, but you could substitute any model supported by Ollama (for instance, ollama pull codellama to get a code-oriented model). After that, you can open your browser to http://localhost:3000 and start chatting with the AI!
During development, you might prefer running the frontend and backend in dev mode with hot-reload. OllamaStack supports that too: you can run npm run dev:frontend to start the Next.js dev server on port 3000, and npm run dev:backend to start the FastAPI server on port 8000. This way, you can edit code on either end and see changes immediately, which is great for iterative development. The backend also provides an interactive API docs UI (powered by FastAPI's automatic OpenAPI schema) at http://localhost:8000/docs – useful for testing endpoints or integrating with other services.
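And because the backend is just an HTTP API, you can script against it too. The route and payload below are hypothetical – check the Swagger UI at /docs for the real endpoints – but a quick smoke test from Python might look like this:
# Quick smoke test against the backend API
# NOTE: the /api/chat path and payload shape are hypothetical –
# consult http://localhost:8000/docs for the actual routes and schemas
import requests

resp = requests.post(
    "http://localhost:8000/api/chat",
    json={"message": "Hello! What tools can you use?"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())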
One of the best things about OllamaStack is how easy it is to make it your own. It's designed as a boilerplate that you can extend rather than a one-size-fits-all product. Here we'll look at a few ways you can customize and build on top of OllamaStack.
1. Theming and UI Customization: Want to match your company's branding or tweak the chat UI? OllamaStack uses a centralized theme configuration on the frontend. With just a few lines of code, you can create a new theme or modify colors, fonts, and more. For example, in frontend/lib/themes.ts you can define a custom theme:
// frontend/lib/themes.ts
export const themes = {
  yourBrand: {
    primary: '#FF5733',
    background: '#1E1E1E',
    // ... 50+ customizable properties
  }
};
Creating a new theme named "yourBrand" by setting brand colors and other style properties.
By adjusting the theme config, the entire application's look and feel can be changed to suit your needs – no need to dive deep into CSS. You can switch themes or extend the default theme, and because the frontend is built with Tailwind CSS and modular components, styling is both flexible and maintainable.
2. Adding New AI Tools (Agent Plugins): The backend's AI agent can be extended with custom tools to handle domain-specific tasks. OllamaStack comes with a few example tools (like a calculator for math, and a text analyzer), but you can easily add your own. Tools are just Python functions that the AI agent is allowed to call when appropriate. Here's how you could add a new tool:
# backend/app/tools/your_tool.py
from langchain.agents import tool

@tool
def your_custom_tool(input: str) -> str:
    """Your custom AI tool that echoes input in reverse"""
    return input[::-1]
Defining a simple custom tool in the backend. The @tool decorator registers the function so the LangChain agent can use it.
In this example, we define your_custom_tool, which simply reverses a string (a toy example, but it illustrates the idea). By adding this module and registering it with the agent, the AI can call your_custom_tool whenever it decides it's needed. In a real scenario, your tool could do anything – look up information in a database, call an external API, perform calculations, etc. This plugin-like architecture means you can give your AI "superpowers" beyond what the base LLM can do, all while keeping the logic and data processing under your control.
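Because the @tool decorator wraps the function in a LangChain tool object, you can unit-test it directly, and "registering" it is ultimately just a matter of including it in the list of tools handed to the agent. A hedged sketch (module paths follow the example above, not necessarily OllamaStack's actual layout):
# Illustrative: trying the tool on its own, then handing it to the agent
from app.tools.your_tool import your_custom_tool

print(your_custom_tool.invoke("hello world"))   # -> "dlrow olleh"

# Registration is just inclusion in the agent's tool list, e.g.:
# agent = create_react_agent(llm, [calculator, text_analyzer, your_custom_tool])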
3. Changing or Adding Models: Since OllamaStack runs a local LLM server, you're not tied to a single model. You can pull and run any number of models supported by Ollama (for instance, a smaller/faster model for quick responses and a larger one for more accurate answers). The system is configured to use one model at a time by default, but you could extend it to route queries to different models based on context. Simply pulling a new model (via ollama pull modelName) and updating the backend configuration to load it is usually all that's needed to switch. The stack's documentation provides guidance on multi-model setups and how to adjust memory and context settings for different model sizes.
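Where exactly the model name lives depends on the backend configuration; a common pattern (and only a sketch here – the variable names are assumptions, so check the project's .env and config module) is to read it from environment variables so it can be swapped per deployment:
# Sketch: reading the model name from the environment so it can be swapped
# without code changes. OLLAMA_MODEL and OLLAMA_BASE_URL are example names,
# not necessarily what OllamaStack uses.
import os
from langchain_ollama import ChatOllama

MODEL_NAME = os.environ.get("OLLAMA_MODEL", "llama2")
OLLAMA_URL = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")

llm = ChatOllama(model=MODEL_NAME, base_url=OLLAMA_URL)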
4. Other Extension Points: Virtually every part of OllamaStack can be customized. The modular project structure makes it easy to modify or replace components – for example, you could swap out the Next.js frontend for a custom UI framework, or replace FastAPI with another backend if you prefer, while still leveraging Ollama for LLM hosting. You can integrate a database to store chat histories or user profiles. The roadmap for the project even includes features like user authentication, persistent storage, and a plugin marketplace for sharing tools, which gives you an idea of how it can grow. Because it's open source (MIT licensed) and community-driven, you're free to bend it to your will. And if you build something cool, you can contribute back!
When it comes time to deploy your application, OllamaStack has you covered. Thanks to Docker, deploying the whole stack is straightforward on many platforms. You can run it on a single VM, on your own hardware, or use container services. Here are a few examples of how you could deploy:
# Deploy to AWS ECS using Terraform
terraform apply
# Deploy to Google Cloud Run
gcloud run deploy
# Deploy to a Kubernetes cluster
kubectl apply -f k8s/
One-command deployments to various platforms. The project provides config for AWS, GCP, and Kubernetes to jump-start production deployments.
The repository includes sample configurations (Terraform scripts, Kubernetes manifests, etc.) to facilitate cloud deployment. For instance, you can use the provided Terraform configuration to set up an AWS ECS cluster running the app, or use the included Kubernetes YAML to deploy on a K8s cluster. Since the app is split into containers (frontend, backend, LLM, plus an Nginx proxy for production), it's easy to scale horizontally by adding more backend instances or even load-balancing multiple LLM servers if needed. The stateless nature of the services (by default) means you can scale out without major complications – and you can introduce a database or persistent storage if your application requires remembering data between restarts.
Crucially, because everything is self-hosted, your deployment costs are predictable (no per-request or per-token fees to an AI API) – you just provision enough compute to run the model you need. For smaller models, this might even run on modest hardware like a single cloud instance or an on-premise server. And for heavier workloads, you can leverage cloud GPUs or larger machines; the stack will take advantage of them (for example, Ollama can use GPU acceleration if available).
What can you build with OllamaStack? The possibilities span many domains: an internal assistant that answers questions about your company's data, a coding helper built on a model like CodeLlama, or a domain-specific chatbot armed with custom tools.
These are just a few ideas – in practice, because you can change the model and tools, you can tailor the AI for pretty much any niche. We've seen early users experiment with everything from customer support bots to scientific research assistants. The beauty is that the same stack can be adapted to all these scenarios with just configuration and custom code, rather than re-building an app from scratch each time.
OllamaStack represents a fusion of cutting-edge AI technology with pragmatic, real-world software engineering. For developers and startup engineers, it provides a shortcut to launching powerful AI chat applications without sacrificing privacy or spending weeks on boilerplate code. With a few commands, you get a fully functioning app composed of a Next.js frontend, a FastAPI backend, and a locally hosted LLM – all orchestrated together to deliver a seamless chat experience.
What makes this even more exciting is the community and momentum behind the project. The documentation is comprehensive (covering quick start, architecture deep-dives, customization tutorials, and more), and the project is open to contributions. The roadmap hints at major upcoming features like user auth, database integration, multi-tenant support, and even a plugin marketplace for sharing extensions – meaning OllamaStack is continuously evolving to meet more needs.
If you've been dreaming about building your own ChatGPT-like app or an AI assistant tailored to your domain, now is a great time to give OllamaStack a try. It can dramatically accelerate your development while keeping you in full control of your AI's behavior and data. Check out the GitHub repository to get the code, and join the community discussions if you have ideas or need help. With OllamaStack, you can focus on what your AI should do, and let the boilerplate take care of the rest – enabling you to bring innovative, privacy-first AI solutions to life faster than ever. Happy hacking!