Local Models with Ollama¶
Run AI models locally with no API keys or internet connection required.
Why Local Models?¶
| Benefit | Description |
|---|---|
| 🔒 Privacy | All data stays on your machine |
| 💰 Cost | No API fees or usage limits |
| ⚡ Offline | Works without internet |
| 🧪 Experimentation | Test models freely |
Install Ollama¶
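On Linux, the upstream install script is the quickest route; on macOS and Windows, download the installer from ollama.com (see the Ollama docs for your platform):
# Linux
curl -fsSL https://ollama.com/install.sh | sh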
Start Ollama¶
ollama serve
Pull a Model¶
# Recommended models
ollama pull llama3.2 # General purpose
ollama pull codellama # Code-focused
ollama pull deepseek-coder # Coding specialist
ollama pull qwen2.5-coder # Code completion
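After pulling, you can sanity-check a model with a quick interactive session before pointing Perspt at it (type /bye to exit):
# Smoke test the model interactively
ollama run llama3.2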
Use with Perspt¶
# Chat mode
perspt chat --model llama3.2
# Agent mode
perspt agent --model codellama "Create a Python script"
Model Recommendations¶
| Task | Model | Notes |
|---|---|---|
| General chat | llama3.2 | Best all-around |
| Code generation | codellama | Good for agent mode |
| Code completion | qwen2.5-coder | Fast, accurate |
| Reasoning | deepseek-coder | Complex tasks |
Agent Mode with Local Models¶
Local models can power all SRBN tiers, with a few considerations:
# Use local for all tiers
perspt agent \
--architect-model deepseek-coder:33b \
--actuator-model codellama:13b \
--verifier-model llama3.2 \
--speculator-model llama3.2 \
"Create a web scraper"
Performance Note
Local models are slower than cloud APIs. For complex agent tasks, consider using a capable cloud model for the Architect tier.
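Also note that the size-tagged variants in the command above (codellama:13b, deepseek-coder:33b) are separate downloads from the default tags pulled earlier, so fetch them first:
ollama pull codellama:13b
ollama pull deepseek-coder:33b
To get more out of a local model on long agent tasks, you can also raise its context window with a standard Ollama Modelfile. A minimal sketch, using 8192 tokens and the name codellama-agent purely as examples:
# Modelfile: same weights, larger context window
FROM codellama:13b
PARAMETER num_ctx 8192
Build it with ollama create codellama-agent -f Modelfile and pass the new name to --actuator-model.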
Hybrid Approach¶
Use cloud for planning, local for execution:
perspt agent \
--architect-model gpt-5.2 \
--actuator-model codellama:13b \
"Build an API"
GPU Acceleration¶
For faster inference:
# Check GPU usage
ollama ps
# Most models auto-detect GPU
# For manual control:
OLLAMA_GPU_LAYERS=35 ollama serve
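On NVIDIA hardware you can also confirm the GPU is actually being used by watching utilization while a model answers a prompt (this assumes the NVIDIA driver utilities are installed):
# Watch GPU memory and utilization
watch -n 1 nvidia-smi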
Troubleshooting¶
Model not found:
ollama list # Show installed models
ollama pull <model> # Install missing model
Slow performance:
- Use smaller models (7B instead of 13B)
- Ensure GPU is being used
- Increase OLLAMA_NUM_PARALLEL (example below)
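For example, to let the server handle several requests at once (useful when multiple SRBN tiers query Ollama concurrently), set the variable before starting it; the value 4 here is just an illustration:
# Allow up to 4 parallel requests
OLLAMA_NUM_PARALLEL=4 ollama serve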
Connection refused:
# Ensure Ollama is running
ollama serve
# Check port (default 11434)
curl http://localhost:11434/api/tags
See Also¶
First Chat - Basic usage
Agent Mode Tutorial - Autonomous coding