Local Models with Ollama¶
Run AI models locally with no API keys or internet connection required.
Why Local Models?¶
| Benefit | Description |
|---|---|
| 🔒 Privacy | All data stays on your machine |
| 💰 Cost | No API fees or usage limits |
| ⚡ Offline | Works without internet |
| 🧪 Experimentation | Test models freely |
Install Ollama¶
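On Linux, the upstream install script is the quickest route; on macOS and Windows, download the installer from ollama.com (see the Ollama docs for your platform):
# Linux
curl -fsSL https://ollama.com/install.sh | sh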
Start Ollama¶
ollama serve
Pull a Model¶
# Recommended models
ollama pull llama3.2 # General purpose
ollama pull codellama # Code-focused
ollama pull deepseek-coder # Coding specialist
ollama pull qwen2.5-coder # Code completion
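After pulling, you can sanity-check a model with a quick interactive session before pointing Perspt at it (type /bye to exit):
# Smoke test the model interactively
ollama run llama3.2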
Use with Perspt¶
# Chat mode
perspt chat --model llama3.2
# Agent mode
perspt agent --model codellama "Create a Python script"
Model Recommendations¶
| Task | Model | Notes |
|---|---|---|
| General chat | llama3.2 | Best all-around |
| Code generation | codellama | Good for agent mode |
| Code completion | qwen2.5-coder | Fast, accurate |
| Reasoning | deepseek-coder | Complex tasks |
Agent Mode with Local Models¶
Local models can power all SRBN tiers, with a few considerations:
# Use local for all tiers
perspt agent \
--architect-model deepseek-coder:33b \
--actuator-model codellama:13b \
--verifier-model llama3.2 \
--speculator-model llama3.2 \
"Create a web scraper"
Performance Note
Local models are slower than cloud APIs. For complex agent tasks, consider using a capable cloud model for the Architect tier.
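Also note that the size-tagged variants in the command above (codellama:13b, deepseek-coder:33b) are separate downloads from the default tags pulled earlier, so fetch them first:
ollama pull codellama:13b
ollama pull deepseek-coder:33b
To get more out of a local model on long agent tasks, you can also raise its context window with a standard Ollama Modelfile. A minimal sketch, using 8192 tokens and the name codellama-agent purely as examples:
# Modelfile: same weights, larger context window
FROM codellama:13b
PARAMETER num_ctx 8192
Build it with ollama create codellama-agent -f Modelfile and pass the new name to --actuator-model.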
Hybrid Approach¶
Use cloud for planning, local for execution:
perspt agent \
--architect-model gpt-5.2 \
--actuator-model codellama:13b \
"Build an API"
GPU Acceleration¶
For faster inference:
# Check GPU usage
ollama ps
# Most models auto-detect GPU
# For manual control:
OLLAMA_GPU_LAYERS=35 ollama serve
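On NVIDIA hardware you can also confirm the GPU is actually being used by watching utilization while a model answers a prompt (this assumes the NVIDIA driver utilities are installed):
# Watch GPU memory and utilization
watch -n 1 nvidia-smi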
Troubleshooting¶
Model not found:
ollama list # Show installed models
ollama pull <model> # Install missing model
Slow performance:
- Use smaller models (7B instead of 13B)
- Ensure GPU is being used
- Increase OLLAMA_NUM_PARALLEL (example below)
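For example, to let the server handle several requests at once (useful when multiple SRBN tiers query Ollama concurrently), set the variable before starting it; the value 4 here is just an illustration:
# Allow up to 4 parallel requests
OLLAMA_NUM_PARALLEL=4 ollama serve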
Connection refused:
# Ensure Ollama is running
ollama serve
# Check port (default 11434)
curl http://localhost:11434/api/tags
See Also¶
First Chat - Basic usage
Agent Mode Tutorial - Autonomous coding