AI Providers¶
This comprehensive guide covers all AI providers supported by Perspt through the modern genai crate (v0.3.5): their latest capabilities, configuration options, and best practices for optimal performance.
Overview¶
Perspt leverages the unified genai crate to provide seamless access to multiple AI providers with consistent APIs and enhanced features:
OpenAI: Latest GPT models, including the reasoning models (o1-series), GPT-4.1, and optimized variants
Anthropic: Claude 3.5 family with constitutional AI and safety-focused design
Google: Gemini models (including Gemini 2.5 Pro) with multimodal capabilities and large context windows
Groq: Ultra-fast inference with Llama and Mixtral models
Cohere: Command R+ models optimized for business and RAG applications
XAI: Grok models with real-time web access and humor
Ollama: Local model hosting with privacy and offline capabilities
OpenAI¶
OpenAI provides cutting-edge language models including the latest reasoning capabilities through the genai crate integration.
Supported Models¶
Model | Context Length | Best For | Notes
---|---|---|---
gpt-4.1 | 128K tokens | Enhanced reasoning, latest capabilities | Most advanced GPT-4 variant (2025)
o1 | 128K tokens | Complex reasoning, problem solving | Advanced reasoning with step-by-step thinking
o1-mini | 128K tokens | Fast reasoning, coding tasks | Efficient reasoning model
o3-mini | 128K tokens | Latest reasoning capabilities | Newest reasoning model (2025)
gpt-4o | 128K tokens | Multimodal, fast performance | Optimized for speed and quality
gpt-4o-mini | 128K tokens | Fast, cost-effective (default) | Efficient version of GPT-4o
gpt-4-turbo | 128K tokens | Complex reasoning, analysis | Previous generation flagship
gpt-3.5-turbo | 16K tokens | Fast, cost-effective | Good for simple tasks
Configuration¶
Basic OpenAI configuration with the genai crate:
{
"provider_type": "openai",
"api_key": "sk-your-openai-api-key",
"default_model": "gpt-4o-mini",
"providers": {
"openai": "https://api.openai.com/v1"
}
}
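Save the configuration to a file and point Perspt at it with the --config flag (the same flag used in the Ollama examples later in this guide); the filename here is arbitrary:
# Run Perspt with a saved configuration file
perspt --config openai_config.json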
CLI Usage¶
# Use latest reasoning model
perspt --provider-type openai --model o1-mini
# Use fastest model (default)
perspt --provider-type openai --model gpt-4o-mini
# List all available OpenAI models
perspt --provider-type openai --list-models
Reasoning Model Features
O1-series models provide enhanced reasoning with visual feedback:
> Solve this logic puzzle: There are 5 houses in a row...
[Reasoning...] Let me work through this step by step:
1. Setting up the constraints...
2. Analyzing the color clues...
3. Cross-referencing with pet information...
[Streaming...] Based on my analysis, here's the solution...
Environment Variables
export OPENAI_API_KEY="sk-your-key-here"
export OPENAI_ORG_ID="org-your-org-id" # Optional
Best Practices¶
Model Selection: use the o1-series for complex reasoning and problem solving, gpt-4o-mini for simple queries to save costs, and gpt-4o when working with images.
Token Management: monitor usage in longer conversations, set appropriate max_tokens limits, and consider truncating conversation history.
Rate Limits: implement retry logic for rate-limit errors, and consider a higher-tier plan for increased limits.
Anthropic (Claude)¶
Anthropic’s Claude models excel at safety, reasoning, and nuanced understanding through constitutional AI principles.
Supported Models¶
Model | Context Length | Best For | Notes
---|---|---|---
claude-3-5-sonnet-20241022 | 200K tokens | Balanced performance, latest version | Recommended default
claude-3-5-sonnet-20240620 | 200K tokens | Previous Sonnet version | Stable and reliable
claude-3-5-haiku-20241022 | 200K tokens | Fast responses, cost-effective | Good for simple tasks
claude-3-opus-20240229 | 200K tokens | Most capable, complex reasoning | Highest quality responses
Configuration¶
{
"provider_type": "anthropic",
"api_key": "sk-ant-your-anthropic-key",
"default_model": "claude-3-5-sonnet-20241022",
"providers": {
"anthropic": "https://api.anthropic.com"
}
}
CLI Usage¶
# Use latest Claude model
perspt --provider-type anthropic --model claude-3-5-sonnet-20241022
# Use fastest Claude model
perspt --provider-type anthropic --model claude-3-5-haiku-20241022
# List available Anthropic models
perspt --provider-type anthropic --list-models
Environment Variables
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
Best Practices¶
Model Selection: use claude-3-opus for complex analysis and creative work, claude-3-5-sonnet for balanced general-purpose tasks, and claude-3-5-haiku for quick questions and simple tasks.
Prompt Engineering: Claude responds well to clear, structured prompts; use explicit instructions and examples to leverage its strong reasoning capabilities. A system message describing the assistant's role also steers Claude effectively.
Long Conversations: take advantage of the 200K-token context window to maintain conversation flow without frequent truncation.
Google AI (Gemini)¶
Google’s Gemini models offer multimodal capabilities and large context windows with competitive performance.
Supported Models¶
Model | Context Length | Best For | Notes
---|---|---|---
gemini-2.0-flash-exp | 1M tokens | Latest experimental model | Cutting-edge capabilities (2025)
gemini-1.5-pro | 2M tokens | Large documents, complex analysis | Largest context window
gemini-1.5-flash | 1M tokens | Fast responses, good balance | Recommended default
gemini-pro | 32K tokens | General purpose tasks | Stable and reliable
Configuration¶
{
"provider_type": "google",
"api_key": "your-google-api-key",
"default_model": "gemini-1.5-flash",
"providers": {
"google": "https://generativelanguage.googleapis.com"
}
}
CLI Usage¶
# Use latest Gemini model
perspt --provider-type google --model gemini-2.0-flash-exp
# Use model with largest context
perspt --provider-type google --model gemini-1.5-pro
# List available Google models
perspt --provider-type google --list-models
Environment Variables
export GOOGLE_API_KEY="your-key-here"
# or
export GEMINI_API_KEY="your-key-here"
Best Practices¶
Safety Settings: configure safety levels appropriate to your use case; creative tasks may warrant more permissive settings.
Multimodal Usage: use Gemini's vision capabilities for image analysis and understanding, and combine text and images for richer interactions.
"User-Agent": "Perspt/1.0"
}
}
Local Models¶
Perspt supports various local inference solutions for privacy and offline usage.
Ollama¶
Ollama setup, model recommendations, and the current configuration format are covered in detail in the Ollama (Local Models) section later in this guide.
LM Studio¶
LM Studio serves models through an OpenAI-compatible endpoint, so point the openai provider type at the local server:
{
  "provider_type": "openai",
  "default_model": "local-model",
  "providers": {
    "openai": "http://localhost:1234/v1"
  },
  "api_key": "not-required"
}
OpenAI-Compatible Servers¶
Other OpenAI-compatible local servers (such as vLLM or the llama.cpp server) are configured the same way:
{
  "provider_type": "openai",
  "default_model": "local-model-name",
  "providers": {
    "openai": "http://localhost:8000/v1"
  },
  "api_key": "not-required"
}
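Before pointing Perspt at a local server, confirm the endpoint actually speaks the OpenAI API; virtually all compatible servers expose the standard model-listing route:
# A compatible server should return a JSON list of models
curl http://localhost:8000/v1/models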
Provider Comparison¶
Provider | Speed | Quality | Cost | Privacy | Context | Multimodal
---|---|---|---|---|---|---
OpenAI | Fast | Excellent | Medium | Cloud | 128K | Yes
Anthropic | Medium | Excellent | Medium | Cloud | 200K | No
Google AI | Fast | Very Good | Low | Cloud | Up to 2M | Yes
Groq | Ultra-Fast | Excellent | Low | Cloud | Up to 128K | No
Local (Ollama) | Variable | Good | Free | Local | Variable | Limited
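These ratings vary with workload and network conditions, so measure on your own prompts. Here is a rough latency comparison using the same piping pattern as the Ollama examples later in this guide (it assumes the relevant API keys are already exported):
# Compare rough end-to-end latency across cloud providers
for p in openai anthropic google; do
    echo "== $p =="
    time echo "What is 2+2?" | perspt --provider-type "$p"
done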
Multi-Provider Setup¶
Configure multiple providers for different use cases:
{
"providers": {
"primary": {
"provider": "openai",
"model": "gpt-4-turbo",
"api_key": "your-openai-key"
},
"coding": {
"provider": "anthropic",
"model": "claude-3-opus-20240229",
"api_key": "your-anthropic-key"
},
"local": {
"provider": "ollama",
"model": "codellama:7b",
"base_url": "http://localhost:11434"
}
},
"default_provider": "primary"
}
Switch between providers during conversation:
> /provider coding
Switched to coding provider (Claude-3 Opus)
> /provider local
Switched to local provider (CodeLlama)
Fallback Configuration¶
Set up automatic fallbacks. With the chain below, a request that hits a rate limit, API error, or timeout with OpenAI is retried with Claude, then with the local Llama model:
{
"fallback_chain": [
{
"provider": "openai",
"model": "gpt-4-turbo"
},
{
"provider": "anthropic",
"model": "claude-3-sonnet-20240229"
},
{
"provider": "ollama",
"model": "llama2:7b"
}
],
"fallback_conditions": [
"rate_limit_exceeded",
"api_error",
"timeout"
]
}
Troubleshooting¶
Common Issues¶
API Key Issues:
> /validate-key
Checking API key validity...
✓ OpenAI key: Valid
✗ Anthropic key: Invalid or expired
Connection Problems:
# Test connectivity
curl -H "Authorization: Bearer your-api-key" \
  https://api.openai.com/v1/models
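The equivalent connectivity checks for Anthropic and Google use their own authentication conventions (header names and endpoints as documented by each vendor at the time of writing):
# Anthropic: key goes in the x-api-key header; a version header is required
curl -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  https://api.anthropic.com/v1/models

# Google: the key is passed as a query parameter
curl "https://generativelanguage.googleapis.com/v1beta/models?key=$GOOGLE_API_KEY"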
Rate Limiting:
{
"rate_limiting": {
"requests_per_minute": 60,
"tokens_per_minute": 40000,
"retry_strategy": "exponential_backoff",
"max_retries": 3
}
}
Performance Optimization¶
Request Optimization:
{
"optimization": {
"batch_requests": true,
"compress_requests": true,
"connection_pooling": true,
"timeout": 30
}
}
Caching:
{
"cache": {
"enabled": true,
"provider_specific": true,
"ttl": 3600,
"max_size": "100MB"
}
}
Groq¶
Groq provides ultra-fast inference speeds with popular open-source models, optimized for real-time conversations.
Supported Models¶
Model | Context Length | Best For | Notes
---|---|---|---
llama-3.1-405b-reasoning | 128K tokens | Complex reasoning, analysis | Largest Llama model
llama-3.1-70b-versatile | 128K tokens | Balanced performance | Good general purpose model
llama-3.1-8b-instant | 128K tokens | Ultra-fast responses | Best for speed
mixtral-8x7b-32768 | 32K tokens | Mixture of experts | Strong coding capabilities
Configuration¶
{
"provider_type": "groq",
"api_key": "your-groq-api-key",
"default_model": "llama-3.1-70b-versatile",
"providers": {
"groq": "https://api.groq.com/openai/v1"
}
}
CLI Usage¶
# Ultra-fast responses
perspt --provider-type groq --model llama-3.1-8b-instant
# Balanced performance
perspt --provider-type groq --model llama-3.1-70b-versatile
Environment Variables
export GROQ_API_KEY="your-key-here"
Cohere¶
Cohere specializes in enterprise-focused models with strong RAG (Retrieval-Augmented Generation) capabilities.
Supported Models¶
Model | Context Length | Best For | Notes
---|---|---|---
command-r-plus | 128K tokens | RAG, business applications | Most capable Cohere model
command-r | 128K tokens | General purpose, fast | Good balance of speed and quality
command | 4K tokens | Simple tasks, cost-effective | Basic model
Configuration¶
{
"provider_type": "cohere",
"api_key": "your-cohere-api-key",
"default_model": "command-r-plus",
"providers": {
"cohere": "https://api.cohere.ai"
}
}
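CLI Usage¶
The same CLI flags shown for the other providers apply here:
# Use the RAG-optimized flagship model
perspt --provider-type cohere --model command-r-plus
# List available Cohere models
perspt --provider-type cohere --list-models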
Environment Variables
export COHERE_API_KEY="your-key-here"
XAI (Grok)¶
XAI’s Grok models provide real-time web access and are known for their humor and current knowledge.
Supported Models¶
Model | Context Length | Best For | Notes
---|---|---|---
grok-beta | 128K tokens | Current events, humor | Latest Grok model
grok-vision-beta | 128K tokens | Multimodal analysis | Image understanding
Configuration¶
{
"provider_type": "xai",
"api_key": "your-xai-api-key",
"default_model": "grok-beta",
"providers": {
"xai": "https://api.x.ai/v1"
}
}
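CLI Usage¶
The same CLI flags shown for the other providers apply here as well:
# Chat with the latest Grok model
perspt --provider-type xai --model grok-beta
# List available XAI models
perspt --provider-type xai --list-models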
Environment Variables
export XAI_API_KEY="your-key-here"
Ollama (Local Models)¶
Ollama provides local model hosting for privacy, offline usage, and cost control through the genai crate integration. It is perfect for testing, development, and privacy-conscious users.
Supported Models¶
Popular models available through Ollama:
Model | Size | RAM Required | Best Use Cases
---|---|---|---
llama3.2 | 3B | ~4GB | General chat, quick responses, testing
llama3.1:8b | 8B | ~8GB | Better reasoning, longer conversations
llama3.1:70b | 70B | ~40GB | Complex reasoning, professional tasks
codellama | 7B | ~7GB | Code generation, debugging, technical docs
mistral | 7B | ~7GB | Balanced performance, multilingual
phi3 | 3.8B | ~4GB | Efficient, resource-constrained systems
qwen2.5:7b | 7B | ~7GB | Strong reasoning, mathematics
# Large models (requires significant RAM)
llama3.1:70b # Most capable local model
qwen2.5:72b # Alibaba's flagship model
# Medium models (good balance)
llama3.1:8b # Recommended for most users
mistral-nemo:12b # Mistral's latest
codellama # Specialized for coding
# Small models (fast, low resource)
llama3.2 # Latest efficient model (default)
phi3 # Microsoft's compact model
qwen2.5:7b # Compact but capable
Setup and Configuration¶
Install Ollama:
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
Download Models:
# Download recommended starter models
ollama pull llama3.2 # General purpose (3B)
ollama pull codellama # Code assistance (7B)
ollama pull mistral # Balanced performance (7B)
# Optional: Download larger models if you have RAM
ollama pull llama3.1:8b # Better reasoning (8B)
ollama pull qwen2.5:7b # Strong at math/logic (7B)
# Check what's available
ollama list
Start Ollama Service:
# Start the service (runs on http://localhost:11434)
ollama serve
# Or run in background
nohup ollama serve > ollama.log 2>&1 &
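If you installed Ollama with the Linux install script, it typically registers a systemd service, so you can manage it that way instead:
# Check (or start) the Ollama systemd service on Linux installs
sudo systemctl status ollama
sudo systemctl start ollama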
Configure Perspt:
{
"provider_type": "ollama",
"default_model": "llama3.2",
"providers": {
"ollama": "http://localhost:11434/v1"
},
"api_key": "not-required"
}
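If Perspt can't reach Ollama, verify the service is up using Ollama's native API, which lists locally installed models:
# Verify Ollama is running and see which models are installed
curl http://localhost:11434/api/tags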
CLI Usage¶
# Basic usage (no API key needed!)
perspt --provider-type ollama --model llama3.2
# Use specific models for different tasks
perspt --provider-type ollama --model codellama # For coding
perspt --provider-type ollama --model mistral # General purpose
perspt --provider-type ollama --model llama3.1:8b # Better reasoning
# List installed Ollama models
perspt --provider-type ollama --list-models
# Test connection and performance
perspt --provider-type ollama --model llama3.2 --config ollama_config.json
Testing Different Models
# Quick test with small model
echo "Explain quantum computing in simple terms" | \
perspt --provider-type ollama --model llama3.2
# Coding test with Code Llama
echo "Write a Python function to sort a list" | \
perspt --provider-type ollama --model codellama
# Reasoning test with larger model
echo "Solve this logic puzzle: ..." | \
perspt --provider-type ollama --model llama3.1:8b
Performance Monitoring
# Monitor resource usage
htop # Check CPU/Memory while running
# Time responses
time perspt --provider-type ollama --model llama3.2
# Compare model speeds
for model in llama3.2 mistral codellama; do
echo "Testing $model..."
time echo "What is 2+2?" | perspt --provider-type ollama --model $model
done
Benefits of Local Models
Privacy: Data stays on your machine
Offline Usage: No internet required after setup
Cost Control: No per-token charges
Customization: Fine-tune models for specific tasks
Environment Variables
export OLLAMA_HOST="http://localhost:11434"
Next Steps¶
Troubleshooting - Detailed troubleshooting for provider-specific issues
Advanced Features - Advanced features that work with different providers
Configuration Guide - Complete configuration reference
Extending Perspt - Create custom provider integrations