AI Providers

This comprehensive guide covers all AI providers supported in Perspt via the modern genai crate (v0.3.5): their latest capabilities, configuration options, and best practices for optimal performance.

Overview

Perspt leverages the unified genai crate to provide seamless access to multiple AI providers with consistent APIs and enhanced features:

OpenAI

Latest GPT models including reasoning models (o1-series), GPT-4.1, and optimized variants

Anthropic

Claude 3.5 family with constitutional AI and safety-focused design

Google AI

Gemini 2.5 Pro and multimodal capabilities with large context windows

Groq

Ultra-fast inference with Llama and Mixtral models

Cohere

Command R+ models optimized for business and RAG applications

XAI

Grok models with real-time web access, humor, and strong conversational reasoning

Ollama

Local model hosting with privacy and offline capabilities
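
All providers are driven through the same CLI pattern; the sections below give the provider-specific names:

# General pattern used throughout this guide
perspt --provider-type <provider> --model <model>

# Discover the models available for any configured provider
perspt --provider-type <provider> --list-models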

OpenAI

OpenAI provides cutting-edge language models including the latest reasoning capabilities through the genai crate integration.

Supported Models

Model            Context Length   Best For                                   Notes
---------------  ---------------  -----------------------------------------  ---------------------------------------------
gpt-4.1          128K tokens      Enhanced reasoning, latest capabilities    Most advanced GPT-4 variant (2025)
o1-preview       128K tokens      Complex reasoning, problem solving         Advanced reasoning with step-by-step thinking
o1-mini          128K tokens      Fast reasoning, coding tasks               Efficient reasoning model
o3-mini          128K tokens      Latest reasoning capabilities              Newest reasoning model (2025)
gpt-4o           128K tokens      Multimodal, fast performance               Optimized for speed and quality
gpt-4o-mini      128K tokens      Fast, cost-effective (default)             Efficient version of GPT-4o
gpt-4-turbo      128K tokens      Complex reasoning, analysis                Previous generation flagship
gpt-3.5-turbo    16K tokens       Fast, cost-effective                       Good for simple tasks

Configuration

Basic OpenAI configuration with genai crate:

{
  "provider_type": "openai",
  "api_key": "sk-your-openai-api-key",
  "default_model": "gpt-4o-mini",
  "providers": {
    "openai": "https://api.openai.com/v1"
  }
}

CLI Usage

# Use latest reasoning model
perspt --provider-type openai --model o1-mini

# Use fastest model (default)
perspt --provider-type openai --model gpt-4o-mini

# List all available OpenAI models
perspt --provider-type openai --list-models

Reasoning Model Features

o1-series models reason step by step, and Perspt surfaces their progress as visual feedback:

> Solve this logic puzzle: There are 5 houses in a row...

[Reasoning...] Let me work through this step by step:
1. Setting up the constraints...
2. Analyzing the color clues...
3. Cross-referencing with pet information...
[Streaming...] Based on my analysis, here's the solution...

Environment Variables

export OPENAI_API_KEY="sk-your-key-here"
export OPENAI_ORG_ID="org-your-org-id"  # Optional
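
A minimal quick start, assuming Perspt falls back to the OPENAI_API_KEY environment variable when no api_key is set in the config:

# Assumed behavior: key is picked up from the environment
export OPENAI_API_KEY="sk-your-key-here"
perspt --provider-type openai --model gpt-4o-mini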

Anthropic (Claude)

Anthropic’s Claude models excel at safety, reasoning, and nuanced understanding through constitutional AI principles.

Supported Models

Model                       Context Length   Best For                               Notes
--------------------------  ---------------  -------------------------------------  ------------------------------------
claude-3-5-sonnet-20241022  200K tokens      Balanced performance, latest version   Recommended default
claude-3-5-sonnet-20240620  200K tokens      Previous Sonnet version                Stable and reliable
claude-3-5-haiku-20241022   200K tokens      Fast responses, cost-effective         Good for simple tasks
claude-3-opus-20240229      200K tokens      Most capable, complex reasoning        Highest quality responses
claude-3-sonnet-20240229    200K tokens      Balanced performance/speed             Previous generation
claude-3-haiku-20240307     200K tokens      Fast responses, simple tasks           Previous generation, cost-effective
claude-2.1                  200K tokens      Legacy support                         Deprecated, use Claude 3

Configuration

{
  "provider_type": "anthropic",
  "api_key": "sk-ant-your-anthropic-key",
  "default_model": "claude-3-5-sonnet-20241022",
  "providers": {
    "anthropic": "https://api.anthropic.com"
  }
}

CLI Usage

# Use latest Claude model
perspt --provider-type anthropic --model claude-3-5-sonnet-20241022

# Use fastest Claude model
perspt --provider-type anthropic --model claude-3-5-haiku-20241022

# List available Anthropic models
perspt --provider-type anthropic --list-models

Environment Variables

export ANTHROPIC_API_KEY="sk-ant-your-key-here"

Google AI (Gemini)

Google’s Gemini models offer multimodal capabilities and large context windows with competitive performance.

Supported Models

Model                 Context Length   Best For                             Notes
--------------------  ---------------  -----------------------------------  --------------------------------
gemini-2.5-pro        2M tokens        Advanced reasoning, analysis         Latest and most capable (2025)
gemini-2.0-flash      1M tokens        Fast, efficient performance          Optimized for speed
gemini-2.0-flash-exp  1M tokens        Experimental features                Cutting-edge capabilities
gemini-1.5-pro        2M tokens        Large documents, complex analysis    Very large context window
gemini-1.5-flash      1M tokens        Fast responses, good balance         Recommended default
gemini-pro            32K tokens       General purpose tasks                Legacy model
gemini-pro-vision     16K tokens       Multimodal tasks                     Supports images and text

Configuration

{
  "provider_type": "google",
  "api_key": "your-google-api-key",
  "default_model": "gemini-1.5-flash",
  "providers": {
    "google": "https://generativelanguage.googleapis.com"
  }
}

CLI Usage

# Use latest Gemini model
perspt --provider-type google --model gemini-2.0-flash-exp

# Use model with largest context
perspt --provider-type google --model gemini-1.5-pro

# List available Google models
perspt --provider-type google --list-models

Environment Variables

export GOOGLE_API_KEY="your-key-here"
# or
export GEMINI_API_KEY="your-key-here"
    "User-Agent": "Perspt/1.0"
  }
}

Best Practices

OpenAI:

  1. Model Selection: - Use o1-series models for complex reasoning tasks - Use gpt-4o-mini for simple queries to save costs - Use gpt-4o when working with images

  2. Token Management: - Monitor usage in longer conversations - Set appropriate max_tokens limits - Consider truncating conversation history

  3. Rate Limits: - Implement retry logic for rate-limit errors (see Rate Limiting under Troubleshooting) - Consider a higher-tier plan for increased limits

Anthropic:

  1. Model Selection: - Use claude-3-opus for complex analysis and creative work - Use claude-3-5-sonnet for balanced general-purpose tasks - Use claude-3-5-haiku for quick questions and simple tasks

  2. Prompt Engineering: - Claude responds well to clear, structured prompts - Use explicit instructions and examples - Leverage Claude’s strong reasoning capabilities

  3. Long Conversations: - Take advantage of the 200K-token context window - Maintain conversation flow without frequent truncation

  4. System Messages: - A system message can focus Claude on a domain:

{
  "provider_type": "anthropic",
  "default_model": "claude-3-5-sonnet-20241022",
  "system_message": "You are a helpful assistant specialized in software development. Provide detailed, accurate responses with code examples when appropriate."
}

  5. Content Filtering: - Optional content filtering can be tuned:

{
  "provider_type": "anthropic",
  "content_filtering": {
    "enabled": true,
    "strictness": "moderate"
  }
}

Google AI:

  1. Model Selection: - Use gemini-2.5-pro for advanced reasoning over very long inputs - Use gemini-2.0-flash or gemini-1.5-flash when speed matters - Use gemini-pro-vision for image understanding

  2. Safety Settings: - Configure safety levels appropriate to your use case - Consider more permissive settings for creative tasks:

{
  "provider_type": "google",
  "default_model": "gemini-1.5-flash",
  "safety_settings": {
    "harassment": "BLOCK_MEDIUM_AND_ABOVE",
    "hate_speech": "BLOCK_MEDIUM_AND_ABOVE",
    "sexually_explicit": "BLOCK_MEDIUM_AND_ABOVE",
    "dangerous_content": "BLOCK_MEDIUM_AND_ABOVE"
  },
  "generation_config": {
    "temperature": 0.7,
    "top_p": 1.0,
    "top_k": 40,
    "max_output_tokens": 4000
  }
}

  3. Multimodal Usage: - Use a vision-capable Gemini model for image analysis and understanding - Combine text and images for richer interactions:

{
  "provider_type": "google",
  "default_model": "gemini-pro-vision",
  "multimodal": {
    "enabled": true,
    "supported_formats": ["png", "jpg", "jpeg", "webp", "gif"],
    "max_image_size": "20MB"
  }
}

Local Models

Perspt supports several local inference solutions for privacy and offline usage. Ollama is covered in depth in its own section below; LM Studio and other OpenAI-compatible servers are described here.

LM Studio

Configuration for LM Studio:

{
  "provider": "lm_studio",
  "base_url": "http://localhost:1234/v1",
  "model": "local-model",
  "stream": true,
  "context_length": 4096,
  "gpu_layers": 35
}
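
Assuming the JSON above is saved as lm_studio.json (a hypothetical file name), it can be loaded with the --config flag used elsewhere in this guide:

# Launch Perspt against LM Studio's local server
perspt --config lm_studio.json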

OpenAI-Compatible Servers

For other OpenAI-compatible local servers:

{
  "provider": "openai_compatible",
  "base_url": "http://localhost:8000/v1",
  "api_key": "not-needed",
  "model": "local-model-name",
  "stream": true
}
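
As one example of such a server (assuming vLLM is installed; any server that exposes the OpenAI /v1 API works, and the model name here is just an illustration), a local endpoint matching the config above could be started with:

# Starts an OpenAI-compatible server on http://localhost:8000/v1
python -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mistral-7B-Instruct-v0.2 --port 8000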

Provider Comparison

Provider        Speed        Quality     Cost     Privacy   Context      Multimodal
--------------  -----------  ----------  -------  --------  -----------  ----------
OpenAI          Fast         Excellent   Medium   Cloud     128K         Yes
Anthropic       Medium       Excellent   Medium   Cloud     200K         No
Google AI       Fast         Very Good   Low      Cloud     up to 2M     Yes
Groq            Ultra-Fast   Excellent   Low      Cloud     up to 128K   No
Local (Ollama)  Variable     Good        Free     Local     Variable     Limited

Multi-Provider Setup

Configure multiple providers for different use cases:

{
  "providers": {
    "primary": {
      "provider": "openai",
      "model": "gpt-4-turbo",
      "api_key": "your-openai-key"
    },
    "coding": {
      "provider": "anthropic",
      "model": "claude-3-opus-20240229",
      "api_key": "your-anthropic-key"
    },
    "local": {
      "provider": "ollama",
      "model": "codellama:7b",
      "base_url": "http://localhost:11434"
    }
  },
  "default_provider": "primary"
}

Switch between providers during conversation:

> /provider coding
Switched to coding provider (Claude-3 Opus)

> /provider local
Switched to local provider (CodeLlama)

Fallback Configuration

Set up automatic fallbacks:

{
  "fallback_chain": [
    {
      "provider": "openai",
      "model": "gpt-4-turbo"
    },
    {
      "provider": "anthropic",
      "model": "claude-3-sonnet-20240229"
    },
    {
      "provider": "ollama",
      "model": "llama2:7b"
    }
  ],
  "fallback_conditions": [
    "rate_limit_exceeded",
    "api_error",
    "timeout"
  ]
}

Troubleshooting

Common Issues

API Key Issues:

> /validate-key
Checking API key validity...
✓ OpenAI key: Valid
✗ Anthropic key: Invalid or expired

Connection Problems:

# Test connectivity
curl -H "Authorization: Bearer your-api-key" \\
     https://api.openai.com/v1/models
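
Anthropic uses different auth headers; a comparable check (using the same API version string as the configuration examples above):

curl -H "x-api-key: $ANTHROPIC_API_KEY" \
     -H "anthropic-version: 2023-06-01" \
     https://api.anthropic.com/v1/models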

Rate Limiting:

{
  "rate_limiting": {
    "requests_per_minute": 60,
    "tokens_per_minute": 40000,
    "retry_strategy": "exponential_backoff",
    "max_retries": 3
  }
}

Performance Optimization

Request Optimization:

{
  "optimization": {
    "batch_requests": true,
    "compress_requests": true,
    "connection_pooling": true,
    "timeout": 30
  }
}

Caching:

{
  "cache": {
    "enabled": true,
    "provider_specific": true,
    "ttl": 3600,
    "max_size": "100MB"
  }
}

Groq

Groq provides ultra-fast inference speeds with popular open-source models, optimized for real-time conversations.

Supported Models

Model                     Context Length   Best For                      Notes
------------------------  ---------------  ----------------------------  ----------------------------
llama-3.1-405b-reasoning  128K tokens      Complex reasoning, analysis   Largest Llama model
llama-3.1-70b-versatile   128K tokens      Balanced performance          Good general-purpose model
llama-3.1-8b-instant      128K tokens      Ultra-fast responses          Best for speed
mixtral-8x7b-32768        32K tokens       Mixture of experts            Strong coding capabilities

Configuration

{
  "provider_type": "groq",
  "api_key": "your-groq-api-key",
  "default_model": "llama-3.1-70b-versatile",
  "providers": {
    "groq": "https://api.groq.com/openai/v1"
  }
}

CLI Usage

# Ultra-fast responses
perspt --provider-type groq --model llama-3.1-8b-instant

# Balanced performance
perspt --provider-type groq --model llama-3.1-70b-versatile

Environment Variables

export GROQ_API_KEY="your-key-here"
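
To see Groq's speed first-hand, the piped-input pattern used later in this guide works here too:

# Compare wall-clock time against another provider
time (echo "Summarize the difference between TCP and UDP" | \
  perspt --provider-type groq --model llama-3.1-8b-instant)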

Cohere

Cohere specializes in enterprise-focused models with strong RAG (Retrieval-Augmented Generation) capabilities.

Supported Models

Model           Context Length   Best For                       Notes
--------------  ---------------  -----------------------------  ----------------------------------
command-r-plus  128K tokens      RAG, business applications     Most capable Cohere model
command-r       128K tokens      General purpose, fast          Good balance of speed and quality
command         4K tokens        Simple tasks, cost-effective   Basic model

Configuration

{
  "provider_type": "cohere",
  "api_key": "your-cohere-api-key",
  "default_model": "command-r-plus",
  "providers": {
    "cohere": "https://api.cohere.ai"
  }
}
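
CLI Usage

Cohere follows the same CLI pattern as the other providers:

# Use the most capable Cohere model
perspt --provider-type cohere --model command-r-plus

# List available Cohere models
perspt --provider-type cohere --list-models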

Environment Variables

export COHERE_API_KEY="your-key-here"

XAI (Grok)

XAI’s Grok models provide real-time web access and are known for their humor and current knowledge.

Supported Models

Model             Context Length   Best For                Notes
----------------  ---------------  ----------------------  --------------------
grok-beta         128K tokens      Current events, humor   Latest Grok model
grok-vision-beta  128K tokens      Multimodal analysis     Image understanding

Configuration

{
  "provider_type": "xai",
  "api_key": "your-xai-api-key",
  "default_model": "grok-beta",
  "providers": {
    "xai": "https://api.x.ai/v1"
  }
}
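
CLI Usage

Grok models use the standard CLI pattern:

# Chat with Grok
perspt --provider-type xai --model grok-beta

# List available XAI models
perspt --provider-type xai --list-models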

Environment Variables

export XAI_API_KEY="your-key-here"

Ollama (Local Models)

Ollama provides local model hosting for privacy, offline usage, and cost control with the genai crate integration. Perfect for testing, development, and privacy-conscious users.

Supported Models

Popular models available through Ollama:

Model         Size   RAM Required   Best Use Cases
------------  -----  -------------  -------------------------------------------
llama3.2      3B     ~4GB           General chat, quick responses, testing
llama3.1:8b   8B     ~8GB           Better reasoning, longer conversations
llama3.1:70b  70B    ~40GB          Complex reasoning, professional tasks
codellama     7B     ~7GB           Code generation, debugging, technical docs
mistral       7B     ~7GB           Balanced performance, multilingual
phi3          3.8B   ~4GB           Efficient, resource-constrained systems
qwen2.5:7b    7B     ~7GB           Strong reasoning, mathematics

More options, grouped by resource requirements:

# Large models (requires significant RAM)
llama3.1:70b     # Most capable local model
qwen2.5:72b      # Alibaba's flagship model

# Medium models (good balance)
llama3.1:8b      # Recommended for most users
mistral-nemo:12b # Mistral's latest
codellama        # Specialized for coding

# Small models (fast, low resource)
llama3.2         # Latest efficient model (default)
phi3             # Microsoft's compact model
qwen2.5:7b       # Compact but capable

Setup and Configuration

  1. Install Ollama:

# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

  2. Download Models:

# Download recommended starter models
ollama pull llama3.2        # General purpose (3B)
ollama pull codellama       # Code assistance (7B)
ollama pull mistral         # Balanced performance (7B)

# Optional: Download larger models if you have RAM
ollama pull llama3.1:8b     # Better reasoning (8B)
ollama pull qwen2.5:7b      # Strong at math/logic (7B)

# Check what's available
ollama list

  3. Start Ollama Service:

# Start the service (runs on http://localhost:11434)
ollama serve

# Or run in background
nohup ollama serve > ollama.log 2>&1 &
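
Before configuring Perspt, verify that the service is reachable (Ollama's /api/tags endpoint lists installed models):

# Should return a JSON list of installed models
curl http://localhost:11434/api/tags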

  4. Configure Perspt:

{
  "provider_type": "ollama",
  "default_model": "llama3.2",
  "providers": {
    "ollama": "http://localhost:11434/v1"
  },
  "api_key": "not-required"
}

CLI Usage

# Basic usage (no API key needed!)
perspt --provider-type ollama --model llama3.2

# Use specific models for different tasks
perspt --provider-type ollama --model codellama    # For coding
perspt --provider-type ollama --model mistral      # General purpose
perspt --provider-type ollama --model llama3.1:8b  # Better reasoning

# List installed Ollama models
perspt --provider-type ollama --list-models

# Test connection and performance
perspt --provider-type ollama --model llama3.2 --config ollama_config.json

Testing Different Models

# Quick test with small model
echo "Explain quantum computing in simple terms" | \
perspt --provider-type ollama --model llama3.2

# Coding test with Code Llama
echo "Write a Python function to sort a list" | \
perspt --provider-type ollama --model codellama

# Reasoning test with larger model
echo "Solve this logic puzzle: ..." | \
perspt --provider-type ollama --model llama3.1:8b

Performance Monitoring

# Monitor resource usage
htop  # Check CPU/Memory while running

# Time responses
time perspt --provider-type ollama --model llama3.2

# Compare model speeds
for model in llama3.2 mistral codellama; do
  echo "Testing $model..."
  time echo "What is 2+2?" | perspt --provider-type ollama --model $model
done

Benefits of Local Models

  • Privacy: Data stays on your machine

  • Offline Usage: No internet required after setup

  • Cost Control: No per-token charges

  • Customization: Fine-tune or customize models for specific tasks (see the Modelfile sketch below)
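
As a minimal sketch of that last point, Ollama supports Modelfiles that derive a customized variant from a base model (the name my-assistant and the settings here are arbitrary examples):

# Modelfile
FROM llama3.2
PARAMETER temperature 0.3
SYSTEM "You are a concise assistant for software engineering questions."

# Build the custom model, then use it like any other
ollama create my-assistant -f Modelfile
perspt --provider-type ollama --model my-assistant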

Environment Variables

export OLLAMA_HOST="http://localhost:11434"