Basic Usage¶
This guide covers the fundamental usage patterns of Perspt, from starting your first conversation to understanding the CLI commands and streaming features powered by the modern genai crate.
Starting Perspt¶
Perspt is built on the genai crate (v0.3.5), which provides unified access to multiple LLM providers. You can start it with various configuration options:
Basic Usage
# Start with default configuration (OpenAI gpt-4o-mini)
perspt
Provider Selection
# Use Anthropic with Claude 3.5 Sonnet
perspt --provider-type anthropic --model claude-3-5-sonnet-20241022
# Use Google Gemini
perspt --provider-type google --model gemini-1.5-flash
# Use latest reasoning models
perspt --provider-type openai --model o1-mini
Configuration Files
# Use custom configuration file
perspt --config /path/to/your/config.json
# Override API key from command line
perspt --api-key your-api-key-here
Model Discovery
# List all available models for current provider
perspt --list-models
# List models for specific provider
perspt --provider-type anthropic --list-models
Your First Conversation¶
When Perspt starts, you’ll see a clean interface with model validation and streaming capabilities:
Perspt v0.4.0 - Performance LLM Chat CLI
Provider: OpenAI | Model: gpt-4o-mini | Status: Connected ✓
Enhanced streaming with genai crate v0.3.5
Type your message and press Enter to start a conversation.
Use Ctrl+C to exit gracefully.
>
Simply type your message or question and press Enter. Perspt will validate the model connection before starting:
> Hello, can you explain quantum computing?
Enhanced Streaming Experience
With the genai crate integration, responses stream in real-time with proper event handling:
Reasoning Models: See thinking process with reasoning chunks for o1-series models
Regular Models: Smooth token-by-token streaming for immediate feedback
Error Recovery: Robust error handling with terminal restoration
The AI maintains context throughout the session and provides rich, formatted responses with markdown support.
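Conceptually, the streaming loop consumes chunks as they arrive and renders them immediately. The following is a minimal sketch, not Perspt's actual source: the chunk stream is mocked with futures::stream::iter, whereas Perspt receives chunks from the genai crate's streaming response.

```rust
use futures::{stream, StreamExt};

#[tokio::main]
async fn main() {
    // Mock token chunks; in Perspt these come from the genai crate's
    // streaming response rather than a fixed iterator.
    let mut chunks = stream::iter(vec![
        Ok::<_, std::io::Error>("Quantum ".to_string()),
        Ok("computing ".to_string()),
        Ok("uses qubits...".to_string()),
    ]);

    // Render each chunk as soon as it arrives (token-by-token feedback).
    while let Some(chunk) = chunks.next().await {
        match chunk {
            Ok(text) => print!("{text}"),
            Err(e) => eprintln!("\n[stream error, terminal restored] {e}"),
        }
    }
    println!();
}
```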
CLI Arguments and Options¶
Perspt supports a comprehensive set of command-line arguments, all backed by the genai crate integration:
Core Arguments
# Configuration
perspt --config|-c FILE # Custom configuration file path
# Authentication
perspt --api-key|-k KEY # Override API key
# Model Selection
perspt --model|-m MODEL # Specific model name
perspt --provider-type|-p TYPE # Provider type
perspt --provider PROFILE # Provider profile from config
# Discovery
perspt --list-models|-l # List available models
Supported Provider Types
openai # OpenAI GPT models (default)
anthropic # Anthropic Claude models
google # Google Gemini models
groq # Groq ultra-fast inference
cohere # Cohere Command models
xai # XAI Grok models
deepseek # DeepSeek models
ollama # Local Ollama models
Example Usage Patterns
# Quick reasoning with o1-mini
perspt -p openai -m o1-mini
# Creative writing with Claude
perspt -p anthropic -m claude-3-5-sonnet-20241022
# Fast local inference
perspt -p ollama -m llama3.2
# Validate model before starting
perspt -p google -m gemini-2.0-flash-exp --list-models
Interactive Commands¶
Once in the chat interface, you can use keyboard shortcuts for efficient interaction:
Navigation Shortcuts
| Shortcut | Action |
|---|---|
| Enter | Send message (validated before transmission) |
| Ctrl+C | Exit gracefully with terminal restoration |
| ↑/↓ Keys | Scroll through chat history |
| Page Up/Down | Fast scroll through long conversations |
| Ctrl+L | Clear screen (preserves context) |
Input Management
Multi-line Input: Natural line breaks supported
Input Queuing: Type new messages while the AI responds (see the sketch after this list)
Context Preservation: Full conversation history maintained
Markdown Rendering: Rich text formatting in responses
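A rough mental model for input queuing is a producer/consumer channel: the UI task pushes typed messages onto a queue, and the response task drains them one at a time. A minimal sketch with tokio channels, illustrative only and not Perspt's actual event loop:

```rust
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::unbounded_channel::<String>();

    // Producer: in the real UI this would be the key-event handler;
    // here we queue two messages "typed" while a response streams.
    tx.send("first question".to_string()).unwrap();
    tx.send("follow-up typed during streaming".to_string()).unwrap();
    drop(tx); // close the channel so the loop below terminates

    // Consumer: dispatch each queued message only after the previous
    // response has finished streaming.
    while let Some(msg) = rx.recv().await {
        println!("dispatching: {msg}");
    }
}
```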
Managing Conversations¶
Enhanced Context Management¶
With the genai crate integration, Perspt provides superior context handling:
Context Awareness

- Full conversation history maintained per session
- Automatic context window management for each provider
- Smart truncation when approaching token limits (see the sketch below)
- Provider-specific optimizations
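The truncation step can be pictured as dropping the oldest turns once an estimated token count exceeds the model's context window. A minimal sketch, assuming a rough 4-characters-per-token heuristic (not Perspt's actual implementation):

```rust
/// One chat turn: (role, content).
type Turn = (String, String);

/// Rough token estimate: ~4 characters per token (a common heuristic).
fn estimate_tokens(text: &str) -> usize {
    text.len() / 4 + 1
}

/// Drop the oldest turns until the history fits in `max_tokens`.
fn truncate_history(history: &mut Vec<Turn>, max_tokens: usize) {
    let mut total: usize = history.iter().map(|(_, c)| estimate_tokens(c)).sum();
    while total > max_tokens && history.len() > 1 {
        let (_, removed) = history.remove(0); // oldest turn first
        total -= estimate_tokens(&removed);
    }
}

fn main() {
    let mut history = vec![
        ("user".to_string(), "a".repeat(4000)),
        ("assistant".to_string(), "b".repeat(4000)),
        ("user".to_string(), "recent question".to_string()),
    ];
    truncate_history(&mut history, 1200); // keeps only the newest turns
    assert_eq!(history.len(), 2);
}
```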
Streaming and Responsiveness

- Real-time token streaming for immediate feedback
- Reasoning chunk display for o1-series models
- Background processing while you type new queries
- Robust error recovery with terminal restoration
Example of enhanced conversation flow:
> I'm working on a Rust project with async/await
[Streaming...] I'd be happy to help with your Rust async project!
Rust's async/await provides excellent performance for concurrent operations...
> How do I handle multiple futures concurrently?
[Streaming...] For handling multiple futures concurrently in your Rust project,
you have several powerful options with tokio...
> Show me an example with tokio::join!
[Reasoning...] Let me provide a practical example using tokio::join!
for your async Rust project...
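The reply might then include a snippet along these lines (illustrative only; what a model actually generates will vary):

```rust
use tokio::time::{sleep, Duration};

async fn fetch_user() -> String {
    sleep(Duration::from_millis(100)).await; // simulate I/O latency
    "user data".to_string()
}

async fn fetch_posts() -> String {
    sleep(Duration::from_millis(150)).await; // simulate I/O latency
    "post data".to_string()
}

#[tokio::main]
async fn main() {
    // Both futures run concurrently; join! waits for all of them,
    // so the total wait is ~150ms rather than ~250ms sequentially.
    let (user, posts) = tokio::join!(fetch_user(), fetch_posts());
    println!("{user} / {posts}");
}
```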
Advanced Conversation Features
Input Queuing: Continue typing while AI generates responses
Context Preservation: Seamless topic transitions within sessions
Error Recovery: Automatic reconnection and state restoration
Model Validation: Pre-flight checks ensure model availability
Message Formatting and Rendering¶
Enhanced Markdown Support¶
Perspt includes a custom markdown parser optimized for terminal rendering:
Supported Formatting
**Bold text** and *italic text*
`inline code` and ```code blocks```
# Headers and ## Subheaders
- Bullet points
- With proper indentation
1. Numbered lists
2. With automatic formatting
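Under the hood, terminal rendering maps markup spans like these to ANSI escape sequences. Here is a toy sketch for bold spans only; Perspt's actual parser covers headers, lists, and code as well:

```rust
// Render **bold** markdown spans using ANSI escape codes.
// Toy illustration only, not Perspt's parser.
fn render_bold(input: &str) -> String {
    let mut out = String::new();
    let mut bold = false;
    let mut rest = input;
    while let Some(pos) = rest.find("**") {
        out.push_str(&rest[..pos]);
        // \x1b[1m enables bold; \x1b[0m resets attributes.
        out.push_str(if bold { "\x1b[0m" } else { "\x1b[1m" });
        bold = !bold;
        rest = &rest[pos + 2..];
    }
    out.push_str(rest);
    out
}

fn main() {
    println!("{}", render_bold("This is **important** text."));
}
```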
Code Block Rendering
Share code with syntax highlighting hints:
> Can you help optimize this Rust function?
```rust
async fn process_data(data: Vec<String>) -> Result<Vec<String>, Error> {
// Your code here
}
```
Long Message Handling
Automatic text wrapping for terminal width (see the sketch after this list)
Proper paragraph breaks and spacing
Smooth scrolling through long responses
Visual indicators for streaming progress
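The text wrapping itself amounts to breaking on word boundaries at the terminal width. A naive sketch follows; Perspt's real renderer reads the live terminal size and preserves markdown structure:

```rust
// Naive word wrap to a fixed width; illustrative only.
fn wrap(text: &str, width: usize) -> String {
    let mut out = String::new();
    let mut line_len = 0;
    for word in text.split_whitespace() {
        if line_len + word.len() + 1 > width && line_len > 0 {
            out.push('\n'); // start a new line before an overflowing word
            line_len = 0;
        } else if line_len > 0 {
            out.push(' ');
            line_len += 1;
        }
        out.push_str(word);
        line_len += word.len();
    }
    out
}

fn main() {
    println!("{}", wrap("Streaming responses are wrapped to fit the terminal width.", 24));
}
```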
Testing Local Models with Ollama¶
Ollama provides an excellent way to test local models without API keys or internet connectivity. This section walks through setting up and testing Ollama with Perspt.
Prerequisites¶
Install and Start Ollama
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.ai/install.sh | sh
# Start Ollama service
ollama serve
Download Test Models
# Download Llama 3.2 (3B) - fast and efficient
ollama pull llama3.2
# Download Code Llama - for coding tasks
ollama pull codellama
# Verify models are available
ollama list
Basic Ollama Testing¶
Start with Simple Conversations
# Test basic functionality
perspt --provider-type ollama --model llama3.2
Example conversation flow:
Perspt v0.4.0 - Performance LLM Chat CLI
Provider: Ollama | Model: llama3.2 | Status: Connected ✓
Local model hosting - no API key required
> Hello! Can you help me understand how LLMs work?
[Assistant responds with explanation of language models...]
> That's helpful! Now explain it like I'm 5 years old.
[Assistant provides simplified explanation...]
Test Different Model Types
# General conversation
perspt --provider-type ollama --model llama3.2
# Coding assistance
perspt --provider-type ollama --model codellama
# Larger model for complex tasks (if you have enough RAM)
perspt --provider-type ollama --model llama3.1:8b
Performance Testing¶
Model Comparison
Test different model sizes to find the right balance for your system:
| Model | Size | RAM Required | Best For |
|---|---|---|---|
| llama3.2 | 3B | ~4GB | Quick responses, chat |
| llama3.1:8b | 8B | ~8GB | Better reasoning, longer context |
| codellama | 7B | ~7GB | Code generation, technical tasks |
|  | 7B | ~7GB | Balanced performance |
Speed Testing
# Time how long responses take
time perspt --provider-type ollama --model llama3.2
# Compare with cloud providers
time perspt --provider-type openai --model gpt-4o-mini
Practical Test Scenarios
# Test 1: Basic Knowledge
> What is the capital of France?
# Test 2: Reasoning
> If a train travels 60 mph for 2.5 hours, how far does it go?
# Test 3: Creative Writing
> Write a short story about a robot learning to paint.
# Test 4: Code Generation (with codellama)
> Write a Python function to calculate fibonacci numbers.
Troubleshooting Ollama¶
Common Issues
# Check if Ollama is running
curl http://localhost:11434/api/tags
# If connection fails
ollama serve
# List available models
perspt --provider-type ollama --list-models
# Pull missing models
ollama pull llama3.2
Performance Issues
Slow responses: Try smaller models (llama3.2 vs llama3.1:8b)
Out of memory: Close other applications or use lighter models
Model not found: Ensure you’ve pulled the model with ollama pull
Configuration for Regular Use
Create a config file for easy Ollama usage:
{
"provider_type": "ollama",
"default_model": "llama3.2",
"providers": {
"ollama": "http://localhost:11434/v1"
},
"api_key": "not-required"
}
# Save as ollama_config.json and use
perspt --config ollama_config.json
Benefits of Local Testing
Privacy: All data stays on your machine
Cost: No API fees or usage limits
Offline: Works without internet after initial setup
Experimentation: Try different models and settings freely
Learning: Understand model capabilities and limitations
Best Practices for Effective Usage¶
Communication Strategies¶
Optimized for GenAI Crate Integration
Model-Specific Approaches:

- Reasoning Models (o1-series): Provide complex problems and let them work through the logic
- Fast Models (gpt-4o-mini, claude-3-haiku): Use for quick questions and iterations
- Large Context Models (claude-3-5-sonnet): Share entire codebases or documents
Provider Strengths:

- OpenAI: Latest reasoning capabilities, coding assistance
- Anthropic: Safety-focused, analytical reasoning, constitutional AI
- Google: Multimodal capabilities, large context windows
- Groq: Ultra-fast inference for real-time conversations
Effective Prompting Techniques
# Instead of vague requests:
> Help me with my code
# Be specific with context:
> I'm working on a Rust HTTP server using tokio and warp. The server
compiles but panics when handling concurrent requests. Here's the
relevant code: [paste code]. Can you help me identify the race condition?
Session Management Strategies
Single-Topic Sessions: Keep related discussions in one session for better context
Model Switching: Use perspt --list-models to explore optimal models for different tasks
Configuration Profiles: Set up different configs for work, creative, and development tasks
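For example, you can keep one file per task, reusing the same keys as the Ollama config shown earlier (file names here are arbitrary):

```json
{
  "provider_type": "openai",
  "default_model": "gpt-4o-mini",
  "api_key": "your-openai-key"
}
```

Save variants like this as work_config.json, creative_config.json, and so on, then switch between them with perspt --config <file>.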
Troubleshooting Common Issues¶
Connection and Model Issues¶
Model Validation Failures
# Check if model exists for provider
perspt --provider-type openai --list-models | grep o1-mini
# Test connection with basic model
perspt --provider-type openai --model gpt-3.5-turbo
API Key Problems
# Test API key directly
perspt --api-key your-key --provider-type openai --list-models
# Use environment variables (recommended)
export OPENAI_API_KEY="your-key"
perspt
Streaming Issues
If streaming responses seem slow or interrupted:
Network Check: Ensure stable internet connection
Provider Status: Check provider service status pages
Model Selection: Try faster models like gpt-4o-mini
Terminal Compatibility: Ensure terminal supports ANSI colors and UTF-8
Performance Optimization¶
Memory and Speed
Local Models: Use Ollama for privacy and reduced latency
Model Selection: Choose appropriate model size for your task
Context Management: Clear context for unrelated new topics
Cost Optimization
Model Tiers: Use cheaper models (gpt-3.5-turbo) for simple queries
Streaming Benefits: Stop generation early if you have enough information
Batch Questions: Ask related questions in single sessions to share context
Next Steps¶
Once you’re comfortable with basic usage:
Advanced Features: Learn about configuration profiles and system prompts in Advanced Features
Provider Deep-Dive: Explore specific provider capabilities in AI Providers
Troubleshooting: Get help with specific issues in Troubleshooting
Configuration: Set up custom configurations in Configuration Guide