# Model Selection Guide

*For users who want to optimize performance or need specific model characteristics.*
Complete guide to embedding models, performance characteristics, and selection criteria for RAG-lite TS.
## Table of Contents

- [Quick Selection](#quick-selection)
- [Supported Models](#supported-models)
- [Performance Comparison](#performance-comparison)
- [Model Switching](#model-switching)
- [Configuration](#configuration)
- [Use Cases](#use-cases)
- [System Requirements](#system-requirements)
- [Model Management](#model-management)
- [Troubleshooting](#troubleshooting)
- [Future Models](#future-models)
- [Best Practices](#best-practices)
## Quick Selection

**For most users (recommended):**

```bash
# Fast, efficient, good quality
raglite ingest ./docs/  # Uses sentence-transformers/all-MiniLM-L6-v2
```

**For highest quality:**

```bash
# Slower but better semantic understanding
raglite ingest ./docs/ --model Xenova/all-mpnet-base-v2
```
## Supported Models

### sentence-transformers/all-MiniLM-L6-v2 (Default)

**Best for:** Speed, efficiency, general-purpose search
- Dimensions: 384
- Model Size: ~23MB
- Speed: ~127 embeddings/second
- Memory: ~343MB total usage
- Quality: Good for most use cases
**Auto-configured settings:**
- Chunk size: 250 tokens
- Batch size: 16
- Overlap: 50 tokens
### Xenova/all-mpnet-base-v2 (High Quality)

**Best for:** Complex queries, technical content, research

- Dimensions: 768 (twice MiniLM's dimensionality, capturing finer semantic detail)
- Model Size: ~110MB
- Speed: ~29 embeddings/second
- Memory: ~892MB total usage
- Quality: Excellent semantic understanding
**Auto-configured settings:**
- Chunk size: 400 tokens
- Batch size: 8
- Overlap: 80 tokens
## Performance Comparison

### Speed Benchmarks

| Metric | MiniLM-L6-v2 | MPNet-base-v2 | MPNet vs. MiniLM |
|---|---|---|---|
| Single embedding | 16ms | 114ms | 7x slower |
| Batch (10 texts) | 79ms | 341ms | 4.3x slower |
| Throughput | 127/sec | 29/sec | 4.4x lower |
| Model loading | 460ms | 6,086ms | 13x slower |
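To reproduce the speed numbers on your own hardware, a minimal throughput sketch is shown below. It assumes measuring with transformers.js directly (the runtime the `Xenova/` models target; see Evaluation Criteria below) rather than through rag-lite-ts, and the sample texts are illustrative:

```typescript
// Rough throughput benchmark using transformers.js directly.
// Assumes `npm install @xenova/transformers`; results vary by hardware.
import { pipeline } from '@xenova/transformers';

const embed = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
const texts = Array.from({ length: 100 }, (_, i) => `sample passage number ${i}`);

const start = performance.now();
for (const text of texts) {
  await embed(text, { pooling: 'mean', normalize: true });
}
const seconds = (performance.now() - start) / 1000;
console.log(`~${(texts.length / seconds).toFixed(0)} embeddings/sec`);
```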
### Memory Usage

| Metric | MiniLM-L6-v2 | MPNet-base-v2 | MPNet vs. MiniLM |
|---|---|---|---|
| Processing | 1.6MB | 12.3MB | ~7.7x more |
| Total memory | 343MB | 892MB | 2.6x more |
| Model cache | 23MB | 110MB | 4.8x larger |
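The memory figures can be approximated the same way by sampling the process's resident set size before and after the model loads (RSS is a coarse proxy that includes the Node runtime itself):

```typescript
// Rough memory footprint check; same transformers.js assumption as above.
import { pipeline } from '@xenova/transformers';

const before = process.memoryUsage().rss;
const embed = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
await embed('warm-up text', { pooling: 'mean', normalize: true });
const after = process.memoryUsage().rss;
console.log(`Model footprint: ~${((after - before) / 1024 / 1024).toFixed(0)} MB RSS`);
```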
### Quality Characteristics

| Aspect | MiniLM-L6-v2 | MPNet-base-v2 |
|---|---|---|
| General search | Excellent | Excellent |
| Technical content | Good | Superior |
| Complex queries | Good | Excellent |
| Domain-specific | Moderate | Better |
| Semantic nuance | Good | Superior |
## Model Switching

### CLI Method (Recommended)

**Switch to the high-quality model:**

```bash
# Automatically rebuilds the index if needed
raglite ingest ./docs/ --model Xenova/all-mpnet-base-v2 --rebuild-if-needed
raglite search "complex query"  # Uses MPNet automatically
```

**Switch back to the fast model:**

```bash
# Automatically rebuilds the index if needed
raglite ingest ./docs/ --model sentence-transformers/all-MiniLM-L6-v2 --rebuild-if-needed
raglite search "simple query"  # Uses MiniLM automatically
```
### Configuration File Method

1. Update your config file:

   ```javascript
   // raglite.config.js
   export const config = {
     embedding_model: 'Xenova/all-mpnet-base-v2',
     // Other settings are auto-configured for this model
   };
   ```

2. Rebuild the index:

   ```bash
   raglite rebuild  # Required when changing models via config
   ```

### Environment Variable Method

```bash
# Set the new model
export RAG_EMBEDDING_MODEL="Xenova/all-mpnet-base-v2"

# Rebuild required
raglite rebuild
```
> ⚠️ **Important:** Model switching requires rebuilding the vector index because the two models produce embeddings of different dimensions (384 vs. 768).
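The rebuild is unavoidable because similarity scores are undefined between vectors of different widths: a 384D query embedding can never be compared against 768D index vectors. A hypothetical guard illustrating the check (names are made up for illustration; rag-lite-ts evidently performs an equivalent check internally, per the "Model mismatch detected" error under Troubleshooting):

```typescript
// Hypothetical dimension guard; function and parameter names are illustrative.
function assertDimensionsMatch(indexDims: number, modelDims: number): void {
  if (indexDims !== modelDims) {
    throw new Error(
      `Model mismatch: index stores ${indexDims}D vectors, but the current ` +
      `model emits ${modelDims}D. Run \`raglite rebuild\` after switching models.`
    );
  }
}

assertDimensionsMatch(384, 768); // throws: a MiniLM index cannot serve MPNet queries
```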
## Configuration

### Model-Specific Auto-Configuration

The system automatically optimizes settings based on your chosen model:

#### MiniLM-L6-v2 Defaults

```javascript
{
  embedding_model: "sentence-transformers/all-MiniLM-L6-v2",
  chunk_size: 250,    // Optimized for 384D
  chunk_overlap: 50,
  batch_size: 16,     // Higher throughput
  dimensions: 384
}
```
#### MPNet-base-v2 Defaults

```javascript
{
  embedding_model: "Xenova/all-mpnet-base-v2",
  chunk_size: 400,    // Larger chunks for 768D
  chunk_overlap: 80,
  batch_size: 8,      // Lower for memory efficiency
  dimensions: 768
}
```
### Custom Overrides

You can override auto-configured values through environment variables or programmatically.

**Environment variables:**

```bash
# Override batch size for MiniLM
export RAG_EMBEDDING_MODEL="sentence-transformers/all-MiniLM-L6-v2"
export RAG_BATCH_SIZE="32"  # Increase for more speed (if memory allows)

# Override chunk size for MPNet
export RAG_EMBEDDING_MODEL="Xenova/all-mpnet-base-v2"
export RAG_CHUNK_SIZE="300"  # Smaller chunks for faster processing
```
**Programmatic configuration:**

```typescript
import { SearchEngine, IngestionPipeline } from 'rag-lite-ts';

// MiniLM with a custom batch size for speed
const fastSearch = new SearchEngine('./index.bin', './db.sqlite', {
  embeddingModel: 'sentence-transformers/all-MiniLM-L6-v2',
  batchSize: 32,  // Higher throughput
  topK: 10
});

// MPNet with a custom chunk size for efficiency
const qualityIngestion = new IngestionPipeline('./db.sqlite', './index.bin', {
  embeddingModel: 'Xenova/all-mpnet-base-v2',
  chunkSize: 300,  // Smaller chunks
  chunkOverlap: 60,
  batchSize: 8
});
```
## Use Cases

*A decision guide for choosing the right model for your needs.*

### Choose MiniLM-L6-v2 When:

**✅ Speed is critical**
- Real-time search applications
- Interactive user interfaces
- Large batch processing jobs
```typescript
// Fast search for real-time applications
const realtimeSearch = new SearchEngine('./index.bin', './db.sqlite', {
  embeddingModel: 'sentence-transformers/all-MiniLM-L6-v2',
  enableReranking: false,  // Skip reranking for speed
  topK: 5,
  batchSize: 16
});
```
**✅ Resources are limited**
- Systems with < 4GB RAM
- Mobile or edge devices
- Shared hosting environments
**✅ General-purpose search**
- Documentation search
- FAQ systems
- Basic content discovery
**✅ High-volume processing**
- Processing thousands of documents
- Frequent re-indexing
- Continuous ingestion pipelines
```typescript
// High-throughput ingestion
const batchIngestion = new IngestionPipeline('./db.sqlite', './index.bin', {
  embeddingModel: 'sentence-transformers/all-MiniLM-L6-v2',
  batchSize: 32,  // Process more at once
  chunkSize: 250,
  chunkOverlap: 50
});
```
### Choose MPNet-base-v2 When:

**✅ Quality is paramount**
- Research applications
- Technical documentation
- Complex domain knowledge
```typescript
// High-quality search for research
const researchSearch = new SearchEngine('./index.bin', './db.sqlite', {
  embeddingModel: 'Xenova/all-mpnet-base-v2',
  enableReranking: true,  // Enable for best quality
  topK: 20,  // More results for comprehensive search
  batchSize: 8
});
```
**✅ Complex semantic understanding**
- Scientific papers
- Legal documents
- Code documentation with context
**✅ Specialized content**
- Domain-specific terminology
- Technical specifications
- Academic literature
```typescript
// Technical documentation ingestion
const technicalIngestion = new IngestionPipeline('./db.sqlite', './index.bin', {
  embeddingModel: 'Xenova/all-mpnet-base-v2',
  chunkSize: 400,   // Larger chunks for context
  chunkOverlap: 80, // More overlap for continuity
  batchSize: 8
});
```
**✅ Sufficient resources**
- Systems with 8GB+ RAM
- Dedicated search servers
- Quality over speed requirements
## System Requirements

### Minimum Requirements

- **MiniLM:** 2GB RAM, any modern CPU
- **MPNet:** 4GB RAM, modern CPU with good single-thread performance

### Recommended Requirements

- **MiniLM:** 4GB+ RAM for optimal batch processing
- **MPNet:** 8GB+ RAM for comfortable operation

### Storage Requirements

- Model cache: ~150MB with both models downloaded
- Index storage: ~2KB per chunk (MiniLM), ~4KB per chunk (MPNet)
- Database: ~1.5x the original document size
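The per-chunk index figures follow mostly from vector width: at 4 bytes per float32 dimension, one MiniLM vector is 384 × 4 = 1,536 bytes and one MPNet vector is 768 × 4 = 3,072 bytes, with the remainder being index metadata. A back-of-envelope estimator (the ~512-byte overhead constant is an assumption, not a measured value):

```typescript
// Rough index-size estimator: float32 vectors plus an assumed per-chunk overhead.
function estimateIndexBytes(chunks: number, dims: number, overheadBytes = 512): number {
  return chunks * (dims * 4 + overheadBytes);
}

// 10,000 chunks: ~20 MB with MiniLM (384D), ~36 MB with MPNet (768D)
console.log(estimateIndexBytes(10_000, 384)); // 20,480,000 bytes
console.log(estimateIndexBytes(10_000, 768)); // 35,840,000 bytes
```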
## Model Management

### Automatic Downloads

Models are downloaded automatically on first use:

```bash
# The first run downloads and caches the model
raglite ingest ./docs/ --model Xenova/all-mpnet-base-v2
# Downloading model... (this may take a few minutes)
# Model cached at ~/.raglite/models/

# Subsequent runs use the cached model
raglite search "query"  # Fast startup
```
### Cache Management

Models are cached globally and reused across projects:

```bash
# Cache location
echo $HOME/.raglite/models/

# Clear the cache if needed (models re-download on next use)
rm -rf ~/.raglite/models/
```
### Offline Setup

For offline environments, see the GitHub models directory for manual model setup instructions.
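If rag-lite-ts loads models through transformers.js (suggested by the `Xenova/` model names and the transformers.js compatibility criterion under Evaluation Criteria), one likely building block is pointing the runtime at a local directory and disabling remote fetches. This is a sketch under that assumption; the path is illustrative, and the instructions in the models directory take precedence:

```typescript
// Sketch: serve models from disk and forbid network access.
// Assumes embeddings run through @xenova/transformers; the path is illustrative.
import { env } from '@xenova/transformers';

env.allowRemoteModels = false;            // never contact the Hugging Face Hub
env.localModelPath = './offline-models';  // directory holding the model files
```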
## Troubleshooting

### Model Loading Issues

**Problem:** Model fails to download

- Check your internet connection
- Verify free disk space (>500MB)
- Try again later (Hugging Face servers may be busy)
**Problem:** Out of memory during loading

```bash
# Switch to the smaller model
export RAG_EMBEDDING_MODEL="sentence-transformers/all-MiniLM-L6-v2"
raglite rebuild

# Or reduce the batch size
export RAG_BATCH_SIZE="4"
```
### Performance Issues

**Problem:** Slow embedding generation

```bash
# Use the faster model
raglite ingest ./docs/ --model sentence-transformers/all-MiniLM-L6-v2

# Increase batch size (if memory allows)
export RAG_BATCH_SIZE="32"

# Reduce chunk size
export RAG_CHUNK_SIZE="200"
```
**Problem:** High memory usage

```bash
# Use the smaller model
export RAG_EMBEDDING_MODEL="sentence-transformers/all-MiniLM-L6-v2"

# Reduce the batch size
export RAG_BATCH_SIZE="8"

# Process in smaller batches
raglite ingest ./docs/batch1/
raglite ingest ./docs/batch2/
```
### Model Compatibility

**Problem:** "Model mismatch detected"

```bash
# The system reports the current vs. index model:
#   Current: Xenova/all-mpnet-base-v2 (768 dimensions)
#   Index:   sentence-transformers/all-MiniLM-L6-v2 (384 dimensions)

# Solution: rebuild with the new model
raglite rebuild
```
**Problem:** Inconsistent search results after a model change

```bash
# Ensure a complete rebuild
raglite rebuild

# Re-ingest if needed
raglite ingest ./docs/
```
## Future Models

### Planned Support

- `sentence-transformers/all-mpnet-base-v2`: the original Hugging Face version
- `BAAI/bge-small-en-v1.5`: a competitive 384D alternative
- `intfloat/e5-small-v2`: another quality option
### Evaluation Criteria
- transformers.js compatibility
- Performance characteristics
- Model size and memory usage
- Community adoption
- Quality benchmarks
## Best Practices

### Development Workflow

1. Start with MiniLM for fast iteration
2. Test with your actual content to assess quality needs
3. Switch to MPNet if quality is insufficient
4. Benchmark both models against your specific use case (the throughput sketch under Speed Benchmarks is a starting point)
### Production Deployment

- Choose your model based on requirements (speed vs. quality)
- Pre-download models in your deployment pipeline (see the sketch below)
- Monitor memory usage and adjust batch sizes
- Set up model caching for consistent performance
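One way to pre-download using only the documented CLI is to ingest a throwaway seed document with the target model during your build step, so `~/.raglite/models/` is populated before the service starts. A sketch (the seed path is illustrative):

```bash
# Warm the model cache during image build; the seed doc is a placeholder.
mkdir -p /tmp/seed && echo "cache warm-up" > /tmp/seed/seed.md
raglite ingest /tmp/seed/ --model Xenova/all-mpnet-base-v2
```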
### Model Selection Decision Tree

```text
Do you need the highest possible quality?
├─ Yes → Use MPNet-base-v2
└─ No → Do you have resource constraints?
    ├─ Yes → Use MiniLM-L6-v2
    └─ No → Do you process large volumes?
        ├─ Yes → Use MiniLM-L6-v2
        └─ No → Test both, choose based on results
```
This guide covers everything you need to know about model selection and management. For detailed performance benchmarks, see `EMBEDDING_MODELS_COMPARISON.md`.