Model Selection Guide

For users who want to optimize performance or need specific model characteristics

Complete guide to embedding models, performance characteristics, and selection criteria for RAG-lite TS.

Quick Selection

For most users (recommended):

# Fast, efficient, good quality
raglite ingest ./docs/ # Uses sentence-transformers/all-MiniLM-L6-v2

For highest quality:

# Slower but better semantic understanding
raglite ingest ./docs/ --model Xenova/all-mpnet-base-v2

Supported Models

sentence-transformers/all-MiniLM-L6-v2 (Default)

Best for: Speed, efficiency, general-purpose search

  • Dimensions: 384
  • Model Size: ~23MB
  • Speed: ~127 embeddings/second
  • Memory: ~343MB total usage
  • Quality: Good for most use cases

Auto-configured settings:

  • Chunk size: 250 tokens
  • Batch size: 16
  • Overlap: 50 tokens

Xenova/all-mpnet-base-v2 (High Quality)

Best for: Complex queries, technical content, research

  • Dimensions: 768 (twice MiniLM's dimensionality, capturing finer semantic detail)
  • Model Size: ~110MB
  • Speed: ~29 embeddings/second
  • Memory: ~892MB total usage
  • Quality: Excellent semantic understanding

Auto-configured settings:

  • Chunk size: 400 tokens
  • Batch size: 8
  • Overlap: 80 tokens

Performance Comparison

Speed Benchmarks

Metric            MiniLM-L6-v2   MPNet-base-v2   Difference
Single embedding  16ms           114ms           7x slower
Batch (10 texts)  79ms           341ms           4.3x slower
Throughput        127/sec        29/sec          4.3x slower
Model loading     460ms          6,086ms         13x slower

Memory Usage

Metric         MiniLM-L6-v2   MPNet-base-v2   Difference
Processing     1.6MB          12.3MB          7.5x more
Total memory   343MB          892MB           2.6x more
Model cache    23MB           110MB           4.8x larger
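
To sanity-check these numbers on your own hardware, Node's built-in process.memoryUsage() gives a rough resident-memory reading. The sketch below wraps a workload in before/after measurements; the placeholder comment stands in for your actual ingest or search calls:

// Rough resident-memory measurement around a workload
const before = process.memoryUsage().rss;

// ... run your ingestion or a batch of searches here ...

const after = process.memoryUsage().rss;
console.log(`RSS delta: ${((after - before) / 1024 / 1024).toFixed(1)}MB`);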

Quality Characteristics

Aspect             MiniLM-L6-v2    MPNet-base-v2
General search     ✅ Excellent    ✅ Excellent
Technical content  ✅ Good         ✅ Superior
Complex queries    ✅ Good         ✅ Excellent
Domain-specific    ✅ Moderate     ✅ Better
Semantic nuance    ✅ Good         ✅ Superior

Model Switching

Switch to high-quality model:

# Automatically rebuilds index if needed
raglite ingest ./docs/ --model Xenova/all-mpnet-base-v2 --rebuild-if-needed
raglite search "complex query" # Uses MPNet automatically

Switch back to fast model:

# Automatically rebuilds index if needed
raglite ingest ./docs/ --model sentence-transformers/all-MiniLM-L6-v2 --rebuild-if-needed
raglite search "simple query" # Uses MiniLM automatically

Configuration File Method

  1. Update your config file:

// raglite.config.js
export const config = {
  embedding_model: 'Xenova/all-mpnet-base-v2',
  // Other settings auto-configured for this model
};

  2. Rebuild the index:

raglite rebuild  # Required when changing models via config

Environment Variable Method

# Set new model
export RAG_EMBEDDING_MODEL="Xenova/all-mpnet-base-v2"

# Rebuild required
raglite rebuild

⚠️ Important: Model switching requires rebuilding the vector index because embeddings have different dimensions (384 vs 768).
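
The reason is mechanical: similarity scores are undefined between vectors of different lengths. The sketch below illustrates the failure mode; the assertCompatible helper is illustrative only, though the system performs an equivalent check and reports it as "Model mismatch detected" (see Troubleshooting):

// Illustrative only: a 768D query vector cannot be compared against a 384D index
function assertCompatible(indexDims: number, modelDims: number): void {
  if (indexDims !== modelDims) {
    throw new Error(
      `Model mismatch: index stores ${indexDims}D vectors, model emits ${modelDims}D. Run raglite rebuild.`
    );
  }
}

assertCompatible(384, 768); // throws: cosine similarity across 384D and 768D is undefined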

Configuration

Model-Specific Auto-Configuration

The system automatically optimizes settings based on your chosen model:

MiniLM-L6-v2 Defaults

{
  embedding_model: "sentence-transformers/all-MiniLM-L6-v2",
  chunk_size: 250,   // Optimized for 384D
  chunk_overlap: 50,
  batch_size: 16,    // Higher throughput
  dimensions: 384
}

MPNet-base-v2 Defaults

{
  embedding_model: "Xenova/all-mpnet-base-v2",
  chunk_size: 400,   // Larger chunks for 768D
  chunk_overlap: 80,
  batch_size: 8,     // Lower for memory efficiency
  dimensions: 768
}
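
Conceptually, the auto-configuration is a lookup from model name to the defaults above. A minimal sketch (defaultsFor is illustrative, not the library's actual API; the values mirror the documented defaults):

type ModelDefaults = {
  chunkSize: number;
  chunkOverlap: number;
  batchSize: number;
  dimensions: number;
};

const MODEL_DEFAULTS: Record<string, ModelDefaults> = {
  'sentence-transformers/all-MiniLM-L6-v2': { chunkSize: 250, chunkOverlap: 50, batchSize: 16, dimensions: 384 },
  'Xenova/all-mpnet-base-v2': { chunkSize: 400, chunkOverlap: 80, batchSize: 8, dimensions: 768 },
};

// Resolve defaults for a model; unknown names fail fast
function defaultsFor(model: string): ModelDefaults {
  const defaults = MODEL_DEFAULTS[model];
  if (!defaults) throw new Error(`No auto-configuration for model: ${model}`);
  return defaults;
}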

Custom Overrides

You can override auto-configured values through environment variables or programmatically:

Environment Variables:

# Override batch size for MiniLM
export RAG_EMBEDDING_MODEL="sentence-transformers/all-MiniLM-L6-v2"
export RAG_BATCH_SIZE="32" # Increase for more speed (if memory allows)

# Override chunk size for MPNet
export RAG_EMBEDDING_MODEL="Xenova/all-mpnet-base-v2"
export RAG_CHUNK_SIZE="300" # Smaller chunks for faster processing

Programmatic Configuration:

import { SearchEngine, IngestionPipeline } from 'rag-lite-ts';

// MiniLM with custom batch size for speed
const fastSearch = new SearchEngine('./index.bin', './db.sqlite', {
  embeddingModel: 'sentence-transformers/all-MiniLM-L6-v2',
  batchSize: 32, // Higher throughput
  topK: 10
});

// MPNet with custom chunk size for efficiency
const qualityIngestion = new IngestionPipeline('./db.sqlite', './index.bin', {
  embeddingModel: 'Xenova/all-mpnet-base-v2',
  chunkSize: 300, // Smaller chunks
  chunkOverlap: 60,
  batchSize: 8
});

Use Cases

Decision guide for choosing the right model for your needs

Choose MiniLM-L6-v2 When:

✅ Speed is critical

  • Real-time search applications
  • Interactive user interfaces
  • Large batch processing jobs

// Fast search for real-time applications
const realtimeSearch = new SearchEngine('./index.bin', './db.sqlite', {
  embeddingModel: 'sentence-transformers/all-MiniLM-L6-v2',
  enableReranking: false, // Skip reranking for speed
  topK: 5,
  batchSize: 16
});

✅ Resources are limited

  • Systems with < 4GB RAM
  • Mobile or edge devices
  • Shared hosting environments

✅ General-purpose search

  • Documentation search
  • FAQ systems
  • Basic content discovery

✅ High-volume processing

  • Processing thousands of documents
  • Frequent re-indexing
  • Continuous ingestion pipelines

// High-throughput ingestion
const batchIngestion = new IngestionPipeline('./db.sqlite', './index.bin', {
  embeddingModel: 'sentence-transformers/all-MiniLM-L6-v2',
  batchSize: 32, // Process more at once
  chunkSize: 250,
  chunkOverlap: 50
});

Choose MPNet-base-v2 When:

✅ Quality is paramount

  • Research applications
  • Technical documentation
  • Complex domain knowledge

// High-quality search for research
const researchSearch = new SearchEngine('./index.bin', './db.sqlite', {
  embeddingModel: 'Xenova/all-mpnet-base-v2',
  enableReranking: true, // Enable for best quality
  topK: 20, // More results for comprehensive search
  batchSize: 8
});

✅ Complex semantic understanding

  • Scientific papers
  • Legal documents
  • Code documentation with context

✅ Specialized content

  • Domain-specific terminology
  • Technical specifications
  • Academic literature

// Technical documentation ingestion
const technicalIngestion = new IngestionPipeline('./db.sqlite', './index.bin', {
  embeddingModel: 'Xenova/all-mpnet-base-v2',
  chunkSize: 400, // Larger chunks for context
  chunkOverlap: 80, // More overlap for continuity
  batchSize: 8
});

✅ Sufficient resources

  • Systems with 8GB+ RAM
  • Dedicated search servers
  • Quality over speed requirements

System Requirements

Minimum Requirements

  • MiniLM: 2GB RAM, any modern CPU
  • MPNet: 4GB RAM, modern CPU with good single-thread performance

Recommended

  • MiniLM: 4GB+ RAM for optimal batch processing
  • MPNet: 8GB+ RAM for comfortable operation

Storage Requirements

  • Model cache: ~133MB with both models cached (23MB + 110MB)
  • Index storage: ~2KB per chunk (MiniLM), ~4KB per chunk (MPNet)
  • Database: ~1.5x original document size
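
The per-chunk figures follow from the embedding dimensions: a float32 vector costs 4 bytes per dimension (384 × 4 ≈ 1.5KB for MiniLM, 768 × 4 = 3KB for MPNet), plus per-chunk metadata. A back-of-the-envelope estimate (the 500-byte metadata overhead is an assumption for illustration):

// Rough index sizing: float32 vector bytes plus an assumed metadata overhead
function estimateIndexBytes(chunks: number, dims: number, metadataBytes = 500): number {
  return chunks * (dims * 4 + metadataBytes);
}

console.log(estimateIndexBytes(10_000, 384)); // ~20MB for 10,000 MiniLM chunks
console.log(estimateIndexBytes(10_000, 768)); // ~36MB for 10,000 MPNet chunks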

Model Management

Automatic Downloads

Models are downloaded automatically on first use:

# First run downloads and caches model
raglite ingest ./docs/ --model Xenova/all-mpnet-base-v2
# Downloading model... (this may take a few minutes)
# Model cached at ~/.raglite/models/

# Subsequent runs use cached model
raglite search "query" # Fast startup

Cache Management

Models are cached globally and reused across projects:

# List cached models
ls -lh ~/.raglite/models/

# Clear cache if needed (will re-download on next use)
rm -rf ~/.raglite/models/

Offline Setup

For offline environments, see the GitHub models directory for manual model setup instructions.

Troubleshooting

Model Loading Issues

Problem: Model fails to download

# Check internet connection
# Verify disk space (>500MB free)
# Try again later (Hugging Face servers may be busy)

Problem: Out of memory during loading

# Switch to smaller model
export RAG_EMBEDDING_MODEL="sentence-transformers/all-MiniLM-L6-v2"
raglite rebuild

# Or reduce batch size
export RAG_BATCH_SIZE="4"

Performance Issues

Problem: Slow embedding generation

# Use faster model
raglite ingest ./docs/ --model sentence-transformers/all-MiniLM-L6-v2

# Increase batch size (if memory allows)
export RAG_BATCH_SIZE="32"

# Reduce chunk size
export RAG_CHUNK_SIZE="200"

Problem: High memory usage

# Use smaller model
export RAG_EMBEDDING_MODEL="sentence-transformers/all-MiniLM-L6-v2"

# Reduce batch size
export RAG_BATCH_SIZE="8"

# Process in smaller batches
raglite ingest ./docs/batch1/
raglite ingest ./docs/batch2/

Model Compatibility

Problem: "Model mismatch detected"

# The system shows current vs index model:
# Current: Xenova/all-mpnet-base-v2 (768 dimensions)
# Index: sentence-transformers/all-MiniLM-L6-v2 (384 dimensions)

# Solution: Rebuild with new model
raglite rebuild

Problem: Inconsistent search results after model change

# Ensure complete rebuild
raglite rebuild

# Re-ingest if needed
raglite ingest ./docs/

Future Models

Planned Support

  • sentence-transformers/all-mpnet-base-v2: Original Hugging Face version
  • BAAI/bge-small-en-v1.5: Competitive 384D alternative
  • intfloat/e5-small-v2: Another quality option

Evaluation Criteria

  1. transformers.js compatibility
  2. Performance characteristics
  3. Model size and memory usage
  4. Community adoption
  5. Quality benchmarks

Best Practices

Development Workflow

  1. Start with MiniLM for fast iteration
  2. Test with your actual content to assess quality needs
  3. Switch to MPNet if quality is insufficient
  4. Benchmark both models with your specific use case (see the timing sketch below)
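
A minimal timing harness for step 4, assuming an ingest(dir) method on IngestionPipeline; the method name is an assumption, so adapt it to the ingestion entry point your version exposes. Per-model database and index paths avoid dimension mismatches between runs:

import { IngestionPipeline } from 'rag-lite-ts';

async function timeIngestion(model: string, docsDir: string): Promise<number> {
  const name = model.split('/')[1]; // e.g. "all-MiniLM-L6-v2"
  const pipeline = new IngestionPipeline(`./bench-${name}.sqlite`, `./bench-${name}.bin`, {
    embeddingModel: model,
  });
  const start = performance.now();
  await pipeline.ingest(docsDir); // hypothetical method name
  return performance.now() - start;
}

for (const model of ['sentence-transformers/all-MiniLM-L6-v2', 'Xenova/all-mpnet-base-v2']) {
  console.log(model, `${(await timeIngestion(model, './docs')).toFixed(0)}ms`);
}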

Production Deployment

  1. Choose model based on requirements (speed vs quality)
  2. Pre-download models in deployment pipeline
  3. Monitor memory usage and adjust batch sizes
  4. Set up model caching for consistent performance

Model Selection Decision Tree

Do you need the highest possible quality?
├─ Yes → Use MPNet-base-v2
└─ No → Do you have resource constraints?
   ├─ Yes → Use MiniLM-L6-v2
   └─ No → Do you process large volumes?
      ├─ Yes → Use MiniLM-L6-v2
      └─ No → Test both, choose based on results
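
The same tree expressed as a small helper, in case you want to wire the choice into a setup script (illustrative only; null means "benchmark both and compare"):

type ModelChoice =
  | 'Xenova/all-mpnet-base-v2'
  | 'sentence-transformers/all-MiniLM-L6-v2'
  | null;

// Mirrors the decision tree above
function pickModel(opts: {
  needHighestQuality: boolean;
  resourceConstrained: boolean;
  highVolume: boolean;
}): ModelChoice {
  if (opts.needHighestQuality) return 'Xenova/all-mpnet-base-v2';
  if (opts.resourceConstrained || opts.highVolume) return 'sentence-transformers/all-MiniLM-L6-v2';
  return null;
}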

This guide covers everything you need to know about model selection and management. For detailed performance benchmarks, see EMBEDDING_MODELS_COMPARISON.md.