Embedding Models Comparison and Selection Guide

Overview

This document provides a comprehensive comparison of embedding models supported by RAG-lite, including performance benchmarks, use case recommendations, and configuration guidelines to help you choose the right model for your needs.

Test Environment

All performance benchmarks were conducted on the following system:

System Specifications

  • OS: Microsoft Windows 11 Home Single Language (Build 26100)
  • CPU: 11th Gen Intel(R) Core(TM) i3-1115G4 @ 3.00GHz
  • RAM: 7,991 MB (~8GB)
  • Node.js: v20.19.2
  • Runtime: Single-threaded JavaScript execution
  • Storage: SSD (model caching enabled)

Test Methodology

  • Sample Size: 10 text samples for batch testing, 50 for performance testing
  • Measurements: Average of multiple runs
  • Memory: Process memory usage during embedding generation
  • Consistency: Multiple runs with identical inputs to verify deterministic behavior
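
For reference, the figures above can be reproduced with a loop along the following lines (a sketch; `embed` is a placeholder for whichever embedding wrapper is being benchmarked, not a RAG-lite API):

// Sketch of the benchmark loop: time repeated embedding calls and
// sample process memory. `embed` is a placeholder wrapper.
async function benchmark(embed, samples, runs = 3) {
  let totalMs = 0;
  for (let run = 0; run < runs; run++) {
    const start = performance.now();
    for (const text of samples) {
      await embed(text);
    }
    totalMs += performance.now() - start;
  }
  const avgMs = totalMs / (runs * samples.length);
  return {
    avgMsPerEmbedding: avgMs,
    throughputPerSec: 1000 / avgMs,
    heapUsedMB: process.memoryUsage().heapUsed / 1024 / 1024
  };
}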

Supported Models

1. sentence-transformers/all-MiniLM-L6-v2 (Current Default)

Specifications

  • Dimensions: 384
  • Architecture: MiniLM (distilled BERT)
  • Model Size: ~23MB
  • Quantization: Available

Performance Metrics

  • Single Embedding: 16.18ms average
  • Batch Processing (10 texts): 78.97ms total
  • Average per Embedding: 7.90ms
  • Throughput: 126.63 embeddings/second
  • Memory Usage: 1.64MB during processing
  • Total Memory: 342.93MB
  • Load Time: 460ms (cached)

Recommended Configuration

{
  embedding_model: "sentence-transformers/all-MiniLM-L6-v2",
  chunk_size: 250,
  chunk_overlap: 50,
  batch_size: 16,
  dimensions: 384
}

2. Xenova/all-mpnet-base-v2 (High-Quality Alternative)

Specifications

  • Dimensions: 768
  • Architecture: MPNet (Masked and Permuted Pre-training)
  • Model Size: ~110MB
  • Quantization: Available

Performance Metrics

  • Single Embedding: 113.57ms average
  • Batch Processing (10 texts): 341.10ms total
  • Average per Embedding: 34.11ms
  • Throughput: 29.32 embeddings/second
  • Memory Usage: 12.27MB during processing
  • Total Memory: 892.22MB
  • Load Time: 6,086ms (cached), ~41s (first download)

Recommended Configuration

{
  embedding_model: "Xenova/all-mpnet-base-v2",
  chunk_size: 400,
  chunk_overlap: 80,
  batch_size: 8,
  dimensions: 768
}

Performance Comparison

Speed Analysis

Metric               MiniLM-L6-v2   MPNet-base-v2   Ratio
Avg Time/Embedding   7.90ms         34.11ms         4.3x slower
Throughput           126.63/sec     29.32/sec       4.3x lower
Load Time            460ms          6,086ms         13.2x slower

Memory Analysis

Metric              MiniLM-L6-v2   MPNet-base-v2   Ratio
Processing Memory   1.64MB         12.27MB         7.5x more
Total Memory        342.93MB       892.22MB        2.6x more
Model Size          ~23MB          ~110MB          4.8x larger

Quality Analysis

Aspect                  MiniLM-L6-v2   MPNet-base-v2
Dimensions              384            768 (2x more)
Semantic Richness       Good           Excellent
Domain Adaptation       Moderate       Better
Fine-tuning Potential   Limited        Higher

Use Case Recommendations

Choose MiniLM-L6-v2 When:

  • Speed is critical (real-time applications)
  • Memory is limited (< 4GB RAM systems)
  • Processing large volumes (batch document ingestion)
  • General purpose search (good enough quality)
  • Resource-constrained environments
  • Frequent model loading/unloading

Choose MPNet-base-v2 When:

  • Quality is paramount (research, analysis)
  • Complex semantic understanding needed
  • Technical/specialized content (code, scientific papers)
  • Small to medium datasets (< 10k documents)
  • Sufficient system resources (8GB+ RAM)
  • Infrequent model loading (long-running processes)

Configuration Guidelines

System Resource Considerations

Low-End Systems (< 4GB RAM)

// Use MiniLM with conservative settings
{
  embedding_model: "sentence-transformers/all-MiniLM-L6-v2",
  batch_size: 8,
  chunk_size: 200
}

Mid-Range Systems (4-8GB RAM)

// MiniLM with standard settings, or MPNet with a reduced batch
{
  embedding_model: "sentence-transformers/all-MiniLM-L6-v2", // or "Xenova/all-mpnet-base-v2"
  batch_size: 16, // or 4 for MPNet
  chunk_size: 250 // or 300 for MPNet
}

High-End Systems (8GB+ RAM)

// Either model with optimal settings
{
  embedding_model: "Xenova/all-mpnet-base-v2",
  batch_size: 8,
  chunk_size: 400
}
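
The Migration Guide below drives these same settings through RAG_* environment variables, so a small loader keeps code and configuration in sync. A sketch (RAG_CHUNK_OVERLAP is a hypothetical variable added here for symmetry; it is not part of the documented set):

// Sketch: read the RAG_* environment variables used in the Migration Guide,
// falling back to the MiniLM defaults from this section.
function loadEmbeddingConfig() {
  return {
    embedding_model: process.env.RAG_EMBEDDING_MODEL ?? "sentence-transformers/all-MiniLM-L6-v2",
    batch_size: Number(process.env.RAG_BATCH_SIZE ?? 16),
    chunk_size: Number(process.env.RAG_CHUNK_SIZE ?? 250),
    chunk_overlap: Number(process.env.RAG_CHUNK_OVERLAP ?? 50) // hypothetical variable
  };
}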

Performance Optimization Tips

For Speed-Critical Applications

  1. Use MiniLM-L6-v2
  2. Increase batch_size to 32 (if memory allows)
  3. Reduce chunk_size to 200
  4. Enable model quantization
  5. Pre-load model at startup (see the sketch below)
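
Tips 4 and 5 combine naturally: load the quantized model once at startup and reuse it. A minimal sketch using the transformers.js pipeline shown under Technical Implementation Notes:

import { pipeline } from '@xenova/transformers';

// Start loading once at startup and reuse the promise everywhere;
// quantized: true trades a little accuracy for a smaller, faster model.
const embedderPromise = pipeline(
  'feature-extraction',
  'sentence-transformers/all-MiniLM-L6-v2',
  { quantized: true }
);

export async function embed(texts) {
  const embedder = await embedderPromise;
  return embedder(texts, { pooling: 'mean', normalize: true });
}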

For Quality-Critical Applications

  1. Use MPNet-base-v2
  2. Increase chunk_size to 500
  3. Use an overlap of 100-120 (see the chunking sketch below)
  4. Reduce batch_size to 4-6
  5. Ensure sufficient memory headroom
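
For context, chunk_size and chunk_overlap interact as in this minimal character-based splitter (a sketch; RAG-lite's actual splitter may use different units or boundary rules):

// Sketch of overlapping chunking: each chunk starts (size - overlap)
// characters after the previous one, so neighbours share `overlap` characters.
function chunkText(text, size = 500, overlap = 100) {
  const chunks = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break;
  }
  return chunks;
}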

Migration Guide

Switching Between Models

⚠️ Important: Switching models requires rebuilding the vector index, because the two models produce embeddings of different dimensions (384 vs. 768).

From MiniLM to MPNet

# 1. Update configuration
export RAG_EMBEDDING_MODEL="Xenova/all-mpnet-base-v2"
export RAG_BATCH_SIZE="8"
export RAG_CHUNK_SIZE="400"

# 2. Rebuild index
raglite rebuild

From MPNet to MiniLM

# 1. Update configuration
export RAG_EMBEDDING_MODEL="sentence-transformers/all-MiniLM-L6-v2"
export RAG_BATCH_SIZE="16"
export RAG_CHUNK_SIZE="250"

# 2. Rebuild index
raglite rebuild
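
Because a 384-dimensional index cannot hold 768-dimensional vectors (or vice versa), a guard along these lines is a cheap sanity check before ingesting into an existing index (a sketch; the index object is hypothetical, not RAG-lite's actual API):

// Hypothetical guard: refuse to mix embedding dimensions in one index.
function assertDimensionsMatch(index, embedding) {
  if (index.dimensions !== embedding.length) {
    throw new Error(
      `Index expects ${index.dimensions}-dim vectors but the model produced ` +
      `${embedding.length}-dim vectors; run "raglite rebuild" after switching models.`
    );
  }
}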

Technical Implementation Notes

Model Loading

Both models use the same transformers.js pipeline:

import { pipeline } from '@xenova/transformers';

const model = await pipeline('feature-extraction', modelName, {
  local_files_only: false, // allow the first-run download from the Hugging Face Hub
  revision: 'main',
  quantized: false         // set to true to use the quantized weights
});

Embedding Generation

Consistent API across models:

const embeddings = await model(texts, {
  pooling: 'mean',  // average token vectors into one vector per text
  normalize: true   // L2-normalize, so dot product equals cosine similarity
});
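
Because normalize: true returns unit-length vectors, cosine similarity between two embeddings reduces to a plain dot product, whichever model produced them:

// With normalized embeddings, cosine similarity is just the dot product.
function cosineSimilarity(a, b) {
  let dot = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
  }
  return dot;
}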

Compatibility Matrix

Feature            MiniLM-L6-v2   MPNet-base-v2
transformers.js    ✓              ✓
Quantization       ✓              ✓
Batch Processing   ✓              ✓
Browser Support    ✓              ✓
Node.js Support    ✓              ✓

Future Model Support

Planned Additions

  • sentence-transformers/all-mpnet-base-v2: Original HuggingFace version
  • BAAI/bge-small-en-v1.5: Competitive 384D alternative
  • intfloat/e5-small-v2: Another quality option

Evaluation Criteria

  1. Performance: Speed and memory efficiency
  2. Quality: Semantic understanding capability
  3. Compatibility: transformers.js support
  4. Size: Model download and storage requirements
  5. Community: Adoption and maintenance status

Troubleshooting

Common Issues

Model Loading Failures

  • Ensure internet connection for first download
  • Check available disk space (>500MB free)
  • Verify Node.js version compatibility

Memory Issues

  • Reduce batch_size
  • Use smaller chunk_size
  • Switch to MiniLM model
  • Close other applications

Performance Issues

  • Check system resources
  • Verify model is cached locally
  • Consider model quantization
  • Optimize batch and chunk sizes

Conclusion

Both models are production-ready and serve different use cases:

  • MiniLM-L6-v2: Excellent balance of speed and quality for most applications
  • MPNet-base-v2: Superior quality for demanding semantic understanding tasks

Choose based on your specific speed/quality trade-off and your system's resource constraints. The system supports switching between models with a single index rebuild (raglite rebuild).

Last Updated: Based on validation testing completed during the multi-model support implementation.
Test Environment: Windows 11, Intel i3-1115G4, 8GB RAM, Node.js v20.19.2