RAG-lite MCP Server Multimodal Guide
Overview
The RAG-lite MCP (Model Context Protocol) server provides a comprehensive interface for both text-only and multimodal document retrieval. The server automatically adapts its behavior based on the mode configuration stored during ingestion, implementing the Chameleon Architecture's polymorphic runtime system.
Key Features
Dynamic Tool Descriptions (NEW)
The MCP server now automatically detects and advertises its capabilities based on the actual database content. When you connect multiple MCP server instances to different databases, each server will describe what it actually contains:
Text-only database:
[TEXT MODE] Search indexed documents using semantic similarity.
This database contains 150 text documents. Supports .md and .txt files only.
Multimodal database with images:
[MULTIMODAL MODE] Search indexed documents using semantic similarity.
This database contains 200 documents. Contains both text and image content.
Image results include base64-encoded data for display.
Supports cross-modal search (text queries can find images).
Benefits:
- Intelligent routing: AI assistants can choose the right database based on the query
- Self-documenting: Each server advertises its actual capabilities
- No guesswork: Clear indication of what content types are available
- Multi-instance support: Run multiple servers with different content types seamlessly
Quick Start
Basic Text Mode Usage
{
"name": "ingest",
"arguments": {
"path": "./documents",
"mode": "text"
}
}
What happens:
- Mode configuration stored in database as "text"
- Documents processed with sentence transformer model
- Cross-encoder reranking enabled by default
- Subsequent searches automatically detect text mode
Multimodal Mode Usage
{
"name": "ingest",
"arguments": {
"path": "./mixed-content",
"mode": "multimodal",
"model": "Xenova/clip-vit-base-patch32",
"rerank_strategy": "text-derived"
}
}
What happens:
- Mode configuration stored in database as "multimodal"
- Text and images processed with CLIP model
- Images converted to text descriptions for reranking
- Image metadata extracted (dimensions, format, file size)
- Subsequent searches automatically detect multimodal mode
Automatic Mode Detection
{
"name": "search",
"arguments": {
"query": "API documentation"
}
}
Chameleon Architecture behavior:
- Mode automatically detected from database configuration
- Appropriate model and reranking strategy loaded
- Same search interface works across all modes
- No manual mode specification required
Available Tools
Core Operations
ingest
Ingest documents with automatic mode configuration and model selection.
Parameters:
- path (required): File or directory path to ingest
- mode (optional): "text" or "multimodal" (default: "text")
- model (optional): Embedding model to use
- rerank_strategy (optional): Reranking strategy for multimodal mode
- force_rebuild (optional): Force rebuild of the entire index
Supported Models:
- Text mode:
  - sentence-transformers/all-MiniLM-L6-v2 (384 dimensions, fast)
  - Xenova/all-mpnet-base-v2 (768 dimensions, accurate)
- Multimodal mode:
  - Xenova/clip-vit-base-patch32 (512 dimensions, balanced)
  - Xenova/clip-vit-base-patch16 (512 dimensions, high accuracy)
Supported File Types:
- Text mode: .md, .txt
- Multimodal mode: .md, .txt, .jpg, .jpeg, .png, .gif, .webp
Processing Features:
- Text Mode: Optimized text chunking, sentence-level embeddings, cross-encoder reranking
- Multimodal Mode: Image-to-text conversion, metadata extraction, unified embedding space, content-type aware reranking
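For example, to switch to the more accurate text model and force a full rebuild of an existing index (using only the parameters documented above):

{
  "name": "ingest",
  "arguments": {
    "path": "./documents",
    "mode": "text",
    "model": "Xenova/all-mpnet-base-v2",
    "force_rebuild": true
  }
}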
search
Standard search with automatic mode detection from database. In multimodal mode, image results automatically include base64-encoded image data for display in MCP clients.
Parameters:
- query (required): Search query string
- top_k (optional): Number of results (1-100, default: 10)
- rerank (optional): Enable reranking (default: false)
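Example Request:
{
  "name": "search",
  "arguments": {
    "query": "red car",
    "top_k": 5,
    "rerank": true
  }
}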
Response Format:
{
"query": "red car",
"results_count": 5,
"search_time_ms": 150,
"results": [
{
"rank": 1,
"score": 0.85,
"content_type": "image",
"document": {
"id": 1,
"title": "red-sports-car.jpg",
"source": "./images/red-sports-car.jpg",
"content_type": "image"
},
"text": "a red sports car parked on the street",
"image_data": "/9j/4AAQSkZJRg...", // Base64-encoded image data
"image_format": "base64"
}
]
}
Image Content Retrieval:
- Image results automatically include an image_data field with base64-encoded content
- The image_format field indicates the encoding type (always "base64")
- If image retrieval fails, an image_error field contains the error message
- Text results do not include image data fields
multimodal_search
Enhanced search with content type filtering and multimodal capabilities. Image results automatically include base64-encoded image data for display in MCP clients.
Parameters:
- query (required): Search query string
- top_k (optional): Number of results (1-100, default: 10)
- rerank (optional): Enable reranking (default: false)
- content_type (optional): Filter by content type ("text", "image", "pdf", "docx")
Cross-Modal Search Examples:
- Text query finding images: "red sports car" → returns images of red cars
- Text query with image filter: "architecture diagram" + content_type: "image" → only diagram images
- Mixed results: "installation guide" → returns both text docs and instructional images
Response Format:
Same as search tool, with automatic base64 encoding for image content. See search tool documentation for response structure.
Information and Configuration Tools
get_mode_info
Get current system mode and configuration information.
Returns:
- Current mode (text/multimodal)
- Model information and capabilities
- Reranking strategy configuration
- Supported content types and file formats
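The tool takes no arguments:
{
  "name": "get_mode_info",
  "arguments": {}
}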
list_supported_models
List all supported embedding models with detailed capabilities.
Parameters:
- model_type (optional): Filter by type ("sentence-transformer", "clip")
- content_type (optional): Filter by supported content ("text", "image")
Returns:
- Model specifications and requirements
- Memory requirements and performance characteristics
- Supported content types and capabilities
- Usage recommendations
list_reranking_strategies
List all supported reranking strategies for different modes.
Parameters:
- mode (optional): Filter by mode ("text", "multimodal")
Returns:
- Strategy descriptions and requirements
- Performance impact and accuracy ratings
- Supported content types
- Use case recommendations
get_system_stats
Get comprehensive system statistics with mode-specific metrics.
Parameters:
- include_performance (optional): Include performance metrics
- include_content_breakdown (optional): Include content type breakdown
Returns:
- Database and index status
- Mode-specific configuration
- Content statistics by type
- Performance metrics (if requested)
Image-Specific Tools
ingest_image
Specialized tool for ingesting individual images from local files or URLs. Automatically sets mode to multimodal and uses CLIP embeddings.
Parameters:
- source (required): Image file path or URL
  - Local file: "./images/diagram.jpg"
  - Remote URL: "https://example.com/image.jpg"
- model (optional): CLIP model to use (default: "Xenova/clip-vit-base-patch32")
- rerank_strategy (optional): Reranking strategy (default: "text-derived")
- title (optional): Custom title for the image
- metadata (optional): Additional metadata object
Supported Image Formats:
- JPEG/JPG
- PNG
- GIF
- WebP
Example - Local File:
{
"name": "ingest_image",
"arguments": {
"source": "./docs/images/architecture.png",
"title": "System Architecture Diagram",
"metadata": {
"category": "architecture",
"tags": ["microservices", "cloud", "diagram"]
}
}
}
Example - Remote URL:
{
"name": "ingest_image",
"arguments": {
"source": "https://example.com/product-photo.jpg",
"model": "Xenova/clip-vit-base-patch16",
"metadata": {
"product_id": "12345",
"category": "electronics"
}
}
}
Response:
{
"source": "./docs/images/architecture.png",
"source_type": "file",
"mode": "multimodal",
"model": "Xenova/clip-vit-base-patch32",
"reranking_strategy": "text-derived",
"documents_processed": 1,
"chunks_created": 1,
"embeddings_generated": 1,
"content_type": "image",
"success": true
}
Use Cases:
- Adding individual images to existing collections
- Ingesting images from web URLs
- Building image-only search indexes
- Programmatic image ingestion workflows
Utility Tools
get_stats
Get basic statistics about the search index.
rebuild_index
Rebuild the entire vector index from scratch.
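Both utility tools take no required arguments; for example:
{
  "name": "rebuild_index",
  "arguments": {}
}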
Image Content Retrieval
Automatic Base64 Encoding
The MCP server automatically encodes image content as base64 for display in MCP clients. This feature works seamlessly with both search and multimodal_search tools.
How It Works:
- Search returns results with a contentId for image content
- The MCP server automatically retrieves the image data using the contentId
- The image is encoded as base64 and included in the response
- MCP clients can directly display the image data
Example Response with Image Data:
{
"rank": 1,
"score": 0.92,
"content_type": "image",
"document": {
"id": 5,
"title": "architecture-diagram.png",
"source": "./docs/images/architecture-diagram.png",
"content_type": "image",
"contentId": "a1b2c3d4e5f6..."
},
"text": "system architecture diagram showing microservices",
"image_data": "iVBORw0KGgoAAAANSUhEUgAA...",
"image_format": "base64",
"metadata": {
"width": 1920,
"height": 1080,
"format": "png",
"size": 245760
}
}
Error Handling: If image retrieval fails, the response includes an image_error field instead:
{
"rank": 1,
"score": 0.92,
"content_type": "image",
"document": { ... },
"text": "system architecture diagram",
"image_error": "Content file not found: ./docs/images/missing.png"
}
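A minimal client-side sketch for handling both cases (displayImage is a hypothetical rendering helper, as in the examples below):

// Render images when data is present; otherwise surface the error
results.results.forEach(result => {
  if (result.content_type !== 'image') return;
  if (result.image_data) {
    displayImage(result.image_data, result.document.title); // hypothetical helper
  } else if (result.image_error) {
    console.warn(`Could not load ${result.document.title}: ${result.image_error}`);
  }
});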
Cross-Modal Search Capabilities
Multimodal mode enables true cross-modal search where text queries can find images and vice versa.
Text Query → Image Results:
// Find images using text descriptions
const results = await client.callTool('multimodal_search', {
query: 'red sports car on highway',
content_type: 'image',
top_k: 10
});
// Results include images with base64 data (the results array sits inside the response object)
results.results.forEach(result => {
  console.log(`Found: ${result.document.title}`);
  console.log(`Description: ${result.text}`);
  console.log(`Image data: ${result.image_data.substring(0, 50)}...`);
  // Display image using result.image_data
});
Mixed Content Results:
// Search returns both text and images ranked by relevance
const results = await client.callTool('search', {
query: 'installation instructions',
top_k: 20,
rerank: true
});
// Process different content types
results.results.forEach(result => {
  if (result.content_type === 'image') {
    // Display image with base64 data
    displayImage(result.image_data, result.document.title);
  } else {
    // Display text content
    displayText(result.text, result.document.title);
  }
});
Content Type Filtering:
// Get only images related to query
const imageResults = await client.callTool('multimodal_search', {
query: 'system architecture',
content_type: 'image'
});
// Get only text documents
const textResults = await client.callTool('multimodal_search', {
query: 'system architecture',
content_type: 'text'
});
// Get all content types (default)
const allResults = await client.callTool('multimodal_search', {
query: 'system architecture'
});
Performance Considerations
Base64 Encoding:
- Small images (<1MB): Instant encoding
- Medium images (1-10MB): Optimized streaming encoding
- Large images (>10MB): Streaming with progress tracking
- Timeout: 30 seconds for standard files, 5 minutes for large files
Memory Management:
- Safe buffer handling prevents memory leaks
- Automatic cleanup after encoding
- Batch operations supported for multiple images
Optimization Tips:
- Use the content_type filter to reduce unnecessary image encoding
- Limit top_k when expecting many image results
- Consider image file sizes when designing content structure
- Monitor memory usage with large image collections
Mode Configuration
Text Mode
Optimized for text-only content with high performance and accuracy.
Default Configuration:
- Model: sentence-transformers/all-MiniLM-L6-v2
- Reranking: Cross-encoder
- Content types: Text documents only
- File types: .md, .txt
Use Cases:
- Documentation repositories
- Academic papers and articles
- Code documentation
- Text-heavy content collections
Multimodal Mode
Supports mixed text and image content with advanced reranking strategies.
Default Configuration:
- Model: Xenova/clip-vit-base-patch32
- Reranking: Text-derived (image-to-text + cross-encoder)
- Content types: Text and images
- File types: .md, .txt, .jpg, .jpeg, .png, .gif, .webp
Use Cases:
- Technical documentation with diagrams
- Visual content collections
- Mixed media repositories
- Educational materials with images
Reranking Strategies
Text Mode Strategies
cross-encoder
- Description: Uses cross-encoder models for semantic relevance scoring
- Performance: High computational cost, high accuracy
- Best for: Text documents, academic content, technical documentation
disabled
- Description: No reranking, results ordered by vector similarity only
- Performance: Minimal computational cost, baseline accuracy
- Best for: Fast retrieval, development, simple similarity search
Multimodal Mode Strategies
text-derived (Default)
- Description: Converts images to text descriptions, then applies cross-encoder reranking
- Requirements: Image-to-text model + cross-encoder model
- Performance: High computational cost, high accuracy
- Best for: Mixed content with meaningful images, visual documentation
metadata
- Description: Uses file metadata and filename patterns for scoring
- Requirements: None (file system metadata only)
- Performance: Low computational cost, medium accuracy
- Best for: Fast retrieval, filename-based search, content filtering
hybrid
- Description: Combines semantic and metadata signals with configurable weights
- Requirements: Text-derived reranker + metadata reranker
- Performance: High computational cost, very high accuracy
- Best for: Production systems, complex collections, maximum accuracy
disabled
- Description: No reranking, results ordered by vector similarity only
- Performance: Minimal computational cost, baseline accuracy
- Best for: Maximum performance, simple similarity search
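Strategies are selected with the rerank_strategy parameter at ingestion time. For example, a sketch opting into the hybrid strategy:

{
  "name": "ingest",
  "arguments": {
    "path": "./mixed-content",
    "mode": "multimodal",
    "rerank_strategy": "hybrid"
  }
}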
Model Selection Guide
Performance vs. Accuracy Trade-offs
High Performance (Low Memory)
- Text: sentence-transformers/all-MiniLM-L6-v2 (384 dimensions, ~256MB)
- Multimodal: Xenova/clip-vit-base-patch32 (512 dimensions, ~1GB)
High Accuracy (Higher Memory)
- Text: Xenova/all-mpnet-base-v2 (768 dimensions, ~512MB)
- Multimodal: Xenova/clip-vit-base-patch16 (512 dimensions, ~1.5GB)
Content Type Compatibility
| Model | Text | Images | Batch Size | Memory |
|---|---|---|---|---|
| sentence-transformers/all-MiniLM-L6-v2 | ✅ | ❌ | 32 | 256MB |
| Xenova/all-mpnet-base-v2 | ✅ | ❌ | 16 | 512MB |
| Xenova/clip-vit-base-patch32 | ✅ | ✅ | 8 | 1GB |
| Xenova/clip-vit-base-patch16 | ✅ | ✅ | 4 | 1.5GB |
Error Handling
Common Error Scenarios
Model Mismatch
When the configured model doesn't match the indexed data:
{
"error": "MODEL_MISMATCH",
"message": "Cannot perform search due to model mismatch",
"resolution": {
"action": "manual_intervention_required",
"options": [
"Check if the model mismatch is intentional",
"Run rebuild_index tool to rebuild with new model",
"Verify model configuration matches indexing setup"
]
}
}
Dimension Mismatch
When vector dimensions don't match between model and index:
{
"error": "DIMENSION_MISMATCH",
"message": "Cannot perform search due to vector dimension mismatch",
"resolution": {
"action": "manual_intervention_required",
"options": [
"Check your model configuration",
"Run rebuild_index tool if changing models",
"Ensure consistency between indexing and search models"
]
}
}
Recovery Strategies
- Model Compatibility Issues: Use the rebuild_index tool to regenerate embeddings
- Mode Detection Failures: Check database integrity and reinitialize if needed
- Reranking Failures: System automatically falls back to vector similarity
- Memory Issues: Switch to smaller models or increase available memory
Best Practices
Development Workflow
- Start Simple: Begin with text mode and basic models
- Test Incrementally: Add multimodal capabilities gradually
- Monitor Performance: Use get_system_stats to track resource usage
- Validate Configuration: Use get_mode_info to verify setup
Production Deployment
- Choose Appropriate Mode: Text for documents, multimodal for mixed content
- Select Models Carefully: Balance performance vs. accuracy needs
- Configure Reranking: Enable for better accuracy, disable for speed
- Monitor Resources: Track memory usage and processing times
Content Organization
- Consistent File Types: Keep similar content types together
- Meaningful Filenames: Use descriptive names for metadata-based reranking
- Image Quality: Use clear, relevant images for better text-derived reranking
- Regular Maintenance: Rebuild index when changing models or strategies
Integration Examples
Basic MCP Client Usage
// Initialize MCP client
const client = new MCPClient();
// Ingest multimodal content with automatic configuration storage
await client.callTool('ingest', {
path: './docs',
mode: 'multimodal',
model: 'Xenova/clip-vit-base-patch32',
rerank_strategy: 'text-derived'
});
// Search with automatic mode detection
const results = await client.callTool('search', {
query: 'architecture diagram'
});
// Mode automatically detected from database - no manual specification needed
// Search with content type filtering
const imageResults = await client.callTool('multimodal_search', {
query: 'system architecture',
content_type: 'image',
rerank: true
});
// Get system information and verify Chameleon Architecture behavior
const modeInfo = await client.callTool('get_mode_info', {});
console.log(`Current mode: ${modeInfo.current_mode}`); // "multimodal"
console.log(`Auto-detected model: ${modeInfo.model_name}`); // "Xenova/clip-vit-base-patch32"
Chameleon Architecture Demonstration
// Step 1: Set up text mode
await client.callTool('ingest', {
path: './text-docs',
mode: 'text',
model: 'sentence-transformers/all-MiniLM-L6-v2'
});
// Step 2: Search automatically uses text mode
const textResults = await client.callTool('search', {
query: 'installation guide'
});
// Uses sentence transformer + cross-encoder reranking
// Step 3: Switch to multimodal mode
await client.callTool('ingest', {
path: './mixed-content',
mode: 'multimodal',
model: 'Xenova/clip-vit-base-patch32',
rerank_strategy: 'text-derived'
});
// Step 4: Same search interface, different behavior
const multimodalResults = await client.callTool('search', {
query: 'installation guide'
});
// Now uses CLIP + text-derived reranking, can find images too
Configuration Management
// Check all supported models with detailed specifications
const models = await client.callTool('list_supported_models', {});
models.forEach(model => {
console.log(`${model.name}: ${model.dimensions}D, ${model.supported_content_types.join('+')}`);
});
// Compare reranking strategies for multimodal mode
const strategies = await client.callTool('list_reranking_strategies', {
mode: 'multimodal'
});
strategies.forEach(strategy => {
console.log(`${strategy.name}: ${strategy.description} (${strategy.performance_impact})`);
});
// Monitor system performance and content breakdown
const stats = await client.callTool('get_system_stats', {
include_performance: true,
include_content_breakdown: true
});
console.log(`Content breakdown:`, stats.content_breakdown);
console.log(`Performance metrics:`, stats.performance_metrics);
Error Handling and Recovery
try {
// Attempt operation that might fail
await client.callTool('ingest', {
path: './docs',
model: 'unsupported-model'
});
} catch (error) {
if (error.code === 'MODEL_VALIDATION_ERROR') {
// Get supported alternatives
const models = await client.callTool('list_supported_models', {
content_type: 'text'
});
console.log('Supported models:', models.map(m => m.name));
}
}
// Handle model mismatch scenarios
try {
const results = await client.callTool('search', { query: 'test' });
} catch (error) {
if (error.code === 'DIMENSION_MISMATCH') {
// Rebuild index with correct model
await client.callTool('rebuild_index', {});
// Retry search
const results = await client.callTool('search', { query: 'test' });
}
}
Complete Cross-Modal Search Example
This example demonstrates the full workflow of multimodal ingestion and cross-modal search with image retrieval:
// Step 1: Ingest mixed content (text + images)
console.log('Ingesting multimodal content...');
await client.callTool('ingest', {
path: './product-catalog',
mode: 'multimodal',
model: 'Xenova/clip-vit-base-patch32',
rerank_strategy: 'text-derived'
});
// Step 2: Ingest additional images from URLs
console.log('Adding product images from URLs...');
await client.callTool('ingest_image', {
source: 'https://example.com/products/laptop.jpg',
metadata: {
product: 'laptop',
category: 'electronics',
price: 999
}
});
// Step 3: Verify multimodal mode is active
const modeInfo = await client.callTool('get_mode_info', {});
console.log(`Mode: ${modeInfo.current_mode}`); // "multimodal"
console.log(`Model: ${modeInfo.model_name}`); // "Xenova/clip-vit-base-patch32"
console.log(`Content types: ${modeInfo.supported_content_types.join(', ')}`); // "text, image"
// Step 4: Cross-modal search - Text query finding images
console.log('\n=== Text Query → Image Results ===');
const imageResults = await client.callTool('multimodal_search', {
query: 'silver laptop with backlit keyboard',
content_type: 'image',
top_k: 5,
rerank: true
});
console.log(`Found ${imageResults.results_count} images`);
imageResults.results.forEach(result => {
console.log(`\n${result.rank}. ${result.document.title} (score: ${result.score})`);
console.log(` Description: ${result.text}`);
console.log(` Image data: ${result.image_data ? 'Available' : 'Not available'}`);
if (result.image_data) {
// Display image in MCP client
displayBase64Image(result.image_data, result.document.title);
}
if (result.metadata) {
console.log(` Metadata:`, result.metadata);
}
});
// Step 5: Mixed content search - Both text and images
console.log('\n=== Mixed Content Search ===');
const mixedResults = await client.callTool('search', {
query: 'laptop specifications and photos',
top_k: 10,
rerank: true
});
console.log(`Found ${mixedResults.results_count} results (mixed content)`);
mixedResults.results.forEach(result => {
const contentIcon = result.content_type === 'image' ? '🖼️' : '📄';
console.log(`\n${contentIcon} ${result.rank}. ${result.document.title}`);
console.log(` Type: ${result.content_type}`);
console.log(` Score: ${result.score}`);
if (result.content_type === 'image' && result.image_data) {
console.log(` Image: ${result.image_data.length} base64 chars`);
displayBase64Image(result.image_data, result.document.title);
} else {
console.log(` Text: ${result.text.substring(0, 100)}...`);
}
});
// Step 6: Content type filtering
console.log('\n=== Content Type Filtering ===');
// Only text documents
const textOnly = await client.callTool('multimodal_search', {
query: 'laptop',
content_type: 'text',
top_k: 5
});
console.log(`Text documents: ${textOnly.results_count}`);
// Only images
const imagesOnly = await client.callTool('multimodal_search', {
query: 'laptop',
content_type: 'image',
top_k: 5
});
console.log(`Images: ${imagesOnly.results_count}`);
// Step 7: Get system statistics
const stats = await client.callTool('get_system_stats', {
include_content_breakdown: true
});
console.log('\n=== Content Statistics ===');
console.log(`Total documents: ${stats.total_documents}`);
console.log(`Total chunks: ${stats.total_chunks}`);
console.log('Content breakdown:', stats.content_breakdown);
// Helper function to display base64 images
function displayBase64Image(base64Data, title) {
// In a real MCP client, this would render the image
// For example, in a web-based client:
// const img = document.createElement('img');
// img.src = `data:image/jpeg;base64,${base64Data}`;
// img.alt = title;
// container.appendChild(img);
console.log(` [Image would be displayed here: ${title}]`);
}
Expected Output:
Ingesting multimodal content...
✓ Ingested 50 documents (30 text, 20 images)
Adding product images from URLs...
✓ Image ingested successfully
Mode: multimodal
Model: Xenova/clip-vit-base-patch32
Content types: text, image
=== Text Query → Image Results ===
Found 5 images
1. laptop-silver-backlit.jpg (score: 0.89)
Description: silver laptop with illuminated keyboard on desk
Image data: Available
Metadata: { product: 'laptop', category: 'electronics' }
2. macbook-pro-keyboard.jpg (score: 0.85)
Description: close-up of laptop keyboard with backlight
Image data: Available
=== Mixed Content Search ===
Found 10 results (mixed content)
📄 1. laptop-specs.md
Type: text
Score: 0.92
Text: Technical specifications for our laptop lineup including processor, RAM, storage...
🖼️ 2. laptop-front-view.jpg
Type: image
Score: 0.88
Image: 15234 base64 chars
=== Content Type Filtering ===
Text documents: 5
Images: 5
=== Content Statistics ===
Total documents: 51
Total chunks: 75
Content breakdown: { text: 30, image: 21 }
Running Multiple MCP Server Instances
Overview
You can run multiple MCP server instances simultaneously, each connected to a different database. This is useful for:
- Separating text-only and multimodal content
- Managing different projects or knowledge bases
- Isolating different content domains (docs, code, images, etc.)
Configuration Example
Configure multiple servers in your MCP client (e.g., Claude Desktop, Cline):
{
"mcpServers": {
"rag-lite-text-docs": {
"command": "npx",
"args": ["rag-lite-mcp"],
"env": {
"RAG_DB_FILE": "./text-docs/db.sqlite",
"RAG_INDEX_FILE": "./text-docs/index.bin"
}
},
"rag-lite-multimodal-images": {
"command": "npx",
"args": ["rag-lite-mcp"],
"env": {
"RAG_DB_FILE": "./mixed-content/db.sqlite",
"RAG_INDEX_FILE": "./mixed-content/index.bin"
}
},
"rag-lite-project-a": {
"command": "npx",
"args": ["rag-lite-mcp"],
"env": {
"RAG_DB_FILE": "./projects/project-a/db.sqlite",
"RAG_INDEX_FILE": "./projects/project-a/index.bin"
}
}
}
}
How It Works
Each server instance:
- Runs independently with its own database and index
- Detects its mode (text/multimodal) from the database
- Advertises its capabilities through dynamic tool descriptions
- Maintains its own configuration and state
The MCP client sees:
Available tools:
rag-lite-text-docs::search
[TEXT MODE] Search indexed documents using semantic similarity.
This database contains 150 text documents. Supports .md and .txt files only.
rag-lite-multimodal-images::search
[MULTIMODAL MODE] Search indexed documents using semantic similarity.
This database contains 200 documents. Contains both text and image content.
Image results include base64-encoded data for display.
Supports cross-modal search (text queries can find images).
rag-lite-project-a::search
[TEXT MODE] Search indexed documents using semantic similarity.
This database contains 75 text documents. Supports .md and .txt files only.
Intelligent Routing
With dynamic tool descriptions, AI assistants can intelligently choose which server to query:
User query: "Find images of architecture diagrams"
- AI sees [MULTIMODAL MODE] in the rag-lite-multimodal-images::search description
- AI sees "Contains both text and image content"
- AI automatically calls rag-lite-multimodal-images::search
User query: "Search the API documentation"
- AI sees [TEXT MODE] in the rag-lite-text-docs::search description
- AI sees "150 text documents"
- AI automatically calls rag-lite-text-docs::search
Best Practices
- Use descriptive server names that indicate the content type or domain
- Separate by content type - text-only vs multimodal databases
- Separate by domain - docs, code, images, projects, etc.
- Monitor resource usage - each server instance uses memory
- Use consistent paths - organize databases in a clear directory structure
Example: Multi-Project Setup
{
"mcpServers": {
"docs-text": {
"command": "npx",
"args": ["rag-lite-mcp"],
"env": {
"RAG_DB_FILE": "./knowledge/docs/db.sqlite",
"RAG_INDEX_FILE": "./knowledge/docs/index.bin"
}
},
"images-multimodal": {
"command": "npx",
"args": ["rag-lite-mcp"],
"env": {
"RAG_DB_FILE": "./knowledge/images/db.sqlite",
"RAG_INDEX_FILE": "./knowledge/images/index.bin"
}
},
"code-examples": {
"command": "npx",
"args": ["rag-lite-mcp"],
"env": {
"RAG_DB_FILE": "./knowledge/code/db.sqlite",
"RAG_INDEX_FILE": "./knowledge/code/index.bin"
}
}
}
}
Limitations
- No automatic federation: Each query goes to one server at a time
- No cross-database search: Results are not automatically merged across servers
- Manual orchestration: AI assistant handles routing, not the MCP servers
- Resource usage: Each server instance uses memory and CPU
Advanced: Cross-Database Search
If you need to search across multiple databases, you can:
- Sequential queries: Call multiple servers and combine results in your code
- Custom orchestration: Build a wrapper that queries all servers and merges results
- Unified database: Ingest all content into a single multimodal database
Example of sequential queries:
// Query each server (the MCP client namespaces tool names per server)
const textResults = await client.callTool('rag-lite-text-docs::search', {
  query: 'authentication'
});
const imageResults = await client.callTool('rag-lite-multimodal-images::search', {
  query: 'authentication'
});
// Combine and present results
console.log('Text results:', textResults.results_count);
console.log('Image results:', imageResults.results_count);
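A minimal merge sketch for the custom-orchestration option, assuming the scores from both servers are comparable (they may not be when the servers use different embedding models):

// Combine both result sets and re-sort by score (descending)
const merged = [...textResults.results, ...imageResults.results]
  .sort((a, b) => b.score - a.score)
  .slice(0, 10);
merged.forEach(result => {
  console.log(`${result.score.toFixed(2)} [${result.content_type}] ${result.document.title}`);
});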
Troubleshooting
Performance Issues
- Use smaller models for faster processing
- Disable reranking for maximum speed
- Reduce batch sizes for memory-constrained environments
- Limit top_k when expecting many image results to reduce base64 encoding overhead
Accuracy Issues
- Enable reranking for better results
- Use larger, more accurate models
- Try hybrid reranking strategy for multimodal content
- Ensure images are clear and relevant for better text-derived reranking
Memory Issues
- Switch to smaller models (MiniLM vs. MPNet, patch32 vs. patch16)
- Reduce batch sizes in model configuration
- Monitor memory usage with system stats
- Be mindful of large image files (>10MB) which use streaming encoding
Content Issues
- Verify file types are supported for the selected mode
- Check image formats for multimodal mode (jpg, png, gif, webp)
- Ensure text encoding is UTF-8 for text content
- Validate that image URLs are accessible when using ingest_image with remote sources
Image Retrieval Issues
Problem: Image results missing image_data field
- Cause: Content file may have been moved or deleted
- Solution: Check the image_error field for details; re-ingest if needed
Problem: Base64 encoding timeout
- Cause: Very large image files (>50MB)
- Solution: Resize images before ingestion, or increase timeout settings
Problem: Image quality issues in MCP client
- Cause: Base64 encoding/decoding issues
- Solution: Verify base64 data integrity, check MCP client image rendering
Problem: Cross-modal search not finding relevant images
- Cause: CLIP model limitations or poor image quality
- Solution:
- Use more descriptive text queries
- Try different CLIP models (patch16 for higher accuracy)
- Enable reranking with text-derived strategy
- Ensure images are clear and well-lit
Mode Detection Issues
Problem: Search using wrong mode after ingestion
- Cause: Database mode configuration not updated
- Solution: Use get_mode_info to verify the mode; re-ingest with the correct mode if needed
Problem: Cannot switch between modes
- Cause: Mode is stored in database and persists
- Solution: Use force_rebuild: true when ingesting with the new mode to rebuild the index
For additional support and advanced configuration options, refer to the main RAG-lite documentation and the Chameleon Architecture specification.