Local Search Engine

A fast, local search engine built in Rust with vector embeddings and SQLite storage.

Features

  • 🔍 Full-Text + Semantic Search using embeddings generated and stored locally
  • 📁 Local file indexing and search
  • 🗄️ SQLite-based storage
  • 📚 Both library and CLI interfaces

Installation Guide

Quick Install

Linux/macOS

curl -sSL https://raw.githubusercontent.com/nnanto/localsearch/main/scripts/install.sh | bash

Windows (PowerShell)

irm https://raw.githubusercontent.com/nnanto/localsearch/main/scripts/install.ps1 | iex

Manual Installation

Pre-built Binaries

Download the appropriate binary for your platform from the latest release:

Linux (x86_64)

curl -L https://github.com/nnanto/localsearch/releases/latest/download/localsearch-linux-x86_64.tar.gz | tar xz
sudo mv localsearch /usr/local/bin/

macOS (Intel)

curl -L https://github.com/nnanto/localsearch/releases/latest/download/localsearch-macos-x86_64.tar.gz | tar xz
sudo mv localsearch /usr/local/bin/

macOS (Apple Silicon)

curl -L https://github.com/nnanto/localsearch/releases/latest/download/localsearch-macos-aarch64.tar.gz | tar xz
sudo mv localsearch /usr/local/bin/

Windows

  1. Download localsearch-windows-x86_64.zip
  2. Extract the ZIP file
  3. Add the extracted directory to your PATH environment variable

From Source

If you have Rust installed, you can build from source:

cargo install --git https://github.com/nnanto/localsearch --features cli

Or clone and build:

git clone https://github.com/nnanto/localsearch.git
cd localsearch
cargo build --release --features cli
sudo cp target/release/localsearch /usr/local/bin/

Verify Installation

After installation, verify that the tool is working:

localsearch --help

You should see the help output for the localsearch CLI tool.

Updating

To update to the latest version, simply re-run the installation command. The installer will replace the existing binary with the latest version.

Uninstallation

Linux/macOS

sudo rm /usr/local/bin/localsearch

Windows

Remove the installation directory and update your PATH environment variable to remove the localsearch directory.

Troubleshooting

Permission Issues

If you get permission errors on Linux/macOS, make sure you're running the installation with appropriate permissions (using sudo when needed).

Path Issues

If the localsearch command is not found after installation, make sure the installation directory is in your PATH:

  • Linux/macOS: /usr/local/bin should be in your PATH
  • Windows: The installation directory should be added to your PATH environment variable

Antivirus False Positives

Some antivirus software may flag the binary as suspicious. This is a common issue with Rust binaries. You may need to add an exception for the localsearch binary.

Usage

CLI Usage

Basic Commands

# Index documents (uses system default directories)
localsearch index /path/to/documents

# Search for content
localsearch search "your query here"

Directory Configuration

By default, localsearch uses system-appropriate directories:

  • Cache: Model files are stored in the system cache directory (e.g., ~/.cache on Linux, ~/Library/Caches on macOS)
  • Database: SQLite database is stored in the application data directory (e.g., ~/.local/share on Linux, ~/Library/Application Support on macOS)

You can override these defaults:

# Use custom database location
localsearch index /path/to/documents --db /custom/path/to/database.db

# Use custom cache directory for embeddings
localsearch index /path/to/documents --cache-dir /custom/cache/path

# Use both custom paths
localsearch index /path/to/documents --db /custom/db.db --cache-dir /custom/cache

# Search with custom paths
localsearch search "query" --db /custom/db.db --cache-dir /custom/cache

Using Local ONNX Models (CLI)

You can use your own local ONNX embedding models instead of the default pre-built models:

# Index with local model
localsearch index /path/to/documents \
  --local-model-path /path/to/your/model.onnx \
  --tokenizer-dir /path/to/tokenizer/directory \
  --max-tokens 512

# Search with local model
localsearch search "your query" \
  --local-model-path /path/to/your/model.onnx \
  --tokenizer-dir /path/to/tokenizer/directory \
  --max-tokens 512

Required for local models:

  • --local-model-path: Path to your ONNX model file
  • --tokenizer-dir: Directory containing:
    • tokenizer.json
    • config.json
    • special_tokens_map.json
    • tokenizer_config.json
  • --max-tokens: (Optional) Maximum number of tokens (default: 512)

File Types

# Index JSON files (default)
localsearch index data.json --file-type json

# Index text files
localsearch index /path/to/text/files --file-type text

Search Options

# Different search types
localsearch search "query" --search-type semantic
localsearch search "query" --search-type fulltext  
localsearch search "query" --search-type hybrid    # default

# Limit results
localsearch search "query" --limit 5

# Pretty output
localsearch search "query" --pretty

Path Filtering

Filter search results to only include documents whose paths contain specific patterns:

# Search only in documents with "src" in the path
localsearch search "function" --path-filter "src"

# Search in multiple path patterns (OR logic)
localsearch search "test" --path-filter "src,test,doc"

# Mix different types of patterns
localsearch search "config" --path-filter "settings,config.json,main"

# Case-insensitive substring matching
localsearch search "data" --path-filter "API,database,models"

# Use with other options
localsearch search "error" --path-filter "src,lib" --search-type semantic --pretty

How Path Filtering Works:

  • Uses case-insensitive substring matching
  • Multiple patterns are separated by commas
  • Results match if the path contains ANY of the specified patterns (OR logic)
  • Examples:
    • "src" matches: src/main.rs, my_src_file.txt, project/src/lib.rs
    • "src,test" matches: src/main.rs, tests/unit.rs, src_backup.txt
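The matching rule above can be sketched in Rust. This is an illustrative re-implementation of the documented behavior only, not the crate's actual code; `path_matches` is a hypothetical helper:

```rust
/// Hypothetical helper mirroring the documented rule:
/// a path matches if it contains ANY pattern, case-insensitively.
fn path_matches(path: &str, patterns: &[&str]) -> bool {
    let path_lower = path.to_lowercase();
    patterns
        .iter()
        .any(|p| path_lower.contains(&p.to_lowercase()))
}

fn main() {
    // Mirrors the examples above.
    assert!(path_matches("src/main.rs", &["src"]));
    assert!(path_matches("my_src_file.txt", &["src"]));
    assert!(path_matches("tests/unit.rs", &["src", "test"]));
    // No pattern is a substring of this path, so it is filtered out.
    assert!(!path_matches("docs/readme.md", &["src", "test"]));
}
```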

Library Usage

use localsearch::{SqliteLocalSearchEngine, LocalEmbedder, DocumentIndexer, LocalSearch, SearchType, DocumentRequest, LocalSearchDirs};

fn main() -> anyhow::Result<()> {
    // Option 1: Use default system directories
    let dirs = LocalSearchDirs::new();
    let db_path = dirs.default_db_path();
    let embedder = LocalEmbedder::new_with_default_model()?;
    
    // Option 2: Use custom cache directory
    // let custom_cache = std::path::PathBuf::from("/custom/cache");
    // let embedder = LocalEmbedder::new_with_cache_dir(custom_cache)?;
    
    // Option 3: Use your own local ONNX model and tokenizer
    // let onnx_path = std::path::PathBuf::from("/path/to/your/model.onnx");
    // let tokenizer_dir = std::path::PathBuf::from("/path/to/tokenizer/files");
    // let embedder = LocalEmbedder::new_with_local_model(onnx_path, tokenizer_dir, Some(512))?;
    
    let mut engine = SqliteLocalSearchEngine::new(&db_path.to_string_lossy(), Some(embedder))?;

    // Index a document
    engine.insert_document(DocumentRequest {
        path: "some/unique/path".to_string(),
        content: "This is example content".to_string(),
        metadata: None,
    })?;

    // Search
    let results = engine.search("example", SearchType::Hybrid, Some(10), None)?;

    // Search with path filters (multiple patterns supported)
    let filters = vec!["src".to_string(), "test".to_string()];
    let filtered_results = engine.search("example", SearchType::Hybrid, Some(10), Some(&filters))?;

    Ok(())
}

Path Filtering in Library

use localsearch::{SqliteLocalSearchEngine, LocalEmbedder, LocalSearch, SearchType};

fn search_examples(engine: &SqliteLocalSearchEngine) -> anyhow::Result<()> {
    // Search all documents
    let all_results = engine.search("rust programming", SearchType::Hybrid, Some(10), None)?;
    
    // Search only in source files
    let src_filter = vec!["src".to_string()];
    let src_results = engine.search("function", SearchType::Semantic, Some(5), Some(&src_filter))?;
    
    // Search in multiple path patterns
    let multi_filters = vec!["src".to_string(), "test".to_string(), "doc".to_string()];
    let filtered_results = engine.search(
        "example code", 
        SearchType::Hybrid, 
        Some(10), 
        Some(&multi_filters)
    )?;
    
    // Search with specific file patterns
    let file_filters = vec!["main.rs".to_string(), "lib.rs".to_string()];
    let file_results = engine.search(
        "implementation", 
        SearchType::FullText, 
        Some(3), 
        Some(&file_filters)
    )?;
    
    Ok(())
}

Using Local ONNX Models (Library)

You can use your own local ONNX embedding models in library code as well:

use localsearch::LocalEmbedder;
use std::path::PathBuf;

// Method 1: Using a tokenizer directory
// Your tokenizer directory should contain:
// - tokenizer.json
// - config.json
// - special_tokens_map.json
// - tokenizer_config.json
let onnx_path = PathBuf::from("/path/to/your/model.onnx");
let tokenizer_dir = PathBuf::from("/path/to/tokenizer/directory");
let embedder = LocalEmbedder::new_with_local_model(onnx_path, tokenizer_dir, Some(512))?;

// Method 2: Using individual file paths
let embedder = LocalEmbedder::new_with_local_files(
    PathBuf::from("/path/to/model.onnx"),
    PathBuf::from("/path/to/tokenizer.json"),
    PathBuf::from("/path/to/config.json"),
    PathBuf::from("/path/to/special_tokens_map.json"),
    PathBuf::from("/path/to/tokenizer_config.json"),
    Some(512) // max_length
)?;

Required Files for Local Models:

  1. ONNX Model File: Your embedding model in ONNX format (.onnx)
  2. Tokenizer Files: Four JSON files typically found with transformer models:
    • tokenizer.json - Main tokenizer configuration
    • config.json - Model configuration
    • special_tokens_map.json - Special token mappings
    • tokenizer_config.json - Tokenizer-specific configuration

These files are commonly found in HuggingFace model repositories or can be exported when converting models to ONNX format.
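Before pointing the CLI or library at a local model, it can help to sanity-check that the tokenizer directory has all four files. The following shell sketch is illustrative only (the ./my-tokenizer path and the check itself are not part of localsearch; the placeholder files are created just so the demo runs):

```shell
# Hypothetical pre-flight check, not a localsearch command.
TOKENIZER_DIR="./my-tokenizer"   # hypothetical path, for demonstration

# Demo setup only: create empty placeholder files so the check passes.
mkdir -p "$TOKENIZER_DIR"
for f in tokenizer.json config.json special_tokens_map.json tokenizer_config.json; do
  touch "$TOKENIZER_DIR/$f"
done

# Verify every file --tokenizer-dir expects is present.
missing=0
for f in tokenizer.json config.json special_tokens_map.json tokenizer_config.json; do
  if [ ! -f "$TOKENIZER_DIR/$f" ]; then
    echo "missing: $TOKENIZER_DIR/$f"
    missing=1
  fi
done
if [ "$missing" -eq 0 ]; then
  echo "tokenizer directory looks complete"
fi
```

With a real model, replace the demo setup with the files exported alongside your ONNX conversion.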

Development

# Clone the repository
git clone https://github.com/nnanto/localsearch.git
cd localsearch

# Run tests
cargo test

# Run CLI with features
cargo run --features cli -- search "query"

License

MIT License - see LICENSE file for details.
