HypEmbed

Local-first text-embedding inference in Rust

HypEmbed loads local BERT-family weights, tokenizes text, runs the full encoder stack in Rust, and returns vectors ready for search, retrieval, and ranking pipelines.

Install

cargo add hypembed

Why HypEmbed

  • Pure-Rust inference from tokenizer to pooling layer
  • No Python, ONNX Runtime, libtorch, or hosted API dependency
  • Supports BERT-style encoder models such as MiniLM and DistilBERT
  • Stable numerics, typed errors, and a compact public API

Current Scope

  • Load local `config.json`, `vocab.txt`, and `model.safetensors`
  • Mean pooling and CLS pooling
  • F32, F16, and BF16 weights converted to `f32` for inference
  • CPU-only execution today
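The Mean pooling option listed above averages the encoder's per-token vectors into a single sentence vector. A standalone sketch of that computation (the `mean_pool` function here is illustrative, not part of the HypEmbed API):

```rust
/// Average per-token vectors into one fixed-size sentence vector.
/// `token_vecs` holds one f32 vector per token; all must share a dimension.
fn mean_pool(token_vecs: &[Vec<f32>]) -> Vec<f32> {
    let dim = token_vecs[0].len();
    let mut out = vec![0.0f32; dim];
    // Sum element-wise across tokens.
    for v in token_vecs {
        for (o, x) in out.iter_mut().zip(v) {
            *o += x;
        }
    }
    // Divide by the token count to get the mean.
    let n = token_vecs.len() as f32;
    out.iter_mut().for_each(|o| *o /= n);
    out
}
```

CLS pooling, by contrast, simply takes the first token's vector and skips the averaging step.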
Quickstart

```rust
use hypembed::{Embedder, EmbeddingOptions, PoolingStrategy};

// Load config.json, vocab.txt, and model.safetensors from a local directory.
let model = Embedder::load("./model")?;

// Mean-pool token vectors and L2-normalize the result.
let options = EmbeddingOptions::default()
    .with_pooling(PoolingStrategy::Mean)
    .with_normalize(true);

// Returns one embedding vector per input string.
let embeddings = model.embed(
    &["hello world", "rust embeddings"],
    &options,
)?;
```
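With `with_normalize(true)`, embeddings come back unit-length, so similarity ranking reduces to a dot product. A hypothetical downstream helper (this `cosine` function is not part of HypEmbed; it assumes each embedding is an `f32` slice):

```rust
/// Cosine similarity between two embedding vectors.
/// For unit-length (normalized) vectors this equals the plain dot product.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}
```

Scores near 1.0 indicate near-identical direction; near 0.0, unrelated texts.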