Embedding Module¶
A local vector computation engine that runs without heavyweight external services, offering a simpler, more approachable path to vectorization.
Module Overview and Core Components¶
The pkg/embedding module is designed for dense vectorization (embedding) of text and images. Its biggest highlight is a pure-Go inference implementation (based on ONNX Runtime), which removes the dependency on a heavyweight local Ollama server or a Python (Transformers) environment. Compile a single binary, and your system has a high-performance offline vectorization engine.
Core Classes/Interfaces of the Module:
- `embedding.Provider`: the base interface representing text-embedding capability.
- `embedding.MultimodalProvider`: extends `Provider` with the ability to process images and multimodal content.
- `embedding.BatchProcessor`: a high-concurrency processing factory that provides hash caching and concurrent batching for large volumes of text.
- `embedding.Downloader`: a built-in utility that downloads sharded model files directly from HuggingFace's distribution network and loads them locally.
Ready-to-Use Embedding Providers¶
GoChat provides simple factory functions for loading the industry's mainstream high-quality open-source vector models. If a model is not present on your local machine, the framework can pull it automatically.
1. BGEProvider¶
Description: BGE (BAAI General Embedding) is a high-performance multilingual encoder developed by BAAI, designed for RAG retrieval scenarios.
- Features and Advantages: The BGE series delivers outstanding multilingual performance and is deeply optimized for asymmetric retrieval (a short query searching long documents) in RAG pipelines.
- Applicable Scenarios: Chinese-language applications and knowledge-base systems that require strong Chinese semantic understanding.
- How to Use:
```go
// Load the BGE-small model directly (auto-downloads to the default
// directory ~/.embedding if it is not present locally).
provider, err := embedding.WithBEG("bge-small-zh-v1.5", "")
if err != nil {
	panic(err)
}
vectors, err := provider.Embed(context.Background(), []string{"Natural Language Processing"})
if err != nil {
	panic(err)
}
fmt.Printf("BGE vector dimension: %d, extracted feature count: %d\n", provider.Dimension(), len(vectors[0]))
// Output: BGE vector dimension: 512, extracted feature count: 512
```
2. SentenceBERTProvider¶
- Features and Advantages: Based on the classic Sentence-BERT architecture (e.g. all-MiniLM-L6-v2), it targets symmetric semantic textual similarity (STS) matching. The model is tiny (about 45MB), and loading and computation are extremely fast.
- Applicable Scenarios: English-language applications, basic sentence cosine-similarity matching, and edge devices with strict memory and performance constraints.
- How to Use:
```go
// Load the Sentence-BERT English model directly.
provider, err := embedding.WithBERT("all-MiniLM-L6-v2", "")
if err != nil {
	panic(err)
}
vectors, err := provider.Embed(context.Background(), []string{"Machine learning is fascinating."})
if err != nil {
	panic(err)
}
fmt.Printf("SentenceBERT vector dimension: %d\n", len(vectors[0]))
// Output: SentenceBERT vector dimension: 384
```
Powerful Tool for Massive Data: Batch Processing and Progress Tracking¶
When you need to process a PDF knowledge base containing tens of thousands of text chunks, calling provider.Embed directly can easily cause out-of-memory (OOM) errors and wasted computation.
BatchProcessor is designed precisely for industrial-scale data volumes:
- Automatic Concurrent Batching: `MaxBatchSize` controls the array size fed to the ONNX engine in each call, and `MaxConcurrent` fully utilizes multi-core CPU power.
- LRU Hash Cache: automatically skips text chunks with identical content, avoiding expensive recomputation of the underlying tensors.
- Atomic Progress Tracking with Cancellation: provides callbacks for rendering progress in a UI.
Complete Example: High-Concurrency Knowledge Base Import¶
```go
package main

import (
	"context"
	"fmt"

	"github.com/DotNetAge/gochat/pkg/embedding"
)

func main() {
	ctx := context.Background()

	// 1. Initialize the Provider
	provider, err := embedding.WithBEG("bge-small-zh-v1.5", "")
	if err != nil {
		panic(err)
	}

	// 2. Initialize the BatchProcessor
	batchProcessor := embedding.NewBatchProcessor(provider, embedding.BatchOptions{
		MaxBatchSize:  32, // 32 text chunks per batch
		MaxConcurrent: 4,  // Up to 4 concurrent inference workers
	})

	// Simulate tens of thousands of short texts split from a database or PDF
	texts := []string{
		"Go is a high-performance concurrent programming language.",
		"What is text vectorization (Embedding)?",
		// ... (10,000 data entries omitted here)
		"This is the last data entry.",
	}

	// 3. Run the computation with progress tracking
	embeddings, err := batchProcessor.ProcessWithProgress(ctx, texts, func(current, total int, err error) bool {
		if err != nil {
			fmt.Printf("Processing failed: %v\n", err)
			return false // Return false to terminate the whole batch early
		}
		fmt.Printf("\rExtracting features... progress: %d / %d (%.1f%%)", current, total, float64(current)/float64(total)*100)
		return true
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("\nSuccessfully generated %d embeddings!\n", len(embeddings))
}
```
Multimodal Support: Image-Text Vectorization (CLIP)¶
CLIPProvider is the most special and powerful member of GoChat's Embedding module. It can not only process text, but also directly understand image pixels and project image-text features into the same vector space. This means you can use text vectors to directly calculate cosine distance with image vectors!
- Features and Advantages: Bridges the text and vision domains in one step. Includes a built-in image preprocessor based on Go's standard image library (resize, center crop, normalize).
- Applicable Scenarios: Building text-to-image search over an image library; zero-shot visual classification systems.
CLIP Quick Start Example¶
```go
package main

import (
	"context"
	"fmt"
	"os"

	"github.com/DotNetAge/gochat/pkg/embedding"
)

func main() {
	// Pull the CLIP model
	provider, err := embedding.WithCLIP("clip-vit-base-patch32", "")
	if err != nil {
		panic(err)
	}
	ctx := context.Background()

	// 1. Text encoding (using a query as the example)
	textVectors, err := provider.Embed(ctx, []string{"A photo of a cute cat"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("Text vector generated, dimension: %d\n", len(textVectors[0]))

	// 2. Image encoding (using a local image file as the example)
	imgData, err := os.ReadFile("cute_cat.jpg")
	if err != nil {
		panic(err)
	}
	imageVectors, err := provider.EmbedImages(ctx, [][]byte{imgData})
	if err != nil {
		panic(err)
	}
	fmt.Printf("Image vector generated, dimension: %d\n", len(imageVectors[0]))

	// You can now compute a similarity score between textVectors[0] and imageVectors[0]
}
```
Extension Development Guide: How to Integrate Custom Local Models¶
The AI community moves fast. If you download a new or fine-tuned ONNX embedding model, GoChat gives you a path to seamless integration.
Standard Implementation Process¶
1. Implement the `EmbeddingModel` interface: provide model loading and lifecycle management (`Run()` and `Close()`).
2. Inject the model into a base Provider: use `embedding.New(Config{})` to wrap your custom model into a standard `LocalProvider` with concurrency locks and a built-in Tokenizer.
Code Skeleton and Integration Example¶
This example demonstrates how to create a custom model driver and register it into the system's high-performance batch processing architecture:
```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/DotNetAge/gochat/pkg/embedding"
)

// 1. Implement the underlying EmbeddingModel interface
type CustomONNXModel struct {
	dimension int
	// onnxSession *onnxruntime.Session // Hold the real inference session in an actual project
}

func NewCustomONNXModel(modelPath string, dim int) (*CustomONNXModel, error) {
	// Load your .onnx weight file here...
	return &CustomONNXModel{dimension: dim}, nil
}

// Core inference logic: takes the input_ids produced by the Tokenizer
// and returns the final feature tensor.
func (m *CustomONNXModel) Run(inputs map[string]interface{}) (map[string]interface{}, error) {
	inputIDs, ok := inputs["input_ids"].([][]int64)
	if !ok {
		return nil, fmt.Errorf("invalid input tensor")
	}
	batchSize := len(inputIDs)
	// Execute ONNX Runtime inference...
	// Suppose the inference output matrix is the embeddings
	embeddings := make([][]float32, batchSize)
	for i := 0; i < batchSize; i++ {
		embeddings[i] = make([]float32, m.dimension) // Mock
	}
	// Must return a map containing a "last_hidden_state" or "embeddings" key
	return map[string]interface{}{
		"last_hidden_state": embeddings,
	}, nil
}

func (m *CustomONNXModel) Close() error {
	// Release CGo or system memory
	return nil
}

func main() {
	ctx := context.Background()

	// 2. Instantiate the custom model engine
	model, err := NewCustomONNXModel("path/to/custom_finetuned.onnx", 768)
	if err != nil {
		log.Fatalf("Failed to load model: %v", err)
	}
	defer model.Close()

	// 3. Wrap the custom engine as a generic Provider
	provider, err := embedding.New(embedding.Config{
		Model:        model, // Inject your engine
		Dimension:    768,   // Declare the output dimension
		MaxBatchSize: 32,    // Declare the max tensor batch size
	})
	if err != nil {
		log.Fatalf("Failed to create Provider: %v", err)
	}

	// 4. Done! Feed it to a BatchProcessor to use all the advanced features
	processor := embedding.NewBatchProcessor(provider, embedding.BatchOptions{
		MaxBatchSize:  16,
		MaxConcurrent: 2,
	})
	results, err := processor.Process(ctx, []string{"This is the power of my custom fine-tuned model!"})
	if err != nil {
		log.Fatalf("Processing failed: %v", err)
	}
	fmt.Printf("Processed successfully, extracted %d-dimensional features.\n", len(results[0]))
}
```
Through this extension mechanism, you can easily accommodate new model formats as they appear.