← Back to Blog

MiniOB Vector Database Implementation

MiniOB Vector Database Implementation

In the 4th OceanBase Database Innovation Design Competition, our team implemented vector database functionality. This article provides a detailed technical overview.

Vector Database Overview

A vector database is a specialized database system for storing and retrieving high-dimensional vector data. It plays a crucial role in:

  • Recommendation Systems: Similar user or item recommendations based on user behavior vectors
  • Image Search: Converting images to feature vectors for similar image retrieval
  • Natural Language Processing: Storage and semantic search of text embedding vectors

Core Technical Implementation

1. HNSW Index

We adopted the Hierarchical Navigable Small World (HNSW) algorithm for vector indexing:

class HNSWIndex {
public:
    void insert(const std::vector<float>& vec, int id);
    std::vector<int> search(const std::vector<float>& query, int k);
private:
    int max_level_;
    std::vector<std::vector<Node>> layers_;
};

2. Distance Calculation Optimization

To improve query performance, we implemented SIMD-accelerated distance calculation:

float euclidean_distance_simd(const float* a, const float* b, int dim) {
    __m256 sum = _mm256_setzero_ps();
    for (int i = 0; i < dim; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        __m256 diff = _mm256_sub_ps(va, vb);
        sum = _mm256_fmadd_ps(diff, diff, sum);
    }
    // ... reduction sum
}

Performance Test Results

On a dataset of 1 million 128-dimensional vectors, our implementation achieved:

MetricValue
Insert Speed10,000 rows/sec
Query Latency< 5ms (top-10)
Recall Rate> 95%

Conclusion

Through optimizing index structures and distance calculations, we successfully implemented efficient vector database functionality in MiniOB, laying the foundation for future competitions and research.