A key metric when evaluating vector similarity search algorithms is “recall,” which measures the relevance of the returned search results.
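As a quick illustration (my own sketch, not from the post): recall@k is commonly computed as the fraction of the true k nearest neighbors that an approximate search actually returned.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k result IDs that the
    approximate search also returned (order-insensitive)."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

# Approximate search found 3 of the 4 true nearest neighbors:
print(recall_at_k([1, 2, 3, 5], [1, 2, 3, 4]))  # 0.75
```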
I wanted to write a “year-in-review” covering all the performance gains pgvector has made (with significant credit to Andrew Kane), highlighting specific areas where pgvector has improved (including one 150x improvement!).
While many AI/ML embedding models generate vectors that provide large amounts of information by using high dimensionality, this can come at the cost of using more memory for searches and more overall storage.
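To make the storage cost concrete, here is some back-of-the-envelope arithmetic (my own sketch; it ignores per-row and index overhead): a single-precision float vector of dimension d takes roughly 4*d bytes.

```python
def vector_bytes(dim, n_vectors, bytes_per_elem=4):
    """Raw storage for n float4 vectors of the given dimension,
    excluding row headers and index overhead."""
    return dim * bytes_per_elem * n_vectors

# e.g. one million 1536-dimensional vectors (a common embedding size):
print(vector_bytes(1536, 1_000_000) / 1024**3)  # ~5.7 GiB of raw vector data
```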
(Disclosure: I’m on the PostgreSQL Core Team, but what’s written in this post reflects my personal views and not official project statements…unless I link to something that’s an official project statement ;)
The past few releases of pgvector have emphasized features that help to vertically scale, particularly around index build parallelism.
When I first began exploring how to get involved in the PostgreSQL community, the first event I heard of was PGCon.
A question I often hear, and also ask myself, is “where is PostgreSQL going?”
It’s here! pgvector 0.5.0 is released and has some incredible new features.
(Disclosure: I have been contributing to pgvector, though I did not work on the HNSW implementation outside of testing).
Vectors are the new JSON. That in itself is an interesting statement, given vectors are a well-studied mathematical structure, and JSON is a data interchange format.