HYBRID SEARCH WITH POSTGRESQL AND PGVECTOR
A key metric when evaluating vector similarity search algorithms is “recall”, which measures the relevancy of the returned search results. Typically better recall means better quality search results, but this often comes at the cost of another key metric, such as index size or query latency. This has led to different techniques to “boost” recall while trying to limit any adverse impact on other metrics. There are a variety of techniques available for this, such as using different storage and search strategies to compensate for a key metric tradeoff. For example, quantization techniques can cause information loss when reducing the size of a vector, but using statistical quantization can help improve results in some cases.
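As a rough illustration of the metric, recall at k is commonly computed by comparing the ids returned by an approximate search against the ids returned by an exact (brute-force) search for the same query. The sketch below assumes that definition; the function name `recall_at_k` and the sample ids are made up for the example and do not come from pgvector.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])   # ground truth from an exact search
    found = set(approx_ids[:k])  # what the approximate index returned
    return len(truth & found) / len(truth) if truth else 0.0


# Hypothetical result ids for a single query.
exact = [7, 3, 9, 1, 4]
approx = [7, 9, 3, 8, 2]

# 3 of the 5 true neighbors (7, 3, 9) were found.
print(recall_at_k(approx, exact, k=5))  # 0.6
```

In practice this value is usually averaged over many queries, since a single query says little about an index as a whole.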
THE 150X PGVECTOR SPEEDUP: A YEAR-IN-REVIEW
I wanted to write a “year-in-review” covering all the performance improvements pgvector has made (with significant credit to Andrew Kane), highlighting specific areas where pgvector has improved (including one 150x improvement!) and areas where we can continue to do better.
SCALAR AND BINARY QUANTIZATION FOR PGVECTOR VECTOR SEARCH AND STORAGE
While many AI/ML embedding models generate vectors that provide large amounts of information by using high dimensionality, this can come at the cost of using more memory for searches and more overall storage. Both of these can have an impact on the cost and performance of a system that’s storing vectors, including when using PostgreSQL with the pgvector extension for these use cases.
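To make the storage tradeoff concrete, here is a minimal sketch of the two general ideas in plain Python. It assumes a simple min/max 8-bit scalar scheme and a "positive component becomes 1" binary scheme; these helper functions are illustrative only and are not pgvector's implementation. The 1536-dimension figure is an arbitrary example size.

```python
def binary_quantize(vec):
    """One bit per dimension: 1 if the component is positive, else 0."""
    return [1 if x > 0 else 0 for x in vec]


def scalar_quantize_8bit(vec):
    """Linearly map each component from [min, max] onto the integers 0..255."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 when all values are equal
    return [round((x - lo) / scale) for x in vec]


def storage_bytes(dims, bits_per_dim):
    """Raw payload size of one vector, ignoring any per-row overhead."""
    return dims * bits_per_dim // 8


vec = [0.5, -1.0, 0.0, 1.0]
print(binary_quantize(vec))  # [1, 0, 0, 1]
print(scalar_quantize_8bit(vec))

dims = 1536  # example dimensionality, chosen for illustration
for label, bits in [("32-bit float", 32), ("8-bit scalar", 8), ("binary", 1)]:
    print(label, storage_bytes(dims, bits), "bytes")
```

Going from 32 bits to 1 bit per dimension shrinks the payload by 32x, which is why the information loss has to be weighed against recall.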
WILL POSTGRESQL EVER CHANGE ITS LICENSE?
(Disclosure: I’m on the PostgreSQL Core Team, but what’s written in this post are my personal views and not official project statements…unless I link to something that’s an official project statement ;)
DISTRIBUTED QUERIES FOR PGVECTOR
The past few releases of pgvector have emphasized features that help to vertically scale, particularly around index build parallelism. Scaling vertically is convenient for many reasons, especially because it’s simpler to continue managing data that’s located within a single instance.
PGCONF.DEV: WHY, WHAT, AND HOW YOU CAN PARTICIPATE
When I first began exploring how to get involved in the PostgreSQL community, the first event I heard of was PGCon. I was still in college when PGCon started(!), and I did have FOMO about not going (that said, I don’t think the phrase “FOMO” existed yet). Through the years, the timing of PGCon became very important: it served as a checkpoint between the in-progress PostgreSQL major release (Beta 1 would have launched 1-2 weeks prior) and upcoming work on the new version of PostgreSQL. Additionally, because of the concentration of PostgreSQL contributors, both hackers and community builders, it was a great place to discuss how we can continue to make the PostgreSQL community better.
THOUGHTS ON POSTGRESQL IN 2024
A question I often hear, and also ask myself, is “where is PostgreSQL going?” This is a deep question: it’s not limited to the work on the core database engine, but rather everything going on in the community, including related open source projects and event and community development. Even with the popularity of PostgreSQL, which was selected as DB-Engines’ “DBMS of the Year” for the fourth time, it’s a good idea to step back at times and reflect on what PostgreSQL will look like in the future. While it may not necessarily lead to immediate changes, it does help give context to all the work going on in the community.
PGVECTOR 0.5.0 FEATURE HIGHLIGHTS AND HOWTOS
It’s here! pgvector 0.5.0 is released and has some incredible new features. pgvector is an open-source project that brings vector database capabilities to PostgreSQL. The pgvector community is moving very rapidly on adding new features, so I thought it prudent to put together some highlights of the 0.5.0 release.
AN EARLY LOOK AT HNSW PERFORMANCE WITH PGVECTOR
(Disclosure: I have been contributing to pgvector, though I did not work on the HNSW implementation outside of testing).
VECTORS ARE THE NEW JSON IN POSTGRESQL
That in itself is an interesting statement, given vectors are a well-studied mathematical structure, and JSON is a data interchange format. And yet in the world of data storage and retrieval, both of these data representations have become the lingua franca of their domains and are either essential, or soon-to-be-essential, ingredients in modern application development. And if current trends continue (I think they will), vectors will be as crucial as JSON is for building applications.