Vector search: syntax and examples - OpenSearch

OpenSearch provides a vector search feature for diverse and complex business scenarios. In scenarios such as question-answering search in education and image search, you can combine vector search with the multi-channel search feature to improve the accuracy of search results. This topic describes the vector search syntax, gives usage examples, and lists usage notes for vector indexes.

Syntax

query = vector_index:'vector'&vector_search={"vector_index":{"namespaces":[],"threshold":0.5,"top_n":10,"search_params":{}}}

The vector_search parameter is optional and configures vector index queries. Its value is a dictionary: each key is the name of a vector index to query, and the corresponding value holds the query configuration for that index. The configuration can contain the following common items:

Parameter	Type	Default value	Description
namespaces	list<string>		Partitions a vector index by namespace. This lets you limit query requests to specific partitions of the index. Do not use more than 10,000 namespaces. If you configure namespaces, you must specify a namespace in your queries.
threshold	float		The minimum score threshold for vector retrieval.
top_n	uint32		The number of top N results to return for vector retrieval.
search_params.qc_scan_ratio	float	0.01	The ratio of documents to scan during a QC index query. Number of scanned documents = total number of documents × qc_scan_ratio.
search_params.hnsw_ef	uint32	500	The number of documents to scan during an HNSW index query. A larger value increases recall but also increases query time.
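
To make the qc_scan_ratio formula in the table concrete, the following Python sketch computes the approximate number of scanned documents. The function name, the rounding, and the accepted range of the ratio are assumptions made for illustration; they are not part of the OpenSearch API.

```python
def qc_scanned_docs(total_docs: int, qc_scan_ratio: float = 0.01) -> int:
    """Approximate number of documents scanned in a QC index query.

    Implements: scanned = total number of documents x qc_scan_ratio.
    The (0, 1] range check and the rounding are assumptions of this sketch.
    """
    if total_docs < 0:
        raise ValueError("total_docs must not be negative")
    if not 0.0 < qc_scan_ratio <= 1.0:
        raise ValueError("qc_scan_ratio is assumed to lie in (0, 1]")
    return round(total_docs * qc_scan_ratio)


# With the default ratio of 0.01, an index of 1,000,000 documents
# scans about 10,000 of them.
print(qc_scanned_docs(1_000_000))        # 10000
print(qc_scanned_docs(1_000_000, 0.05))  # 50000
```

Raising the ratio scans more documents per query, so it trades query time for recall in the same way that hnsw_ef does for HNSW indexes.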

Note

The vector_search parameter is also valid in multi-channel recall scenarios.
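
The syntax above can be assembled programmatically. The following Python sketch builds the query clause and the optional vector_search parameter from the configuration items in the table. The helper name build_vector_query is hypothetical, and nesting qc_scan_ratio and hnsw_ef inside a search_params object is an assumption based on the syntax line. The sketch only builds the string; it does not URL-encode it or send a request.

```python
import json
from typing import Dict, List, Optional, Sequence


def build_vector_query(
    index_name: str,
    vector: Sequence[float],
    namespaces: Optional[List[str]] = None,
    threshold: Optional[float] = None,
    top_n: Optional[int] = None,
    search_params: Optional[Dict[str, float]] = None,
) -> str:
    """Build "query=...&vector_search=..." for a single vector index."""
    config: Dict[str, object] = {}
    if namespaces:
        config["namespaces"] = list(namespaces)
    if threshold is not None:
        config["threshold"] = threshold
    if top_n is not None:
        config["top_n"] = top_n
    if search_params:
        config["search_params"] = dict(search_params)

    clause = "query={}:'{}'".format(index_name, ",".join(str(x) for x in vector))
    if not config:
        # vector_search is optional, so it is omitted when nothing is configured.
        return clause
    # The key of the dictionary is the name of the vector index being queried.
    payload = json.dumps({index_name: config}, separators=(",", ":"))
    return clause + "&vector_search=" + payload


print(build_vector_query("vector_index", [0.1, 0.2, 0.98, 0.6],
                         threshold=0.5, top_n=10))
```

Using the index name as the dictionary key in one place keeps the query clause and the vector_search configuration from drifting apart.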

Example: Query a 64-dimensional vector index

vector: '0.377796,-0.958450,0.409853,-0.238177,-1.293826,0.356797,-0.295727,0.847301,-1.220337,0.148032,-1.128458,0.903187,0.509352,0.293686,-1.005852,-0.488839,0.888227,-0.555556,-0.658025,0.267552,-0.567601,0.003045,0.591734,-0.515983,-1.316453,-1.462450,0.091946,1.554954,0.384802,0.720498,0.144338,1.217826,0.724039,0.044212,0.571332,-1.425430,0.618965,0.481887,-1.617787,1.505416,-0.683652,1.030900,0.562021,0.162437,0.816546,0.112229,-0.739288,-0.342643,-0.199292,0.508368,-1.384887,-1.842170,0.952622,-1.699499,0.199430,-0.232464,-0.273227,-0.383696,-0.511302,0.005458,1.873572,-0.926169,-0.417587,-0.660156'
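
Because a long vector literal such as the one above is easy to truncate by accident, it can help to check its dimension before sending the query. The following Python sketch parses a quoted literal and counts its components. The function names are hypothetical, and treating a semicolon as a separator between multiple query vectors is an assumption based on the examples in this topic.

```python
from typing import List


def parse_vectors(literal: str) -> List[List[float]]:
    """Split a quoted vector literal into one list of floats per vector."""
    body = literal.strip().strip("'")
    return [[float(x) for x in chunk.split(",")] for chunk in body.split(";")]


def has_dimension(literal: str, dimension: int) -> bool:
    """Return True if every vector in the literal has the given dimension."""
    return all(len(v) == dimension for v in parse_vectors(literal))


# Two 4-dimensional vectors separated by a semicolon.
print(has_dimension("'0.1,0.2,0.98,0.6;0.3,0.4,0.98,0.6'", 4))  # True
print(has_dimension("'0.1,0.2,0.98'", 4))                       # False
```

For the 64-dimensional index in this example, the check would be has_dimension(literal, 64).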

Usage examples

Set a minimum score threshold

Description: Excludes vectors with scores below the specified threshold from the retrieval results.

Old parameter format: &sf=number

New parameter format: vector_search={"vector_index":{"threshold":0.8}}

Example:

// Old version
query=index_name:'0.1,0.2,0.98,0.6;0.3,0.4,0.98,0.6&sf=0.8'

// New version
query=index_name:'0.1,0.2,0.98,0.6;0.3,0.4,0.98,0.6'&vector_search={"index_name":{"threshold":0.8}}

Specify a top N query

Description: Specifies the number of top results to return from the vector search.

Old parameter format: &n=number

New parameter format: vector_search={"vector_index":{"top_n":10}}

Example:

// Old version
query=vector_index:'0.1,0.2,0.98,0.6;0.3,0.4,0.98,0.6&n=10'

// New version
query=vector_index:'0.1,0.2,0.98,0.6;0.3,0.4,0.98,0.6'&vector_search={"vector_index":{"top_n":10}}
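
The two migrations above follow the same pattern: an option that used to sit inside the quoted vector string (sf or n) moves into the vector_search dictionary (threshold or top_n). The following Python sketch rewrites an old-style query string into the new format. The function name and the regular expression are hypothetical, and the sketch assumes a single query clause with only the sf and n options.

```python
import json
import re

# Matches query=<index>:'<vector and legacy options>'
OLD_QUERY = re.compile(r"^query=(?P<index>\w+):'(?P<body>[^']*)'$")


def migrate_query(old_query: str) -> str:
    """Rewrite legacy sf/n options as a vector_search parameter."""
    match = OLD_QUERY.match(old_query)
    if match is None:
        raise ValueError("unrecognized query format")
    index = match.group("index")
    # Everything before the first "&" is the vector; the rest are options.
    vector, *options = match.group("body").split("&")

    config = {}
    for option in options:
        key, _, value = option.partition("=")
        if key == "sf":
            config["threshold"] = float(value)
        elif key == "n":
            config["top_n"] = int(value)
        else:
            raise ValueError("unsupported legacy option: " + key)

    new_query = "query={}:'{}'".format(index, vector)
    if config:
        payload = json.dumps({index: config}, separators=(",", ":"))
        new_query += "&vector_search=" + payload
    return new_query


print(migrate_query("query=index_name:'0.1,0.2,0.98,0.6;0.3,0.4,0.98,0.6&sf=0.8'"))
print(migrate_query("query=vector_index:'0.1,0.2,0.98,0.6;0.3,0.4,0.98,0.6&n=10'"))
```

The sketch reuses the index name from the query clause as the vector_search key, so the two always refer to the same vector index.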

Sort results by vector score

Description: You can use the proxima_score() function in a fine sort expression to sort results based on vector scores.

Steps:

Create a fine sort policy:

Note: The parameter of the proxima_score function is the name of the vector index.

On the Search Test page, reference the fine sort policy that you created and run a test:

Note

The system uses the Euclidean distance (l2) by default.
Inner product (ip) distance: A larger vector score indicates higher document relevance.
Euclidean distance (l2): A smaller vector score indicates higher document relevance.
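
The difference in score direction matters when you post-process results on the client. The following Python sketch orders hits from most to least relevant for each distance type. The (document ID, score) tuples and the rank_hits helper are hypothetical; the sketch does not call proxima_score() or any OpenSearch API, it only illustrates the ordering rule from the note above.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (document ID, vector score), a hypothetical shape


def rank_hits(hits: List[Hit], metric: str = "l2") -> List[Hit]:
    """Order hits from most to least relevant for the given distance type."""
    if metric == "ip":
        # Inner product: a larger score means higher relevance.
        return sorted(hits, key=lambda hit: hit[1], reverse=True)
    if metric == "l2":
        # Euclidean distance: a smaller score means higher relevance.
        return sorted(hits, key=lambda hit: hit[1])
    raise ValueError("metric must be 'ip' or 'l2'")


hits = [("doc_a", 0.9), ("doc_b", 0.2), ("doc_c", 0.5)]
print(rank_hits(hits, "l2"))  # doc_b first
print(rank_hits(hits, "ip"))  # doc_a first
```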

Notes

By default, the system uses the Euclidean distance (l2) as the vector distance when it builds an index. If you want to use the inner product (ip) distance, you must normalize the vectors before you pass the data to the engine.
The field that corresponds to a vector index must be of the DOUBLE_ARRAY type.
The vector tokenizer supports 64, 128, 256, and 512 dimensions. The number of elements in the corresponding DOUBLE_ARRAY field must match the specified dimension.
The maximum length of a vector index is 4 KB before encoding. A query supports a maximum of two vector indexes.
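
The notes above can be turned into client-side checks that run before data or queries are sent. The following Python sketch normalizes a vector to unit length (the usual L2 normalization, assumed here to be what the inner product note requires), checks the supported dimensions, and enforces the two-index and 4 KB limits. All names are hypothetical, and applying the 4 KB limit to the UTF-8 length of each unencoded query clause is an assumption of this sketch.

```python
import math
from typing import List

SUPPORTED_DIMENSIONS = (64, 128, 256, 512)
MAX_CLAUSE_BYTES = 4 * 1024   # 4 KB, measured before URL encoding (assumed)
MAX_VECTOR_INDEXES = 2        # a query supports at most two vector indexes


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit length for use with inner product distance."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def check_dimension(vector: List[float]) -> None:
    """Raise if the vector length is not a supported dimension."""
    if len(vector) not in SUPPORTED_DIMENSIONS:
        raise ValueError("unsupported dimension: {}".format(len(vector)))


def check_query(clauses: List[str]) -> None:
    """Check a list of clauses, one "index:'v1,v2,...'" string per index."""
    if len(clauses) > MAX_VECTOR_INDEXES:
        raise ValueError("a query supports at most two vector indexes")
    for clause in clauses:
        if len(clause.encode("utf-8")) > MAX_CLAUSE_BYTES:
            raise ValueError("vector index clause exceeds 4 KB")


unit = l2_normalize([3.0, 4.0])
print(unit)                                   # [0.6, 0.8]
check_dimension([0.0] * 64)                   # passes silently
check_query(["vector_index:'0.1,0.2,0.98,0.6'"])
```

Running these checks on the client surfaces a mismatched dimension or an oversized query before it reaches the engine.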