WO2024067593A1

WO2024067593A1 - Vector retrieval method and device

Info

Publication number: WO2024067593A1
Application number: PCT/CN2023/121585
Authority: WO
Inventors: 邝达; 施佩珍; 王兵
Original assignee: 华为技术有限公司
Priority date: 2022-09-28
Filing date: 2023-09-26
Publication date: 2024-04-04
Also published as: CN117828131A

Abstract

A vector retrieval method and device, for use in solving the problem of low retrieval speed existing in existing retrieval methods. In the present application, the method comprises: acquiring a vector to be queried; respectively performing similarity calculation on the vector to be queried and partition center vectors of M clustering partitions to obtain M first similarities; determining K retrieval partitions according to the M first similarities; performing loop execution on the following operations until a probability value that a target retrieval partition includes a target vector is greater than a first preset threshold: selecting a retrieval partition from the K retrieval partitions as the target retrieval partition, calculating a second similarity between the vector to be queried and each vector included in the target retrieval partition, and according to the second similarities, determining the probability value that the target retrieval partition includes the target vector; and outputting a query result on the basis of the selected at least one retrieval partition. A query result can be obtained without calculating the similarities between a vector to be queried and all the vectors in a vector base library, the amount of calculation can be reduced, and the query speed is increased.

Description

A vector search method and device

This application claims priority to Chinese patent application No. 202211193810.4, filed with the China Patent Office on September 28, 2022 and entitled “A vector retrieval method and device”, the entire contents of which are incorporated into this application by reference.

Technical Field

The present application relates to the field of search technology, and in particular to a vector search method and device.

Background

Vector retrieval plays an important role in the field of information retrieval. In vector retrieval, a vector base library is first constructed, which contains a large number of vectors obtained by feature extraction from a large amount of data. The data can be in the form of pictures, videos, audio, text, etc. Then, the similarity between the query vector input by the user and every vector in the vector base library is calculated, the similarities are sorted from high to low, and the vectors corresponding to the first W similarities are taken as the query result for the query vector.

This method performs a global search and comparison on a vector base library containing hundreds of millions or even billions of vectors. It therefore has a low retrieval throughput (queries per second, QPS) and a low retrieval speed.

Summary of the invention

The present application provides a vector search method and device, which are used to solve the problem of low search speed in existing vector search methods.

In a first aspect, the present application provides a vector retrieval method, which can be specifically executed by a computing device or by a chip inside the computing device, or by a processor in the computing device. The method includes: obtaining a vector to be queried;

calculating the similarity between the vector to be queried and the partition center vector of each of M cluster partitions to obtain M first similarities, where the M cluster partitions are obtained by clustering the vectors in the vector base library according to the similarity between the vectors, the partition center vector of any cluster partition is determined according to the multiple vectors contained in that cluster partition, and M is an integer greater than 1; selecting, from the M first similarities, the K highest first similarities in descending order, and determining the cluster partitions corresponding to the K first similarities as K retrieval partitions, where K is an integer greater than or equal to 1 and K is less than M;

performing the following operations in a loop until it is determined that the probability value that the target retrieval partition selected from the K retrieval partitions contains the target vector is greater than a first preset threshold, where the target vector is a vector whose similarity with the vector to be queried is within a preset range:

Selecting an unselected retrieval partition from the K retrieval partitions as a target retrieval partition; calculating a second similarity between the query vector and each vector included in the target retrieval partition; and determining a probability value of the target retrieval partition containing a target vector according to each of the second similarities;

Based on the at least one selected retrieval partition and the vector to be queried, a query result is output.
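By way of a non-limiting illustration, the coarse step of the method above (computing M first similarities and keeping the K best partitions) could be sketched as follows. The inner product is assumed as the similarity measure, since the text does not fix one, and the function names are hypothetical.

```python
import heapq

def dot(u, v):
    """Inner-product similarity (an assumed metric, not mandated by the text)."""
    return sum(a * b for a, b in zip(u, v))

def select_retrieval_partitions(query, centers, k):
    """Return (partition indices, first similarities) for the K partitions
    whose center vectors are most similar to the query, best first."""
    m = len(centers)
    if not 1 <= k < m:
        raise ValueError("K must satisfy 1 <= K < M")
    first_sims = [dot(query, c) for c in centers]            # M first similarities
    top = heapq.nlargest(k, range(m), key=first_sims.__getitem__)
    return top, [first_sims[i] for i in top]

# M = 4 partition center vectors, K = 2 retrieval partitions
centers = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
indices, sims = select_retrieval_partitions([0.9, 0.1], centers, k=2)
print(indices)
```

Only the M center vectors are compared at this stage, so the cost is independent of the number of vectors stored in each partition.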

In the above technical solution, it is not necessary to calculate the similarity between the vector to be queried and all the vectors in the vector base library to obtain the query result. Instead, each vector in the vector base library is clustered to obtain M cluster partitions, each cluster partition corresponds to a partition center vector; by calculating the first similarity between the vector to be queried and the partition center vectors of the M cluster partitions, K retrieval partitions are selected from the M cluster partitions according to the size relationship of the M first similarities; the target retrieval partition is selected in turn from the K retrieval partitions, and the probability value of the vector identical or similar to the vector to be queried falling in the selected target retrieval partition is determined for each selected target retrieval partition until a target retrieval partition with a probability value greater than a first preset threshold is selected. Then the query result of the vector to be queried is determined in at least one selected target retrieval partition. In this way, the amount of calculation can be reduced and the query speed can be improved.
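The loop with early termination described above might look like the following minimal sketch. The probability estimator is passed in as a callable; `toy_probability` is only a hypothetical stand-in for whatever estimator is actually used, and the inner product is again an assumed similarity measure.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def search_with_early_stop(query, partitions, order, estimate_probability, threshold):
    """Scan retrieval partitions in `order` until one is judged likely
    (probability > threshold) to contain the target vector.

    partitions: dict mapping partition id -> list of vectors
    estimate_probability: callable taking the second similarities of one
        partition and returning a value in [0, 1] (hypothetical interface)
    Returns one (partition id, second similarities, probability) per scanned partition.
    """
    scanned = []
    for pid in order:                          # each partition is selected at most once
        second_sims = [dot(query, v) for v in partitions[pid]]
        prob = estimate_probability(second_sims)
        scanned.append((pid, second_sims, prob))
        if prob > threshold:                   # stop: remaining partitions are never scanned
            break
    return scanned

def toy_probability(second_sims):
    """Stand-in estimator: clamp the best second similarity into [0, 1]."""
    return max(0.0, min(1.0, max(second_sims, default=0.0)))

partitions = {0: [[0.5, 0.0], [0.2, 0.1]], 1: [[1.0, 0.0]], 2: [[0.0, 1.0]]}
scanned = search_with_early_stop([1.0, 0.0], partitions, [0, 1, 2], toy_probability, 0.8)
print([pid for pid, _, _ in scanned])
```

In this toy run the third partition is never touched, which is where the saving in similarity calculations comes from.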

In a possible implementation, based on at least one selected retrieval partition and the vector to be queried, the query result is output, including: outputting each vector contained in the retrieval partition whose probability value is greater than the first preset threshold in the at least one selected retrieval partition as the query result; or, according to the order from high to low of the second similarities between each vector contained in the retrieval partition whose probability value is greater than the first preset threshold in the at least one selected retrieval partition and the vector to be queried, outputting the vectors corresponding to the first W second similarities as the query result, where W is a positive integer.

In the above technical solution, since only one of the selected retrieval partitions has a probability value greater than the first preset threshold, outputting all vectors contained in that retrieval partition as the query result can effectively reduce the amount of calculation and improve the retrieval speed. Alternatively, the second similarities between the vector to be queried and the vectors contained in that retrieval partition are sorted from high to low, and the vectors corresponding to the first W second similarities are output as the query result, which can further simplify the output result.

In a possible implementation, based on at least one selected retrieval partition and the vector to be queried, the query result is output, including: outputting the vectors respectively contained in the at least one selected retrieval partition as the query result; or according to the order from high to low of the second similarities between the vectors respectively contained in the at least one selected retrieval partition and the vector to be queried, outputting the vectors corresponding to the first W second similarities as the query result, where W is a positive integer.

In the above technical solution, the query result is output based on all of the at least one selected retrieval partition, rather than only on the retrieval partition whose probability value is greater than the first preset threshold. A selected retrieval partition whose probability value is not greater than the first preset threshold may still contain vectors with a high similarity to the vector to be queried, so the above solution can improve the accuracy of vector retrieval.
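The two output strategies discussed above can be contrasted in a short sketch. The data layout (one tuple of vectors, second similarities and probability value per selected partition) and the flag name `use_all_scanned` are assumptions made for illustration only.

```python
import heapq

def top_w(candidates, w):
    """candidates: iterable of (second_similarity, vector) pairs; best W vectors first."""
    return [vec for _, vec in heapq.nlargest(w, candidates, key=lambda p: p[0])]

def build_result(scanned, threshold, w, use_all_scanned):
    """scanned: list of (vectors, second_sims, probability), one per selected partition.

    use_all_scanned=False: draw only from partitions whose probability exceeds the threshold.
    use_all_scanned=True:  draw from every partition selected during the loop.
    """
    pool = []
    for vectors, sims, prob in scanned:
        if use_all_scanned or prob > threshold:
            pool.extend(zip(sims, vectors))
    return top_w(pool, w)

scanned = [
    (["a", "b"], [0.95, 0.20], 0.4),   # selected first, probability below threshold
    (["c", "d"], [0.90, 0.80], 0.9),   # selected second, probability above threshold
]
print(build_result(scanned, 0.5, 2, use_all_scanned=False))
print(build_result(scanned, 0.5, 2, use_all_scanned=True))
```

In this toy data the vector "a" sits in a partition that did not pass the threshold yet has the highest second similarity, so only the second strategy returns it.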

In a possible implementation, selecting an unselected retrieval partition as the target retrieval partition from the K retrieval partitions may be performed by selecting an unselected retrieval partition as the target retrieval partition from the K retrieval partitions in a descending order of the K first similarities.

In this way, by selecting the target retrieval partition in the order of the K first similarities, the target retrieval partition with a probability value greater than the first preset threshold can be determined as early as possible, and the possibility of re-selecting the target retrieval partition is minimized. There is no need to calculate the similarity between the query vector and the vector in the target retrieval partition selected again, thereby reducing the amount of calculation and improving the retrieval speed.

In a possible implementation, selecting a retrieval partition that has not been selected as a target retrieval partition from among the K retrieval partitions may also be as follows: for any retrieval partition among the K retrieval partitions, clustering each vector in the retrieval partition according to the similarity between the vectors to obtain a plurality of retrieval sub-partitions; determining a sub-partition center vector of any retrieval sub-partition according to a plurality of vectors contained in any retrieval sub-partition; calculating a third similarity between the vector to be queried and the sub-partition center vectors of the plurality of retrieval sub-partitions; sorting the K retrieval partitions according to a plurality of third similarities between the vector to be queried and the plurality of sub-partition center vectors in each retrieval partition; and selecting a retrieval partition that has not been selected as the target retrieval partition from among the sorted K retrieval partitions.

In this way, by further clustering the vectors in the retrieval partition, multiple retrieval sub-partitions are obtained, and each retrieval sub-partition corresponds to a sub-partition center vector. Due to the more detailed division, the sub-partition center vector obtained by the division can more accurately represent the vector in the retrieval sub-partition. Based on the multiple third similarities between the vector to be queried and the multiple sub-partition center vectors in each retrieval partition, the K retrieval partitions are sorted, and the accuracy of the sorting can be improved. In this way, the target retrieval partition with a probability value greater than the first preset threshold can be determined as early as possible, and the possibility of re-selecting the target retrieval partition is minimized, so there is no need to calculate the similarity between the vector to be queried and the vector in the target retrieval partition selected again, so the amount of calculation can be reduced and the retrieval speed can be improved.

In a possible implementation, the K retrieval partitions are sorted according to multiple third similarities between the vector to be queried and multiple sub-partition center vectors in each retrieval partition, including: sorting the K retrieval partitions according to the number of third similarities between the vector to be queried and the multiple sub-partition center vectors in each retrieval partition that exceed a second preset threshold; or sorting the K retrieval partitions according to the maximum similarity among the multiple third similarities between the vector to be queried and the multiple sub-partition center vectors in each retrieval partition.

By sorting the K search partitions by the number of the third similarities among the multiple third similarities that exceed the second preset threshold or by the maximum similarity among the multiple third similarities, the sorting difficulty can be reduced, the sorting speed can be increased, and the search speed can be increased. At the same time, the accuracy of the sorting can also be improved. In this way, the target search partition with a probability value greater than the first preset threshold can be determined as early as possible.
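The two sorting criteria above (count of third similarities above the second preset threshold, or the maximum third similarity) could be sketched as follows. The inner product is an assumed similarity measure and the names `rank_partitions` and `by` are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rank_partitions(query, sub_centers, by="count", second_threshold=0.0):
    """Order retrieval partitions using their sub-partition center vectors.

    sub_centers: dict mapping partition id -> list of sub-partition center vectors
    by="count": rank by how many third similarities exceed second_threshold
    by="max":   rank by the largest third similarity
    """
    if by not in ("count", "max"):
        raise ValueError("by must be 'count' or 'max'")

    def score(pid):
        third_sims = [dot(query, c) for c in sub_centers[pid]]
        if by == "count":
            return sum(s > second_threshold for s in third_sims)
        return max(third_sims)

    # sorted() is stable, so ties keep the incoming order of the partitions
    return sorted(sub_centers, key=score, reverse=True)

query = [1.0, 0.0]
sub_centers = {
    "A": [[0.6, 0.0], [0.7, 0.0], [0.1, 0.0]],
    "B": [[0.9, 0.0], [0.2, 0.0], [0.1, 0.0]],
}
print(rank_partitions(query, sub_centers, by="count", second_threshold=0.5))
print(rank_partitions(query, sub_centers, by="max"))
```

The two criteria can disagree, as they do here: partition A has more sub-partitions close to the query, while partition B has the single closest one.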

In a possible implementation, determining, according to each of the second similarities, the probability value that the target retrieval partition contains the target vector includes: sorting the second similarities from high to low and taking the first t second similarities as t target second similarities; and inputting the vector to be queried, the K first similarities and the t target second similarities into a prediction model to obtain the probability value, where the prediction model is used to predict the probability value that the target retrieval partition contains the target vector.

By predicting the probability value through the prediction model, the accuracy and speed of determining the probability value are improved. Among the second similarities, t target second similarities with the highest second similarities ranked in descending order are selected, and the t target second similarities are input into the prediction model, which can also reduce the calculation amount of the prediction model and improve the speed of predicting the probability value without affecting the prediction accuracy.
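As a hedged illustration of how the model input described above might be assembled, the sketch below concatenates the query vector, the K first similarities and the t highest second similarities into one feature list. The logistic model, its weights and the zero padding for partitions with fewer than t vectors are all assumptions; the text does not specify the form of the prediction model.

```python
import heapq
import math

def build_features(query, first_sims, second_sims, t):
    """Concatenate query, K first similarities and the t highest second similarities."""
    top_t = heapq.nlargest(t, second_sims)
    top_t += [0.0] * (t - len(top_t))      # assumed padding when the partition has < t vectors
    return list(query) + list(first_sims) + top_t

def predict_probability(features, weights, bias):
    """Stand-in prediction model: plain logistic regression with given weights."""
    if len(features) != len(weights):
        raise ValueError("feature and weight lengths differ")
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

features = build_features([0.1, 0.2], [0.9, 0.7], [0.3, 0.8, 0.5], t=2)
print(features)
print(predict_probability(features, [0.0] * len(features), 0.0))
```

Keeping only t second similarities fixes the input size of the model regardless of how many vectors a partition holds.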

In a possible implementation, there are N vectors to be queried, where N is a positive integer greater than 1; accordingly, the vector to be queried, the K first similarities and the t target second similarities are input into a prediction model to obtain the probability value, including: inputting a matrix formed by the N vectors to be queried and a matrix formed by the K first similarities corresponding to each of the N vectors to be queried into a first prediction model to obtain N initial probability values corresponding to the N vectors to be queried; the initial probability value is used to characterize the probability that the target vector of the vector to be queried is contained in the K retrieval partitions corresponding to any vector to be queried; for any vector to be queried, the initial probability value corresponding to the vector to be queried and the t target second similarities corresponding to the vector to be queried are input into a second prediction model to obtain a final probability value corresponding to the vector to be queried.

The prediction of the probability value is divided into two stages, the first stage uses the first prediction model, and the second stage uses the second prediction model. Specifically, the matrix formed by the N query vectors and the matrix formed by the K first similarities corresponding to each query vector in the N query vectors are input into the first prediction model, so that the first prediction model can use the matrix multiplication method to predict the initial probability value, which gives full play to the computing power, improves the computing efficiency, and further improves the speed of vector retrieval.
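The batched first stage can be pictured as one matrix product between the N query vectors and the M partition center vectors, followed by a per-row top-K selection. The sketch below uses plain Python lists purely for illustration; on a hardware accelerator the same product would be a single matrix multiplication. The function names and the inner-product similarity are assumptions.

```python
def matmul_transposed(a, b):
    """Return a @ b^T for row-major lists: result[i][j] = <a[i], b[j]>."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]

def batched_first_similarities(queries, centers, k):
    """For N queries at once, return (N x K partition ids, N x K first similarities)."""
    sims = matmul_transposed(queries, centers)                  # N x M similarity matrix
    top_ids = [
        sorted(range(len(row)), key=row.__getitem__, reverse=True)[:k]
        for row in sims
    ]
    top_sims = [[row[j] for j in ids] for row, ids in zip(sims, top_ids)]
    return top_ids, top_sims

queries = [[1, 0], [0, 1]]                 # N = 2 vectors to be queried
centers = [[1, 0], [0, 1], [1, 1]]         # M = 3 partition center vectors
ids, first_sim_matrix = batched_first_similarities(queries, centers, k=2)
print(ids)
print(first_sim_matrix)
```

The query matrix and the resulting N x K matrix of first similarities are the two inputs that the text feeds to the first prediction model.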

In a second aspect, an embodiment of the present application provides a vector search device, which has the function of implementing the method in the first aspect or any possible implementation of the first aspect, and the device can be a computing device or a processor included in the computing device. The functions of the above-mentioned vector search device can be implemented by hardware, or by hardware executing corresponding software, and the hardware or software includes one or more modules or units or means corresponding to the above-mentioned functions.

In a possible implementation, the structure of the device includes a processing module and a transceiver module, wherein the processing module is configured to support the device to execute the method in the first aspect or any one of the implementations of the first aspect. The transceiver module is used to support communication between the device and other devices, for example, it can receive data from an acquisition device. The vector retrieval device may also include a storage module, which is coupled to the processing module and stores program instructions and data necessary for the device. As an example, the processing module may be a processor, the transceiver module may be a transceiver, and the storage module may be a memory. The memory may be integrated with the processor or may be set separately from the processor.

In another possible implementation, the structure of the device includes a processor and may also include a memory. The processor is coupled to the memory and may be used to execute computer program instructions stored in the memory so that the device performs the method in the first aspect or any possible implementation of the first aspect. Optionally, the device further includes a communication interface, and the processor is coupled to the communication interface. When the device is a computing device, the communication interface may be a transceiver or an input/output interface.

In a third aspect, an embodiment of the present application provides a chip, including a processor, wherein the processor is coupled to a memory, and the memory is used to store programs or instructions. When the programs or instructions are executed by the processor, the chip implements the method in the above-mentioned first aspect or any possible implementation method of the first aspect.

Optionally, the chip further includes an interface circuit for exchanging code instructions with the processor.

Optionally, there may be one or more processors in the chip, and the processor may be implemented by hardware or software. When implemented by hardware, the processor may be a logic circuit, an integrated circuit, etc. When implemented by software, the processor may be a general-purpose processor implemented by reading software code stored in a memory.

Optionally, there may be one or more memories in the chip. The memory may be integrated with the processor or may be provided separately from the processor. Exemplarily, the memory may be a non-transitory memory, such as a read-only memory (ROM), which may be integrated with the processor on the same chip or may be provided on a different chip.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium having a computer program or instructions stored thereon. When the computer program or instructions are executed, the computer executes the method in the above-mentioned first aspect or any possible implementation of the first aspect.

In a fifth aspect, an embodiment of the present application provides a computer program product. When a computer reads and executes the computer program product, the computer executes the method in the above-mentioned first aspect or any possible implementation manner of the first aspect.

The technical effects that can be achieved in any of the second to fifth aspects mentioned above can refer to the description of the beneficial effects in the first aspect mentioned above, and will not be repeated here.

Brief description of the drawings

FIG. 1a is a schematic diagram of performing vector retrieval in a scene of image search provided by the present application;

FIG. 1b is a schematic diagram of performing vector retrieval in a drug discovery scenario provided by the present application;

FIG. 2 is a schematic diagram of a system architecture provided by the present application;

FIG. 3 is a schematic diagram of the structure of a computing device provided by the present application;

FIG. 4 is a schematic diagram of the structure of a processor provided by the present application;

FIG. 5a is a schematic diagram of a process flow of a vector retrieval technology provided by the present application;

FIG. 5b is a schematic diagram of M cluster partitions obtained after clustering vectors in a vector base library provided by the present application;

FIG. 6 is a schematic flow chart of a vector retrieval method provided by the present application;

FIG. 7 is a schematic diagram of dividing any retrieval partition into retrieval sub-partitions provided by the present application;

FIG. 8 is a schematic diagram, provided by the present application, of a first similarity between a query vector and the partition center vector of each retrieval partition and a third similarity between the query vector and the sub-partition center vector of each retrieval sub-partition;

FIG. 9 is a flow chart of a method for obtaining a probability value according to each second similarity provided by the present application;

FIG. 10 is a flow chart of another method for obtaining a probability value according to each second similarity provided by the present application;

FIG. 11a is a schematic diagram, provided by the present application, of a matrix of the M first similarities between any query vector and the M partition center vectors, obtained by a hardware accelerator through matrix multiplication;

FIG. 11b is a schematic diagram of a matrix formed by three first similarities corresponding to each of two query vectors provided in the present application;

FIG. 12 is a schematic diagram of a method for determining a probability value provided by the present application;

FIG. 13 is a schematic diagram of the overall process of a vector retrieval method provided by the present application;

FIG. 14 is a schematic diagram of a vector retrieval device provided by the present application.

Detailed Description

In order to better explain the present application, the technologies or terms involved in the present application are explained as follows.

1. Vector retrieval technology: In a given vector data set, vectors similar to the query vector are retrieved according to a certain metric.

2. K-means clustering algorithm: an iterative clustering analysis algorithm. Specifically, given the number of categories k, the entire data set is clustered. The objective is to minimize the sum of the distances from all samples to their class centers. The objective function is iteratively calculated and optimized to obtain k class centers and the category to which each sample belongs.

3. Retrieval precision, also known as recall rate. Given a query vector, the retrieval system searches the query vector and returns W vectors as query results. Let the set of the returned W vectors be X, and let the set of the top W vectors in the entire vector base that are ranked from high to low in similarity with the query vector be Y. Then the retrieval precision of the retrieval system for the query vector is |X∩Y|/|Y|.
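The retrieval precision defined in item 3 above, |X∩Y|/|Y|, can be computed directly from two sets of vector identifiers. The sketch below is only an illustration; the function name and the use of integer identifiers are assumptions.

```python
def retrieval_precision(returned_ids, true_top_ids):
    """|X ∩ Y| / |Y|, where X is the returned set and Y the exact top-W set."""
    x, y = set(returned_ids), set(true_top_ids)
    if not y:
        raise ValueError("the exact top-W set Y must not be empty")
    return len(x & y) / len(y)

# W = 4: two of the four returned vectors are among the exact top 4
print(retrieval_precision([1, 2, 3, 4], [2, 3, 5, 6]))  # 0.5
```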

FIG. 1a shows a schematic diagram of vector retrieval in an image search scenario. Specifically, feature extraction is performed on a large number of images to obtain a large number of vectors, and these vectors form a vector base library; feature extraction is performed on the image to be queried to obtain a vector to be queried, and the vector base library is searched for vectors that meet the similarity requirement with respect to the vector to be queried; the images from which these vectors were extracted are then determined and returned as the query result. For example, in Internet applications, based on a product image input by the user, images containing products similar in appearance to the product in the input image are retrieved; as another example, based on images from videos that the user frequently browses, other videos similar to these images are retrieved and pushed to the user. The ever-increasing scale of Internet data places higher requirements on the retrieval speed and efficiency of the retrieval system.

FIG. 1b shows a schematic diagram of vector retrieval in a drug discovery scenario. A large number of compounds are encoded with an encoder to obtain a large number of vectors, which form a vector base library; the active fragments or lead compounds of the drug to be queried are encoded with the encoder to obtain vectors to be queried, and the vector base library is searched for vectors that meet the similarity requirement with respect to the vectors to be queried; the compounds corresponding to these vectors are returned as the query result. The research and development of new drugs requires searching a compound base library of hundreds of millions or billions of compounds for compounds similar to the active fragments or lead compounds of a new drug as potential drugs. Since the selection of similar compounds affects the subsequent animal experiments and clinical trials, which have longer cycles, this application scenario also places high demands on the retrieval speed of the retrieval system.

How to determine the retrieved vector that meets the similarity requirement with the query vector? Here are two methods:

Method 1: Calculate the similarity between the query vector and all vectors in the entire vector base, and select W vectors ranked in descending order of similarity as the query results.

Method 2: Calculate the similarity between the query vector and each vector in the vector base library in turn until W vectors whose similarity meets the preset threshold are found, then stop calculating the similarity between the query vector and the remaining vectors in the vector base library.

Method 1 needs to calculate the similarity between the vector to be queried and all the vectors in the entire vector base. Although it can ensure the retrieval accuracy, the number of vectors in the vector base is very large, generally in the hundreds of millions/billions. This will result in a huge amount of calculation, which limits the improvement of the retrieval speed. The amount of calculation in method 2 is reduced compared to method 1, but if the preset threshold is set high, the amount of calculation is still large and the retrieval speed is slow; if the preset threshold is set low, the retrieval accuracy is affected. Therefore, using method 2 for vector retrieval has high requirements for the setting of the preset threshold, and even different preset thresholds need to be set for different vectors to be queried, and the retrieval method is not flexible enough.
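The trade-off between the two methods can be made concrete with a small sketch. The inner product is assumed as the similarity measure, and the function names and toy data are hypothetical.

```python
import heapq

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def method_1(query, base, w):
    """Exhaustive search: score every vector, return indices of the W most similar."""
    scored = ((dot(query, v), i) for i, v in enumerate(base))
    return [i for _, i in heapq.nlargest(w, scored)]

def method_2(query, base, w, threshold):
    """Sequential scan that stops once W vectors reach the preset threshold.
    Returns (hit indices, number of similarity calculations performed)."""
    hits, compared = [], 0
    for i, v in enumerate(base):
        compared += 1
        if dot(query, v) >= threshold:
            hits.append(i)
            if len(hits) == w:
                break
    return hits, compared

base = [[0.6, 0.0], [0.1, 0.0], [0.7, 0.0], [1.0, 0.0], [0.9, 0.0]]
query = [1.0, 0.0]
print(method_1(query, base, w=2))                  # exact top 2
print(method_2(query, base, w=2, threshold=0.5))   # low threshold: fewer comparisons, worse hits
print(method_2(query, base, w=2, threshold=0.8))   # high threshold: exact hits, full scan
```

With the low threshold the scan stops after three comparisons but misses both of the true top-2 vectors; with the high threshold it finds them only after scanning the whole toy library.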

In summary, the above-mentioned vector retrieval methods cannot achieve both retrieval accuracy and retrieval speed. Based on this, the embodiments of the present application provide a vector retrieval method that improves the retrieval speed while ensuring the retrieval accuracy.

FIG. 2 provides a schematic diagram of a system architecture applicable to an embodiment of the present application, wherein the system includes an acquisition device 10, a computing device 20, and a storage device 30. There may be one or more acquisition devices 10, one or more computing devices 20, and one or more storage devices 30. The one or more acquisition devices 10, one or more computing devices 20, and one or more storage devices 30 may be connected via a network.

The acquisition device 10 can be used to collect data and send the collected data to the computing device 20 through the network. The acquisition device 10 can be a camera, a mobile phone, a computer, etc., and the data collected by the acquisition device 10 can be pictures, videos, audio, text, etc. Exemplarily, in a video surveillance scenario, the acquisition device 10 can specifically be a camera, and the data collected by the camera can be, for example, pictures and/or videos taken by the camera.

The computing device 20 is used to extract features from any data obtained to obtain the vector corresponding to the data; a large number of vectors corresponding to a large amount of data form a vector base library, and a large number of vectors in the vector base library are clustered and calculated according to the similarity between the vectors, thereby obtaining M cluster partitions, and the similarity between each vector in each cluster partition is relatively high, wherein M is an integer greater than 1. Each cluster partition has a corresponding partition center vector, and the partition center vector of each cluster partition is determined according to the multiple vectors contained in the cluster partition. For example, the partition center vector of the cluster partition can be determined according to the mean, mode or median of the multiple vectors contained in the cluster partition. The partition center vector can be understood as a representative of the multiple vectors contained in the cluster partition, representing the characteristics of each vector contained in the cluster partition. The embodiment of the present application does not limit the clustering algorithm. For example, a k-means clustering algorithm, a mean shift clustering method, and a density-based clustering method can be used to perform clustering calculations on a large number of vectors in the vector base library according to the similarity between the vectors, thereby obtaining M cluster partitions.
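As one possible, non-limiting instance of the clustering step described above, the following sketch runs a basic k-means iteration in plain Python. It assumes squared Euclidean distance and random initial centers drawn from the data; the function names, the iteration limit and the seed are illustrative choices, not part of the described method.

```python
import random

def sq_dist(u, v):
    """Squared Euclidean distance (assumed dissimilarity for this sketch)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def kmeans(data, m, iterations=20, seed=0):
    """Cluster `data` into m partitions; returns (partition centers, label per vector)."""
    rng = random.Random(seed)
    centers = rng.sample(data, m)              # initial centers picked from the data
    labels = [0] * len(data)
    for _ in range(iterations):
        # Assignment step: each vector joins the partition with the nearest center
        labels = [min(range(m), key=lambda j: sq_dist(x, centers[j])) for x in data]
        # Update step: each center becomes the mean of its members
        new_centers = []
        for j in range(m):
            members = [x for x, lab in zip(data, labels) if lab == j]
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:             # converged
            break
        centers = new_centers
    return centers, labels

data = [[0, 0], [0, 1], [10, 10], [10, 11]]
centers, labels = kmeans(data, m=2)
print(sorted(centers), labels)
```

The returned centers play the role of the partition center vectors, and the labels say which cluster partition each vector in the base library belongs to.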

The storage device 30 can be used to store the multiple cluster partitions calculated by the computing device 20. Exemplarily, FIG. 5b shows a schematic diagram of M cluster partitions obtained after clustering the vectors in the vector base library. In FIG. 5b, it is assumed that 8 cluster partitions are obtained by clustering the vectors according to the similarity between the vectors, and the cluster partitions are separated by solid lines; the average of the vectors in each cluster partition is taken as the partition center vector of that cluster partition, the partition center vector is represented by a five-pointed star in the figure, and the black solid dots represent the vectors contained in the cluster partition other than the partition center vector. For example, if a cluster partition contains 3 vectors, namely [1, 1, 1], [2, 2, 2] and [3, 3, 3], then the partition center vector of the cluster partition can be [2, 2, 2].
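The worked example above can be checked with a few lines of code. The sketch computes the partition center vector as the per-dimension mean, and also shows the per-dimension median as one of the alternatives mentioned in the text; the function names are hypothetical.

```python
import statistics

def center_by_mean(vectors):
    """Partition center vector as the per-dimension mean of the member vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def center_by_median(vectors):
    """Alternative: per-dimension median of the member vectors."""
    return [statistics.median(col) for col in zip(*vectors)]

partition = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(center_by_mean(partition))    # [2.0, 2.0, 2.0]
print(center_by_median(partition))  # [2, 2, 2]
```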

After the computing device 20 clusters the vectors in the vector base to obtain M cluster partitions, the partition center vector of each cluster partition in the M cluster partitions and each vector contained in each cluster partition can be sent to the storage device 30 for storage. In other words, the storage device 30 can store the data structure shown in FIG. 5b for the computing device 20 to perform subsequent vector retrieval.

In the vector retrieval stage, the acquisition device 10 can be used to acquire or obtain the data to be queried, and send the data to be queried to the computing device 20. For example, a user opens a shopping application and enters a picture to be queried containing a product to be queried in the shopping application. The acquisition device acquires the picture to be queried and can send the picture to be queried to the computing device 20.

The computing device 20 is used to extract features of the query image to obtain a query vector corresponding to the query image; then search for similar vectors in the M cluster partitions stored in the storage device 30 according to the query vector, and feed back the found similar vectors to the user.

It should be understood that the acquisition device 10, the computing device 20 and the storage device 30 may be integrated into the same device or respectively arranged in different devices. For example, the computing device 20 and the storage device 30 may be integrated into a server, and the acquisition device 10 may be integrated into a terminal device.

FIG. 3 is a schematic diagram of a possible structure of the computing device 20. The computing device 20 includes a processor 201, a memory 202, and a communication interface 203. Any two of the processor 201, the memory 202, and the communication interface 203 may be connected via a bus 204.

The processor 201 may be a central processing unit (CPU), which may be used to execute software programs in the memory 202 to implement one or more functions, such as extracting features from data. In addition to the CPU, the processor 201 may also be an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a system on chip (SoC) or a complex programmable logic device (CPLD), a graphics processing unit (GPU), a neural-network processing unit (NPU), etc.

It should be noted that, in actual applications, there may be multiple processors 201, and the multiple processors 201 may include multiple processors of the same type, or may include multiple processors of different types. For example, multiple processors 201 are multiple CPUs. For another example, the multiple processors 201 include one or more CPUs and one or more GPUs. For another example, the multiple processors 201 include one or more CPUs and one or more NPUs. Alternatively, the multiple processors 201 include one or more CPUs, one or more GPUs, and one or more NPUs, etc. Among them, the processor 201 (such as a CPU, an NPU, etc.) may include one core, or may include multiple cores.

The memory 202 refers to a device for storing data, which can be a memory or a hard disk.

Memory refers to an internal memory that directly exchanges data with the processor 201. It can read and write data at any time and at high speed, and serves as temporary data storage for the operating system or other programs running on the processor 201. Memory includes volatile memory, such as random access memory (RAM) and dynamic random access memory (DRAM), and may also include non-volatile memory, such as storage class memory (SCM), or a combination of volatile memory and non-volatile memory. In practical applications, multiple memories can be configured in the computing device 20, and optionally, the multiple memories can be of different types. This embodiment does not limit the number and type of memory. In addition, the memory can be configured to have a power-failure protection function, which means that when the system loses power and is then powered on again, the data stored in the memory is not lost. Memory with a power-failure protection function is called non-volatile memory.

The hard disk is used to provide storage resources, such as for storing pictures, videos, audio, text and other data collected by the acquisition device 10. The hard disk includes but is not limited to non-volatile memory, such as read-only memory (ROM), a hard disk drive (HDD) or a solid state drive (SSD). The difference between the hard disk and the memory is that the read and write speed of the hard disk is relatively slow, and the hard disk is usually used to store data persistently. In one embodiment, the data, program instructions, etc. in the hard disk need to be loaded into the memory first, and then the processor obtains the data and/or program instructions from the memory.

The communication interface 203 is used for communicating with other devices, for example, for the computing device 20 to communicate with the acquisition device 10 or the storage device 30.

In practical applications, as shown in FIG. 4, the computing device 20 may include two processors 201, which may be a CPU and an NPU, respectively. The CPU may include 6 CPU cores, and the NPU may include 2 NPU cores, which may also be called AI cores. The computing power of the NPU is higher than that of the CPU. The CPU can be used to perform similarity sorting in the data retrieval process, and the NPU can be used to perform similarity calculation in the data retrieval process. For details, see the structure of the processor 201 in the computing device 20 shown in FIG. 4.

Based on the system architecture shown in FIG. 2 and the hardware architecture of the computing device shown in FIG. 3 and FIG. 4, the present application exemplarily provides a flow chart of vector retrieval, which can be seen in FIG. 5a. Specifically, the flow can be executed by the computing device 20 shown in FIG. 3 and FIG. 4, and can be roughly divided into the following three stages:

1. Feature extraction stage

For the multiple acquired sample images, the computing device 20 inputs each sample image into a preset feature extraction model. The embodiment of the present application does not limit the type of feature extraction model. For example, each sample image can be input into a convolutional neural network (CNN) model for feature extraction, so that the CNN model outputs the vector corresponding to each sample image. Subsequently, the computing device 20 stores the vector corresponding to each sample image in a vector base library, which can be located in the memory 202 of the computing device 20 or in the storage device 30. The storage device 30 can be an independent storage medium or memory, etc.

2. Clustering stage

The computing device 20 clusters each vector in the vector base according to the similarity between the vectors to obtain M cluster partitions, where each cluster partition corresponds to a partition center vector, and M is an integer greater than 1. Exemplarily, each vector in the vector base can be clustered in the following two ways to obtain M cluster partitions:

Implementation method 1, directly cluster each vector in the vector base library to obtain M cluster partitions and the partition center vector of each cluster partition. The partition center vector is obtained by each vector in the cluster partition, for example, taking the average value, median, etc. of each vector, which is not limited in the embodiment of the present application. The specific clustering algorithm can be a k-means clustering algorithm, a fuzzy c-means clustering algorithm, a mean shift clustering method, and a density-based clustering method, which is not limited in the embodiment of the present application.

Implementation method 2: randomly select a preset proportion (for example, about 10%) of vectors from the vector base library as training samples, cluster the training samples to obtain M cluster partitions and the partition center vector of each cluster partition. The specific clustering algorithm can be a k-means clustering algorithm, a fuzzy c-means clustering algorithm, a mean shift clustering method, and a density-based clustering method, etc., which is not limited in the embodiments of the present application. With the M partition center vectors as the center, the other vectors in the vector base library except the training samples are clustered into the M cluster partitions respectively. In this way, the amount of calculation for determining the partition center vector can be reduced, and the speed of determining the partition center vector can be increased.
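Exemplarily, implementation method 2 can be sketched as follows. This is only an illustration under stated assumptions: k-means is used as the clustering algorithm, squared Euclidean distance stands in for the similarity measure (a smaller distance means a higher similarity), and all function names (`sq_dist`, `nearest`, `kmeans`, `build_partitions`) and parameters (`ratio`, `seed`, `iters`) are hypothetical. For simplicity, the sketch assigns every vector, including the training samples, to the nearest of the M partition center vectors.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance; a smaller value is treated as more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest(centers, v):
    """Index of the partition center vector closest to v."""
    return min(range(len(centers)), key=lambda j: sq_dist(centers[j], v))


def kmeans(samples, m, iters=20, rng=None):
    """Plain k-means on the training samples; returns M partition center vectors."""
    rng = rng or random.Random(0)
    centers = [list(c) for c in rng.sample(samples, m)]
    for _ in range(iters):
        groups = [[] for _ in range(m)]
        for v in samples:
            groups[nearest(centers, v)].append(v)
        # Keep the previous center if a group happens to be empty.
        centers = [mean(g) if g else centers[j] for j, g in enumerate(groups)]
    return centers


def build_partitions(base, m, ratio=0.1, seed=0):
    """Train centers on a random subset, then assign all base vectors to them."""
    rng = random.Random(seed)
    n_train = min(len(base), max(m, int(len(base) * ratio)))
    train_idx = sorted(rng.sample(range(len(base)), n_train))
    centers = kmeans([base[i] for i in train_idx], m, rng=rng)
    partitions = [[] for _ in range(m)]
    for v in base:
        partitions[nearest(centers, v)].append(v)
    return centers, partitions
```

Because only about 10% of the vectors take part in the iterative center updates, the cost of determining the partition center vectors is reduced; the remaining vectors each need only one pass of nearest-center assignment.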

3. Vector Retrieval Stage

After the processing of the above-mentioned feature extraction stage and clustering stage, multiple cluster partitions are obtained, and each cluster partition has its own corresponding partition center vector. Subsequently, when the user has a query request, the data to be queried can be input to the computing device 20 through the client. The computing device 20 performs feature extraction on the acquired data to be queried to obtain the vector to be queried, and then calculates the similarity between the vector to be queried and the partition center vectors of the M cluster partitions to obtain M first similarities. Then, among the M first similarities, the K first similarities ranked highest in descending order are selected, and the cluster partitions corresponding to the K first similarities are determined as K retrieval partitions.

Select an unselected retrieval partition from the K retrieval partitions as the target retrieval partition, calculate the second similarity between the query vector and each vector contained in the target retrieval partition; determine the probability value of the target retrieval partition containing the target vector according to each second similarity, where the target vector refers to a vector whose similarity with the query vector is within a preset range. For example, the target vector refers to a vector whose similarity with the query vector is greater than 0.9. If the probability value is greater than the first preset threshold, the next target retrieval partition will no longer be selected from the unselected retrieval partition, and the retrieval for the query vector can be terminated; if the probability value is not greater than the first preset threshold, continue to select the next target retrieval partition from the unselected retrieval partition, continue to calculate the second similarity between the query vector and each vector contained in the newly selected target retrieval partition, determine the probability value of the newly selected target retrieval partition containing the target vector according to each second similarity, and compare the probability value with the first preset threshold again... Repeat the above steps until the probability value corresponding to the selected target retrieval partition is greater than the first preset threshold, then stop selecting the next target retrieval partition from the unselected retrieval partition.

It can be seen that the vector retrieval method provided in the embodiment of the present application improves the speed of vector retrieval by reasoning whether to terminate the retrieval of the current vector to be queried in advance during the vector retrieval process. For example, the second similarities between the vector to be queried and each vector in the first retrieval partition are first calculated, and the probability that the target vector is included in the first retrieval partition is determined based on each second similarity. If the probability is high, the second similarities between the vector to be queried and each vector in other retrieval partitions are no longer calculated. In this way, the amount of retrieval calculation can be reduced and the speed of vector retrieval can be improved.

The vector retrieval method provided by the embodiment of the present application is described in detail below through specific steps. As shown in FIG. 6, the method can be executed by the computing device in FIG. 2 above, or by a chip in the computing device, and includes the following steps:

Step 601, obtaining a vector to be queried. Exemplarily, the vector to be queried may be a vector input by a user to a computing device through a query client, or may be any vector obtained by the computing device from a vector base library. This embodiment of the application does not limit this.

Step 602: Calculate the similarity between the query vector and the partition center vectors of M cluster partitions to obtain M first similarities; the M cluster partitions are obtained by clustering the vectors in the vector base according to the similarity between the vectors; the partition center vector of any cluster partition is determined according to the multiple vectors contained in any cluster partition, and M is an integer greater than 1. Among the M first similarities, select the K first similarities that are ranked first in descending order, and determine the cluster partitions corresponding to the K first similarities as K search partitions, where K is an integer greater than or equal to 1 and K is less than M.
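Exemplarily, step 602 can be sketched as follows. The sketch assumes cosine similarity as the similarity measure, which the embodiment does not mandate; the names `cosine` and `select_retrieval_partitions` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_retrieval_partitions(query, centers, k):
    """Return the indices of the K partitions with the highest first similarity,
    in descending order of similarity, together with all M first similarities."""
    first_sims = [cosine(query, c) for c in centers]
    ranked = sorted(range(len(centers)), key=lambda i: first_sims[i], reverse=True)
    return ranked[:k], first_sims


# M = 4 partition center vectors, K = 2 retrieval partitions.
centers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
ids, sims = select_retrieval_partitions([1.0, 0.1], centers, k=2)
print(ids)  # [0, 2]
```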

Step 603, looping and performing the following operations until it is determined that the probability value of the target search partition selected from the K search partitions containing the target vector is greater than a first preset threshold, wherein the target vector is a vector whose similarity with the query vector is within a preset range:

Select an unselected retrieval partition from the K retrieval partitions as a target retrieval partition; calculate the second similarities between the query vector and each vector contained in the target retrieval partition; and determine the probability value of the target retrieval partition containing the target vector based on each of the second similarities.

It should be noted that there is no restriction on the order of selecting the target retrieval partition from the K retrieval partitions, and it can be selected arbitrarily. For example, assuming that the K retrieval partitions include retrieval partition A, retrieval partition B, and retrieval partition C respectively, you can select any target retrieval partition from retrieval partition A, retrieval partition B, and retrieval partition C, such as selecting retrieval partition A. There are 100 vectors in retrieval partition A, and the 100 second similarities between the query vector and the 100 vectors in retrieval partition A are calculated. The vector corresponding to the second similarity greater than 0.9 among the 100 second similarities can be used as the target vector. For example, if 20 target vectors are determined, it can be determined that the probability value of containing the target vector in retrieval partition A is 20/100=0.2.

If the determined probability value is greater than the first preset threshold, there is no need to select the next target search partition from the remaining search partitions, and there is no need to calculate the second similarity between the query vector and each vector in the next target search partition, thereby saving the search calculation amount. For example, when the first preset threshold is 0.18, the above-mentioned probability value of determining that the search partition A contains the target vector is 0.2, which is greater than 0.18, so there is no need to calculate the second similarity between the query vector and each vector in the search partition B and the search partition C, respectively, which can save a lot of calculation workload.

If the determined probability value is not greater than the first preset threshold, it means that the number of target vectors contained in the currently selected target retrieval partition is too small. The retrieval accuracy is likely to be low when retrieving the query vector based on such a target retrieval partition. Therefore, continue to select a target retrieval partition from the remaining unselected retrieval partitions. For example, continue to select retrieval partition B, and repeat the steps after selecting retrieval partition A until the obtained probability value is greater than the first preset threshold, and then stop selecting retrieval partitions.

In the above method, it is not necessary to calculate the similarity between the query vector and all the vectors in the K retrieval partitions. Only some of the retrieval partitions are selected, and similarity is calculated only between the query vector and the vectors in the selected retrieval partitions. In this way, the amount of retrieval calculation can be reduced and the retrieval speed can be improved.
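Exemplarily, the loop of step 603 can be sketched as follows. The sketch makes several assumptions that are labeled here and in the code: cosine similarity is used as the second similarity, the probability value is the fraction of vectors in the target retrieval partition whose second similarity exceeds 0.9 (as in the 20/100 example above), the first preset threshold is 0.18, and the loop also ends once all K retrieval partitions have been selected. The function names are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search_with_early_stop(query, partitions, target_sim=0.9, first_threshold=0.18):
    """partitions: the K retrieval partitions in selection order, each a list of vectors.

    Returns (visited, second_sims, probs):
      visited     - indices of the retrieval partitions actually selected
      second_sims - {partition index: list of second similarities}
      probs       - {partition index: probability value}
    """
    visited, second_sims, probs = [], {}, {}
    for idx, vectors in enumerate(partitions):
        sims = [cosine(query, v) for v in vectors]
        visited.append(idx)
        second_sims[idx] = sims
        # Assumption: probability = share of vectors whose similarity exceeds target_sim.
        hits = sum(1 for s in sims if s > target_sim)
        probs[idx] = hits / len(sims) if sims else 0.0
        if probs[idx] > first_threshold:
            break  # terminate early; remaining partitions are never scored
    return visited, second_sims, probs
```

The second similarities of every selected partition are returned so that the output step can reuse them without recomputation.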

Step 604: when the loop execution stops in step 603, the query result is output based on the at least one selected search partition and the vector to be queried. The query result is output based on the at least one selected search partition and the vector to be queried, which may include but is not limited to the following possible ways:

One possible way is to determine the retrieval partition whose probability value is greater than the first preset threshold value in at least one retrieval partition that has been selected. Since the condition for terminating the loop in step 603 is that the probability value is greater than the first preset threshold value, there is only one "retrieval partition whose probability value is greater than the first preset threshold value", which is the target retrieval partition selected last. For example, in the above example, the "retrieval partition whose probability value is greater than the first preset threshold value" is retrieval partition A. Then, the query result of the vector to be queried is retrieved in this retrieval partition. For example, each vector in the retrieval partition is output or fed back to the user as the query result of the vector to be queried. For another example, the W vectors corresponding to the W second similarities ranked first from high to low between the vector to be queried and each vector in the retrieval partition can be output or fed back to the user as the query result.

In the above technical solution, since only one selected retrieval partition has a probability value greater than the first preset threshold, outputting all vectors contained in that retrieval partition as the query result can further reduce the retrieval calculation amount and thus improve the retrieval speed. Alternatively, the second similarities between the vector to be queried and all vectors contained in that retrieval partition are sorted in descending order, and the vectors corresponding to the first W second similarities are output as the query result, which can further simplify the query result.

Another possible way is to retrieve the query result of the vector to be queried from the multiple vectors respectively included in the at least one retrieval partition that has been selected. In step 603, if the probability value of the first selected target retrieval partition is not greater than the first preset threshold, a second target retrieval partition is selected. If the probability value corresponding to the second target retrieval partition is greater than the first preset threshold, no further target retrieval partition is selected. Therefore, the number of "at least one retrieval partition that has been selected" here may be greater than 1. For example, in the previous example, a retrieval partition satisfying "a retrieval partition whose probability value is greater than a first preset threshold value" may be found only after both retrieval partition A and retrieval partition B have been selected; in this case, the query result of the vector to be queried can be retrieved from the multiple vectors respectively included in retrieval partition A and retrieval partition B. For example, the vectors respectively included in the at least one retrieval partition that has been selected can be output as the query result or fed back to the user. For another example, the second similarities between the vector to be queried and the vectors in the at least one retrieval partition that has been selected are sorted from high to low, and the W vectors corresponding to the first W second similarities can be output as the query result or fed back to the user.

For example, in step 603, the first target retrieval partition is first selected as retrieval partition A, and the second similarities between the query vector and each vector in the retrieval partition A are calculated. A probability value is determined according to each second similarity. If the probability value is not greater than the first preset threshold, the next target retrieval partition is selected as retrieval partition B; the second similarities between the query vector and each vector in the retrieval partition B are calculated, and a probability value is determined according to each second similarity. If the probability value is greater than the first preset threshold, the target retrieval partition is no longer selected. Then in step 604, "at least one retrieval partition that has been selected" includes retrieval partition A and retrieval partition B. Since the second similarities between the query vector and each vector in the retrieval partition A and the second similarities between the query vector and each vector in the retrieval partition B have been calculated in step 603, there is no need to repeat the calculation in step 604, so the amount of calculation does not increase, but the second similarities between the query vector and each vector in the retrieval partition A and the second similarities between the query vector and each vector in the retrieval partition B are directly sorted from high to low, and the W vectors corresponding to the W second similarities ranked first are used as the query results.
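Exemplarily, the output step described above can be sketched as follows. The sketch assumes that the second similarities calculated in step 603 were kept per selected retrieval partition as (second similarity, vector identifier) pairs; this data layout and the name `top_w` are hypothetical. No similarity is recalculated in this step.

```python
import heapq


def top_w(selected, w):
    """selected: {partition id: [(second similarity, vector id), ...]} for every
    retrieval partition selected in step 603. Returns the W best pairs overall,
    sorted from high to low similarity."""
    candidates = [pair for pairs in selected.values() for pair in pairs]
    return heapq.nlargest(w, candidates, key=lambda pair: pair[0])


# Hypothetical similarities already computed for retrieval partitions A and B.
selected = {
    "A": [(0.95, "a1"), (0.40, "a2")],
    "B": [(0.97, "b1"), (0.91, "b2")],
}
print(top_w(selected, 3))  # [(0.97, 'b1'), (0.95, 'a1'), (0.91, 'b2')]
```

In this hypothetical data, vector a1 from retrieval partition A ranks second overall even though partition A itself did not exceed the first preset threshold, which is why merging all selected partitions can return more accurate results.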

In the above technical solution, the query result is output based on all of the at least one selected retrieval partition, rather than only on the retrieval partition whose probability value is greater than the first preset threshold. Among the selected retrieval partitions, a partition whose probability value is not greater than the first preset threshold may still contain vectors identical or highly similar to the vector to be queried. Therefore, outputting results based on the already calculated second similarities between the vector to be queried and each vector in all selected retrieval partitions can yield more results and more accurate results, thereby improving the accuracy of vector retrieval.

In a possible implementation, the target retrieval partition may not be selected arbitrarily from the K retrieval partitions, but may be selected according to certain rules. Two methods for ordering the K retrieval partitions, and thus for selecting the target retrieval partition, are described below.

Method 1: sort the K search partitions in descending order of the K first similarities, so that the unselected search partitions can be selected in order from the sorted K search partitions as the target search partitions.

For example, if the first similarity between the query vector and the partition center vector of retrieval partition A is 0.9, the first similarity between the query vector and the partition center vector of retrieval partition B is 0.8, and the first similarity between the query vector and the partition center vector of retrieval partition C is 0.7, then the K retrieval partitions are sorted in the following order: retrieval partition A - retrieval partition B - retrieval partition C. In this way, when selecting the target retrieval partition, it is also selected in this order.

Sorting the K retrieval partitions corresponding to each vector to be queried in a reasonable way, and selecting the target retrieval partition in that order, helps find the "retrieval partition with a probability value greater than the first preset threshold" as early as possible, thereby increasing the speed of vector retrieval and reducing the time consumed. For example, in the above example, if the probability value of retrieval partition A is calculated first, the probability value is likely to be greater than the first preset threshold, so the retrieval can be terminated early. However, if the probability value of retrieval partition B is calculated first, it is likely that no retrieval partition with a probability value greater than the first preset threshold is obtained, so the similarities between the vector to be queried and the vectors in retrieval partition A must additionally be calculated, which increases the amount of calculation and the retrieval time.

Method 2: Cluster the vectors in each search partition according to the similarity between the vectors, and then obtain multiple search sub-partitions, each of which also has a corresponding sub-partition center vector; calculate the third similarity between the query vector and the sub-partition center vectors of the multiple search sub-partitions, and sort the K search partitions according to the multiple third similarities between the query vector and the multiple sub-partition center vectors in each search partition. In this way, the search partitions that have not been selected can be selected in order from the sorted K search partitions as the target search partitions.

FIG. 7 shows a schematic diagram of dividing retrieval partitions into retrieval sub-partitions provided by an embodiment of the present application. As shown in the figure, the vectors in each of three retrieval partitions are clustered; for example, each retrieval partition is divided into 5 retrieval sub-partitions. Of course, different retrieval partitions may be divided into different numbers of retrieval sub-partitions. In FIG. 7, the retrieval partitions are distinguished by solid lines, and the retrieval sub-partitions are distinguished by dotted lines. The five-pointed star in the figure represents the partition center vector of a retrieval partition, and the triangle in the figure represents the sub-partition center vector of a retrieval sub-partition. The embodiment of the present application does not limit the way of clustering the vectors in any retrieval partition, and reference can be made to the method of clustering the vectors in the vector base library to obtain multiple cluster partitions.

When sorting the retrieval partitions, the K retrieval partitions can be sorted according to the maximum similarity among the multiple third similarities between the query vector and the multiple sub-partition center vectors in each retrieval partition. For example, FIG. 8 illustrates the first similarity between the query vector and the partition center vector of each retrieval partition and the third similarity between the query vector and the sub-partition center vector of each retrieval sub-partition. As shown in FIG. 8, there are retrieval partition A, retrieval partition B and retrieval partition C, and each retrieval partition is divided into 5 retrieval sub-partitions. Calculate the five third similarities between the query vector and the sub-partition center vectors of the five retrieval sub-partitions in retrieval partition A, and select the maximum similarity among the five third similarities; calculate the five third similarities between the query vector and the sub-partition center vectors of the five retrieval sub-partitions in retrieval partition B, and select the maximum similarity among the five third similarities; calculate the five third similarities between the query vector and the sub-partition center vectors of the five retrieval sub-partitions in retrieval partition C, and select the maximum similarity among the five third similarities; sort the three maximum similarities in descending order to obtain the corresponding ordering of the three retrieval partitions.

The K retrieval partitions can also be sorted according to the number of third similarities that exceed the second preset threshold value among the multiple third similarities between the query vector and the multiple sub-partition center vectors in each retrieval partition. For example, in FIG8 , the five third similarities between the query vector and the sub-partition center vectors of the five retrieval sub-partitions in the retrieval partition A are calculated, and the number x1 of the third similarities that exceed the second preset threshold value is determined; the five third similarities between the query vector and the sub-partition center vectors of the five retrieval sub-partitions in the retrieval partition B are calculated, and the number x2 of the third similarities that exceed the second preset threshold value is determined; the five third similarities between the query vector and the sub-partition center vectors of the five retrieval sub-partitions in the retrieval partition C are calculated, and the number x3 of the third similarities that exceed the second preset threshold value is determined; x1, x2 and x3 are sorted in order from high to low, and accordingly, the sorting of the three retrieval partitions is obtained.
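Exemplarily, the two ordering rules of method 2 can be sketched as follows. The sketch assumes that the third similarities have already been calculated for each retrieval partition; the similarity values, the second preset threshold of 0.8, and the name `rank_by_sub_partitions` are hypothetical and chosen only for illustration.

```python
def rank_by_sub_partitions(third_sims, mode="max", second_threshold=0.8):
    """third_sims: {partition id: [third similarity per sub-partition center vector]}.

    mode="max"   ranks partitions by their largest third similarity.
    mode="count" ranks partitions by how many third similarities exceed
                 the second preset threshold.
    Returns the partition ids in descending order of the chosen score.
    """
    if mode == "max":
        def score(sims):
            return max(sims)
    elif mode == "count":
        def score(sims):
            return sum(1 for s in sims if s > second_threshold)
    else:
        raise ValueError("mode must be 'max' or 'count'")
    return sorted(third_sims, key=lambda p: score(third_sims[p]), reverse=True)


# Hypothetical third similarities for 5 sub-partitions in each of 3 retrieval partitions.
third_sims = {
    "A": [0.88, 0.85, 0.82, 0.50, 0.40],
    "B": [0.95, 0.40, 0.30, 0.35, 0.20],
    "C": [0.70, 0.60, 0.50, 0.40, 0.30],
}
print(rank_by_sub_partitions(third_sims, mode="max"))    # ['B', 'A', 'C']
print(rank_by_sub_partitions(third_sims, mode="count"))  # ['A', 'B', 'C']
```

The two rules can produce different orderings for the same data, so the choice between them is a design decision that depends on how the vectors are distributed within the retrieval partitions.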

Since the sub-partition center vector obtained by division can more accurately represent the vector in the search sub-partition, the K search partitions are sorted based on the multiple third similarities between the query vector and the multiple sub-partition center vectors in each search partition, which can improve the accuracy of the sorting. In this way, the target search partition with a probability value greater than the first preset threshold can be determined as early as possible, and the possibility of reselecting the target search partition is minimized, so there is no need to calculate the similarity between the query vector and the vector in the target search partition selected again, so the amount of calculation can be reduced and the search speed can be improved.

The five-pointed star in Figure 8 represents the partition center vector of the retrieval partition, the triangle represents the sub-partition center vector of the retrieval sub-partition, and the square represents the vector to be queried. Figure 8 shows the influence of different sorting methods on the sorting of K retrieval partitions. When the three retrieval partitions are sorted in order from high to low according to the three first similarities, the retrieval partitions are sorted according to the order of the size of the three first similarities: retrieval partition A-retrieval partition B-retrieval partition C. Figure 8 shows the three first similarities (the three first similarities are represented by the distance from the square to the three five-pointed stars in Figure 8, and the closer the distance, the higher the similarity).

When the three retrieval partitions are sorted according to the maximum similarity among multiple third similarities between the query vector and multiple sub-partition center vectors in each retrieval partition, the three retrieval partitions are sorted as follows: retrieval partition B-retrieval partition A-retrieval partition C. FIG8 shows the maximum similarity among the third similarities between the query vector and the five sub-partition center vectors in each retrieval partition (represented by the distance from the square to the three triangles in FIG8 , the closer the distance, the higher the similarity).

It can be seen that by dividing the retrieval partitions into finer retrieval sub-partitions, the sorting of the retrieval partitions can be optimized and corrected. In a specific implementation, the retrieval sub-partitions can be further divided, for example, each retrieval sub-partition is further divided into multiple smaller partitions, so that the retrieval accuracy and speed can be further improved. Details are not repeated here.

In a possible implementation, the probability value of the target retrieval partition containing the target vector can also be predicted by a prediction model according to each second similarity. The prediction model can be a single-stage model or a two-stage model.

A possible way to train a prediction model is to use a large amount of labeled sample data to train the prediction model. For example, for any sample data, extract features from the sample data to obtain a sample vector; calculate the M first similarities between the sample vector and the M cluster partitions, and determine K retrieval partitions according to the size of the M first similarities; select a target retrieval partition from the K retrieval partitions, calculate the second similarities between the sample vector and each vector in the target retrieval partition, and select t target second similarities from each second similarity; input the sample vector, the K first similarities between the sample vector and the K retrieval partitions, the t target second similarities, and the label into the prediction model, and the label is the probability value of the target retrieval partition containing the target vector. Through multiple trainings, the parameters of the prediction model can be well optimized and adjusted.
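Exemplarily, the assembly of one labeled training example can be sketched as follows. The embodiment does not specify how the inputs are encoded for the prediction model; the sketch assumes, purely for illustration, that the sample vector, the K first similarities and the t target second similarities are concatenated into a single feature list. The name `build_training_example` is hypothetical.

```python
def build_training_example(sample_vector, first_sims_k, target_second_sims, label):
    """Concatenate the model inputs into one feature list (an assumed encoding)
    and pair it with the label, i.e. the probability value that the target
    retrieval partition contains the target vector."""
    features = list(sample_vector) + list(first_sims_k) + list(target_second_sims)
    return features, float(label)


# Hypothetical values: a 2-dimensional sample vector, K = 3, t = 1, label 0.2.
features, label = build_training_example([0.1, 0.2], [0.9, 0.8, 0.7], [0.95], 0.2)
print(features)  # [0.1, 0.2, 0.9, 0.8, 0.7, 0.95]
```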

Another possible way to train the prediction model is to use a large amount of sample data to train the prediction model, and the parameters of the prediction model are adjusted according to the objective function. The embodiment of the present application does not limit the form of the objective function. Through multiple trainings, the parameters of the prediction model are optimized and adjusted.

If a single-stage model is used, the method of obtaining a probability value according to each second similarity in step 603 can be further refined. FIG. 9 exemplarily shows a method of obtaining a probability value according to each second similarity, which may specifically include the following steps:

Step 901, among the K retrieval partitions, select any unselected retrieval partition as the target retrieval partition; calculate the second similarities between the query vector and each vector in the target retrieval partition. Among the second similarities, determine the t target second similarities, that is, the top t second similarities in descending order.

The method for determining the target retrieval partition for the query vector is the same as described above and will not be repeated here.

The following specifically describes a method for determining the t target second similarities for a query vector.

The target second similarities can be determined in several ways. For example, every second similarity between the vector to be queried and the vectors in the target retrieval partition is used as a target second similarity; for example, the second similarities that satisfy a certain threshold are used as the target second similarities; for example, the top t second similarities after sorting from high to low are used as the target second similarities; for another example, the largest second similarity is used as the target second similarity. The above are only examples, and the embodiment of the present application does not limit the method for determining the target second similarities. The fewer the target second similarities, the smaller the calculation amount of the prediction model and the faster the retrieval. For example, taking only the largest second similarity between the vector to be queried and all the vectors in the target retrieval partition as the target second similarity can reduce the computing power consumed by the prediction model without affecting the accuracy of the final probability value, further improving the speed of vector retrieval.

For example, for the query vector q1, 100 second similarities between q1 and 100 vectors in the search partition A are calculated, and the second similarity with the largest value among the 100 second similarities is used as the target second similarity.

Step 902: input the query vector, K first similarities and t target second similarities into a prediction model to obtain a probability value.

Predicting the probability value with the prediction model improves the accuracy of the probability value, and is faster than determining it without a model. Selecting only the top t second similarities in descending order as the t target second similarities and inputting them into the prediction model also reduces the calculation amount of the prediction model, which further speeds up the prediction of the probability value without affecting the prediction accuracy.
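As a minimal sketch of the selection in step 901, the t target second similarities can be taken as the t largest values. The helper name `target_second_similarities` and the sample values are assumptions for illustration only.

```python
import heapq


def target_second_similarities(second_sims, t=1):
    """Return the t largest second similarities, highest first."""
    return heapq.nlargest(t, second_sims)


# Made-up second similarities between one query vector and five stored vectors.
second_sims = [0.12, 0.87, 0.45, 0.93, 0.30]

top1 = target_second_similarities(second_sims)       # only the maximum
top3 = target_second_similarities(second_sims, t=3)  # three highest values
```

Using `heapq.nlargest` avoids sorting the whole list when t is much smaller than the number of vectors in the target retrieval partition.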

For example, cluster each vector in the vector base to obtain 10 cluster partitions, each of which corresponds to a partition center vector. Calculate the first similarity between the query vector q1 and the partition center vectors of the 10 cluster partitions, select the first K cluster partitions with the highest first similarity as the retrieval partitions, or select K cluster partitions whose first similarity meets the preset threshold as the retrieval partitions. If three retrieval partitions are determined, namely retrieval partition A, retrieval partition B and retrieval partition C, then determine the three first similarities between the query vector q1 and the partition center vectors of the three retrieval partitions. Then determine the target retrieval partition as retrieval partition A, calculate the 10 second similarities between the query vector q1 and the 10 vectors in retrieval partition A; select 5 target second similarities from the 10 second similarities. Input the query vector q1, the 3 first similarities and the 5 target second similarities into the single-stage model, and the single-stage model outputs a probability value. The probability value is used to characterize the probability that the target vector is contained in the retrieval partition A. The probability value can reflect the retrieval accuracy of the current retrieval. If the probability value is high, it means that the current retrieval accuracy is high. If the probability value is low, it means that the current retrieval accuracy is low. If the probability value meets the first preset threshold, for example, the probability value is 0.98, which is greater than the first preset threshold 0.9, it means that the retrieval partition A already contains most of the target vectors, the retrieval accuracy is high, and the retrieval of the query vector can be terminated.

The above method makes an inference judgment on whether to terminate the search for the current query vector in advance, thereby reducing the possibility of continuing to calculate the similarity between the query vector and the vectors in the remaining search partitions, reducing the search calculation amount and improving the speed of vector search.
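The early-termination loop of the single-stage approach can be sketched as follows. This is only an illustration under stated assumptions: the inner product is used as the similarity, and `toy_model` is a hypothetical placeholder for the trained prediction model, not the model of the present application.

```python
import math


def dot(u, v):
    """Inner product, used here as the similarity measure (an assumption)."""
    return sum(a * b for a, b in zip(u, v))


def toy_model(query, first_sims, target_sims):
    """Hypothetical stand-in for the trained single-stage prediction model.

    A real model would use all three inputs; this placeholder only looks at
    the best target second similarity so the control flow can be exercised.
    """
    return 1.0 / (1.0 + math.exp(-20.0 * (max(target_sims) - 0.8)))


def search_with_early_stop(query, partitions, first_sims, model, threshold=0.9, t=1):
    """Scan retrieval partitions in order and stop once the model is confident.

    partitions: lists of vectors, already ordered by descending first similarity.
    Returns the indices of the retrieval partitions that were actually scanned.
    """
    visited = []
    for index, vectors in enumerate(partitions):
        second_sims = [dot(query, v) for v in vectors]
        target_sims = sorted(second_sims, reverse=True)[:t]
        visited.append(index)
        if model(query, first_sims, target_sims) > threshold:
            break  # probability exceeds the first preset threshold
    return visited


partitions = [
    [[0.99, 0.10], [0.50, 0.50]],  # retrieval partition A
    [[0.20, 0.95], [0.40, 0.40]],  # retrieval partition B
    [[0.30, 0.30], [0.10, 0.20]],  # retrieval partition C
]
first_sims = [0.9, 0.8, 0.7]

visited_q1 = search_with_early_stop([1.0, 0.0], partitions, first_sims, toy_model)
visited_q2 = search_with_early_stop([0.0, 1.0], partitions, first_sims, toy_model)
```

With this made-up data the first query stops after retrieval partition A, while the second query also has to scan retrieval partition B before the threshold is exceeded; retrieval partition C is never scanned in either case.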

However, the single-stage model has a limitation: if the method in the above embodiment runs on a hardware accelerator, it can only use matrix-vector multiplication, so the computing power of the hardware accelerator cannot be fully utilized. Specifically, in the above embodiment, the input of the single-stage model is the vector to be queried, the K first similarities, and the t target second similarities. Because different vectors to be queried determine different retrieval partitions, their target retrieval partitions also differ. The second similarities between each vector to be queried and the vectors in its target retrieval partition must therefore be calculated separately and then input into the single-stage model separately, so the single-stage model can only use the matrix-vector multiplication of the hardware accelerator. For example, the retrieval partitions determined for the vector to be queried q1 are retrieval partition A, retrieval partition B and retrieval partition C, and the corresponding target retrieval partition is retrieval partition A; the retrieval partitions determined for the vector to be queried q2 are retrieval partition D, retrieval partition E and retrieval partition F, and the corresponding target retrieval partition is retrieval partition D. The vectors in retrieval partition A and retrieval partition D are different, so the second similarities between q1 and the vectors in retrieval partition A are first calculated by matrix-vector multiplication in the hardware accelerator, and then the second similarities between q2 and the vectors in retrieval partition D are calculated by a separate matrix-vector multiplication.

Therefore, the probability value corresponding to the query vector q1 and the probability value corresponding to the query vector q2 can only be output separately by the single-stage model, and the single-stage model can only calculate each probability value by matrix-vector multiplication. For example, the query vector q1, the three first similarities between q1 and the partition center vectors of its three retrieval partitions, and the t target second similarities between q1 and the vectors in retrieval partition A are input into the single-stage model, which uses the matrix-vector multiplication of the hardware accelerator to output the probability value of q1; then the query vector q2, the three first similarities between q2 and the partition center vectors of its three retrieval partitions, and the t target second similarities between q2 and the vectors in retrieval partition D are input into the single-stage model, which again uses the matrix-vector multiplication of the hardware accelerator to output the probability value of q2.

It can be seen that the single-stage model can only predict one vector to be queried at a time, which means that only the matrix-vector multiplication of the hardware accelerator can be used. The computational efficiency of matrix-vector multiplication on the hardware accelerator is far lower than that of matrix-matrix multiplication, so the computing power of the hardware accelerator is wasted. In addition, because only one vector to be queried is predicted at a time, the prediction time grows further when there are multiple vectors to be queried. If a two-stage model is used, these problems of the single-stage model can be overcome to a certain extent, and the retrieval speed can be further improved on the basis of the single-stage model.

If a two-stage model is adopted, the number of vectors to be queried in step 601 can be N, where N is a positive integer greater than 1. When N is greater than 1, the advantages of the vector retrieval method provided in the embodiment of the present application can be fully utilized to improve the retrieval speed and reduce the retrieval time. The embodiment of the present application does not limit the way to obtain N vectors to be queried. For example, in a batch query, multiple vectors to be queried are obtained in batches at one time, and multiple vectors to be queried are used as input; for example, after obtaining a single vector to be queried (such as user input in an Internet application), the computing device integrates the single vectors to be queried obtained in sequence into multiple vectors to be queried as input. The integration method can adopt various methods well known to those skilled in the art, and the embodiment of the present application does not limit this.

The method for obtaining the M first similarities in step 602 is: according to the N query vectors and the partition center vectors of the M cluster partitions, the matrix-matrix multiplication of the hardware accelerator is used to obtain the M first similarities between each query vector and the M partition center vectors. Among the M first similarities, the top K first similarities in descending order are selected, and the cluster partitions corresponding to these K first similarities are determined as the K retrieval partitions, where K is an integer greater than or equal to 1, and K is less than M.

Each vector in the vector base is clustered according to the similarity between the vectors to obtain M cluster partitions. Each query vector needs to calculate the M first similarities with the partition center vectors of the M cluster partitions. In order to speed up the calculation, the N query vectors can be formed into a matrix, and the M partition center vectors can be formed into a matrix. The matrix multiplication method of the hardware accelerator is used for calculation. In this way, the M first similarities between any query vector in the N query vectors and the M partition center vectors can be quickly obtained.

For example, the vectors to be queried are q1 and q2, and the matrix formed is [q1, q2]; the M partition center vectors are m1, m2, m3, m4, m5, m6, m7, m8, m9 and m10, and the matrix formed is [m1, m2, m3, m4, m5, m6, m7, m8, m9, m10]. Figure 11a shows the matrix of M first similarities between any query vector and M partition center vectors obtained by calculating by matrix multiplication of the hardware accelerator. Among them, s11 represents the first similarity between q1 and m1, s12 represents the first similarity between q1 and m2, and so on, which will not be repeated here.

For any query vector, according to the M first similarities between the query vector and the M partition center vectors, the cluster partitions corresponding to the top K first similarities in descending order are used as the K retrieval partitions of the query vector. For example, three retrieval partitions are determined for each query vector. The retrieval partitions determined for the query vector q1 are retrieval partition A, retrieval partition B, and retrieval partition C; the retrieval partitions determined for the query vector q2 are retrieval partition D, retrieval partition E, and retrieval partition F.
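The batched computation of the first similarities and the selection of K retrieval partitions can be sketched as follows. The plain-Python `matmul` merely stands in for the matrix-matrix multiplication that a hardware accelerator would execute as a single operation; the inner-product similarity and all numeric values are assumptions for illustration.

```python
def matmul(a, b):
    """Plain-Python product of a (n x d) and b (d x m); a stand-in for the
    matrix-matrix multiplication a hardware accelerator would perform."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def top_k_partitions(queries, centers, k):
    """Return the N x M first-similarity matrix and, per query vector, the
    indices of the K cluster partitions with the highest first similarity."""
    sims = matmul(queries, transpose(centers))  # row i: query i vs. all centers
    picks = []
    for row in sims:
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        picks.append(order[:k])
    return sims, picks


queries = [[1.0, 0.0], [0.0, 1.0]]                            # q1, q2
centers = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7], [0.5, 0.2]]    # m1 .. m4

sims, picks = top_k_partitions(queries, centers, k=2)
```

Here `sims[0][1]` plays the role of s12, the first similarity between q1 and m2, and `picks[0]` lists the retrieval partitions chosen for q1.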

The method for obtaining the probability value according to each second similarity in step 603 can be further refined. FIG. 10 exemplarily shows a method for obtaining the probability value according to each second similarity, which can specifically include the following steps:

Step 1001, input a matrix formed by the N query vectors and a matrix formed by the K first similarities corresponding to each of the N query vectors into a first prediction model, and use the matrix-matrix multiplication of the hardware accelerator to obtain N initial probability values corresponding to the N query vectors; the initial probability value represents the probability that the K retrieval partitions corresponding to a query vector contain the target vector of that query vector.

For example, the query vectors are q1 and q2, and the matrix formed is [q1, q2]; the three first similarities corresponding to the three retrieval partitions of the query vector q1 are s11, s12 and s13, respectively, corresponding to retrieval partition A, retrieval partition B and retrieval partition C; the three first similarities corresponding to the three retrieval partitions of the query vector q2 are s24, s25 and s26, respectively, corresponding to retrieval partition D, retrieval partition E and retrieval partition F. Figure 11b shows the matrix formed by the three first similarities corresponding to each of the two query vectors. In this matrix, it is not necessary to pay attention to which retrieval partitions each query vector corresponds to, because the first prediction model only needs to calculate the initial probability value for each query vector and the three first similarities corresponding to each query vector.

For example, two initial probability values p11 and p12 are generated for the query vector q1 and the query vector q2, respectively, where p11 represents the probability that retrieval partition A, retrieval partition B, and retrieval partition C contain the target vector of the query vector q1, and p12 represents the probability that retrieval partition D, retrieval partition E, and retrieval partition F contain the target vector of the query vector q2. The above is only an example.

It can be seen that since the input of the first prediction model is N query vectors and K first similarities corresponding to each of the N query vectors, these features can be input in matrix form. Therefore, the matrix multiplication of the hardware accelerator can be used to obtain N initial probability values corresponding to the N query vectors. In this way, the computing power of the hardware accelerator is fully utilized. Compared with the single-stage model, the computing efficiency can be further improved, the speed of vector retrieval can be increased, and the retrieval time can be reduced.

Step 1002: For any query vector, select any unselected retrieval partition from the K retrieval partitions as a target retrieval partition; determine each second similarity between the query vector and each vector in the target retrieval partition. Among the second similarities, determine the t target second similarities, that is, the top t second similarities in descending order.

For example, for the query vector q1, the target retrieval partition is retrieval partition A; 100 second similarities between q1 and the 100 vectors in retrieval partition A are calculated, and the largest of the 100 second similarities is used as the target second similarity. For the query vector q2, the target retrieval partition is retrieval partition D; 200 second similarities between q2 and the 200 vectors in retrieval partition D are calculated, and the largest of the 200 second similarities is used as the target second similarity.

Step 1003: for any query vector, the initial probability value corresponding to the query vector and t target second similarities corresponding to the query vector are input into the second prediction model to obtain a final probability value corresponding to the query vector.

Since each query vector corresponds to a different target retrieval partition, the t target second similarities corresponding to different query vectors cannot be obtained at the same time, but are calculated separately, as described in step 1002. Therefore, in step 1003, each query vector is processed separately, using the matrix-vector multiplication of the hardware accelerator.

For example, for the query vector q1, its corresponding initial probability value p11 and the target second similarity are input into the second prediction model, and the final probability value p21 is obtained by matrix-vector multiplication of the hardware accelerator. p21 reflects the probability that retrieval partition A contains the target vector of the query vector q1.

For the query vector q2, the corresponding initial probability value p12 and the target second similarity are input into the second prediction model, and the final probability value p22 is obtained by matrix-vector multiplication of the hardware accelerator. p22 reflects the probability that retrieval partition D contains the target vector of the query vector q2.

In the above technical solution, the prediction of the probability value is divided into two stages, the first stage uses the first prediction model, and the second stage uses the second prediction model. Specifically, the matrix formed by the N query vectors and the matrix formed by the K first similarities corresponding to each query vector in the N query vectors are input into the first prediction model, so that the first prediction model can use the matrix multiplication method to predict the initial probability value, which gives full play to the computing power, improves the computing efficiency, and further improves the speed of vector retrieval.
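The batched first stage can be illustrated with a small sketch in which each row of the feature matrix concatenates one query vector with its K first similarities, so that all N initial probability values come out of matrix-matrix products. The one-hidden-layer network, its weights `w1`, `w2` and the bias are hypothetical placeholders; the present application does not limit the structure of the first prediction model.

```python
import math


def matmul(a, b):
    """Plain-Python product of a (n x d) and b (d x m); a stand-in for the
    matrix-matrix multiplication of a hardware accelerator."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def first_stage(queries, first_sims, w1, w2, bias):
    """Return one initial probability value per query vector.

    queries:    N x d matrix of query vectors
    first_sims: N x K matrix of first similarities (one row per query vector)
    w1, w2:     hypothetical weights of a one-hidden-layer placeholder network
    """
    features = [q + s for q, s in zip(queries, first_sims)]    # N x (d + K)
    hidden = [[max(0.0, h) for h in row] for row in matmul(features, w1)]
    logits = matmul(hidden, w2)                                # N x 1
    return [1.0 / (1.0 + math.exp(-(row[0] + bias))) for row in logits]


queries = [[1.0, 0.0], [0.0, 1.0]]              # q1, q2
first_sims = [[0.9, 0.8, 0.7],                  # s11, s12, s13
              [0.5, 0.4, 0.3]]                  # s24, s25, s26 (made-up values)

# Made-up weights: the placeholder network ignores the query coordinates.
w1 = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.5], [1.0, 0.5], [1.0, 0.5]]
w2 = [[1.0], [1.0]]

initial_probs = first_stage(queries, first_sims, w1, w2, bias=-2.5)  # [p11, p12]
```

Because the rows are independent, the identities of the retrieval partitions behind each row do not matter to this stage, which is why the whole batch can be processed at once.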

If a two-stage model is used, after step 1003, the step of determining the probability value may be further refined. FIG. 12 exemplarily shows a method for determining the probability value, which may specifically include the following steps:

Step 1201: If the final probability value is not greater than the first preset threshold, then select the next unselected retrieval partition from the K retrieval partitions as the target retrieval partition.

For example, the first similarities between the query vector q1 and the retrieval partition A, retrieval partition B and retrieval partition C are 0.9, 0.8 and 0.7 respectively, then the target retrieval partition of the query vector q1 is retrieval partition A, and the next target retrieval partition is retrieval partition B.

Step 1202, input the final probability value corresponding to the query vector and the target second similarity between the query vector and each vector in the next target retrieval partition into the second prediction model, and obtain the updated probability value corresponding to the query vector by matrix-vector multiplication of the hardware accelerator.

The method for determining the target second similarity here is the same as the method for determining the target second similarity in the target retrieval partition described above, and will not be repeated here.

For example, the second similarity between the query vector q1 and each vector in the retrieval partition B is calculated, and the largest second similarity is determined as the target second similarity; the final probability value p21 corresponding to the query vector q1 and the target second similarity corresponding to the query vector q1 are input into the second prediction model, and the updated probability value corresponding to the query vector q1 is obtained by matrix-vector multiplication of the hardware accelerator. The updated probability value represents the probability that the target vector is included in all target retrieval partitions selected so far. In this example, the updated probability value represents the probability that the target vector is included in the retrieval partition A and the retrieval partition B.

Step 1203, if the updated probability value is not greater than the first preset threshold, the final probability value in step 1202 is updated to the updated probability value, and the process returns to step 1201 to select the next unselected retrieval partition from the K retrieval partitions as the target retrieval partition.

If the updated probability value is not greater than the first preset threshold, it means that the probability that the target vector is included in all current target retrieval partitions is very low, the retrieval accuracy is not high, and the retrieval should be continued.

If the updated probability value is greater than the first preset threshold, it means that the probability that the target retrieval partitions selected so far contain the target vector is high, the retrieval accuracy is high, and the retrieval should be terminated. Alternatively, when all K retrieval partitions have been traversed, the retrieval should also be terminated to save computing power.

By inputting the final probability value into the second prediction model again, an updated probability value is obtained. If the updated probability value is not greater than the first preset threshold, the final probability value is replaced with the updated probability value, and the cycle is repeated to determine whether to terminate the retrieval. This improves the accuracy of the decision to terminate the retrieval, and therefore the accuracy of vector retrieval.
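The loop of steps 1201 to 1203 can be sketched as follows. The update rule in `toy_second_model` is a hypothetical placeholder chosen only so that the probability grows as more retrieval partitions are scanned; it is not the second prediction model of the present application.

```python
def toy_second_model(previous_prob, target_sim):
    """Hypothetical placeholder for the second prediction model: treats the
    target second similarity (clamped to [0, 1]) as independent evidence."""
    evidence = min(max(target_sim, 0.0), 1.0)
    return 1.0 - (1.0 - previous_prob) * (1.0 - evidence)


def refine_probability(initial_prob, target_sims, model, threshold=0.9):
    """Feed the running probability back into the model, one retrieval
    partition at a time, until it exceeds the first preset threshold or all
    K retrieval partitions have been traversed.

    target_sims: one target second similarity per retrieval partition, in the
    order in which the retrieval partitions are selected.
    Returns (probability, number of retrieval partitions scanned).
    """
    prob = initial_prob
    scanned = 0
    for sim in target_sims:
        prob = model(prob, sim)  # final value first, updated values afterwards
        scanned += 1
        if prob > threshold:
            break
    return prob, scanned


# Made-up numbers: initial probability 0.3, partitions A, B, C in that order.
prob_stop, scanned_stop = refine_probability(0.3, [0.5, 0.8, 0.9], toy_second_model)
prob_all, scanned_all = refine_probability(0.1, [0.1, 0.1, 0.1], toy_second_model)
```

In the first call the threshold is exceeded after the second retrieval partition, so the third is never scanned; in the second call all three retrieval partitions are traversed and the loop ends without exceeding the threshold.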

For ease of understanding, the vector retrieval method provided by the embodiment of the present application is described as a whole below through a specific embodiment. FIG. 13 is a schematic diagram of the overall flow of a vector retrieval method provided by the embodiment of the present application, which may include the following steps.

Step 1301, obtaining the vectors q1 and q2 to be queried.

Step 1302, the matrix formed by the query vectors q1 and q2 and the matrix formed by the partition center vectors of the 10 cluster partitions are multiplied by the matrix-matrix method of the hardware accelerator to obtain 10 first similarities between any query vector and the 10 partition center vectors.

Step 1303, among the 10 first similarities corresponding to the query vector q1, determine the 3 retrieval partitions corresponding to the top 3 values of the first similarities from high to low; among the 10 first similarities corresponding to the query vector q2, determine the 3 retrieval partitions corresponding to the top 3 values of the first similarities from high to low.

For example, the three retrieval partitions corresponding to the query vector q1 are retrieval partition A, retrieval partition B, and retrieval partition C. The three retrieval partitions corresponding to the query vector q2 are retrieval partition D, retrieval partition E, and retrieval partition F.

Step 1304: input the matrix formed by the query vectors q1 and q2 and the matrices formed by the three first similarities corresponding to the query vectors q1 and q2 respectively into the first prediction model.

Step 1305: In the first prediction model, the matrix multiplication method of the hardware accelerator is used to obtain the initial probability values corresponding to the query vectors q1 and q2 respectively.

Step 1306: for the query vector q1, sort the three search partitions according to the magnitude of the first similarity.

For example, the first similarities corresponding to retrieval partition A, retrieval partition B, and retrieval partition C are 0.9, 0.8, and 0.7, respectively.

Step 1307: determine the search partition with the largest first similarity as the target search partition of the query vector q1. For example, determine the search partition A as the i-th target search partition of the query vector q1.

Step 1308, calculating the second similarity between the query vector q1 and each vector in the target retrieval partition, and taking the maximum value of the second similarity as the target second similarity of the target retrieval partition.

Step 1309: input the initial probability value corresponding to the query vector q1 and the target second similarity into the second prediction model, and obtain the final probability value corresponding to the query vector q1 by matrix-vector multiplication of the hardware accelerator.

Step 1310, determine whether the final probability value is greater than a first preset threshold, if so, proceed to step 1311. If not, proceed to step 1312.

Step 1311, terminate the search for the query vector q1. Return the vectors that meet the similarity requirement with the query vector in all current target search partitions as the query results. For example, if the final probability value corresponding to search partition A is 0.98, which is greater than the first preset threshold, then the vectors corresponding to the first W second similarities in search partition A with the query vector are returned as the query results.

Step 1312, selecting the next unselected retrieval partition among the three retrieval partitions as the target retrieval partition.

For example, the final probability value corresponding to the retrieval partition A is 0.58, which is not greater than the first preset threshold, and the retrieval partition B is selected as the target retrieval partition.

Step 1313, calculating the second similarity between the query vector q1 and each vector in the next target retrieval partition, and taking the maximum value of the second similarity as the target second similarity of the next target retrieval partition.

Step 1314, input the final probability value and the target second similarity of the next target retrieval partition into the second prediction model, and use the matrix-vector multiplication of the hardware accelerator to obtain the updated probability value corresponding to the query vector q1.

Step 1315, determining whether the updated probability value is greater than a first preset threshold; if so, proceeding to step 1311; if not, proceeding to step 1316.

Step 1316, update the final probability value in step 1314 to the updated probability value, and return to step 1312.

For the vector q2 to be queried, please refer to the processing steps for the vector q1 to be queried in the above steps 1306 to 1316 to determine the query result of the vector q2 to be queried, which will not be repeated here.
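The overall flow of steps 1301 to 1316 can be condensed into the following sketch. It assumes the inner product as similarity, non-empty retrieval partitions, and two hypothetical placeholder models (`toy_first_model`, `toy_second_model`); on a hardware accelerator, steps 1302 to 1305 would be batched over all query vectors rather than looped as here.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def toy_first_model(query, first_sims):
    """Hypothetical placeholder for the first prediction model."""
    return 1.0 / (1.0 + math.exp(-(max(first_sims) - 1.0)))


def toy_second_model(previous_prob, target_sim):
    """Hypothetical placeholder for the second prediction model."""
    evidence = min(max(target_sim, 0.0), 1.0)
    return 1.0 - (1.0 - previous_prob) * (1.0 - evidence)


def search(queries, centers, partitions, k=3, w=2, threshold=0.9):
    """Return, per query vector, the top-w vectors from the scanned partitions."""
    results = []
    for q in queries:
        # Steps 1302-1303: first similarities and top-k retrieval partitions.
        first = [dot(q, c) for c in centers]
        order = sorted(range(len(centers)), key=lambda j: first[j], reverse=True)[:k]
        # Steps 1304-1305: initial probability value.
        prob = toy_first_model(q, [first[j] for j in order])
        scored = []
        # Steps 1306-1316: scan partitions by descending first similarity.
        for j in order:
            sims = [(dot(q, v), v) for v in partitions[j]]
            scored.extend(sims)
            prob = toy_second_model(prob, max(s for s, _ in sims))
            if prob > threshold:
                break  # step 1311: terminate the search for this query
        scored.sort(key=lambda pair: pair[0], reverse=True)
        results.append([v for _, v in scored[:w]])
    return results


centers = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
partitions = [
    [[0.95, 0.05], [0.90, 0.20]],
    [[0.05, 0.95], [0.10, 0.80]],
    [[0.60, 0.60], [0.70, 0.50]],
]
results = search([[1.0, 0.0], [0.0, 1.0]], centers, partitions, k=2, w=1)
```

With this made-up data each query vector terminates after scanning its first retrieval partition, so only two of the six stored vectors are compared against each query.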

It should be noted that the steps in the above method embodiments are all described by taking the execution of the computing device 20 as an example. In addition, the steps in the above method embodiments can also be executed by the processor 201 in the computing device 20.

Based on the above content and the same technical concept, an embodiment of the present application provides a vector retrieval device, as shown in Figure 14, the vector retrieval device includes an acquisition unit 1401 and a processing unit 1402. The vector retrieval device is used to execute the method embodiments shown in Figure 5a, Figure 6, Figure 9, Figure 10, Figure 12 or Figure 13 above.

When the vector retrieval device is used to implement the function in the method embodiment shown in FIG. 13, the acquisition unit 1401 is used to acquire the vector to be queried; the processing unit 1402 is used to: perform similarity calculations on the vector to be queried and the partition center vectors of M cluster partitions respectively, to obtain M first similarities, where the M cluster partitions are obtained by clustering the vectors in the vector base according to the similarities between the vectors, the partition center vector of any cluster partition is determined based on the multiple vectors contained in that cluster partition, and M is an integer greater than 1; among the M first similarities, select the top K first similarities in descending order, and determine the cluster partitions corresponding to the K first similarities as K retrieval partitions, where K is an integer greater than or equal to 1 and K is less than M; perform the following operations in a loop until it is determined that the probability value of the target retrieval partition selected from the K retrieval partitions containing the target vector is greater than a first preset threshold, where the target vector is a vector whose similarity with the vector to be queried is within a preset range: select a retrieval partition that has not been selected from the K retrieval partitions as the target retrieval partition; calculate a second similarity between the vector to be queried and each vector contained in the target retrieval partition; and determine, according to each second similarity, the probability value of the target retrieval partition containing the target vector; and output a query result based on at least one retrieval partition that has been selected and the vector to be queried.

In a possible implementation, when the processing unit 1402 outputs the query result based on at least one selected retrieval partition and the vector to be queried, it is specifically used to: output each vector contained in the retrieval partition whose probability value is greater than the first preset threshold in the at least one selected retrieval partition as the query result; or, according to the order of the second similarities between each vector contained in the retrieval partition whose probability value is greater than the first preset threshold in the at least one selected retrieval partition and the vector to be queried from high to low, output the vectors corresponding to the first W second similarities as the query result, where W is a positive integer.

In a possible implementation, when the processing unit 1402 outputs the query result based on the at least one selected retrieval partition and the vector to be queried, it is specifically configured to: output the vectors respectively included in the at least one selected retrieval partition as the query result; or sort the second similarities between the vectors respectively included in the at least one selected retrieval partition and the vector to be queried from high to low, and output the vectors corresponding to the top W second similarities as the query result, where W is a positive integer.
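The second output option (top W vectors across all selected retrieval partitions) can be sketched as follows, again assuming the inner product as the similarity; the function name `top_w` and the sample vectors are illustrative only.

```python
import heapq


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def top_w(query, selected_partitions, w):
    """Return the w vectors with the highest second similarity to the query,
    taken across every retrieval partition that has been selected."""
    candidates = (v for partition in selected_partitions for v in partition)
    return heapq.nlargest(w, candidates, key=lambda v: dot(query, v))


selected = [
    [[0.9, 0.1], [0.2, 0.8]],  # vectors of the first selected partition
    [[0.7, 0.3]],              # vectors of the second selected partition
]
answer = top_w([1.0, 0.0], selected, w=2)
```

Setting w to the total number of candidate vectors reproduces the first output option, in which every vector of the selected retrieval partitions is returned.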

In a possible implementation, when the processing unit 1402 selects an unselected retrieval partition as a target retrieval partition from the K retrieval partitions, it is specifically configured to: select an unselected retrieval partition as a target retrieval partition from the K retrieval partitions in descending order of the K first similarities.

In one possible implementation, when the processing unit 1402 selects a retrieval partition that has not been selected as a target retrieval partition from K retrieval partitions, it is specifically used to: for any retrieval partition from the K retrieval partitions, cluster each vector in the retrieval partition according to the similarity between the vectors to obtain multiple retrieval sub-partitions; determine the sub-partition center vector of any retrieval sub-partition based on the multiple vectors contained in any retrieval sub-partition; calculate the third similarities between the vector to be queried and the sub-partition center vectors of the multiple retrieval sub-partitions; sort the K retrieval partitions based on the multiple third similarities between the vector to be queried and the multiple sub-partition center vectors in each retrieval partition; and select a retrieval partition that has not been selected as the target retrieval partition from the sorted K retrieval partitions.

In a possible implementation, when the processing unit 1402 sorts the K retrieval partitions according to the multiple third similarities between the query vector and the multiple sub-partition center vectors in each retrieval partition, it is specifically used to: sort the K retrieval partitions according to the number of third similarities that exceed a second preset threshold among the multiple third similarities between the query vector and the multiple sub-partition center vectors in each retrieval partition; or sort the K retrieval partitions according to the maximum similarity among the multiple third similarities between the query vector and the multiple sub-partition center vectors in each retrieval partition.
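The two sorting criteria can be illustrated with the following sketch. It assumes the inner product as the similarity measure, a descending sort order, and at least one sub-partition per retrieval partition; the function name `rank_partitions` and the dictionary layout are hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product, used here as the similarity measure (an assumption)."""
    return sum(x * y for x, y in zip(a, b))


def rank_partitions(
    query: Vector,
    sub_centers: Dict[Hashable, List[Vector]],
    second_threshold: Optional[float] = None,
) -> List[Hashable]:
    """Order retrieval partitions by their sub-partition center similarities."""

    def key(pid: Hashable) -> float:
        # Third similarities: the query against each sub-partition center.
        third = [dot(query, c) for c in sub_centers[pid]]
        if second_threshold is not None:
            # Criterion 1: how many third similarities exceed the threshold.
            return sum(1 for s in third if s > second_threshold)
        # Criterion 2: the maximum third similarity.
        return max(third)

    # Descending order is an assumption; ties keep their original order.
    return sorted(sub_centers, key=key, reverse=True)
```

The two criteria can disagree: a partition with one very close sub-partition center ranks first under the maximum criterion, while a partition with several moderately close centers ranks first under the count criterion.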

In one possible implementation, when the processing unit 1402 determines, based on the second similarities, the probability value that the target retrieval partition contains the target vector, it is specifically used to: determine, from the second similarities, the top t target second similarities in descending order; and input the vector to be queried, the K first similarities and the t target second similarities into the prediction model to obtain the probability value, where the prediction model is used to predict the probability value that the target retrieval partition contains the target vector.
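One way to picture the inputs of the prediction model is as a single feature list. The sketch below is an assumption-laden illustration: the embodiments do not specify the model form, so a logistic model is swapped in purely as a placeholder, and `build_features`, `predict_probability` and the padding value for partitions holding fewer than t vectors are hypothetical.

```python
import heapq
import math
from typing import List, Sequence


def build_features(
    query: Sequence[float],
    first_sims: Sequence[float],
    second_sims: Sequence[float],
    t: int,
    pad: float = 0.0,
) -> List[float]:
    # The t highest second similarities, in descending order.
    top_t = heapq.nlargest(t, second_sims)
    # Pad when the partition holds fewer than t vectors (an assumption).
    top_t += [pad] * (t - len(top_t))
    # Model input: query vector, K first similarities, t second similarities.
    return list(query) + list(first_sims) + top_t


def predict_probability(
    features: Sequence[float], weights: Sequence[float], bias: float
) -> float:
    """Placeholder logistic model; the real prediction model is unspecified."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

The returned value would then be compared with the first preset threshold to decide whether the loop over the retrieval partitions can stop.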

In a possible implementation, there are N vectors to be queried, where N is a positive integer greater than 1; when the processing unit 1402 inputs the vector to be queried, the K first similarities and the t target second similarities into the prediction model to obtain the probability value, it is specifically used to: input a matrix formed by the N vectors to be queried and a matrix formed by the K first similarities corresponding to each of the N vectors to be queried into the first prediction model to obtain N initial probability values corresponding to the N vectors to be queried, where the initial probability value is used to characterize the probability that the K retrieval partitions corresponding to any one of the vectors to be queried contain the target vector of that vector to be queried; and, for any one of the vectors to be queried, input the initial probability value corresponding to the vector to be queried and the t target second similarities corresponding to the vector to be queried into the second prediction model to obtain the final probability value corresponding to the vector to be queried.
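The two-stage arrangement for a batch of N vectors to be queried can be sketched as follows. Both stages are shown as placeholder logistic models, which is an assumption made only for illustration; the function names `first_stage` and `second_stage` and all weight parameters are hypothetical and do not come from the embodiments.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def first_stage(
    queries: List[Sequence[float]],     # N x d matrix of vectors to be queried
    first_sims: List[Sequence[float]],  # N x K matrix of first similarities
    w_query: Sequence[float],
    w_sims: Sequence[float],
    bias: float,
) -> List[float]:
    # One initial probability value per vector to be queried.
    return [
        sigmoid(dot(q, w_query) + dot(s, w_sims) + bias)
        for q, s in zip(queries, first_sims)
    ]


def second_stage(
    initial_prob: float,
    top_second: Sequence[float],  # t target second similarities of one query
    w_prob: float,
    w_second: Sequence[float],
    bias: float,
) -> float:
    # Final probability value for one vector to be queried.
    return sigmoid(w_prob * initial_prob + dot(top_second, w_second) + bias)
```

In this sketch the first stage runs once for the whole batch, since it depends only on the queries and their first similarities, while the second stage runs per query each time a new target retrieval partition yields fresh second similarities.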

Based on the above content and the same technical concept, an embodiment of the present application also provides a computer-readable storage medium, on which a computer program or instruction is stored. When the computer program or instruction is executed, the computer executes the method in the above method embodiment.

Based on the above content and the same technical concept, an embodiment of the present application provides a computer program product. When a computer reads and executes the computer program product, the computer executes the method in the above method embodiment.

It can be understood that the various numerical designations involved in the embodiments of the present application are used only for ease of description and are not intended to limit the scope of the embodiments of the present application. The magnitude of the sequence numbers of the above processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic.

Obviously, those skilled in the art can make various changes and modifications to the present application without departing from the scope of protection of the present application. Thus, if these modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to include these modifications and variations.

Claims

A vector search method, characterized by comprising:

Acquiring a vector to be queried;

Calculate similarity between the query vector and the partition center vectors of M cluster partitions respectively to obtain M first similarities; the M cluster partitions are obtained by clustering the vectors in the vector base according to the similarity between the vectors; the partition center vector of any of the cluster partitions is determined according to a plurality of vectors contained in any of the cluster partitions, and M is an integer greater than 1;

Among the M first similarities, select K first similarities ranked first in descending order, and determine the cluster partitions corresponding to the K first similarities as K search partitions, where K is an integer greater than or equal to 1, and K is less than M;

The following operations are performed in a loop until it is determined that the probability value of the target search partition selected from the K search partitions containing the target vector is greater than a first preset threshold, wherein the target vector is a vector whose similarity with the query vector is within a preset range:

Selecting a retrieval partition that has not been selected from the K retrieval partitions as a target retrieval partition;

Calculating second similarities between the query vector and each vector included in the target retrieval partition;

Determining, according to each of the second similarities, a probability value of including a target vector in the target retrieval partition;

Based on the at least one selected retrieval partition and the vector to be queried, a query result is output.
The method according to claim 1, characterized in that outputting the query result based on the at least one selected search partition and the vector to be queried comprises:

Outputting each vector contained in the retrieval partition whose probability value is greater than the first preset threshold in the at least one selected retrieval partition as a query result; or

According to the order, from high to low, of the second similarities between the vectors contained in the retrieval partition whose probability value is greater than the first preset threshold in the at least one selected retrieval partition and the vector to be queried, the vectors corresponding to the first W second similarities are output as query results, where W is a positive integer.
The method according to claim 1, characterized in that outputting the query result based on the at least one selected search partition and the vector to be queried comprises:

Outputting the vectors respectively contained in the at least one selected search partition as query results; or

According to the descending order of the second similarities between the vectors respectively contained in the at least one selected retrieval partition and the vector to be queried, the first W vectors corresponding to the second similarities are output as query results, where W is a positive integer.
The method according to claim 1, characterized in that selecting a retrieval partition that has not been selected as a target retrieval partition from the K retrieval partitions comprises:

A retrieval partition that has not been selected is selected from the K retrieval partitions as a target retrieval partition in a descending order of the K first similarities.
The method according to claim 1, characterized in that selecting a retrieval partition that has not been selected as a target retrieval partition from the K retrieval partitions comprises:

For any retrieval partition among the K retrieval partitions, clustering the vectors in the retrieval partition according to the similarity between the vectors to obtain a plurality of retrieval sub-partitions; determining a sub-partition center vector of any retrieval sub-partition according to the plurality of vectors contained in any retrieval sub-partition;

Calculating third similarities between the query vector and the sub-partition center vectors of the plurality of retrieval sub-partitions respectively;

sorting the K search partitions according to a plurality of third similarities between the query vector and a plurality of sub-partition center vectors in each search partition;

A retrieval partition that has not been selected is selected from the sorted K retrieval partitions as the target retrieval partition.
The method of claim 5, wherein the step of sorting the K search partitions according to a plurality of third similarities between the query vector and a plurality of sub-partition center vectors in each search partition comprises:

sorting the K retrieval partitions according to the number of third similarities that exceed a second preset threshold among the multiple third similarities between the query vector and the multiple sub-partition center vectors in each retrieval partition; or

The K retrieval partitions are sorted according to the maximum similarity among multiple third similarities between the query vector and multiple sub-partition center vectors in each retrieval partition.
The method according to claim 1, characterized in that determining, according to each of the second similarities, a probability value that the target vector is included in the target retrieval partition comprises:

Among the second similarities, determine t target second similarities that are ranked first in descending order of the second similarities;

The query vector, the K first similarities and the t target second similarities are input into a prediction model to obtain the probability value; the prediction model is used to predict the probability value of the target vector being included in the target retrieval partition.
The method according to claim 7, wherein the number of vectors to be queried is N, and N is a positive integer greater than 1;

Inputting the query vector, the K first similarities and the t target second similarities into a prediction model to obtain the probability value includes:

Inputting a matrix formed by N query vectors and a matrix formed by the K first similarities corresponding to each of the N query vectors into a first prediction model to obtain N initial probability values corresponding to the N query vectors; the initial probability values are used to represent the probability that a target vector of the query vector is included in the K search partitions corresponding to any query vector;

For any vector to be queried, the initial probability value corresponding to the vector to be queried and the t target second similarities corresponding to the vector to be queried are input into the second prediction model to obtain a final probability value corresponding to the vector to be queried.
A vector search device, characterized by comprising:

An acquisition unit, used for acquiring a vector to be queried;

A processing unit for:

Calculate similarity between the query vector and the partition center vectors of M cluster partitions respectively to obtain M first similarities; the M cluster partitions are obtained by clustering the vectors in the vector base according to the similarity between the vectors; the partition center vector of any of the cluster partitions is determined according to a plurality of vectors contained in any of the cluster partitions, and M is an integer greater than 1;

Among the M first similarities, select K first similarities ranked first in descending order, and determine the cluster partitions corresponding to the K first similarities as K search partitions, where K is an integer greater than or equal to 1, and K is less than M;

The following operations are performed in a loop until it is determined that the probability value of the target search partition selected from the K search partitions containing the target vector is greater than a first preset threshold, wherein the target vector is a vector whose similarity with the query vector is within a preset range:

Selecting a retrieval partition that has not been selected from the K retrieval partitions as a target retrieval partition;

Calculating second similarities between the query vector and each vector included in the target retrieval partition;

Determining, according to each of the second similarities, a probability value of including a target vector in the target retrieval partition;

Based on the at least one selected retrieval partition and the vector to be queried, a query result is output.
The apparatus according to claim 9, wherein when the processing unit outputs the query result based on the at least one selected retrieval partition and the vector to be queried, it is specifically configured to:

Outputting each vector contained in the retrieval partition whose probability value is greater than the first preset threshold in the at least one selected retrieval partition as a query result; or

According to the order from high to low of the second similarities between the vectors contained in the retrieval partition whose probability value is greater than the first preset threshold in at least one retrieval partition that has been selected and the vector to be queried, the vectors corresponding to the first W second similarities are output as query results, where W is a positive integer.
The apparatus according to claim 9, wherein when the processing unit outputs the query result based on the at least one selected retrieval partition and the vector to be queried, it is specifically configured to:

Outputting the vectors respectively contained in the at least one selected search partition as query results; or

According to the descending order of the second similarities between the vectors respectively contained in the at least one selected retrieval partition and the vector to be queried, the first W vectors corresponding to the second similarities are output as query results, where W is a positive integer.
The device according to claim 9, characterized in that when the processing unit selects a retrieval partition that has not been selected as the target retrieval partition among the K retrieval partitions, it is specifically used to:

A retrieval partition that has not been selected is selected from the K retrieval partitions as a target retrieval partition in the order of the K first similarities from high to low.
The device according to claim 9, characterized in that when the processing unit selects a retrieval partition that has not been selected as the target retrieval partition among the K retrieval partitions, it is specifically used to:

For any retrieval partition among the K retrieval partitions, clustering the vectors in the retrieval partition according to the similarity between the vectors to obtain a plurality of retrieval sub-partitions; determining a sub-partition center vector of any retrieval sub-partition according to the plurality of vectors contained in any retrieval sub-partition;

Calculating third similarities between the query vector and the sub-partition center vectors of the plurality of retrieval sub-partitions respectively;

sorting the K search partitions according to a plurality of third similarities between the query vector and a plurality of sub-partition center vectors in each search partition;

A retrieval partition that has not been selected is selected from the sorted K retrieval partitions as the target retrieval partition.
The apparatus according to claim 9, wherein when the processing unit sorts the K search partitions according to multiple third similarities between the query vector and multiple sub-partition center vectors in each search partition, it is specifically configured to:

sorting the K retrieval partitions according to the number of third similarities that exceed a second preset threshold among the multiple third similarities between the query vector and the multiple sub-partition center vectors in each retrieval partition; or

The K retrieval partitions are sorted according to the maximum similarity among multiple third similarities between the query vector and multiple sub-partition center vectors in each retrieval partition.
The device according to claim 9, characterized in that when the processing unit determines the probability value of the target retrieval partition containing the target vector according to each of the second similarities, it is specifically used to:

Among the second similarities, determine t target second similarities that are ranked first in descending order;

The query vector, the K first similarities and the t target second similarities are input into a prediction model to obtain the probability value; the prediction model is used to predict the probability value of the target vector being included in the target retrieval partition.
The device according to claim 15, wherein the number of vectors to be queried is N, and N is a positive integer greater than 1;

When the processing unit inputs the query vector, the K first similarities and the t target second similarities into the prediction model to obtain the probability value, it is specifically used to:

Inputting a matrix formed by N query vectors and a matrix formed by the K first similarities corresponding to each of the N query vectors into a first prediction model to obtain N initial probability values corresponding to the N query vectors; the initial probability values are used to represent the probability that a target vector of the query vector is included in the K search partitions corresponding to any query vector;

For any vector to be queried, the initial probability value corresponding to the vector to be queried and the t target second similarities corresponding to the vector to be queried are input into the second prediction model to obtain a final probability value corresponding to the vector to be queried.
A computer-readable storage medium, characterized in that a computer program or instruction is stored in the computer-readable storage medium, and when the computer program or instruction is executed by a vector retrieval device, the method as described in any one of claims 1 to 8 is implemented.
A chip, characterized in that it includes at least one processor and an interface; the interface is used to provide program instructions or data to the at least one processor; the at least one processor is used to execute the program instructions to implement the method as described in any one of claims 1 to 8.