WO2024067593A1 - Vector retrieval method and device - Google Patents

Vector retrieval method and device

Info

Publication number
WO2024067593A1
Authority
WO
WIPO (PCT)
Prior art keywords
retrieval
vector
partition
similarities
vectors
Prior art date
Application number
PCT/CN2023/121585
Other languages
French (fr)
Chinese (zh)
Inventor
邝达
施佩珍
王兵
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2024067593A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/90335Query processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/906Clustering; Classification

Definitions

  • the present application relates to the field of search technology, and in particular to a vector search method and device.
  • Vector retrieval plays an important role in the field of information retrieval.
  • the process of vector retrieval is to first construct a vector base library, which contains a large number of vectors obtained by feature extraction of a large amount of data.
  • the data can be in the form of pictures, videos, audio, text, etc.
  • the similarity between the query vector input by the user and every vector in the vector base library is calculated, the similarities are ranked from high to low, and the vectors corresponding to the first W similarities are taken as the query result of the query vector.
  • This method performs a global, exhaustive comparison against a vector base library that may contain hundreds of millions or even billions of vectors, so its retrieval throughput (queries per second, QPS) is low and retrieval is slow.
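The exhaustive search described above can be sketched as follows. This is only an illustration: the inner-product similarity and the names `dot` and `brute_force_top_w` are assumptions, since the text does not fix a metric or an interface.

```python
import heapq

def dot(a, b):
    """Inner-product similarity (an assumed metric; the text does not fix one)."""
    return sum(x * y for x, y in zip(a, b))

def brute_force_top_w(query, base, w):
    """Score every base vector against the query and return the indices of the W best."""
    scored = ((dot(query, v), i) for i, v in enumerate(base))
    return [i for _, i in heapq.nlargest(w, scored)]

# Toy vector base library with four 2-dimensional vectors.
base = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
top = brute_force_top_w([1.0, 0.0], base, w=2)
```

Every query costs one similarity computation per stored vector, which is why throughput drops as the base library grows.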
  • the present application provides a vector search method and device, which are used to solve the problem of low search speed in existing vector search methods.
  • the present application provides a vector retrieval method, which can be specifically executed by a computing device or by a chip inside the computing device, or by a processor in the computing device.
  • the method includes: obtaining a vector to be queried;
  • the M cluster partitions are obtained by clustering the vectors in the vector base library according to the similarity between the vectors; the partition center vector of any cluster partition is determined according to the multiple vectors contained in that cluster partition, and M is an integer greater than 1; among the M first similarities, the K highest first similarities are selected, and the cluster partitions corresponding to these K first similarities are determined as K retrieval partitions, where K is an integer greater than or equal to 1 and less than M;
  • the following operations are performed in a loop until the probability value that the target retrieval partition selected from the K retrieval partitions contains the target vector is determined to be greater than a first preset threshold, wherein the target vector is a vector whose similarity with the vector to be queried is within a preset range:
  • a query result is output.
  • the vectors in the vector base library are clustered to obtain M cluster partitions, each of which corresponds to a partition center vector; the first similarities between the vector to be queried and the partition center vectors of the M cluster partitions are calculated, and K retrieval partitions are selected from the M cluster partitions according to the ranking of the M first similarities; target retrieval partitions are then selected one at a time from the K retrieval partitions, and for each selected target retrieval partition the probability value that a vector identical or similar to the vector to be queried falls in that partition is determined, until a target retrieval partition with a probability value greater than the first preset threshold has been selected. The query result of the vector to be queried is then determined from the at least one selected target retrieval partition. Because only the selected partitions are searched, the amount of calculation is reduced and the query speed is improved.
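The flow just described can be sketched as follows. This is a minimal illustration, not the application's implementation: the inner-product similarity, the function names, and the `predict_prob` callable (a stand-in for the prediction model) are assumptions, and the output step shown is the variant that returns the top W vectors from all visited partitions.

```python
def dot(a, b):
    """Inner-product similarity (assumed metric)."""
    return sum(x * y for x, y in zip(a, b))

def search(query, centers, partitions, k, threshold, predict_prob, w):
    """
    centers[i]    : partition center vector of cluster partition i
    partitions[i] : list of vectors stored in cluster partition i
    predict_prob  : hypothetical callable(query, first_sims, second_sims) -> float
    Returns (ids of visited partitions, top-W vectors found in them).
    """
    # First similarities: query against each of the M partition centers.
    first = [dot(query, c) for c in centers]
    # The K retrieval partitions are those with the K highest first similarities.
    order = sorted(range(len(centers)), key=lambda i: first[i], reverse=True)[:k]
    first_k = [first[i] for i in order]

    visited, candidates = [], []
    for pid in order:  # visit partitions in descending order of first similarity
        second = [dot(query, v) for v in partitions[pid]]  # second similarities
        candidates.extend(zip(second, partitions[pid]))
        visited.append(pid)
        # Stop once the model says the target vector is likely in this partition.
        if predict_prob(query, first_k, second) > threshold:
            break
    candidates.sort(key=lambda sv: sv[0], reverse=True)
    return visited, [v for _, v in candidates[:w]]

# Toy data: three cluster partitions in two dimensions.
centers = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
partitions = [[[1.0, 0.1], [0.9, -0.1]], [[0.1, 1.0], [0.0, 0.9]], [[-1.0, 0.0]]]
# Stand-in "model": confident only when some second similarity is at least 0.95.
stub_model = lambda q, first_sims, second_sims: 1.0 if max(second_sims) >= 0.95 else 0.0
visited, result = search([1.0, 0.0], centers, partitions, k=2,
                         threshold=0.5, predict_prob=stub_model, w=1)
```

In this toy run the first partition already exceeds the threshold, so the second of the K partitions is never scanned.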
  • the query result is output, including: outputting each vector contained in the retrieval partition whose probability value is greater than the first preset threshold in the at least one selected retrieval partition as the query result; or, according to the order from high to low of the second similarities between each vector contained in the retrieval partition whose probability value is greater than the first preset threshold in the at least one selected retrieval partition and the vector to be queried, outputting the vectors corresponding to the first W second similarities as the query result, where W is a positive integer.
  • all vectors contained in the retrieval partition whose probability value is greater than the first preset threshold are output as the query result, which can effectively reduce the amount of calculation and improve the retrieval speed.
  • alternatively, the vectors contained in the retrieval partition whose probability value is greater than the first preset threshold are ranked from high to low by their second similarity with the vector to be queried, and the vectors corresponding to the first W second similarities are output as the query result, which further simplifies the output result.
  • the query result is output, including: outputting the vectors respectively contained in the at least one selected retrieval partition as the query result; or according to the order from high to low of the second similarities between the vectors respectively contained in the at least one selected retrieval partition and the vector to be queried, outputting the vectors corresponding to the first W second similarities as the query result, where W is a positive integer.
  • the query result is output based on all of the at least one selected retrieval partition, rather than only on the retrieval partition whose probability value is greater than the first preset threshold. A selected retrieval partition whose probability value does not exceed the first preset threshold may still contain vectors with a high similarity to the vector to be queried, so this solution can improve the accuracy of vector retrieval.
  • selecting an unselected retrieval partition as the target retrieval partition from the K retrieval partitions may be performed by selecting an unselected retrieval partition as the target retrieval partition from the K retrieval partitions in a descending order of the K first similarities.
  • the target retrieval partition with a probability value greater than the first preset threshold can be determined as early as possible, and the possibility of re-selecting the target retrieval partition is minimized. There is no need to calculate the similarity between the query vector and the vector in the target retrieval partition selected again, thereby reducing the amount of calculation and improving the retrieval speed.
  • selecting a retrieval partition that has not been selected as a target retrieval partition from among the K retrieval partitions may also be as follows: for any retrieval partition among the K retrieval partitions, clustering each vector in the retrieval partition according to the similarity between the vectors to obtain a plurality of retrieval sub-partitions; determining a sub-partition center vector of any retrieval sub-partition according to a plurality of vectors contained in any retrieval sub-partition; calculating a third similarity between the vector to be queried and the sub-partition center vectors of the plurality of retrieval sub-partitions; sorting the K retrieval partitions according to a plurality of third similarities between the vector to be queried and the plurality of sub-partition center vectors in each retrieval partition; and selecting a retrieval partition that has not been selected as the target retrieval partition from among the sorted K retrieval partitions.
  • each retrieval sub-partition corresponds to a sub-partition center vector. Due to the more detailed division, the sub-partition center vector obtained by the division can more accurately represent the vector in the retrieval sub-partition. Based on the multiple third similarities between the vector to be queried and the multiple sub-partition center vectors in each retrieval partition, the K retrieval partitions are sorted, and the accuracy of the sorting can be improved.
  • the target retrieval partition with a probability value greater than the first preset threshold can be determined as early as possible, and the possibility of re-selecting the target retrieval partition is minimized, so there is no need to calculate the similarity between the vector to be queried and the vector in the target retrieval partition selected again, so the amount of calculation can be reduced and the retrieval speed can be improved.
  • the K retrieval partitions are sorted according to multiple third similarities between the vector to be queried and multiple sub-partition center vectors in each retrieval partition, including: sorting the K retrieval partitions according to the number of third similarities between the vector to be queried and the multiple sub-partition center vectors in each retrieval partition that exceed a second preset threshold; or sorting the K retrieval partitions according to the maximum similarity among the multiple third similarities between the vector to be queried and the multiple sub-partition center vectors in each retrieval partition.
  • the sorting difficulty can be reduced, the sorting speed can be increased, and the search speed can be increased.
  • the accuracy of the sorting can also be improved. In this way, the target search partition with a probability value greater than the first preset threshold can be determined as early as possible.
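The two sorting rules above (count of third similarities above the second preset threshold, or the maximum third similarity) can be sketched as follows. The inner-product similarity and the names `rank_partitions`, `mode` and `second_threshold` are assumptions made for illustration.

```python
def dot(a, b):
    """Inner-product similarity (assumed metric)."""
    return sum(x * y for x, y in zip(a, b))

def rank_partitions(query, sub_centers, mode="max", second_threshold=0.0):
    """
    sub_centers[p] : sub-partition center vectors of retrieval partition p
    mode "count"   : rank by how many third similarities exceed second_threshold
    mode "max"     : rank by the largest third similarity
    Returns partition indices, most promising first.
    """
    def key(p):
        third = [dot(query, c) for c in sub_centers[p]]  # third similarities
        if mode == "count":
            return sum(s > second_threshold for s in third)
        return max(third)
    return sorted(range(len(sub_centers)), key=key, reverse=True)

query = [1.0, 0.0]
sub_centers = [
    [[0.9, 0.0], [0.1, 0.0]],              # partition 0: one very close sub-center
    [[0.6, 0.0], [0.7, 0.0], [0.65, 0.0]], # partition 1: several moderately close ones
]
by_max = rank_partitions(query, sub_centers, mode="max")
by_count = rank_partitions(query, sub_centers, mode="count", second_threshold=0.5)
```

The two rules can disagree, as in this toy data: the maximum rule prefers partition 0, while the count rule prefers partition 1.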
  • the probability value of the target vector being included in the target retrieval partition is determined based on the second similarities, including: determining, among the second similarities, the t highest second similarities as t target second similarities; inputting the vector to be queried, the K first similarities and the t target second similarities into a prediction model to obtain the probability value; the prediction model is used to predict the probability value of the target vector being included in the target retrieval partition.
  • the accuracy and speed of determining the probability value are improved.
  • the t highest second similarities are selected as the t target second similarities and input into the prediction model, which also reduces the calculation amount of the prediction model and speeds up the prediction of the probability value without affecting the prediction accuracy.
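A minimal sketch of how the model input might be assembled from the vector to be queried, the K first similarities and the t target second similarities. The text does not specify the model type, so the logistic function, the weights, and the zero-padding of small partitions are all assumptions for illustration.

```python
import heapq
import math

def build_features(query, first_k, second, t):
    """Concatenate the query, the K first similarities and the top-t second similarities."""
    top_t = heapq.nlargest(t, second)
    top_t += [0.0] * (t - len(top_t))  # pad partitions with fewer than t vectors (assumption)
    return list(query) + list(first_k) + top_t

def predict_prob(features, weights, bias):
    """Stand-in logistic model; hypothetical weights, not taken from the source."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

features = build_features(query=[1.0, 0.0], first_k=[0.9, 0.4],
                          second=[0.2, 0.95, 0.7, 0.1], t=2)
# Hypothetical weights that only look at the two target second similarities.
prob = predict_prob(features, weights=[0.0, 0.0, 0.0, 0.0, 4.0, 4.0], bias=-5.0)
```

Keeping only t second similarities gives the model a fixed-size input regardless of how many vectors the partition holds.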
  • there are N vectors to be queried, where N is a positive integer greater than 1; accordingly, inputting the vector to be queried, the K first similarities and the t target second similarities into a prediction model to obtain the probability value includes: inputting a matrix formed by the N vectors to be queried and a matrix formed by the K first similarities corresponding to each of the N vectors to be queried into a first prediction model to obtain N initial probability values corresponding to the N vectors to be queried; the initial probability value of a vector to be queried characterizes the probability that its target vector is contained in the K retrieval partitions corresponding to that vector; for any vector to be queried, the initial probability value corresponding to the vector to be queried and the t target second similarities corresponding to the vector to be queried are input into a second prediction model to obtain a final probability value corresponding to the vector to be queried.
  • the prediction of the probability value is divided into two stages, the first stage uses the first prediction model, and the second stage uses the second prediction model.
  • the matrix formed by the N query vectors and the matrix formed by the K first similarities corresponding to each query vector in the N query vectors are input into the first prediction model, so that the first prediction model can use the matrix multiplication method to predict the initial probability value, which gives full play to the computing power, improves the computing efficiency, and further improves the speed of vector retrieval.
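The two-stage prediction can be sketched as below. The source does not describe the internals of either model, so both are shown as single logistic layers with hypothetical weights, and the plain-Python `matmul` stands in for the matrix multiplication a hardware accelerator would perform.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matmul(a, b):
    """Plain-Python matrix product; a real system would run this on an accelerator."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def first_stage(queries, first_sims, w1, b1):
    """queries: N x d, first_sims: N x K, w1: (d+K) x 1. Returns N initial probabilities."""
    x = [q + s for q, s in zip(queries, first_sims)]  # N x (d+K) batch matrix
    return [sigmoid(row[0] + b1) for row in matmul(x, w1)]

def second_stage(initial_prob, top_t, w2, b2):
    """Refine one query's initial probability with its t target second similarities."""
    feats = [initial_prob] + list(top_t)
    return sigmoid(sum(w * f for w, f in zip(w2, feats)) + b2)

# N = 2 vectors to be queried, d = 2, K = 3 first similarities each.
queries = [[1.0, 0.0], [0.0, 1.0]]
first_sims = [[0.9, 0.5, 0.1], [0.8, 0.7, 0.6]]
w1 = [[0.0], [0.0], [1.0], [0.0], [0.0]]  # hypothetical: uses only the best first similarity
initial = first_stage(queries, first_sims, w1, b1=0.0)
final = second_stage(initial[0], top_t=[0.95, 0.7], w2=[1.0, 1.0, 1.0], b2=-2.0)
```

The first stage handles all N queries in one matrix product, while the second stage runs per query because it depends on second similarities that are only known after a partition has been scanned.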
  • an embodiment of the present application provides a vector search device, which has the function of implementing the method in the first aspect or any possible implementation of the first aspect, and the device can be a computing device or a processor included in the computing device.
  • the functions of the above-mentioned vector search device can be implemented by hardware, or by hardware executing corresponding software, and the hardware or software includes one or more modules or units or means corresponding to the above-mentioned functions.
  • the structure of the device includes a processing module and a transceiver module, wherein the processing module is configured to support the device to execute the method in the first aspect or any one of the implementations of the first aspect.
  • the transceiver module is used to support communication between the device and other devices, for example, it can receive data from an acquisition device.
  • the vector retrieval device may also include a storage module, which is coupled to the processing module and stores program instructions and data necessary for the device.
  • the processing module may be a processor
  • the transceiver module may be a transceiver
  • the storage module may be a memory.
  • the memory may be integrated with the processor or may be set separately from the processor.
  • the structure of the device includes a processor and may also include a memory.
  • the processor is coupled to the memory and may be used to execute computer program instructions stored in the memory so that the device performs the method in the first aspect or any possible implementation of the first aspect.
  • the device further includes a communication interface, and the processor is coupled to the communication interface.
  • the communication interface may be a transceiver or an input/output interface.
  • an embodiment of the present application provides a chip, including a processor, wherein the processor is coupled to a memory, and the memory is used to store programs or instructions.
  • the chip implements the method in the above-mentioned first aspect or any possible implementation method of the first aspect.
  • the chip further includes an interface circuit for interacting code instructions with the processor.
  • there may be one or more processors in the chip, and the processor may be implemented by hardware or software.
  • When implemented by hardware, the processor may be a logic circuit, an integrated circuit, etc.
  • When implemented by software, the processor may be a general-purpose processor implemented by reading software code stored in a memory.
  • the memory in the chip may be one or more.
  • the memory may be integrated with the processor or may be separately provided with the processor.
  • the memory may be a non-transitory memory, such as a read-only memory (ROM), which may be integrated with the processor on the same chip or may be provided on a different chip.
  • an embodiment of the present application provides a computer-readable storage medium having a computer program or instructions stored thereon.
  • When the computer program or instructions are executed, the computer executes the method in the above-mentioned first aspect or any possible implementation of the first aspect.
  • an embodiment of the present application provides a computer program product.
  • When a computer reads and executes the computer program product, the computer executes the method in the above-mentioned first aspect or any possible implementation of the first aspect.
  • FIG. 1a is a schematic diagram of performing vector retrieval in a scene of image search provided by the present application
  • FIG1b is a schematic diagram of performing vector retrieval in a drug discovery scenario provided by the present application.
  • FIG2 is a schematic diagram of a system architecture provided by the present application.
  • FIG3 is a schematic diagram of the structure of a computing device provided by the present application.
  • FIG4 is a schematic diagram of the structure of a processor provided by the present application.
  • FIG5a is a schematic diagram of a process flow of a vector retrieval technology provided by the present application.
  • FIG5b is a schematic diagram of M cluster partitions obtained after clustering vectors in a vector base library provided by the present application.
  • FIG6 is a schematic diagram of a flow chart of a vector search method provided by the present application.
  • FIG7 is a schematic diagram of dividing any search partition into search sub-partitions provided by the present application.
  • FIG8 is a schematic diagram of a second similarity between a query vector and a partition center vector of each retrieval partition and a third similarity between a query vector and a sub-partition center vector of each retrieval sub-partition provided by the present application;
  • FIG9 is a flow chart of a method for obtaining a probability value according to each second similarity provided by the present application.
  • FIG10 is a flow chart of another method for obtaining a probability value according to each second similarity provided by the present application.
  • FIG. 11a is a schematic diagram of a matrix of M first similarities between any query vector and M partition center vectors, obtained through matrix multiplication by a hardware accelerator, provided by the present application;
  • FIG. 11b is a schematic diagram of a matrix formed by three first similarities corresponding to each of two query vectors provided in the present application;
  • FIG12 is a schematic diagram of a method for determining a probability value provided by the present application.
  • FIG13 is a schematic diagram of the overall process of a vector retrieval method provided by the present application.
  • FIG14 is a schematic diagram of a vector retrieval device provided in the present application.
  • Vector retrieval technology: in a given vector data set, vectors similar to the query vector are retrieved according to a certain metric.
  • K-means clustering algorithm: an iterative clustering analysis algorithm. Specifically, given the number of categories k, the entire data set is clustered. The objective is to minimize the sum of the distances from all samples to their class centers. The objective function is iteratively calculated and optimized to obtain k class centers and the category to which each sample belongs.
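A minimal sketch of the iteration described above (Lloyd's algorithm). The squared Euclidean distance, the fixed iteration count, and seeding the centers with the first k points are simplifying assumptions; practical implementations usually use a randomized initialization and a convergence check.

```python
def kmeans(points, k, iters=10):
    """Alternate assignment and center-update steps for a fixed number of iterations."""
    centers = [list(p) for p in points[:k]]  # assumed seeding: the first k points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each sample joins its nearest class center.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a class ends up empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels

points = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]
centers, labels = kmeans(points, k=2)
```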
  • Retrieval precision: also known as recall rate.
  • the retrieval system searches for the query vector and returns W vectors as query results. Let X be the set of the W returned vectors, and let Y be the set of the W vectors in the entire vector base library that are most similar to the query vector. Then the retrieval precision of the retrieval system for the query vector is |X ∩ Y| / W.
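A minimal sketch of this measure, assuming vectors are identified by integer ids (an assumption for illustration) and that the precision is the overlap between X and Y divided by the size of Y, which is the usual recall-at-W definition:

```python
def recall_at_w(returned_ids, true_top_ids):
    """Fraction of the true top-W set Y that also appears in the returned set X."""
    x, y = set(returned_ids), set(true_top_ids)
    return len(x & y) / len(y)

# Three of the four true nearest vectors were returned.
score = recall_at_w(returned_ids=[3, 7, 9, 12], true_top_ids=[3, 9, 12, 20])
```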
  • FIG1a shows a schematic diagram of vector retrieval in a scenario of image search. Specifically, a large number of vectors are obtained by performing feature extraction on a large number of images, and these large number of vectors are formed into a vector base library; the vectors to be queried obtained after feature extraction on the image to be queried are searched in the vector base library, and vectors that meet the similarity requirements with the vectors to be queried are retrieved; it is determined from which images these vectors that meet the similarity requirements are extracted, and these determined images are returned as query results.
  • For example, images containing products similar in appearance to the product image input by the user are retrieved; as another example, based on images from videos that the user frequently browses, other videos similar to these images are retrieved and pushed to the user.
  • the ever-increasing data scale of the Internet has put forward higher requirements on the retrieval speed and efficiency of the retrieval system.
  • Figure 1b shows a schematic diagram of vector retrieval in a drug discovery scenario.
  • a large number of vectors obtained by encoding a large number of compounds with an encoder form a vector base library; the vectors to be queried obtained by encoding the active fragments or lead compounds of the drug to be queried with the encoder are searched in the vector base library to retrieve vectors that meet the similarity requirements with the vectors to be queried; the compounds corresponding to these vectors that meet the similarity requirements are returned as query results.
  • the research and development of new drugs requires searching for compounds similar to the active fragments or lead compounds of new drugs as potential drugs in a compound base library of hundreds of millions/billions. Since the selection of similar compounds will affect subsequent animal experiments and clinical trials with longer cycles, this application also places great demands on the retrieval speed of the retrieval system.
  • Method 1: calculate the similarity between the query vector and all vectors in the entire vector base library, and select the W vectors with the highest similarity as the query results.
  • Method 2: calculate the similarity between the query vector and each vector in the vector base library in turn until W vectors whose similarity meets the preset threshold are found, then stop calculating the similarity between the query vector and the remaining vectors in the vector base library.
  • Method 1 needs to calculate the similarity between the vector to be queried and all the vectors in the entire vector base. Although it can ensure the retrieval accuracy, the number of vectors in the vector base is very large, generally in the hundreds of millions/billions. This will result in a huge amount of calculation, which limits the improvement of the retrieval speed.
  • the amount of calculation in method 2 is reduced compared to method 1, but if the preset threshold is set high, the amount of calculation is still large and the retrieval speed is slow; if the preset threshold is set low, the retrieval accuracy is affected. Therefore, using method 2 for vector retrieval has high requirements for the setting of the preset threshold, and even different preset thresholds need to be set for different vectors to be queried, and the retrieval method is not flexible enough.
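The threshold sensitivity of Method 2 can be seen in a small sketch. The inner-product similarity and the names `threshold_scan` and `min_sim` are assumptions made for illustration.

```python
def dot(a, b):
    """Inner-product similarity (assumed metric)."""
    return sum(x * y for x, y in zip(a, b))

def threshold_scan(query, base, w, min_sim):
    """Scan the base in order and stop once W vectors reach the preset threshold."""
    hits, scanned = [], 0
    for i, v in enumerate(base):
        scanned += 1
        if dot(query, v) >= min_sim:
            hits.append(i)
            if len(hits) == w:
                break
    return hits, scanned

query = [1.0, 0.0]
base = [[0.2, 0.0], [0.95, 0.0], [0.5, 0.0], [0.9, 0.0], [1.0, 0.0]]
strict = threshold_scan(query, base, w=2, min_sim=0.9)  # scans 4 of the 5 vectors
loose = threshold_scan(query, base, w=2, min_sim=0.4)   # stops early, returns a 0.5 match
```

With the strict threshold almost the whole base is scanned; with the loose one the scan stops after three vectors but returns a weak match and never reaches the best vector at index 4.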
  • the above-mentioned vector search method cannot take into account both search accuracy and search speed. Based on this, the present application embodiment provides a vector search method to improve the search speed while ensuring the search accuracy.
  • FIG2 provides a schematic diagram of a system architecture applicable to an embodiment of the present application, wherein the system includes an acquisition device 10, a computing device 20, and a storage device 30.
  • there may be one or more acquisition devices 10, one or more computing devices 20, and one or more storage devices 30.
  • the one or more acquisition devices 10, one or more computing devices 20, and one or more storage devices 30 may be connected via a network.
  • the acquisition device 10 can be used to collect data and send the collected data to the computing device 20 through the network.
  • the acquisition device 10 can be a camera, a mobile phone, a computer, etc., and the data collected by the acquisition device 10 can be pictures, videos, audio, text, etc.
  • the acquisition device 10 can specifically be a camera, and the data collected by the camera can be, for example, pictures and/or videos taken by the camera.
  • the computing device 20 is used to extract features from any data obtained to obtain the vector corresponding to the data; a large number of vectors corresponding to a large amount of data form a vector base library, and a large number of vectors in the vector base library are clustered and calculated according to the similarity between the vectors, thereby obtaining M cluster partitions, and the similarity between each vector in each cluster partition is relatively high, wherein M is an integer greater than 1.
  • Each cluster partition has a corresponding partition center vector, and the partition center vector of each cluster partition is determined according to the multiple vectors contained in the cluster partition. For example, the partition center vector of the cluster partition can be determined according to the mean, mode or median of the multiple vectors contained in the cluster partition.
  • the partition center vector can be understood as a representative of the multiple vectors contained in the cluster partition, representing the characteristics of each vector contained in the cluster partition.
  • the embodiment of the present application does not limit the clustering algorithm.
  • for example, a k-means clustering algorithm, a mean shift clustering method, or a density-based clustering method can be used to perform clustering calculations on the large number of vectors in the vector base library according to the similarity between the vectors, thereby obtaining M cluster partitions.
  • the storage device 30 can be used to store multiple cluster partitions calculated by the computing device.
  • FIG5b shows a schematic diagram of M cluster partitions obtained after clustering the vectors in the vector base library.
  • in the figure, 8 cluster partitions are obtained by clustering the vectors according to the similarity between the vectors, and the cluster partitions are separated by solid lines; the vectors in each cluster partition are averaged to obtain the partition center vector of that cluster partition, which is represented by a five-pointed star in the figure, and the black solid dots represent the vectors contained in the cluster partition other than the partition center vector.
  • for example, if a cluster partition contains 3 vectors, namely [1, 1, 1], [2, 2, 2] and [3, 3, 3], then the partition center vector of the cluster partition can be [2, 2, 2].
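The worked example above can be reproduced with a short sketch. The function name `partition_center` and the `how` parameter are hypothetical; the mean and median options follow the choices mentioned earlier for determining the partition center vector.

```python
from statistics import median

def partition_center(vectors, how="mean"):
    """Element-wise mean (default) or median of the vectors in one cluster partition."""
    columns = list(zip(*vectors))  # one tuple per dimension
    if how == "median":
        return [median(col) for col in columns]
    return [sum(col) / len(col) for col in columns]

vectors = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
center_mean = partition_center(vectors)
center_median = partition_center(vectors, how="median")
```

For this partition the mean and the median agree, and both give the center [2, 2, 2] stated in the text.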
  • the partition center vector of each cluster partition in the M cluster partitions and each vector contained in each cluster partition can be sent to the storage device 30 for storage.
  • the storage device 30 can store the data structure shown in FIG. 5b for the computing device 20 to perform subsequent vector retrieval.
  • the acquisition device 10 can be used to acquire or obtain the data to be queried, and send the data to be queried to the computing device 20.
  • a user opens a shopping application and enters a picture to be queried containing a product to be queried in the shopping application.
  • the acquisition device acquires the picture to be queried and can send the picture to be queried to the computing device 20.
  • the computing device 20 is used to extract features of the query image to obtain a query vector corresponding to the query image; then search for similar vectors in the M cluster partitions stored in the storage device 30 according to the query vector, and feed back the found similar vectors to the user.
  • the acquisition device 10, the computing device 20 and the storage device 30 may be integrated into the same device or respectively arranged in different devices.
  • the computing device 20 and the storage device 30 may be integrated into a server, and the acquisition device 10 may be integrated into a terminal device.
  • FIG. 3 is a schematic diagram of a possible structure of a computing device 20, and the computing device 20 includes a processor 201, a memory 202, and a communication interface 203. Among them, any two of the processor 201, the memory 202, and the communication interface 203 may be connected via a bus 204.
  • the processor 201 may be a central processing unit (CPU), which may be used to execute software programs in the memory 202 to implement one or more functions, such as extracting features from data.
  • the processor 201 may also be an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a system on chip (SoC) or a complex programmable logic device (CPLD), a graphics processing unit (GPU), a neural-network processing unit (NPU), etc.
  • there may be multiple processors 201, and the multiple processors 201 may include multiple processors of the same type, or may include multiple processors of different types.
  • for example, the multiple processors 201 are multiple CPUs.
  • the multiple processors 201 include one or more CPUs and one or more GPUs.
  • the multiple processors 201 include one or more CPUs and one or more NPUs.
  • the multiple processors 201 include one or more CPUs, one or more GPUs, and one or more NPUs, etc.
  • the processor 201 (such as a CPU, an NPU, etc.) may include one core, or may include multiple cores.
  • the memory 202 refers to a device for storing data, which can be a memory or a hard disk.
  • Memory refers to an internal memory that directly exchanges data with the processor 201. It can read and write data at any time and at a very fast speed, and serves as temporary data storage for the operating system or other programs running on the processor 201.
  • Memory includes volatile memory (volatile memory), such as random access memory (RAM), dynamic random access memory (DRAM), etc., and may also include non-volatile memory (non-volatile memory), such as storage class memory (SCM), etc., or a combination of volatile memory and non-volatile memory.
  • multiple memories can be configured in the computing device 20, and optionally, the multiple memories can be of different types. This embodiment does not limit the number and type of memory.
  • the memory can be configured to have a power-saving function.
  • the power-saving function means that when the system loses power and then powers on again, the data stored in the memory will not be lost. Memory with a power-saving
  • the hard disk is used to provide storage resources, such as for storing pictures, videos, audio, text and other data collected by the acquisition device 10.
  • The hard disk includes but is not limited to non-volatile memory, such as read-only memory (ROM), a hard disk drive (HDD), or a solid state drive (SSD).
  • The difference between the hard disk and the internal memory is that the hard disk has a relatively slow read and write speed and is usually used to store data persistently.
  • the data, program instructions, etc. in the hard disk need to be loaded into the memory first, and then the processor obtains the data and/or program instructions from the memory.
  • the communication interface 203 is used for communicating with other devices, for example, for the computing device 20 to communicate with the acquisition device 10 or the storage device 30 .
  • the computing device 20 may include two processors 201, which may be a CPU and an NPU, respectively.
  • the CPU may include 6 CPU cores
  • the NPU may include 2 NPU cores, which may also be called AI cores.
  • the computing power of the NPU is higher than that of the CPU.
  • The CPU can be used to perform similarity sorting in the data retrieval process, and the NPU can be used to perform similarity calculation in the data retrieval process. For details, see the structure of a processor 201 in a computing device 20 shown in FIG. 4.
  • The present application exemplarily provides a flow chart of vector retrieval, which can be seen in FIG. 5a.
  • The flow chart can be executed by the computing device 20 shown in FIG. 3 to FIG. 4, and can be roughly divided into the following three stages:
  • For the acquired multiple sample images, the computing device 20 inputs each sample image into a preset feature extraction model.
  • The embodiment of the present application does not limit the type of feature extraction model.
  • For example, each sample image can be input into a convolutional neural network (CNN) model for feature extraction, so that the CNN model outputs the vector corresponding to each sample image.
  • the computing device 20 stores the vector corresponding to each sample image in a vector base library, which can be located in the memory 202 of the computing device 20 or in the storage device 30.
  • the storage device 30 can be an independent storage medium or memory, etc.
  • the computing device 20 clusters each vector in the vector base according to the similarity between the vectors to obtain M cluster partitions, where each cluster partition corresponds to a partition center vector, and M is an integer greater than 1.
  • each vector in the vector base can be clustered in the following two ways to obtain M cluster partitions:
  • Implementation method 1: directly cluster the vectors in the vector base library to obtain M cluster partitions and the partition center vector of each cluster partition.
  • The partition center vector is derived from the vectors in the cluster partition, for example by taking their average or median, which is not limited in the embodiment of the present application.
  • The specific clustering algorithm can be a k-means clustering algorithm, a fuzzy c-means clustering algorithm, a mean shift clustering method, or a density-based clustering method, which is not limited in the embodiment of the present application.
  • Implementation method 2: randomly select a preset proportion (for example, about 10%) of the vectors from the vector base library as training samples, and cluster the training samples to obtain M cluster partitions and the partition center vector of each cluster partition.
  • The specific clustering algorithm can be a k-means clustering algorithm, a fuzzy c-means clustering algorithm, a mean shift clustering method, or a density-based clustering method, etc., which is not limited in the embodiments of the present application.
  • With the M partition center vectors as centers, the other vectors in the vector base library (those other than the training samples) are then assigned to the M cluster partitions. In this way, the amount of calculation for determining the partition center vectors can be reduced, and the speed of determining the partition center vectors can be increased.
  • Each cluster partition has its own corresponding partition center vector.
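  • Implementation method 2 can be sketched roughly as follows. This is a minimal illustration, not the method as claimed: it assumes squared Euclidean distance and a plain k-means with mean centers, and the names `kmeans`, `build_partitions`, and `sample_ratio` are hypothetical. For simplicity it assigns every vector (training samples included) to its nearest final center.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance (an assumed dissimilarity measure).
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    # Component-wise average of a non-empty list of vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def nearest(v, centers):
    # Index of the center closest to v.
    return min(range(len(centers)), key=lambda i: sq_dist(v, centers[i]))

def kmeans(samples, m, iters=20, seed=0):
    # Plain k-means on the training samples only.
    rng = random.Random(seed)
    centers = rng.sample(samples, m)
    for _ in range(iters):
        groups = [[] for _ in range(m)]
        for v in samples:
            groups[nearest(v, centers)].append(v)
        # Keep the previous center when a group ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers

def build_partitions(base, m, sample_ratio=0.1, seed=0):
    # 1) Cluster a random subset to get M partition center vectors.
    rng = random.Random(seed)
    n_train = min(len(base), max(m, int(len(base) * sample_ratio)))
    centers = kmeans(rng.sample(base, n_train), m, seed=seed)
    # 2) Assign all base vectors (by index) to the nearest center.
    partitions = [[] for _ in range(m)]
    for idx, v in enumerate(base):
        partitions[nearest(v, centers)].append(idx)
    return centers, partitions
```

  • Only the sampled subset takes part in the iterative clustering, so the cost of finding the centers scales with the sample size rather than with the whole vector base library.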
  • The data to be queried can be input to the computing device 20 through the client. The computing device 20 performs feature extraction on the acquired data to be queried to obtain the vector to be queried, and then calculates the similarity between the vector to be queried and the partition center vector of each of the M cluster partitions to obtain M first similarities. Then, the M first similarities are sorted from high to low, the top K first similarities are selected, and the cluster partitions corresponding to these K first similarities are determined as the K retrieval partitions.
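  • As a small illustration of this coarse selection step, the sketch below computes the M first similarities and keeps the K best partitions. Cosine similarity is an assumption (the text does not fix a similarity measure), and `select_retrieval_partitions` is a hypothetical name.

```python
import math

def cosine(a, b):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def select_retrieval_partitions(query, centers, k):
    # M first similarities: query vs. each partition center vector.
    first_sims = [cosine(query, c) for c in centers]
    # Partition indices sorted by first similarity, high to low.
    order = sorted(range(len(centers)), key=lambda i: first_sims[i], reverse=True)
    # Keep the top K as (partition index, first similarity) pairs.
    return [(i, first_sims[i]) for i in order[:k]]
```

  • The returned list is already ordered from the highest first similarity to the lowest, so it can also serve as a visiting order for the K retrieval partitions.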
  • Select an unselected retrieval partition from the K retrieval partitions as the target retrieval partition, and calculate the second similarity between the query vector and each vector contained in the target retrieval partition. Then determine, according to the second similarities, the probability value that the target retrieval partition contains the target vector, where the target vector refers to a vector whose similarity with the query vector is within a preset range. For example, the target vector refers to a vector whose similarity with the query vector is greater than 0.9.
  • If the probability value is greater than the first preset threshold, the next target retrieval partition is no longer selected from the unselected retrieval partitions, and the retrieval for the query vector can be terminated. If the probability value is not greater than the first preset threshold, continue to select the next target retrieval partition from the unselected retrieval partitions, calculate the second similarity between the query vector and each vector contained in the newly selected target retrieval partition, determine the probability value that the newly selected target retrieval partition contains the target vector according to these second similarities, and compare the probability value with the first preset threshold again. Repeat the above steps until the probability value corresponding to the selected target retrieval partition is greater than the first preset threshold, and then stop selecting the next target retrieval partition from the unselected retrieval partitions.
  • the vector retrieval method provided in the embodiment of the present application improves the speed of vector retrieval by reasoning whether to terminate the retrieval of the current vector to be queried in advance during the vector retrieval process. For example, the second similarities between the vector to be queried and each vector in the first retrieval partition are first calculated, and the probability that the target vector is included in the first retrieval partition is determined based on each second similarity. If the probability is high, the second similarities between the vector to be queried and each vector in other retrieval partitions are no longer calculated. In this way, the amount of retrieval calculation can be reduced and the speed of vector retrieval can be improved.
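  • The early-termination loop can be sketched as follows. This is an illustrative outline under stated assumptions: `similarity` and `estimate_prob` are caller-supplied callables, and `fraction_above` is only a hypothetical stand-in for however the probability value is actually determined.

```python
def fraction_above(sims, cutoff=0.9):
    # Stand-in estimator (an assumption): share of second similarities
    # that fall in the "target" range, here taken as sim > cutoff.
    return sum(s > cutoff for s in sims) / len(sims) if sims else 0.0

def search_with_early_stop(query, partitions, order, similarity,
                           estimate_prob, threshold):
    # partitions: {partition_id: [vectors]}; order: ids of the K
    # retrieval partitions in the order they should be visited.
    visited = []  # (partition_id, second similarities) per visited partition
    for pid in order:
        second_sims = [similarity(query, v) for v in partitions[pid]]
        visited.append((pid, second_sims))
        # Stop as soon as the estimated probability exceeds the
        # first preset threshold; remaining partitions are skipped.
        if estimate_prob(second_sims) > threshold:
            break
    return visited
```

  • If no partition exceeds the threshold, the loop simply visits all K retrieval partitions, which corresponds to a search without early termination.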
  • The vector retrieval method provided by the embodiment of the present application is described in detail below through specific steps. As shown in FIG. 6, the method can be performed by the computing device in FIG. 2 above, or by a chip in the computing device, and includes the following steps:
  • Step 601: obtain a vector to be queried.
  • the vector to be queried may be a vector input by a user to a computing device through a query client, or may be any vector obtained by the computing device from a vector base library. This embodiment of the application does not limit this.
  • Step 602: calculate the similarity between the query vector and the partition center vectors of M cluster partitions to obtain M first similarities. The M cluster partitions are obtained by clustering the vectors in the vector base according to the similarity between the vectors; the partition center vector of any cluster partition is determined according to the multiple vectors contained in that cluster partition, and M is an integer greater than 1.
  • From the M first similarities, select the top K first similarities in descending order, and determine the cluster partitions corresponding to these K first similarities as K retrieval partitions, where K is an integer greater than or equal to 1 and K is less than M.
  • Step 603: repeat the following operations until it is determined that the probability value that the target retrieval partition selected from the K retrieval partitions contains the target vector is greater than a first preset threshold, where the target vector is a vector whose similarity with the query vector is within a preset range:
  • There is no restriction on the order in which the target retrieval partition is selected from the K retrieval partitions; it can be selected arbitrarily.
  • For example, the K retrieval partitions include retrieval partition A, retrieval partition B, and retrieval partition C.
  • There are 100 vectors in retrieval partition A, and the 100 second similarities between the query vector and the 100 vectors in retrieval partition A are calculated.
  • If the determined probability value is greater than the first preset threshold, there is no need to select the next target retrieval partition from the remaining retrieval partitions, and there is no need to calculate the second similarity between the query vector and each vector in the next target retrieval partition, thereby reducing the amount of retrieval calculation.
  • For example, the first preset threshold is 0.18.
  • The probability value determined above for retrieval partition A containing the target vector is 0.2, which is greater than 0.18, so there is no need to calculate the second similarities between the query vector and the vectors in retrieval partition B and retrieval partition C, which saves a large amount of calculation.
  • If the determined probability value is not greater than the first preset threshold, it means that the number of target vectors contained in the currently selected target retrieval partition is too small.
  • The retrieval accuracy is likely to be low when the query vector is retrieved based on such a target retrieval partition alone. Therefore, continue to select a target retrieval partition from the remaining unselected retrieval partitions. For example, continue by selecting retrieval partition B, and repeat the steps performed after selecting retrieval partition A until the obtained probability value is greater than the first preset threshold, and then stop selecting retrieval partitions.
  • Step 604: when the loop in step 603 stops, output the query result based on the at least one selected retrieval partition and the vector to be queried.
  • Outputting the query result based on the at least one selected retrieval partition and the vector to be queried may include but is not limited to the following possible ways:
  • One possible way is to determine the retrieval partition whose probability value is greater than the first preset threshold value in at least one retrieval partition that has been selected. Since the condition for terminating the loop in step 603 is that the probability value is greater than the first preset threshold value, there is only one "retrieval partition whose probability value is greater than the first preset threshold value", which is the target retrieval partition selected last. For example, in the above example, the "retrieval partition whose probability value is greater than the first preset threshold value" is retrieval partition A. Then, the query result of the vector to be queried is retrieved in this retrieval partition. For example, each vector in the retrieval partition is output or fed back to the user as the query result of the vector to be queried. For another example, the W vectors corresponding to the W second similarities ranked first from high to low between the vector to be queried and each vector in the retrieval partition can be output or fed back to the user as the query result.
  • Outputting all vectors contained in the retrieval partition whose probability value is greater than the first preset threshold as the query result further reduces the amount of retrieval calculation and thus improves the retrieval speed.
  • Alternatively, the second similarities between the vector to be queried and all vectors contained in the retrieval partition whose probability value is greater than the first preset threshold are sorted in descending order, and the vectors corresponding to the first W second similarities are used as the query result, which can further simplify the query result.
  • In step 603, if the probability value of the first target retrieval partition selected is not greater than the first preset threshold, a second target retrieval partition can be selected. If the probability value corresponding to the second target retrieval partition is greater than the first preset threshold, the next target retrieval partition is no longer selected. Therefore, the number of "at least one retrieval partition that has been selected" here may be greater than 1. For example, in the previous example, the retrieval partition may be finally selected.
  • the query result of the vector to be queried can be retrieved from the multiple vectors respectively included in the retrieval partition A and the retrieval partition B.
  • the vectors respectively included in the at least one retrieval partition that has been selected can be output as the query result or fed back to the user.
  • the W vectors corresponding to the W second similarities respectively ranked from high to low between the vector to be queried and the vectors in the at least one retrieval partition that has been selected can be output as the query result or fed back to the user.
  • the first target retrieval partition is first selected as retrieval partition A, and the second similarities between the query vector and each vector in the retrieval partition A are calculated.
  • a probability value is determined according to each second similarity. If the probability value is not greater than the first preset threshold, the next target retrieval partition is selected as retrieval partition B; the second similarities between the query vector and each vector in the retrieval partition B are calculated, and a probability value is determined according to each second similarity. If the probability value is greater than the first preset threshold, the target retrieval partition is no longer selected.
  • "at least one retrieval partition that has been selected" includes retrieval partition A and retrieval partition B.
  • Since the second similarities between the query vector and each vector in retrieval partition A, and between the query vector and each vector in retrieval partition B, have already been calculated in step 603, there is no need to repeat the calculation in step 604, so the amount of calculation does not increase. Instead, these second similarities are directly sorted from high to low, and the W vectors corresponding to the W second similarities ranked first are used as the query result.
  • In this way, the query result is output based on all of the selected retrieval partitions, rather than only on the retrieval partition whose probability value is greater than the first preset threshold. A selected retrieval partition whose probability value is not greater than the first preset threshold may still contain vectors that are identical or highly similar to the vector to be queried. Therefore, more results, and more accurate results, can be output based on the already calculated second similarities between the vector to be queried and each vector in the selected retrieval partitions, thereby improving the accuracy of vector retrieval.
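  • A possible way to form the top-W query result from the second similarities that were already calculated for every selected retrieval partition is sketched below. The data layout (a list of `(partition_id, second_similarities)` pairs) and the name `top_w` are assumptions made for illustration.

```python
import heapq

def top_w(visited, w):
    # visited: [(partition_id, [second similarities])] for every
    # retrieval partition selected during the loop of step 603.
    scored = (
        (sim, pid, idx)
        for pid, sims in visited
        for idx, sim in enumerate(sims)
    )
    # No similarity is recomputed here; the stored values are only
    # merged and the W largest are kept, highest first.
    return heapq.nlargest(w, scored)

selected = [("A", [0.2, 0.95]), ("B", [0.97, 0.1, 0.93])]
best = top_w(selected, 3)  # (similarity, partition, index within partition)
```

  • Using `heapq.nlargest` avoids fully sorting all candidates when W is small compared with the number of vectors in the selected partitions.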
  • Alternatively, the target retrieval partition may be selected from the K retrieval partitions according to certain rules rather than arbitrarily. Two methods for ordering the K retrieval partitions are described below.
  • Method 1: sort the K retrieval partitions in descending order of the K first similarities, so that unselected retrieval partitions can be selected in order from the sorted K retrieval partitions as the target retrieval partitions.
  • For example, the K retrieval partitions are sorted in the following order: retrieval partition A, retrieval partition B, retrieval partition C. When selecting the target retrieval partition, the partitions are also selected in this order.
  • If retrieval partition B is calculated first, it is likely that a retrieval partition with a probability value greater than the first preset threshold will not be obtained, so that the similarities between the query vector and the vectors in retrieval partition A need to be calculated afterwards, which increases the amount of calculation and the time consumed in retrieval.
  • By selecting in the sorted order, the target retrieval partition with a probability value greater than the first preset threshold can be determined as early as possible, and the possibility of having to select another target retrieval partition is minimized. This avoids calculating the similarities between the query vector and the vectors in additional target retrieval partitions, thereby reducing the amount of calculation and improving the retrieval speed.
  • Method 2: cluster the vectors in each retrieval partition according to the similarity between the vectors to obtain multiple retrieval sub-partitions, each of which also has a corresponding sub-partition center vector; calculate the third similarities between the query vector and the sub-partition center vectors of the multiple retrieval sub-partitions, and sort the K retrieval partitions according to the multiple third similarities between the query vector and the multiple sub-partition center vectors in each retrieval partition. In this way, the retrieval partitions that have not been selected can be selected in order from the sorted K retrieval partitions as the target retrieval partitions.
  • FIG. 7 shows a schematic diagram of dividing any retrieval partition into retrieval sub-partitions provided by an embodiment of the present application.
  • Taking three retrieval partitions as an example, the vectors in each retrieval partition are clustered.
  • Each retrieval partition is divided into 5 retrieval sub-partitions.
  • the number of retrieval sub-partitions divided by different retrieval partitions may be different.
  • the retrieval partitions are distinguished by solid lines, and the retrieval sub-partitions are distinguished by dotted lines.
  • the five-pointed star in the figure represents the partition center vector of the retrieval partition, and the triangle in the figure represents the sub-partition center vector of the retrieval sub-partition.
  • the embodiment of the present application does not limit the way of clustering the vectors in any retrieval partition, and can refer to the method of clustering the vectors in the vector base library to obtain multiple cluster partitions.
  • the K retrieval partitions can be sorted according to the maximum similarity among the multiple third similarities between the query vector and the multiple sub-partition center vectors in each retrieval partition.
  • FIG. 8 illustrates the first similarity between the query vector and the partition center vector of each retrieval partition and the third similarity between the query vector and the sub-partition center vector of each retrieval sub-partition.
  • There are three retrieval partitions, retrieval partition A, retrieval partition B and retrieval partition C, and each retrieval partition is divided into 5 retrieval sub-partitions.
  • Calculate the five third similarities between the query vector and the sub-partition center vectors of the five retrieval sub-partitions in retrieval partition A, and select the maximum of the five third similarities; calculate the five third similarities between the query vector and the sub-partition center vectors of the five retrieval sub-partitions in retrieval partition B, and select the maximum of the five third similarities; calculate the five third similarities between the query vector and the sub-partition center vectors of the five retrieval sub-partitions in retrieval partition C, and select the maximum of the five third similarities; sort the three maximum similarities in descending order to obtain the corresponding ordering of the three retrieval partitions.
  • the K retrieval partitions can also be sorted according to the number of third similarities that exceed the second preset threshold value among the multiple third similarities between the query vector and the multiple sub-partition center vectors in each retrieval partition. For example, in FIG8 , the five third similarities between the query vector and the sub-partition center vectors of the five retrieval sub-partitions in the retrieval partition A are calculated, and the number x1 of the third similarities that exceed the second preset threshold value is determined; the five third similarities between the query vector and the sub-partition center vectors of the five retrieval sub-partitions in the retrieval partition B are calculated, and the number x2 of the third similarities that exceed the second preset threshold value is determined; the five third similarities between the query vector and the sub-partition center vectors of the five retrieval sub-partitions in the retrieval partition C are calculated, and the number x3 of the third similarities that exceed the second preset threshold value is determined; x1, x2 and x3 are sorted in order from high to low, and accordingly, the
  • Sorting the K retrieval partitions based on the multiple third similarities between the query vector and the multiple sub-partition center vectors in each retrieval partition can improve the accuracy of the sorting.
  • In this way, the target retrieval partition with a probability value greater than the first preset threshold can be determined as early as possible, and the possibility of having to select another target retrieval partition is minimized. This avoids calculating the similarities between the query vector and the vectors in additional target retrieval partitions, so the amount of calculation can be reduced and the retrieval speed can be improved.
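  • The two ordering rules of Method 2 (by the maximum third similarity, or by the number of third similarities above the second preset threshold) can be illustrated as below. The function names and the numeric values are made up for illustration and are not taken from the figures.

```python
def order_by_max_third(third_sims):
    # third_sims: {partition_id: [third similarities to sub-partition centers]}
    # Rank partitions by their single best sub-partition center.
    return sorted(third_sims, key=lambda p: max(third_sims[p]), reverse=True)

def order_by_count_above(third_sims, second_threshold):
    # Rank partitions by how many sub-partition centers exceed
    # the second preset threshold.
    def count(p):
        return sum(s > second_threshold for s in third_sims[p])
    return sorted(third_sims, key=count, reverse=True)

# Illustrative values only: 3 retrieval partitions, 5 sub-partitions each.
third = {
    "A": [0.60, 0.50, 0.55, 0.40, 0.30],
    "B": [0.90, 0.20, 0.10, 0.30, 0.20],
    "C": [0.40, 0.30, 0.20, 0.10, 0.35],
}
by_max = order_by_max_third(third)
by_count = order_by_count_above(third, 0.45)
```

  • With these sample values the two rules disagree: the maximum rule visits partition B first, while the count rule visits partition A first, which shows that the choice of rule can change the visiting order.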
  • In FIG. 8, the five-pointed star represents the partition center vector of a retrieval partition, the triangle represents the sub-partition center vector of a retrieval sub-partition, and the square represents the vector to be queried.
  • FIG. 8 shows the influence of different sorting methods on the ordering of the K retrieval partitions.
  • If the retrieval partitions are sorted from high to low according to the three first similarities, the order is: retrieval partition A, retrieval partition B, retrieval partition C.
  • FIG. 8 shows the three first similarities (represented by the distances from the square to the three five-pointed stars; the closer the distance, the higher the similarity).
  • If the three retrieval partitions are sorted according to the maximum of the multiple third similarities between the query vector and the multiple sub-partition center vectors in each retrieval partition, the order is: retrieval partition B, retrieval partition A, retrieval partition C.
  • FIG. 8 shows the maximum of the third similarities between the query vector and the five sub-partition center vectors in each retrieval partition (represented by the distance from the square to the three triangles; the closer the distance, the higher the similarity).
  • The retrieval sub-partitions can be further divided; for example, each retrieval sub-partition is further divided into multiple smaller partitions, so that the retrieval accuracy and speed can be further improved. Details are not repeated here.
  • the probability value of the target retrieval partition containing the target vector can also be predicted by a prediction model according to each second similarity.
  • the prediction model can be a single-stage model or a two-stage model.
  • a possible way to train a prediction model is to use a large amount of labeled sample data to train the prediction model. For example, for any sample data, extract features from the sample data to obtain a sample vector; calculate the M first similarities between the sample vector and the M cluster partitions, and determine K retrieval partitions according to the size of the M first similarities; select a target retrieval partition from the K retrieval partitions, calculate the second similarities between the sample vector and each vector in the target retrieval partition, and select t target second similarities from each second similarity; input the sample vector, the K first similarities between the sample vector and the K retrieval partitions, the t target second similarities, and the label into the prediction model, and the label is the probability value of the target retrieval partition containing the target vector.
  • the parameters of the prediction model can be well optimized and adjusted.
  • Another possible way to train the prediction model is to use a large amount of sample data to train the prediction model, and the parameters of the prediction model are adjusted according to the objective function.
  • the embodiment of the present application does not limit the form of the objective function. Through multiple trainings, the parameters of the prediction model are optimized and adjusted.
  • FIG. 9 exemplarily shows a method of obtaining a probability value according to each second similarity, which may specifically include the following steps:
  • Step 901: among the K retrieval partitions, select any unselected retrieval partition as the target retrieval partition, and calculate the second similarities between the query vector and each vector in the target retrieval partition. Among the second similarities, determine the top t target second similarities in descending order.
  • the method for determining the target retrieval partition for the query vector is the same as described above and will not be repeated here.
  • The following specifically describes methods for determining the t target second similarities for a query vector.
  • For example, every second similarity between the vector to be queried and each vector in the target retrieval partition is used as a target second similarity; for another example, the second similarities that satisfy a certain threshold are used as the target second similarities; for another example, the second similarities are sorted from high to low and the first t of them are used as the target second similarities; for another example, the largest of the second similarities is used as the target second similarity.
  • the above is only an example, and the embodiment of the present application does not limit the method for determining the target second similarity. Among them, the fewer the number of target second similarities, the more it can reduce the calculation amount of the prediction model and improve the retrieval speed.
  • Taking the largest of the second similarities between the vector to be queried and all the vectors in the target retrieval partition as the target second similarity can reduce the computing power consumed by the prediction model without affecting the accuracy of the final probability value, and further improve the speed of vector retrieval.
  • For example, for the query vector q1, the 100 second similarities between q1 and the 100 vectors in retrieval partition A are calculated, and the largest of the 100 second similarities is used as the target second similarity.
  • Step 902: input the query vector, the K first similarities, and the t target second similarities into a prediction model to obtain a probability value.
  • In this way, the accuracy of determining the probability value is improved, and determining the probability value through the prediction model is faster than a non-model prediction method.
  • Selecting the t highest second similarities as the t target second similarities and inputting only these into the prediction model also reduces the amount of calculation of the prediction model, thereby further improving the speed of predicting the probability value without affecting the prediction accuracy.
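  • Steps 901 and 902 can be sketched as follows. The text does not specify the form of the prediction model, so the logistic function used here is purely an assumed stand-in, and `build_features`, `predict_probability`, `weights`, and `bias` are hypothetical names.

```python
import math

def target_second_similarities(second_sims, t):
    # Keep the t highest second similarities, highest first.
    return sorted(second_sims, reverse=True)[:t]

def build_features(query, first_sims, second_sims, t):
    # Model input: the query vector, the K first similarities and
    # the t target second similarities, concatenated in that order.
    return list(query) + list(first_sims) + target_second_similarities(second_sims, t)

def predict_probability(features, weights, bias):
    # Assumed stand-in model: logistic function of a weighted sum.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

features = build_features(
    query=[0.1, 0.2],
    first_sims=[0.9, 0.8, 0.7],
    second_sims=[0.3, 0.95, 0.5, 0.91],
    t=2,
)
```

  • The input length is fixed at (query dimension + K + t) regardless of how many vectors the target retrieval partition contains, which is one reason a small t keeps the model inexpensive to evaluate.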
  • cluster each vector in the vector base to obtain 10 cluster partitions, each of which corresponds to a partition center vector.
  • calculate the first similarity between the query vector q1 and the partition center vectors of the 10 cluster partitions select the first K cluster partitions with the highest first similarity as the retrieval partitions, or select K cluster partitions whose first similarity meets the preset threshold as the retrieval partitions. If three retrieval partitions are determined, namely retrieval partition A, retrieval partition B and retrieval partition C, then determine the three first similarities between the query vector q1 and the partition center vectors of the three retrieval partitions.
  • the target retrieval partition as retrieval partition A, calculate the 10 second similarities between the query vector q1 and the 10 vectors in retrieval partition A; select 5 target second similarities from the 10 second similarities.
  • Input the query vector q1, the 3 first similarities and the 5 target second similarities into the single-stage model, and the single-stage model outputs a probability value.
  • the probability value is used to characterize the probability that the target vector is contained in the retrieval partition A.
  • the probability value can reflect the retrieval accuracy of the current retrieval. If the probability value is high, it means that the current retrieval accuracy is high. If the probability value is low, it means that the current retrieval accuracy is low.
  • If the probability value meets the first preset threshold (for example, the probability value is 0.98, which is greater than the first preset threshold of 0.9), retrieval partition A already contains most of the target vectors, the retrieval accuracy is high, and the retrieval for the query vector can be terminated.
  • the above method makes an inference judgment on whether to terminate the search for the current query vector in advance, thereby reducing the possibility of continuing to calculate the similarity between the query vector and the vectors in the remaining search partitions, reducing the search calculation amount and improving the speed of vector search.
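  • The single-stage flow described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the inner product as similarity, the function name `search_single_stage`, and the `predict` callable standing in for the single-stage model are all assumptions introduced here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product, used here as the similarity (an assumption)."""
    return sum(x * y for x, y in zip(a, b))


def search_single_stage(
    query: Vector,
    centers: Dict[str, Vector],
    partitions: Dict[str, List[Vector]],
    k: int,
    t: int,
    w: int,
    threshold: float,
    predict: Callable[[Vector, List[float], List[float]], float],
) -> Tuple[List[str], List[Vector]]:
    """Return (partitions searched, top-w vectors found in them)."""
    # First similarities against every partition center, then keep the top K.
    first = {name: dot(query, center) for name, center in centers.items()}
    retrieval = sorted(first, key=lambda name: first[name], reverse=True)[:k]
    first_k = [first[name] for name in retrieval]

    searched: List[str] = []
    scored: List[Tuple[float, Vector]] = []
    for name in retrieval:  # target partitions in descending first similarity
        searched.append(name)
        second = [dot(query, v) for v in partitions[name]]
        scored.extend(zip(second, partitions[name]))
        target_second = sorted(second, reverse=True)[:t]
        # Hypothetical model call: probability that the target is already covered.
        if predict(query, first_k, target_second) > threshold:
            break  # terminate early; remaining partitions are never scanned

    scored.sort(key=lambda pair: pair[0], reverse=True)
    return searched, [vector for _, vector in scored[:w]]
```

  • When `predict` returns a value above `threshold` after the first partition, only that partition is scanned, which is where the reduction in similarity computations comes from.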
  • With the single-stage model, if the method in the above embodiment is run on a hardware accelerator, the computation can only be performed as matrix-vector multiplication, and the computing power of the hardware accelerator cannot be fully utilized.
  • The input of the single-stage model is the vector to be queried, K first similarities, and t target second similarities. Different vectors to be queried determine different retrieval partitions and therefore different target retrieval partitions, so the second similarities between each vector to be queried and the vectors in its target retrieval partition must be calculated separately and then input into the single-stage model separately. As a result, the single-stage model can only use the matrix-vector multiplication of the hardware accelerator.
  • the retrieval partitions determined by the vector to be queried q1 are retrieval partition A, retrieval partition B and retrieval partition C, and the corresponding target retrieval partition is retrieval partition A;
  • the retrieval partitions determined by the vector to be queried q2 are retrieval partition D, retrieval partition E and retrieval partition F, and the corresponding target retrieval partition is retrieval partition D.
  • the vectors in retrieval partition A and retrieval partition D are different, so the hardware accelerator must first calculate the second similarities between the query vector q1 and the vectors in retrieval partition A by matrix-vector multiplication, and then calculate the second similarities between the query vector q2 and the vectors in retrieval partition D by a separate matrix-vector multiplication.
  • the probability value corresponding to the query vector q1 and the probability value corresponding to the query vector q2 can only be output separately by the single-stage model, and the single-stage model can only calculate each probability value by matrix-vector multiplication.
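  • the query vector q1, the three first similarities between the query vector q1 and the partition center vectors of the three retrieval partitions, and t target second similarities between the query vector q1 and each vector in retrieval partition A are input into the single-stage model, and the single-stage model uses the matrix-vector multiplication of the hardware accelerator to output the probability value of the query vector q1.
  • the query vector q2, the three first similarities between the query vector q2 and the partition center vectors of the three retrieval partitions, and t target second similarities between the query vector q2 and each vector in retrieval partition D are input into the single-stage model, and the single-stage model uses the matrix-vector multiplication of the hardware accelerator to output the probability value of the query vector q2.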
  • the single-stage model can only predict one vector to be queried at a time. This means that only the matrix-vector calculation method of the hardware accelerator can be used. The computational efficiency of the matrix-vector calculation method of the hardware accelerator is far less than the computational efficiency of the matrix-matrix calculation method of the hardware accelerator. This will cause a waste of computing power of the hardware accelerator.
  • the single-stage model can only predict one vector to be queried at a time. If there are multiple vectors to be queried, the prediction time will be further increased. If a two-stage model is used, the problems existing in the single-stage model can be overcome to a certain extent, and the retrieval speed can be further improved on the basis of the single-stage model.
  • the number of vectors to be queried in step 601 can be N, where N is a positive integer greater than 1.
  • the advantages of the vector retrieval method provided in the embodiment of the present application can be fully utilized to improve the retrieval speed and reduce the retrieval time.
  • the embodiment of the present application does not limit the way to obtain N vectors to be queried.
  • For example, multiple vectors to be queried may be obtained in one batch and used as input; alternatively, after obtaining single vectors to be queried one at a time (such as user input in an Internet application), the computing device integrates the sequentially obtained single vectors into a batch of multiple vectors to be queried as input.
  • the integration method can adopt various methods well known to those skilled in the art, and the embodiment of the present application does not limit this.
  • the method for obtaining the M first similarities in step 602 is: according to the N query vectors and the partition center vectors of the M cluster partitions, the matrix multiplication method of the hardware accelerator is used to obtain the M first similarities between any query vector and the M partition center vectors.
  • among the M first similarities, the top K first similarities in descending order are selected, and the cluster partitions corresponding to the K first similarities are determined as K retrieval partitions, where K is an integer greater than or equal to 1, and K is less than M.
  • Each vector in the vector base is clustered according to the similarity between the vectors to obtain M cluster partitions.
  • Each query vector needs to calculate the M first similarities with the partition center vectors of the M cluster partitions.
  • the N query vectors can be formed into a matrix
  • the M partition center vectors can be formed into a matrix.
  • the matrix multiplication method of the hardware accelerator is used for calculation. In this way, the M first similarities between any query vector in the N query vectors and the M partition center vectors can be quickly obtained.
  • the vectors to be queried are q1 and q2, and the matrix formed is [q1, q2];
  • the M partition center vectors are m1, m2, m3, m4, m5, m6, m7, m8, m9 and m10, and the matrix formed is [m1, m2, m3, m4, m5, m6, m7, m8, m9, m10].
  • Figure 11a shows the matrix of first similarities between each query vector and the M partition center vectors, obtained by matrix multiplication on the hardware accelerator. In this matrix, s11 represents the first similarity between q1 and m1, s12 represents the first similarity between q1 and m2, and so on.
  • the cluster partitions corresponding to the K first similarities ranked in descending order are used as the K retrieval partitions of the query vector. For example, three retrieval partitions are determined for each query vector.
  • the retrieval partitions determined by the query vector q1 are retrieval partition A, retrieval partition B, and retrieval partition C; the retrieval partitions determined by the query vector q2 are retrieval partition D, retrieval partition E, and retrieval partition F.
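  • As a rough illustration of the batched computation above, the sketch below forms the query matrix and the center matrix, multiplies them once, and reads the top K partitions off each row. The inner product as similarity, the toy 2-dimensional vectors, and the helper names `matmul_abt` and `top_k_partitions` are assumptions for illustration; plain Python is used only to show the arithmetic, since the speed benefit arises on hardware with dedicated matrix units.

```python
def matmul_abt(a, b):
    """Return A @ B^T for lists of rows: a is N x d, b is M x d, result is N x M."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def top_k_partitions(similarities, k):
    """For each row, return the indices of the k largest similarities, best first."""
    return [
        sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        for row in similarities
    ]


queries = [[1.0, 0.0], [0.0, 1.0]]                            # q1, q2
centers = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.6], [-1.0, 0.0]]   # m1 .. m4

# S[i][j] is the first similarity between query i and partition center j.
S = matmul_abt(queries, centers)
retrieval = top_k_partitions(S, k=2)
```

  • Each row of `S` corresponds to one query vector, so the retrieval partitions of every query are obtained from a single matrix product rather than from N separate matrix-vector products.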
  • FIG. 10 exemplarily shows a method for obtaining the probability value according to each second similarity, which can specifically include the following steps:
  • Step 1001 input a matrix formed by the N query vectors and a matrix formed by the K first similarities corresponding to each of the N query vectors into a first prediction model, and use the matrix-matrix multiplication of a hardware accelerator to obtain N initial probability values corresponding to the N query vectors; the initial probability value represents the probability that the K retrieval partitions corresponding to a query vector contain the target vector of that query vector.
  • the query vectors are q1 and q2, and the matrix formed is [q1, q2];
  • the three first similarities corresponding to the three retrieval partitions of the query vector q1 are s11, s12 and s13, respectively, corresponding to retrieval partition A, retrieval partition B and retrieval partition C;
  • the three first similarities corresponding to the three retrieval partitions of the query vector q2 are s24, s25 and s26, respectively, corresponding to retrieval partition D, retrieval partition E and retrieval partition F.
  • Figure 11b shows the matrix formed by the three first similarities corresponding to each of the two query vectors. In this matrix, it is not necessary to pay attention to which retrieval partitions each query vector corresponds to, because the first prediction model only needs to calculate the initial probability value for each query vector and the three first similarities corresponding to each query vector.
  • two initial probability values p11 and p12 are generated for the query vector q1 and the query vector q2, respectively, where p11 represents the probability that retrieval partition A, retrieval partition B, and retrieval partition C contain the target vector of the query vector q1, and p12 represents the probability that retrieval partition D, retrieval partition E, and retrieval partition F contain the target vector of the query vector q2.
  • Since the input of the first prediction model is the N query vectors and the K first similarities corresponding to each of the N query vectors, these features can be input in matrix form. Therefore, the matrix-matrix multiplication of the hardware accelerator can be used to obtain the N initial probability values corresponding to the N query vectors. In this way, the computing power of the hardware accelerator is fully utilized. Compared with the single-stage model, the computing efficiency can be further improved, the speed of vector retrieval can be increased, and the retrieval time can be reduced.
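  • The text does not specify the architecture of the first prediction model. The sketch below assumes, purely for illustration, a tiny two-layer network (ReLU hidden layer, sigmoid output) so that the batched feature matrix is processed by one matrix-matrix product; the names `first_stage`, `w_hidden` and `w_out` and all weights are hypothetical.

```python
import math


def matmul(a, b):
    """Plain matrix product of a (n x f) and b (f x h), both lists of rows."""
    return [
        [sum(a_row[i] * b[i][j] for i in range(len(b))) for j in range(len(b[0]))]
        for a_row in a
    ]


def first_stage(queries, first_sims, w_hidden, w_out):
    """Return one initial probability per query.

    queries:    N x d matrix of query vectors
    first_sims: N x K matrix of each query's K first similarities
    w_hidden:   (d + K) x H hidden-layer weights (hypothetical)
    w_out:      length-H output weights (hypothetical)
    """
    # Concatenate per-query features row by row: N x (d + K).
    features = [list(q) + list(s) for q, s in zip(queries, first_sims)]
    # One matrix-matrix product covers all N queries at once.
    hidden = [[max(0.0, h) for h in row] for row in matmul(features, w_hidden)]
    logits = matmul(hidden, [[w] for w in w_out])  # N x 1
    return [1.0 / (1.0 + math.exp(-row[0])) for row in logits]
```

  • Because no feature in this stage depends on the contents of a particular target retrieval partition, all N rows can be stacked and evaluated together.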
  • Step 1002 For any query vector, select any unselected retrieval partition from the K retrieval partitions as a target retrieval partition; determine each second similarity between the query vector and each vector in the target retrieval partition. Among the second similarities, determine t target second similarities that are ranked first in descending order.
  • the target retrieval partition is retrieval partition A, and 100 second similarities between q1 and the 100 vectors in retrieval partition A are calculated; the second similarity with the largest value among the 100 second similarities is used as the target second similarity.
  • the target retrieval partition is retrieval partition D, and 200 second similarities between q2 and the 200 vectors in retrieval partition D are calculated; the second similarity with the largest value among the 200 second similarities is used as the target second similarity.
  • Step 1003 for any query vector, the initial probability value corresponding to the query vector and t target second similarities corresponding to the query vector are input into the second prediction model to obtain a final probability value corresponding to the query vector.
  • Because each query vector corresponds to a different target retrieval partition, the t target second similarities corresponding to different query vectors cannot be obtained at the same time, but are calculated separately, as described in step 1002. Therefore, in step 1003, each query vector is processed separately, using the matrix-vector multiplication of the hardware accelerator.
  • For the query vector q1, its corresponding initial probability value p11 and target second similarity are input into the second prediction model, and the final probability value p21 is obtained by matrix-vector multiplication on the hardware accelerator.
  • p21 reflects the probability that retrieval partition A contains the target vector of the query vector q1.
  • For the query vector q2, its corresponding initial probability value p12 and target second similarity are input into the second prediction model, and the final probability value p22 is obtained by matrix-vector multiplication on the hardware accelerator.
  • p22 reflects the probability that retrieval partition D contains the target vector of the query vector q2.
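  • Steps 1002 and 1003 can be sketched per query as below. The inner product as similarity and the logistic form of the second prediction model are assumptions for illustration only; `target_second_similarities`, `second_stage`, `weights` and `bias` are hypothetical names, not part of the source.

```python
import math


def target_second_similarities(query, partition_vectors, t):
    """Step 1002: score every vector in the target partition and keep the top t."""
    sims = [sum(x * y for x, y in zip(query, v)) for v in partition_vectors]
    return sorted(sims, reverse=True)[:t]


def second_stage(initial_prob, target_sims, weights, bias):
    """Step 1003: combine the initial probability with the t target second similarities.

    A logistic model is assumed here; weights has length 1 + t.
    """
    features = [initial_prob] + list(target_sims)
    logit = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))
```

  • With t = 1 this reduces to the case in the examples above, where only the largest second similarity of the target retrieval partition is passed to the second prediction model.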
  • the prediction of the probability value is divided into two stages, the first stage uses the first prediction model, and the second stage uses the second prediction model.
  • the matrix formed by the N query vectors and the matrix formed by the K first similarities corresponding to each query vector in the N query vectors are input into the first prediction model, so that the first prediction model can use the matrix multiplication method to predict the initial probability value, which gives full play to the computing power, improves the computing efficiency, and further improves the speed of vector retrieval.
  • FIG. 12 exemplarily shows a method for determining the probability value, which may specifically include the following steps:
  • Step 1201 If the final probability value is not greater than the first preset threshold, then select the next unselected retrieval partition from the K retrieval partitions as the target retrieval partition.
  • If the first similarities between the query vector q1 and retrieval partition A, retrieval partition B and retrieval partition C are 0.9, 0.8 and 0.7 respectively, then the target retrieval partition of the query vector q1 is retrieval partition A, and the next target retrieval partition is retrieval partition B.
  • Step 1202 input the final probability value corresponding to the query vector and the target second similarity between the query vector and each vector in the next target retrieval partition into the second prediction model, and obtain the updated probability value corresponding to the query vector by matrix multiplication of the hardware accelerator.
  • the method for determining the target second similarity here is the same as the method for determining the target second similarity in the target retrieval partition described above, and will not be repeated here.
  • the second similarity between the query vector q1 and each vector in retrieval partition B is calculated, and the largest second similarity is determined as the target second similarity; the final probability value p21 corresponding to the query vector q1 and the target second similarity corresponding to the query vector q1 are input into the second prediction model, and the updated probability value corresponding to the query vector q1 is obtained by matrix-vector multiplication on the hardware accelerator.
  • the updated probability value is used to represent the probability that the target vector is included in all current target retrieval partitions.
  • the updated probability value is used to represent the probability that the target vector is included in the retrieval partition A and the retrieval partition B.
  • Step 1203 if the updated probability value is not greater than the first preset threshold, the final probability value in step 1202 is updated to the updated probability value, and the process returns to step 1201 to select the next unselected retrieval partition from the K retrieval partitions as the target retrieval partition.
  • If the updated probability value is not greater than the first preset threshold, the probability that the target vector is included in all current target retrieval partitions is low, the retrieval accuracy is not high, and the retrieval should continue.
  • If the updated probability value is greater than the first preset threshold, the probability that all current target retrieval partitions contain the target vector is high, the retrieval accuracy is high, and the retrieval should be terminated. Alternatively, once all K retrieval partitions have been traversed, the retrieval should also be terminated to save computing power.
  • an updated probability value is obtained. If the updated probability value is not greater than the first preset threshold, the final probability value is updated to the updated probability value, and the cycle is repeated to determine whether to terminate the retrieval. This improves the accuracy of the decision to terminate the retrieval, and can improve vector retrieval accuracy.
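  • The loop of steps 1201 to 1203 can be sketched as follows, assuming the target second similarities of each retrieval partition have already been computed and ordered by descending first similarity. The function name `early_termination_loop` and the `second_model` callable are hypothetical stand-ins for the second prediction model.

```python
def early_termination_loop(initial_prob, partition_target_sims, threshold, second_model):
    """Feed the running probability back into the second model, one partition at a time.

    partition_target_sims: one list of target second similarities per retrieval
    partition, in the order the partitions are selected.
    Returns (last probability value, number of partitions searched).
    """
    prob = initial_prob
    searched = 0
    for target_sims in partition_target_sims:
        # The previous probability is an input to the next prediction, so the
        # result covers all target partitions searched so far.
        prob = second_model(prob, target_sims)
        searched += 1
        if prob > threshold:
            break  # confident enough: stop before scanning more partitions
    return prob, searched
```

  • The loop also ends once all K retrieval partitions have been visited, matching the alternative termination condition described above.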
  • FIG. 13 is a schematic diagram of the overall flow of a vector retrieval method provided by an embodiment of the present application, which may include the following steps.
  • Step 1301 obtaining the vectors q1 and q2 to be queried.
  • Step 1302 the matrix formed by the query vectors q1 and q2 and the matrix formed by the partition center vectors of the 10 cluster partitions are multiplied by the matrix-matrix method of the hardware accelerator to obtain 10 first similarities between any query vector and the 10 partition center vectors.
  • Step 1303 among the 10 first similarities corresponding to the query vector q1, determine the 3 retrieval partitions corresponding to the top 3 values of the first similarities from high to low; among the 10 first similarities corresponding to the query vector q2, determine the 3 retrieval partitions corresponding to the top 3 values of the first similarities from high to low.
  • the three retrieval partitions corresponding to the query vector q1 are retrieval partition A, retrieval partition B, and retrieval partition C.
  • the three retrieval partitions corresponding to the query vector q2 are retrieval partition D, retrieval partition E, and retrieval partition F.
  • Step 1304 input the matrix formed by the query vectors q1 and q2 and the matrices formed by the three first similarities corresponding to the query vectors q1 and q2 respectively into the first prediction model.
  • Step 1305 In the first prediction model, the matrix multiplication method of the hardware accelerator is used to obtain the initial probability values corresponding to the query vectors q1 and q2 respectively.
  • Step 1306 for the query vector q1, sort the three search partitions according to the magnitude of the first similarity.
  • the first similarities corresponding to retrieval partition A, retrieval partition B, and retrieval partition C are 0.9, 0.8, and 0.7, respectively.
  • Step 1307 determine the search partition with the largest first similarity as the target search partition of the query vector q1. For example, determine the search partition A as the i-th target search partition of the query vector q1.
  • Step 1308 calculating the second similarity between the query vector q1 and each vector in the target retrieval partition, and taking the maximum value of the second similarity as the target second similarity of the target retrieval partition.
  • Step 1309 input the initial probability value corresponding to the query vector q1 and the target second similarity into the second prediction model, and obtain the final probability value corresponding to the query vector q1 by matrix multiplication of the hardware accelerator.
  • Step 1310 determine whether the final probability value is greater than a first preset threshold, if so, proceed to step 1311. If not, proceed to step 1312.
  • Step 1311 terminate the search for the query vector q1. Return the vectors that meet the similarity requirement with the query vector in all current target search partitions as the query results. For example, if the final probability value corresponding to search partition A is 0.98, which is greater than the first preset threshold, then the vectors corresponding to the first W second similarities in search partition A with the query vector are returned as the query results.
  • Step 1312 selecting the next unselected retrieval partition among the three retrieval partitions as the target retrieval partition.
  • the final probability value corresponding to the retrieval partition A is 0.58, which is not greater than the first preset threshold, and the retrieval partition B is selected as the target retrieval partition.
  • Step 1313 calculating the second similarity between the query vector q1 and each vector in the next target retrieval partition, and taking the maximum value of the second similarity as the target second similarity of the next target retrieval partition.
  • Step 1314 input the final probability value and the target second similarity of the next target retrieval partition into the second prediction model, and use the matrix multiplication vector method of the hardware accelerator to obtain the updated probability value corresponding to the query vector q1.
  • Step 1315 determining whether the updated probability value is greater than the first preset threshold; if so, proceeding to step 1311; if not, proceeding to step 1316.
  • Step 1316 update the final probability value in step 1314 to the updated probability value, and return to step 1312.
  • an embodiment of the present application provides a vector retrieval device, as shown in Figure 14, the vector retrieval device includes an acquisition unit 1401 and a processing unit 1402.
  • the vector retrieval device is used to execute the method embodiments shown in Figure 5a, Figure 6, Figure 9, Figure 10, Figure 12 or Figure 13 above.
  • the acquisition unit 1401 is used to acquire the vector to be queried; the processing unit 1402 is used to: perform similarity calculations on the vector to be queried and the partition center vectors of M cluster partitions respectively, to obtain M first similarities; the M cluster partitions are obtained by clustering the vectors in the vector base according to the similarities between the vectors; the partition center vector of any cluster partition is determined based on the multiple vectors contained in that cluster partition, and M is an integer greater than 1; among the M first similarities, the top K first similarities in descending order are selected, and the cluster partitions corresponding to the K first similarities are respectively determined to be K retrieval partitions, where K is an integer greater than or equal to 1, and K is less than M; the following operations are performed in a loop until it is determined that the probability value of the target retrieval partition selected from the K retrieval partitions containing the target vector is greater than a first preset threshold, and the target vector is a vector
  • When the processing unit 1402 outputs the query result based on at least one selected retrieval partition and the vector to be queried, it is specifically used to: output each vector contained in the retrieval partition whose probability value is greater than the first preset threshold among the at least one selected retrieval partition as the query result; or, in descending order of the second similarities between the vector to be queried and each vector contained in the retrieval partition whose probability value is greater than the first preset threshold among the at least one selected retrieval partition, output the vectors corresponding to the top W second similarities as the query result, where W is a positive integer.
  • When the processing unit 1402 outputs the query result based on the at least one selected retrieval partition and the vector to be queried, it is specifically configured to: output the vectors respectively included in the at least one selected retrieval partition as the query result; or, in descending order of the second similarities between the vector to be queried and the vectors respectively included in the at least one selected retrieval partition, output the vectors corresponding to the top W second similarities as the query result, where W is a positive integer.
  • When the processing unit 1402 selects an unselected retrieval partition as a target retrieval partition from the K retrieval partitions, it is specifically configured to: select an unselected retrieval partition as the target retrieval partition from the K retrieval partitions in descending order of the K first similarities.
  • When the processing unit 1402 selects a retrieval partition that has not been selected as a target retrieval partition from the K retrieval partitions, it is specifically used to: for any retrieval partition among the K retrieval partitions, cluster the vectors in the retrieval partition according to the similarity between the vectors to obtain multiple retrieval sub-partitions; determine the sub-partition center vector of any retrieval sub-partition based on the multiple vectors contained in that retrieval sub-partition; calculate the third similarities between the vector to be queried and the sub-partition center vectors of the multiple retrieval sub-partitions; sort the K retrieval partitions based on the multiple third similarities between the vector to be queried and the multiple sub-partition center vectors in each retrieval partition; and select a retrieval partition that has not been selected as the target retrieval partition from the sorted K retrieval partitions.
  • When the processing unit 1402 sorts the K retrieval partitions according to the multiple third similarities between the query vector and the multiple sub-partition center vectors in each retrieval partition, it is specifically used to: sort the K retrieval partitions according to the number of third similarities that exceed a second preset threshold among the multiple third similarities between the query vector and the multiple sub-partition center vectors in each retrieval partition; or sort the K retrieval partitions according to the maximum similarity among the multiple third similarities between the query vector and the multiple sub-partition center vectors in each retrieval partition.
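  • The two sorting rules for the K retrieval partitions can be sketched as below. The text does not state the sort direction; descending order (most promising partition first) is assumed here, and the function names and toy similarity values are hypothetical.

```python
def sort_by_count(third_sims, second_threshold):
    """Order partitions by how many third similarities exceed the second preset threshold."""
    return sorted(
        third_sims,
        key=lambda name: sum(s > second_threshold for s in third_sims[name]),
        reverse=True,
    )


def sort_by_max(third_sims):
    """Order partitions by their largest third similarity."""
    return sorted(third_sims, key=lambda name: max(third_sims[name]), reverse=True)


# Third similarities between one query and the sub-partition centers of each partition.
third = {"A": [0.9, 0.2, 0.1], "B": [0.7, 0.75, 0.6], "C": [0.3, 0.2, 0.4]}
order_by_count = sort_by_count(third, second_threshold=0.5)
order_by_max = sort_by_max(third)
```

  • In this toy data the two rules disagree: partition B has the most sub-partition centers close to the query, while partition A has the single closest one, so the choice of rule changes which partition is searched first.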
  • When the processing unit 1402 determines the probability value of the target retrieval partition containing the target vector based on each second similarity, it is specifically used to: determine, among the second similarities, the t target second similarities ranked highest in descending order; input the vector to be queried, the K first similarities and the t target second similarities into the prediction model to obtain the probability value; the prediction model is used to predict the probability value of the target retrieval partition containing the target vector.
  • There are N vectors to be queried, where N is a positive integer greater than 1; when the processing unit 1402 inputs the vector to be queried, the K first similarities and the t target second similarities into the prediction model to obtain the probability value, it is specifically used to: input a matrix formed by the N vectors to be queried and a matrix formed by the K first similarities corresponding to each of the N vectors to be queried into the first prediction model to obtain N initial probability values corresponding to the N vectors to be queried; the initial probability value is used to characterize the probability that the K retrieval partitions corresponding to any one of the vectors to be queried contain the target vector of that vector to be queried; for any one of the vectors to be queried, the initial probability value corresponding to the vector to be queried and the t target second similarities corresponding to the vector to be queried are input into the second prediction model to obtain the final probability value corresponding to the vector to be queried.
  • an embodiment of the present application also provides a computer-readable storage medium, on which a computer program or instruction is stored.
  • the computer program or instruction When the computer program or instruction is executed, the computer executes the method in the above method embodiment.
  • an embodiment of the present application provides a computer program product.
  • When the computer program product is run on a computer, the computer executes the method in the above method embodiment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A vector retrieval method and device, for use in solving the problem of low retrieval speed existing in existing retrieval methods. In the present application, the method comprises: acquiring a vector to be queried; respectively performing similarity calculation on the vector to be queried and partition center vectors of M clustering partitions to obtain M first similarities; determining K retrieval partitions according to the M first similarities; performing loop execution on the following operations until a probability value that a target retrieval partition includes a target vector is greater than a first preset threshold: selecting a retrieval partition from the K retrieval partitions as the target retrieval partition, calculating a second similarity between the vector to be queried and each vector included in the target retrieval partition, and according to the second similarities, determining the probability value that the target retrieval partition includes the target vector; and outputting a query result on the basis of the selected at least one retrieval partition. A query result can be obtained without calculating the similarities between a vector to be queried and all the vectors in a vector base library, the amount of calculation can be reduced, and the query speed is increased.

Description

A vector retrieval method and device
This application claims priority to the Chinese patent application filed with the China Patent Office on September 28, 2022, with application number 202211193810.4 and invention title "A vector retrieval method and device", the entire contents of which are incorporated into this application by reference.
Technical Field
The present application relates to the field of retrieval technology, and in particular to a vector retrieval method and device.
Background
Vector retrieval plays an important role in the field of information retrieval. The process is as follows: first, a vector base library is constructed, containing a large number of vectors obtained by feature extraction from a large amount of data, which can be in the form of pictures, videos, audio, text, and so on; then, the similarity between the query vector input by the user and each vector in the vector base library is calculated, and the vectors corresponding to the top W similarities, sorted from high to low, are returned as the query result for the query vector.
This method performs a global search and comparison over a vector base library containing hundreds of millions or even billions of vectors, so its retrieval throughput (queries per second, QPS) is low and retrieval is slow.
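The global search described in this background can be sketched as a brute-force scan. This is a minimal illustration assuming the inner product as the similarity; the function name `brute_force_search` and the toy vectors are hypothetical.

```python
def brute_force_search(query, base, w):
    """Score the query against every vector in the base and return the top-w indices."""
    scored = [
        (sum(x * y for x, y in zip(query, vector)), index)
        for index, vector in enumerate(base)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [index for _, index in scored[:w]]


base = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]
nearest = brute_force_search((1.0, 0.0), base, w=2)
```

Every query touches every vector in the base, so the cost grows linearly with the size of the library, which is the bottleneck the partition-based method aims to avoid.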
发明内容Summary of the invention
本申请提供一种向量检索方法及装置,用于解决现有的向量检索方法存在的检索速度低的问题。The present application provides a vector search method and device, which are used to solve the problem of low search speed in existing vector search methods.
In a first aspect, the present application provides a vector retrieval method, which may be executed by a computing device, by a chip inside the computing device, or by a processor in the computing device. The method includes: obtaining a vector to be queried;
calculating the similarity between the vector to be queried and the partition center vector of each of M cluster partitions to obtain M first similarities, where the M cluster partitions are obtained by clustering the vectors in a vector base library according to the similarity between the vectors, the partition center vector of any cluster partition is determined according to the multiple vectors contained in that cluster partition, and M is an integer greater than 1; selecting, from the M first similarities, the K first similarities that rank highest when sorted from high to low, and determining the cluster partitions corresponding to the K first similarities as K retrieval partitions, where K is an integer greater than or equal to 1 and K is less than M;
performing the following operations in a loop until it is determined that the probability value that a target retrieval partition selected from the K retrieval partitions contains a target vector is greater than a first preset threshold, where the target vector is a vector whose similarity to the vector to be queried is within a preset range:
selecting, from the K retrieval partitions, a retrieval partition that has not yet been selected as the target retrieval partition; calculating a second similarity between the vector to be queried and each vector contained in the target retrieval partition; and determining, according to the second similarities, the probability value that the target retrieval partition contains the target vector; and
outputting a query result based on the at least one retrieval partition that has been selected and the vector to be queried.
In the above technical solution, the query result is obtained without calculating the similarity between the vector to be queried and all vectors in the vector base library. Instead, the vectors in the vector base library are first clustered into M cluster partitions, each of which has a partition center vector. The first similarities between the vector to be queried and the M partition center vectors are calculated, and K retrieval partitions are selected from the M cluster partitions according to the magnitudes of the M first similarities. Target retrieval partitions are then selected one by one from the K retrieval partitions; for each selected target retrieval partition, the probability value that a vector identical or similar to the vector to be queried falls in that partition is determined, until a target retrieval partition whose probability value is greater than the first preset threshold is selected. The query result is then determined from the at least one target retrieval partition that has been selected. This reduces the amount of calculation and increases the query speed.
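The steps of the first aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: cosine similarity is assumed as the similarity measure, the function `retrieve` and its parameters are hypothetical names, and the probability estimate is passed in as `prob_fn`, with Python's built-in `max` used below as a crude stand-in for the prediction step.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query, centers, partitions, k, w, prob_threshold, prob_fn):
    """Probe up to k partitions, stopping once prob_fn exceeds prob_threshold."""
    # M first similarities: query versus every partition center vector.
    first = [cosine(query, c) for c in centers]
    # K retrieval partitions: indices with the highest first similarities.
    order = sorted(range(len(centers)), key=lambda i: first[i], reverse=True)[:k]
    probed, scored = [], []
    for p in order:
        # Second similarities: query versus every vector in the target partition.
        second = [cosine(query, v) for v in partitions[p]]
        scored.extend(zip(second, partitions[p]))
        probed.append(p)
        if prob_fn(second) > prob_threshold:
            break
    # Output the top-w vectors over all partitions probed so far.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return probed, [v for _, v in scored[:w]]

centers = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
partitions = [
    [[1.0, 0.1], [0.9, -0.1]],
    [[0.1, 1.0], [0.2, 0.9]],
    [[-1.0, 0.1]],
]
# Hypothetical stand-in for the prediction model: the best second similarity.
probed, results = retrieve([0.3, 1.0], centers, partitions,
                           k=2, w=1, prob_threshold=0.9, prob_fn=max)
```

In this toy run the first probed partition already exceeds the threshold, so the second candidate partition is never scanned; that skipped scan is where the saving in calculation comes from.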
In a possible implementation, outputting the query result based on the at least one selected retrieval partition and the vector to be queried includes: outputting, as the query result, the vectors contained in the retrieval partition, among the at least one selected retrieval partition, whose probability value is greater than the first preset threshold; or sorting, from high to low, the second similarities between the vector to be queried and the vectors contained in the retrieval partition whose probability value is greater than the first preset threshold, and outputting, as the query result, the vectors corresponding to the top W second similarities, where W is a positive integer.
In the above technical solution, only one selected retrieval partition has a probability value greater than the first preset threshold, so outputting all vectors contained in that retrieval partition as the query result effectively reduces the amount of calculation and increases the retrieval speed. Alternatively, the second similarities between the vector to be queried and all vectors contained in that retrieval partition are sorted from high to low, and the vectors corresponding to the top W second similarities are output as the query result, which further condenses the output.
In a possible implementation, outputting the query result based on the at least one selected retrieval partition and the vector to be queried includes: outputting, as the query result, the vectors contained in each of the at least one selected retrieval partition; or sorting, from high to low, the second similarities between the vector to be queried and the vectors contained in each of the at least one selected retrieval partition, and outputting, as the query result, the vectors corresponding to the top W second similarities, where W is a positive integer.
In the above technical solution, the query result is output based on all of the at least one selected retrieval partition, rather than only on the retrieval partition whose probability value is greater than the first preset threshold. A retrieval partition whose probability value is not greater than the first preset threshold may still contain vectors with a high similarity to the vector to be queried, so this solution can improve the accuracy of vector retrieval.
In a possible implementation, selecting, from the K retrieval partitions, a retrieval partition that has not yet been selected as the target retrieval partition may be: selecting, in descending order of the K first similarities, a retrieval partition that has not yet been selected from the K retrieval partitions as the target retrieval partition.
In this way, selecting the target retrieval partition in descending order of the K first similarities allows a target retrieval partition whose probability value is greater than the first preset threshold to be found as early as possible. This minimizes the need to select further target retrieval partitions and thus avoids calculating similarities between the vector to be queried and the vectors in those further partitions, which reduces the amount of calculation and increases the retrieval speed.
In a possible implementation, selecting, from the K retrieval partitions, a retrieval partition that has not yet been selected as the target retrieval partition may alternatively be: for any retrieval partition among the K retrieval partitions, clustering the vectors in the retrieval partition according to the similarity between the vectors to obtain multiple retrieval sub-partitions; determining the sub-partition center vector of any retrieval sub-partition according to the multiple vectors contained in that retrieval sub-partition; calculating third similarities between the vector to be queried and the sub-partition center vectors of the multiple retrieval sub-partitions; sorting the K retrieval partitions according to the multiple third similarities between the vector to be queried and the multiple sub-partition center vectors in each retrieval partition; and selecting, from the sorted K retrieval partitions, a retrieval partition that has not yet been selected as the target retrieval partition.
In this way, the vectors within each retrieval partition are further clustered into multiple retrieval sub-partitions, each with its own sub-partition center vector. Because the division is finer, each sub-partition center vector represents the vectors in its retrieval sub-partition more accurately. Sorting the K retrieval partitions based on the multiple third similarities between the vector to be queried and the multiple sub-partition center vectors in each retrieval partition can therefore improve the sorting accuracy. A target retrieval partition whose probability value is greater than the first preset threshold can then be found as early as possible, which minimizes the need to select further target retrieval partitions and avoids calculating similarities between the vector to be queried and the vectors in those further partitions. This also reduces the amount of calculation and increases the retrieval speed.
In a possible implementation, sorting the K retrieval partitions according to the multiple third similarities between the vector to be queried and the multiple sub-partition center vectors in each retrieval partition includes: sorting the K retrieval partitions according to the number of third similarities, among the multiple third similarities of each retrieval partition, that exceed a second preset threshold; or sorting the K retrieval partitions according to the maximum of the multiple third similarities of each retrieval partition.
Sorting the K retrieval partitions either by the number of third similarities that exceed the second preset threshold or by the maximum third similarity reduces the sorting difficulty and increases the sorting speed, and thus the retrieval speed. It can also improve the sorting accuracy, so that a target retrieval partition whose probability value is greater than the first preset threshold can be determined as early as possible.
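The two sorting criteria can be illustrated with a short sketch. It assumes the third similarities have already been computed and are given as one list per retrieval partition; the function names `rank_by_count` and `rank_by_max` are hypothetical.

```python
def rank_by_count(third_sims, threshold):
    """Order partition indices by how many third similarities exceed the threshold."""
    counts = [sum(s > threshold for s in sims) for sims in third_sims]
    return sorted(range(len(counts)), key=lambda i: counts[i], reverse=True)

def rank_by_max(third_sims):
    """Order partition indices by their largest third similarity."""
    return sorted(range(len(third_sims)), key=lambda i: max(third_sims[i]),
                  reverse=True)

# One row per retrieval partition, one entry per sub-partition center vector.
third = [[0.9, 0.2, 0.1], [0.7, 0.75, 0.6], [0.3, 0.4, 0.2]]
order_by_count = rank_by_count(third, 0.5)
order_by_max = rank_by_max(third)
```

The two criteria can disagree: partition 0 has the single best sub-partition match, while partition 1 has more sub-partitions above the threshold.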
In a possible implementation, determining, according to the second similarities, the probability value that the target retrieval partition contains the target vector includes: determining, among the second similarities, the t target second similarities that rank highest when sorted from high to low; and inputting the vector to be queried, the K first similarities, and the t target second similarities into a prediction model to obtain the probability value, where the prediction model is used to predict the probability value that the target retrieval partition contains the target vector.
Predicting the probability value with a prediction model improves both the accuracy and the speed of determining the probability value. Inputting only the t highest second similarities into the prediction model also reduces the calculation performed by the prediction model and speeds up prediction without affecting prediction accuracy.
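The following sketch shows one way the model input could be assembled. It is illustrative only: this section does not specify the form of the prediction model, so the logistic function, its weights, and the zero padding used when a partition holds fewer than t vectors are all assumptions, and the names `build_features` and `predict_probability` are hypothetical.

```python
import heapq
import math

def build_features(query, first_sims, second_sims, t):
    """Concatenate the query, the K first similarities, and the top-t second similarities."""
    top_t = heapq.nlargest(t, second_sims)
    # Assumption: pad with 0.0 when the partition holds fewer than t vectors.
    top_t += [0.0] * (t - len(top_t))
    return list(query) + list(first_sims) + top_t

def predict_probability(features, weights, bias):
    """Hypothetical logistic model standing in for the trained prediction model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

features = build_features([0.1, 0.2], [0.9, 0.5], [0.3, 0.8, 0.6], t=2)
probability = predict_probability(features, [0.0, 0.0, 1.0, 1.0, 2.0, 2.0], -3.0)
```

Because only t values are kept, the input size is fixed regardless of how many vectors the target retrieval partition contains.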
In a possible implementation, there are N vectors to be queried, where N is an integer greater than 1. Accordingly, inputting the vector to be queried, the K first similarities, and the t target second similarities into the prediction model to obtain the probability value includes: inputting a matrix formed by the N vectors to be queried and a matrix formed by the K first similarities corresponding to each of the N vectors to be queried into a first prediction model to obtain N initial probability values corresponding to the N vectors to be queried, where an initial probability value characterizes the probability that the K retrieval partitions corresponding to a vector to be queried contain the target vector of that vector to be queried; and, for any vector to be queried, inputting the initial probability value corresponding to the vector to be queried and the t target second similarities corresponding to the vector to be queried into a second prediction model to obtain a final probability value corresponding to the vector to be queried.
The prediction of the probability value is divided into two stages: the first stage uses the first prediction model, and the second stage uses the second prediction model. Specifically, the matrix formed by the N vectors to be queried and the matrix formed by the K first similarities corresponding to each of the N vectors to be queried are input into the first prediction model, so that the first prediction model can predict the initial probability values by matrix-by-matrix multiplication. This makes full use of the available computing power, improves computing efficiency, and further increases the speed of vector retrieval.
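The batched matrix form can be illustrated in plain Python. This sketch assumes the inner product as the similarity measure and only builds the N-by-K matrix of first similarities that would be fed to the first prediction model; the model itself is not shown, and the helper names are hypothetical. On a hardware accelerator the nested loops would be a single matrix-by-matrix multiplication.

```python
def matmul_transposed(a, b):
    """Return a @ b^T for row-major nested lists (inner-product similarities)."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
            for row_a in a]

def top_k_per_row(sim_matrix, k):
    """Keep the k highest first similarities of every query, sorted high to low."""
    return [sorted(row, reverse=True)[:k] for row in sim_matrix]

queries = [[1.0, 0.0], [0.0, 1.0]]                            # N = 2 vectors to be queried
centers = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]   # M = 4 center vectors
sims = matmul_transposed(queries, centers)                    # N x M first similarities
first_k = top_k_per_row(sims, 3)                              # N x K matrix, K = 3
```

Here two vectors to be queried each keep three first similarities, matching the 2-by-3 shape that the description of FIG. 11b mentions.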
In a second aspect, an embodiment of the present application provides a vector retrieval device that has the function of implementing the method in the first aspect or in any possible implementation of the first aspect. The device may be a computing device or a processor included in a computing device. The functions of the vector retrieval device may be implemented by hardware, or by hardware executing corresponding software, where the hardware or software includes one or more modules, units, or means corresponding to the above functions.
In a possible implementation, the structure of the device includes a processing module and a transceiver module. The processing module is configured to support the device in executing the method in the first aspect or in any implementation of the first aspect. The transceiver module is used to support communication between the device and other devices; for example, it may receive data from an acquisition device. The vector retrieval device may further include a storage module that is coupled to the processing module and stores the program instructions and data necessary for the device. As an example, the processing module may be a processor, the transceiver module may be a transceiver, and the storage module may be a memory; the memory may be integrated with the processor or provided separately from the processor.
In another possible implementation, the structure of the device includes a processor and may further include a memory. The processor is coupled to the memory and may be used to execute computer program instructions stored in the memory, so that the device performs the method in the first aspect or in any possible implementation of the first aspect. Optionally, the device further includes a communication interface, and the processor is coupled to the communication interface. When the device is a computing device, the communication interface may be a transceiver or an input/output interface.
In a third aspect, an embodiment of the present application provides a chip including a processor. The processor is coupled to a memory, and the memory is used to store programs or instructions. When the programs or instructions are executed by the processor, the chip implements the method in the first aspect or in any possible implementation of the first aspect.
Optionally, the chip further includes an interface circuit, which is used to exchange code instructions with the processor.
Optionally, the chip may contain one or more processors, and a processor may be implemented by hardware or by software. When implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like. When implemented by software, the processor may be a general-purpose processor that is implemented by reading software code stored in a memory.
Optionally, the chip may also contain one or more memories. A memory may be integrated with the processor or provided separately from the processor. For example, the memory may be a non-transitory memory, such as a read-only memory (ROM), which may be integrated with the processor on the same chip or provided on a different chip.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program or instructions that, when executed, cause a computer to execute the method in the first aspect or in any possible implementation of the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product that, when read and executed by a computer, causes the computer to execute the method in the first aspect or in any possible implementation of the first aspect.
For the technical effects that can be achieved by any of the second to fifth aspects, refer to the description of the beneficial effects of the first aspect; they are not repeated here.
Brief Description of the Drawings
FIG. 1a is a schematic diagram of vector retrieval in an image-based image search scenario provided by the present application;
FIG. 1b is a schematic diagram of vector retrieval in a drug discovery scenario provided by the present application;
FIG. 2 is a schematic diagram of a system architecture provided by the present application;
FIG. 3 is a schematic diagram of the structure of a computing device provided by the present application;
FIG. 4 is a schematic diagram of the structure of a processor provided by the present application;
FIG. 5a is a schematic flowchart of a vector retrieval technology provided by the present application;
FIG. 5b is a schematic diagram of M cluster partitions obtained by clustering the vectors in a vector base library, provided by the present application;
FIG. 6 is a schematic flowchart of a vector retrieval method provided by the present application;
FIG. 7 is a schematic diagram of dividing any retrieval partition into retrieval sub-partitions, provided by the present application;
FIG. 8 is a schematic diagram, provided by the present application, of the second similarity between a vector to be queried and the partition center vector of each retrieval partition and the third similarity between the vector to be queried and the sub-partition center vector of each retrieval sub-partition;
FIG. 9 is a schematic flowchart of a method for obtaining a probability value according to the second similarities, provided by the present application;
FIG. 10 is a schematic flowchart of another method for obtaining a probability value according to the second similarities, provided by the present application;
FIG. 11a is a schematic diagram, provided by the present application, of the matrix of M first similarities between any vector to be queried and the M partition center vectors, obtained by matrix-by-matrix multiplication on a hardware accelerator;
FIG. 11b is a schematic diagram, provided by the present application, of a matrix formed by the 3 first similarities corresponding to each of 2 vectors to be queried;
FIG. 12 is a schematic diagram of a method for evaluating a probability value provided by the present application;
FIG. 13 is a schematic diagram of the overall flow of a vector retrieval method provided by the present application;
FIG. 14 is a schematic diagram of a vector retrieval device provided by the present application.
Detailed Description
To better explain the present application, the technologies and terms involved are first explained below.
1. Vector retrieval technology: in a given vector data set, vectors close to the vector to be queried are retrieved according to a certain metric.
2. The k-means clustering algorithm is an iteratively solved clustering analysis algorithm. Specifically, given a number of classes k, the entire data set is clustered; the objective is to minimize the sum of the distances from all samples to their class centers. The objective function is optimized iteratively to obtain k class centers and the class to which each sample belongs.
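The iteration described in item 2 can be sketched as a plain Lloyd-style loop. This is a minimal illustration: squared Euclidean distance, a fixed number of iterations, and initialization from the first k points are all assumptions made to keep the example short and deterministic, and the names `kmeans` and `sq_dist` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=10):
    """Return (centers, labels) after alternating assignment and update steps."""
    # Assumption: initialize the class centers with the first k points.
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each sample joins the class with the nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels

points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centers, labels = kmeans(points, k=2)
```

A practical implementation would normally use a better initialization and stop when the assignments no longer change.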
3. Retrieval accuracy, which may also be called recall. Given a vector to be queried, the retrieval system searches for it and returns W vectors as the query result. Let X be the set of these W returned vectors, and let Y be the set of the W vectors in the entire vector base library that rank highest when sorted by similarity to the vector to be queried from high to low. The retrieval accuracy of the retrieval system for the vector to be queried is then |X∩Y|/|Y|.
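The definition in item 3 translates directly into a few lines of code. In this sketch vectors are represented by integer identifiers, which is an assumption for illustration, and the function name `retrieval_accuracy` is hypothetical.

```python
def retrieval_accuracy(returned_ids, true_top_ids):
    """Compute |X ∩ Y| / |Y| for returned set X and exact top-W set Y."""
    x, y = set(returned_ids), set(true_top_ids)
    return len(x & y) / len(y)

# The system returned W = 4 vectors, 2 of which are in the exact top 4.
accuracy = retrieval_accuracy([1, 2, 3, 4], [2, 3, 5, 6])
```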
FIG. 1a shows a schematic diagram of vector retrieval in an image-based image search scenario. Specifically, features are first extracted from a large number of pictures to obtain a large number of vectors, which form a vector base library. The vector to be queried, obtained by feature extraction from the picture to be queried, is searched for in the vector base library, and vectors that meet the similarity requirement with respect to the vector to be queried are retrieved. The pictures from which these vectors were extracted are then identified and returned as the query result. For example, in Internet applications, a product picture input by a user can be used to retrieve pictures of products whose appearance is similar to that of the product in the input picture; or frames of videos that a user frequently browses can be used to retrieve other videos with similar frames, which are then pushed to the user. The ever-growing scale of Internet data places higher requirements on the retrieval speed and efficiency of retrieval systems.
FIG. 1b shows a schematic diagram of vector retrieval in a drug discovery scenario. A large number of compounds are encoded by an encoder to obtain a large number of vectors, which form a vector base library. The vector to be queried, obtained by encoding the active fragment or lead compound of the drug to be queried with the encoder, is searched for in the vector base library, and vectors that meet the similarity requirement with respect to the vector to be queried are retrieved. The compounds corresponding to these vectors are returned as the query result. Developing a new drug requires searching a compound base library of hundreds of millions or billions of compounds for compounds similar to the active fragment or lead compound of the new drug, as potential drugs. Because the choice of similar compounds affects the subsequent, lengthy animal experiments and clinical trials, this application also places high demands on the retrieval speed of the retrieval system.
Vectors that meet the similarity requirement with respect to the vector to be queried can be retrieved in the following two ways:
Method 1: calculate the similarity between the vector to be queried and all vectors in the entire vector base library, and select the W vectors whose similarities rank highest when sorted from high to low as the query result.
Method 2: calculate the similarity between the vector to be queried and each vector in the entire vector base library in turn until W vectors whose similarity meets a preset threshold have been found, and then stop calculating the similarity between the vector to be queried and the remaining vectors in the vector base library.
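Method 2 can be sketched as a sequential scan with early termination. This is a minimal illustration: the inner product is assumed as the similarity measure, "meets the preset threshold" is read as greater than or equal to, and the names `dot` and `threshold_scan` are hypothetical. The sketch also counts how many comparisons were made.

```python
def dot(a, b):
    """Inner product, used here as the (assumed) similarity measure."""
    return sum(x * y for x, y in zip(a, b))

def threshold_scan(query, base, w, sim_threshold):
    """Scan base vectors in order and stop once w of them meet the threshold."""
    hits, compared = [], 0
    for i, v in enumerate(base):
        compared += 1
        if dot(query, v) >= sim_threshold:
            hits.append(i)
            if len(hits) == w:
                break
    return hits, compared

base = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.8, 0.2], [1.0, 0.0]]
hits, compared = threshold_scan([1.0, 0.0], base, w=2, sim_threshold=0.85)
```

In this run the scan stops after three comparisons and never sees the last base vector, even though it matches the query exactly, which illustrates how the result depends on the threshold and on the scan order.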
Method 1 requires calculating the similarity between the vector to be queried and all vectors in the entire vector base library. Although this guarantees retrieval accuracy, the number of vectors in the vector base library is very large, typically hundreds of millions or billions, so the amount of calculation is enormous and limits any improvement in retrieval speed. Method 2 requires less calculation than Method 1, but if the preset threshold is set high, the amount of calculation is still large and retrieval is slow, and if the preset threshold is set low, retrieval accuracy suffers. Vector retrieval with Method 2 therefore depends heavily on how the preset threshold is set; different thresholds may even be needed for different vectors to be queried, so the retrieval method is not flexible enough.
In summary, the above vector retrieval methods cannot achieve both retrieval accuracy and retrieval speed. On this basis, the embodiments of the present application provide a vector retrieval method to increase retrieval speed while ensuring retrieval accuracy.
图2提供了一种本申请实施例可以适用的系统架构示意图,该系统中包括采集设备10、计算设备20和存储设备30。其中,采集设备10可以是一个或多个,计算设备20也可以是一个或多个,存储设备30也可以是一个或多个。一个或多个采集设备10、一个或多个计算设备20和一个或多个存储设备30可通过网络连接。FIG2 provides a schematic diagram of a system architecture applicable to an embodiment of the present application, wherein the system includes a collection device 10, a computing device 20, and a storage device 30. The collection device 10 may be one or more, the computing device 20 may be one or more, and the storage device 30 may be one or more. One or more collection devices 10, one or more computing devices 20, and one or more storage devices 30 may be connected via a network.
采集设备10可用于采集数据,将采集到的数据通过网络发送给计算设备20。采集设备10可以是摄像机、手机、电脑等,采集设备10采集到的数据可以是图片、视频、音频、文本等数据。示例性的,在视频监控场景中,采集设备10具体可以是摄像机,摄像机采集到的数据比如是摄像机拍摄的图片和/或视频。 The acquisition device 10 can be used to collect data and send the collected data to the computing device 20 through the network. The acquisition device 10 can be a camera, a mobile phone, a computer, etc., and the data collected by the acquisition device 10 can be pictures, videos, audio, text, etc. Exemplarily, in a video surveillance scenario, the acquisition device 10 can specifically be a camera, and the data collected by the camera can be, for example, pictures and/or videos taken by the camera.
The computing device 20 is configured to perform feature extraction on any obtained data to obtain the vector corresponding to that data. The vectors corresponding to a large amount of data form a vector base library, and the computing device 20 clusters the vectors in the vector base library according to the similarity between vectors to obtain M cluster partitions, where the vectors within each cluster partition are highly similar to one another and M is an integer greater than 1. Each cluster partition has a corresponding partition center vector, which is determined from the vectors contained in that cluster partition, for example from their mean, mode, or median. The partition center vector can be understood as a representative of the vectors in the cluster partition that reflects their common features. The embodiments of the present application do not limit the clustering algorithm; for example, k-means clustering, mean shift clustering, or a density-based clustering method may be used to obtain the M cluster partitions.
The storage device 30 may be used to store the cluster partitions computed by the computing device 20. For example, FIG. 5b is a schematic diagram of M cluster partitions that may be obtained by clustering the vectors in the vector base library. In FIG. 5b, it is assumed that clustering the vectors by similarity yields 8 cluster partitions, which are separated by solid lines. The vectors in each cluster partition are averaged to obtain its partition center vector, shown as a five-pointed star; the solid black dots represent the other vectors contained in the cluster partition. For example, if a cluster partition contains the three vectors [1,1,1], [2,2,2], and [3,3,3], its partition center vector may be [2,2,2].
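The averaging in the example above can be reproduced with a short Python sketch. The function name `partition_center` is hypothetical and used only for illustration; the mean is just one of the options the text mentions.

```python
from statistics import fmean

def partition_center(vectors):
    """Return the component-wise mean of equally sized vectors."""
    # zip(*vectors) groups the i-th components of all vectors together.
    return [fmean(component) for component in zip(*vectors)]

center = partition_center([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
print(center)  # [2.0, 2.0, 2.0]
```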
After clustering the vectors in the vector base library into M cluster partitions, the computing device 20 may send the partition center vector of each cluster partition and the vectors contained in each cluster partition to the storage device 30 for storage. In other words, the storage device 30 may store the data structure shown in FIG. 5b for use by the computing device 20 in subsequent vector retrieval.
In the vector retrieval stage, the acquisition device 10 may be used to collect or obtain the data to be queried and send it to the computing device 20. For example, a user opens a shopping application and submits a picture containing the product to be looked up; the acquisition device 10 captures this picture to be queried and sends it to the computing device 20.
The computing device 20 is configured to perform feature extraction on the picture to be queried to obtain the corresponding vector to be queried, then search the M cluster partitions stored in the storage device 30 for similar vectors according to the vector to be queried, and feed the similar vectors found back to the user.
It should be understood that the acquisition device 10, the computing device 20, and the storage device 30 may be integrated in the same device or provided in separate devices. For example, the computing device 20 and the storage device 30 may be integrated in a server, and the acquisition device 10 may be integrated in a terminal device.
Further, FIG. 3 is a schematic structural diagram of a possible computing device 20. The computing device 20 includes a processor 201, a memory 202, and a communication interface 203, any two of which may be connected through a bus 204.
The processor 201 may be a central processing unit (CPU), which may execute software programs in the memory 202 to implement one or more functions, such as feature extraction on data. Besides a CPU, the processor 201 may also be an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), a system on chip (SoC), a complex programmable logic device (CPLD), a graphics processing unit (GPU), a neural-network processing unit (NPU), or the like.
It should be noted that, in practice, there may be multiple processors 201, which may be of the same type or of different types. For example, the multiple processors 201 may all be CPUs; may include one or more CPUs and one or more GPUs; may include one or more CPUs and one or more NPUs; or may include one or more CPUs, one or more GPUs, and one or more NPUs. A processor 201 (such as a CPU or an NPU) may include one core or multiple cores.
The memory 202 is a device for storing data and may be an internal memory or a hard disk.
The internal memory is the memory that exchanges data directly with the processor 201. It can be read and written at any time at high speed, and serves as temporary data storage for the operating system or other programs running on the processor 201. The internal memory includes volatile memory, such as random access memory (RAM) and dynamic random access memory (DRAM), and may also include non-volatile memory, such as storage class memory (SCM), or a combination of volatile and non-volatile memory. In practice, multiple internal memories, optionally of different types, may be configured in the computing device 20; this embodiment does not limit their number or type. In addition, the internal memory may be configured with a power-protection function, meaning that data stored in the memory is not lost when the system loses power and is powered on again. Internal memory with a power-protection function is referred to as non-volatile memory.
The hard disk is used to provide storage resources, for example to store pictures, video, audio, text, and other data collected by the acquisition device 10. The hard disk includes, but is not limited to, non-volatile memory such as read-only memory (ROM), a hard disk drive (HDD), or a solid state disk (SSD). Unlike the internal memory, the hard disk has slower read and write speeds and is typically used for persistent data storage. In one implementation, data and program instructions on the hard disk are first loaded into the internal memory, and the processor then obtains the data and/or program instructions from the internal memory.
The communication interface 203 is used to communicate with other devices, for example for the computing device 20 to communicate with the acquisition device 10 or the storage device 30.
In practice, as shown in FIG. 4, the computing device 20 may include two kinds of processors 201, namely a CPU and an NPU. The CPU may include 6 CPU cores and the NPU may include 2 NPU cores; an NPU core may also be called an AI core. The NPU has higher computing power than the CPU, so the CPU may be used for operations such as similarity sorting during data retrieval, and the NPU for operations such as similarity calculation. For details, see the structure of the processor 201 in the computing device 20 shown in FIG. 4.
Based on the system architecture shown in FIG. 2 and the hardware architecture of the computing device shown in FIG. 3 and FIG. 4, the present application provides an exemplary flow of vector retrieval, shown in FIG. 5a. The flow may be executed by the computing device 20 shown in FIG. 3 and FIG. 4 and can be roughly divided into the following three stages:
1. Feature extraction stage
For the multiple sample pictures obtained, the computing device 20 inputs each sample picture into a preset feature extraction model. The embodiments of the present application do not limit the type of feature extraction model. For example, each sample picture may be input into a convolutional neural network (CNN) model, which outputs the vector corresponding to that sample picture. The computing device 20 then stores the vector corresponding to each sample picture in the vector base library, which may be located in the memory 202 of the computing device 20 or in the storage device 30. The storage device 30 may be an independent storage medium, a memory, or the like.
2. Clustering stage
The computing device 20 clusters the vectors in the vector base library according to the similarity between vectors to obtain M cluster partitions, each with a corresponding partition center vector, where M is an integer greater than 1. For example, the vectors in the vector base library may be clustered into M cluster partitions in either of the following two ways:
Implementation 1: cluster all vectors in the vector base library directly to obtain the M cluster partitions and the partition center vector of each cluster partition. The partition center vector is derived from the vectors in the cluster partition, for example as their mean or median; the embodiments of the present application do not limit this. The clustering algorithm may be k-means clustering, fuzzy c-means clustering, mean shift clustering, a density-based clustering method, or the like; the embodiments of the present application do not limit this either.
Implementation 2: randomly select a preset proportion (for example, about 10%) of the vectors in the vector base library as training samples, and cluster the training samples to obtain the M cluster partitions and the partition center vector of each cluster partition. The clustering algorithm may be k-means clustering, fuzzy c-means clustering, mean shift clustering, a density-based clustering method, or the like; the embodiments of the present application do not limit this. Then, taking the M partition center vectors as centers, assign each of the remaining vectors in the vector base library (those other than the training samples) to one of the M cluster partitions. This reduces the amount of computation needed to determine the partition center vectors and speeds up their determination.
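A minimal Python sketch of Implementation 2 follows. It is illustrative only and rests on several assumptions not fixed by the text: plain k-means with squared Euclidean distance is used as the clustering algorithm, the function names are hypothetical, and for brevity every vector (including the training samples) is assigned to its nearest final center.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance (an assumed dissimilarity measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    return [sum(c) / len(vectors) for c in zip(*vectors)]

def nearest(vector, centers):
    """Index of the center closest to the given vector."""
    return min(range(len(centers)), key=lambda i: sq_dist(vector, centers[i]))

def kmeans(samples, m, iterations=10, seed=0):
    """Plain k-means over the training samples; returns m center vectors."""
    centers = random.Random(seed).sample(samples, m)
    for _ in range(iterations):
        groups = [[] for _ in range(m)]
        for v in samples:
            groups[nearest(v, centers)].append(v)
        # Keep the previous center if a group ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers

def build_partitions(base, m, sample_ratio=0.1, seed=0):
    """Cluster a random sample, then assign all base vectors to the centers."""
    rng = random.Random(seed)
    n_samples = max(m, int(len(base) * sample_ratio))
    sample_ids = sorted(rng.sample(range(len(base)), n_samples))
    centers = kmeans([base[i] for i in sample_ids], m, seed=seed)
    partitions = [[] for _ in range(m)]  # each holds indices into base
    for i, v in enumerate(base):
        partitions[nearest(v, centers)].append(i)
    return centers, partitions
```

Only the sampled vectors take part in the iterative clustering, so the cost of finding the centers scales with the sample size rather than with the size of the whole vector base library.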
3. Vector retrieval stage
After the feature extraction stage and the clustering stage, multiple cluster partitions are available, each with its own partition center vector. Subsequently, when a user issues a query, the data to be queried can be input to the computing device 20 through a client. The computing device 20 performs feature extraction on the data to be queried to obtain the vector to be queried, and then computes the similarity between the vector to be queried and the partition center vector of each of the M cluster partitions, obtaining M first similarities. From the M first similarities, the K highest are selected, and the cluster partitions corresponding to these K first similarities are determined as the K retrieval partitions.
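The selection of the K retrieval partitions can be sketched as follows. Cosine similarity is an assumption here, since the text does not fix the similarity measure, and the function names are hypothetical. Vectors are assumed to be non-zero.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity of two non-zero vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def select_retrieval_partitions(query, centers, k):
    """Return the indices of the k partitions whose centers are most similar."""
    first_sims = [cosine(query, c) for c in centers]  # M first similarities
    # nlargest returns the indices ordered from highest to lowest similarity.
    return heapq.nlargest(k, range(len(centers)), key=first_sims.__getitem__)

centers = [[1, 0], [0, 1], [1, 1], [-1, 0]]
print(select_retrieval_partitions([1, 0.1], centers, k=2))  # [0, 2]
```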
A retrieval partition that has not yet been selected is chosen from the K retrieval partitions as the target retrieval partition, and the second similarity between the vector to be queried and each vector in the target retrieval partition is computed. Based on the second similarities, the probability value that the target retrieval partition contains the target vector is determined, where the target vector is a vector whose similarity to the vector to be queried falls within a preset range, for example a vector whose similarity to the vector to be queried is greater than 0.9. If the probability value is greater than a first preset threshold, no further target retrieval partition is selected from the unselected retrieval partitions, and retrieval for this vector to be queried can terminate. If the probability value is not greater than the first preset threshold, the next target retrieval partition is selected from the unselected retrieval partitions, the second similarities between the vector to be queried and the vectors in the newly selected target retrieval partition are computed, the probability value that the newly selected target retrieval partition contains the target vector is determined from these second similarities, and that probability value is again compared with the first preset threshold. These steps are repeated until the probability value of the selected target retrieval partition is greater than the first preset threshold, at which point no further target retrieval partition is selected.
It can be seen that the vector retrieval method provided in the embodiments of the present application increases retrieval speed by inferring, during retrieval, whether retrieval for the current vector to be queried can be terminated early. For example, the second similarities between the vector to be queried and the vectors in the first retrieval partition are computed first, and the probability that the first retrieval partition contains the target vector is determined from them. If the probability is high, the second similarities between the vector to be queried and the vectors in the other retrieval partitions are not computed. This reduces the amount of retrieval computation and increases retrieval speed.
The vector retrieval method provided in the embodiments of the present application is described in detail below through specific steps. As shown in FIG. 6, the method may be executed by the computing device in FIG. 2 above, or by a chip in the computing device, and includes the following steps:
Step 601: obtain the vector to be queried. For example, the vector to be queried may be a vector input by a user to the computing device through a query client, or any vector obtained by the computing device from the vector base library. The embodiments of the present application do not limit this.
Step 602: compute the similarity between the vector to be queried and the partition center vector of each of M cluster partitions to obtain M first similarities. The M cluster partitions are obtained by clustering the vectors in the vector base library according to the similarity between vectors; the partition center vector of any cluster partition is determined from the multiple vectors contained in that cluster partition; and M is an integer greater than 1. From the M first similarities, select the K highest first similarities, and determine the cluster partitions corresponding to these K first similarities as K retrieval partitions, where K is an integer greater than or equal to 1 and less than M.
Step 603: perform the following operations in a loop until the probability value that the target retrieval partition selected from the K retrieval partitions contains the target vector is determined to be greater than a first preset threshold, where the target vector is a vector whose similarity to the vector to be queried is within a preset range:
Select a retrieval partition that has not yet been selected from the K retrieval partitions as the target retrieval partition; compute the second similarity between the vector to be queried and each vector contained in the target retrieval partition; and determine, based on the second similarities, the probability value that the target retrieval partition contains the target vector.
It should be noted that the order in which target retrieval partitions are selected from the K retrieval partitions is not limited here and may be arbitrary. For example, assuming the K retrieval partitions are retrieval partition A, retrieval partition B, and retrieval partition C, any one of them may be selected as the target retrieval partition, say retrieval partition A. Retrieval partition A contains 100 vectors, so 100 second similarities are computed between the vector to be queried and those 100 vectors. The vectors whose second similarity is greater than 0.9 may be taken as target vectors. If, for example, 20 target vectors are identified, the probability value that retrieval partition A contains the target vector is 20/100 = 0.2.
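The probability value in this example is the fraction of vectors in the partition whose second similarity exceeds the target threshold. A small sketch, with a hypothetical function name and the assumption that an empty partition yields a probability of 0:

```python
def partition_probability(second_sims, target_threshold=0.9):
    """Fraction of second similarities above the target threshold."""
    if not second_sims:
        return 0.0  # assumption: an empty partition contains no target vector
    hits = sum(1 for s in second_sims if s > target_threshold)
    return hits / len(second_sims)

# 20 of 100 vectors exceed 0.9, as in the example for retrieval partition A.
sims = [0.95] * 20 + [0.5] * 80
print(partition_probability(sims))  # 0.2
```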
If the determined probability value is greater than the first preset threshold, there is no need to select the next target retrieval partition from the remaining retrieval partitions, and therefore no need to compute the second similarities between the vector to be queried and the vectors in that partition, which saves retrieval computation. For example, with a first preset threshold of 0.18, the probability value of 0.2 determined above for retrieval partition A already exceeds 0.18, so the second similarities between the vector to be queried and the vectors in retrieval partitions B and C need not be computed, saving a substantial amount of computation.
If the determined probability value is not greater than the first preset threshold, the currently selected target retrieval partition contains too few target vectors, and retrieving the vector to be queried from such a partition alone is likely to give low retrieval accuracy. Another target retrieval partition is therefore selected from the remaining unselected retrieval partitions. For example, retrieval partition B is selected next and the steps performed after selecting retrieval partition A are repeated, until the resulting probability value is greater than the first preset threshold, at which point no further retrieval partition is selected.
In this way, the similarity between the vector to be queried and every vector in all K retrieval partitions need not be computed. Instead, only some of the retrieval partitions are selected, and similarities are computed only against the vectors in those partitions. This reduces the amount of retrieval computation and increases retrieval speed.
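The loop of step 603 can be sketched in Python as below. This is a simplified illustration, not the claimed implementation: cosine similarity and the threshold defaults are assumptions, the function names are hypothetical, and partitions are passed as (name, vectors) pairs in the order they should be visited.

```python
import math

def cosine(a, b):
    """Cosine similarity of two non-zero vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def search_with_early_stop(query, retrieval_partitions, first_threshold,
                           target_threshold=0.9):
    """Visit partitions in order and stop once one exceeds first_threshold.

    Returns the names of the visited partitions and every
    (second_similarity, partition_name, vector) computed along the way.
    """
    visited, scored = [], []
    for name, vectors in retrieval_partitions:
        sims = [cosine(query, v) for v in vectors]  # second similarities
        scored.extend((s, name, v) for s, v in zip(sims, vectors))
        visited.append(name)
        hits = sum(s > target_threshold for s in sims)
        probability = hits / len(sims) if sims else 0.0
        if probability > first_threshold:
            break  # early termination: remaining partitions are skipped
    return visited, scored

partitions = [
    ("A", [[1, 0], [0, 1], [0, 1], [0, 1], [0, 1]]),  # probability 0.2
    ("B", [[1, 0.01], [1, 0]]),                       # probability 1.0
    ("C", [[0, 1]]),
]
print(search_with_early_stop([1, 0], partitions, first_threshold=0.18)[0])  # ['A']
print(search_with_early_stop([1, 0], partitions, first_threshold=0.5)[0])   # ['A', 'B']
```

The second similarities are kept in `scored` so that the result step can rank them without recomputing anything.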
Step 604: when the loop in step 603 stops, output a query result based on the at least one retrieval partition already selected and the vector to be queried. Outputting the query result in this way may include, but is not limited to, the following possible approaches:
In one possible approach, the retrieval partition whose probability value is greater than the first preset threshold is identified among the at least one retrieval partition already selected. Because the loop in step 603 terminates precisely when the probability value exceeds the first preset threshold, there is only one such retrieval partition, namely the last target retrieval partition selected; in the example above, it is retrieval partition A. The query result for the vector to be queried is then retrieved from this retrieval partition. For example, all vectors in this retrieval partition may be output or fed back to the user as the query result. As another example, the W vectors corresponding to the W highest second similarities between the vector to be queried and the vectors in this retrieval partition may be output or fed back to the user as the query result.
In the above technical solution, because only one retrieval partition has a probability value greater than the first preset threshold, outputting all vectors contained in that retrieval partition as the query result further reduces the amount of retrieval computation and thus increases retrieval speed. Alternatively, the vectors in that retrieval partition are sorted by their second similarity to the vector to be queried from high to low, and the vectors corresponding to the W highest second similarities are output as the query result, which makes the query result more concise.
In another possible approach, the query result for the vector to be queried is retrieved from the vectors contained in all of the at least one retrieval partition already selected. In step 603, if the probability value of the first target retrieval partition selected is not greater than the first preset threshold, a second target retrieval partition may be selected; if the probability value of the second target retrieval partition is greater than the first preset threshold, no further target retrieval partition is selected. The number of retrieval partitions already selected may therefore be greater than 1. In the previous example, both retrieval partition A and retrieval partition B may have been selected before a retrieval partition whose probability value exceeds the first preset threshold was found; the query result can then be retrieved from the vectors contained in retrieval partition A and retrieval partition B. For example, all vectors contained in the retrieval partitions already selected may be output or fed back to the user as the query result. As another example, the W vectors corresponding to the W highest second similarities between the vector to be queried and the vectors in the retrieval partitions already selected may be output or fed back to the user as the query result.
For example, in step 603, retrieval partition A is selected as the first target retrieval partition. The second similarities between the vector to be queried and the vectors in retrieval partition A are computed, and the probability value determined from them is not greater than the first preset threshold, so retrieval partition B is selected as the next target retrieval partition. The second similarities between the vector to be queried and the vectors in retrieval partition B are computed, and the probability value determined from them is greater than the first preset threshold, so no further target retrieval partition is selected. In step 604, the retrieval partitions already selected are therefore retrieval partition A and retrieval partition B. Because the second similarities for both partitions were already computed in step 603, they need not be recomputed in step 604, so the amount of computation does not increase. The second similarities for retrieval partition A and retrieval partition B are simply sorted together from high to low, and the W vectors corresponding to the W highest second similarities are taken as the query result.
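Ranking the cached second similarities from partitions A and B can be sketched as follows. The function name, the identifiers such as "A1", and the similarity values are made up for illustration only.

```python
import heapq

def top_w(scored, w):
    """Return the ids of the w entries with the highest second similarity.

    scored: iterable of (second_similarity, vector_id) pairs that were
    already computed while visiting the retrieval partitions.
    """
    best = heapq.nlargest(w, scored, key=lambda pair: pair[0])
    return [vector_id for _, vector_id in best]

# Hypothetical cached similarities from retrieval partitions A and B.
cached = [(0.62, "A1"), (0.91, "A2"), (0.95, "B1"), (0.40, "B2")]
print(top_w(cached, 2))  # ['B1', 'A2']
```

No similarity is recomputed here; the function only orders values that the earlier loop already produced.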
In the above technical solution, the query result is output based on all of the retrieval partitions already selected, rather than only on the retrieval partition whose probability value is greater than the first preset threshold. A selected retrieval partition whose probability value is not greater than the first preset threshold may still contain vectors that are highly similar to the vector to be queried. Therefore, based on the second similarities already computed between the vector to be queried and the vectors in every selected retrieval partition, more results and more accurate results can be output, which improves the accuracy of vector retrieval.
In a possible implementation, the target retrieval partition need not be selected arbitrarily from the K retrieval partitions; it may instead be selected according to a rule. Two methods for ordering the selection among the K retrieval partitions are described below.
Method 1: sort the K retrieval partitions in descending order of the K first similarities. The target retrieval partition is then chosen by taking, in order, the next retrieval partition in the sorted list that has not yet been selected.
For example, if the first similarity between the query vector and the partition center vector of retrieval partition A is 0.9, the first similarity for retrieval partition B is 0.8, and the first similarity for retrieval partition C is 0.7, then the K retrieval partitions are sorted in the order retrieval partition A, retrieval partition B, retrieval partition C. Target retrieval partitions are selected in this same order.
Sorting the K retrieval partitions of each query vector in a reasonable order, and selecting the target retrieval partition in that order, makes it more likely that a retrieval partition whose probability value is greater than the first preset threshold is found early, which speeds up vector retrieval and reduces retrieval time. In the above example, if the probability value of retrieval partition A is calculated first, it is likely to exceed the first preset threshold, so retrieval can terminate early. If retrieval partition B is processed first instead, its probability value is likely not to exceed the first preset threshold, and the similarities between the query vector and the vectors in retrieval partition A must then also be calculated, which increases both the amount of computation and the retrieval time.
Selecting the target retrieval partition in descending order of the K first similarities thus allows a target retrieval partition whose probability value is greater than the first preset threshold to be identified as early as possible. This minimizes the chance that another target retrieval partition has to be selected, avoids calculating similarities between the query vector and the vectors in such an additional partition, and therefore reduces computation and increases retrieval speed.
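Method 1 amounts to a single descending sort on the first similarities. A minimal sketch, in which the function name and the dictionary layout are illustrative assumptions:

```python
from typing import Dict, List


def order_by_first_similarity(first_similarities: Dict[str, float]) -> List[str]:
    """Return partition names sorted by first similarity, highest first."""
    return sorted(first_similarities, key=first_similarities.get, reverse=True)
```

With the values from the example above (A: 0.9, B: 0.8, C: 0.7), the returned order is A, B, C regardless of the order in which the partitions are supplied.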
Method 2: cluster the vectors in each retrieval partition according to the similarity between the vectors to obtain multiple retrieval sub-partitions, each with a corresponding sub-partition center vector. Calculate the third similarities between the query vector and the sub-partition center vectors of the retrieval sub-partitions, and sort the K retrieval partitions according to the third similarities between the query vector and the sub-partition center vectors within each retrieval partition. The target retrieval partition is then chosen by taking, in order, the next retrieval partition in the sorted list that has not yet been selected.
FIG. 7 is a schematic diagram of dividing a retrieval partition into retrieval sub-partitions according to an embodiment of this application. The figure shows three retrieval partitions. For any retrieval partition, the vectors in it are clustered; in this example, each retrieval partition is divided into 5 retrieval sub-partitions. Different retrieval partitions may of course be divided into different numbers of retrieval sub-partitions. In FIG. 7, retrieval partitions are separated by solid lines and retrieval sub-partitions by dashed lines. A five-pointed star denotes the partition center vector of a retrieval partition, and a triangle denotes the sub-partition center vector of a retrieval sub-partition. The embodiments of this application do not limit how the vectors in a retrieval partition are clustered; the method used to cluster the vectors in the vector base library into multiple cluster partitions may serve as a reference.
When sorting the retrieval partitions, the K retrieval partitions may be sorted according to the maximum of the third similarities between the query vector and the sub-partition center vectors in each retrieval partition. For example, FIG. 8 illustrates the first similarity between the query vector and the partition center vector of each retrieval partition, and the third similarity between the query vector and the sub-partition center vector of each retrieval sub-partition. FIG. 8 shows retrieval partition A, retrieval partition B, and retrieval partition C, each divided into 5 retrieval sub-partitions. The 5 third similarities between the query vector and the sub-partition center vectors of the 5 retrieval sub-partitions in retrieval partition A are calculated, and the maximum is selected; the same is done for retrieval partition B and for retrieval partition C. The 3 maximum similarities are then sorted from high to low, which gives the order of the 3 retrieval partitions.
Alternatively, the K retrieval partitions may be sorted according to the number of third similarities, among those between the query vector and the sub-partition center vectors in each retrieval partition, that exceed a second preset threshold. For example, in FIG. 8, the 5 third similarities between the query vector and the sub-partition center vectors of the 5 retrieval sub-partitions in retrieval partition A are calculated, and the number x1 of third similarities exceeding the second preset threshold is determined. The numbers x2 and x3 are determined in the same way for retrieval partition B and retrieval partition C. Sorting x1, x2, and x3 from high to low gives the order of the 3 retrieval partitions.
Because a sub-partition center vector represents the vectors in its retrieval sub-partition more accurately, sorting the K retrieval partitions based on the third similarities between the query vector and the sub-partition center vectors in each retrieval partition improves the accuracy of the ordering. A target retrieval partition whose probability value is greater than the first preset threshold can thus be identified as early as possible. This minimizes the chance that another target retrieval partition has to be selected, avoids calculating similarities between the query vector and the vectors in such an additional partition, and therefore also reduces computation and increases retrieval speed.
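The two ordering criteria of Method 2 can be sketched as follows. This is an illustrative sketch only: the third similarities are assumed to be precomputed and grouped per retrieval partition, and the function names are hypothetical.

```python
from typing import Dict, List


def order_by_max_third_similarity(third: Dict[str, List[float]]) -> List[str]:
    """Sort partitions by the largest third similarity of their sub-partitions."""
    return sorted(third, key=lambda name: max(third[name]), reverse=True)


def order_by_count_above(third: Dict[str, List[float]], second_threshold: float) -> List[str]:
    """Sort partitions by how many third similarities exceed the second preset threshold."""

    def count(name: str) -> int:
        return sum(1 for s in third[name] if s > second_threshold)

    return sorted(third, key=count, reverse=True)
```

The two criteria can produce different orders for the same data: a partition with one very close sub-partition center ranks first under the maximum criterion, while a partition with several moderately close sub-partition centers ranks first under the count criterion.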
In FIG. 8, a five-pointed star denotes the partition center vector of a retrieval partition, a triangle denotes the sub-partition center vector of a retrieval sub-partition, and the square denotes the query vector. FIG. 8 shows how different sorting methods affect the order of the K retrieval partitions. When the 3 retrieval partitions are sorted in descending order of the 3 first similarities, the order is retrieval partition A, retrieval partition B, retrieval partition C. FIG. 8 shows the 3 first similarities as the distances from the square to the 3 five-pointed stars; the shorter the distance, the higher the similarity.
When the 3 retrieval partitions are sorted according to the maximum of the third similarities between the query vector and the sub-partition center vectors in each retrieval partition, the order is retrieval partition B, retrieval partition A, retrieval partition C. FIG. 8 shows, for each retrieval partition, the maximum of the third similarities between the query vector and its 5 sub-partition center vectors as the distances from the square to 3 of the triangles; the shorter the distance, the higher the similarity.
Dividing retrieval partitions into finer retrieval sub-partitions can therefore optimize and correct the order of the retrieval partitions. In a specific implementation, the retrieval sub-partitions may be divided further, for example each retrieval sub-partition into multiple smaller partitions, which can further improve retrieval accuracy and speed. This is not described again here.
In a possible implementation, the probability value that the target retrieval partition contains the target vector may also be predicted from the second similarities by a prediction model. The prediction model may be a single-stage model or a two-stage model.
One possible way to train the prediction model is to use a large amount of labeled sample data. For example, for any piece of sample data, features are extracted from the sample data to obtain a sample vector. The M first similarities between the sample vector and the M cluster partitions are calculated, and K retrieval partitions are determined according to the magnitudes of the M first similarities. A target retrieval partition is selected from the K retrieval partitions, the second similarities between the sample vector and each vector in the target retrieval partition are calculated, and t target second similarities are selected from them. The sample vector, the K first similarities between the sample vector and the K retrieval partitions, the t target second similarities, and the label are input into the prediction model, where the label is the probability value that the target retrieval partition contains the target vector. Through repeated training, the parameters of the prediction model are optimized and adjusted.
Another possible way to train the prediction model is to use a large amount of sample data and adjust the parameters of the prediction model according to an objective function. The embodiments of this application do not limit the form of the objective function. Through repeated training, the parameters of the prediction model are optimized and adjusted.
If a single-stage model is used, the method in step 603 of obtaining the probability value from the second similarities can be refined further. FIG. 9 shows an example of such a method, which may include the following steps:
Step 901: among the K retrieval partitions, select any retrieval partition that has not yet been selected as the target retrieval partition, and calculate the second similarities between the query vector and each vector in the target retrieval partition. Among these second similarities, determine the t target second similarities that rank highest when sorted from high to low.
The method for determining the target retrieval partition for the query vector is the same as described above and is not repeated here.
The method for determining the t target second similarities for the query vector is described below.
As one example, all of the second similarities between the query vector and the vectors in the target retrieval partition are used as target second similarities. As another example, the second similarities that satisfy a certain threshold are used as target second similarities. As another example, the second similarities are sorted from high to low and the top t are used as target second similarities. As yet another example, only the largest second similarity is used as the target second similarity. These are examples only; the embodiments of this application do not limit how the target second similarities are determined. The fewer target second similarities there are, the smaller the computational load of the prediction model and the faster the retrieval. For instance, using only the largest of the second similarities between the query vector and all vectors in the target retrieval partition as the target second similarity reduces the computing power consumed by the prediction model without affecting the accuracy of the final probability value, and further increases the speed of vector retrieval.
For example, for query vector q1, the 100 second similarities between q1 and the 100 vectors in retrieval partition A are calculated, and the largest of the 100 second similarities is used as the target second similarity.
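The selection options listed above can be combined in one small helper. This is a sketch under assumptions: the function and parameter names are hypothetical, and "satisfying a threshold" is interpreted here as being greater than or equal to it.

```python
from typing import List, Optional


def select_target_similarities(
    second: List[float],
    t: Optional[int] = None,
    min_similarity: Optional[float] = None,
) -> List[float]:
    """Pick target second similarities from all second similarities.

    With no options, all similarities are kept (sorted high to low).
    `min_similarity` keeps only values >= that threshold (assumed interpretation).
    `t` keeps only the top t values; t=1 keeps just the largest one.
    """
    ranked = sorted(second, reverse=True)
    if min_similarity is not None:
        ranked = [s for s in ranked if s >= min_similarity]
    if t is not None:
        ranked = ranked[:t]
    return ranked
```

Calling the helper with `t=1` corresponds to the q1 example above, where only the largest of the 100 second similarities is passed on to the prediction model.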
Step 902: input the query vector, the K first similarities, and the t target second similarities into the prediction model to obtain the probability value.
Predicting the probability value with a prediction model improves the accuracy with which the probability value is determined and, compared with methods that do not use a model, determines it faster. Selecting only the t highest-ranked second similarities as target second similarities and inputting these into the prediction model also reduces the computational load of the model, which further speeds up prediction of the probability value without affecting prediction accuracy.
For example, the vectors in the vector base library are clustered into 10 cluster partitions, each with a corresponding partition center vector. The first similarities between query vector q1 and the partition center vectors of the 10 cluster partitions are calculated, and either the K cluster partitions with the highest first similarities or the K cluster partitions whose first similarities satisfy a preset threshold are selected as retrieval partitions. Suppose 3 retrieval partitions are determined: retrieval partition A, retrieval partition B, and retrieval partition C. The 3 first similarities between q1 and the partition center vectors of these 3 retrieval partitions are then determined. Retrieval partition A is chosen as the target retrieval partition, the 10 second similarities between q1 and the 10 vectors in retrieval partition A are calculated, and 5 target second similarities are selected from them. Query vector q1, the 3 first similarities, and the 5 target second similarities are input into the single-stage model, which outputs a probability value. The probability value represents the probability that retrieval partition A contains the target vector, and it reflects the accuracy of the current retrieval: a high probability value indicates high retrieval accuracy, and a low probability value indicates low retrieval accuracy. If the probability value satisfies the first preset threshold, for example a probability value of 0.98 against a first preset threshold of 0.9, retrieval partition A already contains most of the target vectors, retrieval accuracy is high, and retrieval for the query vector can be terminated.
By inferring whether retrieval for the current query vector can be terminated early, the above approach reduces the likelihood that similarities must still be calculated between the query vector and the vectors in the remaining retrieval partitions, which lowers the retrieval computation and increases the speed of vector retrieval.
The single-stage model has a drawback, however: when the method of the above embodiment runs on a hardware accelerator, it can only use matrix-vector multiplication and cannot fully exploit the accelerator's computing power. Specifically, the input of the single-stage model is the query vector, the K first similarities, and the t target second similarities. Different query vectors determine different retrieval partitions and therefore different target retrieval partitions, so the second similarities between each query vector and the vectors in its target retrieval partition are calculated separately and input into the single-stage model separately. The single-stage model can therefore only use the accelerator's matrix-vector multiplication.

For example, the retrieval partitions determined by query vector q1 are retrieval partitions A, B, and C, with retrieval partition A as the target retrieval partition; the retrieval partitions determined by query vector q2 are retrieval partitions D, E, and F, with retrieval partition D as the target retrieval partition. The vectors in retrieval partition A and retrieval partition D are different, so the hardware accelerator must first use matrix-vector multiplication to calculate the second similarities between q1 and the vectors in retrieval partition A, and then use matrix-vector multiplication again to calculate the second similarities between q2 and the vectors in retrieval partition D. The probability values for q1 and q2 can likewise only be output by the single-stage model one at a time, each computed by matrix-vector multiplication.

Concretely, for query vector q1, the inputs to the single-stage model are q1, the 3 first similarities between q1 and the partition center vectors of its 3 retrieval partitions, and the t target second similarities between q1 and the vectors in retrieval partition A; the single-stage model uses the accelerator's matrix-vector multiplication to output the probability value for q1. Afterwards, q2, the 3 first similarities between q2 and the partition center vectors of its 3 retrieval partitions, and the t target second similarities between q2 and the vectors in retrieval partition D are input, and the single-stage model again uses matrix-vector multiplication to output the probability value for q2.
As can be seen, the single-stage model can predict for only one query vector at a time, so only the hardware accelerator's matrix-vector multiplication can be used. Matrix-vector multiplication on a hardware accelerator is far less efficient than matrix-matrix multiplication, so the accelerator's computing power is wasted. In addition, because the single-stage model handles only one query vector at a time, the prediction time grows further when there are multiple query vectors. A two-stage model can overcome these problems to some extent and further increase retrieval speed beyond that of the single-stage model.
If a two-stage model is used, the number of query vectors in step 601 may be N, where N is a positive integer greater than 1. When N is greater than 1, the advantages of the vector retrieval method provided in the embodiments of this application can be fully exploited to increase retrieval speed and reduce retrieval time. The embodiments of this application do not limit how the N query vectors are obtained. For example, in a batch query, multiple query vectors are obtained at once and used together as input. As another example, query vectors are obtained one at a time (such as user input in an Internet application), and the computing device combines the successively obtained query vectors into a batch of multiple query vectors as input. The combination may be done in any manner known to those skilled in the art, which is not limited in the embodiments of this application.
In step 602, the M first similarities are obtained as follows: based on the N query vectors and the partition center vectors of the M cluster partitions, matrix-matrix multiplication on a hardware accelerator is used to obtain, for each query vector, the M first similarities between that query vector and the M partition center vectors. Among the M first similarities, the K highest-ranked first similarities are selected, and the cluster partitions corresponding to these K first similarities are determined as the K retrieval partitions, where K is an integer greater than or equal to 1 and K is less than M.
The vectors in the vector base library are clustered according to the similarity between vectors to obtain M cluster partitions. For every query vector, the M first similarities to the partition center vectors of these M cluster partitions must be calculated. To speed up this calculation, the N query vectors can be arranged into one matrix and the M partition center vectors into another, and the two can be multiplied using the hardware accelerator's matrix-matrix multiplication. In this way, the M first similarities between each of the N query vectors and the M partition center vectors are obtained quickly.
For example, the query vectors are q1 and q2, forming the matrix [q1, q2]; the M partition center vectors are m1, m2, m3, m4, m5, m6, m7, m8, m9, and m10, forming the matrix [m1, m2, m3, m4, m5, m6, m7, m8, m9, m10]. FIG. 11a shows the matrix of first similarities between each query vector and the M partition center vectors obtained by matrix-matrix multiplication on the hardware accelerator, where s11 denotes the first similarity between q1 and m1, s12 denotes the first similarity between q1 and m2, and so on.
For any query vector, based on the M first similarities between that query vector and the M partition center vectors, the cluster partitions corresponding to the K highest-ranked first similarities are used as the K retrieval partitions of that query vector. For example, 3 retrieval partitions are determined for each query vector: the retrieval partitions determined by query vector q1 are retrieval partitions A, B, and C, and the retrieval partitions determined by query vector q2 are retrieval partitions D, E, and F.
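The batched form of step 602 can be sketched as a single matrix product followed by a per-row top-K selection. This is an illustration under assumptions: the inner product is assumed as the similarity measure, and the pure-Python product below only stands in for the matrix-matrix multiplication that a hardware accelerator would perform in one call.

```python
from typing import List

Matrix = List[List[float]]


def first_similarity_matrix(queries: Matrix, centers: Matrix) -> Matrix:
    """Compute queries @ centers^T: entry [i][j] is the similarity s_ij
    between query vector i and partition center vector j."""
    return [[sum(q * c for q, c in zip(qv, cv)) for cv in centers] for qv in queries]


def top_k_partitions(similarity_rows: Matrix, k: int) -> List[List[int]]:
    """For each query (row), return the indices of the k most similar centers."""
    return [
        sorted(range(len(row)), key=row.__getitem__, reverse=True)[:k]
        for row in similarity_rows
    ]
```

Each row of the result of `first_similarity_matrix` corresponds to one row of the matrix in FIG. 11a, and each query vector may end up with a different set of K retrieval partitions.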
The method in step 603 of obtaining the probability value from the second similarities can be refined further. FIG. 10 shows an example of such a method, which may include the following steps:
Step 1001: input the matrix formed by the N query vectors and the matrix formed by the K first similarities of each of the N query vectors into a first prediction model, and use matrix-matrix multiplication on a hardware accelerator to obtain N initial probability values corresponding to the N query vectors. The initial probability value of a query vector represents the probability that the K retrieval partitions of that query vector contain its target vector.
For example, the query vectors are q1 and q2, forming the matrix [q1, q2]. The 3 first similarities for the 3 retrieval partitions of q1 are s11, s12, and s13, corresponding to retrieval partitions A, B, and C respectively; the 3 first similarities for the 3 retrieval partitions of q2 are s24, s25, and s26, corresponding to retrieval partitions D, E, and F respectively. FIG. 11b shows the matrix formed by the 3 first similarities of each of the 2 query vectors. In this matrix, it does not matter which retrieval partitions each query vector has, because the first prediction model only needs each query vector and its 3 first similarities to calculate the initial probability value.
例如,为待查询向量q1和待查询向量q2分别生成2个初始概率值p11和p12,其中p11表征检索分区A、检索分区B和检索分区C中包含待查询向量q1的目标向量的概率。其中p12表征检索分区D、检索分区E和检索分区F中包含待查询向量q2的目标向量的概率。以上仅为示例。For example, two initial probability values p11 and p12 are generated for the query vector q1 and the query vector q2, respectively, where p11 represents the probability of the target vector containing the query vector q1 in the retrieval partition A, the retrieval partition B, and the retrieval partition C. Wherein p12 represents the probability of the target vector containing the query vector q2 in the retrieval partition D, the retrieval partition E, and the retrieval partition F. The above is only an example.
可以看出,由于第一预测模型的输入为N个待查询向量和N个待查询向量中每个待查询向量对应的K个第一相似度,这些特征均能够以矩阵形式输入,因此能够采用硬件加速器的矩阵乘矩阵的方式得到N个待查询向量对应的N个初始概率值,如此,充分发挥了硬件加速器的算力,相比较于单阶段模型而言,可以进一步提高计算效率,提高向量检索的速度,降低检索耗时。It can be seen that since the input of the first prediction model is N query vectors and K first similarities corresponding to each of the N query vectors, these features can be input in matrix form. Therefore, the matrix multiplication of the hardware accelerator can be used to obtain N initial probability values corresponding to the N query vectors. In this way, the computing power of the hardware accelerator is fully utilized. Compared with the single-stage model, the computing efficiency can be further improved, the speed of vector retrieval can be increased, and the retrieval time can be reduced.
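By way of non-limiting illustration, step 1001 may be sketched as follows. The sketch assumes that the first prediction model is a single linear layer followed by a sigmoid, applied to each query vector concatenated with its K first similarities; the model structure, the function names and the weight values are hypothetical and are not specified in this application. The point of the sketch is only that all N queries pass through one batched matrix-matrix product.

```python
import math

def matmul(a, b):
    """Plain matrix-matrix product of two row-major lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def first_stage(queries, first_sims, weights, bias):
    """Hypothetical first prediction model: one linear layer plus a sigmoid.

    queries:    N x d matrix, one query vector per row
    first_sims: N x K matrix, the K first similarities of each query
    weights:    (d + K) x 1 matrix (assumed to be learned offline)
    Returns N initial probability values, one per query.
    """
    features = [q + s for q, s in zip(queries, first_sims)]  # N x (d + K)
    logits = matmul(features, weights)                       # one batched product
    return [1.0 / (1.0 + math.exp(-(row[0] + bias))) for row in logits]

queries = [[0.1, 0.9], [0.8, 0.2]]               # q1, q2
first_sims = [[0.9, 0.8, 0.7], [0.6, 0.5, 0.4]]  # s11..s13 and s24..s26
weights = [[0.5], [0.5], [1.0], [1.0], [1.0]]    # illustrative values only
initial_probs = first_stage(queries, first_sims, weights, bias=-2.0)  # p11, p12
```

On a hardware accelerator the same batched product would typically be issued as a single matrix-matrix kernel rather than as the nested Python loops shown here.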
Step 1002: for any query vector, select, from the K retrieval partitions, a retrieval partition that has not yet been selected as the target retrieval partition; determine the second similarities between the query vector and each vector in the target retrieval partition; and, among these second similarities, determine the t target second similarities ranked highest in descending order.
For example, for query vector q1, the target retrieval partition is retrieval partition A; the 100 second similarities between q1 and the 100 vectors in retrieval partition A are calculated, and the largest of the 100 second similarities is taken as the target second similarity. For query vector q2, the target retrieval partition is retrieval partition D; the 200 second similarities between q2 and the 200 vectors in retrieval partition D are calculated, and the largest of the 200 second similarities is taken as the target second similarity.
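As a minimal sketch of step 1002, and assuming that the second similarity is an inner product (the application does not fix the similarity measure at this point), the t target second similarities may be selected as follows. The function names and the sample vectors are hypothetical.

```python
import heapq

def inner_product(a, b):
    """Assumed similarity measure; other measures could be substituted."""
    return sum(x * y for x, y in zip(a, b))

def top_t_second_similarities(query, partition_vectors, t=1):
    """Return the t largest second similarities, highest first."""
    sims = [inner_product(query, v) for v in partition_vectors]
    return heapq.nlargest(t, sims)

partition_a = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]  # vectors of one target partition
q1 = [0.6, 0.8]
target_sims = top_t_second_similarities(q1, partition_a, t=2)
```

With t = 1 the function returns only the maximum second similarity, which corresponds to the example given above.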
Step 1003: for any query vector, input the initial probability value corresponding to the query vector and the t target second similarities corresponding to the query vector into a second prediction model to obtain a final probability value corresponding to the query vector.
Because each query vector has a different target retrieval partition, the t target second similarities for different query vectors cannot be obtained in a single batch; they are calculated separately, as described in step 1002. Step 1003 is therefore also performed separately for each query vector, using the matrix-vector multiplication of the hardware accelerator.
For example, for query vector q1, its initial probability value p11 and its target second similarity are input into the second prediction model, and the final probability value p21 is obtained using the matrix-vector multiplication of the hardware accelerator. p21 reflects the probability that retrieval partition A contains the target vector of q1.
For query vector q2, its initial probability value p12 and its target second similarity are input into the second prediction model, and the final probability value p22 is obtained using the matrix-vector multiplication of the hardware accelerator. p22 reflects the probability that retrieval partition D contains the target vector of q2.
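Step 1003 may be illustrated, purely as a sketch, by a second prediction model consisting of one hidden layer computed with a matrix-vector product, followed by a sigmoid output. The layer sizes, the ReLU activation and all weight values are assumptions made for the illustration and do not appear in this application.

```python
import math

def matvec(m, v):
    """Matrix-vector product for a row-major list of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def second_stage(prev_prob, target_sims, hidden_w, out_w, out_b):
    """Hypothetical second prediction model for a single query vector.

    prev_prob:   initial (or previously updated) probability value
    target_sims: the t target second similarities of the current partition
    """
    x = [prev_prob] + list(target_sims)                    # per-query feature vector
    hidden = [max(0.0, h) for h in matvec(hidden_w, x)]    # matrix-vector product + ReLU
    logit = sum(w * h for w, h in zip(out_w, hidden)) + out_b
    return 1.0 / (1.0 + math.exp(-logit))

hidden_w = [[1.0, 1.0], [0.5, 2.0]]   # illustrative weights for t = 1
out_w, out_b = [1.0, 1.0], -3.0
p21 = second_stage(0.71, [0.95], hidden_w, out_w, out_b)
```

The same function can be reused for the update described later, by passing the previous final probability value as `prev_prob`.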
In the above technical solution, the prediction of the probability value is divided into two stages: the first stage uses the first prediction model, and the second stage uses the second prediction model. Specifically, the matrix formed by the N query vectors and the matrix formed by the K first similarities corresponding to each of the N query vectors are input into the first prediction model, so that the first prediction model can predict the initial probability values by matrix-matrix multiplication. This makes full use of the available computing power, improves computing efficiency, and further increases the speed of vector retrieval.
If the two-stage model is used, the step of evaluating the probability value after step 1003 can also be further refined. FIG. 12 shows an example of a method for evaluating the probability value, which may include the following steps:
Step 1201: if the final probability value is not greater than the first preset threshold, select the next retrieval partition that has not yet been selected from the K retrieval partitions as the target retrieval partition.
For example, if the first similarities between query vector q1 and retrieval partitions A, B and C are 0.9, 0.8 and 0.7 respectively, the target retrieval partition of q1 is retrieval partition A, and the next target retrieval partition is retrieval partition B.
Step 1202: input the final probability value corresponding to the query vector, together with the target second similarity between the query vector and the vectors in the next target retrieval partition, into the second prediction model, and obtain an updated probability value corresponding to the query vector using the matrix-vector multiplication of the hardware accelerator.
The target second similarity is determined here in the same way as it is determined for the target retrieval partition above, and the details are not repeated.
For example, the second similarities between query vector q1 and each vector in retrieval partition B are calculated, and the largest of them is taken as the target second similarity. The final probability value p21 corresponding to q1 and this target second similarity are input into the second prediction model, and the updated probability value corresponding to q1 is obtained using the matrix-vector multiplication of the hardware accelerator. The updated probability value represents the probability that the target vector is contained in all target retrieval partitions selected so far; in this example, it represents the probability that retrieval partitions A and B contain the target vector.
Step 1203: if the updated probability value is not greater than the first preset threshold, replace the final probability value used in step 1202 with the updated probability value, and return to the operation in step 1201 of selecting the next unselected retrieval partition from the K retrieval partitions as the target retrieval partition.
If the updated probability value is not greater than the first preset threshold, the probability that the target retrieval partitions selected so far contain the target vector is low and the retrieval accuracy is insufficient, so retrieval should continue.
If the updated probability value is greater than the first preset threshold, the probability that the target retrieval partitions selected so far contain the target vector is high and the retrieval accuracy is sufficient, so retrieval should be terminated. Retrieval should likewise be terminated once all K retrieval partitions have been visited, which saves computing power.
The final probability value is fed back into the second prediction model to obtain an updated probability value. If the updated probability value is not greater than the first preset threshold, the final probability value is replaced with the updated probability value, and the cycle repeats to decide whether to terminate retrieval. This improves the accuracy of the termination decision and can improve the accuracy of vector retrieval.
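The loop of steps 1201 to 1203 may be sketched as follows. The sketch assumes that the retrieval partitions are already ordered for visiting and that the second prediction model is available as a callable; `score_partition`, `update_prob` and the stand-in model (a simple maximum) are hypothetical placeholders used only to make the example executable.

```python
def search_with_early_stop(partitions, initial_prob, score_partition,
                           update_prob, threshold):
    """Visit partitions in order until the probability exceeds the threshold.

    partitions:      the K retrieval partitions in visiting order
    score_partition: returns the target second similarity of a partition
    update_prob:     stand-in for the second prediction model
    Returns the partitions actually visited and the last probability value.
    """
    prob = initial_prob
    visited = []
    for part in partitions:                   # stops anyway after all K partitions
        visited.append(part)
        target_sim = score_partition(part)
        prob = update_prob(prob, target_sim)  # final, then updated, probability
        if prob > threshold:                  # first preset threshold
            break
    return visited, prob

best_sim = {"A": 0.58, "B": 0.97, "C": 0.40}  # illustrative target second similarities
visited, prob = search_with_early_stop(
    ["A", "B", "C"], 0.30, best_sim.get,
    lambda p, s: max(p, s),                   # placeholder, not a trained model
    threshold=0.90,
)
```

In this illustration partition C is never scanned, because the probability value exceeds the threshold after partition B.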
For ease of understanding, the vector retrieval method provided in the embodiments of this application is described below as a whole by means of a specific embodiment. FIG. 13 is a schematic overall flowchart of a vector retrieval method provided in an embodiment of this application, which may include the following steps.
Step 1301: obtain query vectors q1 and q2.
Step 1302: multiply the matrix formed by query vectors q1 and q2 with the matrix formed by the partition center vectors of the 10 cluster partitions, using the matrix-matrix multiplication of the hardware accelerator, to obtain, for each query vector, 10 first similarities to the 10 partition center vectors.
Step 1303: among the 10 first similarities corresponding to q1, determine the three retrieval partitions corresponding to the three highest first similarities; among the 10 first similarities corresponding to q2, determine the three retrieval partitions corresponding to the three highest first similarities.
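Steps 1302 and 1303 may be illustrated by the following sketch, which assumes an inner-product similarity and uses four hypothetical partition center vectors instead of ten to keep the example short. The function names are illustrative only.

```python
def similarity_matrix(queries, centers):
    """N x d queries times (M x d centers) transposed: N x M first similarities."""
    return [[sum(x * y for x, y in zip(q, c)) for c in centers] for q in queries]

def select_retrieval_partitions(queries, centers, k):
    """For each query, return the k (partition index, first similarity) pairs
    with the highest first similarity, in descending order."""
    selected = []
    for row in similarity_matrix(queries, centers):
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        selected.append([(j, row[j]) for j in order])
    return selected

centers = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]  # M = 4 partition centers
queries = [[1.0, 0.1], [0.0, 1.0]]                           # q1, q2
result = select_retrieval_partitions(queries, centers, k=2)
```

Because the returned pairs are already in descending order of first similarity, they can be visited in that order when target retrieval partitions are selected.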
For example, the three retrieval partitions corresponding to query vector q1 are retrieval partitions A, B and C, and the three retrieval partitions corresponding to query vector q2 are retrieval partitions D, E and F.
Step 1304: input the matrix formed by q1 and q2, and the matrix formed by the three first similarities corresponding to each of q1 and q2, into the first prediction model.
Step 1305: in the first prediction model, obtain the initial probability values corresponding to q1 and q2 respectively, using the matrix-matrix multiplication of the hardware accelerator.
Step 1306: for query vector q1, sort the three retrieval partitions by first similarity.
For example, the first similarities corresponding to retrieval partitions A, B and C are 0.9, 0.8 and 0.7 respectively.
Step 1307: determine the retrieval partition with the largest first similarity as the target retrieval partition of q1. For example, retrieval partition A is determined as the i-th target retrieval partition of q1.
Step 1308: calculate the second similarities between q1 and each vector in the target retrieval partition, and take the maximum second similarity as the target second similarity of the target retrieval partition.
Step 1309: input the initial probability value corresponding to q1 and the target second similarity into the second prediction model, and obtain the final probability value corresponding to q1 using the matrix-vector multiplication of the hardware accelerator.
Step 1310: determine whether the final probability value is greater than the first preset threshold. If so, go to step 1311; if not, go to step 1312.
Step 1311: terminate retrieval for q1, and return, as the query result, the vectors in all target retrieval partitions selected so far that satisfy the similarity requirement with respect to the query vector. For example, if the final probability value corresponding to retrieval partition A is 0.98, which is greater than the first preset threshold, the vectors in retrieval partition A corresponding to the W highest second similarities with the query vector are returned as the query result.
Step 1312: select the next unselected retrieval partition among the three retrieval partitions as the target retrieval partition.
For example, if the final probability value corresponding to retrieval partition A is 0.58, which is not greater than the first preset threshold, retrieval partition B is selected as the target retrieval partition.
Step 1313: calculate the second similarities between q1 and each vector in the next target retrieval partition, and take the maximum second similarity as the target second similarity of the next target retrieval partition.
Step 1314: input the final probability value and the target second similarity of the next target retrieval partition into the second prediction model, and obtain the updated probability value corresponding to q1 using the matrix-vector multiplication of the hardware accelerator.
Step 1315: determine whether the updated probability value is greater than the first preset threshold. If so, go to step 1311; if not, go to step 1316.
Step 1316: replace the final probability value used in step 1314 with the updated probability value, and return to step 1312.
For query vector q2, the query result is determined in the same way as for q1 in steps 1306 to 1316 above, and the details are not repeated here.
It should be noted that the steps in the above method embodiments are described as being performed by the computing device 20 by way of example; the steps may also be performed by the processor 201 in the computing device 20.
Based on the above content and the same technical concept, an embodiment of this application provides a vector retrieval apparatus. As shown in FIG. 14, the vector retrieval apparatus includes an acquisition unit 1401 and a processing unit 1402. The vector retrieval apparatus is configured to perform the method embodiments shown in FIG. 5a, FIG. 6, FIG. 9, FIG. 10, FIG. 12 or FIG. 13 above.
When the vector retrieval apparatus is used to implement the functions of the method embodiment shown in FIG. 13, the acquisition unit 1401 is configured to obtain a query vector, and the processing unit 1402 is configured to: calculate the similarity between the query vector and the partition center vector of each of M cluster partitions to obtain M first similarities, where the M cluster partitions are obtained by clustering the vectors in a vector base library according to the similarity between the vectors, the partition center vector of any cluster partition is determined from the vectors contained in that cluster partition, and M is an integer greater than 1; select, among the M first similarities, the K first similarities ranked highest in descending order, and determine the cluster partitions corresponding to the K first similarities as K retrieval partitions, where K is an integer greater than or equal to 1 and K is less than M; perform the following operations in a loop until it is determined that the probability value that the target retrieval partition selected from the K retrieval partitions contains the target vector is greater than a first preset threshold, the target vector being a vector whose similarity to the query vector is within a preset range: select, from the K retrieval partitions, a retrieval partition that has not yet been selected as the target retrieval partition; calculate the second similarities between the query vector and each vector contained in the target retrieval partition; and determine, from the second similarities, the probability value that the target retrieval partition contains the target vector; and output a query result based on the at least one retrieval partition already selected and the query vector.
In a possible implementation, when outputting the query result based on the at least one retrieval partition already selected and the query vector, the processing unit 1402 is specifically configured to: output, as the query result, the vectors contained in those of the selected retrieval partitions whose probability value is greater than the first preset threshold; or sort, from high to low, the second similarities between the query vector and the vectors contained in those of the selected retrieval partitions whose probability value is greater than the first preset threshold, and output, as the query result, the vectors corresponding to the W highest second similarities, where W is a positive integer.
In a possible implementation, when outputting the query result based on the at least one retrieval partition already selected and the query vector, the processing unit 1402 is specifically configured to: output, as the query result, the vectors contained in each of the selected retrieval partitions; or sort, from high to low, the second similarities between the query vector and the vectors contained in each of the selected retrieval partitions, and output, as the query result, the vectors corresponding to the W highest second similarities, where W is a positive integer.
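The second alternative above, in which the W best vectors are taken across all selected retrieval partitions, may be sketched as follows. The data layout (a mapping from partition name to pairs of vector identifier and second similarity) and the identifiers themselves are assumptions made for the illustration.

```python
import heapq

def top_w_results(scored_by_partition, w):
    """Merge the second similarities of all selected partitions and keep the top W.

    scored_by_partition: dict mapping a partition name to a list of
                         (vector_id, second_similarity) pairs
    Returns W pairs ordered from highest to lowest second similarity.
    """
    merged = [pair for pairs in scored_by_partition.values() for pair in pairs]
    return heapq.nlargest(w, merged, key=lambda pair: pair[1])

scored = {
    "A": [("a1", 0.91), ("a2", 0.40)],  # illustrative values only
    "B": [("b1", 0.95), ("b2", 0.88)],
}
results = top_w_results(scored, w=3)
```

Since the second similarities are already computed while the partitions are scanned, this merge does not require any further distance computation.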
In a possible implementation, when selecting, from the K retrieval partitions, a retrieval partition that has not yet been selected as the target retrieval partition, the processing unit 1402 is specifically configured to: select the unselected retrieval partitions from the K retrieval partitions as the target retrieval partition in descending order of the K first similarities.
In a possible implementation, when selecting, from the K retrieval partitions, a retrieval partition that has not yet been selected as the target retrieval partition, the processing unit 1402 is specifically configured to: for any retrieval partition among the K retrieval partitions, cluster the vectors in the retrieval partition according to the similarity between the vectors to obtain multiple retrieval sub-partitions; determine the sub-partition center vector of any retrieval sub-partition from the vectors contained in that retrieval sub-partition; calculate the third similarities between the query vector and the sub-partition center vectors of the multiple retrieval sub-partitions; sort the K retrieval partitions according to the third similarities between the query vector and the sub-partition center vectors in each retrieval partition; and select, from the sorted K retrieval partitions, a retrieval partition that has not yet been selected as the target retrieval partition.
In a possible implementation, when sorting the K retrieval partitions according to the third similarities between the query vector and the sub-partition center vectors in each retrieval partition, the processing unit 1402 is specifically configured to: sort the K retrieval partitions according to the number of third similarities, among the third similarities between the query vector and the sub-partition center vectors in each retrieval partition, that exceed a second preset threshold; or sort the K retrieval partitions according to the maximum of the third similarities between the query vector and the sub-partition center vectors in each retrieval partition.
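The two sorting alternatives above may be sketched as follows, assuming that the third similarities of each retrieval partition have already been computed. The function name, the data layout and the numeric values are hypothetical.

```python
def rank_partitions(third_sims, threshold=None):
    """Order retrieval partitions by their third similarities.

    third_sims: dict mapping a partition name to the third similarities between
                the query vector and that partition's sub-partition centers
    threshold:  if given, rank by the number of third similarities above it
                (the second preset threshold); otherwise rank by the maximum
    """
    if threshold is not None:
        def key(name):
            return sum(1 for s in third_sims[name] if s > threshold)
    else:
        def key(name):
            return max(third_sims[name])
    return sorted(third_sims, key=key, reverse=True)

third_sims = {
    "A": [0.95, 0.20, 0.10],  # illustrative values only
    "B": [0.80, 0.85, 0.70],
    "C": [0.60, 0.50, 0.40],
}
by_max = rank_partitions(third_sims)
by_count = rank_partitions(third_sims, threshold=0.65)
```

The two criteria can give different orders: partition A has the single closest sub-partition center, while partition B has more sub-partition centers above the threshold.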
In a possible implementation, when determining, from the second similarities, the probability value that the target retrieval partition contains the target vector, the processing unit 1402 is specifically configured to: determine, among the second similarities, the t target second similarities ranked highest in descending order; and input the query vector, the K first similarities and the t target second similarities into a prediction model to obtain the probability value, where the prediction model is used to predict the probability value that the target retrieval partition contains the target vector.
In a possible implementation, there are N query vectors, where N is a positive integer greater than 1. When inputting the query vector, the K first similarities and the t target second similarities into the prediction model to obtain the probability value, the processing unit 1402 is specifically configured to: input a matrix formed by the N query vectors and a matrix formed by the K first similarities corresponding to each of the N query vectors into a first prediction model to obtain N initial probability values corresponding to the N query vectors, where the initial probability value represents the probability that the K retrieval partitions corresponding to a given query vector contain the target vector of that query vector; and, for any query vector, input the initial probability value corresponding to the query vector and the t target second similarities corresponding to the query vector into a second prediction model to obtain the final probability value corresponding to the query vector.
Based on the above content and the same technical concept, an embodiment of this application further provides a computer-readable storage medium storing a computer program or instructions which, when executed, cause a computer to perform the method in the above method embodiments.
Based on the above content and the same technical concept, an embodiment of this application provides a computer program product which, when read and executed by a computer, causes the computer to perform the method in the above method embodiments.
It can be understood that the various numerals used in the embodiments of this application are merely distinctions made for ease of description and are not intended to limit the scope of the embodiments of this application. The sequence numbers of the above processes do not imply an order of execution; the order in which the processes are executed should be determined by their functions and internal logic.
Obviously, a person skilled in the art can make various changes and modifications to this application without departing from its scope of protection. If these changes and modifications fall within the scope of the claims of this application and their technical equivalents, this application is intended to include them as well.

Claims (18)

  1. A vector retrieval method, characterized by comprising:
    obtaining a query vector;
    calculating the similarity between the query vector and the partition center vector of each of M cluster partitions to obtain M first similarities, wherein the M cluster partitions are obtained by clustering the vectors in a vector base library according to the similarity between the vectors, the partition center vector of any of the cluster partitions is determined from a plurality of vectors contained in that cluster partition, and M is an integer greater than 1;
    selecting, among the M first similarities, the K first similarities ranked highest in descending order, and determining the cluster partitions corresponding to the K first similarities as K retrieval partitions, wherein K is an integer greater than or equal to 1 and K is less than M;
    performing the following operations in a loop until it is determined that the probability value that the target retrieval partition selected from the K retrieval partitions contains a target vector is greater than a first preset threshold, the target vector being a vector whose similarity to the query vector is within a preset range:
    selecting, from the K retrieval partitions, a retrieval partition that has not yet been selected as the target retrieval partition;
    calculating the second similarities between the query vector and each vector contained in the target retrieval partition;
    determining, from the second similarities, the probability value that the target retrieval partition contains the target vector; and
    outputting a query result based on the at least one retrieval partition already selected and the query vector.
  2. The method according to claim 1, characterized in that outputting the query result based on the at least one retrieval partition already selected and the query vector comprises:
    outputting, as the query result, the vectors contained in those of the selected retrieval partitions whose probability value is greater than the first preset threshold; or
    sorting, from high to low, the second similarities between the query vector and the vectors contained in those of the selected retrieval partitions whose probability value is greater than the first preset threshold, and outputting, as the query result, the vectors corresponding to the W highest second similarities, wherein W is a positive integer.
  3. The method according to claim 1, characterized in that outputting the query result based on the at least one retrieval partition already selected and the query vector comprises:
    outputting, as the query result, the vectors contained in each of the selected retrieval partitions; or
    sorting, from high to low, the second similarities between the query vector and the vectors contained in each of the selected retrieval partitions, and outputting, as the query result, the vectors corresponding to the W highest second similarities, wherein W is a positive integer.
  4. 如权利要求1所述的方法,其特征在于,在所述K个检索分区中选择未被选择过的检索分区作为目标检索分区,包括:The method according to claim 1, characterized in that selecting a retrieval partition that has not been selected as a target retrieval partition from the K retrieval partitions comprises:
    按照所述K个第一相似度由高到低的顺序在所述K个检索分区中选择未被选择过的检索分区作为目标检索分区。A retrieval partition that has not been selected is selected from the K retrieval partitions as a target retrieval partition in a descending order of the K first similarities.
  5. The method according to claim 1, wherein selecting, from the K retrieval partitions, a retrieval partition that has not been selected as the target retrieval partition comprises:
    for any retrieval partition among the K retrieval partitions, clustering the vectors in the retrieval partition according to the similarity between the vectors to obtain a plurality of retrieval sub-partitions, and determining the sub-partition center vector of any retrieval sub-partition according to the plurality of vectors contained in that retrieval sub-partition;
    calculating third similarities between the vector to be queried and the sub-partition center vectors of the plurality of retrieval sub-partitions;
    sorting the K retrieval partitions according to the plurality of third similarities between the vector to be queried and the plurality of sub-partition center vectors in each retrieval partition; and
    selecting, from the sorted K retrieval partitions, a retrieval partition that has not been selected as the target retrieval partition.
  6. The method according to claim 5, wherein sorting the K retrieval partitions according to the plurality of third similarities between the vector to be queried and the plurality of sub-partition center vectors in each retrieval partition comprises:
    sorting the K retrieval partitions according to the number of third similarities that exceed a second preset threshold among the plurality of third similarities between the vector to be queried and the plurality of sub-partition center vectors in each retrieval partition; or
    sorting the K retrieval partitions according to the maximum similarity among the plurality of third similarities between the vector to be queried and the plurality of sub-partition center vectors in each retrieval partition.
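
The two sorting criteria of claim 6 can be compared with a small Python sketch. It assumes the sub-partition center vectors of claim 5 have already been computed, and it uses cosine similarity as the similarity measure; the claims do not fix a particular measure, so that choice, the function names, and the sample data are all assumptions made for illustration.

```python
import numpy as np

def cosine(q, centers):
    """Cosine similarity between q (d,) and each row of centers (n, d)."""
    q = q / np.linalg.norm(q)
    centers = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    return centers @ q

def rank_partitions(q, sub_centers, threshold=None):
    """Return retrieval-partition indices ordered by their third similarities.

    sub_centers: list with one (n_i, d) array of sub-partition center
                 vectors per retrieval partition (assumed precomputed).
    threshold:   if given, rank by how many third similarities exceed it
                 (first option of claim 6); otherwise rank by the maximum
                 third similarity (second option).
    """
    scores = []
    for centers in sub_centers:
        sims = cosine(q, centers)  # third similarities for this partition
        if threshold is None:
            scores.append(float(sims.max()))
        else:
            scores.append(int((sims > threshold).sum()))
    # sorted() is stable, so tied partitions keep their original order.
    return sorted(range(len(sub_centers)), key=lambda i: -scores[i])

# Hypothetical data: three retrieval partitions with two sub-partitions each.
q = np.array([1.0, 0.0])
sub_centers = [
    np.array([[0.8, 0.6], [0.6, 0.8]]),   # third similarities 0.8 and 0.6
    np.array([[1.0, 0.0], [0.0, 1.0]]),   # third similarities 1.0 and 0.0
    np.array([[0.0, 1.0], [-1.0, 0.0]]),  # third similarities 0.0 and -1.0
]
by_max = rank_partitions(q, sub_centers)
by_count = rank_partitions(q, sub_centers, threshold=0.5)
```

With this data the two criteria disagree on which partition to visit first: the maximum criterion favors the partition with a single very close sub-partition, while the count criterion favors the partition with more sub-partitions above the threshold.
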
  7. The method according to claim 1, wherein determining, according to the second similarities, the probability value that the target retrieval partition contains the target vector comprises:
    determining, among the second similarities, the t target second similarities that rank highest in descending order of the second similarities; and
    inputting the vector to be queried, the K first similarities, and the t target second similarities into a prediction model to obtain the probability value, wherein the prediction model is used to predict the probability value that the target retrieval partition contains the target vector.
  8. The method according to claim 7, wherein there are N vectors to be queried, N being a positive integer greater than 1; and
    inputting the vector to be queried, the K first similarities, and the t target second similarities into the prediction model to obtain the probability value comprises:
    inputting a matrix formed by the N vectors to be queried and a matrix formed by the K first similarities corresponding to each of the N vectors to be queried into a first prediction model to obtain N initial probability values corresponding to the N vectors to be queried, wherein an initial probability value represents the probability that the K retrieval partitions corresponding to a vector to be queried contain the target vector of that vector to be queried; and
    for any vector to be queried, inputting the initial probability value corresponding to the vector to be queried and the t target second similarities corresponding to the vector to be queried into a second prediction model to obtain a final probability value corresponding to the vector to be queried.
  9. A vector retrieval apparatus, comprising:
    an acquisition unit, configured to acquire a vector to be queried; and
    a processing unit, configured to:
    calculate the similarity between the vector to be queried and the partition center vector of each of M cluster partitions to obtain M first similarities, wherein the M cluster partitions are obtained by clustering the vectors in a vector base according to the similarity between the vectors, the partition center vector of any cluster partition is determined according to the plurality of vectors contained in that cluster partition, and M is an integer greater than 1;
    select, among the M first similarities, the K first similarities that rank highest in descending order, and determine the cluster partitions corresponding to the K first similarities as K retrieval partitions, wherein K is an integer greater than or equal to 1 and K is less than M;
    perform the following operations in a loop until it is determined that the probability value that the target retrieval partition selected from the K retrieval partitions contains a target vector is greater than a first preset threshold, wherein the target vector is a vector whose similarity to the vector to be queried is within a preset range:
    selecting, from the K retrieval partitions, a retrieval partition that has not been selected as the target retrieval partition;
    calculating second similarities between the vector to be queried and each of the vectors contained in the target retrieval partition; and
    determining, according to the second similarities, the probability value that the target retrieval partition contains the target vector; and
    output a query result based on the at least one selected retrieval partition and the vector to be queried.
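
The control flow recited in claim 9 (coarse ranking against partition centers, then scanning partitions one at a time with early termination) can be sketched in Python as follows. This is only an illustration under several assumptions: cosine similarity is used as the similarity measure, partitions are visited in descending order of first similarity (the ordering of claim 4), the loop also stops once all K partitions have been scanned, and `estimate_probability` is a hypothetical stand-in for the probability determination, which the claims leave to a prediction model. None of the names below come from the application.

```python
import numpy as np

def cosine(q, vectors):
    """Cosine similarity between q (d,) and each row of vectors (n, d)."""
    q = q / np.linalg.norm(q)
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors @ q

def retrieve(q, centers, partitions, k, threshold, estimate_probability, w):
    """Scan up to k partitions, stopping early once the estimate exceeds threshold.

    centers:    (M, d) array of partition center vectors.
    partitions: list of M arrays, each (n_i, d), holding a partition's vectors.
    estimate_probability: hypothetical callable mapping the second
                similarities of one partition to a probability value.
    Returns (scanned partition indices, top-w (partition, row) pairs).
    """
    first = cosine(q, centers)                      # M first similarities
    order = np.argsort(-first, kind="stable")[:k]   # K retrieval partitions
    scored, scanned = [], []
    for p in order:
        second = cosine(q, partitions[p])           # second similarities
        scored.extend((float(s), int(p), i) for i, s in enumerate(second))
        scanned.append(int(p))
        if estimate_probability(second) > threshold:
            break                                   # early termination
    scored.sort(key=lambda item: -item[0])          # highest similarity first
    return scanned, [(p, i) for _, p, i in scored[:w]]

# Hypothetical data: M = 3 partitions in two dimensions, K = 2.
centers = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
partitions = [
    np.array([[1.0, 0.1], [0.9, 0.5]]),
    np.array([[0.1, 1.0], [0.5, 0.9]]),
    np.array([[-1.0, 0.1], [-1.0, -0.1]]),
]
q = np.array([1.0, 0.2])
stand_in = lambda second: float(second.max())  # not the claimed model

early = retrieve(q, centers, partitions, 2, 0.99, stand_in, 1)
full = retrieve(q, centers, partitions, 2, 0.999, stand_in, 3)
```

With the lower threshold the search stops after the first partition; with the higher threshold the stand-in estimate never exceeds it, so both retrieval partitions are scanned before the top-W vectors are returned.
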
  10. The apparatus according to claim 9, wherein, when outputting the query result based on the at least one selected retrieval partition and the vector to be queried, the processing unit is specifically configured to:
    output, as the query result, the vectors contained in the retrieval partition, among the at least one selected retrieval partition, whose probability value is greater than the first preset threshold; or
    output, as the query result, the vectors corresponding to the top W second similarities, taken in descending order of the second similarities between the vector to be queried and the vectors contained in the retrieval partition, among the at least one selected retrieval partition, whose probability value is greater than the first preset threshold, wherein W is a positive integer.
  11. The apparatus according to claim 9, wherein, when outputting the query result based on the at least one selected retrieval partition and the vector to be queried, the processing unit is specifically configured to:
    output, as the query result, the vectors contained in each of the at least one selected retrieval partition; or
    output, as the query result, the vectors corresponding to the top W second similarities, taken in descending order of the second similarities between the vector to be queried and the vectors contained in each of the at least one selected retrieval partition, wherein W is a positive integer.
  12. The apparatus according to claim 9, wherein, when selecting, from the K retrieval partitions, a retrieval partition that has not been selected as the target retrieval partition, the processing unit is specifically configured to:
    select, in descending order of the K first similarities, a retrieval partition that has not been selected from the K retrieval partitions as the target retrieval partition.
  13. The apparatus according to claim 9, wherein, when selecting, from the K retrieval partitions, a retrieval partition that has not been selected as the target retrieval partition, the processing unit is specifically configured to:
    for any retrieval partition among the K retrieval partitions, cluster the vectors in the retrieval partition according to the similarity between the vectors to obtain a plurality of retrieval sub-partitions, and determine the sub-partition center vector of any retrieval sub-partition according to the plurality of vectors contained in that retrieval sub-partition;
    calculate third similarities between the vector to be queried and the sub-partition center vectors of the plurality of retrieval sub-partitions;
    sort the K retrieval partitions according to the plurality of third similarities between the vector to be queried and the plurality of sub-partition center vectors in each retrieval partition; and
    select, from the sorted K retrieval partitions, a retrieval partition that has not been selected as the target retrieval partition.
  14. The apparatus according to claim 9, wherein, when sorting the K retrieval partitions according to the plurality of third similarities between the vector to be queried and the plurality of sub-partition center vectors in each retrieval partition, the processing unit is specifically configured to:
    sort the K retrieval partitions according to the number of third similarities that exceed a second preset threshold among the plurality of third similarities between the vector to be queried and the plurality of sub-partition center vectors in each retrieval partition; or
    sort the K retrieval partitions according to the maximum similarity among the plurality of third similarities between the vector to be queried and the plurality of sub-partition center vectors in each retrieval partition.
  15. The apparatus according to claim 9, wherein, when determining, according to the second similarities, the probability value that the target retrieval partition contains the target vector, the processing unit is specifically configured to:
    determine, among the second similarities, the t target second similarities that rank highest in descending order of the second similarities; and
    input the vector to be queried, the K first similarities, and the t target second similarities into a prediction model to obtain the probability value, wherein the prediction model is used to predict the probability value that the target retrieval partition contains the target vector.
  16. The apparatus according to claim 15, wherein there are N vectors to be queried, N being a positive integer greater than 1; and
    when inputting the vector to be queried, the K first similarities, and the t target second similarities into the prediction model to obtain the probability value, the processing unit is specifically configured to:
    input a matrix formed by the N vectors to be queried and a matrix formed by the K first similarities corresponding to each of the N vectors to be queried into a first prediction model to obtain N initial probability values corresponding to the N vectors to be queried, wherein an initial probability value represents the probability that the K retrieval partitions corresponding to a vector to be queried contain the target vector of that vector to be queried; and
    for any vector to be queried, input the initial probability value corresponding to the vector to be queried and the t target second similarities corresponding to the vector to be queried into a second prediction model to obtain a final probability value corresponding to the vector to be queried.
  17. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program or instructions, and when the computer program or instructions are executed by a vector retrieval apparatus, the method according to any one of claims 1 to 8 is implemented.
  18. A chip, comprising at least one processor and an interface, wherein the interface is configured to provide program instructions or data to the at least one processor, and the at least one processor is configured to execute the program instructions to implement the method according to any one of claims 1 to 8.
PCT/CN2023/121585 2022-09-28 2023-09-26 Vector retrieval method and device WO2024067593A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211193810.4A CN117828131A (en) 2022-09-28 2022-09-28 Vector retrieval method and device
CN202211193810.4 2022-09-28

Publications (1)

Publication Number Publication Date
WO2024067593A1 (en) 2024-04-04

Family

ID=90476356

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/121585 WO2024067593A1 (en) 2022-09-28 2023-09-26 Vector retrieval method and device

Country Status (2)

Country Link
CN (1) CN117828131A (en)
WO (1) WO2024067593A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113449132A (en) * 2021-08-26 2021-09-28 阿里云计算有限公司 Vector retrieval method and device
CN113704534A (en) * 2021-04-13 2021-11-26 腾讯科技(深圳)有限公司 Image processing method and device and computer equipment
CN114020746A (en) * 2021-11-04 2022-02-08 山东库睿科技有限公司 Data processing method and device
CN114385280A (en) * 2020-10-16 2022-04-22 华为技术有限公司 Parameter determination method and electronic equipment

Also Published As

Publication number Publication date
CN117828131A (en) 2024-04-05

Similar Documents

Publication Publication Date Title
US20210279285A1 (en) Method and apparatus for supporting machine learning algorithms and data pattern matching in ethernet ssd
US20110119467A1 (en) Massively parallel, smart memory based accelerator
US8380643B2 (en) Searching multi-dimensional data using a parallelization framework comprising data partitioning and short-cutting via early out
US9747547B2 (en) Hardware enhancements to radial basis function with restricted coulomb energy learning and/or k-Nearest Neighbor based neural network classifiers
US20240054384A1 (en) Operation-based partitioning of a parallelizable machine learning model network on accelerator hardware
CN109165307B (en) Feature retrieval method, device and storage medium
Jiang et al. MicroRec: efficient recommendation inference by hardware and data structure solutions
CN110795469B (en) Spark-based high-dimensional sequence data similarity query method and system
WO2019029714A1 (en) Image content-based display object determination method, device, medium, and apparatus
WO2019120007A1 (en) Method and apparatus for predicting user gender, and electronic device
CN112364093B (en) Learning type big data visualization method and system
US20230161811A1 (en) Image search system, method, and apparatus
CN113971225A (en) Image retrieval system, method and device
CN117056465A (en) Vector searching method, system, electronic device and storage medium
US20240192880A1 (en) Data processing method, apparatus, and system
WO2024067593A1 (en) Vector retrieval method and device
CN113239218A (en) Method for concurrently executing face search on NPU-equipped device
CN110209895B (en) Vector retrieval method, device and equipment
CN116547647A (en) Search device and search method
CN115836346A (en) In-memory computing device and data processing method thereof
US20210319022A1 (en) Parallel pruning and batch sorting for similarity search accelerators
CN118133039A (en) Accelerating large-scale similarity computation
CN112214627B (en) Search method, readable storage medium, and electronic device
CN113297226A (en) Data storage method, data reading method, data storage device, electronic device and medium
WO2023222091A1 (en) Vector retrieval method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23870821

Country of ref document: EP

Kind code of ref document: A1