CN117828131A - Vector retrieval method and device - Google Patents

Publication number
CN117828131A
Authority
CN
China
Prior art keywords
vector
partition
search
queried
partitions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211193810.4A
Other languages
Chinese (zh)
Inventor
邝达
施佩珍
王兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202211193810.4A
Priority to PCT/CN2023/121585 (WO2024067593A1)
Publication of CN117828131A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/903 Querying
    • G06F16/906 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A vector retrieval method and device are provided to address the low retrieval speed of existing retrieval methods. The method includes: obtaining a vector to be queried; calculating the similarity between the vector to be queried and the partition center vector of each of M clustering partitions to obtain M first similarities; determining K retrieval partitions according to the M first similarities; repeating the following operations until the probability value that the target retrieval partition contains the target vector is greater than a first preset threshold: selecting a retrieval partition from the K retrieval partitions as the target retrieval partition, calculating a second similarity between the vector to be queried and each vector contained in the target retrieval partition, and determining, according to the second similarities, the probability value that the target retrieval partition contains the target vector; and outputting a query result based on the at least one retrieval partition that has been selected. Because the query result is obtained without calculating the similarity between the vector to be queried and every vector in the vector base, the amount of calculation is reduced and the query speed is improved.

Description

Vector retrieval method and device
Technical Field
The present disclosure relates to the field of search technologies, and in particular, to a vector search method and apparatus.
Background
Vector retrieval plays an important role in the field of information retrieval. The process first constructs a vector base, which contains a large number of vectors obtained by extracting features from a large amount of data; the data may be pictures, videos, audio, text, and the like. The similarity between the vector to be queried input by the user and every vector in the vector base is then calculated, and the vectors corresponding to the W highest similarities are returned as the query result for the vector to be queried.
This method performs a global search and comparison over a vector base containing hundreds of millions or even billions of vectors, so the retrieval throughput (queries per second, QPS) is low and retrieval is slow.
Disclosure of Invention
The application provides a vector retrieval method and a vector retrieval device, which are used for solving the problem of low retrieval speed in the existing vector retrieval method.
In a first aspect, the present application provides a vector retrieval method, which may be executed by a computing device, by a chip inside the computing device, or by a processor in the computing device. The method comprises the following steps: obtaining a vector to be queried;
calculating the similarity between the vector to be queried and the partition center vector of each of M clustering partitions to obtain M first similarities, wherein the M clustering partitions are obtained by clustering the vectors in the vector base according to the similarity among the vectors, the partition center vector of any clustering partition is determined according to the plurality of vectors contained in that clustering partition, and M is an integer greater than 1; selecting the K highest first similarities from the M first similarities, and determining the clustering partitions corresponding to the K first similarities as K retrieval partitions, wherein K is an integer greater than or equal to 1 and smaller than M;
repeating the following operations until the probability value that the target retrieval partition selected from the K retrieval partitions contains the target vector is greater than a first preset threshold, wherein the target vector is a vector whose similarity with the vector to be queried is within a preset range:
selecting a retrieval partition that has not yet been selected from the K retrieval partitions as the target retrieval partition; calculating a second similarity between the vector to be queried and each vector contained in the target retrieval partition; determining, according to the second similarities, the probability value that the target retrieval partition contains the target vector;
and outputting a query result based on the at least one selected retrieval partition and the vector to be queried.
In this technical solution, the query result is obtained without calculating the similarity between the vector to be queried and every vector in the vector base. Instead, the vectors in the vector base are clustered into M clustering partitions, each of which corresponds to a partition center vector. K retrieval partitions are selected from the M clustering partitions by calculating the first similarities between the vector to be queried and the partition center vectors of the M clustering partitions and comparing the M first similarities. Target retrieval partitions are then selected from the K retrieval partitions one at a time; for each selected target retrieval partition, the probability value that a vector identical or similar to the vector to be queried falls in that partition is determined, until a target retrieval partition whose probability value is greater than the first preset threshold has been selected. The query result for the vector to be queried is then determined from the at least one target retrieval partition that has been selected. In this way, the amount of calculation is reduced and the query speed is improved.
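The steps of the first aspect can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: cosine similarity is assumed as the similarity measure (the application does not fix one), the names `retrieve`, `cosine`, and `best_similarity` are hypothetical, and `best_similarity` is a trivial stand-in for the probability estimate, not the prediction model described in the application.

```python
import math

def cosine(a, b):
    """Cosine similarity (assumed measure; the source does not prescribe one)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def best_similarity(second_sims):
    """Hypothetical stand-in for the probability estimate: the best match found."""
    return max(second_sims, default=0.0)

def retrieve(query, centers, partitions, k, threshold, estimate_prob, w):
    # M first similarities between the query and the partition center vectors.
    first = [cosine(query, c) for c in centers]
    # The K partitions with the highest first similarities, best first.
    order = sorted(range(len(centers)), key=lambda i: first[i], reverse=True)[:k]
    visited, scored = [], []
    for idx in order:
        # Second similarities between the query and every vector in the partition.
        second = [cosine(query, v) for v in partitions[idx]]
        scored.extend(zip(second, partitions[idx]))
        visited.append(idx)
        # Stop as soon as the partition probably contains the target vector.
        if estimate_prob(second) > threshold:
            break
    # Output the top-W vectors over all partitions selected so far.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return visited, [vec for _, vec in scored[:w]]

centers = [[1, 0], [0, 1], [-1, 0]]
partitions = [[[1, 0.1], [0.9, 0.2]], [[0.1, 1], [0.2, 0.9]], [[-1, 0.1]]]
visited, result = retrieve([1, 0], centers, partitions, k=2, threshold=0.9,
                           estimate_prob=best_similarity, w=1)
print(visited, result)
```

In this toy run the loop stops after the first partition because the estimate already exceeds the threshold, so the vectors of the remaining partitions are never compared with the query.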
In one possible implementation, outputting the query result based on the at least one selected search partition and the vector to be queried includes: outputting, as the query result, every vector contained in the search partition, among the at least one selected search partition, whose probability value is greater than the first preset threshold; or sorting, from high to low, the second similarities between the vector to be queried and the vectors contained in that search partition, and outputting the vectors corresponding to the top W second similarities as the query result, where W is a positive integer.
In this technical solution, only one selected search partition has a probability value greater than the first preset threshold. Outputting all vectors contained in that search partition as the query result therefore effectively reduces the amount of calculation and improves the retrieval speed. Alternatively, outputting only the vectors corresponding to the top W second similarities in that search partition further narrows the output.
In one possible implementation, outputting the query result based on the at least one search partition that has been selected and the vector to be queried, includes: each vector contained in the selected at least one search partition is output as a query result; or according to the sequence from high to low of the second similarity between each vector contained in at least one selected search partition and the vector to be queried, outputting vectors corresponding to the first W second similarities as query results, wherein W is a positive integer.
In the above technical solution, the query result is output based on all of the at least one selected search partition, rather than only on the search partition whose probability value is greater than the first preset threshold. A vector with high similarity to the vector to be queried may exist in a selected search partition whose probability value is not greater than the first preset threshold, so this solution can improve the accuracy of vector retrieval.
In one possible implementation, selecting a search partition that has not been selected from the K search partitions as the target search partition may be: selecting the unselected search partitions in descending order of the K first similarities.
Selecting the target search partitions in descending order of the K first similarities allows the target search partition whose probability value is greater than the first preset threshold to be found as early as possible and minimizes the likelihood that a further target search partition has to be selected. The similarities between the vector to be queried and the vectors in such a further partition then need not be calculated, which reduces the amount of calculation and improves the retrieval speed.
In one possible implementation, selecting a search partition that has not been selected from the K search partitions as the target search partition may be: for any one of the K search partitions, clustering the vectors in the search partition according to the similarity among the vectors to obtain a plurality of search sub-partitions; determining the sub-partition center vector of any search sub-partition according to the plurality of vectors contained in that search sub-partition; calculating third similarities between the vector to be queried and the sub-partition center vectors of the plurality of search sub-partitions; sorting the K search partitions according to the third similarities between the vector to be queried and the sub-partition center vectors in each search partition; and selecting an unselected search partition from the sorted K search partitions as the target search partition.
In this way, further clustering the vectors in each search partition yields a plurality of search sub-partitions, each corresponding to a sub-partition center vector. Because the division is finer, a sub-partition center vector represents the vectors in its search sub-partition more accurately. Sorting the K search partitions based on the third similarities between the vector to be queried and the sub-partition center vectors in each search partition therefore improves the accuracy of the sorting. The target search partition whose probability value is greater than the first preset threshold can thus be found as early as possible, which minimizes the likelihood that a further target search partition has to be selected. The similarities between the vector to be queried and the vectors in such a further partition then need not be calculated, which reduces the amount of calculation and improves the retrieval speed.
In one possible implementation manner, sorting the K search partitions according to a plurality of third similarities between the vector to be queried and a plurality of sub-partition center vectors in each search partition includes: sorting the K search partitions according to the number of third similarity exceeding a second preset threshold value in the third similarity between the vector to be queried and the center vectors of the sub-partitions in each search partition; or sorting the K search partitions according to the maximum similarity of the third similarities of the vector to be queried and the center vectors of the sub-partitions in each search partition.
The K search partitions are ranked through the number of the third similarity exceeding a second preset threshold value in the third similarities or through the maximum similarity in the third similarities, so that the ranking difficulty can be reduced, the ranking speed can be improved, and the search speed can be further improved. Meanwhile, the accuracy of sequencing can be improved. Thus, the target retrieval partition with the probability value larger than the first preset threshold value can be determined as early as possible.
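The two sorting criteria can be illustrated as follows. This is a minimal sketch: the function names `rank_by_count` and `rank_by_max` and the sample similarity values are hypothetical, and each inner list is assumed to hold the third similarities between one vector to be queried and the sub-partition center vectors of one search partition.

```python
def rank_by_count(third_sims, threshold):
    """Order partitions by how many third similarities exceed the threshold."""
    counts = [sum(1 for s in sims if s > threshold) for sims in third_sims]
    return sorted(range(len(third_sims)), key=lambda i: counts[i], reverse=True)

def rank_by_max(third_sims):
    """Order partitions by their single highest third similarity."""
    return sorted(range(len(third_sims)),
                  key=lambda i: max(third_sims[i]), reverse=True)

# Third similarities for K = 3 search partitions with 3 sub-partitions each.
third = [[0.2, 0.95, 0.1], [0.8, 0.85, 0.7], [0.3, 0.4, 0.5]]
print(rank_by_count(third, threshold=0.6))
print(rank_by_max(third))
```

With these sample values the two criteria disagree on which partition comes first: partition 1 has the most sub-partitions above the threshold, while partition 0 has the single best sub-partition. Which criterion suits a given data set is a design choice the text leaves open.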
In one possible implementation, determining the probability value that the target retrieval partition contains the target vector according to the second similarities includes: determining the t highest of the second similarities as t target second similarities; and inputting the vector to be queried, the K first similarities, and the t target second similarities into a prediction model to obtain the probability value, where the prediction model is used to predict the probability value that the target retrieval partition contains the target vector.
Predicting the probability value with the prediction model improves both the accuracy and the speed of determining the probability value. Inputting only the t highest second similarities into the prediction model reduces the amount of calculation of the prediction model and speeds up prediction of the probability value without affecting the prediction accuracy.
In one possible implementation, the number of vectors to be queried is N, where N is a positive integer greater than 1. Correspondingly, inputting the vector to be queried, the K first similarities, and the t target second similarities into a prediction model to obtain the probability value comprises: inputting a matrix formed by the N vectors to be queried and a matrix formed by the K first similarities corresponding to each of the N vectors to be queried into a first prediction model to obtain N initial probability values corresponding to the N vectors to be queried, where the initial probability value represents the probability that the K search partitions corresponding to a vector to be queried contain the target vector of that vector to be queried; and inputting the initial probability value corresponding to any vector to be queried and the t target second similarities corresponding to that vector to be queried into a second prediction model to obtain the final probability value corresponding to that vector to be queried.
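The inputs to the prediction model can be assembled as in the sketch below. The application does not specify the form of the prediction model, so `toy_model` is purely a hypothetical logistic stand-in with made-up weights, and zero-padding a partition that holds fewer than t vectors is likewise an assumption.

```python
import math

def top_t(second_sims, t):
    """The t highest second similarities, from high to low."""
    return sorted(second_sims, reverse=True)[:t]

def build_features(query, first_sims, second_sims, t):
    """Concatenate the query, the K first similarities and the top-t second similarities."""
    top = top_t(second_sims, t)
    top += [0.0] * (t - len(top))  # assumption: pad when fewer than t vectors exist
    return list(query) + list(first_sims) + top

def toy_model(features, weights, bias):
    """Hypothetical logistic model; NOT the model described in the application."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

features = build_features([0.5, 0.5], [0.9, 0.8], [0.1, 0.7, 0.3], t=2)
probability = toy_model(features, weights=[0.0, 0.0, 1.0, 1.0, 2.0, 2.0], bias=-3.0)
print(features, probability)
```

Only t values per partition reach the model regardless of how many vectors the partition holds, which is what keeps the cost of the prediction step bounded.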
The prediction of the probability value is divided into two stages, wherein the first stage adopts a first prediction model, and the second stage adopts a second prediction model. Specifically, a matrix formed by N vectors to be queried and a matrix formed by K first similarities corresponding to each vector to be queried in the N vectors to be queried are input into a first prediction model, so that the first prediction model can predict an initial probability value in a matrix multiplying mode, the computing power is fully exerted, the computing efficiency is improved, and the vector retrieval speed is further improved.
In a second aspect, an embodiment of the present application provides a vector retrieving apparatus, where the apparatus has a function of implementing the method in the first aspect or any possible implementation manner of the first aspect, and the apparatus may be a computing device or may be a processor included in the computing device. The functions of the vector retrieval apparatus may be implemented by hardware, or may be implemented by executing corresponding software by hardware, where the hardware or software includes one or more modules or units or means corresponding to the functions.
In a possible implementation manner, the apparatus includes a processing module and a transceiver module in a structure of the apparatus, where the processing module is configured to support the apparatus to perform the method in the first aspect or any implementation manner of the first aspect. The transceiver module is used to support communication between the device and other devices, for example, data from the acquisition device may be received. The vector retrieval apparatus may further comprise a memory module coupled to the processing module, which holds the program instructions and data necessary for the apparatus. As an example, the processing module may be a processor, the transceiver module may be a transceiver, and the storage module may be a memory, where the memory may be integrated with the processor, or may be separately provided from the processor.
In another possible implementation, the apparatus includes a processor in its structure and may also include a memory. The processor is coupled to the memory and operable to execute computer program instructions stored in the memory to cause the apparatus to perform the method of the first aspect or any one of the possible implementations of the first aspect. Optionally, the apparatus further comprises a communication interface, the processor being coupled to the communication interface. When the apparatus is a computing device, the communication interface may be a transceiver or an input/output interface.
In a third aspect, embodiments of the present application provide a chip comprising a processor coupled to a memory for storing a program or instructions that, when executed by the processor, cause the chip to implement the method of the first aspect or any one of the possible implementations of the first aspect.
Optionally, the chip further comprises an interface circuit for delivering code instructions to the processor.
Optionally, the chip may include one or more processors, and the processor may be implemented by hardware or software. When implemented in hardware, the processor may be a logic circuit, an integrated circuit, or the like. When implemented in software, the processor may be a general purpose processor that operates by reading software code stored in a memory.
Optionally, the chip may include one or more memories. The memory may be integrated with the processor or provided separately from the processor. For example, the memory may be a non-transitory memory, such as a read-only memory (ROM), which may be integrated on the same chip as the processor or provided on a different chip.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program or instructions which, when executed, cause a computer to perform the method of the first aspect or any of the possible implementations of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product, which when read and executed by a computer causes the computer to perform the method of the first aspect or any of the possible implementations of the first aspect.
For the technical effects achieved by any one of the second to fifth aspects, refer to the description of the beneficial effects of the first aspect; details are not repeated here.
Drawings
FIG. 1a is a schematic diagram of vector retrieval in a search-by-image scenario provided in the present application;
FIG. 1b is a schematic diagram of vector search in the context of drug discovery provided herein;
FIG. 2 is a schematic diagram of a system architecture provided herein;
FIG. 3 is a schematic diagram of a computing device provided herein;
FIG. 4 is a schematic diagram of a processor according to the present application;
FIG. 5a is a schematic flow chart of a vector search technique provided in the present application;
FIG. 5b is a schematic diagram of M clustering partitions obtained after clustering vectors in a vector base;
FIG. 6 is a schematic flow chart of a vector retrieval method provided in the present application;
FIG. 7 is a schematic diagram of dividing any search partition into search sub-partitions provided herein;
FIG. 8 is a schematic diagram of the first similarity between the vector to be queried and the partition center vector of each search partition, and the third similarity between the vector to be queried and the sub-partition center vector of each search sub-partition, provided in the present application;
FIG. 9 is a flowchart of a method for obtaining probability values according to second similarities provided by the present application;
FIG. 10 is a flowchart of another method for obtaining probability values according to the second similarity provided in the present application;
FIG. 11a is a schematic diagram of a matrix of the M first similarities between any vector to be queried and the M partition center vectors, obtained by matrix multiplication on a hardware accelerator;
FIG. 11b is a schematic diagram of a matrix formed by 3 first similarities corresponding to each of 2 vectors to be queried;
FIG. 12 is a schematic diagram of a method for determining probability values according to the present application;
FIG. 13 is an overall flow chart of a vector search method provided in the present application;
FIG. 14 is a schematic diagram of a vector search device provided in the present application.
Detailed Description
For a better explanation of the present application, the following description of the techniques or terms referred to in the present application will be given.
1. Vector retrieval technique: retrieving, from a given vector data set and according to some similarity measure, vectors that are similar to the vector to be queried.
2. The k-means clustering algorithm (k-means clustering algorithm) is an iteratively solved clustering analysis algorithm. Specifically, given the number k of classes, the whole data set is clustered, the objective function is the sum of the distances from all samples to class centers, and the objective function is optimized by iterative calculation, so that k class centers and the class to which each sample belongs are obtained.
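The iteration described above can be sketched in plain Python as follows. This is a minimal illustration rather than a production implementation: squared Euclidean distance, random initialization from the data points, and a fixed iteration count are all assumptions, and the function name `kmeans` is hypothetical.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Return (centers, labels) after a fixed number of k-means iterations."""
    rng = random.Random(seed)
    # Assumption: initialize the class centers with k distinct data points.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each sample joins the class of its nearest center.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: each center moves to the mean of its class members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels

points = [[0, 0], [0, 1], [10, 10], [10, 11]]
centers, labels = kmeans(points, k=2)
print(centers, labels)
```

The returned centers play the role of the k class centers, and `labels` records the class to which each sample belongs.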
3. Retrieval accuracy, which may also be referred to as recall. Given a vector to be queried, the retrieval system searches for it and returns W vectors as the query result. Let X be the set of W returned vectors, and let Y be the set of the W vectors in the whole vector base that have the highest similarity to the vector to be queried. The retrieval accuracy of the retrieval system for the vector to be queried is then |X ∩ Y|/|Y|.
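As a small worked example, recall can be computed as the fraction of the true top-W vectors that the retrieval system actually returned. The sketch assumes vectors are identified by integer IDs, and the function name `recall` is hypothetical.

```python
def recall(returned_ids, true_top_ids):
    """Fraction of the true top-W vectors that appear in the returned set."""
    x, y = set(returned_ids), set(true_top_ids)
    return len(x & y) / len(y)

# W = 4: the system returned vectors 1, 2, 3 and 7; the true top 4 are 1, 2, 3 and 4.
print(recall([1, 2, 3, 7], [1, 2, 3, 4]))  # 0.75
```

Three of the four true nearest vectors were returned, so the recall is 0.75.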
Fig. 1a shows a schematic diagram of vector retrieval in a search-by-image scenario. Specifically, features are first extracted from a large number of pictures to obtain a large number of vectors, which form a vector base. The vector to be queried, obtained by extracting features from the picture to be queried, is searched for in the vector base to find vectors that meet the similarity requirement with the vector to be queried. The pictures from which those vectors were extracted are then determined and returned as the query result. For example, in an internet application, pictures containing commodities similar in appearance to the commodity in the picture input by the user are retrieved; as another example, other videos similar to the frames of videos the user frequently browses are retrieved and pushed to the user. The ever-increasing data scale of the internet places higher demands on the retrieval speed and efficiency of the retrieval system.
Fig. 1b shows a schematic diagram of vector retrieval in a drug discovery scenario. A large number of compounds are encoded by an encoder to obtain a large number of vectors, which form a vector base. The vector to be queried, obtained by encoding the active fragment or lead compound of the drug to be queried with the encoder, is searched for in the vector base to find vectors that meet the similarity requirement with the vector to be queried, and the compounds corresponding to those vectors are returned as the query result. Developing a new drug requires searching a compound library containing hundreds of millions or billions of compounds for compounds similar to the active fragment or lead compound of the new drug as potential drugs. Because the selection of similar compounds affects the subsequent animal experiments and clinical trials, which have long cycles, this application also places high demands on the retrieval speed of the retrieval system.
A vector that meets the similarity requirement with the vector to be queried can be retrieved in either of the following two modes:
Mode 1: calculate the similarity between the vector to be queried and every vector in the whole vector base, and select the W vectors with the highest similarities as the query result.
Mode 2: calculate the similarity between the vector to be queried and each vector in the whole vector base in turn until W vectors whose similarity meets a preset threshold have been found, and then stop calculating the similarity between the vector to be queried and the remaining vectors in the vector base.
Although Mode 1 ensures retrieval accuracy, it requires calculating the similarity between the vector to be queried and every vector in the vector base, and the number of vectors in the vector base is huge, generally on the order of hundreds of millions or billions. The resulting amount of calculation limits any improvement in retrieval speed. Mode 2 requires less calculation than Mode 1, but if the preset threshold is set high, the amount of calculation is still large and retrieval is slow; if the preset threshold is set low, retrieval accuracy suffers. Vector retrieval in Mode 2 therefore places high demands on the choice of the preset threshold, and may even require different preset thresholds for different vectors to be queried, so this retrieval mode is not flexible enough.
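The trade-off between the two modes can be seen in a small sketch. It assumes the dot product as the similarity measure and uses hypothetical function names and sample data; the second return value of `mode2_first_w` counts how many vectors had to be compared before the scan stopped.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mode1_top_w(query, base, w):
    """Mode 1: compare against every vector, then keep the W most similar."""
    sims = [(dot(query, v), i) for i, v in enumerate(base)]
    sims.sort(key=lambda s: s[0], reverse=True)
    return [i for _, i in sims[:w]]

def mode2_first_w(query, base, w, threshold):
    """Mode 2: scan in order and stop once W vectors meet the threshold."""
    hits, compared = [], 0
    for i, v in enumerate(base):
        compared += 1
        if dot(query, v) >= threshold:
            hits.append(i)
            if len(hits) == w:
                break
    return hits, compared

base = [[0.6, 0], [0.9, 0], [0.1, 0], [1.0, 0], [0.8, 0]]
query = [1, 0]
print(mode1_top_w(query, base, w=2))                   # exact top 2
print(mode2_first_w(query, base, w=2, threshold=0.5))  # stops early, misses the best vector
print(mode2_first_w(query, base, w=2, threshold=0.85)) # exact here, but scans further
```

With the low threshold the scan stops after two comparisons but returns a vector that is not among the true top two; with the high threshold the result matches Mode 1 at the cost of more comparisons.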
In summary, neither of the above vector retrieval modes achieves both retrieval accuracy and retrieval speed. On this basis, the embodiments of the present application provide a vector retrieval method that improves the retrieval speed while ensuring the retrieval accuracy.
Fig. 2 provides a schematic diagram of a system architecture to which embodiments of the present application are applicable, including an acquisition device 10, a computing device 20, and a storage device 30. There may be one or more acquisition devices 10, one or more computing devices 20, and one or more storage devices 30, which may be connected by a network.
The acquisition device 10 may be used to acquire data and transmit the acquired data to the computing device 20 over a network. The acquisition device 10 may be a video camera, a mobile phone, a computer, or the like, and the data acquired by the acquisition device 10 may be pictures, video, audio, text, and so on. For example, in a video surveillance scenario, the acquisition device 10 may be a camera, and the acquired data may be pictures and/or videos captured by the camera.
The computing device 20 is configured to perform feature extraction on each piece of acquired data to obtain the vector corresponding to that data; to form a vector base from the large number of vectors corresponding to a large amount of data; and to cluster the vectors in the vector base according to the similarity between the vectors to obtain M clustering partitions, where the vectors within each clustering partition are highly similar to one another and M is an integer greater than 1. Each clustering partition has a corresponding partition center vector, which is determined according to the plurality of vectors contained in the clustering partition; for example, it can be determined according to the mean, mode, or median of those vectors. The partition center vector can be understood as a representative of the vectors contained in the clustering partition that reflects their characteristics. The embodiments of the present application do not limit the clustering algorithm; for example, a k-means clustering algorithm, a mean shift clustering method, a density-based clustering method, or the like can be used to cluster the vectors in the vector base according to the similarity among the vectors and thereby obtain M clustering partitions.
The storage device 30 may be configured to store the plurality of cluster partitions calculated by the computing device 20. For example, FIG. 5b shows a schematic diagram of the M cluster partitions obtained by clustering the vectors in the vector base. In FIG. 5b, it is assumed that clustering the vectors according to the similarity between them yields 8 cluster partitions, which are separated by solid lines. The vectors in each cluster partition are averaged to obtain the partition center vector of that cluster partition; the partition center vectors are represented by five-pointed stars in the figure, and the black solid points represent the other vectors contained in the cluster partition. For example, if a cluster partition contains the 3 vectors [1, 1], [2, 2], and [3, 3], the partition center vector of the cluster partition may be [2, 2].
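The [1, 1], [2, 2], [3, 3] example above can be reproduced with a minimal sketch. The use of NumPy and the function name `partition_center` are assumptions made for illustration only; the application does not prescribe how the center is computed beyond naming the mean, mode, or median as options.

```python
import numpy as np

def partition_center(vectors, method="mean"):
    """Return a representative center vector for one cluster partition.

    Hypothetical helper: only the mean and median options are sketched here.
    """
    arr = np.asarray(vectors, dtype=float)
    if method == "mean":
        return arr.mean(axis=0)          # element-wise average of all vectors
    if method == "median":
        return np.median(arr, axis=0)    # element-wise median of all vectors
    raise ValueError(f"unsupported method: {method}")

# The three vectors from the example in the text.
center = partition_center([[1, 1], [2, 2], [3, 3]])
```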
After clustering the vectors in the vector base to obtain M cluster partitions, the computing device 20 may send the partition center vector of each cluster partition of the M cluster partitions and the vectors included in each cluster partition to the storage device 30 for storage. That is, the data structure shown in FIG. 5b may be stored in the storage device 30 for subsequent vector retrieval by the computing device 20.
In the vector retrieval phase, the acquisition device 10 may be used to acquire data to be queried and send it to the computing device 20. For example, when a user opens a shopping application and inputs a picture to be queried that contains the commodity to be queried, the acquisition device 10 acquires the picture to be queried and then sends it to the computing device 20.
The computing device 20 is configured to perform feature extraction on the picture to be queried to obtain a vector to be queried corresponding to the picture to be queried; and then searching for the similar vectors in M cluster partitions stored in the storage device 30 according to the vectors to be queried, and feeding back the found similar vectors to the user.
It should be appreciated that the acquisition device 10, the computing device 20 and the storage device 30 may be integrated in the same device or may be provided separately in different devices. For example, the computing device 20 and the storage device 30 may be integrated in a server, the acquisition device 10 in a terminal device, etc.
Further, as shown in FIG. 3, which is a schematic diagram of one possible configuration of a computing device 20, the computing device 20 includes a processor 201, a memory 202, and a communication interface 203. Any two of the processor 201, the memory 202, and the communication interface 203 may be connected via a bus 204.
The processor 201 may be a central processing unit (CPU), which may be used to execute software programs in the memory 202 to perform one or more functions, such as feature extraction of data. Besides a CPU, the processor 201 may also be an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), a system on chip (SoC), a complex programmable logic device (CPLD), a graphics processing unit (GPU), a neural-network processing unit (NPU), or the like.
It should be noted that, in practical applications, there may be a plurality of processors 201, which may be of the same type or of different types. For example, the plurality of processors 201 are a plurality of CPUs. For another example, the plurality of processors 201 include one or more CPUs and one or more GPUs. For another example, the plurality of processors 201 include one or more CPUs and one or more NPUs. Alternatively, the plurality of processors 201 may include one or more CPUs, one or more GPUs, one or more NPUs, and the like. Each processor 201 (e.g., a CPU or an NPU) may include one core or multiple cores.
The memory 202 refers to a device for storing data, and may be an internal memory or a hard disk.
Internal memory exchanges data directly with the processor 201; it can be read and written at any time and at high speed, and serves as temporary data storage for the operating system or other programs running on the processor 201. The internal memory includes volatile memory, such as random access memory (RAM) or dynamic random access memory (DRAM), and may also include non-volatile memory, such as storage class memory (SCM), or a combination of volatile and non-volatile memory. In practice, multiple internal memories may be configured in the computing device 20, and optionally, they may be of different types. The number and type of the internal memories are not limited in this embodiment. In addition, the internal memory may be configured to have a power-protection function, which means that the data stored in the memory is not lost when the system is powered down and powered up again. An internal memory having the power-protection function is called a non-volatile memory.
The hard disk is used to provide storage resources, for example, to store data such as pictures, video, audio, and text acquired by the acquisition device 10. Hard disks include, but are not limited to, non-volatile memory such as a read-only memory (ROM), a hard disk drive (HDD), or a solid state drive (SSD). Compared with internal memory, a hard disk has a slower read-write speed and is generally used for persistent data storage. In one embodiment, data, program instructions, and the like in the hard disk are first loaded into the internal memory, and the processor then obtains the data and/or program instructions from the internal memory.
Communication interface 203 is used to communicate with other devices, such as for computing device 20 to communicate with acquisition device 10 or storage device 30.
In practice, as shown in FIG. 4, the computing device 20 may include two processors 201, namely a CPU and an NPU. The CPU may include 6 CPU cores, and the NPU may include 2 NPU cores, which may also be referred to as AI cores. The computing power of the NPU is higher than that of the CPU; during data retrieval, the CPU may be used to perform similarity sorting and the NPU may be used to perform similarity calculation. For details, refer to the architecture of the processor 201 in the computing device 20 shown in FIG. 4.
Based on the system architecture shown in FIG. 2 and the hardware architectures of the computing device shown in FIG. 3 and FIG. 4, the present application provides an exemplary flow of vector retrieval, which is shown schematically in FIG. 5a. The flow may be performed by the computing device 20 shown in FIG. 3 and FIG. 4, and may be divided into three stages:
1. Feature extraction stage
For the acquired plurality of sample pictures, the computing device 20 inputs each sample picture into a preset feature extraction model. The embodiments of the present application do not limit the type of the feature extraction model. For example, each sample picture may be input into a convolutional neural network (CNN) model for feature extraction, so that the CNN model outputs a vector corresponding to each sample picture. The computing device 20 then stores the vector of each sample picture in a vector base, which may be located in the memory 202 of the computing device 20 or in the storage device 30, where the storage device 30 may be a separate storage medium, memory, or the like.
2. Clustering stage
The computing device 20 clusters the vectors in the vector base according to the similarity between the vectors to obtain M clustered partitions, where each clustered partition corresponds to a partition center vector, and M is an integer greater than 1. By way of example, each vector in the vector base may be clustered in the following two ways to obtain M clustered partitions:
In a first implementation, all vectors in the vector base are clustered directly to obtain M cluster partitions and the partition center vector of each cluster partition. The partition center vector is obtained from the vectors in the cluster partition, for example, by taking their mean or median, which is not limited in the embodiments of the present application. The clustering algorithm may be a k-means clustering algorithm, a fuzzy c-means clustering algorithm, a mean shift clustering method, a density-based clustering method, or the like, which is also not limited in the embodiments of the present application.
In a second implementation, a preset proportion (for example, about 10%) of the vectors is randomly selected from the vector base as training samples, and the training samples are clustered to obtain M cluster partitions and the partition center vector of each cluster partition. The clustering algorithm may be a k-means clustering algorithm, a fuzzy c-means clustering algorithm, a mean shift clustering method, a density-based clustering method, or the like, which is not limited in the embodiments of the present application. Then, with the M partition center vectors as centers, the vectors in the vector base other than the training samples are assigned to the M cluster partitions. Because only the training samples take part in determining the partition center vectors, the amount of computation for determining them is reduced and they are determined faster.
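The second implementation can be sketched as follows. This is only an illustration under stated assumptions: NumPy is used, k-means is picked as the clustering algorithm, squared Euclidean distance stands in for the similarity measure, and the function names `kmeans` and `build_partitions` as well as the parameters `sample_ratio`, `iters`, and `seed` are hypothetical. For brevity, every vector (training samples included) is assigned to the nearest final center.

```python
import numpy as np

def kmeans(samples, m, iters=20, seed=0):
    """Plain k-means on the training samples; returns m center vectors."""
    samples = np.asarray(samples, dtype=float)
    rng = np.random.default_rng(seed)
    # Fancy indexing copies, so updating centers does not modify the samples.
    centers = samples[rng.choice(len(samples), size=m, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every sample to every center.
        dist = ((samples[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(m):
            members = samples[labels == j]
            if len(members):             # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def build_partitions(base, m, sample_ratio=0.1, seed=0):
    """Cluster a random sample, then assign each vector to its nearest center."""
    base = np.asarray(base, dtype=float)
    rng = np.random.default_rng(seed)
    n_train = max(m, int(len(base) * sample_ratio))
    train_idx = rng.choice(len(base), size=n_train, replace=False)
    centers = kmeans(base[train_idx], m, seed=seed)
    dist = ((base[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, dist.argmin(axis=1)  # partition center vectors, partition id per vector
```

With a base of 200 vectors and `sample_ratio=0.1`, only 20 vectors take part in the iterative center computation; the remaining work is a single nearest-center assignment pass.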
3. Vector retrieval phase
The feature extraction stage and the clustering stage yield a plurality of cluster partitions, each of which has a corresponding partition center vector. Subsequently, when a user has a query request, the data to be queried can be input to the computing device 20 through a client. The computing device 20 performs feature extraction on the acquired data to be queried to obtain a vector to be queried, and then calculates the similarity between the vector to be queried and the partition center vector of each of the M cluster partitions to obtain M first similarities. The M first similarities are then ranked from high to low, the top K first similarities are selected, and the cluster partitions corresponding to these K first similarities are determined as K search partitions.
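The selection of the K search partitions described above can be sketched as follows. Cosine similarity is an assumption here (the text does not fix the similarity measure), as are the hypothetical names `cosine_similarity` and `select_search_partitions`; all vectors are assumed to be non-zero.

```python
import numpy as np

def cosine_similarity(query, vectors):
    """Cosine similarity between one query vector and each row of `vectors`."""
    query = np.asarray(query, dtype=float)
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(query)
    return vectors @ query / norms

def select_search_partitions(query, centers, k):
    """Return the ids of the K partitions with the highest first similarity."""
    first_sims = cosine_similarity(query, centers)   # M first similarities
    top_k = np.argsort(-first_sims)[:k]              # ranked from high to low
    return top_k, first_sims[top_k]

# Four hypothetical partition center vectors (M = 4) and K = 2.
centers = [[1, 0], [0, 1], [1, 1], [-1, 0]]
ids, sims = select_search_partitions([1, 0.1], centers, k=2)
```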
A search partition that has not yet been selected is chosen from the K search partitions as the target search partition, and the second similarities between the vector to be queried and each vector contained in the target search partition are calculated. A probability value that the target search partition contains a target vector is then determined from the second similarities, where a target vector is a vector whose similarity to the vector to be queried is within a preset range, for example, a vector whose similarity to the vector to be queried is greater than 0.9. If the probability value is greater than a first preset threshold, no further target search partition is selected from the unselected search partitions, and retrieval for the vector to be queried can stop. If the probability value is not greater than the first preset threshold, the next target search partition is selected from the unselected search partitions, the second similarities between the vector to be queried and each vector contained in the newly selected target search partition are calculated, the probability value that the newly selected target search partition contains the target vector is determined from these second similarities, and the probability value is again compared with the first preset threshold. These steps are repeated until the probability value corresponding to the selected target search partition is greater than the first preset threshold, at which point no further target search partition is selected.
Therefore, in the vector retrieval method provided by the embodiments of the present application, whether retrieval can be terminated early for the current vector to be queried is inferred during retrieval, which improves the speed of vector retrieval. For example, the second similarities between the vector to be queried and each vector in the first search partition are calculated first, and the probability value that the first search partition contains the target vector is determined from these second similarities; if the probability value is high enough, the second similarities between the vector to be queried and the vectors of the other search partitions are not calculated. This reduces the amount of computation and improves the speed of vector retrieval.
The vector retrieval method provided in the embodiments of the present application will be described in detail below through specific steps, as shown in fig. 6, where the method may be performed by the computing device in fig. 2 or a chip in the computing device, and includes the following steps:
Step 601, obtaining a vector to be queried. The vector to be queried may be a vector input to the computing device by a user through a query client, or may be any vector obtained by the computing device from the vector base. The embodiments of the present application are not limited in this regard.
Step 602, calculating the similarity between the vector to be queried and the partition center vector of each of M cluster partitions to obtain M first similarities, where the M cluster partitions are obtained by clustering the vectors in the vector base according to the similarity between the vectors, the partition center vector of any cluster partition is determined from the plurality of vectors contained in that cluster partition, and M is an integer greater than 1; ranking the M first similarities from high to low, selecting the top K first similarities, and determining the cluster partitions corresponding to the K first similarities as K search partitions, where K is an integer greater than or equal to 1 and K is less than M.
Step 603, performing the following operations cyclically until it is determined that the probability value that the target search partition selected from the K search partitions contains a target vector is greater than a first preset threshold, where a target vector is a vector whose similarity to the vector to be queried is within a preset range:
Selecting, as the target search partition, a search partition that has not yet been selected from the K search partitions; calculating the second similarities between the vector to be queried and each vector contained in the target search partition; and determining, according to the second similarities, the probability value that the target search partition contains the target vector.
The order of selecting the target search partition from the K search partitions is not limited and may be arbitrary. For example, assuming that the K search partitions are search partition A, search partition B, and search partition C, any one of them may be selected as the target search partition, for example search partition A. Suppose search partition A contains 100 vectors; 100 second similarities between the vector to be queried and those 100 vectors are then calculated. A vector whose second similarity is greater than 0.9 may be taken as a target vector. If, for example, 20 target vectors are determined, the probability value that search partition A contains the target vector may be determined as 20/100 = 0.2.
If the determined probability value is greater than the first preset threshold, there is no need to select the next target search partition from the remaining search partitions or to calculate the second similarities between the vector to be queried and the vectors in that partition, which saves retrieval computation. For example, when the first preset threshold is 0.18, the probability value of 0.2 determined for search partition A is greater than 0.18, so the second similarities between the vector to be queried and the vectors in search partition B and search partition C need not be calculated, which saves a considerable amount of computation.
If the determined probability value is not greater than the first preset threshold, the currently selected target search partition contains too few target vectors, and retrieving the vector to be queried based on this partition alone is likely to give low accuracy. A target search partition therefore continues to be selected from the remaining unselected search partitions. For example, search partition B is selected next and the steps performed for search partition A are repeated; this continues until the obtained probability value is greater than the first preset threshold, at which point the selection of search partitions stops.
By the method, similarity calculation is not required between the vector to be queried and all vectors in the K search partitions, but only a part of the search partitions are selected, and similarity calculation is performed between the vector to be queried and the vectors in the part of the search partitions. Thus, the search calculation amount can be reduced and the search speed can be improved.
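The loop of step 603 can be sketched as follows, using the worked example above: the probability value is taken as the fraction of vectors in the target search partition whose second similarity exceeds 0.9, and the first preset threshold is 0.18. Cosine similarity, the function names, and the parameter names `sim_range` and `prob_threshold` are assumptions for illustration. The sketch also assumes non-empty partitions of non-zero vectors, and simply stops when all K partitions have been visited, a case the text does not spell out.

```python
import numpy as np

def cosine_similarity(query, vectors):
    """Cosine similarity between one query vector and each row of `vectors`."""
    query = np.asarray(query, dtype=float)
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(query)
    return vectors @ query / norms

def search_with_early_stop(query, partitions, sim_range=0.9, prob_threshold=0.18):
    """Visit the K search partitions in the given order and stop early.

    Returns a list of (partition_index, second_similarities) for every
    partition that was actually visited.
    """
    visited = []
    for idx, vectors in enumerate(partitions):
        second_sims = cosine_similarity(query, vectors)
        visited.append((idx, second_sims))
        # Probability value: share of vectors counted as target vectors.
        prob = float(np.mean(second_sims > sim_range))
        if prob > prob_threshold:
            break                        # no further partition is selected
    return visited

# Hypothetical data: partition 0 has no target vectors, partition 1 has many,
# so partition 2 is never visited.
partitions = [
    [[0, 1]] * 5,
    [[1, 0]] * 3 + [[0, 1]],
    [[1, 0]] * 4,
]
visited = search_with_early_stop([1, 0], partitions)
```

In this run the second similarities for the third partition are never computed, which is the computation saving described in the text.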
Step 604, when the cyclic execution in step 603 stops, outputting a query result based on the at least one search partition that has been selected and the vector to be queried. The query result may be output in the following possible manners:
In one possible manner, the search partition whose probability value is greater than the first preset threshold is determined among the at least one search partition that has been selected. Because the loop in step 603 terminates once the probability value exceeds the first preset threshold, there is exactly one such search partition, namely the last selected target search partition (search partition A in the example above). The query result of the vector to be queried is then retrieved from that search partition. For example, every vector in that search partition is output or fed back to the user as the query result. For another example, the second similarities between the vector to be queried and the vectors in that search partition are ranked from high to low, and the W vectors corresponding to the top W second similarities are output or fed back to the user as the query result.
In the above technical solution, because only the one search partition whose probability value is greater than the first preset threshold is used, outputting all vectors contained in that search partition as the query result further reduces the retrieval computation and improves the retrieval speed. Alternatively, the second similarities between the vector to be queried and all vectors contained in that search partition are ranked from high to low, and only the vectors corresponding to the top W second similarities are output as the query result, which makes the query result more concise.
In another possible manner, the query result of the vector to be queried is retrieved from the vectors contained in all of the at least one search partition that has been selected. In step 603, if the probability value of the first selected target search partition is not greater than the first preset threshold, a second target search partition is selected; if the probability value corresponding to the second target search partition is greater than the first preset threshold, no further target search partition is selected. The number of search partitions in "the at least one search partition that has been selected" may therefore be greater than 1. For example, in the example above, search partition A and search partition B may both have been selected before a search partition satisfying "probability value greater than the first preset threshold" is found, and the query result of the vector to be queried is then retrieved from the vectors contained in search partition A and search partition B. For example, every vector contained in the at least one selected search partition may be output or fed back to the user as the query result. For another example, the second similarities between the vector to be queried and each vector in the at least one selected search partition are ranked from high to low, and the W vectors corresponding to the top W second similarities are output or fed back to the user as the query result.
For example, in step 603, search partition A is selected as the first target search partition, the second similarities between the vector to be queried and each vector in search partition A are calculated, and a probability value is determined from them; this probability value is not greater than the first preset threshold, so search partition B is selected as the next target search partition. The second similarities between the vector to be queried and each vector in search partition B are calculated and a probability value is determined from them; this probability value is greater than the first preset threshold, so no further target search partition is selected. In step 604, "the at least one search partition that has been selected" then includes search partition A and search partition B. Because the second similarities between the vector to be queried and each vector in search partition A and in search partition B have already been calculated in step 603, they need not be recalculated in step 604, so the amount of computation does not increase. Instead, these second similarities are directly ranked together from high to low, and the W vectors corresponding to the top W second similarities are used as the query result.
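The merging in step 604 can be sketched as follows: the second similarities already computed for the selected search partitions are pooled and ranked, and the top W vectors are returned without any new similarity calculation. The function name `top_w_results`, the vector ids, and the similarity values are hypothetical.

```python
import numpy as np

def top_w_results(selected, w):
    """Pool precomputed second similarities and return the top-W (id, sim) pairs.

    `selected` holds one (vector_ids, second_similarities) pair per search
    partition that was selected in step 603.
    """
    ids = np.concatenate([np.asarray(v) for v, _ in selected])
    sims = np.concatenate([np.asarray(s, dtype=float) for _, s in selected])
    order = np.argsort(-sims)[:w]        # rank from high to low, keep the top W
    return [(int(ids[i]), float(sims[i])) for i in order]

# Hypothetical second similarities for search partition A and search partition B.
partition_a = ([10, 11, 12], [0.50, 0.95, 0.70])
partition_b = ([20, 21], [0.99, 0.60])
result = top_w_results([partition_a, partition_b], w=3)
```

Here vector 11 from search partition A appears in the result even though search partition A did not pass the first preset threshold, which is the accuracy benefit of this output manner.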
In the above technical solution, the query result is output based on all of the selected search partitions rather than only on the search partition whose probability value is greater than the first preset threshold. Because a selected search partition whose probability value is not greater than the first preset threshold may still contain vectors that are the same as or highly similar to the vector to be queried, a more accurate retrieval result can be output based on the already calculated second similarities between the vector to be queried and each vector in the selected search partitions, which improves the precision of vector retrieval.
In one possible implementation, the target search partition may be selected from the K search partitions according to a rule instead of arbitrarily. Two methods of ordering the K search partitions for selection are described below.
In the first method, the K search partitions are ordered according to the K first similarities from high to low, so that the search partitions not yet selected can be chosen one by one, in order, from the ordered K search partitions as the target search partition.
For example, if the first similarity between the vector to be queried and the partition center vector of search partition A is 0.9, that of search partition B is 0.8, and that of search partition C is 0.7, the K search partitions are ordered as follows: search partition A - search partition B - search partition C. The target search partitions are then selected in this order.
Ordering the K search partitions corresponding to each vector to be queried in a reasonable way and selecting the target search partition in that order allows the search partition whose probability value is greater than the first preset threshold to be found as early as possible, which improves the vector retrieval speed and reduces retrieval time. In the example above, the probability value of search partition A is calculated first and is likely to be greater than the first preset threshold, so retrieval can be terminated early. If the probability value of search partition B were calculated first, it would likely not exceed the first preset threshold, in which case the similarities between the vector to be queried and each vector in search partition A would still have to be calculated, increasing both the computation and the retrieval time.
In this way, selecting the target search partitions in the order of the K first similarities allows the target search partition whose probability value is greater than the first preset threshold to be determined as early as possible and reduces, as far as possible, the need to select a further target search partition. The similarities between the vector to be queried and the vectors in a further target search partition then need not be calculated, which reduces the computation and improves the retrieval speed.
In the second method, the vectors in each search partition are clustered according to the similarity between the vectors to obtain a plurality of search sub-partitions, each of which has a corresponding sub-partition center vector. Third similarities between the vector to be queried and the sub-partition center vectors of the search sub-partitions are calculated, and the K search partitions are ordered according to the third similarities between the vector to be queried and the sub-partition center vectors in each search partition. The search partitions not yet selected can then be chosen one by one, in order, from the ordered K search partitions as the target search partition.
FIG. 7 is a schematic diagram of dividing a search partition into search sub-partitions according to an embodiment of the present application. The figure shows 3 search partitions; for any search partition, the vectors in that search partition are clustered, for example, so that each search partition is divided into 5 search sub-partitions. The number of search sub-partitions may of course differ between search partitions. In FIG. 7, search partitions are separated by solid lines and search sub-partitions by broken lines. The five-pointed stars represent the partition center vectors of the search partitions, and the triangles represent the sub-partition center vectors of the search sub-partitions. The embodiments of the present application do not limit the method for clustering the vectors in a search partition; reference may be made to the method for clustering the vectors in the vector base to obtain the cluster partitions.
When the search partitions are ordered, the K search partitions can be ordered according to the maximum of the third similarities between the vector to be queried and the sub-partition center vectors in each search partition. For example, FIG. 8 illustrates the first similarity between the vector to be queried and the partition center vector of each search partition, and the third similarity between the vector to be queried and the sub-partition center vector of each search sub-partition. FIG. 8 shows search partition A, search partition B, and search partition C, each of which is divided into 5 search sub-partitions. The 5 third similarities between the vector to be queried and the sub-partition center vectors of the 5 search sub-partitions in search partition A are calculated, and the maximum is selected from them; the same is done for search partition B and for search partition C. The 3 maximum similarities are then ranked from high to low, which gives the order of the 3 search partitions.
Alternatively, the K search partitions may be ordered according to the number of third similarities that exceed a second preset threshold among the third similarities between the vector to be queried and the sub-partition center vectors in each search partition. For example, in FIG. 8, the 5 third similarities between the vector to be queried and the sub-partition center vectors of the 5 search sub-partitions in search partition A are calculated, and the number x1 of third similarities exceeding the second preset threshold is determined; in the same way, the number x2 is determined for search partition B and the number x3 for search partition C. Ranking x1, x2, and x3 from high to low then gives the order of the 3 search partitions.
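Both ordering rules based on the third similarities can be sketched with one small function. The name `order_partitions`, the `rule` values, the value 0.8 for the second preset threshold, and the similarity numbers are all hypothetical and chosen only so that the two rules give different orders.

```python
import numpy as np

def order_partitions(third_sims, rule="max", second_threshold=0.8):
    """Order search partitions by their third similarities, best first.

    `third_sims` holds one sequence of third similarities per search partition
    (one value per search sub-partition).
    """
    if rule == "max":
        # Score = highest third similarity within the partition.
        scores = [float(np.max(s)) for s in third_sims]
    elif rule == "count":
        # Score = number of third similarities above the second preset threshold.
        scores = [int(np.sum(np.asarray(s) > second_threshold)) for s in third_sims]
    else:
        raise ValueError(f"unsupported rule: {rule}")
    # Stable sort: partitions with equal scores keep their original order.
    return sorted(range(len(third_sims)), key=lambda i: scores[i], reverse=True)

# Hypothetical third similarities for search partitions A, B, and C (indices 0, 1, 2).
third_sims = [
    [0.85, 0.82, 0.81, 0.30, 0.20],   # A
    [0.95, 0.40, 0.30, 0.20, 0.10],   # B
    [0.60, 0.50, 0.40, 0.30, 0.20],   # C
]
by_max = order_partitions(third_sims, rule="max")
by_count = order_partitions(third_sims, rule="count")
```

With these numbers the maximum rule puts B first because it has the single closest sub-partition, while the count rule puts A first because three of its sub-partitions exceed the threshold.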
Because a sub-partition center vector represents the vectors in its search sub-partition more accurately, ordering the K search partitions based on the third similarities between the vector to be queried and the sub-partition center vectors in each search partition improves the accuracy of the ordering. The target search partition whose probability value is greater than the first preset threshold can therefore be determined as early as possible, and the need to select a further target search partition is reduced as far as possible. The similarities between the vector to be queried and the vectors in a further target search partition then need not be calculated, which reduces the computation and improves the retrieval speed.
In Fig. 8, the five-pointed stars represent the partition center vectors of the search partitions, the triangles represent the sub-partition center vectors of the search sub-partitions, and the square represents the vector to be queried. Fig. 8 shows how different ranking methods affect the order of the K search partitions. The 3 first similarities are represented by the distances from the square to the 3 five-pointed stars; the shorter the distance, the higher the similarity. When the 3 search partitions are ranked by the 3 first similarities from high to low, the order is: search partition A, search partition B, search partition C.
When the 3 search partitions are ranked by the maximum of the third similarities between the vector to be queried and the sub-partition center vectors in each search partition, the order is: search partition B, search partition A, search partition C. In Fig. 8, the maximum third similarity for each search partition is represented by the distance from the square to the nearest triangle of that search partition (3 triangles in total); the shorter the distance, the higher the similarity.
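The difference between the two orderings can be reproduced with a small numeric sketch. The coordinates below are invented for illustration and are not taken from Fig. 8; similarity is assumed to be the negative Euclidean distance, so that a shorter distance means a higher similarity.

```python
import math


def similarity(a, b):
    """Negative Euclidean distance: closer points score higher (assumed metric)."""
    return -math.dist(a, b)


# Hypothetical 2-D layout: the query (square), partition centers (stars)
# and sub-partition centers (triangles).
query = (0.0, 0.0)
partition_centers = {"A": (1.0, 0.0), "B": (1.5, 0.0), "C": (3.0, 0.0)}
sub_centers = {
    "A": [(1.0, 2.0), (1.0, -2.0)],
    "B": [(0.5, 0.0), (2.5, 0.0)],
    "C": [(3.0, 1.0), (3.0, -1.0)],
}

# Ranking by the first similarity (query vs. partition center).
by_first = sorted(
    partition_centers,
    key=lambda p: similarity(query, partition_centers[p]),
    reverse=True,
)

# Ranking by the maximum third similarity (query vs. sub-partition centers).
by_max_third = sorted(
    sub_centers,
    key=lambda p: max(similarity(query, c) for c in sub_centers[p]),
    reverse=True,
)
```

In this layout the center of partition A is closest to the query, but one sub-partition of B lies even closer, so the sub-partition ranking moves B ahead of A.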
It can be seen that dividing the search partitions into finer search sub-partitions allows the ranking of the search partitions to be refined and corrected. In a specific implementation, the search sub-partitions may be divided further, for example each search sub-partition may be divided into a plurality of smaller partitions, which may further improve retrieval accuracy and speed. Details are not repeated here.
In one possible implementation, the probability value that the target search partition contains the target vector may also be predicted from the second similarities by a prediction model. The prediction model may be a single-stage model or a dual-stage model.
One possible way to train the prediction model is to use a large amount of labeled sample data. For example, for any sample data, features of the sample data are extracted to obtain a sample vector; M first similarities between the sample vector and the partition center vectors of the M clustering partitions are calculated, and K search partitions are determined according to the magnitudes of the M first similarities; one target search partition is selected from the K search partitions, the second similarity between the sample vector and each vector in the target search partition is calculated, and t target second similarities are selected from the second similarities; and the sample vector, the K first similarities between the sample vector and the K search partitions, the t target second similarities and a label are input into the prediction model, where the label is the probability value that the target search partition contains the target vector. Through multiple rounds of training, the parameters of the prediction model can be optimized and adjusted.
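Assembling one labeled training sample as described above might look like the sketch below. Several choices here are assumptions and are not stated in the text: the inner product is used as similarity, the target search partition is taken to be the one with the highest first similarity, and the label is computed as the fraction of known true neighbors that fall inside the target search partition. The function name `build_training_sample` is hypothetical.

```python
import heapq


def dot(a, b):
    """Inner-product similarity (assumed metric)."""
    return sum(x * y for x, y in zip(a, b))


def build_training_sample(sample, centers, partitions, k, t, true_neighbors):
    """Return (features, label) for one sample vector."""
    # M first similarities, one per clustering partition.
    first_sims = [dot(sample, c) for c in centers]
    # Indices of the K partitions with the highest first similarity.
    search_ids = heapq.nlargest(k, range(len(centers)), key=lambda j: first_sims[j])
    target_id = search_ids[0]  # assumed choice of target search partition
    # Second similarities against every vector of the target search partition.
    second_sims = [dot(sample, v) for v in partitions[target_id]]
    target_sims = heapq.nlargest(t, second_sims)
    # Assumed label: share of the true neighbors contained in the target partition.
    found = sum(1 for v in true_neighbors if v in partitions[target_id])
    label = found / len(true_neighbors)
    features = list(sample) + [first_sims[j] for j in search_ids] + target_sims
    return features, label


# Hypothetical toy data (vectors are tuples so they can be compared directly).
sample = (1.0, 0.0)
centers = [(0.9, 0.1), (0.1, 0.9), (0.5, 0.5)]
partitions = [[(1.0, 0.1), (0.8, 0.0)], [(0.0, 1.0)], [(0.6, 0.4)]]
true_neighbors = [(1.0, 0.1), (0.6, 0.4)]
features, label = build_training_sample(
    sample, centers, partitions, k=2, t=1, true_neighbors=true_neighbors
)
```

The feature layout (sample vector, then K first similarities, then t target second similarities) follows the order in which the text lists the model inputs.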
Another possible way to train the prediction model may be to train the prediction model with a large amount of sample data, and parameters of the prediction model are adjusted according to an objective function. The embodiment of the application does not limit the form of the objective function. Through multiple times of training, parameters of the prediction model are optimized and adjusted.
If a single-stage model is adopted, the method for obtaining a probability value according to each second similarity in step 603 may be further refined, and fig. 9 exemplarily shows a method for obtaining a probability value according to each second similarity, which may specifically include the following steps:
Step 901: select any unselected search partition from the K search partitions as the target search partition, and calculate the second similarity between the vector to be queried and each vector in the target search partition. The second similarities are sorted from high to low, and the first t are taken as the t target second similarities.
The method for determining the target search partition for the vector to be queried is the same as that described above, and will not be described in detail here.
The method of determining the t target second similarities for the vector to be queried is described in detail below.
For example, every second similarity between the vector to be queried and the vectors in the target search partition may be used as a target second similarity; or the second similarities that satisfy a certain threshold may be used as target second similarities; or the second similarities may be sorted from high to low and the first t used as target second similarities; or the largest second similarity alone may be used as the target second similarity. These are merely examples, and the embodiments of the present application do not limit the manner in which the target second similarities are determined. The fewer target second similarities there are, the less computation the prediction model has to perform and the faster the retrieval. For example, using only the largest of the second similarities between the vector to be queried and all vectors in the target search partition as the target second similarity can reduce the computation of the prediction model without affecting the accuracy of the final probability value, which further increases the vector retrieval speed.
For example, for the vector q1 to be queried, 100 second similarities of q1 to 100 vectors in the search partition a are calculated, and the second similarity with the largest value among the 100 second similarities is taken as the target second similarity.
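Selecting the target second similarities by rank can be sketched with the standard-library `heapq.nlargest`. The helper name `select_target_similarities` and the sample values are hypothetical; with t = 1 the function reduces to taking the single largest second similarity, as in the example above.

```python
import heapq
from typing import List, Sequence


def select_target_similarities(second_sims: Sequence[float], t: int) -> List[float]:
    """Return the t largest second similarities, ordered from high to low."""
    return heapq.nlargest(t, second_sims)


# Hypothetical second similarities between one query and four partition vectors.
second_sims = [0.2, 0.9, 0.5, 0.7]
top_two = select_target_similarities(second_sims, t=2)
top_one = select_target_similarities(second_sims, t=1)
```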
Step 902: input the vector to be queried, the K first similarities and the t target second similarities into the prediction model to obtain the probability value.
Predicting the probability value with the prediction model improves the accuracy of the probability value and, compared with non-model methods, determines it more quickly. Inputting only the t highest-ranked second similarities into the prediction model as the target second similarities reduces the computation of the prediction model, which further speeds up the prediction of the probability value without affecting the prediction accuracy.
For example, the vectors in the vector base are clustered into 10 clustering partitions, each of which corresponds to one partition center vector. The first similarities between the vector q1 to be queried and the partition center vectors of the 10 clustering partitions are calculated, and either the K clustering partitions with the highest first similarities or the K clustering partitions whose first similarities satisfy a preset threshold are selected as search partitions. If 3 search partitions are determined, namely search partition A, search partition B and search partition C, the 3 first similarities between q1 and the partition center vectors of these 3 search partitions are determined accordingly. Search partition A is then determined as the target search partition, and the 10 second similarities between q1 and the 10 vectors in search partition A are calculated; 5 target second similarities are selected from the 10 second similarities. The vector q1 to be queried, the 3 first similarities and the 5 target second similarities are input into the single-stage model, which outputs a probability value. The probability value characterizes the probability that search partition A contains the target vectors. It reflects the accuracy of the current retrieval: the higher the probability value, the higher the current retrieval accuracy, and the lower the probability value, the lower the current retrieval accuracy. If the probability value satisfies the first preset threshold, for example the probability value is 0.98 and the first preset threshold is 0.9, this indicates that most of the target vectors are already contained in search partition A, the retrieval accuracy is high, and the search for the vector to be queried can be stopped.
In this way, whether to terminate the search early for the current vector to be queried is decided by inference, which reduces the likelihood that similarities between the vector to be queried and the vectors in the remaining search partitions still have to be calculated, reduces the amount of retrieval computation, and increases the vector retrieval speed.
However, the single-stage model has a problem that if the method in the above embodiment runs in the hardware accelerator, the calculation can only be performed by adopting a matrix-vector mode, and the calculation power of the hardware accelerator cannot be fully exerted. Specifically, in the above embodiment, the input of the single-stage model is the vector to be queried, K first similarities, and t target second similarities. Because the search partitions determined by different vectors to be queried are different, and the corresponding target search partitions are different, each second similarity of each vector to be queried and each vector in the target search partition is calculated respectively, and then is input into a single-stage model respectively after calculation, and then the single-stage model can only use a matrix multiplication vector calculation mode of a hardware accelerator. For example, the search partitions determined by the vector q1 to be queried are a search partition a, a search partition B and a search partition C, and the corresponding target search partition is a search partition a; the search partitions determined by the vector q2 to be queried are a search partition D, a search partition E and a search partition F, and the corresponding target search partition is the search partition D. The search partition A and the search partition D are different in each vector, so that the vector q1 to be queried and each second similarity of each vector in the search partition A can be calculated only in the hardware accelerator in a matrix-vector mode, and then the vector q2 to be queried and each second similarity of each vector in the search partition D are calculated in the hardware accelerator in a matrix-vector mode. 
Therefore, the probability value corresponding to the vector q1 to be queried and the probability value corresponding to the vector q2 to be queried can only be output separately by the single-stage model, and the single-stage model can only calculate each probability value in matrix-vector mode. For example, for q1, the vector q1, the 3 first similarities between q1 and the partition center vectors of its 3 search partitions, and the t target second similarities between q1 and the vectors in search partition A are input into the single-stage model, which outputs the probability value of q1 using the matrix-vector multiplication mode of the hardware accelerator; then the vector q2, the 3 first similarities between q2 and the partition center vectors of its 3 search partitions, and the t target second similarities between q2 and the vectors in search partition D are input into the single-stage model, which outputs the probability value of q2 using the matrix-vector multiplication mode of the hardware accelerator.
It can be seen that the single-stage model can only predict for one vector to be queried at a time. As a result, only the matrix-vector multiplication mode of the hardware accelerator can be used. The matrix-vector multiplication mode of the hardware accelerator is far less efficient than its matrix-matrix multiplication mode, so the computing power of the hardware accelerator is wasted. In addition, because the single-stage model predicts for only one vector to be queried at a time, the prediction time increases further when there are a plurality of vectors to be queried. A dual-stage model can overcome these problems of the single-stage model to a certain extent and further increase the retrieval speed compared with the single-stage model.
If a dual-stage model is used, the number of vectors to be queried in step 601 may be N, where N is a positive integer greater than 1. When N is greater than 1, the advantages of the vector retrieval method provided by the embodiment of the application can be fully exerted, the retrieval speed is improved, and the retrieval time is reduced. The method for acquiring the N vectors to be queried is not limited, for example, in batch query, a plurality of vectors to be queried are acquired in batch at a time, and the plurality of vectors to be queried are used as input; for example, after a single vector to be queried (such as user input in internet application) is acquired, the computing device integrates the sequentially acquired single vector to be queried into a plurality of vectors to be queried as input. The manner of integration may take various forms well known to those skilled in the art, and the examples herein are not limited in this regard.
The M first similarities in step 602 are obtained as follows: according to the N vectors to be queried and the partition center vectors of the M clustering partitions, the M first similarities between each vector to be queried and the M partition center vectors are obtained by matrix multiplication on the hardware accelerator. From the M first similarities, the K highest are selected, and the clustering partitions corresponding to these K first similarities are determined as the K search partitions, where K is an integer greater than or equal to 1 and K is smaller than M.
The M clustering partitions are obtained by clustering the vectors in the vector base according to the similarity among the vectors. To increase the calculation speed, the N vectors to be queried can be formed into one matrix and the M partition center vectors into another, and the two matrices are multiplied using the matrix multiplication mode of the hardware accelerator, so that the M first similarities between each of the N vectors to be queried and the M partition center vectors are obtained quickly.
For example, the vectors to be queried are q1 and q2, and the matrix formed is [ q1, q2]; the M partition center vectors are M1, M2, M3, M4, M5, M6, M7, M8, M9 and M10, respectively, and the matrix formed is [ M1, M2, M3, M4, M5, M6, M7, M8, M9, M10]. Fig. 11a shows a matrix of M first similarities between any vector to be queried and M partition center vectors, which is calculated by matrix multiplication of a hardware accelerator. Where s11 represents the first similarity between q1 and m1, s12 represents the first similarity between q1 and m2, and so on, and will not be described again here.
For any vector to be queried, according to M first similarities of the vector to be queried and M partition center vectors, taking the clustering partitions corresponding to the K first similarities with the first similarities sequenced from high to low as K retrieval partitions of the vector to be queried. For example, 3 search partitions are determined for each vector to be queried. The search partitions determined by the vector q1 to be queried are a search partition A, a search partition B and a search partition C; the search partitions determined by the vector q2 to be queried are a search partition D, a search partition E and a search partition F.
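The batched computation of the first similarities and the per-query selection of K search partitions can be sketched as follows. Plain Python lists stand in for the matrix multiplication that a hardware accelerator would perform, the inner product is assumed as the similarity measure, and the function names and sample vectors are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def similarity_matrix(queries: List[Vector], centers: List[Vector]) -> List[List[float]]:
    """Return the N x M matrix S with S[i][j] = <queries[i], centers[j]>,
    i.e. the product of the query matrix and the transposed center matrix."""
    return [[sum(x * y for x, y in zip(q, c)) for c in centers] for q in queries]


def top_k_partitions(sim_row: Sequence[float], k: int) -> List[int]:
    """Indices of the k partitions with the highest first similarity."""
    return sorted(range(len(sim_row)), key=lambda j: sim_row[j], reverse=True)[:k]


# Hypothetical data: N = 2 queries, M = 4 partition center vectors, K = 2.
queries = [[1.0, 0.0], [0.0, 1.0]]
centers = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9], [0.7, 0.3]]
S = similarity_matrix(queries, centers)
search_partitions = [top_k_partitions(row, 2) for row in S]
```

Each row of `S` holds the M first similarities of one query, so different queries generally end up with different search partitions, as in the q1 and q2 example above.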
The method of obtaining probability values according to the second similarities in step 603 may be further refined, and fig. 10 exemplarily shows a method of obtaining probability values according to the second similarities, and may specifically include the following steps:
step 1001, inputting a matrix formed by N vectors to be queried and a matrix formed by K first similarities corresponding to each vector to be queried in the N vectors to be queried into a first prediction model, and obtaining N initial probability values corresponding to the N vectors to be queried by a matrix multiplication method of a hardware accelerator; the initial probability value is used for representing the probability of the target vector containing the vector to be queried in the K search partitions corresponding to any vector to be queried.
For example, the vectors to be queried are q1 and q2, and the matrix formed is [q1, q2]; the 3 first similarities corresponding to the 3 search partitions of the vector q1 to be queried are s11, s12 and s13, which correspond to search partition A, search partition B and search partition C respectively; the 3 first similarities corresponding to the 3 search partitions of the vector q2 to be queried are s24, s25 and s26, which correspond to search partition D, search partition E and search partition F respectively. Fig. 11b shows the matrix formed by the 3 first similarities corresponding to each of the 2 vectors to be queried. In this matrix, it does not matter which search partitions belong to which vector to be queried, because the first prediction model only needs to calculate an initial probability value from each vector to be queried and its 3 first similarities.
For example, 2 initial probability values p11 and p12 are generated for the vector q1 to be queried and the vector q2 to be queried, respectively. p11 characterizes the probability that search partition A, search partition B and search partition C contain the target vectors of q1, and p12 characterizes the probability that search partition D, search partition E and search partition F contain the target vectors of q2. The above are merely examples.
It can be seen that, because the inputs of the first prediction model are the N vectors to be queried and the K first similarities corresponding to each of them, these features can be input in matrix form, and the N initial probability values corresponding to the N vectors to be queried can be obtained by matrix multiplication on the hardware accelerator. The computing power of the hardware accelerator is thus fully used; compared with the single-stage model, calculation efficiency is further improved, vector retrieval is faster, and retrieval time is reduced.
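The batched first stage can be illustrated with a stand-in model. The text does not specify the structure of the first prediction model, so the sketch below assumes, purely for illustration, a single linear layer followed by a sigmoid; the weights, bias and function name `first_stage` are hypothetical. The point is only that the features of all N queries form one N x (d + K) matrix that is multiplied by the same weights in one pass.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def first_stage(
    queries: List[Sequence[float]],
    first_sims: List[Sequence[float]],
    weights: Sequence[float],
    bias: float,
) -> List[float]:
    """Stand-in first prediction model: one row per query, one shared weight
    vector, so the whole batch is a single matrix product on an accelerator."""
    # Row i = query i followed by its K first similarities.
    features = [list(q) + list(s) for q, s in zip(queries, first_sims)]
    return [
        sigmoid(sum(f * w for f, w in zip(row, weights)) + bias)
        for row in features
    ]


# Hypothetical batch: N = 2 queries of dimension 2, K = 3 first similarities each.
queries = [[1.0, 0.0], [0.0, 1.0]]
first_sims = [[0.9, 0.8, 0.7], [0.9, 0.6, 0.5]]
weights = [0.0, 0.0, 1.0, 0.5, 0.25]  # made-up parameters, not trained
initial_probs = first_stage(queries, first_sims, weights, bias=-1.0)
```

Because the feature rows do not depend on which concrete partitions each query selected, queries with different search partitions can still share one batch.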
Step 1002: for any vector to be queried, select any unselected search partition from the K search partitions as the target search partition, and determine the second similarity between the vector to be queried and each vector in the target search partition. The second similarities are sorted from high to low, and the first t are taken as the t target second similarities.
For example, for the vector q1 to be queried, the target search partition is search partition A; the 100 second similarities between q1 and the 100 vectors in search partition A are calculated, and the largest of the 100 second similarities is taken as the target second similarity. For the vector q2 to be queried, the target search partition is search partition D; the 200 second similarities between q2 and the 200 vectors in search partition D are calculated, and the largest of the 200 second similarities is taken as the target second similarity.
Step 1003, for any vector to be queried, inputting an initial probability value corresponding to the vector to be queried and t target second similarity values corresponding to the vector to be queried into a second prediction model to obtain a final probability value corresponding to the vector to be queried.
Because the target search partitions corresponding to the vectors to be queried differ, the t target second similarities corresponding to different vectors to be queried cannot be obtained simultaneously and are instead calculated separately, as described in step 1002. Therefore, in step 1003, the calculation is performed for each vector to be queried using the matrix-vector multiplication mode of the hardware accelerator.
For example, for the vector q1 to be queried, the corresponding initial probability value p11 and the target second similarity are input into the second prediction model, and the final probability value p21 is obtained using the matrix-vector multiplication mode of the hardware accelerator. p21 reflects the probability that search partition A contains the target vectors of q1.
For the vector q2 to be queried, the corresponding initial probability value p12 and the target second similarity are input into the second prediction model, and the final probability value p22 is obtained using the matrix-vector multiplication mode of the hardware accelerator. p22 reflects the probability that search partition D contains the target vectors of q2.
In the technical scheme, the prediction of the probability value is divided into two stages, wherein the first stage adopts a first prediction model, and the second stage adopts a second prediction model. Specifically, a matrix formed by N vectors to be queried and a matrix formed by K first similarities corresponding to each vector to be queried in the N vectors to be queried are input into a first prediction model, so that the first prediction model can predict an initial probability value in a matrix multiplying mode, calculation force is fully exerted, calculation efficiency is improved, and vector retrieval speed is further improved.
If a dual-stage model is adopted, the step of determining the probability value after step 1003 may be further refined. Fig. 12 exemplarily shows a method of determining the probability value, which may specifically include the following steps:
step 1201, if the final probability value is not greater than the first preset threshold, selecting the next unselected search partition from the K search partitions as the target search partition.
For example, if the first similarities between the vector q1 to be queried and search partition A, search partition B and search partition C are 0.9, 0.8 and 0.7 respectively, the first target search partition of q1 is search partition A and the next target search partition is search partition B.
Step 1202, inputting the final probability value corresponding to the vector to be queried and the target second similarity between the vector to be queried and each vector in the next target retrieval partition into a second prediction model, and obtaining the update probability value corresponding to the vector to be queried by adopting a matrix multiplication vector mode of a hardware accelerator.
The method for determining the target second similarity is the same as the method for determining the target second similarity in the target search partition in the foregoing, and will not be described herein.
For example, calculating a second similarity between the vector q1 to be queried and each vector in the search partition B, and determining a value with the maximum second similarity as a target second similarity; and inputting a final probability value p21 corresponding to the vector q1 to be queried and a target second similarity corresponding to the vector q1 to be queried into a second prediction model, and obtaining an update probability value corresponding to the vector q1 to be queried by adopting a matrix multiplication vector mode of a hardware accelerator. The update probability value is used to characterize the probability of the target vector contained in all the current target search partitions, in this example, the update probability value is used to characterize the probability of the target vector contained in the search partition a and the search partition B.
If the update probability value is not greater than the first preset threshold, the final probability value in step 1202 is updated to the update probability value, and the process returns to step 1201 in which the next unselected search partition is selected from the K search partitions as the target search partition.
If the update probability value is not greater than the first preset threshold value, the probability that all the current target retrieval partitions contain the target vector is low, the retrieval precision is not high, and the retrieval should be continued.
If the update probability value is larger than the first preset threshold value, the probability that all the current target retrieval partitions contain the target vector is higher, the retrieval precision is higher, and the retrieval should be terminated. Or after the polling of K search partitions is finished, the search should be terminated as well, so that the calculation force is saved.
The final probability value is input into the second prediction model to obtain the update probability value; if the update probability value is not greater than the first preset threshold, the final probability value is replaced by the update probability value, and this cycle is repeated to decide whether to terminate the search. This improves the accuracy of the termination decision and can improve the vector retrieval precision.
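The loop over steps 1003, 1201 and 1202 can be sketched for a single query as follows. The second prediction model is not specified in the text, so `toy_second_model` is an invented stand-in that merely increases the probability as more similar vectors are found; the inner product as similarity, the function names, the sample partitions and the threshold 0.9 are likewise assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner-product similarity (assumed metric)."""
    return sum(x * y for x, y in zip(a, b))


def search_with_early_stop(
    query: Vector,
    ordered_partitions: List[List[Vector]],
    initial_prob: float,
    second_model: Callable[[float, float], float],
    threshold: float,
) -> Tuple[List[int], float]:
    """Visit the K search partitions in ranked order and stop as soon as the
    predicted probability exceeds the first preset threshold."""
    prob = initial_prob
    visited: List[int] = []
    for index, vectors in enumerate(ordered_partitions):
        visited.append(index)
        # Target second similarity: here the maximum (t = 1).
        target_sim = max(dot(query, v) for v in vectors)
        # The previous probability is fed back together with the new similarity.
        prob = second_model(prob, target_sim)
        if prob > threshold:
            break  # terminate the search early
    return visited, prob


def toy_second_model(prob: float, target_sim: float) -> float:
    """Invented stand-in for the second prediction model (not from the source)."""
    return 1.0 - (1.0 - prob) * (1.0 - target_sim)


query = [1.0, 0.0]
ordered_partitions = [
    [[0.6, 0.2], [0.3, 0.9]],  # partition A, best similarity 0.6
    [[0.8, 0.1], [0.1, 0.7]],  # partition B, best similarity 0.8
    [[0.2, 0.5]],              # partition C, never visited in this run
]
visited, prob = search_with_early_stop(
    query, ordered_partitions, initial_prob=0.3,
    second_model=toy_second_model, threshold=0.9,
)
```

In this run the probability after partition A stays below the threshold, so partition B is searched as well; the search then stops and partition C is skipped. If no partition pushes the probability over the threshold, the loop ends after all K partitions have been visited.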
For ease of understanding, the vector search method provided in the embodiments of the present application will be described in its entirety by a specific embodiment. Fig. 13 is an overall flowchart of a vector search method according to an embodiment of the present invention, which may include the following steps.
In step 1301, the vectors q1 and q2 to be queried are obtained.
In step 1302, the matrix formed by the vectors q1 and q2 to be queried and the matrix formed by the partition center vectors of the 10 clustering partitions are multiplied using the matrix multiplication mode of the hardware accelerator to obtain the 10 first similarities between each vector to be queried and the 10 partition center vectors.
Step 1303, determining 3 search partitions corresponding to the first 3 values from high to low of the first similarity in the 10 first similarities corresponding to the vector q1 to be queried; among the 10 first similarities corresponding to the vector q2 to be queried, 3 search partitions corresponding to the first 3 values from high to low of the first similarities are determined.
For example, the 3 search partitions corresponding to the vector q1 to be queried are a search partition a, a search partition B, and a search partition C. The 3 search partitions corresponding to the vector q2 to be queried are a search partition D, a search partition E and a search partition F.
In step 1304, a matrix formed by the vectors q1 and q2 to be queried, and a matrix formed by 3 first similarities corresponding to the vectors q1 and q2 to be queried are input to the first prediction model.
In step 1305, in the first prediction model, initial probability values corresponding to the vectors q1 and q2 to be queried are obtained by matrix multiplication of the hardware accelerator.
In step 1306, 3 search partitions are sorted according to the first similarity for the vector q1 to be queried.
For example, the search partition a, the search partition B, and the search partition C have first similarities of 0.9, 0.8, and 0.7, respectively.
In step 1307, the search partition with the largest first similarity is determined as the target search partition of the vector q1 to be queried. For example, the search partition a is determined as the i-th target search partition of the vector q1 to be queried.
Step 1308, a second similarity between the vector q1 to be queried and each vector in the target retrieval partition is calculated, and the maximum value of the second similarity is taken as the target second similarity of the target retrieval partition.
Step 1309, inputting the initial probability value corresponding to the vector q1 to be queried and the target second similarity to the second prediction model, and obtaining the final probability value corresponding to the vector q1 to be queried by adopting a matrix multiplication vector mode of the hardware accelerator.
Step 1310, determining whether the final probability value is greater than a first preset threshold, if so, proceeding to step 1311. If not, go to step 1312.
Step 1311, the search is terminated for the vector q1 to be queried. And returning the vector meeting the similarity requirement with the vector to be queried in all the current target retrieval partitions as a query result. For example, if the final probability value corresponding to the search partition a is 0.98 and is greater than the first preset threshold, the vectors in the search partition a, which correspond to the first W second similarities from large to small, are returned as the query results.
In step 1312, the next unselected search partition is selected from the 3 search partitions as the target search partition. For example, if the final probability value corresponding to the search partition a is 0.58 and is not greater than the first preset threshold, the search partition B is selected as the target search partition.
In step 1313, a second similarity between the vector q1 to be queried and each vector in the next target search partition is calculated, and the maximum value of the second similarity is taken as the target second similarity of the next target search partition.
In step 1314, the final probability value and the target second similarity of the next target search partition are input into the second prediction model, and the update probability value corresponding to the vector q1 to be queried is obtained using the matrix-vector multiplication mode of the hardware accelerator.
Step 1315, determining whether the update probability value is greater than a first preset threshold, if so, proceeding to step 1311. If not, go to step 1316.
Step 1316, the final probability value in step 1314 is updated to an updated probability value, returning to step 1312.
For the vector q2 to be queried, please refer to the processing steps for the vector q1 to be queried in the steps 1306-1316 to determine the query result of the vector q2 to be queried, which is not repeated here.
It should be noted that, the steps in the above method embodiments are described by taking the computing device 20 as an example, and the steps in the above method embodiments may also be performed by the processor 201 in the computing device 20.
Based on the above and the same technical idea, the present embodiment provides a vector retrieval apparatus, which includes an acquisition unit 1401 and a processing unit 1402, as shown in Fig. 14. The vector retrieval apparatus is configured to perform the method embodiments shown in Fig. 5a, 6, 9, 10, 12 or 13 described above.
When the vector retrieving apparatus is used to implement the functions in the method embodiment shown in fig. 13, the obtaining unit 1401 is used to obtain a vector to be queried; the processing unit 1402 is configured to: respectively carrying out similarity calculation on the vector to be queried and partition center vectors of M clustering partitions to obtain M first similarities; the M clustering partitions are obtained by clustering the vectors in the vector base according to the similarity among the vectors; the partition center vector of any cluster partition is determined according to a plurality of vectors contained in any cluster partition, and M is an integer greater than 1; selecting K first similarities with the first similarities from high to low from M first similarities, determining clustering partitions corresponding to the K first similarities as K retrieval partitions, wherein K is an integer greater than or equal to 1 and K is smaller than M; and circularly executing the following operations until the probability value that the target retrieval partition selected from the K retrieval partitions contains the target vector is larger than a first preset threshold value, wherein the target vector is a vector with similarity with the vector to be queried within a preset range: selecting a search partition which is not selected from the K search partitions as a target search partition; calculating second similarity between the vector to be queried and each vector contained in the target retrieval partition; determining a probability value of the target vector contained in the target retrieval partition according to each second similarity; based on the at least one search partition and the vector to be queried, a query result is output.
In one possible implementation, when outputting the query result based on the at least one selected search partition and the vector to be queried, the processing unit 1402 is specifically configured to: output, as the query result, each vector contained in the search partition, among the at least one selected search partition, whose probability value is greater than the first preset threshold; or sort, from high to low, the second similarities between the vector to be queried and each vector contained in the search partition, among the at least one selected search partition, whose probability value is greater than the first preset threshold, and output, as the query result, the vectors respectively corresponding to the top W second similarities, where W is a positive integer.
In one possible implementation, when outputting the query result based on the at least one selected search partition and the vector to be queried, the processing unit 1402 is specifically configured to: output, as the query result, each vector contained in the at least one selected search partition; or sort, from high to low, the second similarities between the vector to be queried and each vector contained in the at least one selected search partition, and output, as the query result, the vectors respectively corresponding to the top W second similarities, where W is a positive integer.
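The top-W output option can be illustrated with a short sketch. The function name `top_w` and the `(second_similarity, vector)` pair layout are assumptions made for this example only.

```python
import heapq

def top_w(scored, w):
    """Return the w vectors with the highest second similarity, best first.

    scored is a list of (second_similarity, vector) pairs collected from
    the selected search partitions (a hypothetical layout).
    """
    best = heapq.nlargest(w, scored, key=lambda pair: pair[0])
    return [vec for _, vec in best]
```

Using a bounded heap avoids fully sorting every scored vector when W is much smaller than the number of candidates.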
In one possible implementation, when selecting an unselected search partition from the K search partitions as the target search partition, the processing unit 1402 is specifically configured to: select the unselected search partitions from the K search partitions as target search partitions in descending order of the K first similarities.
In one possible implementation, when selecting an unselected search partition from the K search partitions as the target search partition, the processing unit 1402 is specifically configured to: for any one of the K search partitions, cluster the vectors in the search partition according to the similarity among the vectors to obtain a plurality of search sub-partitions; determine the sub-partition center vector of any search sub-partition according to the plurality of vectors contained in that search sub-partition; calculate third similarities between the vector to be queried and the sub-partition center vectors of the plurality of search sub-partitions, respectively; sort the K search partitions according to the plurality of third similarities between the vector to be queried and the plurality of sub-partition center vectors in each search partition; and select an unselected search partition from the sorted K search partitions as the target search partition.
In one possible implementation, when sorting the K search partitions according to the plurality of third similarities between the vector to be queried and the plurality of sub-partition center vectors in each search partition, the processing unit 1402 is specifically configured to: sort the K search partitions according to the number of third similarities that exceed a second preset threshold among the third similarities between the vector to be queried and the sub-partition center vectors in each search partition; or sort the K search partitions according to the largest of the third similarities between the vector to be queried and the sub-partition center vectors in each search partition.
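The two sorting criteria can be sketched as below. The function names are hypothetical, and the sketch assumes that partitions with more (or larger) third similarities are probed first, which the passage implies but does not state outright.

```python
def rank_by_count(third_sims, threshold):
    """Order partition indices by how many third similarities exceed threshold.

    third_sims[i] holds the third similarities between the query and the
    sub-partition center vectors of search partition i.
    """
    counts = [sum(1 for s in sims if s > threshold) for sims in third_sims]
    return sorted(range(len(third_sims)), key=lambda i: counts[i], reverse=True)

def rank_by_max(third_sims):
    """Order partition indices by their largest third similarity."""
    return sorted(range(len(third_sims)), key=lambda i: max(third_sims[i]), reverse=True)
```

The two criteria can disagree: a partition with one very close sub-partition center ranks first under `rank_by_max`, while a partition with several moderately close centers ranks first under `rank_by_count`.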
In one possible implementation, when determining, according to the second similarities, the probability value that the target search partition contains the target vector, the processing unit 1402 is specifically configured to: determine, among the second similarities, the t highest second similarities as t target second similarities; and input the vector to be queried, the K first similarities and the t target second similarities into a prediction model to obtain the probability value, where the prediction model is used to predict the probability value that the target search partition contains the target vector.
In one possible implementation, there are N vectors to be queried, N being a positive integer greater than 1; when inputting the vector to be queried, the K first similarities and the t target second similarities into the prediction model to obtain the probability value, the processing unit 1402 is specifically configured to: input a matrix formed by the N vectors to be queried and a matrix formed by the K first similarities corresponding to each of the N vectors to be queried into a first prediction model to obtain N initial probability values respectively corresponding to the N vectors to be queried, where the initial probability value of any vector to be queried represents the probability that the K search partitions corresponding to that vector to be queried contain its target vector; and, for any vector to be queried, input the initial probability value corresponding to the vector to be queried and the t target second similarities corresponding to the vector to be queried into a second prediction model to obtain a final probability value corresponding to the vector to be queried.
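The two-stage prediction can be sketched under strong assumptions. The passage does not specify the form of either prediction model, so the sketch below substitutes a plain logistic model as a stand-in; `first_stage`, `second_stage`, and all weights and biases are hypothetical placeholders, not trained or disclosed values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def first_stage(queries, first_sims, weights, bias):
    """Stand-in first model: one initial probability per query.

    queries is the N x d matrix of query vectors and first_sims the
    N x K matrix of first similarities; each row pair is concatenated
    into one feature vector (an assumed feature layout).
    """
    probs = []
    for q, sims in zip(queries, first_sims):
        features = list(q) + list(sims)
        z = bias + sum(w * x for w, x in zip(weights, features))
        probs.append(sigmoid(z))
    return probs

def second_stage(initial_prob, second_sims, t, weights, bias):
    """Stand-in second model: refine one query's probability.

    Uses the initial probability plus the t highest second similarities.
    If fewer than t similarities exist, the extra weights are ignored.
    """
    top_t = sorted(second_sims, reverse=True)[:t]
    features = [initial_prob] + top_t
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)
```

Splitting the work this way lets the first stage run once over the whole batch of N queries, while the second stage runs per query each time a new target search partition has been scanned.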
Based on the foregoing and the same technical ideas, the embodiments of the present application also provide a computer-readable storage medium, on which a computer program or instructions are stored, which when executed, cause a computer to perform the method in the method embodiments described above.
Based on the above and the same technical ideas, the present application provides a computer program product, which when read and executed by a computer, causes the computer to perform the method in the above method embodiments.
It will be appreciated that the various numerals used in the embodiments of the present application are merely for ease of description and are not intended to limit the scope of the embodiments of the present application. The sequence numbers of the processes do not imply an order of execution; the order in which the processes are executed should be determined by their functions and internal logic.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (18)

1. A vector retrieval method, comprising:
obtaining a vector to be queried;
respectively carrying out similarity calculation on the vector to be queried and partition center vectors of M clustering partitions to obtain M first similarities; the M clustering partitions are obtained by clustering the vectors in the vector base according to the similarity among the vectors; the partition center vector of any clustering partition is determined according to a plurality of vectors contained in any clustering partition, and M is an integer greater than 1;
selecting, from the M first similarities, the K highest first similarities, and determining the clustering partitions respectively corresponding to the K first similarities as K search partitions, wherein K is an integer greater than or equal to 1 and is smaller than M;
and circularly executing the following operations until the probability value that the target search partition selected from the K search partitions contains the target vector is larger than a first preset threshold value, wherein the target vector is a vector whose similarity with the vector to be queried is within a preset range:
selecting a search partition which is not selected from the K search partitions as a target search partition;
calculating a second similarity between the vector to be queried and each vector contained in the target search partition;
determining, according to each second similarity, the probability value that the target search partition contains the target vector;
and outputting a query result based on the at least one selected search partition and the vector to be queried.
2. The method of claim 1, wherein outputting a query result based on the at least one search partition that has been selected and the vector to be queried comprises:
outputting each vector contained in the search partition with the probability value larger than the first preset threshold value in the at least one selected search partition as a query result; or alternatively
sorting, from high to low, the second similarities between the vector to be queried and each vector contained in the search partition with the probability value larger than the first preset threshold value in the at least one selected search partition, and outputting the vectors respectively corresponding to the top W second similarities as the query result, wherein W is a positive integer.
3. The method of claim 1, wherein outputting a query result based on the at least one search partition that has been selected and the vector to be queried comprises:
outputting each vector contained in the at least one selected search partition as a query result; or alternatively
sorting, from high to low, the second similarities between the vector to be queried and each vector contained in the at least one selected search partition, and outputting the vectors respectively corresponding to the top W second similarities as the query result, wherein W is a positive integer.
4. The method of claim 1, wherein selecting an unselected search partition among the K search partitions as a target search partition comprises:
and selecting unselected search partitions from the K search partitions as target search partitions according to the sequence of the K first similarities from high to low.
5. The method of claim 1, wherein selecting an unselected search partition among the K search partitions as a target search partition comprises:
for any one of the K search partitions, clustering the vectors in the search partition according to the similarity among the vectors to obtain a plurality of search sub-partitions; determining a sub-partition center vector of any search sub-partition according to a plurality of vectors contained in that search sub-partition;
calculating third similarities between the vector to be queried and the sub-partition center vectors of the plurality of search sub-partitions respectively;
sorting the K search partitions according to the plurality of third similarities between the vector to be queried and the plurality of sub-partition center vectors in each search partition;
and selecting unselected search partitions from the K search partitions after sequencing as the target search partition.
6. The method of claim 5, wherein sorting the K search partitions according to a plurality of third similarities of the vector to be queried to a plurality of sub-partition center vectors in each search partition, comprises:
sorting the K search partitions according to the number of third similarities exceeding a second preset threshold value among the third similarities between the vector to be queried and the sub-partition center vectors in each search partition; or alternatively
sorting the K search partitions according to the largest of the third similarities between the vector to be queried and the sub-partition center vectors in each search partition.
7. The method of claim 1, wherein determining a probability value for a target vector contained in the target search partition based on each of the second similarities comprises:
determining, among the second similarities, the t highest second similarities as t target second similarities;
inputting the vector to be queried, the K first similarities and the t target second similarities into a prediction model to obtain the probability value; the prediction model is used for predicting the probability value that the target search partition contains the target vector.
8. The method of claim 7, wherein there are N vectors to be queried, N being a positive integer greater than 1;
inputting the vector to be queried, the K first similarities and the t target second similarities into a prediction model to obtain the probability value comprises:
inputting a matrix formed by the N vectors to be queried and a matrix formed by the K first similarities corresponding to each of the N vectors to be queried into a first prediction model to obtain N initial probability values corresponding to the N vectors to be queried; the initial probability value of any vector to be queried is used for representing the probability that the K search partitions corresponding to that vector to be queried contain its target vector;
and, for any vector to be queried, inputting the initial probability value corresponding to the vector to be queried and the t target second similarities corresponding to the vector to be queried into a second prediction model to obtain a final probability value corresponding to the vector to be queried.
9. A vector retrieval apparatus, comprising:
the acquisition unit is used for acquiring the vector to be queried;
a processing unit for:
respectively carrying out similarity calculation on the vector to be queried and partition center vectors of M clustering partitions to obtain M first similarities; the M clustering partitions are obtained by clustering the vectors in the vector base according to the similarity among the vectors; the partition center vector of any clustering partition is determined according to a plurality of vectors contained in any clustering partition, and M is an integer greater than 1;
selecting, from the M first similarities, the K highest first similarities, and determining the clustering partitions respectively corresponding to the K first similarities as K search partitions, wherein K is an integer greater than or equal to 1 and is smaller than M;
and circularly executing the following operations until the probability value that the target search partition selected from the K search partitions contains the target vector is larger than a first preset threshold value, wherein the target vector is a vector whose similarity with the vector to be queried is within a preset range:
selecting a search partition which is not selected from the K search partitions as a target search partition;
calculating a second similarity between the vector to be queried and each vector contained in the target search partition;
determining, according to each second similarity, the probability value that the target search partition contains the target vector;
and outputting a query result based on the at least one selected search partition and the vector to be queried.
10. The apparatus of claim 9, wherein when the processing unit outputs a query result based on the at least one search partition that has been selected and the vector to be queried, it is specifically configured to:
outputting each vector contained in the search partition with the probability value larger than the first preset threshold value in the at least one selected search partition as a query result; or alternatively
sorting, from high to low, the second similarities between the vector to be queried and each vector contained in the search partition with the probability value larger than the first preset threshold value in the at least one selected search partition, and outputting the vectors respectively corresponding to the top W second similarities as the query result, wherein W is a positive integer.
11. The apparatus of claim 9, wherein when the processing unit outputs a query result based on the at least one search partition that has been selected and the vector to be queried, it is specifically configured to:
outputting each vector contained in the at least one selected search partition as a query result; or alternatively
sorting, from high to low, the second similarities between the vector to be queried and each vector contained in the at least one selected search partition, and outputting the vectors respectively corresponding to the top W second similarities as the query result, wherein W is a positive integer.
12. The apparatus of claim 9, wherein when the processing unit selects an unselected search partition among the K search partitions as a target search partition, it is specifically configured to:
and selecting unselected search partitions from the K search partitions as target search partitions according to the sequence of the K first similarities from high to low.
13. The apparatus of claim 9, wherein when the processing unit selects an unselected search partition among the K search partitions as a target search partition, it is specifically configured to:
for any one of the K search partitions, clustering the vectors in the search partition according to the similarity among the vectors to obtain a plurality of search sub-partitions; determining a sub-partition center vector of any search sub-partition according to a plurality of vectors contained in that search sub-partition;
calculating third similarities between the vector to be queried and the sub-partition center vectors of the plurality of search sub-partitions respectively;
sorting the K search partitions according to the plurality of third similarities between the vector to be queried and the plurality of sub-partition center vectors in each search partition;
and selecting unselected search partitions from the K search partitions after sequencing as the target search partition.
14. The apparatus of claim 9, wherein when the processing unit ranks the K search partitions according to a plurality of third similarities between the vector to be queried and a plurality of sub-partition center vectors in each search partition, the processing unit is specifically configured to:
sorting the K search partitions according to the number of third similarities exceeding a second preset threshold value among the third similarities between the vector to be queried and the sub-partition center vectors in each search partition; or alternatively
sorting the K search partitions according to the largest of the third similarities between the vector to be queried and the sub-partition center vectors in each search partition.
15. The apparatus of claim 9, wherein when the processing unit determines, based on each of the second similarities, a probability value for a target vector included in the target search partition, the processing unit is specifically configured to:
determining, among the second similarities, the t highest second similarities as t target second similarities;
inputting the vector to be queried, the K first similarities and the t target second similarities into a prediction model to obtain the probability value; the prediction model is used for predicting the probability value that the target search partition contains the target vector.
16. The apparatus of claim 15, wherein there are N vectors to be queried, N being a positive integer greater than 1;
when the processing unit inputs the vector to be queried, the K first similarities and the t target second similarities into a prediction model to obtain the probability value, the processing unit is specifically configured to:
inputting a matrix formed by the N vectors to be queried and a matrix formed by the K first similarities corresponding to each of the N vectors to be queried into a first prediction model to obtain N initial probability values corresponding to the N vectors to be queried; the initial probability value of any vector to be queried is used for representing the probability that the K search partitions corresponding to that vector to be queried contain its target vector;
and, for any vector to be queried, inputting the initial probability value corresponding to the vector to be queried and the t target second similarities corresponding to the vector to be queried into a second prediction model to obtain a final probability value corresponding to the vector to be queried.
17. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program or instructions which, when executed by a vector retrieval apparatus, implement the method according to any one of claims 1 to 8.
18. A chip comprising at least one processor and an interface; the interface is used for providing program instructions or data for the at least one processor; the at least one processor is configured to execute the program instructions to implement the method of any one of claims 1 to 8.
CN202211193810.4A 2022-09-28 2022-09-28 Vector retrieval method and device Pending CN117828131A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211193810.4A CN117828131A (en) 2022-09-28 2022-09-28 Vector retrieval method and device
PCT/CN2023/121585 WO2024067593A1 (en) 2022-09-28 2023-09-26 Vector retrieval method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211193810.4A CN117828131A (en) 2022-09-28 2022-09-28 Vector retrieval method and device

Publications (1)

Publication Number Publication Date
CN117828131A true CN117828131A (en) 2024-04-05

Family

ID=90476356

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211193810.4A Pending CN117828131A (en) 2022-09-28 2022-09-28 Vector retrieval method and device

Country Status (2)

Country Link
CN (1) CN117828131A (en)
WO (1) WO2024067593A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114385280A (en) * 2020-10-16 2022-04-22 华为技术有限公司 Parameter determination method and electronic equipment
CN113704534A (en) * 2021-04-13 2021-11-26 腾讯科技(深圳)有限公司 Image processing method and device and computer equipment
CN113449132B (en) * 2021-08-26 2022-02-25 阿里云计算有限公司 Vector retrieval method and device
CN114020746A (en) * 2021-11-04 2022-02-08 山东库睿科技有限公司 Data processing method and device

Also Published As

Publication number Publication date
WO2024067593A1 (en) 2024-04-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination