WO2023065696A1 - Nearest neighbor search method and apparatus, terminal, and storage medium - Google Patents


Info

Publication number
WO2023065696A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
dimensional
value
low
module
Prior art date
Application number
PCT/CN2022/099850
Other languages
French (fr)
Chinese (zh)
Inventor
张号逵
胡文泽
王孝宇
Original Assignee
深圳云天励飞技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳云天励飞技术股份有限公司 filed Critical 深圳云天励飞技术股份有限公司
Publication of WO2023065696A1 publication Critical patent/WO2023065696A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24549Run-time optimisation

Definitions

  • the present application belongs to the field of computer technology, and in particular relates to a nearest neighbor search method, device, terminal and storage medium.
  • the main goal of the approximate nearest neighbor search algorithm is to retrieve multiple data feature vectors most similar to a given query object from a database containing a large number of data feature vectors under a certain similarity measure criterion.
  • Approximate nearest neighbor search is the basis of information retrieval and is widely used in various search engines and recommendation systems. How to quickly and accurately implement approximate nearest neighbor search under the condition of limited hardware cost has always been a research hotspot in the field of information retrieval.
  • the approximate nearest neighbor search algorithm based on the index graph is to iteratively approach the query object along the boundary line in the pre-configured relative neighbor graph (RNG).
  • This algorithm only needs to calculate the similarity between the feature vector of the query object and the data feature vectors along the approximate route, which can significantly improve the retrieval speed.
  • This type of method is the most widely used approximate nearest neighbor search algorithm in recent years, and has been applied in many practical scenarios.
  • Embodiments of the present application provide a nearest neighbor search method, device, terminal, and storage medium, which can improve the efficiency of RNG construction while ensuring the accuracy of nearest neighbor search.
  • the first aspect of the embodiment of the present application provides a nearest neighbor search method, including:
  • a nearest neighbor search is performed based on the low-dimensional neighbor graph and the target feature to obtain a reference object with the closest distance to the target object.
  • In some embodiments, the high-dimensional neighbor relationship is the high-dimensional Euclidean distance between the sample high-dimensional features associated with each two sample objects among the plurality of sample objects, and the low-dimensional neighbor relationship is the low-dimensional Euclidean distance between the sample low-dimensional features associated with each two sample objects among the plurality of sample objects.
  • The loss function of the feature compression network is a function obtained based on the error value between the high-dimensional Euclidean distance and the low-dimensional Euclidean distance corresponding to it, and the weight value associated with the high-dimensional Euclidean distance, wherein the weight value is related to the magnitude of its associated high-dimensional Euclidean distance.
  • the feature compression network includes a compression module, a projection module, and a global optimization module
  • the compression module includes a first linear mapping module, a second linear mapping module, and a feature compression module
  • Inputting the reference high-dimensional features of the reference object into the feature compression network to obtain the reference low-dimensional features output by the feature compression network includes: inputting the reference high-dimensional features into the feature compression module, the first linear mapping module and the projection module to obtain the first feature output by the feature compression module, the second feature output by the first linear mapping module, and at least one third feature output by the projection module, wherein the dimensions of the first feature, the second feature and each third feature are the same as the dimension of the reference low-dimensional feature; inputting the first feature, the second feature and the at least one third feature into the global optimization module to obtain a fourth feature and at least one fifth feature output by the global optimization module; and inputting the fourth feature and the at least one fifth feature into the compression module to obtain the reference low-dimensional features output by the compression module.
  • In some embodiments, the global optimization module includes at least one encoder. Inputting the first feature, the second feature and the at least one third feature to the global optimization module to obtain the fourth feature and at least one fifth feature output by the global optimization module includes: composing the first feature and the at least one third feature into a first vector, and inputting the first vector into the first encoder of the at least one encoder to obtain the second vector output by the first encoder based on the multi-attention head mechanism and the linear mapping layer, wherein the second vector includes a sixth feature corresponding to the first feature and at least one seventh feature in one-to-one correspondence with the at least one third feature; adding the sixth feature in the second vector to the second feature to obtain an eighth feature; replacing the sixth feature in the second vector with the eighth feature to obtain a third vector; and inputting the third vector into the second encoder of the at least one encoder, and so on, until the target vector output by the last encoder of the at least one encoder is obtained.
  • In some embodiments, the calculation process of the output value of the multi-attention head mechanism includes: determining the first input value, the second input value and the third input value of the multi-attention head mechanism based on the first vector; respectively performing mapping processing on the first input value, the second input value and the third input value to obtain the first mapping value corresponding to the first input value, the second mapping value corresponding to the second input value, and the third mapping value corresponding to the third input value, wherein the dimension of the first mapping value and the dimension of the second mapping value are both lower than the dimension of the third mapping value; using the first mapping value and the second mapping value to calculate the combination parameter of the third mapping value; and using the third mapping value and the combination parameter to calculate the output value of the multi-attention head mechanism.
  • In some embodiments, performing the nearest neighbor search based on the low-dimensional neighbor graph and the target feature to obtain the reference object closest to the target object includes: replacing each reference low-dimensional feature in the low-dimensional neighbor graph with its corresponding reference high-dimensional feature to obtain the high-dimensional neighbor graph corresponding to the low-dimensional neighbor graph; and performing the nearest neighbor search based on the high-dimensional neighbor graph and the target feature to obtain the reference object closest to the target object.
  • The feature compression unit is used to input the reference high-dimensional features of the reference object into the feature compression network to obtain the reference low-dimensional features output by the feature compression network, wherein the loss function of the feature compression network is a function obtained based on the high-dimensional neighbor relationship of the sample object and the low-dimensional neighbor relationship of the sample object;
  • a neighbor graph construction unit configured to use the reference low-dimensional features to build a low-dimensional neighbor graph
  • a feature acquisition unit configured to acquire the target feature of the target object
  • a nearest neighbor search unit is configured to perform a nearest neighbor search based on the low-dimensional neighbor graph and the target feature to obtain a reference object with the closest distance to the target object.
  • The third aspect of the embodiments of the present application provides a terminal, including a memory, a processor, and a computer program stored in the memory and operable on the processor, wherein the steps of the above method are implemented when the processor executes the computer program.
  • a fourth aspect of the embodiments of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps of the foregoing method are implemented.
  • the fifth aspect of the embodiments of the present application provides a computer program product, which, when the computer program product runs on a terminal, enables the terminal to execute the steps of the method.
  • In the embodiment of the present application, the reference high-dimensional features of the reference object are input into the feature compression network to obtain the reference low-dimensional features output by the network, the low-dimensional neighbor graph is established using the reference low-dimensional features, then the target feature of the target object is acquired, and the nearest neighbor search is performed based on the low-dimensional neighbor graph and the target feature to obtain the reference object closest to the target object.
  • Since the loss function of the feature compression network is a function obtained based on the high-dimensional neighbor relationship of the sample object and the low-dimensional neighbor relationship of the sample object, using the trained feature compression network to reduce the dimensionality of the reference high-dimensional features can avoid the loss of neighbor relationship information between features caused by reducing dimensionality directly through a dimensionality reduction algorithm, thereby improving search accuracy.
  • building a low-dimensional neighbor graph based on the reduced-dimensional reference low-dimensional features can reduce the time consumption of building RNG and improve the efficiency of RNG construction.
  • FIG. 1 is a schematic diagram of the implementation flow of a nearest neighbor search method provided by an embodiment of the present application
  • Fig. 2 is a schematic diagram of the nearest neighbor search provided by the embodiment of the present application.
  • Fig. 3 is a schematic structural diagram of a feature compression network provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a specific implementation flow of step S101 provided by the embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an encoder provided in an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a nearest neighbor search device provided in an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
  • the approximate nearest neighbor search algorithm based on the index graph needs to pre-build a high-precision RNG.
  • The time required to construct an RNG grows rapidly with the dimension of the feature vector data.
  • the problem that it takes too long to construct the RNG seriously limits the scope of application of the approximate nearest neighbor search algorithm based on the index graph.
  • this application proposes a nearest neighbor search method.
  • The feature compression network is obtained by training with a loss function based on the high-dimensional neighbor relationship of the sample object and the low-dimensional neighbor relationship of the sample object. Using the trained feature compression network to reduce the dimensionality of the reference high-dimensional features can avoid the loss of neighbor relationship information between features caused by reducing dimensionality directly through a dimensionality reduction algorithm, thereby improving search accuracy; it can also reduce the time consumption of building the RNG and improve the efficiency of RNG construction.
  • Figure 1 shows a schematic diagram of the implementation process of a nearest neighbor search method provided by the embodiment of the present application. The method can be executed by terminals such as smart devices, set-top boxes, servers, and satellite wireless devices, and can be applied to situations where it is necessary to improve the efficiency of RNG construction while ensuring the accuracy of the nearest neighbor search.
  • the above nearest neighbor search method may include the following steps S101 to S104.
  • Step S101 input the reference high-dimensional features of the reference object into the feature compression network to obtain the reference low-dimensional features output by the feature compression network.
  • the reference object refers to the object used to construct the RNG in the database, and the type of the reference object can be adjusted according to the actual situation, and generally can be an image or the like.
  • The terminal can extract the reference high-dimensional features of the reference object through a feature extraction algorithm, and use the trained feature compression network to compress the reference high-dimensional features of the reference object into the reference low-dimensional features.
  • the network structure of the feature extraction algorithm and the feature compression network can be set according to the actual situation.
  • the loss function of the feature compression network is a function based on the high-dimensional neighbor relationship of the sample object and the low-dimensional neighbor relationship of the sample object.
  • the sample object is the object used to train the feature compression network.
  • The high-dimensional neighbor relationship refers to the neighbor relationship between the sample high-dimensional features associated with every two sample objects among the sample objects, and the low-dimensional neighbor relationship refers to the neighbor relationship between the sample low-dimensional features associated with every two sample objects among the sample objects.
  • the present application can use the loss function related to the neighbor relationship of the sample object to train the feature compression network to be trained until the feature compression network converges to obtain a trained feature compression network.
  • this application does not limit the algorithm used for model training, for example, it can be implemented by using a gradient descent algorithm.
  • the conventional loss function is generally a function established based on the error between the high-dimensional features of the sample and the low-dimensional features of the sample.
  • the loss function is constructed based on the high-dimensional neighbor relationship and the low-dimensional neighbor relationship.
  • Compared with performing dimensionality reduction directly through a conventional dimensionality reduction algorithm, or through a feature compression network trained with a conventional loss function, this allows the sample low-dimensional features to retain more neighbor relationship information. Since the search process of the nearest neighbor search algorithm is realized by using the neighbor relationships between features, the method provided by the present application improves the integrity of the neighbor relationships during compression, thereby improving search accuracy.
  • Step S102 using the reference low-dimensional features to establish a low-dimensional neighbor graph.
  • For example, the low-dimensional neighbor graph can be built using algorithms such as HNSW (Hierarchical Navigable Small World) or NSG (Navigating Spreading-out Graph).
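As a simplified illustration of neighbor-graph construction over reference low-dimensional features, the sketch below builds a brute-force k-nearest-neighbor graph; HNSW and NSG additionally prune edges and add navigation structure. The function name and toy data are illustrative, not from the patent.

```python
import numpy as np

def build_knn_graph(features, k=2):
    """Build a simple k-nearest-neighbor graph over low-dimensional features.

    Returns an adjacency dict: index -> list of the k nearest neighbor
    indices by Euclidean distance (a stand-in for HNSW/NSG construction).
    """
    n = len(features)
    # Pairwise Euclidean distances, shape (n, n).
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude self-edges
    return {i: list(np.argsort(dist[i])[:k]) for i in range(n)}

# Two well-separated clusters of reference low-dimensional features.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
graph = build_knn_graph(feats, k=1)
```

Because the graph is built in the low-dimensional space, the O(n²) distance computation here is cheaper than in the original high-dimensional space, which is the efficiency gain the patent targets.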
  • Step S103 acquiring target features of the target object.
  • the target object refers to the object to be queried, and its type is the same as that of the reference object and the sample object.
  • the above-mentioned target features may refer to target high-dimensional features or target low-dimensional features of the target object.
  • step S104 the nearest neighbor search is performed based on the low-dimensional neighbor graph and the target feature to obtain the reference object with the closest distance to the target object.
  • The terminal can perform feature extraction on the target object to obtain the target high-dimensional features of the target object, then input the target high-dimensional features into the aforementioned feature compression network to obtain the target low-dimensional features output by the network, and use the obtained target low-dimensional features as the target feature of the target object.
  • One or more reference low-dimensional features with the closest Euclidean distance to the target feature in the low-dimensional neighbor graph can be searched, and the reference objects associated with the searched one or more reference low-dimensional features are taken as the reference objects closest to the target object.
  • The terminal can also replace each reference low-dimensional feature of the low-dimensional neighbor graph with its corresponding reference high-dimensional feature to obtain the high-dimensional neighbor graph corresponding to the low-dimensional neighbor graph, and perform the nearest neighbor search based on the high-dimensional neighbor graph and the target feature to obtain the reference object closest to the target object.
  • the terminal may perform feature extraction on the target object to obtain target high-dimensional features of the target object, and use the obtained target high-dimensional features as target features of the target object.
  • Using the neighbor relationships between the reference high-dimensional features recorded in the high-dimensional neighbor graph to perform the nearest neighbor search, one or more reference high-dimensional features with the closest Euclidean distance to the target feature in the high-dimensional neighbor graph can be searched, and the reference objects associated with the searched one or more reference high-dimensional features are taken as the reference objects closest to the target object.
  • Obtaining the high-dimensional neighbor graph corresponding to the low-dimensional neighbor graph and using the high-dimensional neighbor graph to carry out the nearest neighbor search allows the terminal to use the Euclidean distance between the reference high-dimensional features and the target high-dimensional feature when performing the search. Compared with the Euclidean distance between the reference low-dimensional features and the target low-dimensional feature, this can improve the accuracy of the search.
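The idea of shortlisting candidates with low-dimensional distances and then scoring them with high-dimensional distances can be sketched as follows; the data and the `search` function are hypothetical, and a linear scan stands in for walking the neighbor graph.

```python
import numpy as np

# Hypothetical data: 4 reference objects with high-dim (4-D) and low-dim (2-D) features.
high = np.array([[1.0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
low = np.array([[1.0, 0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])

def search(query_high, query_low, n_candidates=2):
    # Step 1: shortlist candidates by low-dimensional distance (stands in for
    # traversing the low-dimensional neighbor graph).
    cand = np.argsort(np.linalg.norm(low - query_low, axis=1))[:n_candidates]
    # Step 2: score candidates by high-dimensional distance, i.e. each
    # low-dimensional feature is replaced by its reference high-dimensional
    # feature before the final distance comparison.
    best = cand[np.argmin(np.linalg.norm(high[cand] - query_high, axis=1))]
    return int(best)
```

The final ranking uses the full-precision high-dimensional distances, which is why this variant can be more accurate than ranking by low-dimensional distances alone.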
  • For example, the target features of the target image can be extracted, one or more reference high-dimensional features with the closest Euclidean distance to the target features can then be searched in the high-dimensional neighbor graph, the reference images associated with the searched one or more reference high-dimensional features are taken as the reference images closest to the target image, and the scene to which the target image belongs can then be determined as the scene of the reference image closest to the target image.
  • As described above, the reference high-dimensional features of the reference object are input into the feature compression network to obtain the reference low-dimensional features output by the network, the low-dimensional neighbor graph is established using the reference low-dimensional features, then the target feature of the target object is acquired, and the nearest neighbor search is performed based on the low-dimensional neighbor graph and the target feature to obtain the reference object closest to the target object.
  • Because the loss function of the feature compression network is a function obtained based on the high-dimensional neighbor relationship of the sample object and the low-dimensional neighbor relationship of the sample object, using the trained feature compression network to reduce the dimensionality of the reference high-dimensional features can avoid the loss of neighbor relationship information between features caused by reducing dimensionality directly through a dimensionality reduction algorithm, thereby improving search accuracy.
  • building a low-dimensional neighbor graph based on the reduced-dimensional reference low-dimensional features can reduce the time consumption of building RNG and improve the efficiency of RNG construction.
  • the terminal may construct the feature compression network shown in FIG. 3 .
  • the feature compression network may include a compression module 31 , a projection module 32 and a global optimization module 33 .
  • the compression module 31 specifically includes a first linear mapping module 311 , a second linear mapping module 312 and a feature compression module 313 .
  • step S101 may specifically include the following steps S401 to S402.
  • Step S401 input the reference high-dimensional features into the feature compression module, the first linear mapping module and the projection module, and obtain the first feature output by the feature compression module, the second feature output by the first linear mapping module, and at least one third feature output by the projection module.
  • After the terminal inputs the reference high-dimensional features into the feature compression module 313, the first feature with dimension d_out can be obtained.
  • The first linear mapping module 311 may contain a linear mapping function parameter. Similarly, after the terminal inputs the reference high-dimensional features into the first linear mapping module 311, the second feature cp(x) with dimension d_out can be obtained.
  • The dimension of the first feature, the second feature and each third feature is the same as that of the reference low-dimensional feature, all of which are d_out.
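The patent does not disclose the internal layers of these modules, so the following sketch uses plain linear maps as stand-ins; the sizes, weight names, and `compress_forward` function are all illustrative assumptions. It only shows the shape contract: the first, second, and third features all have dimension d_out.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_proj = 8, 3, 2  # illustrative sizes, not from the patent

# Hypothetical parameters standing in for each module's mapping.
W_compress = rng.normal(size=(d_in, d_out))       # feature compression module 313
W_linear1 = rng.normal(size=(d_in, d_out))        # first linear mapping module 311
W_proj = rng.normal(size=(n_proj, d_in, d_out))   # projection module 32 (n_proj outputs)

def compress_forward(x):
    first = x @ W_compress            # first feature, dim d_out
    second = x @ W_linear1            # second feature, dim d_out
    thirds = [x @ W for W in W_proj]  # n_proj third features, each dim d_out
    return first, second, thirds

x = rng.normal(size=d_in)             # a reference high-dimensional feature
first, second, thirds = compress_forward(x)
```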
  • Step S402 inputting the first feature, the second feature and at least one third feature to the global optimization module to obtain a fourth feature and at least one fifth feature output by the global optimization module.
  • the global optimization module 33 may include at least one encoder.
  • the above-mentioned terminal may combine the first feature output by the feature compression module 313 and at least one third feature output by the projection module 32 to obtain the first vector.
  • the first vector is input to the first encoder 331 of the at least one encoder to obtain a second vector output by the first encoder 331 based on the multi-attention head mechanism and the linear mapping layer.
  • The second vector includes a sixth feature corresponding to the first feature, and at least one seventh feature in one-to-one correspondence with the at least one third feature.
  • the target vector includes a fourth feature and at least one fifth feature.
  • Fig. 5 shows the structure of a single encoder of the present application.
  • a single encoder can include a multi-head attention module as well as an encoder mapping module.
  • the above-mentioned multi-head attention mechanism module adopts a multi-head attention mechanism.
  • The process for the terminal to calculate the output value of the multi-attention head mechanism may specifically include: determining the first input value Q, the second input value K and the third input value V of the multi-attention head mechanism based on the first vector; respectively performing mapping processing on Q, K and V to obtain the first mapping value corresponding to the first input value Q, the second mapping value corresponding to the second input value K, and the third mapping value corresponding to the third input value V; using the first mapping value and the second mapping value to calculate the combination parameter of the third mapping value; and using the third mapping value and the combination parameter to calculate the output value head_i(Q, K, V) of the multi-attention head mechanism.
  • the dimension of the first mapping value and the dimension of the second mapping value are both lower than the dimension of the third mapping value.
  • The mapping processing on the first input value Q may adopt a linear mapping, where e denotes the expansion coefficient and h_n the preset number of attention heads. The mapping processing on the second input value K can likewise be performed by linear mapping to obtain the second mapping value, and the mapping processing on the third input value V can be performed by linear mapping to obtain the third mapping value. It can be seen that after the linear mapping, the dimension of the first mapping value and the dimension of the second mapping value are lower than the dimension of the third mapping value.
  • Specifically, the terminal can divide the product of the first mapping value and the second mapping value by the square root of the preset feature dimension value, input the obtained quotient into the softmax function, and use the output value of the softmax function as the combination parameter of the above-mentioned third mapping value.
  • Afterwards, the terminal can combine the outputs of the attention heads into a vector through the mapping function.
  • The encoder mapping module may consist of a Linear_BN layer and a Linear_ABN layer, wherein the Linear_BN layer includes a linear mapping layer and a BN layer, and the Linear_ABN layer consists of a linear mapping layer, an activation function and a BN layer in sequence.
  • the encoder mapping module is used to map the output value of the multi-attention head mechanism to the feature output by the encoder.
  • The conventional multi-attention head mechanism generally requires the dimensions of Q, K, and V to be the same. In the embodiment of this application, since Q and K are only used to provide information on how to combine V, the terminal can reduce the dimensions of Q and K and use the reduced Q, the reduced K, and V at its original dimension to calculate the output value of the multi-attention head mechanism. This reduces the amount of softmax computation during training and improves the calculation speed of the encoder, thus improving the training efficiency of the model.
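A minimal single-head sketch of this idea: Q and K are mapped to a lower dimension d_qk than V (d_v), so the softmax over QKᵀ is computed from smaller matrices while V keeps its original dimension. The weight shapes and random parameters are illustrative assumptions, not the patent's exact parameterization.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def reduced_qk_attention(x, d_qk, d_v, seed=0):
    """One attention head with reduced Q/K dimension.

    Q and K only decide how to combine V, so mapping them to d_qk < d_v
    shrinks the QK^T / softmax computation without changing the output dim.
    """
    n, d = x.shape
    rng = np.random.default_rng(seed)
    Wq = rng.normal(size=(d, d_qk))   # first mapping (reduced)
    Wk = rng.normal(size=(d, d_qk))   # second mapping (reduced)
    Wv = rng.normal(size=(d, d_v))    # third mapping (full dimension)
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    # Combination parameters: softmax of QK^T scaled by sqrt(d_qk).
    A = softmax(Q @ K.T / np.sqrt(d_qk))
    return A @ V                      # head output, shape (n, d_v)

x = np.ones((4, 8))                   # a toy first vector of 4 features
out = reduced_qk_attention(x, d_qk=2, d_v=8)
```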
  • Step S403 input the fourth feature and at least one fifth feature into the compression module to obtain reference low-dimensional features output by the compression module.
  • the terminal may add the fourth feature and at least one fifth feature, and input the obtained feature into the second linear mapping module 312 of the compression module 31 to obtain the reference low-dimensional feature
  • the second linear mapping module 312 may include a linear mapping function parameter
  • The eighth feature is obtained by adding the sixth feature output by the encoder to the second feature output by the first linear mapping module, and the eighth feature is then used as part of the input to the next encoder. In this way, during encoder iteration, the input value of each encoder does not deviate too much from the actual feature value, which improves the convergence speed of the feature compression network.
  • After completing the construction of the feature compression network, the terminal can use the loss function obtained based on the high-dimensional neighbor relationship of the sample object and the low-dimensional neighbor relationship of the sample object to train the feature compression network.
  • The above-mentioned high-dimensional neighbor relationship can be the high-dimensional Euclidean distance between the sample high-dimensional features associated with every two sample objects among the multiple sample objects, and the above-mentioned low-dimensional neighbor relationship is the low-dimensional Euclidean distance between the sample low-dimensional features associated with every two sample objects among the multiple sample objects.
  • the loss function of the feature compression network is a function based on the error value between the high-dimensional Euclidean distance and the low-dimensional Euclidean distance corresponding to the high-dimensional Euclidean distance, and the weight value associated with the high-dimensional Euclidean distance .
  • the value of the weight value is related to the magnitude of the associated high-dimensional Euclidean distance.
  • The process of the terminal calculating the loss value loss of the above loss function may specifically include calculating a weighted error between each pair's high-dimensional Euclidean distance and the corresponding low-dimensional Euclidean distance, of the form

    loss = Σ_{i=1}^{m} Σ_{j=1}^{m} ω_ij · ( ‖f(x_i) − f(x_j)‖₂ − ‖x_i − x_j‖₂ )²

    where m represents the total number of sample objects; ‖f(x_i) − f(x_j)‖₂ represents the high-dimensional Euclidean distance; f(x_i) represents the sample high-dimensional feature associated with the i-th sample object; f(x_j) represents the sample high-dimensional feature associated with the j-th sample object; ‖x_i − x_j‖₂ represents the low-dimensional Euclidean distance; x_i represents the sample low-dimensional feature associated with the i-th sample object; x_j represents the sample low-dimensional feature associated with the j-th sample object; and ω_ij represents the weight value associated with the high-dimensional Euclidean distance.
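A hedged sketch of this loss computation: the source specifies a weighted error between each pair's high- and low-dimensional Euclidean distances, and the squared-error form and the `neighbor_loss` name below are assumptions.

```python
import numpy as np

def neighbor_loss(high_feats, low_feats, weights):
    """Weighted error between all pairwise high-dimensional Euclidean
    distances and the corresponding low-dimensional Euclidean distances.

    high_feats: (m, d_high) sample high-dimensional features f(x_i)
    low_feats:  (m, d_low) sample low-dimensional features x_i
    weights:    (m, m) weight values w_ij
    """
    d_high = np.linalg.norm(high_feats[:, None] - high_feats[None, :], axis=-1)
    d_low = np.linalg.norm(low_feats[:, None] - low_feats[None, :], axis=-1)
    return float((weights * (d_high - d_low) ** 2).sum())

# If the low-dimensional features preserve all pairwise distances exactly,
# the loss is zero: both pairs below are at distance 5.
high = np.array([[0.0, 0.0], [3.0, 4.0]])
low = np.array([[0.0], [5.0]])
loss_val = neighbor_loss(high, low, np.ones((2, 2)))
```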
  • the process for the terminal to calculate the above weight value ω_ij may include: obtaining the first hyperparameter α and the second hyperparameter β, and calculating the average value of all the high-dimensional Euclidean distances; then computing the negative of the natural logarithm of the quotient of the high-dimensional Euclidean distance and that average; determining the maximum between this value and the second hyperparameter; and taking the minimum between the first hyperparameter and that maximum as the weight value associated with the high-dimensional Euclidean distance, i.e. ω_ij = min(α, max(β, −ln(d_ij / d_mean))).
  • the first hyperparameter α is greater than the second hyperparameter β.
  • the specific values of α and β can be set according to the actual situation; in practical applications, α can be set to 2 and β can be set to 0.01.
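The clamped-log weight described above can be sketched as follows. This is a minimal sketch: the function name and the use of Python's `math.log` are my own, and α = 2, β = 0.01 are the example values given in the text.

```python
import math

def pair_weight(d_ij, d_mean, alpha=2.0, beta=0.01):
    """Weight for one pair of samples: the negative natural logarithm of
    d_ij / d_mean, clamped into [beta, alpha] via the max/min steps
    described in the text."""
    return min(alpha, max(beta, -math.log(d_ij / d_mean)))
```

With these defaults, pairs much closer than the average distance saturate at α, while pairs farther than the average saturate at β.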
  • when the high-dimensional Euclidean distance d_ij is small, the associated weight value ω_ij will be α, or at least greater than β; when d_ij is large, ω_ij will be β. Consequently, in the loss function, pairs of sample high-dimensional features with a small high-dimensional Euclidean distance receive a higher weight.
  • training the feature compression network with the above weight formula and loss function makes the correspondence between the high-dimensional and low-dimensional neighbor relationships more accurate for features with small high-dimensional Euclidean distances; that is, after two reference high-dimensional features with a smaller high-dimensional Euclidean distance are input into the trained feature compression network, the neighbor relationship information between the two resulting reference low-dimensional features is more complete.
  • the purpose of the nearest neighbor search algorithm is to find the features closest to the target feature; this method therefore preserves the neighbor relationship information of close features more completely, further improving search accuracy.
  • the terminal can also quantize the low-dimensional features output by the feature compression network into low-dimensional integer vectors through scalar quantization, and use the quantized low-dimensional integer vectors as the reference low-dimensional features for constructing the RNG.
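The scalar quantization step can be sketched as follows. The text does not specify the exact scheme, so this uses a common per-dimension uniform mapping to 8-bit integers; the function name and numpy usage are illustrative assumptions.

```python
import numpy as np

def scalar_quantize(feats):
    """Uniform per-dimension scalar quantization of float feature vectors
    into uint8 integer vectors. Returns the quantized vectors plus the
    per-dimension offset and scale needed to approximately invert the map."""
    lo = feats.min(axis=0)
    hi = feats.max(axis=0)
    # Avoid division by zero for constant dimensions.
    scale = np.where(hi > lo, (hi - lo) / 255.0, 1.0)
    q = np.round((feats - lo) / scale).astype(np.uint8)
    return q, lo, scale
```

The integer vectors are then used in place of the float low-dimensional features when building the neighbor graph, reducing memory and distance-computation cost.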
  • FIG. 6 is a schematic structural diagram of a nearest neighbor search apparatus 600 provided in an embodiment of the present application, and the nearest neighbor search apparatus 600 is configured on a terminal.
  • the nearest neighbor search device 600 may include:
  • the feature compression unit 601 is used to input the reference high-dimensional features of the reference object into the feature compression network to obtain the reference low-dimensional features output by the feature compression network.
  • the loss function of the feature compression network is a function obtained based on the high-dimensional neighbor relationship of the sample objects and the low-dimensional neighbor relationship of the sample objects;
  • a neighbor graph construction unit 602 configured to construct a low-dimensional neighbor graph using the reference low-dimensional features;
  • a feature acquisition unit 603 configured to acquire the target feature of the target object;
  • the nearest neighbor search unit 604 is configured to perform a nearest neighbor search based on the low-dimensional neighbor graph and the target feature to obtain the reference object closest to the target object.
  • the above-mentioned high-dimensional neighbor relationship is the high-dimensional Euclidean distance between the sample high-dimensional features associated with every two sample objects among the multiple sample objects, and the above-mentioned low-dimensional neighbor relationship is the low-dimensional Euclidean distance between the sample low-dimensional features associated with every two sample objects among the multiple sample objects.
  • the loss function of the above feature compression network is a function based on the error value between each high-dimensional Euclidean distance and its corresponding low-dimensional Euclidean distance, and on the weight value associated with that high-dimensional Euclidean distance, where the value of the weight is related to the magnitude of the high-dimensional Euclidean distance associated with it.
  • the calculation process of the loss value of the above loss function includes: calculating the error value between each high-dimensional Euclidean distance and its corresponding low-dimensional Euclidean distance; calculating the weight value associated with each high-dimensional Euclidean distance, and using each weight value to weight and accumulate the corresponding error values to obtain an accumulated value; and dividing the accumulated value by the square of the total number of sample objects to obtain the loss value of the loss function.
  • the calculation process of the above weight value includes: obtaining a first hyperparameter and a second hyperparameter, where the first hyperparameter is greater than the second hyperparameter; calculating the average value of the high-dimensional Euclidean distances; computing the negative of the natural logarithm of the quotient of the high-dimensional Euclidean distance and the average; determining the maximum value between that negative value and the second hyperparameter; and using the minimum value between the first hyperparameter and that maximum as the weight value associated with the high-dimensional Euclidean distance.
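Taken together, the loss computation summarized above might look like the following sketch. It assumes the "error value" is the squared difference between corresponding pairwise distances; the function names and the numpy pairwise-distance construction are my own.

```python
import numpy as np

def compression_loss(high_feats, low_feats, alpha=2.0, beta=0.01):
    """Weighted squared errors between pairwise high-dimensional and
    low-dimensional Euclidean distances, accumulated and divided by the
    square of the sample count m, with the clamped-log weights described
    in the text."""
    m = high_feats.shape[0]
    d_hi = np.linalg.norm(high_feats[:, None, :] - high_feats[None, :, :], axis=-1)
    d_lo = np.linalg.norm(low_feats[:, None, :] - low_feats[None, :, :], axis=-1)
    d_mean = d_hi.mean()
    with np.errstate(divide="ignore"):
        # log(0) on the diagonal gives inf, which the clip maps to alpha;
        # the matching error term there is zero, so it does not contribute.
        w = np.clip(-np.log(d_hi / d_mean), beta, alpha)
    err = (d_hi - d_lo) ** 2
    return (w * err).sum() / m**2
```

If the low-dimensional distances perfectly match the high-dimensional ones, the loss is zero; any mismatch is penalized more heavily for close pairs.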
  • the above-mentioned feature compression network includes a compression module, a projection module, and a global optimization module
  • the compression module includes a first linear mapping module, a second linear mapping module, and a feature compression module.
  • the above-mentioned feature compression unit 601 can be specifically used to: input the reference high-dimensional feature into the feature compression module, the first linear mapping module and the projection module, to obtain the first feature output by the feature compression module, the second feature output by the first linear mapping module, and at least one third feature output by the projection module, where the dimensions of the first feature, the second feature and each third feature are the same as the dimension of the reference low-dimensional feature; input the first feature, the second feature and the at least one third feature into the global optimization module to obtain a fourth feature and at least one fifth feature output by the global optimization module; and input the fourth feature and the at least one fifth feature into the compression module to obtain the reference low-dimensional feature output by the compression module.
  • the above-mentioned global optimization module includes at least one encoder.
  • the above-mentioned feature compression unit 601 can be specifically configured to: form the first feature and the at least one third feature into a first vector, and input the first vector into the first encoder of the at least one encoder to obtain a second vector output based on the multi-attention-head mechanism and the linear mapping layer, where the second vector includes a sixth feature corresponding to the first feature and at least one seventh feature corresponding one-to-one to the at least one third feature; add the sixth feature corresponding to the first feature in the second vector to the second feature to obtain an eighth feature; replace the sixth feature in the second vector with the eighth feature to obtain a third vector; input the third vector into the second encoder of the at least one encoder, and so on, until the target vector output by the last encoder of the at least one encoder is obtained, the target vector including the fourth feature and the at least one fifth feature.
  • the above calculation process includes: determining the first input value, the second input value and the third input value of the multi-attention-head mechanism based on the first vector; performing mapping processing on the first input value, the second input value and the third input value respectively, to obtain a first mapping value corresponding to the first input value, a second mapping value corresponding to the second input value, and a third mapping value corresponding to the third input value, where the dimensions of the first mapping value and the second mapping value are lower than the dimension of the third mapping value; using the first mapping value and the second mapping value to calculate combination parameters for the third mapping value; and using the third mapping value and the combination parameters to calculate the output value of the multi-attention-head mechanism.
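A minimal sketch of one head of this attention variant, assuming the three input values play query/key/value roles and the combination parameters are softmax attention weights (both are assumptions; the patent names none of these). The query and key mappings have a lower dimension than the value mapping, which makes the combination parameters cheaper to compute.

```python
import numpy as np

def low_dim_attention_head(x, wq, wk, wv):
    """x: (n, d). wq, wk: (d, d_qk); wv: (d, d_v) with d_qk < d_v assumed.
    Returns an (n, d_v) output: each row is a combination of the value
    mappings, weighted by parameters derived from the query/key mappings."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])      # pairwise combination scores
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # combination parameters
    return attn @ v                               # weighted sum of value mappings
```

In a full multi-head module, several such heads would run in parallel and their outputs would pass through the linear mapping layer mentioned in the text.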
  • the above-mentioned nearest neighbor search unit 604 can also be specifically configured to: replace each reference low-dimensional feature in the low-dimensional neighbor graph with its corresponding reference high-dimensional feature, to obtain the high-dimensional neighbor graph corresponding to the low-dimensional neighbor graph; and then perform the nearest neighbor search based on the high-dimensional neighbor graph and the target feature to obtain the reference object closest to the target object.
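A toy sketch of this replace-then-search step: the neighbor graph's edges come from the low-dimensional features, but the search walks it using distances evaluated on the corresponding high-dimensional features. The adjacency-dict layout, names, and the greedy termination rule are illustrative assumptions, not the patent's prescribed algorithm.

```python
import numpy as np

def search_high_dim(graph, high_feats, query, start=0):
    """Greedy best-first walk: from the current node, move to whichever
    neighbor is closer to the query in high-dimensional space; stop at a
    local optimum. graph: dict node_id -> list of neighbor node ids."""
    cur = start
    cur_d = np.linalg.norm(high_feats[cur] - query)
    while True:
        best, best_d = cur, cur_d
        for nb in graph[cur]:
            d = np.linalg.norm(high_feats[nb] - query)
            if d < best_d:
                best, best_d = nb, d
        if best == cur:          # no neighbor is closer: local optimum
            return cur
        cur, cur_d = best, best_d
```

Because only the stored features change (low-dimensional replaced by high-dimensional), the graph topology built cheaply in low dimension is reused unchanged.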
  • the specific working process of the nearest neighbor search device 600 may refer to the corresponding process of the method described in FIG. 1 to FIG. 5 , which will not be repeated here.
  • FIG. 7 is a schematic diagram of a terminal provided in an embodiment of the present application.
  • the terminal 7 may include: a processor 70, a memory 71, and a computer program 72 stored in the memory 71 and operable on the processor 70, such as a nearest neighbor search program.
  • when the processor 70 executes the computer program 72, it implements the steps in the above embodiments of the nearest neighbor search method, for example, steps S101 to S104 shown in FIG. 1.
  • when the processor 70 executes the computer program 72, it also realizes the functions of each module/unit in the above-mentioned device embodiments, such as the feature compression unit 601, the neighbor graph construction unit 602, the feature acquisition unit 603 and the nearest neighbor search unit 604 shown in FIG. 6.
  • the computer program can be divided into one or more modules/units, and the one or more modules/units are stored in the memory 71 and executed by the processor 70 to complete the present application.
  • the one or more modules/units may be a series of computer program instruction segments capable of accomplishing specific functions, and the instruction segments are used to describe the execution process of the computer program in the terminal.
  • the computer program can be divided into: a feature compression unit, a neighbor graph construction unit, a feature acquisition unit and a nearest neighbor search unit.
  • the functions of each unit are as follows: the feature compression unit is used to input the reference high-dimensional features of the reference object into the feature compression network to obtain the reference low-dimensional features output by the feature compression network, where the loss function of the feature compression network is a function obtained based on the high-dimensional neighbor relationship of the sample objects and the low-dimensional neighbor relationship of the sample objects; the neighbor graph construction unit is used to establish a low-dimensional neighbor graph using the reference low-dimensional features; the feature acquisition unit is used to acquire the target feature of the target object; and the nearest neighbor search unit is configured to perform a nearest neighbor search based on the low-dimensional neighbor graph and the target feature to obtain the reference object closest to the target object.
  • the terminal may include, but is not limited to, a processor 70 and a memory 71.
  • FIG. 7 is only an example of a terminal and does not constitute a limitation on the terminal; the terminal may include more or fewer components than those shown in the figure, combine certain components, or use different components; for example, the terminal may also include input and output devices, network access devices, buses, and so on.
  • the so-called processor 70 can be a central processing unit (Central Processing Unit, CPU), and can also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the memory 71 may be an internal storage unit of the terminal, such as a hard disk or internal memory of the terminal.
  • the memory 71 can also be an external storage device of the terminal, such as a plug-in hard disk equipped on the terminal, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash memory card (Flash Card) etc.
  • the memory 71 may also include both an internal storage unit of the terminal and an external storage device.
  • the memory 71 is used to store the computer program and other programs and data required by the terminal.
  • the memory 71 can also be used to temporarily store data that has been output or will be output.
  • the disclosed device/terminal and method may be implemented in other ways.
  • the device/terminal embodiments described above are only illustrative.
  • the division of the modules or units is only a logical function division.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • if the integrated module/unit is realized in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present application can also be completed by instructing relevant hardware through computer programs.
  • the computer program can be stored in a computer-readable storage medium, and when the computer program is executed by the processor, the steps in the above-mentioned method embodiments can be realized.
  • the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file or some intermediate form.
  • the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, and a read-only memory (Read-Only Memory, ROM) , random access memory (Random Access Memory, RAM), electric carrier signal, telecommunication signal and software distribution medium, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application is applicable to the technical field of computers, and provides a nearest neighbor search method and apparatus, a terminal, and a storage medium. The nearest neighbor search method specifically comprises: inputting a reference high-dimensional feature of a reference object into a feature compression network to obtain a reference low-dimensional feature output by the feature compression network, a loss function of the feature compression network being a function obtained on the basis of a high-dimensional neighbor relationship of a sample object and a low-dimensional neighbor relationship of the sample object; establishing a low-dimensional neighbor graph by using the reference low-dimensional feature; obtaining a target feature of a target object; and performing nearest neighbor search on the basis of the low-dimensional neighbor graph and the target feature to obtain a reference object nearest to the target object. According to the embodiments of the present application, the construction efficiency of a neighbor graph can be improved while the nearest neighbor search precision is guaranteed.

Description

A nearest neighbor search method, device, terminal and storage medium
This application claims priority to the Chinese patent application with application number 202111227715.7, entitled "A nearest neighbor search method, device, terminal and storage medium", filed with the China Patent Office on October 21, 2021, the entire content of which is incorporated herein by reference.
Technical Field
The present application belongs to the field of computer technology, and in particular relates to a nearest neighbor search method, device, terminal and storage medium.
Background
The main goal of an approximate nearest neighbor search algorithm is to retrieve, under a given similarity measure, the data feature vectors most similar to a given query object from a database containing a large number of data feature vectors. Approximate nearest neighbor search is the basis of information retrieval and is widely used in various search engines and recommendation systems. How to implement approximate nearest neighbor search quickly and accurately under limited hardware cost has long been a research hotspot in the field of information retrieval.
Approximate nearest neighbor search algorithms based on index graphs iteratively approach the query object along the edges of a pre-built relative neighbor graph (RNG). Such an algorithm only needs to compute the similarity between the feature vector of the query object and the data feature vectors along the approach route, which significantly improves retrieval speed. This type of method is the most widely used approximate nearest neighbor search algorithm in recent years and has been deployed in many practical scenarios.
However, to guarantee reliability, such methods require a high-precision RNG to be built in advance. When processing a database with hundreds of millions of feature vectors, building the RNG can take days or even weeks with more than thirty threads. The excessive time required to build the RNG severely limits the applicability of index-graph-based approximate nearest neighbor search algorithms.
Summary
Embodiments of the present application provide a nearest neighbor search method, device, terminal and storage medium, which can improve the efficiency of RNG construction while ensuring the accuracy of nearest neighbor search.
A first aspect of the embodiments of the present application provides a nearest neighbor search method, including:
inputting the reference high-dimensional feature of a reference object into a feature compression network to obtain a reference low-dimensional feature output by the feature compression network, the loss function of the feature compression network being a function obtained based on the high-dimensional neighbor relationship of sample objects and the low-dimensional neighbor relationship of the sample objects;
establishing a low-dimensional neighbor graph by using the reference low-dimensional features;
acquiring the target feature of a target object; and
performing a nearest neighbor search based on the low-dimensional neighbor graph and the target feature to obtain the reference object closest to the target object.
In some embodiments of the present application, the high-dimensional neighbor relationship is the high-dimensional Euclidean distance between the sample high-dimensional features associated with every two of the multiple sample objects, and the low-dimensional neighbor relationship is the low-dimensional Euclidean distance between the sample low-dimensional features associated with every two of the multiple sample objects; the loss function of the feature compression network is a function based on the error value between each high-dimensional Euclidean distance and its corresponding low-dimensional Euclidean distance, and on the weight value associated with that high-dimensional Euclidean distance, where the value of the weight is related to the magnitude of the high-dimensional Euclidean distance associated with it.
In some embodiments of the present application, the feature compression network includes a compression module, a projection module and a global optimization module, and the compression module includes a first linear mapping module, a second linear mapping module and a feature compression module; inputting the reference high-dimensional feature of the reference object into the feature compression network to obtain the reference low-dimensional feature output by the feature compression network includes: inputting the reference high-dimensional feature into the feature compression module, the first linear mapping module and the projection module to obtain the first feature output by the feature compression module, the second feature output by the first linear mapping module, and at least one third feature output by the projection module, where the dimensions of the first feature, the second feature and each third feature are the same as the dimension of the reference low-dimensional feature; inputting the first feature, the second feature and the at least one third feature into the global optimization module to obtain a fourth feature and at least one fifth feature output by the global optimization module; and inputting the fourth feature and the at least one fifth feature into the compression module to obtain the reference low-dimensional feature output by the compression module.
In some embodiments of the present application, the global optimization module includes at least one encoder; inputting the first feature, the second feature and the at least one third feature into the global optimization module to obtain the fourth feature and the at least one fifth feature output by the global optimization module includes: forming the first feature and the at least one third feature into a first vector, and inputting the first vector into the first of the at least one encoder to obtain a second vector output by the first encoder based on the multi-attention-head mechanism and the linear mapping layer, where the second vector includes a sixth feature corresponding to the first feature and at least one seventh feature corresponding one-to-one to the at least one third feature; adding the sixth feature corresponding to the first feature in the second vector to the second feature to obtain an eighth feature; replacing the sixth feature in the second vector with the eighth feature to obtain a third vector; and inputting the third vector into the second of the at least one encoder, and so on, until the target vector output by the last of the at least one encoder is obtained, the target vector including the fourth feature and the at least one fifth feature.
In some embodiments of the present application, the calculation process of the output value of the multi-attention-head mechanism includes: determining the first input value, the second input value and the third input value of the multi-attention-head mechanism based on the first vector; performing mapping processing on the first input value, the second input value and the third input value respectively, to obtain a first mapping value corresponding to the first input value, a second mapping value corresponding to the second input value, and a third mapping value corresponding to the third input value, where the dimensions of the first mapping value and the second mapping value are lower than the dimension of the third mapping value; using the first mapping value and the second mapping value to calculate combination parameters for the third mapping value; and using the third mapping value and the combination parameters to calculate the output value of the multi-attention-head mechanism.
In some embodiments of the present application, performing the nearest neighbor search based on the low-dimensional neighbor graph and the target feature to obtain the reference object closest to the target object includes: replacing each reference low-dimensional feature in the low-dimensional neighbor graph with its corresponding reference high-dimensional feature to obtain the high-dimensional neighbor graph corresponding to the low-dimensional neighbor graph; and performing the nearest neighbor search based on the high-dimensional neighbor graph and the target feature to obtain the reference object closest to the target object.
A second aspect of the embodiments of the present application provides a nearest neighbor search device, including:
a feature compression unit, configured to input the reference high-dimensional feature of a reference object into a feature compression network to obtain a reference low-dimensional feature output by the feature compression network, the loss function of the feature compression network being a function obtained based on the high-dimensional neighbor relationship of sample objects and the low-dimensional neighbor relationship of the sample objects;
a neighbor graph construction unit, configured to establish a low-dimensional neighbor graph by using the reference low-dimensional features;
a feature acquisition unit, configured to acquire the target feature of a target object; and
a nearest neighbor search unit, configured to perform a nearest neighbor search based on the low-dimensional neighbor graph and the target feature to obtain the reference object closest to the target object.
A third aspect of the embodiments of the present application provides a terminal, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the above method when executing the computer program.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the steps of the above method.
A fifth aspect of the embodiments of the present application provides a computer program product which, when run on a terminal, causes the terminal to implement the steps of the method.
In the embodiments of the present application, the reference high-dimensional feature of a reference object is input into a feature compression network to obtain a reference low-dimensional feature output by the feature compression network, a low-dimensional neighbor graph is established using the reference low-dimensional features, the target feature of a target object is then acquired, and a nearest neighbor search is performed based on the low-dimensional neighbor graph and the target feature to obtain the reference object closest to the target object. On the one hand, since the loss function of the feature compression network is a function obtained based on the high-dimensional neighbor relationship of sample objects and the low-dimensional neighbor relationship of the sample objects, using the trained feature compression network to reduce the dimensionality of the reference high-dimensional features avoids the loss of neighbor relationship information between features caused by directly applying a dimensionality reduction algorithm, thereby improving search accuracy. On the other hand, building the low-dimensional neighbor graph from the dimension-reduced reference low-dimensional features reduces the time consumed in constructing the RNG and improves RNG construction efficiency.
附图说明Description of drawings
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。In order to more clearly illustrate the technical solutions in the embodiments of the present application, the accompanying drawings that need to be used in the descriptions of the embodiments or the prior art will be briefly introduced below. Obviously, the accompanying drawings in the following description are only for the present application For some embodiments, those of ordinary skill in the art can also obtain other drawings based on these drawings without paying creative efforts.
图1是本申请实施例提供的一种最近邻搜索方法的实现流程示意图;FIG. 1 is a schematic diagram of the implementation flow of a nearest neighbor search method provided by an embodiment of the present application;
图2是本申请实施例提供的最近邻搜索示意图;Fig. 2 is a schematic diagram of the nearest neighbor search provided by the embodiment of the present application;
图3是本申请实施例提供的特征压缩网络的结构示意图;Fig. 3 is a schematic structural diagram of a feature compression network provided by an embodiment of the present application;
图4是本申请实施例提供的步骤S101的具体实现流程示意图;FIG. 4 is a schematic diagram of a specific implementation flow of step S101 provided by the embodiment of the present application;
图5是本申请实施例提供的编码器的结构示意图;FIG. 5 is a schematic structural diagram of an encoder provided in an embodiment of the present application;
图6是本申请实施例提供的一种最近邻搜索装置的结构示意图;FIG. 6 is a schematic structural diagram of a nearest neighbor search device provided in an embodiment of the present application;
图7是本申请实施例提供的终端的结构示意图。FIG. 7 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
具体实施方式Detailed ways
为了使本申请的目的、技术方案及优点更加清楚明白，以下结合附图及实施例，对本申请进行进一步详细说明。应当理解，此处所描述的具体实施例仅仅用以解释本申请，并不用于限定本申请。基于本申请的实施例，本领域技术人员在没有做出创造性劳动的前提下所获得的所有其他实施例，都属于本申请保护范围。In order to make the purpose, technical solutions, and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it. Based on the embodiments of the present application, all other embodiments obtained by those skilled in the art without creative effort shall fall within the protection scope of the present application.
为了保障可靠性，基于索引图的近似最近邻搜索算法，需要预先构建高精度的RNG，在处理具有数亿特征向量数据的数据库时，三十多个线程构建RNG也需要花费几天甚至几周时间。构建RNG花费时间过长的问题严重限制了基于索引图的近似最近邻搜索算法的适用范围。To ensure reliability, graph-based approximate nearest neighbor search algorithms need to pre-build a high-precision RNG. When processing a database containing hundreds of millions of feature vectors, building the RNG can take days or even weeks even with more than thirty threads. This excessive construction time severely limits the applicability of graph-based approximate nearest neighbor search algorithms.
因此，本申请提出了一种最近邻搜索方法，首先基于样本对象的高维近邻关系和样本对象的低维近邻关系得到的损失函数训练得到特征压缩网络，并利用训练好的特征压缩网络对参考高维特征进行降维，能够避免直接通过降维算法进行降维导致的特征之间相邻关系信息丢失的问题，进而提高搜索精度，同时基于降维后的参考低维特征建立低维近邻图，能够降低构建RNG的时间消耗，提高RNG构建效率。Therefore, the present application proposes a nearest neighbor search method. First, a feature compression network is trained with a loss function derived from the high-dimensional neighbor relationships and the low-dimensional neighbor relationships of the sample objects, and the trained network is used to reduce the dimensionality of the reference high-dimensional features. This avoids the loss of neighbor-relationship information between features that occurs when dimensionality is reduced directly by a dimensionality reduction algorithm, thereby improving search accuracy; meanwhile, building the low-dimensional neighbor graph from the reduced reference low-dimensional features reduces the time consumed in constructing the RNG and improves RNG construction efficiency.
为了说明本申请的技术方案,下面通过具体实施例来进行说明。In order to illustrate the technical solution of the present application, specific examples are used below to illustrate.
图1示出了本申请实施例提供的一种最近邻搜索方法的实现流程示意图，该方法可以应用于终端上，该终端可以为电脑、手机、可穿戴设备、车载设备、增强现实/虚拟现实设备、机顶盒、服务器、卫星无线设备等智能设备，可适用于需在保障最近邻搜索精度的同时，提高RNG的构建效率的情形。Figure 1 shows a schematic flowchart of a nearest neighbor search method provided by an embodiment of the present application. The method can be applied to a terminal, which may be a smart device such as a computer, mobile phone, wearable device, in-vehicle device, augmented reality/virtual reality device, set-top box, server, or satellite wireless device, and is applicable to situations where RNG construction efficiency needs to be improved while ensuring nearest neighbor search accuracy.
具体的,上述最近邻搜索方法可以包括以下步骤S101至步骤S104。Specifically, the above nearest neighbor search method may include the following steps S101 to S104.
步骤S101,将参考对象的参考高维特征输入特征压缩网络中,得到由特征压缩网络输出的参考低维特征。Step S101, input the reference high-dimensional features of the reference object into the feature compression network to obtain the reference low-dimensional features output by the feature compression network.
其中,参考对象是指数据库中用于构建RNG的对象,参考对象的类型可以根据实际情况进行调整,一般可以为图像等。Wherein, the reference object refers to the object used to construct the RNG in the database, and the type of the reference object can be adjusted according to the actual situation, and generally can be an image or the like.
在本申请的实施方式中，终端可以通过特征提取算法提取参考对象的参考高维特征 x ∈ R^{d_in}，并通过训练好的特征压缩网络，将参考高维特征压缩为参考低维向量 f(x) ∈ R^{d_out}。In the embodiments of the present application, the terminal can extract the reference high-dimensional feature x ∈ R^{d_in} of the reference object through a feature extraction algorithm, and compress it into the reference low-dimensional vector f(x) ∈ R^{d_out} through the trained feature compression network.
具体的,特征提取算法和特征压缩网络的网络结构可以根据实际情况进行设置。而特征压缩网络的损失函数为基于样本对象的高维近邻关系和样本对象的低维近邻关系得到的函数。Specifically, the network structure of the feature extraction algorithm and the feature compression network can be set according to the actual situation. The loss function of the feature compression network is a function based on the high-dimensional neighbor relationship of the sample object and the low-dimensional neighbor relationship of the sample object.
其中，样本对象是用于训练特征压缩网络的对象。高维近邻关系是指每两个样本对象分别关联的样本高维特征之间的近邻关系，低维近邻关系是指每两个样本对象分别关联的样本低维特征之间的近邻关系。The sample objects are the objects used to train the feature compression network. The high-dimensional neighbor relationship refers to the neighbor relationship between the sample high-dimensional features respectively associated with each pair of sample objects, and the low-dimensional neighbor relationship refers to the neighbor relationship between the sample low-dimensional features respectively associated with each pair of sample objects.
也就是说,本申请可以利用与样本对象的近邻关系相关的损失函数,对待训练的特征压缩网络进行训练,直至特征压缩网络收敛,得到训练好的特征压缩网络。That is to say, the present application can use the loss function related to the neighbor relationship of the sample object to train the feature compression network to be trained until the feature compression network converges to obtain a trained feature compression network.
需要说明的是,本申请对模型训练所使用的算法不进行限制,例如可以采用梯度下降算法实现。It should be noted that this application does not limit the algorithm used for model training, for example, it can be implemented by using a gradient descent algorithm.
常规的损失函数一般是基于样本高维特征与样本低维特征之间的误差建立的函数，而在本申请的实施方式中，基于高维近邻关系和低维近邻关系构建损失函数，相较于通过常规的降维算法进行降维，或者通过利用常规的损失函数训练得到的特征压缩网络进行降维，可以使样本低维特征保持一定的近邻关系信息，又由于最近邻搜索算法的搜索过程是利用特征间的近邻关系实现的，因此，采用本申请提供的方法提高了压缩过程中近邻关系的完整性，进而提高了搜索的精度。A conventional loss function is generally built on the error between the sample high-dimensional features and the sample low-dimensional features. In the embodiments of the present application, the loss function is instead constructed from the high-dimensional and low-dimensional neighbor relationships. Compared with dimensionality reduction through a conventional algorithm, or through a feature compression network trained with a conventional loss function, this allows the sample low-dimensional features to retain neighbor-relationship information. Since the search process of a nearest neighbor search algorithm relies on the neighbor relationships between features, the method provided by the present application preserves the integrity of the neighbor relationships during compression and thereby improves search accuracy.
需要说明的是，本申请对特征压缩的比例 d_in/d_out 不进行限制，实际应用中特征压缩的比例 d_in/d_out 可以为2、4、8等。It should be noted that the present application does not limit the feature compression ratio d_in/d_out; in practical applications, the ratio d_in/d_out can be 2, 4, 8, and so on.
步骤S102,利用参考低维特征建立低维近邻图。Step S102, using the reference low-dimensional features to establish a low-dimensional neighbor graph.
在本申请的实施方式中,可以采用HNSW(Hierarchical Navigable Small World)算法、NSG(Navigating Spreading-out Graph)算法等用于构建RNG的算法,利用参考低维特征建立低维近邻图。低维近邻图中记录有参考低维特征之间的近邻关系。In the implementation of the present application, algorithms such as HNSW (Hierarchical Navigable Small World) algorithm and NSG (Navigating Spreading-out Graph) algorithm can be used to construct RNG, and low-dimensional neighbor graphs can be established by using reference low-dimensional features. The low-dimensional neighbor graph records the neighbor relationship between the reference low-dimensional features.
由于构建RNG的过程中需要计算特征之间的相似度，则在建立低维近邻图的过程中，只需计算参考低维特征之间的相似度，而不需要计算参考高维特征之间的相似度，因此相似度计算的速度更快，能够降低构建RNG的时间消耗，提高RNG的构建效率。Since constructing an RNG requires computing the similarity between features, only the similarity between the reference low-dimensional features needs to be computed when building the low-dimensional neighbor graph, rather than the similarity between the reference high-dimensional features. The similarity computation is therefore faster, which reduces the time consumed in constructing the RNG and improves RNG construction efficiency.
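As an illustrative sketch only (not part of the original disclosure; the dimensions, random data, and brute-force construction are all assumptions), the effect of step S102 can be imitated by building a simple k-nearest-neighbor graph over the compressed features — the point being that the pairwise distance computation runs over d_out-dimensional vectors rather than d_in-dimensional ones:

```python
import numpy as np

def knn_graph(feats: np.ndarray, k: int) -> np.ndarray:
    """Return an (n, k) array of neighbor indices under Euclidean distance."""
    # Squared pairwise distances via ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = (feats ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * feats @ feats.T
    np.fill_diagonal(d2, np.inf)          # exclude self-loops
    return np.argsort(d2, axis=1)[:, :k]  # k nearest neighbors per node

rng = np.random.default_rng(0)
low_dim = rng.standard_normal((100, 32)).astype(np.float32)  # stand-in for f(x)
graph = knn_graph(low_dim, k=10)
```

Real systems would use an RNG-style construction such as HNSW or NSG rather than this O(n²) brute force; the sketch only shows where the dimensionality reduction saves distance computations.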
步骤S103,获取目标对象的目标特征。Step S103, acquiring target features of the target object.
具体的,目标对象是指待查询对象,其类型与参考对象和样本对象的类型相同。上述目标特征可以指目标对象的目标高维特征或目标低维特征。Specifically, the target object refers to the object to be queried, and its type is the same as that of the reference object and the sample object. The above-mentioned target features may refer to target high-dimensional features or target low-dimensional features of the target object.
步骤S104,基于低维近邻图与目标特征进行最近邻搜索,得到与目标对象距离最近的参考对象。In step S104, the nearest neighbor search is performed based on the low-dimensional neighbor graph and the target feature to obtain the reference object with the closest distance to the target object.
在本申请的一些实施方式中，终端可以对目标对象进行特征提取，得到目标对象的目标高维特征，然后，将目标对象的目标高维特征输入前述特征压缩网络中，得到由前述特征压缩网络输出的目标低维特征，并将得到的目标低维特征作为目标对象的目标特征。利用低维近邻图中记录的参考低维特征之间的近邻关系进行最近邻搜索，可以搜索出低维近邻图中与目标特征欧式距离最近的一个或多个参考低维特征，并将与搜索出的一个或多个参考低维特征分别关联的参考对象作为与目标对象距离最近的参考对象。In some embodiments of the present application, the terminal can perform feature extraction on the target object to obtain its target high-dimensional feature, input that feature into the aforementioned feature compression network to obtain the target low-dimensional feature output by the network, and use the target low-dimensional feature as the target feature of the target object. Using the neighbor relationships between the reference low-dimensional features recorded in the low-dimensional neighbor graph, the nearest neighbor search can find the one or more reference low-dimensional features in the graph with the smallest Euclidean distance to the target feature, and the reference objects respectively associated with those features are taken as the reference objects closest to the target object.
在本申请的另一些实施方式中，如图2所示，终端还可以将低维近邻图的每个参考低维特征替换为与其对应的参考高维特征，得到低维近邻图对应的高维近邻图，并基于高维近邻图与目标特征进行最近邻搜索，得到与目标对象距离最近的参考对象。In other embodiments of the present application, as shown in FIG. 2, the terminal may also replace each reference low-dimensional feature in the low-dimensional neighbor graph with its corresponding reference high-dimensional feature to obtain the high-dimensional neighbor graph corresponding to the low-dimensional neighbor graph, and perform the nearest neighbor search based on the high-dimensional neighbor graph and the target feature to obtain the reference object closest to the target object.
具体的，终端可以对目标对象进行特征提取，得到目标对象的目标高维特征，并将得到的目标高维特征作为目标对象的目标特征。利用高维近邻图中记录的参考高维特征之间的近邻关系进行最近邻搜索，可以搜索出高维近邻图中与目标特征欧式距离最近的一个或多个参考高维特征，并将与搜索出的一个或多个参考高维特征分别关联的参考对象作为与目标对象距离最近的参考对象。Specifically, the terminal can perform feature extraction on the target object to obtain its target high-dimensional feature and use it as the target feature. Using the neighbor relationships between the reference high-dimensional features recorded in the high-dimensional neighbor graph, the nearest neighbor search can find the one or more reference high-dimensional features in the graph with the smallest Euclidean distance to the target feature, and the reference objects respectively associated with those features are taken as the reference objects closest to the target object.
在本申请的实施方式中，通过将低维近邻图的每个参考低维特征替换为与其对应的参考高维特征，得到低维近邻图对应的高维近邻图，并使用高维近邻图进行最近邻搜索，使得终端在进行最近邻搜索时可以使用参考高维特征和目标高维特征之间的欧氏距离，与使用参考低维特征和目标低维特征之间的欧氏距离相比，能够提高搜索的精度。In the embodiments of the present application, by replacing each reference low-dimensional feature of the low-dimensional neighbor graph with its corresponding reference high-dimensional feature, the high-dimensional neighbor graph corresponding to the low-dimensional neighbor graph is obtained and used for the nearest neighbor search. The terminal can then use the Euclidean distance between the reference high-dimensional features and the target high-dimensional feature, which improves search accuracy compared with using the Euclidean distance between the reference low-dimensional features and the target low-dimensional feature.
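A minimal sketch of this idea (assumptions: toy random data, a complete graph standing in for a real RNG, and a simple best-first walk standing in for the actual search routine): the same graph topology can be traversed with either feature set, and here it is traversed with the high-dimensional features, as in the replacement described above.

```python
import numpy as np

def greedy_search(graph: np.ndarray, feats: np.ndarray, query: np.ndarray, start: int = 0):
    """Best-first walk on the neighbor graph: repeatedly hop to the neighbor
    closest to the query; stop when no neighbor improves the distance."""
    cur = start
    cur_d = float(np.linalg.norm(feats[cur] - query))
    while True:
        cand = graph[cur]
        d = np.linalg.norm(feats[cand] - query, axis=1)
        j = int(np.argmin(d))
        if d[j] >= cur_d:
            return cur, cur_d
        cur, cur_d = int(cand[j]), float(d[j])

# Toy data: the graph is built once, then traversed with either feature set.
rng = np.random.default_rng(1)
high = rng.standard_normal((50, 128)).astype(np.float32)   # reference high-dim features
low = high[:, :16].copy()                                  # stand-in for compressed f(x)
# complete graph for the sketch (real systems keep only k edges per node)
graph = np.array([[j for j in range(50) if j != i] for i in range(50)])
q = high[7] + 0.01                                          # query near reference object 7
node, dist = greedy_search(graph, high, q)                  # high-dim distances (as above)
node_low, _ = greedy_search(graph, low, q[:16])             # same walk on low-dim features
```

The design point the sketch illustrates: the graph edges come from the cheap low-dimensional construction, while the distances used during search can come from the original high-dimensional features.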
以目标对象为目标图像为例，当需要进行目标图像的场景识别时，可以提取目标图像的目标特征，再在高维近邻图中搜索出与目标特征欧氏距离最近的一个或多个参考高维特征，并将搜索出的一个或多个参考高维特征分别关联的参考图像作为与目标图像距离最近的参考图像，进而可以将目标图像所属的场景确定为与目标图像距离最近的参考图像所属的场景。Taking a target image as the target object as an example, when scene recognition of the target image is required, the target feature of the target image can be extracted, the one or more reference high-dimensional features with the smallest Euclidean distance to the target feature can be found in the high-dimensional neighbor graph, the reference images respectively associated with those features are taken as the reference images closest to the target image, and the scene to which the target image belongs can then be determined as the scene of the closest reference image.
本申请的实施方式中，通过将参考对象的参考高维特征输入特征压缩网络中，得到由特征压缩网络输出的参考低维特征，并利用参考低维特征建立低维近邻图，然后，获取目标对象的目标特征，并基于低维近邻图与目标特征进行最近邻搜索，得到与目标对象距离最近的参考对象，一方面，由于特征压缩网络的损失函数为基于样本对象的高维近邻关系和所述样本对象的低维近邻关系得到的函数，利用训练好的特征压缩网络对参考高维特征进行降维，能够避免直接通过降维算法进行降维导致的特征之间相邻关系信息丢失的问题，进而提高搜索精度，另一方面，基于降维后的参考低维特征建立低维近邻图，能够降低构建RNG的时间消耗，提高RNG构建效率。In the embodiments of the present application, the reference high-dimensional feature of a reference object is input into the feature compression network to obtain the reference low-dimensional feature output by the network, a low-dimensional neighbor graph is built from the reference low-dimensional features, and then the target feature of the target object is acquired and a nearest neighbor search is performed based on the low-dimensional neighbor graph and the target feature to obtain the reference object closest to the target object. On the one hand, since the loss function of the feature compression network is derived from the high-dimensional neighbor relationships and the low-dimensional neighbor relationships of the sample objects, using the trained feature compression network to reduce the dimensionality of the reference high-dimensional features avoids the loss of neighbor-relationship information between features that occurs when dimensionality is reduced directly by a dimensionality reduction algorithm, thereby improving search accuracy. On the other hand, building the low-dimensional neighbor graph from the reduced reference low-dimensional features reduces the time consumed in constructing the RNG and improves RNG construction efficiency.
在本申请的一些实施方式中,终端可以构建图3所示的特征压缩网络。具体的,特征压缩网络中可以包括压缩模块31、投影模块32以及全局优化模块33。压缩模块31具体包括第一线性映射模块311、第二线性映射模块312以及特征压缩模块313。In some embodiments of the present application, the terminal may construct the feature compression network shown in FIG. 3 . Specifically, the feature compression network may include a compression module 31 , a projection module 32 and a global optimization module 33 . The compression module 31 specifically includes a first linear mapping module 311 , a second linear mapping module 312 and a feature compression module 313 .
相应的,上述步骤S101可以具体包括以下步骤S401至步骤S402。Correspondingly, the above step S101 may specifically include the following steps S401 to S402.
步骤S401，将参考高维特征输入至特征压缩模块、第一线性映射模块和投影模块，得到由特征压缩模块输出的第一特征、由第一线性映射模块输出的第二特征，以及由投影模块输出的至少一个第三特征。Step S401: input the reference high-dimensional feature into the feature compression module, the first linear mapping module, and the projection module to obtain the first feature output by the feature compression module, the second feature output by the first linear mapping module, and at least one third feature output by the projection module.
具体的，特征压缩模块313可以包含一个线性映射函数 f(x) = W_c·x（参数 W_c ∈ R^{d_out×d_in}）、一个激活函数Hardswish，以及一个批量归一化（Batch Normalization，BN）层。终端将参考高维特征 x ∈ R^{d_in} 输入至特征压缩模块313之后可以得到维度为 d_out 的第一特征。Specifically, the feature compression module 313 may include a linear mapping function f(x) = W_c·x (with parameter W_c ∈ R^{d_out×d_in}), a Hardswish activation function, and a batch normalization (BN) layer. After the terminal inputs the reference high-dimensional feature x ∈ R^{d_in} into the feature compression module 313, the first feature of dimension d_out is obtained.
第一线性映射模块311可以包含一个线性映射函数 cp(x)，其映射参数同样属于 R^{d_out×d_in}。同样的，终端将参考高维特征 x 输入至第一线性映射模块311之后可以得到维度为 d_out 的第二特征cp(x)。The first linear mapping module 311 may include a linear mapping function cp(x) whose mapping parameter likewise lies in R^{d_out×d_in}. Similarly, after the terminal inputs the reference high-dimensional feature x into the first linear mapping module 311, the second feature cp(x) of dimension d_out is obtained.
投影模块32包含n个压缩投影函数。将参考高维特征 x 输入投影模块，可以得到一个序列的第三特征 {p_1(x), p_2(x), …, p_n(x)}。其中，第三特征 p_i(x) = w_i·x，维度为 d_out，参数 w_i ∈ R^{d_out×d_in} 可以采用稀疏随机投影的方式进行初始化。The projection module 32 contains n compressed projection functions. Inputting the reference high-dimensional feature x into the projection module yields a sequence of third features {p_1(x), p_2(x), …, p_n(x)}, where each third feature p_i(x) = w_i·x has dimension d_out, and the parameters w_i ∈ R^{d_out×d_in} can be initialized by sparse random projection.
可见，第一特征、第二特征和每个第三特征的维度与参考低维特征的维度相同，均为 d_out。It can be seen that the first feature, the second feature, and each third feature have the same dimension as the reference low-dimensional feature, namely d_out.
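To make the three parallel branches concrete, here is a hedged NumPy sketch (not the patented implementation: the weight scales, the 10% sparsity of the random projections, and the inference-style BN without learned scale/shift are all assumptions) of how a batch of reference high-dimensional features could produce the first, second, and third features, all of dimension d_out:

```python
import numpy as np

def hardswish(x):
    # Hardswish(x) = x * clip(x + 3, 0, 6) / 6
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0

def batch_norm(x, eps=1e-5):
    # inference-style normalization over the batch; learned scale/shift omitted
    return (x - x.mean(0)) / np.sqrt(x.var(0) + eps)

d_in, d_out, n_proj = 128, 32, 4          # assumed sizes; compression ratio d_in/d_out = 4
rng = np.random.default_rng(0)
W_c = rng.standard_normal((d_out, d_in)) * 0.05   # feature compression module parameter
W_cp = rng.standard_normal((d_out, d_in)) * 0.05  # first linear mapping module parameter
# sparse random initialization for the n compressed projection functions w_i
W_p = rng.standard_normal((n_proj, d_out, d_in)) * (rng.random((n_proj, d_out, d_in)) < 0.1)

x = rng.standard_normal((8, d_in))                # a batch of high-dimensional features
first = batch_norm(hardswish(x @ W_c.T))          # first feature: linear map + Hardswish + BN
second = x @ W_cp.T                               # second feature cp(x)
thirds = [x @ W_p[i].T for i in range(n_proj)]    # third features p_i(x) = w_i x
```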
步骤S402,将第一特征、第二特征和至少一个第三特征输入至全局优化模块,得到由全局优化模块输出的第四特征和至少一个第五特征。Step S402, inputting the first feature, the second feature and at least one third feature to the global optimization module to obtain a fourth feature and at least one fifth feature output by the global optimization module.
具体的，全局优化模块33可以包括至少一个编码器。上述终端可以合并特征压缩模块313输出的第一特征，以及投影模块32输出的至少一个第三特征，得到第一向量。接着，将第一向量输入至少一个编码器中的第一个编码器331，得到由第一个编码器331基于多注意力头机制和线性映射层输出的第二向量。其中，第二向量包括与第一特征对应的第六特征，以及与至少一个第三特征 {p_1(x), …, p_n(x)} 一一对应的至少一个第七特征。Specifically, the global optimization module 33 may include at least one encoder. The terminal may merge the first feature output by the feature compression module 313 and the at least one third feature output by the projection module 32 to obtain a first vector. Next, the first vector is input to the first encoder 331 of the at least one encoder to obtain a second vector output by the first encoder 331 based on the multi-head attention mechanism and a linear mapping layer. The second vector includes a sixth feature corresponding to the first feature, and at least one seventh feature in one-to-one correspondence with the at least one third feature {p_1(x), …, p_n(x)}.
然后，将第二向量中与第一特征对应的第六特征与第一线性映射模块311输出的第二特征cp(x)相加，得到第八特征，并将第二向量中的第六特征替换为第八特征，得到第三向量。再将第三向量输入至至少一个编码器中的第二个编码器，以此类推，直至得到由至少一个编码器的最后一个编码器33n输出的目标向量。其中，目标向量包括第四特征和至少一个第五特征。Then, the sixth feature in the second vector corresponding to the first feature is added to the second feature cp(x) output by the first linear mapping module 311 to obtain an eighth feature, and the sixth feature in the second vector is replaced with the eighth feature to obtain a third vector. The third vector is then input to the second encoder of the at least one encoder, and so on, until the target vector output by the last encoder 33n of the at least one encoder is obtained. The target vector includes a fourth feature and at least one fifth feature.
图5示出了本申请单个编码器的结构。单个编码器可以包括多头注意力机制模块以及编码器映射模块。Fig. 5 shows the structure of a single encoder of the present application. A single encoder can include a multi-head attention module as well as an encoder mapping module.
在本申请的一些实施方式中，上述多头注意力机制模块采用多注意力头机制。终端计算多注意力头机制的输出值的过程可以具体包括：基于第一向量确定多注意力头机制的第一输入值Q、第二输入值K和第三输入值V；对第一输入值Q、第二输入值K和第三输入值V分别进行映射处理，得到第一输入值Q对应的第一映射值、第二输入值K对应的第二映射值和第三输入值V对应的第三映射值；利用第一映射值和第二映射值，计算第三映射值的合并参数；利用第三映射值和合并参数计算多注意力头机制的输出值head_i(Q,K,V)。In some embodiments of the present application, the multi-head attention module adopts a multi-head attention mechanism. The process by which the terminal computes the output value of the multi-head attention mechanism may include: determining, based on the first vector, a first input value Q, a second input value K, and a third input value V; mapping Q, K, and V respectively to obtain a first mapped value corresponding to Q, a second mapped value corresponding to K, and a third mapped value corresponding to V; computing a merging parameter for the third mapped value from the first and second mapped values; and computing the output value head_i(Q, K, V) of the multi-head attention mechanism from the third mapped value and the merging parameter.
其中,第一映射值的维度和第二映射值的维度均低于第三映射值的维度。Wherein, the dimension of the first mapping value and the dimension of the second mapping value are both lower than the dimension of the third mapping value.
具体的，上述对第一输入值Q的映射处理可以采用线性映射的方式，得到第一映射值 Q·W^Q，其中映射参数 W^Q 的输出维度由膨胀系数e和预设的注意力头数量h_n决定。同样的，对第二输入值K的映射处理也可以采用线性映射的方式，得到第二映射值 K·W^K；对第三输入值V的映射处理可以采用线性映射的方式，得到第三映射值 V·W^V。可见，经过线性映射之后，第一映射值的维度和第二映射值的维度均低于第三映射值的维度。Specifically, the mapping of the first input value Q may be a linear mapping yielding the first mapped value Q·W^Q, where the output dimension of the mapping parameter W^Q is determined by an expansion coefficient e and the preset number of attention heads h_n. Similarly, the mapping of the second input value K may be a linear mapping yielding the second mapped value K·W^K, and the mapping of the third input value V may be a linear mapping yielding the third mapped value V·W^V. It can be seen that, after the linear mapping, the dimensions of the first and second mapped values are both lower than the dimension of the third mapped value.
在一些实施方式中，终端可以将第一映射值和第二映射值的乘积除以预设的特征维度值的开方值，并将得到的商 (Q·W^Q)(K·W^K)^T / √d 输入至softmax函数中，进而将softmax函数的输出值作为上述第三映射值的合并参数。In some embodiments, the terminal may divide the product of the first mapped value and the second mapped value by the square root of a preset feature dimension value d, input the resulting quotient (Q·W^Q)(K·W^K)^T / √d into a softmax function, and take the output of the softmax function as the merging parameter for the third mapped value.
也即，多注意力头机制的输出值 head_i(Q,K,V) = softmax((Q·W^Q)(K·W^K)^T / √d)·(V·W^V)。That is, the output value of the multi-head attention mechanism is head_i(Q,K,V) = softmax((Q·W^Q)(K·W^K)^T / √d)·(V·W^V).
此时终端便可以将每个注意力头的输出通过映射函数合并为一个向量。At this point, the terminal can merge the outputs of the attention heads into a single vector through a mapping function.
编码器映射模块可以包含Linear_BN层和Linear_ABN层，其中，Linear_BN层包含一个线性映射层和一个BN层，Linear_ABN层依次包含一个线性映射层、一个激活函数及一个BN层。编码器映射模块用于将多注意力头机制的输出值映射为编码器输出的特征。The encoder mapping module may include a Linear_BN layer and a Linear_ABN layer, where the Linear_BN layer contains a linear mapping layer and a BN layer, and the Linear_ABN layer contains, in sequence, a linear mapping layer, an activation function, and a BN layer. The encoder mapping module is used to map the output value of the multi-head attention mechanism to the features output by the encoder.
常规的多注意力头机制一般要求Q、K、V的维度均相同，而在本申请的实施方式中，由于Q和K仅用于提供如何合并V的信息，因此，终端可以对Q和K进行降维，并利用降维后的Q、降维后的K以及原始维度的V计算多注意力头机制的输出值，使得在训练时softmax值的计算量降低，提高了编码器的计算速度，进而提高了模型的训练效率。A conventional multi-head attention mechanism generally requires Q, K, and V to have the same dimension. In the embodiments of the present application, since Q and K are only used to provide the information on how to merge V, the terminal can reduce the dimensionality of Q and K and compute the output value of the multi-head attention mechanism from the reduced Q, the reduced K, and V at its original dimension. This reduces the amount of computation for the softmax values during training, increases the computation speed of the encoder, and thereby improves the training efficiency of the model.
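The asymmetric projection described above (Q and K reduced, V kept at full width) can be sketched as a single attention head in NumPy; the sizes below are assumptions for illustration, not the patented encoder:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def head(Q, K, V, Wq, Wk, Wv, d_scale):
    """One attention head where Q/K are projected to a lower dimension than V."""
    Qs, Ks, Vs = Q @ Wq, K @ Wk, V @ Wv        # Qs/Ks: (n, d_qk), Vs: (n, d_v)
    A = softmax(Qs @ Ks.T / np.sqrt(d_scale))  # merging parameter from low-dim Q/K only
    return A @ Vs                              # V keeps the larger dimension

n, d, d_qk, d_v = 5, 32, 8, 32                 # assumed sizes, with d_qk < d_v
rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))                # token sequence (the "first vector")
Wq, Wk = rng.standard_normal((d, d_qk)), rng.standard_normal((d, d_qk))
Wv = rng.standard_normal((d, d_v))
out = head(X, X, X, Wq, Wk, Wv, d_scale=d_qk)
```

The softmax matrix `A` is n×n regardless of the projection widths, but the matmuls feeding it shrink from n·n·d_v to n·n·d_qk work, which is the saving the paragraph above describes.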
步骤S403,将第四特征和至少一个第五特征输入至压缩模块,得到由压缩模块输出的参考低维特征。Step S403, input the fourth feature and at least one fifth feature into the compression module to obtain reference low-dimensional features output by the compression module.
具体的，终端可以将第四特征和至少一个第五特征相加，并将得到的特征输入到压缩模块31的第二线性映射模块312中，得到参考低维特征 f(x) ∈ R^{d_out}。其中，第二线性映射模块312可以包含一个线性映射函数及其可学习的映射参数。Specifically, the terminal may add the fourth feature and the at least one fifth feature, and input the resulting feature into the second linear mapping module 312 of the compression module 31 to obtain the reference low-dimensional feature f(x) ∈ R^{d_out}. The second linear mapping module 312 may include a linear mapping function with learnable mapping parameters.
在本申请的实施方式中，通过将编码器输出的第六特征与第一线性映射模块输出的第二特征相加，得到第八特征，再将第八特征作为下一个编码器的输入值，使得编码器迭代的过程中每个编码器的输入值不会过分偏离实际的特征值，提高了特征压缩网络的收敛速度。In the embodiments of the present application, the eighth feature is obtained by adding the sixth feature output by the encoder to the second feature output by the first linear mapping module, and the eighth feature is then used as the input value of the next encoder, so that during encoder iteration the input value of each encoder does not deviate excessively from the actual feature values, which improves the convergence speed of the feature compression network.
在完成对特征压缩网络的构建之后,终端可以利用基于样本对象的高维近邻关系和样本对象的低维近邻关系得到的损失函数,对特征压缩网络进行训练。After completing the construction of the feature compression network, the terminal can use the loss function obtained based on the high-dimensional neighbor relationship of the sample object and the low-dimensional neighbor relationship of the sample object to train the feature compression network.
其中，上述高维近邻关系可以为多个样本对象中每两个样本对象分别关联的样本高维特征之间的高维欧氏距离，上述低维近邻关系为多个样本对象中每两个样本对象分别关联的样本低维特征之间的低维欧氏距离。The high-dimensional neighbor relationship may be the high-dimensional Euclidean distance between the sample high-dimensional features respectively associated with each pair of the multiple sample objects, and the low-dimensional neighbor relationship is the low-dimensional Euclidean distance between the sample low-dimensional features respectively associated with each pair of the multiple sample objects.
相应的,特征压缩网络的损失函数为基于高维欧氏距离和与高维欧氏距离对应的低维欧氏距离之间的误差值,以及与高维欧氏距离关联的权重值得到的函数。Correspondingly, the loss function of the feature compression network is a function based on the error value between the high-dimensional Euclidean distance and the low-dimensional Euclidean distance corresponding to the high-dimensional Euclidean distance, and the weight value associated with the high-dimensional Euclidean distance .
在本申请的一些实施方式中,权重值的取值和与其关联的高维欧氏距离的大小相关。In some embodiments of the present application, the value of the weight value is related to the magnitude of the associated high-dimensional Euclidean distance.
具体的，终端在计算上述损失函数的损失值loss的过程可以具体包括：计算高维欧氏距离 ||x_i−x_j||_2 和与其对应的低维欧氏距离 ||f(x_i)−f(x_j)||_2 之间的误差值 e_ij；计算与高维欧氏距离关联的权重值 ω_ij，并利用每个权重值对每个误差值进行加权相加，得到累加值 Σ_{i=1}^m Σ_{j=1}^m ω_ij·e_ij；将累加值和样本对象的总数量的平方值 m² 相除，得到损失函数的损失值loss。也即，损失函数的损失值 loss = (1/m²)·Σ_{i=1}^m Σ_{j=1}^m ω_ij·e_ij。Specifically, the process of computing the loss value of the above loss function may include: computing the error value e_ij between the high-dimensional Euclidean distance ||x_i−x_j||_2 and the corresponding low-dimensional Euclidean distance ||f(x_i)−f(x_j)||_2; computing the weight value ω_ij associated with each high-dimensional Euclidean distance, and weighting and summing the error values to obtain the accumulated value Σ_{i=1}^m Σ_{j=1}^m ω_ij·e_ij; and dividing the accumulated value by m², the square of the total number of sample objects, to obtain the loss value, i.e. loss = (1/m²)·Σ_{i=1}^m Σ_{j=1}^m ω_ij·e_ij.
其中，m表示样本对象的总数量，||x_i−x_j||_2表示高维欧氏距离，x_i表示与第i个样本对象关联的样本高维特征，x_j表示与第j个样本对象关联的样本高维特征，||f(x_i)−f(x_j)||_2表示低维欧氏距离，f(x_i)表示与第i个样本对象关联的样本低维特征，f(x_j)表示与第j个样本对象关联的样本低维特征，ω_ij表示与高维欧氏距离关联的权重值。Here, m denotes the total number of sample objects, ||x_i−x_j||_2 denotes the high-dimensional Euclidean distance, x_i and x_j denote the sample high-dimensional features associated with the i-th and j-th sample objects, ||f(x_i)−f(x_j)||_2 denotes the low-dimensional Euclidean distance, f(x_i) and f(x_j) denote the sample low-dimensional features associated with the i-th and j-th sample objects, and ω_ij denotes the weight value associated with the high-dimensional Euclidean distance.
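A hedged NumPy sketch of this loss follows. Two assumptions are made beyond the text: the per-pair error e_ij is taken as the squared difference of the two distances (the disclosure only specifies a weighted error between them), and the weight rule uses the clipping ω_ij = min(α, max(β, −ln(d_ij/boundary))) described in the embodiments, with the default α = 2 and β = 0.01:

```python
import numpy as np

def compression_loss(x, fx, alpha=2.0, beta=0.01):
    """Neighbor-preserving loss: weighted error between high-dim and low-dim
    pairwise Euclidean distances; squared-error form is an assumption."""
    m = x.shape[0]
    d_hi = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)    # ||x_i - x_j||
    d_lo = np.linalg.norm(fx[:, None, :] - fx[None, :, :], axis=-1)  # ||f(x_i) - f(x_j)||
    boundary = d_hi.mean()                                           # mean high-dim distance
    with np.errstate(divide="ignore"):                               # d_ii = 0 -> weight clips to alpha
        w = np.minimum(alpha, np.maximum(beta, -np.log(d_hi / boundary)))
    return (w * (d_hi - d_lo) ** 2).sum() / m**2

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 64))                     # sample high-dim features
loss_perfect = compression_loss(x, x.copy())          # distances preserved exactly
loss_bad = compression_loss(x, rng.standard_normal((16, 8)))  # random "compression"
```

A compression that preserves all pairwise distances drives the loss to zero, while an arbitrary mapping does not, which is what gradient-based training of the feature compression network exploits.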
In some embodiments of the present application, the process by which the terminal calculates the above weight value ω_ij may include: obtaining a first hyperparameter α and a second hyperparameter β, and calculating the average value boundary of the high-dimensional Euclidean distances; then calculating the negative of the natural logarithm of the quotient of the high-dimensional Euclidean distance and the average value, −ln(d_ij / boundary); and determining the maximum of this value and the second hyperparameter, and taking the minimum of the first hyperparameter and that maximum as the weight value associated with the high-dimensional Euclidean distance.
That is, the weight value associated with the high-dimensional Euclidean distance is

ω_ij = min(α, max(−ln(d_ij / boundary), β))

where d_ij = ||x_i - x_j||_2 denotes the high-dimensional Euclidean distance and boundary denotes the average of the high-dimensional Euclidean distances.
Here, the first hyperparameter α is greater than the second hyperparameter β. The specific values of α and β can be set according to the actual situation; in practical applications, α may be set to 2 and β may be set to 0.01.
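Under the same caveat (an illustrative sketch, not the patent's reference implementation), the weight formula with the example values α = 2 and β = 0.01 can be sketched as:

```python
import numpy as np

def neighbor_weights(d_high, alpha=2.0, beta=0.01):
    """omega_ij = min(alpha, max(-ln(d_ij / boundary), beta)),
    where boundary is the mean of the high-dimensional distances."""
    boundary = np.mean(d_high)
    with np.errstate(divide="ignore"):       # d_ij = 0 gives -ln(0) = +inf
        neg_log = -np.log(d_high / boundary)
    # min(alpha, max(x, beta)) is exactly a clip to [beta, alpha]
    return np.clip(neg_log, beta, alpha)
```

The clip realizes the nested min/max of the formula: zero or very small distances saturate at α, and distances well above the average saturate at β.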
From the above formula, it can be seen that when the high-dimensional Euclidean distance d_ij is small, the associated weight value ω_ij will be α or −ln(d_ij / boundary), and in either case the weight value ω_ij is greater than β; when the high-dimensional Euclidean distance d_ij is large, the associated weight value ω_ij will be β. Consequently, in the loss function, the weights corresponding to pairs of sample high-dimensional features whose high-dimensional Euclidean distance is small are higher.
In the embodiments of the present application, training the feature compression network with the above weight formula and loss function makes the correspondence between high-dimensional and low-dimensional neighbor relationships more accurate for features whose high-dimensional Euclidean distance is small; that is, the smaller the high-dimensional Euclidean distance between two reference high-dimensional features, the more complete the neighbor relationship information between the two reference low-dimensional features obtained after inputting them into the trained feature compression network. Since the purpose of the nearest neighbor search algorithm is precisely to find the features closest to the target feature, this approach makes the adjacency information between nearby features more complete, thereby further improving search accuracy.
In other embodiments, the terminal may also quantize the low-dimensional features output by the feature compression network into low-dimensional integer vectors by scalar quantization, and use the quantized low-dimensional integer vectors as the reference low-dimensional features to construct the RNG.
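A minimal sketch of the scalar-quantization step follows. The exact quantizer is not specified in the text, so per-dimension uniform quantization to 8-bit integer codes is assumed here, and all names are illustrative:

```python
import numpy as np

def scalar_quantize(low_feats, n_bits=8):
    """Uniformly quantize float features to integer codes per dimension.
    Returns the integer vectors plus the (mins, scale) needed to dequantize."""
    lo = low_feats.min(axis=0)
    hi = low_feats.max(axis=0)
    # one quantization step per dimension; guard constant dimensions
    scale = np.where(hi > lo, (hi - lo) / (2**n_bits - 1), 1.0)
    codes = np.round((low_feats - lo) / scale).astype(np.int32)
    return codes, lo, scale
```

The integer codes can then stand in for the float features when building the neighbor graph, trading a bounded quantization error for a smaller memory footprint.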
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of action combinations; however, those skilled in the art should appreciate that the present application is not limited by the described order of actions, because according to the present application, certain steps may be performed in other orders.
FIG. 6 is a schematic structural diagram of a nearest neighbor search apparatus 600 provided in an embodiment of the present application; the nearest neighbor search apparatus 600 is configured on a terminal.
Specifically, the nearest neighbor search apparatus 600 may include:
a feature compression unit 601, configured to input a reference high-dimensional feature of a reference object into a feature compression network to obtain a reference low-dimensional feature output by the feature compression network, where the loss function of the feature compression network is a function obtained based on high-dimensional neighbor relationships of sample objects and low-dimensional neighbor relationships of the sample objects;

a neighbor graph construction unit 602, configured to construct a low-dimensional neighbor graph using the reference low-dimensional features;

a feature acquisition unit 603, configured to acquire a target feature of a target object; and

a nearest neighbor search unit 604, configured to perform a nearest neighbor search based on the low-dimensional neighbor graph and the target feature to obtain the reference object closest to the target object.
In some embodiments of the present application, the high-dimensional neighbor relationship is the high-dimensional Euclidean distance between the sample high-dimensional features respectively associated with every two of the multiple sample objects, and the low-dimensional neighbor relationship is the low-dimensional Euclidean distance between the sample low-dimensional features respectively associated with every two of the multiple sample objects. The loss function of the feature compression network is a function obtained based on the error values between the high-dimensional Euclidean distances and the corresponding low-dimensional Euclidean distances, and the weight values associated with the high-dimensional Euclidean distances, where the value of each weight is related to the magnitude of its associated high-dimensional Euclidean distance.
In some embodiments of the present application, the calculation of the loss value of the loss function includes: calculating the error value between each high-dimensional Euclidean distance and the corresponding low-dimensional Euclidean distance; calculating the weight value associated with each high-dimensional Euclidean distance, and using each weight value to weight and sum the error values to obtain an accumulated value; and dividing the accumulated value by the square of the total number of sample objects to obtain the loss value of the loss function.
In some embodiments of the present application, the calculation of the weight value includes: obtaining a first hyperparameter and a second hyperparameter, where the first hyperparameter is greater than the second hyperparameter; calculating the average value of the high-dimensional Euclidean distances; calculating the negative of the natural logarithm of the quotient of the high-dimensional Euclidean distance and the average value; and determining the maximum of that negative value and the second hyperparameter, and taking the minimum of the first hyperparameter and that maximum as the weight value associated with the high-dimensional Euclidean distance.
In some embodiments of the present application, the feature compression network includes a compression module, a projection module, and a global optimization module, and the compression module includes a first linear mapping module, a second linear mapping module, and a feature compression module. The feature compression unit 601 may be specifically configured to: input the reference high-dimensional feature into the feature compression module, the first linear mapping module, and the projection module to obtain a first feature output by the feature compression module, a second feature output by the first linear mapping module, and at least one third feature output by the projection module, where the dimensions of the first feature, the second feature, and each third feature are the same as the dimension of the reference low-dimensional feature; input the first feature, the second feature, and the at least one third feature into the global optimization module to obtain a fourth feature and at least one fifth feature output by the global optimization module; and input the fourth feature and the at least one fifth feature into the compression module to obtain the reference low-dimensional feature output by the compression module.
In some embodiments of the present application, the global optimization module includes at least one encoder. The feature compression unit 601 may be specifically configured to: compose the first feature and the at least one third feature into a first vector, and input the first vector into the first encoder of the at least one encoder to obtain a second vector output by the first encoder based on a multi-attention-head mechanism and a linear mapping layer, where the second vector includes a sixth feature corresponding to the first feature and at least one seventh feature in one-to-one correspondence with the at least one third feature; add the sixth feature corresponding to the first feature in the second vector to the second feature to obtain an eighth feature; replace the sixth feature in the second vector with the eighth feature to obtain a third vector; and input the third vector into the second encoder of the at least one encoder, and so on, until a target vector output by the last encoder of the at least one encoder is obtained, the target vector including the fourth feature and the at least one fifth feature.
In some embodiments of the present application, the calculation of the output value of the multi-attention-head mechanism includes: determining a first input value, a second input value, and a third input value of the multi-attention-head mechanism based on the first vector; mapping the first input value, the second input value, and the third input value respectively to obtain a first mapped value corresponding to the first input value, a second mapped value corresponding to the second input value, and a third mapped value corresponding to the third input value, where the dimensions of the first mapped value and the second mapped value are both lower than the dimension of the third mapped value; calculating a merging parameter for the third mapped value using the first mapped value and the second mapped value; and calculating the output value of the multi-attention-head mechanism using the third mapped value and the merging parameter.
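As a hedged, single-head illustration of the mechanism just described (the patent gives no exact formulas here, so the projection shapes, the softmax merging step, and all names are assumptions of this sketch): the first and second input values are mapped to a lower dimension than the third, a merging parameter is computed from the first two mapped values, and the output combines it with the third mapped value:

```python
import numpy as np

def reduced_dim_attention(x, w_q, w_k, w_v):
    """x: (n, d) first vector; w_q, w_k: (d, d_small) low-dimensional maps;
    w_v: (d, d_model). Q and K live in a lower dimension than V."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v          # first/second/third mapped values
    scores = q @ k.T / np.sqrt(q.shape[-1])      # merging parameters from Q and K
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)     # row-wise softmax
    return attn @ v                              # combine with third mapped value
```

Mapping Q and K to a lower dimension than V reduces the cost of forming the merging parameters, which is consistent with the stated constraint that the first and second mapped values have lower dimensions than the third.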
In some embodiments of the present application, the nearest neighbor search unit 604 may be further specifically configured to: replace each reference low-dimensional feature in the low-dimensional neighbor graph with its corresponding reference high-dimensional feature to obtain the high-dimensional neighbor graph corresponding to the low-dimensional neighbor graph; and perform a nearest neighbor search based on the high-dimensional neighbor graph and the target feature to obtain the reference object closest to the target object.
It should be noted that, for convenience and brevity of description, for the specific working process of the nearest neighbor search apparatus 600, reference may be made to the corresponding processes of the methods described in FIG. 1 to FIG. 5, which will not be repeated here.
FIG. 7 is a schematic diagram of a terminal provided in an embodiment of the present application. The terminal 7 may include a processor 70, a memory 71, and a computer program 72 stored in the memory 71 and executable on the processor 70, for example a nearest neighbor search program. When the processor 70 executes the computer program 72, the steps in the foregoing embodiments of the nearest neighbor search method are implemented, for example steps S101 to S104 shown in FIG. 1. Alternatively, when the processor 70 executes the computer program 72, the functions of the modules/units in the foregoing apparatus embodiments are implemented, for example the feature compression unit 601, the neighbor graph construction unit 602, the feature acquisition unit 603, and the nearest neighbor search unit 604 shown in FIG. 6.
The computer program may be divided into one or more modules/units, which are stored in the memory 71 and executed by the processor 70 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of accomplishing specific functions, and the instruction segments are used to describe the execution process of the computer program in the terminal.
For example, the computer program may be divided into a feature compression unit, a neighbor graph construction unit, a feature acquisition unit, and a nearest neighbor search unit.
The specific functions of the units are as follows: the feature compression unit is configured to input a reference high-dimensional feature of a reference object into a feature compression network to obtain a reference low-dimensional feature output by the feature compression network, where the loss function of the feature compression network is a function obtained based on high-dimensional neighbor relationships of sample objects and low-dimensional neighbor relationships of the sample objects; the neighbor graph construction unit is configured to construct a low-dimensional neighbor graph using the reference low-dimensional feature; the feature acquisition unit is configured to acquire a target feature of a target object; and the nearest neighbor search unit is configured to perform a nearest neighbor search based on the low-dimensional neighbor graph and the target feature to obtain the reference object closest to the target object.
The terminal may include, but is not limited to, the processor 70 and the memory 71. Those skilled in the art will understand that FIG. 7 is merely an example of a terminal and does not constitute a limitation on the terminal, which may include more or fewer components than shown, or combine certain components, or include different components; for example, the terminal may also include input/output devices, network access devices, buses, and so on.
The processor 70 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 71 may be an internal storage unit of the terminal, such as a hard disk or memory of the terminal. The memory 71 may also be an external storage device of the terminal, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the terminal. Further, the memory 71 may include both an internal storage unit of the terminal and an external storage device. The memory 71 is used to store the computer program and other programs and data required by the terminal, and may also be used to temporarily store data that has been output or is to be output.
Those skilled in the art will clearly understand that, for convenience and brevity of description, only the division of the above functional units and modules is used for illustration. In practical applications, the above functions may be assigned to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing them from one another and are not used to limit the protection scope of the present application. For the specific working processes of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments, which will not be repeated here.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not detailed or recorded in a certain embodiment, reference may be made to the relevant descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementations should not be regarded as exceeding the scope of the present application.
In the embodiments provided in this application, it should be understood that the disclosed apparatus/terminal and method may be implemented in other ways. For example, the apparatus/terminal embodiments described above are merely illustrative; the division of the modules or units is only a logical functional division, and there may be other division methods in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present application may also be completed by instructing relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, the computer program implements the steps of the foregoing method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
The above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments, or make equivalent replacements for some of the technical features therein; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be included within the protection scope of the present application.

Claims (11)

  1. A nearest neighbor search method, comprising:

    inputting a reference high-dimensional feature of a reference object into a feature compression network to obtain a reference low-dimensional feature output by the feature compression network, wherein a loss function of the feature compression network is a function obtained based on high-dimensional neighbor relationships of sample objects and low-dimensional neighbor relationships of the sample objects;

    constructing a low-dimensional neighbor graph using the reference low-dimensional feature;

    acquiring a target feature of a target object; and

    performing a nearest neighbor search based on the low-dimensional neighbor graph and the target feature to obtain a reference object closest to the target object.
  2. The nearest neighbor search method according to claim 1, wherein the high-dimensional neighbor relationship is a high-dimensional Euclidean distance between sample high-dimensional features respectively associated with every two of the multiple sample objects, and the low-dimensional neighbor relationship is a low-dimensional Euclidean distance between sample low-dimensional features respectively associated with every two of the multiple sample objects; and

    the loss function of the feature compression network is a function obtained based on error values between the high-dimensional Euclidean distances and the corresponding low-dimensional Euclidean distances, and weight values associated with the high-dimensional Euclidean distances, wherein the value of each weight is related to the magnitude of its associated high-dimensional Euclidean distance.
  3. The nearest neighbor search method according to claim 2, wherein the calculation of the loss value of the loss function comprises:

    calculating the error value between the high-dimensional Euclidean distance and the low-dimensional Euclidean distance corresponding to the high-dimensional Euclidean distance;

    calculating the weight value associated with the high-dimensional Euclidean distance, and using each weight value to weight and sum the error values to obtain an accumulated value; and

    dividing the accumulated value by the square of the total number of the sample objects to obtain the loss value of the loss function.
  4. The nearest neighbor search method according to claim 2 or 3, wherein the calculation of the weight value comprises:

    obtaining a first hyperparameter and a second hyperparameter, wherein the first hyperparameter is greater than the second hyperparameter;

    calculating an average value of the high-dimensional Euclidean distances;

    calculating the negative of the natural logarithm of the quotient of the high-dimensional Euclidean distance and the average value; and

    determining the maximum of the negative value and the second hyperparameter, and taking the minimum of the first hyperparameter and the maximum as the weight value associated with the high-dimensional Euclidean distance.
  5. The nearest neighbor search method according to any one of claims 1 to 3, wherein the feature compression network comprises a compression module, a projection module, and a global optimization module, and the compression module comprises a first linear mapping module, a second linear mapping module, and a feature compression module;
    the inputting of the reference high-dimensional feature of the reference object into the feature compression network to obtain the reference low-dimensional feature output by the feature compression network comprises:
    inputting the reference high-dimensional feature into the feature compression module, the first linear mapping module, and the projection module to obtain a first feature output by the feature compression module, a second feature output by the first linear mapping module, and at least one third feature output by the projection module, wherein the first feature, the second feature, and each of the third features have the same dimensionality as the reference low-dimensional feature;
    inputting the first feature, the second feature, and the at least one third feature into the global optimization module to obtain a fourth feature and at least one fifth feature output by the global optimization module; and
    inputting the fourth feature and the at least one fifth feature into the compression module to obtain the reference low-dimensional feature output by the compression module.
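The module wiring in claim 5 is a three-stage pipeline. The sketch below shows only the data flow; every module is a hypothetical callable, since the claim does not fix the modules' internals:

```python
def compress(x, feature_compression, linear_map_1, projections,
             global_optimize, compression):
    """Data flow of the claimed feature compression network.
    All five module arguments are assumed stand-in callables."""
    f1 = feature_compression(x)              # first feature
    f2 = linear_map_1(x)                     # second feature
    f3s = [p(x) for p in projections]        # third feature(s)
    f4, f5s = global_optimize(f1, f2, f3s)   # fourth and fifth feature(s)
    return compression(f4, f5s)              # reference low-dimensional feature
```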
  6. The nearest neighbor search method according to claim 5, wherein the global optimization module comprises at least one encoder;
    the inputting of the first feature, the second feature, and the at least one third feature into the global optimization module to obtain the fourth feature and the at least one fifth feature output by the global optimization module comprises:
    combining the first feature and the at least one third feature into a first vector, and inputting the first vector into a first encoder of the at least one encoder to obtain a second vector output by the first encoder based on a multi-head attention mechanism and a linear mapping layer, wherein the second vector comprises a sixth feature corresponding to the first feature and at least one seventh feature in one-to-one correspondence with the at least one third feature;
    adding the sixth feature corresponding to the first feature in the second vector to the second feature to obtain an eighth feature;
    replacing the sixth feature in the second vector with the eighth feature to obtain a third vector; and
    inputting the third vector into a second encoder of the at least one encoder, and so on, until a target vector output by the last encoder of the at least one encoder is obtained, the target vector comprising the fourth feature and the at least one fifth feature.
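The encoder chain of claim 6 can be sketched as follows. Each encoder is assumed to be a callable that maps a list of features to a list of the same length; only the first slot receives the residual addition of the second feature:

```python
def global_optimize(f1, f2, f3_list, encoders):
    """Sketch of the claimed global optimization module.
    encoders: assumed list of callables, each mapping a feature list
    to a feature list of the same length."""
    vec = [f1] + f3_list           # first vector
    vec = encoders[0](vec)         # second vector: [sixth, seventh...]
    vec[0] = vec[0] + f2           # eighth feature replaces the sixth
    for enc in encoders[1:]:       # remaining encoders, in order
        vec = enc(vec)
    return vec[0], vec[1:]         # fourth feature, fifth feature(s)
```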
  7. The nearest neighbor search method according to claim 6, wherein the calculation of the output value of the multi-head attention mechanism comprises:
    determining a first input value, a second input value, and a third input value of the multi-head attention mechanism based on the first vector;
    performing mapping on the first input value, the second input value, and the third input value respectively to obtain a first mapped value corresponding to the first input value, a second mapped value corresponding to the second input value, and a third mapped value corresponding to the third input value, wherein the dimensionalities of the first mapped value and the second mapped value are both lower than the dimensionality of the third mapped value;
    computing a merging parameter for the third mapped value using the first mapped value and the second mapped value; and
    computing the output value of the multi-head attention mechanism using the third mapped value and the merging parameter.
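One common realization of this pattern is scaled dot-product attention in which the query and key projections (first and second mapped values) are lower-dimensional than the value projection (third mapped value), and the softmax scores act as the merging parameters. A single-head sketch under these assumptions, with hypothetical weight matrices `Wq`, `Wk`, `Wv`:

```python
import numpy as np

def attention_head(x, Wq, Wk, Wv):
    """One head: Q and K are mapped to a lower dimension than V;
    softmax(QK^T / sqrt(dk)) supplies the merging parameters for V."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv          # first/second/third mapped values
    dk = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(dk)            # merging parameters (pre-softmax)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                        # output value: merged rows of V
```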
  8. The nearest neighbor search method according to any one of claims 1 to 3, wherein performing the nearest neighbor search based on the low-dimensional neighbor graph and the target feature to obtain the reference object closest to the target object comprises:
    replacing each reference low-dimensional feature in the low-dimensional neighbor graph with its corresponding reference high-dimensional feature to obtain a high-dimensional neighbor graph corresponding to the low-dimensional neighbor graph; and
    performing a nearest neighbor search based on the high-dimensional neighbor graph and the target feature to obtain the reference object closest to the target object.
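A common way to search such a neighbor graph is greedy descent: start at some node and repeatedly move to the neighbor closest to the query until no neighbor improves. The claim only specifies that the search runs on the high-dimensional graph, so the strategy below is an illustrative assumption:

```python
import numpy as np

def greedy_search(graph, feats, query, start=0):
    """Greedy nearest-neighbor descent on a neighbor graph.
    graph: node -> list of neighbor node ids; feats: node -> feature vector.
    A sketch of one common search strategy, not necessarily the claimed one."""
    cur = start
    while True:
        best, best_d = cur, np.linalg.norm(feats[cur] - query)
        for nb in graph[cur]:                      # examine the neighborhood
            d = np.linalg.norm(feats[nb] - query)
            if d < best_d:
                best, best_d = nb, d
        if best == cur:                            # local minimum reached
            return cur
        cur = best
```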
  9. A nearest neighbor search apparatus, comprising:
    a feature compression unit, configured to input a reference high-dimensional feature of a reference object into a feature compression network to obtain a reference low-dimensional feature output by the feature compression network, wherein the loss function of the feature compression network is a function obtained based on a high-dimensional neighbor relationship of a sample object and a low-dimensional neighbor relationship of the sample object;
    a neighbor graph construction unit, configured to build a low-dimensional neighbor graph using the reference low-dimensional features;
    a feature acquisition unit, configured to acquire a target feature of a target object; and
    a nearest neighbor search unit, configured to perform a nearest neighbor search based on the low-dimensional neighbor graph and the target feature to obtain the reference object closest to the target object.
  10. A terminal, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 8.
  11. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
PCT/CN2022/099850 2021-10-21 2022-06-20 Nearest neighbor search method and apparatus, terminal, and storage medium WO2023065696A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111227715.7A CN113868291A (en) 2021-10-21 2021-10-21 Nearest neighbor searching method, device, terminal and storage medium
CN202111227715.7 2021-10-21

Publications (1)

Publication Number Publication Date
WO2023065696A1 true WO2023065696A1 (en) 2023-04-27

Family

ID=78997008

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/099850 WO2023065696A1 (en) 2021-10-21 2022-06-20 Nearest neighbor search method and apparatus, terminal, and storage medium

Country Status (2)

Country Link
CN (1) CN113868291A (en)
WO (1) WO2023065696A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113868291A (en) * 2021-10-21 2021-12-31 深圳云天励飞技术股份有限公司 Nearest neighbor searching method, device, terminal and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107316057A (en) * 2017-06-07 2017-11-03 哈尔滨工程大学 Based on the nuclear power unit method for diagnosing faults being locally linear embedding into K nearest neighbor classifiers
CN109558899A (en) * 2018-11-13 2019-04-02 中国石油天然气股份有限公司 Method of Data with Adding Windows and device
WO2019219198A1 (en) * 2018-05-17 2019-11-21 Huawei Technologies Co., Ltd. Device and method for clustering of input-data
CN110717519A (en) * 2019-09-09 2020-01-21 深圳大学 Training, feature extraction and classification method, device and storage medium
CN112200133A (en) * 2020-10-28 2021-01-08 支付宝(杭州)信息技术有限公司 Privacy-protecting face recognition method and device
CN113868291A (en) * 2021-10-21 2021-12-31 深圳云天励飞技术股份有限公司 Nearest neighbor searching method, device, terminal and storage medium

Also Published As

Publication number Publication date
CN113868291A (en) 2021-12-31

Similar Documents

Publication Publication Date Title
CN111027575B (en) Semi-supervised semantic segmentation method for self-attention confrontation learning
CN107256262B (en) Image retrieval method based on object detection
Snavely et al. Skeletal graphs for efficient structure from motion
Zhu et al. Exploring consistent preferences: discrete hashing with pair-exemplar for scalable landmark search
JP5926291B2 (en) Method and apparatus for identifying similar images
US8892542B2 (en) Contextual weighting and efficient re-ranking for vocabulary tree based image retrieval
Wei et al. Projected residual vector quantization for ANN search
WO2023065697A1 (en) Product quantization search method and apparatus, and terminal and storage medium
Zhu et al. Learning compact visual representation with canonical views for robust mobile landmark search
US10839006B2 (en) Mobile visual search using deep variant coding
CN110458175B (en) Unmanned aerial vehicle image matching pair selection method and system based on vocabulary tree retrieval
CN115630236B (en) Global quick retrieval and positioning method, storage medium and equipment for passive remote sensing image
WO2023065696A1 (en) Nearest neighbor search method and apparatus, terminal, and storage medium
CN112948601B (en) Cross-modal hash retrieval method based on controlled semantic embedding
EP3115908A1 (en) Method and apparatus for multimedia content indexing and retrieval based on product quantization
US11281645B2 (en) Data management system, data management method, and computer program product
CN115565177A (en) Character recognition model training method, character recognition device, character recognition equipment and medium
WO2022007596A1 (en) Image retrieval system, method and apparatus
CN107133348B (en) Approximate searching method based on semantic consistency in large-scale picture set
WO2021051562A1 (en) Facial feature point positioning method and apparatus, computing device, and storage medium
CN114780781B (en) Product quantification method based on fuzzy clustering and asymmetric distance calculation
CN114691918B (en) Radar image retrieval method and device based on artificial intelligence and electronic equipment
CN115457638A (en) Model training method, data retrieval method, device, equipment and storage medium
CN114897075A (en) Heterogeneous image matching method based on meta-template knowledge base
CN110941730B (en) Retrieval method and device based on human face feature data migration

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22882319

Country of ref document: EP

Kind code of ref document: A1