CN117556068A - Training method of target index model, information retrieval method and device

Info

Publication number: CN117556068A (application CN202410043764.2A; granted as CN117556068B)
Authority: CN (China)
Prior art keywords: training, vector, vectors, classification model, label
Legal status: Granted; Active
Inventors: 连德富 (Lian Defu), 陈恩红 (Chen Enhong), 黎武超 (Li Wuchao), 冯超 (Feng Chao)
Assignee: University of Science and Technology of China (USTC)
Other languages: Chinese (zh)


Classifications

    • G06F16/51 — Information retrieval of still image data: indexing; data structures therefor; storage structures
    • G06F16/31 — Information retrieval of unstructured textual data: indexing; data structures therefor; storage structures
    • G06F16/35 — Information retrieval of unstructured textual data: clustering; classification
    • G06F16/55 — Information retrieval of still image data: clustering; classification
    • G06N3/02 — Computing arrangements based on biological models: neural networks
    • G06N3/08 — Neural networks: learning methods

Abstract

The invention provides a training method for a target index model, an information retrieval method, and corresponding devices. The training method comprises: performing vector conversion on product data to obtain a plurality of first training vectors, the product data comprising image data or text data; for each second training vector, determining, based on a preset estimation method, a first neighbor vector corresponding to the second training vector among a plurality of second training vectors, wherein the first neighbor vector serves as the first label vector of the second training vector; training a first classification model constructed based on a Transformer by using the plurality of second training vectors and the plurality of first label vectors to obtain a second classification model, wherein each node of the output layer in the first classification model comprises a preset number of first training vectors; and training a third classification model by using a plurality of third training vectors and a plurality of second label vectors to obtain the target index model, wherein the third training vectors and the second label vectors are generated by using the second classification model.

Description

Training method of target index model, information retrieval method and device
Technical Field
The present invention relates to the field of information retrieval technology, and more particularly, to a training method, an information retrieval method, a training device, and an information retrieval device for a target index model.
Background
Vector retrieval aims to help a user select, from a large number of data points, the k points nearest to a query point, i.e., the nearest neighbor search (Nearest Neighbor Search, NNS) problem, and is widely used in information retrieval and recommendation systems. However, computing the distance between the query vector and every vector in the database by brute force has an intolerable time overhead. To perform vector retrieval efficiently, index structures such as trees, graphs, locality sensitive hashing and quantization are usually built; these index structures effectively reduce the time overhead while preserving the accuracy of vector retrieval to a certain extent.
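For reference, this brute-force baseline can be sketched as follows (a minimal numpy illustration for exposition only; it is not part of the claimed method):

```python
import numpy as np

def brute_force_knn(query: np.ndarray, base: np.ndarray, k: int) -> np.ndarray:
    """Exact nearest neighbor search: one distance per base vector, O(N*d) per query."""
    dists = np.linalg.norm(base - query, axis=1)  # Euclidean distance to all N vectors
    return np.argsort(dists)[:k]                  # indices of the k nearest neighbors

rng = np.random.default_rng(0)
base = rng.standard_normal((10_000, 128)).astype(np.float32)
query = rng.standard_normal(128).astype(np.float32)
print(brute_force_knn(query, base, k=10))  # cost grows linearly with the database size
```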
In the process of implementing the disclosed concept, it was found that the related art suffers from at least the following problems: during information matching with existing index models, the agreement between the retrieved matching vector and the query vector is poor; meanwhile, the index model occupies a large amount of storage space, and retrieval efficiency is low.
Disclosure of Invention
In view of this, embodiments of the present invention provide a training method for a target index model, an information retrieval method, a training apparatus for a target index model, an information retrieval apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
An aspect of an embodiment of the present invention provides a training method for a target index model, which is applied to an information retrieval system, where the method includes:
obtaining a plurality of product data from a database in response to a training instruction, wherein the product data comprises image data or text data;
vector conversion is carried out on the product data to obtain a plurality of first training vectors;
for each second training vector, determining, by a processor, a first neighbor vector corresponding to the second training vector among a plurality of second training vectors based on a preset estimation method, wherein the first neighbor vector serves as a first label vector of the second training vector, and the second training vectors are selected from the plurality of first training vectors;
training a first classification model constructed based on a Transformer by using a plurality of second training vectors and a plurality of first label vectors to obtain a second classification model, wherein each node of an output layer in the first classification model comprises a preset number of first training vectors;
training a third classification model by using a plurality of third training vectors and a plurality of second label vectors to obtain the target index model, wherein the third classification model is constructed by using a balanced tree algorithm, and the third training vectors and the second label vectors are generated by using the second classification model.
According to an embodiment of the present invention, the third training vector and the second tag vector are generated by:
inputting a preset number of first training vectors into the second classification model for each node, and outputting an allocation confidence corresponding to each first training vector, wherein the allocation confidence represents the probability that the first training vector is allocated to the node;
determining the first training vector with the greatest allocation confidence as the third training vector;
processing a plurality of the third training vectors by using the second classification model, and outputting a plurality of confidence degrees corresponding to different nodes of each third training vector;
for each of the third training vectors, determining a plurality of first training vectors in a node corresponding to a maximum confidence among the plurality of confidences of the third training vector as second label vectors of the third training vector.
According to an embodiment of the present invention, determining, with a processor, a first neighbor vector corresponding to the second training vector among a plurality of the second training vectors based on a preset estimation method includes:
estimating, by the processor, for each second training vector, the distance between that second training vector and each of the other second training vectors based on the preset estimation method;
determining a second training vector corresponding to a minimum distance as the first neighbor vector when the preset estimation method is a Euclidean distance calculation method or a cosine distance calculation method;
and determining a second training vector corresponding to the maximum distance as the first neighbor vector when the preset estimation method is an inner product distance calculation method.
According to an embodiment of the present invention, training, based on the processor, a first classification model using a plurality of the second training vectors and a plurality of first tag vectors to obtain a second classification model includes:
inputting the second training vector into the first classification model in the processor, and outputting a first prediction confidence, wherein the first prediction confidence represents a probability that the second training vector is assigned to a node where the label vector is located, the first classification model comprises a multi-bifurcation tree model, and an output layer of the multi-bifurcation tree model comprises a plurality of the nodes;
inputting the first prediction confidence and the first label vector into a first target loss function based on the processor, and outputting a first loss result;
iteratively adjusting network parameters of the first classification model according to the first loss result to obtain the trained second classification model.
According to an embodiment of the present invention, the first objective loss function is as shown in formula (1):

$$\min\; -\sum_{h=1}^{H} \log \mathrm{Prob}\left( y_h \,\middle|\, \mathrm{Cat}(\mathrm{start},\, y_{<h}),\ \mathrm{Enc}(q) \right) \qquad (1)$$

wherein $q$ represents a second training vector, $y_h$ and $y_{<h}$ respectively represent the branch of the label vector $y$ at layer $h$ of the bifurcation tree model and the branch sequence before layer $h$, $\mathrm{Enc}(q)$ represents the encoded token vector of $q$, $\mathrm{Cat}$ denotes sequence concatenation, $\mathrm{Prob}$ denotes the prediction confidence, $H$ represents the height of the bifurcation tree model, and $\mathrm{start}$ is a prefix parameter of the first classification model.
According to an embodiment of the present invention, training, based on the processor, the third classification model using a plurality of third training vectors and a plurality of second label vectors to obtain the target index model includes:
hierarchical division is carried out on vector space by the processor through the balanced tree algorithm, and the third classification model is obtained;
distributing the first training vectors to a plurality of nodes of an output layer in the third classification model according to a preset rule;
inputting the third training vector into the third classification model in the processor, and outputting a second prediction confidence, wherein the second prediction confidence characterizes a probability that the third training vector is assigned to a node where the second label vector is located, and the third classification model comprises a multi-bifurcation tree model;
inputting the second prediction confidence and the second label vector into a second target loss function, and outputting a second loss result;
and generating the trained target index model according to the second loss result.
According to an embodiment of the present invention, generating the trained target index model according to the second loss result includes:
under the condition that the second loss result does not meet a preset convergence condition, iteratively training the first classification model by using a plurality of third training vectors and a plurality of second label vectors to obtain a new second classification model, so as to generate a plurality of new third training vectors and a plurality of new second label vectors by using the new second classification model;
and training the new third classification model by using the new third training vectors and the new second label vectors to obtain the target index model.
Another aspect of an embodiment of the present invention provides an information retrieval method, including:
acquiring target query data and a plurality of data to be matched, wherein the target query data and the data to be matched comprise image data or text data;
vector conversion is carried out on the target query data and the plurality of data to be matched, so that a target query vector and a plurality of vectors to be matched are respectively obtained;
Inputting the target query vector into a target index model, and outputting a plurality of prediction confidence degrees, wherein the prediction confidence degrees represent the probability that the target query vector is distributed to one node of an output layer in the target index model, and the node stores a preset number of vectors to be matched;
and determining the vector to be matched in the node corresponding to the maximum confidence value in the plurality of prediction confidence values as a target matching vector.
Another aspect of the embodiment of the present invention provides a training apparatus for a target index model, including:
a first acquisition module for acquiring a plurality of product data from a database in response to a training instruction, wherein the product data comprises image data or text data;
the first conversion module is used for carrying out vector conversion on the product data to obtain a plurality of first training vectors;
the first determining module is used for determining, by the processor, a first neighbor vector corresponding to the second training vector among a plurality of second training vectors based on a preset estimation method, wherein the first neighbor vector characterizes a first label vector of the second training vector, and the second training vector is selected from the initial training set;
The first training module is used for training a first classification model constructed based on a Transformer by utilizing a plurality of second training vectors and a plurality of first label vectors to obtain a second classification model, wherein each node of an output layer in the first classification model comprises a preset number of first training vectors;
and the second training module is used for training a third classification model by using a plurality of third training vectors and a plurality of second label vectors to obtain the target index model, wherein the third classification model is constructed by using a balanced tree algorithm, and the third training vectors and the second label vectors are generated by using the second classification model.
Another aspect of an embodiment of the present invention provides an information retrieval apparatus, including:
the second acquisition module is used for acquiring target query data and a plurality of data to be matched, wherein the target query data and the data to be matched comprise image data or text data;
the second conversion module is used for carrying out vector conversion on the target query data and a plurality of data to be matched to obtain a target query vector and a plurality of vectors to be matched respectively;
the prediction module is used for inputting the target query vector into a target index model and outputting a plurality of prediction confidence degrees, wherein the prediction confidence degrees represent the probability that the target query vector is distributed to one node of an output layer in the target index model, and the node stores a preset number of vectors to be matched;
And the second determining module is used for determining the vector to be matched in the node corresponding to the maximum confidence value in the plurality of prediction confidence values as a target matching vector.
Another aspect of an embodiment of the present invention provides an electronic device, including: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described above.
Another aspect of embodiments of the invention provides a computer-readable storage medium storing computer-executable instructions that, when executed, are configured to implement a method as described above.
Another aspect of embodiments of the present invention provides a computer program product comprising computer executable instructions which, when executed, are adapted to carry out the method as described above.
According to the embodiments of the present invention, second training vectors are selected from the plurality of first training vectors, and a first neighbor vector corresponding to each second training vector is determined among the plurality of second training vectors using a preset estimation method. The first classification model is trained using the plurality of second training vectors and the plurality of first label vectors to obtain the second classification model, and the third training vectors and second label vectors are generated by self-distillation using the second classification model. More representative third training vectors and second label vectors can therefore be used in training the target index model, yielding a target index model with higher retrieval accuracy. This at least partially solves the technical problem in the related art of poor agreement between the predicted matching vector and the query vector during information matching, reduces the computing resources required for information retrieval, and improves retrieval efficiency.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following description of embodiments of the present invention with reference to the accompanying drawings, in which:
FIG. 1 illustrates an exemplary system architecture to which the training method of a target index model or the information retrieval method may be applied, according to an embodiment of the present invention;
FIG. 2 illustrates a flow chart of a training method of a target index model according to an embodiment of the invention;
FIG. 3 shows a flow chart of an information retrieval method according to an embodiment of the invention;
FIG. 4 shows a block diagram of a training apparatus for a target index model according to an embodiment of the present invention;
FIG. 5 shows a block diagram of an information retrieval apparatus according to an embodiment of the present invention;
fig. 6 shows a block diagram of an electronic device adapted to implement the method described above, according to an embodiment of the invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. It should be understood that the description is only illustrative and is not intended to limit the scope of the invention. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where expressions like "at least one of A, B and C" are used, they should generally be interpreted according to the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
HNSW (Hierarchical Navigable Small World) graph indexes are the most advanced of the conventional index structures in terms of efficiency and accuracy, but their hierarchical graph structure incurs significant space overhead, which limits their application to large-scale data sets. Index structures based on space partitioning, although smaller in space overhead, are not as efficient and accurate as HNSW.
Around this research problem, researchers have proposed learnable indexes based on space partitioning, aiming to improve index performance with powerful neural networks from machine learning while greatly reducing the space overhead of the index. One representative work is a multi-layer perceptron model over an inverted-list structure, which uses a two-layer neural network to partition the vectors of a dataset into different buckets, each node in the last layer of the network representing one bucket. The neural network and the index structure converge through iterative training and space partitioning. When a user issues a query vector, the neural network selects the buckets with the highest scores, performs exact distance computation against the vectors in those buckets, and finally returns the k nearest neighbors (a sketch of this bucket-probing retrieval follows the list below). However, such an index structure is still limited in several respects:
1. the generalization ability of a learnable index is limited by the granularity of the space partitioning, i.e., the number of buckets. Existing techniques struggle with the learning difficulties caused by the ever finer space partitioning required as the database grows;
2. high-quality labels are key to improving the generalization ability of a learnable index, but obtaining exact neighbors for query vectors carries an excessive computational burden, and the approximate neighbors estimated by existing methods are not accurate enough, so the generalization ability of the learned index is insufficient;
3. the efficiency of training a learnable index depends primarily on the size of the query set. Existing methods typically generate the query set by uniformly sampling the dataset, which makes it difficult to ensure that the query set adequately covers the dataset, leaving considerable room to optimize the training efficiency of learnable indexes.
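A minimal sketch of the bucket-probing retrieval described above, assuming a generic `score_buckets` callable in place of the two-layer neural network (the function name and signature are illustrative, not defined by the disclosure):

```python
import numpy as np

def search_learned_buckets(query, score_buckets, buckets, k, top_m=4):
    """Bucket-probing retrieval over an inverted-list style learnable index.
    `score_buckets` is any callable mapping a query vector to one score per
    bucket; `buckets` is a list of arrays of the vectors each bucket holds."""
    scores = score_buckets(query)                        # shape (B,): one score per bucket
    probe = np.argsort(scores)[::-1][:top_m]             # probe only the top-m buckets
    candidates = np.concatenate([buckets[b] for b in probe], axis=0)
    dists = np.linalg.norm(candidates - query, axis=1)   # exact distances on the small set
    return candidates[np.argsort(dists)[:k]]             # k nearest neighbors found
```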
In view of this, the embodiments of the present invention provide a training method for a target index model, an information retrieval method, and corresponding apparatuses. The training method includes: obtaining a plurality of product data from a database in response to a training instruction, wherein the product data comprises image data or text data; performing vector conversion on the product data to obtain a plurality of first training vectors; for each second training vector, determining, by a processor, a first neighbor vector corresponding to the second training vector among a plurality of second training vectors based on a preset estimation method, wherein the first neighbor vector serves as a first label vector of the second training vector, and the second training vectors are selected from an initial training set; training a first classification model constructed based on a Transformer by using a plurality of second training vectors and a plurality of first label vectors to obtain a second classification model, wherein each node of an output layer in the first classification model comprises a preset number of first training vectors; and training a third classification model by using a plurality of third training vectors and a plurality of second label vectors to obtain the target index model, wherein the third classification model is constructed by using a balanced tree algorithm, and the third training vectors and the second label vectors are generated by using the second classification model.
Fig. 1 is only an example of a system architecture to which embodiments of the present invention may be applied to assist those skilled in the art in understanding the technical content of the present invention, but does not mean that embodiments of the present invention may not be used in other devices, systems, environments, or scenarios.
As shown in fig. 1, the system architecture 100 of this embodiment may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is a medium for providing a communication link between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links, and the like.
The user may interact with the server 105 via the network 104 using the first terminal device 101, the second terminal device 102, the third terminal device 103, to receive or send messages etc. Various communication client applications, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, and/or social platform software, etc. (by way of example only) may be installed on the first terminal device 101, the second terminal device 102, the third terminal device 103.
The first terminal device 101, the second terminal device 102, the third terminal device 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by the user using the first terminal device 101, the second terminal device 102, and the third terminal device 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that, the training method or the information retrieval method of the target index model provided by the embodiment of the present invention may be executed by one of the server 105, the first terminal device 101, the second terminal device 102, the third terminal device 103, and other servers or server clusters capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105. Accordingly, the training apparatus or the information retrieving apparatus of the object index model provided in the embodiments of the present invention may be generally set in one of the server 105, the first terminal device 101, the second terminal device 102, the third terminal device 103, and a server or a server cluster capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105.
It should be understood that the number of terminal devices, networks, servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, servers, as desired for implementation.
FIG. 2 shows a flowchart of a training method of a target index model according to an embodiment of the invention.
As shown in FIG. 2, the training method of the target index model applied to the information retrieval system comprises operations S201-S205.
In operation S201, a plurality of product data are acquired from a database in response to a training instruction, wherein the product data include image data or text data;
in operation S202, vector conversion is performed on the product data to obtain a plurality of first training vectors;
in operation S203, for each second training vector, determining, by the processor, a first neighbor vector corresponding to the second training vector among a plurality of second training vectors based on a preset estimation method, wherein the first neighbor vector characterizes a first label vector of the second training vector, and the second training vector is selected from the plurality of first training vectors;
in operation S204, training a first classification model constructed based on a Transformer by using a plurality of second training vectors and a plurality of first label vectors to obtain a second classification model, wherein each node of an output layer in the first classification model includes a preset number of first training vectors;
In operation S205, a third classification model is trained using a plurality of third training vectors and a plurality of second label vectors to obtain a target index model, wherein the third classification model is constructed using a balanced tree algorithm, and the third training vectors and the second label vectors are generated using the second classification model.
According to embodiments of the present invention, an information retrieval system may refer to a system that performs image matching or text matching, such as determining an image (or text) associated with a current image (or current text) from a plurality of images (or text to be matched).
According to an embodiment of the invention, an operator may perform an operation on the electronic device to issue a training request, so that the electronic device generates a corresponding training instruction in response to the training request. The image data may be images of different types, such as scenery images, face images and product images; text data may refer to text of different types, such as Word or PDF documents.
According to the embodiment of the invention, after the operator inputs the training request in the electronic device, the electronic device generates the corresponding training instruction; the electronic device can then acquire a plurality of product data from at least one database and perform vector conversion on the product data, thereby obtaining a first training vector corresponding to each product data. A predetermined number of first training vectors are selected from the plurality of first training vectors as second training vectors, for example 1% of the first training vectors. At the same time, the plurality of first training vectors are evenly distributed into the $B$ nodes of the output layer of a first classification model constructed based on a Transformer, where the number of nodes is $B = K^H$, $K$ is the branching factor of the K-ary tree model, $H$ is the height of the tree model, each node comprises $\lfloor N/B \rfloor$ or $\lfloor N/B \rfloor + 1$ vectors, each node is represented by the path of the node on the tree model, and $N$ is the number of first training vectors.
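A minimal sketch of this even partition (the values of K, H, N and the round-robin assignment rule below are illustrative assumptions; the disclosure only requires the split to be balanced):

```python
import numpy as np

K, H = 4, 2                          # branching factor and height of the K-ary tree
B = K ** H                           # number of output-layer nodes: B = K^H = 16
N, d = 1000, 64
vectors = np.random.default_rng(1).standard_normal((N, d))

# Even split: each node receives floor(N/B) or floor(N/B) + 1 vectors.
node_of = np.arange(N) % B           # one possible balanced assignment rule
# Each node is identified by its path in the tree, e.g. node 9 -> path (2, 1),
# matching the example path [2, 1] used later in the description.
paths = [np.unravel_index(b, (K,) * H) for b in range(B)]

counts = np.bincount(node_of)
print(B, counts.max() - counts.min(), paths[9])  # 16 nodes, sizes differ by <= 1, (2, 1)
```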
According to the embodiment of the invention, the processor determines a first neighbor vector corresponding to the second training vector from the plurality of second training vectors based on a preset estimation method, wherein the first neighbor vector can serve as the first label vector of the second training vector. For example, the neighbor vectors are determined by means of the $L_2$ distance, the inner product distance, or the cosine distance.
According to the embodiment of the invention, the first classification model is trained by using a plurality of second training vectors and a plurality of first label vectors to obtain a second classification model, and simultaneously, the plurality of first training vectors are processed by using the second classification model to obtain a third training vector and a second label vector. And training the third classification model by using a plurality of third training vectors and a plurality of second label vectors to obtain a target index model.
According to the embodiment of the invention, the second training vectors are selected from the plurality of first training vectors, the first neighbor vector corresponding to each second training vector is determined in the plurality of second training vectors by using a preset estimation method, the first classification model is trained by using the plurality of second training vectors and the plurality of first label vectors, the second classification model is obtained, the third training vector and the second label vector are generated by using the second classification model to carry out self-distillation, and therefore, the more representative third training vector and second label vector can be used in the training process of the target index model, and the target index model with higher retrieval accuracy is obtained. The method at least partially solves the technical problem of poor accuracy between the predicted matching vector and the query vector in the information matching process in the related technology, reduces the computing resources required by information retrieval and improves the retrieval efficiency.
According to an embodiment of the invention, the third training vector and the second label vector are generated by:
inputting a preset number of first training vectors into a second classification model for each node, and outputting an allocation confidence corresponding to each first training vector, wherein the allocation confidence characterizes the probability of the first training vector being allocated to the node;
determining the first training vector with the greatest allocation confidence as a third training vector;
processing a plurality of third training vectors using the second classification model, outputting a plurality of confidence levels corresponding to different nodes for each third training vector;
for each third training vector, determining a plurality of first training vectors in the nodes corresponding to a maximum confidence in the plurality of confidences of the third training vector as second label vectors of the third training vector.
According to an embodiment of the invention, each node in the second classification model contains $\lfloor N/B \rfloor$ or $\lfloor N/B \rfloor + 1$ first training vectors. These first training vectors are input into the trained second classification model, so that the allocation confidence with which each first training vector is allocated to the node can be obtained, and the first training vector with the largest allocation confidence is determined as the third training vector. For a certain node $c$, among the first training vectors $x$ contained in node $c$, the one with the highest confidence of being allocated to that node is selected as the representative query vector, i.e., the third training vector, as shown in formula (2):

$$q_c = \arg\max_{x \in c}\ \prod_{h=1}^{H} \mathrm{Prob}\left( c_h \,\middle|\, \mathrm{Cat}(\mathrm{start},\, c_{<h}),\ \mathrm{Enc}(x) \right) \qquad (2)$$

wherein $c_h$ characterizes the branch of node $c$ at layer $h$ in the K-ary tree model, and $q_c$ is the resulting query vector, i.e., the third training vector.
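A minimal sketch of this selection, assuming a hypothetical `path_confidence(x, path)` wrapper around the second classification model that returns the product of per-layer branch probabilities of formula (2):

```python
import numpy as np

def representative_queries(node_members, path_confidence):
    """Select each node's representative query vector per formula (2).
    `node_members` maps a node path to the array of first training vectors it
    holds; `path_confidence` is an assumed interface, not one defined here."""
    third_vectors = {}
    for path, members in node_members.items():
        conf = np.array([path_confidence(x, path) for x in members])
        third_vectors[path] = members[int(np.argmax(conf))]  # highest allocation confidence
    return third_vectors
```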
According to an embodiment of the present invention, a plurality of third training vectors are processed using a second classification model, a plurality of confidences corresponding to different nodes are output corresponding to each third training vector, and a plurality of first training vectors in a node corresponding to a maximum confidence among the plurality of confidences of the third training vectors are determined as second label vectors of the third training vectors.
In one embodiment, a decoder is first used to generate the top-m most likely sequences (node paths) for each third training vector, so that the third training vector is routed on the tree to the corresponding top-m nodes. The vectors of these m nodes are taken out and combined as a candidate set, exact distance computation is performed on the small number of vectors in the candidate set, and the top-k neighbor vectors closest to the third training vector are returned as the second label vectors of that third training vector.
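A minimal sketch of this label-generation step, assuming a hypothetical `decode_top_m` callable standing in for the decoder's top-m sequence generation:

```python
import numpy as np

def second_labels(q, decode_top_m, buckets, k):
    """Generate second label vectors for a third training vector q.
    `decode_top_m(q)` returns the m most likely node paths for q; `buckets`
    maps a node path to the array of vectors that node holds."""
    candidates = np.concatenate([buckets[p] for p in decode_top_m(q)], axis=0)
    dists = np.linalg.norm(candidates - q, axis=1)   # exact distances on the small set only
    return candidates[np.argsort(dists)[:k]]         # top-k neighbors = q's second labels
```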
According to an embodiment of the present invention, determining, with a processor, a first neighbor vector corresponding to a second training vector among a plurality of second training vectors based on a preset estimation method, includes:
estimating, with the processor, for each second training vector, the distance between that second training vector and each of the other second training vectors based on the preset estimation method;
Under the condition that the preset estimation method is a Euclidean distance calculation method or a cosine distance calculation method, determining a second training vector corresponding to the minimum distance as a first neighbor vector;
in the case that the preset estimation method is an inner product distance calculation method, the second training vector corresponding to the maximum distance is determined as the first neighbor vector.
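The three selection rules can be sketched as follows (a minimal numpy illustration; `others` is assumed to be the set of second training vectors excluding the query itself):

```python
import numpy as np

def first_neighbor(q, others, method="euclidean"):
    """Choose the first neighbor vector under the preset estimation method:
    minimum Euclidean or cosine distance, maximum inner product."""
    if method == "euclidean":
        return others[np.argmin(np.linalg.norm(others - q, axis=1))]
    if method == "cosine":
        cos_dist = 1.0 - (others @ q) / (
            np.linalg.norm(others, axis=1) * np.linalg.norm(q))
        return others[np.argmin(cos_dist)]
    if method == "inner_product":
        return others[np.argmax(others @ q)]  # larger inner product = closer
    raise ValueError(f"unknown estimation method: {method}")
```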
According to an embodiment of the present invention, training, based on a processor, a first classification model using a plurality of second training vectors and a plurality of first tag vectors, resulting in a second classification model, includes:
inputting the second training vector into a first classification model in the processor, and outputting a first prediction confidence, wherein the first prediction confidence characterizes the probability that the second training vector is distributed to the node where the label vector is located, the first classification model comprises a multi-bifurcation tree model, and an output layer of the multi-bifurcation tree model comprises a plurality of nodes;
inputting a first prediction confidence and a first label vector into a first target loss function based on the processor, and outputting a first loss result;
and iteratively adjusting network parameters of the first classification model according to the first loss result to obtain a trained second classification model.
Taking a two-layer quadtree as an example, assume that the path on the tree of the bucket (i.e., node) where a certain neighbor vector (i.e., first label vector) is located is [2, 1]. The first classification model uses [start, 2] to predict [2, 1], where 'start' is a special symbol used to initialize sequence generation. For each level of the tree, the training goal is, given the historical sequence, to predict the branching direction of the next level, which is a multi-class classification problem. Given a query vector q (i.e., the second training vector) and its neighbor vector (i.e., the first label vector), the trained second classification model is obtained by optimizing the first objective loss function of formula (3).
According to an embodiment of the present invention, the first objective loss function is as shown in formula (3):

$$\min\; -\sum_{h=1}^{H} \log \mathrm{Prob}\left( y_h \,\middle|\, \mathrm{Cat}(\mathrm{start},\, y_{<h}),\ \mathrm{Enc}(q) \right) \qquad (3)$$

wherein $q$ represents a second training vector, $y_h$ and $y_{<h}$ respectively represent the branch of the label vector $y$ at layer $h$ of the bifurcation tree model and the branch sequence before layer $h$, $\mathrm{Enc}(q)$ represents the encoded token vector of $q$, $\mathrm{Cat}$ denotes sequence concatenation, $\mathrm{Prob}$ denotes the prediction confidence, $H$ represents the height of the bifurcation tree model, and start is a prefix parameter of the first classification model.
It should be noted that y is a sequence label that the first classification model is expected to generate. For example, the label of a certain sample is the sequence [3, 1, 2], which is generated step by step during training; when generating 3, since no symbol has been generated yet, a special symbol start (i.e., the prefix parameter of the present disclosure) must be given, and this start is the same for all samples. The training process is thus: given [start], generate [3]; given [start, 3], generate [3, 1]; and given [start, 3, 1], generate the final label [3, 1, 2].
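A worked numerical sketch of formula (3) under teacher forcing, assuming the per-layer probabilities the model assigns to the true branches are already available:

```python
import numpy as np

def sequence_loss(step_probs):
    """Loss of formula (3) for one (q, y) pair: the sum over the H layers of the
    negative log probability assigned to the true branch at that layer."""
    return -np.sum(np.log(step_probs))

# For the label [3, 1, 2]: the model scores Prob(3 | [start]),
# Prob(1 | [start, 3]) and Prob(2 | [start, 3, 1]); the loss sums their -logs.
print(sequence_loss(np.array([0.7, 0.8, 0.9])))  # ~0.685
```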
According to an embodiment of the present invention, training, based on a processor, the third classification model using a plurality of third training vectors and a plurality of second label vectors to obtain a target index model includes:
the processor hierarchically partitions the vector space using a balanced tree algorithm to obtain a third classification model;
distributing a plurality of first training vectors to a plurality of nodes of an output layer in a third classification model according to a preset rule;
inputting a third training vector into a third classification model in the processor, and outputting a second prediction confidence, wherein the second prediction confidence characterizes the probability that the third training vector is distributed to the node where the second label vector is located, and the third classification model comprises a multi-bifurcation tree model;
inputting the second prediction confidence and the second label vector into a second target loss function, and outputting a second loss result;
a trained target index model is generated from the second loss result.
According to the embodiment of the invention, the processor hierarchically partitions the vector space with a balanced tree algorithm to obtain a new tree, i.e., the third classification model. The plurality of first training vectors are then allocated to a plurality of nodes of the output layer in the third classification model according to a preset rule. The third training vectors are input into the third classification model in the processor, and a second prediction confidence is output; the second prediction confidence and the second label vector are input into a second target loss function, and a second loss result is output; the trained target index model is generated according to the second loss result. The second objective loss function is the same as the first objective loss function.
According to an embodiment of the invention, generating a trained target index model from the second loss result comprises:
under the condition that the second loss result does not meet the preset convergence condition, iteratively training the first classification model by using a plurality of third training vectors and a plurality of second label vectors to obtain a new second classification model, so as to generate a plurality of new third training vectors and a plurality of new second label vectors by using the new second classification model;
and training the new third classification model by using the plurality of new third training vectors and the plurality of new second label vectors to obtain a target index model.
According to an embodiment of the present invention, the preset convergence condition may mean that a difference between the second loss results of two adjacent times is smaller than a preset value, and the preset value may be specifically set according to an actual situation, for example, 0.3.
According to the embodiment of the invention, if the difference between two adjacent second loss results is greater than or equal to the preset value, the first classification model is iteratively trained using a plurality of third training vectors and a plurality of second label vectors to obtain a new second classification model; a plurality of new third training vectors and a plurality of new second label vectors are generated using the new second classification model, and the new third classification model is trained with them to obtain the target index model. This alternation repeats until the difference between two adjacent second loss results is smaller than the preset value, at which point the third classification model trained in the last iteration can be determined as the target index model.
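A minimal sketch of this outer loop, assuming hypothetical `fit` and `distill` interfaces for the models described above (these names are assumptions for illustration, not APIs defined by the disclosure):

```python
def train_target_index(first_model, build_tree_model, queries, labels,
                       eps=0.3, max_rounds=50):
    """Alternate self-distillation and tree training until two adjacent
    second loss results differ by less than the preset value `eps`."""
    prev_loss = float("inf")
    third_model = None
    for _ in range(max_rounds):
        second_model = first_model.fit(queries, labels)   # train the classifier
        queries, labels = second_model.distill()          # new third vectors / second labels
        third_model = build_tree_model()                  # balanced-tree partition
        loss = third_model.fit(queries, labels)           # second loss result
        if abs(prev_loss - loss) < eps:                   # preset convergence condition
            break
        prev_loss = loss
    return third_model
```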
Fig. 3 shows a flow chart of an information retrieval method according to an embodiment of the invention.
As shown in FIG. 3, the information retrieval method includes operations S301-S304.
In operation S301, target query data and a plurality of data to be matched are acquired, wherein the target query data and the data to be matched each include image data or text data;
in operation S302, vector conversion is performed on the target query data and the plurality of data to be matched, so as to obtain a target query vector and a plurality of vectors to be matched respectively;
in operation S303, a target query vector is input into a target index model, and a plurality of prediction confidence degrees are output, wherein the prediction confidence degrees represent the probability that the target query vector is allocated to one node of an output layer in the target index model, and the node stores a preset number of vectors to be matched;
in operation S304, a vector to be matched in a node corresponding to a confidence maximum among the plurality of prediction confidence levels is determined as a target matching vector.
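Operations S303-S304 can be sketched as follows, assuming `index_model` is a callable mapping a query vector to one prediction confidence per output-layer node (an illustrative interface, not one defined by the disclosure):

```python
import numpy as np

def retrieve(query_vec, index_model, node_vectors):
    """Score every output-layer node, then return the vectors to be matched
    stored in the highest-confidence node as target matching vectors."""
    confidences = index_model(query_vec)       # one prediction confidence per node
    best = int(np.argmax(confidences))         # node with the maximum confidence
    return node_vectors[best]                  # its stored vectors to be matched
```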
In one embodiment, the target query data and the plurality of data to be matched are different types of commodity data, such as text data of commodities such as mobile phones and computers.
According to the embodiment of the invention, vector conversion is carried out on target query data and a plurality of data to be matched to respectively obtain a target query vector and a plurality of vectors to be matched, the target query vector is input into a target index model, a plurality of prediction confidence degrees corresponding to different vectors to be matched are output, and therefore the vectors to be matched in the nodes corresponding to the maximum confidence degrees can be determined as target matching vectors, and commodity data corresponding to the target matching vectors can be recommended to users in a visual mode.
In another embodiment, the target query data and the plurality of data to be matched are each different text documents. And carrying out vector conversion on different text documents to respectively obtain a target query vector and a plurality of vectors to be matched, inputting the target query vector into a target index model, and outputting a plurality of prediction confidence degrees corresponding to the different vectors to be matched, so that the vectors to be matched in the nodes corresponding to the maximum confidence degrees can be determined to be target matching vectors, and the text documents corresponding to the target matching vectors can be displayed to a user in a visual mode, so that the user can know matching texts similar to the text of the current target query data.
In another embodiment, the target query data and the plurality of data to be matched are different face images. And carrying out vector conversion on different face images to respectively obtain a target query vector and a plurality of vectors to be matched, inputting the target query vector into a target index model, and outputting a plurality of prediction confidence degrees corresponding to the different vectors to be matched, so that the vectors to be matched in the nodes corresponding to the maximum confidence degrees can be determined to be target matching vectors, and the face images corresponding to the target matching vectors can be displayed to a user in a visual mode, so that the user can know the matching face images similar to the face images of the current target query data.
According to the embodiment of the invention, the more representative third training vector and second label vector can be used in the training process of the target index model, and the third training vector and the second label vector are generated in a self-distillation mode through the second classification model, so that the target index model with higher retrieval accuracy can be obtained by training through the third training vector and the second label vector. The method at least partially solves the technical problem of poor accuracy between the predicted matching vector and the query vector in the information matching process in the related technology, and reduces the calculation resources required by information retrieval.
Fig. 4 shows a block diagram of a training apparatus for a target index model according to an embodiment of the invention.
As shown in fig. 4, the training apparatus 400 for the target index model includes a first acquisition module 410, a first conversion module 420, a first determination module 430, a first training module 440, and a second training module 450.
A first obtaining module 410, configured to obtain a plurality of product data from a database in response to a training instruction, where the product data includes image data or text data;
the first conversion module 420 is configured to perform vector conversion on the product data to obtain a plurality of first training vectors;
A first determining module 430, configured to determine, for each second training vector, a first neighbor vector corresponding to the second training vector from a plurality of second training vectors based on a preset estimation method, where the first neighbor vector characterizes a first label vector of the second training vector, and the second training vector is selected from the plurality of first training vectors;
a first training module 440, configured to train a first classification model constructed based on a Transformer by using a plurality of second training vectors and a plurality of first label vectors, to obtain a second classification model, where each node of an output layer in the first classification model includes a preset number of first training vectors;
and a second training module 450, configured to train a third classification model by using a plurality of third training vectors and a plurality of second label vectors to obtain a target index model, where the third classification model is constructed by using a balanced tree algorithm, and the third training vectors and the second label vectors are generated by using the second classification model.
According to the embodiments of the present invention, second training vectors are selected from the plurality of first training vectors, and a first neighbor vector corresponding to each second training vector is determined among the plurality of second training vectors using a preset estimation method. The first classification model is trained using the plurality of second training vectors and the plurality of first label vectors to obtain the second classification model, and the third training vectors and second label vectors are generated by self-distillation using the second classification model. More representative third training vectors and second label vectors can therefore be used in training the target index model, yielding a target index model with higher retrieval accuracy. This at least partially solves the technical problem in the related art of poor agreement between the predicted matching vector and the query vector during information matching, reduces the computing resources required for information retrieval, and improves retrieval efficiency.
According to an embodiment of the present invention, the third training vector and the second label vector are generated by a first obtaining unit, a first determining unit, a second obtaining unit, a second determining unit.
The first obtaining unit is used for inputting a preset number of first training vectors into the second classification model for each node, and outputting an allocation confidence corresponding to each first training vector, wherein the allocation confidence represents the probability that the first training vector is allocated to the node;
a first determining unit, configured to determine the first training vector with the greatest allocation confidence as a third training vector;
a second obtaining unit, configured to process a plurality of third training vectors by using a second classification model, and output a plurality of confidence degrees corresponding to different nodes for each third training vector;
and a second determination unit configured to determine, for each third training vector, a plurality of first training vectors in nodes corresponding to a maximum confidence among a plurality of confidences of the third training vector as second label vectors of the third training vector.
According to an embodiment of the present invention, the first determining module 430 includes an estimating unit, a third determining unit, and a fourth determining unit.
an estimating unit, configured to estimate, with the processor, for each second training vector, the distance between that second training vector and each of the other second training vectors based on the preset estimation method;
a third determining unit, configured to determine, as the first neighbor vector, a second training vector corresponding to the minimum distance in a case where the preset estimating method is a euclidean distance calculating method or a cosine distance calculating method;
and a fourth determining unit configured to determine, as the first neighbor vector, a second training vector corresponding to the maximum distance in a case where the preset estimating method is an inner product distance calculating method.
According to an embodiment of the present invention, the first training module 440 includes a third obtaining unit, a fourth obtaining unit, and a fifth obtaining unit.
A third obtaining unit, configured to input a second training vector into a first classification model in the processor, and output a first prediction confidence, where the first prediction confidence characterizes a probability that the second training vector is assigned to a node where the label vector is located, the first classification model includes a multi-bifurcation tree model, and an output layer of the multi-bifurcation tree model includes a plurality of nodes;
a fourth obtaining unit, configured to input a first prediction confidence and a first label vector into a first objective loss function based on the processor, and output a first loss result;
And a fifth obtaining unit, configured to iteratively adjust the network parameters of the first classification model according to the first loss result, to obtain a trained second classification model.
According to an embodiment of the present invention, the second training module 450 includes a sixth obtaining unit, an allocating unit, a seventh obtaining unit, an eighth obtaining unit, and a generating unit.
A sixth obtaining unit, configured to hierarchically divide the vector space, by the processor and using a balanced tree algorithm, to obtain a third classification model;
an allocating unit, configured to allocate the plurality of first training vectors to a plurality of nodes of an output layer in the third classification model according to a preset rule;
a seventh obtaining unit, configured to input a third training vector into the third classification model in the processor and to output a second prediction confidence, where the second prediction confidence characterizes the probability that the third training vector is assigned to the node where the second label vector is located, and the third classification model includes a multi-bifurcation tree model;
an eighth obtaining unit, configured to input the second prediction confidence and the second label vector into a second target loss function, and to output a second loss result;
and a generating unit, configured to generate the trained target index model according to the second loss result.
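Before turning to the generating unit's subunits, the following sketch illustrates one way the sixth obtaining unit and the allocating unit could divide the vector space and allocate the first training vectors; the median-split rule is an assumption chosen for simplicity, since the patent only requires some balanced tree algorithm here:

```python
import numpy as np

def balanced_tree_leaf_ids(vectors, depth):
    """Assign every first training vector to a leaf (output-layer node) of
    a balanced binary tree: each group is split into two equal halves at
    the median of its highest-variance coordinate, so all leaves end up
    holding (almost exactly) the same preset number of vectors."""
    codes = np.zeros(len(vectors), dtype=np.int64)
    groups = [np.arange(len(vectors))]
    for _ in range(depth):
        new_groups = []
        for idx in groups:
            dim = int(vectors[idx].var(axis=0).argmax())      # split coordinate
            order = idx[np.argsort(vectors[idx, dim])]
            left, right = order[:len(order) // 2], order[len(order) // 2:]
            codes[left], codes[right] = codes[left] * 2, codes[right] * 2 + 1
            new_groups += [left, right]
        groups = new_groups
    return codes  # leaf id per vector, in [0, 2**depth)
```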
According to an embodiment of the invention, the generating unit comprises an iteration subunit, a training subunit.
The iteration subunit is configured to, in a case where the second loss result does not meet the preset convergence condition, iteratively train the first classification model by using the plurality of third training vectors and the plurality of second label vectors to obtain a new second classification model, and to generate a plurality of new third training vectors and a plurality of new second label vectors by using the new second classification model;
and the training subunit is configured to train the new third classification model by using the plurality of new third training vectors and the plurality of new second label vectors to obtain the target index model.
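The overall alternation between the two subunits can be summarized in a compact loop; each callable below is an assumption standing in for a step described above, not the patent's actual code:

```python
def fit_target_index_model(train_teacher, train_student, distill, converged):
    """Self-distillation sketch: `train_teacher(pairs)` (re)trains the
    first classification model into a second classification model,
    `distill(model)` produces third training vectors with second label
    vectors, `train_student(pairs)` trains a fresh balanced-tree third
    classification model and returns it with its second loss result, and
    `converged(loss)` checks the preset convergence condition."""
    teacher = train_teacher(None)        # initial stage, using the first label vectors
    while True:
        pairs = distill(teacher)         # new third vectors and second labels
        student, loss = train_student(pairs)
        if converged(loss):
            return student               # the trained target index model
        teacher = train_teacher(pairs)   # refresh the teacher and iterate
```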
Fig. 5 shows a block diagram of an information retrieval apparatus according to an embodiment of the invention.
As shown in fig. 5, the information retrieval apparatus 500 includes a second acquisition module 510, a second conversion module 520, a prediction module 530, and a second determination module 540.
A second obtaining module 510, configured to obtain target query data and a plurality of data to be matched, where the target query data and the data to be matched each include image data or text data;
the second conversion module 520 is configured to perform vector conversion on the target query data and the plurality of data to be matched, so as to obtain a target query vector and a plurality of vectors to be matched respectively;
The prediction module 530 is configured to input the target query vector into the target index model, and output a plurality of prediction confidence degrees, where the prediction confidence degrees characterize a probability that the target query vector is allocated to a node of an output layer in the target index model, and the node stores a preset number of vectors to be matched;
the second determining module 540 is configured to determine a vector to be matched in a node corresponding to a confidence maximum among the plurality of prediction confidence levels as a target matching vector.
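Taken together, retrieval with the trained model reduces to a single forward pass plus a node lookup; in the sketch below, the model call and the node_store layout are assumptions:

```python
import numpy as np

def retrieve(target_index_model, query_vector, node_store):
    """`target_index_model` maps a target query vector to one prediction
    confidence per output-layer node; `node_store` maps a node id to the
    preset number of vectors to be matched stored in that node."""
    confidences = target_index_model(query_vector)   # one confidence per node
    best_node = int(np.argmax(confidences))          # confidence maximum
    return node_store[best_node]                     # the target matching vectors
```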
According to the embodiments of the present invention, more representative third training vectors and second label vectors, generated from the second classification model in a self-distillation manner, can be used in the training process of the target index model, so that training with the third training vectors and second label vectors yields a target index model with higher retrieval accuracy. This at least partially solves the technical problem in the related art of poor accuracy between the predicted matching vector and the query vector during information matching, and reduces the computing resources required for information retrieval.
Any number of the modules, units, or sub-units according to embodiments of the invention, or at least some of the functionality of any number of them, may be implemented in one module. Any one or more of the modules, units, or sub-units according to embodiments of the present invention may be split into multiple modules for implementation. Any one or more of the modules, units, or sub-units according to embodiments of the invention may be implemented at least in part as hardware circuitry, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on a chip, a system on a substrate, a system on a package, an application specific integrated circuit (ASIC), or in hardware or firmware in any other reasonable manner of integrating or packaging circuitry, or in any one of, or a suitable combination of, software, hardware, and firmware. Alternatively, one or more of the modules, units, or sub-units according to embodiments of the invention may be at least partly implemented as computer program modules which, when run, may perform the corresponding functions.
For example, any number of the first acquisition module 410, the first conversion module 420, the first determination module 430, the first training module 440, the second training module 450, the second acquisition module 510, the second conversion module 520, the prediction module 530, and the second determination module 540 may be combined into one module/unit/sub-unit, or any one of them may be split into multiple modules/units/sub-units. Alternatively, at least some of the functionality of one or more of these modules/units/sub-units may be combined with at least some of the functionality of other modules/units/sub-units and implemented in one module/unit/sub-unit. According to embodiments of the invention, at least one of these modules/units/sub-units may be implemented at least in part as hardware circuitry, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on a chip, a system on a substrate, a system on a package, an application specific integrated circuit (ASIC), or in hardware or firmware in any other reasonable manner of integrating or packaging circuitry, or in any one of, or a suitable combination of, software, hardware, and firmware. Alternatively, at least one of these modules/units/sub-units may be at least partially implemented as a computer program module which, when executed, may perform the corresponding functions.
It should be noted that, in the embodiment of the present invention, the training device portion of the target index model corresponds to the training method portion of the target index model in the embodiment of the present invention, and the description of the training device portion of the target index model specifically refers to the training method portion of the target index model, which is not described herein. Similarly, the information retrieval device portion in the embodiment of the present invention corresponds to the information retrieval method portion in the embodiment of the present invention, and the description of the information retrieval device portion specifically refers to the information retrieval method portion and is not described herein again.
Fig. 6 shows a block diagram of an electronic device adapted to implement the method described above, according to an embodiment of the invention. The electronic device shown in fig. 6 is only an example and should not be construed as limiting the functionality and scope of use of the embodiments of the invention.
As shown in fig. 6, an electronic device 600 according to an embodiment of the present invention includes a processor 601 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The processor 601 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset, and/or a special purpose microprocessor (e.g., an application specific integrated circuit (ASIC)). The processor 601 may also include on-board memory for caching purposes, and may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the invention.
In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 600 are stored. The processor 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. The processor 601 performs various operations of the method flow according to an embodiment of the present invention by executing programs in the ROM 602 and/or the RAM 603. Note that the program may be stored in one or more memories other than the ROM 602 and the RAM 603. The processor 601 may also perform various operations of the method flow according to embodiments of the present invention by executing programs stored in the one or more memories.
According to an embodiment of the invention, the electronic device 600 may also include an input/output (I/O) interface 605, which is likewise connected to the bus 604. The electronic device 600 may also include one or more of the following components connected to the input/output (I/O) interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the input/output (I/O) interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read therefrom is installed into the storage section 608 as needed.
According to an embodiment of the present invention, the method flow according to an embodiment of the present invention may be implemented as a computer software program. For example, embodiments of the present invention include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. The above-described functions defined in the system of the embodiment of the present invention are performed when the computer program is executed by the processor 601. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the invention.
The present invention also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present invention.
According to an embodiment of the present invention, the computer-readable storage medium may be a nonvolatile computer-readable storage medium. Examples may include, but are not limited to: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
For example, according to embodiments of the invention, the computer-readable storage medium may include ROM 602 and/or RAM 603 and/or one or more memories other than ROM 602 and RAM 603 described above.
Embodiments of the present invention also include a computer program product comprising a computer program that includes program code for performing the methods provided by the embodiments of the present invention; when the computer program product is run on an electronic device, the program code causes the electronic device to implement the training method of the target index model or the information retrieval method provided by the embodiments of the present invention.
The above-described functions defined in the system/apparatus of the embodiment of the present invention are performed when the computer program is executed by the processor 601. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the invention.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal over a network medium, downloaded and installed through the communication section 609, and/or installed from the removable medium 611. The computer program may include program code that may be transmitted using any appropriate network medium, including but not limited to wireless, wired, and the like, or any suitable combination of the foregoing.
According to embodiments of the present invention, program code for the computer programs provided by embodiments of the present invention may be written in any combination of one or more programming languages; in particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, C, or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the latter case, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.

Those skilled in the art will appreciate that the features recited in the various embodiments of the invention and/or in the claims may be combined in various ways, even if such combinations are not explicitly recited in the invention. In particular, the features recited in the various embodiments of the invention and/or in the claims may be combined in various ways without departing from the spirit and teachings of the invention. All such combinations fall within the scope of the invention.
The embodiments of the present invention are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the invention is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the invention, and such alternatives and modifications are intended to fall within the scope of the invention.

Claims (10)

1. A training method of a target index model, applied to an information retrieval system, characterized by comprising the following steps:
obtaining a plurality of product data from a database in response to a training instruction, wherein the product data comprises image data or text data;
performing vector conversion on the plurality of product data to obtain a plurality of first training vectors;
determining, by a processor and based on a preset estimation method, for each second training vector among a plurality of second training vectors, a first neighbor vector corresponding to the second training vector, wherein the first neighbor vector characterizes a first label vector of the second training vector, and the second training vectors are selected from the plurality of first training vectors;
training a first classification model constructed based on a Transformer by using a plurality of the second training vectors and a plurality of first label vectors to obtain a second classification model, wherein each node of an output layer in the first classification model comprises a preset number of first training vectors;
training a third classification model by using a plurality of third training vectors and a plurality of second label vectors to obtain the target index model, wherein the third classification model is constructed by using a balanced tree algorithm, and the third training vectors and the second label vectors are generated by using the second classification model.
2. The method of claim 1, wherein the third training vector and the second label vector are generated by:
inputting a preset number of first training vectors into the second classification model for each node, and outputting an allocation confidence corresponding to each first training vector, wherein the allocation confidence characterizes the probability of the first training vector being allocated to the node;
determining the first training vector with the greatest allocation confidence as a third training vector;
processing a plurality of the third training vectors by using the second classification model, and outputting, for each of the third training vectors, a plurality of confidences corresponding to different nodes;
for each of the third training vectors, determining a plurality of first training vectors in a node corresponding to a maximum confidence among the plurality of confidences of the third training vector as second label vectors of the third training vector.
3. The method of claim 1, wherein determining, with the processor, a first neighbor vector corresponding to the second training vector among the plurality of second training vectors based on a preset estimation method, comprises:
estimating, with the processor, for each of the second training vectors, a distance between the second training vector and each of the other second training vectors based on the preset estimation method;
determining a second training vector corresponding to the minimum distance as the first neighbor vector when the preset estimation method is a Euclidean distance calculation method or a cosine distance calculation method;
and determining a second training vector corresponding to the maximum distance as the first neighbor vector under the condition that the preset estimation method is an inner product distance calculation method.
4. The method of claim 1, wherein training a first classification model based on the processor by using a plurality of the second training vectors and a plurality of first label vectors to obtain a second classification model comprises:
inputting the second training vector into the first classification model in the processor, and outputting a first prediction confidence, wherein the first prediction confidence characterizes the probability that the second training vector is assigned to the node where the first label vector is located, the first classification model comprises a multi-bifurcation tree model, and an output layer of the multi-bifurcation tree model comprises a plurality of nodes;
inputting the first prediction confidence and the first label vector into a first target loss function based on the processor, and outputting a first loss result;
iteratively adjusting network parameters of the first classification model according to the first loss result to obtain the trained second classification model.
5. The method of claim 4, wherein the first target loss function is as shown in equation (1):
\mathcal{L}(q, y) = -\sum_{h=1}^{H} \log \operatorname{prob}\big(y_h \mid \operatorname{cat}(\operatorname{start}, \operatorname{emb}(q), y_{<h})\big)    (1)

wherein q represents a second training vector, y_h and y_{<h} respectively represent the branch sequence of the label vector y at the h-th layer of the bifurcation tree model and the branch sequence before the h-th layer, emb(q) represents the encoded token vector of q, cat represents the concatenation operation, prob represents the prediction confidence, H represents the height of the bifurcation tree model, and start is a preset prefix parameter of the first classification model.
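For illustration only (not claim language): under the symbol definitions above, equation (1) is a layer-wise negative log-likelihood over the tree path, and could be evaluated as sketched below, where layer_confidence is a hypothetical stand-in for the model's per-layer prediction confidence:

```python
import math

def first_target_loss(q_tokens, path, layer_confidence):
    """Sum of -log prob(y_h | cat(start, emb(q), y_{<h})) over layers
    h = 1..H: `path` is the branch sequence (y_1, ..., y_H), and
    `layer_confidence(prefix, branch)` returns the prediction confidence
    of taking `branch` given the concatenated prefix."""
    loss, prefix = 0.0, ["start"] + list(q_tokens)   # cat(start, emb(q))
    for branch in path:                              # y_h at layer h
        loss -= math.log(layer_confidence(tuple(prefix), branch))
        prefix.append(branch)                        # prefix now covers y_{<h+1}
    return loss
```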
6. The method of claim 1, wherein training a third classification model based on the processor using a plurality of third training vectors and a plurality of second label vectors to obtain the target index model comprises:
hierarchically dividing, by the processor, a vector space by using the balanced tree algorithm to obtain the third classification model;
distributing the plurality of first training vectors to a plurality of nodes of an output layer in the third classification model according to a preset rule;
inputting the third training vector into the third classification model in the processor, and outputting a second prediction confidence, wherein the second prediction confidence characterizes the probability that the third training vector is assigned to the node where the second label vector is located, and the third classification model comprises a multi-bifurcation tree model;
inputting the second prediction confidence and the second label vector into a second target loss function, and outputting a second loss result;
and generating the trained target index model according to the second loss result.
7. The method of claim 6, wherein generating the trained target index model from the second loss result comprises:
under the condition that the second loss result does not meet a preset convergence condition, iteratively training the first classification model by using a plurality of third training vectors and a plurality of second label vectors to obtain a new second classification model, so as to generate a plurality of new third training vectors and a plurality of new second label vectors by using the new second classification model;
and training a new third classification model by using the new third training vectors and the new second label vectors to obtain the target index model.
8. An information retrieval method, comprising:
acquiring target query data and a plurality of data to be matched, wherein the target query data and the data to be matched comprise image data or text data;
vector conversion is carried out on the target query data and the data to be matched, so that a target query vector and a plurality of vectors to be matched are respectively obtained;
inputting the target query vector into a target index model, and outputting a plurality of prediction confidence degrees, wherein the prediction confidence degrees represent the probability that the target query vector is distributed to one node of an output layer in the target index model, the node stores a preset number of vectors to be matched, and the target index model is trained by the method as claimed in any one of claims 1 to 7;
And determining the vector to be matched in the node corresponding to the maximum confidence value in the prediction confidence values as a target matching vector.
9. A training device for a target index model, comprising:
a first acquisition module for acquiring a plurality of product data from a database in response to a training instruction, wherein the product data comprises image data or text data;
the first conversion module is used for carrying out vector conversion on the product data to obtain a plurality of first training vectors;
the first determining module is used for determining, by the processor and based on a preset estimation method, for each second training vector among a plurality of second training vectors, a first neighbor vector corresponding to the second training vector, wherein the first neighbor vector characterizes a first label vector of the second training vector, and the second training vectors are selected from the plurality of first training vectors;
the first training module is used for training a first classification model constructed based on a Transformer by using a plurality of the second training vectors and a plurality of first label vectors to obtain a second classification model, wherein each node of an output layer in the first classification model comprises a preset number of first training vectors;
And the second training module is used for training a third classification model by utilizing a plurality of third training vectors and a plurality of second label vectors to obtain the target index model, wherein the third classification model is constructed by utilizing a balanced tree algorithm, and the third training vectors and the second label vectors are generated by utilizing the second classification model.
10. An information retrieval apparatus, comprising:
the second acquisition module is used for acquiring target query data and a plurality of data to be matched, wherein the target query data and the data to be matched comprise image data or text data;
the second conversion module is used for carrying out vector conversion on the target query data and the plurality of data to be matched to respectively obtain a target query vector and a plurality of vectors to be matched;
the prediction module is used for inputting the target query vector into a target index model and outputting a plurality of prediction confidence degrees, wherein the prediction confidence degrees represent the probability that the target query vector is distributed to one node of an output layer in the target index model, and the node stores a preset number of vectors to be matched;
and the second determining module is used for determining the vector to be matched in the node corresponding to the maximum confidence value among the plurality of the prediction confidence values as a target matching vector.
CN202410043764.2A 2024-01-12 Training method of target index model, information retrieval method and device Active CN117556068B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410043764.2A CN117556068B (en) 2024-01-12 Training method of target index model, information retrieval method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410043764.2A CN117556068B (en) 2024-01-12 Training method of target index model, information retrieval method and device

Publications (2)

Publication Number Publication Date
CN117556068A true CN117556068A (en) 2024-02-13
CN117556068B CN117556068B (en) 2024-05-17


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5734791A (en) * 1992-12-31 1998-03-31 Apple Computer, Inc. Rapid tree-based method for vector quantization
CN110019875A (en) * 2017-12-29 2019-07-16 上海全土豆文化传播有限公司 The generation method and device of index file
US20200192920A1 (en) * 2018-12-13 2020-06-18 Yandex Europe Ag Method of and system for building search index using machine learning algorithm
CN111199242A (en) * 2019-12-18 2020-05-26 浙江工业大学 Image increment learning method based on dynamic correction vector
CN113392868A (en) * 2021-01-14 2021-09-14 腾讯科技(深圳)有限公司 Model training method, related device, equipment and storage medium
CN113887447A (en) * 2021-10-08 2022-01-04 中国科学院半导体研究所 Training method of object classification model, object classification prediction method and device
CN114373093A (en) * 2021-12-06 2022-04-19 西安理工大学 Fine-grained image classification method based on direct-push type semi-supervised deep learning
CN116304722A (en) * 2023-05-26 2023-06-23 中国科学技术大学 Training method of vector retrieval model based on balanced tree index structure

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LIU Yahui; LIU Chunyang; ZHANG Tieying; CHENG Xueqi: "A Survey of Graph Index Technology", Journal of Shandong University (Natural Science), no. 11, 25 October 2013 (2013-10-25) *
ZHANG Shaomin: "High-Dimensional Learned Index Based on Region Partitioning and Dimensionality Reduction", Journal of Software, 15 November 2022 (2022-11-15) *
WANG Rong: "Research on Image Search Algorithms Based on Deep Learning", Computer Products and Circulation, no. 11, 15 November 2018 (2018-11-15) *
LIAN Defu: "Research Progress on Item Recall Techniques in Recommender Systems", Journal of Nanjing University of Information Science & Technology (Natural Science Edition), 28 May 2019 (2019-05-28) *

Similar Documents

Publication Publication Date Title
JP7470476B2 (en) Integration of models with different target classes using distillation
JP7002638B2 (en) Learning text data representation using random document embedding
US11138520B2 (en) Ranking and updating machine learning models based on data inputs at edge nodes
US9830526B1 (en) Generating image features based on robust feature-learning
US11694109B2 (en) Data processing apparatus for accessing shared memory in processing structured data for modifying a parameter vector data structure
US11875253B2 (en) Low-resource entity resolution with transfer learning
US11853877B2 (en) Training transfer-focused models for deep learning
US20170316345A1 (en) Machine learning aggregation
CN114357105B (en) Pre-training method and model fine-tuning method of geographic pre-training model
US20210279279A1 (en) Automated graph embedding recommendations based on extracted graph features
CN115033801B (en) Article recommendation method, model training method and electronic equipment
WO2022043798A1 (en) Automated query predicate selectivity prediction using machine learning models
US11811429B2 (en) Variational dropout with smoothness regularization for neural network model compression
US20210266607A1 (en) Neural network model compression with selective structured weight unification
US20210319303A1 (en) Multi-source transfer learning from pre-trained networks
CN117556068B (en) Training method of target index model, information retrieval method and device
US11645323B2 (en) Coarse-to-fine multimodal gallery search system with attention-based neural network models
CN117556068A (en) Training method of target index model, information retrieval method and device
US20220171985A1 (en) Item recommendation with application to automated artificial intelligence
US20210117853A1 (en) Methods and systems for automated feature generation utilizing formula semantification
CN115994586A (en) Method, device, electronic equipment and medium for recommending initialization parameters of algorithm
CN113095349A (en) Image identification method and device
CN117726905A (en) Training method of multi-task model, multi-task processing method, device and equipment
CN115392428A (en) Network architecture searching method and device
CA3160910A1 (en) Systems and methods for semi-supervised active learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant