CN112232360A - Image retrieval model optimization method, image retrieval device and storage medium


Info

Publication number
CN112232360A
Authority
CN
China
Prior art keywords
image
sample
feature
model
image feature
Prior art date
Legal status
Pending
Application number
CN202011066003.7A
Other languages
Chinese (zh)
Inventor
刘琦
Current Assignee
Shanghai Eye Control Technology Co Ltd
Original Assignee
Shanghai Eye Control Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Eye Control Technology Co Ltd
Priority to CN202011066003.7A
Publication of CN112232360A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application relates to an image retrieval model optimization method, an image retrieval device and a storage medium. The method obtains a first image feature of each sample image in a training data set to form an image feature set corresponding to the training data set. For any sample image in the training data set, image feature extraction is performed through the image retrieval model to be optimized to obtain a second image feature of the sample image, and image feature extraction is performed on the sample image through an image feature extraction model to obtain a third image feature of the sample image. The loss of model optimization is then determined according to the second image feature and the third image feature of the sample image and the image feature set corresponding to the training data set, and the optimized image retrieval model is obtained according to the loss. In this way, the image retrieval model to be optimized can really learn the relationship information between the sample images, the optimization of the image retrieval model to be optimized is completed, and an image retrieval model with better performance is obtained.

Description

Image retrieval model optimization method, image retrieval device and storage medium
Technical Field
The present application relates to the field of deep learning technologies, and in particular, to an image retrieval model optimization method, an image retrieval device, and a storage medium.
Background
The rise of deep learning technology has greatly promoted the development of the field of image retrieval. For extracting image features, neural networks currently have irreplaceable advantages, so they are increasingly used in image retrieval, a field that depends heavily on the quality of image features.
In the related art, knowledge from metric learning is also increasingly used to improve image retrieval models. The intuition is that image samples of the same class should be closer in feature space, while image samples of different classes should be farther apart. However, the current methods for improving image retrieval models are usually limited to the samples of a single batch, and the feature relationships they consider are limited to the current iteration of the network and the current batch. Since an image retrieval task usually has a large number of categories with only a few image samples per category, an image retrieval model improved by such methods lacks globality.
Disclosure of Invention
In view of the above, it is necessary to provide an image retrieval model optimization method, an image retrieval device, and a storage medium that have globality, so as to solve the problem that conventional methods for improving image retrieval models lack globality.
A method of image retrieval model optimization, the method comprising:
acquiring a first image feature of each sample image in a training data set to obtain an image feature set corresponding to the training data set, wherein the training data set comprises a plurality of sample images;
for any sample image in the training data set, carrying out image feature extraction through an image retrieval model to be optimized to obtain a second image feature of the sample image;
performing image feature extraction on the sample image through an image feature extraction model to obtain a third image feature of the sample image, wherein the image feature extraction model is obtained after momentum updating is performed on the basis of the image retrieval model to be optimized;
determining the loss of model optimization according to the second image feature and the third image feature of the sample image and the image feature set corresponding to the training data set;
and carrying out back propagation on the image retrieval model to be optimized according to the loss to obtain the optimized image retrieval model.
In one embodiment, the determining a loss of model optimization according to the second image feature and the third image feature of the sample image and the image feature set corresponding to the training data set includes: updating, according to the third image feature of the sample image, the first image feature corresponding to the sample image in the image feature set to obtain an updated image feature set; and determining a loss of model optimization according to the updated image feature set and the second image feature of the sample image.
In one embodiment, the determining a loss of model optimization according to the updated image feature set and the second image feature of the sample image includes: acquiring positive example features and negative example features corresponding to the sample image based on the updated image feature set; and calculating the loss of model optimization according to the positive example feature and the negative example feature corresponding to the sample image and the second image feature of the sample image.
In one embodiment, the obtaining positive and negative example features corresponding to the sample image based on the updated image feature set includes: acquiring the updated first image feature corresponding to the sample image in the updated image feature set, and determining the updated first image feature as the positive example feature corresponding to the sample image; and acquiring the other first image features in the updated image feature set except the positive example feature corresponding to the sample image, and determining them as the negative example features corresponding to the sample image.
In one embodiment, the calculating a loss of model optimization according to the positive example feature, the negative example feature and the second image feature of the sample image includes: calculating the loss of model optimization by using an information noise contrastive estimation (InfoNCE) loss function according to the positive example feature, the negative example feature and the second image feature corresponding to the sample image, wherein the InfoNCE loss function is

$$\mathcal{L}_q = -\log \frac{\exp(q \cdot t_{+} / \tau)}{\exp(q \cdot t_{+} / \tau) + \sum_{i=1}^{K} \exp(q \cdot t_{i-} / \tau)}$$

where q is the second image feature corresponding to the sample image, t_+ is the positive example feature corresponding to the sample image, t_{i-} is the i-th negative example feature corresponding to the sample image, K is the number of negative example features, and τ is a hyperparameter.
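By way of illustration only, the following is a minimal PyTorch sketch of this loss; the tensor names, shapes and the batched formulation are assumptions made for the example and are not part of the claimed method.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q, t_pos, t_neg, tau=0.07):
    """InfoNCE loss for a batch of queries.

    q:     (B, D) second image features from the model being optimized
    t_pos: (B, D) positive example features of the same samples
    t_neg: (K, D) negative example features (other samples' stored features)
    tau:   temperature hyperparameter
    """
    # Similarity of each query to its positive: (B, 1)
    l_pos = torch.sum(q * t_pos, dim=1, keepdim=True)
    # Similarity of each query to all K negatives: (B, K)
    l_neg = q @ t_neg.t()
    # Index 0 is the positive class; cross-entropy over (1 + K) classes
    # equals -log(exp(pos/tau) / (exp(pos/tau) + sum_i exp(neg_i/tau)))
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)
```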
In one embodiment, the obtaining the first image feature of each sample image in the training data set to obtain an image feature set corresponding to the training data set includes: performing image feature extraction on each sample image in the training data set by adopting an initial image feature extraction model to obtain a first image feature of each sample image; based on the first image features of each sample image in the training data set, an image feature set corresponding to the training data set is generated.
In one embodiment, the performing, by using an image retrieval model to be optimized, image feature extraction on any sample image in the training data set to obtain a second image feature of the sample image includes: for any sample image in the training data set, inputting the sample image into an image retrieval model to be optimized to obtain a second original image characteristic corresponding to the sample image; and carrying out affine transformation and norm normalization processing on the second original image characteristics to obtain the correspondingly processed second image characteristics.
In one embodiment, the performing, by an image feature extraction model, image feature extraction on the sample image to obtain a third image feature of the sample image includes: momentum updating is carried out on the image feature extraction model based on the image retrieval model to be optimized, and an updated image feature extraction model is obtained; inputting the sample image into an image feature extraction model after momentum updating to obtain a third original image feature corresponding to the sample image; and carrying out affine transformation and norm normalization processing on the third original image characteristics to obtain correspondingly processed third image characteristics.
An image retrieval method, the method comprising:
acquiring an image to be retrieved;
and inputting the image to be retrieved into the optimized image retrieval model obtained by the image retrieval model optimization method to obtain a corresponding target image.
An image retrieval model optimization apparatus, the apparatus comprising:
the image feature set acquisition module is used for acquiring a first image feature of each sample image in a training data set to obtain an image feature set corresponding to the training data set, wherein the training data set comprises a plurality of sample images;
the second image feature extraction module is used for extracting image features of any sample image in the training data set through an image retrieval model to be optimized to obtain second image features of the sample image;
the third image feature extraction module is used for extracting image features of the sample image through an image feature extraction model to obtain third image features of the sample image, and the image feature extraction model is obtained after momentum updating is carried out on the basis of the image retrieval model to be optimized;
the loss obtaining module is used for determining the loss of model optimization according to the second image characteristic and the third image characteristic of the sample image and the image characteristic set corresponding to the training data set;
and the model optimization module is used for performing back propagation on the image retrieval model to be optimized according to the loss to obtain the optimized image retrieval model.
An image retrieval apparatus, the apparatus comprising:
the image to be retrieved acquiring module is used for acquiring an image to be retrieved;
and the retrieval processing module is used for inputting the image to be retrieved into the optimized image retrieval model obtained by the image retrieval model optimizing method to obtain a corresponding target image.
A computer device comprising a memory storing a computer program and a processor implementing the steps of the method as described above when executing the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method as set forth above.
According to the image retrieval model optimization method, the image retrieval device and the storage medium, a first image feature of each sample image in the training data set is obtained to form an image feature set corresponding to the training data set; for any sample image in the training data set, image feature extraction is performed through the image retrieval model to be optimized to obtain a second image feature of the sample image, and image feature extraction is performed on the sample image through an image feature extraction model to obtain a third image feature of the sample image; the loss of model optimization is determined according to the second image feature and the third image feature of the sample image and the image feature set corresponding to the training data set, and the image retrieval model to be optimized is back-propagated according to the loss to obtain the optimized image retrieval model. In this way, the image retrieval model to be optimized can really learn the relationship information between the sample images, the optimization of the image retrieval model to be optimized is completed, and an image retrieval model with better performance is obtained.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating a method for optimizing an image retrieval model according to one embodiment;
FIG. 2 is a flowchart illustrating the step of obtaining a feature set of an image according to one embodiment;
FIG. 3 is a flowchart illustrating the step of obtaining second image features in one embodiment;
FIG. 4 is a flowchart illustrating the step of obtaining a third image feature according to one embodiment;
FIG. 5 is a schematic flow chart diagram illustrating the loss step of determining model optimization in one embodiment;
FIG. 6 is a schematic flow chart diagram illustrating the penalty step of computational model optimization in one embodiment;
FIG. 7 is a flowchart illustrating a method for optimizing an image retrieval model according to another embodiment;
FIG. 8 is a flowchart illustrating an image retrieval method according to one embodiment;
FIG. 9 is a block diagram showing the configuration of an image search model optimizing apparatus according to an embodiment;
FIG. 10 is a block diagram showing the configuration of an image search device according to an embodiment;
FIG. 11 is a diagram illustrating an internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, there is provided an image retrieval model optimization method, as shown in fig. 1, including the following steps:
step 110, obtaining a first image feature of each sample image in the training data set, and obtaining an image feature set corresponding to the training data set.
The training data set comprises a plurality of sample images, and is sample data used for optimizing an image retrieval model to be optimized so that retrieval effect can be improved. And the image retrieval model to be optimized may be any neural network model for image retrieval in conventional techniques. The first image feature is an image feature result obtained by performing image feature extraction on a sample image in the training data set. In this embodiment, when a certain image retrieval model needs to be optimized, a training data set for model optimization needs to be obtained first, and then image feature extraction is performed on each sample image in the training data set, so as to obtain first image features corresponding to each sample image, and an image feature set corresponding to the training data set is obtained based on the first image features corresponding to each sample image in the training data set one to one, that is, the image feature set includes the first image features of each sample image.
And step 120, for any sample image in the training data set, performing image feature extraction through an image retrieval model to be optimized to obtain a second image feature of the sample image.
The image retrieval model to be optimized may be any neural network model used for image retrieval in the conventional technology, for example, a neural network model for searching by image in a search engine (e.g., Google, Baidu), a neural network model for searching similar commodities on an e-commerce website (e.g., Taobao, Amazon, eBay), or a neural network model for recommending similar content on a social platform (e.g., Pinterest). The second image feature is the image feature result obtained after the sample image passes through the image retrieval model to be optimized. Specifically, in this embodiment, for any sample image in the training data set, the sample image is input into the image retrieval model to be optimized to obtain the second image feature corresponding to the sample image.
And step 130, performing image feature extraction on the sample image through the image feature extraction model to obtain a third image feature of the sample image.
The image feature extraction model is obtained by momentum updating based on the image retrieval model to be optimized. And the third image characteristic is an image characteristic result obtained after the sample image passes through the image characteristic extraction model. Specifically, for any sample image in the training data set (i.e., the sample image in step 120), the sample image is input into an image feature extraction model obtained by momentum update based on the image retrieval model to be optimized, so as to obtain a third image feature corresponding to the sample image.
And 140, determining the loss of model optimization according to the second image characteristic and the third image characteristic of the sample image and the image characteristic set corresponding to the training data set.
The loss of model optimization represents the relationship information between the sample images, and is used for optimizing the image retrieval model to be optimized. In this embodiment, the loss of model optimization is determined by the second image feature and the third image feature corresponding to any sample image in the training data set and the image feature set corresponding to the training data set, and then the model parameter of the image retrieval model to be optimized is adjusted based on the loss of model optimization, so that the image retrieval model to be optimized can really learn the relationship information between the sample images, thereby completing the optimization of the image retrieval model to be optimized and improving the retrieval effect of the optimized image retrieval model.
And 150, reversely propagating the image retrieval model to be optimized according to the loss to obtain the optimized image retrieval model.
Specifically, after the loss of the model optimization is obtained through the above steps, the image retrieval model to be optimized may be subjected to one-time back propagation according to the loss, and the calculation of one-time iterative training of the image retrieval model to be optimized is completed, so as to update the model parameters of the image retrieval model to be optimized. It can be understood that, for each sample image in the training data set, the processing may be performed according to the method in steps 120 to 150, so that the image retrieval model to be optimized can really learn the relationship information between the sample images, so as to complete the optimization of the image retrieval model to be optimized, and improve the retrieval effect of the optimized image retrieval model.
In the image retrieval model optimization method above, a first image feature of each sample image in the training data set is obtained to form an image feature set corresponding to the training data set; for any sample image in the training data set, image feature extraction is performed through the image retrieval model to be optimized to obtain a second image feature of the sample image, and image feature extraction is performed on the sample image through an image feature extraction model to obtain a third image feature of the sample image; the loss of model optimization is determined according to the second image feature and the third image feature of the sample image and the image feature set corresponding to the training data set, and the image retrieval model to be optimized is back-propagated according to the loss to obtain the optimized image retrieval model. In this way, the image retrieval model to be optimized can really learn the relationship information between the sample images, the optimization of the image retrieval model to be optimized is completed, and an image retrieval model with better performance is obtained.
In an embodiment, as shown in fig. 2, in step 110, obtaining a first image feature of each sample image in the training data set to obtain an image feature set corresponding to the training data set may specifically include the following steps:
and step 111, performing image feature extraction on each sample image in the training data set by using an initial image feature extraction model to obtain a first image feature of each sample image.
In this embodiment, the structure of the image retrieval model to be optimized is given, and the model is initialized with pre-training parameters on ImageNet (a large visual database for visual object recognition research) to obtain the image retrieval model to be optimized; the initial image feature extraction model is identical to this initial image retrieval model to be optimized. Each sample image in the training data set is then input into the initial image feature extraction model to obtain the first image feature corresponding to each sample image.
Step 112, based on the first image feature of each sample image in the training data set, an image feature set corresponding to the training data set is generated.
Specifically, image feature extraction is performed on each sample image in the training data set respectively, so that first image features corresponding to each sample image are obtained, and an image feature set corresponding to the training data set is obtained based on the first image features corresponding to each sample image in the training data set one to one.
In the above embodiment, an initial image feature extraction model is adopted to perform image feature extraction on each sample image in the training data set to obtain the first image feature of each sample image, and an image feature set corresponding to the training data set is generated based on the first image feature of each sample image in the training data set.
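As an illustrative sketch only (PyTorch, with assumed helper names and an assumed dataset that yields image tensors), the image feature set, referred to later as the Memory Bank, could be built as follows.

```python
import torch

@torch.no_grad()
def build_memory_bank(initial_extractor, dataset, feat_dim, device="cpu"):
    """Fill an N x D feature set with the first image feature of every sample.

    Assumes `dataset[i]` returns an image tensor of shape (C, H, W) and that
    `initial_extractor` outputs an L2-normalized feature of dimension feat_dim.
    """
    initial_extractor.eval()
    memory_bank = torch.zeros(len(dataset), feat_dim, device=device)
    for index in range(len(dataset)):
        image = dataset[index]
        feature = initial_extractor(image.unsqueeze(0).to(device))  # (1, D)
        memory_bank[index] = feature.squeeze(0)
    return memory_bank
```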
In an embodiment, as shown in fig. 3, in step 120, for any sample image in the training data set, image feature extraction is performed through an image retrieval model to be optimized to obtain a second image feature of the sample image, which may specifically include the following steps:
and step 121, inputting the sample image into an image retrieval model to be optimized for any sample image in the training data set to obtain a second original image characteristic corresponding to the sample image.
The image retrieval model to be optimized generally comprises a backbone network and a head consisting of a fully connected layer and norm normalization, and can be updated according to the normal gradient. The second original image feature is the output of the sample image after passing through the backbone network of the image retrieval model to be optimized, that is, the raw feature obtained before the fully connected layer and norm normalization are applied. Specifically, for any sample image in the training data set, the sample image is input into the image retrieval model to be optimized to obtain the second original image feature corresponding to the sample image.
And step 122, performing affine transformation and norm normalization processing on the second original image characteristics to obtain correspondingly processed second image characteristics.
Affine transformation, also called affine mapping, is, in geometry, a linear transformation of one vector space followed by a translation into another vector space. The norm normalization processing may adopt L2 norm normalization, that is, dividing each dimension x1, x2, …, xn of a vector X by ||X||_2 (i.e., its L2 norm) to obtain a new vector; L2 norm normalization eliminates the influence of units between features and improves the convergence speed of the learned model. In this embodiment, affine transformation is performed on the second original image feature so that its dimension coincides with the dimension of the first image features output by the initial image feature extraction model. Norm normalization is then performed on the affine-transformed second original image feature, thereby obtaining the correspondingly processed second image feature.
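For illustration, a minimal PyTorch sketch of such a head (a single affine layer followed by L2 norm normalization) is given below; the class and argument names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    """Affine transformation W f + b followed by L2 norm normalization."""

    def __init__(self, d_in, d_out):
        super().__init__()
        # Affine transformation bringing the backbone output to the target dimension
        self.fc = nn.Linear(d_in, d_out)

    def forward(self, f):
        q_prime = self.fc(f)                    # (B, d_out)
        # Divide each vector by its L2 norm so that ||q||_2 = 1
        return F.normalize(q_prime, p=2, dim=1)
```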
In the above embodiment, for any sample image in the training data set, the second original image feature corresponding to the sample image is obtained by inputting the sample image into the image retrieval model to be optimized, and affine transformation and norm normalization processing are performed on the second original image feature to obtain the second image feature after corresponding processing, so that the image retrieval model to be optimized can learn the relationship information between the sample images.
In an embodiment, as shown in fig. 4, in step 130, performing image feature extraction on the sample image through an image feature extraction model to obtain a third image feature of the sample image, which may specifically include the following steps:
and 131, momentum updating is carried out on the image feature extraction model based on the image retrieval model to be optimized, and the updated image feature extraction model is obtained.
The initial image feature extraction model is the same as the initial image retrieval model to be optimized, comprising a backbone network and a head consisting of a fully connected layer and norm normalization. Specifically, the image feature extraction model and the initial image retrieval model to be optimized adopt the same network structure and initialization parameters; the image retrieval model to be optimized can be updated according to the normal gradient, while the image feature extraction model is updated at each iteration by a running average of the parameters of the image retrieval model to be optimized (i.e., momentum update). Specifically, if the model parameters of the image retrieval model to be optimized are θ_q and the model parameters of the image feature extraction model before the momentum update are θ_k, then θ_{k+1} = m·θ_k + (1 − m)·θ_q, where θ_{k+1} are the model parameters of the image feature extraction model after the momentum update and m is the momentum, which may be a hyperparameter preset before network training. In this embodiment, momentum update is performed on the image feature extraction model based on the image retrieval model to be optimized to obtain the updated image feature extraction model.
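A possible PyTorch sketch of this momentum update is shown below; it assumes the two models share the same architecture so their parameters align, and the function name is illustrative.

```python
import torch

@torch.no_grad()
def momentum_update(model_q, model_k, m=0.999):
    """theta_k <- m * theta_k + (1 - m) * theta_q, applied parameter-wise."""
    for param_q, param_k in zip(model_q.parameters(), model_k.parameters()):
        param_k.data.mul_(m).add_(param_q.data, alpha=1.0 - m)
```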
Step 132, inputting the sample image into the momentum-updated image feature extraction model to obtain a third original image feature corresponding to the sample image.
Similarly, the third original image feature is output of the sample image after passing through the backbone network of the momentum-updated image feature extraction model, that is, the original image feature obtained by the sample image after passing through the backbone network of the momentum-updated image feature extraction model without being subjected to full-link layer and norm normalization processing. Specifically, the sample image (i.e., the sample image determined in step 121) is input into the momentum-updated image feature extraction model, so as to obtain a third original image feature corresponding to the sample image.
And step 133, performing affine transformation and norm normalization processing on the third original image feature to obtain a correspondingly processed third image feature.
Specifically, affine transformation and norm normalization processing are performed on the third original image feature, so that a correspondingly processed third image feature is obtained. The affine transformation and norm normalization processing may adopt the same method as that in step 122, which is not described again in this embodiment.
In the above embodiment, the momentum of the image feature extraction model is updated based on the image retrieval model to be optimized to obtain an updated image feature extraction model, the sample image is input into the momentum-updated image feature extraction model to obtain a third original image feature corresponding to the sample image, affine transformation and norm normalization processing are performed on the third original image feature to obtain a third image feature which has the same dimensionality as the first image feature and is subjected to corresponding processing, so that the image retrieval model to be optimized can learn the relationship features between the sample images.
In an embodiment, as shown in fig. 5, in step 140, determining a loss of model optimization according to the second image feature and the third image feature of the sample image and the image feature set corresponding to the training data set may specifically include the following steps:
and step 141, updating the first image feature corresponding to the sample image in the image feature set according to the third image feature of the sample image to obtain an updated image feature set.
The image feature set comprises the first image feature of each sample image, obtained by performing image feature extraction on each sample image in the training data set with the initial image feature extraction model. The third image feature is obtained by performing image feature extraction on a certain sample image in the training data set with the momentum-updated image feature extraction model. Therefore, in this embodiment, based on the third image feature of a certain sample image, the first image feature corresponding to that sample image in the image feature set is updated, that is, it is replaced by the third image feature of the sample image, so as to obtain the updated image feature set, thereby ensuring the consistency and slow evolution of the features in the image feature set.
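As an illustrative sketch (assuming the image feature set is stored as an N x D tensor and that each sample image carries its index in the training data set), the update could look as follows.

```python
import torch

@torch.no_grad()
def update_memory_bank(memory_bank, indices, third_features):
    """Overwrite the stored first image features with the new third image features.

    memory_bank:    (N, D) tensor holding one feature per training sample
    indices:        (B,) long tensor with the positions of the current samples
    third_features: (B, D) features from the momentum-updated extraction model
    """
    memory_bank[indices] = third_features.detach()
```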
And 142, determining the loss of model optimization according to the updated image feature set and the second image feature of the sample image.
Specifically, the loss of model optimization is determined according to the updated image feature set and the second image features of the sample images, and then the model parameters of the image retrieval model to be optimized are adjusted based on the loss of model optimization, so that the image retrieval model to be optimized can really learn the relationship information between the sample images, and the optimization of the image retrieval model to be optimized is completed.
In an embodiment, as shown in fig. 6, in step 142, determining a loss of model optimization according to the updated image feature set and the second image feature of the sample image may specifically include the following steps:
step 601, based on the updated image feature set, acquiring positive example features and negative example features corresponding to the sample image.
The positive example feature refers to the first image feature corresponding to a given sample image in the updated image feature set, and the negative example features are the other first image features in the updated image feature set except the positive example feature of that sample image. That is, for any sample image in the training data set, its positive example feature is its own first image feature in the updated image feature set, and its negative example features are all the remaining first image features in the updated image feature set. Based on this, the positive example feature and the negative example features corresponding to any sample image in the training data set can be determined.
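By way of example only, selecting the positive and negative example features for one sample from the updated image feature set might be sketched as follows (index handling and names are assumptions).

```python
import torch

def select_pos_neg(memory_bank, index):
    """Return (positive, negatives) for the sample stored at `index`.

    positive:  (D,)     the updated first image feature of this sample
    negatives: (N-1, D) every other first image feature in the set
    """
    positive = memory_bank[index]
    mask = torch.ones(memory_bank.size(0), dtype=torch.bool,
                      device=memory_bank.device)
    mask[index] = False
    negatives = memory_bank[mask]
    return positive, negatives
```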
Step 602, calculating the loss of model optimization according to the positive example feature and the negative example feature corresponding to the sample image and the second image feature of the sample image.
Specifically, this embodiment may adopt the information noise contrastive estimation loss function (InfoNCE, Info Noise-Contrastive Estimation Loss) to calculate the loss of model optimization:

$$\mathcal{L}_q = -\log \frac{\exp(q \cdot t_{+} / \tau)}{\exp(q \cdot t_{+} / \tau) + \sum_{i=1}^{K} \exp(q \cdot t_{i-} / \tau)}$$

where q is the second image feature corresponding to the sample image (i.e., the sample image determined in step 121), t_+ is the positive example feature corresponding to the sample image, t_{i-} is the i-th negative example feature corresponding to the sample image, K is the number of negative example features, and τ is a settable hyperparameter.
The image retrieval model optimization method in the present application is further described below based on fig. 7. The backbone network takes GoogLeNet as an example and is initialized with pre-training parameters on ImageNet. The specific scheme includes the following steps:
1. As shown in fig. 7, model_q (701 in the figure) is the image retrieval model to be optimized, which includes two parts: backbone_q (GoogLeNet) and head_q (a single fully connected layer plus L2 norm normalization). model_k is the image feature extraction model, which may also be called a momentum encoder; it is a network with the same structure as model_q, including backbone_k and head_k. Initially, model_k has the same initialization parameters as model_q; model_q is updated according to the normal gradient, while model_k is updated directly at each iteration by a running average of the model_q parameters, as follows:
θ_{k+1} = m·θ_k + (1 − m)·θ_q, where θ_{k+1} are the model parameters of model_k after the momentum update, θ_k are the model parameters of model_k before the momentum update, θ_q are the model parameters of model_q, and m is the momentum, which may be a hyperparameter preset before network training.
2. Acquire a training data set for model optimization. Assuming the training data set contains N sample images, each sample image is input into the initial model_k to obtain its corresponding first image feature, namely f_1, f_2, …, f_N. Based on these N image features, the corresponding image feature set {f_i}, the Memory Bank, is obtained; its size is N × D, where N is the number of elements in the set and D is the feature dimension.
3. For model_q: any sample image m in the training data set outputs its corresponding feature (i.e., the second original image feature) through backbone_q; denote it by f_m, then f_m ∈ R^d, where d is the feature dimension.
In this embodiment, an affine transformation is applied to f_m so that f_m has the same feature dimension as f_i. A specific affine transformation is as follows:
q_m′ = W·f_m + b, where W ∈ R^{d×D} is the parameter matrix of the affine transformation, b is the offset, d is the dimension of f_m, D is the target feature dimension of the affine transformation, f_m is the second original image feature output by model_q for sample image m, and q_m′ is the image feature obtained by applying the affine transformation to f_m.
Then q_m′ is L2-norm normalized (both the affine transformation and the L2 norm normalization are completed by head_q in the figure), so that the correspondingly processed second image feature, namely q, is obtained. The L2 norm normalization is as follows:

$$\|q_m'\|_2 = \sqrt{\sum_{j=1}^{D} q_{mj}'^{\,2}}, \qquad q = \frac{q_m'}{\|q_m'\|_2}$$

where q_m′ is the image feature obtained by applying the affine transformation to f_m, ||q_m′||_2 is the L2 norm of q_m′, D is the feature dimension, q_mj′ is the j-th dimension of q_m′, and q is the second image feature obtained after L2 norm normalization of q_m′.
4. Similarly, for the model_k network, the same sample image m passes through backbone_k (yielding the third original image feature) and head_k, and the resulting feature is recorded as k. The feature stored in the Memory Bank for sample image m is then updated with k according to the index of sample image m, so that the updated Memory Bank is obtained.
5. For sample image m, the first image feature corresponding to it in the Memory Bank, t_{i+}, is taken as the positive example feature (denoted t_+ in the following formula for ease of explanation), and the other first image features in the Memory Bank, {t_{i-}}_{i=1}^{K}, are taken as the negative example features corresponding to sample image m (denoted t_{i-} in the following formula for convenience), where K is the number of negative example features.
6. For sample image m, the InfoNCE loss corresponding to the sample image is obtained based on the positive and negative example features taken from the corresponding Memory Bank, as follows:

$$\mathcal{L}_m = -\log \frac{\exp(q \cdot t_{+} / \tau)}{\exp(q \cdot t_{+} / \tau) + \sum_{i=1}^{K} \exp(q \cdot t_{i-} / \tau)}$$

where q is the second image feature corresponding to sample image m, t_+ is the positive example feature corresponding to sample image m, t_{i-} is the i-th negative example feature corresponding to sample image m, K is the number of negative example features, and τ is a settable hyperparameter.
Based on this, for an arbitrary sample image n, the InfoNCE loss is obtained from the positive and negative example features in the Memory Bank:

$$\mathcal{L}_n = -\log \frac{\exp(q_n \cdot t_{n+} / \tau)}{\exp(q_n \cdot t_{n+} / \tau) + \sum_{j=1}^{K} \exp(q_n \cdot t_{nj-} / \tau)}$$

where q_n is the second image feature corresponding to sample image n, t_{n+} is the positive example feature corresponding to sample image n, t_{nj-} is the j-th negative example feature corresponding to sample image n, K is the number of negative example features, and τ is a settable hyperparameter.
7. The loss is back-propagated and the network parameters of model_q are updated according to the gradient.
The image retrieval model optimization method provided by this embodiment stores the feature vectors of a large number of sample images by introducing the Memory Bank, which solves the problem of an insufficient number of sample images in a single batch; it extracts relationship information among samples using the InfoNCE loss, which performs well in representation learning tasks; the Memory Bank ensures feature reuse and provides a large number of negative samples; and the momentum encoder ensures the consistency and slow evolution of the features in the Memory Bank, overcoming the defect that the features in the original Memory Bank come from different iterations and different samples, thereby greatly improving the effect of optimizing the image retrieval model.
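Purely as an illustrative sketch, one optimization iteration combining the pieces above might look as follows in PyTorch; it reuses the momentum_update helper sketched earlier, and all names, shapes and the choice to draw negatives from the whole Memory Bank are assumptions rather than the claimed implementation.

```python
import torch
import torch.nn.functional as F

def train_step(model_q, model_k, memory_bank, images, indices,
               optimizer, m=0.999, tau=0.07):
    """One iteration: forward both networks, refresh the Memory Bank, InfoNCE, backprop."""
    # Second image features q from the model being optimized (gradients flow here)
    q = model_q(images)                              # (B, D), L2-normalized by head_q

    with torch.no_grad():
        # theta_k <- m * theta_k + (1 - m) * theta_q
        momentum_update(model_q, model_k, m=m)
        # Third image features k from the momentum encoder (no gradients)
        k = model_k(images)                          # (B, D), L2-normalized by head_k
        # Refresh the stored features of these samples (Memory Bank update)
        memory_bank[indices] = k

    # Similarities of q to every stored feature, scaled by the temperature
    sims = q @ memory_bank.t() / tau                 # (B, N)
    l_pos = sims.gather(1, indices.view(-1, 1))      # (B, 1) positive logits
    # Mask each sample's own column so the positive is not also counted as a negative
    l_neg = sims.scatter(1, indices.view(-1, 1), float("-inf"))
    logits = torch.cat([l_pos, l_neg], dim=1)
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    loss = F.cross_entropy(logits, labels)           # InfoNCE

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```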
In one embodiment, as shown in fig. 8, there is provided an image retrieval method including the steps of:
step 810, acquiring an image to be retrieved.
The image to be retrieved is an image to be subjected to image retrieval, and specifically, the image retrieval includes, but is not limited to, image retrieval performed by searching images in a search engine, image retrieval performed by searching similar commodities in an e-commerce website, image retrieval performed by recommending similar contents in a social platform, and the like.
And step 820, inputting the image to be retrieved into the image retrieval model to obtain a corresponding target image.
The image retrieval model is an optimized image retrieval model obtained by the image retrieval model optimization method described in fig. 1 to 7. It can be understood that, after the image retrieval model is obtained by the image retrieval model optimization method described in fig. 1 to fig. 7, the image retrieval model may be trained based on a scene related to image retrieval in practical application, so that features in a specific scene can be learned, and the image retrieval is performed on the image to be retrieved by the trained image retrieval model, so as to obtain a target image whose similarity to the image to be retrieved reaches a target value, where the target value may be set according to the practical application scene.
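As a purely illustrative sketch of how such retrieval might be performed (assuming a precomputed gallery of L2-normalized features and illustrative names), see below; the similarity threshold or top-k strategy would be set according to the actual application scenario.

```python
import torch

@torch.no_grad()
def retrieve(model, query_image, gallery_features, gallery_ids, top_k=5):
    """Return the identifiers of the gallery images most similar to the query.

    query_image:      (C, H, W) image to be retrieved
    gallery_features: (N, D) L2-normalized features of the candidate images
    gallery_ids:      list of N identifiers for the candidate images
    """
    model.eval()
    q = model(query_image.unsqueeze(0))              # (1, D), L2-normalized by the head
    scores = (q @ gallery_features.t()).squeeze(0)   # cosine similarity to every candidate
    values, order = scores.topk(top_k)
    return [(gallery_ids[i], values[j].item())
            for j, i in enumerate(order.tolist())]
```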
Since the image retrieval model for performing image retrieval is obtained by the image retrieval model optimization method described in fig. 1 to 7, the image retrieval model can really learn the relationship characteristics between sample images, so as to achieve the effect of optimizing the image retrieval model, and when the image retrieval model is trained and used for a specific image retrieval task, the effect of the specific image retrieval task can be improved.
It should be understood that although the various steps in the flow charts of fig. 1-8 are shown in order as indicated by the arrows, the steps are not necessarily performed in order as indicated by the arrows. The steps are not performed in the exact order shown and described, and may be performed in other orders, unless explicitly stated otherwise. Moreover, at least some of the steps in fig. 1-8 may include multiple steps or multiple stages, which are not necessarily performed at the same time, but may be performed at different times, which are not necessarily performed in sequence, but may be performed in turn or alternately with other steps or at least some of the other steps.
In one embodiment, as shown in fig. 9, there is provided an image retrieval model optimization apparatus including: an image feature set obtaining module 901, a second image feature extracting module 902, a third image feature extracting module 903, a loss obtaining module 904, and a model optimizing module 905, wherein:
an image feature set obtaining module 901, configured to obtain a first image feature of each sample image in a training data set, to obtain an image feature set corresponding to the training data set, where the training data set includes a plurality of sample images;
a second image feature extraction module 902, configured to perform image feature extraction on any sample image in the training data set through an image retrieval model to be optimized, to obtain a second image feature of the sample image;
a third image feature extraction module 903, configured to perform image feature extraction on the sample image through an image feature extraction model to obtain a third image feature of the sample image, where the image feature extraction model is obtained after momentum update is performed on the basis of the image retrieval model to be optimized;
a loss obtaining module 904, configured to determine a loss of model optimization according to the second image feature and the third image feature of the sample image and the image feature set corresponding to the training data set;
and the model optimization module 905 is configured to perform back propagation on the image retrieval model to be optimized according to the loss to obtain an optimized image retrieval model.
In one embodiment, the loss acquisition module includes: the updating unit is used for updating the first image feature corresponding to the sample image in the image feature set according to the third image feature of the sample image to obtain an updated image feature set; and the loss determining unit is used for determining the loss of model optimization according to the updated image feature set and the second image feature of the sample image.
In one embodiment, the loss determination unit includes: the characteristic determining subunit is used for acquiring positive example characteristics and negative example characteristics corresponding to the sample image based on the updated image characteristic set; and the loss calculation subunit is used for calculating the loss of model optimization according to the positive example feature and the negative example feature corresponding to the sample image and the second image feature of the sample image.
In one embodiment, the feature determination subunit is specifically configured to: acquire the updated first image feature corresponding to the sample image in the updated image feature set, and determine the updated first image feature as the positive example feature corresponding to the sample image; and acquire the other first image features in the updated image feature set except the positive example feature corresponding to the sample image, and determine them as the negative example features corresponding to the sample image.
In one embodiment, the loss calculation subunit is specifically configured to: calculate the loss of model optimization by using an information noise contrastive estimation (InfoNCE) loss function according to the positive example feature, the negative example feature and the second image feature corresponding to the sample image, wherein the InfoNCE loss function is

$$\mathcal{L}_q = -\log \frac{\exp(q \cdot t_{+} / \tau)}{\exp(q \cdot t_{+} / \tau) + \sum_{i=1}^{K} \exp(q \cdot t_{i-} / \tau)}$$

where q is the second image feature corresponding to the sample image, t_+ is the positive example feature corresponding to the sample image, t_{i-} is the i-th negative example feature corresponding to the sample image, K is the number of negative example features, and τ is a hyperparameter.
In one embodiment, the image feature set acquisition module is specifically configured to: performing image feature extraction on each sample image in the training data set by adopting an initial image feature extraction model to obtain a first image feature of each sample image; based on the first image features of each sample image in the training data set, an image feature set corresponding to the training data set is generated.
In one embodiment, the second image feature extraction module is specifically configured to: for any sample image in the training data set, inputting the sample image into an image retrieval model to be optimized to obtain a second original image characteristic corresponding to the sample image; and carrying out affine transformation and norm normalization processing on the second original image characteristics to obtain the correspondingly processed second image characteristics.
In one embodiment, the third image feature extraction module is specifically configured to: momentum updating is carried out on the image feature extraction model based on the image retrieval model to be optimized, and an updated image feature extraction model is obtained; inputting the sample image into an image feature extraction model after momentum updating to obtain a third original image feature corresponding to the sample image; and carrying out affine transformation and norm normalization processing on the third original image characteristics to obtain correspondingly processed third image characteristics.
For specific limitations of the image retrieval model optimization device, reference may be made to the above limitations of the image retrieval model optimization method, which are not described herein again. The respective blocks in the image retrieval model optimization apparatus described above may be wholly or partially implemented by software, hardware, and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, as shown in fig. 10, an image retrieval apparatus is provided, which includes an image to be retrieved acquisition module 1001 and a retrieval processing module 1002, wherein:
an image to be retrieved acquisition module 1001 configured to acquire an image to be retrieved;
the retrieval processing module 1002 is configured to input the image to be retrieved into the optimized image retrieval model obtained by the image retrieval model optimization method described above, so as to obtain a corresponding target image.
For specific limitations of the image retrieval apparatus, reference may be made to the above limitations of the image retrieval method, which are not described herein again. The modules in the image retrieval device can be wholly or partially implemented by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 11. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing the image feature set corresponding to the training data set. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement an image retrieval model optimization method or to implement an image retrieval method.
Those skilled in the art will appreciate that the architecture shown in fig. 11 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
acquiring a first image feature of each sample image in a training data set to obtain an image feature set corresponding to the training data set, wherein the training data set comprises a plurality of sample images;
for any sample image in the training data set, carrying out image feature extraction through an image retrieval model to be optimized to obtain a second image feature of the sample image;
performing image feature extraction on the sample image through an image feature extraction model to obtain a third image feature of the sample image, wherein the image feature extraction model is obtained after momentum updating is performed on the basis of the image retrieval model to be optimized;
determining the loss of model optimization according to the second image feature and the third image feature of the sample image and the image feature set corresponding to the training data set;
and carrying out back propagation on the image retrieval model to be optimized according to the loss to obtain the optimized image retrieval model.
In one embodiment, the processor, when executing the computer program, further performs the steps of: according to the third image feature of the sample image, updating the first image feature corresponding to the sample image in the image feature set to obtain an updated image feature set; determining a loss of model optimization according to the updated image feature set and the second image features of the sample image.
In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring positive example features and negative example features corresponding to the sample image based on the updated image feature set; and calculating the loss of model optimization according to the positive example feature and the negative example feature corresponding to the sample image and the second image feature of the sample image.
In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring the updated first image feature corresponding to the sample image in the updated image feature set, and determining the updated first image feature as the positive example feature corresponding to the sample image; and acquiring the other first image features in the updated image feature set except the positive example feature corresponding to the sample image, and determining them as the negative example features corresponding to the sample image.
In one embodiment, the processor, when executing the computer program, further performs the steps of: calculating the loss of model optimization by adopting an InfoNCE (information noise-contrastive estimation) loss function according to the positive example feature, the negative example feature and the second image feature corresponding to the sample image, wherein the InfoNCE loss function is

$$\mathcal{L} = -\log \frac{\exp\!\left(q \cdot t^{+} / \tau\right)}{\exp\!\left(q \cdot t^{+} / \tau\right) + \sum_{i=1}^{K} \exp\!\left(q \cdot t_{i}^{-} / \tau\right)}$$

wherein q is the second image feature corresponding to the sample image, t^{+} is the positive example feature corresponding to the sample image, t_{i}^{-} is the i-th negative example feature corresponding to the sample image, K is the number of negative example features, and τ is a hyperparameter.
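The loss above is the standard InfoNCE form; a minimal sketch of computing it for one sample follows, assuming q, t_pos and t_negs are already-extracted feature tensors (assumed names and shapes).

    import torch
    import torch.nn.functional as F

    def info_nce_loss(q: torch.Tensor, t_pos: torch.Tensor,
                      t_negs: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
        """InfoNCE loss for one sample: q (dim,), t_pos (dim,), t_negs (K, dim)."""
        pos_logit = (q * t_pos).sum() / tau                # q · t+ / τ
        neg_logits = t_negs @ q / tau                      # q · t_i− / τ, shape (K,)
        logits = torch.cat([pos_logit.unsqueeze(0), neg_logits])
        # -log( exp(pos) / (exp(pos) + Σ exp(neg)) ) equals cross-entropy with label 0.
        return F.cross_entropy(logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))

    q = torch.randn(128)
    loss = info_nce_loss(q, torch.randn(128), torch.randn(999, 128))
    print(float(loss))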
In one embodiment, the processor, when executing the computer program, further performs the steps of: performing image feature extraction on each sample image in the training data set by adopting an initial image feature extraction model to obtain a first image feature of each sample image; based on the first image features of each sample image in the training data set, an image feature set corresponding to the training data set is generated.
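As an illustrative sketch of building the image feature set, a toy linear encoder stands in for the initial image feature extraction model (the names initial_encoder and feature_bank are assumptions).

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    initial_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
    training_images = torch.randn(1000, 3, 32, 32)      # stand-in for the training data set

    with torch.no_grad():
        feats = []
        for batch in training_images.split(100):         # extract first image features in batches
            feats.append(F.normalize(initial_encoder(batch), dim=1))
        feature_bank = torch.cat(feats)                  # image feature set, shape (1000, 128)
    print(feature_bank.shape)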
In one embodiment, the processor, when executing the computer program, further performs the steps of: for any sample image in the training data set, inputting the sample image into the image retrieval model to be optimized to obtain a second original image feature corresponding to the sample image; and performing affine transformation and norm normalization processing on the second original image feature to obtain the corresponding second image feature.
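A minimal sketch of the affine transformation plus norm normalization step, assuming the affine map is a learned linear layer projecting a 2048-dimensional raw feature to 128 dimensions (the dimensions and the name projection are assumptions).

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    projection = nn.Linear(2048, 128)                    # affine transformation W x + b

    def postprocess(raw_features: torch.Tensor) -> torch.Tensor:
        """Map second original image features to second image features."""
        projected = projection(raw_features)             # affine transformation
        return F.normalize(projected, p=2, dim=1)        # L2 norm normalization

    out = postprocess(torch.randn(32, 2048))
    print(out.shape, out.norm(dim=1)[:3])                # unit-norm 128-d features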
In one embodiment, the processor, when executing the computer program, further performs the steps of: performing momentum updating on the image feature extraction model based on the image retrieval model to be optimized to obtain an updated image feature extraction model; inputting the sample image into the momentum-updated image feature extraction model to obtain a third original image feature corresponding to the sample image; and performing affine transformation and norm normalization processing on the third original image feature to obtain the corresponding third image feature.
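A minimal sketch of the momentum update itself, i.e. an exponential moving average of the retrieval model's parameters into the feature extraction model (the coefficient 0.999 is an assumed value).

    import copy
    import torch
    import torch.nn as nn

    retrieval_model = nn.Linear(128, 128)                 # model being optimized
    feature_extractor = copy.deepcopy(retrieval_model)    # momentum-updated copy

    @torch.no_grad()
    def momentum_update(q_model: nn.Module, k_model: nn.Module, m: float = 0.999) -> None:
        """theta_k <- m * theta_k + (1 - m) * theta_q."""
        for p_q, p_k in zip(q_model.parameters(), k_model.parameters()):
            p_k.mul_(m).add_(p_q, alpha=1.0 - m)

    momentum_update(retrieval_model, feature_extractor)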
In one embodiment, the processor, when executing the computer program, further performs the steps of:
acquiring an image to be retrieved;
and inputting the image to be retrieved into the optimized image retrieval model obtained by the image retrieval model optimization method to obtain a corresponding target image.
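For illustration, retrieval with the optimized model can be sketched as nearest-neighbour search over pre-computed gallery features by cosine similarity (gallery_features, optimized_model and the top-5 cutoff are assumptions).

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    optimized_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
    gallery_features = F.normalize(torch.randn(10000, 128), dim=1)   # features of indexed images

    query_image = torch.randn(1, 3, 32, 32)                          # image to be retrieved
    with torch.no_grad():
        query_feat = F.normalize(optimized_model(query_image), dim=1)
        scores = (query_feat @ gallery_features.t()).squeeze(0)      # cosine similarities
        top_scores, top_idx = scores.topk(5)                         # candidate target images
    print(top_idx.tolist())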
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring a first image feature of each sample image in a training data set to obtain an image feature set corresponding to the training data set, wherein the training data set comprises a plurality of sample images;
for any sample image in the training data set, carrying out image feature extraction through an image retrieval model to be optimized to obtain a second image feature of the sample image;
performing image feature extraction on the sample image through an image feature extraction model to obtain a third image feature of the sample image, wherein the image feature extraction model is obtained after momentum updating is performed on the basis of the image retrieval model to be optimized;
determining the loss of model optimization according to the second image feature and the third image feature of the sample image and the image feature set corresponding to the training data set;
and carrying out back propagation on the image retrieval model to be optimized according to the loss to obtain the optimized image retrieval model.
In one embodiment, the computer program when executed by the processor further performs the steps of: according to the third image feature of the sample image, updating the first image feature corresponding to the sample image in the image feature set to obtain an updated image feature set; determining a loss of model optimization according to the updated image feature set and the second image features of the sample image.
In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring positive example features and negative example features corresponding to the sample image based on the updated image feature set; and calculating the loss of model optimization according to the positive example feature and the negative example feature corresponding to the sample image and the second image feature of the sample image.
In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring an updated first image feature corresponding to the sample image in the updated image feature set, and determining the updated first image feature as the positive example feature corresponding to the sample image; and acquiring the other first image features, except the positive example feature corresponding to the sample image, in the updated image feature set, and determining the other first image features as the negative example features corresponding to the sample image.
In one embodiment, the computer program when executed by the processor further performs the steps of: calculating the loss of model optimization by adopting an InfoNCE (information noise-contrastive estimation) loss function according to the positive example feature, the negative example feature and the second image feature corresponding to the sample image, wherein the InfoNCE loss function is

$$\mathcal{L} = -\log \frac{\exp\!\left(q \cdot t^{+} / \tau\right)}{\exp\!\left(q \cdot t^{+} / \tau\right) + \sum_{i=1}^{K} \exp\!\left(q \cdot t_{i}^{-} / \tau\right)}$$

wherein q is the second image feature corresponding to the sample image, t^{+} is the positive example feature corresponding to the sample image, t_{i}^{-} is the i-th negative example feature corresponding to the sample image, K is the number of negative example features, and τ is a hyperparameter.
In one embodiment, the computer program when executed by the processor further performs the steps of: performing image feature extraction on each sample image in the training data set by adopting an initial image feature extraction model to obtain a first image feature of each sample image; based on the first image features of each sample image in the training data set, an image feature set corresponding to the training data set is generated.
In one embodiment, the computer program when executed by the processor further performs the steps of: for any sample image in the training data set, inputting the sample image into the image retrieval model to be optimized to obtain a second original image feature corresponding to the sample image; and performing affine transformation and norm normalization processing on the second original image feature to obtain the corresponding second image feature.
In one embodiment, the computer program when executed by the processor further performs the steps of: performing momentum updating on the image feature extraction model based on the image retrieval model to be optimized to obtain an updated image feature extraction model; inputting the sample image into the momentum-updated image feature extraction model to obtain a third original image feature corresponding to the sample image; and performing affine transformation and norm normalization processing on the third original image feature to obtain the corresponding third image feature.
In one embodiment, the computer program when executed by the processor further performs the steps of:
acquiring an image to be retrieved;
and inputting the image to be retrieved into the optimized image retrieval model obtained by the image retrieval model optimization method to obtain a corresponding target image.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM), among others.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered to fall within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (13)

1. An image retrieval model optimization method, the method comprising:
acquiring a first image feature of each sample image in a training data set to obtain an image feature set corresponding to the training data set, wherein the training data set comprises a plurality of sample images;
for any sample image in the training data set, carrying out image feature extraction through an image retrieval model to be optimized to obtain a second image feature of the sample image;
performing image feature extraction on the sample image through an image feature extraction model to obtain a third image feature of the sample image, wherein the image feature extraction model is obtained after momentum updating is performed on the basis of the image retrieval model to be optimized;
determining the loss of model optimization according to the second image feature and the third image feature of the sample image and the image feature set corresponding to the training data set;
and carrying out back propagation on the image retrieval model to be optimized according to the loss to obtain the optimized image retrieval model.
2. The method of claim 1, wherein determining a loss of model optimization based on the second image feature, the third image feature of the sample image, and the set of image features corresponding to the training dataset comprises:
according to the third image feature of the sample image, updating the first image feature corresponding to the sample image in the image feature set to obtain an updated image feature set;
determining a loss of model optimization according to the updated image feature set and the second image features of the sample image.
3. The method of claim 2, wherein determining a loss of model optimization from the updated set of image features and the second image features of the sample image comprises:
acquiring positive example features and negative example features corresponding to the sample image based on the updated image feature set;
and calculating the loss of model optimization according to the positive example feature and the negative example feature corresponding to the sample image and the second image feature of the sample image.
4. The method of claim 3, wherein obtaining positive and negative example features corresponding to the sample image based on the updated set of image features comprises:
acquiring an updated first image feature corresponding to the sample image in the updated image feature set, and determining the updated first image feature as the positive example feature corresponding to the sample image;
and acquiring the other first image features, except the positive example feature corresponding to the sample image, in the updated image feature set, and determining the other first image features as the negative example features corresponding to the sample image.
5. The method of claim 4, wherein calculating the loss of model optimization according to the positive example feature and the negative example features corresponding to the sample image and the second image feature of the sample image comprises:
calculating the loss of model optimization by adopting an InfoNCE (information noise-contrastive estimation) loss function according to the positive example feature, the negative example feature and the second image feature corresponding to the sample image, wherein the InfoNCE loss function is

$$\mathcal{L} = -\log \frac{\exp\!\left(q \cdot t^{+} / \tau\right)}{\exp\!\left(q \cdot t^{+} / \tau\right) + \sum_{i=1}^{K} \exp\!\left(q \cdot t_{i}^{-} / \tau\right)}$$

wherein q is the second image feature corresponding to the sample image, t^{+} is the positive example feature corresponding to the sample image, t_{i}^{-} is the i-th negative example feature corresponding to the sample image, K is the number of negative example features, and τ is a hyperparameter.
6. The method of any one of claims 1 to 5, wherein the obtaining the first image feature of each sample image in the training data set to obtain an image feature set corresponding to the training data set comprises:
performing image feature extraction on each sample image in the training data set by adopting an initial image feature extraction model to obtain a first image feature of each sample image;
based on the first image features of each sample image in the training data set, an image feature set corresponding to the training data set is generated.
7. The method according to any one of claims 1 to 5, wherein the obtaining, for any one sample image in the training data set, a second image feature of the sample image by performing image feature extraction through an image retrieval model to be optimized comprises:
for any sample image in the training data set, inputting the sample image into the image retrieval model to be optimized to obtain a second original image feature corresponding to the sample image;
and performing affine transformation and norm normalization processing on the second original image feature to obtain the corresponding second image feature.
8. The method according to any one of claims 1 to 5, wherein the performing image feature extraction on the sample image through an image feature extraction model to obtain a third image feature of the sample image comprises:
performing momentum updating on the image feature extraction model based on the image retrieval model to be optimized to obtain an updated image feature extraction model;
inputting the sample image into the momentum-updated image feature extraction model to obtain a third original image feature corresponding to the sample image;
and performing affine transformation and norm normalization processing on the third original image feature to obtain the corresponding third image feature.
9. An image retrieval method, characterized in that the method comprises:
acquiring an image to be retrieved;
inputting the image to be retrieved into the optimized image retrieval model obtained by the image retrieval model optimization method according to any one of claims 1 to 8, and obtaining a corresponding target image.
10. An image retrieval model optimization apparatus, characterized in that the apparatus comprises:
the image feature set acquisition module is used for acquiring a first image feature of each sample image in a training data set to obtain an image feature set corresponding to the training data set, wherein the training data set comprises a plurality of sample images;
the second image feature extraction module is used for extracting image features of any sample image in the training data set through an image retrieval model to be optimized to obtain second image features of the sample image;
the third image feature extraction module is used for extracting image features of the sample image through an image feature extraction model to obtain third image features of the sample image, and the image feature extraction model is obtained after momentum updating is carried out on the basis of the image retrieval model to be optimized;
the loss obtaining module is used for determining the loss of model optimization according to the second image characteristic and the third image characteristic of the sample image and the image characteristic set corresponding to the training data set;
and the model optimization module is used for performing back propagation on the image retrieval model to be optimized according to the loss to obtain the optimized image retrieval model.
11. An image retrieval apparatus, characterized in that the apparatus comprises:
the image to be retrieved acquiring module is used for acquiring an image to be retrieved;
a retrieval processing module, configured to input the image to be retrieved into the optimized image retrieval model obtained by the image retrieval model optimization method according to any one of claims 1 to 8, so as to obtain a corresponding target image.
12. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 8.
13. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
CN202011066003.7A 2020-09-30 2020-09-30 Image retrieval model optimization method, image retrieval device and storage medium Pending CN112232360A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011066003.7A CN112232360A (en) 2020-09-30 2020-09-30 Image retrieval model optimization method, image retrieval device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011066003.7A CN112232360A (en) 2020-09-30 2020-09-30 Image retrieval model optimization method, image retrieval device and storage medium

Publications (1)

Publication Number Publication Date
CN112232360A (en) 2021-01-15

Family

ID=74119335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011066003.7A Pending CN112232360A (en) 2020-09-30 2020-09-30 Image retrieval model optimization method, image retrieval device and storage medium

Country Status (1)

Country Link
CN (1) CN112232360A (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108898185A (en) * 2018-07-03 2018-11-27 北京字节跳动网络技术有限公司 Method and apparatus for generating image recognition model
CN111008294A (en) * 2018-10-08 2020-04-14 阿里巴巴集团控股有限公司 Traffic image processing and image retrieval method and device
US20200160512A1 (en) * 2018-11-16 2020-05-21 Boe Technology Group Co., Ltd. Method, client, server and system for detecting tongue image, and tongue imager
CN111275060A (en) * 2018-12-04 2020-06-12 北京嘀嘀无限科技发展有限公司 Recognition model updating processing method and device, electronic equipment and storage medium
CN109685121A (en) * 2018-12-11 2019-04-26 中国科学院苏州纳米技术与纳米仿生研究所 Training method, image search method, the computer equipment of image encrypting algorithm
CN109829894A (en) * 2019-01-09 2019-05-31 平安科技(深圳)有限公司 Parted pattern training method, OCT image dividing method, device, equipment and medium
CN110209867A (en) * 2019-06-05 2019-09-06 腾讯科技(深圳)有限公司 Training method, device, equipment and the storage medium of image encrypting algorithm
CN110516737A (en) * 2019-08-26 2019-11-29 南京人工智能高等研究院有限公司 Method and apparatus for generating image recognition model
CN111079632A (en) * 2019-12-12 2020-04-28 上海眼控科技股份有限公司 Training method and device of text detection model, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DU KAIXUAN; WANG LIANG; WANG YONG; CHE XIANGHONG: "Map image recognition method based on active learning and convolutional neural network", Science of Surveying and Mapping, no. 07, 20 July 2020 (2020-07-20) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022161357A1 (en) * 2021-01-29 2022-08-04 北京有竹居网络技术有限公司 Data augmentation-based training sample acquisition method and apparatus, and electronic device
CN113920404A (en) * 2021-11-09 2022-01-11 北京百度网讯科技有限公司 Training method, image processing method, device, electronic device and storage medium

Similar Documents

Publication Publication Date Title
CN109492674B (en) Generation method and device of SSD (solid State disk) framework for target detection
CN113902926A (en) General image target detection method and device based on self-attention mechanism
US20230316733A1 (en) Video behavior recognition method and apparatus, and computer device and storage medium
CN113505797B (en) Model training method and device, computer equipment and storage medium
CN112232397A (en) Knowledge distillation method and device of image classification model and computer equipment
CN114549913B (en) Semantic segmentation method and device, computer equipment and storage medium
CN113344016A (en) Deep migration learning method and device, electronic equipment and storage medium
CN113065525A (en) Age recognition model training method, face age recognition method and related device
CN110188422B (en) Method and device for extracting feature vector of node based on network data
CN112232360A (en) Image retrieval model optimization method, image retrieval device and storage medium
CN112749737A (en) Image classification method and device, electronic equipment and storage medium
CN114155388B (en) Image recognition method and device, computer equipment and storage medium
CN114255381A (en) Training method of image recognition model, image recognition method, device and medium
CN117315090A (en) Cross-modal style learning-based image generation method and device
US12112524B2 (en) Image augmentation method, electronic device and readable storage medium
US20210042625A1 (en) Performance of neural networks using learned specialized transformation functions
CN116975347A (en) Image generation model training method and related device
CN111091198A (en) Data processing method and device
CN113743448B (en) Model training data acquisition method, model training method and device
CN115758271A (en) Data processing method, data processing device, computer equipment and storage medium
CN112633285B (en) Domain adaptation method, domain adaptation device, electronic equipment and storage medium
CN116150462A (en) Vector construction method and device for target object and computer equipment
CN113591637B (en) Training method and device for alignment model, computer equipment and storage medium
CN113569887B (en) Picture recognition model training and picture recognition method, device and storage medium
CN113792163B (en) Multimedia recommendation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination