CN117521768A - Training method, device, equipment and storage medium of image search model


Info

Publication number
CN117521768A
Authority
CN
China
Prior art keywords
global
image
original image
original
feature
Prior art date
Legal status
Pending
Application number
CN202311554803.7A
Other languages
Chinese (zh)
Inventor
李林超
周凯
权家新
Current Assignee
Zhejiang Zhuoyun Intelligent Technology Co ltd
Original Assignee
Zhejiang Zhuoyun Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Zhuoyun Intelligent Technology Co ltd
Priority to CN202311554803.7A
Publication of CN117521768A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention discloses a training method, device, and equipment for an image search model, and a storage medium. The method comprises the following steps: performing data processing M times on each of N original images to obtain a sample image set; extracting local features of the sample images in each sample image set through a backbone network in the image search model to obtain a local feature sequence; extracting global features through an attention layer in the image search model to obtain a global feature sequence; quantizing through a quantization layer in the image search model to obtain a quantized feature sequence; and constructing a loss function according to the global feature sequence and the quantized feature sequence of each original image, and performing unsupervised training on the image search model with the loss function. In the technical scheme provided by the embodiment of the invention, the loss function of the image search model is constructed according to the differences between the global feature sequences and the quantized feature sequences of different original images, which improves image search efficiency and search accuracy.

Description

Training method, device, equipment and storage medium of image search model
Technical Field
The embodiments of the invention relate to the technical field of image recognition, and in particular to a training method, device, and equipment for an image search model, and a storage medium.
Background
With the rapid development of the express delivery industry, the variety of parcels keeps increasing, and many packages that endanger social stability can be transported by express delivery. When contraband is found, tracing the source of the express parcel is important.
In the prior art, a security inspector needs to search the massive collection of express security inspection images and match the images corresponding to the contraband. Inspecting prohibited packages in this way, however, is time-consuming and labor-intensive: the search difficulty is high, search efficiency is low owing to the large number of packages, and the accuracy of searching for prohibited packages is difficult to guarantee.
Disclosure of Invention
The invention provides a training method, device, equipment, and storage medium for an image search model, so as to improve search efficiency and search accuracy.
In a first aspect, an embodiment of the present invention provides a training method for an image search model, where the method includes:
performing data processing M times on each of N original images to obtain a sample image set corresponding to each original image;
extracting local features of the sample images in each sample image set through a backbone network in an image search model to be trained, to obtain a local feature sequence corresponding to the original image;
extracting global features from the local feature sequence of the original image through an attention layer in the image search model, to obtain a global feature sequence of the original image;
quantizing the global feature sequence of the original image through a quantization layer in the image search model, to obtain a quantized feature sequence of the original image;
and constructing a loss function according to the global feature sequence and the quantized feature sequence of each original image, and performing unsupervised training on the image search model using the loss function.
In a second aspect, an embodiment of the present invention further provides a training apparatus for an image search model, where the apparatus includes:
the data processing module is used for performing data processing M times on each of N original images to obtain a sample image set corresponding to each original image;
the local feature extraction module is used for extracting local features of the sample images in each sample image set through a backbone network in an image search model to be trained, to obtain a local feature sequence corresponding to the original image;
the global sequence determining module is used for performing global feature extraction on the local feature sequence of each original image through the attention layer in the image search model, to obtain the global feature sequence of the original image;
the quantization sequence determining module is used for quantizing the global feature sequence of the original image through a quantization layer in the image search model, to obtain the quantized feature sequence of the original image;
and the loss function construction module is used for constructing a loss function according to the global feature sequence and the quantized feature sequence of each original image, and performing unsupervised training on the image search model by adopting the loss function.
In a third aspect, an embodiment of the present invention provides an image searching method, including:
acquiring at least two target images to be processed;
inputting the at least two target images into an image search model to obtain the distances between different target images, where the image search model is obtained by training with the training method of the image search model described above.
In a fourth aspect, an embodiment of the present invention further provides an image searching apparatus, including:
the target image determining module is used for acquiring at least two target images to be processed;
the target distance determining module is used for inputting the at least two target images into an image search model to obtain the distances between different target images, where the image search model can be provided by the training device of the image search model described above.
In a fifth aspect, an embodiment of the present invention further provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform a training method of an image search model or an image search method of any of the embodiments of the present invention.
In a sixth aspect, embodiments of the present invention further provide a storage medium containing computer-executable instructions that, when executed by a computer processor, enable the computer processor to perform any one of the training methods of the image search model or an image search method provided by the embodiments of the present invention.
According to the embodiment of the invention, M times of data processing are respectively carried out on N original images, so that a sample image set corresponding to the original images is obtained; respectively extracting local features of sample images in each sample image set through a backbone network in an image search model to be trained to obtain a local feature sequence corresponding to an original image; global feature extraction is respectively carried out on the local feature sequences through the attention layers in the image search model, so that global feature sequences are obtained; quantizing the global feature sequence through a quantization layer in the image search model to obtain a quantized feature sequence; and constructing a loss function according to the global feature sequences and the quantized feature sequences of different original images, and performing unsupervised training on the image search model by adopting the loss function. According to the technical scheme provided by the embodiment of the invention, the loss function in the image search model is constructed according to the difference between the global feature sequences and the quantized feature sequences of different original images, so that the image search efficiency and the search accuracy are improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a training method of an image search model according to a first embodiment of the present invention;
FIG. 2 is a flowchart of another training method of an image search model according to a second embodiment of the present invention;
fig. 3 is a flowchart of an image searching method according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of a training device for an image search model according to a fourth embodiment of the present invention;
fig. 5 is a schematic structural diagram of an image searching apparatus according to a fifth embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to a training method of an image search model or an image search method provided in a sixth embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In addition, it should be noted that, in the technical solution of the invention, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of the data involved all comply with the relevant laws and regulations and do not violate public order and good customs.
Example 1
Fig. 1 is a flowchart of a training method for an image search model according to an embodiment of the present invention, where the present embodiment is applicable to an unsupervised training of an image search model in combination with different original images, and the method may be performed by a training device for an image search model, where the training device for an image search model may be implemented in hardware and/or software, and the training device for an image search model may be configured in an electronic device, where the electronic device may be a terminal device or a server, and the embodiment of the present invention is not limited thereto.
As shown in fig. 1, the training method for an image search model provided by the embodiment of the invention specifically includes the following steps:
s110, respectively performing M times of data processing on N original images to obtain a sample image set corresponding to the original images.
Specifically, in response to a data processing requirement, data processing is performed M times on each of the N original images, so that N sample image sets corresponding to the N original images are obtained, where each sample image set contains M sample images; that is, after data processing is performed M times on an original image, the M sample images in the sample image set corresponding to that original image are obtained. It will be appreciated that the N original images subjected to data processing may differ from one another. In particular, the number of original images and the number of data processing passes may be specified manually in advance; the embodiment of the present invention limits neither.
In one embodiment, suppose there are 3 original images and data processing is performed 2 times. After the 3 original images are each processed twice, 3 sample image sets corresponding to the 3 original images are obtained, each containing 2 sample images; that is, after an original image is processed twice, the 2 sample images in its sample image set are obtained. In the embodiment of the invention, data processing is performed M times on each of the N original images, so that a sample image set is obtained for each original image, with M sample images in each set. According to the technical scheme provided by the embodiment of the invention, data processing is performed on the original images, a corresponding sample image set is obtained from each original image, and the sample images in the sample image sets are obtained so that subsequent work can be carried out on the basis of the sample images extracted from each original image.
S120, respectively extracting local features of the sample images in each sample image set through a backbone network in an image search model to be trained, and obtaining a local feature sequence corresponding to the original image.
Specifically, through a pre-constructed backbone network in an image search model to be trained, sample images in each sample image set corresponding to the original images are respectively processed, local feature information in each sample image can be extracted, and then local feature sequences corresponding to each original image can be obtained.
The backbone network may be a convolutional neural network (Convolutional Neural Network, CNN), including but not limited to AlexNet, ResNet, VGG, or DenseNet. In particular, for the backbone network in the image search model, a skilled technician may also build a backbone network for extracting feature information from image data by combining the characteristics of different CNN models with the acquired image data. The embodiment of the invention does not limit the choice of backbone network in the image search model.
According to the technical scheme provided by the embodiment of the invention, the local features of the sample images in each sample image set are extracted through the main network in the image search model to be trained, and the local feature sequence corresponding to the original image is obtained based on the extracted local features, so that the extracted local feature sequence is processed conveniently, and the feature extraction capability of the subsequent model is improved.
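By way of illustration, the following is a minimal sketch of this local-feature extraction step in PyTorch, assuming a ResNet-50 backbone; the patent does not fix a particular CNN, so the model choice, tensor shapes, and class name here are illustrative only.

```python
import torch
import torchvision.models as models

class LocalFeatureBackbone(torch.nn.Module):
    """Illustrative backbone: a ResNet-50 with pooling/classifier removed."""
    def __init__(self):
        super().__init__()
        resnet = models.resnet50(weights=None)
        # Keep everything up to (but not including) global pooling and the
        # classifier so the spatial feature map is preserved.
        self.body = torch.nn.Sequential(*list(resnet.children())[:-2])

    def forward(self, x):                        # x: (B, 3, H, W) sample images
        fmap = self.body(x)                      # (B, 2048, H/32, W/32)
        # Flatten the spatial grid into a sequence of local feature vectors,
        # i.e. one "local feature sequence" per image.
        return fmap.flatten(2).transpose(1, 2)   # (B, (H/32)*(W/32), 2048)
```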
S130, global feature extraction is carried out on the local feature sequences of the original images through an attention layer in an image search model, and global feature sequences of the original images are obtained.
Specifically, through the attention layer in the pre-constructed image search model, feature extraction is performed on the local feature sequences extracted based on the original image respectively, so as to extract global feature information in the local feature sequences, and further obtain global feature sequences corresponding to the original image.
The attention layer may be a self-attention mechanism (Self-Attention Mechanism), a multi-head self-attention mechanism (Multi-Head Self-Attention Mechanism), channel attention (Channel Attention), spatial attention (Spatial Attention), or a fused channel and spatial attention mechanism (Convolutional Block Attention Module, CBAM); the embodiment of the present invention does not limit this choice.
According to the technical scheme provided by the embodiment of the invention, the attention layer in the image search model extracts global feature information from the local feature sequence extracted from the original image, and the global feature sequence of the original image is obtained from that information. A multilayer perceptron (Multilayer Perceptron, MLP) gives the generated global feature sequence its global character, enlarging the receptive field of the feature values while reducing the influence of feature position on the feature extraction capability, so that the extracted global feature sequence can be processed further.
S140, quantizing the global feature sequence of the original image through a quantization layer in an image search model to obtain a quantized feature sequence of the original image.
Specifically, the global feature sequence extracted based on the original image is quantized through a quantization layer in the pre-constructed image search model, so that effective feature information in the global feature sequence is extracted, and further a quantized feature sequence corresponding to the original image is obtained.
According to the technical scheme provided by the embodiment of the invention, the global feature sequence extracted based on the original image is quantized through the quantization layer in the image search model, so that the effective feature information in the global feature sequence is extracted, and the quantized feature sequence corresponding to the original image is obtained, so that the calculation amount of a subsequent model is reduced, the feature matching speed is increased, the quick matching of the effective features is realized, and the influence of irrelevant noise is restrained.
S150, constructing a loss function according to the global feature sequence and the quantized feature sequence of each original image, and performing unsupervised training on the image search model by adopting the loss function.
Specifically, a loss function can be constructed based on the global feature sequence and the quantized feature sequence acquired by the original image, and the constructed loss function is adopted in the pre-constructed image search model to be trained, so that a complete image search model is constructed, and subsequent training of an unsupervised model is facilitated.
In particular, when the image search model constructed according to the embodiment of the present invention performs unsupervised training, the deep learning framework adopted includes but is not limited to PyTorch, TensorFlow, Keras, and Caffe; the embodiment of the present invention does not limit the choice of deep learning framework.
It should be noted that a loss function is constructed according to the global feature sequence and the quantized feature sequence of each original image, and the image search model is completed on the basis of the constructed loss function so that unsupervised training can proceed. By optimizing the loss function in the image search model and training without supervision, the model can reduce the intra-group distance and increase the inter-group distance when computing the loss over the sample images, extracting effective feature information from them. The advantage of unsupervised training is that the sample images need not be labeled manually, which reduces the labeling workload; the image search model extracts the feature information in the sample images automatically, which greatly improves efficiency.
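As a toy illustration of this unsupervised setup, the sketch below assumes PyTorch; DummySearchModel and the squared-distance pull between views are stand-ins for the full backbone/attention/quantization model and for the similarity-based loss refined in the second embodiment, not the patent's own components.

```python
import torch
import torch.nn as nn

class DummySearchModel(nn.Module):
    """Stand-in for the full backbone + attention + quantization model."""
    def __init__(self, in_dim=3 * 64 * 64, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, out_dim))

    def forward(self, x):
        return self.net(x)

model = DummySearchModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

images = torch.randn(4, 3, 64, 64)                  # N = 4 "original images"
view_a = images + 0.05 * torch.randn_like(images)   # M = 2 processed views,
view_b = images + 0.05 * torch.randn_like(images)   # noise as a stand-in

for step in range(10):
    za, zb = model(view_a), model(view_b)
    # Placeholder loss: pull the two views of each image together; the
    # patent's loss also pushes different images apart (intra-group distance
    # down, inter-group distance up).
    loss = ((za - zb) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```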
According to the embodiment of the invention, M times of data processing are respectively carried out on N original images, so that a sample image set corresponding to the original images is obtained; respectively extracting local features of sample images in each sample image set through a backbone network in an image search model to be trained to obtain a local feature sequence corresponding to an original image; global feature extraction is respectively carried out on the local feature sequences through the attention layers in the image search model, so that global feature sequences are obtained; quantizing the global feature sequence through a quantization layer in the image search model to obtain a quantized feature sequence; and constructing a loss function according to the global feature sequences and the quantized feature sequences of different original images, and performing unsupervised training on the image search model by adopting the loss function. According to the technical scheme provided by the embodiment of the invention, the loss function in the image search model is constructed according to the difference between the global feature sequences and the quantized feature sequences of different original images, so that the image search efficiency and the search accuracy are improved.
Example two
Fig. 2 is a flowchart of another training method for an image search model according to the second embodiment of the present invention, where the technical solution of the embodiment of the present invention is further optimized based on the foregoing alternative technical solutions.
Further, "constructing a loss function according to the global feature sequence and the quantized feature sequence of each original image, and performing unsupervised training on the image search model by adopting the loss function", further refining into "for any original image, determining hidden similarity according to the correlation between different global coefficients in the global feature sequence of the original image and the correlation between the global coefficient corresponding to the original image and the global coefficients in the global feature sequences of other original images; determining quantized feature similarity according to the correlation between different quantized features in the quantized feature sequence of the original image and the correlation between the quantized features corresponding to the original image and the quantized features in the quantized feature sequences of other original images; and summing the hidden similarity and the quantized feature similarity, taking the summation result as a loss function, carrying out unsupervised training on the image search model by adopting the loss function, and constructing the loss function in the image search model according to the difference between the global feature sequence and the quantized feature sequence of different original images so as to improve the image search efficiency and the search accuracy. It should be noted that, in the present embodiment, parts not described in the present embodiment may refer to the related expressions of other embodiments, which are not described herein.
As shown in fig. 2, another training method for an image search model provided by the embodiment of the present invention specifically includes the following steps:
s210, respectively performing M times of data processing on N original images to obtain a sample image set corresponding to the original images.
Specifically, in response to a requirement for data processing of the original images, data processing is performed M times on each of the N original images to obtain the sample image sets corresponding to the N original images, where each sample image set contains M sample images; that is, after M passes of data processing on the N original images, N sample image sets corresponding to the original images are obtained, for a total of N×M sample images. In particular, the number of original images and the number of data processing passes may be specified manually in advance; the embodiment of the present invention limits neither.
Optionally, performing data processing M times on the N original images to obtain the sample image sets corresponding to the original images includes: screening and combining candidate data processing modes for each original image to obtain a data processing group of the original image, the data processing group comprising M data processing modes; and performing data processing M times on the original image using the data processing group of the original image, to obtain the sample image set corresponding to the original image.
When data processing is performed on each original image, candidate data processing modes, including but not limited to rotation, flipping, illumination change, color change, blurring, and affine transformation, can be screened and combined to obtain the data processing group of the original image; the data processing group comprises M data processing modes, and the data processing group of the original image is used to perform data processing M times on the original image, obtaining the sample image set corresponding to the original image.
It should be noted that the data processing modes adopted for each original image form a data processing group that is randomly generated for that image by screening and combining the candidate data processing modes. It will be appreciated that, since the data processing groups adopted for the respective original images are randomly generated, the sample images generated for the respective original images also differ when data processing is carried out on the basis of these groups. Optionally, the candidate data processing modes may instead be screened and combined into a single randomly generated data processing group, and the same group applied to every original image for the data processing operation. The embodiment of the present invention limits neither how the data processing modes are screened and combined nor whether the same data processing group is used.
According to the technical scheme provided by the embodiment of the invention, the candidate data processing modes are screened and combined for each original image to obtain the data processing group of the original image; the data processing group comprises M data processing modes, and performing data processing M times on the original image with its data processing group yields the sample image set corresponding to the original image. By screening and combining different data processing modes into a data processing group and processing the original image on that basis, sample images under different conditions are generated, which improves the model's ability to extract features from the sample images.
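As an illustration of this step, the sketch below assumes torchvision; the candidate list mirrors the transformations named above, and randomly sampling M of them per image is one hedged reading of "screening and combining".

```python
import random
from PIL import Image
from torchvision import transforms

# Candidate data processing modes named in the text; the concrete parameter
# values are assumptions for illustration.
CANDIDATES = [
    transforms.RandomRotation(30),                             # rotation
    transforms.RandomHorizontalFlip(p=1.0),                    # flipping
    transforms.ColorJitter(brightness=0.4),                    # illumination change
    transforms.ColorJitter(hue=0.2),                           # color change
    transforms.GaussianBlur(kernel_size=5),                    # blurring
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),  # affine transformation
]

def make_sample_set(original: Image.Image, m: int = 2) -> list:
    """Apply a randomly generated data processing group of M modes to one image."""
    group = random.sample(CANDIDATES, m)   # the image's "data processing group"
    return [t(original) for t in group]    # the M sample images for this image
```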
S220, respectively extracting local features of the sample images in each sample image set through a backbone network in an image search model to be trained, and obtaining a local feature sequence corresponding to the original image.
S230, global feature extraction is carried out on the local feature sequences of the original images through an attention layer in an image search model, and global feature sequences of the original images are obtained.
Specifically, through a pre-constructed backbone network in an image search model to be trained, sample images in each sample image set corresponding to the original images are respectively processed, local feature information in each sample image can be extracted, and then local feature sequences corresponding to each original image can be obtained. And based on the extracted local feature sequences, extracting global features of the local feature sequences of the original images through the attention layers in the image search model, so that the global feature sequences of the original images can be obtained.
Optionally, the global feature extraction is performed on the local feature sequences of the original images by the attention layer in the image search model to obtain global feature sequences of the original images, which includes: processing a local feature map of an original image by using a first vector in an attention layer to obtain a first processing result, processing a global coefficient of the original image by using a second vector in the attention layer to obtain a second processing result, and obtaining the global coefficient of the original image according to the first processing result, the second processing result and a preset feature sequence; the local feature map belongs to the local feature sequence, and the global coefficient belongs to the global feature sequence; processing the local feature image of the original image by adopting a third vector in the attention layer to obtain a third processing result, and determining a global feature value of the original image according to the third processing result and a global coefficient of the original image; the global feature value belongs to the global feature sequence.
Specifically, the global feature sequence of the original image may be obtained as follows: the first vector K in the attention layer processes the local feature map backbone_feature of the original image to obtain the first processing result K(backbone_feature); the second vector Q in the attention layer processes the global coefficient Emd_feature of the original image to obtain the second processing result Q(Emd_feature); and the global coefficient of the original image is obtained from the first processing result K(backbone_feature), the second processing result Q(Emd_feature), and the preset feature sequence d. The local feature map belongs to the local feature sequence, and the global coefficient belongs to the global feature sequence. The third vector V in the attention layer processes the local feature map backbone_feature of the original image to obtain the third processing result V(backbone_feature), and the global feature value of the original image is determined from the third processing result V(backbone_feature) and the global coefficient of the original image; the global feature value belongs to the global feature sequence. In particular, the attention layer in the image search model may determine the global coefficient and the global feature value of the original image by formulas of the following form, thereby obtaining the global feature sequence of the original image:
Emd_Feature = softmax( K(backbone_feature) × Q(Emd_feature) / √d );
seq_feature = MLP( Emd_Feature × V(backbone_feature) );
wherein Emd_Feature is the global coefficient of the original image; seq_feature is the global feature value of the original image; backbone_feature is the local feature map of the original image; K is a vector that queries the features in the local feature map to obtain the corresponding feature sequence; Q is a vector that guides the generated feature sequence so as to predict the effective feature sequence; K×Q is the pairwise correlation between the serialized feature vector and the prediction vector; d is the D-dimensional feature sequence; V denotes querying the feature vectors in backbone_feature; and MLP denotes a full convolution applied to the product of Emd_Feature and V(backbone_feature).
Specifically, when the attention layer in the image search model extracts global features from the local feature sequence of the original image, the global coefficient and the global feature value of the original image can be determined by the above formulas, and the global feature sequence of the original image is then obtained.
According to the technical scheme provided by the embodiment of the invention, local features in a sample image are extracted through a backbone network in a pre-constructed image search model to be trained, a corresponding local feature sequence is obtained, global features in the local feature sequence are extracted through an attention layer in the image search model, global coefficients and global feature values of an original image are determined, and then a corresponding global feature sequence is obtained, so that further extraction of effective features of the model is realized.
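The following sketch, assuming PyTorch, shows one way the attention layer above could be realized; the learned Emd_feature queries, the layer sizes, and the softmax placement are assumptions consistent with the formulas rather than the patent's verbatim design.

```python
import math
import torch
import torch.nn as nn

class GlobalAttention(nn.Module):
    def __init__(self, dim=256, n_queries=16):
        super().__init__()
        self.K = nn.Linear(dim, dim)   # queries features in the local feature map
        self.Q = nn.Linear(dim, dim)   # guides the generated feature sequence
        self.V = nn.Linear(dim, dim)   # queries feature vectors in backbone_feature
        # Learned global coefficient seed (Emd_feature); assumed learnable here.
        self.emd = nn.Parameter(torch.randn(n_queries, dim))
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, backbone_feature):            # (B, L, dim) local sequence
        d = backbone_feature.size(-1)
        k = self.K(backbone_feature)                # K(backbone_feature): (B, L, dim)
        q = self.Q(self.emd).unsqueeze(0)           # Q(Emd_feature): (1, n_queries, dim)
        # Global coefficient: pairwise correlation K x Q, scaled by sqrt(d).
        emd_coeff = torch.softmax(q @ k.transpose(1, 2) / math.sqrt(d), dim=-1)
        # Global feature value: seq_feature = MLP(Emd_Feature x V(backbone_feature)).
        seq_feature = self.mlp(emd_coeff @ self.V(backbone_feature))
        return emd_coeff, seq_feature               # both belong to the global sequence
```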
S240, quantizing the global feature sequence of the original image through a quantization layer in an image search model to obtain a quantized feature sequence of the original image.
Specifically, the global feature sequence extracted based on the original image is quantized through a quantization layer in the pre-constructed image search model, so that effective feature information in the global feature sequence is extracted, and further a quantized feature sequence corresponding to the original image is obtained.
Optionally, quantizing the global feature sequence of the original image through the quantization layer in the image search model to obtain the quantized feature sequence of the original image specifically includes: for the quantization matrix at any position, taking the product of the global feature value at that position and the quantization matrix at that position as the quantization intermediate quantity at that position, the global feature value at that position belonging to the global feature sequence of the original image; and summing the quantization intermediate quantities at all positions to obtain an intermediate quantity sum, determining the ratio between the quantization intermediate quantity at each position and the intermediate quantity sum, and determining the quantized feature of the original image from that ratio and the quantization matrix at that position, the quantized features of the original image belonging to the quantized feature sequence of the original image.
Specifically, for the quantization matrix c_y at any position, the product of the global feature value seq_feature_i at that position and the quantization matrix c_y is taken as the quantization intermediate quantity at that position. The intermediate quantity sum is obtained by summing the quantization intermediate quantities at all positions, the ratio between the quantization intermediate quantity at each position and the intermediate quantity sum is determined, and the quantized feature of the original image is then determined from the ratio and the quantization matrix at that position; the quantized feature of the original image belongs to the quantized feature sequence of the original image. In particular, the quantization layer in the image search model may quantize the global feature sequence of the original image using a formula of the following form to determine the quantized feature sequence of the original image:
atten_i = Σ_j [ (seq_feature_i × c_j) / Σ_y (seq_feature_i × c_y) ] × c_j, with j and y running from 1 to k;
wherein atten_i is the i-th quantized feature; seq_feature_i is the i-th global feature to be quantized; c_j and c_y are the quantization matrices at the j-th and y-th positions, respectively; and k is the number of global features.
Specifically, when the global feature sequence of the original image is quantized through the quantization layer in the image search model, the quantization operation on the global feature sequence can be carried out by the above formula, yielding the quantized feature sequence of the sample images corresponding to the original image.
According to the technical scheme provided by the embodiment of the invention, the global feature sequence of the original image is quantized through the quantization layer in the image search model, so that the quantized feature sequence of the original image can be obtained, the calculated amount of the model is reduced, the feature matching speed is increased, the quick matching of effective features is realized, and the influence of irrelevant noise is restrained.
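A minimal sketch of such a quantization layer, assuming PyTorch, is given below; the codebook size and the use of a softmax as a smooth, positive form of the ratio described above are assumptions.

```python
import torch
import torch.nn as nn

class QuantizationLayer(nn.Module):
    def __init__(self, dim=256, k=64):
        super().__init__()
        # Quantization matrices c_1 .. c_k, one per position.
        self.codebook = nn.Parameter(torch.randn(k, dim))

    def forward(self, seq_feature):              # (B, n, dim) global feature values
        # Quantization intermediate quantity: product seq_feature_i x c_y.
        inter = seq_feature @ self.codebook.t()  # (B, n, k)
        # Ratio of each intermediate quantity to the sum over positions
        # (softmax used here as a smooth, positive variant of that ratio).
        ratio = torch.softmax(inter, dim=-1)
        # Quantized feature: ratio-weighted recombination of the codebook.
        return ratio @ self.codebook             # (B, n, dim) quantized features
```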
S250, determining hidden similarity according to correlation between different global coefficients in a global feature sequence of an original image and correlation between the global coefficients corresponding to the original image and the global coefficients in the global feature sequences of other original images aiming at any original image.
Specifically, for any given original image, the hidden similarity may be determined from the correlation between different global coefficients in the global feature sequence of that original image and the correlation between the global coefficients of that original image and the global coefficients in the global feature sequences of the other original images, using a contrastive formula of the following form:
loss_emd = -(1/(N×M)) × Σ log[ exp(Emd_feature × Emd_feature⁺) / Σ exp(Emd_feature × Emd_feature⁻) ];
wherein loss_emd is the global feature similarity; Emd_feature × Emd_feature⁺ is the correlation among different global coefficients in the global feature sequence of the same original image; Emd_feature × Emd_feature⁻ is the correlation between the global coefficients of the original image and the global coefficients in the global feature sequences of other original images; N is the number of original images; and M is the number of data processing passes performed on each original image.
Illustratively, take the case where the N original images are each subjected to 2 passes of data processing. The formula above then specializes to M = 2, so that each original image contributes one positive correlation, between its two views, while its negative correlations are taken against the views of the other N−1 original images; the embodiment of the present invention does not limit the number of data processing passes.
The technical scheme provided by the embodiment of the invention thus makes it possible, for any given original image, to determine the hidden similarity according to the correlation between different global coefficients in the global feature sequence of the original image and the correlation between the global coefficients corresponding to the original image and the global coefficients in the global feature sequences of other original images.
S260, determining the similarity of the quantized features according to the correlation between different quantized features in the quantized feature sequence of the original image and the correlation between the quantized features corresponding to the original image and the quantized features in the quantized feature sequences of other original images.
Specifically, according to the correlation between different quantized features in the quantized feature sequence of the original image and the correlation between the quantized features of the original image and the quantized features in the quantized feature sequences of other original images, the quantized feature similarity may be determined by a formula of the same contrastive form:
loss_obj = -(1/(N×M)) × Σ log[ exp(atten × atten⁺) / Σ exp(atten × atten⁻) ];
wherein loss_obj is the quantized feature similarity; atten × atten⁺ is the correlation between different quantized features in the quantized feature sequence of the same original image; and atten × atten⁻ is the correlation between the quantized features of the original image and the quantized features in the quantized feature sequences of other original images.
According to the technical scheme provided by the embodiment of the invention, the quantized feature similarity can be determined according to the correlation between different quantized features in the quantized feature sequence of the original image and the correlation between the quantized features corresponding to the original image and quantized features in quantized feature sequences of other original images.
S270, summing the hidden similarity and the quantized feature similarity, taking the summation result as a loss function, and performing unsupervised training on the image search model by adopting the loss function.
Specifically, when determining the loss function, the hidden similarity and the quantized feature similarity may be summed, the summation result being taken as the constructed loss function; adopting the constructed loss function in the image search model yields a complete image search model and facilitates the subsequent unsupervised training. The loss function may be determined by the following formula:
Loss = loss_emd + loss_obj;
In a specific embodiment, when the loss function of an original image is determined from its hidden similarity and quantized feature similarity, if the correlation between different global coefficients in the global feature sequence of the original image and the correlation between its global coefficients and the global coefficients in the global feature sequences of other original images are missing, the hidden similarity of that original image does not exist, and the loss function of the original image may instead be determined by the following formula:
Loss = loss_obj.
According to the embodiment of the invention, the candidate data processing modes are screened and combined for each original image to obtain the data processing group of the original image; the data processing group comprises M times of data processing modes; performing M times of data processing on the original image by adopting the data processing group of the original image to obtain a sample image set corresponding to the original image; the method comprises the steps of respectively carrying out local feature extraction on sample images in each sample image set through a backbone network, an attention layer and a quantization layer in an image search model to be trained to determine local feature sequences, carrying out global feature extraction on the local feature sequences to obtain global feature sequences, and carrying out quantization on the global feature sequences to obtain quantized feature sequences; determining hidden similarity according to the correlation between different global coefficients in the global feature sequence of the original image and the correlation between the global coefficients corresponding to the original image and the global coefficients in the global feature sequences of other original images for any original image; determining quantized feature similarity according to the correlation between different quantized features in the quantized feature sequence of the original image and the correlation between the quantized features corresponding to the original image and quantized features in quantized feature sequences of other original images; and summing the hidden similarity and the quantized feature similarity, taking the summation result as a loss function, and performing unsupervised training on the image search model by adopting the loss function. According to the technical scheme provided by the embodiment of the invention, the loss function in the image search model is constructed according to the difference between the global feature sequences and the quantized feature sequences of different original images, so that the image search efficiency and the search accuracy are improved.
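For concreteness, the sketch below, assuming PyTorch and M = 2 views per original image, implements a contrastive loss of the form described above; the temperature and the InfoNCE-style cross-entropy are assumptions rather than the patent's verbatim formulas.

```python
import torch
import torch.nn.functional as F

def pairwise_contrastive(za, zb, temperature=0.1):
    """za, zb: (N, dim) features of the two views of N original images."""
    za, zb = F.normalize(za, dim=-1), F.normalize(zb, dim=-1)
    logits = za @ zb.t() / temperature                    # (N, N) correlations
    targets = torch.arange(za.size(0), device=za.device)  # positives on diagonal
    return F.cross_entropy(logits, targets)

def total_loss(global_a, global_b, quant_a, quant_b):
    loss_emd = pairwise_contrastive(global_a, global_b)  # hidden similarity
    loss_obj = pairwise_contrastive(quant_a, quant_b)    # quantized similarity
    return loss_emd + loss_obj                           # Loss = loss_emd + loss_obj
```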
Example III
Fig. 3 is a flowchart of an image searching method according to a third embodiment of the present invention, where the method may be applied to a case of obtaining a distance between different target images, and the method may be performed by an image searching apparatus, where the image searching apparatus may be implemented in a form of hardware and/or software, and the image searching apparatus may be configured in an electronic device, where the electronic device may be a terminal device or a server, etc., and the embodiment of the present invention is not limited thereto.
As shown in fig. 3, the image searching method provided by the embodiment of the invention specifically includes the following steps:
s310, at least two target images to be processed are acquired.
Specifically, in response to an image search request initiated by a requesting party, the corresponding at least two target images are acquired, so that the subsequent image search task can be carried out on the acquired target images.
S320, inputting the at least two target images into an image search model to obtain the distances between different target images; the image search model can be obtained by training the image search model by adopting the training method.
Specifically, at least two obtained target images are used as input images and input into an image search model, so that distances among different target images can be obtained. It should be noted that, the selected image search model may be obtained through training by the training method of the image search model.
According to the embodiment of the invention, at least two target images to be processed are acquired; inputting the at least two target images into an image search model to obtain the distance between different target images; the image search model can be obtained by training the image search model by adopting the training method. According to the technical scheme provided by the embodiment of the invention, the distance between different target images is determined through the image search model, so that an image search task can be conveniently carried out according to the obtained distance between the different target images, and the image search efficiency and the search accuracy are further improved.
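A minimal sketch of this inference step, assuming PyTorch and a trained model that returns one embedding per image (the model name and output shape are placeholders):

```python
import torch

@torch.no_grad()
def image_distances(model, targets):   # targets: (T, 3, H, W) target images
    feats = model(targets)             # (T, dim) embeddings, one per target image
    # Pairwise Euclidean distances between target images; smaller distances
    # correspond to better search matches.
    return torch.cdist(feats, feats)   # (T, T)
```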
Example IV
Fig. 4 is a schematic structural diagram of a training device for an image search model according to a fourth embodiment of the present invention. As shown in fig. 4, the training device of the image search model includes: a data processing module 410, a local feature extraction module 420, a global sequence determination module 430, a quantization sequence determination module 440, and a loss function construction module 450. Wherein:
the data processing module 410 is configured to perform M times of data processing on N original images, so as to obtain a sample image set corresponding to the original images;
the local feature extraction module 420 is configured to extract local features of the sample images in each sample image set through a backbone network in an image search model to be trained, so as to obtain a local feature sequence corresponding to the original image;
The global sequence determining module 430 is configured to perform global feature extraction on the local feature sequences of the original images through an attention layer in an image search model, so as to obtain global feature sequences of the original images;
a quantization sequence determining module 440, configured to quantize the global feature sequence of the original image through a quantization layer in an image search model, to obtain a quantized feature sequence of the original image;
the loss function construction module 450 is configured to construct a loss function according to the global feature sequence and the quantized feature sequence of each original image, and perform unsupervised training on the image search model by using the loss function.
According to the embodiment of the invention, M times of data processing are respectively carried out on N original images, so that a sample image set corresponding to the original images is obtained; respectively extracting local features of sample images in each sample image set through a backbone network in an image search model to be trained to obtain a local feature sequence corresponding to an original image; global feature extraction is respectively carried out on the local feature sequences through the attention layers in the image search model, so that global feature sequences are obtained; quantizing the global feature sequence through a quantization layer in the image search model to obtain a quantized feature sequence; and constructing a loss function according to the global feature sequences and the quantized feature sequences of different original images, and performing unsupervised training on the image search model by adopting the loss function. According to the technical scheme provided by the embodiment of the invention, the loss function in the image search model is constructed according to the difference between the global feature sequences and the quantized feature sequences of different original images, so that the image search efficiency and the search accuracy are improved.
Optionally, the loss function construction module 450 includes:
the hidden similarity determining unit is used for determining hidden similarity according to the correlation between different global coefficients in the global feature sequence of any original image and the correlation between the global coefficient corresponding to the original image and the global coefficients in the global feature sequences of other original images;
a quantized similarity determining unit, configured to determine quantized feature similarity according to a correlation between different quantized features in a quantized feature sequence of the original image and a correlation between a quantized feature corresponding to the original image and the quantized features in quantized feature sequences of other original images;
and the loss determination unit is used for summing the hidden similarity and the quantized feature similarity, taking the summation result as a loss function, and performing unsupervised training on the image search model by adopting the loss function.
Optionally, the global sequence determining module 430 includes:
the global coefficient determining unit is used for processing the local feature map of the original image by adopting a first vector in the attention layer to obtain a first processing result, processing the global coefficient of the original image by adopting a second vector in the attention layer to obtain a second processing result, and obtaining the global coefficient of the original image according to the first processing result, the second processing result and a preset feature sequence; the local feature map belongs to the local feature sequence, and the global coefficient belongs to the global feature sequence;
the global feature determining unit is used for processing the local feature map of the original image by adopting a third vector in the attention layer to obtain a third processing result, and determining a global feature value of the original image according to the third processing result and a global coefficient of the original image; the global feature value belongs to the global feature sequence.
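A hedged sketch of how such an attention layer might look follows. The exact roles of the first, second, and third vectors and of the preset feature sequence are underspecified above, so this is only one consistent interpretation, with all names hypothetical.

```python
import torch
import torch.nn as nn

class GlobalAttention(nn.Module):
    """Sketch: the first and second vectors produce the global coefficients
    from the local feature map and a preset feature sequence; the third
    vector produces the values those coefficients aggregate into the
    global feature value."""
    def __init__(self, dim):
        super().__init__()
        self.first = nn.Linear(dim, dim, bias=False)   # applied to local feature map
        self.second = nn.Linear(dim, dim, bias=False)  # applied to the preset sequence
        self.third = nn.Linear(dim, dim, bias=False)   # applied to local feature map
        self.preset = nn.Parameter(torch.randn(1, 1, dim))  # preset feature sequence

    def forward(self, local_seq):                        # (B, L, D)
        first_result = self.first(local_seq)             # (B, L, D)
        second_result = self.second(self.preset)         # (1, 1, D)
        scores = (first_result * second_result).sum(-1)  # (B, L)
        global_coeff = torch.softmax(scores, dim=-1)     # global coefficients
        third_result = self.third(local_seq)             # (B, L, D)
        global_value = (global_coeff.unsqueeze(-1) * third_result).sum(dim=1)
        return global_coeff, global_value                # (B, L), (B, D)
```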
Optionally, the quantization sequence determining module 440 includes:
a quantization intermediate quantity determining unit, configured to, for the quantization matrix at any position, take the product between the global feature value at the position and the quantization matrix at the position as the quantization intermediate quantity at the position; the global feature value at the position belongs to the global feature sequence of the original image;
the quantized feature determining unit is used for summing the quantization intermediate quantities at each position to obtain an intermediate quantity sum, determining the ratio between the quantization intermediate quantity at the position and the intermediate quantity sum, and determining the quantized feature of the original image according to the ratio and the quantization matrix at the position; the quantized feature of the original image belongs to the quantized feature sequence of the original image.
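Read literally, this describes a soft assignment of each global feature value to a learned per-position quantization matrix (a codebook), normalized by the sum of the intermediate quantities. The sketch below follows that reading; the codebook shape and the small epsilon added for numerical safety are implementation assumptions.

```python
import torch
import torch.nn as nn

class SoftQuantizer(nn.Module):
    """Sketch of the quantization layer: a learned quantization matrix
    (codeword) per position; each global feature value is multiplied by
    its codeword, the products are normalized by their sum, and the
    quantized feature is rebuilt from the ratios and the codewords."""
    def __init__(self, num_positions, dim, eps=1e-8):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(num_positions, dim))
        self.eps = eps  # numerical-safety epsilon (an implementation choice)

    def forward(self, global_values):                    # (B, P) global feature values
        intermediates = global_values.unsqueeze(-1) * self.codebook  # (B, P, D)
        total = intermediates.sum(dim=1, keepdim=True)   # intermediate-quantity sum
        ratios = intermediates / (total + self.eps)      # ratio at each position
        quantized = (ratios * self.codebook).sum(dim=1)  # (B, D) quantized feature
        return quantized
```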
Optionally, the data processing module 410 includes:
the data processing group determining unit is used for screening and combining candidate data processing modes for each original image to obtain a data processing group of the original image; the data processing group comprises M data processing modes;
and the image sequence determining unit is used for performing M times of data processing on the original image by adopting the data processing group of the original image to obtain a sample image set corresponding to the original image.
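A minimal sketch of this screening-and-combining step, assuming torchvision transforms as the candidate data processing modes (this particular candidate set, and the requirement M <= len(CANDIDATES), are assumptions):

```python
import random
from torchvision import transforms

# Candidate data processing modes; this particular set is an assumption.
CANDIDATES = [
    transforms.RandomHorizontalFlip(p=1.0),
    transforms.ColorJitter(brightness=0.4, contrast=0.4),
    transforms.RandomGrayscale(p=1.0),
    transforms.GaussianBlur(kernel_size=5),
    transforms.RandomResizedCrop(224, scale=(0.5, 1.0)),
]

def build_sample_set(original, M):
    """Screen and combine candidates into a data processing group of M
    modes, then apply each mode once to obtain the sample image set.
    Requires M <= len(CANDIDATES)."""
    processing_group = random.sample(CANDIDATES, k=M)  # screening + combining
    return [mode(original) for mode in processing_group]
```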
The training device for the image search model provided by the embodiment of the invention can execute the training method for the image search model provided by any embodiment of the invention, and has functional modules and beneficial effects corresponding to the executed method.
Example five
Fig. 5 is a schematic structural diagram of an image searching device according to a fifth embodiment of the present invention. As shown in fig. 5, the image search apparatus includes: a target image determination module 510 and a target distance determination module 520.
Wherein:
a target image determining module 510, configured to obtain at least two target images to be processed;
the target distance determining module 520 is configured to input the at least two target images into an image search model to obtain distances between the different target images; wherein the image search model may be one provided by the training device for the image search model described above.
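As a hedged usage illustration, the distance could be computed from the trained model's global features as below; the choice of Euclidean distance over L2-normalized embeddings is an assumption, since the metric is not fixed here.

```python
import torch
import torch.nn.functional as F

def search_distance(model, image_a, image_b):
    """Embed two target images with the trained model and return the
    distance between their global features (hypothetical usage)."""
    model.eval()
    with torch.no_grad():
        global_a, _ = model(image_a.unsqueeze(0))  # add batch dimension
        global_b, _ = model(image_b.unsqueeze(0))
    # Euclidean distance between L2-normalized embeddings (an assumption;
    # cosine distance would be an equally plausible choice).
    return torch.dist(F.normalize(global_a, dim=1),
                      F.normalize(global_b, dim=1)).item()
```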
The image searching device provided by the embodiment of the invention can execute the image searching method provided by any embodiment of the invention, and has functional modules and beneficial effects corresponding to the executed method.
Example six
Fig. 6 shows a schematic diagram of an electronic device 600 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 6, the electronic device 600 includes at least one processor 610, and a memory communicatively coupled to the at least one processor 610, such as a read-only memory (ROM) 620 and a random access memory (RAM) 630. The memory stores computer programs executable by the at least one processor, and the processor 610 may perform various suitable actions and processes according to the computer program stored in the ROM 620 or the computer program loaded from the storage unit 680 into the RAM 630. The RAM 630 may also store various programs and data required for the operation of the electronic device 600. The processor 610, the ROM 620, and the RAM 630 are connected to each other by a bus 640. An input/output (I/O) interface 650 is also connected to the bus 640.
Various components in electronic device 600 are connected to I/O interface 650, including: an input unit 660 such as a keyboard, a mouse, etc.; an output unit 670 such as various types of displays, speakers, and the like; a storage unit 680 such as a magnetic disk, an optical disk, or the like; and a communication unit 690 such as a network card, modem, wireless communication transceiver, etc. The communication unit 690 allows the electronic device 600 to exchange information/data with other devices through a computer network, such as the internet, and/or various telecommunication networks.
The processor 610 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the processor 610 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 610 performs the various methods and processes described above, such as the training method of an image search model or the image search method.
In some embodiments, the training method of an image search model or the image search method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 680. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 600 via the ROM 620 and/or the communication unit 690. When the computer program is loaded into the RAM 630 and executed by the processor 610, one or more steps of the training method of an image search model or the image search method described above may be performed. Alternatively, in other embodiments, the processor 610 may be configured to perform the training method of an image search model or the image search method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor capable of receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, a host product in the cloud computing service system that overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and virtual private server (VPS) services.
It should be appreciated that the various flows shown above may be used with steps reordered, added, or deleted. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, provided the desired results of the technical solution of the present invention can be achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (12)

1. A method for training an image search model, comprising:
respectively carrying out M times of data processing on N original images to obtain a sample image set corresponding to the original images;
respectively extracting local features of the sample images in each sample image set through a backbone network in an image search model to be trained to obtain a local feature sequence corresponding to the original image;
extracting global features of the local feature sequence of the original image through an attention layer in an image search model to obtain a global feature sequence of the original image;
quantizing the global feature sequence of the original image through a quantization layer in an image search model to obtain a quantized feature sequence of the original image;
and constructing a loss function according to the global feature sequence and the quantized feature sequence of each original image, and performing unsupervised training on the image search model by adopting the loss function.
2. The method of claim 1, wherein constructing a loss function from the global feature sequence and the quantized feature sequence for each of the original images and performing unsupervised training on the image search model using the loss function comprises:
for any original image, determining hidden similarity according to the correlation between different global coefficients in the global feature sequence of the original image and the correlation between the global coefficient corresponding to the original image and the global coefficients in the global feature sequences of other original images;
determining quantized feature similarity according to the correlation between different quantized features in the quantized feature sequence of the original image and the correlation between the quantized features corresponding to the original image and the quantized features in the quantized feature sequences of other original images;
and summing the hidden similarity and the quantized feature similarity, taking the summation result as a loss function, and performing unsupervised training on the image search model by adopting the loss function.
3. The method according to claim 1, wherein the global feature extraction is performed on the local feature sequences of the original images by the attention layer in the image search model, respectively, to obtain global feature sequences of the original images, comprising:
processing a local feature map of an original image by using a first vector in an attention layer to obtain a first processing result, processing a global coefficient of the original image by using a second vector in the attention layer to obtain a second processing result, and obtaining the global coefficient of the original image according to the first processing result, the second processing result and a preset feature sequence; the local feature map belongs to the local feature sequence, and the global coefficient belongs to the global feature sequence;
processing the local feature map of the original image by adopting a third vector in the attention layer to obtain a third processing result, and determining a global feature value of the original image according to the third processing result and a global coefficient of the original image; the global feature value belongs to the global feature sequence.
4. The method according to claim 1, wherein the quantizing the global feature sequence of the original image by a quantization layer in an image search model to obtain a quantized feature sequence of the original image comprises:
taking the product between the global feature value at any position and the quantization matrix at the position as the quantization intermediate quantity at the position; the global feature value at the position belongs to the global feature sequence of the original image;
summing the quantization intermediate quantities at each position to obtain an intermediate quantity sum, determining the ratio between the quantization intermediate quantity at the position and the intermediate quantity sum, and determining the quantized feature of the original image according to the ratio and the quantization matrix at the position; the quantized feature of the original image belongs to the quantized feature sequence of the original image.
5. The method according to claim 1, wherein the performing M times of data processing on the N original images to obtain a sample image set corresponding to the original images includes:
screening and combining candidate data processing modes for each original image to obtain a data processing group of the original image; the data processing group comprises M data processing modes;
and carrying out M times of data processing on the original image by adopting the data processing group of the original image to obtain a sample image set corresponding to the original image.
6. An image search method, comprising:
acquiring at least two target images to be processed;
inputting the at least two target images into an image search model to obtain the distance between different target images; wherein the image search model is trained by the method of any one of claims 1-5.
7. A training device for an image search model, comprising:
the data processing module is used for respectively performing M times of data processing on N original images to obtain a sample image set corresponding to the original images;
the local feature extraction module is used for extracting local features of the sample images in each sample image set through a backbone network in an image search model to be trained, so as to obtain a local feature sequence corresponding to the original image;
the global sequence determining module is used for respectively carrying out global feature extraction on the local feature sequences of the original images through the attention layer in the image searching model to obtain global feature sequences of the original images;
the quantization sequence determining module is used for quantizing the global feature sequence of the original image through a quantization layer in an image search model to obtain a quantization feature sequence of the original image;
and the loss function construction module is used for constructing a loss function according to the global feature sequence and the quantized feature sequence of each original image, and performing unsupervised training on the image search model by adopting the loss function.
8. The apparatus of claim 7, wherein the loss function construction module comprises:
the hidden similarity determining unit is used for determining hidden similarity according to the correlation between different global coefficients in the global feature sequence of any original image and the correlation between the global coefficient corresponding to the original image and the global coefficients in the global feature sequences of other original images;
a quantized similarity determining unit, configured to determine quantized feature similarity according to a correlation between different quantized features in a quantized feature sequence of the original image and a correlation between a quantized feature corresponding to the original image and the quantized features in quantized feature sequences of other original images;
and the loss determination unit is used for summing the hidden similarity and the quantized feature similarity, taking the summation result as a loss function, and performing unsupervised training on the image search model by adopting the loss function.
9. The apparatus of claim 7, wherein the global sequence determination module comprises:
the global coefficient determining unit is used for processing the local feature map of the original image by adopting a first vector in the attention layer to obtain a first processing result, processing the global coefficient of the original image by adopting a second vector in the attention layer to obtain a second processing result, and obtaining the global coefficient of the original image according to the first processing result, the second processing result and a preset feature sequence; the local feature map belongs to the local feature sequence, and the global coefficient belongs to the global feature sequence;
the global feature determining unit is used for processing the local feature map of the original image by adopting a third vector in the attention layer to obtain a third processing result, and determining a global feature value of the original image according to the third processing result and a global coefficient of the original image; the global feature value belongs to the global feature sequence.
10. An image search apparatus, comprising:
the target image determining module is used for acquiring at least two target images to be processed;
the target distance determining module is used for inputting the at least two target images into an image searching model to obtain the distance between different target images; wherein the image search model is provided by the apparatus of any of claims 7-9.
11. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-6.
12. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-6.
CN202311554803.7A 2023-11-21 2023-11-21 Training method, device, equipment and storage medium of image search model Pending CN117521768A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311554803.7A CN117521768A (en) 2023-11-21 2023-11-21 Training method, device, equipment and storage medium of image search model

Publications (1)

Publication Number Publication Date
CN117521768A true CN117521768A (en) 2024-02-06

Family

ID=89741628

Country Status (1)

Country Link
CN (1) CN117521768A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117746191A (en) * 2024-02-07 2024-03-22 浙江啄云智能科技有限公司 Graph searching model training method and graph searching method
CN117746191B (en) * 2024-02-07 2024-05-10 浙江啄云智能科技有限公司 Graph searching model training method and graph searching method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination