CN112287144B - Picture retrieval method, equipment and storage medium - Google Patents

Picture retrieval method, equipment and storage medium

Info

Publication number
CN112287144B
CN112287144B (application CN202011181535.5A)
Authority
CN
China
Prior art keywords
attribute
retrieval
picture
sample
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011181535.5A
Other languages
Chinese (zh)
Other versions
CN112287144A
Inventor
高毓声
肖潇
付马
孟祥昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Keda Technology Co Ltd
Original Assignee
Suzhou Keda Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Keda Technology Co Ltd filed Critical Suzhou Keda Technology Co Ltd
Priority to CN202011181535.5A
Publication of CN112287144A
Application granted
Publication of CN112287144B
Legal status: Active

Classifications

    • G06F16/53 Querying (information retrieval of still image data)
    • G06F16/55 Clustering; Classification (still image data)
    • G06F16/5838 Retrieval using metadata automatically derived from the content, using colour
    • G06F16/5854 Retrieval using metadata automatically derived from the content, using shape and object relationship
    • G06F18/22 Matching criteria, e.g. proximity measures (pattern recognition)
    • G06F18/2415 Classification techniques based on parametric or probabilistic models
    • G06N3/045 Combinations of networks (neural networks)
    • G06N3/08 Learning methods (neural networks)
    • G06V10/422 Global feature extraction by analysis of the whole pattern, for representing the structure or shape of an object
    • G06V10/56 Extraction of image or video features relating to colour
    • G06V2201/08 Detecting or categorising vehicles

Abstract

The invention provides a picture retrieval method, picture retrieval equipment and a storage medium. The method comprises the following steps: acquiring a picture to be detected that carries attribute tag information, the attribute tag information comprising attribute types and an attribute value corresponding to each attribute type; inputting the picture to be detected into a trained feature extraction network to obtain a retrieval feature vector, wherein the trained network extracts a first retrieval feature map from the picture to be detected, predicts an attribute confidence corresponding to each piece of attribute tag information from the first retrieval feature map, obtains a second retrieval feature map based on the attribute confidences and the first retrieval feature map, and obtains the retrieval feature vector from the second retrieval feature map; and acquiring, from a data set to be retrieved, a target picture matching the picture to be detected according to the retrieval feature vector. The method and the device improve the accuracy of the picture retrieval result.

Description

Picture retrieval method, equipment and storage medium
Technical Field
The invention relates to the technical field of computer application, in particular to a picture retrieval method, picture retrieval equipment and a storage medium.
Background
Picture retrieval extracts a feature vector from a picture to be detected (or a video screenshot) and then, based on that feature vector, searches for a query target in a data set to be retrieved consisting of a large number of pictures. The core of any picture retrieval method is the extraction of the feature vector. Deep-learning-based extraction is currently the mainstream approach because of its high speed, high efficiency and strong adaptability.
However, existing deep learning methods still have defects: the extracted features are insufficient, it is difficult to focus on the attributes the user cares about, and attributes the user does not care about interfere with the retrieval result. These problems severely affect everyday use of the technology. For example, when retrieving non-motor vehicles, the following two situations commonly occur. (1) Key attributes are not given enough weight by the feature extraction model. For instance, when the retrieval target is a green electric vehicle and a green bicycle exists in the data set, illumination-induced color differences may cause the bicycle to be ranked ahead of some green electric vehicles; yet the user is searching for electric vehicles and naturally wants every electric vehicle ranked ahead of any bicycle. (2) Objects with non-key attributes contaminate the extracted information and ultimately cause retrieval errors. For example, because some non-motor vehicles in the training data carry a driver and some do not, model training is disturbed, and the network model over-fits and treats the driver as part of the retrieval target. Both situations degrade the accuracy of the retrieval result.
Disclosure of Invention
In view of the problems in the prior art, an object of the present invention is to provide a picture retrieval method, a device and a storage medium, so as to improve the accuracy of a picture retrieval result.
In order to achieve the above object, the present invention provides a picture retrieval method, comprising the steps of:
acquiring a picture to be detected with attribute tag information; the attribute tag information comprises attribute types and attribute values corresponding to each attribute type;
inputting the picture to be detected into a trained feature extraction network to obtain a retrieval feature vector; the trained feature extraction network extracts a first retrieval feature map from the picture to be detected, predicts an attribute confidence coefficient corresponding to each attribute value according to the first retrieval feature map, and obtains a second retrieval feature map based on the attribute confidence coefficient and the first retrieval feature map; obtaining a retrieval feature vector based on the second retrieval feature map;
and acquiring a target picture matched with the picture to be detected from the data set to be searched according to the retrieval feature vector.
Preferably, before the step of obtaining the picture to be tested with the attribute tag information, the method further includes the steps of:
acquiring a training set formed by a plurality of sample images with the attribute label information;
constructing a feature extraction network, wherein the feature extraction network comprises a backbone network, an attribute prediction branch network and a feature extraction branch network;
constructing a loss function;
and training a feature extraction network according to the training set and the loss function to obtain the trained feature extraction network.
Preferably, the training a feature extraction network according to the training set and the loss function to obtain a trained feature extraction network includes:
the backbone network extracts a first sample feature map from the sample images of the training set, and the first sample feature map is used as the input of the attribute prediction branch network and the feature extraction branch network;
the attribute prediction branch network takes the first sample feature map as input and predicts the attribute confidence corresponding to each attribute value under each attribute type in the sample image;
the feature extraction branch network multiplies the predicted attribute confidences with the first sample feature map to obtain a second sample feature map, and obtains a first feature vector based on the second sample feature map;
and taking the first feature vector as an input of the loss function, calculating the prediction loss of all sample pairs.
Preferably, the constructing a loss function comprises: respectively constructing loss functions corresponding to the attribute prediction branch network and the feature extraction branch network;
wherein, the constructing the loss function corresponding to the feature extraction branch network comprises:
carrying out interval weighting processing on the prediction loss of the sample pair; the weighted sample-pair interval is m/sim_ij, wherein sim_ij represents the similarity of sample image i and sample image j, and m represents the interval between sample image i and sample image j before weighting.
Preferably, the constructing a loss function corresponding to the feature extraction branch network further includes:
carrying out loss value weighting processing on the prediction loss of the sample pair; the weighted loss value is loss_ij * sim_ij, wherein loss_ij represents the prediction loss value before weighting.
Preferably, the inputting the picture to be detected into the trained feature extraction network to obtain the retrieval feature vector includes:
multiplying a preset mask vector by the second retrieval feature map to obtain a third retrieval feature map, reserving, in the second retrieval feature map, the attribute types corresponding to the element positions whose value is 1 in the preset mask vector, wherein the value of each element in the preset mask vector is 0 or 1;
taking the picture to be detected and the third retrieval feature map as input of a trained feature extraction network, and obtaining short feature vectors corresponding to the reserved attribute types;
and fully connecting the short feature vectors corresponding to all the reserved attribute types to obtain the retrieval feature vector.
Preferably, the obtaining a target picture matched with the picture to be detected from the data set to be retrieved according to the retrieval feature vector includes:
acquiring second feature vectors corresponding to the pictures in the data set to be retrieved;
respectively calculating the similarity between the retrieval feature vector and the second feature vector corresponding to each picture;
and taking the picture corresponding to the second feature vector with the maximum similarity as a target picture.
Preferably, after the step of obtaining a training set consisting of a plurality of sample images with attribute label information, the method further comprises:
and performing data enhancement processing on all the sample images, wherein the attribute value in each attribute label information is unchanged in the data enhancement processing process.
Preferably, after the step of obtaining a training set consisting of a plurality of sample images with attribute label information, the method further comprises:
obtaining the similarity of each attribute type between every two sample images according to the attribute label information of all the sample images in the training set;
and establishing a sample similarity matrix corresponding to each attribute type according to the similarity of each two sample images on each attribute type.
The invention also provides a picture retrieval device, comprising:
a processor;
a memory having stored therein executable instructions of the processor;
wherein the processor is configured to perform the steps of any of the above-described picture retrieval methods via execution of the executable instructions.
The present invention also provides a computer-readable storage medium storing a program which, when executed by a processor, implements the steps of any of the above-described picture retrieval methods.
Compared with the prior art, the invention has the following advantages and prominent effects:
according to the picture retrieval method, equipment and storage medium of the invention, the feature extraction network predicts the attribute confidence corresponding to the attribute value under each attribute type, and the first retrieval feature map is weighted by these attribute confidences, so that attribute types whose attribute values receive high weights are attended to while attribute types with low weights are not; this solves the problem that key attributes are overlooked in current retrieval results and improves the accuracy of picture retrieval;
multiplying a preset mask vector, determined according to the needs of the user, by the second feature map yields a retrieval feature vector associated only with the reserved attribute types, so that retrieval related to non-reserved attribute types is shielded; this solves the problem of non-key attributes interfering with retrieval, saves the workload and resource consumption of secondary logic filtering and cleaning of retrieval results, and improves both the accuracy and the efficiency of picture retrieval.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart illustrating a picture retrieval method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of step S60 in FIG. 1;
FIG. 3 is a schematic flowchart of step S70 in FIG. 1;
FIG. 4 is a flowchart illustrating a method for retrieving pictures according to another embodiment of the present invention;
FIG. 5 is a schematic flowchart of step S40 in FIG. 4;
FIG. 6 is a schematic structural diagram of a picture retrieval system according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an image retrieval device according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar structures, and thus their repetitive description will be omitted.
As shown in fig. 1, an embodiment of the present invention discloses a picture retrieval method, which includes the following steps:
and S50, acquiring the picture to be detected with the attribute label information. Specifically, the attribute tag information may include an attribute type and an attribute value corresponding to the attribute type. The attribute type may include color or non-motor vehicle type, etc. The picture to be detected is a picture which is used as a reference when a target picture is retrieved from the data set to be retrieved in the later period. In this embodiment, the image to be detected is labeled with the attribute tag information, and then the image to be detected is obtained. The application does not limit the method, and the picture to be detected can be obtained first, and then attribute label information is marked on the picture to be detected.
And S60, inputting the picture to be detected into the trained feature extraction network to obtain a retrieval feature vector. The trained feature extraction network extracts a first retrieval feature map from the picture to be detected, predicts the attribute confidence coefficient corresponding to each attribute value under all attribute types in the picture to be detected according to the first retrieval feature map, and obtains a second retrieval feature map based on the attribute confidence coefficient and the first retrieval feature map; and obtaining a retrieval feature vector based on the second retrieval feature map.
Specifically, as shown in fig. 2, step S60 includes:
S601, multiplying the preset mask vector by the second retrieval feature map to obtain a third retrieval feature map, and reserving, in the second retrieval feature map, the attribute types corresponding to the element positions whose value is 1 in the preset mask vector. The preset mask vector is generated in advance according to the attribute types the user wants to reserve. The value of each element in the preset mask vector is 0 or 1; each element corresponds to one attribute type, 1 indicates that the attribute type is retained, and 0 indicates that the attribute type is discarded.
In this way, unneeded attribute types are set to 0, and the attribute types contained in the second retrieval feature map that correspond to element positions with the value 1 in the preset mask vector are reserved. As a result, the features corresponding to the reserved attributes are extracted normally, while the features corresponding to the masked attributes are 0. For example, setting the attribute type corresponding to the driver of a non-motor vehicle to 0 makes the network model ignore the driver attribute during retrieval.
S602, taking the picture to be detected and the third retrieval feature map as inputs of the trained feature extraction network, and obtaining a short feature vector corresponding to each reserved attribute type.
and S603, fully connecting the short feature vectors corresponding to all the reserved attribute types to obtain a retrieval feature vector.
Therefore, the finally output retrieval feature vector only contains the information of the reserved attribute, the information of the attribute needing to be shielded is removed, and the problem of interference of the non-key attribute on the retrieval result is solved; therefore, the workload and the resource consumption for performing secondary logic judgment and cleaning on the retrieval result are saved, and the accuracy and the retrieval efficiency of the picture retrieval are improved.
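A minimal sketch of steps S601 to S603, assuming a per-attribute-type row layout for the second retrieval feature map (plain Python lists stand in for real tensors; the shapes are our assumption, not the patent's implementation):

```python
def masked_retrieval_vector(feature_maps, mask):
    """Sketch of steps S601-S603 (shapes and data layout are assumptions).

    feature_maps: list of k rows, one per attribute type, each a list of
                  s floats (the "second retrieval feature map").
    mask: list of k ints (0 or 1); 1 reserves the attribute type, 0 masks it.
    Returns the concatenated retrieval feature vector of length k*s.
    """
    # S601: multiply the preset mask vector with the second retrieval feature
    # map; masked attribute rows become all zeros (the "third" feature map).
    third = [[x * m for x in row] for row, m in zip(feature_maps, mask)]
    # S602/S603: each surviving row acts as the short feature vector of one
    # reserved attribute type; concatenating them gives the retrieval vector.
    return [x for row in third for x in row]

maps = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]  # k=3 attribute types, s=2
mask = [1, 0, 1]                             # mask out the second attribute type
vec = masked_retrieval_vector(maps, mask)    # [0.0, 1.0, 0.0, 0.0, 4.0, 5.0]
```

The masked positions stay in the vector as zeros, so the output length is fixed regardless of which attribute types the user reserves.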
And S70, acquiring a target picture matched with the picture to be detected from the data set to be searched according to the search characteristic vector. Specifically, as shown in fig. 3, step S70 includes:
S701, acquiring the second feature vectors corresponding to the pictures in the data set to be retrieved. That is, each picture in the data set to be retrieved is taken, in place of the picture to be detected, as the input of the trained feature extraction network in step S60, so as to obtain the second feature vector corresponding to each picture.
S702, respectively calculating the similarity between the retrieval feature vector and the second feature vector corresponding to each picture. The similarity calculation in this step can be implemented using the prior art, for example by calculating the cosine distance between the two vectors, which is not described in detail in this embodiment.
and S703, taking the picture corresponding to the second feature vector with the maximum similarity as a target picture. And the target picture closest to, i.e. most similar to, the picture to be detected can be found.
In this embodiment, the feature extraction branch network model includes n hidden layers and a global pooling layer, where n is a preset positive integer. The last hidden layer is a convolution layer, which outputs k × s second feature maps. The global pooling layer outputs k short feature vectors, each of length s, where k is the number of reserved attribute types.
As shown in fig. 4, on the basis of the above embodiment, another embodiment of the present application discloses another picture retrieval method, which, before step S50 of the above embodiment, further includes the following steps:
s10, a training set composed of a plurality of sample images with attribute label information is obtained. Specifically, the training set may include at least one set of sample data, where each set of sample data includes a sample image and attribute label information of the sample image. The attribute tag information may include an attribute type and an attribute value corresponding to the attribute type. The attribute type may include color or non-motor vehicle type, etc. The attribute label information and the sample image are in one-to-one correspondence.
In this embodiment, the following steps may be further included after step S10 and before step S20:
and according to the attribute label information of all the sample images in the training set, obtaining the similarity of each two sample images on each attribute type. For example, the absolute value of the difference between the attribute values belonging to the same attribute type in the attribute label information of each of the two sample images may be used as the similarity between the two sample images in the attribute type. The smaller the absolute value, the greater the similarity, indicating that the two sample images are more similar. Conversely, the larger the absolute value, the smaller the similarity, indicating that the two sample images are more dissimilar.
For example, the attribute label information of the sample image i contains the attribute type of the color. In this attribute type, the attribute value corresponding to pink is 1, the attribute value corresponding to red is 2, and the attribute value corresponding to blue is 3. Then the absolute value of the difference between pink and red is less than the absolute value of the difference between pink and blue. Therefore, red is closer to pink than blue. It should be noted that the above is only an exemplary description of the calculation method of the similarity, and those skilled in the art may select an appropriate method to calculate the similarity between the sample images as needed in the specific implementation.
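To make this concrete, the per-attribute similarity can be sketched as below; mapping the absolute difference d to a similarity 1/(1+d) is an illustrative choice of ours, since the description only requires that a smaller difference mean greater similarity:

```python
def attribute_similarity(value_i, value_j):
    """Similarity of two sample images under one attribute type (sketch).

    The absolute difference of the attribute values measures closeness
    (smaller difference = more similar); the 1/(1+d) mapping into (0, 1]
    is an illustrative assumption, not fixed by the patent.
    """
    d = abs(value_i - value_j)
    return 1.0 / (1.0 + d)

# With the color encoding from the text: pink=1, red=2, blue=3.
pink_red = attribute_similarity(1, 2)    # 0.5
pink_blue = attribute_similarity(1, 3)   # smaller: red is closer to pink
```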
And establishing a sample similarity matrix corresponding to each attribute type according to the similarity of each two sample images on each attribute type. Specifically, in this embodiment, a plurality of sample similarity matrices are established, and each sample similarity matrix corresponds to one attribute type. The number of attribute types is equal to the number of sample similarity matrices. Wherein, the ith row and the jth element in the sample similarity matrix represent the similarity of the sample image i and the sample image j on an attribute type. For example, the fifth element in the third row of the sample similarity matrix is the similarity between the sample image three and the sample image five in the attribute type.
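A sample similarity matrix for one attribute type could then be built as below (a sketch under the same illustrative 1/(1+d) similarity; the patent does not fix the similarity function):

```python
def similarity_matrix(values):
    """Build the sample similarity matrix for one attribute type (sketch).

    values: the attribute value of each sample image under this attribute
    type. Entry [i][j] is sim_ij; the 1/(1+|difference|) mapping is an
    illustrative assumption. One such matrix is built per attribute type.
    """
    n = len(values)
    return [[1.0 / (1.0 + abs(values[i] - values[j])) for j in range(n)]
            for i in range(n)]

# E.g. for attribute type "color" with three samples: pink=1, red=2, blue=3.
color_matrix = similarity_matrix([1, 2, 3])
```

The matrix is symmetric with ones on the diagonal, since every sample is maximally similar to itself.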
And S20, constructing a feature extraction network, wherein the feature extraction network comprises a backbone network, an attribute prediction branch network and a feature extraction branch network.
S30, constructing a loss function. Specifically, the loss function is used to calculate the difference between the data in the training results and the data labeled in the attribute label information. Step S30 includes:
and respectively constructing loss functions corresponding to the attribute prediction branch network and the feature extraction branch network.
And the loss function corresponding to the attribute prediction branch network is used for calculating the loss of the training result of the attribute prediction branch network. And the loss function corresponding to the feature extraction branch network is used for calculating the loss of the training result of the feature extraction branch network. The loss function corresponding to the attribute prediction branch network can be constructed by using the prior art, such as a Softmax (logistic regression model) classification loss function.
Wherein, the above-mentioned loss function corresponding to constructing the feature extraction branch network includes:
carrying out interval weighting processing on the prediction loss of the sample pair: the weighted sample-pair interval is m/sim_ij, wherein sim_ij represents the similarity of sample image i and sample image j, which can be obtained from the generated sample similarity matrix, and m represents the interval between sample image i and sample image j before weighting. Specifically, the prediction loss is the initial output of the loss function corresponding to the feature extraction branch network in the training result, that is, the output before weighting.
According to the method, after the loss function corresponding to the feature extraction branch network is constructed by using the image identification loss function based on the sample pair in the prior art, the obtained similarity between the two sample images is used for carrying out interval weighting. Therefore, for the sample pairs with high similarity, the optimized interval is smaller than the sample pairs with low similarity, so that when the network model carries out image retrieval according to the extracted feature vectors, the samples with high similarity are arranged in front of the samples with low similarity, and the accuracy of the image retrieval is improved.
For the above sample-pair-based image identification loss function, those skilled in the art can select an appropriate loss function as needed in a specific implementation, such as the contrastive loss function or the triplet loss function.
And performing loss value weighting processing on the prediction loss of the sample pair: the weighted loss value is loss_ij * sim_ij, wherein loss_ij represents the prediction loss value before weighting. Specifically, after the loss function corresponding to the feature extraction branch network is constructed using a sample-pair-based image recognition loss function from the related art, the above loss-value weighting is performed using the similarity between the two sample images.
Thus, the loss becomes large for samples with high similarity, and the loss becomes small for samples with low similarity. The network can pay more attention to the samples with high similarity, the learning of information difference among the samples with high similarity is emphasized, the network model can be helped to be converged more quickly, and the accuracy and the retrieval efficiency of the network model for retrieving pictures are improved.
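The two weightings can be sketched together on a pair-based loss; the contrastive form below is our choice of base loss (the patent allows any sample-pair loss), and the numbers are purely illustrative:

```python
def weighted_pair_loss(dist, sim_ij, m=1.0, matching=False):
    """Sketch of interval weighting and loss-value weighting (assumptions:
    a contrastive base loss; dist is the embedding distance of the pair).

    Interval weighting: the interval (margin) becomes m / sim_ij, so pairs
    with high similarity are optimized toward a smaller interval.
    Loss-value weighting: the result is scaled by sim_ij, so pairs with
    high similarity contribute more and are emphasized during training.
    """
    margin = m / sim_ij                         # interval weighting: m -> m/sim_ij
    if matching:
        loss_ij = dist ** 2                     # pull matching pairs together
    else:
        loss_ij = max(0.0, margin - dist) ** 2  # push non-matching pairs apart
    return loss_ij * sim_ij                     # loss-value weighting: loss_ij*sim_ij

# A non-matching pair with sim_ij=0.9 is held to a margin of 1/0.9, while a
# pair with sim_ij=0.1 is pushed toward a much larger margin of 10:
near = weighted_pair_loss(dist=0.5, sim_ij=0.9)
far = weighted_pair_loss(dist=0.5, sim_ij=0.1)
```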
And S40, training a feature extraction network according to the training set and the loss function to obtain a trained feature extraction network. Specifically, as shown in fig. 5, step S40 includes:
s401, using the backbone network to extract a first sample feature map from the sample image of the training set, and using the first sample feature map as an input of the attribute prediction branch network and the feature extraction branch network. In this embodiment, the backbone network may be a common backbone network in the prior art, such as ResNet (a residual network model), or may be customized by using a deep learning network structure. That is, the backbone network can be implemented by using the prior art, which is not limited in the present application.
And S402, predicting attribute confidence degrees corresponding to each attribute value under each attribute type in the sample image by using the attribute prediction branch network as input based on the first sample feature map. In this embodiment, the attribute-prediction branch network is composed of a deep learning network structure commonly found in the prior art, and includes a plurality of convolution layers, an active layer, a batch norm layer, and a Scale layer. The attribute prediction branch network takes the first sample characteristic graph output by the backbone network as input, and based on the deep learning network model structure in the prior art, the attribute confidence corresponding to each attribute value can be obtained.
S403, the feature extraction branch network multiplies the attribute confidences predicted by the attribute prediction branch network with the first sample feature map to obtain a second sample feature map, and obtains a first feature vector based on the second sample feature map. Specifically, the feature extraction branch network adds an attention mechanism structure on top of a deep learning network structure common in the prior art, that is, a structure similar to that of the attribute prediction branch network. The attention mechanism structure multiplies the predicted attribute confidences with the first sample feature map output by the backbone network, that is, it weights the first sample feature map by the attribute confidences, and outputs the second sample feature map. The process of obtaining the first feature vector from the second sample feature map can be implemented with reference to the prior art.
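The attention-style weighting of step S403 can be sketched with plain Python lists standing in for tensors. The one-confidence-per-channel mapping below is an illustrative assumption; in a real network the multiplication would be a broadcast over feature map channels:

```python
def apply_attribute_attention(feature_map, confidences):
    """Weight the first sample feature map by the predicted attribute
    confidences to produce the second sample feature map.
    feature_map: list of channels, each channel a 2-D list of floats.
    confidences: one predicted confidence per channel (an assumption)."""
    assert len(feature_map) == len(confidences)
    return [
        [[value * conf for value in row] for row in channel]
        for channel, conf in zip(feature_map, confidences)
    ]
```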
In this way, the feature extraction branch network exploits the attention mechanism of the neural network, so that key attributes receive higher attribute confidence and more attention during training. This solves the problem that key attributes are neglected in current retrieval results and helps improve the accuracy of picture retrieval.
S404, using the first feature vector as an input of the loss function, calculates prediction losses of all sample pairs. This step can be implemented using existing techniques, and is not described in detail herein.
And S405, continuously correcting the feature extraction network by using the predicted loss to obtain the trained feature extraction network. This step can be implemented using existing techniques, and is not described in detail herein.
In another embodiment of the present application, based on the above embodiment, the attribute tag information includes an attribute value. Between steps S10 and S20, step S70 is further included:
and performing data enhancement processing on all the sample images, such as rotation or random cropping. During the data enhancement processing, the attribute value in each piece of attribute tag information must not change; for example, a color attribute value of red cannot become pink. This enriches the sample data in the training set and improves the accuracy of the final network model's retrieval results.
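A toy sketch of label-preserving data enhancement; the specific transforms and helper names are illustrative assumptions, the point being that the attribute tag information is copied through unchanged:

```python
import random

def augment(sample, rng=None):
    """Label-preserving data enhancement (an illustrative sketch).
    Geometric transforms change the pixels, but the attribute tag
    information is passed through untouched."""
    rng = rng or random.Random(0)
    image, labels = sample
    if rng.choice(["rotate90", "crop"]) == "rotate90":
        # rotate the 2-D pixel grid by 90 degrees
        new_image = [list(row) for row in zip(*image[::-1])]
    else:
        # drop the last row and column as a toy "random crop"
        new_image = [row[:-1] for row in image[:-1]]
    return new_image, dict(labels)  # attribute values unchanged
```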
As shown in fig. 6, the embodiment of the present invention further discloses an image retrieval system 6, which includes:
the to-be-detected picture acquisition module 61 is used for acquiring a to-be-detected picture with attribute tag information; the attribute tag information includes attribute types and attribute values corresponding to each of the attribute types.
A retrieval feature vector obtaining module 62, configured to input the to-be-detected picture into a trained feature extraction network, so as to obtain a retrieval feature vector; the trained feature extraction network extracts a first retrieval feature map from the picture to be detected, predicts an attribute confidence coefficient corresponding to each attribute value according to the first retrieval feature map, and obtains a second retrieval feature map based on the attribute confidence coefficient and the first retrieval feature map; and obtaining a retrieval feature vector based on the second retrieval feature map.
And the target picture retrieval module 63 is configured to obtain, from the data set to be retrieved, a target picture matching the picture to be detected according to the retrieval feature vector.
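Module 63 is in essence a nearest-neighbour search over feature vectors. Cosine similarity is used here as one common choice; the similarity measure and the function names are assumptions, not fixed by the method:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve_target(query_vec, gallery):
    """gallery: list of (picture_id, feature_vector) pairs from the data
    set to be retrieved. Returns the id of the picture whose feature
    vector is most similar to the retrieval feature vector."""
    best_id, _ = max(gallery, key=lambda item: cosine_similarity(query_vec, item[1]))
    return best_id
```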
It is understood that the picture retrieval system of the present invention further includes other existing functional modules that support the operation of the picture retrieval system. The picture retrieval system shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
The image retrieval system in this embodiment is used to implement the above-mentioned image retrieval method, so for the specific implementation steps of the image retrieval system, reference may be made to the description of the image retrieval method, which is not described herein again.
The embodiment of the invention also discloses picture retrieval equipment, which comprises a processor and a memory, wherein the memory stores the executable instruction of the processor; the processor is configured to perform the steps of the above-described picture retrieval method via execution of executable instructions. Fig. 7 is a schematic structural diagram of a picture retrieval device disclosed by the present invention. An electronic device 600 according to this embodiment of the invention is described below with reference to fig. 7. The electronic device 600 shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 7, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one memory unit 620, a bus 630 connecting the different platform components (including the memory unit 620 and the processing unit 610), a display unit 640, etc.
Wherein the storage unit stores program code which can be executed by the processing unit 610 such that the processing unit 610 performs the steps according to various exemplary embodiments of the present invention as described in the above-mentioned picture retrieval method section of the present specification. For example, processing unit 610 may perform the steps as shown in fig. 1.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The memory unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 630 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 600 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 via the bus 630. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage platforms, to name a few.
The invention also discloses a computer readable storage medium for storing a program, wherein the program realizes the steps in the picture retrieval method when executed. In some possible embodiments, the aspects of the present invention may also be implemented in the form of a program product including program code for causing a terminal device to perform the steps according to various exemplary embodiments of the present invention described in the above-mentioned picture retrieval methods of this specification when the program product is run on the terminal device.
As shown above, when the program of the computer-readable storage medium of this embodiment is executed, the attribute confidence corresponding to each attribute tag information is predicted by using the attribute prediction branch network, and then the second feature map is weighted and output by using the attribute confidence based on the attention mechanism of the neural network, so as to solve the problem that the key attribute in the current retrieval result is not focused; based on the second feature map obtained after weighting by using the attribute confidence coefficient and a preset mask vector, obtaining a retrieval feature vector, and solving the problem of interference of non-key attributes on retrieval; therefore, the workload and the resource consumption for performing secondary logic judgment and cleaning on the retrieval result are saved, and the accuracy and the efficiency of the picture retrieval are improved.
Fig. 8 is a schematic structural diagram of a computer-readable storage medium of the present invention. Referring to fig. 8, a program product 800 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
According to the picture retrieval method, the picture retrieval equipment and the picture retrieval storage medium, the attribute confidence degrees corresponding to the attribute values under each attribute type are obtained through feature extraction network prediction, and the first retrieval feature map is subjected to weighting intervention by using the attribute confidence degrees, so that the attribute type corresponding to the attribute value with high weight is concerned, and the attribute type corresponding to the attribute value with low weight is not concerned; the problem that key attributes in the current retrieval result are not concerned is solved, and the accuracy of picture retrieval is improved;
the preset mask vector determined according to the needs of the user is multiplied by the second feature map to obtain the retrieval feature vector associated with the reserved attribute type, so that the relevant retrieval of the attribute type which is not reserved can be shielded, the problem of interference of non-key attributes to the retrieval is solved, the workload and the resource consumption for performing secondary logic judgment and cleaning on the retrieval result are saved, and the accuracy and the efficiency of the picture retrieval are improved.
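The mask step can be sketched as follows. Reading "fully connecting the short feature vectors" as simple concatenation is an assumption made for illustration; in the network this would be a learned fully connected layer:

```python
def mask_and_concat(short_vectors, mask):
    """short_vectors: one short feature vector per attribute type.
    mask: preset mask vector with elements 0 or 1; an attribute type is
    reserved only where the mask element is 1. The reserved short
    vectors are joined into the final retrieval feature vector, so
    non-key attribute types cannot interfere with retrieval."""
    retrieval_vec = []
    for keep, vec in zip(mask, short_vectors):
        if keep == 1:
            retrieval_vec.extend(vec)
    return retrieval_vec
```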
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (11)

1. An image retrieval method is characterized by comprising the following steps:
acquiring a picture to be detected with attribute tag information; the attribute tag information comprises attribute types and attribute values corresponding to each attribute type;
inputting the picture to be detected into a trained feature extraction network to obtain a retrieval feature vector; the trained feature extraction network extracts a first retrieval feature map from the picture to be detected, predicts an attribute confidence coefficient corresponding to each attribute value according to the first retrieval feature map, and obtains a second retrieval feature map based on the attribute confidence coefficient and the first retrieval feature map; obtaining a retrieval feature vector based on the second retrieval feature map;
and acquiring a target picture matched with the picture to be detected from the data set to be searched according to the retrieval feature vector.
2. The picture retrieval method according to claim 1, wherein, prior to the step of obtaining the picture to be tested having the attribute tag information, the method further comprises the steps of:
acquiring a training set formed by a plurality of sample images with the attribute label information;
constructing a feature extraction network, wherein the feature extraction network comprises a backbone network, an attribute prediction branch network and a feature extraction branch network;
constructing a loss function;
and training a feature extraction network according to the training set and the loss function to obtain the trained feature extraction network.
3. The method of claim 2, wherein the training a feature extraction network according to the training set and the loss function to obtain a trained feature extraction network comprises:
the main network extracts a first sample feature map from the sample images of the training set, and the first sample feature map is used as the input of the attribute prediction branch network and the feature extraction branch network;
the attribute prediction branch network takes the first sample characteristic graph as input and predicts attribute confidence corresponding to each attribute value under each attribute type in the sample image;
the feature extraction branch network is used for multiplying the predicted attribute confidence coefficient with the first sample feature map to obtain a second sample feature map; obtaining a first feature vector based on the second sample feature map;
and taking the first feature vector as an input of the loss function, and calculating the prediction loss of all sample pairs.
4. The picture retrieval method of claim 2, wherein the constructing a loss function comprises: respectively constructing loss functions corresponding to the attribute prediction branch network and the feature extraction branch network;
wherein, the constructing the loss function corresponding to the feature extraction branch network comprises:
carrying out interval weighting processing on the prediction loss of the sample pair; the weighted sample pair interval is m/sim_ij, wherein sim_ij represents the similarity of sample image i and sample image j, and m represents the pre-weighting interval between sample image i and sample image j.
5. The picture retrieval method of claim 4, wherein the constructing the loss function corresponding to the feature extraction branch network further comprises:
carrying out loss value weighting processing on the prediction loss of the sample pair; the weighted loss value is loss_ij * sim_ij, wherein loss_ij represents the prediction loss value before weighting.
6. The method as claimed in claim 1, wherein the step of inputting the picture to be detected into the trained feature extraction network to obtain the search feature vector comprises:
multiplying the preset mask vector by the second retrieval feature map to obtain a third retrieval feature map; reserving the attribute type corresponding to the element position with the value of 1 in the preset mask vector and contained in the second retrieval feature map; the value of an element in the preset mask vector is 0 or 1;
taking the picture to be detected and the third retrieval feature map as input of a trained feature extraction network, and obtaining a short feature vector corresponding to the reserved attribute type;
and fully connecting the short characteristic vectors corresponding to all the reserved attribute types to obtain a retrieval characteristic vector.
7. The picture retrieval method according to claim 1, wherein the obtaining a target picture matching the picture to be retrieved from the data set to be retrieved according to the retrieval feature vector comprises:
acquiring second feature vectors corresponding to the pictures in the data set to be retrieved;
respectively calculating the similarity between the retrieval feature vector and the second feature vector corresponding to each picture;
and taking the picture corresponding to the second feature vector with the maximum similarity as a target picture.
8. The picture retrieval method of claim 2, wherein after the step of obtaining a training set of a plurality of sample images with attribute tag information, the method further comprises:
and performing data enhancement processing on all the sample images, wherein the attribute value in each attribute label information is unchanged in the data enhancement processing process.
9. The picture retrieval method of claim 2, wherein after the step of obtaining a training set of a plurality of sample images with attribute tag information, the method further comprises:
obtaining the similarity of each attribute type between every two sample images according to the attribute label information of all the sample images in the training set;
and establishing a sample similarity matrix corresponding to each attribute type according to the similarity of each two sample images on each attribute type.
10. An image retrieval device, characterized by comprising:
a processor;
a memory having stored therein executable instructions of the processor;
wherein the processor is configured to perform the steps of the picture retrieval method of any one of claims 1 to 9 via execution of the executable instructions.
11. A computer-readable storage medium storing a program, wherein the program when executed by a processor implements the steps of the picture retrieval method of any one of claims 1 to 9.
CN202011181535.5A 2020-10-29 2020-10-29 Picture retrieval method, equipment and storage medium Active CN112287144B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011181535.5A CN112287144B (en) 2020-10-29 2020-10-29 Picture retrieval method, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN112287144A CN112287144A (en) 2021-01-29
CN112287144B true CN112287144B (en) 2022-07-05

Family

ID=74352478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011181535.5A Active CN112287144B (en) 2020-10-29 2020-10-29 Picture retrieval method, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112287144B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113590857A (en) * 2021-08-10 2021-11-02 北京有竹居网络技术有限公司 Key value matching method and device, readable medium and electronic equipment

Citations (2)

Publication number Priority date Publication date Assignee Title
CN110851644A (en) * 2019-11-04 2020-02-28 泰康保险集团股份有限公司 Image retrieval method and device, computer-readable storage medium and electronic device
CN111753746A (en) * 2020-06-28 2020-10-09 苏州科达科技股份有限公司 Attribute recognition model training method, recognition method, electronic device, and storage medium


Non-Patent Citations (2)

Title
基于改进的排序学习的图片检索算法研究;谭光兴等;《计算机科学》;20151231;第42卷(第12期);全文 *
基于文本的图片检索系统研究;闫政等;《信息通信》;20130331(第03期);全文 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant