CN108875076A

CN108875076A - A fast trademark image retrieval method based on an Attention mechanism and convolutional neural networks

Info

Publication number: CN108875076A
Application number: CN201810750096.1A
Authority: CN
Inventors: 冯永; 张英琦; 尚家兴; 强保华; 邱媛媛
Original assignee: Chongqing University; Guilin University of Electronic Technology
Current assignee: Chongqing University; Guilin University of Electronic Technology
Priority date: 2018-07-10
Filing date: 2018-07-10
Publication date: 2018-11-23
Anticipated expiration: 2038-07-10
Also published as: CN108875076B

Abstract

The invention discloses a fast trademark image retrieval method based on an Attention mechanism and convolutional neural networks, comprising: building the open-source Caffe deep learning framework and training the open-source VGG16 network model; designing, based on the VGG16 network model, an Attention network comprising two convolutional layers, and adding the Attention network to the trained VGG16 network model; training the VGG16 network model with the added Attention network using the training set of the FlickrLogos-32 dataset; generating an Attention-MAC trademark feature extraction model from the trained VGG16 network model with the added Attention network; and retrieving the trademark image to be queried using the Attention-MAC trademark feature extraction model and generating the retrieval result. The invention avoids the redundant parameters of fully connected layers, thereby reducing the model size, increasing the speed of training and retrieval, and lowering the false detection rate.

Description

A fast trademark image retrieval method based on an Attention mechanism and convolutional neural networks

Technical field

The present invention relates to the field of computer technology, and more particularly to a fast trademark image retrieval method based on an Attention mechanism and convolutional neural networks.

Background art

In recent years, with rapid economic development, the number of trademark registrations of all kinds has risen year by year. Trademark retrieval is therefore of great importance for trademark registration, management, and protection. For a trademark applicant, trademark retrieval can reveal obstacles to a registration application in good time, such as whether the trademark to be registered has already been preemptively registered by others, or whether it is similar to other existing trademarks in a given region. For the trademark office, trademark retrieval can automatically determine the degree of similarity between the trademark to be registered and the trademarks already in the database, reducing the workload of manual comparison and improving work efficiency.

Traditional trademark detection and recognition systems include database image retrieval based on manual annotation and comparison, retrieval based on the coding of the graphical elements of a trademark, image retrieval based on binarized image features, keyword-based retrieval, and so on. In these traditional methods, annotating and comparing images takes a great deal of time and labor, which raises the cost of trademark image retrieval. As convolutional neural networks have matured, using them to retrieve and classify trademark images has become a new approach for trademark detection and recognition systems. The model is first trained on the training images of a dataset to adjust the weights of each network layer, then tested on the images of the test set, and finally the tested model is applied in the system. Although such methods are highly automated, traditional region-based convolutional neural networks have a complex structure and redundant fully connected layer parameters, which makes training slow and costly in time and resources. Such methods also require a dataset with complete category and location annotations, whereas the image annotations in a trademark office's database are incomplete and are therefore difficult to use for training.

Summary of the invention

In view of this, the present invention provides a fast trademark image retrieval method based on an Attention mechanism and convolutional neural networks. By combining the Attention mechanism with a convolutional neural network, removing the fully connected layers on this basis, and using the output of an intermediate layer of the convolutional neural network as the feature representation of an image, a fast trademark detection method is provided.

To achieve the above object, the present invention provides a fast trademark image retrieval method based on an Attention mechanism and convolutional neural networks, the method comprising the following steps:

S1, building the open-source Caffe deep learning framework and training the open-source VGG16 network model;

S2, designing, based on the VGG16 network model, an Attention network comprising two convolutional layers, and adding the Attention network to the trained VGG16 network model;

S3, training the VGG16 network model with the added Attention network using the training set of the FlickrLogos-32 dataset;

S4, generating an Attention-MAC trademark feature extraction model from the trained VGG16 network model with the added Attention network;

S5, retrieving the trademark image to be queried using the Attention-MAC trademark feature extraction model, and generating the retrieval result.

Preferably, step S1 comprises the following steps:

S1-1, building the open-source Caffe deep learning framework, and pre-training the VGG16 network model on the ImageNet dataset;

S1-2, training the pre-trained VGG16 network model by transfer learning using the training set of the FlickrLogos-32 dataset.

Preferably, step S2 comprises the following steps:

S2-1, designing, based on the model parameters of the VGG16 network model, an Attention network comprising two convolutional layers;

S2-2, adding the designed Attention network between the output of the last pooling layer of the trained VGG16 network model and the fully connected layers.

Preferably, step S3 comprises the following steps:

S3-1, fixing the network weights of the feature extraction part of the VGG16 network model with the added Attention network;

S3-2, training the Attention network in the VGG16 network model with the added Attention network using the training set of the FlickrLogos-32 dataset.

Preferably, step S4 comprises the following steps:

S4-1, removing the fully connected layers from the trained VGG16 network model with the added Attention network;

S4-2, adding a pooling layer after the Attention network to generate the Attention-MAC trademark feature extraction model.

Preferably, step S5 comprises the following steps:

S5-1, reducing the dimensionality of the image feature representation output by the Attention-MAC trademark feature extraction model by principal feature analysis, to obtain the Attention-MAC feature vector;

S5-2, taking the trademark images in the FlickrLogos-32 dataset as input images, feeding them in turn into the Attention-MAC trademark feature extraction model to generate the Attention-MAC feature vectors of the trademark dataset images, and building a feature library of the trademark dataset images;

S5-3, taking the trademark image to be retrieved as the input image, and feeding it into the Attention-MAC trademark feature extraction model to generate the Attention-MAC feature vector of the trademark image to be retrieved;

S5-4, computing the cosine similarity between the Attention-MAC feature vector of the trademark image to be retrieved and the Attention-MAC feature vectors of the FlickrLogos-32 dataset images, to obtain an initial ranking of the trademark dataset images based on cosine similarity;

S5-5, re-ranking the FlickrLogos-32 dataset images by query expansion, to obtain the final similarity ranking of the trademark dataset images with respect to the trademark image to be retrieved, and returning the trademark retrieval result.

Preferably, step S1-2 comprises the following steps:

S1-2-1, fine-tuning the network weights of the pre-trained VGG16 network model using the training set of the FlickrLogos-32 dataset;

S1-2-2, in the transfer learning training, training the pre-trained VGG16 network model for classification using the standard cross-entropy loss function.

Preferably, step S5-1 comprises the following steps:

S5-1-1, performing L₂ normalization on the feature representation output by the Attention-MAC trademark feature extraction model;

S5-1-2, performing feature selection on the processed feature representation by principal feature analysis, to obtain the dimensionality-reduced feature vector;

S5-1-3, performing L₂ normalization again on the dimensionality-reduced feature vector, to obtain the Attention-MAC feature vector.

Preferably, the pooling layer added in step S4-2 processes the feature representation output by the Attention-MAC trademark feature extraction model using a regional mean pooling method, wherein the regional mean pooling method specifically comprises the following steps:

C1, the input image is passed through the trained Attention-MAC trademark feature extraction model, which outputs a W × H × K spatial matrix;

C2, this three-dimensional matrix is regarded as a set x = {x_i} of two-dimensional feature response matrices, where i = 1, 2, ..., k, k is the total number of channels of the output two-dimensional feature maps, and x_i is the two-dimensional feature response matrix output by the i-th feature channel;

C3, let Ω denote the set of all possible positions in the W_i × H_i two-dimensional feature response matrix output by the i-th feature channel, and let x_i(p) denote the response of x_i at position p; assume that:

$$x_i(p_j) = \max_{p \in \Omega} x_i(p)$$

Based on the continuity of features, the mean of the features in the 3 × 3 region centered on x_i(p_j) is computed as the pooled output of x_i, that is,

$$f_{\Omega,i} = \frac{1}{9} \sum_{l=1}^{9} x_i(p_l)$$

where p_l, l = 1, 2, ..., 9, are all the pixels contained in this 3 × 3 region; since x_i(p_j) is the center of the region, x_i(p₅) = x_i(p_j) when l = 5. Computing the regional mean pooling for the W_i × H_i feature response matrix output by each channel gives the k-dimensional feature representation of the image:

$$f_\Omega = [f_{\Omega,1}, f_{\Omega,2}, \ldots, f_{\Omega,k}]^\top$$

The feature vector f_Ω is the output of the Attention-MAC trademark feature extraction model.
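The regional mean pooling of steps C1 to C3 can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the function name `regional_mean_pool`, the choice of the maximum response as the center position p_j, and the zero padding at the borders are all assumptions made for illustration.

```python
import numpy as np

def regional_mean_pool(feature_map: np.ndarray) -> np.ndarray:
    """Pool a W x H x K feature map into a K-dimensional vector.

    Assumption: the centre p_j of each 3 x 3 region is the position of the
    maximum response in that channel. Assumption: borders are zero-padded.
    """
    w, h, k = feature_map.shape
    # Zero-pad the two spatial axes so every position has a full 3 x 3 window.
    padded = np.pad(feature_map, ((1, 1), (1, 1), (0, 0)), mode="constant")
    pooled = np.empty(k, dtype=np.float64)
    for i in range(k):
        channel = feature_map[:, :, i]
        # Row and column of the maximum response in channel i.
        r, c = np.unravel_index(np.argmax(channel), channel.shape)
        # In padded coordinates the window centred on (r, c) spans r..r+2, c..c+2.
        window = padded[r:r + 3, c:c + 3, i]
        pooled[i] = window.mean()
    return pooled
```

Averaging around the peak, rather than taking the peak alone, makes each channel's value less sensitive to a single isolated activation, which appears to be the intent behind the "continuity of features" argument.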

In summary, the invention discloses a fast trademark image retrieval method based on an Attention mechanism and convolutional neural networks. First, the open-source Caffe deep learning framework is built and the open-source VGG16 network model is trained. Next, an Attention network comprising two convolutional layers is designed based on the VGG16 network model and added to the trained VGG16 network model. The VGG16 network model with the added Attention network is then trained using the training set of the FlickrLogos-32 dataset, and an Attention-MAC trademark feature extraction model is generated from the trained model. Finally, the trademark image to be queried is retrieved using the Attention-MAC trademark feature extraction model, and the retrieval result is generated. The invention combines the Attention mechanism with a convolutional neural network. The proposed Attention-MAC trademark feature extraction model contains no fully connected layer; instead, it processes the feature maps output by an intermediate convolutional layer to obtain the feature representation of the original image. This avoids the redundant parameters of fully connected layers, reduces the model size, increases the speed of training and retrieval, and lowers the false detection rate.

Additional aspects and advantages of the invention will be set forth in part in the description that follows, and in part will become apparent from that description or be learned through practice of the invention.

Brief description of the drawings

The above and/or additional aspects and advantages of the invention will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:

Fig. 1 is a basic flowchart of a fast trademark image retrieval method based on an Attention mechanism and convolutional neural networks in a preferred embodiment of the present disclosure;

Fig. 2 is a schematic diagram of the network structure of the pre-trained VGG16 network model in a preferred embodiment of the present disclosure;

Fig. 3 is a schematic diagram of the network structure after adding the Attention network proposed by the invention, in a preferred embodiment of the present disclosure;

Fig. 4 is a schematic diagram of the network structure of the Attention-MAC trademark image retrieval model proposed by the invention, in a preferred embodiment of the present disclosure;

Fig. 5 is a retrieval flowchart of trademark detection using the Attention-MAC trademark image retrieval model in a preferred embodiment of the present disclosure.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings of those embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the invention without creative effort fall within the protection scope of the invention.

In the description of the present invention, it is to be understood that terms such as "longitudinal", "transverse", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", and "outer" indicate orientations or positional relationships based on those shown in the drawings. They are used only to facilitate and simplify the description of the invention, and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they are therefore not to be construed as limiting the invention.

In the description of the present invention, unless otherwise specified and limited, the terms "mounted", "connected", and "connection" are to be understood broadly: a connection may, for example, be mechanical or electrical, may be internal to two elements, and may be direct or indirect through an intermediary. A person of ordinary skill in the art can understand the specific meaning of these terms according to the specific circumstances.

The present invention provides a fast trademark image retrieval method based on an Attention mechanism and convolutional neural networks. As shown in Fig. 1, the method comprises steps S1 to S5 as set out in the summary of the invention above; the preferred sub-steps of steps S1, S2, S3, S4, S5, S1-2, and S5-1 are likewise as set out there.

Fig. 2 is a schematic diagram of the network structure of the pre-trained VGG16 network model in a preferred embodiment of the present disclosure. The selected VGG network model comprises 13 convolutional layers and 3 fully connected layers; trained on the ImageNet dataset, it outputs 1000 classification categories.

With reference to Fig. 3, the training process of the Attention-MAC trademark feature extraction model proposed by the present invention is described below. It specifically comprises the following steps:

Sa, building the open-source Caffe deep learning framework, and pre-training the VGG16 network model on the ImageNet dataset;

Sb, fine-tuning the pre-trained VGG16 network model using the training image set of the FlickrLogos-32 dataset, retraining the network weights, and thereby completing the transfer learning of the model on the new dataset and further improving the accuracy of the extracted features;

Sc, adding the Attention network proposed by the present invention to the trained VGG16 network model, between the output of the last pooling layer and the fully connected layers; fixing the weights of the feature extraction part of the network, and training the Attention network using the training images of the FlickrLogos-32 dataset;

Sd, removing the fully connected layers from the trained model while retaining the other parts; adding a pooling layer after the Attention network, to obtain the Attention-MAC trademark feature extraction model proposed by the present invention.

The above steps are described in more detail below.

The transfer learning described in step Sb comprises the following steps:

A1, selection of the dataset. The FlickrLogos-32 dataset is divided into three parts P1, P2, and P3, which are the training set, the validation set, and the test set respectively. Each part contains 32 different trademark classes; P1 contains 320 images, P2 contains 3960 images, and P3 contains 3960 images, and the images of the three parts are disjoint.

A2, adjustment of the loss function. In the transfer learning training, the model is trained for classification using the standard cross-entropy loss function:

$$L = -y^* \cdot \log\left(\frac{\exp(y)}{\mathbf{1}^\top \exp(y)}\right)$$

where y is the class output of the model during training; y^* is the true class of the image, given as a 0-1 vector (if the image belongs to a class, the position representing that class is 1 and the rest are 0); and 1 is the all-ones vector.
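As a small worked illustration of the loss in step A2, the following NumPy sketch evaluates a standard softmax cross-entropy for a single image. It assumes that y holds raw class scores and that y^* is a 0-1 vector with a single 1; the function name `softmax_cross_entropy` is hypothetical and the sketch is independent of Caffe.

```python
import numpy as np

def softmax_cross_entropy(logits, one_hot):
    """Cross-entropy between softmax(logits) and a 0-1 target vector."""
    logits = np.asarray(logits, dtype=np.float64)
    one_hot = np.asarray(one_hot, dtype=np.float64)
    # Subtract the maximum for numerical stability; softmax is unchanged.
    shifted = logits - logits.max()
    # log softmax = shifted - log(sum(exp(shifted))); the sum plays the
    # role of the all-ones vector dotted with exp(y).
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return float(-np.dot(one_hot, log_probs))
```

With equal scores over 32 trademark classes, for example, the loss equals log(32), the value expected from an untrained classifier that guesses uniformly.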

The design and training of the Attention network described in step Sc comprise the following steps:

B1, according to the model parameters of the VGG16 selected by the present invention, the output of convolutional layer Conv5_3, after pooling, is a W × H × K spatial matrix; this output is called Origin_feature_map.

B2, the Attention network proposed by the present invention comprises 2 convolutional layers (called Attention_Conv_1 and Attention_Conv_2 respectively) and uses the Softplus function as the activation function.

B3, Attention_Conv_1 uses convolution kernels of size 1 × 1, whose number equals the number of channels of the spatial matrix, namely k; Attention_Conv_2 uses a convolution kernel of size 1 × 1, whose number is 1. The convolution kernel parameters θ are randomly initialized when training starts.

B4, the parameters θ of the Attention network are trained using the standard cross-entropy loss function and backpropagation. The purpose of training is for the Attention network to learn the importance of each feature in Origin_feature_map. To this end, the output function φ(f_i; θ) of the Attention network is defined to represent the score (i.e. the weight) of feature f_i on Origin_feature_map, where f_i ∈ R^k, i = 1, 2, ..., k.

B5, after the Attention network has been trained, the score φ_i obtained from the function φ(f_i; θ) in step B4 is used as a weight and multiplied by the corresponding feature vector f_i in Origin_feature_map, giving a weight-adjusted W × H × K spatial matrix; this output is called attention_feature_map.
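The forward pass of steps B2 to B5 can be sketched in NumPy as follows. A 1 × 1 convolution is equivalent to applying the same linear map at every spatial position, so each layer is written as a matrix product over the channel axis. The names `apply_attention`, `w1`, `b1`, `w2`, and `b2` are hypothetical, and applying Softplus after both layers is an assumption; training of θ by backpropagation is not shown.

```python
import numpy as np

def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def apply_attention(feature_map, w1, b1, w2, b2):
    """Weight a W x H x K feature map by per-position attention scores.

    w1: (K, K) weights of the first 1x1 convolution (K kernels).
    w2: (K, 1) weights of the second 1x1 convolution (1 kernel).
    Assumption: Softplus follows both layers.
    """
    hidden = softplus(feature_map @ w1 + b1)   # shape (W, H, K)
    scores = softplus(hidden @ w2 + b2)        # shape (W, H, 1)
    # Broadcasting multiplies every channel at a position by its score.
    return feature_map * scores, scores
```

Because Softplus is strictly positive, every position keeps a non-zero weight; the network can only down-weight a feature, never flip its sign.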

For the pooling layer described in step Sd, the present invention proposes a new regional mean pooling method. The specific computation steps are as follows:

C1, the input image is passed through the trained Attention-MAC trademark feature extraction model, which outputs a W × H × K spatial matrix.

C2, this three-dimensional matrix is regarded as a set x = {x_i} of two-dimensional feature response matrices, where i = 1, 2, ..., k, k is the total number of channels of the output two-dimensional feature maps, and x_i is the two-dimensional feature response matrix output by the i-th feature channel.

C3, for each feature channel, the mean of the 3 × 3 region centered on x_i(p_j) is computed as set out in the summary of the invention above, giving the k-dimensional feature representation f_Ω of the image.

With reference to Fig. 4, the computation of the Attention-MAC feature vector proposed by the present invention is described below.

The image feature representation output by the Attention-MAC trademark feature extraction model is reduced in dimensionality by principal feature analysis, giving the Attention-MAC feature vector proposed by the present invention. The specific computation is as follows:

D1, the k-dimensional feature representation output by the Attention-MAC trademark feature extraction model in step Sd is processed. To prevent over-fitting, L₂ normalization is first applied to f_Ω.

D2, feature selection is performed on the processed k-dimensional feature representation by principal feature analysis: from the k features, the method finds the l features with high correlation or high mutual information, reducing the data from k dimensions to l dimensions.

D3, L₂ normalization is applied again to the l-dimensional vector computed by principal feature analysis.

D4, the image feature vector obtained by the above steps is the Attention-MAC feature vector of the present invention.
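Steps D1 to D4 can be sketched as a normalize, select, normalize pipeline. The sketch below does not implement principal feature analysis itself: it assumes the l selected feature indices have already been computed offline and are passed in as `selected_idx`. That argument and the function names are hypothetical.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    # Scale v to unit Euclidean length; eps guards against a zero vector.
    return v / max(float(np.linalg.norm(v)), eps)

def attention_mac_vector(f_omega, selected_idx):
    """Turn a k-dim pooled feature into an l-dim Attention-MAC vector.

    selected_idx: indices of the l retained features (assumed to come
    from a principal feature analysis fitted beforehand, not shown here).
    """
    v = l2_normalize(np.asarray(f_omega, dtype=np.float64))   # step D1
    v = v[np.asarray(selected_idx)]                           # step D2
    return l2_normalize(v)                                    # step D3
```

The second normalization matters for retrieval: once every stored vector has unit length, cosine similarity reduces to a plain dot product.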

With reference to Fig. 5, the retrieval flow of trademark detection using the Attention-MAC trademark image retrieval model proposed by the present invention is described below.

E1, the trademark images in the FlickrLogos-32 dataset are taken as input images and fed in turn into the Attention-MAC trademark image retrieval model, to obtain the l-dimensional Attention-MAC feature vectors f_i, i = 1, 2, ..., n, representing the FlickrLogos-32 dataset images, where n is the number of trademark images in the FlickrLogos-32 dataset.

E2, the trademark image to be retrieved is taken as the input image and fed into the Attention-MAC trademark feature extraction model, to obtain the Attention-MAC feature vector q_A of the trademark image to be retrieved.

E3, the similarity between the image to be retrieved and each image in the FlickrLogos-32 dataset is computed in turn using cosine similarity:

$$s_i = \frac{q_A \cdot f_i}{\lVert q_A \rVert \, \lVert f_i \rVert}$$

The similarity s_i is the similarity score between the i-th image in the FlickrLogos-32 dataset and the image to be retrieved, and the FlickrLogos-32 dataset images are initially ranked according to the score s_i.

E4, the top 5 images in the initial ranking are selected, and the mean of their Attention-MAC feature vectors together with the Attention-MAC feature vector of the trademark image to be retrieved is computed:

$$q_{re} = \frac{1}{6}\left(q_A + \sum_{j=1}^{5} f_{(j)}\right)$$

where f_{(j)} is the Attention-MAC feature vector of the j-th image in the initial ranking.

The resulting q_re is used as the new query vector: the top 100 images in the initial ranking are selected, their similarity to q_re is computed, and these 100 images are re-ranked according to the new scores to give the final ranking.
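The retrieval flow of steps E3 and E4 can be sketched in NumPy as below. It assumes the database is an n × l array with one Attention-MAC vector per row, and that the expanded query is the mean of the query and the top-ranked vectors (six vectors for the default of 5). The function names and the parameters `top_qe` and `top_rerank` are hypothetical.

```python
import numpy as np

def cosine_scores(query, database):
    # Cosine similarity between one query vector and every database row.
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    return db @ q

def retrieve(query, database, top_qe=5, top_rerank=100):
    """Return database indices ranked by similarity, with query expansion."""
    query = np.asarray(query, dtype=np.float64)
    database = np.asarray(database, dtype=np.float64)
    # E3: initial ranking by cosine similarity, best first.
    scores = cosine_scores(query, database)
    initial = np.argsort(-scores, kind="stable")
    # E4: average the query with the top-ranked vectors to form q_re.
    top = initial[:top_qe]
    q_re = (query + database[top].sum(axis=0)) / (len(top) + 1)
    # Re-rank only the head of the initial list with the expanded query.
    head = initial[:top_rerank]
    new_scores = cosine_scores(q_re, database[head])
    reranked_head = head[np.argsort(-new_scores, kind="stable")]
    return np.concatenate([reranked_head, initial[top_rerank:]])
```

Restricting the second pass to the head of the list keeps its cost fixed regardless of database size, at the price of never promoting an image that the initial ranking placed below the cut-off.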

It should be noted that the system structures and method flows shown in Figs. 1 to 5 are some preferred embodiments of the invention. They are shown only to facilitate understanding of the invention and are not to be construed as limiting it. Structures or methods obtained by implementing the technical solution of the invention under the guidance of its idea all fall within the protection scope of the invention, and are not described further here.

In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", "some examples", and the like means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, such schematic expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. The invention is therefore not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Although embodiments of the invention have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions, and variations may be made to these embodiments without departing from the principle and spirit of the invention; the scope of the invention is defined by the claims and their equivalents.

Claims

1. A fast trademark image retrieval method based on an Attention mechanism and convolutional neural networks, characterized in that the method comprises the following steps:

S1, building the open-source Caffe deep learning framework and training the open-source VGG16 network model;

S2, designing, based on the VGG16 network model, an Attention network comprising two convolutional layers, and adding the Attention network to the trained VGG16 network model;

S3, training the VGG16 network model with the added Attention network using the training set of the FlickrLogos-32 dataset;

S4, generating an Attention-MAC trademark feature extraction model from the trained VGG16 network model with the added Attention network;

S5, retrieving the trademark image to be queried using the Attention-MAC trademark feature extraction model, and generating the retrieval result.

2. The fast trademark image retrieval method based on an Attention mechanism and convolutional neural networks according to claim 1, characterized in that step S1 comprises the following steps:

S1-1, building the open-source Caffe deep learning framework, and pre-training the VGG16 network model on the ImageNet dataset;

S1-2, training the pre-trained VGG16 network model by transfer learning using the training set of the FlickrLogos-32 dataset.

3. The fast trademark image retrieval method based on an Attention mechanism and convolutional neural networks according to claim 1, characterized in that step S2 comprises the following steps:

S2-1, designing, based on the model parameters of the VGG16 network model, an Attention network comprising two convolutional layers;

S2-2, adding the designed Attention network between the output of the last pooling layer of the trained VGG16 network model and the fully connected layers.

4. The fast trademark image retrieval method based on an Attention mechanism and convolutional neural networks according to claim 1, characterized in that step S3 comprises the following steps:

S3-1, fixing the network weights of the feature extraction part of the VGG16 network model with the added Attention network;

S3-2, training the Attention network in the VGG16 network model with the added Attention network using the training set of the FlickrLogos-32 dataset.

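5. The fast trademark image retrieval method based on an Attention mechanism and convolutional neural networks according to claim 1, characterized in that step S4 comprises the following steps:

S4-1, removing the fully connected layers from the trained VGG16 network model with the added Attention network;

S4-2, adding a pooling layer after the Attention network to generate the Attention-MAC trademark feature extraction model.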

6. The fast trademark image retrieval method based on an Attention mechanism and convolutional neural networks according to claim 1, characterized in that step S5 comprises the following steps:

S5-1, reducing the dimensionality of the image feature representation output by the Attention-MAC trademark feature extraction model by principal feature analysis, to obtain the Attention-MAC feature vector;

S5-2, taking the trademark images in the FlickrLogos-32 dataset as input images, feeding them in turn into the Attention-MAC trademark feature extraction model to generate the Attention-MAC feature vectors of the trademark dataset images, and building a feature library of the trademark dataset images;

S5-3, taking the trademark image to be retrieved as the input image, and feeding it into the Attention-MAC trademark feature extraction model to generate the Attention-MAC feature vector of the trademark image to be retrieved;

S5-4, computing the cosine similarity between the Attention-MAC feature vector of the trademark image to be retrieved and the Attention-MAC feature vectors of the FlickrLogos-32 dataset images, to obtain an initial ranking of the trademark dataset images based on cosine similarity;

S5-5, re-ranking the FlickrLogos-32 dataset images by query expansion, to obtain the final similarity ranking of the trademark dataset images with respect to the trademark image to be retrieved, and returning the trademark retrieval result.

7. The fast trademark image retrieval method based on an Attention mechanism and convolutional neural networks according to claim 2, characterized in that step S1-2 comprises the following steps:

S1-2-1, fine-tuning the network weights of the pre-trained VGG16 network model using the training set of the FlickrLogos-32 dataset;

S1-2-2, in the transfer learning training, training the pre-trained VGG16 network model for classification using the standard cross-entropy loss function.

8. The fast trademark image retrieval method based on an Attention mechanism and convolutional neural networks according to claim 6, characterized in that step S5-1 comprises the following steps:

S5-1-1, performing L₂ normalization on the feature representation output by the Attention-MAC trademark feature extraction model;

S5-1-2, performing feature selection on the processed feature representation by principal feature analysis, to obtain the dimensionality-reduced feature vector;

S5-1-3, performing L₂ normalization again on the dimensionality-reduced feature vector, to obtain the Attention-MAC feature vector.

9. The fast trademark image retrieval method based on an Attention mechanism and convolutional neural networks according to claim 5, characterized in that the pooling layer added in step S4-2 processes the feature representation output by the Attention-MAC trademark feature extraction model using a regional mean pooling method, wherein the regional mean pooling method specifically comprises the following steps:

C1, the input image is passed through the trained Attention-MAC trademark feature extraction model, which outputs a W × H × K spatial matrix;

C2, this three-dimensional matrix is regarded as a set x = {x_i} of two-dimensional feature response matrices, where i = 1, 2, ..., k, k is the total number of channels of the output two-dimensional feature maps, and x_i is the two-dimensional feature response matrix output by the i-th feature channel;

C3, let Ω denote the set of all possible positions in the W_i × H_i two-dimensional feature response matrix output by the i-th feature channel, and let x_i(p) denote the response of x_i at position p; assume that:

$$x_i(p_j) = \max_{p \in \Omega} x_i(p)$$

Based on the continuity of features, the mean of the features in the 3 × 3 region centered on x_i(p_j) is computed as the pooled output of x_i, that is,

$$f_{\Omega,i} = \frac{1}{9} \sum_{l=1}^{9} x_i(p_l)$$

where p_l, l = 1, 2, ..., 9, are all the pixels contained in this 3 × 3 region; since x_i(p_j) is the center of the region, x_i(p₅) = x_i(p_j) when l = 5. Computing the regional mean pooling for the W_i × H_i feature response matrix output by each channel gives the k-dimensional feature representation of the image:

$$f_\Omega = [f_{\Omega,1}, f_{\Omega,2}, \ldots, f_{\Omega,k}]^\top$$