CN108875076B - Rapid trademark image retrieval method based on Attention mechanism and convolutional neural network

Publication number: CN108875076B (granted from application CN201810750096.1A; published as CN108875076A)
Authority: CN (China)
Legal status: Active
Inventors: 冯永, 张英琦, 尚家兴, 强保华, 邱媛媛
Assignees: Chongqing University; Guilin University of Electronic Technology

Classifications

    • G: Physics
    • G06: Computing; Calculating or Counting
    • G06N: Computing arrangements based on specific computational models
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks


Abstract

The invention discloses a rapid trademark image retrieval method based on an Attention mechanism and a convolutional neural network, which comprises the steps of constructing a Caffe deep learning open source framework and training an open source VGG16 network model; designing an Attention network comprising two convolutional layers based on a VGG16 network model, and adding the Attention network into a trained VGG16 network model; training the VGG16 network model added with the Attention network by using a training set in a FlickrLogos-32 data set; generating an Attention-MAC trademark feature extraction model based on a trained VGG16 network model added with an Attention network; and retrieving the trademark image to be queried based on the Attention-MAC trademark feature extraction model, and generating a retrieval result. The invention avoids using the redundant parameters of the full connection layer, achieves the purpose of simplifying the model, improves the training and searching speed and reduces the false detection rate.

Description

Rapid trademark image retrieval method based on Attention mechanism and convolutional neural network
Technical Field
The invention relates to the technical field of computers, in particular to a rapid trademark image retrieval method based on an Attention mechanism and a convolutional neural network.
Background
In recent years, with rapid economic development, the number of registered trademarks has increased year by year. Trademark retrieval is therefore of great significance for trademark registration, management, and protection. For a trademark applicant, obstacles to a registration application can be discovered in time through trademark retrieval, for example whether the trademark to be registered has already been claimed by another party, or whether it is similar to an existing trademark in a given region. For the trademark office, the similarity between a trademark to be registered and the existing trademarks in the database can be obtained automatically through trademark retrieval, which reduces the workload of manual comparison and improves working efficiency.
Traditional trademark detection and identification systems include database image retrieval based on manual labeling and comparison, retrieval based on trademark graphic-element coding, image retrieval based on binarized image features, keyword-based retrieval, and so on. These traditional methods consume a large amount of time and labor for labeling and comparing images, which raises the cost of trademark image retrieval. With the gradual maturity of convolutional neural networks, retrieving and classifying trademark images with a convolutional neural network has become a new approach for trademark detection and identification systems: a model is first trained with the training images in a data set, adjusting the weights of each network layer; it is then tested with the images in a test set; finally the tested model is deployed in the system. Although the degree of automation is higher, the traditional region-based convolutional neural network has a complex structure and redundant full-connection-layer parameters, so the training process is slow and consumes a large amount of time and resources. In addition, this method requires the data set to be fully labeled with category and localization information, but for a trademark office the image labeling in the trademark database is not complete, so the method is difficult to apply in training.
Disclosure of Invention
In view of the above, the invention provides a fast trademark image retrieval method based on an Attention mechanism and a convolutional neural network, which combines the Attention mechanism with a convolutional neural network, removes the full connection layers, and uses the output of an intermediate convolutional layer as the feature representation of the image, thereby providing a fast trademark detection method.
In order to achieve the above object, the present invention provides a fast trademark image retrieval method based on an Attention mechanism and a convolutional neural network, the method comprising the steps of:
s1, building a Caffe deep learning open source framework, and training an open source VGG16 network model;
s2, designing an Attention network comprising two convolutional layers based on a VGG16 network model, and adding the Attention network into the trained VGG16 network model;
s3, training the VGG16 network model added with the Attention network by using a training set in a FlickrLogos-32 data set;
s4, generating an Attention-MAC trademark feature extraction model based on the trained VGG16 network model added with the Attention network;
s5, retrieving the trademark image to be queried based on the Attention-MAC trademark feature extraction model, and generating a retrieval result.
Preferably, the step S1 includes the steps of:
s1-1, building a Caffe deep learning open source framework, and pre-training a VGG16 network model by using an ImageNet data set;
and S1-2, carrying out transfer learning training on the VGG16 network model obtained by pre-training by using a training set in a FlickrLogos-32 data set.
Preferably, the step S2 includes the steps of:
s2-1, designing an Attention network comprising two convolutional layers based on model parameters of a VGG16 network model;
S2-2, adding the designed Attention network between the output of the last pooling layer of the trained VGG16 network model and the full connection layer.
Preferably, the step S3 includes the steps of:
s3-1, fixing the network weight of the feature extraction part in the VGG16 network model added with the Attention network;
s3-2, training the Attention network in the VGG16 network model added with the Attention network by using a training set in a FlickrLogos-32 data set.
Preferably, the step S4 includes the steps of:
s4-1, removing a trained full connection layer in the VGG16 network model added with the Attention network;
s4-2, adding a pooling layer behind the Attention network to generate an Attention-MAC trademark feature extraction model.
Preferably, the step S5 includes the steps of:
s5-1, performing dimensionality reduction on image feature representation output by the Attention-MAC trademark feature extraction model through a main feature analysis method to obtain an Attention-MAC feature vector;
s5-2, taking trademark images in FlickrLogos-32 data sets as input images, sequentially inputting the input images into an Attention-MAC trademark feature extraction model, generating Attention-MAC feature vectors of the trademark data set images, and constructing a feature library of the trademark data set images;
s5-3, inputting the trademark image to be retrieved into an Attention-MAC trademark feature extraction model as an input image, and generating an Attention-MAC feature vector of the trademark image to be retrieved;
s5-4, calculating cosine similarity of the Attention-MAC feature vector of the trademark image to be retrieved and the Attention-MAC feature vector of the FlickrLogos-32 data set image to obtain initial sequencing of the trademark data set image based on the cosine similarity;
s5-5, rearranging FlickrLogos-32 data set images through expanding query to obtain the final sequence of the similarity between the trademark data set images and the trademark images to be retrieved and reporting the trademark retrieval result.
Preferably, the step S1-2 includes the steps of:
s1-2-1, fine-tuning the network weight of the VGG16 network model obtained through pre-training by using a training set in a FlickrLogos-32 data set;
s1-2-2, in the training of transfer learning, a standard cross entropy loss function is used for carrying out classification training on the VGG16 network model obtained through pre-training.
Preferably, the step S5-1 includes the steps of:
S5-1-1, performing L2 regularization on the feature representation output by the Attention-MAC trademark feature extraction model;
s5-1-2, performing feature selection on the processed feature representation through a main feature analysis method to obtain a feature vector after dimension reduction;
S5-1-3, performing L2 regularization again on the feature vector after dimension reduction to obtain the Attention-MAC feature vector.
Preferably, the pooling layer added in step S4-2 processes the feature representation output by the Attention-MAC trademark feature extraction model by using a region mean pooling method, wherein the region mean pooling method specifically includes the following steps:
C1, the trained Attention-MAC trademark feature extraction model takes an image as input and outputs a W × H × K spatial matrix;
C2, regard this three-dimensional matrix as a set $X = \{x_i\}$, $i = 1, 2, \dots, k$, of two-dimensional feature response matrices, where k is the total number of channels of the output two-dimensional feature maps and $x_i$ is the two-dimensional feature response matrix output by the i-th feature channel;
C3, let Ω denote all possible positions in the $W_i \times H_i$ two-dimensional feature response matrix output by the i-th feature channel, and let $x_i(p)$ denote the response of $x_i$ at position p; assume:

$$x_i(p_j) = \max_{p \in \Omega} x_i(p)$$

Based on the continuity of the features, the feature mean of the 3 × 3 region centered on $x_i(p_j)$ is taken as the pooled output of $x_i$, i.e.

$$f_i = \frac{1}{9} \sum_{l=1}^{9} x_i(p_l)$$

where $p_l$, $l = 1, 2, \dots, 9$, are all the positions in this 3 × 3 region; because position $x_i(p_j)$ is the center of the 3 × 3 region, $x_i(p_5) = x_i(p_j)$ when l = 5. Applying region mean pooling to the $W_i \times H_i$ feature response matrix output by every channel yields a k-dimensional feature representation of the image:

$$f_\Omega = [f_1, f_2, \dots, f_k]^{\mathrm{T}}$$

The feature vector $f_\Omega$ is the output of the Attention-MAC trademark feature extraction model.
In summary, the invention discloses a rapid trademark image retrieval method based on an Attention mechanism and a convolutional neural network, which comprises the steps of firstly, building a Caffe deep learning open source framework and training an open source VGG16 network model; designing an Attention network comprising two convolutional layers based on a VGG16 network model, and adding the Attention network into the trained VGG16 network model; then training the VGG16 network model added with the Attention network by using a training set in a FlickrLogos-32 data set; generating an Attention-MAC trademark feature extraction model based on the trained VGG16 network model added with the Attention network; and finally, retrieving the trademark image to be queried based on the Attention-MAC trademark feature extraction model, and generating a retrieval result. The invention combines the Attention mechanism with the convolutional neural network, and the provided Attention-MAC trademark feature extraction model does not comprise any full connection layer, but processes the feature graph output by the convolution of the middle layer to obtain the feature representation of the original image, thereby avoiding using the redundant parameters of the full connection layer, achieving the purpose of simplifying the model, simultaneously improving the training and retrieval speed and reducing the false detection rate.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a basic flow chart of a fast trademark image retrieval method based on an Attention mechanism and a convolutional neural network in a preferred embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a network structure of a pre-trained VGG16 network model in a preferred embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a network structure after an Attention network proposed by the present invention is added in a preferred embodiment disclosed in the present invention;
FIG. 4 is a schematic network structure of the Attention-MAC trademark image retrieval model proposed by the present invention in a preferred embodiment disclosed in the present invention;
FIG. 5 is a search flow chart for trademark detection using the Attention-MAC trademark image search model in a preferred embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it is to be understood that the terms "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used merely for convenience of description and for simplicity of description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed in a particular orientation, and be operated, and thus, are not to be construed as limiting the present invention.
In the description of the present invention, unless otherwise specified and limited, it is to be noted that the terms "mounted," "connected," and "connected" are to be interpreted broadly, and may be, for example, a mechanical connection or an electrical connection, a communication between two elements, a direct connection, or an indirect connection via an intermediate medium, and specific meanings of the terms may be understood by those skilled in the art according to specific situations.
The invention provides a rapid trademark image retrieval method based on an Attention mechanism and a convolutional neural network, which comprises the following steps as shown in figure 1:
s1, building a Caffe deep learning open source framework, and training an open source VGG16 network model;
s2, designing an Attention network comprising two convolutional layers based on a VGG16 network model, and adding the Attention network into the trained VGG16 network model;
s3, training the VGG16 network model added with the Attention network by using a training set in a FlickrLogos-32 data set;
s4, generating an Attention-MAC trademark feature extraction model based on the trained VGG16 network model added with the Attention network;
s5, retrieving the trademark image to be queried based on the Attention-MAC trademark feature extraction model, and generating a retrieval result.
FIG. 2 is a schematic diagram of the network structure of the pre-trained VGG16 network model in a preferred embodiment of the present disclosure. The selected VGG network model includes 13 convolutional layers and 3 fully-connected layers; trained on the ImageNet data set, it outputs 1000 classification categories.
The following describes the training process of the Attention-MAC trademark feature extraction model proposed by the present invention with reference to FIG. 3, which specifically includes the following steps:
sa, building a Caffe deep learning open source framework, and pre-training the VGG16 network model by using an ImageNet data set;
sb, fine tuning the VGG16 network model obtained by pre-training by using a training image set in a FlickrLogos-32 data set, retraining the network weight, completing transfer learning of the model under a new data set, and further improving the accuracy of feature extraction;
sc, adding the Attention network provided by the invention between the output of the last pooling layer and the full connection layer for the trained VGG16 network model; partial network weights are extracted by fixing the features, and the Attention network is trained by using training images of a FlickrLogos-32 data set.
Sd, removing a full connection layer from the trained model, and keeping other parts; and adding a pooling layer behind the Attention network to obtain the Attention-MAC trademark feature extraction model provided by the invention.
The above steps are explained in more detail below:
the transfer learning described in step Sb includes the following steps:
a1, selection of data sets. The FlickerLogis-32 data set is divided into three parts, P1, P2 and P3, which are respectively a training set, a verification set and a test set. Where each part contains 32 different brand classes, P1 contains 320 pictures, P2 contains 3960 pictures, and P3 contains 3960 pictures, the pictures of each part being disjoint.
A2, adjustment of the loss function. In the training of the transfer learning, the model is classified and trained by using a standard cross entropy loss function:
$$L(y, y^*) = -\,y^{*\top} \log y \;-\; (\mathbf{1} - y^*)^{\top} \log(\mathbf{1} - y)$$
where y denotes the classification output by the model during training; $y^*$ denotes the true classification of the image and is a 0-1 vector (the position of the class the image belongs to is 1 and the rest are 0); and $\mathbf{1}$ is an all-ones vector.
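The loss above can be illustrated numerically. The following is a minimal NumPy sketch of the standard cross-entropy between a predicted probability vector and a 0-1 ground-truth vector; the clipping constant `eps` is an implementation detail added here, not part of the patent:

```python
import numpy as np

def cross_entropy_loss(y, y_star, eps=1e-12):
    """Standard cross-entropy between the predicted probability vector y
    and the 0-1 ground-truth vector y_star (1 at the true class)."""
    y = np.clip(y, eps, 1.0 - eps)  # guard against log(0)
    return float(-np.sum(y_star * np.log(y) + (1.0 - y_star) * np.log(1.0 - y)))

# 3-class toy example: the true class is index 1
y_pred = np.array([0.1, 0.8, 0.1])
y_true = np.array([0.0, 1.0, 0.0])
loss = cross_entropy_loss(y_pred, y_true)
```

As expected of a classification loss, the value shrinks as the predicted vector approaches the 0-1 ground truth and grows when probability mass lands on the wrong class.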
The design and training of the Attention network described in step Sc comprises the following steps:
B1, according to the model parameters of the VGG16 selected by the invention, the output of convolutional layer Conv5_3 is pooled into a W × H × K spatial matrix; this output is called Origin_feature_map.
B2, the Attention network proposed by the invention includes 2 convolutional layers (referred to as Attention_Conv_1 and Attention_Conv_2, respectively), using the Softplus function as the activation function.
B3, Attention_Conv_1 uses 1 × 1 convolution kernels, and the number of kernels is k, the same as the number of channels of the spatial matrix; Attention_Conv_2 uses a single 1 × 1 convolution kernel. The parameters θ of the convolution kernels are randomly initialized at the start of training.
B4, the parameters θ of the Attention network are trained with the standard cross-entropy loss function and back-propagation. The purpose of training is to learn, through the Attention network, the importance of each feature in Origin_feature_map. To this end, the output function $\varphi(f_i; \theta)$ of the Attention network is defined to express the score (i.e., weight) of the feature $f_i$ in Origin_feature_map, where $f_i \in \mathbb{R}^k$ is the feature vector at the i-th spatial position.
B5, after training of the Attention network is completed, the score $\varphi(f_i; \theta)$ from step B4 is used as a weight and multiplied with the corresponding feature vector $f_i$ in Origin_feature_map, yielding a weight-adjusted W × H × K spatial matrix; this output is called Attention_feature_map.
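Steps B1-B5 amount to scoring each spatial position of the feature map with two 1 × 1 convolutions and reweighting the map by those scores. Since a 1 × 1 convolution is just a per-position linear map, the mechanism can be sketched in NumPy; the weight shapes, the random initialization, and the 7 × 7 × 512 map size are illustrative assumptions (the patent trains θ by back-propagation inside Caffe):

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))  # Softplus activation, always positive

def attention_weight(origin_feature_map, w1, b1, w2, b2):
    """Score every spatial position with two 1x1 'convolutions' (per-position
    linear maps) and multiply the W x H x K map by the resulting weights."""
    W, H, K = origin_feature_map.shape
    flat = origin_feature_map.reshape(-1, K)               # one row per position
    hidden = softplus(flat @ w1 + b1)                      # Attention_Conv_1: K -> K
    scores = softplus(hidden @ w2 + b2).reshape(W, H, 1)   # Attention_Conv_2: K -> 1
    return origin_feature_map * scores                     # Attention_feature_map

rng = np.random.default_rng(0)
fmap = rng.random((7, 7, 512))                             # stand-in Origin_feature_map
w1, b1 = 0.01 * rng.standard_normal((512, 512)), np.zeros(512)
w2, b2 = 0.01 * rng.standard_normal((512, 1)), np.zeros(1)
weighted = attention_weight(fmap, w1, b1, w2, b2)
```

Because Softplus is strictly positive, every position keeps a nonzero weight; training merely amplifies the informative positions relative to the background.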
For the pooling layer described in step Sd, the present invention proposes a new region mean pooling method. The specific calculation steps are as follows:
C1, the trained Attention-MAC trademark feature extraction model takes an image as input and outputs a W × H × K spatial matrix.
C2, regard this three-dimensional matrix as a set $X = \{x_i\}$, $i = 1, 2, \dots, k$, of two-dimensional feature response matrices, where k is the total number of channels of the output two-dimensional feature maps and $x_i$ is the two-dimensional feature response matrix output by the i-th feature channel.
C3, let Ω denote all possible positions in the $W_i \times H_i$ two-dimensional feature response matrix output by the i-th feature channel, and let $x_i(p)$ denote the response of $x_i$ at position p; assume:

$$x_i(p_j) = \max_{p \in \Omega} x_i(p)$$

Based on the continuity of the features, the feature mean of the 3 × 3 region centered on $x_i(p_j)$ is taken as the pooled output of $x_i$, i.e.

$$f_i = \frac{1}{9} \sum_{l=1}^{9} x_i(p_l)$$

where $p_l$, $l = 1, 2, \dots, 9$, are all the positions in this 3 × 3 region; because position $x_i(p_j)$ is the center of the 3 × 3 region, $x_i(p_5) = x_i(p_j)$ when l = 5. Applying region mean pooling to the $W_i \times H_i$ feature response matrix output by every channel yields a k-dimensional feature representation of the image:

$$f_\Omega = [f_1, f_2, \dots, f_k]^{\mathrm{T}}$$

The feature vector $f_\Omega$ is the output of the Attention-MAC trademark feature extraction model.
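The region mean pooling of steps C1-C3 can be sketched as below. The edge-replication padding, used when the maximum response lies on the border, is an assumption made here so that the 3 × 3 window is always defined; the patent does not specify border handling:

```python
import numpy as np

def region_mean_pool(feature_maps):
    """For each channel, locate the maximum-response position p_j and return
    the mean of the 3x3 region centred on it, giving a k-dim descriptor."""
    W, H, K = feature_maps.shape
    padded = np.pad(feature_maps, ((1, 1), (1, 1), (0, 0)), mode="edge")
    f = np.empty(K)
    for i in range(K):
        x = feature_maps[:, :, i]
        r, c = np.unravel_index(np.argmax(x), x.shape)  # position p_j
        f[i] = padded[r:r + 3, c:c + 3, i].mean()       # 3x3 mean around p_j
    return f

fmap = np.zeros((5, 5, 2))
fmap[2, 2, 0] = 9.0   # interior peak in channel 0
fmap[0, 0, 1] = 9.0   # corner peak in channel 1 (window needs the padding)
desc = region_mean_pool(fmap)
```

Compared with plain MAC pooling (taking only the maximum), averaging the 3 × 3 neighbourhood exploits the local continuity of convolutional responses that step C3 relies on.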
The calculation process of the Attention-MAC feature vector proposed in the present invention is described below with reference to FIG. 4.
For the image feature representation output by the Attention-MAC trademark feature extraction model, the output feature representation is subjected to dimensionality reduction through a main feature analysis method to obtain the Attention-MAC feature vector provided by the invention. The specific calculation method is as follows:
D1, process the k-dimensional feature representation output by the Attention-MAC trademark feature extraction model in step Sd. To prevent overfitting, $f_\Omega$ first undergoes L2 regularization.
D2, for the processed k-dimensional feature representation, features are selected with the principal feature analysis method: among the k features, the l features with high correlation or high mutual information are computed and selected, reducing the data from k dimensions to l dimensions.
D3, the l-dimensional vector computed by the principal feature analysis method undergoes L2 regularization again.
D4, the image feature vector obtained in the above steps is the Attention-MAC feature vector of the present invention.
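Steps D1-D4 can be sketched with an SVD-based principal-component projection standing in for the "principal feature analysis method"; the patent does not spell out the exact algorithm, so that substitution, the library size, and the dimensions are assumptions made for illustration:

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    return v / (np.linalg.norm(v) + eps)

def attention_mac_vector(library, feature, l):
    """L2-normalize, project onto the first l principal axes fitted on the
    normalized library descriptors, then L2-normalize again (steps D1-D4)."""
    lib = np.array([l2_normalize(f) for f in library])
    mean = lib.mean(axis=0)
    # principal axes come from the SVD of the centred library matrix
    _, _, vt = np.linalg.svd(lib - mean, full_matrices=False)
    reduced = (l2_normalize(feature) - mean) @ vt[:l].T   # k -> l dimensions
    return l2_normalize(reduced)

rng = np.random.default_rng(1)
library = rng.random((200, 512))       # stand-in f_Omega descriptors
vec = attention_mac_vector(library, rng.random(512), l=128)
```

Normalizing both before and after the projection keeps every descriptor on the unit sphere, so the cosine similarity used in the retrieval stage reduces to a dot product.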
The following describes a search process for trademark detection by using the Attention-MAC trademark image search model in accordance with the present invention with reference to fig. 5.
E1, using the trademark images in the FlickrLogos-32 data set as input images, input them sequentially into the Attention-MAC trademark image retrieval model to obtain one-dimensional Attention-MAC feature vectors $f_i$, $i = 1, 2, \dots, n$, representing the FlickrLogos-32 data set images, where n is the number of trademarks in the FlickrLogos-32 data set.
E2, input the trademark image to be retrieved into the Attention-MAC trademark feature extraction model as an input image to obtain the Attention-MAC feature vector $q_A$ of the trademark image to be retrieved.
E3, sequentially calculating the similarity between the image to be retrieved and the images in the FlickrLogos-32 data set by utilizing the cosine similarity.
$$s_i = \frac{q_A \cdot f_i}{\|q_A\| \, \|f_i\|}$$
The similarity $s_i$ represents the similarity score between the i-th image in the FlickrLogos-32 data set and the image to be retrieved; the FlickrLogos-32 data set images are initially sorted according to the scores $s_i$.
E4, select the top 5 images in the initial ranking, sum their Attention-MAC feature vectors together with the Attention-MAC feature vector of the trademark image to be retrieved, and compute the mean:
$$q_{re} = \frac{1}{6} \left( q_A + \sum_{i=1}^{5} f_i \right)$$
The resulting $q_{re}$ is used as a new query vector: the first 100 images in the initial ranking are selected, their similarity to $q_{re}$ is computed, and these 100 images are re-sorted according to the new scores to obtain the final ranking.
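Steps E1-E4 can be sketched end to end in NumPy. The top-5 expansion and top-100 re-ranking parameters follow the description above, while the small random library, the 64-dimensional vectors, and the use of NumPy itself are illustrative assumptions:

```python
import numpy as np

def cosine_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(q, library, top_expand=5, rerank=100):
    """Rank by cosine similarity, then apply average-query expansion: the
    query and the top-5 vectors are averaged into q_re, which re-scores
    (and re-orders) only the first `rerank` candidates."""
    scores = np.array([cosine_sim(q, f) for f in library])
    order = np.argsort(-scores)                                 # initial ranking (E3)
    k = min(top_expand, len(order))
    q_re = (q + sum(library[i] for i in order[:k])) / (k + 1)   # expanded query (E4)
    head = order[:rerank]
    new_scores = np.array([cosine_sim(q_re, library[i]) for i in head])
    return np.concatenate([head[np.argsort(-new_scores)], order[rerank:]])

rng = np.random.default_rng(2)
lib = rng.random((20, 64))                 # stand-in feature library
query = lib[7] + 0.01 * rng.random(64)     # near-duplicate of library item 7
ranking = retrieve(query, lib, rerank=10)
```

Only the head of the initial ranking is re-scored, so the expansion adds a fixed, small cost per query regardless of library size.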
It should be noted that the system structures or method flows shown in fig. 1 to fig. 5 of the present invention are only some preferred embodiments of the present invention, and the illustration is only for the convenience of understanding the present invention and is not to be construed as a limitation of the present invention.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (7)

1. A rapid trademark image retrieval method based on an Attention mechanism and a convolutional neural network is characterized by comprising the following steps:
s1, building a Caffe deep learning open source framework, and training an open source VGG16 network model;
s2, designing an Attention network comprising two convolutional layers based on a VGG16 network model, and adding the Attention network into the trained VGG16 network model;
s3, training the VGG16 network model added with the Attention network by using a training set in a FlickrLogos-32 data set;
s4, generating an Attention-MAC trademark feature extraction model based on the trained VGG16 network model added with the Attention network;
s4-1, removing a trained full connection layer in the VGG16 network model added with the Attention network;
s4-2, adding a pooling layer after the Attention network to generate the Attention-MAC trademark feature extraction model; the pooling layer adopts a region mean pooling method to process the feature representation output by the Attention network, specifically as follows:
c1, inputting an image into the trained Attention-MAC trademark feature extraction model, which outputs a spatial feature tensor of size W × H × k;
c2, regarding this three-dimensional tensor as a set of two-dimensional feature response matrices X = {x_i}, i = 1, 2, ..., k, where k is the total number of channels of the output two-dimensional feature maps and x_i is the two-dimensional feature response matrix output by the ith feature channel;
c3, letting Ω denote all possible positions in the two-dimensional feature response matrix W_i × H_i output by the ith feature channel, and x_i(p) denote the response of x_i at position p, assume:

p_j = arg max_{p ∈ Ω} x_i(p),

based on the continuity of the features, the feature mean of the 3 × 3 region centered on x_i(p_j) is computed as the pooled output of x_i, i.e.

f_{Ω,i} = (1/9) Σ_{l=1}^{9} x_i(p_l),

where the p_l are all pixels in this 3 × 3 region, l = 1, 2, ..., 9; because position x_i(p_j) is the center of this 3 × 3 region, x_i(p_5) = x_i(p_j) when l = 5; computing the region mean pooling over the response matrix W_i × H_i output by each channel yields the k-dimensional feature representation of the image:

f_Ω = [f_{Ω,1}, f_{Ω,2}, ... f_{Ω,i}, ... f_{Ω,k}]^T;

the feature vector f_Ω is the output of the Attention-MAC trademark feature extraction model;
s5, retrieving the trademark image to be queried based on the Attention-MAC trademark feature extraction model, and generating a retrieval result.
2. The Attention mechanism and convolutional neural network based fast trademark image retrieval method of claim 1, wherein the step S1 comprises the steps of:
s1-1, building a Caffe deep learning open source framework, and pre-training a VGG16 network model by using an ImageNet data set;
and S1-2, carrying out transfer learning training on the VGG16 network model obtained by pre-training by using a training set in a FlickrLogos-32 data set.
3. The Attention mechanism and convolutional neural network based fast trademark image retrieval method of claim 1, wherein the step S2 comprises the steps of:
s2-1, designing an Attention network comprising two convolutional layers based on model parameters of a VGG16 network model;
s2-2, adding the designed Attention network between the output of the last pooling layer of the trained VGG16 network model and the full connection layer.
4. The Attention mechanism and convolutional neural network based fast trademark image retrieval method of claim 1, wherein the step S3 comprises the steps of:
s3-1, fixing the network weight of the feature extraction part in the VGG16 network model added with the Attention network;
s3-2, training the Attention network in the VGG16 network model added with the Attention network by using a training set in a FlickrLogos-32 data set.
5. The Attention mechanism and convolutional neural network based fast trademark image retrieval method of claim 1, wherein the step S5 comprises the steps of:
s5-1, performing dimensionality reduction on image feature representation output by the Attention-MAC trademark feature extraction model through a main feature analysis method to obtain an Attention-MAC feature vector;
s5-2, taking trademark images in FlickrLogos-32 data sets as input images, sequentially inputting the input images into an Attention-MAC trademark feature extraction model, generating Attention-MAC feature vectors of the trademark data set images, and constructing a feature library of the trademark data set images;
s5-3, inputting the trademark image to be retrieved into an Attention-MAC trademark feature extraction model as an input image, and generating an Attention-MAC feature vector of the trademark image to be retrieved;
s5-4, calculating cosine similarity of the Attention-MAC feature vector of the trademark image to be retrieved and the Attention-MAC feature vector of the FlickrLogos-32 data set image to obtain initial sequencing of the trademark data set image based on the cosine similarity;
s5-5, rearranging FlickrLogos-32 data set images through expanding query to obtain the final sequence of the similarity between the trademark data set images and the trademark images to be retrieved and reporting the trademark retrieval result.
6. The Attention mechanism and convolutional neural network based fast trademark image retrieval method of claim 2, wherein the step S1-2 comprises the steps of:
s1-2-1, fine-tuning the network weight of the VGG16 network model obtained through pre-training by using a training set in a FlickrLogos-32 data set;
s1-2-2, in the training of transfer learning, a standard cross entropy loss function is used for carrying out classification training on the VGG16 network model obtained through pre-training.
7. The Attention mechanism and convolutional neural network-based fast trademark image retrieval method of claim 5, wherein the step S5-1 comprises the steps of:
s5-1-1, applying L2 normalization to the feature representation output by the Attention-MAC trademark feature extraction model;
s5-1-2, performing feature selection on the processed feature representation by the main feature analysis method to obtain a dimension-reduced feature vector;
s5-1-3, applying L2 normalization again to the dimension-reduced feature vector to obtain the Attention-MAC feature vector.
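The region mean pooling defined in claim 1 can be sketched in numpy as follows; the function name and the zero-padding at image borders are assumptions not fixed by the claim, and the maximum-response position p_j follows the MAC construction.

```python
import numpy as np

def region_mean_pool(feature_maps):
    """Claim 1, steps C1-C3: for each of the k channels of a W x H x k
    feature tensor, find the position p_j of the maximum response and
    return the mean of the 3 x 3 region centered on it, giving the
    k-dimensional descriptor f_Omega."""
    k = feature_maps.shape[2]
    f_omega = np.empty(k)
    for i in range(k):
        x = feature_maps[:, :, i]
        # p_j: position of the maximum response on channel i
        r, c = np.unravel_index(np.argmax(x), x.shape)
        # zero-pad so the 3x3 window is defined at the borders (an assumption)
        padded = np.pad(x, 1)
        f_omega[i] = padded[r:r + 3, c:c + 3].mean()
    return f_omega
```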
CN201810750096.1A 2018-07-10 2018-07-10 Rapid trademark image retrieval method based on Attention mechanism and convolutional neural network Active CN108875076B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810750096.1A CN108875076B (en) 2018-07-10 2018-07-10 Rapid trademark image retrieval method based on Attention mechanism and convolutional neural network


Publications (2)

Publication Number Publication Date
CN108875076A CN108875076A (en) 2018-11-23
CN108875076B true CN108875076B (en) 2021-07-20

Family

ID=64300452

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810750096.1A Active CN108875076B (en) 2018-07-10 2018-07-10 Rapid trademark image retrieval method based on Attention mechanism and convolutional neural network

Country Status (1)

Country Link
CN (1) CN108875076B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109697257A (en) * 2018-12-18 2019-04-30 天罡网(北京)安全科技有限公司 It is a kind of based on the network information retrieval method presorted with feature learning anti-noise
CN109857897B (en) * 2019-02-14 2021-06-29 厦门一品威客网络科技股份有限公司 Trademark image retrieval method and device, computer equipment and storage medium
CN110334226B (en) * 2019-04-25 2022-04-05 吉林大学 Depth image retrieval method fusing feature distribution entropy
CN110599459A (en) * 2019-08-14 2019-12-20 深圳市勘察研究院有限公司 Underground pipe network risk assessment cloud system based on deep learning
CN111694974A (en) * 2020-06-12 2020-09-22 桂林电子科技大学 Depth hash vehicle image retrieval method integrating attention mechanism
CN111694977A (en) * 2020-06-12 2020-09-22 桂林电子科技大学 Vehicle image retrieval method based on data enhancement
CN111985161B (en) * 2020-08-21 2024-06-14 广东电网有限责任公司清远供电局 Reconstruction method of three-dimensional model of transformer substation
CN113127661B (en) * 2021-04-06 2023-09-12 中国科学院计算技术研究所 Multi-supervision medical image retrieval method and system based on cyclic query expansion

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226826A (en) * 2013-03-20 2013-07-31 西安电子科技大学 Method for detecting changes of remote sensing image of visual attention model based on local entropy
CN108038519A (en) * 2018-01-30 2018-05-15 浙江大学 Cervical image processing method and device based on a dense feature pyramid network
WO2018106783A1 (en) * 2016-12-06 2018-06-14 Siemens Energy, Inc. Weakly supervised anomaly detection and segmentation in images
CN108171141A (en) * 2017-12-25 2018-06-15 淮阴工学院 Video target tracking method with cascaded multi-mode fusion based on an attention model
CN108229267A (en) * 2016-12-29 2018-06-29 北京市商汤科技开发有限公司 Object properties detection, neural metwork training, method for detecting area and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10558750B2 (en) * 2016-11-18 2020-02-11 Salesforce.Com, Inc. Spatial attention model for image captioning


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Action Recognition by an Attention-Aware Temporal Weighted Convolutional Neural Network";Le Wang;Jinliang Zang;Qilin Zhang;Zhenxing Niu;Gang Hua;《Sensors》;20180621;全文 *
"Fast face image retrieval method based on deep features"; Li Zhendong, Zhong Yong, Chen Man, Cao Dongping; 《光学学报》(Acta Optica Sinica); 20180530; Vol. 38, No. 10; full text *


Similar Documents

Publication Publication Date Title
CN108875076B (en) Rapid trademark image retrieval method based on Attention mechanism and convolutional neural network
CN110866140B (en) Image feature extraction model training method, image searching method and computer equipment
CN103345645B (en) Commodity image class prediction method towards net purchase platform
EP4002161A1 (en) Image retrieval method and apparatus, storage medium, and device
CN107683469A (en) A kind of product classification method and device based on deep learning
CN108921198A (en) commodity image classification method, server and system based on deep learning
CN111898703B (en) Multi-label video classification method, model training method, device and medium
CN105354593B (en) A kind of threedimensional model sorting technique based on NMF
CN108154156B (en) Image set classification method and device based on neural topic model
CN116580257A (en) Feature fusion model training and sample retrieval method and device and computer equipment
CN113378938B (en) Edge transform graph neural network-based small sample image classification method and system
CN114510594A (en) Traditional pattern subgraph retrieval method based on self-attention mechanism
CN114332889A (en) Text box ordering method and text box ordering device for text image
CN113806580A (en) Cross-modal Hash retrieval method based on hierarchical semantic structure
CN113032613A (en) Three-dimensional model retrieval method based on interactive attention convolution neural network
Nezamabadi-pour et al. Concept learning by fuzzy k-NN classification and relevance feedback for efficient image retrieval
CN113378962A (en) Clothing attribute identification method and system based on graph attention network
CN112800262A (en) Image self-organizing clustering visualization method and device and storage medium
CN105069136A (en) Image recognition method in big data environment
CN110532409B (en) Image retrieval method based on heterogeneous bilinear attention network
CN106874927A (en) The construction method and system of a kind of random strong classifier
Adnan et al. Automated image annotation with novel features based on deep ResNet50-SLT
CN111768214A (en) Product attribute prediction method, system, device and storage medium
CN113378934A (en) Small sample image classification method and system based on semantic perception map neural network
CN113011506A (en) Texture image classification method based on depth re-fractal spectrum network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant