CN108875076B - Rapid trademark image retrieval method based on Attention mechanism and convolutional neural network - Google Patents
- Publication number
- CN108875076B (application CN201810750096.1A / CN201810750096A)
- Authority
- CN
- China
- Prior art keywords
- attention
- trademark
- network
- training
- mac
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a rapid trademark image retrieval method based on an Attention mechanism and a convolutional neural network, which comprises the steps of constructing a Caffe deep learning open source framework and training an open source VGG16 network model; designing an Attention network comprising two convolutional layers based on a VGG16 network model, and adding the Attention network into a trained VGG16 network model; training the VGG16 network model added with the Attention network by using a training set in a FlickrLogos-32 data set; generating an Attention-MAC trademark feature extraction model based on a trained VGG16 network model added with an Attention network; and retrieving the trademark image to be queried based on the Attention-MAC trademark feature extraction model, and generating a retrieval result. The invention avoids using the redundant parameters of the full connection layer, achieves the purpose of simplifying the model, improves the training and searching speed and reduces the false detection rate.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a rapid trademark image retrieval method based on an Attention mechanism and a convolutional neural network.
Background
In recent years, with rapid economic development, the number of registered trademarks of all kinds has increased year by year. Trademark retrieval is therefore of great significance to trademark registration, management, and protection. For an applicant, obstacles to a trademark registration application can be discovered in time through trademark retrieval, such as whether the trademark to be registered has already been claimed by others, or whether it is similar in some respect to an existing trademark. For the trademark office, the similarity between a trademark to be registered and the existing trademarks in the database can be obtained automatically through trademark retrieval, reducing the workload of manual comparison and improving working efficiency.
Traditional trademark detection and identification systems include database image retrieval based on manual labeling and comparison, retrieval based on trademark graphic-element codes, image retrieval based on binarized image features, keyword-based retrieval, and the like. These traditional methods consume a large amount of time and labor for labeling and comparing images, which increases the cost of trademark image retrieval. With the gradual maturation of convolutional neural networks, retrieving and classifying trademark images with a convolutional neural network has become a new approach for trademark detection and identification systems: a model is first trained on the training images of a dataset to adjust the weights of each network layer, the model is then tested on the test-set images, and finally the tested model is deployed in the system. Although more intelligent, the traditional region-based convolutional neural network has a complex structure and redundant fully-connected-layer parameters, so the training process is slow and consumes a large amount of time and resources; moreover, this method requires the dataset to be fully annotated with category and localization information, but for the trademark office the image annotations in the trademark database are incomplete, making the method difficult to apply in training.
Disclosure of Invention
In view of the above, the invention provides a fast trademark image retrieval method based on an Attention mechanism and a convolutional neural network, which combines the Attention mechanism with a convolutional neural network, removes the fully connected layers on this basis, and uses the output of an intermediate convolutional layer as the feature representation of an image, thereby providing a fast trademark detection method.
In order to achieve the above object, the present invention provides a fast trademark image retrieval method based on an Attention mechanism and a convolutional neural network, the method comprising the steps of:
s1, building a Caffe deep learning open source framework, and training an open source VGG16 network model;
s2, designing an Attention network comprising two convolutional layers based on a VGG16 network model, and adding the Attention network into the trained VGG16 network model;
s3, training the VGG16 network model added with the Attention network by using a training set in a FlickrLogos-32 data set;
s4, generating an Attention-MAC trademark feature extraction model based on the trained VGG16 network model added with the Attention network;
s5, retrieving the trademark image to be queried based on the Attention-MAC trademark feature extraction model, and generating a retrieval result.
Preferably, the step S1 includes the steps of:
s1-1, building a Caffe deep learning open source framework, and pre-training a VGG16 network model by using an ImageNet data set;
and S1-2, carrying out transfer learning training on the VGG16 network model obtained by pre-training by using a training set in a FlickrLogos-32 data set.
Preferably, the step S2 includes the steps of:
s2-1, designing an Attention network comprising two convolutional layers based on model parameters of a VGG16 network model;
s2-2, adding a designed Attention network between the output of the last layer of the trained pooling layer of the VGG16 network model and a full connection layer.
Preferably, the step S3 includes the steps of:
s3-1, fixing the network weight of the feature extraction part in the VGG16 network model added with the Attention network;
s3-2, training the Attention network in the VGG16 network model added with the Attention network by using a training set in a FlickrLogos-32 data set.
Preferably, the step S4 includes the steps of:
s4-1, removing a trained full connection layer in the VGG16 network model added with the Attention network;
s4-2, adding a pooling layer behind the Attention network to generate an Attention-MAC trademark feature extraction model.
Preferably, the step S5 includes the steps of:
S5-1, performing dimensionality reduction on the image feature representation output by the Attention-MAC trademark feature extraction model through a principal feature analysis method to obtain an Attention-MAC feature vector;
s5-2, taking trademark images in FlickrLogos-32 data sets as input images, sequentially inputting the input images into an Attention-MAC trademark feature extraction model, generating Attention-MAC feature vectors of the trademark data set images, and constructing a feature library of the trademark data set images;
s5-3, inputting the trademark image to be retrieved into an Attention-MAC trademark feature extraction model as an input image, and generating an Attention-MAC feature vector of the trademark image to be retrieved;
s5-4, calculating cosine similarity of the Attention-MAC feature vector of the trademark image to be retrieved and the Attention-MAC feature vector of the FlickrLogos-32 data set image to obtain initial sequencing of the trademark data set image based on the cosine similarity;
s5-5, rearranging FlickrLogos-32 data set images through expanding query to obtain the final sequence of the similarity between the trademark data set images and the trademark images to be retrieved and reporting the trademark retrieval result.
Preferably, the step S1-2 includes the steps of:
s1-2-1, fine-tuning the network weight of the VGG16 network model obtained through pre-training by using a training set in a FlickrLogos-32 data set;
s1-2-2, in the training of transfer learning, a standard cross entropy loss function is used for carrying out classification training on the VGG16 network model obtained through pre-training.
Preferably, the step S5-1 includes the steps of:
S5-1-1, performing L2 regularization on the feature representation output by the Attention-MAC trademark feature extraction model;
S5-1-2, performing feature selection on the processed feature representation through a principal feature analysis method to obtain a feature vector after dimension reduction;
S5-1-3, performing L2 regularization again on the reduced feature vector to obtain the Attention-MAC feature vector.
Preferably, the pooling layer added in step S4-2 processes the feature representation output by the Attention-MAC trademark feature extraction model by using a region mean pooling method, wherein the region mean pooling method specifically includes the following steps:
C1, the input image is passed through the trained Attention-MAC trademark feature extraction model, which outputs a W × H × K spatial matrix;
C2, this three-dimensional matrix is regarded as a set of two-dimensional feature response matrices x = {x_i}, i = 1, 2, …, K, where K is the total number of channels of the output two-dimensional feature maps and x_i is the two-dimensional feature response matrix output by the i-th feature channel;
C3, let Ω denote all possible positions in the W_i × H_i two-dimensional feature response matrix output by the i-th feature channel, and let x_i(p) denote the response of x_i at position p; assume
p_j = argmax_{p ∈ Ω} x_i(p).
Based on the continuity of features, the mean of the 3 × 3 region centered at x_i(p_j) is computed as the pooled output of x_i, i.e.
f_i = (1/9) Σ_{l=1}^{9} x_i(p_l),
where p_l, l = 1, 2, …, 9, are all the positions in this 3 × 3 region; since position x_i(p_j) is the center of the region, x_i(p_5) = x_i(p_j) when l = 5. By computing this region mean pooling over the feature response matrix W_i × H_i output by each channel, a K-dimensional feature representation of the image is obtained:
f_Ω = [f_1, f_2, …, f_K].
The feature vector f_Ω is the output of the Attention-MAC trademark feature extraction model.
In summary, the invention discloses a rapid trademark image retrieval method based on an Attention mechanism and a convolutional neural network, which comprises the steps of firstly, building a Caffe deep learning open source framework and training an open source VGG16 network model; designing an Attention network comprising two convolutional layers based on a VGG16 network model, and adding the Attention network into the trained VGG16 network model; then training the VGG16 network model added with the Attention network by using a training set in a FlickrLogos-32 data set; generating an Attention-MAC trademark feature extraction model based on the trained VGG16 network model added with the Attention network; and finally, retrieving the trademark image to be queried based on the Attention-MAC trademark feature extraction model, and generating a retrieval result. The invention combines the Attention mechanism with the convolutional neural network, and the provided Attention-MAC trademark feature extraction model does not comprise any full connection layer, but processes the feature graph output by the convolution of the middle layer to obtain the feature representation of the original image, thereby avoiding using the redundant parameters of the full connection layer, achieving the purpose of simplifying the model, simultaneously improving the training and retrieval speed and reducing the false detection rate.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a basic flow chart of a fast trademark image retrieval method based on an Attention mechanism and a convolutional neural network in a preferred embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a network structure of a pre-trained VGG16 network model in a preferred embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a network structure after an Attention network proposed by the present invention is added in a preferred embodiment disclosed in the present invention;
FIG. 4 is a schematic network structure of the Attention-MAC trademark image retrieval model proposed by the present invention in a preferred embodiment disclosed in the present invention;
FIG. 5 is a search flow chart for trademark detection using the Attention-MAC trademark image search model in a preferred embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it is to be understood that the terms "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used merely for convenience of description and for simplicity of description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed in a particular orientation, and be operated, and thus, are not to be construed as limiting the present invention.
In the description of the present invention, unless otherwise specified and limited, it is to be noted that the terms "mounted," "connected," and "connected" are to be interpreted broadly, and may be, for example, a mechanical connection or an electrical connection, a communication between two elements, a direct connection, or an indirect connection via an intermediate medium, and specific meanings of the terms may be understood by those skilled in the art according to specific situations.
The invention provides a rapid trademark image retrieval method based on an Attention mechanism and a convolutional neural network, which comprises the following steps as shown in figure 1:
s1, building a Caffe deep learning open source framework, and training an open source VGG16 network model;
s2, designing an Attention network comprising two convolutional layers based on a VGG16 network model, and adding the Attention network into the trained VGG16 network model;
s3, training the VGG16 network model added with the Attention network by using a training set in a FlickrLogos-32 data set;
s4, generating an Attention-MAC trademark feature extraction model based on the trained VGG16 network model added with the Attention network;
s5, retrieving the trademark image to be queried based on the Attention-MAC trademark feature extraction model, and generating a retrieval result.
FIG. 2 is a schematic diagram of the network structure of the pre-trained VGG16 network model in a preferred embodiment of the present disclosure; the selected VGG16 network model includes 13 convolutional layers and 3 fully connected layers, and, trained on the ImageNet dataset, outputs 1000 classification categories.
The following describes the training process of the Attention-MAC trademark feature extraction model proposed by the present invention with reference to FIG. 3, which specifically includes the following steps:
sa, building a Caffe deep learning open source framework, and pre-training the VGG16 network model by using an ImageNet data set;
sb, fine tuning the VGG16 network model obtained by pre-training by using a training image set in a FlickrLogos-32 data set, retraining the network weight, completing transfer learning of the model under a new data set, and further improving the accuracy of feature extraction;
Sc, for the trained VGG16 network model, the Attention network proposed by the invention is added between the output of the last pooling layer and the fully connected layer; the network weights of the feature extraction part are fixed, and the Attention network is trained using the training images of the FlickrLogos-32 dataset.
Sd, removing a full connection layer from the trained model, and keeping other parts; and adding a pooling layer behind the Attention network to obtain the Attention-MAC trademark feature extraction model provided by the invention.
The above steps are explained in more detail below:
the transfer learning described in step Sb includes the following steps:
A1, selection of datasets. The FlickrLogos-32 dataset is divided into three parts, P1, P2 and P3, which serve as the training set, validation set, and test set respectively. Each part contains the same 32 brand classes; P1 contains 320 pictures, P2 contains 3960 pictures, and P3 contains 3960 pictures, and the pictures of the three parts are disjoint.
A2, adjustment of the loss function. In the transfer learning training, the model is trained for classification using the standard cross-entropy loss function:
L(y, y*) = −(y*)ᵀ log(y) − (1 − y*)ᵀ log(1 − y),
where y denotes the classification output by the model during training; y* denotes the true classification of the image, a 0-1 vector (the position of the true class is 1 and the rest are 0); and 1 is an all-ones vector.
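As an illustration, the standard cross-entropy classification loss used in this transfer-learning step can be sketched in NumPy; the elementwise two-term form below (using the all-ones vector mentioned in the description) is an assumption, and `cross_entropy_loss`, `y`, and `y_star` are illustrative names:

```python
import numpy as np

def cross_entropy_loss(y, y_star, eps=1e-12):
    """Cross-entropy between predicted class probabilities y and a
    0-1 ground-truth vector y_star (1 at the true class, 0 elsewhere)."""
    y = np.clip(y, eps, 1.0 - eps)      # numerical stability for log
    ones = np.ones_like(y_star)         # the all-ones vector "1"
    return float(-(y_star @ np.log(y)) - ((ones - y_star) @ np.log(ones - y)))

# Example: a 4-class output where the true class is index 2
y = np.array([0.1, 0.2, 0.6, 0.1])
y_star = np.array([0.0, 0.0, 1.0, 0.0])
loss = cross_entropy_loss(y, y_star)
```

A more confident prediction on the true class yields a lower loss, which is what drives the fine-tuning of the network weights.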
The design and training of the Attention network described in step Sc comprises the following steps:
B1, according to the model parameters of the VGG16 selected by the invention, the output of convolutional layer Conv5_3 is pooled to produce a W × H × K spatial matrix; this output is called Origin_feature_map.
B2, the Attention network proposed by the invention comprises 2 convolutional layers (referred to as Attention_Conv_1 and Attention_Conv_2, respectively), using the Softplus function as the activation function.
B3, Attention_Conv_1 uses 1 × 1 convolution kernels, k of them, equal to the number of channels of the spatial matrix; Attention_Conv_2 uses a single 1 × 1 convolution kernel. The convolution-kernel parameters θ are randomly initialized at the start of training.
B4, the parameters θ of the Attention network are trained using the standard cross-entropy loss function and back-propagation. The purpose of training is to learn, through the Attention network, the importance of each feature in Origin_feature_map. To this end, the output function φ(f_i; θ) of the Attention network is defined to express the score (i.e., weight) of feature f_i in Origin_feature_map, where f_i ∈ R^k, i = 1, 2, …, k.
B5, after the Attention network is trained, the score φ(f_i; θ) from step B4 is used as a weight and multiplied with the corresponding feature vector f_i in Origin_feature_map, yielding a weight-adjusted W × H × K spatial matrix; this output is called Attention_feature_map.
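The attention scoring and weighting of steps B2–B5 can be sketched in NumPy as follows. Treating each 1 × 1 convolution as a per-position matrix multiplication is standard, but the placement of Softplus after both layers, the initialization scale, and all names (`attention_weight`, `theta1`, `theta2`) are illustrative assumptions — a real implementation would be expressed as Caffe layers:

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(z))

def attention_weight(origin_feature_map, theta1, theta2):
    """Score each spatial position with two 1x1 convolutions (per-position
    matmuls) plus Softplus, then multiply the scalar scores back into the
    feature map to obtain the weight-adjusted Attention_feature_map.

    origin_feature_map: (W, H, K) array
    theta1: (K, K) kernels of Attention_Conv_1
    theta2: (K, 1) kernel of Attention_Conv_2
    """
    hidden = softplus(origin_feature_map @ theta1)   # (W, H, K)
    scores = softplus(hidden @ theta2)               # (W, H, 1): one weight per position
    return origin_feature_map * scores               # broadcast multiply

rng = np.random.default_rng(0)
W, H, K = 7, 7, 512                                  # Conv5_3 output for a 224x224 VGG16 input
fmap = rng.random((W, H, K))
theta1 = rng.standard_normal((K, K)) * 0.01          # random initialization of θ
theta2 = rng.standard_normal((K, 1)) * 0.01
att_map = attention_weight(fmap, theta1, theta2)
```

Because Softplus is strictly positive, every spatial position keeps a nonzero weight; training θ then sharpens the contrast between important and unimportant positions.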
For the pooling layer described in step Sd, the present invention proposes a new region mean pooling method. The specific calculation steps are as follows:
C1, the input image is passed through the trained Attention-MAC trademark feature extraction model, which outputs a W × H × K spatial matrix.
C2, this three-dimensional matrix is regarded as a set of two-dimensional feature response matrices x = {x_i}, i = 1, 2, …, K, where K is the total number of channels of the output two-dimensional feature maps and x_i is the two-dimensional feature response matrix output by the i-th feature channel.
C3, let Ω denote all possible positions in the W_i × H_i two-dimensional feature response matrix output by the i-th feature channel, and let x_i(p) denote the response of x_i at position p; assume
p_j = argmax_{p ∈ Ω} x_i(p).
Based on the continuity of features, the mean of the 3 × 3 region centered at x_i(p_j) is computed as the pooled output of x_i, i.e.
f_i = (1/9) Σ_{l=1}^{9} x_i(p_l),
where p_l, l = 1, 2, …, 9, are all the positions in this 3 × 3 region; since position x_i(p_j) is the center of the region, x_i(p_5) = x_i(p_j) when l = 5. By computing this region mean pooling over the feature response matrix W_i × H_i output by each channel, a K-dimensional feature representation of the image is obtained:
f_Ω = [f_1, f_2, …, f_K].
The feature vector f_Ω is the output of the Attention-MAC trademark feature extraction model.
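The region mean pooling of steps C1–C3 can be sketched in NumPy as follows. Centering the 3 × 3 window on each channel's maximum response (consistent with the "MAC" naming, i.e., maximum activations of convolutions) and clipping the window at feature-map borders are assumptions; `region_mean_pooling` is an illustrative name:

```python
import numpy as np

def region_mean_pooling(feature_maps):
    """For each channel, take the mean of the 3x3 region centered at the
    maximum response. feature_maps: (K, W, H) -> K-dim vector f_Omega.
    Windows falling off the border are clipped (an assumption)."""
    K, W, H = feature_maps.shape
    f_omega = np.empty(K)
    for i in range(K):
        x_i = feature_maps[i]
        pj = np.unravel_index(np.argmax(x_i), x_i.shape)  # max-response position p_j
        r0, r1 = max(pj[0] - 1, 0), min(pj[0] + 2, W)
        c0, c1 = max(pj[1] - 1, 0), min(pj[1] + 2, H)
        f_omega[i] = x_i[r0:r1, c0:c1].mean()             # mean of the 3x3 region
    return f_omega

fmap = np.zeros((2, 5, 5))
fmap[0, 2, 2] = 9.0        # interior max: full 3x3 window sums to 9 -> mean 1.0
fmap[1, 0, 0] = 4.0        # corner max: clipped 2x2 window -> mean 1.0
f = region_mean_pooling(fmap)
```

Compared with plain max pooling, averaging the neighborhood of the peak trades a little sharpness for robustness to single-pixel noise in the response maps.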
The calculation process of the Attention-MAC feature vector proposed in the present invention is described below with reference to FIG. 4.
For the image feature representation output by the Attention-MAC trademark feature extraction model, dimensionality reduction is performed through a principal feature analysis method to obtain the Attention-MAC feature vector proposed by the invention. The specific calculation is as follows:
D1, process the K-dimensional feature representation output by the Attention-MAC trademark feature extraction model in step Sd. To prevent overfitting, f_Ω is first L2-regularized.
D2, for the processed K-dimensional feature representation, features are selected with a principal feature analysis method: among the K features, the l features with high correlation or high mutual information are computed and selected, reducing the data from K dimensions to l dimensions.
D3, the l-dimensional vector computed by the principal feature analysis method is L2-regularized again.
D4, the image feature vector obtained by the above steps is the Attention-MAC feature vector of the present invention.
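The D1–D4 pipeline can be sketched in NumPy as follows. A plain PCA projection stands in for the principal feature analysis step (an assumption — the correlation/mutual-information selection criterion is not fully specified here), and the function names and the choice l = 128 are illustrative:

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    return v / (np.linalg.norm(v) + eps)

def fit_pca(features, l):
    """Fit a rank-l PCA projection on an (n, K) matrix of dataset features."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:l]                      # top-l principal directions

def attention_mac_vector(f_omega, mean, components):
    f = l2_normalize(f_omega)                # D1: first L2 normalization
    f = components @ (f - mean)              # D2: reduce K -> l dimensions
    return l2_normalize(f)                   # D3: second L2 normalization

rng = np.random.default_rng(1)
dataset = rng.random((200, 512))             # 200 dataset images, K = 512 (toy data)
normed = np.array([l2_normalize(v) for v in dataset])
mean, comps = fit_pca(normed, l=128)
q = attention_mac_vector(dataset[0], mean, comps)
```

The second normalization makes cosine similarity between two Attention-MAC vectors reduce to a plain dot product, which simplifies the retrieval stage.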
The following describes a search process for trademark detection by using the Attention-MAC trademark image search model in accordance with the present invention with reference to fig. 5.
E1, the trademark images in the FlickrLogos-32 dataset are used as input images and fed in turn into the Attention-MAC trademark image retrieval model, obtaining the one-dimensional Attention-MAC feature vectors f_i, i = 1, 2, …, n, representing the FlickrLogos-32 dataset images, where n is the number of trademark images in the FlickrLogos-32 dataset.
E2, the trademark image to be retrieved is input into the Attention-MAC trademark feature extraction model, obtaining the Attention-MAC feature vector q_A of the trademark image to be retrieved.
E3, the similarity between the image to be retrieved and the images in the FlickrLogos-32 dataset is computed in turn using cosine similarity:
s_i = (q_A · f_i) / (‖q_A‖ ‖f_i‖),
where s_i denotes the similarity score between the i-th image in the FlickrLogos-32 dataset and the image to be retrieved; the FlickrLogos-32 dataset images are initially ranked according to the scores s_i.
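The cosine-similarity ranking of step E3 can be sketched as follows (`initial_ranking` and the toy vectors are illustrative):

```python
import numpy as np

def cosine_similarity(q, f):
    return float(np.dot(q, f) / (np.linalg.norm(q) * np.linalg.norm(f)))

def initial_ranking(q_a, dataset_vectors):
    """Rank dataset images by cosine similarity to the query vector q_A.
    Returns indices from most to least similar, plus the scores s_i."""
    scores = np.array([cosine_similarity(q_a, f) for f in dataset_vectors])
    order = np.argsort(-scores)              # descending similarity
    return order, scores

q_a = np.array([1.0, 0.0])
feats = np.array([[0.0, 1.0],                # orthogonal to the query
                  [1.0, 1.0],                # 45 degrees from the query
                  [2.0, 0.0]])               # same direction as the query
order, scores = initial_ranking(q_a, feats)
```

Because cosine similarity ignores vector magnitude, the third vector ranks first even though it is not equal to the query.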
E4, the top 5 images in the initial ranking are selected, their Attention-MAC feature vectors are summed together with the Attention-MAC feature vector of the trademark image to be retrieved, and the mean is computed:
q_re = (1/6) (q_A + Σ_{i=1}^{5} f_(i)),
where f_(i) denotes the Attention-MAC feature vector of the image ranked i-th. The obtained q_re is used as the new query vector; the top 100 images in the initial ranking are selected, their similarity to q_re is computed, and these 100 images are re-ranked by the new scores to obtain the final ranking result.
It should be noted that the system structures or method flows shown in fig. 1 to fig. 5 of the present invention are only some preferred embodiments of the present invention, and the illustration is only for the convenience of understanding the present invention and is not to be construed as a limitation of the present invention.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
Claims (7)
1. A rapid trademark image retrieval method based on an Attention mechanism and a convolutional neural network is characterized by comprising the following steps:
s1, building a Caffe deep learning open source framework, and training an open source VGG16 network model;
s2, designing an Attention network comprising two convolutional layers based on a VGG16 network model, and adding the Attention network into the trained VGG16 network model;
s3, training the VGG16 network model added with the Attention network by using a training set in a FlickrLogos-32 data set;
s4, generating an Attention-MAC trademark feature extraction model based on the trained VGG16 network model added with the Attention network;
s4-1, removing a trained full connection layer in the VGG16 network model added with the Attention network;
s4-2, adding a pooling layer behind the Attention network to generate the Attention-MAC trademark feature extraction model; the pooling layer processes the feature maps output by the Attention network with a region mean pooling method, which specifically comprises the following steps:
c1, inputting the image into the trained Attention-MAC trademark feature extraction model, which outputs a W×H×K feature volume;
c2, regarding this three-dimensional volume as a set of two-dimensional feature response matrices χ = {x_i}, i = 1, 2, ..., K, where K is the total number of channels of the output feature maps and x_i is the two-dimensional feature response matrix output by the i-th feature channel;
c3, letting Ω denote all possible positions in the W_i×H_i two-dimensional feature response matrix output by the i-th feature channel, and x_i(p) denote the response of x_i at position p, assume:
p_j = argmax_{p∈Ω} x_i(p)
Based on the local continuity of the features, the mean of the 3×3 region centered at x_i(p_j) is taken as the pooled output of x_i, i.e.
f_i = (1/9) · Σ_{l=1}^{9} x_i(p_l)
where p_l, l = 1, 2, ..., 9, ranges over all positions in this 3×3 region; since position x_i(p_j) is the center of the 3×3 region, x_i(p_5) = x_i(p_j) when l = 5. Computing the region mean pooling over the W_i×H_i feature response matrix output by each channel yields a K-dimensional feature representation of the image:
f_Ω = [f_1, f_2, ..., f_K]^T
The feature vector f_Ω is the output of the Attention-MAC trademark feature extraction model;
s5, retrieving the trademark image to be queried based on the Attention-MAC trademark feature extraction model, and generating a retrieval result.
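The region mean pooling of steps c1-c3 can be sketched as follows. This is a minimal numpy sketch: the zero padding used to keep the 3×3 window inside the map at the borders is an assumption, since the claim does not specify boundary handling.

```python
import numpy as np

def region_mean_pool(feature_maps):
    """Region mean pooling (steps c1-c3): for each of the K channels, find
    the position p_j of the maximum response and return the mean of the
    3x3 region centered there, giving a K-dimensional vector f_Omega."""
    K, H, W = feature_maps.shape
    # Zero-pad by 1 so a 3x3 window centered anywhere stays in bounds
    # (boundary handling is an assumption, not specified by the claim).
    padded = np.pad(feature_maps, ((0, 0), (1, 1), (1, 1)))
    f = np.empty(K)
    for i in range(K):
        j = np.argmax(feature_maps[i])       # flattened index of max response
        r, c = divmod(j, W)                  # p_j = (r, c)
        # 3x3 region centered at p_j; indices shifted by the pad of 1.
        f[i] = padded[i, r:r + 3, c:c + 3].mean()
    return f
```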
2. The Attention mechanism and convolutional neural network based fast trademark image retrieval method of claim 1, wherein the step S1 comprises the steps of:
s1-1, building a Caffe deep learning open source framework, and pre-training a VGG16 network model by using an ImageNet data set;
and S1-2, carrying out transfer learning training on the VGG16 network model obtained by pre-training by using a training set in a FlickrLogos-32 data set.
3. The Attention mechanism and convolutional neural network based fast trademark image retrieval method of claim 1, wherein the step S2 comprises the steps of:
s2-1, designing an Attention network comprising two convolutional layers based on model parameters of a VGG16 network model;
s2-2, adding a designed Attention network between the output of the last layer of the trained pooling layer of the VGG16 network model and a full connection layer.
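The two-convolutional-layer Attention branch of steps S2-1/S2-2 can be sketched as below. This is only an illustrative numpy sketch under stated assumptions: the patent does not disclose kernel sizes or activations, so 1×1 convolutions, a ReLU hidden layer, a softplus score, and sum-to-one spatial normalization are all assumptions.

```python
import numpy as np

def attention_block(feats, w1, w2):
    """Two-layer attention branch sketch: score every spatial position of a
    K x H x W feature map and reweight the features by the attention map."""
    K, H, W = feats.shape
    x = feats.reshape(K, -1)                  # (K, H*W): one column per position
    hidden = np.maximum(0.0, w1 @ x)          # first 1x1 conv + ReLU (assumption)
    scores = np.logaddexp(0.0, w2 @ hidden)   # second 1x1 conv + softplus -> positive scores
    attn = scores / scores.sum()              # normalized spatial attention map
    return (x * attn).reshape(K, H, W)        # attention-reweighted features
```

The reweighted maps would then be passed to the region mean pooling layer in place of the raw convolutional output.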
4. The Attention mechanism and convolutional neural network based fast trademark image retrieval method of claim 1, wherein the step S3 comprises the steps of:
s3-1, fixing the network weight of the feature extraction part in the VGG16 network model added with the Attention network;
s3-2, training the Attention network in the VGG16 network model added with the Attention network by using a training set in a FlickrLogos-32 data set.
5. The Attention mechanism and convolutional neural network based fast trademark image retrieval method of claim 1, wherein the step S5 comprises the steps of:
s5-1, performing dimensionality reduction on the image feature representation output by the Attention-MAC trademark feature extraction model through a principal feature analysis method to obtain the Attention-MAC feature vector;
s5-2, taking trademark images in FlickrLogos-32 data sets as input images, sequentially inputting the input images into an Attention-MAC trademark feature extraction model, generating Attention-MAC feature vectors of the trademark data set images, and constructing a feature library of the trademark data set images;
s5-3, inputting the trademark image to be retrieved into an Attention-MAC trademark feature extraction model as an input image, and generating an Attention-MAC feature vector of the trademark image to be retrieved;
s5-4, calculating cosine similarity of the Attention-MAC feature vector of the trademark image to be retrieved and the Attention-MAC feature vector of the FlickrLogos-32 data set image to obtain initial sequencing of the trademark data set image based on the cosine similarity;
s5-5, rearranging FlickrLogos-32 data set images through expanding query to obtain the final sequence of the similarity between the trademark data set images and the trademark images to be retrieved and reporting the trademark retrieval result.
6. The Attention mechanism and convolutional neural network based fast trademark image retrieval method of claim 2, wherein the step S1-2 comprises the steps of:
s1-2-1, fine-tuning the network weight of the VGG16 network model obtained through pre-training by using a training set in a FlickrLogos-32 data set;
s1-2-2, in the training of transfer learning, a standard cross entropy loss function is used for carrying out classification training on the VGG16 network model obtained through pre-training.
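The standard cross-entropy loss used for the classification training in step S1-2-2 is sketched below; this is the textbook softmax cross-entropy, not code from the patent.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Standard softmax cross-entropy over class logits.
    logits: (N, C) array of raw scores; labels: (N,) class indices."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Mean negative log-probability of the correct class.
    return -log_probs[np.arange(len(labels)), labels].mean()
```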
7. The Attention mechanism and convolutional neural network-based fast trademark image retrieval method of claim 5, wherein the step S5-1 comprises the steps of:
s5-1-1, performing L2 regularization on the feature representation output by the Attention-MAC trademark feature extraction model;
s5-1-2, performing feature selection on the processed feature representation through a principal feature analysis method to obtain a dimension-reduced feature vector;
s5-1-3, performing L2 regularization again on the dimension-reduced feature vector to obtain the Attention-MAC feature vector.
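The post-processing pipeline of steps S5-1-1 to S5-1-3 (L2-normalize, reduce dimension, L2-normalize again) can be sketched as follows. A minimal numpy sketch: the claim's "principal feature analysis" is assumed here to behave like SVD-based PCA, and the output dimension `n_components` is an illustrative parameter.

```python
import numpy as np

def postprocess(feats, n_components=128):
    """S5-1 sketch: L2-normalize each feature vector, project onto the top
    principal components (PCA via SVD, an assumption), L2-normalize again."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)  # first L2 norm
    mean = feats.mean(axis=0)
    centered = feats - mean                                       # center for PCA
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    reduced = centered @ vt[:n_components].T                      # dimension reduction
    reduced /= np.linalg.norm(reduced, axis=1, keepdims=True)     # second L2 norm
    return reduced
```

In a retrieval system the mean and projection basis would be fitted once on the dataset feature library and reused to project each query vector.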
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810750096.1A CN108875076B (en) | 2018-07-10 | 2018-07-10 | Rapid trademark image retrieval method based on Attention mechanism and convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108875076A CN108875076A (en) | 2018-11-23 |
CN108875076B true CN108875076B (en) | 2021-07-20 |
Family
ID=64300452
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810750096.1A Active CN108875076B (en) | 2018-07-10 | 2018-07-10 | Rapid trademark image retrieval method based on Attention mechanism and convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108875076B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109697257A (en) * | 2018-12-18 | 2019-04-30 | 天罡网(北京)安全科技有限公司 | A noise-robust network information retrieval method based on pre-classification and feature learning |
CN109857897B (en) * | 2019-02-14 | 2021-06-29 | 厦门一品威客网络科技股份有限公司 | Trademark image retrieval method and device, computer equipment and storage medium |
CN110334226B (en) * | 2019-04-25 | 2022-04-05 | 吉林大学 | Depth image retrieval method fusing feature distribution entropy |
CN110599459A (en) * | 2019-08-14 | 2019-12-20 | 深圳市勘察研究院有限公司 | Underground pipe network risk assessment cloud system based on deep learning |
CN111694974A (en) * | 2020-06-12 | 2020-09-22 | 桂林电子科技大学 | Depth hash vehicle image retrieval method integrating attention mechanism |
CN111694977A (en) * | 2020-06-12 | 2020-09-22 | 桂林电子科技大学 | Vehicle image retrieval method based on data enhancement |
CN111985161B (en) * | 2020-08-21 | 2024-06-14 | 广东电网有限责任公司清远供电局 | Reconstruction method of three-dimensional model of transformer substation |
CN113127661B (en) * | 2021-04-06 | 2023-09-12 | 中国科学院计算技术研究所 | Multi-supervision medical image retrieval method and system based on cyclic query expansion |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103226826A (en) * | 2013-03-20 | 2013-07-31 | 西安电子科技大学 | Method for detecting changes of remote sensing image of visual attention model based on local entropy |
CN108038519A (en) * | 2018-01-30 | 2018-05-15 | 浙江大学 | A cervical image processing method and device based on a dense feature pyramid network |
WO2018106783A1 (en) * | 2016-12-06 | 2018-06-14 | Siemens Energy, Inc. | Weakly supervised anomaly detection and segmentation in images |
CN108171141A (en) * | 2017-12-25 | 2018-06-15 | 淮阴工学院 | Video target tracking method based on attention-model cascaded multi-mode fusion |
CN108229267A (en) * | 2016-12-29 | 2018-06-29 | 北京市商汤科技开发有限公司 | Object properties detection, neural metwork training, method for detecting area and device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10558750B2 (en) * | 2016-11-18 | 2020-02-11 | Salesforce.Com, Inc. | Spatial attention model for image captioning |
Non-Patent Citations (2)
Title |
---|
"Action Recognition by an Attention-Aware Temporal Weighted Convolutional Neural Network"; Le Wang, Jinliang Zang, Qilin Zhang, Zhenxing Niu, Gang Hua; Sensors; 2018-06-21; full text *
"Fast face image retrieval method based on deep features" (基于深度特征的快速人脸图像检索方法); Li Zhendong, Zhong Yong, Chen Man, Cao Dongping; Acta Optica Sinica (光学学报); 2018-05-30; Vol. 38, No. 10; full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108875076B (en) | Rapid trademark image retrieval method based on Attention mechanism and convolutional neural network | |
CN110866140B (en) | Image feature extraction model training method, image searching method and computer equipment | |
CN103345645B (en) | Commodity image class prediction method towards net purchase platform | |
EP4002161A1 (en) | Image retrieval method and apparatus, storage medium, and device | |
CN107683469A (en) | A kind of product classification method and device based on deep learning | |
CN108921198A (en) | commodity image classification method, server and system based on deep learning | |
CN111898703B (en) | Multi-label video classification method, model training method, device and medium | |
CN105354593B (en) | A kind of threedimensional model sorting technique based on NMF | |
CN108154156B (en) | Image set classification method and device based on neural topic model | |
CN116580257A (en) | Feature fusion model training and sample retrieval method and device and computer equipment | |
CN113378938B (en) | Edge transform graph neural network-based small sample image classification method and system | |
CN114510594A (en) | Traditional pattern subgraph retrieval method based on self-attention mechanism | |
CN114332889A (en) | Text box ordering method and text box ordering device for text image | |
CN113806580A (en) | Cross-modal Hash retrieval method based on hierarchical semantic structure | |
CN113032613A (en) | Three-dimensional model retrieval method based on interactive attention convolution neural network | |
Nezamabadi-pour et al. | Concept learning by fuzzy k-NN classification and relevance feedback for efficient image retrieval | |
CN113378962A (en) | Clothing attribute identification method and system based on graph attention network | |
CN112800262A (en) | Image self-organizing clustering visualization method and device and storage medium | |
CN105069136A (en) | Image recognition method in big data environment | |
CN110532409B (en) | Image retrieval method based on heterogeneous bilinear attention network | |
CN106874927A (en) | The construction method and system of a kind of random strong classifier | |
Adnan et al. | Automated image annotation with novel features based on deep ResNet50-SLT | |
CN111768214A (en) | Product attribute prediction method, system, device and storage medium | |
CN113378934A (en) | Small sample image classification method and system based on semantic perception map neural network | |
CN113011506A (en) | Texture image classification method based on depth re-fractal spectrum network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||