Trademark approximate retrieval system and method based on multi-dimensional feature fusion
Technical Field
The invention relates to the technical field of image search, in particular to a trademark approximate retrieval system and method based on multi-dimensional feature fusion.
Background
Trademarks are prominent marks that identify goods, services, or the specific individual or business associated with them. A registered trademark is legitimate property that needs to be protected against infringement. The trademark office undertakes trademark examination, registration, administrative adjudication and the like; with the rapid development of the economy, the number of enterprises and the volume of trademark registrations continue to grow, which increases the difficulty the trademark office faces in examining, verifying and managing trademarks.
For each newly applied trademark, the trademark office audits it to ensure that it does not imitate, and differs sufficiently from, already registered trademarks. At present, the trademark office mainly checks trademarks by searching over manually annotated text information and graphic codes. The retrieval precision and efficiency of this method are limited, and the workload of manual annotation and auditing is large, so the method faces serious challenges in processing the ever-increasing volume of trademark registration applications.
With the development of computer image processing technology, retrieval methods based on the content of the trademark image itself have emerged. Such a method does not depend on manually annotated information: it extracts image features from the trademark image and performs similarity matching on those features to retrieve approximate trademarks. Most existing methods adopt a traditional image retrieval mode based on a single, simple image feature. However, a trademark image is usually composed of abstract figures and symbols and is therefore highly abstract and complex, and a semantic gap exists between the computer's representation of the trademark image and human cognition. As a result, conventional methods struggle to understand the trademark image, which affects the accuracy and efficiency of trademark retrieval.
Therefore, aiming at these problems in trademark retrieval, the invention provides a trademark approximate retrieval method and system based on multi-dimensional feature fusion, which fuse the features of three dimensions (convolutional neural network, visual word bag and graphic element), providing trademark approximate retrieval with semantic information and discriminative power and achieving a better trademark retrieval effect.
Disclosure of Invention
The invention provides a trademark approximate retrieval system and method based on multi-dimensional feature fusion, aiming at solving the following problem of the prior art when a single image feature is adopted: because a trademark image is usually composed of abstract figures and symbols and is highly abstract and complex, and a semantic gap exists between the computer's representation of the trademark image and human cognition, existing methods find it difficult to understand the trademark image, which in turn affects the accuracy and efficiency of trademark retrieval.
The invention provides a trademark approximate retrieval system based on multi-dimensional feature fusion, which comprises a trademark database, a feature retrieval module and a feature extraction module, wherein the trademark database is connected with the feature extraction module, and the feature extraction module is connected with the feature retrieval module;
the feature extraction module comprises a neural network feature module, a visual word bag feature module and a graphic element module. The neural network feature module is connected with the trademark database and the feature retrieval module; the visual word bag feature module is connected with the trademark database and the feature retrieval module; the graphic element module is connected with the trademark database. The neural network feature module is used for extracting multi-scale convolutional neural network features of images from the trademark database, performing optimization training on the neural network based on a triplet metric loss function, and outputting the multi-scale convolutional neural network features to the feature retrieval module. The visual word bag feature module is used for constructing a visual dictionary from the image key point features extracted from the trademark database, and outputting the image visual word bag features extracted based on the visual dictionary to the feature retrieval module. The graphic element module is used for establishing an index library of registered trademark graphic elements; the graphic element features are manually input to the feature retrieval module by a query staff.
A neural network feature module: this module is responsible for extracting the multi-scale convolutional neural network features of the image and performing optimization training on the neural network based on a triplet metric loss function;
A visual word bag feature module: this module is responsible for extracting image key point features, constructing a visual dictionary, and finally outputting the image visual word bag features extracted based on the visual dictionary;
A graphic element feature module: this module is responsible for establishing an index library of registered trademark graphic elements through a full-text search engine; the graphic element features of the trademark to be registered are manually input by a query staff;
A feature retrieval module: in the retrieval stage, this module is responsible for calling the feature extraction module to extract the multi-scale convolutional neural network features, visual word bag features and graphic element features of the trademark image to be registered, calculating the similarity between the trademark to be registered and the trademarks in the registered trademark library, and returning the retrieval results sorted by similarity.
In the trademark approximate retrieval system based on multi-dimensional feature fusion of the invention, the neural network feature module comprises a residual network, a first fully-connected layer, a second fully-connected layer, a first convolutional layer, a second convolutional layer, a first pooling layer and a second pooling layer. The residual network, the first convolutional layer and the second convolutional layer are all connected with the trademark database; the residual network is data-connected with the first fully-connected layer; the first convolutional layer is data-connected with the first pooling layer; the second convolutional layer is data-connected with the second pooling layer; the first fully-connected layer, the first pooling layer and the second pooling layer are all connected with the second fully-connected layer; and the second fully-connected layer transmits the multi-scale convolutional neural network features to the feature retrieval module.
The invention provides a trademark approximate retrieval method based on multi-dimensional feature fusion, which comprises the following steps of:
S1, establishing a trademark database;
S2, inputting a trademark to be registered through the feature retrieval module;
S3, according to the trademark to be registered, the neural network feature module extracts the multi-scale convolutional neural network features of the trademark image through the trademark database and transmits them to the feature retrieval module; the visual word bag feature module extracts the visual word bag features of the trademark image through the trademark database and transmits them to the feature retrieval module; and the graphic element module extracts the graphic element features of the trademark image through the trademark database and transmits them to the feature retrieval module;
and S4, performing similarity matching on the trademark to be registered and trademarks in a registered trademark library based on the multi-scale convolutional neural network characteristics, the visual word bag characteristics and the graphic element characteristics to obtain a fusion retrieval result of the three dimensional characteristics.
In the trademark approximate retrieval method based on multi-dimensional feature fusion of the invention, as a preferred mode, similarity matching is performed in step S4 between the trademark to be registered and the trademarks in the registered trademark library, and the specific calculation formula for obtaining the fused retrieval result of the three dimensional features is:
Score(a, d_i) = α·Score_CNN(a, d_i) + β·Score_BoVW(a, d_i) + Score_element(a, d_i)
wherein Score(a, d_i) is the total similarity between the trademark a to be registered and the i-th trademark d_i in the registered trademark library, Score_CNN is the similarity of the multi-scale convolutional neural network features, Score_BoVW is the similarity of the visual word bag features, Score_element is the similarity of the graphic element features, and α and β are weight parameters.
In the trademark approximate retrieval method based on multi-dimensional feature fusion of the invention, as a preferred mode, the specific way in which the neural network feature module extracts the multi-scale convolutional neural network features of the trademark image and transmits them to the feature retrieval module in step S3 is as follows:
S311, the trademark image is input into the residual network, the first convolutional layer and the second convolutional layer respectively; the residual network is initialized with ImageNet pre-training parameters, and the output of the Average Pooling layer in the residual network is passed through the first fully-connected layer to obtain a first feature;
S312, the first convolutional layer and the second convolutional layer adopt different Padding and Stride values, and a second feature and a third feature are obtained through the first pooling layer and the second pooling layer respectively;
S313, the first feature, the second feature and the third feature are each processed by L2 regularization;
S314, the regularized first, second and third features are concatenated and input into the second fully-connected layer;
S315, the second fully-connected layer obtains the multi-scale convolutional neural network features through a linear mapping.
In the trademark approximate retrieval method based on multi-dimensional feature fusion of the invention, as a preferred mode, the extraction of the multi-scale convolutional neural network features of the trademark image by the neural network feature module and their transmission to the feature retrieval module in step S3 further comprises:
S316, the multi-scale convolutional neural network parameters are optimized through a triplet metric loss function, mining and learning the semantic information of the registered trademark images;
the loss function is expressed as:
Loss(p, q, r) = max(0, margin + D(f(p), f(q)) - D(f(p), f(r)))
wherein Loss(p, q, r) is the metric loss function of the triple (p, q, r); p, q and r are respectively the current trademark, an approximate trademark of the current trademark and a non-approximate trademark of the current trademark; f(·) is the output feature of the multi-scale convolutional neural network; D(·,·) is the cosine distance between two input features; and margin is a boundary parameter.
In the trademark approximate retrieval method based on multi-dimensional feature fusion of the invention, as a preferred mode, the specific way in which the visual word bag feature module extracts the visual word bag features of the trademark image and transmits them to the feature retrieval module in step S3 is as follows:
s321, extracting key points of the image by using three local detectors of Harris, Hessian and Kaze;
s322, extracting key point characteristics of the image key points through a Sift descriptor;
s323, establishing a visual dictionary through Kmeans clustering of the key point characteristics;
and S324, extracting visual word bag characteristics from the image through a visual dictionary.
On the one hand, the method is based on a convolutional neural network: it extracts multi-scale convolutional neural network features and, by optimizing them with a triplet loss function, further models the semantic information of the trademark. On the other hand, the invention recognizes that features of different dimensions contain a large amount of complementary information; based on the features of three dimensions, namely the multi-scale convolutional neural network features, the visual word bag features and the graphic element features, the correlation and complementarity among features of different dimensions are fully exploited, so that better retrieval accuracy can be obtained.
The invention has the following beneficial effects:
(1) in the multi-scale convolutional neural network, the residual network emphasizes the learning of high-level semantic features, while the other two shallow network paths emphasize the learning of the content features of the image; meanwhile, optimization training based on the triplet metric loss function enables the multi-scale convolutional neural network features to better model the relative similarity of trademarks, so that a better retrieval effect is obtained;
(2) the visual word bag features emphasize the local characteristics of the image content, and the graphic codes represent a person's high-level semantic understanding of the trademark; the fusion of the three dimensional features, namely the multi-scale convolutional neural network features, the visual word bag features and the graphic code features, allows the advantages of features of different dimensions to complement one another, improving the overall trademark retrieval effect.
Drawings
FIG. 1 is a schematic diagram of a trademark approximate retrieval system based on multi-dimensional feature fusion;
FIG. 2 is a schematic diagram of a feature extraction module of a trademark approximate retrieval system based on multi-dimensional feature fusion;
FIG. 3 is a schematic structural diagram of a neural network feature module of a trademark approximation retrieval system based on multi-dimensional feature fusion;
FIG. 4 is a flowchart of a trademark approximation retrieval method based on multi-dimensional feature fusion.
Reference numerals:
1. a trademark database; 2. a feature retrieval module; 3. a feature extraction module; 31. a neural network feature module; 311. a residual network; 312. a first fully-connected layer; 313. a second fully-connected layer; 314. a first convolutional layer; 315. a second convolutional layer; 316. a first pooling layer; 317. a second pooling layer; 32. a visual word bag feature module; 33. a graphic element module.
Detailed Description
The technical solutions in the embodiments of the present invention will be made clear below with reference to the accompanying drawings in the embodiments of the present invention.
Example 1
As shown in fig. 1, a trademark approximate retrieval system based on multi-dimensional feature fusion comprises a trademark database 1, a feature retrieval module 2 and a feature extraction module 3, wherein the trademark database 1 is connected with the feature extraction module 3, and the feature extraction module 3 is connected with the feature retrieval module 2.
As shown in fig. 2, the feature extraction module 3 comprises a neural network feature module 31, a visual word bag feature module 32 and a graphic element module 33. The neural network feature module 31 is connected with the trademark database 1 and the feature retrieval module 2; the visual word bag feature module 32 is connected with the trademark database 1 and the feature retrieval module 2; the graphic element module 33 is connected with the trademark database 1. The neural network feature module 31 is used for extracting multi-scale convolutional neural network features of images from the trademark database 1, performing optimization training on the neural network based on a triplet metric loss function, and outputting the multi-scale convolutional neural network features to the feature retrieval module 2. The visual word bag feature module 32 is used for constructing a visual dictionary from the image key point features extracted from the trademark database 1, and outputting the image visual word bag features extracted based on the visual dictionary to the feature retrieval module 2. The graphic element module 33 is used for creating an index library of registered trademark graphic elements and outputting the graphic element features to the feature retrieval module 2.
As shown in fig. 3, the neural network feature module 31 comprises a residual network 311, a first fully-connected layer 312, a second fully-connected layer 313, a first convolutional layer 314, a second convolutional layer 315, a first pooling layer 316 and a second pooling layer 317. The residual network 311, the first convolutional layer 314 and the second convolutional layer 315 are all connected to the trademark database 1; the residual network 311 is data-connected to the first fully-connected layer 312; the first convolutional layer 314 is data-connected to the first pooling layer 316; the second convolutional layer 315 is data-connected to the second pooling layer 317; the first fully-connected layer 312, the first pooling layer 316 and the second pooling layer 317 are all connected to the second fully-connected layer 313; and the second fully-connected layer 313 transmits the multi-scale convolutional neural network features to the feature retrieval module 2.
As shown in fig. 4, a trademark approximate retrieval method based on multi-dimensional feature fusion includes the following steps:
S1, establishing a trademark database 1:
all registered trademark images and trademark images to be registered are converted into grayscale images to eliminate the interference of color on the retrieval results, and all images are uniformly stored in JPG format;
S2, inputting a trademark to be registered through the feature retrieval module 2;
S3, according to the trademark to be registered, the neural network feature module 31 extracts the multi-scale convolutional neural network features of the trademark image through the trademark database 1 and transmits them to the feature retrieval module 2; the visual word bag feature module 32 extracts the visual word bag features of the trademark image through the trademark database 1 and transmits them to the feature retrieval module 2; and the graphic element module 33 extracts the graphic element features of the trademark image through the trademark database 1 and transmits them to the feature retrieval module 2;
and S4, performing similarity matching on the trademark to be registered and trademarks in a registered trademark library based on the multi-scale convolutional neural network characteristics, the visual word bag characteristics and the graphic element characteristics to obtain a fusion retrieval result of the three dimensional characteristics.
The specific calculation formula for obtaining the fusion retrieval result of the three dimensional characteristics by similarity matching between the trademark to be registered and the trademarks in the registered trademark library is as follows:
Score(a, d_i) = α·Score_CNN(a, d_i) + β·Score_BoVW(a, d_i) + Score_element(a, d_i)
wherein Score(a, d_i) is the total similarity between the trademark a to be registered and the i-th trademark d_i in the registered trademark library, Score_CNN is the similarity of the multi-scale convolutional neural network features, Score_BoVW is the similarity of the visual word bag features, Score_element is the similarity of the graphic element features, and α and β are weight parameters; in this implementation, α takes the value 0.6 and β takes the value 0.4. The trademarks in the registered trademark library are sorted by Score(a, d_i) from largest to smallest, and the sorted result is returned as the retrieval result for the trademark to be registered.
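The fused scoring and ranking described above can be sketched as follows; the function and variable names are illustrative, and the three per-dimension similarity scores are assumed to have been computed already:

```python
def fused_score(score_cnn, score_bovw, score_element, alpha=0.6, beta=0.4):
    """Weighted fusion of the three per-dimension similarities (the Score formula above)."""
    return alpha * score_cnn + beta * score_bovw + score_element

def rank_trademarks(candidates):
    """candidates: list of (registration_id, cnn, bovw, element) tuples.
    Returns registration ids sorted by fused similarity, largest first."""
    ranked = sorted(candidates,
                    key=lambda t: fused_score(t[1], t[2], t[3]),
                    reverse=True)
    return [t[0] for t in ranked]
```

The returned order is the retrieval result for the trademark to be registered.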
In the specific implementation, the accuracy of trademark retrieval is evaluated by the MAP index. MAP is a common statistical indicator for retrieval results. For a single query, the average precision (AP) is defined as:
AP = (1/R) · Σ_{k=1}^{N} (R_k / k) · rel_k
where R is the total number of positive samples in the trademark library, N is the number of returned results, R_k is the number of positive samples among the first k returned results, and rel_k indicates whether the k-th returned result is a positive sample (1 if it is, 0 if it is not). The MAP index is the average of the AP values over all queries.
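As a sketch, the AP and MAP computation described above can be implemented as follows, assuming the relevance flags rel_k of the returned results are available as 0/1 lists and that all positive samples appear in the returned list (names are illustrative, not from the patent):

```python
def average_precision(rels):
    """AP for one query. rels[k-1] == 1 iff the k-th returned result is a
    positive sample; R is taken as the number of positives in the list."""
    R = sum(rels)
    if R == 0:
        return 0.0
    ap, hits = 0.0, 0
    for k, rel in enumerate(rels, start=1):
        if rel:
            hits += 1        # hits == R_k at this position
            ap += hits / k   # accumulate (R_k / k) * rel_k
    return ap / R

def mean_average_precision(all_rels):
    """MAP: mean of the per-query AP values."""
    return sum(average_precision(r) for r in all_rels) / len(all_rels)
```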
In this embodiment, the image is first resized to a uniform 224 × 224 pixels. The first path adopts the structure of ResNet connected with the first fully-connected layer 312; in this implementation, a ResNet101 network initialized with ImageNet dataset (www.image-net.org) pre-training parameters is adopted, and the 2048-dimensional features of the Average Pooling layer in the network are extracted and input into the first fully-connected layer 312. The first fully-connected layer 312 employs a linear mapping with 4096 hidden neurons. The other two paths adopt a convolutional layer plus pooling layer structure, with Padding values of 1 and 4 and Stride values of 16 and 32 respectively; each yields a 1536-dimensional feature through its pooling layer. The features obtained from the three networks are each processed by L2 regularization, then concatenated and input into the second fully-connected layer 313, which also employs a linear mapping with 4096 hidden neurons, finally yielding the 4096-dimensional multi-scale convolutional neural network features.
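The fusion stage of this pipeline (L2 regularization of each branch feature, concatenation, then the linear mapping of the second fully-connected layer) can be sketched in plain Python. This is a minimal sketch with toy dimensions; in the embodiment the three branch features are 4096-, 1536- and 1536-dimensional and the weight matrix is that of the trained second fully-connected layer:

```python
import math

def l2_normalize(v):
    """L2 regularization of a single feature vector."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def fuse_multiscale(feat_resnet, feat_conv1, feat_conv2, weight, bias):
    """Normalize each branch feature, concatenate, and apply the second
    fully-connected layer as a linear mapping. weight is an
    (out_dim x in_dim) matrix, bias a length-out_dim vector."""
    concat = (l2_normalize(feat_resnet)
              + l2_normalize(feat_conv1)
              + l2_normalize(feat_conv2))
    return [sum(w * x for w, x in zip(row, concat)) + b
            for row, b in zip(weight, bias)]
```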
The specific way in which the neural network feature module 31 extracts the multi-scale convolutional neural network features of the trademark image and transmits them to the feature retrieval module 2 in step S3 is as follows:
S311, the trademark image is input into the residual network 311, the first convolutional layer 314 and the second convolutional layer 315 respectively; the residual network 311 is initialized with ImageNet pre-training parameters, and the output of the Average Pooling layer in the residual network 311 is passed through the first fully-connected layer 312 to obtain a first feature;
S312, the first convolutional layer 314 and the second convolutional layer 315 adopt different Padding and Stride values, and a second feature and a third feature are obtained through the first pooling layer 316 and the second pooling layer 317 respectively;
S313, the first feature, the second feature and the third feature are each processed by L2 regularization;
S314, the regularized first, second and third features are concatenated and input into the second fully-connected layer 313;
S315, the second fully-connected layer 313 obtains the multi-scale convolutional neural network features through a linear mapping.
S316, the multi-scale convolutional neural network parameters are optimized through a triplet metric loss function, mining and learning the semantic information of the registered trademark images;
the loss function is expressed as:
Loss(p, q, r) = max(0, margin + D(f(p), f(q)) - D(f(p), f(r)))
wherein Loss(p, q, r) is the metric loss function of the triple (p, q, r); p, q and r are respectively the current trademark, an approximate trademark of the current trademark and a non-approximate trademark of the current trademark; f(·) is the output feature of the multi-scale convolutional neural network; D(·,·) is the cosine distance between two input features; and margin is a boundary parameter, taking the value 0.1 in this implementation.
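A minimal sketch of this triplet metric loss with cosine distance, in plain Python over already-extracted feature vectors (function names are illustrative):

```python
import math

def cosine_distance(u, v):
    """D(·,·): 1 minus the cosine similarity of two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def triplet_loss(f_p, f_q, f_r, margin=0.1):
    """Metric loss for a triple (p, q, r): current trademark p, approximate
    trademark q, non-approximate trademark r. The loss pushes the approximate
    pair closer than the non-approximate pair by at least margin."""
    return max(0.0, margin + cosine_distance(f_p, f_q) - cosine_distance(f_p, f_r))
```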
In the implementation process, the triples are constructed according to the existing annotation information, and data samples are selected randomly during training. Training is optimized with the stochastic gradient descent algorithm, with an initial learning rate of 0.001, a momentum parameter of 0.9, and a single-batch training data volume of 64.
The specific way in which the visual word bag feature module 32 extracts the visual word bag features of the trademark image and transmits them to the feature retrieval module 2 in step S3 is as follows:
s321, extracting key points of the image by using three local detectors of Harris, Hessian and Kaze;
s322, extracting key point characteristics of the image key points through a Sift descriptor;
s323, establishing a visual dictionary through Kmeans clustering of the key point characteristics;
and S324, extracting visual word bag characteristics from the image through a visual dictionary.
In this embodiment, first, key points of an image are extracted using three local detectors, i.e., Harris, Hessian, and Kaze.
Based on the extracted image key points, key point features are extracted through a Sift descriptor, and the feature dimension of each key point is 128.
A visual dictionary is then established from the Sift key point features through a Kmeans clustering algorithm; the size of the visual dictionary is 2000 words.
Finally, for each image, the term frequency and inverse document frequency are calculated based on the visual dictionary, and a 2000-dimensional visual word bag feature is output.
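A minimal sketch of the assignment-and-histogram stage of steps S323 and S324, assuming the Kmeans dictionary (cluster centers) and the per-image Sift descriptors have already been computed; only the term-frequency part is shown, and the inverse-document-frequency weighting across the library is omitted:

```python
def nearest_word(descriptor, dictionary):
    """Assign one keypoint descriptor to its closest visual word (squared
    Euclidean distance to each cluster center)."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(descriptor, center))
    return min(range(len(dictionary)), key=lambda i: dist2(dictionary[i]))

def bovw_histogram(descriptors, dictionary):
    """Normalized term-frequency histogram of visual words for one image.
    In the patent the dictionary holds 2000 words over 128-dim Sift
    descriptors; tiny toy sizes work the same way."""
    hist = [0] * len(dictionary)
    for d in descriptors:
        hist[nearest_word(d, dictionary)] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]
```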
In this embodiment, an index library of registered trademark graphic elements is established, and the graphic element features of the trademark to be registered are manually input by an inquirer.
In this step, an index library is established for the graphic elements of registered trademarks using a full-text search engine library such as Lucene or Whoosh, supporting the fast retrieval of graphic elements in the next step; the index fields of the index library comprise the registration number of the trademark and the set of graphic codes contained in the trademark image.
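The graphic-element index can be illustrated with a minimal in-memory inverted index standing in for the Lucene/Whoosh index described above; the registration numbers and graphic codes below are hypothetical examples:

```python
def build_element_index(trademarks):
    """Inverted index from graphic-element code to registration numbers.
    trademarks: dict mapping registration number -> set of graphic codes."""
    index = {}
    for reg_no, codes in trademarks.items():
        for code in codes:
            index.setdefault(code, set()).add(reg_no)
    return index

def query_elements(index, codes):
    """Return registration numbers sharing at least one queried graphic code."""
    hits = set()
    for code in codes:
        hits |= index.get(code, set())
    return hits
```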
The above description covers only preferred embodiments of the present invention, but the scope of the present invention is not limited thereto; any equivalent substitutions or changes made by a person skilled in the art according to the technical solutions and inventive concept of the present invention shall fall within the scope of the present invention.