CN111881906A - LOGO identification method based on attention mechanism image retrieval

Info

Publication number: CN111881906A
Application number: CN202010558069.1A
Authority: CN
Inventors: 张容琛
Original assignee: Guangzhou Wanwei Innovation Technology Co ltd
Current assignee: Guangzhou Gaowei Network Technology Co ltd
Priority date: 2020-06-18
Filing date: 2020-06-18
Publication date: 2020-11-03

Abstract

The invention discloses a LOGO identification method based on attention mechanism image retrieval. The method first acquires the region of an image that contains a LOGO, which also determines whether a LOGO is present at all. It then crops that region from the image, and a feature extraction network produces the feature tensor of the LOGO region. The feature tensor is compressed into a feature vector, a feature-space distance is computed between this vector and the feature vectors of the LOGOs in an image library, and the image at the shortest distance is selected as the matched LOGO. The label corresponding to the feature code of the matched LOGO is read and used as the label of the LOGO to be identified. Under this logic, images that contain no LOGO are filtered out, invalid computation is removed, operating efficiency is improved, and the high retrieval error rate caused by a large background proportion in image retrieval is addressed.

Description

LOGO identification method based on attention mechanism image retrieval

Technical Field

The invention relates to the technical field of LOGO recognition, in particular to a LOGO recognition method based on attention mechanism image retrieval.

Background

A LOGO (trademark or logo) is a form of visual information expression developed through long-term life and practice. It carries a definite meaning through visual patterns that people understand, and it conveys that meaning simply and clearly. Traditional LOGO recognition methods include the following: method (1) recognizes a small number of specific LOGOs through target detection; method (2) searches for and matches similar LOGO images through image retrieval.

Method (1), target detection, can be divided into two categories according to the processing flow: two-stage and one-stage. A two-stage detector performs detection in two steps: it first generates candidate boxes through a feature network and then classifies the candidate boxes through a classification network. Typical algorithms include Fast R-CNN and Faster R-CNN. A one-stage detector obtains the target class probabilities and bounding boxes directly by regression on the output of a convolutional neural network. Typical algorithms include YOLOv3, SSD and RetinaNet, which have few model parameters and high inference speed but lower precision. The limitation is that a target detection algorithm cannot classify accurately in large-scale LOGO recognition, which involves thousands of LOGO classes. A detector obtains the class confidence of a target by reading the per-dimension feature vectors output by a convolutional layer, so the number of parameters of the detection network is proportional to the number of target classes, and a detection model is suitable only for recognizing dozens of classes. If method (1) is used for large-scale LOGO recognition, it therefore faces two problems that it cannot solve: 1. Class imbalance in the data set lowers LOGO classification precision and causes a large number of misjudgments. 2. The parameters of the convolutional layers grow greatly, so the deep learning model becomes very large, forward inference time increases substantially, and the model cannot be deployed in practice.

Method (2), image retrieval, consists of three parts: image feature acquisition, image feature coding and image feature matching. There are two ways to acquire image features: 1. Image feature descriptors are obtained with traditional image processing methods such as SIFT, SURF and ORB. 2. Image features are obtained through a deep convolutional neural network. Image feature coding generally uses a hashing method, and image feature matching generally uses the Hamming distance to measure the distance between feature codes. Compared with method (1), method (2) is not limited by the number of LOGO classes and is not affected by sample imbalance in the data set. Its limitation is that feature extraction operates on the whole picture. As shown in FIG. 1, only the region inside the rectangular LOGO frame is valid feature information and the rest is noise. Noise features often make up a large proportion of the global feature vector obtained through feature engineering, so after feature coding and feature matching the retrieved image often has a similar background rather than the corresponding LOGO.

Disclosure of Invention

The invention aims to provide an efficient and accurate LOGO recognition method based on attention mechanism image retrieval.

The invention discloses a LOGO recognition method based on attention mechanism image retrieval, which comprises the following steps of:

step S1: acquiring a region containing LOGO in an image;

step S2: intercepting an area containing the LOGO in the image, and acquiring a feature tensor of the area containing the LOGO by a feature extraction network;

step S3: performing feature compression on the feature tensor of the region containing the LOGO to obtain a feature vector;

step S4: performing a feature-space distance operation between the feature vector of the region containing the LOGO and the feature vectors of the LOGOs in the image library, and selecting the image with the shortest distance as the matched LOGO;

step S5: reading the label corresponding to the feature code of the matched LOGO, and taking the label of the matched LOGO as the label of the LOGO to be identified, thereby completing LOGO identification.
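As a reading aid, steps S1 to S5 can be sketched as a single orchestration function. This is only an illustrative skeleton, not the patented implementation: the function name `recognize_logo`, the `(x1, y1, x2, y2)` box convention and the four callables are hypothetical stand-ins for the detector, feature extraction network, compression step and library lookup described in the text.

```python
from typing import Callable, Optional, Tuple

import numpy as np

# Hypothetical box convention: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[int, int, int, int]


def recognize_logo(
    image: np.ndarray,
    detect: Callable[[np.ndarray], Optional[Box]],
    extract: Callable[[np.ndarray], np.ndarray],
    compress: Callable[[np.ndarray], np.ndarray],
    match: Callable[[np.ndarray], str],
) -> Optional[str]:
    """Run steps S1-S5; return None when no LOGO region is found."""
    box = detect(image)                # S1: locate the LOGO region
    if box is None:
        return None                    # no LOGO: skip all later computation
    x1, y1, x2, y2 = box
    region = image[y1:y2, x1:x2]       # S2: crop the LOGO region ...
    tensor = extract(region)           # ... and extract its feature tensor
    vector = compress(tensor)          # S3: compress tensor to a vector
    return match(vector)               # S4 + S5: nearest library entry's label
```

Returning early when `detect` finds nothing mirrors the stated benefit of filtering out images without a LOGO before any feature extraction is run.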

In the LOGO identification method based on attention mechanism image retrieval according to the invention, the region of the image that contains the LOGO is acquired first, which also determines whether a LOGO is present. That region is then cropped from the image, and a feature extraction network produces the feature tensor of the LOGO region. The feature tensor is compressed into a feature vector, a feature-space distance is computed between this vector and the feature vectors of the LOGOs in the image library, and the image at the shortest distance is selected as the matched LOGO. The label corresponding to the feature code of the matched LOGO is read and used as the label of the LOGO to be identified. Under this logic, images without a LOGO are filtered out, invalid computation is removed and operating efficiency is improved. Because similar pictures are found in the image library from the features of a specific region of the picture, the high retrieval error rate caused by a large background proportion in image retrieval is addressed. The efficiency problem in large-scale LOGO image recognition is addressed as well, so that results can be returned in real time while accuracy is maintained. Each LOGO class can also be identified accurately when a single LOGO class has few training samples and when the number of samples per class is not uniform.

Drawings

FIG. 1 is a background-art diagram in which the region inside the rectangular LOGO frame is valid feature information and the rest is noise information;

FIG. 2 is a schematic flow chart of a LOGO recognition method based on attention mechanism image retrieval according to the present invention;

FIG. 3 is a schematic diagram of the image data model built for the image data with a fully convolutional network (FCN) structure, in which multi-scale shallow image features and high-level semantic features are fused through residual skip connections;

FIG. 4 is a schematic diagram of a predicted bounding box of LOGO obtained by the present invention.

Detailed Description

As shown in FIG. 2, the LOGO recognition method based on attention mechanism image retrieval according to the present invention includes the following steps:

step S1: acquiring a region containing LOGO in an image;

Whether a LOGO is present is determined by acquiring the region of the image that contains it. That region is then cropped from the image, and a feature extraction network produces the feature tensor of the LOGO region. The feature tensor is compressed into a feature vector, a feature-space distance is computed between this vector and the feature vectors of the LOGOs in the image library, and the image at the shortest distance is selected as the matched LOGO. The label corresponding to the feature code of the matched LOGO is read and used as the label of the LOGO to be identified. Under this logic, images without a LOGO are filtered out, invalid computation is removed and operating efficiency is improved. Because similar pictures are found in the image library from the features of a specific region of the picture, the high retrieval error rate caused by a large background proportion in image retrieval is addressed. The efficiency problem in large-scale LOGO image recognition is addressed as well, so that results can be returned in real time while accuracy is maintained. Each LOGO class can also be identified accurately when a single LOGO class has few training samples and when the number of samples per class is not uniform.

As shown in FIG. 3, the step S1 includes the following steps:

step S1-1: acquiring image data of a region containing LOGO;

step S1-2: establishing an image data model for the image data through a fully convolutional network (FCN) structure;

step S1-3: fusing, in the image data model, the multi-scale shallow image features and the high-level semantic features through residual skip connections;

step S1-4: applying non-maximum suppression to the output of the image data model that fuses the multi-scale shallow image features and the high-level semantic features to obtain the predicted bounding box of the LOGO, and thereby obtaining the image data of the region containing the LOGO.
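The non-maximum suppression named in step S1-4 can be illustrated with the common greedy, IoU-based variant. The patent does not specify which variant or threshold it uses, so the function name, the `(x1, y1, x2, y2)` box layout and the default threshold of 0.5 below are assumptions made for this sketch.

```python
import numpy as np


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it.

    boxes  : (N, 4) array-like of (x1, y1, x2, y2), assumed to have positive area
    scores : (N,) array-like of confidence scores
    Returns the indices of the kept boxes, highest score first.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    if boxes.size == 0:
        return []
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(scores)[::-1]          # candidate indices, best first
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Intersection of the best box with every remaining candidate.
        iw = np.clip(np.minimum(x2[best], x2[rest]) - np.maximum(x1[best], x1[rest]), 0, None)
        ih = np.clip(np.minimum(y2[best], y2[rest]) - np.maximum(y1[best], y1[rest]), 0, None)
        inter = iw * ih
        iou = inter / (areas[best] + areas[rest] - inter)
        order = rest[iou <= iou_threshold]    # discard heavy overlaps
    return keep
```

For example, with boxes `[[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]` and scores `[0.9, 0.8, 0.7]`, the second box overlaps the first with an IoU of about 0.68 and is suppressed, so indices 0 and 2 remain.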

The LOGO is detected with a neural network model that regresses bounding boxes. The model has few parameters and fast inference, so it can meet real-time response requirements. After its convolution operations are optimized, it can also be deployed on mobile terminals and still meet those requirements, which improves application flexibility.

The step S2 includes the following steps:

step S2-1: intercepting LOGO area image data according to the bounding box information of the LOGO area in the image;

step S2-2: executing, by the feature extraction network, a plurality of convolution or pooling operations on the cropped LOGO region image data, each producing an output;

step S2-3: performing feature fusion on all the outputs to generate a new feature map;

step S2-4: convolving the fused feature map to obtain the feature tensor of the LOGO region image.
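Steps S2-2 to S2-4 can be pictured with a toy NumPy example: several parallel branches process the same cropped region, their outputs are concatenated along the channel axis, and a final convolution mixes the fused channels. The branch choice (identity, 3 × 3 max pooling, 3 × 3 average pooling), the edge padding and the use of a 1 × 1 convolution as the final step are assumptions for illustration; the actual network layout is not given in the text.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def pool3x3(x, reduce):
    """3x3 pooling with stride 1 and edge padding on an (H, W, C) array."""
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)), mode="edge")
    windows = sliding_window_view(padded, (3, 3), axis=(0, 1))  # (H, W, C, 3, 3)
    return reduce(windows, axis=(-2, -1))


def fuse_branches(region, weights):
    """Run parallel branches, concatenate them, then apply a 1x1 convolution.

    region  : (H, W, C) cropped LOGO region or intermediate feature map
    weights : (3 * C, K) matrix acting as a 1x1 convolution kernel (hypothetical)
    """
    branches = [
        region,                      # identity branch
        pool3x3(region, np.max),     # max-pooling branch
        pool3x3(region, np.mean),    # average-pooling branch
    ]
    fused = np.concatenate(branches, axis=-1)   # S2-3: feature fusion, (H, W, 3C)
    return fused @ weights                      # S2-4: convolve the fused map
```

A 1 × 1 convolution is a per-pixel matrix product over channels, which is why the last line can be written with `@`.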

The step S3 includes: mapping the feature points of the feature tensor of the LOGO region from the high-dimensional space to a low-dimensional space, and obtaining the feature vector in the low-dimensional space by calculating the mapping relation.

This mapping from the high-dimensional space to the low-dimensional space comprises the following steps:

step S3-1: establishing a feature space mapping equation, with the covariance matrix of the output low-dimensional feature vector as the optimization object;

step S3-2: calculating the eigenvectors and eigenvalues that correspond to maximizing the covariance matrix;

step S3-3: sorting the eigenvalues of the covariance matrix by magnitude to obtain the weight relation of the corresponding eigenvectors;

step S3-4: performing L2 normalization on the eigenvector corresponding to each eigenvalue of the covariance matrix to obtain a scalar, and arranging all of the resulting scalars according to the weight relation of the eigenvectors to obtain a new low-dimensional feature vector, thereby completing the mapping from the high dimension to the low dimension.
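Steps S3-1 to S3-3 closely resemble principal component analysis by eigendecomposition of a covariance matrix, which the following sketch implements. It deliberately substitutes the standard PCA projection for the scalar construction described in step S3-4, whose exact form is not clear from the text. The function names, the row-per-sample layout and the mean-centering are assumptions.

```python
import numpy as np


def fit_projection(samples, n_components):
    """Estimate a PCA projection from feature points.

    samples : (m, d) array, one high-dimensional feature point per row
    Returns (mean, components, eigenvalues) with components of shape
    (d, n_components), ordered by decreasing eigenvalue.
    """
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    centered = samples - mean
    cov = centered.T @ centered / max(len(samples) - 1, 1)   # (d, d) covariance
    eigvals, eigvecs = np.linalg.eigh(cov)    # symmetric input, ascending order
    order = np.argsort(eigvals)[::-1][:n_components]         # largest first
    return mean, eigvecs[:, order], eigvals[order]


def project(x, mean, components):
    """Map one or more feature points into the low-dimensional space."""
    return (np.asarray(x, dtype=float) - mean) @ components
```

`np.linalg.eigh` is used because the covariance matrix is symmetric; it returns orthonormal eigenvectors, so no separate normalization of the projection directions is needed in this variant.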

The step S4 includes: when an unknown LOGO image is retrieved and identified, reading a feature database file, calculating the Euclidean distance between the feature vector of the LOGO to be identified and each LOGO feature vector in the database, judging LOGO similarity according to the Euclidean distance, and taking the image with the shortest distance as the matched LOGO.

The step S4 further includes: performing feature coding on the existing LOGO image library to generate a feature database file in which a large number of LOGO feature vectors correspond one-to-one with LOGO labels.
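The library lookup in step S4 amounts to a nearest-neighbour search under the Euclidean distance. The sketch below keeps the feature database in memory as a dictionary; the on-disk file format is not described in the text, and the names `build_feature_database` and `match_logo` are hypothetical.

```python
import numpy as np


def build_feature_database(vectors, labels):
    """Pair each library feature vector with its LOGO label (one-to-one)."""
    matrix = np.asarray(vectors, dtype=np.float32)
    if matrix.ndim != 2 or matrix.shape[0] != len(labels):
        raise ValueError("expected one label per feature vector")
    return {"vectors": matrix, "labels": list(labels)}


def match_logo(query, database):
    """Return (label, distance) of the library entry closest to the query."""
    diffs = database["vectors"] - np.asarray(query, dtype=np.float32)
    distances = np.sqrt((diffs ** 2).sum(axis=1))   # Euclidean distance per entry
    best = int(np.argmin(distances))
    return database["labels"][best], float(distances[best])
```

A brute-force scan like this is linear in the library size. For very large libraries an approximate nearest-neighbour index would be a natural substitute, although the source does not mention one.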

According to the LOGO bounding box information, the LOGO region can be cropped and input into an independently designed feature extraction network. The network executes several convolution or pooling operations on the input image in parallel and fuses all of the outputs into a new feature map. Different convolution and pooling operations, such as 1 × 1, 3 × 3 or 5 × 5, capture different information from the input image, so processing them in parallel and combining all of the results yields a better image representation. Because a deep neural network consumes substantial computing resources, an additional 1 × 1 convolutional layer is added before the 3 × 3 and 5 × 5 convolutional layers to limit the number of input channels and reduce the computational cost; a 1 × 1 convolution is much cheaper than a 5 × 5 convolution. The feature tensor output by the convolutional layer of the feature extraction network is taken as the feature tensor of the LOGO region image, and its size is [16, 512].
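The saving from placing a 1 × 1 convolution in front of a larger kernel can be checked with simple parameter arithmetic. The channel counts below (256 in, 32 after the bottleneck, 64 out) are illustrative assumptions, since the patent does not state the layer widths.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a kernel x kernel convolution layer."""
    return kernel * kernel * c_in * c_out + c_out


def bottleneck_params(kernel, c_in, c_mid, c_out):
    """A 1x1 convolution reducing c_in to c_mid, followed by the large kernel."""
    return conv_params(1, c_in, c_mid) + conv_params(kernel, c_mid, c_out)


# Hypothetical widths: 256 input channels, 64 output channels.
direct = conv_params(5, 256, 64)              # 5x5 applied to all 256 channels
reduced = bottleneck_params(5, 256, 32, 64)   # 1x1 down to 32 channels first

print(direct, reduced, round(direct / reduced, 1))   # 409664 59488 6.9
```

Under these assumed widths the bottleneck version needs roughly one seventh of the parameters, and the multiply-accumulate count per output pixel shrinks in the same proportion.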

For ease of computation and storage, the [16, 512] feature tensor is subjected to dimensionality reduction: it is projected from a high-dimensional space to a low-dimensional space, and the mathematical expression of the projection of a feature point x_i of the original feature tensor X in the low-dimensional space is assumed as follows:

in accordance with the maximum separability principle, it is desirable that the feature points projected to the low-dimensional space are as dispersed as possible, and the optimization problem for the projection mapping relation g (xi) can be expressed mathematically as:

the optimization problem can be written as:

as can be seen from the above equation, the row vectors w_i of the projection matrix are eigenvectors of the covariance matrix XX^T, and the eigenvalues λ_i determine the weights of the eigenvectors. Therefore, the covariance matrix XX^T can be eigendecomposed, the calculated eigenvalues λ_i sorted in descending order, and the eigenvectors corresponding to the first n eigenvalues taken to form a new projection matrix:

In this embodiment, n is taken as 128, and the method described above compresses the [16, 512] feature tensor to [1, 128] while preserving the data distribution characteristics of the original feature tensor as far as possible.
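The display equations that the passage above refers to are absent from this text. If the derivation follows the standard maximum-separability formulation of principal component analysis, which the wording (projection of x_i, covariance matrix XX^T, eigenvalues λ_i, first n eigenvectors) suggests, the four missing expressions would plausibly read as follows. The symbol W for the projection matrix, the centering of X and the orthonormality constraint are assumptions, not taken from the source.

```latex
% Projection of a feature point x_i into the low-dimensional space
g(x_i) = W^{\top} x_i

% Maximum separability: maximise the variance of the projected points
\max_{W} \; \operatorname{tr}\!\left( W^{\top} X X^{\top} W \right)
\quad \text{s.t.} \quad W^{\top} W = I

% Stationarity condition obtained with Lagrange multipliers \lambda_i
X X^{\top} w_i = \lambda_i \, w_i

% Projection matrix built from the eigenvectors of the n largest eigenvalues
W^{*} = \left( w_1, w_2, \ldots, w_n \right), \qquad
\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n
```

Under this reading, choosing n = 128 keeps the 128 directions of greatest variance.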

Feature coding is performed on the existing LOGO image library to generate a feature database file in which a large number of LOGO feature vectors correspond one-to-one with LOGO labels. When an unknown LOGO image is retrieved and identified, the feature database file is read, the Euclidean distance between the feature vector of the LOGO to be identified and each LOGO feature vector in the library is calculated, LOGO similarity is judged by that distance, and the image at the shortest distance is taken as the matched LOGO. The label corresponding to the feature code of the matched LOGO is then read and used as the label of the LOGO to be identified, which completes LOGO identification.

The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims

1. A LOGO recognition method based on attention mechanism image retrieval is characterized by comprising the following steps:

step S1: acquiring a region containing LOGO in an image;

2. The LOGO recognition method based on attention mechanism image retrieval as claimed in claim 1, wherein said step S1 comprises the following steps:

step S1-1: acquiring image data of a region containing LOGO;

3. The LOGO recognition method based on attention mechanism image retrieval as claimed in claim 1, wherein said step S2 comprises the following steps:

4. The LOGO recognition method based on attention mechanism image retrieval as claimed in claim 1, wherein said step S3 further comprises: mapping the feature points of the feature tensor of the LOGO region from the high-dimensional space to a low-dimensional space, and obtaining the feature vector in the low-dimensional space by calculating a mapping relation.

5. The LOGO recognition method based on attention mechanism image retrieval as claimed in claim 4, wherein said mapping of the feature points of the feature tensor of the LOGO region from the high-dimensional space to the low-dimensional space, and said obtaining of the feature vector in the low-dimensional space by calculating the mapping relation, comprise the following steps:

6. The LOGO recognition method based on attention mechanism image retrieval as claimed in claim 1, wherein said step S4 comprises: when unknown LOGO images are retrieved and identified, a feature database file is read, the Euclidean distance between the feature vector of the LOGO to be identified and the LOGO feature vector in the database is calculated, the LOGO similarity is judged according to the Euclidean distance, and the image with the shortest distance is taken as the matched LOGO.

7. The LOGO recognition method based on attention mechanism image retrieval as claimed in claim 6, wherein said step S4 further comprises: performing feature coding on the existing LOGO image library to generate a feature database file in which a large number of LOGO feature vectors correspond one-to-one with LOGO labels.