CN114266887A - Large-scale trademark detection method based on deep learning - Google Patents

Large-scale trademark detection method based on deep learning

Info

Publication number
CN114266887A
Authority
CN
China
Prior art keywords
trademark
picture
label
model
scale
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111610685.8A
Other languages
Chinese (zh)
Other versions
CN114266887B (en)
Inventor
陈凯彦
张拓
金润辉
徐瑞吉
毛科技
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202111610685.8A
Publication of CN114266887A
Application granted
Publication of CN114266887B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Image Analysis (AREA)

Abstract

A large-scale trademark detection method based on deep learning comprises the following steps: step 1), preprocessing trademark picture data; step 2), training a trademark detection model; step 3), identifying the label corresponding to the trademark in an input picture. The method addresses the problems of scarce training data, multi-scale object inconsistency, and bounding-box regression inconsistency. Experimental results show that the invention achieves higher performance than other deep detection models.

Description

Large-scale trademark detection method based on deep learning
Technical Field
The invention belongs to the technical field of machine vision and discloses a novel large-scale trademark detection method based on deep learning.
Background
In the multimedia field, logo research is extensive. As an important branch of logo research, logo detection plays a major role in many applications: it can be used for video advertising research, brand visibility monitoring and analysis, brand infringement detection, autonomous driving, and intelligent transportation, among others.
However, detecting a logo in an image is a challenging task. There are many brands in the real world, and the appearance of the same brand's logo varies widely. At the same time, compared with images of general objects, the background of a logo image is highly complex and may be disturbed by factors such as illumination, occlusion, and blur. Unknown fonts, colors, and sizes that differ across platforms, together with inter-class similarity and intra-class variation, make the problem harder still. Finally, logos are usually small targets compared with general detection objects, which poses a great challenge to logo detection algorithms.
In the past, most logo detection algorithms were based on SIFT. This method detects stable, salient points across multiple scales in an image, commonly referred to as keypoints, and the logo is then modeled by these keypoints. Although many effective logo detection methods existed before, deep learning methods are now the mainstream. Many deep detection models, such as Faster R-CNN, SSD, CornerNet, YOLOv3, and YOLOv4, have been widely applied in the logo detection field, and deep-learning-based models achieve satisfactory results in logo detection. However, the accuracy and speed of these models are still not adequate for practical applications.
Disclosure of Invention
The invention provides a large-scale trademark detection method based on deep learning, which aims to overcome the defects in the prior art.
To address the problems of scarce training data, multi-scale object inconsistency, and bounding-box regression inconsistency, the invention integrates an attention mechanism, strip pooling, and weighted boxes fusion into the state-of-the-art YOLOv4 framework and proposes a new deep-learning-based trademark detection method.
According to the invention, a scSE attention module is added at the key feature-fusion points of the YOLOv4 backbone network, and, targeting the long, narrow patterns common in logo images, strip pooling replaces the max pooling in spatial pyramid pooling, enlarging the model's view over narrow, elongated regions. In the prediction-box selection phase, weighted boxes fusion replaces the traditional non-maximum suppression method.
The invention relates to a trademark detection method based on deep learning, which comprises the following steps:
step 1), preprocessing trademark picture data, and specifically comprising:
(1.1) sorting the collected trademark pictures and classifying them by trademark type;
(1.2) annotating the classified trademark pictures with a commercial picture-labeling tool;
(1.3) reviewing the annotated trademark pictures, removing blurred data, and manually correcting any erroneous annotations;
step 2), training a trademark detection model, and specifically comprising:
(2.1) extracting the large-scale trademark data set obtained in step 1), extracting each trademark's corresponding name as its label, pairing every picture with its label, calculating 9 cluster centers of the annotation boxes in the data set with the AFK-MC² clustering algorithm, recording the values, and completing the construction of the training, validation, and test sets;
(2.2) building a Trinity-Yolo target detection model: Trinity-Yolo takes the YOLOv4 detector as its base model; 3 scSE (concurrent spatial and channel squeeze-and-excitation) attention modules are embedded in the CSPDarknet53 backbone; strip pooling replaces the max pooling in the spatial pyramid pooling module; and Weighted Boxes Fusion is applied to the output boxes in the YOLO Head, completing the construction of the model;
and (2.3) constructing a weighted fusion formula. The weighted fusion formula is shown as (1-1) (1-2).
X = \frac{\sum_{i=1}^{T} C_i \cdot X_i}{\sum_{i=1}^{T} C_i}    (1-1)

Y = \frac{\sum_{i=1}^{T} C_i \cdot Y_i}{\sum_{i=1}^{T} C_i}    (1-2)

where T is the number of overlapping prediction boxes in a cluster, C_i is the confidence of the i-th box, and X_i, Y_i denote its coordinates (each of x1, x2 and y1, y2 is fused in the same way).
(2.4) constructing the CIoU loss function. Its standard form is given in (1-3):
L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v    (1-3)

where b and b^{gt} are the predicted and ground-truth boxes, \rho(b, b^{gt}) is the Euclidean distance between their centers, c is the diagonal length of the smallest box enclosing both, v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2 measures aspect-ratio consistency, and \alpha = \frac{v}{(1 - IoU) + v}.
(2.5) feeding the training-set trademark pictures and labels as input signals into the constructed Trinity-Yolo recognition model; transfer learning is performed with model weights pre-trained on ImageNet; the input picture is converted into a 3-channel two-dimensional tensor and feature vectors are extracted through multiple convolution operations; PANet and SPP-Net are selected to strengthen the feature vectors produced by the backbone feature-extraction network; the resulting feature vectors are fed into the YOLO Head to obtain the output signals, namely the trademark label and confidence corresponding to the trademark picture;
step 3), identifying the label corresponding to the input picture trademark, and specifically comprising the following steps:
(3.1) selecting a trademark picture to be identified, and adjusting the size of the selected trademark picture;
and (3.2) loading the trademark detection model stored in the step (2), inputting the trademark picture obtained in the step (3.1), obtaining a label corresponding to the trademark picture, namely the type of the trademark picture, and obtaining a detection result.
Preferably, the input picture size of the Trinity-Yolo model in step (2.2) is 416 × 416, the weight decay is set to 0.0005, the initial learning rate is set to 0.0013, Mosaic data augmentation is used, and the output is the label corresponding to the input picture and its confidence.
Preferably, in step (3.1) the trademark picture to be identified is resized to 416 × 416.
The method addresses the problems of scarce training data, multi-scale object inconsistency, and bounding-box regression inconsistency. Experimental results show that the invention achieves higher performance than other deep detection models.
The invention has the following advantages: the feature-extraction capability of the backbone network is strong; strip pooling enlarges the model's view over narrow, elongated regions; weighted boxes fusion yields well-corrected final prediction boxes. The method is easy to operate, fast to train, highly accurate, and generalizes well.
Drawings
Fig. 1 is a general block diagram of the present invention.
Fig. 2 is a YOLOv4 backbone feature extraction network with scSE modules added.
FIG. 3 is a diagram of the Strip Pooling Module of the present invention.
FIG. 4 is a diagram of an improved spatial pyramid pooling model according to the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The invention relates to a large-scale trademark detection method based on deep learning, which comprises the following steps:
step 1), preprocessing trademark picture data, and specifically comprising:
(1.1) sorting the collected trademark pictures and classifying them by trademark type;
(1.2) annotating the classified trademark pictures with a commercial picture-labeling tool;
(1.3) reviewing the annotated trademark pictures, removing blurred data, and manually correcting any erroneous annotations;
step 2), training a trademark detection model, and specifically comprising:
(2.1) extracting the large-scale trademark data set obtained in step 1), extracting each trademark's corresponding name as its label, pairing every picture with its label, calculating 9 cluster centers of the annotation boxes in the data set with the AFK-MC² clustering algorithm, recording the values, and completing the construction of the training, validation, and test sets;
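As a rough illustration of this anchor-clustering step, the sketch below groups annotation-box sizes into 9 cluster centers. It is a simplification under stated assumptions: the patent names the AFK-MC² algorithm, while this sketch substitutes random seeding with Lloyd-style updates under the 1 - IoU distance commonly used for YOLO anchors.

```python
import numpy as np

def iou_wh(boxes: np.ndarray, anchors: np.ndarray) -> np.ndarray:
    """IoU between (w, h) pairs, assuming boxes share a common corner."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = boxes[:, 0:1] * boxes[:, 1:2] + (anchors[:, 0] * anchors[:, 1])[None, :] - inter
    return inter / union

def cluster_anchors(boxes: np.ndarray, k: int = 9, iters: int = 100, seed: int = 0) -> np.ndarray:
    """Cluster annotation-box (w, h) pairs into k anchor centers.

    Sketch only: the patent seeds with AFK-MC^2; here we use random
    seeding plus Lloyd-style updates under the 1 - IoU distance.
    """
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)      # nearest = highest IoU
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else anchors[i] for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors.prod(axis=1))]            # sorted by area
```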
and (2.2) building a Trinity-Yolo target detection model. The marker image contains fewer objects and there will be less data available for training. In the case that the training data volume cannot be changed, the characteristic capability of the network needs to be improved as much as possible, and the invention utilizes an attention mechanism to strengthen the network. As shown in fig. 2, Trinity-Yolo takes Yolov4 detector as a basic model, and 3 scSE attention mechanism modules are embedded in a backbone network CSPDarknet 53; spatial pooling can effectively capture remote context information for target detection class-pixel level prediction tasks. In addition to regular spatial pooling, which typically has a regular shape of NxN, the present invention introduces a new pooling strategy called Stripe pooling (Stripe pooling) to reconsider spatial pooling. Strip posing deploys an elongated pooled core shape and a spatial dimension that can capture the long distance relationship of isolated regions. In addition, strip posing maintains a narrow kernel shape in other spatial dimensions, so that local feature information is conveniently captured, and irrelevant areas are prevented from interfering with label prediction.
The Strip Pooling Module is shown in FIG. 3. Strip pooling replaces the max pooling in the spatial pyramid pooling module, and Weighted Boxes Fusion is applied to the output boxes in the YOLO Head, completing the construction of the model. The input picture size of the Trinity-Yolo model is 416 × 416, the weight decay is set to 0.0005, the initial learning rate is set to 0.0013, Mosaic data augmentation is used, and the output is the label corresponding to the input picture and its confidence. In logo recognition work, many patterns are composed of large numbers of characters, and these characters are typically arranged in stripes; strip pooling therefore yields a global characterization of such stripe patterns. The invention accordingly modifies the spatial pyramid pooling of the YOLOv4 model; the improved spatial pyramid pooling module is shown in fig. 4. Spatial pyramid pooling enlarges the receptive field of the model, but its max pooling uses a square pooling window, which struggles to capture the overall features of long, narrow patterns. Replacing the max pooling with strip pooling strengthens the model's ability to extract features of stripe-shaped target patterns.
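Below is a rough PyTorch sketch of a strip pooling module in the spirit of FIG. 3, following the published Strip Pooling design (Hou et al., CVPR 2020); the layer widths and the gating arrangement are assumptions, since Trinity-Yolo's exact configuration is not disclosed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StripPooling(nn.Module):
    """Strip Pooling Module sketch (after Hou et al., CVPR 2020).

    Pools along each spatial axis with a 1-wide strip, refines each strip
    with a 1-D convolution, and gates the input with the fused context.
    """
    def __init__(self, channels: int):
        super().__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((1, None))   # 1 x W strip
        self.pool_v = nn.AdaptiveAvgPool2d((None, 1))   # H x 1 strip
        self.conv_h = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1), bias=False)
        self.conv_v = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0), bias=False)
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[2:]
        # Pool each direction into a strip, refine, broadcast back to H x W.
        xh = F.interpolate(self.conv_h(self.pool_h(x)), size=(h, w),
                           mode="bilinear", align_corners=False)
        xv = F.interpolate(self.conv_v(self.pool_v(x)), size=(h, w),
                           mode="bilinear", align_corners=False)
        # Gate the input with the fused strip context.
        return x * torch.sigmoid(self.fuse(F.relu(xh + xv)))
```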
(2.3) constructing the weighted fusion formulas. Following the standard Weighted Boxes Fusion definition, the fused coordinates in (1-1) and (1-2) are confidence-weighted means:
X = \frac{\sum_{i=1}^{T} C_i \cdot X_i}{\sum_{i=1}^{T} C_i}    (1-1)

Y = \frac{\sum_{i=1}^{T} C_i \cdot Y_i}{\sum_{i=1}^{T} C_i}    (1-2)

where T is the number of overlapping prediction boxes in a cluster, C_i is the confidence of the i-th box, and X_i, Y_i denote its coordinates (each of x1, x2 and y1, y2 is fused in the same way).
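As a worked illustration of (1-1) and (1-2), the sketch below fuses a single cluster of overlapping predictions; grouping boxes into clusters by IoU, which full Weighted Boxes Fusion also performs, is omitted for brevity.

```python
import numpy as np

def fuse_cluster(boxes: np.ndarray, scores: np.ndarray):
    """Fuse one cluster of overlapping boxes per formulas (1-1)/(1-2).

    boxes: (T, 4) array of (x1, y1, x2, y2); scores: (T,) confidences.
    Each fused coordinate is the confidence-weighted mean; the fused
    confidence is the cluster's average score.
    """
    w = scores / scores.sum()
    fused_box = (boxes * w[:, None]).sum(axis=0)
    fused_score = float(scores.mean())
    return fused_box, fused_score

# Example: three detections of the same logo collapse into one box.
boxes = np.array([[10, 10, 50, 50], [12, 11, 52, 49], [9, 12, 48, 51]], dtype=float)
scores = np.array([0.9, 0.75, 0.6])
print(fuse_cluster(boxes, scores))
```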
(2.4) constructing the CIoU loss function. Its standard form is given in (1-3):
L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v    (1-3)

where b and b^{gt} are the predicted and ground-truth boxes, \rho(b, b^{gt}) is the Euclidean distance between their centers, c is the diagonal length of the smallest box enclosing both, v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2 measures aspect-ratio consistency, and \alpha = \frac{v}{(1 - IoU) + v}.
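A PyTorch sketch of the CIoU loss in (1-3) follows; it is a standard implementation of the published formula (Zheng et al., AAAI 2020), not code taken from the patent.

```python
import math
import torch

def ciou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """CIoU loss over (x1, y1, x2, y2) boxes, per formula (1-3)."""
    # IoU term.
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Center-distance term: rho^2 / c^2 over the enclosing-box diagonal.
    rho2 = (((pred[:, :2] + pred[:, 2:]) - (target[:, :2] + target[:, 2:])) ** 2).sum(dim=1) / 4
    enc_lt = torch.min(pred[:, :2], target[:, :2])
    enc_rb = torch.max(pred[:, 2:], target[:, 2:])
    c2 = ((enc_rb - enc_lt) ** 2).sum(dim=1) + eps

    # Aspect-ratio consistency term v and its weight alpha.
    wp = pred[:, 2] - pred[:, 0]
    hp = (pred[:, 3] - pred[:, 1]).clamp(min=eps)
    wt = target[:, 2] - target[:, 0]
    ht = (target[:, 3] - target[:, 1]).clamp(min=eps)
    v = (4 / math.pi ** 2) * (torch.atan(wt / ht) - torch.atan(wp / hp)) ** 2
    alpha = (v / (1 - iou + v + eps)).detach()   # treated as a constant weight

    return (1 - iou + rho2 / c2 + alpha * v).mean()
```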
(2.5) feeding the training-set trademark pictures and labels as input signals into the constructed Trinity-Yolo recognition model; transfer learning is performed with model weights pre-trained on ImageNet; the input picture is converted into a 3-channel two-dimensional tensor and feature vectors are extracted through multiple convolution operations; PANet and SPP-Net are selected to strengthen the feature vectors produced by the backbone feature-extraction network; the resulting feature vectors are fed into the YOLO Head to obtain the output signals, namely the trademark label and confidence corresponding to the trademark picture;
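A skeleton of this training procedure might look like the sketch below. TrinityYolo, TrademarkDataset, and yolo_loss are hypothetical names standing in for components the patent describes but does not publish; the optimizer choice, batch size, class count, and epoch count are likewise assumptions, while the learning rate, weight decay, and Mosaic flag follow the values stated above.

```python
import torch
from torch.utils.data import DataLoader

# Hypothetical components (not from the patent): TrinityYolo,
# TrademarkDataset, and yolo_loss stand in for the model, data set,
# and composite loss described in the text.
NUM_BRANDS = 352   # assumed number of trademark classes
EPOCHS = 100       # assumed training length

model = TrinityYolo(num_classes=NUM_BRANDS)
# Transfer learning: load ImageNet-pretrained backbone weights.
model.backbone.load_state_dict(torch.load("cspdarknet53_imagenet.pth"), strict=False)

optimizer = torch.optim.SGD(model.parameters(), lr=0.0013,    # initial learning rate
                            momentum=0.9, weight_decay=0.0005)
loader = DataLoader(TrademarkDataset("train", mosaic=True),   # Mosaic augmentation
                    batch_size=16, shuffle=True)

for epoch in range(EPOCHS):
    for images, targets in loader:          # images: (B, 3, 416, 416)
        preds = model(images)               # three YOLO head outputs
        loss = yolo_loss(preds, targets)    # CIoU box term + objectness + class terms
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```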
step 3), identifying the label corresponding to the input picture trademark, and specifically comprising the following steps:
(3.1) selecting a trademark picture to be identified and resizing it to 416 × 416;
and (3.2) loading the trademark detection model stored in the step 2, inputting the trademark picture obtained in the step (3.1), obtaining a label corresponding to the trademark picture, namely the type of the trademark picture, and obtaining a detection result.
The embodiments described in this specification merely illustrate implementations of the inventive concept. The scope of the invention should not be considered limited to the specific forms set forth in the embodiments; it also covers equivalents that those skilled in the art may conceive based on the inventive concept.

Claims (3)

1. A large-scale trademark detection method based on deep learning comprises the following steps:
step 1), preprocessing trademark picture data, and specifically comprising:
(1.1) sorting the collected trademark pictures and classifying them by trademark type;
(1.2) annotating the classified trademark pictures with a commercial picture-labeling tool;
(1.3) reviewing the annotated trademark pictures, removing blurred data, and manually correcting any erroneous annotations;
step 2), training a trademark detection model, and specifically comprising:
(2.1) extracting the large-scale trademark data set obtained in step 1), extracting each trademark's corresponding name as its label, pairing every picture with its label, calculating 9 cluster centers of the annotation boxes in the data set with the AFK-MC² clustering algorithm, recording the values, and completing the construction of the training, validation, and test sets;
(2.2) building a Trinity-Yolo target detection model: Trinity-Yolo takes the YOLOv4 detector as its base model; 3 scSE (concurrent spatial and channel squeeze-and-excitation) attention modules are embedded in the CSPDarknet53 backbone; strip pooling replaces the max pooling in the spatial pyramid pooling module; and Weighted Boxes Fusion is applied to the output boxes in the YOLO Head, completing the construction of the model;
and (2.3) constructing a weighted fusion formula. The weighted fusion formula is shown as (1-1) (1-2).
X = \frac{\sum_{i=1}^{T} C_i \cdot X_i}{\sum_{i=1}^{T} C_i}    (1-1)

Y = \frac{\sum_{i=1}^{T} C_i \cdot Y_i}{\sum_{i=1}^{T} C_i}    (1-2)

where T is the number of overlapping prediction boxes in a cluster, C_i is the confidence of the i-th box, and X_i, Y_i denote its coordinates (each of x1, x2 and y1, y2 is fused in the same way);
(2.4) constructing the CIoU loss function. Its standard form is given in (1-3):
L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v    (1-3)

where b and b^{gt} are the predicted and ground-truth boxes, \rho(b, b^{gt}) is the Euclidean distance between their centers, c is the diagonal length of the smallest box enclosing both, v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2 measures aspect-ratio consistency, and \alpha = \frac{v}{(1 - IoU) + v};
(2.5) feeding the training-set trademark pictures and labels as input signals into the constructed Trinity-Yolo recognition model; transfer learning is performed with model weights pre-trained on ImageNet; the input picture is converted into a 3-channel two-dimensional tensor and feature vectors are extracted through multiple convolution operations; PANet and SPP-Net are selected to strengthen the feature vectors produced by the backbone feature-extraction network; the resulting feature vectors are fed into the YOLO Head to obtain the output signals, namely the trademark label and confidence corresponding to the trademark picture;
step 3), identifying the label corresponding to the input picture trademark, and specifically comprising the following steps:
(3.1) selecting a trademark picture to be identified, and adjusting the size of the selected trademark picture;
and (3.2) loading the trademark detection model stored in the step (2), inputting the trademark picture obtained in the step (3.1), obtaining a label corresponding to the trademark picture, namely the type of the trademark picture, and obtaining a detection result.
2. The large-scale trademark detection method based on deep learning as claimed in claim 1, wherein: the input picture size of the Trinity-Yolo model in step (2.2) is 416 × 416, the weight decay is set to 0.0005, the initial learning rate is set to 0.0013, Mosaic data augmentation is used, and the output is the label corresponding to the input picture and its confidence.
3. The large-scale trademark detection method based on deep learning as claimed in claim 1, wherein: step (3.1) resizes the trademark picture to be identified to 416 × 416.
CN202111610685.8A 2021-12-27 2021-12-27 Large-scale trademark detection method based on deep learning Active CN114266887B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111610685.8A CN114266887B (en) 2021-12-27 2021-12-27 Large-scale trademark detection method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111610685.8A CN114266887B (en) 2021-12-27 2021-12-27 Large-scale trademark detection method based on deep learning

Publications (2)

Publication Number Publication Date
CN114266887A (en) 2022-04-01
CN114266887B CN114266887B (en) 2023-07-14

Family

ID=80830171

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111610685.8A Active CN114266887B (en) 2021-12-27 2021-12-27 Large-scale trademark detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN114266887B (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108520273A (en) * 2018-03-26 2018-09-11 天津大学 A kind of quick detection recognition method of dense small item based on target detection
CN113344847A (en) * 2021-04-21 2021-09-03 安徽工业大学 Long tail clamp defect detection method and system based on deep learning
CN113591850A (en) * 2021-08-05 2021-11-02 广西师范大学 Two-stage trademark detection method based on computer vision robustness target detection

Also Published As

Publication number Publication date
CN114266887B (en) 2023-07-14

Similar Documents

Publication Publication Date Title
US10607362B2 (en) Remote determination of containers in geographical region
CN109409263B (en) Method for detecting urban ground feature change of remote sensing image based on Siamese convolutional network
Zhao et al. Cloud shape classification system based on multi-channel cnn and improved fdm
CN113160192B (en) Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background
CN105303198B (en) A kind of remote sensing image semisupervised classification method learnt from fixed step size
CN109711366B (en) Pedestrian re-identification method based on group information loss function
CN108960245A (en) The detection of tire-mold character and recognition methods, device, equipment and storage medium
CN108918536A (en) Tire-mold face character defect inspection method, device, equipment and storage medium
CN103400151A (en) Optical remote-sensing image, GIS automatic registration and water body extraction integrated method
CN102842044B (en) Method for detecting variation of remote-sensing image of high-resolution visible light
CN106845341A (en) A kind of unlicensed vehicle identification method based on virtual number plate
CN104598883A (en) Method for re-recognizing target in multi-camera monitoring network
CN106780727B (en) Vehicle head detection model reconstruction method and device
CN108305260A (en) Detection method, device and the equipment of angle point in a kind of image
CN107808157A (en) A kind of method and device of detonator coding positioning and identification
CN105574545B (en) The semantic cutting method of street environment image various visual angles and device
CN105718552A (en) Clothing freehand sketch based clothing image retrieval method
EP3553700A2 (en) Remote determination of containers in geographical region
CN112329559A (en) Method for detecting homestead target based on deep convolutional neural network
CN113033385A (en) Deep learning-based violation building remote sensing identification method and system
CN106529472A (en) Target detection method and apparatus based on large-scale high-resolution and high-spectral image
Saha et al. Unsupervised multiple-change detection in VHR optical images using deep features
CN105160285A (en) Method and system for recognizing human body tumble automatically based on stereoscopic vision
Zhong et al. Background subtraction driven seeds selection for moving objects segmentation and matting
CN111339974B (en) Method for identifying modern ceramics and ancient ceramics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant