CN114266887B - Large-scale trademark detection method based on deep learning - Google Patents


Info

Publication number
CN114266887B
CN114266887B CN202111610685.8A
Authority
CN
China
Prior art keywords
trademark
picture
model
label
yolo
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111610685.8A
Other languages
Chinese (zh)
Other versions
CN114266887A (en)
Inventor
陈凯彦
张拓
金润辉
徐瑞吉
毛科技
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202111610685.8A
Publication of CN114266887A
Application granted
Publication of CN114266887B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Image Analysis (AREA)

Abstract

A large-scale trademark detection method based on deep learning, comprising the following steps: step 1), preprocessing trademark picture data; step 2), training a trademark detection model; step 3), identifying the label corresponding to the trademark in an input picture. The invention addresses the problems of scarce training data and inconsistent regression of multi-scale objects and bounding boxes. Experimental results show that, compared with other deep detection models, the method achieves higher performance.

Description

Large-scale trademark detection method based on deep learning
Technical Field
The invention belongs to the technical field of machine vision and discloses a novel method for large-scale trademark detection based on deep learning.
Background
In the multimedia field, logos are very widely studied. As an important branch of logo research, logo detection plays a major role in a variety of applications. Logo detection can be used for video advertising research, brand awareness monitoring and analysis, brand infringement detection, autonomous driving, and intelligent transportation, to name a few.
However, detecting a logo in an image is a challenging task. There are many brands in the real world, and the appearance of even a single brand's logo is diverse. Moreover, compared with general object images, the background of a logo image is highly complex and subject to interference from illumination, occlusion, blur, and other factors. Unknown fonts, colors, and sizes that may differ across platforms, together with inter-class similarity and intra-class variation, make the problem harder still. Finally, logos are typically small targets compared with general detection objects, which poses a significant challenge to logo detection algorithms.
In the past, most logo detection algorithms were based on SIFT. This method detects stable and salient points in an image across multiple scales, commonly referred to as keypoints, and the logo in the image is then modeled by these keypoints. Although many effective logo detection methods existed previously, deep learning methods have become the mainstream. Many deep detection models, such as Faster R-CNN, SSD, CornerNet, YOLOv3, and YOLOv4, have been widely applied to logo detection and achieve satisfactory results. However, the accuracy and speed of these models remain insufficient for practical applications.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a large-scale trademark detection method based on deep learning.
Aiming at the problems of scarce training data and inconsistent regression of multi-scale objects and bounding boxes, the invention combines an attention mechanism, strip pooling, and weighted boxes fusion into the state-of-the-art YOLOv4 framework, providing a novel trademark detection method based on deep learning.
In the invention, scSE attention modules are added at the key feature-fusion positions of the YOLOv4 backbone network, and, targeting the long and narrow patterns common in logo images, strip pooling replaces the max pooling in spatial pyramid pooling, enlarging the model's field of view over long, narrow regions. In the prediction-box selection stage, weighted boxes fusion is used instead of the conventional non-maximum suppression method.
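As an illustration of the attention idea described above, the following is a minimal plain-Python sketch of the channel ("cSE") half of an scSE block. The function name is illustrative, and the learned fully connected layers of the real module are omitted, so this shows only the gating idea, not the patent's trained implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_se(fmap):
    # fmap: C x H x W nested lists. Each channel is rescaled by a sigmoid
    # gate computed from its global average ("squeeze", then "excite").
    # The learned FC layers of the real scSE module are omitted here.
    out = []
    for ch in fmap:
        h, w = len(ch), len(ch[0])
        avg = sum(sum(row) for row in ch) / (h * w)  # global average pool
        gate = sigmoid(avg)                          # channel attention weight
        out.append([[v * gate for v in row] for row in ch])
    return out
```

A channel with a large average activation is gated toward 1 and passes nearly unchanged, while weak channels are attenuated.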
The trademark detection method based on deep learning provided by the invention comprises the following steps:
step 1), trademark picture data preprocessing, which specifically comprises the following steps:
(1.1) sorting the collected trademark pictures and classifying them by trademark type;
(1.2) labeling the classified trademark pictures with an off-the-shelf image annotation tool;
(1.3) reviewing the labeled trademark pictures, removing blurred data, and manually correcting any labeling errors;
step 2), training a trademark detection model, which specifically comprises the following steps:
(2.1) taking the large-scale trademark data set obtained in step 1, extracting the name corresponding to each trademark as its label and associating each picture with its label; computing 9 cluster centers of the annotation boxes in the data set with the AFK-MC² clustering algorithm and recording the values; and completing the preparation of the training, validation, and test sets;
(2.2) building the Trinity-Yolo target detection model: Trinity-Yolo takes the YOLOv4 detector as its base model, embeds 3 scSE attention modules in the backbone network CSPDarknet53, replaces the average pooling in the feature pyramid network with strip pooling, and applies Weighted Boxes Fusion to the output boxes of the YOLO Head, completing the construction of the model;
(2.3) constructing the weighted fusion formula. The weighted fusion formulas are shown as (1-1) and (1-2):
X_{1,2} = \frac{\sum_{i=1}^{T} C_i \cdot X_{1,2}^{i}}{\sum_{i=1}^{T} C_i}   (1-1)
Y_{1,2} = \frac{\sum_{i=1}^{T} C_i \cdot Y_{1,2}^{i}}{\sum_{i=1}^{T} C_i}   (1-2)
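The confidence-weighted center computation of formulas (1-1) and (1-2) can be sketched as follows; the helper name `fuse_centers` and the input format are illustrative assumptions, not names from the patent.

```python
def fuse_centers(boxes, weights):
    # boxes: centre coordinates (X_i, Y_i) of the T groups of output boxes;
    # weights: the corresponding confidence weights C_i.
    # The fused centre is the C_i-weighted mean of the candidate centres.
    total = sum(weights)
    x = sum(c * bx for c, (bx, _) in zip(weights, boxes)) / total
    y = sum(c * by for c, (_, by) in zip(weights, boxes)) / total
    return x, y
```

With equal weights this reduces to a plain average; a higher-confidence box pulls the fused center toward itself.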
(2.4) constructing the CIoU loss function. The loss function is shown as (1-3):
L_{CIoU} = 1 - \mathrm{IoU} + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v, \qquad v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2   (1-3)
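A plain-Python sketch of the CIoU loss in (1-3), using the standard CIoU definitions (alpha = v / ((1 - IoU) + v)); the helper names are illustrative.

```python
import math

def _iou(a, b):
    # a, b: boxes as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def ciou_loss(b, b_gt):
    iou = _iou(b, b_gt)
    # rho^2: squared distance between the two box centres
    cx, cy = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    gx, gy = (b_gt[0] + b_gt[2]) / 2, (b_gt[1] + b_gt[3]) / 2
    rho2 = (cx - gx) ** 2 + (cy - gy) ** 2
    # c^2: squared diagonal of the minimum bounding rectangle of both boxes
    ex1, ey1 = min(b[0], b_gt[0]), min(b[1], b_gt[1])
    ex2, ey2 = max(b[2], b_gt[2]), max(b[3], b_gt[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    # v: aspect-ratio consistency term; alpha: balance coefficient
    w, h = b[2] - b[0], b[3] - b[1]
    wg, hg = b_gt[2] - b_gt[0], b_gt[3] - b_gt[1]
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(w / h)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    return 1 - iou + rho2 / c2 + alpha * v
```

For identical boxes every term vanishes and the loss is 0; for disjoint boxes the center-distance term keeps a useful gradient even though IoU is 0.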
(2.5) taking the training-set trademark pictures and labels as input signals and feeding them into the constructed Trinity-Yolo recognition model; transfer learning is performed with model weights pre-trained on ImageNet; the input picture is converted into a 3-channel two-dimensional tensor, feature vectors are extracted through repeated convolution operations, the features produced by the backbone feature-extraction network are further enhanced with PANet and SPPNet, and the resulting feature vectors are finally fed into the YOLO Head to obtain the output signal, namely the trademark label corresponding to the picture and its confidence;
step 3), identifying a label corresponding to the input picture trademark, which specifically comprises the following steps:
(3.1) selecting a trademark picture to be identified, and adjusting the size of the selected trademark picture;
(3.2) loading the trademark detection model saved in step 2) and inputting the trademark picture obtained in step (3.1) to obtain the label corresponding to the picture, namely its trademark type, which is the detection result.
Preferably, in step (2.2), the input picture size of the Trinity-Yolo model is 412×412, the weight decay regularization value is set to 0.0005, the initial learning rate is set to 0.0013, the input pictures are augmented with Mosaic data augmentation, and the label corresponding to the input picture and its confidence are output.
Preferably, step (3.1) resizes the trademark picture to be identified to 416×416.
The invention addresses the problems of scarce training data and inconsistent regression of multi-scale objects and bounding boxes. Experimental results show that, compared with other deep detection models, the method achieves higher performance.
The invention has the following advantages: the backbone network has strong feature-extraction capability; strip pooling enlarges the model's field of view over long, narrow regions; and weighted boxes fusion effectively corrects the final output prediction boxes. The method is easy to operate, trains quickly, achieves high accuracy, and generalizes well.
Drawings
Fig. 1 is a general block diagram of the present invention.
FIG. 2 is a diagram of a Yolov4 backbone feature extraction network with scSE modules added in accordance with the present invention.
Fig. 3 is a model diagram of the Strip Pooling Module of the present invention.
FIG. 4 is a diagram of an improved spatial pyramid pooling model of the present invention.
Detailed Description
The invention is further described below with reference to the drawings.
The invention relates to a large-scale trademark detection method based on deep learning, which comprises the following steps:
step 1), trademark picture data preprocessing, which specifically comprises the following steps:
(1.1) sorting the collected trademark pictures and classifying them by trademark type;
(1.2) labeling the classified trademark pictures with an off-the-shelf image annotation tool;
(1.3) reviewing the labeled trademark pictures, removing blurred data, and manually correcting any labeling errors;
step 2), training a trademark detection model, which specifically comprises the following steps:
(2.1) taking the large-scale trademark data set obtained in step 1, extracting the name corresponding to each trademark as its label and associating each picture with its label; computing 9 cluster centers of the annotation boxes in the data set with the AFK-MC² clustering algorithm and recording the values; and completing the preparation of the training, validation, and test sets;
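The anchor-clustering step can be sketched with ordinary k-means as a simplified stand-in for AFK-MC²: clustering the (width, height) pairs of the annotation boxes to obtain anchor sizes. The real AFK-MC² algorithm differs in its adaptive Markov-chain seeding, and all names below are illustrative.

```python
def kmeans_anchors(whs, k, iters=20):
    # whs: (width, height) of every annotation box in the data set.
    # Naive first-k seeding; AFK-MC2 would replace this seeding step
    # with adaptive Markov-chain Monte Carlo sampling.
    centers = list(whs[:k])
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for w, h in whs:
            j = min(range(k),
                    key=lambda i: (w - centers[i][0]) ** 2 + (h - centers[i][1]) ** 2)
            groups[j].append((w, h))
        # recompute each centre as the mean of its assigned boxes
        centers = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return sorted(centers)
```

The patent uses k = 9 cluster centers, matching the 9 anchor boxes of the YOLOv4 detector (3 per detection scale).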
(2.2) building the Trinity-Yolo target detection model. Logo images contain few objects, so little data is available for training. Since the amount of training data cannot be changed, it is desirable to increase the feature capacity of the network as much as possible; the invention therefore strengthens the network with an attention mechanism. As shown in fig. 2, Trinity-Yolo takes the YOLOv4 detector as its base model and embeds 3 scSE attention modules in the backbone network CSPDarknet53. Spatial pooling can effectively capture long-range context information for pixel-level prediction tasks such as target detection. Beyond regular spatial pooling, which typically uses a square N×N window, the invention introduces a new pooling strategy, strip pooling, to reconsider spatial pooling. Strip pooling deploys a long, narrow pooling kernel along one spatial dimension to capture long-distance relationships between isolated regions, while keeping the kernel narrow along the other spatial dimension, which facilitates capturing local feature information and prevents irrelevant regions from interfering with label prediction.
A model of the Strip Pooling Module is shown in fig. 3. Strip pooling replaces the average pooling in the feature pyramid network, and Weighted Boxes Fusion is applied to the output boxes of the YOLO Head, completing the construction of the model. The input picture size of the Trinity-Yolo model is 412×412, the weight decay regularization value is set to 0.0005, the initial learning rate is set to 0.0013, the input pictures are augmented with Mosaic data augmentation, and the label corresponding to the input picture and its confidence are output. In logo recognition, many patterns consist of large numbers of characters, and these characters are typically arranged in strips; strip pooling captures the global features of such striped patterns. The invention accordingly modifies the spatial pyramid pooling in the YOLOv4 model; the improved spatial pyramid pooling model is shown in fig. 4. Spatial pyramid pooling expands the receptive field of the model, but the max pooling it uses has a square window, which makes it difficult to capture the overall features of long, narrow patterns. Replacing max pooling with strip pooling enhances the model's ability to extract the features of striped target patterns.
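The core averaging of the Strip Pooling Module can be sketched as follows. The learned 1-D convolutions and fusion details of the real module are omitted, so this illustrates only how row strips and column strips inject long-range context into every position; the function name is illustrative.

```python
def strip_pool(fmap):
    # fmap: one H x W feature map as nested lists. Each row is average-pooled
    # into a 1-wide horizontal strip and each column into a 1-tall vertical
    # strip; every position then receives the sum of its row-strip and
    # column-strip context, giving it a view along both full axes.
    H, W = len(fmap), len(fmap[0])
    row_mean = [sum(row) / W for row in fmap]
    col_mean = [sum(fmap[i][j] for i in range(H)) / H for j in range(W)]
    return [[row_mean[i] + col_mean[j] for j in range(W)] for i in range(H)]
```

Unlike an N×N window, each output position here aggregates information from its entire row and entire column, which is what makes the module effective on long, narrow logo patterns.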
(2.3) constructing the weighted fusion formula. The weighted fusion formulas are shown as (1-1) and (1-2):
X_{1,2} = \frac{\sum_{i=1}^{T} C_i \cdot X_{1,2}^{i}}{\sum_{i=1}^{T} C_i}   (1-1)
Y_{1,2} = \frac{\sum_{i=1}^{T} C_i \cdot Y_{1,2}^{i}}{\sum_{i=1}^{T} C_i}   (1-2)
(2.4) constructing the CIoU loss function. The loss function is shown as (1-3):
L_{CIoU} = 1 - \mathrm{IoU} + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v, \qquad v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2   (1-3)
(2.5) taking the training-set trademark pictures and labels as input signals and feeding them into the constructed Trinity-Yolo recognition model; transfer learning is performed with model weights pre-trained on ImageNet; the input picture is converted into a 3-channel two-dimensional tensor, feature vectors are extracted through repeated convolution operations, the features produced by the backbone feature-extraction network are further enhanced with PANet and SPPNet, and the resulting feature vectors are finally fed into the YOLO Head to obtain the output signal, namely the trademark label corresponding to the picture and its confidence;
step 3), identifying a label corresponding to the input picture trademark, which specifically comprises the following steps:
(3.1) selecting a trademark picture to be identified and resizing it to 416×416;
(3.2) loading the trademark detection model saved in step 2) and inputting the trademark picture obtained in step (3.1) to obtain the label corresponding to the picture, namely its trademark type, which is the detection result.
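The resizing in step (3.1) can be sketched as below. Note that the patent only specifies a 416×416 input; the aspect-preserving letterboxing shown here is a common YOLO convention assumed for illustration, and the function name is hypothetical.

```python
def letterbox_dims(w, h, target=416):
    # Compute the scaled size and padding that fit a w x h picture into a
    # target x target square while preserving aspect ratio: scale by the
    # tighter axis, then pad the slack axis symmetrically with borders.
    scale = min(target / w, target / h)
    nw, nh = round(w * scale), round(h * scale)
    pad_w, pad_h = (target - nw) // 2, (target - nh) // 2
    return nw, nh, pad_w, pad_h
```

A plain resize to 416×416, as the patent's wording literally states, would instead stretch non-square pictures; letterboxing avoids distorting the logo's aspect ratio.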
The embodiments described in this specification are merely examples of implementations of the inventive concept. The scope of protection of the present invention should not be construed as limited to the specific forms set forth in the embodiments; it also covers equivalent technical means that those skilled in the art can conceive based on the inventive concept.

Claims (3)

1. A large-scale trademark detection method based on deep learning, comprising:
step 1), trademark picture data preprocessing, which specifically comprises the following steps:
(1.1) sorting the collected trademark pictures and classifying them by trademark type;
(1.2) labeling the classified trademark pictures with an off-the-shelf image annotation tool;
(1.3) reviewing the labeled trademark pictures, removing blurred data, and manually correcting any labeling errors;
step 2), training a trademark detection model, which specifically comprises the following steps:
(2.1) taking the large-scale trademark data set obtained in step 1, extracting the name corresponding to each trademark as its label and associating each picture with its label; computing 9 cluster centers of the annotation boxes in the data set with the AFK-MC² clustering algorithm and recording the values; and completing the preparation of the training, validation, and test sets;
(2.2) building the Trinity-Yolo target detection model: Trinity-Yolo takes the YOLOv4 detector as its base model, embeds 3 scSE attention modules in the backbone network CSPDarknet53, replaces the average pooling in the feature pyramid network with strip pooling, and applies Weighted Boxes Fusion to the output boxes of the YOLO Head, completing the construction of the model;
(2.3) constructing a weighted fusion formula; the weighted fusion formulas are shown as (1-1) and (1-2):
X_{1,2} = \frac{\sum_{i=1}^{T} C_i \cdot X_{1,2}^{i}}{\sum_{i=1}^{T} C_i}   (1-1)
Y_{1,2} = \frac{\sum_{i=1}^{T} C_i \cdot Y_{1,2}^{i}}{\sum_{i=1}^{T} C_i}   (1-2)
wherein X_{1,2} and Y_{1,2} respectively denote the X and Y coordinates of the center of the fused output box; T denotes the number of output-box groups generated by the model, and i denotes one group among the T groups; C_i is the weight corresponding to the i-th group of output boxes; X_{1,2}^{i} and Y_{1,2}^{i} are the X and Y coordinates of the center of the i-th group of output boxes;
(2.4) constructing a CIoU loss function; the loss function is shown as (1-3):
L_{CIoU} = 1 - \mathrm{IoU} + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v, \qquad v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2   (1-3)
wherein b denotes an output box of the model; b^{gt} denotes the ground-truth box of the target in the input picture; c denotes the diagonal length of the minimum bounding rectangle enclosing the two boxes, which scales the center-distance term; ρ^2(b, b^{gt}) denotes the squared distance between the centers of the two boxes; α denotes a balance coefficient; w and h are the width and height of the model output box; w^{gt} and h^{gt} are the width and height of the ground-truth box;
(2.5) taking the training-set trademark pictures and labels as input signals and feeding them into the constructed Trinity-Yolo recognition model; transfer learning is performed with model weights pre-trained on ImageNet; the input picture is converted into a 3-channel two-dimensional tensor, feature vectors are extracted through repeated convolution operations, the features produced by the backbone feature-extraction network are further enhanced with PANet and SPPNet, and the resulting feature vectors are finally fed into the YOLO Head to obtain the output signal, namely the trademark label corresponding to the picture and its confidence;
step 3), identifying a label corresponding to the input picture trademark, which specifically comprises the following steps:
(3.1) selecting a trademark picture to be identified, and adjusting the size of the selected trademark picture;
(3.2) loading the trademark detection model saved in step 2) and inputting the trademark picture obtained in step (3.1) to obtain the label corresponding to the picture, namely its trademark type, which is the detection result.
2. The deep-learning-based large-scale trademark detection method as claimed in claim 1, wherein in step (2.2) the input picture size of the Trinity-Yolo model is set to 412×412, the weight decay regularization value is set to 0.0005, the initial learning rate is set to 0.0013, the input pictures are augmented with Mosaic data augmentation, and the label corresponding to the input picture and its confidence are output.
3. The deep-learning-based large-scale trademark detection method as claimed in claim 1, wherein step (3.1) resizes the trademark picture to be identified to 416×416.
CN202111610685.8A 2021-12-27 2021-12-27 Large-scale trademark detection method based on deep learning Active CN114266887B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111610685.8A CN114266887B (en) 2021-12-27 2021-12-27 Large-scale trademark detection method based on deep learning


Publications (2)

Publication Number Publication Date
CN114266887A (en) 2022-04-01
CN114266887B (en) 2023-07-14

Family

ID=80830171

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111610685.8A Active CN114266887B (en) 2021-12-27 2021-12-27 Large-scale trademark detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN114266887B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344847A (en) * 2021-04-21 2021-09-03 安徽工业大学 Long tail clamp defect detection method and system based on deep learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108520273A (en) * 2018-03-26 2018-09-11 天津大学 A kind of quick detection recognition method of dense small item based on target detection
CN113591850A (en) * 2021-08-05 2021-11-02 广西师范大学 Two-stage trademark detection method based on computer vision robustness target detection


Also Published As

Publication number Publication date
CN114266887A (en) 2022-04-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant