CN114266887A - Large-scale trademark detection method based on deep learning - Google Patents

Large-scale trademark detection method based on deep learning

Info

Publication number
CN114266887A
Authority
CN
China
Prior art keywords
trademark
picture
label
model
scale
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111610685.8A
Other languages
Chinese (zh)
Other versions
CN114266887B (en)
Inventor
陈凯彦
张拓
金润辉
徐瑞吉
毛科技
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202111610685.8A
Publication of CN114266887A
Application granted
Publication of CN114266887B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Image Analysis (AREA)

Abstract

A large-scale trademark detection method based on deep learning comprises the following steps: step 1), preprocessing trademark picture data; step 2), training a trademark detection model; step 3), identifying the label corresponding to the trademark in an input picture. The method addresses the problems of scarce training data, multi-scale object inconsistency, and bounding-box regression inconsistency. Experimental results show that the invention achieves higher performance than other deep detection models.

Description

Large-scale trademark detection method based on deep learning
Technical Field
The invention belongs to the technical field of machine vision and discloses a novel large-scale trademark detection method based on deep learning.
Background
In the multimedia field, logo research is extensive. As an important branch of logo research, logo detection plays a major role in many applications: it can be used for video advertising research, brand visibility monitoring and analysis, brand infringement detection, autonomous driving, and intelligent transportation, among others.
However, detecting a logo in an image is a challenging task. There are many brands in the real world, and the appearance of the same brand's logo varies widely. At the same time, compared with images of general objects, the background of a logo image is highly complex and may be disturbed by factors such as illumination, occlusion, and blur. Unknown fonts, colors, and sizes that differ across platforms, together with inter-class similarity and intra-class variation, make the problem harder still. Finally, logos are usually small targets compared with general detection objects, which poses a great challenge to logo detection algorithms.
In the past, most logo detection algorithms were based on SIFT. This method detects stable, salient points across multiple scales in an image, commonly referred to as keypoints, and the logo is then modeled by these keypoints. Although many effective logo detection methods existed before, deep learning methods are now the mainstream. Many deep detection models, such as Faster R-CNN, SSD, CornerNet, YOLOv3, and YOLOv4, have been widely applied in the logo detection field, and deep-learning-based models achieve satisfactory results in logo detection. However, the accuracy and speed of these models are still not adequate for practical applications.
Disclosure of Invention
The invention provides a large-scale trademark detection method based on deep learning, which aims to overcome the defects in the prior art.
To address the problems of scarce training data, multi-scale object inconsistency, and bounding-box regression inconsistency, the invention integrates an attention mechanism, strip pooling, and weighted boxes fusion into the state-of-the-art YOLOv4 framework and proposes a new deep-learning-based trademark detection method.
According to the invention, a scSE attention module is added at the key feature-fusion points of the YOLOv4 backbone network, and, targeting the long, narrow patterns common in logo images, strip pooling replaces the max pooling in spatial pyramid pooling, enlarging the model's view over narrow, elongated regions. In the prediction-box selection phase, weighted boxes fusion replaces the traditional non-maximum suppression method.
The invention relates to a trademark detection method based on deep learning, which comprises the following steps:
step 1), preprocessing trademark picture data, and specifically comprising:
(1.1) sorting the collected trademark pictures and classifying them by trademark type;
(1.2) annotating the classified trademark pictures with a commercial picture-labeling tool;
(1.3) reviewing the annotated trademark pictures, removing blurred data, and manually correcting any erroneous annotations;
step 2), training a trademark detection model, and specifically comprising:
(2.1) extracting the large-scale trademark data set obtained in step 1), extracting each trademark's corresponding name as its label, pairing every picture with its label, calculating 9 cluster centers of the annotation boxes in the data set with the AFK-MC² clustering algorithm, recording the values, and completing the construction of the training, validation, and test sets;
(2.2) building a Trinity-Yolo target detection model: Trinity-Yolo takes the YOLOv4 detector as its base model; 3 scSE (concurrent spatial and channel squeeze-and-excitation) attention modules are embedded in the CSPDarknet53 backbone; strip pooling replaces the max pooling in the spatial pyramid pooling module; and Weighted Boxes Fusion is applied to the output boxes in the YOLO Head, completing the construction of the model;
and (2.3) constructing a weighted fusion formula. The weighted fusion formula is shown as (1-1) (1-2).
X = \frac{\sum_{i=1}^{T} C_i \cdot X_i}{\sum_{i=1}^{T} C_i}    (1-1)

Y = \frac{\sum_{i=1}^{T} C_i \cdot Y_i}{\sum_{i=1}^{T} C_i}    (1-2)

where T is the number of overlapping prediction boxes in a cluster, C_i is the confidence of the i-th box, and X_i, Y_i denote its coordinates (each of x1, x2 and y1, y2 is fused in the same way).
(2.4) constructing the CIoU loss function. Its standard form is given in (1-3):
L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v    (1-3)

where b and b^{gt} are the predicted and ground-truth boxes, \rho(b, b^{gt}) is the Euclidean distance between their centers, c is the diagonal length of the smallest box enclosing both, v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2 measures aspect-ratio consistency, and \alpha = \frac{v}{(1 - IoU) + v}.
(2.5) feeding the training-set trademark pictures and labels as input signals into the constructed Trinity-Yolo recognition model; transfer learning is performed with model weights pre-trained on ImageNet; the input picture is converted into a 3-channel two-dimensional tensor and feature vectors are extracted through multiple convolution operations; PANet and SPP-Net are selected to strengthen the feature vectors produced by the backbone feature-extraction network; the resulting feature vectors are fed into the YOLO Head to obtain the output signals, namely the trademark label and confidence corresponding to the trademark picture;
step 3), identifying the label corresponding to the input picture trademark, and specifically comprising the following steps:
(3.1) selecting a trademark picture to be identified, and adjusting the size of the selected trademark picture;
and (3.2) loading the trademark detection model stored in the step (2), inputting the trademark picture obtained in the step (3.1), obtaining a label corresponding to the trademark picture, namely the type of the trademark picture, and obtaining a detection result.
Preferably, the input picture size of the Trinity-Yolo model in step (2.2) is 416 × 416, the weight decay is set to 0.0005, the initial learning rate is set to 0.0013, Mosaic data augmentation is used, and the output is the label corresponding to the input picture and its confidence.
Preferably, in step (3.1) the trademark picture to be identified is resized to 416 × 416.
The method addresses the problems of scarce training data, multi-scale object inconsistency, and bounding-box regression inconsistency. Experimental results show that the invention achieves higher performance than other deep detection models.
The invention has the following advantages: the feature-extraction capability of the backbone network is strong; strip pooling enlarges the model's view over narrow, elongated regions; weighted boxes fusion yields well-corrected final prediction boxes. The method is easy to operate, fast to train, highly accurate, and generalizes well.
Drawings
Fig. 1 is a general block diagram of the present invention.
Fig. 2 is a YOLOv4 backbone feature extraction network with scSE modules added.
FIG. 3 is a diagram of the Strip Pooling Module of the present invention.
FIG. 4 is a diagram of an improved spatial pyramid pooling model according to the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The invention relates to a large-scale trademark detection method based on deep learning, which comprises the following steps:
step 1), preprocessing trademark picture data, and specifically comprising:
(1.1) sorting the collected trademark pictures and classifying them by trademark type;
(1.2) annotating the classified trademark pictures with a commercial picture-labeling tool;
(1.3) reviewing the annotated trademark pictures, removing blurred data, and manually correcting any erroneous annotations;
step 2), training a trademark detection model, and specifically comprising:
(2.1) extracting the large-scale trademark data set obtained in step 1), extracting each trademark's corresponding name as its label, pairing every picture with its label, calculating 9 cluster centers of the annotation boxes in the data set with the AFK-MC² clustering algorithm, recording the values, and completing the construction of the training, validation, and test sets;
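As a rough illustration of this anchor-clustering step, the sketch below groups annotation-box sizes into 9 cluster centers. It is a simplification under stated assumptions: the patent names the AFK-MC² algorithm, while this sketch substitutes random seeding with Lloyd-style updates under the 1 - IoU distance commonly used for YOLO anchors.

```python
import numpy as np

def iou_wh(boxes: np.ndarray, anchors: np.ndarray) -> np.ndarray:
    """IoU between (w, h) pairs, assuming boxes share a common corner."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = boxes[:, 0:1] * boxes[:, 1:2] + (anchors[:, 0] * anchors[:, 1])[None, :] - inter
    return inter / union

def cluster_anchors(boxes: np.ndarray, k: int = 9, iters: int = 100, seed: int = 0) -> np.ndarray:
    """Cluster annotation-box (w, h) pairs into k anchor centers.

    Sketch only: the patent seeds with AFK-MC^2; here we use random
    seeding plus Lloyd-style updates under the 1 - IoU distance.
    """
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)      # nearest = highest IoU
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else anchors[i] for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors.prod(axis=1))]            # sorted by area
```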
and (2.2) building a Trinity-Yolo target detection model. The marker image contains fewer objects and there will be less data available for training. In the case that the training data volume cannot be changed, the characteristic capability of the network needs to be improved as much as possible, and the invention utilizes an attention mechanism to strengthen the network. As shown in fig. 2, Trinity-Yolo takes Yolov4 detector as a basic model, and 3 scSE attention mechanism modules are embedded in a backbone network CSPDarknet 53; spatial pooling can effectively capture remote context information for target detection class-pixel level prediction tasks. In addition to regular spatial pooling, which typically has a regular shape of NxN, the present invention introduces a new pooling strategy called Stripe pooling (Stripe pooling) to reconsider spatial pooling. Strip posing deploys an elongated pooled core shape and a spatial dimension that can capture the long distance relationship of isolated regions. In addition, strip posing maintains a narrow kernel shape in other spatial dimensions, so that local feature information is conveniently captured, and irrelevant areas are prevented from interfering with label prediction.
The Strip Pooling Module is shown in FIG. 3. Strip pooling replaces the max pooling in the spatial pyramid pooling module, and Weighted Boxes Fusion is applied to the output boxes in the YOLO Head, completing the construction of the model. The input picture size of the Trinity-Yolo model is 416 × 416, the weight decay is set to 0.0005, the initial learning rate is set to 0.0013, Mosaic data augmentation is used, and the output is the label corresponding to the input picture and its confidence. In logo recognition work, many patterns are composed of large numbers of characters, and these characters are typically arranged in stripes; strip pooling therefore yields a global characterization of such stripe patterns. The invention accordingly modifies the spatial pyramid pooling of the YOLOv4 model; the improved spatial pyramid pooling module is shown in fig. 4. Spatial pyramid pooling enlarges the receptive field of the model, but its max pooling uses a square pooling window, which struggles to capture the overall features of long, narrow patterns. Replacing the max pooling with strip pooling strengthens the model's ability to extract features of stripe-shaped target patterns.
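Below is a rough PyTorch sketch of a strip pooling module in the spirit of FIG. 3, following the published Strip Pooling design (Hou et al., CVPR 2020); the layer widths and the gating arrangement are assumptions, since Trinity-Yolo's exact configuration is not disclosed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StripPooling(nn.Module):
    """Strip Pooling Module sketch (after Hou et al., CVPR 2020).

    Pools along each spatial axis with a 1-wide strip, refines each strip
    with a 1-D convolution, and gates the input with the fused context.
    """
    def __init__(self, channels: int):
        super().__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((1, None))   # 1 x W strip
        self.pool_v = nn.AdaptiveAvgPool2d((None, 1))   # H x 1 strip
        self.conv_h = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1), bias=False)
        self.conv_v = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0), bias=False)
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[2:]
        # Pool each direction into a strip, refine, broadcast back to H x W.
        xh = F.interpolate(self.conv_h(self.pool_h(x)), size=(h, w),
                           mode="bilinear", align_corners=False)
        xv = F.interpolate(self.conv_v(self.pool_v(x)), size=(h, w),
                           mode="bilinear", align_corners=False)
        # Gate the input with the fused strip context.
        return x * torch.sigmoid(self.fuse(F.relu(xh + xv)))
```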
(2.3) constructing the weighted fusion formulas. Following the standard Weighted Boxes Fusion definition, the fused coordinates in (1-1) and (1-2) are confidence-weighted means:
X = \frac{\sum_{i=1}^{T} C_i \cdot X_i}{\sum_{i=1}^{T} C_i}    (1-1)

Y = \frac{\sum_{i=1}^{T} C_i \cdot Y_i}{\sum_{i=1}^{T} C_i}    (1-2)

where T is the number of overlapping prediction boxes in a cluster, C_i is the confidence of the i-th box, and X_i, Y_i denote its coordinates (each of x1, x2 and y1, y2 is fused in the same way).
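As a worked illustration of (1-1) and (1-2), the sketch below fuses a single cluster of overlapping predictions; grouping boxes into clusters by IoU, which full Weighted Boxes Fusion also performs, is omitted for brevity.

```python
import numpy as np

def fuse_cluster(boxes: np.ndarray, scores: np.ndarray):
    """Fuse one cluster of overlapping boxes per formulas (1-1)/(1-2).

    boxes: (T, 4) array of (x1, y1, x2, y2); scores: (T,) confidences.
    Each fused coordinate is the confidence-weighted mean; the fused
    confidence is the cluster's average score.
    """
    w = scores / scores.sum()
    fused_box = (boxes * w[:, None]).sum(axis=0)
    fused_score = float(scores.mean())
    return fused_box, fused_score

# Example: three detections of the same logo collapse into one box.
boxes = np.array([[10, 10, 50, 50], [12, 11, 52, 49], [9, 12, 48, 51]], dtype=float)
scores = np.array([0.9, 0.75, 0.6])
print(fuse_cluster(boxes, scores))
```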
(2.4) constructing the CIoU loss function. Its standard form is given in (1-3):
L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v    (1-3)

where b and b^{gt} are the predicted and ground-truth boxes, \rho(b, b^{gt}) is the Euclidean distance between their centers, c is the diagonal length of the smallest box enclosing both, v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2 measures aspect-ratio consistency, and \alpha = \frac{v}{(1 - IoU) + v}.
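A PyTorch sketch of the CIoU loss in (1-3) follows; it is a standard implementation of the published formula (Zheng et al., AAAI 2020), not code taken from the patent.

```python
import math
import torch

def ciou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """CIoU loss over (x1, y1, x2, y2) boxes, per formula (1-3)."""
    # IoU term.
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Center-distance term: rho^2 / c^2 over the enclosing-box diagonal.
    rho2 = (((pred[:, :2] + pred[:, 2:]) - (target[:, :2] + target[:, 2:])) ** 2).sum(dim=1) / 4
    enc_lt = torch.min(pred[:, :2], target[:, :2])
    enc_rb = torch.max(pred[:, 2:], target[:, 2:])
    c2 = ((enc_rb - enc_lt) ** 2).sum(dim=1) + eps

    # Aspect-ratio consistency term v and its weight alpha.
    wp = pred[:, 2] - pred[:, 0]
    hp = (pred[:, 3] - pred[:, 1]).clamp(min=eps)
    wt = target[:, 2] - target[:, 0]
    ht = (target[:, 3] - target[:, 1]).clamp(min=eps)
    v = (4 / math.pi ** 2) * (torch.atan(wt / ht) - torch.atan(wp / hp)) ** 2
    alpha = (v / (1 - iou + v + eps)).detach()   # treated as a constant weight

    return (1 - iou + rho2 / c2 + alpha * v).mean()
```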
(2.5) feeding the training-set trademark pictures and labels as input signals into the constructed Trinity-Yolo recognition model; transfer learning is performed with model weights pre-trained on ImageNet; the input picture is converted into a 3-channel two-dimensional tensor and feature vectors are extracted through multiple convolution operations; PANet and SPP-Net are selected to strengthen the feature vectors produced by the backbone feature-extraction network; the resulting feature vectors are fed into the YOLO Head to obtain the output signals, namely the trademark label and confidence corresponding to the trademark picture;
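A skeleton of this training procedure might look like the sketch below. TrinityYolo, TrademarkDataset, and yolo_loss are hypothetical names standing in for components the patent describes but does not publish; the optimizer choice, batch size, class count, and epoch count are likewise assumptions, while the learning rate, weight decay, and Mosaic flag follow the values stated above.

```python
import torch
from torch.utils.data import DataLoader

# Hypothetical components (not from the patent): TrinityYolo,
# TrademarkDataset, and yolo_loss stand in for the model, data set,
# and composite loss described in the text.
NUM_BRANDS = 352   # assumed number of trademark classes
EPOCHS = 100       # assumed training length

model = TrinityYolo(num_classes=NUM_BRANDS)
# Transfer learning: load ImageNet-pretrained backbone weights.
model.backbone.load_state_dict(torch.load("cspdarknet53_imagenet.pth"), strict=False)

optimizer = torch.optim.SGD(model.parameters(), lr=0.0013,    # initial learning rate
                            momentum=0.9, weight_decay=0.0005)
loader = DataLoader(TrademarkDataset("train", mosaic=True),   # Mosaic augmentation
                    batch_size=16, shuffle=True)

for epoch in range(EPOCHS):
    for images, targets in loader:          # images: (B, 3, 416, 416)
        preds = model(images)               # three YOLO head outputs
        loss = yolo_loss(preds, targets)    # CIoU box term + objectness + class terms
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```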
step 3), identifying the label corresponding to the input picture trademark, and specifically comprising the following steps:
(3.1) selecting a trademark picture to be identified and resizing it to 416 × 416;
and (3.2) loading the trademark detection model stored in the step 2, inputting the trademark picture obtained in the step (3.1), obtaining a label corresponding to the trademark picture, namely the type of the trademark picture, and obtaining a detection result.
The embodiments described in this specification merely illustrate implementations of the inventive concept. The scope of the invention should not be considered limited to the specific forms set forth in the embodiments; it also covers equivalents that those skilled in the art may conceive based on the inventive concept.

Claims (3)

1. A large-scale trademark detection method based on deep learning comprises the following steps:
step 1), preprocessing trademark picture data, and specifically comprising:
(1.1) sorting the collected trademark pictures and classifying them by trademark type;
(1.2) annotating the classified trademark pictures with a commercial picture-labeling tool;
(1.3) reviewing the annotated trademark pictures, removing blurred data, and manually correcting any erroneous annotations;
step 2), training a trademark detection model, and specifically comprising:
(2.1) extracting the large-scale trademark data set obtained in step 1), extracting each trademark's corresponding name as its label, pairing every picture with its label, calculating 9 cluster centers of the annotation boxes in the data set with the AFK-MC² clustering algorithm, recording the values, and completing the construction of the training, validation, and test sets;
(2.2) building a Trinity-Yolo target detection model: Trinity-Yolo takes the YOLOv4 detector as its base model; 3 scSE (concurrent spatial and channel squeeze-and-excitation) attention modules are embedded in the CSPDarknet53 backbone; strip pooling replaces the max pooling in the spatial pyramid pooling module; and Weighted Boxes Fusion is applied to the output boxes in the YOLO Head, completing the construction of the model;
and (2.3) constructing a weighted fusion formula. The weighted fusion formula is shown as (1-1) (1-2).
X = \frac{\sum_{i=1}^{T} C_i \cdot X_i}{\sum_{i=1}^{T} C_i}    (1-1)

Y = \frac{\sum_{i=1}^{T} C_i \cdot Y_i}{\sum_{i=1}^{T} C_i}    (1-2)

where T is the number of overlapping prediction boxes in a cluster, C_i is the confidence of the i-th box, and X_i, Y_i denote its coordinates (each of x1, x2 and y1, y2 is fused in the same way);
(2.4) constructing the CIoU loss function. Its standard form is given in (1-3):
L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v    (1-3)

where b and b^{gt} are the predicted and ground-truth boxes, \rho(b, b^{gt}) is the Euclidean distance between their centers, c is the diagonal length of the smallest box enclosing both, v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2 measures aspect-ratio consistency, and \alpha = \frac{v}{(1 - IoU) + v};
(2.5) feeding the training-set trademark pictures and labels as input signals into the constructed Trinity-Yolo recognition model; transfer learning is performed with model weights pre-trained on ImageNet; the input picture is converted into a 3-channel two-dimensional tensor and feature vectors are extracted through multiple convolution operations; PANet and SPP-Net are selected to strengthen the feature vectors produced by the backbone feature-extraction network; the resulting feature vectors are fed into the YOLO Head to obtain the output signals, namely the trademark label and confidence corresponding to the trademark picture;
step 3), identifying the label corresponding to the input picture trademark, and specifically comprising the following steps:
(3.1) selecting a trademark picture to be identified, and adjusting the size of the selected trademark picture;
and (3.2) loading the trademark detection model stored in the step (2), inputting the trademark picture obtained in the step (3.1), obtaining a label corresponding to the trademark picture, namely the type of the trademark picture, and obtaining a detection result.
2. The large-scale trademark detection method based on deep learning as claimed in claim 1, wherein: the input picture size of the Trinity-Yolo model in step (2.2) is 416 × 416, the weight decay is set to 0.0005, the initial learning rate is set to 0.0013, Mosaic data augmentation is used, and the output is the label corresponding to the input picture and its confidence.
3. The large-scale trademark detection method based on deep learning as claimed in claim 1, wherein: step (3.1) resizes the trademark picture to be identified to 416 × 416.
CN202111610685.8A 2021-12-27 2021-12-27 Large-scale trademark detection method based on deep learning Active CN114266887B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111610685.8A CN114266887B (en) 2021-12-27 2021-12-27 Large-scale trademark detection method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111610685.8A CN114266887B (en) 2021-12-27 2021-12-27 Large-scale trademark detection method based on deep learning

Publications (2)

Publication Number Publication Date
CN114266887A (en) 2022-04-01
CN114266887B CN114266887B (en) 2023-07-14

Family

ID=80830171

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111610685.8A Active CN114266887B (en) 2021-12-27 2021-12-27 Large-scale trademark detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN114266887B (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108520273A (en) * 2018-03-26 2018-09-11 天津大学 A kind of quick detection recognition method of dense small item based on target detection
CN113344847A (en) * 2021-04-21 2021-09-03 安徽工业大学 Long tail clamp defect detection method and system based on deep learning
CN113591850A (en) * 2021-08-05 2021-11-02 广西师范大学 Two-stage trademark detection method based on computer vision robustness target detection

Also Published As

Publication number Publication date
CN114266887B (en) 2023-07-14

Similar Documents

Publication Publication Date Title
US10607362B2 (en) Remote determination of containers in geographical region
CN109409263B (en) Method for detecting urban ground feature change of remote sensing image based on Siamese convolutional network
Zhao et al. Cloud shape classification system based on multi-channel cnn and improved fdm
CN113160192B (en) Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background
CN105303198B (en) A kind of remote sensing image semisupervised classification method learnt from fixed step size
CN109711366B (en) Pedestrian re-identification method based on group information loss function
CN108960245A (en) The detection of tire-mold character and recognition methods, device, equipment and storage medium
CN108918536A (en) Tire-mold face character defect inspection method, device, equipment and storage medium
CN103400151A (en) Optical remote-sensing image, GIS automatic registration and water body extraction integrated method
CN102842044B (en) Method for detecting variation of remote-sensing image of high-resolution visible light
CN106845341A (en) A kind of unlicensed vehicle identification method based on virtual number plate
CN104598883A (en) Method for re-recognizing target in multi-camera monitoring network
CN106780727B (en) Vehicle head detection model reconstruction method and device
CN108305260A (en) Detection method, device and the equipment of angle point in a kind of image
CN107808157A (en) A kind of method and device of detonator coding positioning and identification
CN105574545B (en) The semantic cutting method of street environment image various visual angles and device
CN105718552A (en) Clothing freehand sketch based clothing image retrieval method
EP3553700A2 (en) Remote determination of containers in geographical region
CN112329559A (en) Method for detecting homestead target based on deep convolutional neural network
CN113033385A (en) Deep learning-based violation building remote sensing identification method and system
CN106529472A (en) Target detection method and apparatus based on large-scale high-resolution and high-spectral image
Saha et al. Unsupervised multiple-change detection in VHR optical images using deep features
CN105160285A (en) Method and system for recognizing human body tumble automatically based on stereoscopic vision
Zhong et al. Background subtraction driven seeds selection for moving objects segmentation and matting
CN111339974B (en) Method for identifying modern ceramics and ancient ceramics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant