CN114266887A - Large-scale trademark detection method based on deep learning - Google Patents
- Publication number
- CN114266887A (application CN202111610685.8A)
- Authority
- CN
- China
- Prior art keywords
- trademark
- picture
- label
- model
- scale
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Image Analysis (AREA)
Abstract
A large-scale trademark detection method based on deep learning comprises the following steps: step 1), preprocessing trademark picture data; step 2), training a trademark detection model; step 3), identifying the label corresponding to the trademark in an input picture. The method addresses the shortage of training data and the inconsistencies in multi-scale objects and bounding-box regression. Experimental results show that, compared with other deep detection models, the invention achieves higher performance.
Description
Technical Field
The invention belongs to the technical field of machine vision and discloses a novel deep-learning-based method for large-scale trademark detection.
Background
In the multimedia field, logos have been studied extensively. As an important branch of this research, logo detection plays a major role in many applications: video advertising research, brand-exposure monitoring and analysis, trademark-infringement detection, autonomous driving, and intelligent transportation, to name a few.
However, detecting a logo in an image is a challenging task. There are many brands in the real world, and even a single brand's identity is diverse. Compared with general object images, the background of a logo image is highly complex and may be disturbed by illumination, occlusion, and blur. Logos may also appear in unknown fonts, colors, and sizes on different platforms, and inter-class similarity combined with intra-class variation makes the problem harder still. Finally, logos are usually small targets compared with general detection objects, which poses a great challenge to logo detection algorithms.
In the past, most logo detection algorithms were based on SIFT (scale-invariant feature transform). SIFT detects stable, salient points across multiple scales of an image, commonly referred to as keypoints, and the logo is then modeled by these keypoints. Although many effective hand-crafted methods exist, deep learning has become the mainstream. Many deep detection models, such as Faster R-CNN, SSD, CornerNet, YOLOv3, and YOLOv4, have been widely applied to logo detection and achieve satisfactory results. However, the accuracy and speed of these models are still not adequate for practical applications.
Disclosure of Invention
The invention provides a large-scale trademark detection method based on deep learning, which aims to overcome the defects in the prior art.
Aiming at the shortage of training data and the inconsistencies in multi-scale objects and bounding-box regression, the invention combines an attention mechanism, strip pooling, and weighted boxes fusion into the state-of-the-art YOLOv4 framework, yielding a new deep-learning-based trademark detection method.
According to the invention, a scSE attention module is added at the key feature-fusion positions of the YOLOv4 backbone, and, targeting the long, narrow patterns common in logo images, strip pooling replaces the max pooling in spatial pyramid pooling, widening the model's view over elongated regions. In the prediction-box selection phase, weighted boxes fusion replaces the traditional non-maximum suppression method.
The invention relates to a trademark detection method based on deep learning, which comprises the following steps:
step 1), preprocessing trademark picture data, and specifically comprising:
(1.1) sorting the obtained trademark pictures and classifying them by trademark type;
(1.2) annotating the classified trademark pictures with a commercial image-labeling tool;
(1.3) checking the annotated trademark pictures, cleaning blurred data, and manually correcting any labeling errors;
step 2), training a trademark detection model, and specifically comprising:
(2.1) taking the large-scale trademark data set obtained in step 1, extracting each trademark's name as its label, pairing each picture with its label, computing the 9 cluster centers of the annotation boxes in the data set with the AFK-MC² clustering algorithm, recording the values, and completing the construction of the training, validation, and test sets;
(2.2) building the Trinity-Yolo target detection model, wherein Trinity-Yolo takes the YOLOv4 detector as its base model, 3 scSE (concurrent spatial and channel squeeze-and-excitation) attention modules are embedded in the CSPDarknet53 backbone, strip pooling replaces the max pooling in the spatial pyramid pooling, and Weighted Boxes Fusion is applied to the output boxes in the YOLO Head, completing the construction of the model;
(2.3) constructing the weighted fusion formula, shown in (1-1) and (1-2);
(2.4) constructing the CIoU loss function, shown in (1-3);
(2.5) feeding the training-set trademark pictures and labels as input signals into the constructed Trinity-Yolo recognition model, performing transfer learning with model weights pre-trained on ImageNet, converting each input picture into a 3-channel tensor, extracting feature maps through successive convolutions, strengthening the backbone features with PANet and SPP, and feeding the resulting features into the YOLO Head to obtain the output signals, namely the trademark label and confidence corresponding to each trademark picture;
step 3), identifying the label corresponding to the input picture trademark, and specifically comprising the following steps:
(3.1) selecting a trademark picture to be identified and adjusting its size;
(3.2) loading the trademark detection model saved in step 2), inputting the trademark picture obtained in step (3.1), and obtaining the label corresponding to the picture, namely its trademark type, as the detection result.
Preferably, the Trinity-Yolo model input picture size in step (2.2) is 416 × 416, the weight decay is set to 0.0005, the initial learning rate is set to 0.0013, Mosaic data augmentation is used, and the output is the label corresponding to the input picture together with its confidence.
Preferably, step (3.1) resizes the trademark picture to be identified to 416 × 416.
The method addresses the shortage of training data and the inconsistencies in multi-scale objects and bounding-box regression. Experimental results show that, compared with other deep detection models, the invention achieves higher performance.
The invention has the following advantages: the feature-extraction capability of the backbone network is strong; strip pooling widens the model's view over long, narrow regions; and weighted boxes fusion yields well-corrected final prediction boxes. The method is easy to operate, fast to train, highly accurate, and generalizes well.
Drawings
Fig. 1 is a general block diagram of the present invention.
Fig. 2 is a YOLOv4 backbone feature extraction network with scSE modules added.
FIG. 3 is a diagram of the Strip Pooling Module of the present invention.
FIG. 4 is a diagram of an improved spatial pyramid pooling model according to the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The invention relates to a large-scale trademark detection method based on deep learning, which comprises the following steps of:
step 1), preprocessing trademark picture data, and specifically comprising:
(1.1) sorting the obtained trademark pictures and classifying them by trademark type;
(1.2) annotating the classified trademark pictures with a commercial image-labeling tool;
(1.3) checking the annotated trademark pictures, cleaning blurred data, and manually correcting any labeling errors;
step 2), training a trademark detection model, and specifically comprising:
(2.1) taking the large-scale trademark data set obtained in step 1, extracting each trademark's name as its label, pairing each picture with its label, computing the 9 cluster centers of the annotation boxes in the data set with the AFK-MC² clustering algorithm, recording the values, and completing the training, validation, and test sets;
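The patent names AFK-MC² (a fast, MCMC-based k-means seeding algorithm) for computing the 9 anchor centers but does not give its details. As a minimal stand-in, the sketch below runs plain k-means over the annotation boxes' (width, height) pairs with the 1 − IoU distance conventionally used for YOLO anchors; the function names are illustrative, not from the patent:

```python
import numpy as np

def iou_wh(boxes, centers):
    """IoU between (w, h) pairs, assuming all boxes share a common top-left corner."""
    inter = np.minimum(boxes[:, None, 0], centers[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], centers[None, :, 1])
    union = (boxes[:, 0] * boxes[:, 1])[:, None] \
            + centers[None, :, 0] * centers[None, :, 1] - inter
    return inter / union

def anchor_kmeans(boxes, k=9, iters=100, seed=0):
    """Cluster annotation-box (w, h) pairs into k anchor shapes using 1 - IoU distance."""
    rng = np.random.default_rng(seed)
    centers = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, centers), axis=1)  # nearest = highest IoU
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else centers[i] for i in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers[np.argsort(centers[:, 0] * centers[:, 1])]  # sort by area
```

With k = 9 on the trademark annotation boxes, the sorted centers are the per-scale anchor sizes recorded in this step.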
and (2.2) building a Trinity-Yolo target detection model. The marker image contains fewer objects and there will be less data available for training. In the case that the training data volume cannot be changed, the characteristic capability of the network needs to be improved as much as possible, and the invention utilizes an attention mechanism to strengthen the network. As shown in fig. 2, Trinity-Yolo takes Yolov4 detector as a basic model, and 3 scSE attention mechanism modules are embedded in a backbone network CSPDarknet 53; spatial pooling can effectively capture remote context information for target detection class-pixel level prediction tasks. In addition to regular spatial pooling, which typically has a regular shape of NxN, the present invention introduces a new pooling strategy called Stripe pooling (Stripe pooling) to reconsider spatial pooling. Strip posing deploys an elongated pooled core shape and a spatial dimension that can capture the long distance relationship of isolated regions. In addition, strip posing maintains a narrow kernel shape in other spatial dimensions, so that local feature information is conveniently captured, and irrelevant areas are prevented from interfering with label prediction.
The Strip Pooling Module is shown in FIG. 3. Strip pooling replaces the max pooling in the spatial pyramid pooling, and Weighted Boxes Fusion is applied to the output boxes in the YOLO Head, completing the construction of the model. The Trinity-Yolo input picture size is 416 × 416, the weight decay is set to 0.0005, the initial learning rate is set to 0.0013, Mosaic data augmentation is used, and the output is the label corresponding to the input picture together with its confidence. In logo recognition, many patterns consist of large numbers of characters, typically arranged in stripes; strip pooling yields a global characterization of such stripe patterns. The invention therefore modifies the spatial pyramid pooling of the YOLOv4 model; the improved module is shown in fig. 4. Spatial pyramid pooling enlarges the receptive field of the model, but its max pooling uses square windows, which struggle to capture the overall features of long, narrow patterns. Replacing the max pooling with strip pooling strengthens the model's ability to extract features of stripe-shaped target patterns.
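The core of strip pooling can be sketched in a few lines of numpy: average over full 1×W rows and H×1 columns, broadcast back to the map size, and combine, so an elongated pattern is summarized along its entire extent. This sketches the published Strip Pooling idea; the learnable 1-D convolutions and gating of the full module are omitted:

```python
import numpy as np

def strip_pool(x):
    """Core of strip pooling on a (C, H, W) map: pool each full row and each
    full column, broadcast back to H x W, and sum the two strip maps.
    (The learnable 1-D convolutions of the full module are omitted.)"""
    row = x.mean(axis=2, keepdims=True)   # (C, H, 1): averages over 1 x W strips
    col = x.mean(axis=1, keepdims=True)   # (C, 1, W): averages over H x 1 strips
    return np.broadcast_to(row, x.shape) + np.broadcast_to(col, x.shape)
```

Contrast this with a square N×N max-pooling window, which only ever sees a local patch of a long, narrow logo pattern.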
(2.3) constructing the weighted fusion formula, shown in (1-1) and (1-2).
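Formulas (1-1) and (1-2) did not survive extraction of this document. In the standard weighted boxes fusion of Solovyev et al., which the Weighted Boxes Fusion step follows by name, a cluster of T overlapping boxes is fused coordinate-wise by confidence weighting; a sketch for a single cluster:

```python
import numpy as np

def fuse_cluster(boxes, scores):
    """Fuse one cluster of overlapping boxes (weighted boxes fusion).

    Standard WBF rule: fused coordinates are the confidence-weighted mean
        X = sum_i(c_i * x_i) / sum_i(c_i)      (cf. the patent's (1-1))
    and the fused confidence is the mean score
        C = sum_i(c_i) / T                     (cf. the patent's (1-2))
    """
    boxes = np.asarray(boxes, dtype=float)    # (T, 4) as [x1, y1, x2, y2]
    scores = np.asarray(scores, dtype=float)  # (T,)
    fused_box = (scores[:, None] * boxes).sum(axis=0) / scores.sum()
    fused_conf = scores.mean()
    return fused_box, fused_conf
```

Unlike non-maximum suppression, no predicted box is discarded: every prediction contributes in proportion to its confidence. The full algorithm also groups boxes into clusters by IoU and may rescale the fused confidence by the number of contributing models; only the per-cluster fusion rule is shown here.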
(2.4) constructing the CIoU loss function, shown in (1-3).
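Formula (1-3) likewise did not survive extraction. The standard CIoU loss (Zheng et al.) combines the IoU term with a normalized center-distance penalty and an aspect-ratio consistency term; a numpy sketch for one box pair, assuming the published formulation:

```python
import math

def ciou_loss(b1, b2):
    """CIoU loss between two [x1, y1, x2, y2] boxes:
        L = 1 - IoU + rho^2(b, b_gt) / c^2 + alpha * v
    where rho is the center distance, c the diagonal of the smallest
    enclosing box, and v measures aspect-ratio mismatch."""
    (x1, y1, x2, y2), (X1, Y1, X2, Y2) = b1, b2
    iw = max(0.0, min(x2, X2) - max(x1, X1))
    ih = max(0.0, min(y2, Y2) - max(y1, Y1))
    inter = iw * ih
    union = (x2 - x1) * (y2 - y1) + (X2 - X1) * (Y2 - Y1) - inter
    iou = inter / union
    # squared center distance over squared enclosing-box diagonal
    rho2 = ((x1 + x2 - X1 - X2) ** 2 + (y1 + y2 - Y1 - Y2) ** 2) / 4.0
    cw, ch = max(x2, X2) - min(x1, X1), max(y2, Y2) - min(y1, Y1)
    c2 = cw ** 2 + ch ** 2
    # aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (math.atan((X2 - X1) / (Y2 - Y1))
                              - math.atan((x2 - x1) / (y2 - y1))) ** 2
    alpha = v / (1.0 - iou + v + 1e-9)
    return 1.0 - iou + rho2 / c2 + alpha * v
```

The distance and aspect-ratio penalties keep the regression gradient informative even when boxes barely overlap, which plain IoU loss cannot do.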
(2.5) feeding the training-set trademark pictures and labels as input signals into the constructed Trinity-Yolo recognition model, performing transfer learning with model weights pre-trained on ImageNet, converting each input picture into a 3-channel tensor, extracting feature maps through successive convolutions, strengthening the backbone features with PANet and SPP, and feeding the resulting features into the YOLO Head to obtain the output signals, namely the trademark label and confidence corresponding to each trademark picture;
step 3), identifying the label corresponding to the input picture trademark, and specifically comprising the following steps:
(3.1) selecting a trademark picture to be identified and resizing it to 416 × 416;
(3.2) loading the trademark detection model saved in step 2, inputting the trademark picture obtained in step (3.1), and obtaining the label corresponding to the picture, namely its trademark type, as the detection result.
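Step (3.1) resizes the input picture to 416 × 416. YOLO-family pipelines usually do this as a "letterbox" resize that preserves the aspect ratio and pads the borders with gray; the patent does not specify its exact resize, so the following is a dependency-free sketch of that convention:

```python
import numpy as np

def letterbox(img, size=416, pad_value=128):
    """Resize an (H, W, 3) uint8 image to size x size, preserving aspect
    ratio via nearest-neighbor sampling and padding the rest with gray."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    # nearest-neighbor resize via index maps (avoids external dependencies)
    ys = np.clip((np.arange(nh) / scale).astype(int), 0, h - 1)
    xs = np.clip((np.arange(nw) / scale).astype(int), 0, w - 1)
    resized = img[ys][:, xs]
    out = np.full((size, size, 3), pad_value, dtype=img.dtype)
    top, left = (size - nh) // 2, (size - nw) // 2
    out[top:top + nh, left:left + nw] = resized
    return out
```

Preserving the aspect ratio matters here because distorting a long, narrow logo would work against the strip-pooling design described above.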
The embodiments described in this specification are merely illustrative of the inventive concept; the scope of the present invention should not be considered limited to the specific forms set forth in the embodiments, but also covers equivalents that those skilled in the art may conceive based on the inventive concept.
Claims (3)
1. A large-scale trademark detection method based on deep learning comprises the following steps:
step 1), preprocessing trademark picture data, and specifically comprising:
(1.1) sorting the obtained trademark pictures and classifying them by trademark type;
(1.2) annotating the classified trademark pictures with a commercial image-labeling tool;
(1.3) checking the annotated trademark pictures, cleaning blurred data, and manually correcting any labeling errors;
step 2), training a trademark detection model, and specifically comprising:
(2.1) taking the large-scale trademark data set obtained in step 1, extracting each trademark's name as its label, pairing each picture with its label, computing the 9 cluster centers of the annotation boxes in the data set with the AFK-MC² clustering algorithm, recording the values, and completing the construction of the training, validation, and test sets;
(2.2) building the Trinity-Yolo target detection model, wherein Trinity-Yolo takes the YOLOv4 detector as its base model, 3 scSE (concurrent spatial and channel squeeze-and-excitation) attention modules are embedded in the CSPDarknet53 backbone, strip pooling replaces the max pooling in the spatial pyramid pooling, and Weighted Boxes Fusion is applied to the output boxes in the YOLO Head, completing the construction of the model;
(2.3) constructing the weighted fusion formula, shown in (1-1) and (1-2);
(2.4) constructing the CIoU loss function, shown in (1-3);
(2.5) feeding the training-set trademark pictures and labels as input signals into the constructed Trinity-Yolo recognition model, performing transfer learning with model weights pre-trained on ImageNet, converting each input picture into a 3-channel tensor, extracting feature maps through successive convolutions, strengthening the backbone features with PANet and SPP, and feeding the resulting features into the YOLO Head to obtain the output signals, namely the trademark label and confidence corresponding to each trademark picture;
step 3), identifying the label corresponding to the input picture trademark, and specifically comprising the following steps:
(3.1) selecting a trademark picture to be identified and adjusting its size;
(3.2) loading the trademark detection model saved in step 2), inputting the trademark picture obtained in step (3.1), and obtaining the label corresponding to the picture, namely its trademark type, as the detection result.
2. The large-scale trademark detection method based on deep learning as claimed in claim 1, wherein: the Trinity-Yolo model input picture size in step (2.2) is 416 × 416, the weight decay is set to 0.0005, the initial learning rate is set to 0.0013, Mosaic data augmentation is used, and the label corresponding to the input picture and its confidence are output.
3. The large-scale trademark detection method based on deep learning as claimed in claim 1, wherein: step (3.1) resizes the trademark picture to be identified to 416 × 416.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111610685.8A CN114266887B (en) | 2021-12-27 | 2021-12-27 | Large-scale trademark detection method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114266887A true CN114266887A (en) | 2022-04-01 |
CN114266887B CN114266887B (en) | 2023-07-14 |
Family
ID=80830171
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111610685.8A Active CN114266887B (en) | 2021-12-27 | 2021-12-27 | Large-scale trademark detection method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114266887B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108520273A (en) * | 2018-03-26 | 2018-09-11 | 天津大学 | A kind of quick detection recognition method of dense small item based on target detection |
CN113344847A (en) * | 2021-04-21 | 2021-09-03 | 安徽工业大学 | Long tail clamp defect detection method and system based on deep learning |
CN113591850A (en) * | 2021-08-05 | 2021-11-02 | 广西师范大学 | Two-stage trademark detection method based on computer vision robustness target detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||