CN114693966A - Target detection method based on deep learning - Google Patents
Target detection method based on deep learning
- Publication number
- CN114693966A
- Authority
- CN
- China
- Prior art keywords
- model
- data
- data set
- training
- classification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention belongs to the technical field of machine learning, and particularly relates to a target detection method based on deep learning, comprising the following steps. Data acquisition: the COCO 2014 detection data set and the PASCAL VOC 2007+2012 data set are collected to construct an original data set, the categories of the original data set are labeled, and the data set required for model training is completed. Data preprocessing: the data are preprocessed, and the different types of original images are partitioned with different data segmentation methods to ensure the training effect of the model. Model construction: a recognition and classification model is built with a low-dimensional convolutional neural network, the training data are input, and construction of the parameter model is completed. Model saving: when the loss function of the model no longer decreases, the model is saved. Model evaluation: the test data are input into the saved network model to complete evaluation of the model's performance. The invention provides a new model (G-RCN) that separates the classification and localization tasks, optimizing for the difference between the two tasks and improving performance.
Description
Technical Field
The invention belongs to the technical field of machine learning, and particularly relates to a target detection method based on deep learning.
Background
At present, deep neural networks achieve super-human performance on 1000-class object recognition tasks, yet detection networks have not reached human-equivalent performance on 80-class object detection tasks, which indicates that a gap exists between the recognition task and the detection task. Object detection requires both powerful classification performance and the ability to pinpoint an object among a practically unlimited number of candidate locations.
Problems or disadvantages of the prior art: current target detection models use a single shared feature map to perform the classification and localization tasks simultaneously, and RCNN variants that partially separate the feature maps achieve performance only comparable to the original Faster RCNN; the results remain poor.
Disclosure of Invention
In view of the above, a target detection method based on deep learning is provided. An original data set is constructed by collecting the COCO 2014 detection data set and the PASCAL VOC 2007+2012 data set, and the images in the COCO data set are classified according to pixel size. After data collection, the data are preprocessed, including segmentation and normalization. The preprocessed data are input into the constructed G-RCN fusion network model to train it; the model is saved once its loss function no longer decreases, completing model construction. The test data are then input into the saved network model to complete model evaluation and assess the performance of the model.
The application discloses a target detection method based on deep learning, comprising the following steps:
S1, data acquisition: collecting the COCO 2014 detection data set and the PASCAL VOC 2007+2012 data set to construct an original data set, labeling the categories of the original data set, and completing construction of the data set required for model training;
S2, data preprocessing: preprocessing the data and partitioning the different types of original images with different data segmentation methods to ensure the training effect of the model;
S3, model construction: building a recognition and classification model with a low-dimensional convolutional neural network, inputting the training data, and completing construction of the parameter model;
S4, model saving: saving the model when its loss function no longer decreases;
S5, model evaluation: inputting the test data into the saved network model to complete evaluation of the model's performance.
Further, in step S1, the images in the original data set are used for performance testing of the model, and the images in the COCO data set are classified according to pixel size into images smaller than 32 × 32, images between 32 × 32 and 96 × 96, and images larger than 96 × 96; the PASCAL VOC data set comprises a training set and a test set, and the two data sets are integrated to construct training and test sets suitable for the network model.
Further, step S2 comprises data segmentation and image scaling. The data segmentation divides the obtained original data set into a training set and a test set at a ratio of 7:3; the training set is used to train the model, and the remaining images are used for the final model performance evaluation.
The image scaling unifies the pixel size of the images to 600 × 600 before the images are input.
Further, in step S3: a Gap-optimized Region-based Convolutional Network (G-RCN) is constructed to achieve recognition gains with minimal modification and without adding any extra modules or information. G-RCN is applied to Faster R-CNN with a ResNet101 backbone: ResNet101 is split at the last 6 bottlenecks of the conv4 block, while the conv5 block is used as the original block in the head. The first convolution layer of the first bottleneck in the conv4 block originally has a stride of 2, which is changed to 1 in the localization branch; the classification and localization branches share the kernels of the first 17 bottlenecks of the conv4 block. The conv5 block changes its stride to 1, and all layers of the conv5 block in ResNet101 are used as the head, appended to the RoI pooling layer. The backbone consists of all layers in the first four blocks and generates a shared feature map for classification and localization.
Further, in step S5, AP and MPA are used as evaluation indexes, with the formulas

$$\mathrm{AP}=\frac{\sum_{i} p_{ii}}{\sum_{i}\sum_{j} p_{ij}},\qquad \mathrm{MPA}=\frac{1}{N}\sum_{i=1}^{N}\frac{p_{ii}}{\sum_{j=1}^{N} p_{ij}},$$

where $p_{ii}$ represents the number of correctly classified pixels of class $i$; $p_{ij}$ represents the number of pixels of class $i$ predicted as class $j$; $i, j$ are class indices; and $N$ is the number of categories.
Compared with the prior art, the invention has the following beneficial effects:
the invention designs a target detection method based on deep learning, which analyzes the classification and positioning tasks of a target detection model based on a region and respectively researches the effects of the two tasks, so that the high-level feature sharing of the classification and positioning tasks is suboptimal, the large span is beneficial to classification but not beneficial to positioning, and the global context information can improve the classification performance. On the basis, a new model (G-RCN) for separating classification and positioning tasks is provided, the difference between the classification and positioning tasks is optimized, and the performance is improved.
Drawings
FIG. 1 is a block flow diagram of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings. It is to be understood that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
The application discloses a target detection method based on deep learning, as shown in FIG. 1, comprising the following steps.
S1, data acquisition: the COCO 2014 detection data set and the PASCAL VOC 2007+2012 data set are collected to construct an original data set, the categories of the original data set are labeled, and construction of the data set required for model training is completed.
The COCO data set comprises 80 categories, with 83000 images used for training and 40000 images used for performance testing of the model; its images are classified according to pixel size into images smaller than 32 × 32, images between 32 × 32 and 96 × 96, and images larger than 96 × 96. The PASCAL VOC data set comprises 11000 training images and 5000 test images. The two data sets are integrated to construct a dedicated data set suited to the experiment, used for training and testing the network model.
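For illustration, the size-based grouping above can be sketched in Python as follows. The helper names and the per-image area criterion are assumptions made here for clarity (COCO practice usually buckets by annotated object area); they are not part of the patent.

```python
from pathlib import Path
from PIL import Image

# Pixel-size buckets from the patent text: below 32 x 32, between
# 32 x 32 and 96 x 96, and above 96 x 96 (measured here by image area).
def size_bucket(width: int, height: int) -> str:
    area = width * height
    if area < 32 * 32:
        return "small"
    if area <= 96 * 96:
        return "medium"
    return "large"

def bucket_images(image_dir: str) -> dict[str, list[Path]]:
    buckets: dict[str, list[Path]] = {"small": [], "medium": [], "large": []}
    for path in sorted(Path(image_dir).glob("*.jpg")):
        with Image.open(path) as img:
            buckets[size_bucket(*img.size)].append(path)
    return buckets
```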
S2, data preprocessing: the data are preprocessed, and the different types of original images are partitioned with different data segmentation methods to ensure the training effect of the model.
Data normalization: Min-Max normalization is applied to each piece of data.
Data segmentation: the obtained original data set is divided into a training set and a test set at a ratio of 7:3; the training set is used to train the model, and the remaining images are used for the final model performance evaluation.
Image scaling: because the images in the original data come from different sources, their sizes differ. To meet the input requirements of the training model, the pixel size of every image is unified to 600 × 600 before input, which improves detection performance.
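A minimal sketch of this preprocessing stage (Min-Max normalization, a 7:3 train/test split, and resizing to 600 × 600), assuming NumPy and Pillow and a flat list of image paths; the function name and fixed random seed are illustrative assumptions, not part of the patent.

```python
import random
import numpy as np
from PIL import Image

def preprocess_and_split(paths: list[str], seed: int = 0):
    # Unify every image to the 600 x 600 input size required by the model.
    images = [np.asarray(Image.open(p).convert("RGB").resize((600, 600)),
                         dtype=np.float32) for p in paths]
    # Min-Max normalization per image, mapping pixel values into [0, 1].
    images = [(im - im.min()) / (im.max() - im.min() + 1e-8) for im in images]
    # 7:3 split into training and test sets, as specified in step S2.
    random.Random(seed).shuffle(images)
    cut = int(0.7 * len(images))
    return images[:cut], images[cut:]
```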
S3, model construction, in detail: a recognition and classification model is built with a low-dimensional convolutional neural network, the training data are input, and construction of the parameter model is completed. A Gap-optimized Region-based Convolutional Network (G-RCN) is constructed, achieving recognition gains with minimal modification and without adding any extra modules or information. G-RCN is applied to Faster R-CNN with a ResNet101 backbone, providing a method of separating the two tasks. ResNet101 is split at the last 6 bottlenecks of the conv4 block, while the conv5 block is still used as the original block in the head. The first convolution layer of the first bottleneck in the conv4 block originally has a stride of 2, which is changed to 1 in the localization branch; the classification and localization branches, however, share the kernels of the first 17 bottlenecks of the conv4 block. The RoI is aligned to an output size of 7 × 7 instead of 14 × 14 to save memory at the conv5 level, and the conv5 block changes its stride to 1 to improve localization performance. All layers of the conv5 block in ResNet101 are used as the head, appended to the RoI pooling layer. The backbone consists of all layers in the first four blocks and generates a shared feature map for classification and localization.
In addition, using the conv5 block in the head achieves better performance than placing the conv5 block in the feature extraction network and using a shared MLP in the head, and it also yields stronger, more abstract feature representations, which benefits classification. The Faster R-CNN used here learns low-level features through multi-task learning, splits from the last few layers of the backbone, and updates the feature map according to the correlation between local parts and the global context. Meanwhile, reducing the stride of the localization branch improves the model's performance, and the classification and localization branches finally share one head to prevent a significant increase in parameters. The Gap-optimized Region improves performance without extra functions by slightly modifying the paradigm, with the improvement coming mainly from three aspects: separation, global context, and stride. Placing the classification and localization tasks on different parts of the detection model improves the performance of the whole target detection model, and rich global context information adds auxiliary, diversified information that markedly improves classification.
Pooling layers are widely used in convolutional neural networks. In classification, segmentation, and detection tasks, the pooling layer is an important component of the feature extraction network, reducing resolution, reducing parameters, and passing on the main information. However, pooling layers also have drawbacks, including suboptimal information loss and alignment inaccuracies; therefore, a 2 × 2 convolution kernel with stride 2 is used instead of pooling layers, preserving information while still reducing the resolution of the feature maps.
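The branch separation described above can be sketched in PyTorch roughly as follows. This is a reconstruction under stated assumptions — torchvision's resnet101, with layer3 taken as the conv4 block (23 bottlenecks) and layer4 as the conv5 block — and not the patented implementation.

```python
import copy
import torch.nn as nn
from torchvision.models import resnet101

def build_grcn_parts():
    net = resnet101(weights=None)

    # Backbone: stem plus the first 17 (shared) bottlenecks of conv4; this
    # produces the feature map shared by classification and localization.
    stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool,
                         net.layer1, net.layer2)
    conv4_shared = net.layer3[:17]

    # Split point: the last 6 bottlenecks of conv4 are duplicated so that
    # classification and localization each get their own branch.
    cls_tail = net.layer3[17:]
    loc_tail = copy.deepcopy(cls_tail)

    # Head: the full conv5 block, appended after RoI pooling, with every
    # stride-2 convolution reduced to stride 1 to help localization.
    head = net.layer4
    for m in head.modules():
        if isinstance(m, nn.Conv2d) and m.stride == (2, 2):
            m.stride = (1, 1)

    return stem, conv4_shared, cls_tail, loc_tail, head

# The patent also replaces pooling layers with a 2 x 2, stride-2 convolution,
# halving resolution while preserving information; a drop-in layer:
def pool_replacement(channels: int) -> nn.Conv2d:
    return nn.Conv2d(channels, channels, kernel_size=2, stride=2)
```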
S4, model saving: when the loss function of the model no longer decreases, the model is saved.
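A sketch of this saving rule in a PyTorch training loop; the loop structure, names, and checkpoint path are assumptions, since the patent only states that the model is saved once its loss stops decreasing.

```python
import torch

def train_and_save_best(model, optimizer, loss_fn, loader,
                        epochs: int = 50, path: str = "grcn_best.pt"):
    best_loss = float("inf")
    for epoch in range(epochs):
        running = 0.0
        for images, targets in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), targets)
            loss.backward()
            optimizer.step()
            running += loss.item()
        epoch_loss = running / len(loader)
        # Save only while the loss is still decreasing; the checkpoint on
        # disk is therefore the model kept by step S4.
        if epoch_loss < best_loss:
            best_loss = epoch_loss
            torch.save(model.state_dict(), path)
```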
S5, model evaluation: the test data are input into the saved network model to complete the evaluation of model performance. The evaluation indexes AP, MPA, and the like are used, computed as

$$\mathrm{AP}=\frac{\sum_{i} p_{ii}}{\sum_{i}\sum_{j} p_{ij}},\qquad \mathrm{MPA}=\frac{1}{N}\sum_{i=1}^{N}\frac{p_{ii}}{\sum_{j=1}^{N} p_{ij}},$$

where $p_{ii}$ denotes the number of correctly classified pixels of class $i$, $p_{ij}$ denotes the number of pixels of class $i$ predicted as class $j$, and $N$ is the number of categories.
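As an illustration of these indexes, the following NumPy sketch computes the overall pixel accuracy and MPA from a confusion matrix; representing the predictions as a confusion matrix is an assumption made here for clarity.

```python
import numpy as np

def pixel_metrics(conf: np.ndarray) -> tuple[float, float]:
    """conf[i, j] = number of pixels of class i predicted as class j."""
    p_ii = np.diag(conf).astype(float)
    per_class = conf.sum(axis=1).astype(float)             # sum_j p_ij
    ap = p_ii.sum() / conf.sum()                           # overall pixel accuracy
    mpa = float(np.mean(p_ii / np.maximum(per_class, 1)))  # mean per-class accuracy
    return ap, mpa

# Toy example with N = 3 categories:
conf = np.array([[50, 2, 3],
                 [4, 40, 6],
                 [1, 5, 60]])
print(pixel_metrics(conf))  # -> roughly (0.877, 0.873)
```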
Although only the preferred embodiments of the present invention have been described in detail, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art, and all changes are included in the scope of the present invention.
Claims (5)
1. A target detection method based on deep learning, characterized by comprising the following steps:
S1, data acquisition: collecting the COCO 2014 detection data set and the PASCAL VOC 2007+2012 data set to construct an original data set, labeling the categories of the original data set, and completing construction of the data set required for model training;
S2, data preprocessing: preprocessing the data and partitioning the different types of original images with different data segmentation methods to ensure the training effect of the model;
S3, model construction: building a recognition and classification model with a low-dimensional convolutional neural network, inputting the training data, and completing construction of the parameter model;
S4, model saving: saving the model when its loss function no longer decreases;
S5, model evaluation: inputting the test data into the saved network model to complete evaluation of the model's performance.
2. The deep learning-based target detection method of claim 1, wherein in step S1 the images in the original data set are used for performance testing of the model; the images in the COCO data set are classified according to pixel size into images smaller than 32 × 32, images between 32 × 32 and 96 × 96, and images larger than 96 × 96; and the PASCAL VOC data set comprises a training set and a test set, the two data sets being integrated to construct training and test sets suitable for the network model.
3. The deep learning-based target detection method of claim 2, wherein step S2 comprises data segmentation and image scaling; the data segmentation divides the obtained original data set into a training set and a test set at a ratio of 7:3, the training set being used to train the model and the remaining images for the final model performance evaluation;
the image scaling unifies the pixel size of the images to 600 × 600 before the images are input.
4. The deep learning-based target detection method of claim 3, wherein in step S3: a Gap-optimized Region-based Convolutional Network (G-RCN) is constructed to achieve recognition gains with minimal modification and without adding any extra modules or information; G-RCN is applied to Faster R-CNN with a ResNet101 backbone; ResNet101 is split at the last 6 bottlenecks of the conv4 block, while the conv5 block is used as the original block in the head; the first convolution layer of the first bottleneck in the conv4 block originally has a stride of 2, which is changed to 1 in the localization branch; the classification and localization branches share the kernels of the first 17 bottlenecks of the conv4 block; the conv5 block changes its stride to 1; all layers of the conv5 block in ResNet101 are used as the head, appended to the RoI pooling layer; and the backbone consists of all layers in the first four blocks and generates a shared feature map for classification and localization.
5. The deep learning-based target detection method of claim 4, wherein in step S5 the evaluation indexes AP and MPA are used, computed as

$$\mathrm{AP}=\frac{\sum_{i} p_{ii}}{\sum_{i}\sum_{j} p_{ij}},\qquad \mathrm{MPA}=\frac{1}{N}\sum_{i=1}^{N}\frac{p_{ii}}{\sum_{j=1}^{N} p_{ij}},$$

where $p_{ii}$ represents the number of correctly classified pixels of class $i$; $p_{ij}$ represents the number of pixels of class $i$ predicted as class $j$; $i, j$ are class indices; and $N$ is the number of categories.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210259711.5A CN114693966A (en) | 2022-03-16 | 2022-03-16 | Target detection method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210259711.5A CN114693966A (en) | 2022-03-16 | 2022-03-16 | Target detection method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114693966A true CN114693966A (en) | 2022-07-01 |
Family
ID=82139098
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210259711.5A Pending CN114693966A (en) | 2022-03-16 | 2022-03-16 | Target detection method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114693966A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115902814A (en) * | 2023-03-09 | 2023-04-04 | 中国人民解放军国防科技大学 | Target recognition model performance evaluation method and device based on information space measurement |
CN117741070A (en) * | 2024-02-21 | 2024-03-22 | 山东多瑞电子科技有限公司 | Deep learning-based gas safety intelligent detection method |
CN117741070B (en) * | 2024-02-21 | 2024-05-03 | 山东多瑞电子科技有限公司 | Deep learning-based gas safety intelligent detection method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111210443B (en) | Deformable convolution mixing task cascading semantic segmentation method based on embedding balance | |
CN113052210B (en) | Rapid low-light target detection method based on convolutional neural network | |
CN109241982B (en) | Target detection method based on deep and shallow layer convolutional neural network | |
CN109190752A (en) | The image, semantic dividing method of global characteristics and local feature based on deep learning | |
CN114067143B (en) | Vehicle re-identification method based on double sub-networks | |
CN112150493A (en) | Semantic guidance-based screen area detection method in natural scene | |
CN114693966A (en) | Target detection method based on deep learning | |
CN111832453B (en) | Unmanned scene real-time semantic segmentation method based on two-way deep neural network | |
CN113011288A (en) | Mask RCNN algorithm-based remote sensing building detection method | |
CN110598788A (en) | Target detection method and device, electronic equipment and storage medium | |
CN112488229A (en) | Domain self-adaptive unsupervised target detection method based on feature separation and alignment | |
CN110334709A (en) | Detection method of license plate based on end-to-end multitask deep learning | |
US12087046B2 (en) | Method for fine-grained detection of driver distraction based on unsupervised learning | |
CN111767854B (en) | SLAM loop detection method combined with scene text semantic information | |
CN113850136A (en) | Yolov5 and BCNN-based vehicle orientation identification method and system | |
CN108363962B (en) | Face detection method and system based on multi-level feature deep learning | |
CN111462090A (en) | Multi-scale image target detection method | |
CN113066089A (en) | Real-time image semantic segmentation network based on attention guide mechanism | |
CN113160291A (en) | Change detection method based on image registration | |
CN117912058A (en) | Cattle face recognition method | |
CN117218643A (en) | Fruit identification method based on lightweight neural network | |
CN117422998A (en) | Improved river float identification algorithm based on YOLOv5s | |
CN116797830A (en) | Image risk classification method and device based on YOLOv7 | |
US20240005635A1 (en) | Object detection method and electronic apparatus | |
CN116758610A (en) | Attention mechanism and feature fusion-based light-weight human ear recognition method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||