CN114693966A - Target detection method based on deep learning - Google Patents


Info

Publication number
CN114693966A
CN114693966A (application CN202210259711.5A)
Authority
CN
China
Prior art keywords
model
data
data set
training
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210259711.5A
Other languages
Chinese (zh)
Inventor
潘晓光
王小华
陈亮
张雅娜
张娜
姚珊珊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanxi Sanyouhe Smart Information Technology Co Ltd
Original Assignee
Shanxi Sanyouhe Smart Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanxi Sanyouhe Smart Information Technology Co Ltd filed Critical Shanxi Sanyouhe Smart Information Technology Co Ltd
Priority to CN202210259711.5A priority Critical patent/CN114693966A/en
Publication of CN114693966A publication Critical patent/CN114693966A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of machine learning, and particularly relates to a target detection method based on deep learning, comprising the following steps. Data acquisition: collect the COCO 2014 detection data set and the PASCAL VOC 2007+2012 data set to construct an original data set, label the categories of the original data set, and complete the construction of the data set required for model training. Data preprocessing: preprocess the data, dividing the different types of original images with different data segmentation methods to ensure the training effect of the model. Model construction: build a recognition and classification model with a low-dimensional convolutional neural network, input the training data, and complete the construction of the parameter model. Model saving: when the loss function of the model no longer decreases, save the model. Model evaluation: input the test data into the saved network model to complete the evaluation of model performance. The invention proposes a new model (G-RCN) that separates the classification and localization tasks, optimizes for the difference between the two tasks, and achieves improved performance.

Description

Target detection method based on deep learning
Technical Field
The invention belongs to the technical field of machine learning, and particularly relates to a target detection method based on deep learning.
Background
At present, deep neural networks have achieved super-human performance on 1000-class object recognition tasks, but detection networks have not achieved human-comparable performance on 80-class object detection tasks, which indicates a gap between the recognition task and the detection task. Object detection requires both strong classification performance and the ability to precisely localize an object among an essentially unlimited number of candidate locations.
Problems or disadvantages of the prior art: current target detection models use a shared feature map to perform the classification and localization tasks simultaneously, but the original Faster R-CNN and R-CNN variants that partially separate the feature maps achieve only comparable performance, which remains unsatisfactory.
Disclosure of Invention
Based on this, a target detection method based on deep learning is provided. The construction of an original data set is completed by collecting the COCO 2014 detection data set and the PASCAL VOC 2007+2012 data set, and the images in the COCO data set are classified according to pixel size. After data collection, the data are preprocessed, including segmentation, normalization, and the like. The preprocessed data are input into the constructed G-RCN fusion network model to train the network; the model is saved once its loss function no longer decreases, completing model construction. The test data are then input into the saved network model to complete model evaluation and measure the model's performance.
The application discloses a target detection method based on deep learning, which comprises the following steps,
s1, data acquisition: collecting a COCO 2014 detection data set and a PASCAL VOC 2007+2012 data set to complete the construction of an original data set, labeling the categories of the original data set, and completing the construction of the data set required by model training;
s2, preprocessing data: preprocessing data, and dividing different types of original data pictures by different data segmentation methods to ensure the training effect of the model;
s3, model construction: building a recognition classification model by adopting a low-dimensional convolutional neural network, inputting training data, and completing the building of a parameter model;
s4, model storage: when the loss function of the model is not reduced any more, the model is saved;
s5, model evaluation: and inputting the test data into the stored network model to complete the evaluation of the model performance.
Further, in step S1, the images in the original data set are used for performance testing of the model, and the images in the COCO data set are classified according to pixel size, the classification being based on image sizes smaller than 32 × 32, between 32 × 32 and 96 × 96, and larger than 96 × 96; the PASCAL VOC data set includes a training set and a testing set, and the two data sets are integrated to construct the training and testing sets suitable for the network model.
Further, in step S2: the method comprises data segmentation and image scaling, wherein the data segmentation divides the obtained original data set into a training set and a testing set in a 7:3 ratio; the training set is used for training the model, and the remaining images are used for the final model performance evaluation;
the image scaling is as follows: the pixel sizes of the images are unified to 600 × 600 before the images are input.
Further, in step S3: a gap-optimized region-based convolutional network is constructed to realize recognition of minimal texture without adding any additional module or information. G-RCN is adopted on Faster R-CNN with a ResNet101 backbone: ResNet101 is split at the last 6 bottlenecks of the conv4 block, while the conv5 block is still used as the original block in the head. The first convolution layer of the first bottleneck in the conv4 block originally has a stride of 2, which is modified to 1 in the localization branch; the classification and localization branches share the kernels of the first 17 bottlenecks of the conv4 block, and the conv5 stride is modified to 1. All layers of the conv5 block in ResNet101 are used as the head, appended to the RoI pooling layer; the backbone is composed of all layers in the first four blocks and generates a shared feature map for classification and localization.
Further, in step S5, the evaluation indexes PA (pixel accuracy) and MPA (mean pixel accuracy) are used, with the following formulas:

$$PA = \frac{\sum_{i=0}^{N} p_{ii}}{\sum_{i=0}^{N}\sum_{j=0}^{N} p_{ij}}$$

where $p_{ii}$ represents the number of correctly classified pixels, $p_{ij}$ represents the number of pixels of class $i$ predicted as class $j$, and $i$, $j$ are class indices;

$$MPA = \frac{1}{N+1}\sum_{i=0}^{N} \frac{p_{ii}}{\sum_{j=0}^{N} p_{ij}}$$

where $N$ is the number of categories.
Compared with the prior art, the invention has the following beneficial effects:
the invention designs a target detection method based on deep learning, which analyzes the classification and positioning tasks of a target detection model based on a region and respectively researches the effects of the two tasks, so that the high-level feature sharing of the classification and positioning tasks is suboptimal, the large span is beneficial to classification but not beneficial to positioning, and the global context information can improve the classification performance. On the basis, a new model (G-RCN) for separating classification and positioning tasks is provided, the difference between the classification and positioning tasks is optimized, and the performance is improved.
Drawings
FIG. 1 is a block flow diagram of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of them. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without creative effort, shall fall within the protection scope of the present invention.
The application discloses a target detection method based on deep learning, as shown in fig. 1, comprising the following steps,
s1, data acquisition: collecting a COCO 2014 detection data set and a PASCAL VOC 2007+2012 data set to complete the construction of an original data set, labeling the categories of the original data set, and completing the construction of the data set required by model training;
Collecting the COCO 2014 detection data set and the PASCAL VOC 2007+2012 data set completes the construction of the original data set. The COCO data set comprises 80 categories; 83,000 images are used for training and 40,000 images for performance testing of the model, and the images in the COCO data set are classified according to pixel size as smaller than 32 × 32, between 32 × 32 and 96 × 96, and larger than 96 × 96. The PASCAL VOC data set comprises 11,000 training images and 5,000 test images. The two data sets are integrated to construct a dedicated data set suitable for the experiment, used for training and testing the network model.
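The size-based classification above can be sketched as follows. This is an illustrative sketch, not from the patent: the function and bucket names are assumptions, and image area (width × height) is used as the grouping criterion, mirroring the 32 × 32 and 96 × 96 thresholds.

```python
def size_bucket(width, height):
    """Assign an image to a size bucket by pixel area, using the
    32x32 and 96x96 thresholds described in the text."""
    area = width * height
    if area < 32 * 32:
        return "small"
    if area <= 96 * 96:
        return "medium"
    return "large"

def bucket_dataset(sizes):
    """Group a list of (width, height) pairs into the three size buckets."""
    buckets = {"small": [], "medium": [], "large": []}
    for w, h in sizes:
        buckets[size_bucket(w, h)].append((w, h))
    return buckets
```

For example, a 20 × 20 crop lands in `small`, a 64 × 64 crop in `medium`, and a 128 × 128 crop in `large`.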
S2, preprocessing data: preprocessing data, and dividing different types of original data pictures by different data segmentation methods to ensure the training effect of the model;
data normalization: Min-Max normalization was performed for each piece of data.
Data segmentation: the obtained original data set is divided into a training set and a test set in a 7:3 ratio; the training set is used for training the model, and the remaining images are used for the final model performance evaluation.
Image scaling: because the images in the original data come from different sources, their sizes differ. To meet the input requirement of the training model, the pixel sizes of the images are unified to 600 × 600 before input, which improves the model's detection performance.
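The 7:3 split can be sketched as a shuffled partition (the function name and fixed seed are illustrative assumptions; the target size constant mirrors the 600 × 600 requirement above):

```python
import random

TARGET_SIZE = (600, 600)  # all images are resized to this before input

def split_7_3(items, seed=0):
    """Shuffle a dataset deterministically and split it 7:3 into
    training and test subsets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * 0.7)
    return items[:cut], items[cut:]
```

With 10 items the split yields 7 training and 3 test samples, and every item appears in exactly one of the two subsets.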
S3, the specific method for model construction is as follows: a recognition and classification model is built with a low-dimensional convolutional neural network, training data are input, and the construction of the parameter model is completed. A Gap-optimized Region-based Convolutional Network (G-RCN) is constructed, realizing recognition of minimal texture without adding any extra module or information. G-RCN is adopted on Faster R-CNN with a ResNet101 backbone, providing a method for separating the two tasks. ResNet101 is split at the last 6 bottlenecks of the conv4 block, while the conv5 block is still used as the original block in the head. The first convolution layer of the first bottleneck in the conv4 block originally has a stride of 2, which is modified to 1 in the localization branch; the classification and localization branches share the kernels of the first 17 bottlenecks of the conv4 block. The RoI alignment output size is 7 × 7 instead of 14 × 14 to save memory at the conv5 level, and the conv5 stride is modified to 1 to improve localization performance. All layers of the conv5 block in ResNet101 are used as the head, appended to the RoI pooling layer. The backbone consists of all layers in the first four blocks and generates a shared feature map for classification and localization. In addition, using the conv5 block as the head achieves better performance than placing the conv5 block into the feature extraction network and using a shared MLP at the head, while obtaining stronger and more abstract feature representations, which benefits classification. The Faster R-CNN used learns low-level features by means of multi-task learning, splits from the last few layers of the backbone, and updates the feature map according to the correlation between local parts and the global context.
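The branch split described above can be sketched structurally in plain Python. This is a hedged reading of the (somewhat ambiguous) text, not a runnable network: the class and variable names are illustrative, plain objects stand in for ResNet101 stages, and the stride assignment reflects one interpretation in which the localization branch's first split bottleneck keeps stride 1 while the classification branch keeps the original stride 2. ResNet101's conv4 stage has 23 bottlenecks, which is consistent with sharing the first 17 and splitting the last 6.

```python
CONV4_BOTTLENECKS = 23    # bottlenecks in ResNet101's conv4 stage
SHARED_BOTTLENECKS = 17   # shared by classification and localization
SPLIT_BOTTLENECKS = CONV4_BOTTLENECKS - SHARED_BOTTLENECKS  # last 6, duplicated

class Bottleneck:
    """Stand-in for a residual bottleneck; only records its stride."""
    def __init__(self, stride=1):
        self.stride = stride

def build_g_rcn_conv4():
    """Return (shared, classification branch, localization branch)
    as lists of stand-in bottlenecks."""
    shared = [Bottleneck() for _ in range(SHARED_BOTTLENECKS)]
    # Localization branch: stride lowered from 2 to 1 to keep spatial
    # resolution for precise box regression (per the text above).
    loc = [Bottleneck(stride=1)] + [Bottleneck() for _ in range(SPLIT_BOTTLENECKS - 1)]
    cls = [Bottleneck(stride=2)] + [Bottleneck() for _ in range(SPLIT_BOTTLENECKS - 1)]
    return shared, cls, loc
```

The point of the sketch is the parameter accounting: 17 bottlenecks stay shared, only the last 6 are duplicated per branch, and a shared conv5 head caps both branches, so separation costs few extra parameters.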
Meanwhile, the stride of the localization branch is reduced to improve model performance, and the classification and localization branches share one head to prevent a significant increase in parameters. The Gap-optimized Region approach improves performance without additional functions through slight modifications of the paradigm, mainly in three aspects: separation, global context for detection, and stride. Placing the classification and localization tasks on different parts of the detection model improves the performance of the whole target detection model. Global context information is rich; adding more auxiliary and diversified information can significantly improve classification. Pooling layers are widely used in convolutional neural networks: in classification, segmentation, and detection tasks, the pooling layer is an important component of the feature extraction network, playing an important role in reducing resolution, reducing parameters, and propagating the main information. However, pooling layers also have drawbacks, including loss of information and alignment inaccuracies; therefore a 2 × 2 convolution kernel with a stride of 2 is used instead of pooling layers, preserving information while reducing the resolution of the feature maps.
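The pooling replacement above — a 2 × 2 convolution with stride 2 — can be sketched in pure Python. This is a toy single-kernel, single-channel version (real networks use many learned kernels per channel); the function name is an illustrative assumption:

```python
def conv2x2_stride2(feature, kernel):
    """Apply one 2x2 kernel with stride 2 to a 2-D feature map
    (list of lists). Like a pooling layer, it halves each spatial
    dimension, but with learnable weights instead of a fixed
    max/average rule."""
    h, w = len(feature), len(feature[0])
    out = []
    for i in range(0, h - 1, 2):
        row = []
        for j in range(0, w - 1, 2):
            row.append(sum(feature[i + di][j + dj] * kernel[di][dj]
                           for di in range(2) for dj in range(2)))
        out.append(row)
    return out
```

With the kernel set to all 0.25 this reduces to 2 × 2 average pooling, which makes the "pooling is a special case of this convolution" intuition concrete; training lets the network learn a better weighting.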
S4 model saving: when the loss function of the model is no longer decreasing, the model is saved.
S5, model evaluation: the test data are input into the saved network model to complete the evaluation of model performance. To complete this evaluation, indexes such as PA (pixel accuracy) and MPA (mean pixel accuracy) are used, with the following formulas:

$$PA = \frac{\sum_{i=0}^{N} p_{ii}}{\sum_{i=0}^{N}\sum_{j=0}^{N} p_{ij}}$$

where $p_{ii}$ represents the number of correctly classified pixels and $p_{ij}$ represents the number of pixels of class $i$ predicted as class $j$;

$$MPA = \frac{1}{N+1}\sum_{i=0}^{N} \frac{p_{ii}}{\sum_{j=0}^{N} p_{ij}}$$

where $N$ is the number of categories.
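The two metrics above can be computed directly from a confusion matrix `conf`, where `conf[i][j]` counts pixels of class `i` predicted as class `j`. A minimal pure-Python sketch (function names are illustrative assumptions):

```python
def pixel_accuracy(conf):
    """PA: correctly classified pixels (the diagonal) over all pixels."""
    total = sum(sum(row) for row in conf)
    correct = sum(conf[i][i] for i in range(len(conf)))
    return correct / total

def mean_pixel_accuracy(conf):
    """MPA: per-class pixel accuracy, averaged over all classes."""
    n = len(conf)
    return sum(conf[i][i] / sum(conf[i]) for i in range(n)) / n
```

For example, with `conf = [[4, 0], [2, 2]]`, PA is 6/8 = 0.75 and MPA is (4/4 + 2/4)/2 = 0.75; the two differ as soon as the classes are imbalanced.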
Although only the preferred embodiments of the present invention have been described in detail, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art, and all changes are included in the scope of the present invention.

Claims (5)

1. A target detection method based on deep learning is characterized by comprising the following steps,
s1, data acquisition: collecting a COCO 2014 detection data set and a PASCAL VOC 2007+2012 data set to complete the construction of an original data set, labeling the categories of the original data set, and completing the construction of the data set required by model training;
s2, preprocessing data: preprocessing data, and dividing different types of original data pictures by different data segmentation methods to ensure the training effect of the model;
s3, model construction: building a recognition classification model by adopting a low-dimensional convolutional neural network, inputting training data, and completing the building of a parameter model;
s4, model storage: when the loss function of the model is not reduced any more, the model is saved;
s5, model evaluation: and inputting the test data into the stored network model to complete the evaluation of the model performance.
2. The method for detecting an object based on deep learning of claim 1, wherein in step S1, the images in the raw data set are used for performance testing of the model, and the images in the COCO data set are classified according to pixel size, the classification being based on sizes smaller than 32 × 32, between 32 × 32 and 96 × 96, and larger than 96 × 96; the PASCAL VOC data set includes a training set and a testing set, and the two data sets are integrated to construct the training and testing sets suitable for the network model.
3. The deep learning-based object detection method according to claim 2, wherein in step S2: the method comprises data segmentation and image scaling, wherein the data segmentation divides the obtained original data set into a training set and a testing set in a 7:3 ratio; the training set is used for training the model, and the remaining images are used for the final model performance evaluation;
the image scaling is as follows: the pixel sizes of the images are unified to 600 × 600 before the images are input.
4. The deep learning-based object detection method according to claim 3, wherein in step S3: a gap-optimized region-based convolutional network is constructed to realize recognition of minimal texture without adding any additional module or information; G-RCN is adopted on Faster R-CNN with a ResNet101 backbone, ResNet101 is split at the last 6 bottlenecks of the conv4 block, and the conv5 block is used as the original block in the head; the first convolution layer of the first bottleneck in the conv4 block originally has a stride of 2, which is modified to 1 in the localization branch; the classification and localization branches share the kernels of the first 17 bottlenecks of the conv4 block, and the conv5 stride is modified to 1; all layers of the conv5 block in ResNet101 are used as the head, appended to the RoI pooling layer; the backbone is composed of all layers in the first four blocks and generates a shared feature map for classification and localization.
5. The method for detecting an object based on deep learning of claim 4, wherein in step S5, the evaluation indexes PA and MPA are used, with the following formulas:

$$PA = \frac{\sum_{i=0}^{N} p_{ii}}{\sum_{i=0}^{N}\sum_{j=0}^{N} p_{ij}}$$

where $p_{ii}$ represents the number of correctly classified pixels, $p_{ij}$ represents the number of pixels of class $i$ predicted as class $j$, and $i$, $j$ are class indices;

$$MPA = \frac{1}{N+1}\sum_{i=0}^{N} \frac{p_{ii}}{\sum_{j=0}^{N} p_{ij}}$$

where $N$ is the number of categories.
CN202210259711.5A 2022-03-16 2022-03-16 Target detection method based on deep learning Pending CN114693966A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210259711.5A CN114693966A (en) 2022-03-16 2022-03-16 Target detection method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210259711.5A CN114693966A (en) 2022-03-16 2022-03-16 Target detection method based on deep learning

Publications (1)

Publication Number Publication Date
CN114693966A true CN114693966A (en) 2022-07-01

Family

ID=82139098

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210259711.5A Pending CN114693966A (en) 2022-03-16 2022-03-16 Target detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN114693966A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115902814A (en) * 2023-03-09 2023-04-04 中国人民解放军国防科技大学 Target recognition model performance evaluation method and device based on information space measurement
CN117741070A (en) * 2024-02-21 2024-03-22 山东多瑞电子科技有限公司 Deep learning-based gas safety intelligent detection method
CN117741070B (en) * 2024-02-21 2024-05-03 山东多瑞电子科技有限公司 Deep learning-based gas safety intelligent detection method

Similar Documents

Publication Publication Date Title
CN111210443B (en) Deformable convolution mixing task cascading semantic segmentation method based on embedding balance
CN113052210B (en) Rapid low-light target detection method based on convolutional neural network
CN109241982B (en) Target detection method based on deep and shallow layer convolutional neural network
CN109190752A (en) The image, semantic dividing method of global characteristics and local feature based on deep learning
CN114067143B (en) Vehicle re-identification method based on double sub-networks
CN112150493A (en) Semantic guidance-based screen area detection method in natural scene
CN114693966A (en) Target detection method based on deep learning
CN111832453B (en) Unmanned scene real-time semantic segmentation method based on two-way deep neural network
CN113011288A (en) Mask RCNN algorithm-based remote sensing building detection method
CN110598788A (en) Target detection method and device, electronic equipment and storage medium
CN112488229A (en) Domain self-adaptive unsupervised target detection method based on feature separation and alignment
CN110334709A (en) Detection method of license plate based on end-to-end multitask deep learning
US12087046B2 (en) Method for fine-grained detection of driver distraction based on unsupervised learning
CN111767854B (en) SLAM loop detection method combined with scene text semantic information
CN113850136A (en) Yolov5 and BCNN-based vehicle orientation identification method and system
CN108363962B (en) Face detection method and system based on multi-level feature deep learning
CN111462090A (en) Multi-scale image target detection method
CN113066089A (en) Real-time image semantic segmentation network based on attention guide mechanism
CN113160291A (en) Change detection method based on image registration
CN117912058A (en) Cattle face recognition method
CN117218643A (en) Fruit identification method based on lightweight neural network
CN117422998A (en) Improved river float identification algorithm based on YOLOv5s
CN116797830A (en) Image risk classification method and device based on YOLOv7
US20240005635A1 (en) Object detection method and electronic apparatus
CN116758610A (en) Attention mechanism and feature fusion-based light-weight human ear recognition method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination