CN110889425A - Target detection method based on deep learning - Google Patents

Info

Publication number
CN110889425A
Authority
CN
China
Prior art keywords
image
target detection
grid
target
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811644255.6A
Other languages
Chinese (zh)
Inventor
邓远志
林淼
刘志永
陈志列
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
EVOC Intelligent Technology Co Ltd
Original Assignee
EVOC Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-12-29
Filing date
2018-12-29
Publication date
2020-03-17
Application filed by EVOC Intelligent Technology Co Ltd
Priority to CN201811644255.6A
Publication of CN110889425A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a target detection method based on deep learning. The method trains the model directly on whole images and merges the two stages of candidate region extraction and feature detection into one: the class and the rectangular bounding box of each real target are regressed directly at multiple positions of the image. Stored features are read and written in video memory, and a softmax function replaces the SVM for classifying the features, which increases the speed of target detection. Training directly on whole images also distinguishes targets from background regions more reliably, which improves the precision of target detection.

Description

Target detection method based on deep learning
Technical Field
The invention relates to the technical field of computer vision, in particular to a target detection method based on deep learning.
Background
Target detection underlies complex visual tasks such as target retrieval, target tracking, abnormal behavior detection and scene understanding. Detecting targets in images or video algorithmically provides more evidence for higher-level decision-making, so a good target detection model is an essential component.
Currently, target detection methods based on the region-based convolutional neural network (R-CNN) dominate the field. The detection process of such a method is as follows. First, a set of candidate regions is generated; the candidate regions are possible target positions found in advance using information such as texture, edges and color in the image. Next, all candidate regions are input as training samples into a convolutional neural network (CNN) for training. The CNN features extracted from each candidate region are then input into an SVM classifier for training. Finally, bounding-box regression is applied to the candidate regions classified by the SVM to correct them, so that the window extracted from each candidate region matches the real target window more closely.
In the process of implementing the invention, the inventor finds that at least the following technical problems exist in the prior art:
In R-CNN-based target detection algorithms, training must be carried out in two parts, candidate region training and CNN feature training, and the algorithm must frequently read and write the stored features through a hard disk. As a result, the conventional method is comparatively slow when detecting images of the same resolution on the same hardware platform.
Disclosure of Invention
The target detection method based on deep learning provided by the invention merges the two stages of candidate region extraction and feature detection, reads and writes the stored features in video memory, and classifies the features with a softmax function instead of an SVM, so that both the speed and the precision of target detection can be improved.
The invention provides a target detection method based on deep learning, which comprises the following steps:
(1) loading the image and the corresponding annotation information file into computer video memory, and randomly initializing a weight matrix;
the annotation information file comprises the category of each real target in the image and the coordinates of a rectangular bounding box containing the real target;
(2) dividing the image into a grid to obtain a plurality of grid sub-images, and predicting the candidate regions of each grid sub-image;
(3) performing a convolution operation on the matrix vectors of the plurality of candidate regions of each grid sub-image to obtain a feature map of the grid sub-image, performing convolution operations on the feature map on different convolution layers using convolution kernels of different scales, and fusing the feature maps of different scales corresponding to each grid sub-image;
(4) performing a pooling operation on the fused feature map, and convolving the pooled feature map with a fixed-scale convolution kernel to further optimize the feature map;
(5) performing a pooling operation on the output feature map of step (4) using a filter;
(6) taking the output of step (5) as the input of the fully connected layer, and performing a convolution operation with a fixed stride;
(7) taking the output of step (6) as the input of the classification function Softmax, calculating the confidence of the image target class and the predicted coordinate information, calculating the error between these predictions and the real annotation information, and calculating the corresponding gradient values from the error to update the weight matrix of each layer;
(8) stopping training if the number of training iterations reaches the set number, and otherwise returning to step (3);
(9) obtaining a trained model once the set number of training iterations is reached, and performing product calculation on the image to be detected and the model weight matrix to obtain the target detection result in the image.
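The control flow of steps (7) and (8) can be illustrated with a minimal sketch. It assumes a single linear layer, a squared Euclidean error and plain gradient descent with a hypothetical learning rate; the patent does not specify the optimizer, and the weights start at zero here (instead of random values) only to keep the run reproducible. All function names are illustrative.

```python
def predict(weights, features):
    """One linear layer: each output is a weighted sum of the input features."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def squared_error(predicted, target):
    """Squared Euclidean distance between the prediction and the annotation."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target))


def train(weights, features, target, iterations, learning_rate=0.1):
    """Apply a fixed number of gradient updates, then stop (steps (7) and (8))."""
    for _ in range(iterations):
        predicted = predict(weights, features)
        for i, row in enumerate(weights):
            # d(error)/d(w_ij) = 2 * (predicted_i - target_i) * features_j
            residual = predicted[i] - target[i]
            for j in range(len(row)):
                row[j] -= learning_rate * 2.0 * residual * features[j]
    return weights


features = [1.0, 0.5]          # hypothetical input features
target = [0.2, 0.8]            # hypothetical annotated values
weights = [[0.0, 0.0], [0.0, 0.0]]

error_before = squared_error(predict(weights, features), target)
train(weights, features, target, iterations=50)
error_after = squared_error(predict(weights, features), target)
```

Training stops after the preset iteration count regardless of the remaining error, which mirrors the stopping rule in step (8).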
In the target detection method based on deep learning provided by the embodiment of the invention, the model is trained directly on whole images and the target detection problem is converted into a regression problem: the class and the rectangular bounding box of each real target are regressed directly at a plurality of positions of the input image. Compared with the prior art, on the one hand, candidate region extraction and feature detection are merged into one stage, and during training the stored features are read and written in video memory rather than through a hard disk, which markedly improves read/write efficiency and therefore the speed of target detection. On the other hand, convolution operations are performed on different convolution layers with convolution kernels of different scales, and the resulting feature maps of different scales are fused so that the method adapts to real targets of multiple scales; combined with classifying the features with a softmax function instead of an SVM, this improves the precision of target detection.
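For reference, the softmax function mentioned above maps a vector of raw class scores to confidences that sum to 1. The sketch below is a generic, numerically stable formulation; the three class scores are hypothetical values chosen for illustration, not outputs of the patented model.

```python
import math


def softmax(scores):
    """Map raw class scores to confidences that sum to 1."""
    # Subtract the largest score first so that exp() cannot overflow.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three classes: pedestrian, vehicle, background.
confidences = softmax([2.0, 1.0, 0.1])
best_class = max(range(len(confidences)), key=confidences.__getitem__)
```

The index of the largest confidence is taken as the predicted class, and the confidence itself can be reported alongside the detection.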
Drawings
FIG. 1 is a flowchart of a deep learning-based target detection method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an application of the deep learning-based target detection method in a security platform.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a target detection method based on deep learning. As shown in FIG. 1, the method comprises the following steps:
(1) Load the image and the corresponding annotation information file into computer video memory, and randomly initialize a weight matrix.
The annotation information file comprises the category of each real target in the image and the coordinates of a rectangular bounding box containing the real target.
(2) Divide the image into a grid to obtain a plurality of grid sub-images, and predict the candidate regions of each grid sub-image.
(3) Perform a convolution operation on the matrix vectors of the plurality of candidate regions of each grid sub-image to obtain a feature map of the grid sub-image, perform convolution operations on the feature map on different convolution layers using convolution kernels of different scales, and fuse the feature maps of different scales corresponding to each grid sub-image.
(4) Perform a pooling operation on the fused feature map, and convolve the pooled feature map with a fixed-scale convolution kernel to further optimize the feature map.
Step (4) reduces the feature dimensionality and makes the features more robust to interference (for example, interference caused by operations such as image stretching and rotation).
(5) Perform a pooling operation on the output feature map of step (4) using a filter.
(6) Take the output of step (5) as the input of the fully connected layer, and perform a convolution operation with a fixed stride.
Specifically, the features output in step (5) are scaled to 1 × 1000, that is, a 1000-dimensional feature map is obtained, and the feature map is then convolved with a fixed stride.
(7) Take the output of step (6) as the input of the classification function Softmax: first calculate the confidence of the image target class and the predicted coordinate information, then calculate the error between these predictions and the real annotation information, and calculate the corresponding gradient values from the error to update the weight matrix of each layer.
Specifically, the image features output by step (6) are used as the input of the classification function Softmax to calculate the confidence of each target class in the image and the coordinate information of the corresponding target. The standard Euclidean distance between these values and the annotated class and coordinate information of the targets in the current image is taken as the error. The corresponding gradient values are calculated from the error and added to the weight matrices of all layers to update them, so that the target confidence and coordinates obtained in the next training iteration are closer to the real values.
(8) Stop training if the number of training iterations reaches the set number; otherwise, return to step (3).
(9) Obtain a trained model once the set number of training iterations is reached, and perform product calculation on the image to be detected and the model weight matrix to obtain the target detection result in the image.
Specifically, once the set number of training iterations is reached, the trained model is obtained. An image to be detected is then input and passed through the convolution and pooling calculations of steps (2) to (6), and the category and coordinate information of the detected targets is finally obtained through the classification function Softmax. In other words, product calculation is performed on the image to be detected and the model weight matrix to obtain the target detection result in the image.
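The final decoding at inference time can be sketched as follows. The layout of each grid sub-image's output vector (class scores followed by four box coordinates), the class names and the confidence threshold are all assumptions made for illustration; the patent does not describe the output layout or any thresholding.

```python
import math

CLASS_NAMES = ["pedestrian", "vehicle", "background"]  # hypothetical labels


def softmax(scores):
    """Map raw class scores to confidences that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_cell(output, min_confidence=0.5):
    """Turn one grid sub-image's output vector into a detection, or None.

    Assumed layout: len(CLASS_NAMES) raw scores, then
    (x_min, y_min, x_max, y_max).
    """
    n = len(CLASS_NAMES)
    confidences = softmax(output[:n])
    best = max(range(n), key=confidences.__getitem__)
    if CLASS_NAMES[best] == "background" or confidences[best] < min_confidence:
        return None
    return {
        "category": CLASS_NAMES[best],
        "confidence": confidences[best],
        "box": tuple(output[n:n + 4]),
    }


def detect(cell_outputs):
    """Decode every grid sub-image and keep only the actual detections."""
    decoded = (decode_cell(output) for output in cell_outputs)
    return [d for d in decoded if d is not None]


# Two hypothetical output vectors: one pedestrian, one background cell.
outputs = [
    [4.0, 0.5, 0.1, 10, 20, 50, 120],
    [0.1, 0.2, 3.0, 0, 0, 0, 0],
]
results = detect(outputs)
```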
In the target detection method based on deep learning provided by the embodiment of the invention, the model is trained directly on whole images and the target detection problem is converted into a regression problem: the class and the rectangular bounding box of each real target are regressed directly at a plurality of positions of the input image. Compared with the prior art, on the one hand, steps (2) to (7) merge the whole sequence from candidate region extraction to Softmax feature classification into a single process, so that training runs end to end from input to output. During training the stored features are read and written in video memory rather than through a hard disk, which markedly improves read/write efficiency and therefore the speed of target detection. On the other hand, convolution operations are performed on different convolution layers with convolution kernels of different scales, and the resulting feature maps of different scales are fused to adapt to real targets of multiple scales. Combined with classifying the features with a softmax function instead of an SVM, the method keeps good performance in high-dimensional feature classification and improves the precision of target detection.
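The multi-scale convolution and fusion described above might look like the following minimal sketch. It assumes zero padding, so that the maps produced by 3x3 and 5x5 kernels have the same size, and it assumes that fusion means stacking the maps per pixel; the patent does not specify either choice. The averaging kernels and all function names are illustrative only.

```python
def conv2d_same(image, kernel):
    """Slide an odd-sized square kernel over a 2-D list with zero padding.

    As in most deep learning libraries, the kernel is not flipped
    (cross-correlation), and the output has the same size as the input.
    """
    k = len(kernel)
    pad = k // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    # Positions outside the image count as zero.
                    if 0 <= rr < height and 0 <= cc < width:
                        acc += image[rr][cc] * kernel[i][j]
            out[r][c] = acc
    return out


def fuse(feature_maps):
    """Stack same-sized maps so each pixel holds one value per scale."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [[fm[r][c] for fm in feature_maps] for c in range(width)]
        for r in range(height)
    ]


def box_kernel(size):
    """Averaging kernel, used here only as a stand-in for learned weights."""
    return [[1.0 / (size * size)] * size for _ in range(size)]


image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
fused = fuse([conv2d_same(image, box_kernel(3)), conv2d_same(image, box_kernel(5))])
```

Each pixel of `fused` carries two values, one per kernel scale, so later layers can draw on both fine and coarse context.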
Optionally, if the center coordinate of a rectangular bounding box lies within the coordinate range of a grid sub-image, product calculation is performed on the matrix vector of that grid sub-image and the weight matrix to predict a plurality of candidate regions; otherwise, no candidate region prediction is performed for that grid sub-image.
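The center-in-cell test above can be sketched as follows. The 448x448 image size, the 7x7 grid and the half-open cell boundaries are assumptions chosen for illustration; the patent does not state the grid dimensions.

```python
def box_center(box):
    """Center of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def cell_bounds(row, col, image_width, image_height, grid_size):
    """Pixel range covered by one grid sub-image."""
    cell_w = image_width / grid_size
    cell_h = image_height / grid_size
    return (col * cell_w, row * cell_h, (col + 1) * cell_w, (row + 1) * cell_h)


def cell_contains_center(bounds, box):
    """True if the box center falls inside the cell (half-open interval)."""
    cx, cy = box_center(box)
    x0, y0, x1, y1 = bounds
    return x0 <= cx < x1 and y0 <= cy < y1


def cells_to_predict(boxes, image_width, image_height, grid_size):
    """Return the (row, col) cells for which candidate regions are predicted."""
    selected = []
    for row in range(grid_size):
        for col in range(grid_size):
            bounds = cell_bounds(row, col, image_width, image_height, grid_size)
            if any(cell_contains_center(bounds, box) for box in boxes):
                selected.append((row, col))
    return selected


# Two hypothetical annotated boxes in a 448x448 image.
boxes = [(100, 40, 180, 200), (300, 300, 420, 400)]
active = cells_to_predict(boxes, image_width=448, image_height=448, grid_size=7)
```

All other grid sub-images are skipped, which is what keeps candidate region prediction cheap.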
Optionally, before loading the image and the corresponding annotation information file into computer video memory, the method further includes:
annotating each real target in the image with an image annotation tool to generate the annotation information file.
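As an illustration of what such an annotation information file might contain, the sketch below parses a hypothetical plain-text format with one target per line (`category x_min y_min x_max y_max`). The file format, the `Annotation` class and the `parse_annotations` helper are assumptions; the patent only states that the file holds each target's category and bounding box coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One real target: its category and rectangular bounding box."""
    category: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_annotations(text):
    """Parse lines of the assumed form 'category x_min y_min x_max y_max'."""
    annotations = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        category, *coords = line.split()
        if len(coords) != 4:
            raise ValueError(f"expected 4 coordinates, got: {line!r}")
        x_min, y_min, x_max, y_max = map(float, coords)
        annotations.append(Annotation(category, x_min, y_min, x_max, y_max))
    return annotations


sample = """
# category x_min y_min x_max y_max
pedestrian 100 40 180 200
vehicle 300 300 420 400
"""
annotations = parse_annotations(sample)
```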
Optionally, after loading the image and the corresponding annotation information file into computer video memory and before dividing the image into a plurality of grid sub-images, the method further includes:
initializing coordinates of a candidate region of the image as null.
Optionally, the fixed-scale convolution kernel is a 3x3 or 5x5 convolution kernel, the filter is a 2x2 filter, and the fixed stride is a 1x1 stride.
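To see how these sizes affect a feature map, the sketch below applies the standard output-size formula, floor((n + 2p - k) / s) + 1, and a 2x2 pooling filter. Max pooling with non-overlapping windows and the absence of padding are assumptions; the patent names only the filter size and the stride of the convolution.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def max_pool_2x2(feature_map):
    """2x2 max pooling over non-overlapping windows (assumed variant).

    A trailing odd row or column is dropped.
    """
    height, width = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, width - 1, 2)
        ]
        for r in range(0, height - 1, 2)
    ]


# A hypothetical 32-pixel axis shrinks differently under the two kernel sizes.
after_3x3 = conv_output_size(32, kernel=3, stride=1)
after_5x5 = conv_output_size(32, kernel=5, stride=1)

pooled = max_pool_2x2([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
])
```

With a 1x1 stride the convolution removes only a thin border, while the 2x2 filter halves each axis, which is where most of the dimensionality reduction comes from.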
The deep-learning-based target detection algorithm is well suited to security images. Once embedded in a security platform, it can detect targets in road scenes in traffic security images. The target detection workflow of the security platform is as follows:
1) A road traffic camera records video of the traffic road scene and uploads the recorded video at regular intervals.
2) The server decodes the video into frames, initializes the graphics accelerator and loads the deep learning model.
3) The image to be detected is input into the deep learning network model to obtain the target category and position coordinate information in the road traffic image, such as the positions of pedestrians and the positions and models of vehicles.
4) Each recognized target is framed and displayed in the image; the recognition result is shown in FIG. 2.
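The server-side part of this workflow can be outlined as a simple loop over decoded frames. In the sketch below, `fake_detector` is a stand-in for the trained model, and the frame values, report format and function names are hypothetical; video decoding, accelerator initialization and drawing are left out.

```python
def run_platform(frames, detector):
    """Run the detector on every decoded frame and collect what to draw."""
    report = []
    for index, frame in enumerate(frames):
        for category, box in detector(frame):
            # Each entry describes one rectangle to frame in the image.
            report.append({"frame": index, "label": category, "box": box})
    return report


def fake_detector(frame):
    """Stand-in for the trained model: returns (category, box) pairs."""
    if frame == "car":
        return [("vehicle", (10, 20, 110, 90))]
    return []


# Hypothetical decoded frames, represented by strings for simplicity.
frames = ["empty", "car", "empty", "car"]
report = run_platform(frames, fake_detector)
```

In a real deployment the detector callable would wrap the loaded deep learning model, and each report entry would be rendered as a labeled rectangle on the corresponding frame.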
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (7)

1. A target detection method based on deep learning is characterized by comprising the following steps:
(1) loading the image and the corresponding annotation information file into a computer video memory, and randomly initializing a weight matrix;
the annotation information file comprises the category of each real target in the image and the coordinates of a rectangular bounding box containing the real target;
(2) dividing the image into a grid to obtain a plurality of grid sub-images, and predicting the candidate regions of each grid sub-image;
(3) performing a convolution operation on the matrix vectors of the plurality of candidate regions of each grid sub-image to obtain a feature map of the grid sub-image, performing convolution operations on the feature map on different convolution layers using convolution kernels of different scales, and fusing the feature maps of different scales corresponding to each grid sub-image;
(4) performing a pooling operation on the fused feature map, and convolving the pooled feature map with a fixed-scale convolution kernel to further optimize the feature map;
(5) performing a pooling operation on the output feature map of step (4) using a filter;
(6) taking the output of step (5) as the input of the fully connected layer, and performing a convolution operation with a fixed stride;
(7) taking the output of step (6) as the input of the classification function Softmax, calculating the confidence of the image target class and the predicted coordinate information, calculating the error between these predictions and the real annotation information, and calculating the corresponding gradient values from the error to update the weight matrix of each layer;
(8) stopping training if the number of training iterations reaches the set number, and otherwise returning to step (3);
(9) obtaining a trained model once the set number of training iterations is reached, and performing product calculation on the image to be detected and the model weight matrix to obtain the target detection result in the image.
2. The method of claim 1, wherein predicting the candidate regions of each grid sub-image comprises:
if the center coordinate of the rectangular bounding box lies within the coordinate range of the grid sub-image, performing product calculation on the matrix vector of the grid sub-image and the weight matrix to predict a plurality of candidate regions, and otherwise not performing candidate region prediction on the grid sub-image.
3. The method of claim 1, wherein before loading the image and the corresponding annotation information file into the computer video memory, the method further comprises:
annotating each real target in the image with an image annotation tool to generate the annotation information file.
4. The method of claim 1, wherein after loading the image and the corresponding annotation information file into the computer video memory, and before dividing the image into a grid to obtain a plurality of grid sub-images, the method further comprises:
initializing coordinates of a candidate region of the image as null.
5. The method of claim 1, wherein the fixed-scale convolution kernel is a 3x3 convolution kernel or a 5x5 convolution kernel.
6. The method of claim 1, wherein the filter is a 2x2 filter.
7. The method of claim 1, wherein the fixed stride is a 1x1 stride.
CN201811644255.6A 2018-12-29 2018-12-29 Target detection method based on deep learning Pending CN110889425A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811644255.6A CN110889425A (en) 2018-12-29 2018-12-29 Target detection method based on deep learning

Publications (1)

Publication Number Publication Date
CN110889425A true CN110889425A (en) 2020-03-17

Family

ID=69745752

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811644255.6A Pending CN110889425A (en) 2018-12-29 2018-12-29 Target detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN110889425A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106611162A (en) * 2016-12-20 2017-05-03 西安电子科技大学 Method for real-time detection of road vehicle based on deep learning SSD frame
CN107527009A (en) * 2017-07-11 2017-12-29 浙江汉凡软件科技有限公司 A kind of remnant object detection method based on YOLO target detections
CN108564097A (en) * 2017-12-05 2018-09-21 华南理工大学 A kind of multiscale target detection method based on depth convolutional neural networks
CN108875595A (en) * 2018-05-29 2018-11-23 重庆大学 A kind of Driving Scene object detection method merged based on deep learning and multilayer feature
CN108960198A (en) * 2018-07-28 2018-12-07 天津大学 A kind of road traffic sign detection and recognition methods based on residual error SSD model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JOSEPH REDMON ET AL.: "YOLOv3: An Incremental Improvement", arXiv *
YUAN LIHAO ET AL.: "Autonomous recognition of small underwater targets based on YOLOv3", Ocean Engineering Equipment and Technology *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200317