CN111104921A - Multi-mode pedestrian detection model and method based on Faster rcnn - Google Patents

Multi-mode pedestrian detection model and method based on Faster rcnn

Info

Publication number
CN111104921A
Authority
CN
China
Prior art keywords
map
depth
network
color
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911390948.1A
Other languages
Chinese (zh)
Inventor
柯良军
陆鑫
孙凯旋
董鹏辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN201911390948.1A priority Critical patent/CN111104921A/en
Publication of CN111104921A publication Critical patent/CN111104921A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G06V 40/23 Recognition of whole body movements, e.g. for sport training
    • G06V 40/25 Recognition of walking or running movements, e.g. gait recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person

Abstract

A multi-modal pedestrian detection model and method based on Faster rcnn comprise input data alignment processing and a parallel feature extraction network; the results obtained by the parallel feature extraction network are processed through a subsequent RPN network and a classification network so as to perform category classification and position regression. The method effectively judges the position of pedestrians in videos or pictures while avoiding false detection when pedestrians occlude one another and missed detection when objects occlude the human body.

Description

Multi-mode pedestrian detection model and method based on Faster rcnn
Technical Field
The invention relates to the technical field of pedestrian detection models, in particular to a multi-modal pedestrian detection model and method based on Faster rcnn.
Background
Human body detection is one of the most widely applied research directions in the field of computer vision, and also one of its key and difficult problems. The human body detection problem is to judge whether a human body exists in a video or picture and, if so, to output its position. Human body detection has important practical value in fields such as unmanned driving, intelligent security and home service robots, and is a premise and basis for numerous applications such as human body behaviour and gait analysis, human identity recognition and pedestrian tracking. Early human body detection was generally performed on color images; with the continuous development of deep learning methods, the information contained in color images is now exploited nearly to saturation. Because color images suffer from inherent defects such as sensitivity to illumination change, using color images alone for human body detection has little remaining potential.
The depth map contains depth information of the external environment and thus represents the geometric shape of objects; at the same time, it has good illumination invariance that the color map does not possess. For these reasons, research on human detection based on RGB-D multi-modal data is increasingly active in computer vision, robotics and other disciplines.
Most existing pedestrian detection algorithms are single-input networks that take only RGB images as input, and are therefore easily affected by the brightness, contrast and blur of the RGB images; meanwhile, the holistic features such models can extract for occluded pedestrians have low discriminative power.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a multi-modal pedestrian detection model and method based on Faster rcnn, which can effectively judge the position of a pedestrian in a video or picture while avoiding false detection when pedestrians occlude one another and missed detection when objects occlude the human body.
In order to achieve the purpose, the invention adopts the technical scheme that:
a multimode pedestrian detection model based on Faster rcnn comprises input data alignment processing and a parallel feature extraction network, wherein results obtained by the parallel feature extraction network are processed through a subsequent RPN network and a classification network, so that category classification and position regression are performed; the input data alignment processing adopts a Zhang Zhengyou camera calibration method to calibrate a depth camera, a depth map is converted into a color map image coordinate system, then overlapped parts in the color map and the depth map are intercepted and stored respectively to obtain a group of aligned color map and depth map, when feature maps of different modes are merged, color map features and depth map features at the same position can be merged together to play a role together, and a parallel feature extraction network respectively extracts the features of color map data and depth map data by using two independent convolutional neural networks to serve as the basis for the feature fusion of the subsequent two modes.
A multi-modal pedestrian detection method based on Faster rcnn comprises the following steps:
firstly, input data alignment processing;
secondly, parallel feature extraction network;
thirdly, the result obtained by the parallel feature extraction network is processed through a subsequent RPN network and a classification network so as to perform class classification and position regression.
The input data alignment processing specifically comprises:
the method comprises the following steps: the method comprises the steps that a Microsoft 2 generation Kinect depth sensor is used for collecting, 5 scenes in real life are included, and various human body postures are included;
step two: calibrating the depth camera by adopting a Zhang Zhengyou camera calibration method, converting the depth map into a color image coordinate system, then intercepting overlapped parts in the color image and the depth map, and respectively storing to obtain a group of aligned color images and depth maps;
step three: and (3) encoding the depth map by a Jet color map to obtain a depth map and a color map intercepted in a color map image coordinate system, and sending the depth map and the color map into a pedestrian detection model.
The parallel feature extraction network specifically comprises:
the method comprises the following steps: extracting deep characteristic information from the input color image and the input depth image by using different characteristic extraction networks to obtain a characteristic image;
step two: carrying out L2 normalization processing on the feature map obtained in the last step;
Assume that the original input pictures input in parallel are $(I_{RGB}, I_{Depth})$. After feature extraction through the convolutional neural networks, a group of parallel feature maps $(f_{RGB}, f_{Depth})$ is obtained. Suppose a feature map $f$ in $(f_{RGB}, f_{Depth})$ has size $r \times c$; the feature map $\hat{u}_f$ after L2 normalization is:

$$\hat{u}_f = \frac{f}{\|f\|_2}$$

wherein:

$$\|f\|_2 = \left( \sum_{i=1}^{r} \sum_{j=1}^{c} f_{i,j}^2 \right)^{1/2}$$
after the two groups of feature maps are respectively subjected to L2 normalization, the numerical values of the two groups of feature maps are scaled to the same scale, and the two groups of feature maps jointly play a role in the final detection result;
step three: for normalized feature maps
Figure BDA0002344928510000041
Characteristic map of each channel in the system
Figure BDA0002344928510000042
Designing a scale parameter gammaiAmplifying the channel characteristic diagram in a certain proportion, and amplifying the scale parameter to obtain a characteristic diagram FiComprises the following steps:
Figure BDA0002344928510000043
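The L2 normalization and scale amplification described above can be sketched as a small NumPy routine. The (C, r, c) channel layout and the fixed gamma values are illustrative assumptions; in the invention the scale parameters are learned.

```python
import numpy as np

def l2_normalize_and_scale(f: np.ndarray, gamma: np.ndarray) -> np.ndarray:
    """L2-normalize each channel of a (C, r, c) feature map, then amplify
    channel i by its scale parameter gamma_i: F_i = gamma_i * u_i."""
    # ||f||_2 per channel: square root of the sum of squared activations.
    norms = np.sqrt((f ** 2).sum(axis=(1, 2), keepdims=True)) + 1e-12
    u = f / norms                      # u_f = f / ||f||_2
    return gamma[:, None, None] * u    # F_i = gamma_i * u_i
```

After this step the L2 norm of channel i equals gamma_i, which is what places the RGB and depth feature maps on a common scale before fusion.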
The result obtained by the parallel feature extraction network is then processed through the subsequent RPN network and classification network so as to perform class classification and position regression.
The subsequent RPN network and classification network are consistent with those of the Faster RCNN network.
$I_{RGB}$ represents the RGB input image and $I_{Depth}$ the depth-map input image. $f_{RGB}$ represents the RGB feature map output by the feature extraction layer, and $f_{Depth}$ the depth-map feature map output by the feature extraction layer. $\hat{u}_{RGB}$ and $\hat{u}_{Depth}$ represent the corresponding normalized feature maps. $\gamma_i$ is the scale parameter corresponding to the $i$-th feature map, and $F_i$ is the $i$-th feature map after amplification by the scale parameter.
The invention has the beneficial effects that:
the method and the device introduce the information of the depth map as auxiliary information of pedestrian detection, can effectively overcome the problem that RGB images are sensitive to illumination and pedestrian shielding, and improve the performance of a pedestrian detection network; and a characteristic block algorithm is introduced, so that the local discrimination of the pedestrian under the shielding condition is effectively improved.
Drawings
Fig. 1 is an overall technical flow diagram.
FIG. 2 is a schematic operational flow diagram.
Fig. 3 is a schematic diagram of a laboratory shot.
Fig. 4 is a schematic diagram of a conference room shot.
Fig. 5 is a schematic view of office photography.
Fig. 6 is a diagram of a corridor shot.
Fig. 7 is a schematic view of hall photography.
Fig. 8 is a diagram of erroneous detection in verification.
FIG. 9 is a schematic diagram of the improvement on false detection during verification.
FIG. 10 is a diagram illustrating the detection results without the parallel network.
FIG. 11 is a diagram illustrating the detection results with the parallel network.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
As shown in fig. 1 and 2:
the input data comprises RGB data and correspondingly aligned depth map data, the feature maps of the corresponding data are respectively extracted through a feature extraction network, the value range of the feature values of the whole RGB feature maps is larger than that of the depth maps, so that two groups of feature maps need to be normalized respectively, the feature values of the two groups of feature maps are distributed in the same value range, the pedestrian detection is performed quite effectively, meanwhile, the feature data of the depth maps comprise depth information blocking pedestrians, and the blocked pedestrians can be better detected.
The color image and depth image data are used as input data of the pedestrian detection model, the target detection model Faster RCNN is used as the basic detection framework, a parallel feature extraction network is designed to integrate the multi-modal input data, and depth information is introduced to improve the network's capability to detect occluded pedestrians.
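A minimal PyTorch sketch of this parallel two-stream design follows. The tiny stand-in backbones and the element-wise addition used as the fusion operator are assumptions for illustration; the text does not fix the backbone architecture or the exact fusion operator.

```python
import torch
import torch.nn as nn

class ParallelFeatureExtractor(nn.Module):
    """Two independent convolutional backbones, one per modality; their
    L2-normalized, scale-amplified feature maps are fused (here by
    element-wise addition) before being handed to the RPN."""
    def __init__(self, channels: int = 16):
        super().__init__()
        def backbone():
            return nn.Sequential(
                nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
                nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
            )
        self.rgb_net = backbone()      # colour stream
        self.depth_net = backbone()    # Jet-encoded depth stream
        # One learnable scale parameter per channel and per stream.
        self.gamma_rgb = nn.Parameter(torch.ones(channels))
        self.gamma_depth = nn.Parameter(torch.ones(channels))

    @staticmethod
    def l2n(f: torch.Tensor, gamma: torch.Tensor) -> torch.Tensor:
        # Per-channel L2 norm over the spatial dimensions, then gamma scaling.
        n = f.flatten(2).norm(dim=2, keepdim=True).unsqueeze(-1) + 1e-12
        return gamma.view(1, -1, 1, 1) * f / n

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        f_rgb = self.l2n(self.rgb_net(rgb), self.gamma_rgb)
        f_dep = self.l2n(self.depth_net(depth), self.gamma_depth)
        return f_rgb + f_dep   # fused map passed on to the RPN / classifier
```

Because both streams are normalized to a common scale before fusion, neither modality dominates the fused map regardless of the raw activation ranges of the two backbones.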
As shown in fig. 2:
firstly, input data alignment processing:
the method comprises the following steps: calibrating the depth camera by adopting a Zhang Zhengyou camera calibration method, converting the depth map into a color image coordinate system, then intercepting overlapped parts in the color image and the depth map, and respectively storing to obtain a group of aligned color images and depth maps;
step two: encoding the depth map by a Jet color map to obtain an equal depth map and an original color picture, and sending the equal depth map and the original color picture into a pedestrian detection model;
secondly, parallel feature extraction network:
the method comprises the following steps: extracting deep characteristic information from the input color image and depth image by using different characteristic extraction networks;
step two: carrying out L2 normalization processing on the feature map obtained in the last step;
Assume that the original input pictures input in parallel are $(I_{RGB}, I_{Depth})$. After feature extraction by the convolutional neural networks, a set of feature maps $(f_{RGB}, f_{Depth})$ is obtained. The feature maps are often multi-channel, and all channels are operated on taking a single-channel feature map as the unit. Suppose a feature map $f$ in $(f_{RGB}, f_{Depth})$ has size $r \times c$; the feature map $\hat{u}_f$ after L2 normalization is:

$$\hat{u}_f = \frac{f}{\|f\|_2}$$

wherein:

$$\|f\|_2 = \left( \sum_{i=1}^{r} \sum_{j=1}^{c} f_{i,j}^2 \right)^{1/2}$$
after the two groups of feature maps are respectively subjected to L2 normalization, the numerical values of the two groups of feature maps are scaled to the same scale, and the two groups of feature maps jointly play a role in the final detection result;
step three: for normalized feature maps
Figure BDA0002344928510000063
Characteristic map of each channel in the system
Figure BDA0002344928510000064
Designing a scale parameter gammaiAmplifying the channel characteristic diagram in a certain proportion, and amplifying the scale parameter to obtain a characteristic diagram FiComprises the following steps:
Figure BDA0002344928510000071
In the method, the scale parameter corresponding to each channel of the feature map is obtained by learning through the back-propagation (BP) algorithm; automatically learned scale parameters better improve the robustness of network training;
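That the scale parameters can be learned by back-propagation is easy to illustrate with a few lines of PyTorch autograd; the feature-map size and the plain sum loss below are arbitrary stand-ins, not the patent's training objective.

```python
import torch

# A channel scale parameter learned by back-propagation: mark gamma as
# requiring gradients and let the optimizer update it along with the
# convolutional weights.
feat = torch.randn(1, 4, 8, 8)             # stand-in normalized feature map u
gamma = torch.ones(4, requires_grad=True)  # one scale parameter per channel
F = gamma.view(1, -1, 1, 1) * feat         # F_i = gamma_i * u_i
loss = F.sum()                             # stand-in loss
loss.backward()
# d(loss)/d(gamma_i) is the sum of channel i's activations, so each gamma_i
# receives a gradient and can be updated by any standard optimizer.
```

An optimizer step such as torch.optim.SGD([gamma] + list(model.parameters()), lr=...) would then adjust the scales jointly with the network.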
thirdly, the result obtained by the parallel feature extraction network is processed through a subsequent RPN network and a classification network so as to perform class classification and position regression.
The subsequent RPN network and classification network are consistent with those of the Faster RCNN network.
There are only two classification results: pedestrian or not.
As shown in fig. 3 to 7: there are 2647 aligned color and depth maps, 5372 human examples. The human body examples comprise various human body postures such as standing posture, sitting posture and the like. The details of this data set are shown in the table below.
(Table of database examples per scene — original table image not recoverable.)
the 2647 pairs of pictures were randomly assigned training and testing sets in a 9:1 ratio.
As shown in fig. 8 to 9: the improvement of the parallel fast RCNN on false detection can effectively overcome the problem that RGB images are sensitive to illumination and pedestrian shielding, and the performance of a pedestrian detection network is improved; and a characteristic block algorithm is introduced, so that the local discrimination of the pedestrian under the shielding condition is effectively improved.

Claims (5)

1. A multi-modal pedestrian detection model based on Faster rcnn, characterized by comprising input data alignment processing and a parallel feature extraction network, wherein the result obtained by the parallel feature extraction network is processed through a subsequent RPN network and a classification network so as to carry out category classification and position regression; the input data alignment processing calibrates a depth camera with Zhang Zhengyou's camera calibration method, converts the depth map into the color-image coordinate system, then intercepts the overlapping parts of the color map and the depth map and stores them respectively to obtain a group of aligned color and depth maps, so that when feature maps of different modalities are merged, color-map features and depth-map features at the same position are merged together and act jointly; and the parallel feature extraction network extracts the features of the color-map data and the depth-map data respectively with two independent convolutional neural networks, as the basis for the subsequent fusion of the features of the two modalities.
2. A multi-modal pedestrian detection method based on Faster rcnn, characterized by comprising the following steps:
firstly, input data alignment processing;
secondly, parallel feature extraction network;
thirdly, the result obtained by the parallel feature extraction network is processed through a subsequent RPN network and a classification network so as to perform class classification and position regression.
3. The multi-modal pedestrian detection method according to claim 2, wherein the input data alignment process specifically comprises:
the method comprises the following steps: the method comprises the steps that a Microsoft 2 generation Kinect depth sensor is used for collecting, 5 scenes in real life are included, and various human body postures are included;
step two: calibrating the depth camera by adopting a Zhang Zhengyou camera calibration method, converting the depth map into a color image coordinate system, then intercepting overlapped parts in the color image and the depth map, and respectively storing to obtain a group of aligned color images and depth maps;
step three: and (3) encoding the depth map by a Jet color map to obtain a depth map and a color map intercepted in a color map image coordinate system, and sending the depth map and the color map into a pedestrian detection model.
4. The multi-modal pedestrian detection method based on Faster rcnn according to claim 2, characterized in that the parallel feature extraction network specifically comprises:
step one: deep feature information is extracted from the input color image and the input depth image with different feature extraction networks to obtain feature maps;
step two: carrying out L2 normalization processing on the feature map obtained in the last step;
Assume that the original input pictures input in parallel are $(I_{RGB}, I_{Depth})$. After feature extraction through the convolutional neural networks, a group of parallel feature maps $(f_{RGB}, f_{Depth})$ is obtained. Suppose a feature map $f$ in $(f_{RGB}, f_{Depth})$ has size $r \times c$; the feature map $\hat{u}_f$ after L2 normalization is:

$$\hat{u}_f = \frac{f}{\|f\|_2}$$

wherein:

$$\|f\|_2 = \left( \sum_{i=1}^{r} \sum_{j=1}^{c} f_{i,j}^2 \right)^{1/2}$$
after the two groups of feature maps are respectively subjected to L2 normalization, the numerical values of the two groups of feature maps are scaled to the same scale, and the two groups of feature maps jointly play a role in the final detection result;
step three: for normalized feature maps
Figure FDA0002344928500000023
Characteristic map of each channel in the system
Figure FDA0002344928500000024
Designing a scale parameter gammaiAmplifying the channel characteristic diagram in a certain proportion, and amplifying the scale parameter to obtain a characteristic diagram FiComprises the following steps:
Figure FDA0002344928500000031
and processing the result obtained by the parallel feature extraction network through a subsequent RPN network and a classification network so as to perform class classification and position regression.
5. The method according to claim 4, wherein the subsequent RPN network and classification network are identical to those of the Faster RCNN network.
CN201911390948.1A 2019-12-30 2019-12-30 Multi-mode pedestrian detection model and method based on Faster rcnn Pending CN111104921A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911390948.1A CN111104921A (en) 2019-12-30 2019-12-30 Multi-mode pedestrian detection model and method based on Faster rcnn

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911390948.1A CN111104921A (en) 2019-12-30 2019-12-30 Multi-mode pedestrian detection model and method based on Faster rcnn

Publications (1)

Publication Number Publication Date
CN111104921A true CN111104921A (en) 2020-05-05

Family

ID=70425119

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911390948.1A Pending CN111104921A (en) 2019-12-30 2019-12-30 Multi-mode pedestrian detection model and method based on Faster rcnn

Country Status (1)

Country Link
CN (1) CN111104921A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022011560A1 (en) * 2020-07-14 2022-01-20 Oppo广东移动通信有限公司 Image cropping method and apparatus, electronic device, and storage medium
WO2022104618A1 (en) * 2020-11-19 2022-05-27 Intel Corporation Bidirectional compact deep fusion networks for multimodality visual analysis applications

Citations (3)

Publication number Priority date Publication date Assignee Title
CN106203506A (en) * 2016-07-11 2016-12-07 上海凌科智能科技有限公司 A kind of pedestrian detection method based on degree of depth learning art
CN109766856A (en) * 2019-01-16 2019-05-17 华南农业大学 A kind of method of double fluid RGB-D Faster R-CNN identification milking sow posture
CN110276265A (en) * 2019-05-27 2019-09-24 魏运 Pedestrian monitoring method and device based on intelligent three-dimensional solid monitoring device


Non-Patent Citations (1)

Title
Liu Zhuang et al., "Application of dual-channel Faster rcnn in RGB-D hand detection", Computer Science *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200505