CN110110665A - Detection method for hand regions in a driving environment - Google Patents

Detection method for hand regions in a driving environment

Info

Publication number
CN110110665A
CN110110665A (application CN201910378179.7A)
Authority
CN
China
Prior art keywords
frame
hand region
hand
driving environment
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910378179.7A
Other languages
Chinese (zh)
Other versions
CN110110665B (en)
Inventor
林相波
史明明
李一博
戴佐俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Chuang Yuan Micro Software Co Ltd
Dalian University of Technology
Original Assignee
Beijing Chuang Yuan Micro Software Co Ltd
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Chuang Yuan Micro Software Co Ltd, Dalian University of Technology filed Critical Beijing Chuang Yuan Micro Software Co Ltd
Priority to CN201910378179.7A priority Critical patent/CN110110665B/en
Publication of CN110110665A publication Critical patent/CN110110665A/en
Application granted granted Critical
Publication of CN110110665B publication Critical patent/CN110110665B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107Static hand or arm

Abstract

The invention discloses a method for detecting hand regions in a driving environment, comprising the following steps. Step 1) prepares a data set: images are captured by camera devices mounted at different positions in the cab under real driving conditions; the data set is divided into a training image set and a test image set, data augmentation is applied, and new hand-region labels are generated. Step 2) constructs a hand-detection convolutional neural network that uses a multi-scale architecture to extract and fuse feature information at different scales. Step 3) trains the network end-to-end with the ADAM optimization algorithm, sampling randomly from the training image set and stopping once the loss function L stabilizes. Step 4) applies non-maximum suppression to eliminate redundant candidate boxes and obtain the optimal hand bounding box. Step 5) reports the detection results. The method is easy to implement and is suitable for labeling hand regions in cab environments.

Description

Detection method for hand regions in a driving environment
Technical field
The invention belongs to the field of object detection in computer vision, and in particular relates to a method for detecting hand regions in a driving environment.
Background technique
Hand detection, classification, and tracking have been studied for many years and can be applied in many fields, such as virtual reality, human-computer interaction, and driving-behavior monitoring. Hand regions in natural images are subject to many interfering factors, such as illumination variation, occlusion, hand-shape variation, viewpoint change, and low hand resolution. As a result, hand-region detection in natural images is still far from the accuracy of human recognition, yet many applications cannot depend on inefficient manual inspection. Studying accurate detection methods for human hand regions in natural environments is therefore of great significance. The goal of this work is to detect hand regions in still images of a motor-vehicle cab environment and to study a new method based on deep-learning techniques, providing technical means for driving-behavior detection and similar applications.
Using skin-color information is a common strategy by which many methods obtain better results in hand detection. For example, reference [1] [A. Mittal, A. Zisserman, and P. H. S. Torr. Hand detection using multiple proposals. In British Machine Vision Conference, 2011] proposed a two-stage method in which three complementary detectors based on context, skin color, and sliding-window shape provide hand-region candidate boxes, and a classifier then gives the confidence probability of each candidate box. The drawback of such methods is that, when detecting hand regions in natural images, variations of skin color caused by complex illumination greatly degrade detection performance. Methods using multi-modal information can also obtain good results in certain applications. For example, reference [2] [E. Ohn-Bar, S. Martin, A. Tawari, and M. M. Trivedi. Head, eye, and hand patterns for driver activity recognition. In ICPR, pages 660-665, 2014] extracts HOG features from RGB and depth images simultaneously and combines them with an SVM to detect hand regions and recognize driving behavior; however, because of the limitations of the chosen HOG features, its detection accuracy for hand regions is not high. Reference [3] [X. Zhu, X. Jia, and K. Wong, "Pixel-level hand detection with shape-aware structured forests," in Proceedings of the Asian Conference on Computer Vision. Springer Press, 2014, pp. 64-78] detects hand regions pixel by pixel with a shape-aware structured-forest algorithm; although it performs well on first-person-view hand detection, scanning the entire image pixel by pixel is too time-consuming. Obtaining hand regions indirectly through human-body parsing [4] [L. Karlinsky, M. Dinerstein, D. Harari, and S. Ullman, "The chains model for detecting parts by their context," in Proceedings of Computer Vision and Pattern Recognition. IEEE Press, 2010, pp. 25-32] is another hand-region detection scheme, determining hand regions by segmenting the human body into different parts; when occlusion is severe, however, such methods have difficulty detecting hands. With the rapid development of deep-learning techniques, object detection based on convolutional neural networks has improved greatly, for example the region-proposal-based convolutional network series (RCNN, Fast-RCNN, Faster-RCNN, R-FCN) and single-shot detection networks such as YOLO. Although these achieve good results on objects such as cats, dogs, pedestrians, cars, and sofas, their original structures are not accurate enough when the target occupies a relatively small region of the image (such as a human hand) or is occluded, so more effective structures need to be designed. Reference [5] [Lu Ding, Yong Wang, et al. Multi-scale predictions for robust hand detection and classification, arXiv:1804.08220v1 [cs.CV], 2018] proposes a multi-scale R-FCN network structure with 5 convolutional layers that provides hand-region candidate boxes from different scales, extracts and fuses feature maps from different layers, and then outputs the detected hand bounding boxes. Reference [6] [T. Hoang Ngan Le, Kha Gia Quach, Chenchen Zhu, et al. Robust Hand Detection and Classification in Vehicles and in the Wild, CVPRW 2018, pp. 39-46] also uses the R-FCN structure as its basic framework, fuses features of different layers in a multi-scale manner, and screens hand regions from the candidate boxes. Reference [7] [Xiaoming Deng, Ye Yuan, Yinda Zhang, et al., Joint Hand Detection and Rotation Estimation by Using CNN, arXiv:1612.02742v1 [cs.CV], 2016] designs a joint network for hand-region detection and hand-rotation estimation, completing the final hand-region detection through feature sharing.
Summary of the invention
The object of the present invention is to provide a method for detecting hand regions in a driving environment. As a new hand-detection network structure, it requires neither a skin-color model nor an additional feature extractor; the network model is trained on an RGB data set collected in a cab environment, realizing detection of human hand regions and making the method suitable for hand-region labeling in cab environments.
The technical scheme of the invention is a method for detecting hand regions in a driving environment, comprising the following steps:
Step 1) Prepare the data set. The data set is obtained by camera devices mounted at different positions in the cab, shooting inside the cab under real driving conditions; the data set is divided into a training image set and a test image set, data augmentation is then applied to it, and new hand-region labels are generated;
Step 2) Construct the hand-detection convolutional neural network, using a multi-scale architecture to extract and fuse feature information at different scales;
Step 3) Train the network end-to-end with the ADAM optimization algorithm, sampling randomly from the training image set, and stop training once the loss function L stabilizes;
The loss function L is defined as:
L = L_c + L_r (1)
where L_c evaluates whether pixels inside and outside the bounding box are correctly classified, and L_r evaluates whether the bounding-box vertex positions are correctly regressed;
L_c = -α p* (1-p)^γ log p - (1-α)(1-p*) p^γ log(1-p) (2)
where p* is the true pixel classification, p is the network's estimate of the probability that the pixel lies inside the bounding box, α is a balance factor between positive and negative samples, and γ is an empirical value; setting γ = 2 gave the best experimental results;
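The classification term L_c in equation (2) has the form of a focal loss. Below is a minimal NumPy sketch; the value of α is not given in the text, so the common focal-loss default of 0.25 is assumed here:

```python
import numpy as np

def pixel_class_loss(p, p_star, alpha=0.25, gamma=2.0, eps=1e-7):
    """Eq. (2): L_c = -alpha * p* * (1-p)^gamma * log(p)
                      - (1-alpha) * (1-p*) * p^gamma * log(1-p),
    averaged over pixels. alpha = 0.25 is an assumed default."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    pos = -alpha * p_star * (1.0 - p) ** gamma * np.log(p)
    neg = -(1.0 - alpha) * (1.0 - p_star) * p ** gamma * np.log(1.0 - p)
    return float((pos + neg).mean())

# The (1-p)^gamma factor down-weights easy, confident predictions:
good = pixel_class_loss(np.array([0.9]), np.array([1.0]))  # small loss
bad = pixel_class_loss(np.array([0.1]), np.array([1.0]))   # large loss
```

With γ = 2 as in the text, well-classified pixels contribute little to the loss, so training focuses on hard pixels near the box borders.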
where C_i and Ĉ_i denote the regression result and the ground truth of the hand bounding-box coordinates, respectively;
Step 4) Apply non-maximum suppression to eliminate redundant candidate boxes and obtain the optimal hand bounding box;
Step 5) Report the detection results.
As a preferred technical solution, the training image set in step 1) is randomly divided into a training subset and a validation subset at a 9:1 ratio.
As a preferred technical solution, the data-augmentation methods applied to the data set in step 1) include horizontal flipping, vertical flipping, random-angle rotation, translation, Gaussian blurring, and sharpening; after augmentation, the training data grows by at least 22000 images.
As a preferred technical solution, data augmentation in step 1) follows these rules:
Augmentation rule 1: brightness enhancement by a factor of 1.2-1.5, scaling by a factor of 0.7-1.5, translation of 40 pixels in the x direction and 60 pixels in the y direction;
Augmentation rule 2: random cropping with a margin of 0-16 pixels, horizontal flipping with 50% probability;
Augmentation rule 3: vertical flipping with 100% probability, Gaussian blur with mean 0 and variance 3;
Augmentation rule 4: random rotation with an upper limit of 45°, additive white Gaussian noise at a 20% noise level, and random sharpening with 50% probability.
As a preferred technical solution, the new hand-region labels in step 1) are generated as follows: starting from the four edges of the original bounding box, the box is shrunk inward by a specified length d = 0.2 l_min, where l_min is the shortest box side; pixels inside the shrunk box are labeled 1 and pixels outside it are labeled 0.
As a preferred technical solution, feature extraction and fusion in step 2) comprise three convolution modules and one upsampling feature-fusion stage, as follows:
The input layer takes 256 × 256 images. The first convolution module ConvB_1 contains two convolutional layers and one max-pooling layer, with 3 × 3 kernels and 64 channels; the second convolution module ConvB_2 contains two convolutional layers and one max-pooling layer, with 3 × 3 kernels and 128 channels; the third convolution module ConvB_3 contains three convolutional layers and one max-pooling layer, with 3 × 3 kernels and 256 channels. All pooling layers use a 2 × 2 kernel with stride 2.
The feature map output by ConvB_3 is upsampled to double its size; the feature map output by ConvB_2 has 20% of its channels randomly removed by a Dropout mechanism, and the two are concatenated. After normalization, the fused feature map FusF_1 is fed into the cascaded 1 × 1 and 3 × 3 convolution group ConvC_1, with 128 channels in total; its output passes through one 3 × 3 convolutional layer with 32 kernels before being fed to the output layer. The output layer contains two branches: branch 1 predicts, through a single-channel 1 × 1 convolution, the probability that each pixel lies in the target region; branch 2 predicts, through a 4-channel 1 × 1 convolution, the coordinate values of the bounding-box vertices.
As a preferred technical solution, the detection results in step 5) are evaluated with the following objective quantitative indices: average precision AP, average recall AR, comprehensive evaluation index F1-score, and detection speed FPS.
Assume TP denotes a real target that is correctly detected, FP denotes a detection that is not a real target, and FN denotes a real target that is missed; then
FPS is expressed as frames per second.
The invention has the following advantages:
1. The hand-region detection method in a driving environment of the present invention is highly accurate and widely applicable, with low computational complexity, short running time, and a simple, efficient training process; the detection speed reaches 42 fps.
2. The present invention builds its hand-detection model with a deep convolutional neural network structure, which can extract comprehensive hand-related features and is therefore more robust to occlusion, uneven illumination, scale variation, and shape variation.
Brief description of the drawings
The invention will be further described with reference to the accompanying drawings and embodiments:
Fig. 1 shows example detection results under different illumination, different hand shapes, hands of different sizes, and different numbers of hands.
Specific embodiment
Embodiment: Because hand regions vary considerably in size across images, feature maps of different depths are used to express hands of different sizes: deeper features focus on larger hand regions, while shallower features focus on smaller ones. To reduce computational cost, the present invention adopts the idea of a U-shaped convolutional neural network structure and fuses the feature maps step by step. The method comprises the following steps:
Step 1) Prepare the data set. The data set is obtained by camera devices mounted at different positions in the cab, shooting inside the cab under real driving conditions; the aim is to study the performance of hand-region detection methods under cluttered backgrounds, complex lighting conditions, and frequent occlusion. The data set is divided into a training image set and a test image set, data augmentation is then applied to it, and new hand-region labels are generated.
The data set contains 5500 training images and 5500 test images in total; image sizes are uniformly adjusted to 256 × 256 for training and testing. The training image set is randomly divided into a training subset and a validation subset at a 9:1 ratio, so the training subset contains 4950 images and the validation subset 550 images; the test image set contains 5500 images. Camera viewpoints include: mobile shooting, fixed at the front left shooting the driver, fixed at the front right shooting the driver, fixed behind the driver, fixed on the driver's right, fixed overhead, and worn on the driver's head.
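The 9:1 split described above can be sketched as follows; the index-based shuffle is illustrative, since the patent does not specify the sampling procedure:

```python
import random

def split_train_val(items, ratio=0.9, seed=0):
    """Randomly split a list of images (or indices) into training and
    validation subsets at the given ratio (9:1 here)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * ratio)
    return items[:cut], items[cut:]

# 5500 training images -> 4950 for training, 550 for validation
train, val = split_train_val(range(5500))
```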
A deep neural network needs massive amounts of training data to obtain a good model. Therefore, the original data set must be augmented. The augmentation methods applied to the data set include horizontal flipping, vertical flipping, random-angle rotation, translation, Gaussian blurring, and sharpening; after augmentation, the training data grows by at least 22000 images.
Data augmentation follows these rules:
Augmentation rule 1: brightness enhancement by a factor of 1.2-1.5, scaling by a factor of 0.7-1.5, translation of 40 pixels in the x direction and 60 pixels in the y direction;
Augmentation rule 2: random cropping with a margin of 0-16 pixels, horizontal flipping with 50% probability;
Augmentation rule 3: vertical flipping with 100% probability, Gaussian blur with mean 0 and variance 3;
Augmentation rule 4: random rotation with an upper limit of 45°, additive white Gaussian noise at a 20% noise level, and random sharpening with 50% probability.
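Parts of these augmentation rules can be sketched with NumPy alone. Rotation, blur, and sharpening would require an imaging library (e.g. Pillow or OpenCV); additive Gaussian noise is used below only as a simple stand-in for the blur, so this is an illustrative pipeline rather than the patent's implementation:

```python
import numpy as np

def augment(img, rng):
    """Apply a NumPy-only subset of the augmentation rules to an
    HxWx3 uint8 image: brightness scaling, random crop margin,
    probabilistic horizontal flip, vertical flip, Gaussian noise."""
    out = img.astype(np.float32)
    out *= rng.uniform(1.2, 1.5)              # rule 1: brightness x1.2-1.5
    m = int(rng.integers(0, 17))              # rule 2: crop margin 0-16 px
    if m:
        out = out[m:-m, m:-m]
    if rng.random() < 0.5:                    # rule 2: 50% horizontal flip
        out = out[:, ::-1]
    out = out[::-1]                           # rule 3: vertical flip
    out = out + rng.normal(0.0, np.sqrt(3.0), out.shape)  # noise stand-in
    return np.clip(out, 0.0, 255.0).astype(np.uint8)

rng = np.random.default_rng(0)
aug = augment(np.full((256, 256, 3), 128, dtype=np.uint8), rng)
```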
The hand-region labels provided with the original data set are in bounding-box form, i.e., the vertex coordinates of the boxes. The network output in this patent uses the probability that a pixel falls inside a bounding box, so the original labels must be processed to generate new labels. The new hand-region labels are generated as follows: starting from the four edges of the original bounding box, the box is shrunk inward by a specified length d = 0.2 l_min, where l_min is the shortest box side; pixels inside the shrunk box are labeled 1 and pixels outside it are labeled 0.
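The label-generation rule can be sketched as follows; the array layout and the (x1, y1, x2, y2) coordinate convention are assumptions:

```python
import numpy as np

def make_label(h, w, box):
    """Generate a pixel label map from a bounding box (x1, y1, x2, y2):
    shrink the box inward by d = 0.2 * l_min on every side, then mark
    pixels inside the shrunk box 1 and all other pixels 0."""
    x1, y1, x2, y2 = box
    l_min = min(x2 - x1, y2 - y1)    # shortest box side
    d = int(round(0.2 * l_min))      # inward shrink distance
    label = np.zeros((h, w), dtype=np.uint8)
    label[y1 + d:y2 - d, x1 + d:x2 - d] = 1
    return label

# 100x100 box -> d = 20, so a 60x60 interior is labeled 1
label = make_label(256, 256, (40, 40, 140, 140))
```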
Step 2) Construct the hand-detection convolutional neural network, using a multi-scale architecture to extract and fuse feature information at different scales.
Feature extraction and fusion comprise three convolution modules and one upsampling feature-fusion stage, as follows:
The input layer takes 256 × 256 images. The first convolution module ConvB_1 contains two convolutional layers and one max-pooling layer, with 3 × 3 kernels and 64 channels; the second convolution module ConvB_2 contains two convolutional layers and one max-pooling layer, with 3 × 3 kernels and 128 channels; the third convolution module ConvB_3 contains three convolutional layers and one max-pooling layer, with 3 × 3 kernels and 256 channels. All pooling layers use a 2 × 2 kernel with stride 2.
The feature map output by ConvB_3 is upsampled to double its size; the feature map output by ConvB_2 has 20% of its channels randomly removed by a Dropout mechanism, and the two are concatenated. After normalization, the fused feature map FusF_1 is fed into the cascaded 1 × 1 and 3 × 3 convolution group ConvC_1, with 128 channels in total; its output passes through one 3 × 3 convolutional layer with 32 kernels before being fed to the output layer. The output layer contains two branches: branch 1 predicts, through a single-channel 1 × 1 convolution, the probability that each pixel lies in the target region; branch 2 predicts, through a 4-channel 1 × 1 convolution, the coordinate values of the bounding-box vertices.
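The architecture above can be sketched in PyTorch. This is a reconstruction from the text: the activation functions, padding, and type of normalization (ReLU, padding 1, and batch normalization below) are not stated in the patent and are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(in_ch, out_ch, n_convs):
    """n_convs 3x3 convolutions followed by 2x2 max pooling, stride 2."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2, 2))
    return nn.Sequential(*layers)

class HandNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.convb1 = conv_block(3, 64, 2)     # ConvB_1: 2 convs, 64 ch
        self.convb2 = conv_block(64, 128, 2)   # ConvB_2: 2 convs, 128 ch
        self.convb3 = conv_block(128, 256, 3)  # ConvB_3: 3 convs, 256 ch
        self.drop = nn.Dropout2d(0.2)          # drop 20% of ConvB_2 channels
        self.norm = nn.BatchNorm2d(384)        # FusF_1 normalization (assumed)
        self.convc1 = nn.Sequential(           # ConvC_1: 1x1 then 3x3, 128 ch
            nn.Conv2d(384, 128, 1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(inplace=True))
        self.reduce = nn.Conv2d(128, 32, 3, padding=1)  # 32-kernel 3x3 conv
        self.branch1 = nn.Conv2d(32, 1, 1)     # per-pixel in-box probability
        self.branch2 = nn.Conv2d(32, 4, 1)     # box-vertex coordinates

    def forward(self, x):
        f2 = self.convb2(self.convb1(x))          # 64x64, 128 channels
        f3 = self.convb3(f2)                      # 32x32, 256 channels
        up = F.interpolate(f3, scale_factor=2.0)  # upsample back to 64x64
        fus = self.norm(torch.cat([self.drop(f2), up], dim=1))  # FusF_1
        h = self.reduce(self.convc1(fus))
        return torch.sigmoid(self.branch1(h)), self.branch2(h)

model = HandNet().eval()
with torch.no_grad():
    probs, coords = model(torch.zeros(1, 3, 256, 256))
```

For a 256 × 256 input, both output branches have 64 × 64 spatial resolution, matching the scale of ConvB_2 after fusion.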
Step 3) Train the network end-to-end with the ADAM optimization algorithm, sampling randomly from the training image set, and stop training once the loss function L stabilizes.
The loss function L is defined as:
L = L_c + L_r (1)
where L_c evaluates whether pixels inside and outside the bounding box are correctly classified, and L_r evaluates whether the bounding-box vertex positions are correctly regressed;
L_c = -α p* (1-p)^γ log p - (1-α)(1-p*) p^γ log(1-p) (2)
where p* is the true pixel classification, p is the network's estimate of the probability that the pixel lies inside the bounding box, α is a balance factor between positive and negative samples, and γ is an empirical value; setting γ = 2 gave the best experimental results;
where C_i and Ĉ_i denote the regression result and the ground truth of the hand bounding-box coordinates, respectively;
Step 4) During object detection, a large number of overlapping candidate boxes are generated at the same target position, each with a different confidence. Non-maximum suppression is applied to eliminate the redundant candidate boxes and obtain the optimal hand bounding box.
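Step 4 can be sketched as standard greedy non-maximum suppression. The IoU threshold (0.5 below) is an assumption, as the patent does not specify one:

```python
def iou(a, b):
    """Intersection-over-union of boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, discard boxes that
    overlap it above iou_thresh, and repeat on the remainder."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

# Two heavily overlapping candidates collapse to the higher-scoring one.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # -> [0, 2]
```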
Step 5) Report the detection results. The detection results are evaluated with the following objective quantitative indices: average precision AP, average recall AR, comprehensive evaluation index F1-score, and detection speed FPS.
Assume TP denotes a real target that is correctly detected, FP denotes a detection that is not a real target, and FN denotes a real target that is missed; then
FPS is expressed as frames per second.
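The text elides the exact formulas for these indices, so the standard definitions of precision, recall, and F1 (the harmonic mean of the two) are assumed in this sketch:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN),
    F1 = 2PR/(P+R): the harmonic mean of precision and recall."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2.0 * p * r / (p + r)

# The harmonic mean of the reported AP = 98.3 and AR = 86.7 is about 92.1,
# consistent up to rounding with the F value of 92.2 reported in Table 1.
f = 2.0 * 98.3 * 86.7 / (98.3 + 86.7)
```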
The performance of this network in detecting hand regions in RGB still images of the cab environment is evaluated both by subjective visual inspection and by objective quantitative indices. Fig. 1 shows hand-detection results for several typical cases; the method performs well under different illumination, different hand shapes, hands of different sizes, and different numbers of hands.
The quantitative evaluation results of this method on the test set are shown in Table 1 and compared with the best contest result on the VIVA data set, reported in background-art reference [6].
Table 1. Quantitative evaluation indices of hand-region detection on the test set

Method                       AP (%)   AR (%)   F      FPS
This patent                  98.3     86.7     92.2   42
Background-art reference [6] 94.8     74.7     -      4.65
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone familiar with this technology may modify or change the above embodiments without departing from the spirit and scope of the present invention. Therefore, all equivalent modifications or changes completed by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the present invention shall be covered by the claims of the present invention.

Claims (7)

1. A method for detecting hand regions in a driving environment, characterized by comprising the following steps:
Step 1) Prepare the data set. The data set is obtained by camera devices mounted at different positions in the cab, shooting inside the cab under real driving conditions; the data set is divided into a training image set and a test image set, data augmentation is then applied to it, and new hand-region labels are generated;
Step 2) Construct the hand-detection convolutional neural network, using a multi-scale architecture to extract and fuse feature information at different scales;
Step 3) Train the network end-to-end with the ADAM optimization algorithm, sampling randomly from the training image set, and stop training once the loss function L stabilizes;
The loss function L is defined as:
L = L_c + L_r (1)
where L_c evaluates whether pixels inside and outside the bounding box are correctly classified, and L_r evaluates whether the bounding-box vertex positions are correctly regressed;
L_c = -α p* (1-p)^γ log p - (1-α)(1-p*) p^γ log(1-p) (2)
where p* is the true pixel classification, p is the network's estimate of the probability that the pixel lies inside the bounding box, α is a balance factor between positive and negative samples, and γ is an empirical value; setting γ = 2 gave the best experimental results;
where C_i and Ĉ_i denote the regression result and the ground truth of the hand bounding-box coordinates, respectively;
Step 4) Apply non-maximum suppression to eliminate redundant candidate boxes and obtain the optimal hand bounding box;
Step 5) Report the detection results.
2. The method for detecting hand regions in a driving environment according to claim 1, characterized in that the training image set in step 1) is randomly divided into a training subset and a validation subset at a 9:1 ratio.
3. The method for detecting hand regions in a driving environment according to claim 1, characterized in that the data-augmentation methods applied to the data set in step 1) include horizontal flipping, vertical flipping, random-angle rotation, translation, Gaussian blurring, and sharpening, and the training data grows by at least 22000 images after augmentation.
4. The method for detecting hand regions in a driving environment according to claim 1, characterized in that data augmentation in step 1) follows these rules:
Augmentation rule 1: brightness enhancement by a factor of 1.2-1.5, scaling by a factor of 0.7-1.5, translation of 40 pixels in the x direction and 60 pixels in the y direction;
Augmentation rule 2: random cropping with a margin of 0-16 pixels, horizontal flipping with 50% probability;
Augmentation rule 3: vertical flipping with 100% probability, Gaussian blur with mean 0 and variance 3;
Augmentation rule 4: random rotation with an upper limit of 45°, additive white Gaussian noise at a 20% noise level, and random sharpening with 50% probability.
5. The method for detecting hand regions in a driving environment according to claim 1, characterized in that the new hand-region labels in step 1) are generated as follows: starting from the four edges of the original bounding box, the box is shrunk inward by a specified length d = 0.2 l_min, where l_min is the shortest box side; pixels inside the shrunk box are labeled 1 and pixels outside it are labeled 0.
6. The method for detecting hand regions in a driving environment according to claim 1, characterized in that feature extraction and fusion in step 2) comprise three convolution modules and one upsampling feature-fusion stage, as follows:
The input layer takes 256 × 256 images. The first convolution module ConvB_1 contains two convolutional layers and one max-pooling layer, with 3 × 3 kernels and 64 channels; the second convolution module ConvB_2 contains two convolutional layers and one max-pooling layer, with 3 × 3 kernels and 128 channels; the third convolution module ConvB_3 contains three convolutional layers and one max-pooling layer, with 3 × 3 kernels and 256 channels. All pooling layers use a 2 × 2 kernel with stride 2.
The feature map output by ConvB_3 is upsampled to double its size; the feature map output by ConvB_2 has 20% of its channels randomly removed by a Dropout mechanism, and the two are concatenated. After normalization, the fused feature map FusF_1 is fed into the cascaded 1 × 1 and 3 × 3 convolution group ConvC_1, with 128 channels in total; its output passes through one 3 × 3 convolutional layer with 32 kernels before being fed to the output layer. The output layer contains two branches: branch 1 predicts, through a single-channel 1 × 1 convolution, the probability that each pixel lies in the target region; branch 2 predicts, through a 4-channel 1 × 1 convolution, the coordinate values of the bounding-box vertices.
7. The method for detecting hand regions in a driving environment according to claim 1, characterized in that the detection results in step 5) are evaluated with the following objective quantitative indices: average precision AP, average recall AR, comprehensive evaluation index F1-score, and detection speed FPS;
Assume TP denotes a real target that is correctly detected, FP denotes a detection that is not a real target, and FN denotes a real target that is missed; then
FPS is expressed as frames per second.
CN201910378179.7A 2019-05-08 2019-05-08 Detection method for hand area in driving environment Active CN110110665B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910378179.7A CN110110665B (en) 2019-05-08 2019-05-08 Detection method for hand area in driving environment


Publications (2)

Publication Number Publication Date
CN110110665A true CN110110665A (en) 2019-08-09
CN110110665B CN110110665B (en) 2021-05-04

Family

ID=67488704

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910378179.7A Active CN110110665B (en) 2019-05-08 2019-05-08 Detection method for hand area in driving environment

Country Status (1)

Country Link
CN (1) CN110110665B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112364805A (en) * 2020-11-21 2021-02-12 西安交通大学 Rotary palm image detection method
CN112686888A (en) * 2021-01-27 2021-04-20 上海电气集团股份有限公司 Method, system, equipment and medium for detecting cracks of concrete sleeper

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102129673A (en) * 2011-04-19 2011-07-20 大连理工大学 Color digital image enhancing and denoising method under random illumination
CN108875732A (en) * 2018-01-11 2018-11-23 北京旷视科技有限公司 Model training and example dividing method, device and system and storage medium
CN109086779A (en) * 2018-07-28 2018-12-25 天津大学 A kind of attention target identification method based on convolutional neural networks
US20190064389A1 (en) * 2017-08-25 2019-02-28 Huseyin Denli Geophysical Inversion with Convolutional Neural Networks
CN109635750A (en) * 2018-12-14 2019-04-16 广西师范大学 A kind of compound convolutional neural networks images of gestures recognition methods under complex background
CN109711288A (en) * 2018-12-13 2019-05-03 西安电子科技大学 Remote sensing ship detecting method based on feature pyramid and distance restraint FCN


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YIDAN ZHOU et al.: "HBE: Hand Branch Ensemble Network for Real-time 3D Hand Pose Estimation", 《ECCV 2018》 *
刘万军 et al.: "Adaptive enhancement convolutional neural network image recognition", 《中国图象图形学报》 (Journal of Image and Graphics) *


Also Published As

Publication number Publication date
CN110110665B (en) 2021-05-04

Similar Documents

Publication Publication Date Title
CN106548182B (en) Pavement crack detection method and device based on deep learning and main cause analysis
CN109165623B (en) Rice disease spot detection method and system based on deep learning
CN108154102B (en) Road traffic sign identification method
US9639748B2 (en) Method for detecting persons using 1D depths and 2D texture
CN109711264B (en) Method and device for detecting occupation of bus lane
CN108171112A (en) Vehicle identification and tracking based on convolutional neural networks
CN107945153A (en) A kind of road surface crack detection method based on deep learning
CN105046197A (en) Multi-template pedestrian detection method based on cluster
CN103390164A (en) Object detection method based on depth image and implementing device thereof
CN103310194A (en) Method for detecting head and shoulders of pedestrian in video based on overhead pixel gradient direction
CN105069807A (en) Punched workpiece defect detection method based on image processing
CN106023257A (en) Target tracking method based on rotor UAV platform
CN102722712A (en) Multiple-scale high-resolution image object detection method based on continuity
CN103778435A (en) Pedestrian fast detection method based on videos
CN104036284A (en) Adaboost algorithm based multi-scale pedestrian detection method
CN111915583B (en) Vehicle and pedestrian detection method based on vehicle-mounted thermal infrared imager in complex scene
CN104657717A (en) Pedestrian detection method based on layered kernel sparse representation
CN104268598A (en) Human leg detection method based on two-dimensional scanning lasers
Liu et al. Multi-type road marking recognition using adaboost detection and extreme learning machine classification
CN105893971A (en) Traffic signal lamp recognition method based on Gabor and sparse representation
EP4174792A1 (en) Method for scene understanding and semantic analysis of objects
CN106529441A (en) Fuzzy boundary fragmentation-based depth motion map human body action recognition method
CN110110665A (en) The detection method of hand region under a kind of driving environment
CN105354547A (en) Pedestrian detection method in combination of texture and color features
CN106886754A (en) Object identification method and system under a kind of three-dimensional scenic based on tri patch

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant