CN107808376A - A hand-raising detection method based on deep learning - Google Patents
A hand-raising detection method based on deep learning
- Publication number: CN107808376A (application CN201711044722.7A)
- Authority: CN (China)
- Prior art keywords: hand, raising, frame, sample, detection
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/0002 — Image analysis; inspection of images, e.g. flaw detection
- G06F18/23 — Pattern recognition; clustering techniques
- G06F18/24147 — Classification by distances to closest patterns, e.g. nearest-neighbour classification
- G06T7/254 — Analysis of motion involving subtraction of images
- G06V20/46 — Extracting features or characteristics from video content, e.g. video fingerprints, representative shots or key frames
- G06V20/52 — Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06T2207/10016 — Video; image sequence
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/30232 — Surveillance
- G06T2207/30242 — Counting objects in image
Abstract
The present invention relates to a hand-raising detection method based on deep learning, comprising the following steps: 1) collecting samples, the samples being complex-environment samples; 2) establishing a hand-raising detection model, the model being built on a convolutional neural network structure and trained on the samples with the R-FCN object detection algorithm; 3) using the trained hand-raising detection model to detect hand-raising in the video under test, obtaining the positions of hand-raising boxes. Compared with the prior art, the present invention can detect hand-raising actions in complex environments and achieves high accuracy and recall.
Description
Technical field
The present invention relates to a video detection method, and in particular to a hand-raising detection method based on deep learning.
Background technology
Moving-human detection and activity recognition in video sequences is a research topic spanning computer vision, pattern recognition, artificial intelligence and other fields. Because of its wide application value in commerce, medicine, the military and elsewhere, it has long been a focus of research. However, owing to the diversity and non-rigidity of human behaviour and the intrinsic complexity of video images, proposing a robust, real-time and accurate method remains difficult.
Noisy and highly dynamic backgrounds, varying illumination conditions, and the small size and many possible matching targets make detecting people's hand-raising actions in a typical classroom environment a challenging task.
Document " Haar-Feature Based Gesture Detection of Hand-Raising for Mobile
Robot in HRI Environments " disclose a kind of detection technique of raising one's hand based on Haar features, and this method is trained first
Then two graders, all positions of this method human-face detector scanning input picture raise one's hand to examine to search people with one
Survey device and scan the specific region around face to have detected whether to raise one's hand.This method is divided into training stage and detection-phase.Training
Stage specifically includes:(1) sample is created, training sample is divided into positive sample and negative sample, and wherein positive sample refers to target sample to be checked
This, negative sample refers to other any images;(2) feature extraction, including edge feature, linear feature and central feature;(3)
Cascaded Adaboost are trained, and are completed by calling OpenCV opencv_traincascade programs.Training terminates
A .xml model file is generated afterwards, and the adaboost cascade classifiers of generation, which can detect, raises one's hand to act, and this is also entirely to examine
The key of survey technology.Detection-phase specifically includes:(1) video cuts frame and carries out Face datection;(2) sense based on face constraint is emerging
Interesting regional choice;(3) carry out raising one's hand to detect in the region of interest using the cascade classifier trained.
Although the above method can obtain testing result, still have several drawbacks:(1) need to carry out Face datection, face
The effect quality of detection will directly affect the effect of final detection of raising one's hand;(2) selection of area-of-interest needs to continuously attempt to, right
New detection environment needs to reformulate selection scheme, as testing result not robust;(3) raise one's hand to detect based on Haar features
Ineffective, accuracy rate and recall ratio are relatively low.
Summary of the invention
The object of the present invention is to overcome the above drawbacks of the prior art and provide a hand-raising detection method based on deep learning.
A first object of the present invention is to detect hand-raising actions in complex environments (such as classrooms).
A second object of the present invention is to improve the accuracy of hand-raising detection.
A third object of the present invention is to improve the recall of hand-raising detection.
A fourth object of the present invention is to merge the same hand-raising action across different frames, obtaining a truer count of hand-raises.
The objects of the present invention can be achieved through the following technical solutions:
A hand-raising detection method based on deep learning comprises the following steps:
1) collecting samples, the samples being complex-environment samples;
2) establishing a hand-raising detection model, the model being built on a convolutional neural network structure and trained on the samples with the R-FCN object detection algorithm;
3) using the trained hand-raising detection model to detect hand-raising in the video under test, obtaining the positions of hand-raising boxes.
Further, in step 1), the sample size is greater than 30,000.
Further, step 1) also includes: preserving sample information, the sample information including the video key-frame images, the key-frame image information, and the bounding-box coordinates of the hand-raising targets in the key-frame image information.
Further, step 1) also includes: clustering the sample sizes to obtain the template sizes needed during training.
Further, the convolutional neural network structure includes an intermediate-level fusion layer.
Further, the method also includes the step:
4) merging the same hand-raising action across different frames using a tracking algorithm.
Further, step 4) specifically comprises:
401) obtaining the first image frame and the detected hand-raising box coordinates, establishing one tracklet array for each hand-raising box, and initializing its state to ALIVE;
402) obtaining the next image frame and judging whether a shot (camera-view) change has occurred; if so, changing the state of all tracklet arrays to DEAD, establishing new tracklet arrays and returning to step 402); if not, performing step 403);
403) traversing all hand-raising boxes detected in the current image frame and, using the tracking algorithm, selecting the best-matching tracklet array for each hand-raising box;
404) for each tracklet array not matched in the current image frame, judging whether its state is ALIVE; if so, changing its state to WAIT; if not, changing its state to DEAD; then returning to step 402) until all image frames have been processed.
Further, judging whether a shot change has occurred specifically comprises:
obtaining two adjacent image frames and counting the pixels whose rate of change between corresponding pixels of the two frames exceeds a first threshold; judging whether the number of changed pixels exceeds a second threshold; if so, a shot change is deemed to have occurred; if not, no shot change has occurred.
Further, the method also includes the step:
5) counting the hand-raising actions after detection and merging.
Compared with the prior art, the invention has the following advantages:
1. The present invention uses video images from complex environments as samples for training the hand-raising detection model, so the method is suitable for hand-raising detection in complex environments and adapts well to cluttered backgrounds.
2. The proposed hand-raising detection model is a deep learning model trained on a large number of samples (more than 30,000 hand-raising samples); its accuracy is high, and extensive testing shows an accuracy above 90%.
3. The template sizes required during training are obtained by clustering the sample sizes rather than chosen manually, which effectively improves the model.
4. The template-size clustering and the fusion of intermediate network levels safeguard the model's recall; extensive testing shows a recall above 70%.
5. The tracking algorithm used by the present invention can effectively track the same hand-raising action across frames, so a true count of hand-raises can be obtained, providing a basis for further analysis and evaluation.
Brief description of the drawings
Fig. 1 is the flow diagram of the present invention;
Fig. 2 is the flow diagram of the sample-size clustering;
Fig. 3 is the schematic diagram of the intermediate-level network fusion;
Fig. 4 is the network structure diagram of the hand-raising detection model of the invention;
Fig. 5 is the flow diagram of the merging of hand-raising actions;
Fig. 6 is the flow diagram of the shot-boundary judgment;
Fig. 7 shows detection results from the embodiment.
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawings and a specific embodiment. The embodiment is implemented on the premise of the technical solution of the present invention, with a detailed implementation and a concrete operating process, but the protection scope of the present invention is not limited to the following embodiment.
As shown in Fig. 1, the present invention provides a hand-raising detection method based on deep learning, comprising the following steps:
1) Collect samples; the samples are complex-environment samples and the sample size exceeds 30,000.
After collection, the sample information must be preserved, including the video key-frame images, the key-frame image information, and the bounding-box coordinates of the hand-raising targets in the key-frame image information.
The sample information can be stored in the format of the PASCAL VOC dataset. PASCAL VOC provides a full set of standardized, high-quality datasets for image recognition and classification. Files stored in this format include JPEGImages, Annotations and so on: JPEGImages holds the video key frames, while Annotations holds the details of the corresponding images and the bounding-box coordinates of the hand-raising targets, where a hand-raising box position is marked by its top-left and bottom-right corner coordinates.
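For concreteness, a minimal sketch of what one such VOC-style annotation could look like, and how it might be read back. The class name "handup", the file names and the coordinate values are illustrative assumptions; the patent fixes only the VOC directory layout and the corner-coordinate box format.

```python
import xml.etree.ElementTree as ET

# A minimal PASCAL VOC-style annotation for one hand-raising box.
# "handup", the file name and the numbers are illustrative, not from the patent.
VOC_XML = """<annotation>
  <folder>JPEGImages</folder>
  <filename>frame_000123.jpg</filename>
  <size><width>1280</width><height>720</height><depth>3</depth></size>
  <object>
    <name>handup</name>
    <bndbox>
      <xmin>412</xmin><ymin>96</ymin>
      <xmax>449</xmax><ymax>155</ymax>
    </bndbox>
  </object>
</annotation>"""

def read_boxes(xml_text):
    """Return a list of (name, xmin, ymin, xmax, ymax) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        boxes.append((obj.findtext("name"),
                      int(bb.findtext("xmin")), int(bb.findtext("ymin")),
                      int(bb.findtext("xmax")), int(bb.findtext("ymax"))))
    return boxes

print(read_boxes(VOC_XML))  # [('handup', 412, 96, 449, 155)]
```

One annotation file per key frame, named after the image, is the usual VOC convention.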
Templates (anchors) are needed during model training; in the present invention the template sizes are obtained by clustering the sample sizes. In certain embodiments, the sample sizes are clustered with the k-means algorithm and the 9 most representative sizes are selected as templates.
The distance metric in k-means is redefined here as:
d(box, centroid) = 1 - IOU(box, centroid)
where d(box, centroid) denotes the distance between a bounding box box and a centroid centroid, and IOU(box, centroid) denotes the corresponding overlap rate.
In the formula above, IOU (Intersection over Union) denotes the overlap rate between a template anchor (the box) and a pre-labelled hand-raising box ground truth (the centroid), defined as:
IOU(box, centroid) = area(box ∩ centroid) / area(box ∪ centroid)
As shown in Fig. 2 the input detailed process false code of cluster can be described as:
Require:The pre- bounding box for demarcating frame of raising one's hand of input
Ensure:9 kinds of most typical sizes are exported as template size
1:K=9
2:K point is selected as initial barycenter
3:repeat
4:According to range formula:D (box, centroid)=1-IOU (box, centroid)
5:Each bounding box are assigned to nearest barycenter, form k cluster
6:Recalculate the barycenter of each cluster
7:Until clusters do not change
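The clustering steps above can be sketched in Python. This is a minimal illustration under an assumption the patent does not state: the IoU is computed on (width, height) pairs anchored at a common origin, a common simplification when clustering anchor sizes.

```python
import random

def iou_wh(a, b):
    """IoU of two boxes given as (w, h), both anchored at the origin."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union

def kmeans_anchors(sizes, k=9, seed=0):
    """Cluster (w, h) box sizes with d = 1 - IoU, returning k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(sizes, k)           # step 2: initial centroids
    while True:
        clusters = [[] for _ in range(k)]
        for s in sizes:                        # step 4: nearest centroid = max IoU
            best = max(range(k), key=lambda i: iou_wh(s, centroids[i]))
            clusters[best].append(s)
        new = [                                # step 5: mean (w, h) per cluster
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new == centroids:                   # step 6: stop when stable
            return sorted(centroids)
        centroids = new

sizes = [(30, 60), (32, 58), (31, 62), (100, 200), (98, 205), (102, 198)]
print(kmeans_anchors(sizes, k=2))
```

With real data one would feed in all labelled box sizes and keep k = 9, as in the embodiment.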
2) Establish the hand-raising detection model; the model is built on a convolutional neural network structure and trained on the samples with the R-FCN object detection algorithm. The convolutional neural network structure includes an intermediate-level fusion layer to enrich the features the network extracts and thereby improve detection accuracy.
In certain embodiments, the convolutional neural network structure uses a revised ResNet-101, with C1, C2, C3, C4 and C5 denoting the outputs of ResNet-101's conv1, conv2, conv3, conv4 and conv5 respectively. As convolutional layers stack up, the receptive field of each convolution kernel grows and the semantic features learned become more advanced, but subtle features are more easily lost. In some environments the resolution of a hand-raising region can be small, so to detect small targets correctly we superimpose the outputs of C3 and C5, making the features the network learns at the C5 level carry high-level semantics and low-level details at the same time. As shown in Fig. 3, res5c_relu is the output of C5; C5_topdown is an upsampling layer that brings C5 up to the same size as C3; finally C5_topdown is superimposed with C3 to obtain the P3 layer, and P3 replaces res5c_relu as the output of C5. This enriches the features the convolutional neural network extracts.
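A toy numerical sketch of this fusion step, assuming nearest-neighbour upsampling and already-matching channel counts (the real network would align channels first, e.g. with a 1x1 convolution, before adding):

```python
import numpy as np

def upsample_nn(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_c3_c5(c3, c5):
    """Intermediate-level fusion: upsample C5 to C3's spatial size and
    add the maps elementwise to form P3 (channel counts assumed equal)."""
    factor = c3.shape[1] // c5.shape[1]
    return c3 + upsample_nn(c5, factor)

c3 = np.ones((8, 16, 16), dtype=np.float32)      # low-level, high resolution
c5 = np.full((8, 4, 4), 2.0, dtype=np.float32)   # high-level semantics
p3 = fuse_c3_c5(c3, c5)
print(p3.shape)  # (8, 16, 16)
```

The essential point is only that P3 keeps C3's spatial resolution while carrying C5's semantics.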
With ResNet-101 as the feature extraction network and the intermediate-level feature-map fusion in place, the model is trained with the R-FCN object detection algorithm. First, a group of basic conv+relu+pooling layers extracts the image's feature maps, which are shared by the subsequent RPN network and detection network. The RPN network generates region proposals: a softmax layer judges whether each anchor belongs to the foreground or the background, and bounding-box regression then corrects the anchors to obtain accurate proposals. The RoI pooling layer collects the input feature maps and proposals, integrates them to extract proposal feature maps, and computes position-sensitive score maps, which are fed into the subsequent detection network to judge the target class. Finally, the proposal feature maps are used to classify each proposal and obtain the final exact position of each detection box.
ResNet-101 contains 5 convolution blocks and 101 layers in total. The standard R-FCN uses the first 4 convolution blocks as the weight-sharing network of the RPN and detection networks, with the 5th convolution block as the feature extraction network of the detection network. The present invention instead uses all 101 layers as the shared network of the RPN and detection networks: the feature map output by the 5th convolution block is shared by both. This arrangement greatly reduces computation while preserving accuracy.
The network of the hand-raising detection model is shown in Fig. 4.
3) Detect hand-raising in the video under test using the trained hand-raising detection model, obtaining the positions of hand-raising boxes.
In certain embodiments, the method also includes the step: 4) tracking each hand-raising action from its position in the previous frame into the next frame, and merging the same hand-raising action across different frames with a tracking algorithm. As long as the camera view does not change, the same hand-raising action can be tracked across frames. The tracking algorithm may use a backtracking-pruning method that optimally matches the hand-raising actions of the previous frame with those of the next frame.
Step 4) specifically comprises:
401) obtaining the first image frame and the detected hand-raising box coordinates, establishing one tracklet array for each hand-raising box, and initializing its state to ALIVE;
402) obtaining the next image frame and judging whether a shot change has occurred; if so, changing the state of all tracklet arrays to DEAD, establishing new tracklet arrays and returning to step 402); if not, performing step 403);
403) traversing all hand-raising boxes detected in the current image frame and selecting the best-matching tracklet array for each hand-raising box with the backtracking-pruning method;
404) for each tracklet array not matched in the current image frame, judging whether its state is ALIVE; if so, changing its state to WAIT; if not, changing its state to DEAD; then returning to step 402) until all image frames have been processed.
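Steps 401)-404) can be sketched as follows. Since the text does not spell out the backtracking-pruning matcher, a greedy IoU match stands in for it here; the tracklet arrays, the ALIVE/WAIT/DEAD states and the shot-change reset follow the steps above.

```python
ALIVE, WAIT, DEAD = "ALIVE", "WAIT", "DEAD"

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def merge_tracklets(frames, shot_change, iou_thr=0.3):
    """frames: per-frame lists of detected boxes; shot_change[i]: whether
    frame i starts a new shot.  Greedy IoU matching is an assumption."""
    tracklets = []  # each: {"boxes": [...], "state": ...}
    for i, boxes in enumerate(frames):
        if i == 0 or shot_change[i]:           # steps 401 / 402 (reset)
            for t in tracklets:
                t["state"] = DEAD
            tracklets += [{"boxes": [b], "state": ALIVE} for b in boxes]
            continue
        live = [t for t in tracklets if t["state"] != DEAD]
        matched = set()
        for b in boxes:                        # step 403: best match per box
            best = max(live, key=lambda t: iou(t["boxes"][-1], b),
                       default=None)
            if (best and id(best) not in matched
                    and iou(best["boxes"][-1], b) >= iou_thr):
                best["boxes"].append(b)
                best["state"] = ALIVE
                matched.add(id(best))
            else:                              # no match: start a new tracklet
                nt = {"boxes": [b], "state": ALIVE}
                tracklets.append(nt)
                live.append(nt)
                matched.add(id(nt))
        for t in live:                         # step 404: demote the unmatched
            if id(t) not in matched:
                t["state"] = WAIT if t["state"] == ALIVE else DEAD

    return tracklets

frames = [[(0, 0, 10, 10)], [(1, 0, 11, 10)], [(50, 50, 60, 60)]]
tracks = merge_tracklets(frames, [False, False, False])
print(len(tracks))  # 2: one box followed for two frames then lost, one new
```

Each surviving tracklet then corresponds to one merged hand-raising action, which is what step 5) counts.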
The pseudocode of the above process can be summarized as:
Require: a set of N input images, each with its detected hand-raising bounding boxes
Ensure: the output tracklets
The merging process for the hand-raising actions of a single image frame is shown in Fig. 5.
Video captured by a camera may contain shot (camera-view) changes. The present invention solves this with the frame-difference method, i.e. subtracting consecutive frames. As shown in Fig. 6, judging whether a shot change has occurred specifically comprises:
obtaining two adjacent image frames and counting the pixels whose rate of change between corresponding pixels of the two frames exceeds a first threshold; judging whether the number of changed pixels exceeds a second threshold; if so, a shot change is deemed to have occurred; if not, no shot change has occurred.
The concrete criterion is whether the white region (i.e. the moving part) exceeds 20% of all pixels; if it does, a shot switch has occurred.
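A minimal sketch of this frame-difference test, assuming grayscale frames. The per-pixel change-rate threshold value is illustrative, while the 20% area criterion comes from the text:

```python
import numpy as np

def shot_changed(prev, cur, change_thr=0.15, area_thr=0.20):
    """Frame-difference shot-boundary test: count the pixels whose relative
    change exceeds the first threshold, and report a shot change when they
    cover more than area_thr of the frame (20% in the text)."""
    prev = prev.astype(np.float32)
    cur = cur.astype(np.float32)
    rate = np.abs(cur - prev) / (prev + 1e-6)  # per-pixel change rate
    moving = rate > change_thr                 # the "white" motion mask
    return moving.mean() > area_thr

a = np.full((10, 10), 100, dtype=np.uint8)
b = a.copy()
b[:5, :] = 200                                 # half the frame changes
print(shot_changed(a, b))  # True
```

In the tracking loop this test runs on each adjacent frame pair before step 403), triggering the DEAD reset of step 402) when it fires.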
Based on the above merging process, the method may also include the step: 5) counting the hand-raising actions after detection and merging.
Embodiment 1
This embodiment illustrates the above method with a primary- and middle-school classroom environment. 40,000 samples were collected, and the hand-raising samples were made in the PASCAL VOC dataset format. Clustering the sample sizes yielded the following 9 anchor box sizes:
(37,59) (44,72) (53,80) (56,96) (67,105) (75,128) (91,150) (115,184) (177,283).
Training in this embodiment ran for 20,000 iterations in total, producing a well-performing hand-raising detection model. Part of the trained model's detection results is shown in Fig. 7.
After the hand-raising actions of different frames were merged with the tracking algorithm, the counts were tallied: the hand-raising frequency over the whole lesson was recorded, completing a count of hand-raises for one lesson. This is used to assess the classroom atmosphere and provides a basis for its intelligent analysis.
Experiments show that the above method achieves high hand-raising detection accuracy and recall: accuracy above 90% and recall above 70%.
The preferred embodiment of the invention is described in detail above. It should be appreciated that one of ordinary skill in the art can make many modifications and variations according to the design of the present invention without creative work. Therefore, any technical solution that a person skilled in the art can obtain through logical analysis, reasoning or limited experiment on the basis of the prior art under this invention's idea shall fall within the protection scope defined by the claims.
Claims (9)
1. A hand-raising detection method based on deep learning, characterised by comprising the following steps:
1) collecting samples, the samples being complex-environment samples;
2) establishing a hand-raising detection model, the model being built on a convolutional neural network structure and trained on the samples with the R-FCN object detection algorithm;
3) using the trained hand-raising detection model to detect hand-raising in the video under test, obtaining the positions of hand-raising boxes.
2. The hand-raising detection method based on deep learning according to claim 1, characterised in that in step 1) the sample size is greater than 30,000.
3. The hand-raising detection method based on deep learning according to claim 1, characterised in that step 1) further includes: preserving sample information, the sample information including the video key-frame images, the key-frame image information, and the bounding-box coordinates of the hand-raising targets in the key-frame image information.
4. The hand-raising detection method based on deep learning according to claim 1, characterised in that step 1) further includes: clustering the sample sizes to obtain the template sizes needed during training.
5. The hand-raising detection method based on deep learning according to claim 1, characterised in that the convolutional neural network structure includes an intermediate-level fusion layer.
6. The hand-raising detection method based on deep learning according to claim 1, characterised in that the method further includes the step:
4) merging the same hand-raising action across different frames using a tracking algorithm.
7. The hand-raising detection method based on deep learning according to claim 6, characterised in that step 4) specifically comprises:
401) obtaining the first image frame and the detected hand-raising box coordinates, establishing one tracklet array for each hand-raising box, and initializing its state to ALIVE;
402) obtaining the next image frame and judging whether a shot change has occurred; if so, changing the state of all tracklet arrays to DEAD, establishing new tracklet arrays and returning to step 402); if not, performing step 403);
403) traversing all hand-raising boxes detected in the current image frame and, using the tracking algorithm, selecting the best-matching tracklet array for each hand-raising box;
404) for each tracklet array not matched in the current image frame, judging whether its state is ALIVE; if so, changing its state to WAIT; if not, changing its state to DEAD; then returning to step 402) until all image frames have been processed.
8. The hand-raising detection method based on deep learning according to claim 6, characterised in that judging whether a shot change has occurred specifically comprises:
obtaining two adjacent image frames and counting the pixels whose rate of change between corresponding pixels of the two frames exceeds a first threshold; judging whether the number of changed pixels exceeds a second threshold; if so, a shot change is deemed to have occurred; if not, no shot change has occurred.
9. The hand-raising detection method based on deep learning according to claim 6, characterised in that the method further includes the step:
5) counting the hand-raising actions after detection and merging.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711044722.7A CN107808376B (en) | 2017-10-31 | 2017-10-31 | Hand raising detection method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711044722.7A CN107808376B (en) | 2017-10-31 | 2017-10-31 | Hand raising detection method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107808376A true CN107808376A (en) | 2018-03-16 |
CN107808376B CN107808376B (en) | 2022-03-11 |
Family
ID=61591064
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711044722.7A Expired - Fee Related CN107808376B (en) | 2017-10-31 | 2017-10-31 | Hand raising detection method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107808376B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104112121A (en) * | 2014-07-01 | 2014-10-22 | 深圳市欢创科技有限公司 | Face recognition method and device for an interactive game device, and interactive game system |
CN106651765A (en) * | 2016-12-30 | 2017-05-10 | 深圳市唯特视科技有限公司 | Method for automatically generating thumbnails using a deep neural network |
CN107122736A (en) * | 2017-04-26 | 2017-09-01 | 北京邮电大学 | Human body orientation prediction method and device based on deep learning |
CN107145908A (en) * | 2017-05-08 | 2017-09-08 | 江南大学 | Small target detection method based on R-FCN |
CN107273828A (en) * | 2017-05-29 | 2017-10-20 | 浙江师范大学 | Guideboard detection method based on region-based fully convolutional neural networks |
Non-Patent Citations (2)
Title |
---|
TIAGO S. NAZARÉ et al.: "Hand-Raising Gesture Detection with Lienhart-Maydt Method in Videoconference and Distance Learning", 《SPRING》 * |
SANG Nong et al.: "Gesture recognition based on R-FCN in complex scenes", 《Journal of Huazhong University of Science and Technology (Natural Science Edition)》 * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108921748B (en) * | 2018-07-17 | 2022-02-01 | 郑州大学体育学院 | Teaching planning method based on big data analysis and computer readable medium |
CN108921748A (en) * | 2018-07-17 | 2018-11-30 | 郑州大学体育学院 | Teaching planning method and computer-readable medium based on big data analysis |
CN110941976A (en) * | 2018-09-24 | 2020-03-31 | 天津大学 | Student classroom behavior identification method based on convolutional neural network |
CN109508661A (en) * | 2018-10-31 | 2019-03-22 | 上海交通大学 | Hand-raiser detection method based on object detection and pose estimation |
CN109508661B (en) * | 2018-10-31 | 2021-07-09 | 上海交通大学 | Method for detecting hand raisers based on object detection and posture estimation |
CN110163836B (en) * | 2018-11-14 | 2021-04-06 | 宁波大学 | Excavator detection method for high-altitude inspection based on deep learning |
CN110163836A (en) * | 2018-11-14 | 2019-08-23 | 宁波大学 | Deep-learning-based excavator detection method for high-altitude inspection |
CN110414380A (en) * | 2019-07-10 | 2019-11-05 | 上海交通大学 | Student behavior detection method based on object detection |
CN110399822A (en) * | 2019-07-17 | 2019-11-01 | 思百达物联网科技(北京)有限公司 | Hand-raising action recognition method, device and storage medium based on deep learning |
CN112686128A (en) * | 2020-12-28 | 2021-04-20 | 南京览众智能科技有限公司 | Classroom desk detection method based on machine learning |
CN116739859A (en) * | 2023-08-15 | 2023-09-12 | 创而新(北京)教育科技有限公司 | Method and system for online teaching question-and-answer interaction |
CN117670259A (en) * | 2024-01-31 | 2024-03-08 | 天津师范大学 | Sample detection information management method |
CN117670259B (en) * | 2024-01-31 | 2024-04-19 | 天津师范大学 | Sample detection information management method |
Also Published As
Publication number | Publication date |
---|---|
CN107808376B (en) | 2022-03-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107808376A (en) | Hand-raising detection method based on deep learning | |
Wu et al. | Recent advances in video-based human action recognition using deep learning: A review | |
CN108334848B | Tiny face recognition method based on generative adversarial networks |
Li et al. | Robust visual tracking based on convolutional features with illumination and occlusion handing | |
CN108062525B (en) | Deep learning hand detection method based on hand region prediction | |
CN107316058A | Method for improving object detection performance by improving object classification and localization accuracy |
CN105160310A | 3D (three-dimensional) convolutional neural network based human body behavior recognition method |
CN107481264A | Scale-adaptive video target tracking method |
CN114220035A | Rapid pest detection method based on improved YOLO V4 |
CN109816689A | Moving target tracking method with adaptive fusion of multi-layer convolutional features |
CN107808143A | Dynamic gesture recognition method based on computer vision |
CN107945153A | Road surface crack detection method based on deep learning |
CN105740758A | Internet video face recognition method based on deep learning |
CN107292246A | Infrared human body target recognition method based on HOG-PCA and transfer learning |
Li et al. | Sign language recognition based on computer vision | |
CN106650619A (en) | Human action recognition method | |
CN114241548A (en) | Small target detection algorithm based on improved YOLOv5 | |
CN103400122A | Rapid living-body face recognition method |
CN108664838A | End-to-end pedestrian detection method for surveillance scenes based on an improved RPN deep network |
CN110163567A | Classroom roll-call system based on multi-task cascaded convolutional neural networks |
Yang et al. | Facial expression recognition based on dual-feature fusion and improved random forest classifier | |
CN108171133A | Dynamic gesture recognition method based on feature covariance matrices |
CN114241422A (en) | Student classroom behavior detection method based on ESRGAN and improved YOLOv5s | |
Tanisik et al. | Facial descriptors for human interaction recognition in still images | |
CN109360179A | Image fusion method, device and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20220311 |