CN109977773A - Human behavior recognition method and system based on multi-target detection 3D CNN - Google Patents
Human behavior recognition method and system based on multi-target detection 3D CNN
- Publication number
- CN109977773A CN109977773A CN201910136442.1A CN201910136442A CN109977773A CN 109977773 A CN109977773 A CN 109977773A CN 201910136442 A CN201910136442 A CN 201910136442A CN 109977773 A CN109977773 A CN 109977773A
- Authority
- CN
- China
- Prior art keywords
- model
- data
- video
- cnn
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
The invention discloses a human behavior recognition method and system based on a multi-target detection 3D CNN. The method comprises: 1) preprocessing the video to convert the video stream into image frames; 2) calibrating and cropping the target objects in the video using the currently mature SSD detection technique; 3) building the feature extraction network structures for the image frame data and the calibrated cropped data; 4) building a feature fusion model to fuse the two kinds of features extracted in step 3); 5) classifying with a Softmax regression classifier; 6) fine-tuning the trained model according to the actual application scenario or public datasets. The present invention compensates for the information loss caused by convolution along the time dimension in current deep neural network models, strengthens the expression of features in the time dimension, improves the overall recognition efficiency of the model, and enables the model to better understand human behaviors and actions.
Description
Technical field
The present invention relates to the technical field of human behavior recognition and analysis, and in particular to a human behavior recognition method and system based on multi-target detection 3D CNN.
Background
Human behavior recognition refers to identifying the behaviors or movements of humans in real environments, and can be applied in many fields. Common application scenarios at present include intelligent surveillance, smart homes, human-computer interaction, and the analysis and prediction of human behavior attributes. However, improving the accuracy and efficiency of recognition remains a very challenging task, and it has received extensive attention from researchers.
In the past few decades, the extraction and representation of human behavior features largely remained at the manual stage, where the design and extraction of features often depended on the experience of the designer. Common hand-crafted feature extraction methods include space-time interest points (STIP), the bag of visual words (BOVW), histograms of oriented gradients (HOG), motion history images (MHI), and motion energy images (MEI). Hand-crafted features are usually designed for one specific portion of the data, which results in poor generalization ability: the model cannot be quickly migrated to other applications, greatly increasing labor costs. Traditional methods can be said to have entered a bottleneck period.
The application of deep learning to human behavior recognition can be described as a great remedy for the shortcomings of traditional recognition approaches. This is mainly reflected in the following aspects: (1) it avoids the trouble of hand-crafted feature extraction and simplifies the feature extraction process; (2) since deep neural networks have a certain feedback adjustment effect, the generalization ability of the model is strengthened to a large extent; (3) complex features can be automatically reduced in dimensionality; (4) when handling big data, computational overhead can be greatly reduced and overall execution efficiency improved; (5) for the recognition and classification of unlabeled data, performance is superior; (6) for modality-based behavior recognition, implementation is relatively simple: one only needs to design a corresponding deep learning model to extract features, and then fuse the features of two or more network models, which greatly improves recognition accuracy.
One of the biggest differences between human behavior recognition and image classification or detection is whether information in the time dimension is involved. Therefore, human behavior recognition must not only extract behavior features from the spatial dimension, but also mine continuity information from the time dimension of the behavior. Only in this way can a correct description of continuous behaviors and actions be guaranteed.
Summary of the invention
The object of the present invention is to overcome the deficiency of current deep neural network models in capturing time-dimension information for human behavior recognition, and to propose a human behavior recognition method and system based on multi-target detection 3D CNN, which compensates for the information loss caused by convolution along the time dimension, strengthens the expression of features in the time dimension, improves the overall recognition efficiency of the model, and enables the model to better understand human behaviors and actions.
To achieve the above object, the technical solution provided by the present invention is as follows:
A human behavior recognition method based on multi-target detection 3D CNN, comprising the following steps:
1) preprocessing the video to convert the video stream into image frames;
2) calibrating and cropping the target objects in the video using the SSD (Single Shot MultiBox Detector) detection technique;
3) building the feature extraction network structures for the image frame data and the calibrated cropped data;
4) building a feature fusion model to fuse the two kinds of features extracted in step 3);
5) classifying with a Softmax regression classifier;
6) fine-tuning the trained model according to the actual application scenario or public datasets, to enhance the generalization and transfer ability of the model.
In step 1), the video is preprocessed and the video stream is converted into image frames, comprising the following steps:
1.1) obtaining the video datasets: public datasets are mainly used for model training, while the test dataset is collected by cameras in real environments;
1.2) archiving the video datasets: video data of the same action behavior are filed into the same folder, and the folder is named with its behavior label;
1.3) preprocessing the video datasets: all videos are converted into their corresponding image frame sets by a video conversion shell script;
1.4) splitting the image frame sets obtained in step 1.3) using cross-validation, for model training.
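Step 1.3)'s shell-script conversion can be sketched as building one FFmpeg command per video, with the output folder mirroring step 1.2)'s behavior-label archiving rule. This is a minimal sketch: the `frames` output root, the sampling rate, and the `frame_%05d.jpg` naming pattern are illustrative assumptions, not given in the patent.

```python
import os

def ffmpeg_frame_command(video_path: str, out_root: str, fps: int = 25) -> list:
    """Build the FFmpeg command that converts one video into an image frame set.

    The behavior label is taken from the video's parent folder name, per the
    archiving rule of step 1.2). Output root and fps are assumed values.
    """
    label = os.path.basename(os.path.dirname(video_path))  # behavior label = folder name
    stem = os.path.splitext(os.path.basename(video_path))[0]
    frame_dir = os.path.join(out_root, label, stem)
    # -vf fps=... samples the stream at a fixed rate; %05d numbers the frames
    return ["ffmpeg", "-i", video_path,
            "-vf", f"fps={fps}",
            os.path.join(frame_dir, "frame_%05d.jpg")]

cmd = ffmpeg_frame_command("dataset/walking/v001.avi", "frames")
print(cmd)
```

A real script would loop this command over every archived video and invoke it with `subprocess.run`.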
In step 2), the target objects in the video are calibrated and cropped using the SSD detection technique, comprising the following steps:
2.1) loading the trained SSD detection model;
2.2) reading the video stream data, feeding it into the SSD detection model, and performing calibration detection on each frame of the video;
2.3) setting the crop size of the calibrated data to half the size of each frame in the frame set of step 1.3), and converting and saving all videos as calibrated image frame sets.
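The half-frame crop of step 2.3) can be sketched as centring a fixed-size window on the detector's box and clamping it to the frame. The box format and the centring/clamping policy are assumptions for illustration; the patent only fixes the crop size at half the frame size.

```python
def calibrated_crop(frame_w: int, frame_h: int, box: tuple) -> tuple:
    """Return a crop window of half the frame size, centred on an SSD box.

    `box` is an assumed (x_min, y_min, x_max, y_max) pixel box for the
    behavior subject. The window is clamped so it stays inside the frame.
    """
    crop_w, crop_h = frame_w // 2, frame_h // 2
    cx = (box[0] + box[2]) // 2          # box centre
    cy = (box[1] + box[3]) // 2
    x0 = min(max(cx - crop_w // 2, 0), frame_w - crop_w)
    y0 = min(max(cy - crop_h // 2, 0), frame_h - crop_h)
    return (x0, y0, x0 + crop_w, y0 + crop_h)
```

For a 320x240 frame, every returned window is exactly 160x120 regardless of where the detection lands.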
In step 3), the feature extraction network structures for the image frame data and the calibrated cropped data are built as follows: first, a 3D convolutional neural network model based on the image frame dataset and a 3D convolutional neural network model based on the human detection module dataset are built respectively; then, with 16 consecutive frames as the input of each model, 5 3D convolution layers, 5 3D max-pooling layers, 1 feature fusion layer, and 3 fully connected layers are adopted. To prevent overfitting during model training, L2 regularization is applied to the 5 convolution layers, and dropout (0.5) is added to the fully connected layers.
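The layer stack just described can be traced shape by shape. The patent fixes 16 input frames and 5 conv/pool stages; the 112x112 spatial input, 'same'-padded 3x3x3 kernels, and C3D-style pool strides (1x2x2 for the first pool, 2x2x2 afterwards) are illustrative assumptions, not claimed by the patent.

```python
def c3d_like_shapes(frames: int = 16, height: int = 112, width: int = 112):
    """Trace (depth, h, w, channels) through 5 conv + 5 max-pool stages.

    Kernel counts 64..512 follow the embodiment; spatial size and pool
    strides are assumed for the sake of a concrete walk-through.
    """
    channels = [64, 128, 256, 512, 512]
    d, h, w = frames, height, width
    shapes = []
    for i, c in enumerate(channels):
        # 'same'-padded 3x3x3 convolution keeps d, h, w; only channels change
        pd, ph, pw = (1, 2, 2) if i == 0 else (2, 2, 2)  # pooling strides
        d, h, w = max(d // pd, 1), h // ph, w // pw
        shapes.append((d, h, w, c))
    return shapes

for s in c3d_like_shapes():
    print(s)
```

Under these assumptions the 16-frame clip shrinks to a single 3x3x512 temporal slice before flattening, which is what the fusion and fully connected layers would consume.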
In step 4), a feature fusion model is built and the features are fused, comprising the following steps:
4.1) obtaining the 3D convolution features extracted by the 3D convolutional neural network model based on the image frame dataset and by the 3D convolutional neural network model based on the human detection module dataset respectively, and applying a Flatten() operation to the obtained features as the input of the fusion layer;
4.2) completing the fusion of the intermediate features as the input of the fully connected layers.
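Steps 4.1) and 4.2) amount to flattening each branch's feature map and concatenating the two vectors. A minimal sketch, with small nested lists standing in for real feature tensors:

```python
def flatten(feature):
    """Recursively flatten a nested feature map into a 1-D list (the Flatten() of step 4.1)."""
    if isinstance(feature, list):
        return [v for item in feature for v in flatten(item)]
    return [feature]

def fuse(global_feat, subject_feat):
    """Concatenate the flattened global-branch and subject-branch features,
    forming the fusion-layer output fed to the fully connected layers."""
    return flatten(global_feat) + flatten(subject_feat)

fused = fuse([[1, 2], [3, 4]], [[5], [6]])
print(fused)
```

In a framework implementation this would be a `Flatten` on each branch followed by a concatenation layer; the toy values above are placeholders.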
In step 5), classification with the Softmax classifier comprises the following steps:
5.1) after the feature fusion of step 4) is completed, the result passes through three fully connected layers as the input of the Softmax classifier, and is then classified;
5.2) setting an early-warning threshold: after the recognition confidence of a certain behavior reaches its corresponding threshold, the system issues an early warning.
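The Softmax classification plus per-behavior warning threshold of steps 5.1)-5.2) can be sketched in a few lines. The behavior labels and threshold values here are invented for illustration; the patent does not specify them.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def classify_with_warning(logits, labels, warn_thresholds):
    """Pick the top behavior class; fire an early warning when its softmax
    confidence reaches that behavior's threshold (per step 5.2).
    Behaviors without a threshold never trigger a warning."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = labels[best]
    warned = probs[best] >= warn_thresholds.get(label, 1.1)  # 1.1 = unreachable
    return label, probs[best], warned
```

For example, with hypothetical labels `["walk", "fall", "run"]` and a 0.6 threshold on "fall", a confident "fall" prediction would trigger the warning.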
In step 6), the trained model is fine-tuned according to the actual application scenario or public datasets to enhance the generalization and transfer ability of the model, comprising the following steps:
6.1) migrating the model to the specific application scenario and freezing the convolution and pooling layer parameters of the model;
6.2) changing the input and output layers of the model;
6.3) loading the dataset of the new scenario and retraining the parameters of the fully connected layers.
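The freeze-and-retrain policy of steps 6.1)-6.3) can be expressed as a simple plan over named layers: conv and pool layers are frozen, everything else is retrained. The layer names are illustrative; in a real framework this would set each layer's trainable flag.

```python
def fine_tune_plan(layer_names):
    """Mark which layers are retrained when migrating to a new scenario:
    convolution and pooling layers are frozen (step 6.1), the fully
    connected head is retrained on the new-scenario data (step 6.3)."""
    plan = {}
    for name in layer_names:
        frozen = name.startswith(("conv", "pool"))
        plan[name] = "frozen" if frozen else "retrain"
    return plan

layers = ["conv1", "pool1", "conv2", "pool2", "fc1", "fc2", "softmax_out"]
print(fine_tune_plan(layers))
```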
A human behavior recognition system based on multi-target detection 3D CNN, comprising:
a data collection module, for collecting the original video data for human behavior analysis, including public behavior datasets and video datasets from actual scenarios;
a data preprocessing module, for preprocessing the original video data: classification calibration, target detection, cropping, and video frame conversion;
a feature extraction module, for feeding the preprocessed data into the built 3D CNN network models and extracting the behavior feature information of the video stream and the behavior-subject feature information of the calibrated crops respectively;
a feature fusion module, for fusing the feature information obtained by the feature extraction module;
a model training module, for building a learning model on the preprocessed training set to obtain the trained multi-target detection 3D CNN human behavior recognition model;
a human behavior recognition module, for classifying and recognizing human behaviors and actions using the multi-target detection 3D CNN human behavior recognition model.
Further, the data collection module collects video data in actual scenarios through monocular and binocular cameras, and downloads public human behavior datasets; the data preprocessing module processes the video data with the "FFmpeg" tool, converting it into image frame sets, and at the same time calibrates and crops the videos with the SSD detection algorithm to generate the multi-target frame dataset; the feature extraction module uses the 3D CNN models, with 16 consecutive frames as the model input, adopting 5 3D convolution layers and 5 3D max-pooling layers; the feature fusion module uses a 1-layer 3D feature fusion structure to fuse the two kinds of behavior feature information, with 3 fully connected layers further extracting and classifying the features; the model training module combines the public human behavior datasets "UCF-101" and "HMDB51" with the self-collected real dataset to compose the training dataset; the human behavior recognition module performs classification and recognition with the Softmax classifier.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The video data is converted into image frame sets, and the SSD (Single Shot MultiBox Detector) detection algorithm is used to calibrate and crop the persons in the video stream. Not only can global behavior feature information be extracted from the video, but local features of the behavior subject can also be extracted, compensating for the weakening of global features and strengthening the learning ability of the model.
2. Using 3D CNN models to extract features from the two preprocessed datasets compensates for the shortcoming that traditional 2D CNNs can only extract video features spatially. No separate extraction or fusion of the temporal features of the behavior is needed; it suffices to input the image frame data in batches, and the model automatically extracts behavior features from both the time and space dimensions, greatly reducing the difficulty of feature extraction in the time dimension.
3. The behavior features learned by the model can not only be used for classification and recognition, but can also serve for early-warning reporting: the model prejudges and reports special behaviors according to the set warning thresholds, which increases the practical application scenarios of the model.
Brief description of the drawings
Fig. 1 is a flowchart of the method of the present invention.
Fig. 2 is a structural schematic diagram of the 3D convolution operation in the present invention.
Fig. 3 is a structural design drawing of the 3D convolutional neural network model in the present invention.
Fig. 4 is a structural schematic diagram of the model based on multi-target detection 3D CNN.
Detailed description of the embodiments
The present invention is further described below with reference to specific embodiments.
As shown in Fig. 1, the human behavior recognition method based on multi-target detection 3D CNN provided by this embodiment comprises the following steps:
1) establishing a human behavior recognition data collection system and obtaining the human behavior video datasets: public datasets are mainly used for model training, and the test dataset is collected by cameras in real environments;
2) converting the collected video datasets into a frame dataset and a dataset calibrated and cropped with the SSD (Single Shot MultiBox Detector) detection algorithm, respectively;
3) building the 3D CNN learning models, learning on the two datasets respectively, and fusing the separately learned features;
4) classifying and recognizing the fused features with the Softmax classifier;
5) classifying and calibrating the recognized behaviors, or issuing early-warning reports;
6) fine-tuning the model according to the specific application scenario to enhance its generalization and transfer ability.
In step 2), the video datasets collected in step 1) are preprocessed. Since the model performs multi-target fusion recognition, this is divided into the following two independent processes:
2.1) directly cutting the video datasets into frames to establish the first frame dataset, comprising the following steps:
2.1.1) archiving the video datasets: video data of the same action behavior are filed into the same folder, and the folder is named with its behavior label;
2.1.2) preprocessing the video datasets: all videos are converted into their corresponding image frame sets by a video conversion shell script;
2.1.3) splitting the image frame sets obtained in 2.1.2) using cross-validation, for model training.
2.2) detecting the subject of the action behavior with the SSD (Single Shot MultiBox Detector) algorithm, extracting targeted motion features, and establishing the second frame dataset, comprising the following steps:
2.2.1) loading the trained SSD detection model;
2.2.2) reading the video stream data, feeding it into the SSD detection model, and performing calibration detection on each frame of the video;
2.2.3) setting the crop size of the calibrated data to half the size of each frame in the frame set of 2.1.3), and converting and saving all videos as calibrated image frame sets.
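The cross-validation split of step 2.1.3) can be sketched with a simple k-fold partition of the frame-clip list. The fold count and seed are illustrative assumptions; the patent does not fix them.

```python
import random

def kfold_split(items, k: int = 5, seed: int = 0):
    """Shuffle the clip list deterministically and partition it into k folds;
    each fold in turn serves as validation while the rest trains the model."""
    items = list(items)
    random.Random(seed).shuffle(items)
    return [items[i::k] for i in range(k)]

folds = kfold_split(range(10), k=5)
print(folds)
```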
As shown in Fig. 2, this is a structural schematic of the convolution operation performed by the 3D CNN model designed in the present invention to extract behavior features. A 3D CNN can extract behavior feature information from the two dimensions of space and time. As can be seen from Fig. 2, the time dimension of the convolution operation is N, i.e., the convolution is performed over N consecutive frames. The 3D convolution in the figure stacks N consecutive image frames into a cube, and the 3D convolution kernel is then applied within the cube. In this structure, each feature map in the convolution layer is connected to multiple adjacent consecutive frames in the previous layer, thereby capturing motion information.
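The Fig. 2 operation can be made concrete with a minimal single-channel 3D convolution ('valid' padding, stride 1), written in pure Python for clarity. It shows how one kernel spans several consecutive frames, so each output value mixes information across time as well as space.

```python
def conv3d_valid(volume, kernel):
    """Convolve a stack of N frames (depth x height x width nested lists)
    with a small 3D kernel; each output value sums over a spatio-temporal
    window, which is how motion information is captured."""
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    d, h, w = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for z in range(D - d + 1):          # slide along the time dimension
        plane = []
        for y in range(H - h + 1):
            row = []
            for x in range(W - w + 1):
                row.append(sum(
                    volume[z + i][y + j][x + k] * kernel[i][j][k]
                    for i in range(d) for j in range(h) for k in range(w)))
            plane.append(row)
        out.append(plane)
    return out
```

A kernel whose depth is 2 sums each pixel with the same pixel one frame later, so a static scene and a moving one produce different responses even when single frames look alike.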
As shown in Fig. 3, in step 3), the 3D CNN models are established and feature learning is performed, comprising the following steps:
3.1) building the 3D convolutional neural network model based on the image frame dataset and the 3D convolutional neural network model based on the human detection module dataset, respectively. With 16 consecutive frames as the model input, 5 3D convolution layers (with 64, 128, 256, 512, and 512 convolution kernels in turn), 5 3D max-pooling layers, and 1 fully connected layer (of 2048 units) are adopted, and the obtained features serve as the input of the model fusion layer, as shown in Fig. 4, comprising the following steps:
3.1.1) obtaining the 3D convolution features extracted by the two models respectively, and applying a Flatten() operation to the obtained features as the input of the fusion layer;
3.1.2) completing the fusion of the intermediate features as the input of the fully connected layers.
3.2) To prevent overfitting during model training, L2 regularization is applied to the 5 convolution layers, and dropout (0.5) is added to the fully connected layers.
As shown in Fig. 4, in step 4), the fused features of step 3.1) are classified and recognized with the Softmax classifier, comprising the following steps:
4.1) after the feature fusion is completed, the result passes through three fully connected layers as the input of the Softmax classifier, and is then classified;
4.2) setting an early-warning threshold: after the recognition confidence of a certain behavior reaches its corresponding threshold, the system issues an early warning.
In step 6), the model is fine-tuned according to the specific application scenario to enhance its generalization and transfer ability, comprising the following steps:
6.1) migrating the model to the specific application scenario and freezing the convolution and pooling layer parameters of the model;
6.2) changing the input and output layers of the model;
6.3) loading the dataset of the new scenario and retraining the parameters of the fully connected layers.
The following is the human behavior recognition system based on multi-target detection 3D CNN provided by this embodiment, comprising:
a data collection module: for collecting the original video data for human behavior analysis, including public behavior datasets and video datasets from actual scenarios. In this embodiment, monocular and binocular cameras are used to collect video data in actual scenarios, and public human behavior datasets are downloaded, together forming the total collected dataset.
a data preprocessing module: for preprocessing the original video data: classification calibration, target detection, cropping, and video frame conversion. In this embodiment, the video data is processed with the "FFmpeg" tool and converted into image frame sets, and at the same time the videos are calibrated and cropped with the SSD (Single Shot MultiBox Detector) detection algorithm to generate the multi-target frame dataset.
a feature extraction module: for feeding the preprocessed data into the built 3D CNN network models and extracting the behavior feature information of the video stream and the behavior-subject feature information of the calibrated crops, respectively. In this embodiment, 3D CNN models are used: with 16 consecutive frames as the model input, 5 3D convolution layers and 5 3D max-pooling layers extract the two kinds of feature information as the input of the feature fusion module.
a feature fusion module: for fusing the feature information obtained by the feature extraction module. In this embodiment, a 1-layer 3D feature fusion structure fuses the two kinds of behavior feature information, and 3 fully connected layers further extract and classify the features.
a model training module: for building a learning model on the preprocessed training set to obtain the trained multi-target detection 3D CNN human behavior recognition model. In this embodiment, the public human behavior datasets "UCF-101" and "HMDB51" and the self-collected real dataset are combined to compose the training dataset.
a human behavior recognition module: for classifying and recognizing human behaviors and actions using the multi-target detection 3D CNN human behavior recognition model. In this embodiment, classification and recognition is performed with the Softmax classifier.
In the above embodiments, the included modules are divided according to the functional logic of the present invention, but the division is not limited to the above, as long as the corresponding functions can be realized; the division is not intended to restrict the protection scope of the present invention.
In conclusion, the human behavior recognition method and system based on multi-target detection 3D CNN provided by the present invention not only compensate for the deficiency of 2D neural networks in extracting features in the time dimension, but also adopt a multi-target detection approach: the SSD (Single Shot MultiBox Detector) target detection algorithm is introduced to calibrate the behavior subjects in the video stream, so as to obtain more detailed local features that are fused into the model, compensating for the weakening of the model's global features. At the same time, the behavior features learned by the model can not only be used for classification and recognition, but can also serve for early-warning reporting: the model prejudges and reports special behaviors according to the set warning thresholds, increasing the practical application scenarios of the model. The model of the present invention can also be migrated and applied on Internet-of-Things platforms such as smart homes, intelligent surveillance, and intelligent anti-theft, and has extensive research and use value as well as value for popularization.
The embodiments described above are only preferred embodiments of the present invention and are not intended to limit the scope of the present invention. Therefore, all changes made according to the shapes and principles of the present invention should be included within the protection scope of the present invention.
Claims (7)
1. A human behavior recognition method based on multi-target detection 3D CNN, characterized by comprising the following steps:
1) preprocessing the video to convert the video stream into image frames;
2) calibrating and cropping the target objects in the video using the SSD detection technique;
3) building the feature extraction network structures for the image frame data and the calibrated cropped data;
4) building a feature fusion model to fuse the two kinds of features extracted in step 3);
5) classifying with a Softmax regression classifier;
6) fine-tuning the trained model according to the actual application scenario or public datasets, to enhance the generalization and transfer ability of the model.
2. The human behavior recognition method based on multi-target detection 3D CNN according to claim 1, characterized in that in step 1), the video is preprocessed and the video stream is converted into image frames, comprising the following steps:
1.1) obtaining the video datasets: public datasets are mainly used for model training, while the test dataset is collected by cameras in real environments;
1.2) archiving the video datasets: video data of the same action behavior are filed into the same folder, and the folder is named with its behavior label;
1.3) preprocessing the video datasets: all videos are converted into their corresponding image frame sets by a video conversion shell script;
1.4) splitting the image frame sets obtained in step 1.3) using cross-validation, for model training;
in step 2), the target objects in the video are calibrated and cropped using the SSD detection technique, comprising the following steps:
2.1) loading the trained SSD detection model;
2.2) reading the video stream data, feeding it into the SSD detection model, and performing calibration detection on each frame of the video;
2.3) setting the crop size of the calibrated data to half the size of each frame in the frame set of step 1.3), and converting and saving all videos as calibrated image frame sets.
3. The human behavior recognition method based on multi-target detection 3D CNN according to claim 1, wherein in step 3) the feature extraction network structures for the image frame data and the calibrated cropped data are established as follows:
firstly, a 3D convolutional neural network model based on the image frame dataset and a 3D convolutional neural network model based on the human-detection module dataset are built respectively; then, taking 16 consecutive frames as the input of each model, 5 layers of 3D convolution, 5 layers of 3D max pooling, 1 feature fusion layer and 3 fully connected layers are applied; to prevent overfitting during training, L2 regularization is applied to the 5 convolutional layers and dropout (0.5) is added to the fully connected layers;
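The layer schedule above can be traced with a small shape calculator. This is a sketch: the 112x112 spatial input size, 'same'-padded 3x3x3 kernels, and the C3D-style pooling schedule (1x2x2 in the first stage, 2x2x2 afterwards) are assumptions the claim does not fix.

```python
def conv3d_shape(shape):
    """A 'same'-padded 3x3x3 convolution keeps the (frames, H, W) extent."""
    return shape

def pool3d_shape(shape, pool):
    """3D max pooling floors each of (frames, H, W) by its pool size."""
    return tuple(s // p for s, p in zip(shape, pool))

# 16 consecutive frames as claimed; spatial size and pooling schedule assumed.
shape = (16, 112, 112)
for pool in [(1, 2, 2), (2, 2, 2), (2, 2, 2), (2, 2, 2), (2, 2, 2)]:
    shape = pool3d_shape(conv3d_shape(shape), pool)  # 5 x (3D conv + 3D max pool)
print(shape)  # extent of the volume entering the fusion layer
```

Under these assumptions the five pooling stages collapse the 16-frame temporal axis to 1, which is why a single fusion layer and three fully connected layers suffice after the convolutional stack.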
In step 4), the feature fusion model is established and the features are fused, comprising the following steps:
4.1) obtaining the 3D convolutional features extracted respectively by the 3D convolutional neural network model based on the image frame dataset and by the 3D convolutional neural network model based on the human-detection module dataset, and applying a Flatten() operation to the obtained features as the input of the fusion layer;
4.2) completing the fusion of the intermediate features as the input of the fully connected layers.
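Steps 4.1) and 4.2) can be sketched in NumPy. This is an illustration: the per-branch feature shape (including the 512 channels) is an assumption, and concatenation is used here as one common fusion choice, which the claim does not specify.

```python
import numpy as np

# Features from the two 3D CNN branches: the full-frame stream and the
# SSD-cropped human stream. Shapes are illustrative assumptions.
stream_feat = np.random.rand(1, 3, 3, 512)  # full-frame branch
human_feat = np.random.rand(1, 3, 3, 512)   # calibrated-crop branch

# Step 4.1: Flatten() each branch; step 4.2: fuse the intermediate
# features (concatenation shown) as the fully-connected-layer input.
fused = np.concatenate([stream_feat.reshape(-1), human_feat.reshape(-1)])
print(fused.shape)
```

Fusing after flattening means the fully connected layers see a single vector carrying both whole-scene motion context and person-centered detail, which is the stated point of the two-branch design.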
4. The human behavior recognition method based on multi-target detection 3D CNN according to claim 1, wherein in step 5) the classification using the Softmax classifier comprises the following steps:
5.1) after the feature fusion of step 4) is completed, passing the fused features through three fully connected layers as the input of the Softmax classifier, and then performing classification;
5.2) setting alert thresholds, wherein after the recognition rate of a certain behavior action reaches its corresponding threshold, the system issues an early warning.
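The Softmax classification and per-behavior alert threshold of steps 5.1)-5.2) can be sketched as follows; the behavior classes, the threshold values, and the example logits are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the final fully-connected output."""
    z = np.exp(logits - logits.max())
    return z / z.sum()

# Per-behavior alert thresholds (step 5.2); names and values are assumed.
thresholds = {"walk": 0.9, "fall": 0.6, "fight": 0.6}
classes = list(thresholds)

logits = np.array([0.2, 2.5, 0.1])  # output of the third FC layer (assumed)
probs = softmax(logits)
label = classes[int(probs.argmax())]
if probs.max() >= thresholds[label]:
    print(f"ALERT: {label} ({probs.max():.2f})")
```

Keeping one threshold per class lets benign behaviors (like walking) demand higher confidence before triggering an alert than safety-critical ones (like falling).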
5. The human behavior recognition method based on multi-target detection 3D CNN according to claim 1, wherein in step 6) the trained model is fine-tuned according to the actual application scenario or a public dataset to enhance the generalization and transfer ability of the model, comprising the following steps:
6.1) migrating the model into the specific application scenario, and freezing the convolution and pooling layer parameters of the model;
6.2) changing the input and output layers of the model;
6.3) loading the dataset of the new scene, and retraining the parameters of the fully connected layers.
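Steps 6.1)-6.3) amount to a standard transfer-learning recipe: freeze the feature extractor, retrain the head. A framework-agnostic sketch (the layer names mirror the claimed architecture but are illustrative):

```python
# Layer list matching the claimed stack: 5 conv3d, 5 maxpool3d,
# 1 fusion layer, 3 fully connected layers. Names are assumptions.
layers = (["conv3d_%d" % i for i in range(1, 6)]
          + ["maxpool3d_%d" % i for i in range(1, 6)]
          + ["fusion", "fc1", "fc2", "fc3"])

def fine_tune_plan(layers):
    """Return {layer_name: trainable?} for fine-tuning in a new scene.

    Step 6.1 freezes convolution and pooling parameters; step 6.3
    retrains the fully connected layers (the fusion layer itself
    carries no weights, so marking it trainable is harmless).
    """
    frozen_prefixes = ("conv3d", "maxpool3d")
    return {name: not name.startswith(frozen_prefixes) for name in layers}

plan = fine_tune_plan(layers)
trainable = [name for name, t in plan.items() if t]
print(trainable)
```

Step 6.2's swap of input and output layers corresponds to re-shaping the first and last layers for the new scene's input resolution and class count before this retraining pass.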
6. A human behavior recognition system based on multi-target detection 3D CNN, comprising:
a data collection module, for collecting raw video data for human behavior analysis, including public behavior datasets and video datasets from actual scenes;
a data preprocessing module, for preprocessing the raw video data, including classification calibration, target detection, cropping and video frame conversion;
a feature extraction module, for feeding the preprocessed data into the constructed 3D CNN network models respectively, to extract the behavior feature information of the video stream and the behavior-subject feature information of the calibrated crops;
a feature fusion module, for fusing the feature information obtained by the feature extraction module;
a model training module, for building a learning model on the preprocessed training set to obtain the trained multi-target detection 3D CNN human behavior recognition model; and
a human behavior recognition module, for classifying and recognizing human behavior actions using the multi-target detection 3D CNN human behavior recognition model.
7. The human behavior recognition system based on multi-target detection 3D CNN according to claim 6, wherein: the data collection module collects video data in actual scenes through a monocular camera and a binocular camera, and downloads public human behavior datasets; the data preprocessing module processes the video data with the "FFmpeg" tool to convert it into image frame sets, and meanwhile calibrates and crops the videos with the SSD detection algorithm to generate the multi-target frame dataset; the feature extraction module uses a 3D CNN model, taking 16 consecutive frames as the model input, with 5 layers of 3D convolution and 5 layers of 3D max pooling; the feature fusion module uses a 1-layer 3D feature fusion structure to fuse the two kinds of behavior feature information, and 3 fully connected layers further extract and classify the features; the model training module combines the public human behavior datasets "UCF-101" and "HMDB51" with a self-collected real dataset to compose the training dataset; and the human behavior recognition module performs classification and recognition using a Softmax classifier.
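The module decomposition of claims 6 and 7 can be wired together as a pipeline. Every module below is reduced to a stub so only the claimed data flow is illustrated; all names and values are assumptions.

```python
def data_collection():
    # monocular/binocular camera footage plus downloaded public datasets
    return ["scene_cam0.avi"]

def data_preprocessing(videos):
    # FFmpeg frame sets plus SSD-calibrated crops for each video
    return [{"frames": v + ":frames", "crops": v + ":crops"} for v in videos]

def feature_extraction(samples):
    # two 3D CNN branches: full-frame stream and cropped human stream
    return [(s["frames"] + ">feat", s["crops"] + ">feat") for s in samples]

def feature_fusion(features):
    # fuse the two branches' feature information
    return [a + "+" + b for a, b in features]

def behavior_recognition(fused):
    # Softmax classification over the fused features
    return ["walking" for _ in fused]

labels = behavior_recognition(feature_fusion(
    feature_extraction(data_preprocessing(data_collection()))))
print(labels)
```

Each module consumes exactly the previous module's output, matching the system claim's strictly sequential structure: collection, preprocessing, extraction, fusion, training/recognition.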
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910136442.1A CN109977773B (en) | 2019-02-18 | 2019-02-18 | Human behavior identification method and system based on multi-target detection 3D CNN |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109977773A true CN109977773A (en) | 2019-07-05 |
CN109977773B CN109977773B (en) | 2021-01-19 |
Family
ID=67077264
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910136442.1A Expired - Fee Related CN109977773B (en) | 2019-02-18 | 2019-02-18 | Human behavior identification method and system based on multi-target detection 3D CNN |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109977773B (en) |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110348420A (en) * | 2019-07-18 | 2019-10-18 | 腾讯科技(深圳)有限公司 | Sign language recognition method and apparatus, computer-readable storage medium, and computer device |
CN110414415A (en) * | 2019-07-24 | 2019-11-05 | 北京理工大学 | Human behavior recognition method for classroom scenes |
CN110414421A (en) * | 2019-07-25 | 2019-11-05 | 电子科技大学 | Behavior recognition method based on sequential frame images |
CN110532909A (en) * | 2019-08-16 | 2019-12-03 | 成都电科慧安科技有限公司 | Human behavior recognition method based on three-dimensional UWB positioning |
CN111259838A (en) * | 2020-01-20 | 2020-06-09 | 山东大学 | Method and system for deeply understanding human body behaviors in service robot service environment |
CN111382677A (en) * | 2020-02-25 | 2020-07-07 | 华南理工大学 | Human behavior recognition method and system based on a 3D attention residual model |
CN112016461A (en) * | 2020-08-28 | 2020-12-01 | 深圳市信义科技有限公司 | Multi-target behavior recognition method and system |
CN112232190A (en) * | 2020-10-15 | 2021-01-15 | 南京邮电大学 | Method for detecting abnormal behaviors of elderly people in home scenes |
CN112613428A (en) * | 2020-12-28 | 2021-04-06 | 杭州电子科技大学 | ResNet-3D convolution cattle video target detection method based on balanced loss |
CN112766151A (en) * | 2021-01-19 | 2021-05-07 | 北京深睿博联科技有限责任公司 | Binocular target detection method and system for blind-guiding glasses |
CN113052059A (en) * | 2021-03-22 | 2021-06-29 | 中国石油大学(华东) | Real-time action recognition method based on spatio-temporal feature fusion |
CN113221658A (en) * | 2021-04-13 | 2021-08-06 | 卓尔智联(武汉)研究院有限公司 | Training method and device of image processing model, electronic device and storage medium |
CN113420703A (en) * | 2021-07-03 | 2021-09-21 | 西北工业大学 | Dynamic facial expression recognition method based on multi-scale feature extraction and multi-attention mechanism modeling |
CN113515986A (en) * | 2020-07-02 | 2021-10-19 | 阿里巴巴集团控股有限公司 | Video processing method, data processing method and device |
CN113536847A (en) * | 2020-04-17 | 2021-10-22 | 天津职业技术师范大学(中国职业培训指导教师进修中心) | Industrial scene video analysis system and method based on deep learning |
CN115601714A (en) * | 2022-12-16 | 2023-01-13 | 广东汇通信息科技股份有限公司 | Campus violent behavior identification method based on multi-modal data analysis |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104899561A (en) * | 2015-05-27 | 2015-09-09 | 华南理工大学 | Parallelized human behavior recognition method |
US20180032846A1 (en) * | 2016-08-01 | 2018-02-01 | Nvidia Corporation | Fusing multilayer and multimodal deep neural networks for video classification |
CN108108652A (en) * | 2017-03-29 | 2018-06-01 | 广东工业大学 | Cross-view human behavior recognition method and device based on dictionary learning |
CN108647591A (en) * | 2018-04-25 | 2018-10-12 | 长沙学院 | Video behavior recognition method and system based on visual-semantic features |
CN108985173A (en) * | 2018-06-19 | 2018-12-11 | 奕通信息科技(上海)股份有限公司 | Deep network transfer learning method for apparent-age databases with noisy labels |
CN109002808A (en) * | 2018-07-27 | 2018-12-14 | 高新兴科技集团股份有限公司 | Human behavior recognition method and system |
Also Published As
Publication number | Publication date |
---|---|
CN109977773B (en) | 2021-01-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109977773A (en) | Human behavior recognition method and system based on multi-target detection 3D CNN | |
Mees et al. | Choosing smartly: Adaptive multimodal fusion for object detection in changing environments |
Yin et al. | Recurrent convolutional network for video-based smoke detection |
US10083378B2 (en) | Automatic detection of objects in video images |
CN109598268B (en) | RGB-D salient object detection method based on a single-stream deep network |
CN103871079B (en) | Vehicle tracking method based on machine learning and optical flow |
CN107247956B (en) | Rapid target detection method based on grid judgment |
CN109543697A (en) | RGBD image steganalysis method based on deep learning |
CN111382677B (en) | Human behavior recognition method and system based on a 3D attention residual model |
CN113255443B (en) | Graph attention network temporal action localization method based on a pyramid structure |
CN110929593A (en) | Real-time salient pedestrian detection method based on detail discrimination |
CN104504395A (en) | Method and system for achieving classification of pedestrians and vehicles based on neural network |
Chen et al. | An improved Yolov3 based on dual path network for cherry tomatoes detection |
CN108345894A (en) | Traffic incident detection method based on deep learning and entropy model |
Lu et al. | Multi-object detection method based on YOLO and ResNet hybrid networks |
CN110852190A (en) | Driving behavior recognition method and system integrating target detection and gesture recognition |
CN111723600B (en) | Pedestrian re-recognition feature descriptor based on multi-task learning |
CN110619268A (en) | Pedestrian re-identification method and device based on space-time analysis and depth features |
CN110688938A (en) | Pedestrian re-identification method integrated with attention mechanism |
Yang et al. | Counting crowds using a scale-distribution-aware network and adaptive human-shaped kernel |
CN103500456B (en) | Object tracking method and device based on a dynamic Bayesian network |
CN105469050A (en) | Video behavior recognition method based on local spatio-temporal feature description and pyramid vocabulary tree |
Tsutsui et al. | Distantly supervised road segmentation |
Mathur et al. | A brief survey of deep learning techniques for person re-identification |
CN113205060A (en) | Human action detection method using a recurrent neural network to judge from skeleton morphology |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20210119 |