CN109977773A - Human behavior recognition method and system based on multi-target detection 3D CNN - Google Patents

Human behavior recognition method and system based on multi-target detection 3D CNN

Info

Publication number
CN109977773A
CN109977773A (application number CN201910136442.1A; granted publication CN109977773B)
Authority
CN
China
Prior art keywords
model
data
video
cnn
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910136442.1A
Other languages
Chinese (zh)
Other versions
CN109977773B (en)
Inventor
董敏
李永发
毕盛
聂宏蓄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201910136442.1A priority Critical patent/CN109977773B/en
Publication of CN109977773A publication Critical patent/CN109977773A/en
Application granted granted Critical
Publication of CN109977773B publication Critical patent/CN109977773B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a human behavior recognition method and system based on multi-target detection 3D CNN. The method comprises: 1) pre-processing the video and converting the video stream into image frames; 2) calibrating and cropping the target objects in the video using the relatively mature SSD detection technique; 3) establishing feature extraction network structures for the image frame data and the calibrated, cropped data; 4) establishing a feature fusion model to fuse the two kinds of features extracted in step 3); 5) classifying with a Softmax regression classifier; 6) fine-tuning the trained model according to the actual application scenario or public datasets. The present invention compensates for the information loss caused by convolution over the time dimension in current deep neural network models, strengthens the expression of features in the time dimension, improves the overall recognition efficiency of the model, and enables the model to better understand human behavior actions.

Description

Human behavior recognition method and system based on multi-target detection 3D CNN
Technical field
The present invention relates to the technical field of human behavior recognition and analysis, and in particular to a human behavior recognition method and system based on multi-target detection 3D CNN.
Background Art
Human behavior recognition refers to identifying the behavior or actions of humans in real environments, and can be applied in many fields. Common application scenarios currently include intelligent surveillance, smart homes, human-computer interaction, and human behavior attribute analysis and prediction. However, improving the accuracy and efficiency of recognition remains a very challenging task and has received extensive attention from researchers.
Over the past few decades, the extraction and representation of human behavior features has mainly remained at the hand-crafted stage, where feature design and extraction often depend on the experience of the designer. Common hand-crafted feature extraction methods include space-time interest points (STIP), bag of visual words (BOVW), histograms of oriented gradients (HOG), motion history images (MHI), and motion energy images (MEI). Hand-crafted features are usually designed for one specific portion of the data, which results in poor generalization ability of the model; such features cannot be quickly transferred to other applications, greatly increasing labor cost. Traditional methods can be said to have entered a bottleneck period.
The application of deep learning to human behavior recognition can be described as a significant remedy for the shortcomings of traditional recognition approaches. This is mainly reflected in the following aspects: (1) it avoids the trouble of hand-crafted feature extraction and simplifies the feature extraction process; (2) since deep neural networks have a certain feedback-adjustment effect, the generalization ability of the model is greatly strengthened; (3) complex features can be automatically reduced in dimensionality; (4) when handling big data, the computational overhead can be greatly reduced and the overall execution efficiency improved; (5) for recognition and classification of unlabeled data, the performance is better; (6) modality-based behavior recognition is relatively simple to implement: one only needs to design a corresponding deep learning model for each modality to extract features and then fuse the features of two or more network models, which greatly improves recognition accuracy.
One of the biggest differences between human behavior recognition analysis and image classification or detection is whether information in the time dimension is involved. Therefore, human behavior recognition analysis must not only extract behavior features from the spatial dimension, but also mine continuity information from the time dimension of the behavior. Only in this way can a continuous behavior action be correctly described.
Summary of the invention
The objective of the present invention is to overcome the deficiency of current deep neural network models in capturing time-dimension information for human behavior recognition, and to propose a human behavior recognition method and system based on multi-target detection 3D CNN that compensates for the information loss caused by convolution over the time dimension, strengthens the expression of features in the time dimension, improves the overall recognition efficiency of the model, and enables the model to better understand human behavior actions.
To achieve the above objective, the present invention provides the following technical solution:
A human behavior recognition method based on multi-target detection 3D CNN, comprising the following steps:
1) pre-processing the video and converting the video stream into image frames;
2) performing calibration and cropping of the target objects in the video using the SSD (Single Shot MultiBox Detector) detection technique;
3) establishing feature extraction network structures for the image frame data and the calibrated, cropped data;
4) establishing a feature fusion model to fuse the two kinds of features extracted in step 3);
5) classifying with a Softmax regression classifier;
6) fine-tuning the trained model according to the actual application scenario or public datasets to enhance the generalization and transferability of the model.
In step 1), the video is pre-processed and the video stream is converted into image frames, comprising the following steps:
1.1) obtaining the video dataset: public datasets are mainly used for model training, while the test data are collected by cameras in real environments;
1.2) archiving the video dataset: video data of the same action are filed into the same folder, and the folder is named with its behavior label;
1.3) pre-processing the video dataset: all videos are converted into the corresponding image frame sets by a video conversion shell script;
1.4) splitting the image frame sets obtained in step 1.3) by cross-validation, for model training.
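The disclosure contains no source code; as a hedged illustration of steps 1.2)-1.4), the following Python sketch (assuming FFmpeg is installed and using hypothetical directory names) converts videos archived under behavior-label folders into image frame sets. The cross-validation split of step 1.4) could then be applied to the resulting folders, for example with scikit-learn's KFold.

```python
import subprocess
from pathlib import Path

def videos_to_frames(video_root: str, frame_root: str, fps: int = 25) -> None:
    """Convert every video under video_root/<label>/ into JPEG frames
    under frame_root/<label>/<video_name>/ using FFmpeg."""
    for video in Path(video_root).glob("*/*"):
        label = video.parent.name                       # folder name is the behavior label
        out_dir = Path(frame_root) / label / video.stem
        out_dir.mkdir(parents=True, exist_ok=True)
        subprocess.run(
            ["ffmpeg", "-i", str(video), "-r", str(fps),
             str(out_dir / "frame_%05d.jpg")],
            check=True,
        )

# videos_to_frames("data/videos", "data/frames")        # hypothetical paths
```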
In step 2), calibration and cropping of the target objects in the video are performed using the SSD detection technique, comprising the following steps:
2.1) loading a trained SSD detection model;
2.2) reading the video stream data, feeding it into the SSD detection model, and performing calibration detection on each frame of the video;
2.3) setting the size of the calibrated crop to half of the frame size of the frame dataset obtained in step 1.3), converting all videos accordingly, and saving them as the calibrated image frame sets.
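As a non-authoritative sketch of steps 2.1)-2.3), assuming OpenCV's DNN module and a pretrained Caffe-format SSD whose file names below are placeholders, each frame could be calibrated and cropped to half of the original frame size around the most confident detection:

```python
import cv2
import numpy as np

# Placeholder file names for a pretrained Caffe-format SSD model.
net = cv2.dnn.readNetFromCaffe("ssd_deploy.prototxt", "ssd_weights.caffemodel")

def calibrated_crop(frame: np.ndarray, conf_thresh: float = 0.5) -> np.ndarray:
    """Detect the most confident target box and return a crop that is half
    of the original frame size, centred on the detection."""
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
                                 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    det = net.forward()                                  # shape: [1, 1, N, 7]
    best = max(det[0, 0], key=lambda d: d[2])            # row: [_, class, conf, x1, y1, x2, y2]
    if best[2] < conf_thresh:
        return cv2.resize(frame, (w // 2, h // 2))       # fall back to a plain resize
    cx = int((best[3] + best[5]) / 2 * w)                # detection centre in pixels
    cy = int((best[4] + best[6]) / 2 * h)
    cw, ch = w // 2, h // 2                              # crop size: half of the frame
    x0 = min(max(cx - cw // 2, 0), w - cw)
    y0 = min(max(cy - ch // 2, 0), h - ch)
    return frame[y0:y0 + ch, x0:x0 + cw]
```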
In step 3), feature extraction network structures are established for the image frame data and the calibrated, cropped data, specifically as follows:
First, a 3D convolutional neural network model based on the image frame dataset and a 3D convolutional neural network model based on the calibrated human-detection dataset are built respectively. Then, taking 16 consecutive frames as the input of each model, 5 layers of 3D convolution, 5 layers of 3D max pooling, 1 feature fusion layer and 3 fully connected layers are used. To prevent over-fitting during model training, L2 regularization is applied to the 5 convolutional layers, and dropout (0.5) is added in the fully connected layers.
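The following TensorFlow/Keras sketch, which is not part of the original disclosure, shows one possible realization of a single feature-extraction stream; the filter counts follow the embodiment described below, while the 112x112 input resolution, kernel sizes and pooling layout are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_3d_stream(name: str, frames: int = 16, size: int = 112) -> tf.keras.Model:
    """One 3D CNN feature-extraction stream: 5 Conv3D layers with L2
    regularization, each followed by a 3D max-pooling layer."""
    inputs = tf.keras.Input(shape=(frames, size, size, 3), name=f"{name}_clip")
    x = inputs
    filters = [64, 128, 256, 512, 512]            # filter counts from the embodiment
    pools = [(1, 2, 2)] + [(2, 2, 2)] * 4         # keep the temporal extent in the first block
    for f, p in zip(filters, pools):
        x = layers.Conv3D(f, (3, 3, 3), padding="same", activation="relu",
                          kernel_regularizer=regularizers.l2(1e-4))(x)
        x = layers.MaxPooling3D(pool_size=p, padding="same")(x)
    return tf.keras.Model(inputs, x, name=f"{name}_stream")
```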
In step 4), a feature fusion model is established to fuse the features, comprising the following steps:
4.1) obtaining the 3D convolution features extracted by the 3D convolutional neural network model based on the image frame dataset and by the 3D convolutional neural network model based on the calibrated human-detection dataset respectively, and applying a Flatten() operation to the obtained features, which serve as the input of the fusion layer;
4.2) completing the fusion of the intermediate features, which serve as the input of the fully connected layers.
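Continuing the previous sketch (again only as an illustration under the same assumptions, reusing build_3d_stream from above), the two streams could be flattened, concatenated in a fusion layer and passed through three fully connected layers with dropout(0.5) before the Softmax output; the 2048-unit width comes from the embodiment, the remaining widths are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_fusion_model(num_classes: int) -> tf.keras.Model:
    """Fuse the flattened features of the global-frame stream and the
    SSD-calibrated crop stream, then classify with three dense layers."""
    global_stream = build_3d_stream("global")     # full image frames
    target_stream = build_3d_stream("target")     # calibrated, cropped frames
    fused = layers.Concatenate(name="feature_fusion")(
        [layers.Flatten()(global_stream.output),
         layers.Flatten()(target_stream.output)])
    x = fused
    for units in (2048, 2048, 1024):              # 2048 from the embodiment; the rest assumed
        x = layers.Dense(units, activation="relu")(x)
        x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model([global_stream.input, target_stream.input], outputs)
```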
In step 5), classification is performed with the Softmax classifier, comprising the following steps:
5.1) after the feature fusion in step 4) is completed, the result passes through the three fully connected layers and serves as the input of the Softmax classifier, which then performs classification;
5.2) setting an early-warning threshold: after the recognition score of a certain behavior action reaches its corresponding threshold, the system issues an early warning.
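A minimal sketch of the early-warning logic of step 5.2); the class names and threshold values are purely hypothetical:

```python
from typing import List, Tuple
import numpy as np

WARN_THRESHOLDS = {"fall_down": 0.80, "fight": 0.85}     # hypothetical behaviors and thresholds

def classify_and_warn(probs: np.ndarray, class_names: List[str]) -> Tuple[str, bool]:
    """Return the predicted behavior label and whether its Softmax score
    reaches the early-warning threshold configured for that behavior."""
    idx = int(np.argmax(probs))
    label = class_names[idx]
    warn = float(probs[idx]) >= WARN_THRESHOLDS.get(label, 1.1)   # 1.1: never warn if unset
    return label, warn

# Example (placeholder names): probs = model.predict([global_clip, target_clip])[0]
# label, warn = classify_and_warn(probs, class_names)
```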
In step 6), the trained model is fine-tuned according to the actual application scenario or public datasets to enhance its generalization and transferability, comprising the following steps:
6.1) transferring the model to the specific application scenario and freezing the convolution and pooling layer parameters of the model;
6.2) replacing the input and output layers of the model;
6.3) loading the dataset of the new scenario and retraining the parameters of the fully connected layers.
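A sketch of the fine-tuning of steps 6.1)-6.3), assuming a Keras model like the one sketched earlier; only the output head is replaced here, and the optimizer settings are assumptions:

```python
import tensorflow as tf

def fine_tune(model: tf.keras.Model, new_num_classes: int) -> tf.keras.Model:
    """Freeze convolution/pooling layers, keep dense layers trainable, and
    replace the Softmax output layer for the new application scenario."""
    for layer in model.layers:
        layer.trainable = isinstance(layer, tf.keras.layers.Dense)
    features = model.layers[-2].output                    # features just before the old Softmax head
    new_head = tf.keras.layers.Dense(new_num_classes, activation="softmax")(features)
    tuned = tf.keras.Model(model.input, new_head)
    tuned.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return tuned

# tuned = fine_tune(pretrained_model, new_num_classes=10)  # placeholder class count
# tuned.fit(new_scene_dataset, epochs=5)                   # dataset of the new scenario
```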
A human behavior recognition system based on multi-target detection 3D CNN, comprising:
a data acquisition module for acquiring original video data for human behavior analysis, including public behavior datasets and video datasets from actual scenes;
a data pre-processing module for pre-processing the original video data, including classification and calibration, target detection, cropping, and video-frame conversion;
a feature extraction module for feeding the pre-processed data into the constructed 3D CNN network models to extract the video-stream behavior feature information and the behavior-subject feature information of the calibrated crops, respectively;
a feature fusion module for fusing the feature information obtained by the feature extraction module;
a model training module for building a learning model on the pre-processed training set to obtain the trained multi-target detection 3D CNN human behavior recognition model;
a human behavior recognition module for classifying and recognizing human behavior actions using the multi-target detection 3D CNN human behavior recognition model.
Further, the data acquisition module acquires video data of actual scenes through a monocular camera and a binocular camera, and downloads public human behavior datasets. The data pre-processing module processes the video data with the "FFmpeg" tool and converts it into image frame sets, and at the same time calibrates and crops the videos with the SSD detection algorithm to generate the multi-target frame dataset. The feature extraction module uses a 3D CNN model that takes 16 consecutive frames as input and applies 5 layers of 3D convolution and 5 layers of 3D max pooling. The feature fusion module uses a 1-layer 3D feature fusion structure to fuse the two kinds of behavior feature information, and 3 fully connected layers further extract and classify the features. The model training module combines the public human behavior datasets "UCF-101" and "HMDB51" with a self-collected real dataset to form the training dataset. The human behavior recognition module performs classification and recognition with a Softmax classifier.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The video data are converted into image frame sets, and the persons in the video stream are calibrated and cropped with the SSD (Single Shot MultiBox Detector) detection algorithm. In this way, not only can global behavior feature information be extracted from the video, but local features of the behavior subject can also be extracted, which compensates for the drawback of weakened global features and strengthens the learning ability of the model.
2. Features are extracted from the two kinds of pre-processed datasets with 3D CNN models, which makes up for the shortcoming that a traditional 2D CNN can only extract video features spatially. No separate extraction or fusion of the temporal characteristics of the behavior is needed; the image frame data only need to be fed in batches, and the model automatically extracts behavior features from both the temporal and spatial dimensions, greatly reducing the difficulty of feature extraction in the time dimension.
3. The behavior features learned by the model can be used not only for classification and recognition but also for early-warning reporting: the model pre-judges and reports special behaviors according to the configured warning thresholds, which broadens the practical application scenarios of the model.
Brief Description of the Drawings
Fig. 1 is a flowchart of the method of the present invention.
Fig. 2 is a schematic diagram of the 3D convolution operation in the present invention.
Fig. 3 is a structural design drawing of the 3D convolutional neural network model in the present invention.
Fig. 4 is a schematic diagram of the model structure based on multi-target detection 3D CNN.
Detailed Description of the Embodiments
The present invention is further described below with reference to specific embodiments.
As shown in Fig. 1, the human behavior recognition method based on multi-target detection 3D CNN provided by this embodiment comprises the following steps:
1) establishing a human behavior recognition data collection system and obtaining human behavior video datasets: public datasets are mainly used for model training, and the test dataset is collected by cameras in real environments;
2) converting the acquired video datasets into a frame dataset and, using the SSD (Single Shot MultiBox Detector) detection algorithm, into a calibrated and cropped dataset, respectively;
3) establishing 3D CNN learning models, learning from the two datasets separately, and fusing the features learned by each model;
4) classifying and recognizing the fused features with a Softmax classifier;
5) performing behavior labeling and recognition or early-warning reporting based on the classification results;
6) fine-tuning the model according to the specific application scenario to enhance the transferability and generalization ability of the model.
In step 2), the video datasets acquired in step 1) are pre-processed. Since the model performs fused recognition of multiple targets, the pre-processing is divided into the following two independent procedures:
2.1) directly cutting the video datasets into frames to establish the first frame dataset, comprising the following steps:
2.1.1) archiving the video datasets: video data of the same action are filed into the same folder, and the folder is named with its behavior label;
2.1.2) pre-processing the video datasets: all videos are converted into the corresponding image frame sets by a video conversion shell script;
2.1.3) splitting the image frame sets obtained in 2.1.2) by cross-validation, for model training.
2.2) detecting the subject of the behavior action with the SSD (Single Shot MultiBox Detector) algorithm and extracting targeted motion features to establish the second frame dataset, comprising the following steps:
2.2.1) loading a trained SSD (Single Shot MultiBox Detector) detection model;
2.2.2) reading the video stream data, feeding it into the SSD detection model, and performing calibration detection on each frame of the video;
2.2.3) setting the size of the calibrated crop to half of the frame size of the frame dataset obtained in 2.1.3), converting all videos accordingly, and saving them as the calibrated image frame sets.
Fig. 2 is a schematic diagram of the structure with which the 3D CNN model designed in the present invention performs convolution operations and extracts behavior features. A 3D CNN can extract behavior feature information from both the spatial and the temporal dimension. As can be seen from Fig. 2, the time dimension of the convolution operation is N, i.e., the convolution operation is performed over N consecutive frames. The 3D convolution in the figure stacks N consecutive image frames into a cube, and a 3D convolution kernel is then applied within the cube. In this structure, each feature map in the convolutional layer is connected to multiple adjacent consecutive frames in the previous layer, and thus captures motion information.
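This behavior is described only in prose in the disclosure; as an illustrative (non-original) snippet, a single Conv3D layer applied to a stack of consecutive frames shows how each output feature map spans neighbouring frames (the 16-frame, 112x112 clip size is an assumption):

```python
import tensorflow as tf

# A 3x3x3 kernel spans 3 neighbouring frames, so every output feature map is
# connected to adjacent consecutive frames of the previous layer and thereby
# captures motion information.
clip = tf.random.normal([1, 16, 112, 112, 3])      # (batch, frames, height, width, channels)
conv = tf.keras.layers.Conv3D(64, kernel_size=(3, 3, 3), padding="same")
print(conv(clip).shape)                            # (1, 16, 112, 112, 64)
```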
As shown in Fig. 3, in step 3) the 3D CNN models are established and feature learning is performed, comprising the following steps:
3.1) building a 3D convolutional neural network model based on the image frame dataset and a 3D convolutional neural network model based on the calibrated human-detection dataset, respectively. Taking 16 consecutive frames as the input of each model, 5 layers of 3D convolution (with 64, 128, 256, 512 and 512 convolution kernels in turn), 5 layers of 3D max pooling and 1 fully connected layer (with 2048 units) are applied, and the obtained features serve as the input of the model fusion layer, as shown in Fig. 4; this comprises the following steps:
3.1.1) obtaining the 3D convolution features extracted by the two models, and applying a Flatten() operation to the obtained features, which serve as the input of the fusion layer;
3.1.2) completing the fusion of the intermediate features, which serve as the input of the fully connected layers.
3.2) To prevent over-fitting during model training, L2 regularization is applied to the 5 convolutional layers, and dropout (0.5) is added in the fully connected layers.
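For completeness, a hedged usage sketch showing how the two-stream model sketched earlier could be compiled and trained on 16-frame clips; the optimizer, learning rate and the hypothetical train_clips/val_clips pipelines are all assumptions:

```python
import tensorflow as tf

# Assumes build_fusion_model() from the earlier sketch and a hypothetical tf.data
# pipeline yielding ((global_clip, target_clip), one_hot_label) batches.
model = build_fusion_model(num_classes=101)        # e.g. 101 classes for UCF-101
model.compile(optimizer=tf.keras.optimizers.SGD(1e-3, momentum=0.9),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_clips, validation_data=val_clips, epochs=30)
```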
As shown in Fig. 4, in step 4) the fused features of step 3.1) are classified and recognized with a Softmax classifier, comprising the following steps:
4.1) after the feature fusion is completed, the result passes through the three fully connected layers and serves as the input of the Softmax classifier, which then performs classification;
4.2) setting an early-warning threshold: after the recognition score of a certain behavior action reaches its corresponding threshold, the system issues an early warning.
In step 6), the model is fine-tuned according to the specific application scenario to enhance its transferability and generalization ability, comprising the following steps:
6.1) transferring the model to the specific application scenario and freezing the convolution and pooling layer parameters of the model;
6.2) replacing the input and output layers of the model;
6.3) loading the dataset of the new scenario and retraining the parameters of the fully connected layers.
The following is a human behavior recognition system based on multi-modal 3D CNN provided by this embodiment, comprising:
Data acquisition module: for acquiring original video data for human behavior analysis, including public behavior datasets and video datasets from actual scenes. In this embodiment, a monocular camera and a binocular camera are used to collect video data of actual scenes, and public human behavior datasets are downloaded, together forming the full collected dataset.
Data pre-processing module: for pre-processing the original video data, including classification and calibration, target detection, cropping, and video-frame conversion. In this embodiment, the video data are processed with the "FFmpeg" tool and converted into image frame sets; meanwhile the videos are calibrated and cropped with the SSD (Single Shot MultiBox Detector) detection algorithm to generate the multi-target frame dataset.
Feature extraction module: for feeding the pre-processed data into the constructed 3D CNN network models to extract the video-stream behavior feature information and the behavior-subject feature information of the calibrated crops, respectively. In this embodiment, 3D CNN models are used: taking 16 consecutive frames as input, 5 layers of 3D convolution and 5 layers of 3D max pooling are applied to extract the two kinds of feature information, which serve as the input of the feature fusion module.
Feature fusion module: for fusing the feature information obtained by the feature extraction module. In this embodiment, a 1-layer 3D feature fusion structure fuses the two kinds of behavior feature information, and 3 fully connected layers further extract and classify the features.
Model training module: for building a learning model on the pre-processed training set to obtain the trained multi-target detection 3D CNN human behavior recognition model. In this embodiment, the public human behavior datasets "UCF-101" and "HMDB51" and a self-collected real dataset are combined to form the training dataset.
Human behavior recognition module: for classifying and recognizing human behavior actions with the multi-target detection 3D CNN human behavior recognition model. In this embodiment, classification and recognition are performed with a Softmax classifier.
In the above embodiment, the included modules are divided according to the functional logic of the present invention, but the division is not limited thereto; as long as the corresponding functions can be realized, the division is not intended to limit the protection scope of the present invention.
In summary, the human behavior recognition method and system based on multi-target detection 3D CNN provided by the present invention not only compensate for the deficiency of 2D neural networks in extracting features in the time dimension, but also adopt a multi-target detection approach: the SSD (Single Shot MultiBox Detector) target detection algorithm is introduced to calibrate the behavior subject in the video stream, so as to obtain more detailed local features, which are fused into the model to compensate for the weakening of global features. Meanwhile, the behavior features learned by the model can be used not only for classification and recognition but also for early-warning reporting: the model pre-judges and reports special behaviors according to the configured warning thresholds, which broadens the practical application scenarios of the model. The model of the present invention can also be transferred to Internet-of-Things platforms such as smart homes, intelligent surveillance and intelligent anti-theft, and thus has broad research and practical value as well as reference value for wider adoption.
The embodiments described above are only preferred embodiments of the present invention and are not intended to limit the scope of the present invention. Therefore, any changes made according to the shapes and principles of the present invention shall fall within the protection scope of the present invention.

Claims (7)

1. A human behavior recognition method based on multi-target detection 3D CNN, characterized by comprising the following steps:
1) pre-processing the video and converting the video stream into image frames;
2) performing calibration and cropping of the target objects in the video using the SSD detection technique;
3) establishing feature extraction network structures for the image frame data and the calibrated, cropped data;
4) establishing a feature fusion model to fuse the two kinds of features extracted in step 3);
5) classifying with a Softmax regression classifier;
6) fine-tuning the trained model according to the actual application scenario or public datasets to enhance the generalization and transferability of the model.
2. The human behavior recognition method based on multi-target detection 3D CNN according to claim 1, characterized in that in step 1), the video is pre-processed and the video stream is converted into image frames, comprising the following steps:
1.1) obtaining the video dataset: public datasets are mainly used for model training, while the test data are collected by cameras in real environments;
1.2) archiving the video dataset: video data of the same action are filed into the same folder, and the folder is named with its behavior label;
1.3) pre-processing the video dataset: all videos are converted into the corresponding image frame sets by a video conversion shell script;
1.4) splitting the image frame sets obtained in step 1.3) by cross-validation, for model training;
and in step 2), calibration and cropping of the target objects in the video are performed using the SSD detection technique, comprising the following steps:
2.1) loading a trained SSD detection model;
2.2) reading the video stream data, feeding it into the SSD detection model, and performing calibration detection on each frame of the video;
2.3) setting the size of the calibrated crop to half of the frame size of the frame dataset obtained in step 1.3), converting all videos accordingly, and saving them as the calibrated image frame sets.
3. The human behavior recognition method based on multi-target detection 3D CNN according to claim 1, characterized in that in step 3), feature extraction network structures are established for the image frame data and the calibrated, cropped data, specifically as follows:
first, a 3D convolutional neural network model based on the image frame dataset and a 3D convolutional neural network model based on the calibrated human-detection dataset are built respectively; then, taking 16 consecutive frames as the input of each model, 5 layers of 3D convolution, 5 layers of 3D max pooling, 1 feature fusion layer and 3 fully connected layers are used; to prevent over-fitting during model training, L2 regularization is applied to the 5 convolutional layers, and dropout (0.5) is added in the fully connected layers;
and in step 4), a feature fusion model is established to fuse the features, comprising the following steps:
4.1) obtaining the 3D convolution features extracted by the 3D convolutional neural network model based on the image frame dataset and by the 3D convolutional neural network model based on the calibrated human-detection dataset respectively, and applying a Flatten() operation to the obtained features, which serve as the input of the fusion layer;
4.2) completing the fusion of the intermediate features, which serve as the input of the fully connected layers.
4. The human behavior recognition method based on multi-target detection 3D CNN according to claim 1, characterized in that in step 5), classification is performed with the Softmax classifier, comprising the following steps:
5.1) after the feature fusion in step 4) is completed, the result passes through the three fully connected layers and serves as the input of the Softmax classifier, which then performs classification;
5.2) setting an early-warning threshold: after the recognition score of a certain behavior action reaches its corresponding threshold, the system issues an early warning.
5. The human behavior recognition method based on multi-target detection 3D CNN according to claim 1, characterized in that in step 6), the trained model is fine-tuned according to the actual application scenario or public datasets to enhance its generalization and transferability, comprising the following steps:
6.1) transferring the model to the specific application scenario and freezing the convolution and pooling layer parameters of the model;
6.2) replacing the input and output layers of the model;
6.3) loading the dataset of the new scenario and retraining the parameters of the fully connected layers.
6. A human behavior recognition system based on multi-target detection 3D CNN, characterized by comprising:
a data acquisition module for acquiring original video data for human behavior analysis, including public behavior datasets and video datasets from actual scenes;
a data pre-processing module for pre-processing the original video data, including classification and calibration, target detection, cropping, and video-frame conversion;
a feature extraction module for feeding the pre-processed data into the constructed 3D CNN network models to extract the video-stream behavior feature information and the behavior-subject feature information of the calibrated crops, respectively;
a feature fusion module for fusing the feature information obtained by the feature extraction module;
a model training module for building a learning model on the pre-processed training set to obtain the trained multi-target detection 3D CNN human behavior recognition model;
a human behavior recognition module for classifying and recognizing human behavior actions using the multi-target detection 3D CNN human behavior recognition model.
7. The human behavior recognition system based on multi-target detection 3D CNN according to claim 6, characterized in that: the data acquisition module acquires video data of actual scenes through a monocular camera and a binocular camera, and downloads public human behavior datasets; the data pre-processing module processes the video data with the "FFmpeg" tool and converts it into image frame sets, and at the same time calibrates and crops the videos with the SSD detection algorithm to generate the multi-target frame dataset; the feature extraction module uses a 3D CNN model that takes 16 consecutive frames as input and applies 5 layers of 3D convolution and 5 layers of 3D max pooling; the feature fusion module uses a 1-layer 3D feature fusion structure to fuse the two kinds of behavior feature information, and 3 fully connected layers further extract and classify the features; the model training module combines the public human behavior datasets "UCF-101" and "HMDB51" with a self-collected real dataset to form the training dataset; and the human behavior recognition module performs classification and recognition with a Softmax classifier.
CN201910136442.1A 2019-02-18 2019-02-18 Human behavior identification method and system based on multi-target detection 3D CNN Expired - Fee Related CN109977773B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910136442.1A CN109977773B (en) 2019-02-18 2019-02-18 Human behavior identification method and system based on multi-target detection 3D CNN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910136442.1A CN109977773B (en) 2019-02-18 2019-02-18 Human behavior identification method and system based on multi-target detection 3D CNN

Publications (2)

Publication Number Publication Date
CN109977773A true CN109977773A (en) 2019-07-05
CN109977773B CN109977773B (en) 2021-01-19

Family

ID=67077264

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910136442.1A Expired - Fee Related CN109977773B (en) 2019-02-18 2019-02-18 Human behavior identification method and system based on multi-target detection 3D CNN

Country Status (1)

Country Link
CN (1) CN109977773B (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110348420A (en) * 2019-07-18 2019-10-18 腾讯科技(深圳)有限公司 Sign Language Recognition Method, device, computer readable storage medium and computer equipment
CN110414415A (en) * 2019-07-24 2019-11-05 北京理工大学 Human behavior recognition method for classroom scenes
CN110414421A (en) * 2019-07-25 2019-11-05 电子科技大学 Behavior recognition method based on sequential frame images
CN110532909A (en) * 2019-08-16 2019-12-03 成都电科慧安科技有限公司 Human behavior recognition method based on three-dimensional UWB positioning
CN111259838A (en) * 2020-01-20 2020-06-09 山东大学 Method and system for deeply understanding human body behaviors in service robot service environment
CN111382677A (en) * 2020-02-25 2020-07-07 华南理工大学 Human behavior identification method and system based on 3D attention residual error model
CN112016461A (en) * 2020-08-28 2020-12-01 深圳市信义科技有限公司 Multi-target behavior identification method and system
CN112232190A (en) * 2020-10-15 2021-01-15 南京邮电大学 Method for detecting abnormal behaviors of old people facing home scene
CN112613428A (en) * 2020-12-28 2021-04-06 杭州电子科技大学 Resnet-3D convolution cattle video target detection method based on balance loss
CN112766151A (en) * 2021-01-19 2021-05-07 北京深睿博联科技有限责任公司 Binocular target detection method and system for blind guiding glasses
CN113052059A (en) * 2021-03-22 2021-06-29 中国石油大学(华东) Real-time action recognition method based on space-time feature fusion
CN113221658A (en) * 2021-04-13 2021-08-06 卓尔智联(武汉)研究院有限公司 Training method and device of image processing model, electronic equipment and storage medium
CN113420703A (en) * 2021-07-03 2021-09-21 西北工业大学 Dynamic facial expression recognition method based on multi-scale feature extraction and multi-attention mechanism modeling
CN113515986A (en) * 2020-07-02 2021-10-19 阿里巴巴集团控股有限公司 Video processing method, data processing method and equipment
CN113536847A (en) * 2020-04-17 2021-10-22 天津职业技术师范大学(中国职业培训指导教师进修中心) Industrial scene video analysis system and method based on deep learning
CN115601714A (en) * 2022-12-16 2023-01-13 广东汇通信息科技股份有限公司(Cn) Campus violent behavior identification method based on multi-mode data analysis

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104899561A (en) * 2015-05-27 2015-09-09 华南理工大学 Parallelized human body behavior identification method
US20180032846A1 (en) * 2016-08-01 2018-02-01 Nvidia Corporation Fusing multilayer and multimodal deep neural networks for video classification
CN108108652A (en) * 2017-03-29 2018-06-01 广东工业大学 Cross-view human behavior recognition method and device based on dictionary learning
CN108647591A (en) * 2018-04-25 2018-10-12 长沙学院 Behavior recognition method and system in video based on visual-semantic features
CN108985173A (en) * 2018-06-19 2018-12-11 奕通信息科技(上海)股份有限公司 Deep network transfer learning method for apparent-age databases with noisy labels
CN109002808A (en) * 2018-07-27 2018-12-14 高新兴科技集团股份有限公司 Human behavior recognition method and system

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110348420A (en) * 2019-07-18 2019-10-18 腾讯科技(深圳)有限公司 Sign Language Recognition Method, device, computer readable storage medium and computer equipment
CN110348420B (en) * 2019-07-18 2022-03-18 腾讯科技(深圳)有限公司 Sign language recognition method and device, computer readable storage medium and computer equipment
US11749029B2 (en) 2019-07-18 2023-09-05 Tencent Technology (Shenzhen) Company Limited Gesture language recognition method and apparatus, computer-readable storage medium, and computer device
CN110414415A (en) * 2019-07-24 2019-11-05 北京理工大学 Human behavior recognition method for classroom scenes
CN110414421B (en) * 2019-07-25 2023-04-07 电子科技大学 Behavior identification method based on continuous frame images
CN110414421A (en) * 2019-07-25 2019-11-05 电子科技大学 Behavior recognition method based on sequential frame images
CN110532909A (en) * 2019-08-16 2019-12-03 成都电科慧安科技有限公司 Human behavior recognition method based on three-dimensional UWB positioning
CN111259838A (en) * 2020-01-20 2020-06-09 山东大学 Method and system for deeply understanding human body behaviors in service robot service environment
CN111259838B (en) * 2020-01-20 2023-02-03 山东大学 Method and system for deeply understanding human body behaviors in service robot service environment
CN111382677A (en) * 2020-02-25 2020-07-07 华南理工大学 Human behavior identification method and system based on 3D attention residual error model
CN111382677B (en) * 2020-02-25 2023-06-20 华南理工大学 Human behavior recognition method and system based on 3D attention residual error model
CN113536847A (en) * 2020-04-17 2021-10-22 天津职业技术师范大学(中国职业培训指导教师进修中心) Industrial scene video analysis system and method based on deep learning
CN113515986A (en) * 2020-07-02 2021-10-19 阿里巴巴集团控股有限公司 Video processing method, data processing method and equipment
CN112016461A (en) * 2020-08-28 2020-12-01 深圳市信义科技有限公司 Multi-target behavior identification method and system
CN112232190B (en) * 2020-10-15 2023-04-18 南京邮电大学 Method for detecting abnormal behaviors of old people facing home scene
CN112232190A (en) * 2020-10-15 2021-01-15 南京邮电大学 Method for detecting abnormal behaviors of old people facing home scene
CN112613428A (en) * 2020-12-28 2021-04-06 杭州电子科技大学 Resnet-3D convolution cattle video target detection method based on balance loss
CN112613428B (en) * 2020-12-28 2024-03-22 易采天成(郑州)信息技术有限公司 Resnet-3D convolution cattle video target detection method based on balance loss
CN112766151B (en) * 2021-01-19 2022-07-12 北京深睿博联科技有限责任公司 Binocular target detection method and system for blind guiding glasses
CN112766151A (en) * 2021-01-19 2021-05-07 北京深睿博联科技有限责任公司 Binocular target detection method and system for blind guiding glasses
CN113052059A (en) * 2021-03-22 2021-06-29 中国石油大学(华东) Real-time action recognition method based on space-time feature fusion
CN113221658A (en) * 2021-04-13 2021-08-06 卓尔智联(武汉)研究院有限公司 Training method and device of image processing model, electronic equipment and storage medium
CN113420703A (en) * 2021-07-03 2021-09-21 西北工业大学 Dynamic facial expression recognition method based on multi-scale feature extraction and multi-attention mechanism modeling
CN115601714A (en) * 2022-12-16 2023-01-13 广东汇通信息科技股份有限公司(Cn) Campus violent behavior identification method based on multi-mode data analysis
CN115601714B (en) * 2022-12-16 2023-03-10 广东汇通信息科技股份有限公司 Campus violent behavior identification method based on multi-modal data analysis

Also Published As

Publication number Publication date
CN109977773B (en) 2021-01-19

Similar Documents

Publication Publication Date Title
CN109977773A (en) Human bodys' response method and system based on multi-target detection 3D CNN
Mees et al. Choosing smartly: Adaptive multimodal fusion for object detection in changing environments
Yin et al. Recurrent convolutional network for video-based smoke detection
US10083378B2 (en) Automatic detection of objects in video images
CN109598268B (en) RGB-D (Red Green blue-D) significant target detection method based on single-stream deep network
CN103871079B (en) Wireless vehicle tracking based on machine learning and light stream
CN107247956B (en) Rapid target detection method based on grid judgment
CN109543697A (en) A kind of RGBD images steganalysis method based on deep learning
CN111382677B (en) Human behavior recognition method and system based on 3D attention residual error model
CN113255443B (en) Graph annotation meaning network time sequence action positioning method based on pyramid structure
CN110929593A (en) Real-time significance pedestrian detection method based on detail distinguishing and distinguishing
CN104504395A (en) Method and system for achieving classification of pedestrians and vehicles based on neural network
Chen et al. An improved Yolov3 based on dual path network for cherry tomatoes detection
CN108345894A (en) A kind of traffic incidents detection method based on deep learning and entropy model
Lu et al. Multi-object detection method based on YOLO and ResNet hybrid networks
CN110852190A (en) Driving behavior recognition method and system integrating target detection and gesture recognition
CN111723600B (en) Pedestrian re-recognition feature descriptor based on multi-task learning
CN110619268A (en) Pedestrian re-identification method and device based on space-time analysis and depth features
CN110688938A (en) Pedestrian re-identification method integrated with attention mechanism
Yang et al. Counting crowds using a scale-distribution-aware network and adaptive human-shaped kernel
CN103500456B (en) A kind of method for tracing object based on dynamic Bayesian network network and equipment
CN105469050A (en) Video behavior identification method based on local space-time characteristic description and pyramid vocabulary tree
Tsutsui et al. Distantly supervised road segmentation
Mathur et al. A brief survey of deep learning techniques for person re-identification
CN113205060A (en) Human body action detection method adopting circulatory neural network to judge according to bone morphology

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210119

CF01 Termination of patent right due to non-payment of annual fee