CN105740773B - Activity recognition method based on deep learning and multi-scale information - Google Patents
Activity recognition method based on deep learning and multi-scale information
- Publication number
- CN105740773B CN201610047682.0A
- Authority
- CN
- China
- Prior art keywords
- video
- seg
- coarseness
- frequency band
- deep
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an activity recognition method based on deep learning and multi-scale information. Multiple deep networks are constructed and combined into a parallel structure to study human activity recognition in depth video: the depth video is first split into multiple video segments, each segment is then learned by its own parallel branch network, the high-level representations learned by the branches are fused and concatenated, and the fused high-level representation is finally fed into fully connected layers and a classification layer for recognition. The deep-learning approach performs activity recognition effectively, significantly improves the recognition rate when the individual actions differ markedly from one another, and offers high real-time performance.
Description
Technical field
The present invention relates to the field of human activity recognition, and more particularly to an activity recognition method based on deep learning and multi-scale information.
Background art
With the maturation of hardware such as computers and cameras and the growing demands of social management, research on human activity recognition has attracted increasing attention from computer vision researchers and is widely applied in automatic surveillance, event detection, human-machine interfaces, video acquisition and many other fields. Traditional human activity recognition methods first extract features from each video describing a human activity, for example Histograms of Oriented Gradients (HOG) or Motion History Images (MHI), and then classify the extracted features with classifiers such as support vector machines or random forests. Research on human activity recognition based on such methods has produced many excellent results, yet some problems remain hard to solve: the extracted features are task-specific and do not generalize easily to other data, and the computational overhead is too large to achieve real-time performance.

Deep learning can automatically extract the multi-layer feature representations hidden in data, and deep learning based on convolutional neural networks has achieved great success in image classification, recognition, localization and other research areas. However, the convolution used in image processing is a two-dimensional operation and cannot be applied directly to the three-dimensional videos that describe human activities.
Summary of the invention
In view of the above deficiencies of the prior art, the object of the present invention is to provide an activity recognition method based on deep learning and multi-scale information that performs activity recognition effectively with a deep-learning approach and significantly improves the recognition rate, especially when the individual actions differ markedly from one another. The invention generalizes well: it can be trained on a large data set and then applied to activity recognition domains that lack training data, and it greatly reduces the time overhead of activity recognition, giving high real-time performance.

The present invention takes depth video data as the research object, constructs a CNN-based deep neural network structure, fuses multi-scale information such as global human activity information and local hand motion, and uses conventional two-dimensional CNNs to study three-dimensional human activity recognition.

The present invention studies human activity recognition in depth video by constructing multiple deep networks that form a parallel structure. The depth video is first split into multiple video segments, each segment is then learned by its own parallel branch network, and the high-level representations learned by the branches are fused: the output of each branch network is vectorized and the vectors are concatenated into a one-dimensional vector that is fed into the subsequent fully connected layers. The fused high-level representation is finally fed into the fully connected layers and the classification layer for recognition. At the same time, because most activities in the MSRDailyActivity3D data set differ only subtly at the hands, for example reading, writing, using a laptop and playing a game, the invention proposes the idea of fusing multi-scale information such as the coarse-grained global activity information and the fine-grained hand motion.
The object of the present invention is achieved as follows: an activity recognition method based on deep learning and multi-scale information, comprising the following steps:

(1) Establish a training data set; the coarse-grained global activity videos in the training data set are selected from the MSRDailyActivity3D data set.

(2) Construct a deep neural network model containing several parallel deep convolutional neural networks;

(3) Take the coarse-grained global activity videos in the training data set and segment them with the set step length L_Stride, where the length of each segment is set to L_Seg; N_Seg coarse-grained video-segment matrices are formed after segmentation, the number of segments being N_Seg = 1 + (N_F − L_Seg)/L_Stride, where N_F is the number of frames of the coarse-grained global activity video;

(4) Obtain a fine-grained local activity video from the coarse-grained global activity video of step (3), and segment the fine-grained local activity video with the same method as step (3) to obtain N_Seg fine-grained video-segment matrices; each frame of a fine-grained video-segment matrix has the same size as each frame of a coarse-grained video-segment matrix. The fine-grained local activity video is formed by cropping the fine-grained local activity sequence from each frame of the coarse-grained global activity video. The fine-grained local activity can be the hand motion or the detail motion of another body part. The fine-grained video is obtained as follows: centered on the left-hand joint of each frame of the coarse-grained global activity video, a patch of size W/4 × H/4 is cropped to form a new video of N_F × W/4 × H/4; this video is the fine-grained hand-motion video, where W, H and N_F are respectively the width and height of the original depth video frames and the number of frames contained in the video. Its size matches that of the down-sampled coarse-grained video (a sketch of steps (3) and (4) is given after step (6)).

(5) Feed the N_Seg coarse-grained video-segment matrices obtained in step (3) and the N_Seg fine-grained video-segment matrices obtained in step (4) in parallel into the deep neural network model constructed in step (2), which has 2·N_Seg parallel deep convolutional neural networks, and train the model;

(6) Take a coarse-grained global activity video to be recognized, perform steps (3) and (4) on it to obtain N_Seg coarse-grained video-segment matrices and N_Seg fine-grained video-segment matrices, and feed these in parallel into the trained deep neural network model obtained in step (5) for activity recognition. The coarse-grained global activity video to be recognized is a preprocessed video.
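The segmentation of steps (3) and (4) and the left-hand-centered crop can be illustrated with a short sketch. The code below is a minimal Python/NumPy illustration under stated assumptions, not the authors' implementation (the experiments described later were written on the Torch platform); the array layout, the function names and the assumption that the left-hand joint coordinates are supplied per frame are introduced only for illustration.

```python
import numpy as np

def segment_video(video, l_seg, l_stride):
    """Split a video of shape (N_F, H, W) into N_Seg = 1 + (N_F - L_Seg)/L_Stride
    segments of L_Seg consecutive frames, one segment every L_Stride frames."""
    n_f = video.shape[0]
    n_seg = 1 + (n_f - l_seg) // l_stride
    return np.stack([video[i * l_stride: i * l_stride + l_seg] for i in range(n_seg)])

def crop_hand_video(video, hand_xy):
    """Crop an (H/4, W/4) patch centered on the left-hand joint of every frame,
    producing the fine-grained hand-motion video of shape (N_F, H/4, W/4)."""
    n_f, h, w = video.shape
    ph, pw = h // 4, w // 4
    patches = []
    for t in range(n_f):
        cx, cy = hand_xy[t]                                   # joint position (column, row) in frame t
        x0 = int(np.clip(cx - pw // 2, 0, w - pw))
        y0 = int(np.clip(cy - ph // 2, 0, h - ph))
        patches.append(video[t, y0:y0 + ph, x0:x0 + pw])
    return np.stack(patches)

# With the sizes used later in the description: 192 frames of 128 x 128 depth images and
# L_Seg = L_Stride = 16, each stream yields N_Seg = 1 + (192 - 16)/16 = 12 segments.
depth_video = np.zeros((192, 128, 128), dtype=np.float32)    # placeholder depth video
hand_joints = np.full((192, 2), 64)                          # placeholder left-hand joint positions
coarse = segment_video(depth_video[:, ::4, ::4], 16, 16)     # 1/4 down-sampled global stream
fine = segment_video(crop_hand_video(depth_video, hand_joints), 16, 16)
print(coarse.shape, fine.shape)                              # (12, 16, 32, 32) (12, 16, 32, 32)
```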
The deep neural network in step (2) uses convolutional neural networks as building blocks and has one classification layer, at least one convolutional layer, at least one pooling layer and at least one fully connected layer. Each parallel deep convolutional neural network comprises, connected in sequence, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a third pooling layer, a first fully connected layer, a second fully connected layer and a classification layer.
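A hedged sketch of one such branch is given below in Python/PyTorch. The channel counts, kernel sizes and hidden-layer widths are not reproduced in this text (they are given in Table 1, which is not included here), so every concrete number in the sketch is an assumption, chosen only to make the layer sequence — three convolution/pooling pairs, two fully connected layers and a softmax classification layer — concrete for a 16 × 32 × 32 video-segment input.

```python
import torch
import torch.nn as nn

class BranchCNN(nn.Module):
    """One parallel branch: three conv/pool pairs, two fully connected layers, softmax output.
    All channel counts and layer widths are illustrative assumptions, not the patent's Table 1."""
    def __init__(self, in_frames=16, num_classes=16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_frames, 32, kernel_size=3, padding=1), nn.ReLU(),  # first convolutional layer
            nn.MaxPool2d(2),                                                # first pooling layer
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),         # second convolutional layer
            nn.MaxPool2d(2),                                                # second pooling layer
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),        # third convolutional layer
            nn.MaxPool2d(2),                                                # third pooling layer
        )
        self.fc1 = nn.Linear(128 * 4 * 4, 256)    # first fully connected layer
        self.fc2 = nn.Linear(256, num_classes)    # second fully connected layer
        self.classify = nn.LogSoftmax(dim=1)      # softmax classification layer

    def forward(self, x):                         # x: (batch, 16, 32, 32), frames treated as channels
        h = self.features(x).flatten(1)
        return self.classify(self.fc2(torch.relu(self.fc1(h))))
```

Treating the L_Seg frames of a segment as the input channels is one natural way to apply a two-dimensional CNN to a short depth-video segment, consistent with the 16 × 32 × 32 segment inputs described in the experiments below.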
Each frame of the coarse-grained global activity video in step (3) is down-sampled before segmentation, which serves two purposes: 1. it reduces the amount of computation; 2. it makes each frame of a coarse-grained video-segment matrix the same size as each frame of a fine-grained video-segment matrix, which is convenient for feeding the network.

The coarse-grained global activity video is a depth video.
The coarse-grained global activity videos in the training data set are preprocessed videos, and the coarse-grained global activity video to be recognized is a preprocessed video. The preprocessing is as follows: first, interpolation is used to normalize all videos in the data set to a unified length, the length being the median of all video lengths. Second, the background is removed so that only the person-centered portion of the video is retained, and the video is resized to a fixed size. Third, the x, y and z coordinate values of all videos are normalized to the range [0, 1] with the min-max method. Finally, every sample is horizontally flipped to form a new sample, which doubles the training samples in the data set.
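The preprocessing pipeline above can be summarized in a short sketch. The code below is a minimal Python/NumPy illustration under stated assumptions — nearest-frame temporal interpolation, a precomputed foreground mask standing in for background removal, and per-video min-max normalization of the skeleton coordinates — and is not the authors' implementation.

```python
import numpy as np

def preprocess(video, coords, target_len, out_size=128, fg_mask=None):
    """video: (N, H, W) depth frames; coords: (N, J, 3) skeleton x/y/z values;
    target_len: the median video length of the data set."""
    # 1. Normalize the video to the unified length by (nearest-frame) temporal interpolation.
    idx = np.linspace(0, len(video) - 1, target_len).round().astype(int)
    video, coords = video[idx], coords[idx]
    # 2. Remove the background (here via a given foreground mask) and resize to a fixed size.
    if fg_mask is not None:
        video = video * fg_mask[idx]
    step_h = max(1, video.shape[1] // out_size)
    step_w = max(1, video.shape[2] // out_size)
    video = video[:, ::step_h, ::step_w][:, :out_size, :out_size]
    # 3. Min-max normalize the x, y, z coordinate values to [0, 1].
    mins, maxs = coords.min(axis=(0, 1)), coords.max(axis=(0, 1))
    coords = (coords - mins) / (maxs - mins + 1e-8)
    # 4. A horizontal flip of every sample doubles the training set
    #    (flipping of the coordinate values is omitted in this sketch).
    return (video, coords), (video[:, :, ::-1], coords)
```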
An activity recognition method based on deep learning and multi-scale information comprises the following steps:

(1) Establish a training data set; the depth videos in the training data set are selected from the MSRDailyActivity3D data set;

(2) Construct a deep neural network model containing several parallel deep convolutional neural networks;

(3) Take the activity videos in the training data set and segment them with the set step length L_Stride, where the length of each segment is set to L_Seg; N_Seg video-segment matrices are formed after segmentation, the number of segments being N_Seg = 1 + (N_F − L_Seg)/L_Stride, where N_F is the number of frames of the depth video;

(4) Feed the N_Seg video-segment matrices obtained in step (3) in parallel into the deep neural network model constructed in step (2), which has N_Seg parallel deep convolutional neural networks, and train the model;

(5) Take an activity video to be recognized, perform step (3) on it to obtain N_Seg video-segment matrices, and feed these in parallel into the trained deep neural network model for activity recognition. The activity video to be recognized is a preprocessed video.

The deep neural network in step (2) uses convolutional neural networks as building blocks and has one classification layer, at least one convolutional layer, at least one pooling layer and at least one fully connected layer.

The activity video is a depth video.

The activity videos in the training data set are preprocessed videos, and the activity video to be recognized is a preprocessed video. The preprocessing is as follows: first, interpolation is used to normalize all videos in the data set to a unified length, the length being the median of all video lengths. Second, the background is removed so that only the person-centered portion of the video is retained, and the video is resized to a fixed size. Third, the x, y and z coordinate values of all videos are normalized to the range [0, 1] with the min-max method. Finally, every sample is horizontally flipped to form a new sample, which doubles the training samples in the data set.
The benefits of the invention are as follows: the invention obtains coarse-grained and fine-grained video matrices, trains the designed parallel deep convolutional neural networks on them, and uses the trained deep neural network to classify activities, so the invention generalizes well; it can be trained on a large data set and then applied to activity recognition domains that lack training data.

The invention designs parallel deep convolutional neural networks; the parallel input of the activity video greatly reduces the time overhead of activity recognition, giving good real-time performance.

The invention takes depth video as the research object; depth video describes the geometric shape of objects and is insensitive to illumination and color.

Experiments show that the CNN-based deep learning method proposed by the invention can effectively recognize human activities represented by depth video: for the five activities in the MSRDailyActivity3D data set whose differences are relatively distinct (lie down on sofa, walk, play guitar, stand up and sit down), the average recognition rate is 98%, and the recognition rate for all activities over the whole data set is 60.625%.
The invention is further described below with reference to the accompanying drawings and specific embodiments.
Brief description of the drawings
Fig. 1 is the functional block diagram of the activity recognition method based on deep learning and multi-scale information of the invention;

Fig. 2 shows activity videos in MSRDailyActivity3D before preprocessing (top: drink water; bottom: write);

Fig. 3 shows activity videos in MSRDailyActivity3D after preprocessing (top: drink water; bottom: write).
Specific embodiments
Embodiment one
Referring to Fig. 1, an activity recognition method based on deep learning and multi-scale information comprises the following steps:

(1) Establish a training data set; the coarse-grained global activity videos in the training data set are selected from the MSRDailyActivity3D data set. The coarse-grained global activity videos in the training data set are preprocessed videos, and the coarse-grained global activity video to be recognized is a preprocessed video. The preprocessing is as follows: first, interpolation is used to normalize all videos in the data set to a unified length, the length being the median of all video lengths. Second, the background is removed so that only the person-centered portion of the video is retained, and the video is resized to a fixed size. Third, the x, y and z coordinate values of all videos are normalized to the range [0, 1] with the min-max method. Finally, every sample is horizontally flipped to form a new sample, which doubles the training samples in the data set.
(2) Construct a deep neural network model containing several parallel deep convolutional neural networks. The deep neural network in step (2) uses convolutional neural networks as building blocks and has one classification layer, at least one convolutional layer, at least one pooling layer and at least one fully connected layer. The classification layer of the invention uses a softmax classifier. Each parallel deep convolutional neural network of this embodiment comprises, connected in sequence, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a third pooling layer, a first fully connected layer, a second fully connected layer and a classification layer.

(3) Take the coarse-grained global activity videos in the training data set and segment them with the set step length L_Stride, where the length of each segment is set to L_Seg; N_Seg coarse-grained video-segment matrices are formed after segmentation, the number of segments being N_Seg = 1 + (N_F − L_Seg)/L_Stride, where N_F is the number of frames of the coarse-grained global activity video. Each frame of the coarse-grained global activity video in step (3) is down-sampled before segmentation, which serves two purposes: 1. it reduces the amount of computation; 2. it makes each frame of a coarse-grained video-segment matrix the same size as each frame of a fine-grained video-segment matrix, which is convenient for feeding the network. The research object, i.e. the coarse-grained global activity video, is a depth video.
(4) Obtain a fine-grained local activity video from the coarse-grained global activity video of step (3), and segment the fine-grained local activity video with the same method as step (3) to obtain N_Seg fine-grained video-segment matrices. Each frame of a fine-grained video-segment matrix has the same size as each frame of a coarse-grained video-segment matrix. The fine-grained local activity video is formed by cropping the fine-grained local activity sequence from each frame of the coarse-grained global activity video. The fine-grained local activity can be the hand motion or the detail motion of another body part. The fine-grained local region is chosen according to the specific application; the detail motion in this data set is concentrated mainly at the hands, and if the detail motion were at another body part, the detail motion of that part could be chosen instead. In this embodiment, a patch of the set size is cropped from each frame of the coarse-grained global activity video, centered on the hand joint, to form a fine-grained local activity video with N_F frames.

(5) Feed the N_Seg coarse-grained video-segment matrices obtained in step (3) and the N_Seg fine-grained video-segment matrices obtained in step (4) in parallel into the deep neural network model constructed in step (2), which has 2·N_Seg parallel deep convolutional neural networks, and train the model;

(6) Take a coarse-grained global activity video to be recognized, perform steps (3) and (4) on it to obtain N_Seg coarse-grained video-segment matrices and N_Seg fine-grained video-segment matrices, and feed these in parallel into the trained deep neural network model for activity recognition. In this embodiment the first N_Seg networks process the coarse-grained videos and the last N_Seg networks process the fine-grained videos.
Embodiment two
This embodiment discloses an activity recognition method based on deep learning and multi-scale information that performs activity recognition using only the coarse-grained global activity information. It comprises the following steps:

(1) Establish a training data set; the depth videos in the training data set are selected from the MSRDailyActivity3D data set. The activity videos in the training data set are preprocessed videos, and the activity video to be recognized is a preprocessed video. The preprocessing is as follows: first, interpolation is used to normalize all videos in the data set to a unified length, the length being the median of all video lengths. Second, the background is removed so that only the person-centered portion of the video is retained, and the video is resized to a fixed size. Third, the x, y and z coordinate values of all videos are normalized to the range [0, 1] with the min-max method. Finally, every sample is horizontally flipped to form a new sample, which doubles the training samples in the data set.
(2) Referring to Fig. 1, construct a deep neural network model containing several parallel deep convolutional neural networks. The deep neural network in step (2) uses convolutional neural networks as building blocks and has one classification layer, at least one convolutional layer, at least one pooling layer and at least one fully connected layer.

The classification layer of the invention uses a softmax classifier.

(3) Take the depth videos in the training data set and segment them with the set step length L_Stride, where the length of each segment is set to L_Seg; N_Seg video-segment matrices are formed after segmentation, the number of segments being N_Seg = 1 + (N_F − L_Seg)/L_Stride, where N_F is the number of frames of the depth video;

(4) Feed the N_Seg video-segment matrices obtained in step (3) in parallel into the deep neural network model constructed in step (2), which has N_Seg parallel deep convolutional neural networks, and train the model;

(5) Take a depth video to be recognized, perform step (3) on it to obtain N_Seg video-segment matrices, and feed these in parallel into the trained deep neural network model for activity recognition.
The experimental procedure of the invention is described as follows. Assume that the size of a video representing one activity after normalization is N_F × W × H (192 × 128 × 128 in the invention), where W and H are respectively the width and height of the video frames.

(1) Segment the activity video of N_F frames with step length L_Stride, where the length of each segment is L_Seg, so the number of segments is N_Seg = 1 + (N_F − L_Seg)/L_Stride; then down-sample the video frames by 1/4, so the segmentation yields a video-segment matrix of N_Seg × L_Seg × W/4 × H/4;

(2) Centered on the left-hand joint of each frame of the depth video, crop a patch of size W/4 × H/4 to form a new video of N_F × W/4 × H/4, and segment the new video with the same method as step (1) to obtain a video-segment matrix of N_Seg × L_Seg × W/4 × H/4;

(3) Fuse the video-segment matrices of step (1) and step (2) to obtain a video-segment matrix of 2·N_Seg × L_Seg × W/4 × H/4; this video-segment matrix is the input of the deep network, i.e. the network has 2·N_Seg parallel deep convolutional neural networks and the input of each deep neural network is a video of L_Seg × W/4 × H/4.
(4) Train the parallel deep convolutional neural networks on the training data set and then test human activity recognition on the test data set; the training data set and the test data set are completely disjoint. In the invention the activity videos performed by subjects {1, 3, 5, 7, 9} are used for training and the activity videos performed by subjects {2, 4, 6, 8, 10} are used for testing. The data set was recorded by 10 subjects: the data of subjects 1, 3, 5, 7 and 9 are used for training, and the data of the 5 subjects 2, 4, 6, 8 and 10 are used for testing.

Assume L_Seg = 16 and L_Stride = 16; the deep neural network framework then needs 24 parallel networks, and the input of each network is a 16 × 32 × 32 video-segment sequence, i.e. each video segment contains 16 video frames and the video image size is 32 × 32.
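To make the parallel structure at this scale concrete, the sketch below assembles the 2·N_Seg = 24 branches, vectorizes and concatenates their convolutional features, and passes the fused vector through shared fully connected layers and a softmax classifier, following the fusion described above. It is a Python/PyTorch sketch, not the authors' Torch implementation; the channel counts, hidden widths and the placement of the shared layers are assumptions, since Table 1 is not reproduced in this text.

```python
import torch
import torch.nn as nn

class ParallelDeepNet(nn.Module):
    """24 parallel convolutional branches over 16 x 32 x 32 video segments, fused by
    concatenation before two shared fully connected layers and a softmax classifier."""
    def __init__(self, n_branches=24, in_frames=16, num_classes=16):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(in_frames, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
        self.branches = nn.ModuleList(branch() for _ in range(n_branches))
        self.fc1 = nn.Linear(n_branches * 128 * 4 * 4, 512)   # first (shared) fully connected layer
        self.fc2 = nn.Linear(512, num_classes)                 # second fully connected layer
        self.classify = nn.LogSoftmax(dim=1)                   # softmax classification layer

    def forward(self, segments):
        # segments: (batch, 24, 16, 32, 32) -- 12 coarse-grained plus 12 fine-grained segments
        feats = [b(segments[:, i]).flatten(1) for i, b in enumerate(self.branches)]
        fused = torch.cat(feats, dim=1)            # vectorize each branch output and concatenate
        return self.classify(self.fc2(torch.relu(self.fc1(fused))))

# One fused 24 x 16 x 32 x 32 input matrix, as described in the experiments.
x = torch.zeros(1, 24, 16, 32, 32)
print(ParallelDeepNet()(x).shape)                  # torch.Size([1, 16])
```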
Table 1: The deep networks and their parameters used in the invention
Experiments and discussion

1. Data set and preprocessing
The invention uses the MSRDailyActivity3D data set collected by Microsoft with a Kinect device. The data set contains 16 activities common in daily life: drink water, eat a snack, read a book, make a phone call, write, use a laptop, use a vacuum cleaner, cheer up, sit still, tear paper, play a game, lie down on sofa, walk, play guitar, stand up and sit down. Each activity is performed by the same subject in two different ways: sitting on a sofa or standing. The whole data set contains 320 activity videos. Fig. 2 gives some activity samples from the data set. The data set records the human activity together with the surrounding environment, the extracted depth information contains a large amount of noise, and most activities in the data set differ only by subtle local details, as shown in Fig. 2 and Fig. 3, so the data set is very challenging.
Before the experiments, simple preprocessing is applied to each video. First, interpolation is used to normalize all videos in the data set to a unified length, the length being the median of all video lengths. Second, the background is removed so that only the person-centered portion of the video is retained, and the video is resized to a fixed size, as shown in Fig. 3. Third, the x, y and z coordinate values of all videos are normalized to the range [0, 1] with the min-max method. Finally, every sample is horizontally flipped to form a new sample, which doubles the training samples in the data set. The experiments of the invention are written on the Torch platform [20], with a learning rate of 1×10⁻⁴ and the softmax loss function provided by the platform.
2. HAR based on multi-scale information fusion and deep learning

The invention uses the 2CNN2F network of Table 1 and takes multi-scale information, namely the coarse-grained global activity video and the fine-grained hand-motion sequence, as the input of the deep network. In this experiment the step length L_Stride and the segment length L_Seg are both set to 16, i.e. the 12 × 16 × 32 × 32 global activity sequence and the 12 × 16 × 32 × 32 local hand-motion sequence extracted from the whole video are merged to form the 24 × 16 × 32 × 32 input video matrix. Table 2 compares the recognition performance of the proposed method with other methods on the MSRDailyActivity3D data set, where 2CNN2F uses only the coarse-grained global activity information and 2CNN2F+Joint denotes the multi-scale information fusion method of the invention. The table shows that the activity recognition accuracy of the proposed method is 60.625%; using only the coarse-grained global activity information, the recognition rate drops slightly to 56.875%, which is comparable to the recognition performance of traditional hand-crafted feature extraction methods. It is worth noting that if only the 11th to 16th activities (play a game, lie down on sofa, walk, play guitar, stand up and sit down) are recognized, the recognition rate reaches 98%. This may be because the differences between the 11th to 16th activities are relatively large, whereas the differences between many other activities in the data set are very subtle; for example, reading, writing and using a laptop differ only slightly in the hand motion. The experimental results show that the deep-learning approach performs activity recognition effectively and significantly improves the recognition rate, especially when the individual actions differ markedly from one another.
Table 2: Recognition performance of the proposed method and other methods on the MSRDailyActivity3D data set

Algorithm | Recognition rate
---|---
LOP features [8] | 42.5%
Joint Position features [8] | 68%
Dynamic Temporal Warping [21] | 54%
2CNN2F | 56.875%
2CNN2F+Joint | 60.625%
3. Influence of network depth on recognition

To probe the influence of network depth on the recognition result, the invention also constructs neural networks containing 3 CNN layers and 4 CNN layers, namely 3CNN2F_8 and 4CNN2F (shown in Table 3). The network parameters are given in Table 1. Because the network depth increases, and in order to keep the network from overfitting, this experiment uses 24 × 8 × 128 × 128 video sequences as the input of the neural network: the normalized 192 × 128 × 128 video is split with step length 8 into 24 video segments of 8 × 128 × 128, which are fed in parallel into the neural network with 24 parallel branches. As shown in Table 3, the recognition rate of the 3CNN2F_8 network is 52.5%, while that of 4CNN2F is 58.75%. The experimental results show that increasing the network depth can effectively improve the activity recognition rate.

Table 3: Parameter configuration and recognition rate of the different networks
4. Influence of the splitting step length on the recognition result

To examine the influence of the splitting step length on the recognition result, the invention constructs two networks of the 3CNN2F type with different inputs: 3CNN2F_8 and 3CNN2F_4. The input of 3CNN2F_8 is a 24 × 8 × 128 × 128 video sequence, while the input of 3CNN2F_4 has size 47 × 8 × 128 × 128, i.e. the normalized 192 × 128 × 128 video is split with step length 4 into 47 video segments of 8 × 128 × 128 (N_Seg = 1 + (192 − 8)/4 = 47), with a 4-frame overlap between two adjacent segments after splitting. The experimental results are shown in Table 3. With step length 8 the recognition accuracy is 52.5%, and with step length 4 it is 56.875%. The recognition rate improves noticeably, mainly because reducing the step length causes two changes. On the one hand, a smaller step length produces more video segments, so the deep network needs more parallel branches and becomes wider, it has more parameters, and its generalization ability is better; on the other hand, reducing the step length and increasing the number of split video segments also increases the training data, so the network trains better.
Considering that depth video describes the geometric shape of objects and is insensitive to illumination and color, the invention takes depth video as the research object, constructs a deep neural network model with conventional two-dimensional CNNs (convolutional neural networks), and classifies the activities in the MSRDailyActivity3D data set. Experiments show that the proposed CNN-based deep learning method can effectively recognize human activities represented by depth video: for the five activities in the MSRDailyActivity3D data set whose differences are relatively distinct (lie down on sofa, walk, play guitar, stand up and sit down), the average recognition rate is 98%, and the recognition rate for all activities over the whole data set is 60.625%. The invention also carries out some exploratory experiments on how to improve the recognition rate of deep learning. The research finds that reducing the step length used to split the video segments, fusing coarse-grained and fine-grained video information, and appropriately increasing the network depth can effectively improve the recognition rate of the deep network.

The invention is not limited to the above embodiments; technical solutions obtained by minor modifications that do not depart from the spirit of the technical solution of the invention shall fall within the protection scope of the invention.
Claims (9)
1. An activity recognition method based on deep learning and multi-scale information, characterized by comprising the following steps:

(1) Establish a training data set;

(2) Construct a deep neural network model containing several parallel deep convolutional neural networks;

(3) Take the coarse-grained global activity videos in the training data set and segment them with the set step length L_Stride, where the length of each segment is set to L_Seg; N_Seg coarse-grained video-segment matrices are formed after segmentation, the number of segments being N_Seg = 1 + (N_F − L_Seg)/L_Stride, where N_F is the number of frames of the coarse-grained global activity video;

(4) Obtain a fine-grained local activity video from the coarse-grained global activity video of step (3), and segment the fine-grained local activity video with the same method as step (3) to obtain N_Seg fine-grained video-segment matrices;

(5) Feed the N_Seg coarse-grained video-segment matrices obtained in step (3) and the N_Seg fine-grained video-segment matrices obtained in step (4) in parallel into the deep neural network model constructed in step (2), which has 2·N_Seg parallel deep convolutional neural networks, and train the model;

(6) Take a coarse-grained global activity video to be recognized, perform steps (3) and (4) on it to obtain N_Seg coarse-grained video-segment matrices and N_Seg fine-grained video-segment matrices, and feed these in parallel into the trained deep neural network model obtained in step (5) for activity recognition.
2. The activity recognition method based on deep learning and multi-scale information according to claim 1, characterized in that: the deep neural network model in step (2) uses convolutional neural networks as building blocks and has a classification layer, at least one convolutional layer, at least one pooling layer and at least one fully connected layer.

3. The activity recognition method based on deep learning and multi-scale information according to claim 1, characterized in that: each frame of the coarse-grained global activity video in step (3) is down-sampled before segmentation, so that each frame of a coarse-grained video-segment matrix has the same size as each frame of a fine-grained video-segment matrix.

4. The activity recognition method based on deep learning and multi-scale information according to claim 1, characterized in that: the coarse-grained global activity video is a depth video.

5. The activity recognition method based on deep learning and multi-scale information according to claim 1 or 4, characterized in that: the coarse-grained global activity videos in the training data set are preprocessed videos, and the coarse-grained global activity video to be recognized is a preprocessed video.

6. The activity recognition method based on deep learning and multi-scale information according to claim 1, characterized in that: the fine-grained local activity video is formed by cropping the fine-grained local activity sequence from each frame of the coarse-grained global activity video.
7. An activity recognition method based on deep learning and coarse-grained global activity information, characterized by comprising the following steps:

(1) Establish a training data set;

(2) Construct a deep neural network model containing several parallel deep convolutional neural networks;

(3) Take the coarse-grained global activity videos in the training data set and segment them with the set step length L_Stride, where the length of each segment is set to L_Seg; N_Seg video-segment matrices are formed after segmentation, the number of segments being N_Seg = 1 + (N_F − L_Seg)/L_Stride, where N_F is the number of frames of the depth video; the activity video is a depth video;

(4) Feed the N_Seg video-segment matrices obtained in step (3) in parallel into the deep neural network model constructed in step (2), which has N_Seg parallel deep convolutional neural networks, and train the model;

(5) Take a coarse-grained global activity video to be recognized, perform step (3) on it to obtain N_Seg video-segment matrices, and feed these in parallel into the trained deep neural network model for activity recognition.
8. The activity recognition method based on deep learning and coarse-grained global activity information according to claim 7, characterized in that: the deep neural network in step (2) uses convolutional neural networks as building blocks and has a classification layer, at least one convolutional layer, at least one pooling layer and at least one fully connected layer.

9. The activity recognition method based on deep learning and coarse-grained global activity information according to claim 7, characterized in that: the coarse-grained global activity videos in the training data set are preprocessed videos, and the coarse-grained global activity video to be recognized is a preprocessed video.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610047682.0A CN105740773B (en) | 2016-01-25 | 2016-01-25 | Activity recognition method based on deep learning and multi-scale information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610047682.0A CN105740773B (en) | 2016-01-25 | 2016-01-25 | Activity recognition method based on deep learning and multi-scale information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105740773A CN105740773A (en) | 2016-07-06 |
CN105740773B true CN105740773B (en) | 2019-02-01 |
Family
ID=56247501
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610047682.0A Expired - Fee Related CN105740773B (en) | 2016-01-25 | 2016-01-25 | Activity recognition method based on deep learning and multi-scale information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105740773B (en) |
Families Citing this family (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106203503B (en) * | 2016-07-08 | 2019-04-05 | 天津大学 | A kind of action identification method based on bone sequence |
CN106599789B (en) * | 2016-07-29 | 2019-10-11 | 北京市商汤科技开发有限公司 | The recognition methods of video classification and device, data processing equipment and electronic equipment |
CN106228240B (en) * | 2016-07-30 | 2020-09-01 | 复旦大学 | Deep convolution neural network implementation method based on FPGA |
CN106504266B (en) * | 2016-09-29 | 2019-06-14 | 北京市商汤科技开发有限公司 | The prediction technique and device of walking behavior, data processing equipment and electronic equipment |
CN106778576B (en) * | 2016-12-06 | 2020-05-26 | 中山大学 | Motion recognition method based on SEHM characteristic diagram sequence |
CN106951872B (en) * | 2017-03-24 | 2020-11-06 | 江苏大学 | Pedestrian re-identification method based on unsupervised depth model and hierarchical attributes |
CN107066979A (en) * | 2017-04-18 | 2017-08-18 | 重庆邮电大学 | A kind of human motion recognition method based on depth information and various dimensions convolutional neural networks |
CN107886117A (en) * | 2017-10-30 | 2018-04-06 | 国家新闻出版广电总局广播科学研究院 | The algorithm of target detection merged based on multi-feature extraction and multitask |
CN107837087A (en) * | 2017-12-08 | 2018-03-27 | 兰州理工大学 | A kind of human motion state recognition methods based on smart mobile phone |
CN108038107B (en) * | 2017-12-22 | 2021-06-25 | 东软集团股份有限公司 | Sentence emotion classification method, device and equipment based on convolutional neural network |
CN108182441B (en) * | 2017-12-29 | 2020-09-18 | 华中科技大学 | Parallel multichannel convolutional neural network, construction method and image feature extraction method |
CN108280406A (en) * | 2017-12-30 | 2018-07-13 | 广州海昇计算机科技有限公司 | A kind of Activity recognition method, system and device based on segmentation double-stream digestion |
CN108182416A (en) * | 2017-12-30 | 2018-06-19 | 广州海昇计算机科技有限公司 | A kind of Human bodys' response method, system and device under monitoring unmanned scene |
CN108524209A (en) * | 2018-03-30 | 2018-09-14 | 江西科技师范大学 | Blind-guiding method, system, readable storage medium storing program for executing and mobile terminal |
CN108664931B (en) * | 2018-05-11 | 2022-03-01 | 中国科学技术大学 | Multi-stage video motion detection method |
CN108805083B (en) * | 2018-06-13 | 2022-03-01 | 中国科学技术大学 | Single-stage video behavior detection method |
CN109558805A (en) * | 2018-11-06 | 2019-04-02 | 南京邮电大学 | Human bodys' response method based on multilayer depth characteristic |
CN109214375B (en) * | 2018-11-07 | 2020-11-24 | 浙江大学 | Embryo pregnancy result prediction device based on segmented sampling video characteristics |
CN109657546A (en) * | 2018-11-12 | 2019-04-19 | 平安科技(深圳)有限公司 | Video behavior recognition methods neural network based and terminal device |
CN110119760B (en) * | 2019-04-11 | 2021-08-10 | 华南理工大学 | Sequence classification method based on hierarchical multi-scale recurrent neural network |
CN110163127A (en) * | 2019-05-07 | 2019-08-23 | 国网江西省电力有限公司检修分公司 | A kind of video object Activity recognition method from thick to thin |
CN110222587A (en) * | 2019-05-13 | 2019-09-10 | 杭州电子科技大学 | A kind of commodity attribute detection recognition methods again based on characteristic pattern |
CN110222598B (en) * | 2019-05-21 | 2022-09-27 | 平安科技(深圳)有限公司 | Video behavior identification method and device, storage medium and server |
CN111460876B (en) | 2019-06-05 | 2021-05-25 | 北京京东尚科信息技术有限公司 | Method and apparatus for identifying video |
CN110321963B (en) * | 2019-07-09 | 2022-03-04 | 西安电子科技大学 | Hyperspectral image classification method based on fusion of multi-scale and multi-dimensional space spectrum features |
CN111242110B (en) * | 2020-04-28 | 2020-08-14 | 成都索贝数码科技股份有限公司 | Training method of self-adaptive conditional random field algorithm for automatically breaking news items |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8345984B2 (en) * | 2010-01-28 | 2013-01-01 | Nec Laboratories America, Inc. | 3D convolutional neural networks for automatic human action recognition |
-
2016
- 2016-01-25 CN CN201610047682.0A patent/CN105740773B/en not_active Expired - Fee Related
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101866429A (en) * | 2010-06-01 | 2010-10-20 | 中国科学院计算技术研究所 | Training method of multi-moving object action identification and multi-moving object action identification method |
CN103593464A (en) * | 2013-11-25 | 2014-02-19 | 华中科技大学 | Video fingerprint detecting and video sequence matching method and system based on visual features |
CN104299012A (en) * | 2014-10-28 | 2015-01-21 | 中国科学院自动化研究所 | Gait recognition method based on deep learning |
Non-Patent Citations (2)
Title |
---|
Action Recognition Based on A Bag of 3D Points; Wanqing Li et al.; 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops; 2010-12-31; pp. 9-14 |
A Survey of Human Action and Behavior Recognition Research; Li Ruifeng et al.; Pattern Recognition and Artificial Intelligence; 2014-01-31; Vol. 27, No. 1; pp. 33-46 |
Also Published As
Publication number | Publication date |
---|---|
CN105740773A (en) | 2016-07-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105740773B (en) | Activity recognition method based on deep learning and multi-scale information | |
Ahmed | The impact of filter size and number of filters on classification accuracy in CNN | |
Hou et al. | Identification of animal individuals using deep learning: A case study of giant panda | |
CN103988232B (en) | Motion manifold is used to improve images match | |
CN107527351A (en) | A kind of fusion FCN and Threshold segmentation milking sow image partition method | |
KR20160101973A (en) | System and method for identifying faces in unconstrained media | |
CN105574510A (en) | Gait identification method and device | |
CN105975932B (en) | Gait Recognition classification method based on time series shapelet | |
CN103324677B (en) | Hierarchical fast image global positioning system (GPS) position estimation method | |
CN107918772B (en) | Target tracking method based on compressed sensing theory and gcForest | |
Singh et al. | Nature and biologically inspired image segmentation techniques | |
CN112784763A (en) | Expression recognition method and system based on local and overall feature adaptive fusion | |
CN109508675A (en) | A kind of pedestrian detection method for complex scene | |
Shang et al. | Using lightweight deep learning algorithm for real-time detection of apple flowers in natural environments | |
Aydogdu et al. | Comparison of three different CNN architectures for age classification | |
CN110532874A (en) | A kind of generation method, storage medium and the electronic equipment of thingness identification model | |
CN109886153A (en) | A kind of real-time face detection method based on depth convolutional neural networks | |
Chalasani et al. | Egocentric gesture recognition for head-mounted ar devices | |
Sun et al. | An improved CNN-based apple appearance quality classification method with small samples | |
CN106845456A (en) | A kind of method of falling over of human body monitoring in video monitoring system | |
CN112861718A (en) | Lightweight feature fusion crowd counting method and system | |
CN110232331A (en) | A kind of method and system of online face cluster | |
Lin et al. | Bird posture recognition based on target keypoints estimation in dual-task convolutional neural networks | |
JP2019204505A (en) | Object detection deice, object detection method, and storage medium | |
CN113869276A (en) | Lie recognition method and system based on micro-expression |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | | Granted publication date: 20190201; Termination date: 20220125 |