CN105740773B - Activity recognition method based on deep learning and multi-scale information

Info

Publication number
CN105740773B
Authority
CN
China
Prior art keywords
video
segment
coarseness
video segment
depth
Prior art date
Legal status: Expired - Fee Related
Application number
CN201610047682.0A
Other languages
Chinese (zh)
Other versions
CN105740773A (en)
Inventor
刘智
冯欣
张杰
张杰慧
张凌
黄智勇
Current Assignee
Chongqing University of Technology
Original Assignee
Chongqing University of Technology
Priority date
Filing date
Publication date
Application filed by Chongqing University of Technology filed Critical Chongqing University of Technology
Priority to CN201610047682.0A priority Critical patent/CN105740773B/en
Publication of CN105740773A publication Critical patent/CN105740773A/en
Application granted granted Critical
Publication of CN105740773B publication Critical patent/CN105740773B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an activity recognition method based on deep learning and multi-scale information. Multiple deep networks are constructed to form a parallel structure for human activity recognition in depth video: the depth video is first split into multiple video segments, each segment is then learned by its own parallel branch neural network, the high-level representations learned by the branches are fused by concatenation, and the fused high-level representation is finally fed into fully connected layers and a classification layer for recognition. The deep learning approach performs activity recognition effectively, significantly improves the recognition rate when the behaviors differ substantially from one another, and runs in real time.

Description

Activity recognition method based on deep learning and multi-scale information
Technical field
The present invention relates to the field of human activity recognition, and more particularly to an activity recognition method based on deep learning and multi-scale information.
Background technique
With the maturation of hardware such as computers and cameras and the growing demands of social management, research on human activity recognition has drawn increasing attention from computer vision researchers and has been widely applied in automatic surveillance, event detection, human-machine interfaces, video retrieval, and many other fields. Traditional human activity recognition methods first extract features from each video describing human behavior, such as Histograms of Oriented Gradients (HOG) and Motion History Images (MHI), and then classify the extracted features with classifiers such as support vector machines or random forests. Research on human activity recognition based on such methods has achieved many excellent results, but some problems remain hard to solve: the hand-crafted features are task-specific and do not generalize easily to other data, and the computational cost is too large to achieve real-time performance.
Deep learning can automatically extract the multi-layer feature representations hidden in data, and deep learning based on convolutional neural networks has achieved great success in image classification, recognition, localization, and related areas. However, the convolution used in image processing is a two-dimensional operation and cannot be applied directly to the three-dimensional videos that describe human behavior.
Summary of the invention
In view of the above deficiencies of the prior art, the object of the present invention is to provide an activity recognition method based on deep learning and multi-scale information. The method performs activity recognition effectively using deep learning, significantly improves the recognition rate when the behaviors differ substantially, and generalizes well: it can be trained on a large data set and then applied to recognition tasks that lack training data. It also greatly reduces the time overhead of activity recognition and runs in real time.
The present invention takes depth video data as the research object, constructs a CNN-based deep neural network structure, fuses multi-scale information such as global human behavior information and local hand motion, and uses traditional two-dimensional CNNs to study three-dimensional human activity recognition.
The present invention studies human activity recognition in depth video by constructing multiple deep networks that form a parallel structure. The depth video is first split into multiple video segments; each segment is learned by its own parallel branch neural network; the high-level representations learned by the branches are then fused by concatenation, i.e., the output of each branch network is vectorized and the vectors are joined into a single one-dimensional vector that feeds the subsequent fully connected layers; finally, the fused high-level representation is fed into fully connected layers and a classification layer for recognition. In addition, because most behaviors in the MSRDailyActivity3D data set differ only subtly at the hands (for example read, write, use laptop, and play game), the invention proposes fusing multi-scale information such as coarse-grained global behavior information and fine-grained hand motion.
The object of the present invention is achieved as follows: an activity recognition method based on deep learning and multi-scale information, comprising the following steps:
(1) Establish a training data set; the coarse-grained global behavior videos in the training data set are selected from the MSRDailyActivity3D data set.
(2) Construct a deep neural network model with several parallel deep convolutional neural networks;
(3) Segment each coarse-grained global behavior video in the training data set with a set stride L_Stride, where the segment length is set to L_Seg; segmentation produces N_Seg coarse-grained video segment matrices, with N_Seg = 1 + (N_F - L_Seg)/L_Stride, where N_F is the number of frames of the coarse-grained global behavior video (see the sketch following these steps);
(4) Obtain a fine-grained local behavior video from the coarse-grained global behavior video of step (3), and segment it with the same method as step (3) to obtain N_Seg fine-grained video segment matrices; each frame of a fine-grained video segment matrix has the same size as each frame of a coarse-grained video segment matrix. The fine-grained local behavior sequence is cropped from each frame of the coarse-grained global behavior video to form the fine-grained local behavior video; the fine-grained local behavior may be hand motion or the detailed motion of another body part. The fine-grained video is obtained as follows (also illustrated in the sketch following these steps): centered on the left hand joint of each frame of the coarse-grained global behavior video, a window of size W/4 × H/4 is cropped, forming a new video of size N_F × W/4 × H/4; this is the fine-grained hand motion video, where W, H, and N_F are the width of the original depth video frames, their height, and the number of frames in the video, respectively. This size equals the size of the coarse-grained video after down-sampling.
(5) Feed the N_Seg coarse-grained video segment matrices from step (3) and the N_Seg fine-grained video segment matrices from step (4) in parallel into the deep neural network model constructed in step (2), which has 2N_Seg parallel deep convolutional neural networks, and train it;
(6) Select a coarse-grained global behavior video to be identified and apply steps (3) and (4) to obtain N_Seg coarse-grained video segment matrices and N_Seg fine-grained video segment matrices; feed them in parallel into the trained deep neural network model obtained in step (5) for activity recognition. The coarse-grained global behavior video to be identified is a preprocessed video.
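Steps (3) and (4) can be illustrated with a short sketch (illustrative Python/NumPy, not the patent's original code; the per-frame left-hand joint pixel coordinates are assumed to come from the accompanying skeleton data, and clamping the crop window at the frame border is an added assumption, since the text does not specify boundary handling):

    import numpy as np

    def segment_video(video, l_seg, l_stride):
        """Step (3): split a (n_f, H, W) depth video into overlapping segments.

        Returns an array of shape (n_seg, l_seg, H, W) with
        n_seg = 1 + (n_f - l_seg) // l_stride.
        """
        n_f = video.shape[0]
        n_seg = 1 + (n_f - l_seg) // l_stride
        return np.stack([video[i * l_stride : i * l_stride + l_seg]
                         for i in range(n_seg)])

    def crop_hand_video(video, left_hand_xy):
        """Step (4): crop a W/4 x H/4 window centered on the left hand joint."""
        n_f, H, W = video.shape
        h, w = H // 4, W // 4
        out = np.empty((n_f, h, w), dtype=video.dtype)
        for i, (x, y) in enumerate(left_hand_xy):
            top = int(min(max(y - h // 2, 0), H - h))    # clamp to the frame (assumption)
            left = int(min(max(x - w // 2, 0), W - w))
            out[i] = video[i, top:top + h, left:left + w]
        return out

    # Example with the normalized size used in the experiments: 192 x 128 x 128,
    # L_Seg = L_Stride = 16 -> 12 segments per stream
    video = np.zeros((192, 128, 128), dtype=np.float32)
    joints = np.full((192, 2), 64)                        # hypothetical joint track
    coarse = segment_video(video, 16, 16)                 # (12, 16, 128, 128)
    fine = segment_video(crop_hand_video(video, joints), 16, 16)  # (12, 16, 32, 32)
    # the coarse segments are further down-sampled to 32 x 32 before network input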
The deep neural network in step (2) uses convolutional neural networks as building blocks and has one classification layer, at least one convolutional layer, at least one pooling layer, and at least one fully connected layer. Each parallel deep convolutional neural network comprises, in sequence, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a third pooling layer, a first fully connected layer, a second fully connected layer, and a classification layer.
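One plausible rendering of a single branch with this layer sequence is sketched below (a PyTorch analogue, not the patent's original Torch code; treating the L_Seg frames of a segment as the input channels of a two-dimensional convolution, and the channel counts and kernel sizes, are assumptions, since Table 1's parameters are not reproduced here):

    import torch.nn as nn

    class BranchCNN(nn.Module):
        """One parallel branch: three conv/pool pairs, two fully connected layers."""
        def __init__(self, l_seg=16, num_classes=16):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(l_seg, 32, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            # for a 16 x 32 x 32 input segment the feature map ends up 128 x 4 x 4
            self.fc = nn.Sequential(
                nn.Flatten(),
                nn.Linear(128 * 4 * 4, 256), nn.ReLU(),
                nn.Linear(256, num_classes),  # classification layer (softmax in the loss)
            )

        def forward(self, x):  # x: (batch, l_seg, 32, 32)
            return self.fc(self.features(x))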
Each frame of the coarse-grained global behavior video in step (3) is down-sampled before segmentation. This serves two purposes: (1) it reduces the amount of computation; (2) it makes each frame of a coarse-grained video segment matrix the same size as each frame of a fine-grained video segment matrix, which is convenient for network input.
The coarse-grained global behavior video is a depth video.
The coarse-grained global behavior videos in the training data set are preprocessed videos, and the coarse-grained global behavior video to be identified is a preprocessed video. The preprocessing is as follows: first, interpolation is used to normalize all videos in the data set to a unified length, which is the median of all video lengths; second, the background is removed so that only the person-centered portion of the video is retained, and the video is resized to a fixed size; third, the x, y, and z coordinate values of all videos are normalized to the range [0, 1] with the min-max method; finally, every sample is horizontally flipped to form a new sample, doubling the number of training samples in the data set.
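A sketch of this preprocessing pipeline (illustrative Python/NumPy under stated assumptions: linear interpolation along the frame axis, min-max scaling over the whole video, and flipping along the x axis; the background-removal step depends on the capture setup and is omitted):

    import numpy as np

    def normalize_length(video, target_len):
        """Linearly interpolate a (n_f, H, W) video along time to target_len frames."""
        n_f = video.shape[0]
        src = np.linspace(0.0, n_f - 1.0, target_len)
        lo = np.floor(src).astype(int)
        hi = np.minimum(lo + 1, n_f - 1)
        w = (src - lo)[:, None, None]
        return (1.0 - w) * video[lo] + w * video[hi]

    def min_max_normalize(video):
        """Scale values into [0, 1] with the min-max method."""
        lo, hi = video.min(), video.max()
        return (video - lo) / (hi - lo + 1e-8)

    def flip_augment(videos):
        """Double the sample count by horizontal flipping (last axis is x)."""
        return np.concatenate([videos, videos[..., ::-1]], axis=0)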
An activity recognition method based on deep learning and multi-scale information, comprising the following steps:
(1) Establish a training data set; the depth videos in the training data set are selected from the MSRDailyActivity3D data set;
(2) Construct a deep neural network model with several parallel deep convolutional neural networks;
(3) Segment each behavior video in the training data set with a set stride L_Stride, where the segment length is set to L_Seg; segmentation produces N_Seg video segment matrices, with N_Seg = 1 + (N_F - L_Seg)/L_Stride, where N_F is the number of frames of the depth video;
(4) Feed the N_Seg video segment matrices from step (3) in parallel into the deep neural network model constructed in step (2), which has N_Seg parallel deep convolutional neural networks, and train it;
(5) Select a behavior video to be identified, apply step (3) to obtain N_Seg video segment matrices, and feed them in parallel into the trained deep neural network model for activity recognition. The behavior video to be identified is a preprocessed video.
The deep neural network in step (2) uses convolutional neural networks as building blocks and has one classification layer, at least one convolutional layer, at least one pooling layer, and at least one fully connected layer.
The behavior video is a depth video.
The behavior videos in the training data set are preprocessed videos, and the behavior video to be identified is a preprocessed video. The preprocessing is as follows: first, interpolation is used to normalize all videos in the data set to a unified length, which is the median of all video lengths; second, the background is removed so that only the person-centered portion of the video is retained, and the video is resized to a fixed size; third, the x, y, and z coordinate values of all videos are normalized to the range [0, 1] with the min-max method; finally, every sample is horizontally flipped to form a new sample, doubling the number of training samples in the data set.
The beneficial effects of the invention are as follows: the present invention obtains coarse-grained and fine-grained video matrices, uses them to train the designed parallel deep convolutional neural networks, and uses the trained deep neural network for behavior classification, so the invention generalizes well: it can be trained on a large data set and then applied to activity recognition tasks that lack training data.
The present invention designs a parallel deep convolutional neural network; feeding the behavior video in parallel greatly reduces the time overhead of activity recognition and gives good real-time performance.
The present invention takes depth video as the research object; depth video describes the geometry of objects and is insensitive to lighting and color.
Experimental results show that the CNN-based deep learning method proposed by the present invention can effectively recognize human behaviors represented by depth video. On the MSRDailyActivity3D data set, the average recognition rate for the five behaviors whose differences are most pronounced (lie down on sofa, walk, play guitar, stand up, and sit down) is 98%, and the recognition rate for all behaviors on the entire data set is 60.625%.
The present invention will be further explained below with reference to the attached drawings and specific embodiments.
Detailed description of the invention
Fig. 1 is the functional block diagram of the activity recognition method based on deep learning and multi-scale information of the present invention;
Fig. 2 shows behavior videos in MSRDailyActivity3D before preprocessing (top: drink, bottom: write);
Fig. 3 shows behavior videos in MSRDailyActivity3D after preprocessing (top: drink, bottom: write).
Specific embodiment
Embodiment one
Referring to Fig. 1, an activity recognition method based on deep learning and multi-scale information comprises the following steps:
(1) Establish a training data set. The coarse-grained global behavior videos in the training data set are selected from the MSRDailyActivity3D data set and are preprocessed videos; the coarse-grained global behavior video to be identified is also a preprocessed video. The preprocessing is as follows: first, interpolation is used to normalize all videos in the data set to a unified length, which is the median of all video lengths; second, the background is removed so that only the person-centered portion of the video is retained, and the video is resized to a fixed size; third, the x, y, and z coordinate values of all videos are normalized to the range [0, 1] with the min-max method; finally, every sample is horizontally flipped to form a new sample, doubling the number of training samples in the data set.
(2) Construct a deep neural network model with several parallel deep convolutional neural networks. The deep neural network in step (2) uses convolutional neural networks as building blocks and has one classification layer, at least one convolutional layer, at least one pooling layer, and at least one fully connected layer. The classification layer of the present invention uses a softmax classifier. Each parallel deep convolutional neural network of this embodiment comprises, in sequence, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a third pooling layer, a first fully connected layer, a second fully connected layer, and a classification layer.
(3) Segment each coarse-grained global behavior video in the training data set with a set stride L_Stride, where the segment length is set to L_Seg; segmentation produces N_Seg coarse-grained video segment matrices, with N_Seg = 1 + (N_F - L_Seg)/L_Stride, where N_F is the number of frames of the coarse-grained global behavior video. Each frame of the coarse-grained global behavior video is down-sampled before segmentation, which (1) reduces the amount of computation and (2) makes each frame of a coarse-grained video segment matrix the same size as each frame of a fine-grained video segment matrix, convenient for network input. The research object, i.e., the coarse-grained global behavior video, is a depth video.
(4) Obtain a fine-grained local behavior video from the coarse-grained global behavior video of step (3), and segment it with the same method as step (3) to obtain N_Seg fine-grained video segment matrices. Each frame of a fine-grained video segment matrix has the same size as each frame of a coarse-grained video segment matrix. The fine-grained local behavior sequence is cropped from each frame of the coarse-grained global behavior video to form the fine-grained local behavior video. The fine-grained local behavior may be hand motion or the detailed motion of another body part; it is determined by the specific application. The detail motion of this data set is concentrated mainly at the hands; if the detail motion were at another body part, the detail motion of that part could be chosen instead. In this embodiment, a window of a set size is cropped, centered on the hand joint of each frame of the coarse-grained global behavior video, forming a fine-grained local behavior video of N_F frames.
(5) Feed the N_Seg coarse-grained video segment matrices from step (3) and the N_Seg fine-grained video segment matrices from step (4) in parallel into the deep neural network model constructed in step (2), which has 2N_Seg parallel deep convolutional neural networks, and train it;
(6) Select a coarse-grained global behavior video to be identified and apply steps (3) and (4) to obtain N_Seg coarse-grained video segment matrices and N_Seg fine-grained video segment matrices; feed them in parallel into the trained deep neural network model for activity recognition. In this embodiment, the first N_Seg networks process the coarse-grained video and the last N_Seg networks process the fine-grained video.
Embodiment two
This embodiment discloses an activity recognition method based on deep learning and multi-scale information that uses only the coarse-grained global behavior information for activity recognition. It comprises the following steps:
(1) Establish a training data set. The depth videos in the training data set are selected from the MSRDailyActivity3D data set; the behavior videos in the training data set are preprocessed videos, and the behavior video to be identified is a preprocessed video. The preprocessing is as follows: first, interpolation is used to normalize all videos in the data set to a unified length, which is the median of all video lengths; second, the background is removed so that only the person-centered portion of the video is retained, and the video is resized to a fixed size; third, the x, y, and z coordinate values of all videos are normalized to the range [0, 1] with the min-max method; finally, every sample is horizontally flipped to form a new sample, doubling the number of training samples in the data set.
(2) Referring to Fig. 1, construct a deep neural network model with several parallel deep convolutional neural networks. The deep neural network in step (2) uses convolutional neural networks as building blocks and has one classification layer, at least one convolutional layer, at least one pooling layer, and at least one fully connected layer.
The classification layer of the present invention uses a softmax classifier.
(3) Segment each depth video in the training data set with a set stride L_Stride, where the segment length is set to L_Seg; segmentation produces N_Seg video segment matrices, with N_Seg = 1 + (N_F - L_Seg)/L_Stride, where N_F is the number of frames of the depth video;
(4) Feed the N_Seg video segment matrices from step (3) in parallel into the deep neural network model constructed in step (2), which has N_Seg parallel deep convolutional neural networks, and train it;
(5) Select a depth video to be identified, apply step (3) to obtain N_Seg video segment matrices, and feed them in parallel into the trained deep neural network model for activity recognition.
The experimental procedure of the present invention is as follows: assume that the size of a normalized video representing one behavior is N_F × W × H (192 × 128 × 128 in the present invention), where W and H are the width and height of a video frame, respectively.
(1) Segment the behavior video of N_F frames with stride L_Stride, where each segment has length L_Seg, so the number of segments is N_Seg = 1 + (N_F - L_Seg)/L_Stride; each video frame is then down-sampled to a quarter of its width and height, so segmentation produces a video segment matrix of size N_Seg × L_Seg × W/4 × H/4;
(2) Centered on the left hand joint of each frame of the depth video, crop a window of size W/4 × H/4 to form a new video of size N_F × W/4 × H/4; apply the same method as step (1) to the new video to obtain a video segment matrix of size N_Seg × L_Seg × W/4 × H/4;
(3) Merge the video segment matrices of steps (1) and (2) to obtain a video segment matrix of size 2N_Seg × L_Seg × W/4 × H/4. This matrix is the input of the deep network; that is, the network has 2N_Seg parallel deep convolutional neural networks, and the input of each deep neural network is a video of size L_Seg × W/4 × H/4 (see the sketch after this procedure);
(4) Train the parallel deep convolutional neural networks with the training data set, then test human activity recognition with the test data set; the training set and test set are completely disjoint. In the present invention, the behavior videos performed by subjects {1, 3, 5, 7, 9} are used for training and the behavior videos performed by subjects {2, 4, 6, 8, 10} are used for testing. The data set was performed by 10 subjects: the data of subjects 1, 3, 5, 7, and 9 are used for training, and the data of subjects 2, 4, 6, 8, and 10 are used for testing.
Assume L_Seg = 16 and L_Stride = 16; the deep neural network framework then needs 24 parallel networks, and the input of each network is a video segment sequence of size 16 × 32 × 32, i.e., each video segment contains 16 frames of video and each frame is 32 × 32.
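As a numerical check (a worked example based on the figures above): with N_F = 192 normalized frames, N_Seg = 1 + (192 - 16)/16 = 12 segments per stream, so the coarse-grained and fine-grained streams together give 2 × 12 = 24 parallel networks, and down-sampling the 128 × 128 frames to a quarter of their width and height yields the 32 × 32 input frames.

The resulting parallel structure, concatenation fusion, and classification described in steps (1)-(3) above can be sketched as follows (an illustrative PyTorch analogue that reuses the hypothetical BranchCNN convolutional stack from the earlier sketch; fusing by concatenating the vectorized branch outputs follows the description, while the fully connected layer widths are assumptions, since Table 1 is not reproduced here):

    import torch
    import torch.nn as nn

    class ParallelDeepNet(nn.Module):
        """2*n_seg parallel conv branches, fused by concatenation, then FC + classifier."""
        def __init__(self, n_seg=12, l_seg=16, num_classes=16):
            super().__init__()
            # one convolutional branch per coarse- and fine-grained video segment
            self.branches = nn.ModuleList(
                [BranchCNN(l_seg).features for _ in range(2 * n_seg)])
            self.classifier = nn.Sequential(
                nn.Linear(2 * n_seg * 128 * 4 * 4, 512), nn.ReLU(),
                nn.Linear(512, num_classes),
            )

        def forward(self, x):  # x: (batch, 2*n_seg, l_seg, 32, 32)
            # vectorize each branch output and join into one one-dimensional vector
            feats = [b(x[:, i]).flatten(1) for i, b in enumerate(self.branches)]
            return self.classifier(torch.cat(feats, dim=1))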
Table 1: The deep networks used by the present invention and their parameters
Experiments and discussion
1. Data set and preprocessing
The present invention uses the MSRDailyActivity3D data set, collected by Microsoft with a Kinect device. The data set contains 16 behaviors common in daily life: drink, eat, read book, call cellphone, write, use laptop, use vacuum cleaner, cheer up, sit still, toss paper, play game, lie down on sofa, walk, play guitar, stand up, and sit down. Each behavior is performed by each subject in two different ways: sitting on a sofa or standing. The entire data set contains 320 behavior videos. Fig. 2 shows some behavior samples from the data set. The data set records the human behavior and the surrounding environment simultaneously, the extracted depth information contains a large amount of noise, and most behaviors in the data set differ only locally and subtly, as shown in Fig. 2 and Fig. 3, so it is extremely challenging.
Before the experiments, simple preprocessing is applied to each video: first, interpolation is used to normalize all videos in the data set to a unified length, which is the median of all video lengths; second, the background is removed so that only the person-centered portion of the video is retained, and the video is resized to a fixed size, as shown in Fig. 3; third, the x, y, and z coordinate values of all videos are normalized to the range [0, 1] with the min-max method; finally, every sample is horizontally flipped to form a new sample, doubling the number of training samples in the data set. The experiments of the present invention are written on the Torch platform [20]; the learning rate is 1 × 10^-4 and the loss function is the softmax function provided by the platform.
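A rough PyTorch equivalent of this training setup (the original experiments used the Lua Torch platform; the learning rate of 1 × 10^-4 and the softmax/cross-entropy loss follow the text, while the SGD optimizer and the data loader are assumptions):

    import torch
    import torch.nn as nn

    # ParallelDeepNet is the hypothetical model sketched earlier
    model = ParallelDeepNet(n_seg=12, l_seg=16, num_classes=16)
    criterion = nn.CrossEntropyLoss()                          # softmax loss
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)   # optimizer assumed

    def train_epoch(loader):
        model.train()
        for inputs, labels in loader:   # inputs: (batch, 24, 16, 32, 32)
            optimizer.zero_grad()
            loss = criterion(model(inputs), labels)
            loss.backward()
            optimizer.step()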
2. HAR based on multi-scale information fusion and deep learning
The present invention uses the 2CNN2F network in Table 1 and takes multi-scale information, namely the coarse-grained global behavior video and the fine-grained hand action sequence, as the input of the deep network. In the experiments of this section, the stride L_Stride and the segment length L_Seg are both set to 16; that is, a 12 × 16 × 32 × 32 global behavior sequence and a 12 × 16 × 32 × 32 local hand action sequence are extracted from the whole video and merged to form a 24 × 16 × 32 × 32 input video matrix. Table 2 compares the recognition performance of the proposed method and other methods on the MSRDailyActivity3D data set, where 2CNN2F uses only the coarse-grained global behavior information and 2CNN2F+Joint denotes the multi-scale information fusion method of the present invention. The table shows that the activity recognition accuracy of the present method is 60.625%; using only the coarse-grained global behavior information, the recognition rate drops slightly to 56.875%, which is comparable to methods based on traditional hand-crafted feature extraction. Notably, if only the 11th-16th behaviors (play game, lie down on sofa, walk, play guitar, stand up, and sit down) are recognized, the recognition rate reaches 98%. This may be because the 11th-16th behaviors differ considerably from one another, while the differences between many other behaviors in the data set are very subtle; for example, read, write, and use laptop differ only slightly in hand motion. The experimental results show that the deep learning method performs activity recognition effectively and significantly improves the recognition rate when the behaviors differ substantially.
Table 2: Recognition performance of the proposed method and other methods on the MSRDailyActivity3D data set

Algorithm                         Recognition rate
LOP features [8]                  42.5%
Joint Position features [8]       68%
Dynamic Temporal Warping [21]     54%
2CNN2F                            56.875%
2CNN2F+Joint                      60.625%
3. Influence of network depth on recognition
The present invention also constructs neural networks containing 3 and 4 CNN layers, i.e., 3CNN2F_8 and 4CNN2F (see Table 3), to probe the influence of network depth on recognition. The network parameters are shown in Table 1. Because the network depth increases, to keep the network from overfitting, this experiment uses 24 × 8 × 128 × 128 video sequences as the neural network input; that is, the normalized 192 × 128 × 128 videos are split with stride 8 into 24 video segments of size 8 × 128 × 128, which are fed in parallel into the neural network with 24 parallel branches. As shown in Table 3, the recognition rate of the 3CNN2F_8 network is 52.5% and that of 4CNN2F is 58.75%. The experimental results show that increasing the network depth can effectively improve the behavior recognition rate.
Table 3: Parameter configuration and recognition rates of the different networks
4. Influence of the splitting stride on recognition
To examine the influence of the splitting stride on recognition, the present invention constructs two networks of the 3CNN2F type with different inputs, 3CNN2F_8 and 3CNN2F_4. The input of 3CNN2F_8 is a 24 × 8 × 128 × 128 video sequence, while the input of 3CNN2F_4 has size 47 × 8 × 128 × 128; that is, the normalized 192 × 128 × 128 video is split with stride 4 into 47 video segments of size 8 × 128 × 128, with a 4-frame overlap between adjacent segments. The experimental results are shown in Table 3. With stride 8 the recognition accuracy is 52.5%, and with stride 4 it is 56.875%. The recognition rate improves effectively, mainly because reducing the stride causes two changes: on one hand, a smaller stride yields more video segments, so the deep network needs more parallel branches, becomes wider, and has more parameters, giving it better generalization ability; on the other hand, reducing the stride increases the number of video segments, so the amount of training data also increases and the network trains better.
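As a check of the segment count (a worked example): with N_F = 192, L_Seg = 8, and L_Stride = 8, N_Seg = 1 + (192 - 8)/8 = 24 segments; reducing the stride to 4 gives N_Seg = 1 + (192 - 8)/4 = 47 segments, matching the two input sizes above.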
Considering that depth video describes the geometry of objects and is insensitive to lighting and color, the present invention takes depth video as the research object, constructs a deep neural network model from traditional two-dimensional CNNs (convolutional neural networks), and classifies the behaviors in the MSRDailyActivity3D data set. The experimental results show that the proposed CNN-based deep learning method can effectively recognize human behaviors represented by depth video: on the MSRDailyActivity3D data set, the average recognition rate for the five most distinct behaviors (lie down on sofa, walk, play guitar, stand up, and sit down) is 98%, and the recognition rate for all behaviors on the entire data set is 60.625%. The present invention also carried out some exploratory experiments on how to improve the recognition rate of deep learning. The research finds that reducing the video splitting stride, fusing coarse-grained and fine-grained video information, and appropriately increasing the network depth can effectively improve the recognition rate of the deep network.
The present invention is not limited to the above embodiments; technical solutions obtained by minor modifications that do not depart from the spirit of the technical solution of the present invention shall fall within the protection scope of the present invention.

Claims (9)

1. An activity recognition method based on deep learning and multi-scale information, characterized by comprising the following steps:
(1) establishing a training data set;
(2) constructing a deep neural network model with several parallel deep convolutional neural networks;
(3) selecting the coarse-grained global behavior videos in the training data set and segmenting them with a set stride L_Stride, where each segment length is set to L_Seg, so that segmentation produces N_Seg coarse-grained video segment matrices, with N_Seg = 1 + (N_F - L_Seg)/L_Stride, where N_F is the number of frames of the coarse-grained global behavior video;
(4) obtaining a fine-grained local behavior video from the coarse-grained global behavior video of step (3) and segmenting it with the same method as step (3) to obtain N_Seg fine-grained video segment matrices;
(5) feeding the N_Seg coarse-grained video segment matrices from step (3) and the N_Seg fine-grained video segment matrices from step (4) in parallel into the deep neural network model constructed in step (2), which has 2N_Seg parallel deep convolutional neural networks, and training it;
(6) selecting a coarse-grained global behavior video to be identified, applying steps (3) and (4) to obtain N_Seg coarse-grained video segment matrices and N_Seg fine-grained video segment matrices, and feeding them in parallel into the trained deep neural network model obtained in step (5) for activity recognition.
2. The activity recognition method based on deep learning and multi-scale information according to claim 1, characterized in that: the deep neural network model in step (2) uses convolutional neural networks as building blocks and has a classification layer, at least one convolutional layer, at least one pooling layer, and at least one fully connected layer.
3. The activity recognition method based on deep learning and multi-scale information according to claim 1, characterized in that: each frame of the coarse-grained global behavior video in step (3) is down-sampled before segmentation, so that each frame of a coarse-grained video segment matrix has the same size as each frame of a fine-grained video segment matrix.
4. The activity recognition method based on deep learning and multi-scale information according to claim 1, characterized in that: the coarse-grained global behavior video is a depth video.
5. The activity recognition method based on deep learning and multi-scale information according to claim 1 or 4, characterized in that: the coarse-grained global behavior videos in the training data set are preprocessed videos, and the coarse-grained global behavior video to be identified is a preprocessed video.
6. The activity recognition method based on deep learning and multi-scale information according to claim 1, characterized in that: the fine-grained local behavior sequence is cropped from each frame of the coarse-grained global behavior video to form the fine-grained local behavior video.
7. An activity recognition method based on deep learning and coarse-grained global behavior information, characterized by comprising the following steps:
(1) establishing a training data set;
(2) constructing a deep neural network model with several parallel deep convolutional neural networks;
(3) selecting the coarse-grained global behavior videos in the training data set and segmenting them with a set stride L_Stride, where each segment length is set to L_Seg, so that segmentation produces N_Seg video segment matrices, with N_Seg = 1 + (N_F - L_Seg)/L_Stride, where N_F is the number of frames of the depth video; the behavior video is a depth video;
(4) feeding the N_Seg video segment matrices from step (3) in parallel into the deep neural network model constructed in step (2), which has N_Seg parallel deep convolutional neural networks, and training it;
(5) selecting a coarse-grained global behavior video to be identified, applying step (3) to obtain N_Seg video segment matrices, and feeding them in parallel into the trained deep neural network model for activity recognition.
8. The activity recognition method based on deep learning and coarse-grained global behavior information according to claim 7, characterized in that: the deep neural network in step (2) uses convolutional neural networks as building blocks and has a classification layer, at least one convolutional layer, at least one pooling layer, and at least one fully connected layer.
9. The activity recognition method based on deep learning and coarse-grained global behavior information according to claim 7, characterized in that: the coarse-grained global behavior videos in the training data set are preprocessed videos, and the coarse-grained global behavior video to be identified is a preprocessed video.
CN201610047682.0A 2016-01-25 2016-01-25 Activity recognition method based on deep learning and multi-scale information Expired - Fee Related CN105740773B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610047682.0A CN105740773B (en) 2016-01-25 2016-01-25 Activity recognition method based on deep learning and multi-scale information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610047682.0A CN105740773B (en) 2016-01-25 2016-01-25 Activity recognition method based on deep learning and multi-scale information

Publications (2)

Publication Number Publication Date
CN105740773A CN105740773A (en) 2016-07-06
CN105740773B true CN105740773B (en) 2019-02-01

Family

ID=56247501

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610047682.0A Expired - Fee Related CN105740773B (en) 2016-01-25 2016-01-25 Activity recognition method based on deep learning and multi-scale information

Country Status (1)

Country Link
CN (1) CN105740773B (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106203503B (en) * 2016-07-08 2019-04-05 天津大学 A kind of action identification method based on bone sequence
CN106599789B (en) * 2016-07-29 2019-10-11 北京市商汤科技开发有限公司 The recognition methods of video classification and device, data processing equipment and electronic equipment
CN106228240B (en) * 2016-07-30 2020-09-01 复旦大学 Deep convolution neural network implementation method based on FPGA
CN106504266B (en) * 2016-09-29 2019-06-14 北京市商汤科技开发有限公司 The prediction technique and device of walking behavior, data processing equipment and electronic equipment
CN106778576B (en) * 2016-12-06 2020-05-26 中山大学 Motion recognition method based on SEHM characteristic diagram sequence
CN106951872B (en) * 2017-03-24 2020-11-06 江苏大学 Pedestrian re-identification method based on unsupervised depth model and hierarchical attributes
CN107066979A (en) * 2017-04-18 2017-08-18 重庆邮电大学 A kind of human motion recognition method based on depth information and various dimensions convolutional neural networks
CN107886117A (en) * 2017-10-30 2018-04-06 国家新闻出版广电总局广播科学研究院 The algorithm of target detection merged based on multi-feature extraction and multitask
CN107837087A (en) * 2017-12-08 2018-03-27 兰州理工大学 A kind of human motion state recognition methods based on smart mobile phone
CN108038107B (en) * 2017-12-22 2021-06-25 东软集团股份有限公司 Sentence emotion classification method, device and equipment based on convolutional neural network
CN108182441B (en) * 2017-12-29 2020-09-18 华中科技大学 Parallel multichannel convolutional neural network, construction method and image feature extraction method
CN108280406A (en) * 2017-12-30 2018-07-13 广州海昇计算机科技有限公司 A kind of Activity recognition method, system and device based on segmentation double-stream digestion
CN108182416A (en) * 2017-12-30 2018-06-19 广州海昇计算机科技有限公司 A kind of Human bodys' response method, system and device under monitoring unmanned scene
CN108524209A (en) * 2018-03-30 2018-09-14 江西科技师范大学 Blind-guiding method, system, readable storage medium storing program for executing and mobile terminal
CN108664931B (en) * 2018-05-11 2022-03-01 中国科学技术大学 Multi-stage video motion detection method
CN108805083B (en) * 2018-06-13 2022-03-01 中国科学技术大学 Single-stage video behavior detection method
CN109558805A (en) * 2018-11-06 2019-04-02 南京邮电大学 Human bodys' response method based on multilayer depth characteristic
CN109214375B (en) * 2018-11-07 2020-11-24 浙江大学 Embryo pregnancy result prediction device based on segmented sampling video characteristics
CN109657546A (en) * 2018-11-12 2019-04-19 平安科技(深圳)有限公司 Video behavior recognition methods neural network based and terminal device
CN110119760B (en) * 2019-04-11 2021-08-10 华南理工大学 Sequence classification method based on hierarchical multi-scale recurrent neural network
CN110163127A (en) * 2019-05-07 2019-08-23 国网江西省电力有限公司检修分公司 A kind of video object Activity recognition method from thick to thin
CN110222587A (en) * 2019-05-13 2019-09-10 杭州电子科技大学 A kind of commodity attribute detection recognition methods again based on characteristic pattern
CN110222598B (en) * 2019-05-21 2022-09-27 平安科技(深圳)有限公司 Video behavior identification method and device, storage medium and server
CN111460876B (en) 2019-06-05 2021-05-25 北京京东尚科信息技术有限公司 Method and apparatus for identifying video
CN110321963B (en) * 2019-07-09 2022-03-04 西安电子科技大学 Hyperspectral image classification method based on fusion of multi-scale and multi-dimensional space spectrum features
CN111242110B (en) * 2020-04-28 2020-08-14 成都索贝数码科技股份有限公司 Training method of self-adaptive conditional random field algorithm for automatically breaking news items

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101866429A (en) * 2010-06-01 2010-10-20 中国科学院计算技术研究所 Training method of multi-moving object action identification and multi-moving object action identification method
CN103593464A (en) * 2013-11-25 2014-02-19 华中科技大学 Video fingerprint detecting and video sequence matching method and system based on visual features
CN104299012A (en) * 2014-10-28 2015-01-21 中国科学院自动化研究所 Gait recognition method based on deep learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8345984B2 (en) * 2010-01-28 2013-01-01 Nec Laboratories America, Inc. 3D convolutional neural networks for automatic human action recognition

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101866429A (en) * 2010-06-01 2010-10-20 中国科学院计算技术研究所 Training method of multi-moving object action identification and multi-moving object action identification method
CN103593464A (en) * 2013-11-25 2014-02-19 华中科技大学 Video fingerprint detecting and video sequence matching method and system based on visual features
CN104299012A (en) * 2014-10-28 2015-01-21 中国科学院自动化研究所 Gait recognition method based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Action Recognition Based on A Bag of 3D Points; Wanqing Li et al.; 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops; 2010; pp. 9-14
A Survey of Human Action and Behavior Recognition (人体动作行为识别研究综述); Li Ruifeng (李瑞峰) et al.; Pattern Recognition and Artificial Intelligence (模式识别与人工智能); January 2014; Vol. 27, No. 1; pp. 33-46

Also Published As

Publication number Publication date
CN105740773A (en) 2016-07-06

Similar Documents

Publication Publication Date Title
CN105740773B (en) Activity recognition method based on deep learning and multi-scale information
Ahmed The impact of filter size and number of filters on classification accuracy in CNN
Hou et al. Identification of animal individuals using deep learning: A case study of giant panda
CN103988232B (en) Motion manifold is used to improve images match
CN107527351A (en) A kind of fusion FCN and Threshold segmentation milking sow image partition method
KR20160101973A (en) System and method for identifying faces in unconstrained media
CN105574510A (en) Gait identification method and device
CN105975932B (en) Gait Recognition classification method based on time series shapelet
CN103324677B (en) Hierarchical fast image global positioning system (GPS) position estimation method
CN107918772B (en) Target tracking method based on compressed sensing theory and gcForest
Singh et al. Nature and biologically inspired image segmentation techniques
CN112784763A (en) Expression recognition method and system based on local and overall feature adaptive fusion
CN109508675A (en) A kind of pedestrian detection method for complex scene
Shang et al. Using lightweight deep learning algorithm for real-time detection of apple flowers in natural environments
Aydogdu et al. Comparison of three different CNN architectures for age classification
CN110532874A (en) A kind of generation method, storage medium and the electronic equipment of thingness identification model
CN109886153A (en) A kind of real-time face detection method based on depth convolutional neural networks
Chalasani et al. Egocentric gesture recognition for head-mounted ar devices
Sun et al. An improved CNN-based apple appearance quality classification method with small samples
CN106845456A (en) A kind of method of falling over of human body monitoring in video monitoring system
CN112861718A (en) Lightweight feature fusion crowd counting method and system
CN110232331A (en) A kind of method and system of online face cluster
Lin et al. Bird posture recognition based on target keypoints estimation in dual-task convolutional neural networks
JP2019204505A (en) Object detection deice, object detection method, and storage medium
CN113869276A (en) Lie recognition method and system based on micro-expression

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190201

Termination date: 20220125

CF01 Termination of patent right due to non-payment of annual fee