CN110222556A - Human action recognition system and method - Google Patents

Human action recognition system and method

Info

Publication number
CN110222556A
CN110222556A (application CN201910324097.4A)
Authority
CN
China
Prior art keywords
network
residual
image
video image
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910324097.4A
Other languages
Chinese (zh)
Inventor
叶青
钟浩鑫
张永梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China University of Technology
Original Assignee
North China University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China University of Technology filed Critical North China University of Technology
Priority to CN201910324097.4A priority Critical patent/CN110222556A/en
Publication of CN110222556A publication Critical patent/CN110222556A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a human action recognition system and method. The system comprises a video image acquisition module, an image standardization module, a deep residual bidirectional multi-memory neural network module, and a classifier module. The video image acquisition module samples the raw video images to obtain sampled video images, which are input to the image standardization module for normalization, yielding standardized video images. The standardized video images are input to the deep residual bidirectional multi-memory neural network module for processing; this module comprises a pruned network unit and a deep residual bidirectional network unit. The classifier module performs classification based on the output of the deep residual bidirectional multi-memory neural network module. The technical solution of the present invention effectively addresses the vanishing-gradient problem, enhances the coupling ability of features, markedly accelerates training, and improves both recognition accuracy and computation speed.

Description

Human action recognition system and method
Technical field
The present invention relates to the field of computer vision, and in particular to a neural-network-based visual recognition system and method for human actions.
Background art
With the continuous development of the information society, computer vision has attracted growing attention from many fields. Within computer vision, recognizing human actions in video is a major research direction: identifying the human actions in a video can bring great convenience to fields such as intelligent video surveillance, virtual reality, video security, and smart homes. Computer vision research is mainly concerned with having computers perform tasks such as recognition, detection, and processing of multimedia information in place of human labor, and human action recognition in video is an important direction within this field, playing a significant role in smart homes, security, virtual reality, and related areas. Security is taken here as an example. Its scope is very wide, ranging from accident early warning within a city to border-protection early warning. Applying video-based human action recognition to this field makes it possible to analyze the specifics of an incident and to assess and give early warning of suspicious behavior at a border, thereby greatly improving the efficiency of security work and reducing the consumption of manpower and material resources. Moreover, a computer needs no rest, does not tire with long working hours, and can monitor comprehensively without blind spots to ensure safety. The technology thus brings great convenience to people's lives and has broad application prospects.
Traditional human action recognition methods first perform moving-object detection, then feature extraction, and finally classify the features to obtain a recognition result. Common feature types in conventional action recognition include static features, dynamic features, spatiotemporal features, and descriptive features. Common action recognition methods fall into three classes: template-based methods, probabilistic-statistical methods, and grammar-based methods. To further improve the recognition rate, and with the rise of deep learning, methods that automatically extract features through deep network learning have been applied to human action recognition. However, in existing deep-learning-based human action recognition methods, traditional network architectures such as AlexNet and VGG obtain better training results by increasing the depth (number of layers) of the network, and the added layers bring many negative effects, such as overfitting, vanishing gradients, and exploding gradients.
Summary of the invention
In view of the deficiencies of the prior art, this patent proposes a human action recognition method based on a deep residual bidirectional multi-memory neural network. The network forms its front-end module by removing unneeded network units; the resulting feature matrix is then fed as input into a deep bidirectional network modified with a residual structure for training, and finally a classifier produces the human action recognition result. The network uses an "end-to-end" training technique and jointly optimizes the parameters of the neural network and the deep residual network, which ensures the expressive power of the new neural network features on new datasets and improves the generalization performance of the whole system. Specifically, the present invention provides the following technical solutions:
In one aspect, the present invention provides a human action recognition system, the system comprising: a video image acquisition module, an image standardization module, a deep residual bidirectional multi-memory neural network module, and a classifier module;
the video image acquisition module samples the raw video images to obtain sampled video images, which are input to the image standardization module for normalization, yielding standardized video images;
the standardized video images are input to the deep residual bidirectional multi-memory neural network module for processing;
the deep residual bidirectional multi-memory neural network module comprises a pruned network unit and a deep residual bidirectional network unit;
the pruned network unit processes a convolutional neural network by deletion to obtain the pruned neural network, the pruning following the principle that:
if |w| < α, the corresponding weight is removed, where w is a recurrent weight and α is a sensitivity parameter;
the deep residual bidirectional network unit is implemented by combining a deep residual network with a bidirectional LSTM network, reduces the interdependence between parameters by means of shortcut connections, and maps its input to a superposition layer where it is added to the output of the convolutional layer;
the pruned neural network and the bidirectional LSTM network are trained jointly;
the classifier module performs classification based on the output of the deep residual bidirectional multi-memory neural network module.
Preferably, the video acquisition module samples the raw video images in the following manner:
frames are extracted from a video segment at a fixed interval; the interval, in frames, is T = M/N, rounded down to an integer,
where M is the number of frames in the original video and N is the number of frames in the sampled video.
Preferably, the standardization is realized by processing the contrast of each frame, the contrast of the processed image being C = sqrt( (1/(3rc)) · Σ_{i=1..r} Σ_{j=1..c} Σ_{k=1..3} (X_{i,j,k} − X̄)² ),
where X̄ is the average gray level of the entire image, satisfying X̄ = (1/(3rc)) · Σ_{i=1..r} Σ_{j=1..c} Σ_{k=1..3} X_{i,j,k}, and the image size is r × c.
Preferably, with the parameter sets of the convolutional neural network and the LSTM network denoted (θ₁, θ₂), the system loss function is defined in terms of the following quantities:
X is the abstract feature or hierarchical representation of the input signal, W is the convolution kernel, b is the bias, M is the number of stages of the whole network, N is the number of input features, L is the loss function, t is the network layer index, k is the feature-map channel, TR denotes training, TE denotes testing, and R(W) and R(θ) are regularization terms.
Preferably, the classifier module is: p(y = j | x; θ) = exp(θ_j · x) / Σ_{l=1..k} exp(θ_l · x), j = 1, 2, ..., k,
where {(x⁽¹⁾, y⁽¹⁾), ..., (x⁽ᵐ⁾, y⁽ᵐ⁾)} is the training set, y⁽ⁱ⁾ ∈ {1, 2, 3, ..., k} with k classes in total, p(y = j | x) is the probability that input x belongs to each class j = 1, 2, ..., k, and θ₁, θ₂, ..., θ_k are the parameters of the model.
In addition, the present invention provides a human action recognition method, the method comprising:
S1: sampling the input raw video images to obtain sampled video images, and standardizing the sampled video images to obtain standardized video images;
S2: extracting image features through a pruned convolutional neural network combined with a deep residual bidirectional LSTM network;
S3: classifying based on the image features to obtain the recognition result.
Preferably, in S1, the sampling is performed in the following manner:
frames are extracted from a video segment at a fixed interval; the interval, in frames, is T = M/N, rounded down to an integer,
where M is the number of frames in the original video and N is the number of frames in the sampled video.
Preferably, in S2, the pruned neural network is pruned according to the following principle:
if |w| < α, the corresponding weight is removed, where w is a recurrent weight and α is a sensitivity parameter.
Preferably, in S2, with the parameter sets of the pruned convolutional neural network combined with the deep residual bidirectional LSTM network denoted (θ₁, θ₂), the system loss function is defined in terms of the following quantities:
TR denotes training, TE denotes testing, R(W) and R(θ) are regularization terms, X is the abstract feature or hierarchical representation of the input signal, W is the convolution kernel, b is the bias, M is the number of stages of the whole network, N is the number of input features, L is the loss function, t is the network layer index, and k is the feature-map channel.
Preferably, during network training, the convolutional neural network and the deep residual bidirectional LSTM network are trained jointly.
Compared with the prior art, the present invention has the following advantages:
1. The present invention incorporates the residual-module idea and effectively solves the vanishing-gradient problem caused by an overly deep network structure. Because the residual network enlarges the range of features to choose from, it enhances the coupling ability of features; this structure greatly accelerates training while also improving performance, and it resolves the poor recognition results that arise when an overly deep network causes gradients to vanish.
2. Compared with traditional human action recognition methods, the method of the present invention improves recognition accuracy, and the pruning technique improves computation speed.
Brief description of the drawings
Fig. 1 is the human action recognition flowchart of the embodiment of the present invention;
Fig. 2 is the adjusted GoogLeNet network structure of the embodiment of the present invention;
Fig. 3 is the histogram of 102,400 connection weights in a Conv layer of the network model of the embodiment of the present invention;
Fig. 4 is the single-layer structure diagram of the bidirectional multi-memory neural network of the embodiment of the present invention;
Fig. 5 is the residual network structure of the embodiment of the present invention.
Specific embodiments
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the figures in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Embodiment 1
In a specific embodiment, the technical solution provided by the present invention can be realized as a human action recognition system. Preferably, as shown in Fig. 1, the system includes a video image sampling module, an image standardization module, a deep residual bidirectional multi-memory neural network module, and a classifier module; through this system structure, the input human action video is recognized and the action result is output. Specifically, this can be accomplished as follows:
One, video image sampling module
For a convolutional neural network, video images are fed into the network to extract features, so the video sampling method directly affects the generalization ability of the trained data model. For a complete action video, if every frame participates in image-feature extraction, consecutive frames may carry a large amount of redundant information, and the many near-identical images also add to the network's burden. In addition, since different action videos do not have the same number of frames, reasonable down-sampling of the video is necessary.
To retain more of the effective information in the human action while removing unnecessary redundancy, video sampling in this embodiment is carried out as follows:
For the entire video, frames are extracted at a fixed interval, and the interval must take the total frame counts of different videos into account. Suppose an action video has M frames and down-sampling is to produce a short action video of N frames (N < M); a frame can then be taken every T frames to form the final video. The choice of T depends on the sizes of M and N; under normal conditions, T = M/N, rounded down to an integer.
By the above method, the input videos are sampled to obtain the sampled videos.
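As an illustration of the sampling rule above, the following minimal Python sketch (not part of the original disclosure) keeps one frame every T = M/N frames; the use of OpenCV and the function name are assumptions:

```python
import cv2  # assumed dependency; any frame reader would do

def downsample_video(path, n_frames):
    """Keep one frame every T = M // N frames, as described above."""
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))  # M, frames in the original video
    interval = max(total // n_frames, 1)            # T = M / N, rounded down
    frames = []
    idx = 0
    while len(frames) < n_frames:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % interval == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames
```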
Two, image standardization module
The images of different videos vary in size, resolution, and illumination, and these variations adversely affect video image classification. Therefore, before feature extraction, the video image data needs preprocessing. In traditional feature-extraction pipelines, preprocessing includes image denoising, contrast enhancement, and so on; preprocessing is just as essential when a neural network performs image recognition. Images should be standardized so that their pixels all lie in the same reasonable range, such as [0, 1] or [−1, 1]. Mixing images in [0, 1] with images in [0, 255] will usually lead to failure. Converting images to the same scale is, strictly speaking, the only universally necessary preprocessing step. Many computer vision frameworks require images of a standard size, so images must be cropped or scaled to fit; strictly speaking, however, even this rescaling is not always necessary. Some convolutional models accept variable-sized inputs and dynamically adjust their pooling region sizes to keep the output size constant; other convolutional models have variable-sized outputs that scale automatically with the input, such as models that denoise or label each pixel in an image.
Image preprocessing in this embodiment is realized as follows. In deep learning, contrast usually refers to the standard deviation of the pixels in an image or image region. Suppose we have an image represented as a tensor X ∈ R^(r×c×3), where X_{i,j,1} is the red intensity at row i, column j, X_{i,j,2} is the green intensity, and X_{i,j,3} is the blue intensity. The contrast of the whole image can then be expressed as C = sqrt( (1/(3rc)) · Σ_{i=1..r} Σ_{j=1..c} Σ_{k=1..3} (X_{i,j,k} − X̄)² ),
where X̄ is the average intensity of the entire image, the image size is r × c, and X̄ satisfies X̄ = (1/(3rc)) · Σ_{i=1..r} Σ_{j=1..c} Σ_{k=1..3} X_{i,j,k}.
The standardization proposed in this embodiment subtracts the mean from each image and then rescales it so that the standard deviation of its pixels equals some constant s, thereby fixing the image's contrast.
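A minimal NumPy sketch of this standardization, under the definition of contrast given above, could look as follows; the function name and the small epsilon guard against all-constant images are assumptions:

```python
import numpy as np

def global_contrast_normalize(image, s=1.0, eps=1e-8):
    """Subtract the image mean, then rescale so that the pixel standard
    deviation (the contrast defined above) equals the constant s."""
    x = image.astype(np.float64)
    x -= x.mean()                          # zero-mean image
    contrast = np.sqrt((x ** 2).mean())    # std over all r*c*3 values
    return s * x / max(contrast, eps)
```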
Three, the deep residual bidirectional multi-memory neural network module
1. Pruned-network feature extraction module
1.1 In this embodiment, this module can be implemented with an adjusted GoogLeNet network.
This embodiment takes full advantage of the fact that a GoogLeNet network converges faster than a traditional convolutional network, reducing training and detection time without affecting the final recognition rate. From the network structure, the size of the original input image is first fixed as a 224x224x3 RGB image. The first convolutional layer uses a fixed structure with pad = 3, a 7x7 kernel, and stride 2, producing 64-dimensional features after convolution, so this layer outputs 112x112x64. The convolution output feeds directly into an activation layer, again using the common ReLU function, followed by a pooling layer with a 3x3 kernel using max pooling; the pooled feature map is 56x56x64. A norm layer is then added. The second convolution uses a 3x3 kernel with stride 1, giving a 56x56x192 feature map, again followed by an activation layer and a pooling layer, outputting 28x28x192. Then 1x1, 3x3, and 5x5 kernels are used in 4 branches, as shown in Fig. 2.
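To make the shape bookkeeping concrete, here is a sketch of the stem just described; it reproduces the stated kernel sizes, strides, and feature-map sizes, while the choice of PyTorch and the local-response-norm setting are assumptions:

```python
import torch
import torch.nn as nn

# Sketch of the adjusted GoogLeNet stem described above (shapes in comments).
stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),    # 224x224x3 -> 112x112x64
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),        # -> 56x56x64
    nn.LocalResponseNorm(size=5),                            # the "norm" layer
    nn.Conv2d(64, 192, kernel_size=3, stride=1, padding=1),  # -> 56x56x192
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),        # -> 28x28x192
)

x = torch.randn(1, 3, 224, 224)
print(stem(x).shape)  # torch.Size([1, 192, 28, 28])
```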
The mathematical model of the neural network is constructed as follows.
(1) The input-output relation is: Y = f(W ∗ X + b),
where X is the abstract feature or hierarchical representation of the input signal, W is the convolution kernel, b is the bias, and f is the activation function.
(2) The classifier parameters are as follows:
The final output class is: y = argmax{Y(k)} (6)
(3) On the training set, the objective function is constructed using cross-entropy,
where TR denotes training, TE denotes testing, and R(W) and R(θ) are regularization terms that sparsify the parameters and prevent overfitting.
(4) Let the number of stages of the whole network be M (in one preferred embodiment, M = 3), and let the parameter sets of the GoogLeNet network and the LSTM network be (θ₁, θ₂); the system loss function is then defined in terms of the following quantities:
X is the abstract feature or hierarchical representation of the input signal, W is the convolution kernel, b is the bias, M is the number of stages of the whole network, N is the number of input features, L is the loss function, t is the network layer index, and k is the feature-map channel.
On this basis, pruning is used to improve the network's operating efficiency.
1.2 Pruning method
In the Optimal Brain Damage (OBD) algorithm, the saliency of a parameter is defined as the change in the objective function caused by deleting that parameter. Directly evaluating each parameter's saliency from this definition, i.e., temporarily deleting each parameter and re-evaluating the objective function, is avoided by approximating the objective function E with a second-order expansion, δE ≈ (1/2) Σ_i h_ii · u_i²,
where h_ii is the diagonal second derivative and u_i is the parameter under consideration. The saliency s_i = h_ii · u_i² / 2 of each parameter is then computed, the parameters are ranked by saliency, and some low-saliency parameters are deleted.
This process needs an efficient way of computing h_ii, which takes substantial time and computing resources. The approximation required by Optimal Brain Damage keeps mainly the diagonal entries of the Hessian matrix, yet many off-diagonal terms are comparable to the diagonal ones. The Optimal Brain Surgeon (OBS) algorithm therefore improves on OBD: it uses the second-derivative information of the error function to analytically predict how perturbing a weight affects the function, and weakens or removes certain connection weights in a top-down manner to optimize the structure. It turns out, however, that these assumptions do not hold well and are not sufficient to achieve an optimal pruning effect.
The quantity of hidden unit largely determines the ability that network accurately estimates posterior probability.However, neural Connection number in network increases sharply as the quantity of hiding level and unit increases, and the quantity is by actually available The constraint of computing resource.Not only number of parameters is network performance key factor, and understand they be how to come into operation, and Obtain network in it is available can training parameter and connect it is even more important.By taking GoogLeNet model as an example, Fig. 3 is in its convolutional layer Connection weight histogram, wherein x-axis show connection weight, y-axis is the connection number in each section.Filament is normal state point Cloth, average value are -0.01, standard deviation 0.056.If the hypothesis that the weight of small magnitude can be deleted be correctly, As can be seen that many connections may be trimmed to about in Fig. 3.This will greatly reduce the calculation amount of trained network development process, and can be with Improve the generalization ability of network.In the present embodiment, it can be cut off by two state modulator beta pruning processes when w meets (14).
Here α and β are two different sensitivity parameters, E is the objective function, and w_i is the updated weight. The sensitivity value of α is below 0.10; β depends on the network and the training data but can be tuned to obtain the required pruning accuracy. Many experiments show that the second constraint is much weaker than the first: by considering only the absolute value of the weight, pruning the same number of connections yields almost identical network performance. Therefore only the first criterion in (14) is evaluated, which is equivalent to assuming β = ∞. In other words, if |w| < α the weight is simply removed, and α is adjusted continually to obtain the optimal pruning rate.
Gradient descent produces each weight update as the sum of many small steps. If one assumes that, for small weights, the parameters after a single update are independent random variables with the same distribution, then the resulting weights are normally distributed. Judging from the weight distribution in Fig. 3, the actual distribution is well approximated by a normal distribution with mean close to zero, but the fit is poor for weights of magnitude around 0.1. It follows that at least not all weights around 0.1 are artifacts of the training process, which is a valuable reference for choosing the parameter α.
After pruning, the network is retrained to find the new, more relevant optimal network weights, and this retraining converges faster than the original training.
1.3 Pruned neural network implementation
Network pruning means removing the unneeded parts while training a larger neural network, i.e., deleting a unit from the network together with all of its incoming and outgoing connections. Larger network architectures are more sensitive to the initial training conditions, and pruning reduces network complexity, which helps generalization. In this embodiment, the pruned neural network is realized as follows. First, weights are updated through normal network training; unlike conventional training, however, the focus is on which relationships between units are important before and after training. Second, low-weight connections are trimmed: all connections whose weights fall below the threshold are deleted, converting the dense network into a sparse one. Finally, the network is retrained to obtain the final weights of the remaining sparse connections. This step is crucial: if the pruned network is used directly without retraining, accuracy suffers.
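These three steps can be sketched as simple magnitude pruning with mask-preserving retraining; this is an illustrative reading of the procedure, not the patent's reference implementation, and the function names are assumptions:

```python
import torch

def magnitude_prune(model, alpha):
    """Step two: zero every weight with |w| < alpha and record boolean
    masks, converting the dense network into a sparse one."""
    masks = {}
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.dim() > 1:                        # prune weights, keep biases
                mask = (p.abs() >= alpha)
                p.mul_(mask.to(p.dtype))
                masks[name] = mask
    return masks

def reapply_masks(model, masks):
    """Step three: call after each optimizer step during retraining so
    pruned connections stay at zero."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name].to(p.dtype))
```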
2. Deep residual bidirectional network module
In this embodiment, on the basis of the standardized model, the parameters of the pruned network are not merely fine-tuned; the bidirectional LSTM network is also improved with a deep residual structure. A traditional convolutional network has limited feature-representation ability and weak model generalization, leading to many action recognition errors, whereas a judiciously used residual structure provides the whole network with a degree of perturbation and regularization, prevents the higher layers from falling into an overfitted state, and improves the generalization ability of the whole network.
2.1 Bidirectional LSTM network
This embodiment improves on the bidirectional LSTM network; the basic structure of the LSTM network is shown in Fig. 4. The Forward layer and the Backward layer are jointly connected to the output layer, involving six shared weights w1-w6.
The Forward layer is computed forward from time 1 to time t, and the forward hidden-layer output at each time step is obtained and stored. The Backward layer is computed backward from time t to time 1, and the backward hidden-layer output at each time step is obtained and stored. Finally, the outputs of the Forward and Backward layers at each time step are combined to give the final output. Expressed mathematically, with h_t the forward hidden-state information and h_t' the backward hidden-state information:
h_t = f(w1 · x_t + w2 · h_{t−1}),
h_t' = f(w3 · x_t + w5 · h_{t+1}'),
o_t = g(w4 · h_t + w6 · h_t').
This is equivalent to duplicating the first recurrent layer in the network so that two layers now sit side by side: the input sequence is supplied as-is to the first layer, and a reversed copy of the input sequence is supplied to the second. For problems in which all time steps of the input sequence are available, the bidirectional LSTM trains two LSTMs rather than one on the input sequence. This provides the network with more context and leads to faster and more thorough learning of the problem.
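For concreteness, such a bidirectional layer can be sketched in PyTorch as follows; the layer sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

# One bidirectional LSTM layer: the output at each time step concatenates
# the forward pass over steps 1..t and the backward pass over steps t..1.
bilstm = nn.LSTM(input_size=128, hidden_size=64,
                 batch_first=True, bidirectional=True)

x = torch.randn(8, 30, 128)            # (batch, time steps, features)
out, _ = bilstm(x)                     # (8, 30, 128) = [h_t ; h_t'] per step
forward_h, backward_h = out[..., :64], out[..., 64:]
```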
2.2 Deep residual structure
In theory, the capacity and feature-discrimination ability of a model can keep improving as the network deepens. In practice, however, simply increasing the depth of the network causes gradient dispersion: an overly deep network structure easily leads to vanishing gradients. This embodiment therefore adopts an improved deep residual network structure.
Suppose H(x) denotes the optimal mapping of a deep neural network given an input sample x. A traditional convolutional neural network directly fits H(x) = x, whereas a deep residual network fits the expected residual mapping, i.e., F(x) = H(x) − x. Since x is the input source image, fitting F(x) can be verified to be equivalent to fitting the target H(x). Under this condition, the original optimal mapping is expressed as H(x) = F(x) + x, as shown in Fig. 5.
At this point, in this embodiment, the feature matrix obtained from the pruned neural network is input to the deep residual bidirectional LSTM network. The deep residual network changes the connection structure so that 2 or 3 layers are skipped, with ReLU activation between the layers; this reduces the interdependence between parameters and appropriately alleviates overfitting. In this embodiment, the deep residual bidirectional LSTM network skips 2 bidirectional LSTM layers as shown in Fig. 5, mapping its input to a superposition layer (the adder in Fig. 5) where it is added to the output of the second, skipped bidirectional LSTM layer. The deep residual bidirectional LSTM network thus takes the feature matrix output by the pruned neural network as its input and adds that input to the output of the skipped layers. In this way, the pruned neural network serves as the front-end network of the deep residual bidirectional multi-memory neural network, and the deep residual bidirectional LSTM network serves as the back-end network; the deep residual bidirectional LSTM network is obtained by introducing the residual network structure into the deep bidirectional LSTM network, improving the plain deep bidirectional LSTM network.
Clearly, making the network fit a definite function F(x) = 0 is much easier than optimizing toward the optimal function H(x). The deep residual structure can be added to an existing deep model without changing the model's existing architecture, making it feasible to train better-performing models with more layers. Even so, as depth keeps increasing, a residual network with identity mappings suffers from multiple residual modules sharing only a small amount of gradient information flow, so only a small fraction of residual-module parameters get updated. To solve this problem, this embodiment introduces the deep bidirectional LSTM network in place of a traditional neural network and brings the residual network structure into it, improving the plain deep bidirectional LSTM network into the deep residual bidirectional network; because the residual network enlarges the range of features to choose from, it enhances the coupling ability of features.
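A minimal sketch of such a residual block, assuming the shortcut skips two bidirectional LSTM layers as in Fig. 5 and that the input and output dimensions match so they can be added, might look like this:

```python
import torch
import torch.nn as nn

class ResidualBiLSTMBlock(nn.Module):
    """The block input x skips two bidirectional LSTM layers and is added
    to their output, so the layers fit the residual F(x) = H(x) - x."""
    def __init__(self, dim):
        super().__init__()
        # hidden_size = dim // 2 so that the concatenated forward/backward
        # outputs have the same width as x and can be added to it.
        self.lstm1 = nn.LSTM(dim, dim // 2, batch_first=True, bidirectional=True)
        self.lstm2 = nn.LSTM(dim, dim // 2, batch_first=True, bidirectional=True)

    def forward(self, x):
        h, _ = self.lstm1(x)
        h, _ = self.lstm2(h)
        return x + h              # shortcut connection (the adder in Fig. 5)

block = ResidualBiLSTMBlock(128)
y = block(torch.randn(4, 30, 128))    # shape preserved: (4, 30, 128)
```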
Four, the classifier module
In this embodiment, a classifier is finally placed in the system to classify the behavioral features for human action recognition, generating a probability label for each video. The classifier is set up as follows:
For the training set {(x⁽¹⁾, y⁽¹⁾), ..., (x⁽ᵐ⁾, y⁽ᵐ⁾)} with y⁽ⁱ⁾ ∈ {1, 2, 3, ..., k}, there are k classes in total, and each input x has a probability p(y = j | x) of belonging to each class j = 1, 2, ..., k. The hypothesis function h_θ(x) outputs a k-dimensional vector (whose elements sum to 1) representing these k estimated probabilities; its j-th element is p(y = j | x; θ) = exp(θ_j · x) / Σ_{l=1..k} exp(θ_l · x).
θ₁, θ₂, ..., θ_k are the parameters of the model.
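A short sketch of this classifier follows; the feature dimension and the number of action classes are illustrative assumptions:

```python
import torch

def softmax_classifier(theta, x):
    """p(y = j | x) for j = 1..k: exp(theta_j . x) normalized over all
    k classes, so each row of the result sums to 1."""
    logits = x @ theta.t()                  # (batch, k)
    return torch.softmax(logits, dim=-1)

theta = torch.randn(10, 256)   # k = 10 action classes (assumed)
x = torch.randn(4, 256)        # pooled video features (assumed dimension)
probs = softmax_classifier(theta, x)
pred = probs.argmax(dim=-1)    # y = argmax over the k probabilities
```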
Through the above recognition system, feature extraction and classification of the input raw video images are finally realized, achieving the final human action recognition.
Embodiment 2
This embodiment describes in detail a human action recognition method, preferably carried out on the system described in Embodiment 1. The method comprises:
S1: sampling the input raw video images to obtain sampled video images, and standardizing the sampled video images to obtain standardized video images;
S2: extracting image features through a pruned convolutional neural network combined with a deep residual bidirectional LSTM network;
S3: classifying based on the image features to obtain the recognition result, as sketched below.
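Tying S1 to S3 together, the following hedged sketch reuses the helpers sketched in Embodiment 1 (downsample_video, global_contrast_normalize, softmax_classifier) and assumes front_end and res_bilstm are the pruned CNN and the residual bidirectional LSTM; it illustrates the data flow and is not the patent's reference pipeline:

```python
import torch

def recognize_action(video_path, n_frames, front_end, res_bilstm, theta):
    # S1: sample the video, then standardize each frame
    frames = [global_contrast_normalize(f)
              for f in downsample_video(video_path, n_frames)]
    clip = torch.stack([torch.as_tensor(f, dtype=torch.float32).permute(2, 0, 1)
                        for f in frames])              # (T, 3, H, W)
    # S2: per-frame features from the pruned CNN, then the residual
    # bidirectional LSTM over the time dimension
    feats = front_end(clip)                            # (T, D), assumed
    feats = res_bilstm(feats.unsqueeze(0))             # (1, T, D)
    # S3: classify the temporally averaged features
    probs = softmax_classifier(theta, feats.mean(dim=1))
    return probs.argmax(dim=-1)
```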
Preferably, in S1, the sampling is performed in the following manner:
frames are extracted from a video segment at a fixed interval; the interval, in frames, is T = M/N, rounded down to an integer,
where M is the number of frames in the original video and N is the number of frames in the sampled video.
Preferably, in S2, the pruned neural network is pruned according to the following principle:
if |w| < α, the corresponding weight is removed, where w is a recurrent weight and α is a sensitivity parameter.
Preferably, in S2, with the parameter sets of the pruned convolutional neural network combined with the deep residual bidirectional LSTM network denoted (θ₁, θ₂), the system loss function is defined in terms of the following quantities:
TR denotes training, TE denotes testing, R(W) and R(θ) are regularization terms, X is the abstract feature or hierarchical representation of the input signal, W is the convolution kernel, b is the bias, M is the number of stages of the whole network, N is the number of input features, L is the loss function, t is the network layer index, and k is the feature-map channel.
Preferably, during network training, the convolutional neural network and the deep residual bidirectional LSTM network are trained jointly.
Preferably, in S3, classification and recognition are performed as follows:
For the training set {(x⁽¹⁾, y⁽¹⁾), ..., (x⁽ᵐ⁾, y⁽ᵐ⁾)} with y⁽ⁱ⁾ ∈ {1, 2, 3, ..., k}, there are k classes in total, and each input x has a probability p(y = j | x) of belonging to each class j = 1, 2, ..., k. The hypothesis function h_θ(x) outputs a k-dimensional vector (whose elements sum to 1) representing these k estimated probabilities; its j-th element is p(y = j | x; θ) = exp(θ_j · x) / Σ_{l=1..k} exp(θ_l · x),
where θ₁, θ₂, ..., θ_k are the parameters of the model.
Those of ordinary skill in the art will understand that all or part of the processes in the above embodiment methods can be accomplished by instructing the relevant hardware through a computer program; the program can be stored in a computer-readable storage medium, and when executed may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
Finally, it should be noted that the above embodiments merely illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or substitute equivalents for some of their technical features, and such modifications or substitutions do not depart the essence of the corresponding technical solutions from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A human action recognition system, characterized in that the system comprises: a video image acquisition module, an image standardization module, a deep residual bidirectional multi-memory neural network module, and a classifier module;
the video image acquisition module samples the raw video images to obtain sampled video images, which are input to the image standardization module for normalization, yielding standardized video images;
the standardized video images are input to the deep residual bidirectional multi-memory neural network module for processing;
the deep residual bidirectional multi-memory neural network module comprises a pruned network unit and a deep residual bidirectional network unit;
the pruned network unit processes a convolutional neural network by deletion to obtain the pruned neural network, the pruning following the principle that:
if |w| < α, the corresponding weight is removed, where w is a recurrent weight and α is a sensitivity parameter;
the deep residual bidirectional network unit is implemented by combining a deep residual network with a bidirectional LSTM network, reduces the interdependence between parameters by means of shortcut connections, and maps its input to a superposition layer where it is added to the output of the convolutional layer;
the pruned neural network and the bidirectional LSTM network are trained jointly;
the classifier module performs classification based on the output of the deep residual bidirectional multi-memory neural network module.
2. The system according to claim 1, characterized in that the video acquisition module samples the raw video images in the following manner:
frames are extracted from a video segment at a fixed interval; the interval, in frames, is T = M/N, rounded down to an integer,
where M is the number of frames in the original video and N is the number of frames in the sampled video.
3. The system according to claim 1, characterized in that the standardization is realized by processing the contrast of each frame, the contrast of the processed image being C = sqrt( (1/(3rc)) · Σ_{i=1..r} Σ_{j=1..c} Σ_{k=1..3} (X_{i,j,k} − X̄)² ),
where X̄ is the average gray level of the entire image, satisfying X̄ = (1/(3rc)) · Σ_{i=1..r} Σ_{j=1..c} Σ_{k=1..3} X_{i,j,k}, and the image size is r × c.
4. The system according to claim 1, characterized in that, with the parameter sets of the convolutional neural network and the LSTM network denoted (θ₁, θ₂), the system loss function is defined in terms of the following quantities:
X is the abstract feature or hierarchical representation of the input signal, W is the convolution kernel, b is the bias, M is the number of stages of the whole network, N is the number of input features, L is the loss function, t is the network layer index, and k is the feature-map channel.
5. The system according to claim 1, characterized in that the classifier module is: p(y = j | x; θ) = exp(θ_j · x) / Σ_{l=1..k} exp(θ_l · x), j = 1, 2, ..., k,
where {(x⁽¹⁾, y⁽¹⁾), ..., (x⁽ᵐ⁾, y⁽ᵐ⁾)} is the training set, y⁽ⁱ⁾ ∈ {1, 2, 3, ..., k} with k classes in total, p(y = j | x) is the probability that input x belongs to each class, and θ₁, θ₂, ..., θ_k are the parameters of the model.
6. A human action recognition method, characterized in that the method comprises:
S1: sampling the input raw video images to obtain sampled video images, and standardizing the sampled video images to obtain standardized video images;
S2: extracting image features through a pruned convolutional neural network combined with a deep residual bidirectional LSTM network;
S3: classifying based on the image features to obtain the recognition result.
7. The method according to claim 6, characterized in that, in S1, the sampling is performed in the following manner:
frames are extracted from a video segment at a fixed interval; the interval, in frames, is T = M/N, rounded down to an integer,
where M is the number of frames in the original video and N is the number of frames in the sampled video.
8. The method according to claim 6, characterized in that, in S2, the pruned neural network is pruned according to the following principle:
if |w| < α, the corresponding weight is removed, where w is a recurrent weight and α is a sensitivity parameter.
9. The method according to claim 6, characterized in that, in S2, with the parameter sets of the pruned convolutional neural network combined with the deep residual bidirectional LSTM network denoted (θ₁, θ₂), the system loss function is defined in terms of the following quantities:
X is the abstract feature or hierarchical representation of the input signal, W is the convolution kernel, b is the bias, M is the number of stages of the whole network, N is the number of input features, L is the loss function, t is the network layer index, and k is the feature-map channel.
10. The method according to claim 6, characterized in that, during network training, the pruned neural network and the deep residual bidirectional LSTM network are trained jointly.
CN201910324097.4A 2019-04-22 2019-04-22 Human action recognition system and method Pending CN110222556A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910324097.4A CN110222556A (en) 2019-04-22 2019-04-22 Human action recognition system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910324097.4A CN110222556A (en) 2019-04-22 2019-04-22 Human action recognition system and method

Publications (1)

Publication Number Publication Date
CN110222556A true CN110222556A (en) 2019-09-10

Family

ID=67819978

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910324097.4A Pending CN110222556A (en) Human action recognition system and method

Country Status (1)

Country Link
CN (1) CN110222556A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738213A (en) * 2019-09-20 2020-01-31 成都芯云微电子有限公司 image recognition method and device comprising surrounding environment
CN111008640A (en) * 2019-10-17 2020-04-14 平安科技(深圳)有限公司 Image recognition model training and image recognition method, device, terminal and medium
CN111401207A (en) * 2020-03-11 2020-07-10 福州大学 Human body action recognition method based on MARS depth feature extraction and enhancement
CN111931602A (en) * 2020-07-22 2020-11-13 北方工业大学 Multi-stream segmented network human body action identification method and system based on attention mechanism
CN113469062A (en) * 2021-07-05 2021-10-01 中山大学 Method, system and medium for detecting face exchange tampering video based on key frame face characteristics

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341462A (en) * 2017-06-28 2017-11-10 University of Electronic Science and Technology of China Video classification method based on an attention mechanism
CN108446594A (en) * 2018-02-11 2018-08-24 Sichuan Beiqing Data Technology Co., Ltd. Emergency response capability assessment method based on action recognition
CN108491680A (en) * 2018-03-07 2018-09-04 Anqing Normal University Drug relation extraction method based on a residual network and an attention mechanism

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341462A (en) * 2017-06-28 2017-11-10 University of Electronic Science and Technology of China Video classification method based on an attention mechanism
CN108446594A (en) * 2018-02-11 2018-08-24 Sichuan Beiqing Data Technology Co., Ltd. Emergency response capability assessment method based on action recognition
CN108491680A (en) * 2018-03-07 2018-09-04 Anqing Normal University Drug relation extraction method based on a residual network and an attention mechanism

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YU ZHAO: "Deep Residual Bidir-LSTM for Human Activity Recognition Using Wearable Sensors", Hindawi *
Tang Pengjie et al.: "Image description based on multi-stage joint optimization of GoogLeNet", Journal of Jinggangshan University (Natural Science Edition) *
Wang Tianxing: "Research on an improved algorithm based on the GoogLeNet network structure", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738213A (en) * 2019-09-20 2020-01-31 成都芯云微电子有限公司 image recognition method and device comprising surrounding environment
CN110738213B (en) * 2019-09-20 2022-07-01 成都芯云微电子有限公司 Image identification method and device comprising surrounding environment
CN111008640A (en) * 2019-10-17 2020-04-14 平安科技(深圳)有限公司 Image recognition model training and image recognition method, device, terminal and medium
CN111008640B (en) * 2019-10-17 2024-03-19 平安科技(深圳)有限公司 Image recognition model training and image recognition method, device, terminal and medium
CN111401207A (en) * 2020-03-11 2020-07-10 福州大学 Human body action recognition method based on MARS depth feature extraction and enhancement
CN111401207B (en) * 2020-03-11 2022-07-08 福州大学 Human body action recognition method based on MARS depth feature extraction and enhancement
CN111931602A (en) * 2020-07-22 2020-11-13 北方工业大学 Multi-stream segmented network human body action identification method and system based on attention mechanism
CN111931602B (en) * 2020-07-22 2023-08-08 北方工业大学 Attention mechanism-based multi-flow segmented network human body action recognition method and system
CN113469062A (en) * 2021-07-05 2021-10-01 中山大学 Method, system and medium for detecting face exchange tampering video based on key frame face characteristics
CN113469062B (en) * 2021-07-05 2023-07-25 中山大学 Method, system and medium for detecting face exchange tampered video based on key frame face characteristics

Similar Documents

Publication Publication Date Title
CN110222556A (en) Human action recognition system and method
CN109859190B (en) Target area detection method based on deep learning
Liao et al. Deep facial spatiotemporal network for engagement prediction in online learning
CN106650806B (en) A kind of cooperating type depth net model methodology for pedestrian detection
CN106446930B (en) Robot operative scenario recognition methods based on deep layer convolutional neural networks
CN109741318B (en) Real-time detection method of single-stage multi-scale specific target based on effective receptive field
Ma et al. Contrast-based image attention analysis by using fuzzy growing
CN108446729A (en) Egg embryo classification method based on convolutional neural networks
CN107463920A (en) A kind of face identification method for eliminating partial occlusion thing and influenceing
CN109785300A (en) A kind of cancer medical image processing method, system, device and storage medium
CN109255375A (en) Panoramic picture method for checking object based on deep learning
CN107633522A (en) Brain image dividing method and system based on local similarity movable contour model
CN107145889A (en) Target identification method based on double CNN networks with RoI ponds
CN105654141A (en) Isomap and SVM algorithm-based overlooked herded pig individual recognition method
WO2022198808A1 (en) Medical image data classification method and system based on bilinear attention network
CN112529146B (en) Neural network model training method and device
CN114998210B (en) Retinopathy of prematurity detecting system based on deep learning target detection
CN111062296B (en) Automatic white blood cell identification and classification method based on computer
CN109902558A (en) A kind of human health deep learning prediction technique based on CNN-LSTM
CN107066916A (en) Scene Semantics dividing method based on deconvolution neutral net
CN109190683A (en) A kind of classification method based on attention mechanism and bimodal image
CN110648331A (en) Detection method for medical image segmentation, medical image segmentation method and device
JP2024018938A (en) Night object detection and training method and device based on frequency domain self-attention mechanism
CN114581434A (en) Pathological image processing method based on deep learning segmentation model and electronic equipment
CN108230330A (en) A kind of quick express highway pavement segmentation and the method for Camera Positioning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190910

RJ01 Rejection of invention patent application after publication