CN110222556A - Human action recognition system and method - Google Patents
Human action recognition system and method
- Publication number
- CN110222556A (application CN201910324097.4A)
- Authority
- CN
- China
- Prior art keywords
- network
- residual error
- image
- video image
- video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
Abstract
The present invention provides a human action recognition system and method. The system comprises a video image acquisition module, an image standardization module, a deep residual bidirectional multi-memory neural network module, and a classifier module. The video image acquisition module samples the raw video images to obtain sampled video images, which are input to the image standardization module for standardization, yielding standardized video images. The standardized video images are input to the deep residual bidirectional multi-memory neural network module for processing; this module comprises a pruned network unit and a deep residual bidirectional network unit. The classifier module classifies based on the output of the deep residual bidirectional multi-memory neural network module. The technical solution of the present invention effectively solves the vanishing-gradient problem, enhances the coupling ability of features, markedly accelerates training, and improves both recognition accuracy and computation speed.
Description
Technical field
The present invention relates to the field of computer vision, and in particular to a neural-network-based visual recognition system and method for human actions.
Background technique
With the continuous development of the information society, computer vision has attracted growing attention in every field. Within computer vision, the recognition of human actions in video is a major research direction: recognizing the human actions in a video can bring great convenience to fields such as intelligent video surveillance, virtual reality, video security and smart homes. Computer vision mainly studies how computers can replace humans in work such as recognition, detection and processing of multimedia information, and human action recognition in video is in turn an important research direction within that field, playing a very important role in smart homes, security, virtual reality and related areas. The security field serves here as an example. Its scope is very wide, ranging from accident early warning in cities to border-protection early warning. By applying video-based human action recognition to this field, the specific circumstances of an accident can be analyzed, and the movements of suspicious persons at a border can be judged and flagged in advance, greatly improving the efficiency of security work and reducing the consumption of manpower and material resources. Moreover, a computer needs no rest, does not tire with long working hours, and can monitor comprehensively without blind spots to ensure safety; the technology therefore brings great convenience to people's lives and has broad application prospects.
Traditional human action recognition first performs moving-object detection, then feature extraction, and finally classifies the features to obtain the recognition result. Common feature types in conventional action recognition are static features, dynamic features, spatio-temporal features and descriptive features, and common recognition methods fall into three classes: template-based methods, methods based on probability statistics, and grammar-based methods. To further improve the recognition rate, with the rise of deep learning, methods that automatically extract features through deep networks have been applied to human action recognition. However, in existing deep-learning-based methods, traditional network structures such as AlexNet and VGG obtain better training results by increasing the depth (number of layers) of the network, and the added layers bring many negative effects, such as over-fitting, vanishing gradients and exploding gradients.
Summary of the invention
In view of the deficiencies of the prior art, this patent proposes a human action recognition method based on a deep residual bidirectional multi-memory neural network. The network removes unneeded network units to form a front-end network module; the resulting feature matrix is then fed as input into a deep bidirectional network with a modified residual structure for training, and a classifier finally performs classification to obtain the human action recognition result. The network uses "end-to-end" training, jointly optimizing the parameters of the neural network and the deep residual network, which guarantees the expressive power of the new network features on new data sets and improves the generalization performance of the whole system. Specifically, the present invention provides the following technical solutions:
In one aspect, the present invention provides a human action recognition system, comprising: a video image acquisition module, an image standardization module, a deep residual bidirectional multi-memory neural network module, and a classifier module.
The video image acquisition module samples the raw video images to obtain sampled video images, which are input to the image standardization module for standardization, yielding standardized video images.
The standardized video images are input to the deep residual bidirectional multi-memory neural network module for processing.
The deep residual bidirectional multi-memory neural network module comprises a pruned network unit and a deep residual bidirectional network unit.
The pruned network unit processes the convolutional neural network by deletion to obtain the pruned neural network; the pruning follows this principle:
if |w| < α, the corresponding weight is removed, where w is a recurrent weight and α is a sensitivity parameter.
The deep residual bidirectional network unit is implemented as a bidirectional LSTM network fused with a deep residual network; shortcut connections reduce the interdependence between parameters, and the identity mapping is added to the output of the convolutional layer at the superposition layer.
The pruned neural network and the bidirectional LSTM network are trained jointly.
The classifier module classifies based on the output of the deep residual bidirectional multi-memory neural network module.
Preferably, the video acquisition module samples the raw video images in the following manner:
frames are extracted from a video segment at a fixed frame interval T, where T is determined from M and N, M being the frame count of the original video and N the number of frames after sampling.
Preferably, the standardization is realized by processing the contrast of each frame image; the contrast of the processed image is the standard deviation of its pixels:

contrast = sqrt( (1 / (3rc)) · Σ_{i=1..r} Σ_{j=1..c} Σ_{k=1..3} (X_{i,j,k} − X̄)² )

where X̄ = (1 / (3rc)) · Σ_{i,j,k} X_{i,j,k} is the mean gray value of the entire image, and the image size is r × c.
Preferably, with the parameter sets of the convolutional neural network and the LSTM network denoted (θ1, θ2), the system loss function can be written as a sum of the loss L over the M stages of the whole network and the N input features, plus the regularization terms R(W) and R(θ), where X is the abstract or hierarchical representation feature of the input signal, W is the convolution kernel, b is the bias, t is the layer index, k is the feature-map channel, TR denotes training, and TE denotes testing.
Preferably, the classifier module is as follows:
given a training set {(x(1), y(1)), ..., (x(m), y(m))} with y(i) ∈ {1, 2, 3, ..., k} (k classes in total), for each input x the classifier outputs the probability p(y = j | x) of each class j = (1, 2, ..., k); θ denotes the parameters of the model.
In addition, the present invention also provides a human action recognition method, comprising:
S1, sampling the raw input video images to obtain sampled video images, and standardizing the sampled video images to obtain standardized video images;
S2, extracting image features through a pruned convolutional neural network combined with a deep residual bidirectional LSTM network;
S3, classifying based on the image features to obtain the recognition result.
Preferably, in S1, the sampling is performed in the following manner:
frames are extracted from a video segment at a fixed frame interval T, where T is determined from M and N, M being the frame count of the original video and N the number of frames after sampling.
Preferably, in S2, the neural network is pruned according to the following principle:
if |w| < α, the corresponding weight is removed, where w is a recurrent weight and α is a sensitivity parameter.
Preferably, in S2, with the parameter sets of the pruned convolutional neural network and the deep residual bidirectional LSTM network denoted (θ1, θ2), the system loss function can be written as above, where TR denotes training, TE denotes testing, R(W) and R(θ) are regularization terms, X is the abstract or hierarchical representation feature of the input signal, W is the convolution kernel, b is the bias, M is the number of stages of the whole network, N is the number of input features, L is the loss function, t is the layer index, and k is the feature-map channel.
Preferably, during network training, the convolutional neural network and the deep residual bidirectional LSTM network are trained jointly.
Compared with the prior art, the present invention has the following advantages:
1. The present invention incorporates the residual-module idea, which effectively solves the vanishing-gradient problem caused by overly deep network structures. Because a residual network enlarges the range of features that can be selected, it enhances the coupling ability of features; this structure greatly accelerates training while also improving performance, avoiding the poor recognition results caused by vanishing gradients in overly deep networks.
2. Compared with traditional human action recognition methods, the method of the present invention improves recognition accuracy, and the pruning technique improves computation speed.
Detailed description of the invention
Fig. 1 is the human action recognition flow chart of an embodiment of the present invention;
Fig. 2 is the adjusted GoogLeNet network structure of an embodiment of the present invention;
Fig. 3 is the histogram of 102400 connection weights in a Conv layer of the network model of an embodiment of the present invention;
Fig. 4 is the single-layer structure of the bidirectional multi-memory neural network of an embodiment of the present invention;
Fig. 5 is the residual network structure of an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention are described below clearly and completely in conjunction with the figures. The described embodiments are obviously only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention, without creative labor, shall fall within the protection scope of the present invention.
Embodiment 1
In a specific embodiment, the technical solution provided by the present invention can be realized as a human action recognition system. Preferably, as shown in Fig. 1, the system includes a video image sampling module, an image standardization module, a deep residual bidirectional multi-memory neural network module, and a classifier module; through this structure, the input human action video is recognized and the action result is output. Specifically, this can be accomplished as follows:
One, video image sampling module
For a convolutional neural network, video images are input into the network to extract features, so the video sampling method directly influences the generalization ability of the trained data model. If every frame of a complete action video participated in feature extraction, consecutive frames would carry a large amount of redundant information, and the many near-identical images would also add to the burden on the network. In addition, since different action videos have different frame counts, reasonable down-sampling of the video is necessary.
To retain as much of the effective information in the human action as possible while removing unnecessary redundancy, video sampling in the present embodiment is carried out as follows:
for the entire video, frames are extracted at a fixed frame interval, where the interval takes the total frame count of the video into account. Suppose an action video has M frames and down-sampling is to produce a short action video of N (N < M) frames; then one frame is taken every T frames to form the final video. The choice of T here depends on the sizes of M and N, and under normal conditions T is determined directly from M and N.
By the above method, the input videos are sampled to obtain the sampled videos.
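The sampling described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the exact interval formula is an image in the original text, so T = max(⌊M/N⌋, 1) is an assumption:

```python
def sample_frames(video, n_target):
    """Uniformly sample n_target frames from a video of M frames.

    One frame is taken every T frames; T = max(M // N, 1) is an
    assumed form of the interval formula omitted in the source.
    `video` is any sequence of frames (here a plain list).
    """
    m = len(video)
    t = max(m // n_target, 1)
    sampled = [video[i] for i in range(0, m, t)]
    return sampled[:n_target]

# Toy example: a "video" of 100 frames, down-sampled to N = 10.
frames = list(range(100))
short = sample_frames(frames, 10)
```

With M = 100 and N = 10 the interval is T = 10, so every tenth frame is kept.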
Two, image standardization module
The images of different videos differ in size and resolution and are unevenly illuminated, and these conditions adversely affect the result of video image classification. Therefore, before feature extraction, the video image data require pre-processing. In traditional feature-extraction algorithms the pre-processing includes image denoising, image enhancement and so on, and pre-processing of the images is just as important when a neural network performs the recognition. Images should be standardized so that their pixels all lie in the same reasonable range, such as [0, 1] or [-1, 1]; mixing images in [0, 1] with images in [0, 255] will usually lead to failure. Converting images to the same scale is, strictly speaking, the only truly necessary pre-processing step. Many computer vision frameworks require images of a standard size, so images must be cropped or scaled to fit. Strictly speaking, however, even this rescaling is not always necessary: some convolutional models accept variable-sized inputs and dynamically adjust their pooling regions to keep the output size constant, while others have variable-sized outputs that scale automatically with the input, such as models that denoise or label each pixel of an image.
In the present embodiment, image pre-processing is realized as follows. In deep learning, contrast usually refers to the standard deviation of the pixels in an image or image region. Suppose we have an image represented by a tensor X ∈ R^{r×c×3}, where X_{i,j,1} is the red intensity at row i, column j, X_{i,j,2} the green intensity, and X_{i,j,3} the blue intensity. Then the contrast of the whole image can be expressed as:

contrast = sqrt( (1 / (3rc)) · Σ_{i=1..r} Σ_{j=1..c} Σ_{k=1..3} (X_{i,j,k} − X̄)² )

where X̄ = (1 / (3rc)) · Σ_{i,j,k} X_{i,j,k} is the mean intensity of the entire image and the image size is r × c.
The standardization proposed in the present embodiment subtracts the mean from each image and then rescales it so that the standard deviation of its pixels equals some constant s, thereby fixing the contrast of the modified image.
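The subtract-mean-then-rescale standardization above can be sketched in a few lines. This is a minimal sketch (the constant s = 1 and the epsilon guard are assumptions, not values stated in the source):

```python
import numpy as np

def standardize_contrast(image, s=1.0, eps=1e-8):
    """Subtract the mean intensity, then rescale so the pixel
    standard deviation (the "contrast") equals the constant s."""
    x = image.astype(np.float64)
    x -= x.mean()                    # zero-mean image
    std = x.std()                    # contrast before rescaling
    return s * x / max(std, eps)     # eps guards against flat images

# Toy example on a random r x c x 3 "frame".
img = np.random.RandomState(0).randint(0, 256, size=(32, 32, 3))
out = standardize_contrast(img)
```

After the call, `out` has mean ≈ 0 and standard deviation equal to s.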
Three, deep residual bidirectional multi-memory neural network module
1. Pruned network feature-extraction module
1.1 In the present embodiment, this module can be implemented with an adjusted GoogLeNet network.
The present embodiment makes full use of the faster convergence of the GoogLeNet network compared with traditional convolutional networks, reducing training and detection time without affecting the final recognition rate. From the network structure, the original input image is first fixed to an RGB image of size 224x224x3. The first convolutional layer uses the fixed structure pad=3, kernel size 7x7, step=2, yielding 64-dimensional features after convolution, so this layer's feature output is 112x112x64. The convolution output feeds directly into an activation layer, whose activation function is again the common ReLU, followed by a 3x3 max-pooling layer; after pooling the feature vector becomes 56x56x64. A norm layer is then added. The second convolution uses a 3x3 kernel with stride 1, so the feature vector becomes 56x56x192; after the subsequent activation and pooling layers the output is 28x28x192. Then 1x1, 3x3 and 5x5 convolution kernels are used across 4 branches, as shown in Fig. 2.
The mathematical model of the neural network is constructed as follows.
(1) Input-output relation: X is the abstract or hierarchical representation feature of the input signal, W is the convolution kernel, and b is the bias.
(2) Classifier parameters, with final output class: y = argmax{Y(k)} (6)
(3) On the training set, the objective function is constructed with cross entropy, where TR denotes training, TE denotes testing, and R(W) and R(θ) are regularization terms that sparsify the parameters and prevent over-fitting.
(4) Let the number of stages of the whole network be M (in one preferred embodiment, M = 3), and let the parameter sets of the GoogLeNet network and the LSTM network be (θ1, θ2); then the system loss function can be written accordingly, where X is the abstract or hierarchical representation feature of the input signal, W is the convolution kernel, b is the bias, M is the number of stages of the whole network, N is the number of input features, L is the loss function, t is the layer index, and k is the feature-map channel.
On this basis, pruning is used to improve the operating efficiency of the network.
1.2 Pruning method
In the Optimal Brain Damage (OBD) algorithm, the saliency of a parameter is defined as the change in the objective function caused by deleting that parameter. Saliency could be assessed directly from this definition, i.e., by temporarily deleting each parameter and re-evaluating the objective function; instead, the change in the objective function E is approximated as

δE ≈ (1/2) · Σ_i h_ii · u_i²

where h_ii is the diagonal second derivative and u_i is the parameter under consideration. The saliency s_i of each parameter is then computed, the parameters are ranked by saliency, and some low-saliency parameters are deleted.
This process requires an efficient way of computing the h_ii, which takes substantial time and computing resources. The approximation used by Optimal Brain Damage keeps mainly the diagonal of the Hessian matrix, yet many off-diagonal terms are comparable to the diagonal ones. The Optimal Brain Surgeon (OBS) algorithm therefore improves on OBD: it uses the second-derivative information of the error function to analytically predict how much a weight perturbation affects the function, and weakens or removes certain connection weights in a top-down manner to optimize the structure. However, these assumptions turn out not to be valid, and they are not enough to reach the optimal pruning effect.
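The OBD saliency ranking above can be sketched as follows, assuming the diagonal entries h_ii of the Hessian are already available (computing them is the expensive step the text mentions; the numbers here are illustrative only):

```python
import numpy as np

def obd_saliencies(weights, h_diag):
    """Optimal Brain Damage saliency s_i = h_ii * u_i^2 / 2:
    the second-order estimate of the loss increase from deleting u_i."""
    return 0.5 * h_diag * weights ** 2

# Illustrative parameters and (assumed, precomputed) Hessian diagonal.
w = np.array([0.5, -0.01, 0.2, -0.3])
h = np.array([1.0, 2.0, 0.5, 1.0])
s = obd_saliencies(w, h)
order = np.argsort(s)    # lowest-saliency parameters are deleted first
```

Here the second parameter (|w| = 0.01) has by far the smallest saliency and would be pruned first.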
The quantity of hidden unit largely determines the ability that network accurately estimates posterior probability.However, neural
Connection number in network increases sharply as the quantity of hiding level and unit increases, and the quantity is by actually available
The constraint of computing resource.Not only number of parameters is network performance key factor, and understand they be how to come into operation, and
Obtain network in it is available can training parameter and connect it is even more important.By taking GoogLeNet model as an example, Fig. 3 is in its convolutional layer
Connection weight histogram, wherein x-axis show connection weight, y-axis is the connection number in each section.Filament is normal state point
Cloth, average value are -0.01, standard deviation 0.056.If the hypothesis that the weight of small magnitude can be deleted be correctly,
As can be seen that many connections may be trimmed to about in Fig. 3.This will greatly reduce the calculation amount of trained network development process, and can be with
Improve the generalization ability of network.In the present embodiment, it can be cut off by two state modulator beta pruning processes when w meets (14).
Wherein, α, β are respectively two different sensitivity parameters, and E is objective function, wiFor weight after update.The sensitivity of α
Value depends on network and training data lower than 0.10, β, but adjustable to obtain required trimming accuracy.Through many experiments,
Second about first constraint of beam ratio is much weaker.By considering the absolute value of weight, realize that the trimming of identical quantity obtains almost
Identical network performance.Therefore, first standard or equivalent in (14) assessment is used only, it is assumed that β=∞.Therefore, such as
Fruit | w | < α, then weight is simply removed.In this case, constantly adjustment is to obtain optimal pruning rate.
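The simplified criterion |w| < α amounts to magnitude pruning with a binary mask, which can be sketched as follows (the weight values and α = 0.10 are illustrative, not taken from the patent's experiments):

```python
import numpy as np

def magnitude_prune(weights, alpha):
    """Remove every weight with |w| < alpha by zeroing it,
    returning the pruned weights and the binary keep-mask."""
    mask = np.abs(weights) >= alpha
    return weights * mask, mask

w = np.array([0.30, -0.05, 0.12, -0.008, 0.40])
pruned, mask = magnitude_prune(w, alpha=0.10)
```

Two of the five weights fall below the threshold and are removed; the mask is what a retraining step would use to keep pruned connections at zero.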
Gradient descent produces each weight as the sum of many small update steps. If one assumes that, for small weights, the parameters after individual updates are independent random variables with the same distribution, then the resulting weights are normally distributed. From the weight distribution of Fig. 3, the actual distribution is indeed well approximated by a normal distribution with mean close to zero, but the fit is poor for weights of magnitude around 0.1. It follows that not all weights around 0.1 are artifacts of the training process, which is of great reference value for the choice of the parameter α.
After trimming, the network is retrained to find the new, more relevant optimal network weights, and retraining converges faster than the original training.
1.3 Pruned neural network implementation
Network pruning means removing unneeded parts while training a larger neural network: a unit is removed by deleting it, together with all of its incoming and outgoing connections, from the network. Larger network architectures are more sensitive to the initial training conditions, and trimming reduces network complexity, which helps generalization. In the present embodiment the pruned neural network is realized as follows. First, weights are updated by normal network training; unlike conventional training, however, the focus is on which relationships between layer units matter before and after training. Second, low-weight connections are trimmed: all connections whose weights fall below the threshold are deleted from the network, converting the dense network into a sparse one. Finally, the network is retrained to obtain the final weights of the remaining sparse connections. This last step is crucial: if the trimmed network is used directly without retraining, accuracy suffers.
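The three-step train → trim → retrain procedure can be sketched on a toy linear model. This is a minimal illustration under assumed settings (a least-squares model, threshold 0.1, learning rate and step counts chosen for the toy), not the patent's GoogLeNet/LSTM pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_w = np.zeros(10)
true_w[:3] = [1.5, -2.0, 0.8]            # sparse ground truth
y = X @ true_w + 0.01 * rng.normal(size=200)

def train(X, y, w, mask, lr=0.05, steps=300):
    """Gradient descent on squared error; pruned weights stay at zero."""
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
        w *= mask
    return w

mask = np.ones(10)
w = train(X, y, np.zeros(10), mask)          # step 1: normal training
mask = (np.abs(w) >= 0.1).astype(float)      # step 2: trim low-weight connections
w = train(X, y, w * mask, mask)              # step 3: retrain the sparse network
```

On this toy problem the trim step recovers exactly the three nonzero connections, and retraining starts from the already-good surviving weights, mirroring the faster reconvergence the text describes.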
2. Deep residual bidirectional network module
In the present embodiment, on the basis of the standardized model, we do not merely fine-tune the parameters of the pruned network; we also improve the bidirectional LSTM network with a deep residual structure. A traditional convolutional network has limited feature-representation ability and weak model generalization, causing frequent errors in action recognition, whereas judicious use of the residual structure provides the whole network with a certain amount of perturbation and regularization, prevents the higher layers from falling into an over-fitted state, and improves the generalization ability of the whole network.
2.1 Bidirectional LSTM network
The present embodiment improves on the bidirectional LSTM network, whose basic structure is shown in Fig. 4. The Forward layer and the Backward layer are jointly connected to the output layer, sharing 6 weights w1-w6.
The Forward layer is computed once from time 1 to time t, obtaining and saving the forward hidden-layer output at each moment. The Backward layer is computed once backwards from time t to time 1, obtaining and saving the backward hidden-layer output at each moment. Finally, at each moment the outputs of the Forward and Backward layers for the corresponding time step are combined to obtain the final output, where h_t is the forward hidden-state information and h_t' is the backward hidden-state information.
This amounts to duplicating the first recurrent layer so that two layers sit side by side: the input sequence is supplied as-is to the first layer, and a reversed copy of the input sequence is supplied to the second layer. For problems where all time steps of the input sequence are available, the bidirectional LSTM trains two LSTMs on the input sequence instead of one. This provides the network with more context and leads to faster, more complete learning of the problem.
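The forward/backward combination described above can be sketched as follows. For brevity a plain tanh recurrent cell stands in for the full LSTM gates (an assumption; the dimensions and weights are illustrative):

```python
import numpy as np

def rnn_pass(x_seq, w_in, w_rec, reverse=False):
    """One recurrent pass over the sequence; a tanh cell stands in
    for the LSTM. Outputs are re-aligned so index t matches time t."""
    seq = x_seq[::-1] if reverse else x_seq
    h = np.zeros(w_rec.shape[0])
    outputs = []
    for x in seq:
        h = np.tanh(w_in @ x + w_rec @ h)
        outputs.append(h)
    return outputs[::-1] if reverse else outputs

rng = np.random.default_rng(1)
T, d_in, d_h = 5, 4, 3
xs = [rng.normal(size=d_in) for _ in range(T)]
w_in = rng.normal(size=(d_h, d_in))
w_rec = 0.1 * rng.normal(size=(d_h, d_h))

fwd = rnn_pass(xs, w_in, w_rec)                  # Forward layer, t = 1..T
bwd = rnn_pass(xs, w_in, w_rec, reverse=True)    # Backward layer, t = T..1
y = [np.concatenate([hf, hb]) for hf, hb in zip(fwd, bwd)]   # merged output
```

Each output y[t] combines the forward state (past context) with the backward state (future context) for the same time step.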
2.2 Deep residual structure
In theory, the capacity and discriminative ability of a model architecture improve continuously as the network deepens. However, simply increasing the depth of the network causes gradient dispersion, i.e., an overly deep structure easily makes the gradients vanish; the present embodiment therefore improves on the deep residual network structure.
Suppose H(x) denotes the optimal mapping of the deep neural network for an input sample x. A traditional convolutional neural network fits H(x) directly, whereas a deep residual network instead fits the residual mapping F(x) = H(x) - x. Since x is the input source image, fitting F(x) is verifiably equivalent to fitting H(x). Under this condition, the original optimal mapping is expressed as H(x) = F(x) + x, as shown in Fig. 5.
At this point, in the present embodiment, it is residual by depth is input to by the eigenmatrix obtained after the Processing with Neural Network after beta pruning
The two-way LSTM network of difference.Depth residual error network is skipped 2 or 3 convolutional layers, is used between each network by change connection structure
The activation of ReLU function, the relationship of interdependence between parameter is reduced with this, the appropriate generation for alleviating over-fitting.In the present embodiment
The two-way LSTM network of depth residual error is after skipping 2 two-way LSTM networks as shown in Figure 5, itself is mapped to superimposed layer and convolutional layer
Output be added.The two-way LSTM network of depth residual error will be by the eigenmatrix that obtains after the Processing with Neural Network after beta pruning
Output is as input, second two-way LSTM network that input sample is mapped to superimposed layer (adder in Fig. 5) and is skipped
The output of convolutional layer is added.In this way, before the two-way more Memory Neural Networks of depth residual error are used as by the neural network after beta pruning
Network is held, the two-way LSTM network of depth residual error is as back-end network.Wherein the two-way LSTM network of depth residual error is by residual error network
Structure is introduced into the two-way LSTM network of depth, and the single two-way LSTM network of depth is improved.
Obviously, it is much easier to make the network fit a fixed function F(x) = 0 than to optimize it to approximate the optimal function H(x). The deep residual structure can be added to an existing deep model without changing the model's existing architecture, which makes it possible to train networks with more layers and better performance. Even so, as depth keeps increasing, a residual network with identity mappings suffers from the drawback that many residual modules share only a small amount of the gradient information flow, so the parameters of only a small fraction of the residual modules are updated. To solve this problem, the present embodiment introduces a deep bidirectional LSTM network in place of a traditional neural network and introduces the residual network structure into it, improving the single deep bidirectional LSTM network into a deep residual bidirectional network. Because the residual network enlarges the range of feature choices, it strengthens the coupling ability of the features.
4. Classifier module
In the present embodiment, a classifier is finally provided in the system to classify the behavioral features for human action recognition, generating a probability tag for each video. The classifier is set up as follows:
For a training set {(x(1), y(1)), ..., (x(m), y(m))} with y(i) ∈ {1, 2, 3, ..., k}, there are k classes in total, and for each input x there is a probability p(y = j | x) for each class j = (1, 2, ..., k). The hypothesis function hθ(x) outputs a k-dimensional vector (whose elements sum to 1) representing these k estimated probabilities; in the standard softmax form,

hθ(x) = (1 / Σ_{j=1}^{k} exp(θ_j^T x)) · [exp(θ_1^T x), exp(θ_2^T x), ..., exp(θ_k^T x)]^T

where θ_1, θ_2, ..., θ_k are the parameters of the model.
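As a sketch, the softmax classifier described above can be written with NumPy; the parameter matrix `theta` (one parameter row per class) and the dimensions are illustrative, not taken from the patent:

```python
import numpy as np

def softmax_probs(theta, x):
    """h_theta(x): k-dimensional vector of class probabilities p(y=j|x)."""
    scores = theta @ x           # one score theta_j^T x per class
    scores -= scores.max()       # stabilize the exponentials
    e = np.exp(scores)
    return e / e.sum()           # vector elements sum to 1

k, d = 3, 5                      # k classes, d-dimensional input feature
rng = np.random.default_rng(1)
theta = rng.standard_normal((k, d))  # model parameters theta_1..theta_k
x = rng.standard_normal(d)
p = softmax_probs(theta, x)
print(p.sum())                   # 1.0: a probability tag over k classes
```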
Through the above recognition system, feature extraction and classification of the input raw video image are finally realized, achieving the final human action recognition.
Embodiment 2
In the present embodiment, a human action recognition method, preferably carried out with the system described in Embodiment 1, is described in detail. The method comprises:
S1: sampling the input raw video image to obtain a sampled video image, and standardizing the sampled video image to obtain a standardized video image;
S2: extracting image features through a pruned convolutional neural network combined with a deep residual bidirectional LSTM network;
S3: classifying based on the image features to obtain a recognition result.
Preferably, in S1, the sampling is performed in the following manner: frames are extracted from a video segment at equal intervals, the interval T being:

T = M / N

where M is the frame count of the original video and N is the frame count of the sampled video.
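A minimal sketch of this uniform sampling, assuming the interval is T = M / N as the surrounding definitions suggest:

```python
def sample_frames(M, N):
    """Pick N frame indices from an M-frame video at a fixed interval T = M / N."""
    T = M / N
    return [int(i * T) for i in range(N)]

print(sample_frames(100, 10))  # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
```

When M is not divisible by N, the fractional interval is rounded per index, so all N indices still fall inside the original video.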
Preferably, in S2, the pruned neural network is pruned according to the following principle: if |w| < α, the corresponding weight is removed, where w is a recurrent weight and α is a sensitivity parameter.
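The threshold rule above amounts to magnitude pruning; a hedged NumPy sketch (the weight matrix and the value of α here are illustrative only):

```python
import numpy as np

def prune_weights(w, alpha):
    """Zero out every weight whose magnitude falls below the sensitivity alpha."""
    mask = np.abs(w) >= alpha  # keep only weights with |w| >= alpha
    return w * mask

w = np.array([[0.8, -0.02], [0.005, -0.6]])
pruned = prune_weights(w, alpha=0.1)
print(pruned)  # the two small weights are removed (set to zero)
```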
Preferably, in S2, the parameter set of the pruned convolutional neural network combined with the deep residual bidirectional LSTM network is (θ1, θ2); the system loss function can then be denoted as:

where TR denotes training, TE denotes testing, R(W) and R(θ) are regularization terms, X is the abstract or hierarchical feature representation of the input signal, W is the convolution kernel, b is the bias, M is the number of stages of the whole network, N is the number of input features, L is the loss function, t is the network layer index, and k is the feature-mapping channel.
Preferably, during network training, the convolutional neural network and the deep residual bidirectional LSTM network are trained jointly.
Preferably, in S3, classification and recognition are performed as follows: for a training set {(x(1), y(1)), ..., (x(m), y(m))} with y(i) ∈ {1, 2, 3, ..., k}, there are k classes in total, and for each input x there is a probability p(y = j | x) for each class j = (1, 2, ..., k). The hypothesis function hθ(x) outputs a k-dimensional vector (whose elements sum to 1) representing these k estimated probabilities; in the standard softmax form,

hθ(x) = (1 / Σ_{j=1}^{k} exp(θ_j^T x)) · [exp(θ_1^T x), exp(θ_2^T x), ..., exp(θ_k^T x)]^T

where θ_1, θ_2, ..., θ_k are the parameters of the model.
Those of ordinary skill in the art will appreciate that all or part of the processes in the above embodiment methods may be completed by instructing relevant hardware through a computer program; the program may be stored in a computer-readable storage medium, and when executed may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
Finally, it should be noted that the above embodiments are merely illustrative of the technical solutions of the present invention and are not limiting. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of the technical features may be equivalently replaced; such modifications or replacements do not depart the essence of the corresponding technical solutions from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A human action recognition system, characterized in that the system comprises: a video image acquisition module, an image standardization module, a deep residual bidirectional multi-memory neural network module, and a classifier module;
the video image acquisition module samples the raw video image to obtain a sampled video image, which is input to the image standardization module for standardization to obtain a standardized video image;
the standardized video image is input to the deep residual bidirectional multi-memory neural network module for processing;
the deep residual bidirectional multi-memory neural network module comprises a pruning network unit and a deep residual bidirectional network unit;
the pruning network unit processes a convolutional neural network by deletion to obtain a pruned neural network, the pruning following the principle:
if |w| < α, the corresponding weight is removed, where w is a recurrent weight and α is a sensitivity parameter;
the deep residual bidirectional network unit is implemented by combining a residual network with a bidirectional LSTM network, reduces the interdependence between parameters through shortcut connections, and maps its input to a superposition layer where it is added to the output of the convolutional layer;
the pruned neural network and the bidirectional LSTM network are trained jointly;
the classifier module performs classification based on the output of the deep residual bidirectional multi-memory neural network module.
2. The system according to claim 1, characterized in that the video acquisition module samples the raw video image in the following manner: frames are extracted from a video segment at equal intervals, the interval T being:
T = M / N
where M is the frame count of the original video and N is the frame count of the sampled video.
3. The system according to claim 1, characterized in that the standardization is realized by processing the contrast of each frame image, the contrast of the processed image being:
C = sqrt( (1 / (r·c)) · Σ_{i=1}^{r} Σ_{j=1}^{c} (x_{i,j} − x̄)² )
where x̄ is the average gray value of the entire image, satisfying x̄ = (1 / (r·c)) · Σ_{i=1}^{r} Σ_{j=1}^{c} x_{i,j}, and the image size is r × c.
4. The system according to claim 1, characterized in that the parameter set of the convolutional neural network and the LSTM network is (θ1, θ2), and the system loss function can then be denoted as:
where X is the abstract or hierarchical feature representation of the input signal, W is the convolution kernel, b is the bias, M is the number of stages of the whole network, N is the number of input features, L is the loss function, t is the network layer index, and k is the feature-mapping channel.
5. The system according to claim 1, characterized in that the classifier module is as follows:
hθ(x) = (1 / Σ_{j=1}^{k} exp(θ_j^T x)) · [exp(θ_1^T x), exp(θ_2^T x), ..., exp(θ_k^T x)]^T
where {(x(1), y(1)), ..., (x(m), y(m))} is the training set, y(i) ∈ {1, 2, 3, ..., k} with k classes in total, p(y = j | x) is the probability of each class j = (1, 2, ..., k) for each input x, and θ_1, θ_2, ..., θ_k are the parameters of the model.
6. A human action recognition method, characterized in that the method comprises:
S1: sampling the input raw video image to obtain a sampled video image, and standardizing the sampled video image to obtain a standardized video image;
S2: extracting image features through a pruned convolutional neural network combined with a deep residual bidirectional LSTM network;
S3: classifying based on the image features to obtain a recognition result.
7. The method according to claim 6, characterized in that in S1 the sampling is performed in the following manner: frames are extracted from a video segment at equal intervals, the interval T being:
T = M / N
where M is the frame count of the original video and N is the frame count of the sampled video.
8. The method according to claim 6, characterized in that in S2 the pruned neural network is pruned according to the following principle:
if |w| < α, the corresponding weight is removed, where w is a recurrent weight and α is a sensitivity parameter.
9. The method according to claim 6, characterized in that in S2 the parameter set of the pruned convolutional neural network combined with the deep residual bidirectional LSTM network is (θ1, θ2), and the system loss function can then be denoted as:
where X is the abstract or hierarchical feature representation of the input signal, W is the convolution kernel, b is the bias, M is the number of stages of the whole network, N is the number of input features, L is the loss function, t is the network layer index, and k is the feature-mapping channel.
10. The method according to claim 6, characterized in that during network training the pruned neural network and the deep residual bidirectional LSTM network are trained jointly.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910324097.4A CN110222556A (en) | 2019-04-22 | 2019-04-22 | A kind of human action identifying system and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910324097.4A CN110222556A (en) | 2019-04-22 | 2019-04-22 | A kind of human action identifying system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110222556A true CN110222556A (en) | 2019-09-10 |
Family
ID=67819978
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910324097.4A Pending CN110222556A (en) | 2019-04-22 | 2019-04-22 | A kind of human action identifying system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110222556A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110738213A (en) * | 2019-09-20 | 2020-01-31 | 成都芯云微电子有限公司 | image recognition method and device comprising surrounding environment |
CN111008640A (en) * | 2019-10-17 | 2020-04-14 | 平安科技(深圳)有限公司 | Image recognition model training and image recognition method, device, terminal and medium |
CN111401207A (en) * | 2020-03-11 | 2020-07-10 | 福州大学 | Human body action recognition method based on MARS depth feature extraction and enhancement |
CN111931602A (en) * | 2020-07-22 | 2020-11-13 | 北方工业大学 | Multi-stream segmented network human body action identification method and system based on attention mechanism |
CN113469062A (en) * | 2021-07-05 | 2021-10-01 | 中山大学 | Method, system and medium for detecting face exchange tampering video based on key frame face characteristics |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107341462A (en) * | 2017-06-28 | 2017-11-10 | 电子科技大学 | A kind of video classification methods based on notice mechanism |
CN108446594A (en) * | 2018-02-11 | 2018-08-24 | 四川省北青数据技术有限公司 | Emergency reaction ability assessment method based on action recognition |
CN108491680A (en) * | 2018-03-07 | 2018-09-04 | 安庆师范大学 | Drug relationship abstracting method based on residual error network and attention mechanism |
2019-04-22: CN CN201910324097.4A patent/CN110222556A/en, status: active, Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107341462A (en) * | 2017-06-28 | 2017-11-10 | 电子科技大学 | A kind of video classification methods based on notice mechanism |
CN108446594A (en) * | 2018-02-11 | 2018-08-24 | 四川省北青数据技术有限公司 | Emergency reaction ability assessment method based on action recognition |
CN108491680A (en) * | 2018-03-07 | 2018-09-04 | 安庆师范大学 | Drug relationship abstracting method based on residual error network and attention mechanism |
Non-Patent Citations (3)
Title |
---|
YU ZHAO: "Deep Residual Bidir-LSTM for Human Activity Recognition Using Wearable Sensors", Hindawi *
TANG Pengjie et al.: "Image captioning based on GoogLeNet multi-stage joint optimization", Journal of Jinggangshan University (Natural Science) *
WANG Tianxing: "Research on improved algorithms based on the GoogLeNet network structure", China Master's Theses Full-text Database, Information Science and Technology *
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110738213A (en) * | 2019-09-20 | 2020-01-31 | 成都芯云微电子有限公司 | image recognition method and device comprising surrounding environment |
CN110738213B (en) * | 2019-09-20 | 2022-07-01 | 成都芯云微电子有限公司 | Image identification method and device comprising surrounding environment |
CN111008640A (en) * | 2019-10-17 | 2020-04-14 | 平安科技(深圳)有限公司 | Image recognition model training and image recognition method, device, terminal and medium |
CN111008640B (en) * | 2019-10-17 | 2024-03-19 | 平安科技(深圳)有限公司 | Image recognition model training and image recognition method, device, terminal and medium |
CN111401207A (en) * | 2020-03-11 | 2020-07-10 | 福州大学 | Human body action recognition method based on MARS depth feature extraction and enhancement |
CN111401207B (en) * | 2020-03-11 | 2022-07-08 | 福州大学 | Human body action recognition method based on MARS depth feature extraction and enhancement |
CN111931602A (en) * | 2020-07-22 | 2020-11-13 | 北方工业大学 | Multi-stream segmented network human body action identification method and system based on attention mechanism |
CN111931602B (en) * | 2020-07-22 | 2023-08-08 | 北方工业大学 | Attention mechanism-based multi-flow segmented network human body action recognition method and system |
CN113469062A (en) * | 2021-07-05 | 2021-10-01 | 中山大学 | Method, system and medium for detecting face exchange tampering video based on key frame face characteristics |
CN113469062B (en) * | 2021-07-05 | 2023-07-25 | 中山大学 | Method, system and medium for detecting face exchange tampered video based on key frame face characteristics |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110222556A (en) | A kind of human action identifying system and method | |
CN109859190B (en) | Target area detection method based on deep learning | |
Liao et al. | Deep facial spatiotemporal network for engagement prediction in online learning | |
CN106650806B (en) | A kind of cooperating type depth net model methodology for pedestrian detection | |
CN106446930B (en) | Robot operative scenario recognition methods based on deep layer convolutional neural networks | |
CN109741318B (en) | Real-time detection method of single-stage multi-scale specific target based on effective receptive field | |
Ma et al. | Contrast-based image attention analysis by using fuzzy growing | |
CN108446729A (en) | Egg embryo classification method based on convolutional neural networks | |
CN107463920A (en) | A kind of face identification method for eliminating partial occlusion thing and influenceing | |
CN109785300A (en) | A kind of cancer medical image processing method, system, device and storage medium | |
CN109255375A (en) | Panoramic picture method for checking object based on deep learning | |
CN107633522A (en) | Brain image dividing method and system based on local similarity movable contour model | |
CN107145889A (en) | Target identification method based on double CNN networks with RoI ponds | |
CN105654141A (en) | Isomap and SVM algorithm-based overlooked herded pig individual recognition method | |
WO2022198808A1 (en) | Medical image data classification method and system based on bilinear attention network | |
CN112529146B (en) | Neural network model training method and device | |
CN114998210B (en) | Retinopathy of prematurity detecting system based on deep learning target detection | |
CN111062296B (en) | Automatic white blood cell identification and classification method based on computer | |
CN109902558A (en) | A kind of human health deep learning prediction technique based on CNN-LSTM | |
CN107066916A (en) | Scene Semantics dividing method based on deconvolution neutral net | |
CN109190683A (en) | A kind of classification method based on attention mechanism and bimodal image | |
CN110648331A (en) | Detection method for medical image segmentation, medical image segmentation method and device | |
JP2024018938A (en) | Night object detection and training method and device based on frequency domain self-attention mechanism | |
CN114581434A (en) | Pathological image processing method based on deep learning segmentation model and electronic equipment | |
CN108230330A (en) | A kind of quick express highway pavement segmentation and the method for Camera Positioning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190910 |