CN109101858A - Action recognition method and device


Info

Publication number
CN109101858A
Authority
CN
China
Prior art keywords
data, training, target frame, training data, convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710470470.8A
Other languages
Chinese (zh)
Other versions
CN109101858B (en)
Inventor
胡越予
刘家瑛
张昊华
郭宗明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Peking University Founder Group Co Ltd
Beijing Founder Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University, Peking University Founder Group Co Ltd, and Beijing Founder Electronics Co Ltd
Priority to CN201710470470.8A
Publication of CN109101858A
Application granted
Publication of CN109101858B
Expired - Fee Related (current legal status)
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 20/42: Higher-level, semantic clustering, classification or understanding of video scenes of sport video content
    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Abstract

The action recognition method and device provided by the invention determine a target frame and several consecutive frames preceding the target frame in received video data, and extract from the video data the data information of the target frame and the data information of the several consecutive frames preceding it. Convolution processing is performed a preset number of times on a predetermined number of gain parameters, the data information of the target frame, and the data information of the several consecutive frames preceding it, yielding high-order feature data. The high-order feature data is added into the video data to form data to be extracted, temporal feature extraction is performed on the data to be extracted to obtain a feature vector, and an action recognition result is finally obtained according to the feature vector. The high-order features of the video data can thus be extracted, improving the accuracy of action recognition.

Description

Action recognition method and device
Technical field
The present invention relates to computer vision technology, and more particularly to an action recognition method and device.
Background technique
With the development of computer vision technology, action recognition using video capture devices has become a research focus. Existing action recognition methods extract data such as joint positions from a video stream and feed these data into a three-layer bidirectional long short-term memory (LSTM) recurrent neural network, which extracts the dynamic features of the data. The extracted dynamic features are then input to a classifier network to finally obtain the action type corresponding to the video stream data.
However, owing to the limitations of the three-layer bidirectional LSTM recurrent neural network, only dynamic features over the entire sequence can be extracted; the high-order features of the data at a particular moment cannot. For example, when distinguishing "push" and "hit", two actions with similar position data, the "acceleration" at a particular moment or over several consecutive moments must be used to tell them apart. When a three-layer bidirectional LSTM recurrent neural network is used to recognize "push" or "hit", only the dynamic feature "average acceleration" can be extracted; the dynamic feature "instantaneous acceleration" at the instant the action occurs, and immediately before and after it, cannot. For action types whose identification requires such high-order features of the data, existing action recognition methods cannot achieve accurate recognition.
Summary of the invention
To solve the technical problem of low recognition accuracy in the prior art, the present invention provides an action recognition method and device.
In one aspect, the present application provides an action recognition method, comprising:
receiving video data;
determining a target frame and several consecutive frames preceding the target frame, and extracting from the video data the data information of the target frame and the data information of the several consecutive frames preceding the target frame;
performing convolution processing a preset number of times on a predetermined number of gain parameters, the data information of the target frame, and the data information of the several consecutive frames preceding the target frame, to obtain high-order feature data;
adding the high-order feature data into the video data to form data to be extracted;
performing temporal feature extraction on the data to be extracted to obtain a feature vector; and
obtaining an action recognition result according to the feature vector.
Further, before receiving the video data, the method comprises:
receiving a training data set, the training data set comprising several pieces of training data and a recognition result corresponding to each piece of training data;
selecting one piece of training data from the training data set as the to-be-trained data;
determining a training frame and several consecutive frames preceding the training frame in the to-be-trained data, and extracting from the to-be-trained data the training data information of the training frame and the training data information of the several consecutive frames preceding the training frame;
performing convolution processing a preset number of times on a predetermined number of to-be-trained gain parameters, the training data information of the training frame, and the training data information of the several consecutive frames preceding the training frame, to obtain high-order feature training data;
adding the high-order feature training data into the to-be-trained data to form to-be-extracted training data;
performing temporal feature extraction on the to-be-extracted training data to obtain the training feature vector of the to-be-trained data;
obtaining a prediction result according to the training feature vector;
obtaining the cross entropy of the prediction result and the recognition result corresponding to the to-be-trained data, and judging whether the cross entropy converges;
if it converges, using the to-be-trained gain parameters as the gain parameters and executing the step of receiving video data; and
if it does not converge, correcting the to-be-trained gain parameters according to the cross entropy, selecting the next piece of training data from the training data set as the to-be-trained data, and returning to the step of determining a training frame and several consecutive frames preceding the training frame in the to-be-trained data.
Further, performing convolution processing a preset number of times on the predetermined number of gain parameters, the data information of the target frame, and the data information of the several consecutive frames preceding the target frame to obtain the high-order feature data comprises:
generating to-be-convolved data from the data information of the target frame and the data information of the several consecutive frames preceding the target frame;
convolving the to-be-convolved data with each of the predetermined number of gain parameters to obtain several convolution results; and
concatenating the several convolution results to obtain the high-order feature data.
Correspondingly, performing convolution processing a preset number of times on the predetermined number of to-be-trained gain parameters, the training data information of the training frame, and the training data information of the several consecutive frames preceding the training frame to obtain the high-order feature training data comprises:
generating to-be-convolved training data from the training data information of the training frame and the training data information of the several consecutive frames preceding the training frame;
convolving the to-be-convolved training data with each of the predetermined number of to-be-trained gain parameters to obtain several training convolution results; and
concatenating the several training convolution results to obtain the high-order feature training data.
Further, adding the high-order feature data into the video data to form the data to be extracted comprises:
packing the data information of the target frame together with the high-order feature data to generate the data information of an updated target frame; and
replacing the data information of the target frame in the video data with the data information of the updated target frame to obtain the data to be extracted.
Correspondingly, performing temporal feature extraction on the data to be extracted to obtain the feature vector comprises:
performing feature extraction on the data information of each frame in the data to be extracted to obtain the feature data of each frame; and
performing mean-value processing on the feature data of all frames in the data to be extracted to obtain the feature vector of the data to be extracted.
Further, each convolution corresponds to one gain parameter, and the number of convolutions is determined according to the number of kinds of high-order feature data.
The present invention also provides an action recognition device, comprising:
a receiving module, configured to receive video data;
a data extraction module, configured to determine a target frame and several consecutive frames preceding the target frame, and to extract from the video data the data information of the target frame and the data information of the several consecutive frames preceding the target frame;
a high-order feature extraction module, configured to perform convolution processing a preset number of times on a predetermined number of gain parameters, the data information of the target frame, and the data information of the several consecutive frames preceding the target frame, to obtain high-order feature data, and to add the high-order feature data into the video data to form data to be extracted;
a feature vector extraction module, configured to perform temporal feature extraction on the data to be extracted to obtain a feature vector; and
a recognition result obtaining module, configured to obtain an action recognition result according to the feature vector.
Further, the receiving module is also configured to receive, before receiving the video data, a training data set comprising several pieces of training data and a recognition result corresponding to each piece of training data.
The data extraction module is also configured to select one piece of training data from the training data set as the to-be-trained data, to determine a training frame and several consecutive frames preceding the training frame in the to-be-trained data, and to extract from the to-be-trained data the training data information of the training frame and the training data information of the several consecutive frames preceding the training frame.
The high-order feature extraction module is also configured to perform convolution processing a preset number of times on a predetermined number of to-be-trained gain parameters, the training data information of the training frame, and the training data information of the several consecutive frames preceding the training frame, to obtain high-order feature training data, and to add the high-order feature training data into the to-be-trained data to form to-be-extracted training data.
The feature vector extraction module is also configured to perform temporal feature extraction on the to-be-extracted training data to obtain the training feature vector of the to-be-trained data.
The recognition result obtaining module is also configured to obtain a prediction result according to the training feature vector.
The action recognition device further comprises a judging module, configured to obtain the cross entropy of the prediction result and the recognition result corresponding to the to-be-trained data, and to judge whether the cross entropy converges.
The high-order feature extraction module is also configured to use the to-be-trained gain parameters as the gain parameters when the judging module judges that the cross entropy converges, whereupon the receiving module executes the step of receiving video data.
The high-order feature extraction module is also configured to correct the to-be-trained gain parameters according to the cross entropy when the judging module judges that the cross entropy does not converge; the data extraction module is then also configured to select the next piece of training data from the training data set as the to-be-trained data and to return to the step of determining a training frame and several consecutive frames preceding the training frame in the to-be-trained data.
Further, the high-order feature extraction module is specifically configured to:
generate to-be-convolved data from the data information of the target frame and the data information of the several consecutive frames preceding the target frame; convolve the to-be-convolved data with each of the predetermined number of gain parameters to obtain several convolution results; and concatenate the several convolution results to obtain the high-order feature data; and
generate to-be-convolved training data from the training data information of the training frame and the training data information of the several consecutive frames preceding the training frame; convolve the to-be-convolved training data with each of the predetermined number of to-be-trained gain parameters to obtain several training convolution results; and concatenate the several training convolution results to obtain the high-order feature training data.
Further, the high-order feature extraction module is specifically configured to: pack the data information of the target frame together with the high-order feature data to generate the data information of an updated target frame; and replace the data information of the target frame in the video data with the data information of the updated target frame to obtain the data to be extracted.
Correspondingly, the feature vector extraction module is specifically configured to: perform feature extraction on the data information of each frame in the data to be extracted to obtain the feature data of each frame; and perform mean-value processing on the feature data of all frames in the data to be extracted to obtain the feature vector of the data to be extracted.
Further, each convolution corresponds to one gain parameter, and the number of convolutions is determined according to the number of kinds of high-order feature data.
With the action recognition method and device provided by the invention, a target frame and several consecutive frames preceding it are determined in received video data, and the data information of the target frame and of the several consecutive frames preceding it is extracted from the video data. Convolution processing is performed a preset number of times on a predetermined number of gain parameters, the data information of the target frame, and the data information of the several consecutive frames preceding it, yielding high-order feature data. The high-order feature data is added into the video data to form data to be extracted, temporal feature extraction is performed on the data to be extracted to obtain a feature vector, and an action recognition result is finally obtained according to the feature vector. The high-order features of the video data can thus be extracted, improving the accuracy of action recognition.
Detailed description of the invention
Fig. 1 is a schematic flowchart of an action recognition method provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic flowchart of an action recognition method provided by Embodiment 2 of the present invention;
Fig. 3 is a schematic structural diagram of an action recognition device provided by Embodiment 3 of the present invention;
Fig. 4 is a schematic structural diagram of an action recognition device provided by Embodiment 4 of the present invention.
Specific embodiment
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of an action recognition method provided by Embodiment 1 of the present invention. As shown in Fig. 1, the action recognition method comprises:
Step 101: receive video data.
It should be noted that the executing entity of the invention may specifically be an action recognition device, whose physical form may be a terminal device composed of hardware such as a processor, a memory, logic circuits, and electronic chips.
Specifically, in step 101, a piece of video data is received, the video data containing the data information of several frames. The video data may come from a capture device, be exported from another storage medium, or be downloaded from the network; the present invention places no limitation on this.
Step 102: determine a target frame and several consecutive frames preceding the target frame, and extract from the video data the data information of the target frame and the data information of the several consecutive frames preceding the target frame.
Step 103: perform convolution processing a preset number of times on a predetermined number of gain parameters, the data information of the target frame, and the data information of the several consecutive frames preceding the target frame, to obtain high-order feature data.
Each convolution may correspond to one gain parameter, and the number of convolutions is determined according to the number of kinds of high-order feature data.
Specifically, each kind of high-order feature data may have one gain parameter corresponding to it, so that each convolution uses only one gain parameter. For example, the high-order feature "acceleration" to be extracted may have one gain parameter corresponding to it, so that the high-order feature data obtained by convolution with that gain parameter can characterize the high-order feature "acceleration".
More concretely, in step 103, to-be-convolved data is first generated from the data information of the target frame and the data information of the several consecutive frames preceding it; the to-be-convolved data is then convolved with each of the predetermined number of gain parameters to obtain several convolution results; and the several convolution results are finally concatenated to obtain the high-order feature data.
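To make the step concrete, the following is a minimal sketch of such a windowed convolution (a NumPy illustration under stated assumptions: the 6-frame window, the 75 values of data information per frame, and the two gain parameters are example choices, not values fixed by the invention):

```python
import numpy as np

def high_order_features(frames, gain_params):
    """frames: (window, dim) array holding several consecutive frames followed
    by the target frame; gain_params: one (window,) weight vector per
    convolution, i.e. per kind of high-order feature."""
    results = []
    for w in gain_params:
        # One convolution pass: weight every frame in the window by the
        # gain parameter and sum over time, per data dimension.
        results.append(frames.T @ w)              # (dim,)
    return np.concatenate(results)                # (dim * len(gain_params),)

window = np.random.rand(6, 75)                    # 5 preceding frames + target frame
gains = [np.random.rand(6) for _ in range(2)]     # e.g. "velocity", "acceleration"
print(high_order_features(window, gains).shape)   # (150,)
```

Each gain parameter yields one convolution result per data dimension, and concatenating the results, one per kind of high-order feature, mirrors the concatenation described above.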
Step 104: add the high-order feature data into the video data to form data to be extracted.
Specifically, the high-order feature data obtained above is added into the video data received in step 101 to form the data to be extracted, from which temporal features will later be extracted.
More concretely, in step 104, the data information of the target frame and the high-order feature data may first be packed to generate the data information of an updated target frame; the data information of the target frame in the video data is then replaced with the data information of the updated target frame, thereby obtaining the data to be extracted.
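A rough sketch of this packing and replacement is given below (a hypothetical illustration: the patent does not fix the packing format, so simple concatenation is assumed here):

```python
import numpy as np

def pack_target_frame(video, target_idx, high_order):
    """video: list of per-frame data arrays; high_order: the high-order feature
    data for the target frame. Only the target frame's data information is
    replaced; every other frame is left unchanged."""
    data_to_extract = list(video)
    data_to_extract[target_idx] = np.concatenate([video[target_idx], high_order])
    return data_to_extract

video = [np.random.rand(75) for _ in range(30)]
data = pack_target_frame(video, 19, np.random.rand(150))
print(data[19].shape)   # (225,): the original 75 values plus the high-order data
```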
Step 105: perform temporal feature extraction on the data to be extracted to obtain a feature vector.
Specifically, a three-layer bidirectional long short-term memory (LSTM) recurrent neural network algorithm is used to extract the temporal features of the data to be extracted, yielding a feature vector. Because the data to be extracted contains the high-order feature data, the feature vector extracted from it can also reflect the high-order features of the video data.
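As an illustration, the three-layer bidirectional LSTM feature extractor named above could be sketched as follows (a PyTorch example; the input size, hidden size, and sequence length are assumptions, and all frames are assumed padded to a common dimension for batching):

```python
import torch
import torch.nn as nn

# Three-layer bidirectional LSTM over the data to be extracted.
lstm = nn.LSTM(input_size=225, hidden_size=128,
               num_layers=3, bidirectional=True, batch_first=True)

data_to_extract = torch.randn(1, 30, 225)        # (batch, frames, per-frame data)
per_frame_features, _ = lstm(data_to_extract)    # (1, 30, 256): 2 directions x 128
feature_vector = per_frame_features.mean(dim=1)  # averaged over all frames: (1, 256)
```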
Step 106: obtain an action recognition result according to the feature vector.
Specifically, a classifier network algorithm is used to classify the feature vector and obtain several action types matching the feature vector, together with their corresponding probabilities; the action type with the highest probability is selected from among them as the action recognition result.
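A minimal sketch of this classification step (the linear classifier, the feature dimension, and the number of candidate action types are illustrative assumptions, not details fixed by the invention):

```python
import torch
import torch.nn as nn

feature_vector = torch.randn(1, 256)       # feature vector from the extraction step
classifier = nn.Linear(256, 10)            # 10 candidate action types (assumed)
probs = torch.softmax(classifier(feature_vector), dim=-1)  # probability per type
action_type = int(probs.argmax(dim=-1))    # action type with the highest probability
```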
With the action recognition method provided by the invention, a target frame and several consecutive frames preceding it are determined in received video data, and the data information of the target frame and of the several consecutive frames preceding it is extracted from the video data. Convolution processing is performed a preset number of times on a predetermined number of gain parameters, the data information of the target frame, and the data information of the several consecutive frames preceding it, yielding high-order feature data. The high-order feature data is added into the video data to form data to be extracted, temporal feature extraction is performed on the data to be extracted to obtain a feature vector, and an action recognition result is finally obtained according to the feature vector. The high-order features of the video data can thus be extracted, improving the accuracy of action recognition.
Building on the method shown in Fig. 1, Fig. 2 is a schematic flowchart of an action recognition method provided by Embodiment 2 of the present invention. As shown in Fig. 2, the method comprises:
Step 200: receive a training data set, the training data set comprising several pieces of training data and a recognition result corresponding to each piece of training data.
Step 201: select one piece of training data from the training data set as the to-be-trained data.
Step 202: determine a training frame and several consecutive frames preceding the training frame in the to-be-trained data, and extract from the to-be-trained data the training data information of the training frame and the training data information of the several consecutive frames preceding the training frame.
To further describe the technical solution provided by Embodiment 2, the case where the video data contains the position information of each human-body joint over several frames is taken as an example.
Specifically, unlike Embodiment 1, Embodiment 2 additionally includes a training process for the gain parameters. In steps 200 to 202, a training data set is first received; it comprises several pieces of training data and the recognition result corresponding to each piece, i.e., several groups of training data together with the confirmed action type corresponding to each group. Each piece of training data may contain the position information of each human-body joint over several frames, and the recognition result may be the action type described by that position information. One piece of training data is then selected from the training data set as the to-be-trained data, the training frame to be processed and the several consecutive frames preceding it are determined, and the training data information of the training frame and of the several consecutive frames preceding it is extracted from the to-be-trained data, as sketched below.
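For illustration, such a training data set could be laid out as follows (the sizes and the number of action types are assumptions made for the sketch):

```python
import numpy as np

# One piece of training data: joint positions over several frames, plus the
# confirmed action type that serves as the recognition result.
num_samples, num_frames, num_joints = 100, 30, 25
training_set = [
    (np.random.rand(num_frames, num_joints, 3),   # per-frame 3-D joint positions
     np.random.randint(0, 10))                    # confirmed action type
    for _ in range(num_samples)
]
to_be_trained, recognition_result = training_set[0]  # select one piece of training data
```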
Step 203: perform convolution processing a preset number of times on a predetermined number of to-be-trained gain parameters, the training data information of the training frame, and the training data information of the several consecutive frames preceding the training frame, to obtain high-order feature training data.
Each convolution may correspond to one gain parameter, and the number of convolutions is determined according to the number of kinds of high-order feature data.
Specifically, one to-be-trained gain parameter may be preset for each kind of high-order feature data, so that each convolution uses only one to-be-trained gain parameter. For example, the high-order feature "acceleration" to be extracted may have one to-be-trained gain parameter corresponding to it, so that the trained gain parameter can be used to extract the high-order feature "acceleration".
More concretely, to-be-convolved training data is first generated from the training data information of the training frame and the training data information of the several consecutive frames preceding it; the to-be-convolved training data is then convolved with each of the predetermined number of to-be-trained gain parameters to obtain several training convolution results; and the several training convolution results are finally concatenated to obtain the high-order feature training data.
As an example, suppose the to-be-trained data contains the training data information of 30 frames, the training data information of each frame contains the position information of 25 human-body joints, the 20th frame is the training frame, and frames 15-19 are the several consecutive frames preceding the training frame. In the above steps, for the first human-body joint, the to-be-convolved training data of the first joint can first be obtained and convolved with the preset first to-be-trained gain parameter, yielding the training convolution result of the first joint. The second human-body joint is then processed similarly, yielding the training convolution result of the second joint. After the training convolution results of all joints have been obtained for the first to-be-trained gain parameter, the same processing is performed with the preset second to-be-trained gain parameter, and so on until the training convolution results of all to-be-trained gain parameters are obtained. These training convolution results are then concatenated to obtain the high-order feature training data.
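The joint-by-joint processing in this example can be sketched as follows (a NumPy illustration; the 3-D joint positions and the two to-be-trained gain parameters are assumptions, and a windowed dot product stands in for the convolution):

```python
import numpy as np

num_frames, num_joints = 30, 25
train = np.random.rand(num_frames, num_joints, 3)   # per-frame 3-D joint positions
t, window = 19, 5                                   # the 20th frame; frames 15-19 precede it
gain_params = [np.random.rand(window + 1) for _ in range(2)]

results = []
for w in gain_params:               # one pass per to-be-trained gain parameter
    for j in range(num_joints):     # joint by joint, as in the example above
        clip = train[t - window:t + 1, j]           # (6, 3) window for this joint
        results.append(clip.T @ w)                  # training convolution result
high_order_training = np.concatenate(results)       # concatenated: (2 * 25 * 3,)
print(high_order_training.shape)                    # (150,)
```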
Step 204: add the high-order feature training data into the to-be-trained data to form the to-be-extracted training data.
Step 205: perform temporal feature extraction on the to-be-extracted training data to obtain the training feature vector of the to-be-trained data.
Step 206: obtain a prediction result according to the training feature vector.
Specifically, in steps 204 to 206, the data information of the training frame and the high-order feature training data may first be packed to generate the data information of an updated training frame, and the data information of the training frame in the to-be-trained data is replaced with the data information of the updated training frame, thereby obtaining the to-be-extracted training data. A three-layer bidirectional LSTM recurrent neural network algorithm is then used to perform temporal feature extraction on the to-be-extracted training data, yielding the training feature vector of the to-be-trained data. Finally, a classifier network algorithm is used to classify the training feature vector and obtain several predicted action types matching the training feature vector, together with their corresponding probabilities; the action type with the highest probability is selected from among them as the prediction result.
Step 207: obtain the cross entropy of the prediction result and the recognition result corresponding to the to-be-trained data, and judge whether the cross entropy converges.
If not, execute step 208; if so, execute step 209.
Step 208: correct the to-be-trained gain parameters according to the cross entropy, select the next piece of training data from the training data set as the to-be-trained data, and return to step 202.
Step 209: use the to-be-trained gain parameters as the gain parameters.
Specifically, in steps 207 to 209, the recognition result corresponding to the to-be-trained data and the prediction result may each first be expressed as a vector, the cross entropy of the recognition result and the prediction result is computed, and whether the cross entropy converges is judged.
When the cross entropy does not converge, the to-be-trained gain parameters are corrected using it, for example by taking the sum of a to-be-trained gain parameter and the cross entropy as the corrected to-be-trained gain parameter. The next piece of training data is then selected from the training data set as the to-be-trained data, and the process returns to the step of determining a training frame and the several consecutive frames preceding it in the to-be-trained data. When step 203 is executed again, the to-be-trained gain parameters used are the corrected ones; that is, whenever the cross entropy does not converge, the corrected to-be-trained gain parameters obtained from the correction take part in the next round of training as the to-be-trained gain parameters, and the loop continues until the cross entropy converges.
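A schematic rendering of this training loop is given below (note that the sum-based correction follows the example rule stated above rather than a conventional gradient update; the convergence test, the dimensions, and the synthetic stand-in data are assumptions for the sketch):

```python
import numpy as np

def cross_entropy(label_vec, pred_vec, eps=1e-12):
    # Both results expressed as vectors, as described above.
    return float(-np.sum(label_vec * np.log(pred_vec + eps)))

# Synthetic stand-ins for the per-sample recognition and prediction results.
labels = np.eye(10)[np.random.randint(0, 10, size=100)]   # one-hot recognition results
preds = np.random.dirichlet(np.ones(10), size=100)        # stand-in prediction results

gain_param = np.random.rand(6)            # a to-be-trained gain parameter
prev_loss, tol = np.inf, 1e-4
for label_vec, pred_vec in zip(labels, preds):
    loss = cross_entropy(label_vec, pred_vec)
    if abs(prev_loss - loss) < tol:       # convergence test on the cross entropy
        break                             # training finished: gain_param is used as-is
    gain_param = gain_param + loss        # the example correction rule from the text
    prev_loss = loss
```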
When the cross entropy converges, the training of the to-be-trained gain parameters is finished, and the to-be-trained gain parameters can then be used as the gain parameters for the recognition of video data.
Step 210: receive video data.
Step 211: determine a target frame and several consecutive frames preceding the target frame, and extract from the video data the data information of the target frame and the data information of the several consecutive frames preceding the target frame.
Step 212: perform convolution processing a preset number of times on a predetermined number of gain parameters, the data information of the target frame, and the data information of the several consecutive frames preceding the target frame, to obtain high-order feature data.
Step 213: add the high-order feature data into the video data to form data to be extracted.
Step 214: perform temporal feature extraction on the data to be extracted to obtain a feature vector.
Step 215: obtain an action recognition result according to the feature vector.
Specifically, in steps 210 to 215, video data is first received; it may contain the position information of each human-body joint over several frames. The target frame to be processed and the several consecutive frames preceding it are then determined, and the data information of the target frame and the data information of the several consecutive frames preceding it are extracted from the video data.
Next, using the trained gain parameters of the predetermined number, convolution processing is performed a preset number of times on the data information of the target frame and of the several consecutive frames preceding it, to obtain the high-order feature data. Specifically, to-be-convolved data may be generated from the data information of the target frame and of the several consecutive frames preceding it, the to-be-convolved data is convolved with each of the predetermined number of gain parameters to obtain several convolution results, and the several convolution results are finally concatenated to obtain the high-order feature data. As an example, suppose the video data contains the data information of 30 frames, the data information of each frame contains the position information of 25 human-body joints, the 20th frame is the target frame, and frames 15-19 are the several consecutive frames preceding it. In the above steps, for the first human-body joint, the to-be-convolved data of the first joint can first be obtained and convolved with the preset first gain parameter, yielding the convolution result of the first joint. The second human-body joint is then processed similarly, yielding the convolution result of the second joint. After the to-be-convolved data of all joints has been processed for the first gain parameter, the video data is processed similarly with the second gain parameter, and so on until the convolution results of all gain parameters are obtained. These convolution results are then concatenated to obtain the high-order feature data. Each convolution corresponds to one gain parameter, and the number of convolutions is determined according to the number of kinds of high-order feature data.
The high-order feature data is added into the video data to form the data to be extracted. Specifically, the data information of the target frame, containing the position information of each human-body joint, is packed together with the high-order feature data obtained in the above steps to generate the data information of an updated target frame, and the data information of the target frame in the video data is replaced with the data information of the updated target frame to obtain the data to be extracted. That is, compared with the original video data, the data information of the target frame in the data to be extracted additionally contains the high-order feature data obtained in the above steps, while the data information of the other frames remains unchanged.
Temporal feature extraction is performed on the data to be extracted to obtain the feature vector. Specifically, a three-layer bidirectional LSTM recurrent neural network algorithm is used to perform feature extraction on the data information of each frame in the data to be extracted, yielding the feature data of each frame. The feature data of the individual frames may then be concatenated directly to obtain the feature vector of the data to be extracted; alternatively, a mean-value algorithm may be used to perform mean-value processing on the feature data of all frames to obtain the feature vector; or a means-clustering algorithm may be used to perform mean-value processing on the feature data of all frames to obtain the feature vector. Those skilled in the art may also select a different mean-value algorithm as actually needed; the present invention places no limitation on this.
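The two pooling alternatives named above can be sketched in a few lines (the frame count and feature dimension are assumptions):

```python
import numpy as np

per_frame = np.random.rand(30, 256)       # feature data of each of the 30 frames
mean_vec = per_frame.mean(axis=0)         # mean-value processing: (256,)
concat_vec = per_frame.reshape(-1)        # direct concatenation: (30 * 256,)
```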
Finally, a classifier network algorithm is used to classify the feature vector, and several action types matching the feature vector are obtained together with their corresponding probabilities; the action type with the highest probability is selected from among them as the action recognition result.
With the action recognition device provided by the invention, before action recognition is performed on video data, the gain parameters are additionally trained with a training data set comprising several pieces of training data and the recognition result corresponding to each piece, so as to obtain trained gain parameters. A target frame and several consecutive frames preceding it are then determined in received video data, and the data information of the target frame and of the several consecutive frames preceding it is extracted from the video data. Convolution processing is performed a preset number of times on the predetermined number of trained gain parameters, the data information of the target frame, and the data information of the several consecutive frames preceding it, yielding high-order feature data. The high-order feature data is added into the video data to form data to be extracted, temporal feature extraction is performed on the data to be extracted to obtain a feature vector, and an action recognition result is finally obtained according to the feature vector. The high-order features of the video data can thus be extracted, improving the accuracy of action recognition.
Fig. 3 is a schematic structural diagram of an action recognition device provided by Embodiment 3 of the present invention. As shown in Fig. 3, the action recognition device provided by Embodiment 3 specifically comprises:
a receiving module 10, configured to receive video data;
a data extraction module 20, configured to determine a target frame and several consecutive frames preceding the target frame, and to extract from the video data the data information of the target frame and the data information of the several consecutive frames preceding the target frame;
a high-order feature extraction module 30, configured to perform convolution processing a preset number of times on a predetermined number of gain parameters, the data information of the target frame, and the data information of the several consecutive frames preceding the target frame, to obtain high-order feature data, and to add the high-order feature data into the video data to form data to be extracted;
a feature vector extraction module 40, configured to perform temporal feature extraction on the data to be extracted to obtain a feature vector; and
a recognition result obtaining module 50, configured to obtain an action recognition result according to the feature vector.
Further, each convolution corresponds to one gain parameter, and the number of convolutions is determined according to the number of kinds of high-order feature data.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working process of the system described above and the corresponding beneficial effects may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
With the action recognition device provided by the invention, a target frame and several consecutive frames preceding it are determined in received video data, and the data information of the target frame and of the several consecutive frames preceding it is extracted from the video data. Convolution processing is performed a preset number of times on a predetermined number of gain parameters, the data information of the target frame, and the data information of the several consecutive frames preceding it, yielding high-order feature data. The high-order feature data is added into the video data to form data to be extracted, temporal feature extraction is performed on the data to be extracted to obtain a feature vector, and an action recognition result is finally obtained according to the feature vector. The high-order features of the video data can thus be extracted, improving the accuracy of action recognition.
Building on the structure shown in Fig. 3, Fig. 4 is a schematic structural diagram of an action recognition device provided by Embodiment 4 of the present invention. As shown in Fig. 4, as in Embodiment 3, the device comprises:
a receiving module 10, configured to receive video data;
a data extraction module 20, configured to determine a target frame and several consecutive frames preceding the target frame, and to extract from the video data the data information of the target frame and the data information of the several consecutive frames preceding the target frame;
a high-order feature extraction module 30, configured to perform convolution processing a preset number of times on a predetermined number of gain parameters, the data information of the target frame, and the data information of the several consecutive frames preceding the target frame, to obtain high-order feature data, and to add the high-order feature data into the video data to form data to be extracted;
a feature vector extraction module 40, configured to perform temporal feature extraction on the data to be extracted to obtain a feature vector; and
a recognition result obtaining module 50, configured to obtain an action recognition result according to the feature vector.
Unlike Embodiment 3:
The receiving module 10 is also configured to receive, before receiving the video data, a training data set comprising several pieces of training data and a recognition result corresponding to each piece of training data.
The data extraction module 20 is also configured to select one piece of training data from the training data set as the to-be-trained data, to determine a training frame and several consecutive frames preceding the training frame in the to-be-trained data, and to extract from the to-be-trained data the training data information of the training frame and the training data information of the several consecutive frames preceding the training frame.
The high-order feature extraction module 30 is also configured to perform convolution processing a preset number of times on a predetermined number of to-be-trained gain parameters, the training data information of the training frame, and the training data information of the several consecutive frames preceding the training frame, to obtain high-order feature training data, and to add the high-order feature training data into the to-be-trained data to form to-be-extracted training data.
The feature vector extraction module 40 is also configured to perform temporal feature extraction on the to-be-extracted training data to obtain the training feature vector of the to-be-trained data.
The recognition result obtaining module 50 is also configured to obtain a prediction result according to the training feature vector.
The action recognition device further comprises a judging module 60, configured to obtain the cross entropy of the prediction result and the recognition result corresponding to the to-be-trained data, and to judge whether the cross entropy converges.
The high-order feature extraction module 30 is also configured to use the to-be-trained gain parameters as the gain parameters when the judging module 60 judges that the cross entropy converges, whereupon the receiving module 10 executes the step of receiving video data.
The high-order feature extraction module 30 is also configured to correct the to-be-trained gain parameters according to the cross entropy when the judging module 60 judges that the cross entropy does not converge; the data extraction module 20 is then also configured to select the next piece of training data from the training data set as the to-be-trained data and to return to the step of determining a training frame and several consecutive frames preceding the training frame in the to-be-trained data.
Further, the high-order feature extraction module 30 is specifically configured to:
generate to-be-convolved data from the data information of the target frame and the data information of the several consecutive frames preceding the target frame; convolve the to-be-convolved data with each of the predetermined number of gain parameters to obtain several convolution results; concatenate the several convolution results to obtain the high-order feature data; generate to-be-convolved training data from the training data information of the training frame and the training data information of the several consecutive frames preceding the training frame; convolve the to-be-convolved training data with each of the predetermined number of to-be-trained gain parameters to obtain several training convolution results; and concatenate the several training convolution results to obtain the high-order feature training data.
Further, the high-order feature extraction module 30 is specifically configured to: pack the data information of the target frame together with the high-order feature data to generate the data information of an updated target frame; and replace the data information of the target frame in the video data with the data information of the updated target frame to obtain the data to be extracted.
Correspondingly, the feature vector extraction module 40 is specifically configured to: perform feature extraction on the data information of each frame in the data to be extracted to obtain the feature data of each frame; and perform mean-value processing on the feature data of all frames in the data to be extracted to obtain the feature vector of the data to be extracted.
Further, each convolution corresponds to one gain parameter, and the number of convolutions is determined according to the number of kinds of high-order feature data.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working process of the system described above and the corresponding beneficial effects may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
With the action recognition device provided by the invention, before action recognition is performed on video data, the gain parameters are additionally trained with a training data set comprising several pieces of training data and the recognition result corresponding to each piece, so as to obtain trained gain parameters. A target frame and several consecutive frames preceding it are then determined in received video data, and the data information of the target frame and of the several consecutive frames preceding it is extracted from the video data. Convolution processing is performed a preset number of times on the predetermined number of trained gain parameters, the data information of the target frame, and the data information of the several consecutive frames preceding it, yielding high-order feature data. The high-order feature data is added into the video data to form data to be extracted, temporal feature extraction is performed on the data to be extracted to obtain a feature vector, and an action recognition result is finally obtained according to the feature vector. The high-order features of the video data can thus be extracted, improving the accuracy of action recognition.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be implemented by program instructions executed on relevant hardware. The aforementioned program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some or all of the technical features therein, and such modifications or replacements do not remove the essence of the corresponding technical solutions from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An action recognition method, characterized by comprising:
receiving video data;
determining a target frame and several consecutive frames preceding the target frame, and extracting from the video data the data information of the target frame and the data information of the several consecutive frames preceding the target frame;
performing convolution processing a preset number of times on a predetermined number of gain parameters, the data information of the target frame, and the data information of the several consecutive frames preceding the target frame, to obtain high-order feature data;
adding the high-order feature data into the video data to form data to be extracted;
performing temporal feature extraction on the data to be extracted to obtain a feature vector; and
obtaining an action recognition result according to the feature vector.
2. The action recognition method according to claim 1, characterized in that, before receiving the video data, the method comprises:
receiving a training data set, the training data set comprising several pieces of training data and a recognition result corresponding to each piece of training data;
selecting one piece of training data from the training data set as the to-be-trained data;
determining a training frame and several consecutive frames preceding the training frame in the to-be-trained data, and extracting from the to-be-trained data the training data information of the training frame and the training data information of the several consecutive frames preceding the training frame;
performing convolution processing a preset number of times on a predetermined number of to-be-trained gain parameters, the training data information of the training frame, and the training data information of the several consecutive frames preceding the training frame, to obtain high-order feature training data;
adding the high-order feature training data into the to-be-trained data to form to-be-extracted training data;
performing temporal feature extraction on the to-be-extracted training data to obtain the training feature vector of the to-be-trained data;
obtaining a prediction result according to the training feature vector;
obtaining the cross entropy of the prediction result and the recognition result corresponding to the to-be-trained data, and judging whether the cross entropy converges;
if it converges, using the to-be-trained gain parameters as the gain parameters and executing the step of receiving video data; and
if it does not converge, correcting the to-be-trained gain parameters according to the cross entropy, selecting the next piece of training data from the training data set as the to-be-trained data, and returning to the step of determining a training frame and several consecutive frames preceding the training frame in the to-be-trained data.
3. The action recognition method according to claim 2, characterized in that performing convolution processing a preset number of times on the predetermined number of gain parameters, the data information of the target frame, and the data information of the several consecutive frames preceding the target frame to obtain the high-order feature data comprises:
generating to-be-convolved data from the data information of the target frame and the data information of the several consecutive frames preceding the target frame;
convolving the to-be-convolved data with each of the predetermined number of gain parameters to obtain several convolution results; and
concatenating the several convolution results to obtain the high-order feature data;
and correspondingly, performing convolution processing a preset number of times on the predetermined number of to-be-trained gain parameters, the training data information of the training frame, and the training data information of the several consecutive frames preceding the training frame to obtain the high-order feature training data comprises:
generating to-be-convolved training data from the training data information of the training frame and the training data information of the several consecutive frames preceding the training frame;
convolving the to-be-convolved training data with each of the predetermined number of to-be-trained gain parameters to obtain several training convolution results; and
concatenating the several training convolution results to obtain the high-order feature training data.
4. The action recognition method according to claim 1, characterized in that adding the high-order feature data into the video data to form the data to be extracted comprises:
packing the data information of the target frame together with the high-order feature data to generate the data information of an updated target frame; and
replacing the data information of the target frame in the video data with the data information of the updated target frame to obtain the data to be extracted;
and correspondingly, performing temporal feature extraction on the data to be extracted to obtain the feature vector comprises:
performing feature extraction on the data information of each frame in the data to be extracted to obtain the feature data of each frame; and
performing mean-value processing on the feature data of all frames in the data to be extracted to obtain the feature vector of the data to be extracted.
5. The action recognition method according to claim 1, characterized in that each convolution corresponds to one gain parameter, and the number of convolutions is determined according to the number of kinds of high-order feature data.
6. An action recognition device, characterized by comprising:
a receiving module, configured to receive video data;
a data extraction module, configured to determine a target frame and several consecutive frames preceding the target frame, and to extract from the video data the data information of the target frame and the data information of the several consecutive frames preceding the target frame;
a high-order feature extraction module, configured to perform convolution processing a preset number of times on a predetermined number of gain parameters, the data information of the target frame, and the data information of the several consecutive frames preceding the target frame, to obtain high-order feature data, and to add the high-order feature data into the video data to form data to be extracted;
a feature vector extraction module, configured to perform temporal feature extraction on the data to be extracted to obtain a feature vector; and
a recognition result obtaining module, configured to obtain an action recognition result according to the feature vector.
7. The action recognition device according to claim 6, wherein:
the receiving module is further configured to receive, before receiving video data, a training data set, the training data set comprising several pieces of training data and a recognition result corresponding to each piece of training data;
the data extraction module is further configured to select one piece of training data from the training data set as data to be trained; and further configured to determine a training frame and several consecutive frames preceding the training frame in the data to be trained, and to extract from the data to be trained the training data information of the training frame and the training data information of the several consecutive frames preceding the training frame;
the high-order feature extraction module is further configured to perform a preset number of convolution operations on a predetermined number of gain parameters to be trained, the training data information of the training frame and the training data information of the several consecutive frames preceding the training frame, to obtain high-order feature training data; and to add the high-order feature training data to the data to be trained, to form training data to be extracted;
the feature vector extraction module is further configured to perform temporal feature extraction on the training data to be extracted, to obtain a training feature vector of the data to be trained;
the recognition result obtaining module is further configured to obtain a prediction result according to the training feature vector;
the action recognition device further comprises a determination module, configured to obtain a cross entropy between the recognition result corresponding to the data to be trained and the prediction result, and to judge whether the cross entropy converges;
the high-order feature extraction module is further configured to, when the determination module judges that the cross entropy converges, take the gain parameters to be trained as the gain parameters, whereupon the receiving module performs the step of receiving video data; and
the high-order feature extraction module is further configured to, when the determination module judges that the cross entropy does not converge, correct the gain parameters to be trained according to the cross entropy; the data extraction module is further configured to select a next piece of training data from the training data set as the data to be trained, and to return to the step of determining the training frame and the several consecutive frames preceding the training frame in the data to be trained.
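The training flow of claim 7 (predict, measure the cross entropy against the labelled recognition result, stop on convergence, otherwise correct the gain parameters and take the next piece of training data) might be sketched as follows; `predict_proba` and `correct_gains` are hypothetical stand-ins for the prediction and correction steps the claim leaves unspecified:

```python
import numpy as np

def train_gain_parameters(training_set, model, tol=1e-4, max_steps=10_000):
    # training_set: sequence of (sample, label) pairs; model is a hypothetical
    # object exposing predict_proba(sample) -> class probabilities and
    # correct_gains(loss), which adjusts the gain parameters to be trained.
    prev_loss = float("inf")
    for step in range(max_steps):
        sample, label = training_set[step % len(training_set)]  # next piece of training data
        probs = model.predict_proba(sample)                     # prediction result
        loss = -np.log(probs[label] + 1e-12)                    # cross entropy with the known recognition result
        if abs(prev_loss - loss) < tol:                         # cross entropy has converged
            break                                               # keep the current gains as the gain parameters
        model.correct_gains(loss)                               # correct the gain parameters to be trained
        prev_loss = loss
    return model
```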
8. The action recognition device according to claim 7, wherein the high-order feature extraction module is specifically configured to:
generate data to be convolved according to the data information of the target frame and the data information of the several consecutive frames preceding the target frame; convolve the data to be convolved with each of the predetermined number of gain parameters respectively, to obtain several convolution results; and splice the several convolution results to obtain the high-order feature data; and
generate training data to be convolved according to the training data information of the training frame and the training data information of the several consecutive frames preceding the training frame; convolve the training data to be convolved with each of the predetermined number of gain parameters to be trained respectively, to obtain several training convolution results; and splice the several training convolution results to obtain the high-order feature training data.
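Assuming the data to be convolved is flattened to one dimension, the convolve-and-splice procedure of claim 8 admits a minimal sketch like the one below; the stacking order and the use of numpy's one-dimensional convolve are illustrative choices, not taken from the patent:

```python
import numpy as np

def high_order_feature_data(target_frame, preceding_frames, gain_params):
    # Data to be convolved: the target frame stacked after its preceding frames.
    to_convolve = np.stack([*preceding_frames, target_frame]).ravel()
    # One convolution per gain parameter (cf. claims 5 and 10).
    results = [np.convolve(to_convolve, g, mode="same") for g in gain_params]
    # Splicing (concatenating) the convolution results gives the high-order feature data.
    return np.concatenate(results)
```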
9. The action recognition device according to claim 6, wherein:
the high-order feature extraction module is specifically configured to: pack the data information of the target frame together with the high-order feature data, to generate updated data information of the target frame; and replace the data information of the target frame in the video data with the updated data information of the target frame, to obtain the data to be extracted;
correspondingly, the feature vector extraction module is specifically configured to: perform feature extraction on the data information of each frame in the data to be extracted respectively, to obtain feature data of each frame; and average the feature data of all frames in the data to be extracted, to obtain the feature vector of the data to be extracted.
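A minimal sketch of the packing-and-replacement step of claim 9, under the assumption that simple concatenation stands in for the unspecified packing operation:

```python
import numpy as np

def form_data_to_extract(video_frames, target_index, high_order_data):
    # Pack the target frame's data information with the high-order feature data.
    updated = np.concatenate([np.ravel(video_frames[target_index]), high_order_data])
    # Replace the target frame's data in the video data with the updated data.
    data_to_extract = list(video_frames)
    data_to_extract[target_index] = updated
    return data_to_extract
```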
10. The action recognition device according to claim 6, wherein each convolution operation corresponds to one gain parameter, and the number of convolution operations is determined according to the number of kinds of high-order feature data.
CN201710470470.8A 2017-06-20 2017-06-20 Action recognition method and device Expired - Fee Related CN109101858B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710470470.8A CN109101858B (en) 2017-06-20 2017-06-20 Action recognition method and device

Publications (2)

Publication Number Publication Date
CN109101858A true CN109101858A (en) 2018-12-28
CN109101858B CN109101858B (en) 2022-02-18

Family

ID=64795666

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710470470.8A Expired - Fee Related CN109101858B (en) 2017-06-20 2017-06-20 Action recognition method and device

Country Status (1)

Country Link
CN (1) CN109101858B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102306301A (en) * 2011-08-26 2012-01-04 中南民族大学 Motion identification system by simulating spiking neuron of primary visual cortex
CN103413154A (en) * 2013-08-29 2013-11-27 北京大学深圳研究生院 Human motion identification method based on normalized class Google measurement matrix
US9230159B1 (en) * 2013-12-09 2016-01-05 Google Inc. Action recognition and detection on videos
CN104217214A (en) * 2014-08-21 2014-12-17 广东顺德中山大学卡内基梅隆大学国际联合研究院 Configurable convolutional neural network based red green blue-distance (RGB-D) figure behavior identification method
CN106407889A (en) * 2016-08-26 2017-02-15 上海交通大学 Video human body interaction motion identification method based on optical flow graph depth learning model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Vivek Veeriah et al.: "Differential recurrent neural networks for action recognition", Proceedings of the IEEE International Conference on Computer Vision *
Yonghong Hou et al.: "Skeleton optical spectra-based action recognition using convolutional neural network", IEEE Transactions on Circuits and Systems for Video Technology *
Zhou Fengyu et al.: "Online human action recognition based on temporal deep belief networks", Acta Automatica Sinica *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598744A (en) * 2019-08-12 2019-12-20 浙江大学 Real-time human body behavior recognition system and method based on inertial sensor and Edge TPU
CN113573076A (en) * 2020-04-29 2021-10-29 华为技术有限公司 Method and apparatus for video encoding
CN112580577A (en) * 2020-12-28 2021-03-30 出门问问(苏州)信息科技有限公司 Training method and device for generating speaker image based on face key points
CN112580577B (en) * 2020-12-28 2023-06-30 出门问问(苏州)信息科技有限公司 Training method and device for generating speaker image based on facial key points

Also Published As

Publication number Publication date
CN109101858B (en) 2022-02-18

Similar Documents

Publication Publication Date Title
Jie et al. Tree-structured reinforcement learning for sequential object localization
CN110826530B (en) Face detection using machine learning
CN108805016B (en) Head and shoulder area detection method and device
CN109271958B (en) Face age identification method and device
CN108230291B (en) Object recognition system training method, object recognition method, device and electronic equipment
Obinata et al. Temporal extension module for skeleton-based action recognition
JP2014524630A5 (en)
CN110472672B (en) Method and apparatus for training machine learning models
CN109101858A (en) Action identification method and device
CN112149651B (en) Facial expression recognition method, device and equipment based on deep learning
CN111914665A (en) Face shielding detection method, device, equipment and storage medium
Zhang et al. A new architecture of feature pyramid network for object detection
Tarasiewicz et al. Skinny: A lightweight U-Net for skin detection and segmentation
CN111291668A (en) Living body detection method, living body detection device, electronic equipment and readable storage medium
CN112307984A (en) Safety helmet detection method and device based on neural network
CN110263872B (en) Training data processing method and device
Tjon et al. Eff-ynet: A dual task network for deepfake detection and segmentation
CN113627256B (en) False video inspection method and system based on blink synchronization and binocular movement detection
CN109447095B (en) Visual attribute identification method, device and storage medium
CN109598201B (en) Action detection method and device, electronic equipment and readable storage medium
JP7073171B2 (en) Learning equipment, learning methods and programs
CN112487903B (en) Gait data generation method and device based on countermeasure network
CN114913588A (en) Face image restoration and recognition method applied to complex scene
CN109409226B (en) Finger vein image quality evaluation method and device based on cascade optimization CNN
CN112329736A (en) Face recognition method and financial system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right
Effective date of registration: 20230410
Address after: 100871 No. 5, the Summer Palace Road, Beijing, Haidian District
Patentee after: Peking University
Address before: 100871 No. 5, the Summer Palace Road, Beijing, Haidian District
Patentee before: Peking University
Patentee before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.
Patentee before: BEIJING FOUNDER ELECTRONICS Co.,Ltd.
CF01 Termination of patent right due to non-payment of annual fee
Granted publication date: 20220218