CN109101858A - Action identification method and device - Google Patents
- Publication number
- CN109101858A (application number CN201710470470.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- training
- target frame
- training data
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
Abstract
The action identification method and device provided by the invention determine a target frame and several consecutive frames preceding it in received video data, and extract from the video data the data information of the target frame and of the several consecutive frames preceding it. A preset number of convolution operations is performed on a predetermined number of gain parameters, the data information of the target frame, and the data information of the preceding frames, yielding high-order feature data. The high-order feature data is added to the video data to form data to be extracted; temporal features are then extracted from the data to be extracted to obtain a feature vector, and an action recognition result is finally obtained from the feature vector. High-order features of the video data can thus be extracted, improving the accuracy of action recognition.
Description
Technical field
The present invention relates to computer vision technology, and in particular to an action identification method and device.
Background art
With the development of computer vision technology, action recognition using video capture devices has become a research focus. Existing action identification methods extract data such as joint positions from a video stream and feed these data into a three-layer bidirectional long short-term memory (LSTM) recurrent neural network, which extracts the dynamic features of the data. The extracted dynamic features are then input to a classifier network to finally obtain the action type corresponding to the video stream data.
However, owing to the limitations of the three-layer bidirectional LSTM recurrent neural network, only the dynamic features of the data over the entire sequence can be extracted; the high-order features of the data at a particular moment cannot. For example, to distinguish "push" from "hit", two actions with similar position data, the "acceleration" at a particular moment or over several consecutive moments must be used. When a three-layer bidirectional LSTM recurrent network identifies "push" or "hit", it can only extract the dynamic feature "average acceleration"; it cannot extract the "instantaneous acceleration" at, immediately before, and immediately after the moment the action occurs. For action types whose identification requires such high-order features of the data, existing action identification methods cannot achieve accurate recognition.
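The push/hit contrast above can be made concrete with a toy calculation. The positions, and the use of the second difference as "instantaneous acceleration", are illustrative assumptions, not taken from the patent:

```python
import numpy as np

# Toy 1-D joint positions over six frames (hypothetical numbers).
pos_push = np.array([0., 1., 2., 3., 4., 5.])  # steady motion
pos_hit  = np.array([0., 0., 0., 3., 3., 3.])  # sudden jerk at one moment

# Second difference as a stand-in for instantaneous acceleration.
acc_push = np.diff(pos_push, n=2)  # [0, 0, 0, 0]
acc_hit  = np.diff(pos_hit,  n=2)  # [0, 3, -3, 0]

# The averages coincide (both 0), so an "average acceleration" feature
# cannot separate the two actions; the instantaneous peak can.
```

Here `acc_push.mean()` equals `acc_hit.mean()`, while the peak magnitudes differ sharply, which is exactly the distinction the background says the LSTM-only pipeline misses.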
Summary of the invention
To solve the technical problem of low recognition accuracy in the prior art, the present invention provides an action identification method and device.
In one aspect, this application provides an action identification method, comprising:
receiving video data;
determining a target frame and several consecutive frames preceding the target frame, and extracting from the video data the data information of the target frame and of the several consecutive frames preceding it;
performing a preset number of convolution operations on a predetermined number of gain parameters, the data information of the target frame, and the data information of the several consecutive frames preceding the target frame, to obtain high-order feature data;
adding the high-order feature data to the video data to form data to be extracted;
performing temporal feature extraction on the data to be extracted to obtain a feature vector;
obtaining an action recognition result according to the feature vector.
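The claimed steps can be sketched end to end on synthetic data. The patent does not specify the convolution form, the LSTM, or the classifier, so the sketch below substitutes a 1-D temporal convolution per gain parameter and fixed random projections for the trained networks; it illustrates the data flow only:

```python
import numpy as np

def recognize(video, gains, n_classes=3, seed=0):
    """Toy run of the claimed pipeline.

    video: (T, D) array, D values per frame (e.g. joint coordinates);
    gains: list of 1-D kernels, one per kind of high-order feature.
    The LSTM and classifier are replaced by random projections.
    """
    t = video.shape[0] - 1                      # take the last frame as target
    window = video[max(0, t - 5):t + 1]         # target + preceding frames
    high_order = np.concatenate(                # one convolution per gain, spliced
        [np.convolve(window.mean(axis=1), g, mode="valid") for g in gains])
    frames = list(video)                        # package: augment the target frame
    frames[t] = np.concatenate([video[t], high_order])
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((8, frames[t].size))
    feats = [W[:, :f.size] @ f for f in frames]  # per-frame feature data
    vec = np.mean(feats, axis=0)                # average over all frames
    C = rng.standard_normal((n_classes, vec.size))
    scores = C @ vec                            # classifier stand-in
    p = np.exp(scores - scores.max()); p /= p.sum()
    return int(np.argmax(p))                    # most probable action type
```

The window length (six frames), projection widths, and class count are arbitrary choices for the sketch.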
Further, before receiving the video data, the method comprises:
receiving a training data set, the training data set including several training data and a recognition result corresponding to each training data;
selecting one training data from the training data set as the data to be trained;
determining a training frame and several consecutive frames preceding the training frame in the data to be trained, and extracting from the data to be trained the training data information of the training frame and of the several consecutive frames preceding it;
performing a preset number of convolution operations on a predetermined number of gain parameters to be trained, the training data information of the training frame, and the training data information of the several consecutive frames preceding the training frame, to obtain high-order feature training data;
adding the high-order feature training data to the data to be trained to form training data to be extracted;
performing temporal feature extraction on the training data to be extracted to obtain a training feature vector of the data to be trained;
obtaining a prediction result according to the training feature vector;
obtaining the cross entropy between the recognition result corresponding to the data to be trained and the prediction result, and judging whether the cross entropy converges;
if it converges, taking the gain parameters to be trained as the gain parameters and executing the step of receiving the video data;
if it does not converge, revising the gain parameters to be trained according to the cross entropy, selecting the next training data from the training data set as the data to be trained, and returning to the step of determining a training frame and several consecutive frames preceding the training frame in the data to be trained.
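The training loop above (predict, compute cross entropy, revise the gains until convergence) can be sketched on toy data. The patent leaves the revision rule open, so this sketch uses a standard gradient step and a logistic predictor purely as stand-ins for the convolution/LSTM/classifier chain; only the loop structure mirrors the claim:

```python
import numpy as np

def train_gains(dataset, gains, lr=0.1, tol=1e-3, max_epochs=200):
    """Toy loop: predict with the gains to be trained, measure cross
    entropy against the labeled recognition result, revise, and stop
    once the cross entropy converges (change below tol)."""
    gains = np.asarray(gains, float)
    prev = np.inf
    for _ in range(max_epochs):
        total = 0.0
        for x, label in dataset:            # x: window features, label: 0/1
            z = float(gains @ x)            # predictor stand-in
            p = 1.0 / (1.0 + np.exp(-z))    # predicted probability
            total += -(label * np.log(p + 1e-12)
                       + (1 - label) * np.log(1 - p + 1e-12))
            gains -= lr * (p - label) * x   # revise gains via the cross entropy
        if abs(prev - total) < tol:         # convergence check
            break
        prev = total
    return gains
```

On a linearly separable toy set the loop drives the predicted probability of the labeled class well above 0.5 before the convergence check fires.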
Further, performing a preset number of convolution operations on the predetermined number of gain parameters, the data information of the target frame, and the data information of the several consecutive frames preceding the target frame to obtain high-order feature data comprises:
generating data to be convolved from the data information of the target frame and of the several consecutive frames preceding it;
convolving the data to be convolved with each of the predetermined number of gain parameters to obtain several convolution results;
splicing the several convolution results to obtain the high-order feature data.
Correspondingly, performing a preset number of convolution operations on the predetermined number of gain parameters to be trained, the training data information of the training frame, and the training data information of the several consecutive frames preceding the training frame to obtain high-order feature training data comprises:
generating training data to be convolved from the training data information of the training frame and of the several consecutive frames preceding it;
convolving the training data to be convolved with each of the predetermined number of training gain parameters to obtain several training convolution results;
splicing the several training convolution results to obtain the high-order feature training data.
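The three sub-steps (build the data to be convolved, convolve with each gain parameter, splice) can be sketched directly. The stacking of frames and the use of 1-D temporal kernels as gain parameters are assumptions, since the patent does not fix the convolution form:

```python
import numpy as np

def high_order_feature_data(target, preceding, gains):
    """Sketch of the three claimed sub-steps.

    target: (D,) target-frame data; preceding: (K, D) preceding frames;
    gains: list of 1-D kernels, one per kind of high-order feature.
    """
    to_convolve = np.vstack([preceding, target[None, :]])   # step 1: build window
    per_gain = [np.apply_along_axis(                        # step 2: one convolution
                    lambda col: np.convolve(col, g, mode="valid"),
                    0, to_convolve).ravel()
                for g in gains]
    return np.concatenate(per_gain)                         # step 3: splice results
```

With a difference kernel `[1, -1]` on linearly increasing positions, every output equals the constant per-frame step, i.e. a velocity-like feature; a wider kernel yields a coarser difference.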
Further, adding the high-order feature data to the video data to form the data to be extracted comprises:
packaging the data information of the target frame with the high-order feature data to generate the data information of an updated target frame;
replacing the data information of the target frame in the video data with the data information of the updated target frame to obtain the data to be extracted.
Correspondingly, performing temporal feature extraction on the data to be extracted to obtain the feature vector comprises:
performing feature extraction on the data information of each frame in the data to be extracted to obtain the feature data of each frame;
averaging the feature data over all frames in the data to be extracted to obtain the feature vector of the data to be extracted.
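The packaging-and-averaging step can be sketched as follows; `feat_fn` is a hypothetical stand-in for the per-frame feature extractor (the patent uses an LSTM for this):

```python
import numpy as np

def build_feature_vector(frames, target_idx, high_order, feat_fn):
    """Package the target frame with its high-order feature data,
    substitute it back into the sequence, extract per-frame feature
    data, and average over all frames."""
    frames = [np.asarray(f, float) for f in frames]
    frames[target_idx] = np.concatenate([frames[target_idx], high_order])
    per_frame = [feat_fn(f) for f in frames]   # feature data of each frame
    return np.mean(per_frame, axis=0)          # average over whole frames
```

For example, with `feat_fn = lambda f: np.array([f.sum(), float(f.size)])`, two frames `[1, 2]` and `[3, 4]`, and high-order data `[5]` appended to the second frame, the per-frame features `[3, 2]` and `[12, 3]` average to `[7.5, 2.5]`.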
Further, each convolution operation corresponds to one gain parameter, and the number of convolution operations is determined by the number of kinds of high-order feature data.
The present invention also provides an action recognition device, comprising:
a receiving module for receiving video data;
a data extraction module for determining a target frame and several consecutive frames preceding the target frame, and for extracting from the video data the data information of the target frame and of the several consecutive frames preceding it;
a high-order feature extraction module for performing a preset number of convolution operations on a predetermined number of gain parameters, the data information of the target frame, and the data information of the several consecutive frames preceding the target frame to obtain high-order feature data, and for adding the high-order feature data to the video data to form data to be extracted;
a feature vector extraction module for performing temporal feature extraction on the data to be extracted to obtain a feature vector;
a recognition result obtaining module for obtaining an action recognition result according to the feature vector.
Further, the receiving module is also used, before the video data is received, to receive a training data set including several training data and a recognition result corresponding to each training data.
The data extraction module is also used to select one training data from the training data set as the data to be trained, to determine a training frame and several consecutive frames preceding the training frame in the data to be trained, and to extract from the data to be trained the training data information of the training frame and of the several consecutive frames preceding it.
The high-order feature extraction module is also used to perform a preset number of convolution operations on a predetermined number of gain parameters to be trained, the training data information of the training frame, and the training data information of the several consecutive frames preceding the training frame to obtain high-order feature training data, and to add the high-order feature training data to the data to be trained to form training data to be extracted.
The feature vector extraction module is also used to perform temporal feature extraction on the training data to be extracted to obtain a training feature vector of the data to be trained.
The recognition result obtaining module is also used to obtain a prediction result according to the training feature vector.
The action recognition device further comprises a determination module for obtaining the cross entropy between the recognition result corresponding to the data to be trained and the prediction result, and for judging whether the cross entropy converges.
The high-order feature extraction module is also used, when the determination module judges that the cross entropy converges, to take the gain parameters to be trained as the gain parameters, whereupon the receiving module executes the step of receiving the video data.
The high-order feature extraction module is also used, when the determination module judges that the cross entropy does not converge, to revise the gain parameters to be trained according to the cross entropy; the data extraction module is then also used to select the next training data from the training data set as the data to be trained and to return to the step of determining a training frame and several consecutive frames preceding the training frame in the data to be trained.
Further, the high-order feature extraction module is specifically used to:
generate data to be convolved from the data information of the target frame and of the several consecutive frames preceding it; convolve the data to be convolved with each of the predetermined number of gain parameters to obtain several convolution results; and splice the several convolution results to obtain the high-order feature data;
generate training data to be convolved from the training data information of the training frame and of the several consecutive frames preceding it; convolve the training data to be convolved with each of the predetermined number of training gain parameters to obtain several training convolution results; and splice the several training convolution results to obtain the high-order feature training data.
Further, the high-order feature extraction module is specifically used to package the data information of the target frame with the high-order feature data to generate the data information of an updated target frame, and to replace the data information of the target frame in the video data with the data information of the updated target frame to obtain the data to be extracted.
Correspondingly, the feature vector extraction module is specifically used to perform feature extraction on the data information of each frame in the data to be extracted to obtain the feature data of each frame, and to average the feature data over all frames in the data to be extracted to obtain the feature vector of the data to be extracted.
Further, each convolution operation corresponds to one gain parameter, and the number of convolution operations is determined by the number of kinds of high-order feature data.
The action identification method and device provided by the invention determine a target frame and several consecutive frames preceding it in the received video data, and extract from the video data the data information of the target frame and of the several consecutive frames preceding it. A preset number of convolution operations is performed on a predetermined number of gain parameters, the data information of the target frame, and the data information of the preceding frames to obtain high-order feature data; the high-order feature data is added to the video data to form data to be extracted; temporal features are extracted from the data to be extracted to obtain a feature vector; and an action recognition result is finally obtained from the feature vector. High-order features of the video data can thus be extracted, improving the accuracy of action recognition.
Brief description of the drawings
Fig. 1 is a schematic flowchart of an action identification method provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic flowchart of an action identification method provided by Embodiment 2 of the present invention;
Fig. 3 is a schematic structural diagram of an action recognition device provided by Embodiment 3 of the present invention;
Fig. 4 is a schematic structural diagram of an action recognition device provided by Embodiment 4 of the present invention.
Specific embodiment
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of an action identification method provided by Embodiment 1 of the present invention. As shown in Fig. 1, the action identification method includes:
Step 101: receive video data.
It should be noted that the executing body of the invention is specifically an action recognition device, which in physical form may be a terminal device composed of hardware such as a processor, a memory, logic circuits and electronic chips.
Specifically, in step 101 a piece of video data is received, the video data containing the data information of several frames. The video data may come from a capture device, be exported from another storage medium, or be downloaded from the network; the present invention places no limitation on this.
Step 102: determine a target frame and several consecutive frames preceding it, and extract from the video data the data information of the target frame and of the several consecutive frames preceding it.
Step 103: perform a preset number of convolution operations on a predetermined number of gain parameters, the data information of the target frame, and the data information of the several consecutive frames preceding the target frame, to obtain high-order feature data.
Each convolution operation may correspond to one gain parameter, and the number of convolution operations is determined by the number of kinds of high-order feature data.
Specifically, each kind of high-order feature data may have one gain parameter corresponding to it, so that each convolution operation uses only one gain parameter. For example, the high-order feature "acceleration" to be extracted may have one gain parameter corresponding to it, so that the high-order feature data obtained after convolution with that gain parameter can characterize the high-order feature "acceleration".
Furthermore, in step 103, data to be convolved can first be generated from the data information of the target frame and of the several consecutive frames preceding it; the data to be convolved is then convolved with each of the predetermined number of gain parameters to obtain several convolution results; finally the several convolution results are spliced to obtain the high-order feature data.
Step 104: add the high-order feature data to the video data to form data to be extracted.
Specifically, the high-order feature data obtained is added to the video data received in step 101 to form the data to be extracted, from which temporal features will be extracted.
Furthermore, in step 104 the data information of the target frame and the high-order feature data can first be packaged to generate the data information of an updated target frame; the data information of the target frame in the video data is then replaced with the data information of the updated target frame to obtain the data to be extracted.
Step 105: perform temporal feature extraction on the data to be extracted to obtain a feature vector.
Specifically, the temporal features of the data to be extracted are extracted using a three-layer bidirectional LSTM recurrent neural network algorithm, yielding a feature vector. Because the data to be extracted already contains the high-order feature data, the feature vector extracted from it can also reflect the high-order features of the video data.
Step 106: obtain an action recognition result according to the feature vector.
Specifically, the feature vector is classified using a classifier network algorithm, yielding several action types matching the feature vector together with their corresponding probabilities; the action type with the highest probability is selected from these as the action recognition result.
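Step 106 can be sketched in a few lines; the softmax normalization and linear class scores are assumptions standing in for the unspecified classifier network:

```python
import numpy as np

def classify(feature_vector, class_weights):
    """Score the feature vector against each action type, normalize the
    scores to probabilities (softmax, assumed), and return the action
    type with the highest probability plus the full distribution."""
    scores = class_weights @ feature_vector
    probs = np.exp(scores - scores.max())   # numerically stable softmax
    probs /= probs.sum()
    return int(np.argmax(probs)), probs
```

With identity class weights and feature vector `[0, 2, 1]`, the second action type wins, as expected from its largest score.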
The action identification method provided by the invention determines a target frame and several consecutive frames preceding it in the received video data, and extracts from the video data the data information of the target frame and of the several consecutive frames preceding it. A preset number of convolution operations is performed on a predetermined number of gain parameters, the data information of the target frame, and the data information of the preceding frames to obtain high-order feature data; the high-order feature data is added to the video data to form data to be extracted; temporal features are extracted from the data to be extracted to obtain a feature vector; and an action recognition result is finally obtained from the feature vector. High-order features of the video data can thus be extracted, improving the accuracy of action recognition.
On the basis of the method shown in Fig. 1, Fig. 2 is a schematic flowchart of an action identification method provided by Embodiment 2 of the present invention. As shown in Fig. 2, the method comprises:
Step 200: receive a training data set, the training data set including several training data and a recognition result corresponding to each training data.
Step 201: select one training data from the training data set as the data to be trained.
Step 202: determine a training frame and several consecutive frames preceding it in the data to be trained, and extract from the data to be trained the training data information of the training frame and of the several consecutive frames preceding it.
To further describe the technical solution provided by Embodiment 2, the video data is taken, by way of example, to contain the position information of each joint of a human body over several frames.
Specifically, unlike Embodiment 1, Embodiment 2 also includes a process for training the gain parameters. In steps 200 to 203, a training data set is first received; it includes several training data and the recognition result corresponding to each training data, i.e. several groups of training data together with the confirmed action type corresponding to each group. Each training data may include the position information of each human joint over several frames, and the recognition result may be the action type expressed by that position information. Then one training data is selected from the training data set as the data to be trained, the training frame to be processed and the several consecutive frames preceding it are determined, and the training data information of the training frame and of the several consecutive frames preceding it is extracted from the data to be trained.
Step 203: perform a preset number of convolution operations on a predetermined number of gain parameters to be trained, the training data information of the training frame, and the training data information of the several consecutive frames preceding the training frame, to obtain high-order feature training data.
Each convolution operation may correspond to one gain parameter, and the number of convolution operations is determined by the number of kinds of high-order feature data.
Specifically, one gain parameter to be trained can be preset for each kind of high-order feature data, so that each convolution operation uses only one gain parameter to be trained. For example, for the high-order feature "acceleration" to be extracted, one gain parameter to be trained can correspond to it, so that the trained gain parameter can be used to extract the high-order feature "acceleration".
Furthermore, training data to be convolved can first be generated from the training data information of the training frame and of the several consecutive frames preceding it; the training data to be convolved is then convolved with each of the predetermined number of training gain parameters to obtain several training convolution results; finally the several training convolution results are spliced to obtain the high-order feature training data.
As an example, suppose the data to be trained contains the training data information of 30 frames, the training data information of each frame includes the position information of 25 human joints, the 20th frame is the training frame, and frames 15-19 are the several consecutive frames preceding the training frame. In the above steps, for the first human joint, the training data to be convolved for that joint can first be obtained, and then convolved with the preset first training gain parameter to obtain the training convolution result of the first human joint. The second human joint is then processed similarly, after first obtaining its training data to be convolved. Once all human joints have been processed against the first training gain parameter, the same processing can be performed for the preset second training gain parameter, until the training convolution results of all training gain parameters are obtained. These training convolution results are then spliced to obtain the high-order feature training data.
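The worked example above (30 frames, 25 joints, the 20th frame as training frame, frames 15-19 preceding it, two training gain parameters) can be run numerically. The joint positions, kernel values, and the joint-wise 1-D convolution are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((30, 25))            # (frames, joints), 1-D per joint
window = data[14:20]                            # frames 15-19 + the 20th frame
gains = [np.array([1.0, -1.0]),                 # e.g. velocity-like kernel
         np.array([1.0, -2.0, 1.0])]            # e.g. acceleration-like kernel

results = []
for g in gains:                                 # one training gain parameter at a time
    per_joint = [np.convolve(window[:, j], g, mode="valid")
                 for j in range(window.shape[1])]   # joint by joint, as in the text
    results.append(np.concatenate(per_joint))
high_order_training = np.concatenate(results)   # splice all convolution results
```

With a six-frame window, the length-2 kernel yields 5 values per joint (125 total) and the length-3 kernel 4 per joint (100 total), so the spliced high-order feature training data has 225 entries.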
Step 204: add the high-order feature training data to the data to be trained to form training data to be extracted.
Step 205: perform temporal feature extraction on the training data to be extracted to obtain a training feature vector of the data to be trained.
Step 206: obtain a prediction result according to the training feature vector.
Specifically, in steps 204 to 206, the data information of the training frame and the high-order feature training data can first be packaged to generate the data information of an updated training frame, and the data information of the training frame in the data to be trained is then replaced with the data information of the updated training frame to obtain the training data to be extracted. Next, a three-layer bidirectional LSTM recurrent neural network algorithm is used to perform temporal feature extraction on the training data to be extracted, yielding the training feature vector of the data to be trained. Finally, the training feature vector is classified using a classifier network algorithm, yielding several predicted action types matching the training feature vector together with their corresponding probabilities; the predicted action type with the highest probability is selected as the prediction result.
Step 207: obtain the cross entropy between the recognition result corresponding to the data to be trained and the prediction result, and judge whether the cross entropy converges.
If not, execute step 208; if so, execute step 209.
Step 208: revise the gain parameters to be trained according to the cross entropy, select the next training data from the training data set as the data to be trained, and return to step 202.
Step 209: take the gain parameters to be trained as the gain parameters.
Specifically, in steps 207 to 209, the recognition result corresponding to the data to be trained and the prediction result can each first be expressed as vectors, the cross entropy of the recognition result and the prediction result is calculated, and whether the cross entropy converges is judged.
When the cross entropy does not converge, the gain parameters to be trained are revised using the cross entropy, for example by taking the sum of a gain parameter to be trained and the cross entropy as the revised gain parameter to be trained. The next training data is then selected from the training data set as the data to be trained, and the process returns to the step of determining the training frame and the several consecutive frames preceding it in the data to be trained. At this point, when step 203 is executed again, the gain parameters to be trained are the revised gain parameters obtained above; that is, whenever the cross entropy does not converge, the revised gain parameters obtained by modifying the gain parameters to be trained take part in the next round of training as the gain parameters to be trained, and the cycle continues until the cross entropy converges.
When the cross entropy converges, the training of the gain parameters to be trained is finished, and the gain parameters to be trained can then be used as the gain parameters for the identification of video data.
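The quantity checked for convergence in steps 207-209 can be computed as follows; the one-hot encoding of the labeled recognition result is an assumption consistent with the text's "expressed as vectors":

```python
import numpy as np

def cross_entropy(target_onehot, predicted_probs, eps=1e-12):
    """Cross entropy between the labeled recognition result (one-hot
    vector, assumed encoding) and the predicted probability
    distribution over action types."""
    p = np.clip(predicted_probs, eps, 1.0)  # guard against log(0)
    return float(-np.sum(target_onehot * np.log(p)))
```

A confident correct prediction drives the cross entropy toward zero, while spreading probability away from the labeled class raises it, which is why a converged (and small, stable) cross entropy signals that training can stop.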
Step 210: receive video data.
Step 211: determine a target frame and several consecutive frames preceding it, and extract from the video data the data information of the target frame and of the several consecutive frames preceding it.
Step 212: perform a preset number of convolution operations on a predetermined number of gain parameters, the data information of the target frame, and the data information of the several consecutive frames preceding the target frame, to obtain high-order feature data.
Step 213: add the high-order feature data to the video data to form data to be extracted.
Step 214: perform temporal feature extraction on the data to be extracted to obtain a feature vector.
Step 215: obtain an action recognition result according to the feature vector.
Specifically, in steps 210 to 215, video data is received first; it may include the position information of each human joint over several frames. The target frame to be processed and the several consecutive frames preceding it are then determined, and the data information of the target frame and of the several consecutive frames preceding it is extracted from the video data. Data to be convolved is generated from the data information of the target frame and of the several consecutive frames preceding it.
Then, using the gain parameter for training predetermined number, the data information of target frame and continuous before target frame
The data information of several frames carries out the process of convolution of preset times, to obtain high-order characteristic.It specifically, can be according to target
The data information of the data information of frame and continuous several frames before target frame is generated to convolved data, then to convolved data point
Convolution is not carried out with the gain parameter of predetermined number, several convolution results is obtained, finally spells several convolution results
It connects, obtains high-order characteristic.As an example it is assumed that including 30 frame of data information, the data letter of each frame in video data
It include the location information of 25 human synovials in breath, the 20th frame therein is target frame, and 15-19 frame is then target frame
Continuous several frames before.In above-mentioned steps, can be directed to the first human synovial, can first obtain the first human synovial to convolution
First human synovial is then carried out process of convolution to convolved data and preset first gain parameter by data, obtains the
The convolution results of one human synovial.Then, be processed similarly for the second human synovial, and obtain the second human synovial to
Convolved data.After the human synovials for obtaining the whole for the first gain parameter are to convolved data, the second gain can be used
Parameter is processed similarly video data, until obtaining the convolution results of whole gain parameters.Then, to these convolution results
Splicing and obtain high-order characteristic, wherein each process of convolution can correspond to a gain parameter, and the number of process of convolution is
It is determined according to the kind number of high-order characteristic.
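The per-joint, per-gain-parameter convolution and splicing walked through above can be sketched as follows. The concrete shapes are assumptions not fixed by the text: each joint is taken as a 3-D position, the window is the target frame plus its 5 preceding frames, and each gain parameter is modelled as one weight vector over that 6-frame window.

```python
# Sketch of the high-order feature computation described above.
# Assumed (not fixed by the text): 3-D joints, a 6-frame window
# (frames 15-19 plus target frame 20), one weight per window position.

NUM_JOINTS = 25
WINDOW = 6

def high_order_features(frames, gains):
    """frames: list of WINDOW frames, each a list of NUM_JOINTS (x, y, z) triples.
    gains: list of weight vectors of length WINDOW (one per convolution pass)."""
    results = []
    for gain in gains:                       # one convolution pass per gain parameter
        per_joint = []
        for j in range(NUM_JOINTS):          # process joints one by one
            conv = [sum(gain[t] * frames[t][j][c] for t in range(WINDOW))
                    for c in range(3)]       # weighted sum over the time window
            per_joint.extend(conv)
        results.append(per_joint)            # one convolution result per gain
    # splice (concatenate) the convolution results into the high-order feature data
    return [v for result in results for v in result]

# Toy usage: 6 frames of 25 joints, 2 gain parameters -> 2 * 25 * 3 = 150 values
frames = [[(float(t), 0.0, 0.0)] * NUM_JOINTS for t in range(WINDOW)]
gains = [[1.0 / WINDOW] * WINDOW, [0.0] * (WINDOW - 1) + [1.0]]
feat = high_order_features(frames, gains)
print(len(feat))  # 150
```

The number of gain parameters directly sets how many convolution results are spliced, matching the statement that the number of convolution operations follows the number of kinds of high-order feature data.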
The high-order feature data is then added to the video data to form the data to be extracted. Specifically, the data information of the target frame, containing the position information of each human joint, is packed together with the high-order feature data obtained in the above steps to generate the updated data information of the target frame, and the data information of the target frame in the video data is replaced with this updated data information, yielding the data to be extracted. In other words, compared with the original video data, the data information of the target frame in the data to be extracted additionally contains the high-order feature data obtained in the above steps, while the data information of all frames other than the target frame remains unchanged.
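The replacement step can be sketched as follows; representing each frame's data information as a flat list of numbers is an assumption made for illustration.

```python
# Sketch of forming the data to be extracted: the target frame's data
# information is replaced by itself packed together with the high-order
# feature data; all other frames are left unchanged.

def form_data_to_extract(video_frames, target_idx, high_order):
    """video_frames: list of per-frame feature lists; high_order: flat list."""
    updated = list(video_frames)                                 # other frames unchanged
    updated[target_idx] = video_frames[target_idx] + high_order  # packed target frame
    return updated

video = [[0.0] * 75 for _ in range(30)]             # 30 frames x 25 joints x (x, y, z)
out = form_data_to_extract(video, 19, [1.0] * 150)  # the 20th frame is index 19
print(len(out[19]), len(out[0]))  # 225 75
```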
Temporal feature extraction is then performed on the data to be extracted to obtain the feature vector. Specifically, a three-layer bidirectional long short-term memory (LSTM) recurrent neural network extracts features from the data information of each frame of the data to be extracted, giving the feature data of each frame. The feature data of the frames can then be spliced directly to obtain the feature vector of the data to be extracted; alternatively, an averaging algorithm can be used to perform mean processing on the feature data of all frames, or a mean-clustering algorithm can be applied to the feature data of all frames, to obtain the feature vector of the data to be extracted. Those skilled in the art may also select other averaging algorithms according to actual needs, and the present invention is not limited in this respect.
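The mean-processing variant can be sketched as below. The per-frame features here are placeholders for the output of the three-layer bidirectional LSTM, which is elided; any equal-length per-frame feature lists work the same way.

```python
# Sketch of the pooling step: per-frame feature data (obtained in the text
# from a three-layer bidirectional LSTM, elided here) is averaged into a
# single feature vector for the whole sequence.

def mean_pool(per_frame_features):
    """per_frame_features: list of equal-length feature lists, one per frame."""
    n = len(per_frame_features)
    dim = len(per_frame_features[0])
    return [sum(f[d] for f in per_frame_features) / n for d in range(dim)]

feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(mean_pool(feats))  # [3.0, 4.0]
```

Splicing instead of averaging would simply concatenate the per-frame lists, trading a fixed-size vector for one that grows with the number of frames.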
Finally, the feature vector is classified using a classifier network algorithm, which produces several action types matching the feature vector together with their corresponding probabilities; the action type with the maximum probability is chosen from these action types as the action recognition result.
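The final selection step can be sketched as follows. The patent does not fix the classifier network, so a plain softmax over raw scores stands in for it here, and the action-type names are made up for illustration.

```python
# Sketch of the final classification step: map classifier scores to
# per-action probabilities and choose the action type with the maximum
# probability as the recognition result.
import math

def classify(scores, action_types):
    """scores: raw classifier outputs, one per action type."""
    exps = [math.exp(s - max(scores)) for s in scores]    # numerically stable softmax
    probs = [e / sum(exps) for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)  # index of maximum probability
    return action_types[best], probs[best]

action, p = classify([0.1, 2.3, 0.5], ["wave", "run", "sit"])
print(action)  # run
```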
The action recognition device provided by the present invention also trains the gain parameters, before performing action recognition on the video data, using a training data set containing several training data items and the recognition result corresponding to each, so as to obtain trained gain parameters. It then receives the video data, determines the target frame and the consecutive frames preceding it, and extracts from the video data the data information of the target frame and the data information of those preceding frames. The preset number of convolution operations is performed on the trained gain parameters of the predetermined number, the data information of the target frame, and the data information of the preceding frames to obtain high-order feature data; the high-order feature data is added to the video data to form the data to be extracted; temporal feature extraction is performed on the data to be extracted to obtain the feature vector; and the action recognition result is finally obtained according to the feature vector. The high-order features of the video data can thus be extracted, improving the accuracy of action recognition.
Fig. 3 is a structural schematic diagram of an action recognition device provided by Embodiment 3 of the present invention. As shown in Fig. 3, the action recognition device provided by Embodiment 3 specifically includes:
a receiving module 10 for receiving video data;
a data extraction module 20 for determining a target frame and the consecutive frames preceding it, and extracting from the video data the data information of the target frame and the data information of those preceding frames;
a high-order feature extraction module 30 for performing a preset number of convolution operations on a predetermined number of gain parameters, the data information of the target frame, and the data information of the preceding frames to obtain high-order feature data, and for adding the high-order feature data to the video data to form the data to be extracted;
a feature vector extraction module 40 for performing temporal feature extraction on the data to be extracted to obtain a feature vector;
a recognition result obtaining module 50 for obtaining the action recognition result according to the feature vector.
Further, each convolution operation corresponds to one gain parameter, and the number of convolution operations is determined according to the number of kinds of high-order feature data.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process and corresponding beneficial effects of the system described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
The action recognition device provided by the present invention receives video data, determines a target frame and the consecutive frames preceding it, and extracts from the video data the data information of the target frame and the data information of those preceding frames. It performs a preset number of convolution operations on a predetermined number of gain parameters, the data information of the target frame, and the data information of the preceding frames to obtain high-order feature data; adds the high-order feature data to the video data to form the data to be extracted; performs temporal feature extraction on the data to be extracted to obtain a feature vector; and finally obtains the action recognition result according to the feature vector. The high-order features of the video data can thus be extracted, improving the accuracy of action recognition.
Building on the structure shown in Fig. 3, Fig. 4 is a structural schematic diagram of an action recognition device provided by Embodiment 4 of the present invention. As shown in Fig. 4, as in Embodiment 3, the action recognition device includes:
a receiving module 10 for receiving video data;
a data extraction module 20 for determining a target frame and the consecutive frames preceding it, and extracting from the video data the data information of the target frame and the data information of those preceding frames;
a high-order feature extraction module 30 for performing a preset number of convolution operations on a predetermined number of gain parameters, the data information of the target frame, and the data information of the preceding frames to obtain high-order feature data, and for adding the high-order feature data to the video data to form the data to be extracted;
a feature vector extraction module 40 for performing temporal feature extraction on the data to be extracted to obtain a feature vector;
a recognition result obtaining module 50 for obtaining the action recognition result according to the feature vector.
It differs from Embodiment 3 as follows:
The receiving module 10 is further configured to receive, before receiving the video data, a training data set containing several training data items and the recognition result corresponding to each training data item.

The data extraction module 20 is further configured to choose one training data item from the training data set as the to-be-trained data; to determine the training frame in the to-be-trained data and the consecutive frames preceding it; and to extract from the to-be-trained data the training data information of the training frame and the training data information of those preceding frames.

The high-order feature extraction module 30 is further configured to perform the preset number of convolution operations on the to-be-trained gain parameters of the predetermined number, the training data information of the training frame, and the training data information of the preceding frames, obtaining high-order feature training data, and to add the high-order feature training data to the to-be-trained data to form the to-be-extracted training data.

The feature vector extraction module 40 is further configured to perform temporal feature extraction on the to-be-extracted training data to obtain the training feature vector of the to-be-trained data.

The recognition result obtaining module 50 is further configured to obtain a prediction result according to the training feature vector.

The action recognition device further includes a determination module 60 for obtaining the cross entropy between the recognition result corresponding to the to-be-trained data and the prediction result, and for judging whether the cross entropy converges.

The high-order feature extraction module 30 is further configured to use the to-be-trained gain parameters as the gain parameters when the determination module 60 judges that the cross entropy converges, after which the receiving module 10 performs the step of receiving the video data.

The high-order feature extraction module 30 is further configured to correct the to-be-trained gain parameters according to the cross entropy when the determination module 60 judges that the cross entropy does not converge; the data extraction module 20 is then further configured to choose the next training data item from the training data set as the to-be-trained data and to return to the step of determining the training frame in the to-be-trained data and the consecutive frames preceding it.
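The training flow just described, in which the device predicts, measures cross entropy against the labelled recognition result, and either accepts the gain parameters on convergence or corrects them and moves on, can be sketched as follows. The convergence test (change in cross entropy below a tolerance) and the correction rule are illustrative assumptions, since the text does not specify either.

```python
# Sketch of the gain-parameter training loop: compute a prediction,
# measure cross entropy against the labelled result, accept the gains
# when the cross entropy converges, otherwise correct them and continue.
import math

def train_gains(samples, gains, predict, correct, tol=1e-3, max_rounds=100):
    """samples: list of (training_data, one_hot_label) pairs.
    predict(data, gains) must return a probability distribution over actions.
    correct(gains, ce) returns corrected gains; both are caller-supplied."""
    prev = float("inf")
    for _ in range(max_rounds):
        for data, label in samples:
            probs = predict(data, gains)
            ce = -sum(y * math.log(max(p, 1e-12)) for y, p in zip(label, probs))
            if abs(prev - ce) < tol:      # cross entropy has converged
                return gains              # keep the trained gain parameters
            gains = correct(gains, ce)    # not converged: correct and continue
            prev = ce
    return gains

# Toy usage with a fixed predictor and a no-op correction rule.
def predict(data, gains):
    return [0.9, 0.1]

def correct(gains, ce):
    return list(gains)

trained = train_gains([([1.0], [1, 0])], [1.0], predict, correct)
print(trained)  # [1.0]
```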
Further, the high-order feature extraction module 30 is specifically configured to: generate the to-be-convolved data according to the data information of the target frame and the data information of the consecutive frames preceding it; convolve the to-be-convolved data with each of the predetermined number of gain parameters to obtain several convolution results; splice the several convolution results to obtain the high-order feature data; generate the to-be-convolved training data according to the training data information of the training frame and the training data information of the consecutive frames preceding it; convolve the to-be-convolved training data with each of the predetermined number of to-be-trained gain parameters to obtain several training convolution results; and splice the several training convolution results to obtain the high-order feature training data.
Further, the high-order feature extraction module 30 is specifically configured to pack the data information of the target frame together with the high-order feature data to generate the updated data information of the target frame, and to replace the data information of the target frame in the video data with the updated data information, obtaining the data to be extracted.

Correspondingly, the feature vector extraction module 40 is specifically configured to perform feature extraction on the data information of each frame of the data to be extracted to obtain the feature data of each frame, and to perform mean processing on the feature data of all frames of the data to be extracted to obtain the feature vector of the data to be extracted.
Further, each convolution operation corresponds to one gain parameter, and the number of convolution operations is determined according to the number of kinds of high-order feature data.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process and corresponding beneficial effects of the system described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
The action recognition device provided by the present invention also trains the gain parameters, before performing action recognition on the video data, using a training data set containing several training data items and the recognition result corresponding to each, so as to obtain trained gain parameters. It then receives the video data, determines the target frame and the consecutive frames preceding it, and extracts from the video data the data information of the target frame and the data information of those preceding frames. The preset number of convolution operations is performed on the trained gain parameters of the predetermined number, the data information of the target frame, and the data information of the preceding frames to obtain high-order feature data; the high-order feature data is added to the video data to form the data to be extracted; temporal feature extraction is performed on the data to be extracted to obtain the feature vector; and the action recognition result is finally obtained according to the feature vector. The high-order features of the video data can thus be extracted, improving the accuracy of action recognition.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be completed by program instructions running on related hardware. The aforementioned program may be stored in a computer-readable storage medium; when executed, it performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced with equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. An action identification method, characterized by comprising:
receiving video data;
determining a target frame and several consecutive frames preceding the target frame, and extracting from the video data the data information of the target frame and the data information of the consecutive frames preceding the target frame;
performing a preset number of convolution operations on a predetermined number of gain parameters, the data information of the target frame, and the data information of the consecutive frames preceding the target frame, to obtain high-order feature data;
adding the high-order feature data to the video data to form data to be extracted;
performing temporal feature extraction on the data to be extracted to obtain a feature vector; and
obtaining an action recognition result according to the feature vector.
2. The action identification method according to claim 1, characterized in that, before the receiving of the video data, the method comprises:
receiving a training data set, the training data set comprising several training data items and a recognition result corresponding to each training data item;
choosing one training data item from the training data set as to-be-trained data;
determining a training frame in the to-be-trained data and several consecutive frames preceding the training frame, and extracting from the to-be-trained data the training data information of the training frame and the training data information of the consecutive frames preceding the training frame;
performing a preset number of convolution operations on a predetermined number of to-be-trained gain parameters, the training data information of the training frame, and the training data information of the consecutive frames preceding the training frame, to obtain high-order feature training data;
adding the high-order feature training data to the to-be-trained data to form to-be-extracted training data;
performing temporal feature extraction on the to-be-extracted training data to obtain a training feature vector of the to-be-trained data;
obtaining a prediction result according to the training feature vector;
obtaining a cross entropy between the recognition result corresponding to the to-be-trained data and the prediction result, and judging whether the cross entropy converges;
if it converges, using the to-be-trained gain parameters as the gain parameters and performing the step of receiving the video data; and
if it does not converge, correcting the to-be-trained gain parameters according to the cross entropy, choosing a next training data item from the training data set as the to-be-trained data, and returning to the step of determining the training frame in the to-be-trained data and the consecutive frames preceding the training frame.
3. The action identification method according to claim 2, characterized in that the performing of the preset number of convolution operations on the predetermined number of gain parameters, the data information of the target frame, and the data information of the consecutive frames preceding the target frame to obtain the high-order feature data comprises:
generating to-be-convolved data according to the data information of the target frame and the data information of the consecutive frames preceding the target frame;
convolving the to-be-convolved data with each of the predetermined number of gain parameters to obtain several convolution results; and
splicing the several convolution results to obtain the high-order feature data;
and correspondingly, the performing of the preset number of convolution operations on the predetermined number of to-be-trained gain parameters, the training data information of the training frame, and the training data information of the consecutive frames preceding the training frame to obtain the high-order feature training data comprises:
generating to-be-convolved training data according to the training data information of the training frame and the training data information of the consecutive frames preceding the training frame;
convolving the to-be-convolved training data with each of the predetermined number of to-be-trained gain parameters to obtain several training convolution results; and
splicing the several training convolution results to obtain the high-order feature training data.
4. The action identification method according to claim 1, characterized in that the adding of the high-order feature data to the video data to form the data to be extracted comprises:
packing the data information of the target frame and the high-order feature data to generate updated data information of the target frame; and
replacing the data information of the target frame in the video data with the updated data information of the target frame to obtain the data to be extracted;
and correspondingly, the performing of the temporal feature extraction on the data to be extracted to obtain the feature vector comprises:
performing feature extraction on the data information of each frame of the data to be extracted to obtain feature data of each frame; and
performing mean processing on the feature data of all frames of the data to be extracted to obtain the feature vector of the data to be extracted.
5. The action identification method according to claim 1, characterized in that each convolution operation corresponds to one gain parameter, and the number of convolution operations is determined according to the number of kinds of high-order feature data.
6. An action recognition device, characterized by comprising:
a receiving module for receiving video data;
a data extraction module for determining a target frame and several consecutive frames preceding the target frame, and extracting from the video data the data information of the target frame and the data information of the consecutive frames preceding the target frame;
a high-order feature extraction module for performing a preset number of convolution operations on a predetermined number of gain parameters, the data information of the target frame, and the data information of the consecutive frames preceding the target frame to obtain high-order feature data, and for adding the high-order feature data to the video data to form data to be extracted;
a feature vector extraction module for performing temporal feature extraction on the data to be extracted to obtain a feature vector; and
a recognition result obtaining module for obtaining an action recognition result according to the feature vector.
7. The action recognition device according to claim 6, characterized in that:
the receiving module is further configured to receive, before receiving the video data, a training data set, the training data set comprising several training data items and a recognition result corresponding to each training data item;
the data extraction module is further configured to choose one training data item from the training data set as to-be-trained data, to determine a training frame in the to-be-trained data and several consecutive frames preceding the training frame, and to extract from the to-be-trained data the training data information of the training frame and the training data information of the consecutive frames preceding the training frame;
the high-order feature extraction module is further configured to perform a preset number of convolution operations on a predetermined number of to-be-trained gain parameters, the training data information of the training frame, and the training data information of the consecutive frames preceding the training frame to obtain high-order feature training data, and to add the high-order feature training data to the to-be-trained data to form to-be-extracted training data;
the feature vector extraction module is further configured to perform temporal feature extraction on the to-be-extracted training data to obtain a training feature vector of the to-be-trained data;
the recognition result obtaining module is further configured to obtain a prediction result according to the training feature vector;
the action recognition device further comprises a determination module for obtaining a cross entropy between the recognition result corresponding to the to-be-trained data and the prediction result and for judging whether the cross entropy converges;
the high-order feature extraction module is further configured to use the to-be-trained gain parameters as the gain parameters when the determination module judges that the cross entropy converges, the receiving module then performing the step of receiving the video data; and
the high-order feature extraction module is further configured to correct the to-be-trained gain parameters according to the cross entropy when the determination module judges that the cross entropy does not converge, the data extraction module being further configured to choose a next training data item from the training data set as the to-be-trained data and to return to the step of determining the training frame in the to-be-trained data and the consecutive frames preceding the training frame.
8. The action recognition device according to claim 7, characterized in that the high-order feature extraction module is specifically configured to:
generate to-be-convolved data according to the data information of the target frame and the data information of the consecutive frames preceding the target frame; convolve the to-be-convolved data with each of the predetermined number of gain parameters to obtain several convolution results; and splice the several convolution results to obtain the high-order feature data; and
generate to-be-convolved training data according to the training data information of the training frame and the training data information of the consecutive frames preceding the training frame; convolve the to-be-convolved training data with each of the predetermined number of to-be-trained gain parameters to obtain several training convolution results; and splice the several training convolution results to obtain the high-order feature training data.
9. The action recognition device according to claim 6, characterized in that:
the high-order feature extraction module is specifically configured to pack the data information of the target frame and the high-order feature data to generate updated data information of the target frame, and to replace the data information of the target frame in the video data with the updated data information of the target frame to obtain the data to be extracted; and
correspondingly, the feature vector extraction module is specifically configured to perform feature extraction on the data information of each frame of the data to be extracted to obtain feature data of each frame, and to perform mean processing on the feature data of all frames of the data to be extracted to obtain the feature vector of the data to be extracted.
10. The action recognition device according to claim 6, characterized in that each convolution operation corresponds to one gain parameter, and the number of convolution operations is determined according to the number of kinds of high-order feature data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710470470.8A CN109101858B (en) | 2017-06-20 | 2017-06-20 | Action recognition method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710470470.8A CN109101858B (en) | 2017-06-20 | 2017-06-20 | Action recognition method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109101858A true CN109101858A (en) | 2018-12-28 |
CN109101858B CN109101858B (en) | 2022-02-18 |
Family
ID=64795666
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710470470.8A Expired - Fee Related CN109101858B (en) | 2017-06-20 | 2017-06-20 | Action recognition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109101858B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110598744A (en) * | 2019-08-12 | 2019-12-20 | 浙江大学 | Real-time human body behavior recognition system and method based on inertial sensor and Edge TPU |
CN112580577A (en) * | 2020-12-28 | 2021-03-30 | 出门问问(苏州)信息科技有限公司 | Training method and device for generating speaker image based on face key points |
CN113573076A (en) * | 2020-04-29 | 2021-10-29 | 华为技术有限公司 | Method and apparatus for video encoding |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102306301A (en) * | 2011-08-26 | 2012-01-04 | 中南民族大学 | Motion identification system by simulating spiking neuron of primary visual cortex |
CN103413154A (en) * | 2013-08-29 | 2013-11-27 | 北京大学深圳研究生院 | Human motion identification method based on normalized class Google measurement matrix |
CN104217214A (en) * | 2014-08-21 | 2014-12-17 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Configurable convolutional neural network based red green blue-distance (RGB-D) figure behavior identification method |
US9230159B1 (en) * | 2013-12-09 | 2016-01-05 | Google Inc. | Action recognition and detection on videos |
CN106407889A (en) * | 2016-08-26 | 2017-02-15 | 上海交通大学 | Video human body interaction motion identification method based on optical flow graph depth learning model |
- 2017
  - 2017-06-20 CN CN201710470470.8A patent/CN109101858B/en not_active Expired - Fee Related
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102306301A (en) * | 2011-08-26 | 2012-01-04 | 中南民族大学 | Motion identification system by simulating spiking neuron of primary visual cortex |
CN103413154A (en) * | 2013-08-29 | 2013-11-27 | 北京大学深圳研究生院 | Human motion identification method based on normalized class Google measurement matrix |
US9230159B1 (en) * | 2013-12-09 | 2016-01-05 | Google Inc. | Action recognition and detection on videos |
CN104217214A (en) * | 2014-08-21 | 2014-12-17 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Configurable convolutional neural network based red green blue-distance (RGB-D) figure behavior identification method |
CN106407889A (en) * | 2016-08-26 | 2017-02-15 | 上海交通大学 | Video human body interaction motion identification method based on optical flow graph depth learning model |
Non-Patent Citations (3)
Title |
---|
VIVEK VEERIAH ET AL.: "Differential recurrent neural networks for action recognition", 《PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION》 * |
YONGHONG HOU ET AL.: "Skeleton optical spectra-based action recognition using convolutional neural network", 《IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY》 * |
ZHOU FENGYU ET AL.: "Online human action recognition based on temporal deep belief networks", 《ACTA AUTOMATICA SINICA》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110598744A (en) * | 2019-08-12 | 2019-12-20 | 浙江大学 | Real-time human body behavior recognition system and method based on inertial sensor and Edge TPU |
CN113573076A (en) * | 2020-04-29 | 2021-10-29 | 华为技术有限公司 | Method and apparatus for video encoding |
CN112580577A (en) * | 2020-12-28 | 2021-03-30 | 出门问问(苏州)信息科技有限公司 | Training method and device for generating speaker image based on face key points |
CN112580577B (en) * | 2020-12-28 | 2023-06-30 | 出门问问(苏州)信息科技有限公司 | Training method and device for generating speaker image based on facial key points |
Also Published As
Publication number | Publication date |
---|---|
CN109101858B (en) | 2022-02-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Jie et al. | Tree-structured reinforcement learning for sequential object localization | |
CN110826530B (en) | Face detection using machine learning | |
CN108805016B (en) | Head and shoulder area detection method and device | |
CN109271958B (en) | Face age identification method and device | |
CN108230291B (en) | Object recognition system training method, object recognition method, device and electronic equipment | |
Obinata et al. | Temporal extension module for skeleton-based action recognition | |
JP2014524630A5 (en) | ||
CN110472672B (en) | Method and apparatus for training machine learning models | |
CN109101858A (en) | Action identification method and device | |
CN112149651B (en) | Facial expression recognition method, device and equipment based on deep learning | |
CN111914665A (en) | Face shielding detection method, device, equipment and storage medium | |
Zhang et al. | A new architecture of feature pyramid network for object detection | |
Tarasiewicz et al. | Skinny: A lightweight U-Net for skin detection and segmentation | |
CN111291668A (en) | Living body detection method, living body detection device, electronic equipment and readable storage medium | |
CN112307984A (en) | Safety helmet detection method and device based on neural network | |
CN110263872B (en) | Training data processing method and device | |
Tjon et al. | Eff-ynet: A dual task network for deepfake detection and segmentation | |
CN113627256B (en) | False video inspection method and system based on blink synchronization and binocular movement detection | |
CN109447095B (en) | Visual attribute identification method, device and storage medium | |
CN109598201B (en) | Action detection method and device, electronic equipment and readable storage medium | |
JP7073171B2 (en) | Learning equipment, learning methods and programs | |
CN112487903B (en) | Gait data generation method and device based on countermeasure network | |
CN114913588A (en) | Face image restoration and recognition method applied to complex scene | |
CN109409226B (en) | Finger vein image quality evaluation method and device based on cascade optimization CNN | |
CN112329736A (en) | Face recognition method and financial system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 20230410 |
Address after: 100871 No. 5, the Summer Palace Road, Haidian District, Beijing |
Patentee after: Peking University |
Address before: 100871 No. 5, the Summer Palace Road, Haidian District, Beijing |
Patentee before: Peking University |
Patentee before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd. |
Patentee before: BEIJING FOUNDER ELECTRONICS Co.,Ltd. |
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20220218 |