CN110210430A - Behavior recognition method and device - Google Patents
Behavior recognition method and device
- Publication number
- CN110210430A CN110210430A CN201910491344.XA CN201910491344A CN110210430A CN 110210430 A CN110210430 A CN 110210430A CN 201910491344 A CN201910491344 A CN 201910491344A CN 110210430 A CN110210430 A CN 110210430A
- Authority
- CN
- China
- Prior art keywords
- scoring
- behavior classification
- video
- current
- identified
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
Abstract
The present invention provides a behavior recognition method and device. The method comprises: presetting at least two behavior categories; dividing a video to be identified into at least two video segments; for each video segment, extracting a key frame, a stacked optical flow, and consecutive-frame still images, performing behavior recognition separately on each of these three inputs, and determining a first score, a second score, and a third score for each behavior category of the current video segment; determining, from the first, second, and third scores of each behavior category of every video segment, the spatial-stream score, the temporal-stream score, and the 3D score of each behavior category of the video to be identified; and generating, from the spatial-stream score, temporal-stream score, and 3D score of each behavior category, the final score of each behavior category of the video to be identified. The method and device can improve the accuracy of behavior recognition.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a behavior recognition method and device.
Background technique
Behavior recognition in video refers to automatically analyzing a video to identify the behavior performed by a human body in it. The simplest form of behavior recognition, also called behavior classification, categorizes the human behavior in an unknown video into one of several predefined behavior categories.
In the prior art, several frames of still images are extracted from the video to be identified, and behavior recognition is performed on these still images to generate the final recognition result.
As can be seen from the above description, the prior art considers only the appearance information in still images when performing behavior recognition, so the recognition result is inaccurate.
Summary of the invention
Embodiments of the present invention provide a behavior recognition method and device, which can improve the accuracy of behavior recognition.
In one aspect, an embodiment of the present invention provides a behavior recognition method, comprising:
presetting at least two behavior categories;
dividing a video to be identified into at least two video segments;
for each video segment, performing: extracting a key frame, a stacked optical flow, and consecutive-frame still images of the current video segment; performing behavior recognition on the current video segment according to the key frame to determine a first score for each behavior category of the current video segment; performing behavior recognition on the current video segment according to the stacked optical flow to determine a second score for each behavior category of the current video segment; and performing behavior recognition on the current video segment according to the consecutive-frame still images to determine a third score for each behavior category of the current video segment;
determining, according to the first score of each behavior category of each video segment, the spatial-stream score of each behavior category of the video to be identified; determining, according to the second score of each behavior category of each video segment, the temporal-stream score of each behavior category of the video to be identified; and determining, according to the third score of each behavior category of each video segment, the 3D score of each behavior category of the video to be identified;
generating, according to the spatial-stream score, temporal-stream score, and 3D score of each behavior category of the video to be identified, the final score of each behavior category of the video to be identified.
Optionally, the method further comprises:
presetting a weight for the spatial-stream score, a weight for the temporal-stream score, and a weight for the 3D score;
wherein generating the final score of each behavior category of the video to be identified comprises:
for each behavior category, performing: determining the final score of the current behavior category of the video to be identified using formula four, according to the spatial-stream score, temporal-stream score, and 3D score of the current behavior category and the three preset weights, wherein formula four is:
O = aS + bT + cM;
where O is the final score of the current behavior category of the video to be identified, S is the spatial-stream score of the current behavior category, T is the temporal-stream score of the current behavior category, M is the 3D score of the current behavior category, a is the weight of the spatial-stream score, b is the weight of the temporal-stream score, and c is the weight of the 3D score.
Optionally, generating the final score of each behavior category of the video to be identified comprises:
inputting the spatial-stream score, temporal-stream score, and 3D score of each behavior category of the video to be identified into a trained linear SVM classifier, and determining the final score of each behavior category of the video to be identified using the linear SVM classifier;
wherein the kernel function of the linear SVM classifier is:
k(x, x_i) = ((x · x_i) + 1)^d, where d is a preset constant and a positive integer.
Optionally, performing behavior recognition on the current video segment according to the key frame to determine the first score of each behavior category of the current video segment comprises:
inputting the key frame of the current video segment into a trained 2D-convolution spatial-stream model, and performing behavior recognition on the key frame of the current video segment with the 2D-convolution spatial-stream model to determine the first score of each behavior category of the current video segment.
Optionally, performing behavior recognition on the current video segment according to the stacked optical flow to determine the second score of each behavior category of the current video segment comprises:
inputting the stacked optical flow of the current video segment into a trained 2D-convolution temporal-stream model, and performing behavior recognition on the stacked optical flow of the current video segment with the 2D-convolution temporal-stream model to determine the second score of each behavior category of the current video segment.
Optionally, performing behavior recognition on the current video segment according to the consecutive-frame still images to determine the third score of each behavior category of the current video segment comprises:
inputting the consecutive-frame still images of the current video segment into a trained 3D convolution model, and performing behavior recognition on the consecutive-frame still images of the current video segment with the 3D convolution model to determine the third score of each behavior category of the current video segment.
Optionally, determining the spatial-stream score of each behavior category of the video to be identified according to the first score of each behavior category of each video segment comprises:
for each behavior category, performing: determining the spatial-stream score of the current behavior category of the video to be identified using formula one, according to the first scores of the current behavior category of the video segments, wherein formula one is:
Vid_α = (1/K) Σ_{k=1}^{K} P_k^α
where Vid_α is the spatial-stream score of the current behavior category of the video to be identified, K is the total number of the at least two video segments, and P_k^α is the first score of the current behavior category of the k-th video segment.
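The image for formula one is not reproduced in the text; a natural reading of the surrounding definitions (Vid_α obtained from the per-segment first scores over K segments) is an average consensus, sketched below in Python. The symbol P_k^α for the k-th segment's score is an assumption, since the original formula is lost.

```python
def video_level_score(segment_scores):
    """Formula one, read as an average consensus (assumed):
    Vid_alpha = (1/K) * sum_{k=1}^{K} P_k_alpha,
    where segment_scores holds the first score of the current
    behavior category for each of the K video segments."""
    return sum(segment_scores) / len(segment_scores)
```

Formulas two and three for the temporal-stream and 3D scores would take the same form over the second and third scores, respectively.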
Optionally, determining the temporal-stream score of each behavior category of the video to be identified according to the second score of each behavior category of each video segment comprises:
for each behavior category, performing: determining the temporal-stream score of the current behavior category of the video to be identified using formula two, according to the second scores of the current behavior category of the video segments, wherein formula two is:
Vid_β = (1/K) Σ_{k=1}^{K} P_k^β
where Vid_β is the temporal-stream score of the current behavior category of the video to be identified, K is the total number of the at least two video segments, and P_k^β is the second score of the current behavior category of the k-th video segment.
Optionally, determining the 3D score of each behavior category of the video to be identified according to the third score of each behavior category of each video segment comprises:
for each behavior category, performing: determining the 3D score of the current behavior category of the video to be identified using formula three, according to the third scores of the current behavior category of the video segments, wherein formula three is:
Vid_γ = (1/K) Σ_{k=1}^{K} P_k^γ
where Vid_γ is the 3D score of the current behavior category of the video to be identified, K is the total number of the at least two video segments, and P_k^γ is the third score of the current behavior category of the k-th video segment.
In another aspect, an embodiment of the present invention provides a behavior recognition device, comprising:
a first setting unit, configured to set at least two behavior categories;
a cutting unit, configured to divide a video to be identified into at least two video segments;
a fragment processing unit, configured to, for each video segment, perform: extracting a key frame, a stacked optical flow, and consecutive-frame still images of the current video segment; performing behavior recognition on the current video segment according to the key frame to determine a first score for each behavior category of the current video segment; performing behavior recognition on the current video segment according to the stacked optical flow to determine a second score for each behavior category of the current video segment; and performing behavior recognition on the current video segment according to the consecutive-frame still images to determine a third score for each behavior category of the current video segment;
a segment composition unit, configured to determine the spatial-stream score of each behavior category of the video to be identified according to the first score of each behavior category of each video segment, determine the temporal-stream score of each behavior category of the video to be identified according to the second score of each behavior category of each video segment, and determine the 3D score of each behavior category of the video to be identified according to the third score of each behavior category of each video segment;
a final integration unit, configured to generate the final score of each behavior category of the video to be identified according to the spatial-stream score, temporal-stream score, and 3D score of each behavior category of the video to be identified.
Optionally, the device further comprises:
a second setting unit, configured to set a weight for the spatial-stream score, a weight for the temporal-stream score, and a weight for the 3D score;
wherein the final integration unit is configured to, for each behavior category, perform: determining the final score of the current behavior category of the video to be identified using formula four, according to the spatial-stream score, temporal-stream score, and 3D score of the current behavior category and the three preset weights, wherein formula four is:
O = aS + bT + cM;
where O is the final score of the current behavior category of the video to be identified, S is the spatial-stream score of the current behavior category, T is the temporal-stream score of the current behavior category, M is the 3D score of the current behavior category, a is the weight of the spatial-stream score, b is the weight of the temporal-stream score, and c is the weight of the 3D score.
Optionally, the final integration unit is configured to input the spatial-stream score, temporal-stream score, and 3D score of each behavior category of the video to be identified into a trained linear SVM classifier, and determine the final score of each behavior category of the video to be identified using the linear SVM classifier;
wherein the kernel function of the linear SVM classifier is:
k(x, x_i) = ((x · x_i) + 1)^d, where d is a preset constant and a positive integer.
Optionally, when performing behavior recognition on the current video segment according to the key frame to determine the first score of each behavior category of the current video segment, the fragment processing unit is specifically configured to:
input the key frame of the current video segment into a trained 2D-convolution spatial-stream model, and perform behavior recognition on the key frame of the current video segment with the 2D-convolution spatial-stream model to determine the first score of each behavior category of the current video segment.
Optionally, when performing behavior recognition on the current video segment according to the stacked optical flow to determine the second score of each behavior category of the current video segment, the fragment processing unit is specifically configured to:
input the stacked optical flow of the current video segment into a trained 2D-convolution temporal-stream model, and perform behavior recognition on the stacked optical flow of the current video segment with the 2D-convolution temporal-stream model to determine the second score of each behavior category of the current video segment.
Optionally, when performing behavior recognition on the current video segment according to the consecutive-frame still images to determine the third score of each behavior category of the current video segment, the fragment processing unit is specifically configured to:
input the consecutive-frame still images of the current video segment into a trained 3D convolution model, and perform behavior recognition on the consecutive-frame still images of the current video segment with the 3D convolution model to determine the third score of each behavior category of the current video segment.
Optionally, when determining the spatial-stream score of each behavior category of the video to be identified according to the first score of each behavior category of each video segment, the segment composition unit is specifically configured to:
for each behavior category, perform: determining the spatial-stream score of the current behavior category of the video to be identified using formula one, according to the first scores of the current behavior category of the video segments, wherein formula one is:
Vid_α = (1/K) Σ_{k=1}^{K} P_k^α
where Vid_α is the spatial-stream score of the current behavior category of the video to be identified, K is the total number of the at least two video segments, and P_k^α is the first score of the current behavior category of the k-th video segment.
Optionally, when determining the temporal-stream score of each behavior category of the video to be identified according to the second score of each behavior category of each video segment, the segment composition unit is specifically configured to:
for each behavior category, perform: determining the temporal-stream score of the current behavior category of the video to be identified using formula two, according to the second scores of the current behavior category of the video segments, wherein formula two is:
Vid_β = (1/K) Σ_{k=1}^{K} P_k^β
where Vid_β is the temporal-stream score of the current behavior category of the video to be identified, K is the total number of the at least two video segments, and P_k^β is the second score of the current behavior category of the k-th video segment.
Optionally, when determining the 3D score of each behavior category of the video to be identified according to the third score of each behavior category of each video segment, the segment composition unit is specifically configured to:
for each behavior category, perform: determining the 3D score of the current behavior category of the video to be identified using formula three, according to the third scores of the current behavior category of the video segments, wherein formula three is:
Vid_γ = (1/K) Σ_{k=1}^{K} P_k^γ
where Vid_γ is the 3D score of the current behavior category of the video to be identified, K is the total number of the at least two video segments, and P_k^γ is the third score of the current behavior category of the k-th video segment.
In embodiments of the present invention, the video to be identified is divided into at least two video segments, and a key frame, a stacked optical flow, and consecutive-frame still images are extracted from each video segment. Behavior recognition is performed on each video segment from these three aspects, and the results are then fused into a final recognition result for the video to be identified. Because the video is segmented and each segment is analyzed from the three aspects, behavior recognition is performed from the temporal, spatial, and spatio-temporal angles, so the video to be identified can be recognized comprehensively based on more angles and more information; fusing these results into a final recognition result greatly improves the accuracy of behavior recognition.
Brief description of the drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a behavior recognition method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of another behavior recognition method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of a behavior recognition device provided by an embodiment of the present invention.
Specific embodiment
In order to make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative work shall fall within the protection scope of the present invention.
As shown in Fig. 1, an embodiment of the present invention provides a behavior recognition method, which may comprise the following steps:
Step 101: preset at least two behavior categories;
Step 102: divide the video to be identified into at least two video segments;
Step 103: for each video segment, perform: extract a key frame, a stacked optical flow, and consecutive-frame still images of the current video segment; perform behavior recognition on the current video segment according to the key frame to determine the first score of each behavior category of the current video segment; perform behavior recognition on the current video segment according to the stacked optical flow to determine the second score of each behavior category of the current video segment; perform behavior recognition on the current video segment according to the consecutive-frame still images to determine the third score of each behavior category of the current video segment;
Step 104: determine the spatial-stream score of each behavior category of the video to be identified according to the first score of each behavior category of each video segment; determine the temporal-stream score of each behavior category of the video to be identified according to the second score of each behavior category of each video segment; determine the 3D score of each behavior category of the video to be identified according to the third score of each behavior category of each video segment;
Step 105: generate the final score of each behavior category of the video to be identified according to the spatial-stream score, temporal-stream score, and 3D score of each behavior category of the video to be identified.
In embodiments of the present invention, the video to be identified is divided into at least two video segments, and a key frame, a stacked optical flow, and consecutive-frame still images are extracted from each video segment. Behavior recognition is performed on each video segment from these three aspects, and the results are then fused into a final recognition result for the video to be identified. Because the video is segmented and each segment is analyzed from the three aspects, behavior recognition is performed from the temporal, spatial, and spatio-temporal angles, so the video to be identified can be recognized comprehensively based on more angles and more information; fusing these results into a final recognition result greatly improves the accuracy of behavior recognition.
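Steps 101 to 105 can be sketched end to end. The three per-segment scoring functions below are stand-ins for the trained spatial-stream, temporal-stream, and 3D models; the average consensus per stream and the default fusion weights are illustrative assumptions:

```python
def recognize(video_segments, categories, score_fns, weights=(0.3, 0.3, 0.4)):
    """Steps 103-105 in miniature: score each segment three ways,
    average each stream over segments, then fuse with O = a*S + b*T + c*M.
    score_fns is a triple of stand-in functions (key-frame, optical-flow,
    3D) mapping a segment to a {category: score} dict."""
    fused = {}
    for cat in categories:
        stream_scores = []
        for fn in score_fns:  # spatial, temporal, 3D streams
            per_segment = [fn(seg)[cat] for seg in video_segments]
            stream_scores.append(sum(per_segment) / len(per_segment))
        fused[cat] = sum(w * s for w, s in zip(weights, stream_scores))
    # the predicted behavior category is the one with the highest final score
    return max(fused, key=fused.get), fused
```

Here the segments are represented abstractly; in practice each would be a tensor of frames fed to the respective model.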
In embodiments of the present invention, the score of a behavior category indicates how strongly the behavior in the video belongs to that behavior category; the higher the score, the more likely it is that the current video segment belongs to that behavior category.
Specifically, the first score is the score that the behavior in the current video segment belongs to a certain behavior category when behavior recognition is performed based on the key frame; the second score is the corresponding score when behavior recognition is performed based on the stacked optical flow; and the third score is the corresponding score when behavior recognition is performed based on the consecutive-frame still images.
The spatial-stream score is the score that the behavior in the video to be identified belongs to a certain behavior category when behavior recognition is performed based on key frames; the temporal-stream score is the corresponding score when behavior recognition is performed based on stacked optical flow; and the 3D score is the corresponding score when behavior recognition is performed based on consecutive-frame still images.
The final score is the score that the behavior in the video to be identified belongs to a certain behavior category after the key-frame mode, the stacked-optical-flow mode, and the consecutive-frame still-image mode of behavior recognition have been fused.
In embodiments of the present invention, dividing the video to be identified into at least two video segments and extracting the key frame, stacked optical flow, and consecutive-frame still images of the current video segment can be realized by sparse sampling.
In embodiments of the present invention, dividing the video to be identified into at least two video segments comprises: dividing the video to be identified evenly into at least two video segments, so that every video segment has the same duration.
In embodiments of the present invention, extracting the key frame of the current video segment comprises: randomly selecting one frame image from the current video segment and using that frame image as the key frame. Here the key frame may be a single still image.
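A minimal sketch of the even split and random key-frame selection described above, with frame indices standing in for frames and the helper names chosen for illustration:

```python
import random

def split_segments(num_frames, k):
    # divide frame indices evenly into k segments of (near-)equal duration
    bounds = [num_frames * i // k for i in range(k + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(k)]

def sample_key_frame(segment):
    # the key frame is a single still image drawn at random from the segment
    return random.choice(segment)
```

The stacked optical flow and the consecutive-frame still images would be sampled from each segment in a similar sparse fashion.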
In an embodiment of the present invention, the method further comprises:
presetting a weight for the spatial-stream score, a weight for the temporal-stream score, and a weight for the 3D score;
and generating the final score of each behavior category of the video to be identified comprises:
for each behavior category, performing: determining the final score of the current behavior category of the video to be identified using formula four, according to the spatial-stream score, temporal-stream score, and 3D score of the current behavior category and the three preset weights, wherein formula four is:
O = aS + bT + cM;
where O is the final score of the current behavior category of the video to be identified, S is the spatial-stream score of the current behavior category, T is the temporal-stream score of the current behavior category, M is the 3D score of the current behavior category, a is the weight of the spatial-stream score, b is the weight of the temporal-stream score, and c is the weight of the 3D score.
In embodiments of the present invention, the spatial-stream score, temporal-stream score, and 3D score are fused into the final score by setting a weight for each of them. For example, a may be 0.3, b may be 0.3, and c may be 0.4. Fusing the three scores in this way makes the final score more accurate.
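Formula four with the example weights can be sketched directly; the dictionary layout of the per-category scores is an assumption:

```python
def fuse_scores(spatial, temporal, conv3d, a=0.3, b=0.3, c=0.4):
    """Late fusion per formula four, O = a*S + b*T + c*M, applied to
    every behavior category; the weights default to the example values."""
    return {cat: a * spatial[cat] + b * temporal[cat] + c * conv3d[cat]
            for cat in spatial}
```

The weights would typically be tuned on a validation set rather than fixed in advance.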
In an embodiment of the present invention, generating the final score of each behavior category of the video to be identified comprises:
inputting the spatial-stream score, temporal-stream score, and 3D score of each behavior category of the video to be identified into a trained linear SVM classifier, and determining the final score of each behavior category of the video to be identified using the linear SVM classifier;
wherein the kernel function of the linear SVM classifier is:
k(x, x_i) = ((x · x_i) + 1)^d, where d is a preset constant and a positive integer.
In the embodiment of the present invention, the trained SVM classifier is used to fuse the spatial flow scoring, the time flow scoring and the 3D scoring, and the above kernel function is set, so that the final scoring can be made more accurate. Here, d may be 1, 2, 3, etc.; specifically, d may be 9.
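The kernel above is the standard polynomial kernel. A minimal sketch follows; the three-component score vectors (spatial flow, time flow, 3D scoring for one behavior classification) and the choice d = 2 are hypothetical illustrations, not values fixed by the patent:

```python
def poly_kernel(x, x_i, d=2):
    """Polynomial kernel k(x, x_i) = ((x . x_i) + 1)^d used for the SVM fusion."""
    dot = sum(a * b for a, b in zip(x, x_i))
    return (dot + 1) ** d

# Hypothetical score vectors for one behavior classification.
x = [0.8, 0.7, 0.9]   # (spatial flow, time flow, 3D) scorings of one sample
xi = [0.6, 0.5, 0.7]  # same scorings for a training sample
k = poly_kernel(x, xi, d=2)  # ((0.48 + 0.35 + 0.63) + 1)^2
```

A library implementation such as an SVM with a polynomial kernel would compute exactly this quantity internally for every pair of training vectors.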
In addition, when training the linear SVM classifier, the SVM classifier is trained by inputting the video-level prediction scores and labels of the training set into the linear SVM classifier.
In an embodiment of the present invention, the performing Activity recognition on the current video segment according to the key frame and determining the first scoring of each behavior classification of the current video segment comprises:
inputting the key frame of the current video segment into a trained spatial flow model of 2D convolution, and performing Activity recognition on the key frame of the current video segment by using the spatial flow model of 2D convolution to determine the first scoring of each behavior classification of the current video segment.
In the embodiment of the present invention, the key frame is processed by the spatial flow model of 2D convolution in a two-stream network, so that Activity recognition of the current video segment is realized. The spatial flow model of 2D convolution can perform Activity recognition on the current video segment from the spatial perspective.
In an embodiment of the present invention, the performing Activity recognition on the current video segment according to the stacked optical flow and determining the second scoring of each behavior classification of the current video segment comprises:
inputting the stacked optical flow of the current video segment into a trained time flow model of 2D convolution, and performing Activity recognition on the stacked optical flow of the current video segment by using the time flow model of 2D convolution to determine the second scoring of each behavior classification of the current video segment.
In the embodiment of the present invention, the stacked optical flow is processed by the time flow model of 2D convolution in a two-stream network, so that Activity recognition of the current video segment is realized. The time flow model of 2D convolution can perform Activity recognition on the current video segment from the temporal perspective.
In an embodiment of the present invention, the performing Activity recognition on the current video segment according to the successive frame still images and determining the third scoring of each behavior classification of the current video segment comprises:
inputting the successive frame still images of the current video segment into a trained 3D convolution model, and performing Activity recognition on the successive frame still images of the current video segment by using the 3D convolution model to determine the third scoring of each behavior classification of the current video segment.
In the embodiment of the present invention, the successive frame still images are processed by the 3D convolution model, so that Activity recognition of the current video segment is realized. The 3D convolution model can perform Activity recognition on the current video segment from the spatio-temporal perspective.
In addition, the spatial flow model of 2D convolution, the time flow model of 2D convolution and the 3D convolution model may be trained in the following manner:
J1: constructing three network models, namely the spatial flow model of 2D convolution, the time flow model of 2D convolution and the 3D convolution model; feeding the acquired training set data into each network model; and outputting the video action classification scores through a series of operations of convolution, pooling, nonlinear activation functions, normalization, fully connected layers and the softmax function, thereby completing the forward propagation process of the network;
the softmax function is: y_i = e^{X_i} / Σ_{j=1}^{N} e^{X_j}, wherein X_i is the output of the i-th neuron of the last layer of the network, i ∈ [1, N], and N is the total number of behavior classifications.
J2: calculating the cross-entropy loss between the final output layer data of each network model and the true value, and adjusting the parameters of each layer in each network model through back propagation, thereby completing the back propagation process of the network;
the specific form of the cross-entropy loss function is as follows:
L = -Σ_i z_i ln y_i;
wherein z_i is the true classification result.
J3: continuously iterating the forward propagation and back propagation processes of the first two steps until the networks converge.
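The softmax output of step J1 and the cross-entropy loss of step J2 can be sketched in plain Python as follows; the three-class logits are a hypothetical example:

```python
import math

def softmax(x):
    """y_i = exp(X_i) / sum_j exp(X_j); the max is subtracted for numerical stability."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(y, z):
    """L = -sum_i z_i * ln(y_i), where z is the one-hot true classification result."""
    return -sum(zi * math.log(yi) for yi, zi in zip(y, z))

logits = [2.0, 1.0, 0.1]            # hypothetical last-layer outputs X_i, N = 3
y = softmax(logits)                 # predicted class probabilities
loss = cross_entropy(y, [1, 0, 0])  # true class is the first one
```

During training (step J2) the gradient of this loss with respect to the logits drives the parameter updates of each of the three networks.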
In an embodiment of the present invention, the determining the spatial flow scoring of each behavior classification of the video to be identified according to the first scoring of each behavior classification of each video clip comprises:
for each behavior classification, executing: according to the first scoring of the current behavior classification of each video clip, determining the spatial flow scoring of the current behavior classification of the video to be identified by using formula one, wherein formula one is:
Vid_α = (1/K) Σ_{k=1}^{K} P_k^α;
wherein Vid_α is the spatial flow scoring of the current behavior classification of the video to be identified, K is the total number of the at least two video clips, and P_k^α is the first scoring of the current behavior classification of the k-th video clip.
In an embodiment of the present invention, the determining the time flow scoring of each behavior classification of the video to be identified according to the second scoring of each behavior classification of each video clip comprises:
for each behavior classification, executing: according to the second scoring of the current behavior classification of each video clip, determining the time flow scoring of the current behavior classification of the video to be identified by using formula two, wherein formula two is:
Vid_β = (1/K) Σ_{k=1}^{K} P_k^β;
wherein Vid_β is the time flow scoring of the current behavior classification of the video to be identified, K is the total number of the at least two video clips, and P_k^β is the second scoring of the current behavior classification of the k-th video clip.
In an embodiment of the present invention, the determining the 3D scoring of each behavior classification of the video to be identified according to the third scoring of each behavior classification of each video clip comprises:
for each behavior classification, executing: according to the third scoring of the current behavior classification of each video clip, determining the 3D scoring of the current behavior classification of the video to be identified by using formula three, wherein formula three is:
Vid_γ = (1/K) Σ_{k=1}^{K} P_k^γ;
wherein Vid_γ is the 3D scoring of the current behavior classification of the video to be identified, K is the total number of the at least two video clips, and P_k^γ is the third scoring of the current behavior classification of the k-th video clip.
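Formulas one to three each combine the K clip-level scorings of one behavior classification into a single video-level scoring. The formula images are not legible in this copy, so the averaging form below is an assumption (averaging over clips is the common consensus choice for segment-based recognition):

```python
def video_level_score(clip_scores):
    """Combine the per-clip scorings of one behavior classification into a
    video-level scoring by averaging over the K video clips
    (assumed form of formulas one/two/three)."""
    K = len(clip_scores)
    return sum(clip_scores) / K

# Hypothetical first scorings of "running" for K = 3 clips (P1, P2, P3).
spatial_running = video_level_score([0.9, 0.8, 0.7])  # Vid_alpha for "running"
```

The same function applies unchanged to the second scorings (time flow) and third scorings (3D).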
As shown in Fig. 2, an embodiment of the present invention provides an Activity recognition method, which may comprise the following steps:
Step 201: presetting at least two behavior classifications.
Specifically, the behavior classifications may include: running, jumping, walking, climbing, playing basketball, playing tennis, playing volleyball, playing football, etc.
Step 202: dividing the video to be identified into at least two video clips.
Specifically, the video to be identified may be evenly divided into at least two video clips.
Step 203: for each video clip, executing: extracting the key frame, the stacked optical flow and the successive frame still images of the current video clip; inputting the key frame of the current video clip into a trained spatial flow model of 2D convolution, and performing Activity recognition on the key frame of the current video clip by using the spatial flow model of 2D convolution to determine the first scoring of each behavior classification of the current video clip; inputting the stacked optical flow of the current video clip into a trained time flow model of 2D convolution, and performing Activity recognition on the stacked optical flow of the current video clip by using the time flow model of 2D convolution to determine the second scoring of each behavior classification of the current video clip; inputting the successive frame still images of the current video clip into a trained 3D convolution model, and performing Activity recognition on the successive frame still images of the current video clip by using the 3D convolution model to determine the third scoring of each behavior classification of the current video clip.
For example, for video clip 1, executing:
extracting the key frame, the stacked optical flow and the successive frame still images of video clip 1;
inputting the key frame of video clip 1 into the trained spatial flow model of 2D convolution, and performing Activity recognition on the key frame of video clip 1 by using the spatial flow model of 2D convolution to determine the first scoring of each behavior classification of video clip 1, such as the first scoring of the behavior classification "running", the first scoring of the behavior classification "jumping", the first scoring of the behavior classification "playing basketball", etc.;
inputting the stacked optical flow of video clip 1 into the trained time flow model of 2D convolution, and performing Activity recognition on the stacked optical flow of video clip 1 by using the time flow model of 2D convolution to determine the second scoring of each behavior classification of video clip 1, such as the second scoring of the behavior classification "running", the second scoring of the behavior classification "jumping", the second scoring of the behavior classification "playing basketball", etc.;
inputting the successive frame still images of video clip 1 into the trained 3D convolution model, and performing Activity recognition on the successive frame still images of video clip 1 by using the 3D convolution model to determine the third scoring of each behavior classification of video clip 1, such as the third scoring of the behavior classification "running", the third scoring of the behavior classification "jumping", the third scoring of the behavior classification "playing basketball", etc.
In addition, when extracting the stacked optical flow, the extraction may be performed at 5 frames per second.
The size of the key frame may be 1 × 3 × L × W (e.g., the number of RGB channels is 3), the size of the stacked optical flow may be 5 × 2 × L × W (five optical flow maps directly extracted from six consecutive still images, wherein each optical flow map has two channels representing the pixel changes in the x direction and the pixel changes in the y direction), and the size of the successive frame still images may be 16 × 3 × L × W (16 consecutive RGB frames), wherein L and W are the length and width of the input image.
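The three input sizes listed above can be checked with a small shape sketch; the resolution L = W = 224 is a hypothetical choice, not one fixed by the patent:

```python
def stacked_flow_shape(n_frames, L, W):
    """n_frames consecutive still images yield n_frames - 1 optical flow maps,
    each with an x-displacement channel and a y-displacement channel."""
    return (n_frames - 1, 2, L, W)

L, W = 224, 224  # hypothetical input height and width

key_frame = (1, 3, L, W)            # one RGB key frame for the spatial flow model
flow = stacked_flow_shape(6, L, W)  # stacked optical flow for the time flow model
clip = (16, 3, L, W)                # 16 consecutive RGB frames for the 3D model
```

With six consecutive frames this reproduces the 5 × 2 × L × W stacked optical flow size stated in the text.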
Step 204: according to the first scoring of each behavior classification of each video clip, determining the spatial flow scoring of each behavior classification of the video to be identified; according to the second scoring of each behavior classification of each video clip, determining the time flow scoring of each behavior classification of the video to be identified; according to the third scoring of each behavior classification of each video clip, determining the 3D scoring of each behavior classification of the video to be identified.
For example, suppose the video to be identified has three video clips in total, namely video clip 1, video clip 2 and video clip 3. For the behavior classification "running", the first scoring of the behavior classification "running" of video clip 1 is P1, the first scoring of the behavior classification "running" of video clip 2 is P2, and the first scoring of the behavior classification "running" of video clip 3 is P3. Then, according to P1, P2 and P3, the spatial flow scoring of the behavior classification "running" of the video to be identified is determined.
Step 205: according to the spatial flow scoring of each behavior classification of the video to be identified, the time flow scoring of each behavior classification and the 3D scoring of each behavior classification, generating the final scoring of each behavior classification of the video to be identified.
For example, for the behavior classification "running", the spatial flow scoring of the behavior classification "running" of the video to be identified is S1, the time flow scoring of the behavior classification "running" of the video to be identified is T1, and the 3D scoring of the behavior classification "running" of the video to be identified is M1. Then, according to S1, T1 and M1, the final scoring of the behavior classification "running" of the video to be identified is generated.
The embodiment of the present invention makes full use of the temporal, spatial and spatio-temporal information in the video by fusing multiple models, takes the influence of long actions into account, and fuses multiple segment-level Activity recognition results to obtain a video-level Activity recognition result, thereby obtaining a more accurate recognition result.
The embodiment of the present invention fuses the 2D convolution and 3D convolution models, wherein 2D convolution refers to the two-stream network, which analyzes the video content from the temporal and spatial perspectives respectively, making full use of the motion change information and the appearance information, while the 3D convolution model analyzes the video content from the spatio-temporal perspective, making full use of the spatio-temporal information between multiple frames. By fusing the 2D convolution and 3D convolution models, more comprehensive information is obtained. Meanwhile, the embodiment of the present invention divides the video into multiple segments and obtains the video-level Activity recognition result by fusing the prediction results of the multiple segments. Therefore, the embodiment of the present invention not only utilizes the information of multiple dimensions, but also widens the field of view of the time dimension and obtains a video-level prediction result. Meanwhile, the embodiment of the present invention uses sparse sampling, which reduces the training parameters while obtaining global information. The embodiment of the present invention can obtain more complete and wider information, thereby achieving a more accurate recognition result.
As shown in Fig. 3, an embodiment of the present invention provides an Activity recognition device, comprising:
a first setting unit 301, configured to set at least two behavior classifications;
a cutting unit 302, configured to divide a video to be identified into at least two video clips;
a fragment processing unit 303, configured to, for each video clip, execute: extracting the key frame, the stacked optical flow and the successive frame still images of the current video clip; performing Activity recognition on the current video clip according to the key frame to determine the first scoring of each behavior classification of the current video clip; performing Activity recognition on the current video clip according to the stacked optical flow to determine the second scoring of each behavior classification of the current video clip; and performing Activity recognition on the current video clip according to the successive frame still images to determine the third scoring of each behavior classification of the current video clip;
a segment composition unit 304, configured to determine the spatial flow scoring of each behavior classification of the video to be identified according to the first scoring of each behavior classification of each video clip; determine the time flow scoring of each behavior classification of the video to be identified according to the second scoring of each behavior classification of each video clip; and determine the 3D scoring of each behavior classification of the video to be identified according to the third scoring of each behavior classification of each video clip;
a final integrated unit 305, configured to generate the final scoring of each behavior classification of the video to be identified according to the spatial flow scoring of each behavior classification of the video to be identified, the time flow scoring of each behavior classification and the 3D scoring of each behavior classification.
In an embodiment of the present invention, the device further comprises:
a second setting unit, configured to set the weight of the spatial flow scoring, the weight of the time flow scoring and the weight of the 3D scoring;
the final integrated unit is configured to, for each behavior classification, execute: according to the spatial flow scoring of the current behavior classification, the time flow scoring of the current behavior classification, the 3D scoring of the current behavior classification, the weight of the spatial flow scoring, the weight of the time flow scoring and the weight of the 3D scoring, determining the final scoring of the current behavior classification of the video to be identified by using formula four, wherein formula four is:
O = aS + bT + cM;
wherein O is the final scoring of the current behavior classification of the video to be identified, S is the spatial flow scoring of the current behavior classification, T is the time flow scoring of the current behavior classification, M is the 3D scoring of the current behavior classification, a is the weight of the spatial flow scoring, b is the weight of the time flow scoring, and c is the weight of the 3D scoring.
In an embodiment of the present invention, the final integrated unit is configured to input the spatial flow scoring of each behavior classification of the video to be identified, the time flow scoring of each behavior classification and the 3D scoring of each behavior classification into a trained linear SVM classifier, and determine the final scoring of each behavior classification of the video to be identified by using the linear SVM classifier;
wherein the kernel function of the linear SVM classifier is:
k(x, x_i) = ((x·x_i) + 1)^d, where d is a preset constant and d is a positive integer.
In an embodiment of the present invention, when performing Activity recognition on the current video segment according to the key frame and determining the first scoring of each behavior classification of the current video segment, the fragment processing unit is specifically configured to:
input the key frame of the current video segment into a trained spatial flow model of 2D convolution, and perform Activity recognition on the key frame of the current video segment by using the spatial flow model of 2D convolution to determine the first scoring of each behavior classification of the current video segment.
In an embodiment of the present invention, when performing Activity recognition on the current video segment according to the stacked optical flow and determining the second scoring of each behavior classification of the current video segment, the fragment processing unit is specifically configured to:
input the stacked optical flow of the current video segment into a trained time flow model of 2D convolution, and perform Activity recognition on the stacked optical flow of the current video segment by using the time flow model of 2D convolution to determine the second scoring of each behavior classification of the current video segment.
In an embodiment of the present invention, when performing Activity recognition on the current video segment according to the successive frame still images and determining the third scoring of each behavior classification of the current video segment, the fragment processing unit is specifically configured to:
input the successive frame still images of the current video segment into a trained 3D convolution model, and perform Activity recognition on the successive frame still images of the current video segment by using the 3D convolution model to determine the third scoring of each behavior classification of the current video segment.
In an embodiment of the present invention, when determining the spatial flow scoring of each behavior classification of the video to be identified according to the first scoring of each behavior classification of each video clip, the segment composition unit is specifically configured to:
for each behavior classification, execute: according to the first scoring of the current behavior classification of each video clip, determining the spatial flow scoring of the current behavior classification of the video to be identified by using formula one, wherein formula one is:
Vid_α = (1/K) Σ_{k=1}^{K} P_k^α;
wherein Vid_α is the spatial flow scoring of the current behavior classification of the video to be identified, K is the total number of the at least two video clips, and P_k^α is the first scoring of the current behavior classification of the k-th video clip.
In an embodiment of the present invention, when determining the time flow scoring of each behavior classification of the video to be identified according to the second scoring of each behavior classification of each video clip, the segment composition unit is specifically configured to:
for each behavior classification, execute: according to the second scoring of the current behavior classification of each video clip, determining the time flow scoring of the current behavior classification of the video to be identified by using formula two, wherein formula two is:
Vid_β = (1/K) Σ_{k=1}^{K} P_k^β;
wherein Vid_β is the time flow scoring of the current behavior classification of the video to be identified, K is the total number of the at least two video clips, and P_k^β is the second scoring of the current behavior classification of the k-th video clip.
In an embodiment of the present invention, when determining the 3D scoring of each behavior classification of the video to be identified according to the third scoring of each behavior classification of each video clip, the segment composition unit is specifically configured to:
for each behavior classification, execute: according to the third scoring of the current behavior classification of each video clip, determining the 3D scoring of the current behavior classification of the video to be identified by using formula three, wherein formula three is:
Vid_γ = (1/K) Σ_{k=1}^{K} P_k^γ;
wherein Vid_γ is the 3D scoring of the current behavior classification of the video to be identified, K is the total number of the at least two video clips, and P_k^γ is the third scoring of the current behavior classification of the k-th video clip.
For the information interaction, execution process and other contents between the units in the above device, since they are based on the same concept as the method embodiments of the present invention, the details can be found in the description of the method embodiments of the present invention and are not repeated here.
An embodiment of the present invention provides a readable medium comprising execution instructions, wherein when a processor of a storage controller executes the execution instructions, the storage controller executes any one of the Activity recognition methods provided by the embodiments of the present invention.
An embodiment of the present invention provides a storage controller, comprising: a processor, a memory and a bus;
the memory is configured to store execution instructions, the processor is connected with the memory through the bus, and when the storage controller runs, the processor executes the execution instructions stored in the memory, so that the storage controller executes any one of the Activity recognition methods provided by the embodiments of the present invention.
Each embodiment of the present invention has at least the following beneficial effects:
1. In the embodiment of the present invention, the video to be identified is divided into at least two video clips, and the key frame, the stacked optical flow and the successive frame still images of each video clip are extracted. Activity recognition is performed on each video clip from the three aspects of the key frame, the stacked optical flow and the successive frame still images, and the results are then fused into the final recognition result of the video to be identified. In the identification process, the video to be identified is segmented and Activity recognition is performed on each video clip from these three aspects, so that Activity recognition can be performed on the video to be identified from the three perspectives of time, space and space-time, and the video to be identified can be comprehensively identified based on more perspectives and more information, which are finally fused into the final recognition result, greatly improving the accuracy of Activity recognition.
2. The embodiment of the present invention makes full use of the temporal, spatial and spatio-temporal information in the video by fusing multiple models, takes the influence of long actions into account, and fuses multiple segment-level Activity recognition results to obtain a video-level Activity recognition result, thereby obtaining a more accurate recognition result.
3. The embodiment of the present invention fuses the 2D convolution and 3D convolution models, wherein 2D convolution refers to the two-stream network, which analyzes the video content from the temporal and spatial perspectives respectively, making full use of the motion change information and the appearance information, while the 3D convolution model analyzes the video content from the spatio-temporal perspective, making full use of the spatio-temporal information between multiple frames. By fusing the 2D convolution and 3D convolution models, more comprehensive information is obtained. Meanwhile, the embodiment of the present invention divides the video into multiple segments and obtains the video-level Activity recognition result by fusing the prediction results of the multiple segments. Therefore, the embodiment of the present invention not only utilizes the information of multiple dimensions, but also widens the field of view of the time dimension and obtains a video-level prediction result, and can obtain more complete and wider information, thereby achieving a more accurate recognition result.
It should be noted that, in this document, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or also includes elements inherent to such a process, method, article or device. Without further limitation, an element defined by the sentence "including a ..." does not exclude the existence of other identical elements in the process, method, article or device comprising the element.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be implemented by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium, and when the program is executed, the steps of the above method embodiments are executed. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.
Finally, it should be noted that the above are only preferred embodiments of the present invention, merely intended to illustrate the technical solutions of the present invention and not to limit the protection scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
Claims (10)
1. An Activity recognition method, characterized in that at least two behavior classifications are preset, the method comprising:
dividing a video to be identified into at least two video clips;
for each video clip, executing: extracting the key frame, the stacked optical flow and the successive frame still images of the current video clip; performing Activity recognition on the current video clip according to the key frame to determine the first scoring of each behavior classification of the current video clip; performing Activity recognition on the current video clip according to the stacked optical flow to determine the second scoring of each behavior classification of the current video clip; and performing Activity recognition on the current video clip according to the successive frame still images to determine the third scoring of each behavior classification of the current video clip;
according to the first scoring of each behavior classification of each video clip, determining the spatial flow scoring of each behavior classification of the video to be identified; according to the second scoring of each behavior classification of each video clip, determining the time flow scoring of each behavior classification of the video to be identified; according to the third scoring of each behavior classification of each video clip, determining the 3D scoring of each behavior classification of the video to be identified;
according to the spatial flow scoring of each behavior classification of the video to be identified, the time flow scoring of each behavior classification and the 3D scoring of each behavior classification, generating the final scoring of each behavior classification of the video to be identified.
2. The method according to claim 1, further comprising:
presetting a weight of the spatial flow scoring, a weight of the time flow scoring and a weight of the 3D scoring;
wherein the generating a final scoring of each behavior classification of the video to be identified according to the spatial flow scoring of each behavior classification of the video to be identified, the time flow scoring of each behavior classification and the 3D scoring of each behavior classification comprises:
for each behavior classification, performing: determining, using formula four, the final scoring of the current behavior classification of the video to be identified according to the spatial flow scoring of the current behavior classification, the time flow scoring of the current behavior classification, the 3D scoring of the current behavior classification, the weight of the spatial flow scoring, the weight of the time flow scoring and the weight of the 3D scoring, wherein formula four is:
O = aS + bT + cM;
where O is the final scoring of the current behavior classification of the video to be identified, S is the spatial flow scoring of the current behavior classification, T is the time flow scoring of the current behavior classification, M is the 3D scoring of the current behavior classification, a is the weight of the spatial flow scoring, b is the weight of the time flow scoring, and c is the weight of the 3D scoring.
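Formula four is a per-class weighted sum of the three stream scorings. A minimal sketch of this late-fusion step follows; the function name and the example weight values are illustrative, since the patent leaves the preset weights open:

```python
def fuse_scores(spatial, temporal, conv3d, a=0.4, b=0.4, c=0.2):
    """Formula four, O = a*S + b*T + c*M, applied per behavior classification.

    spatial / temporal / conv3d: per-class scorings of the video to be
    identified; a, b, c: preset weights (example values, not from the patent).
    """
    return [a * s + b * t + c * m
            for s, t, m in zip(spatial, temporal, conv3d)]

# Two behavior classifications, three stream scorings each.
final = fuse_scores([0.9, 0.1], [0.8, 0.2], [0.7, 0.3])
best = max(range(len(final)), key=final.__getitem__)  # index of top class
```

The recognized behavior is then simply the classification with the highest final scoring.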
3. The method according to claim 1, wherein
the generating a final scoring of each behavior classification of the video to be identified according to the spatial flow scoring of each behavior classification of the video to be identified, the time flow scoring of each behavior classification and the 3D scoring of each behavior classification comprises:
inputting the spatial flow scoring of each behavior classification of the video to be identified, the time flow scoring of each behavior classification and the 3D scoring of each behavior classification into a trained SVM classifier, and determining the final scoring of each behavior classification of the video to be identified using the SVM classifier;
wherein the kernel function of the SVM classifier is:
k(x, x_i) = ((x · x_i) + 1)^d, where d is a preset constant and a positive integer.
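The kernel given is the standard inhomogeneous polynomial kernel. A sketch of the kernel and of forming one SVM feature vector from the three stream scorings of a class follows; the training of the classifier itself is outside the claim, and all variable names and numbers are illustrative:

```python
def poly_kernel(x, x_i, d=2):
    """Claimed kernel: k(x, x_i) = ((x . x_i) + 1)^d, with d a preset
    positive-integer constant."""
    dot = sum(a * b for a, b in zip(x, x_i))
    return (dot + 1) ** d

# Feature vector for one behavior classification: its spatial flow,
# time flow and 3D scorings (example values).
feature = [0.82, 0.75, 0.60]
support = [0.80, 0.70, 0.65]  # e.g. a support vector of the trained SVM
k_val = poly_kernel(feature, support)
```

With d = 1 the kernel reduces to an affine dot product, which matches the "linear" reading of the source text; larger d gives a polynomial decision surface.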
4. The method according to any one of claims 1 to 3, wherein
the performing activity recognition on the current video clip according to the key frame and determining the first scoring of each behavior classification of the current video clip comprises:
inputting the key frame of the current video clip into a trained spatial flow model of 2D convolution, and performing activity recognition on the key frame of the current video clip using the spatial flow model of 2D convolution to determine the first scoring of each behavior classification of the current video clip;
and/or
the performing activity recognition on the current video clip according to the stacked optical flow and determining the second scoring of each behavior classification of the current video clip comprises:
inputting the stacked optical flow of the current video clip into a trained time flow model of 2D convolution, and performing activity recognition on the stacked optical flow of the current video clip using the time flow model of 2D convolution to determine the second scoring of each behavior classification of the current video clip;
and/or
the performing activity recognition on the current video clip according to the successive-frame still images and determining the third scoring of each behavior classification of the current video clip comprises:
inputting the successive-frame still images of the current video clip into a trained 3D convolution model, and performing activity recognition on the successive-frame still images of the current video clip using the 3D convolution model to determine the third scoring of each behavior classification of the current video clip.
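Claim 4 routes each input modality to its own trained network. A structural sketch follows, with stub functions standing in for the trained 2D and 3D networks; the real model architectures, input shapes and class counts are not specified in the claim, so everything numeric here is illustrative:

```python
# Stubs standing in for the three trained networks of claim 4:
# a 2D-convolution spatial flow model, a 2D-convolution time flow model,
# and a 3D convolution model. Each maps its modality to per-class scorings.
def spatial_flow_model(key_frame):      # key frame -> first scorings
    return [0.7, 0.3]

def time_flow_model(stacked_flow):      # stacked optical flow -> second scorings
    return [0.6, 0.4]

def conv3d_model(still_frames):         # successive still frames -> third scorings
    return [0.8, 0.2]

def score_clip(clip):
    """Per-clip step of claim 4: three modalities in, three scorings out."""
    return (spatial_flow_model(clip["key_frame"]),
            time_flow_model(clip["stacked_flow"]),
            conv3d_model(clip["still_frames"]))

first, second, third = score_clip(
    {"key_frame": None, "stacked_flow": None, "still_frames": None})
```

The "and/or" structure of the claim means any subset of the three branches may be used; the sketch simply shows all three.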
5. The method according to any one of claims 1 to 3, wherein
the determining a spatial flow scoring of each behavior classification of the video to be identified according to the first scoring of each behavior classification of each video clip comprises:
for each behavior classification, performing: determining, using formula one, the spatial flow scoring of the current behavior classification of the video to be identified according to the first scoring of the current behavior classification of each video clip, wherein formula one is:
Vid_α = (1/K) · Σ_{k=1}^{K} P_k^α;
where Vid_α is the spatial flow scoring of the current behavior classification of the video to be identified, K is the total number of the at least two video clips, and P_k^α is the first scoring of the current behavior classification of the k-th video clip;
and/or
the determining a time flow scoring of each behavior classification of the video to be identified according to the second scoring of each behavior classification of each video clip comprises:
for each behavior classification, performing: determining, using formula two, the time flow scoring of the current behavior classification of the video to be identified according to the second scoring of the current behavior classification of each video clip, wherein formula two is:
Vid_β = (1/K) · Σ_{k=1}^{K} P_k^β;
where Vid_β is the time flow scoring of the current behavior classification of the video to be identified, K is the total number of the at least two video clips, and P_k^β is the second scoring of the current behavior classification of the k-th video clip;
and/or
the determining a 3D scoring of each behavior classification of the video to be identified according to the third scoring of each behavior classification of each video clip comprises:
for each behavior classification, performing: determining, using formula three, the 3D scoring of the current behavior classification of the video to be identified according to the third scoring of the current behavior classification of each video clip, wherein formula three is:
Vid_γ = (1/K) · Σ_{k=1}^{K} P_k^γ;
where Vid_γ is the 3D scoring of the current behavior classification of the video to be identified, K is the total number of the at least two video clips, and P_k^γ is the third scoring of the current behavior classification of the k-th video clip.
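Formulas one to three aggregate the K per-clip scorings into one video-level scoring per behavior classification. The formulas themselves appear only as images in the source, so the per-clip average below (the usual consensus function in segment-based pipelines such as temporal segment networks) is an assumption consistent with the symbol definitions:

```python
def video_level_scores(per_clip_scores):
    """Aggregate per-clip, per-class scorings into video-level scorings.

    per_clip_scores: K lists, one per video clip, each holding one scoring
    per behavior classification. Averaging over K is an assumption; the
    patent's formulas one to three are rendered as images in the source.
    """
    K = len(per_clip_scores)
    n_classes = len(per_clip_scores[0])
    return [sum(clip[c] for clip in per_clip_scores) / K
            for c in range(n_classes)]

# Three clips (K = 3), two behavior classifications: Vid_alpha per class.
vid_alpha = video_level_scores([[0.9, 0.1], [0.7, 0.3], [0.8, 0.2]])
```

The same function serves all three streams: apply it to the first, second and third scorings to obtain Vid_α, Vid_β and Vid_γ respectively.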
6. An activity recognition device, comprising:
a first setting unit configured to set at least two behavior classifications;
a cutting unit configured to split a video to be identified into at least two video clips;
a fragment processing unit configured to, for each video clip, perform: extracting a key frame, a stacked optical flow and successive-frame still images of the current video clip; performing activity recognition on the current video clip according to the key frame to determine a first scoring of each behavior classification of the current video clip; performing activity recognition on the current video clip according to the stacked optical flow to determine a second scoring of each behavior classification of the current video clip; and performing activity recognition on the current video clip according to the successive-frame still images to determine a third scoring of each behavior classification of the current video clip;
a segment composition unit configured to determine a spatial flow scoring of each behavior classification of the video to be identified according to the first scoring of each behavior classification of each video clip, determine a time flow scoring of each behavior classification of the video to be identified according to the second scoring of each behavior classification of each video clip, and determine a 3D scoring of each behavior classification of the video to be identified according to the third scoring of each behavior classification of each video clip; and
a final integrated unit configured to generate a final scoring of each behavior classification of the video to be identified according to the spatial flow scoring of each behavior classification of the video to be identified, the time flow scoring of each behavior classification and the 3D scoring of each behavior classification.
7. The device according to claim 6, further comprising:
a second setting unit configured to set a weight of the spatial flow scoring, a weight of the time flow scoring and a weight of the 3D scoring;
wherein the final integrated unit is configured to, for each behavior classification, perform: determining, using formula four, the final scoring of the current behavior classification of the video to be identified according to the spatial flow scoring of the current behavior classification, the time flow scoring of the current behavior classification, the 3D scoring of the current behavior classification, the weight of the spatial flow scoring, the weight of the time flow scoring and the weight of the 3D scoring, wherein formula four is:
O = aS + bT + cM;
where O is the final scoring of the current behavior classification of the video to be identified, S is the spatial flow scoring of the current behavior classification, T is the time flow scoring of the current behavior classification, M is the 3D scoring of the current behavior classification, a is the weight of the spatial flow scoring, b is the weight of the time flow scoring, and c is the weight of the 3D scoring.
8. The device according to claim 6, wherein
the final integrated unit is configured to input the spatial flow scoring of each behavior classification of the video to be identified, the time flow scoring of each behavior classification and the 3D scoring of each behavior classification into a trained SVM classifier, and to determine the final scoring of each behavior classification of the video to be identified using the SVM classifier;
wherein the kernel function of the SVM classifier is:
k(x, x_i) = ((x · x_i) + 1)^d, where d is a preset constant and a positive integer.
9. The device according to any one of claims 6 to 8, wherein
when performing the activity recognition on the current video clip according to the key frame and determining the first scoring of each behavior classification of the current video clip, the fragment processing unit is specifically configured to:
input the key frame of the current video clip into a trained spatial flow model of 2D convolution, and perform activity recognition on the key frame of the current video clip using the spatial flow model of 2D convolution to determine the first scoring of each behavior classification of the current video clip;
and/or
when performing the activity recognition on the current video clip according to the stacked optical flow and determining the second scoring of each behavior classification of the current video clip, the fragment processing unit is specifically configured to:
input the stacked optical flow of the current video clip into a trained time flow model of 2D convolution, and perform activity recognition on the stacked optical flow of the current video clip using the time flow model of 2D convolution to determine the second scoring of each behavior classification of the current video clip;
and/or
when performing the activity recognition on the current video clip according to the successive-frame still images and determining the third scoring of each behavior classification of the current video clip, the fragment processing unit is specifically configured to:
input the successive-frame still images of the current video clip into a trained 3D convolution model, and perform activity recognition on the successive-frame still images of the current video clip using the 3D convolution model to determine the third scoring of each behavior classification of the current video clip.
10. The device according to any one of claims 6 to 8, wherein
when determining the spatial flow scoring of each behavior classification of the video to be identified according to the first scoring of each behavior classification of each video clip, the segment composition unit is specifically configured to:
for each behavior classification, perform: determining, using formula one, the spatial flow scoring of the current behavior classification of the video to be identified according to the first scoring of the current behavior classification of each video clip, wherein formula one is:
Vid_α = (1/K) · Σ_{k=1}^{K} P_k^α;
where Vid_α is the spatial flow scoring of the current behavior classification of the video to be identified, K is the total number of the at least two video clips, and P_k^α is the first scoring of the current behavior classification of the k-th video clip;
and/or
when determining the time flow scoring of each behavior classification of the video to be identified according to the second scoring of each behavior classification of each video clip, the segment composition unit is specifically configured to:
for each behavior classification, perform: determining, using formula two, the time flow scoring of the current behavior classification of the video to be identified according to the second scoring of the current behavior classification of each video clip, wherein formula two is:
Vid_β = (1/K) · Σ_{k=1}^{K} P_k^β;
where Vid_β is the time flow scoring of the current behavior classification of the video to be identified, K is the total number of the at least two video clips, and P_k^β is the second scoring of the current behavior classification of the k-th video clip;
and/or
when determining the 3D scoring of each behavior classification of the video to be identified according to the third scoring of each behavior classification of each video clip, the segment composition unit is specifically configured to:
for each behavior classification, perform: determining, using formula three, the 3D scoring of the current behavior classification of the video to be identified according to the third scoring of the current behavior classification of each video clip, wherein formula three is:
Vid_γ = (1/K) · Σ_{k=1}^{K} P_k^γ;
where Vid_γ is the 3D scoring of the current behavior classification of the video to be identified, K is the total number of the at least two video clips, and P_k^γ is the third scoring of the current behavior classification of the k-th video clip.
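Taken together, the device of claims 6 to 10 is a split, per-clip scoring, per-stream aggregation, weighted fusion pipeline. An end-to-end sketch follows under the same assumptions as above: stub functions in place of the trained networks, an assumed per-clip average for formulas one to three, and illustrative weights for formula four:

```python
def recognize(video_frames, n_clips=3, weights=(0.4, 0.4, 0.2)):
    """End-to-end flow of the claimed device, with stub per-clip scorings.

    Splits the video into n_clips clips (cutting unit), scores each clip
    with three stub "models" (fragment processing unit), averages each
    stream over clips (segment composition unit, assumed consensus), then
    applies O = a*S + b*T + c*M per class (final integrated unit).
    All numeric details are illustrative, not from the patent.
    """
    step = max(1, len(video_frames) // n_clips)
    clips = [video_frames[i:i + step]
             for i in range(0, len(video_frames), step)][:n_clips]

    def stub_scores(clip, bias):        # stand-in for a trained network
        return [0.5 + bias, 0.5 - bias]

    firsts  = [stub_scores(c, 0.2) for c in clips]   # spatial flow, per clip
    seconds = [stub_scores(c, 0.1) for c in clips]   # time flow, per clip
    thirds  = [stub_scores(c, 0.3) for c in clips]   # 3D, per clip

    def avg(per_clip):                  # assumed consensus over K clips
        K = len(per_clip)
        return [sum(s[c] for s in per_clip) / K for c in range(2)]

    S, T, M = avg(firsts), avg(seconds), avg(thirds)
    a, b, c = weights
    return [a * s + b * t + c * m for s, t, m in zip(S, T, M)]

final = recognize(list(range(30)))  # 30 dummy "frames", two dummy classes
```

Claim 8's alternative replaces the last line's weighted sum with an SVM over the (S, T, M) triples; everything upstream is unchanged.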
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910491344.XA CN110210430A (en) | 2019-06-06 | 2019-06-06 | A kind of Activity recognition method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110210430A true CN110210430A (en) | 2019-09-06 |
Family
ID=67791406
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910491344.XA Pending CN110210430A (en) | 2019-06-06 | 2019-06-06 | A kind of Activity recognition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110210430A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109101896A (en) * | 2018-07-19 | 2018-12-28 | 电子科技大学 | A kind of video behavior recognition methods based on temporal-spatial fusion feature and attention mechanism |
CN109376603A (en) * | 2018-09-25 | 2019-02-22 | 北京周同科技有限公司 | A kind of video frequency identifying method, device, computer equipment and storage medium |
2019-06-06: CN application CN201910491344.XA filed, published as CN110210430A (en), status pending.
Non-Patent Citations (2)
Title |
---|
HAIMA1998: "Action Recognition in the Video Analysis field: an overview" (in Chinese), HTTPS://BLOG.CSDN.NET/HAIMA1998/ARTICLE/DETAILS/78846442, 19 December 2017 (2017-12-19) * |
She Yumei et al.: "Principles and Applications of Artificial Intelligence" (in Chinese), Shanghai Jiao Tong University Press, page 177 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111768668A (en) * | 2020-03-31 | 2020-10-13 | 杭州海康威视数字技术股份有限公司 | Experimental operation scoring method, device, equipment and storage medium |
CN111768668B (en) * | 2020-03-31 | 2022-09-02 | 杭州海康威视数字技术股份有限公司 | Experimental operation scoring method, device, equipment and storage medium |
CN113807222A (en) * | 2021-09-07 | 2021-12-17 | 中山大学 | Video question-answering method and system for end-to-end training based on sparse sampling |
CN113807222B (en) * | 2021-09-07 | 2023-06-27 | 中山大学 | Video question-answering method and system for end-to-end training based on sparse sampling |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Liu et al. | Teinet: Towards an efficient architecture for video recognition | |
Varol et al. | Long-term temporal convolutions for action recognition | |
US12051275B2 (en) | Video processing method and apparatus for action recognition | |
Huang et al. | Multi-scale dense convolutional networks for efficient prediction | |
Peng et al. | Two-stream collaborative learning with spatial-temporal attention for video classification | |
JP7147078B2 (en) | Video frame information labeling method, apparatus, apparatus and computer program | |
Bilen et al. | Dynamic image networks for action recognition | |
Zhang et al. | Gender and smile classification using deep convolutional neural networks | |
CN102334118B (en) | Promoting method and system for personalized advertisement based on interested learning of user | |
Tran et al. | Two-stream flow-guided convolutional attention networks for action recognition | |
CN110516536A (en) | A kind of Weakly supervised video behavior detection method for activating figure complementary based on timing classification | |
CN106778854A (en) | Activity recognition method based on track and convolutional neural networks feature extraction | |
EP2908268A2 (en) | Face detector training method, face detection method, and apparatus | |
Kulhare et al. | Key frame extraction for salient activity recognition | |
WO2019228316A1 (en) | Action recognition method and apparatus | |
US20230351718A1 (en) | Apparatus and method for image classification | |
Hammam et al. | Real-time multiple spatiotemporal action localization and prediction approach using deep learning | |
CN110210430A (en) | A kind of Activity recognition method and device | |
CN110046568A (en) | A kind of video actions recognition methods based on Time Perception structure | |
Huang et al. | Human action recognition based on temporal pose CNN and multi-dimensional fusion | |
Symeonidis et al. | Neural attention-driven non-maximum suppression for person detection | |
CN112200096A (en) | Method, device and storage medium for realizing real-time abnormal behavior recognition based on compressed video | |
Zhao et al. | Saliency-guided video classification via adaptively weighted learning | |
Aliakbarian et al. | Deep action-and context-aware sequence learning for activity recognition and anticipation | |
Pouthier et al. | Active speaker detection as a multi-objective optimization with uncertainty-based multimodal fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||