CN110210430A - Behavior recognition method and device - Google Patents

Behavior recognition method and device

Info

Publication number
CN110210430A
CN110210430A
Authority
CN
China
Prior art keywords
scoring
behavior classification
video
current
identified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910491344.XA
Other languages
Chinese (zh)
Inventor
张俊三
王晓敏
王雷全
吴春雷
李克文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Petroleum East China
Original Assignee
China University of Petroleum East China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Petroleum East China filed Critical China University of Petroleum East China
Priority to CN201910491344.XA priority Critical patent/CN110210430A/en
Publication of CN110210430A publication Critical patent/CN110210430A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 - Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a behavior recognition method and device. The method comprises: presetting at least two behavior categories; dividing a video to be recognized into at least two video segments; for each video segment, performing: extracting a keyframe, a stacked optical flow, and consecutive still frames of the current video segment; performing behavior recognition according to the keyframe, the stacked optical flow, and the consecutive still frames respectively, and determining a first score, a second score, and a third score for each behavior category of the current video segment; determining, according to the first, second, and third scores of each behavior category of each video segment, a spatial-stream score, a temporal-stream score, and a 3D score for each behavior category of the video to be recognized; and generating a final score for each behavior category of the video to be recognized according to the spatial-stream score, temporal-stream score, and 3D score of each behavior category. The behavior recognition method and device provided by the present invention can improve the accuracy of behavior recognition.

Description

Behavior recognition method and device
Technical field
The present invention relates to the field of computer technology, and in particular to a behavior recognition method and device.
Background technique
Behavior recognition in video refers to automatically analyzing a video segment to identify the behavior performed by a human body in it. The simplest form of behavior recognition, also called behavior classification, assigns the human behavior in an unknown video to one of several predefined behavior categories.
In the prior art, several still frames are extracted from the video to be recognized, and behavior recognition is performed on those still frames to generate the final recognition result.
As can be seen from the above description, the prior art considers only the appearance information in still images when performing behavior recognition, so the recognition result is inaccurate.
Summary of the invention
Embodiments of the present invention provide a behavior recognition method and device that can improve the accuracy of behavior recognition.
In one aspect, an embodiment of the present invention provides a behavior recognition method, comprising:
presetting at least two behavior categories;
dividing a video to be recognized into at least two video segments;
for each video segment, performing: extracting a keyframe, a stacked optical flow, and consecutive still frames of the current video segment; performing behavior recognition on the current video segment according to the keyframe to determine a first score for each behavior category of the current video segment; performing behavior recognition on the current video segment according to the stacked optical flow to determine a second score for each behavior category of the current video segment; and performing behavior recognition on the current video segment according to the consecutive still frames to determine a third score for each behavior category of the current video segment;
determining, according to the first score of each behavior category of each video segment, a spatial-stream score for each behavior category of the video to be recognized; determining, according to the second score of each behavior category of each video segment, a temporal-stream score for each behavior category of the video to be recognized; and determining, according to the third score of each behavior category of each video segment, a 3D score for each behavior category of the video to be recognized;
generating a final score for each behavior category of the video to be recognized according to the spatial-stream score, the temporal-stream score, and the 3D score of each behavior category.
Optionally,
the method further comprises:
presetting a weight for the spatial-stream score, a weight for the temporal-stream score, and a weight for the 3D score;
and generating the final score of each behavior category of the video to be recognized according to the spatial-stream score, the temporal-stream score, and the 3D score of each behavior category comprises:
for each behavior category, performing: determining the final score of the current behavior category of the video to be recognized using formula four, according to the spatial-stream score of the current behavior category, the temporal-stream score of the current behavior category, the 3D score of the current behavior category, the weight of the spatial-stream score, the weight of the temporal-stream score, and the weight of the 3D score, wherein formula four is:
O = aS + bT + cM;
wherein O is the final score of the current behavior category of the video to be recognized, S is the spatial-stream score of the current behavior category, T is the temporal-stream score of the current behavior category, M is the 3D score of the current behavior category, a is the weight of the spatial-stream score, b is the weight of the temporal-stream score, and c is the weight of the 3D score.
Optionally,
generating the final score of each behavior category of the video to be recognized according to the spatial-stream score, the temporal-stream score, and the 3D score of each behavior category comprises:
inputting the spatial-stream score of each behavior category of the video to be recognized, the temporal-stream score of each behavior category, and the 3D score of each behavior category into a trained linear SVM classifier, and determining the final score of each behavior category of the video to be recognized using the linear SVM classifier;
wherein the kernel function of the linear SVM classifier is:
k(x, x_i) = ((x · x_i) + 1)^d, where d is a preset constant and d is a positive integer.
Optionally,
performing behavior recognition on the current video segment according to the keyframe to determine the first score of each behavior category of the current video segment comprises:
inputting the keyframe of the current video segment into a trained spatial-stream 2D convolutional model, and performing behavior recognition on the keyframe of the current video segment using the spatial-stream 2D convolutional model to determine the first score of each behavior category of the current video segment.
Optionally,
performing behavior recognition on the current video segment according to the stacked optical flow to determine the second score of each behavior category of the current video segment comprises:
inputting the stacked optical flow of the current video segment into a trained temporal-stream 2D convolutional model, and performing behavior recognition on the stacked optical flow of the current video segment using the temporal-stream 2D convolutional model to determine the second score of each behavior category of the current video segment.
Optionally,
performing behavior recognition on the current video segment according to the consecutive still frames to determine the third score of each behavior category of the current video segment comprises:
inputting the consecutive still frames of the current video segment into a trained 3D convolutional model, and performing behavior recognition on the consecutive still frames of the current video segment using the 3D convolutional model to determine the third score of each behavior category of the current video segment.
Optionally,
determining, according to the first score of each behavior category of each video segment, the spatial-stream score of each behavior category of the video to be recognized comprises:
for each behavior category, performing: determining the spatial-stream score of the current behavior category of the video to be recognized from the first score of the current behavior category of each video segment using formula one, wherein formula one is:
Vid_α = (1/K) Σ_{k=1}^{K} P_k^α
wherein Vid_α is the spatial-stream score of the current behavior category of the video to be recognized, K is the total number of the at least two video segments, and P_k^α is the first score of the current behavior category of the k-th video segment.
Optionally,
determining, according to the second score of each behavior category of each video segment, the temporal-stream score of each behavior category of the video to be recognized comprises:
for each behavior category, performing: determining the temporal-stream score of the current behavior category of the video to be recognized from the second score of the current behavior category of each video segment using formula two, wherein formula two is:
Vid_β = (1/K) Σ_{k=1}^{K} P_k^β
wherein Vid_β is the temporal-stream score of the current behavior category of the video to be recognized, K is the total number of the at least two video segments, and P_k^β is the second score of the current behavior category of the k-th video segment.
Optionally,
determining, according to the third score of each behavior category of each video segment, the 3D score of each behavior category of the video to be recognized comprises:
for each behavior category, performing: determining the 3D score of the current behavior category of the video to be recognized from the third score of the current behavior category of each video segment using formula three, wherein formula three is:
Vid_γ = (1/K) Σ_{k=1}^{K} P_k^γ
wherein Vid_γ is the 3D score of the current behavior category of the video to be recognized, K is the total number of the at least two video segments, and P_k^γ is the third score of the current behavior category of the k-th video segment.
In another aspect, an embodiment of the present invention provides a behavior recognition device, comprising:
a first setting unit, configured to set at least two behavior categories;
a splitting unit, configured to divide a video to be recognized into at least two video segments;
a segment processing unit, configured to, for each video segment, perform: extracting a keyframe, a stacked optical flow, and consecutive still frames of the current video segment; performing behavior recognition on the current video segment according to the keyframe to determine a first score for each behavior category of the current video segment; performing behavior recognition on the current video segment according to the stacked optical flow to determine a second score for each behavior category of the current video segment; and performing behavior recognition on the current video segment according to the consecutive still frames to determine a third score for each behavior category of the current video segment;
a segment fusion unit, configured to determine, according to the first score of each behavior category of each video segment, a spatial-stream score for each behavior category of the video to be recognized; determine, according to the second score of each behavior category of each video segment, a temporal-stream score for each behavior category of the video to be recognized; and determine, according to the third score of each behavior category of each video segment, a 3D score for each behavior category of the video to be recognized;
a final fusion unit, configured to generate a final score for each behavior category of the video to be recognized according to the spatial-stream score, the temporal-stream score, and the 3D score of each behavior category.
Optionally,
the device further comprises:
a second setting unit, configured to set a weight for the spatial-stream score, a weight for the temporal-stream score, and a weight for the 3D score;
and the final fusion unit is configured to, for each behavior category, perform: determining the final score of the current behavior category of the video to be recognized using formula four, according to the spatial-stream score of the current behavior category, the temporal-stream score of the current behavior category, the 3D score of the current behavior category, the weight of the spatial-stream score, the weight of the temporal-stream score, and the weight of the 3D score, wherein formula four is:
O = aS + bT + cM;
wherein O is the final score of the current behavior category of the video to be recognized, S is the spatial-stream score of the current behavior category, T is the temporal-stream score of the current behavior category, M is the 3D score of the current behavior category, a is the weight of the spatial-stream score, b is the weight of the temporal-stream score, and c is the weight of the 3D score.
Optionally,
the final fusion unit is configured to input the spatial-stream score of each behavior category of the video to be recognized, the temporal-stream score of each behavior category, and the 3D score of each behavior category into a trained linear SVM classifier, and determine the final score of each behavior category of the video to be recognized using the linear SVM classifier;
wherein the kernel function of the linear SVM classifier is:
k(x, x_i) = ((x · x_i) + 1)^d, where d is a preset constant and d is a positive integer.
Optionally,
when performing behavior recognition on the current video segment according to the keyframe to determine the first score of each behavior category of the current video segment, the segment processing unit is specifically configured to:
input the keyframe of the current video segment into a trained spatial-stream 2D convolutional model, and perform behavior recognition on the keyframe of the current video segment using the spatial-stream 2D convolutional model to determine the first score of each behavior category of the current video segment.
Optionally,
when performing behavior recognition on the current video segment according to the stacked optical flow to determine the second score of each behavior category of the current video segment, the segment processing unit is specifically configured to:
input the stacked optical flow of the current video segment into a trained temporal-stream 2D convolutional model, and perform behavior recognition on the stacked optical flow of the current video segment using the temporal-stream 2D convolutional model to determine the second score of each behavior category of the current video segment.
Optionally,
when performing behavior recognition on the current video segment according to the consecutive still frames to determine the third score of each behavior category of the current video segment, the segment processing unit is specifically configured to:
input the consecutive still frames of the current video segment into a trained 3D convolutional model, and perform behavior recognition on the consecutive still frames of the current video segment using the 3D convolutional model to determine the third score of each behavior category of the current video segment.
Optionally,
when determining, according to the first score of each behavior category of each video segment, the spatial-stream score of each behavior category of the video to be recognized, the segment fusion unit is specifically configured to:
for each behavior category, perform: determining the spatial-stream score of the current behavior category of the video to be recognized from the first score of the current behavior category of each video segment using formula one, wherein formula one is:
Vid_α = (1/K) Σ_{k=1}^{K} P_k^α
wherein Vid_α is the spatial-stream score of the current behavior category of the video to be recognized, K is the total number of the at least two video segments, and P_k^α is the first score of the current behavior category of the k-th video segment.
Optionally,
when determining, according to the second score of each behavior category of each video segment, the temporal-stream score of each behavior category of the video to be recognized, the segment fusion unit is specifically configured to:
for each behavior category, perform: determining the temporal-stream score of the current behavior category of the video to be recognized from the second score of the current behavior category of each video segment using formula two, wherein formula two is:
Vid_β = (1/K) Σ_{k=1}^{K} P_k^β
wherein Vid_β is the temporal-stream score of the current behavior category of the video to be recognized, K is the total number of the at least two video segments, and P_k^β is the second score of the current behavior category of the k-th video segment.
Optionally,
when determining, according to the third score of each behavior category of each video segment, the 3D score of each behavior category of the video to be recognized, the segment fusion unit is specifically configured to:
for each behavior category, perform: determining the 3D score of the current behavior category of the video to be recognized from the third score of the current behavior category of each video segment using formula three, wherein formula three is:
Vid_γ = (1/K) Σ_{k=1}^{K} P_k^γ
wherein Vid_γ is the 3D score of the current behavior category of the video to be recognized, K is the total number of the at least two video segments, and P_k^γ is the third score of the current behavior category of the k-th video segment.
In embodiments of the present invention, the video to be recognized is divided into at least two video segments, and the keyframe, stacked optical flow, and consecutive still frames of each video segment are extracted. Behavior recognition is performed on each video segment from these three aspects, and the results are then fused into the final recognition result of the video to be recognized. In this recognition process, because the video to be recognized is segmented and each video segment is recognized from three aspects, behavior recognition can be performed on the video from the temporal, spatial, and spatio-temporal angles. The video to be recognized is therefore recognized comprehensively, from more angles and with more information, before the results are fused into the final recognition result, which greatly improves the accuracy of behavior recognition.
Detailed description of the invention
In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a behavior recognition method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of another behavior recognition method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of a behavior recognition device provided by an embodiment of the present invention.
Specific embodiment
In order to make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, an embodiment of the present invention provides a behavior recognition method, which may comprise the following steps:
Step 101: preset at least two behavior categories;
Step 102: divide the video to be recognized into at least two video segments;
Step 103: for each video segment, perform: extract the keyframe, stacked optical flow, and consecutive still frames of the current video segment; perform behavior recognition on the current video segment according to the keyframe to determine the first score of each behavior category of the current video segment; perform behavior recognition on the current video segment according to the stacked optical flow to determine the second score of each behavior category of the current video segment; and perform behavior recognition on the current video segment according to the consecutive still frames to determine the third score of each behavior category of the current video segment;
Step 104: determine, according to the first score of each behavior category of each video segment, the spatial-stream score of each behavior category of the video to be recognized; determine, according to the second score of each behavior category of each video segment, the temporal-stream score of each behavior category of the video to be recognized; and determine, according to the third score of each behavior category of each video segment, the 3D score of each behavior category of the video to be recognized;
Step 105: generate the final score of each behavior category of the video to be recognized according to the spatial-stream score, the temporal-stream score, and the 3D score of each behavior category.
In embodiments of the present invention, the video to be recognized is divided into at least two video segments, and the keyframe, stacked optical flow, and consecutive still frames of each video segment are extracted. Behavior recognition is performed on each video segment from these three aspects, and the results are then fused into the final recognition result of the video to be recognized. Because the video is segmented and each segment is recognized from three aspects, behavior recognition can be performed on the video to be recognized from the temporal, spatial, and spatio-temporal angles, so that it is recognized comprehensively from more angles and with more information before the results are fused into the final recognition result, greatly improving the accuracy of behavior recognition.
In embodiments of the present invention, the score of a behavior category is the score with which the behavior in a video belongs to that behavior category: the higher the score, the more likely it is that the current video segment belongs to that behavior category.
Specifically, the first score is the score with which the behavior in the current video segment belongs to a given behavior category when behavior recognition is performed based on the keyframe; the second score is the corresponding score when behavior recognition is performed based on the stacked optical flow; and the third score is the corresponding score when behavior recognition is performed based on the consecutive still frames.
The spatial-stream score is the score with which the behavior in the video to be recognized belongs to a given behavior category when behavior recognition is performed based on keyframes; the temporal-stream score is the corresponding score when behavior recognition is performed based on stacked optical flow; and the 3D score is the corresponding score when behavior recognition is performed based on consecutive still frames.
The final score is the score with which the behavior in the video to be recognized belongs to a given behavior category after the keyframe-based, stacked-optical-flow-based, and consecutive-still-frame-based recognition modes are fused.
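As a concrete illustration of how the per-segment scores flow into the final score, the following minimal Python sketch wires steps 101 to 105 together. It is a sketch under stated assumptions, not the patented implementation: the three per-segment recognizers are random stand-ins for the trained models described below, the segment-to-video fusion is shown as an average over segments (one plausible reading of formulas one to three), and all names are illustrative.

    import numpy as np

    CATEGORIES = ["running", "jumping", "playing basketball"]  # step 101: preset categories
    K = 3                                                      # step 102: number of segments
    rng = np.random.default_rng(0)

    # Step 103: first/second/third scores per segment, shape (K, num_categories);
    # random stand-ins for the three trained models described below.
    first = rng.random((K, len(CATEGORIES)))    # keyframe -> spatial stream
    second = rng.random((K, len(CATEGORIES)))   # stacked optical flow -> temporal stream
    third = rng.random((K, len(CATEGORIES)))    # consecutive still frames -> 3D model

    # Step 104: fuse the K segment-level scores into video-level stream scores,
    # shown here as a plain average over segments.
    S, T, M = first.mean(axis=0), second.mean(axis=0), third.mean(axis=0)

    # Step 105: weighted fusion into the final score (formula four), with the
    # example weights a = 0.3, b = 0.3, c = 0.4 given later in the description.
    a, b, c = 0.3, 0.3, 0.4
    O = a * S + b * T + c * M
    print(dict(zip(CATEGORIES, O.round(3))))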
In embodiments of the present invention, dividing the video to be recognized into at least two video segments and extracting the keyframe, stacked optical flow, and consecutive still frames of the current video segment can be implemented by sparse sampling.
In embodiments of the present invention, dividing the video to be recognized into at least two video segments comprises: dividing the video to be recognized evenly into at least two video segments, wherein each video segment has the same duration.
In embodiments of the present invention, extracting the keyframe of the current video segment comprises: randomly selecting one frame from the current video segment and using that frame as the keyframe. The keyframe here can be a single still image.
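A minimal sketch of this sparse sampling, assuming the video is already decoded into a list of frames; the even split and the one random keyframe per segment follow the description above, and the function and variable names are illustrative:

    import random

    def sparse_sample(frames, k):
        """Split a frame list into k equal-duration segments and draw one
        random keyframe from each segment."""
        seg_len = len(frames) // k
        segments = [frames[i * seg_len:(i + 1) * seg_len] for i in range(k)]
        keyframes = [random.choice(segment) for segment in segments]
        return segments, keyframes

    frames = list(range(300))  # stand-in for 300 decoded video frames
    segments, keyframes = sparse_sample(frames, k=3)
    print([len(s) for s in segments], keyframes)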
In an embodiment of the present invention, the method further comprises:
presetting a weight for the spatial-stream score, a weight for the temporal-stream score, and a weight for the 3D score;
and generating the final score of each behavior category of the video to be recognized according to the spatial-stream score, the temporal-stream score, and the 3D score of each behavior category comprises:
for each behavior category, performing: determining the final score of the current behavior category of the video to be recognized using formula four, according to the spatial-stream score of the current behavior category, the temporal-stream score of the current behavior category, the 3D score of the current behavior category, the weight of the spatial-stream score, the weight of the temporal-stream score, and the weight of the 3D score, wherein formula four is:
O = aS + bT + cM;
wherein O is the final score of the current behavior category of the video to be recognized, S is the spatial-stream score of the current behavior category, T is the temporal-stream score of the current behavior category, M is the 3D score of the current behavior category, a is the weight of the spatial-stream score, b is the weight of the temporal-stream score, and c is the weight of the 3D score.
In embodiments of the present invention, the spatial-stream score, the temporal-stream score, and the 3D score are fused into the final score by setting a weight for each of them; for example, a can be 0.3, b can be 0.3, and c can be 0.4. Fusing the three scores in this way makes the final score more accurate.
In an embodiment of the present invention, generating the final score of each behavior category of the video to be recognized according to the spatial-stream score, the temporal-stream score, and the 3D score of each behavior category comprises:
inputting the spatial-stream score of each behavior category of the video to be recognized, the temporal-stream score of each behavior category, and the 3D score of each behavior category into a trained linear SVM classifier, and determining the final score of each behavior category of the video to be recognized using the linear SVM classifier;
wherein the kernel function of the linear SVM classifier is:
k(x, x_i) = ((x · x_i) + 1)^d, where d is a preset constant and d is a positive integer.
In embodiments of the present invention, the trained SVM classifier fuses the spatial-stream score, the temporal-stream score, and the 3D score, and setting the above kernel function makes the final score more accurate. Here d can be 1, 2, 3, etc.; specifically, d can be 9.
In addition, when the linear SVM classifier is trained, the video-level predicted scores and the labels of the training set are input into the linear SVM classifier for training.
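A sketch of this score-level SVM fusion using scikit-learn, which is an assumed choice since the patent names no library. The kernel ((x · x_i) + 1)^d corresponds to scikit-learn's polynomial kernel with gamma = 1 and coef0 = 1, shown here with d = 9 as in the text; the training data are random stand-ins for the video-level predicted scores and labels:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    num_videos, num_categories = 200, 8

    # Video-level predicted scores from the three streams, concatenated into one
    # feature vector per training video, plus the true category labels.
    X_train = rng.random((num_videos, 3 * num_categories))
    y_train = rng.integers(0, num_categories, num_videos)

    # k(x, x_i) = ((x . x_i) + 1)^d is the polynomial kernel with gamma=1, coef0=1.
    svm = SVC(kernel="poly", degree=9, gamma=1.0, coef0=1.0)
    svm.fit(X_train, y_train)

    x_test = rng.random((1, 3 * num_categories))  # scores of one video to recognize
    print(svm.predict(x_test))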
In an embodiment of the present invention, performing behavior recognition on the current video segment according to the keyframe to determine the first score of each behavior category of the current video segment comprises:
inputting the keyframe of the current video segment into a trained spatial-stream 2D convolutional model, and performing behavior recognition on the keyframe of the current video segment using the spatial-stream 2D convolutional model to determine the first score of each behavior category of the current video segment.
In embodiments of the present invention, the keyframe is processed by the spatial-stream model of a two-stream 2D convolutional network, thereby performing behavior recognition on the current video segment. Through the spatial-stream 2D convolutional model, behavior recognition can be performed on the current video segment from the spatial angle.
In an embodiment of the present invention, performing behavior recognition on the current video segment according to the stacked optical flow to determine the second score of each behavior category of the current video segment comprises:
inputting the stacked optical flow of the current video segment into a trained temporal-stream 2D convolutional model, and performing behavior recognition on the stacked optical flow of the current video segment using the temporal-stream 2D convolutional model to determine the second score of each behavior category of the current video segment.
In embodiments of the present invention, the stacked optical flow is processed by the temporal-stream model of a two-stream 2D convolutional network, thereby performing behavior recognition on the current video segment. Through the temporal-stream 2D convolutional model, behavior recognition can be performed on the current video segment from the temporal angle.
In an embodiment of the present invention, performing behavior recognition on the current video segment according to the consecutive still frames to determine the third score of each behavior category of the current video segment comprises:
inputting the consecutive still frames of the current video segment into a trained 3D convolutional model, and performing behavior recognition on the consecutive still frames of the current video segment using the 3D convolutional model to determine the third score of each behavior category of the current video segment.
In embodiments of the present invention, the consecutive still frames are processed by the 3D convolutional model, thereby performing behavior recognition on the current video segment. Through the 3D convolutional model, behavior recognition can be performed on the current video segment from the spatio-temporal angle.
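The patent does not name specific network backbones. As one plausible instantiation, the sketch below builds the three models from torchvision: a ResNet-18 for the spatial stream (RGB keyframe), a ResNet-18 whose first convolution takes 10 channels for the temporal stream (five stacked flow maps with x and y channels, matching the sizes given later), and an R3D-18 for the 3D convolutional model:

    import torch
    import torch.nn as nn
    import torchvision.models as models

    NUM_CATEGORIES = 8

    def spatial_stream():
        m = models.resnet18(weights=None)  # 2D convolution over the RGB keyframe
        m.fc = nn.Linear(m.fc.in_features, NUM_CATEGORIES)
        return m

    def temporal_stream():
        m = models.resnet18(weights=None)
        # 5 flow maps x 2 channels (x- and y-direction displacement) = 10 channels.
        m.conv1 = nn.Conv2d(10, 64, kernel_size=7, stride=2, padding=3, bias=False)
        m.fc = nn.Linear(m.fc.in_features, NUM_CATEGORIES)
        return m

    def conv3d_model():
        m = models.video.r3d_18(weights=None)  # 3D convolution over consecutive frames
        m.fc = nn.Linear(m.fc.in_features, NUM_CATEGORIES)
        return m

    keyframe = torch.randn(1, 3, 224, 224)   # 1 x 3 x L x W
    flow = torch.randn(1, 10, 224, 224)      # 5 x 2 flow channels, stacked
    clip = torch.randn(1, 3, 16, 112, 112)   # 16 consecutive RGB frames
    print(spatial_stream()(keyframe).shape,
          temporal_stream()(flow).shape,
          conv3d_model()(clip).shape)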
In addition, the spatial-stream 2D convolutional model, the temporal-stream 2D convolutional model, and the 3D convolutional model can be trained in the following way:
J1: build the three network models (the spatial-stream 2D convolutional model, the temporal-stream 2D convolutional model, and the 3D convolutional model) and feed the acquired training-set data into each network model. Through a series of convolution, pooling, nonlinear activation, normalization, fully connected layer, and softmax operations, each network outputs the action category scores of the video, completing the forward propagation of the network:
y_i = e^{X_i} / Σ_{j=1}^{N} e^{X_j}
The above formula is the softmax function, wherein X_i is the output of the i-th neuron of the last layer of the network, i ∈ [1, N], and N is the total number of behavior categories.
J2: compute the cross-entropy loss between the final output layer of each network model and the ground truth, and adjust the parameters of each layer of each network model by backpropagation, completing the backward propagation of the network.
The concrete form of the cross-entropy loss function is as follows:
L = -Σ_i z_i ln y_i
wherein z_i is the true classification result and y_i is the softmax output above.
J3: iterate the forward propagation and backward propagation of the previous two steps until the networks converge.
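A minimal PyTorch sketch of this J1 to J3 loop for any one of the three networks; the tiny stand-in model, random data, and hyperparameters are illustrative only. Note that nn.CrossEntropyLoss combines the softmax above and the cross-entropy loss L = -Σ z_i ln y_i in a single call:

    import torch
    import torch.nn as nn

    N = 8                                  # total number of behavior categories
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, N))  # stand-in network
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    criterion = nn.CrossEntropyLoss()      # softmax + L = -sum(z_i * ln y_i)

    inputs = torch.randn(16, 3, 32, 32)    # stand-in training batch
    labels = torch.randint(0, N, (16,))    # true classification results z

    for step in range(100):                # J3: iterate until convergence
        scores = model(inputs)             # J1: forward propagation
        loss = criterion(scores, labels)   # J2: cross-entropy loss
        optimizer.zero_grad()
        loss.backward()                    # J2: backpropagation
        optimizer.step()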
In an embodiment of the present invention, determining, according to the first score of each behavior category of each video segment, the spatial-stream score of each behavior category of the video to be recognized comprises:
for each behavior category, performing: determining the spatial-stream score of the current behavior category of the video to be recognized from the first score of the current behavior category of each video segment using formula one, wherein formula one is:
Vid_α = (1/K) Σ_{k=1}^{K} P_k^α
wherein Vid_α is the spatial-stream score of the current behavior category of the video to be recognized, K is the total number of the at least two video segments, and P_k^α is the first score of the current behavior category of the k-th video segment.
In an embodiment of the present invention, determining, according to the second score of each behavior category of each video segment, the temporal-stream score of each behavior category of the video to be recognized comprises:
for each behavior category, performing: determining the temporal-stream score of the current behavior category of the video to be recognized from the second score of the current behavior category of each video segment using formula two, wherein formula two is:
Vid_β = (1/K) Σ_{k=1}^{K} P_k^β
wherein Vid_β is the temporal-stream score of the current behavior category of the video to be recognized, K is the total number of the at least two video segments, and P_k^β is the second score of the current behavior category of the k-th video segment.
In an embodiment of the present invention, determining, according to the third score of each behavior category of each video segment, the 3D score of each behavior category of the video to be recognized comprises:
for each behavior category, performing: determining the 3D score of the current behavior category of the video to be recognized from the third score of the current behavior category of each video segment using formula three, wherein formula three is:
Vid_γ = (1/K) Σ_{k=1}^{K} P_k^γ
wherein Vid_γ is the 3D score of the current behavior category of the video to be recognized, K is the total number of the at least two video segments, and P_k^γ is the third score of the current behavior category of the k-th video segment.
As shown in Fig. 2, an embodiment of the present invention provides a behavior recognition method, which may comprise the following steps:
Step 201: preset at least two behavior categories.
Specifically, the behavior categories may include: running, jumping, walking, climbing, playing basketball, playing tennis, playing volleyball, playing football, etc.
Step 202: divide the video to be recognized into at least two video segments.
Specifically, the video to be recognized can be divided evenly into at least two video segments.
Step 203: for each video segment, perform: extract the keyframe, stacked optical flow, and consecutive still frames of the current video segment; input the keyframe of the current video segment into the trained spatial-stream 2D convolutional model, and perform behavior recognition on the keyframe using the spatial-stream 2D convolutional model to determine the first score of each behavior category of the current video segment; input the stacked optical flow of the current video segment into the trained temporal-stream 2D convolutional model, and perform behavior recognition on the stacked optical flow using the temporal-stream 2D convolutional model to determine the second score of each behavior category of the current video segment; input the consecutive still frames of the current video segment into the trained 3D convolutional model, and perform behavior recognition on the consecutive still frames using the 3D convolutional model to determine the third score of each behavior category of the current video segment.
For example, for video segment 1, perform:
extract the keyframe, stacked optical flow, and consecutive still frames of video segment 1;
input the keyframe of video segment 1 into the trained spatial-stream 2D convolutional model, and perform behavior recognition on the keyframe using the spatial-stream 2D convolutional model to determine the first score of each behavior category of video segment 1, for example: the first score of the behavior category "running", the first score of the behavior category "jumping", the first score of the behavior category "playing basketball", etc.;
input the stacked optical flow of video segment 1 into the trained temporal-stream 2D convolutional model, and perform behavior recognition on the stacked optical flow of video segment 1 using the temporal-stream 2D convolutional model to determine the second score of each behavior category of video segment 1, for example: the second score of the behavior category "running", the second score of the behavior category "jumping", the second score of the behavior category "playing basketball", etc.;
input the consecutive still frames of video segment 1 into the trained 3D convolutional model, and perform behavior recognition on the consecutive still frames of video segment 1 using the 3D convolutional model to determine the third score of each behavior category of video segment 1, for example: the third score of the behavior category "running", the third score of the behavior category "jumping", the third score of the behavior category "playing basketball", etc.
In addition, when the stacked optical flow is extracted, it can be extracted at 5 frames per second.
The size of the keyframe can be 1 × 3 × L × W (e.g., the number of RGB channels is 3), the size of the stacked optical flow can be 5 × 2 × L × W (five optical-flow maps extracted directly from six consecutive still images, where each optical-flow map has two channels representing the pixel displacements in the x direction and the y direction), and the size of the consecutive still frames can be 16 × 3 × L × W (16 consecutive RGB frames), where L and W are the length and width of the input image.
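One way to produce such a 5 × 2 × L × W stacked optical flow is dense Farneback flow from OpenCV, sketched below assuming the six consecutive frames are already available as grayscale arrays; the patent itself does not prescribe a particular optical-flow algorithm:

    import cv2
    import numpy as np

    L, W = 224, 224
    rng = np.random.default_rng(0)
    frames = [rng.integers(0, 256, (L, W), dtype=np.uint8) for _ in range(6)]

    flows = []
    for prev, curr in zip(frames[:-1], frames[1:]):
        # Dense flow between two consecutive frames: an (L, W, 2) field holding
        # the pixel displacements in the x and y directions.
        f = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                         0.5, 3, 15, 3, 5, 1.2, 0)
        flows.append(f.transpose(2, 0, 1))  # -> (2, L, W)

    stacked = np.stack(flows)               # (5, 2, L, W), as described above
    print(stacked.shape)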
Step 204: determine, according to the first score of each behavior category of each video segment, the spatial-stream score of each behavior category of the video to be recognized; determine, according to the second score of each behavior category of each video segment, the temporal-stream score of each behavior category of the video to be recognized; and determine, according to the third score of each behavior category of each video segment, the 3D score of each behavior category of the video to be recognized.
For example, suppose the video to be recognized has three video segments in total: video segment 1, video segment 2, and video segment 3. For the behavior category "running", the first score of the behavior category "running" of video segment 1 is P1, the first score of the behavior category "running" of video segment 2 is P2, and the first score of the behavior category "running" of video segment 3 is P3; then the spatial-stream score of the behavior category "running" of the video to be recognized is determined according to P1, P2, and P3.
Step 205: generate the final score of each behavior category of the video to be recognized according to the spatial-stream score of each behavior category, the temporal-stream score of each behavior category, and the 3D score of each behavior category.
For example, for the behavior category "running", given the spatial-stream score S1, the temporal-stream score T1, and the 3D score M1 of the behavior category "running" of the video to be recognized, the final score of the behavior category "running" of the video to be recognized is generated according to S1, T1, and M1.
By fusing multiple models, the embodiment of the present invention makes full use of the temporal, spatial, and spatio-temporal information in the video while taking the influence of long-range motion into account, and fuses multiple segment-level behavior recognition results into a video-level behavior recognition result, obtaining a more accurate recognition result.
The embodiment of the present invention fuses 2D and 3D convolutional models. Here, 2D convolution refers to a two-stream network, which analyzes the video content from the temporal angle and the spatial angle respectively, making full use of motion change information and appearance information, while the 3D convolutional model analyzes the video content from the spatio-temporal angle, making full use of the spatio-temporal information across multiple frames. By fusing the 2D and 3D convolutional models, more comprehensive information is obtained. Meanwhile, the embodiment of the present invention divides the video into multiple segments and obtains the video-level behavior recognition result by fusing the prediction results of the multiple segments. Therefore, the embodiment of the present invention not only uses information of multiple dimensions but also widens the field of view in the temporal dimension, obtaining a video-level prediction result. At the same time, the embodiment of the present invention uses sparse sampling, which reduces the training parameters while still obtaining global information. The embodiment of the present invention can thus obtain more complete and broader information, achieving a more accurate recognition result.
As shown in Fig. 3, an embodiment of the present invention provides a behavior recognition device, comprising:
a first setting unit 301, configured to set at least two behavior categories;
a splitting unit 302, configured to divide a video to be recognized into at least two video segments;
a segment processing unit 303, configured to, for each video segment, perform: extracting a keyframe, a stacked optical flow, and consecutive still frames of the current video segment; performing behavior recognition on the current video segment according to the keyframe to determine a first score for each behavior category of the current video segment; performing behavior recognition on the current video segment according to the stacked optical flow to determine a second score for each behavior category of the current video segment; and performing behavior recognition on the current video segment according to the consecutive still frames to determine a third score for each behavior category of the current video segment;
a segment fusion unit 304, configured to determine, according to the first score of each behavior category of each video segment, a spatial-stream score for each behavior category of the video to be recognized; determine, according to the second score of each behavior category of each video segment, a temporal-stream score for each behavior category of the video to be recognized; and determine, according to the third score of each behavior category of each video segment, a 3D score for each behavior category of the video to be recognized;
a final fusion unit 305, configured to generate a final score for each behavior category of the video to be recognized according to the spatial-stream score, the temporal-stream score, and the 3D score of each behavior category.
In an embodiment of the present invention, the device further comprises:
a second setting unit, configured to set a weight for the spatial-stream score, a weight for the temporal-stream score, and a weight for the 3D score;
and the final fusion unit is configured to, for each behavior category, perform: determining the final score of the current behavior category of the video to be recognized using formula four, according to the spatial-stream score of the current behavior category, the temporal-stream score of the current behavior category, the 3D score of the current behavior category, the weight of the spatial-stream score, the weight of the temporal-stream score, and the weight of the 3D score, wherein formula four is:
O = aS + bT + cM;
wherein O is the final score of the current behavior category of the video to be recognized, S is the spatial-stream score of the current behavior category, T is the temporal-stream score of the current behavior category, M is the 3D score of the current behavior category, a is the weight of the spatial-stream score, b is the weight of the temporal-stream score, and c is the weight of the 3D score.
In an embodiment of the present invention, the final fusion unit is configured to input the spatial-stream score of each behavior category of the video to be recognized, the temporal-stream score of each behavior category, and the 3D score of each behavior category into a trained linear SVM classifier, and determine the final score of each behavior category of the video to be recognized using the linear SVM classifier;
wherein the kernel function of the linear SVM classifier is:
k(x, x_i) = ((x · x_i) + 1)^d, where d is a preset constant and d is a positive integer.
In an embodiment of the present invention, when performing behavior recognition on the current video segment according to the keyframe to determine the first score of each behavior category of the current video segment, the segment processing unit is specifically configured to:
input the keyframe of the current video segment into the trained spatial-stream 2D convolutional model, and perform behavior recognition on the keyframe of the current video segment using the spatial-stream 2D convolutional model to determine the first score of each behavior category of the current video segment.
In an embodiment of the present invention, when performing behavior recognition on the current video segment according to the stacked optical flow to determine the second score of each behavior category of the current video segment, the segment processing unit is specifically configured to:
input the stacked optical flow of the current video segment into the trained temporal-stream 2D convolutional model, and perform behavior recognition on the stacked optical flow of the current video segment using the temporal-stream 2D convolutional model to determine the second score of each behavior category of the current video segment.
In an embodiment of the present invention, when performing behavior recognition on the current video segment according to the consecutive still frames to determine the third score of each behavior category of the current video segment, the segment processing unit is specifically configured to:
input the consecutive still frames of the current video segment into the trained 3D convolutional model, and perform behavior recognition on the consecutive still frames of the current video segment using the 3D convolutional model to determine the third score of each behavior category of the current video segment.
In an embodiment of the present invention, when determining, according to the first score of each behavior category of each video segment, the spatial-stream score of each behavior category of the video to be recognized, the segment fusion unit is specifically configured to:
for each behavior category, perform: determining the spatial-stream score of the current behavior category of the video to be recognized from the first score of the current behavior category of each video segment using formula one, wherein formula one is:
Vid_α = (1/K) Σ_{k=1}^{K} P_k^α
wherein Vid_α is the spatial-stream score of the current behavior category of the video to be recognized, K is the total number of the at least two video segments, and P_k^α is the first score of the current behavior category of the k-th video segment.
In an embodiment of the present invention, when determining, according to the second score of each behavior category of each video segment, the temporal-stream score of each behavior category of the video to be recognized, the segment fusion unit is specifically configured to:
for each behavior category, perform: determining the temporal-stream score of the current behavior category of the video to be recognized from the second score of the current behavior category of each video segment using formula two, wherein formula two is:
Vid_β = (1/K) Σ_{k=1}^{K} P_k^β
wherein Vid_β is the temporal-stream score of the current behavior category of the video to be recognized, K is the total number of the at least two video segments, and P_k^β is the second score of the current behavior category of the k-th video segment.
In an embodiment of the present invention, when determining, according to the third score of each behavior category of each video segment, the 3D score of each behavior category of the video to be recognized, the segment fusion unit is specifically configured to:
for each behavior category, perform: determining the 3D score of the current behavior category of the video to be recognized from the third score of the current behavior category of each video segment using formula three, wherein formula three is:
Vid_γ = (1/K) Σ_{k=1}^{K} P_k^γ
wherein Vid_γ is the 3D score of the current behavior category of the video to be recognized, K is the total number of the at least two video segments, and P_k^γ is the third score of the current behavior category of the k-th video segment.
As for the information exchange between the units of the above device and their execution processes, since they are based on the same concept as the method embodiments of the present invention, please refer to the descriptions in the method embodiments of the present invention for details, which are not repeated here.
The embodiment of the invention provides a kind of readable mediums, including execute instruction, when the processor of storage control executes Described when executing instruction, the storage control executes any one Activity recognition method provided in an embodiment of the present invention.
The embodiment of the invention provides a kind of storage controls, comprising: processor, memory and bus;
The memory is executed instruction for storing, and the processor is connect with the memory by the bus, when When the storage control is run, the processor executes the described of memory storage and executes instruction, so that the storage Controller executes any one Activity recognition method provided in an embodiment of the present invention.
The each embodiment of the present invention at least has the following beneficial effects:
1, video to be identified in embodiments of the present invention, is divided at least two video clips, extracts each video clip Key frame stacks light stream and successive frame still image, respectively in terms of key frame, stacking light stream and successive frame still image three Activity recognition is carried out to each video clip and then the final recognition result of video to be identified is fused to, in identification process In, Activity recognition is carried out in terms of three by video segmentation to be identified, and to each video clip, can from the time, space and Three angles of space-time carry out Activity recognition to video to be identified, can comprehensively be known based on more angles, more information Video not to be identified will finally be fused to final recognition result, greatly improve the accuracy of Activity recognition.
2, the embodiment of the present invention is by merging multiple models, makes full use of time in video, space, space time information, together When in view of long movement influence, merge multiple fragment stage Activity recognition results and obtain videl stage Activity recognition as a result, obtaining more Add accurate recognition result.
3, the embodiment of the present invention has merged 2D convolution and 3D convolution model, and wherein 2D convolution refers to double-stream digestion, respectively Video content is analyzed from the angle in time and space, movement change information and appearance information is adequately utilized, and 3D convolution model then analyzes video content from the angle of space-time, and the space time information between multiple frames is adequately utilized. By merging 2D convolution sum 3D convolution model, more fully information is obtained.Meanwhile the embodiment of the present invention video is divided into it is multiple Segment obtains the Activity recognition result of videl stage by merging the prediction result of multiple segments.Therefore, the embodiment of the present invention is not Merely with the information of multiple dimensions, the visual field of time dimension has also been widened, has obtained the prediction result of videl stage, the present invention is implemented Example can obtain more complete wider array of information, to reach more accurate recognition result.
It should be noted that, in this document, such as first and second etc relational terms are used merely to an entity Or operation is distinguished with another entity or operation, is existed without necessarily requiring or implying between these entities or operation Any actual relationship or order.Moreover, the terms "include", "comprise" or its any other variant be intended to it is non- It is exclusive to include, so that the process, method, article or equipment for including a series of elements not only includes those elements, It but also including other elements that are not explicitly listed, or further include solid by this process, method, article or equipment Some elements.In the absence of more restrictions, the element limited by sentence "including a ...", it is not excluded that including There is also other identical factors in the process, method, article or equipment of the element.
Those of ordinary skill in the art will appreciate that: realize that all or part of the steps of above method embodiment can pass through The relevant hardware of program instruction is completed, and program above-mentioned can store in computer-readable storage medium, the program When being executed, step including the steps of the foregoing method embodiments is executed;And storage medium above-mentioned includes: ROM, RAM, magnetic disk or light In the various media that can store program code such as disk.
Finally, it should be noted that the foregoing is merely presently preferred embodiments of the present invention, it is merely to illustrate skill of the invention Art scheme, is not intended to limit the scope of the present invention.Any modification for being made all within the spirits and principles of the present invention, Equivalent replacement, improvement etc., are included within the scope of protection of the present invention.

Claims (10)

1. a kind of Activity recognition method, which is characterized in that preset at least two behavior classifications, comprising:
Video to be identified is divided at least two video clips;
It for each video clip, executes: extracting the key frame of current video segment, stacks light stream and successive frame static map Picture;Activity recognition is carried out to the current video segment according to the key frame, determines each institute of the current video segment State behavior classification first scoring, according to the stackings light stream to the current video segment carry out Activity recognition, determination described in Second scoring of each of current video segment behavior classification, according to the successive frame still image to the current video Segment carries out Activity recognition, determines the third scoring of each of described current video segment behavior classification;
According to the first of each of each video clip behavior classification the scoring, each of described video to be identified is determined The spatial flow of the behavior classification scores;According to the second of each of each video clip behavior classification the scoring, really The time flow scoring of each of the fixed video to be identified behavior classification;According to described in each of each described video clip The third of behavior classification scores, and determines the 3D scoring of each of described video to be identified behavior classification;
According to the scoring of the spatial flow of each of the video to be identified behavior classification, the institute of each behavior classification The 3D for stating time flow scoring and each behavior classification scores, and generates each of the video to be identified behavior class Other final scoring.
2. the method according to claim 1, wherein
Further comprise:
Preset the weight of the weight of the spatial flow scoring, the weight of time flow scoring and 3D scoring;
It is described according to the spatial flow of each of the video to be identified behavior classification scoring, each behavior classification Time flow scoring and the 3D of each behavior classification score, generate each of described video to be identified row For the final scoring of classification, comprising:
It for each behavior classification, executes: according to the scoring of the spatial flow of current behavior classification, the current behavior class Other time flow scoring, the 3D scoring of the current behavior classification, the weight of spatial flow scoring, the time flow score The weight of weight and 3D scoring, the most final review of the current behavior classification of the video to be identified is determined using formula four Point, wherein the formula four are as follows:
O=aS+bT+cM;
Wherein, O is the final scoring of the current behavior classification of the video to be identified, and S is the current behavior classification The spatial flow scoring, T are that the time flow of the current behavior classification scores, and M is that the 3D of the current behavior classification scores, a For the weight of spatial flow scoring, b is the weight of time flow scoring, and c is the weight of 3D scoring.
3. the method according to claim 1, wherein
It is described according to the spatial flow of each of the video to be identified behavior classification scoring, each behavior classification Time flow scoring and the 3D of each behavior classification score, generate each of described video to be identified row For the final scoring of classification, comprising:
It will be described in the spatial flow scoring of each of the video to be identified behavior classification, each behavior classification Time flow scoring and the 3D scoring of each behavior classification are input in the Linear SVM classifier of training completion, are utilized The Linear SVM classifier determines the final scoring of each of described video to be identified behavior classification;
Wherein, the kernel function of the Linear SVM classifier are as follows:
k(a,ai)=((xxi)+1)d, d is preset constant, and d is positive integer.
4. method according to claim 1 to 3, which is characterized in that
It is described that Activity recognition is carried out to the current video segment according to the key frame, determine the every of the current video segment First scoring of a behavior classification, comprising:
The key frame of the current video segment is input in the space flow model of the 2D convolution of training completion, utilizes institute The space flow model for stating 2D convolution carries out Activity recognition to the key frame of the current video segment, and determination is described to work as forward sight First scoring of each of the frequency segment behavior classification;
And/or
It is described that Activity recognition is carried out to the current video segment according to the stacking light stream, determine the current video segment Second scoring of each behavior classification, comprising:
The stacking light stream of the current video segment is input in the time flow model of the 2D convolution of training completion, is utilized The time flow model of the 2D convolution carries out Activity recognition to the stacking light stream of the current video segment, works as described in determination Second scoring of each of the preceding video clip behavior classification;
And/or
It is described that Activity recognition is carried out to the current video segment according to the successive frame still image, determine the current video The third of each of segment behavior classification scores, comprising:
The successive frame still image of the current video segment is input in the 3D convolution model of training completion, utilizes institute It states 3D convolution model and Activity recognition is carried out to the successive frame still image of the current video segment, determination is described to work as forward sight The third of each of the frequency segment behavior classification scores.
5. method according to claim 1 to 3, which is characterized in that
It is described to score according to the first of each of each video clip behavior classification, determine the video to be identified The spatial flow of each behavior classification scores, comprising:
It for each behavior classification, executes: according to the first of the current behavior classification of each video clip the scoring, benefit The spatial flow scoring of the current behavior classification of the video to be identified is determined with formula one, wherein the formula one are as follows:
Wherein, VidαFor the spatial flow scoring of the current behavior classification of the video to be identified, K is at least two view The sum of frequency segment,For the first scoring of the current behavior classification of k-th of video clip;
And/or
It is described to score according to the second of each of each video clip behavior classification, determine the video to be identified The time flow of each behavior classification scores, comprising:
It for each behavior classification, executes: according to the second of the current behavior classification of each video clip the scoring, benefit The time flow scoring of the current behavior classification of the video to be identified is determined with formula two, wherein the formula two are as follows:
Wherein, VidβFor the time flow scoring of the current behavior classification of the video to be identified, K is at least two view The sum of frequency segment,For the second scoring of the current behavior classification of k-th of video clip;
And/or
It is described to be scored according to the third of each of each video clip behavior classification, determine the video to be identified The 3D of each behavior classification scores, comprising:
It for each behavior classification, executes: being scored according to the third of the current behavior classification of each video clip, benefit The 3D scoring of the current behavior classification of the video to be identified is determined with formula three, wherein the formula three are as follows:
Wherein, VidγFor the 3D scoring of the current behavior classification of the video to be identified, K is at least two piece of video The sum of section,For the third scoring of the current behavior classification of k-th of video clip.
6. a kind of Activity recognition device characterized by comprising
First setting unit, at least two behavior classifications to be arranged;
Cutting unit, for video to be identified to be divided at least two video clips;
Fragment processing unit executes for being directed to each video clip: extracting the key frame of current video segment, stacks Light stream and successive frame still image;Activity recognition is carried out to the current video segment according to the key frame, is worked as described in determination First scoring of each of the preceding video clip behavior classification carries out the current video segment according to the stacking light stream Activity recognition determines the second scoring of each of described current video segment behavior classification, static according to the successive frame Image carries out Activity recognition to the current video segment, determines the of each of described current video segment behavior classification Three scorings;
Segment composition unit, for determining institute according to the first of each of each video clip behavior classification the scoring State the spatial flow scoring of each of video to be identified behavior classification;According to the behavior of each of each video clip Second scoring of classification determines the time flow scoring of each of described video to be identified behavior classification;According to each described The third of each of video clip behavior classification scores, and determines the 3D of each of described video to be identified behavior classification Scoring;
Final integrated unit, for being scored according to the spatial flow of each of the video to be identified behavior classification, often The time flow scoring of a behavior classification and the 3D of each behavior classification score, and generate the view to be identified The final scoring of each of frequency behavior classification.
7. device according to claim 6, which is characterized in that further comprise:
Second setting unit, weight, the weight of time flow scoring and the 3D for the spatial flow scoring to be arranged are commented The weight divided;
The final integrated unit executes: for being directed to each behavior classification according to the space of current behavior classification Stream scoring, the time flow scoring of the current behavior classification, the 3D scoring of the current behavior classification, the spatial flow score The weight of weight, the weight of time flow scoring and 3D scoring, the institute of the video to be identified is determined using formula four State the final scoring of current behavior classification, wherein the formula four are as follows:
O=aS+bT+cM;
Wherein, O is the final scoring of the current behavior classification of the video to be identified, and S is the current behavior classification The spatial flow scoring, T are that the time flow of the current behavior classification scores, and M is that the 3D of the current behavior classification scores, a For the weight of spatial flow scoring, b is the weight of time flow scoring, and c is the weight of 3D scoring.
8. device according to claim 6, which is characterized in that
The final integrated unit, for the spatial flow of each of the video to be identified behavior classification to score, The time flow scoring of each behavior classification and the 3D scoring of each behavior classification are input to trained completion Linear SVM classifier in, determine each of described video to be identified behavior classification using the Linear SVM classifier Final scoring;
Wherein, the kernel function of the Linear SVM classifier are as follows:
k(a,ai)=((xxi)+1)d, d is preset constant, and d is positive integer.
9. device according to claim 1 to 3, which is characterized in that
The fragment processing unit, execute it is described according to the key frame to the current video segment carry out Activity recognition, When determining the first scoring of each of described current video segment behavior classification, it is specifically used for:
The key frame of the current video segment is input in the space flow model of the 2D convolution of training completion, utilizes institute The space flow model for stating 2D convolution carries out Activity recognition to the key frame of the current video segment, and determination is described to work as forward sight First scoring of each of the frequency segment behavior classification;
And/or
The fragment processing unit, execute it is described according to the stackings light stream to the current video segment progress behavior knowledge Not, when determining the second scoring of each of described current video segment behavior classification, it is specifically used for:
The stacking light stream of the current video segment is input in the time flow model of the 2D convolution of training completion, is utilized The time flow model of the 2D convolution carries out Activity recognition to the stacking light stream of the current video segment, works as described in determination Second scoring of each of the preceding video clip behavior classification;
And/or
The fragment processing unit described goes to the current video segment according to the successive frame still image executing It is specifically used for when determining the third scoring of each of described current video segment behavior classification for identification:
The successive frame still image of the current video segment is input in the 3D convolution model of training completion, utilizes institute It states 3D convolution model and Activity recognition is carried out to the successive frame still image of the current video segment, determination is described to work as forward sight The third of each of the frequency segment behavior classification scores.
10. device according to claim 1 to 3, which is characterized in that
The segment composition unit is commented executing first according to each of each video clip behavior classification Point, when determining the spatial flow scoring of each of described video to be identified behavior classification, it is specifically used for:
It for each behavior classification, executes: according to the first of the current behavior classification of each video clip the scoring, benefit The spatial flow scoring of the current behavior classification of the video to be identified is determined with formula one, wherein the formula one are as follows:
Wherein, VidαFor the spatial flow scoring of the current behavior classification of the video to be identified, K is at least two view The sum of frequency segment,For the first scoring of the current behavior classification of k-th of video clip;
And/or
The segment composition unit is commented executing second according to each of each video clip behavior classification Point, when determining the time flow scoring of each of described video to be identified behavior classification, it is specifically used for:
It for each behavior classification, executes: according to the second of the current behavior classification of each video clip the scoring, benefit The time flow scoring of the current behavior classification of the video to be identified is determined with formula two, wherein the formula two are as follows:
Wherein, VidβFor the time flow scoring of the current behavior classification of the video to be identified, K is at least two view The sum of frequency segment,For the second scoring of the current behavior classification of k-th of video clip;
And/or
The segment composition unit is commented executing the third according to each of each video clip behavior classification Point, when determining the 3D scoring of each of described video to be identified behavior classification, it is specifically used for:
It for each behavior classification, executes: being scored according to the third of the current behavior classification of each video clip, benefit The 3D scoring of the current behavior classification of the video to be identified is determined with formula three, wherein the formula three are as follows:
Wherein, VidγFor the 3D scoring of the current behavior classification of the video to be identified, K is at least two piece of video The sum of section,For the third scoring of the current behavior classification of k-th of video clip.
CN201910491344.XA 2019-06-06 2019-06-06 A kind of Activity recognition method and device Pending CN110210430A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910491344.XA CN110210430A (en) 2019-06-06 2019-06-06 A kind of Activity recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910491344.XA CN110210430A (en) 2019-06-06 2019-06-06 A kind of Activity recognition method and device

Publications (1)

Publication Number Publication Date
CN110210430A true CN110210430A (en) 2019-09-06

Family

ID=67791406

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910491344.XA Pending CN110210430A (en) 2019-06-06 2019-06-06 A kind of Activity recognition method and device

Country Status (1)

Country Link
CN (1) CN110210430A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111768668A (en) * 2020-03-31 2020-10-13 杭州海康威视数字技术股份有限公司 Experimental operation scoring method, device, equipment and storage medium
CN113807222A (en) * 2021-09-07 2021-12-17 中山大学 Video question-answering method and system for end-to-end training based on sparse sampling

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109101896A (en) * 2018-07-19 2018-12-28 电子科技大学 A kind of video behavior recognition methods based on temporal-spatial fusion feature and attention mechanism
CN109376603A (en) * 2018-09-25 2019-02-22 北京周同科技有限公司 A kind of video frequency identifying method, device, computer equipment and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109101896A (en) * 2018-07-19 2018-12-28 电子科技大学 A kind of video behavior recognition methods based on temporal-spatial fusion feature and attention mechanism
CN109376603A (en) * 2018-09-25 2019-02-22 北京周同科技有限公司 A kind of video frequency identifying method, device, computer equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HAIMA1998: "Video Analysis相关领域解读之Action Recognition(行为识别)", 《HTTPS://BLOG.CSDN.NET/HAIMA1998/ARTICLE/DETAILS/78846442》, 19 December 2017 (2017-12-19) *
佘玉梅等: "《人工智能原理及应用》", 上海交通大学出版社, pages: 177 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111768668A (en) * 2020-03-31 2020-10-13 杭州海康威视数字技术股份有限公司 Experimental operation scoring method, device, equipment and storage medium
CN111768668B (en) * 2020-03-31 2022-09-02 杭州海康威视数字技术股份有限公司 Experimental operation scoring method, device, equipment and storage medium
CN113807222A (en) * 2021-09-07 2021-12-17 中山大学 Video question-answering method and system for end-to-end training based on sparse sampling
CN113807222B (en) * 2021-09-07 2023-06-27 中山大学 Video question-answering method and system for end-to-end training based on sparse sampling

Similar Documents

Publication Publication Date Title
Liu et al. Teinet: Towards an efficient architecture for video recognition
Varol et al. Long-term temporal convolutions for action recognition
US12051275B2 (en) Video processing method and apparatus for action recognition
Huang et al. Multi-scale dense convolutional networks for efficient prediction
Peng et al. Two-stream collaborative learning with spatial-temporal attention for video classification
JP7147078B2 (en) Video frame information labeling method, apparatus, apparatus and computer program
Bilen et al. Dynamic image networks for action recognition
Zhang et al. Gender and smile classification using deep convolutional neural networks
CN102334118B (en) Promoting method and system for personalized advertisement based on interested learning of user
Tran et al. Two-stream flow-guided convolutional attention networks for action recognition
CN110516536A (en) A kind of Weakly supervised video behavior detection method for activating figure complementary based on timing classification
CN106778854A (en) Activity recognition method based on track and convolutional neural networks feature extraction
EP2908268A2 (en) Face detector training method, face detection method, and apparatus
Kulhare et al. Key frame extraction for salient activity recognition
WO2019228316A1 (en) Action recognition method and apparatus
US20230351718A1 (en) Apparatus and method for image classification
Hammam et al. Real-time multiple spatiotemporal action localization and prediction approach using deep learning
CN110210430A (en) A kind of Activity recognition method and device
CN110046568A (en) A kind of video actions recognition methods based on Time Perception structure
Huang et al. Human action recognition based on temporal pose CNN and multi-dimensional fusion
Symeonidis et al. Neural attention-driven non-maximum suppression for person detection
CN112200096A (en) Method, device and storage medium for realizing real-time abnormal behavior recognition based on compressed video
Zhao et al. Saliency-guided video classification via adaptively weighted learning
Aliakbarian et al. Deep action-and context-aware sequence learning for activity recognition and anticipation
Pouthier et al. Active speaker detection as a multi-objective optimization with uncertainty-based multimodal fusion

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination