CN109886104A - Motion feature extraction method based on perception of correlation information between preceding and following video frames - Google Patents
Motion feature extraction method based on perception of correlation information between preceding and following video frames
- Publication number
- CN109886104A (application number CN201910033541.7A)
- Authority
- CN
- China
- Prior art keywords
- feature
- target video
- extracted
- layer
- motion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Image Analysis (AREA)
Abstract
The invention discloses a motion feature extraction method based on perception of correlation information between preceding and following video frames. The method first extracts features of a target video using a neural network method, then explicitly extracts correlation information features between the features of preceding and following frames, and finally combines the extracted target video features with the correlation information features and extracts the motion features of the target video using a neural network method. The invention explicitly extracts the correlation information between preceding and following video frames, can replace the optical flow extraction module, and can be quickly integrated with various existing motion information extraction networks, reducing the computation and time cost of optical flow extraction and improving the recognition capability of the network; the method is simple and its means are flexible.
Description
Technical field
The present invention relates to the technical field of image recognition, and in particular to a neural-network-based motion feature extraction method.
Background technique
At present, action recognition, as an important foundation of automatic video analysis, plays an important role in a series of application scenarios such as intelligent surveillance, new retail, human-computer interaction, and education and training.
For example, in a security surveillance scenario, reliably recognizing abnormal behaviors such as theft, lock picking, and fighting can play a key role in reducing the labor cost of manual monitoring and in maintaining public safety. In the new retail domain, action recognition helps to better understand user behavior, automatically analyze customer preferences, and improve the user experience.
However, current action recognition neural networks focus mainly on traditional image recognition network methods such as Long Short-Term Memory (LSTM) networks and Temporal Segment Networks (TSN). Extracting the information between frames relies on means such as optical flow maps, but computing optical flow consumes a large amount of computing power, storage, and time, so it cannot at present be used directly in practical applications.
Summary of the invention
In view of the above-mentioned deficiencies of the prior art, the object of the present invention is to provide a motion feature extraction method based on perception of correlation information between preceding and following video frames.
The object of the invention is achieved through the following technical solution: a motion feature extraction method based on perception of correlation information between preceding and following video frames, comprising the following steps:
(1) extracting features of a target video using a neural network method;
(2) explicitly extracting, from the features extracted in step (1), correlation information features between the features of preceding and following frames;
(3) combining the target video features extracted in step (1) with the correlation information features extracted in step (2), and extracting the motion features of the target video using a neural network method.
Further, step (1) is specifically:
extracting the features of the target video from the target video input using a two-dimensional neural network;
or extracting the features of the target video from the target video input using a three-dimensional neural network.
Further, step (2) comprises the following sub-steps:
(2.1) taking the target video features extracted in step (1) as input features and passing them through a convolutional network layer with a 1x1 convolution kernel, to generate new features;
(2.2) passing the new features through convolutional network layers whose convolution kernels are Sobel operators or a temporal feature difference, to generate the correlation information features.
Further, in step (2.2), the convolutional network layers are preferably: a convolutional network layer whose convolution kernel is a 3x3 Sobel operator in the x direction, a convolutional network layer whose convolution kernel is a 3x3 Sobel operator in the y direction, and a convolutional network layer whose convolution kernel is a vector of dimension 2x1x1 with values [-1, 1].
Further, step (3) is specifically: directly adding the target video features extracted in step (1) to the correlation information features extracted in step (2) to obtain newly generated features, and passing the newly generated features through a convolutional neural layer with a 1x1 convolution kernel to generate the motion features of the target video; or: splicing the target video features extracted in step (1) and the correlation information features extracted in step (2) together in the feature dimension to obtain newly generated features, and passing the newly generated features through a convolutional neural layer with a 1x1 convolution kernel to generate the motion features of the target video.
The invention has the following beneficial effects: the invention explicitly extracts the correlation information between preceding and following video frames, can replace the optical flow extraction module, and can be quickly integrated with various existing motion information extraction networks, thereby reducing the computation and time cost of optical flow extraction and improving the recognition capability of the network; the method is simple and its means are flexible.
Description of the drawings
Fig. 1 is a usage flow diagram of the module for perceiving correlation information between preceding and following video frames proposed by the present invention.
Fig. 2 is the network structure of a first possible embodiment of step (2) of the present invention.
Fig. 3 is the network structure of a second possible embodiment of step (2) of the present invention.
Fig. 4 is the network structure of a third possible embodiment of step (2) of the present invention.
Fig. 5 is the network structure of a fourth possible embodiment of step (2) of the present invention.
Specific embodiments
The present invention is described in detail below with reference to the drawings.
The motion feature extraction method based on perception of correlation information between preceding and following video frames according to the present invention is shown in Fig. 1. The method can be inserted at the junctions between network modules: for a ResNet it can be inserted between two residual modules (residual blocks), and for GoogleNet it can be inserted between two Inception modules, and so on, for example as sketched below.
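To illustrate where such a module fits, the following is a minimal PyTorch sketch, assuming a torchvision ResNet-50 backbone; the class and argument names (ResNetWithCorrelation, correlation_module) are illustrative placeholders rather than part of the invention.

```python
import torch.nn as nn
import torchvision.models as models

class ResNetWithCorrelation(nn.Module):
    """Hypothetical wrapper: insert a correlation-perception module between
    two residual stages of a standard ResNet-50 backbone."""

    def __init__(self, correlation_module):
        super().__init__()
        backbone = models.resnet50(weights=None)
        # Stem plus the first two residual stages.
        self.front = nn.Sequential(
            backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
            backbone.layer1, backbone.layer2,
        )
        self.correlation = correlation_module   # inserted between residual blocks
        # Remaining residual stages.
        self.back = nn.Sequential(backbone.layer3, backbone.layer4)

    def forward(self, x):
        x = self.front(x)
        x = self.correlation(x)   # inject inter-frame correlation features
        return self.back(x)
```

The same pattern applies to other backbones: the correlation module simply consumes and returns a feature map at some intermediate junction of the network.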
The motion feature extraction method based on perception of correlation information between preceding and following video frames according to the present invention comprises the following steps:
(1) extracting the features of the target video using a neural network method.
In this step, the target video is either split into individual image frames or continuously sampled.
When the target video is split into individual image frames, the individual frames are taken as input and the features of the target video are extracted with a two-dimensional neural network such as GoogleNet, VGG, or ResNet.
When the target video is continuously sampled, the sampled frames are spliced into a continuous clip as input and the features of the target video are extracted with a three-dimensional neural network such as C3D or Inflated ResNet, as in the sketch below.
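A minimal sketch of this step in PyTorch, assuming the video has already been decoded into a tensor of sampled frames; the backbone choice and tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Step (1), 2D variant: per-frame feature maps from a ResNet-18 backbone.
backbone = models.resnet18(weights=None)
feature_extractor = nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool + fc

frames = torch.randn(8, 3, 224, 224)      # 8 frames split from the target video (dummy data)
with torch.no_grad():
    features = feature_extractor(frames)   # shape (T, C, H', W'), here (8, 512, 7, 7)
```

For the continuous-sampling variant, a 3D backbone such as torchvision.models.video.r3d_18 can be applied in the same way to a clip tensor of shape (batch, 3, T, H, W).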
(2) explicitly extracting, from the features extracted in step (1), the correlation information features between the features of preceding and following frames.
(2.1) taking the target video features extracted in step (1) as input features and passing them through a convolutional network layer with a 1x1 convolution kernel, to generate new features;
(2.2) passing the new features through convolutional network layers whose convolution kernels are Sobel operators or a temporal feature difference, to generate the correlation information features. Figs. 2-5 show four concrete processing procedures for this step; the fixed convolution kernels they share are sketched below.
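The following is a minimal sketch of the fixed convolution kernels used in steps (2.1) and (2.2), assuming the step-(1) features are arranged as a (T, C, H, W) tensor (frames, channels, height, width); this layout and the helper names are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Step (2.1): a learnable 1x1 convolution that produces the "new features".
def make_reduce(channels):
    return nn.Conv2d(channels, channels, kernel_size=1)

# Step (2.2): fixed kernels, i.e. 3x3 Sobel operators and the 2x1x1 kernel [-1, 1].
sobel_x = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]])
sobel_y = sobel_x.t()
temporal_kernel = torch.tensor([-1., 1.])

def spatial_sobel(feat, kernel):
    """Apply a 3x3 Sobel kernel to each channel of (T, C, H, W) features."""
    t, c, h, w = feat.shape
    k = kernel.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    return F.conv2d(feat, k, padding=1, groups=c)          # (T, C, H, W)

def temporal_difference(feat):
    """Difference of consecutive frames: the [-1, 1] kernel slid along time."""
    x = feat.permute(1, 0, 2, 3).unsqueeze(0)              # (1, C, T, H, W)
    c = x.shape[1]
    k = temporal_kernel.view(1, 1, 2, 1, 1).repeat(c, 1, 1, 1, 1)
    out = F.conv3d(x, k, groups=c)                          # (1, C, T-1, H, W)
    return out.squeeze(0).permute(1, 0, 2, 3)               # (T-1, C, H, W)
```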
As shown in Fig. 2, the specific steps of step (2.2) are as follows:
(2.2.1) passing the new features extracted in (2.1) through a convolutional network layer whose convolution kernel is a 3x3 Sobel operator in the x direction, to obtain feature vectors in the x direction;
(2.2.2) passing the new features extracted in (2.1) through a convolutional network layer whose convolution kernel is a 3x3 Sobel operator in the y direction, to obtain feature vectors in the y direction;
(2.2.3) passing the new features extracted in (2.1) through a convolutional network layer whose convolution kernel is a vector of dimension 2x1x1 with values [-1, 1], to obtain feature vectors in the time direction;
(2.2.4) splicing the x-direction features extracted in (2.2.1), the y-direction features extracted in (2.2.2), and the time-direction features extracted in (2.2.3) together in the feature dimension, to obtain the correlation features between video frames. A sketch of this variant is given below.
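A minimal sketch of this first variant, reusing the helpers from the previous sketch; the module name and the padding of the temporal branch (so that all branches keep the same number of frames) are assumptions, since the text does not specify the alignment.

```python
import torch
import torch.nn as nn

# Assumes spatial_sobel, temporal_difference, sobel_x, sobel_y from the earlier sketch.
class CorrelationVariantFig2(nn.Module):
    """Hypothetical Fig. 2 variant: x-Sobel, y-Sobel and temporal-difference
    branches applied in parallel, then spliced in the feature dimension."""

    def __init__(self, channels):
        super().__init__()
        self.reduce = nn.Conv2d(channels, channels, kernel_size=1)   # step (2.1)

    def forward(self, feat):                     # feat: (T, C, H, W)
        f = self.reduce(feat)
        fx = spatial_sobel(f, sobel_x)           # (2.2.1) x-direction features
        fy = spatial_sobel(f, sobel_y)           # (2.2.2) y-direction features
        ft = temporal_difference(f)              # (2.2.3) time-direction features, (T-1, C, H, W)
        ft = torch.cat([ft, ft[-1:]], dim=0)     # repeat last step so T matches (assumption)
        return torch.cat([fx, fy, ft], dim=1)    # (2.2.4) splice: (T, 3C, H, W)
```

The variants of Figs. 3-5 below differ only in the order in which the Sobel and temporal-difference convolutions are applied and spliced.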
As shown in Fig. 3, the specific steps of step (2.2) are as follows:
(2.2.1) passing the new features extracted in (2.1) through a convolutional network layer whose convolution kernel is a 3x3 Sobel operator in the x direction, to obtain feature vectors in the x direction;
(2.2.2) passing the new features extracted in (2.1) through a convolutional network layer whose convolution kernel is a 3x3 Sobel operator in the y direction, to obtain feature vectors in the y direction;
(2.2.3) splicing the x-direction features extracted in (2.2.1) and the y-direction features extracted in (2.2.2) together in the feature dimension, to obtain the correlation features in video space;
(2.2.4) passing the video-space correlation features obtained in (2.2.3) through a convolutional network layer whose convolution kernel is a vector of dimension 2x1x1 with values [-1, 1], to obtain the correlation features between video frames.
As shown in Fig. 4, the specific steps of step (2.2) are as follows:
(2.2.1) passing the new features extracted in (2.1) through a convolutional network layer whose convolution kernel is a 3x3 Sobel operator in the x direction, to obtain feature vectors in the x direction;
(2.2.2) passing the new features extracted in (2.1) through a convolutional network layer whose convolution kernel is a 3x3 Sobel operator in the y direction, to obtain feature vectors in the y direction;
(2.2.3) splicing the x-direction features extracted in (2.2.1) and the y-direction features extracted in (2.2.2) together in the feature dimension, to obtain the correlation features in video space;
(2.2.4) passing the video-space correlation features obtained in (2.2.3) through a convolutional network layer whose convolution kernel is a vector of dimension 2x1x1 with values [-1, 1], to obtain feature vectors in the time direction;
(2.2.5) splicing the video-space correlation features obtained in (2.2.3) and the time-direction feature vectors extracted in (2.2.4) together in the feature dimension, to obtain the correlation features between video frames.
As shown in Fig. 5, the specific steps of step (2.2) are as follows:
(2.2.1) passing the new features extracted in (2.1) through a convolutional network layer whose convolution kernel is a vector of dimension 2x1x1 with values [-1, 1], to obtain feature vectors in the time direction;
(2.2.2) passing the time-direction feature vectors extracted in (2.2.1) through a convolutional network layer whose convolution kernel is a 3x3 Sobel operator in the x direction, to obtain feature vectors in the x direction;
(2.2.3) passing the time-direction feature vectors extracted in (2.2.1) through a convolutional network layer whose convolution kernel is a 3x3 Sobel operator in the y direction, to obtain feature vectors in the y direction;
(2.2.4) splicing the x-direction features extracted in (2.2.2) and the y-direction features extracted in (2.2.3) together in the feature dimension, to obtain the correlation features between video frames.
(3) combining the target video features extracted in step (1) with the correlation information features extracted in step (2), and extracting the motion features of the target video using a neural network method.
(3.1) directly adding the target video features extracted in step (1) to the correlation information features extracted in step (2) to obtain newly generated features; or splicing the target video features extracted in step (1) and the correlation information features extracted in step (2) together in the feature dimension to obtain newly generated features;
(3.2) passing the newly generated features of step (3.1) through a convolutional neural layer with a 1x1 convolution kernel, to generate the motion features of the target video. A sketch of this fusion step is given below.
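A minimal sketch of step (3), assuming the step-(1) features and the step-(2) correlation features share the same spatial size; the fusion mode flag, class name, and channel counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MotionFeatureHead(nn.Module):
    """Hypothetical step-(3) fusion: add or splice, then a 1x1 convolution."""

    def __init__(self, feat_channels, corr_channels, out_channels, mode="concat"):
        super().__init__()
        assert mode in ("concat", "add")
        self.mode = mode
        in_channels = feat_channels + corr_channels if mode == "concat" else feat_channels
        self.fuse = nn.Conv2d(in_channels, out_channels, kernel_size=1)   # step (3.2)

    def forward(self, feat, corr):               # both (T, C, H, W)
        if self.mode == "concat":
            x = torch.cat([feat, corr], dim=1)   # splice in the feature dimension
        else:
            x = feat + corr                      # direct addition (requires equal channel counts)
        return self.fuse(x)                      # motion features of the target video
```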
Claims (6)
1. A motion feature extraction method based on perception of correlation information between preceding and following video frames, characterized by comprising the following steps:
(1) extracting features of a target video using a neural network method;
(2) explicitly extracting, from the features extracted in step (1), correlation information features between the features of preceding and following frames;
(3) combining the target video features extracted in step (1) with the correlation information features extracted in step (2), and extracting the motion features of the target video using a neural network method.
2. The motion feature extraction method according to claim 1, characterized in that step (1) is specifically:
extracting the features of the target video from the target video input using a two-dimensional neural network;
or extracting the features of the target video from the target video input using a three-dimensional neural network.
3. The motion feature extraction method according to claim 1, characterized in that step (2) comprises the following sub-steps:
(2.1) taking the target video features extracted in step (1) as input features and passing them through a convolutional network layer with a 1x1 convolution kernel, to generate new features;
(2.2) passing the new features through convolutional network layers whose convolution kernels are Sobel operators or a temporal feature difference, to generate the correlation information features.
4. The motion feature extraction method according to claim 3, characterized in that in step (2.2), the convolutional network layers are preferably: a convolutional network layer whose convolution kernel is a 3x3 Sobel operator in the x direction, a convolutional network layer whose convolution kernel is a 3x3 Sobel operator in the y direction, and a convolutional network layer whose convolution kernel is a vector of dimension 2x1x1 with values [-1, 1].
5. The motion feature extraction method according to claim 1, characterized in that step (3) is specifically: directly adding the target video features extracted in step (1) to the correlation information features extracted in step (2) to obtain newly generated features, and passing the newly generated features through a convolutional neural layer with a 1x1 convolution kernel to generate the motion features of the target video.
6. The motion feature extraction method according to claim 1, characterized in that step (3) is specifically: splicing the target video features extracted in step (1) and the correlation information features extracted in step (2) together in the feature dimension to obtain newly generated features, and passing the newly generated features through a convolutional neural layer with a 1x1 convolution kernel to generate the motion features of the target video.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910033541.7A CN109886104A (en) | 2019-01-14 | 2019-01-14 | A kind of motion feature extracting method based on the perception of video before and after frames relevant information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910033541.7A CN109886104A (en) | 2019-01-14 | 2019-01-14 | A kind of motion feature extracting method based on the perception of video before and after frames relevant information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109886104A true CN109886104A (en) | 2019-06-14 |
Family
ID=66925907
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910033541.7A Pending CN109886104A (en) | 2019-01-14 | 2019-01-14 | A kind of motion feature extracting method based on the perception of video before and after frames relevant information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109886104A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009121075A2 (en) * | 2008-03-28 | 2009-10-01 | Kalpaxis Alex J | System, method and computer-program product for cognitive learning |
CN102945268A (en) * | 2012-10-25 | 2013-02-27 | 北京腾逸科技发展有限公司 | Method and system for excavating comments on characteristics of product |
CN107437258A (en) * | 2016-05-27 | 2017-12-05 | 株式会社理光 | Feature extracting method, estimation method of motion state and state estimation device |
CN107292247A (en) * | 2017-06-05 | 2017-10-24 | 浙江理工大学 | A kind of Human bodys' response method and device based on residual error network |
CN107832708A (en) * | 2017-11-09 | 2018-03-23 | 云丁网络技术(北京)有限公司 | A kind of human motion recognition method and device |
CN109101896A (en) * | 2018-07-19 | 2018-12-28 | 电子科技大学 | A kind of video behavior recognition methods based on temporal-spatial fusion feature and attention mechanism |
Non-Patent Citations (3)
Title |
---|
FAN, LJ et al.: "End-to-End Learning of Motion Representation for Video Understanding", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition *
SHUYANG SUN: "Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition *
ZHAOFAN QIU et al.: "Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks", 2017 IEEE International Conference on Computer Vision (ICCV) *
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114783069A (en) * | 2022-06-21 | 2022-07-22 | 中山大学深圳研究院 | Method, device, terminal equipment and storage medium for identifying object based on gait |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Rössler et al. | Faceforensics: A large-scale video dataset for forgery detection in human faces | |
CN108846365B (en) | Detection method and device for fighting behavior in video, storage medium and processor | |
Johnston et al. | A review of digital video tampering: From simple editing to full synthesis | |
KR102106135B1 (en) | Apparatus and method for providing application service by using action recognition | |
CN113179368A (en) | Data processing method and device for vehicle damage assessment, processing equipment and client | |
CN106254933A (en) | Subtitle extraction method and device | |
CN102088597B (en) | Method for estimating video visual salience through dynamic and static combination | |
CN110427859A (en) | A kind of method for detecting human face, device, electronic equipment and storage medium | |
CN110297943A (en) | Adding method, device, electronic equipment and the storage medium of label | |
CN109712144A (en) | Processing method, training method, equipment and the storage medium of face-image | |
CN106157363A (en) | A kind of photographic method based on augmented reality, device and mobile terminal | |
CN113365147A (en) | Video editing method, device, equipment and storage medium based on music card point | |
CN110476141A (en) | Sight tracing and user terminal for executing this method | |
CN109635822B (en) | Stereoscopic image visual saliency extraction method based on deep learning coding and decoding network | |
CN110503076A (en) | Video classification methods, device, equipment and medium based on artificial intelligence | |
CN111539290A (en) | Video motion recognition method and device, electronic equipment and storage medium | |
Karaman et al. | Human daily activities indexing in videos from wearable cameras for monitoring of patients with dementia diseases | |
CN103020606A (en) | Pedestrian detection method based on spatio-temporal context information | |
CN104036243A (en) | Behavior recognition method based on light stream information | |
CN103777757A (en) | System for placing virtual object in augmented reality by combining with significance detection | |
CN103327359A (en) | Video significance region searching method applied to video quality evaluation | |
CN114926734B (en) | Solid waste detection device and method based on feature aggregation and attention fusion | |
CN105979283A (en) | Video transcoding method and device | |
Hafiz et al. | Foreground segmentation-based human detection with shadow removal | |
CN109886104A (en) | A kind of motion feature extracting method based on the perception of video before and after frames relevant information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190614 |