CN109583334A - Action recognition method and system based on a space-time correlation neural network - Google Patents

Action recognition method and system based on a space-time correlation neural network

Info

Publication number
CN109583334A
CN109583334A
Authority
CN
China
Prior art keywords
neural network
module
space
network module
time correlation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811368191.1A
Other languages
Chinese (zh)
Other versions
CN109583334B (en)
Inventor
胡海峰
刘峥
何琛
张俊轩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN201811368191.1A
Publication of CN109583334A
Application granted
Publication of CN109583334B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items


Abstract

The present invention provides an action recognition method based on a space-time correlation neural network. The network is obtained by training on a large volume of video data annotated with action categories, and it forms an action recognition model composed of three parts: a spatial neural network module, an adjacent-frame association neural network module, and a relating module. The invention effectively extracts the spatial information of video actions as well as the temporal information of the motion, so the space-time information of video actions is captured more effectively. The designed space-time correlation neural network can be trained end to end and performs well in both the accuracy and the speed of action recognition.

Description

Action recognition method and system based on a space-time correlation neural network
Technical field
The present invention relates to the field of artificial intelligence, and more particularly to an action recognition method and system based on a space-time correlation neural network.
Background technique
Early action recognition techniques were based on hand-crafted features and achieved only modest results. With the recent wide adoption of deep learning methods in computer vision, a series of action recognition methods based on neural networks has emerged and achieved greater success. In 2014, Simonyan et al. proposed the two-stream convolutional neural network, which uses optical flow to represent the temporal information in video and achieved considerable success. In 2016, Wang et al. designed the temporal segment network, whose structure models long-range temporal information by dividing the video frames into segments and can capture long-duration action information fairly well.
Although the two-stream network achieves a certain effect in action recognition, its use of optical flow as the representation of the temporal information of motion raises several problems. On the one hand, whether optical flow can truly represent the temporal information of motion remains open to discussion; on the other hand, computing optical flow takes a substantial amount of time, so real-time application scenarios such as surveillance cannot use it, which limits the timeliness of the method. This method therefore cannot achieve a breakthrough in practical applications.
Summary of the invention
To overcome the defects of the above prior art, namely that it cannot serve all application scenarios and that its computation is relatively time-consuming, the present invention provides an action recognition method and system based on a space-time correlation neural network.
The present invention aims to solve the above technical problem at least to a certain extent.
To solve the above technical problem, the technical scheme of the present invention is as follows. An action recognition method based on a space-time correlation neural network comprises the following steps:
S1: construct and train the spatial neural network module;
S2: construct and train the adjacent-frame association neural network module, and use the relating module to connect the spatial neural network module with the adjacent-frame association neural network module;
S3: train the space-time correlation neural network system combined from the spatial neural network module and the adjacent-frame association neural network module;
S4: input the video to be recognized into the trained space-time correlation neural network system for action recognition.
By constructing the relating module that connects the spatial network module with the adjacent-frame association neural network module, the present invention extracts both spatial information and temporal information, and can thus classify the actions in the video to be recognized accurately.
Preferably, the specific procedure of step S1 is as follows:
S1.1: divide each training video evenly into 3 segments and randomly select one frame from each segment; use these three frames as the input of the spatial neural network module, and process them with data augmentation techniques;
S1.2: build the spatial neural network module and initialize it with a pre-trained convolutional network model;
S1.3: modify the output size of the fully connected layer of the spatial neural network module so that it matches the number of action classes;
S1.4: for each video, input the three selected video frames in sequence; the classification result obtained for each frame is denoted Pre_i, where i = 1, 2, 3;
S1.5: average the classification results of the three frames as the final representation of the video: Pre = (Pre_1 + Pre_2 + Pre_3) / 3;
S1.6: compare the classification result of the video with the given label and update the parameters of the entire spatial neural network module by stochastic gradient descent, thereby training the spatial neural network module;
S1.7: after the training of the spatial neural network module is completed, retain its parameters for the initialization of the space-time correlation neural network system (a code sketch of this module is given below).
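A minimal PyTorch sketch of the spatial neural network module described in steps S1.1 to S1.5 might look as follows. The torchvision ResNet-50 backbone and the count of 101 action classes are illustrative assumptions, not details taken from the patent.

```python
import torch
import torch.nn as nn
from torchvision import models

class SpatialNet(nn.Module):
    """Spatial neural network module: per-frame CNN with averaged predictions."""

    def __init__(self, num_classes=101):  # 101 classes is an assumption
        super().__init__()
        # S1.2: initialize from a pre-trained convolutional network model.
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        # S1.3: resize the fully connected layer to the number of action classes.
        backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)
        self.backbone = backbone

    def forward(self, frames):
        # frames: (batch, 3, C, H, W), one randomly chosen frame per segment (S1.1)
        b, s, c, h, w = frames.shape
        logits = self.backbone(frames.view(b * s, c, h, w))
        # S1.4-S1.5: per-frame predictions Pre_i, averaged into the video score.
        return logits.view(b, s, -1).mean(dim=1)
```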
Preferably, the specific procedure of step S2 is as follows:
S2.1: divide each training video evenly into 3 segments and randomly select 5 consecutive frames from each segment as the input of the adjacent-frame association neural network module; at the same time apply data augmentation, performing random cropping and horizontal flipping on each frame;
S2.2: build the adjacent-frame association neural network module; its structure is almost identical to that of the spatial neural network module, except that the first convolutional layer of the spatial neural network module is removed;
S2.3: build the relating module, which integrates the information between adjacent frames by extracting the association information between them as the input of the adjacent-frame association neural network module; the relating module consists of one three-dimensional convolution kernel and one max-pooling layer (see the sketch after step S2.9);
S2.4: use the relating module to convert the features of the spatial neural network module into adjacent-frame association features;
S2.5: fix the parameters of the spatial neural network module, initialize it with the network parameters retained in step S1.7, and train the adjacent-frame association neural network module;
S2.6: input the adjacent frames of the three segments into the network in sequence; the output of the adjacent-frame association neural network module for each segment is denoted Pre_i, where i = 1, 2, 3;
S2.7: average the classification results of the three segments as the final representation of the video: Pre = (Pre_1 + Pre_2 + Pre_3) / 3;
S2.8: compare the classification result of the video with the given label and update the parameters of the entire adjacent-frame association neural network module and of the relating module by stochastic gradient descent, thereby training the adjacent-frame association neural network module and the relating module;
S2.9: after the training of the adjacent-frame association neural network module is completed, retain its parameters and the parameters of the relating module for the initialization of the space-time correlation neural network.
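Below is a hedged PyTorch sketch of the relating module from S2.3: one three-dimensional convolution kernel followed by one max-pooling layer that collapses the five adjacent frames of a segment into a single association feature map. The channel count and kernel sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RelatingModule(nn.Module):
    """Relating module (S2.3): one 3-D convolution plus one max-pooling layer."""

    def __init__(self, channels=64):  # 64 channels is an assumption
        super().__init__()
        # The 3-D convolution mixes information across the temporal axis.
        self.conv3d = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        # Max pooling over the temporal dimension merges the 5 adjacent frames.
        self.pool = nn.MaxPool3d(kernel_size=(5, 1, 1))

    def forward(self, x):
        # x: (batch, channels, T=5, H, W), stacked per-frame spatial features
        return self.pool(self.conv3d(x)).squeeze(2)  # -> (batch, channels, H, W)

# S2.4: features of 5 adjacent frames become one association feature map.
relate = RelatingModule(channels=64)
out = relate(torch.randn(2, 64, 5, 56, 56))  # -> torch.Size([2, 64, 56, 56])
```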
Preferably, the specific procedure of step S3 is as follows:
S3.1: build the space-time correlation neural network, which combines the spatial neural network module and the adjacent-frame association network; the output of the first convolutional layer of the spatial neural network module, after being processed by the relating module, serves as the input of the first residual module of the adjacent-frame association neural network module (see the sketch after step S3.7);
S3.2: initialize the space-time correlation neural network with the network parameters retained in S1.7 and S2.9, then train the entire network to optimize its parameters;
S3.3: divide each video evenly into 3 segments and randomly select 5 consecutive frames from each segment as the input of the space-time correlation neural network; at the same time apply data augmentation, performing random cropping and horizontal flipping on each frame;
S3.4: input the adjacent frames of the three segments into the network in sequence; the output of the adjacent-frame association neural network module for each segment is denoted Pre_ri (i = 1, 2, 3), and the output of the spatial convolutional neural network is denoted Pre_si (i = 1, 2, 3);
S3.5: fuse the outputs of the three segments, blending for each segment the output of the spatial convolutional neural network with that of the adjacent-frame association neural network module, to obtain the final classification score of the video;
S3.6: compare the classification result of the video with the given label and update the parameters of the entire space-time correlation neural network by stochastic gradient descent, thereby training the space-time correlation neural network;
S3.7: use the trained space-time correlation neural network to classify video actions.
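The wiring described in S3.1 might be sketched in PyTorch as below: the first convolutional stage of the spatial branch feeds, through the relating module, the first residual stage of an adjacent-frame association branch whose own first convolutional layer has been removed (S2.2). The ResNet-50 backbone, the use of the segment's middle frame for the spatial stream, and the equal-weight fusion of the two streams are all assumptions for illustration.

```python
import torch
import torch.nn as nn
from torchvision import models

class RelatingModule(nn.Module):  # as in the earlier sketch (S2.3)
    def __init__(self, channels=64):
        super().__init__()
        self.conv3d = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.pool = nn.MaxPool3d(kernel_size=(5, 1, 1))

    def forward(self, x):
        return self.pool(self.conv3d(x)).squeeze(2)

class SpaceTimeNet(nn.Module):
    """Space-time correlation network: spatial branch plus adjacent-frame branch."""

    def __init__(self, num_classes=101):
        super().__init__()
        self.spatial = models.resnet50(num_classes=num_classes)
        tnet = models.resnet50(num_classes=num_classes)
        # S2.2: the adjacent-frame branch keeps everything but the first conv stage.
        self.t_layers = nn.Sequential(tnet.layer1, tnet.layer2,
                                      tnet.layer3, tnet.layer4)
        self.t_pool, self.t_fc = tnet.avgpool, tnet.fc
        self.relate = RelatingModule(channels=64)

    def _stem(self, x):  # first convolutional stage of the spatial branch
        s = self.spatial
        return s.maxpool(s.relu(s.bn1(s.conv1(x))))

    def _spatial_head(self, x):
        s = self.spatial
        x = s.layer4(s.layer3(s.layer2(s.layer1(x))))
        return s.fc(torch.flatten(s.avgpool(x), 1))

    def forward(self, clip):
        # clip: (batch, T=5, C, H, W), 5 adjacent frames of one segment (S3.3)
        b, t, c, h, w = clip.shape
        feat = self._stem(clip.view(b * t, c, h, w))
        feat = feat.view(b, t, *feat.shape[1:])
        # Spatial stream: classify the segment's middle frame (an assumption).
        pre_s = self._spatial_head(feat[:, t // 2])
        # S3.1: the relating module turns the stacked per-frame features into
        # the input of the adjacent-frame association branch.
        rel = self.relate(feat.permute(0, 2, 1, 3, 4))  # (b, C, T, H, W)
        pre_r = self.t_fc(torch.flatten(self.t_pool(self.t_layers(rel)), 1))
        # S3.5: fuse both streams; equal weighting is an assumption.
        return (pre_s + pre_r) / 2
```

The per-segment scores of the three segments would then be averaged into the video-level score, as in S3.5.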
In addition, a scheme under the same inventive concept is a system for the action recognition method based on a space-time correlation neural network, comprising a spatial neural network module, an adjacent-frame association neural network module and a relating module;
the spatial neural network module extracts the spatial information in the video;
the adjacent-frame association neural network module extracts the temporal information in the video;
the relating module merges several spatial neural network modules with several adjacent-frame association neural network modules into the space-time correlation neural network.
Preferably, the spatial neural network module comprises a convolutional layer, several residual modules, a pooling layer and a fully connected layer;
the adjacent-frame association neural network module comprises several residual modules, a pooling layer and a fully connected layer;
the output of the first convolutional layer of the spatial neural network module, after being processed by the relating module, serves as the input of the first residual module of the adjacent-frame association neural network module.
Preferably, the space-time correlation neural network system is composed of several network modules, each consisting of a spatial neural network module, an adjacent-frame association neural network module and a relating module; the outputs of these network modules are combined by average pooling to obtain the classification result.
Compared with the prior art, the beneficial effect of the technical solution of the present invention is as follows. The invention proposes a new end-to-end two-stream network structure, namely the space-time correlation neural network. It consists of two branches, a spatial network and an adjacent-frame association network, which extract the spatial information and the temporal information in the video respectively. A relating module is designed to connect the spatial network with the adjacent-frame association network, so that the association information between adjacent frames can serve as the expression of the temporal information of the action. The invention effectively extracts spatial and temporal information from the video, thereby capturing the space-time information that accurately expresses the video features. For the classification of actions in video, the method performs fairly well and can classify actions accurately and quickly.
Detailed description of the invention
Fig. 1 is the flow chart of the invention.
Fig. 2 shows the space-time correlation neural network system of the present invention.
Fig. 3 shows the relating module of the present invention.
Specific embodiment
The attached figures are only for illustrative purposes and cannot be understood as limiting the patent.
The following further describes the technical solution of the present invention with reference to the accompanying drawings and embodiments.
Embodiment 1
The flow chart of the action recognition method based on the space-time correlation neural network is shown in Fig. 1; the steps include:
S1: construct and train the spatial neural network module;
S1.1: divide each training video evenly into 3 segments and randomly select one frame from each segment; use these three frames as the input of the spatial neural network module, and at the same time apply data augmentation, i.e. perform random cropping and horizontal flipping on each frame (see the sampling sketch after step S1.7);
S1.2: build the spatial neural network module and initialize it with a pre-trained convolutional network model;
S1.3: modify the output size of the fully connected layer of the spatial neural network module so that it matches the number of action classes;
S1.4: for each video, input the three selected video frames in sequence; the classification result obtained for each frame is denoted Pre_i, where i = 1, 2, 3;
S1.5: average the classification results of the three frames as the final representation of the video: Pre = (Pre_1 + Pre_2 + Pre_3) / 3;
S1.6: compare the classification result of the video with the given label and update the parameters of the entire spatial neural network module by stochastic gradient descent, thereby training the spatial neural network module;
S1.7: after the training of the spatial neural network module is completed, retain its parameters for the initialization of the space-time correlation neural network.
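As referenced in step S1.1, the sampling and augmentation could be sketched as follows. The torchvision transforms, the 224x224 crop size, and the assumption that frames arrive as PIL images at least that large are all illustrative choices.

```python
import random
from torchvision import transforms

# S1.1: random cropping and horizontal flipping as data augmentation.
augment = transforms.Compose([
    transforms.RandomCrop(224),          # crop size is an assumption
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

def sample_three_frames(frames):
    """Divide the video evenly into 3 segments and pick one random frame each."""
    seg = len(frames) // 3               # assumes at least 3 decoded frames
    picks = [random.randrange(i * seg, (i + 1) * seg) for i in range(3)]
    return [augment(frames[p]) for p in picks]
```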
S2: construct and train the adjacent-frame association neural network module, and use the relating module to connect the spatial neural network module with the adjacent-frame association neural network module;
S2.1: divide each training video evenly into 3 segments and randomly select 5 consecutive frames from each segment as the input of the space-time correlation neural network; at the same time apply data augmentation, performing random cropping and horizontal flipping on each frame;
S2.2: build the adjacent-frame association neural network module; its structure is almost identical to that of the spatial neural network module, except that the first convolutional layer of the spatial neural network module is removed;
S2.3: build the relating module, which integrates the information between adjacent frames by extracting the association information between them as the input of the adjacent-frame association neural network module; the relating module consists of one three-dimensional convolution kernel and one max-pooling layer, and its structure is shown in Fig. 3;
S2.4: combine the adjacent-frame association neural network module with the spatial neural network module: extract the output of the first convolutional layer of the spatial neural network module as the input of the adjacent-frame association neural network module, and at the same time add the output of the first residual module of the spatial neural network module, as a residual, through the relating module to the corresponding position of the adjacent-frame association neural network module, forming the space-time correlation neural network; the relating module converts the features of the spatial neural network module into adjacent-frame association features;
S2.5: fix the parameters of the spatial neural network module, initialize it with the network parameters retained in step S1.7, and train the adjacent-frame association neural network module (see the training sketch after step S2.9);
S2.6: input the adjacent frames of the three segments into the network in sequence; the output of the adjacent-frame association neural network module for each segment is denoted Pre_i, where i = 1, 2, 3;
S2.7: average the classification results of the three segments as the final representation of the video: Pre = (Pre_1 + Pre_2 + Pre_3) / 3;
S2.8: compare the classification result of the video with the given label and update the parameters of the entire adjacent-frame association neural network module and of the relating module by stochastic gradient descent, thereby training the adjacent-frame association neural network module and the relating module;
S2.9: after the training of the adjacent-frame association neural network module is completed, retain its parameters and the parameters of the relating module for the initialization of the space-time correlation neural network.
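The staged training of S2.5 to S2.8 could be sketched as below: the spatial module is initialized from the weights saved in step S1.7 and frozen, and only the adjacent-frame association module and the relating module are updated by stochastic gradient descent. The file name, learning rate, and the shape of the data loader output are assumptions.

```python
import torch
import torch.nn as nn

def train_stage_two(model, loader, epochs=1):
    # S2.5: initialize the spatial module from the retained S1.7 parameters
    # and keep them fixed ("spatial_module.pth" is a hypothetical file name).
    model.spatial.load_state_dict(torch.load("spatial_module.pth"))
    for p in model.spatial.parameters():
        p.requires_grad = False

    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(trainable, lr=1e-3, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for clips, labels in loader:   # clips: 3 segments of 5 frames each
            # S2.6-S2.7: per-segment scores Pre_i, averaged over the segments.
            logits = torch.stack([model(c) for c in clips]).mean(dim=0)
            # S2.8: compare with the given label and update by SGD.
            loss = criterion(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```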
S3: train the space-time correlation neural network system combined from the spatial neural network module and the adjacent-frame association neural network module;
S3.1: build the space-time correlation neural network system, which combines the spatial neural network module and the adjacent-frame association network; the output of the first convolutional layer of the spatial neural network module, after being processed by the relating module, serves as the input of the first residual module of the adjacent-frame association neural network module; the specific system diagram is shown in Fig. 2;
S3.2: initialize the space-time correlation neural network with the network parameters retained in S1.7 and S2.9, then train the entire space-time correlation neural network to optimize its parameters;
S3.3: divide each video evenly into 3 segments and randomly select 5 consecutive frames from each segment as the input of the space-time correlation neural network; at the same time apply data augmentation, performing random cropping and horizontal flipping on each frame;
S3.4: input the adjacent frames of the three segments into the network in sequence; the output of the adjacent-frame association neural network module for each segment is denoted Pre_ri (i = 1, 2, 3), and the output of the spatial convolutional neural network is denoted Pre_si (i = 1, 2, 3);
S3.5: fuse the outputs of the three segments, blending for each segment the output of the spatial convolutional neural network with that of the adjacent-frame association neural network module, to obtain the final classification score of the video;
S3.6: compare the classification result of the video with the given label and update the parameters of the entire space-time correlation neural network by stochastic gradient descent, thereby training the space-time correlation neural network;
S3.7: use the trained space-time correlation neural network to classify video actions.
S4: input the video to be recognized into the trained space-time correlation neural network system for action recognition, as in the inference sketch below.
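Step S4 could be carried out as in the following sketch; the helper `sample_segments`, which draws three clips of five consecutive frames from the test video, is hypothetical.

```python
import torch

@torch.no_grad()
def recognize_action(model, video_frames, class_names):
    """S4: classify a test video with the trained space-time correlation network."""
    model.eval()
    # Hypothetical helper: 3 segments x 5 consecutive frames, as in S3.3.
    clips = sample_segments(video_frames, num_segments=3, clip_len=5)
    # Average the per-segment classification scores (S3.5).
    scores = torch.stack([model(clip.unsqueeze(0)) for clip in clips]).mean(dim=0)
    return class_names[scores.argmax(dim=1).item()]
```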
Obviously, the above embodiment is merely an example given to clearly illustrate the present invention and is not a limitation on the embodiments of the present invention. Those of ordinary skill in the art may make other variations or changes on the basis of the above description; it is neither necessary nor possible to exhaust all embodiments here. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (7)

1. An action recognition method based on a space-time correlation neural network, characterized by comprising the following steps:
S1: constructing and training a spatial neural network module;
S2: constructing and training an adjacent-frame association neural network module, and using a relating module to connect the spatial neural network module with the adjacent-frame association neural network module;
S3: training the space-time correlation neural network system combined from the spatial neural network module and the adjacent-frame association neural network module;
S4: inputting a video to be recognized into the trained space-time correlation neural network system for action recognition.
2. The action recognition method based on a space-time correlation neural network according to claim 1, characterized in that the specific procedure of step S1 is as follows:
S1.1: dividing each training video evenly into 3 segments and randomly selecting one frame from each segment; using these three frames as the input of the spatial neural network module, and processing them with data augmentation techniques;
S1.2: building the spatial neural network module and initializing it with a pre-trained convolutional network model;
S1.3: modifying the output size of the fully connected layer of the spatial neural network module so that it matches the number of action classes;
S1.4: for each video, inputting the three selected video frames in sequence, the classification result obtained for each frame being denoted Pre_i, where i = 1, 2, 3;
S1.5: averaging the classification results of the three frames as the final representation of the video: Pre = (Pre_1 + Pre_2 + Pre_3) / 3;
S1.6: comparing the classification result of the video with the given label and updating the parameters of the entire spatial neural network module by stochastic gradient descent, thereby training the spatial neural network module;
S1.7: after the training of the spatial neural network module is completed, retaining its parameters for the initialization of the space-time correlation neural network system.
3. The action recognition method based on a space-time correlation neural network according to claim 2, characterized in that the specific procedure of step S2 is as follows:
S2.1: dividing each training video evenly into 3 segments and randomly selecting 5 consecutive frames from each segment as the input of the adjacent-frame association neural network module, the 5 frames being processed with data augmentation techniques;
S2.2: building the adjacent-frame association neural network module;
S2.3: building the relating module, which integrates the information between adjacent frames by extracting the association information between them as the input of the adjacent-frame association neural network module, the relating module consisting of one three-dimensional convolution kernel and one max-pooling layer;
S2.4: using the relating module to convert the features of the spatial neural network module into adjacent-frame association features;
S2.5: fixing the parameters of the spatial neural network module, initializing the spatial neural network module with the network parameters retained in step S1.7, and training the adjacent-frame association neural network module;
S2.6: inputting the adjacent frames of the three segments into the adjacent-frame association neural network module in sequence, the output of the adjacent-frame association neural network module for each segment being denoted Pre_i, where i = 1, 2, 3;
S2.7: averaging the classification results of the three segments as the final representation of the video: Pre = (Pre_1 + Pre_2 + Pre_3) / 3;
S2.8: comparing the classification result of the video with the given label and updating the parameters of the entire adjacent-frame association neural network module and of the relating module by stochastic gradient descent, thereby training the adjacent-frame association neural network module and the relating module;
S2.9: after the training of the adjacent-frame association neural network module is completed, retaining its parameters and the parameters of the relating module for the initialization of the space-time correlation neural network system.
4. The action recognition method based on a space-time correlation neural network according to claim 3, characterized in that the specific procedure of step S3 is as follows:
S3.1: building the space-time correlation neural network system, which combines the spatial neural network module and the adjacent-frame association network, the output of the first convolutional layer of the spatial neural network module, after being processed by the relating module, serving as the input of the first residual module of the adjacent-frame association neural network module;
S3.2: initializing the space-time correlation neural network system with the network parameters retained in S1.7 and S2.9, and then training the entire space-time correlation neural network system to optimize its parameters;
S3.3: dividing each video evenly into 3 segments and randomly selecting 5 consecutive frames from each segment as the input of the space-time correlation neural network system, the 5 frames being processed with data augmentation techniques;
S3.4: inputting the adjacent frames of the three segments into the space-time correlation neural network system in sequence, the output of the adjacent-frame association neural network module for each segment being denoted Pre_ri (i = 1, 2, 3) and the output of the spatial neural network module being denoted Pre_si (i = 1, 2, 3);
S3.5: fusing the outputs of the three segments, blending for each segment the output of the spatial neural network module with that of the adjacent-frame association neural network module, to obtain the final classification score of the video;
S3.6: comparing the classification result of the video with the given label and updating the parameters of the entire space-time correlation neural network system by stochastic gradient descent, thereby training the space-time correlation neural network system.
5. A system for implementing the method of any one of claims 1 to 4, characterized by comprising a spatial neural network module, an adjacent-frame association neural network module and a relating module.
6. The system of the action recognition method based on a space-time correlation neural network according to claim 5, characterized in that:
the spatial neural network module comprises a convolutional layer, several residual modules, a pooling layer and a fully connected layer;
the adjacent-frame association neural network module comprises several residual modules, a pooling layer and a fully connected layer;
the output of the first convolutional layer of the spatial neural network module, after being processed by the relating module, serves as the input of the first residual module of the adjacent-frame association neural network module.
7. The system of the action recognition method based on a space-time correlation neural network according to claim 5, characterized in that the space-time correlation neural network system comprises several spatial neural network modules, adjacent-frame association neural network modules and relating modules, and obtains the classification result from their outputs by average pooling.
CN201811368191.1A 2018-11-16 2018-11-16 Action recognition method and system based on space-time correlation neural network Active CN109583334B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811368191.1A CN109583334B (en) 2018-11-16 2018-11-16 Action recognition method and system based on space-time correlation neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811368191.1A CN109583334B (en) 2018-11-16 2018-11-16 Action recognition method and system based on space-time correlation neural network

Publications (2)

Publication Number Publication Date
CN109583334A 2019-04-05
CN109583334B CN109583334B (en) 2022-11-04

Family

ID=65922885

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811368191.1A Active CN109583334B (en) 2018-11-16 2018-11-16 Action recognition method and system based on space-time correlation neural network

Country Status (1)

Country Link
CN (1) CN109583334B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107862376A * 2017-10-30 2018-03-30 中山大学 Human body image action recognition method based on a two-stream neural network
CN108256562A * 2018-01-09 2018-07-06 深圳大学 Salient object detection method and system based on a weakly supervised spatio-temporal cascade neural network
CN108764128A * 2018-05-25 2018-11-06 华中科技大学 Video action recognition method based on a sparse temporal segment network
CN109460707A * 2018-10-08 2019-03-12 华南理工大学 Multi-modal action recognition method based on deep neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张怡佳 et al.: "Improved human action recognition algorithm based on a two-stream convolutional neural network", Computer Measurement & Control *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147876A * 2019-05-09 2019-08-20 中山大学 Neural network based on visual feature similarity and action proposal generation method thereof
CN110147876B (en) * 2019-05-09 2023-01-03 中山大学 Neural network based on visual feature similarity and action proposal generation method thereof
CN111783506A (en) * 2019-05-17 2020-10-16 北京京东尚科信息技术有限公司 Method and device for determining target characteristics and computer-readable storage medium
WO2020233427A1 (en) * 2019-05-17 2020-11-26 北京京东尚科信息技术有限公司 Method and apparatus for determining features of target
CN111317462A (en) * 2020-03-20 2020-06-23 佛山科学技术学院 Blood flow imaging method and device based on U-net neural network
CN111317462B (en) * 2020-03-20 2023-11-03 佛山科学技术学院 Blood flow imaging method and device based on U-net neural network
CN114037930A (en) * 2021-10-18 2022-02-11 苏州大学 Video action recognition method based on space-time enhanced network
CN114037930B (en) * 2021-10-18 2022-07-12 苏州大学 Video action recognition method based on space-time enhanced network

Also Published As

Publication number Publication date
CN109583334B (en) 2022-11-04

Similar Documents

Publication Publication Date Title
CN108830252A Convolutional neural network human action recognition method fusing global spatio-temporal features
CN109583334A Action recognition method and system based on a space-time correlation neural network
Zhang et al. Real-time action recognition with enhanced motion vector CNNs
CN106096568B Pedestrian re-identification method based on CNN and convolutional LSTM networks
Tran et al. Two-stream flow-guided convolutional attention networks for action recognition
CN107480178B Pedestrian re-identification method based on cross-modal comparison of image and video
CN109886358A Human behavior recognition method based on convolutional neural networks with multi-spatial information fusion
CN107862376A Human body image action recognition method based on a two-stream neural network
CN110363131A Abnormal behavior detection method, system and medium based on the human skeleton
Xu et al. Fully-coupled two-stream spatiotemporal networks for extremely low resolution action recognition
CN113239801B Cross-domain action recognition method based on multi-scale feature learning and multi-level domain alignment
CN105678216A Spatio-temporal data stream video behavior recognition method based on deep learning
CN110110686A Human action recognition method based on multi-loss two-stream convolutional neural networks
CN112288627A Recognition-oriented low-resolution face image super-resolution method
CN109858407A Video behavior recognition method based on multi-information-stream features and asynchronous fusion
CN109948721A Video scene classification method based on video description
CN110070027A Pedestrian re-identification method based on an intelligent Internet of Things system
CN109086707A Expression tracking method based on a DCNNs-LSTM model
CN107330381A Face recognition method
CN113705384A Facial expression recognition method considering local space-time characteristics and global time sequence clues
CN115410119A Violent motion detection method and system based on adaptive generation of training samples
CN109002808B Human behavior recognition method and system
CN111898458B Violent video identification method based on attention-mechanism bimodal task learning
Zhai et al. 3D dual-stream convolutional neural networks with simple recurrent unit network: A new framework for action recognition
CN112487926A Scenic spot feeding behavior recognition method based on a spatio-temporal graph convolutional network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant