CN109886104A - Motion feature extraction method based on perception of correlation information between adjacent video frames - Google Patents

Motion feature extraction method based on perception of correlation information between adjacent video frames

Info

Publication number
CN109886104A
Authority: CN (China)
Prior art keywords: feature, target video, extracted, layer, motion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910033541.7A
Other languages
Chinese (zh)
Inventor
姜伟
吴骞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201910033541.7A
Publication of CN109886104A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a motion feature extraction method based on perception of correlation information between adjacent video frames. The method first extracts features of a target video using a neural network, then explicitly extracts correlation information features between adjacent frame features from those features, and finally combines the extracted target video features with the correlation information features and extracts the motion features of the target video using a neural network. The invention explicitly extracts the correlation information between adjacent video frames, can replace optical flow extraction modules and integrate quickly with various existing motion information extraction networks, reduces the computational and time cost of optical flow extraction, and improves the recognition capability of the network; the method is simple and its means are flexible.

Description

Motion feature extraction method based on perception of correlation information between adjacent video frames
Technical field
The present invention relates to the technical field of image recognition, and more particularly to a neural network based feature extraction method.
Background technique
Action recognition is an important foundation of automatic video analysis and plays an important role in a range of application scenarios such as intelligent surveillance, new retail, human-computer interaction, and education.
For example, in security surveillance scenarios, reliably recognizing abnormal behaviors such as theft, lock picking, and fighting can play a key role in reducing manpower monitoring costs and maintaining public safety; in the new retail domain, action recognition helps to better understand user behavior, automatically analyze customer preferences, and improve the user experience.
However, current action recognition neural networks mainly build on traditional image recognition approaches such as long short-term memory networks (LSTM) and temporal segment networks (TSN). Extracting the information between frames relies on means such as optical flow maps, but computing optical flow consumes a large amount of computing power, storage, and time, so such methods currently cannot be applied directly in practice.
Summary of the invention
In view of the above deficiencies of the prior art, the object of the present invention is to provide a motion feature extraction method based on perception of correlation information between adjacent video frames.
The object of the present invention is achieved through the following technical solution: a motion feature extraction method based on perception of correlation information between adjacent video frames, comprising the following steps:
(1) extracting features of a target video using a neural network;
(2) explicitly extracting correlation information features between adjacent frame features from the features extracted in step 1;
(3) combining the target video features extracted in step 1 with the correlation information features extracted in step 2, and extracting the motion features of the target video using a neural network.
Further, step 1 is specifically:
extracting the features of the target video from the input target video using a two-dimensional neural network;
or extracting the features of the target video from the input target video using a three-dimensional neural network.
Further, step 2 comprises the following sub-steps:
(2.1) taking the target video features extracted in step 1 as input features and passing them through one convolutional network layer with a 1x1 convolution kernel, generating new features;
(2.2) from the above new features, generating correlation information features through convolutional network layers whose convolution kernels are Sobel operators or temporal feature differences.
Further, in step 2.2, the convolutional network layers are preferably a convolutional network layer whose convolution kernel is the x-direction Sobel operator of dimension 3x3, a convolutional network layer whose convolution kernel is the y-direction Sobel operator of dimension 3x3, and a convolutional network layer whose convolution kernel is a vector of dimension 2x1x1 with values [-1, 1].
Further, step 3 is specifically: directly adding the target video features extracted in step 1 to the correlation information features extracted in step 2 to obtain newly generated features, and passing the newly generated features through one convolutional layer with a 1x1 convolution kernel to generate the motion features of the target video; or: concatenating the target video features extracted in step 1 with the correlation information features extracted in step 2 along the feature dimension to obtain newly generated features, and passing the newly generated features through one convolutional layer with a 1x1 convolution kernel to generate the motion features of the target video.
The invention has the following advantages: it explicitly extracts the correlation information between adjacent video frames, can replace optical flow extraction modules and integrate quickly with various existing motion information extraction networks, reduces the computational and time cost of optical flow extraction, and improves the recognition capability of the network; the method is simple and its means are flexible.
Brief description of the drawings
Fig. 1 is a usage flow diagram of the adjacent-frame correlation information perception module proposed by the present invention.
Fig. 2 is the network structure of the first possible embodiment of step 2 of the present invention.
Fig. 3 is the network structure of the second possible embodiment of step 2 of the present invention.
Fig. 4 is the network structure of the third possible embodiment of step 2 of the present invention.
Fig. 5 is the network structure of the fourth possible embodiment of step 2 of the present invention.
Specific embodiments
The present invention is described in detail below with reference to the accompanying drawings.
The motion feature extraction method based on adjacent-frame correlation information perception of the present invention is shown in Fig. 1. The method can be inserted at the junctions between network nodes: for a ResNet it can be inserted between two residual blocks (Residual block), for GoogleNet between two Inception modules, and so on; a minimal splicing sketch follows.
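As an illustration of this splicing, the following minimal sketch (PyTorch and torchvision are assumptions; the patent names no framework) inserts a placeholder module between two residual stages of a ResNet-50. The class name CorrelationPerception is hypothetical, and its actual computation is sketched step by step further below.

```python
import torch.nn as nn
from torchvision.models import resnet50

class CorrelationPerception(nn.Module):
    """Placeholder for the adjacent-frame correlation perception module."""
    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        return self.proj(x)

backbone = resnet50(weights=None)
# Splice the module between two groups of residual blocks (layer2/layer3);
# layer2 of a ResNet-50 outputs 512 channels.
backbone.layer2 = nn.Sequential(backbone.layer2, CorrelationPerception(512))
```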
The motion feature extraction method based on perception of correlation information between adjacent video frames of the present invention comprises the following steps:
(1) extracting features of the target video using a neural network.
In this step, the target video is either split into individual image frames or sampled continuously.
When the target video is split into individual image frames, the individual frames are used as input and the features of the target video are extracted with a two-dimensional neural network such as GoogleNet, VGG, or ResNet;
when the target video is sampled continuously, the sampled frames are concatenated into a continuous clip as input and the features of the target video are extracted with a three-dimensional neural network such as C3D or Inflated ResNet. Both paths are sketched below.
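A minimal sketch of both extraction paths, assuming PyTorch/torchvision; ResNet-18 and r3d_18 stand in for the backbones named above, and all shapes are illustrative:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18
from torchvision.models.video import r3d_18

video = torch.randn(2, 16, 3, 224, 224)            # (batch, time, C, H, W)
b, t, c, h, w = video.shape

# 2D path: split the video into frames by folding time into the batch and
# run each frame through a 2D CNN, keeping the spatial feature maps.
cnn2d = nn.Sequential(*list(resnet18(weights=None).children())[:-2])
feat2d = cnn2d(video.reshape(b * t, c, h, w))      # (b*t, 512, 7, 7)
feat2d = feat2d.reshape(b, t, *feat2d.shape[1:])   # (batch, time, 512, 7, 7)

# 3D path: feed the continuously sampled clip to a 3D CNN.
cnn3d = nn.Sequential(*list(r3d_18(weights=None).children())[:-2])
feat3d = cnn3d(video.permute(0, 2, 1, 3, 4))       # (batch, 512, T', H', W')
```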
(2) explicitly extracting correlation information features between adjacent frame features from the features extracted in step 1.
(2.1) taking the target video features extracted in step 1 as input features and passing them through one convolutional network layer with a 1x1 convolution kernel, generating new features;
(2.2) from the above new features, generating correlation information features through convolutional network layers whose convolution kernels are Sobel operators or temporal feature differences. Figs. 2-5 show four concrete processing procedures for this step; the building blocks they share are sketched after this paragraph.
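A minimal sketch of these shared building blocks, assuming PyTorch; the Sobel values and the boundary handling are the standard choices, since the patent only specifies the kernel dimensions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Step 2.1: one convolutional layer with a 1x1 kernel (512 channels assumed).
reduce_1x1 = nn.Conv2d(512, 512, kernel_size=1)

def sobel_kernels():
    """Standard 3x3 Sobel kernels for the x and y directions."""
    gx = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]])
    return gx, gx.t()                               # gy is the transpose

def spatial_gradient(x, kernel):
    """A convolutional layer whose fixed kernel is a Sobel operator, applied
    depthwise to a (N, C, H, W) feature map."""
    c = x.shape[1]
    w = kernel.to(x).expand(c, 1, 3, 3).contiguous()
    return F.conv2d(x, w, padding=1, groups=c)

def temporal_difference(feat):
    """A convolution along time with the 2x1x1 kernel of values [-1, 1]; for
    a (B, T, C, H, W) tensor this is a difference of adjacent frames."""
    return feat[:, 1:] - feat[:, :-1]
```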
As shown in Fig. 2, the specific steps of step 2.2 are as follows:
(2.2.1) passing the new features extracted in 2.1 through a convolutional network layer whose convolution kernel is the x-direction Sobel operator of dimension 3x3, obtaining the x-direction feature vector;
(2.2.2) passing the new features extracted in 2.1 through a convolutional network layer whose convolution kernel is the y-direction Sobel operator of dimension 3x3, obtaining the y-direction feature vector;
(2.2.3) passing the new features extracted in 2.1 through a convolutional network layer whose convolution kernel is a vector of dimension 2x1x1 with values [-1, 1], obtaining the time-direction feature vector;
(2.2.4) concatenating the x-direction features extracted in 2.2.1, the y-direction features extracted in 2.2.2, and the time-direction features extracted in 2.2.3 along the feature dimension, obtaining the correlation features between video frames. A sketch of this variant follows.
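A minimal sketch of this Fig. 2 variant, reusing the helpers above; trimming one time step so the parts align is an assumption, since the patent does not specify boundary handling:

```python
def correlation_features_v1(feat):
    """Fig. 2 variant: x-Sobel, y-Sobel and the temporal difference are
    applied in parallel and concatenated on the channel dimension.
    feat: (B, T, C, H, W), the output of the 1x1 convolution of step 2.1."""
    b, t, c, h, w = feat.shape
    gx, gy = sobel_kernels()
    flat = feat.reshape(b * t, c, h, w)
    fx = spatial_gradient(flat, gx).reshape(b, t, c, h, w)
    fy = spatial_gradient(flat, gy).reshape(b, t, c, h, w)
    ft = temporal_difference(feat)                  # (B, T-1, C, H, W)
    # Trim one time step so all three parts align.
    return torch.cat([fx[:, :-1], fy[:, :-1], ft], dim=2)
```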
As shown in Fig. 3, the specific steps of step 2.2 are as follows:
(2.2.1) passing the new features extracted in 2.1 through a convolutional network layer whose convolution kernel is the x-direction Sobel operator of dimension 3x3, obtaining the x-direction feature vector;
(2.2.2) passing the new features extracted in 2.1 through a convolutional network layer whose convolution kernel is the y-direction Sobel operator of dimension 3x3, obtaining the y-direction feature vector;
(2.2.3) concatenating the x-direction features extracted in 2.2.1 with the y-direction features extracted in 2.2.2 along the feature dimension, obtaining the spatial correlation features of the video;
(2.2.4) passing the spatial correlation features extracted in 2.2.3 through a convolutional network layer whose convolution kernel is a vector of dimension 2x1x1 with values [-1, 1], obtaining the correlation features between video frames. A sketch of this variant follows.
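A minimal sketch of this Fig. 3 variant, again reusing the helpers above:

```python
def correlation_features_v2(feat):
    """Fig. 3 variant: the x- and y-Sobel features are concatenated first,
    and the temporal-difference convolution is applied to the result."""
    b, t, c, h, w = feat.shape
    gx, gy = sobel_kernels()
    flat = feat.reshape(b * t, c, h, w)
    fx = spatial_gradient(flat, gx).reshape(b, t, c, h, w)
    fy = spatial_gradient(flat, gy).reshape(b, t, c, h, w)
    spatial = torch.cat([fx, fy], dim=2)            # spatial correlation
    return temporal_difference(spatial)             # (B, T-1, 2C, H, W)
```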
As shown in Fig. 4, the specific steps of step 2.2 are as follows:
(2.2.1) passing the new features extracted in 2.1 through a convolutional network layer whose convolution kernel is the x-direction Sobel operator of dimension 3x3, obtaining the x-direction feature vector;
(2.2.2) passing the new features extracted in 2.1 through a convolutional network layer whose convolution kernel is the y-direction Sobel operator of dimension 3x3, obtaining the y-direction feature vector;
(2.2.3) concatenating the x-direction features extracted in 2.2.1 with the y-direction features extracted in 2.2.2 along the feature dimension, obtaining the spatial correlation features of the video;
(2.2.4) passing the spatial correlation features extracted in 2.2.3 through a convolutional network layer whose convolution kernel is a vector of dimension 2x1x1 with values [-1, 1], obtaining the time-direction feature vector;
(2.2.5) concatenating the spatial correlation features obtained in 2.2.3 with the time-direction feature vector extracted in 2.2.4 along the feature dimension, obtaining the correlation features between video frames. A sketch of this variant follows.
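A minimal sketch of this Fig. 4 variant; trimming one time step to align the two parts is again an assumption:

```python
def correlation_features_v3(feat):
    """Fig. 4 variant: the spatial correlation features of Fig. 3 are kept
    and concatenated with the time-direction features derived from them."""
    b, t, c, h, w = feat.shape
    gx, gy = sobel_kernels()
    flat = feat.reshape(b * t, c, h, w)
    fx = spatial_gradient(flat, gx).reshape(b, t, c, h, w)
    fy = spatial_gradient(flat, gy).reshape(b, t, c, h, w)
    spatial = torch.cat([fx, fy], dim=2)            # (B, T, 2C, H, W)
    ft = temporal_difference(spatial)               # (B, T-1, 2C, H, W)
    return torch.cat([spatial[:, :-1], ft], dim=2)  # (B, T-1, 4C, H, W)
```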
As shown in Fig. 5, the specific steps of step 2.2 are as follows:
(2.2.1) passing the new features extracted in 2.1 through a convolutional network layer whose convolution kernel is a vector of dimension 2x1x1 with values [-1, 1], obtaining the time-direction feature vector;
(2.2.2) passing the time-direction feature vector extracted in 2.2.1 through a convolutional network layer whose convolution kernel is the x-direction Sobel operator of dimension 3x3, obtaining the x-direction feature vector;
(2.2.3) passing the time-direction feature vector extracted in 2.2.1 through a convolutional network layer whose convolution kernel is the y-direction Sobel operator of dimension 3x3, obtaining the y-direction feature vector;
(2.2.4) concatenating the x-direction features extracted in 2.2.2 with the y-direction features extracted in 2.2.3 along the feature dimension, obtaining the correlation features between video frames. A sketch of this variant follows.
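A minimal sketch of this Fig. 5 variant:

```python
def correlation_features_v4(feat):
    """Fig. 5 variant: the temporal difference is applied first, and the x-
    and y-Sobel layers then operate on the time-direction features."""
    ft = temporal_difference(feat)                  # (B, T-1, C, H, W)
    b, t, c, h, w = ft.shape
    gx, gy = sobel_kernels()
    flat = ft.reshape(b * t, c, h, w)
    fx = spatial_gradient(flat, gx).reshape(b, t, c, h, w)
    fy = spatial_gradient(flat, gy).reshape(b, t, c, h, w)
    return torch.cat([fx, fy], dim=2)               # (B, T-1, 2C, H, W)
```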
(3) combining the target video features extracted in step 1 with the correlation information features extracted in step 2, and extracting the motion features of the target video using a neural network.
(3.1) directly adding the target video features extracted in step 1 to the correlation information features extracted in step 2 to obtain newly generated features; or concatenating the target video features extracted in step 1 with the correlation information features extracted in step 2 along the feature dimension to obtain newly generated features;
(3.2) passing the newly generated features of step 3.1 through one convolutional layer with a 1x1 convolution kernel, generating the motion features of the target video. A sketch of this fusion step follows.
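A minimal sketch of step 3, assuming PyTorch; channel counts are placeholders, and the addition variant assumes the two inputs already share the same channel count (e.g. after a 1x1 projection):

```python
import torch
import torch.nn as nn

class MotionFeatureHead(nn.Module):
    """Step 3: fuse backbone features with correlation features by addition
    or channel concatenation (step 3.1), then apply one convolutional layer
    with a 1x1 kernel (step 3.2) to produce the motion features."""
    def __init__(self, channels: int, mode: str = "add"):
        super().__init__()
        self.mode = mode
        in_ch = channels if mode == "add" else 2 * channels
        self.conv1x1 = nn.Conv2d(in_ch, channels, kernel_size=1)

    def forward(self, video_feat, corr_feat):
        if self.mode == "add":                      # addition variant
            merged = video_feat + corr_feat
        else:                                       # concatenation variant
            merged = torch.cat([video_feat, corr_feat], dim=1)
        return self.conv1x1(merged)                 # motion features
```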

Claims (6)

1. A motion feature extraction method based on perception of correlation information between adjacent video frames, characterized by comprising the following steps:
(1) extracting features of a target video using a neural network;
(2) explicitly extracting correlation information features between adjacent frame features from the features extracted in step 1;
(3) combining the target video features extracted in step 1 with the correlation information features extracted in step 2, and extracting the motion features of the target video using a neural network.
2. The motion feature extraction method according to claim 1, characterized in that step 1 is specifically:
extracting the features of the target video from the input target video using a two-dimensional neural network;
or extracting the features of the target video from the input target video using a three-dimensional neural network.
3. The motion feature extraction method according to claim 1, characterized in that step 2 comprises the following sub-steps:
(2.1) taking the target video features extracted in step 1 as input features and passing them through one convolutional network layer with a 1x1 convolution kernel, generating new features;
(2.2) from the above new features, generating correlation information features through convolutional network layers whose convolution kernels are Sobel operators or temporal feature differences.
4. The motion feature extraction method according to claim 3, characterized in that in step 2.2 the convolutional network layers are preferably a convolutional network layer whose convolution kernel is the x-direction Sobel operator of dimension 3x3, a convolutional network layer whose convolution kernel is the y-direction Sobel operator of dimension 3x3, and a convolutional network layer whose convolution kernel is a vector of dimension 2x1x1 with values [-1, 1].
5. The motion feature extraction method according to claim 1, characterized in that step 3 is specifically: directly adding the target video features extracted in step 1 to the correlation information features extracted in step 2 to obtain newly generated features, and passing the newly generated features through one convolutional layer with a 1x1 convolution kernel to generate the motion features of the target video.
6. The motion feature extraction method according to claim 1, characterized in that step 3 is specifically: concatenating the target video features extracted in step 1 with the correlation information features extracted in step 2 along the feature dimension to obtain newly generated features, and passing the newly generated features through one convolutional layer with a 1x1 convolution kernel to generate the motion features of the target video.
CN201910033541.7A 2019-01-14 2019-01-14 Motion feature extraction method based on perception of correlation information between adjacent video frames Pending CN109886104A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910033541.7A CN109886104A (en) 2019-01-14 2019-01-14 Motion feature extraction method based on perception of correlation information between adjacent video frames

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910033541.7A CN109886104A (en) 2019-01-14 2019-01-14 Motion feature extraction method based on perception of correlation information between adjacent video frames

Publications (1)

Publication Number Publication Date
CN109886104A true CN109886104A (en) 2019-06-14

Family

ID=66925907

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910033541.7A Pending CN109886104A (en) Motion feature extraction method based on perception of correlation information between adjacent video frames

Country Status (1)

Country Link
CN (1) CN109886104A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009121075A2 (en) * 2008-03-28 2009-10-01 Kalpaxis Alex J System, method and computer-program product for cognitive learning
CN102945268A (en) * 2012-10-25 2013-02-27 北京腾逸科技发展有限公司 Method and system for excavating comments on characteristics of product
CN107437258A (en) * 2016-05-27 2017-12-05 株式会社理光 Feature extracting method, estimation method of motion state and state estimation device
CN107292247A (en) * 2017-06-05 2017-10-24 浙江理工大学 A kind of Human bodys' response method and device based on residual error network
CN107832708A (en) * 2017-11-09 2018-03-23 云丁网络技术(北京)有限公司 A kind of human motion recognition method and device
CN109101896A (en) * 2018-07-19 2018-12-28 电子科技大学 A kind of video behavior recognition methods based on temporal-spatial fusion feature and attention mechanism

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FAN, L. et al.: "End-to-End Learning of Motion Representation for Video Understanding", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition *
SUN, SHUYANG: "Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition *
QIU, ZHAOFAN et al.: "Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks", 2017 IEEE International Conference on Computer Vision (ICCV) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114783069A (en) * 2022-06-21 2022-07-22 中山大学深圳研究院 Method, device, terminal equipment and storage medium for identifying object based on gait

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 20190614)