CN109886104A - Motion feature extraction method based on perception of correlation information between preceding and following video frames - Google Patents
Motion feature extraction method based on perception of correlation information between preceding and following video frames
- Publication number
- CN109886104A (application number CN201910033541.7A)
- Authority
- CN
- China
- Prior art keywords
- feature
- target video
- extracted
- layer
- motion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Image Analysis (AREA)
Abstract
The invention discloses a motion feature extraction method based on perception of correlation information between preceding and following video frames. The method first extracts features of a target video using a neural network method, then explicitly extracts correlation information features between the features of preceding and following frames, and finally combines the extracted target video features with the correlation information features and extracts the motion features of the target video using a neural network method. The invention explicitly extracts the correlation information between preceding and following video frames, can replace the optical flow extraction module, and can be quickly integrated with various existing motion information extraction networks, reducing the computation and time cost of optical flow extraction and improving the recognition capability of the network; the method is simple and its means are flexible.
Description
Technical field
The present invention relates to the technical field of image recognition, and in particular to a neural-network-based motion feature extraction method.
Background technique
At present, action recognition, as an important foundation of automatic video analysis, plays an important role in a series of application scenarios such as intelligent surveillance, new retail, human-computer interaction, and education and training.
For example, in a security surveillance scenario, reliably recognizing abnormal behaviors such as theft, lock picking, and fighting can play a key role in reducing the labor cost of manual monitoring and in maintaining public safety. In the new retail domain, action recognition helps to better understand user behavior, automatically analyze customer preferences, and improve the user experience.
However, current action recognition neural networks focus mainly on traditional image recognition network methods such as Long Short-Term Memory (LSTM) networks and Temporal Segment Networks (TSN). Extracting the information between frames relies on means such as optical flow maps, but computing optical flow consumes a large amount of computing power, storage, and time, so it cannot at present be used directly in practical applications.
Summary of the invention
In view of the above-mentioned deficiencies of the prior art, the object of the present invention is to provide a motion feature extraction method based on perception of correlation information between preceding and following video frames.
The object of the invention is achieved through the following technical solution: a motion feature extraction method based on perception of correlation information between preceding and following video frames, comprising the following steps:
(1) extracting features of a target video using a neural network method;
(2) explicitly extracting, from the features extracted in step (1), correlation information features between the features of preceding and following frames;
(3) combining the target video features extracted in step (1) with the correlation information features extracted in step (2), and extracting the motion features of the target video using a neural network method.
Further, step (1) is specifically:
extracting the features of the target video from the target video input using a two-dimensional neural network;
or extracting the features of the target video from the target video input using a three-dimensional neural network.
Further, step (2) comprises the following sub-steps:
(2.1) taking the target video features extracted in step (1) as input features and passing them through a convolutional network layer with a 1x1 convolution kernel, to generate new features;
(2.2) passing the new features through convolutional network layers whose convolution kernels are Sobel operators or a temporal feature difference, to generate the correlation information features.
Further, in step (2.2), the convolutional network layers are preferably: a convolutional network layer whose convolution kernel is a 3x3 Sobel operator in the x direction, a convolutional network layer whose convolution kernel is a 3x3 Sobel operator in the y direction, and a convolutional network layer whose convolution kernel is a vector of dimension 2x1x1 with values [-1, 1].
Further, step (3) is specifically: directly adding the target video features extracted in step (1) to the correlation information features extracted in step (2) to obtain newly generated features, and passing the newly generated features through a convolutional neural layer with a 1x1 convolution kernel to generate the motion features of the target video; or: splicing the target video features extracted in step (1) and the correlation information features extracted in step (2) together in the feature dimension to obtain newly generated features, and passing the newly generated features through a convolutional neural layer with a 1x1 convolution kernel to generate the motion features of the target video.
The invention has the following beneficial effects: the invention explicitly extracts the correlation information between preceding and following video frames, can replace the optical flow extraction module, and can be quickly integrated with various existing motion information extraction networks, thereby reducing the computation and time cost of optical flow extraction and improving the recognition capability of the network; the method is simple and its means are flexible.
Description of the drawings
Fig. 1 is a usage flow diagram of the module for perceiving correlation information between preceding and following video frames proposed by the present invention.
Fig. 2 is the network structure of a first possible embodiment of step (2) of the present invention.
Fig. 3 is the network structure of a second possible embodiment of step (2) of the present invention.
Fig. 4 is the network structure of a third possible embodiment of step (2) of the present invention.
Fig. 5 is the network structure of a fourth possible embodiment of step (2) of the present invention.
Specific embodiments
The present invention is described in detail below with reference to the drawings.
The motion feature extraction method based on perception of correlation information between preceding and following video frames according to the present invention is shown in Fig. 1. The method can be inserted at the junctions between network modules: for a ResNet it can be inserted between two residual modules (residual blocks), and for GoogleNet it can be inserted between two Inception modules, and so on, for example as sketched below.
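To illustrate where such a module fits, the following is a minimal PyTorch sketch, assuming a torchvision ResNet-50 backbone; the class and argument names (ResNetWithCorrelation, correlation_module) are illustrative placeholders rather than part of the invention.

```python
import torch.nn as nn
import torchvision.models as models

class ResNetWithCorrelation(nn.Module):
    """Hypothetical wrapper: insert a correlation-perception module between
    two residual stages of a standard ResNet-50 backbone."""

    def __init__(self, correlation_module):
        super().__init__()
        backbone = models.resnet50(weights=None)
        # Stem plus the first two residual stages.
        self.front = nn.Sequential(
            backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
            backbone.layer1, backbone.layer2,
        )
        self.correlation = correlation_module   # inserted between residual blocks
        # Remaining residual stages.
        self.back = nn.Sequential(backbone.layer3, backbone.layer4)

    def forward(self, x):
        x = self.front(x)
        x = self.correlation(x)   # inject inter-frame correlation features
        return self.back(x)
```

The same pattern applies to other backbones: the correlation module simply consumes and returns a feature map at some intermediate junction of the network.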
The motion feature extraction method based on perception of correlation information between preceding and following video frames according to the present invention comprises the following steps:
(1) extracting the features of the target video using a neural network method.
In this step, the target video is either split into individual image frames or continuously sampled.
When the target video is split into individual image frames, the individual frames are taken as input and the features of the target video are extracted with a two-dimensional neural network such as GoogleNet, VGG, or ResNet.
When the target video is continuously sampled, the sampled frames are spliced into a continuous clip as input and the features of the target video are extracted with a three-dimensional neural network such as C3D or Inflated ResNet, as in the sketch below.
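A minimal sketch of this step in PyTorch, assuming the video has already been decoded into a tensor of sampled frames; the backbone choice and tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Step (1), 2D variant: per-frame feature maps from a ResNet-18 backbone.
backbone = models.resnet18(weights=None)
feature_extractor = nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool + fc

frames = torch.randn(8, 3, 224, 224)      # 8 frames split from the target video (dummy data)
with torch.no_grad():
    features = feature_extractor(frames)   # shape (T, C, H', W'), here (8, 512, 7, 7)
```

For the continuous-sampling variant, a 3D backbone such as torchvision.models.video.r3d_18 can be applied in the same way to a clip tensor of shape (batch, 3, T, H, W).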
(2) explicitly extracting, from the features extracted in step (1), the correlation information features between the features of preceding and following frames.
(2.1) taking the target video features extracted in step (1) as input features and passing them through a convolutional network layer with a 1x1 convolution kernel, to generate new features;
(2.2) passing the new features through convolutional network layers whose convolution kernels are Sobel operators or a temporal feature difference, to generate the correlation information features. Figs. 2-5 show four concrete processing procedures for this step; the fixed convolution kernels they share are sketched below.
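The following is a minimal sketch of the fixed convolution kernels used in steps (2.1) and (2.2), assuming the step-(1) features are arranged as a (T, C, H, W) tensor (frames, channels, height, width); this layout and the helper names are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Step (2.1): a learnable 1x1 convolution that produces the "new features".
def make_reduce(channels):
    return nn.Conv2d(channels, channels, kernel_size=1)

# Step (2.2): fixed kernels, i.e. 3x3 Sobel operators and the 2x1x1 kernel [-1, 1].
sobel_x = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]])
sobel_y = sobel_x.t()
temporal_kernel = torch.tensor([-1., 1.])

def spatial_sobel(feat, kernel):
    """Apply a 3x3 Sobel kernel to each channel of (T, C, H, W) features."""
    t, c, h, w = feat.shape
    k = kernel.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    return F.conv2d(feat, k, padding=1, groups=c)          # (T, C, H, W)

def temporal_difference(feat):
    """Difference of consecutive frames: the [-1, 1] kernel slid along time."""
    x = feat.permute(1, 0, 2, 3).unsqueeze(0)              # (1, C, T, H, W)
    c = x.shape[1]
    k = temporal_kernel.view(1, 1, 2, 1, 1).repeat(c, 1, 1, 1, 1)
    out = F.conv3d(x, k, groups=c)                          # (1, C, T-1, H, W)
    return out.squeeze(0).permute(1, 0, 2, 3)               # (T-1, C, H, W)
```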
As shown in Fig. 2, the specific steps of step (2.2) are as follows:
(2.2.1) passing the new features extracted in (2.1) through a convolutional network layer whose convolution kernel is a 3x3 Sobel operator in the x direction, to obtain feature vectors in the x direction;
(2.2.2) passing the new features extracted in (2.1) through a convolutional network layer whose convolution kernel is a 3x3 Sobel operator in the y direction, to obtain feature vectors in the y direction;
(2.2.3) passing the new features extracted in (2.1) through a convolutional network layer whose convolution kernel is a vector of dimension 2x1x1 with values [-1, 1], to obtain feature vectors in the time direction;
(2.2.4) splicing the x-direction features extracted in (2.2.1), the y-direction features extracted in (2.2.2), and the time-direction features extracted in (2.2.3) together in the feature dimension, to obtain the correlation features between video frames. A sketch of this variant is given below.
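A minimal sketch of this first variant, reusing the helpers from the previous sketch; the module name and the padding of the temporal branch (so that all branches keep the same number of frames) are assumptions, since the text does not specify the alignment.

```python
import torch
import torch.nn as nn

# Assumes spatial_sobel, temporal_difference, sobel_x, sobel_y from the earlier sketch.
class CorrelationVariantFig2(nn.Module):
    """Hypothetical Fig. 2 variant: x-Sobel, y-Sobel and temporal-difference
    branches applied in parallel, then spliced in the feature dimension."""

    def __init__(self, channels):
        super().__init__()
        self.reduce = nn.Conv2d(channels, channels, kernel_size=1)   # step (2.1)

    def forward(self, feat):                     # feat: (T, C, H, W)
        f = self.reduce(feat)
        fx = spatial_sobel(f, sobel_x)           # (2.2.1) x-direction features
        fy = spatial_sobel(f, sobel_y)           # (2.2.2) y-direction features
        ft = temporal_difference(f)              # (2.2.3) time-direction features, (T-1, C, H, W)
        ft = torch.cat([ft, ft[-1:]], dim=0)     # repeat last step so T matches (assumption)
        return torch.cat([fx, fy, ft], dim=1)    # (2.2.4) splice: (T, 3C, H, W)
```

The variants of Figs. 3-5 below differ only in the order in which the Sobel and temporal-difference convolutions are applied and spliced.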
As shown in Fig. 3, the specific steps of step (2.2) are as follows:
(2.2.1) passing the new features extracted in (2.1) through a convolutional network layer whose convolution kernel is a 3x3 Sobel operator in the x direction, to obtain feature vectors in the x direction;
(2.2.2) passing the new features extracted in (2.1) through a convolutional network layer whose convolution kernel is a 3x3 Sobel operator in the y direction, to obtain feature vectors in the y direction;
(2.2.3) splicing the x-direction features extracted in (2.2.1) and the y-direction features extracted in (2.2.2) together in the feature dimension, to obtain the correlation features in video space;
(2.2.4) passing the video-space correlation features obtained in (2.2.3) through a convolutional network layer whose convolution kernel is a vector of dimension 2x1x1 with values [-1, 1], to obtain the correlation features between video frames.
As shown in Fig. 4, the specific steps of step (2.2) are as follows:
(2.2.1) passing the new features extracted in (2.1) through a convolutional network layer whose convolution kernel is a 3x3 Sobel operator in the x direction, to obtain feature vectors in the x direction;
(2.2.2) passing the new features extracted in (2.1) through a convolutional network layer whose convolution kernel is a 3x3 Sobel operator in the y direction, to obtain feature vectors in the y direction;
(2.2.3) splicing the x-direction features extracted in (2.2.1) and the y-direction features extracted in (2.2.2) together in the feature dimension, to obtain the correlation features in video space;
(2.2.4) passing the video-space correlation features obtained in (2.2.3) through a convolutional network layer whose convolution kernel is a vector of dimension 2x1x1 with values [-1, 1], to obtain feature vectors in the time direction;
(2.2.5) splicing the video-space correlation features obtained in (2.2.3) and the time-direction feature vectors extracted in (2.2.4) together in the feature dimension, to obtain the correlation features between video frames.
As shown in Fig. 5, the specific steps of step (2.2) are as follows:
(2.2.1) passing the new features extracted in (2.1) through a convolutional network layer whose convolution kernel is a vector of dimension 2x1x1 with values [-1, 1], to obtain feature vectors in the time direction;
(2.2.2) passing the time-direction feature vectors extracted in (2.2.1) through a convolutional network layer whose convolution kernel is a 3x3 Sobel operator in the x direction, to obtain feature vectors in the x direction;
(2.2.3) passing the time-direction feature vectors extracted in (2.2.1) through a convolutional network layer whose convolution kernel is a 3x3 Sobel operator in the y direction, to obtain feature vectors in the y direction;
(2.2.4) splicing the x-direction features extracted in (2.2.2) and the y-direction features extracted in (2.2.3) together in the feature dimension, to obtain the correlation features between video frames.
(3) combining the target video features extracted in step (1) with the correlation information features extracted in step (2), and extracting the motion features of the target video using a neural network method.
(3.1) directly adding the target video features extracted in step (1) to the correlation information features extracted in step (2) to obtain newly generated features; or splicing the target video features extracted in step (1) and the correlation information features extracted in step (2) together in the feature dimension to obtain newly generated features;
(3.2) passing the newly generated features of step (3.1) through a convolutional neural layer with a 1x1 convolution kernel, to generate the motion features of the target video. A sketch of this fusion step is given below.
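A minimal sketch of step (3), assuming the step-(1) features and the step-(2) correlation features share the same spatial size; the fusion mode flag, class name, and channel counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MotionFeatureHead(nn.Module):
    """Hypothetical step-(3) fusion: add or splice, then a 1x1 convolution."""

    def __init__(self, feat_channels, corr_channels, out_channels, mode="concat"):
        super().__init__()
        assert mode in ("concat", "add")
        self.mode = mode
        in_channels = feat_channels + corr_channels if mode == "concat" else feat_channels
        self.fuse = nn.Conv2d(in_channels, out_channels, kernel_size=1)   # step (3.2)

    def forward(self, feat, corr):               # both (T, C, H, W)
        if self.mode == "concat":
            x = torch.cat([feat, corr], dim=1)   # splice in the feature dimension
        else:
            x = feat + corr                      # direct addition (requires equal channel counts)
        return self.fuse(x)                      # motion features of the target video
```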
Claims (6)
1. A motion feature extraction method based on perception of correlation information between preceding and following video frames, characterized by comprising the following steps:
(1) extracting features of a target video using a neural network method;
(2) explicitly extracting, from the features extracted in step (1), correlation information features between the features of preceding and following frames;
(3) combining the target video features extracted in step (1) with the correlation information features extracted in step (2), and extracting the motion features of the target video using a neural network method.
2. The motion feature extraction method according to claim 1, characterized in that step (1) is specifically:
extracting the features of the target video from the target video input using a two-dimensional neural network;
or extracting the features of the target video from the target video input using a three-dimensional neural network.
3. The motion feature extraction method according to claim 1, characterized in that step (2) comprises the following sub-steps:
(2.1) taking the target video features extracted in step (1) as input features and passing them through a convolutional network layer with a 1x1 convolution kernel, to generate new features;
(2.2) passing the new features through convolutional network layers whose convolution kernels are Sobel operators or a temporal feature difference, to generate the correlation information features.
4. The motion feature extraction method according to claim 3, characterized in that in step (2.2), the convolutional network layers are preferably: a convolutional network layer whose convolution kernel is a 3x3 Sobel operator in the x direction, a convolutional network layer whose convolution kernel is a 3x3 Sobel operator in the y direction, and a convolutional network layer whose convolution kernel is a vector of dimension 2x1x1 with values [-1, 1].
5. The motion feature extraction method according to claim 1, characterized in that step (3) is specifically: directly adding the target video features extracted in step (1) to the correlation information features extracted in step (2) to obtain newly generated features, and passing the newly generated features through a convolutional neural layer with a 1x1 convolution kernel to generate the motion features of the target video.
6. The motion feature extraction method according to claim 1, characterized in that step (3) is specifically: splicing the target video features extracted in step (1) and the correlation information features extracted in step (2) together in the feature dimension to obtain newly generated features, and passing the newly generated features through a convolutional neural layer with a 1x1 convolution kernel to generate the motion features of the target video.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910033541.7A CN109886104A (en) | 2019-01-14 | 2019-01-14 | A kind of motion feature extracting method based on the perception of video before and after frames relevant information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910033541.7A CN109886104A (en) | 2019-01-14 | 2019-01-14 | A kind of motion feature extracting method based on the perception of video before and after frames relevant information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109886104A true CN109886104A (en) | 2019-06-14 |
Family
ID=66925907
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910033541.7A Pending CN109886104A (en) | 2019-01-14 | 2019-01-14 | A kind of motion feature extracting method based on the perception of video before and after frames relevant information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109886104A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009121075A2 (en) * | 2008-03-28 | 2009-10-01 | Kalpaxis Alex J | System, method and computer-program product for cognitive learning |
CN102945268A (en) * | 2012-10-25 | 2013-02-27 | 北京腾逸科技发展有限公司 | Method and system for excavating comments on characteristics of product |
CN107437258A (en) * | 2016-05-27 | 2017-12-05 | 株式会社理光 | Feature extracting method, estimation method of motion state and state estimation device |
CN107292247A (en) * | 2017-06-05 | 2017-10-24 | 浙江理工大学 | A kind of Human bodys' response method and device based on residual error network |
CN107832708A (en) * | 2017-11-09 | 2018-03-23 | 云丁网络技术(北京)有限公司 | A kind of human motion recognition method and device |
CN109101896A (en) * | 2018-07-19 | 2018-12-28 | 电子科技大学 | A kind of video behavior recognition methods based on temporal-spatial fusion feature and attention mechanism |
Non-Patent Citations (3)
Title |
---|
FAN, LJ et al.: "End-to-End Learning of Motion Representation for Video Understanding", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition *
SHUYANG SUN: "Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition *
ZHAOFAN QIU et al.: "Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks", 2017 IEEE International Conference on Computer Vision (ICCV) *
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114783069A (en) * | 2022-06-21 | 2022-07-22 | 中山大学深圳研究院 | Method, device, terminal equipment and storage medium for identifying object based on gait |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Rössler et al. | Faceforensics: A large-scale video dataset for forgery detection in human faces | |
CN108846365B (en) | Detection method and device for fighting behavior in video, storage medium and processor | |
Johnston et al. | A review of digital video tampering: From simple editing to full synthesis | |
KR102106135B1 (en) | Apparatus and method for providing application service by using action recognition | |
CN113179368A (en) | Data processing method and device for vehicle damage assessment, processing equipment and client | |
CN106254933A (en) | Subtitle extraction method and device | |
CN102088597B (en) | Method for estimating video visual salience through dynamic and static combination | |
CN110427859A (en) | A kind of method for detecting human face, device, electronic equipment and storage medium | |
CN110297943A (en) | Adding method, device, electronic equipment and the storage medium of label | |
CN109712144A (en) | Processing method, training method, equipment and the storage medium of face-image | |
CN106157363A (en) | A kind of photographic method based on augmented reality, device and mobile terminal | |
CN113365147A (en) | Video editing method, device, equipment and storage medium based on music card point | |
CN110476141A (en) | Sight tracing and user terminal for executing this method | |
CN109635822B (en) | Stereoscopic image visual saliency extraction method based on deep learning coding and decoding network | |
CN110503076A (en) | Video classification methods, device, equipment and medium based on artificial intelligence | |
CN111539290A (en) | Video motion recognition method and device, electronic equipment and storage medium | |
Karaman et al. | Human daily activities indexing in videos from wearable cameras for monitoring of patients with dementia diseases | |
CN103020606A (en) | Pedestrian detection method based on spatio-temporal context information | |
CN104036243A (en) | Behavior recognition method based on light stream information | |
CN103777757A (en) | System for placing virtual object in augmented reality by combining with significance detection | |
CN103327359A (en) | Video significance region searching method applied to video quality evaluation | |
CN114926734B (en) | Solid waste detection device and method based on feature aggregation and attention fusion | |
CN105979283A (en) | Video transcoding method and device | |
Hafiz et al. | Foreground segmentation-based human detection with shadow removal | |
CN109886104A (en) | A kind of motion feature extracting method based on the perception of video before and after frames relevant information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190614 |