CN109344764A - System and device for measuring the difference between consecutive video frames and their convolutional feature maps - Google Patents
System and device for measuring the difference between consecutive video frames and their convolutional feature maps
- Publication number
- CN109344764A CN109344764A CN201811138401.8A CN201811138401A CN109344764A CN 109344764 A CN109344764 A CN 109344764A CN 201811138401 A CN201811138401 A CN 201811138401A CN 109344764 A CN109344764 A CN 109344764A
- Authority
- CN
- China
- Prior art keywords
- difference
- temporal information
- feature map
- convolution
- video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
Abstract
A system and device for measuring the difference between consecutive video frames and their convolutional feature maps, belonging to the field of video understanding in computer vision applications. The invention addresses the problem of enlarging the types of information a convolutional neural network can obtain, so as to improve its ability to understand video data. The system comprises a camera and a computer. The camera shoots the video; the computer stores a plurality of instructions suitable for a processor to load and execute: computing, from the consecutive frame data of the video shot by the camera and the corresponding convolutional feature maps, the difference in temporal information between the two; and using that difference as part of the loss function of the convolutional neural network, so that it participates in the gradient-descent process of backpropagation and the gradient parameters of the convolution kernels are updated toward retaining the temporal information of the input data.
Description
Technical field
The invention belongs to the field of video understanding in computer vision applications, and specifically relates to a method, system and device for measuring the difference between consecutive video frames and their convolutional feature maps.
Background art
While deep learning uses models built on neural network structures to realize end-to-end applications, the capacity of these models to store the key information contained in huge datasets ensures their reliability, giving deep learning an advantage that traditional algorithms cannot match. In only a few years, numerous researchers in the image, speech and text fields have achieved significant progress with it.
For single-frame-image tasks in computer vision such as target detection, target classification, target recognition and target segmentation, deep learning can produce models whose accuracy meets practical deployment requirements. Faster-RCNN, the basic computational structure of many current target detection algorithms, uses a dual structure of region proposals and convolutional feature extraction: during detection the two parts feed back into each other, jointly associating the confidence of the proposal windows, the convolutional feature weights and the accuracy of the final detection output, so that the fitting degree improves jointly during forward and backward propagation, finally reaching excellent results. Deep residual neural networks show good performance in many computer vision applications: by introducing shortcut layers in stages to handle the information exchanged between neurons, they make forward propagation through the network very smooth, effectively solving the vanishing- and exploding-gradient problems of deep neural networks. OSVOS (One Shot Video Object Segmentation), a classic target segmentation algorithm, uses a two-branch computation that extracts the foreground and the contours of an image and takes the contour regions whose overlap with the foreground mask exceeds a certain degree as the final segmentation result, achieving robust target segmentation.
As the technologies for single-frame-image applications mature, the need to understand the logical information between consecutive image frames, i.e. research on understanding the temporal information of consecutive video frames, has also been raised. In the research direction of classifying pedestrian actions in video, the two most important technical means are two-stream networks that use optical-flow information and 3D convolutional neural networks. A two-stream network takes the RGB images and the optical-flow images of the video frames as inputs to two networks, trains both models, and fuses the judgment information they output to obtain the final pedestrian action classification. A 3D convolutional neural network processes multiple consecutive frames with 3-dimensional convolution kernels, retaining the temporal information of consecutive frames and thereby obtaining reliable classification results. However, since the video-understanding direction has not been developing for long, the accuracy achieved in practical application scenarios remains unsatisfactory. More and more researchers believe that existing methods cannot accurately extract the temporal information of consecutive video frames, so that model accuracy falls short of application demands and the original methods need further improvement.
Summary of the invention
In order to enlarge the types of information a convolutional neural network can obtain, and thereby improve its ability to understand video data, the present invention proposes the following technical solution: a system for measuring the difference between consecutive video frames and their convolutional feature maps, comprising a camera and a computer. The camera shoots the video; the computer stores a plurality of instructions suitable for a processor to load and execute:
computing, from the consecutive frame data of the video shot by the camera and the corresponding convolutional feature maps, the difference in temporal information between the two;
using the difference in temporal information as part of the loss function of the convolutional neural network, so that it participates in the gradient-descent process of backpropagation and the gradient parameters of the convolution kernels are updated toward retaining the temporal information of the input data.
Further, the computer obtains the difference in temporal information as follows:
Step 1: convert the video into images, obtaining n video frames; take all original video frames x_i and the corresponding convolutional feature maps x_i^c, where i is the frame index; divide the original frames and the feature maps into two sets, and within each set treat every pair of adjacent images as one temporal-information element to be calculated;
Step 2: zero-pad (to raise the dimension) or remove zeros (to reduce the dimension) of data whose dimensions differ, so that all data share the same dimension, obtaining the second original-image set P'_{n-1} and the second feature-map set Q'_{n-1};
Step 3: apply a space-mapping calculation to all data in P'_{n-1} and Q'_{n-1} and average, obtaining the third original-image set f(x) and the third feature-map set f(x^c), where f is a continuous function on the reproducing kernel Hilbert space into which the data are mapped;
Step 4: for f(x) and f(x^c), separately compute the difference of the mapped data of each same-dimension temporal-information element in each set, sum the differences and take the average; then subtract the two means and square, obtaining the maximum mean discrepancy of the temporal information;
Step 5: use the numerical value of the maximum mean discrepancy of the temporal information as part of the loss function of the convolutional neural network model, so that it participates in the gradient-descent process of backpropagation and the gradient parameters of the convolution kernels are updated toward retaining the temporal information of the input data.
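The five steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the claimed implementation: the flattening, the zero-padding scheme, and the identity stand-in for the RKHS mapping f are choices made here for demonstration only.

```python
import numpy as np

def temporal_mmd(frames, feature_maps, feature_fn=None):
    """Sketch of the temporal-information discrepancy between consecutive
    frames and their convolutional feature maps (steps 1-4).

    frames:       array of shape (n, H, W)  - original video frames
    feature_maps: array of shape (n, h, w)  - corresponding feature maps
    feature_fn:   stand-in for the RKHS mapping f (identity by default)
    """
    if feature_fn is None:
        feature_fn = lambda a: a  # assumed placeholder for the mapping f

    # Step 2: flatten and zero-pad so both sets share one dimension.
    size = max(frames[0].size, feature_maps[0].size)
    def flatten_pad(a):
        flat = a.reshape(a.shape[0], -1)
        out = np.zeros((a.shape[0], size))
        out[:, :flat.shape[1]] = flat
        return out

    # Step 3: map both dimension-aligned sets.
    p = feature_fn(flatten_pad(frames))
    q = feature_fn(flatten_pad(feature_maps))

    # Step 4: average the adjacent-pair differences per set,
    # then square the gap between the two means.
    mean_p = np.mean(p[1:] - p[:-1], axis=0)
    mean_q = np.mean(q[1:] - q[:-1], axis=0)
    return float(np.sum((mean_p - mean_q) ** 2))
```

With the identity mapping, a set compared against itself yields a discrepancy of exactly zero, matching the intuition that identical temporal information should produce no penalty.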
Further, the first original-image set is expressed as:
P_{n-1} = { [x_1, x_2], [x_2, x_3], [x_3, x_4] ... [x_{n-1}, x_n] }
The first feature-map set is expressed as:
Q_{n-1} = { [x_1^c, x_2^c], [x_2^c, x_3^c], [x_3^c, x_4^c] ... [x_{n-1}^c, x_n^c] }
The maximum mean discrepancy of the temporal information is expressed by the formula:
MMD = [ (1/(n-1)) Σ_{i=1}^{n-1} ( f(x_{i+1}) - f(x_i) ) - (1/(n-1)) Σ_{i=1}^{n-1} ( f(x_{i+1}^c) - f(x_i^c) ) ]^2
Further, the reproducing kernel Hilbert space is an inner product space with completeness whose basic data are constituted by a reproducing kernel function. Completeness means that the limit of any function in the space cannot leave the range of the space; an inner product space is a space of any dimension in which an inner product can be taken between arbitrary data and in which the data satisfy conjugate symmetry, linearity and positive definiteness. Any space meeting these two conditions is called a Hilbert space. A reproducing kernel function is a kernel function that possesses eigenvalues and eigenfunctions, with all eigenfunctions pairwise orthogonal in an infinite-dimensional space.
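For concreteness, the kernel side of this construction can be illustrated with a Gaussian kernel — an assumed choice, since the text does not fix a particular reproducing kernel — whose RKHS satisfies the completeness and inner-product conditions described above. The squared maximum mean discrepancy between two sample sets can then be estimated without ever forming the mapping f explicitly:

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian RBF kernel, a standard reproducing kernel whose RKHS has
    the completeness and inner-product properties described above."""
    d = a[:, None, :] - b[None, :, :]
    return np.exp(-np.sum(d ** 2, axis=-1) / (2.0 * sigma ** 2))

def mmd_squared(x, y, sigma=1.0):
    """Kernel-trick estimate of the squared maximum mean discrepancy
    between two sample sets, computed without forming f explicitly."""
    kxx = gaussian_kernel(x, x, sigma).mean()
    kyy = gaussian_kernel(y, y, sigma).mean()
    kxy = gaussian_kernel(x, y, sigma).mean()
    return kxx + kyy - 2.0 * kxy
```

Two identical sample sets give a discrepancy of zero, while well-separated sets give a strictly positive value, which is the behavior the loss term below relies on.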
Further, the computer updates the gradient parameters of the convolution kernels toward retaining the temporal information of the input data as follows: when updating the gradient of each convolution kernel, the convolutional neural network uses not only the magnitude of the difference between the output value and the true value, but also the maximum mean discrepancy as a basis for the gradient update, so that the gradient parameter of each convolution kernel is updated in the direction that reduces the maximum mean discrepancy. Reducing the maximum mean discrepancy of the two groups of temporal information means that, as the gradient descends, the similarity of the two groups of temporal information tends to increase, guaranteeing that the convolutional neural network better retains the temporal information of the original data.
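This update rule can be sketched on a toy problem. Every quantity here is invented for demonstration — a scalar "convolution weight" w, feature maps modeled as w times the frame intensities, and an arbitrary stand-in task loss — but it shows how gradient descent on a loss that includes the MMD term drives the weight toward values that shrink the temporal-information discrepancy:

```python
import numpy as np

frames = np.array([0.0, 1.0, 2.0, 3.0])   # toy per-frame intensities (assumed)
m = float(np.mean(np.diff(frames)))       # mean temporal difference of the frames

def mmd_term(w):
    # discrepancy between frame differences and feature-map (w * frames) differences
    return (m - w * m) ** 2

def loss(w, lam=1.0):
    task = (w - 2.0) ** 2                 # stand-in for the output-vs-label loss
    return task + lam * mmd_term(w)       # MMD term made part of the loss function

def grad(w, lam=1.0, eps=1e-6):
    # numeric gradient of the combined loss
    return (loss(w + eps, lam) - loss(w - eps, lam)) / (2 * eps)

w = 5.0
for _ in range(200):
    w -= 0.1 * grad(w)                    # gradient descent on the combined loss
```

Starting from w = 5, the updates settle near w = 1.5, a compromise between the task optimum (w = 2) and the zero-discrepancy point (w = 1); the MMD term falls from 16 to 0.25, i.e. the kernel is pulled toward preserving the temporal information.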
The invention further relates to a device for measuring the difference between consecutive video frames and their convolutional feature maps, comprising:
a temporal-information difference obtaining module, which computes, from the consecutive frame data of the video shot by the camera and the corresponding convolutional feature maps, the difference in temporal information between the two; and
a parameter updating module, which uses the difference in temporal information as part of the loss function of the convolutional neural network, so that it participates in the gradient-descent process of backpropagation and the gradient parameters of the convolution kernels are updated toward retaining the temporal information of the input data.
Further, the temporal-information difference obtaining module obtains the difference in temporal information as follows:
convert the video into images, obtaining n video frames; take all original video frames x_i and the corresponding convolutional feature maps x_i^c, where i is the frame index; divide the original frames and the feature maps into two sets, and within each set treat every pair of adjacent images as one temporal-information element to be calculated;
zero-pad (to raise the dimension) or remove zeros (to reduce the dimension) of data whose dimensions differ, so that all data share the same dimension, obtaining the second original-image set P'_{n-1} and the second feature-map set Q'_{n-1};
apply a space-mapping calculation to all data in P'_{n-1} and Q'_{n-1} and average, obtaining the third original-image set f(x) and the third feature-map set f(x^c), where f is a continuous function on the reproducing kernel Hilbert space into which the data are mapped;
for f(x) and f(x^c), separately compute the difference of the mapped data of each same-dimension temporal-information element in each set, sum the differences and take the average; then subtract the two means and square, obtaining the maximum mean discrepancy of the temporal information.
Further, the parameter updating module performs the update as follows: the numerical value of the maximum mean discrepancy of the temporal information is used as part of the loss function of the convolutional neural network model, participating in the gradient-descent process of backpropagation, so that the gradient parameters of the convolution kernels are updated toward retaining the temporal information of the input data.
Further, the first original-image set is expressed as:
P_{n-1} = { [x_1, x_2], [x_2, x_3], [x_3, x_4] ... [x_{n-1}, x_n] }
The first feature-map set is expressed as:
Q_{n-1} = { [x_1^c, x_2^c], [x_2^c, x_3^c], [x_3^c, x_4^c] ... [x_{n-1}^c, x_n^c] }
The maximum mean discrepancy of the temporal information is expressed by the formula:
MMD = [ (1/(n-1)) Σ_{i=1}^{n-1} ( f(x_{i+1}) - f(x_i) ) - (1/(n-1)) Σ_{i=1}^{n-1} ( f(x_{i+1}^c) - f(x_i^c) ) ]^2
Further, the reproducing kernel Hilbert space is an inner product space with completeness whose basic data are constituted by a reproducing kernel function. Completeness means that the limit of any function in the space cannot leave the range of the space; an inner product space is a space of any dimension in which an inner product can be taken between arbitrary data and in which the data satisfy conjugate symmetry, linearity and positive definiteness. Any space meeting these two conditions is called a Hilbert space. A reproducing kernel function is a kernel function that possesses eigenvalues and eigenfunctions, with all eigenfunctions pairwise orthogonal in an infinite-dimensional space.
The parameter updating module specifically performs the update as follows: the gradient-descent process itself derives and updates the gradient of each neuron in the convolutional neural network according to the value of the loss function, taking the reduction of that value as the final purpose of backpropagation. With the maximum mean discrepancy of the temporal information also made part of the loss function, backpropagation no longer reduces only the loss determined by the difference between the output value and the true value of the original convolutional neural network; it also reduces the maximum mean discrepancy of the temporal information, i.e. the difference in temporal information between the original consecutive input frames and their convolutional feature maps, so that the gradient parameters of the convolution kernels are updated toward retaining the temporal information of the input data.
The beneficial effects of the invention are as follows:
(1) A reliable temporal-information difference is obtained with the temporal-information discrepancy method and can be put to good use during neural network training. Enriching the gradient information of the network with the temporal-information difference between the original consecutive input frames and their convolutional feature maps makes the training of the network model more reliable and finally improves the model's ability to understand the temporal information of the input data. The temporal-information difference parameter is made part of the loss function so that it participates in the gradient-descent process of backpropagation. Since gradient descent derives and updates the gradient of each neuron according to the value of the loss function, taking the reduction of that value as the final purpose of backpropagation, also including the temporal-information difference parameter in the loss function means that backpropagation not only reduces the loss determined by the difference between the output value and the true value of the original network, but also reduces the temporal-information difference parameter, i.e. the difference in temporal information between the original consecutive input frames and their convolutional feature maps, so that the gradient parameters of the convolution kernels are updated toward retaining the temporal information of the input data.
(2) The temporal-information discrepancy method relies on the complete inner product space possessed by the reproducing kernel Hilbert space, so mapping the information into that space keeps the properties of the original data intact. This guarantees that the data computed by the method are reliable enough to effectively reflect the temporal-information difference between consecutive video frames and their convolutional feature maps. At the same time, the mapping space itself has a steady regularity, ensuring that the method is sufficiently continuous: as the input dataset grows, the method converges rapidly to its expected value.
(3) The feature computation of existing common convolutional neural networks focuses only on scene information and cannot make good use of temporal information. By combining the convolutional neural network with the temporal-information discrepancy method, this method lets the network acquire the temporal-information difference between consecutive video frames and their convolutional feature maps, enlarging the types of information the network can obtain and thereby improving its understanding of video data. Measuring this difference and letting it participate in backpropagation improves the model's understanding of the temporal information of consecutive video frames and, at the same time, the accuracy of the network model in many video-understanding applications: for example, raising the correctness of video action classification, improving the accuracy of video behavior recognition, and guaranteeing effective output of abnormal-behavior detection in surveillance video. It can further provide auxiliary functions in other application scenarios, such as supplying reliable temporal-information differences in classic applications for aerial-photography video, improving the understanding of non-static objects in the obstacle-detection part of an autonomous vehicle's visual system, and increasing the understanding of real-time road-condition temporal-information differences over different time periods, providing effective help to subsequent autonomous-vehicle operations such as anticipation and path planning.
(4) Since the computational logic of this method mainly measures the difference between different data, applying suitable cross-domain conversion to different input data allows the method to be used not only to measure the temporal-information difference between consecutive video frames and their convolutional feature maps, but also to assist related application tasks on continuous speech information: for example, extracting and comparing speech data of different regional dialects or even different languages to obtain the pitch and syntactic-structure differences between languages, giving a neural network the ability to judge language type from speech data. It can likewise assist related application tasks on continuous text information: comparing the differences between texts of different types, giving a neural network the ability to judge the genre of a text. Other related applications that use the difference information of other data types are possible as well, so the method possesses good cross-domain generalization.
Brief description of the drawings
Fig. 1 is a schematic diagram of this method processing one group of consecutive video frames and their convolutional feature maps
Fig. 2 shows the two original consecutive video frames of Embodiment 1
Fig. 3 shows the convolutional feature maps corresponding to the two original consecutive video frames of Embodiment 1
Fig. 4 shows the quantized temporal-information difference distance obtained in Embodiment 1
Fig. 5 shows the two original consecutive video frames of Embodiment 2
Fig. 6 shows the convolutional feature maps corresponding to the two original consecutive video frames of Embodiment 2
Fig. 7 shows the quantized temporal-information difference distance obtained in Embodiment 2
Fig. 8 shows the two original consecutive video frames of Embodiment 3
Fig. 9 shows the convolutional feature maps corresponding to the two original consecutive video frames of Embodiment 3
Fig. 10 shows the quantized temporal-information difference distance obtained in Embodiment 3
Fig. 11 shows the two original consecutive video frames of Embodiment 4
Fig. 12 shows the convolutional feature maps corresponding to the two original consecutive video frames of Embodiment 4
Fig. 13 shows the quantized temporal-information difference distance obtained in Embodiment 4
Fig. 14 shows the two original consecutive video frames of Embodiment 5
Fig. 15 shows the convolutional feature maps corresponding to the two original consecutive video frames of Embodiment 5
Fig. 16 shows the quantized temporal-information difference distance obtained in Embodiment 5
Specific embodiments
The present invention is described in further detail below through specific embodiments and with reference to the accompanying drawings:
Embodiment: in order to deepen a neural network's understanding of the temporal information in consecutive video frames, this embodiment devises, for convolutional neural networks, a method of calculating the temporal-information difference between consecutive video frames and their convolutional feature maps. The method can be realized in software as a discrepancy-measuring method that further improves a network model's ability to understand temporal information. The quantized temporal-information difference between consecutive video frames and their convolutional feature maps, obtained by the measurement, is fed back into the training process of the neural network, so that the network can apply the temporal-information difference when updating its weights, improving its understanding of the inter-frame temporal information of video.
The method of this embodiment can robustly calculate the intension of the difference distance between samples from different fields, and is innovatively incorporated into convolutional neural networks to compute the temporal-information difference between consecutive video frames and their convolutional feature maps, enabling the network to deepen its understanding of the temporal information in consecutive video frames.
Here, consecutive video frames denote two adjacent (or arbitrary) frames of the image data obtained by converting the original video into frame units. A convolutional feature map denotes the image data obtained from the original image data after a convolution operation, which has certain targeted characteristic properties compared with the original image.
Temporal information denotes the time-difference data between image frames at different moments of the same video, obtained by performing a difference operation on the consecutive frame data of the video; the temporal information of the corresponding convolutional feature maps is obtained in the same way. Since consecutive video frames and their convolutional feature maps have a source relationship, the temporal information of the two groups of data can also be regarded as directly connected, and reasonable use of this connection has certain value for the field of video understanding and its related applications.
This embodiment is achieved through the following technical solution, a method of measuring the difference between consecutive video frames and their convolutional feature maps, whose algorithm specifically comprises the following steps:
Step 1: convert the video into images, obtaining n video frames; take all original video frames x_i and the corresponding convolutional feature maps x_i^c, where i is the frame index. Divide the original images and the feature maps into two sets, and within each set treat every pair of adjacent images as one temporal-information element to be calculated. That is, the data in the original-image set P_{n-1} and the feature-map set Q_{n-1} are partitioned so that each temporal-information element of a set consists of two adjacent images of that set; for example, x_1 and x_2 form one group, x_2 and x_3 form another, and likewise for the feature-map set. The original-image set may be expressed as:
P_{n-1} = { [x_1, x_2], [x_2, x_3], [x_3, x_4] ... [x_{n-1}, x_n] }
The feature-map set may be expressed as:
Q_{n-1} = { [x_1^c, x_2^c], [x_2^c, x_3^c], [x_3^c, x_4^c] ... [x_{n-1}^c, x_n^c] }
Step 2: zero-pad (to raise the dimension) or remove zeros (to reduce the dimension) of all data of different sizes, obtaining the dimension-processed original-image set P'_{n-1} and the dimension-processed feature-map set Q'_{n-1}; all data in the two sets then have the same dimension, which facilitates the metric calculation;
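A minimal sketch of the dimension alignment of step 2. The truncation branch here is an assumed stand-in for the "zero-removing" dimensionality reduction, which the text does not specify precisely:

```python
import numpy as np

def match_dim(a, target_len):
    """Flatten a sample, then zero-pad it up to target_len or
    truncate it down, so that every element of both sets shares
    one common dimension before the metric calculation."""
    flat = np.ravel(np.asarray(a, dtype=float))
    if flat.size < target_len:
        return np.concatenate([flat, np.zeros(target_len - flat.size)])
    return flat[:target_len]
```

Applying this to every frame and every feature map with one common target length yields the sets P'_{n-1} and Q'_{n-1} whose elements can be compared term by term.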
Step 3: apply a space-mapping calculation to all data in the two dimension-processed sets and average, obtaining the mapped original-image set f(x) and the mapped feature-map set f(x^c);
here f is a continuous function on the reproducing kernel Hilbert space into which the data are mapped, and f(x) is the function result after the data are mapped;
the reproducing kernel Hilbert space is an inner product space with completeness whose basic data are constituted by a reproducing kernel function. Completeness means that the limit of any function in the space cannot leave the range of the space; an inner product space is a space of any dimension in which an inner product can be taken between arbitrary data and in which the data satisfy conjugate symmetry, linearity and positive definiteness; any space meeting these two conditions is called a Hilbert space. A reproducing kernel function is a kernel function that possesses eigenvalues and eigenfunctions, with all eigenfunctions pairwise orthogonal in an infinite-dimensional space;
Step 4: perform a difference operation on the dimension-raised, mapped data of the two images of each temporal-information element in the two sets; separately compute the difference of the mapped data of each element in each set, sum the differences and take the average, computing the average over the mapped set of P'_{n-1} and the average over the mapped set of Q'_{n-1}; then subtract the two means and square, obtaining the maximum mean discrepancy of the temporal information. It may be expressed by the formula:
MMD = [ (1/(n-1)) Σ_{i=1}^{n-1} ( f(x_{i+1}) - f(x_i) ) - (1/(n-1)) Σ_{i=1}^{n-1} ( f(x_{i+1}^c) - f(x_i^c) ) ]^2
This yields the quantized result of the temporal-information difference between the original images and the convolutional feature maps.
Step 5: use the numerical value of the maximum mean discrepancy as part of the loss function of the convolutional neural network model, participating in the gradient-descent process of backpropagation, so that the weight gradients of the network determine the descent direction not only from the magnitude of the difference between the output value and the true value but are also updated toward reducing the maximum mean discrepancy, making the weight parameters of the convolution kernels update in the direction that reduces the maximum mean discrepancy.
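Steps 3 and 4 can also be sketched with an explicit (approximate) RKHS mapping. Random Fourier features approximate the feature map of a Gaussian kernel; both that kernel choice and the feature dimension here are assumptions made for illustration, not part of the claimed method:

```python
import numpy as np

rng = np.random.default_rng(0)

class RFFMap:
    """Approximate explicit map into a Gaussian RKHS via random
    Fourier features -- one concrete realization of the mapping f."""
    def __init__(self, in_dim, out_dim=64, sigma=1.0):
        self.w = rng.normal(scale=1.0 / sigma, size=(in_dim, out_dim))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=out_dim)
        self.out_dim = out_dim

    def __call__(self, x):
        return np.sqrt(2.0 / self.out_dim) * np.cos(x @ self.w + self.b)

def temporal_mmd_rkhs(p, q, f):
    """Step 4: map both dimension-aligned sets with f, average the
    adjacent-pair differences per set, and square the gap between
    the two means."""
    mean_p = np.mean(f(p)[1:] - f(p)[:-1], axis=0)
    mean_q = np.mean(f(q)[1:] - f(q)[:-1], axis=0)
    return float(np.sum((mean_p - mean_q) ** 2))
```

As with the kernel-trick form, a set compared against itself gives zero, while a constant (temporally flat) set compared against a changing one gives a positive discrepancy that the loss term in step 5 can then penalize.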
Above-mentioned technical proposal is a kind of method of difference between measurement video successive frame and its convolution characteristic pattern, and succinct says,
It includes the following steps:
Step 1: being image by Video Quality Metric, the video frame images that sum is n are obtained, all raw video images are taken out
Frame xiAnd the corresponding convolution characteristic pattern of the picture frameWherein i represents frame number, by raw video image and convolution characteristic pattern
It is divided into two set, in each set, two adjacent images are as one group of temporal information element to be calculated in set:
First raw video image set expression are as follows:
Pn-1={ [x1,x2],[x2,x3],[x3,x4]…[xn-1,xn]}
First convolution characteristic pattern set expression are as follows:
Step 2: For data whose dimensions differ, raise the dimension by zero-padding or reduce the dimension by stripping zeros, so that all data share the same dimension, obtaining the second original video image set P'n-1 and the second convolution feature map set Q'n-1.
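The dimension alignment of Step 2 can be sketched as below. The padding side matches the description; the reduction side here simply truncates, a simplification of the "zero-stripping" reduction described, and `align_dimension` is a hypothetical helper name:

```python
import numpy as np

def align_dimension(arr, target_len):
    """Zero-pad (raise the dimension) or truncate (reduce the dimension)
    a flattened sample so that all data share the same length."""
    v = np.ravel(arr).astype(float)
    if v.size < target_len:
        return np.pad(v, (0, target_len - v.size))  # zero-padding
    return v[:target_len]                           # dropping trailing entries

frame = np.ones((4, 4))        # 16 values
feature_map = np.ones((2, 2))  # 4 values: padded up to match the frame
print(align_dimension(feature_map, frame.size).shape)  # → (16,)
```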
Step 3: Perform the space-mapping calculation on all data in the second original video image set P'n-1 and the second convolution feature map set Q'n-1 and average the results, obtaining the third original video image set f(x) and the third convolution feature map set f(x^c), where f denotes the set of continuous functions on the reproducing-kernel Hilbert space into which the data are mapped.
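The text leaves the concrete mapping f unspecified; one standard concrete choice is an approximate Gaussian-kernel RKHS mapping via random Fourier features. The kernel choice, `gamma` and `n_features` below are all assumptions for illustration:

```python
import numpy as np

def rff_map(X, n_features=64, gamma=0.5, seed=0):
    """Approximate mapping into the RKHS of a Gaussian kernel using random
    Fourier features; the text only requires some reproducing-kernel
    Hilbert space mapping f, so this is one possible instantiation."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = np.random.default_rng(1).normal(size=(5, 8))  # 5 samples of dimension 8
print(rff_map(X).shape)  # → (5, 64)
```

Inner products of these mapped vectors approximate Gaussian-kernel evaluations, which is what makes mean differences in this space meaningful.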
Step 4: For the third original video image set f(x) and the third convolution feature map set f(x^c), separately calculate, within each set, the difference of the mapped data of each group of same-dimension temporal-information elements to be calculated, sum the differences and compute the average; then take the difference of the two means and square it, obtaining the largest mean difference of the temporal information, expressed by the formula:
MMD = [ (1/(n-1)) Σ_{i=1}^{n-1} f([x_i, x_{i+1}]) − (1/(n-1)) Σ_{i=1}^{n-1} f([x_i^c, x_{i+1}^c]) ]²
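The statistic of Step 4 can be sketched as a squared difference of set means over the mapped data (`largest_mean_difference` is a hypothetical function name):

```python
import numpy as np

def largest_mean_difference(mapped_P, mapped_Q):
    """Squared difference between the mean of the mapped original-image
    groups and the mean of the mapped feature-map groups: the largest
    mean difference (a maximum-mean-discrepancy-style statistic)."""
    mu_p = np.mean(np.asarray(mapped_P, dtype=float), axis=0)
    mu_q = np.mean(np.asarray(mapped_Q, dtype=float), axis=0)
    return float(np.sum((mu_p - mu_q) ** 2))

# Identical mapped sets carry identical temporal information: difference 0.
print(largest_mean_difference([[1.0, 2.0]], [[1.0, 2.0]]))  # → 0.0
```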
In summary, this embodiment proposes a method of measuring the difference between video successive frames and their convolution feature maps: through dimension processing, space mapping and difference calculation on the continuous video frame data and its convolution-feature-map data, a quantized value of the temporal-information difference between the two groups of data is finally obtained. The temporal-information difference obtained by this method can be fed back into the training process of a convolutional neural network, improving the network's understanding of the temporal-information difference between continuous video frames and benefiting subsequent applications in the video-understanding field.
Prior-art schemes related to the present disclosure are as follows:
In 2016, the invention patent application "Video understanding method and device" (publication number CN107563257A) disclosed a method of obtaining depth scene information based on scene-depth estimation, so as to further understand and analyze the scene content; that invention mainly obtains the depth information of a scene with several different neural network structures. The difference is that the present embodiment mainly uses a calculation method directed at the temporal-information difference between the continuous original video frame data and its convolution-feature-map data, rather than obtaining the depth information of a video scene through multiple networks.
In 2017, the invention patent application "An image difference detection method based on the robust-factor method" (publication number CN107705295A) disclosed modeling and analysis of data acquired in the same scene at different times and from different viewing angles, so as to obtain robust scene information. The difference is that the present embodiment mainly measures the temporal-information difference between video successive frames and their convolution feature maps, and, by training the convolutional neural network to reduce the magnitude of that temporal-information difference, strengthens the network's grasp of temporal information and increases the network model's ability to understand video temporal information; it does not analyze data of the same scene under different conditions to obtain difference information.
In 2017, the invention patent application "A video understanding method based on deep learning" (publication number CN107909014A) disclosed a video understanding method that combines an LSTM network, the C3D algorithm and the PCA algorithm to obtain more reliable sentence information for the video to be detected. The difference is that the present embodiment uses the temporal-information-difference method to measure the temporal-information difference between video successive frames and their convolution feature maps, finally obtaining a quantized difference value; it is not related to sentence-level video understanding results.
Embodiment 1:
This embodiment performs the distance-metric calculation on one group of original video sequential frame images shown in Figure 2 and the corresponding convolution feature maps shown in Figure 3; Figure 4 shows the calculation results.
Embodiment 2:
This embodiment performs the distance-metric calculation on one group of original video sequential frame images shown in Figure 5 and the corresponding convolution feature maps shown in Figure 6; Figure 7 shows the calculation results.
Embodiment 3:
This embodiment performs the distance-metric calculation on one group of original video sequential frame images shown in Figure 8 and the corresponding convolution feature maps shown in Figure 9; Figure 10 shows the calculation results.
Embodiment 4:
This embodiment performs the distance-metric calculation on one group of original video sequential frame images shown in Figure 11 and the corresponding convolution feature maps shown in Figure 12; Figure 13 shows the calculation results.
Embodiment 5:
This embodiment performs the distance-metric calculation on one group of original video sequential frame images shown in Figure 14 and the corresponding convolution feature maps shown in Figure 15; Figure 16 shows the calculation results.
The above are only preferred specific embodiments of the invention, but the protection scope of the invention is not limited thereto. Any equivalent substitution or change made, within the technical scope disclosed by the invention, by any person skilled in the art according to the technical solution of the invention and its inventive concept shall be covered within the protection scope of the invention.
Claims (10)
1. A system for measuring the difference between video successive frames and their convolution feature maps, characterized by comprising a camera and a computer, wherein the camera is used for shooting video, the computer stores a plurality of instructions, and the instructions are suitable to be loaded and executed by a processor to:
use the continuous frame data of the video shot by the camera and the corresponding convolution feature maps for calculation, so as to obtain the difference between the two with respect to temporal information;
take the difference of the temporal information as a part of the convolutional neural network loss function and have it participate in the gradient-descent process of convolutional neural network backpropagation, so that the gradient parameters of the convolution kernels are updated toward the case of retaining the temporal information of the input data.
2. The system for measuring the difference between video successive frames and their convolution feature maps according to claim 1, characterized in that the computer obtains the difference with respect to temporal information in the following manner:
Step 1: convert the video into images, obtaining n video frame images in total; take out all original video image frames x_i and the convolution feature map x_i^c corresponding to each frame, where i is the frame number; divide the original video images and the convolution feature maps into two sets, and within each set every two adjacent images form one group, i.e. one temporal-information element to be calculated;
Step 2: for data whose dimensions differ, raise the dimension by zero-padding or reduce the dimension by stripping zeros so that all data share the same dimension, obtaining the second original video image set P'n-1 and the second convolution feature map set Q'n-1;
Step 3: perform the space-mapping calculation on all data in the second original video image set P'n-1 and the second convolution feature map set Q'n-1 and average the results, obtaining the third original video image set f(x) and the third convolution feature map set f(x^c), where f denotes the set of continuous functions on the reproducing-kernel Hilbert space into which the data are mapped;
Step 4: for the third original video image set f(x) and the third convolution feature map set f(x^c), separately calculate, within each set, the difference of the mapped data of each group of same-dimension temporal-information elements to be calculated, sum the differences and compute the average; take the difference of the two means and square it, obtaining the largest mean difference of the temporal information;
Step 5: take the numerical value of the largest mean difference of the temporal information as a part of the convolutional neural network model loss function and have it participate in the gradient-descent process of network backpropagation, so that the gradient parameters of the convolution kernels are updated toward the case of retaining the temporal information of the input data.
3. The system for measuring the difference between video successive frames and their convolution feature maps according to claim 1, characterized in that:
the first original video image set is expressed as:
Pn-1 = {[x_1, x_2], [x_2, x_3], [x_3, x_4] ... [x_{n-1}, x_n]}
the first convolution feature map set is expressed as:
Qn-1 = {[x_1^c, x_2^c], [x_2^c, x_3^c] ... [x_{n-1}^c, x_n^c]}
and the largest mean difference of the temporal information is expressed by the formula:
MMD = [ (1/(n-1)) Σ_{i=1}^{n-1} f([x_i, x_{i+1}]) − (1/(n-1)) Σ_{i=1}^{n-1} f([x_i^c, x_{i+1}^c]) ]²
4. The system for measuring the difference between video successive frames and their convolution feature maps according to claim 1, characterized in that: a reproducing-kernel Hilbert space is an inner-product space with completeness, constituted with a reproducing kernel function as the basic data of the space; completeness means that the limit operation of any function in the space cannot leave the range of the space; an inner-product space is a space in which an inner product can be performed between any data in any dimensional space, satisfying conjugate symmetry, linearity and positive definiteness between data; any space satisfying the above two conditions is called a Hilbert space; and a reproducing kernel function is a kernel function that possesses eigenvalues and eigenfunctions in an infinite-dimensional space, with arbitrary eigenfunctions pairwise orthogonal.
5. The system for measuring the difference between video successive frames and their convolution feature maps according to claim 1, characterized in that the computer makes the gradient parameters of the convolution kernels update toward the case of retaining the temporal information of the input data in the following manner: when updating the gradient of each convolution kernel, the convolutional neural network uses not only the information on the magnitude of the difference between the output value and the true value, but also the largest mean difference as a basis for the gradient-update calculation, so that the gradient parameters of each convolution kernel are updated in the direction of reducing the largest mean difference; reducing the largest mean difference of the two groups of temporal information means that, as gradient descent proceeds, the similarity of the two groups of temporal information tends to increase, thereby ensuring that the convolutional neural network can better retain the temporal information of the original data.
6. A device for measuring the difference between video successive frames and their convolution feature maps, characterized by comprising:
a temporal-information difference obtaining module, which uses the continuous frame data of the video shot by a camera and the corresponding convolution feature maps for calculation, so as to obtain the difference between the two with respect to temporal information; and
a parameter updating module, which takes the difference of the temporal information as a part of the convolutional neural network loss function and has it participate in the gradient-descent process of convolutional neural network backpropagation, so that the gradient parameters of the convolution kernels are updated toward the case of retaining the temporal information of the input data.
7. The device for measuring the difference between video successive frames and their convolution feature maps according to claim 6, characterized in that the temporal-information difference obtaining module obtains the difference of the temporal information in the following manner:
convert the video into images, obtaining n video frame images in total; take out all original video image frames x_i and the convolution feature map x_i^c corresponding to each frame, where i is the frame number; divide the original video images and the convolution feature maps into two sets, and within each set every two adjacent images form one group, i.e. one temporal-information element to be calculated;
for data whose dimensions differ, raise the dimension by zero-padding or reduce the dimension by stripping zeros so that all data share the same dimension, obtaining the second original video image set P'n-1 and the second convolution feature map set Q'n-1;
perform the space-mapping calculation on all data in the second original video image set P'n-1 and the second convolution feature map set Q'n-1 and average the results, obtaining the third original video image set f(x) and the third convolution feature map set f(x^c), where f denotes the set of continuous functions on the reproducing-kernel Hilbert space into which the data are mapped;
for the third original video image set f(x) and the third convolution feature map set f(x^c), separately calculate, within each set, the difference of the mapped data of each group of same-dimension temporal-information elements to be calculated, sum the differences and compute the average; take the difference of the two means and square it, obtaining the largest mean difference of the temporal information.
8. The device for measuring the difference between video successive frames and their convolution feature maps according to claim 6, characterized in that the parameter updating module performs the update in the following manner: take the numerical value of the largest mean difference of the temporal information as a part of the convolutional neural network model loss function and have it participate in the gradient-descent process of network backpropagation, so that the gradient parameters of the convolution kernels are updated toward the case of retaining the temporal information of the input data.
9. The device for measuring the difference between video successive frames and their convolution feature maps according to claim 7, characterized in that:
the first original video image set is expressed as:
Pn-1 = {[x_1, x_2], [x_2, x_3], [x_3, x_4] ... [x_{n-1}, x_n]}
the first convolution feature map set is expressed as:
Qn-1 = {[x_1^c, x_2^c], [x_2^c, x_3^c] ... [x_{n-1}^c, x_n^c]}
and the largest mean difference of the temporal information is expressed by the formula:
MMD = [ (1/(n-1)) Σ_{i=1}^{n-1} f([x_i, x_{i+1}]) − (1/(n-1)) Σ_{i=1}^{n-1} f([x_i^c, x_{i+1}^c]) ]²
10. The device for measuring the difference between video successive frames and their convolution feature maps according to claim 7, characterized in that: a reproducing-kernel Hilbert space is an inner-product space with completeness, constituted with a reproducing kernel function as the basic data of the space; completeness means that the limit operation of any function in the space cannot leave the range of the space; an inner-product space is a space in which an inner product can be performed between any data in any dimensional space, satisfying conjugate symmetry, linearity and positive definiteness between data; any space satisfying the above two conditions is called a Hilbert space; and a reproducing kernel function is a kernel function that possesses eigenvalues and eigenfunctions in an infinite-dimensional space, with arbitrary eigenfunctions pairwise orthogonal. The parameter updating module specifically performs the update in the following manner: the gradient-descent process itself performs the corresponding derivation and updating of the gradient of each neuron in the convolutional neural network according to the numerical value of the loss function, with reducing the numerical value of the loss function as the final purpose of backpropagation; since the largest mean difference of the temporal information is also taken as a part of the loss function, backpropagation not only reduces the loss of the original convolutional neural network determined solely by the magnitude of the difference between the output value and the true value, but also reduces the largest mean difference of the temporal information, i.e. reduces the magnitude of the temporal-information difference between the original input video successive frames and their convolution feature maps, so that the gradient parameters of the convolution kernels are updated toward the case of retaining the temporal information of the input data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811138401.8A CN109344764A (en) | 2018-09-28 | 2018-09-28 | Measure the system and device of difference between video successive frame and its convolution characteristic pattern |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811138401.8A CN109344764A (en) | 2018-09-28 | 2018-09-28 | Measure the system and device of difference between video successive frame and its convolution characteristic pattern |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109344764A true CN109344764A (en) | 2019-02-15 |
Family
ID=65307067
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811138401.8A Withdrawn CN109344764A (en) | 2018-09-28 | 2018-09-28 | Measure the system and device of difference between video successive frame and its convolution characteristic pattern |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109344764A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110866509A (en) * | 2019-11-20 | 2020-03-06 | 腾讯科技(深圳)有限公司 | Action recognition method and device, computer storage medium and computer equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106845357A (en) * | 2016-12-26 | 2017-06-13 | 银江股份有限公司 | A kind of video human face detection and recognition methods based on multichannel network |
WO2017185391A1 (en) * | 2016-04-29 | 2017-11-02 | 北京中科寒武纪科技有限公司 | Device and method for performing training of convolutional neural network |
CN107818345A (en) * | 2017-10-25 | 2018-03-20 | 中山大学 | It is a kind of based on the domain self-adaptive reduced-dimensions method that maximum dependence is kept between data conversion |
CN108280426A (en) * | 2018-01-23 | 2018-07-13 | 深圳极视角科技有限公司 | Half-light source expression recognition method based on transfer learning and device |
- 2018
- 2018-09-28 CN CN201811138401.8A patent/CN109344764A/en not_active Withdrawn
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017185391A1 (en) * | 2016-04-29 | 2017-11-02 | 北京中科寒武纪科技有限公司 | Device and method for performing training of convolutional neural network |
CN106845357A (en) * | 2016-12-26 | 2017-06-13 | 银江股份有限公司 | A kind of video human face detection and recognition methods based on multichannel network |
CN107818345A (en) * | 2017-10-25 | 2018-03-20 | 中山大学 | It is a kind of based on the domain self-adaptive reduced-dimensions method that maximum dependence is kept between data conversion |
CN108280426A (en) * | 2018-01-23 | 2018-07-13 | 深圳极视角科技有限公司 | Half-light source expression recognition method based on transfer learning and device |
Non-Patent Citations (1)
Title |
---|
GU, Tingting et al.: "Depth estimation of a single infrared image based on inter-frame information extraction", Laser & Optoelectronics Progress * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110866509A (en) * | 2019-11-20 | 2020-03-06 | 腾讯科技(深圳)有限公司 | Action recognition method and device, computer storage medium and computer equipment |
CN110866509B (en) * | 2019-11-20 | 2023-04-28 | 腾讯科技(深圳)有限公司 | Action recognition method, device, computer storage medium and computer equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109389588A (en) | The method for measuring difference between video successive frame and its convolution characteristic pattern | |
CN109800689B (en) | Target tracking method based on space-time feature fusion learning | |
CN108197587B (en) | Method for performing multi-mode face recognition through face depth prediction | |
CN109284720A (en) | Measure application of the difference in video Activity recognition between video successive frame and its convolution characteristic pattern | |
CN106909938B (en) | Visual angle independence behavior identification method based on deep learning network | |
CN107316058A (en) | Improve the method for target detection performance by improving target classification and positional accuracy | |
CN110443843A (en) | A kind of unsupervised monocular depth estimation method based on generation confrontation network | |
CN105574510A (en) | Gait identification method and device | |
CN108564012B (en) | Pedestrian analysis method based on human body feature distribution | |
CN107871106A (en) | Face detection method and device | |
CN108830170B (en) | End-to-end target tracking method based on layered feature representation | |
WO2023015799A1 (en) | Multimodal fusion obstacle detection method and apparatus based on artificial intelligence blindness guiding | |
CN106228539A (en) | Multiple geometric primitive automatic identifying method in a kind of three-dimensional point cloud | |
CN113326735B (en) | YOLOv 5-based multi-mode small target detection method | |
CN110619268A (en) | Pedestrian re-identification method and device based on space-time analysis and depth features | |
CN106815563A (en) | A kind of crowd's quantitative forecasting technique based on human body apparent structure | |
CN114998890B (en) | Three-dimensional point cloud target detection algorithm based on graph neural network | |
CN108009512A (en) | A kind of recognition methods again of the personage based on convolutional neural networks feature learning | |
Chen et al. | Feature extraction method of 3D art creation based on deep learning | |
CN109344764A (en) | Measure the system and device of difference between video successive frame and its convolution characteristic pattern | |
CN117079095A (en) | Deep learning-based high-altitude parabolic detection method, system, medium and equipment | |
CN109389066A (en) | Measure application of the difference in the Classical correlation for taking photo by plane video between video successive frame and its convolution characteristic pattern | |
CN109359561A (en) | The metric algorithm of temporal information difference | |
CN109145874A (en) | Measure application of the difference in the detection of obstacles of Autonomous Vehicle visual response part between video successive frame and its convolution characteristic pattern | |
Ma | Summary of research on application of deep learning in image recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication |
Application publication date: 20190215 |