CN108923984A - Space-time video compressive sensing method based on convolutional network - Google Patents
- Publication number
- CN108923984A (publication); application CN201810777563.XA / CN201810777563A
- Authority
- CN
- China
- Prior art keywords
- space
- video
- time
- network
- block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/04—Network management architectures or arrangements
- H04L41/044—Network management architectures or arrangements comprising hierarchical management structures
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0803—Configuration setting
- H04L41/0823—Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/145—Network analysis or design involving simulating, designing, planning or modelling of a network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/136—Incoming video signal characteristics or properties
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/149—Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/154—Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/85—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Computer Networks & Wireless Communication (AREA)
- General Physics & Mathematics (AREA)
- Algebra (AREA)
- Physics & Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Environmental & Geological Engineering (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
- Closed-Circuit Television Systems (AREA)
Abstract
The invention discloses a space-time video compressive sensing method based on a convolutional network, which mainly addresses two shortcomings of the prior art: poor space-time balance of video compression and poor real-time performance of video reconstruction. The scheme is: prepare a training data set; design the network structure of the space-time video compressive sensing method; write training and test files according to the designed network structure; train the network; test the network. The network combines an observation technique that compresses in space and time simultaneously with a reconstruction technique that uses "space-time blocks" to enhance temporal-spatial correlation. It not only achieves real-time video reconstruction, but the reconstructed result also has stronger space-time balance and high, stable reconstruction quality. The method can be used for compressed transmission of video and subsequent video reconstruction.
Description
Technical field
The invention belongs to the technical field of video processing and relates generally to video compressive sensing, specifically to a space-time video compressive sensing method based on a convolutional network, which can be used to realize real-time, high-quality video compressive sensing reconstruction.
Background technique
Compressed sensing (CS) is a signal compression-sampling theory: a signal can be sampled below the Nyquist rate and the original signal recovered by a reconstruction algorithm. The theory has been successfully applied in various fields of signal processing, such as medical imaging and radar imaging. After the appearance and popularization of hardware such as the single-pixel camera, compressed sensing was applied to still-image compression and showed excellent potential. Today compressed sensing is no longer limited to still images but has been generalized to video. Compared with still images, video compression must also consider correlation along the temporal dimension, so processing video with compressed-sensing theory, i.e. video compressive sensing (VCS), is more complex.
Methods that apply compressed-sensing theory to video fall broadly into spatial video compressive sensing and temporal video compressive sensing, whose observation processes are realized with a spatial multiplexing camera (SMC) and a temporal multiplexing camera (TMC), respectively. In spatial VCS the input video is observed and reconstructed frame by frame, while in temporal VCS multiple consecutive input frames are observed and reconstructed jointly. In recent years many methods have realized video compressive sensing and obtained good video reconstruction results. However, because the iterative-optimization algorithms they adopt have very high time complexity, they cannot reconstruct video in real time; their real-time performance is too poor to meet practical demands.
With the development of deep-learning technology, deep neural networks (DNNs) have been widely applied in image and video processing, such as CS, super-resolution, rain removal, denoising and inpainting, with remarkable results. A DNN is trained offline and tested online: once training is complete, testing only requires a forward pass, so the reconstruction time is greatly shortened compared with conventional methods.
In the method for the existing VCS based on DNN, some carries out the reconstruct of time compressed sensing using fully-connected network,
Cause time complexity high because parameter amount is excessive, cannot achieve Real-time Reconstruction;Some carries out space pressure using convolutional neural networks
Contracting sensing reconstructing enhances video inter-frame relation on the basis of just restoring video to obtain better video reconstruction effect.Due to
These methods are all only compressed in the single dimension in space (or time), also referred to as observe, the observation obtained is caused to be pressed
Resolution ratio in the dimension of contracting is very low.So that reconstruction result is between the pixel in compression dimension when carrying out video reconstruction
Correlation is insufficient, is difficult to be resumed by the information of compression dimension, to reduce the result of entire video reconstruction.
Summary of the invention
The object of the invention, in view of the above shortcomings of the prior art, is to propose a space-time video compressive sensing method based on a convolutional network with balanced spatial and temporal resolution, better reconstruction performance, and stable reconstructed video results.
The present invention is a space-time video compressive sensing method based on a convolutional network, characterized by the following steps:
1) Prepare the training data set: download videos of different resolutions and pre-process all of them. Each downloaded video is converted to grayscale video frames, each frame is cut spatially into small blocks of a fixed size, the block at each spatial position is saved as a picture in its own sub-folder, and the pictures are named in temporal order, i.e. 1.jpg, 2.jpg, and so on. All sub-folders together form one top-level folder, which serves as the training data set. The test data set is any randomly selected video, converted to grayscale and stored in a folder.
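The data-preparation step above can be sketched as follows. This is a minimal NumPy-only illustration: the block size of 96 (a multiple of 3, as the observation layer later requires) and the synthetic frame are assumptions for illustration, standing in for real decoded grayscale video frames; actual file I/O (saving each tile as n.jpg in its sub-folder) is omitted.

```python
import numpy as np

def crop_into_blocks(frame, block=96):
    """Cut one grayscale frame into non-overlapping block x block tiles.

    The (row, col) tile index would name the sub-folder; the frame's
    position in the video would give the file name (1.jpg, 2.jpg, ...).
    """
    h, w = frame.shape
    tiles = {}
    for i in range(h // block):
        for j in range(w // block):
            tiles[(i, j)] = frame[i * block:(i + 1) * block,
                                  j * block:(j + 1) * block]
    return tiles

# A synthetic 288 x 384 gray frame stands in for a decoded video frame.
frame = np.arange(288 * 384, dtype=np.uint8).reshape(288, 384)
tiles = crop_into_blocks(frame, block=96)
```

Each spatial position thus accumulates one picture per video frame, and the tiles for one position form one training sample folder.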
2) Design the network structure of the space-time video compressive sensing method: the network consists of an observation part and a reconstruction part. The observation part feeds the input video block through one three-dimensional convolution layer, whose output is the observation. The reconstruction part passes the observation through, in series, one three-dimensional deconvolution layer, several "space-time blocks", one BN layer and one three-dimensional convolution layer to obtain the reconstructed video block. Each "space-time block" is four three-dimensional convolution layers in series, with a BN layer added before each convolution layer, and a residual connection from the output of the first convolution layer to the output of the fourth.
3) Write training and test files according to the designed network structure:
3a) Establish a project folder, and create program files in it for training, testing, the network structure, network settings, functions, etc.
3b) Set and write the associated file contents: set reasonable network hyper-parameters in the network configuration file; write the functions required by the training and test code in the function file; write the network structure in the network-structure file according to the space-time video compressive sensing method.
3c) Define the training process in the training file: take several consecutive video frames in order from a sub-folder of the training data set to form a video block, which is the network input of the space-time video compressive sensing method; compute the mean-squared-error reconstruction loss between the reconstructed video block output by the network and the input video block; back-propagate the reconstruction loss to update the network parameters.
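The reconstruction loss of step 3c) can be sketched as follows — a minimal NumPy version using the mean-of-squared-differences convention, with synthetic blocks standing in for the network input and output (the actual blocks would come from the network's forward pass):

```python
import numpy as np

def mse_reconstruction_loss(x, x_hat):
    """Mean squared error between an input video block and its
    reconstruction, averaged over every pixel of every frame."""
    x = np.asarray(x, dtype=np.float64)
    x_hat = np.asarray(x_hat, dtype=np.float64)
    return float(np.mean((x - x_hat) ** 2))

x = np.zeros((16, 96, 96))            # input video block (T x H x W)
x_hat = np.full((16, 96, 96), 0.5)    # stand-in reconstructed block
loss = mse_reconstruction_loss(x, x_hat)
```

A perfect reconstruction gives a loss of zero; here every pixel is off by 0.5, so the loss is 0.25.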
3d) Define the test process in the test file: convert the grayscale video of the test data set into video frames, divide the gray frames in temporal order into many groups of consecutive frames, and feed each group as one input video block into the network of the space-time video compressive sensing method to obtain the corresponding reconstructed video block, which contains the same number of frames as the input block. Arrange the reconstructed blocks in temporal order to obtain the reconstructed video, which is the test result of the network of the space-time video compressive sensing method.
4) Train the network of the space-time video compressive sensing method: load the network structure and its parameters from the network-structure file, initialize all parameters, load the hyper-parameters from the network configuration file, and, using stochastic gradient descent and the training process defined in the training file, repeatedly train on the training data set and update the parameters; training ends with the final parameter model.
5) Test the network of the space-time video compressive sensing method: load the network structure from the network-structure file, load the final parameter model as the network parameters, and run the test process defined in the test file on the test data set to obtain real-time, high-quality, highly space-time-balanced reconstructed test videos.
The observation part of the network designed by the invention uses a three-dimensional convolution layer, which yields observations with balanced spatial and temporal resolution. The reconstruction part uses a three-dimensional deconvolution layer and "space-time blocks", which reduce the number of network parameters, remove blocking artifacts, and improve the balance between the spatial and temporal dimensions of the reconstructed video, finally realizing real-time, high-quality, highly space-time-balanced video reconstruction.
Compared with the prior art, the present invention has the following advantages:
1. The observation process enhances the balance of space-time compression. In compressed sensing, video observation differs considerably from still-image observation: it must consider not only the intra-frame correlation between pixels in the spatial dimension but also the inter-frame correlation in the temporal dimension. Existing methods observe by compressing only in the temporal or only in the spatial dimension, so the space-time balance of the observation is poor, the reconstruction quality differs greatly between the temporal and spatial dimensions, and the reconstructed video quality suffers. The invention adopts a novel observation technique that compresses in space and time simultaneously, so the space-time balance of the observation is stronger and better reconstructed video quality can be obtained.
2. High-quality video reconstruction can be performed in real time. Existing video observation methods ignore space-time balance in the observation process, so reconstructed video quality is poor. In addition, traditional VCS methods all use iterative-optimization algorithms, whose computational complexity is so high that real-time video reconstruction is impossible, and some existing neural-network-based video compressive sensing methods use so many network parameters that reconstruction takes too long for real time. The fully convolutional network of the invention uses few parameters and its forward pass is extremely short, so real-time reconstruction is achievable; moreover, the "space-time blocks" in the reconstruction part of the network enhance temporal-spatial correlation, and combined with observations of strong space-time balance this greatly improves the video reconstruction result, yielding high-quality reconstructed video.
3. The stability of the reconstructed video is high. Every frame of the reconstructed video is guaranteed a similar, good reconstruction quality, avoiding the visual discomfort caused by frame-to-frame quality variation in the reconstruction result; video files can thus be compressed and transmitted more soundly, and the integrity of the video is ensured throughout compression and reconstruction.
Detailed description of the invention
Fig. 1 is the flow chart of the space-time video compressive sensing method based on a convolutional network of the invention;
Fig. 2 is the network structure of the space-time video compressive sensing method of the invention;
Fig. 3 is a schematic diagram of the "space-time block" unit in the reconstruction part of the invention;
Fig. 4 shows reconstructed video frames from the test of the invention.
Specific embodiment
The present invention is described in detail below with reference to the drawings and examples.
Embodiment 1
Today compressed sensing is no longer limited to still images but has been generalized to video. Compared with still images, video compression must also consider correlation in the temporal dimension, so processing video with compressed-sensing (VCS) theory is more complex. Video compressive sensing compression-samples the video, reducing storage space and greatly increasing transmission speed, and the video reconstructed from the transmitted data can serve more complex tasks, such as target detection and tracking. Existing methods compress (i.e. observe) only in a single dimension, space or time, so the resolution of the observation in the compressed dimension is very low; during reconstruction the correlation between pixels in the compressed dimension is insufficient, the compressed information is hard to recover, the result of the whole video reconstruction is degraded, and the reconstructed video quality is unstable. Addressing this, the invention explores a better approach and proposes a space-time video compressive sensing method based on a convolutional network, which, referring to Fig. 1, includes the following steps:
1) Prepare the training data set: download videos of different resolutions; to simplify the training process, pre-process all downloaded videos. Convert each video to grayscale video frames, cut each frame spatially into small blocks of a fixed size, save the block at each spatial position as a picture in its own sub-folder, and name the pictures in temporal order, i.e. 1.jpg, 2.jpg, and so on, where n.jpg comes from the n-th frame of the video. All sub-folders together form one top-level folder, which serves as the training data set. The test data set is any randomly selected video, likewise converted to grayscale frames and stored in a folder.
2) Design the network structure of the space-time video compressive sensing method: the network consists of an observation part and a reconstruction part. The observation part feeds the input video block through one three-dimensional convolution layer, whose output is the observation. The three-dimensional convolution layer convolves in the spatial and temporal dimensions simultaneously; by separately adjusting the compression ratios of its convolution kernel in space and in time, the observed information can be reasonably allocated between the spatial and temporal dimensions, yielding an observation with balanced spatio-temporal resolution. Existing methods often use a two-dimensional convolution layer to carry out the same spatial observation in each frame, which cannot extract the correlation along the temporal dimension and leads to a poor reconstruction in that dimension. The reconstruction part passes the observation through, in series, one three-dimensional deconvolution layer, several "space-time blocks", one BN layer and one three-dimensional convolution layer to obtain the reconstructed video block. Each "space-time block" is four three-dimensional convolution layers in series, with a BN layer added before each convolution layer, and a residual connection from the output of the first convolution layer to the output of the fourth. The three-dimensional deconvolution layer corresponds to the observation convolution: it raises the dimensionality of the observed information to obtain an initial restoration; the purpose of the dimension raising is to map the solution space of the observation onto the solution space of the video, so that the subsequent "space-time blocks" can add more detail to the initial restoration. The residual connections in the "space-time blocks" prevent gradient divergence as the network deepens while adding detail to the video, enhancing the temporal-spatial correlation of the initial restoration and safeguarding the recovery of the reconstructed video; a final three-dimensional convolution layer then plays an integrating role, further enhancing inter-frame relations.
3) Write training and test files according to the designed network structure:
3a) Establish a project folder, and create program files in it for training, testing, the network structure, network settings, functions, etc.
3b) Set and write the associated file contents: set reasonable network hyper-parameters, including batch size, number of epochs, etc., in the network configuration file; write the functions required by the training and test code in the function file; write the network structure in the network-structure file according to the space-time video compressive sensing method.
3c) Define the training process in the training file: take t video frames in order from a sub-folder of the training data set to form a video block, the network input of the space-time video compressive sensing method. The value of t is positively correlated with the temporal compression ratio of the observation part of the network, and generally should be adjusted together with the spatial compression ratio inside the observation part so that the temporal and spatial compression ratios reach a relatively balanced proportion. The input video block is forward-propagated through the network to obtain the reconstructed video block output by the network; the reconstruction loss is the mean squared error between the input block and the reconstructed block, computed from the per-pixel differences; the loss is back-propagated and the network parameters are updated with a gradient-descent algorithm. This completes one training iteration; when the training data set reaches its usage limit, training ends, otherwise training is repeated.
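The training loop of step 3c) — forward pass, loss, gradient step, repeated until the data-usage limit — can be illustrated on a toy model. This is only a sketch: a linear model with plain (full-batch) gradient descent stands in for the convolutional network and its stochastic-gradient optimizer, and all data here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "training set": the linear map y = X @ true_w stands in for
# the video-block -> reconstruction relationship the network must learn.
X = rng.normal(size=(64, 8))
true_w = rng.normal(size=8)
y = X @ true_w

w = np.zeros(8)                 # initialized parameters
lr, max_epochs = 0.05, 500      # hyper-parameters (illustrative values)

for epoch in range(max_epochs): # "usage limit" of the training data
    y_hat = X @ w                             # forward propagation
    grad = 2.0 * X.T @ (y_hat - y) / len(X)   # gradient of the MSE loss
    w -= lr * grad                            # parameter update

final_loss = float(np.mean((X @ w - y) ** 2))
```

The loss decreases toward zero as the parameters converge, mirroring how the network's reconstruction loss is driven down over repeated passes through the training data.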
3d) Define the test process in the test file: to reconstruct full frames, the test video need not be cut after conversion to gray frames. All gray frames of the test data set are divided in temporal order into groups of consecutive frames: starting from the first frame, every t frames form one video block, with zero padding where frames run short. Each group of t consecutive frames is fed as one input video block into the network of the space-time video compressive sensing method, which outputs a reconstructed video block containing the same number of frames as the input block. For the last block, only the frames that are not zero padding are kept as its reconstruction result. The reconstructed blocks, arranged in temporal order, form the reconstructed video, which is the test result of the network of the space-time video compressive sensing method.
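The block-splitting and reassembly of step 3d) can be sketched as follows; NumPy only, with t = 16 as in Embodiment 2 and a small synthetic video standing in for the real test frames (the network forward pass between splitting and reassembly is omitted):

```python
import numpy as np

def split_into_blocks(frames, t=16):
    """Split a (num_frames, H, W) gray video into consecutive t-frame
    blocks, zero-padding the last block when num_frames is not a
    multiple of t. Returns the blocks and the true frame count."""
    n, h, w = frames.shape
    n_blocks = -(-n // t)                       # ceiling division
    padded = np.zeros((n_blocks * t, h, w), dtype=frames.dtype)
    padded[:n] = frames
    return padded.reshape(n_blocks, t, h, w), n

def reassemble(blocks, n_frames):
    """Concatenate reconstructed blocks in time order, dropping the
    zero-padded frames of the last block."""
    _, t, h, w = blocks.shape
    return blocks.reshape(-1, h, w)[:n_frames]

video = np.ones((37, 8, 8), dtype=np.float32)   # 37 frames -> 3 blocks
blocks, n = split_into_blocks(video, t=16)
restored = reassemble(blocks, n)
```

With 37 frames and t = 16, the third block holds 5 real frames and 11 zero-padded frames, which reassembly discards.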
4) Train the network of the space-time video compressive sensing method: load the network structure and its parameters from the network-structure file, initialize all parameters, load the hyper-parameters from the network configuration file, and, using stochastic gradient descent and the training process defined in the training file, repeatedly train on the training data set of the invention and update the parameters; training ends with the final parameter model.
5) Test the network of the space-time video compressive sensing method: load the network structure from the network-structure file, load the final parameter model as the network parameters, and run the test process defined in the test file on the test data set to obtain real-time, high-quality, highly space-time-balanced reconstructed test videos.
Existing video observation methods usually compression-sample only in the temporal or only in the spatial dimension, ignoring space-time balance, so the reconstructed video quality is poor. In addition, traditional VCS methods all use traditional iterative-optimization algorithms with high time complexity, which makes real-time video reconstruction impossible, and many existing neural-network-based video compressive sensing methods use large numbers of network parameters, so reconstruction takes too long and likewise cannot run in real time. The space-time video compressive sensing method based on a convolutional network used in the invention has few parameters and a short forward pass, so real-time video reconstruction is achievable. At the same time, the "space-time blocks" in the reconstruction part of the network enhance the temporal-spatial correlation of the reconstructed video while restoring the video information; combined with observations of strong space-time balance, this significantly improves the video reconstruction result, yielding high-quality, stable reconstructed video.
Embodiment 2
The space-time video compressive sensing method based on a convolutional network is as in Embodiment 1. The design of the network structure of the space-time video compressive sensing method described in step 2), referring to Fig. 2, includes the following steps:
2a) Setting of the three-dimensional convolution layer of the observation part of the space-time video compressive sensing method based on a convolutional network: the size of the convolution kernel of the three-dimensional convolution layer is set to T × 3 × 3, where T = 16 is the size of the kernel in the temporal dimension and 3 × 3 is its size in the spatial dimensions; no zero padding is used in the convolution and the stride is 3. The input video block has size T × H × W, where T is the number of frames in the block, H × W is the spatial size of each frame, and H and W are multiples of 3. With one convolution kernel the spatial compression ratio is 9 and the temporal compression ratio is T, so the observation rate is 1/(9T); with N kernels the observation rate is N/(9T). When N = 9 and T = 16 the observation rate is 9/(9 × 16) = 1/16. This observation mode therefore yields an observation that is compressed in both the temporal and spatial dimensions with balanced spatio-temporal resolution, which helps the reconstruction part recover the information of both dimensions well.
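The observation rate of step 2a) reduces to a one-line formula, sketched here as a small helper (the name `observation_rate` is illustrative, not from the patent):

```python
def observation_rate(n_kernels, t, spatial_ratio=9):
    """Observation (sampling) rate of the 3-D convolution observation
    layer: N measurements per 9*T input pixels, for a T x 3 x 3 kernel
    with spatial stride 3 and no zero padding."""
    return n_kernels / (spatial_ratio * t)

rate = observation_rate(9, 16)   # N = 9 kernels, T = 16 frames
```

With N = 9 and T = 16 this gives 9/(9 × 16) = 1/16, matching the rate stated above; with a single kernel it falls to 1/144.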
2b) Setting of the three-dimensional deconvolution layer of the reconstruction part, to which the observation output is first connected: the convolution kernel of the deconvolution layer must respect the symmetry between convolution and deconvolution. When the kernel size of the deconvolution process matches the kernel size of the corresponding convolution process, the deconvolution output has exactly the size of the convolution input; the kernel of the three-dimensional deconvolution layer is therefore set identically to that of the observation convolution layer, i.e. size T × 3 × 3, N kernels, no zero padding, stride 3.
2c) Design of the "space-time blocks" of the reconstruction part, several of which follow the three-dimensional transposed convolution layer: each "space-time block" consists of four three-dimensional convolution layers connected in series, with a BN layer added before each three-dimensional convolution layer. The input of the second three-dimensional convolution layer is connected by a residual connection to the output of the fourth; this residual connection makes the second, third and fourth three-dimensional convolution layers form a residual block. Each "space-time block" is thus one three-dimensional convolution layer connected in series with one residual block.
2d) Setting of the final three-dimensional convolution layer of the reconstruction part: the kernel of the last three-dimensional convolution layer of the reconstruction part has size 16 × 1 × 1, with 16 kernels, stride 1 and no zero padding. This unpadded three-dimensional convolution layer further integrates and enhances inter-frame information, producing the final reconstructed video frames.
The network structure designed by the present invention for the space-time video compressed sensing method uses a three-dimensional convolution layer in the observation part, which compressively samples the input video block in both the time and space dimensions. Compared with existing methods that sample along only a single dimension (time or space), this obtains spatio-temporally balanced samples; and compared with existing methods that use a Gaussian matrix as the observation matrix, learning the observation by convolution enables more reasonable compressed sampling. The reconstruction part of the network consists of a three-dimensional transposed convolution layer, several "space-time blocks", a BN layer and a three-dimensional convolution layer. The three-dimensional transposed convolution layer is the symmetric operation of the convolution layer: it up-projects the observations back to the same dimensions as the input video block, which facilitates further restoration by the subsequent network, and this symmetry ensures that the final reconstructed video block has the same size as the input video block. The residual connections in the "space-time blocks" add more detail to the reconstructed video, because a residual connection learns the difference between the input and output of the residual block, focusing the network on detail information in the video.
Embodiment 3
The convolutional-network-based space-time video compressed sensing method is as in Embodiments 1-2. Each "space-time block" described in step 2c), consisting of one three-dimensional convolution layer connected in series with one residual block (see Fig. 3), is constructed as follows:
2c1) Setting of the three-dimensional convolution layer in each "space-time block": the kernel size of the three-dimensional convolution layer is 16 × 1 × 1, with 16 kernels, no zero padding and stride 1. Since the spatial size of the kernel is 1 × 1, it integrates the inter-frame information at each spatial position and thus strengthens inter-frame relations.
2c2) Setting of the residual block in each "space-time block": the residual block contains three three-dimensional convolution layers, with kernel sizes 16 × 3 × 3, 64 × 1 × 1 and 32 × 3 × 3 and kernel numbers 64, 32 and 16, respectively; all three use zero padding and stride 1. Since kernels with spatial size 3 × 3 fuse spatio-temporal information and kernels with spatial size 1 × 1 integrate inter-frame information, this setting enhances spatio-temporal information. The input of the residual block is connected to its output for an element-wise sum, and a Tanh activation layer is added after the sum.
2c3) A BN layer is added before each three-dimensional convolution layer in 2c1) and 2c2) to accelerate convergence, followed by a PReLU to enhance the nonlinearity of the network.
2c4) Multiple "space-time blocks" can be cascaded, i.e. the output of one "space-time block" serves as the input of the next, expanding network capacity; each block has the identical structure built in 2c1)-2c3). Depending on the actual situation, the present invention may use one "space-time block" or multiple "space-time blocks". The number of "space-time blocks" trades off against reconstruction time; in other words, when real-time requirements are strict, a single "space-time block" can be used.
In the present invention, each "space-time block" of the reconstruction part of the network of the space-time video compressed sensing method is one three-dimensional convolution layer connected in series with one residual block. The main functional component is the residual block, which consists of three three-dimensional convolution layers in series, with the input of the first three-dimensional convolution layer connected to the output of the third: the input value of the first layer is added to the output value of the third layer, and the sum serves as the input of the next network layer. In this way the residual block only has to learn the difference between its input and output, so small variations between the two are amplified and processed, enabling the network to add detail information to the video as it learns the reconstruction.
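A toy numerical sketch (assumed layers, not the patent's exact convolutions) of the residual connection described above: the inner branch output is added to the block input, so the branch only has to model the difference between input and target.

```python
import numpy as np

# Toy sketch of the residual connection in step 2c2): the branch output is
# summed with the block input, and a Tanh activation follows the sum.
def residual_block(x, branch):
    return np.tanh(x + branch(x))

x = np.array([0.1, -0.2, 0.3])
identity_branch = lambda v: np.zeros_like(v)  # a zero branch leaves only tanh(x)
print(residual_block(x, identity_branch))
```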
Embodiment 4
The convolutional-network-based space-time video compressed sensing method is as in Embodiments 1-3. Setting reasonable network hyperparameters in the network configuration file described in step 3b) includes the following: the number of input video blocks per training step, batchsize, for the network of the space-time video compressed sensing method is set to 20-40; batchsize may be adjusted appropriately according to the depth of the network and the size of the videos in the training dataset. The number of passes over the training dataset during the whole training process, epoch, is set to 3-7 and may likewise be adjusted according to the number of videos in the training dataset.
Embodiment 5
The convolutional-network-based space-time video compressed sensing method is as in Embodiments 1-4. Defining the training process of the network in the training file described in step 3c), referring to Fig. 1, includes the following steps:
3c1) Processing of training input video blocks: starting from the first sub-folder, every T = 16 pictures, taken in increasing order of the numbers in the picture names, form one input video block, with successive input video blocks shifted by one picture; e.g. 1.jpg-16.jpg is the first input video block, 2.jpg-17.jpg the second, and so on. The network of the space-time video compressed sensing method takes batchsize consecutive video blocks as each input, i.e. input video blocks 1 to batchsize form the first input, blocks 2 to batchsize+1 the second, and so on. If each picture has size H × W, then each input contains Q = batchsize × T × H × W pixel values. For large-scale training datasets, batchsize can be increased appropriately so that more data is used per step, accelerating training.
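The sliding-window block formation of step 3c1) can be sketched as follows (the function name is illustrative, not from the patent):

```python
# Illustrative sketch of step 3c1): every T consecutive frames form one
# input video block, and successive blocks are shifted by one frame.
def make_training_blocks(frame_names, T=16):
    return [frame_names[i:i + T] for i in range(len(frame_names) - T + 1)]

frames = [f"{k}.jpg" for k in range(1, 21)]   # 20 frames: 1.jpg .. 20.jpg
blocks = make_training_blocks(frames)
print(len(blocks))                  # 5 overlapping blocks
print(blocks[0][0], blocks[0][-1])  # 1.jpg 16.jpg
print(blocks[1][0])                 # 2.jpg
```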
3c2) Training process of the network of the space-time video compressed sensing method: each pixel value X0 in the batchsize consecutive video blocks input to the network is first normalized to [-1, 1]; the normalized input X is then fed into the defined network to obtain the corresponding reconstruction result X'. The mean squared error between the reconstruction result X' and the input X is computed as the reconstruction error; the reconstruction error is back-propagated to update the network parameters, completing one training step of the network.
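A hedged sketch of step 3c2): the exact normalization formula is not reproduced in this text, so a common mapping of 8-bit pixel values to [-1, 1] is assumed here; the loss is the stated mean squared error.

```python
import numpy as np

# Sketch of step 3c2). The normalization below is an assumed form
# (0 -> -1, 255 -> 1); the patent's exact formula is not shown in this text.
def normalize(x0):
    return x0 / 127.5 - 1.0

def reconstruction_loss(x, x_rec):
    # Mean squared error over all Q pixel values in the input.
    return np.mean((x - x_rec) ** 2)

x = normalize(np.array([0.0, 127.5, 255.0]))
print(x)                          # [-1.  0.  1.]
print(reconstruction_loss(x, x))  # 0.0 for a perfect reconstruction
```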
3c3) Repeat 3c1)-3c2), executing the training step repeatedly until the data in the current sub-folder is exhausted, then continue training with the data in the next sub-folder. The network model, comprising the network structure and parameters, is saved every 500 iterations. When the data in all sub-folders has been used, one pass over the training dataset is complete.
3c4) Judge whether epoch passes over the training dataset have been completed; if so, training ends, otherwise repeat 3c1)-3c4).
The convolutional-network-based space-time video compressed sensing method uses neural-network training instead of traditional iterative optimization algorithms. Traditional iterative optimization reconstructs each video through many optimization iterations, so reconstruction is slow and real-time reconstruction is impossible. The present invention effectively shifts the computational burden to the network training process: at test time, a single forward propagation of the input video through the network yields the reconstructed video, greatly reducing computation time and achieving real-time video reconstruction.
Embodiment 6
The convolutional-network-based space-time video compressed sensing method is as in Embodiments 1-5. Writing the test code into the test file according to the designed network structure, as described in step 3d) and referring to Fig. 1, includes the following steps:
3d1) Processing of test input video blocks: the test dataset is any randomly selected video, converted to grayscale video frames and saved in a folder without cropping. For a test video containing P frames, each of size H0 × W0, every T frames form one input video block; e.g. grayscale frames 1 to T form the first input video block, frames T+1 to 2T the second, and so on. The number of complete input video blocks is then n = ⌊P/T⌋, and the number of remaining video frames is p = P - n × T.
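The block count of step 3d1) is simple integer arithmetic; a sketch (illustrative function name):

```python
# Sketch of step 3d1): n complete blocks of T frames, p leftover frames
# (the leftovers are zero-padded into one more block in step 3d3).
def split_test_frames(P, T=16):
    n = P // T
    p = P - n * T
    return n, p

print(split_test_frames(100))  # (6, 4): 6 full blocks, 4 leftover frames
print(split_test_frames(32))   # (2, 0): no leftovers
```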
3d2) For the first n × T frames: the video frames are read in sequence; when H0 or W0 is not a multiple of 3, the spatial size of the observation convolution kernel, zeros are appended in the last rows or columns so that the new spatial sizes H and W are multiples of 3. Whenever the number of accumulated frames reaches T, an input video block of size T × H × W is obtained and fed into the network of the space-time video compressed sensing method to obtain the corresponding reconstructed video block.
3d3) For the last p frames: zero padding is applied in both time and space to assemble a T × H × W video block, which is then fed into the network of the space-time video compressed sensing method to obtain a reconstructed video block; only the unpadded first p frames, cropped to the unpadded size H0 × W0, are kept as the reconstruction result for this block.
3d4) The reconstructed video blocks are arranged in temporal order to obtain the reconstructed video, which is the test result of the network of the space-time video compressed sensing method;
3d5) Each test video is processed in turn according to 3d1)-3d4).
When testing the network of the convolutional-network-based space-time video compressed sensing method, the input grayscale video does not need to be cropped as during training, because the network is a fully convolutional network: once trained, a fully convolutional network can process video of any spatial size. This broadens the applicability of the network, which can compress and reconstruct grayscale video of arbitrary size.
A more detailed example is given below to further describe the present invention.
Embodiment 7
The convolutional-network-based space-time video compressed sensing method, as in Embodiments 1-6, includes the following steps:
Step 1, training dataset is prepared.
1a) Download the required videos of different resolutions and place them in a "video" folder; create an empty folder named "frame" for saving grayscale video frames, and an empty folder named "patch" for saving the cropped grayscale video frames.
1b) Configure CUDA acceleration on the computer, install Python, and install the third-party Python libraries TensorFlow and cv2.
1c) Write code using the cv2 library to convert videos into grayscale video frames:
1c1) Enter the "video" folder, access a video in the video set, and obtain the video name.
1c2) Create a video-frame folder with the same name as the video under the "frame" folder.
1c3) For the accessed video, perform the following steps: read its video frames with the cv2.VideoCapture() video-reading function; convert the color frames to grayscale frames with the cv2.cvtColor() function; save the processed video frames into the created video-frame folder in temporal order, e.g. 1.jpg, 2.jpg, and so on.
1d) Crop the grayscale video frames:
1d1) Write a frame_to_patch() function that crops the grayscale frames in each sub-folder of the "frame" folder into patches of size 360 × 240 according to spatial position;
1d2) Save the patch at each spatial position as pictures, named in video frame order, i.e. 1.jpg, 2.jpg, and so on; store the pictures in different sub-folders under the "patch" folder, each sub-folder named "grayscale-frame folder name + spatial position number", e.g. "Horse.avi_1", "Flower_3"; use the "patch" folder as the training dataset.
1e) The test dataset is any randomly selected video, converted to grayscale video and saved in temporal order in a folder "test_frame", e.g. 1.jpg, 2.jpg, and so on.
Step 2, write the training and test code with the deep learning framework TensorFlow according to the designed network structure:
2a) Create a project folder and, inside it, create train.py for training, test.py for testing, config.py for saving parameter settings, network.py for saving the network structure, and utils.py for storing the functions used in the other files, to simplify the code.
2b) Set reasonable network hyperparameters in the config.py file, such as the number of frame groups input per iteration batchsize = 16, the number of frames per frame group N = 16, and the number of passes over the dataset during training epoch = 5; write the functions required by the training and test code, such as the function normalizing inputs to [-1, 1] and the mean-squared-error loss function, into utils.py.
2c) Write the network.py file to define the structure of the fully convolutional network used, as shown in Fig. 2:
2c1) The input first passes through a three-dimensional convolution layer that performs the observation. This convolution layer contains 9 three-dimensional convolution kernels of size 3 × 3 × N, yielding observations compressed in both spatial and temporal dimensions with an observation rate of 1/16.
2c2) The observation result serves as the input of a three-dimensional transposed convolution layer, which up-projects the low-dimensional observations; the output size of this transposed convolution layer equals the input size of the observation convolution layer, and its kernels have size 3 × 3 × N. This yields an initial restoration with the same temporal and spatial size as the input.
2c3) The initial restoration is fed into 1 or 2 cascaded "space-time blocks". The structure of a space-time block is shown in Fig. 3, which illustrates the composition of one "space-time block"; the "space-time block" of the present invention enhances the correlation of the preliminarily reconstructed frames in the time and space dimensions, yielding the final reconstruction result.
Briefly, the input video passes through the three-dimensional convolution layer to obtain the observation result, which is fed into the three-dimensional transposed convolution layer for initial restoration and then into the "space-time blocks" for spatio-temporal enhancement, producing the final reconstructed video.
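A shape walk-through of this observe/restore pipeline (a sketch under the sizes stated above, with illustrative function names): the stride-3 unpadded observation convolution compresses the block, and the symmetric transposed convolution restores the input size, which the "space-time blocks" then preserve.

```python
# Shape sketch of step 2c): observation conv (stride 3, no padding)
# compresses, the symmetric transposed conv restores the input shape.
def conv_shape(T, H, W, N):
    # T x H x W input, N kernels -> N x (H/3) x (W/3) observation.
    return (N, H // 3, W // 3)

def deconv_shape(obs, T):
    n, h, w = obs
    # Symmetric transposed convolution recovers T x H x W.
    return (T, h * 3, w * 3)

obs = conv_shape(16, 360, 240, 9)
print(obs)                    # (9, 120, 80)
print(deconv_shape(obs, 16))  # (16, 360, 240)
```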
2d) Write the train.py file to implement the training process of the network:
2d1) Group the acquired frames into video blocks of N frames each, take batchsize video blocks as the network input, and forward-propagate them through the network defined in 2c) to obtain the reconstruction result.
2d2) Taking the network input as reference, compute the reconstruction error between the obtained reconstruction result and the network input, using the mean squared error; back-propagate this reconstruction error and update the network parameters, completing one training step of the network.
2d3) Judge whether epoch passes over the training dataset have been completed; if so, training ends, otherwise repeat 2d1)-2d2).
2d4) Save the final parameter model for use in testing.
2e) Write test.py to implement the test process:
2e1) Load the saved network model and process each test video according to 2e2)-2e4) in turn.
2e2) Read video frames sequentially from the test video frame folder until the accumulated frame count reaches N, i.e. one video block, and feed it into the network defined in 2c) to obtain a reconstructed video block. For the final frames numbering fewer than N, apply zero padding to assemble a video block and feed it into the network defined in 2c) to obtain the reconstruction result. Record the time taken by each reconstruction.
2e3) Whenever a reconstructed video block is obtained in 2e2), immediately write it to a video file with the same name as the test video, realizing real-time reconstruction; for the reconstruction result of the last, zero-padded video block, write only the unpadded leading frames to the video file.
2e4) Save each frame of the reconstructed video blocks obtained in 2e2) as pictures in temporal order, e.g. 1.jpg, 2.jpg, and so on. For each frame of the reconstructed frame group, compute and save the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) against the corresponding input frame.
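A minimal PSNR sketch for step 2e4), assuming 8-bit frames (MAX = 255); SSIM is omitted here for brevity:

```python
import numpy as np

# Sketch of the PSNR computation in step 2e4) for 8-bit grayscale frames.
def psnr(ref, rec):
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

ref = np.full((8, 8), 100, dtype=np.uint8)
rec = ref.copy()
rec[0, 0] = 110          # one pixel off by 10 grey levels
print(round(psnr(ref, rec), 2))
```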
Step 3, train the network:
3a) Replace the training dataset path in the training code with the path of the "patch" grayscale video block folder.
3b) Load the network structure file network.py and the network configuration file config.py.
3c) Execute train.py in the project folder until training ends, obtaining the final parameter model.
Step 4, test the network:
4a) Replace the input video path in the test code with the path of the "test_frame" test video folder.
4b) Load the network structure file network.py and the final parameter model.
4c) Execute test.py in the project folder to obtain the reconstructed video.
The video compressed sensing method of the present invention, based on a fully convolutional neural network, uses few parameters and has an extremely short forward propagation time, enabling real-time reconstruction; moreover, the "space-time blocks" in the reconstruction part of the network enhance temporal correlation, and combined with the acquired spatio-temporally balanced observations this greatly improves the video reconstruction effect, yielding high-quality reconstructed video.
The technical effect of the present invention is further explained below through simulation and its data.
Embodiment 8
The convolutional-network-based space-time video compressed sensing method is as in Embodiments 1-7.
Test conditions:
A network is built whose reconstruction part contains one "space-time block", with the remaining network parameters as described above.
A second network is built whose reconstruction part contains two "space-time blocks", with the remaining network parameters as described above.
The two networks are trained separately to obtain trained networks.
Test experiment contents:
Test experiment 1: "walk" and "foliage" from Vidset4 are chosen as the test dataset (see Fig. 4). The test dataset is fed through the two trained networks of the present invention to obtain reconstructed videos; the reconstructed frames of one arbitrary frame from each of the two test videos are shown in Fig. 4. In Fig. 4, the first row shows the original frames of the "walk" and "foliage" input videos; the second row shows the reconstructed frames produced by the network of the space-time video compressed sensing method with one "space-time block"; the third row shows the reconstructed frames produced by the network with two "space-time blocks".
As seen from Fig. 4, the grayscale videos reconstructed by the network of the space-time video compressed sensing method have good visual quality. Comparing the second and third rows of Fig. 4, the network with one "space-time block" already recovers the main content of the original video frames, while the network with two "space-time blocks" yields clearer reconstructions, further proving that the "space-time block" of the present invention improves reconstruction quality.
Test experiment 2: the grayscale videos of "walk" and "foliage" are each fed into the networks of the space-time video compressed sensing method with one "space-time block" and with two "space-time blocks" to obtain reconstructed videos; the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) of the first 32 frames of the reconstructed videos are computed, with results in Table 1:
Table 1: PSNR/SSIM of reconstructed video frames
One block | Walk | Foliage | Two blocks | Walk | Foliage |
1st frame | 25.39/0.809 | 23.05/0.664 | 1st frame | 26.07/0.817 | 23.81/0.716 |
2nd frame | 25.87/0.841 | 24.39/0.742 | 2nd frame | 26.34/0.839 | 24.90/0.769 |
3rd frame | 25.96/0.849 | 24.87/0.766 | 3rd frame | 26.64/0.854 | 25.02/0.779 |
… | … | … | … | … | … |
Average | 25.25/0.839 | 23.77/0.690 | Average | 25.80/0.844 | 24.10/0.722 |
Comparing the reconstructed frames in Fig. 4 by human subjective vision shows that the "space-time block" enhances spatio-temporal correlation and gives good video reconstruction. The image-metric results in Table 1 illustrate the reconstruction quality of the present invention from a quantitative angle. As seen from Table 1, every frame of the grayscale videos reconstructed by the network of the space-time video compressed sensing method has good PSNR and SSIM, and the values are close in magnitude, proving that the reconstructed videos have good reconstruction quality; this shows that the reconstruction results have good spatio-temporal balance and that the quality of each reconstructed frame is stable.
Test experiment 3: the average per-frame reconstruction times over the first 32 frames of the reconstructed "walk" and "foliage" videos are computed separately for the networks of the space-time video compressed sensing method with one "space-time block" and with two "space-time blocks":
the average per-frame reconstruction time of the network with one "space-time block" is 0.03-0.04 s;
the average per-frame reconstruction time of the network with two "space-time blocks" is 0.05-0.06 s.
The network of the space-time video compressed sensing method of the present invention therefore reconstructs video quickly and in real time; since adding "space-time blocks" increases the reconstruction time, as many "space-time blocks" as possible can be used while real-time performance is maintained.
In brief, the convolutional-network-based space-time video compressed sensing method disclosed by the present invention mainly solves the problems of poor spatio-temporal balance of video compression and poor real-time performance of video reconstruction in the prior art. Its scheme is: 1) prepare the training dataset; 2) design the network structure of the space-time video compressed sensing method; 3) write the training and test files according to the designed network structure; 4) train the network of the space-time video compressed sensing method; 5) test the network of the space-time video compressed sensing method. The network of the space-time video compressed sensing method of the present invention uses an observation technique that compresses time and space simultaneously and a reconstruction technique that enhances spatio-temporal correlation with "space-time blocks"; it not only achieves real-time video reconstruction but also produces results with strong spatio-temporal balance and high, stable reconstruction quality, and can be used for the compressed transmission and subsequent reconstruction of video.
Claims (3)
1. A convolutional-network-based space-time video compressed sensing method, characterized by including the following steps:
1) Prepare the training dataset: download the required videos of different resolutions and pre-process all downloaded videos, converting each in turn into grayscale video frames and cropping them spatially into patches of a certain size; save the patch at each spatial position as pictures stored in different sub-folders and named in video frame order, i.e. 1.jpg, 2.jpg, and so on; all sub-folders together constitute one overall folder, which serves as the training dataset. The test dataset is any randomly selected video, converted to grayscale video and saved in a folder;
2) Design the network structure of the space-time video compressed sensing method:
The network structure comprises an observation part and a reconstruction part. In the observation part, the input video block is fed through one three-dimensional convolution layer whose output is the observation; in the reconstruction part, the observation output is connected in series to a three-dimensional transposed convolution layer, several "space-time blocks", a BN layer and a three-dimensional convolution layer, after which the reconstructed video block is obtained. Each "space-time block" consists of four three-dimensional convolution layers connected in series, with a BN layer added before each three-dimensional convolution layer, and a residual connection between the output of the first three-dimensional convolution layer and the output of the fourth;
3) Write the training and test files according to the designed network structure:
3a) Create a project folder and, within it, create program files for training, testing, the network structure, the network configuration and the functions;
3b) Set up and write the associated file contents: set reasonable network hyperparameters in the network configuration file, write the functions required by the training and test code into the function file, and write the network structure of the space-time video compressed sensing method into the network structure file;
3c) Define the training process of the network in the training file: from the sub-folders of the training dataset, take several video frames in order to form video blocks as the network input of the space-time video compressed sensing method; compute the mean-squared-error reconstruction loss between the reconstructed video block output by the network of the space-time video compressed sensing method and the input video block, and back-propagate the reconstruction loss to update the network parameters;
3d) Define the test process of the network in the test file: convert the grayscale videos of the test dataset into video frames, divide the grayscale frames in temporal order into many groups of consecutive frames, each group forming one input video block, and feed each into the network of the space-time video compressed sensing method to obtain the corresponding reconstructed video block, which contains the same number of frames as the input video block; arrange the reconstructed video blocks in temporal order to obtain the reconstructed video, which is the network test result of the space-time video compressed sensing method;
4) Train the network of the space-time video compressed sensing method: load the network structure from the network structure file and initialize all the parameters involved, using the stochastic gradient descent algorithm; load the hyperparameters from the network configuration file; and, following the training process defined in the training file, repeatedly train the network of the space-time video compressed sensing method on the training dataset and update the parameters; when training ends, the final parameter model is obtained;
5) Test the network of the space-time video compressed sensing method: load the network structure from the network structure file and load the final parameter model as the network parameters; following the test process defined in the test file, test the network of the space-time video compressed sensing method with the test dataset to obtain reconstructed test videos of real-time, high quality and high spatio-temporal balance, the space-time video compressed sensing result.
2. The convolutional-network-based space-time video compressed sensing method according to claim 1, characterized in that designing the network structure of the space-time video compressed sensing method described in step 2) includes the following steps:
2a) Setting of the three-dimensional convolution layer of the observation part: the kernel of the three-dimensional convolution layer is set to size T × 3 × 3, where T = 16 is the kernel size along the time dimension and 3 × 3 is the size in the spatial dimensions; no zero padding is used during convolution, and the stride is 3. The input video block size is T × H × W, where T is the number of frames in the input video block, H × W is the spatial size of each frame, and H and W are multiples of 3. With one convolution kernel, the spatial compression ratio is 9 and the temporal compression ratio is T, giving an observation rate of 1/(9T); with N convolution kernels, the observation rate is N/(9T); when N = 9 and T = 16, the observation rate is 1/16;
2b) setting of the three-dimensional deconvolution layer of the reconstruction part: the convolution kernel of the three-dimensional deconvolution layer is set according to the symmetry between convolution and deconvolution: when the kernel size of the deconvolution process equals the kernel size of the corresponding convolution process, the output of the deconvolution has exactly the same size as the input of the convolution; the three-dimensional deconvolution layer is therefore configured identically to the three-dimensional convolution layer of the observation part, i.e. kernel size T × 3 × 3, N kernels, no zero padding, and a stride of 3;
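By the symmetry argument of step 2b), a transposed 3D convolution with the same kernel size, kernel count, and stride as the observation convolution restores the original block size. A minimal PyTorch sketch (names are illustrative):

```python
import torch
import torch.nn as nn

T, N = 16, 9
# Deconvolution layer: same kernel (T x 3 x 3), count (N) and stride (3)
# as the observation convolution, so its output recovers T x H x W.
deconv = nn.ConvTranspose3d(in_channels=N, out_channels=1,
                            kernel_size=(T, 3, 3), stride=(3, 3, 3),
                            padding=0, bias=False)

meas = torch.randn(1, N, 1, 32, 32)   # measurements from the observation layer
recon = deconv(meas)                  # -> (1, 1, 16, 96, 96): block size restored
```

The transposed convolution's output size is (in − 1) · stride + kernel per dimension, which exactly inverts the size arithmetic of the observation layer when kernel and stride match.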
2c) design of the "space-time blocks" of the reconstruction part: each "space-time block" consists of four three-dimensional convolution layers connected in series, with a BN layer added before each three-dimensional convolution layer of the "space-time block"; a residual connection links the input of the second three-dimensional convolution layer to the output of the fourth, so that the second, third, and fourth three-dimensional convolution layers form a residual block; each "space-time block" is thus one three-dimensional convolution layer followed in series by one residual block;
2d) setting of the three-dimensional convolution layer of the reconstruction part: the last three-dimensional convolution layer of the reconstruction part has a kernel size of 16 × 1 × 1, 16 kernels, a stride of 1, and no zero padding.
3. The space-time video compressed sensing method based on a convolutional network according to claim 2, characterized in that each "space-time block" in step 2c) being one three-dimensional convolution layer followed in series by one residual block comprises the following steps:
2c1) setting of the three-dimensional convolution layer in each "space-time block": the convolution kernel of the three-dimensional convolution layer has size 16 × 1 × 1, with 16 kernels, no zero padding, and a stride of 1;
2c2) setting of the residual block in each "space-time block": the residual block contains three three-dimensional convolution layers whose kernel sizes are 16 × 3 × 3, 64 × 1 × 1, and 32 × 3 × 3 and whose kernel numbers are 64, 32, and 16 respectively, all with no zero padding and a stride of 1; the input of the residual block is connected to its output and the two are summed, and a Tanh activation layer is added after the sum;
2c3) before each three-dimensional convolution layer in 2c1) and 2c2), a BN layer is added to accelerate convergence, and a PReLU is added to enhance the nonlinearity of the network;
2c4) multiple "space-time blocks" can be cascaded, i.e. the output of one "space-time block" serves as the input of the next "space-time block" so as to expand the network capacity, the structure of each block being identical.
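The "space-time block" of steps 2c1)–2c4) can be sketched as follows. This is a minimal PyTorch sketch under two stated assumptions: the kernel notation c × k × k is read as (input channels) × height × width over a 16-channel feature tensor, so 2D convolutions are used; and the 3 × 3 convolutions use padding = 1 so that the residual sum's shapes match (the claim itself specifies no zero padding). Class and variable names are illustrative:

```python
import torch
import torch.nn as nn

def bn_conv(cin, cout, k, pad):
    """BN before each conv layer (2c3), PReLU after it."""
    return nn.Sequential(nn.BatchNorm2d(cin),
                         nn.Conv2d(cin, cout, k, stride=1, padding=pad),
                         nn.PReLU())

class SpaceTimeBlock(nn.Module):
    """One 'space-time block': a 16->16 1x1 conv layer (2c1) followed by a
    residual block of three convs 16->64->32->16 (2c2); the skip sum is
    passed through Tanh."""
    def __init__(self):
        super().__init__()
        self.head = bn_conv(16, 16, 1, 0)                # 16 kernels, 16 x 1 x 1
        self.res = nn.Sequential(bn_conv(16, 64, 3, 1),  # 64 kernels, 16 x 3 x 3
                                 bn_conv(64, 32, 1, 0),  # 32 kernels, 64 x 1 x 1
                                 bn_conv(32, 16, 3, 1))  # 16 kernels, 32 x 3 x 3
        self.act = nn.Tanh()

    def forward(self, x):
        x = self.head(x)
        return self.act(x + self.res(x))                 # residual sum, then Tanh

# 2c4): cascade several identical blocks to expand network capacity.
blocks = nn.Sequential(*[SpaceTimeBlock() for _ in range(3)])
```

Since every block maps a 16-channel tensor to a 16-channel tensor of the same spatial size, any number of blocks can be chained, as step 2c4) requires.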
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810777563.XA CN108923984B (en) | 2018-07-16 | 2018-07-16 | Space-time video compressed sensing method based on convolutional network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108923984A (en) | 2018-11-30 |
CN108923984B (en) | 2021-01-12 |
Family
ID=64411851
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810777563.XA Active CN108923984B (en) | 2018-07-16 | 2018-07-16 | Space-time video compressed sensing method based on convolutional network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108923984B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1938590A2 (en) * | 2005-10-17 | 2008-07-02 | QUALCOMM Incorporated | Method and apparatus for spatio-temporal deinterlacing aided by motion compensation for field-based video |
CN106778854A (en) * | 2016-12-07 | 2017-05-31 | 西安电子科技大学 | Activity recognition method based on trajectory and convolutional neural network feature extraction |
CN106911930A (en) * | 2017-03-03 | 2017-06-30 | 深圳市唯特视科技有限公司 | Compressed sensing video reconstruction method based on a recurrent convolutional neural network |
Non-Patent Citations (2)
Title |
---|
AARON CHADHA et al.: "Compressed-domain video classification with deep neural networks: 'There's way too much information to decode the matrix'", 2017 IEEE International Conference on Image Processing (ICIP) * |
HOU JINGXUAN: "Compressed video post-processing based on convolutional neural networks", China Master's Theses Full-text Database * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113196718B (en) * | 2018-12-11 | 2023-07-14 | 瑞典爱立信有限公司 | Techniques for user plane quality of service analysis |
CN113196718A (en) * | 2018-12-11 | 2021-07-30 | 瑞典爱立信有限公司 | Techniques for user plane quality of service analysis |
CN109859120A (en) * | 2019-01-08 | 2019-06-07 | 北京交通大学 | Image defogging method based on multi-scale residual network |
CN109819256A (en) * | 2019-03-06 | 2019-05-28 | 西安电子科技大学 | Video compressed sensing method based on feature perception |
CN109819256B (en) * | 2019-03-06 | 2022-07-26 | 西安电子科技大学 | Video compression sensing method based on feature sensing |
CN110059823A (en) * | 2019-04-28 | 2019-07-26 | 中国科学技术大学 | Deep neural network model compression method and device |
CN110166779B (en) * | 2019-05-23 | 2021-06-08 | 西安电子科技大学 | Video compression method based on super-resolution reconstruction |
CN110166779A (en) * | 2019-05-23 | 2019-08-23 | 西安电子科技大学 | Video compression method based on super-resolution reconstruction |
CN110503609A (en) * | 2019-07-15 | 2019-11-26 | 电子科技大学 | Image rain removal method based on a hybrid sensing model |
CN112866763A (en) * | 2020-12-28 | 2021-05-28 | 网宿科技股份有限公司 | Sequence number generation method for HLS multi-bitrate stream slices, server and storage medium |
CN112866763B (en) * | 2020-12-28 | 2023-05-26 | 网宿科技股份有限公司 | Sequence number generation method for HLS multi-bitrate stream slices, server and storage medium |
CN117292209A (en) * | 2023-11-27 | 2023-12-26 | 之江实验室 | Video classification method and device based on space-time enhanced three-dimensional attention re-parameterization |
CN117292209B (en) * | 2023-11-27 | 2024-04-05 | 之江实验室 | Video classification method and device based on space-time enhanced three-dimensional attention re-parameterization |
Also Published As
Publication number | Publication date |
---|---|
CN108923984B (en) | 2021-01-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108923984A (en) | Space-time video compressed sensing method based on convolutional network | |
CN108550115B (en) | Image super-resolution reconstruction method | |
CN107133919A (en) | Time dimension video super-resolution method based on deep learning | |
CN110310227A (en) | Image super-resolution reconstruction method based on high- and low-frequency information decomposition | |
CN104574336B (en) | Super-resolution image reconstruction system based on adaptive submodular dictionary selection | |
Zhu et al. | Efficient single image super-resolution via hybrid residual feature learning with compact back-projection network | |
CN109697697B (en) | Reconstruction method of spectral imaging system based on optimization heuristic neural network | |
CN109360152A (en) | 3D medical image super-resolution reconstruction method based on dense convolutional neural networks | |
US20200134887A1 (en) | Reinforcement Learning for Online Sampling Trajectory Optimization for Magnetic Resonance Imaging | |
CN113205595B (en) | Construction method and application of 3D human body posture estimation model | |
CN107123094B (en) | Video denoising method mixing Poisson, Gaussian and impulse noise | |
CN106097253B (en) | Single-image super-resolution reconstruction method based on block rotation and clarity | |
KR20200140713A (en) | Method and apparatus for training neural network model for enhancing image detail | |
CN109741407A (en) | High-quality reconstruction method for a spectral imaging system based on convolutional neural networks | |
CN109472743A (en) | Super-resolution reconstruction method for remote sensing images | |
Sankisa et al. | Video error concealment using deep neural networks | |
CN114841856A (en) | Image super-pixel reconstruction method of dense connection network based on depth residual channel space attention | |
CN110378975A (en) | A kind of compressed encoding aperture imaging method and system based on deep neural network | |
CN109949217A (en) | Video super-resolution reconstruction method based on residual learning and implicit motion compensation | |
CN108289175A (en) | Low-latency virtual reality display method and display system | |
CN111510739A (en) | Video transmission method and device | |
CN111951203A (en) | Viewpoint synthesis method, apparatus, device and computer readable storage medium | |
CN109819256B (en) | Video compression sensing method based on feature sensing | |
CN111882512B (en) | Image fusion method, device and equipment based on deep learning and storage medium | |
CN109272450A (en) | Image super-resolution method based on convolutional neural networks | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||