CN112929664A - Interpretable video compressed sensing reconstruction method - Google Patents
Interpretable video compressed sensing reconstruction method
- Publication number
- CN112929664A (application CN202110082588.XA)
- Authority
- CN
- China
- Prior art keywords
- reconstruction
- layer
- module
- motion prediction
- compressed sensing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/172—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/002—Image coding using neural networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention belongs to the technical field of video compressed sensing reconstruction, and specifically relates to an interpretable video compressed sensing reconstruction method. The method simulates the iterative process of a traditional algorithm by constructing a video compressed sensing reconstruction neural network, mapping the traditional iterative optimization algorithm onto a feed-forward inference neural network. The network consists of a preliminary reconstruction module, a motion prediction module, and a residual reconstruction module, cascaded in sequence. The preliminary reconstruction module performs an initial reconstruction of the compressively sampled signal; the motion prediction module performs multi-hypothesis motion prediction using previously reconstructed adjacent frames stored in a buffer as reference frames; and the residual reconstruction module reconstructs the difference between the resampled prediction and the network's input measurement to obtain a residual reconstruction result. The invention effectively improves the reconstruction quality of video compressed sensing and reduces the reconstruction time, thereby meeting the real-time reconstruction requirements of video signals.
Description
Technical Field
The invention belongs to the technical field of video compressed sensing reconstruction, and particularly relates to an interpretable video compressed sensing reconstruction method.
Background
Conventional image or video compression methods, such as JPEG or H.265, compress a signal after sampling it. The compressed sensing theory proposed by Candès, Tao, and Donoho in 2006 allows sensing and compression of a signal to be performed simultaneously: only part of the signal is sampled, and the original signal is restored by a reconstruction algorithm. The theory shows that if the original signal is sparse in some transform domain, it can be compressively sampled at a rate lower than that required by the Nyquist sampling theorem and then restored by a suitable reconstruction algorithm. Although traditional compression methods currently achieve higher compression rates and reconstruction quality, compressed sensing's ability to sample and compress simultaneously gives it great application value in specific fields, such as medical image processing and high-speed photography.
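As a quick illustration of the sampling model just described, the compressed sensing encoder reduces to a single matrix multiply. The sketch below uses a random Gaussian matrix and illustrative sizes (a 16x16 block, 4x compression ratio); these names and values are stand-ins, not prescribed by the patent:

```python
import numpy as np

# Compressive sampling of one image block: sensing and compression happen
# in a single linear measurement. Sizes and the Gaussian matrix are
# illustrative stand-ins.
rng = np.random.default_rng(0)

N = 256                                          # original block length (16x16)
M = 64                                           # measurements at 4x compression

Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # random measurement matrix
x = rng.standard_normal(N)                       # flattened image block
y = Phi @ x                                      # compressed measurement, length 64
```

Recovering x from y is the ill-posed inverse problem that the reconstruction algorithms discussed below address; it is solvable only under sparsity or learned priors.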
After the compressed sensing theory was proposed, researchers developed a variety of reconstruction algorithms to improve reconstruction quality. Traditional algorithms usually perform reconstruction by iterative optimization, relying on hand-designed prior conditions and transform domains to optimize reconstruction quality. These iterative methods have the advantages of a clear algorithmic rationale and flexibility in the processed signal size, but they also suffer from high computational complexity, excessive computation time, and inaccurate hand-designed priors and transform domains.
Thanks to the strong learning ability of neural networks, neural-network-based methods have also been widely applied to compressed sensing reconstruction. These methods typically use a feed-forward inference network structure and a data-driven training method, greatly reducing computation time while improving reconstruction quality. However, the structures of these networks are often established from their designers' experience; the algorithmic rationale is unclear, so targeted optimization cannot be performed.
Therefore, combining the two, i.e. mapping the traditional iterative algorithm onto a neural network by algorithm unrolling, can strike a good balance between them.
Disclosure of Invention
The invention aims to provide an interpretable video compressed sensing reconstruction method, so as to effectively reduce the processing time of a video compressed sensing reconstruction task and improve the reconstruction quality.
The interpretable video compressed sensing reconstruction method provided by the invention simulates the iterative process of the traditional algorithm by constructing a video compressed sensing reconstruction neural network, mapping the traditional iterative optimization algorithm onto a feed-forward inference neural network. The network consists of a preliminary reconstruction module, a motion prediction module, and a residual reconstruction module, cascaded in sequence. The specific reconstruction steps for the input signal are as follows:
(1) First, construct the preliminary reconstruction module, which performs a preliminary reconstruction of the compressively sampled signal. The preliminary reconstruction module consists of a fully connected layer plus 5 convolutional layers with activation layers (module parameters are listed in Table 1-1 below); the fully connected layer performs the signal size transformation, and the 5 convolutional layers with activation layers perform feature extraction. To preserve the information obtained during reconstruction, the input and output size of each convolutional layer is identical to that of the original signal, and the output signal length of the preliminary reconstruction module is consistent with the original signal.
(2) Then, construct the motion prediction module, which uses the previously reconstructed adjacent frame stored in the buffer as the reference frame and performs multi-hypothesis motion prediction. The motion prediction module consists of a fully connected layer (module parameters are listed in Table 1-2 below); its prediction process is described by the following formula:
$$\hat{x}_{t,i} = \omega_{t,i} H_{t,i}$$

where the subscripts t and i denote the frame index in the video sequence and the image-block index within the frame; $x_{t,i}$ and $\hat{x}_{t,i}$ are the i-th image block of the t-th frame and its corresponding multi-hypothesis motion prediction block; $\omega_{t,i}$ is a matrix of K candidate reference blocks from the reference frame, of dimension $B^2 \times K$, where B is the reference-block size and K is the number of reference blocks searched in the reference frame; $H_{t,i}$ denotes the linear-combination coefficients of the K reference blocks, reflecting their contributions to the final motion prediction. $H_{t,i}$ is learned during training, which ensures the best prediction effect.
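Numerically, the multi-hypothesis prediction described above is a single matrix-vector product. A minimal numpy sketch, with random stand-ins for the learned quantities (in the network, H would come from the trained fully connected layer):

```python
import numpy as np

# Multi-hypothesis motion prediction as a linear combination of K candidate
# reference blocks. omega's columns and H are random stand-ins here, not
# trained values.
rng = np.random.default_rng(1)

B, K = 16, 4                               # block size and number of hypotheses
omega = rng.standard_normal((B * B, K))    # K flattened reference blocks (B^2 x K)
H = rng.standard_normal(K)                 # linear-combination coefficients

x_hat = omega @ H                          # prediction, length B^2
x_hat_block = x_hat.reshape(B, B)          # back to a 16x16 block
```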
(3) Then, construct the residual reconstruction module: the motion prediction result is compressively sampled again, and the difference between this resampled measurement and the network's input measurement is fed to the residual reconstruction module to further improve the prediction. Reconstructing this residual improves the reconstruction quality and also benefits the training of the deep neural network. The residual reconstruction network consists of a fully connected layer plus 5 convolutional layers with activation layers (module parameters are listed in Table 1-3 below); the fully connected layer performs the signal size transformation, and the convolutional layers with activation layers perform feature extraction. To preserve the information obtained during reconstruction, the input and output size of each convolutional layer is identical to that of the original signal.
(4) Finally, the output of the residual reconstruction module is added to the motion prediction output, and the sum is averaged with the preliminary reconstruction result to obtain the final output.
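Steps (3) and (4) reduce to simple arithmetic once the module outputs are available. A hedged sketch with random stand-ins; the back-projection `Phi.T @ y_resid / M` merely stands in for the residual network's output and is not the patented module:

```python
import numpy as np

# Residual measurement and final combination: the motion prediction is
# resampled with the same matrix, subtracted from the input measurement,
# "reconstructed" (placeholder), added to the prediction, and averaged
# with the preliminary reconstruction (weight 0.5 each).
rng = np.random.default_rng(2)
N, M = 256, 64

Phi = rng.standard_normal((M, N))
y = rng.standard_normal(M)            # network input measurement
x_prelim = rng.standard_normal(N)     # preliminary reconstruction output
x_motion = rng.standard_normal(N)     # motion prediction output

y_resid = y - Phi @ x_motion          # residual in the measurement domain
x_resid = Phi.T @ y_resid / M         # stand-in for the residual network output

x_out = 0.5 * x_prelim + 0.5 * (x_motion + x_resid)
```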
The invention improves the quality of video compressed sensing reconstruction through multi-hypothesis motion estimation and residual reconstruction; the feed-forward inference network structure keeps the reconstruction time short, and the algorithm unrolling approach yields a network structure that can be explicitly optimized for motion prediction.
Table 1-1: primary reconstruction Module construction parameters (cr configurable compression ratio parameter)
Tables 1-2: multi-reference motion prediction module structure parameter
Name of structure | Structural parameters |
---|---|
Fully connected layer | Input: 1024; Output: 256 |
Table 1-3: Residual reconstruction module structure parameters (cr is a configurable compression ratio parameter)
Drawings
FIG. 1 is a schematic view of the overall process of the present invention.
Fig. 2 is a schematic diagram of the "preliminary reconstruction module".
Fig. 3 is a schematic diagram of the residual reconstruction module.
Detailed Description
The invention will be further elucidated with reference to Fig. 1.
The input of the present invention is the signal y obtained by compressed sensing sampling of one frame of the original video signal. After the network parameter weights are read from the model file, the input signal y is fed into the network for inference: the first-stage module reconstructs the signal y and outputs the reconstruction result x_out1; x_out1 is then resampled, using the same compressed sensing sampling matrix as was used to sample the original signal, and the resampled signal y_1 serves as the input to the second-stage module. This is repeated N times, where N is the number of modules in the network. The final output x_out is the reconstruction result of the network.
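The cascade just described can be sketched as a loop in which each stage's output is resampled with the same matrix to feed the next stage. The per-stage reconstruction below is a pseudo-inverse placeholder, not the trained three-module stage; sizes match the 4x-compression example that follows:

```python
import numpy as np

# N-stage cascade: each stage reconstructs from a measurement, and the
# output is resampled with the same matrix Phi to form the next stage's
# input. `Phi_pinv @ y` is a placeholder for the stage's reconstruction.
rng = np.random.default_rng(3)
N_signal, M, N_stages = 256, 64, 3

Phi = rng.standard_normal((M, N_signal))
Phi_pinv = np.linalg.pinv(Phi)        # placeholder reconstruction operator

y = rng.standard_normal(M)            # initial compressed measurement
for _ in range(N_stages):
    x_out = Phi_pinv @ y              # stage reconstruction (placeholder)
    y = Phi @ x_out                   # resample with the same matrix
```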
The model takes as input the compressed sensing measurement of a 16x16 image block. Taking a 4x compression ratio as an example, after measurement by the compressed sensing matrix, a 16x16 image block is compressed into a signal of length 64 and fed into the reconstruction network. In each module, the specific calculation steps for reconstructing the signal are as follows.
(1) First, the length-64 input signal is reconstructed by the preliminary reconstruction module. The module expands the input signal to a length-256 signal through a fully connected layer and reshapes it into a 16x16 two-dimensional signal. The reshaped signal then passes through 5 convolutional layers with activation layers for feature extraction, and finally a 16x16 preliminary reconstruction is output.
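The shape flow of step (1) can be checked in a few lines; the fully connected weights below are random stand-ins for the trained parameters, and the five size-preserving convolutions are elided:

```python
import numpy as np

# Preliminary reconstruction shape flow: length-64 measurement -> fully
# connected layer -> length 256 -> reshape to 16x16. Weights are random
# stand-ins, not trained parameters.
rng = np.random.default_rng(5)

y = rng.standard_normal(64)              # compressed input at 4x ratio
W_fc = rng.standard_normal((256, 64))    # stand-in FC weights
x = (W_fc @ y).reshape(16, 16)           # FC output reshaped to 2-D
# ...followed by 5 conv layers whose padding preserves the 16x16 size...
```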
(2) Then, the motion prediction module reads the adjacent frame that has already been reconstructed from the buffer as the reference frame. Based on the position of the current block to be reconstructed, a 32x32 region centered on the corresponding position in the reference frame is selected as the reference block and fed into the motion prediction module. The module performs multi-hypothesis motion prediction through a fully connected layer: the 32x32 reference block is divided into 4 sub-reference blocks of 16x16, which are linearly combined to form the prediction, and finally a 16x16 motion prediction value is output.
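Step (2)'s handling of the search window amounts to slicing and a weighted sum. A sketch with illustrative fixed weights (in the network these coefficients are produced by the learned fully connected layer):

```python
import numpy as np

# Split a 32x32 reference region into four 16x16 sub-blocks and combine
# them linearly into a motion prediction. Weights are illustrative, not
# learned values.
rng = np.random.default_rng(4)

ref = rng.standard_normal((32, 32))                         # reference region
subs = [ref[r:r + 16, c:c + 16] for r in (0, 16) for c in (0, 16)]

weights = np.array([0.4, 0.3, 0.2, 0.1])                    # stand-in coefficients
pred = sum(w * s for w, s in zip(weights, subs))            # 16x16 prediction
```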
(3) Then, the motion prediction value is resampled using the same matrix as the original compressed sensing measurement matrix, producing a resampled signal of length 64, which is subtracted from the original input signal to obtain the residual signal. The residual signal is fed to the residual reconstruction module, which expands it to a length-256 signal through a fully connected layer and reshapes it into a 16x16 two-dimensional signal. Feature extraction is then performed by 5 convolutional layers with activation functions, and finally a 16x16 residual reconstruction is output.
(4) Finally, the output of the residual reconstruction module is added to the output of the motion prediction module to obtain an optimized prediction of size 16x16. This result is averaged with the output of the preliminary reconstruction module, i.e. each is weighted by 0.5, to obtain the module's final output of size 16x16.
The final output of the method is the reconstructed image block corresponding to the compressed sensing measurement of the original image block. Experiments show that the method completes the reconstruction task effectively in a short time, and the final result has good image quality.
Claims (4)
1. An interpretable video compressed sensing reconstruction method, characterized in that the iterative process of a traditional algorithm is simulated by constructing a video compressed sensing reconstruction neural network, mapping the traditional iterative optimization algorithm onto a feed-forward inference neural network; the video compressed sensing reconstruction neural network consists of a preliminary reconstruction module, a motion prediction module, and a residual reconstruction module, cascaded in sequence; the specific reconstruction steps for the input signal are as follows:
(1) first, construct the preliminary reconstruction module, which performs a preliminary reconstruction of the compressively sampled signal; the preliminary reconstruction module consists of a fully connected layer plus 5 convolutional layers with activation layers; the fully connected layer performs the signal size transformation, and the 5 convolutional layers with activation layers perform feature extraction; to preserve the information obtained during reconstruction, the input and output size of each convolutional layer is identical to that of the original signal, and the output signal length of the preliminary reconstruction module is consistent with the original signal;
(2) then, construct the motion prediction module, which uses the previously reconstructed adjacent frame stored in the buffer as the reference frame and performs multi-hypothesis motion prediction; the motion prediction module consists of a fully connected layer; its prediction process is described by the following formula:
$$\hat{x}_{t,i} = \omega_{t,i} H_{t,i}$$

where the subscripts t and i denote the frame index in the video sequence and the image-block index within the frame; $x_{t,i}$ and $\hat{x}_{t,i}$ are the i-th image block of the t-th frame and its corresponding multi-hypothesis motion prediction block; $\omega_{t,i}$ is a matrix of K candidate reference blocks from the reference frame, of dimension $B^2 \times K$, where B is the reference-block size and K is the number of reference blocks searched in the reference frame; $H_{t,i}$ denotes the linear-combination coefficients of the K reference blocks, reflecting their contributions to the final motion prediction; $H_{t,i}$ is learned during training, which ensures the best prediction effect;
(3) then, construct the residual reconstruction module: the motion prediction result is compressively sampled again, and the difference between this resampled measurement and the network's input measurement is fed to the residual reconstruction module to further improve the prediction; reconstructing this residual improves the reconstruction quality and also benefits the training of the deep neural network; the residual reconstruction network consists of a fully connected layer plus 5 convolutional layers with activation layers, the fully connected layer performing the signal size transformation and the convolutional layers with activation layers performing feature extraction; to preserve the information obtained during reconstruction, the input and output size of each convolutional layer is identical to that of the original signal;
(4) finally, the output of the residual reconstruction module is added to the motion prediction value, and the sum is averaged with the preliminary reconstruction result to obtain the final result.
2. The method according to claim 1, wherein the preliminary reconstruction module consists of a fully connected layer plus 5 convolutional layers with activation layers, with the following structure parameters:
Fully connected layer: input: cr × 256; output: 256; cr is a configurable compression ratio parameter;
Convolutional layer 1: kernel size: 1x1; stride: 1; padding: 0; input channels: 1; output channels: 128; activation: ReLU;
Convolutional layer 2: kernel size: 1x1; stride: 1; padding: 0; input channels: 128; output channels: 64; activation: ReLU;
Convolutional layer 3: kernel size: 3x3; stride: 1; padding: 1; input channels: 64; output channels: 32; activation: ReLU;
Convolutional layer 4: kernel size: 3x3; stride: 1; padding: 1; input channels: 32; output channels: 16; activation: ReLU;
Convolutional layer 5: kernel size: 3x3; stride: 1; padding: 1; input channels: 16; output channels: 1.
3. The method according to claim 1, wherein the fully connected layer in the motion prediction module has the structure parameters: input: 1024; output: 256.
4. The method according to claim 1, wherein the residual reconstruction network consists of a fully connected layer plus 5 convolutional layers with activation layers, with the following structure parameters:
Fully connected layer: input: cr × 256; output: 256; cr is a configurable compression ratio parameter;
Convolutional layer 1: kernel size: 1x1; stride: 1; padding: 0; input channels: 1; output channels: 128; activation: ReLU;
Convolutional layer 2: kernel size: 1x1; stride: 1; padding: 0; input channels: 128; output channels: 64; activation: ReLU;
Convolutional layer 3: kernel size: 3x3; stride: 1; padding: 1; input channels: 64; output channels: 32;
Convolutional layer 4: kernel size: 3x3; stride: 1; padding: 1; input channels: 32; output channels: 16;
Convolutional layer 5: kernel size: 3x3; stride: 1; padding: 1; input channels: 16; output channels: 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110082588.XA CN112929664A (en) | 2021-01-21 | 2021-01-21 | Interpretable video compressed sensing reconstruction method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110082588.XA CN112929664A (en) | 2021-01-21 | 2021-01-21 | Interpretable video compressed sensing reconstruction method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112929664A true CN112929664A (en) | 2021-06-08 |
Family
ID=76165694
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110082588.XA Pending CN112929664A (en) | 2021-01-21 | 2021-01-21 | Interpretable video compressed sensing reconstruction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112929664A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113658282A (en) * | 2021-06-25 | 2021-11-16 | 陕西尚品信息科技有限公司 | Image compression and decompression method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107730451A (en) * | 2017-09-20 | 2018-02-23 | 中国科学院计算技术研究所 | A kind of compressed sensing method for reconstructing and system based on depth residual error network |
CN110933429A (en) * | 2019-11-13 | 2020-03-27 | 南京邮电大学 | Video compression sensing and reconstruction method and device based on deep neural network |
CN112116601A (en) * | 2020-08-18 | 2020-12-22 | 河南大学 | Compressive sensing sampling reconstruction method and system based on linear sampling network and generation countermeasure residual error network |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107730451A (en) * | 2017-09-20 | 2018-02-23 | 中国科学院计算技术研究所 | A kind of compressed sensing method for reconstructing and system based on depth residual error network |
CN110933429A (en) * | 2019-11-13 | 2020-03-27 | 南京邮电大学 | Video compression sensing and reconstruction method and device based on deep neural network |
CN112116601A (en) * | 2020-08-18 | 2020-12-22 | 河南大学 | Compressive sensing sampling reconstruction method and system based on linear sampling network and generation countermeasure residual error network |
Non-Patent Citations (2)
Title |
---|
Bowen Huang et al.: "CS-MCNet: A Video Compressive Sensing Reconstruction Network with Interpretable Motion Compensation", 《https://arxiv.org/pdf/2010.03780.pdf》 *
Tu Yunxuan et al.: "Global image compressed sensing reconstruction based on a multi-scale residual network", 《Industrial Control Computer (工业控制计算机)》 *
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113658282A (en) * | 2021-06-25 | 2021-11-16 | 陕西尚品信息科技有限公司 | Image compression and decompression method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108765296B (en) | Image super-resolution reconstruction method based on recursive residual attention network | |
CN106910161B (en) | Single image super-resolution reconstruction method based on deep convolutional neural network | |
CN111932461B (en) | Self-learning image super-resolution reconstruction method and system based on convolutional neural network | |
CN108900848B (en) | Video quality enhancement method based on self-adaptive separable convolution | |
CN110490832A (en) | A kind of MR image reconstruction method based on regularization depth image transcendental method | |
CN109584164B (en) | Medical image super-resolution three-dimensional reconstruction method based on two-dimensional image transfer learning | |
CN107730451A (en) | A kind of compressed sensing method for reconstructing and system based on depth residual error network | |
CN110677651A (en) | Video compression method | |
CN107967516A (en) | A kind of acceleration of neutral net based on trace norm constraint and compression method | |
CN111127325B (en) | Satellite video super-resolution reconstruction method and system based on cyclic neural network | |
CN113177882A (en) | Single-frame image super-resolution processing method based on diffusion model | |
CN112116601A (en) | Compressive sensing sampling reconstruction method and system based on linear sampling network and generation countermeasure residual error network | |
CN107590775B (en) | Image super-resolution amplification method using regression tree field | |
CN109949217B (en) | Video super-resolution reconstruction method based on residual learning and implicit motion compensation | |
CN107784628A (en) | A kind of super-resolution implementation method based on reconstruction optimization and deep neural network | |
CN111369433B (en) | Three-dimensional image super-resolution reconstruction method based on separable convolution and attention | |
CN113222812B (en) | Image reconstruction method based on information flow reinforced depth expansion network | |
CN108492249A (en) | Single frames super-resolution reconstruction method based on small convolution recurrent neural network | |
CN113674172A (en) | Image processing method, system, device and storage medium | |
CN115936985A (en) | Image super-resolution reconstruction method based on high-order degradation cycle generation countermeasure network | |
Hui et al. | Two-stage convolutional network for image super-resolution | |
JP2009049895A (en) | Data compressing method, image display method and display image enlarging method | |
CN112270646A (en) | Super-resolution enhancement method based on residual error dense jump network | |
CN112929664A (en) | Interpretable video compressed sensing reconstruction method | |
Jang et al. | Dual path denoising network for real photographic noise |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | | Application publication date: 20210608 |