CN112929664A - Interpretable video compressed sensing reconstruction method - Google Patents

Interpretable video compressed sensing reconstruction method

Info

Publication number
CN112929664A
CN112929664A
Authority
CN
China
Prior art keywords
reconstruction
layer
module
motion prediction
compressed sensing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110082588.XA
Other languages
Chinese (zh)
Inventor
Fan Yibo (范益波)
Huang Bowen (黄博文)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN202110082588.XA priority Critical patent/CN112929664A/en
Publication of CN112929664A publication Critical patent/CN112929664A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/002Image coding using neural networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention belongs to the technical field of video compressed sensing reconstruction, and specifically relates to an interpretable video compressed sensing reconstruction method. The method simulates the iterative process of traditional algorithms by constructing a video compressed sensing reconstruction neural network, mapping the traditional iterative optimization algorithm onto a feedforward inference neural network. The network comprises a preliminary reconstruction module, a motion prediction module and a residual reconstruction module cascaded in sequence. The preliminary reconstruction module performs an initial reconstruction of the signal obtained by compressed sensing sampling; the motion prediction module performs multi-hypothesis motion prediction using adjacent frames, already preliminarily reconstructed and stored in a buffer, as reference frames; and the residual reconstruction module reconstructs the difference between the resampled value of the prediction and the sampling value input to the network, yielding a residual reconstruction result. The invention effectively improves the reconstruction quality of video compressed sensing and reduces the time required for reconstruction, thereby meeting the real-time reconstruction requirements of video signals.

Description

Interpretable video compressed sensing reconstruction method
Technical Field
The invention belongs to the technical field of video compressed sensing reconstruction, and particularly relates to an interpretable video compressed sensing reconstruction method.
Background
Conventional image and video compression methods, such as JPEG or H.265, compress a signal after sampling it. The compressed sensing theory proposed by Candès, Tao and Donoho in 2006 allows sensing and compression of a signal to be performed simultaneously: only part of the signal is sampled, and the original signal is restored by a reconstruction algorithm. The theory shows that if the original signal is sparse in some transform domain, it can be compressively sampled at a rate lower than that required by the Nyquist sampling theorem and then recovered by a suitable reconstruction algorithm. Although traditional compression methods currently achieve higher compression rates and reconstruction quality, the ability of compressed sensing to perform sampling and compression simultaneously gives it great application value in specific fields such as medical image processing and high-speed photography.
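The sampling step described above can be illustrated with a short numerical sketch. Everything in it (the Gaussian measurement matrix, the block size, the 4-fold compression ratio) is an assumption chosen for illustration, not a specification from the invention:

```python
import numpy as np

# Minimal sketch of compressed sensing sampling: a length-256 signal
# block is measured with a random matrix Phi, yielding far fewer
# measurements than samples. Phi is assumed Gaussian here.
rng = np.random.default_rng(0)

n = 256   # original signal length (e.g. a flattened 16x16 image block)
m = 64    # number of measurements (4-fold compression)

x = rng.standard_normal(n)                      # original signal
Phi = rng.standard_normal((m, n)) / np.sqrt(m)  # measurement matrix
y = Phi @ x                                     # compressed measurements

print(y.shape)  # (64,)
```

With this setting a 256-sample block is reduced to 64 measurements, matching the 4-fold compression ratio used as the example in the detailed description.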
After compressed sensing theory was proposed, researchers developed various reconstruction algorithms to improve reconstruction quality. Traditional algorithms usually complete the reconstruction through iterative optimization, relying on manually designed prior conditions and transform domains to optimize reconstruction quality. These iterative optimization algorithms have clear algorithmic concepts and can flexibly handle different signal sizes, but they also suffer from high computational complexity, excessive computation time, and inaccurate hand-designed priors and transform domains.
Thanks to the strong learning ability of neural networks, neural-network-based methods have also been widely applied to the compressed sensing reconstruction task. These methods typically use a feedforward inference network structure and a data-driven training method, greatly reducing computation time while improving reconstruction quality. However, the structure of these networks is often established purely by designer experience; the underlying algorithmic concept is unclear, so targeted optimization cannot be performed.
Therefore, if the two are combined, that is, if the traditional iterative algorithm is mapped onto a neural network by algorithm unrolling, a good balance between the two can be achieved.
Disclosure of Invention
The invention aims to provide an interpretable video compressed sensing reconstruction method, so as to effectively reduce the processing time of a video compressed sensing reconstruction task and improve the reconstruction quality.
The interpretable video compressed sensing reconstruction method provided by the invention simulates the iterative process of traditional algorithms by constructing a video compressed sensing reconstruction neural network, mapping the traditional iterative optimization algorithm onto a feedforward inference neural network. The network structure comprises a preliminary reconstruction module, a motion prediction module and a residual reconstruction module cascaded in sequence. The specific reconstruction steps for an input signal are as follows:
(1) First, a preliminary reconstruction module is constructed to perform a preliminary reconstruction of the signal obtained by compressed sensing sampling. The preliminary reconstruction module comprises a fully connected layer, 5 convolutional layers and activation layers (module parameters are given in Table 1-1 below). The fully connected layer performs the signal size transformation, and the 5 convolutional layers with activation layers perform feature extraction. To retain the information obtained during reconstruction, the input and output sizes of each convolutional layer are kept consistent with the original signal, and the output signal length of the preliminary reconstruction module is consistent with the original signal length.
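As a rough illustration of the data flow in step (1), the following sketch uses random (untrained) weights and a single-channel 3x3 filter per stage; the actual module uses the learned multi-channel layers of Table 1-1:

```python
import numpy as np

rng = np.random.default_rng(1)

def filter3x3_same(img, kernel):
    """Apply a 3x3 filter with zero padding, preserving the spatial size."""
    padded = np.pad(img, 1)
    out = np.zeros_like(img)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            out[r, c] = np.sum(padded[r:r + 3, c:c + 3] * kernel)
    return out

y = rng.standard_normal(64)             # compressed input (4-fold compression)
W_fc = rng.standard_normal((256, 64))   # fully connected layer: 64 -> 256
x0 = (W_fc @ y).reshape(16, 16)         # length restored, reshaped to 16x16

feat = x0
for _ in range(5):                      # 5 filtering stages, sizes preserved
    feat = np.maximum(filter3x3_same(feat, rng.standard_normal((3, 3))), 0.0)

print(feat.shape)  # (16, 16)
```

Note how every stage keeps the 16x16 size, mirroring the requirement that each convolutional layer's input and output sizes stay consistent with the original signal.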
(2) Next, a motion prediction module is constructed; adjacent frames that have completed preliminary reconstruction and are stored in the buffer are used as reference frames, and multi-hypothesis motion prediction is performed by the motion prediction module. The structure of the motion prediction module comprises a fully connected layer (module parameters are given in Table 1-2 below). The prediction process of the motion prediction module is described by the following formulas:
\hat{x}_{t,i} = \omega_{t,i} H_{t,i}

H_{t,i} = \arg\min_{H} \left\| x_{t,i} - \omega_{t,i} H \right\|_2^2

where the subscripts t and i denote the frame index in the video sequence and the image-block index within the frame; x_{t,i} and \hat{x}_{t,i} are the i-th image block of the t-th frame and its corresponding multi-reference prediction block; \omega_{t,i} is a matrix of dimension B²×K containing the K candidate reference blocks in the reference frame, B being the size of a reference block and K the number of reference blocks searched in the reference frame; H_{t,i} is the vector of coefficients of the linear combination of the K reference blocks, reflecting their contribution to the final motion prediction value; H_{t,i} is obtained through training, which ensures the best effect.
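The linear combination performed by the motion prediction module can be illustrated with a closed-form least-squares solve. This is only an analogy: in the invention the coefficients H_{t,i} are produced by a trained fully connected layer, not computed in closed form, and all data below are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)

B, K = 16, 4                              # reference block size, number of hypotheses
omega = rng.standard_normal((B * B, K))   # K candidate reference blocks, column-stacked
true_H = np.array([0.5, 0.2, 0.2, 0.1])
x = omega @ true_H + 0.01 * rng.standard_normal(B * B)   # block to be predicted

# Closed-form least-squares coefficients: an illustrative stand-in for
# the trained fully connected layer that produces H in the invention.
H, *_ = np.linalg.lstsq(omega, x, rcond=None)
x_hat = omega @ H                         # multi-hypothesis motion prediction

print(x_hat.shape)  # (256,)
```

The least-squares solution is the minimizer of the second formula; training the fully connected layer end to end lets the network approximate this combination while adapting it to the reconstruction loss.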
(3) Then, a residual reconstruction module is constructed; the motion prediction result is compressively sampled again and input to the residual reconstruction module to further improve the prediction. The residual reconstruction module reconstructs the difference between this resampled value and the sampling value input to the network, yielding a residual reconstruction result; this improves the reconstruction quality and benefits the training of the deep neural network. The residual reconstruction network comprises a fully connected layer that performs the signal size transformation, and 5 convolutional layers with activation layers that perform feature extraction (module parameters are given in Table 1-3 below). To retain the information obtained during reconstruction, the input and output sizes of each convolutional layer are kept consistent with the original signal.
(4) Finally, the output of the residual reconstruction module is added to the motion prediction output, and the sum is averaged with the preliminary reconstruction result to obtain the final output.
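The combination in step (4) is plain arithmetic; a sketch with random placeholder arrays standing in for the three module outputs:

```python
import numpy as np

rng = np.random.default_rng(3)

x_pre = rng.standard_normal((16, 16))   # preliminary reconstruction result
x_mc = rng.standard_normal((16, 16))    # motion prediction output
x_res = rng.standard_normal((16, 16))   # residual reconstruction output

# Residual output + motion prediction gives the refined prediction;
# averaging it with the preliminary result (weights 0.5 / 0.5)
# gives the module's final output.
x_out = 0.5 * (x_mc + x_res) + 0.5 * x_pre

print(x_out.shape)  # (16, 16)
```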
The invention improves the quality of video compressed sensing reconstruction through multi-hypothesis motion estimation and residual reconstruction; the feedforward inference network structure keeps the reconstruction time short, and the algorithm unrolling method allows the network structure to be optimized specifically for motion prediction.
Table 1-1: Preliminary reconstruction module structure parameters (cr is a configurable compression ratio parameter)
Fully connected layer: input: cr × 256; output: 256
Convolutional layer 1: kernel size 1x1, stride 1, padding 0, input channels 1, output channels 128; activation: ReLU
Convolutional layer 2: kernel size 1x1, stride 1, padding 0, input channels 128, output channels 64; activation: ReLU
Convolutional layer 3: kernel size 3x3, stride 1, padding 1, input channels 64, output channels 32; activation: ReLU
Convolutional layer 4: kernel size 3x3, stride 1, padding 1, input channels 32, output channels 16; activation: ReLU
Convolutional layer 5: kernel size 3x3, stride 1, padding 1, input channels 16, output channels 1
Table 1-2: Multi-hypothesis motion prediction module structure parameters
Fully connected layer: input: 1024; output: 256
Table 1-3: Residual reconstruction module structure parameters (cr is a configurable compression ratio parameter)
Fully connected layer: input: cr × 256; output: 256
Convolutional layer 1: kernel size 1x1, stride 1, padding 0, input channels 1, output channels 128; activation: ReLU
Convolutional layer 2: kernel size 1x1, stride 1, padding 0, input channels 128, output channels 64; activation: ReLU
Convolutional layer 3: kernel size 3x3, stride 1, padding 1, input channels 64, output channels 32
Convolutional layer 4: kernel size 3x3, stride 1, padding 1, input channels 32, output channels 16
Convolutional layer 5: kernel size 3x3, stride 1, padding 1, input channels 16, output channels 1
Drawings
FIG. 1 is a schematic view of the overall process of the present invention.
Fig. 2 is a schematic diagram of the "preliminary reconstruction module".
Fig. 3 is a schematic diagram of the residual error reconstruction module.
Detailed Description
The invention will be further elucidated with reference to FIG. 1.
The input of the present invention is the signal y obtained by compressed sensing sampling of one frame of the original video signal. After the network parameter weights are read from the model file, the input signal y is fed into the network for inference. The signal y is reconstructed by the first-stage module, which outputs the reconstruction result x_out1. Then x_out1 is resampled, with a compressed sensing sampling matrix identical to the matrix used to sample the original signal, and the resampled signal y_1 serves as the input signal of the second-stage module. This process is repeated N times, where N is the number of modules in the network. The final output x_out is the reconstruction result of the network.
The model takes as input the compressed sensing signal of a 16x16 image block. Taking a 4-fold compression ratio as an example, after measurement by the compressed sensing matrix, a 16x16 image block is compressed into a signal of length 64 and input into the reconstruction network. In each module, the specific calculation steps for reconstructing the signal are as follows.
(1) First, the input signal of length 64 is preliminarily reconstructed by the preliminary reconstruction module. The module expands the input signal into a length-256 signal through a fully connected layer and reshapes it into a 16x16 two-dimensional signal. The reshaped signal passes through 5 convolutional layers and activation layers to extract features, and finally a 16x16 preliminary reconstruction signal is output.
(2) Next, the motion prediction module reads the adjacent frame that has completed reconstruction from the buffer as a reference frame. According to the position of the current block to be reconstructed, a 32x32 region centered on the corresponding position in the reference frame is selected as the reference block and input to the motion prediction module. The module performs multi-hypothesis motion prediction through a fully connected layer: the 32x32 reference block is divided into 4 sub-reference blocks of 16x16, these 4 sub-reference blocks are linearly combined into a motion prediction value, and finally a 16x16 motion prediction value is output.
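The split-and-combine operation of step (2) can be sketched as follows; the reference region and the combination coefficients are random placeholders (in the invention the coefficients come from the fully connected layer):

```python
import numpy as np

rng = np.random.default_rng(4)

ref = rng.standard_normal((32, 32))     # 32x32 reference region around the block

# Split into four 16x16 sub-reference blocks (the K = 4 hypotheses).
subs = [ref[r:r + 16, c:c + 16] for r in (0, 16) for c in (0, 16)]

# Linear combination with placeholder coefficients standing in for the
# learned fully connected layer output.
H = np.array([0.4, 0.3, 0.2, 0.1])
pred = sum(h * s for h, s in zip(H, subs))   # 16x16 motion prediction value

print(pred.shape)  # (16, 16)
```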
(3) Then, the motion prediction value is resampled with the same matrix as the original compressed sensing measurement matrix, generating a resampled signal of length 64, which is subtracted from the original input signal to obtain the residual signal. The residual signal is sent to the residual reconstruction module, which expands it into a length-256 signal through a fully connected layer and reshapes it into a 16x16 two-dimensional signal. Then 5 convolutional layers and activation functions extract features, and finally a 16x16 residual reconstruction signal is output.
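The residual computation at the start of step (3) amounts to resampling the prediction and subtracting; a sketch with a placeholder Gaussian measurement matrix and random signals:

```python
import numpy as np

rng = np.random.default_rng(5)

Phi = rng.standard_normal((64, 256))    # same matrix as used at the encoder
x_orig = rng.standard_normal(256)       # original 16x16 block, flattened
pred = rng.standard_normal(256)         # motion prediction value, flattened

y = Phi @ x_orig                        # original compressed input signal
y_pred = Phi @ pred                     # resampled motion prediction (length 64)
y_res = y - y_pred                      # residual signal fed to the residual network

print(y_res.shape)  # (64,)
```

Working on the residual in the measurement domain is what lets the residual network correct only what the prediction missed, rather than re-reconstructing the whole block.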
(4) Finally, the output of the residual reconstruction module is added to the output of the motion prediction module to obtain an optimized prediction value of size 16x16. This result is averaged with the output of the preliminary reconstruction module (i.e., the two are weighted by 0.5 each) to obtain the final output value of the module, of size 16x16.
The final output of the method is the reconstructed image block recovered from the compressed sensing signal of the original image block. Experiments show that the method can effectively complete the reconstruction task in a short time, and the final result has good image quality.

Claims (4)

1. An interpretable video compressed sensing reconstruction method, characterized in that the iterative process of traditional algorithms is simulated by constructing a video compressed sensing reconstruction neural network, mapping the traditional iterative optimization algorithm onto a feedforward inference neural network; the video compressed sensing reconstruction neural network comprises a preliminary reconstruction module, a motion prediction module and a residual reconstruction module cascaded in sequence; the specific reconstruction steps for an input signal are as follows:
(1) first, a preliminary reconstruction module is constructed to perform a preliminary reconstruction of the signal obtained by compressed sensing sampling; the preliminary reconstruction module comprises a fully connected layer, 5 convolutional layers and activation layers; the fully connected layer performs the signal size transformation, and the 5 convolutional layers with activation layers perform feature extraction; to retain the information obtained during reconstruction, the input and output sizes of each convolutional layer are kept consistent with the original signal; the output signal length of the preliminary reconstruction module is consistent with the original signal length;
(2) next, a motion prediction module is constructed; adjacent frames that have completed preliminary reconstruction and are stored in the buffer are used as reference frames, and multi-hypothesis motion prediction is performed by the motion prediction module; the structure of the motion prediction module comprises a fully connected layer; the prediction process of the motion prediction module is described by the following formulas:
\hat{x}_{t,i} = \omega_{t,i} H_{t,i}

H_{t,i} = \arg\min_{H} \left\| x_{t,i} - \omega_{t,i} H \right\|_2^2

where the subscripts t and i denote the frame index in the video sequence and the image-block index within the frame; x_{t,i} and \hat{x}_{t,i} are the i-th image block of the t-th frame and its corresponding multi-reference motion prediction block; \omega_{t,i} is a matrix of dimension B²×K containing the K candidate reference blocks in the reference frame, B being the size of a reference block and K the number of reference blocks searched in the reference frame; H_{t,i} is the vector of coefficients of the linear combination of the K reference blocks, reflecting their contribution to the final motion prediction value; H_{t,i} is obtained through training, which ensures the best effect;
(3) then, a residual reconstruction module is constructed; the motion prediction result is compressively sampled again and input to the residual reconstruction module to further improve the prediction; the residual reconstruction module reconstructs the difference between this resampled value and the sampling value input to the network, yielding a residual reconstruction result, which improves the reconstruction quality and benefits the training of the deep neural network; the residual reconstruction network comprises a fully connected layer that performs the signal size transformation, and 5 convolutional layers with activation layers that perform feature extraction; to retain the information obtained during reconstruction, the input and output sizes of each convolutional layer are kept consistent with the original signal;
(4) finally, the output of the residual reconstruction module is added to the motion prediction value, and the sum is averaged with the preliminary reconstruction result to obtain the final result.
2. The method according to claim 1, wherein the preliminary reconstruction module comprises a fully connected layer, 5 convolutional layers and activation layers, with the following structure parameters:
Fully connected layer: input: cr × 256; output: 256; cr is a configurable compression ratio parameter;
Convolutional layer 1: kernel size 1x1, stride 1, padding 0, input channels 1, output channels 128; activation function: ReLU;
Convolutional layer 2: kernel size 1x1, stride 1, padding 0, input channels 128, output channels 64; activation function: ReLU;
Convolutional layer 3: kernel size 3x3, stride 1, padding 1, input channels 64, output channels 32; activation function: ReLU;
Convolutional layer 4: kernel size 3x3, stride 1, padding 1, input channels 32, output channels 16; activation function: ReLU;
Convolutional layer 5: kernel size 3x3, stride 1, padding 1, input channels 16, output channels 1.
3. The method of claim 1, wherein the structure parameters of the fully connected layer in the motion prediction module are: input: 1024; output: 256.
4. The method of claim 1, wherein the residual reconstruction network comprises a fully connected layer, 5 convolutional layers and activation layers, with the following structure parameters:
Fully connected layer: input: cr × 256; output: 256; cr is a configurable compression ratio parameter;
Convolutional layer 1: kernel size 1x1, stride 1, padding 0, input channels 1, output channels 128; activation function: ReLU;
Convolutional layer 2: kernel size 1x1, stride 1, padding 0, input channels 128, output channels 64; activation function: ReLU;
Convolutional layer 3: kernel size 3x3, stride 1, padding 1, input channels 64, output channels 32;
Convolutional layer 4: kernel size 3x3, stride 1, padding 1, input channels 32, output channels 16;
Convolutional layer 5: kernel size 3x3, stride 1, padding 1, input channels 16, output channels 1.
CN202110082588.XA 2021-01-21 2021-01-21 Interpretable video compressed sensing reconstruction method Pending CN112929664A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110082588.XA CN112929664A (en) 2021-01-21 2021-01-21 Interpretable video compressed sensing reconstruction method


Publications (1)

Publication Number Publication Date
CN112929664A true CN112929664A (en) 2021-06-08

Family

ID=76165694

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110082588.XA Pending CN112929664A (en) 2021-01-21 2021-01-21 Interpretable video compressed sensing reconstruction method

Country Status (1)

Country Link
CN (1) CN112929664A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113658282A (en) * 2021-06-25 2021-11-16 陕西尚品信息科技有限公司 Image compression and decompression method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107730451A (en) * 2017-09-20 2018-02-23 中国科学院计算技术研究所 A kind of compressed sensing method for reconstructing and system based on depth residual error network
CN110933429A (en) * 2019-11-13 2020-03-27 南京邮电大学 Video compression sensing and reconstruction method and device based on deep neural network
CN112116601A (en) * 2020-08-18 2020-12-22 河南大学 Compressive sensing sampling reconstruction method and system based on linear sampling network and generation countermeasure residual error network


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Bowen Huang et al.: "CS-MCNet: A Video Compressive Sensing Reconstruction Network with Interpretable Motion Compensation", https://arxiv.org/pdf/2010.03780.pdf *
Tu Yunxuan et al.: "Global image compressed sensing reconstruction based on a multi-scale residual network", Industrial Control Computer (《工业控制计算机》) *


Similar Documents

Publication Publication Date Title
CN108765296B (en) Image super-resolution reconstruction method based on recursive residual attention network
CN106910161B (en) Single image super-resolution reconstruction method based on deep convolutional neural network
CN111932461B (en) Self-learning image super-resolution reconstruction method and system based on convolutional neural network
CN108900848B (en) Video quality enhancement method based on self-adaptive separable convolution
CN110490832A (en) A kind of MR image reconstruction method based on regularization depth image transcendental method
CN109584164B (en) Medical image super-resolution three-dimensional reconstruction method based on two-dimensional image transfer learning
CN107730451A (en) A kind of compressed sensing method for reconstructing and system based on depth residual error network
CN110677651A (en) Video compression method
CN107967516A (en) A kind of acceleration of neutral net based on trace norm constraint and compression method
CN111127325B (en) Satellite video super-resolution reconstruction method and system based on cyclic neural network
CN113177882A (en) Single-frame image super-resolution processing method based on diffusion model
CN112116601A (en) Compressive sensing sampling reconstruction method and system based on linear sampling network and generation countermeasure residual error network
CN107590775B (en) Image super-resolution amplification method using regression tree field
CN109949217B (en) Video super-resolution reconstruction method based on residual learning and implicit motion compensation
CN107784628A (en) A kind of super-resolution implementation method based on reconstruction optimization and deep neural network
CN111369433B (en) Three-dimensional image super-resolution reconstruction method based on separable convolution and attention
CN113222812B (en) Image reconstruction method based on information flow reinforced depth expansion network
CN108492249A (en) Single frames super-resolution reconstruction method based on small convolution recurrent neural network
CN113674172A (en) Image processing method, system, device and storage medium
CN115936985A (en) Image super-resolution reconstruction method based on high-order degradation cycle generation countermeasure network
Hui et al. Two-stage convolutional network for image super-resolution
JP2009049895A (en) Data compressing method, image display method and display image enlarging method
CN112270646A (en) Super-resolution enhancement method based on residual error dense jump network
CN112929664A (en) Interpretable video compressed sensing reconstruction method
Jang et al. Dual path denoising network for real photographic noise

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210608