CN113992920A - Video compressed sensing reconstruction method based on deep expansion network - Google Patents
Video compressed sensing reconstruction method based on a deep expansion network
- Publication number: CN113992920A
- Application number: CN202111239880.4A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H04N19/42 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- G06N3/045 — Combinations of networks
- G06N3/048 — Activation functions
- G06N3/084 — Backpropagation, e.g. using gradient descent
Abstract
A video compressed sensing reconstruction method based on a deep expansion network comprises the following steps. S1, constructing a training data set: the training data set consists of a plurality of data pairs, each pairing an observation frame obtained by compressing multiple frames with the corresponding uncompressed frames. S2, constructing a deep expansion network: the half-quadratic splitting algorithm used to optimize compressed sensing is unfolded into a deep expansion network, and a dense feature fusion technique is added. S3, training the deep expansion network: given a loss function, the parameters of the deep expansion network are continuously optimized over the training data set by back propagation and gradient descent until the loss function stabilizes. S4, applying the trained deep expansion network to video compressed sensing reconstruction: the input is the compressed observation frame and the sampling matrix, and the output is the reconstructed video frames. The method has good interpretability and achieves high reconstruction accuracy while maintaining a high reconstruction speed.
Description
Technical Field
The invention belongs to the technical field of video processing, and particularly relates to a video compressed sensing reconstruction method based on a deep expansion network.
Background
Video compressed sensing is widely used in imaging systems, where the purpose is to capture high-dimensional signals, such as video[1] or optical spectra[2], using two-dimensional sensors. By introducing an additional hardware component into the imaging system, the high-dimensional signal is compressed into a two-dimensional signal; a reconstruction algorithm then recovers the high-dimensional signal from the two-dimensional measurement.
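The compression step can be sketched in a few lines of numpy. The mask-based measurement model below is a common formulation of video snapshot compressed sensing and is only illustrative; the shapes, mask distribution, and variable names are assumptions, not the patent's exact hardware model:

```python
import numpy as np

# B high-dimensional video frames x_b are modulated by per-frame binary
# masks Phi_b and summed into a single 2D observation y.
rng = np.random.default_rng(0)
B, H, W = 8, 4, 4                              # a tiny 8-frame, 4x4 video
x = rng.random((B, H, W))                      # uncompressed multi-frame signal
Phi = rng.integers(0, 2, (B, H, W)).astype(float)  # binary sampling masks
y = (Phi * x).sum(axis=0)                      # single compressed 2D observation
print(y.shape)                                 # high-dim signal -> 2D measurement
```

The reconstruction problem discussed below is the inverse of this map: recover the B frames `x` from `y` and `Phi`.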
Conventional video compressed sensing reconstruction algorithms are briefly reviewed below.
From a mathematical point of view, given the compressed observation frames and the sampling matrix, video compressed sensing solves an ill-posed inverse problem. Traditional methods introduce prior knowledge of the image or video as a regularization term and iteratively solve a sparsity-regularized optimization problem, for example using total variation[3], Gaussian mixture models[4], optical flow[5], or non-local low-rank structure[6] as the regularizing prior. Such optimization-based iterative methods can be applied directly to different sampling matrices without retraining, but their performance is limited by the chosen prior, and the iterative optimization is time-consuming. In recent years, driven by the rapid development of deep learning, many video compressed sensing methods based on convolutional neural networks have been proposed. They fall into two classes: deep non-unfolded networks and deep expansion (unfolded) networks. A deep non-unfolded network learns a direct mapping from compressed observation frames to the original frames, e.g. by stacking multiple convolutional layers[7] or designing a deep fully-connected network[8]. Other works jointly optimize the sampling matrix and the reconstruction network under hardware constraints[9]; design the BIRNAT network[10], in which the first frame is reconstructed by a convolutional neural network and the remaining frames by a bidirectional recurrent network; propose MetaSCI[11], whose lightweight backbone can be adapted to different sampling matrices; or propose RevSCI[12], which uses group reversible 3D convolutions to handle large-scale video compressed sensing reconstruction. The main disadvantage of such methods is their poor interpretability.
A deep unfolding network maps an optimization method into a deep network to avoid the many iterations required by traditional methods; methods such as tensor-ADMM[13], tensor-FISTA[14], GAP-UNet[15] and PnP-FFDNet[16] have been proposed in succession. These methods have two disadvantages: first, 2D convolution is commonly used to exploit inter-frame correlation, which is not an optimal choice; second, deep unfolded networks suffer significant loss of information transferred between stages.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of existing video compressed sensing reconstruction methods and provides a video compressed sensing reconstruction algorithm based on a deep expansion network. The designed deep expansion network, used for training and reconstruction, not only has good interpretability but also achieves high reconstruction accuracy while maintaining a high reconstruction speed.
The technical scheme of the invention is as follows:
a video compressed sensing reconstruction method based on a deep expansion network comprises the following steps: S1, constructing a training data set: the training data set consists of a plurality of data pairs, each pairing an observation frame obtained by compressing multiple frames with the corresponding uncompressed frames; S2, constructing a deep expansion network: the half-quadratic splitting algorithm for optimizing compressed sensing is unfolded into a deep expansion network, and a dense feature fusion technique is added; S3, training the deep expansion network: given a loss function, the parameters of the deep expansion network are continuously optimized over the training data set by back propagation and gradient descent until the loss function stabilizes; and S4, applying the trained deep expansion network to video compressed sensing reconstruction: the input is the compressed observation frame and the sampling matrix, and the output is the reconstructed video frames.
Preferably, in the method for video compressed sensing reconstruction based on the deep expansion network, in step S1, a video training data set is constructed for training the deep expansion network, where the training data set is composed of a plurality of data pairs, and each data pair includes a group of consecutive video frames and a corresponding multi-frame compressed observation frame.
Preferably, in the video compressed sensing reconstruction method based on the deep expansion network, in step S2, the deep expansion network is obtained by unfolding a half-quadratic splitting algorithm for optimizing compressed sensing; its network structure is formed by alternately stacking data modules and prior modules, and 3D convolution is introduced to improve the network's ability to characterize inter-frame correlation. A dense feature fusion technique is used to reduce the loss caused by information passing between different stages and to help information be adaptively transmitted across stages.
Preferably, in the video compressed sensing reconstruction method based on the deep expansion network, in step S3, a back propagation algorithm is used to calculate gradients of the loss function with respect to each parameter in the deep expansion network, and then a gradient descent algorithm is used to optimize parameters of a network layer of the deep expansion network based on the training data set until the value of the loss function is stable, so as to obtain an optimal parameter of the deep expansion network.
Preferably, in the video compressed sensing reconstruction method based on the deep expansion network, in step S4, a rough reconstruction is performed by using the acquired observation frame and the sampling matrix, and then the reconstruction result and the sampling matrix are sent to the trained deep expansion network, and the output is a high-quality reconstruction result.
According to the technical scheme of the invention, the beneficial effects are as follows:
1. the method constructs a deep expansion network for video compressed sensing reconstruction in which 3D convolution is introduced to solve the proximal mapping, which better exploits temporal and spatial correlation;
2. the method greatly surpasses prior methods in both subjective results and numerical metrics, achieving the best reconstruction results to date; the proposed dense feature fusion technique effectively addresses information loss within the network and brings a gain of about 0.45 dB;
3. the method is the first to discuss how to solve the information-loss problem in the video compressed sensing task and provides a feasible solution. In addition, to improve the information fusion capability, the method proposes a dense feature adaptive fusion scheme that allows effective information in the dense features to be adaptively transmitted between stages.
For a better understanding and appreciation of the concepts, principles of operation, and effects of the invention, reference will now be made in detail to the following examples, taken in conjunction with the accompanying drawings, in which:
drawings
In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below.
FIG. 1 is a flowchart of an implementation of a video compressed sensing reconstruction method based on a deep expansion network according to the present invention.
Fig. 2 is a schematic diagram of a compressed perceptual reconstruction of video.
Fig. 3 is a structural diagram of a deep developed network.
FIG. 4 is a schematic diagram of a dense feature adaptive fusion technique.
Figs. 5 and 6 show visual comparisons of the reconstruction algorithms on parts of the synthetic data set.
Fig. 7 is a visual comparison result of each reconstruction algorithm on real data in an experiment.
Detailed Description
In order to make the objects, technical means and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific examples. These examples are merely illustrative and not restrictive of the invention.
The invention discloses a video compressed sensing reconstruction method based on a depth expansion network, which is used for reconstructing a high-quality video sequence from observation frames which are obtained by compressing a plurality of frames and acquired by a Digital Micromirror Device (DMD) or a Liquid Crystal On Silicon (LCOS) camera.
As shown in fig. 1, the video compressed sensing reconstruction method based on the deep expansion network of the present invention includes the following steps:
s1, constructing a training data set: the training data set is composed of a plurality of data pairs, each data pair being composed of a plurality of compressed observation frames and a corresponding uncompressed multiframe. Specifically, a video training data set is constructed for training the deep expansion network, the training data set is composed of a plurality of data pairs, and each data pair comprises a group of continuous video frames and corresponding observation frames formed by compressing multiple frames.
To determine the optimal parameters of the proposed deep expansion network, the invention constructs a training data set for the video compressed sensing reconstruction problem. In the experiments, training is performed on the DAVIS data set, which contains 90 scenes at 480p resolution; during training the original images are cropped to 128 × 128, each data pair contains 8 frames, and 25600 data pairs are collected in total. Each data pair thus consists of a compressed observation frame y and uncompressed frames x; the observation frame and the sampling matrix serve as the network input, the output is the reconstructed result x_K, and the uncompressed frames are the reconstruction target. These training pairs form the network training data set S.
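Assembling one training pair can be sketched as follows; the crop size and frame count follow the text above, while the random clip, crop location, and binary mask are stand-in assumptions (a real pipeline would load DAVIS frames and use the system's actual sampling matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
video = rng.random((8, 480, 854))               # stand-in for a 480p DAVIS clip
B, crop = 8, 128                                # frames per pair and crop size
top, left = 100, 200                            # an arbitrary crop location
x = video[:B, top:top + crop, left:left + crop] # uncompressed frames (target)
Phi = rng.integers(0, 2, (B, crop, crop)).astype(float)  # assumed binary masks
y = (Phi * x).sum(axis=0)                       # compressed observation (input)
pair = (y, x)                                   # one element of the data set S
```

Repeating this over many clips and crop locations yields the 25600 pairs described in the text.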
S2, constructing a deep expansion network: the half-quadratic splitting algorithm for optimizing compressed sensing is unfolded into a neural network, and a dense feature fusion technique is added. The deep expansion network is obtained by unfolding the half-quadratic splitting algorithm; its structure is formed by alternately stacking data modules and prior modules, and 3D convolution is introduced to improve the network's ability to characterize inter-frame correlation. The dense feature fusion technique reduces the loss caused by information passing between different stages and helps information to be adaptively transmitted across stages.
The reconstruction result of video compressed sensing can be obtained by solving the following optimization problem:

$$\min_{x}\ \frac{1}{2}\|y - \Phi x\|_2^2 + \lambda \Psi(x), \qquad (1)$$

where x denotes B consecutive video frames (the multiple frames x in fig. 2), y is the observation frame compressed from the B frames, Φ is the corresponding sampling matrix, Ψ(x) is a regularization term constraining prior properties of the video frames x, and λ is the coefficient of the regularization term.
Introducing another variable v (an intermediate variable, as shown in fig. 2), equation (1) can be transformed into a constrained optimization problem:

$$\min_{x,\,v}\ \frac{1}{2}\|y - \Phi v\|_2^2 + \lambda \Psi(x) \quad \text{s.t.}\ x = v. \qquad (2)$$
the obtained objective function can be subjected to iterative optimization through a semi-quadratic splitting algorithm, and the method specifically comprises the following steps:
where k represents the number of iteration steps of the semi-quadratic split and η is another regular term coefficient.
The solution of (3) can be accelerated via the matrix inversion lemma as:

$$v_k = x_{k-1} + \Phi^{T}\left(\Phi\Phi^{T} + \eta_k I\right)^{-1}\left(r_k - \Phi x_{k-1}\right), \qquad (6)$$

where r_k (with initial value r_0) is another intermediate variable (as shown in fig. 2).
Formula (4) can be viewed as a denoising subproblem, whereby the v and x subproblems can be re-expressed as:

$$v_k = \mathcal{D}\left(x_{k-1},\, y,\, \Phi\right), \qquad (7)$$

$$x_k = \mathcal{P}\left(\left[v_k \,\|\, \bar{y}\right]\right), \qquad (8)$$

where ‖ denotes a concatenation (cascading) operation and \bar{y} denotes the regularized observation frame; since \bar{y} carries rich information, it is introduced into the prior module to provide supplementary reconstruction information. One iteration of the original optimization problem is thus converted into a data module \mathcal{D} and a prior module \mathcal{P} in the network. The invention therefore unfolds the half-quadratic splitting algorithm into a deep expansion network in which the reconstruction network is formed by alternately stacking data modules and prior modules (as shown in fig. 3). The initial value (the initialization in fig. 2) x_0 = Φ^T y is a coarse reconstruction from the observation frame and the sampling matrix; x_0, y and \bar{y} are then fed into the network for reconstruction. The data module and the prior module solve (7) and (8) respectively, and the prior module contains a dense feature adaptive fusion module to reduce information loss.
To better model inter-frame correlation, a prior module using 3D convolution is proposed. Compared with a 2D convolution, a 3D convolution kernel slides not only in the two-dimensional spatial plane but also along the time dimension, which better exploits inter-frame correlation in multi-frame reconstruction.
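The contrast with 2D convolution can be made concrete with a toy valid 3D convolution over a (time, height, width) volume; this naive triple loop is purely illustrative and not the patent's implementation:

```python
import numpy as np

def conv3d_valid(x, k):
    """Valid 3D correlation of a (T, H, W) volume with a (t, h, w) kernel."""
    T, H, W = x.shape
    t, h, w = k.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for a in range(out.shape[0]):
        for b in range(out.shape[1]):
            for c in range(out.shape[2]):
                out[a, b, c] = (x[a:a + t, b:b + h, c:c + w] * k).sum()
    return out

frames = np.arange(8 * 5 * 5, dtype=float).reshape(8, 5, 5)
kernel = np.ones((3, 3, 3)) / 27.0   # averages a 3x3x3 spatio-temporal block
out = conv3d_valid(frames, kernel)
print(out.shape)                      # (6, 3, 3): the kernel also slides in time
```

The output time dimension shrinks from 8 to 6, showing that the kernel mixes neighbouring frames, which a 2D kernel applied frame by frame cannot do.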
According to the above description, the information transmitted between stages in the deeply unfolded network has only a limited number of channels, while the backbone used in the invention is a UNet-like structure whose internal features are multi-channel; information is therefore lost each time it passes between stages. To solve this problem, the invention proposes a dense feature fusion technique, whose specific process is as follows (refer to figs. 2 and 3):
as shown in FIG. 3, the network structure of the k-th stage (i.e., stage k) prior module is an encoder-decoder structure and has multiple scales, here, j ∈ [1, 2, 3 ] is used]To represent different scales, where j-1 corresponds to the shallowest layer of the network, and the characteristic diagram of the j-th scale output in the k-th stage encoder is defined asCorrespondingly, the input and output of the j scale in the k stage decoder are respectivelyAndthe output of each scale of the network can be expressed as:
it should be noted thatRepresents the input to the a priori module(s),indicating that there is no residual connection between the encoder and decoder at the scale corresponding to the deepest layer of the network.
The inputs and outputs of the prior modules in each stage typically have only a limited number of channels, which again indicates that information is lost during transmission between stages; the proposed dense feature fusion technique addresses this, as shown in fig. 2, with the following details:
where ↑ denotes nearest-neighbour upsampling. The decoder input at each stage thus fuses the feature maps of the corresponding scales from the previous stage, reducing the information loss caused by the change in channel number and by up- and down-sampling. The features carried over from stage k−1 are called dense features and are denoted F_{k-1}.
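The fusion step can be sketched as nearest-neighbour upsampling followed by channel concatenation; the channel counts and scales below are arbitrary stand-ins, not the patent's actual dimensions:

```python
import numpy as np

def nearest_upsample(f, s=2):
    # nearest-neighbour upsampling: repeat rows and columns s times
    return f.repeat(s, axis=-2).repeat(s, axis=-1)

decoder_in = np.ones((16, 32, 32))   # current-stage feature map (C, H, W)
dense_prev = np.ones((16, 16, 16))   # dense feature F_{k-1} from a coarser scale
fused = np.concatenate([decoder_in, nearest_upsample(dense_prev)], axis=0)
print(fused.shape)                   # channels doubled: (32, 32, 32)
```

Concatenating (rather than adding) keeps the previous stage's multi-channel features available to the decoder instead of collapsing them into the limited inter-stage channels.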
finally, what the prior module learns is a residual map of the multi-frame reconstruction result of the data module (as shown in fig. 3):
Since the features of different channels contribute differently to the final result when the dense features are fused, a dense feature adaptive fusion technique is proposed so that information is selectively transferred between adjacent stages. Specifically, when the dense feature F_{k-1} agrees with the regularized observation frame \bar{y}, it should be enhanced during fusion, and suppressed otherwise.
The core idea is to compute the similarity between the dense features and the regularized observation frame, as shown in fig. 4. For a position (p, q) of the c-th channel of the m-th dense feature, the similarity S(p, q, c) is defined by applying an anisotropic filter to the dense features, where H and W denote the height and width of the feature map at the corresponding scale, n_f denotes the size of the anisotropic filter, and u indexes the third element of the five-tuple used in the computation. The adaptive dense feature map is then calculated from this similarity:
where σ denotes the sigmoid function, which maps its argument to the interval [0, 1]. The adaptive dense feature map is then the dense feature gated element-wise by the squashed similarity, i.e. σ(S) ⊙ F_{k-1}.
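The gating itself is a one-line operation. In the sketch below the similarity scores are random stand-ins (the patent computes them with the anisotropic filter described above); only the sigmoid gating is demonstrated:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
F = rng.random((4, 8, 8))              # dense features from stage k-1 (C, H, W)
S = rng.standard_normal((4, 8, 8))     # similarity scores (stand-in values)
F_adaptive = sigmoid(S) * F            # positions with high similarity pass through
```

Because σ(S) lies in (0, 1), the gate can only attenuate a feature, enhancing relative contributions of channels that agree with the regularized observation frame while suppressing the rest.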
s3, training the deep expansion network. The training process is as follows: based on the training data set, a loss function is given, and the parameters of the deep expansion network are continuously optimized by back propagation and gradient descent until the loss function stabilizes. Specifically, a loss function is designed; the gradients of the loss with respect to each parameter of the deep expansion network are computed by back propagation, and the parameters of the network layers are then optimized by gradient descent over the training data set until the value of the loss function is stable, i.e. until the model converges, yielding the optimal parameters of the deep expansion network.
Taking S as the training data set, the mean squared error is used as the network loss function:

$$\mathcal{L} = \frac{1}{N_k}\sum_{i=1}^{N_k} \frac{1}{N_s}\,\big\|x_K^{(i)} - x^{(i)}\big\|_2^2,$$

where N_k denotes the total number of training data pairs and N_s denotes the total number of pixels of the image in each data pair. The gradient of the loss function with respect to each network parameter is computed by back propagation, and the parameters of the network layers are then optimized by gradient descent over the training data set until the value of the loss function is stable, yielding the optimal parameters of the deep expansion network.
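The loss and the gradient-descent update can be illustrated on a toy scalar parameter; this hand-rolled example stands in for the network's back-propagation (the real training updates millions of parameters, and the learning rate here is arbitrary):

```python
import numpy as np

def mse_loss(x_rec, x_true):
    # mean squared error averaged over all pixels (and, in practice, pairs)
    return np.mean((x_rec - x_true) ** 2)

rng = np.random.default_rng(4)
x_true = rng.random((8, 16, 16))
w = 0.5                                   # toy scalar "network parameter"
lr = 0.5                                  # assumed learning rate
for _ in range(50):
    x_rec = w * x_true                    # toy "reconstruction"
    grad = np.mean(2 * (x_rec - x_true) * x_true)   # dL/dw by hand
    w -= lr * grad                        # gradient-descent update
# w converges to 1.0, where mse_loss(x_rec, x_true) is minimized
```

In the patent, the same loop is carried out by an autodiff framework over the unfolded network's parameters until the loss stabilizes.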
S4, applying the trained deep expansion network to carry out a video compression perception reconstruction process: the input is compressed observation frames and sampling matrixes, and the output is reconstructed video multiframes.
Through the training process of step S3, the optimal parameters of the deep expansion network are determined. Based on the trained model, video compressed sensing reconstruction first performs a coarse reconstruction from the acquired observation frame y and the sampling matrix Φ (as shown in fig. 2), i.e. x_0 = Φ^T y; the coarse reconstruction and the sampling matrix are then fed into the trained deep expansion network, whose output is the high-quality reconstruction result.
During testing, both synthetic and real data are reconstructed. The synthetic data comprise six scenes, Kobe, Traffic, Runner, Drop, Crash and Aerial, each of dimension 256 × 256 × 8; the real data set comprises two scenes, Water Balloon and Dominoes, of dimension 512 × 512 × 10. To objectively evaluate the reconstruction accuracy of the different methods, peak signal-to-noise ratio (PSNR) is used as the comparison index. All experiments were run on NVIDIA Tesla V100 servers. The deep expansion network used in the experiments has K = 10 stages.
Table 1: comparison of results of different methods under synthesized data
As shown in table 1 above, the deep expansion network proposed by the invention is compared under synthetic data with ten video compressed sensing reconstruction methods: GAP-TV[3], E2E-CNN[7], DeSCI[6], PnP-FFDNet[16], BIRNAT[10], Tensor-ADMM[13], Tensor-FISTA[14], GAP-UNet[15], MetaSCI[11], RevSCI[12]. The proposed deep expansion network achieves the highest reconstruction accuracy on the synthetic data. Figs. 5 and 6 show the reconstruction results of these methods on different synthetic scenes, and fig. 7 shows their reconstruction results on real data, from which both the overall results and the zoomed-in details can be compared.
The foregoing describes preferred embodiments of the concepts and operating principles of the invention. The above embodiments should not be construed as limiting the scope of the claims; other embodiments and combinations of implementations according to the inventive concept fall within the scope of the invention.
Reference documents:
[1] Llull P, Liao X, Yuan X, et al. Coded aperture compressive temporal imaging[J]. Optics Express, 2013, 21(9): 10526-10545.
[2] Wagadarikar A A, Pitsianis N P, Sun X, et al. Video rate spectral imaging using a coded aperture snapshot spectral imager[J]. Optics Express, 2009, 17(8): 6368-6388.
[3] Yuan X. Generalized alternating projection based total variation minimization for compressive sensing[C]. 2016 IEEE International Conference on Image Processing. IEEE, 2016: 2539-2543.
[4] Yang J, Yuan X, Liao X, et al. Video compressive sensing using Gaussian mixture models[J]. IEEE Transactions on Image Processing, 2014, 23(11): 4863-4878.
[5] Reddy D, Veeraraghavan A, Chellappa R. P2C2: Programmable pixel compressive camera for high speed imaging[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011: 329-336.
[6] Liu Y, Yuan X, Suo J, et al. Rank minimization for snapshot compressive imaging[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 41(12): 2990-3006.
[7] Qiao M, Meng Z, Ma J, et al. Deep learning for video compressive sensing[J]. APL Photonics, 2020, 5(3): 030801.
[8] Iliadis M, Spinoulas L, Katsaggelos A K. Deep fully-connected networks for video compressive sensing[J]. Digital Signal Processing, 2018, 72: 9-18.
[9] Yoshida M, Torii A, Okutomi M, et al. Joint optimization for compressive video sensing and reconstruction under hardware constraints[C]. Proceedings of the European Conference on Computer Vision, 2018: 634-649.
[10] Cheng Z, Lu R, Wang Z, et al. BIRNAT: Bidirectional recurrent neural networks with adversarial training for video snapshot compressive imaging[C]. European Conference on Computer Vision. Springer, Cham, 2020: 258-275.
[11] Wang Z, Zhang H, Cheng Z, et al. MetaSCI: Scalable and adaptive reconstruction for video compressive sensing[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.
[12] Cheng Z, Chen B, Liu G, et al. Memory-efficient network for large-scale video compressive sensing[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.
[13] Ma J, Liu X Y, Shou Z, et al. Deep tensor ADMM-Net for snapshot compressive imaging[C]. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019: 10223-10232.
[14] Han X, Wu B, Shou Z, et al. Tensor FISTA-Net for real-time snapshot compressive imaging[C]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(07): 10933-10940.
[15] Meng Z, Jalali S, Yuan X. GAP-net for snapshot compressive imaging[J]. arXiv preprint arXiv:2012.08364, 2020.
[16] Yuan X, Liu Y, Suo J, et al. Plug-and-play algorithms for large-scale snapshot compressive imaging[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 1447-1457.
Claims (5)
1. A video compressed sensing reconstruction method based on a deep expansion network, characterized by comprising the following steps:
S1, constructing a training data set: the training data set is composed of a plurality of data pairs, each data pair consisting of an observation frame formed by compressing multiple frames and the corresponding uncompressed multiple frames;
S2, constructing a deep expansion network: unfolding the half-quadratic splitting algorithm for optimizing compressed sensing into a deep expansion network, and adding a dense feature fusion technique;
S3, training the deep expansion network: based on the training data set, giving a loss function and continuously optimizing the parameters in the deep expansion network by back propagation and gradient descent algorithms until the loss function is stable; and
S4, applying the trained deep expansion network to perform video compressed sensing reconstruction: the input is the compressed observation frame and the sampling matrix, and the output is the reconstructed multi-frame video.
2. The video compressed sensing reconstruction method based on the deep expansion network according to claim 1, wherein in step S1 a video training data set is constructed for training the deep expansion network, the training data set being composed of a plurality of data pairs, each data pair comprising a group of consecutive video frames and the corresponding compressed observation frame obtained from those frames.
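As an illustrative sketch (not part of the claimed method), one data pair of the kind described in claim 2 could be assembled under the common snapshot compressive sensing model, in which a group of consecutive frames is modulated by per-frame sampling masks and summed into a single observation frame; the function and variable names below are assumptions, not taken from the patent:

```python
import numpy as np

def make_data_pair(frames, masks):
    """Form one (observation, ground truth) pair: T mask-modulated
    frames are summed into a single compressed observation frame."""
    assert frames.shape == masks.shape  # both (T, H, W)
    observation = (masks * frames).sum(axis=0)  # (H, W) compressed frame
    return observation, frames

# toy example: 8 random 16x16 frames with binary sampling masks
rng = np.random.default_rng(0)
frames = rng.random((8, 16, 16))
masks = rng.integers(0, 2, size=(8, 16, 16)).astype(float)
y, x = make_data_pair(frames, masks)
print(y.shape)  # (16, 16)
```

A real training set would draw `frames` from consecutive clips of natural video rather than random noise; the pairing logic is the same.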
3. The video compressed sensing reconstruction method based on the deep expansion network according to claim 1, wherein in step S2 the deep expansion network is obtained by unfolding the half-quadratic splitting algorithm for optimizing compressed sensing; the network structure is formed by alternately stacking data modules and prior modules, wherein 3D convolution is introduced to improve the ability of the deep expansion network to characterize inter-frame correlation, and the dense feature fusion technique is used to reduce the loss caused by information passing between different stages and to help information be transmitted adaptively across stages.
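A minimal numerical sketch of the alternating data-module/prior-module structure that such a half-quadratic splitting unfolding takes. The closed-form data step below assumes the snapshot model, where the operator Phi Phi^T is per-pixel diagonal, and a crude temporal smoother stands in for the patent's learned 3D-convolutional prior with dense feature fusion; all names are illustrative:

```python
import numpy as np

def data_module(z, y, masks, mu=0.5):
    """HQS data-fidelity step for the model y = sum_t masks[t] * x[t].
    Because Phi @ Phi.T is per-pixel diagonal here, the least-squares
    update has a closed form (element-wise division)."""
    residual = y - (masks * z).sum(axis=0)
    scale = (masks ** 2).sum(axis=0) + mu
    return z + masks * (residual / scale)

def prior_module(x):
    """Stand-in for the learned prior module (in the patent, a trained
    3D-convolutional block with dense feature fusion); here a crude
    temporal smoothing filter plays the role of the denoiser."""
    return 0.5 * x + 0.5 * np.roll(x, 1, axis=0)

def unfolded_network(y, masks, stages=5):
    """Alternately stacked data and prior modules: one pass per stage,
    mirroring one unrolled half-quadratic splitting iteration."""
    z = masks * y  # rough initialization by back-projection
    for _ in range(stages):
        x = data_module(z, y, masks)
        z = prior_module(x)
    return z
```

In the learned version, each stage has its own trainable prior-module weights and the dense feature fusion passes intermediate features between stages instead of only the image estimate.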
4. The video compressed sensing reconstruction method based on the deep expansion network according to claim 1, wherein in step S3 a back propagation algorithm is used to calculate the gradient of the loss function with respect to each parameter in the deep expansion network, and a gradient descent algorithm is then used to optimize the parameters of the network layers on the training data set until the value of the loss function is stable, yielding the optimal parameters of the deep expansion network.
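A toy stand-in for the training loop of step S3 (names and model are assumptions for illustration): the patent trains a deep unfolded network by back-propagation, whereas this sketch optimizes a single linear reconstruction layer with an analytic gradient, keeping only the "iterate gradient descent until the loss is stable" structure:

```python
import numpy as np

# Optimize parameters W of a linear reconstruction x_hat = W @ y
# by gradient descent on an MSE loss, stopping when the loss is stable.
rng = np.random.default_rng(1)
Phi = rng.random((4, 16))      # sampling matrix (4 measurements of 16)
X = rng.random((16, 100))      # 100 ground-truth signals (columns)
Y = Phi @ X                    # compressed observations
W = np.zeros((16, 4))          # trainable reconstruction parameters

lr, prev_loss = 0.01, np.inf
for step in range(5000):
    X_hat = W @ Y
    loss = np.mean((X_hat - X) ** 2)         # MSE loss function
    grad = 2.0 * (X_hat - X) @ Y.T / X.size  # analytic gradient ("backprop")
    W -= lr * grad                           # gradient-descent update
    if abs(prev_loss - loss) < 1e-9:         # stop once the loss is stable
        break
    prev_loss = loss
```

For the actual deep expansion network, an automatic-differentiation framework would compute the gradients of the loss with respect to every stage's parameters; the stopping criterion on a stable loss is the same.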
5. The video compressed sensing reconstruction method based on the deep expansion network according to claim 1, wherein in step S4 a rough reconstruction is first performed using the collected observation frame and the sampling matrix; the rough result and the sampling matrix are then fed into the trained deep expansion network, whose output is the high-quality reconstruction result.
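The two-stage inference pipeline of claim 5 could look like the following sketch, where the rough reconstruction uses a normalized back-projection (a common initialization in snapshot compressive sensing, assumed here rather than taken verbatim from the patent) and a placeholder stands in for the trained network:

```python
import numpy as np

def rough_reconstruction(y, masks, eps=1e-6):
    """Coarse estimate before the network: back-project the observation
    through the sampling masks and normalize by the per-pixel mask
    energy. eps guards pixels where no mask is active."""
    norm = (masks ** 2).sum(axis=0) + eps
    return masks * (y / norm)  # (T, H, W) coarse multi-frame estimate

# the coarse estimate and the masks are then fed to the trained deep
# expansion network; an identity placeholder is used here
trained_network = lambda x0, masks: x0

rng = np.random.default_rng(0)
masks = rng.integers(0, 2, size=(8, 16, 16)).astype(float)
y = rng.random((16, 16))
x0 = rough_reconstruction(y, masks)
x_rec = trained_network(x0, masks)
```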
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111239880.4A CN113992920A (en) | 2021-10-25 | 2021-10-25 | Video compressed sensing reconstruction method based on deep expansion network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113992920A true CN113992920A (en) | 2022-01-28 |
Family
ID=79740885
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190075309A1 (en) * | 2016-12-30 | 2019-03-07 | Ping An Technology (Shenzhen) Co., Ltd. | Video compressed sensing reconstruction method, system, electronic device, and storage medium |
WO2020037965A1 (en) * | 2018-08-21 | 2020-02-27 | 北京大学深圳研究生院 | Method for multi-motion flow deep convolutional network model for video prediction |
CN112991472A (en) * | 2021-03-19 | 2021-06-18 | 华南理工大学 | Image compressed sensing reconstruction method based on residual dense threshold network |
CN113222812A (en) * | 2021-06-02 | 2021-08-06 | 北京大学深圳研究生院 | Image reconstruction method based on information flow reinforced deep expansion network |
Non-Patent Citations (1)
Title |
---|
ZHUOYUAN WU et al.: "Dense Deep Unfolding Network with 3D-CNN Prior for Snapshot Compressive Imaging", 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 2-5 |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114841901A (en) * | 2022-07-01 | 2022-08-02 | 北京大学深圳研究生院 | Image reconstruction method based on generalized depth expansion network |
CN114841901B (en) * | 2022-07-01 | 2022-10-25 | 北京大学深圳研究生院 | Image reconstruction method based on generalized depth expansion network |
CN117058045A (en) * | 2023-10-13 | 2023-11-14 | 阿尔玻科技有限公司 | Method, device, system and storage medium for reconstructing compressed image |
CN117994176A (en) * | 2023-12-27 | 2024-05-07 | 中国传媒大学 | Depth priori optical flow guided video restoration method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20220128 |