CN105160664A - Low-rank model based compressed sensing video reconstruction method - Google Patents

Low-rank model based compressed sensing video reconstruction method

Info

Publication number
CN105160664A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510523631.6A
Other languages
Chinese (zh)
Other versions
CN105160664B (en)
Inventor
刘芳
李婉
郝红侠
焦李成
李玲玲
杨淑媛
尚荣华
马文萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Application filed by Xidian University
Priority to CN201510523631.6A
Publication of CN105160664A
Application granted
Publication of CN105160664B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present invention discloses a low-rank model based compressed sensing video reconstruction method, which mainly solves the problems of inaccuracy and low robustness in compressed sensing video reconstruction. The method is implemented as follows: (1) receiving measurement data; (2) initializing a single-frame covariance matrix set; (3) constructing an initial reconstructed video by piecewise linear estimation based on joint sparsity and Gaussian distributions; (4) searching for the similar blocks of each image block by using the intra-frame and inter-frame correlation of the video; (5) solving for the low-rank structure of the video; (6) updating the reconstructed video; (7) determining whether the iteration stop condition is met; and (8) outputting the reconstructed video. Compared with existing video reconstruction techniques, the disclosed method yields higher reconstructed image quality and higher robustness, and can be used for the reconstruction of natural scene videos.

Description

Compressed sensing video reconstruction method based on low-rank model
Technical Field
The invention belongs to the technical field of image processing, and further relates to a video reconstruction method for compressed sensing systems in the technical field of video coding. The method exploits the similarity between the frames of a video sequence, reconstructs the video based on a low-rank model, and can be used for reconstructing natural image video sequences.
Background
In recent years, a new data acquisition theory, compressed sensing (CS), has emerged in the field of signal processing. The theory compresses data while acquiring it, breaks through the limitation of the traditional Nyquist sampling theorem, brings revolutionary change to data acquisition technology, and has wide application prospects in fields such as compressed imaging systems, military cryptography, and wireless sensing. Compressed sensing theory mainly covers three aspects: sparse representation of signals, observation of signals, and reconstruction of signals. The design of a fast and effective reconstruction algorithm is an important link in successfully applying CS theory to practical data models and acquisition systems.
High speed cameras play an important role in capturing fast motion in a variety of applications ranging from science to sports, but measuring high speed video is a challenge to the design of cameras. Compressed sensing enables capture of high frame rate video information through compressed measurement of low frame rate, and therefore compressed sensing is used for capture of high speed video information, thereby alleviating the difficulties of high speed camera design.
Guoshen Yu et al., in their published paper "Solving Inverse Problems with Piecewise Linear Estimators" (IEEE Transactions on Image Processing, 2012, 21(5): 2481), propose a method for solving image inverse problems using piecewise linear estimation. The method models the problem with a Gaussian mixture model, in which each image block obeys one of several multivariate Gaussian distributions, and obtains good results on image inverse problems including image reconstruction. The method has two defects: it is directed at two-dimensional images and cannot be used directly for reconstructing video sequences, and the similarity between image blocks captured by a statistical method is not accurate.
Jianbo Yang et al., in their published paper "Video Compressive Sensing Using Gaussian Mixture Models" (IEEE Transactions on Image Processing, 2014, 23), propose a Gaussian mixture model based approach. The method establishes a Gaussian mixture model for spatio-temporal video blocks and reconstructs the temporally compressed video sequence, obtaining a good reconstruction result. Its defect is that it only establishes a Gaussian mixture model for the image blocks and does not capture the correlation between and within video frames, so the video sequence it reconstructs is not accurate enough.
Disclosure of Invention
The invention aims to provide a low-rank model based compressed sensing video reconstruction method that addresses the failure of prior-art spatio-temporal video compressed sensing reconstruction techniques to exploit the correlation between and within video frames, so that the quality of the reconstructed images is improved.
The technical idea for realizing the aim of the invention is as follows: exploit the correlation among video frames, namely that the same positions in different frames of a video are similar, and model the image blocks at the same position in different frames as obeying the same Gaussian distribution, so as to obtain an initial reconstructed video sequence; then establish a low-rank model for all similar image blocks between frames and within frames, and iteratively optimize the low-rank structure of the video and the reconstructed video sequence to realize high-quality temporal video compressed sensing reconstruction.
The specific steps for realizing the purpose of the invention are as follows:
(1) receiving measurement data:
(1a) a compressed sensing sender observes video data: every H frames of video data are observed once with a random mask observation matrix to form one frame of measurement data, and the measurement data and the random mask observation matrix are sent, wherein H represents a positive integer in the range of 1 to 20;
(1b) the receiver receives the measurement data and the random mask observation matrix sent by the sender;
(2) initializing a single-frame covariance matrix set:
(2a) generating 18 artificial black-and-white images, wherein the size of each artificial black-and-white image is 65 multiplied by 65 pixels, and each artificial black-and-white image represents one direction;
(2b) using a window of 8 × 8 pixels, sliding the window over the artificial black-and-white image of each direction with a step length of 1 pixel and selecting all blocks of 8 × 8 pixels to obtain the direction block set of each direction;
(2c) performing principal component analysis (PCA) decomposition on the direction block set of each direction to obtain the PCA orthogonal basis and the eigenvalue matrix, and retaining the 8 largest eigenvalues and the corresponding principal component orthogonal basis vectors in each direction to obtain the corresponding eigenvalue matrix and direction basis;
(2d) calculating a single-frame covariance matrix in the direction represented by each artificial black-and-white image to obtain a single-frame covariance matrix set;
(3) constructing an initial reconstruction video based on joint sparse and Gaussian distributed piecewise linear estimation:
(3a) for the direction represented by each artificial black-and-white image, placing H copies of the single-frame covariance matrix of that direction on the diagonal of a block-diagonal matrix, thereby constructing a joint sparse video covariance matrix for reconstructing three-dimensional video data in that direction, wherein the video covariance matrix in the k-th direction is as follows:
$$\bar{P}_k = \mathrm{diag}(\underbrace{P_k, P_k, \ldots, P_k}_{H})$$
wherein $\bar{P}_k$ represents the video covariance matrix in the k-th direction, $P_k$ represents the single-frame covariance matrix in the k-th direction, and k represents the direction number represented by the artificial black-and-white image, k = 1, 2, ..., 18;
(3b) initializing the M × N × H reconstructed video as a zero matrix, dividing each frame of the initial reconstructed video into S image blocks of size n × n with step length p, recording the positions of the blocks, and forming the image blocks at the same position in every frame into a video block to obtain a video block set $\{x_1, \ldots, x_l, \ldots, x_S\}$, wherein $x_l = [x_{l_1}^T, \ldots, x_{l_t}^T, \ldots, x_{l_H}^T]^T$ represents the l-th video block, $x_{l_t}$ represents the l-th image block, of size n × n, of the t-th frame of the reconstructed video, and T represents the transposition operation; in the direction k represented by the artificial black-and-white image, the l-th video block $x_l^k$ obeys a Gaussian distribution with mean 0 and covariance matrix $\bar{P}_k$, wherein $\bar{P}_k$ represents the video covariance matrix in the k-th direction; M, N, and H respectively represent the sizes of the first, second, and third dimensions of the reconstructed video; p and n are positive integers less than or equal to the smaller of M and N; k represents the direction number represented by the artificial black-and-white image, k = 1, 2, ..., 18; l = 1, 2, ..., S; and S represents the number of video blocks, i.e., the number of image blocks into which each frame is divided;
(3c) calculating the estimated value of each video block in the direction represented by each artificial black-and-white image according to the following formula:
$$x_l^k = \bar{P}_k \Phi_l^T \left( \Phi_l \bar{P}_k \Phi_l^T + \sigma^2 I_d \right)^{-1} y_l$$
wherein $x_l^k$ represents the estimated value of the l-th video block in the direction k represented by the artificial black-and-white image, $\bar{P}_k$ represents the k-th video covariance matrix, $\Phi_l$ represents the observation matrix of the l-th video block taken from the random mask observation matrix, $y_l$ represents the measurement data of the l-th video block taken from the measurement data, σ takes a value in the range of 0 to 1, $I_d$ represents the d-dimensional identity matrix, T represents the transposition operation, $(\cdot)^{-1}$ represents matrix inversion, k represents the direction number represented by the artificial black-and-white image, k = 1, 2, ..., 18, l represents the number of the video block, l = 1, 2, ..., S, and S represents the number of video blocks, i.e., the number of image blocks into which each frame is divided;
(3d) calculating the optimal direction of each video block among the directions represented by the artificial black-and-white images according to the following formula:
$$\tilde{k}_l = \arg\min_k \left( \left\| \Phi_l x_l^k - y_l \right\|^2 + \sigma^2 (x_l^k)^T \bar{P}_k^{-1} (x_l^k) + \sigma^2 \log \left| \bar{P}_k \right| \right)$$
wherein $\tilde{k}_l$ represents the optimal direction of the l-th video block among the directions represented by the artificial black-and-white images, $\arg\min_k$ returns the value of k that minimizes the objective function, $\Phi_l$ represents the observation matrix of the l-th video block taken from the random mask observation matrix, $x_l^k$ represents the estimated value of the l-th video block in the direction k represented by the artificial black-and-white image, $y_l$ represents the measurement data of the l-th video block taken from the measurement data, $\bar{P}_k$ represents the k-th video covariance matrix, σ takes a value in the range of 0 to 1, $\|\cdot\|^2$ represents the square of the norm, $|\cdot|$ represents the determinant, T represents the transposition operation, $(\cdot)^{-1}$ represents matrix inversion, k represents the direction number represented by the artificial black-and-white image, k = 1, 2, ..., 18, l represents the number of the video block, l = 1, 2, ..., S, and S represents the number of video blocks, i.e., the number of image blocks into which each frame is divided;
(3e) combining the estimated value of each video block in the optimal direction into an initial reconstructed video according to the positions of the blocks reserved when the video blocks are divided in the step (3 b);
(3f) setting an iteration counter s and a maximum number of outer iterations U, setting the current iteration count s to 1, and taking the initial reconstructed video as the reconstructed video of the first iteration, wherein U represents a positive integer;
(4) searching for the similar blocks of each image block using the inter-frame and intra-frame correlation of the video:
(4a) dividing each frame of the reconstructed video into blocks of size n × n with step length p, and forming the blocks of all frames into a two-dimensional image block set $G_1$, wherein p and n are positive integers less than or equal to the smaller of M and N, and M and N represent the sizes of the first and second dimensions of the reconstructed video;
(4b) dividing each frame of the reconstructed video into blocks of size n × n with step length 1, forming the blocks of all frames into a two-dimensional image block set $G_2$, and recording the index of each block in $G_2$, wherein n is a positive integer less than or equal to the smaller of M and N, and M and N represent the sizes of the first and second dimensions of the reconstructed video;
(4c) for each block in $G_1$, taking from $G_2$ all blocks within a Z × Z × H window around that block and marking them as the neighbor blocks of that block, wherein Z represents the size of the first and second dimensions of the window, H represents the size of the third dimension of the window, $G_1$ represents the two-dimensional image block set divided with step length p, and $G_2$ represents the two-dimensional image block set divided with step length 1;
(4d) computing the Euclidean distance between each block in the two-dimensional image block set $G_1$ and each of its neighbor blocks, sorting the neighbor blocks by Euclidean distance from small to large, selecting the first Q blocks as the similar blocks of the corresponding block, and recording the index of each similar block in $G_2$, wherein Q represents a positive integer less than half the number of neighbor blocks;
(5) solving for the low-rank structure of the video according to the following formula:
$$L_{l_t}^{s+1} = \arg\min_{L_{l_t}} \left\| \tilde{R}_{l_t}(X^s) - L_{l_t} \right\|_F^2 + \lambda \left\| L_{l_t} \right\|_*$$
wherein $L_{l_t}^{s+1}$ represents the low-rank structure of the l-th image block of the t-th frame at the (s+1)-th iteration, $\arg\min_{L_{l_t}}$ returns the value of the low-rank structure $L_{l_t}$ that minimizes the objective function, $\tilde{R}_{l_t}$ represents the extraction transform that extracts all similar blocks of the l-th image block of the t-th frame, $X^s$ represents the reconstructed video of the s-th iteration, $\tilde{R}_{l_t}(X^s) = (R_{l_t}^1(X^s), \ldots, R_{l_t}^q(X^s), \ldots, R_{l_t}^Q(X^s))$ represents all similar blocks of the l-th image block of the t-th frame extracted from $X^s$, $R_{l_t}^q$ represents the extraction matrix for the q-th similar block of the l-th image block of the t-th frame, $R_{l_t}^q(X^s)$ represents the q-th similar block of the l-th image block of the t-th frame extracted from $X^s$, λ takes the value 0.75, $\|\cdot\|_F^2$ represents the squared Frobenius norm, $\|\cdot\|_*$ represents the nuclear norm, H represents the size of the third dimension of the reconstructed video, t represents the number of the video frame, t = 1, 2, ..., H, l represents the number of the video block, l = 1, 2, ..., S, S represents the number of video blocks, i.e., the number of image blocks into which each frame is divided, q = 1, 2, ..., Q, and Q represents the number of similar blocks;
(6) updating the reconstructed video according to the following formula:
$$X^{s+1} = \arg\min_{X} \left\| y - \Phi X \right\|_2^2 + \eta \sum_t \sum_l \left\| \tilde{R}_{l_t}(X) - L_{l_t}^{s+1} \right\|_F^2$$
wherein $X^{s+1}$ represents the reconstructed video of the (s+1)-th iteration, $\arg\min_X$ returns the value of the reconstructed video X that minimizes the objective function, y represents the one-dimensional vector formed from the measurement data, Φ represents the video observation matrix generated from the random mask observation matrix, $\tilde{R}_{l_t}$ represents the extraction transform that extracts all similar blocks of the l-th image block of the t-th frame, X represents the reconstructed video, $\tilde{R}_{l_t}(X) = (R_{l_t}^1(X), \ldots, R_{l_t}^q(X), \ldots, R_{l_t}^Q(X))$ represents all similar blocks of the l-th image block of the t-th frame extracted from X, $R_{l_t}^q$ represents the extraction matrix for the q-th similar block of the l-th image block of the t-th frame, $R_{l_t}^q(X)$ represents the q-th similar block of the l-th image block of the t-th frame extracted from X, $L_{l_t}^{s+1}$ represents the low-rank structure of the l-th image block of the t-th frame at the (s+1)-th iteration, η takes the value 1, Σ represents summation, $\|\cdot\|_2^2$ represents the squared 2-norm, $\|\cdot\|_F^2$ represents the squared Frobenius norm, H represents the size of the third dimension of the reconstructed video, t represents the number of the video frame, t = 1, 2, ..., H, l represents the number of the video block, l = 1, 2, ..., S, S represents the number of video blocks, i.e., the number of image blocks into which each frame is divided, q = 1, 2, ..., Q, and Q represents the number of similar blocks;
(7) judging whether the current iteration count exceeds the maximum number of outer iterations U; if so, executing step (8); otherwise, adding 1 to the current iteration count and executing step (4);
(8) outputting the reconstructed video.
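The nuclear-norm problem in step (5) has a well-known closed-form solution: soft-threshold the singular values of the stacked similar-block matrix. The following is a minimal sketch under assumptions, not the patented implementation: the function name `svt` is hypothetical, the similar blocks are assumed to be stacked as the columns of a matrix `R`, and the threshold is λ/2 because the data term in the formula carries no factor of 1/2.

```python
import numpy as np

def svt(R, lam):
    """Return argmin_L ||R - L||_F^2 + lam * ||L||_* for a 2-D array R.

    The minimizer keeps the singular vectors of R and shrinks each
    singular value by lam / 2, clipping at zero.
    """
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    s_shrunk = np.maximum(s - lam / 2.0, 0.0)
    # Scale the columns of U by the shrunk singular values, then recombine.
    return (U * s_shrunk) @ Vt

# Toy stand-in for a matrix of Q similar blocks (one block per column).
R = np.diag([3.0, 1.0, 0.2])
L = svt(R, lam=0.75)  # lam = 0.75 is the value stated in step (5)
```

With the threshold 0.375, the singular values 3, 1, and 0.2 become 2.625, 0.625, and 0, so the weakest component is removed and the result has rank 2.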
Compared with the prior art, the invention has the following advantages:
First, the invention constructs the initial reconstructed video by piecewise linear estimation based on joint sparsity and Gaussian distributions, which overcomes the defect that prior-art piecewise linear reconstruction methods cannot be used directly for reconstructing video sequences, and thereby improves the robustness of video reconstruction.
Second, the invention searches for the similar blocks of each image block by using the inter-frame and intra-frame correlation of the video, solves for the low-rank structure of the video, and obtains the reconstructed video by iterative optimization, which overcomes the defect that prior-art Gaussian mixture model based methods do not exploit the inter-frame and intra-frame correlation of the video, and thereby improves the accuracy of the reconstructed video.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 shows the reconstructions of a vehicle video by the present invention and by the prior art at H = 8;
FIG. 3 is a PSNR line graph of a vehicle video reconstructed by the method of the present invention and the prior art.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1, the specific steps of the present invention are as follows.
Step 1, receiving measurement data.
The compressed sensing sender observes the video data: every H frames of video data are observed once with a random mask observation matrix to form one frame of measurement data, and the measurement data and the random mask observation matrix are sent, where H is a positive integer in the range of 1 to 20.
The receiver receives the measurement data and the random mask observation matrix sent by the sender.
The original M × N × H video data $\hat{x}$ is observed with an M × N × H random mask observation matrix A to obtain one frame of M × N measurement data Y:
$$Y_{i,j} = \left[ A_{i,j,1}, A_{i,j,2}, \ldots, A_{i,j,t}, \ldots, A_{i,j,H} \right] \left[ \hat{x}_{i,j,1}, \hat{x}_{i,j,2}, \ldots, \hat{x}_{i,j,t}, \ldots, \hat{x}_{i,j,H} \right]^T$$
where $(\cdot)^T$ denotes the vector transposition operation, $Y_{i,j}$ denotes the element in row i and column j of the measurement data Y, i = 1, 2, ..., M, j = 1, 2, ..., N, $\hat{x}_{i,j,t}$ denotes the element in row i and column j of the t-th frame of the original video data, t = 1, 2, ..., H, and $A_{i,j,t}$ denotes the element in row i and column j of the t-th frame of the random mask observation matrix A; each $A_{i,j,t}$ is randomly 0 or 1 with probability r.
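The observation model of step 1 can be simulated in a few lines. This is an illustrative sketch only: the function name `simulate_measurement`, the array sizes, and the reading of r as the probability that a mask entry equals 1 are assumptions not fixed by the text.

```python
import numpy as np

def simulate_measurement(video, mask):
    """Collapse H masked frames into one measurement frame.

    video, mask: arrays of shape (M, N, H); mask entries are 0 or 1.
    Returns Y of shape (M, N) with
    Y[i, j] = sum over t of mask[i, j, t] * video[i, j, t].
    """
    if video.shape != mask.shape:
        raise ValueError("video and mask must have the same shape")
    return np.sum(mask * video, axis=2)

rng = np.random.default_rng(0)
M, N, H = 4, 5, 8   # illustrative sizes; the text allows H from 1 to 20
r = 0.5             # assumed: probability that a mask entry equals 1
video = rng.random((M, N, H))
mask = (rng.random((M, N, H)) < r).astype(float)
Y = simulate_measurement(video, mask)
```

Each pixel of Y is the inner product of that pixel's H mask values with its H intensities, which matches the row-vector times column-vector form of the formula above.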
Step 2, initializing a single-frame covariance matrix set.
Generate 18 artificial black-and-white images, each of size 65 × 65 pixels and each representing one direction.
Using a window of 8 × 8 pixels, slide the window over the artificial black-and-white image of each direction with a step length of 1 pixel and select all blocks of 8 × 8 pixels to obtain the direction block set of each direction.
Perform principal component analysis (PCA) decomposition on the direction block set of each direction to obtain the PCA orthogonal basis and the eigenvalue matrix, and retain the 8 largest eigenvalues and the corresponding principal component orthogonal basis vectors in each direction to obtain the corresponding eigenvalue matrix and direction basis.
Calculate the single-frame covariance matrix in the direction represented by each artificial black-and-white image to obtain the single-frame covariance matrix set.
The single-frame covariance matrix in the direction represented by each black-and-white image is calculated according to the following formula:
$$P_k = B_k D_k B_k^T$$
where $P_k$ represents the single-frame covariance matrix in the k-th direction, $B_k$ represents the direction basis in the k-th direction, $D_k$ represents the eigenvalue matrix in the k-th direction, T represents the transposition operation, and k represents the direction number, k = 1, 2, ..., 18.
The steps of the principal component analysis (PCA) decomposition are as follows:
Select one direction from all directions of the artificial black-and-white images, and compute the covariance matrix of the direction block set of the selected direction according to the following formula:
$$P = E[f_i f_i^T]$$
where P represents the covariance matrix of the direction block set of the selected direction, E represents the mathematical expectation, $f_i$ represents the i-th block in the direction block set of the selected direction, and T represents the transposition operation.
Diagonalize the covariance matrix according to the following formula to obtain the PCA orthogonal basis and the eigenvalue matrix:
$$P = B D B^T$$
where P represents the covariance matrix of the direction block set of the selected direction, B represents the PCA orthogonal basis of the selected direction, D represents the eigenvalue matrix of that direction, and T represents the transposition operation.
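Step 2 can be sketched as follows. The text does not say how the artificial black-and-white images are drawn or how the 18 directions are spaced, so `directional_image` (a half-plane edge through the image center) and the 10-degree spacing are assumptions made purely for illustration; all function names are hypothetical.

```python
import numpy as np

def directional_image(angle_deg, size=65):
    """Assumed form of an artificial image: a black/white half-plane edge."""
    c = (size - 1) / 2.0
    yy, xx = np.mgrid[0:size, 0:size]
    theta = np.deg2rad(angle_deg)
    # Signed distance to the line through the center with normal (cos, sin).
    d = (xx - c) * np.cos(theta) + (yy - c) * np.sin(theta)
    return (d >= 0).astype(float)

def extract_patches(img, n=8):
    """All n-by-n patches with a 1-pixel stride, each flattened to a row."""
    windows = np.lib.stride_tricks.sliding_window_view(img, (n, n))
    return windows.reshape(-1, n * n)

def single_frame_covariance(img, n=8, keep=8):
    """Return (P_k, B_k, D_k) with P_k = B_k D_k B_k^T from the top `keep` components."""
    f = extract_patches(img, n)
    P = f.T @ f / f.shape[0]              # sample estimate of E[f f^T]
    eigvals, eigvecs = np.linalg.eigh(P)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:keep]
    B = eigvecs[:, order]
    D = np.diag(eigvals[order])
    return B @ D @ B.T, B, D

angles = np.arange(18) * 10.0  # assumed spacing of the 18 directions
results = [single_frame_covariance(directional_image(a)) for a in angles]
covariances = [P for P, _, _ in results]
```

Because only 8 of the 64 eigenvalues are retained, each resulting matrix has rank at most 8.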
Step 3, constructing an initial reconstructed video by piecewise linear estimation based on joint sparsity and Gaussian distributions.
For the direction represented by each artificial black-and-white image, place H copies of the single-frame covariance matrix of that direction on the diagonal of a block-diagonal matrix to obtain the video covariance matrix that jointly sparsely represents three-dimensional video data in that direction, where the video covariance matrix in the k-th direction is as follows:
$$\bar{P}_k = \mathrm{diag}(\underbrace{P_k, P_k, \ldots, P_k}_{H})$$
where $\bar{P}_k$ represents the video covariance matrix in the k-th direction, $P_k$ represents the single-frame covariance matrix in the k-th direction, and k represents the direction number represented by the artificial black-and-white image, k = 1, 2, ..., 18.
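Placing H copies of a single-frame covariance matrix on the block diagonal is a Kronecker product with the H × H identity matrix. A minimal sketch, in which the function name `video_covariance` and the toy 2 × 2 matrix are illustrative assumptions:

```python
import numpy as np

def video_covariance(P_k, H):
    """Place H copies of the single-frame covariance P_k on the block diagonal."""
    return np.kron(np.eye(H), P_k)

P_k = np.array([[2.0, 0.5],
                [0.5, 1.0]])   # toy 2x2 stand-in for a 64x64 single-frame covariance
P_bar = video_covariance(P_k, H=3)
```

The off-diagonal blocks are zero, so under this model the image blocks of different frames share one distribution but are treated as uncorrelated with each other.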
Initialize the M × N × H reconstructed video as a zero matrix, divide each frame of the initial reconstructed video into S image blocks of size n × n with step length p, record the positions of the blocks, and form the image blocks at the same position in every frame into a video block to obtain a video block set $\{x_1, \ldots, x_l, \ldots, x_S\}$, where $x_l = [x_{l_1}^T, \ldots, x_{l_t}^T, \ldots, x_{l_H}^T]^T$ represents the l-th video block, $x_{l_t}$ represents the l-th image block, of size n × n, of the t-th frame of the reconstructed video, and T represents the transposition operation. In the direction k represented by the artificial black-and-white image, the l-th video block $x_l^k$ obeys a Gaussian distribution with mean 0 and covariance matrix $\bar{P}_k$, where $\bar{P}_k$ represents the video covariance matrix in the k-th direction; M, N, and H represent the sizes of the first, second, and third dimensions of the reconstructed video; p and n are positive integers less than or equal to the smaller of M and N; and k represents the direction number represented by the artificial black-and-white image, k = 1, 2, ..., 18.
The steps of calculating the optimal direction of the video block in the direction represented by the artificial black-and-white image by using piecewise linear estimation are as follows:
The estimated value of each video block in the direction represented by each artificial black-and-white image is calculated according to the following formula:
$$x_l^k = \bar{P}_k \Phi_l^T \left( \Phi_l \bar{P}_k \Phi_l^T + \sigma^2 I_d \right)^{-1} y_l$$
where $x_l^k$ represents the estimated value of the l-th video block in the direction k represented by the artificial black-and-white image, $\bar{P}_k$ represents the k-th video covariance matrix, $\Phi_l$ represents the observation matrix of the l-th video block taken from the random mask observation matrix, $y_l$ represents the measurement data of the l-th video block taken from the measurement data, σ takes a value in the range of 0 to 1, $I_d$ represents the d-dimensional identity matrix, T represents the transposition operation, $(\cdot)^{-1}$ represents matrix inversion, k represents the direction number represented by the artificial black-and-white image, k = 1, 2, ..., 18, l represents the number of the video block, l = 1, 2, ..., S, and S represents the number of video blocks, i.e., the number of image blocks into which each frame is divided.
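The estimate above is a linear function of the measurements and can be computed with one linear solve instead of an explicit inverse. The sketch below uses a hypothetical function name `ple_estimate` and small random stand-ins for $\bar{P}_k$, $\Phi_l$, and $y_l$; the dimensions are illustrative assumptions.

```python
import numpy as np

def ple_estimate(P_bar, Phi, y, sigma):
    """Compute x = P_bar Phi^T (Phi P_bar Phi^T + sigma^2 I_d)^{-1} y."""
    d = Phi.shape[0]
    G = Phi @ P_bar @ Phi.T + sigma**2 * np.eye(d)
    # Solve G z = y rather than forming the inverse of G.
    return P_bar @ Phi.T @ np.linalg.solve(G, y)

rng = np.random.default_rng(1)
n_dim, d = 6, 3                                  # toy sizes: signal length, measurements
A = rng.standard_normal((n_dim, n_dim))
P_bar = A @ A.T + np.eye(n_dim)                  # assumed full-rank covariance
Phi = (rng.random((d, n_dim)) < 0.5).astype(float)  # random 0/1 mask rows
x_true = rng.standard_normal(n_dim)
y = Phi @ x_true
sigma = 0.1
x_hat = ple_estimate(P_bar, Phi, y, sigma)
```

When the covariance is invertible, the same estimate equals $(\Phi_l^T \Phi_l + \sigma^2 \bar{P}_k^{-1})^{-1} \Phi_l^T y_l$, which makes explicit that it minimizes the first two terms of the direction-selection objective.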
The optimal direction among the directions represented by the artificial black-and-white images is calculated according to the following formula:
<math> <mrow> <msub> <mover> <mi>k</mi> <mo>~</mo> </mover> <mi>l</mi> </msub> <mo>=</mo> <mi>arg</mi> <mi> </mi> <munder> <mrow> <mi>m</mi> <mi>i</mi> <mi>n</mi> </mrow> <mi>k</mi> </munder> <mrow> <mo>(</mo> <mo>|</mo> <mo>|</mo> <msub> <mi>&Phi;</mi> <mi>l</mi> </msub> <msubsup> <mi>x</mi> <mi>l</mi> <mi>k</mi> </msubsup> <mo>-</mo> <msub> <mi>y</mi> <mi>l</mi> </msub> <mo>|</mo> <msup> <mo>|</mo> <mn>2</mn> </msup> <mo>+</mo> <msup> <mi>&sigma;</mi> <mn>2</mn> </msup> <msup> <mrow> <mo>(</mo> <msubsup> <mi>x</mi> <mi>l</mi> <mi>k</mi> </msubsup> <mo>)</mo> </mrow> <mi>T</mi> </msup> <msubsup> <mover> <mi>P</mi> <mo>&OverBar;</mo> </mover> <mi>k</mi> <mrow> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>(</mo> <msubsup> <mi>x</mi> <mi>l</mi> <mi>k</mi> </msubsup> <mo>)</mo> </mrow> <mo>+</mo> <msup> <mi>&sigma;</mi> <mn>2</mn> </msup> <mi>l</mi> <mi>o</mi> <mi>g</mi> <mo>|</mo> <msub> <mover> <mi>P</mi> <mo>&OverBar;</mo> </mover> <mi>k</mi> </msub> <mo>|</mo> <mo>)</mo> </mrow> </math>
where $\tilde{k}_l$ denotes the optimal direction of the $l$-th video block among the directions represented by the artificial black-and-white images; $\arg\min_k$ returns the value of $k$ at which the objective function is minimal; $\Phi_l$ denotes the observation matrix of the $l$-th video block, taken from the random mask observation matrix; $x_l^k$ denotes the estimate of the $l$-th video block in direction $k$; $y_l$ denotes the measurement data of the $l$-th video block, taken from the measurement data; $\bar{P}_k$ denotes the $k$-th video covariance matrix; $\sigma$ takes a value in the range 0 to 1; $\|\cdot\|^2$ denotes the squared norm; $|\cdot|$ denotes the determinant; $T$ denotes transposition; $(\cdot)^{-1}$ denotes matrix inversion; $k = 1, 2, \ldots, 18$ is the index of the direction represented by the artificial black-and-white image; $l = 1, 2, \ldots, S$ is the index of the video block; and $S$ is the number of video blocks, i.e., the number of image blocks into which each frame is divided.
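The direction selection can be illustrated with a small sketch that evaluates the objective for every candidate covariance and keeps the minimizer. The function name `select_direction`, the two toy diagonal covariances, and the identity observation matrix are assumptions chosen so the outcome is easy to verify; they are not the 18 direction covariances of the method. Indices are 0-based here, whereas the text counts directions from 1.

```python
import numpy as np

def select_direction(covs, Phi, y, sigma):
    """Return (best index, best estimate, all costs) over candidate covariances."""
    d = Phi.shape[0]
    costs, estimates = [], []
    for P in covs:
        G = Phi @ P @ Phi.T + sigma ** 2 * np.eye(d)
        x = P @ Phi.T @ np.linalg.solve(G, y)          # linear estimate for this direction
        resid = Phi @ x - y
        _, logdet = np.linalg.slogdet(P)               # log|P| computed stably
        cost = (resid @ resid
                + sigma ** 2 * (x @ np.linalg.solve(P, x))
                + sigma ** 2 * logdet)
        costs.append(cost)
        estimates.append(x)
    k_best = int(np.argmin(costs))
    return k_best, estimates[k_best], costs

# Toy setup (assumed): covariance 0 puts energy in the first four coordinates,
# covariance 1 in the last four; the signal lives in the last four.
eps = 1e-3
covs = [np.diag([1.0] * 4 + [eps] * 4), np.diag([eps] * 4 + [1.0] * 4)]
Phi = np.eye(8)
x_true = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
y = Phi @ x_true
k_best, x_best, costs = select_direction(covs, Phi, y, sigma=0.1)
```

In this toy case the second covariance matches the signal, so its cost is lower and it is selected.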
And combining the estimated values of each video block in the optimal direction into an initial reconstructed video according to the positions of the reserved blocks when the video blocks are divided.
And setting an iteration number s and a maximum external iteration number U, setting the current iteration number s as 1, and taking the initial reconstructed video as a reconstructed video of the first iteration, wherein U represents a positive integer.
Step 4. Search for the similar blocks of each video block using the correlation between and within video frames.
Each frame of the reconstructed video is divided into $n \times n$ blocks with step size $p$, and the blocks of all frames form the two-dimensional image block set $G_1$, where $p$ and $n$ are positive integers no greater than $\min(M, N)$, and $M$ and $N$ are the sizes of the first and second dimensions of the reconstructed video.
Each frame of the reconstructed video is divided into $n \times n$ blocks with step size 1, the blocks of all frames form the two-dimensional image block set $G_2$, and the index of each block in $G_2$ is recorded, where $n$ is a positive integer no greater than $\min(M, N)$, and $M$ and $N$ are the sizes of the first and second dimensions of the reconstructed video.
For each block in $G_1$, all blocks of $G_2$ that lie within a $Z \times Z \times H$ window around the block are taken out and marked as the neighbor blocks of that block, where $Z$ is the size of the first and second dimensions of the window, $H$ is the size of the third dimension of the window, $G_1$ is the set of two-dimensional image blocks obtained with step size $p$, and $G_2$ is the set of two-dimensional image blocks obtained with step size 1.
The Euclidean distance between each block in $G_1$ and each of its neighbor blocks is computed, the neighbor blocks are sorted by increasing distance, the first $Q$ blocks are selected as the similar blocks of that block, and the indices of these similar blocks in $G_2$ are recorded, where $Q$ is a positive integer less than half the number of neighbor blocks.
The indices in $G_2$ of the similar blocks of each block are denoted $E_{l_t} = \{l_t^1, \ldots, l_t^q, \ldots, l_t^Q\}$, where $E_{l_t}$ is the similar-block index set of the $l$-th block of the $t$-th frame; $l_t^q$ is the index of the $q$-th block similar to that block, $q = 1, 2, \ldots, Q$; $t = 1, 2, \ldots, H$ is the index of the video frame; $l = 1, 2, \ldots, S$ is the index of the video block; $H$ is the size of the third dimension of the reconstructed video; $S$ is the number of video blocks, i.e., the number of image blocks into which each frame is divided; and $Q$ is a positive integer less than half the number of neighbor blocks.
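The similar-block search for a single reference block can be sketched as below. The function name `find_similar_blocks`, the `(M, N, H)` array layout, the clipping of the window at the frame borders, and the use of top-left coordinates as block indices are assumptions for illustration. Squared distances are compared, which gives the same ordering as Euclidean distances.

```python
import numpy as np

def find_similar_blocks(video, i0, j0, t0, n, Z, Q):
    """Return the (i, j, t) positions of the Q blocks closest to a reference block.

    Candidates are all n x n blocks taken with step 1 inside a Z x Z x H
    window around the reference block (all H frames are searched).
    """
    M, N, H = video.shape
    ref = video[i0:i0 + n, j0:j0 + n, t0]
    half = Z // 2
    i_lo, i_hi = max(0, i0 - half), min(M - n, i0 + half)
    j_lo, j_hi = max(0, j0 - half), min(N - n, j0 + half)
    candidates = []
    for t in range(H):
        for i in range(i_lo, i_hi + 1):
            for j in range(j_lo, j_hi + 1):
                diff = video[i:i + n, j:j + n, t] - ref
                candidates.append((float(np.sum(diff ** 2)), i, j, t))
    candidates.sort(key=lambda c: c[0])                # ascending distance
    return [(i, j, t) for _, i, j, t in candidates[:Q]]

rng = np.random.default_rng(0)
video = rng.random((16, 16, 4))                        # toy video (assumed sizes)
similar = find_similar_blocks(video, i0=6, j0=6, t0=1, n=4, Z=6, Q=5)
```

Because the reference block is itself a candidate with distance zero, it appears first in the returned list for this random toy video.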
Step 5. Solve the low-rank structure of the video according to the following formula.
<math> <mrow> <msup> <msub> <mi>L</mi> <msub> <mi>l</mi> <mi>t</mi> </msub> </msub> <mrow> <mi>s</mi> <mo>+</mo> <mn>1</mn> </mrow> </msup> <mo>=</mo> <munder> <mrow> <mi>arg</mi> <mi> </mi> <mi>m</mi> <mi>i</mi> <mi>n</mi> </mrow> <msub> <mi>L</mi> <msub> <mi>l</mi> <mi>t</mi> </msub> </msub> </munder> <mo>|</mo> <mo>|</mo> <msub> <mover> <mi>R</mi> <mo>~</mo> </mover> <msub> <mi>l</mi> <mi>t</mi> </msub> </msub> <mrow> <mo>(</mo> <msup> <mi>X</mi> <mi>s</mi> </msup> <mo>)</mo> </mrow> <mo>-</mo> <msub> <mi>L</mi> <msub> <mi>l</mi> <mi>t</mi> </msub> </msub> <mo>|</mo> <msubsup> <mo>|</mo> <mi>F</mi> <mn>2</mn> </msubsup> <mo>+</mo> <mi>&lambda;</mi> <mo>|</mo> <mo>|</mo> <msub> <mi>L</mi> <msub> <mi>l</mi> <mi>t</mi> </msub> </msub> <mo>|</mo> <msub> <mo>|</mo> <mo>*</mo> </msub> </mrow> </math>
where $L_{l_t}^{s+1}$ denotes the low-rank structure of the $l$-th image block of the $t$-th frame at the $(s+1)$-th iteration; $\arg\min_{L_{l_t}}$ returns the value of the low-rank structure $L_{l_t}$ at which the objective function is minimal; $\tilde{R}_{l_t}$ denotes the extraction transform that extracts all similar blocks of the $l$-th image block of the $t$-th frame; $X^s$ denotes the reconstructed video at the $s$-th iteration; $\tilde{R}_{l_t}(X^s) = (R_{l_t^1}(X^s), \ldots, R_{l_t^q}(X^s), \ldots, R_{l_t^Q}(X^s))$ denotes the result of extracting, from $X^s$, all similar blocks of the $l$-th image block of the $t$-th frame; $R_{l_t^q}$ denotes the extraction matrix that extracts the $q$-th similar block of the $l$-th image block of the $t$-th frame; $R_{l_t^q}(X^s)$ denotes the $q$-th similar block of the $l$-th image block of the $t$-th frame extracted from $X^s$; $\lambda$ takes the value 0.75; $\|\cdot\|_F^2$ denotes the squared Frobenius norm; $\|\cdot\|_*$ denotes the nuclear norm; $H$ is the size of the third dimension of the reconstructed video; $t = 1, 2, \ldots, H$ is the index of the video frame; $l = 1, 2, \ldots, S$ is the index of the video block; $S$ is the number of video blocks, i.e., the number of image blocks into which each frame is divided; $q = 1, 2, \ldots, Q$; and $Q$ is the number of similar blocks.
Perform singular value decomposition (SVD) on $\tilde{R}_{l_t}(X^s)$: $\tilde{R}_{l_t}(X^s) = U \Lambda V^T$, where $U$ denotes the left unitary matrix, $V$ denotes the right unitary matrix, and $\Lambda$ denotes the singular value matrix.
Apply a soft-threshold operation to the singular value matrix to obtain <math> <mrow> <msub> <mover> <mi>&Lambda;</mi> <mo>~</mo> </mover> <mrow> <mi>i</mi> <mi>j</mi> </mrow> </msub> <mo>=</mo> <mfenced open = '{' close = ''> <mtable> <mtr> <mtd> <mrow> <msub> <mi>&Lambda;</mi> <mrow> <mi>i</mi> <mi>j</mi> </mrow> </msub> <mo>-</mo> <mi>&lambda;</mi> </mrow> </mtd> <mtd> <mrow> <msub> <mi>&Lambda;</mi> <mrow> <mi>i</mi> <mi>j</mi> </mrow> </msub> <mo>&gt;</mo> <mi>&lambda;</mi> </mrow> </mtd> </mtr> <mtr> <mtd> <mn>0</mn> </mtd> <mtd> <mrow> <msub> <mi>&Lambda;</mi> <mrow> <mi>i</mi> <mi>j</mi> </mrow> </msub> <mo>&le;</mo> <mi>&lambda;</mi> </mrow> </mtd> </mtr> </mtable> </mfenced> <mo>,</mo> </mrow> </math> where $\Lambda_{ij}$ denotes the entry in row $i$ and column $j$ of the matrix $\Lambda$.
Calculate the low-rank structure for the $(s+1)$-th iteration: $L_{l_t}^{s+1} = U \tilde{\Lambda} V^T$.
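The SVD and soft-threshold steps above can be sketched with NumPy as follows. The function name `svt` and the toy matrix (a rank-one matrix plus small noise, standing in for a group of similar blocks arranged as columns) are assumptions for illustration only.

```python
import numpy as np

def svt(R, lam):
    """Soft-threshold the singular values of R by lam and rebuild the matrix."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    s_thr = np.maximum(s - lam, 0.0)                   # Lambda_ij - lam if > lam, else 0
    return (U * s_thr) @ Vt                            # U diag(s_thr) V^T

rng = np.random.default_rng(0)
# Toy group matrix: 64-pixel blocks as rows, 10 similar blocks as columns (assumed layout).
R = np.outer(rng.standard_normal(64), rng.standard_normal(10))
R += 0.01 * rng.standard_normal((64, 10))              # small perturbation
lam = 0.75                                             # value of lambda used in the simulation
L = svt(R, lam)

s_in = np.linalg.svd(R, compute_uv=False)
s_out = np.linalg.svd(L, compute_uv=False)
```

Here `s_out` equals `s_in` shifted down by `lam` and clipped at zero, so the small singular values caused by the perturbation are removed and only the dominant component remains.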
Step 6. Update the reconstructed video according to the following formula.
<math> <mrow> <msup> <mi>X</mi> <mrow> <mi>s</mi> <mo>+</mo> <mn>1</mn> </mrow> </msup> <mo>=</mo> <munder> <mrow> <mi>arg</mi> <mi> </mi> <mi>m</mi> <mi>i</mi> <mi>n</mi> </mrow> <mi>X</mi> </munder> <mo>|</mo> <mo>|</mo> <mi>y</mi> <mo>-</mo> <mi>&Phi;</mi> <mi>X</mi> <mo>|</mo> <msubsup> <mo>|</mo> <mn>2</mn> <mn>2</mn> </msubsup> <mo>+</mo> <mi>&eta;</mi> <munder> <mo>&Sigma;</mo> <mi>t</mi> </munder> <munder> <mo>&Sigma;</mo> <mi>l</mi> </munder> <mo>|</mo> <mo>|</mo> <msub> <mover> <mi>R</mi> <mo>~</mo> </mover> <msub> <mi>l</mi> <mi>t</mi> </msub> </msub> <mrow> <mo>(</mo> <mi>X</mi> <mo>)</mo> </mrow> <mo>-</mo> <msubsup> <mi>L</mi> <msub> <mi>l</mi> <mi>t</mi> </msub> <mrow> <mi>s</mi> <mo>+</mo> <mn>1</mn> </mrow> </msubsup> <mo>|</mo> <msubsup> <mo>|</mo> <mi>F</mi> <mn>2</mn> </msubsup> </mrow> </math>
where $X^{s+1}$ denotes the reconstructed video at the $(s+1)$-th iteration; $\arg\min_X$ returns the value of the reconstructed video $X$ at which the objective function is minimal; $y$ denotes the one-dimensional vector formed from the measurement data; $\Phi$ denotes the video observation matrix generated from the random mask observation matrix; $\tilde{R}_{l_t}$ denotes the extraction transform that extracts all similar blocks of the $l$-th image block of the $t$-th frame; $X$ denotes the reconstructed video; $\tilde{R}_{l_t}(X) = (R_{l_t^1}(X), \ldots, R_{l_t^q}(X), \ldots, R_{l_t^Q}(X))$ denotes the result of extracting, from $X$, all similar blocks of the $l$-th image block of the $t$-th frame; $R_{l_t^q}$ denotes the extraction matrix that extracts the $q$-th similar block of the $l$-th image block of the $t$-th frame; $R_{l_t^q}(X)$ denotes the $q$-th similar block of the $l$-th image block of the $t$-th frame extracted from $X$; $L_{l_t}^{s+1}$ denotes the low-rank structure of the $l$-th image block of the $t$-th frame at the $(s+1)$-th iteration; $\eta$ takes the value 1; $\Sigma$ denotes summation; $\|\cdot\|_2^2$ denotes the squared 2-norm; $\|\cdot\|_F^2$ denotes the squared Frobenius norm; $H$ is the size of the third dimension of the reconstructed video; $t = 1, 2, \ldots, H$ is the index of the video frame; $l = 1, 2, \ldots, S$ is the index of the video block; $S$ is the number of video blocks, i.e., the number of image blocks into which each frame is divided; $q = 1, 2, \ldots, Q$; and $Q$ is the number of similar blocks.
Solution of reconstructed video for s +1 th iteration: <math> <mrow> <msup> <mi>X</mi> <mrow> <mi>s</mi> <mo>+</mo> <mn>1</mn> </mrow> </msup> <mo>=</mo> <msup> <mrow> <mo>(</mo> <msup> <mi>&Phi;</mi> <mi>T</mi> </msup> <mi>&Phi;</mi> <mo>+</mo> <mi>&eta;</mi> <munder> <mo>&Sigma;</mo> <mi>t</mi> </munder> <munder> <mo>&Sigma;</mo> <mi>l</mi> </munder> <msubsup> <mover> <mi>R</mi> <mo>~</mo> </mover> <msub> <mi>l</mi> <mi>t</mi> </msub> <mi>T</mi> </msubsup> <msub> <mover> <mi>R</mi> <mo>~</mo> </mover> <msub> <mi>l</mi> <mi>t</mi> </msub> </msub> <mo>)</mo> </mrow> <mrow> <mo>-</mo> <mn>1</mn> </mrow> </msup> <mrow> <mo>(</mo> <msup> <mi>&Phi;</mi> <mi>T</mi> </msup> <mi>y</mi> <mo>+</mo> <mi>&eta;</mi> <munder> <mo>&Sigma;</mo> <mi>t</mi> </munder> <munder> <mo>&Sigma;</mo> <mi>l</mi> </munder> <msubsup> <mover> <mi>R</mi> <mo>~</mo> </mover> <msub> <mi>l</mi> <mi>t</mi> </msub> <mi>T</mi> </msubsup> <msubsup> <mi>L</mi> <msub> <mi>l</mi> <mi>t</mi> </msub> <mrow> <mi>s</mi> <mo>+</mo> <mn>1</mn> </mrow> </msubsup> <mo>)</mo> </mrow> </mrow> </math>
wherein, <math> <mrow> <msubsup> <mover> <mi>R</mi> <mo>~</mo> </mover> <msub> <mi>l</mi> <mi>t</mi> </msub> <mi>T</mi> </msubsup> <msub> <mover> <mi>R</mi> <mo>~</mo> </mover> <msub> <mi>l</mi> <mi>t</mi> </msub> </msub> <mo>=</mo> <munder> <mo>&Sigma;</mo> <mrow> <msubsup> <mi>l</mi> <mi>t</mi> <mi>q</mi> </msubsup> <mo>&Element;</mo> <msub> <mi>E</mi> <msub> <mi>l</mi> <mi>t</mi> </msub> </msub> </mrow> </munder> <msubsup> <mi>R</mi> <msubsup> <mi>l</mi> <mi>t</mi> <mi>q</mi> </msubsup> <mi>T</mi> </msubsup> <msub> <mi>R</mi> <msubsup> <mi>l</mi> <mi>t</mi> <mi>q</mi> </msubsup> </msub> <mo>,</mo> <msub> <mover> <mi>R</mi> <mo>~</mo> </mover> <msub> <mi>l</mi> <mi>t</mi> </msub> </msub> <mo>=</mo> <mrow> <mo>(</mo> <msub> <mi>R</mi> <msubsup> <mi>l</mi> <mi>t</mi> <mn>1</mn> </msubsup> </msub> <mo>,</mo> <mo>...</mo> <mo>,</mo> <msub> <mi>R</mi> <msubsup> <mi>l</mi> <mi>t</mi> <mi>q</mi> </msubsup> </msub> <mo>,</mo> <mo>...</mo> <msub> <mi>R</mi> <msubsup> <mi>l</mi> <mi>t</mi> <mi>Q</mi> </msubsup> </msub> <mo>)</mo> </mrow> <mo>.</mo> </mrow> </math>
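The closed-form solution can be checked with a short derivation, sketched here under the assumption that $X$ is treated as a vector and each $\tilde{R}_{l_t}$ as a linear operator acting on it. Setting the gradient of the objective to zero gives the normal equations:

```latex
\begin{aligned}
J(X) &= \|y - \Phi X\|_2^2 + \eta \sum_t \sum_l \|\tilde{R}_{l_t} X - L_{l_t}^{s+1}\|_F^2 \\
\nabla_X J &= -2\,\Phi^T (y - \Phi X) + 2\eta \sum_t \sum_l \tilde{R}_{l_t}^T \bigl(\tilde{R}_{l_t} X - L_{l_t}^{s+1}\bigr) = 0 \\
\Bigl(\Phi^T \Phi + \eta \sum_t \sum_l \tilde{R}_{l_t}^T \tilde{R}_{l_t}\Bigr) X
  &= \Phi^T y + \eta \sum_t \sum_l \tilde{R}_{l_t}^T L_{l_t}^{s+1}
\end{aligned}
```

Because each $R_{l_t^q}$ only selects pixels, $R_{l_t^q}^T R_{l_t^q}$ is a diagonal matrix of zeros and ones, so $\sum_t \sum_l \tilde{R}_{l_t}^T \tilde{R}_{l_t}$ is diagonal and counts how many extracted blocks cover each pixel. The system matrix is therefore invertible whenever every pixel is covered by at least one extracted block, which is an assumption about the block layout rather than something stated in the text.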
Step 7. Determine whether the current iteration number is greater than the maximum number of outer iterations; if so, go to step 8; otherwise, increase the current iteration number by 1 and return to step 4.
Step 8. Output the reconstructed video.
The invention is further described below in conjunction with the simulation diagrams.
1. Simulation conditions are as follows:
The simulation was run on an Intel(R) Core(TM) i5-3470 CPU at 3.20 GHz under the 32-bit Windows 7 operating system, using MATLAB R2011b as the simulation software. The simulation parameters are set as follows.
The experiment uses a 96-frame traffic vehicle video of 256 × 256 pixels, and every H = 8 frames of video data are observed once. Each element $A_{i,j,n}$ of the random mask observation matrix $A$ is randomly set to 1 with probability r = 0.5. The block size is n × n = 8 × 8, the step size is p = 4, the similar-block search window is Z × Z × H = 50 × 50 × 8, the number of similar blocks is Q = 100, the maximum number of iterations is U = 10, and the parameters are λ = 0.75 and η = 1.
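A minimal sketch of generating such a random mask and one frame of measurement data is given below. The summation model (each measurement pixel is the sum over the H frames of mask times video) is an assumption; the text states only that H frames are observed once with a random mask to form one frame of measurement data. The function name `observe` and the small toy sizes are also assumptions.

```python
import numpy as np

def observe(video, mask):
    """Collapse an (M, N, H) video into one (M, N) measurement frame.

    Assumed model: y[i, j] = sum over t of mask[i, j, t] * video[i, j, t].
    """
    return np.sum(mask * video, axis=2)

rng = np.random.default_rng(0)
M, N, H, r = 16, 16, 8, 0.5                            # toy frame size; H and r as in the text
video = rng.random((M, N, H))
A = (rng.random((M, N, H)) < r).astype(float)          # each mask element is 1 with probability r
y = observe(video, A)
```

With an all-ones mask the measurement reduces to the plain sum of the H frames, which is a quick sanity check of the sketch.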
2. Simulation content and result analysis:
Under the above simulation conditions, the 96-frame traffic vehicle video is reconstructed with three comparison methods:
Comparison method 1 reconstructs the 96-frame traffic vehicle video using the prior-art Gaussian mixture model method.
Comparison method 2 constructs an initial reconstructed video using steps 1, 2, and 3 of the present invention and takes this initial reconstructed video as the reconstruction result.
Comparison method 3 constructs the initial reconstructed video using the prior-art Gaussian mixture model method in place of step 3 of the present invention, and then reconstructs the 96-frame traffic vehicle video.
The visual results of the three comparison methods and of the method of the present invention are shown in Fig. 2. Figs. 2(a) to 2(d) are the original images of frames 1 to 4 of the 96-frame traffic vehicle video. Figs. 2(e) to 2(h) are frames 1 to 4 of the traffic vehicle video reconstructed by comparison method 1, Figs. 2(i) to 2(l) are frames 1 to 4 reconstructed by comparison method 2, Figs. 2(m) to 2(p) are frames 1 to 4 reconstructed by comparison method 3, and Figs. 2(q) to 2(t) are frames 1 to 4 reconstructed by the method of the present invention.
As can be seen from the reconstructed images, the images reconstructed by the method of the present invention show noticeably less noise near edges, and their visual quality is better than that of the three comparison methods.
The peak signal-to-noise ratio (PSNR) values obtained when the first eight frames of the traffic vehicle video are reconstructed by the method of the present invention and by the three comparison methods are shown in Table 1.
As can be seen from Table 1, the PSNR of the video reconstructed by the method of the present invention is higher than that of the three comparison methods, which indicates better reconstruction quality.
Table 1. Reconstruction results for the first eight frames of the traffic vehicle video using different methods
Fig. 3 is a line chart of the PSNR values obtained when the 96-frame video is reconstructed by the three comparison methods and by the method of the present invention. The abscissa in Fig. 3 represents the frames of the traffic vehicle video, and the ordinate represents the PSNR value in dB. The dashed line with asterisks shows the PSNR values of the video reconstructed by comparison method 1, the solid line with asterisks shows those of comparison method 2, the dashed line with circles shows those of comparison method 3, and the solid line with circles shows those of the present invention.
As can be seen from fig. 3, the PSNR value of the reconstructed result graph of each frame obtained by the method of the present invention is significantly higher than that obtained by other methods.
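For reference, per-frame PSNR values such as those plotted in Fig. 3 are commonly computed as sketched below. The peak value of 255 (8-bit images), the function names, and the `(M, N, H)` array layout are assumptions; the text does not state how its PSNR values were computed.

```python
import numpy as np

def psnr(reference, reconstruction, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal size."""
    ref = np.asarray(reference, dtype=np.float64)
    rec = np.asarray(reconstruction, dtype=np.float64)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")                            # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))

def per_frame_psnr(video_ref, video_rec, peak=255.0):
    """PSNR of each frame of an (M, N, H) video."""
    return [psnr(video_ref[:, :, t], video_rec[:, :, t], peak)
            for t in range(video_ref.shape[2])]

# Toy check: a constant error of 1 gray level gives MSE = 1.
ref = np.zeros((4, 4, 2))
rec = ref + 1.0
values = per_frame_psnr(ref, rec)
```

With a mean squared error of 1 and a peak of 255, each frame's PSNR equals 20·log10(255) dB.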
In conclusion, the method of the present invention obtains clear reconstructed videos and improves video reconstruction quality compared with other existing reconstruction methods.

Claims (4)

1. A compressed sensing video reconstruction method based on a low-rank model comprises the following steps:
(1) receiving measurement data:
(1a) the compressed sensing sender observes the video data, using a random mask observation matrix to observe every H frames of video data once, the result of each observation forming one frame of measurement data, and sends the measurement data and the random mask observation matrix, wherein H is a positive integer in the range 1 to 20;
(1b) the receiver receives the measurement data and the random mask observation matrix sent by the sender;
(2) initializing a single-frame covariance matrix set:
(2a) generating 18 artificial black-and-white images, wherein the size of each artificial black-and-white image is 65 multiplied by 65 pixels, and each artificial black-and-white image represents one direction;
(2b) adopting 8 × 8 pixels in window size, and respectively sliding a window on the artificial black-and-white image in each direction to select all blocks with the size of 8 × 8 pixels in a step length of 1 pixel to obtain a direction block set in each direction;
(2c) respectively carrying out Principal Component Analysis (PCA) decomposition on the direction block set of each direction to obtain Principal Component Analysis (PCA) orthogonal bases and eigenvalue matrixes, and reserving the first 8 maximum eigenvalues and corresponding principal component orthogonal bases in each direction to obtain corresponding eigenvalue matrixes and direction bases;
(2d) calculating a single-frame covariance matrix in the direction represented by each artificial black-and-white image to obtain a single-frame covariance matrix set;
(3) constructing an initial reconstruction video based on joint sparse and Gaussian distributed piecewise linear estimation:
(3a) for the direction represented by each artificial black-and-white image, placing H copies of the single-frame covariance matrix of that direction on the diagonal of a matrix to construct a jointly sparse video covariance matrix for reconstructing three-dimensional video data in that direction, the video covariance matrix of the k-th direction being:
$\bar{P}_k = \mathrm{diag}(P_k, \ldots, P_k)$
wherein $\bar{P}_k$ denotes the video covariance matrix of the $k$-th direction, $P_k$ denotes the single-frame covariance matrix of the $k$-th direction and appears $H$ times on the diagonal, and $k = 1, 2, \ldots, 18$ is the index of the direction represented by the artificial black-and-white image;
(3b) initializing the $M \times N \times H$ reconstructed video as a zero matrix, dividing each frame of the initial reconstructed video into $S$ image blocks of size $n \times n$ with step size $p$, recording the positions of the blocks, and combining the image blocks at the same position in every frame into a video block to obtain the video block set $\{x_1, \ldots, x_l, \ldots, x_S\}$, wherein $x_l = [x_{l_1}^T, \ldots, x_{l_t}^T, \ldots, x_{l_H}^T]^T$ denotes the $l$-th video block, $x_{l_t}$ denotes the $l$-th image block, of size $n \times n$, in the $t$-th frame of the reconstructed video, and $T$ denotes transposition; the $l$-th video block in direction $k$ represented by the artificial black-and-white image, $x_l^k$, follows a Gaussian distribution with mean 0 and covariance matrix $\bar{P}_k$, wherein $\bar{P}_k$ denotes the video covariance matrix of the $k$-th direction; $M$, $N$, and $H$ denote the sizes of the first, second, and third dimensions of the reconstructed video; $p$ and $n$ denote positive integers no greater than $\min(M, N)$; $k = 1, 2, \ldots, 18$ is the index of the direction represented by the artificial black-and-white image; $l = 1, 2, \ldots, S$; and $S$ denotes the number of video blocks, i.e., the number of image blocks into which each frame is divided;
(3c) the estimated value of the video block in the direction represented in the artificial black-and-white image is calculated according to the following formula:
<math> <mrow> <msubsup> <mi>x</mi> <mi>l</mi> <mi>k</mi> </msubsup> <mo>=</mo> <msub> <mover> <mi>P</mi> <mo>&OverBar;</mo> </mover> <mi>k</mi> </msub> <msup> <msub> <mi>&Phi;</mi> <mi>l</mi> </msub> <mi>T</mi> </msup> <msup> <mrow> <mo>(</mo> <msub> <mi>&Phi;</mi> <mi>l</mi> </msub> <msub> <mover> <mi>P</mi> <mo>&OverBar;</mo> </mover> <mi>k</mi> </msub> <msup> <msub> <mi>&Phi;</mi> <mi>l</mi> </msub> <mi>T</mi> </msup> <mo>+</mo> <msup> <mi>&sigma;</mi> <mn>2</mn> </msup> <msub> <mi>I</mi> <mi>d</mi> </msub> <mo>)</mo> </mrow> <mrow> <mo>-</mo> <mn>1</mn> </mrow> </msup> <msub> <mi>y</mi> <mi>l</mi> </msub> </mrow> </math>
wherein $x_l^k$ denotes the estimate of the $l$-th video block in direction $k$ represented by the artificial black-and-white image; $\bar{P}_k$ denotes the $k$-th video covariance matrix; $\Phi_l$ denotes the observation matrix of the $l$-th video block, taken from the random mask observation matrix; $y_l$ denotes the measurement data of the $l$-th video block, taken from the measurement data; $\sigma$ takes a value in the range 0 to 1; $I_d$ denotes the $d$-dimensional identity matrix; $T$ denotes transposition; $(\cdot)^{-1}$ denotes matrix inversion; $k = 1, 2, \ldots, 18$ is the index of the direction represented by the artificial black-and-white image; $l = 1, 2, \ldots, S$ is the index of the video block; and $S$ denotes the number of video blocks, i.e., the number of image blocks into which each frame is divided;
(3d) calculating the optimal direction among the directions represented by the artificial black-and-white images according to the following formula:
<math> <mrow> <msub> <mover> <mi>k</mi> <mo>~</mo> </mover> <mi>l</mi> </msub> <mo>=</mo> <mi>arg</mi> <mi> </mi> <munder> <mrow> <mi>m</mi> <mi>i</mi> <mi>n</mi> </mrow> <mi>k</mi> </munder> <mrow> <mo>(</mo> <mo>|</mo> <mo>|</mo> <msub> <mi>&Phi;</mi> <mi>l</mi> </msub> <msubsup> <mi>x</mi> <mi>l</mi> <mi>k</mi> </msubsup> <mo>-</mo> <msub> <mi>y</mi> <mi>l</mi> </msub> <mo>|</mo> <msup> <mo>|</mo> <mn>2</mn> </msup> <mo>+</mo> <msup> <mi>&sigma;</mi> <mn>2</mn> </msup> <msup> <mrow> <mo>(</mo> <msubsup> <mi>x</mi> <mi>l</mi> <mi>k</mi> </msubsup> <mo>)</mo> </mrow> <mi>T</mi> </msup> <msubsup> <mover> <mi>P</mi> <mo>&OverBar;</mo> </mover> <mi>k</mi> <mrow> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mo>(</mo> <msubsup> <mi>x</mi> <mi>l</mi> <mi>k</mi> </msubsup> <mo>)</mo> </mrow> <mo>+</mo> <msup> <mi>&sigma;</mi> <mn>2</mn> </msup> <mi>l</mi> <mi>o</mi> <mi>g</mi> <mo>|</mo> <msub> <mover> <mi>P</mi> <mo>&OverBar;</mo> </mover> <mi>k</mi> </msub> <mo>|</mo> <mo>)</mo> </mrow> </math>
wherein $\tilde{k}_l$ denotes the optimal direction of the $l$-th video block among the directions represented by the artificial black-and-white images; $\arg\min_k$ returns the value of $k$ at which the objective function is minimal; $\Phi_l$ denotes the observation matrix of the $l$-th video block, taken from the random mask observation matrix; $x_l^k$ denotes the estimate of the $l$-th video block in direction $k$; $y_l$ denotes the measurement data of the $l$-th video block, taken from the measurement data; $\bar{P}_k$ denotes the $k$-th video covariance matrix; $\sigma$ takes a value in the range 0 to 1; $\|\cdot\|^2$ denotes the squared norm; $|\cdot|$ denotes the determinant; $T$ denotes transposition; $(\cdot)^{-1}$ denotes matrix inversion; $k = 1, 2, \ldots, 18$ is the index of the direction represented by the artificial black-and-white image; $l = 1, 2, \ldots, S$ is the index of the video block; and $S$ denotes the number of video blocks, i.e., the number of image blocks into which each frame is divided;
(3e) combining the estimated value of each video block in the optimal direction into an initial reconstructed video according to the positions of the blocks reserved when the video blocks are divided in the step (3 b);
(3f) setting iteration times s and a maximum external iteration time U, setting the current iteration time s to be 1, and taking an initial reconstructed video as a reconstructed video of a first iteration, wherein U represents a positive integer;
(4) searching for the similar blocks of each image block using the correlation between and within video frames:
(4a) dividing each frame of the reconstructed video into $n \times n$ blocks with step size $p$, the blocks of all frames forming the two-dimensional image block set $G_1$, wherein $p$ and $n$ denote positive integers no greater than $\min(M, N)$, and $M$ and $N$ denote the sizes of the first and second dimensions of the reconstructed video;
(4b) dividing each frame of the reconstructed video into $n \times n$ blocks with step size 1, the blocks of all frames forming the two-dimensional image block set $G_2$, and recording the index of each block in $G_2$, wherein $n$ denotes a positive integer no greater than $\min(M, N)$, and $M$ and $N$ denote the sizes of the first and second dimensions of the reconstructed video;
(4c) for each block in $G_1$, taking out from $G_2$ all blocks within a $Z \times Z \times H$ window around the block and marking them as the neighbor blocks of that block, wherein $Z$ denotes the size of the first and second dimensions of the window, $H$ denotes the size of the third dimension of the window, $G_1$ denotes the set of two-dimensional image blocks obtained with step size $p$, and $G_2$ denotes the set of two-dimensional image blocks obtained with step size 1;
(4d) computing the Euclidean distance between each block in the two-dimensional image block set $G_1$ and each of its neighbor blocks, sorting the neighbor blocks by increasing distance, selecting the first $Q$ blocks as the similar blocks of that block, and recording the indices of these similar blocks in $G_2$, wherein $Q$ denotes a positive integer less than half the number of neighbor blocks;
(5) solving the low-rank structure of the video according to the following formula:
<math> <mrow> <msup> <msub> <mi>L</mi> <msub> <mi>l</mi> <mi>t</mi> </msub> </msub> <mrow> <mi>s</mi> <mo>+</mo> <mn>1</mn> </mrow> </msup> <mo>=</mo> <munder> <mrow> <mi>arg</mi> <mi> </mi> <mi>m</mi> <mi>i</mi> <mi>n</mi> </mrow> <msub> <mi>L</mi> <msub> <mi>l</mi> <mi>t</mi> </msub> </msub> </munder> <mo>|</mo> <mo>|</mo> <msub> <mover> <mi>R</mi> <mo>~</mo> </mover> <msub> <mi>l</mi> <mi>t</mi> </msub> </msub> <mrow> <mo>(</mo> <msup> <mi>X</mi> <mi>s</mi> </msup> <mo>)</mo> </mrow> <mo>-</mo> <msub> <mi>L</mi> <msub> <mi>l</mi> <mi>t</mi> </msub> </msub> <mo>|</mo> <msubsup> <mo>|</mo> <mi>F</mi> <mn>2</mn> </msubsup> <mo>+</mo> <mi>&lambda;</mi> <mo>|</mo> <mo>|</mo> <msub> <mi>L</mi> <msub> <mi>l</mi> <mi>t</mi> </msub> </msub> <mo>|</mo> <msub> <mo>|</mo> <mo>*</mo> </msub> </mrow> </math>
wherein $L_{l_t}^{s+1}$ denotes the low-rank structure of the $l$-th image block of the $t$-th frame at the $(s+1)$-th iteration; $\arg\min_{L_{l_t}}$ returns the value of the low-rank structure $L_{l_t}$ at which the objective function is minimal; $\tilde{R}_{l_t}$ denotes the extraction transform that extracts all similar blocks of the $l$-th image block of the $t$-th frame; $X^s$ denotes the reconstructed video at the $s$-th iteration; $\tilde{R}_{l_t}(X^s) = (R_{l_t^1}(X^s), \ldots, R_{l_t^q}(X^s), \ldots, R_{l_t^Q}(X^s))$ denotes the result of extracting, from $X^s$, all similar blocks of the $l$-th image block of the $t$-th frame; $R_{l_t^q}$ denotes the extraction matrix that extracts the $q$-th similar block of the $l$-th image block of the $t$-th frame; $R_{l_t^q}(X^s)$ denotes the $q$-th similar block of the $l$-th image block of the $t$-th frame extracted from $X^s$; $\lambda$ takes the value 0.75; $\|\cdot\|_F^2$ denotes the squared Frobenius norm; $\|\cdot\|_*$ denotes the nuclear norm; $H$ denotes the size of the third dimension of the reconstructed video; $t = 1, 2, \ldots, H$ is the index of the video frame; $l = 1, 2, \ldots, S$ is the index of the video block; $S$ denotes the number of video blocks, i.e., the number of image blocks into which each frame is divided; $q = 1, 2, \ldots, Q$; and $Q$ denotes the number of similar blocks;
(6) the reconstructed video is updated as follows:
<math> <mrow> <msup> <mi>X</mi> <mrow> <mi>s</mi> <mo>+</mo> <mn>1</mn> </mrow> </msup> <mo>=</mo> <munder> <mrow> <mi>arg</mi> <mi> </mi> <mi>m</mi> <mi>i</mi> <mi>n</mi> </mrow> <mi>X</mi> </munder> <mo>|</mo> <mo>|</mo> <mi>y</mi> <mo>-</mo> <mi>&Phi;</mi> <mi>X</mi> <mo>|</mo> <msubsup> <mo>|</mo> <mn>2</mn> <mn>2</mn> </msubsup> <mo>+</mo> <mi>&eta;</mi> <munder> <mo>&Sigma;</mo> <mi>t</mi> </munder> <munder> <mo>&Sigma;</mo> <mi>l</mi> </munder> <mo>|</mo> <mo>|</mo> <msub> <mover> <mi>R</mi> <mo>~</mo> </mover> <msub> <mi>l</mi> <mi>t</mi> </msub> </msub> <mrow> <mo>(</mo> <mi>X</mi> <mo>)</mo> </mrow> <mo>-</mo> <msubsup> <mi>L</mi> <msub> <mi>l</mi> <mi>t</mi> </msub> <mrow> <mi>s</mi> <mo>+</mo> <mn>1</mn> </mrow> </msubsup> <mo>|</mo> <msubsup> <mo>|</mo> <mi>F</mi> <mn>2</mn> </msubsup> </mrow> </math>
wherein $X^{s+1}$ denotes the reconstructed video of the (s+1)-th iteration; $\arg\min_{X}$ denotes taking the value of the reconstructed video X at which the objective function is minimal; y denotes the measurement data rearranged into a one-dimensional vector; Φ denotes the video observation matrix generated from the random mask observation matrix; $\tilde{R}_{l_t}$ denotes the extraction transform for all similar blocks of the l-th image block of the t-th frame; X denotes the reconstructed video; $\tilde{R}_{l_t}(X) = (R_{l_t}^{1}(X), \ldots, R_{l_t}^{q}(X), \ldots, R_{l_t}^{Q}(X))$ denotes the extraction, from the reconstructed video X, of all similar blocks of the l-th image block of the t-th frame; $R_{l_t}^{q}$ denotes the extraction matrix for the q-th similar block of the l-th image block of the t-th frame; $R_{l_t}^{q}(X)$ denotes the q-th similar block of the l-th image block of the t-th frame extracted from X; $L_{l_t}^{s+1}$ denotes the low-rank structure of the l-th image block of the t-th frame at the (s+1)-th iteration; η takes the value 1; Σ denotes summation; $\|\cdot\|_2^2$ denotes the squared 2-norm; $\|\cdot\|_F^2$ denotes the squared Frobenius norm; H denotes the size of the third dimension of the reconstructed video; t denotes the video frame index, t = 1, 2, ..., H; l denotes the image block index, l = 1, 2, ..., S, where S denotes the number of image blocks into which each frame is divided; and q = 1, 2, ..., Q, where Q denotes the number of similar blocks;
(7) determining whether the current iteration count exceeds the maximum number of outer iterations; if so, executing step (8); otherwise, incrementing the current iteration count by 1 and executing step (4);
(8) outputting the reconstructed video.
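The alternation between the low-rank update, the video update of step (6), and the iteration test of step (7) can be illustrated on a toy problem. The sketch below is not the patented implementation. It assumes that the low-rank update minimizes a squared Frobenius distance plus 0.75 times the nuclear norm (solved here by singular value soft-thresholding), that the extraction transform is the identity on a single group of similar blocks, and that Φ is a small dense random matrix. The function names `svt`, `update_video`, `objective` and `reconstruct`, the problem sizes, and the initial estimate are hypothetical; the return to step (4), which lies outside this excerpt, is omitted.

```python
import numpy as np

def svt(Z, lam):
    """Solve argmin_L ||Z - L||_F^2 + lam * ||L||_* (assumed form of the
    low-rank update) by soft-thresholding the singular values at lam / 2."""
    U, sig, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(sig - lam / 2.0, 0.0)) @ Vt

def update_video(y, Phi, L, eta):
    """Solve argmin_X ||y - Phi vec(X)||_2^2 + eta * ||X - L||_F^2
    through its normal equations (identity extraction assumed)."""
    A = Phi.T @ Phi + eta * np.eye(Phi.shape[1])
    b = Phi.T @ y + eta * L.ravel()
    return np.linalg.solve(A, b).reshape(L.shape)

def objective(y, Phi, X, L, lam, eta):
    """Joint objective that both updates decrease in this toy setting."""
    fit = np.sum((y - Phi @ X.ravel()) ** 2)
    return fit + eta * (np.sum((X - L) ** 2) + lam * np.linalg.norm(L, ord="nuc"))

def reconstruct(y, Phi, shape, lam=0.75, eta=1.0, max_outer=20):
    X = (Phi.T @ y).reshape(shape)  # crude initial estimate (assumption)
    history = []
    for _ in range(max_outer):      # outer iteration limit, as in step (7)
        L = svt(X, lam)             # low-rank update
        X = update_video(y, Phi, L, eta)  # video update, as in step (6)
        history.append(objective(y, Phi, X, L, lam, eta))
    return X, history               # output, as in step (8)

# Toy data: one group of Q similar blocks with n pixels each, true rank 2.
rng = np.random.default_rng(0)
n, Q, rank, m = 16, 8, 2, 96
X_true = rng.standard_normal((n, rank)) @ rng.standard_normal((rank, Q))
Phi = rng.standard_normal((m, n * Q)) / np.sqrt(m)
y = Phi @ X_true.ravel()

X_rec, history = reconstruct(y, Phi, X_true.shape)
```

Because each update solves its subproblem exactly, the values in `history` should not increase from one outer iteration to the next; this is a property of the toy setup and says nothing about reconstruction quality on real video.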
2. The low-rank model-based compressed sensing video reconstruction method of claim 1, wherein: in step (2a), the boundary between the black area and the white area of each artificial black-and-white image passes through the center coordinates (33,33), and the boundary angles of the 18 artificial black-and-white images are sampled uniformly from 0 to 180 degrees.
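As an illustration of such images, the following sketch generates 18 binary images whose black/white boundary is a straight line through the pixel at coordinates (33,33). Several details are assumptions not stated in the claim: an image size of 65 × 65 (so that (33,33) is the center in 1-based coordinates), angles of 0, 10, ..., 170 degrees (a half-open sampling of the range 0 to 180), and the choice of which side is white. The function name `edge_image` is hypothetical.

```python
import numpy as np

def edge_image(angle_deg, size=65, center=(33, 33)):
    """Binary image (1 = white, 0 = black) split by a straight line that
    passes through `center` (1-based row, column) at the given angle."""
    rows, cols = np.mgrid[1:size + 1, 1:size + 1]
    theta = np.deg2rad(angle_deg)
    # Signed distance of every pixel from the line through the center
    # whose direction is (cos theta, sin theta) in (column, row) axes.
    signed = (cols - center[1]) * np.sin(theta) - (rows - center[0]) * np.cos(theta)
    return (signed >= 0).astype(np.uint8)

# 18 boundary angles sampled uniformly over [0, 180) degrees (assumption).
angles = np.arange(18) * 10.0
images = [edge_image(a) for a in angles]
```

Pixels lying exactly on the boundary line, including the center pixel, are assigned to the white side in this sketch.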
3. The low-rank model-based compressed sensing video reconstruction method of claim 1, wherein: the principal component analysis (PCA) decomposition in step (2c) comprises the following steps:
step 1, selecting one direction from all directions of the artificial black-and-white image, and solving a covariance matrix of a direction block set in the selected direction according to the following formula:
<math> <mrow> <mi>P</mi> <mo>=</mo> <mi>E</mi> <mrow> <mo>&lsqb;</mo> <mrow> <msub> <mi>f</mi> <mi>i</mi> </msub> <msup> <msub> <mi>f</mi> <mi>i</mi> </msub> <mi>T</mi> </msup> </mrow> <mo>&rsqb;</mo> </mrow> </mrow> </math>
where P represents the covariance matrix of the direction block set of the selected direction, E represents the mathematical expectation, $f_i$ represents the i-th block in the direction block set of the selected direction, and T represents the transpose operation;
step 2, diagonalizing the covariance matrix according to the following formula to obtain the principal component analysis (PCA) orthogonal basis and the eigenvalue matrix:
$P = B D B^{T}$
wherein, P represents the covariance matrix of the direction block set of the selected direction, B represents the principal component analysis PCA orthogonal basis of the selected direction, D represents the eigenvalue matrix of the direction, and T represents the transposition operation.
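The two steps above can be sketched with NumPy as follows. This is a minimal illustration, assuming each block $f_i$ has been flattened into a row vector and the expectation is replaced by a sample average; following the formula $P = E[f_i f_i^T]$, no mean is subtracted. The function name `pca_basis` and the random placeholder blocks are hypothetical.

```python
import numpy as np

def pca_basis(blocks):
    """Return (P, B, D) with P = B D B^T for a set of vectorized blocks.

    blocks: array of shape (num_blocks, dim), one flattened block f_i per row.
    """
    F = np.asarray(blocks, dtype=float)
    # Step 1: sample estimate of P = E[f_i f_i^T].
    P = F.T @ F / F.shape[0]
    # Step 2: diagonalize the symmetric matrix P.
    eigvals, B = np.linalg.eigh(P)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    B = B[:, order]
    D = np.diag(eigvals[order])
    return P, B, D

# Placeholder data standing in for the direction block set of one direction.
rng = np.random.default_rng(0)
blocks = rng.standard_normal((200, 16))
P, B, D = pca_basis(blocks)
```

`np.linalg.eigh` is used because P is symmetric, which yields real eigenvalues and an orthonormal basis B.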
4. The low-rank model-based compressed sensing video reconstruction method of claim 1, wherein: the single-frame covariance matrix in the direction represented by each artificial black-and-white image in step (2d) is calculated as follows:
$P_k = B_k D_k B_k^{T}$
wherein $P_k$ denotes the single-frame covariance matrix in the k-th direction, $B_k$ denotes the directional basis in the k-th direction, $D_k$ denotes the eigenvalue matrix in the k-th direction, T denotes the transpose operation, and k denotes the index of the direction represented by the artificial black-and-white image, with k = 1, 2, ...
CN201510523631.6A 2015-08-24 2015-08-24 Compressed sensing video reconstruction method based on low-rank model Active CN105160664B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510523631.6A CN105160664B (en) 2015-08-24 2015-08-24 Compressed sensing video reconstruction method based on low-rank model

Publications (2)

Publication Number Publication Date
CN105160664A true CN105160664A (en) 2015-12-16
CN105160664B CN105160664B (en) 2017-10-24

Family

ID=54801506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510523631.6A Active CN105160664B (en) 2015-08-24 2015-08-24 Compressed sensing video reconstruction method based on low-rank model

Country Status (1)

Country Link
CN (1) CN105160664B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8989465B2 (en) * 2012-01-17 2015-03-24 Mayo Foundation For Medical Education And Research System and method for medical image reconstruction and image series denoising using local low rank promotion
CN102722892A (en) * 2012-06-13 2012-10-10 Xidian University SAR (synthetic aperture radar) image change detection method based on low-rank matrix factorization
CN102821228A (en) * 2012-07-16 2012-12-12 Xidian University Low-rank video background reconstructing method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
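LIU Fang et al.: "Research progress on structured compressed sensing", Acta Automatica Sinica *
WANG Rongfang et al.: "Block-based adaptive compressed sensing of images using texture information", Acta Electronica Sinica *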

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108881911A (en) * 2018-06-26 2018-11-23 University of Electronic Science and Technology of China Foreground and background recovery method for video data streams after compressed sensing
CN108881911B (en) * 2018-06-26 2020-07-10 University of Electronic Science and Technology of China Foreground and background recovery method for video data streams after compressed sensing
WO2021229320A1 (en) * 2020-05-15 2021-11-18 International Business Machines Corporation Matrix sketching using analog crossbar architectures
US11520855B2 (en) 2020-05-15 2022-12-06 International Business Machines Corporation Matrix sketching using analog crossbar architectures
GB2610758A (en) * 2020-05-15 2023-03-15 Ibm Matrix Sketching using analog crossbar architectures

Also Published As

Publication number Publication date
CN105160664B (en) 2017-10-24

Similar Documents

Publication Publication Date Title
Chen et al. Denoising hyperspectral image with non-iid noise structure
CN110119780B (en) Hyper-spectral image super-resolution reconstruction method based on generation countermeasure network
CN113362223B (en) Image super-resolution reconstruction method based on attention mechanism and two-channel network
CN106952317B (en) Hyperspectral image reconstruction method based on structure sparsity
CN112288627B (en) Recognition-oriented low-resolution face image super-resolution method
CN104159003A (en) Method and system of video denoising based on 3D cooperative filtering and low-rank matrix reconstruction
CN102156995A (en) Video movement foreground dividing method in moving camera
CN114463218B (en) Video deblurring method based on event data driving
CN104091320B (en) Based on the noise face super-resolution reconstruction method that data-driven local feature is changed
CN105513033A (en) Super-resolution reconstruction method based on non-local simultaneous sparse representation
CN109615576B (en) Single-frame image super-resolution reconstruction method based on cascade regression basis learning
Liu et al. Infrared image super resolution using gan with infrared image prior
CN111931722B (en) Correlated filtering tracking method combining color ratio characteristics
CN107609571A (en) A kind of adaptive target tracking method based on LARK features
CN103971354A (en) Method for reconstructing low-resolution infrared image into high-resolution infrared image
CN114612305B (en) Event-driven video super-resolution method based on stereogram modeling
CN115861076A (en) Unsupervised hyperspectral image super-resolution method based on matrix decomposition network
CN117252936A (en) Infrared image colorization method and system adapting to multiple training strategies
CN110415169A (en) A kind of depth map super resolution ratio reconstruction method, system and electronic equipment
Jia et al. Dual-complementary convolution network for remote-sensing image denoising
CN102510437B (en) Method for detecting background of video image based on distribution of red, green and blue (RGB) components
CN105160664B (en) Compressed sensing video reconstruction method based on low-rank model
Liu et al. LG-DBNet: Local and Global Dual-Branch Network for SAR Image Denoising
CN105427351B (en) Compression of hyperspectral images cognitive method based on manifold structure sparse prior
Chen et al. Depth map inpainting via sparse distortion model

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant