CN104301728A

CN104301728A - Compressed video capture and reconstruction system based on structured sparse dictionary learning

Info

Publication number: CN104301728A
Application number: CN201410545458.5A
Authority: CN
Inventors: 熊红凯; 李勇
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2014-10-15
Filing date: 2014-10-15
Publication date: 2015-01-21
Anticipated expiration: 2034-10-15
Also published as: CN104301728B

Abstract

The invention provides a compressed video capture and reconstruction system based on structured sparse dictionary learning. The system comprises a structured sparse dictionary learning module, a video signal sensing module and a reconstruction processing module. The structured sparse dictionary learning module first obtains a training set through subspace clustering, then obtains a dictionary through linear subspace learning followed by block-sparse dictionary learning that minimizes block coherence. The sensing module projects the video signal block by block, and the acquired measurements are decoded and reconstructed in the reconstruction processing module. The system performs compressive sampling, exploits the distributed, progressive structure of the video sampling process, and uses the special structure of the structured sparse dictionary matrix to improve reconstruction accuracy and efficiency. It greatly improves the sampling efficiency of video signals, achieves reconstruction gains over other methods at different sampling compression ratios, and scales well.

Description

Compressed video capture and reconstruction system based on structured sparse dictionary learning

Technical field

The present invention relates to a video signal acquisition scheme, and specifically to a compressed video capture and reconstruction system based on structured sparse dictionary learning.

Background art

The acquisition and coding (compression) of video signals is essential for applications such as video storage and transmission. Traditional signal processing systems sample first and compress afterwards: to preserve all the information in the signal, the video must be sampled at a rate no lower than twice the signal bandwidth, and the acquired raw signal is then passed through a series of coding techniques to remove redundancy. The bottleneck of this approach is that a large number of sensors and computing resources are occupied only to obtain a small amount of compressed data after processing, which places too high a resource demand on the sampling end. To further improve the efficiency of video acquisition, signal processing can be added at sampling time; one such scheme performs sampling and compression simultaneously and then reconstructs the signal from the compressed data with algorithms at the back end.

A search of the prior art shows that Yue M. Lu and Minh N. Do proposed a signal sampling theory based on a union of subspaces in "A Theory for Sampling Signals From a Union of Subspaces", published in IEEE Transactions on Signal Processing (TSP) in 2008. The theory gives the conditions for uniqueness and stability that the sampling of a signal lying in a union of subspaces must satisfy, but the union of subspaces it assumes is spanned by fixed bases and therefore cannot provide more effective sparsity or adaptivity. In "Union of Data-driven Subspaces via Subspace Clustering for Compressive Video Sampling", presented at the IEEE Data Compression Conference (IEEE DCC) in 2014, Y. Li and H. Xiong applied compressed sensing to video sampling on the basis of a union of data-driven subspaces (UoDS) model. That method compressively samples the video signal directly at the sampling and encoding end and reconstructs it at the decoding end using the UoDS basis as the sparse basis. It can represent the signal sparsely in a flexible and effective way, which preserves the subjective quality of the reconstructed video. However, the UoDS basis does not account for the overlap between subspaces: the inter-block coherence is high, so a compactly structured block sparsity cannot be obtained, and performance suffers. These shortcomings motivate the search for a more efficient and flexible sparse basis for reconstruction, one that fully exploits the special structure of video signal blocks to improve both the subjective and objective quality of the reconstruction.

Summary of the invention

In view of the deficiencies of the prior art, the present invention provides a compressed video capture and reconstruction system based on structured sparse dictionary learning. It effectively improves video acquisition efficiency and the subjective and objective quality of the reconstruction, and can serve as a general-purpose video acquisition tool.

The present invention is achieved by the following technical solutions:

The compressed video capture and reconstruction system based on structured sparse dictionary learning of the present invention comprises a structured sparse dictionary learning module, a video signal sensing module and a reconstruction processing module, wherein:

The structured sparse dictionary learning module applies a structured sparse dictionary learning method to the key-frame blocks of the video signal to obtain a structured sparse basis matrix, and outputs this sparse basis matrix to the input of the reconstruction processing module;

The video signal sensing module projects the non-key-frame blocks of the video signal block by block to obtain measurements, and outputs these measurements to the input of the reconstruction processing module;

The reconstruction processing module receives the structured sparse basis matrix output by the structured sparse dictionary learning module and the measurements output by the video signal sensing module, and reconstructs the signal.

The structured sparse dictionary learning module obtains a training set by applying subspace clustering to the set of blocks of the reconstructed key frame. Linear subspace learning is performed on each cluster separately to obtain the basis matrix of the corresponding subspace, and structured sparse dictionary learning is then used to reduce the block coherence between subspaces so as to obtain a more compact structured sparse representation. The resulting structured sparse basis adaptively represents the intrinsic structure of the signal and represents video signals more sparsely than a fixed basis.
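The per-cluster basis construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the cluster labels are taken as given (the subspace clustering step is not shown), the linear subspace learning method is assumed to be an uncentred PCA computed with an SVD, and the function name `pca_bases` and the synthetic data are hypothetical.

```python
import numpy as np

def pca_bases(X, labels, dim):
    """Learn one orthonormal basis per cluster and concatenate them.

    X      : (n, k) array, one vectorised key-frame block per column.
    labels : (k,) integer cluster label of each block (assumed to come
             from a subspace clustering step that is not shown here).
    dim    : number of basis vectors kept per subspace.
    """
    bases = []
    for lab in np.unique(labels):
        Xi = X[:, labels == lab]
        # Assumption: no mean removal, so each cluster is modelled as a
        # linear subspace through the origin. The leading left singular
        # vectors span its dominant directions.
        U, _, _ = np.linalg.svd(Xi, full_matrices=False)
        bases.append(U[:, :dim])
    return bases, np.hstack(bases)

# Synthetic example: three clusters, each lying in a 2-dimensional subspace.
rng = np.random.default_rng(0)
n, dim = 16, 2
X = np.hstack([rng.standard_normal((n, dim)) @ rng.standard_normal((dim, 20))
               for _ in range(3)])
labels = np.repeat(np.arange(3), 20)
bases, Psi_star = pca_bases(X, labels, dim)
```

Here `Psi_star` plays the role of the concatenated basis matrix before any coherence reduction; the later dictionary learning step would refine the individual bases.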

The sensing module is a single-order digital micromirror device (DMD); it simulates compressed sensing of the video signal and samples the non-key-frame blocks.

The reconstruction processing module is implemented by a structured sparse convex relaxation algorithm and reconstructs the blocks of the non-key frames.

The compressed sensing technique based on structured sparse dictionary learning adopted in the present invention provides a general solution for video signal acquisition. The structured sparse basis matrix is obtained by applying structured sparse dictionary learning to the reconstructed key frames. This fully exploits the unique structure of video frame blocks and reduces the overlap between subspaces, so that frame block signals have a more compactly structured, adaptive sparse representation. Sampling efficiency improves accordingly (fewer samples are needed for accurate reconstruction), which raises both the performance and the practicality of compressed sensing based on structured sparse dictionary learning.

Compared with prior art, the present invention has following beneficial effect:

The present invention substantially improves reconstruction performance. Compared with conventional compressive video sensing systems that reconstruct with a fixed basis or a UoDS basis, the reconstruction in the present invention uses an adaptive, globally optimal sparse basis, so reconstruction quality improves throughout. With suitable modification the invention can also be applied to other multidimensional signals, so it is highly adaptable. Because of the special construction of the training set and the structured dictionary learning that reduces block coherence, the signal has a more compact structured sparse representation at reconstruction time. The invention can therefore further improve sampling efficiency without degrading the subjective quality of the video, while accelerating the convergence of the convex relaxation reconstruction algorithm. It achieves reconstruction gains over other methods at different sampling compression ratios and also scales well.

Brief description of the drawings

Other features, objects and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:

Fig. 1 is a structural block diagram of one embodiment of the system of the present invention;

Fig. 2 is a diagram of the working principle of the structured sparse dictionary learning module;

Fig. 3 is a schematic diagram of the structured sparse representation that the structured sparse dictionary learning module produces for video frame block signals.

Detailed description of the embodiments

The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the invention, but do not limit it in any form. It should be noted that those skilled in the art can make variations and improvements without departing from the inventive concept, all of which fall within the scope of protection of the present invention.

Fig. 1 is a structural block diagram of one embodiment of the invention, which comprises a structured sparse dictionary learning module, a video signal sensing module and a reconstruction processing module. The structured sparse dictionary learning module uses a structured sparse dictionary learning method to generate the sparse basis matrix; the sensing module compressively projects the video signal block by block; and the resulting measurements are decoded and reconstructed in the reconstruction processing module. At the encoding end, the video signal sensing module samples the video signal to produce measurements. At the decoding end, the structured sparse dictionary learning module produces the sparse basis matrix. The sparse basis matrix and the measurements are fed together into the reconstruction processing module, where the signal is reconstructed.

In this embodiment, the structured sparse dictionary learning module, shown in Fig. 2, clusters the blocks of the fully reconstructed key frame. The set of blocks in the key frame, X = {x_1, x_2, ..., x_k}, is divided into t clusters X_1, X_2, ..., X_t by sparse subspace clustering or block matching; the blocks in each cluster are similar and belong to one subspace. X_1, X_2, ..., X_t correspond to t subspaces S_1, S_2, ..., S_t, so any video frame block signal x belongs to the union of subspaces U = ∪_i S_i. The training set is realized as orthonormal bases generated by a linear subspace learning method: the method (for example, principal component analysis, PCA) is applied separately to each block group X_i, i = 1, ..., t, to obtain a basis Ψ_i, i = 1, ..., t, and these bases together form the sparse basis matrix Ψ^* = [Ψ_1, Ψ_2, ..., Ψ_t]. However, the overlap between subspaces makes the block sparsity insufficiently compact and results in high block coherence. The block coherence is measured through ||Ψ_i^T Ψ_j||_F for i ≠ j, where ||·||_F is the Frobenius norm and Ψ_i^T is the transpose of Ψ_i. To obtain a more compact block sparsity, this embodiment applies the following structured sparse dictionary learning model on the basis of the training set Ψ^*:

\Psi_i' = \arg\min_{\{\Psi_i, C_i\}_{i=1,\dots,t}} \sum_{i=1}^{t} \Big\{ \| X_i - \Psi_i C_i \|_2^2 + \lambda \sum_{j=1}^{m_i} \| c_i^j \|_1 \Big\} + \eta \sum_{i \neq j} \| \Psi_i^T \Psi_j \|_F^2 \qquad (1)

This yields the sparse basis matrix

D = [\Psi_1', \dots, \Psi_t'],

where

C_i = [c_i^1, \dots, c_i^{m_i}],

and each column vector c_i^j is the sparse representation vector of the j-th signal, j ∈ {1, ..., m_i}, in the i-th group; η and λ are adjustable parameters with values in (0, 1). This sparse basis matrix adaptively represents the intrinsic structure of video frame block signals and represents video signals more sparsely than a fixed basis, and the sparse representation c^* of a signal in this basis is block-structured, as shown in Fig. 3.
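To make the three terms of equation (1) concrete, the sketch below only evaluates the cost for given bases and codes; it does not perform the minimization. The function names are hypothetical, and the data-fit term is interpreted as the sum of squared entries of X_i - Ψ_i C_i, which is an assumption about the norm written with subscript 2 in equation (1).

```python
import numpy as np

def block_coherence_penalty(bases):
    """Sum of ||Psi_i^T Psi_j||_F^2 over all ordered pairs i != j."""
    total = 0.0
    for i, Pi in enumerate(bases):
        for j, Pj in enumerate(bases):
            if i != j:
                total += np.linalg.norm(Pi.T @ Pj, "fro") ** 2
    return total

def objective(X_groups, bases, codes, lam, eta):
    """Value of the cost in equation (1) for given bases and codes."""
    value = 0.0
    for Xi, Pi, Ci in zip(X_groups, bases, codes):
        # Data fidelity (assumed: sum of squared entries of the residual).
        value += np.linalg.norm(Xi - Pi @ Ci, "fro") ** 2
        # l1 norm summed over every column c_i^j of C_i.
        value += lam * np.abs(Ci).sum()
    return value + eta * block_coherence_penalty(bases)

# Two toy dictionaries built from columns of the identity matrix.
E = np.eye(4)
orthogonal = [E[:, :2], E[:, 2:]]      # subspaces share no direction
overlapping = [E[:, :2], E[:, 1:3]]    # subspaces share one direction
codes = [np.ones((2, 3)), np.ones((2, 3))]
X_groups = [P @ C for P, C in zip(orthogonal, codes)]
cost = objective(X_groups, orthogonal, codes, lam=0.5, eta=0.1)
```

With mutually orthogonal subspaces the coherence term vanishes, while the overlapping pair is penalized; this is the behaviour the η term is meant to encourage during learning.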

In this embodiment, the video signal sensing module is a single-order digital micromirror device (DMD), which simulates the compressed sensing y = Φx of the video signal, where Φ is a random sampling matrix. The key-frame blocks are compressively sampled first, at a sampling rate of 0.7; the non-key-frame block signals are then compressively sampled at a sampling rate chosen between 0.1 and 0.4. Sampling on a per-block basis speeds up both video sampling and reconstruction.
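The block-wise measurement y = Φx can be simulated in software as in the sketch below. The text only states that Φ is a random sampling matrix; the Gaussian entries, the 1/sqrt(m) scaling, the rounding of the measurement count and the function name `sense_block` are all assumptions made for illustration, and a physical DMD would typically realise different (for example binary) patterns.

```python
import numpy as np

def sense_block(block, rate, rng):
    """Project one image block onto m = round(rate * n) random vectors."""
    x = block.reshape(-1).astype(float)      # vectorise the block
    m = int(round(rate * x.size))            # assumed rounding rule
    # Assumed Gaussian sampling matrix, scaled so rows have unit expected norm.
    Phi = rng.standard_normal((m, x.size)) / np.sqrt(x.size)
    return Phi @ x, Phi

rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(16, 16))  # stand-in for a 16 x 16 luma block
y, Phi = sense_block(block, rate=0.25, rng=rng)
```

For a 16 × 16 block and a sampling rate of 0.25, the 256 pixel values are replaced by 64 measurements.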

In this embodiment, the reconstruction processing module is implemented by a convex relaxation algorithm. For key frames, it finds the representation vector c of minimum l_1 norm such that y = ΦΨc, where Ψ is the two-dimensional DCT basis; the result is a globally optimal solution, and multiplying it by the two-dimensional DCT basis Ψ gives the reconstructed key-frame block signal. For non-key frames, it finds the c^* of minimum l_{2,I} norm such that y = ΦDc^*; the result is again a globally optimal solution, and multiplying it by D gives the reconstructed non-key-frame block signal. Here Φ is the random sampling matrix and the l_{2,I} norm is a mixed norm, with i indexing the block groups in the block structure shown in Fig. 3.
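The non-key-frame recovery can be illustrated with a small block-sparse solver. The sketch below is not the solver used in the embodiment: it assumes the mixed norm is the usual sum of per-block l_2 norms, replaces the equality-constrained problem by the penalized form 0.5·||y − Ac||² + τ·||c||_{2,I} with A = ΦD, and minimizes it with proximal gradient descent (block ISTA). All function names and the value of τ are hypothetical.

```python
import numpy as np

def mixed_norm(c, block_size):
    """Sum of the l2 norms of consecutive blocks of c (assumed l_{2,I} norm)."""
    return np.linalg.norm(c.reshape(-1, block_size), axis=1).sum()

def block_soft_threshold(c, block_size, tau):
    """Proximal operator of tau * mixed_norm: shrink each block towards zero."""
    blocks = c.reshape(-1, block_size)
    norms = np.linalg.norm(blocks, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return (blocks * scale).reshape(-1)

def block_ista(A, y, block_size, tau, n_iter=500):
    """Proximal gradient descent on 0.5*||y - A c||^2 + tau*mixed_norm(c)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ c - y)
        c = block_soft_threshold(c - step * grad, block_size, step * tau)
    return c

# Toy problem: A stands in for Phi @ D, with 8 blocks of 4 coefficients each.
rng = np.random.default_rng(0)
A = rng.standard_normal((24, 32))
c_true = np.zeros(32)
c_true[8:12] = [1.0, -2.0, 1.5, 0.5]         # a single active block
y = A @ c_true
tau = 0.01 * np.linalg.norm((A.T @ y).reshape(-1, 4), axis=1).max()

def cost(c):
    return 0.5 * np.sum((y - A @ c) ** 2) + tau * mixed_norm(c, 4)

c_hat = block_ista(A, y, block_size=4, tau=tau)
```

In the setting of the embodiment, the reconstructed block would then be obtained as D multiplied by the recovered coefficient vector. The block soft-threshold step is what favours solutions in which whole blocks of coefficients are zero, matching the block structure of Fig. 3.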

Implementation results

The key parameters in this embodiment are set as follows. The test video sequence is Akiyo_cif.yuv (a 352x288 YUV file in 4:2:0 format), of which 300 frames are used. Every ten frames form a group of frames; the first frame of each group is the key frame and the remaining nine are non-key frames. The block size is 16 × 16 pixels. Because the grayscale component concentrates most of the signal energy, the tests are carried out mainly on the grayscale images. This embodiment compares the compressed sensing method based on structured sparse dictionary learning of the present invention with the method of Yue M. Lu et al. in "A Theory for Sampling Signals From a Union of Subspaces" and the method of Y. Li et al. in "Union of Data-driven Subspaces via Subspace Clustering for Compressive Video Sampling". In the present invention each subspace has dimension 10, and clustering produces 50 subspaces.
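The sizes implied by these parameters can be worked out directly. The short calculation below uses only the figures stated above plus the sampling rates given for the sensing module; rounding the number of measurements to the nearest integer is an assumption, since the text does not state a rounding rule.

```python
width, height = 352, 288            # CIF frame size
block = 16                          # block side in pixels
frames, group = 300, 10             # sequence length and frames per group
subspaces, subspace_dim = 50, 10    # clustering result and basis size

blocks_per_frame = (width // block) * (height // block)
key_frames = frames // group        # one key frame per group
non_key_frames = frames - key_frames
signal_dim = block * block          # length of a vectorised block
dictionary_atoms = subspaces * subspace_dim

# Measurements per block at each sampling rate (rounding is an assumption).
measurements = {rate: round(rate * signal_dim)
                for rate in (0.1, 0.2, 0.3, 0.4, 0.7)}

print(blocks_per_frame, key_frames, non_key_frames)
print(signal_dim, dictionary_atoms, measurements)
```

Each frame therefore contains 396 blocks, the sequence has 30 key frames and 270 non-key frames, and the learned dictionary has 500 columns for 256-dimensional block signals, so it is overcomplete.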

Compared with these two methods, the system of this embodiment obtains reconstruction gains of 0.8 dB and 3.1 dB, respectively, at a compression ratio of 0.1; 5.8 dB and 3.3 dB at a compression ratio of 0.2; 5.1 dB and 0.9 dB at a compression ratio of 0.3; and 5.9 dB and 0.1 dB at a compression ratio of 0.4.

The experiments show that the video sequences reconstructed by the system of this embodiment are clearly better in reconstruction quality than those obtained by the other two methods.
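The text reports gains in dB without naming the quality metric. Assuming, as is common for video reconstruction, that the metric is the peak signal-to-noise ratio (PSNR) on 8-bit grayscale frames, a gain between two reconstructions could be computed as in the sketch below; the function name `psnr` and the toy data are hypothetical.

```python
import numpy as np

def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    ref = np.asarray(reference, dtype=float)
    rec = np.asarray(reconstructed, dtype=float)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")                  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy comparison: method B halves the per-pixel error of method A.
reference = np.zeros((16, 16))
method_a = reference + 4.0
method_b = reference + 2.0
gain_db = psnr(reference, method_b) - psnr(reference, method_a)
```

Under this assumption, a gain is simply the difference between the PSNR of the proposed reconstruction and that of the baseline at the same compression ratio.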

Specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to these particular embodiments; those skilled in the art can make various variations or modifications within the scope of the claims without affecting the substance of the invention.

Claims

1. A compressed video capture and reconstruction system based on structured sparse dictionary learning, characterized in that it comprises a structured sparse dictionary learning module, a video signal sensing module and a reconstruction processing module, wherein:

the structured sparse dictionary learning module applies a structured sparse dictionary learning method to the key-frame blocks of the video signal to generate the corresponding structured sparse basis matrix, and outputs this sparse basis matrix to the input of the reconstruction processing module;

the video signal sensing module projects the non-key frames of the video signal block by block to obtain measurements, and outputs these measurements to the input of the reconstruction processing module;

the reconstruction processing module receives the sparse basis matrix output by the structured sparse dictionary learning module and the measurements output by the video signal sensing module, and reconstructs the video signal.

2. The compressed video capture and reconstruction system based on structured sparse dictionary learning according to claim 1, characterized in that the structured sparse dictionary learning module obtains, by subspace clustering of the set of blocks of the reconstructed key frame, the training set used to generate the structured sparse basis matrix.

3. The compressed video capture and reconstruction system based on structured sparse dictionary learning according to claim 2, characterized in that the structured sparse dictionary learning module produces a structured sparse basis generated by a structured sparse learning method, and this sparse basis adaptively represents the intrinsic structure of the signal and represents video signals more sparsely than a fixed basis.

4. The compressed video capture and reconstruction system based on structured sparse dictionary learning according to any one of claims 1-3, characterized in that the structured sparse dictionary learning module uses the structured sparse learning method to reduce the block coherence between subspaces so as to obtain a more compact structured sparse representation.

5. The compressed video capture and reconstruction system based on structured sparse dictionary learning according to any one of claims 1-3, characterized in that the video signal sensing module is a single-order digital micromirror device, which simulates compressed sensing of the video signal.

6. The compressed video capture and reconstruction system based on structured sparse dictionary learning according to any one of claims 1-3, characterized in that the reconstruction processing module is implemented by a convex relaxation algorithm, and the globally optimal solution found, multiplied by the sparse basis, is the reconstructed signal to be obtained.

7. The compressed video capture and reconstruction system based on structured sparse dictionary learning according to claim 6, characterized in that the reconstruction processing module uses a block sparsity constraint to obtain a block-sparse representation vector for reconstructing the signal.