CN103596010B - Video coding and decoding system based on dictionary learning and compressed sensing

Info

Publication number: CN103596010B
Application number: CN201310589803.0A
Authority: CN
Inventors: 郭继昌; 金卯亨嘉; 申燊; 许颖; 孙骏
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2013-11-20
Filing date: 2013-11-20
Publication date: 2017-01-11
Anticipated expiration: 2033-11-20
Also published as: CN103596010A

Abstract

The invention relates to the fields of video compressed sensing and sparse image representation, and discloses a compressed sensing video coding and decoding system based on dictionary learning. The system is designed for wireless video sensor networks: the encoding end has low complexity and a small computational load, little data is transmitted over the channel, and the decoding end can reconstruct high-quality video in real time. The system mainly comprises a video encoding end and a video decoding end. Following compressed sensing theory, the encoding end temporarily stores the image pixel data of each K frame, reduces its dimensionality, and transmits the reduced data to the decoding end through a wireless transmitting module. The decoding end decodes the K frames with a compressed sensing reconstruction algorithm (namely, the improved NSL0 method), stores them, and finally assembles the frames in frame order into a video and outputs it. The system is mainly applied to compressed sensing and transmission of video.

Description

Compressed sensing video coding and decoding system based on dictionary learning

Technical field

The present invention relates to the fields of video compressed sensing and sparse image representation, and in particular to a compressed sensing video coding and decoding system based on dictionary learning.

Background art

The present invention is mainly aimed at video applications in which the encoding end is resource-constrained, such as video surveillance and wireless video sensor networks. Because of the limitations of the equipment and the environment, such applications require a low-complexity, low-power encoding end to ensure long-term stable operation, while the receiving end can store large amounts of data and perform complex decoding computations.

However, conventional video coding techniques such as the H.26X and MPEG series all use a system structure with a complex encoding end and a simple decoding end. The encoding end removes temporal and spatial redundancy through inter-frame prediction, intra-frame prediction, and the discrete cosine transform (DCT) to obtain high compression efficiency. As a result, the system places very high demands on the computing power and memory capacity of the encoder, far higher than on the decoder. Conventional video coding is therefore unsuitable for the fields above.

Compressed sensing (CS) is an emerging theory that has arisen in the signal processing field in recent years. It compresses data while the signal is being acquired, at a sampling rate far below the Nyquist rate, which reduces the amount of sampled data and saves storage space while still retaining sufficient information. When the original signal needs to be recovered, a suitable reconstruction algorithm is used to restore it. Compressed sensing theory merges conventional data acquisition and compression into one step and requires no complex encoding computation, which makes it well suited to applications where the encoding end is resource-constrained.

Summary of the invention

The present invention aims to overcome the deficiencies of the prior art by designing a compressed sensing video coding and decoding system for wireless video sensor networks, characterized by low encoding-end complexity and computational load, a small amount of data transmitted over the channel, and a decoding end capable of high-quality real-time video reconstruction. To this end, the technical solution adopted by the invention is a compressed sensing video coding and decoding system based on dictionary learning, which mainly comprises two parts, a video encoding end and a decoding end:

Encoding end: according to the requirements on reconstruction accuracy and real-time performance, the frames of the video are divided into two classes, key frames (K frames) and non-key frames (CS frames). Every two frames form a group, i.e. the group of pictures (GOP) size is 2; odd-numbered frames are K frames, and the frame immediately following each K frame is the CS frame of that group. For a K frame, following compressed sensing theory, the image pixel data of the K frame is stored temporarily, then reduced in dimension by the observation matrix Φ, and the reduced data is transmitted to the decoding end through the wireless transmitting module. For a CS frame, after the image pixel data is read in, the difference from the preceding K frame is computed, i.e. dv = Xcs - Xk, and the mean squared error (MSE) of dv is evaluated. If the MSE is below the lower threshold, the two frames are judged to be very similar, and a 1-bit signal is sent to notify the decoding end that this CS frame needs no reconstruction and that the reconstruction result of the preceding K frame is used directly as its reconstruction result. If the MSE is above the upper threshold, dv is reduced in dimension by the observation matrix Φ, the reduced data is sent to the decoding end, and a 1-bit signal is sent at the same time to notify the decoding end to perform dictionary learning after it has reconstructed this CS frame. If the MSE lies within the threshold range, dv is simply reduced in dimension by the observation matrix Φ and sent;

Decoding end: K frames are decoded by the compressed sensing reconstruction algorithm, namely the improved modified Newton method (NSL0), and stored. If the encoding end has sent the dictionary update signal, the dictionary of the sparse matrix is updated with the K-singular value decomposition (K-SVD) algorithm. For a CS frame, NSL0 compressed sensing reconstruction is performed using the coefficient matrix updated for the K frame and the observation matrix, and the reconstructed result is added to the reconstruction result of the preceding K frame to obtain the reconstruction of the CS frame. Finally, the frames are assembled in frame order into a video and output.

The observation matrix is a block Gaussian random matrix.

Compressed sensing theory here specifically means that the sparse dictionary is generated by the K-SVD dictionary learning method and that the initial sparse dictionary is set to a global dictionary, i.e. a dictionary trained on pictures of the scene in which the camera is located.

Technical features and effects of the present invention:

The invention uses compressed sensing for video coding and decoding in wireless video sensor networks, moving the computational complexity from the encoding end to the decoding end.

By using a differencing method and a block-based observation matrix, the amount of transmitted data and the reconstruction time of CS frames are effectively reduced while reconstruction accuracy is maintained.

A global dictionary is used as the initial dictionary and is updated from time to time by dictionary learning, which effectively improves reconstruction accuracy without affecting the real-time performance of video reconstruction.

Brief description of the drawings

Fig. 1 is the hardware structure diagram of the present invention.

Fig. 2 is a block diagram of the compressed sensing video coding and decoding system based on dictionary learning of the present invention.

Fig. 3 is a flow chart of the dictionary learning algorithm of the present invention.

Detailed description of the invention

To achieve the above object, the present invention implements the entire video coding and decoding system using compressed sensing based on dictionary learning. The system mainly comprises two parts, a video encoding end and a decoding end.

At the encoding end, according to the requirements on reconstruction accuracy and real-time performance, the frames of the video are divided into two classes, key frames (K frames) and non-key frames (CS frames). Every two frames form a group, i.e. the group of pictures (GOP) size is 2; odd-numbered frames are K frames, and the frame immediately following each K frame is the CS frame of that group. For a K frame, following compressed sensing theory, the image pixel data of the K frame is stored temporarily, then reduced in dimension by the observation matrix Φ, and the reduced data is transmitted to the decoding end through the wireless transmitting module. For a CS frame, after the image pixel data is read in, the difference from the preceding K frame is computed, i.e. dv = Xcs - Xk, and the mean squared error (MSE) of dv is evaluated. If the MSE is below the lower threshold, the two frames are judged to be very similar, and a 1-bit signal is sent to notify the decoding end that this CS frame needs no reconstruction and that the reconstruction result of the preceding K frame can be used directly as its reconstruction result. If the MSE is above the upper threshold, the two frames differ greatly and the captured scene has changed considerably, so the dictionary should be updated to fit the new scene; dv is therefore reduced in dimension by the observation matrix Φ, the reduced data is sent to the decoding end, and a 1-bit signal is sent at the same time to notify the decoding end to perform dictionary learning after it has reconstructed this CS frame. If the MSE lies within the threshold range, dv is simply reduced in dimension by the observation matrix Φ and sent.

At the decoding end, K frames are decoded by the compressed sensing reconstruction algorithm, namely the improved modified Newton method (NSL0), and stored. If the encoding end has sent the dictionary update signal, the dictionary of the sparse matrix is updated with the K-singular value decomposition (K-SVD) algorithm. For a CS frame, NSL0 compressed sensing reconstruction is performed using the coefficient matrix updated for the K frame and the observation matrix, and the reconstructed result is added to the reconstruction result of the preceding K frame to obtain the reconstruction of the CS frame. Finally, the frames are assembled in frame order into a video and output.

Here, the observation matrix is a block Gaussian random matrix, which effectively reduces the amount of data needed to transmit the observation matrix the first time and reduces the reconstruction time without affecting reconstruction accuracy, ensuring real-time performance.

A prerequisite for using compressed sensing theory is that the signal is sparse in the Ψ domain, i.e. the signal can be represented by a sparse dictionary Ψ and its corresponding sparse coefficients, where the number of nonzero entries in the sparse coefficients is less than the sparsity K. The quality of the sparse dictionary determines the reconstruction accuracy of the signal. In this system the sparse dictionary is generated by the K-SVD dictionary learning method; because it is adaptive, it gives better reconstruction than the DCT or the wavelet transform. The initial sparse dictionary is set to a global dictionary, i.e. a dictionary trained on pictures of the scene in which the camera is located. Such an initial dictionary represents the images more sparsely, which ensures high reconstruction accuracy from the outset.

The present invention provides a compressed sensing video coding and decoding system for wireless video sensor networks, characterized by low encoding-end complexity and computational load, a small amount of data transmitted over the channel, and a decoding end capable of high-quality real-time video reconstruction.

To achieve the above object, the present invention implements the entire video coding and decoding system using compressed sensing based on dictionary learning. The system mainly comprises two parts, a video encoding end and a decoding end.

The present invention will be described in more detail below in conjunction with the accompanying drawings.

Fig. 1 shows the hardware block diagram of the system. It consists of the following components: a digital camera, a DSP video encoding module, a wireless transmitting module, a wireless receiving module, and a PC video decoding module. The digital camera, DSP video encoding module, and wireless transmitting module form the encoding end of the system; the wireless receiving module and PC video decoding module form the decoding end. At the encoding end, the digital camera is connected to the DSP video encoding module through 32 multiplexed data lines, and the captured video data is passed into the module for storage and encoding; the encoded data is then sent to the decoding end by the wireless transmitting module. The wireless receiving module at the decoding end receives the encoded data and passes it to the PC, which decodes it and finally outputs the reconstructed video stream.

Fig. 2 shows the block diagram of the compressed sensing video coding and decoding system based on dictionary learning. Assume that each frame of the input video is an Nr × Nc signal and that a group of pictures consists of two frames (GOP = 2), the first being the K frame Xk and the second the CS frame Xcs. The complete encoding and decoding process is described below for one group of pictures.
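The grouping described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_into_gops` is hypothetical, and pairing a trailing unpaired frame with `None` is an assumption, since the source does not say how an odd frame count is handled.

```python
def split_into_gops(frames):
    """Pair consecutive frames into groups of pictures of size 2.

    The first frame of each pair is the K frame, the second the CS frame.
    Assumption: a trailing unpaired frame is treated as a K frame with no
    CS frame (None); the source does not cover this case.
    """
    gops = []
    for i in range(0, len(frames), 2):
        k_frame = frames[i]
        cs_frame = frames[i + 1] if i + 1 < len(frames) else None
        gops.append((k_frame, cs_frame))
    return gops


# Frames are numbered from 1, so odd-numbered frames become K frames.
gops = split_into_gops(["frame1", "frame2", "frame3", "frame4"])
print(gops)
```

Each tuple then goes through the K-frame and CS-frame paths described in the following paragraphs.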

At the encoding end, Xk is first read in and stored temporarily. It is then reduced in dimension by the observation matrix Φ to an M × Nc signal Yk, as shown in formula (1), where Φ is an M × Nr block Gaussian random matrix with M << Nr. The image data is greatly compressed by Φ, and the resulting Yk is finally sent to the decoding end. This completes the encoding of the K frame.

$$Y = \Phi X \tag{1}$$
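A small numerical sketch of formula (1) may help make the dimensions concrete. The sizes, the helper names `gaussian_matrix` and `matmul`, and the use of a dense (not block) Gaussian matrix are all assumptions chosen for brevity; only the relation Y = ΦX with Φ of size M × Nr comes from the text.

```python
import random


def gaussian_matrix(rows, cols, seed=0):
    """Return a rows x cols matrix with i.i.d. standard normal entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


# Hypothetical sizes: a 16 x 12 frame reduced to 4 x 12 measurements.
Nr, Nc, M = 16, 12, 4
X = [[float((r * Nc + c) % 7) for c in range(Nc)] for r in range(Nr)]  # toy frame
Phi = gaussian_matrix(M, Nr)   # dense Gaussian stand-in for the observation matrix
Y = matmul(Phi, X)             # formula (1): Y = Phi X

print(len(Y), "x", len(Y[0]), "measurements instead of", Nr, "x", Nc, "pixels")
```

The encoder's only per-frame work here is one matrix product, which is the reason the text describes the encoding end as low in complexity.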

Xcs is then read in and differenced with the temporarily stored Xk, i.e. dv = Xcs - Xk. Once dv is obtained, its mean squared error (MSE) is computed and compared with preset thresholds. If the MSE is below the lower threshold Sl, the two frames are judged to be very similar; the encoding end performs no encoding on this frame and only sends a 1-bit control signal "0" to notify the decoding end that this CS frame needs no reconstruction and that the K frame reconstruction result can be used directly as its reconstruction result. If the MSE is above the upper threshold Sh, dv is reduced in dimension by the observation matrix Φ according to formula (1), the reduced data Ydv is sent to the decoding end, and a 1-bit control signal "1" is sent at the same time to notify the decoding end to perform dictionary learning after this CS frame has been reconstructed. If the MSE lies within the threshold range, dv is simply reduced in dimension by the observation matrix Φ according to formula (1) and Ydv is sent.
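The three-way decision on the CS frame can be sketched as below. The names `CsMode` and `classify_cs_frame` and the threshold values in the usage lines are hypothetical; the source gives the thresholds Sl and Sh only symbolically.

```python
from enum import Enum


class CsMode(Enum):
    SKIP = "send control bit 0, no measurements"
    NORMAL = "send measurements of dv, no control bit"
    UPDATE_DICT = "send measurements of dv and control bit 1"


def frame_difference(x_cs, x_k):
    """dv = Xcs - Xk, element by element."""
    return [[a - b for a, b in zip(row_cs, row_k)]
            for row_cs, row_k in zip(x_cs, x_k)]


def mean_squared_error(dv):
    """Mean of the squared entries of the difference frame."""
    count = sum(len(row) for row in dv)
    return sum(v * v for row in dv for v in row) / count


def classify_cs_frame(x_cs, x_k, s_low, s_high):
    """Return the encoder mode for a CS frame together with dv."""
    dv = frame_difference(x_cs, x_k)
    mse = mean_squared_error(dv)
    if mse < s_low:
        return CsMode.SKIP, dv
    if mse > s_high:
        return CsMode.UPDATE_DICT, dv
    return CsMode.NORMAL, dv


# Hypothetical thresholds and 2 x 2 toy frames.
x_k = [[10.0, 10.0], [10.0, 10.0]]
x_cs = [[13.0, 13.0], [13.0, 13.0]]
mode, dv = classify_cs_frame(x_cs, x_k, s_low=1.0, s_high=50.0)
print(mode.name)
```

In the `NORMAL` and `UPDATE_DICT` cases the returned `dv` is what would be passed through the observation matrix Φ per formula (1).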

The decoding end starts working once Yk is received. Compressed sensing reconstruction is performed according to formula (2), where Φ is the same as at the encoding end and Ψ is the Nr × Nr sparse matrix (dictionary); the reconstructed K frame is then stored.

$$\min_{s} \|s\|_0 \quad \text{s.t.} \quad Y = \Phi \Psi s \tag{2}$$
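The constraint in formula (2) follows from combining the measurement model of formula (1) with the sparsity assumption on the signal. The short derivation below is a sketch for a single column of the frame; the hat notation for reconstructed quantities is an assumption of this note rather than notation confirmed by the source.

```latex
% Sparsity assumption: the signal X has a sparse representation s in the dictionary \Psi
X = \Psi s, \qquad \|s\|_0 < K
% Substituting into the measurement model (1) gives the constraint of (2)
Y = \Phi X = \Phi \Psi s
% After solving (2) for \hat{s}, the frame estimate is recovered through the dictionary
\hat{X} = \Psi \hat{s}
```

Because M << Nr, the system Y = ΦΨs is underdetermined, and it is the sparsity of s that makes a unique reconstruction possible.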

Decoding of the CS frame then begins. The decoding end first checks whether a 1-bit control signal has been received. If the control signal is "0", the reconstruction results of the K frame and the CS frame are output directly in sequence, with the CS frame reusing the K frame reconstruction result. If there is no control signal, compressed sensing reconstruction is performed on Ydv according to formula (2); because differencing was carried out at the encoding end, the reconstructed difference is added to the K frame reconstruction result to obtain the CS frame reconstruction. If the control signal is "1", the CS frame is first reconstructed by the method above, and the reconstruction is then used as the training signal to update the dictionary by K-SVD dictionary learning, so that it better fits the changed scene.
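The decoder-side branching on the control signal can be sketched as follows. The function `decode_cs_frame` and its callable parameters `reconstruct` and `update_dictionary` are hypothetical placeholders standing in for the NSL0 reconstruction and the K-SVD update; neither algorithm is implemented here.

```python
def decode_cs_frame(x_k_hat, control, y_dv=None,
                    reconstruct=None, update_dictionary=None):
    """Return the CS frame reconstruction for one group of pictures.

    x_k_hat           -- stored reconstruction of the K frame
    control           -- "0", "1", or None when no control bit was received
    y_dv              -- measurements of the difference frame (unused for "0")
    reconstruct       -- placeholder for the reconstruction of dv from y_dv
    update_dictionary -- placeholder for the dictionary learning step
    """
    if control == "0":
        # Frames judged very similar: reuse the K frame reconstruction.
        return [row[:] for row in x_k_hat]

    # Reconstruct the difference and add it back to the K frame.
    dv_hat = reconstruct(y_dv)
    x_cs_hat = [[k + d for k, d in zip(row_k, row_d)]
                for row_k, row_d in zip(x_k_hat, dv_hat)]

    if control == "1":
        # Scene change: train the dictionary on the new reconstruction.
        update_dictionary(x_cs_hat)
    return x_cs_hat


# Toy usage with an identity "reconstruction" in place of NSL0.
trained_on = []
x_k_hat = [[1.0, 2.0], [3.0, 4.0]]
x_cs_hat = decode_cs_frame(x_k_hat, "1", [[1.0, 1.0], [1.0, 1.0]],
                           reconstruct=lambda y: y,
                           update_dictionary=trained_on.append)
print(x_cs_hat)
```

Passing the two algorithms in as callables keeps the control flow readable and lets either be swapped without touching the dispatch logic.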

The dictionary learning method used here is the K-SVD algorithm shown in Fig. 3. First, sparse coding is performed with the initial dictionary and the training signal: the dictionary is held fixed and used to represent the given data sparsely (i.e. to approximate the data as closely as possible with as few coefficients as possible), yielding the coefficient matrix α. Then the coefficient matrix is held fixed and each dictionary atom (each column of the dictionary) is updated in turn so that it represents the training signal more closely. Repeating this iteration several times completes the dictionary learning and yields a sparse matrix Ψ better suited to the new scene. This matrix is used to reconstruct the frames of the next GOP.
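The atom update step described above can be sketched in a few lines. This is an illustration under stated assumptions, not the patent's code: the sparse coding step is omitted, the names `update_atom` and `residual_norm` are hypothetical, and the rank-1 approximation that K-SVD normally obtains from a singular value decomposition is computed here by power iteration to keep the example free of external libraries.

```python
import math


def residual_norm(Y, D, A):
    """Frobenius norm of Y - D A for list-of-lists matrices."""
    total = 0.0
    for i in range(len(Y)):
        for j in range(len(Y[0])):
            approx = sum(D[i][l] * A[l][j] for l in range(len(A)))
            total += (Y[i][j] - approx) ** 2
    return math.sqrt(total)


def update_atom(Y, D, A, k, iterations=20):
    """Update column k of D and the nonzero entries of row k of A in place."""
    n = len(Y)
    # Only the training signals that currently use atom k take part.
    support = [j for j, coef in enumerate(A[k]) if coef != 0.0]
    if not support:
        return
    # Error of the representation without atom k, restricted to the support.
    E = [[Y[i][j] - sum(D[i][l] * A[l][j] for l in range(len(A)) if l != k)
          for j in support] for i in range(n)]
    # Power iteration for the leading singular pair of E (stand-in for the SVD).
    v = [A[k][j] for j in support]
    u = None
    for _ in range(iterations):
        u = [sum(E[i][c] * v[c] for c in range(len(support))) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            return  # nothing left to explain; keep the atom unchanged
        u = [x / norm for x in u]
        v = [sum(E[i][c] * u[i] for i in range(n)) for c in range(len(support))]
    for i in range(n):
        D[i][k] = u[i]          # new unit-norm atom
    for c, j in enumerate(support):
        A[k][j] = v[c]          # matching coefficients


# Toy data: three 2-dimensional training signals, one atom.
Y = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
D = [[1.0], [0.0]]
A = [[1.0, 2.0, 3.0]]
before = residual_norm(Y, D, A)
update_atom(Y, D, A, 0)
after = residual_norm(Y, D, A)
print(before, after)
```

A full K-SVD iteration would alternate a sparse coding pass over all training signals with a call to `update_atom` for every column of the dictionary.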

Finally, the reconstruction results of the K frames and CS frames are output in frame order at the frame rate to form the output video stream.

The observation matrix Φ is a block Gaussian random matrix. If the block size is 8 × 8, the encoding end generates an 8 × 8 Gaussian random matrix Φ0 and then uses Φ0 to build the M × Nr diagonal matrix, whose diagonal consists of (M × Nr)/(8 × 8) copies of Φ0. This block Gaussian random matrix effectively reduces the amount of data needed to transmit the observation matrix the first time and reduces the reconstruction time without affecting reconstruction accuracy, ensuring real-time performance.

Reconstruction is one stage of the compressed sensing process, and the reconstruction algorithm is the specific method used for it. The reconstruction algorithm used in the present invention is NSL0, an improved modified Newton method. Experiments have verified that it performs best among existing compressed sensing reconstruction algorithms; its high reconstruction accuracy and short reconstruction time meet the system's requirements for high reconstruction accuracy and real-time performance.
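Building a block-diagonal observation matrix from one small Gaussian block can be sketched as follows. The helper names are hypothetical, and the example uses a rectangular 2 × 8 block so that the assembled matrix has fewer rows than columns; that choice is an assumption of this sketch, since the text only describes an 8 × 8 block Φ0.

```python
import random


def gaussian_block(rows, cols, seed=0):
    """Return one rows x cols block with i.i.d. standard normal entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def block_diagonal(block, count):
    """Place `count` copies of `block` along the diagonal of a zero matrix."""
    b_rows, b_cols = len(block), len(block[0])
    phi = [[0.0] * (b_cols * count) for _ in range(b_rows * count)]
    for b in range(count):
        for i in range(b_rows):
            for j in range(b_cols):
                phi[b * b_rows + i][b * b_cols + j] = block[i][j]
    return phi


# Hypothetical sizes: Nr = 24 rows per frame column, three 2 x 8 blocks.
phi0 = gaussian_block(2, 8)
phi = block_diagonal(phi0, 3)
print(len(phi), "x", len(phi[0]))
```

Only `phi0` has to be shared between the two ends, since the full matrix is determined by the block and the number of copies; this is the data saving the paragraph above refers to.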

Claims

1. A compressed sensing video coding and decoding system based on dictionary learning, characterized in that it mainly comprises two parts, a video encoding end and a decoding end:

Encoding end: according to the requirements on reconstruction accuracy and real-time performance, the frames of the video are divided into two classes, key frames (K frames) and non-key frames (CS frames); every two frames form a group, i.e. the group of pictures (GOP) size is 2, odd-numbered frames are K frames, and the frame immediately following each K frame is the CS frame of that group; for a K frame, following compressed sensing theory, the image pixel data of the K frame is stored temporarily, then reduced in dimension by the observation matrix Φ, and the reduced data is transmitted to the decoding end through the wireless transmitting module; for a CS frame, after the image pixel data is read in, the difference from the preceding K frame is computed, i.e. dv = Xcs - Xk, and the mean squared error (MSE) of dv is evaluated; if the MSE is below the lower threshold, the two frames are judged to be very similar, and a 1-bit signal is sent to notify the decoding end that this CS frame needs no reconstruction and that the reconstruction result of the preceding K frame is used directly as its reconstruction result; if the MSE is above the upper threshold, dv is reduced in dimension by the observation matrix Φ, the reduced data is sent to the decoding end, and a 1-bit signal is sent at the same time to notify the decoding end to perform dictionary learning after it has reconstructed this CS frame; if the MSE lies within the threshold range, dv is simply reduced in dimension by the observation matrix Φ and sent;

Decoding end: K frames are decoded by the compressed sensing reconstruction algorithm NSL0 and stored; if the encoding end has sent the dictionary update signal, the dictionary of the sparse matrix is updated with the K-singular value decomposition (K-SVD) algorithm; for a CS frame, NSL0 compressed sensing reconstruction is performed using the coefficient matrix updated for the K frame and the observation matrix, and the reconstructed result is added to the reconstruction result of the preceding K frame to obtain the reconstruction of the CS frame; finally, the frames are assembled in frame order into a video and output;

Compressed sensing theory here specifically means that the sparse dictionary is generated by the K-SVD dictionary learning method and that the initial sparse dictionary is set to a global dictionary, i.e. a dictionary trained on pictures of the scene in which the camera is located.

2. The compressed sensing video coding and decoding system based on dictionary learning as claimed in claim 1, characterized in that the observation matrix is a block Gaussian random matrix.