CN103596010A

CN103596010A - Video coding and decoding system based on dictionary learning and compressed sensing

Info

Publication number: CN103596010A
Application number: CN201310589803.0A
Authority: CN
Inventors: 郭继昌; 金卯亨嘉; 申燊; 许颖; 孙骏
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2013-11-20
Filing date: 2013-11-20
Publication date: 2014-02-19
Anticipated expiration: 2033-11-20
Also published as: CN103596010B

Abstract

The invention relates to the fields of video compressed sensing and sparse image representation, and discloses a video coding and decoding system based on dictionary learning and compressed sensing. The system is designed for wireless video sensor networks, so that the encoder has low complexity and a small computational load, the volume of data transmitted over the channel is small, and the decoder can perform high-quality real-time video reconstruction. The system mainly comprises a video encoder and a video decoder. The encoder temporarily stores the image pixel data of each K frame, reduces its dimensionality according to compressed sensing theory, and transmits the reduced data to the decoder through a wireless transmitting module. The decoder reconstructs the K frames with a compressed sensing reconstruction algorithm (the improved NSL0 method), stores them, and finally assembles the frames in order into the output video. The system is mainly applied to video compressed sensing and transmission.

Description

Compressed sensing video coding and decoding system based on dictionary learning

Technical field

The present invention relates to the fields of video compressed sensing and sparse image representation, and in particular to a compressed sensing video coding and decoding system based on dictionary learning.

Background art

The present invention mainly targets video applications in which the encoder is resource-constrained, such as video surveillance and wireless video sensor networks. Because of the limitations of the equipment and of its operating environment, such applications require a low-complexity, low-power encoder to guarantee long-term stable operation, whereas the receiver can handle large amounts of data storage and complex decoding computation.

However, conventional video coding techniques, whether the H.26x series or the MPEG series, all adopt a system structure with a complex encoder and a simple decoder. The encoder removes temporal and spatial redundancy through inter-frame prediction, intra-frame prediction and the discrete cosine transform (DCT) to obtain high compression efficiency, so the demands on the encoder's computing power and memory are far higher than those on the decoder. Conventional video coding is therefore unsuitable for the fields above.

Compressed sensing (CS) is a theory that has emerged in the signal processing field in recent years. It compresses data during signal acquisition, sampling at a rate far below the Nyquist rate, so that the amount of sampled data is reduced and storage space is saved while enough information is still retained. When the original signal is needed, a suitable reconstruction algorithm recovers it from the samples. Compressed sensing thus merges conventional data acquisition and compression into one step, needs no complex encoding computation, and is well suited to situations where the encoder is resource-constrained.

Summary of the invention

The present invention aims to overcome the deficiencies of the prior art by designing a compressed sensing video coding and decoding system for wireless video sensor networks, in which the encoder has low complexity and a small computational load, the amount of data transmitted over the channel is small, and the decoder can perform high-quality real-time video reconstruction. To this end, the technical solution adopted by the invention is a compressed sensing video coding and decoding system based on dictionary learning, mainly comprising two parts, a video encoder and a decoder:

Encoder: according to the requirements on reconstruction accuracy and real-time performance, the frames of the video are divided into two classes, key frames (K frames) and non-key frames (CS frames). Every two frames form a group, that is, the group of pictures (GOP) size is 2; the odd-numbered frame is the K frame, and the frame immediately following it is the CS frame of the group. For a K frame, according to compressed sensing theory, the image pixel data are stored temporarily and then reduced in dimension by the measurement matrix Φ, and the reduced data are transmitted to the decoder through the wireless transmitting module. For a CS frame, after the image pixel data are read in, the difference from the preceding K frame is computed, dv=Xcs-Xk, and the mean squared error (MSE) of dv is evaluated. If the MSE is below the lower threshold, the two frames are judged to be very similar, and a 1-bit signal is sent to notify the decoder that this CS frame need not be reconstructed and that the reconstruction of the preceding K frame is to be used directly as its reconstruction. If the MSE is above the upper threshold, dv is reduced in dimension by the measurement matrix Φ and the reduced data are sent to the decoder, together with a 1-bit signal notifying the decoder to perform dictionary learning after this CS frame has been reconstructed. If the MSE lies between the two thresholds, dv is simply reduced in dimension by the measurement matrix Φ and sent;

Decoder: each K frame is decoded with the compressed sensing reconstruction algorithm, the improved modified Newton method (NSL0), and stored. If the encoder has sent the dictionary-update signal, the sparse matrix (dictionary) is updated with the K-singular value decomposition (K-SVD) algorithm. For a CS frame, NSL0 reconstruction is performed with the coefficient matrix updated from the K frame and the measurement matrix, and the result is added to the reconstruction of the preceding K frame to obtain the reconstructed CS frame. Finally, the frames are assembled in frame order into the output video.

The measurement matrix is a block Gaussian random matrix.

Specifically, within the compressed sensing framework, the sparse dictionary is generated by the K-SVD dictionary learning method; the initial sparse dictionary is a global dictionary, that is, a dictionary trained on pictures of the scene in which the camera is located.

Technical features and effects of the present invention:

The present invention applies compressed sensing to coding and decoding in wireless video sensor networks, moving the computational complexity from the encoder to the decoder.

The frame-difference method and the block-based measurement matrix effectively reduce the amount of transmitted data and the reconstruction time of CS frames while preserving reconstruction accuracy.

A global dictionary is used as the initial dictionary and is periodically updated by dictionary learning, which effectively improves reconstruction accuracy without affecting the real-time performance of video reconstruction.

Brief description of the drawings

Fig. 1 is the hardware structure diagram of the present invention.

Fig. 2 is the block diagram of the compressed sensing video coding and decoding system based on dictionary learning of the present invention.

Fig. 3 is the flow chart of the dictionary learning algorithm used in the present invention.

Detailed description

To achieve the above object, the present invention implements the entire video coding and decoding system using compressed sensing based on dictionary learning. The system mainly comprises two parts, a video encoder and a decoder.

At the encoder, according to the requirements on reconstruction accuracy and real-time performance, the frames of the video are divided into two classes, key frames (K frames) and non-key frames (CS frames). Every two frames form a group, that is, the group of pictures (GOP) size is 2; the odd-numbered frame is the K frame, and the frame immediately following it is the CS frame of the group. For a K frame, according to compressed sensing theory, the image pixel data are stored temporarily and then reduced in dimension by the measurement matrix Φ, and the reduced data are transmitted to the decoder through the wireless transmitting module. For a CS frame, after the image pixel data are read in, the difference from the preceding K frame is computed, dv=Xcs-Xk, and the mean squared error (MSE) of dv is evaluated. If the MSE is below the lower threshold, the two frames are judged to be very similar, and a 1-bit signal is sent to notify the decoder that this CS frame need not be reconstructed and that the reconstruction of the preceding K frame can be used directly as its reconstruction. If the MSE is above the upper threshold, the two frames differ greatly, which indicates that the captured scene has changed substantially and that the dictionary should be updated to adapt to the new scene; dv is therefore reduced in dimension by the measurement matrix Φ and the reduced data are sent to the decoder, together with a 1-bit signal notifying the decoder to perform dictionary learning after this CS frame has been reconstructed. If the MSE lies between the two thresholds, dv is simply reduced in dimension by the measurement matrix Φ and sent.

At the decoder, each K frame is decoded with the compressed sensing reconstruction algorithm, the improved modified Newton method (NSL0), and stored. If the encoder has sent the dictionary-update signal, the sparse matrix (dictionary) is updated with the K-singular value decomposition (K-SVD) algorithm. For a CS frame, NSL0 reconstruction is performed with the coefficient matrix updated from the K frame and the measurement matrix, and the result is added to the reconstruction of the preceding K frame to obtain the reconstructed CS frame. Finally, the frames are assembled in frame order into the output video.

Here the measurement matrix is a block Gaussian random matrix, which effectively reduces the amount of data needed for the initial transmission of the measurement matrix and shortens reconstruction time without affecting reconstruction accuracy, ensuring real-time performance.

Compressed sensing presupposes that the signal is sparse in some domain Ψ, that is, the signal can be represented by a sparse dictionary Ψ and a corresponding sparse coefficient vector whose number of nonzero entries is less than the sparsity level K. The quality of the sparse dictionary determines the reconstruction accuracy of the signal. This system generates the sparse dictionary with the K-SVD dictionary learning method; because the learned dictionary is adaptive, it reconstructs better than the DCT or a wavelet transform. The initial sparse dictionary is a global dictionary, that is, a dictionary trained on pictures of the scene in which the camera is located. Such an initial dictionary represents the images more sparsely and so secures high reconstruction accuracy from the outset.
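The sparsity assumption described above can be written compactly. The symbols x (one column of a frame), s (its sparse coefficient vector) and y (its measurement) are notation assumed here for illustration only; Ψ, Φ and K are as in the text, and the bound on the number of nonzero entries follows the wording of the text.

```latex
x = \Psi s, \qquad \|s\|_0 < K, \qquad y = \Phi x = \Phi \Psi s
```

Here \|s\|_0 counts the nonzero entries of s, so a better dictionary Ψ is one under which the same signal needs fewer nonzero coefficients.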

The present invention provides a compressed sensing video coding and decoding system for wireless video sensor networks, in which the encoder has low complexity and a small computational load, the amount of data transmitted over the channel is small, and the decoder can perform high-quality real-time video reconstruction.

To achieve the above object, the present invention implements the entire video coding and decoding system using compressed sensing based on dictionary learning. The system mainly comprises two parts, a video encoder and a decoder.

The present invention is described in more detail below with reference to the accompanying drawings.

Fig. 1 shows the hardware block diagram of the system, which consists of the following components: a digital camera, a DSP video encoding module, a wireless transmitting module, a wireless receiving module and a PC video decoding module. The digital camera, the DSP video encoding module and the wireless transmitting module form the encoder of the system; the wireless receiving module and the PC video decoding module form the decoder. At the encoder, the digital camera is connected to the DSP video encoding module by 32 multiplexed data lines and passes the captured video data into the module for storage and encoding; the encoded data are then transmitted to the decoder by the wireless transmitting module. The wireless receiving module of the decoder receives the encoded data and passes them to the PC, which decodes them and finally outputs the reconstructed video stream.

Fig. 2 shows the block diagram of the compressed sensing video coding and decoding system based on dictionary learning. Assume that each frame of the input video is an Nr × Nc signal and that a group of pictures consists of two frames (GOP=2), the first being the K frame Xk and the second the CS frame Xcs. The whole coding and decoding process is illustrated below for one group of pictures.

At the encoder, Xk is first read in and stored temporarily. It is then reduced by the measurement matrix Φ to the M × Nc signal Yk, as shown in formula (1), where Φ is an M × Nr block Gaussian random matrix with M<<Nr. Multiplication by Φ compresses the image data substantially, and the resulting Yk is finally sent to the decoder. This completes the encoding of the K frame.

Y=ΦX (1)
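Formula (1) is a single matrix product. The following minimal sketch illustrates the change in dimensions; the function name `measure`, the frame sizes and the dense Gaussian matrix standing in for Φ are all assumptions made for illustration (the text specifies a block Gaussian matrix).

```python
import numpy as np

def measure(frame, phi):
    """Apply Y = Phi @ X, reducing an (Nr, Nc) frame to (M, Nc) measurements."""
    return phi @ frame

rng = np.random.default_rng(0)
Nr, Nc, M = 64, 48, 16                    # hypothetical sizes with M << Nr
phi = rng.standard_normal((M, Nr))        # dense Gaussian stand-in for the matrix Phi
x_k = rng.random((Nr, Nc))                # stand-in for the pixel data of a K frame
y_k = measure(x_k, phi)
print(y_k.shape)                          # (16, 48)
```

Each column of the frame is measured independently, so the encoder's work per frame is one matrix multiplication and nothing more.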

Xcs is then read in and differenced against the temporarily stored Xk, dv=Xcs-Xk. The mean squared error (MSE) of dv is computed and compared with preset thresholds. If the MSE is below the lower threshold Sl, the two frames are judged to be very similar; the encoder performs no encoding operation on this frame and only sends a 1-bit control signal "0", notifying the decoder that this CS frame need not be reconstructed and that the K frame reconstruction can be used directly as its reconstruction. If the MSE is above the upper threshold Sh, dv is reduced in dimension by the measurement matrix Φ according to formula (1), the reduced data Ydv are sent to the decoder, and a 1-bit control signal "1" is sent at the same time, notifying the decoder to perform dictionary learning once this CS frame has been reconstructed. If the MSE lies between the two thresholds, dv is simply reduced in dimension by the measurement matrix Φ according to formula (1) and Ydv is sent.
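The three-way decision for a CS frame can be sketched as follows. The function name `encode_cs_frame`, the constant names and the use of `None` to stand for "no control signal" are assumptions made for illustration; the thresholds correspond to Sl and Sh in the text.

```python
import numpy as np

SKIP, UPDATE_DICT = 0, 1   # the 1-bit control values "0" and "1" from the text

def encode_cs_frame(x_cs, x_k, phi, s_low, s_high):
    """Return (control_signal, measurements) for one CS frame.

    control_signal is None when no control bit is sent (assumed convention).
    """
    dv = x_cs.astype(np.float64) - x_k.astype(np.float64)   # dv = Xcs - Xk
    mse = float(np.mean(dv ** 2))
    if mse < s_low:
        return SKIP, None            # frames nearly identical: send only "0"
    y_dv = phi @ dv                  # formula (1) applied to the difference
    if mse > s_high:
        return UPDATE_DICT, y_dv     # scene changed: send Ydv and "1"
    return None, y_dv                # in range: send Ydv only

rng = np.random.default_rng(0)
phi = rng.standard_normal((4, 8))
x_k = np.zeros((8, 6))
print(encode_cs_frame(x_k, x_k, phi, s_low=1.0, s_high=100.0))  # (0, None)
```

Only the difference dv is measured, which is what lets a mostly static scene be transmitted with very little data.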

The decoder starts working once Yk has been received. Compressed sensing reconstruction is carried out according to formula (2), where Φ is identical to the one used at the encoder and Ψ is the Nr × Nr sparse matrix (dictionary), and the reconstructed K frame is stored.

min ||S||_0  s.t.  Y=ΦΨS (2)
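Formula (2) is an L0-minimization problem, and smoothed-L0 methods solve it approximately. The sketch below is the standard SL0 algorithm, not the NSL0 variant adopted by the invention, whose details are not given in this text. The function name `sl0`, its parameters and their default values are assumptions made for illustration, and `A` stands for the product ΦΨ.

```python
import numpy as np

def sl0(A, y, sigma_min=1e-3, sigma_decay=0.5, mu=2.0, inner_iters=3):
    """Approximate min ||s||_0 s.t. y = A s with the standard smoothed-L0 method."""
    A_pinv = np.linalg.pinv(A)
    s = A_pinv @ y                         # minimum-L2-norm starting point
    sigma = 2.0 * np.max(np.abs(s))        # start with a very smooth surrogate
    while sigma > sigma_min:
        for _ in range(inner_iters):
            # Gradient-style step on the smoothed L0 surrogate.
            s = s - mu * s * np.exp(-s ** 2 / (2.0 * sigma ** 2))
            # Project back onto the constraint set {s : A s = y}.
            s = s - A_pinv @ (A @ s - y)
        sigma *= sigma_decay               # sharpen the surrogate gradually
    return s

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))         # stand-in for Phi @ Psi
s_true = np.zeros(100)
s_true[[3, 30, 77]] = [1.5, -2.0, 1.0]
s_hat = sl0(A, A @ s_true)
print(np.max(np.abs(A @ s_hat - A @ s_true)))   # constraint residual
```

Because every inner iteration ends with the projection step, the returned coefficients always satisfy the measurement constraint; the reconstructed frame column is then Ψ times those coefficients.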

Decoding of the CS frame then begins. The decoder first checks whether a 1-bit control signal has been received. If the control signal is "0", the reconstruction of the K frame is used as the reconstruction of the CS frame, and the K frame and CS frame reconstructions are output in turn. If there is no control signal, Ydv is reconstructed by compressed sensing according to formula (2); because a difference was taken at the encoder, the reconstructed CS frame is obtained by adding the reconstructed dv to the reconstructed K frame. If the control signal is "1", the CS frame is first reconstructed by the method above, and the reconstruction result is then used as the training signal to update the dictionary by K-SVD dictionary learning, so that the dictionary adapts better to the changed scene.
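The decoder-side handling of a CS frame can be sketched as below. The function name `decode_cs_frame` and the two callbacks `reconstruct` and `update_dictionary` are hypothetical placeholders for the reconstruction algorithm and the K-SVD update; `None` is assumed to mean that no control signal was received, and the reconstructed CS frame is assumed to be the training signal.

```python
import numpy as np

def decode_cs_frame(control, y_dv, x_k_hat, reconstruct, update_dictionary):
    """Reconstruct one CS frame from the control signal and measurements Ydv.

    reconstruct(y_dv) -> reconstructed dv; update_dictionary(frame) retrains
    the dictionary. Both are hypothetical callbacks supplied by the caller.
    """
    if control == 0:
        return x_k_hat.copy()            # reuse the K frame reconstruction
    dv_hat = reconstruct(y_dv)           # formula (2) applied to Ydv
    x_cs_hat = x_k_hat + dv_hat          # undo the encoder-side difference
    if control == 1:
        update_dictionary(x_cs_hat)      # learn from the changed scene
    return x_cs_hat
```

A caller would invoke this once per GOP, after the K frame of that GOP has been reconstructed and stored.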

The dictionary learning method used here is the K-SVD algorithm shown in Fig. 3. First, sparse coding is performed from the initial dictionary and the training signal: with the dictionary held fixed, the given data are sparsely represented over it (that is, approximated as closely as possible with as few coefficients as possible), yielding the coefficient matrix α. Then, with the coefficient matrix held fixed, each dictionary atom (each column of the dictionary) is updated in turn so that the dictionary represents the training signal more closely. Repeating these two steps for several iterations completes dictionary learning and yields a sparse matrix Ψ better suited to the new scene. This matrix is used to reconstruct the frames of the next GOP.
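The two alternating steps can be sketched as follows. This is a minimal textbook-style K-SVD, assuming orthogonal matching pursuit (OMP) for the sparse coding step, which the text does not specify; the function names `omp` and `ksvd` and their parameters are assumptions made for illustration. As in the usual formulation, the SVD step refreshes each atom together with the coefficients that use it.

```python
import numpy as np

def omp(D, x, k):
    """Greedy sparse coding: approximate x with at most k atoms (columns) of D."""
    residual, support = x.copy(), []
    coef = np.zeros(D.shape[1])
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))   # best-matching atom
        if j in support:
            break
        support.append(j)
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    if support:
        coef[support] = sol
    return coef

def ksvd(X, D, k, n_iter=10):
    """Alternate sparse coding and atom-by-atom updates; returns (D, alpha)."""
    D = D / np.linalg.norm(D, axis=0)                # unit-norm atoms (new array)
    for _ in range(n_iter):
        # Step 1: dictionary fixed, sparse-code every training column.
        alpha = np.column_stack([omp(D, x, k) for x in X.T])
        # Step 2: update each atom in turn.
        for j in range(D.shape[1]):
            users = np.nonzero(alpha[j])[0]          # signals that use atom j
            if users.size == 0:
                continue
            alpha[j, users] = 0.0
            E = X[:, users] - D @ alpha[:, users]    # error without atom j
            U, S, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]                        # best rank-1 fit to E
            alpha[j, users] = S[0] * Vt[0]
    return D, alpha
```

In this system the columns of `X` would be taken from the training signal and `D` would be the current dictionary Ψ; the returned dictionary is the one used for the next GOP.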

Finally, the reconstructions of the K frames and CS frames are output frame by frame, in order and at the frame rate, forming the output video stream.

The measurement matrix Φ is a block Gaussian random matrix. If the block size is 8 × 8, the encoder generates an 8 × 8 Gaussian random matrix Φ0 and then uses Φ0 to build an M × Nr block-diagonal matrix whose diagonal consists of (M × Nr)/(8 × 8) copies of Φ0. This block Gaussian random matrix effectively reduces the amount of data needed for the initial transmission of the measurement matrix and shortens reconstruction time without affecting reconstruction accuracy, ensuring real-time performance.

Reconstruction is one stage of the compressed sensing process, and the reconstruction algorithm is the concrete method used for it. The reconstruction algorithm adopted by the invention is NSL0, an improved modified Newton method, which experiments verified to perform best among existing compressed sensing reconstruction algorithms. It combines high reconstruction accuracy with short reconstruction time and so meets the system's requirements for high reconstruction accuracy and real-time performance.
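The block-diagonal construction of Φ from a single small block Φ0 can be sketched with a Kronecker product. The function name `block_gaussian_matrix` and its parameters are assumptions made for illustration. The block shape is taken as a parameter; a rectangular block with fewer rows than columns is an assumption made here so that the assembled matrix has fewer rows than columns, whereas the text describes 8 × 8 blocks.

```python
import numpy as np

def block_gaussian_matrix(n_blocks, block_rows, block_cols, seed=0):
    """Build a block-diagonal matrix whose diagonal repeats one Gaussian block."""
    rng = np.random.default_rng(seed)
    phi0 = rng.standard_normal((block_rows, block_cols))   # the small block Phi0
    phi = np.kron(np.eye(n_blocks), phi0)                   # Phi0 repeated on the diagonal
    return phi, phi0

phi, phi0 = block_gaussian_matrix(n_blocks=4, block_rows=2, block_cols=8)
print(phi.shape)    # (8, 32)
```

Only Φ0 (or the seed that generates it) has to be shared with the decoder, which is why the block structure reduces the data needed to transmit the measurement matrix.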

Claims

1. A compressed sensing video coding and decoding system based on dictionary learning, characterized in that it mainly comprises two parts, a video encoder and a decoder:

Encoder: according to the requirements on reconstruction accuracy and real-time performance, the frames of the video are divided into two classes, key frames (K frames) and non-key frames (CS frames); every two frames form a group, that is, the group of pictures (GOP) size is 2; the odd-numbered frame is the K frame, and the frame immediately following it is the CS frame of the group. For a K frame, according to compressed sensing theory, the image pixel data are stored temporarily and then reduced in dimension by the measurement matrix Φ, and the reduced data are transmitted to the decoder through the wireless transmitting module. For a CS frame, after the image pixel data are read in, the difference from the preceding K frame is computed, dv=Xcs-Xk, and the mean squared error (MSE) of dv is evaluated; if the MSE is below the lower threshold, the two frames are judged to be very similar, and a 1-bit signal is sent to notify the decoder that this CS frame need not be reconstructed and that the reconstruction of the preceding K frame is to be used directly as its reconstruction; if the MSE is above the upper threshold, dv is reduced in dimension by the measurement matrix Φ and the reduced data are sent to the decoder, together with a 1-bit signal notifying the decoder to perform dictionary learning after this CS frame has been reconstructed; if the MSE lies between the two thresholds, dv is simply reduced in dimension by the measurement matrix Φ and sent;

Decoder: each K frame is decoded with the compressed sensing reconstruction algorithm NSL0 and stored; if the encoder has sent the dictionary-update signal, the sparse matrix (dictionary) is updated with the K-singular value decomposition (K-SVD) algorithm; for a CS frame, NSL0 reconstruction is performed with the coefficient matrix updated from the K frame and the measurement matrix, and the result is added to the reconstruction of the preceding K frame to obtain the reconstructed CS frame; finally, the frames are assembled in frame order into the output video.

2. The compressed sensing video coding and decoding system based on dictionary learning as claimed in claim 1, characterized in that the measurement matrix is a block Gaussian random matrix.

3. The compressed sensing video coding and decoding system based on dictionary learning as claimed in claim 1, characterized in that, within the compressed sensing framework, the sparse dictionary is generated by the K-SVD dictionary learning method, and the initial sparse dictionary is a global dictionary, that is, a dictionary trained on pictures of the scene in which the camera is located.