CN103778919B

CN103778919B - Speech coding method based on compressed sensing and sparse representation

Info

Publication number: CN103778919B
Application number: CN201410026207.6A
Authority: CN
Inventors: 杨震; 李尚靖
Original assignee: Nanjing Post and Telecommunication University
Current assignee: Nanjing Post and Telecommunication University
Priority date: 2014-01-21
Filing date: 2014-01-21
Publication date: 2016-08-17
Anticipated expiration: 2034-01-21
Also published as: CN103778919A

Abstract

The invention discloses a speech coding method within a compressed sensing framework. The method exploits the fact that the observation sequence obtained by projecting speech with a row-echelon matrix under the compressed sensing framework retains some of the characteristics of speech, and it models this observation sequence mathematically using sparse representation. In the training stage, the K-singular value decomposition (K-SVD) algorithm is first applied to the observation sequences obtained by row-echelon projection of a large amount of speech, yielding a codebook dictionary that can be used for the sparse representation of real-time observation sequences. In the coding stage, the orthogonal matching pursuit (OMP) algorithm models each real-time observation sequence with atoms from the dictionary, and only the positions and amplitudes of the few selected atoms are encoded and transmitted. The decoder needs only the same dictionary to recover the observation sequence; it then reconstructs the speech signal with the basis pursuit algorithm, and a low-pass post-filter improves the perceptual quality of the reconstructed speech. The invention can effectively encode and transmit speech signals under the compressed sensing framework, reducing the coding bit rate while maintaining good reconstructed speech quality.

Description

Speech coding method based on compressed sensing and sparse representation

Technical field

The invention belongs to the field of speech processing technology and relates to a speech coding method within a compressed sensing framework.

Background art

Compressed sensing is a novel theory that has emerged in recent years. Unlike the traditional Nyquist sampling theorem, it does not require a sampling rate of more than twice the signal bandwidth: as long as the signal is sparse or compressible in some transform domain, it can be sampled at a rate well below the Nyquist rate, and the original signal can be reconstructed with high probability from a small number of observation projections. Within this framework, the sampling rate depends not on the signal bandwidth but on the structure and content of the information in the signal. Compressed sensing theory has three main parts: sparse decomposition of the signal, design of the observation matrix, and signal reconstruction algorithms. As soon as it was proposed, compressed sensing attracted wide attention from researchers in China and abroad, and applied research now covers many fields, such as sensor networks, medical image processing, radar scanning, biosensing, and speech signal processing.

In recent years, sparse representation has become one of the central concepts in signal processing and its applications. Its core idea is that a signal of a given class can be approximately expressed as a linear combination of similar samples in a sufficiently large training-sample space, or of atoms in a transform domain, where an atom is a column vector of the sample-subspace or transform-domain matrix. Consequently, when the signal is represented over the whole sample space, its coefficients are sparse. This is the most important assumption of sparse representation and the basis for any further analysis. Sparse representation fully exploits the correlation within a class of signals and is of great research value for compression, denoising, modeling, and coding. For a dictionary trained on a class of signals, the success of the training directly determines the performance of the subsequent sparse representation, so researchers have proposed a series of dictionary training methods, including the method of optimal directions (MOD), K-singular value decomposition (K-SVD), and online dictionary learning.

Speech coding is the prerequisite and foundation of speech transmission and communication; a good speech coding method restores speech with good perceptual quality at a relatively low bit rate. Over the past two decades, with the development of computing, communications, signal processing, and related technologies, speech coding has advanced rapidly and been widely applied. Speech coding is traditionally divided into three classes: waveform coding, parametric coding, and hybrid coding. Waveform coding directly encodes the time-domain, frequency-domain, or transform-domain signal into a digital signal, aiming to make the reconstructed waveform match that of the original speech; the main examples are pulse code modulation (PCM) and adaptive differential pulse code modulation (ADPCM). Parametric coding, also called source coding or vocoding, extracts characteristic parameters of the source signal in the frequency domain or another transform domain and encodes and transmits these parameters; the decoder converts the received digital signal back into characteristic parameters and reconstructs the speech signal from them. Linear prediction coefficients are currently the most widely used parametric coding technique. Hybrid coding combines waveform and parametric coding, overcoming the weaknesses of each while keeping their strengths, and can produce high-quality synthesized speech at 4 to 16 kbps.

Summary of the invention

Technical problem: The object of the invention is to provide a speech coding method based on compressed sensing and sparse representation that effectively reduces the bit rate required for speech coding while ensuring good perceptual quality of the synthesized speech.

Technical solution: The speech coding method of the present invention, based on compressed sensing and sparse representation, comprises the following steps:

a) Train a dictionary D suited to speech-signal observation sequences with the K-singular value decomposition (K-SVD) algorithm;

b) Obtain the observation sequence: at the encoder, first divide the input speech into frames 20 to 40 ms long, then use a row-echelon matrix as the projection matrix to project each frame at a compression ratio of 1:2 or 1:4, obtaining the observation sequence y of each frame;

c) Model the observation sequence y mathematically by sparse representation, i.e., use the orthogonal matching pursuit algorithm to obtain the sparse coefficients of y over the dictionary D, as follows:

1) Initialization: initialize the candidate set I to the empty set, I = ∅; set the residual r = y, the sparse coefficient γ = 0, and the iteration counter i = 1. The number of iterations is K, the number of atoms selected according to the target bit rate, i.e., the preset sparsity;

2) Find the index k of the atom in D that is most strongly correlated with the residual r, according to the following formula:

[formula not reproduced in the source text]

where d_k is the k-th atom of D, and arg min denotes the value of the variable at which the objective function attains its minimum;

Then add the selected index k to the candidate set: I = (I, k);

3) Update the sparse coefficients according to:

$$\gamma_I = D_I^{+} y$$

where D_I is the sub-dictionary formed by the atoms indexed by the candidate set I, $D_I^{+}$ is the pseudo-inverse of D_I, and γ_I is the sparse coefficient vector for the atoms in I;

Then update the residual according to:

$$r = y - D_I\gamma_I$$

4) Set i = i + 1. If i ≤ K, atom selection is not yet complete; return to step 2). Otherwise, the sparse representation loop ends, and the final γ_I is taken as the sparse coefficient γ of y over D; proceed to step d). Here K is the number of iterations, equal to the number of atoms selected according to the target bit rate;

d) Encode the positions and amplitudes of the K atoms required by the sparse coefficient γ as follows:

The number of atoms L in D is set to a power of 2, i.e., $L = 2^p$, so that p bits identify the position of each required atom; the amplitude of each atom is encoded with standard 8-bit pulse code modulation (PCM);

e) Recover the speech observation sequence: using the positions and amplitudes of the K atoms of γ obtained in step d), look up the required atoms in D, multiply each atom vector by its amplitude, and sum the K scaled atom vectors to obtain the recovered observation sequence of the speech signal;

f) Reconstruct the speech signal from the recovered observation sequence: choose the discrete cosine transform (DCT) basis as the sparsity basis of the speech signal and use the basis pursuit algorithm as the reconstruction algorithm to reconstruct the speech signal from the observation sequence recovered in step e);

g) Low-pass filter the reconstructed speech: post-process the speech reconstructed in step f) with a low-pass post-filter whose transfer function is $H(z) = \frac{1 - \mu}{1 - \mu z^{-1}}$.
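Steps d) and e) can be illustrated by packing one frame's payload into a bit string and unpacking it again. This is a minimal pure-Python sketch that assumes "standard 8-bit PCM" for amplitudes means a uniform 8-bit quantizer over a fixed range [-a_max, a_max]; the range `a_max` and the helper names `pack_frame` / `unpack_frame` are hypothetical and not taken from the patent.

```python
def quantize_amplitude(a: float, a_max: float) -> int:
    """Uniform 8-bit quantizer (assumed): maps [-a_max, a_max] onto codes 0..255."""
    step = 2.0 * a_max / 255
    code = round((a + a_max) / step)
    return max(0, min(255, code))  # clip out-of-range amplitudes


def dequantize_amplitude(code: int, a_max: float) -> float:
    """Inverse of quantize_amplitude: code 0 -> -a_max, code 255 -> +a_max."""
    step = 2.0 * a_max / 255
    return code * step - a_max


def pack_frame(atoms, p: int = 13, a_max: float = 1.0) -> str:
    """Encode (atom index, amplitude) pairs as p index bits + 8 amplitude bits each."""
    bits = []
    for index, amplitude in atoms:
        if not 0 <= index < 2 ** p:
            raise ValueError(f"atom index {index} does not fit in {p} bits")
        bits.append(format(index, f"0{p}b"))
        bits.append(format(quantize_amplitude(amplitude, a_max), "08b"))
    return "".join(bits)


def unpack_frame(bits: str, p: int = 13, a_max: float = 1.0):
    """Decode the bit string produced by pack_frame back into (index, amplitude) pairs."""
    width = p + 8
    atoms = []
    for start in range(0, len(bits), width):
        field = bits[start:start + width]
        index = int(field[:p], 2)
        amplitude = dequantize_amplitude(int(field[p:], 2), a_max)
        atoms.append((index, amplitude))
    return atoms
```

With K = 10 atoms and p = 13, one frame packs into 10 × (13 + 8) = 210 bits; the atom indices survive exactly, while each amplitude is recovered to within half a quantizer step.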

In step b) of a preferred embodiment of the invention, the encoder divides the input speech into frames of 40 ms.

Beneficial effects: Compared with the prior art, the invention has the following advantages:

Compressed sensing performs signal acquisition and compression in a single step, substantially lowering the sampling rate and greatly reducing both the computational load of subsequent processing and the transmission bandwidth. Speech coding under the compressed sensing framework is still at an early stage, so mathematical modeling and coding of compressed sensing observation sequences are of considerable practical importance for speech communication. Based on the compressed sensing framework, the invention projects each frame with a row-echelon matrix, models the observation sequence with a pre-trained dictionary and sparse representation to extract characteristic parameters, and encodes and transmits only these few parameters. At decoding, the observation sequence is recovered from the dictionary and the parameters, the speech signal is reconstructed with the DCT basis and the basis pursuit algorithm, and a low-pass post-filter improves the perceptual quality of the reconstructed speech. Good speech recovery quality is thus obtained at a low bit rate: at a transmission rate of 5.25 kbps, the method achieves a mean opinion score (MOS) of 3.18, better than the classical code-excited linear prediction (CELP) method.

Brief description of the drawings

Fig. 1 is a schematic flow chart of the encoder in the method of the invention.

Fig. 2 is a schematic flow chart of the decoder in the method of the invention.

Detailed description of the invention

The invention is described in further detail below by way of an embodiment.

Before real-time, varying speech signals are encoded, a dictionary for the sparse representation of speech observation sequences must first be obtained by training. Because speech differs between speakers, and even the same speaker's voice varies over time, the trained dictionary should capture as wide a range of speech characteristics as possible; the redundancy of an overcomplete dictionary meets exactly this requirement. First, a large amount of speech is collected into a speech corpus covering speakers who differ in many characteristics, such as age, gender, and occupation, so as to include the various variations of speech signals. The speech in the corpus is then divided into frames. Given the short-term stationarity of speech, the frame length can be any value from 20 ms to 40 ms in steps of 5 ms, with 40 ms being the most suitable. With a 40 ms frame and a sampling rate of 8 kHz, each frame contains N = 320 samples. Under the compressed sensing framework, every frame in the corpus is projected with a row-echelon matrix, giving the observation sequences of all speech in the corpus. When the compression ratio is M:N = 1:2, the row-echelon observation matrix is given by formula (1).

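$$\Phi = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 & \cdots & 0 & 0 \\ \vdots & & & & & & \ddots & & \vdots \\ 0 & 0 & 0 & 0 & 0 & 0 & \cdots & 1 & 1 \end{bmatrix} \qquad (1)$$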

Here M = 160, i.e., the observation sequence has 160 samples: compressed sensing needs only half as many samples as conventional sampling, which greatly reduces the amount of data. Once the training observation sequences have been prepared, dictionary learning is performed on them with the K-SVD algorithm to obtain the sparse representation dictionary required in the subsequent steps. The term "dictionary" below always refers to the sparse representation dictionary learned in this step. The K-SVD dictionary learning algorithm proceeds as follows:
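The construction of formula (1) and the projection of one frame can be sketched in pure Python as follows. The generalization to a 1:4 ratio (four consecutive ones per row) is an assumption, since the patent writes out only the 1:2 matrix; the function names are illustrative.

```python
def row_echelon_matrix(n: int, ratio: int = 2):
    """M x N matrix with `ratio` consecutive ones per row, each row shifted by `ratio`.

    For ratio = 2 this is the matrix of formula (1); M = N / ratio.
    """
    if n % ratio:
        raise ValueError("frame length must be divisible by the compression ratio")
    m = n // ratio
    return [[1 if r * ratio <= c < (r + 1) * ratio else 0 for c in range(n)]
            for r in range(m)]


def project(phi, frame):
    """Observation sequence y = Phi x for one frame x (plain matrix-vector product)."""
    return [sum(p * x for p, x in zip(row, frame)) for row in phi]


# A 40 ms frame at 8 kHz has N = 320 samples, giving M = 160 observations at 1:2.
phi = row_echelon_matrix(320, 2)
frame = list(range(320))          # placeholder samples, not real speech
y = project(phi, frame)
```

Because every row holds a block of adjacent ones, multiplying by this Φ simply sums each group of `ratio` consecutive samples, so a practical encoder need not store the matrix at all.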

Input: set of training observation sequences X, initial dictionary D₀, target sparsity K, dictionary size L, number of iterations n.

Output: dictionary D and sparse matrix Γ such that X ≈ DΓ.

1) Initialization: set the dictionary D = D₀;

2) Initialization: set i = 1;

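3) For the current iteration i, compute $\Gamma_i = \arg\min_{\Gamma} \|X - D\Gamma\|_F^2$ subject to $\|\gamma\|_0 \le K$ for every column γ of Γ, where Γ_i is the sparse matrix of the i-th iteration and arg min denotes the value of the variable at which the objective function attains its minimum;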

4) Set j = 1;

5) Set D_j = 0, where D_j is the j-th atom of the dictionary, i.e., its j-th column vector;

6) I = {indices of the observation sequences in the training set X whose representations use D_j};

7) E = X_I − DΓ_I, where E is the error between the signals indexed by I and their sparse representations;

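8) Compute $\{d, g\} = \arg\min_{d,\,g} \|E - d\,g^{T}\|_F^2$ subject to $\|d\|_2 = 1$, where d is the atom obtained by minimizing the objective function, g is the corresponding sparse coefficient vector, and arg min denotes the value of the variables at which the objective function attains its minimum;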

9) Update D_j = d;

10) Update Γ_{j,I} = g^T, where Γ_{j,I} holds the sparse coefficients of the j-th atom for the signals indexed by I;

11) Set j = j + 1. If j ≤ L (L being the number of dictionary atoms), return to step 5); otherwise, the loop ends;

12) Set i = i + 1. If i ≤ n, return to step 3); otherwise, the loop ends and dictionary training is complete.

About n = 30 training iterations is appropriate. After dictionary training, the formal coding stage needs only the dictionary D, which is used for the sparse representation of observation sequences in the coding stage and for recovering observation sequences in the decoding stage. After repeated experiments, the dictionary size was set to 2¹³ = 8192, so that 13 bits suffice to identify the position of a selected atom.
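The K-SVD loop above can be sketched compactly with NumPy. Assumptions, not taken from the patent: step 3) uses a plain OMP for sparse coding, atoms that no training signal uses are left unchanged, and the rank-1 solution of step 8) is taken from the leading singular vectors of E. Toy sizes stand in for the 160 × 8192 dictionary and the speech corpus; the function names are illustrative.

```python
import numpy as np


def omp(D, y, k):
    """Orthogonal matching pursuit: pick up to k unit-norm columns of D to approximate y."""
    support, coef = [], np.zeros(0)
    residual = y.copy()
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:                    # residual is already (numerically) zero
            break
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    gamma = np.zeros(D.shape[1])
    gamma[support] = coef
    return gamma


def ksvd(X, D0, k, n_iter=30):
    """K-SVD: X (M x num_signals) ~= D @ G with at most k nonzeros per column of G."""
    D = D0 / np.linalg.norm(D0, axis=0)     # step 1: start from D0 with unit-norm atoms
    G = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):                 # steps 2 and 12: outer iterations
        # Step 3: sparse-code every training observation with the current dictionary.
        G = np.column_stack([omp(D, x, k) for x in X.T])
        for j in range(D.shape[1]):         # steps 4 and 11: loop over atoms
            users = np.nonzero(G[j, :])[0]  # step 6: signals whose code uses atom j
            if users.size == 0:
                continue                    # unused atom left unchanged (assumption)
            G[j, users] = 0.0               # equivalent to step 5: drop atom j's term
            E = X[:, users] - D @ G[:, users]          # step 7: error without atom j
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]               # steps 8-9: best unit-norm rank-1 atom
            G[j, users] = s[0] * Vt[0, :]   # step 10: matching coefficients
    return D, G
```

Restricting the update to the signals in I keeps the sparsity pattern of Γ unchanged, which is why each column of the returned G still has at most K nonzeros.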

In the formal speech coding and decoding stage, the speech entering the encoder is first divided into frames in the conventional way, using the same 40 ms frame length as in training. Each frame is then projected using compressed sensing. In the compressed sensing observation model, the sparse signal x is not measured directly; instead, x is projected onto a set of M observation vectors φ₁, φ₂, …, φ_M (the rows of the observation matrix Φ), giving the observations $y_m = \langle \varphi_m, x \rangle$, m = 1, …, M. In matrix form,

$$y = \Phi x \qquad (2)$$

where x is an N × 1 vector, y is an M × 1 vector, and Φ is the M × N observation matrix. Denoting the sparsity basis by Ψ, so that x = Ψθ with sparse coefficients θ, we have

$$y = \Phi x = \Phi\Psi\theta = \Xi\theta \qquad (3)$$

where Ξ = ΦΨ is an M × N matrix.

Because the number of observations M is much smaller than the signal dimension N, the inverse problem of formula (2) is ill-posed, and x cannot be solved for directly from the M observations in y. However, compressed sensing exploits the sparsity of the signal: x can be obtained by first solving for the sparse coefficients θ. To guarantee convergence, so that the sparse coefficients can be recovered accurately from M observations, Ξ must satisfy the restricted isometry property (RIP): for every strictly K-sparse vector v, the matrix Ξ must satisfy the inequality

$$1 - \epsilon \le \frac{\|\Xi v\|_2}{\|v\|_2} \le 1 + \epsilon \qquad (4)$$

where ε > 0. However, determining whether a given Ξ has the RIP is a combinatorial problem. It has been shown in the literature that if the observation matrix and the sparsity basis are incoherent, Ξ satisfies the RIP with high probability. Incoherence means that the sensing vectors {φ_j} cannot be sparsely represented in the basis {ψ_i}: the stronger the incoherence, the more coefficients are needed to express one in terms of the other; conversely, the fewer coefficients are needed, the stronger the coherence. Choosing a random Gaussian matrix as the observation matrix Φ guarantees incoherence and the RIP with high probability.
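Incoherence can be quantified with the mutual coherence μ(Φ, Ψ) = √N · max |⟨φ_k, ψ_j⟩| over unit-normalized rows φ_k of Φ and columns ψ_j of Ψ, a standard measure in the compressed sensing literature that the patent itself does not define. The NumPy sketch below computes it for the 1:2 row-echelon matrix of formula (1) against a DCT basis; the orthonormal DCT-II form and the function names are assumptions for illustration.

```python
import numpy as np


def dct_basis(n):
    """Orthonormal DCT-II basis; column k is the k-th cosine atom (assumed DCT form)."""
    samples = np.arange(n)[:, None]
    freqs = np.arange(n)[None, :]
    psi = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * samples + 1) * freqs / (2 * n))
    psi[:, 0] /= np.sqrt(2.0)               # c_0 = 1/sqrt(2) makes the DC atom unit-norm
    return psi


def mutual_coherence(phi, psi):
    """sqrt(N) * max |<phi_k, psi_j>| over normalized rows of phi and columns of psi.

    For an orthonormal psi this lies between 1 (incoherent) and sqrt(N) (coherent).
    """
    rows = phi / np.linalg.norm(phi, axis=1, keepdims=True)
    cols = psi / np.linalg.norm(psi, axis=0, keepdims=True)
    return float(np.sqrt(phi.shape[1]) * np.max(np.abs(rows @ cols)))


N = 320
psi = dct_basis(N)
phi = np.kron(np.eye(N // 2), np.ones((1, 2)))   # 160 x 320 matrix of formula (1)
mu_echelon = mutual_coherence(phi, psi)
```

Each normalized row of this Φ is (e_{2r} + e_{2r+1})/√2 and every DCT entry is bounded by √(2/N), so μ ≤ 2 for any N, close to the minimum of 1; by contrast, a DCT basis compared with itself gives the maximum value √N.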

The speech observation sequence obtained by row-echelon projection retains some of the properties of speech, such as short-term stationarity, approximate periodicity, and pitch structure, so that part of the redundancy of the speech signal is preserved even as it is compressed. A row-echelon matrix can also satisfy the RIP with high probability, so it is used as the projection matrix to obtain the observation sequence of each frame. The row-echelon observation matrix is the same as in formula (1), which ensures that the length of the real-time observation sequence equals the atom length of the dictionary and facilitates the subsequent mathematical modeling of the observation sequence.

Although compressed sensing compresses the speech signal, related research shows that it does not fully remove the redundancy of the speech signal, leaving considerable room for a second stage of compression. Compression by extracting characteristic parameters after mathematical modeling is widely used in signal processing; here the observation sequence is modeled by sparse representation, with the mathematical model given by formula (5),

$$y = \sum_{i=1}^{K} a_i d_i \qquad (5)$$

where y is the observation sequence to be modeled, d_i are distinct atoms of the dictionary, a_i is the gain (amplitude) of the corresponding atom, and K is the fixed sparsity, chosen as K = 10. Relative to a dictionary as large as 8192 atoms, only 10 atoms, a very sparse selection, are needed to represent any frame of speech. Put simply, the model approximates the observation sequence of real-time speech accurately by a linear combination of 10 atoms of the overcomplete dictionary, thereby compressing the speech observation sequence a second time. Finding the ten atoms and the corresponding sparse coefficients in the dictionary is an optimization problem that can be solved in many ways. Orthogonal matching pursuit (OMP) is a greedy algorithm that is fast while still ensuring modeling accuracy, so OMP is used for the sparse decomposition, with the following steps:

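1) Initialization: initialize the candidate set I to the empty set, I = ∅; set the residual r = y, the sparse coefficient γ = 0, and the iteration counter i = 1; the number of iterations is K;

2) Find the index k of the atom in D most strongly correlated with the residual r, and add it to the candidate set: I = (I, k);

3) Update the sparse coefficients according to:

$$\gamma_I = D_I^{+} y$$

Then update the residual according to:

$$r = y - D_I\gamma_I$$

4) Set i = i + 1; if i ≤ K, return to step 2); otherwise, take the final γ_I as the sparse coefficient γ.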

For an observation-sequence dictionary with 8192 atoms, the sparse coefficient γ is a 1 × 8192 vector with only K nonzero elements. The positions of these K nonzero elements identify, by reference to the dictionary, the atoms selected for the sparse representation. After the second compression, the only data that must be transmitted are the positions and amplitudes of the selected atoms.
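The OMP loop above can be sketched in a few lines of NumPy. This sketch selects each atom by the largest absolute inner product with the residual, the usual OMP criterion, which is assumed here because the patent's selection formula is not reproduced; γ_I is updated with the pseudo-inverse exactly as in step 3). The test dictionary is random and has 1024 atoms rather than a trained 8192-atom dictionary, purely to keep the example fast.

```python
import numpy as np


def omp(D, y, K):
    """Return gamma (length L) with at most K nonzeros such that D @ gamma approximates y.

    D: (M, L) dictionary with unit-norm columns; y: (M,) observation sequence.
    """
    support = []                             # candidate set I
    residual = y.astype(float).copy()        # r = y
    gamma_I = np.zeros(0)
    for _ in range(K):
        k = int(np.argmax(np.abs(D.T @ residual)))   # atom most correlated with r
        if k in support:                     # residual already orthogonal to D_I
            break
        support.append(k)                    # I = (I, k)
        D_I = D[:, support]
        gamma_I = np.linalg.pinv(D_I) @ y    # gamma_I = D_I^+ y
        residual = y - D_I @ gamma_I         # r = y - D_I gamma_I
    gamma = np.zeros(D.shape[1])
    gamma[support] = gamma_I                 # scatter back into the full vector
    return gamma


rng = np.random.default_rng(1)
D = rng.standard_normal((160, 1024))
D /= np.linalg.norm(D, axis=0)
true_support = [3, 100, 700]
y = D[:, true_support] @ np.array([1.0, -0.7, 2.0])
gamma = omp(D, y, 3)
```

Only `np.nonzero(gamma)` (the atom positions) and the corresponding entries of `gamma` (the amplitudes) would need to be transmitted.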

For transmission, only the positions and amplitudes of the atoms required by the sparse coefficients need to be encoded. After the observation sequence has been modeled by sparse representation, the amount of data is greatly reduced: the signal can be represented solely by the atoms selected by orthogonal matching pursuit and their coefficient magnitudes (amplitudes). The overcomplete dictionary size is set to 8192 for convenient binary coding: since 8192 = 2¹³, only 13 bits are needed to specify the position of a selected atom. The atom amplitudes strongly affect modeling accuracy, so standard 8-bit PCM coding is used to encode them accurately. Once the positions and amplitudes of all atoms selected for a frame have been encoded, they can be transmitted. With a frame length of T milliseconds, K atoms per frame, and p bits per atom position, the speech coding rate is given by formula (6),

$$\text{Bit rate} = \frac{K \times (p + 8)}{T}\ \text{kbps} \qquad (6)$$

Thus, with T = 40 ms, K = 10, and p = 13, the speech coding rate is 10 × 21 / 40 = 5.25 kbps.
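Formula (6) can be checked directly. Since T is in milliseconds, bits per millisecond equal kbit/s; the 8-bit amplitude width is exposed as a parameter here only for illustration.

```python
def bit_rate_kbps(K: int, p: int, T_ms: float, amp_bits: int = 8) -> float:
    """Formula (6): K atoms per frame, each with p position bits and amp_bits
    amplitude bits, sent once per T_ms-millisecond frame. Bits/ms == kbit/s."""
    return K * (p + amp_bits) / T_ms


rate_40ms = bit_rate_kbps(K=10, p=13, T_ms=40)   # 210 bits per 40 ms frame
rate_20ms = bit_rate_kbps(K=10, p=13, T_ms=20)   # same payload on 20 ms frames
```

Keeping K and p fixed while shortening the frame to 20 ms would double the rate to 10.5 kbps, which is one reason the longer 40 ms frame is preferred.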

At the decoder, the observation sequence is recovered from the mathematical model, so the decoder must hold an identical dictionary. Having received the positions and amplitudes of the selected sparse coefficients, the decoder identifies the selected atoms in the dictionary from the coefficient positions, multiplies each atom by its corresponding amplitude, and sums all the scaled atoms, thereby recovering the speech observation sequence modeled by sparse representation. Simulation experiments show that this modeling method is very effective for the second compression of speech observation sequences: the recovery error is within 1% for voiced signals and somewhat larger for unvoiced signals, but still better than most mathematical modeling methods for compressed sensing observation sequences. Moreover, the only characteristic parameters of the method are the positions and amplitudes of the sparse coefficient atoms, which is convenient for further processing.
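The decoder-side recovery of formula (5) is a weighted sum of dictionary atoms. In this minimal pure-Python sketch the dictionary is stored as a list of atoms (one list of M values per atom), and the name `recover_observation` is illustrative.

```python
def recover_observation(dictionary, atoms):
    """Rebuild y_hat = sum_i a_i * d_i (formula (5)).

    dictionary: list of L atoms, each a list of M floats (same dictionary as the encoder).
    atoms: decoded (index, amplitude) pairs for one frame.
    """
    m = len(dictionary[0])
    y_hat = [0.0] * m
    for index, amplitude in atoms:
        for n, value in enumerate(dictionary[index]):
            y_hat[n] += amplitude * value   # accumulate the scaled atom
    return y_hat


toy_dictionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
y_hat = recover_observation(toy_dictionary, [(0, 2.0), (3, 0.5)])   # [2.5, 0.5, 0.5]
```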

After the speech observation sequence has been recovered, the next step is to reconstruct the speech signal with a compressed sensing reconstruction algorithm. Under the compressed sensing framework, the choice of sparsity basis and of reconstruction algorithm directly determines the quality of the final reconstructed speech. Only a suitable basis ensures that the signal is represented sparsely, and hence that it is recovered accurately. When studying the sparse coefficients of a signal, the sparse representation ability of a transform basis can be measured by how quickly the transform coefficients decay. Studies have shown that a signal whose coefficients decay according to a power law can be recovered by compressed sensing, with a reconstruction error satisfying

$$E = \|x - \hat{x}\|_2 \le C_r \cdot \left(K / (\log N)^6\right)^{-r} \qquad (7)$$

where r = 1/p − 1/2 and 0 < p < 1.

It has been pointed out in the literature that the Fourier and wavelet coefficients of smooth signals, the total variation of functions of bounded variation, the Gabor coefficients of oscillatory signals, and the curvelet coefficients of images with smooth edges all have sufficient sparsity for the signal to be recovered by compressed sensing. Speech signals exhibit good sparsity in the discrete cosine basis, so the DCT basis is chosen as the sparsity basis of the speech signal. The (orthonormal DCT-II) basis is given by formula (8),

$$\psi_k(n) = c_k \sqrt{\frac{2}{N}} \cos\left(\frac{\pi (2n + 1) k}{2N}\right), \quad c_0 = \frac{1}{\sqrt{2}},\ c_k = 1\ (k \ge 1), \quad n, k = 0, 1, \dots, N - 1 \qquad (8)$$
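The sparsity argument for the DCT basis can be illustrated numerically. This NumPy sketch assumes the standard orthonormal DCT-II as the DCT basis and uses a hypothetical "voiced-like" frame (three harmonics of 200 Hz, 40 ms at 8 kHz), not real speech; it shows that a small fraction of the DCT coefficients carries most of the frame's energy.

```python
import numpy as np


def dct_basis(n):
    """Orthonormal DCT-II basis as an n x n matrix; column k is the atom psi_k."""
    samples = np.arange(n)[:, None]          # time index (rows)
    freqs = np.arange(n)[None, :]            # frequency index k (columns)
    psi = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * samples + 1) * freqs / (2 * n))
    psi[:, 0] /= np.sqrt(2.0)                # c_0 = 1/sqrt(2)
    return psi


def energy_in_largest(theta, count):
    """Fraction of total energy held by the `count` largest-magnitude coefficients."""
    energy = np.sort(theta ** 2)[::-1]
    return float(energy[:count].sum() / energy.sum())


N, fs = 320, 8000.0
psi = dct_basis(N)
t = np.arange(N) / fs
frame = sum(np.sin(2 * np.pi * 200.0 * h * t) / h for h in (1, 2, 3))  # synthetic frame
theta = psi.T @ frame                        # analysis: theta = Psi^T x
compaction = energy_in_largest(theta, 32)    # share of energy in 10% of coefficients
```

Because Ψ is orthonormal, synthesis is simply x = Ψθ, which is the relationship used when the reconstructed θ is turned back into speech.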

In compressed sensing theory, because the number of observations M is far smaller than the signal length N, one must solve the underdetermined equation y = Φx = ΦΨθ = Ξθ. At first sight, solving an underdetermined system of equations seems hopeless. However, because the signal x is sparse or compressible, this premise fundamentally changes the problem and makes it solvable, and the RIP guarantees in theory that the signal can be recovered exactly from the M observations.

To describe the signal reconstruction problem of compressed sensing more precisely, first define the p-norm of the signal x as

$$\|x\|_p = \left(\sum_{i=1}^{N} |x_i|^p\right)^{1/p} \qquad (9)$$

The 0-norm, obtained as the p → 0 limit of $\sum_i |x_i|^p$, counts the nonzero entries of x. Provided the signal is sparse or compressible, solving the underdetermined equation can then be converted into the minimum 0-norm problem (10):

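$$\min \|\Psi^{T} x\|_0 \quad \text{s.t.} \quad y = \Phi x = \Phi\Psi\theta = \Xi\theta \qquad (10)$$

where θ = Ψ^T x because the basis Ψ is orthonormal.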

Basis pursuit relies on the result that, when the restricted isometry property (RIP) holds, the simpler L1-norm minimization (formula (9) with p = 1) has the same solution as L0-norm minimization:

$$\min \|\Psi^{T} x\|_1 \quad \text{s.t.} \quad y = \Phi x = \Phi\Psi\theta = \Xi\theta \qquad (11)$$

Basis pursuit is currently a widely used reconstruction algorithm for compressed sensing. From the observation sequence, basis pursuit finds the sparsest representation of the signal in the DCT basis, i.e., it represents the original speech signal as accurately as possible with as few basis vectors as possible, thereby capturing the intrinsic characteristics of the signal. It takes the norm of the representation as the measure of sparsity, formulates sparse representation as a constrained extremum problem of minimizing the L1 norm, and then converts it into a convex optimization (linear programming) problem for solution.
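Problem (11) can be solved without a general LP solver. The NumPy sketch below swaps in the ADMM splitting for basis pursuit described by Boyd et al., rather than the linear-programming route named in the text: it alternates a projection onto {θ : Aθ = y} with soft thresholding. Here A plays the role of Ξ = ΦΨ. The test uses a random Gaussian A so that exact recovery is expected; `rho` and `n_iter` are illustrative choices, not values from the patent.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def basis_pursuit_admm(A, y, rho=1.0, n_iter=5000):
    """Approximately solve  min ||theta||_1  s.t.  A @ theta = y  with ADMM.

    A stands for Xi = Phi @ Psi; the speech frame would then be x = Psi @ theta.
    """
    n = A.shape[1]
    gram_inv = np.linalg.inv(A @ A.T)
    P = np.eye(n) - A.T @ gram_inv @ A      # projector onto the null space of A
    q = A.T @ gram_inv @ y                  # minimum-norm solution of A theta = y
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)
    for _ in range(n_iter):
        x = P @ (z - u) + q                 # project z - u onto {theta : A theta = y}
        z = soft_threshold(x + u, 1.0 / rho)
        u = u + x - z                       # scaled dual update
    return z


rng = np.random.default_rng(0)
M, N, K = 64, 128, 5
A = rng.standard_normal((M, N)) / np.sqrt(M)
theta_true = np.zeros(N)
theta_true[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
y = A @ theta_true
theta_hat = basis_pursuit_admm(A, y)
```

With the row-echelon Φ of formula (1) and an orthonormal Ψ, A Aᵀ = ΦΦᵀ = 2I, so the matrix inverse in this sketch becomes trivial in the speech setting.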

Although the optimal solution obtained by basis pursuit makes the reconstructed observation approach the original observation in Euclidean distance overall, using the L1 norm as the objective function shifts energy from low scales to high scales, which readily produces artifacts that appear as oscillations in the high-frequency region. Comparing the spectrograms of the original and reconstructed speech shows that the two agree well in the low-frequency band, whereas the spectrum of the reconstructed speech is markedly raised in the high-frequency band. For speech, raised high-frequency spectral energy means that sharp hissing ("zi") sounds appear, which degrades the perceptual quality of the reconstructed speech. A first-order low-pass post-filter can effectively alleviate the high-frequency distortion introduced by reconstruction and improve the perceived quality of the reconstructed speech. The filter transfer function is given by formula (12).

$$H(z) = \frac{1 - \mu}{1 - \mu z^{-1}} \qquad (12)$$

With the filter parameter μ = 0.9, the filter transfer function becomes formula (13).

$$H(z) = \frac{1 - 0.9}{1 - 0.9 z^{-1}} = \frac{0.1}{1 - 0.9 z^{-1}} \qquad (13)$$

After post-filtering, the high-frequency spectrum of the reconstructed speech is closer to that of the original speech, the "zi" hissing is reduced, and the reconstructed speech sounds more comfortable.
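The post-filter of formula (12) corresponds to the difference equation y[n] = (1 − μ)·x[n] + μ·y[n−1]. This is a minimal pure-Python sketch assuming zero initial state; the name `postfilter` is illustrative.

```python
def postfilter(x, mu=0.9):
    """First-order low-pass post-filter H(z) = (1 - mu) / (1 - mu z^-1).

    Implements y[n] = (1 - mu) * x[n] + mu * y[n-1] with y[-1] = 0.
    """
    out = []
    prev = 0.0
    for sample in x:
        prev = (1.0 - mu) * sample + mu * prev
        out.append(prev)
    return out


impulse_response = postfilter([1.0, 0.0, 0.0])           # about [0.1, 0.09, 0.081]
dc_response = postfilter([1.0] * 200)                    # settles near gain 1
nyquist_response = postfilter([(-1.0) ** n for n in range(400)])
```

The gain is 1 at DC and (1 − μ)/(1 + μ) ≈ 0.053 at the Nyquist frequency for μ = 0.9, so high-frequency components are attenuated by roughly 25 dB. With SciPy available, `scipy.signal.lfilter([1 - mu], [1, -mu], x)` computes the same recursion.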

Claims

1. A speech coding method based on compressed sensing and sparse representation, characterized in that the method comprises the following steps:

a) Train a dictionary D suited to speech-signal observation sequences with the K-singular value decomposition (K-SVD) algorithm;

b) Obtain the observation sequence: at the encoder, first divide the speech entering the encoder into frames 20 to 40 ms long, then use a row-echelon matrix as the projection matrix to project each frame at a compression ratio of 1:2 or 1:4, obtaining the observation sequence y of each frame;

c) Model the observation sequence y mathematically by sparse representation, i.e., use the orthogonal matching pursuit algorithm to obtain the sparse coefficients of y over the dictionary D, as follows:

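1) Initialization: initialize the candidate set I to the empty set, I = ∅; set the residual r = y, the sparse coefficient γ = 0, and the iteration counter i = 1; the number of iterations is K, the number of atoms selected according to the target bit rate, i.e., the preset sparsity;

2) Find the index k of the atom in D most strongly correlated with the residual r, according to the following formula: [formula not reproduced in the source text], where d_k is the k-th atom of D and arg min denotes the value of the variable at which the objective function attains its minimum; then add the selected index k to the candidate set: I = (I, k);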

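3) Update the sparse coefficients according to $\gamma_I = D_I^{+} y$, where D_I is the sub-dictionary formed by the atoms indexed by I, $D_I^{+}$ is its pseudo-inverse, and γ_I is the sparse coefficient vector for the atoms in I;

Then update the residual according to:

$$r = y - D_I\gamma_I$$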

4) Set i = i + 1. If i ≤ K, atom selection is not yet complete; return to step 2). Otherwise, the sparse representation loop ends, and the final γ_I is taken as the sparse coefficient γ of y over D; proceed to step d). Here K is the number of iterations, equal to the number of atoms selected according to the target bit rate;

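d) Encode the positions and amplitudes of the K atoms required by the sparse coefficient γ as follows: the number of atoms L in D is set to a power of 2, i.e., $L = 2^p$, so that p bits identify the position of each required atom, and the amplitude of each atom is encoded with standard 8-bit pulse code modulation (PCM);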

e) Recover the speech observation sequence: using the positions and amplitudes of the K atoms of γ obtained in step d), look up the required atoms in D, multiply each atom vector by its amplitude, and sum the K scaled atom vectors to obtain the recovered observation sequence of the speech signal;

f) Reconstruct the speech signal from the recovered observation sequence: choose the DCT basis as the sparsity basis of the speech signal and use the basis pursuit algorithm as the reconstruction algorithm to reconstruct the speech signal from the observation sequence recovered in step e);

g) Low-pass filter the reconstructed speech: post-process the speech reconstructed in step f) with a low-pass post-filter whose transfer function is $H(z) = \frac{1 - \mu}{1 - \mu z^{-1}}$.

2. The speech coding method based on compressed sensing and sparse representation according to claim 1, characterized in that, in step b), the frame length used by the encoder to divide the input speech into frames is 40 ms.