CN103778919B - Speech coding method based on compressed sensing and sparse representation - Google Patents


Info

Publication number
CN103778919B
Authority
CN
China
Prior art keywords
atom
voice
dictionary
observation sequence
signal
Prior art date
Legal status
Withdrawn - After Issue
Application number
CN201410026207.6A
Other languages
Chinese (zh)
Other versions
CN103778919A (en)
Inventor
Yang Zhen (杨震)
Li Shangjing (李尚靖)
Current Assignee
Nanjing Post and Telecommunication University
Original Assignee
Nanjing Post and Telecommunication University
Priority date
Filing date
Publication date
Application filed by Nanjing Post and Telecommunication University
Priority to CN201410026207.6A
Publication of CN103778919A
Application granted
Publication of CN103778919B
Withdrawn - After Issue (current legal status)
Anticipated expiration


Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a speech coding method within the compressed sensing framework. It exploits the fact that the observation sequence obtained by projecting speech with a row-echelon (staircase) matrix under the compressed sensing framework retains part of the characteristics of speech, and uses sparse representation to model the observation sequence mathematically. In the training stage, the K singular value decomposition (K-SVD) method is applied to observation sequences obtained by row-echelon projection of a large amount of speech, yielding a codebook dictionary that can be used for the sparse representation of real-time observation sequences. In the coding stage, the orthogonal matching pursuit algorithm models the real-time observation sequence with atoms from the dictionary, and only the positions and amplitudes of the few selected atoms are encoded and transmitted. The decoder needs only the same dictionary to recover the observation sequence; it then reconstructs the speech signal with the basis pursuit algorithm, and a post low-pass filter improves the perceptual quality of the reconstructed speech. The invention can efficiently encode and transmit speech signals under the compressed sensing framework, lowering the coding bit rate while maintaining good reconstructed speech quality.

Description

Speech coding method based on compressed sensing and sparse representation
Technical field
The invention belongs to the field of speech processing technology and relates to a speech coding method within the compressed sensing framework.
Background
Compressed sensing is a novel theory that has emerged in recent years. It differs fundamentally from the traditional Nyquist sampling theorem in that the sampling rate need not exceed twice the signal bandwidth: as long as a signal is sparse or compressible in some transform domain, it can be sampled at a rate far below the Nyquist rate, and the original signal can be reconstructed with high probability from a small number of observation projections. Under this framework the sampling rate depends not on the signal bandwidth but on the structure and content of the information in the signal. Compressed sensing theory comprises three main parts: sparse decomposition of the signal, design of the observation matrix, and signal reconstruction algorithms. Since it was proposed, compressed sensing has attracted wide attention from researchers in China and abroad, and applied research has extended to many fields, such as sensor networks, medical image processing, radar scanning, biosensing, and speech signal processing.
In recent years, sparse representation has become one of the central concepts in signal processing and its applications. Its core idea is that a signal of a given class can, within a sufficiently large training sample space or transform domain, be approximately expressed as a linear combination of a similar subspace of training samples or of transform-domain atoms, where an atom is a column vector of the sample-subspace or transform-domain matrix. Consequently, when the signal is represented over the whole sample space, its representation coefficients are sparse. This is the most important assumption underlying sparse representation and the basis for all further analysis. Sparse representation fully exploits the correlation within a class of signals and is of great research value for compression, denoising, modeling, and coding in signal processing. For a dictionary trained from a class of signals, the success of training directly determines the performance of the subsequent sparse representation, so researchers have proposed a series of dictionary training methods, including the method of optimal directions (MOD), the K singular value decomposition algorithm (K-SVD), and online dictionary learning.
Speech coding is the prerequisite and foundation of speech transmission and communication; a good speech coding method recovers speech of good perceptual quality at a relatively low bit rate. Over the past two decades, with advances in computing, communications, signal processing, and related technologies, speech coding has developed rapidly and been widely applied. By the traditional classification, speech coding falls into three classes: waveform coding, parametric coding, and hybrid coding. Waveform coding directly encodes the time-domain, frequency-domain, or transform-domain signal into a digital signal, aiming to keep the reconstructed waveform as close as possible to the shape of the original speech waveform; the main examples are pulse code modulation (PCM) and adaptive differential pulse code modulation (ADPCM). Parametric coding, also called source coding or vocoding, extracts characteristic parameters of the source signal in the frequency domain or another transform domain and then encodes and transmits these parameters; the decoder translates the received digital signal back into characteristic parameters and reconstructs the speech signal from them. Linear prediction coefficients are currently the most widely used parametric coding technique. Hybrid coding combines waveform coding and parametric coding, overcoming the shortcomings of each while keeping their strengths, and can produce high-quality synthesized speech at 4 to 16 kbps.
Summary of the invention
Technical problem: the object of the invention is to provide a speech coding method based on compressed sensing and sparse representation that effectively reduces the bit rate required for speech coding while ensuring good perceptual quality of the synthesized speech.
Technical solution: the speech coding method of the invention, based on compressed sensing and sparse representation, comprises the following steps:
a) Train a dictionary D suited to speech-signal observation sequences using the K singular value decomposition (K-SVD) algorithm;
b) Obtain the observation sequence: at the encoder, the input speech is first divided into frames of 20 to 40 ms; then, using a row-echelon matrix as the projection matrix, each frame is projected at a compression ratio of 1:2 or 1:4 to obtain the observation sequence y of that frame;
c) Model the observation sequence y with sparse representation, i.e., use the orthogonal matching pursuit algorithm to obtain the sparse coefficients of y over dictionary D, as follows:
1) Initialization: set the candidate set I to the empty set (I = ∅), the residual r = y, and the sparse coefficients γ = 0; set the iteration counter i = 1. The number of iterations is the number of atoms K chosen according to the target bit rate, i.e., the preset sparsity;
2) Find the index k of the atom in dictionary D most correlated with the residual, according to the following formula:
where $d_k$ is the k-th atom of dictionary D, and arg min denotes the value of the variable that minimizes the objective function;
Then add the selected atom index k to the candidate set: I = (I, k);
3) Update the sparse coefficients according to the following formula:
$$ \gamma_I = D_I^{+}\, y $$
where $D_I$ is the sub-dictionary consisting only of the atoms indexed by the candidate set I, $D_I^{+}$ is the pseudo-inverse of $D_I$, and $\gamma_I$ is the sparse coefficient vector over the atoms in I;
Then update the residual:
$$ r = y - D_I\gamma_I $$
4) Set i = i + 1. If i < K, atom selection is not yet complete; return to step 2). Otherwise, the sparse-representation loop for the observation sequence ends: take the final $\gamma_I$ as the sparse coefficients γ of y over dictionary D and proceed to step d). Here K is the number of iterations, whose value is the number of atoms chosen according to the target bit rate;
d) Encode the positions and amplitudes of the K atoms required by the sparse coefficients γ, as follows:
The number of atoms in dictionary D is defined as a power of 2, i.e., $L = 2^p$, so the position of each required atom is given by p bits; each atom amplitude is encoded with standard 8-bit pulse code modulation (PCM);
e) Recover the speech observation sequence: using the positions and amplitudes of the K atoms required by the sparse coefficients γ obtained in step d), locate these atoms in dictionary D, multiply each atom vector by its amplitude, and add the K scaled atom vectors to obtain the recovered speech observation sequence;
f) Reconstruct the speech signal from the recovered observation sequence: the discrete cosine basis is chosen as the sparse basis of speech, and the basis pursuit algorithm is used as the reconstruction algorithm to reconstruct the speech signal from the observation sequence recovered in step e);
g) Low-pass filter the reconstructed speech: using a post low-pass filter with transfer function $H(z) = \frac{1-\mu}{1-\mu z^{-1}}$, post-filter the speech signal reconstructed in step f).
In step b) of a preferred embodiment of the invention, the frame length used to divide the input speech into frames at the encoder is 40 ms.
Beneficial effects: compared with the prior art, the invention has the following advantages:
Compressed sensing performs signal acquisition and compression in a single step, effectively lowering the sampling rate and greatly reducing both the computational load of subsequent processing and the signal transmission bandwidth. Speech coding within the compressed sensing framework is still at an early stage, so the mathematical modeling and coding of compressed sensing observation sequences is of great practical significance for voice communication. Based on the compressed sensing framework, the invention divides speech into frames and projects each frame with a row-echelon matrix, uses a pre-trained dictionary and sparse representation to model the observation sequence and extract characteristic parameters, and encodes and transmits only a small number of these parameters. At decoding, the observation sequence is recovered from the dictionary and the parameters, the speech signal is reconstructed with the discrete cosine basis and the basis pursuit algorithm, and a post low-pass filter improves the perceptual quality of the reconstructed speech. The method achieves good speech recovery quality at a low bit rate: at a transmission rate of 5.25 kbps its mean opinion score (MOS) reaches 3.18, which is better than the classical code-excited linear prediction (CELP) method.
Description of drawings
Fig. 1 is a flow diagram of the encoder side of the method of the invention.
Fig. 2 is a flow diagram of the decoder side of the method of the invention.
Detailed description of embodiments
The invention is described in further detail below by way of an embodiment.
Before real-time, varying speech signals can be coded, a dictionary for the sparse representation of speech observation sequences must first be obtained by training. Because the speech of different people, and even of the same person at different times, varies, the trained dictionary must cover as wide a range of speech characteristics as possible, and the redundancy of an overcomplete dictionary meets exactly this requirement of a speech coding dictionary. First, a large amount of speech is collected into a speech corpus covering speakers who differ in age, sex, occupation, and many other characteristics, so that the corpus contains the various variations of speech signals. The speech in the corpus is then divided into frames. Given the short-term stationarity of speech, the frame length can be any value from 20 ms to 40 ms in 5 ms steps, with 40 ms being the most suitable. With a 40 ms frame and an 8 kHz sampling rate, one frame contains N = 320 samples. Within the compressed sensing framework, every frame in the corpus is projected with a row-echelon matrix to obtain the observation sequences of all the speech in the corpus. For a compression ratio M:N = 1:2, the row-echelon observation matrix is given by formula (1).
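$$ \Phi = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 & 0 & \cdots & 0 \\ \vdots & & & & & & & \ddots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & & 0 & 1 & 1 \end{bmatrix} \tag{1} $$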
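A minimal NumPy sketch of the framing and projection described above, assuming 8 kHz input and 40 ms frames. The helper names (`row_echelon_matrix`, `frame_signal`, `observe`) are illustrative. Only the 1:2 matrix is shown in formula (1); extending it to 1:4 by placing four ones per row is an assumption.

```python
import numpy as np

def row_echelon_matrix(n, ratio=2):
    """M x N staircase matrix: row m has ones in columns m*ratio .. m*ratio+ratio-1,
    so each observation sums `ratio` consecutive samples (ratio=2 gives formula (1))."""
    if n % ratio != 0:
        raise ValueError("frame length must be divisible by the compression ratio")
    m = n // ratio
    phi = np.zeros((m, n))
    for row in range(m):
        phi[row, row * ratio:(row + 1) * ratio] = 1.0
    return phi

def frame_signal(x, fs=8000, frame_ms=40):
    """Split a 1-D signal into non-overlapping frames; a trailing partial frame is dropped."""
    n = fs * frame_ms // 1000                  # 8000 Hz * 40 ms -> 320 samples
    num_frames = len(x) // n
    return np.asarray(x, dtype=float)[: num_frames * n].reshape(num_frames, n)

def observe(frames, phi):
    """Project every frame: y = Phi x, one observation sequence per row."""
    return frames @ phi.T

if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs                     # 1 s of a 200 Hz test tone
    frames = frame_signal(np.sin(2 * np.pi * 200 * t), fs)
    phi = row_echelon_matrix(frames.shape[1], ratio=2)
    y = observe(frames, phi)
    print(frames.shape, phi.shape, y.shape)    # (25, 320) (160, 320) (25, 160)
```

Because each row of Φ sums adjacent sample pairs, the observation sequence is a decimated, smoothed version of the frame, which is consistent with the text's claim that it keeps part of the speech structure.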
In this case M = 160, i.e., the observation sequence has 160 samples, only half as many as conventional sampling, which greatly reduces the amount of data. Once the training observation sequences have been prepared, the K-SVD algorithm is used to learn a dictionary from them, yielding the sparse representation dictionary needed in the subsequent steps. "The dictionary" below always refers to the specific sparse representation dictionary learned in this step. The K-SVD dictionary learning algorithm proceeds as follows:
Input: training set of observation sequences X, initial dictionary $D_0$, target sparsity K, dictionary size L, number of iterations n.
Output: dictionary D and sparse matrix Γ such that X ≈ DΓ.
1) Initialization: set D = $D_0$;
2) Initialization: i = 1;
3) For iteration i, compute the sparse matrix $\Gamma_i$ by sparse-coding every training observation sequence x over the current dictionary: $\gamma = \arg\min_{\gamma}\|x - D\gamma\|_2^2$ s.t. $\|\gamma\|_0 \le K$. Here $\Gamma_i$ is the sparse matrix of the i-th iteration, and arg min denotes the value of the variable that minimizes the objective function;
4) Set j = 1;
5) Set $D_j = 0$, where $D_j$ is the j-th atom of the dictionary, i.e., its j-th column vector;
6) I = {indices of the observation sequences in training set X whose representation uses $D_j$};
7) $E = X_I - D\Gamma_I$, where E is the error between the signals indexed by I and their sparse representations;
8) Compute $(d, g) = \arg\min_{d,g}\|E - d\,g^{T}\|_F^2$ s.t. $\|d\|_2 = 1$, where d is the atom that minimizes the objective function, g is the corresponding sparse coefficient vector, and arg min denotes the value of the variable that minimizes the objective function;
9) Update $D_j = d$;
10) Update $\Gamma_{j,I} = g^{T}$, where $\Gamma_{j,I}$ are the sparse coefficients of atom j over the indexed signals in the j-th pass;
11) Set j = j + 1; if j < L (L is the number of atoms in the dictionary), return to step 5); otherwise the loop ends;
12) Set i = i + 1; if i < n, return to step 3); otherwise the loop ends and dictionary training is complete.
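The K-SVD loop in steps 1) to 12) can be sketched in NumPy as follows. This is an illustrative reimplementation under stated assumptions, not the inventors' code: atoms are kept at unit norm, the sparse-coding stage uses a compact OMP with the common largest-|⟨r, d⟩| selection rule, and atoms that no signal uses are left unchanged. Zeroing row j of Γ over the signals that use atom j plays the role of steps 5) and 7), and the rank-1 SVD of E solves step 8).

```python
import numpy as np

def omp(D, y, k):
    """Compact OMP for the sparse-coding stage (unit-norm atoms assumed)."""
    idx, r, coef = [], y.copy(), np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ r)))      # most correlated atom
        if j in idx:
            break                                # residual already orthogonal to all atoms
        idx.append(j)
        coef, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)
        r = y - D[:, idx] @ coef                 # least-squares residual
    gamma = np.zeros(D.shape[1])
    gamma[idx] = coef
    return gamma

def ksvd(X, D0, k, n_iter=30):
    """K-SVD: X holds training observation sequences as columns; returns D, G with X ~ D @ G."""
    D = D0 / np.linalg.norm(D0, axis=0)          # step 1: D = D0 (normalized copy)
    G = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):                      # steps 2 and 12: outer iterations
        G = np.column_stack([omp(D, x, k) for x in X.T])   # step 3: sparse coding
        for j in range(D.shape[1]):              # steps 4 and 11: sweep over atoms
            users = np.nonzero(G[j])[0]          # step 6: signals that use atom j
            if users.size == 0:
                continue                         # unused atom: left as is
            G[j, users] = 0.0                    # steps 5 and 7: drop atom j's contribution
            E = X[:, users] - D @ G[:, users]    # step 7: error without atom j
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]                    # steps 8 and 9: best unit-norm atom d
            G[j, users] = s[0] * Vt[0]           # step 10: its coefficients g^T
    return D, G
```

For the configuration in the text, X would hold the 160-sample training observation sequences as columns, D0 would be 160 × 8192, and n_iter would be 30; at that scale this per-signal Python loop is slow, so a batched OMP would be a sensible substitution.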
About n = 30 training iterations is appropriate for the dictionary. Once training is complete, the formal coding stage needs only dictionary D, which is used for the sparse representation of observation sequences in the coding stage and for recovering observation sequences in the decoding stage. After repeated experiments, the dictionary size was set to $2^{13} = 8192$, so that 13 bits suffice to identify the position of a selected atom.
In the formal speech encoding and decoding stage, the speech entering the encoder is first divided into frames in the conventional way, using the same 40 ms frame length as in training. After framing, each frame is projected using compressed sensing. In the compressed sensing observation model, the sparse signal x is not measured directly; instead, x is projected onto a set of observation vectors $\Phi = [\varphi_1, \varphi_2, \ldots, \varphi_m, \ldots, \varphi_M]$ to obtain the observations $y_m = \langle x, \varphi_m \rangle$, $m = 1, \ldots, M$, which in matrix form read
$$ y = \Phi x \tag{2} $$
where x is an N × 1 vector, y is an M × 1 vector, and Φ is the M × N observation matrix. Denoting the sparse basis by Ψ, we have
$$ y = \Phi x = \Phi\Psi\theta = \Xi\theta \tag{3} $$
where Ξ = ΦΨ is an M × N matrix.
Because the number of observations M is far smaller than the signal dimension N, the inverse problem of solving formula (2) is ill-posed, and x cannot be solved for directly from the M observations in y. However, compressed sensing exploits the sparsity of the signal: x can be obtained by first solving for the sparse coefficients θ. To guarantee convergence, so that the K sparse coefficients can be recovered accurately from the M observations, Ξ must satisfy the restricted isometry property (RIP): for any strictly K-sparse vector v, the matrix Ξ must satisfy the inequality
$$ 1 - \varepsilon \le \frac{\|\Xi v\|_2}{\|v\|_2} \le 1 + \varepsilon \tag{4} $$
where ε > 0. However, determining whether a given Ξ has the RIP is a combinatorially hard problem. It has been shown in the literature that if the observation matrix is incoherent with the sparse basis, then Ξ satisfies the RIP with high probability. Incoherence means that the vectors $\{\varphi_j\}$ cannot be sparsely represented in terms of $\{\psi_i\}$: the stronger the incoherence, the more coefficients are needed to represent one set in terms of the other, and conversely, the fewer coefficients needed, the stronger the coherence. Choosing a random Gaussian matrix as the observation matrix Φ guarantees incoherence and the RIP with high probability.
Speech observation sequences obtained by row-echelon projection retain part of the properties of speech, such as short-term stationarity, quasi-periodicity, and pitch structure; that is, while achieving compression they preserve part of the redundancy of the speech signal. Row-echelon matrices can also guarantee the RIP with high probability, so a row-echelon matrix is used as the projection matrix to obtain the observation sequence of each speech frame. The row-echelon observation matrix is the same as in formula (1), which ensures that the length of the real-time observation sequence equals the atom length of the dictionary and facilitates the subsequent mathematical modeling of the observation sequence.
Although compressed sensing compresses the speech signal, related studies show that it does not fully remove the redundancy of speech, leaving considerable room for a second stage of compression. Extracting characteristic parameters after mathematical modeling is a compression approach widely used in signal processing. Here, sparse representation is used to model the observation sequence, with the mathematical model shown in formula (5):
$$ y = \sum_{i=1}^{K} a_i d_i \tag{5} $$
where y is the observation sequence to be modeled, $d_i$ are distinct atoms of the dictionary, $a_i$ is the gain (amplitude) of the corresponding atom, and K is a fixed sparsity, chosen as K = 10. Relative to a dictionary as large as 8192 atoms, only 10 atoms, i.e., a very sparse representation, are needed to represent any speech frame. Put simply, the model approximates the observation sequence of real-time speech accurately with a linear combination of 10 atoms from an overcomplete dictionary, which achieves a second compression of the speech observation sequence. Finding these ten atoms and their sparse coefficients in the dictionary is an optimization problem that can be solved in many ways. The orthogonal matching pursuit (OMP) algorithm is a greedy algorithm that is fast to compute while still guaranteeing modeling accuracy, so OMP is used for the sparse decomposition, with the following steps:
1) Initialization: set the candidate set I to the empty set (I = ∅), the residual r = y, and the sparse coefficients γ = 0; set the iteration counter i = 1. The number of iterations is K;
2) Find the index k of the atom in dictionary D most correlated with the residual, according to the following formula:
where $d_k$ is the k-th atom of dictionary D, and arg min denotes the value of the variable that minimizes the objective function;
Then add the selected atom index k to the candidate set: I = (I, k);
3) Update the sparse coefficients according to the following formula:
$$ \gamma_I = D_I^{+}\, y $$
where $D_I$ is the sub-dictionary consisting only of the atoms indexed by the candidate set I, $D_I^{+}$ is the pseudo-inverse of $D_I$, and $\gamma_I$ is the sparse coefficient vector over the atoms in I;
Then update the residual:
$$ r = y - D_I\gamma_I $$
4) Set i = i + 1. If i < K, atom selection is not yet complete; return to step 2). Otherwise, the sparse-representation loop for the observation sequence ends: take the final $\gamma_I$ as the sparse coefficients γ of y over dictionary D and proceed to step d). Here K is the number of iterations, whose value is the number of atoms chosen according to the target bit rate;
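Steps 1) to 4) map onto a short NumPy function. The step 2) selection formula is not reproduced in this text, so the sketch assumes unit-norm atoms and picks the atom maximizing |⟨r, d_k⟩|, the usual OMP rule; the coefficient update uses the pseudo-inverse of $D_I$ as in step 3). The name `omp_sparse_code` is hypothetical, and the loop is written to select exactly K atoms unless the residual has already vanished.

```python
import numpy as np

def omp_sparse_code(D, y, K):
    """Orthogonal matching pursuit: length-L gamma with at most K nonzeros, D @ gamma ~ y.

    Assumes the atoms (columns of D) have unit L2 norm.
    """
    y = np.asarray(y, dtype=float)
    I = []                                   # step 1: candidate set starts empty
    r = y.copy()                             # residual r = y
    gamma_I = np.zeros(0)
    for _ in range(K):                       # K iterations = preset sparsity
        k = int(np.argmax(np.abs(D.T @ r)))  # step 2: atom most correlated with r
        if k in I:                           # r is numerically orthogonal to every atom
            break
        I.append(k)                          # I = (I, k)
        D_I = D[:, I]
        gamma_I = np.linalg.pinv(D_I) @ y    # step 3: gamma_I = pinv(D_I) y
        r = y - D_I @ gamma_I                # residual update r = y - D_I gamma_I
    gamma = np.zeros(D.shape[1])
    gamma[I] = gamma_I                       # scatter into the full length-L vector
    return gamma

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    D = rng.standard_normal((160, 8192))     # stand-in for a trained 160 x 8192 dictionary
    D /= np.linalg.norm(D, axis=0)           # unit-norm atoms
    y = D[:, [5, 900, 4000]] @ np.array([1.0, -0.5, 0.25])
    gamma = omp_sparse_code(D, y, K=10)
    print(np.count_nonzero(gamma), np.linalg.norm(y - D @ gamma))
```

The random dictionary in the usage block only stands in for the K-SVD dictionary; with a trained dictionary and real observation sequences, the residual after K = 10 atoms would generally not be zero.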
For an observation-sequence dictionary with 8192 atoms, the sparse coefficient vector γ has size 1 × 8192, of which only K elements are nonzero. From the positions of these K nonzero elements, the atoms selected by the sparse coefficients can be looked up in the dictionary. After the second compression, the only data to be transmitted are the positions and amplitudes of the atoms selected by the sparse coefficients.
For transmission, only the positions and amplitudes of the atoms required by the sparse coefficients need to be encoded. After the observation sequence has been modeled with sparse representation, the signal is greatly reduced: it can be represented solely by the atoms selected by orthogonal matching pursuit and their coefficient magnitudes (amplitudes). The overcomplete dictionary size is set to 8192 to make binary coding convenient: since $8192 = 2^{13}$, only 13 bits are needed to identify the position of a selected atom. The atom amplitudes strongly affect modeling accuracy, so standard 8-bit PCM coding is used to encode them accurately. Once the positions and amplitudes of all selected atoms of a frame have been encoded, they can be transmitted. With a frame length of T milliseconds, K atoms per frame observation sequence, and p bits per atom position, the speech coding rate is given by formula (6):
$$ \text{Bit rate} = \frac{K\,(p + 8)}{T}\ \text{kbps} \tag{6} $$
Thus, with T = 40 ms, K = 10, and p = 13, the speech coding rate is 10 × (13 + 8) / 40 = 5.25 kbps.
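Formula (6) and the position/amplitude payload can be illustrated in plain Python. The 13-bit index field follows the text. The amplitude quantizer is an assumption: the text says only "standard 8-bit PCM" and gives neither a companding law nor a range, so this sketch uses a uniform 8-bit quantizer on a hypothetical range [-amp_max, amp_max]. `pack_frame`, `unpack_frame`, `quantize`, and `dequantize` are illustrative names.

```python
def bit_rate_kbps(K, p, T_ms, amp_bits=8):
    """Formula (6): K atoms per frame, each sent as p position bits + amp_bits amplitude bits."""
    return K * (p + amp_bits) / T_ms           # bits per millisecond equals kbit/s

def quantize(a, amp_max=1.0):
    """Hypothetical uniform 8-bit quantizer on [-amp_max, amp_max]; clips out-of-range values."""
    a = min(max(a, -amp_max), amp_max)
    return int(round((a + amp_max) / (2 * amp_max) * 255))

def dequantize(q, amp_max=1.0):
    return q / 255 * 2 * amp_max - amp_max

def pack_frame(indices, amplitudes, p=13, amp_max=1.0):
    """Concatenate K fields of (p-bit atom index | 8-bit amplitude code) into one integer."""
    word = 0
    for idx, a in zip(indices, amplitudes):
        if not 0 <= idx < (1 << p):
            raise ValueError("atom index does not fit in p bits")
        word = (word << (p + 8)) | (idx << 8) | quantize(a, amp_max)
    return word

def unpack_frame(word, K, p=13, amp_max=1.0):
    """Inverse of pack_frame: returns [(index, amplitude), ...] in the original order."""
    pairs = []
    for _ in range(K):
        q = word & 0xFF                        # lowest 8 bits: amplitude code
        idx = (word >> 8) & ((1 << p) - 1)     # next p bits: atom position
        word >>= p + 8
        pairs.append((idx, dequantize(q, amp_max)))
    return pairs[::-1]                         # fields were popped last-first

print(bit_rate_kbps(K=10, p=13, T_ms=40))      # 5.25
```

Each frame therefore carries K × (p + 8) = 210 bits, and dividing by the 40 ms frame duration gives the 5.25 kbps figure.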
At the decoder, the observation sequence is recovered from the mathematical model. The decoder must hold an identical dictionary in order to recover the observation sequence. Having received the positions and amplitudes of the selected sparse coefficients, the decoder uses each coefficient position to locate the selected atom in the dictionary and multiplies that atom by its sparse amplitude; summing all the amplitude-weighted atoms yields the speech observation sequence recovered through the sparse-representation model. Simulation experiments show that this modeling method is highly effective for the second compression of speech observation sequences: the recovery error for voiced speech is within 1%, and although it is somewhat worse for unvoiced speech, it is still better than most mathematical modeling methods for compressed sensing observation sequences. Moreover, the only characteristic parameters of this method are the position and amplitude parameters of the sparse-coefficient atoms, which are convenient for further processing.
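Decoder-side recovery is a single weighted sum over the selected columns of the shared dictionary, i.e., formula (5) evaluated with the received parameters. The sketch below also includes a relative L2 error helper; that the "within 1%" figure above uses exactly this metric is an assumption, since the text does not define it. Function names are illustrative.

```python
import numpy as np

def recover_observation(D, indices, amplitudes):
    """Step e): y_hat = sum_i a_i * d_{idx_i}, using the dictionary shared with the encoder."""
    return D[:, list(indices)] @ np.asarray(amplitudes, dtype=float)

def relative_error(y, y_hat):
    """Relative L2 recovery error ||y - y_hat|| / ||y|| (assumed metric)."""
    return float(np.linalg.norm(y - y_hat) / np.linalg.norm(y))
```

Because only column indexing and one matrix-vector product are involved, the decoder cost per frame is O(M·K), independent of the dictionary size L.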
After the speech observation sequence has been recovered, the next step reconstructs the speech signal with a compressed sensing reconstruction algorithm. Under the compressed sensing framework, the choice of sparse basis and of reconstruction algorithm directly determines the quality of the final reconstructed speech. Only a suitable basis ensures a sparse representation of the signal and hence accurate recovery. When studying the sparse coefficients of a signal, the sparse representation capability of a transform basis can be measured by the decay rate of the transform coefficients. Research has shown that signals whose coefficients decay according to a power law can be recovered by compressed sensing, with a reconstruction error satisfying
$$ E = \|x - \hat{x}\|_2 \le C_r \cdot \left(K / (\log N)^6\right)^{-r} \tag{7} $$
where r = 1/p − 1/2 and 0 < p < 1.
The literature notes that the Fourier coefficients of smooth signals, the wavelet coefficients and total variation of functions of bounded variation, the Gabor coefficients of oscillatory signals, and the curvelet coefficients of images with smooth, continuous edges all have sufficient sparsity for the signal to be recovered by compressed sensing. Given the characteristics of speech, which is well sparsified by the discrete cosine basis, the discrete cosine basis is chosen as the sparse basis of the speech signal. The DCT basis is shown in formula (8).
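Formula (8) is not reproduced in this text. Assuming the usual orthonormal DCT-II basis, the sparse basis Ψ for an N-sample frame can be built as below, with atoms as columns so that x = Ψθ and θ = Ψᵀx, matching the Ψᵀx notation used in formulas (10) and (11). The function name `dct_basis` is hypothetical.

```python
import numpy as np

def dct_basis(N):
    """N x N orthonormal DCT-II basis Psi with atoms as columns (assumed form of formula (8)).

    psi_k[n] = c_k * cos(pi * (2n + 1) * k / (2N)), c_0 = sqrt(1/N), c_k = sqrt(2/N) for k > 0.
    """
    n = np.arange(N)[:, None]                  # sample index (rows)
    k = np.arange(N)[None, :]                  # frequency index (columns)
    Psi = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    Psi[:, 0] = np.sqrt(1.0 / N)               # DC atom uses the smaller normalization
    return Psi
```

Orthonormality means the analysis transform is just Ψᵀ, so no matrix inversion is needed when moving between x and θ.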
In compressed sensing theory, because the number of observations M is far smaller than the signal length N, one must solve the underdetermined system y = Φx = ΦΨθ = Ξθ. At first sight, solving an underdetermined system seems hopeless. However, the premise that x is sparse or compressible fundamentally changes the problem and makes it solvable, and the RIP theoretically guarantees exact recovery of the signal from the M observations.
To describe the signal reconstruction problem of compressed sensing more clearly, first define the p-norm of the signal x as
$$ \|x\|_p = \left(\sum_{i=1}^{N} |x_i|^p\right)^{1/p} \tag{9} $$
Setting p = 0 gives the 0-norm, which in fact counts the nonzero entries of x. Thus, given that the signal is sparse or compressible, solving the underdetermined system is converted into the minimum 0-norm problem (10):
$$ \min \|\Psi^{T} x\|_0 \quad \text{s.t.} \quad y = \Phi x = \Phi\Psi\theta = \Xi\theta \tag{10} $$
Basis pursuit relies on the result that, when the restricted isometry property (RIP) holds, the simpler L1-norm minimization, i.e., minimizing formula (9) with p = 1, has the same solution as L0-norm minimization:
$$ \min \|\Psi^{T} x\|_1 \quad \text{s.t.} \quad y = \Phi x = \Phi\Psi\theta = \Xi\theta \tag{11} $$
Basis pursuit is now a widely used reconstruction algorithm for compressed sensing. From the observation sequence, it finds the sparsest representation of the signal over the discrete cosine basis, i.e., it represents the original speech signal as accurately as possible with as few basis vectors as possible, thereby capturing the intrinsic characteristics of the signal. It uses a norm of the representation as the measure of sparsity, casts the sparse representation problem as a constrained extremum problem of minimizing the L1 norm, and then converts this into a convex linear programming problem for solution.
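The linear-programming form of basis pursuit in formula (11) can be sketched with SciPy's `linprog` by splitting θ = u − v with u, v ≥ 0, so that ‖θ‖₁ becomes the linear objective Σu + Σv. SciPy and the HiGHS solver are assumptions; the text does not name a solver. `basis_pursuit` and `reconstruct_frame` are illustrative names, and `reconstruct_frame` shows where the DCT basis would plug in (x̂ = Ψθ̂).

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Xi, y):
    """Solve min ||theta||_1 s.t. Xi @ theta = y as an LP over theta = u - v, u, v >= 0."""
    M, N = Xi.shape
    c = np.ones(2 * N)                         # sum(u) + sum(v) = ||theta||_1 at the optimum
    A_eq = np.hstack([Xi, -Xi])                # Xi @ u - Xi @ v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:N] - res.x[N:]

def reconstruct_frame(y, Phi, Psi):
    """Recover theta over the sparse basis Psi from y = Phi @ Psi @ theta, then map back to x."""
    theta = basis_pursuit(Phi @ Psi, y)
    return Psi @ theta
```

A random Gaussian Ξ is a convenient check, since exact recovery of a few-sparse vector is expected with high probability; whether the row-echelon Φ of formula (1) combined with the DCT basis gives the same guarantee is the text's claim and is not something this sketch verifies.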
Although the optimal solution found by basis pursuit brings the reconstructed observations close to the original observations in Euclidean distance overall, basis pursuit uses the L1 norm as its objective function, which tends to move energy from low scales to high scales and therefore easily produces artifacts, with oscillation appearing in the high-frequency region. Comparing the spectrograms of the original and reconstructed speech shows that they agree well in the low-frequency band, while the high-frequency spectrum of the reconstructed speech is noticeably boosted. For speech, boosted high-frequency energy means that sharp buzzing ("zz") sounds appear, which degrades the perceptual quality of the reconstructed speech. A first-order post low-pass filter effectively alleviates the high-frequency distortion introduced by reconstruction and improves the perceived quality of the reconstructed speech. The filter transfer function is shown in formula (12).
$$ H(z) = \frac{1 - \mu}{1 - \mu z^{-1}} \tag{12} $$
With the filter parameter μ = 0.9, the filter transfer function becomes formula (13).
$$ H(z) = \frac{1 - 0.9}{1 - 0.9\,z^{-1}} = \frac{0.1}{1 - 0.9\,z^{-1}} \tag{13} $$
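The post filter of formula (13) corresponds to the difference equation y[n] = (1 − μ)x[n] + μy[n − 1], with unit gain at DC and gain (1 − μ)/(1 + μ) ≈ 0.053 at the Nyquist frequency for μ = 0.9. A minimal sketch, assuming zero initial state; `post_lowpass` is an illustrative name:

```python
import numpy as np

def post_lowpass(x, mu=0.9):
    """First-order IIR post filter of formula (12): y[n] = (1 - mu) * x[n] + mu * y[n-1]."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    prev = 0.0                                 # zero initial state
    for n, xn in enumerate(x):
        prev = (1.0 - mu) * xn + mu * prev
        y[n] = prev
    return y
```

With SciPy available, `scipy.signal.lfilter([1 - mu], [1, -mu], x)` computes the same output with the same zero initial state; the explicit loop is shown only to make the recursion visible.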
After post-filtering, the high-frequency spectrum of the reconstructed speech is closer to that of the original speech, the buzzing ("zz") sounds are reduced, and the reconstructed speech sounds more comfortable.

Claims (2)

1. A speech coding method based on compressed sensing and sparse representation, characterized in that the method comprises the following steps:
a) Train a dictionary D suited to speech-signal observation sequences using the K singular value decomposition (K-SVD) algorithm;
b) Obtain the observation sequence: at the encoder, the input speech is first divided into frames of 20 to 40 ms; then, using a row-echelon matrix as the projection matrix, each frame is projected at a compression ratio of 1:2 or 1:4 to obtain the observation sequence y of that frame;
c) Model the observation sequence y with sparse representation, i.e., use the orthogonal matching pursuit algorithm to obtain the sparse coefficients of y over dictionary D, as follows:
1) initializing: candidate collection I is initialized as empty set, i.e. I=() empty set, residual error r=y, sparse coefficient γ=0, arranges iteration initial number of times i=1, iteration ends number of times is atom number K selected according to target bit rate, namely the degree of rarefication preset;
2) Find, according to the following formula, the index k of the atom in D that is most correlated with the residual:
where $d_k$ is the k-th atom of D, and arg min returns the value of the variable at which the objective function is minimised;
then add the index k of the selected atom to the candidate set, I = (I, k);
3) Update the sparse coefficients according to
$$\gamma_I = D_I^{+} y,$$
where $D_I$ is the sub-dictionary formed only by the atoms indexed in the candidate set I, $D_I^{+}$ is the pseudo-inverse of $D_I$, and $\gamma_I$ is the sparse coefficient vector over the atoms in I;
then update the residual according to
$$r = y - D_I \gamma_I$$
4) Set i = i + 1. If i < K, atom selection is not yet complete and the method returns to step 2); otherwise the sparse-representation loop for the observation sequence ends, the final updated $\gamma_I$ is taken as the sparse coefficient vector γ of y over D, and the method proceeds to step d). Here K is the iteration termination count, whose value is the number of atoms selected according to the target bit rate;
d) Encode the positions and amplitudes of the K atoms required by the sparse coefficient γ, as follows:
The number of atoms in dictionary D is defined as a power of 2, i.e. $L = 2^p$, so the position of each required atom is given by p bits; standard 8-bit pulse code modulation is used to encode each atom amplitude;
e) Recover the speech observation sequence: using the positions and amplitudes of the K atoms of γ obtained in step d), look up the required atoms in dictionary D, multiply each atom vector by its amplitude, and add the K scaled atom vectors to obtain the recovered speech observation sequence;
f) Reconstruct the speech signal from the recovered observation sequence: the discrete cosine basis is chosen as the sparsifying basis of the speech signal, and the basis pursuit algorithm is used as the reconstruction algorithm to reconstruct the speech signal from the observation sequence recovered in step e);
g) Low-pass filter the reconstructed speech signal: a post low-pass filter with transfer function $H(z) = \frac{1-\mu}{1-\mu z^{-1}}$ is applied to post-process the speech signal reconstructed in step f).
2. The speech coding method based on compressed sensing and sparse representation according to claim 1, characterized in that in step b) the frame length used at the encoder to divide the incoming speech into frames is 40 ms.
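Step a) of claim 1 trains D with K-SVD but gives no parameters. The following is a minimal NumPy sketch of standard K-SVD (OMP sparse coding alternating with a rank-1 SVD update of each atom), assuming the training observation sequences are stacked as columns of Y. The dictionary size, sparsity K, iteration count and initialisation from random training columns are illustrative choices, not values from the patent, and random data stands in for real speech observations.

```python
import numpy as np

def omp(D, y, K):
    """Greedy OMP: choose up to K unit-norm atoms of D and least-squares fit y on them."""
    r, idx = y.astype(float).copy(), []
    gamma = np.zeros(D.shape[1])
    for _ in range(K):
        k = int(np.argmax(np.abs(D.T @ r)))
        if k in idx:                      # only when the residual is numerically zero
            break
        idx.append(k)
        coef, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)
        r = y - D[:, idx] @ coef
    if idx:
        gamma[idx] = coef
    return gamma

def ksvd(Y, n_atoms, K, n_iter=10, seed=0):
    """K-SVD on training columns Y (M x n_signals); returns dictionary D and codes G."""
    rng = np.random.default_rng(seed)
    D = Y[:, rng.choice(Y.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0) + 1e-12            # unit-norm initial atoms
    for _ in range(n_iter):
        # Sparse coding stage: K-sparse code for every training column.
        G = np.column_stack([omp(D, Y[:, j], K) for j in range(Y.shape[1])])
        # Dictionary update stage: refit one atom at a time.
        for k in range(n_atoms):
            users = np.flatnonzero(G[k])
            if users.size == 0:
                continue                              # unused atom: leave unchanged
            G[k, users] = 0.0
            E = Y[:, users] - D @ G[:, users]         # error without atom k
            U, S, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]                         # best rank-1 fit: new atom
            G[k, users] = S[0] * Vt[0]                # and its coefficients
    return D, G

rng = np.random.default_rng(1)
Y = rng.standard_normal((16, 200))   # placeholder for training observation sequences
D, G = ksvd(Y, n_atoms=32, K=3, n_iter=5)
print(D.shape, G.shape)              # (16, 32) (32, 200)
```

Each SVD update only changes coefficients already in use, so every column of G keeps at most K nonzeros, matching the fixed sparsity used at the encoder.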
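Step b) can be sketched as framing followed by a matrix projection. The construction of the row echelon matrix is not spelled out here, so `staircase_matrix` below is a hypothetical stand-in in which each row sums a block of consecutive samples and successive rows step to the right; the 8 kHz sampling rate is likewise an assumption. With 40 ms frames (claim 2) and a 1:2 ratio this gives 320 samples in and 160 observations out per frame.

```python
import numpy as np

def frame_signal(x, frame_len):
    """Split x into non-overlapping frames; a trailing partial frame is dropped."""
    n_frames = len(x) // frame_len
    return np.asarray(x, dtype=float)[: n_frames * frame_len].reshape(n_frames, frame_len)

def staircase_matrix(M, N):
    """HYPOTHETICAL row-echelon ("staircase") projection matrix of size M x N.

    Row m has ones on the N // M consecutive columns starting at m * (N // M),
    so each observation is the sum of one block of samples.
    """
    step = N // M
    Phi = np.zeros((M, N))
    for m in range(M):
        Phi[m, m * step:(m + 1) * step] = 1.0
    return Phi

fs = 8000                        # assumed sampling rate (Hz)
frame_len = fs * 40 // 1000      # 40 ms frames -> 320 samples
M = frame_len // 2               # compression ratio 1:2 -> 160 observations

frames = frame_signal(np.ones(1000), frame_len)
Phi = staircase_matrix(M, frame_len)
Y = frames @ Phi.T               # row i is the observation sequence y of frame i
print(Y.shape)                   # (3, 160)
```

For a 1:4 ratio, `M = frame_len // 4` gives 80 observations per frame with the same construction.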
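Steps c)1) to c)4) describe orthogonal matching pursuit. The sketch below assumes the atom selection in step 2) uses the usual rule of picking the atom with the largest |⟨d_k, r⟩|; for unit-norm atoms this is the same atom that minimises ‖r - ⟨r, d_k⟩d_k‖₂, because that squared norm equals ‖r‖² - ⟨r, d_k⟩². The coefficient update uses the pseudo-inverse as in step 3), and the loop performs exactly K selections. The name `omp_claim` is illustrative.

```python
import numpy as np

def omp_claim(D, y, K):
    """OMP following steps c)1)-4); returns gamma (length L), index set I and residual r.

    D: (M, L) dictionary with unit-norm columns (assumed); y: (M,) observation sequence.
    """
    I = []                                   # step 1: candidate set starts empty
    r = np.asarray(y, dtype=float).copy()    # residual r = y
    gamma = np.zeros(D.shape[1])             # sparse coefficient gamma = 0
    for _ in range(K):                       # K selections = target sparsity
        # Step 2: atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ r)))
        if k in I:                           # only when r is numerically zero
            break
        I.append(k)
        # Step 3: gamma_I = pinv(D_I) @ y, then r = y - D_I @ gamma_I.
        D_I = D[:, I]
        gamma_I = np.linalg.pinv(D_I) @ y
        r = y - D_I @ gamma_I
        gamma[:] = 0.0
        gamma[I] = gamma_I
    return gamma, I, r

rng = np.random.default_rng(0)
D = rng.standard_normal((32, 64))
D /= np.linalg.norm(D, axis=0)               # unit-norm atoms (assumed)
gamma, I, r = omp_claim(D, 2.0 * D[:, 5], K=1)
print(I, round(gamma[5], 6))                 # [5] 2.0
```

After each update the residual is orthogonal to every selected atom, which is why a selected atom is not picked again while the residual is nonzero.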
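Steps d) and e) amount to sending (position, amplitude) pairs and summing scaled atoms at the decoder. In the sketch below the 8-bit PCM amplitude code is assumed to be a uniform 8-bit quantiser over a hypothetical range [-amax, amax]; the actual amplitude range and quantiser law are not given. The bit cost is K(p + 8) bits per frame: with an illustrative K = 10 atoms, L = 1024 (p = 10) and 40 ms frames, that is 180 bits per frame, or 4.5 kbit/s.

```python
import numpy as np

def quantize8(a, amax):
    """HYPOTHETICAL uniform 8-bit quantiser on [-amax, amax] -> integer codes 0..255."""
    step = 2.0 * amax / 255
    return np.clip(np.round((np.asarray(a) + amax) / step), 0, 255).astype(np.uint8)

def dequantize8(q, amax):
    """Map 8-bit codes back to amplitudes in [-amax, amax]."""
    step = 2.0 * amax / 255
    return q.astype(float) * step - amax

def encode_frame(gamma, p, amax):
    """Step d): p-bit atom positions plus 8-bit amplitudes of the nonzero coefficients."""
    assert gamma.size == 2 ** p             # dictionary size L = 2**p
    idx = np.flatnonzero(gamma)             # each position fits in p bits
    return idx, quantize8(gamma[idx], amax)

def decode_frame(idx, q, D, amax):
    """Step e): sum of the selected atoms, each scaled by its decoded amplitude."""
    return D[:, idx] @ dequantize8(q, amax)

def bits_per_frame(K, p):
    return K * (p + 8)

rng = np.random.default_rng(0)
p = 10
D = rng.standard_normal((16, 2 ** p))
D /= np.linalg.norm(D, axis=0)
gamma = np.zeros(2 ** p)
gamma[[5, 100, 900]] = [0.5, -0.25, 1.0]
idx, q = encode_frame(gamma, p, amax=1.0)
y_hat = decode_frame(idx, q, D, amax=1.0)   # recovered observation sequence
print(idx.tolist(), bits_per_frame(len(idx), p))   # [5, 100, 900] 54
```

Since the decoder holds the same D, only the indices and 8-bit codes travel over the channel; the quantisation error per amplitude is at most half a step, 1/255 of amax here.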
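Step f) reconstructs each frame by basis pursuit with a DCT sparsifying basis: find DCT coefficients s minimising ‖s‖₁ subject to ΦΨs = y, where Ψ is the inverse DCT. A minimal sketch, assuming an orthonormal DCT-II and solving basis pursuit as a linear program with SciPy's `linprog` (HiGHS) by splitting s = u - v with u, v ≥ 0; the solver choice is an assumption, not taken from the patent. A Gaussian Φ stands in for the row echelon matrix purely for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def dct_matrix(N):
    """Orthonormal DCT-II matrix C, so that s = C @ x and x = C.T @ s."""
    n = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    C[0] /= np.sqrt(2.0)
    return C

def bp_reconstruct(y, Phi):
    """Basis pursuit: min ||s||_1 s.t. Phi @ C.T @ s = y, solved as an LP in (u, v)."""
    N = Phi.shape[1]
    C = dct_matrix(N)
    A = Phi @ C.T                            # sensing matrix acting on DCT coefficients
    c = np.ones(2 * N)                       # sum(u) + sum(v) = ||s||_1 at the optimum
    A_eq = np.hstack([A, -A])                # A @ (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    s = res.x[:N] - res.x[N:]
    return C.T @ s, s                        # time-domain frame and its DCT coefficients

rng = np.random.default_rng(0)
N, M = 64, 32
Phi = rng.standard_normal((M, N))            # Gaussian stand-in for the projection matrix
s_true = np.zeros(N)
s_true[[3, 20]] = [1.0, -0.5]                # frame that is 2-sparse in the DCT domain
y = Phi @ (dct_matrix(N).T @ s_true)
x_hat, s_hat = bp_reconstruct(y, Phi)
```

The reconstructed frame `x_hat` would then pass through the low-pass post-filter of step g).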
CN201410026207.6A 2014-01-21 2014-01-21 Based on compressed sensing and the voice coding method of rarefaction representation Withdrawn - After Issue CN103778919B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410026207.6A CN103778919B (en) 2014-01-21 2014-01-21 Based on compressed sensing and the voice coding method of rarefaction representation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410026207.6A CN103778919B (en) 2014-01-21 2014-01-21 Based on compressed sensing and the voice coding method of rarefaction representation

Publications (2)

Publication Number Publication Date
CN103778919A CN103778919A (en) 2014-05-07
CN103778919B true CN103778919B (en) 2016-08-17

Family

ID=50571087

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410026207.6A Withdrawn - After Issue CN103778919B (en) 2014-01-21 2014-01-21 Based on compressed sensing and the voice coding method of rarefaction representation

Country Status (1)

Country Link
CN (1) CN103778919B (en)

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103974076B (en) 2014-05-19 2018-01-12 华为技术有限公司 Image coding/decoding method and equipment, system
CN105336338B (en) 2014-06-24 2017-04-12 华为技术有限公司 Audio coding method and apparatus
CN104217730B (en) * 2014-08-18 2017-07-21 大连理工大学 A kind of artificial speech bandwidth expanding method and device based on K SVD
CN104506198B (en) * 2014-12-30 2017-08-01 大连理工大学 Cardiechema signals compression algorithm based on repeated feature
CN104934038A (en) * 2015-06-09 2015-09-23 天津大学 Spatial audio encoding-decoding method based on sparse expression
CN107305770B (en) * 2016-04-21 2021-02-09 华为技术有限公司 Method, device and system for sampling and reconstructing audio signal
CN107622777B (en) * 2016-07-15 2020-04-14 公安部第三研究所 High-code-rate signal acquisition method based on over-complete dictionary pair
CN106548780B (en) * 2016-10-28 2019-10-15 南京邮电大学 A kind of compressed sensing reconstructing method of voice signal
CN106653061A (en) * 2016-11-01 2017-05-10 武汉大学深圳研究院 Audio matching tracking device and tracking method thereof based on dictionary classification
CN107528595A (en) * 2017-07-17 2017-12-29 广东工业大学 K MP compressed sensing method for fast reconstruction
CN107659315B (en) * 2017-09-25 2020-11-10 天津大学 Sparse binary coding circuit for compressed sensing
CN107705795A (en) * 2017-09-27 2018-02-16 天津大学 Multichannel audio processing method based on KSVD algorithms
CN109044781A (en) * 2018-09-06 2018-12-21 深圳源广安智能科技有限公司 A kind of both arms multifunction medical instrument
CN109040116B (en) * 2018-09-06 2020-03-27 广州宏途教育网络科技有限公司 Video conference system based on cloud server
CN109299227B (en) * 2018-11-07 2023-06-02 平安医疗健康管理股份有限公司 Information query method and device based on voice recognition
CN110739000B (en) * 2019-10-14 2022-02-01 武汉大学 Audio object coding method suitable for personalized interactive system
CN111355493B (en) * 2020-04-03 2023-05-23 哈尔滨工业大学 Support set screening and reconstructing method for modulation broadband converter
CN112054803B (en) * 2020-08-31 2023-11-21 昆明理工大学 Communication signal sorting method based on compressed sensing
CN112187282A (en) * 2020-09-02 2021-01-05 北京电子工程总体研究所 Compressed sensing signal reconstruction method and system based on dictionary double learning
CN112466315A (en) * 2020-12-02 2021-03-09 公安部第三研究所 High code rate obtaining method for audio and video
CN112737595B (en) * 2020-12-28 2023-10-24 南京航空航天大学 Reversible projection compressed sensing method based on FPGA
CN112802139A (en) * 2021-02-05 2021-05-14 歌尔股份有限公司 Image processing method and device, electronic equipment and readable storage medium
CN113327632B (en) * 2021-05-13 2023-07-28 南京邮电大学 Unsupervised abnormal sound detection method and device based on dictionary learning
CN113644916B (en) * 2021-07-30 2023-12-26 南京信息工程大学滨江学院 Electric power system steady-state data compression method based on edge calculation
CN113723546B (en) * 2021-09-03 2023-12-22 江苏理工学院 Bearing fault detection method and system based on discrete hidden Markov model

Citations (1)

Publication number Priority date Publication date Assignee Title
CN102034478B (en) * 2010-11-17 2013-10-30 南京邮电大学 Voice secret communication system design method based on compressive sensing and information hiding

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9224398B2 (en) * 2010-07-01 2015-12-29 Nokia Technologies Oy Compressed sampling audio apparatus

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN102034478B (en) * 2010-11-17 2013-10-30 南京邮电大学 Voice secret communication system design method based on compressive sensing and information hiding

Non-Patent Citations (4)

Title
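Speech signal coding algorithm based on compressed sensing; Wang Maolin et al.; Journal of Guilin University of Electronic Technology; 2012-08-31; Vol. 32, No. 4; full text *
Low-bit-rate speech coding scheme based on wavelet transform and compressed sensing; Ye Lei et al.; Chinese Journal of Scientific Instrument; 2010-07-31; Vol. 31, No. 7; full text *
Speech enhancement based on data-driven dictionary and sparse representation; Sun Linhui et al.; Journal of Signal Processing; 2011-12-31; Vol. 27, No. 12; full text *
Compressed sensing of speech with a row echelon observation matrix and a dual affine-scaling interior-point reconstruction algorithm; Ye Lei et al.; Acta Electronica Sinica; 2012-03-31; Vol. 40, No. 3; full text *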

Also Published As

Publication number Publication date
CN103778919A (en) 2014-05-07

Similar Documents

Publication Publication Date Title
CN103778919B (en) Based on compressed sensing and the voice coding method of rarefaction representation
US6633839B2 (en) Method and apparatus for speech reconstruction in a distributed speech recognition system
CN101140759B (en) Band-width spreading method and system for voice or audio signal
CN103345923B (en) A kind of phrase sound method for distinguishing speek person based on rarefaction representation
CN105023580B (en) Unsupervised noise estimation based on separable depth automatic coding and sound enhancement method
CN105070293B (en) Audio bandwidth expansion coding-decoding method based on deep neural network and device
CN103117059B (en) Voice signal characteristics extracting method based on tensor decomposition
US7027979B2 (en) Method and apparatus for speech reconstruction within a distributed speech recognition system
CN103531205A (en) Asymmetrical voice conversion method based on deep neural network feature mapping
CN102750955B (en) Vocoder based on residual signal spectrum reconfiguration
CN111508470B (en) Training method and device for speech synthesis model
CN106653056A (en) Fundamental frequency extraction model based on LSTM recurrent neural network and training method thereof
CN104217730B (en) A kind of artificial speech bandwidth expanding method and device based on K SVD
CN110047501A (en) Multi-to-multi phonetics transfer method based on beta-VAE
CN113450761A (en) Parallel speech synthesis method and device based on variational self-encoder
CN103236262B (en) A kind of code-transferring method of speech coder code stream
CN106782599A (en) The phonetics transfer method of post filtering is exported based on Gaussian process
Thomas et al. Acoustic and data-driven features for robust speech activity detection
CN106875944A (en) A kind of system of Voice command home intelligent terminal
CN104240717A (en) Voice enhancement method based on combination of sparse code and ideal binary system mask
CN114495973A (en) Special person voice separation method based on double-path self-attention mechanism
Shin et al. Audio coding based on spectral recovery by convolutional neural network
CN115966218A (en) Bone conduction assisted air conduction voice processing method, device, medium and equipment
CN115035904A (en) High-quality vocoder model based on generative antagonistic neural network
Liu et al. A novel unified framework for speech enhancement and bandwidth extension based on jointly trained neural networks

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20140507

Assignee: Jiangsu Nanyou IOT Technology Park Ltd.

Assignor: NANJING University OF POSTS AND TELECOMMUNICATIONS

Contract record no.: 2016320000212

Denomination of invention: Speech coding method based on compressed sensing and sparse representation

Granted publication date: 20160817

License type: Common License

Record date: 20161118

LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model
EC01 Cancellation of recordation of patent licensing contract

Assignee: Jiangsu Nanyou IOT Technology Park Ltd.

Assignor: NANJING University OF POSTS AND TELECOMMUNICATIONS

Contract record no.: 2016320000212

Date of cancellation: 20180116

AV01 Patent right actively abandoned

Granted publication date: 20160817

Effective date of abandoning: 20230828