CN103824561B - Missing value nonlinear estimating method of speech linear predictive coding model - Google Patents
- Publication number
- CN103824561B (application CN201410054042.3A)
- Authority
- CN
- China
- Prior art keywords
- alpha
- line spectral
- spectral frequency
- frequency parameters
- normalization
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
An embodiment of the invention discloses a nonlinear missing-value estimation method for a speech linear predictive coding model. The method comprises the following steps: line spectral frequency parameter transformation, in which the line spectral frequency parameters of the speech linear predictive coding model are converted into line spectral frequency parameter differences by a linear transform; model training; calculation of the probability distributions of the lost part and the received part in transmission; and minimum mean square error optimal estimation. With this method, the linear prediction model can be estimated optimally and reliably when packet loss occurs in packet transmission, which reduces transmission loss and improves voice quality. The method therefore has considerable practical value.
Description
Technical field
The present invention relates to the handling of packet loss during voice transmission over packet networks, and in particular describes a nonlinear optimal estimation method based on transformed line spectral frequency (LSF) parameters and a Dirichlet mixture model.
Background
With the development of Internet technology, voice communication has advanced considerably, and the transmitted speech signal has evolved from narrowband to wideband. As multimedia applications continue to be developed and adopted, users demand ever higher speech quality and lower latency in voice communication. There is therefore a pressing need for efficient and reliable voice communication algorithms.
The first problem to be solved in voice communication is speech coding. After decades of development, speech coding techniques fall roughly into three classes: waveform coding, coding based on a parametric model, and hybrid coding. Waveform coding quantizes and transmits the speech waveform directly and is not based on an acoustic model. Parametric coding analyzes the speech with a linear prediction model and then transmits the linear prediction model, the side information, and the speech energy information separately. Hybrid coding combines the two.
Parametric coding is widely used in speech coding, and its core problem is how to quantize and code the linear prediction model efficiently and reliably. In research on speech linear predictive coding models, the linear predictive coding (LPC) parameters are generally converted into LSF parameters. This representation is more stable and efficient than other parametric representations because its spectral sensitivity is distributed more evenly.
When speech is transmitted over a packet network, the quality of the recovered speech depends to a large extent on the state of the network. If delayed or lost packets can be estimated from the known information, the speech signal can be recovered effectively without extra delay, which improves voice quality and the user experience. Traditionally, the joint distribution of the missing and received LSF elements is modeled with a Gaussian mixture model (GMM), and the content of the lost packet is optimally estimated from that joint distribution. Recent research shows that the linear prediction model can instead be coded by quantizing LSF parameter differences, and that this is more efficient than conventional GMM-based LSF quantization. When LSF differences are transmitted, however, a conventional GMM cannot model the data distribution well and therefore cannot achieve optimal prediction. It is thus particularly important to design a statistical model suited to LSF differences and to use it to optimally estimate the packets lost in packet transmission.
Summary of the invention
In view of the packet loss problem in existing voice transmission, the object of the invention is to provide a nonlinear optimal estimation algorithm that estimates the lost content and restores the quality of the transmitted speech as far as possible.
To achieve this object, the nonlinear optimal missing-value estimation method proposed by the invention comprises the following steps:
Line spectral frequency (LSF) parameter transformation step: the LSF parameters of the speech linear predictive coding model are converted into LSF parameter differences by a linear transformation;
Model training step: at the transmitting end, the distribution of the LSF parameter differences is modeled with a Dirichlet mixture model (DMM), and the parameters of the DMM are trained with the expectation-maximization algorithm;
Step of calculating the distributions of the lost part and the received part in transmission: under the assumption that the LSF parameter differences follow a Dirichlet distribution, the LSF parameter differences are divided into a lost part and a received part, each of which follows a corresponding Dirichlet distribution after normalization;
Minimum mean square error (MMSE) optimal estimation step: the optimal estimate of the missing values is obtained according to the MMSE criterion.
In the line spectral frequency (LSF) parameter transformation step, three properties of the LSF parameters, (1) non-negativity, (2) ordering, and (3) boundedness, are used to transform them into LSF parameter differences (ΔLSF). The differences have two properties: (1) every element lies in the open interval (0, 1); (2) the elements sum to 1. The detailed procedure is as follows:
1) The K-dimensional LSF parameter vector is written $s = [s_1, s_2, \ldots, s_K]^T$ and satisfies $0 < s_1 < s_2 < \cdots < s_K < \pi$;
2) The (K+1)-dimensional ΔLSF vector after transformation is $x = [x_1, x_2, \ldots, x_{K+1}]^T$, where
$$x_i = \begin{cases} s_1/\pi, & i = 1, \\ (s_i - s_{i-1})/\pi, & 1 < i \le K, \\ (\pi - s_K)/\pi, & i = K+1. \end{cases}$$
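The transformation can be sketched in a few lines of Python. This is a minimal illustration, assuming the difference is $x_i = (s_i - s_{i-1})/\pi$ with the conventions $s_0 = 0$ and $s_{K+1} = \pi$; the function names `lsf_to_delta_lsf` and `delta_lsf_to_lsf` are hypothetical and do not come from the patent.

```python
import numpy as np


def lsf_to_delta_lsf(s):
    """Map a K-dim LSF vector (0 < s_1 < ... < s_K < pi) to a (K+1)-dim dLSF vector."""
    s = np.asarray(s, dtype=float)
    if s.ndim != 1 or s.size == 0:
        raise ValueError("s must be a non-empty 1-D vector")
    if s[0] <= 0 or s[-1] >= np.pi or np.any(np.diff(s) <= 0):
        raise ValueError("LSF parameters must satisfy 0 < s_1 < ... < s_K < pi")
    # Pad with the assumed boundary values s_0 = 0 and s_{K+1} = pi, then difference.
    padded = np.concatenate(([0.0], s, [np.pi]))
    return np.diff(padded) / np.pi


def delta_lsf_to_lsf(x):
    """Inverse map: the cumulative sum of the first K differences recovers s."""
    x = np.asarray(x, dtype=float)
    return np.pi * np.cumsum(x[:-1])
```

Because the map is linear and invertible, a receiver that has an estimate of the full ΔLSF vector can recover the LSF parameters with the cumulative sum shown in `delta_lsf_to_lsf`.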
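In the model training step, before transmission, the line spectral frequency parameter differences of the speech signal to be sent are assumed to follow a Dirichlet (mixture) distribution. The model is trained at the transmitting end to obtain the parameter vector $\alpha_i = [\alpha_{1i}, \alpha_{2i}, \ldots, \alpha_{K+1,i}]^T$ and the weight $\pi_i$ of the i-th mixture component.
These parameters are made known to the receiving end before transmission.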
In the step of calculating the distributions of the lost part and the received part in transmission, the ΔLSF vector $x$ is assumed to follow a Dirichlet distribution. After transmission it can be divided into two parts: the lost part $x_m$ and the received part $x_r$. Because the Dirichlet vector $x$ is a neutral vector, the lost part can be estimated from the dependence between the two parts. After $x_m$ and $x_r$ are normalized, their marginal probability distributions can be calculated separately, as follows:
1) Input: the ΔLSF vector $x$ obtained in the first step is divided into the lost part and the received part, i.e. $x = [x_m^T, x_r^T]^T$. The two parts contain M and R elements respectively;
2) Normalize $x_m$ and $x_r$ separately:
A) Sum: $S_m = \sum_{j=1}^{M} x_{m,j}$ and $S_r = \sum_{j=1}^{R} x_{r,j}$, where M and R are the lengths of the vectors $x_m$ and $x_r$ respectively;
B) Normalize to obtain $\bar{x}_m = x_m / S_m$ and, likewise, $\bar{x}_r = x_r / S_r$;
3) Since the normalized vector $\bar{x}_m$ sums to 1, it follows a Dirichlet distribution, with probability density function:
Likewise, the distribution of the normalized received part $\bar{x}_r$ is:
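The split-and-normalize procedure above can be sketched as follows. This is only an illustration: the function name `split_and_normalize` and the convention of passing the lost positions as an index list are assumptions made here, not part of the patent.

```python
import numpy as np


def split_and_normalize(x, lost_idx):
    """Split a dLSF vector into lost/received parts and normalize each to sum to 1.

    x        : full (K+1)-dim dLSF vector (elements in (0, 1), summing to 1)
    lost_idx : positions of the lost elements (hypothetical interface)
    Returns (x_m_bar, x_r_bar, S_m, S_r).
    """
    x = np.asarray(x, dtype=float)
    lost = np.zeros(x.size, dtype=bool)
    lost[np.asarray(lost_idx, dtype=int)] = True
    x_m, x_r = x[lost], x[~lost]        # lost part (M elements), received part (R elements)
    S_m, S_r = x_m.sum(), x_r.sum()     # the two sums satisfy S_m + S_r = 1
    return x_m / S_m, x_r / S_r, S_m, S_r
```

At the receiver only `x_r` is actually available; since the full vector sums to 1, the sum of the lost part follows as `S_m = 1 - S_r` without knowing `x_m`.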
Minimum mean square error (MMSE) optimal estimation step: according to the MMSE criterion, the optimal estimate of the lost part $x_m$ is the mean of the normalized lost part $\bar{x}_m$ multiplied by $(1 - S_r)$, that is, the conditional mean of the lost part given the received part. The result is given by the following formula:
Wherein,
is the parameter determined by the probability density function of the received part.
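The estimator can be written out explicitly under the Dirichlet mixture model. The block below is a sketch, not the patent's own formula: the symbols $\hat{x}_m$, $w_i$, $\alpha_{m,i}$, $\alpha_{r,i}$, $A_{m,i}$ and $p_i$ are notation assumed here, and the weights $w_i$ are taken to be the posterior component probabilities given the received part, which is one natural reading of "the parameter determined by the probability density function of the received part".

```latex
\begin{aligned}
x \mid i &\sim \mathrm{Dir}(\alpha_i), \qquad
\alpha_i = [\alpha_{m,i}^T,\ \alpha_{r,i}^T]^T, \qquad
A_{m,i} = \sum_{j=1}^{M} \alpha_{mj,i}, \\[4pt]
p_i(x_r) &= \frac{\Gamma\!\big(A_{m,i} + \sum_{j=1}^{R} \alpha_{rj,i}\big)}
                 {\Gamma(A_{m,i}) \prod_{j=1}^{R} \Gamma(\alpha_{rj,i})}
            \,(1 - S_r)^{A_{m,i} - 1} \prod_{j=1}^{R} x_{r,j}^{\alpha_{rj,i} - 1}, \\[4pt]
w_i &= \frac{\pi_i \, p_i(x_r)}{\sum_{l=1}^{I} \pi_l \, p_l(x_r)}, \\[4pt]
\hat{x}_m &= \mathrm{E}[x_m \mid x_r]
           = (1 - S_r) \sum_{i=1}^{I} w_i \, \frac{\alpha_{m,i}}{A_{m,i}} .
\end{aligned}
```

Within a single Dirichlet component, $x_m / (1 - S_r)$ is independent of $x_r$ and has mean $\alpha_{m,i} / A_{m,i}$, so the received values enter only through the factor $(1 - S_r)$ and the weights $w_i$.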
Compared with the prior art, the beneficial effect of the invention is that it transmits transformed line spectral frequency parameters, models the distribution of the transmitted signal with a Dirichlet distribution, and provides a complete estimation system for applications. Experimental results demonstrate the efficiency of the invention, which is highly practical.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of the nonlinear optimal packet-loss estimation method for a speech linear prediction model according to the invention;
Fig. 2 is a flow chart of the line spectral frequency parameter transformation;
Fig. 3 is a flow chart of training the mixture-component parameters at the transmitting end;
Fig. 4 is a flow chart of calculating the distributions of the lost part and the received part in transmission;
Fig. 5 is a flow chart of the minimum mean square error optimal estimation.
Detailed description of the embodiments
Specific embodiments of the invention are described in detail below with reference to the drawings.
Fig. 1 is the flow chart of the invention, which comprises the following steps:
Step S1: convert the line spectral frequency parameters into line spectral frequency parameter differences;
Step S2: train the mixture-component parameters at the transmitting end;
Step S3: calculate the probability distributions of the normalized lost part and received part in transmission;
Step S4: minimum mean square error optimal estimation.
Each step is described in detail below:
Step S1 performs the line spectral frequency (LSF) parameter transformation: the LSF parameters of the speech linear predictive coding model are converted into LSF parameter differences by a linear transformation. Fig. 2 gives the detailed flow of the method:
1) Input:
A) LSF parameters $s = [s_1, s_2, \ldots, s_K]^T$;
2) In step 11, $i$ loops from 1 to $K+1$, and the difference obtained at each iteration is:
$$x_i = \begin{cases} s_1/\pi, & i = 1, \\ (s_i - s_{i-1})/\pi, & 1 < i \le K, \\ (\pi - s_K)/\pi, & i = K+1; \end{cases}$$
3) Output:
A) LSF parameter differences $x = [x_1, x_2, \ldots, x_{K+1}]^T$.
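As a worked illustration of step S1, consider a hypothetical LSF vector with $K = 3$, and assume each difference is $x_i = (s_i - s_{i-1})/\pi$ with the boundary conventions $s_0 = 0$ and $s_4 = \pi$. The numerical values below are chosen only for illustration and are rounded to four decimals.

```latex
\begin{aligned}
s &= [0.3,\ 1.2,\ 2.5]^T, \\
x &= \tfrac{1}{\pi}\,[\,0.3 - 0,\ \ 1.2 - 0.3,\ \ 2.5 - 1.2,\ \ \pi - 2.5\,]^T
   \approx [0.0955,\ 0.2865,\ 0.4138,\ 0.2042]^T, \\
\sum_{i=1}^{4} x_i &= \frac{0.3 + 0.9 + 1.3 + (\pi - 2.5)}{\pi} = \frac{\pi}{\pi} = 1 .
\end{aligned}
```

Every element lies in (0, 1) and the elements sum to 1, which is what allows the vector to be modeled with a Dirichlet distribution.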
Step S2 trains the model before transmission. The vector $x$ obtained in step S1 is assumed to follow a Dirichlet distribution,
$$\mathrm{Dir}(x; \alpha) = \frac{\Gamma\big(\sum_{k=1}^{K+1} \alpha_k\big)}{\prod_{k=1}^{K+1} \Gamma(\alpha_k)} \prod_{k=1}^{K+1} x_k^{\alpha_k - 1},$$
where $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_{K+1}]^T$ is the parameter vector. As shown in Fig. 3, N-dimensional target vectors are extracted. In step 31, using a Dirichlet mixture model with I components, the probability density of a target vector $x$ is obtained:
$$f(x) = \sum_{i=1}^{I} \pi_i \, \mathrm{Dir}(x; \alpha_i),$$
where $\alpha_i = [\alpha_{1i}, \alpha_{2i}, \ldots, \alpha_{K+1,i}]^T$ is the parameter vector of the i-th mixture component, which is also known at the receiving end, and $\pi_i$ is the non-negative weight of the i-th component, with $\sum_{i=1}^{I} \pi_i = 1$. In step 33, following the idea in step S3 of dividing the overall line spectral frequency parameters into a received part and a lost part, the mixture-component parameters in the conditional probability distribution can be expressed as:
Both parts of these parameters are known at the transmitting end and at the receiving end.
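Training a Dirichlet mixture with expectation-maximization can be sketched as below. This is an illustrative sketch, not the patent's training procedure: the source only states that EM is used, so the M-step here is an assumption, namely a few fixed-point updates of each component's parameters with a Newton-based inverse digamma (a commonly used scheme for Dirichlet maximum likelihood). All function names are hypothetical.

```python
import numpy as np
from scipy.special import digamma, gammaln, logsumexp, polygamma


def dirichlet_logpdf(X, alpha):
    """Log-density of every row of X (shape N x D) under Dir(alpha)."""
    norm = gammaln(alpha.sum()) - gammaln(alpha).sum()
    return norm + np.log(X) @ (alpha - 1.0)


def inv_digamma(y, n_newton=5):
    """Invert digamma elementwise by Newton's method from a standard initial guess."""
    y = np.asarray(y, dtype=float)
    x = np.where(y >= -2.22, np.exp(y) + 0.5, -1.0 / (y - digamma(1.0)))
    for _ in range(n_newton):
        x = x - (digamma(x) - y) / polygamma(1, x)
    return x


def fit_dmm(X, n_components, n_iter=50, n_fixed_point=20, seed=0):
    """Fit mixture weights (I,) and Dirichlet parameters (I, D) to the rows of X."""
    rng = np.random.default_rng(seed)
    n_samples, dim = X.shape
    log_X = np.log(X)
    resp = rng.dirichlet(np.ones(n_components), size=n_samples)  # random soft assignment
    alphas = np.ones((n_components, dim))
    history = []
    for _ in range(n_iter):
        # M-step: weights from responsibilities, alphas by fixed-point updates.
        mass = resp.sum(axis=0)
        weights = mass / n_samples
        mean_log = (resp.T @ log_X) / mass[:, None]      # weighted mean of log x per component
        for _ in range(n_fixed_point):
            total = digamma(alphas.sum(axis=1, keepdims=True))
            alphas = inv_digamma(total + mean_log)
        # E-step: posterior probability of each component for each sample.
        log_joint = np.log(weights) + np.column_stack(
            [dirichlet_logpdf(X, a) for a in alphas])
        log_norm = logsumexp(log_joint, axis=1)
        resp = np.exp(log_joint - log_norm[:, None])
        history.append(log_norm.sum())                   # log-likelihood of current parameters
    return weights, alphas, history
```

The returned `weights` and `alphas` play the roles of $\pi_i$ and $\alpha_i$; in the scheme described here they would be computed once at the transmitting end and shared with the receiving end before transmission.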
Step S3 calculates the distributions of the lost part and the received part in transmission. As shown in Fig. 4, the ΔLSF vector $x$ is divided after transmission into two parts, the lost part $x_m$ and the received part $x_r$. After $x_m$ and $x_r$ are normalized, their marginal probability distributions can be calculated separately, as follows:
1) Input: in step 41, the ΔLSF vector $x$ obtained in S1 is divided into the lost part and the received part, i.e. $x = [x_m^T, x_r^T]^T$;
2) In step 42, $x_m$ and $x_r$ are normalized separately:
A) Sum: $S_m = \sum_{j=1}^{M} x_{m,j}$ and $S_r = \sum_{j=1}^{R} x_{r,j}$, where M and R are the lengths of the vectors $x_m$ and $x_r$ respectively;
B) Normalization result: $\bar{x}_m = x_m / S_m$ and, likewise, $\bar{x}_r = x_r / S_r$;
3) In step 43, the distributions of the two parts are written out. Since the normalized vector $\bar{x}_m$ sums to 1, it follows a Dirichlet distribution, with density function:
Likewise, the distribution of the normalized received part $\bar{x}_r$ is:
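For a single Dirichlet component, the two densities follow from the neutrality and aggregation properties of the Dirichlet distribution. The block below states these standard properties as a sketch; the sub-vectors $\alpha_m$ and $\alpha_r$ (the entries of $\alpha$ belonging to the lost and received elements) and the bar notation for normalized vectors are assumed here, and the patent's own formulas may index the mixture components explicitly.

```latex
\begin{aligned}
&x = [x_m^T,\ x_r^T]^T \sim \mathrm{Dir}(\alpha), \qquad \alpha = [\alpha_m^T,\ \alpha_r^T]^T
\quad\Longrightarrow \\[4pt]
&\bar{x}_m = \frac{x_m}{S_m} \sim \mathrm{Dir}(\alpha_m), \qquad
 \bar{x}_r = \frac{x_r}{S_r} \sim \mathrm{Dir}(\alpha_r), \qquad
 S_r = 1 - S_m \sim \mathrm{Beta}\Big(\sum_{j=1}^{R} \alpha_{r,j},\ \sum_{j=1}^{M} \alpha_{m,j}\Big), \\[4pt]
&f(\bar{x}_m) = \frac{\Gamma\big(\sum_{j=1}^{M} \alpha_{m,j}\big)}{\prod_{j=1}^{M} \Gamma(\alpha_{m,j})}
               \prod_{j=1}^{M} \bar{x}_{m,j}^{\,\alpha_{m,j} - 1}, \qquad
 f(\bar{x}_r) = \frac{\Gamma\big(\sum_{j=1}^{R} \alpha_{r,j}\big)}{\prod_{j=1}^{R} \Gamma(\alpha_{r,j})}
               \prod_{j=1}^{R} \bar{x}_{r,j}^{\,\alpha_{r,j} - 1} .
\end{aligned}
```

The three quantities $\bar{x}_m$, $\bar{x}_r$ and $S_r$ are mutually independent under a single Dirichlet distribution; under a mixture, this independence holds only within each component, which is why the mixture weights must be updated from the received part.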
Step S4 optimally estimates the lost part $x_m$ according to the minimum mean square error criterion: the optimal expected value of the lost part is its conditional mean given the received part, as shown in Fig. 5. Step 51 computes the expectation of the normalized lost part, obtained as the weighted sum of the expectations of the individual components of the mixture model. Step 52 multiplies the expectation of the normalized lost part by the length (sum) of the lost part to obtain the optimal estimate of the lost part; expressed through the received part, this length is $1 - S_r$.
The result is given by the following formula:
Wherein,
is the parameter determined by the distribution of the received part.
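Steps 51 and 52 can be sketched in Python as follows. This is a hedged illustration rather than the patent's exact formula: it assumes that the weights in the weighted sum are the posterior component probabilities given the received elements, computed from the marginal density of the received part under each Dirichlet component (where the received elements together with the remainder $1 - S_r$ form a Dirichlet vector). The function name and argument layout are hypothetical.

```python
import numpy as np
from scipy.special import gammaln, logsumexp


def mmse_estimate_lost(x_r, lost_idx, weights, alphas):
    """Conditional-mean estimate of the lost dLSF elements given the received ones.

    x_r      : received elements, in increasing order of their positions
    lost_idx : positions of the lost elements in the full (K+1)-dim vector
    weights  : mixture weights pi_i, shape (I,)
    alphas   : component parameters alpha_i, shape (I, K+1)
    """
    x_r = np.asarray(x_r, dtype=float)
    weights = np.asarray(weights, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    lost = np.zeros(alphas.shape[1], dtype=bool)
    lost[np.asarray(lost_idx, dtype=int)] = True
    a_m, a_r = alphas[:, lost], alphas[:, ~lost]   # per-component lost/received parameters
    A_m = a_m.sum(axis=1)                          # aggregated parameter of the lost block
    S_r = x_r.sum()
    # Log marginal density of x_r under each component (assumed weighting rule).
    log_marg = (gammaln(A_m + a_r.sum(axis=1)) - gammaln(A_m) - gammaln(a_r).sum(axis=1)
                + (A_m - 1.0) * np.log1p(-S_r) + np.log(x_r) @ (a_r - 1.0).T)
    log_post = np.log(weights) + log_marg
    post = np.exp(log_post - logsumexp(log_post))  # posterior component weights, sum to 1
    # Step 51: weighted sum of component means of the normalized lost part.
    mean_normalized = post @ (a_m / A_m[:, None])
    # Step 52: rescale by the sum of the lost part, 1 - S_r.
    return (1.0 - S_r) * mean_normalized
```

By construction the estimate sums to $1 - S_r$, so the reconstructed ΔLSF vector (received elements plus estimates) still sums to 1 and can be mapped back to ordered line spectral frequency parameters.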
The nonlinear optimal packet-loss estimation method for a speech linear prediction model and the embodiment of each of its modules have been described above with reference to the drawings. From the description of the above embodiments, those of ordinary skill in the art will clearly understand that the invention can be implemented in software together with the necessary general-purpose hardware platform, or alternatively in hardware, the former being the preferred embodiment. On this understanding, the part of the technical solution of the invention that in essence contributes over the prior art can be embodied as a computer software product, stored in a storage medium and comprising instructions that cause one or more computer devices to perform the methods described in the embodiments of the invention.
In accordance with the idea of the invention, variations are possible in both the specific embodiments and the scope of application. In summary, this description should not be construed as limiting the invention.
The embodiments described above do not limit the scope of protection of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (5)
1. A nonlinear optimal packet-loss estimation method for a speech linear prediction model, characterized by comprising the following steps:
Line spectral frequency parameter transformation step: the line spectral frequency parameters of the speech linear predictive coding model are converted into line spectral frequency parameter differences by a linear transformation;
Model training step: at the transmitting end, the distribution of the line spectral frequency parameter differences is modeled with a Dirichlet mixture model, and the parameters of the Dirichlet mixture model are trained with the expectation-maximization algorithm;
Step of calculating the distributions of the lost part and the received part in transmission: under the assumption that the line spectral frequency parameter differences follow a Dirichlet distribution, the line spectral frequency parameter differences are divided into a lost part and a received part, each of which follows a corresponding Dirichlet distribution after normalization;
Minimum mean square error optimal estimation step: the optimal estimate of the missing values is obtained according to the minimum mean square error criterion.
2. The method of claim 1, characterized in that, in the line spectral frequency (LSF) parameter transformation step, three properties of the LSF parameters, (1) non-negativity, (2) ordering, and (3) boundedness, are used to transform them into LSF parameter differences (ΔLSF), the differences having the properties that (1) every element lies in the open interval (0, 1) and (2) the elements sum to 1; the detailed procedure is as follows:
1) The K-dimensional LSF parameter vector is written $s = [s_1, s_2, \ldots, s_K]^T$ and satisfies $0 < s_1 < s_2 < \cdots < s_K < \pi$;
2) The (K+1)-dimensional ΔLSF vector after transformation is $x = [x_1, x_2, \ldots, x_{K+1}]^T$, where
$$x_i = \begin{cases} s_1/\pi, & i = 1, \\ (s_i - s_{i-1})/\pi, & 1 < i \le K, \\ (\pi - s_K)/\pi, & i = K+1. \end{cases}$$
3. The method of claim 2, characterized in that, in the model training step, before transmission, the vector $x$ calculated in claim 2 is assumed to follow a Dirichlet distribution, the model is trained at the transmitting end, and the mixture-component parameters in the conditional probability distribution can be expressed as:
Wherein,
These parameters are also known at the receiving end.
4. The method of claim 3, characterized in that, in the step of calculating the distributions of the lost part and the received part in transmission, $x$ is assumed to follow a Dirichlet distribution and can be divided after transmission into two parts, the lost part $x_m$ and the received part $x_r$, so that the lost part can be estimated from the dependence between the two; because the Dirichlet vector $x$ is a neutral vector, the marginal probability distributions of $x_m$ and $x_r$ can be calculated separately after normalization, as follows:
1) Input: the ΔLSF vector $x$ obtained in the previous step is divided into the lost part and the received part, i.e. $x = [x_m^T, x_r^T]^T$;
2) Normalize $x_m$ and $x_r$ separately:
A) Sum: $S_m = \sum_{j=1}^{M} x_{m,j}$ and $S_r = \sum_{j=1}^{R} x_{r,j}$, where M and R are the lengths of the vectors $x_m$ and $x_r$ respectively;
B) Normalization result: $\bar{x}_m = x_m / S_m$ and, likewise, $\bar{x}_r = x_r / S_r$;
3) Since the normalized vector $\bar{x}_m$ sums to 1, it follows a Dirichlet distribution, with density function:
Likewise, the distribution of the normalized received part $\bar{x}_r$ is:
5. The method of claim 4, characterized in that, in the minimum mean square error optimal estimation step, according to the minimum mean square error criterion, the optimal estimator of the lost part $x_m$ is the result of weighting the mean of the normalized lost part $\bar{x}_m$ by the received part $x_r$, that is, the conditional mean of the lost part given the received part; the result is given by the following formula:
Wherein,
is the parameter determined by the probability density function of the received part calculated in claim 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410054042.3A CN103824561B (en) | 2014-02-18 | 2014-02-18 | Missing value nonlinear estimating method of speech linear predictive coding model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410054042.3A CN103824561B (en) | 2014-02-18 | 2014-02-18 | Missing value nonlinear estimating method of speech linear predictive coding model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103824561A (en) | 2014-05-28 |
CN103824561B (en) | 2015-03-11 |
Family
ID=50759583
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410054042.3A Active CN103824561B (en) | 2014-02-18 | 2014-02-18 | Missing value nonlinear estimating method of speech linear predictive coding model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103824561B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10325609B2 (en) * | 2015-04-13 | 2019-06-18 | Nippon Telegraph And Telephone Corporation | Coding and decoding a sound signal by adapting coefficients transformable to linear predictive coefficients and/or adapting a code book |
CN110660402B (en) | 2018-06-29 | 2022-03-29 | 华为技术有限公司 | Method and device for determining weighting coefficients in a stereo signal encoding process |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101145344A (en) * | 2006-09-15 | 2008-03-19 | 华为技术有限公司 | Spectral line frequency vector quantization method and system |
JP2010145836A (en) * | 2008-12-19 | 2010-07-01 | Nippon Telegr & Teleph Corp <Ntt> | Direction information distribution estimating device, sound source number estimating device, sound source direction measuring device, sound source separating device, methods thereof, and programs thereof |
CN102956237A (en) * | 2011-08-19 | 2013-03-06 | 杜比实验室特许公司 | Method and device for measuring content consistency and method and device for measuring similarity |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8010341B2 (en) * | 2007-09-13 | 2011-08-30 | Microsoft Corporation | Adding prototype information into probabilistic models |
Non-Patent Citations (1)
Title |
---|
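"Research on the statistical characteristics of Chinese/English AMR speech coding parameters"; Yu Wei; Telecommunication Engineering (《电讯技术》); 2002-02-28 (No. 2); pp. 80-83 * |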
Also Published As
Publication number | Publication date |
---|---|
CN103824561A (en) | 2014-05-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102436820B (en) | High frequency band signal coding and decoding methods and devices | |
CN103229234B (en) | Audio encoding device, method and program, and audio decoding deviceand method | |
CN105976830B (en) | Audio-frequency signal coding and coding/decoding method, audio-frequency signal coding and decoding apparatus | |
CN103531205A (en) | Asymmetrical voice conversion method based on deep neural network feature mapping | |
RU2011104813A (en) | DEVICE AND METHOD OF QUANTIZATION AND REVERSE QUANTIZATION OF LPC FILTER WITH VARIABLE BIT TRANSFER SPEED | |
CN101751926A (en) | Signal coding and decoding method and device, and coding and decoding system | |
CN103295582B (en) | Noise suppressing method and system thereof | |
CN101521010B (en) | Coding and decoding method for voice frequency signals and coding and decoding device | |
CN101308655B (en) | Audio coding and decoding method and layout design method of static discharge protective device and MOS component device | |
CN106104685B (en) | Audio coding method and device | |
CN104995673B (en) | Hiding frames error | |
CN106847297A (en) | The Forecasting Methodology of high-frequency band signals, coding/decoding apparatus | |
CN103824561B (en) | Missing value nonlinear estimating method of speech linear predictive coding model | |
CN102332268B (en) | Voice signal sparse representation method based on self-adaptive redundant dictionary | |
CN112086100A (en) | Quantization error entropy based urban noise identification method of multilayer random neural network | |
CN109743269A (en) | A kind of underwater sound OFDM channel reconstruction method based on data fitting | |
CN102982807B (en) | Method and system for multi-stage vector quantization of speech signal LPC coefficients | |
CN101198041A (en) | Vector quantization method and device | |
CN107452391B (en) | Audio coding method and related device | |
CN103632673B (en) | A kind of non-linear quantization of speech linear predictive model | |
US10276186B2 (en) | Parameter determination device, method, program and recording medium for determining a parameter indicating a characteristic of sound signal | |
CN101604524B (en) | Stereo coding method, stereo coding device, stereo decoding method and stereo decoding device | |
CN104036781A (en) | Voice signal bandwidth expansion device and method | |
US20230395086A1 (en) | Method and apparatus for processing of audio using a neural network | |
CN101944235A (en) | Image compression method based on fractional fourier transform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |