CN103258543A - Method for expanding artificial voice bandwidth - Google Patents

Method for expanding artificial voice bandwidth Download PDF

Info

Publication number
CN103258543A
CN103258543A CN2013101300812A CN201310130081A
Authority
CN
China
Prior art keywords
voice
high frequency
frequency
module
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013101300812A
Other languages
Chinese (zh)
Other versions
CN103258543B (en)
Inventor
陈喆
殷福亮
彭雯雯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN201310130081.2A priority Critical patent/CN103258543B/en
Publication of CN103258543A publication Critical patent/CN103258543A/en
Application granted granted Critical
Publication of CN103258543B publication Critical patent/CN103258543B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)

Abstract

The invention discloses a method for artificial speech bandwidth extension. The working process is as follows: a narrowband speech signal is passed through a curve fitting module and then through a high-frequency envelope extrapolation module, whose output enters a spectral shaping module; the narrowband speech signal is also passed through a feature extraction module, where each frame yields a set of linear prediction coefficients that are used to construct an autoregressive (AR) model and filtering module; white noise processed by this AR model produces a high-frequency random noise sequence correlated with the low band, which also enters the spectral shaping module; the spectral shaping module outputs high-frequency speech; finally, the high-frequency speech and the original narrowband speech signal are fed into a speech synthesis module to obtain wideband speech.

Description

A method for artificial speech bandwidth extension
Technical field
The present invention relates to a method for artificial speech bandwidth extension and belongs to the field of digital signal processing.
Background technology
At present, the effective frequency range of the public switched telephone network (PSTN) is only 0.3~3.4 kHz, and the effective bandwidth of GSM digital cellular telephony does not exceed 4 kHz. Although the main energy of a speech signal is concentrated in the 0.3~3.4 kHz range, the bandwidth it actually occupies is much larger. Because 4 kHz narrowband speech lacks high-frequency components, its naturalness and intelligibility deteriorate noticeably and it sounds "muffled".
Summary of the invention
To overcome the above shortcomings, the object of the present invention is to provide a method for artificial speech bandwidth extension.
A method for artificial speech bandwidth extension, whose working process is as follows:
The narrowband speech signal passes through a curve fitting module and then a high-frequency envelope extrapolation module, whose output enters a spectral shaping module. The narrowband speech signal also passes through a feature extraction module, where each frame yields a set of linear prediction coefficients; these coefficients are used to construct an autoregressive model and filtering module. White noise processed by this autoregressive model produces a high-frequency random noise sequence correlated with the low band, and this sequence enters the spectral shaping module. The spectral shaping module outputs high-frequency speech. The high-frequency speech and the narrowband speech signal are passed through a speech synthesis module to obtain wideband speech.
Principle and beneficial effects of the invention: the method keeps the algorithm complexity low while producing an artificial excitation that is more strongly correlated with the true excitation. The invention first fits a curve to the known low-frequency log-domain spectrum to obtain a curve equation, and then extrapolates the high-frequency log-domain spectral envelope. Starting from the low-frequency parameters of the narrowband speech, linear prediction coefficients are used to build an autoregressive model; a uniform white noise sequence is passed through this model to obtain a high-frequency noise sequence. This high-frequency noise sequence is white noise with a certain correlation to the narrowband speech; it is converted to the log-domain spectrum and modulated by the high-frequency log-spectral envelope to recover the high-frequency speech, and the wideband speech is synthesized in the cepstral domain. The invention is a fully blind speech bandwidth extension technique that can be applied directly at the narrowband receiving end. It requires no prior or high-frequency side information, has low algorithmic complexity, recovers a high-frequency part well correlated with the low band, and the synthesized wideband speech sounds good.
Description of drawings
Fig. 1 is the flow chart of the present invention.
Fig. 2 shows the wideband speech synthesis process of the present invention.
Fig. 3(a) is the spectrogram of the original wideband speech.
Fig. 3(b) is the spectrogram of the narrowband speech.
Fig. 3(c) is the spectrogram of the speech after bandwidth extension.
Fig. 4(a) is the distribution of comparison results between the output of the present algorithm and the output of the adaptive-rate speech codec at a bit rate of 12.2 kbps.
Fig. 4(b) is the distribution of comparison results between the output of the present algorithm and the output of the wideband adaptive variable-rate speech codec at a bit rate of 8.85 kbps.
Fig. 5 shows the spectral distortion measure of the narrowband speech and of the wideband speech synthesized by the present invention.
Fig. 6 shows the subjective test grading standard.
Embodiment
The present invention will be further described below in conjunction with the accompanying drawings.
Fig. 1 is the flow chart of the present invention. As shown in Fig. 1:
The narrowband speech signal passes through the curve fitting module and then the high-frequency envelope extrapolation module, whose output enters the spectral shaping module. The narrowband speech signal also passes through the feature extraction module, where each frame yields a set of linear prediction coefficients used to construct the autoregressive model and filtering module. White noise processed by this AR model produces a high-frequency random noise sequence correlated with the low band, and this sequence enters the spectral shaping module. The spectral shaping module outputs high-frequency speech. The high-frequency speech and the narrowband speech signal are passed through the speech synthesis module to obtain wideband speech.
Curve fitting module
This module uses curve fitting to obtain the equation of the low-frequency log-spectral envelope curve of the narrowband speech; the high-frequency log spectrum envelope is then extrapolated from this curve equation. The formant peaks of the low-frequency part are chosen as the input to the curve fitting. First, narrowband speech sampled at 8 kHz is input, the pitch period is estimated, and the time-domain signal is transformed into the log-frequency domain. Peaks in the log-frequency domain are searched using the estimated pitch period, the formant variation curve is described by curve fitting, and the high-frequency log-spectral envelope curve is then extrapolated.
First, the narrowband speech is divided into frames: the frame length is 128 samples, with an overlap of 64 samples between frames. The pitch period T of each speech frame is computed from the correlation of the signal. If the input narrowband speech is x(n), its autocorrelation function R(k) is
R(k) = Σ_{n=0}^{N−1} x(n)·x(n−k)
where N is the frame length, N = 128. The position k' of the maximum of R(k) is searched over the lag range k = 20~143, and k' is the estimate T of the pitch period. The narrowband speech x(n) is Fourier transformed and converted to the log-frequency domain, where the first formant is located; denote it p_0. Because the pitch period and the spacing between formants are approximately equal, the other low-frequency formants can be found from the determined first formant p_0 and the pitch period T: when searching for them, only points whose distance from the previous formant is close to T need to be examined to obtain their accurate positions. The formant amplitudes are denoted lo_env(ω), i.e., the low-frequency log-spectral envelope at the corresponding frequency points ω. lo_env(ω) and ω serve as the input to the curve fitting.
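As a concrete illustration of the frame-based autocorrelation pitch search described above, the following Python sketch (my own naming and test signal, not taken from the patent text) computes R(k) over the lag range k = 20~143 and returns the lag with the largest value:

```python
# Illustrative sketch, assuming a straightforward autocorrelation maximum search.
import numpy as np

def estimate_pitch_lag(frame, k_min=20, k_max=143):
    """Return the lag k' in [k_min, k_max] that maximizes the autocorrelation R(k)."""
    N = len(frame)  # frame length, e.g. N = 128
    lags = np.arange(k_min, min(k_max, N - 1) + 1)
    # R(k) = sum_n x(n) x(n - k); samples outside the frame are treated as zero
    R = np.array([np.dot(frame[k:], frame[:N - k]) for k in lags])
    return lags[np.argmax(R)]

# Example: a synthetic 128-sample frame at 8 kHz with a 100 Hz fundamental
fs = 8000
n = np.arange(128)
frame = np.sin(2 * np.pi * 100 * n / fs)
print(estimate_pitch_lag(frame))  # close to 80 samples (8000 / 100)
```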
A mapping is established between the low-frequency log-spectral envelope lo_env(ω) and the low-band frequency ω:
lo_env(ω) = a·e^(b·ω) + c·e^(d·ω), ω = 0~2π×4000
The parameters a, b, c and d of the fitting function are obtained, which determines the mapping formula.
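The two-term exponential fit can be sketched as follows; the formant-peak samples and initial parameter values here are hypothetical, and scipy's curve_fit is used as one possible least-squares fitter (the text does not prescribe a particular fitting routine):

```python
# Illustrative sketch: fit lo_env(w) = a*exp(b*w) + c*exp(d*w) to low-band formant peaks.
import numpy as np
from scipy.optimize import curve_fit

def envelope_model(w, a, b, c, d):
    return a * np.exp(b * w) + c * np.exp(d * w)

# Hypothetical formant-peak samples: angular frequency and log-spectral amplitude
omega_pts = 2 * np.pi * np.array([500.0, 1200.0, 2100.0, 3000.0, 3700.0])
env_pts = np.array([6.0, 5.1, 4.0, 3.2, 2.8])

# Bounds keep a, c non-negative and b, d non-positive so the exponentials decay
p0 = [3.0, -1e-5, 3.0, -1e-4]
bounds = ([0.0, -1e-3, 0.0, -1e-3], [np.inf, 0.0, np.inf, 0.0])
params, _ = curve_fit(envelope_model, omega_pts, env_pts, p0=p0, bounds=bounds)
a, b, c, d = params
print(a, b, c, d)
```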
High-frequency envelope extrapolation module
Using the determined mapping formula, the high-frequency points are substituted into the formula to extrapolate the unknown high-frequency log-spectral envelope data hi_env(ω):
hi_env(ω) = a·e^(b·ω) + c·e^(d·ω), ω = 2π×4000~2π×8000.
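Continuing the sketch, extrapolation simply means evaluating the fitted model at high-band frequency points; the parameter values below are placeholders standing in for fitted results:

```python
# Illustrative sketch: evaluate the fitted envelope model over the 4-8 kHz band.
import numpy as np

a, b, c, d = 4.0, -3e-5, 2.5, -1e-4                      # hypothetical fitted parameters
omega_hi = 2 * np.pi * np.linspace(4000.0, 8000.0, 65)   # high-band angular frequencies
hi_env = a * np.exp(b * omega_hi) + c * np.exp(d * omega_hi)
print(hi_env[:3])
```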
Feature extraction module
Linear prediction analysis is performed on the narrowband speech; each frame yields a set of linear prediction coefficients, from which an autoregressive model is constructed. First, the autoregressive model is built from the narrowband speech: linear prediction analysis is carried out on each speech frame x(n) of length N (N = 128), i.e., the autocorrelation function of each windowed speech frame is computed and converted to linear prediction coefficients with the Levinson-Durbin algorithm. The concrete steps are as follows.
Here the Hamming window window(n) = 0.5 − 0.5·cos(2πn/N), n = 0, 1, ..., N−1, is used to window the input speech signal x(n); the windowed speech x'(n) is
x'(n)=x(n)·window(n),
The autocorrelation function is computed:
R(k) = Σ_{n=k}^{N−1} x'(n)·x'(n−k), k = 0, 1, ..., N−1, where N is a positive integer.
The L-th order linear prediction coefficients a_i, i = 1, 2, ..., L (L a positive integer) are obtained by solving the following system of equations:
Σ_{i=1}^{L} a_i·R(|i−k|) = −R(k), k = 1, ..., L.
The Levinson-Durbin algorithm is used to solve this system of equations, giving the linear prediction coefficients a_i, i = 1, 2, ..., L.
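A minimal sketch of this step, assuming the standard Levinson-Durbin recursion and the window formula given above (all names and the test frame are my own):

```python
# Illustrative sketch: windowed autocorrelation followed by Levinson-Durbin LPC analysis.
import numpy as np

def lpc_coefficients(frame, order=10):
    """Return order-L prediction coefficients a_1..a_L via the Levinson-Durbin recursion."""
    N = len(frame)
    windowed = frame * (0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N))  # window from the text
    R = np.array([np.dot(windowed[k:], windowed[:N - k]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = R[0]
    for i in range(1, order + 1):
        acc = R[i] + np.dot(a[1:i], R[i - 1:0:-1])
        k = -acc / err
        a[1:i + 1] += k * a[i - 1::-1][:i]   # a_j <- a_j + k * a_{i-j}, and a_i <- k
        err *= (1.0 - k * k)
    # Sign convention such that x(n) is predicted as sum_i a_i * x(n - i)
    return -a[1:]

frame = np.random.default_rng(0).standard_normal(128)
print(lpc_coefficients(frame, order=10))
```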
Autoregressive model and filtering module construction
A synthesis filter is constructed from the low-frequency speech linear prediction coefficients a_i, i = 1, ..., L, namely
H(z) = G / (1 − Σ_{i=1}^{L} a_i·z^(−i)),
where L is the order of the autoregressive model, L is an integer between 8 and 20, and G is a value between 0.1 and 1. In the embodiment of the invention, L = 10 and G = 1 are the preferred settings.
White noise is processed by this synthesis filter to produce a random sequence correlated with the low-frequency speech. The white noise sequence is generated as
w(n) = [w(n−1)·31821 + 13849],
where w(0) = 0.
The white noise sequence w(n) is passed through the above synthesis filter, which outputs the high-frequency noise sequence y(n), namely
y(n) = w(n) + Σ_{i=1}^{L} a_i·y(n−i),
where a_i are the synthesis filter coefficients. To limit the energy of the high-frequency part, the high-frequency noise sequence y(n) is normalized:
y(n) = y(n) / Σ_{n=0}^{N−1} y(n)·y(n),
where N is the frame length; the present invention recommends N = 128.
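A sketch of this excitation-generation step is given below, assuming scipy.signal.lfilter as the all-pole filter implementation; the modulus of the noise generator is not stated above and is assumed here, and the example AR coefficients are hypothetical:

```python
# Illustrative sketch: drive the all-pole filter H(z) = G / (1 - sum_i a_i z^-i)
# with a pseudo-random noise sequence and normalize the frame energy.
import numpy as np
from scipy.signal import lfilter

def high_band_noise(lpc, n_samples=128, gain=1.0):
    """Filter noise through the synthesis filter built from lpc and normalize the result."""
    # Generator in the spirit of w(n) = [w(n-1)*31821 + 13849], w(0) = 0;
    # the modulus (16-bit wraparound) and the scaling are assumptions of this sketch.
    w = np.zeros(n_samples)
    state = 0
    for n in range(1, n_samples):
        state = (state * 31821 + 13849) % 65536
        w[n] = state / 32768.0 - 1.0          # scale to roughly [-1, 1)
    # All-pole filter: denominator coefficients are [1, -a_1, ..., -a_L]
    y = lfilter([gain], np.concatenate(([1.0], -np.asarray(lpc))), w)
    return y / np.sum(y * y)                   # frame-energy normalization as in the text

lpc = np.array([0.5, -0.2, 0.1])               # hypothetical low-order AR coefficients
print(high_band_noise(lpc)[:5])
```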
Spectral shaping module
The high-frequency log-frequency-domain envelope hi_env(ω) estimated above is used to modulate the high-frequency noise sequence [7]. First, the high-frequency noise sequence y(n) is Fourier transformed and converted to the log domain, giving the frequency-domain log values C_y(ω) of the high-frequency noise sequence. The high-frequency log-spectral envelope is then used to modulate the spectrum of the high-frequency noise sequence, giving the log spectrum C_wide(ω) of the high-frequency speech:
C_wide(ω) = C_y(ω)·hi_env(ω),
If the frequency-domain values and the time-domain samples of the high-frequency speech are denoted S_wide(ω) and s_wide(n) respectively, then
S_wide(ω) = exp(C_wide(ω)), (1)
s_wide(n) = IFFT(S_wide(ω)), (2)
where exp(·) is the exponential operation and IFFT(·) is the inverse Fourier transform. Through the inverse transformations of equations (1) and (2), the high-frequency speech is obtained.
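The shaping step might look roughly as follows in Python (my own variable names); the text leaves the phase of the inverse transform implicit, so reusing the noise phase is an assumption of this sketch:

```python
# Illustrative sketch: modulate the log spectrum of the high-band noise frame with the
# extrapolated envelope hi_env, then return to the time domain (equations (1) and (2)).
import numpy as np

def shape_high_band(y, hi_env, eps=1e-12):
    """Apply the high-band log-spectral envelope to a noise frame y."""
    Y = np.fft.fft(y)
    c_y = np.log(np.abs(Y) + eps)      # log-magnitude spectrum of the noise
    c_wide = c_y * hi_env               # modulation by the extrapolated envelope
    S_wide = np.exp(c_wide)             # equation (1): back to linear magnitude
    # Phase choice is not specified above; the noise phase is kept here as an assumption
    s_wide = np.fft.ifft(S_wide * np.exp(1j * np.angle(Y)))   # equation (2)
    return np.real(s_wide)

y = np.random.default_rng(1).standard_normal(128)
hi_env = np.linspace(1.2, 0.8, 128)     # hypothetical envelope sampled on the FFT bins
print(shape_high_band(y, hi_env)[:4])
```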
Speech synthesis module
The present invention uses the properties of the cepstrum to combine the high-frequency part and the low-frequency part of the speech [8], and thereby obtains the synthesized wideband speech. The synthesis process is shown in Fig. 2.
The narrowband signal with a sampling frequency of 8 kHz is upsampled to 16 kHz by interpolation. Its cepstrum is obtained through the cepstrum computation process, and the cepstrum of the high-frequency speech is obtained in the same way. The cepstra of the narrowband speech and of the high-frequency speech are each transformed to the frequency domain, and the frequency-domain amplitude of the narrowband speech is processed as follows:
C_wide(ω) = C_narrow(ω) + C_high(ω)
where C_narrow(ω) and C_high(ω) are the cepstral frequency-domain values of the narrowband speech and of the high-frequency speech, respectively, and C_wide(ω) is the frequency-domain value of the synthesized wideband cepstrum. An inverse Fourier transform then yields the cepstrum of the wideband speech, and finally the inverse cepstrum process yields the synthesized wideband speech, as shown in Fig. 2.
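A simplified sketch of the combination step, working directly with log-magnitude spectra (which is what adding the two cepstra and transforming them to the frequency domain amounts to); the narrowband-phase reuse and the test signals are my own assumptions:

```python
# Illustrative sketch: add the log-magnitude spectra of the upsampled narrowband frame
# and the high-band frame, then invert to obtain a wideband frame.
import numpy as np

def synthesize_wideband(narrow_16k, high_band, eps=1e-12):
    """Combine narrowband and high-band frames by adding their log-magnitude spectra."""
    C_narrow = np.log(np.abs(np.fft.fft(narrow_16k)) + eps)
    C_high = np.log(np.abs(np.fft.fft(high_band)) + eps)
    C_wide = C_narrow + C_high                      # combination formula from the text
    magnitude = np.exp(C_wide)
    phase = np.angle(np.fft.fft(narrow_16k))        # reuse narrowband phase (an assumption)
    return np.real(np.fft.ifft(magnitude * np.exp(1j * phase)))

rng = np.random.default_rng(2)
narrow_16k = rng.standard_normal(256)               # stand-in for an upsampled 16 kHz frame
high_band = rng.standard_normal(256)                # stand-in for a shaped high-band frame
print(synthesize_wideband(narrow_16k, high_band)[:4])
```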
The present invention is a fully blind speech bandwidth extension technique that can be applied directly at the narrowband receiving end. It requires no prior or high-frequency side information, has low algorithmic complexity, recovers a high-frequency part well correlated with the low band, and the synthesized wideband speech sounds good.
To verify the effectiveness of the present invention, objective and subjective tests were carried out.
Objective test results
The spectral distortion measure and the spectrogram are effective means of objectively evaluating speech quality. Without loss of generality, the objective test computes the spectral distortion measure and plots spectrograms.
The spectral distortion measure is defined as
D_HC² = (1/K)·Σ_{k=1}^{K} ∫_{0.25ω_s}^{0.5ω_s} [ 20·log10( A_k(ω) / A'_k(ω) ) + G_C ]² dω,
G_C = (1/(0.25ω_s))·∫_{0.25ω_s}^{0.5ω_s} 20·log10( A'_k(ω) / A_k(ω) ) dω,
where ω_s is 2π, G_C is the gain compensation factor, which effectively removes the overall gain difference between the two envelopes, K is the total number of speech frames, and A_k(ω) and A'_k(ω) are the spectral envelopes of the k-th frame of the original reference speech and of the speech under test, respectively, computed as
A_k(ω) = | Σ_{n=0}^{N−1} x(n)·e^(−jωn) |,
A'_k(ω) = | Σ_{n=0}^{N−1} x'(n)·e^(−jωn) |,
The present invention recommends N = 128; x(n) and x'(n) denote the original reference speech and the speech under test, respectively. Here the original reference speech is the original wideband speech, and the speech under test is either the original narrowband speech or the synthesized wideband speech.
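For illustration, the measure can be approximated with a discrete sum over FFT bins, as in the following sketch (the frame length, discretization and normalization choices are my own):

```python
# Illustrative sketch: gain-compensated log-spectral distortion over the upper half-band
# (0.25*ws to 0.5*ws), averaged over frames and discretized onto FFT bins.
import numpy as np

def spectral_distortion(reference, test, n_fft=128, eps=1e-12):
    """Mean gain-compensated log-spectral distortion between two equal-length signals."""
    frames = len(reference) // n_fft
    d_sq = 0.0
    for k in range(frames):
        ref = reference[k * n_fft:(k + 1) * n_fft]
        tst = test[k * n_fft:(k + 1) * n_fft]
        A_ref = np.abs(np.fft.fft(ref)) + eps
        A_tst = np.abs(np.fft.fft(tst)) + eps
        band = slice(n_fft // 4, n_fft // 2)             # bins covering 0.25*ws .. 0.5*ws
        diff = 20.0 * np.log10(A_ref[band] / A_tst[band])
        g_c = -np.mean(diff)                             # gain compensation term G_C
        d_sq += np.mean((diff + g_c) ** 2)
    return np.sqrt(d_sq / frames)

rng = np.random.default_rng(3)
x = rng.standard_normal(1024)
print(spectral_distortion(x, x + 0.1 * rng.standard_normal(1024)))
```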
The spectral distortion measure is computed in the above manner for the original narrowband speech and for the wideband speech synthesized with this algorithm. The test results are shown in Fig. 5. As can be seen from Fig. 5, the spectral distortion of the wideband speech synthesized by this algorithm is clearly lower than that of the narrowband speech, indicating that the algorithm can estimate the high-frequency speech and synthesize wideband speech well.
A spectrogram represents the spectral energy of a segment of speech as a grayscale image: brighter regions indicate higher energy, darker regions lower spectral energy. Spectrograms show the frequency content of speech intuitively. Therefore, to compare the spectra more intuitively, spectrograms of the original wideband speech, the narrowband speech of a male test utterance, and the wideband speech synthesized by this blind bandwidth extension algorithm are given in Fig. 3(a), (b) and (c), respectively. Fig. 3(a) is the spectrogram of the original speech signal; it is bright across the whole 0~8 kHz range. Fig. 3(b) is the spectrogram of the narrowband speech signal; it is very dark in the 4~8 kHz range, showing that the energy of the high-frequency part is very small, which is why narrowband speech does not sound natural enough. Fig. 3(c) is the spectrogram of the speech output by the proposed blind bandwidth extension algorithm; the spectrogram becomes noticeably brighter in the 4~8 kHz range, showing that the high-frequency components of the speech are clearly increased.
Subjective test results
The subjective test uses an internationally common grading standard, namely the comparison mean opinion score. Fig. 6 gives the subjective grading standard; scores range from -3 to +3.
The test speech chosen in the present invention is as follows: (1) narrowband telephone speech output by the adaptive-rate speech codec at a bit rate of 12.2 kbps; (2) wideband telephone speech output by the wideband adaptive variable-rate speech codec at a bit rate of 8.85 kbps; (3) wideband telephone speech obtained by processing the narrowband telephone speech output by the adaptive-rate speech codec at 12.2 kbps with the new blind bandwidth extension algorithm proposed by the present invention.
The first group of test speech consists of the wideband telephone speech produced by the proposed blind bandwidth extension algorithm from the narrowband telephone speech, together with the narrowband telephone speech output by the adaptive-rate speech codec at 12.2 kbps. The second group consists of the wideband telephone speech produced by the proposed blind bandwidth extension algorithm from the narrowband telephone speech, together with the wideband telephone speech output by the wideband adaptive variable-rate speech codec at 8.85 kbps. Every speech segment is level-adjusted to -26 dB.
In the subjective test, 20 listeners (10 male, 10 female) were invited to take the test in the same environment. The test subjects were between 20 and 40 years old and had not taken part in any speech-related subjective test in the previous six months. Before the test began, the effect of bandwidth extension was demonstrated to the listeners, and they were told to evaluate the speech in two main respects: the speech quality and the perception of the extended high-frequency components. Once the test subjects had understood the instructions, they first listened to a preliminary trial and gave their opinions. During the test, each group of test speech was presented to the subjects in random order, and they were allowed to listen repeatedly without restriction. Finally, each test subject gave an opinion according to the subjective grading standard. Fig. 4(a) and 4(b) show the distributions of the comparison results for the two groups of test speech.
In the distribution plots, the abscissa is the subjective grading score and the ordinate is the proportion of listeners giving that score. According to the grading standard, a positive score means that the proposed algorithm is better than the narrowband telephone speech output by the adaptive-rate speech codec at 12.2 kbps or the wideband telephone speech output by the wideband adaptive variable-rate speech codec at 8.85 kbps. A difference analysis with a 95% confidence interval is applied to the bandwidth extension test results. Fig. 4(a) is the distribution of comparison results between the output of the present invention and the narrowband telephone speech output by the adaptive-rate speech codec at 12.2 kbps; Fig. 4(b) is the comparison between the output of this algorithm and the wideband telephone speech output by the wideband adaptive variable-rate speech codec at 8.85 kbps. As can be seen from Fig. 4(a) and 4(b), the result of this algorithm is slightly better than the wideband speech output by the wideband adaptive variable-rate speech codec at 8.85 kbps, and is a large improvement over the narrowband speech output by the adaptive-rate speech codec at 12.2 kbps, with clearly better listening quality.
The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent substitution or modification made, within the technical scope disclosed by the present invention, by a person skilled in the art according to the technical solution and inventive concept of the present invention shall be covered by the protection scope of the present invention.

Claims (7)

1. A method for artificial speech bandwidth extension, characterized in that:
the narrowband speech signal passes through a curve fitting module and then a high-frequency envelope extrapolation module, whose output enters a spectral shaping module; the narrowband speech signal also passes through a feature extraction module, where each frame yields a set of linear prediction coefficients used to construct an autoregressive model and filtering module; white noise processed by this autoregressive model produces a high-frequency random noise sequence correlated with the low band, and this sequence enters the spectral shaping module; the spectral shaping module outputs high-frequency speech; the high-frequency speech and the narrowband speech signal are passed through a speech synthesis module to obtain wideband speech.
2. The method for artificial speech bandwidth extension according to claim 1, characterized in that: the curve fitting module uses curve fitting to obtain the equation of the low-frequency log-spectral envelope curve of the narrowband speech, from which the high-frequency log-spectral envelope is extrapolated; the formant peaks of the low-frequency part are chosen as the input to the fitting; first, narrowband speech sampled at 8 kHz is input, the pitch period is estimated, and the time-domain signal is transformed into the log-frequency domain; peaks in the log-frequency domain are searched using the estimated pitch period, the formant variation curve is described by curve fitting, and the high-frequency log-spectral envelope curve is then extrapolated;
the narrowband speech is divided into frames: the frame length is 128 samples with an overlap of 64 samples between frames; the pitch period T of each speech frame is computed from the correlation of the signal; the input narrowband speech is x(n), and its autocorrelation function R(k) is
R(k) = Σ_{n=0}^{N−1} x(n)·x(n−k)
where N is the frame length, N = 128; the position k' of the maximum of R(k) is searched over the lag range k = 20~143, and k' is the estimate T of the pitch period; the narrowband speech is Fourier transformed and converted to the log-frequency domain, where the first formant is located and denoted p_0; because the pitch period and the spacing between formants are approximately equal, the other low-frequency formants can be found from the determined first formant p_0 and the pitch period T; when searching for them, only points whose distance from the previous formant is close to T need to be examined to obtain their accurate positions; the formant amplitudes are denoted lo_env(ω), i.e., the low-frequency log-spectral envelope at the corresponding frequency points ω; lo_env(ω) and ω serve as the input to the curve fitting, and a mapping is established between the low-frequency log-spectral envelope lo_env(ω) and the low-band frequency ω:
lo_env(ω) = a·e^(b·ω) + c·e^(d·ω), ω = 0~2π×4000,
and the parameters a, b, c and d of the fitting function are obtained, which determines the mapping formula.
3. The method for artificial speech bandwidth extension according to claim 1, characterized in that: the high-frequency envelope extrapolation module substitutes the high-frequency points into the determined mapping formula to extrapolate the unknown high-frequency log-spectral envelope data hi_env(ω):
hi_env(ω) = a·e^(b·ω) + c·e^(d·ω), ω = 2π×4000~2π×8000.
4. The method for artificial speech bandwidth extension according to claim 1, characterized in that: the feature extraction module performs linear prediction analysis on the narrowband speech, each frame yielding a set of linear prediction coefficients from which an autoregressive model is constructed; first the autoregressive model is built from the narrowband speech: linear prediction analysis is carried out on each speech frame x(n) of length N, N = 128, i.e., the autocorrelation function of each windowed speech frame is computed and converted to linear prediction coefficients with the Levinson-Durbin algorithm; the concrete steps are as follows:
the Hamming window window(n) = 0.5 − 0.5·cos(2πn/N), n = 0, 1, ..., N−1, N a positive integer, is used to window the input speech signal x(n); the windowed speech x'(n) is
x'(n) = x(n)·window(n),
the autocorrelation function is computed,
R(k) = Σ_{n=k}^{N−1} x'(n)·x'(n−k),
k = 0, 1, ..., N−1, where N is a positive integer,
and the Levinson-Durbin algorithm is used to obtain the L-th order autoregressive model coefficients a_i, i = 1, 2, ..., L, L a positive integer, by solving the following system of equations:
Σ_{i=1}^{L} a_i·R(|i−k|) = −R(k),
k = 1, 2, ..., L.
5. The method for artificial speech bandwidth extension according to claim 1, characterized in that: the autoregressive model and filtering module are constructed as follows:
a synthesis filter model is constructed from the low-frequency speech autoregressive model coefficients a_i, i = 1, ..., L, L a positive integer, namely
H(z) = G / (1 − Σ_{i=1}^{L} a_i·z^(−i)),
where G is the gain, L is the order of the autoregressive model, L is an integer between 8 and 20, and G is a value between 0.1 and 1;
white noise is processed by this synthesis filter to produce a random sequence correlated with the low-frequency speech; the white noise sequence is generated as
w(n) = [w(n−1)·31821 + 13849],
where w(0) = 0;
the white noise sequence w(n) is passed through the above synthesis filter, which outputs the high-frequency noise sequence y(n), namely
y(n) = w(n) + Σ_{i=1}^{L} a_i·y(n−i),
where a_i are the synthesis filter coefficients; to limit the energy of the high-frequency part, the high-frequency noise sequence y(n) is normalized:
y(n) = y(n) / Σ_{n=0}^{N−1} y(n)·y(n),
where N is the frame length, N = 128.
6. The method for artificial speech bandwidth extension according to claim 1, characterized in that: the spectral shaping module modulates the high-frequency noise sequence with the high-frequency log-frequency-domain envelope hi_env(ω) estimated above;
first, the high-frequency noise sequence y(n) is Fourier transformed and converted to the log domain, giving the frequency-domain log values C_y(ω) of the high-frequency noise sequence; the high-frequency log-spectral envelope is used to modulate the spectrum of the high-frequency noise sequence, giving the log spectrum C_wide(ω) of the high-frequency speech:
C_wide(ω) = C_y(ω)·hi_env(ω),
if the frequency-domain values and the time-domain samples of the high-frequency speech are denoted S_wide(ω) and s_wide(n) respectively, then
S_wide(ω) = exp(C_wide(ω)), (1)
s_wide(n) = IFFT(S_wide(ω)), (2)
where exp(·) is the exponential operation and IFFT(·) is the inverse Fourier transform; through the inverse transformations of equations (1) and (2), the high-frequency speech is obtained.
7. The method for artificial speech bandwidth extension according to claim 1, characterized in that: in the speech synthesis module, the narrowband signal with a sampling frequency of 8 kHz is upsampled to 16 kHz by interpolation; its cepstrum is obtained through the cepstrum computation process, and the cepstrum of the high-frequency speech is obtained in the same way; the cepstra of the narrowband speech and of the high-frequency speech are each transformed to the frequency domain, and the frequency-domain amplitude of the narrowband speech is processed as follows:
C_wide(ω) = C_narrow(ω) + C_high(ω),
where C_narrow(ω) and C_high(ω) are the cepstral frequency-domain values of the narrowband speech and of the high-frequency speech, respectively, and C_wide(ω) is the frequency-domain value of the synthesized wideband cepstrum; the cepstrum of the wideband speech is then obtained by an inverse Fourier transform, and finally the synthesized wideband speech is obtained through the inverse cepstrum process.
CN201310130081.2A 2013-04-12 2013-04-12 Method for expanding artificial voice bandwidth Expired - Fee Related CN103258543B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310130081.2A CN103258543B (en) 2013-04-12 2013-04-12 Method for expanding artificial voice bandwidth

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310130081.2A CN103258543B (en) 2013-04-12 2013-04-12 Method for expanding artificial voice bandwidth

Publications (2)

Publication Number Publication Date
CN103258543A true CN103258543A (en) 2013-08-21
CN103258543B CN103258543B (en) 2015-06-03

Family

ID=48962413

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310130081.2A Expired - Fee Related CN103258543B (en) 2013-04-12 2013-04-12 Method for expanding artificial voice bandwidth

Country Status (1)

Country Link
CN (1) CN103258543B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104217730A (en) * 2014-08-18 2014-12-17 大连理工大学 Artificial speech bandwidth expansion method and device based on K-SVD
CN106856623A (en) * 2017-02-20 2017-06-16 鲁睿 Baseband voice signals communicate noise suppressing method and system
CN106992003A (en) * 2017-03-24 2017-07-28 深圳北斗卫星信息科技有限公司 Voice signal auto gain control method
CN107977849A (en) * 2016-10-25 2018-05-01 深圳市百米生活股份有限公司 A kind of method and system based on audio stream real-time intelligent implantation information
CN108198571A (en) * 2017-12-21 2018-06-22 中国科学院声学研究所 A kind of bandwidth expanding method judged based on adaptive bandwidth and system
CN110155064A (en) * 2019-04-22 2019-08-23 江苏大学 Special vehicle traveling lane identification based on voice signal with from vehicle lane change decision system and method
CN110839108A (en) * 2019-11-06 2020-02-25 维沃移动通信有限公司 Noise reduction method and electronic equipment
CN117995193A (en) * 2024-04-02 2024-05-07 山东天意装配式建筑装备研究院有限公司 Intelligent robot voice interaction method based on natural language processing

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1988565A (en) * 2005-12-23 2007-06-27 Qnx软件操作系统(威美科)有限公司 Bandwidth extension of narrowband speech
CN101556795A (en) * 2008-04-09 2009-10-14 展讯通信(上海)有限公司 Method and device for computing voice fundamental frequency
CN102522092A (en) * 2011-12-16 2012-06-27 大连理工大学 Device and method for expanding speech bandwidth based on G.711.1
CN102543086A (en) * 2011-12-16 2012-07-04 大连理工大学 Device and method for expanding speech bandwidth based on audio watermarking

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1988565A (en) * 2005-12-23 2007-06-27 Qnx软件操作系统(威美科)有限公司 Bandwidth extension of narrowband speech
CN101556795A (en) * 2008-04-09 2009-10-14 展讯通信(上海)有限公司 Method and device for computing voice fundamental frequency
CN102522092A (en) * 2011-12-16 2012-06-27 大连理工大学 Device and method for expanding speech bandwidth based on G.711.1
CN102543086A (en) * 2011-12-16 2012-07-04 大连理工大学 Device and method for expanding speech bandwidth based on audio watermarking

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104217730A (en) * 2014-08-18 2014-12-17 大连理工大学 Artificial speech bandwidth expansion method and device based on K-SVD
CN104217730B (en) * 2014-08-18 2017-07-21 大连理工大学 A kind of artificial speech bandwidth expanding method and device based on K SVD
CN107977849A (en) * 2016-10-25 2018-05-01 深圳市百米生活股份有限公司 A kind of method and system based on audio stream real-time intelligent implantation information
CN106856623A (en) * 2017-02-20 2017-06-16 鲁睿 Baseband voice signals communicate noise suppressing method and system
CN106856623B (en) * 2017-02-20 2020-02-11 鲁睿 Baseband voice signal communication noise suppression method and system
CN106992003A (en) * 2017-03-24 2017-07-28 深圳北斗卫星信息科技有限公司 Voice signal auto gain control method
CN108198571A (en) * 2017-12-21 2018-06-22 中国科学院声学研究所 A kind of bandwidth expanding method judged based on adaptive bandwidth and system
CN108198571B (en) * 2017-12-21 2021-07-30 中国科学院声学研究所 Bandwidth extension method and system based on self-adaptive bandwidth judgment
CN110155064A (en) * 2019-04-22 2019-08-23 江苏大学 Special vehicle traveling lane identification based on voice signal with from vehicle lane change decision system and method
CN110155064B (en) * 2019-04-22 2020-12-18 江苏大学 Special vehicle driving lane identification and self-vehicle lane change decision-making system and method based on sound signals
CN110839108A (en) * 2019-11-06 2020-02-25 维沃移动通信有限公司 Noise reduction method and electronic equipment
CN117995193A (en) * 2024-04-02 2024-05-07 山东天意装配式建筑装备研究院有限公司 Intelligent robot voice interaction method based on natural language processing

Also Published As

Publication number Publication date
CN103258543B (en) 2015-06-03

Similar Documents

Publication Publication Date Title
CN103258543B (en) Method for expanding artificial voice bandwidth
CN102054480B (en) Method for separating monaural overlapping speeches based on fractional Fourier transform (FrFT)
CN111128213B (en) Noise suppression method and system for processing in different frequency bands
CN108447495B (en) Deep learning voice enhancement method based on comprehensive feature set
CN101527141B (en) Method of converting whispered voice into normal voice based on radial group neutral network
CN103117066B (en) Low signal to noise ratio voice endpoint detection method based on time-frequency instaneous energy spectrum
KR100304666B1 (en) Speech enhancement method
EP4191583A1 (en) Transient speech or audio signal encoding method and device, decoding method and device, processing system and computer-readable storage medium
CN110246510B (en) End-to-end voice enhancement method based on RefineNet
CN102881289B (en) Hearing perception characteristic-based objective voice quality evaluation method
US20110099004A1 (en) Determining an upperband signal from a narrowband signal
JPS63259696A (en) Voice pre-processing method and apparatus
CN103531205A (en) Asymmetrical voice conversion method based on deep neural network feature mapping
CN102664003A (en) Residual excitation signal synthesis and voice conversion method based on harmonic plus noise model (HNM)
CN103474074B (en) Pitch estimation method and apparatus
CN106997765B (en) Quantitative characterization method for human voice timbre
CN107293306B (en) A kind of appraisal procedure of the Objective speech quality based on output
CN107221334B (en) Audio bandwidth extension method and extension device
CN104658547A (en) Method for expanding artificial voice bandwidth
CN105845149A (en) Predominant pitch acquisition method in acoustical signal and system thereof
CN103093757B (en) Conversion method for conversion from narrow-band code stream to wide-band code stream
CN102930863B (en) Voice conversion and reconstruction method based on simplified self-adaptive interpolation weighting spectrum model
CN106782599A (en) The phonetics transfer method of post filtering is exported based on Gaussian process
CN103559893B (en) One is target gammachirp cepstrum coefficient aural signature extracting method under water
CN102543089B (en) Conversion device for converting narrowband code streams into broadband code streams

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150603

Termination date: 20180412