CN103258543A - Method for expanding artificial voice bandwidth - Google Patents

Method for expanding artificial voice bandwidth Download PDF

Info

Publication number
CN103258543A
CN103258543A CN2013101300812A CN201310130081A
Authority
CN
China
Prior art keywords
voice
high frequency
frequency
module
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013101300812A
Other languages
Chinese (zh)
Other versions
CN103258543B (en)
Inventor
陈喆
殷福亮
彭雯雯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN201310130081.2A priority Critical patent/CN103258543B/en
Publication of CN103258543A publication Critical patent/CN103258543A/en
Application granted granted Critical
Publication of CN103258543B publication Critical patent/CN103258543B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)

Abstract

The invention discloses a method for artificial speech bandwidth extension. The working process is as follows: a narrowband speech signal is passed through a curve fitting module and then through a high-frequency envelope extrapolation module, whose output enters a spectral shaping module; the narrowband speech signal is also passed through a feature extraction module, where each frame yields a set of linear prediction coefficients that are used to construct an autoregressive (AR) model and filtering module; white noise processed by this AR model produces a high-frequency random noise sequence correlated with the low band, which also enters the spectral shaping module; the spectral shaping module outputs high-frequency speech; finally, the high-frequency speech and the original narrowband speech signal are fed into a speech synthesis module to obtain wideband speech.

Description

A method for artificial speech bandwidth extension
Technical field
The present invention relates to a method for artificial speech bandwidth extension and belongs to the field of digital signal processing.
Background technology
At present, the effective frequency range of the public switched telephone network (PSTN) is only 0.3~3.4 kHz, and the effective bandwidth of GSM digital cellular telephony does not exceed 4 kHz. Although the main energy of a speech signal is concentrated in the 0.3~3.4 kHz range, the bandwidth it actually occupies is much larger. Because 4 kHz narrowband speech lacks high-frequency components, its naturalness and intelligibility deteriorate noticeably and it sounds "muffled".
Summary of the invention
To overcome the above shortcomings, the object of the present invention is to provide a method for artificial speech bandwidth extension.
A method for artificial speech bandwidth extension, whose working process is as follows:
The narrowband speech signal passes through a curve fitting module and then a high-frequency envelope extrapolation module, whose output enters a spectral shaping module. The narrowband speech signal also passes through a feature extraction module, where each frame yields a set of linear prediction coefficients; these coefficients are used to construct an autoregressive model and filtering module. White noise processed by this autoregressive model produces a high-frequency random noise sequence correlated with the low band, and this sequence enters the spectral shaping module. The spectral shaping module outputs high-frequency speech. The high-frequency speech and the narrowband speech signal are passed through a speech synthesis module to obtain wideband speech.
Principle and beneficial effects of the invention: the method keeps the algorithm complexity low while producing an artificial excitation that is more strongly correlated with the true excitation. The invention first fits a curve to the known low-frequency log-domain spectrum to obtain a curve equation, and then extrapolates the high-frequency log-domain spectral envelope. Starting from the low-frequency parameters of the narrowband speech, linear prediction coefficients are used to build an autoregressive model; a uniform white noise sequence is passed through this model to obtain a high-frequency noise sequence. This high-frequency noise sequence is white noise with a certain correlation to the narrowband speech; it is converted to the log-domain spectrum and modulated by the high-frequency log-spectral envelope to recover the high-frequency speech, and the wideband speech is synthesized in the cepstral domain. The invention is a fully blind speech bandwidth extension technique that can be applied directly at the narrowband receiving end. It requires no prior or high-frequency side information, has low algorithmic complexity, recovers a high-frequency part well correlated with the low band, and the synthesized wideband speech sounds good.
Description of drawings
Fig. 1 is the flow chart of the present invention.
Fig. 2 shows the wideband speech synthesis process of the present invention.
Fig. 3(a) is the spectrogram of the original wideband speech.
Fig. 3(b) is the spectrogram of the narrowband speech.
Fig. 3(c) is the spectrogram of the speech after bandwidth extension.
Fig. 4(a) is the distribution of comparison results between the output of the present algorithm and the output of the adaptive-rate speech codec at a bit rate of 12.2 kbps.
Fig. 4(b) is the distribution of comparison results between the output of the present algorithm and the output of the wideband adaptive variable-rate speech codec at a bit rate of 8.85 kbps.
Fig. 5 shows the spectral distortion measure of the narrowband speech and of the wideband speech synthesized by the present invention.
Fig. 6 shows the subjective test grading standard.
Embodiment
The present invention will be further described below in conjunction with the accompanying drawings.
Fig. 1 is the flow chart of the present invention. As shown in Fig. 1:
The narrowband speech signal passes through the curve fitting module and then the high-frequency envelope extrapolation module, whose output enters the spectral shaping module. The narrowband speech signal also passes through the feature extraction module, where each frame yields a set of linear prediction coefficients used to construct the autoregressive model and filtering module. White noise processed by this AR model produces a high-frequency random noise sequence correlated with the low band, and this sequence enters the spectral shaping module. The spectral shaping module outputs high-frequency speech. The high-frequency speech and the narrowband speech signal are passed through the speech synthesis module to obtain wideband speech.
Curve fitting module
This module uses curve fitting to obtain the equation of the low-frequency log-spectral envelope curve of the narrowband speech; the high-frequency log spectrum envelope is then extrapolated from this curve equation. The formant peaks of the low-frequency part are chosen as the input to the curve fitting. First, narrowband speech sampled at 8 kHz is input, the pitch period is estimated, and the time-domain signal is transformed into the log-frequency domain. Peaks in the log-frequency domain are searched using the estimated pitch period, the formant variation curve is described by curve fitting, and the high-frequency log-spectral envelope curve is then extrapolated.
First, the narrowband speech is divided into frames: the frame length is 128 samples, with an overlap of 64 samples between frames. The pitch period T of each speech frame is computed from the correlation of the signal. If the input narrowband speech is x(n), its autocorrelation function R(k) is
R(k) = Σ_{n=0}^{N−1} x(n)·x(n−k)
where N is the frame length, N = 128. The position k' of the maximum of R(k) is searched over the lag range k = 20~143, and k' is the estimate T of the pitch period. The narrowband speech x(n) is Fourier transformed and converted to the log-frequency domain, where the first formant is located; denote it p_0. Because the pitch period and the spacing between formants are approximately equal, the other low-frequency formants can be found from the determined first formant p_0 and the pitch period T: when searching for them, only points whose distance from the previous formant is close to T need to be examined to obtain their accurate positions. The formant amplitudes are denoted lo_env(ω), i.e., the low-frequency log-spectral envelope at the corresponding frequency points ω. lo_env(ω) and ω serve as the input to the curve fitting.
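As a concrete illustration of the frame-based autocorrelation pitch search described above, the following Python sketch (my own naming and test signal, not taken from the patent text) computes R(k) over the lag range k = 20~143 and returns the lag with the largest value:

```python
# Illustrative sketch, assuming a straightforward autocorrelation maximum search.
import numpy as np

def estimate_pitch_lag(frame, k_min=20, k_max=143):
    """Return the lag k' in [k_min, k_max] that maximizes the autocorrelation R(k)."""
    N = len(frame)  # frame length, e.g. N = 128
    lags = np.arange(k_min, min(k_max, N - 1) + 1)
    # R(k) = sum_n x(n) x(n - k); samples outside the frame are treated as zero
    R = np.array([np.dot(frame[k:], frame[:N - k]) for k in lags])
    return lags[np.argmax(R)]

# Example: a synthetic 128-sample frame at 8 kHz with a 100 Hz fundamental
fs = 8000
n = np.arange(128)
frame = np.sin(2 * np.pi * 100 * n / fs)
print(estimate_pitch_lag(frame))  # close to 80 samples (8000 / 100)
```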
A mapping is established between the low-frequency log-spectral envelope lo_env(ω) and the low-band frequency ω:
lo_env(ω) = a·e^(b·ω) + c·e^(d·ω), ω = 0~2π×4000
The parameters a, b, c and d of the fitting function are obtained, which determines the mapping formula.
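The two-term exponential fit can be sketched as follows; the formant-peak samples and initial parameter values here are hypothetical, and scipy's curve_fit is used as one possible least-squares fitter (the text does not prescribe a particular fitting routine):

```python
# Illustrative sketch: fit lo_env(w) = a*exp(b*w) + c*exp(d*w) to low-band formant peaks.
import numpy as np
from scipy.optimize import curve_fit

def envelope_model(w, a, b, c, d):
    return a * np.exp(b * w) + c * np.exp(d * w)

# Hypothetical formant-peak samples: angular frequency and log-spectral amplitude
omega_pts = 2 * np.pi * np.array([500.0, 1200.0, 2100.0, 3000.0, 3700.0])
env_pts = np.array([6.0, 5.1, 4.0, 3.2, 2.8])

# Bounds keep a, c non-negative and b, d non-positive so the exponentials decay
p0 = [3.0, -1e-5, 3.0, -1e-4]
bounds = ([0.0, -1e-3, 0.0, -1e-3], [np.inf, 0.0, np.inf, 0.0])
params, _ = curve_fit(envelope_model, omega_pts, env_pts, p0=p0, bounds=bounds)
a, b, c, d = params
print(a, b, c, d)
```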
High-frequency envelope extrapolation module
Using the determined mapping formula, the high-frequency points are substituted into the formula to extrapolate the unknown high-frequency log-spectral envelope data hi_env(ω):
hi_env(ω) = a·e^(b·ω) + c·e^(d·ω), ω = 2π×4000~2π×8000.
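Continuing the sketch, extrapolation simply means evaluating the fitted model at high-band frequency points; the parameter values below are placeholders standing in for fitted results:

```python
# Illustrative sketch: evaluate the fitted envelope model over the 4-8 kHz band.
import numpy as np

a, b, c, d = 4.0, -3e-5, 2.5, -1e-4                      # hypothetical fitted parameters
omega_hi = 2 * np.pi * np.linspace(4000.0, 8000.0, 65)   # high-band angular frequencies
hi_env = a * np.exp(b * omega_hi) + c * np.exp(d * omega_hi)
print(hi_env[:3])
```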
Feature extraction module
Linear prediction analysis is performed on the narrowband speech; each frame yields a set of linear prediction coefficients, from which an autoregressive model is constructed. First, the autoregressive model is built from the narrowband speech: linear prediction analysis is carried out on each speech frame x(n) of length N (N = 128), i.e., the autocorrelation function of each windowed speech frame is computed and converted to linear prediction coefficients with the Levinson-Durbin algorithm. The concrete steps are as follows.
Here the Hamming window window(n) = 0.5 − 0.5·cos(2πn/N), n = 0, 1, ..., N−1, is used to window the input speech signal x(n); the windowed speech x'(n) is
x'(n)=x(n)·window(n),
The autocorrelation function is computed:
R(k) = Σ_{n=k}^{N−1} x'(n)·x'(n−k), k = 0, 1, ..., N−1, where N is a positive integer.
The L-th order linear prediction coefficients a_i, i = 1, 2, ..., L (L a positive integer) are obtained by solving the following system of equations:
Σ_{i=1}^{L} a_i·R(|i−k|) = −R(k), k = 1, ..., L.
The Levinson-Durbin algorithm is used to solve this system of equations, giving the linear prediction coefficients a_i, i = 1, 2, ..., L.
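A minimal sketch of this step, assuming the standard Levinson-Durbin recursion and the window formula given above (all names and the test frame are my own):

```python
# Illustrative sketch: windowed autocorrelation followed by Levinson-Durbin LPC analysis.
import numpy as np

def lpc_coefficients(frame, order=10):
    """Return order-L prediction coefficients a_1..a_L via the Levinson-Durbin recursion."""
    N = len(frame)
    windowed = frame * (0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N))  # window from the text
    R = np.array([np.dot(windowed[k:], windowed[:N - k]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = R[0]
    for i in range(1, order + 1):
        acc = R[i] + np.dot(a[1:i], R[i - 1:0:-1])
        k = -acc / err
        a[1:i + 1] += k * a[i - 1::-1][:i]   # a_j <- a_j + k * a_{i-j}, and a_i <- k
        err *= (1.0 - k * k)
    # Sign convention such that x(n) is predicted as sum_i a_i * x(n - i)
    return -a[1:]

frame = np.random.default_rng(0).standard_normal(128)
print(lpc_coefficients(frame, order=10))
```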
Autoregressive model and filtering module construction
A synthesis filter is constructed from the low-frequency speech linear prediction coefficients a_i, i = 1, ..., L, namely
H(z) = G / (1 − Σ_{i=1}^{L} a_i·z^(−i)),
where L is the order of the autoregressive model, L is an integer between 8 and 20, and G is a value between 0.1 and 1. In the embodiment of the invention, L = 10 and G = 1 are the preferred settings.
White noise is processed by this synthesis filter to produce a random sequence correlated with the low-frequency speech. The white noise sequence is generated as
w(n) = [w(n−1)·31821 + 13849],
where w(0) = 0.
The white noise sequence w(n) is passed through the above synthesis filter, which outputs the high-frequency noise sequence y(n), namely
y(n) = w(n) + Σ_{i=1}^{L} a_i·y(n−i),
where a_i are the synthesis filter coefficients. To limit the energy of the high-frequency part, the high-frequency noise sequence y(n) is normalized:
y(n) = y(n) / Σ_{n=0}^{N−1} y(n)·y(n),
where N is the frame length; the present invention recommends N = 128.
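A sketch of this excitation-generation step is given below, assuming scipy.signal.lfilter as the all-pole filter implementation; the modulus of the noise generator is not stated above and is assumed here, and the example AR coefficients are hypothetical:

```python
# Illustrative sketch: drive the all-pole filter H(z) = G / (1 - sum_i a_i z^-i)
# with a pseudo-random noise sequence and normalize the frame energy.
import numpy as np
from scipy.signal import lfilter

def high_band_noise(lpc, n_samples=128, gain=1.0):
    """Filter noise through the synthesis filter built from lpc and normalize the result."""
    # Generator in the spirit of w(n) = [w(n-1)*31821 + 13849], w(0) = 0;
    # the modulus (16-bit wraparound) and the scaling are assumptions of this sketch.
    w = np.zeros(n_samples)
    state = 0
    for n in range(1, n_samples):
        state = (state * 31821 + 13849) % 65536
        w[n] = state / 32768.0 - 1.0          # scale to roughly [-1, 1)
    # All-pole filter: denominator coefficients are [1, -a_1, ..., -a_L]
    y = lfilter([gain], np.concatenate(([1.0], -np.asarray(lpc))), w)
    return y / np.sum(y * y)                   # frame-energy normalization as in the text

lpc = np.array([0.5, -0.2, 0.1])               # hypothetical low-order AR coefficients
print(high_band_noise(lpc)[:5])
```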
Spectral shaping module
The high-frequency log-frequency-domain envelope hi_env(ω) estimated above is used to modulate the high-frequency noise sequence [7]. First, the high-frequency noise sequence y(n) is Fourier transformed and converted to the log domain, giving the frequency-domain log values C_y(ω) of the high-frequency noise sequence. The high-frequency log-spectral envelope is then used to modulate the spectrum of the high-frequency noise sequence, giving the log spectrum C_wide(ω) of the high-frequency speech:
C_wide(ω) = C_y(ω)·hi_env(ω),
If the frequency-domain values and the time-domain samples of the high-frequency speech are denoted S_wide(ω) and s_wide(n) respectively, then
S_wide(ω) = exp(C_wide(ω)), (1)
s_wide(n) = IFFT(S_wide(ω)), (2)
where exp(·) is the exponential operation and IFFT(·) is the inverse Fourier transform. Through the inverse transformations of equations (1) and (2), the high-frequency speech is obtained.
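The shaping step might look roughly as follows in Python (my own variable names); the text leaves the phase of the inverse transform implicit, so reusing the noise phase is an assumption of this sketch:

```python
# Illustrative sketch: modulate the log spectrum of the high-band noise frame with the
# extrapolated envelope hi_env, then return to the time domain (equations (1) and (2)).
import numpy as np

def shape_high_band(y, hi_env, eps=1e-12):
    """Apply the high-band log-spectral envelope to a noise frame y."""
    Y = np.fft.fft(y)
    c_y = np.log(np.abs(Y) + eps)      # log-magnitude spectrum of the noise
    c_wide = c_y * hi_env               # modulation by the extrapolated envelope
    S_wide = np.exp(c_wide)             # equation (1): back to linear magnitude
    # Phase choice is not specified above; the noise phase is kept here as an assumption
    s_wide = np.fft.ifft(S_wide * np.exp(1j * np.angle(Y)))   # equation (2)
    return np.real(s_wide)

y = np.random.default_rng(1).standard_normal(128)
hi_env = np.linspace(1.2, 0.8, 128)     # hypothetical envelope sampled on the FFT bins
print(shape_high_band(y, hi_env)[:4])
```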
Speech synthesis module
The present invention uses the properties of the cepstrum to combine the high-frequency part and the low-frequency part of the speech [8], and thereby obtains the synthesized wideband speech. The synthesis process is shown in Fig. 2.
The narrowband signal with a sampling frequency of 8 kHz is upsampled to 16 kHz by interpolation. Its cepstrum is obtained through the cepstrum computation process, and the cepstrum of the high-frequency speech is obtained in the same way. The cepstra of the narrowband speech and of the high-frequency speech are each transformed to the frequency domain, and the frequency-domain amplitude of the narrowband speech is processed as follows:
C_wide(ω) = C_narrow(ω) + C_high(ω)
where C_narrow(ω) and C_high(ω) are the cepstral frequency-domain values of the narrowband speech and of the high-frequency speech, respectively, and C_wide(ω) is the frequency-domain value of the synthesized wideband cepstrum. An inverse Fourier transform then yields the cepstrum of the wideband speech, and finally the inverse cepstrum process yields the synthesized wideband speech, as shown in Fig. 2.
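A simplified sketch of the combination step, working directly with log-magnitude spectra (which is what adding the two cepstra and transforming them to the frequency domain amounts to); the narrowband-phase reuse and the test signals are my own assumptions:

```python
# Illustrative sketch: add the log-magnitude spectra of the upsampled narrowband frame
# and the high-band frame, then invert to obtain a wideband frame.
import numpy as np

def synthesize_wideband(narrow_16k, high_band, eps=1e-12):
    """Combine narrowband and high-band frames by adding their log-magnitude spectra."""
    C_narrow = np.log(np.abs(np.fft.fft(narrow_16k)) + eps)
    C_high = np.log(np.abs(np.fft.fft(high_band)) + eps)
    C_wide = C_narrow + C_high                      # combination formula from the text
    magnitude = np.exp(C_wide)
    phase = np.angle(np.fft.fft(narrow_16k))        # reuse narrowband phase (an assumption)
    return np.real(np.fft.ifft(magnitude * np.exp(1j * phase)))

rng = np.random.default_rng(2)
narrow_16k = rng.standard_normal(256)               # stand-in for an upsampled 16 kHz frame
high_band = rng.standard_normal(256)                # stand-in for a shaped high-band frame
print(synthesize_wideband(narrow_16k, high_band)[:4])
```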
The present invention is a fully blind speech bandwidth extension technique that can be applied directly at the narrowband receiving end. It requires no prior or high-frequency side information, has low algorithmic complexity, recovers a high-frequency part well correlated with the low band, and the synthesized wideband speech sounds good.
To verify the effectiveness of the present invention, objective and subjective tests were carried out.
Objective test results
The spectral distortion measure and the spectrogram are effective means of objectively evaluating speech quality. Without loss of generality, the objective test computes the spectral distortion measure and plots spectrograms.
The spectral distortion measure is defined as
D_HC² = (1/K)·Σ_{k=1}^{K} ∫_{0.25ω_s}^{0.5ω_s} [ 20·log10( A_k(ω) / A'_k(ω) ) + G_C ]² dω,
G_C = (1/(0.25ω_s))·∫_{0.25ω_s}^{0.5ω_s} 20·log10( A'_k(ω) / A_k(ω) ) dω,
where ω_s is 2π, G_C is the gain compensation factor, which effectively removes the overall gain difference between the two envelopes, K is the total number of speech frames, and A_k(ω) and A'_k(ω) are the spectral envelopes of the k-th frame of the original reference speech and of the speech under test, respectively, computed as
A_k(ω) = | Σ_{n=0}^{N−1} x(n)·e^(−jωn) |,
A'_k(ω) = | Σ_{n=0}^{N−1} x'(n)·e^(−jωn) |,
The present invention recommends N = 128; x(n) and x'(n) denote the original reference speech and the speech under test, respectively. Here the original reference speech is the original wideband speech, and the speech under test is either the original narrowband speech or the synthesized wideband speech.
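For illustration, the measure can be approximated with a discrete sum over FFT bins, as in the following sketch (the frame length, discretization and normalization choices are my own):

```python
# Illustrative sketch: gain-compensated log-spectral distortion over the upper half-band
# (0.25*ws to 0.5*ws), averaged over frames and discretized onto FFT bins.
import numpy as np

def spectral_distortion(reference, test, n_fft=128, eps=1e-12):
    """Mean gain-compensated log-spectral distortion between two equal-length signals."""
    frames = len(reference) // n_fft
    d_sq = 0.0
    for k in range(frames):
        ref = reference[k * n_fft:(k + 1) * n_fft]
        tst = test[k * n_fft:(k + 1) * n_fft]
        A_ref = np.abs(np.fft.fft(ref)) + eps
        A_tst = np.abs(np.fft.fft(tst)) + eps
        band = slice(n_fft // 4, n_fft // 2)             # bins covering 0.25*ws .. 0.5*ws
        diff = 20.0 * np.log10(A_ref[band] / A_tst[band])
        g_c = -np.mean(diff)                             # gain compensation term G_C
        d_sq += np.mean((diff + g_c) ** 2)
    return np.sqrt(d_sq / frames)

rng = np.random.default_rng(3)
x = rng.standard_normal(1024)
print(spectral_distortion(x, x + 0.1 * rng.standard_normal(1024)))
```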
The spectral distortion measure is computed in the above manner for the original narrowband speech and for the wideband speech synthesized with this algorithm. The test results are shown in Fig. 5. As can be seen from Fig. 5, the spectral distortion of the wideband speech synthesized by this algorithm is clearly lower than that of the narrowband speech, indicating that the algorithm can estimate the high-frequency speech and synthesize wideband speech well.
A spectrogram represents the spectral energy of a segment of speech as a grayscale image: brighter regions indicate higher energy, darker regions lower spectral energy. Spectrograms show the frequency content of speech intuitively. Therefore, to compare the spectra more intuitively, spectrograms of the original wideband speech, the narrowband speech of a male test utterance, and the wideband speech synthesized by this blind bandwidth extension algorithm are given in Fig. 3(a), (b) and (c), respectively. Fig. 3(a) is the spectrogram of the original speech signal; it is bright across the whole 0~8 kHz range. Fig. 3(b) is the spectrogram of the narrowband speech signal; it is very dark in the 4~8 kHz range, showing that the energy of the high-frequency part is very small, which is why narrowband speech does not sound natural enough. Fig. 3(c) is the spectrogram of the speech output by the proposed blind bandwidth extension algorithm; the spectrogram becomes noticeably brighter in the 4~8 kHz range, showing that the high-frequency components of the speech are clearly increased.
Subjective test results
The subjective test uses an internationally common grading standard, namely the comparison mean opinion score. Fig. 6 gives the subjective grading standard; scores range from -3 to +3.
The test speech chosen in the present invention is as follows: (1) narrowband telephone speech output by the adaptive-rate speech codec at a bit rate of 12.2 kbps; (2) wideband telephone speech output by the wideband adaptive variable-rate speech codec at a bit rate of 8.85 kbps; (3) wideband telephone speech obtained by processing the narrowband telephone speech output by the adaptive-rate speech codec at 12.2 kbps with the new blind bandwidth extension algorithm proposed by the present invention.
The first group of test speech consists of the wideband telephone speech produced by the proposed blind bandwidth extension algorithm from the narrowband telephone speech, together with the narrowband telephone speech output by the adaptive-rate speech codec at 12.2 kbps. The second group consists of the wideband telephone speech produced by the proposed blind bandwidth extension algorithm from the narrowband telephone speech, together with the wideband telephone speech output by the wideband adaptive variable-rate speech codec at 8.85 kbps. Every speech segment is level-adjusted to -26 dB.
In the subjective test, 20 listeners (10 male, 10 female) were invited to take the test in the same environment. The test subjects were between 20 and 40 years old and had not taken part in any speech-related subjective test in the previous six months. Before the test began, the effect of bandwidth extension was demonstrated to the listeners, and they were told to evaluate the speech in two main respects: the speech quality and the perception of the extended high-frequency components. Once the test subjects had understood the instructions, they first listened to a preliminary trial and gave their opinions. During the test, each group of test speech was presented to the subjects in random order, and they were allowed to listen repeatedly without restriction. Finally, each test subject gave an opinion according to the subjective grading standard. Fig. 4(a) and 4(b) show the distributions of the comparison results for the two groups of test speech.
In the distribution plots, the abscissa is the subjective grading score and the ordinate is the proportion of listeners giving that score. According to the grading standard, a positive score means that the proposed algorithm is better than the narrowband telephone speech output by the adaptive-rate speech codec at 12.2 kbps or the wideband telephone speech output by the wideband adaptive variable-rate speech codec at 8.85 kbps. A difference analysis with a 95% confidence interval is applied to the bandwidth extension test results. Fig. 4(a) is the distribution of comparison results between the output of the present invention and the narrowband telephone speech output by the adaptive-rate speech codec at 12.2 kbps; Fig. 4(b) is the comparison between the output of this algorithm and the wideband telephone speech output by the wideband adaptive variable-rate speech codec at 8.85 kbps. As can be seen from Fig. 4(a) and 4(b), the result of this algorithm is slightly better than the wideband speech output by the wideband adaptive variable-rate speech codec at 8.85 kbps, and is a large improvement over the narrowband speech output by the adaptive-rate speech codec at 12.2 kbps, with clearly better listening quality.
The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent substitution or modification made, within the technical scope disclosed by the present invention, by a person skilled in the art according to the technical solution and inventive concept of the present invention shall be covered by the protection scope of the present invention.

Claims (7)

1. A method for artificial speech bandwidth extension, characterized in that:
the narrowband speech signal passes through a curve fitting module and then a high-frequency envelope extrapolation module, whose output enters a spectral shaping module; the narrowband speech signal also passes through a feature extraction module, where each frame yields a set of linear prediction coefficients used to construct an autoregressive model and filtering module; white noise processed by this autoregressive model produces a high-frequency random noise sequence correlated with the low band, and this sequence enters the spectral shaping module; the spectral shaping module outputs high-frequency speech; the high-frequency speech and the narrowband speech signal are passed through a speech synthesis module to obtain wideband speech.
2. The method for artificial speech bandwidth extension according to claim 1, characterized in that: the curve fitting module uses curve fitting to obtain the equation of the low-frequency log-spectral envelope curve of the narrowband speech, from which the high-frequency log-spectral envelope is extrapolated; the formant peaks of the low-frequency part are chosen as the input to the fitting; first, narrowband speech sampled at 8 kHz is input, the pitch period is estimated, and the time-domain signal is transformed into the log-frequency domain; peaks in the log-frequency domain are searched using the estimated pitch period, the formant variation curve is described by curve fitting, and the high-frequency log-spectral envelope curve is then extrapolated;
the narrowband speech is divided into frames: the frame length is 128 samples with an overlap of 64 samples between frames; the pitch period T of each speech frame is computed from the correlation of the signal; the input narrowband speech is x(n), and its autocorrelation function R(k) is
R(k) = Σ_{n=0}^{N−1} x(n)·x(n−k)
where N is the frame length, N = 128; the position k' of the maximum of R(k) is searched over the lag range k = 20~143, and k' is the estimate T of the pitch period; the narrowband speech is Fourier transformed and converted to the log-frequency domain, where the first formant is located and denoted p_0; because the pitch period and the spacing between formants are approximately equal, the other low-frequency formants can be found from the determined first formant p_0 and the pitch period T; when searching for them, only points whose distance from the previous formant is close to T need to be examined to obtain their accurate positions; the formant amplitudes are denoted lo_env(ω), i.e., the low-frequency log-spectral envelope at the corresponding frequency points ω; lo_env(ω) and ω serve as the input to the curve fitting, and a mapping is established between the low-frequency log-spectral envelope lo_env(ω) and the low-band frequency ω:
lo_env(ω) = a·e^(b·ω) + c·e^(d·ω), ω = 0~2π×4000,
and the parameters a, b, c and d of the fitting function are obtained, which determines the mapping formula.
3. The method for artificial speech bandwidth extension according to claim 1, characterized in that: the high-frequency envelope extrapolation module substitutes the high-frequency points into the determined mapping formula to extrapolate the unknown high-frequency log-spectral envelope data hi_env(ω):
hi_env(ω) = a·e^(b·ω) + c·e^(d·ω), ω = 2π×4000~2π×8000.
4. The method for artificial speech bandwidth extension according to claim 1, characterized in that: the feature extraction module performs linear prediction analysis on the narrowband speech, each frame yielding a set of linear prediction coefficients from which an autoregressive model is constructed; first the autoregressive model is built from the narrowband speech: linear prediction analysis is carried out on each speech frame x(n) of length N, N = 128, i.e., the autocorrelation function of each windowed speech frame is computed and converted to linear prediction coefficients with the Levinson-Durbin algorithm; the concrete steps are as follows:
the Hamming window window(n) = 0.5 − 0.5·cos(2πn/N), n = 0, 1, ..., N−1, N a positive integer, is used to window the input speech signal x(n); the windowed speech x'(n) is
x'(n) = x(n)·window(n),
the autocorrelation function is computed,
R(k) = Σ_{n=k}^{N−1} x'(n)·x'(n−k),
k = 0, 1, ..., N−1, where N is a positive integer,
and the Levinson-Durbin algorithm is used to obtain the L-th order autoregressive model coefficients a_i, i = 1, 2, ..., L, L a positive integer, by solving the following system of equations:
Σ_{i=1}^{L} a_i·R(|i−k|) = −R(k),
k = 1, 2, ..., L.
5. The method for artificial speech bandwidth extension according to claim 1, characterized in that: the autoregressive model and filtering module are constructed as follows:
a synthesis filter model is constructed from the low-frequency speech autoregressive model coefficients a_i, i = 1, ..., L, L a positive integer, namely
H(z) = G / (1 − Σ_{i=1}^{L} a_i·z^(−i)),
where G is the gain, L is the order of the autoregressive model, L is an integer between 8 and 20, and G is a value between 0.1 and 1;
white noise is processed by this synthesis filter to produce a random sequence correlated with the low-frequency speech; the white noise sequence is generated as
w(n) = [w(n−1)·31821 + 13849],
where w(0) = 0;
the white noise sequence w(n) is passed through the above synthesis filter, which outputs the high-frequency noise sequence y(n), namely
y(n) = w(n) + Σ_{i=1}^{L} a_i·y(n−i),
where a_i are the synthesis filter coefficients; to limit the energy of the high-frequency part, the high-frequency noise sequence y(n) is normalized:
y(n) = y(n) / Σ_{n=0}^{N−1} y(n)·y(n),
where N is the frame length, N = 128.
6. The method for artificial speech bandwidth extension according to claim 1, characterized in that: the spectral shaping module modulates the high-frequency noise sequence with the high-frequency log-frequency-domain envelope hi_env(ω) estimated above;
first, the high-frequency noise sequence y(n) is Fourier transformed and converted to the log domain, giving the frequency-domain log values C_y(ω) of the high-frequency noise sequence; the high-frequency log-spectral envelope is used to modulate the spectrum of the high-frequency noise sequence, giving the log spectrum C_wide(ω) of the high-frequency speech:
C_wide(ω) = C_y(ω)·hi_env(ω),
if the frequency-domain values and the time-domain samples of the high-frequency speech are denoted S_wide(ω) and s_wide(n) respectively, then
S_wide(ω) = exp(C_wide(ω)), (1)
s_wide(n) = IFFT(S_wide(ω)), (2)
where exp(·) is the exponential operation and IFFT(·) is the inverse Fourier transform; through the inverse transformations of equations (1) and (2), the high-frequency speech is obtained.
7. The method for artificial speech bandwidth extension according to claim 1, characterized in that: in the speech synthesis module, the narrowband signal with a sampling frequency of 8 kHz is upsampled to 16 kHz by interpolation; its cepstrum is obtained through the cepstrum computation process, and the cepstrum of the high-frequency speech is obtained in the same way; the cepstra of the narrowband speech and of the high-frequency speech are each transformed to the frequency domain, and the frequency-domain amplitude of the narrowband speech is processed as follows:
C_wide(ω) = C_narrow(ω) + C_high(ω),
where C_narrow(ω) and C_high(ω) are the cepstral frequency-domain values of the narrowband speech and of the high-frequency speech, respectively, and C_wide(ω) is the frequency-domain value of the synthesized wideband cepstrum; the cepstrum of the wideband speech is then obtained by an inverse Fourier transform, and finally the synthesized wideband speech is obtained through the inverse cepstrum process.
CN201310130081.2A 2013-04-12 2013-04-12 Method for expanding artificial voice bandwidth Expired - Fee Related CN103258543B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310130081.2A CN103258543B (en) 2013-04-12 2013-04-12 Method for expanding artificial voice bandwidth

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310130081.2A CN103258543B (en) 2013-04-12 2013-04-12 Method for expanding artificial voice bandwidth

Publications (2)

Publication Number Publication Date
CN103258543A true CN103258543A (en) 2013-08-21
CN103258543B CN103258543B (en) 2015-06-03

Family

ID=48962413

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310130081.2A Expired - Fee Related CN103258543B (en) 2013-04-12 2013-04-12 Method for expanding artificial voice bandwidth

Country Status (1)

Country Link
CN (1) CN103258543B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104217730A (en) * 2014-08-18 2014-12-17 大连理工大学 Artificial speech bandwidth expansion method and device based on K-SVD
CN106856623A (en) * 2017-02-20 2017-06-16 鲁睿 Baseband voice signals communicate noise suppressing method and system
CN106992003A (en) * 2017-03-24 2017-07-28 深圳北斗卫星信息科技有限公司 Voice signal auto gain control method
CN107977849A (en) * 2016-10-25 2018-05-01 深圳市百米生活股份有限公司 A kind of method and system based on audio stream real-time intelligent implantation information
CN108198571A (en) * 2017-12-21 2018-06-22 中国科学院声学研究所 A kind of bandwidth expanding method judged based on adaptive bandwidth and system
CN110155064A (en) * 2019-04-22 2019-08-23 江苏大学 Special vehicle traveling lane identification based on voice signal with from vehicle lane change decision system and method
CN110839108A (en) * 2019-11-06 2020-02-25 维沃移动通信有限公司 Noise reduction method and electronic equipment
CN117995193A (en) * 2024-04-02 2024-05-07 山东天意装配式建筑装备研究院有限公司 Intelligent robot voice interaction method based on natural language processing

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1988565A (en) * 2005-12-23 2007-06-27 Qnx软件操作系统(威美科)有限公司 Bandwidth extension of narrowband speech
CN101556795A (en) * 2008-04-09 2009-10-14 展讯通信(上海)有限公司 Method and device for computing voice fundamental frequency
CN102522092A (en) * 2011-12-16 2012-06-27 大连理工大学 Device and method for expanding speech bandwidth based on G.711.1
CN102543086A (en) * 2011-12-16 2012-07-04 大连理工大学 Device and method for expanding speech bandwidth based on audio watermarking

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1988565A (en) * 2005-12-23 2007-06-27 Qnx软件操作系统(威美科)有限公司 Bandwidth extension of narrowband speech
CN101556795A (en) * 2008-04-09 2009-10-14 展讯通信(上海)有限公司 Method and device for computing voice fundamental frequency
CN102522092A (en) * 2011-12-16 2012-06-27 大连理工大学 Device and method for expanding speech bandwidth based on G.711.1
CN102543086A (en) * 2011-12-16 2012-07-04 大连理工大学 Device and method for expanding speech bandwidth based on audio watermarking

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104217730A (en) * 2014-08-18 2014-12-17 大连理工大学 Artificial speech bandwidth expansion method and device based on K-SVD
CN104217730B (en) * 2014-08-18 2017-07-21 大连理工大学 A kind of artificial speech bandwidth expanding method and device based on K SVD
CN107977849A (en) * 2016-10-25 2018-05-01 深圳市百米生活股份有限公司 A kind of method and system based on audio stream real-time intelligent implantation information
CN106856623A (en) * 2017-02-20 2017-06-16 鲁睿 Baseband voice signals communicate noise suppressing method and system
CN106856623B (en) * 2017-02-20 2020-02-11 鲁睿 Baseband voice signal communication noise suppression method and system
CN106992003A (en) * 2017-03-24 2017-07-28 深圳北斗卫星信息科技有限公司 Voice signal auto gain control method
CN108198571A (en) * 2017-12-21 2018-06-22 中国科学院声学研究所 A kind of bandwidth expanding method judged based on adaptive bandwidth and system
CN108198571B (en) * 2017-12-21 2021-07-30 中国科学院声学研究所 Bandwidth extension method and system based on self-adaptive bandwidth judgment
CN110155064A (en) * 2019-04-22 2019-08-23 江苏大学 Special vehicle traveling lane identification based on voice signal with from vehicle lane change decision system and method
CN110155064B (en) * 2019-04-22 2020-12-18 江苏大学 Special vehicle driving lane identification and self-vehicle lane change decision-making system and method based on sound signals
CN110839108A (en) * 2019-11-06 2020-02-25 维沃移动通信有限公司 Noise reduction method and electronic equipment
CN117995193A (en) * 2024-04-02 2024-05-07 山东天意装配式建筑装备研究院有限公司 Intelligent robot voice interaction method based on natural language processing

Also Published As

Publication number Publication date
CN103258543B (en) 2015-06-03

Similar Documents

Publication Publication Date Title
CN103258543B (en) Method for expanding artificial voice bandwidth
CN102054480B (en) Method for separating monaural overlapping speeches based on fractional Fourier transform (FrFT)
CN111128213B (en) Noise suppression method and system for processing in different frequency bands
CN108447495B (en) Deep learning voice enhancement method based on comprehensive feature set
CN101527141B (en) Method of converting whispered voice into normal voice based on radial group neutral network
CN103117066B (en) Low signal to noise ratio voice endpoint detection method based on time-frequency instaneous energy spectrum
KR100304666B1 (en) Speech enhancement method
EP4191583A1 (en) Transient speech or audio signal encoding method and device, decoding method and device, processing system and computer-readable storage medium
CN110246510B (en) End-to-end voice enhancement method based on RefineNet
CN102881289B (en) Hearing perception characteristic-based objective voice quality evaluation method
US20110099004A1 (en) Determining an upperband signal from a narrowband signal
JPS63259696A (en) Voice pre-processing method and apparatus
CN103531205A (en) Asymmetrical voice conversion method based on deep neural network feature mapping
CN102664003A (en) Residual excitation signal synthesis and voice conversion method based on harmonic plus noise model (HNM)
CN103474074B (en) Pitch estimation method and apparatus
CN106997765B (en) Quantitative characterization method for human voice timbre
CN107293306B (en) A kind of appraisal procedure of the Objective speech quality based on output
CN107221334B (en) Audio bandwidth extension method and extension device
CN104658547A (en) Method for expanding artificial voice bandwidth
CN105845149A (en) Predominant pitch acquisition method in acoustical signal and system thereof
CN103093757B (en) Conversion method for conversion from narrow-band code stream to wide-band code stream
CN102930863B (en) Voice conversion and reconstruction method based on simplified self-adaptive interpolation weighting spectrum model
CN106782599A (en) The phonetics transfer method of post filtering is exported based on Gaussian process
CN103559893B (en) One is target gammachirp cepstrum coefficient aural signature extracting method under water
CN102543089B (en) Conversion device for converting narrowband code streams into broadband code streams

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150603

Termination date: 20180412