CN1185626C - System and method for modifying speech signals - Google Patents

Publication number
CN1185626C
Authority
CN
China
Prior art keywords
signal
voice signal
module
narrow band
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB018042864A
Other languages
Chinese (zh)
Other versions
CN1397064A (en
Inventor
U. Lindgren
H. Gustafsson
P. Deutgen
C. Thurban
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Telefonaktiebolaget LM Ericsson AB
Original Assignee
Telefonaktiebolaget LM Ericsson AB

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/038: Speech enhancement, e.g. noise reduction or echo cancellation, using band spreading techniques

Abstract

A system and method for speech signal enhancement upsamples a narrowband speech signal at a receiver to generate a wideband speech signal. The lower frequency range of the wideband speech signal is reproduced using the received narrowband speech signal. The received narrowband speech signal is analyzed to determine its formants and pitch information. The upper frequency range of the wideband speech signal is synthesized using information derived from the received narrowband speech signal.

Description

System and method for modifying speech signals
Background of the invention
The present invention relates to techniques for transmitting voice information in communication networks, and more particularly to techniques for enhancing narrowband speech signals at a receiver.
In the transmission of speech signals there is a trade-off between network capacity (i.e., the number of calls transmitted) and the quality of the speech signal on those calls. Most telephone systems in use today encode and transmit speech signals in the narrow frequency band between about 300 Hz and 3.4 kHz, using a sampling rate of 8 kHz in accordance with the Nyquist theorem. Because human speech contains frequencies between about 50 Hz and 13 kHz, sampling human speech at an 8 kHz rate and transmitting the narrow frequency range of about 300 Hz to 3.4 kHz necessarily omits information in the speech signal. Telephone systems therefore necessarily degrade the quality of speech signals.
Various methods of extending the bandwidth of speech signals transmitted in telephone systems have been developed. These methods can be divided into two categories. The first category includes systems that extend the bandwidth of the speech signal transmitted across the entire telephone system to accommodate the wider range of frequencies produced by human speech. These systems impose additional bandwidth requirements throughout the network and are therefore difficult to implement.
The second category includes systems that use mathematical algorithms to manipulate the narrowband speech signals used by existing telephone systems. Representative examples include algorithms that compress a wideband speech signal at the transmitter so that the wideband signal can be transmitted over an existing narrowband connection. The wideband signal must then be decompressed at the receiver. These methods can be expensive to implement because the structure of the existing systems must be changed.
Other techniques implement a "codebook" approach, as described in the publication "Statistical Recovery of Wideband Speech from Narrowband Speech" by YangMing Cheng (IEEE Transactions on Speech and Audio Processing, October 1994) and in published European Patent Application No. EP-A-0945852 A1. A codebook is used to translate the narrowband speech signal into a new wideband speech signal. The translation from narrowband to wideband is typically based on two models: one for narrowband speech analysis and one for wideband speech synthesis. The codebook is trained on speech data to "learn" the diversity of most speech sounds (phonemes). When the codebook is used, the narrowband speech is modeled and the codebook is searched for the entry at minimum distance from the narrowband model. The selected model is converted to its wideband equivalent, which is used to synthesize the wideband speech. One drawback of codebooks is that they require significant training.
Another approach is commonly referred to as spectral folding. Spectral folding techniques are based on the principle that content in the lower frequency band can be folded into the upper band. Typically, the narrowband signal is resampled at a higher sampling rate so as to introduce aliasing in the upper frequency band. The upper band is then shaped with a low-pass filter to produce the wideband signal. These methods are simple and effective, but they often introduce high-frequency distortion that makes the speech sound metallic.
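The aliasing that spectral folding relies on can be illustrated with a minimal Python sketch. It assumes the simplest form of resampling, zero insertion by a factor of two with no anti-imaging filter. The helper names `zero_insert` and `dft_magnitude`, the 1 kHz test tone, and the 20 ms frame are illustrative assumptions, not details taken from the cited techniques. A 1 kHz tone sampled at 8 kHz reappears as a mirror image at 7 kHz once the signal is viewed at 16 kHz.

```python
import cmath
import math


def zero_insert(x, factor=2):
    """Upsample by placing factor-1 zeros after each sample (no filtering)."""
    y = [0.0] * (len(x) * factor)
    y[::factor] = x
    return y


def dft_magnitude(x, freq_hz, fs_hz):
    """Magnitude of the DFT of x evaluated at a single frequency."""
    w = -2j * math.pi * freq_hz / fs_hz
    return abs(sum(v * cmath.exp(w * n) for n, v in enumerate(x)))


FS_NARROW = 8000   # narrowband sampling rate in Hz
FS_WIDE = 16000    # sampling rate after zero insertion by two
N_SAMPLES = 160    # 20 ms at 8 kHz (illustrative frame length)

# A 1 kHz tone, well inside the telephone band.
tone = [math.sin(2 * math.pi * 1000 * n / FS_NARROW) for n in range(N_SAMPLES)]
folded = zero_insert(tone, 2)

low = dft_magnitude(folded, 1000, FS_WIDE)    # original component
image = dft_magnitude(folded, 7000, FS_WIDE)  # folded image at 8 kHz - 1 kHz
empty = dft_magnitude(folded, 5000, FS_WIDE)  # a frequency with no content
```

In this sketch `low` and `image` have the same magnitude, while `empty` is numerically zero. A shaping filter would then attenuate the image, as the paragraph above describes.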
Accordingly, there is a need in the art for additional systems and methods for transmitting narrowband speech signals. There is also a need in the art for systems and methods for processing narrowband speech signals at a receiver so as to simulate wideband speech signals.
Summary of the invention
The present invention addresses these and other needs by adding synthetic information to the narrowband speech signal received at a receiver. Preferably, the speech signal is separated into a vocal tract model and an excitation signal. One or more resonance frequencies may be added to the vocal tract model, thereby synthesizing an extra formant in the speech signal. In addition, a new synthetic excitation signal may be added to the excitation signal in the frequency range that is to be synthesized. The speech may then be synthesized to obtain a wideband speech signal. Advantageously, the methods of the invention have relatively low computational complexity and do not introduce significant distortion into the speech signal.
In one aspect, the invention provides a method for processing a speech signal. The method comprises the steps of: analyzing a received narrowband signal to determine synthetic upper-band content; reproducing a lower band of the speech signal using the received narrowband signal; and combining the reproduced lower band with the determined synthetic upper band to produce a wideband speech signal having a synthesized component.
According to another aspect of the invention, the analyzing step further comprises the steps of: performing a spectral analysis on the received narrowband signal to determine parameters associated with a speech model and a residual error signal; determining a pitch associated with the residual error signal; identifying peaks associated with the received narrowband signal; and copying information from the received narrowband signal into an upper band, based on at least one of the determined pitch and the identified peaks, to provide the synthetic upper-band content.
According to another aspect of the invention, a predetermined frequency range of the wideband signal may be selectively boosted. The wideband signal may also be converted to analog form and amplified.
According to a further aspect, the invention provides a system for processing a speech signal. The system comprises: means for analyzing a received narrowband signal to determine synthetic upper-band content; means for reproducing a lower band of the speech signal using the received narrowband signal; and means for combining the reproduced lower band with the determined synthetic upper band to produce a wideband speech signal having a synthesized component.
According to another aspect of the invention, the means for analyzing a received narrowband signal to determine synthetic upper-band content comprises: a parametric spectral analysis module for analyzing the formant structure of the narrowband signal and generating an error signal and parameters describing the narrowband speech signal; a pitch decision module for determining the pitch of the sound segment represented by the narrowband signal; and a residual extender and copy module for processing information derived from the narrowband speech signal and generating a synthetic upper-band signal component.
According to additional aspects of the invention, the residual extender and copy module comprises: a fast Fourier transform module for converting the error signal from the parametric spectral analysis module into the frequency domain; a peak detector for identifying the harmonic frequencies of the error signal; and a copy module for copying the peaks identified by the peak detector into the upper frequency range.
In another aspect, the invention provides a system for processing a narrowband speech signal at a receiver. The system includes: an upsampler that receives the narrowband speech signal and increases the sampling frequency to produce an output signal having an increased frequency spectrum; a parametric spectral analysis module that receives the output signal from the upsampler and analyzes it to generate parameters associated with a speech model and a residual error signal; a pitch decision module that receives the residual error signal from the parametric spectral analysis module and generates a pitch signal representing the pitch of the speech signal and an indicator signal indicating whether the speech signal is voiced or unvoiced speech; and a residual extender and copy module that receives and processes the residual error signal and the pitch signal to generate a synthetic upper-band signal component.
Brief description of the drawings
The objects and advantages of the invention will be understood by reading the following detailed description in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic depiction of the functions of a receiver in accordance with aspects of the present invention;
Fig. 2 illustrates a typical spectrum of voiced speech and the coarse structure of the formants;
Fig. 3 illustrates a typical spectrogram;
Fig. 4 is a block diagram illustrating an exemplary embodiment of a system and method for adding synthetic information to a narrowband speech signal in accordance with the present invention;
Fig. 5 is a block diagram illustrating an exemplary residual extender and copy circuit depicted in Fig. 4;
Fig. 6 is a block diagram illustrating a second exemplary embodiment of a system and method for adding synthetic information to a narrowband speech signal in accordance with the present invention;
Fig. 7 is a block diagram illustrating an exemplary residual extender and copy circuit depicted in Fig. 6;
Fig. 8 is a block diagram illustrating a third exemplary embodiment of a system and method for adding synthetic information to a narrowband speech signal in accordance with the present invention;
Fig. 9 is a block diagram illustrating an exemplary residual modifier in accordance with the present invention;
Fig. 10 is a graph illustrating the short-time autocorrelation function of a speech sample representing a voiced sound;
Fig. 11 is a graph illustrating the average magnitude difference function of a speech sample representing a voiced sound;
Fig. 12 is a block diagram illustrating that the transfer function of an AR model can be separated into two transfer functions;
Fig. 13 is a graph illustrating the coarse structure of a speech signal before and after a synthetic formant is added to the speech signal;
Fig. 14 is a graph illustrating the coarse structure of a speech signal before and after a synthetic formant is added to the speech signal; and
Fig. 15 is a graph illustrating frequency response curves of an AR model for different parameters of the speech signal.
Detailed description
The present invention provides improvements in speech signal processing that may be implemented at a receiver. According to one aspect of the invention, frequencies of the speech signal in the upper frequency region are synthesized using information in the lower frequency region of the received speech signal. The invention makes advantageous use of the fact that speech signals have harmonic content that can be extrapolated into the upper frequency region.
The invention may be used in traditional wireline (i.e., fixed) telephone systems or in wireless (i.e., mobile) telephone systems. Because most existing wireless telephone systems are digital, the invention may be readily implemented in mobile communication terminals (e.g., mobile phones or other communication devices). Fig. 1 provides a schematic depiction of the functions performed by a communication terminal acting as a receiver in accordance with aspects of the invention. An encoded speech signal is received by the antenna 110 and receiver 120 of a mobile phone and is decoded by a channel decoder 130 and a vocoder 140. The digital signal from the vocoder 140 is directed to a bandwidth extension module 150, which synthesizes the missing frequencies of the speech signal (e.g., information in the upper frequency region) based on information in the received speech signal. The enhanced signal may be passed to a D/A converter 160, which converts the digital signal into an analog signal that may be directed to a speaker 170. Because the speech signal is already digital, the sampling has already been performed in the transmitting mobile phone. It should be understood, however, that the invention is not limited to wireless networks; it can generally be used in all two-way voice communication.
Speech production
By way of background, speech is produced by neuromuscular signals from the brain that control the vocal system. The different sounds produced by the vocal system are called phonemes, which are combined to form words and/or sentences. Every language has its own set of phonemes, and some phonemes exist in more than one language.
Speech sounds can be divided into two main categories: voiced sounds and unvoiced sounds. Voiced sounds are produced when quasi-periodic bursts of air are released by the glottis, which is the opening between the vocal cords. These bursts of air excite the vocal tract, creating a voiced sound (e.g., the short "a" in "car"). By contrast, unvoiced sounds are created when a steady flow of air is forced through a constriction in the vocal tract. This constriction is often near the mouth, causing the air to become turbulent and to generate a noise-like sound (e.g., the "sh" in "she"). There are, of course, sounds that have characteristics of both voiced and unvoiced sounds.
A number of different features are of interest to speech modeling techniques. One such feature is the formant frequencies, which depend on the shape of the vocal tract. The source of excitation of the vocal tract is also a parameter of interest.
Fig. 2 illustrates the spectrum of voiced speech sampled at a 16 kHz sampling frequency. The coarse structure is illustrated by the dashed line 210. The first three formants are indicated by arrows.
Formants are the resonance frequencies of the vocal tract. They shape the coarse structure of the speech spectrum. Formants vary with the characteristics of the speaker's vocal tract, i.e., whether it is long (typical for males) or short (typical for females). When the shape of the vocal tract changes, the resonance frequencies also change in frequency, bandwidth, and amplitude. Formants change shape continuously during phonemes, but abrupt changes occur at transitions from a voiced sound to an unvoiced sound. The three formants with the lowest resonance frequencies are important for sampling the produced speech sound. However, including other formants (e.g., the fourth and fifth formants) enhances the quality of the speech signal. Because of the low sampling rate (i.e., 8 kHz) used in narrowband transmission systems, the higher-frequency formants are omitted from the encoded speech signal, which results in a lower-quality speech signal. Formants are often denoted Fk, where k is the number of the formant.
There are two types of vocal tract excitation: impulse excitation and noise excitation. Impulse excitation and noise excitation may occur at the same time, producing a mixed excitation.
Bursts of air originating from the glottis are the basis of impulse excitation. Glottal pulses depend on the sound being pronounced and on the tension of the vocal cords. The frequency of the glottal pulses is called the fundamental frequency, often denoted F0. The period between two successive bursts is the pitch period, which for speech ranges from approximately 1.25 ms to 20 ms, corresponding to a frequency range of 50 Hz to 800 Hz. Pitch exists only when the vocal cords vibrate and a voiced sound (or a mixed-excitation sound) is produced.
Different sounds are produced depending on the shape of the vocal tract. The fundamental frequency F0 is gender dependent and is typically lower for male speakers than for female speakers. Pitch can be observed in the frequency domain as the fine structure of the spectrum. In a spectrogram, which plots signal energy (typically represented by color intensity) as a function of time and frequency, pitch can be observed as thin horizontal lines, as depicted in Fig. 3. This structure represents the pitch frequency and its higher-order harmonics, which originate from the fundamental frequency.
When unvoiced sounds are produced, the excitation source is noise-like. The noise is generated by a steady flow of air passing through a constriction in the vocal tract, often in the oral cavity. As the air flow passes the constriction it becomes turbulent, and a noise sound is created. The constriction is located at different places depending on the type of phoneme produced. The fine structure of the spectrum differs from that of a voiced sound in that the nearly equally spaced peaks are absent.
Exemplary speech signal enhancement circuits
Fig. 4 illustrates an exemplary embodiment of a system and method for adding synthetic information to a narrowband speech signal in accordance with the present invention. Synthetic information may be added to a narrowband speech signal to extend the reproduced frequency band, thereby improving the perceived quality of the reproduced speech. Referring to Fig. 4, an input voice or speech signal 405 received by a receiver (e.g., a mobile phone) is first upsampled by upsampler 410 to increase the sampling frequency of the received signal. In a preferred embodiment, upsampler 410 may upsample the received signal by a factor of two, but it should be understood that other upsampling factors may be used.
The upsampled signal is analyzed by a parametric spectral analysis module 420 to determine the formant structure of the received speech signal. The particular type of analysis performed by parametric spectral analysis unit 420 may vary. In one embodiment, an autoregressive (AR) model may be used to estimate the model parameters, as described below. Alternatively, a sinusoidal model may be employed in parametric spectral analysis unit 420, for example as described in the article by Deisher and Spanias entitled "Speech Enhancement Using State-based Estimation and Sinusoidal Modeling", the disclosure of which is incorporated here by reference. In either case, parametric spectral analysis unit 420 outputs parameters 422 describing the received speech signal (i.e., values associated with the particular model employed), as well as an error signal (e) 424 that represents the prediction error associated with the unit's estimate of the received speech signal.
Error signal (e) 424 is used by pitch decision unit 430 to estimate the pitch of the received speech signal. Pitch decision unit 430 may, for example, determine the pitch based on the distance between transients in the error signal. These transients are the result of the pulses produced by the glottis when voiced sounds are produced. Pitch decision unit 430 also determines whether the speech content of the received signal represents a voiced sound or an unvoiced sound, and generates a signal indicating which. The decision made by pitch decision unit 430 as to whether the received signal is voiced or unvoiced may be a binary decision or a soft decision indicating the relative probability of a voiced signal or an unvoiced signal.
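A pitch and voicing decision of this kind can be sketched in Python as follows. This is not the transient-distance method mentioned above; it swaps in the short-time autocorrelation function (the quantity plotted in Fig. 10) because it is compact to write down. The function names, the 0.3 voicing threshold, the 640-sample frame, and the synthetic test frames are hypothetical. The 50 Hz to 800 Hz search range follows the pitch range given earlier in the description.

```python
import random


def autocorrelation(frame, lag):
    """Unnormalized short-time autocorrelation of a frame at one lag."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))


def estimate_pitch(residual, fs_hz, f_min=50.0, f_max=800.0, threshold=0.3):
    """Return (pitch_hz, voicing_score, is_voiced) for one residual frame.

    voicing_score is the autocorrelation peak normalized by the frame
    energy; it can serve as a soft decision. threshold is a hypothetical
    tuning value that turns the score into a binary decision.
    """
    energy = autocorrelation(residual, 0)
    if energy == 0.0:
        return 0.0, 0.0, False
    min_lag = int(fs_hz / f_max)
    max_lag = min(int(fs_hz / f_min), len(residual) - 1)
    best_lag = max(range(min_lag, max_lag + 1),
                   key=lambda lag: autocorrelation(residual, lag))
    score = autocorrelation(residual, best_lag) / energy
    return fs_hz / best_lag, score, score > threshold


FS = 16000
# Idealized voiced residual: one glottal transient every 100 samples.
pulses = [1.0 if n % 100 == 0 else 0.0 for n in range(640)]
# Idealized unvoiced residual: white Gaussian noise with a fixed seed.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(640)]

pitch_hz, voiced_score, voiced = estimate_pitch(pulses, FS)
_, noise_score, noise_voiced = estimate_pitch(noise, FS)
```

For the pulse train the best lag is 100 samples, which at 16 kHz corresponds to 160 Hz. Returning the normalized score alongside the boolean mirrors the choice between a soft and a binary decision.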
The pitch information and the signal indicating whether the received signal is voiced or unvoiced are output from pitch decision unit 430 to residual extender and copy unit 440. As described below with respect to Fig. 5, residual extender and copy unit 440 extracts information from the received narrowband speech signal (e.g., in the range of 0 to 4 kHz) and uses the extracted information to populate an upper frequency range (e.g., 4 kHz to 8 kHz). The result is then forwarded to a synthesis filter 450, which synthesizes the lower frequency range based on the parameters output from parametric spectral analysis unit 420 and the upper frequency range based on the output of residual extender and copy unit 440. Synthesis filter 450 may, for example, be the inverse of the filter used for the AR model. Alternatively, synthesis filter 450 may be based on a sinusoidal model.
A portion of the frequency range of interest may also be amplified by providing the output of synthesis filter 450 to a linear time-variant (LTV) filter 460. In one exemplary embodiment, LTV filter 460 may be an infinite impulse response (IIR) filter. Although other types of filters may be used, IIR filters with distinct poles are particularly well suited to modeling the vocal tract. LTV filter 460 may be adapted based on a decision as to where the synthetic formant (or formants) should be placed in the synthesized speech signal. This decision is made by determination unit 470 based on the pitch of the received speech signal and the parameters output from parametric spectral analysis unit 420, using either a linear or nonlinear combination of these values or values stored in a lookup table and indexed by the derived speech model parameters and the determined pitch.
Fig. 5 depicts an exemplary embodiment of residual extender and copy unit 440. Here, the residual error signal (e) 424 from parametric spectral analysis unit 420 is input to a fast Fourier transform (FFT) module 510. FFT unit 510 converts the error signal into the frequency domain for manipulation by copy unit 530. Copy unit 530, under the control of peak detector 520, selects information from the residual error signal (e) 424 that can be used to populate at least a portion of the excitation signal. In one embodiment, peak detector 520 may identify the peaks, or harmonics, in the residual error signal (e) of the narrowband speech signal. The peaks may be copied into the upper frequency band by copy module 530. Alternatively, peak detector 520 may identify a subset of the peaks found in the narrowband speech signal (e.g., the first peak) and use the pitch period identified by pitch decision unit 430 to calculate the locations of the additional peaks to be copied by copy unit 530. The signal indicating whether the sampled narrowband signal is voiced or unvoiced is also provided to peak detector 520, because peak detection and copying are replaced by synthetic unvoiced upper-band speech content when the speech segment represents an unvoiced sound.
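The alternative in which peak locations are calculated from the pitch rather than detected one by one can be sketched as below. This is only an illustration under stated assumptions: the function name `harmonic_bins`, the 512-point FFT, and the 250 Hz pitch are hypothetical, and the patent does not specify how peak positions are mapped to FFT bins.

```python
def harmonic_bins(pitch_hz, fs_hz, n_fft, f_lo_hz, f_hi_hz):
    """FFT bin indices of the pitch harmonics lying strictly between
    f_lo_hz and f_hi_hz, assuming harmonics at integer multiples of
    the pitch."""
    bins = []
    k = int(f_lo_hz // pitch_hz) + 1  # first harmonic above f_lo_hz
    while k * pitch_hz < f_hi_hz:
        bins.append(round(k * pitch_hz * n_fft / fs_hz))
        k += 1
    return bins


# Upper band of a 16 kHz signal (4 kHz to 8 kHz) with a 250 Hz pitch.
upper_band_bins = harmonic_bins(250.0, 16000, 512, 4000.0, 8000.0)
```

With these values each harmonic is 8 bins apart, so the calculated peak positions form an evenly spaced comb across the upper band.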
The unvoiced speech content is generated by speech content unit 540. Synthetic unvoiced upper-band speech content can be created in a number of different ways. For example, a linear regression dependent on the speech parameters and the pitch can be performed to provide the synthetic unvoiced upper-band speech content. Alternatively, an associative memory module may include a lookup table that provides synthetic unvoiced upper-band speech content corresponding to input values associated with the speech parameters from the model and the determined pitch. The copied peak information from the residual error signal and the synthetic unvoiced upper-band speech content are input to combining module 560. Combining unit 560 allows the outputs of copy unit 530 and synthetic unvoiced upper-band speech content unit 540 to be weighted and summed before being converted back into the time domain by FFT unit 570. The weight values can be adjusted by gain control unit 550. Gain control unit 550 determines the flatness of the input spectrum and uses this information, together with the pitch information from pitch decision module 430, to adjust the gains associated with combining unit 560. As part of the weighting algorithm, gain control unit 550 also receives the signal indicating whether the speech segment represents a voiced or an unvoiced sound. As noted above, this signal may be binary or "soft" information giving the probability that the received signal segment being processed is voiced or unvoiced.
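The weighting and summing step can be illustrated with a short Python sketch. The patent does not give a weighting rule, so the one used here (gain equal to the soft voicing probability for the copied peaks, and its complement for the unvoiced content) is purely a hypothetical choice, as are the function names. The flatness (homogeneity) measure shown is the common ratio of geometric to arithmetic mean of the power spectrum, offered as one plausible way to quantify it.

```python
import math


def spectral_flatness(power):
    """Geometric mean over arithmetic mean of a power spectrum.

    Close to 1 for a flat (noise-like) spectrum, close to 0 for a
    peaky (harmonic) one. eps avoids log(0).
    """
    eps = 1e-12
    geometric = math.exp(sum(math.log(p + eps) for p in power) / len(power))
    arithmetic = sum(power) / len(power)
    return geometric / (arithmetic + eps)


def combine(copied, unvoiced, p_voiced):
    """Weighted sum of copied-peak content and synthetic unvoiced content.

    p_voiced is the soft voicing probability; using it directly as the
    gain is a hypothetical rule chosen for illustration only.
    """
    gain_voiced = p_voiced
    gain_unvoiced = 1.0 - p_voiced
    return [gain_voiced * c + gain_unvoiced * u
            for c, u in zip(copied, unvoiced)]


flat = spectral_flatness([1.0] * 8)
peaky = spectral_flatness([1.0, 0.0, 0.0, 0.0])
mixed = combine([2.0, 0.0], [0.0, 4.0], p_voiced=0.75)
```

A real gain control would presumably fold the flatness value and the pitch into the gains as well; that mapping is left out because the text does not describe it.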
Fig. 6 illustrates another exemplary embodiment of a system and method for adding synthetic speech formants to the upper frequency range of a received signal. The embodiment depicted in Fig. 6 is similar to that depicted in Fig. 4, except that residual extender and copy module 640 provides an output based only on information from the narrowband portion of the received signal. An exemplary embodiment of this residual extender and copy module 640 is illustrated in Fig. 7 and described below. If pitch decision unit 430 determines that a particular segment of interest represents an unvoiced sound, it controls switch 635 to select the residual error signal (e) directly for input to synthesis filter 450. Conversely, if pitch decision module 630 determines that a voiced signal is present, switch 635 is controlled to connect to the output of residual extender and copy unit 440 so that the upper-frequency content is determined there. An amplification filter 660 operates on the output of synthesis filter 450 to increase the gain in a predetermined portion of the desired sampling frequency range. For example, amplification filter 660 may be designed to increase the gain in the band from 2 kHz to 8 kHz. Based on simulations of the reproduction of the various synthetic speech formants described here, the filter pole pair may, for example, be optimized in the vicinity of a radius of 0.85 and an angle of 0.58π.
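To make the pole-pair figures concrete, the following Python sketch builds a second-order all-pole section from a radius of 0.85 and an angle of 0.58π and evaluates its gain. The all-pole form, the 16 kHz sampling rate, and the function names are assumptions made for illustration; the patent does not give the full structure of the amplification filter.

```python
import cmath
import math


def pole_pair_denominator(radius, angle):
    """Denominator [1, a1, a2] of a section with poles at radius*exp(+/-j*angle)."""
    return [1.0, -2.0 * radius * math.cos(angle), radius * radius]


def gain_at(den, freq_hz, fs_hz):
    """Magnitude response of H(z) = 1 / (den[0] + den[1] z^-1 + den[2] z^-2)."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / fs_hz)
    return 1.0 / abs(den[0] + den[1] * z_inv + den[2] * z_inv * z_inv)


FS_HZ = 16000                      # assumed sampling rate after upsampling
RADIUS, ANGLE = 0.85, 0.58 * math.pi

den = pole_pair_denominator(RADIUS, ANGLE)
center_hz = ANGLE * FS_HZ / (2 * math.pi)   # frequency at the pole angle

gain_center = gain_at(den, center_hz, FS_HZ)
gain_low = gain_at(den, 1000.0, FS_HZ)
gain_edge = gain_at(den, 8000.0, FS_HZ)
```

Under the 16 kHz assumption the pole angle maps to about 4.64 kHz, inside the 2 kHz to 8 kHz band mentioned above, and the gain there exceeds the gain at 1 kHz and at the 8 kHz band edge.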
Fig. 7 provides an example of the residual extender and copy unit 640 used in the exemplary embodiment of Fig. 6. Here, the residual error signal (e) is again converted into the frequency domain by FFT unit 710. Peak detector 720 identifies the peaks associated with the frequency-domain form of the residual error signal (e), which are then copied by copy module 730 and converted back into the time domain by FFT module 740. As in the exemplary embodiment of Fig. 5, peak detector 720 may detect each peak independently, or may detect a subset of the peaks and calculate the remaining peaks based on the determined pitch. As will be apparent to those skilled in the art, this implementation of the residual extender and copy module is somewhat simplified compared with the implementation of Fig. 5, because it does not attempt to synthesize unvoiced sounds in the upper-band speech content.
Fig. 8 is a schematic depiction of another exemplary embodiment of a system and method for adding synthetic speech formants to the upper frequency range of a received signal in accordance with the present invention. A narrowband speech signal, denoted x(n), is directed to an upsampler 810 to obtain a new signal s(n) having an increased sampling frequency (e.g., 16 kHz). Note that n is the sample number. The upsampled signal s(n) is directed to a segmentation module 820, which collects sequences of samples of the signal s(n) into a vector (or buffer).
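Segmentation of this kind amounts to slicing the sample stream into fixed-length buffers. A minimal Python sketch follows; the function name, the frame length, and the use of overlapping frames (a hop smaller than the frame length) are assumptions, since the description does not state them.

```python
def segment(signal, frame_len, hop):
    """Split signal into frames of frame_len samples, advancing by hop
    samples between frames. Trailing samples that do not fill a whole
    frame are dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


# Toy input: ten consecutive sample values standing in for s(n).
s = list(range(10))
frames = segment(s, frame_len=4, hop=2)
```

Setting `hop` equal to `frame_len` gives non-overlapping buffers; a smaller hop gives the overlap often used when frames are later windowed.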
The formant structure may be estimated using, for example, an AR model. The model parameters a_k may be estimated using, for example, a linear prediction algorithm. A linear prediction module 840 receives as input the upsampled signal s(n) and the sample vector produced by segmentation module 820, and computes the predictor polynomial a_k, as described below. A linear predictive coding (LPC) module 830 uses the inverse polynomial to predict the signal s(n), which yields the residual signal e(n), i.e., the prediction error. The original signal is reconstructed by exciting the AR model with the residual signal e(n).
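The analysis and reconstruction loop described here can be sketched in Python. The sketch assumes the autocorrelation method with the Levinson-Durbin recursion, which is one common linear prediction algorithm and not necessarily the one the patent uses. The sign convention (prediction = sum of a_k times past samples), the function names, and the synthetic second-order test signal are likewise assumptions.

```python
import random


def autocorr(x, max_lag):
    """Autocorrelation values r[0..max_lag] of one frame."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a_1..a_order with s_hat(n) = sum a_k s(n-k)."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def residual(s, a):
    """Prediction error e(n) = s(n) - sum a_k s(n-k) (inverse filter)."""
    return [s[n] - sum(a[k] * s[n - 1 - k]
                       for k in range(len(a)) if n - 1 - k >= 0)
            for n in range(len(s))]


def synthesize(e, a):
    """Excite the AR model with e(n): s(n) = e(n) + sum a_k s(n-k)."""
    out = []
    for n in range(len(e)):
        pred = sum(a[k] * out[n - 1 - k]
                   for k in range(len(a)) if n - 1 - k >= 0)
        out.append(e[n] + pred)
    return out


# Synthetic test signal: a known, stable AR(2) model driven by noise.
rng = random.Random(0)
true_a = [1.2, -0.8]
excitation = [rng.gauss(0.0, 1.0) for _ in range(2000)]
s = synthesize(excitation, true_a)

a_hat = levinson_durbin(autocorr(s, 2), 2)  # estimated a_k
e = residual(s, a_hat)                      # prediction error e(n)
s_rec = synthesize(e, a_hat)                # reconstruction from e(n)
```

Filtering `s` with the inverse polynomial and then exciting the AR model with the resulting `e` returns the original samples up to rounding error, which is the reconstruction property the paragraph above relies on.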
The signal is also extended into the upper part of the frequency band. To excite the extended signal, the residual signal e(n) is extended by residual modifier module 860 and directed to a synthesizer module 870. In addition, a new formant module 850 estimates the positions of the formants in the upper frequency range and forwards this information to synthesizer module 870. Synthesizer module 870 uses the LPC parameters, the extended residual signal, and the extended model information provided by new formant module 850 to produce the wideband speech signal that is output from the system.
Fig. 9 illustrates a system for extending the residual signal into the upper frequency region, which may correspond to the residual modifier module 860 depicted in Fig. 8. The residual signal e_i(n) is directed to a pitch estimation module 910, which determines the pitch based on, for example, the distance between transients in the error signal and generates a signal 912 representing it. Pitch estimation module 910 also determines whether the speech content of the received signal is a voiced sound or an unvoiced sound, and generates a signal indicating which. The decision made by pitch estimation module 910 as to whether the received signal is voiced or unvoiced may be a binary decision or a soft decision indicating the relative probability that the signal represents a voiced or an unvoiced sound. The residual signal e_i(n) is also directed to a first FFT module 920, where it is transformed into the frequency domain, and to a switch 950. The output of first FFT module 920 is directed to a modifier module 930, which modifies the signal into a wideband form. The output of modifier module 930 is directed to an inverse FFT (IFFT) module 940, whose output is directed to switch 950.
If pitch estimation module 910 determines that a particular segment of interest represents an unvoiced sound, it controls switch 950 to select the residual error signal (e) directly for input to synthesizer 870. Conversely, if pitch estimation module 910 determines that the segment represents a voiced sound, switch 950 is controlled to connect to the output of modifier module 930 and IFFT module 940, so that the upper-frequency content is determined there. The output of switch 950 may be directed, for example, to synthesizer 870 for further processing.
The system described in Figs. 8 and 9 can be used to implement two methods of populating the upper frequency band. In the first method, modifier 930 creates harmonic peaks in the upper band by copying parts of the lower band residual signal into the upper band. The harmonic peaks can be aligned by finding the first harmonic peak in the spectrum that rises above the mean of the spectrum, and the last peak within the frequency range corresponding to the telephone band. The section between the first and last peaks can be copied to the position of the last peak. This yields equally spaced peaks in the upper band. Although a single copy may not bring the peaks to the end of the spectrum (8 kHz), the technique can be repeated until the end of the spectrum is reached.
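The copying step of the first method can be sketched as follows. This is a minimal illustration on a plain list of spectral magnitudes, not the patented implementation: the function name `extend_by_copy` and the bin indices `first_peak` and `last_peak` are hypothetical, and the peak search itself is assumed to have been done beforehand.

```python
def extend_by_copy(spectrum, first_peak, last_peak):
    """Copy the section between two peaks upward until the spectrum ends.

    spectrum   : list of magnitudes, one per frequency bin (hypothetical input)
    first_peak : bin index of the first harmonic peak above the spectral mean
    last_peak  : bin index of the last peak inside the telephone band
    """
    out = list(spectrum)
    section = out[first_peak:last_peak]  # first peak included, last peak excluded
    pos = last_peak                      # writing starts at the last peak
    while section and pos < len(out):
        n = min(len(section), len(out) - pos)
        out[pos:pos + n] = section[:n]   # repeat the section, truncating at the end
        pos += n
    return out


# Toy spectrum: peaks every 4 bins in the lower half, empty upper half.
toy = [0, 1, 0, 0, 0, 1, 0, 0] + [0] * 8
extended = extend_by_copy(toy, first_peak=1, last_peak=5)
peak_bins = [i for i, v in enumerate(extended) if v == 1]
```

Because the copied section spans exactly one peak spacing, repeating it keeps the peaks equally spaced, which is the property the text attributes to this method.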
The result of this process is depicted in Fig. 13, which shows substantially equally spaced peaks in the upper band. Because only one synthetic formant is added, in the neighbourhood of 4.6 kHz, there is no formant model that can be excited by the harmonics above approximately 6 kHz. This method does not introduce any artifacts into the final synthesized speech. Depending on the amount of noise added in the AR model calculation, the extended part of the spectrum may need to be weighted with a function that decays with increasing frequency.
In the second method, modifier module 930 uses the pitch period to place the new harmonic peaks in the correct positions. Using the estimated pitch period, the positions of the harmonics in the upper band can be calculated, since the harmonics are assumed to be multiples of the fundamental frequency. This method makes it possible to create the peaks corresponding to the higher harmonics in the upper band.
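The harmonic placement of the second method reduces to simple arithmetic, sketched below. The function name `harmonic_bins` is hypothetical; the defaults (16 kHz sampling, a 320-point transform, an upper band of 4 to 8 kHz) are taken from figures quoted elsewhere in this document and are assumptions about how the pieces would be combined.

```python
def harmonic_bins(pitch_period, fs=16000, n_fft=320, f_lo=4000.0, f_hi=8000.0):
    """Return the FFT bin indices of the pitch harmonics inside [f_lo, f_hi).

    pitch_period : estimated pitch period in samples at sampling rate fs
    """
    f0 = fs / pitch_period      # fundamental frequency in Hz
    bin_width = fs / n_fft      # width of one FFT bin in Hz
    bins = []
    k = 1
    while k * f0 < f_hi:        # harmonics are integer multiples of f0
        if k * f0 >= f_lo:
            bins.append(round(k * f0 / bin_width))
        k += 1
    return bins


# A pitch period of 80 samples at 16 kHz gives a 200 Hz fundamental.
bins = harmonic_bins(80)
```

With a 200 Hz fundamental and 50 Hz bins, the upper band harmonics fall on every fourth bin, from bin 80 (4 kHz) up to bin 156 (7.8 kHz).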
In the Global System for Mobile Communications (GSM), transmission between the mobile phone and the base station is performed in blocks of samples. In GSM, a block consists of 160 samples corresponding to 20 ms of speech. The GSM block size assumes that speech is a quasi-stationary signal. The present invention can be adapted to the GSM sample structure and therefore uses the same block size. A block of samples is called a frame. After upsampling, the frame length becomes 320 samples and is denoted L.
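The frame sizes quoted above follow directly from the sampling rate and the 20 ms block duration. The sketch below checks that arithmetic and shows one simple way of cutting a signal into frames; the function names are hypothetical, and dropping a trailing partial frame is an assumption made only to keep the example short.

```python
def frame_length(fs_hz, frame_ms):
    """Number of samples in one frame of frame_ms milliseconds at fs_hz."""
    return fs_hz * frame_ms // 1000


def split_into_frames(signal, frame_len):
    """Cut a signal into consecutive, non-overlapping frames.

    A trailing partial frame is dropped (an assumption of this sketch).
    """
    last_start = len(signal) - frame_len
    return [signal[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


narrowband_len = frame_length(8000, 20)    # GSM block at 8 kHz
wideband_len = frame_length(16000, 20)     # same 20 ms after upsampling to 16 kHz
frames = split_into_frames(list(range(700)), wideband_len)
```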
The AR model of speech production
One way to model a speech signal is to assume that the signal is generated by passing a white noise source through a filter. If the filter consists only of poles, the process is called an autoregressive (AR) process. Assuming short-time stationarity, the process can be described by the following difference equation.
$$s_i(n) = \sum_{k=1}^{p} a_{ik}\, s_i(n-k) + w_i(n) \qquad (1)$$
Here, $w_i(n)$ is white noise with unit variance, $s_i(n)$ is the output of the process and p is the model order. $s_i(n-k)$ are the previous output values of the process and $a_{ik}$ are the corresponding filter coefficients. The subscript i indicates that the algorithm operates on time-varying blocks of data, where i is the block number. The model assumes that the signal is stationary during the current block. The corresponding system function in the z-domain can be expressed as:
$$H_i(z) = \frac{1}{1 - \sum_{k=1}^{p} a_{ik}\, z^{-k}} = \frac{1}{A_i(z)} \qquad (2)$$
Here, $H_i(z)$ is the transfer function of the system and $A_i(z)$ is called the predictor. The system consists only of poles and does not model speech perfectly, but it has been shown that when the vocal apparatus is approximated as a cascade of lossless tubes, the transfer function matches the AR model. The inverse of the AR model system function, an all-zero function, is:
$$\frac{1}{H_i(z)} = 1 - \sum_{k=1}^{p} a_{ik}\, z^{-k} = A_i(z) \qquad (3)$$
This is called the prediction filter. It is a one-step prediction of $s_i(n+1)$ from the last p+1 values $[s_i(n), \ldots, s_i(n-p+1)]$. Subtracting the predicted signal, denoted $\hat{s}_i(n)$, from the signal $s_i(n)$ yields the prediction error $e_i(n)$, which is sometimes called the residual. Even though this approximation is incomplete, it provides valuable information about the speech signal. The nasal cavity and the nostrils are omitted from the model. If the order of the AR model is chosen sufficiently high, the AR model provides a useful approximation of the speech signal. A narrow band speech signal can be modelled with order eight (8).
The AR model is used to model the speech signal on a short-term basis, that is, over segments of typically 10-30 ms, during which the speech signal is considered stationary. The AR model estimates an all-pole filter whose impulse response $\hat{s}_i(n)$ approximates the speech signal $s_i(n)$. The impulse response $\hat{s}_i(n)$ is the inverse z-transform of the system function H(z). The error e(n) between the model and the speech signal can therefore be defined as
$$e_i(n) = s_i(n) - \hat{s}_i(n) = s_i(n) - \sum_{k=1}^{p} a_{ik}\, s_i(n-k) \qquad (4)$$
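Equation (4) can be evaluated directly, as in the minimal sketch below. The function name `prediction_residual` is hypothetical, and samples before the start of the frame are treated as zero, which is an assumption of this example rather than something the text specifies.

```python
def prediction_residual(s, a):
    """Residual e(n) = s(n) - sum_{k=1..p} a[k-1] * s(n-k).

    s : list of speech samples for one frame
    a : list of p AR coefficients, a[0] multiplying s(n-1)
    Samples with negative index are taken as zero (assumption).
    """
    p = len(a)
    e = []
    for n in range(len(s)):
        predicted = sum(a[k - 1] * s[n - k] for k in range(1, p + 1) if n - k >= 0)
        e.append(s[n] - predicted)
    return e


# Impulse response of a first-order AR model with coefficient 0.5.
signal = [1.0, 0.5, 0.25, 0.125]
residual = prediction_residual(signal, [0.5])
```

When the coefficients match the process that generated the signal, the residual collapses to the excitation, here a single unit impulse.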
There are several methods for finding the coefficients $a_{ik}$ of the AR model. The autocorrelation method yields the coefficients that minimize
$$\epsilon(i) = \sum_{n=0}^{L+p-1} \left| e_i(n) \right|^2 \qquad (5)$$
Here, L is the data length. The summation starts at zero and ends at L+p-1. This assumes that the data are zero outside the L available samples, which is achieved by multiplying $s_i(n)$ by a rectangular window. Minimizing the error function leads to solving a set of linear equations
[Equation (6): Figure C0180428600182]
Here, $r_{s_i}(k)$ denotes the autocorrelation of the windowed data $s_i(n)$ and $a_{ik}$ are the coefficients of the AR model.
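For orientation, the normal equations of the autocorrelation method are conventionally written in the Toeplitz form below. This is the standard textbook form under the sign convention of equation (1). It is given here as an assumption about what equation (6) expresses, not as a transcription of it.

```latex
\begin{bmatrix}
r_{s_i}(0)   & r_{s_i}(1)   & \cdots & r_{s_i}(p-1) \\
r_{s_i}(1)   & r_{s_i}(0)   & \cdots & r_{s_i}(p-2) \\
\vdots       & \vdots       & \ddots & \vdots       \\
r_{s_i}(p-1) & r_{s_i}(p-2) & \cdots & r_{s_i}(0)
\end{bmatrix}
\begin{bmatrix} a_{i1} \\ a_{i2} \\ \vdots \\ a_{ip} \end{bmatrix}
=
\begin{bmatrix} r_{s_i}(1) \\ r_{s_i}(2) \\ \vdots \\ r_{s_i}(p) \end{bmatrix}
```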
Equation 6 can be solved in several different ways. One method is the Levinson-Durbin recursion, which relies on the fact that the coefficient matrix is Toeplitz. A matrix is Toeplitz if the elements along each diagonal have the same value. The method is fast and yields both the filter coefficients $a_{ik}$ and the reflection coefficients. The reflection coefficients are used when the AR model is realized with a lattice structure. When a filter is implemented in a fixed-point environment, which is often the case in mobile phones, its sensitivity to quantization of the filter coefficients should be considered. The lattice structure is insensitive to these effects and is therefore more suitable than a direct-form implementation. A more efficient method for finding the reflection coefficients is the Schur recursion, which yields only the reflection coefficients.
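A compact version of the Levinson-Durbin recursion is sketched below to make the procedure concrete. It follows the standard textbook recursion for the predictor convention of equation (1) and returns both outputs the text mentions. The function names are hypothetical, and the sketch assumes a frame with non-zero energy so that no division by zero occurs.

```python
def autocorr(x, max_lag):
    """Autocorrelation r(0..max_lag) of a rectangularly windowed frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, p):
    """Solve the Toeplitz normal equations for an order-p predictor.

    r : autocorrelation values r[0..p], with r[0] > 0 (assumption)
    Returns (a, k, err): predictor coefficients a[0..p-1] where a[j]
    multiplies s(n-j-1), reflection coefficients k[0..p-1], and the
    final prediction error power.
    """
    a = [0.0] * p
    reflection = []
    err = r[0]
    for m in range(p):
        # Correlation not yet explained by the order-m predictor.
        acc = r[m + 1] - sum(a[j] * r[m - j] for j in range(m))
        k = acc / err
        updated = a[:]
        updated[m] = k
        for j in range(m):
            updated[j] = a[j] - k * a[m - 1 - j]
        a = updated
        err *= 1.0 - k * k
        reflection.append(k)
    return a, reflection, err


# Autocorrelation of a first-order AR process with coefficient 0.5.
coeffs, refl, err = levinson_durbin([1.0, 0.5, 0.25], 2)
```

For this input the second-order fit finds that one coefficient suffices: the second reflection coefficient is zero and the first predictor coefficient equals 0.5.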
Pitch determination
Before the pitch period can be estimated, the nature of the speech segment must be determined. The predictor yields a residual signal. Analyzing the residual signal can reveal whether the speech segment represents voiced or unvoiced sound. If the speech segment represents unvoiced sound, the residual signal resembles noise. Conversely, if the residual signal consists of a train of pulses, it probably represents voiced sound. This classification can be made in many ways, and because the pitch period also needs to be determined, a method that estimates both at the same time is preferred. One such method is based on the short-time normalized autocorrelation function of the residual signal, defined as:
$$R_{ie}(l) = \frac{1}{R_{ie}(0)} \sum_{n=0}^{L-1-l} e_i(n)\, e_i(n+l) \qquad (7)$$
Here, n is the sample number within the frame with index i, and l is the lag. The speech signal is classified as voiced when the maximum of $R_{ie}(l)$ lies within the pitch range and exceeds a threshold. The pitch range of speech is 50-800 Hz, which corresponds to l in the range of 20-320 samples. Fig. 10 shows the short-time autocorrelation function of a voiced frame. A peak is clearly visible around lag 72. Peaks are also visible at multiples of the pitch period.
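The voiced/unvoiced decision based on equation (7) can be sketched as follows. The function names and the threshold value of 0.3 are assumptions made for illustration only, since the text does not state a threshold. The lag range of 20 to 320 samples is the one given above, clipped to the frame length.

```python
def normalized_autocorr(e, lag):
    """Short-time autocorrelation of e at the given lag, normalized by the energy."""
    energy = sum(v * v for v in e)
    if energy == 0.0:
        return 0.0
    return sum(e[n] * e[n + lag] for n in range(len(e) - lag)) / energy


def classify_frame(e, min_lag=20, max_lag=320, threshold=0.3):
    """Return (is_voiced, pitch_lag) for one residual frame.

    The frame must be longer than min_lag. threshold is a hypothetical value.
    pitch_lag is None when the frame is classified as unvoiced.
    """
    max_lag = min(max_lag, len(e) - 1)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda l: normalized_autocorr(e, l))
    voiced = normalized_autocorr(e, best_lag) > threshold
    return voiced, (best_lag if voiced else None)


# Idealized voiced residual: one pulse every 72 samples in a 320-sample frame.
pulse_train = [1.0 if n % 72 == 0 else 0.0 for n in range(320)]
decision = classify_frame(pulse_train)
```

For the idealized pulse train the largest normalized autocorrelation value in the search range occurs at lag 72, matching the peak position described for Fig. 10.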
Another algorithm suitable for analyzing the residual signal is the average magnitude difference function (AMDF). This method has relatively low computational complexity. It also operates on the residual signal. The AMDF is defined as:
$$\mathrm{AMDF}_i(l) = \frac{1}{L} \sum_{n=0}^{L-l} \left| e_i(n) - e_i(n-l) \right| \qquad (8)$$
This function has a local minimum at the lag corresponding to the pitch period. The frame is classified as voiced when the value of the local minimum is below a variable threshold. The method requires a data length of at least two pitch periods to estimate the pitch period. Fig. 11 shows a plot of the AMDF for a voiced frame, in which several local minima can be seen. The pitch period is approximately 72 samples, which means that the fundamental frequency is 222 Hz at a sampling frequency of 16 kHz.
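A minimal AMDF pitch search is sketched below. To stay inside a single frame it compares e(n) with e(n+l) over the indices where both exist, which is a variant of equation (8) chosen for this example. The function names are hypothetical, and limiting the search to half the frame length reflects the stated need for at least two pitch periods of data.

```python
def amdf(e, lag):
    """Average magnitude difference of e at the given lag, within one frame."""
    length = len(e)
    return sum(abs(e[n] - e[n + lag]) for n in range(length - lag)) / length


def amdf_pitch(e, min_lag=20, max_lag=None):
    """Return the lag in [min_lag, max_lag] with the smallest AMDF value."""
    if max_lag is None:
        max_lag = len(e) // 2   # keep at least two periods inside the frame
    return min(range(min_lag, max_lag + 1), key=lambda l: amdf(e, l))


# Idealized voiced residual: one pulse every 72 samples in a 320-sample frame.
pulse_train = [1.0 if n % 72 == 0 else 0.0 for n in range(320)]
lag = amdf_pitch(pulse_train)
f0_hz = 16000 / lag
```

A lag of 72 samples at 16 kHz corresponds to a fundamental frequency of about 222 Hz, the value quoted in the text.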
Adding a synthetic formant
Various methods of adding a synthetic formant have been evaluated. All of these methods model the synthetic formant with a filter. The AR model has a transfer function of the following form:
$$H_i(z) = \frac{1}{1 - \sum_{k=1}^{p} a_{ik}\, z^{-k}} \qquad (9)$$
which can be rewritten as
$$H_i(z) = \frac{1}{1 - \sum_{k=1}^{p-2} a'_{ik}\, z^{-k}} \cdot \frac{1}{1 + a'_{i(p-1)}\, z^{-1} + a'_{ip}\, z^{-2}} = H_{i1}(z) \cdot H_{i2}(z) \qquad (10)$$
Here, $a'_{ik}$ denotes the new AR model coefficients. As illustrated in Fig. 12, one filter can be split into two filters. $H_{i1}(z)$ represents the AR model calculated from the current speech segment and $H_{i2}(z)$ represents the new synthetic formant filter.
In one method, the synthetic formant (or formants) is represented by a complex conjugate pole pair. The transfer function $H_{i2}(z)$ can then be defined by the following equation:
$$H_{i2}(z) = \frac{b_0}{1 - 2\nu \cos(\omega_5)\, z^{-1} + \nu^2 z^{-2}} \qquad (11)$$
Here, $\nu$ is the radius and $\omega_5$ is the angle of the pole. The parameter $b_0$ can be used to set the basic amplification level of the filter. The basic amplification level can be set to 1 to avoid affecting the signal at low frequencies. This is achieved by setting $b_0$ equal to the sum of the coefficients in the denominator of $H_{i2}(z)$. The synthetic formant can be placed with a radius of 0.85 and an angle of 0.58π. The parameter $b_0$ is then 2.1453. If this synthetic formant is added to an AR model estimated on the narrow band speech signal, the resulting transfer function will not contain a prominent synthetic formant peak. Instead, the transfer function lifts the frequencies in the range 2.0-3.4 kHz. The synthetic formant is not prominent because of the large difference in magnitude level within the AR model, typically 60-80 dB. Boosting the modified signal so that the formant reaches an accurate magnitude level reduces the formant bandwidth and amplifies the upper frequencies in the lower band by a few dB. This is illustrated in Fig. 13, where the dashed line 1310 represents the coarse spectral structure before the synthetic formant is added. The solid line 1320 represents the spectral structure after the synthetic formant is added, which produces a small peak at about 4.6 kHz.
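The quoted value of the gain parameter can be checked numerically. The sketch below assumes the second-order all-pole form with denominator 1 - 2ν cos(ω₅) z⁻¹ + ν² z⁻², whose coefficients sum to the stated gain. The function names are hypothetical, and the 16 kHz sampling rate used to convert the pole angle to a frequency is taken from earlier in the description.

```python
import cmath
import math


def formant_gain(radius, angle):
    """b0 equal to the sum of the denominator coefficients (unit gain at DC)."""
    return 1.0 - 2.0 * radius * math.cos(angle) + radius ** 2


def formant_magnitude(omega, radius, angle, b0):
    """Magnitude of the formant filter at normalized frequency omega (rad/sample)."""
    z_inv = cmath.exp(-1j * omega)
    denominator = 1.0 - 2.0 * radius * math.cos(angle) * z_inv + radius ** 2 * z_inv ** 2
    return abs(b0 / denominator)


radius, angle = 0.85, 0.58 * math.pi
b0 = formant_gain(radius, angle)
dc_gain = formant_magnitude(0.0, radius, angle, b0)
peak_gain = formant_magnitude(angle, radius, angle, b0)
formant_hz = angle / math.pi * 16000 / 2   # pole angle expressed in Hz at 16 kHz
```

The computed gain rounds to 2.1453, the DC gain is 1, and the pole angle corresponds to 4640 Hz, consistent with the peak near 4.6 kHz described for Fig. 13.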
Using a formant filter with a single complex conjugate pole pair therefore makes it difficult to get the formant filter to behave like an ordinary formant. If high-pass filtered white noise is added to the speech signal before the AR model parameters are estimated, the AR model will model both the noise and the speech signal. If the order of the AR model is kept unchanged (for example, order eight), some of the formants may be poorly estimated. When the order of the AR model is increased so that it can model the noise in the upper band without interfering with the modelling of the lower band speech signal, a better AR model is obtained. This makes the synthetic formant appear more like an ordinary formant. This is illustrated in Fig. 14, where the dashed line 1410 represents the coarse spectral structure before the synthetic formant is added. The solid line 1420 represents the spectral structure after the synthetic formant is added, which produces a small peak at about 4.6 kHz.
Fig. 15 illustrates the difference between AR models calculated with and without noise added to the speech signal. Referring to Fig. 15, the solid line 1510 represents an AR model of order fourteen of the narrow band speech signal. The dashed line 1520 represents an AR model of order fourteen of the narrow band speech signal supplemented with high-pass filtered noise. The dashed line 1530 represents an AR model of order eight of the narrow band speech signal.
Another way to address this problem is to use a more complex formant filter. The filter can be constructed from several complex conjugate pole pairs and zeros. Using a more complex synthetic formant filter makes it harder to control the radii of the poles in the filter and to meet other requirements on the filter, such as obtaining unity gain at low frequencies.
To control the radii of the poles of the synthetic formant filter, the filter should be kept simple. A linear dependence between the radii of the existing lower-frequency formants and the radius of the new synthetic formant can be determined according to:
$$\nu_1 \alpha_1 + \nu_2 \alpha_2 + \nu_3 \alpha_3 + \nu_4 \alpha_4 = \nu_{w5} \qquad (12)$$
Here, $\nu_1$, $\nu_2$, $\nu_3$ and $\nu_4$ are the formant radii from the AR model of the narrow band speech signal. The parameters $\alpha_m$, m = 1, 2, 3, 4, are the linear coefficients. The parameter $\nu_{w5}$ is the radius of the fifth formant of the AR model of the wideband speech signal. If several AR models are used, equation 12 can be expressed as:
[Equation (13): Figure C0180428600221]
Here, $\nu$ denotes the formant radii, the first index is the AR model number, the second index is the formant number, the third index w in the rightmost vector indicates formants estimated from the wideband speech signal, and k is the number of AR models. This system of equations is overdetermined, and the least-squares solution can be calculated by means of the pseudo-inverse. The solution obtained is then used to calculate the radius of the new synthetic formant as:
$$\hat{\nu}_{i5} = \nu_{i1} \alpha_1 + \nu_{i2} \alpha_2 + \nu_{i3} \alpha_3 + \nu_{i4} \alpha_4 \qquad (14)$$
Here, $\hat{\nu}_{i5}$ is the radius of the new synthetic formant and the $\alpha$ parameters are the solution to the system of equations 13.
The present invention has been described above with reference to specific embodiments, and it will be apparent to those skilled in the art that the invention may be embodied in specific forms other than the preferred embodiments described above. The specific embodiments described above are illustrative and should not be considered restrictive in any way. The scope of the invention is determined by the following claims, and all variations and equivalents that fall within the scope of the claims are intended to be embraced therein.
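The least-squares fit behind equations (12) to (14) can be sketched in a few lines. The example below solves the normal equations by Gaussian elimination, which gives the same result as the pseudo-inverse when the radius matrix has full column rank. That substitution, the function names and the numerical values are all assumptions made for illustration, and the data are synthetic.

```python
def solve_linear(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= factor * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def fit_alpha(radii, target):
    """Least-squares alpha with radii * alpha ~= target (one row per AR model)."""
    cols = len(radii[0])
    ata = [[sum(row[i] * row[j] for row in radii) for j in range(cols)] for i in range(cols)]
    atb = [sum(row[i] * t for row, t in zip(radii, target)) for i in range(cols)]
    return solve_linear(ata, atb)


def predict_radius(narrowband_radii, alpha):
    """Radius of the synthetic formant as a weighted sum of narrow band radii."""
    return sum(r * a for r, a in zip(narrowband_radii, alpha))


# Synthetic training data: six AR models, four narrow band formant radii each.
rows = [[0.9, 0.5, 0.5, 0.5], [0.5, 0.9, 0.5, 0.5], [0.5, 0.5, 0.9, 0.5],
        [0.5, 0.5, 0.5, 0.9], [0.7, 0.7, 0.6, 0.6], [0.6, 0.6, 0.7, 0.7]]
true_alpha = [0.1, 0.2, 0.3, 0.4]
targets = [predict_radius(row, true_alpha) for row in rows]
alpha = fit_alpha(rows, targets)
```

In practice the rows would come from AR models estimated on training speech, with the target radii measured on the corresponding wideband signals. The exact synthetic targets here only demonstrate that the fit recovers the coefficients.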

Claims (17)

1. A method for processing a narrow band speech signal to extend the reproduced frequency band by adding synthetic upper band content, the method comprising the steps of:
performing a spectral analysis on the received narrow band signal to produce an error signal and parameters representing the narrow band speech signal;
determining, based on the error signal, the pitch of a sound segment represented by the narrow band speech signal and whether the sound segment represents voiced or unvoiced sound;
processing information derived from the narrow band speech signal to produce the synthetic upper band signal content;
regenerating the lower band based on the produced parameters representing the narrow band speech signal; and
combining the lower band and the synthetic upper band to produce a wideband speech signal representing the narrow band speech signal.
2. the method for claim 1 is characterized in that: the information content and described synthetic high frequency band content that described narrow band voice signal is provided in the 0-4kHz scope are in the 4-8kHz scope.
3. the step of the information of the method for claim 1, wherein handling from described narrow band voice signal and coming is characterized in that:
Identification and the relevant peak value of reception narrow band signal; With
According in the peak value of tone of determining and identification at least one from information reproduction to a high frequency band that receives in the narrow band signal so that synthetic high frequency band content is provided.
4. the method for claim 1 is characterized in that: automatic forecast of regression model device of spectrum analysis use.
5. the method for claim 1 is characterized in that: sinusoidal curve model of spectrum analysis use.
6. the method for claim 1 is characterized in that: the additional step that optionally amplifies a predetermined frequency scope of described broadband signal.
7. the method for claim 1 is characterized in that: the additional step that described broadband signal is converted to an analog format.
8. method as claimed in claim 7 is characterized in that: the additional step that described broadband signal is amplified.
9. A system for processing a narrow band speech signal to extend the reproduced frequency band by adding synthetic upper band content, the narrow band speech signal being upsampled by a first upsampler, the system being characterized by:
a first parametric spectral analysis module for analyzing the formant structure of the upsampled narrow band speech signal and producing an error signal and parameters describing the narrow band speech signal;
a pitch decision module that determines, based on the error signal, the pitch of a sound segment represented by the narrow band speech signal and whether the sound segment represents voiced or unvoiced sound;
a residual extender and copy module that processes the information obtained from the narrow band speech signal via the parametric spectral analysis module and produces a synthetic upper band signal component; and
a synthesis filter that regenerates the lower band based on the parameters (422) output by the first parametric spectral analysis unit and combines the regenerated lower band with the synthetic upper band content to synthesize a wideband speech signal representing the narrow band speech signal.
10. The system of claim 9, characterized in that the residual extender and copy module comprises:
a first fast Fourier transform module for converting the error signal from the parametric spectral analysis module into the frequency domain;
a peak detector for identifying the harmonic frequencies of the error signal; and
a copy module for copying the peaks identified by the peak detector into the lower frequency range.
11. The system of claim 10, characterized in that the residual extender and copy module further comprises:
a module for producing synthetic unvoiced speech content.
12. The system of claim 11, characterized in that the residual extender and copy module further comprises:
a combiner for combining an output signal from the copy module with an output from the module for producing synthetic unvoiced speech content.
13. The system of claim 12, characterized in that the residual extender and copy module further comprises:
a gain control module for weighting the input signals to the combiner.
14. The system of claim 12, characterized in that the residual extender and copy module further comprises:
a second fast Fourier transform module for converting the combined output signal from the combiner from the frequency domain into the time domain.
15. A system for processing a narrow band speech signal to extend the reproduced frequency band by adding synthetic upper band content, characterized by:
a second upsampler that receives the narrow band speech signal and increases the sampling frequency to produce an output signal having an increased frequency spectrum;
a second parametric spectral analysis module that receives the output signal from the second upsampler and analyzes it to produce parameters associated with a speech model and a residual error signal;
a pitch decision module that receives the residual error signal from the parametric spectral analysis module and produces a pitch signal representing the pitch of the speech signal and an indicator signal indicating whether the speech signal is voiced speech or unvoiced speech; and
a residual extender and copy module that receives and processes the residual error signal and the pitch signal to produce a synthetic upper band signal component.
16. The system of claim 15, further comprising:
a synthesis filter that receives the parameters from the second parametric spectral analysis module and information derived from the residual error signal, and produces a wideband signal corresponding to the narrow band speech signal.
17. The system of claim 15, wherein the indicator signal from the pitch decision module operates a switch connected to an input of the synthesis filter, such that if the indicator signal indicates that the speech signal represents voiced speech, the input of the synthesis filter is connected to the output of the residual extender and copy module, and if the indicator signal indicates that the speech signal represents unvoiced speech, the input of the synthesis filter is connected to the residual error signal output from the parametric spectral analysis module.
CNB018042864A 2000-01-28 2001-01-17 System and method for modifying speech signals Expired - Fee Related CN1185626C (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US17872900P 2000-01-28 2000-01-28
US60/178,729 2000-01-28
US09/754,993 2001-01-05
US09/754,993 US6704711B2 (en) 2000-01-28 2001-01-05 System and method for modifying speech signals

Publications (2)

Publication Number Publication Date
CN1397064A CN1397064A (en) 2003-02-12
CN1185626C true CN1185626C (en) 2005-01-19

Family

ID=26874591

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB018042864A Expired - Fee Related CN1185626C (en) 2000-01-28 2001-01-17 System and method for modifying speech signals

Country Status (7)

Country Link
US (1) US6704711B2 (en)
EP (1) EP1252621B1 (en)
CN (1) CN1185626C (en)
AT (1) ATE253766T1 (en)
AU (1) AU2001230190A1 (en)
DE (1) DE60101148T2 (en)
WO (1) WO2001056021A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105637583A (en) * 2013-09-10 2016-06-01 华为技术有限公司 Adaptive bandwidth extension and apparatus for the same
CN108172239A (en) * 2013-09-26 2018-06-15 华为技术有限公司 The method and device of bandspreading

Families Citing this family (97)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2001260162A1 (en) * 2000-04-06 2001-10-23 Telefonaktiebolaget Lm Ericsson (Publ) Pitch estimation in a speech signal
EP1290681A1 (en) * 2000-05-26 2003-03-12 Cellon France SAS Transmitter for transmitting a signal encoded in a narrow band, and receiver for extending the band of the encoded signal at the receiving end, and corresponding transmission and receiving methods, and system
US6829577B1 (en) * 2000-11-03 2004-12-07 International Business Machines Corporation Generating non-stationary additive noise for addition to synthesized speech
EP1336175A1 (en) * 2000-11-09 2003-08-20 Koninklijke Philips Electronics N.V. Wideband extension of telephone speech for higher perceptual quality
US20020128839A1 (en) * 2001-01-12 2002-09-12 Ulf Lindgren Speech bandwidth extension
US7113522B2 (en) * 2001-01-24 2006-09-26 Qualcomm, Incorporated Enhanced conversion of wideband signals to narrowband signals
US6584437B2 (en) * 2001-06-11 2003-06-24 Nokia Mobile Phones Ltd. Method and apparatus for coding successive pitch periods in speech signal
JP4711099B2 (en) * 2001-06-26 2011-06-29 ソニー株式会社 Transmission device and transmission method, transmission / reception device and transmission / reception method, program, and recording medium
US6941263B2 (en) * 2001-06-29 2005-09-06 Microsoft Corporation Frequency domain postfiltering for quality enhancement of coded speech
JP2003044098A (en) * 2001-07-26 2003-02-14 Nec Corp Device and method for expanding voice band
DE50113277D1 (en) * 2001-09-28 2007-12-27 Nokia Siemens Networks Spa LANGUAGE EQUALIZER AND METHOD FOR ESTIMATING A BROADBAND LANGUAGE SIGNAL ON THE BASIS OF A NARROW-BAND LANGUAGE SIGNAL
US6895375B2 (en) * 2001-10-04 2005-05-17 At&T Corp. System for bandwidth extension of Narrow-band speech
BR0206395A (en) * 2001-11-14 2004-02-10 Matsushita Electric Ind Co Ltd Coding device, decoding device and system thereof
US6934677B2 (en) 2001-12-14 2005-08-23 Microsoft Corporation Quantization matrices based on critical band pattern information for digital audio wherein quantization bands differ from critical bands
US7240001B2 (en) 2001-12-14 2007-07-03 Microsoft Corporation Quality improvement techniques in an audio encoder
GB0202386D0 (en) * 2002-02-01 2002-03-20 Cedar Audio Ltd Method and apparatus for audio signal processing
US20030187663A1 (en) * 2002-03-28 2003-10-02 Truman Michael Mead Broadband frequency translation for high frequency regeneration
US7123948B2 (en) * 2002-07-16 2006-10-17 Nokia Corporation Microphone aided vibrator tuning
US7502743B2 (en) 2002-09-04 2009-03-10 Microsoft Corporation Multi-channel audio encoding and decoding with multi-channel transform selection
US7283585B2 (en) 2002-09-27 2007-10-16 Broadcom Corporation Multiple data rate communication system
US7889783B2 (en) * 2002-12-06 2011-02-15 Broadcom Corporation Multiple data rate communication system
US7519530B2 (en) * 2003-01-09 2009-04-14 Nokia Corporation Audio signal processing
US20040138876A1 (en) * 2003-01-10 2004-07-15 Nokia Corporation Method and apparatus for artificial bandwidth expansion in speech processing
JP4311034B2 (en) * 2003-02-14 2009-08-12 沖電気工業株式会社 Band restoration device and telephone
EP1665228A1 (en) * 2003-08-11 2006-06-07 Faculté Polytechnique de Mons Method for estimating resonance frequencies
US7461003B1 (en) 2003-10-22 2008-12-02 Tellabs Operations, Inc. Methods and apparatus for improving the quality of speech signals
BRPI0415464B1 (en) 2003-10-23 2019-04-24 Panasonic Intellectual Property Management Co., Ltd. SPECTRUM CODING APPARATUS AND METHOD.
CA2454296A1 (en) * 2003-12-29 2005-06-29 Nokia Corporation Method and device for speech enhancement in the presence of background noise
US7460990B2 (en) * 2004-01-23 2008-12-02 Microsoft Corporation Efficient coding of digital media spectral data using wide-sense perceptual similarity
JP4649888B2 (en) * 2004-06-24 2011-03-16 ヤマハ株式会社 Voice effect imparting device and voice effect imparting program
EP1638083B1 (en) * 2004-09-17 2009-04-22 Harman Becker Automotive Systems GmbH Bandwidth extension of bandlimited audio signals
KR100707186B1 (en) * 2005-03-24 2007-04-13 삼성전자주식회사 Audio coding and decoding apparatus and method, and recoding medium thereof
ES2350494T3 (en) * 2005-04-01 2011-01-24 Qualcomm Incorporated PROCEDURE AND APPLIANCES FOR CODING AND DECODING A HIGH BAND PART OF A SPEAKING SIGNAL.
JP5129118B2 (en) * 2005-04-01 2013-01-23 クゥアルコム・インコーポレイテッド Method and apparatus for anti-sparse filtering of bandwidth extended speech prediction excitation signal
US8249861B2 (en) * 2005-04-20 2012-08-21 Qnx Software Systems Limited High frequency compression integration
US8086451B2 (en) * 2005-04-20 2011-12-27 Qnx Software Systems Co. System for improving speech intelligibility through high frequency compression
US7813931B2 (en) * 2005-04-20 2010-10-12 QNX Software Systems, Co. System for improving speech quality and intelligibility with bandwidth compression/expansion
CN101199003B (en) * 2005-04-22 2012-01-11 高通股份有限公司 Systems, methods, and apparatus for gain factor attenuation
US8311840B2 (en) * 2005-06-28 2012-11-13 Qnx Software Systems Limited Frequency extension of harmonic signals
US7546237B2 (en) * 2005-12-23 2009-06-09 Qnx Software Systems (Wavemakers), Inc. Bandwidth extension of narrowband speech
US7953604B2 (en) * 2006-01-20 2011-05-31 Microsoft Corporation Shape and scale parameters for extended-band frequency coding
US8190425B2 (en) * 2006-01-20 2012-05-29 Microsoft Corporation Complex cross-correlation parameters for multi-channel audio
US7831434B2 (en) * 2006-01-20 2010-11-09 Microsoft Corporation Complex-transform channel coding with extended-band frequency coding
WO2007095664A1 (en) * 2006-02-21 2007-08-30 Dynamic Hearing Pty Ltd Method and device for low delay processing
US8392176B2 (en) * 2006-04-10 2013-03-05 Qualcomm Incorporated Processing of excitation in audio coding and decoding
US20080300866A1 (en) * 2006-05-31 2008-12-04 Motorola, Inc. Method and system for creation and use of a wideband vocoder database for bandwidth extension of voice
KR20070115637A (en) * 2006-06-03 2007-12-06 삼성전자주식회사 Method and apparatus for bandwidth extension encoding and decoding
US20090281813A1 (en) * 2006-06-29 2009-11-12 Nxp B.V. Noise synthesis
US9454974B2 (en) 2006-07-31 2016-09-27 Qualcomm Incorporated Systems, methods, and apparatus for gain factor limiting
US8775168B2 (en) * 2006-08-10 2014-07-08 Stmicroelectronics Asia Pacific Pte, Ltd. Yule walker based low-complexity voice activity detector in noise suppression systems
KR101375582B1 (en) * 2006-11-17 2014-03-20 삼성전자주식회사 Method and apparatus for bandwidth extension encoding and decoding
US8639500B2 (en) * 2006-11-17 2014-01-28 Samsung Electronics Co., Ltd. Method, medium, and apparatus with bandwidth extension encoding and/or decoding
US7818168B1 (en) * 2006-12-01 2010-10-19 The United States Of America As Represented By The Director, National Security Agency Method of measuring degree of enhancement to voice signal
US8005671B2 (en) 2006-12-04 2011-08-23 Qualcomm Incorporated Systems and methods for dynamic normalization to reduce loss in precision for low-level signals
KR101379263B1 (en) 2007-01-12 2014-03-28 삼성전자주식회사 Method and apparatus for decoding bandwidth extension
US7912729B2 (en) * 2007-02-23 2011-03-22 Qnx Software Systems Co. High-frequency bandwidth extension in the time domain
EP1970900A1 (en) * 2007-03-14 2008-09-17 Harman Becker Automotive Systems GmbH Method and apparatus for providing a codebook for bandwidth extension of an acoustic signal
GB0705324D0 (en) * 2007-03-20 2007-04-25 Skype Ltd Method of transmitting data in a communication system
US7885819B2 (en) 2007-06-29 2011-02-08 Microsoft Corporation Bitstream syntax for multi-process audio decoding
US8041577B2 (en) * 2007-08-13 2011-10-18 Mitsubishi Electric Research Laboratories, Inc. Method for expanding audio signal bandwidth
US20090198500A1 (en) * 2007-08-24 2009-08-06 Qualcomm Incorporated Temporal masking in audio coding based on spectral dynamics in frequency sub-bands
US8428957B2 (en) * 2007-08-24 2013-04-23 Qualcomm Incorporated Spectral noise shaping in audio coding based on spectral dynamics in frequency sub-bands
BRPI0818927A2 (en) * 2007-11-02 2015-06-16 Huawei Tech Co Ltd Method and apparatus for audio decoding
EP2220646A1 (en) * 2007-11-06 2010-08-25 Nokia Corporation Audio coding apparatus and method thereof
WO2009059633A1 (en) * 2007-11-06 2009-05-14 Nokia Corporation An encoder
WO2009086174A1 (en) 2007-12-21 2009-07-09 Srs Labs, Inc. System for adjusting perceived loudness of audio signals
US9159325B2 (en) * 2007-12-31 2015-10-13 Adobe Systems Incorporated Pitch shifting frequencies
US20090201983A1 (en) * 2008-02-07 2009-08-13 Motorola, Inc. Method and apparatus for estimating high-band energy in a bandwidth extension system
US20090314154A1 (en) * 2008-06-20 2009-12-24 Microsoft Corporation Game data generation based on user provided song
CN101620854B (en) * 2008-06-30 2012-04-04 华为技术有限公司 Method, system and device for frequency band expansion
JP4818335B2 (en) * 2008-08-29 2011-11-16 株式会社東芝 Signal band expander
CN101859578B (en) * 2009-04-08 2011-08-31 陈伟江 Method for manufacturing and processing voice products
EP2273493B1 (en) * 2009-06-29 2012-12-19 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Bandwidth extension encoding and decoding
US8538042B2 (en) * 2009-08-11 2013-09-17 Dts Llc System for increasing perceived loudness of speakers
US8204742B2 (en) * 2009-09-14 2012-06-19 Srs Labs, Inc. System for processing an audio signal to enhance speech intelligibility
WO2011035813A1 (en) * 2009-09-25 2011-03-31 Nokia Corporation Audio coding
US8484020B2 (en) 2009-10-23 2013-07-09 Qualcomm Incorporated Determining an upperband signal from a narrowband signal
CN102610231B (en) * 2011-01-24 2013-10-09 华为技术有限公司 Method and device for expanding bandwidth
WO2013019562A2 (en) * 2011-07-29 2013-02-07 Dts Llc. Adaptive voice intelligibility processor
EP2774145B1 (en) * 2011-11-03 2020-06-17 VoiceAge EVS LLC Improving non-speech content for low rate celp decoder
US9312829B2 (en) 2012-04-12 2016-04-12 Dts Llc System for adjusting loudness of audio signals in real time
CN103426441B (en) 2012-05-18 2016-03-02 华为技术有限公司 Detect the method and apparatus of the correctness of pitch period
KR102174270B1 (en) * 2012-10-12 2020-11-04 삼성전자주식회사 Voice converting apparatus and Method for converting user voice thereof
US9564119B2 (en) * 2012-10-12 2017-02-07 Samsung Electronics Co., Ltd. Voice converting apparatus and method for converting user voice thereof
MX361866B (en) 2012-11-13 2018-12-18 Samsung Electronics Co Ltd Method and apparatus for determining encoding mode, method and apparatus for encoding audio signals, and method and apparatus for decoding audio signals.
CN103594091B (en) * 2013-11-15 2017-06-30 努比亚技术有限公司 Mobile terminal and audio signal processing method thereof
US9524720B2 (en) 2013-12-15 2016-12-20 Qualcomm Incorporated Systems and methods of blind bandwidth extension
US20150215668A1 (en) * 2014-01-29 2015-07-30 Silveredge, Inc. Method and System for cross-device targeting of users
FR3017484A1 (en) * 2014-02-07 2015-08-14 Orange ENHANCED FREQUENCY BAND EXTENSION IN AUDIO FREQUENCY SIGNAL DECODER
KR102033603B1 (en) 2014-11-07 2019-10-17 삼성전자주식회사 Method and apparatus for restoring audio signal
US10847170B2 (en) 2015-06-18 2020-11-24 Qualcomm Incorporated Device and method for generating a high-band signal from non-linearly processed sub-ranges
US9837089B2 (en) * 2015-06-18 2017-12-05 Qualcomm Incorporated High-band signal generation
JP6611042B2 (en) * 2015-12-02 2019-11-27 パナソニックIpマネジメント株式会社 Audio signal decoding apparatus and audio signal decoding method
EP3398355A1 (en) * 2015-12-29 2018-11-07 Otis Elevator Company Acoustic elevator communication system and method of adjusting such a system
CN106997767A (en) * 2017-03-24 2017-08-01 百度在线网络技术(北京)有限公司 Artificial-intelligence-based speech processing method and device
JP6903242B2 (en) * 2019-01-31 2021-07-14 三菱電機株式会社 Frequency band expansion device, frequency band expansion method, and frequency band expansion program
CN113066503B (en) * 2021-03-15 2023-12-08 广州酷狗计算机科技有限公司 Audio frame adjusting method, device, equipment and readable storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE3683767D1 (en) * 1986-04-30 1992-03-12 Ibm VOICE CODING METHOD AND DEVICE FOR CARRYING OUT THIS METHOD.
US6208959B1 (en) 1997-12-15 2001-03-27 Telefonaktibolaget Lm Ericsson (Publ) Mapping of digital data symbols onto one or more formant frequencies for transmission over a coded voice channel
EP0945852A1 (en) 1998-03-25 1999-09-29 BRITISH TELECOMMUNICATIONS public limited company Speech synthesis
GB2351889B (en) 1999-07-06 2003-12-17 Ericsson Telefon Ab L M Speech band expansion

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105637583A (en) * 2013-09-10 2016-06-01 华为技术有限公司 Adaptive bandwidth extension and apparatus for the same
CN105637583B (en) * 2013-09-10 2017-08-29 华为技术有限公司 Adaptive bandwidth extension method and apparatus
CN107393552A (en) * 2013-09-10 2017-11-24 华为技术有限公司 Adaptive bandwidth extension method and apparatus
US10249313B2 (en) 2013-09-10 2019-04-02 Huawei Technologies Co., Ltd. Adaptive bandwidth extension and apparatus for the same
CN108172239A (en) * 2013-09-26 2018-06-15 华为技术有限公司 Method and device for expanding frequency band
CN108172239B (en) * 2013-09-26 2021-01-12 华为技术有限公司 Method and device for expanding frequency band

Also Published As

Publication number Publication date
ATE253766T1 (en) 2003-11-15
AU2001230190A1 (en) 2001-08-07
US20010044722A1 (en) 2001-11-22
EP1252621A1 (en) 2002-10-30
DE60101148D1 (en) 2003-12-11
EP1252621B1 (en) 2003-11-05
US6704711B2 (en) 2004-03-09
WO2001056021A1 (en) 2001-08-02
CN1397064A (en) 2003-02-12
DE60101148T2 (en) 2004-05-27

Similar Documents

Publication Publication Date Title
CN1185626C (en) System and method for modifying speech signals
CN1750124B (en) Bandwidth extension of band limited audio signals
CN103026407B (en) Bandwidth extender
CN108447495B (en) Deep learning voice enhancement method based on comprehensive feature set
US6691090B1 (en) Speech recognition system including dimensionality reduction of baseband frequency signals
CN1735926A (en) Method and apparatus for artificial bandwidth expansion in speech processing
CN1265217A (en) Method and apparatus for speech enhancement in a speech communication system
CN1225736A (en) Voice activity detector
CN1335980A (en) Wide band speech synthesis by means of a mapping matrix
CN106409313A (en) Audio signal classification method and apparatus
WO2005117517A2 (en) Neuroevolution-based artificial bandwidth expansion of telephone band speech
CN112133277B (en) Sample generation method and device
CN1193344C (en) Speech decoder and method for decoding speech
CN106997765A (en) Quantitative characterization method for voice timbre
KR100216018B1 (en) Method and apparatus for encoding and decoding of background sounds
CN109036470A (en) Speech differentiation method, apparatus, computer equipment and storage medium
JP2006521576A (en) Method for analyzing fundamental frequency information, and voice conversion method and system implementing this analysis method
CN111326170A (en) Method and device for converting whispered speech into normal speech by combining time-frequency domain dilated convolution
CN108010533A (en) Automatic identification method and device for voice data bit rate
Vlaj et al. Voice activity detection algorithm using nonlinear spectral weights, hangover and hangbefore criteria
CN1625681A (en) Generating LSF vectors
GB2336978A (en) Improving speech intelligibility in presence of noise
CN111243608A (en) Low-rate speech coding method based on a deep autoencoder
CN110580920A (en) Method and system for judging unvoiced and voiced sounds of a vocoder sub-band
Kim et al. A voice activity detection algorithm for wireless communication systems with dynamically varying background noise

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C19 Lapse of patent right due to non-payment of the annual fee
CF01 Termination of patent right due to non-payment of annual fee