WO2013127364A1 - Speech and audio signal processing method and apparatus - Google Patents

Speech and audio signal processing method and apparatus

Info

Publication number
WO2013127364A1
WO2013127364A1 (PCT/CN2013/072075 · CN2013072075W)
Authority
WO
WIPO (PCT)
Prior art keywords
signal
time domain
parameter
current frame
frequency band
Prior art date
Application number
PCT/CN2013/072075
Other languages
English (en)
French (fr)
Inventor
刘泽新
苗磊
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to JP2014559077A priority Critical patent/JP6010141B2/ja
Priority to EP18199234.8A priority patent/EP3534365B1/en
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to ES13754564.6T priority patent/ES2629135T3/es
Priority to MX2014010376A priority patent/MX345604B/es
Priority to KR1020177002148A priority patent/KR101844199B1/ko
Priority to KR1020167028242A priority patent/KR101702281B1/ko
Priority to CA2865533A priority patent/CA2865533C/en
Priority to MX2017001662A priority patent/MX364202B/es
Priority to SG11201404954WA priority patent/SG11201404954WA/en
Priority to EP16187948.1A priority patent/EP3193331B1/en
Priority to RU2014139605/08A priority patent/RU2585987C2/ru
Priority to EP13754564.6A priority patent/EP2821993B1/en
Priority to BR112014021407-7A priority patent/BR112014021407B1/pt
Priority to KR1020147025655A priority patent/KR101667865B1/ko
Priority to PL18199234T priority patent/PL3534365T3/pl
Priority to IN1739KON2014 priority patent/IN2014KN01739A/en
Publication of WO2013127364A1 publication Critical patent/WO2013127364A1/zh
Priority to ZA2014/06248A priority patent/ZA201406248B/en
Priority to US14/470,559 priority patent/US9691396B2/en
Priority to US15/616,188 priority patent/US10013987B2/en
Priority to US16/021,621 priority patent/US10360917B2/en
Priority to US16/457,165 priority patent/US10559313B2/en

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L21/0224Processing in the time domain
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16Vocoder architecture
    • G10L19/18Vocoders using multiple modes
    • G10L19/24Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/083Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being an excitation gain
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
    • G10L19/125Pitch excitation, e.g. pitch synchronous innovation CELP [PSI-CELP]
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L21/0232Processing in the frequency domain
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/038Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0204Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition

Definitions

  • the present invention relates to the field of digital signal processing technologies, and more particularly to a speech and audio signal processing method and apparatus. Background technique
  • voice, image, audio, and video transmissions have a wide range of application requirements, such as mobile phone calls, audio and video conferencing, broadcast television, and multimedia entertainment.
  • the audio is digitized and passed from one terminal to another via an audio communication network, where the terminal can be a cell phone, a digital telephone terminal or any other type of audio terminal, such as a VOIP phone or ISDN phone, computer, cable communication phone.
  • the speech and audio signals are compressed and processed at the transmitting end and transmitted to the receiving end, and the receiving end recovers the speech and audio signals by the decompression process and plays them.
  • the network truncates the bitstream transmitted from the encoder at different bit rates, and the decoder decodes the truncated bitstream into speech and audio signals of different bandwidths, so that the output speech and audio signal switches between different bandwidths.
  • a speech and audio signal processing method includes: obtaining an initial high frequency band signal corresponding to a current frame speech and audio signal when a speech audio signal is switched from a wideband signal to a narrowband signal;
  • a narrow band time domain signal of the current frame and the modified high band time domain signal are synthesized and output.
  • a speech signal processing method includes:
  • a speech signal processing apparatus includes:
  • a prediction unit configured to obtain an initial high-band signal corresponding to the current frame speech and audio signal when the speech and audio signal is switched from a wideband signal to a narrowband signal;
  • a parameter obtaining unit configured to obtain a time domain global gain parameter of the high-band signal according to a spectral tilt parameter of the current frame speech and audio signal and a correlation between the current frame narrowband signal and the historical frame narrowband signal; a correction unit configured to correct the initial high-band signal with the predicted global gain parameter to obtain a modified high-band time domain signal;
  • a speech and audio signal processing apparatus includes: an obtaining unit, configured to obtain an initial high frequency band signal corresponding to a current frame speech and audio signal when bandwidth switching occurs of the speech and audio signal;
  • a parameter obtaining unit configured to obtain a time domain global gain parameter corresponding to the initial high frequency band signal
  • a weighting processing unit configured to perform weighting processing on the energy ratio value and the time domain global gain parameter, and obtain the weighted value as a predicted a global gain parameter
  • the energy ratio is a ratio of a time domain signal energy of the historical frame high frequency band to an initial high frequency band signal energy of the current frame
  • a correction unit configured to correct the initial high-band signal by using a predicted global gain parameter to obtain a modified high-band time domain signal
  • a synthesizing unit configured to synthesize and output the narrowband time domain signal of the current frame and the modified high frequency band time domain signal.
  • the embodiments of the invention correct the high-band signal when switching between wideband and narrowband, so that the high-band signal transitions smoothly between wideband and narrowband, effectively removing the hearing discomfort caused by the switch; at the same time, because the bandwidth switching algorithm and the codec algorithm of the high-band signal before switching operate in the same signal domain, no extra delay is added and the algorithm remains simple, while the performance of the output signal is also guaranteed.
  • FIG. 1 is a schematic flowchart of an embodiment of a speech and audio signal processing method according to the present invention
  • FIG. 2 is a schematic flowchart of another embodiment of a speech and audio signal processing method according to the present invention
  • FIG. 3 is a schematic diagram of speech and audio signal processing provided by the present invention.
  • FIG. 4 is a schematic flowchart diagram of another embodiment of a speech and audio signal processing method according to the present invention
  • FIG. 5 is a schematic structural diagram of an embodiment of a speech and audio signal processing apparatus according to the present invention
  • FIG. 7 is a schematic structural diagram of an embodiment of a parameter obtaining unit provided by the present invention;
  • FIG. 8 is a schematic structural diagram of an embodiment of a global gain parameter obtaining unit provided by the present invention
  • FIG. 9 is a schematic structural diagram of an embodiment of an acquiring unit provided by the present invention.
  • FIG. 10 is a schematic structural diagram of another embodiment of a speech and audio signal processing apparatus according to the present invention. detailed description
  • audio codecs and video codecs are widely used in various electronic devices, such as: mobile phones, wireless devices, personal data assistants (PDAs), handheld or portable computers, GPS receivers/navigators. , cameras, audio/video players, camcorders, video recorders, surveillance equipment, etc.
  • an electronic device includes an audio encoder or an audio decoder, and the audio encoder or decoder may be directly implemented by a digital circuit or a chip such as a DSP (digital signal processor), or may be executed by a software code driven processor in the software code. The process is implemented.
  • the bandwidth of the speech and audio signal changes frequently during transmission: a narrowband speech and audio signal may be switched to a wideband speech and audio signal, and a wideband speech and audio signal may be switched to a narrowband speech and audio signal.
  • the process of switching such speech audio signals between high and low frequency bands is called bandwidth switching, and the bandwidth switching includes switching from narrow band signals to wide band signals and switching from wide band to narrow band signals.
  • the narrowband signal mentioned in the present invention is a speech signal obtained by up-sampling and low-pass filtering that contains only a low-band component, its high-band component being empty, whereas a wideband speech and audio signal contains both a low-band component and a high-band component.
  • narrowband and wideband are relative terms: relative to a narrowband signal, a wideband signal is a wideband signal; relative to a wideband signal, an ultra-wideband signal is a wideband signal.
  • the narrowband signal is a speech audio signal with a sampling rate of 8 kHz
  • the wideband signal is a speech audio signal with a sampling rate of 16 kHz
  • the ultra-wideband is a speech audio signal with a sampling rate of 32 kHz.
  • when the codec algorithm of the high-band signal before switching selects between time domain and frequency domain codec algorithms according to the signal type, or when the coding algorithm of the high-band signal before switching is a time domain coding algorithm, the switching algorithm is kept in the same signal domain as the high-band codec algorithm used before switching: if the high-band signal before switching uses a time domain codec algorithm, the subsequent switching algorithm uses a time domain switching algorithm; if it uses a frequency domain codec algorithm, the subsequent switching algorithm uses a frequency domain switching algorithm.
  • the prior art does not, when a time domain band extension algorithm is used before switching, apply a similar time domain switching technique after switching.
  • Speech audio coding is generally handled in units of frames.
  • the currently input audio frame to be processed is the current frame speech audio signal;
  • the current frame speech audio signal includes the narrow band signal and the high band signal, that is, the current frame narrow band signal and the current frame high band signal.
  • the audio signal of any frame before the current frame audio signal is a historical frame audio signal, and also includes a historical frame narrowband signal and a historical frame high frequency band signal;
  • the speech and audio signal one frame before the current frame speech and audio signal is the previous frame speech and audio signal.
  • an embodiment of a speech audio signal processing method of the present invention includes:
  • the current frame speech audio signal is composed of the current frame narrow band signal and the current frame high band time domain signal.
  • Bandwidth switching includes switching from a narrowband signal to a wideband signal and switching from a wideband signal to a narrowband signal; for switching from a narrowband signal to a wideband signal, the current frame speech and audio signal is the current frame wideband signal, which includes a narrowband signal and a high-band signal, and the initial high-band signal of the current frame speech and audio signal is a real signal that can be obtained directly from the current frame speech and audio signal; for switching from a wideband signal to a narrowband signal, the current frame speech and audio signal is the current frame narrowband signal, the current frame high-band time domain signal is empty, and the initial high-band signal of the current frame speech and audio signal is a prediction signal, so the high-band signal corresponding to the current frame narrowband signal needs to be predicted and used as the initial high-band signal.
  • the time-domain global gain parameters of the high-band signals can be obtained by decoding; for the switching of the wide-band signals to the narrow-band signals, the time-domain global gain parameters of the high-band signals can be based on the current Frame signal acquisition: obtaining a time domain global gain parameter of the high frequency band signal according to a spectral tilt parameter of the narrow band signal and a correlation of the current frame narrow band signal with the historical frame narrow band signal.
  • the energy ratio is a ratio of a high-band time domain signal energy of the historical frame speech audio signal to an initial high-band signal energy of the current frame speech audio signal;
  • the correction refers to multiplication of the signal by multiplying the predicted global gain parameter by the initial high-band signal.
  • in another embodiment, the time domain envelope parameter and the time domain global gain parameter corresponding to the initial high-band signal are obtained in step S102, and in step S104 the initial high-band signal is corrected by using the time domain envelope parameter and the predicted global gain parameter to obtain the modified high-band time domain signal; that is, the time domain envelope parameter and the predicted time domain global gain parameter are multiplied with the predicted high-band signal to obtain the high-band time domain signal.
  • for switching from a narrowband signal to a wideband signal, the time domain envelope parameter of the high-band signal can be obtained by decoding; for switching from a wideband signal to a narrowband signal, the time domain envelope parameter of the high-band signal can be obtained from the current frame signal: a preset series of values or the historical frame high-band time domain envelope parameters can be used as the high-band time domain envelope parameters of the current frame speech and audio signal.
  • S105 Synthesize a narrowband time domain signal of the current frame and the modified high frequency band time domain signal and output.
  • the above embodiment corrects the high-band signal when switching between wideband and narrowband, so that the high-band signal transitions smoothly between wideband and narrowband, effectively removing the hearing discomfort caused by switching between wideband and narrowband;
  • at the same time, because the bandwidth switching algorithm and the codec algorithm of the high-band signal before switching operate in the same signal domain, no extra delay is added and the algorithm remains simple, while the performance of the output signal is also guaranteed.
  • FIG. 2 another embodiment of the speech audio signal processing method of the present invention includes:
  • the step of predicting the predicted high-band signal corresponding to the current frame narrowband signal comprises: predicting the high-band excitation signal of the current frame speech and audio signal according to the current frame narrowband signal; predicting the LPC (Linear Predictive Coding) coefficients of the high-band signal of the current frame speech and audio signal; and synthesizing the predicted high-band excitation signal and the LPC coefficients to obtain the predicted high-band signal syn_tmp.
  • parameters such as the pitch period, the algebraic codebook, and the gain may be extracted from the narrowband signal, and the high-band excitation signal is predicted through resampling and filtering;
  • the high-band excitation signal can also be predicted by up-sampling and low-pass filtering the narrowband time domain signal or the narrowband time domain excitation signal and then taking the absolute value or the square.
  • the high-band LPC coefficient of the historical frame or a preset series of values can be used as the current frame LPC coefficient; different prediction modes can also be used for different signal types.
  • a predetermined set of values can be used as the high-band time domain envelope parameter of the current frame.
  • the narrowband signals can be roughly divided into several categories, each of which is preset with a series of values, and a set of pre-set time domain envelope parameters is selected according to the type of the narrowband signal of the current frame;
  • a set of time domain envelope values may simply be preset; for example, if the number of time domain envelopes is M, the preset values may be M values of 0.3536.
  • the acquisition of the time domain envelope parameter is an optional step and is not required.
  • the method includes the following steps:
  • S2021 Dividing the current frame speech and audio signal into a first type signal or a second type signal according to a spectral tilt parameter of the current frame speech audio signal and a correlation between a current frame narrow band signal and a historical frame narrow band signal;
  • the first type of signal is a fricative signal and the second type of signal is a non-fricative signal: when the spectral tilt parameter tilt > 5 and the correlation parameter cor is less than a given value, the narrowband signal is classified as a fricative, and otherwise as a non-fricative.
  • the correlation parameter cor between the current frame narrowband signal and the historical frame narrowband signal may be determined from the energy relationship of the signals in a given frequency band, or from the energy relationship of several identical frequency bands, or computed with the autocorrelation or cross-correlation formula of the time domain signal or the time domain excitation signal.
  • if the current frame speech and audio signal is the first type of signal, the spectral tilt parameter is limited to be less than or equal to a first predetermined value to obtain a spectral tilt parameter limit value, and the spectral tilt parameter limit value is used as the time domain global gain parameter of the high-band signal. That is, when the spectral tilt parameter of the current frame speech and audio signal is less than or equal to the first predetermined value, the original value of the spectral tilt parameter is kept as the spectral tilt parameter limit value; when it is greater than the first predetermined value, the first predetermined value is taken as the spectral tilt parameter limit value.
  • the time domain global gain parameter gain' is obtained by the following formula: gain' = tilt when tilt is less than or equal to the first predetermined value, and gain' equals the first predetermined value when tilt is greater than it, where tilt is the spectral tilt parameter.
  • if the current frame speech and audio signal is the second type of signal, the spectral tilt parameter is limited to the first interval value [a, b]: when the spectral tilt parameter belongs to the interval, its original value is kept as the spectral tilt parameter limit value; when it is greater than the upper limit of the first interval value, the upper limit is taken as the spectral tilt parameter limit value; when it is smaller than the lower limit of the first interval value, the lower limit is taken as the spectral tilt parameter limit value.
  • the time domain global gain parameter gain' is obtained by the following formula: gain' = a when tilt < a, gain' = tilt when a <= tilt <= b, and gain' = b when tilt > b, where tilt is the spectral tilt parameter and [a, b] is the first interval value.
  • in one embodiment, the spectral tilt parameter tilt of the narrowband signal and the correlation parameter cor between the current frame narrowband signal and the historical frame narrowband signal are obtained; according to tilt and cor, the current frame signal is divided into two types, fricative and non-fricative: when the spectral tilt parameter tilt > 5 and the correlation parameter cor is less than a given value, the narrowband signal is classified as a fricative, and otherwise as a non-fricative;
  • S203 Weight the energy ratio and the time domain global gain parameter, and use the weighted value as the predicted global gain parameter; wherein the energy ratio is the ratio of the high-band time domain signal energy of the historical frame speech and audio signal to the initial high-band signal energy of the current frame speech and audio signal;
  • the high-band time domain signal is obtained by multiplying the predicted high-band signal by the time domain envelope parameter and the predicted time domain global gain parameter.
  • the time domain envelope parameter is optional.
  • the predicted high frequency band signal may be corrected by using the predicted global gain parameter to obtain the modified high frequency band.
  • the domain signal; that is, the predicted high frequency band signal is multiplied by the predicted high frequency band signal to obtain a modified high frequency band time domain signal.
  • S205 Synthesize a narrowband time domain signal of the current frame and the modified high frequency band time domain signal and output.
  • the energy Esyn of the high-band time domain signal syn is used to predict the time domain global gain parameter of the next frame, that is, the value of Esyn is assigned to Esyn(-1).
  • the above embodiment corrects the high band of the narrowband signal that follows a wideband signal, so that the high-band portion transitions smoothly between wideband and narrowband, effectively removing the hearing discomfort caused by switching between wideband and narrowband.
  • At the same time, because the frame at the moment of switching is processed accordingly, the problems that would otherwise occur when parameters and states are updated are indirectly removed.
  • By keeping the bandwidth switching algorithm and the codec algorithm of the high-band signal before switching in the same signal domain, no extra delay is added, the algorithm remains simple, and the performance of the output signal is guaranteed.
  • another embodiment of the speech audio signal processing method of the present invention includes:
  • S302 Obtain a time domain envelope parameter and a time domain global gain parameter corresponding to the high frequency band signal; the time domain envelope parameter and the time domain global gain parameter may be directly obtained from a current frame high frequency band signal. Among them, the acquisition of the time domain envelope parameter is an optional step.
  • S303 weighting the energy ratio value and the time domain global gain parameter, and obtaining the weighted value as the predicted global gain parameter; wherein, the energy ratio is the historical frame speech audio signal high frequency band time domain signal energy and the current frame speech audio signal The ratio of the initial high-band signal energy. ;
  • each parameter of the high frequency band signal can be obtained by decoding.
  • the weighting factor alfa of the energy ratio corresponding to the previous frame speech and audio signal is attenuated by a certain step, and the attenuated value is used as the weighting factor of the energy ratio corresponding to the current audio frame; alfa is attenuated frame by frame until it reaches 0.
  • alfa is attenuated frame by frame by a certain step while the narrowband signals of consecutive frames remain correlated; when the narrowband signals of consecutive frames have no correlation, alfa is directly attenuated to 0, that is, the current decoding result is kept and no weighting or correction is performed.
  • S304 Correct the high-band signal by using a time domain envelope parameter and a predicted global gain parameter to obtain a modified high-band time domain signal;
  • the modified time domain envelope parameter and the predicted time domain global gain parameter are multiplied by the high frequency band signal to obtain a modified high frequency band time domain signal.
  • the time domain envelope parameter is optional, and when only the time domain time domain global gain parameter is included, the high-band signal can be corrected by using the predicted global gain parameter to obtain a modified high-band time domain signal; that is, the corrected high-band signal is obtained by multiplying the predicted global gain parameter by the high-band signal.
  • S305 Synthesize a narrowband time domain signal of the current frame and the modified high frequency band time domain signal and output.
  • the correction of the high frequency band of the wideband signal after the narrowband signal enables a smooth transition of the high frequency band between the wideband and the narrowband, effectively removing the hearing discomfort caused by the switching between the wideband and the narrowband.
  • Sense At the same time, due to the corresponding processing of the frame at the time of switching, the problems occurring in the parameter and status update are indirectly removed.
  • the bandwidth switching algorithm and the encoding and decoding algorithm of the high-band signal before switching in the same signal domain it is ensured that the performance of the output signal is ensured without adding extra delay and the algorithm is simple.
  • another embodiment of the speech audio signal processing method of the present invention includes:
  • the wideband signal is switched to the narrowband, that is, the previous frame is a wideband signal, and the current frame is a narrowband signal.
  • the step of predicting the initial high-band signal corresponding to the current frame narrowband signal comprises: predicting the high-band excitation signal of the current frame speech and audio signal according to the current frame narrowband signal; predicting the LPC coefficients of the high-band signal of the current frame speech and audio signal; and synthesizing the predicted high-band excitation signal and the LPC coefficients to obtain the initial high-band signal syn_tmp.
  • parameters such as the pitch period, the algebraic codebook, and the gain may be extracted from the narrowband signal, and the high-band excitation signal is predicted through resampling and filtering;
  • the high-band excitation signal can also be predicted by up-sampling and low-pass filtering the narrowband time domain signal or the narrowband time domain excitation signal and then taking the absolute value or the square.
  • the high-band LPC coefficient of the historical frame or a preset series of values can be used as the current frame LPC coefficient; different prediction modes can also be used for different signal types.
  • S402 Obtain a time domain global gain parameter of the high-band signal according to a spectral tilt parameter of the current frame speech and audio signal and a correlation between the current frame narrowband signal and the historical frame narrowband signal;
  • S2021 Divide the current frame speech and audio signal into a first type signal or a second type signal according to the spectral tilt parameter of the current frame speech and audio signal and the correlation between the current frame narrowband signal and the historical frame narrowband signal;
  • the first type of signal is a fricative signal
  • the second type of signal is a non-frictional signal.
  • the narrow band signal when the tilt parameter tilt > 5 and the correlation parameter cor is less than a given value, the narrow band signal is divided into fricatives, and the other is non-friction.
  • the calculation of the correlation size parameter cor of the current frame narrowband signal and the historical frame narrowband signal may be determined by the magnitude relationship of the energy of the same frequency band signal, or may be determined by the energy relationship of several identical frequency bands, or Calculated by the autocorrelation or cross-correlation formula of the time domain signal or the time domain excitation signal.
  • if the current frame speech and audio signal is the first type of signal, the spectral tilt parameter is limited to be less than or equal to the first predetermined value to obtain the spectral tilt parameter limit value, and the spectral tilt parameter limit value is used as the time domain global gain parameter of the high-band signal. That is, when the spectral tilt parameter of the current frame speech and audio signal is less than or equal to the first predetermined value, its original value is kept as the spectral tilt parameter limit value; when it is greater than the first predetermined value, the first predetermined value is taken as the spectral tilt parameter limit value.
  • the time domain global gain parameter gain' is obtained by the following formula: gain' = tilt when tilt is less than or equal to the first predetermined value, and gain' equals the first predetermined value when tilt is greater than it, where tilt is the spectral tilt parameter.
  • if the current frame speech and audio signal is the second type of signal, the spectral tilt parameter is limited to the first interval value: when it is greater than the upper limit of the first interval value, the upper limit is taken as the spectral tilt parameter limit value; when it is smaller than the lower limit of the first interval value, the lower limit is taken as the spectral tilt parameter limit value.
  • the time domain global gain parameter gain' is obtained by the following formula: gain' = a when tilt < a, gain' = tilt when a <= tilt <= b, and gain' = b when tilt > b, where tilt is the spectral tilt parameter and [a, b] is the first interval value.
  • in one embodiment, the spectral tilt parameter tilt of the narrowband signal and the correlation parameter cor between the current frame narrowband signal and the historical frame narrowband signal are obtained; according to tilt and cor, the current frame signal is divided into two types, fricative and non-fricative.
  • for a fricative, the spectral tilt parameter can be any value greater than 5; for a non-fricative, it can be any value less than or equal to 5, and may also be greater than 5; to ensure that the spectral tilt parameter tilt can be used as the predicted global gain parameter, its range is limited as described above (tilt <= 8.0 for fricatives, 0.5 <= tilt <= 1.0 for non-fricatives).
  • the modified high frequency band time domain signal is obtained by multiplying the initial high frequency band signal by the time domain global gain parameter.
  • step S403 may include:
  • the initial high-band signal is corrected using the predicted global gain parameter to obtain a modified high-band time domain signal; that is, the modified high-band time domain signal is obtained by multiplying the predicted global gain parameter by the initial high-band signal.
  • the method may further include:
  • Correcting the initial high frequency band signal using the predicted global gain parameter comprises: modifying the initial high frequency band signal using the time domain envelope parameter and the time domain global gain parameter.
  • S404 Synthesize a narrowband time domain signal of the current frame and the modified high frequency band time domain signal and output.
  • the time domain global gain parameter of the high-band signal is obtained according to the spectral tilt parameter and the inter-frame correlation; the spectral tilt parameter of the narrowband signal allows the energy relationship between the narrowband signal and the high-band signal to be estimated relatively accurately, so the energy of the high-band signal is better estimated; with the inter-frame correlation, the correlation between narrowband frames can be well exploited to estimate the inter-frame correlation of the high-band signal, so that weighting the global gain of the high band makes good use of the previous real information without introducing undesirable noise.
  • the present invention also provides a speech and audio signal processing apparatus, which may be located in a terminal device, a network device, or a test device.
  • the speech signal processing device may be implemented by a hardware circuit or by software in conjunction with hardware.
  • a speech/audio signal processing device is called by a processor to implement speech and audio signal processing.
  • the speech audio signal processing apparatus can perform various methods and processes in the above method embodiments. Referring to FIG. 6, an embodiment of a speech and audio signal processing apparatus includes:
  • the obtaining unit 601 is configured to obtain an initial high frequency band signal corresponding to the current frame audio and video signal when the bandwidth of the audio signal is switched.
  • the parameter obtaining unit 602 is configured to obtain the time domain global gain parameter corresponding to the initial high frequency band signal
  • the weighting processing unit 603 is configured to perform weighting processing on the energy ratio value and the time domain global gain parameter, and obtain the weighted value as the predicted value.
  • a global gain parameter wherein, the energy ratio is a ratio of a time domain signal energy of the historical frame high frequency band to an initial high frequency band signal energy of the current frame;
  • the correcting unit 604 is configured to correct the initial high frequency band signal by using the predicted global gain parameter to obtain a modified high frequency band time domain signal;
  • the synthesizing unit 605 is configured to synthesize and output the narrow-band time domain signal of the current frame and the modified high-band time domain signal.
  • in one embodiment, the bandwidth switching is switching from a wideband signal to a narrowband signal, and
  • the parameter obtaining unit 602 includes:
  • a global gain parameter obtaining unit configured to obtain the time domain global gain parameter of the high-band signal according to the spectral tilt parameter of the current frame speech and audio signal and the correlation between the current frame speech and audio signal and the historical frame narrowband signal.
  • in another embodiment, the bandwidth switching is switching from a wideband signal to a narrowband signal, and
  • the parameter obtaining unit 602 includes:
  • the time domain envelope obtaining unit 701 is configured to use a preset series of values as a high-band time domain envelope parameter of the current frame speech audio signal;
  • the global gain parameter obtaining unit 702 is configured to obtain a time domain global gain parameter of the high frequency band signal according to a spectral tilt parameter of the current frame speech audio signal, a correlation between the current frame speech audio signal and the historical frame narrow band signal.
  • the correcting unit 604 is configured to correct the initial high frequency band signal by using a time domain envelope parameter and a predicted global gain parameter to obtain a modified high frequency band time domain signal.
  • an embodiment of the global gain parameter obtaining unit 702 includes: a classifying unit 801, configured to: according to the spectral tilt parameter of the current frame speech audio signal and the current frame speech audio signal and the historical frame narrowband signal Correlation, dividing the current frame speech audio signal into a first type signal or a second type signal;
  • a first limiting unit 802, configured to, if the current frame speech and audio signal is the first type of signal, limit the spectral tilt parameter to be less than or equal to the first predetermined value to obtain a spectral tilt parameter limit value, the spectral tilt parameter limit value being the time domain global gain parameter of the high-band signal;
  • a second limiting unit 803, configured to, if the current frame speech and audio signal is the second type of signal, limit the spectral tilt parameter to the first interval value to obtain a spectral tilt parameter limit value, the spectral tilt parameter limit value being used as the time domain global gain parameter of the high-band signal.
  • the first type of signal is a fricative signal and the second type of signal is a non-fricative signal: when the spectral tilt parameter tilt > 5 and the correlation parameter cor is less than a given value, the narrowband signal is classified as a fricative, and otherwise as a non-fricative;
  • the first predetermined value is 8
  • the first predetermined interval is [0.5, 1]
  • the obtaining unit 601 includes:
  • the excitation signal obtaining unit 901 is configured to predict a high frequency band signal excitation signal according to the current frame speech audio signal;
  • An LPC coefficient obtaining unit 902 configured to predict an LPC coefficient of the high frequency band signal;
  • the generating unit 903 is configured to synthesize the LPC coefficients of the high-band signal excitation signal and the high-band signal to obtain the predicted high-band signal.
  • when the bandwidth switching is switching from a narrowband signal to a wideband signal,
  • the speech and audio signal processing apparatus further includes:
  • a weighting factor setting unit, configured to, if the current audio frame has a predetermined correlation with the narrowband signal of the previous frame speech and audio signal, attenuate the weighting factor alfa of the energy ratio corresponding to the previous frame speech and audio signal by a certain step and use the attenuated value as the weighting factor of the energy ratio corresponding to the current audio frame, attenuating frame by frame until alfa is 0.
  • another embodiment of the speech and audio signal processing apparatus includes:
  • the prediction unit 1001 is configured to obtain an initial high-band signal corresponding to the current frame speech and audio signal when the speech signal is switched from the broadband signal to the narrow-band signal;
  • the parameter obtaining unit 1002 is configured to obtain a time domain global gain parameter of the high frequency band signal according to a spectral tilt parameter of the current frame speech audio signal, a correlation between the current frame narrow band signal and the historical frame narrow band signal;
  • the correcting unit 1003 is configured to correct the initial high-band signal by using the predicted global gain parameter to obtain a modified high-band time domain signal;
  • the synthesizing unit 1004 is configured to synthesize and output the narrow-band time domain signal of the current frame and the modified high-band time domain signal.
  • the parameter obtaining unit 1002 includes:
  • the classification unit 801 is configured to divide the current frame speech and audio signal into the first type signal or the second type signal according to the spectral tilt parameter of the current frame speech and audio signal and the correlation between the current frame speech and audio signal and the historical frame narrowband signal;
  • a first limiting unit 802, configured to, if the current frame speech and audio signal is the first type of signal, limit the spectral tilt parameter to be less than or equal to the first predetermined value to obtain a spectral tilt parameter limit value, the spectral tilt parameter limit value being the time domain global gain parameter of the high-band signal;
  • a second limiting unit 803, configured to, if the current frame speech and audio signal is the second type of signal, limit the spectral tilt parameter to the first interval value to obtain a spectral tilt parameter limit value, the spectral tilt parameter limit value being used as the time domain global gain parameter of the high-band signal.
  • the first type of signal is a fricative signal and the second type of signal is a non-fricative signal: when the spectral tilt parameter tilt > 5 and the correlation parameter cor is less than a given value, the narrowband signal is classified as a fricative, and otherwise as a non-fricative;
  • the first predetermined value is 8
  • the first predetermined interval is [0.5, 1].
  • the audio signal processing device further includes:
  • a weighting processing unit configured to perform weighting processing on the energy ratio value and the time domain global gain parameter, and obtain the weighted value as a predicted global gain parameter, wherein the energy ratio is a historical frame high frequency band time domain signal energy and a current frame initial Ratio of high band signal energy;
  • the correction unit is configured to correct the initial high frequency band signal by using a predicted global gain parameter to obtain a modified high frequency band time domain signal.
  • the parameter obtaining unit is further configured to obtain a time domain envelope parameter corresponding to the initial high frequency band signal; and the modifying unit is configured to use the time domain envelope parameter and the time domain global gain parameter to The initial high band signal is corrected.
  • the program may be stored in a computer readable storage medium, and when the program is executed, the flows of the method embodiments described above may be included.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Telephone Function (AREA)
  • Transmitters (AREA)
  • Tone Control, Compression And Expansion, Limiting Amplitude (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

Embodiments of the present invention disclose a speech and audio signal processing method and apparatus. In one embodiment, the speech and audio signal processing method includes: when bandwidth switching of a speech and audio signal occurs, obtaining an initial high-band signal corresponding to the current frame speech and audio signal; obtaining a time domain global gain parameter of the initial high-band signal; weighting an energy ratio and the time domain global gain parameter, and using the obtained weighted value as a predicted global gain parameter, where the energy ratio is the ratio of the high-band time domain signal energy of a historical frame to the initial high-band signal energy of the current frame; correcting the initial high-band signal by using the predicted global gain parameter to obtain a modified high-band time domain signal; and synthesizing and outputting the narrowband time domain signal of the current frame and the modified high-band time domain signal.

Description

Speech and audio signal processing method and apparatus
This application claims priority to Chinese Patent Application No. 201210051672.6, filed with the Chinese Patent Office on March 1, 2012 and entitled "Speech and audio signal processing method and apparatus", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of digital signal processing technologies, and in particular to a speech and audio signal processing method and apparatus.
Background
In the field of digital communications, the transmission of voice, images, audio, and video has a wide range of applications, such as mobile phone calls, audio and video conferencing, broadcast television, and multimedia entertainment. Audio is digitized and passed from one terminal to another over an audio communication network, where a terminal may be a mobile phone, a digital telephone terminal, or any other type of audio terminal, the digital telephone terminal being, for example, a VOIP or ISDN telephone, a computer, or a cable communication telephone. To reduce the resources occupied when a speech and audio signal is stored or transmitted, the speech and audio signal is compressed at the transmitting end and then transmitted to the receiving end, and the receiving end recovers the speech and audio signal through decompression and plays it.
In current multi-rate speech and audio coding, because network conditions vary, the network truncates the bitstream transmitted from the encoder at different bit rates, and the decoder decodes the truncated bitstream into speech and audio signals of different bandwidths, so that the output speech and audio signal switches between different bandwidths.
An abrupt switch between signals of different bandwidths causes clearly perceptible auditory discomfort. Moreover, updating states such as filters and time-frequency or frequency-time transforms generally requires parameters from neighboring frames; during bandwidth switching, if no appropriate processing is performed, these state updates will be erroneous, causing abrupt energy changes and degrading the auditory quality.
Summary
An objective of the embodiments of the present invention is to provide a speech and audio signal processing method and apparatus that improve auditory comfort when the bandwidth of a speech and audio signal is switched. According to an embodiment of the present invention, a speech and audio signal processing method includes: when a speech and audio signal is switched from a wideband signal to a narrowband signal, obtaining an initial high-band signal corresponding to the current frame speech and audio signal;
obtaining a time domain global gain parameter of the high-band signal according to a spectral tilt parameter of the current frame speech and audio signal and a correlation between the current frame narrowband signal and a historical frame narrowband signal;
correcting the initial high-band signal by using the time domain global gain parameter to obtain a modified high-band time domain signal; and
synthesizing and outputting the narrowband time domain signal of the current frame and the modified high-band time domain signal. According to another embodiment of the present invention, a speech and audio signal processing method includes:
when bandwidth switching of a speech and audio signal occurs, obtaining an initial high-band signal corresponding to the current frame speech and audio signal;
obtaining a time domain global gain parameter of the initial high-band signal;
weighting an energy ratio and the time domain global gain parameter, and using the obtained weighted value as a predicted global gain parameter, where the energy ratio is the ratio of the high-band time domain signal energy of a historical frame to the initial high-band signal energy of the current frame;
correcting the initial high-band signal by using the predicted global gain parameter to obtain a modified high-band time domain signal; and
synthesizing and outputting the narrowband time domain signal of the current frame and the modified high-band time domain signal. According to another embodiment of the present invention, a speech and audio signal processing apparatus includes:
a prediction unit, configured to obtain an initial high-band signal corresponding to the current frame speech and audio signal when the speech and audio signal is switched from a wideband signal to a narrowband signal;
a parameter obtaining unit, configured to obtain a time domain global gain parameter of the high-band signal according to a spectral tilt parameter of the current frame speech and audio signal and a correlation between the current frame narrowband signal and a historical frame narrowband signal; a correction unit, configured to correct the initial high-band signal by using the predicted global gain parameter to obtain a modified high-band time domain signal; and
a synthesizing unit, configured to synthesize and output the narrowband time domain signal of the current frame and the modified high-band time domain signal. According to another embodiment of the present invention, a speech and audio signal processing apparatus includes: an obtaining unit, configured to obtain an initial high-band signal corresponding to the current frame speech and audio signal when bandwidth switching of the speech and audio signal occurs;
a parameter obtaining unit, configured to obtain a time domain global gain parameter corresponding to the initial high-band signal; a weighting processing unit, configured to weight an energy ratio and the time domain global gain parameter and use the obtained weighted value as a predicted global gain parameter, where the energy ratio is the ratio of the high-band time domain signal energy of a historical frame to the initial high-band signal energy of the current frame;
a correction unit, configured to correct the initial high-band signal by using the predicted global gain parameter to obtain a modified high-band time domain signal; and
a synthesizing unit, configured to synthesize and output the narrowband time domain signal of the current frame and the modified high-band time domain signal. By correcting the high-band signal when switching between wideband and narrowband, the embodiments of the present invention make the high-band signal transition smoothly between wideband and narrowband, effectively removing the auditory discomfort caused by switching between wideband and narrowband; at the same time, because the bandwidth switching algorithm and the codec algorithm of the high-band signal before switching operate in the same signal domain, no extra delay is added and the algorithm remains simple, while the performance of the output signal is also guaranteed.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative effort.
FIG. 1 is a schematic flowchart of an embodiment of a speech and audio signal processing method according to the present invention; FIG. 2 is a schematic flowchart of another embodiment of a speech and audio signal processing method according to the present invention; FIG. 3 is a schematic flowchart of another embodiment of a speech and audio signal processing method according to the present invention; FIG. 4 is a schematic flowchart of another embodiment of a speech and audio signal processing method according to the present invention; FIG. 5 is a schematic structural diagram of an embodiment of a speech and audio signal processing apparatus according to the present invention; FIG. 6 is a schematic structural diagram of an embodiment of a speech and audio signal processing apparatus according to the present invention; FIG. 7 is a schematic structural diagram of an embodiment of a parameter obtaining unit according to the present invention;
FIG. 8 is a schematic structural diagram of an embodiment of a global gain parameter obtaining unit according to the present invention; FIG. 9 is a schematic structural diagram of an embodiment of an obtaining unit according to the present invention;
FIG. 10 is a schematic structural diagram of another embodiment of a speech and audio signal processing apparatus according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
In the field of digital signal processing, audio codecs and video codecs are widely used in various electronic devices, for example, mobile phones, wireless apparatuses, personal digital assistants (PDAs), handheld or portable computers, GPS receivers/navigators, cameras, audio/video players, camcorders, video recorders, and surveillance equipment. Generally, such an electronic device includes an audio encoder or an audio decoder, and the audio encoder or decoder may be implemented directly by a digital circuit or a chip such as a DSP (digital signal processor), or by a processor driven by software code executing the flows in the software code.
In the prior art, because speech and audio signals transmitted over the network have different bandwidths, the bandwidth of a speech and audio signal changes frequently during transmission: a narrowband speech and audio signal may be switched to a wideband speech and audio signal, and a wideband speech and audio signal may be switched to a narrowband speech and audio signal. The process in which a speech and audio signal switches between high and low frequency bands is called bandwidth switching; bandwidth switching includes switching from a narrowband signal to a wideband signal and switching from a wideband signal to a narrowband signal. The narrowband signal mentioned in the present invention is a speech signal obtained by up-sampling and low-pass filtering that contains only a low-band component, its high-band component being empty, whereas a wideband speech and audio signal contains both a low-band component and a high-band component. Narrowband and wideband are relative terms: for example, relative to a narrowband signal, a wideband signal is a wideband signal; relative to a wideband signal, an ultra-wideband signal is a wideband signal. Generally, a narrowband signal is a speech and audio signal with a sampling rate of 8 kHz, a wideband signal is a speech and audio signal with a sampling rate of 16 kHz, and an ultra-wideband signal is a speech and audio signal with a sampling rate of 32 kHz.
When the codec algorithm of the high-band signal before switching selects between time domain and frequency domain codec algorithms according to the signal type, or when the coding algorithm of the high-band signal before switching is a time domain coding algorithm, the switching algorithm is kept in the same signal domain as the high-band codec algorithm used before switching in order to guarantee continuity of the output signal at the moment of switching: if the high-band signal before switching uses a time domain codec algorithm, the subsequent switching algorithm uses a time domain switching algorithm; if the high-band signal before switching uses a frequency domain codec algorithm, the subsequent switching algorithm uses a frequency domain switching algorithm. The prior art does not, when a time domain band extension algorithm is used before switching, apply a similar time domain switching technique after switching.
Speech and audio coding is generally performed in units of frames. The currently input audio frame to be processed is the current frame speech and audio signal; the current frame speech and audio signal includes a narrowband signal and a high-band signal, that is, the current frame narrowband signal and the current frame high-band signal. Any frame of speech and audio signal before the current frame is a historical frame speech and audio signal, which also includes a historical frame narrowband signal and a historical frame high-band signal; the speech and audio signal one frame before the current frame is the previous frame speech and audio signal. Referring to FIG. 1, an embodiment of the speech and audio signal processing method of the present invention includes:
S101: When bandwidth switching of a speech and audio signal occurs, obtain an initial high-band signal corresponding to the current frame speech and audio signal.
The current frame speech and audio signal consists of the current frame narrowband signal and the current frame high-band time domain signal. Bandwidth switching includes switching from a narrowband signal to a wideband signal and switching from a wideband signal to a narrowband signal. For switching from a narrowband signal to a wideband signal, the current frame speech and audio signal is the current frame wideband signal, which includes a narrowband signal and a high-band signal; the initial high-band signal of the current frame speech and audio signal is a real signal and can be obtained directly from the current frame speech and audio signal. For switching from a wideband signal to a narrowband signal, the current frame speech and audio signal is the current frame narrowband signal, the current frame high-band time domain signal is empty, and the initial high-band signal of the current frame speech and audio signal is a prediction signal: the high-band signal corresponding to the current frame narrowband signal needs to be predicted and used as the initial high-band signal.
S102: Obtain a time domain global gain parameter corresponding to the initial high-band signal.
For switching from a narrowband signal to a wideband signal, the time domain global gain parameter of the high-band signal can be obtained by decoding; for switching from a wideband signal to a narrowband signal, the time domain global gain parameter of the high-band signal can be obtained from the current frame signal, that is, according to the spectral tilt parameter of the narrowband signal and the correlation between the current frame narrowband signal and the historical frame narrowband signal.
S103: Weight the energy ratio and the time domain global gain parameter, and use the obtained weighted value as the predicted global gain parameter, where the energy ratio is the ratio of the high-band time domain signal energy of the historical frame speech and audio signal to the initial high-band signal energy of the current frame speech and audio signal.
For the historical frame speech and audio signal, the final output speech and audio signal of the historical frame is used; for the current frame speech and audio signal, the initial high-band signal is used. The energy ratio is Ratio = Esyn(-1) / Esyn_tmp, where Esyn(-1) denotes the energy of the high-band time domain signal syn output for the historical frame, and Esyn_tmp denotes the energy of the initial high-band time domain signal syn corresponding to the current frame.
The predicted global gain parameter is gain = alfa*Ratio + beta*gain', where gain' is the time domain global gain parameter, alfa + beta = 1, and the values of alfa and beta differ according to the signal type.
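As a concrete illustration of this weighting, the following Python sketch computes the predicted global gain from the previous frame's output high-band energy and the current frame's initial high-band energy. The function name, the example alfa value, and the placeholder frame data are assumptions for illustration and are not part of the patent.

```python
import numpy as np

def predicted_global_gain(prev_hb_out, init_hb_cur, gain_prime, alfa):
    """Weighted global gain: gain = alfa*Ratio + beta*gain', with beta = 1 - alfa.

    prev_hb_out : high-band time domain signal finally output for the previous frame
    init_hb_cur : initial (predicted or decoded) high-band signal of the current frame
    gain_prime  : time domain global gain parameter gain'
    alfa        : weighting factor for the energy ratio (0 <= alfa <= 1)
    """
    e_syn_prev = np.sum(prev_hb_out.astype(np.float64) ** 2)    # Esyn(-1)
    e_syn_tmp = np.sum(init_hb_cur.astype(np.float64) ** 2)     # Esyn_tmp
    ratio = e_syn_prev / e_syn_tmp if e_syn_tmp > 0.0 else 0.0  # Ratio = Esyn(-1)/Esyn_tmp
    beta = 1.0 - alfa                                           # alfa + beta = 1
    return alfa * ratio + beta * gain_prime

# Illustrative usage with placeholder frames and parameter values
prev_hb = np.random.randn(320) * 0.1
init_hb = np.random.randn(320) * 0.05
gain = predicted_global_gain(prev_hb, init_hb, gain_prime=0.8, alfa=0.5)
```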
S104: Correct the initial high-band signal by using the predicted global gain parameter to obtain a modified high-band time domain signal.
Correction here means signal multiplication, that is, the predicted global gain parameter is multiplied with the initial high-band signal. In another embodiment, the time domain envelope parameter and the time domain global gain parameter corresponding to the initial high-band signal are obtained in step S102, and in step S104 the initial high-band signal is corrected by using the time domain envelope parameter and the predicted global gain parameter to obtain the modified high-band time domain signal; that is, the time domain envelope parameter and the predicted time domain global gain parameter are multiplied with the predicted high-band signal to obtain the high-band time domain signal.
For switching from a narrowband signal to a wideband signal, the time domain envelope parameter of the high-band signal can be obtained by decoding; for switching from a wideband signal to a narrowband signal, the time domain envelope parameter of the high-band signal can be obtained from the current frame signal: a preset series of values or the historical frame high-band time domain envelope parameters can be used as the high-band time domain envelope parameters of the current frame speech and audio signal.
S105: Synthesize and output the narrowband time domain signal of the current frame and the modified high-band time domain signal.
The above embodiment corrects the high-band signal when switching between wideband and narrowband, so that the high-band signal transitions smoothly between wideband and narrowband, effectively removing the auditory discomfort caused by switching between wideband and narrowband; at the same time, because the bandwidth switching algorithm and the codec algorithm of the high-band signal before switching operate in the same signal domain, no extra delay is added and the algorithm remains simple, while the performance of the output signal is also guaranteed.
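The correction and synthesis steps S104 and S105 can be pictured with the minimal sketch below, which scales an assumed initial high-band signal by the predicted gain (optionally also by a time domain envelope) and merges it with the narrowband frame. Combining the bands by simple resampling and addition stands in for the codec's actual synthesis stage and is an illustrative assumption.

```python
import numpy as np
from scipy.signal import resample_poly

def correct_and_synthesize(nb_frame_8k, init_hb_16k, gain, envelope=None):
    """Scale the initial high-band signal by the predicted gain (and optional
    time domain envelope), then merge it with the narrowband frame.

    Upsampling the narrowband frame and adding the two bands is a simplification
    of the codec's real synthesis stage, used here only to show the data flow.
    """
    corrected_hb = init_hb_16k * gain                  # correction = multiplication by the gain
    if envelope is not None:
        corrected_hb = corrected_hb * envelope         # optional time domain envelope shaping
    nb_16k = resample_poly(nb_frame_8k, up=2, down=1)  # bring the low band to 16 kHz
    return nb_16k + corrected_hb                       # output: low band plus corrected high band
```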
Referring to FIG. 2, another embodiment of the speech and audio signal processing method of the present invention includes:
S201: When a wideband signal is switched to a narrowband signal, predict the predicted high-band signal corresponding to the current frame narrowband signal.
Switching from a wideband signal to a narrowband signal means that the previous frame is a wideband signal and the current frame is a narrowband signal. The step of predicting the predicted high-band signal corresponding to the current frame narrowband signal includes: predicting the high-band excitation signal of the current frame speech and audio signal according to the current frame narrowband signal; predicting the LPC (Linear Predictive Coding) coefficients of the high-band signal of the current frame speech and audio signal; and synthesizing the predicted high-band excitation signal and the LPC coefficients to obtain the predicted high-band signal syn_tmp.
In one embodiment, parameters such as the pitch period, the algebraic codebook, and the gain may be extracted from the narrowband signal, and the high-band excitation signal is predicted through resampling and filtering;
in another embodiment, the high-band excitation signal can be predicted by up-sampling and low-pass filtering the narrowband time domain signal or the narrowband time domain excitation signal and then taking the absolute value or the square.
To predict the LPC coefficients of the high-band signal, the high-band LPC coefficients of a historical frame or a preset series of values can be used as the current frame LPC coefficients; different prediction modes can also be used for different signal types.
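A rough sketch of this prediction step is given below: it upsamples a narrowband excitation, applies a simple nonlinearity, and passes the result through an LPC synthesis filter built from assumed high-band coefficients. The 8 kHz to 16 kHz rates, the choice of the absolute value as the nonlinearity, and the DC removal are illustrative assumptions rather than the patent's exact procedure.

```python
import numpy as np
from scipy.signal import resample_poly, lfilter

def predict_highband(nb_excitation, hb_lpc):
    """Predict a high-band signal syn_tmp from a narrowband excitation.

    hb_lpc may come from the previous frame or be a preset vector, as in the text.
    """
    # Upsample 8 kHz -> 16 kHz (resample_poly applies an anti-aliasing low-pass filter)
    excitation_up = resample_poly(nb_excitation, up=2, down=1)
    # Nonlinearity spreads energy into the high band (abs or square, per the text)
    hb_excitation = np.abs(excitation_up)
    hb_excitation -= np.mean(hb_excitation)   # remove the DC offset introduced by abs()
    # LPC synthesis filter 1/A(z): syn_tmp[n] = e[n] - sum_k a[k] * syn_tmp[n-k]
    a = np.concatenate(([1.0], np.asarray(hb_lpc, dtype=np.float64)))
    syn_tmp = lfilter([1.0], a, hb_excitation)
    return syn_tmp
```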
S202: Obtain a time domain envelope parameter and a time domain global gain parameter corresponding to the predicted high-band signal.
A preset series of values can be used as the high-band time domain envelope parameters of the current frame. Narrowband signals can be roughly divided into several classes, each class having a preset series of values, and a set of preset time domain envelope parameters is selected according to the class of the current frame narrowband signal; alternatively, a single set of time domain envelope values can simply be preset, for example, if the number of time domain envelopes is M, the preset values may be M values of 0.3536. In this embodiment, obtaining the time domain envelope parameters is an optional step and is not required.
The time domain global gain parameter of the high-band signal is obtained according to the spectral tilt parameter of the narrowband signal and the correlation between the current frame narrowband signal and the historical frame narrowband signal; in one embodiment, this includes the following steps:
S2021: Divide the current frame speech and audio signal into a first type signal or a second type signal according to the spectral tilt parameter of the current frame speech and audio signal and the correlation between the current frame narrowband signal and the historical frame narrowband signal. In one embodiment, the first type of signal is a fricative signal and the second type of signal is a non-fricative signal; when the spectral tilt parameter tilt > 5 and the correlation parameter cor is less than a given value, the narrowband signal is classified as a fricative, and otherwise as a non-fricative.
The correlation parameter cor between the current frame narrowband signal and the historical frame narrowband signal may be determined from the energy relationship of the signals in a given frequency band, or from the energy relationship of several identical frequency bands, or computed with the autocorrelation or cross-correlation formula of the time domain signal or the time domain excitation signal.
S2022: If the current frame speech and audio signal is the first type of signal, limit the spectral tilt parameter to be less than or equal to the first predetermined value to obtain the spectral tilt parameter limit value, and use the spectral tilt parameter limit value as the time domain global gain parameter of the high-band signal. That is, when the spectral tilt parameter of the current frame speech and audio signal is less than or equal to the first predetermined value, the original value of the spectral tilt parameter is kept as the spectral tilt parameter limit value; when the spectral tilt parameter of the current frame speech and audio signal is greater than the first predetermined value, the first predetermined value is taken as the spectral tilt parameter limit value.
The time domain global gain parameter gain' is obtained by the following formula: gain' = tilt when tilt is less than or equal to the first predetermined value, and gain' equals the first predetermined value when tilt is greater than it, where tilt is the spectral tilt parameter.
S2023: If the current frame speech and audio signal is the second type of signal, limit the spectral tilt parameter to the first interval value to obtain the spectral tilt parameter limit value, and use the spectral tilt parameter limit value as the time domain global gain parameter of the high-band signal. That is, when the spectral tilt parameter of the current frame speech and audio signal belongs to the first interval value, the original value of the spectral tilt parameter is kept as the spectral tilt parameter limit value; when the spectral tilt parameter is greater than the upper limit of the first interval value, the upper limit of the first interval value is taken as the spectral tilt parameter limit value; when the spectral tilt parameter is smaller than the lower limit of the first interval value, the lower limit of the first interval value is taken as the spectral tilt parameter limit value.
The time domain global gain parameter gain' is obtained by the following formula: gain' = a when tilt < a, gain' = tilt when a <= tilt <= b, and gain' = b when tilt > b, where tilt is the spectral tilt parameter and [a, b] is the first interval value.
In one embodiment, the spectral tilt parameter tilt of the narrowband signal and the correlation parameter cor between the current frame narrowband signal and the historical frame narrowband signal are obtained; according to tilt and cor, the current frame signal is divided into two classes, fricative and non-fricative: when the spectral tilt parameter tilt > 5 and the correlation parameter cor is less than a given value, the narrowband signal is classified as a fricative, and otherwise as a non-fricative. The range of tilt is limited to 0.5 <= tilt <= 1.0 as the time domain global gain parameter for non-fricatives, and to tilt <= 8.0 as the time domain global gain parameter for fricatives. For a fricative, the spectral tilt parameter can be any value greater than 5; for a non-fricative, it can be any value less than or equal to 5, and may also be greater than 5. To ensure that the spectral tilt parameter tilt can be used as the estimated time domain global gain parameter, the range of tilt is limited before it is used as the time domain global gain parameter: when tilt > 8, tilt = 8 is taken as the time domain global gain parameter of a fricative; when tilt < 0.5, tilt = 0.5 is taken, and when tilt > 1.0, tilt = 1.0 is taken as the time domain global gain parameter of a non-fricative.
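The classification and clamping described here can be summarized in the short sketch below; the correlation threshold is a placeholder for the "given value" mentioned in the text, while the cap of 8.0 and the interval [0.5, 1.0] follow the values used in this embodiment.

```python
def timedomain_global_gain(tilt, cor, cor_threshold=0.5,
                           fricative_cap=8.0, interval=(0.5, 1.0)):
    """Derive gain' from the spectral tilt, following the classification above.

    cor_threshold is an illustrative placeholder for the 'given value';
    fricative_cap and interval correspond to the first predetermined value (8)
    and the first interval ([0.5, 1.0]) used in this embodiment.
    """
    is_fricative = (tilt > 5.0) and (cor < cor_threshold)
    if is_fricative:
        # First type of signal: limit tilt to <= 8.0
        return min(tilt, fricative_cap)
    # Second type of signal: clamp tilt into [0.5, 1.0]
    low, high = interval
    return min(max(tilt, low), high)
```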
S203: Weight the energy ratio and the time domain global gain parameter, and use the obtained weighted value as the predicted global gain parameter, where the energy ratio is the ratio of the high-band time domain signal energy of the historical frame speech and audio signal to the initial high-band signal energy of the current frame speech and audio signal.
The energy ratio Ratio = Esyn(-1) / Esyn_tmp is computed, and the weighted value of tilt and Ratio is used as the predicted global gain parameter gain of the current frame, that is, gain = alfa*Ratio + beta*gain', where gain' is the time domain global gain parameter, alfa + beta = 1, and the values of alfa and beta differ according to the signal type; Esyn(-1) denotes the energy of the high-band time domain signal syn finally output for the historical frame, and Esyn_tmp denotes the energy of the predicted high-band time domain signal syn of the current frame.
S204: Correct the predicted high-band signal by using the time domain envelope parameter and the predicted global gain parameter to obtain the modified high-band time domain signal.
The time domain envelope parameter and the predicted time domain global gain parameter are multiplied with the predicted high-band signal to obtain the high-band time domain signal.
In this embodiment, the time domain envelope parameter is optional; when only the time domain global gain parameter is used, the predicted high-band signal can be corrected with the predicted global gain parameter to obtain the modified high-band time domain signal, that is, the modified high-band time domain signal is obtained by multiplying the predicted global gain parameter by the predicted high-band signal.
S205: Synthesize and output the narrowband time domain signal of the current frame and the modified high-band time domain signal.
The energy Esyn of the high-band time domain signal syn is used to predict the time domain global gain parameter of the next frame, that is, the value of Esyn is assigned to Esyn(-1).
The above embodiment corrects the high band of the narrowband signal that follows a wideband signal, so that the high-band portion transitions smoothly between wideband and narrowband, effectively removing the auditory discomfort caused by switching between wideband and narrowband. At the same time, because the frame at the moment of switching is processed accordingly, the problems that would otherwise occur when parameters and states are updated are indirectly removed. By keeping the bandwidth switching algorithm and the codec algorithm of the high-band signal before switching in the same signal domain, no extra delay is added and the algorithm remains simple, while the performance of the output signal is also guaranteed. Referring to FIG. 3, another embodiment of the speech and audio signal processing method of the present invention includes:
S301: When a narrowband signal is switched to a wideband signal, obtain the current frame high-band signal. Switching from a narrowband signal to a wideband signal means that the previous frame is a narrowband signal and the current frame is a wideband signal.
S302: Obtain a time domain envelope parameter and a time domain global gain parameter corresponding to the high-band signal. The time domain envelope parameter and the time domain global gain parameter can be obtained directly from the current frame high-band signal. Obtaining the time domain envelope parameter is an optional step.
S303: Weight the energy ratio and the time domain global gain parameter, and use the obtained weighted value as the predicted global gain parameter, where the energy ratio is the ratio of the high-band time domain signal energy of the historical frame speech and audio signal to the initial high-band signal energy of the current frame speech and audio signal.
Because the current frame is a wideband signal, all parameters of the high-band signal can be obtained by decoding. To guarantee a smooth transition at the moment of switching, the time domain global gain parameter is smoothed as follows: the energy ratio Ratio = Esyn(-1) / Esyn_tmp is computed, where Esyn(-1) denotes the energy of the high-band time domain signal syn finally output for the historical frame, and Esyn_tmp denotes the energy of the high-band time domain signal syn of the current frame.
The weighted value of the decoded time domain global gain parameter gain' and Ratio is used as the predicted global gain parameter gain of the current frame, that is, gain = alfa*Ratio + beta*gain', where gain' is the time domain global gain parameter, alfa + beta = 1, and the values of alfa and beta differ according to the signal type.
If the current audio frame has a predetermined correlation with the narrowband signal of the previous frame speech and audio signal, the weighting factor alfa of the energy ratio corresponding to the previous frame speech and audio signal is attenuated by a certain step, and the attenuated value is used as the weighting factor of the energy ratio corresponding to the current audio frame; alfa is attenuated frame by frame until it reaches 0.
When the narrowband signals of consecutive frames have the same signal type or their correlation satisfies a certain condition, that is, when there is a certain correlation between consecutive frames or the signal types of consecutive frames are similar, alfa is attenuated frame by frame by a certain step until it decays to 0; when the narrowband signals of consecutive frames have no correlation, alfa is directly attenuated to 0, that is, the current decoding result is kept and no weighting or correction is performed.
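A minimal sketch of this frame-by-frame attenuation of alfa is shown below; the attenuation step of 0.1 and the boolean correlation test are assumptions, since the text only specifies decay by "a certain step" while consecutive frames remain correlated and an immediate drop to 0 otherwise.

```python
def update_alfa(alfa_prev, frames_correlated, step=0.1):
    """Frame-by-frame attenuation of the energy-ratio weighting factor alfa.

    frames_correlated: whether the current and previous narrowband frames are
    correlated (or of a similar signal type). The step size 0.1 is illustrative.
    """
    if not frames_correlated:
        return 0.0                      # keep the decoded result, no weighting or correction
    return max(alfa_prev - step, 0.0)   # decay toward 0 frame by frame

# Example: alfa over a few correlated frames after the switch, then an uncorrelated one
alfa = 0.5
for correlated in (True, True, True, False):
    alfa = update_alfa(alfa, correlated)
```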
S304: Correct the high-band signal by using the time domain envelope parameter and the predicted global gain parameter to obtain the modified high-band time domain signal.
Correction means multiplying the time domain envelope parameter and the predicted time domain global gain parameter with the high-band signal to obtain the modified high-band time domain signal.
In this embodiment, the time domain envelope parameter is optional; when only the time domain global gain parameter is used, the high-band signal can be corrected with the predicted global gain parameter to obtain the modified high-band time domain signal, that is, the modified high-band time domain signal is obtained by multiplying the predicted global gain parameter by the high-band signal.
S305: Synthesize and output the narrowband time domain signal of the current frame and the modified high-band time domain signal.
The above embodiment corrects the high band of the wideband signal that follows a narrowband signal, so that the high-band portion transitions smoothly between wideband and narrowband, effectively removing the auditory discomfort caused by switching between wideband and narrowband. At the same time, because the frame at the moment of switching is processed accordingly, the problems that would otherwise occur when parameters and states are updated are indirectly removed. By keeping the bandwidth switching algorithm and the codec algorithm of the high-band signal before switching in the same signal domain, no extra delay is added and the algorithm remains simple, while the performance of the output signal is also guaranteed. Referring to FIG. 4, another embodiment of the speech and audio signal processing method of the present invention includes:
S401 : 语音频信号从宽频带信号到窄频带信号的切换时,获得当前帧语音 频信号对应的初始高频带信号;
由宽频带信号向窄频带切换, 即前一帧为宽频带信号, 当前帧为窄频带信号。预测当前帧窄频带信号对应的初始高频带信号的步骤包括: 根据当前帧窄频带信号预测当前帧语音频信号高频带信号激励信号; 预测当前帧语音频信号高频带信号的 LPC系数; 合成预测的高频带激励信号和 LPC系数, 获得初始高频带信号 syn_tmp。
一个实施例中, 可以从窄频带信号中提取基音周期、代数码本和增益等参数, 通过变采样、滤波预测出高频带的激励信号;
另一个实施例中, 可以对窄频带时域信号或窄频带时域激励信号通过上采样、低通滤波, 然后取绝对值或取平方等操作来预测高频带激励信号。
预测高频带信号的 LPC系数, 可以将历史帧的高频带 LPC系数或预先设定好的一系列值作为当前帧 LPC系数; 也可以对不同的信号类型采用不同的预测方式。
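下面给出一个 Python 草图, 按上述第二种激励预测方式(对窄带激励做上采样、低通滤波, 再取绝对值)示意初始高频带信号 syn_tmp 的获得过程: 先预测高频带激励, 再用一组 LPC 系数做合成滤波。其中滤波器阶数、截止频率、预设 LPC 系数等均为说明用的假设值, 并非实际编解码器中的取值。

```python
import numpy as np
from scipy.signal import lfilter, firwin

def predict_hb_excitation(nb_excitation, up_factor=2, num_taps=31):
    """由窄带时域激励预测高频带激励: 上采样、低通滤波, 再取绝对值(取平方亦可)。"""
    x = np.asarray(nb_excitation, dtype=float)
    up = np.zeros(len(x) * up_factor)
    up[::up_factor] = x                            # 零插值上采样
    lp = firwin(num_taps, cutoff=0.5)              # 低通滤波器, 截止频率为假设值
    return np.abs(lfilter(lp, [1.0], up))          # 非线性取绝对值, 展宽频谱

def synthesize_initial_highband(hb_excitation, lpc_coeffs):
    """用预测的高频带激励和 LPC 系数做 1/A(z) 合成滤波, 得到初始高频带信号 syn_tmp。
    lpc_coeffs 可取历史帧的高频带 LPC 系数或预先设定好的一组值。"""
    a = np.concatenate(([1.0], np.asarray(lpc_coeffs, dtype=float)))
    return lfilter([1.0], a, hb_excitation)

if __name__ == "__main__":
    nb_exc = np.random.default_rng(0).standard_normal(80)
    preset_lpc = [-0.9, 0.2]                       # 假设的预设 LPC 系数(稳定滤波器)
    syn_tmp = synthesize_initial_highband(predict_hb_excitation(nb_exc), preset_lpc)
    print(syn_tmp.shape)
```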
S402: 根据当前帧语音频信号的谱倾斜参数、当前帧窄频带信号与历史帧窄频带信号的相关性获得所述高频带信号的时域全局增益参数;
一个实施例中, 包括如下步骤:
S2021: 根据所述当前帧语音频信号的谱倾斜参数和当前帧窄频带信号与历史帧窄频带信号的相关性, 将当前帧语音频信号分为第一类信号或第二类信号; 一个实施例中, 第一类信号为摩擦音信号, 第二类信号为非摩擦音信号。
一个实施例中, 当谱倾斜参数 tilt>5且相关性参数 cor小于一给定值时, 将窄频带信号分成摩擦音, 其他的为非摩擦音。其中, 当前帧窄频带信号和历史帧窄频带信号的相关性大小参数 cor的计算, 可以通过同一频段信号能量的大小关系来确定, 也可以通过几个相同频段的能量关系确定, 也可以通过时域信号或时域激励信号的自相关或互相关公式来计算。
S2022: 如果当前帧语音频信号为第一类信号, 则将谱倾斜参数限制到小于等于第一预定值, 获得谱倾斜参数限制值; 以所述谱倾斜参数限制值作为高频带信号的时域全局增益参数。即当前帧语音频信号的谱倾斜参数小于等于第一预定值时, 保留谱倾斜参数原值作为谱倾斜参数限制值; 当前帧语音频信号的谱倾斜参数大于第一预定值时, 取第一预定值作为谱倾斜参数限制值。
当前帧语音频信号为摩擦音信号时, 时域全局增益参数 gain'通过以下公式获得:
$$gain' = \begin{cases} tilt, & tilt \leq a_1 \\ a_1, & tilt > a_1 \end{cases}$$
其中, tilt 为谱倾斜参数, a_1 为第一预定值。
S2023: 如果当前帧语音频信号为第二类信号, 则将谱倾斜参数限制到属于第一区间值, 获得谱倾斜参数限制值; 以所述谱倾斜参数限制值作为高频带信号的时域全局增益参数。即当前帧语音频信号的谱倾斜参数属于第一区间值时, 保留谱倾斜参数原值作为谱倾斜参数限制值; 当前帧语音频信号的谱倾斜参数大于第一区间值的上限时, 取第一区间值的上限作为谱倾斜参数限制值; 当前帧语音频信号的谱倾斜参数小于第一区间值的下限时, 取第一区间值的下限作为谱倾斜参数限制值。
当前帧语音频信号为非摩擦音信号时, 时域全局增益参数 gain'通过以下公 式获得:
$$gain' = \begin{cases} a, & tilt < a \\ tilt, & a \leq tilt \leq b \\ b, & tilt > b \end{cases}$$
其中, tilt 为谱倾斜参数, [a, b] 为第一区间值。
一个实施例中, 获得窄频带信号的谱倾斜参数 tilt及当前帧窄频带信号和历史帧窄频带信号的相关性大小参数 cor; 根据 tilt及 cor将当前帧信号分为摩擦音及非摩擦音两类, 当谱倾斜参数 tilt>5且相关性参数 cor小于一给定值时, 将窄频带信号分成摩擦音, 其他的为非摩擦音; 将 tilt的取值范围限制到 0.5<=tilt<=1.0之间作为非摩擦音的时域全局增益参数, 将 tilt的取值范围限制到 tilt<=8.0作为摩擦音的时域全局增益参数。对摩擦音而言, 谱倾斜参数可以是大于 5的任何值; 对非摩擦音而言, 可以是小于等于 5的任何值, 也可能大于 5。为了保证谱倾斜参数 tilt能作为预测的全局增益参数, 对 tilt的取值范围做限定后作为时域全局增益参数, 即当 tilt>8时, 取 tilt = 8作为摩擦音信号的时域全局增益参数; 当 tilt<0.5时, 取 tilt = 0.5, 当 tilt>1.0时, 取 tilt = 1.0, 作为非摩擦音信号的时域全局增益参数。
S403: 利用时域全局增益参数对所述初始高频带信号进行修正,获得修正 的高频带时域信号;
一个实施例中, 用时域全局增益参数乘以初始高频带信号得到修正的高频带时域信号。
另一个实施例中, 步骤 S403可以包括:
将能量比值和所述时域全局增益参数进行加权处理,得到的加权值作为预 测的全局增益参数, 其中, 能量比值为历史帧高频带时域信号能量与当前帧初 始高频带信号能量的比值;
利用预测的全局增益参数对所述初始高频带信号进行修正得到修正的高频带时域信号; 即用预测的全局增益参数乘以初始高频带信号得到修正的高频带时域信号。
可选的, 在步骤 S403之前还可以包括:
获得所述初始高频带信号对应的时域包络参数;
则利用预测的全局增益参数对所述初始高频带信号进行修正包括: 利用所述时域包络参数和时域全局增益参数对所述初始高频带信号进行 修正。
S404: 合成当前帧的窄频带时域信号和所述修正的高频带时域信号并输出。
上述实施例中, 在宽频带向窄频带切换时, 根据谱倾斜参数和帧间相关性获得高频带信号的时域全局增益参数, 用窄频带的谱倾斜参数能相对比较准确地估计出窄频带信号和高频带信号间的能量关系, 进而更好地估计出高频带信号的能量; 用帧间相关性, 可以很好地利用窄频带帧间的相关性, 估计出高频带信号的帧间相关性, 进而在加权求高频带的全局增益时, 既可以很好地利用前面真实的信息, 又不会引入不好的噪声。利用时域全局增益参数对高频带信号进行修正, 使得宽频带和窄频带间高频带部分平稳地过渡, 有效地去除了宽频带和窄频带间切换时造成的听觉不舒适感。
与上述方法实施例相关联, 本发明还提供一种语音频信号处理装置, 该装置可以位于终端设备、网络设备或测试设备中。所述语音频信号处理装置可以由硬件电路来实现, 或者由软件配合硬件来实现。例如, 参考图 5, 由一个处理器调用语音频信号处理装置来实现语音频信号处理。该语音频信号处理装置可以执行上述方法实施例中的各种方法和流程。
参考图 6, 语音频信号处理装置的一个实施例, 包括:
获取单元 601, 用于当语音频信号出现带宽切换时, 获得当前帧语音频信 号对应的初始高频带信号;
参数获得单元 602, 用于获得所述初始高频带信号对应的时域全局增益参数;
加权处理单元 603, 用于将能量比值和该时域全局增益参数进行加权处理, 得到的加权值作为预测的全局增益参数; 其中, 能量比值为历史帧高频带时域信号能量与当前帧初始高频带信号能量的比值;
修正单元 604, 用于利用预测的全局增益参数对所述初始高频带信号进行 修正, 获得修正的高频带时域信号;
合成单元 605, 用于合成当前帧的窄频带时域信号和所述修正的高频带时 域信号并输出。
一个实施例中, 带宽切换为宽频带信号到窄频带信号的切换, 参数获得单元 602包括:
全局增益参数获得单元, 用于根据当前帧语音频信号的谱倾斜参数、 当前 帧语音频信号与历史帧窄频带信号的相关性获得所述高频带信号的时域全局 增益参数。
参考图 7, 另一个实施例中, 带宽切换为宽频带信号到窄频带信号的切换, 则参数获得单元 602包括:
时域包络获得单元 701 , 用于将预设一系列值作为当前帧语音频信号的高 频带时域包络参数;
全局增益参数获得单元 702 , 用于根据当前帧语音频信号的谱倾斜参数、 当前帧语音频信号与历史帧窄频带信号的相关性获得所述高频带信号的时域 全局增益参数。
则修正单元 604, 用于利用时域包络参数和预测的全局增益参数对所述初 始高频带信号进行修正, 获得修正的高频带时域信号。
参考图 8, 进一步的, 全局增益参数获得单元 702的一个实施例包括: 分类单元 801, 用于根据所述当前帧语音频信号的谱倾斜参数和当前帧语 音频信号与历史帧窄频带信号的相关性,将当前帧语音频信号分为第一类信号 或第二类信号;
第一限制单元 802, 如果当前帧语音频信号为第一类信号, 用于将谱倾斜参数限制到小于等于第一预定值, 得到谱倾斜参数限制值, 以所述谱倾斜参数限制值作为高频带信号的时域全局增益参数;
第二限制单元 803, 如果当前帧语音频信号为第二类信号, 用于将谱倾斜参数限制到属于第一区间值, 得到谱倾斜参数限制值, 以所述谱倾斜参数限制值作为高频带信号的时域全局增益参数。
进一步的,一个实施例中, 第一类信号为摩擦音信号, 第二类信号为非摩 擦音信号; 当谱倾斜参数 tilt>5且相关性参数 cor小于一给定值时, 将窄频带信 号分成摩擦音; 其他的为非摩擦音; 所述第一预定值为 8; 第一预定区间为
[0.5,1]。
参考图 9, 一个实施例中, 获取单元 601包括:
激励信号获得单元 901, 用于根据当前帧语音频信号预测高频带信号激励 信号;
LPC系数获得单元 902, 用于预测高频带信号的 LPC系数; 生成单元 903, 用于合成高频带信号激励信号和高频带信号的 LPC系数, 获得所述预测高频带信号。
一个实施例中, 该带宽切换为窄频带信号到宽频带信号的切换, 则该语音 频信号处理装置还包括:
加权因子设置单元, 如果当前音频帧与前一帧语音频信号的窄带信号具有预定相关性时, 用于将前一帧语音频信号对应的所述能量比值的加权因子 alfa按一定的步长衰减后的值作为当前音频帧对应的所述能量比值的加权因子, 逐帧衰减直到 alfa为 0。
参考图 10 , 语音频信号处理装置的另一个实施例, 包括:
预测单元 1001 , 当语音频信号从宽频带信号到窄频带信号的切换时,用于 获得当前帧语音频信号对应的初始高频带信号;
参数获得单元 1002,用于根据当前帧语音频信号的谱倾斜参数、 当前帧窄 频带信号与历史帧窄频带信号的相关性获得所述高频带信号的时域全局增益 参数;
修正单元 1003 ,用于利用预测的全局增益参数对所述初始高频带信号进行 修正, 获得修正的高频带时域信号;
合成单元 1004 ,用于合成当前帧的窄频带时域信号和所述修正的高频带时 域信号并输出。
参考图 8, 参数获得单元 1002包括:
分类单元 801, 用于根据所述当前帧语音频信号的谱倾斜参数和当前帧语音频信号与历史帧窄频带信号的相关性, 将当前帧语音频信号分为第一类信号或第二类信号;
第一限制单元 802, 如果当前帧语音频信号为第一类信号, 用于将谱倾斜参数限制到小于等于第一预定值, 得到谱倾斜参数限制值, 以所述谱倾斜参数限制值作为高频带信号的时域全局增益参数;
第二限制单元 803, 如果当前帧语音频信号为第二类信号, 用于将谱倾斜参数限制到属于第一区间值, 得到谱倾斜参数限制值, 以所述谱倾斜参数限制值作为高频带信号的时域全局增益参数。
进一步的, 一个实施例中, 第一类信号为摩擦音信号, 第二类信号为非摩擦音信号; 当谱倾斜参数 tilt>5且相关性参数 cor小于一给定值时, 将窄频带信号分成摩擦音; 其他的为非摩擦音; 其中, 第一预定值为 8; 第一预定区间为 [0.5,1]。
可选的, 一个实施例中, 语音频信号处理装置还包括:
加权处理单元, 用于将能量比值和所述时域全局增益参数进行加权处理, 得到的加权值作为预测的全局增益参数, 其中, 能量比值为历史帧高频带时域 信号能量与当前帧初始高频带信号能量的比值;
所述修正单元用于利用预测的全局增益参数对所述初始高频带信号进行 修正, 获得修正的高频带时域信号。
另一个实施例中, 参数获得单元还用于获得所述初始高频带信号对应的时域包络参数; 则修正单元用于利用所述时域包络参数和时域全局增益参数对所述初始高频带信号进行修正。
本领域普通技术人员可以理解, 实现上述实施例方法中的全部或部分流程, 是可以通过计算机程序来指令相关的硬件来完成, 所述的程序可存储于一计算机可读取存储介质中, 该程序在执行时, 可包括如上述各方法的实施例的流程。其中, 所述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory, ROM)或随机存储记忆体(Random Access Memory, RAM)等。
以上所述仅为本发明的几个实施例, 本领域的技术人员依据申请文件公开的内容可以对本发明进行各种改动或变型而不脱离本发明的精神和范围。

Claims

权 利 要 求
1、 一种语音频信号处理方法, 其特征在于, 包括:
语音频信号从宽频带信号到窄频带信号的切换时,获得当前帧语音频信号 对应的初始高频带信号;
根据当前帧语音频信号的谱倾斜参数、当前帧窄频带信号与历史帧窄频带信号的相关性获得所述高频带信号的时域全局增益参数;
利用所述时域全局增益参数对所述初始高频带信号进行修正,获得修正的 高频带时域信号;
合成当前帧的窄频带时域信号和所述修正的高频带时域信号并输出。
2、 根据权利要求 1所述的方法, 其特征在于, 所述根据当前帧语音频信号的谱倾斜参数、当前帧窄频带信号与历史帧窄频带信号的相关性获得所述高频带信号的时域全局增益参数包括:
根据所述当前帧语音频信号的谱倾斜参数和当前帧窄频带信号与历史帧窄频带信号的相关性, 将当前帧语音频信号分为第一类信号或第二类信号;
如果当前帧语音频信号为第一类信号, 则将谱倾斜参数限制到小于等于第一预定值, 得到谱倾斜参数限制值;
如果当前帧语音频信号为第二类信号, 则将谱倾斜参数限制到属于第一区间值, 得到谱倾斜参数限制值;
以所述谱倾斜参数限制值作为高频带信号的时域全局增益参数。
3、 根据权利要求 2所述的方法, 其特征在于, 所述第一类信号为摩擦音 信号, 第二类信号为非摩擦音信号; 当谱倾斜参数 tilt>5且相关性参数 cor小 于一给定值时, 将窄频带信号分成摩擦音; 其他的为非摩擦音; 所述第一预定 值为 8; 第一预定区间为 [0.5,1]。
4、 根据权利要求 1-3所述的任一方法, 其特征在于, 利用所述时域全局 增益参数对所述初始高频带信号进行修正, 获得修正的高频带时域信号包括: 将能量比值和所述时域全局增益参数进行加权处理,得到的加权值作为预 测的全局增益参数, 其中, 能量比值为历史帧高频带时域信号能量与当前帧初 始高频带信号能量的比值;
利用预测的全局增益参数对所述初始高频带信号进行修正。
5、 根据权利要求 1-3所述的任一方法, 其特征在于, 还包括: 获得所述初始高频带信号对应的时域包络参数;
其中, 利用时域全局增益参数对所述初始高频带信号进行修正包括: 利用所述时域包络参数和时域全局增益参数对所述初始高频带信号进行 修正。
6、 一种语音频信号处理方法, 其特征在于, 包括:
当语音频信号出现带宽切换时,获得当前帧语音频信号对应的初始高频带 信号;
获得所述初始高频带信号对应的时域全局增益参数;
将能量比值和所述时域全局增益参数进行加权处理,得到的加权值作为预 测的全局增益参数, 其中, 能量比值为历史帧高频带时域信号能量与当前帧初 始高频带信号能量的比值;
利用预测的全局增益参数对所述初始高频带信号进行修正,获得修正的高 频带时域信号;
合成当前帧的窄频带时域信号和所述修正的高频带时域信号并输出。
7、 根据权利要求 6所述的方法, 其特征在于, 所述带宽切换为宽频带信 号到窄频带信号的切换, 所述获得所述初始高频带信号对应的全局增益参数, 包括:
根据当前帧语音频信号的谱倾斜参数、当前帧窄频带信号与历史帧窄频带信号的相关性获得所述高频带信号的时域全局增益参数。
8、 根据权利要求 7所述的方法, 其特征在于, 所述根据当前帧语音频信 号的谱倾斜参数、当前帧窄频带信号与历史帧窄频带信号的相关性获得所述高 频带信号的时域全局增益参数包括:
根据所述当前帧语音频信号的谱倾斜参数和当前帧窄频带信号与历史帧窄频带信号的相关性, 将当前帧语音频信号分为第一类信号或第二类信号;
如果当前帧语音频信号为第一类信号, 则将谱倾斜参数限制到小于等于第一预定值, 得到谱倾斜参数限制值;
如果当前帧语音频信号为第二类信号,则将谱倾斜参数限制到属于第一区 间值, 得到谱倾斜参数限制值;
以所述谱倾斜参数限制值作为高频带信号的时域全局增益参数。
9、 根据权利要求 8所述的方法, 其特征在于, 所述第一类信号为摩擦音 信号, 第二类信号为非摩擦音信号; 当谱倾斜参数 tilt>5且相关性参数 cor小 于一给定值时, 将窄频带信号分成摩擦音; 其他的为非摩擦音; 所述第一预定 值为 8; 第一预定区间为 [0.5,1]。
10、根据权利要求 6所述的方法, 其特征在于, 所述带宽切换为宽频带信 号到窄频带信号的切换,所述获得当前帧语音频信号对应的初始高频带信号包 括:
根据当前帧语音频信号预测高频带激励信号;
预测高频带信号的 LPC系数;
合成高频带激励信号和高频带信号的 LPC系数, 获得所述预测高频带信 号。
11、根据权利要求 6所述的方法, 其特征在于, 所述带宽切换为窄频带信 号到宽频带信号的切换, 所述方法还包括:
如果当前帧与前一帧语音频信号的窄带信号具有预定相关性时, 则对前一帧语音频信号对应的所述能量比值的加权因子 alfa按一定的步长衰减后的值作为当前音频帧对应的所述能量比值的加权因子, 逐帧衰减直到 alfa为 0。
12、 一种语音频信号处理装置, 其特征在于, 包括:
预测单元, 当语音频信号从宽频带信号到窄频带信号的切换时, 用于获得 当前帧语音频信号对应的初始高频带信号;
参数获得单元, 用于根据当前帧语音频信号的谱倾斜参数、 当前帧窄频带 信号与历史帧窄频带信号的相关性获得所述高频带信号的时域全局增益参数; 修正单元, 用于利用预测的全局增益参数对所述初始高频带信号进行修 正, 获得修正的高频带时域信号;
合成单元,用于合成当前帧的窄频带时域信号和所述修正的高频带时域信 号并输出。
13、根据权利要求 12所述的装置, 其特征在于, 所述参数获得单元包括: 分类单元, 用于根据所述当前帧语音频信号的谱倾斜参数和当前帧语音频信号与历史帧窄频带信号的相关性, 将当前帧语音频信号分为第一类信号或第二类信号;
第一限制单元, 如果当前帧语音频信号为第一类信号, 用于将谱倾斜参数限制到小于等于第一预定值, 得到谱倾斜参数限制值, 以所述谱倾斜参数限制值作为高频带信号的时域全局增益参数;
第二限制单元, 如果当前帧语音频信号为第二类信号, 用于将谱倾斜参数 限制到属于第一区间值,得到谱倾斜参数限制值, 以所述谱倾斜参数限制值作 为高频带信号的时域全局增益参数。
14、 根据权利要求 13所述的装置, 其特征在于, 所述第一类信号为摩擦 音信号, 第二类信号为非摩擦音信号; 当谱倾斜参数 tilt>5且相关性参数 cor 小于一给定值时, 将窄频带信号分成摩擦音; 其他的为非摩擦音; 所述第一预 定值为 8; 第一预定区间为 [0.5,1]。
15、 根据权利要求 12-14所述的任一装置, 其特征在于, 还包括: 加权处理单元, 用于将能量比值和所述时域全局增益参数进行加权处理, 得到的加权值作为预测的全局增益参数, 其中, 能量比值为历史帧高频带时域 信号能量与当前帧初始高频带信号能量的比值;
所述修正单元用于利用预测的全局增益参数对所述初始高频带信号进行 修正, 获得修正的高频带时域信号。
16、 根据权利要求 12-14所述的任一装置, 其特征在于,
所述参数获得单元还用于获得所述初始高频带信号对应的时域包络参数; 所述修正单元用于利用所述时域包络参数和时域全局增益参数对所述初 始高频带信号进行修正。
17、 一种语音频信号处理装置, 其特征在于, 包括:
获取单元, 用于当语音频信号出现带宽切换时,获得当前帧语音频信号对 应的初始高频带信号;
参数获得单元, 用于获得所述初始高频带信号对应的时域全局增益参数; 加权处理单元, 用于将能量比值和所述时域全局增益参数进行加权处理, 得到的加权值作为预测的全局增益参数; 其中, 能量比值为历史帧高频带时域 信号能量与当前帧初始高频带信号能量的比值;
修正单元, 用于利用预测的全局增益参数对所述初始高频带信号进行修 正, 获得修正的高频带时域信号;
合成单元,用于合成当前帧的窄频带时域信号和所述修正的高频带时域信 号并输出。
18、 根据权利要求 17所述的装置, 其特征在于, 所述带宽切换为宽频带 信号到窄频带信号的切换, 所述参数获得单元包括:
全局增益参数获得单元, 用于根据当前帧语音频信号的谱倾斜参数、 当前 帧语音频信号与历史帧窄频带信号的相关性获得所述高频带信号的时域全局 增益参数。
19、 根据权利要求 18所述的装置, 其特征在于, 所述全局增益参数获得 单元包括:
分类单元,用于根据所述当前帧语音频信号的谱倾斜参数和当前帧语音频 信号与历史帧窄频带信号的相关性,将当前帧语音频信号分为第一类信号或第 二类信号;
第一限制单元, 如果当前帧语音频信号为第一类信号, 用于将谱倾斜参数限制到小于等于第一预定值, 得到谱倾斜参数限制值, 以所述谱倾斜参数限制值作为高频带信号的时域全局增益参数;
第二限制单元, 如果当前帧语音频信号为第二类信号, 用于将谱倾斜参数 限制到属于第一区间值,得到谱倾斜参数限制值, 以所述谱倾斜参数限制值作 为高频带信号的时域全局增益参数。
20、 根据权利要求 19所述的装置, 其特征在于, 所述第一类信号为摩擦 音信号, 第二类信号为非摩擦音信号; 当谱倾斜参数 tilt>5且相关性参数 cor 小于一给定值时, 将窄频带信号分成摩擦音; 其他的为非摩擦音; 所述第一预 定值为 8; 第一预定区间为 [0.5,1]。
21、 根据权利要求 17-20所述的任一装置, 其特征在于, 所述带宽切换为 窄频带信号到宽频带信号的切换, 所述装置还包括:
时域包络获得单元,用于将预设一系列值作为当前帧语音频信号的高频带 时域包络参数;
所述修正单元,用于利用时域包络参数和预测的全局增益参数对所述初始 高频带信号进行修正, 获得修正的高频带时域信号。
22、 根据权利要求 17-20所述的任一装置, 其特征在于, 所述获取单元包 括:
激励信号获得单元, 用于根据当前帧语音频信号预测高频带信号激励信 号;
LPC系数获得单元, 用于预测高频带信号的 LPC系数; 合成单元, 用于合成高频带信号激励信号和高频带信号的 LPC 系数, 获 得所述预测高频带信号。
23、 根据权利要求 17-20所述的任一装置, 其特征在于, 所述带宽切换为 窄频带信号到宽频带信号的切换, 所述装置还包括:
加权因子设置单元,如果当前音频帧与前一帧语音频信号的窄带信号具有 预定相关性时, 用于对前一帧语音频信号对应的所述能量比值的加权因子 alfa 按一定的步长衰减后的值作为当前音频帧对应的所述能量比值的加权因子,逐 帧衰减直到 alfa为 0。
PCT/CN2013/072075 2012-03-01 2013-03-01 一种语音频信号处理方法和装置 WO2013127364A1 (zh)

Priority Applications (21)

Application Number Priority Date Filing Date Title
IN1739KON2014 IN2014KN01739A (zh) 2012-03-01 2013-03-01
EP16187948.1A EP3193331B1 (en) 2012-03-01 2013-03-01 Speech/audio signal processing method and apparatus
ES13754564.6T ES2629135T3 (es) 2012-03-01 2013-03-01 Procedimiento y dispositivo de procesamiento de señales de frecuencia de voz
MX2014010376A MX345604B (es) 2012-03-01 2013-03-01 Metodo y aparato de procesamiento de señal de voz/audio.
KR1020177002148A KR101844199B1 (ko) 2012-03-01 2013-03-01 음성 주파수 신호 처리 방법 및 장치
KR1020167028242A KR101702281B1 (ko) 2012-03-01 2013-03-01 음성 주파수 신호 처리 방법 및 장치
CA2865533A CA2865533C (en) 2012-03-01 2013-03-01 Speech/audio signal processing method and apparatus
MX2017001662A MX364202B (es) 2012-03-01 2013-03-01 Metodo y aparato de procesamiento de señal de voz/audio.
EP13754564.6A EP2821993B1 (en) 2012-03-01 2013-03-01 Voice frequency signal processing method and device
JP2014559077A JP6010141B2 (ja) 2012-03-01 2013-03-01 音声/オーディオ信号処理方法および装置
RU2014139605/08A RU2585987C2 (ru) 2012-03-01 2013-03-01 Устройство и способ обработки речевого/аудио сигнала
SG11201404954WA SG11201404954WA (en) 2012-03-01 2013-03-01 Speech/audio signal processing method and apparatus
BR112014021407-7A BR112014021407B1 (pt) 2012-03-01 2013-03-01 método de processamento de sinal de voz/áudio e aparelho
KR1020147025655A KR101667865B1 (ko) 2012-03-01 2013-03-01 음성 주파수 신호 처리 방법 및 장치
PL18199234T PL3534365T3 (pl) 2012-03-01 2013-03-01 Sposób i aparat do przetwarzania sygnału mowy/dźwięku
EP18199234.8A EP3534365B1 (en) 2012-03-01 2013-03-01 Speech/audio signal processing method and apparatus
ZA2014/06248A ZA201406248B (en) 2012-03-01 2014-08-25 Voice frequency signal processing method and device
US14/470,559 US9691396B2 (en) 2012-03-01 2014-08-27 Speech/audio signal processing method and apparatus
US15/616,188 US10013987B2 (en) 2012-03-01 2017-06-07 Speech/audio signal processing method and apparatus
US16/021,621 US10360917B2 (en) 2012-03-01 2018-06-28 Speech/audio signal processing method and apparatus
US16/457,165 US10559313B2 (en) 2012-03-01 2019-06-28 Speech/audio signal processing method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210051672.6A CN103295578B (zh) 2012-03-01 2012-03-01 一种语音频信号处理方法和装置
CN201210051672.6 2012-03-01

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/470,559 Continuation US9691396B2 (en) 2012-03-01 2014-08-27 Speech/audio signal processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2013127364A1 true WO2013127364A1 (zh) 2013-09-06

Family

ID=49081655

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/072075 WO2013127364A1 (zh) 2012-03-01 2013-03-01 一种语音频信号处理方法和装置

Country Status (20)

Country Link
US (4) US9691396B2 (zh)
EP (3) EP3534365B1 (zh)
JP (3) JP6010141B2 (zh)
KR (3) KR101844199B1 (zh)
CN (2) CN103295578B (zh)
BR (1) BR112014021407B1 (zh)
CA (1) CA2865533C (zh)
DK (1) DK3534365T3 (zh)
ES (3) ES2867537T3 (zh)
HU (1) HUE053834T2 (zh)
IN (1) IN2014KN01739A (zh)
MX (2) MX345604B (zh)
MY (1) MY162423A (zh)
PL (1) PL3534365T3 (zh)
PT (2) PT3193331T (zh)
RU (2) RU2585987C2 (zh)
SG (2) SG11201404954WA (zh)
TR (1) TR201911006T4 (zh)
WO (1) WO2013127364A1 (zh)
ZA (1) ZA201406248B (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105814631A (zh) * 2013-12-15 2016-07-27 高通股份有限公司 盲带宽扩展系统和方法
RU2644123C2 (ru) * 2013-10-18 2018-02-07 Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. Принцип для кодирования аудиосигнала и декодирования аудиосигнала с использованием детерминированной и шумоподобной информации
US10373625B2 (en) 2013-10-18 2019-08-06 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Concept for encoding an audio signal and decoding an audio signal using speech related spectral shaping information
CN112927709A (zh) * 2021-02-04 2021-06-08 武汉大学 一种基于时频域联合损失函数的语音增强方法

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103295578B (zh) 2012-03-01 2016-05-18 华为技术有限公司 一种语音频信号处理方法和装置
CN108364657B (zh) 2013-07-16 2020-10-30 超清编解码有限公司 处理丢失帧的方法和解码器
CN104517610B (zh) * 2013-09-26 2018-03-06 华为技术有限公司 频带扩展的方法及装置
KR101864122B1 (ko) 2014-02-20 2018-06-05 삼성전자주식회사 전자 장치 및 전자 장치의 제어 방법
CN106683681B (zh) 2014-06-25 2020-09-25 华为技术有限公司 处理丢失帧的方法和装置
WO2019002831A1 (en) 2017-06-27 2019-01-03 Cirrus Logic International Semiconductor Limited REPRODUCTIVE ATTACK DETECTION
GB201713697D0 (en) 2017-06-28 2017-10-11 Cirrus Logic Int Semiconductor Ltd Magnetic detection of replay attack
GB2563953A (en) 2017-06-28 2019-01-02 Cirrus Logic Int Semiconductor Ltd Detection of replay attack
GB201801527D0 (en) 2017-07-07 2018-03-14 Cirrus Logic Int Semiconductor Ltd Method, apparatus and systems for biometric processes
GB201801526D0 (en) 2017-07-07 2018-03-14 Cirrus Logic Int Semiconductor Ltd Methods, apparatus and systems for authentication
GB201801530D0 (en) 2017-07-07 2018-03-14 Cirrus Logic Int Semiconductor Ltd Methods, apparatus and systems for authentication
GB201801532D0 (en) 2017-07-07 2018-03-14 Cirrus Logic Int Semiconductor Ltd Methods, apparatus and systems for audio playback
GB201801528D0 (en) 2017-07-07 2018-03-14 Cirrus Logic Int Semiconductor Ltd Method, apparatus and systems for biometric processes
GB201801663D0 (en) 2017-10-13 2018-03-21 Cirrus Logic Int Semiconductor Ltd Detection of liveness
GB201803570D0 (en) 2017-10-13 2018-04-18 Cirrus Logic Int Semiconductor Ltd Detection of replay attack
GB201719734D0 (en) * 2017-10-30 2018-01-10 Cirrus Logic Int Semiconductor Ltd Speaker identification
GB2567503A (en) * 2017-10-13 2019-04-17 Cirrus Logic Int Semiconductor Ltd Analysing speech signals
GB201801664D0 (en) 2017-10-13 2018-03-21 Cirrus Logic Int Semiconductor Ltd Detection of liveness
GB201801874D0 (en) 2017-10-13 2018-03-21 Cirrus Logic Int Semiconductor Ltd Improving robustness of speech processing system against ultrasound and dolphin attacks
GB201804843D0 (en) 2017-11-14 2018-05-09 Cirrus Logic Int Semiconductor Ltd Detection of replay attack
GB201801659D0 (en) 2017-11-14 2018-03-21 Cirrus Logic Int Semiconductor Ltd Detection of loudspeaker playback
US11475899B2 (en) 2018-01-23 2022-10-18 Cirrus Logic, Inc. Speaker identification
US11264037B2 (en) 2018-01-23 2022-03-01 Cirrus Logic, Inc. Speaker identification
US11735189B2 (en) 2018-01-23 2023-08-22 Cirrus Logic, Inc. Speaker identification
US10692490B2 (en) 2018-07-31 2020-06-23 Cirrus Logic, Inc. Detection of replay attack
US10915614B2 (en) 2018-08-31 2021-02-09 Cirrus Logic, Inc. Biometric authentication
US11037574B2 (en) 2018-09-05 2021-06-15 Cirrus Logic, Inc. Speaker recognition and speaker change detection
CN115294947A (zh) * 2022-07-29 2022-11-04 腾讯科技(深圳)有限公司 音频数据处理方法、装置、电子设备及介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101335002A (zh) * 2007-11-02 2008-12-31 华为技术有限公司 一种音频解码的方法和装置
CN101499278A (zh) * 2008-02-01 2009-08-05 华为技术有限公司 音频信号切换处理方法和装置
CN101751925A (zh) * 2008-12-10 2010-06-23 华为技术有限公司 一种语音解码方法及装置
CN101964189A (zh) * 2010-04-28 2011-02-02 华为技术有限公司 语音频信号切换方法及装置

Family Cites Families (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2252170A1 (en) 1998-10-27 2000-04-27 Bruno Bessette A method and device for high quality coding of wideband speech and audio signals
DE60040146D1 (de) 1999-04-26 2008-10-16 Lucent Technologies Inc Pfadumschaltung im bezug auf übertragungsbedarf
CA2290037A1 (en) * 1999-11-18 2001-05-18 Voiceage Corporation Gain-smoothing amplifier device and method in codecs for wideband speech and audio signals
US6606591B1 (en) 2000-04-13 2003-08-12 Conexant Systems, Inc. Speech coding employing hybrid linear prediction coding
US7113522B2 (en) 2001-01-24 2006-09-26 Qualcomm, Incorporated Enhanced conversion of wideband signals to narrowband signals
JP2003044098A (ja) 2001-07-26 2003-02-14 Nec Corp 音声帯域拡張装置及び音声帯域拡張方法
WO2006028009A1 (ja) * 2004-09-06 2006-03-16 Matsushita Electric Industrial Co., Ltd. スケーラブル復号化装置および信号消失補償方法
EP1898397B1 (en) 2005-06-29 2009-10-21 Panasonic Corporation Scalable decoder and disappeared data interpolating method
JP2009524101A (ja) 2006-01-18 2009-06-25 エルジー エレクトロニクス インコーポレイティド 符号化/復号化装置及び方法
RU2414009C2 (ru) * 2006-01-18 2011-03-10 ЭлДжи ЭЛЕКТРОНИКС ИНК. Устройство и способ для кодирования и декодирования сигнала
US9454974B2 (en) * 2006-07-31 2016-09-27 Qualcomm Incorporated Systems, methods, and apparatus for gain factor limiting
GB2444757B (en) 2006-12-13 2009-04-22 Motorola Inc Code excited linear prediction speech coding
JP4733727B2 (ja) 2007-10-30 2011-07-27 日本電信電話株式会社 音声楽音擬似広帯域化装置と音声楽音擬似広帯域化方法、及びそのプログラムとその記録媒体
BRPI0818927A2 (pt) * 2007-11-02 2015-06-16 Huawei Tech Co Ltd Método e aparelho para a decodificação de áudio
KR100930061B1 (ko) * 2008-01-22 2009-12-08 성균관대학교산학협력단 신호 검출 방법 및 장치
JP5448657B2 (ja) * 2009-09-04 2014-03-19 三菱重工業株式会社 空気調和機の室外機
US8484020B2 (en) 2009-10-23 2013-07-09 Qualcomm Incorporated Determining an upperband signal from a narrowband signal
CN102044250B (zh) * 2009-10-23 2012-06-27 华为技术有限公司 频带扩展方法及装置
JP5287685B2 (ja) * 2009-11-30 2013-09-11 ダイキン工業株式会社 空調室外機
US8000968B1 (en) * 2011-04-26 2011-08-16 Huawei Technologies Co., Ltd. Method and apparatus for switching speech or audio signals
AR085895A1 (es) * 2011-02-14 2013-11-06 Fraunhofer Ges Forschung Generacion de ruido en codecs de audio
CN103295578B (zh) * 2012-03-01 2016-05-18 华为技术有限公司 一种语音频信号处理方法和装置

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101335002A (zh) * 2007-11-02 2008-12-31 华为技术有限公司 一种音频解码的方法和装置
CN101499278A (zh) * 2008-02-01 2009-08-05 华为技术有限公司 音频信号切换处理方法和装置
CN101751925A (zh) * 2008-12-10 2010-06-23 华为技术有限公司 一种语音解码方法及装置
CN101964189A (zh) * 2010-04-28 2011-02-02 华为技术有限公司 语音频信号切换方法及装置

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2644123C2 (ru) * 2013-10-18 2018-02-07 Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. Принцип для кодирования аудиосигнала и декодирования аудиосигнала с использованием детерминированной и шумоподобной информации
US10304470B2 (en) 2013-10-18 2019-05-28 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Concept for encoding an audio signal and decoding an audio signal using deterministic and noise like information
US10373625B2 (en) 2013-10-18 2019-08-06 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Concept for encoding an audio signal and decoding an audio signal using speech related spectral shaping information
US10607619B2 (en) 2013-10-18 2020-03-31 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Concept for encoding an audio signal and decoding an audio signal using deterministic and noise like information
US10909997B2 (en) 2013-10-18 2021-02-02 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Concept for encoding an audio signal and decoding an audio signal using speech related spectral shaping information
US11798570B2 (en) 2013-10-18 2023-10-24 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Concept for encoding an audio signal and decoding an audio signal using deterministic and noise like information
US11881228B2 (en) 2013-10-18 2024-01-23 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E. V. Concept for encoding an audio signal and decoding an audio signal using speech related spectral shaping information
CN105814631A (zh) * 2013-12-15 2016-07-27 高通股份有限公司 盲带宽扩展系统和方法
CN112927709A (zh) * 2021-02-04 2021-06-08 武汉大学 一种基于时频域联合损失函数的语音增强方法
CN112927709B (zh) * 2021-02-04 2022-06-14 武汉大学 一种基于时频域联合损失函数的语音增强方法

Also Published As

Publication number Publication date
DK3534365T3 (da) 2021-04-12
ES2867537T3 (es) 2021-10-20
BR112014021407A2 (pt) 2019-04-16
CN103295578B (zh) 2016-05-18
CA2865533A1 (en) 2013-09-06
US10360917B2 (en) 2019-07-23
KR20170013405A (ko) 2017-02-06
EP3534365A1 (en) 2019-09-04
ES2741849T3 (es) 2020-02-12
SG10201608440XA (en) 2016-11-29
EP2821993B1 (en) 2017-05-10
MX364202B (es) 2019-04-16
SG11201404954WA (en) 2014-10-30
JP6378274B2 (ja) 2018-08-22
US10013987B2 (en) 2018-07-03
PT3193331T (pt) 2019-08-27
US20180374488A1 (en) 2018-12-27
PL3534365T3 (pl) 2021-07-12
JP2017027068A (ja) 2017-02-02
MX345604B (es) 2017-02-03
US20150006163A1 (en) 2015-01-01
RU2585987C2 (ru) 2016-06-10
US10559313B2 (en) 2020-02-11
CA2865533C (en) 2017-11-07
JP2018197869A (ja) 2018-12-13
CN105469805B (zh) 2018-01-12
EP2821993A1 (en) 2015-01-07
HUE053834T2 (hu) 2021-07-28
KR101667865B1 (ko) 2016-10-19
KR101844199B1 (ko) 2018-03-30
MY162423A (en) 2017-06-15
IN2014KN01739A (zh) 2015-10-23
KR20160121612A (ko) 2016-10-19
BR112014021407B1 (pt) 2019-11-12
KR101702281B1 (ko) 2017-02-03
EP3193331A1 (en) 2017-07-19
US20170270933A1 (en) 2017-09-21
RU2014139605A (ru) 2016-04-20
EP2821993A4 (en) 2015-02-25
JP6010141B2 (ja) 2016-10-19
JP6558748B2 (ja) 2019-08-14
JP2015512060A (ja) 2015-04-23
US20190318747A1 (en) 2019-10-17
US9691396B2 (en) 2017-06-27
EP3193331B1 (en) 2019-05-15
MX2014010376A (es) 2014-12-05
PT2821993T (pt) 2017-07-13
EP3534365B1 (en) 2021-01-27
ZA201406248B (en) 2016-01-27
TR201911006T4 (tr) 2019-08-21
ES2629135T3 (es) 2017-08-07
CN103295578A (zh) 2013-09-11
KR20140124004A (ko) 2014-10-23
CN105469805A (zh) 2016-04-06
RU2616557C1 (ru) 2017-04-17

Similar Documents

Publication Publication Date Title
JP6558748B2 (ja) 音声/オーディオ信号処理方法および装置
JP6892491B2 (ja) 会話/音声信号処理方法および符号化装置
JP2014507681A (ja) 帯域幅を拡張する方法および装置
CN105761724B (zh) 一种语音频信号处理方法和装置
JP5480226B2 (ja) 信号処理装置および信号処理方法
JP2010158044A (ja) 信号処理装置および信号処理方法
JP2010160496A (ja) 信号処理装置および信号処理方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13754564

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2865533

Country of ref document: CA

REEP Request for entry into the european phase

Ref document number: 2013754564

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2013754564

Country of ref document: EP

Ref document number: MX/A/2014/010376

Country of ref document: MX

ENP Entry into the national phase

Ref document number: 2014559077

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 20147025655

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2014139605

Country of ref document: RU

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: IDP00201405965

Country of ref document: ID

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112014021407

Country of ref document: BR

ENP Entry into the national phase

Ref document number: 112014021407

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20140828

ENPC Correction to former announcement of entry into national phase, pct application did not enter into the national phase

Ref country code: BR

ENPC Correction to former announcement of entry into national phase, pct application did not enter into the national phase

Ref country code: BR

REG Reference to national code

Ref country code: BR

Ref legal event code: B01E

Ref document number: 112014021407

Country of ref document: BR

Kind code of ref document: A8

ENP Entry into the national phase

Ref document number: 112014021407

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20140828