WO2014190649A1 - Signal decoding method and device - Google Patents

Signal decoding method and device

Info

Publication number
WO2014190649A1
Authority
WO
WIPO (PCT)
Prior art keywords
frequency band
band
energy
amplitude
extended
Prior art date
Application number
PCT/CN2013/084514
Other languages
English (en)
French (fr)
Inventor
刘泽新
苗磊
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP13886051.5A (patent EP2991074B1)
Publication of WO2014190649A1
Priority to US14/952,902 (patent US9892739B2)
Priority to US15/894,517 (patent US10490199B2)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0204 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/002 Dynamic bit allocation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/06 Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/038 Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/10 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a multipulse excitation
    • G10L19/107 Sparse pulse excitation, e.g. by using algebraic codebook
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/24 Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding

Definitions

  • The present invention relates to the field of information technology, and in particular to a signal decoding method and device.
  • During signal coding at the encoding end, it is often desirable to represent the signal to be transmitted with as few coding bits as possible in order to improve coding efficiency. For example, at low bit rates the encoder often does not encode all frequency bands. Because the human ear is more sensitive to the low-frequency part of a speech or audio signal than to the high-frequency part, more bits are usually allocated to the low-frequency part and only a few bits to the high-frequency part; in some cases the high-frequency part is not encoded at all. The decoding end therefore has to recover the uncoded frequency band with a blind bandwidth extension technique.
  • At present, the decoding end often uses a time-domain band extension method to recover the uncoded frequency band. This method extends speech signals poorly and cannot process audio signals, which degrades the quality of the output speech or audio signal.
  • A signal decoding method is provided, including: decoding a bit stream of a speech or audio signal to obtain a decoded signal; predicting an excitation signal of an extended frequency band according to the decoded signal, wherein the extended frequency band is adjacent to the frequency band of the decoded signal, and the frequency band of the decoded signal is lower than the extended frequency band; selecting a first frequency band and a second frequency band in the decoded signal, and predicting a spectral envelope of the extended frequency band according to the spectral coefficients of the first frequency band and the spectral coefficients of the second frequency band, wherein the distance from the highest frequency point of the first frequency band to the lowest frequency point of the extended frequency band is less than or equal to a first value, and the distance from the highest frequency point of the second frequency band to the lowest frequency point of the first frequency band is less than or equal to a second value; and determining a frequency-domain signal of the extended frequency band according to the spectral envelope of the extended frequency band and the excitation signal of the extended frequency band.
  • Selecting the first frequency band and the second frequency band in the decoded signal includes: selecting the first frequency band and the second frequency band from the frequency band of the decoded signal in the direction from the starting point of the extended frequency band toward low frequency, wherein the distance from the highest frequency point of the first frequency band to the lowest frequency point of the extended frequency band is equal to the first value, and the first value is 0; and the distance from the highest frequency point of the second frequency band to the lowest frequency point of the first frequency band is equal to the second value, and the second value is 0.
  • Predicting the spectral envelope of the extended frequency band according to the spectral coefficients of the first frequency band and the spectral coefficients of the second frequency band includes: dividing the first frequency band into M sub-bands and determining the mean of the energy or amplitude of each sub-band according to the spectral coefficients of the first frequency band, where M is a positive integer; determining an adjustment value of the energy or amplitude of each sub-band according to the mean of the energy or amplitude of each sub-band; predicting a first spectral envelope of the extended frequency band according to the adjustment values of the energy or amplitude of the sub-bands; determining the mean of the energy or amplitude of the second frequency band according to the spectral coefficients of the second frequency band; and predicting the spectral envelope of the extended frequency band according to the first spectral envelope of the extended frequency band and the mean of the energy or amplitude of the second frequency band.
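The first step above (per-sub-band mean of energy or amplitude) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `subband_means`, the choice of equal-width sub-bands, and the definition of energy as the squared coefficient are all assumptions made for the example.

```python
def subband_means(coeffs, m, use_energy=True):
    """Split `coeffs` into m equal-width sub-bands (an assumption; the text
    does not say the sub-bands are equal) and return the mean energy
    (squared coefficient) or mean amplitude (absolute value) of each."""
    if m <= 0 or len(coeffs) % m != 0:
        raise ValueError("m must be positive and divide len(coeffs)")
    width = len(coeffs) // m
    means = []
    for k in range(m):
        band = coeffs[k * width:(k + 1) * width]
        if use_energy:
            total = sum(c * c for c in band)
        else:
            total = sum(abs(c) for c in band)
        means.append(total / width)
    return means


# Hypothetical spectral coefficients of the first frequency band.
first_band = [1.0, -1.0, 2.0, 2.0, 0.5, -0.5, 3.0, 1.0]
print(subband_means(first_band, 4))         # [1.0, 4.0, 0.25, 5.0]
print(subband_means(first_band, 4, False))  # [1.0, 2.0, 0.5, 2.0]
```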
  • the mean of the energy or amplitude of the (i+1)th sub-band is used as the adjustment value of the energy or amplitude of the (i+1)th sub-band; when the mean of the energy or amplitude of the i-th sub-band is smaller than the mean of the energy or amplitude of the (i+1)th sub-band, the mean of the energy or amplitude of the (i+1)th sub-band is adjusted to determine the adjustment value of the energy or amplitude of the (i+1)th sub-band, and the mean of the energy or amplitude of the i-th sub-band is used as the adjustment value of the energy or amplitude of the i-th sub-band; if the ratio between the mean of the energy or amplitude of the i-th sub-band and the mean of the energy or amplitude of the (i+1)th sub-band is within a preset threshold range, the mean of the energy or amplitude of the i-th sub-band is taken as the energy or amplitude of
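The pairwise adjustment rule for neighbouring sub-bands can be sketched as below. The text only says the larger mean is "adjusted"; clamping it so that the ratio lands on the edge of the allowed range is an assumption, as are the function name `adjust_pair`, the threshold values `low` and `high`, and keeping both means unchanged when the ratio is already within range.

```python
def adjust_pair(mean_i, mean_next, low=0.5, high=2.0):
    """Return (adj_i, adj_next) for two neighbouring sub-band means.

    low/high bound the allowed ratio mean_i / mean_next; both values
    are hypothetical, the text only speaks of a preset threshold range.
    """
    ratio = mean_i / mean_next if mean_next > 0 else float("inf")
    if low <= ratio <= high:
        # Ratio within range: keep both means (assumed behaviour).
        return mean_i, mean_next
    if mean_i > mean_next:
        # i-th mean is too large: adjust it, keep the (i+1)th mean.
        return high * mean_next, mean_next
    # (i+1)th mean is too large: adjust it, keep the i-th mean.
    return mean_i, mean_i / low


print(adjust_pair(10.0, 1.0))  # (2.0, 1.0)
print(adjust_pair(1.0, 10.0))  # (1.0, 2.0)
print(adjust_pair(3.0, 2.0))   # (3.0, 2.0)
```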
  • Predicting the spectral envelope of the extended frequency band according to the first spectral envelope of the extended frequency band and the mean of the energy or amplitude of the second frequency band includes: determining a second spectral envelope of the extended frequency band of the current frame according to the first spectral envelope of the extended frequency band of the current frame and the mean of the energy or amplitude of the second frequency band of the current frame; if a preset condition is met, weighting the second spectral envelope of the extended frequency band of the current frame with the spectral envelope of the extended frequency band of the previous frame to determine the spectral envelope of the extended frequency band of the current frame; and if the preset condition is not met, using the second spectral envelope of the extended frequency band of the current frame as the spectral envelope of the extended frequency band of the current frame.
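The inter-frame weighting described above can be sketched as follows. The function name `smooth_envelope` and the weight of 0.5 are assumptions; the text states only that the two envelopes are weighted when the preset condition holds.

```python
def smooth_envelope(second_env, prev_env, condition_met, weight=0.5):
    """Return the spectral envelope of the extended band for the current frame.

    second_env: second spectral envelope of the current frame.
    prev_env:   spectral envelope of the extended band of the previous frame.
    weight:     hypothetical weight given to the current frame.
    """
    if not condition_met:
        # Preset condition not met: use the second envelope unchanged.
        return list(second_env)
    return [weight * cur + (1.0 - weight) * prev
            for cur, prev in zip(second_env, prev_env)]


print(smooth_envelope([4.0, 2.0], [2.0, 2.0], True))   # [3.0, 2.0]
print(smooth_envelope([4.0, 2.0], [2.0, 2.0], False))  # [4.0, 2.0]
```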
  • Predicting the spectral envelope of the extended frequency band according to the first spectral envelope of the extended frequency band and the mean of the energy or amplitude of the second frequency band includes: determining a second spectral envelope of the extended frequency band of the current frame according to the first spectral envelope of the extended frequency band of the current frame and the mean of the energy or amplitude of the second frequency band of the current frame; if a preset condition is met, weighting the second spectral envelope of the extended frequency band of the current frame with the spectral envelope of the extended frequency band of the previous frame to determine a third spectral envelope of the extended frequency band of the current frame; and if the preset condition is not met, using the second spectral envelope of the extended frequency band of the current frame as the third spectral envelope of the extended
  • The preset condition includes at least one of the following three conditions. Condition 1: the coding mode of the speech or audio signal of the current frame differs from the coding mode of the speech or audio signal of the previous frame. Condition 2: the decoded signal of the previous frame is a non-fricative sound, and the ratio between the mean of the energy or amplitude of the m-th frequency band in the decoded signal of the current frame and the mean of the energy or amplitude of the n-th frequency band in the decoded signal of the previous frame is within a preset threshold range, where m and n are positive integers. Condition 3: the decoded signal of the current frame is a non-fricative sound, and the ratio between the second spectral envelope of the extended frequency band of the current frame and the spectral envelope of the extended frequency band of the previous frame is greater than the ratio between the mean of the energy or amplitude of the
  • Predicting the excitation signal of the extended frequency band according to the decoded signal includes: when the coding mode of the speech or audio signal is a time-domain coding mode, selecting a third frequency band from the decoded signal, the third frequency band being adjacent to the extended frequency band; and predicting the excitation signal of the extended frequency band according to the spectral coefficients of the third frequency band.
  • Predicting the excitation signal of the extended frequency band according to the decoded signal includes: when the coding mode of the speech or audio signal is a time-frequency joint coding mode or a frequency-domain coding mode, selecting a fourth frequency band from the decoded signal, the number of bits allocated to the fourth frequency band being greater than a preset bit-number threshold; and predicting the excitation signal of the extended frequency band according to the spectral coefficients of the fourth frequency band.
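Choosing the fourth frequency band by bit allocation can be sketched as below. The text requires only that the band's bit allocation exceed a preset threshold; picking the highest-frequency band that qualifies is an assumption, and the function name `select_source_band` is hypothetical.

```python
def select_source_band(bits_per_band, bit_threshold):
    """Return the index of the highest-frequency band whose allocated
    bit count exceeds bit_threshold, or None if no band qualifies.

    bits_per_band is ordered from low to high frequency.
    """
    for idx in range(len(bits_per_band) - 1, -1, -1):
        if bits_per_band[idx] > bit_threshold:
            return idx
    return None


print(select_source_band([12, 9, 3, 0], 4))  # 1
print(select_source_band([2, 1, 0, 0], 4))   # None
```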
  • The method further includes: combining the decoded signal with the frequency-domain signal of the extended frequency band to obtain a frequency-domain output signal; and performing a frequency-time transform on the frequency-domain output signal to obtain a final output signal.
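The combine-then-transform step can be sketched as follows. The text does not name the frequency-time transform, so a DCT-IV is used here purely as a stand-in (it is its own inverse up to a factor of 2/N); the function names and the representation of both signals as plain coefficient lists are assumptions.

```python
import math


def dct_iv(x):
    """Unnormalised DCT-IV of a real sequence (O(N^2), for illustration)."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi / n * (t + 0.5) * (k + 0.5))
                for t in range(n))
            for k in range(n)]


def inverse_dct_iv(coeffs):
    """Inverse of dct_iv: the same transform scaled by 2/N."""
    n = len(coeffs)
    return [2.0 / n * v for v in dct_iv(coeffs)]


def synthesize(decoded_coeffs, extended_coeffs):
    """Place the extended-band coefficients above the decoded (low-band)
    coefficients, then transform the full spectrum to the time domain."""
    full_spectrum = list(decoded_coeffs) + list(extended_coeffs)
    return inverse_dct_iv(full_spectrum)


# Round trip: split a spectrum into low band and extended band, recombine.
signal = [1.0, 2.0, 3.0, 4.0]
spectrum = dct_iv(signal)
output = synthesize(spectrum[:2], spectrum[2:])
print([round(v, 6) for v in output])
```

In a real decoder the extended-band coefficients would come from the predicted spectral envelope and excitation signal rather than from the original spectrum; the round trip only checks that concatenation and the stand-in transform are consistent.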
  • The method further includes: when the coding mode of the speech or audio signal is the time-domain coding mode, acquiring a first time-domain signal of the extended frequency band in a time-domain band extension manner; transforming the frequency-domain signal of the extended frequency band into a second time-domain signal of the extended frequency band; synthesizing the first time-domain signal of the extended frequency band and the second time-domain signal of the extended frequency band to obtain a final time-domain signal of the extended frequency band; and synthesizing the decoded signal and the final time-domain signal of the extended frequency band to obtain the final output signal.
  • A signal decoding apparatus is provided, including: a decoding unit, configured to decode a bit stream of a speech or audio signal to obtain a decoded signal; a prediction unit, configured to receive the decoded signal from the decoding unit and predict an excitation signal of an extended frequency band according to the decoded signal, wherein the extended frequency band is adjacent to the frequency band of the decoded signal, and the frequency band of the decoded signal is lower than the extended frequency band; the prediction unit is further configured to select a first frequency band and a second frequency band in the decoded signal and predict a spectral envelope of the extended frequency band according to the spectral coefficients of the first frequency band and the spectral coefficients of the second frequency band, wherein the distance from the highest frequency point of the first frequency band to the lowest frequency point of the extended frequency band is less than or equal to a first value, and the distance from the highest frequency point of the second frequency band to the lowest frequency point of the first frequency band is less than or equal to a second value; and a determining unit, configured to receive the spectral envelope of the extended frequency band and the excitation signal of the extended frequency band from the prediction unit, and determine a frequency-domain signal of the extended frequency band according to the spectral envelope of the extended frequency band and the excitation signal of the extended frequency band.
  • The prediction unit is specifically configured to select the first frequency band and the second frequency band in the decoded signal in the direction from the starting point of the extended frequency band toward low frequency, wherein the distance from the highest frequency point of the first frequency band to the lowest frequency point of the extended frequency band is equal to the first value, and the first value is 0; and the distance from the highest frequency point of the second frequency band to the lowest frequency point of the first frequency band is equal to the second value, and the second value is 0.
  • The prediction unit is specifically configured to: divide the first frequency band into M sub-bands and determine the mean of the energy or amplitude of each sub-band according to the spectral coefficients of the first frequency band, where M is a positive integer; determine an adjustment value of the energy or amplitude of each sub-band according to the mean of the energy or amplitude of each sub-band; predict a first spectral envelope of the extended frequency band according to the adjustment values of the energy or amplitude of the sub-bands; determine the mean of the energy or amplitude of the second frequency band according to the spectral coefficients of the second frequency band; and predict the spectral envelope of the extended frequency band according to the first spectral envelope of the extended frequency band and the mean of the energy or amplitude of the second frequency band.
  • The prediction unit is specifically configured to: for an i-th sub-band and an (i+1)th sub-band of the M sub-bands, if the ratio between the mean of the energy or amplitude of the i-th sub-band and the mean of the energy or amplitude of the (i+1)th sub-band is not within a preset threshold range, then when the mean of the energy or amplitude of the i-th sub-band is greater than the mean of the energy or amplitude of the (i+1)th sub-band, adjust the mean of the energy or amplitude of the i-th sub-band to determine the adjustment value of the energy or amplitude of the i-th sub-band, and use the mean of the energy or amplitude of the (i+1)th sub-band as the adjustment value of the energy or amplitude of the (i+1)th sub-band; when the mean of the energy or amplitude of the i-th sub-band is less than the mean of the energy
  • The prediction unit is specifically configured to: determine a second spectral envelope of the extended frequency band of the current frame according to the first spectral envelope of the extended frequency band of the current frame and the mean of the energy or amplitude of the second frequency band of the current frame; if a preset condition is met, weight the second spectral envelope of the extended frequency band of the current frame with the spectral envelope of the extended frequency band of the previous frame to determine the spectral envelope of the extended frequency band of the current frame; and if the preset condition is not met, use the second spectral envelope of the extended frequency band of the current frame as the spectral envelope of the extended frequency band of the current frame.
  • The prediction unit is specifically configured to: determine a second spectral envelope of the extended frequency band of the current frame according to the first spectral envelope of the extended frequency band of the current frame and the mean of the energy or amplitude of the second frequency band of the current frame; if a preset condition is met, weight the second spectral envelope of the extended frequency band of the current frame with the spectral envelope of the extended frequency band of the previous frame to determine a third spectral envelope of the extended frequency band of the current frame; if the preset condition is not met, use the second spectral envelope of the extended frequency band of the current frame as the third spectral envelope of the extended frequency band of the current frame; and determine the spectral envelope of the extended frequency band of the current frame according to a pitch period of the decoded signal, a voicing factor of the decoded signal, and the third spectral envelope of the extended frequency band of the current frame.
  • The preset condition includes at least one of the following three conditions. Condition 1: the coding mode of the speech or audio signal of the current frame differs from the coding mode of the speech or audio signal of the previous frame. Condition 2: the decoded signal of the previous frame is a non-fricative sound, and the ratio between the mean of the energy or amplitude of the m-th frequency band in the decoded signal of the current frame and the mean of the energy or amplitude of the n-th frequency band in the decoded signal of the previous frame is within a preset threshold range, where m and n are positive integers. Condition 3: the decoded signal of the current frame is a non-fricative sound, and the ratio between the second spectral envelope of the extended frequency band of the current frame and the spectral envelope of the extended frequency band of the previous frame is greater than the ratio between the mean of the energy or amplitude of the
  • The prediction unit is specifically configured to: when the coding mode of the speech or audio signal is the time-domain coding mode, select a third frequency band from the decoded signal, the third frequency band being adjacent to the extended frequency band; and predict the excitation signal of the extended frequency band according to the spectral coefficients of the third frequency band.
  • The prediction unit is specifically configured to: when the coding mode of the speech or audio signal is a time-frequency joint coding mode or a frequency-domain coding mode, select a fourth frequency band from the decoded signal, the number of bits allocated to the fourth frequency band being greater than a preset bit-number threshold; and predict the excitation signal of the extended frequency band according to the spectral coefficients of the fourth frequency band.
  • The apparatus further includes: a first synthesizing unit, configured to combine the decoded signal with the frequency-domain signal of the extended frequency band to obtain a frequency-domain output signal; and a first transforming unit, configured to perform a frequency-time transform on the frequency-domain output signal to obtain a final output signal.
  • The apparatus further includes: an acquiring unit, configured to acquire, when the coding mode of the speech or audio signal is the time-domain coding mode, a first time-domain signal of the extended frequency band in a time-domain band extension manner; a second transforming unit, configured to transform the frequency-domain signal of the extended frequency band into a second time-domain signal of the extended frequency band; and a second synthesizing unit, configured to synthesize the first time-domain signal of the extended frequency band and the second time-domain signal of the extended frequency band to obtain a final time-domain signal of the extended frequency band; the second synthesizing unit is further configured to synthesize the decoded signal and the final time-domain signal of the extended frequency band to obtain a final output signal.
  • A signal encoding method is provided, including: performing core layer encoding on a speech or audio signal to obtain a core layer code stream of the speech or audio signal; performing extension layer processing on the speech or audio signal to determine a first envelope of an extended frequency band; determining a second envelope of the extended frequency band according to a signal-to-noise ratio of the speech or audio signal, a pitch period of the speech or audio signal, and the first envelope of the extended frequency band; encoding the second envelope to obtain an extension layer code stream; and transmitting the core layer code stream and the extension layer code stream to a decoding end.
  • A signal decoding method is provided, including: receiving a core layer code stream and an extension layer code stream of a speech or audio signal from an encoding end; decoding the extension layer code stream to determine a second envelope of an extended frequency band, wherein the second envelope is determined by the encoding end according to a signal-to-noise ratio of the speech or audio signal, a pitch period of the speech or audio signal, and a first envelope of the extended frequency band; decoding the core layer code stream to obtain a core layer speech or audio signal; predicting an excitation signal of the extended frequency band according to the core layer speech or audio signal; and predicting a signal of the extended frequency band according to the excitation signal of the extended frequency band and the second envelope of the extended frequency band.
  • A signal encoding apparatus is provided, including: an encoding unit, configured to perform core layer encoding on a speech or audio signal to obtain a core layer code stream of the speech or audio signal; a first determining unit, configured to perform extension layer processing on the speech or audio signal to determine a first envelope of an extended frequency band; and a second determining unit, configured to determine a second envelope of the extended frequency band according to a signal-to-noise ratio of the speech or audio signal, a pitch period of the speech or audio signal, and the first envelope of the extended frequency band; the encoding unit is further configured to encode the second envelope to obtain an extension layer code stream; and the core layer code stream and the extension layer code stream are transmitted to a decoding end.
  • A signal decoding apparatus is provided, including: a receiving unit, configured to receive a core layer code stream and an extension layer code stream of a speech or audio signal from an encoding end; a decoding unit, configured to decode the extension layer code stream to determine a second envelope of an extended frequency band, wherein the second envelope is determined by the encoding end according to a signal-to-noise ratio of the speech or audio signal, a pitch period of the speech or audio signal, and a first envelope of the extended frequency band; the decoding unit is further configured to decode the core layer code stream to obtain a core layer speech or audio signal; and a prediction unit, configured to predict an excitation signal of the extended frequency band according to the core layer speech or audio signal; the prediction unit is further configured to predict a signal of the extended frequency band according to the excitation signal of the extended frequency band and the second envelope of the extended frequency band.
  • In the present invention, the spectral envelope and the excitation signal of the extended frequency band are predicted separately from the decoded signal obtained from the bit stream of the speech or audio signal, so that the frequency-domain signal of the extended frequency band of the speech or audio signal can be determined, which improves the performance of the speech or audio signal.
  • FIG. 1 is a schematic flowchart of a signal decoding method according to an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of a process of a signal decoding method according to an embodiment of the present invention.
  • FIG. 3 is a schematic block diagram of a signal decoding apparatus according to an embodiment of the present invention.
  • FIG. 4 is a schematic block diagram of a signal decoding apparatus according to another embodiment of the present invention.
  • FIG. 5 is a schematic block diagram of a signal decoding apparatus according to another embodiment of the present invention.
  • FIG. 6 is a schematic block diagram of a signal decoding apparatus according to an embodiment of the present invention.
  • FIG. 7 is a schematic flowchart of a signal encoding method according to an embodiment of the present invention.
  • FIG. 8 is a schematic flowchart of a signal decoding method according to an embodiment of the present invention.
  • FIG. 9 is a schematic block diagram of a signal encoding apparatus according to an embodiment of the present invention.
  • FIG. 10 is a schematic block diagram of a signal decoding apparatus according to an embodiment of the present invention.
  • FIG. 1 is a schematic flowchart of a signal decoding method according to an embodiment of the present invention.
  • The method of FIG. 1 is performed by a signal decoding device, for example a decoder.
  • 110. Decode a bit stream of a speech or audio signal to obtain a decoded signal.
  • The bit stream of a speech or audio signal is obtained by a signal encoding device (e.g., an encoder) encoding the original speech or audio signal.
  • After the signal decoding device acquires the bit stream of the speech or audio signal, it can decode the bit stream to obtain a decoded signal.
  • For the decoding process, refer to the prior art. To avoid repetition, no further details are provided here.
  • The decoded signal may be a decoded signal of a low frequency band. For example, if the coding mode of the speech signal is the time-domain coding mode, the signal decoding device can decode the bit stream of the speech signal according to the corresponding decoding mode. If the coding mode of the audio signal is the time-frequency joint coding mode or the frequency-domain coding mode, the signal decoding device can decode the bit stream of the audio signal according to the corresponding decoding mode.
  • The excitation signal of the extended frequency band is predicted according to the decoded signal, wherein the extended frequency band is adjacent to the frequency band of the decoded signal, and the frequency band of the decoded signal is lower than the extended frequency band.
  • When the coding mode of the speech or audio signal is the time-domain coding mode, the signal decoding device may select the third frequency band from the decoded signal, the third frequency band being adjacent to the extended frequency band.
  • The excitation signal of the extended frequency band can be predicted based on the spectral coefficients of the third frequency band.
  • That is, the signal decoding device can predict the excitation signal of the extended frequency band based on the spectral coefficients of the third frequency band adjacent to the extended frequency band.
  • When the coding mode of the speech or audio signal is the time-frequency joint coding mode or the frequency-domain coding mode, the signal decoding device may select the fourth frequency band from the decoded signal, the number of bits allocated to the fourth frequency band being greater than a preset bit-number threshold.
  • The excitation signal of the extended frequency band can be predicted based on the spectral coefficients of the fourth frequency band.
  • That is, the signal decoding device can predict the excitation signal of the extended frequency band based on the spectral coefficients of the fourth frequency band.
  • the distance between the highest frequency point of the first frequency band and the lowest frequency point of the extended frequency band is less than or equal to the first value
  • the distance between the highest frequency point of the second frequency band and the lowest frequency point of the first frequency band is less than or equal to the second value.
  • the extended frequency band may be a frequency band that needs to be expanded.
  • when the encoder uses the ACELP (Algebraic Code-Excited Linear Prediction) coding mode, a wideband signal with a sampling rate of 16 kHz can first be downsampled to a sampling rate of 12.8 kHz and then encoded, in order to improve the coding efficiency.
  • after the signal decoding device decodes the bitstream, the obtained decoded signal has a bandwidth of 6.4 kHz.
  • the signal decoding device can extend the frequency band of 6 kHz to 8 kHz, that is, recover a signal with a frequency band of 6 kHz to 8 kHz. To obtain an output signal with a bandwidth of 14 kHz, the signal decoding device can extend the frequency band of 6.4 kHz to 14 kHz, that is, recover a signal with a frequency band of 6.4 kHz to 14 kHz.
  • the spectrum envelope of the extended frequency band may include N envelope values, and N is a positive integer, and the value of N may be determined according to actual conditions.
  • the first frequency band and the second frequency band may be selected from the decoded signal from the starting point of the extended frequency band to the low frequency direction.
  • the predicted extended frequency band can be more accurate (i.e., closer to the real signal).
  • the first value and the second value ensure, respectively, that the first frequency band is sufficiently close to the extended frequency band and that the second frequency band is sufficiently close to the first frequency band.
  • the first value and the second value may be positive integers or positive numbers; may be represented by a spectral coefficient or a number of frequency points; or may be represented by a bandwidth.
  • the first value and the second value may or may not be equal.
  • the first value and the second value may be preset as needed; for example, they may be set based on the sampling rate and the number of samples of the time-frequency conversion of the speech or audio signal. For example, if 40 spectral coefficients represent 1 kHz, the first value and the second value may each be 40, that is, the distance between the first frequency band and the extended frequency band may be within 1 kHz, and the distance between the second frequency band and the first frequency band may be within 1 kHz.
  • selecting the first frequency band and the second frequency band in the decoded signal comprises: selecting the first frequency band and the second frequency band in the frequency band of the decoded signal in the direction from the starting point of the extended frequency band toward low frequencies, wherein the distance between the highest frequency point of the first frequency band and the lowest frequency point of the extended frequency band is equal to the first value, and the first value is 0; and the distance between the highest frequency point of the second frequency band and the lowest frequency point of the first frequency band is equal to the second value, and the second value is 0.
  • the first value and the second value may be zero. Then the first frequency band is adjacent to the extended frequency band and the second frequency band is adjacent to the first frequency band.
  • the signal decoding device may select the first frequency band and the second frequency band from the starting point of the extended frequency band toward the low frequency direction, wherein the first frequency band may be adjacent to the extended frequency band, and the second frequency band may be adjacent to the first frequency band.
  • the signal decoding apparatus can predict the spectral envelope of the extended frequency band based on the spectral coefficients of the first frequency band and the spectral coefficients of the second frequency band. Specifically, the signal decoding device may sequentially select the first frequency band and the second frequency band in the frequency band of the decoded signal from the starting point of the extended frequency band to the low frequency direction.
  • the first frequency band may be 4.8 kHz to 6.4 kHz
  • the second frequency band may be 3.2 kHz to 4.8 kHz.
  • the first frequency band may be 4 kHz to 6.4 kHz
  • the second frequency band may be 3.2 kHz to 4 kHz.
  • the signal decoding apparatus may divide the first frequency band into M sub-bands, and determine an average value of energy or amplitude of each sub-band according to a spectral coefficient of the first frequency band, where M is a positive integer.
  • the adjustment of the energy or amplitude of each sub-band can be determined based on the average of the energy or amplitude of each sub-band.
  • the first spectral envelope of the extended frequency band can be predicted based on the adjusted value of the energy or amplitude of each sub-band.
  • the mean of the energy or amplitude of the second frequency band can be determined based on the spectral coefficients of the second frequency band.
  • the spectral envelope of the extended frequency band can be predicted based on the first spectral envelope of the extended frequency band and the mean of the energy or amplitude of the second frequency band.
  • the signal decoding apparatus may divide the first frequency band into M sub-bands, and determine the mean value of the energy or amplitude of each sub-band according to the spectral coefficients of the first frequency band, that is, M mean values of energy or amplitude may be obtained. Based on these M mean values, M adjustment values of energy or amplitude can be determined.
  • the signal decoding device can predict the first spectral envelope of the extended frequency band based on the adjusted values of the M energy or amplitude.
  • the first spectral envelope may be a preliminary prediction of the spectral envelope of the extended frequency band.
  • the first spectrum envelope may include N values.
  • the signal decoding apparatus can predict the spectral envelope of the extended frequency band based on the first spectral envelope of the extended frequency band and the mean of the energy or amplitude of the second frequency band.
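The per-sub-band averaging described above can be sketched as follows. This is only an illustration: the function name `subband_energy_means`, the use of equal-width sub-bands, and the definition of energy as the squared spectral coefficient are assumptions, not details stated in the text.

```python
def subband_energy_means(coeffs, m):
    """Split the spectral coefficients of a band into m equal-width
    sub-bands and return the mean energy of each sub-band.

    Assumes len(coeffs) is a positive multiple of m (hypothetical layout).
    """
    size = len(coeffs) // m
    means = []
    for i in range(m):
        sub = coeffs[i * size:(i + 1) * size]
        # Energy of a coefficient is taken as its square (assumption).
        means.append(sum(c * c for c in sub) / len(sub))
    return means
```

Calling the same function with `m=1` on the coefficients of the second frequency band would give the single mean value used for that band.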
  • the mean value of the energy or amplitude of each of a sub-bands is adjusted to determine the adjustment value of the energy or amplitude of each of the a sub-bands, and the mean value of the energy or amplitude of each of b sub-bands is used as the adjustment value of the energy or amplitude of each of the b sub-bands, wherein the a sub-bands are the sub-bands whose mean values are greater than a mean threshold.
  • the threshold range may be determined based on the variance of the mean of the M energies or amplitudes, which may be determined from the mean of the M energies or amplitudes.
  • the mean threshold can be the average of the M mean values, and each of the M energy or amplitude mean values that is greater than this average can be scaled to obtain the corresponding adjustment value.
  • the process of scaling may be to multiply the mean value to be adjusted by a scaling value, which may be obtained from the mean of the energy or amplitude of the M subbands, and the scaling value is less than one.
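A minimal sketch of this threshold-based adjustment is given below. The text only states that the scaling value is derived from the sub-band means and is less than one; the specific choice `threshold / mean` used here is an assumption made for illustration, as is the function name. The sketch also omits any variance check that may gate this step.

```python
def adjust_by_mean_threshold(means):
    """Scale down every sub-band mean that exceeds the average of all means.

    The scaling value threshold / mean is an assumed example; it is
    always < 1 for the values that get scaled.
    """
    threshold = sum(means) / len(means)
    adjusted = []
    for value in means:
        if value > threshold:
            scale = threshold / value  # assumed scaling value, < 1
            adjusted.append(value * scale)
        else:
            # Means at or below the threshold are used unchanged.
            adjusted.append(value)
    return adjusted
```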
  • when the ratio between the mean of the energy or amplitude of the i-th sub-band and the mean of the energy or amplitude of the (i+1)-th sub-band is not within the preset threshold range, and the mean of the energy or amplitude of the i-th sub-band is greater than the mean of the energy or amplitude of the (i+1)-th sub-band, the mean of the energy or amplitude of the i-th sub-band is adjusted to determine the adjustment value of the energy or amplitude of the i-th sub-band, and the mean of the energy or amplitude of the (i+1)-th sub-band is taken as the adjustment value of the energy or amplitude of the (i+1)-th sub-band.
  • the larger of the mean of the energy or amplitude of the i-th sub-band and the mean of the energy or amplitude of the (i+1)-th sub-band may be adjusted to obtain the corresponding adjustment value; for example, the larger mean value may be scaled by multiplying it by a scaling value.
  • the signal decoding device may determine the second spectral envelope of the extended frequency band of the current frame according to the first spectral envelope of the extended frequency band of the current frame and the mean of the energy or amplitude of the second frequency band of the current frame.
  • the second spectral envelope of the extended frequency band of the current frame and the spectral envelope of the extended frequency band of the previous frame may be weighted to determine the spectral envelope of the extended frequency band of the current frame.
  • the second spectrum envelope of the extended band of the current frame is taken as the spectrum envelope of the extended band of the current frame.
  • the spectral envelope of the extended frequency band that the signal decoding device needs to predict is also the spectral envelope of the extended frequency band of the current frame.
  • the signal decoding device may determine the second spectral envelope of the extended frequency band according to the first spectral envelope of the extended frequency band and the mean of the energy or amplitude of the second frequency band. For example, when the ratio between the mean value of the energy or amplitude of the second frequency band and the mean of the first spectral envelope is greater than a preset value, the N values included in the first spectral envelope are each scaled, where N is a positive integer.
  • the mean of the first spectral envelope may be the mean of the values included in the first spectral envelope. Further, when the ratio between the square root of the mean value of the energy or amplitude of the second frequency band and the mean value of the first spectral envelope is greater than a preset value, the values included in the first spectral envelope may each be scaled. For example, the values included in the first spectral envelope may be multiplied by a scaling value, which may be determined based on the mean of the energy or amplitude of the second frequency band and the mean of the first spectral envelope.
  • the scaling value is greater than 1, and in the case where the coding mode of the speech or audio signal is the time-frequency joint coding mode or the frequency domain coding mode, the scaling value is less than 1.
  • the spectral envelope of the extended band of the current frame needs to be determined based on the spectral envelope of the extended band of the previous frame.
  • the second spectrum envelope may be weighted with the spectral envelope of the extended band of the previous frame to determine the spectral envelope of the extended band of the current frame.
  • the spectral envelope of the extended band of the current frame may be the second spectral envelope.
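The two outcomes described above (weighting with the previous frame, or using the second spectral envelope directly) can be sketched as follows. The function name, the boolean flag that selects between the two outcomes, and the equal weights of 0.5 are assumptions; the text does not specify the weighting coefficients.

```python
def smooth_envelope(second_env, prev_env, use_weighting, w_cur=0.5):
    """Return the spectral envelope of the extended band for the current frame.

    If use_weighting is False, the second spectral envelope is used as is.
    Otherwise each value is a weighted sum of the current second envelope
    and the previous frame's envelope (w_cur is a hypothetical weight).
    """
    if not use_weighting:
        return list(second_env)
    return [w_cur * cur + (1.0 - w_cur) * prev
            for cur, prev in zip(second_env, prev_env)]
```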
  • the signal decoding device may determine the second spectral envelope of the extended frequency band of the current frame according to the first spectral envelope of the extended frequency band of the current frame and the mean of the energy or amplitude of the second frequency band of the current frame; if it is determined that the preset condition is met, weight the second spectral envelope of the extended frequency band of the current frame and the spectral envelope of the extended frequency band of the previous frame to determine the third spectral envelope of the extended frequency band of the current frame; if the preset condition is not met, take the second spectral envelope of the extended frequency band of the current frame as the third spectral envelope of the extended frequency band of the current frame; and determine the spectral envelope of the extended frequency band of the current frame according to the pitch period of the decoded signal, the voiced sound factor of the decoded signal, and the third spectral envelope of the extended frequency band of the current frame.
  • the process of determining the third spectral envelope of the extended frequency band of the current frame is similar to the process of determining the spectral envelope of the extended frequency band of the current frame in the foregoing embodiment. To avoid repetition, details are not described herein again.
  • in the foregoing embodiment, the third spectral envelope of the extended frequency band of the current frame is used as the spectral envelope of the extended frequency band of the current frame. Here, however, to make the spectral envelope of the extended frequency band more accurate, the third spectral envelope of the extended frequency band can be further modified according to the pitch period and the voiced sound factor of the decoded signal (that is, the decoded signal of the current frame), such that the final spectral envelope of the extended frequency band is inversely proportional to the voiced sound factor and proportional to the pitch period.
  • the spectral envelope wenv of the extended band can be determined based on the following equation:
  • pitch can represent the pitch period of the decoded signal
  • voice_fac can represent the voiced sound factor of the decoded signal
  • wenv3 can represent the third spectral envelope of the extended frequency band. a1 and b1 cannot be 0 at the same time, and a2, b2, and c2 cannot be 0 at the same time.
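The equation itself did not survive in this text. The sketch below therefore only illustrates one form that is consistent with the stated constraints (proportional to the pitch period, inversely proportional to the voiced sound factor, with coefficients a1, b1, a2, b2, c2): a polynomial in `pitch` divided by a polynomial in `voice_fac`. This form, the extra coefficient `c1`, the default coefficient values, and the function name are all assumptions and should not be read as the original formula.

```python
def modify_envelope(wenv3, pitch, voice_fac,
                    a1=0.0, b1=1.0, c1=0.0,
                    a2=0.0, b2=1.0, c2=0.0):
    """Scale the third spectral envelope by an assumed pitch/voicing factor.

    Assumed form (not confirmed by the text):
        wenv = (a1*pitch^2 + b1*pitch + c1)
               / (a2*voice_fac^2 + b2*voice_fac + c2) * wenv3
    The denominator must be non-zero for the chosen coefficients.
    """
    numerator = a1 * pitch * pitch + b1 * pitch + c1
    denominator = a2 * voice_fac * voice_fac + b2 * voice_fac + c2
    factor = numerator / denominator
    return [factor * value for value in wenv3]
```

With the hypothetical defaults the factor reduces to `pitch / voice_fac`, which grows with the pitch period and shrinks as the voiced sound factor increases.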
  • this embodiment can be applied to the case where there are bits in the extended band and the case where the extended band is a blind band.
  • the foregoing preset condition may include at least one of the following three conditions: Condition 1: the coding mode of the speech or audio signal of the current frame is different from the coding mode of the speech or audio signal of the previous frame; Condition 2: the decoded signal of the previous frame is non-fricative, and the ratio between the mean of the energy or amplitude of the m-th band in the decoded signal of the current frame and the mean of the energy or amplitude of the n-th band in the decoded signal of the previous frame is within a preset threshold range, where m and n are positive integers; Condition 3: the decoded signal of the current frame is non-fricative, and the ratio between the second spectral envelope of the extended band of the current frame and the spectral envelope of the extended band of the previous frame is greater than the ratio of the mean of the energy or amplitude of the j-th band of the decoded signal of the current frame to the mean of the energy or amplitude of the k-th band of the decoded signal of the previous frame.
  • that the coding mode of the speech or audio signal of the current frame is different from the coding mode of the speech or audio signal of the previous frame may mean that the coding mode of the current frame is the time domain coding mode and the coding mode of the previous frame is the time-frequency joint coding mode or the frequency domain coding mode; or that the coding mode of the current frame is the time-frequency joint coding mode or the frequency domain coding mode and the coding mode of the previous frame is the time domain coding mode.
  • the decoded signal of the previous frame is non-fricative, and the ratio of the mean value of the energy or amplitude of the m-th band in the decoded signal of the current frame to the mean of the energy or amplitude of the n-th band in the decoded signal of the previous frame is within the preset threshold range. The preset threshold range may be set according to the actual situation, which is not limited by the embodiment of the present invention. If the decoded signal of the current frame and the decoded signal of the previous frame are both speech signals, and both are voiced or both are unvoiced, the preset threshold range can be appropriately expanded.
  • the mean value of the energy or amplitude of the mth frequency band in the decoded signal of the current frame may be that the mth frequency band is selected from the decoded signals of the current frame according to a predefined rule or an actual situation, and the frequency band is determined. The average of the energy or amplitude.
  • the mean value of the energy or amplitude of the mth frequency band in the decoded signal of the current frame may be stored, and in the next frame, the stored mean value of the energy or amplitude of the mth frequency band in the decoded signal of the current frame may be directly obtained.
  • the average of the energy or amplitude of the nth frequency band in the decoded signal of the previous frame has been stored in the previous frame. At this time, the stored average of the energy or amplitude of the nth frequency band in the decoded signal of the previous frame can be directly obtained. If the encoding mode of the speech or audio signal of the current frame is different from the encoding mode of the speech or audio signal of the previous frame, the mth frequency band of the decoded signal of the current frame may be different from the nth frequency band of the decoded signal of the previous frame.
  • the manner of determining the mean value of the energy or amplitude of the j-th frequency band in the decoded signal of the current frame may refer to the manner of determining the mean value of the energy or amplitude of the m-th frequency band.
  • the manner of determining the mean value of the energy or amplitude of the k-th frequency band in the decoded signal of the previous frame can be determined by referring to the method of determining the mean value of the energy or amplitude of the n-th frequency band. In order to avoid repetition, it will not be described here.
  • the signal decoding apparatus may weight the second spectral envelope and the spectral envelope of the extended frequency band of the previous frame to determine the spectral envelope of the extended frequency band of the current frame.
  • the spectral envelope of the extended band of the current frame may be the second spectral envelope.
  • the signal decoding device may convert the frequency domain signal of the extended frequency band into a first time domain signal of the extended frequency band, and combine the decoded signal with the first time domain signal of the extended frequency band to obtain an output signal.
  • the signal decoding device may acquire the second time domain signal of the extended frequency band according to the time domain band extension manner.
  • the frequency domain signal of the extended frequency band can be converted into a third time domain signal of the extended frequency band.
  • the second time domain signal of the extended frequency band and the third time domain signal of the extended frequency band may be combined to obtain a final time domain signal of the extended frequency band.
  • the decoded signal can be combined with the final time domain signal of the extended frequency band to obtain an output signal.
  • the signal cancels the time domain signal.
  • the decoded signal can then be combined with the final time domain signal of the extended band to obtain the final output signal.
  • the specific process of the time domain band extension mode can be referred to the prior art. To avoid repetition, details are not described herein again.
  • in the embodiments of the present invention, by separately predicting the spectral envelope and the excitation signal of the extended frequency band according to the decoded signal obtained from the bitstream of the speech or audio signal, the frequency domain signal of the extended frequency band of the speech or audio signal can be determined, thereby improving the performance of the speech or audio signal.
  • a signal decoding method includes:
  • an excitation signal of the extended frequency band is predicted according to the decoded signal, wherein the extended frequency band is adjacent to a frequency band of the decoded signal, and a frequency band of the decoded signal is lower than the extended frequency band;
  • the difference between this embodiment and the previous embodiment is that the first frequency band and the second frequency band are selected differently.
  • the selected first frequency band is adjacent to the extended frequency band
  • the second frequency band is adjacent to the first frequency band; "adjacent" here means that the two frequency bands are contiguous, with no frequency point between them.
  • the signal decoding device can sequentially select the first frequency band and the second frequency band in the frequency band of the decoded signal from the starting point of the extended frequency band toward the low frequency direction.
  • the first frequency band may be 4.8 kHz to 6.4 kHz
  • the second frequency band may be 3.2 kHz to 4.8 kHz.
  • the first frequency band may be 4 kHz to 6.4 kHz
  • the second frequency band may be 3.2 kHz to 4 kHz.
  • the first frequency band and the second frequency band may be selected according to actual conditions, which is not limited by the embodiment of the present invention.
  • the specific implementations and embodiments involved in the steps other than the selection of the first frequency band and the second frequency band in the previous embodiment are applicable to the corresponding steps in this embodiment.
  • the embodiments of the present invention are described in detail below with reference to specific examples. It should be noted that these examples are intended to assist those skilled in the art to better understand the embodiments of the present invention and not to limit the scope of the embodiments of the present invention.
  • Fig. 2 is a schematic flow chart of a process of a signal decoding method according to an embodiment of the present invention. In Fig. 2, it is assumed that the sampling rate of the speech or audio signal is 12.8 kHz.
  • the signal decoding device determines a coding manner of the voice or audio signal.
  • if the signal decoding device determines that the coding mode of the speech or audio signal is not the time domain coding mode, for example, the coding mode is the time-frequency joint coding mode or the frequency domain coding mode, the signal decoding device may use the corresponding decoding method to decode the bitstream of the speech or audio signal and obtain a decoded signal. Since the sampling rate of the speech or audio signal is 12.8 kHz, the decoded signal has a bandwidth of 6.4 kHz.
  • a blind bandwidth extension is required to recover a signal having a frequency band of 6 kHz to 8 kHz, that is, a signal extending from 6 kHz to 8 kHz.
  • the signal decoding apparatus can recover the frequency domain signal of the extended frequency band of 6 kHz to 8 kHz by using the frequency domain band extension mode.
  • the signal decoding device selects the first frequency band and the second frequency band from the decoded signals in step 202, and predicts a spectral envelope of the extended frequency band according to the spectral coefficients of the first frequency band and the spectral coefficients of the second frequency band.
  • the signal decoding device may select the first frequency band and the second frequency band in the decoded signal according to a direction from a starting point of the extended frequency band to a low frequency, wherein the first frequency band is adjacent to the extended frequency band, and the first frequency band and the second frequency band are Adjacent.
  • the first frequency band can be selected from the frequency band of the decoded signal. Assuming that the first frequency band is 4.8 kHz to 6.4 kHz, the first frequency band can be divided into two sub-bands: the first sub-band is 4.8 kHz to 5.6 kHz,
  • and the second sub-band is 5.6 kHz to 6.4 kHz.
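To make the sub-band split concrete, the following sketch maps band edges in kHz to spectral-coefficient indices. It assumes a resolution of 40 spectral coefficients per kHz, which is borrowed from an earlier example in the text and is not stated for this figure; the function name is also hypothetical.

```python
def band_to_indices(low_khz, high_khz, coeffs_per_khz=40):
    """Return the (start, end) coefficient indices covering a band.

    The end index is exclusive. coeffs_per_khz is an assumed resolution.
    """
    start = round(low_khz * coeffs_per_khz)
    end = round(high_khz * coeffs_per_khz)
    return start, end

# Hypothetical usage for the two sub-bands of the first frequency band.
first_subband = band_to_indices(4.8, 5.6)
second_subband = band_to_indices(5.6, 6.4)
```

Under this assumption each sub-band spans 32 coefficients, and the two index ranges are contiguous.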
  • the signal decoding device can determine the mean ener1 of the energy of the first sub-band according to the spectral coefficients of the first sub-band.
  • ener1' can represent the adjustment value of the energy of the first sub-band
  • ener2' can represent the adjustment value of the energy of the second sub-band.
  • the adjustment value of the energy of the first sub-band and the adjustment value of the energy of the second sub-band are thereby determined.
  • the adjustment value of the energy of the first sub-band and the adjustment value of the energy of the second sub-band can also be determined according to whether the variance of the mean value of the energy of the first sub-band and the mean value of the energy of the second sub-band is within a threshold range.
  • for this determination, refer to the ratio-based process above; details are not repeated here. Then, according to ener1' and ener2', the first spectral envelope of the extended frequency band is determined; the first spectral envelope is a preliminary prediction of the spectral envelope of the extended frequency band.
  • the first spectral envelope includes two spectral envelope values, wenv[1]' and wenv[2]'.
  • the second frequency band can be selected from the frequency band of the decoded signal, assuming that the second frequency band is 3.2 kHz to 4.8 kHz.
  • the signal decoding device may determine the mean value enerL of the energy of the second frequency band according to the spectral coefficients of the second frequency band.
  • the signal decoding device may determine the second spectral envelope of the extended frequency band according to enerL, wenv[1]' and wenv[2]'.
  • the second spectral envelope includes two spectral envelope values, namely wenv[1]" and wenv[2]".
  • if √enerL > k * [(wenv[1]' + wenv[2]')/2], where the value of k can be predefined, then wenv[1]' and wenv[2]' can be scaled to determine the two spectral envelope values wenv[1]" and wenv[2]".
  • wenv[1]" and wenv[2]" can be determined as follows:
  • wenv[1]" = p * wenv[1]'
  • wenv[2]" = p * wenv[2]'
  • p = √enerL / [(wenv[1]' + wenv[2]')/2].
  • wenv[1]" = p * wenv[1]'
  • wenv[2]" = p * wenv[2]'
  • p = [(wenv[1]' + wenv[2]')/2] / √enerL.
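The scaling step above can be sketched as follows. Which of the two definitions of p applies is selected here by a hypothetical `time_domain` flag, based on the earlier remark that the scaling value is less than 1 for the time-frequency joint or frequency domain coding modes; that mapping, the default `k=1.0`, the function name, and leaving the envelope unchanged when the condition is not met are assumptions.

```python
import math

def second_envelope(first_env, ener_l, k=1.0, time_domain=True):
    """Derive the second spectral envelope from the first one.

    first_env: values of the first spectral envelope (wenv[1]', wenv[2]', ...)
    ener_l:    mean energy of the second frequency band (enerL)
    """
    mean_env = sum(first_env) / len(first_env)
    root = math.sqrt(ener_l)
    if root <= k * mean_env:
        # Condition not met: envelope left unchanged (assumption).
        return list(first_env)
    if time_domain:
        p = root / mean_env   # p > 1 when k >= 1
    else:
        p = mean_env / root   # p < 1 when k >= 1
    return [p * value for value in first_env]
```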
  • the above process of predicting wenv[1]" and wenv[2]" may also be as follows: In the above step (1), the signal decoding apparatus may further determine the mean value of the amplitude of the first sub-band according to the spectral coefficients of the first sub-band described above.
  • amp1' may represent an adjustment value of the amplitude of the first sub-band
  • amp2' may represent an adjustment value of the amplitude of the second sub-band.
  • amp2 can be scaled, for example amp2' = amp2 * (2 * amp1/amp2), and amp1 can be kept unchanged, i.e. amp1' = amp1. It should be noted that the adjustment value of the amplitude of the first sub-band and the adjustment value of the amplitude of the second sub-band are determined here according to whether the ratio between the mean of the amplitude of the first sub-band and the mean of the amplitude of the second sub-band is within the threshold range.
  • the adjustment value of the amplitude of the first sub-band and the adjustment value of the amplitude of the second sub-band may also be determined according to whether the variance of the mean value of the amplitude of the first sub-band and the mean value of the amplitude of the second sub-band is within a threshold range.
  • for this determination, refer to the ratio-based process above; details are not repeated here.
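The ratio-based adjustment of the two amplitude means can be sketched as follows. The scaling expression amp2 * (2 * amp1/amp2) is taken from the example above; the threshold range (0.5, 2), the mirrored handling when amp1 is the larger value, and the function name are assumptions for illustration. Both inputs are assumed to be positive.

```python
def adjust_pair(amp1, amp2, low=0.5, high=2.0):
    """Return (amp1', amp2') for two sub-band amplitude means.

    If amp1/amp2 lies inside the open range (low, high), both means are
    kept. Otherwise the larger mean is scaled and the smaller is kept.
    """
    ratio = amp1 / amp2
    if low < ratio < high:
        return amp1, amp2
    if amp2 > amp1:
        # Scaling from the example in the text.
        return amp1, amp2 * (2 * amp1 / amp2)
    # Mirrored case (assumed by symmetry).
    return amp1 * (2 * amp2 / amp1), amp2
```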
  • a first spectral envelope of the extended frequency band is determined; the first spectral envelope is a preliminary prediction of the spectral envelope of the extended frequency band and includes two spectral envelope values, wenv[1]' and wenv[2]'.
  • wenv[1]' and wenv[2]' can be determined as follows:
  • wenv[1]' = amp1'
  • wenv[2]' = amp2'
  • the signal decoding apparatus may further determine the mean value ampL of the amplitude of the second frequency band according to the spectral coefficients of the second frequency band.
  • the signal decoding device can determine wenv[1]" and wenv[2]" according to ampL, wenv[1]' and wenv[2]'. For example, if ampL > k * [(wenv[1]' + wenv[2]')/2], where the value of k can be predefined, then wenv[1]' and wenv[2]' can be scaled to determine the two spectral envelope values wenv[1]" and wenv[2]".
  • the signal decoding device can determine whether the preset condition is satisfied. If it is determined that the preset condition is satisfied, the above wenv[1]" and wenv[2]" are weighted with the spectral envelope of the extended frequency band of the previous frame to determine wenv[1] and wenv[2].
  • the preset condition may include at least one of the following:
  • the coding mode of the voice or audio signal here is the time-frequency joint coding mode or the frequency domain coding mode
  • the coding mode of the voice or audio signal of the previous frame may be the time domain coding mode
  • the decoded signal of the previous frame is non-fricative, and the ratio between the mean of the energy or amplitude of the m-th band in the decoded signal of the current frame and the mean of the energy or amplitude of the n-th band in the decoded signal of the previous frame is within a preset threshold range, where m and n are positive integers.
  • the preset threshold range can be set according to the actual situation.
  • the preset threshold range can be (0.5, 2). If the decoded signal of the current frame and the decoded signal of the previous frame are both voice signals, and both are voiced or unvoiced, the preset threshold range can be appropriately expanded. For example, you can expand the preset threshold range to (0.4, 2.5).
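The range check with the optional expansion can be sketched as follows. The two ranges come from the examples above; treating them as open intervals, the function name, and the flag name are assumptions.

```python
def band_ratio_in_range(mean_cur, mean_prev, same_voicing=False):
    """Check whether the ratio of two band means lies in the preset range.

    same_voicing: True when both frames are speech and both are voiced
    or both are unvoiced, in which case the expanded range is used.
    """
    low, high = (0.4, 2.5) if same_voicing else (0.5, 2.0)
    ratio = mean_cur / mean_prev
    return low < ratio < high
```

For instance, a ratio of 2.2 falls outside the default range but inside the expanded one.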
  • the average value of the energy or amplitude of the mth frequency band in the decoded signal of the current frame may be that the mth frequency band is selected from the decoded signals of the current frame according to a predefined rule or an actual situation, and the frequency band is determined. The average of the energy or amplitude.
  • the mean value of the energy or amplitude of the mth frequency band in the decoded signal of the current frame may be stored, and in the next frame, the stored mean value of the energy or amplitude of the mth frequency band in the decoded signal of the current frame may be directly obtained.
  • the average of the energy or amplitude of the nth frequency band in the decoded signal of the previous frame has been stored in the previous frame. At this time, the stored average of the energy or amplitude of the nth frequency band in the decoded signal of the previous frame can be directly obtained. If the encoding mode of the speech or audio signal of the current frame is different from the encoding mode of the speech or audio signal of the previous frame, the mth frequency band of the decoded signal of the current frame may be different from the nth frequency band of the decoded signal of the previous frame.
  • the encoding mode of the voice or audio signal of the current frame is a time-frequency joint coding mode or a frequency domain coding mode
  • a frequency band of 2 kHz to 6 kHz may be selected from the decoded signals of the current frame to determine the mean value of the energy or amplitude of the frequency band.
  • if the encoding mode of the voice or audio signal of the previous frame is the time domain coding mode, the mean of the energy or amplitude of the frequency band of 4 kHz to 6 kHz in the decoded signal of the previous frame can be determined.
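  • as a rough illustration of how the mean energy of such a frequency band might be computed from spectral coefficients, consider the sketch below. The function name, the uniform bin layout from 0 Hz up to `bandwidth_hz`, and the use of squared coefficients as energy are assumptions made for the example, not details given in the text.

```python
def band_mean_energy(coeffs, bandwidth_hz, lo_hz, hi_hz):
    """Mean energy of the spectral coefficients that fall in [lo_hz, hi_hz).

    Assumes the coefficients are uniformly spaced bins covering
    0 Hz up to bandwidth_hz.
    """
    n = len(coeffs)
    start = int(lo_hz * n / bandwidth_hz)
    stop = int(hi_hz * n / bandwidth_hz)
    band = coeffs[start:stop]
    if not band:
        raise ValueError("empty band")
    return sum(c * c for c in band) / len(band)


# Hypothetical 12-bin spectrum covering 0 to 6 kHz (500 Hz per bin).
spectrum = [1.0] * 4 + [2.0] * 4 + [3.0] * 4
mean_2_to_6 = band_mean_energy(spectrum, 6000, 2000, 6000)  # band used for a frequency-domain frame
mean_4_to_6 = band_mean_energy(spectrum, 6000, 4000, 6000)  # band used for a time-domain frame
```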
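  • the decoded signal of the current frame is non-fricative, and the ratio between the second spectral envelope of the extended frequency band of the current frame and the spectral envelope of the extended frequency band of the previous frame is greater than the ratio between the mean of the energy or amplitude of the j-th frequency band of the decoded signal of the current frame and the mean of the energy or amplitude of the k-th frequency band of the decoded signal of the previous frame.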
  • the mean value of the energy or amplitude of the j-th frequency band in the decoded signal of the current frame can be determined by referring to the determination of the mean value of the energy or amplitude of the m-th frequency band in the condition (b).
  • the mean of the energy or amplitude of the k-th frequency band in the decoded signal of the previous frame can be determined in the same way as the mean of the energy or amplitude of the n-th frequency band in condition (b). If the encoding mode of the voice or audio signal of the current frame is different from the encoding mode of the voice or audio signal of the previous frame, the j-th frequency band and the k-th frequency band may be different.
  • the signal decoding device predicts the excitation signal of the extended frequency band according to the spectral coefficients of the decoded signal obtained in step 202. To do so, the signal decoding device can select, from the frequency bands of the decoded signal, a frequency band whose number of allocated bits is greater than a preset bit-number threshold and which is therefore recovered well, and predict the excitation signal of the extended frequency band according to the spectral coefficients of that frequency band. For example, the excitation signal of an extended frequency band of 6 kHz to 8 kHz can be predicted based on the spectral coefficients of a frequency band of 2 kHz to 4 kHz.
  • the signal decoding device may select a frequency band adjacent to the extended frequency band from the frequency band of the decoded signal, and predict the excitation signal of the extended frequency band based on the spectral coefficient of the frequency band. For example, an excitation signal of an extended band of 6 kHz to 8 kHz can be predicted from the spectral coefficients of a frequency band of 4 kHz to 6 kHz.
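  • the two ways of choosing the source band for the excitation can be sketched as below. The names, the `strategy` argument, and the rule of taking the band with the most bits when several exceed the threshold are hypothetical, and copying the source coefficients unchanged is only a simple stand-in for the prediction, which the text does not specify.

```python
def select_source_band(bands, bits, bit_threshold, ext_lo_hz, strategy):
    """Return the index of the band used to predict the excitation.

    bands: list of (lo_hz, hi_hz); bits: bits allocated to each band.
    "bits": a band with more than bit_threshold bits (the one with the
    most bits if there are several, which is an assumption).
    "adjacent": the band whose upper edge touches the extended band.
    """
    if strategy == "bits":
        candidates = [i for i, b in enumerate(bits) if b > bit_threshold]
        if not candidates:
            raise ValueError("no band exceeds the bit threshold")
        return max(candidates, key=lambda i: bits[i])
    if strategy == "adjacent":
        for i, (_, hi_hz) in enumerate(bands):
            if hi_hz == ext_lo_hz:
                return i
        raise ValueError("no band is adjacent to the extended band")
    raise ValueError("unknown strategy")


def predict_excitation(coeffs_by_band, index):
    """Reuse the chosen band's spectral coefficients as the excitation."""
    return list(coeffs_by_band[index])
```

  • with bands of 0 to 2 kHz, 2 to 4 kHz, and 4 to 6 kHz and an extended band starting at 6 kHz, the first strategy can pick the 2 kHz to 4 kHz band and the second picks the 4 kHz to 6 kHz band, matching the two examples in the text.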
  • the signal decoding device may determine the frequency domain signal of the extended frequency band according to the spectrum envelope predicted by step 203 and the excitation signal predicted by step 204. For example, the spectral envelope of the extended frequency band and the excitation signal of the extended frequency band may be multiplied to determine a frequency domain signal of the extended frequency band.
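  • the multiplication of the spectral envelope and the excitation signal can be illustrated with the sketch below. It assumes one envelope value per sub-band and an excitation that splits evenly across the sub-bands; the function name and this layout are assumptions for the example.

```python
def apply_envelope(envelope, excitation):
    """Scale each sub-band of the excitation by its envelope value.

    envelope: one gain per sub-band.
    excitation: spectral coefficients of the extended band, assumed to
    split evenly across the sub-bands.
    """
    if not envelope or len(excitation) % len(envelope) != 0:
        raise ValueError("excitation length must be a multiple of the envelope length")
    width = len(excitation) // len(envelope)
    return [envelope[k // width] * x for k, x in enumerate(excitation)]
```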
  • the signal decoding device combines the decoded signal obtained in step 202 with the frequency domain signal of the extended frequency band obtained in step 205 to obtain a frequency domain output signal.
  • the signal decoding device performs frequency-time transform on the frequency domain output signal obtained in step 206 to obtain a final output signal.
  • if the signal decoding device determines that the encoding mode of the voice or audio signal is the time domain coding mode, the signal decoding device decodes the bit stream of the voice or audio signal by using the corresponding decoding manner.
  • a blind bandwidth extension is required to recover a signal having a frequency band of 6 kHz to 8 kHz, that is, an extended frequency band of 6 kHz to 8 kHz.
  • the signal decoding apparatus can recover the final time domain signal of the extended frequency band of 6 kHz to 8 kHz by using the time domain band extension method and the frequency domain band extension method.
  • the signal decoding device determines, according to the decoded signal in step 208, a first time domain signal of the extended frequency band of 6 kHz to 8 kHz by using the time domain band extension method.
  • for the time domain band extension method, refer to the prior art. To avoid repetition, details are not described herein again.
  • the signal decoding device performs time-frequency transform on the decoded signal in step 208, and converts the decoded signal from a signal in the time domain to a signal in the frequency domain.
  • the signal decoding device determines a frequency domain signal of the extended frequency band by using a frequency domain band extension manner. For the specific process, refer to steps 203 to 205. To avoid repetition, details are not described herein.
  • the signal decoding device performs frequency-time transform on the frequency domain signal of the extended frequency band determined in step 211 to determine a second time domain signal of the extended frequency band.
  • the signal decoding device adds the first time domain signal of the extended frequency band and the second time domain signal of the extended frequency band to determine a final time domain signal of the extended frequency band.
  • the signal decoding device combines the decoded signal obtained in step 208 with the final time domain signal of the extended frequency band obtained in step 213 to determine a final output signal.
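  • the last two steps of the time domain path can be sketched as follows. The text only says that the signals are added and combined, so treating the extended-band contribution as a time domain signal, using sample-wise addition for both steps, and assuming that all signals share one sampling rate and length are assumptions; the function names are hypothetical.

```python
def add_signals(x, y):
    """Sample-wise sum of two equally long time domain signals."""
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    return [a + b for a, b in zip(x, y)]


def time_domain_output(decoded, ext_time_1, ext_time_2):
    """Combine the decoded signal with the extended-band time domain signals."""
    # Final time domain signal of the extended band: sum of both estimates.
    ext_final = add_signals(ext_time_1, ext_time_2)
    # Final output: decoded signal plus extended-band signal (assumed sample-wise).
    return add_signals(decoded, ext_final)
```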
  • by separately predicting the spectral envelope and the excitation signal of the extended frequency band according to the decoded signal obtained from the bit stream of the voice or audio signal, the frequency domain signal of the extended frequency band of the voice or audio signal can be determined, which can improve the performance of the voice or audio signal.
  • FIG. 3 is a schematic block diagram of a signal decoding apparatus in accordance with one embodiment of the present invention.
  • An example of device 300 of Figure 3 is a decoder.
  • the device 300 includes a decoding unit 310, a prediction unit 320, and a determination unit 330.
  • the decoding unit 310 decodes the bit stream of the voice or audio signal to obtain a decoded signal.
  • the prediction unit 320 receives the decoded signal from the decoding unit 310, and predicts the excitation signal of the extended frequency band based on the decoded signal, wherein the extended frequency band is adjacent to the frequency band of the decoded signal, and the frequency band of the decoded signal is lower than the extended frequency band.
  • the prediction unit 320 further selects a first frequency band and a second frequency band from the decoded signal, and predicts the spectral envelope of the extended frequency band according to the spectral coefficients of the first frequency band and the spectral coefficients of the second frequency band, wherein the distance between the highest frequency point of the first frequency band and the lowest frequency point of the extended frequency band is less than or equal to a first value, and the distance between the lowest frequency point of the second frequency band and the lowest frequency point of the first frequency band is less than or equal to a second value.
  • the determining unit 330 receives the spectrum envelope of the extended band and the excitation signal of the extended band from the prediction unit 320, and determines the frequency domain signal of the extended band based on the spectral envelope of the extended band and the excitation signal of the extended band.
  • in the present invention, by separately predicting the spectral envelope and the excitation signal of the extended frequency band according to the decoded signal obtained from the bit stream of the voice or audio signal, the frequency domain signal of the extended frequency band of the voice or audio signal can be determined, which can improve the performance of the voice or audio signal.
  • the prediction unit 320 may select the first frequency band and the second frequency band in the decoded signal in the direction from the starting point of the extended frequency band toward lower frequencies, where the distance between the highest frequency point of the first frequency band and the lowest frequency point of the extended frequency band is equal to the first value, and the first value is 0; and the distance between the lowest frequency point of the second frequency band and the lowest frequency point of the first frequency band is equal to the second value, and the second value is 0.
  • the prediction unit 320 may divide the first frequency band into M sub-bands, and determine the mean of the energy or amplitude of each sub-band according to the spectral coefficients of the first frequency band, where M is a positive integer; determine an adjustment value of the energy or amplitude of each sub-band according to the mean of the energy or amplitude of each sub-band; predict the first spectral envelope of the extended frequency band according to the adjustment value of the energy or amplitude of each sub-band; determine the mean of the energy or amplitude of the second frequency band according to the spectral coefficients of the second frequency band; and predict the spectral envelope of the extended frequency band according to the first spectral envelope of the extended frequency band and the mean of the energy or amplitude of the second frequency band.
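  • the first of these steps, dividing the first frequency band into M sub-bands and taking the mean energy of each, might look like the sketch below. Equal-width sub-bands and squared coefficients as energy are assumptions, and the function name is hypothetical.

```python
def subband_means(coeffs, m):
    """Split coeffs into m equal sub-bands and return each sub-band's mean energy."""
    if m <= 0 or not coeffs or len(coeffs) % m != 0:
        raise ValueError("coeffs must split evenly into m sub-bands")
    width = len(coeffs) // m
    return [
        sum(c * c for c in coeffs[k * width:(k + 1) * width]) / width
        for k in range(m)
    ]
```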
  • the prediction unit 320 may adjust the mean of the energy or amplitude of each of a sub-bands, where the mean of the energy or amplitude of each of the a sub-bands is greater than or equal to a mean threshold, the mean of the energy or amplitude of each of the b sub-bands is less than the mean threshold, a and b are positive integers, and a + b = M.
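  • one possible reading of this threshold rule is sketched below: sub-bands whose mean is at or above the mean threshold (the a sub-bands) are adjusted, and the remaining b sub-bands keep their mean. The text does not say how the adjustment is computed, so the `scale` factor of 0.5 and the function name are purely hypothetical.

```python
def adjust_by_threshold(means, mean_threshold, scale=0.5):
    """Return adjustment values for the sub-band means.

    Sub-bands at or above mean_threshold are scaled by `scale`
    (a hypothetical adjustment); the others keep their mean.
    """
    return [m * scale if m >= mean_threshold else m for m in means]
```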
  • the prediction unit 320 may use the mean of the energy or amplitude of each sub-band as the adjustment value of the energy or amplitude of that sub-band.
  • when the mean of the energy or amplitude of the i-th sub-band is greater than the mean of the energy or amplitude of the (i+1)-th sub-band, the prediction unit 320 may adjust the mean of the energy or amplitude of the i-th sub-band to determine the adjustment value of the energy or amplitude of the i-th sub-band, and use the mean of the energy or amplitude of the (i+1)-th sub-band as the adjustment value of the energy or amplitude of the (i+1)-th sub-band; when the mean of the energy or amplitude of the i-th sub-band is less than the mean of the energy or amplitude of the (i+1)-th sub-band, the prediction unit 320 may use the mean of the energy or amplitude of the i-th sub-band as the adjustment value of the energy or amplitude of the i-th sub-band, and use the mean of the energy or amplitude of the (i+1)-th sub-band as the adjustment value of the (i+1)-th sub-band, where i is a positive integer and 1 ≤ i ≤ M−1.
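  • the neighbour comparison can be sketched as follows, reading the rule literally: only a sub-band whose mean exceeds that of the next sub-band is adjusted. The amount of adjustment is not given in the text, so the `scale` factor and the function name are hypothetical.

```python
def adjust_by_neighbour(means, scale=0.5):
    """Return adjustment values based on comparing each sub-band with the next.

    If means[i] > means[i + 1], sub-band i is scaled by `scale`
    (a hypothetical adjustment); otherwise it keeps its mean.
    """
    adjusted = list(means)
    for i in range(len(means) - 1):
        if means[i] > means[i + 1]:
            adjusted[i] = means[i] * scale
    return adjusted
```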
  • the prediction unit 320 may determine the second spectral envelope of the extended frequency band of the current frame according to the first spectral envelope of the extended frequency band of the current frame and the mean of the energy or amplitude of the second frequency band of the current frame; if it is determined that the preset condition is met, weight the second spectral envelope of the extended frequency band of the current frame with the spectral envelope of the extended frequency band of the previous frame to determine the spectral envelope of the extended frequency band of the current frame; and if it is determined that the preset condition is not met, use the second spectral envelope of the extended frequency band of the current frame as the spectral envelope of the extended frequency band of the current frame.
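  • the inter-frame weighting can be illustrated with the sketch below. The text does not give the weights, so the equal weighting (`w_cur=0.5`) and the function name are assumptions.

```python
def smooth_envelope(env2_cur, env_prev, condition_met, w_cur=0.5):
    """Return the spectral envelope of the extended band for the current frame.

    If the preset condition is met, blend the current second envelope with
    the previous frame's envelope; otherwise use the second envelope as is.
    The weight w_cur is hypothetical.
    """
    if not condition_met:
        return list(env2_cur)
    if len(env2_cur) != len(env_prev):
        raise ValueError("envelopes must have the same length")
    return [w_cur * c + (1.0 - w_cur) * p for c, p in zip(env2_cur, env_prev)]
```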
  • the prediction unit 320 may determine the second spectral envelope of the extended frequency band of the current frame according to the first spectral envelope of the extended frequency band of the current frame and the mean of the energy or amplitude of the second frequency band of the current frame; if it is determined that the preset condition is met, weight the second spectral envelope of the extended frequency band of the current frame with the spectral envelope of the extended frequency band of the previous frame to determine the third spectral envelope of the extended frequency band of the current frame; if it is determined that the preset condition is not met, use the second spectral envelope of the extended frequency band of the current frame as the third spectral envelope of the extended frequency band of the current frame; and determine the spectral envelope of the extended frequency band of the current frame according to the pitch period of the decoded signal, the voicing factor of the decoded signal, and the third spectral envelope of the extended frequency band of the current frame.
  • the foregoing preset condition may include at least one of the following three conditions. Condition 1: the coding mode of the voice or audio signal of the current frame is different from the coding mode of the voice or audio signal of the previous frame. Condition 2: the decoded signal of the previous frame is non-fricative, and the ratio between the mean of the energy or amplitude of the m-th frequency band in the decoded signal of the current frame and the mean of the energy or amplitude of the n-th frequency band in the decoded signal of the previous frame is within a preset threshold range, where m and n are positive integers. Condition 3: the decoded signal of the current frame is non-fricative, and the ratio between the second spectral envelope of the extended frequency band of the current frame and the spectral envelope of the extended frequency band of the previous frame is greater than the ratio between the mean of the energy or amplitude of the j-th frequency band of the decoded signal of the current frame and the mean of the energy or amplitude of the k-th frequency band of the decoded signal of the previous frame.
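  • the three conditions can be combined in a single predicate, as in the sketch below. It assumes that all three conditions are in use and that the preset condition counts as met when any one of them holds; the argument names, the scalar treatment of the envelope ratio, and the default range of (0.5, 2) taken from the earlier example are assumptions.

```python
def preset_condition_met(mode_cur, mode_prev, prev_fricative, cur_fricative,
                         band_ratio, env_ratio, energy_ratio, rng=(0.5, 2.0)):
    """Return True if any of the three conditions holds.

    band_ratio:   mean of m-th band (current) / mean of n-th band (previous).
    env_ratio:    second envelope (current) / envelope (previous), as a scalar.
    energy_ratio: mean of j-th band (current) / mean of k-th band (previous).
    """
    cond1 = mode_cur != mode_prev
    cond2 = (not prev_fricative) and rng[0] < band_ratio < rng[1]
    cond3 = (not cur_fricative) and env_ratio > energy_ratio
    return cond1 or cond2 or cond3
```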
  • if the encoding mode of the voice or audio signal is the time domain coding mode, the prediction unit 320 may select a third frequency band from the decoded signal, where the third frequency band is adjacent to the extended frequency band, and predict the excitation signal of the extended frequency band according to the spectral coefficients of the third frequency band.
  • if the coding mode of the voice or audio signal is a time-frequency joint coding mode or a frequency domain coding mode, the prediction unit 320 may select a fourth frequency band from the decoded signal, where the number of bits allocated to the fourth frequency band is greater than a preset bit-number threshold, and predict the excitation signal of the extended frequency band according to the spectral coefficients of the fourth frequency band.
  • the spectral envelope and the excitation signal of the extended frequency band are separately predicted, so that the frequency domain signal of the extended frequency band of the voice or audio signal can be determined, and thus the performance of the voice or audio signal can be improved.
  • FIG. 4 is a schematic block diagram of a signal decoding apparatus according to another embodiment of the present invention.
  • An example of the device 400 of Figure 4 is a decoder.
  • the device 400 includes a first synthesizing unit 340 and a first transform unit 350 in addition to the decoding unit 310, the predicting unit 320, and the determining unit 330.
  • the first synthesizing unit 340 may synthesize the decoded signal and the frequency domain signal of the extended frequency band to obtain the frequency domain output signal when the encoding mode of the voice or audio signal is the time-frequency joint coding mode or the frequency domain coding mode.
  • the first transform unit 350 may perform frequency-time transform on the frequency domain output signal to obtain a final output signal.
  • in the present invention, by separately predicting the spectral envelope and the excitation signal of the extended frequency band according to the decoded signal obtained from the bit stream of the voice or audio signal, the frequency domain signal of the extended frequency band of the voice or audio signal can be determined, which can improve the performance of the voice or audio signal.
  • FIG. 5 is a schematic block diagram of a signal decoding apparatus according to another embodiment of the present invention.
  • An example of device 500 of Figure 5 is a decoder.
  • the device 500 includes an obtaining unit 360, a second transform unit 370, and a second synthesizing unit 380 in addition to the decoding unit 310, the prediction unit 320, and the determination unit 330.
  • the obtaining unit 360 may acquire the first time domain signal of the extended frequency band by using the time domain band extension method when the encoding mode of the voice or audio signal is the time domain coding mode.
  • the second transform unit 370 can convert the frequency domain signal of the extended frequency band into the second time domain signal of the extended frequency band.
  • the second synthesizing unit 380 may synthesize the first time domain signal of the extended frequency band and the second time domain signal of the extended frequency band to obtain a final time domain signal of the extended frequency band.
  • the second synthesizing unit 380 can also synthesize the decoded signal with the final time domain signal of the extended band to obtain an output signal.
  • FIG. 6 is a schematic block diagram of a signal decoding apparatus in accordance with one embodiment of the present invention.
  • An example of the device 600 of Figure 6 is a decoder.
  • Apparatus 600 includes a processor 610 and a memory 620.
  • Memory 620 can include random access memory, flash memory, read only memory, programmable read only memory, nonvolatile memory or registers, and the like.
  • the processor 610 can be a Central Processing Unit (CPU).
  • Memory 620 is used to store executable instructions.
  • the processor 610 can execute the executable instructions stored in the memory 620 to: decode a bit stream of the voice or audio signal to obtain a decoded signal; predict an excitation signal of the extended frequency band according to the decoded signal, where the extended frequency band is adjacent to the frequency band of the decoded signal, and the frequency band of the decoded signal is lower than the extended frequency band; select a first frequency band and a second frequency band in the decoded signal, and predict the spectral envelope of the extended frequency band according to the spectral coefficients of the first frequency band and the spectral coefficients of the second frequency band, where the distance between the highest frequency point of the first frequency band and the lowest frequency point of the extended frequency band is less than or equal to a first value, and the distance between the lowest frequency point of the second frequency band and the lowest frequency point of the first frequency band is less than or equal to a second value; and determine the frequency domain signal of the extended frequency band according to the spectral envelope of the extended frequency band and the excitation signal of the extended frequency band.
  • the processor 610 may select the first frequency band and the second frequency band in the decoded signal in the direction from the starting point of the extended frequency band toward lower frequencies, where the distance between the highest frequency point of the first frequency band and the lowest frequency point of the extended frequency band is equal to the first value, and the first value is 0; and the distance between the lowest frequency point of the second frequency band and the lowest frequency point of the first frequency band is equal to the second value, and the second value is 0.
  • the processor 610 may divide the first frequency band into M sub-bands, and determine the mean of the energy or amplitude of each sub-band according to the spectral coefficients of the first frequency band, where M is a positive integer; determine an adjustment value of the energy or amplitude of each sub-band according to the mean of the energy or amplitude of each sub-band; predict the first spectral envelope of the extended frequency band according to the adjustment value of the energy or amplitude of each sub-band; determine the mean of the energy or amplitude of the second frequency band according to the spectral coefficients of the second frequency band; and predict the spectral envelope of the extended frequency band according to the first spectral envelope of the extended frequency band and the mean of the energy or amplitude of the second frequency band.
  • the processor 610 may adjust the mean of the energy or amplitude of each of the a sub-bands.
  • the processor 610 may use the mean of the energy or amplitude of each sub-band as the adjustment value of the energy or amplitude of that sub-band.
  • when the mean of the energy or amplitude of the i-th sub-band is greater than the mean of the energy or amplitude of the (i+1)-th sub-band, the processor 610 may adjust the mean of the energy or amplitude of the i-th sub-band to determine the adjustment value of the energy or amplitude of the i-th sub-band, and use the mean of the energy or amplitude of the (i+1)-th sub-band as the adjustment value of the energy or amplitude of the (i+1)-th sub-band; when the mean of the energy or amplitude of the i-th sub-band is less than the mean of the energy or amplitude of the (i+1)-th sub-band, the processor 610 may use the mean of the energy or amplitude of the i-th sub-band as the adjustment value of the energy or amplitude of the i-th sub-band, and use the mean of the energy or amplitude of the (i+1)-th sub-band as the adjustment value of the (i+1)-th sub-band, where i is a positive integer and 1 ≤ i ≤ M−1.
  • the processor 610 may determine the second spectral envelope of the extended frequency band of the current frame according to the first spectral envelope of the extended frequency band of the current frame and the mean of the energy or amplitude of the second frequency band of the current frame; if it is determined that the preset condition is met, weight the second spectral envelope of the extended frequency band of the current frame with the spectral envelope of the extended frequency band of the previous frame to determine the spectral envelope of the extended frequency band of the current frame; and if it is determined that the preset condition is not met, use the second spectral envelope of the extended frequency band of the current frame as the spectral envelope of the extended frequency band of the current frame.
  • the processor 610 may determine the second spectral envelope of the extended frequency band of the current frame according to the first spectral envelope of the extended frequency band of the current frame and the mean of the energy or amplitude of the second frequency band of the current frame; if it is determined that the preset condition is met, weight the second spectral envelope of the extended frequency band of the current frame with the spectral envelope of the extended frequency band of the previous frame to determine the third spectral envelope of the extended frequency band of the current frame; if it is determined that the preset condition is not met, use the second spectral envelope of the extended frequency band of the current frame as the third spectral envelope of the extended frequency band of the current frame; and determine the spectral envelope of the extended frequency band of the current frame according to the pitch period of the decoded signal, the voicing factor of the decoded signal, and the third spectral envelope of the extended frequency band of the current frame.
  • the foregoing preset condition may include at least one of the following three conditions. Condition 1: the coding mode of the voice or audio signal of the current frame is different from the coding mode of the voice or audio signal of the previous frame. Condition 2: the decoded signal of the previous frame is non-fricative, and the ratio between the mean of the energy or amplitude of the m-th frequency band in the decoded signal of the current frame and the mean of the energy or amplitude of the n-th frequency band in the decoded signal of the previous frame is within a preset threshold range, where m and n are positive integers. Condition 3: the decoded signal of the current frame is non-fricative, and the ratio between the second spectral envelope of the extended frequency band of the current frame and the spectral envelope of the extended frequency band of the previous frame is greater than the ratio between the mean of the energy or amplitude of the j-th frequency band of the decoded signal of the current frame and the mean of the energy or amplitude of the k-th frequency band of the decoded signal of the previous frame.
  • if the encoding mode of the voice or audio signal is the time domain coding mode, the processor 610 may select a third frequency band from the decoded signal, where the third frequency band is adjacent to the extended frequency band, and predict the excitation signal of the extended frequency band according to the spectral coefficients of the third frequency band.
  • if the coding mode of the voice or audio signal is a time-frequency joint coding mode or a frequency domain coding mode, the processor 610 may select a fourth frequency band from the decoded signal, where the number of bits allocated to the fourth frequency band is greater than a preset bit-number threshold, and predict the excitation signal of the extended frequency band according to the spectral coefficients of the fourth frequency band.
  • if the coding mode of the voice or audio signal is a time-frequency joint coding mode or a frequency domain coding mode, the processor 610 may further synthesize the decoded signal with the frequency domain signal of the extended frequency band to obtain a frequency domain output signal, and perform frequency-time transform on the frequency domain output signal to obtain a final output signal.
  • if the coding mode of the voice or audio signal is the time domain coding mode, the processor 610 may further acquire the first time domain signal of the extended frequency band by using the time domain band extension method; convert the frequency domain signal of the extended frequency band into a second time domain signal of the extended frequency band; synthesize the first time domain signal of the extended frequency band and the second time domain signal of the extended frequency band to obtain a final time domain signal of the extended frequency band; and synthesize the decoded signal with the final time domain signal of the extended frequency band to obtain the final output signal.
  • the memory 620 can store data information generated in the process performed by the processor 610 described above.
  • the processor 610 can read the data information from the memory 620.
  • in the present invention, by separately predicting the spectral envelope and the excitation signal of the extended frequency band according to the decoded signal obtained from the bit stream of the voice or audio signal, the frequency domain signal of the extended frequency band of the voice or audio signal can be determined, which can improve the performance of the voice or audio signal.
  • FIG. 7 is a schematic flowchart of a signal encoding method according to an embodiment of the present invention.
  • the method of Figure 7 is performed by an encoder, such as a signal encoding device.
  • the signal encoding device divides the input signal into two parts, a low frequency band signal and an extended band signal, the core layer processes the low band signal, and the extension layer processes the extended band signal.
  • the signal coding method includes:
  • the first envelope of the extended frequency band may be the original envelope of the extended frequency band.
  • the first envelope may be a frequency domain envelope or a time domain envelope.
  • the encoding end may further correct the first envelope of the extended frequency band according to the signal-to-noise ratio of the voice or audio signal and the pitch period of the voice or audio signal to determine the second envelope of the extended frequency band, so that the second envelope of the extended frequency band is inversely proportional to the signal-to-noise ratio and proportional to the pitch period.
  • the encoder can determine the second envelope wenv2 of the extended band according to the following equation:
  • $\mathrm{wenv2} = \dfrac{a_1 \cdot \mathrm{pitch}^2 + b_1 \cdot \mathrm{pitch} + c_1}{a_2 \cdot \mathrm{snr}^2 + b_2 \cdot \mathrm{snr} + c_2} \cdot \mathrm{wenv1}$, where wenv1 represents the first envelope of the extended band, pitch represents the pitch period of the voice or audio signal, and snr represents the signal-to-noise ratio of the voice or audio signal. The coefficients a1 and b1 cannot both be 0, and a2, b2, and c2 cannot all be 0.
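  • the equation can be evaluated as in the sketch below. The function name and the coefficient values in the example call are hypothetical; the text only constrains which coefficients may not all be zero.

```python
def second_envelope(wenv1, pitch, snr, a1, b1, c1, a2, b2, c2):
    """Compute wenv2 from wenv1, the pitch period and the signal-to-noise ratio."""
    if a1 == 0 and b1 == 0:
        raise ValueError("a1 and b1 cannot both be 0")
    if a2 == 0 and b2 == 0 and c2 == 0:
        raise ValueError("a2, b2 and c2 cannot all be 0")
    denominator = a2 * snr * snr + b2 * snr + c2
    if denominator == 0:
        raise ZeroDivisionError("denominator evaluates to 0 for this snr")
    numerator = a1 * pitch * pitch + b1 * pitch + c1
    return numerator / denominator * wenv1


# Hypothetical coefficients: numerator = pitch, denominator = snr + 1.
wenv2 = second_envelope(2.0, 10.0, 4.0, a1=0.0, b1=1.0, c1=0.0, a2=0.0, b2=1.0, c2=1.0)
```

  • with these example coefficients, a longer pitch period raises wenv2 and a higher signal-to-noise ratio lowers it, in line with the proportionality described above.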
  • the quantization index of the second envelope is written to the extended layer code stream.
  • the extension layer code stream may also include quantization indices of other related parameters.
  • Embodiments of the present invention can be applied to the case where bits are allocated to the extended frequency band.
  • a first envelope of the extended frequency band is determined, and the second envelope of the extended frequency band is determined according to the signal-to-noise ratio of the voice or audio signal, the pitch period of the voice or audio signal, and the first envelope of the extended frequency band, so that the decoding end can determine the signal of the extended frequency band according to the core layer code stream and the second envelope of the extended frequency band, thereby improving the performance of the voice or audio signal.
  • FIG. 8 is a schematic flowchart of a signal decoding method according to an embodiment of the present invention. The method of Figure 8 is performed by a decoder, such as a signal decoding device.
  • the extended layer code stream is decoded to determine a second envelope of the extended frequency band, where the second envelope is determined by the encoding end according to the signal-to-noise ratio of the voice or audio signal, the pitch period of the voice or audio signal, and the first envelope of the extended frequency band.
  • the first envelope of the extended frequency band may be the original envelope of the extended frequency band.
  • the first envelope can be either a time domain envelope or a frequency domain envelope.
  • the second envelope of the extended frequency band, determined by the encoding end according to the signal-to-noise ratio of the voice or audio signal, the pitch period of the voice or audio signal, and the first envelope of the extended frequency band, enables the decoding end to predict the signal of the extended frequency band according to the second envelope of the extended frequency band and the excitation signal of the extended frequency band, thereby improving the performance of the voice or audio signal.
  • An example of the device 900 of Figure 9 is an encoder.
  • the device 900 includes an encoding unit 910, a first determining unit 920, a second determining unit 930, and a transmitting unit 940.
  • the coding unit 910 performs core layer coding on the voice or audio signal to obtain a core layer code stream of the voice or audio signal.
  • the first determining unit 920 performs an enhancement layer process on the voice or audio signal to determine a first envelope of the extended frequency band.
  • the second determining unit 930 determines the second envelope of the extended band based on the signal to noise ratio of the speech or audio signal, the pitch period of the speech or audio signal, and the first envelope of the extended band.
  • the encoding unit 910 also encodes the second envelope to obtain an extended layer code stream.
  • the transmitting unit 940 transmits the core layer code stream and the extension layer code stream to the decoding end.
  • a first envelope of the extended frequency band is determined, and the second envelope of the extended frequency band is determined according to the signal-to-noise ratio of the voice or audio signal, the pitch period of the voice or audio signal, and the first envelope of the extended frequency band, so that the decoding end can determine the signal of the extended frequency band according to the core layer code stream and the second envelope of the extended frequency band, thereby improving the performance of the voice or audio signal.
  • FIG. 10 is a schematic block diagram of a signal decoding apparatus according to an embodiment of the present invention.
  • An example of the device 1000 of Figure 10 is a decoder.
  • The device 1000 includes a receiving unit 1010, a decoding unit 1020, and a prediction unit 1030.
  • The receiving unit 1010 receives the core layer code stream and the extension layer code stream of the voice or audio signal from the encoding end.
  • The decoding unit 1020 decodes the extension layer code stream to determine a second envelope of the extended frequency band, where the second envelope is determined by the encoding end according to the signal-to-noise ratio of the voice or audio signal, the pitch period of the voice or audio signal, and the first envelope of the extended frequency band.
  • The decoding unit 1020 also decodes the core layer code stream to obtain a core layer voice or audio signal.
  • The prediction unit 1030 predicts the excitation signal of the extended frequency band based on the core layer voice or audio signal.
  • The prediction unit 1030 also predicts the signal of the extended frequency band based on the excitation signal of the extended frequency band and the second envelope of the extended frequency band.
  • In this embodiment, because the encoding end determines the second envelope of the extended frequency band according to the signal-to-noise ratio of the voice or audio signal, the pitch period of the voice or audio signal, and the first envelope of the extended frequency band, the decoding end can predict the signal of the extended frequency band from the second envelope and the excitation signal of the extended frequency band, thereby improving the performance of the voice or audio signal.
  • The disclosed systems, devices, and methods may be implemented in other ways.
  • The device embodiments described above are merely illustrative.
  • The division into units is only a division by logical function.
  • In an actual implementation there may be other ways of dividing them; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • The mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device, or unit, and may be electrical, mechanical, or of another form.
  • The functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
  • If the functions are implemented in the form of a software functional unit and sold or used as a standalone product, they may be stored in a computer-readable storage medium.
  • The technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the present invention.
  • The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Quality & Reliability (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

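Embodiments of the present invention provide a signal decoding method and device. The method includes: decoding a bitstream of a speech or audio signal to obtain a decoded signal; predicting an excitation signal of an extended band from the decoded signal, where the extended band is adjacent to the band of the decoded signal and the band of the decoded signal is lower than the extended band; selecting a first band and a second band in the decoded signal, and predicting a spectral envelope of the extended band from the spectral coefficients of the first band and the spectral coefficients of the second band; and determining a frequency-domain signal of the extended band from the spectral envelope and the excitation signal of the extended band. Because the spectral envelope and the excitation signal of the extended band are each predicted from the decoded signal obtained from the bitstream, the frequency-domain signal of the extended band can be determined, which improves the performance of the speech or audio signal.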

Description

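Signal decoding method and device
This application claims priority to Chinese Patent Application No. 201310213593.5, filed with the Chinese Patent Office on May 31, 2013 and entitled "Signal decoding method and device", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of information technology, and in particular to a signal decoding method and device.
Background
Communication systems place increasing emphasis on speech and audio quality, so the coding and decoding of speech or audio signals has become an increasingly important part of speech and audio signal processing.
To improve coding efficiency, the encoding end usually tries to represent the signal to be transmitted with as few coded bits as possible. For example, at low bit rates the encoding end often does not encode all bands. Because the human ear is more sensitive to the low-frequency part of a speech or audio signal than to the high-frequency part, more bits are usually allocated to the low-frequency part and only a few bits to the high-frequency part; in some cases the high-frequency part is not encoded at all. The decoding end therefore has to recover the unencoded bands by blind bandwidth extension.
At present the decoding end often recovers the unencoded bands by time-domain band extension. This approach extends speech signals poorly and cannot handle audio signals, so the output speech or audio signal performs poorly.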
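Summary
Embodiments of the present invention provide a signal decoding method and device that can improve the performance of a speech or audio signal.
According to a first aspect, a signal decoding method is provided, including: decoding a bitstream of a speech or audio signal to obtain a decoded signal; predicting an excitation signal of an extended band from the decoded signal, where the extended band is adjacent to the band of the decoded signal and the band of the decoded signal is lower than the extended band; selecting a first band and a second band in the decoded signal, and predicting a spectral envelope of the extended band from the spectral coefficients of the first band and the spectral coefficients of the second band, where the distance between the highest frequency point of the first band and the lowest frequency point of the extended band is less than or equal to a first value, and the distance between the highest frequency point of the second band and the lowest frequency point of the first band is less than or equal to a second value; and determining a frequency-domain signal of the extended band from the spectral envelope of the extended band and the excitation signal of the extended band.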
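With reference to the first aspect, in a first possible implementation, selecting the first band and the second band in the decoded signal includes: selecting the first band and the second band in the band of the decoded signal in the direction from the starting point of the extended band toward lower frequencies, where the distance between the highest frequency point of the first band and the lowest frequency point of the extended band equals the first value, the first value being 0, and the distance between the highest frequency point of the second band and the lowest frequency point of the first band equals the second value, the second value being 0.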
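With reference to the first aspect or its first possible implementation, in a second possible implementation, predicting the spectral envelope of the extended band from the spectral coefficients of the first band and of the second band includes: dividing the first band into M sub-bands and determining the energy or amplitude mean of each sub-band from the spectral coefficients of the first band, where M is a positive integer; determining an adjusted energy or amplitude value for each sub-band from its mean; predicting a first spectral envelope of the extended band from the adjusted values; determining the energy or amplitude mean of the second band from its spectral coefficients; and predicting the spectral envelope of the extended band from the first spectral envelope and the energy or amplitude mean of the second band.
With reference to the second possible implementation of the first aspect, in a third possible implementation, determining the adjusted value of each sub-band from its mean includes: if the variance of the energy or amplitude means of the M sub-bands is not within a preset threshold range, adjusting the mean of each of a sub-bands to determine its adjusted value, and using the mean of each of b sub-bands as its adjusted value, where the mean of each of the a sub-bands is greater than or equal to a mean threshold, the mean of each of the b sub-bands is less than the mean threshold, a and b are positive integers, and a+b=M; and if the variance is within the preset threshold range, using the mean of each sub-band as its adjusted value.
With reference to the second possible implementation of the first aspect, in a fourth possible implementation, determining the adjusted value of each sub-band from its mean includes, for the i-th and (i+1)-th sub-bands of the M sub-bands: if the ratio of the energy or amplitude mean of the i-th sub-band to that of the (i+1)-th sub-band is not within a preset threshold range, then, when the mean of the i-th sub-band is greater than that of the (i+1)-th sub-band, adjusting the mean of the i-th sub-band to determine its adjusted value and using the mean of the (i+1)-th sub-band as its adjusted value, and, when the mean of the i-th sub-band is less than that of the (i+1)-th sub-band, adjusting the mean of the (i+1)-th sub-band to determine its adjusted value and using the mean of the i-th sub-band as its adjusted value; and if the ratio is within the preset threshold range, using the mean of each of the two sub-bands as its adjusted value, where i is a positive integer and 1 ≤ i ≤ M-1.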
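With reference to the second, third, or fourth possible implementation of the first aspect, in a fifth possible implementation, predicting the spectral envelope of the extended band from the first spectral envelope and the energy or amplitude mean of the second band includes: determining a second spectral envelope of the current frame's extended band from the first spectral envelope of the current frame's extended band and the energy or amplitude mean of the current frame's second band; when a preset condition is met, weighting the second spectral envelope of the current frame's extended band with the spectral envelope of the previous frame's extended band to determine the spectral envelope of the current frame's extended band; and when the preset condition is not met, using the second spectral envelope as the spectral envelope of the current frame's extended band.
With reference to the second, third, or fourth possible implementation of the first aspect, in a sixth possible implementation, predicting the spectral envelope of the extended band from the first spectral envelope and the energy or amplitude mean of the second band includes: determining a second spectral envelope of the current frame's extended band from the first spectral envelope of the current frame's extended band and the energy or amplitude mean of the current frame's second band; when a preset condition is met, weighting the second spectral envelope of the current frame's extended band with the spectral envelope of the previous frame's extended band to determine a third spectral envelope of the current frame's extended band; when the preset condition is not met, using the second spectral envelope as the third spectral envelope; and determining the spectral envelope of the current frame's extended band from the pitch period of the decoded signal, the voicing factor of the decoded signal, and the third spectral envelope.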
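With reference to the fifth or sixth possible implementation of the first aspect, in a seventh possible implementation, the preset condition includes at least one of the following three conditions. Condition 1: the coding mode of the current frame's speech or audio signal differs from that of the previous frame. Condition 2: the decoded signal of the previous frame is a non-fricative, and the ratio of the energy or amplitude mean of the m-th band in the current frame's decoded signal to that of the n-th band in the previous frame's decoded signal is within a preset threshold range, where m and n are positive integers. Condition 3: the decoded signal of the current frame is a non-fricative, and the ratio of the second spectral envelope of the current frame's extended band to the spectral envelope of the previous frame's extended band is greater than the ratio of the energy or amplitude mean of the j-th band in the current frame's decoded signal to that of the k-th band in the previous frame's decoded signal, where j and k are positive integers.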
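With reference to the first aspect or any of its first to seventh possible implementations, in an eighth possible implementation, predicting the excitation signal of the extended band from the decoded signal includes: when the speech or audio signal is coded in a time-domain coding mode, selecting a third band from the decoded signal, the third band being adjacent to the extended band; and predicting the excitation signal of the extended band from the spectral coefficients of the third band.
With reference to the first aspect or any of its first to seventh possible implementations, in a ninth possible implementation, predicting the excitation signal of the extended band from the decoded signal includes: when the speech or audio signal is coded in a joint time-frequency coding mode or a frequency-domain coding mode, selecting a fourth band from the decoded signal, the number of bits allocated to the fourth band being greater than a preset bit-count threshold; and predicting the excitation signal of the extended band from the spectral coefficients of the fourth band.
With reference to the first aspect or any of its first to ninth possible implementations, in a tenth possible implementation, the method further includes: when the speech or audio signal is coded in a joint time-frequency coding mode or a frequency-domain coding mode, synthesizing the decoded signal with the frequency-domain signal of the extended band to obtain a frequency-domain output signal; and applying a frequency-to-time transform to the frequency-domain output signal to obtain the final output signal.
With reference to the first aspect or any of its first to ninth possible implementations, in an eleventh possible implementation, the method further includes: when the speech or audio signal is coded in a time-domain coding mode, obtaining a first time-domain signal of the extended band by time-domain band extension; transforming the frequency-domain signal of the extended band into a second time-domain signal of the extended band; synthesizing the first and second time-domain signals of the extended band to obtain the final time-domain signal of the extended band; and synthesizing the decoded signal with the final time-domain signal of the extended band to obtain the final output signal.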
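According to a second aspect, a signal decoding device is provided, including: a decoding unit, configured to decode a bitstream of a speech or audio signal to obtain a decoded signal; a prediction unit, configured to receive the decoded signal from the decoding unit and predict an excitation signal of an extended band from the decoded signal, where the extended band is adjacent to the band of the decoded signal and the band of the decoded signal is lower than the extended band; the prediction unit being further configured to select a first band and a second band in the decoded signal and predict a spectral envelope of the extended band from the spectral coefficients of the first band and of the second band, where the distance between the highest frequency point of the first band and the lowest frequency point of the extended band is less than or equal to a first value, and the distance between the highest frequency point of the second band and the lowest frequency point of the first band is less than or equal to a second value; and a determining unit, configured to receive the spectral envelope and the excitation signal of the extended band from the prediction unit and determine a frequency-domain signal of the extended band from them.
With reference to the second aspect, in a first possible implementation, the prediction unit is specifically configured to select the first band and the second band in the decoded signal in the direction from the starting point of the extended band toward lower frequencies, where the distance between the highest frequency point of the first band and the lowest frequency point of the extended band equals the first value, the first value being 0, and the distance between the highest frequency point of the second band and the lowest frequency point of the first band equals the second value, the second value being 0.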
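With reference to the second aspect or its first possible implementation, in a second possible implementation, the prediction unit is specifically configured to: divide the first band into M sub-bands and determine the energy or amplitude mean of each sub-band from the spectral coefficients of the first band, where M is a positive integer; determine an adjusted energy or amplitude value for each sub-band from its mean; predict a first spectral envelope of the extended band from the adjusted values; determine the energy or amplitude mean of the second band from its spectral coefficients; and predict the spectral envelope of the extended band from the first spectral envelope and the energy or amplitude mean of the second band.
With reference to the second possible implementation of the second aspect, in a third possible implementation, the prediction unit is specifically configured to: if the variance of the energy or amplitude means of the M sub-bands is not within a preset threshold range, adjust the mean of each of a sub-bands to determine its adjusted value, and use the mean of each of b sub-bands as its adjusted value, where the mean of each of the a sub-bands is greater than or equal to a mean threshold, the mean of each of the b sub-bands is less than the mean threshold, a and b are positive integers, and a+b=M; and if the variance is within the preset threshold range, use the mean of each sub-band as its adjusted value.
With reference to the second possible implementation of the second aspect, in a fourth possible implementation, the prediction unit is specifically configured to, for the i-th and (i+1)-th sub-bands of the M sub-bands: if the ratio of the energy or amplitude mean of the i-th sub-band to that of the (i+1)-th sub-band is not within a preset threshold range, then, when the mean of the i-th sub-band is greater than that of the (i+1)-th sub-band, adjust the mean of the i-th sub-band to determine its adjusted value and use the mean of the (i+1)-th sub-band as its adjusted value, and, when the mean of the i-th sub-band is less than that of the (i+1)-th sub-band, adjust the mean of the (i+1)-th sub-band to determine its adjusted value and use the mean of the i-th sub-band as its adjusted value; and if the ratio is within the preset threshold range, use the mean of each of the two sub-bands as its adjusted value, where i is a positive integer and 1 ≤ i ≤ M-1.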
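With reference to the second, third, or fourth possible implementation of the second aspect, in a fifth possible implementation, the prediction unit is specifically configured to: determine a second spectral envelope of the current frame's extended band from the first spectral envelope of the current frame's extended band and the energy or amplitude mean of the current frame's second band; when a preset condition is met, weight the second spectral envelope of the current frame's extended band with the spectral envelope of the previous frame's extended band to determine the spectral envelope of the current frame's extended band; and when the preset condition is not met, use the second spectral envelope as the spectral envelope of the current frame's extended band.
With reference to the second, third, or fourth possible implementation of the second aspect, in a sixth possible implementation, the prediction unit is specifically configured to: determine a second spectral envelope of the current frame's extended band from the first spectral envelope of the current frame's extended band and the energy or amplitude mean of the current frame's second band; when a preset condition is met, weight the second spectral envelope of the current frame's extended band with the spectral envelope of the previous frame's extended band to determine a third spectral envelope of the current frame's extended band; when the preset condition is not met, use the second spectral envelope as the third spectral envelope; and determine the spectral envelope of the current frame's extended band from the pitch period of the decoded signal, the voicing factor of the decoded signal, and the third spectral envelope.
With reference to the fifth or sixth possible implementation of the second aspect, in a seventh possible implementation, the preset condition includes at least one of the following three conditions. Condition 1: the coding mode of the current frame's speech or audio signal differs from that of the previous frame. Condition 2: the decoded signal of the previous frame is a non-fricative, and the ratio of the energy or amplitude mean of the m-th band in the current frame's decoded signal to that of the n-th band in the previous frame's decoded signal is within a preset threshold range, where m and n are positive integers. Condition 3: the decoded signal of the current frame is a non-fricative, and the ratio of the second spectral envelope of the current frame's extended band to the spectral envelope of the previous frame's extended band is greater than the ratio of the energy or amplitude mean of the j-th band in the current frame's decoded signal to that of the k-th band in the previous frame's decoded signal, where j and k are positive integers.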
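With reference to the second aspect or any of its first to seventh possible implementations, in an eighth possible implementation, the prediction unit is specifically configured to: when the speech or audio signal is coded in a time-domain coding mode, select a third band from the decoded signal, the third band being adjacent to the extended band; and predict the excitation signal of the extended band from the spectral coefficients of the third band.
With reference to the second aspect or any of its first to seventh possible implementations, in a ninth possible implementation, the prediction unit is specifically configured to: when the speech or audio signal is coded in a joint time-frequency coding mode or a frequency-domain coding mode, select a fourth band from the decoded signal, the number of bits allocated to the fourth band being greater than a preset bit-count threshold; and predict the excitation signal of the extended band from the spectral coefficients of the fourth band.
With reference to the second aspect or any of its first to ninth possible implementations, in a tenth possible implementation, the device further includes: a first synthesis unit, configured to synthesize the decoded signal with the frequency-domain signal of the extended band to obtain a frequency-domain output signal when the speech or audio signal is coded in a joint time-frequency coding mode or a frequency-domain coding mode; and a first transform unit, configured to apply a frequency-to-time transform to the frequency-domain output signal to obtain the final output signal.
With reference to the second aspect or any of its first to ninth possible implementations, in an eleventh possible implementation, the device further includes: an obtaining unit, configured to obtain a first time-domain signal of the extended band by time-domain band extension when the speech or audio signal is coded in a time-domain coding mode; a second transform unit, configured to transform the frequency-domain signal of the extended band into a second time-domain signal of the extended band; and a second synthesis unit, configured to synthesize the first and second time-domain signals of the extended band to obtain the final time-domain signal of the extended band, and further configured to synthesize the decoded signal with the final time-domain signal of the extended band to obtain the final output signal.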
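According to a third aspect, a signal encoding method is provided, including: performing core layer coding on a speech or audio signal to obtain a core layer code stream of the speech or audio signal; performing extension layer processing on the speech or audio signal to determine a first envelope of an extended band; determining a second envelope of the extended band from the signal-to-noise ratio of the speech or audio signal, the pitch period of the speech or audio signal, and the first envelope of the extended band; encoding the second envelope to obtain an extension layer code stream; and sending the core layer code stream and the extension layer code stream to a decoding end.
According to a fourth aspect, a signal decoding method is provided, including: receiving a core layer code stream and an extension layer code stream of a speech or audio signal from an encoding end; decoding the extension layer code stream to determine a second envelope of an extended band, where the second envelope is determined by the encoding end from the signal-to-noise ratio of the speech or audio signal, the pitch period of the speech or audio signal, and a first envelope of the extended band; decoding the core layer code stream to obtain a core layer speech or audio signal; predicting an excitation signal of the extended band from the core layer speech or audio signal; and predicting the signal of the extended band from the excitation signal of the extended band and the second envelope of the extended band.
According to a fifth aspect, a signal encoding device is provided, including: an encoding unit, configured to perform core layer coding on a speech or audio signal to obtain a core layer code stream of the speech or audio signal; a first determining unit, configured to perform extension layer processing on the speech or audio signal to determine a first envelope of an extended band; a second determining unit, configured to determine a second envelope of the extended band from the signal-to-noise ratio of the speech or audio signal, the pitch period of the speech or audio signal, and the first envelope of the extended band; the encoding unit being further configured to encode the second envelope to obtain an extension layer code stream; and a sending unit, configured to send the core layer code stream and the extension layer code stream to a decoding end.
According to a sixth aspect, a signal decoding device is provided, including: a receiving unit, configured to receive a core layer code stream and an extension layer code stream of a speech or audio signal from an encoding end; a decoding unit, configured to decode the extension layer code stream to determine a second envelope of an extended band, where the second envelope is determined by the encoding end from the signal-to-noise ratio of the speech or audio signal, the pitch period of the speech or audio signal, and a first envelope of the extended band; the decoding unit being further configured to decode the core layer code stream to obtain a core layer speech or audio signal; and a prediction unit, configured to predict an excitation signal of the extended band from the core layer speech or audio signal, and further configured to predict the signal of the extended band from the excitation signal of the extended band and the second envelope of the extended band.
In the embodiments of the present invention, the spectral envelope and the excitation signal of the extended band are each predicted from the decoded signal obtained from the bitstream of the speech or audio signal, so the frequency-domain signal of the extended band can be determined, which improves the performance of the speech or audio signal.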
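Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a signal decoding method according to an embodiment of the present invention.
FIG. 2 is a schematic flowchart of the process of a signal decoding method according to an embodiment of the present invention.
FIG. 3 is a schematic block diagram of a signal decoding device according to one embodiment of the present invention.
FIG. 4 is a schematic block diagram of a signal decoding device according to another embodiment of the present invention.
FIG. 5 is a schematic block diagram of a signal decoding device according to another embodiment of the present invention.
FIG. 6 is a schematic block diagram of a signal decoding device according to one embodiment of the present invention.
FIG. 7 is a schematic flowchart of a signal encoding method according to an embodiment of the present invention.
FIG. 8 is a schematic flowchart of a signal decoding method according to an embodiment of the present invention.
FIG. 9 is a schematic block diagram of a signal encoding device according to an embodiment of the present invention.
FIG. 10 is a schematic block diagram of a signal decoding device according to an embodiment of the present invention.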
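Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention.
FIG. 1 is a schematic flowchart of a signal decoding method according to an embodiment of the present invention. The method of FIG. 1 is performed by a signal decoding device, for example a decoder.
110. Decode a bitstream of a speech or audio signal to obtain a decoded signal.
For example, the bitstream of the speech or audio signal is obtained by a signal encoding device (for example, an encoder) encoding the original speech or audio signal. After obtaining the bitstream, the signal decoding device may decode it to obtain the decoded signal. For the decoding process, refer to the prior art; to avoid repetition, details are not given here. The decoded signal may be a low-band decoded signal.
For example, if the speech signal is coded in a time-domain coding mode, the signal decoding device may decode the bitstream of the speech signal with the corresponding decoding mode. If the audio signal is coded in a joint time-frequency coding mode or a frequency-domain coding mode, the signal decoding device may decode the bitstream of the audio signal with the corresponding decoding mode.
120. Predict an excitation signal of an extended band from the decoded signal, where the extended band is adjacent to the band of the decoded signal and the band of the decoded signal is lower than the extended band.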
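Optionally, in one embodiment, when the speech or audio signal is coded in a time-domain coding mode, the signal decoding device may select a third band from the decoded signal, the third band being adjacent to the extended band, and may predict the excitation signal of the extended band from the spectral coefficients of the third band.
Specifically, when the speech or audio signal is coded in a time-domain coding mode, the signal decoding device may predict the excitation signal of the extended band from the spectral coefficients of the third band adjacent to the extended band.
Optionally, in another embodiment, when the speech or audio signal is coded in a joint time-frequency coding mode or a frequency-domain coding mode, the signal decoding device may select a fourth band from the decoded signal, the number of bits allocated to the fourth band being greater than a preset bit-count threshold, and may predict the excitation signal of the extended band from the spectral coefficients of the fourth band.
Specifically, because more bits are allocated to the fourth band, the fourth band is also recovered better during decoding. The signal decoding device may therefore predict the excitation signal of the extended band from the spectral coefficients of the fourth band.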
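130. Select a first band and a second band in the decoded signal, and predict the spectral envelope of the extended band from the spectral coefficients of the first band and of the second band, where the distance between the highest frequency point of the first band and the lowest frequency point of the extended band is less than or equal to a first value, and the distance between the highest frequency point of the second band and the lowest frequency point of the first band is less than or equal to a second value.
In the embodiments of the present invention, the extended band is the band that needs to be extended. For example, when an encoder uses the ACELP (Algebraic Code-Excited Linear Prediction) coding mode, a wideband signal sampled at 16 kHz may be downsampled to 12.8 kHz before encoding to improve coding efficiency. After the signal decoding device decodes the bitstream, the bandwidth of the resulting decoded signal therefore extends to 6.4 kHz. To obtain an output signal with a bandwidth of 8 kHz, the signal decoding device may extend the band from 6 kHz to 8 kHz, that is, generate a signal occupying 6 kHz to 8 kHz. To obtain an output signal with a bandwidth of 14 kHz, the signal decoding device may extend the band from 6.4 kHz to 14 kHz, that is, generate a signal occupying 6.4 kHz to 14 kHz.
It should be understood that the spectral envelope of the extended band may include N envelope values, where N is a positive integer whose value may be chosen according to the actual situation.
The first band and the second band may be selected from the decoded signal in the direction from the starting point of the extended band toward lower frequencies. When the selected first and second bands are sufficiently close to the extended band, the predicted extended band is more accurate (that is, closer to the real signal). The first value and the second value ensure, respectively, that the first band is sufficiently close to the extended band and that the second band is sufficiently close to the first band. The first value and the second value may be positive integers or positive numbers; they may be expressed as a number of spectral coefficients or frequency points, or as a bandwidth. They may or may not be equal. They may be preset as needed, for example based on the sampling rate and the number of samples used in the time-frequency transform of the speech or audio signal. For example, if 40 spectral coefficients represent 1 kHz, the first value and the second value may each be 40; that is, the distance between the first band and the extended band may be within 1 kHz, and the distance between the second band and the first band may be within 1 kHz.
In one embodiment, selecting the first band and the second band in the decoded signal includes: selecting the first band and the second band in the band of the decoded signal in the direction from the starting point of the extended band toward lower frequencies, where the distance between the highest frequency point of the first band and the lowest frequency point of the extended band equals the first value, the first value being 0, and the distance between the highest frequency point of the second band and the lowest frequency point of the first band equals the second value, the second value being 0.
In a preferred embodiment, the first value and the second value may be 0. The first band is then adjacent to the extended band, and the second band is adjacent to the first band. Accordingly, in one optional embodiment of step 130, the signal decoding device may select the first band and the second band in the decoded signal in the direction from the starting point of the extended band toward lower frequencies, where the first band may be adjacent to the extended band and the second band may be adjacent to the first band. The signal decoding device may predict the spectral envelope of the extended band from the spectral coefficients of the first band and of the second band.
Specifically, the signal decoding device may select the first band and then the second band in the band of the decoded signal, moving from the starting point of the extended band toward lower frequencies. For example, if the band of the decoded signal is 0 to 6.4 kHz and the extended band is 6 kHz to 8 kHz, the first band may be 4.8 kHz to 6.4 kHz and the second band may be 3.2 kHz to 4.8 kHz. If the band of the decoded signal is 0 to 6.4 kHz and the extended band is 6.4 kHz to 14 kHz, the first band may be 4 kHz to 6.4 kHz and the second band may be 3.2 kHz to 4 kHz. These values are given to help a person skilled in the art better understand the embodiments and do not limit the scope of the present invention. The first band and the second band may be selected according to the actual situation, which the embodiments do not restrict.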
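The band selection described above can be sketched in terms of spectral-coefficient indices. This is only a minimal sketch under stated assumptions: the helper name `select_bands`, the half-open index ranges, and the band widths passed in are hypothetical and not taken from the original; the figure of 40 spectral coefficients per 1 kHz is the example value used in the text.

```python
def select_bands(ext_start, first_width, second_width, first_gap=0, second_gap=0):
    """Return (first_band, second_band) as half-open index ranges [lo, hi).

    ext_start    : index of the lowest spectral coefficient of the extended band
    first_gap    : distance between the top of the first band and the extended
                   band (the "first value"; 0 means the bands are adjacent)
    second_gap   : distance between the top of the second band and the bottom
                   of the first band (the "second value")
    All quantities are counted in spectral coefficients (an assumption).
    """
    first_hi = ext_start - first_gap
    first_lo = first_hi - first_width
    second_hi = first_lo - second_gap
    second_lo = second_hi - second_width
    if second_lo < 0:
        raise ValueError("bands do not fit below the extended band")
    return (first_lo, first_hi), (second_lo, second_hi)


COEFFS_PER_KHZ = 40                        # example resolution from the text
ext_start = round(6.4 * COEFFS_PER_KHZ)    # extended band starts at 6.4 kHz
# First band 4 kHz to 6.4 kHz (96 coefficients), second band 3.2 kHz to 4 kHz (32).
first, second = select_bands(ext_start, 96, 32)
print(first, second)
```

With both gaps left at 0 the two bands are contiguous with each other and with the extended band, which corresponds to the preferred embodiment.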
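Optionally, in another embodiment, the signal decoding device may divide the first band into M sub-bands and determine the energy or amplitude mean of each sub-band from the spectral coefficients of the first band, where M is a positive integer. It may determine an adjusted energy or amplitude value for each sub-band from its mean, predict a first spectral envelope of the extended band from the adjusted values, determine the energy or amplitude mean of the second band from its spectral coefficients, and predict the spectral envelope of the extended band from the first spectral envelope and the energy or amplitude mean of the second band.
Specifically, the signal decoding device may divide the first band into M sub-bands and determine the energy or amplitude mean of each sub-band from the spectral coefficients of the first band, which yields M energy or amplitude means. From these M means, M adjusted energy or amplitude values can be determined.
The signal decoding device may predict the first spectral envelope of the extended band from the M adjusted values. The first spectral envelope may be a preliminary prediction of the spectral envelope of the extended band and may include N values. The signal decoding device may then predict the spectral envelope of the extended band from the first spectral envelope and the energy or amplitude mean of the second band.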
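As an illustration of the first step above, the following sketch computes the M per-sub-band means. It assumes that "energy" means the squared coefficient and "amplitude" its absolute value, and that the band length is divisible by M; the function name `subband_means` is hypothetical.

```python
def subband_means(coeffs, m, use_energy=True):
    """Split `coeffs` into m equal sub-bands and return the mean energy
    (squared coefficients) or mean amplitude (absolute values) of each."""
    if m <= 0 or len(coeffs) % m != 0:
        raise ValueError("this sketch assumes len(coeffs) is a multiple of m")
    size = len(coeffs) // m
    means = []
    for i in range(m):
        chunk = coeffs[i * size:(i + 1) * size]
        values = [c * c for c in chunk] if use_energy else [abs(c) for c in chunk]
        means.append(sum(values) / size)
    return means


first_band = [1.0, -1.0, 2.0, -2.0]
print(subband_means(first_band, 2))                    # energy means
print(subband_means(first_band, 2, use_energy=False))  # amplitude means
```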
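Optionally, in another embodiment, if the variance of the energy or amplitude means of the M sub-bands is not within a preset threshold range, the mean of each of a sub-bands is adjusted to determine its adjusted value, and the mean of each of b sub-bands is used as its adjusted value, where the mean of each of the a sub-bands is greater than or equal to a mean threshold, the mean of each of the b sub-bands is less than the mean threshold, a and b are positive integers, and a+b=M. If the variance is within the preset threshold range, the mean of each sub-band is used as its adjusted value.
Specifically, when the variance of the M energy or amplitude means is not within the preset threshold range, those means that exceed the mean threshold may be adjusted. Note that the threshold range may be determined from the variance of the M means, and the mean threshold may be determined from the M means themselves. For example, the mean threshold may be the average of the M means, and those means that exceed this average may be scaled to obtain the corresponding adjusted values. Scaling may consist of multiplying the mean to be adjusted by a scale factor that is derived from the energy or amplitude means of the M sub-bands and is less than 1.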
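The variance-based adjustment can be sketched as follows. This is an assumption-laden illustration: the text only says that the scale factor is below 1 and derived from the M means, so here it is simply passed in as the hypothetical parameter `scale`, and the mean threshold is taken to be the average of the M means, as in the example above.

```python
def adjust_by_variance(means, var_range, scale):
    """Return adjusted values for the M sub-band means.

    var_range : (low, high) preset threshold range for the variance
    scale     : scale factor below 1 applied to the large means (assumed input)
    """
    n = len(means)
    avg = sum(means) / n                              # used as the mean threshold
    variance = sum((m - avg) ** 2 for m in means) / n
    low, high = var_range
    if low <= variance <= high:
        return list(means)                            # means are kept unchanged
    # Scale only the sub-bands whose mean reaches the threshold.
    return [m * scale if m >= avg else m for m in means]


print(adjust_by_variance([1.0, 1.0, 10.0], (0.0, 1.0), 0.5))
```

Damping only the outlying sub-bands keeps a single strong spectral peak in the first band from inflating the predicted envelope of the whole extended band.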
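Optionally, in another embodiment, for the i-th and (i+1)-th sub-bands of the M sub-bands: if the ratio of the energy or amplitude mean of the i-th sub-band to that of the (i+1)-th sub-band is not within a preset threshold range, then, when the mean of the i-th sub-band is greater than that of the (i+1)-th sub-band, the mean of the i-th sub-band is adjusted to determine its adjusted value and the mean of the (i+1)-th sub-band is used as its adjusted value; when the mean of the i-th sub-band is less than that of the (i+1)-th sub-band, the mean of the (i+1)-th sub-band is adjusted to determine its adjusted value and the mean of the i-th sub-band is used as its adjusted value. If the ratio is within the preset threshold range, the mean of each of the two sub-bands is used as its adjusted value, where i is a positive integer and 1 ≤ i ≤ M-1.
Specifically, if the ratio of the energy or amplitude mean of the i-th sub-band to that of the (i+1)-th sub-band is not within the preset threshold range, the larger of the two means may be adjusted to obtain the corresponding adjusted value. For example, the larger mean may be scaled, for instance by multiplying it by a scale factor.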
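A minimal sketch of the ratio-based rule for one pair of adjacent sub-bands is shown below. The particular scaling, which pulls the larger mean back to the edge of the allowed range, follows the two-sub-band example given later in this description for FIG. 2 with the range (0.5, 2); the function name `adjust_pair` and the handling of a ratio exactly on the boundary are assumptions.

```python
def adjust_pair(mean_i, mean_next, low=0.5, high=2.0):
    """Return adjusted values for sub-bands i and i+1 (both means must be > 0)."""
    ratio = mean_i / mean_next
    if ratio > high:
        # Sub-band i is too large: mean_i * (high * mean_next / mean_i).
        return high * mean_next, mean_next
    if ratio < low:
        # Sub-band i+1 is too large: with low = 0.5 this equals 2 * mean_i.
        return mean_i, mean_i / low
    return mean_i, mean_next


print(adjust_pair(10.0, 2.0))
print(adjust_pair(1.0, 8.0))
print(adjust_pair(3.0, 2.0))
```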
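Optionally, in another embodiment, the signal decoding device may determine a second spectral envelope of the current frame's extended band from the first spectral envelope of the current frame's extended band and the energy or amplitude mean of the current frame's second band. When a preset condition is met, it may weight the second spectral envelope of the current frame's extended band with the spectral envelope of the previous frame's extended band to determine the spectral envelope of the current frame's extended band. When the preset condition is not met, the second spectral envelope is used as the spectral envelope of the current frame's extended band.
It should be understood that the process described for FIG. 1 applies to the current frame. The spectral envelope that the signal decoding device needs to predict is therefore also that of the current frame's extended band.
Specifically, the signal decoding device may determine the second spectral envelope of the extended band from the first spectral envelope of the extended band and the energy or amplitude mean of the second band. For example, when the ratio of the energy or amplitude mean of the second band to the mean of the first spectral envelope is greater than a preset value, each of the N values of the first spectral envelope may be scaled, where N is a positive integer. The mean of the first spectral envelope may be the mean of its N values. Further, the N values may be scaled when the ratio of the square root of the energy or amplitude mean of the second band to the mean of the first spectral envelope is greater than the preset value. For example, each of the N values of the first spectral envelope may be multiplied by a scale factor determined from the energy or amplitude mean of the second band and the mean of the first spectral envelope. When the speech or audio signal is coded in a time-domain coding mode, the scale factor is greater than 1; when it is coded in a joint time-frequency coding mode or a frequency-domain coding mode, the scale factor is less than 1.
When the preset condition is met, the spectral envelope of the current frame's extended band also depends on the spectral envelope of the previous frame's extended band. Specifically, the second spectral envelope may be weighted with the spectral envelope of the previous frame's extended band to determine the spectral envelope of the current frame's extended band. When the preset condition is not met, the spectral envelope of the current frame's extended band may be the second spectral envelope.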
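The inter-frame weighting can be illustrated with a short sketch. The text does not give the weights, so the weight `w_cur` (and its default of 0.5) is purely an assumption, as is the function name `smooth_envelope`.

```python
def smooth_envelope(second_env, prev_env, condition_met, w_cur=0.5):
    """Combine the current frame's second spectral envelope with the previous
    frame's envelope when the preset condition holds; otherwise pass it through.

    w_cur is a hypothetical weight for the current frame (0 < w_cur <= 1).
    """
    if not condition_met:
        return list(second_env)
    return [w_cur * cur + (1.0 - w_cur) * prev
            for cur, prev in zip(second_env, prev_env)]


print(smooth_envelope([2.0, 4.0], [4.0, 8.0], condition_met=True))
print(smooth_envelope([2.0, 4.0], [4.0, 8.0], condition_met=False))
```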
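Optionally, in another embodiment, the signal decoding device may determine a second spectral envelope of the current frame's extended band from the first spectral envelope of the current frame's extended band and the energy or amplitude mean of the current frame's second band; when a preset condition is met, weight the second spectral envelope of the current frame's extended band with the spectral envelope of the previous frame's extended band to determine a third spectral envelope of the current frame's extended band; when the preset condition is not met, use the second spectral envelope as the third spectral envelope; and determine the spectral envelope of the current frame's extended band from the pitch period of the decoded signal, the voicing factor of the decoded signal, and the third spectral envelope.
Specifically, the process of determining the third spectral envelope of the current frame's extended band is similar to the process of determining the spectral envelope of the current frame's extended band in the foregoing embodiment; to avoid repetition, details are not given here. In other words, the foregoing embodiment uses what is here called the third spectral envelope directly as the spectral envelope of the current frame's extended band. Here, to make the spectral envelope of the extended band more accurate, the third spectral envelope is corrected further: it is corrected according to the pitch period and the voicing factor of the decoded signal (that is, the decoded signal of the current frame), so that the final spectral envelope of the extended band is inversely proportional to the voicing factor and directly proportional to the pitch period.
For example, the spectral envelope wenv of the extended band may be determined from the following equation:
$wenv = \dfrac{a_1 \cdot pitch^2 + b_1 \cdot pitch + c_1}{a_2 \cdot voice\_fac^2 + b_2 \cdot voice\_fac + c_2} \cdot wenv3$
where pitch denotes the pitch period of the decoded signal, voice_fac denotes the voicing factor of the decoded signal, and wenv3 denotes the third spectral envelope of the extended band. $a_1$ and $b_1$ cannot both be 0, and $a_2$, $b_2$, and $c_2$ cannot all be 0.
This embodiment is therefore applicable both when bits are allocated to the extended band and when the extended band is a blind band.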
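The correction formula above can be evaluated as in the following sketch. The coefficient values used in the example call are arbitrary placeholders chosen only to make the arithmetic easy to follow; the original does not specify them, and the function name `correct_envelope` is hypothetical.

```python
def correct_envelope(wenv3, pitch, voice_fac, num, den):
    """Apply wenv = (a1*pitch^2 + b1*pitch + c1)
                    / (a2*voice_fac^2 + b2*voice_fac + c2) * wenv3
    to every value of the third spectral envelope.

    num = (a1, b1, c1), den = (a2, b2, c2); the values are assumed inputs.
    """
    a1, b1, c1 = num
    a2, b2, c2 = den
    if a1 == 0 and b1 == 0:
        raise ValueError("a1 and b1 cannot both be 0")
    if a2 == 0 and b2 == 0 and c2 == 0:
        raise ValueError("a2, b2 and c2 cannot all be 0")
    denominator = a2 * voice_fac ** 2 + b2 * voice_fac + c2
    if denominator == 0:
        raise ZeroDivisionError("denominator evaluates to 0 for this voice_fac")
    gain = (a1 * pitch ** 2 + b1 * pitch + c1) / denominator
    return [gain * w for w in wenv3]


# Placeholder coefficients: gain = pitch / voice_fac.
print(correct_envelope([1.0, 2.0], pitch=4.0, voice_fac=0.5,
                       num=(0.0, 1.0, 0.0), den=(0.0, 1.0, 0.0)))
```

With these placeholder coefficients the gain grows with the pitch period and shrinks as the voicing factor grows, which matches the proportionality stated in the text.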
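Optionally, in another embodiment, the preset condition may include at least one of the following three conditions. Condition 1: the coding mode of the current frame's speech or audio signal differs from that of the previous frame. Condition 2: the decoded signal of the previous frame is a non-fricative, and the ratio of the energy or amplitude mean of the m-th band in the current frame's decoded signal to that of the n-th band in the previous frame's decoded signal is within a preset threshold range, where m and n are positive integers. Condition 3: the decoded signal of the current frame is a non-fricative, and the ratio of the second spectral envelope of the current frame's extended band to the spectral envelope of the previous frame's extended band is greater than the ratio of the energy or amplitude mean of the j-th band in the current frame's decoded signal to that of the k-th band in the previous frame's decoded signal, where j and k are positive integers.
Specifically, the coding modes of the current and previous frames differ when the current frame is coded in a time-domain coding mode and the previous frame in a joint time-frequency or frequency-domain coding mode, or when the current frame is coded in a joint time-frequency or frequency-domain coding mode and the previous frame in a time-domain coding mode.
In condition 2, the preset threshold range may be set according to the actual situation, which the embodiments do not restrict. If the decoded signals of the current frame and the previous frame are both speech signals and are both voiced or both unvoiced, the preset threshold range may be widened appropriately.
In addition, in the above conditions, the energy or amplitude mean of the m-th band in the current frame's decoded signal may be obtained by selecting the m-th band from the current frame's decoded signal according to a predefined rule or the actual situation and determining the energy or amplitude mean of that band. This mean may also be stored so that it can be read directly when the next frame is processed. Accordingly, the energy or amplitude mean of the n-th band in the previous frame's decoded signal was already stored when the previous frame was processed and can now be read directly. If the coding mode of the current frame's speech or audio signal differs from that of the previous frame, the m-th band in the current frame's decoded signal may differ from the n-th band in the previous frame's decoded signal.
Likewise, the energy or amplitude mean of the j-th band in the current frame's decoded signal may be determined in the same way as that of the m-th band, and the energy or amplitude mean of the k-th band in the previous frame's decoded signal in the same way as that of the n-th band. To avoid repetition, details are not given here.
Specifically, when at least one of the three conditions is met, the signal decoding device may weight the second spectral envelope with the spectral envelope of the previous frame's extended band to determine the spectral envelope of the current frame's extended band. When none of the three conditions is met, the spectral envelope of the current frame's extended band may be the second spectral envelope.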
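The three conditions can be combined as in the sketch below. It assumes that all three conditions are in use, that the envelope ratio in condition 3 has already been reduced to a single number, and that the threshold range for condition 2 is an open interval; the function and parameter names are hypothetical.

```python
def preset_condition_met(cur_mode, prev_mode,
                         prev_is_fricative, cur_is_fricative,
                         band_mean_ratio, envelope_ratio,
                         ratio_range=(0.5, 2.0)):
    """Return True if at least one of the three conditions holds.

    band_mean_ratio : current-frame band mean / previous-frame band mean
    envelope_ratio  : current second envelope / previous-frame envelope
    """
    low, high = ratio_range
    cond1 = cur_mode != prev_mode
    cond2 = (not prev_is_fricative) and low < band_mean_ratio < high
    cond3 = (not cur_is_fricative) and envelope_ratio > band_mean_ratio
    return cond1 or cond2 or cond3


# Coding mode switched between frames, so condition 1 alone is enough.
print(preset_condition_met("time", "frequency", True, True, 10.0, 0.0))
```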
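140. Determine the frequency-domain signal of the extended band from the spectral envelope of the extended band and the excitation signal of the extended band.
For example, the spectral envelope of the extended band and the excitation signal of the extended band may be multiplied to determine the frequency-domain signal of the extended band.
In the embodiments of the present invention, this way of determining the frequency-domain signal of the extended band may be called frequency-domain band extension.
Optionally, in another embodiment, when the speech or audio signal is coded in a joint time-frequency coding mode or a frequency-domain coding mode, the signal decoding device may transform the frequency-domain signal of the extended band into a first time-domain signal of the extended band and synthesize the decoded signal with the first time-domain signal of the extended band to obtain the output signal.
Optionally, in another embodiment, when the speech or audio signal is coded in a time-domain coding mode, the signal decoding device may obtain a second time-domain signal of the extended band by time-domain band extension, transform the frequency-domain signal of the extended band into a third time-domain signal of the extended band, synthesize the second and third time-domain signals of the extended band to obtain the final time-domain signal of the extended band, and synthesize the decoded signal with the final time-domain signal of the extended band to obtain the output signal.
Specifically, when the speech or audio signal is coded in a time-domain coding mode, the signal decoding device may combine time-domain band extension with frequency-domain band extension to obtain the final time-domain signal of the extended band. The decoded signal may then be synthesized with the final time-domain signal of the extended band to obtain the final output signal. For the details of time-domain band extension, refer to the prior art; to avoid repetition, they are not given here.
In the embodiments of the present invention, the spectral envelope and the excitation signal of the extended band are each predicted from the decoded signal obtained from the bitstream of the speech or audio signal, so the frequency-domain signal of the extended band can be determined, which improves the performance of the speech or audio signal.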
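In another embodiment, the signal decoding method according to an embodiment of the present invention includes:
decoding a bitstream of a speech or audio signal to obtain a decoded signal;
predicting an excitation signal of an extended band from the decoded signal, where the extended band is adjacent to the band of the decoded signal and the band of the decoded signal is lower than the extended band;
selecting a first band and a second band in the band of the decoded signal in the direction from the starting point of the extended band toward lower frequencies, where the first band is adjacent to the extended band and the second band is adjacent to the first band;
predicting the spectral envelope of the extended band from the spectral coefficients of the first band and of the second band;
determining the frequency-domain signal of the extended band from the spectral envelope of the extended band and the excitation signal of the extended band.
This embodiment differs from the previous one in how the first band and the second band are selected. Here the selected first band is adjacent to the extended band and the second band is adjacent to the first band, where "adjacent" means that the two bands are contiguous, with no frequency points between them. Specifically, the signal decoding device may select the first band and then the second band in the band of the decoded signal, moving from the starting point of the extended band toward lower frequencies. For example, if the band of the decoded signal is 0 to 6.4 kHz and the extended band is 6 kHz to 8 kHz, the first band may be 4.8 kHz to 6.4 kHz and the second band may be 3.2 kHz to 4.8 kHz. If the band of the decoded signal is 0 to 6.4 kHz and the extended band is 6.4 kHz to 14 kHz, the first band may be 4 kHz to 6.4 kHz and the second band may be 3.2 kHz to 4 kHz. These values are given to help a person skilled in the art better understand the embodiments and do not limit the scope of the present invention. The first band and the second band may be selected according to the actual situation, which the embodiments do not restrict.
Clearly, the specific implementations and embodiments of all steps of the previous embodiment other than the selection of the first and second bands also apply to the corresponding steps of this embodiment.
The embodiments of the present invention are described in detail below with specific examples. These examples are intended to help a person skilled in the art better understand the embodiments and do not limit their scope.
FIG. 2 is a schematic flowchart of the process of a signal decoding method according to an embodiment of the present invention.
In FIG. 2, the sampling rate of the speech or audio signal is assumed to be 12.8 kHz.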
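201. The signal decoding device determines the coding mode of the speech or audio signal.
202. If the signal decoding device determines that the coding mode is not a time-domain coding mode, for example that it is a joint time-frequency coding mode or a frequency-domain coding mode, the signal decoding device may decode the bitstream of the speech or audio signal with the corresponding decoding mode to obtain the decoded signal. Because the sampling rate of the speech or audio signal is 12.8 kHz, the bandwidth of the decoded signal is 6.4 kHz. To obtain an output signal with a bandwidth of 8 kHz, blind bandwidth extension is needed to recover the signal in the band from 6 kHz to 8 kHz, that is, to generate a signal occupying 6 kHz to 8 kHz.
When the speech or audio signal is coded in a joint time-frequency coding mode or a frequency-domain coding mode, the signal decoding device may use frequency-domain band extension to recover the frequency-domain signal of the extended band from 6 kHz to 8 kHz.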
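203. The signal decoding device selects a first band and a second band from the decoded signal of step 202 and predicts the spectral envelope of the extended band from the spectral coefficients of the first band and of the second band.
Optionally, the signal decoding device may select the first band and the second band in the decoded signal in the direction from the starting point of the extended band toward lower frequencies, where the first band is adjacent to the extended band and the second band is adjacent to the first band. The process of predicting the spectral envelope of the extended band is described below with a specific example. This example is only intended to help a person skilled in the art better understand the embodiments and does not limit their scope.
In the example below, the extended band is assumed to be divided into two sub-bands, so a spectral envelope value must be predicted for each sub-band; wenv[1] and wenv[2] denote these two values.
(1) A first band may be selected from the band of the decoded signal. Assume the first band is 4.8 kHz to 6.4 kHz. It may be divided into two sub-bands, the first being 4.8 kHz to 5.6 kHz and the second 5.6 kHz to 6.4 kHz. The signal decoding device may determine the energy mean ener1 of the first sub-band from its spectral coefficients and the energy mean ener2 of the second sub-band from its spectral coefficients.
Assume the preset threshold range is (0.5, 2). If ener1/ener2 > 2, ener1 may be scaled, for example ener1' = ener1*(2*ener2/ener1), while ener2 may remain unchanged, that is, ener2' = ener2. Here ener1' denotes the adjusted energy value of the first sub-band and ener2' the adjusted energy value of the second sub-band.
If ener1/ener2 < 0.5, ener2 may be scaled, for example ener2' = ener2*(2*ener1/ener2), while ener1 may remain unchanged, that is, ener1' = ener1.
Note that the adjusted energy values of the two sub-bands are determined here by whether the ratio of their energy means is within the threshold range. In the embodiments of the present invention they may alternatively be determined by whether the variance of the two energy means is within a threshold range; the process is analogous to the ratio-based process above and is not repeated here.
The first spectral envelope of the extended band is then determined from ener1' and ener2'. The first spectral envelope is a preliminary prediction of the spectral envelope of the extended band and includes two spectral envelope values, wenv[1]' and wenv[2]'.
For example, wenv[1]' and wenv[2]' may be determined as follows:
$wenv[1]' = \sqrt{ener1'}$, $wenv[2]' = \sqrt{ener2'}$.
Alternatively, wenv[1]' and wenv[2]' may be determined as follows:
$wenv[1]' = wenv[2]' = \sqrt{(ener1' + ener2')/2}$.
(2) A second band may be selected from the band of the decoded signal. Assume the second band is 3.2 kHz to 4.8 kHz. The signal decoding device may determine the energy mean enerL of the second band from its spectral coefficients.
The signal decoding device may determine the second spectral envelope of the extended band from enerL, wenv[1]', and wenv[2]'. The second spectral envelope includes two spectral envelope values, wenv[1]'' and wenv[2]''.
For example, if $\sqrt{enerL} > k \cdot [(wenv[1]' + wenv[2]')/2]$, where the value of k may be predefined, wenv[1]' and wenv[2]' may be scaled to determine the two spectral envelope values wenv[1]'' and wenv[2]'' of the extended band.
For example, wenv[1]'' and wenv[2]'' may be determined from enerL, wenv[1]', and wenv[2]' as follows.
When the speech or audio signal is coded in a time-domain coding mode:
$wenv[1]'' = p \cdot wenv[1]'$, $wenv[2]'' = p \cdot wenv[2]'$, $p = \sqrt{enerL} / [(wenv[1]' + wenv[2]')/2]$.
When the speech or audio signal is coded in a joint time-frequency coding mode or a frequency-domain coding mode:
$wenv[1]'' = p \cdot wenv[1]'$, $wenv[2]'' = p \cdot wenv[2]'$, $p = [(wenv[1]' + wenv[2]')/2] / \sqrt{enerL}$.
In addition, if the decoded signal is a fricative, wenv[1]'' and wenv[2]'' obtained above may be scaled further with a scale factor less than 1.
Note that wenv[1]'' and wenv[2]'' may also be predicted as follows.
In step (1), the signal decoding device may instead determine the amplitude mean amp1 of the first sub-band from its spectral coefficients and the amplitude mean amp2 of the second sub-band from its spectral coefficients.
Assume the preset threshold range is (0.5, 2). If amp1/amp2 > 2, amp1 may be scaled, for example amp1' = amp1*(2*amp2/amp1), while amp2 may remain unchanged, that is, amp2' = amp2. Here amp1' denotes the adjusted amplitude value of the first sub-band and amp2' the adjusted amplitude value of the second sub-band.
If amp1/amp2 < 0.5, amp2 may be scaled, for example amp2' = amp2*(2*amp1/amp2), while amp1 may remain unchanged, that is, amp1' = amp1.
Note that the adjusted amplitude values of the two sub-bands are determined here by whether the ratio of their amplitude means is within the threshold range. In the embodiments of the present invention they may alternatively be determined by whether the variance of the two amplitude means is within a threshold range; the process is analogous to the ratio-based process above and is not repeated here.
The first spectral envelope of the extended band is then determined from amp1' and amp2'. The first spectral envelope is a preliminary prediction of the spectral envelope of the extended band and includes two spectral envelope values, wenv[1]' and wenv[2]'.
For example, wenv[1]' and wenv[2]' may be determined as follows:
$wenv[1]' = amp1'$, $wenv[2]' = amp2'$.
Alternatively, wenv[1]' and wenv[2]' may be determined as follows:
$wenv[1]' = wenv[2]' = (amp1' + amp2')/2$.
In step (2), the signal decoding device may instead determine the amplitude mean ampL of the second band from its spectral coefficients.
The signal decoding device may determine wenv[1]'' and wenv[2]'' from ampL, wenv[1]', and wenv[2]'.
For example, if $ampL > k \cdot [(wenv[1]' + wenv[2]')/2]$, where the value of k may be predefined, wenv[1]' and wenv[2]' may be scaled to determine the two spectral envelope values wenv[1]'' and wenv[2]'' of the extended band.
For example, wenv[1]'' and wenv[2]'' may be determined from ampL, wenv[1]', and wenv[2]' as follows.
When the speech or audio signal is coded in a time-domain coding mode:
$wenv[1]'' = p \cdot wenv[1]'$, $wenv[2]'' = p \cdot wenv[2]'$, $p = ampL / [(wenv[1]' + wenv[2]')/2]$.
When the speech or audio signal is coded in a joint time-frequency coding mode or a frequency-domain coding mode:
$wenv[1]'' = p \cdot wenv[1]'$, $wenv[2]'' = p \cdot wenv[2]'$, $p = [(wenv[1]' + wenv[2]')/2] / ampL$.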
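(3) The signal decoding device may determine whether a preset condition is met. If it is, wenv[1]'' and wenv[2]'' above are weighted with the spectral envelope of the previous frame's extended band to determine wenv[1] and wenv[2].
If the preset condition is not met, wenv[1] = wenv[1]'' and wenv[2] = wenv[2]''.
The preset condition may include at least one of the following:
(a) The coding mode of the current frame's speech or audio signal differs from that of the previous frame.
For example, if the speech or audio signal here is coded in a joint time-frequency coding mode or a frequency-domain coding mode, the previous frame's speech or audio signal may have been coded in a time-domain coding mode.
(b) The decoded signal of the previous frame is a non-fricative, and the ratio of the energy or amplitude mean of the m-th band in the current frame's decoded signal to that of the n-th band in the previous frame's decoded signal is within a preset threshold range, where m and n are positive integers.
For example, the preset threshold range may be set according to the actual situation, such as (0.5, 2). If the decoded signals of the current frame and the previous frame are both speech signals and are both voiced or both unvoiced, the preset threshold range may be widened appropriately, for example to (0.4, 2.5).
In addition, in this condition, the energy or amplitude mean of the m-th band in the current frame's decoded signal may be obtained by selecting the m-th band from the current frame's decoded signal according to a predefined rule or the actual situation and determining the energy or amplitude mean of that band. This mean may also be stored so that it can be read directly when the next frame is processed. Accordingly, the energy or amplitude mean of the n-th band in the previous frame's decoded signal was already stored when the previous frame was processed and can now be read directly. If the coding mode of the current frame's speech or audio signal differs from that of the previous frame, the m-th band in the current frame's decoded signal may differ from the n-th band in the previous frame's decoded signal. For example, if the current frame is coded in a joint time-frequency coding mode or a frequency-domain coding mode, the band from 2 kHz to 6 kHz may be selected from the current frame's decoded signal and its energy or amplitude mean determined. If the previous frame was coded in a time-domain coding mode, the energy or amplitude mean of the band from 4 kHz to 6 kHz in the previous frame's decoded signal may be determined.
(c) The decoded signal of the current frame is a non-fricative, and the ratio of the second spectral envelope of the current frame's extended band to the spectral envelope of the previous frame's extended band is greater than the ratio of the energy or amplitude mean of the j-th band in the current frame's decoded signal to that of the k-th band in the previous frame's decoded signal, where j and k are positive integers.
In this condition, the energy or amplitude mean of the j-th band in the current frame's decoded signal may be determined in the same way as that of the m-th band in condition (b), and the energy or amplitude mean of the k-th band in the previous frame's decoded signal in the same way as that of the n-th band in condition (b). If the coding mode of the current frame's speech or audio signal differs from that of the previous frame, the j-th band and the k-th band may differ.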
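Sub-steps (1) and (2) of the worked example above can be traced numerically with the sketch below. It takes already adjusted sub-band energies as input, uses the per-sub-band square-root variant for the first spectral envelope, and reads the garbled scale-factor formulas of the source as involving the square root of enerL, which is an interpretation; the function name `second_envelope` and the input values are hypothetical.

```python
import math


def second_envelope(ener1_adj, ener2_adj, ener_l, k, time_domain):
    """Return (wenv1'', wenv2'') for a two-sub-band extended band.

    ener1_adj, ener2_adj : adjusted sub-band energies ener1', ener2'
    ener_l               : energy mean enerL of the second band
    k                    : predefined comparison factor
    time_domain          : True for time-domain coding, False for joint
                           time-frequency or frequency-domain coding
    """
    w1 = math.sqrt(ener1_adj)          # first spectral envelope wenv[1]'
    w2 = math.sqrt(ener2_adj)          # first spectral envelope wenv[2]'
    avg = (w1 + w2) / 2.0
    root = math.sqrt(ener_l)
    if root > k * avg:
        # p > 1 for time-domain coding, p < 1 otherwise (when k >= 1).
        p = root / avg if time_domain else avg / root
        w1, w2 = p * w1, p * w2
    return w1, w2


print(second_envelope(4.0, 16.0, 36.0, k=1.0, time_domain=True))
print(second_envelope(4.0, 16.0, 36.0, k=1.0, time_domain=False))
```

Sub-step (3) would then either pass these two values through unchanged or blend them with the previous frame's envelope, depending on conditions (a) to (c).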
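204. The signal decoding device predicts the excitation signal of the extended band from the spectral coefficients of the decoded signal obtained in step 202.
For example, if the speech or audio signal is coded in a joint time-frequency coding mode or a frequency-domain coding mode, the signal decoding device may select, from the band of the decoded signal, a band whose allocated number of bits exceeds a preset bit-count threshold and which is therefore recovered well, and predict the excitation signal of the extended band from the spectral coefficients of that band. For example, the excitation signal of the extended band from 6 kHz to 8 kHz may be predicted from the spectral coefficients of the band from 2 kHz to 4 kHz.
In addition, if the speech or audio signal is coded in a time-domain coding mode, the signal decoding device may select, from the band of the decoded signal, a band adjacent to the extended band and predict the excitation signal of the extended band from the spectral coefficients of that band. For example, the excitation signal of the extended band from 6 kHz to 8 kHz may be predicted from the spectral coefficients of the band from 4 kHz to 6 kHz.
205. The signal decoding device may determine the frequency-domain signal of the extended band from the spectral envelope predicted in step 203 and the excitation signal predicted in step 204.
For example, the spectral envelope of the extended band and the excitation signal of the extended band may be multiplied to determine the frequency-domain signal of the extended band.
206. The signal decoding device synthesizes the decoded signal obtained in step 202 with the frequency-domain signal of the extended band obtained in step 205 to obtain a frequency-domain output signal.
207. The signal decoding device applies a frequency-to-time transform to the frequency-domain output signal obtained in step 206 to obtain the final output signal.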
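Steps 204 and 205 can be sketched as follows. The text only says that the excitation is predicted from the spectral coefficients of a source band, so copying those coefficients unchanged is an assumption made for this illustration, as are the function names and the equal split of the extended band into one sub-band per envelope value.

```python
def predict_excitation(decoded, src_lo, src_hi):
    """Step 204 (assumed form): reuse the decoded coefficients in
    [src_lo, src_hi) as the excitation of the extended band."""
    return list(decoded[src_lo:src_hi])


def extended_spectrum(excitation, envelope):
    """Step 205: multiply each excitation coefficient by the envelope value
    of the sub-band it falls into (equal-width sub-bands are assumed)."""
    per_band = len(excitation) // len(envelope)
    if per_band == 0:
        raise ValueError("need at least one coefficient per envelope value")
    out = []
    for i, x in enumerate(excitation):
        band = min(i // per_band, len(envelope) - 1)
        out.append(envelope[band] * x)
    return out


decoded = [0.0, 1.0, -1.0, 0.5, 2.0, 9.0]      # toy decoded spectrum
exc = predict_excitation(decoded, 1, 5)        # toy source band
print(extended_spectrum(exc, [2.0, 4.0]))      # wenv[1], wenv[2]
```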
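208. If the signal decoding device determines that the speech or audio signal is coded in a time-domain coding mode, it decodes the bitstream of the speech or audio signal with the corresponding decoding mode.
Because the sampling rate of the speech or audio signal is 12.8 kHz, the bandwidth of the decoded signal is 6.4 kHz. To obtain an output signal with a bandwidth of 8 kHz, blind bandwidth extension is needed to recover the signal in the band from 6 kHz to 8 kHz; that is, the extended band is 6 kHz to 8 kHz.
When the speech or audio signal is coded in a time-domain coding mode, the signal decoding device may use both time-domain band extension and frequency-domain band extension to recover the final time-domain signal of the extended band from 6 kHz to 8 kHz.
209. The signal decoding device uses time-domain band extension to determine a first time-domain signal of the extended band from 6 kHz to 8 kHz from the decoded signal of step 208.
For the details of time-domain band extension, refer to the prior art; to avoid repetition, they are not given here.
210. The signal decoding device applies a time-to-frequency transform to the decoded signal of step 208, converting it from a time-domain signal into a frequency-domain signal.
211. The signal decoding device uses frequency-domain band extension to determine the frequency-domain signal of the extended band. For the details, refer to steps 203 to 205; to avoid repetition, they are not given here.
212. The signal decoding device applies a frequency-to-time transform to the frequency-domain signal of the extended band determined in step 211 to determine a second time-domain signal of the extended band.
213. The signal decoding device adds the first time-domain signal of the extended band and the second time-domain signal of the extended band to determine the final time-domain signal of the extended band.
214. The signal decoding device synthesizes the decoded signal obtained in step 208 with the final time-domain signal of the extended band obtained in step 213 to determine the final output signal.
In the embodiments of the present invention, the spectral envelope and the excitation signal of the extended band are each predicted from the decoded signal obtained from the bitstream of the speech or audio signal, so the frequency-domain signal of the extended band can be determined, which improves the performance of the speech or audio signal.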
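FIG. 3 is a schematic block diagram of a signal decoding device according to one embodiment of the present invention. An example of the device 300 of FIG. 3 is a decoder. The device 300 includes a decoding unit 310, a prediction unit 320, and a determining unit 330.
The decoding unit 310 decodes a bitstream of a speech or audio signal to obtain a decoded signal. The prediction unit 320 receives the decoded signal from the decoding unit 310 and predicts an excitation signal of an extended band from the decoded signal, where the extended band is adjacent to the band of the decoded signal and the band of the decoded signal is lower than the extended band. The prediction unit 320 also selects a first band and a second band in the decoded signal and predicts the spectral envelope of the extended band from the spectral coefficients of the first band and of the second band, where the distance between the highest frequency point of the first band and the lowest frequency point of the extended band is less than or equal to a first value, and the distance between the highest frequency point of the second band and the lowest frequency point of the first band is less than or equal to a second value. The determining unit 330 receives the spectral envelope and the excitation signal of the extended band from the prediction unit 320 and determines the frequency-domain signal of the extended band from them.
In the embodiments of the present invention, the spectral envelope and the excitation signal of the extended band are each predicted from the decoded signal obtained from the bitstream of the speech or audio signal, so the frequency-domain signal of the extended band can be determined, which improves the performance of the speech or audio signal.
For other functions and operations of the device 300, refer to the method embodiments of FIG. 1 and FIG. 2 above; to avoid repetition, details are not given here.
Optionally, in one embodiment, the prediction unit 320 may select the first band and the second band in the decoded signal in the direction from the starting point of the extended band toward lower frequencies, where the distance between the highest frequency point of the first band and the lowest frequency point of the extended band equals the first value, the first value being 0, and the distance between the highest frequency point of the second band and the lowest frequency point of the first band equals the second value, the second value being 0.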
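Optionally, in another embodiment, the prediction unit 320 may divide the first band into M sub-bands and determine the energy or amplitude mean of each sub-band from the spectral coefficients of the first band, where M is a positive integer; determine an adjusted energy or amplitude value for each sub-band from its mean; predict a first spectral envelope of the extended band from the adjusted values; determine the energy or amplitude mean of the second band from its spectral coefficients; and predict the spectral envelope of the extended band from the first spectral envelope and the energy or amplitude mean of the second band.
Optionally, in another embodiment, if the variance of the energy or amplitude means of the M sub-bands is not within a preset threshold range, the prediction unit 320 may adjust the mean of each of a sub-bands to determine its adjusted value and use the mean of each of b sub-bands as its adjusted value, where the mean of each of the a sub-bands is greater than or equal to a mean threshold, the mean of each of the b sub-bands is less than the mean threshold, a and b are positive integers, and a+b=M.
If the variance of the energy or amplitude means of the M sub-bands is within the preset threshold range, the prediction unit 320 may use the mean of each sub-band as its adjusted value.
Optionally, in another embodiment, for the i-th and (i+1)-th sub-bands of the M sub-bands, if the ratio of the energy or amplitude mean of the i-th sub-band to that of the (i+1)-th sub-band is not within a preset threshold range, the prediction unit 320 may, when the mean of the i-th sub-band is greater than that of the (i+1)-th sub-band, adjust the mean of the i-th sub-band to determine its adjusted value and use the mean of the (i+1)-th sub-band as its adjusted value, and, when the mean of the i-th sub-band is less than that of the (i+1)-th sub-band, adjust the mean of the (i+1)-th sub-band to determine its adjusted value and use the mean of the i-th sub-band as its adjusted value.
If the ratio is within the preset threshold range, the prediction unit 320 may use the mean of each of the two sub-bands as its adjusted value, where i is a positive integer and 1 ≤ i ≤ M-1.
Optionally, in another embodiment, the prediction unit 320 may determine a second spectral envelope of the current frame's extended band from the first spectral envelope of the current frame's extended band and the energy or amplitude mean of the current frame's second band; when a preset condition is met, weight the second spectral envelope of the current frame's extended band with the spectral envelope of the previous frame's extended band to determine the spectral envelope of the current frame's extended band; and when the preset condition is not met, use the second spectral envelope as the spectral envelope of the current frame's extended band.
Optionally, in another embodiment, the prediction unit 320 may determine a second spectral envelope of the current frame's extended band from the first spectral envelope of the current frame's extended band and the energy or amplitude mean of the current frame's second band; when a preset condition is met, weight the second spectral envelope of the current frame's extended band with the spectral envelope of the previous frame's extended band to determine a third spectral envelope of the current frame's extended band; when the preset condition is not met, use the second spectral envelope as the third spectral envelope; and determine the spectral envelope of the current frame's extended band from the pitch period of the decoded signal, the voicing factor of the decoded signal, and the third spectral envelope.
Optionally, in another embodiment, the preset condition may include at least one of the following three conditions. Condition 1: the coding mode of the current frame's speech or audio signal differs from that of the previous frame. Condition 2: the decoded signal of the previous frame is a non-fricative, and the ratio of the energy or amplitude mean of the m-th band in the current frame's decoded signal to that of the n-th band in the previous frame's decoded signal is within a preset threshold range, where m and n are positive integers. Condition 3: the decoded signal of the current frame is a non-fricative, and the ratio of the second spectral envelope of the current frame's extended band to the spectral envelope of the previous frame's extended band is greater than the ratio of the energy or amplitude mean of the j-th band in the current frame's decoded signal to that of the k-th band in the previous frame's decoded signal, where j and k are positive integers.
Optionally, in another embodiment, when the speech or audio signal is coded in a time-domain coding mode, the prediction unit 320 may select a third band from the decoded signal, the third band being adjacent to the extended band, and predict the excitation signal of the extended band from the spectral coefficients of the third band.
Optionally, in another embodiment, when the speech or audio signal is coded in a joint time-frequency coding mode or a frequency-domain coding mode, the prediction unit 320 may select a fourth band from the decoded signal, the number of bits allocated to the fourth band being greater than a preset bit-count threshold, and predict the excitation signal of the extended band from the spectral coefficients of the fourth band.
In the embodiments of the present invention, the spectral envelope and the excitation signal of the extended band are each predicted from the decoded signal obtained from the bitstream of the speech or audio signal, so the frequency-domain signal of the extended band can be determined, which improves the performance of the speech or audio signal.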
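FIG. 4 is a schematic block diagram of a signal decoding device according to another embodiment of the present invention. An example of the device 400 of FIG. 4 is a decoder. In FIG. 4, parts that are the same as or similar to those in FIG. 3 keep the same reference numerals. In addition to the decoding unit 310, the prediction unit 320, and the determining unit 330, the device 400 includes a first synthesis unit 340 and a first transform unit 350.
When the speech or audio signal is coded in a joint time-frequency coding mode or a frequency-domain coding mode, the first synthesis unit 340 may synthesize the decoded signal with the frequency-domain signal of the extended band to obtain a frequency-domain output signal. The first transform unit 350 may apply a frequency-to-time transform to the frequency-domain output signal to obtain the final output signal.
For other functions and operations of the device 400, refer to the method embodiments of FIG. 1 and FIG. 2 above; to avoid repetition, details are not given here.
In the embodiments of the present invention, the spectral envelope and the excitation signal of the extended band are each predicted from the decoded signal obtained from the bitstream of the speech or audio signal, so the frequency-domain signal of the extended band can be determined, which improves the performance of the speech or audio signal.
FIG. 5 is a schematic block diagram of a signal decoding device according to another embodiment of the present invention. An example of the device 500 of FIG. 5 is a decoder. In FIG. 5, parts that are the same as or similar to those in FIG. 3 and FIG. 4 keep the same reference numerals. In addition to the decoding unit 310, the prediction unit 320, and the determining unit 330, the device 500 includes an obtaining unit 360, a second transform unit 370, and a second synthesis unit 380.
When the speech or audio signal is coded in a time-domain coding mode, the obtaining unit 360 may obtain a first time-domain signal of the extended band by time-domain band extension. The second transform unit 370 may transform the frequency-domain signal of the extended band into a second time-domain signal of the extended band. The second synthesis unit 380 may synthesize the first and second time-domain signals of the extended band to obtain the final time-domain signal of the extended band, and may also synthesize the decoded signal with the final time-domain signal of the extended band to obtain the output signal.
For other functions and operations of the device 500, refer to the method embodiments of FIG. 1 and FIG. 2 above; to avoid repetition, details are not given here.
In the embodiments of the present invention, the spectral envelope and the excitation signal of the extended band are each predicted from the decoded signal obtained from the bitstream of the speech or audio signal, so the frequency-domain signal of the extended band can be determined, which improves the performance of the speech or audio signal.
FIG. 6 is a schematic block diagram of a signal decoding device according to one embodiment of the present invention. An example of the device 600 of FIG. 6 is a decoder. The device 600 includes a processor 610 and a memory 620.
The memory 620 may include a random access memory, a flash memory, a read-only memory, a programmable read-only memory, a non-volatile memory, a register, or the like. The processor 610 may be a central processing unit (CPU).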
存储器 610用于存储可执行指令。 处理器 620可以执行存储器 610中存储 的可执行指令, 用于: 对语音或音频信号的比特流进行解码, 获取解码信号; 根据解码信号预测扩展频带的激励信号, 其中, 扩展频带与解码信号的频带相 邻, 且解码信号的频带低于扩展频带; 在解码信号中选取第一频带和第二频带, 并根据第一频带的频谱系数以及第二频带的频谱系数预测扩展频带的频谱包 络, 其中, 第一频带的最高频点距离扩展频带的最低频点小于或等于第一值, 第二频带的最高频点距离第一频带的最低频点小于或等于第二值; 根据扩展频 带的频谱包络和扩展频带的激励信号, 确定扩展频带的频域信号。 本发明实施例中, 通过根据从语音或音频信号的比特流中得到的解码信号 分别预测扩展频带的频谱包络和激励信号, 使得能够确定语音或音频信号的扩 展频带的频域信号 , 因此能够提升语音或音频信号的性能。 设备 600的其它功能和操作可参照上面图 1和图 2的方法实施例的过程, 为了避免重复, 此处不再贅述。 可选地, 作为一个实施例, 处理器 610可按照从扩展频带的起始点向低频 的方向, 在解码信号中选取第一频带和第二频带, 其中第一频带的最高频点距 离扩展频带的最低频点等于第一值, 第一值为 0; 第二频带的最高频点距离第一 频带的最低频点等于第二值, 第二值为 0。
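The band selection described above can be illustrated with a minimal sketch. It assumes that bands are expressed as half-open ranges of spectral-coefficient indices and that the band widths are chosen by the implementer; the function name select_bands and the parameters width1, width2, gap1 and gap2 are hypothetical and do not appear in the embodiment. With gap1 and gap2 equal to 0 (the first value and the second value), the first band ends where the extension band starts and the second band ends where the first band starts.

```python
def select_bands(ext_start, width1, width2, gap1=0, gap2=0):
    """Pick the first and second bands below the extension band.

    Bands are half-open index ranges [lo, hi) over spectral coefficients.
    gap1 / gap2 play the role of the "first value" and "second value";
    both default to 0, which makes the bands contiguous (an assumption
    about how distance is measured on a discrete index grid).
    """
    hi1 = ext_start - gap1        # top of first band, just below the extension band
    lo1 = hi1 - width1
    hi2 = lo1 - gap2              # top of second band, just below the first band
    lo2 = hi2 - width2
    if lo2 < 0:
        raise ValueError("bands do not fit below the extension band")
    return (lo1, hi1), (lo2, hi2)
```

For example, with an extension band starting at index 320 and two 40-coefficient bands, the first band covers indices 280 to 319 and the second band covers indices 240 to 279.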
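Optionally, in another embodiment, the processor 610 may divide the first frequency band into M sub-bands and determine the mean value of energy or amplitude of each sub-band according to the spectral coefficients of the first frequency band, where M is a positive integer; determine an adjusted value of the energy or amplitude of each sub-band according to the mean value of energy or amplitude of each sub-band; predict a first spectral envelope of the extension band according to the adjusted values of the energy or amplitude of the sub-bands; determine the mean value of energy or amplitude of the second frequency band according to the spectral coefficients of the second frequency band; and predict the spectral envelope of the extension band according to the first spectral envelope of the extension band and the mean value of energy or amplitude of the second frequency band.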
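The first step above, splitting the first frequency band into M sub-bands and taking a mean per sub-band, might look like the following sketch. It assumes equal-width sub-bands and uses the mean of squared coefficients as the energy measure; the embodiment does not fix either choice, and the function name subband_mean_energy is hypothetical.

```python
def subband_mean_energy(coeffs, m):
    """Mean energy of each of m equal-width sub-bands.

    coeffs: spectral coefficients of the first frequency band.
    Energy is taken as the square of a coefficient (an assumption;
    the text equally allows an amplitude-based mean).
    """
    if m <= 0 or len(coeffs) % m != 0:
        raise ValueError("band length must be a positive multiple of m")
    size = len(coeffs) // m
    return [
        sum(c * c for c in coeffs[k * size:(k + 1) * size]) / size
        for k in range(m)
    ]
```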
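Optionally, in another embodiment, if the variance of the mean values of energy or amplitude of the M sub-bands is not within a preset threshold range, the processor 610 may adjust the mean value of energy or amplitude of each of a sub-bands to determine the adjusted value of the energy or amplitude of each of the a sub-bands, and use the mean value of energy or amplitude of each of b sub-bands as the adjusted value of the energy or amplitude of each of the b sub-bands, where the mean value of energy or amplitude of each of the a sub-bands is greater than or equal to a mean threshold, the mean value of energy or amplitude of each of the b sub-bands is less than the mean threshold, a and b are positive integers, and a+b=M.
If the variance of the mean values of energy or amplitude of the M sub-bands is within the preset threshold range, the processor 610 may use the mean value of energy or amplitude of each sub-band as the adjusted value of the energy or amplitude of that sub-band.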
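The variance-based rule above can be sketched as follows. The text does not say how a mean value is adjusted, so the sketch assumes attenuation by a fixed factor; the name adjust_by_variance and the parameters var_range, mean_threshold and scale are hypothetical.

```python
from statistics import pvariance


def adjust_by_variance(means, var_range, mean_threshold, scale=0.5):
    """Return adjusted values for the M sub-band means.

    If the variance of the means lies inside var_range, every mean is
    kept as its own adjusted value. Otherwise the sub-bands whose mean
    is >= mean_threshold (the "a" sub-bands) are scaled, and the rest
    (the "b" sub-bands) are kept. Scaling by a constant is an assumed
    form of "adjustment", not one stated in the text.
    """
    lo, hi = var_range
    if lo <= pvariance(means) <= hi:
        return list(means)
    return [m * scale if m >= mean_threshold else m for m in means]
```

The intent of such a rule is that a single unusually strong sub-band does not dominate the envelope predicted for the extension band.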
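Optionally, in another embodiment, for the i-th sub-band and the (i+1)-th sub-band of the M sub-bands, if the ratio of the mean value of energy or amplitude of the i-th sub-band to the mean value of energy or amplitude of the (i+1)-th sub-band is not within a preset threshold range, then: when the mean value of the i-th sub-band is greater than the mean value of the (i+1)-th sub-band, the processor 610 may adjust the mean value of energy or amplitude of the i-th sub-band to determine the adjusted value of the energy or amplitude of the i-th sub-band, and use the mean value of energy or amplitude of the (i+1)-th sub-band as the adjusted value of the energy or amplitude of the (i+1)-th sub-band; when the mean value of the i-th sub-band is less than the mean value of the (i+1)-th sub-band, the processor 610 may adjust the mean value of energy or amplitude of the (i+1)-th sub-band to determine the adjusted value of the energy or amplitude of the (i+1)-th sub-band, and use the mean value of energy or amplitude of the i-th sub-band as the adjusted value of the energy or amplitude of the i-th sub-band.
If the ratio of the mean value of energy or amplitude of the i-th sub-band to the mean value of energy or amplitude of the (i+1)-th sub-band is within the preset threshold range, the processor 610 may use the mean value of energy or amplitude of the i-th sub-band as the adjusted value of the energy or amplitude of the i-th sub-band, and use the mean value of energy or amplitude of the (i+1)-th sub-band as the adjusted value of the (i+1)-th sub-band, where i is a positive integer and 1≤i≤M-1.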
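The adjacent-sub-band rule above can be sketched in the same spirit. Two points are assumptions of this sketch rather than statements of the embodiment: the adjustment is a fixed attenuation, and a sub-band flagged by both of its neighbouring pairs is attenuated only once. The name adjust_by_neighbor_ratio and its parameters are hypothetical, and all means are assumed to be strictly positive.

```python
def adjust_by_neighbor_ratio(means, ratio_range, scale=0.5):
    """Adjust the larger member of each out-of-range adjacent pair.

    means: strictly positive sub-band means (0-based here, whereas the
    text counts sub-bands from 1). ratio_range is the preset threshold
    range for means[i] / means[i + 1].
    """
    lo, hi = ratio_range
    flagged = set()
    for i in range(len(means) - 1):
        ratio = means[i] / means[i + 1]
        if lo <= ratio <= hi:
            continue                  # both means are kept as adjusted values
        if means[i] > means[i + 1]:
            flagged.add(i)            # i-th sub-band is the larger one
        elif means[i] < means[i + 1]:
            flagged.add(i + 1)        # (i+1)-th sub-band is the larger one
    return [m * scale if k in flagged else m for k, m in enumerate(means)]
```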
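Optionally, in another embodiment, the processor 610 may determine a second spectral envelope of the extension band of the current frame according to the first spectral envelope of the extension band of the current frame and the mean value of energy or amplitude of the second frequency band of the current frame; when a preset condition is met, weight the second spectral envelope of the extension band of the current frame and the spectral envelope of the extension band of the previous frame to determine the spectral envelope of the extension band of the current frame; and when the preset condition is not met, use the second spectral envelope of the extension band of the current frame as the spectral envelope of the extension band of the current frame.
Optionally, in another embodiment, the processor 610 may determine a second spectral envelope of the extension band of the current frame according to the first spectral envelope of the extension band of the current frame and the mean value of energy or amplitude of the second frequency band of the current frame; when a preset condition is met, weight the second spectral envelope of the extension band of the current frame and the spectral envelope of the extension band of the previous frame to determine a third spectral envelope of the extension band of the current frame; when the preset condition is not met, use the second spectral envelope of the extension band of the current frame as the third spectral envelope of the extension band of the current frame; and determine the spectral envelope of the extension band of the current frame according to the pitch period of the decoded signal, the voicing factor of the decoded signal and the third spectral envelope of the extension band of the current frame.
Optionally, in another embodiment, the preset condition may include at least one of the following three conditions. Condition 1: the coding mode of the speech or audio signal of the current frame differs from the coding mode of the speech or audio signal of the previous frame. Condition 2: the decoded signal of the previous frame is a non-fricative, and the ratio of the mean value of energy or amplitude of the m-th frequency band in the decoded signal of the current frame to the mean value of energy or amplitude of the n-th frequency band in the decoded signal of the previous frame is within a preset threshold range, where m and n are positive integers. Condition 3: the decoded signal of the current frame is a non-fricative, and the ratio of the second spectral envelope of the extension band of the current frame to the spectral envelope of the extension band of the previous frame is greater than the ratio of the mean value of energy or amplitude of the j-th frequency band in the decoded signal of the current frame to the mean value of energy or amplitude of the k-th frequency band in the decoded signal of the previous frame, where j and k are positive integers.
Optionally, in another embodiment, when the coding mode of the speech or audio signal is a time-domain coding mode, the processor 610 may select a third frequency band from the decoded signal, the third frequency band being adjacent to the extension band, and predict the excitation signal of the extension band according to the spectral coefficients of the third frequency band.
Optionally, in another embodiment, when the coding mode of the speech or audio signal is a time-frequency joint coding mode or a frequency-domain coding mode, the processor 610 may select a fourth frequency band from the decoded signal, the number of bits allocated to the fourth frequency band being greater than a preset bit-number threshold, and predict the excitation signal of the extension band according to the spectral coefficients of the fourth frequency band.
Optionally, in another embodiment, when the coding mode of the speech or audio signal is a time-frequency joint coding mode or a frequency-domain coding mode, the processor 610 may further synthesize the decoded signal and the frequency-domain signal of the extension band to obtain a frequency-domain output signal, and perform a frequency-to-time transform on the frequency-domain output signal to obtain the final output signal.
Optionally, in another embodiment, when the coding mode of the speech or audio signal is a time-domain coding mode, the processor 610 may further obtain a first time-domain signal of the extension band by using a time-domain bandwidth extension method; transform the frequency-domain signal of the extension band into a second time-domain signal of the extension band; synthesize the first time-domain signal and the second time-domain signal of the extension band to obtain a final time-domain signal of the extension band; and synthesize the decoded signal and the final time-domain signal of the extension band to obtain the final output signal.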
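The inter-frame weighting of the second spectral envelope described above can be sketched as follows. The embodiment does not give the weights, so the sketch assumes a convex combination controlled by a single hypothetical parameter w_current; the function name smooth_envelope is likewise hypothetical, and the evaluation of the preset condition is left to the caller.

```python
def smooth_envelope(env2, prev_env, condition_met, w_current=0.5):
    """Spectral envelope of the current frame's extension band.

    env2: second spectral envelope of the current frame.
    prev_env: spectral envelope of the previous frame's extension band.
    condition_met: whether at least one preset condition holds.
    w_current: assumed weight of the current frame (0..1).
    """
    if not condition_met:
        return env2                   # second envelope is used as-is
    return w_current * env2 + (1.0 - w_current) * prev_env
```

Weighting against the previous frame limits abrupt envelope jumps between frames, for example when the coding mode switches (condition 1).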
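The memory 620 may store data generated while the processor 610 performs the above operations. The processor 610 may read this data from the memory 620.
In this embodiment of the present invention, the spectral envelope and the excitation signal of the extension band are each predicted from the decoded signal obtained from the bitstream of the speech or audio signal, so that the frequency-domain signal of the extension band of the speech or audio signal can be determined, which improves the performance of the speech or audio signal.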
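Figure 7 is a schematic flowchart of a signal encoding method according to an embodiment of the present invention. The method of Figure 7 is performed by an encoder side, for example a signal encoding device. The signal encoding device divides the input signal into two parts, a low-band signal and an extension-band signal; the core layer processes the low-band signal, and the extension layer processes the extension-band signal. The signal encoding method includes:
710, Perform core-layer encoding on the speech or audio signal to obtain a core-layer bitstream of the speech or audio signal.
720, Perform extension-layer processing on the speech or audio signal to determine a first envelope of the extension band.
The first envelope of the extension band may be the original envelope of the extension band. Here, the first envelope may be a frequency-domain envelope or a time-domain envelope.
730, Determine a second envelope of the extension band according to the signal-to-noise ratio of the speech or audio signal, the pitch period of the speech or audio signal and the first envelope of the extension band.
Specifically, the encoder side may further correct the first envelope of the extension band according to the signal-to-noise ratio and the pitch period of the speech or audio signal, so that the second envelope of the extension band is inversely proportional to the signal-to-noise ratio and directly proportional to the pitch period, and thereby determine the second envelope of the extension band. For example, the encoder side may determine the second envelope wenv2 of the extension band according to the following equation:
wenv2 = (a1*pitch*pitch + b1*pitch + c1) / (a2*snr*snr + b2*snr + c2) * wenv1,
where wenv1 may denote the first envelope of the extension band, pitch may denote the pitch period of the speech or audio signal, snr may denote the signal-to-noise ratio of the speech or audio signal, a1 and b1 cannot both be 0, and a2, b2 and c2 cannot all be 0.
740, Encode the second envelope to obtain an extension-layer bitstream.
That is, the quantization index of the second envelope is written into the extension-layer bitstream. The extension-layer bitstream may also include quantization indices of other related parameters.
750, Send the core-layer bitstream and the extension-layer bitstream to the decoder side.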
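The equation given for step 730 can be written directly as a small function. The sketch below only restates that equation; the coefficient values used in any call are illustrative, since the embodiment constrains them (a1 and b1 not both 0; a2, b2 and c2 not all 0) but does not give numbers. The function name second_envelope is hypothetical.

```python
def second_envelope(wenv1, pitch, snr, a1, b1, c1, a2, b2, c2):
    """Second envelope of the extension band per the step-730 equation.

    wenv2 = (a1*pitch^2 + b1*pitch + c1) / (a2*snr^2 + b2*snr + c2) * wenv1
    """
    if a1 == 0 and b1 == 0:
        raise ValueError("a1 and b1 cannot both be 0")
    if a2 == 0 and b2 == 0 and c2 == 0:
        raise ValueError("a2, b2 and c2 cannot all be 0")
    numerator = a1 * pitch * pitch + b1 * pitch + c1
    denominator = a2 * snr * snr + b2 * snr + c2
    if denominator == 0:
        raise ZeroDivisionError("SNR polynomial evaluates to 0")
    return numerator / denominator * wenv1
```

With the purely linear choice a1 = 0, b1 = 1, c1 = 0 and a2 = 0, b2 = 1, c2 = 0, the second envelope reduces to pitch / snr * wenv1, which makes the stated proportionality to the pitch period and inverse proportionality to the signal-to-noise ratio explicit.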
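This embodiment of the present invention is applicable when bits are allocated to the extension band.
In this embodiment of the present invention, the first envelope of the extension band is determined, and the second envelope of the extension band is determined according to the signal-to-noise ratio of the speech or audio signal, the pitch period of the speech or audio signal and the first envelope of the extension band, so that the decoder side can determine the signal of the extension band according to the core-layer bitstream and the second envelope of the extension band, which improves the performance of the speech or audio signal.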
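Figure 8 is a schematic flowchart of a signal decoding method according to an embodiment of the present invention. The method of Figure 8 is performed by a decoder side, for example a signal decoding device.
810, Receive the core-layer bitstream and the extension-layer bitstream of a speech or audio signal from the encoder side.
820, Decode the extension-layer bitstream to determine the second envelope of the extension band, where the second envelope is determined by the encoder side according to the signal-to-noise ratio of the speech or audio signal, the pitch period of the speech or audio signal and the first envelope of the extension band.
The first envelope of the extension band may be the original envelope of the extension band. The first envelope may be a time-domain envelope or a frequency-domain envelope.
830, Decode the core-layer bitstream to obtain a core-layer speech or audio signal.
840, Predict the excitation signal of the extension band according to the core-layer speech or audio signal.
850, Predict the signal of the extension band according to the excitation signal of the extension band and the second envelope of the extension band.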
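In this embodiment of the present invention, the decoder side receives the second envelope of the extension band, which the encoder side determined according to the signal-to-noise ratio of the speech or audio signal, the pitch period of the speech or audio signal and the first envelope of the extension band, so that the decoder side can predict the signal of the extension band according to the second envelope of the extension band and the excitation signal of the extension band, which improves the performance of the speech or audio signal.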
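Figure 9 is a schematic block diagram of a signal encoding device according to an embodiment of the present invention. An example of the device 900 in Figure 9 is an encoder. The device 900 includes an encoding unit 910, a first determining unit 920, a second determining unit 930 and a sending unit 940.
The encoding unit 910 performs core-layer encoding on a speech or audio signal to obtain a core-layer bitstream of the speech or audio signal. The first determining unit 920 performs extension-layer processing on the speech or audio signal to determine a first envelope of the extension band. The second determining unit 930 determines a second envelope of the extension band according to the signal-to-noise ratio of the speech or audio signal, the pitch period of the speech or audio signal and the first envelope of the extension band. The encoding unit 910 further encodes the second envelope to obtain an extension-layer bitstream. The sending unit 940 sends the core-layer bitstream and the extension-layer bitstream to the decoder side.
For other functions and operations of the device 900 in Figure 9, refer to the process of the method embodiment of Figure 7 above; to avoid repetition, they are not described again here.
In this embodiment of the present invention, the first envelope of the extension band is determined, and the second envelope of the extension band is determined according to the signal-to-noise ratio of the speech or audio signal, the pitch period of the speech or audio signal and the first envelope of the extension band, so that the decoder side can determine the signal of the extension band according to the core-layer bitstream and the second envelope of the extension band, which improves the performance of the speech or audio signal.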
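Figure 10 is a schematic block diagram of a signal decoding device according to an embodiment of the present invention. An example of the device 1000 in Figure 10 is a decoder. The device 1000 includes a receiving unit 1010, a decoding unit 1020 and a prediction unit 1030.
The receiving unit 1010 receives the core-layer bitstream and the extension-layer bitstream of a speech or audio signal from the encoder side. The decoding unit 1020 decodes the extension-layer bitstream to determine the second envelope of the extension band, where the second envelope is determined by the encoder side according to the signal-to-noise ratio of the speech or audio signal, the pitch period of the speech or audio signal and the first envelope of the extension band. The decoding unit 1020 further decodes the core-layer bitstream to obtain a core-layer speech or audio signal. The prediction unit 1030 predicts the excitation signal of the extension band according to the core-layer speech or audio signal. The prediction unit 1030 predicts the signal of the extension band according to the excitation signal of the extension band and the second envelope of the extension band.
For other functions and operations of the device 1000, refer to the process of the method embodiment of Figure 8 above; to avoid repetition, they are not described again here.
In this embodiment of the present invention, the decoder side receives the second envelope of the extension band, which the encoder side determined according to the signal-to-noise ratio of the speech or audio signal, the pitch period of the speech or audio signal and the first envelope of the extension band, so that the decoder side can predict the signal of the extension band according to the second envelope of the extension band and the excitation signal of the extension band, which improves the performance of the speech or audio signal.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementation should not be considered to go beyond the scope of the present invention.
A person skilled in the art will clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; they are not described again here.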
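In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a division by logical function; in an actual implementation there may be other divisions, for example multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of an embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing is merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceived by a person skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.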

Claims

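1. A signal decoding method, comprising:
decoding a bitstream of a speech or audio signal to obtain a decoded signal;
predicting an excitation signal of an extension band according to the decoded signal, wherein the extension band is adjacent to the frequency band of the decoded signal, and the frequency band of the decoded signal is lower than the extension band;
selecting a first frequency band and a second frequency band in the decoded signal, and predicting a spectral envelope of the extension band according to spectral coefficients of the first frequency band and spectral coefficients of the second frequency band, wherein the distance between the highest frequency point of the first frequency band and the lowest frequency point of the extension band is less than or equal to a first value, and the distance between the highest frequency point of the second frequency band and the lowest frequency point of the first frequency band is less than or equal to a second value; and
determining a frequency-domain signal of the extension band according to the spectral envelope of the extension band and the excitation signal of the extension band.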
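2. The method according to claim 1, wherein the selecting a first frequency band and a second frequency band in the decoded signal comprises:
selecting the first frequency band and the second frequency band in the frequency band of the decoded signal in the direction from the starting point of the extension band toward lower frequencies, wherein the distance between the highest frequency point of the first frequency band and the lowest frequency point of the extension band equals the first value, and the first value is 0; and the distance between the highest frequency point of the second frequency band and the lowest frequency point of the first frequency band equals the second value, and the second value is 0.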
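3. The method according to claim 1 or 2, wherein the predicting a spectral envelope of the extension band according to spectral coefficients of the first frequency band and spectral coefficients of the second frequency band comprises:
dividing the first frequency band into M sub-bands, and determining a mean value of energy or amplitude of each sub-band according to the spectral coefficients of the first frequency band, wherein M is a positive integer;
determining an adjusted value of the energy or amplitude of each sub-band according to the mean value of energy or amplitude of each sub-band;
predicting a first spectral envelope of the extension band according to the adjusted value of the energy or amplitude of each sub-band;
determining a mean value of energy or amplitude of the second frequency band according to the spectral coefficients of the second frequency band; and
predicting the spectral envelope of the extension band according to the first spectral envelope of the extension band and the mean value of energy or amplitude of the second frequency band.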
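4. The method according to claim 3, wherein the determining an adjusted value of the energy or amplitude of each sub-band according to the mean value of energy or amplitude of each sub-band comprises:
if the variance of the mean values of energy or amplitude of the M sub-bands is not within a preset threshold range, adjusting the mean value of energy or amplitude of each of a sub-bands to determine the adjusted value of the energy or amplitude of each of the a sub-bands, and using the mean value of energy or amplitude of each of b sub-bands as the adjusted value of the energy or amplitude of each of the b sub-bands, wherein the mean value of energy or amplitude of each of the a sub-bands is greater than or equal to a mean threshold, the mean value of energy or amplitude of each of the b sub-bands is less than the mean threshold, a and b are positive integers, and a+b=M; and
if the variance of the mean values of energy or amplitude of the M sub-bands is within the preset threshold range, using the mean value of energy or amplitude of each sub-band as the adjusted value of the energy or amplitude of that sub-band.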
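5. The method according to claim 3, wherein the determining an adjusted value of the energy or amplitude of each sub-band according to the mean value of energy or amplitude of each sub-band comprises:
for the i-th sub-band and the (i+1)-th sub-band of the M sub-bands,
if the ratio of the mean value of energy or amplitude of the i-th sub-band to the mean value of energy or amplitude of the (i+1)-th sub-band is not within a preset threshold range, then: when the mean value of energy or amplitude of the i-th sub-band is greater than the mean value of energy or amplitude of the (i+1)-th sub-band, adjusting the mean value of energy or amplitude of the i-th sub-band to determine the adjusted value of the energy or amplitude of the i-th sub-band, and using the mean value of energy or amplitude of the (i+1)-th sub-band as the adjusted value of the energy or amplitude of the (i+1)-th sub-band; and when the mean value of energy or amplitude of the i-th sub-band is less than the mean value of energy or amplitude of the (i+1)-th sub-band, adjusting the mean value of energy or amplitude of the (i+1)-th sub-band to determine the adjusted value of the energy or amplitude of the (i+1)-th sub-band, and using the mean value of energy or amplitude of the i-th sub-band as the adjusted value of the energy or amplitude of the i-th sub-band; and
if the ratio of the mean value of energy or amplitude of the i-th sub-band to the mean value of energy or amplitude of the (i+1)-th sub-band is within the preset threshold range, using the mean value of energy or amplitude of the i-th sub-band as the adjusted value of the energy or amplitude of the i-th sub-band, and using the mean value of energy or amplitude of the (i+1)-th sub-band as the adjusted value of the (i+1)-th sub-band, wherein i is a positive integer and 1≤i≤M-1.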
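6. The method according to any one of claims 3 to 5, wherein the predicting the spectral envelope of the extension band according to the first spectral envelope of the extension band and the mean value of energy or amplitude of the second frequency band comprises:
determining a second spectral envelope of the extension band of a current frame according to the first spectral envelope of the extension band of the current frame and the mean value of energy or amplitude of the second frequency band of the current frame;
when a preset condition is met, weighting the second spectral envelope of the extension band of the current frame and the spectral envelope of the extension band of a previous frame to determine the spectral envelope of the extension band of the current frame; and
when the preset condition is not met, using the second spectral envelope of the extension band of the current frame as the spectral envelope of the extension band of the current frame.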
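7. The method according to any one of claims 3 to 5, wherein the predicting the spectral envelope of the extension band according to the first spectral envelope of the extension band and the mean value of energy or amplitude of the second frequency band comprises:
determining a second spectral envelope of the extension band of a current frame according to the first spectral envelope of the extension band of the current frame and the mean value of energy or amplitude of the second frequency band of the current frame;
when a preset condition is met, weighting the second spectral envelope of the extension band of the current frame and the spectral envelope of the extension band of a previous frame to determine a third spectral envelope of the extension band of the current frame;
when the preset condition is not met, using the second spectral envelope of the extension band of the current frame as the third spectral envelope of the extension band of the current frame; and
determining the spectral envelope of the extension band of the current frame according to the pitch period of the decoded signal, the voicing factor of the decoded signal and the third spectral envelope of the extension band of the current frame.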
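8. The method according to claim 6 or 7, wherein the preset condition comprises at least one of the following three conditions:
Condition 1: the coding mode of the speech or audio signal of the current frame differs from the coding mode of the speech or audio signal of the previous frame;
Condition 2: the decoded signal of the previous frame is a non-fricative, and the ratio of the mean value of energy or amplitude of the m-th frequency band in the decoded signal of the current frame to the mean value of energy or amplitude of the n-th frequency band in the decoded signal of the previous frame is within a preset threshold range, wherein m and n are positive integers;
Condition 3: the decoded signal of the current frame is a non-fricative, and the ratio of the second spectral envelope of the extension band of the current frame to the spectral envelope of the extension band of the previous frame is greater than the ratio of the mean value of energy or amplitude of the j-th frequency band in the decoded signal of the current frame to the mean value of energy or amplitude of the k-th frequency band in the decoded signal of the previous frame, wherein j and k are positive integers.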
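9. The method according to any one of claims 1 to 8, wherein the predicting an excitation signal of the extension band according to the decoded signal comprises:
when the coding mode of the speech or audio signal is a time-domain coding mode, selecting a third frequency band from the decoded signal, the third frequency band being adjacent to the extension band; and
predicting the excitation signal of the extension band according to spectral coefficients of the third frequency band.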
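10. The method according to any one of claims 1 to 8, wherein the predicting an excitation signal of the extension band according to the decoded signal comprises:
when the coding mode of the speech or audio signal is a time-frequency joint coding mode or a frequency-domain coding mode, selecting a fourth frequency band from the decoded signal, the number of bits allocated to the fourth frequency band being greater than a preset bit-number threshold; and
predicting the excitation signal of the extension band according to spectral coefficients of the fourth frequency band.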
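11. The method according to any one of claims 1 to 10, wherein the method further comprises:
when the coding mode of the speech or audio signal is a time-frequency joint coding mode or a frequency-domain coding mode, synthesizing the decoded signal and the frequency-domain signal of the extension band to obtain a frequency-domain output signal; and
performing a frequency-to-time transform on the frequency-domain output signal to obtain a final output signal.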
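12. The method according to any one of claims 1 to 10, wherein the method further comprises:
when the coding mode of the speech or audio signal is a time-domain coding mode, obtaining a first time-domain signal of the extension band by using a time-domain bandwidth extension method;
transforming the frequency-domain signal of the extension band into a second time-domain signal of the extension band;
synthesizing the first time-domain signal of the extension band and the second time-domain signal of the extension band to obtain a final time-domain signal of the extension band; and
synthesizing the decoded signal and the final time-domain signal of the extension band to obtain a final output signal.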
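13. A signal decoding device, comprising:
a decoding unit, configured to decode a bitstream of a speech or audio signal to obtain a decoded signal;
a prediction unit, configured to receive the decoded signal from the decoding unit and predict an excitation signal of an extension band according to the decoded signal, wherein the extension band is adjacent to the frequency band of the decoded signal, and the frequency band of the decoded signal is lower than the extension band;
the prediction unit being further configured to select a first frequency band and a second frequency band in the decoded signal, and predict a spectral envelope of the extension band according to spectral coefficients of the first frequency band and spectral coefficients of the second frequency band, wherein the distance between the highest frequency point of the first frequency band and the lowest frequency point of the extension band is less than or equal to a first value, and the distance between the highest frequency point of the second frequency band and the lowest frequency point of the first frequency band is less than or equal to a second value; and
a determining unit, configured to receive the spectral envelope of the extension band and the excitation signal of the extension band from the prediction unit, and determine a frequency-domain signal of the extension band according to the spectral envelope of the extension band and the excitation signal of the extension band.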
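14. The device according to claim 13, wherein the prediction unit is specifically configured to select the first frequency band and the second frequency band in the decoded signal in the direction from the starting point of the extension band toward lower frequencies, wherein the distance between the highest frequency point of the first frequency band and the lowest frequency point of the extension band equals the first value, and the first value is 0; and the distance between the highest frequency point of the second frequency band and the lowest frequency point of the first frequency band equals the second value, and the second value is 0.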
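15. The device according to claim 13 or 14, wherein the prediction unit is specifically configured to: divide the first frequency band into M sub-bands, and determine a mean value of energy or amplitude of each sub-band according to the spectral coefficients of the first frequency band, wherein M is a positive integer; determine an adjusted value of the energy or amplitude of each sub-band according to the mean value of energy or amplitude of each sub-band; predict a first spectral envelope of the extension band according to the adjusted value of the energy or amplitude of each sub-band; determine a mean value of energy or amplitude of the second frequency band according to the spectral coefficients of the second frequency band; and predict the spectral envelope of the extension band according to the first spectral envelope of the extension band and the mean value of energy or amplitude of the second frequency band.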
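16. The device according to claim 15, wherein the prediction unit is specifically configured to: if the variance of the mean values of energy or amplitude of the M sub-bands is not within a preset threshold range, adjust the mean value of energy or amplitude of each of a sub-bands to determine the adjusted value of the energy or amplitude of each of the a sub-bands, and use the mean value of energy or amplitude of each of b sub-bands as the adjusted value of the energy or amplitude of each of the b sub-bands, wherein the mean value of energy or amplitude of each of the a sub-bands is greater than or equal to a mean threshold, the mean value of energy or amplitude of each of the b sub-bands is less than the mean threshold, a and b are positive integers, and a+b=M; and if the variance of the mean values of energy or amplitude of the M sub-bands is within the preset threshold range, use the mean value of energy or amplitude of each sub-band as the adjusted value of the energy or amplitude of that sub-band.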
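17. The device according to claim 15, wherein the prediction unit is specifically configured to, for the i-th sub-band and the (i+1)-th sub-band of the M sub-bands:
if the ratio of the mean value of energy or amplitude of the i-th sub-band to the mean value of energy or amplitude of the (i+1)-th sub-band is not within a preset threshold range, then: when the mean value of energy or amplitude of the i-th sub-band is greater than the mean value of energy or amplitude of the (i+1)-th sub-band, adjust the mean value of energy or amplitude of the i-th sub-band to determine the adjusted value of the energy or amplitude of the i-th sub-band, and use the mean value of energy or amplitude of the (i+1)-th sub-band as the adjusted value of the energy or amplitude of the (i+1)-th sub-band; and when the mean value of energy or amplitude of the i-th sub-band is less than the mean value of energy or amplitude of the (i+1)-th sub-band, adjust the mean value of energy or amplitude of the (i+1)-th sub-band to determine the adjusted value of the energy or amplitude of the (i+1)-th sub-band, and use the mean value of energy or amplitude of the i-th sub-band as the adjusted value of the energy or amplitude of the i-th sub-band; and
if the ratio of the mean value of energy or amplitude of the i-th sub-band to the mean value of energy or amplitude of the (i+1)-th sub-band is within the preset threshold range, use the mean value of energy or amplitude of the i-th sub-band as the adjusted value of the energy or amplitude of the i-th sub-band, and use the mean value of energy or amplitude of the (i+1)-th sub-band as the adjusted value of the (i+1)-th sub-band, wherein i is a positive integer and 1≤i≤M-1.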
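18. The device according to any one of claims 15 to 17, wherein the prediction unit is specifically configured to: determine a second spectral envelope of the extension band of a current frame according to the first spectral envelope of the extension band of the current frame and the mean value of energy or amplitude of the second frequency band of the current frame; when a preset condition is met, weight the second spectral envelope of the extension band of the current frame and the spectral envelope of the extension band of a previous frame to determine the spectral envelope of the extension band of the current frame; and when the preset condition is not met, use the second spectral envelope of the extension band of the current frame as the spectral envelope of the extension band of the current frame.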
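19. The device according to any one of claims 15 to 17, wherein the prediction unit is specifically configured to: determine a second spectral envelope of the extension band of a current frame according to the first spectral envelope of the extension band of the current frame and the mean value of energy or amplitude of the second frequency band of the current frame; when a preset condition is met, weight the second spectral envelope of the extension band of the current frame and the spectral envelope of the extension band of a previous frame to determine a third spectral envelope of the extension band of the current frame; when the preset condition is not met, use the second spectral envelope of the extension band of the current frame as the third spectral envelope of the extension band of the current frame; and determine the spectral envelope of the extension band of the current frame according to the pitch period of the decoded signal, the voicing factor of the decoded signal and the third spectral envelope of the extension band of the current frame.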
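20. The device according to claim 18 or 19, wherein the preset condition comprises at least one of the following three conditions:
Condition 1: the coding mode of the speech or audio signal of the current frame differs from the coding mode of the speech or audio signal of the previous frame;
Condition 2: the decoded signal of the previous frame is a non-fricative, and the ratio of the mean value of energy or amplitude of the m-th frequency band in the decoded signal of the current frame to the mean value of energy or amplitude of the n-th frequency band in the decoded signal of the previous frame is within a preset threshold range, wherein m and n are positive integers;
Condition 3: the decoded signal of the current frame is a non-fricative, and the ratio of the second spectral envelope of the extension band of the current frame to the spectral envelope of the extension band of the previous frame is greater than the ratio of the mean value of energy or amplitude of the j-th frequency band in the decoded signal of the current frame to the mean value of energy or amplitude of the k-th frequency band in the decoded signal of the previous frame, wherein j and k are positive integers.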
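21. The device according to any one of claims 13 to 20, wherein the prediction unit is specifically configured to: when the coding mode of the speech or audio signal is a time-domain coding mode, select a third frequency band from the decoded signal, the third frequency band being adjacent to the extension band; and predict the excitation signal of the extension band according to spectral coefficients of the third frequency band.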
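22. The device according to any one of claims 13 to 20, wherein the prediction unit is specifically configured to: when the coding mode of the speech or audio signal is a time-frequency joint coding mode or a frequency-domain coding mode, select a fourth frequency band from the decoded signal, the number of bits allocated to the fourth frequency band being greater than a preset bit-number threshold; and predict the excitation signal of the extension band according to spectral coefficients of the fourth frequency band.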
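23. The device according to any one of claims 13 to 22, wherein the device further comprises:
a first synthesis unit, configured to, when the coding mode of the speech or audio signal is a time-frequency joint coding mode or a frequency-domain coding mode, synthesize the decoded signal and the frequency-domain signal of the extension band to obtain a frequency-domain output signal; and
a first transform unit, configured to perform a frequency-to-time transform on the frequency-domain output signal to obtain a final output signal.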
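24. The device according to any one of claims 13 to 22, wherein the device further comprises:
an acquiring unit, configured to, when the coding mode of the speech or audio signal is a time-domain coding mode, obtain a first time-domain signal of the extension band by using a time-domain bandwidth extension method;
a second transform unit, configured to transform the frequency-domain signal of the extension band into a second time-domain signal of the extension band; and
a second synthesis unit, configured to synthesize the first time-domain signal of the extension band and the second time-domain signal of the extension band to obtain a final time-domain signal of the extension band;
the second synthesis unit being further configured to synthesize the decoded signal and the final time-domain signal of the extension band to obtain a final output signal.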
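25. A signal encoding method, comprising:
performing core-layer encoding on a speech or audio signal to obtain a core-layer bitstream of the speech or audio signal;
performing extension-layer processing on the speech or audio signal to determine a first envelope of an extension band;
determining a second envelope of the extension band according to the signal-to-noise ratio of the speech or audio signal, the pitch period of the speech or audio signal and the first envelope of the extension band;
encoding the second envelope to obtain an extension-layer bitstream; and
sending the core-layer bitstream and the extension-layer bitstream to a decoder side.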
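26. A signal decoding method, comprising:
receiving a core-layer bitstream and an extension-layer bitstream of a speech or audio signal from an encoder side;
decoding the extension-layer bitstream to determine a second envelope of an extension band, wherein the second envelope is determined by the encoder side according to the signal-to-noise ratio of the speech or audio signal, the pitch period of the speech or audio signal and a first envelope of the extension band;
decoding the core-layer bitstream to obtain a core-layer speech or audio signal;
predicting an excitation signal of the extension band according to the core-layer speech or audio signal; and
predicting a signal of the extension band according to the excitation signal of the extension band and the second envelope of the extension band.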
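27. A signal encoding device, comprising:
an encoding unit, configured to perform core-layer encoding on a speech or audio signal to obtain a core-layer bitstream of the speech or audio signal;
a first determining unit, configured to perform extension-layer processing on the speech or audio signal to determine a first envelope of an extension band;
a second determining unit, configured to determine a second envelope of the extension band according to the signal-to-noise ratio of the speech or audio signal, the pitch period of the speech or audio signal and the first envelope of the extension band;
the encoding unit being further configured to encode the second envelope to obtain an extension-layer bitstream; and
a sending unit, configured to send the core-layer bitstream and the extension-layer bitstream to a decoder side.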
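28. A signal decoding device, comprising:
a receiving unit, configured to receive a core-layer bitstream and an extension-layer bitstream of a speech or audio signal from an encoder side;
a decoding unit, configured to decode the extension-layer bitstream to determine a second envelope of an extension band, wherein the second envelope is determined by the encoder side according to the signal-to-noise ratio of the speech or audio signal, the pitch period of the speech or audio signal and a first envelope of the extension band;
the decoding unit being further configured to decode the core-layer bitstream to obtain a core-layer speech or audio signal; and
a prediction unit, configured to predict an excitation signal of the extension band according to the core-layer speech or audio signal;
the prediction unit being further configured to predict a signal of the extension band according to the excitation signal of the extension band and the second envelope of the extension band.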
PCT/CN2013/084514 2013-05-31 2013-09-27 Signal decoding method and device WO2014190649A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP13886051.5A EP2991074B1 (en) 2013-05-31 2013-09-27 Signal decoding method and device
US14/952,902 US9892739B2 (en) 2013-05-31 2015-11-25 Bandwidth extension audio decoding method and device for predicting spectral envelope
US15/894,517 US10490199B2 (en) 2013-05-31 2018-02-12 Bandwidth extension audio decoding method and device for predicting spectral envelope

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310213593.5 2013-05-31
CN201310213593.5A CN104217727B (zh) 2013-05-31 2013-05-31 Signal decoding method and device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/952,902 Continuation US9892739B2 (en) 2013-05-31 2015-11-25 Bandwidth extension audio decoding method and device for predicting spectral envelope

Publications (1)

Publication Number Publication Date
WO2014190649A1 (zh) 2014-12-04

Family

ID=51987923

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/084514 WO2014190649A1 (zh) 2013-05-31 2013-09-27 Signal decoding method and device

Country Status (4)

Country Link
US (2) US9892739B2 (zh)
EP (1) EP2991074B1 (zh)
CN (1) CN104217727B (zh)
WO (1) WO2014190649A1 (zh)

Also Published As

Publication number Publication date
EP2991074A1 (en) 2016-03-02
EP2991074B1 (en) 2019-05-15
US20160086613A1 (en) 2016-03-24
US20180166085A1 (en) 2018-06-14
CN104217727A (zh) 2014-12-17
US10490199B2 (en) 2019-11-26
EP2991074A4 (en) 2016-10-26
US9892739B2 (en) 2018-02-13
CN104217727B (zh) 2017-07-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13886051

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2013886051

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE