EP1120775A1 - Noise signal coder and speech signal coder - Google Patents

Publication number
EP1120775A1
EP1120775A1 EP00935511A EP00935511A EP1120775A1 EP 1120775 A1 EP1120775 A1 EP 1120775A1 EP 00935511 A EP00935511 A EP 00935511A EP 00935511 A EP00935511 A EP 00935511A EP 1120775 A1 EP1120775 A1 EP 1120775A1
Authority
EP
European Patent Office
Prior art keywords
speech
signal
noise
segment
coding
Legal status
Withdrawn
Application number
EP00935511A
Other languages
English (en)
French (fr)
Other versions
EP1120775A4 (de)
Inventor
Koji Yoshida
Current Assignee
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Application filed by Matsushita Electric Industrial Co Ltd

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/012 Comfort noise or silence coding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02168 Noise filtering characterised by the method used for estimating noise, the estimation exclusively taking place during speech pauses

Definitions

  • The present invention relates to a low-bit-rate speech signal coder used in applications such as mobile communication systems that transmit coded speech signals, and speech recorders.
  • In the fields of digital mobile communications and speech storage, speech coders that code speech information at a low bit rate are used to make effective use of radio frequencies and recording media.
  • Such conventional technologies include the CS-ACELP coding system of ITU-T Recommendation G.729 ("Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP)") and the CS-ACELP coding system with DTX (Discontinuous Transmission) control of ITU-T Recommendation G.729 Annex B ("A silence compression scheme for G.729 optimized for terminals conforming to Recommendation V.70").
  • FIG.1 is a block diagram showing a configuration of a coder based on the conventional CS-ACELP coding system.
  • LPC analyzer/quantizer 1 performs an LPC (linear prediction) analysis and quantization on an input speech signal and outputs LPC coefficients and LPC quantization codes.
  • An adaptive excitation signal and a fixed excitation signal, extracted from adaptive excitation codebook 2 and fixed excitation codebook 3 respectively, are multiplied by gains extracted from gain codebook 4, added together, and passed through LPC synthesis filter 7 for speech synthesis. The error signal between the synthesized signal and the input signal is weighted by perceptual weighting filter 9, and the adaptive excitation code, fixed excitation code and gain code that minimize the weighted error signal are output together with the LPC quantization code as coded data.
  • Reference numeral 5 denotes a multiplier, reference numeral 6 an adder, and reference numeral 8 a subtractor.
  • FIG.2 is a block diagram showing a configuration of a coder based on the conventional CS-ACELP coding system with DTX control.
  • Speech/non-speech decision section 11 decides whether the input signal is in a speech segment or a non-speech segment (segment with only background noise).
  • CS-ACELP speech coder 12 performs speech coding on the speech segment.
  • CS-ACELP speech coder 12 has the configuration shown in FIG.1.
  • Non-speech segment coder 13 performs coding on the non-speech segment.
  • Non-speech segment coder 13 calculates, from the input signal, LPC coefficients and the LPC prediction residual energy of the input signal, similar to those used for coding in the speech segment, and outputs them as coded data for the non-speech segment.
  • DTX controller & multiplexer 14 uses the outputs of speech/non-speech decision section 11, CS-ACELP speech coder 12 and non-speech segment coder 13 to control which data is sent, multiplexes it, and outputs it as transmission data.
  • The conventional CS-ACELP coder above codes at a bit rate as low as 8 kbps by exploiting speech-specific redundancy. High-quality coding is therefore possible when a clean speech signal without superimposed background noise is input, but the quality of the decoded signal deteriorates when a speech signal with surrounding background noise superimposed on it is coded.
  • The conventional CS-ACELP coder with DTX control above uses the CS-ACELP coder only on the speech segment and uses a dedicated non-speech segment coder, at a lower bit rate than that of the speech coder, on the non-speech segment (segment with only noise), thereby reducing the average bit rate for transmission.
  • The non-speech segment coder performs coding using a signal model similar to that of the speech coder (one that generates a decoded signal by driving an AR-type synthesis filter (LPC synthesis filter) with a random signal at short intervals of approximately 10 to 50 ms).
  • The conventional CS-ACELP coder with DTX control therefore has the same problem as the conventional CS-ACELP coder above: the quality of the decoded signal deteriorates for a speech signal with background noise superimposed on it.
  • An object of the present invention is to provide a speech signal coder and decoder that cause little deterioration in the quality of the decoded signal even for a speech signal with background noise superimposed on it, and that can also reduce the average bit rate necessary for transmission. This is achieved by calculating statistical characteristic quantities of the input signal in a non-speech segment (segment with only noise), storing information on a noise model that can express the statistical characteristic quantities of the input noise signal, detecting whether a noise model parameter expressing the input noise signal has changed, and updating the noise model.
  • FIG.3 is a block diagram showing a configuration of a radio communication apparatus equipped with a speech signal coder according to Embodiment 1 of the present invention.
  • a speech signal is converted to an electric analog signal by speech input apparatus 101 such as a microphone on the transmitting side and output to A/D converter 102.
  • the analog speech signal is converted to a digital speech signal by A/D converter 102 and output to speech coder 103.
  • Speech coder 103 performs speech coding processing on the digital speech signal and outputs the coded information to modulation/demodulation section 104.
  • Modulation/demodulation section 104 digitally modulates the coded speech signal and sends it to radio transmission section 105.
  • Radio transmission section 105 performs predetermined radio transmission processing on the modulated signal. This signal is sent out via antenna 106.
  • a reception signal received from antenna 107 is subjected to predetermined radio reception processing by radio reception section 108 and sent to modulation/demodulation section 104.
  • Modulation/demodulation section 104 performs demodulation processing on the reception signal and outputs the demodulated signal to speech decoding section 109.
  • Speech decoding section 109 performs decoding processing on the demodulated signal, obtains a digital decoded speech signal and outputs the digital decoded speech signal to D/A converter 110.
  • D/A converter 110 converts the digital decoded speech signal output from speech decoding section 109 to an analog speech signal and outputs it to speech output apparatus 111 such as a speaker.
  • Speech output apparatus 111 converts the electrical analog speech signal to sound and outputs it.
  • Speech coding section 103 shown in FIG.3 has a configuration shown in FIG.4.
  • FIG.4 is a block diagram showing a configuration of the speech coding section according to Embodiment 1 of the present invention.
  • Speech/non-speech decision section 201 decides whether an input signal is in a speech segment or a non-speech segment (segment with only noise) and outputs the decision result to DTX controller & multiplexer 204.
  • Speech/non-speech decision section 201 can be of any type and a decision is generally made using an instantaneous value or amount of change of a plurality of parameters such as power, spectrum and pitch period of the input signal.
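  • As an illustration of a decision based on one of the parameters named above (input power), the following is a minimal sketch. It assumes a fixed margin over a known noise floor; the function names, the 6 dB margin and the test signals are hypothetical and are not taken from the patent, which leaves the decision method open.

```python
import math

def frame_power_db(frame):
    """Mean power of one frame in dB (the floor avoids log(0))."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(power, 1e-12))

def is_speech(frame, noise_floor_db, margin_db=6.0):
    """Hypothetical rule: speech if power exceeds the noise floor by margin_db."""
    return frame_power_db(frame) > noise_floor_db + margin_db

# 20 ms frames at an assumed 8 kHz sampling rate (160 samples).
quiet = [0.001 * ((-1) ** n) for n in range(160)]
loud = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
floor_db = frame_power_db(quiet)
decisions = (is_speech(quiet, floor_db), is_speech(loud, floor_db))
```

  • A practical decision section would combine several parameters (power, spectrum, pitch period) and track the noise floor over time; the single-threshold rule here only shows the shape of the decision.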
  • speech coder 202 performs speech coding on the input speech signal in the speech segment including the speech signal and noise signal and outputs the coded data to DTX controller & multiplexer 204.
  • This speech coder 202 is a coder for the speech segment and any coder can be used as far as the coder can perform efficient coding on speech sound.
  • Noise signal coder 203 performs noise signal coding on the input signal in the non-speech segment, which includes only a noise signal, and outputs information on a noise model that expresses the input noise signal and a flag indicating whether the noise model should be updated or not to DTX controller & multiplexer 204.
  • DTX controller & multiplexer 204 controls information to be sent as transmission data using the outputs from speech/non-speech decision section 201, speech coder 202 and noise signal coder 203, multiplexes transmission information and outputs as transmission data.
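  • The control performed by DTX controller & multiplexer 204 can be sketched as a per-frame selection of what to transmit. The dictionary layout and the function name below are assumptions made for illustration; the patent does not specify a bitstream format.

```python
def multiplex_frame(is_speech, speech_code=None, update_flag=False, noise_params=None):
    """Pick the fields to transmit for one frame (hypothetical frame layout)."""
    if is_speech:
        # Speech segment: send the speech coder output.
        return {"speech": True, "data": speech_code}
    if update_flag:
        # Non-speech segment with a changed noise model: send the update.
        return {"speech": False, "update": True, "data": noise_params}
    # Non-speech segment with an unchanged noise model: flags only, no payload.
    return {"speech": False, "update": False}

speech_frame = multiplex_frame(True, speech_code=b"\x12\x34")
update_frame = multiplex_frame(False, update_flag=True, noise_params=[0.1, 0.2])
idle_frame = multiplex_frame(False)
```

  • The bit-rate saving comes from the last branch: frames in which the noise model is unchanged carry no model data.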
  • Noise signal coder 203 in FIG.4 has a configuration shown in FIG.5.
  • FIG.5 is a block diagram showing a configuration of the noise signal coder according to Embodiment 1 of the present invention.
  • Noise signal analysis section 301 performs a signal analysis on a noise signal input at certain intervals and calculates analysis parameters regarding the noise signal.
  • The analysis parameters extracted are those necessary to express statistical characteristic quantities of the input signal, such as the short-time spectrum calculated through an FFT (Fast Fourier Transform) on a short-segment signal, the input power, and the LPC spectrum parameter.
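  • One of the analysis parameters named here, the LPC spectrum parameter, is commonly obtained with the autocorrelation method and the Levinson-Durbin recursion. The sketch below is a minimal pure-Python illustration under that assumption; the text does not say which LPC method the coder uses, and the function names are hypothetical.

```python
def autocorrelation(frame, order):
    """Autocorrelation lags r[0..order] of one frame."""
    return [sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
            for k in range(order + 1)]

def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a[1] z^-1 + ...; returns (a, residual_energy)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

# Lags of an idealized first-order process with correlation 0.5.
coeffs, energy = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

  • The second return value is the prediction residual energy, which is the other quantity the conventional non-speech segment coder is described as sending.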
  • noise model variation detection section 303 detects whether a noise model parameter that should express the currently input noise signal has changed from the noise model parameter retained in noise model storage section 302 or not.
  • The noise model parameter refers to information on a noise model that can express statistical characteristic quantities of an input noise signal, for example information that expresses statistical characteristic quantities such as the average spectrum of short-time spectra and their variance using a statistical model such as a hidden Markov model (HMM).
  • Specifically, noise model variation detection section 303 decides whether the analysis parameter for the current input signal obtained from noise signal analysis section 301 is plausible as an output of the stored noise model, which expresses the preceding input signals (for example, in the case of an HMM, whether the output probability of the analysis parameter for the current input signal is equal to or greater than a specified value). When it decides that the noise model parameter that should express the currently input noise signal has changed from the stored noise model, it outputs a flag indicating whether the noise model should be updated, together with the information to be updated (update information), to noise model updating section 304.
  • The external updating enable flag is an external instruction on whether updating of the noise model is enabled. Updating is disabled when the speech coder of the present invention, described later, is prevented from sending noise model parameters, for example during a period in which coded data for the speech segment is being sent.
  • When the noise model updating flag indicates updating, noise model updating section 304 outputs only the updated noise model parameters, or only the parts that changed relative to the parameters previously stored in noise model storage section 302, and at the same time updates noise model storage section 302 using the output information.
  • When the noise model updating flag indicates no updating, noise model updating section 304 neither updates the model nor outputs updating information.
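  • The detection and updating steps described for sections 302 to 304 can be sketched as follows. This is an assumption-laden illustration: a diagonal Gaussian stands in for the HMM mentioned in the text, the threshold of -3.0 is arbitrary, and the class and function names are hypothetical.

```python
import math

class NoiseModel:
    """Stored noise model: per-dimension mean and variance (stand-in for an HMM)."""
    def __init__(self, mean, var):
        self.mean = list(mean)
        self.var = list(var)

    def log_likelihood(self, params):
        """Average per-dimension Gaussian log-likelihood of an analysis vector."""
        total = 0.0
        for x, m, v in zip(params, self.mean, self.var):
            total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        return total / len(params)

def detect_and_update(model, params, threshold=-3.0, update_enabled=True):
    """Return (update_flag, update_info); the model is mutated only on update."""
    changed = model.log_likelihood(params) < threshold
    if not (changed and update_enabled):
        return False, None
    # Send only the amount of variation, then store the new mean.
    delta = [x - m for x, m in zip(params, model.mean)]
    model.mean = list(params)
    return True, delta

model = NoiseModel(mean=[0.0, 0.0], var=[1.0, 1.0])
unchanged = detect_and_update(model, [0.1, -0.1])
blocked = detect_and_update(model, [5.0, 5.0], update_enabled=False)
updated = detect_and_update(model, [5.0, 5.0])
```

  • The `update_enabled` argument plays the role of the external updating enable flag: while it is false, even a clear change in the noise leaves the stored model untouched.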
  • FIG.6 is a block diagram showing a configuration of the speech decoder according to Embodiment 1 of the present invention.
  • Separator & DTX controller 401 receives transmission data which is an input signal coded and sent on the coding side as reception data and separates this reception data into speech coded data or noise model parameter, speech/non-speech decision flag and noise model updating flag necessary for speech decoding and noise signal generation.
  • When the speech/non-speech decision flag indicates the speech segment, speech decoder 402 performs speech decoding from the speech coded data and outputs the decoded speech to output switch 404.
  • noise signal generator 403 generates a noise signal from the noise model parameter and noise model updating flag and outputs the noise signal to output switch 404.
  • Output switch 404 switches between the output of speech decoder 402 and the output of noise signal generator 403 according to the result of the speech/non-speech decision flag and outputs as an output signal.
  • Noise signal generator 403 in FIG.6 has a configuration shown in FIG.7.
  • FIG.7 is a block diagram showing a configuration of the noise signal generator of the speech decoder according to Embodiment 1 of the present invention.
  • noise model updating flag and noise model parameter (in the case of model updating) output from noise signal coder 203 shown in FIG.5 are input to noise model updating section 501.
  • noise model updating section 501 updates the noise model using the input noise model parameter and the previous noise model parameter retained in noise model storage section 502 and newly stores the updated noise model parameter in noise model storage section 502.
  • Noise signal generator 503 generates and outputs a noise signal based on the information of noise model storage section 502. Noise signals are generated based on the model information which express statistical characteristic quantities so that the noise signal generated becomes an appropriate signal as the output from the model. For example, in the case where HMM is used as a statistical model, noise signal generator 503 stochastically outputs signal parameters (for example, short-time spectra) necessary for generation according to state transition probability and parameter output probability, etc. and generates/outputs a noise signal based thereupon.
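  • The stochastic generation described for noise signal generator 503 can be sketched as drawing short-time spectrum magnitudes from the stored statistics and synthesizing a frame from them. The sketch assumes a single-state model with a per-bin mean and standard deviation and uniformly random phases; an HMM would add state transitions on top of this. All names are hypothetical.

```python
import math
import random

def generate_noise_frame(mean_mag, std_mag, frame_len, rng):
    """Draw bin magnitudes around the stored mean, attach random phases,
    and sum the resulting cosines (bins start at k = 1, so no DC term)."""
    frame = [0.0] * frame_len
    for k, (m, s) in enumerate(zip(mean_mag, std_mag), start=1):
        mag = max(0.0, rng.gauss(m, s))          # stochastic parameter output
        phase = rng.uniform(0.0, 2.0 * math.pi)  # phase is not modeled
        for n in range(frame_len):
            frame[n] += mag * math.cos(2.0 * math.pi * k * n / frame_len + phase)
    return frame

rng = random.Random(0)  # seeded only to make the example repeatable
frame = generate_noise_frame([1.0, 0.5, 0.25], [0.1, 0.05, 0.02], 80, rng)
```

  • Because only statistics are stored, two generated frames differ sample by sample while sharing the same average spectrum, which is what lets the decoder produce natural-sounding background noise without receiving a waveform.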
  • FIG.8 is a flow chart showing a processing flow of the speech signal coding method according to Embodiment 1. In this method, suppose the processing shown in FIG.8 is repeated for every frame of a certain short segment (for example, approximately 10 to 50 ms).
  • In step (hereinafter referred to as "ST") 101, speech signals are input frame by frame. Then, in ST102, a speech/non-speech decision is made on the input signal and the decision result is output. In the case where the decision result is "speech", in ST104, speech coding processing is performed on the input speech signal and the coded data is output.
  • In the case where the decision result is "non-speech", the noise signal coder performs noise signal coding processing on the input signal and outputs information on a noise model that expresses the input noise signal and a flag indicating whether the noise model should be updated or not.
  • the coding processing on a noise signal will be described later.
  • FIG.9 is a flow chart showing a processing flow of the noise signal coding method of the speech signal coding method according to this embodiment. According to this method, the processing shown in FIG.9 is repeated for every frame of a fixed short segment (for example, approximately 10 to 50 ms).
  • a noise signal is input frame by frame.
  • a signal analysis is made on the noise signal frame by frame and analysis parameters for the noise signal are calculated.
  • According to the noise signal coding method of this embodiment, modeling a noise signal with a noise model that can express it through statistical characteristic quantities makes it possible to generate a decoded signal with little perceptual deterioration with respect to a background noise signal. Moreover, since there is no need for faithful coding of input signal waveforms and transmission is performed only in segments where the noise model parameters corresponding to the input signal change, it is possible to provide low-bit-rate, highly efficient coding.
  • The speech signal coding method provides high-quality, highly efficient coding even in a background noise environment by coding the speech segment with a speech coder capable of coding a speech signal with high quality and coding the non-speech segment with a noise signal coder that offers high efficiency and little perceptual deterioration.
  • FIG.10 is a block diagram showing a configuration of a speech signal coding section according to Embodiment 2 of the present invention.
  • speech/noise signal separator 801 separates an input speech signal into a speech signal and a background noise signal superimposed on the speech signal.
  • Speech/noise signal separator 801 can be of any type. Several separation methods are available, such as a method called "spectrum subtraction", which separates an input signal into a noise-suppressed speech signal and the noise signal by subtracting a noise spectrum from the input signal in the frequency domain, and a method of separating speech and noise signals using input signals from a plurality of signal input devices.
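  • A minimal sketch of magnitude spectral subtraction follows, assuming the noise magnitude spectrum has already been estimated (for example, during non-speech segments). It uses a naive DFT so that it depends only on the standard library; a real separator would use an FFT, windowing and overlap-add. The function names and the zero floor are assumptions for illustration.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (O(N^2), for illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Inverse of dft(); returns the real part of each sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def spectral_subtract(frame, noise_mag):
    """Split a frame into (speech_estimate, noise_estimate)."""
    cleaned = []
    for xk, nk in zip(dft(frame), noise_mag):
        mag = max(abs(xk) - nk, 0.0)  # subtract the noise magnitude, floor at 0
        cleaned.append(cmath.rect(mag, cmath.phase(xk)))  # keep the input phase
    speech = idft(cleaned)
    noise = [f - s for f, s in zip(frame, speech)]  # the remainder is the noise
    return speech, noise

frame = [math.sin(2.0 * math.pi * n / 8.0) for n in range(8)]
speech, noise = spectral_subtract(frame, [0.5] * 8)
```

  • Defining the noise estimate as the remainder guarantees that the two outputs always sum back to the input frame, which matches the adder used on the decoding side of Embodiment 2.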
  • Speech/non-speech decision section 802 decides, from the separated speech signal obtained from speech/noise signal separator 801, whether the signal is in a speech segment or a non-speech segment (segment with only noise) and outputs the decision result to speech coder 803 and DTX controller & multiplexer 805. It is also possible to make this decision using the input signal before separation. Speech/non-speech decision section 802 can be of any type. This decision is generally made using instantaneous values or the amount of variation of a plurality of parameters such as power, spectrum and pitch period of the input signal.
  • speech coder 803 performs speech signal coding on the speech signal after the separation obtained from speech/noise signal separator 801 only in the speech segment and outputs the coded data to DTX controller & multiplexer 805.
  • This speech coder 803 is a coder for the speech segment and any coder can be used as far as the coder can perform efficient coding on speech sound.
  • Noise signal coder 804 performs noise signal coding on the separated noise signal obtained from speech/noise signal separator 801 over the entire segment and outputs information on the noise model that expresses the input noise signal and a flag indicating whether the noise model should be updated or not.
  • Noise signal coder 804 has the configuration shown in FIG.5, explained in Embodiment 1.
  • the speech/non-speech decision result flag input to noise signal coder 804 is designated as the noise model updating disable flag in noise signal coder 804 and the model is not updated.
  • DTX controller & multiplexer 805 controls information to be sent as transmission data and multiplexes transmission information using the outputs from speech/non-speech decision section 802, speech coder 803 and noise signal coder 804 and outputs as transmission data.
  • FIG.11 is a block diagram showing a configuration of the speech signal decoder according to Embodiment 2 of the present invention.
  • separator & DTX controller 901 receives transmission data, which is an input signal coded and sent on the coding side as reception data and separates the reception data into speech coded data or noise model parameter, speech/non-speech decision flag and noise model updating flag necessary for speech decoding and noise generation.
  • In the speech segment, speech decoder 902 performs speech decoding from the speech coded data and outputs the decoded speech to speech/noise signal adder 904.
  • noise signal generator 903 generates a noise signal from the noise model parameter and noise model updating flag and outputs the noise signal to speech/noise signal adder 904.
  • Speech/noise signal adder 904 adds up the output of speech decoder 902 and the output of noise signal generator 903 and outputs as an output signal.
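  • The difference between the two decoder output stages can be sketched side by side: output switch 404 of Embodiment 1 selects one signal per frame, whereas speech/noise signal adder 904 of Embodiment 2 sums the two. The function names are hypothetical, and the sketch assumes that the speech decoder contributes a silent (all-zero) frame outside speech segments.

```python
def output_switch(is_speech, decoded_speech, generated_noise):
    """Embodiment 1: select decoded speech or generated noise for the frame."""
    return decoded_speech if is_speech else generated_noise

def speech_noise_adder(decoded_speech, generated_noise):
    """Embodiment 2: add generated noise to the decoded (noise-free) speech."""
    return [s + n for s, n in zip(decoded_speech, generated_noise)]

speech = [0.5, -0.5, 0.25]
noise = [0.01, 0.02, -0.01]
switched = output_switch(True, speech, noise)
summed = speech_noise_adder(speech, noise)
```

  • With the adder, background noise continues underneath the speech, so the output does not alternate between noise-free speech and noise at segment boundaries.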
  • In ST301, input signals are input frame by frame. Then, in ST302, the input speech signal is separated into a speech signal and a background noise signal superimposed on the speech signal. Then, in ST303, a speech/non-speech decision is made on the input signal or on the separated speech signal obtained in ST302, and the decision result is output (ST304).
  • the speech coder performs speech coding processing on the speech signal after the separation obtained in ST302 and outputs the coded data. Then, on the noise signal after the separation obtained in ST302, the noise signal coder performs noise signal coding in ST306 and outputs information on the noise model that expresses the input noise signal and a flag as to whether the noise model should be updated or not.
  • model updating is not performed in noise signal coding processing in ST306. Then, in ST307, information to be sent as transmission data is controlled and transmission information is multiplexed using the output obtained as a result of the speech/non-speech decision, speech coding processing and noise signal coding processing and finally in ST308, this data is output as transmission data.
  • the speech signal coder of this embodiment can perform coding in a speech segment using the speech coder providing high-quality coding on the speech signal and perform coding on a noise signal using the noise signal coder of Embodiment 1 with high efficiency and little perceptual deterioration, and therefore can perform high-quality and high-efficiency coding even in a background noise environment. Furthermore, by providing a speech/noise signal separator, the speech signal coder of this embodiment can remove superimposed background noise signals from the speech signal input to the speech coder, providing higher-quality or higher efficiency coding in the speech segment.
  • FIG.13 is a block diagram showing a configuration of a speech coding section according to Embodiment 3 of the present invention.
  • the configuration on the decoding side of this embodiment is the same as the configuration of the speech signal decoder shown in FIG.6.
  • Input signal analyzer 1101 performs a signal analysis on an input signal input for every certain segment and calculates analysis parameters for the input signal.
  • Characteristic parameters to be extracted include parameters necessary to express statistical characteristic quantities on the input signal and parameters expressing speech characteristics.
  • The parameters necessary to express statistical characteristic quantities include short-time spectra obtained by FFT on a short-segment signal, input power, LPC spectrum parameter, etc.
  • the parameters expressing speech characteristics include LPC parameter, input power and pitch period information, etc.
  • Mode decision section 1104 decides, based on the analysis parameters obtained by input signal analyzer 1101 and using the speech characteristic patterns retained in speech model storage section 1102 and the noise model parameters retained in noise model storage section 1103, whether the input signal is in a speech segment or a non-speech segment (segment with only noise) and, in the case of a non-speech segment, whether the noise model is to be updated and the updating information sent.
  • Speech model storage section 1102 stores speech characteristic patterns created beforehand. The speech characteristic patterns include information such as the distribution of LPC parameters, input signal power and pitch period information in a speech (voiced) segment.
  • The noise model parameters refer to information on a noise model that can express statistical characteristic quantities of the input noise signal, for example information that expresses statistical characteristic quantities such as the average spectrum of short-time spectra and their variance using a statistical model such as an HMM.
  • Mode decision section 1104 decides whether the statistical analysis parameters for the current input signal obtained by input signal analyzer 1101 are appropriate as an output of the stored noise model, which expresses signals in the preceding noise segment (for example, in the case of an HMM, whether the output probability of an analysis parameter for the current input signal is equal to or greater than a specified value), and at the same time decides from the parameters expressing the speech characteristics of the input signal whether the signal is in a speech (voiced) segment or not.
  • When mode decision section 1104 decides that the signal is in the speech segment, speech coder 1105 performs speech coding on the input signal and outputs the coded data to DTX controller & multiplexer 1107.
  • When mode decision section 1104 decides that the signal is in the non-speech segment and that noise model updating information is to be sent, noise model updating section 1106 updates the noise model and outputs the information on the updated noise model to DTX controller & multiplexer 1107.
  • DTX controller & multiplexer 1107 controls information to be sent as transmission data and multiplexes transmission information using the outputs from the speech coder and noise model updating section 1106 and outputs as transmission data.
  • Following ST403, it is decided (ST404) whether the currently input statistical analysis parameter is appropriate as an output of the noise model retained in noise model storage section 1103 in FIG.13.
  • the process moves on to next ST405 and it is decided from the speech characteristic parameter obtained by analyzing the input signal whether the signal is in a speech (voiced) segment or not.
  • the speech coder performs speech coding processing and outputs the coded data.
  • the speech signal coder can make decisions using a variation in statistical characteristic quantities of an input signal and speech characteristic patterns. Therefore, this embodiment can make more precise mode decisions and suppress deterioration of quality due to decision errors.
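  • The mode decision of Embodiment 3 can be sketched as a three-way choice. The branch order below (noise model fit first, then the voiced check) is an assumption inferred from the surviving step descriptions, and the mode names and function name are hypothetical.

```python
def decide_mode(fits_noise_model, looks_voiced):
    """Return one of three hypothetical mode labels for the current frame."""
    if fits_noise_model:
        # The stored noise model still explains the input: nothing to send.
        return "noise_no_update"
    if looks_voiced:
        # The input departs from the noise model and matches speech patterns.
        return "speech"
    # The input departs from the noise model but is not speech: the noise changed.
    return "noise_update"

modes = [decide_mode(True, False), decide_mode(False, True), decide_mode(False, False)]
```

  • Using both tests lets a change in the background noise be told apart from the onset of speech, which a power-only decision cannot do reliably.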
  • the noise signal coder of the present invention adopts a configuration comprising an analyzer that performs a signal analysis on a noise signal contained in a speech signal, a storage device that stores information on a noise model expressing the noise signal, a detector that detects a variation of information on the stored noise model based on the result of a signal analysis of a current input noise signal and an updater that updates, when a change of the information on the noise model is detected, information on the noise model stored by the amount of the variation.
  • This configuration allows a noise signal to be modeled with a noise model capable of expressing with statistical characteristic quantities, and thereby can generate a decoded signal with little perceptual deterioration with respect to a background noise signal.
  • This modeling also eliminates the need for faithful coding for the input signal waveform, providing low bit rate, highly efficient coding by only transmitting a segment where a noise model parameter corresponding to the input signal changes.
  • the noise signal coder of the present invention in the above configuration adopts a configuration with the analyzer extracting statistical characteristic quantities on the noise signal and the storage device storing information capable of expressing the statistical characteristic quantities as information on the noise model.
  • This configuration provides appropriate modeling of a noise signal and low bit rate, highly efficient coding.
  • the speech signal coder of the present invention adopts a configuration comprising a speech/non-speech decision section that decides whether an input speech signal is in a speech segment or non-speech segment speech segment that includes only a noise signal, a speech coder that performs speech coding on the input speech signal when the decision result shows that the signal is in a speech segment, the noise signal coder that, performs noise signal coding on the input signal when the decision result shows that the signal is in a non-speech segment speech segment and a multiplexer that multiplexes the outputs from the speech/non-speech decision section, speech coder and noise signal coder.
  • the speech coder capable of performing high quality coding on the speech signal performs coding in a speech segment and the noise signal coder with high efficiency and little perceptual deterioration performs coding in a non-speech segment speech segment, thus providing high quality and highly efficient coding even in a background noise environment.
  • The speech signal coder of the present invention adopts a configuration comprising a speech/noise signal separator that separates an input speech signal into a speech signal and a background noise signal superimposed on this speech signal, a speech/non-speech decision section that decides, from the input speech signal or from the speech signal obtained from the speech/noise signal separator, whether the signal is in a speech segment or in a non-speech segment including only the noise signal, a speech coder that performs speech coding on the input speech signal when the decision result indicates a speech segment, the noise signal coder that performs coding on the background noise signal obtained from the speech/noise signal separator, and a multiplexer that multiplexes the outputs from the speech/non-speech decision section, speech coder and noise signal coder.
  • With this configuration, the speech coder, which is capable of high quality coding of the speech signal, performs coding in a speech segment, and the noise signal coder, which offers high efficiency and little perceptual deterioration, performs coding on a noise signal, thus providing high quality and highly efficient coding even in a background noise environment. Furthermore, provision of the speech/noise signal separator makes it possible to remove superimposed background noise from the speech signal input to the speech coder, providing high quality, highly efficient coding on the speech segment.
  • The speech signal coder of the present invention adopts a configuration comprising an analyzer that performs a signal analysis on an input speech signal, a speech model storage device that stores speech characteristic patterns necessary to decide whether the input speech signal is a voiced signal or not, a noise model storage device that stores information on a noise model expressing a noise signal included in the input speech signal, a mode decision section that decides, using the outputs of the analyzer, speech model storage device and noise model storage device, whether the input speech signal is in a speech segment or in a non-speech segment containing only a noise signal and, in the case of the non-speech segment, decides whether the noise model should be updated or not, a speech coder that performs speech coding on the input speech signal when the mode decision section decides the speech segment, a noise model updater that updates the noise model when the mode decision section decides the non-speech segment and decides that the noise model will be updated and a multiplexer that multiplexes the outputs from the speech coder and noise model update
  • Provision of the mode decision section makes it possible to make a decision using a variation of statistical characteristic quantities of the input signal and speech characteristic patterns.
  • This configuration provides more precise mode decisions and can suppress quality deterioration due to decision errors.
  • The noise signal generator of the present invention adopts a configuration comprising a noise model updater that updates a noise model as required according to noise model parameters coded from the input noise signal on the coding side and a noise model updating flag, a noise model storage device that stores information on the updated noise model using the output of the noise model updater, and a noise signal generator that generates a noise signal from the information on the noise model stored in the noise model storage device.
  • In the above configuration of the noise signal generator of the present invention, the noise model parameters input to the noise model updater and the information stored in the noise model storage device are information capable of expressing statistical characteristic quantities of the generated noise signal.
  • This configuration can generate a decoded signal with little perceptual deterioration with respect to a background noise signal.
  • The speech signal decoder of the present invention adopts a configuration comprising a separator that receives a signal including speech data coded on the coding side, a noise model parameter, a speech/non-speech decision flag and a noise model updating flag and separates the noise model parameter, speech/non-speech decision flag and noise model updating flag from the signal, a speech decoder that performs speech decoding on the speech data when the speech/non-speech decision flag indicates a speech segment, a noise signal generator that generates a noise signal from the noise model parameter and noise model updating flag when the speech/non-speech decision flag indicates a non-speech segment, and an output switch that switches between the decoded speech output from the speech decoder and the noise signal output from the noise signal generator according to the speech/non-speech decision flag and outputs the result as an output signal.
  • This configuration makes it possible to generate a decoded signal with little perceptual deterioration with respect to a background noise signal.
  • The speech signal decoder of the present invention adopts a configuration comprising a separator that receives a signal including speech data coded on the coding side, a noise model parameter, a speech/non-speech decision flag and a noise model updating flag and separates the noise model parameter, speech/non-speech decision flag and noise model updating flag from the signal, a speech decoder that performs speech decoding on the speech data when the speech/non-speech decision flag indicates a speech segment, the noise signal generator that generates a noise signal from the noise model parameter and noise model updating flag when the speech/non-speech decision flag indicates a non-speech segment, and a speech/noise signal adder that adds up the decoded speech output from the speech decoder and the noise signal output from the noise signal generator.
  • This configuration makes it possible to generate a decoded signal with little perceptual deterioration with respect to a background noise signal. Furthermore, after the coding side separates a speech signal and a noise signal superimposed thereon, coders suited to their respective signals perform coding and the decoding side adds up the signals to generate a decoded signal, thus providing coding of a speech signal in a speech segment with higher quality.
  • The speech signal coding method of the present invention comprises a speech/non-speech deciding step of deciding whether an input speech signal is in a speech segment or in a non-speech segment that includes only a noise signal, a speech coding step of coding the input speech signal when the decision result shows that the signal is in a speech segment, a noise signal coding step of performing noise signal coding on the input signal when the decision result shows that the signal is in a non-speech segment, and a multiplexing step of multiplexing the outputs from the speech/non-speech deciding step, speech coding step and noise signal coding step, and the noise signal coding step comprises an analyzing step of performing a signal analysis on a noise signal contained in a speech signal, a storing step of storing information on a noise model expressing the noise signal, a detecting step of detecting a variation of information on the stored noise model based on the result of a signal analysis of a current input noise signal and an updating step of updating information on the
  • With this method, the speech coding section, which is capable of high quality coding of the speech signal, performs coding in a speech segment, and the noise signal coder of the first embodiment, which offers high efficiency and little perceptual deterioration, performs coding in a non-speech segment, thus providing high quality, highly efficient coding even in a background noise environment.
  • The speech signal coding method of the present invention comprises a speech/noise signal separating step of separating an input speech signal into a speech signal and a background noise signal superimposed on this speech signal, a speech/non-speech deciding step of deciding, from the input speech signal or from the speech signal obtained in the speech/noise signal separating step, whether the signal is in a speech segment or in a non-speech segment that includes only the noise signal, a speech coding step of performing speech coding on the input speech signal when the decision result indicates a speech segment, a noise signal coding step of performing noise signal coding on the input signal when the decision result indicates a non-speech segment and performing coding on the background noise signal obtained from the speech/noise signal separating step and a multiplexing step of multiplexing the outputs from the speech/non-speech deciding step, speech coding step and noise signal coding step, and the noise signal coding step comprises an analyzing step of performing a signal analysis on a noise signal contained in a speech signal, a
  • With this method, the speech coding section, which is capable of high quality coding of the speech signal, performs coding in a speech segment, and the noise signal coder of the first embodiment, which offers high efficiency and little perceptual deterioration, performs coding in a non-speech segment, thus providing high quality and highly efficient coding even in a background noise environment. Furthermore, provision of the speech/noise signal separating section makes it possible to remove superimposed background noise from the speech signal input to the speech coding section, providing high quality, highly efficient coding on the speech segment.
  • The speech signal coding method of the present invention comprises an analyzing step of performing a signal analysis on an input speech signal, a speech model storing step of storing speech characteristic patterns necessary to decide whether the input speech signal is a voiced signal or not, a noise model storing step of storing information on a noise model expressing a noise signal included in the input speech signal, a mode deciding step of deciding, using the outputs of the analyzing step, speech model storing step and noise model storing step, whether the input speech signal is in a speech segment or in a non-speech segment containing only a noise signal and, when the decision result indicates the non-speech segment, deciding whether the noise model should be updated or not, a speech coding step of performing speech coding on the input speech signal when the mode deciding step decides the speech segment, a noise model updating step of updating the noise model when the mode deciding step decides the non-speech segment and decides that the noise model will be updated, and a multiplexing step of multiplexing the outputs from the speech
  • Provision of the mode decision section allows decisions to be made using a variation of statistical characteristic quantities and speech characteristic patterns of the input signal.
  • This method provides more precise mode decisions and suppresses quality deterioration due to decision errors.
  • The recording medium of the present invention is a machine-readable medium that records a program to execute the steps of analyzing statistical characteristic quantities of an input noise signal, storing information on a noise model expressing the statistical characteristic quantities of the input noise signal, detecting a variation of the noise model expressing the input noise signal, and updating the noise model and outputting information on the updated noise model as required.
  • The noise signal coder of the present invention can generate a decoded signal with little perceptual deterioration with respect to a background noise signal by modeling a noise signal with a noise model capable of expressing the noise signal with statistical characteristic quantities.
  • The noise signal coder of the present invention also eliminates the need for faithful coding of the input signal waveform, and thus provides low bit rate, highly efficient coding by transmitting only in segments where a noise model parameter for the input signal changes.
  • The speech signal coder of the present invention provides high-quality, highly efficient coding even in a background noise environment by performing coding in a speech segment through a speech coder capable of coding a speech signal with high quality and performing coding in a non-speech segment through the noise signal coder with high efficiency and little perceptual deterioration.
  • The present invention is applicable to a base station apparatus and communication terminal apparatus in a digital radio communication system.
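
The coder and decoder configurations summarized above can be illustrated with a minimal Python sketch. It is not the implementation of the present invention and rests on several assumptions that do not come from the text: the noise model is reduced to a single statistical quantity (frame RMS), the speech/non-speech decision is a plain RMS threshold, the "speech coder" is a pass-through placeholder, and the names Packet, Encoder, Decoder, SPEECH_RMS_THRESHOLD and UPDATE_RATIO are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional

SPEECH_RMS_THRESHOLD = 0.1  # assumed speech/non-speech threshold (hypothetical)
UPDATE_RATIO = 1.5          # assumed: update the model when RMS changes by this factor


def rms(frame: List[float]) -> float:
    """Root-mean-square level of one frame (the only statistic modeled here)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


@dataclass
class Packet:
    """Multiplexed output for one frame."""
    is_speech: bool                           # speech/non-speech decision flag
    update: bool = False                      # noise model updating flag
    speech_data: Optional[List[float]] = None
    noise_rms: Optional[float] = None         # noise model parameter


class Encoder:
    def __init__(self) -> None:
        self.model_rms: Optional[float] = None  # stored noise model

    def encode(self, frame: List[float]) -> Packet:
        level = rms(frame)
        if level >= SPEECH_RMS_THRESHOLD:
            # Placeholder speech coder: passes the samples through unchanged.
            return Packet(is_speech=True, speech_data=list(frame))
        # Non-speech segment: detect a variation of the stored noise model.
        changed = (
            self.model_rms is None
            or level > self.model_rms * UPDATE_RATIO
            or level < self.model_rms / UPDATE_RATIO
        )
        if changed:
            self.model_rms = level
            return Packet(is_speech=False, update=True, noise_rms=level)
        # Model unchanged: only the flags are sent, no parameter.
        return Packet(is_speech=False)


class Decoder:
    def __init__(self, seed: int = 0) -> None:
        self.model_rms = 0.0
        self.rng = random.Random(seed)

    def decode(self, pkt: Packet, frame_len: int) -> List[float]:
        if pkt.is_speech:
            return list(pkt.speech_data)
        if pkt.update:
            self.model_rms = pkt.noise_rms  # noise model updater
        # Noise signal generator: Gaussian noise at the stored level.
        return [self.rng.gauss(0.0, self.model_rms) for _ in range(frame_len)]


if __name__ == "__main__":
    enc, dec = Encoder(), Decoder()
    frames = [[0.01, -0.01] * 80, [0.01, -0.01] * 80, [0.5, -0.5] * 80]
    for f in frames:
        pkt = enc.encode(f)
        out = dec.decode(pkt, len(f))
        print(pkt.is_speech, pkt.update, len(out))
```

In this sketch a noise model parameter is sent only for non-speech frames whose level differs from the stored model, which corresponds to the idea of transmitting only in segments where the noise model parameter changes. A practical coder would model spectral and temporal statistics rather than a single level.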

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
EP00935511A 1999-06-15 2000-06-01 Geräuschsignalkodierer und stimmensignalkodierer Withdrawn EP1120775A4 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP16854599A JP4464484B2 (ja) 1999-06-15 1999-06-15 雑音信号符号化装置および音声信号符号化装置
JP16854599 1999-06-15
PCT/JP2000/003526 WO2000077774A1 (fr) 1999-06-15 2000-06-01 Codeur de signaux de bruit et codeur de signaux vocaux

Publications (2)

Publication Number Publication Date
EP1120775A1 EP1120775A1 (de) 2001-08-01
EP1120775A4 EP1120775A4 (de) 2001-09-26

Family

ID=15870014

Family Applications (1)

Application Number Title Priority Date Filing Date
EP00935511A Withdrawn EP1120775A4 (de) 1999-06-15 2000-06-01 Geräuschsignalkodierer und stimmensignalkodierer

Country Status (5)

Country Link
EP (1) EP1120775A4 (de)
JP (1) JP4464484B2 (de)
CN (1) CN1313983A (de)
AU (1) AU5103700A (de)
WO (1) WO2000077774A1 (de)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4769121B2 (ja) * 2006-05-15 2011-09-07 日本電信電話株式会社 サーバ・クライアント型音声認識方法、装置およびサーバ・クライアント型音声認識プログラム、記録媒体
CN101546557B (zh) * 2008-03-28 2011-03-23 展讯通信(上海)有限公司 用于音频内容识别的分类器参数更新方法
CN104469250B (zh) * 2013-09-23 2019-07-26 联想(北京)有限公司 一种信息处理方法及电子设备
EP3010017A1 (de) * 2014-10-14 2016-04-20 Thomson Licensing Verfahren und Vorrichtung zur Trennung von Sprachdaten von Hintergrunddaten in der Audiokommunikation
CN106971741B (zh) * 2016-01-14 2020-12-01 芋头科技(杭州)有限公司 实时将语音进行分离的语音降噪的方法及系统
RU2743732C2 (ru) * 2016-05-30 2021-02-25 Сони Корпорейшн Способ и устройство для обработки видео- и аудиосигналов и программа

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1039898A (ja) * 1996-07-22 1998-02-13 Nec Corp 音声信号伝送方法及び音声符号復号化システム
EP0843301A2 (de) * 1996-11-15 1998-05-20 Nokia Mobile Phones Ltd. Verfahren zur Erzeugung von Hintergrundrauschen während einer diskontinuierlichen Übertragung
US5809460A (en) * 1993-11-05 1998-09-15 Nec Corporation Speech decoder having an interpolation circuit for updating background noise
WO2000075919A1 (en) * 1999-06-07 2000-12-14 Ericsson, Inc. Methods and apparatus for generating comfort noise using parametric noise model statistics

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2692104B2 (ja) * 1988-02-12 1997-12-17 株式会社日立製作所 音声多重化システム
JP3173639B2 (ja) * 1995-05-26 2001-06-04 株式会社エヌ・ティ・ティ・ドコモ 背景雑音更新システムおよび方法
JP2806308B2 (ja) * 1995-06-30 1998-09-30 日本電気株式会社 音声復号化装置
JP3575967B2 (ja) * 1996-12-02 2004-10-13 沖電気工業株式会社 音声通信システムおよび音声通信方法
JP3119204B2 (ja) * 1997-06-27 2000-12-18 日本電気株式会社 音声符号化装置
JP2000122698A (ja) * 1998-10-19 2000-04-28 Mitsubishi Electric Corp 音声符号化装置

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PATENT ABSTRACTS OF JAPAN vol. 1998, no. 06, 30 April 1998 (1998-04-30) & JP 10 039898 A (NEC CORP), 13 February 1998 (1998-02-13) & US 5 953 698 A (NEC CORPORATION, TOKYO, JAPAN) 14 September 1999 (1999-09-14) *
See also references of WO0077774A1 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9118805B2 (en) 2007-06-27 2015-08-25 Nec Corporation Multi-point connection device, signal analysis and device, method, and program
EP2164238A4 (de) * 2007-06-27 2010-11-03 Nec Corp Mehrpunktverbindungsvorrichtung sowie signalanalysevorrichtung, -verfahren und -programm
EP2164238A1 (de) * 2007-06-27 2010-03-17 NEC Corporation Mehrpunktverbindungsvorrichtung sowie signalanalysevorrichtung, -verfahren und -programm
WO2010070187A1 (en) * 2008-12-19 2010-06-24 Nokia Corporation An apparatus, a method and a computer program for coding
US9153236B2 (en) 2011-02-14 2015-10-06 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio codec using noise synthesis during inactive phases
US9047859B2 (en) 2011-02-14 2015-06-02 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for encoding and decoding an audio signal using an aligned look-ahead portion
US9037457B2 (en) 2011-02-14 2015-05-19 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio codec supporting time-domain and frequency-domain coding modes
US9384739B2 (en) 2011-02-14 2016-07-05 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for error concealment in low-delay unified speech and audio coding
US9536530B2 (en) 2011-02-14 2017-01-03 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Information signal representation using lapped transform
US9583110B2 (en) 2011-02-14 2017-02-28 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for processing a decoded audio signal in a spectral domain
US9595263B2 (en) 2011-02-14 2017-03-14 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Encoding and decoding of pulse positions of tracks of an audio signal
US9595262B2 (en) 2011-02-14 2017-03-14 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Linear prediction based coding scheme using spectral domain noise shaping
US9620129B2 (en) 2011-02-14 2017-04-11 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for coding a portion of an audio signal using a transient detection and a quality result

Also Published As

Publication number Publication date
JP2000357000A (ja) 2000-12-26
EP1120775A4 (de) 2001-09-26
CN1313983A (zh) 2001-09-19
WO2000077774A1 (fr) 2000-12-21
AU5103700A (en) 2001-01-02
JP4464484B2 (ja) 2010-05-19

Similar Documents

Publication Publication Date Title
EP1120775A1 (de) Geräuschsignalkodierer und stimmensignalkodierer
EP0770987B1 (de) Verfahren und Vorrichtung zur Wiedergabe von Sprachsignalen, zur Dekodierung, zur Sprachsynthese und tragbares Funkendgerät
CN101715549B (zh) 嵌入在音频信号中的隐藏数据的恢复
JP4731775B2 (ja) スーパーフレーム構造のlpcハーモニックボコーダ
JP2964344B2 (ja) 符号化/復号化装置
JP4861271B2 (ja) 位相スペクトル情報をサブサンプリングする方法および装置
JP4132154B2 (ja) 音声合成方法及び装置、並びに帯域幅拡張方法及び装置
JP2006099124A (ja) デジタル無線チャネル上の自動音声/話者認識
KR100603167B1 (ko) 시간 동기식 파형 보간법을 이용한 피치 프로토타입파형으로부터의 음성 합성
JP4445328B2 (ja) 音声・楽音復号化装置および音声・楽音復号化方法
KR20000077057A (ko) 음성합성장치 및 방법, 전화장치 및 프로그램 제공매체
EP1041541B1 (de) Celp sprachkodierer
JP2007279754A (ja) 音声符号化装置
EP1355297A1 (de) Datenverarbeitungsgerät
WO2002021091A1 (fr) Analyseur de signal de bruit, synthetiseur de signal de bruit, procede d'analyse de signal de bruit et procede de synthese de signal de bruit
WO2001065542A1 (fr) Dispositif de codage/decodage de la voix et procede associe
US5893060A (en) Method and device for eradicating instability due to periodic signals in analysis-by-synthesis speech codecs
CA2424558C (en) Pitch cycle search range setting apparatus and pitch cycle search apparatus
JP2002169595A (ja) 固定音源符号帳及び音声符号化/復号化装置
JP4230550B2 (ja) 音声符号化方法及び装置、並びに音声復号化方法及び装置
EP0987680A1 (de) Audiosignalverarbeitung
JP3896654B2 (ja) 音声信号区間検出方法及び装置
EP1164577A2 (de) Verfahren und Einrichtung zur Wiedergabe von Sprachsignalen

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20010309

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

A4 Supplementary search report drawn up and despatched

Effective date: 20010816

AK Designated contracting states

Kind code of ref document: A4

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

RIC1 Information provided on ipc code assigned before grant

Free format text: 7G 10L 19/04 A, 7H 03M 7/30 B, 7G 10L 19/00 B, 7G 10L 19/14 B, 7G 10L 101:00 Z

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20030128