EP1120775A1 - Noise signal encoder and voice signal encoder

Info

Publication number: EP1120775A1
Application number: EP00935511A
Authority: EP
Inventors: Koji Yoshida
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1999-06-15
Filing date: 2000-06-01
Publication date: 2001-08-01
Also published as: JP2000357000A; EP1120775A4; CN1313983A; WO2000077774A1; AU5103700A; JP4464484B2

Abstract

Noise signal analysis section 301 calculates statistical characteristic quantities for an input noise signal. Using noise model storage section 302, which stores information on a noise model capable of expressing those statistical characteristic quantities, noise model variation detection section 303 detects whether a noise model parameter expressing the input noise signal has changed or not, and noise model updating section 304 updates the noise model and outputs the updated model information. A non-speech segment (segment with only noise) of an input signal, or a noise signal separated from a speech signal, is coded using the noise signal coder in the above configuration, while a speech segment is coded using a speech coder.

Description

Technical Field

The present invention relates to a low bit rate speech signal coder used in applications such as mobile communication systems that transmit coded speech signals, and speech recorders.

Background Art

In the fields of digital mobile communications and speech storage, speech coders are used which perform coding on speech information at a low bit rate for effective utilization of radio frequency and recording media. Such conventional technologies include the CS-ACELP coding system of the ITU-T Recommendation G.729 ("Coding of speech at 8kbit/s using conjugate-structure algebraic-code-excited linear-prediction(CS-ACELP)") and the CS-ACELP coding system with DTX (Discontinuous Transmission) control of the same ITU-T Recommendation G.729 Annex B ("A silence compression scheme for G.729 optimized for terminals conforming to Recommendation V.70").
FIG.1 is a block diagram showing a configuration of a coder based on the conventional CS-ACELP coding system. In FIG.1, LPC analyzer/quantizer 1 performs an LPC (linear prediction) analysis and quantization on an input speech signal and outputs LPC coefficients and LPC quantization codes.
Then, an adaptive excitation signal and fixed excitation signal extracted from adaptive excitation codebook 2 and fixed excitation codebook 3 are multiplied by a gain extracted from gain codebook 4 and added up, subjected to a speech synthesis by LPC synthesis filter 7, and an error signal between the synthesized signal and the input signal is weighted by perceptual weighting filter 9, and an adaptive excitation code, fixed excitation code and gain code which minimize the weighted error signal are output together with the LPC quantization code as coded data. In FIG.1, reference numeral 5 denotes a multiplier, reference numeral 6 denotes an adder and reference numeral 8 denotes a subtractor.
FIG.2 is a block diagram showing a configuration of a coder based on the conventional CS-ACELP coding system with DTX control. First, speech/non-speech decision section 11 decides whether the input signal is in a speech segment or a non-speech segment (segment with only background noise). In the case where speech/non-speech decision section 11 decides that the input signal is in a speech segment, CS-ACELP speech coder 12 performs speech coding on the speech segment. CS-ACELP speech coder 12 has the configuration shown in FIG.1.
On the other hand, in the case where speech/non-speech decision section 11 decides that the input signal is in a non-speech segment, non-speech segment coder 13 performs coding. This non-speech segment coder 13 calculates, from the input signal, LPC coefficients and the LPC prediction residual energy of the input signal in the same way as in coding of the speech segment, and outputs them as coded data for the non-speech segment.
DTX controller & multiplexer 14 controls and multiplexes data to be sent as transmission data from the outputs of speech/non-speech decision section 11, CS-ACELP speech coder 12 and non-speech segment speech segment coder 13 and outputs this data as transmission data.
However, the conventional CS-ACELP coder above performs coding at a bit rate as low as 8 kbps by exploiting speech-specific redundancy. High-quality coding is therefore possible when a clean speech signal without superimposed background noise is input, but when a speech signal with surrounding background noise superimposed is input, the quality of the decoded signal deteriorates.
Furthermore, the conventional CS-ACELP coder with DTX control above performs coding using the CS-ACELP coder only on the speech segment and codes the non-speech segment (segment with only noise) using a dedicated non-speech segment coder at a lower bit rate than that of the speech coder, thereby reducing the average bit rate for transmission. However, the non-speech segment coder uses a signal model similar to that of the speech coder, which generates a decoded signal by driving an AR type synthesis filter (LPC synthesis filter) with a random signal at short intervals (approximately 10 to 50 ms). The conventional CS-ACELP coder with DTX control therefore has the same problem as the conventional CS-ACELP coder above: the quality of the decoded signal deteriorates for a speech signal with background noise superimposed thereon.
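The conventional non-speech signal model described above, an AR type synthesis filter driven by a random signal, can be illustrated with the following minimal sketch. It is not taken from G.729 Annex B. The function name, the sign convention of the LPC coefficients and the use of white Gaussian excitation are assumptions made only for illustration.

```python
import random

def synthesize_noise(lpc, residual_energy, n_samples, seed=0):
    """Drive an all-pole (AR) synthesis filter 1/A(z) with white Gaussian noise.

    lpc: coefficients a[1..p] of A(z) = 1 + sum_k a[k] z^-k (assumed convention)
    residual_energy: per-sample variance of the random excitation
    """
    rng = random.Random(seed)
    sigma = residual_energy ** 0.5
    history = [0.0] * len(lpc)          # y[n-1], y[n-2], ..., y[n-p]
    out = []
    for _ in range(n_samples):
        e = rng.gauss(0.0, sigma)       # random excitation sample
        y = e - sum(a * h for a, h in zip(lpc, history))
        out.append(y)
        history = [y] + history[:-1]    # shift the filter memory
    return out
```

Because only the LPC coefficients and the residual energy of one short frame shape the output, such a model captures the short-time spectral envelope but not how the noise statistics evolve over longer periods.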

Disclosure of Invention

It is an object of the present invention to provide a speech signal coder and decoder with little deterioration of the quality of a decoded signal also for a speech signal with background noise superimposed thereon, capable of reducing an average bit rate necessary for transmission.
A theme of the present invention is to provide a speech signal coder and decoder with little deterioration of the quality of the decoded signal also for a speech signal with background noise superimposed thereon, and also capable of reducing an average bit rate necessary for transmission. This is achieved by calculating statistical characteristic quantities on an input signal in a non-speech segment (segment with only noise), storing information on a noise model that can express statistical characteristic quantities on an input noise signal, detecting whether a noise model parameter expressing the input noise signal has changed or not, and updating the noise model.

Brief Description of Drawings

FIG.1 is a block diagram showing a configuration of a conventional speech signal coder;
FIG.2 is a block diagram showing a configuration of a conventional speech signal coder with DTX control;
FIG.3 is a block diagram showing a configuration of a radio communication system equipped with a speech signal coder and speech signal decoder according to an embodiment of the present invention;
FIG.4 is a block diagram showing a configuration of a speech signal coder according to Embodiment 1 of the present invention;
FIG.5 is a block diagram showing a configuration of a noise signal coder according to Embodiment 1 of the present invention;
FIG.6 is a block diagram showing a configuration of a speech signal decoder according to Embodiment 1 of the present invention;
FIG.7 is a block diagram showing a configuration of a noise signal generator of the speech signal decoder according to Embodiment 1 of the present invention;
FIG.8 is a flow chart showing a processing flow of a speech signal coding method according to Embodiment 1 of the present invention;
FIG.9 is a flow chart showing a processing flow of a noise signal coding method according to Embodiment 1 of the present invention;
FIG.10 is a block diagram showing a configuration of a speech signal coder according to Embodiment 2 of the present invention;
FIG.11 is a block diagram showing a configuration of a speech signal decoder according to Embodiment 2 of the present invention;
FIG.12 is a flow chart showing a processing flow of a speech signal coding method according to Embodiment 2 of the present invention;
FIG.13 is a block diagram showing a configuration of a speech signal coder according to Embodiment 3 of the present invention; and
FIG.14 is a flow chart showing a processing flow of a speech signal coding method according to Embodiment 3 of the present invention.

Best Mode for Carrying out the Invention

With reference now to the attached drawings, embodiments of the present invention will be explained in detail below.

(Embodiment 1)

FIG.3 is a block diagram showing a configuration of a radio communication apparatus equipped with a speech signal coder according to Embodiment 1 of the present invention.
In this radio communication apparatus, a speech signal is converted to an electric analog signal by speech input apparatus 101 such as a microphone on the transmitting side and output to A/D converter 102. The analog speech signal is converted to a digital speech signal by A/D converter 102 and output to speech coder 103. Speech coder 103 performs speech coding processing on the digital speech signal and outputs the coded information to modulation/demodulation section 104. Modulation/demodulation section 104 digital-modulates the coded speech signal and sends to radio transmission section 105. Radio transmission section 105 performs predetermined radio transmission processing on the modulated signal. This signal is sent out via antenna 106.
On the other hand, on the receiving side of the radio communication apparatus, a reception signal received from antenna 107 is subjected to predetermined radio reception processing by radio reception section 108 and sent to modulation/demodulation section 104. Modulation/demodulation section 104 performs demodulation processing on the reception signal and outputs the demodulated signal to speech decoding section 109. Speech decoding section 109 performs decoding processing on the demodulated signal, obtains a digital decoded speech signal and outputs the digital decoded speech signal to D/A converter 110. D/A converter 110 converts the digital decoded speech signal output from speech decoding section 109 to an analog speech signal and outputs to speech output apparatus 111 such as a speaker. Finally, speech output apparatus 111 converts the electrical analog speech signal to speech sound and outputs.
Speech coding section 103 shown in FIG.3 has a configuration shown in FIG.4. FIG.4 is a block diagram showing a configuration of the speech coding section according to Embodiment 1 of the present invention.
Speech/non-speech decision section 201 decides whether an input signal is in a speech segment or a non-speech segment (segment with only noise) and outputs the decision result to DTX controller & multiplexer 204. Speech/non-speech decision section 201 can be of any type, and a decision is generally made using an instantaneous value or amount of change of a plurality of parameters such as power, spectrum and pitch period of the input signal.
In the case where the result of decision by speech/non-speech decision section 201 shows that the input signal is in a speech segment, speech coder 202 performs speech coding on the input signal in the speech segment, which includes the speech signal and noise signal, and outputs the coded data to DTX controller & multiplexer 204. This speech coder 202 is a coder for the speech segment, and any coder can be used as long as it can perform efficient coding on speech sound.
On the other hand, in the case where the result of decision by speech/non-speech decision section 201 shows that the input signal is in a non-speech segment, noise signal coder 203 performs noise signal coding on the input signal in the non-speech segment, which includes only a noise signal, and outputs information on a noise model that expresses the input noise signal and a flag indicating whether the noise model should be updated or not to DTX controller & multiplexer 204. Finally, DTX controller & multiplexer 204 controls information to be sent as transmission data using the outputs from speech/non-speech decision section 201, speech coder 202 and noise signal coder 203, multiplexes the transmission information and outputs it as transmission data.
Noise signal coder 203 in FIG.4 has a configuration shown in FIG.5. FIG.5 is a block diagram showing a configuration of the noise signal coder according to Embodiment 1 of the present invention.
Noise signal analysis section 301 performs a signal analysis on a noise signal input at certain intervals and calculates analysis parameters regarding the noise signal. The analysis parameters extracted are parameters necessary to express statistical characteristic quantities regarding the input signal, such as a short-time spectrum calculated through FFT (Fast Fourier Transform) on a short-segment signal, input power, LPC spectrum parameters, etc.
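As a minimal sketch of two of the analysis parameters named above, the following computes the input power and a short-time magnitude spectrum for one frame. The function name and the Hann window are assumptions for illustration, and a direct DFT is used only to keep the example self-contained; a practical implementation would use an FFT.

```python
import cmath
import math

def analyze_frame(frame):
    """Return (power, magnitude_spectrum) for one frame of samples."""
    n = len(frame)
    power = sum(x * x for x in frame) / n
    # Hann window to reduce spectral leakage (a common choice, not from the source).
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):         # bins 0 .. n/2 of a real-valued signal
        acc = sum(w * cmath.exp(-2j * math.pi * k * i / n)
                  for i, w in enumerate(windowed))
        spectrum.append(abs(acc))
    return power, spectrum
```

The returned tuple plays the role of the analysis parameters passed to noise model variation detection section 303.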
Then, noise model variation detection section 303 detects whether a noise model parameter that should express the currently input noise signal has changed from the noise model parameter retained in noise model storage section 302 or not.
Here, the noise model parameter refers to information on a noise model that can express statistical characteristic quantities regarding an input noise signal, for example, information that expresses statistical characteristic quantities such as the average spectrum of short-time spectra, variance, etc. using a statistical model such as an HMM (hidden Markov model).
Specifically, noise model variation detection section 303 decides whether the analysis parameters for the current input signal obtained from noise signal analysis section 301 are appropriate as an output of the stored noise model, which expresses the preceding input signals (for example, in the case of an HMM, whether the output probability of the analysis parameters for the current input signal is equal to or greater than a specified value or not). In the case where it is decided that the noise model parameter that should express the currently input noise signal has changed from the stored noise model, noise model variation detection section 303 outputs a flag as to whether the noise model should be updated or not (noise model updating flag) and the information to be updated (update information) to noise model updating section 304.
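The output probability test described above can be sketched as follows. This is a deliberate simplification: a single Gaussian with a diagonal variance stands in for one state of an HMM, and the function names and the threshold value are assumptions, not values from the specification.

```python
import math

def log_likelihood(params, mean, var):
    """Average per-dimension Gaussian log-likelihood of params under (mean, var)."""
    total = 0.0
    for x, m, v in zip(params, mean, var):
        total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(params)

def model_has_changed(params, mean, var, threshold=-3.0):
    """True when the current frame is an unlikely output of the stored model."""
    return log_likelihood(params, mean, var) < threshold
```

With a stored model of mean [1.0, 2.0] and unit variance, a frame equal to the mean is accepted as an output of the model, while a frame of [6.0, 2.0] falls below the threshold and would raise the noise model updating flag.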
An external updating enable flag instructs from the outside whether the updating of the noise model should be enabled or not. The updating of the noise model is disabled in the case where the speech coder of the present invention, which will be described later, is prevented from sending noise model parameters, for example, for a period during which coded data in the speech segment is sent.
Then, when the noise model updating flag indicates updating, noise model updating section 304 outputs either the updated noise model parameters or only the parts that have changed from the noise model parameters previously stored in noise model storage section 302, and at the same time updates noise model storage section 302 using the output information. On the other hand, when the noise model updating flag indicates no updating, noise model updating section 304 neither updates the noise model nor outputs updating information.
Next, speech decoding section 109 shown in FIG.3 has a configuration shown in FIG.6. FIG.6 is a block diagram showing a configuration of the speech decoder according to Embodiment 1 of the present invention.
Separator & DTX controller 401 receives transmission data which is an input signal coded and sent on the coding side as reception data and separates this reception data into speech coded data or noise model parameter, speech/non-speech decision flag and noise model updating flag necessary for speech decoding and noise signal generation.
Then, in the case where the speech/non-speech decision flag indicates the speech segment, speech decoder 402 performs speech decoding from the speech coded data and outputs the decoded speech to output switch 404.
On the other hand, in the case where the speech/non-speech decision flag indicates the non-speech segment, noise signal generator 403 generates a noise signal from the noise model parameter and noise model updating flag and outputs the noise signal to output switch 404. Output switch 404 switches between the output of speech decoder 402 and the output of noise signal generator 403 according to the speech/non-speech decision flag and outputs the selected signal as an output signal.
Noise signal generator 403 in FIG.6 has a configuration shown in FIG.7. FIG.7 is a block diagram showing a configuration of the noise signal generator of the speech decoder according to Embodiment 1 of the present invention.
The noise model updating flag and noise model parameter (in the case of model updating) output from noise signal coder 203 shown in FIG.5 are input to noise model updating section 501. In the case where the noise model updating flag indicates updating, noise model updating section 501 updates the noise model using the input noise model parameter and the previous noise model parameter retained in noise model storage section 502 and newly stores the updated noise model parameter in noise model storage section 502.
Noise signal generator 503 generates and outputs a noise signal based on the information of noise model storage section 502. Noise signals are generated based on the model information which express statistical characteristic quantities so that the noise signal generated becomes an appropriate signal as the output from the model. For example, in the case where HMM is used as a statistical model, noise signal generator 503 stochastically outputs signal parameters (for example, short-time spectra) necessary for generation according to state transition probability and parameter output probability, etc. and generates/outputs a noise signal based thereupon.
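A minimal sketch of such stochastic generation is given below. It assumes a model that stores a per-bin mean and variance of the short-time magnitude spectrum, draws one spectrum from it, assigns random phases and converts the result to a real-valued frame by an inverse DFT. The function name, the Gaussian draw, the random phases and the requirement that frame_len be even are all assumptions. HMM state transitions are omitted.

```python
import cmath
import math
import random

def generate_noise_frame(mean_mag, var_mag, frame_len, rng):
    """Draw one magnitude spectrum from the model and synthesize a real frame.

    mean_mag, var_mag: per-bin statistics for bins 0 .. frame_len // 2
    frame_len: even number of output samples
    """
    half = frame_len // 2
    bins = []
    for k in range(half + 1):
        mag = max(0.0, rng.gauss(mean_mag[k], math.sqrt(var_mag[k])))
        # DC and Nyquist bins must be real for a real-valued output.
        phase = 0.0 if k in (0, half) else rng.uniform(0.0, 2 * math.pi)
        bins.append(cmath.rect(mag, phase))
    frame = []
    for n in range(frame_len):
        # Inverse DFT of a conjugate-symmetric spectrum.
        acc = bins[0].real + bins[half].real * math.cos(math.pi * n)
        for k in range(1, half):
            acc += 2 * (bins[k] * cmath.exp(2j * math.pi * k * n / frame_len)).real
        frame.append(acc / frame_len)
    return frame
```

The decoder needs only the stored statistics and a random number generator, so no waveform information has to be transmitted while the model remains valid.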
Then, operations of the speech coder and speech decoder in the configurations above will be explained. FIG.8 is a flow chart showing a processing flow of the speech signal coding method according to Embodiment 1. In this method, suppose the processing shown in FIG.8 is repeated for every frame of a certain short segment (for example, approximately 10 to 50 ms).
First, in step (hereinafter referred to as "ST") 101, speech signals are input frame by frame. Then, in ST102, a speech/non-speech decision is made on the input signal and the decision result is output. In the case where the decision result is "speech", in ST104, speech coding processing is performed on the input speech signal and the coded data is output.
On the other hand, in the case where the decision result in ST103 is "non speech", in ST105, the noise signal coder performs noise signal coding processing on the input signal and outputs information on a noise model that expresses the input noise signal and a flag as to whether the noise model should be updated or not. The coding processing on a noise signal will be described later.
Then, in ST106, information to be sent as transmission data is controlled and transmission information is multiplexed using the output obtained as a result of speech/non-speech decision, speech coding processing and noise signal coding processing and finally in ST107, this data is output as transmission data.
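The per-frame flow of FIG.8 can be sketched as follows. The three callables are hypothetical stand-ins for speech/non-speech decision section 201, speech coder 202 and noise signal coder 203, and the dictionary layout of the multiplexed record is an assumption made only for illustration.

```python
def encode_frame(frame, is_speech, speech_coder, noise_coder):
    """One pass of the FIG.8 loop; returns the multiplexed record for the frame."""
    if is_speech(frame):                              # speech/non-speech decision
        return {"speech": True, "payload": speech_coder(frame)}   # ST104
    update_flag, model_info = noise_coder(frame)      # ST105
    record = {"speech": False, "update": update_flag}
    if update_flag:
        record["payload"] = model_info                # sent only when the model changed
    return record                                     # multiplexed and output (ST106, ST107)
```

In this sketch a non-speech frame whose noise model is unchanged produces a record containing only the two flags, which is where the reduction of the average bit rate comes from.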
FIG.9 is a flow chart showing a processing flow of the noise signal coding method of the speech signal coding method according to this embodiment. According to this method, the processing shown in FIG.9 is repeated for every frame of a fixed short segment (for example, approximately 10 to 50 ms).
In ST201, a noise signal is input frame by frame. Then, in ST202, a signal analysis is made on the noise signal frame by frame and analysis parameters for the noise signal are calculated. Then, in ST203, it is detected from the analysis parameters whether the noise model has changed or not and in the case where it is decided that the noise model has changed, in ST205, a flag (updating) as to whether the noise model should be updated or not and information to be updated (updating information) are output and in ST206, noise model storage section 302 is updated using the output information.
On the other hand, in the case where it is decided in ST204 that the noise model has not changed, in ST207, only a flag (no updating) as to whether the noise model should be updated or not is output. In the case where the external updating enable flag, which is separately input from the outside, is "disabled" in ST203, it is decided that the model has not changed and no noise model parameter is sent.
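The flow of FIG.9, including the external updating enable flag, can be sketched as below. For brevity the model is a dictionary of scalar parameters and a fixed tolerance replaces the statistical test. Both are assumptions for illustration, as are the function and argument names.

```python
def code_noise_frame(params, model, update_enabled, tolerance=0.5):
    """One pass of the FIG.9 loop for a scalar-per-key model (illustrative only).

    params, model: dicts mapping a parameter name to its value.
    Returns (update_flag, update_info); update_info is None when nothing is sent.
    """
    changed = {k: v for k, v in params.items()
               if abs(v - model.get(k, 0.0)) > tolerance}   # ST203: detect change
    if not update_enabled or not changed:                   # ST204: no change, or disabled
        return False, None                                  # ST207: flag only
    model.update(changed)                                   # ST206: update the stored model
    return True, changed                                    # ST205: flag and update info
```

When update_enabled is False the stored model is left untouched, so the coder and decoder models stay in step while speech coded data is being sent.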
In this way, according to the noise signal coding method of this embodiment, by modeling a noise signal with a noise model that can express its statistical characteristic quantities, it is possible to generate a coded signal with little perceptual deterioration with respect to a background noise signal. Moreover, since there is no need for faithful coding of input signal waveforms and transmission is performed only in segments where the noise model parameters corresponding to the input signal have changed, it is possible to provide low bit rate, highly efficient coding.
Furthermore, the speech signal coding method according to this embodiment provides high quality, highly efficient coding even in a background noise environment by performing coding in a speech segment using a speech coder capable of coding a speech signal with high quality and performing coding in a non-speech segment using a noise signal coder with high efficiency and little perceptual deterioration.

(Embodiment 2)

FIG.10 is a block diagram showing a configuration of a speech signal coding section according to Embodiment 2 of the present invention.
In this speech signal coding section 103, speech/noise signal separator 801 separates an input speech signal into a speech signal and a background noise signal superimposed on the speech signal. Speech/noise signal separator 801 can be of any type. Several separation methods are available, such as a method called "spectral subtraction", which separates an input signal into a speech signal with the noise signal suppressed and the noise signal by subtracting a noise spectrum from the input signal in the frequency domain, and a method of separating speech sound and noise signals using input signals from a plurality of signal input devices.
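Since the separator can be of any type, the following is only one possible sketch of the spectral subtraction method mentioned above. It operates bin by bin on magnitude spectra; the function name, the magnitude-domain formulation and the spectral floor value are assumptions, and the noise spectrum estimate is taken as given.

```python
def spectral_subtract(noisy_mag, noise_mag, floor=0.02):
    """Split a magnitude spectrum into speech and noise estimates, bin by bin."""
    speech = []
    noise = []
    for y, n in zip(noisy_mag, noise_mag):
        s = max(y - n, floor * y)   # flooring avoids negative magnitudes
        speech.append(s)
        noise.append(y - s)         # remainder is attributed to noise
    return speech, noise
```

In a complete separator the two magnitude spectra would typically be combined with the phase of the input signal and transformed back to the time domain before being passed to speech coder 803 and noise signal coder 804.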
Then, speech/non-speech decision section 802 decides from the speech signal after the separation obtained from speech/noise signal separator 801 whether the signal is in a speech segment or a non-speech segment (segment with only noise) and outputs the decision result to speech coder 803 and DTX controller & multiplexer 805. It is also possible to make this decision using the input signal before separation. Speech/non-speech decision section 802 can be of any type. This decision is generally made using instantaneous values or amounts of variation of a plurality of parameters such as power, spectrum and pitch period of the input signal.
In the case where the decision result of speech/non-speech decision section 802 shows "speech", speech coder 803 performs speech signal coding on the speech signal after the separation obtained from speech/noise signal separator 801 only in the speech segment and outputs the coded data to DTX controller & multiplexer 805. This speech coder 803 is a coder for the speech segment and any coder can be used as far as the coder can perform efficient coding on speech sound.
On the other hand, noise signal coder 804 performs noise signal coding on the noise signal after the separation obtained from speech/noise signal separator 801 over the entire segment and outputs information on the noise model that expresses the input noise signal and a flag as to whether the noise model should be updated or not. Noise signal coder 804 has the configuration shown in FIG.5 explained in Embodiment 1.
By the way, in the case where the speech/non-speech decision result indicates "speech", the speech/non-speech decision result flag input to noise signal coder 804 is designated as the noise model updating disable flag in noise signal coder 804 and the model is not updated.
Finally, DTX controller & multiplexer 805 controls information to be sent as transmission data and multiplexes transmission information using the outputs from speech/non-speech decision section 802, speech coder 803 and noise signal coder 804 and outputs as transmission data.
FIG.11 is a block diagram showing a configuration of the speech signal decoder according to Embodiment 2 of the present invention.
In the decoder shown in FIG.11, separator & DTX controller 901 receives transmission data, which is an input signal coded and sent on the coding side as reception data and separates the reception data into speech coded data or noise model parameter, speech/non-speech decision flag and noise model updating flag necessary for speech decoding and noise generation.
Then, in the case where the speech/non-speech decision flag indicates the speech segment, speech decoder 902 performs speech decoding from the speech coded data and outputs the decoded speech to speech/noise signal adder 904.
On the other hand, noise signal generator 903 generates a noise signal from the noise model parameter and noise model updating flag and outputs the noise signal to speech/noise signal adder 904. Speech/noise signal adder 904 adds up the output of speech decoder 902 and the output of noise signal generator 903 and outputs as an output signal.
Then, with reference to FIG.12, a processing flow of the speech signal coding method according to Embodiment 2 will be explained. In this method, suppose the processing shown in FIG.12 is repeated for every frame of a certain short segment (for example, approximately 10 to 50 ms).
First, in ST301, input signals are input frame by frame. Then, in ST302, an input speech signal is separated into a speech signal and a background noise signal superimposed on the speech signal. Then, in ST303, a speech/non-speech decision is made on the input signal or the speech signal after the separation obtained in ST302 and the decision result is output (ST304).
In the case where the decision result is "speech", in ST305, the speech coder performs speech coding processing on the speech signal after the separation obtained in ST302 and outputs the coded data. Then, on the noise signal after the separation obtained in ST302, the noise signal coder performs noise signal coding in ST306 and outputs information on the noise model that expresses the input noise signal and a flag as to whether the noise model should be updated or not.
In the case where the speech/non-speech decision result in ST303 is "speech", model updating is not performed in noise signal coding processing in ST306. Then, in ST307, information to be sent as transmission data is controlled and transmission information is multiplexed using the output obtained as a result of the speech/non-speech decision, speech coding processing and noise signal coding processing and finally in ST308, this data is output as transmission data.
In this way, the speech signal coder of this embodiment can perform coding in a speech segment using the speech coder providing high-quality coding on the speech signal and perform coding on a noise signal using the noise signal coder of Embodiment 1 with high efficiency and little perceptual deterioration, and therefore can perform high-quality and high-efficiency coding even in a background noise environment. Furthermore, by providing a speech/noise signal separator, the speech signal coder of this embodiment can remove superimposed background noise signals from the speech signal input to the speech coder, providing higher-quality or higher efficiency coding in the speech segment.

(Embodiment 3)

FIG.13 is a block diagram showing a configuration of a speech coding section according to Embodiment 3 of the present invention. The configuration on the decoding side of this embodiment is the same as the configuration of the speech signal decoder shown in FIG.6.
Input signal analyzer 1101 performs a signal analysis on an input signal input for every certain segment and calculates analysis parameters for the input signal. The characteristic parameters to be extracted include parameters necessary to express statistical characteristic quantities of the input signal and parameters expressing speech characteristics. The parameters necessary to express statistical characteristic quantities include short-time spectra obtained by FFT on a short-segment signal, input power, LPC spectrum parameters, etc. On the other hand, the parameters expressing speech characteristics include LPC parameters, input power and pitch period information, etc.
Then, using the speech characteristic pattern retained in speech model storage section 1102 and the noise model parameter retained in noise model storage section 1103, mode decision section 1104 decides from the analysis parameters obtained by input signal analyzer 1101 whether the input signal is in a speech segment or a non-speech segment (segment with only noise) and, in the case of a non-speech segment, whether or not the noise model is to be updated and updating information sent.
Here, speech model storage section 1102 creates and stores speech characteristic patterns beforehand, and the speech characteristic patterns include information such as the distribution of LPC parameters, input signal power and pitch period information, etc. in a speech (voiced) segment. Furthermore, the noise model parameters refer to information on a noise model that can express statistical characteristic quantities of the input noise signal, for example, information that expresses statistical characteristic quantities such as the average spectrum of short-time spectra and variance, using a statistical model such as an HMM.
Then, input signal analyzer 1101 decides whether the statistical analysis parameters obtained for the current input signal are appropriate as an output of the stored noise model, which expresses the signals in the preceding noise segment (for example, in the case of an HMM, whether the output probability of the analysis parameters for the current input signal is equal to or greater than a specified value), and at the same time decides from the parameters expressing speech characteristics of the input signal whether the signal is in a speech (voiced) segment or not.
In the case where mode decision section 1104 decides that the signal is in the speech segment, speech coder 1105 performs speech coding on the input signal and outputs the coded data to DTX controller & multiplexer 1107. On the other hand, in the case where mode decision section 1104 decides that the signal is in the non-speech segment and that noise model updating information is to be sent, noise model updating section 1106 updates the noise model and outputs the information on the updated noise model to DTX controller & multiplexer 1107.
Finally, DTX controller & multiplexer 1107 controls information to be sent as transmission data and multiplexes transmission information using the outputs from the speech coder and noise model updating section 1106 and outputs as transmission data.
Then, with reference to FIG.14, a processing flow of the speech signal coding method according to this embodiment will be explained. In this method, suppose the processing shown in FIG.14 is repeated for every frame of a certain short segment (for example, approximately 10 to 50 ms).
First, in ST401, input signals are input frame by frame. Then, in ST402, a signal analysis is performed on the input signal of each frame, and the resulting analysis parameters are calculated and output.
Then, in ST403, it is decided whether a currently input statistical analysis parameter is appropriate or not as the output from the noise model retained in noise model storage section 1103 in FIG.11 (ST404). In the case where the decision result shows that the parameter is not appropriate, that is, the current input signal cannot be expressed with the noise model currently retained, the process moves on to next ST405 and it is decided from the speech characteristic parameter obtained by analyzing the input signal whether the signal is in a speech (voiced) segment or not. In the case where it is decided that the signal is in a speech segment, in ST406, the speech coder performs speech coding processing and outputs the coded data.
On the other hand, in the case where it is decided in ST405 that the signal is not in the speech segment, in ST407, the noise model is updated and information on the updated noise model is output. In the case where it is decided in ST403 that the current input can be expressed with the noise model which is currently retained, no processing is performed and the process moves on to the next step. Then, in ST408, information to be transmitted as transmission data is controlled and transmission information is multiplexed using the outputs from the speech coder and noise model updater, and in ST409 transmission data is output.
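The per-frame decision flow of FIG.14 can be summarized by the following minimal sketch. The four callables stand in for the noise model check, the speech decision, the speech coder and the noise model updater; their names, the `FrameResult` type and the mode labels are assumptions made for illustration, and the multiplexing of ST408 and ST409 is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Frame = Sequence[float]


@dataclass
class FrameResult:
    mode: str                 # "speech", "noise_update" or "no_transmission"
    payload: Optional[bytes]  # coded data, or None when nothing needs to be sent


def process_frame(frame: Frame,
                  fits_noise_model: Callable[[Frame], bool],
                  is_speech: Callable[[Frame], bool],
                  encode_speech: Callable[[Frame], bytes],
                  update_noise_model: Callable[[Frame], bytes]) -> FrameResult:
    # ST403: the retained noise model still expresses the current input.
    if fits_noise_model(frame):
        return FrameResult("no_transmission", None)
    # ST405: the model does not fit, so decide speech versus non-speech.
    if is_speech(frame):
        return FrameResult("speech", encode_speech(frame))          # ST406
    return FrameResult("noise_update", update_noise_model(frame))   # ST407
```

In this sketch the noise model check comes first, so a frame that the retained model already expresses produces no payload at all, which is where the bit rate saving arises.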
As described above, by providing a mode decision section, the speech signal coder according to this embodiment can make decisions using a variation in statistical characteristic quantities of an input signal and speech characteristic patterns. Therefore, this embodiment can make more precise mode decisions and suppress deterioration of quality due to decision errors.
The noise signal coder of the present invention adopts a configuration comprising an analyzer that performs a signal analysis on a noise signal contained in a speech signal, a storage device that stores information on a noise model expressing the noise signal, a detector that detects a variation of information on the stored noise model based on the result of a signal analysis of a current input noise signal and an updater that updates, when a change of the information on the noise model is detected, information on the noise model stored by the amount of the variation.
This configuration allows a noise signal to be modeled with a noise model capable of expressing it with statistical characteristic quantities, and can thereby generate a decoded signal with little perceptual deterioration with respect to a background noise signal. This modeling also eliminates the need for faithful coding of the input signal waveform, providing low bit rate, highly efficient coding by transmitting information only in a segment where a noise model parameter corresponding to the input signal changes.
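The analyze, store, detect and update structure described above might be sketched as follows. This is only an illustration under stated assumptions: the stored noise model is reduced to a vector of mean feature values, a variation is detected by an RMS distance against a hypothetical `threshold`, and the class name `NoiseModelCoder` is not taken from the specification.

```python
import math
from typing import List, Optional, Sequence, Tuple


class NoiseModelCoder:
    """Keeps a stored noise model and reports only its variation (assumed design)."""

    def __init__(self, threshold: float = 1.0) -> None:
        self.mean: Optional[List[float]] = None  # stored noise model information
        self.threshold = threshold               # hypothetical detection threshold

    def encode(self, features: Sequence[float]) -> Tuple[bool, Optional[List[float]]]:
        """Return (update_flag, variation); variation is None when nothing changed."""
        if self.mean is None:
            # First frame: the whole model is the variation.
            self.mean = list(features)
            return True, list(features)
        # Detect a variation of the stored model from the current analysis result.
        dist = math.sqrt(
            sum((f - m) ** 2 for f, m in zip(features, self.mean)) / len(features)
        )
        if dist <= self.threshold:
            return False, None
        # Update the stored model by the amount of the variation and output it.
        delta = [f - m for f, m in zip(features, self.mean)]
        self.mean = list(features)
        return True, delta
```

Frames for which `encode` returns `(False, None)` require no noise model parameters to be transmitted.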
The noise signal coder of the present invention in the above configuration adopts a configuration with the analyzer extracting statistical characteristic quantities on the noise signal and the storage device storing information capable of expressing the statistical characteristic quantities as information on the noise model.
This configuration provides appropriate modeling of a noise signal and low bit rate, highly efficient coding.
The speech signal coder of the present invention adopts a configuration comprising a speech/non-speech decision section that decides whether an input speech signal is in a speech segment or non-speech segment that includes only a noise signal, a speech coder that performs speech coding on the input speech signal when the decision result shows that the signal is in a speech segment, the noise signal coder that performs noise signal coding on the input signal when the decision result shows that the signal is in a non-speech segment and a multiplexer that multiplexes the outputs from the speech/non-speech decision section, speech coder and noise signal coder.
According to this configuration, the speech coder capable of performing high quality coding on the speech signal performs coding in a speech segment and the noise signal coder with high efficiency and little perceptual deterioration performs coding in a non-speech segment, thus providing high quality and highly efficient coding even in a background noise environment.
The speech signal coder of the present invention adopts a configuration comprising a speech/noise signal separator that separates an input speech signal into a speech signal and a background noise signal superimposed on this speech signal, a speech/non-speech decision section that decides the speech segment or non-speech segment including only the noise signal from the speech signal obtained from the input speech signal or the speech/noise signal separator, a speech coder that performs speech coding on the input speech signal when the decision result indicates a speech segment, the noise signal coder that performs coding on the background noise signal obtained from the speech/noise signal separator and a multiplexer that multiplexes the outputs from the speech/non-speech decision section, speech coder and noise signal coder.
According to this configuration, the speech coder capable of performing high quality coding on the speech signal performs coding in a speech segment and the noise signal coder with high efficiency and little perceptual deterioration performs coding on a noise signal, thus providing high quality and highly efficient coding even in a background noise environment. Furthermore, provision of the speech/noise signal separator makes it possible to remove superimposed background noise from the speech signal input to the speech coder, providing high quality, highly efficient coding on the speech segment.
The speech signal coder of the present invention adopts a configuration comprising an analyzer that performs a signal analysis on an input speech signal, a speech model storage device that stores speech characteristic patterns necessary to decide whether the input speech signal is a voiced signal or not, a noise model storage device that stores information on a noise model expressing a noise signal included in the input speech signal, a mode decision section that decides whether the input speech signal is in a speech segment or non-speech segment containing only a noise signal using the outputs of the analyzer, speech model storage device and noise model storage device and, in the case of the non-speech segment, decides whether the noise model should be updated or not, a speech coder that performs speech coding on the input speech signal when the mode decision section decides the speech segment, a noise model updater that updates the noise model when the mode decision section decides the non-speech segment and decides that the noise model will be updated and a multiplexer that multiplexes the outputs from the speech coder and noise model updater.
According to this configuration, provision of the mode decision section makes it possible to make a decision using a variation of statistical characteristic quantities of the input signal and speech characteristic patterns. Thus, this configuration provides more precise mode decision and can suppress quality deterioration due to decision errors.
The noise signal generator of the present invention adopts a configuration comprising a noise model updater that updates a noise model as required according to noise model parameters coded on the input noise signal on the coding side and the noise model updating flag, a noise model storage device that stores information on the updated noise model using the output of the noise model updater and a noise signal generator that generates a noise signal from information on the noise model stored in the noise model storage device.
According to this configuration, it is possible to generate a decoded signal with little perceptual deterioration with respect to a background noise signal.
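A decoding-side noise signal generator of this kind can be sketched as below. The sketch assumes, for illustration only, that the noise model consists of a single RMS level and that the noise is white Gaussian; the class name `NoiseSignalGenerator`, the `seed` argument and the default frame length of 160 samples are hypothetical.

```python
import random
from typing import List, Optional


class NoiseSignalGenerator:
    """Regenerates noise from stored model information (assumed single-level model)."""

    def __init__(self, seed: int = 0) -> None:
        self.rms = 0.0                    # stored noise model information
        self._rng = random.Random(seed)   # local generator for reproducible output

    def decode_frame(self, update_flag: bool, rms: Optional[float] = None,
                     frame_len: int = 160) -> List[float]:
        # Update the stored model only when the updating flag is set.
        if update_flag and rms is not None:
            self.rms = rms
        # Generate one frame of noise from the stored model.
        return [self._rng.gauss(0.0, self.rms) for _ in range(frame_len)]
```

Because the model is retained between frames, noise continues to be generated at the last received level while no update is transmitted.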
The noise signal generator of the present invention in the above configuration adopts a configuration with the noise model parameters input to the noise model updater and information stored in the noise model storage device being information capable of expressing statistical characteristic quantities on the noise signal generated.
By modeling a noise signal with a noise model capable of expressing with statistical characteristic quantities, this configuration can generate a decoded signal with little perceptual deterioration with respect to a background noise signal.
The speech signal decoder of the present invention adopts a configuration comprising a separator that receives a signal including speech data coded on the coding side, noise model parameter, speech/non-speech decision flag and noise model updating flag and separates the noise model parameter, speech/non-speech decision flag and noise model updating flag from the signal, a speech decoder that performs speech decoding on the speech data when the speech/non-speech decision flag indicates a speech segment, a noise signal generator that generates a noise signal from the noise model parameter and noise model updating flag when the speech/non-speech decision flag indicates a non-speech segment and an output switch that switches between the decoded speech output from the speech decoder and the noise signal output from the noise signal generator according to the speech/non-speech decision flag and outputs as an output signal.
This configuration makes it possible to generate a decoded signal with little perceptual deterioration with respect to a background noise signal.
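The output switching of such a decoder can be illustrated by the following minimal sketch. The `ReceivedFrame` type and the callables `decode_speech` and `generate_noise` are hypothetical stand-ins for the separated fields, the speech decoder and the noise signal generator; none of these names comes from the specification.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class ReceivedFrame:
    is_speech: bool                                # speech/non-speech decision flag
    update_noise_model: bool = False               # noise model updating flag
    speech_data: Optional[bytes] = None            # coded speech data
    noise_params: Optional[Sequence[float]] = None # noise model parameter


def decode(frame: ReceivedFrame,
           decode_speech: Callable[[Optional[bytes]], List[float]],
           generate_noise: Callable[[Optional[Sequence[float]]], List[float]]) -> List[float]:
    # Output switch: select the branch according to the decision flag.
    if frame.is_speech:
        return decode_speech(frame.speech_data)
    # Pass parameters to the generator only when the updating flag is set.
    params = frame.noise_params if frame.update_noise_model else None
    return generate_noise(params)
```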
The speech signal decoder of the present invention adopts a configuration comprising a separator that receives a signal including speech data coded on the coding side, noise model parameter, speech/non-speech decision flag and noise model updating flag and separates the noise model parameter, speech/non-speech decision flag and noise model updating flag from the signal, a speech decoder that performs speech decoding on the speech data when the speech/non-speech decision flag indicates a speech segment, the noise signal generator that generates a noise signal from the noise model parameter and noise model updating flag when the speech/non-speech decision flag indicates a non-speech segment and a speech/noise signal adder that adds up the decoded speech output from the speech decoder and noise signal output from the noise signal generator.
This configuration makes it possible to generate a decoded signal with little perceptual deterioration with respect to a background noise signal. Furthermore, after the coding side separates a speech signal and a noise signal superimposed thereon, coders suited to their respective signals perform coding and the decoding side adds up the signals to generate a decoded signal, thus providing coding of a speech signal in a speech segment with higher quality.
The speech signal coding method of the present invention comprises a speech/non-speech deciding step of deciding whether an input speech signal is in a speech segment or non-speech segment that includes only a noise signal, a speech coding step of coding the input speech signal when the decision result shows that the signal is in a speech segment, a noise signal coding step of performing noise signal coding on the input signal when the decision result shows that the signal is in a non-speech segment, and a multiplexing step of multiplexing the outputs from the speech/non-speech deciding step, speech coding step and noise signal coding step, and the noise signal coding step comprises an analyzing step of performing a signal analysis on a noise signal contained in a speech signal, a storing step of storing information on a noise model expressing the noise signal, a detecting step of detecting a variation of information on the stored noise model based on the result of a signal analysis of a current input noise signal and an updating step of updating information on the noise model stored by the amount of the variation when a change of the information on the noise model is detected.
According to this method, the speech coding section capable of performing high quality coding on the speech signal performs coding in a speech segment and the noise signal coder of the first embodiment with high efficiency and little perceptual deterioration performs coding in a non-speech segment, thus providing high quality, highly efficient coding even in a background noise environment.
The speech signal coding method of the present invention comprises a speech/noise signal separating step of separating an input speech signal into a speech signal and a background noise signal superimposed on this speech signal, a speech/non-speech deciding step of deciding the speech segment or non-speech segment that includes only the noise signal from the input speech signal or from the speech signal obtained in the speech/noise signal separating step, a speech coding step of performing speech coding on the input speech signal when the decision result indicates a speech segment, a noise signal coding step of performing noise signal coding on the input signal when the decision result indicates a non-speech segment and performing coding on the background noise signal obtained from the speech/noise signal separating step and a multiplexing step of multiplexing the outputs from the speech/non-speech deciding step, speech coding step and noise signal coding step, and the noise signal coding step comprises an analyzing step of performing a signal analysis on a noise signal contained in a speech signal, a storing step of storing information on a noise model expressing the noise signal, a detecting step of detecting a variation of information on the stored noise model based on the result of a signal analysis of a current input noise signal and an updating step of updating, when a variation of the information on the noise model is detected, information on the stored noise model by the amount of the variation.
According to this configuration, the speech coding section capable of performing high quality coding on the speech signal performs coding in a speech segment and the noise signal coder of the first embodiment with high efficiency and little perceptual deterioration performs coding in a non-speech segment, thus providing high quality and highly efficient coding even in a background noise environment. Furthermore, provision of the speech/noise signal separating section makes it possible to remove superimposed background noise from the speech signal input to the speech coding section, providing high quality, highly efficient coding on the speech segment.
The speech signal coding method of the present invention comprises an analyzing step of performing a signal analysis on an input speech signal, a speech model storing step of storing speech characteristic patterns necessary to decide whether the input speech signal is a voiced signal or not, a noise model storing step of storing information on a noise model expressing a noise signal included in the input speech signal, a mode deciding step of deciding whether the input speech signal is in a speech segment or non-speech segment containing only a noise signal using the outputs of the analyzing section, speech model storing section and noise model storing section and, when the decision result indicates the non-speech segment, deciding whether the noise model should be updated or not, a speech coding step of performing speech coding on the input speech signal when the mode decision section decides the speech segment, a noise model updating step of updating the noise model when the mode decision section decides the non-speech segment and decides that the noise model will be updated, and a multiplexing step of multiplexing the outputs from the speech coding section and noise model updating section.
According to this method, provision of the mode decision section allows decisions to be made using a variation of statistical characteristic quantities and speech characteristic patterns of the input signal. Thus, this method provides more precise mode decisions and suppresses quality deterioration due to decision errors.
The recording medium of the present invention is a mechanically readable medium that records a program to execute the steps of analyzing statistical characteristic quantities on an input noise signal, storing information on a noise model expressing the statistical characteristic quantities on the input noise signal, detecting a variation of the noise model expressing the input noise signal and updating the noise model and outputting information on the updated noise model as required.
As described above, the noise signal coder of the present invention can generate a decoded signal with little perceptual deterioration with respect to a background noise signal by modeling a noise signal with a noise model capable of expressing the noise signal with statistical characteristic quantities. The noise signal coder of the present invention also eliminates the need for faithful coding for the input signal waveform, and thus provides low bit rate, highly efficient coding by transmitting only a segment where a noise model parameter for the input signal changes.
Furthermore, the speech signal coder of the present invention provides high-quality, highly efficient coding even in a background noise environment by performing coding in a speech segment through a speech coder capable of coding a speech signal with high quality and performing coding in a non-speech segment through the noise signal coder with high efficiency and little perceptual deterioration.
This application is based on the Japanese Patent Application No.HEI 11-168545 filed on June 15, 1999, the entire content of which is expressly incorporated by reference herein.

Industrial Applicability

The present invention is applicable to a base station apparatus and communication terminal apparatus in a digital radio communication system.

Claims

A noise signal coder comprising:

analyzing means for performing a signal analysis on a noise signal contained in a speech signal;

storing means for storing information on a noise model expressing said noise signal;

detecting means for detecting a variation of information on the stored noise model based on the signal analysis result of a current input noise signal; and

updating means for updating, when a variation of the information on the noise model is detected, information on said noise model stored by the amount of the variation.
The noise signal coder according to claim 1, wherein the analyzing means extracts statistical characteristic quantities on the noise signal and the storing means stores information capable of expressing said statistical characteristic quantities as information on the noise model.
A speech signal coder comprising:

speech/non-speech deciding means for deciding whether an input speech signal is in a speech segment or non-speech segment that includes only a noise signal;

speech coding means for performing speech coding on said input speech signal when the decision result indicates the speech segment;

the noise signal coder according to claim 1 or claim 2 that performs noise signal coding on said input signal when the decision result indicates the non-speech segment; and

multiplexing means for multiplexing the outputs from said speech/non-speech deciding means, said speech coding means and said noise signal coder.
A speech signal coder comprising:

speech/noise signal separating means for separating an input speech signal into a speech signal and a background noise signal superimposed on this speech signal;

speech/non-speech deciding means for deciding whether a signal is in a speech segment or non-speech segment including only a noise signal from the speech signal obtained from said input speech signal or said speech/noise signal separating means;

speech coding means for performing speech coding on said input speech signal when the decision result indicates the speech segment;

the noise signal coder according to claim 1 that performs coding on the background noise signal obtained from said speech/noise signal separating means; and

multiplexing means for multiplexing the outputs from said speech/non-speech deciding means, said speech coding means and said noise signal coder.
A speech signal coder comprising:

analyzing means for performing a signal analysis on an input speech signal;

speech model storing means for storing speech characteristic patterns necessary to decide whether said input speech signal is a voiced signal or not;

noise model storing means for storing information on a noise model expressing a noise signal contained in said input speech signal;

mode deciding means for deciding whether said input speech signal is in a speech segment or non-speech segment containing only a noise signal using the outputs of said analyzing means, speech model storing means and noise model storing means and, when the decision result indicates the non-speech segment, deciding whether the noise model should be updated or not;

speech coding means for performing speech coding on the input speech signal when the mode deciding means decides the speech segment;

noise model updating means for updating the noise model when said mode deciding means decides the non-speech segment and decides that the noise model will be updated; and

multiplexing means for multiplexing the outputs from the speech coding means and noise model updating means.
A base station apparatus equipped with a speech signal coder, said speech signal coder comprising:

speech/non-speech deciding means for deciding whether an input speech signal is in a speech segment or non-speech segment that includes only a noise signal;

speech coding means for performing speech coding on said input speech signal when the decision result indicates the speech segment;

the noise signal coder according to claim 1 or claim 2 that performs noise signal coding on said input signal when the decision result indicates the non-speech segment; and

multiplexing means for multiplexing the outputs from said speech/non-speech deciding means, said speech coding means and said noise signal coder.
A communication terminal apparatus equipped with a speech signal coder, said speech signal coder comprising:

speech/non-speech deciding means for deciding whether an input speech signal is in a speech segment or non-speech segment that includes only a noise signal;

speech coding means for performing speech coding on said input speech signal when the decision result indicates the speech segment;

the noise signal coder according to claim 1 or claim 2 that performs noise signal coding on said input signal when the decision result indicates the non-speech segment; and

multiplexing means for multiplexing the outputs from said speech/non-speech deciding means, said speech coding means and said noise signal coder.
A noise signal generator comprising:

noise model updating means for updating a noise model as required according to noise model parameters coded on an input noise signal on the coding side and a noise model updating flag;

noise model storing means for storing information on the updated noise model using the output of said noise model updating means; and

noise signal generating means for generating a noise signal from information on the noise model stored in said noise model storing means.
The noise signal generator according to claim 8, wherein the noise model parameters input to said noise model updating means and information stored in said noise model storing means are information capable of expressing statistical characteristic quantities on the noise signal generated.
A speech signal decoder comprising:

separating means for receiving a signal including speech data coded on the coding side, noise model parameter, speech/non-speech decision flag and noise model updating flag and separating the noise model parameter, speech/non-speech decision flag and noise model updating flag from said signal;

speech decoding means for performing speech decoding on said speech data when the speech/non-speech decision flag indicates a speech segment;

the noise signal generator according to claim 8 that, when said speech/non-speech decision flag indicates a non-speech segment, generates a noise signal from said noise model parameter and noise model updating flag; and

output switching means for switching between the decoded speech output from said speech decoding means and the noise signal output from said noise signal generator according to said speech/non-speech decision flag and outputting as an output signal.
A speech signal decoder comprising:

separating means for receiving a signal including speech data coded on the coding side, noise model parameter, speech/non-speech decision flag and noise model updating flag and separating the noise model parameter, speech/non-speech decision flag and noise model updating flag from said signal;

speech decoding means for performing speech decoding on said speech data when said speech/non-speech decision flag indicates a speech segment;

the noise signal generator according to claim 8 or claim 9 that, when said speech/non-speech decision flag indicates a non-speech segment, generates a noise signal from said noise model parameter and noise model updating flag; and

speech/noise signal adding means for adding up the decoded speech output from said speech decoding means and the noise signal output from said noise signal generator.
A speech signal coding method comprising:

a speech/non-speech deciding step of deciding whether an input speech signal is in a speech segment or non-speech segment that includes only a noise signal;

a speech coding step of performing speech coding on said input speech signal when the decision result indicates the speech segment;

a noise signal coding step of performing noise signal coding on said input signal when the decision result indicates the non-speech segment; and

a multiplexing step of multiplexing the outputs from said speech/non-speech deciding step, said speech coding step and said noise signal coding step, wherein the noise signal coding step comprises:

an analyzing step of performing a signal analysis on a noise signal contained in a speech signal;

a storing step of storing information on a noise model expressing said noise signal;

a detecting step of detecting a variation of information on the stored noise model based on the result of a signal analysis of a current input noise signal; and

an updating step of updating, when a variation of the information on the noise model is detected, information on said noise model stored by the amount of said variation.
A speech signal coding method comprising:

a speech/noise signal separating step of separating an input speech signal into a speech signal and a background noise signal superimposed on this speech signal;

a speech/non-speech deciding step of deciding a speech segment or non-speech segment including only a noise signal from the speech signal obtained from said input speech signal or said speech/noise signal separating step;

a speech coding step of performing speech coding on said input speech signal when the decision result indicates a speech segment;

a noise signal coding step of performing coding on a background noise signal obtained from said speech/noise signal separating step; and

a multiplexing step of multiplexing the outputs from said speech/non-speech deciding step, said speech coding step and said noise signal coding step, wherein the noise signal coding step comprises:

an analyzing step of performing a signal analysis on a noise signal contained in a speech signal;

a storing step of storing information on a noise model expressing said noise signal;

a detecting step of detecting a variation of information on the stored noise model based on the result of a signal analysis of a current input noise signal; and

an updating step of updating, when a variation of the information on the noise model is detected, information on said noise model stored by the amount of said variation.
A speech signal coding method comprising:

an analyzing step of performing a signal analysis on an input speech signal;

a speech model storing step of storing speech characteristic patterns necessary to decide whether said input speech signal is a voiced signal or not;

a noise model storing step of storing information on a noise model expressing a noise signal contained in said input speech signal;

a mode deciding step of deciding whether said input speech signal is in a speech segment or non-speech segment containing only a noise signal using the outputs of said analyzing means, speech model storing means and noise model storing means and, when said decision result indicates the non-speech segment, deciding whether the noise model should be updated or not;

a speech coding step of performing speech coding on the input speech signal when said mode deciding means decides the speech segment;

a noise model updating step of updating the noise model when said mode deciding means decides the non-speech segment and decides that the noise model will be updated; and

a multiplexing step of multiplexing the outputs from the speech coding means and noise model updating means.
A mechanically readable recording medium that records a program to execute the steps of:

analyzing statistical characteristic quantities on an input noise signal;

storing information on a noise model expressing the statistical characteristic quantities on the input noise signal;

detecting a variation of the noise model expressing the input noise signal; and

updating the noise model and outputting information on the updated noise model as required.