WO2010046954A1 - Noise suppression device and speech decoding device - Google Patents
Noise suppression device and speech decoding device
- Publication number
- WO2010046954A1 (PCT/JP2008/003021)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- spectrum
- noise
- signal
- unit
- noise suppression
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/26—Pre-filtering or post-filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0316—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
- G10L21/0364—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
Definitions
- the present invention relates to a noise suppression device that suppresses noise mixed in a speech / acoustic signal and a speech decoding device including the noise suppression device.
- SS: Spectral Subtraction
- the estimation error of the noise spectrum remains as distortion in the signal after noise suppression processing; because this distortion has characteristics significantly different from those of the signal before processing and is heard as harsh noise (artificial noise, also called musical tone), the subjective quality of the output signal may be greatly degraded.
- as a conventional method for suppressing the subjective feeling of deterioration described above, there is, for example, the one disclosed in Patent Document 1.
- the sound signal processing method of Patent Document 1 is intended to audibly reduce distortion caused by noise suppression processing and low bit rate speech coding processing.
- by performing weighted addition of the input signal and a processed signal obtained by smoothing the input signal, based on the estimated noise ratio in the signal obtained by the voice/noise state discriminating means, the method improves the subjective quality mainly in sections that contain many deterioration components such as background noise.
- the weighted addition control of the input signal and the processed signal depends on the voice/noise state discriminating means, so it is affected when voice section detection fails in a section that contains voice.
- the evaluation value itself is based on the analysis result in the time domain.
- if the threshold of the evaluation value is adjusted so as to suppress the feeling of deterioration of low-range noise, high-frequency audio signals, relative to which the noise signal has high power, are processed by mistake and the quality deteriorates; conversely, if the adjustment is made so that distortion of the high-frequency audio signals does not appear, almost no improvement effect is obtained.
- the control factor is only the magnitude of the amplitude spectrum component of the input signal, and whether the frequency component is speech or noise.
- whether or not the input signal is voice (or musical tone) depends largely on the section judgment evaluation value in the time domain, and if the section judgment is wrong, the situation of quality degradation does not change.
- the present invention has been made to solve such problems, and its purpose is to provide a noise suppression device that performs noise suppression that is audibly preferable and causes little quality degradation even under high noise, and a high-quality speech decoding device including the noise suppression device.
- a noise suppression device includes: a time/frequency converter that converts an input signal into an input signal spectrum that is a frequency component; a noise spectrum estimator that estimates an estimated noise spectrum from the input signal; a noise spectrum suppressor that performs noise suppression of the input signal spectrum based on the estimated noise spectrum and generates a noise suppression spectrum; a signal modification unit that deforms the noise suppression spectrum according to a ratio based on the noise suppression spectrum and the estimated noise spectrum and generates a smoothed processed spectrum; and a signal addition unit that adds the processed spectrum to the noise suppression spectrum and suppresses a degradation component included in the noise suppression spectrum.
- the speech decoding apparatus includes: a speech decoding unit that decodes predetermined code data to generate a decoded signal; a time/frequency conversion unit that converts the decoded signal into a decoded signal spectrum that is a frequency component; a noise spectrum estimator that estimates an estimated noise spectrum from the decoded signal; a signal modifying unit that deforms the decoded signal spectrum according to a ratio based on the decoded signal spectrum and the estimated noise spectrum and generates a smoothed processed spectrum; and a signal addition unit that adds the processed spectrum to the decoded signal spectrum and suppresses the degradation component included in the decoded signal spectrum.
- the drawings include an overall configuration diagram of a noise suppression device according to Embodiment 1 of the present invention.
- the drawings also include an operation explanatory diagram that shows a series of processing contents in the signal processing unit described in Embodiment 1 of the present invention, with the amplitude spectrum and phase spectrum of a frequency in a region expressed as vectors.
- FIG. 1 shows the overall configuration of a noise suppression apparatus 100 according to this embodiment.
- a noise suppression apparatus 100 shown in FIG. 1 includes a time / frequency conversion unit 2, a noise suppression unit 3, a signal processing unit 4, and a frequency / time conversion unit 5.
- the noise suppression unit 3 includes a noise spectrum suppression unit 7 and a noise spectrum estimation unit 8 including a voice / noise determination unit 9 and a noise spectrum update unit 10.
- the signal processing unit 4 includes a signal addition unit 11, an amplitude smoothing unit 12, and a signal deformation unit 13 including a processing component calculation unit 14 and a phase disturbance unit 15.
- an input signal 1, sampled at a predetermined sampling frequency (for example, 8 kHz) and divided into frames at a predetermined frame period (for example, 20 msec), is input to the time/frequency conversion unit 2 in the noise suppression apparatus 100 and to the noise spectrum estimation unit 8 described later.
- the time/frequency conversion unit 2 performs a windowing process on the input signal 1 divided into the above-described frame periods and converts the windowed signal, using for example a 256-point FFT (Fast Fourier Transform), into an input signal spectrum 16 that is a spectral component for each frequency.
- the time / frequency converter 2 outputs the input signal spectrum 16 to the noise spectrum suppressor 7 and noise spectrum estimator 8 inside the noise suppressor 3 and the amplitude smoother 12 inside the signal processor 4.
- for the windowing process, a known method such as a Hanning window or a trapezoidal window can be used.
- since the FFT is a well-known method, its description is omitted.
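The framing, windowing, and transform step can be sketched as follows. This is a minimal illustration using the example values from the text (8 kHz, 20 msec frames, 256 points). The zero-padding of a 160-sample frame to 256 points and the naive DFT (standing in for a real FFT routine) are assumptions of this sketch, and the names `hann`, `dft`, and `frame_spectrum` are hypothetical.

```python
import cmath
import math

FS = 8000        # sampling frequency in Hz (example value from the text)
FRAME_LEN = 160  # 20 msec at 8 kHz
N_FFT = 256      # transform size (example value from the text)

def hann(n):
    """Hann (Hanning) window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def dft(x):
    """Naive O(n^2) DFT; a real implementation would call an FFT routine."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def frame_spectrum(frame):
    """Window one frame, zero-pad it to N_FFT (assumption), return the spectrum."""
    w = hann(len(frame))
    padded = [s * wi for s, wi in zip(frame, w)] + [0.0] * (N_FFT - len(frame))
    return dft(padded)

# A 1 kHz tone lands on bin 1000 / (8000 / 256) = 32.
tone = [math.sin(2.0 * math.pi * 1000.0 * t / FS) for t in range(FRAME_LEN)]
spectrum = frame_spectrum(tone)
peak_bin = max(range(N_FFT // 2), key=lambda k: abs(spectrum[k]))
```

Each element of `spectrum` is a complex value whose magnitude and angle correspond to the amplitude and phase components discussed in the rest of the description.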
- the noise spectrum suppression unit 7 uses the estimated noise spectrum 17 input from the noise spectrum estimation unit 8 described later with respect to the input signal spectrum 16 input from the time / frequency conversion unit 2. The noise suppression process is performed, and the obtained result is output as a noise suppression spectrum 18 to the signal addition unit 11 and the processing component calculation unit 14 in the signal processing unit 4.
- as the noise suppression method in the noise spectrum suppression unit 7, for example, a method based on spectral subtraction as described in Non-Patent Document 1, a spectral amplitude suppression method that uses the signal-to-noise ratio for each frequency of the input signal spectrum 16 and the estimated noise spectrum 17, or a method that combines spectral subtraction and spectral amplitude suppression (for example, the method described in Japanese Patent No. 3454190, "Noise Suppression Apparatus and Method") can be used.
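As a rough illustration of the first option, a basic amplitude-domain spectral subtraction might look like the sketch below. It is not the method of Non-Patent Document 1 or of Japanese Patent No. 3454190; the function name and the `floor` parameter (a lower limit that plays the role of a maximum suppression amount) are assumptions made for the example.

```python
import cmath

def spectral_subtraction(spectrum, noise_amp, floor=0.1):
    """Subtract the estimated noise amplitude per bin and keep the input phase.

    floor is a hypothetical lower limit, given as a fraction of the input
    amplitude, so that no bin is suppressed below floor * |S(k)|.
    """
    out = []
    for s, n in zip(spectrum, noise_amp):
        amp = abs(s)
        new_amp = max(amp - n, floor * amp)
        out.append(cmath.rect(new_amp, cmath.phase(s)))
    return out

# First bin: amplitude 5 minus noise 1 gives 4. Second bin hits the floor.
suppressed = spectral_subtraction([3 + 4j, 0.5 + 0j], [1.0, 1.0])
```

Bins that hit the floor are the ones where estimation errors tend to survive as residual noise, which is the degradation component the signal processing unit 4 addresses.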
- the signal processing unit 4 processes the deterioration component in the noise suppression spectrum 18 so that it becomes audibly preferable, in accordance with the noise suppression spectrum 18 (the input signal spectrum after noise suppression) and the estimated noise spectrum 17. Specifically, using the noise suppression spectrum 18 output from the noise spectrum suppression unit 7 and the estimated noise spectrum 17 output from the noise spectrum estimation unit 8, the signal transformation unit 13 generates a processed spectrum 19, and the signal addition unit 11 adds the processed spectrum 19 to the noise suppression spectrum 18 to obtain an addition spectrum 20. The amplitude smoothing unit 12 then smooths the addition spectrum 20 in the time direction and the frequency direction and outputs the result to the frequency/time conversion unit 5 as a smoothed noise suppression spectrum 21 that is audibly preferable. The processing of the signal processing unit 4 is described in detail later.
- the frequency / time conversion unit 5 performs inverse FFT processing on the smoothed noise suppression spectrum 21 input from the signal processing unit 4 to return to the time domain signal, and windowing for smooth connection with the previous and subsequent frames. The connection is performed while processing is performed, and the obtained signal is output as the output signal 6.
- the noise spectrum estimation unit 8 estimates an average noise spectrum in the input signal 1.
- the speech/noise determination unit 9 calculates the speech likelihood signal VAD using the input signal 1, the input signal spectrum 16 output by the time/frequency conversion unit 2, and the estimated noise spectrum 17 estimated from past frames.
- the speech likelihood signal VAD represents the degree to which the input signal 1 of the current frame is speech or noise; for example, it takes a large evaluation value when the possibility of speech is high and a small evaluation value when the possibility of speech is low.
- to calculate the speech likelihood signal VAD, the speech/noise determination unit 9 can use, for example, the maximum value of the autocorrelation analysis of the input signal 1 and the frame SN ratio, which can be calculated from the ratio of the power of the input signal 1 to the power of the estimated noise spectrum 17, either alone or in combination.
- the maximum value ACF max of the autocorrelation analysis result of the input signal 1 can be calculated by Expression (1)
- the frame SN ratio SNR fr can be calculated by Expression (2).
- x (t) is the input signal 1 divided into frames at time t
- N is the autocorrelation analysis section length
- S (k) is the kth component of the input signal spectrum 16
- N (k) is the kth component of the estimated noise spectrum 17
- M is the number of FFT points.
- the speech likelihood signal VAD can be calculated by the following equation (3), for example.
- $VAD = w_{ACF} \cdot ACF_{max} + w_{SNR} \cdot SNR_{fr} / SNR_{norm}$ (3)
- $SNR_{norm}$ is a predetermined value for normalizing the value of $SNR_{fr}$ into the range of 0 to 1
- $w_{ACF}$ and $w_{SNR}$ are predetermined weighting values.
- the weights may be adjusted in advance so that the speech likelihood signal VAD can be determined suitably.
- ACF max takes a value in the range of 0 to 1 from the property of the above formula (1).
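The speech likelihood computation can be sketched as below. Expressions (1) and (2) are not reproduced in this text, so the normalized autocorrelation maximum and the frame S/N ratio in dB used here are assumed forms chosen to match the stated properties (ACF max within 0 to 1, frame SN ratio from a power ratio). The weights, `snr_norm`, and all function names are hypothetical.

```python
import math
import random

def acf_max(x):
    """Maximum normalized autocorrelation over positive lags, clamped to [0, 1]."""
    energy = sum(v * v for v in x)
    if energy == 0.0:
        return 0.0
    n = len(x)
    best = 0.0
    for lag in range(1, n // 2):
        r = sum(x[t] * x[t + lag] for t in range(n - lag)) / energy
        best = max(best, r)
    return min(best, 1.0)

def frame_snr_db(spectrum, noise):
    """Frame S/N ratio in dB from input power and estimated noise power."""
    p_sig = sum(abs(s) ** 2 for s in spectrum)
    p_noise = sum(abs(n) ** 2 for n in noise)
    if p_sig == 0.0 or p_noise == 0.0:
        return 0.0
    return max(10.0 * math.log10(p_sig / p_noise), 0.0)

def speech_likelihood(x, spectrum, noise, w_acf=0.5, w_snr=0.5, snr_norm=30.0):
    """Weighted sum in the spirit of equation (3); weights are illustrative."""
    snr = min(frame_snr_db(spectrum, noise) / snr_norm, 1.0)
    return w_acf * acf_max(x) + w_snr * snr

rng = random.Random(0)
tone = [math.sin(2.0 * math.pi * t / 20.0) for t in range(160)]  # periodic frame
hiss = [rng.uniform(-1.0, 1.0) for _ in range(160)]              # noise-like frame
spec, noise = [10.0, 10.0], [1.0, 1.0]
vad_tone = speech_likelihood(tone, spec, noise)
vad_hiss = speech_likelihood(hiss, spec, noise)
```

With equal spectra, the periodic frame scores higher than the noise-like frame because only the autocorrelation term differs.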
- the speech / noise determination unit 9 outputs the speech likelihood signal VAD for noise spectrum estimation calculated by the processing described above to the noise spectrum update unit 10.
- the speech/noise determination unit 9 may also be improved or changed in various ways, for example by using the input signal spectrum 16 and the estimated noise spectrum 17 to calculate the SN ratio of the spectrum component for each frequency and then using the sum of these per-frequency SN ratios (the larger the sum, the higher the possibility of speech) or their variance (the larger the variance, the more clearly the harmonic structure of speech appears and the higher the possibility of speech).
- the noise spectrum update unit 10 refers to the speech likelihood signal VAD output by the speech/noise determination unit 9 and, when the input signal 1 of the current frame is highly likely to be noise, uses the input signal spectrum 16 of the current frame to update the estimated noise spectrum 17 that was estimated from past frames and is stored in an internal memory or the like.
- the noise spectrum updating unit 10 updates the estimated noise spectrum 17 by reflecting the input signal spectrum 16 in it according to, for example, equation (4).
- n is the frame number
- N (n−1, k) is the estimated noise spectrum 17 before update
- S noise (n, k) is the input signal spectrum 16 of the current frame determined to have a high possibility of noise.
- N tilde (n, k) (for the purposes of electronic application, an alphabetic character with a "~" symbol above it is written as the letter followed by "tilde") is the estimated noise spectrum 17 after update.
- α (k) is a predetermined update speed coefficient that takes a value from 0 to 1, and it is preferable to set a value relatively close to 0.
- it may be better for α (k) to take a slightly larger value as the frequency becomes higher, and it can be adjusted according to the type of noise.
- the noise spectrum updating unit 10 calculates the right side of equation (4) and takes the left side, N tilde (n, k), as the new estimated noise spectrum 17.
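Equation (4) itself is not reproduced in this text. The sketch below assumes the common first-order recursive average, which is consistent with the definitions above (an update speed coefficient between 0 and 1 that is preferably close to 0). The function name, the scalar `alpha` (the text allows a per-frequency value), and the VAD threshold are hypothetical.

```python
def update_noise_spectrum(noise_prev, input_amp, vad, alpha=0.05, vad_threshold=0.3):
    """Assumed form of the update: N~(n,k) = (1 - alpha) * N(n-1,k) + alpha * S_noise(n,k).

    The estimate is updated only when the frame is judged likely to be noise
    (vad below a hypothetical threshold); otherwise it is kept unchanged.
    """
    if vad >= vad_threshold:
        return list(noise_prev)
    return [(1.0 - alpha) * n + alpha * s for n, s in zip(noise_prev, input_amp)]

updated = update_noise_spectrum([1.0, 2.0], [2.0, 2.0], vad=0.1)
kept = update_noise_spectrum([1.0, 2.0], [2.0, 2.0], vad=0.9)
```

A small `alpha` makes the estimate track slow changes in the background noise while remaining robust against occasional misjudged frames.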
- the noise spectrum update unit 10 outputs the obtained estimated noise spectrum 17 to the noise spectrum suppression unit 7, the speech / noise determination unit 9, the processing component calculation unit 14, and the amplitude smoothing unit 12.
- the estimated noise spectrum 17 output to the voice / noise determination unit 9 is applied in the voice-likeness evaluation of the next frame.
- various modifications and improvements are possible, such as applying a plurality of update speed coefficients according to the value of the speech likelihood signal VAD, applying an update speed coefficient that increases the update rate depending on the behavior between frames, or replacing (resetting) the estimated noise spectrum 17 with the input signal spectrum 16 of the frame that has the smallest power or the smallest speech likelihood signal VAD within a certain time.
- in some cases the noise spectrum update unit 10 may leave the estimated noise spectrum 17 without updating it.
- the signal transformation unit 13 generates a processed spectrum 19 using the noise suppression spectrum 18 output from the noise spectrum suppression unit 7 and the estimated noise spectrum 17 output from the noise spectrum estimation unit 8.
- the processing component calculation unit 14 obtains, for each frequency, a value (the modified estimated noise spectrum 17a described later) by multiplying the amplitude value of the estimated noise spectrum 17 by a predetermined value, deforms the noise suppression spectrum 18 so that it has the same amplitude value as the obtained value, and outputs the result to the phase disturbance unit 15 as a modified noise suppression spectrum 18a.
- as the predetermined value by which the estimated noise spectrum 17 is multiplied, a value near the maximum suppression amount in the noise suppression processing, for example, is suitable.
- the predetermined value may be set to about 0.25 to 0.2 and may be adjusted in advance according to the type of noise, the noise suppression method, the degree of deterioration, or the user's preference. It is also possible to hold a plurality of values in a memory or the like so that the processing component calculation unit 14 can switch to a suitable value according to the type of noise and the noise power.
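The amplitude deformation performed by the processing component calculation unit 14 can be illustrated per frequency bin as below. This is a minimal sketch; the function name and the default `scale` of 0.25 (taken from the example range in the text) are illustrative choices.

```python
import cmath

def modified_noise_suppression_spectrum(suppressed, noise_amp, scale=0.25):
    """Give each bin the amplitude scale * |N(k)| while keeping the phase of
    the noise suppression spectrum (the role of spectrum 18a in the text)."""
    out = []
    for s, n in zip(suppressed, noise_amp):
        out.append(cmath.rect(scale * n, cmath.phase(s)))
    return out

# A bin 3+4j (amplitude 5) with estimated noise amplitude 2 becomes amplitude 0.5.
modified = modified_noise_suppression_spectrum([3 + 4j], [2.0])
```

Because only the amplitude is replaced, the result points in the same direction as the original bin, which corresponds to vector 104 lying along vector 101 in the vector diagrams.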
- the phase disturbance unit 15 performs phase disturbance as a kind of smoothing.
- the phase disturbance unit 15 gives a disturbance to the phase component for each frequency with respect to the modified noise suppression spectrum 18 a calculated by the processing component calculation unit 14, and outputs the spectrum after the disturbance as a processing spectrum 19 to the signal addition unit 11.
- as a method of phase disturbance, for example, a phase angle within a predetermined range may be generated using random numbers and added to the original phase angle.
- alternatively, the phase disturbance unit 15 may replace each phase component with a value generated from a random number.
- the phase disturbance unit 15 can adaptively control the phase angle generation range according to the noise power or the S/N ratio, for example by widening the range when the S/N ratio becomes low.
- the phase disturbance unit 15 may weight the limitation of the disturbance range in the frequency axis direction, such as increasing the disturbance range as the frequency increases, or stopping the phase disturbance in the low frequency range.
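The first phase disturbance method described above (adding a random angle within a predetermined range) can be sketched as follows. The range of plus or minus pi/4, the uniform distribution, and the function name are assumptions; the text only says the angle is generated with random numbers within a predetermined range.

```python
import cmath
import math
import random

def disturb_phase(spectrum, max_angle=math.pi / 4, rng=None):
    """Rotate each bin by a random angle in [-max_angle, +max_angle].

    The amplitude of every bin is unchanged; only the phase is perturbed.
    """
    if rng is None:
        rng = random.Random()
    out = []
    for s in spectrum:
        delta = rng.uniform(-max_angle, max_angle)
        out.append(s * cmath.exp(1j * delta))
    return out

bins = [0.3 + 0.4j, 1.0 + 0.0j, -0.5 + 0.5j]
processed = disturb_phase(bins, rng=random.Random(0))
```

A frequency-dependent range, as mentioned in the text, could be obtained by passing a per-bin `max_angle` instead of a single value.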
- the signal addition unit 11 adds the processed spectrum 19 to the noise suppression spectrum 18 to suppress the degradation component included in the noise suppression spectrum 18 and outputs the obtained addition spectrum 20 to the amplitude smoothing unit 12.
- FIG. 2 is an operation explanatory diagram showing a series of processing contents in the signal transformation unit 13 and the signal addition unit 11, and represents an amplitude spectrum and a phase spectrum of a certain frequency as vectors.
- FIG. 2A illustrates an example of the relationship between the noise suppression spectrum 18 and the estimated noise spectrum 17.
- FIG. 2A shows the vector 101 of the noise suppression spectrum 18, the vector 102 of the estimated noise spectrum 17, the scalar value 103 obtained by multiplying the amplitude of the estimated noise spectrum 17 by a predetermined value, and the vector 104 of the modified noise suppression spectrum 18a obtained by modifying the vector 101 so that it has the same amplitude value as the scalar value 103.
- FIG. 2B illustrates an example of the relationship between the noise suppression spectrum 18, the processed spectrum 19, and the addition spectrum 20.
- FIG. 2B shows the vector 105 of the processed spectrum 19, obtained by phase disturbance of the modified noise suppression spectrum 18a, and the vector 106 of the addition spectrum 20.
- θ is the phase angle used for phase disturbance of the vector 104.
- a phase disturbance range (existing range of the processed spectrum 19) A is indicated by a dotted circle.
- FIG. 3 is a graph illustrating a series of processes of the signal transformation unit 13 and the signal addition unit 11 by giving a more specific example, and shows a spectrum in a typical case.
- the vertical axis represents the power of the amplitude spectrum
- the horizontal axis represents the frequency.
- a dotted line represents the estimated noise spectrum 17 and a modified estimated noise spectrum 17a obtained by multiplying the estimated noise spectrum 17 by a predetermined positive value smaller than 1, and a solid line represents the noise suppression spectrum 18 and the smoothed noise suppression spectrum 21.
- a region B indicated by an alternate long and short dash line is an example in which the amplitude value of the modified estimated noise spectrum 17a is close to the amplitude value of the noise suppression spectrum 18, and a region C is an example in which the amplitude value of the modified estimated noise spectrum 17a is small relative to the amplitude value of the noise suppression spectrum 18.
- the modified estimated noise spectrum 17a in FIG. 3 corresponds to the scalar value 103 obtained by multiplying the amplitude of the estimated noise spectrum 17 of FIG. 2 by a predetermined value.
- FIG. 4 is an operation explanatory diagram showing a series of processing contents of the signal transformation unit 13 and the signal addition unit 11 for the regions B and C in FIG. 3. FIG. 4A expresses the amplitude spectrum and phase spectrum of a frequency in region B of FIG. 3 as vectors, and FIG. 4B does the same for a frequency in region C of FIG. 3.
- FIG. 4 the same components as those in FIG.
- in region B of FIG. 3, the amplitude of the noise suppression spectrum 18 is close to that of the modified estimated noise spectrum 17a. Since the predetermined value by which the estimated noise spectrum 17 is multiplied is set in the vicinity of the maximum suppression amount, the spectrum component of the noise suppression spectrum 18 there can be considered to have been suppressed by an amount close to the maximum suppression amount. In other words, this spectral component represents noise.
- in region B of FIG. 3 there is therefore a high possibility that noise that could not be completely suppressed in the noise suppression process remains in the noise suppression spectrum 18 (in particular at higher frequencies).
- consequently the residual noise D, which is a degradation component in the noise suppression spectrum 18, is processed more strongly by the processed spectrum 19.
- in region C of FIG. 3, by contrast, the spectrum component of the noise suppression spectrum 18 may be speech.
- there the noise suppression spectrum 18 is dominant, so the influence of the signal processing by the processed spectrum 19 is small and there is almost no audible influence.
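The difference between regions B and C can be checked numerically with a small sketch. The amplitudes, the scale of 0.25, and the fixed disturbance angle are illustrative values, and `relative_change` is a hypothetical helper that measures how far the addition moves a bin relative to its original amplitude.

```python
import cmath
import math

def relative_change(suppressed_bin, noise_amp, scale=0.25, theta=math.pi / 3):
    """Add the processed component (amplitude scale * noise_amp, phase shifted
    by theta) to one bin and return |added - original| / |original|."""
    processed = cmath.rect(scale * noise_amp, cmath.phase(suppressed_bin) + theta)
    added = suppressed_bin + processed
    return abs(added - suppressed_bin) / abs(suppressed_bin)

# Region B: residual noise, amplitude close to scale * noise amplitude.
change_b = relative_change(cmath.rect(0.3, 0.5), noise_amp=1.0)
# Region C: speech-dominated bin, amplitude far above scale * noise amplitude.
change_c = relative_change(cmath.rect(5.0, 0.5), noise_amp=1.0)
```

In this example the bin in region B moves by roughly 83 percent of its amplitude while the bin in region C moves by about 5 percent, which mirrors the behavior described for FIG. 4A and FIG. 4B.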
- the amplitude smoothing unit 12 illustrated in FIG. 1 performs a smoothing process on the amplitude component of the spectrum for each frequency of the addition spectrum 20 input from the signal addition unit 11, and outputs the smoothed spectrum to the frequency/time conversion unit 5 as the smoothed noise suppression spectrum 21.
- as the smoothing process, smoothing in the frequency axis direction, smoothing in the time axis direction (interframe smoothing), or a combination of both can be used.
- the amplitude smoothing unit 12 can perform smoothing in both the frequency axis and time axis directions as shown in the following equations (5) and (6), for example.
- $X(n, 0) = S_{ADD}(n, 0)$
- $X(n, k) = (1 - \beta(k)) \cdot S_{ADD}(n, k-1) + \beta(k) \cdot S_{ADD}(n, k)$, $k = 1, \ldots, M$ (5)
- $Y(n, k) = (1 - \gamma(k)) \cdot Y(n-1, k) + \gamma(k) \cdot X(n, k)$, $k = 0, \ldots, M$ (6)
- equation (5) indicates the smoothing process in the frequency axis direction
- equation (6) indicates the smoothing in the time axis direction
- n is the frame number
- k is the spectrum component number
- $S_{ADD}(n, k)$ is the addition spectrum 20
- $X(n, k)$ is the addition spectrum after smoothing in the frequency axis direction
- $Y(n, k)$ is the addition spectrum after smoothing in both the frequency axis and the time axis, that is, the smoothed noise suppression spectrum 21.
- $\beta(k)$ and $\gamma(k)$ are the smoothing coefficients in the frequency axis direction and the time axis direction, respectively, and are predetermined values from 0 to 1.
- the optimum values of the smoothing coefficients $\beta(k)$ and $\gamma(k)$ differ depending on the frame length and the degree of degraded sound to be eliminated, but in the configuration of the present embodiment values of about 0.95 and about 0.2 to 0.4, respectively, are preferred. Depending on the type of noise, it is also better to weight the smoothing coefficient in the frequency direction. For example, in the case of automobile running noise, in which power is unevenly distributed in the low range, an adjustment that strengthens the smoothing of the low range can be made.
- for such a band, the smoothing in the frequency direction can be strengthened and, conversely, the smoothing in the time axis direction can be weakened, so that the smoothing effect is enhanced by specializing it for the noise type.
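Equations (5) and (6), as they read in this text, can be sketched on amplitude values as follows. Scalar coefficients are used for brevity where the text allows per-frequency values, and the names `smooth_frequency`, `smooth_time`, `freq_coef`, and `time_coef` are hypothetical.

```python
def smooth_frequency(s_add, freq_coef):
    """Equation (5): X(n,0) = S_ADD(n,0);
    X(n,k) = (1 - c) * S_ADD(n,k-1) + c * S_ADD(n,k) for k >= 1."""
    x = [s_add[0]]
    for k in range(1, len(s_add)):
        x.append((1.0 - freq_coef) * s_add[k - 1] + freq_coef * s_add[k])
    return x

def smooth_time(x, y_prev, time_coef):
    """Equation (6): Y(n,k) = (1 - c) * Y(n-1,k) + c * X(n,k)."""
    return [(1.0 - time_coef) * yp + time_coef * xk for yp, xk in zip(y_prev, x)]

# Small worked example with round coefficients (not the preferred values).
x = smooth_frequency([1.0, 3.0, 5.0], freq_coef=0.5)
y = smooth_time(x, y_prev=[0.0, 0.0, 0.0], time_coef=0.25)
```

A frequency coefficient near 1 (such as the 0.95 mentioned above) leaves each bin almost unchanged, whereas a time coefficient of 0.2 to 0.4 gives strong interframe smoothing because most of the weight goes to the previous frame.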
- the amplitude smoothing unit 12 can also switch the smoothing method or change the smoothing coefficients according to, for example, the input signal spectrum 16 and the estimated noise spectrum 17.
- for example, the amplitude smoothing unit 12 uses the SN ratio for each frequency of the input signal spectrum 16 and the estimated noise spectrum 17 (the spectral SN ratio, where the input signal spectrum 16 is S and the estimated noise spectrum 17 is N): when the spectral SN ratio is less than 0.75 dB, smoothing in both the frequency axis direction and the time axis direction is performed; when it is 0.75 dB or more and less than 1.5 dB, smoothing in the time axis direction is performed; and when it is 1.5 dB or more, the smoothing process is stopped. With this control the quality of the output signal 6 is good.
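The threshold-based control above can be sketched as a per-bin mode selection. The amplitude-ratio definition of the spectral SN ratio in dB and the reading of the middle range as time-axis smoothing only are assumptions of this sketch; the function names and mode labels are hypothetical.

```python
import math

def spectral_snr_db(s_amp, n_amp):
    """Per-bin S/N ratio in dB from amplitudes (assumed definition)."""
    if s_amp <= 0.0 or n_amp <= 0.0:
        return 0.0
    return 20.0 * math.log10(s_amp / n_amp)

def smoothing_mode(snr_db):
    """Select the smoothing applied to one bin from its spectral S/N ratio."""
    if snr_db < 0.75:
        return "frequency+time"
    if snr_db < 1.5:
        return "time"
    return "none"

modes = [smoothing_mode(spectral_snr_db(s, 1.0)) for s in (1.0, 1.12, 2.0)]
```

Bins dominated by noise receive the full smoothing, while bins that clearly contain signal are left untouched.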
- the amplitude smoothing unit 12 may use a noise suppression spectrum 18 instead of the input signal spectrum 16. Since the ratio between the noise suppression spectrum 18 and the estimated noise spectrum 17 can be a good indicator of residual noise as described above with reference to FIG. 3, the amplitude smoothing unit 12 operates the smoothing process more efficiently. Can improve the subjective quality.
- the amplitude smoothing unit 12 may also superimpose pseudo-noise on the spectrum components after the smoothing process, to the extent that the audio signal is not affected (for example, 1 dB amplitude). Examples of such pseudo-noise are noise having brown spectrum characteristics, brown noise, white noise, or noise given the frequency characteristics (slope, etc.) of the noise spectrum in the input signal.
- the noise suppression apparatus 100 includes: a time/frequency conversion unit 2 that converts the input signal 1 into an input signal spectrum 16 that is a frequency component; a noise spectrum estimation unit 8 that estimates an estimated noise spectrum 17 from the input signal 1; a noise spectrum suppression unit 7 that performs noise suppression of the input signal spectrum 16 based on the estimated noise spectrum 17 and generates the noise suppression spectrum 18; a signal transformation unit 13 that deforms the noise suppression spectrum 18 according to a ratio based on the noise suppression spectrum 18 and the estimated noise spectrum 17 and generates the processed spectrum 19 by smoothing (phase disturbance); and a signal addition unit 11 that adds the processed spectrum 19 to the noise suppression spectrum 18 and suppresses the degradation component included in the noise suppression spectrum 18.
- when the signal processing unit 4 performs predetermined processing on the noise suppression spectrum 18 that has deteriorated due to noise suppression processing or the like, it obtains, based on the values of the frequency components of the noise suppression spectrum 18 and of the estimated noise spectrum 17, a processed spectrum 19 that is a smoothing component making the degradation component included in the noise suppression spectrum 18 subjectively unnoticeable, and adds it to the frequency components of the noise suppression spectrum 18 to suppress the degradation component. As a result, the voice/noise section determination that was necessary in the conventional method is not required, and subjective quality can be improved without generating an echo feeling or noise feeling due to a section determination error.
- the signal processing unit 4 generates and applies processing components finely for each spectral component in the frequency domain. For this reason, even in the case of an audio signal mixed with automobile running noise, whose noise power is concentrated in the low frequency range, the processing can subjectively improve the deterioration of the low-frequency noise while leaving the high-frequency audio components unprocessed, so the subjective quality can be further improved.
- the signal processing unit 4 generates a processing component for each spectrum component based on both the noise suppression spectrum 18 and the estimated noise spectrum 17 which are input signals. Therefore, processing control according to each spectrum component becomes possible, and for example, there is an effect that subjective quality can be improved even for a signal in which a degradation component is locally generated in a certain band.
- the amplitude spectrum component is smoothed and the phase spectrum component is disturbed. Therefore, it is possible to satisfactorily suppress the unstable behavior and disturbance of the artificial amplitude and phase components of the degradation component, and to further improve the subjective quality.
- in the configuration described above, the noise suppression device 100 includes the phase disturbance unit 15 and the amplitude smoothing unit 12, and the processing on the noise suppression spectrum 18 is performed by both of them; however, a configuration in which only one of the processes is performed, for example only the phase disturbance process, may also be used.
- the speech / noise determination unit 9 and the noise spectrum update unit 10 are used for estimating the estimated noise spectrum 17, but the means for obtaining the noise spectrum is not limited to this configuration.
- for example, the speech/noise determination unit 9 may be omitted by making the update rate of the noise spectrum very slow, or the estimated noise spectrum 17 may be obtained not by estimation from the input signal 1 but by an analysis and estimation method separate from the input signal.
- FIG. 5 shows the overall configuration of the noise suppression device 100 according to the present embodiment, in which a signal subtracting unit 22 is added to the noise suppression device 100 of the first embodiment.
- the same or equivalent components as those of the first embodiment (FIG. 1) described above are denoted by the same reference numerals, and description thereof is omitted.
- the processing component calculation unit 14 obtains a value (the modified estimated noise spectrum) by multiplying the amplitude value of the estimated noise spectrum 17 by a predetermined value, deforms the noise suppression spectrum 18 for each frequency component so that it has the same amplitude value as that value, and outputs the result as a modified noise suppression spectrum 18a to both the phase disturbance unit 15 and the signal subtraction unit 22.
- as in Embodiment 1, the predetermined value by which the estimated noise spectrum 17 is multiplied may be adjusted in advance according to the type of noise, the noise suppression method, the degree of deteriorated sound, or the user's preference.
- the signal subtraction unit 22 performs a subtraction process for subtracting the modified noise suppression spectrum 18 a from the noise suppression spectrum 18 output by the noise spectrum suppression unit 7, and outputs the obtained spectrum component to the signal addition unit 11.
- FIG. 6 is an operation explanatory diagram showing a series of processing contents in the signal deforming unit 13, the signal subtracting unit 22, and the signal adding unit 11, and represents an amplitude spectrum and a phase spectrum of a certain frequency as vectors.
- In FIG. 6, the same or corresponding parts as in FIG. 2 are denoted by the same reference numerals. FIG. 6A shows an example of the relationship between the noise suppression spectrum 18 and the estimated noise spectrum 17, as in FIG. 2A, using the vector 101 of the noise suppression spectrum 18 and the vector of the estimated noise spectrum 17.
- FIG. 6B shows an example of the relationship between the noise suppression spectrum, the processed spectrum obtained in FIG. 6A, and the addition spectrum, as in FIG. 2B. It is expressed by a vector 101 of the noise suppression spectrum 18, a vector 104 of the modified noise suppression spectrum 18a, a vector 105 of the processed spectrum 19, a vector 107 of the spectrum obtained by subtracting the modified noise suppression spectrum 18a from the noise suppression spectrum 18, and a vector 108 of the addition spectrum 20.
- FIG. 6 differs from FIG. 2 in that the vector 104 of the modified noise suppression spectrum 18a is subtracted before the vector 105 of the processed spectrum 19 is added to the vector 101 of the noise suppression spectrum 18. Therefore, there is an advantage that the amplitude of the noise suppression spectrum 18 does not increase even if the signal addition unit 11 performs the process of adding the processed spectrum 19 to suppress the degradation component.
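The vector operations of FIG. 6 can be sketched for a single frequency bin as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the function names are hypothetical, and the phase disturbance is assumed to be a uniform random rotation within a given width, which this part of the text does not specify.

```python
import cmath
import random


def modify_to_noise_amplitude(s, noise_amp, k):
    """Vector 104: keep the phase of bin s, set its amplitude to k * noise_amp."""
    return cmath.rect(k * noise_amp, cmath.phase(s))


def disturb_phase(z, width, rng):
    """Vector 105: rotate z by a random angle in [-width, +width] radians (assumed method)."""
    return z * cmath.exp(1j * rng.uniform(-width, width))


def process_bin(s, noise_amp, k, width, rng):
    """Subtract the modified spectrum, then add its phase-disturbed version."""
    modified = modify_to_noise_amplitude(s, noise_amp, k)   # vector 104
    processed = disturb_phase(modified, width, rng)          # vector 105
    remainder = s - modified                                 # vector 107
    return remainder + processed                             # vector 108


# Usage: one bin of the noise suppression spectrum (vector 101).
rng = random.Random(0)
out = process_bin(3 + 4j, noise_amp=2.0, k=0.5, width=cmath.pi, rng=rng)
```

In this sketch the modified vector shares the phase of the input bin, so whenever `k * noise_amp` does not exceed the input amplitude, the output amplitude cannot exceed the input amplitude, which matches the advantage stated for FIG. 6.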
- the amplitude smoothing unit 12 performs an amplitude smoothing process on the added spectrum 20 as in the first embodiment.
- The amplitude smoothing unit 12 may also superimpose pseudo noise on the spectrum components after the smoothing process, at a level that does not affect the audio signal (for example, an amplitude of 1 dB). Examples of such pseudo noise are noise having Hoth spectrum characteristics, brown noise, white noise, and noise given the frequency characteristics (slope, etc.) of the noise spectrum in the input signal.
- As described above, according to Embodiment 2, the noise suppression apparatus 100 includes: a signal modification unit 13 that generates the modified noise suppression spectrum 18a by modifying the noise suppression spectrum 18 according to a ratio based on the noise suppression spectrum 18 and the estimated noise spectrum 17, and generates the processed spectrum 19 by smoothing (phase-disturbing) the modified noise suppression spectrum 18a; a signal subtraction unit 22 that subtracts the modified noise suppression spectrum 18a from the noise suppression spectrum 18; and a signal addition unit 11 that adds the processed spectrum 19 to the noise suppression spectrum 18 from which the modified noise suppression spectrum 18a has been subtracted, thereby suppressing the degradation component included in the noise suppression spectrum 18.
- Because the signal processing unit 4 subtracts the modified noise suppression spectrum 18a from the noise suppression spectrum 18 and adds the processed spectrum 19 to it, the subjective quality can be further improved while suppressing an increase in noise in the output signal 6, in addition to the effects described in the first embodiment.
- In Embodiment 2, the subtraction by the signal subtraction unit 22 is performed before the addition by the signal addition unit 11, but the order may be reversed. The same effect is obtained by first adding the processed spectrum 19 to the noise suppression spectrum 18 and then subtracting the modified noise suppression spectrum 18a.
- In Embodiment 2, the noise suppression apparatus 100 includes the amplitude smoothing unit 12, but the amplitude smoothing unit 12 and the amplitude smoothing process may be omitted.
- The speech / noise determination unit 9 and the noise spectrum update unit 10 are used to estimate the estimated noise spectrum 17, but, as in the first embodiment, the configuration for obtaining the noise spectrum is not limited to this. For example, the speech / noise determination unit 9 may be omitted by making the update rate of the noise spectrum very slow, or the estimated noise spectrum 17 may be analyzed and estimated separately from an input signal for noise estimation instead of being estimated from the input signal 1.
- Embodiment 3. In the first and second embodiments, the processing component calculation unit 14 in the signal transformation unit 13 uses a value near the maximum suppression amount of the noise suppression processing as the predetermined value by which each frequency of the estimated noise spectrum 17 is multiplied. In the present embodiment, this predetermined value is weighted in the frequency axis direction, for example with a large value at low frequencies and a small value at high frequencies.
- The configuration of the noise suppression apparatus of the present embodiment is the same as that of the noise suppression apparatus 100 of the first embodiment shown in FIG. 1 or the second embodiment shown in FIG. 5; only the processing of the processing component calculation unit 14 differs.
- The processing component calculation unit 14 selects the weighting coefficient used for frequency weighting from, for example, one or more tables (constant arrays when written in a program) according to the type of noise or the user's preference. Alternatively, a function may be defined in advance that takes as input the noise power, or a spectral tilt amount that can be calculated from the ratio of the low frequency component power to the high frequency component power of the estimated noise spectrum 17, and outputs a weighting coefficient. A weighting coefficient may then be generated from the function for each frame and applied sequentially.
- the processing component calculation unit 14 weights the predetermined value for multiplication for each frequency of the estimated noise spectrum 17 in the frequency direction. Therefore, in addition to the effects described in the first and second embodiments, there is an effect that the subjective quality can be improved even for a signal having a different degree of deterioration in the frequency direction.
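The function-based variant of the frequency weighting can be sketched as below. Everything specific here is an assumption made for illustration: the tilt measure in decibels, the linear weight profile, the 40 dB scaling, and the 0.5 limit are not taken from the text, which only requires larger values at low frequencies and smaller values at high frequencies.

```python
import math


def spectral_tilt_db(noise_power, split):
    """Ratio of low-band to high-band noise power in dB; `split` is the first high-band bin."""
    low = sum(noise_power[:split])
    high = sum(noise_power[split:])
    return 10.0 * math.log10(low / high)


def weights_from_tilt(tilt_db, n_bins):
    """Hypothetical weighting: 1 + g at the lowest bin, falling linearly to 1 - g at the highest."""
    g = min(0.5, max(0.0, tilt_db / 40.0))   # steeper tilt gives stronger weighting
    return [1.0 + g * (1.0 - 2.0 * i / (n_bins - 1)) for i in range(n_bins)]


def weighted_targets(noise_amp, k, weights):
    """Per-bin target amplitude: predetermined value k, weighted, times the noise amplitude."""
    return [k * w * n for w, n in zip(weights, noise_amp)]


# Usage: recompute the weights for each frame from the current noise estimate.
noise_power = [100.0, 100.0, 1.0, 1.0]
weights = weights_from_tilt(spectral_tilt_db(noise_power, split=2), len(noise_power))
targets = weighted_targets([10.0, 10.0, 1.0, 1.0], k=0.3, weights=weights)
```

A table-based variant would replace `weights_from_tilt` with a lookup into constant arrays chosen by noise type or user preference.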
- Embodiment 4. FIG. 7 shows the overall configuration of the noise suppression apparatus 100 according to the present embodiment. In place of the noise spectrum suppression unit 7 of the first embodiment, this configuration includes a noise suppression filter unit 23 and a time / frequency conversion unit 24.
- the same or equivalent components as those of the first embodiment (FIG. 1) described above are denoted by the same reference numerals, and description thereof is omitted.
- the noise suppression filter unit 23 shown in FIG. 7 receives the input signal 1 and performs noise suppression processing in the time domain. Specifically, the noise suppression filter unit 23 performs noise suppression processing corresponding to time axis processing such as a Kalman filter on the input signal 1 and outputs the noise suppression signal to the time / frequency conversion unit 24.
- The time / frequency conversion unit 24 converts the noise suppression signal output from the noise suppression filter unit 23 into a frequency domain signal. Specifically, it performs an FFT of the noise suppression signal and outputs the obtained spectrum components as the noise suppression spectrum 18 to the signal addition unit 11 and the processing component calculation unit 14.
- The number of FFT points in the time / frequency conversion unit 24 is preferably the same as that in the time / frequency conversion unit 2 described above. However, the two need not be the same as long as the time / frequency conversion unit 24 outputs the noise suppression spectrum 18 with the same number of spectrum components as the time / frequency conversion unit 2, for example by thinning out, averaging, or interpolating the spectrum components before output.
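The three ways of matching the number of spectrum components mentioned here (thinning out, averaging, and interpolation) can be sketched as follows. This is a simplified illustration with hypothetical function names; it assumes an integer ratio between the two resolutions and ignores the exact alignment of the DC and Nyquist bins.

```python
def thin_bins(spec, factor):
    """Thinning: keep every `factor`-th spectrum component."""
    return spec[::factor]


def average_bins(spec, factor):
    """Averaging: replace each group of `factor` adjacent components by its mean."""
    return [sum(spec[i:i + factor]) / factor for i in range(0, len(spec), factor)]


def interpolate_bins(spec, factor):
    """Interpolation: insert linearly interpolated components between neighbours."""
    out = []
    for a, b in zip(spec, spec[1:]):
        for j in range(factor):
            out.append(a + (b - a) * j / factor)
    out.append(spec[-1])
    return out


# Usage on amplitude values (one number per frequency bin).
coarse = average_bins([1.0, 2.0, 3.0, 4.0], 2)
fine = interpolate_bins([0.0, 2.0, 4.0], 2)
```

Thinning and averaging reduce the resolution of a longer FFT, while interpolation raises the resolution of a shorter one.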
- As described above, according to Embodiment 4, the subjective quality of the processed signal can be improved regardless of whether the noise suppression processing is performed in the frequency domain or in the time domain.
- The configuration of Embodiment 4 can also be applied easily to Embodiments 2 and 3. In those cases as well, the subjective quality of the processed signal can be improved regardless of whether the noise suppression processing is performed in the frequency domain or in the time domain.
- Embodiment 5. FIG. 8 shows the overall configuration of the speech decoding apparatus 200 according to the present embodiment.
- The speech decoding apparatus 200 receives code data 25 instead of the input signal 1 and additionally includes a speech decoding unit 26 that decodes the code data 25.
- In FIG. 8, parts that are the same as or correspond to those in the earlier figures are denoted by the same reference numerals, and description thereof is omitted.
- the code data 25 is input to the speech decoding unit 26 in the speech decoding apparatus 200 via, for example, a not-shown wired or wireless communication path, or storage means such as a memory. Note that the code data 25 is a result of separately encoding a speech acoustic signal by a speech encoding unit (not shown).
- The speech decoding unit 26 performs, on the code data 25, a predetermined decoding process corresponding to the encoding process of the speech encoding unit, and outputs the decoded signal 27 to the time / frequency conversion unit 2 and the speech / noise determination unit 9.
- The time / frequency conversion unit 2 performs frame division and windowing on the decoded signal 27, instead of the input signal 1, in the same manner as in the first embodiment, and applies, for example, an FFT to the windowed signal. It then outputs the decoded signal spectrum 28, which consists of the spectrum components for each frequency, to the signal processing unit 4 and the noise spectrum estimation unit 8.
- the speech / noise determination unit 9 calculates the speech likelihood signal of the current frame using the input decoded signal 27 and decoded signal spectrum 28. Subsequently, the noise spectrum updating unit 10 estimates an average noise spectrum in the decoded signal spectrum 28 and outputs it as an estimated noise spectrum 17.
- the configuration and processing in the noise spectrum estimation unit 8 can be the same as in the first embodiment.
- the signal transformation unit 13 in the signal processing unit 4 generates a processed spectrum 19 using the decoded signal spectrum 28 and the estimated noise spectrum 17 output from the noise spectrum estimation unit 8.
- For each frequency component of the estimated noise spectrum 17, the processing component calculation unit 14 obtains a value by multiplying the amplitude value by a predetermined value. It then modifies the decoded signal spectrum 28 for each frequency component so that it has the same amplitude value as the obtained value, and outputs the result to the phase disturbance unit 15 as the modified decoded signal spectrum 28a.
- Here, the predetermined value by which the estimated noise spectrum 17 is multiplied is not a value near the maximum suppression amount but, for example, 1 or a value slightly smaller than 1. The processing component calculation unit 14 can also switch to a suitable value according to the type of speech encoding method.
- The phase disturbance unit 15 disturbs the phase component of each frequency of the modified decoded signal spectrum 28a calculated by the processing component calculation unit 14, and outputs the disturbed spectrum as the processed spectrum 19 to the signal addition unit 11.
- the same method as in the first embodiment can be used as a method for giving disturbance to each phase component and a method for controlling the phase disturbance range.
- the signal addition unit 11 adds the processed spectrum 19 to the decoded signal spectrum 28 and outputs the obtained addition spectrum 20 to the amplitude smoothing unit 12.
- The amplitude smoothing unit 12 smooths the amplitude component of each frequency of the addition spectrum 20 input from the signal addition unit 11, and outputs the smoothed spectrum as the smoothed decoded signal spectrum 29 to the frequency / time conversion unit 5.
- The configuration, processing, and smoothing control method of the amplitude smoothing unit 12 can be the same as in the first embodiment. Each parameter may be adjusted in advance according to, for example, the speech encoding method or the degree of degradation of the decoded signal 27.
- The amplitude smoothing unit 12 may also superimpose artificially generated pseudo noise on the spectrum components after the smoothing process, at a level that does not affect the audio signal (for example, an amplitude of 1 dB). Examples are noise having Hoth spectrum characteristics, brown noise, white noise, and noise given the frequency characteristics (slope, etc.) of the noise spectrum in the input signal.
- The frequency / time conversion unit 5 applies an inverse FFT to the smoothed decoded signal spectrum 29 input from the signal processing unit 4 to return it to a time domain signal. It then joins the frames while multiplying them by a window for smooth connection with the preceding and following frames, and outputs the obtained signal as the output signal 6.
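The frame connection step can be sketched as a windowed overlap-add. This is a minimal illustration under assumptions not stated in the text: the frames are taken to be time-domain frames that have already been through the inverse FFT, the window is a periodic Hann window, and consecutive frames overlap by 50%. The function names are hypothetical.

```python
import math


def periodic_hann(n):
    """Periodic Hann window; shifted copies at 50% overlap sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def overlap_add(frames, hop):
    """Multiply each time-domain frame by the window and add it at its frame position."""
    n = len(frames[0])
    window = periodic_hann(n)
    out = [0.0] * (hop * (len(frames) - 1) + n)
    for f_idx, frame in enumerate(frames):
        start = f_idx * hop
        for i, x in enumerate(frame):
            out[start + i] += window[i] * x
    return out


# Usage: three frames of length 8 with a hop of 4 samples (50% overlap).
frames = [[1.0] * 8 for _ in range(3)]
signal = overlap_add(frames, hop=4)
```

In the fully overlapped region of this example the output equals the input level, so frame boundaries are joined without a discontinuity; the first and last half frames are only partially covered.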
- As described above, according to Embodiment 5, the speech decoding apparatus 200 includes: a speech decoding unit 26 that decodes predetermined code data 25 to generate a decoded signal 27; a time / frequency conversion unit 2 that converts the decoded signal 27 into a decoded signal spectrum 28 consisting of frequency components; a noise spectrum estimation unit 8 that estimates the estimated noise spectrum 17 from the decoded signal 27; a signal transformation unit 13 that modifies the decoded signal spectrum 28 according to a ratio based on the decoded signal spectrum 28 and the estimated noise spectrum 17 and generates a processed spectrum 19 by smoothing (phase-disturbing) it; and a signal addition unit 11 that adds the processed spectrum 19 to the decoded signal spectrum 28 to suppress the degradation component included in the decoded signal spectrum 28.
- When the signal processing unit 4 applies predetermined processing to the decoded signal spectrum 28 degraded by the speech encoding process, it obtains, based on the frequency component values of the decoded signal spectrum 28 and of the estimated noise spectrum 17, a processed spectrum 19 that acts as a smoothing component making the degradation component included in the decoded signal spectrum 28 subjectively unobtrusive. Adding it to the frequency components of the decoded signal spectrum 28 suppresses the degradation component. The speech / noise section determination required in the conventional method is therefore unnecessary, and the subjective quality can be improved without the echo or noise sensation caused by section determination errors.
- The signal processing unit 4 generates processing components and performs processing finely for each spectrum component in the frequency domain. For example, even for a speech signal mixed with automobile running noise whose power is concentrated in the low frequency range, the degradation of the low frequency noise can be improved subjectively while the high frequency speech components are left unprocessed, so the subjective quality can be further improved.
- the signal processing unit 4 generates a processing component for each spectral component based on both the decoded signal spectrum 28 which is an input signal and the estimated noise spectrum 17. Therefore, processing control according to each spectrum component becomes possible, and for example, there is an effect that subjective quality can be improved even for a signal in which a degradation component is locally generated in a certain band.
- The amplitude spectrum component is smoothed and the phase spectrum component is disturbed. The unstable, artificial behavior of the amplitude and phase components of the degradation component can therefore be suppressed effectively, further improving the subjective quality.
- In Embodiment 5, the decoded signal spectrum 28 is processed by both the phase disturbance unit 15 and the amplitude smoothing unit 12. However, only one of the two processes may be performed; for example, the speech decoding apparatus 200 may include only the phase disturbance unit 15 and perform only the phase disturbance process.
- The speech / noise determination unit 9 and the noise spectrum update unit 10 are used to estimate the estimated noise spectrum 17, but, as in the first embodiment, the configuration for obtaining the noise spectrum is not limited to this. For example, the speech / noise determination unit 9 may be omitted by making the update rate of the noise spectrum very slow, or the estimated noise spectrum 17 may be analyzed and estimated separately from a noise-only input signal for noise estimation instead of being estimated from the decoded signal 27.
- Embodiment 6. As in the fifth embodiment, the noise suppression apparatus 100 according to the second embodiment may be modified to form a speech decoding apparatus 200, as shown in the present embodiment.
- FIG. 9 shows the overall configuration of the speech decoding apparatus 200 according to the present embodiment. In FIG. 9, the same or corresponding parts as those in FIG. 5 or FIG. 8 are denoted by the same reference numerals, and description thereof is omitted.
- For each frequency component of the estimated noise spectrum 17, the processing component calculation unit 14 obtains a value by multiplying the amplitude value by a predetermined value. It then modifies the decoded signal spectrum 28 for each frequency component so that it has the same amplitude value as the obtained value, and outputs the result as the modified decoded signal spectrum 28a to the phase disturbance unit 15 and also to the signal subtraction unit 22.
- As in the fifth embodiment, the predetermined value by which the estimated noise spectrum 17 is multiplied may be set to, for example, 1 or a value slightly smaller than 1, or may be adjusted in advance according to the speech encoding method, the degree of degradation of the decoded signal 27, or the user's preference. A plurality of values may also be stored in a memory or the like so that the processing component calculation unit 14 can switch to a suitable value according to the type of speech encoding method.
- the signal subtracting unit 22 performs a subtraction process for subtracting the modified decoded signal spectrum 28 a from the decoded signal spectrum 28 output from the time / frequency converting unit 2, and outputs the obtained spectrum component to the signal adding unit 11.
- the amplitude smoothing unit 12 performs an amplitude smoothing process on the added spectrum 20 as in the fifth embodiment.
- The amplitude smoothing unit 12 may also superimpose artificially generated pseudo noise on the spectrum components after the smoothing process, at a level that does not affect the audio signal (for example, an amplitude of 1 dB). Examples are noise having Hoth spectrum characteristics, brown noise, white noise, and noise given the frequency characteristics (slope, etc.) of the noise spectrum in the input signal.
- As described above, according to Embodiment 6, the speech decoding apparatus 200 includes: a signal transformation unit 13 that generates the modified decoded signal spectrum 28a by modifying the decoded signal spectrum 28 according to a ratio based on the decoded signal spectrum 28 and the estimated noise spectrum 17, and generates the processed spectrum 19 by smoothing (phase-disturbing) the modified decoded signal spectrum 28a; a signal subtraction unit 22 that subtracts the modified decoded signal spectrum 28a from the decoded signal spectrum 28; and a signal addition unit 11 that adds the processed spectrum 19 to the decoded signal spectrum 28 from which the modified decoded signal spectrum 28a has been subtracted, thereby suppressing the degradation component included in the decoded signal spectrum 28.
- Because the signal processing unit 4 subtracts the modified decoded signal spectrum 28a from the decoded signal spectrum 28 and adds the processed spectrum 19 to it, the subjective quality can be further improved while suppressing an increase in noise in the output signal 6, in addition to the effects described in the fifth embodiment.
- In Embodiment 6, the subtraction by the signal subtraction unit 22 is performed before the addition by the signal addition unit 11, but the order may be reversed. The same effect is obtained by first adding the processed spectrum 19 to the decoded signal spectrum 28 and then subtracting the modified decoded signal spectrum 28a.
- In Embodiment 6, the speech decoding apparatus 200 includes the amplitude smoothing unit 12, but the amplitude smoothing unit 12 and the amplitude smoothing process may be omitted.
- The speech / noise determination unit 9 and the noise spectrum update unit 10 are used to estimate the estimated noise spectrum 17, but, as in the first embodiment, the configuration for obtaining the noise spectrum is not limited to this. For example, the speech / noise determination unit 9 may be omitted by making the update rate of the noise spectrum very slow, or the estimated noise spectrum 17 may be analyzed and estimated separately from a noise-only input signal for noise estimation instead of being estimated from the decoded signal 27.
- Embodiment 7. In the fifth and sixth embodiments, the processing component calculation unit 14 in the signal transformation unit 13 uses a value that is constant in the frequency axis direction as the predetermined value by which each frequency of the estimated noise spectrum 17 is multiplied. In the present embodiment, this predetermined value is weighted in the frequency axis direction, for example with a large value at low frequencies and a small value at high frequencies.
- The configuration of the speech decoding apparatus 200 of the present embodiment is the same as that of the speech decoding apparatus 200 of Embodiment 5 shown in FIG. 8 or Embodiment 6 shown in FIG. 9; only the processing of the processing component calculation unit 14 differs.
- The processing component calculation unit 14 selects the weighting coefficient used for frequency weighting from, for example, one or more tables (constant arrays when written in a program) according to the type of speech encoding method or the user's preference.
- the processing component calculation unit 14 weights the predetermined value for multiplication for each frequency of the estimated noise spectrum 17 in the frequency direction. Therefore, in addition to the effects described in the fifth and sixth embodiments, there is an effect that the subjective quality can be improved even for a signal having a different degree of deterioration in the frequency direction.
- Embodiment 8. In the embodiments described above, the signal processing unit 4 generates the processed spectrum 19 according to the ratio based on the estimated noise spectrum 17 and the noise suppression spectrum 18. In the present embodiment, the width of the phase disturbance applied to the noise suppression spectrum 18 is instead controlled according to the ratio based on the estimated noise spectrum 17 and the noise suppression spectrum 18.
- FIG. 10 shows the overall configuration of the noise suppression apparatus 100 according to the present embodiment.
- Unlike the signal processing unit 4 of the first embodiment shown in FIG. 1, the signal processing unit 4 of the noise suppression device 100 shown in FIG. 10 is composed of a phase disturbance unit 30, a phase control unit 31, and an amplitude smoothing unit 12. In FIG. 10, components that are the same as or equivalent to those in FIG. 1 are assigned the same reference numerals, and description thereof is omitted.
- the phase control unit 31 calculates a phase control signal 32 for controlling the width of the phase disturbance according to the calculated spectrum S / N ratio, and outputs the phase control signal 32 to the phase disturbance unit 30.
- One method for controlling the range of the phase disturbance is to increase the range when the spectrum S / N ratio is small and, conversely, to decrease it when the spectrum S / N ratio is large.
- One method for setting the phase control signal 32 that specifies the range of the phase disturbance is to store a plurality of predetermined values corresponding to the spectrum S / N ratio in a table or the like, from which the phase control unit 31 selects the value best suited to the calculated spectrum S / N ratio.
- Alternatively, a function that takes the spectrum S / N ratio as input and outputs the phase control signal 32 may be defined in advance, and the phase control unit 31 may calculate the phase control signal 32 using that function. Whichever method is used, it may be adjusted in advance according to the type of noise, the noise suppression method, the degree of degradation, or the user's preference.
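Both ways of deriving the phase control signal, a table lookup and a predefined function, can be sketched as follows. The sketch rests on assumptions that the text does not fix: the spectrum S / N ratio is taken per bin as the amplitude ratio of the noise suppression spectrum to the estimated noise spectrum in decibels, the control signal is a disturbance half-width in radians, and all table entries and break points are hypothetical.

```python
import math

# Hypothetical table: (spectrum S/N ratio in dB, phase disturbance range in radians).
PHASE_RANGE_TABLE = [(0.0, math.pi), (6.0, math.pi / 2), (12.0, math.pi / 4), (18.0, 0.0)]


def spectral_snr_db(signal_amp, noise_amp, eps=1e-12):
    """Assumed definition: per-bin amplitude ratio in dB."""
    return 20.0 * math.log10(max(signal_amp, eps) / max(noise_amp, eps))


def range_from_table(snr_db):
    """Table method: pick the entry whose S/N value is closest to the calculated one."""
    return min(PHASE_RANGE_TABLE, key=lambda entry: abs(entry[0] - snr_db))[1]


def range_from_function(snr_db, low_db=0.0, high_db=18.0, max_range=math.pi):
    """Function method: wide range at low S/N, shrinking linearly to zero at high S/N."""
    t = min(1.0, max(0.0, (snr_db - low_db) / (high_db - low_db)))
    return max_range * (1.0 - t)


# Usage: one control value per frequency bin.
snr = spectral_snr_db(signal_amp=2.0, noise_amp=1.0)
control = range_from_function(snr)
```

Both variants give a larger disturbance range for noise-dominated bins and a smaller one for speech-dominated bins.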
- The phase control unit 31 may also apply weighting in the frequency axis direction, for example by widening the disturbance range as the frequency increases and stopping the phase disturbance in the low frequency range.
- The phase control unit 31 selects the weighting coefficient used for this frequency weighting from, for example, one or more tables (constant arrays when written in a program) according to the type of noise suppression method or the user's preference. Alternatively, a function may be defined in advance that takes as input the noise power, or a spectral tilt amount that can be calculated from the ratio of the low frequency component power to the high frequency component power of the estimated noise spectrum 17, and outputs a weighting coefficient. A weighting coefficient may then be generated for each frame and applied sequentially.
- The spectrum S / N ratio is used here as the control factor for simplicity, but the control factor is not limited to this. For example, the ratio of the total band power of the noise suppression spectrum 18 to the total band power of the estimated noise spectrum 17, and the spectral tilt amount that can be calculated from the ratio of the low frequency component power to the high frequency component power of the estimated noise spectrum 17, may be used in combination as control factors. In that case, the phase control unit 31 can control the range of the phase disturbance with higher accuracy and further improve the subjective quality.
- the phase disturbance unit 30 performs phase disturbance of the noise suppression spectrum 18 in accordance with the phase control signal 32 for controlling the width of the phase disturbance output from the phase control unit 31 and outputs the phase disturbance spectrum 33.
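A phase disturbance driven by a per-bin control signal can be sketched as follows. The sketch assumes that the phase control signal 32 is a half-width in radians and that the disturbance is a uniform random rotation within that width; the text does not specify either choice. The frequency weighting helper, which stops the disturbance in the low band and widens it toward high frequencies, and all function names are likewise hypothetical.

```python
import cmath
import random


def frequency_weights(n_bins, stop_below):
    """Zero weight below bin `stop_below`, then rising linearly to 1 at the top bin."""
    span = n_bins - stop_below
    return [0.0 if i < stop_below else (i - stop_below + 1) / span for i in range(n_bins)]


def disturb_spectrum(spectrum, ranges, rng):
    """Rotate each bin by a random angle in [-range, +range]; amplitudes are unchanged."""
    return [z * cmath.exp(1j * rng.uniform(-w, w)) for z, w in zip(spectrum, ranges)]


# Usage: combine a base control value with frequency weighting, then disturb.
rng = random.Random(1)
spectrum = [1 + 0j, 0 + 2j, -3 + 0j, 1 - 1j, 2 + 2j]
weights = frequency_weights(len(spectrum), stop_below=2)
ranges = [cmath.pi * w for w in weights]
disturbed = disturb_spectrum(spectrum, ranges, rng)
```

Because each bin is only rotated, this step changes the phase spectrum and leaves the amplitude spectrum to the subsequent amplitude smoothing.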
- the same effect can be obtained by using the configuration of the phase disturbance unit 15 described in the first embodiment shown in FIG. 1 instead of the phase disturbance unit 30.
- the amplitude smoothing unit 12 smoothes the amplitude component of the spectrum for each frequency with respect to the phase disturbance spectrum 33 input from the phase disturbance unit 30, and uses the smoothed spectrum as the smoothed noise suppression spectrum 21. Output to the frequency / time converter 5.
- The configuration, processing, and smoothing control method of the amplitude smoothing unit 12 can be the same as in the first embodiment. Each parameter may be adjusted in advance according to, for example, the type of noise suppression method or the degree of degradation of the signal.
- The amplitude smoothing unit 12 may also superimpose artificially generated pseudo noise on the spectrum components after the smoothing process, at a level that does not affect the audio signal (for example, an amplitude of 1 dB). Examples are noise having Hoth spectrum characteristics, brown noise, white noise, and noise given the frequency characteristics (slope, etc.) of the noise spectrum in the input signal.
- As described above, according to Embodiment 8, when the signal processing unit 4 of the noise suppression apparatus 100 applies predetermined processing to the noise suppression spectrum 18 degraded by noise suppression processing or the like, it performs phase disturbance based on the frequency component values of the input noise suppression spectrum 18 and of the estimated noise spectrum 17 so that the degradation component included in the noise suppression spectrum 18 becomes subjectively unobtrusive. The speech / noise section determination required in the conventional method is therefore unnecessary, and the subjective quality can be improved without the echo or noise sensation caused by section determination errors.
- The signal processing unit 4 performs processing finely for each spectrum component in the frequency domain. For example, even for a speech signal mixed with automobile running noise whose power is concentrated in the low frequency range, the degradation of the low frequency noise can be improved subjectively while the high frequency speech components are left unprocessed, so the subjective quality can be further improved.
- the signal processing unit 4 performs processing for each spectrum component based on both the noise suppression spectrum 18 and the estimated noise spectrum 17 which are input signals. Therefore, processing control according to each spectrum component becomes possible, and for example, there is an effect that subjective quality can be improved even for a signal in which a degradation component is locally generated in a certain band.
- The amplitude spectrum component is smoothed and the phase spectrum component is disturbed. The unstable, artificial behavior of the amplitude and phase components of the degradation component can therefore be suppressed effectively, further improving the subjective quality.
- In Embodiment 8, the noise suppression apparatus 100 includes the amplitude smoothing unit 12, but a configuration that does not include the amplitude smoothing unit 12 and omits the amplitude smoothing process may also be used.
- the speech / noise determining unit 9 and the noise spectrum updating unit 10 are used for estimating the estimated noise spectrum 17, but the means for obtaining the noise spectrum is the same as in the first embodiment.
- the noise / noise determination unit 9 is omitted by making the update rate of the noise spectrum very slow, or the estimated noise spectrum 17 is not estimated from the input signal 1. May be separately analyzed and estimated from the noise estimation input signal.
- Embodiment 9. As in Embodiment 8, the speech decoding apparatus 200 of Embodiment 5 may be modified so that, instead of generating the processed spectrum 19 according to a ratio based on the decoded signal spectrum 28 and the estimated noise spectrum 17, the signal processing unit 4 controls the width of the phase disturbance applied to the decoded signal spectrum 28 according to that ratio.
- FIG. 11 shows the overall configuration of the speech decoding apparatus 200 according to this embodiment.
- Unlike the signal processing unit 4 of Embodiment 5, the signal processing unit 4 of the speech decoding apparatus 200 shown in FIG. 11 comprises a phase disturbance unit 30, a phase control unit 31, and an amplitude smoothing unit 12.
- In FIG. 11, parts that are the same as or correspond to those in the earlier figures are given the same reference numerals.
- The phase control unit 31 calculates a phase control signal 32 that controls the width of the phase disturbance according to the calculated spectral S/N ratio, and outputs the phase control signal 32 to the phase disturbance unit 30.
- The range of the phase disturbance can be controlled, for example, so that it is widened when the spectral SNR is small and narrowed when the spectral SNR is large.
- The phase control signal 32 that specifies the range of the phase disturbance, the control of that range, and the control factor can be set in the same way as in Embodiment 8. They may be adjusted in advance according to the degree of degradation, the user's preference, and similar factors.
- The phase disturbance unit 30 disturbs the phase of the decoded signal spectrum 28 in accordance with the phase control signal 32 output from the phase control unit 31, and outputs the result as the phase disturbance spectrum 33.
- The same effect can be obtained by using the configuration of the phase disturbance unit 15 described in Embodiment 1 (FIG. 1) instead of the phase disturbance unit 30.
- The amplitude smoothing unit 12 smooths, for each frequency, the amplitude component of the phase disturbance spectrum 33 input from the phase disturbance unit 30, and outputs the smoothed spectrum to the frequency/time conversion unit 5 as the smoothed decoded signal spectrum 29.
- The configuration, processing, and smoothing control method of the amplitude smoothing unit 12 can be the same as in Embodiment 5, and each parameter may be adjusted in advance according to, for example, the type of speech coding method or the degree of signal degradation.
- The amplitude smoothing unit 12 may also superimpose artificially generated pseudo noise on the smoothed spectral components, at a level that does not affect the speech signal (for example, an amplitude of 1 dB). Examples are brown noise, white noise, or noise given the frequency characteristics (slope and the like) of the noise spectrum in the input signal.
- In the speech decoding apparatus 200, when the signal processing unit 4 applies its predetermined processing to the decoded signal spectrum 28, which has been degraded by speech coding, it disturbs the phase based on the frequency-component values of the decoded signal spectrum 28 (the input signal) and of the estimated noise spectrum 17, so that the degradation components contained in the decoded signal spectrum 28 become subjectively unobtrusive. This eliminates the speech/noise section determination that conventional methods require. As a result, subjective quality can be improved without the echo-like or noisy artifacts caused by section determination errors.
- The signal processing unit 4 processes each spectral component individually in the frequency domain. For example, for a speech signal mixed with automobile running noise, whose noise power is concentrated at low frequencies, the degradation of the low-frequency noise can be improved subjectively while the high-frequency speech components are left unprocessed. Because each component can be handled separately, subjective quality can be improved further.
- The signal processing unit 4 processes each spectral component based on both the decoded signal spectrum 28 (the input signal) and the estimated noise spectrum 17. Processing can therefore be controlled per spectral component, so subjective quality can be improved even for a signal in which degradation components arise locally in a particular band.
- The amplitude spectrum components are smoothed and the phase spectrum components are disturbed. The unstable behavior and unnatural fluctuation of the amplitude and phase of the degradation components can therefore be suppressed well, further improving subjective quality.
- The speech decoding apparatus 200 includes the amplitude smoothing unit 12.
- However, the amplitude smoothing unit 12 and its smoothing process may be omitted.
- The speech/noise determination unit 9 and the noise spectrum update unit 10 are used to estimate the estimated noise spectrum 17, but, as in Embodiment 1, other means of obtaining the noise spectrum may be used.
- For example, the speech/noise determination unit 9 may be omitted by making the update rate of the noise spectrum very slow, or the estimated noise spectrum 17 may be analyzed and estimated separately from a noise-estimation input signal containing only noise, rather than from the decoded signal 27.
- FIG. 12 shows the overall configuration of speech decoding apparatus 200 according to this embodiment.
- FIG. 12 shows a configuration that includes a noise spectrum suppression unit 7 for the noise suppression processing, but a configuration with a noise suppression filter unit 23 and a time/frequency conversion unit 24 (FIG. 7) in place of the noise spectrum suppression unit 7 may also be used.
- In FIG. 12, parts that are the same as or equivalent to those in FIGS. 1 to 11 are given the same reference numerals and their description is omitted.
- A time-domain noise suppression method using the noise suppression filter unit 23 can be used.
- The decoded signal spectrum 28 is degraded further by the noise suppression processing, on top of the degradation caused by speech coding. The control methods and parameters of the signal modification unit 13 (not shown), the amplitude smoothing unit 12, and the phase control unit 31 in the signal processing unit 4 may be adjusted as appropriate to the degree of degradation.
- Noise suppression has been described as an example of processing connected after the speech decoding unit 26, but it can be replaced with other signal processing, such as post-filtering (formant emphasis, auditory masking, and the like) or amplitude dynamic range compression.
- A signal containing degradation components other than those caused by speech coding can thus be processed into a subjectively preferable signal, improving subjective quality.
- Embodiment 11.
- In Embodiments 1 to 10, the time/frequency conversion unit 2 calculates the spectral components by a fast Fourier transform (FFT), and the frequency/time conversion unit 5 returns the processed spectral components to a time-domain signal by an inverse FFT.
- The same effects as described in Embodiments 1 to 10 can be obtained even in a configuration that does not use the Fourier transform.
- The configuration of the phase disturbance unit 30 (and the phase control unit 31) may be used in place of that of the phase disturbance unit 15, and conversely the configuration of the phase disturbance unit 15 may be used in place of that of the phase disturbance unit 30 (and the phase control unit 31).
- The noise suppression apparatus and speech decoding apparatus improve sound quality and raise the speech recognition rate by suppressing noise other than the target signal, such as a speech or acoustic signal. They are therefore suitable for voice communication systems used in various noise environments, such as mobile phones and interphones, and for hands-free call systems, video conferencing systems, monitoring systems, voice storage systems, voice recognition systems, and the like.
Abstract
Description
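The sound signal processing method of Patent Document 1 aims to perceptually reduce the sense of distortion caused by noise suppression processing and low-bit-rate speech coding. It forms a weighted sum of the input signal and a processed signal obtained by smoothing the input signal, with weights based on an estimate of the noise ratio in the signal obtained by a speech/noise state determination means, so that subjective quality is improved mainly in sections that contain many degradation components such as background noise.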
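Embodiment 1.
FIG. 1 shows the overall configuration of the noise suppression apparatus 100 according to this embodiment.
The noise suppression apparatus 100 shown in FIG. 1 comprises a time/frequency conversion unit 2, a noise suppression unit 3, a signal processing unit 4, and a frequency/time conversion unit 5. The noise suppression unit 3 comprises a noise spectrum suppression unit 7 and a noise spectrum estimation unit 8, which in turn consists of a speech/noise determination unit 9 and a noise spectrum update unit 10. The signal processing unit 4 comprises a signal addition unit 11, an amplitude smoothing unit 12, and a signal modification unit 13, which consists of a processing component calculation unit 14 and a phase disturbance unit 15.
First, an input signal 1, sampled at a predetermined sampling frequency (for example, 8 kHz) and divided into frames of a predetermined frame period (for example, 20 msec), is input to the time/frequency conversion unit 2 in the noise suppression apparatus 100 and to the speech/noise determination unit 9 inside the noise spectrum estimation unit 8, which is described later.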
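The framing step above can be sketched as follows. With the example figures in the text, an 8 kHz sampling rate and a 20 msec frame period give 160 samples per frame. The function name `split_frames` and the choice to drop a trailing partial frame are assumptions made for illustration; the source does not specify them.

```python
def split_frames(samples, sample_rate_hz=8000, frame_ms=20):
    """Split a sample sequence into consecutive, non-overlapping frames."""
    # 8000 Hz * 20 ms / 1000 = 160 samples per frame
    frame_len = sample_rate_hz * frame_ms // 1000
    # Assumption: a trailing partial frame is dropped.
    usable = len(samples) - len(samples) % frame_len
    return [samples[i:i + frame_len] for i in range(0, usable, frame_len)]


frames = split_frames([0.0] * 400)
print(len(frames), len(frames[0]))
```

Each frame would then be passed to the time/frequency conversion and to the speech/noise determination.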
$$\mathrm{VAD} = w_{ACF} \cdot ACF_{max} + w_{SNR} \cdot SNR_{fr} \cdot SNR_{norm} \qquad (3)$$
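The signal modification unit 13 generates the processed spectrum 19 from the noise suppression spectrum 18 output by the noise spectrum suppression unit 7 and the estimated noise spectrum 17 output by the noise spectrum estimation unit 8. First, for each frequency component of the estimated noise spectrum 17, the processing component calculation unit 14 multiplies the amplitude by a predetermined value (giving the modified estimated noise spectrum described later), modifies the noise suppression spectrum 18 so that it has the same amplitude as the result, and outputs it to the phase disturbance unit 15 as the modified noise suppression spectrum 18a. A suitable predetermined value is, for example, one near the maximum suppression amount of the noise suppression processing. For example, if the maximum suppression amount is -12 dB, the predetermined value may be set to about 0.25 to 0.2, and it may be adjusted in advance according to the type of noise, the noise suppression method, the degree of degradation, or the user's preference. It is also possible to hold several values in memory and have the processing component calculation unit 14 switch to a suitable one according to the type of noise, the noise power, and so on.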
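The amplitude modification described above can be sketched as follows. A level of -12 dB corresponds to a linear amplitude factor of 10^(-12/20), roughly 0.25, which matches the range quoted in the text. The helper names `db_to_amplitude` and `modify_spectrum`, and the choice to keep the phase of each bin unchanged while rescaling its magnitude, are assumptions for illustration rather than details confirmed by the source.

```python
import cmath


def db_to_amplitude(level_db):
    """Convert a level in dB to a linear amplitude factor."""
    return 10.0 ** (level_db / 20.0)


def modify_spectrum(suppressed, noise_amp, factor):
    """Rescale each complex bin so its magnitude is factor * noise_amp[k].

    Assumption: the phase of the noise-suppressed bin is kept as is.
    """
    out = []
    for s, n in zip(suppressed, noise_amp):
        target = factor * n                      # scalar value (103 in FIG. 2)
        out.append(cmath.rect(target, cmath.phase(s)))
    return out


factor = db_to_amplitude(-12.0)                  # about 0.25
modified = modify_spectrum([3 + 4j], [2.0], 0.25)
print(round(factor, 3), abs(modified[0]))
```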
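FIG. 2(a) illustrates an example of the relationship between the noise suppression spectrum 18 and the estimated noise spectrum 17. It shows the vector 101 of the noise suppression spectrum 18, the vector 102 of the estimated noise spectrum 17, the scalar value 103 obtained by multiplying the amplitude of the estimated noise spectrum 17 by the predetermined value, and the vector 104 of the modified noise suppression spectrum 18a, obtained by modifying the vector 101 so that it has the same amplitude as the scalar value 103.
FIG. 2(b) illustrates an example of the relationship among the noise suppression spectrum 18, the processed spectrum 19, and the summed spectrum 20. It shows the vector 101 of the noise suppression spectrum 18, the vector 104 of the modified noise suppression spectrum 18a, the vector 105 of the processed spectrum 19 obtained by disturbing the phase of the modified noise suppression spectrum 18a, and the vector 106 of the summed spectrum 20. θ is the phase angle by which the vector 104 is disturbed. The range A of the phase disturbance (the region in which the processed spectrum 19 can lie) is shown as a dotted circle.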
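The phase disturbance and the addition in FIG. 2(b) can be sketched as below. Each bin of the modified spectrum is rotated by a random angle θ without changing its magnitude, and the result is added to the noise suppression spectrum. Drawing θ uniformly from [-max_angle, max_angle], the default of π, and the function names are assumptions for illustration; the source does not state the distribution.

```python
import cmath
import math
import random


def disturb_phase(spectrum, max_angle, rng):
    """Rotate each complex bin by a random angle; magnitudes are unchanged."""
    return [s * cmath.exp(1j * rng.uniform(-max_angle, max_angle))
            for s in spectrum]


def add_spectra(a, b):
    """Bin-by-bin sum of two spectra (summed spectrum 20)."""
    return [x + y for x, y in zip(a, b)]


rng = random.Random(0)                 # fixed seed so runs are repeatable
modified = [0.3 + 0.4j, 0.0 + 0.5j]    # modified noise suppression spectrum 18a
suppressed = [3 + 4j, 1 + 1j]          # noise suppression spectrum 18
processed = disturb_phase(modified, math.pi, rng)   # processed spectrum 19
summed = add_spectra(suppressed, processed)
print([round(abs(p), 3) for p in processed])
```

Because only the angle changes, each processed bin stays on a circle of radius equal to the modified amplitude, which corresponds to the dotted circle A in the figure.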
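$$X(n,0) = S_{ADD}(n,0)$$

$$X(n,k) = (1-\beta(k)) \cdot S_{ADD}(n,k-1) + \beta(k) \cdot S_{ADD}(n,k), \quad k = 1, \ldots, M \qquad (5)$$

$$Y(n,k) = (1-\gamma(k)) \cdot Y(n-1,k) + \gamma(k) \cdot X(n,k), \quad k = 0, \ldots, M \qquad (6)$$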
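Equations (5) and (6) can be sketched as two small functions: one smooths along the frequency axis within frame n, the other smooths each bin along the time axis using the previous frame's output. Treating the inputs as lists of amplitude values indexed k = 0..M, and the function names, are assumptions for illustration.

```python
def smooth_frequency(s_add, beta):
    """Eq. (5): X(n,0) = S(n,0); X(n,k) = (1-beta[k])*S(n,k-1) + beta[k]*S(n,k)."""
    x = [s_add[0]]                     # k = 0 is copied; beta[0] is not used
    for k in range(1, len(s_add)):
        x.append((1.0 - beta[k]) * s_add[k - 1] + beta[k] * s_add[k])
    return x


def smooth_time(y_prev, x, gamma):
    """Eq. (6): Y(n,k) = (1-gamma[k])*Y(n-1,k) + gamma[k]*X(n,k)."""
    return [(1.0 - gamma[k]) * y_prev[k] + gamma[k] * x[k]
            for k in range(len(x))]


x = smooth_frequency([1.0, 3.0, 5.0], [0.0, 0.5, 0.5])
y = smooth_time([0.0, 0.0, 0.0], x, [0.5, 0.5, 0.5])
print(x, y)
```

Smaller β(k) or γ(k) gives heavier smoothing, since more weight goes to the neighbouring bin or the previous frame.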
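Therefore, when the signal processing unit 4 applies its predetermined processing to the noise suppression spectrum 18 degraded by noise suppression processing or the like, it obtains the processed spectrum 19 from the frequency-component values of the noise suppression spectrum 18 and of the estimated noise spectrum 17. The processed spectrum 19 is a smoothed component in which the degradation components contained in the noise suppression spectrum 18 are made subjectively unobtrusive, and it is added to the frequency components of the noise suppression spectrum 18 to suppress the degradation components. As a result, the speech/noise section determination required by conventional methods becomes unnecessary, and subjective quality can be improved without the echo-like or noisy artifacts caused by section determination errors.
FIG. 5 shows the overall configuration of the noise suppression apparatus 100 according to this embodiment, which adds a signal subtraction unit 22 to the noise suppression apparatus 100 of Embodiment 1. In the descriptions of the following embodiments, components that are the same as or correspond to those of Embodiment 1 (FIG. 1) described earlier are given the same reference numerals and their description is omitted.
Like FIG. 2(a), FIG. 6(a) illustrates an example of the relationship between the noise suppression spectrum 18 and the estimated noise spectrum 17. It shows the vector 101 of the noise suppression spectrum 18, the vector 102 of the estimated noise spectrum 17, the scalar value 103 obtained by multiplying the amplitude of the estimated noise spectrum 17 by the predetermined value, the vector 104 of the modified noise suppression spectrum 18a, and the component vector 107 of the spectrum obtained by subtracting the modified noise suppression spectrum 18a from the noise suppression spectrum 18.
Like FIG. 2(b), FIG. 6(b) illustrates an example of the relationship among the noise suppression spectrum, the processed spectrum obtained in FIG. 6(a), and the summed spectrum. It shows the vector 101 of the noise suppression spectrum 18, the vector 104 of the modified noise suppression spectrum 18a, the vector 105 of the processed spectrum 19, the component vector 107 of the spectrum obtained by subtracting the modified noise suppression spectrum 18a from the noise suppression spectrum 18, and the vector 108 of the summed spectrum 20.
Because the signal processing unit 4 subtracts the modified noise suppression spectrum 18a from the noise suppression spectrum 18 and also adds the processed spectrum 19, subjective quality can be improved further while suppressing any increase in the noisiness of the output signal 6, in addition to the effects described in Embodiment 1.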
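The subtract-then-add operation described above can be sketched per bin as output = suppressed - modified + processed. The function name `recombine` is hypothetical. One consequence worth noting: since the processed spectrum is the modified spectrum with its phase disturbed, a disturbance of zero leaves the noise suppression spectrum unchanged.

```python
def recombine(suppressed, modified, processed):
    """Subtract the modified spectrum, then add the processed spectrum."""
    return [s - m + p for s, m, p in zip(suppressed, modified, processed)]


suppressed = [2 + 0j]    # noise suppression spectrum 18
modified = [1 + 0j]      # modified noise suppression spectrum 18a
processed = [0 + 1j]     # 18a rotated by 90 degrees (processed spectrum 19)
print(recombine(suppressed, modified, processed))
```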
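In Embodiments 1 and 2, the processing component calculation unit 14 inside the signal modification unit 13 used a value near the maximum suppression amount of the noise suppression processing as the predetermined value by which each frequency of the estimated noise spectrum 17 is multiplied. In this embodiment, the predetermined value is weighted along the frequency axis, for example a large value at low frequencies and a small value at high frequencies. The configuration of the noise suppression apparatus of this embodiment is, as drawn, the same as that of the noise suppression apparatus 100 of Embodiment 1 shown in FIG. 1 or of Embodiment 2 shown in FIG. 5; only the processing of the processing component calculation unit 14 differs.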
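A frequency-dependent predetermined value can be sketched as one multiplier per bin, larger at low frequencies and smaller at high frequencies. The linear ramp and the endpoint values 0.3 and 0.1 are assumptions for illustration; the source only says that the value is weighted along the frequency axis.

```python
def frequency_weights(num_bins, low_value, high_value):
    """Per-bin multipliers that ramp linearly from low_value to high_value."""
    if num_bins == 1:
        return [low_value]
    step = (high_value - low_value) / (num_bins - 1)
    return [low_value + step * k for k in range(num_bins)]


weights = frequency_weights(5, 0.3, 0.1)
print([round(w, 3) for w in weights])
```

Each weight would replace the single constant multiplier applied to the corresponding bin of the estimated noise spectrum.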
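In Embodiment 1 the noise suppression processing was performed in the frequency domain (also called the spectral domain), but this is not essential, and it may be performed in the time domain. FIG. 7 shows the overall configuration of the noise suppression apparatus 100 according to this embodiment, which has a noise suppression filter unit 23 and a time/frequency conversion unit 24 in place of the noise spectrum suppression unit 7 of Embodiment 1. In the descriptions of the following embodiments, components that are the same as or correspond to those of Embodiment 1 (FIG. 1) described earlier are given the same reference numerals and their description is omitted.
The noise suppression apparatus 100 of Embodiment 1 may be modified to form the speech decoding apparatus 200 shown in this embodiment. FIG. 8 shows the overall configuration of the speech decoding apparatus 200 according to this embodiment. The speech decoding apparatus 200 receives code data 25 instead of an input signal and additionally includes a speech decoding unit 26 that decodes the code data 25. In FIG. 8, parts that are the same as or correspond to those in FIG. 1 are given the same reference numerals.
Therefore, when the signal processing unit 4 applies its predetermined processing to the decoded signal spectrum 28 degraded by speech coding, it obtains the processed spectrum 19 from the frequency-component values of the decoded signal spectrum 28 and of the estimated noise spectrum 17. The processed spectrum 19 is a smoothed component in which the degradation components contained in the decoded signal spectrum 28 are made subjectively unobtrusive, and it is added to the frequency components of the decoded signal spectrum 28 to suppress the degradation components. As a result, the speech/noise section determination required by conventional methods becomes unnecessary, and subjective quality can be improved without the echo-like or noisy artifacts caused by section determination errors.
As in Embodiment 5, the noise suppression apparatus 100 of Embodiment 2 may be modified to form a speech decoding apparatus 200 as shown in this embodiment. FIG. 9 shows the overall configuration of the speech decoding apparatus 200 according to this embodiment. In FIG. 9, parts that are the same as or correspond to those in FIG. 5 or FIG. 8 are given the same reference numerals and their description is omitted.
Because the signal processing unit 4 subtracts the modified decoded signal spectrum 28a from the decoded signal spectrum 28 and also adds the processed spectrum 19, subjective quality can be improved further while suppressing any increase in the noisiness of the output signal 6, in addition to the effects described in Embodiment 5.
In Embodiments 5 and 6, the processing component calculation unit 14 inside the signal modification unit 13 used a value that is constant along the frequency axis as the predetermined value by which each frequency of the estimated noise spectrum 17 is multiplied. In this embodiment, the predetermined value is weighted along the frequency axis, for example a large value at low frequencies and a small value at high frequencies. The configuration of the speech decoding apparatus 200 of this embodiment is, as drawn, the same as that of Embodiment 5 shown in FIG. 8 or of Embodiment 6 shown in FIG. 9; only the processing of the processing component calculation unit 14 differs.
In Embodiment 1, the signal processing unit 4 generated the processed spectrum 19 according to a ratio based on the estimated noise spectrum 17 and the noise suppression spectrum 18. In this embodiment, the width of the phase disturbance applied to the noise suppression spectrum 18 is instead controlled according to a ratio based on the estimated noise spectrum 17 and the noise suppression spectrum 18.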
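Controlling the width of the phase disturbance from a per-bin ratio can be sketched as follows: compute a spectral SNR from the signal and noise amplitudes, then map a small SNR to a wide disturbance and a large SNR to a narrow one. The dB form of the ratio, the linear mapping, the thresholds of 0 dB and 20 dB, and the maximum width of π are all assumptions for illustration; the source only states the direction of the control.

```python
import math


def spectral_snr_db(signal_amp, noise_amp, eps=1e-12):
    """Per-bin SNR in dB; eps guards against division by zero and log(0)."""
    return 20.0 * math.log10((signal_amp + eps) / (noise_amp + eps))


def phase_width(snr_db, snr_low=0.0, snr_high=20.0, max_width=math.pi):
    """Map SNR to a disturbance width: wide when SNR is low, zero when high."""
    if snr_db <= snr_low:
        return max_width
    if snr_db >= snr_high:
        return 0.0
    return max_width * (snr_high - snr_db) / (snr_high - snr_low)


for signal_amp in (0.5, 3.0, 20.0):
    snr = spectral_snr_db(signal_amp, 1.0)
    print(round(snr, 1), round(phase_width(snr), 3))
```

The resulting width would then bound the random angle used when rotating that bin, so noise-dominated bins are disturbed strongly and speech-dominated bins are left almost untouched.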
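With this configuration, subjective quality can be improved regardless of whether the noise suppression method operates in the frequency domain or the time domain.
As in Embodiment 8, the speech decoding apparatus 200 of Embodiment 5 may be modified so that, instead of generating the processed spectrum 19 according to a ratio based on the decoded signal spectrum 28 and the estimated noise spectrum 17, the signal processing unit 4 controls the width of the phase disturbance applied to the decoded signal spectrum 28 according to that ratio.
In Embodiments 5 to 7 and 9, the signal processing unit 4 processed the decoded signal spectrum 28. As shown in FIG. 12, however, the signal processing unit 4 may instead perform its processing after the noise spectrum suppression unit 7 has applied noise suppression to the decoded signal 27. FIG. 12 shows the overall configuration of the speech decoding apparatus 200 according to this embodiment. FIG. 12 shows a configuration that includes a noise spectrum suppression unit 7 for the noise suppression processing, but a noise suppression filter unit 23 and a time/frequency conversion unit 24 (FIG. 7) may be used in its place. In FIG. 12, parts that are the same as or correspond to those in FIGS. 1 to 11 are given the same reference numerals and their description is omitted.
In Embodiments 1 to 10, the time/frequency conversion unit 2 calculates the spectral components by FFT and the frequency/time conversion unit 5 returns the processed spectral components to a time-domain signal by inverse FFT. Alternatively, the processing may be applied to each output of a bank of bandpass filters, with the output signal obtained by summing the band signals, or a transform such as the wavelet transform may be used.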
Claims (12)
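- A noise suppression apparatus comprising:
a time/frequency conversion unit that converts an input signal into an input signal spectrum consisting of frequency components;
a noise spectrum estimation unit that estimates an estimated noise spectrum from the input signal;
a noise spectrum suppression unit that suppresses noise in the input signal spectrum based on the estimated noise spectrum to generate a noise suppression spectrum;
a signal modification unit that generates a processed spectrum by modifying and smoothing the noise suppression spectrum according to a ratio based on the noise suppression spectrum and the estimated noise spectrum; and
a signal addition unit that adds the processed spectrum to the noise suppression spectrum to suppress degradation components contained in the noise suppression spectrum.
- The noise suppression apparatus according to claim 1, wherein the signal modification unit generates a processed spectrum weighted along the frequency axis.
- A noise suppression apparatus comprising:
a time/frequency conversion unit that converts an input signal into an input signal spectrum consisting of frequency components;
a noise spectrum estimation unit that estimates an estimated noise spectrum from the input signal;
a noise spectrum suppression unit that suppresses noise in the input signal spectrum based on the estimated noise spectrum to generate a noise suppression spectrum;
a signal modification unit that generates a modified noise suppression spectrum by modifying the noise suppression spectrum according to a ratio based on the noise suppression spectrum and the estimated noise spectrum, and generates a processed spectrum by smoothing the modified noise suppression spectrum;
a signal subtraction unit that subtracts the modified noise suppression spectrum from the noise suppression spectrum; and
a signal addition unit that adds the processed spectrum to the noise suppression spectrum from which the signal subtraction unit has subtracted the modified noise suppression spectrum, to suppress degradation components contained in the noise suppression spectrum.
- The noise suppression apparatus according to claim 3, wherein the signal modification unit generates a processed spectrum weighted along the frequency axis.
- A noise suppression apparatus comprising:
a time/frequency conversion unit that converts an input signal into an input signal spectrum consisting of frequency components;
a noise spectrum estimation unit that estimates an estimated noise spectrum from the input signal;
a noise spectrum suppression unit that suppresses noise in the input signal spectrum based on the estimated noise spectrum to generate a noise suppression spectrum; and
a phase disturbance unit that disturbs the phase of the noise suppression spectrum to a degree that depends on a ratio based on the noise suppression spectrum and the estimated noise spectrum.
- The noise suppression apparatus according to claim 5, wherein the phase disturbance unit determines a degree of phase disturbance weighted along the frequency axis.
- A speech decoding apparatus comprising:
a speech decoding unit that decodes predetermined code data to generate a decoded signal;
a time/frequency conversion unit that converts the decoded signal into a decoded signal spectrum consisting of frequency components;
a noise spectrum estimation unit that estimates an estimated noise spectrum from the decoded signal;
a signal modification unit that generates a processed spectrum by modifying and smoothing the decoded signal spectrum according to a ratio based on the decoded signal spectrum and the estimated noise spectrum; and
a signal addition unit that adds the processed spectrum to the decoded signal spectrum to suppress degradation components contained in the decoded signal spectrum.
- The speech decoding apparatus according to claim 7, wherein the signal modification unit generates a processed spectrum weighted along the frequency axis.
- A speech decoding apparatus comprising:
a speech decoding unit that decodes predetermined code data to generate a decoded signal;
a time/frequency conversion unit that converts the decoded signal into a decoded signal spectrum consisting of frequency components;
a noise spectrum estimation unit that estimates an estimated noise spectrum from the decoded signal;
a signal modification unit that generates a modified decoded signal spectrum by modifying the decoded signal spectrum according to a ratio based on the decoded signal spectrum and the estimated noise spectrum, and generates a processed spectrum by smoothing the modified decoded signal spectrum;
a signal subtraction unit that subtracts the modified decoded signal spectrum from the decoded signal spectrum; and
a signal addition unit that adds the processed spectrum to the decoded signal spectrum from which the signal subtraction unit has subtracted the modified decoded signal spectrum, to suppress degradation components contained in the decoded signal spectrum.
- The speech decoding apparatus according to claim 9, wherein the signal modification unit generates a processed spectrum weighted along the frequency axis.
- A speech decoding apparatus comprising:
a speech decoding unit that decodes predetermined code data to generate a decoded signal;
a time/frequency conversion unit that converts the decoded signal into a decoded signal spectrum consisting of frequency components;
a noise spectrum estimation unit that estimates an estimated noise spectrum from the decoded signal; and
a phase disturbance unit that disturbs the phase of the decoded signal spectrum to a degree that depends on a ratio based on the decoded signal spectrum and the estimated noise spectrum.
- The speech decoding apparatus according to claim 11, wherein the phase disturbance unit determines a degree of phase disturbance weighted along the frequency axis.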
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2008801310563A CN102150206B (zh) | 2008-10-24 | 2008-10-24 | Noise suppressor and voice decoder |
JP2010534608A JP5153886B2 (ja) | 2008-10-24 | 2008-10-24 | Noise suppressor and voice decoder |
EP08877520.0A EP2346032B1 (en) | 2008-10-24 | 2008-10-24 | Noise suppressor and voice decoder |
US13/055,837 US20110125490A1 (en) | 2008-10-24 | 2008-10-24 | Noise suppressor and voice decoder |
PCT/JP2008/003021 WO2010046954A1 (ja) | 2008-10-24 | 2008-10-24 | Noise suppressor and voice decoder |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2008/003021 WO2010046954A1 (ja) | 2008-10-24 | 2008-10-24 | Noise suppressor and voice decoder |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2010046954A1 (ja) | 2010-04-29 |
Family
ID=42119013
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2008/003021 WO2010046954A1 (ja) | Noise suppressor and voice decoder | 2008-10-24 | 2008-10-24 |
Country Status (5)
Country | Link |
---|---|
US (1) | US20110125490A1 (ja) |
EP (1) | EP2346032B1 (ja) |
JP (1) | JP5153886B2 (ja) |
CN (1) | CN102150206B (ja) |
WO (1) | WO2010046954A1 (ja) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8725506B2 (en) * | 2010-06-30 | 2014-05-13 | Intel Corporation | Speech audio processing |
WO2012038998A1 (ja) | 2010-09-21 | 2012-03-29 | Mitsubishi Electric Corporation | Noise suppression device |
CN103137133B (zh) * | 2011-11-29 | 2017-06-06 | Nanjing ZTE Software Co., Ltd. | Inactive sound signal parameter estimation method, comfort noise generation method, and system |
EP2905779B1 (en) * | 2012-02-16 | 2016-09-14 | 2236008 Ontario Inc. | System and method for dynamic residual noise shaping |
JPWO2014017371A1 (ja) * | 2012-07-25 | 2016-07-11 | Nikon Corporation | Sound processing device, electronic apparatus, imaging device, program, and sound processing method |
GB2520048B (en) * | 2013-11-07 | 2018-07-11 | Toshiba Res Europe Limited | Speech processing system |
US9721580B2 (en) * | 2014-03-31 | 2017-08-01 | Google Inc. | Situation dependent transient suppression |
CN105338148B (zh) * | 2014-07-18 | 2018-11-06 | Huawei Technologies Co., Ltd. | Method and device for detecting an audio signal according to frequency-domain energy |
US9953661B2 (en) * | 2014-09-26 | 2018-04-24 | Cirrus Logic Inc. | Neural network voice activity detection employing running range normalization |
US11282531B2 (en) * | 2020-02-03 | 2022-03-22 | Bose Corporation | Two-dimensional smoothing of post-filter masks |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001134287A (ja) * | 1999-11-10 | 2001-05-18 | Mitsubishi Electric Corp | Noise suppression device |
JP2003101445A (ja) * | 2001-09-20 | 2003-04-04 | Mitsubishi Electric Corp | Echo processing device |
JP3454190B2 (ja) | 1999-06-09 | 2003-10-06 | Mitsubishi Electric Corporation | Noise suppression device and method |
JP2004272292A (ja) | 1997-12-08 | 2004-09-30 | Mitsubishi Electric Corp | Sound signal processing method |
JP2005258158A (ja) * | 2004-03-12 | 2005-09-22 | Advanced Telecommunication Research Institute International | Noise removal device |
JP2008076975A (ja) * | 2006-09-25 | 2008-04-03 | Fujitsu Ltd | Sound signal correction method, sound signal correction device, and computer program |
Family Cites Families (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4630305A (en) * | 1985-07-01 | 1986-12-16 | Motorola, Inc. | Automatic gain selector for a noise suppression system |
JP3259759B2 (ja) * | 1996-07-22 | 2002-02-25 | NEC Corporation | Speech signal transmission method and speech coding/decoding system |
CN1192358C (zh) * | 1997-12-08 | 2005-03-09 | Mitsubishi Electric Corporation | Sound signal processing method and sound signal processing device |
US6088668A (en) * | 1998-06-22 | 2000-07-11 | D.S.P.C. Technologies Ltd. | Noise suppressor having weighted gain smoothing |
WO2000046789A1 (fr) * | 1999-02-05 | 2000-08-10 | Fujitsu Limited | Sound presence detector and method for detecting the presence and/or absence of a sound |
US7349841B2 (en) * | 2001-03-28 | 2008-03-25 | Mitsubishi Denki Kabushiki Kaisha | Noise suppression device including subband-based signal-to-noise ratio |
JP3457293B2 (ja) * | 2001-06-06 | 2003-10-14 | Mitsubishi Electric Corporation | Noise suppression device and noise suppression method |
US20030055645A1 (en) * | 2001-09-18 | 2003-03-20 | Meir Griniasty | Apparatus with speech recognition and method therefor |
JP4162604B2 (ja) * | 2004-01-08 | 2008-10-08 | Toshiba Corporation | Noise suppression device and noise suppression method |
US7492889B2 (en) * | 2004-04-23 | 2009-02-17 | Acoustic Technologies, Inc. | Noise suppression based on bark band wiener filtering and modified doblinger noise estimate |
US7454332B2 (en) * | 2004-06-15 | 2008-11-18 | Microsoft Corporation | Gain constrained noise suppression |
GB2422237A (en) * | 2004-12-21 | 2006-07-19 | Fluency Voice Technology Ltd | Dynamic coefficients determined from temporally adjacent speech frames |
US20080243496A1 (en) * | 2005-01-21 | 2008-10-02 | Matsushita Electric Industrial Co., Ltd. | Band Division Noise Suppressor and Band Division Noise Suppressing Method |
US20060184363A1 (en) * | 2005-02-17 | 2006-08-17 | Mccree Alan | Noise suppression |
JP4670483B2 (ja) * | 2005-05-31 | 2011-04-13 | NEC Corporation | Method and device for noise suppression |
US8566086B2 (en) * | 2005-06-28 | 2013-10-22 | Qnx Software Systems Limited | System for adaptive enhancement of speech signals |
JP4765461B2 (ja) * | 2005-07-27 | 2011-09-07 | NEC Corporation | Noise suppression system, method, and program |
EP1979901B1 (de) * | 2006-01-31 | 2015-10-14 | Unify GmbH & Co. KG | Method and arrangements for audio signal encoding |
EP1918910B1 (en) * | 2006-10-31 | 2009-03-11 | Harman Becker Automotive Systems GmbH | Model-based enhancement of speech signals |
JP2008148179A (ja) * | 2006-12-13 | 2008-06-26 | Fujitsu Ltd | Noise suppression processing method in a speech signal processing device and an automatic gain control device |
US9966085B2 (en) * | 2006-12-30 | 2018-05-08 | Google Technology Holdings LLC | Method and noise suppression circuit incorporating a plurality of noise suppression techniques |
JP5018193B2 (ja) * | 2007-04-06 | 2012-09-05 | Yamaha Corporation | Noise suppression device and program |
KR101437830B1 (ko) * | 2007-11-13 | 2014-11-03 | Samsung Electronics Co., Ltd. | Method and apparatus for detecting speech sections |
US20110178800A1 (en) * | 2010-01-19 | 2011-07-21 | Lloyd Watts | Distortion Measurement for Noise Suppression System |
- 2008
  - 2008-10-24 US US13/055,837 patent/US20110125490A1/en not_active Abandoned
  - 2008-10-24 WO PCT/JP2008/003021 patent/WO2010046954A1/ja active Application Filing
  - 2008-10-24 JP JP2010534608A patent/JP5153886B2/ja active Active
  - 2008-10-24 EP EP08877520.0A patent/EP2346032B1/en not_active Not-in-force
  - 2008-10-24 CN CN2008801310563A patent/CN102150206B/zh not_active Expired - Fee Related
Non-Patent Citations (6)
Title |
---|
ATSUYOSHI YANO ET AL: "Shuhasu Ryoiki Shori ni yoru Zanryu Echo Yokuatsu Hoho no Kento", THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS 2004 NEN SOGO TAIKAI KOEN RONBUNSHU KISO. KYOKAI, vol. A-4-45, 8 March 2004 (2004-03-08), pages 136, XP008146761 * |
HIROHISA TASAKI ET AL: "Post Noise Smoother ni yoru Tei-rate CELP no Zatsuon Kukan Hinshitsu no Kaizen", THE ACOUSTICAL SOCIETY OF JAPAN HEISEI 10 NENDO SHUNKI KENKYU HAPPYOKAI KOEN RONBUNSHU -I, vol. 2-7-2, 17 March 1998 (1998-03-17), pages 237 - 238, XP008146760 * |
SATORU FURUTA ET AL: "Shuhasu Omomizuke Spectrum Subtraction-ho ni yoru Zatsuon Yokuatsu", THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS 2000 NEN SOGO TAIKAI KOEN RONBUNSHU JOHO. SYSTEM 1, vol. D-14-14, 7 March 2000 (2000-03-07), pages 183, XP008146763 * |
SATORU FURUTA ET AL: "Shuhasu Ryoiki Shori ni yoru Haikei Zatsuon Seiseiho no Ichikento", THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS 2002 NEN SOGO TAIKAI KOEN RONBUNSHU KISO. KYOKAI, vol. A-10-1, 7 March 2002 (2002-03-07), pages 236, XP008146762 * |
See also references of EP2346032A4 * |
STEVEN F. BOLL: "Suppression of Acoustic Noise in Speech Using Spectral Subtraction", IEEE TRANS. ASSP, vol. ASSP-27, no. 2, April 1979 (1979-04-01) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
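WO2012114628A1 (ja) * | 2011-02-26 | 2012-08-30 | NEC Corporation | Signal processing device, signal processing method, and storage medium |
US9531344B2 (en) | 2011-02-26 | 2016-12-27 | Nec Corporation | Signal processing apparatus, signal processing method, storage medium |
JP2016038551A (ja) * | 2014-08-11 | 2016-03-22 | Oki Electric Industry Co., Ltd. | Noise suppression device, method, and program |
WO2018116944A1 (ja) * | 2016-12-20 | 2018-06-28 | Mitsubishi Electric Corporation | Audio noise detection device, digital broadcast receiving device, and audio noise detection method |
JPWO2018116944A1 (ja) * | 2016-12-20 | 2019-04-11 | Mitsubishi Electric Corporation | Audio noise detection device, digital broadcast receiving device, and audio noise detection method |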
Also Published As
Publication number | Publication date |
---|---|
US20110125490A1 (en) | 2011-05-26 |
JPWO2010046954A1 (ja) | 2012-03-15 |
JP5153886B2 (ja) | 2013-02-27 |
CN102150206A (zh) | 2011-08-10 |
CN102150206B (zh) | 2013-06-05 |
EP2346032A4 (en) | 2012-10-24 |
EP2346032B1 (en) | 2014-05-07 |
EP2346032A1 (en) | 2011-07-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5153886B2 (ja) | Noise suppressor and voice decoder | |
JP5300861B2 (ja) | Noise suppression device | |
JP3591068B2 (ja) | Noise reduction method for speech signals | |
RU2329550C2 (ru) | Method and device for enhancing a speech signal in the presence of background noise | |
JP3574123B2 (ja) | Noise suppression device | |
JP4836720B2 (ja) | Noise suppression device | |
US8521530B1 (en) | System and method for enhancing a monaural audio signal | |
JP4423300B2 (ja) | Noise suppression device | |
JP5791092B2 (ja) | Noise suppression method, device, and program | |
US8804980B2 (en) | Signal processing method and apparatus, and recording medium in which a signal processing program is recorded | |
JP5245714B2 (ja) | Noise suppression device and noise suppression method | |
WO2008121436A1 (en) | Method and apparatus for quickly detecting a presence of abrupt noise and updating a noise estimate | |
JPWO2013065088A1 (ja) | Noise suppression device | |
JP5526524B2 (ja) | Noise suppression device and noise suppression method | |
JP2003280696A (ja) | Speech enhancement device and speech enhancement method | |
JP5840087B2 (ja) | Speech signal restoration device and speech signal restoration method | |
JP4173525B2 (ja) | Noise suppression device and noise suppression method | |
JP5131149B2 (ja) | Noise suppression device and noise suppression method | |
JP2006113515A (ja) | Noise suppression device, noise suppression method, and mobile communication terminal device | |
Esch et al. | Wideband noise suppression supported by artificial bandwidth extension techniques | |
JP4098271B2 (ja) | Noise suppression device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | WWE | Wipo information: entry into national phase | Ref document number: 200880131056.3; Country of ref document: CN |
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 08877520; Country of ref document: EP; Kind code of ref document: A1 |
 | WWE | Wipo information: entry into national phase | Ref document number: 2010534608; Country of ref document: JP |
 | WWE | Wipo information: entry into national phase | Ref document number: 13055837; Country of ref document: US |
 | WWE | Wipo information: entry into national phase | Ref document number: 2008877520; Country of ref document: EP |
 | NENP | Non-entry into the national phase | Ref country code: DE |