WO2014112023A1 - Noise removal system, voice detection system, speech recognition system, noise removal method, and noise removal program
- Publication number
- WO2014112023A1 (application PCT/JP2013/007573)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- noise
- estimated
- input signal
- stationary
- unit
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/84—Detection of presence or absence of voice signals for discriminating voice from noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02165—Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
Definitions
- The present invention relates to a noise removal system, a voice detection system, a speech recognition system, a noise removal method, and a noise removal program for removing noise contained in speech mixed with noise, and more particularly to ones that can remove both stationary noise and non-stationary noise with high accuracy.
- Patent Document 1 describes an example of a noise estimation system and a noise attenuation system that performs noise removal using noise estimated by the noise estimation system.
- FIG. 8 is a block diagram showing the configuration of the noise estimation system described in Patent Document 1. As shown in FIG. 8, the noise estimation system described in Patent Document 1 includes a first noise estimation unit 611, a first noise attenuation unit 621, a voice pattern storage unit 631, a second noise attenuation unit 622, and a second noise estimation unit 612.
- the noise estimation system having such a configuration operates as follows.
- the noise included in the input signal is estimated by the first noise estimation unit 611, and the first noise attenuation unit 621 subtracts the estimated first noise from the input signal to obtain a first noise attenuation signal.
- The second noise attenuation unit 622 obtains a second noise attenuation signal using the first noise attenuation signal and the voice pattern stored in the voice pattern storage unit 631.
- the second noise estimator 612 obtains the second noise using the second noise attenuation signal.
- Patent Document 2 describes a technique for improving noise resistance in an environment where non-stationary noise such as CD player or radio sound exists in addition to stationary noise.
- the noise estimation system described in Patent Literature 1 obtains a noise attenuation signal using the first noise estimated by the first noise estimation unit, and re-estimates the noise using the noise attenuation signal. Thereby, the noise estimation system can estimate the noise included in the input signal with higher accuracy than the first noise.
- the noise estimation system has the following problems.
- The problem is that a noise component not included in the first noise estimated by the first noise estimation unit, that is, a component that the first noise estimation unit does not regard as noise, is also not included in the second noise output from the second noise estimation unit.
- For example, when the first noise estimation unit estimates a stationary noise component (a noise component whose mean and variance vary little over time), a non-stationary noise component (a noise component whose mean and variance vary greatly over time) is not included in the first noise, so the non-stationary noise component remains in the noise attenuation signal.
- As a result, the second noise does not include non-stationary noise either, and even if the noise attenuation signal is calculated using the second noise, a non-stationary noise component remains in the noise attenuation signal.
- An object of the present invention is to provide a noise removal system, a voice detection system, a speech recognition system, a noise removal method, and a noise removal program that estimate noise included in an input signal with high accuracy and use the estimated noise to remove the noise included in the input signal with high accuracy.
- The noise removal system according to the present invention includes: a first noise estimation unit that estimates a stationary noise component included in a first input signal and outputs a first estimated noise; a first noise removing unit that, using the first input signal and the first estimated noise from the first noise estimation unit, outputs a first estimated speech obtained by removing the stationary noise component from the first input signal; a second noise estimation unit that, using at least the first input signal and the first estimated speech from the first noise removing unit, re-estimates the stationary noise component included in the first input signal and outputs a second estimated noise; a third noise estimation unit that, using the first input signal and a second input signal, estimates a second non-stationary noise component composed of a sum of a stationary noise component and a non-stationary noise component included in the first input signal and outputs a third estimated noise; an estimated noise integrating unit that estimates the stationary noise component and the second non-stationary noise component included in the first input signal using the second estimated noise from the second noise estimation unit and the third estimated noise from the third noise estimation unit; and a second noise removing unit that removes the stationary noise component and the second non-stationary noise component included in the first input signal.
- The voice detection system according to the present invention includes: a first noise estimation unit that estimates a stationary noise component included in a first input signal and outputs a first estimated noise; a first noise removing unit that, using the first input signal and the first estimated noise from the first noise estimation unit, outputs a first estimated speech obtained by removing the stationary noise component from the first input signal; a second noise estimation unit that, using at least the first input signal and the first estimated speech from the first noise removing unit, re-estimates the stationary noise component included in the first input signal and outputs a second estimated noise; a third noise estimation unit that, using the first input signal and a second input signal, estimates a second non-stationary noise component composed of a sum of a stationary noise component and a non-stationary noise component included in the first input signal and outputs a third estimated noise; an estimated noise integrating unit that estimates the stationary noise component and the second non-stationary noise component included in the first input signal using the second estimated noise from the second noise estimation unit and the third estimated noise from the third noise estimation unit; a second noise removing unit that outputs a second estimated speech obtained by removing the stationary noise component and the second non-stationary noise component from the first input signal; a normalizing unit that normalizes the second estimated speech from the second noise removing unit with the second estimated noise from the second noise estimation unit or the first estimated noise from the first noise estimation unit; and a voice detection unit that detects voice using the normalized speech from the normalizing unit.
- The speech recognition system according to the present invention includes: a first noise estimation unit that estimates a stationary noise component included in a first input signal and outputs a first estimated noise; a first noise removing unit that, using the first input signal and the first estimated noise from the first noise estimation unit, outputs a first estimated speech obtained by removing the stationary noise component from the first input signal; a second noise estimation unit that, using at least the first input signal and the first estimated speech from the first noise removing unit, re-estimates the stationary noise component included in the first input signal and outputs a second estimated noise; a third noise estimation unit that, using the first input signal and a second input signal, estimates a second non-stationary noise component composed of a sum of a stationary noise component and a non-stationary noise component included in the first input signal and outputs a third estimated noise; an estimated noise integrating unit that estimates the stationary noise component and the second non-stationary noise component included in the first input signal using the second estimated noise from the second noise estimation unit and the third estimated noise from the third noise estimation unit; a second noise removing unit that outputs a second estimated speech obtained by removing the stationary noise component and the second non-stationary noise component from the first input signal; a normalizing unit that normalizes the second estimated speech from the second noise removing unit with the second estimated noise from the second noise estimation unit or the first estimated noise from the first noise estimation unit; a voice detection unit that detects voice using the normalized speech from the normalizing unit; and a speech recognition unit that recognizes speech upon receiving the first estimated speech from the first noise removing unit and the detection result from the voice detection unit.
- Another speech recognition system according to the present invention includes: a first noise estimation unit that estimates a stationary noise component included in a first input signal and outputs a first estimated noise; a first noise removing unit that, using the first input signal and the first estimated noise from the first noise estimation unit, outputs a first estimated speech obtained by removing the stationary noise component from the first input signal; a second noise estimation unit that, using at least the first input signal and the first estimated speech from the first noise removing unit, re-estimates the stationary noise component included in the first input signal and outputs a second estimated noise; a third noise estimation unit that, using the first input signal and a second input signal, estimates a second non-stationary noise component composed of a sum of a stationary noise component and a non-stationary noise component included in the first input signal and outputs a third estimated noise; an estimated noise integrating unit that estimates the stationary noise component and the second non-stationary noise component included in the first input signal using the second estimated noise from the second noise estimation unit and the third estimated noise from the third noise estimation unit; a second noise removing unit that outputs a second estimated speech obtained by removing the stationary noise component and the second non-stationary noise component from the first input signal; a normalizing unit that normalizes the second estimated speech from the second noise removing unit with the second estimated noise from the second noise estimation unit or the first estimated noise from the first noise estimation unit; a voice detection unit that detects voice using the normalized speech from the normalizing unit; a third noise removing unit that, using the first input signal and the second estimated noise from the second noise estimation unit, outputs a third estimated speech obtained by removing the stationary noise component from the first input signal; and a speech recognition unit that recognizes speech upon receiving the third estimated speech from the third noise removing unit and the detection result from the voice detection unit.
- The noise removal method according to the present invention: estimates a stationary noise component included in a first input signal and outputs a first estimated noise; outputs, using the first input signal and the first estimated noise, a first estimated speech obtained by removing the stationary noise component from the first input signal; re-estimates, using at least the first input signal and the first estimated speech, the stationary noise component included in the first input signal and outputs a second estimated noise; estimates, using the first input signal and a second input signal, a second non-stationary noise component composed of a sum of a stationary noise component and a non-stationary noise component included in the first input signal and outputs a third estimated noise; estimates, using the second estimated noise and the third estimated noise, the stationary noise component and the second non-stationary noise component included in the first input signal; and removes the stationary noise component and the second non-stationary noise component included in the first input signal.
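- The noise removal program according to the present invention causes a computer to execute: a process of estimating a stationary noise component included in a first input signal and outputting a first estimated noise; a process of outputting, using the first input signal and the first estimated noise, a first estimated speech obtained by removing the stationary noise component from the first input signal; a process of re-estimating, using at least the first input signal and the first estimated speech, the stationary noise component included in the first input signal and outputting a second estimated noise; a process of estimating, using the first input signal and a second input signal, a second non-stationary noise component composed of a sum of a stationary noise component and a non-stationary noise component included in the first input signal and outputting a third estimated noise; a process of estimating, using the second estimated noise and the third estimated noise, the stationary noise component and the second non-stationary noise component included in the first input signal; and a process of removing the stationary noise component and the second non-stationary noise component included in the first input signal.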
- the noise included in the input signal can be estimated with high accuracy, and the noise included in the input signal can be removed with high accuracy using the estimated noise.
- Specifically, a stationary noise component included in the first input signal is estimated with high accuracy using the first input signal, and a non-stationary noise component included in the first input signal is further estimated using the second input signal. The estimated stationary noise component and non-stationary noise component are then integrated and removed from the first input signal, so that the noise included in the first input signal can be removed with high accuracy.
- FIG. 1 is a block diagram showing the configuration of the noise removal system of the first embodiment of the present invention. FIG. 2 is a flowchart showing the processing of the noise removal system of the first embodiment of the present invention. It is a block diagram which shows the structure of the audio
- Embodiment 1. A first embodiment of the present invention will be described below with reference to the drawings.
- FIG. 1 is a block diagram showing a configuration of a noise removal system according to a first embodiment of the present invention.
- The noise removal system includes a first microphone (hereinafter referred to as a microphone) 101, a second microphone 102, a first noise estimation unit 111, a second noise estimation unit 112, a third noise estimation unit 113, an estimated noise integration unit 114, a first noise removal unit 121, and a second noise removal unit 122.
- the first microphone 101 outputs a signal based on the input voice (hereinafter referred to as a first input signal).
- the first noise estimation unit 111 estimates a stationary noise component included in the first input signal, and outputs the first estimated noise.
- The first noise removal unit 121 removes the stationary noise component included in the first input signal, using the first input signal and the first estimated noise obtained by the first noise estimation unit 111.
- the first noise removing unit 121 outputs the first input signal from which the stationary noise component is removed as the first estimated speech.
- The second noise estimation unit 112 re-estimates the stationary noise component included in the first input signal, using at least the first input signal and the first estimated speech obtained by the first noise removal unit 121, and outputs the second estimated noise.
- the second microphone 102 outputs a signal based on the input voice (hereinafter referred to as a second input signal).
- The third noise estimation unit 113 estimates a non-stationary noise component included in the first input signal using the first input signal and the second input signal, and outputs the third estimated noise.
- The estimated noise integration unit 114 estimates the stationary noise component and the non-stationary noise component included in the first input signal, using the second estimated noise from the second noise estimation unit 112 and the third estimated noise from the third noise estimation unit 113, and outputs a fourth estimated noise.
- The second noise removal unit 122 removes the stationary noise component and the non-stationary noise component included in the first input signal, using the first input signal and the fourth estimated noise obtained by the estimated noise integration unit 114.
- The first noise estimation unit 111, the second noise estimation unit 112, the third noise estimation unit 113, the estimated noise integration unit 114, the first noise removal unit 121, and the second noise removal unit 122 are realized, for example, by a computer operating according to a noise removal program.
- In this case, the CPU reads the noise removal program and, in accordance with the program, operates as the first noise estimation unit 111, the second noise estimation unit 112, the third noise estimation unit 113, the estimated noise integration unit 114, the first noise removal unit 121, and the second noise removal unit 122.
- The first noise estimation unit 111, the second noise estimation unit 112, the third noise estimation unit 113, the estimated noise integration unit 114, the first noise removal unit 121, and the second noise removal unit 122 may also each be realized by separate hardware.
- FIG. 2 is a flowchart showing processing of the noise removal system according to the first embodiment of the present invention.
- the frequency spectrum of the audio signal is S (f, t) and the frequency spectrum of the noise signal is N (k, f, t).
- The frequency spectrum X1(f, t) of the first input signal, which is the output of the first microphone 101, and the frequency spectrum X2(f, t) of the second input signal, which is the output of the second microphone 102, are modeled by Equation 1 and Equation 2, respectively.
- f is a frequency index.
- t is a time index.
- k is the index of the noise source.
- the frequency spectrum is handled as a power spectrum and an amplitude power spectrum.
- the multiplication symbol “x” may be omitted.
- H0 (f, t) is a frequency spectrum of a path difference when the audio signal S (f, t) is transmitted to the first microphone 101 and the second microphone 102.
- H (k, f, t) is a frequency spectrum of a path difference when the noise signal N (k, f, t) of the noise source k is transmitted to the second microphone 102 and the first microphone 101.
- $\sum_{x=\text{lower limit}}^{\text{upper limit}} f(x)$ is the sum of f(x) when the variable x is changed from the lower limit to the upper limit.
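The displayed forms of Equation 1 and Equation 2 did not survive in this text. As a hedged reconstruction, inferred only from the symbol definitions above and from the statements elsewhere in the description that H(k, f, t) appears in Equation 1 and that H0(f, t)S(f, t) is the first right-hand term of Equation 2, the two-microphone model may have the following shape. The number of noise sources K is an assumed symbol, and the exact form in the published application may differ.

```latex
\begin{align}
X_1(f,t) &= S(f,t) + \sum_{k=1}^{K} H(k,f,t)\,N(k,f,t) \\
X_2(f,t) &= H_0(f,t)\,S(f,t) + \sum_{k=1}^{K} N(k,f,t)
\end{align}
```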
- The noise to be removed in Equation 1 is divided into a frequency spectrum Ns(f, t) of stationary noise, which is the first noise component, and a frequency spectrum Nn(f, t) of non-stationary noise, which is the second noise component.
- The first noise estimation unit 111 acquires the first input signal represented by Equation 3 from the first microphone 101 (step S1) and estimates the stationary noise component Ns(f, t) included in the first input signal X1(f, t) (step S2).
- For example, the average (time average) of X1(f, t) is used as the first estimated noise Ns′1(f, t): Ns′1(f, t) = ave_{t}[X1(f, t)].
- ave_{x}[f(x)] is an operator that averages f(x) with respect to x.
- As another estimation method, a histogram of the input signal X1(f, t) may be created and its minimum value used as the first estimated noise Ns′1(f, t).
- There is also a method of estimating the first estimated noise Ns′1(f, t) using the estimation method described in Japanese Patent Laid-Open No. 2002-204175.
- the first noise estimation unit 111 may estimate the first estimated noise Ns′1 (f, t) using a method different from the above example.
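The two simplest estimators named above (time average and minimum value) can be sketched as follows. This is a minimal illustration, not the method of the application: the function names and the list-of-lists layout (one power spectrum per time index t) are assumptions, and the histogram step is approximated by taking the smallest observed value per frequency bin.

```python
def estimate_stationary_noise_mean(frames):
    """Time average of X1(f, t) per frequency bin f.

    frames: list of power spectra, one list of floats per time index t
    (assumed layout, all frames the same length).
    """
    n_bins = len(frames[0])
    return [sum(frame[f] for frame in frames) / len(frames) for f in range(n_bins)]


def estimate_stationary_noise_min(frames):
    """Smallest observed value of X1(f, t) per frequency bin f.

    Stands in for the "minimum of the histogram" variant; a real
    histogram would quantise the values first.
    """
    n_bins = len(frames[0])
    return [min(frame[f] for frame in frames) for f in range(n_bins)]


frames = [[1.0, 4.0], [3.0, 2.0]]  # two time frames, two frequency bins
noise_mean = estimate_stationary_noise_mean(frames)
noise_min = estimate_stationary_noise_min(frames)
```

The time average is biased upward when speech is present in the averaged frames, which is one reason a minimum-based estimate can be preferable.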
- The first noise removal unit 121 obtains the first estimated speech S′1(f, t) (step S3).
- An example of a method for estimating the first estimated speech S′1 (f, t) will be described below.
- the first noise removing unit 121 may estimate the first estimated speech S′1 (f, t) using a method different from the above example.
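The example method for step S3 is not preserved in this text. One common way to obtain an estimated speech from X1(f, t) and an estimated noise is power spectral subtraction with a spectral floor; the sketch below uses that technique as a stand-in. The function name and the `floor` parameter are assumptions, not taken from the application.

```python
def spectral_subtract(x1, noise, floor=0.01):
    """Subtract an estimated noise power spectrum from one input frame.

    x1, noise: power spectra for a single time index (lists of floats).
    floor: assumed fraction of the input kept as a lower bound, so the
    result never becomes negative.
    """
    return [max(x - n, floor * x) for x, n in zip(x1, noise)]


s1 = spectral_subtract([10.0, 1.0], [4.0, 2.0])
```

In the second bin the noise estimate exceeds the input, so the floor applies and the output is a small positive value rather than a negative power.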
- The second noise estimation unit 112 obtains a second estimated noise Ns′2(f, t) using at least the first input signal X1(f, t) and the first estimated speech S′1(f, t) (step S4). For example:
- Ns′2(f, t) = X1(f, t) − S′1(f, t)
- The second noise estimation unit 112 may also use the first estimated noise Ns′1(f, t) to estimate the second estimated noise Ns′2(f, t).
- the second noise estimation unit 112 may estimate the second estimated noise Ns′2 (f, t) using a method different from the above example.
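The subtraction form of step S4 can be sketched per frame as below. The clamp at zero is an added assumption to keep the result a valid power spectrum; the function name is hypothetical.

```python
def reestimate_stationary_noise(x1, s1):
    """Second estimated noise for one frame: input minus first estimated speech.

    x1: power spectrum of the first input signal at one time index.
    s1: first estimated speech at the same time index.
    Negative differences are clamped to 0.0 (assumption).
    """
    return [max(x - s, 0.0) for x, s in zip(x1, s1)]


ns2 = reestimate_stationary_noise([10.0, 1.0], [6.0, 0.5])
```

Because the difference is taken frame by frame against the actual input, the result follows the instantaneous fluctuation of the noise instead of only its long-term average.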
- By using the first estimated speech S′1(f, t) in addition to the first input signal X1(f, t), the second noise estimation unit 112 can estimate the stationary noise component Ns(f, t) included in X1(f, t) with higher accuracy than the first estimated noise Ns′1(f, t). In particular, the second noise estimation unit 112 can estimate not only the average value Nsm(f, t) of the stationary noise component shown in Equation 3 but also Nsv(f, t), the difference between Ns(f, t) and Nsm(f, t).
- This is because WI(f, t) takes a value close to 1, and the second estimated noise Ns′2(f, t) is obtained by multiplying the first input signal X1(f, t), which includes Nsv(f, t), by this WI(f, t).
- the non-stationary noise component Nn (f, t) is not included in the second estimated noise Ns′2 (f, t). This is because the non-stationary noise component is not regarded as noise in the first estimated noise Ns′1 (f, t).
- The third noise estimation unit 113 acquires the second input signal from the second microphone 102 (step S5) and obtains the third estimated noise Nn′1(f, t) using the first input signal X1(f, t) and the second input signal X2(f, t) (step S6).
- H ′ (f, t) is an estimated value of H (k, f, t) included in Equation 1, and may be estimated by a method other than the method shown in the above example. If the value of H ′ (f, t) can be obtained in advance, that value may be used.
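The example formula for step S6 is not preserved in this text. The sketch below assumes the third estimated noise is the second input signal scaled by the estimated path term, Nn′1(f, t) = H′(f, t) X2(f, t), and assumes H′ is estimated as the per-bin ratio of summed power in the two microphones over frames known to contain no speech. Both assumptions, and all function names, are illustrative only.

```python
def estimate_path_gain(x1_noise_frames, x2_noise_frames, eps=1e-12):
    """Per-bin ratio of first-microphone to second-microphone noise power.

    Both arguments are lists of power spectra taken from frames that are
    assumed to contain noise only. eps avoids division by zero.
    """
    n_bins = len(x1_noise_frames[0])
    gains = []
    for f in range(n_bins):
        num = sum(frame[f] for frame in x1_noise_frames)
        den = sum(frame[f] for frame in x2_noise_frames)
        gains.append(num / (den + eps))
    return gains


def estimate_third_noise(x2, gains):
    """Scale one frame of the second input signal by the path gains."""
    return [h * x for h, x in zip(gains, x2)]


gains = estimate_path_gain([[2.0], [4.0]], [[4.0], [8.0]])
nn1 = estimate_third_noise([10.0], [0.5])
```

If the value of H′(f, t) is known in advance, `estimate_path_gain` can be skipped and the known gains passed directly.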
- The third estimated noise Nn′1(f, t) includes a non-stationary noise component that is not included in the second estimated noise Ns′2(f, t). However, for the stationary noise component included in the first input signal from the first microphone 101, the second estimated noise Ns′2(f, t), which is estimated including the difference Nsv(f, t) between Ns(f, t) and its average value Nsm(f, t), is more accurate than the third estimated noise Nn′1(f, t) based on the second input signal.
- The estimated noise integration unit 114 estimates the stationary noise and the non-stationary noise included in the first input signal with high accuracy, using the second estimated noise Ns′2(f, t) output from the second noise estimation unit 112 and the third estimated noise Nn′1(f, t) output from the third noise estimation unit 113 (step S7).
- the estimated noise integration unit 114 outputs the estimated noise as the fourth estimated noise N ′ (f, t).
- N′(f, t) = (1 − α(f, t)) {βs(f, t) Ns′2(f, t)} + α(f, t) {βn(f, t) Nn′1(f, t)}
- α(f, t) is a coefficient that controls the mixing of βs(f, t) Ns′2(f, t) and βn(f, t) Nn′1(f, t) (hereinafter referred to as a mixing coefficient).
- βs(f, t) is a coefficient (hereinafter referred to as an adjustment coefficient) for finely adjusting the estimated value Ns′2(f, t) of the stationary noise component.
- βn(f, t) is an adjustment coefficient for finely adjusting the estimated value Nn′1(f, t) of noise including non-stationary noise.
- βs(f, t) and βn(f, t) are normally preferably 1.0, but may be set to a value greater than 1.0 if noise is to be overestimated and to a value smaller than 1.0 if it is to be underestimated.
- The mixing coefficient α(f, t) should take a value close to 1.0 when non-stationary noise exists and a value close to 0.0 when non-stationary noise does not exist. For example, the following may be performed.
- Using the operation max[], which takes the maximum value, the fourth estimated noise N′(f, t) is as follows.
- N′(f, t) = max[βs(f, t) Ns′2(f, t), βn(f, t) Nn′1(f, t)]
- Alternatively, α(f, t) may approach 1.0 as the non-stationary noise increases and approach 0.0 as it decreases. Note that the calculation method of α(f, t) may be different from the above examples.
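The integration step can be sketched as a per-bin weighted mix of the two noise estimates. The names `alpha`, `beta_s` and `beta_n` are labels chosen here for the mixing coefficient and the two adjustment coefficients, since the original symbols are not legible in this text. The hard 0/1 choice of the mixing coefficient shown below is one assumed rule that makes the mix coincide with the max[] form; it is not stated to be the rule used in the application.

```python
def integrate_noise(ns2, nn1, alpha, beta_s=1.0, beta_n=1.0):
    """Fourth estimated noise for one frame.

    ns2: second estimated noise (stationary estimate), per bin.
    nn1: third estimated noise (includes non-stationary noise), per bin.
    alpha: mixing coefficient per bin, between 0.0 and 1.0.
    """
    return [
        (1.0 - a) * beta_s * s + a * beta_n * n
        for s, n, a in zip(ns2, nn1, alpha)
    ]


def hard_mixing_coefficient(ns2, nn1, beta_s=1.0, beta_n=1.0):
    """1.0 where the adjusted third estimate dominates, else 0.0 (assumed rule)."""
    return [1.0 if beta_n * n > beta_s * s else 0.0 for s, n in zip(ns2, nn1)]


ns2 = [2.0, 5.0]
nn1 = [3.0, 1.0]
alpha = hard_mixing_coefficient(ns2, nn1)
n4_hard = integrate_noise(ns2, nn1, alpha)
n4_soft = integrate_noise(ns2, nn1, [0.5, 0.5])
```

With the hard coefficient each bin simply takes the larger of the two adjusted estimates; a soft coefficient trades that off against smoother behaviour over time.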
- the second noise removal unit 122 removes the noise included in the first input signal X1 (f, t) using the fourth estimated noise N ′ (f, t) (step S8).
- the first input signal X1 (f, t) from which noise is removed is output as the second estimated speech S′2 (f, t).
- the second noise removing unit 122 can use the method shown in the following example.
- the second noise removal unit 122 may estimate the second estimated speech S′2 (f, t) using a method different from the above example.
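The example method for step S8 is not preserved in this text. As a stand-in, the sketch below applies a per-bin suppression gain derived from the fourth estimated noise, which is a gain formulation of spectral subtraction. The function name and the `floor` parameter are assumptions.

```python
def remove_noise_with_gain(x1, noise, floor=0.1):
    """Second estimated speech for one frame via a suppression gain.

    gain = max(1 - noise / x1, floor); output = gain * x1.
    floor is an assumed lower bound on the gain that limits distortion
    when the noise estimate exceeds the input.
    """
    out = []
    for x, n in zip(x1, noise):
        gain = max(1.0 - n / x, floor) if x > 0.0 else floor
        out.append(gain * x)
    return out


s2 = remove_noise_with_gain([10.0, 2.0], [4.0, 5.0])
```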
- As described above, in this embodiment, the estimated value Ns′2(f, t) of the stationary noise component based on the first input signal and the noise estimation value Nn′1(f, t) including a non-stationary noise component based on the second input signal are integrated.
- Therefore, the stationary noise component and the non-stationary noise component included in the first input signal can be estimated with high accuracy.
- Furthermore, noise is removed from the first input signal based on the estimated stationary noise component and non-stationary noise component, so that noise can be removed with high accuracy.
- The non-stationary noise component to be removed in the method described in Patent Document 2 is sound reproduced by a CD player or the like (echo from a speaker) that enters via a microphone.
- In that method, the non-stationary noise component is estimated using, as a reference signal, the electric signal from the CD player or the like before it is converted into sound. Therefore, non-stationary noise components not included in the reference signal cannot be estimated and cannot be removed.
- In contrast, the non-stationary noise component estimated by the third noise estimation unit 113 in this embodiment includes, in addition to the non-stationary noise component reproduced from the speaker, the stationary noise component and non-stationary noise components that are not reproduced from the speaker. That is, the third noise estimation unit 113 estimates a non-stationary noise component composed of a sum of a stationary noise component and a non-stationary noise component (hereinafter referred to as a second non-stationary noise component). Therefore, according to the present embodiment, in addition to the non-stationary noise component reproduced from the speaker, the stationary noise component and the non-stationary noise component not reproduced from the speaker can be removed.
- FIG. 3 is a block diagram showing the configuration of the voice detection system according to the second embodiment of the present invention.
- FIG. 4 is a block diagram illustrating another configuration of the voice detection system according to the second embodiment.
- Components that are the same as in the first embodiment are given the same reference numerals as in FIG. 1, and their description is omitted.
- the voice detection system includes a normalization unit 131 and a voice detection unit 132 in addition to the configuration of the noise removal system of the first embodiment.
- The normalization unit 131 normalizes the second estimated speech S′2(f, t) from the second noise removal unit 122 using the second estimated noise Ns′2(f, t) from the second noise estimation unit 112.
- The voice detection unit 132 detects voice using the normalized speech from the normalization unit 131.
- The first noise estimation unit 111, the second noise estimation unit 112, the third noise estimation unit 113, the estimated noise integration unit 114, the first noise removal unit 121, the second noise removal unit 122, the normalization unit 131, and the voice detection unit 132 are realized, for example, by a computer that operates according to a voice detection program.
- In this case, the CPU reads the voice detection program and, in accordance with the program, operates as the first noise estimation unit 111, the second noise estimation unit 112, the third noise estimation unit 113, the estimated noise integration unit 114, the first noise removal unit 121, the second noise removal unit 122, the normalization unit 131, and the voice detection unit 132.
- The first noise estimation unit 111, the second noise estimation unit 112, the third noise estimation unit 113, the estimated noise integration unit 114, the first noise removal unit 121, the second noise removal unit 122, the normalization unit 131, and the voice detection unit 132 may also each be realized by separate hardware.
- The normalization unit 131 normalizes the second estimated speech S′2(f, t) from the second noise removal unit 122 using the second estimated noise Ns′2(f, t) from the second noise estimation unit 112, and outputs normalized speech Sn′(f, t). Equation 4 shows an example of normalization of the second estimated speech S′2(f, t).
- With this normalization, the value of the normalized speech Sn′(f, t) does not change. That is, when detecting voice using Sn′(f, t), the detection threshold can be set easily.
- When the first term H0(f, t)S(f, t) on the right side of Equation 2 for the second input signal (hereinafter referred to as crosstalk) cannot be ignored, normalizing with the second estimated noise Ns′2(f, t), which does not include the crosstalk, instead of the fourth estimated noise N′(f, t) makes Sn′(f, t) a larger value in the speech section. That is, when detecting voice using Sn′(f, t), the speech section can be detected with higher accuracy.
- In Equation 4, normalized speech is output for each frequency index f and time index t, but it may be averaged over frequency or time. Further, as illustrated in FIG. 4, the normalization unit 131 may receive the first estimated noise Ns′1(f, t), which likewise does not include crosstalk, instead of the second estimated noise Ns′2(f, t), and normalize using the first estimated noise Ns′1(f, t).
- A small amount of non-stationary noise may be mixed into the second estimated noise Ns′2(f, t) or the first estimated noise Ns′1(f, t) used for normalization. That is, the second estimated noise Ns′2(f, t) or the first estimated noise Ns′1(f, t) mixed with a small amount of non-stationary noise may be used for normalization.
- γ(f, t) is a coefficient that controls the degree to which non-stationary noise is mixed into Ns′2(f, t) or Ns′1(f, t), and is a positive number smaller than 1.
- For example, when γ(f, t) = 0.01 is set, 1% of the non-stationary noise included in Nn′1(f, t) is mixed into Ns′2(f, t).
- By mixing in non-stationary noise in this way, the adverse effect of a small amount of non-stationary noise remaining in S′2(f, t) can be reduced.
- In particular, when the stationary noise is very small compared to the non-stationary noise, the adverse effect of the remaining trace of non-stationary noise is large, so the benefit of mixing in a trace of non-stationary noise is also large.
- γ(f, t) may be set to a larger value at lower frequencies (smaller f), where non-stationary noise is more difficult to estimate.
- γ(f, t) may also be set to a larger value the smaller the stationary noise is relative to the non-stationary noise.
- the voice detection unit 132 detects the voice using the normalized voice Sn ′ (f, t) from the normalization unit 131 and outputs the detection result. Examples of detection results are shown below.
- Sn ′ (t) is the normalized speech obtained by averaging over the frequency f when calculating Sn ′ (f, t). At time t, if Sn ′ (t) is larger than the threshold Th, the section is determined to be the target voice section, and if it is smaller than the threshold Th, the section is determined not to be the target voice section.
- the second estimated speech S′2 (f, t) from the second noise removing unit 122, from which noise has been removed with high accuracy, is normalized using the second estimated noise Ns′2 (f, t) from the second noise estimating unit 112. Thereby, setting of the threshold value in the voice detection unit 132 is facilitated.
- normalization is performed using the second estimated noise Ns′2 (f, t), which does not include the crosstalk, instead of the fourth estimated noise N ′ (f, t). As a result, Sn ′ (f, t) becomes a larger value in the speech section. That is, when detecting speech using Sn ′ (f, t), the speech section can be detected with higher accuracy.
- FIG. 5 is a block diagram showing the configuration of the speech recognition system according to the third embodiment of the present invention.
- components that are the same as in FIG. 3 are given the same reference symbols, and their description is omitted.
- the voice recognition system includes a voice recognition unit 133 in addition to the configuration of the voice detection system of the second embodiment.
- the voice recognition unit 133 recognizes the voice in response to the first estimated voice S′1 (f, t) from the first noise removal unit 121 and the detection result from the voice detection unit 132, and outputs the voice recognition result.
- the speech recognition unit 133 recognizes the first estimated speech S′1 (f, t) from the first noise removal unit 121 when the received detection result is the target speech section.
- by using, as the input to the speech recognition unit 133, the first estimated speech S′1 (f, t) from the first noise removing unit 121, which is not affected by the crosstalk, rather than the second estimated speech S′2 (f, t) from the second noise removing unit 122, it is possible to prevent a decrease in the speech recognition rate due to the influence of crosstalk.
- FIG. 6 is a block diagram illustrating another configuration of the speech recognition system according to the third embodiment.
- the voice recognition system shown in FIG. 6 includes a third noise removal unit 123 in addition to the configuration of the voice recognition system shown in FIG.
- the third noise removing unit 123 uses the first input signal and the second estimated noise Ns′2 (f, t), which does not include crosstalk, to obtain a third estimated speech by the same method as the first noise removing unit 121 and the second noise removing unit 122. Then, the third noise removal unit 123 outputs the third estimated speech to the speech recognition unit 133.
- the voice is recognized using the first estimated voice S′1 (f, t) from the first noise removing unit 121 and the detection result from the voice detecting unit 132, and the speech recognition result is output. As described above, by using the high-precision detection result from the voice detection unit 132 and the first estimated voice S′1 (f, t) from the first noise removal unit 121, which is not affected by the crosstalk, as inputs to the voice recognition unit 133, a high speech recognition rate can be achieved.
- the noise removal unit 122, the normalization unit 131, the voice detection unit 132, the voice recognition unit 133, and the third noise removal unit 123 are realized by, for example, a computer that operates according to a voice recognition program.
- the CPU reads the voice recognition program and, according to the program, operates as the first noise estimation unit 111, the second noise estimation unit 112, the third noise estimation unit 113, the estimated noise integration unit 114, the first noise removal unit 121, the second noise removal unit 122, the normalization unit 131, the voice detection unit 132, the voice recognition unit 133, and the third noise removal unit 123.
- the unit 131, the voice detection unit 132, the voice recognition unit 133, and the third noise removal unit 123 may be realized by separate hardware.
- FIG. 7 is an explanatory diagram showing an embodiment of a voice recognition system according to the present invention.
- the terminal 200 is a tablet terminal, for example, and the speaker 300 operates the touch panel 201 installed on the terminal 200.
- the side on which the touch panel 201 is installed is the front surface of the terminal 200.
- the voice uttered by the speaker 300 is picked up by the first microphone 101 and the second microphone 102. It is desirable that the first microphone 101 and the second microphone 102 be arranged so that the speaker's voice is input at a higher level to the first microphone 101. Therefore, in this embodiment, as shown in FIG. 7, the first microphone 101 is arranged on the front surface of the terminal 200, and the second microphone 102 is arranged on the back surface of the terminal 200 so that the direct sound of the voice of the speaker 300 is not input to the second microphone 102.
- the direct sound of the speaker 300 is input to the first microphone 101, but only the reflected sound and the diffracted sound are input to the second microphone 102. For this reason, the voice of the speaker 300 is input at a higher level to the first microphone 101.
- it is desirable that the noise from noise sources such as the air conditioner 400 and the television 500 be input at a higher level to the second microphone 102.
- the speech recognition system can recognize speech with high accuracy.
- this is because the speech recognition system accurately estimates the stationary noise component and the non-stationary noise component included in the first input signal output from the first microphone 101, and removes noise from the first input signal based on the estimated stationary noise component and non-stationary noise component.
- although the case where the speech recognition system includes the first microphone 101 and the second microphone 102 has been described, the speech recognition system need not include the first microphone 101 and the second microphone 102.
- a microphone included in terminal 200 may be used as the first microphone and the second microphone.
- the noise removal system and the voice detection system may not include the first microphone 101 and the second microphone 102.
- the present invention can be applied to uses such as a noise removal system that can remove noise contained in an input signal, and a program for realizing the noise removal system in a computer.
- the speech detection system according to appendix 1, wherein the estimated noise integration unit multiplies the second estimated noise from the second noise estimator and the third estimated noise from the third noise estimator each by an adjustment coefficient, controls a mixing coefficient for mixing the two adjusted estimated noises in accordance with their magnitudes, and estimates the stationary noise component and the second non-stationary noise component included in the first input signal by multiplying the adjusted second estimated noise and the adjusted third estimated noise by their respective mixing coefficients and then adding them.
- the voice section can be detected with higher accuracy.
- the speech detection system according to supplementary note 1, wherein the estimated noise integration unit multiplies the second estimated noise from the second noise estimator and the third estimated noise from the third noise estimator each by an adjustment coefficient, and estimates the noise included in the first input signal by selecting the larger of the adjusted second estimated noise and the adjusted third estimated noise.
- the voice detection system according to any one of supplementary notes 1 to 3, comprising a first voice input device that outputs an input voice as a first input signal and a second voice input device that outputs an input voice as a second input signal, wherein the speech targeted for noise removal that is input to the first voice input device is larger than the speech targeted for noise removal that is input to the second voice input device.
- the voice can be detected with higher accuracy.
- a first noise removing unit that outputs a first estimated speech obtained by removing a stationary noise component from the first input signal using the first estimated noise, a first input signal, and a first
- a second noise estimator that re-estimates a stationary noise component included in the first input signal and outputs a second estimated noise using at least the first estimated speech from the noise removing unit
- Second non-stationary noise composed of a sum of stationary noise components and non-stationary noise components included in the first input signal using the first input signal and the second input signal
- a third noise estimator that estimates the component and outputs a third estimated noise; a second estimated noise from the second noise estimator;
- An estimated noise integrating unit that estimates a stationary noise component and a second non-stationary noise component included in the first input signal using the third estimated noise from the noise estimating unit;
- a speech recognition system comprising: a speech detection unit for detection; and a speech recognition unit that receives a first estimated speech from the first noise removal unit and a detection result from the speech detection unit and recognizes speech.
- the estimated noise integration unit multiplies the second estimated noise from the second noise estimator and the third estimated noise from the third noise estimator each by an adjustment coefficient, controls a mixing coefficient for mixing the two adjusted estimated noises in accordance with their magnitudes, and estimates the stationary noise component and the second non-stationary noise component included in the first input signal by multiplying the adjusted second estimated noise and the adjusted third estimated noise by their respective mixing coefficients and then adding them.
- the speech recognition system according to supplementary note 5, wherein the estimated noise integration unit multiplies the second estimated noise from the second noise estimator and the third estimated noise from the third noise estimator each by an adjustment coefficient, and estimates the noise included in the first input signal by selecting the larger of the adjusted second estimated noise and the adjusted third estimated noise.
- the speech recognition system according to any one of supplementary notes 5 to 7, comprising a first voice input device that outputs an input voice as a first input signal and a second voice input device that outputs an input voice as a second input signal, wherein the speech targeted for noise removal that is input to the first voice input device is larger than the speech targeted for noise removal that is input to the second voice input device.
- a higher speech recognition rate can be achieved even when stationary noise from an air conditioner or non-stationary noise from a television is emitted.
- a speech recognition system comprising: a third noise removing unit that outputs a third estimated speech; and a speech recognition unit that receives the third estimated speech from the third noise removing unit and a detection result from the speech detecting unit and recognizes the speech.
- the speech recognition system according to appendix 9, wherein the estimated noise integration unit multiplies the second estimated noise from the second noise estimator and the third estimated noise from the third noise estimator each by an adjustment coefficient, controls a mixing coefficient for mixing the two adjusted estimated noises in accordance with their magnitudes, and estimates the stationary noise component and the second non-stationary noise component included in the first input signal by multiplying the adjusted second estimated noise and the adjusted third estimated noise by their respective mixing coefficients and then adding them.
- the speech recognition system according to supplementary note 9, wherein the estimated noise integration unit multiplies the second estimated noise from the second noise estimator and the third estimated noise from the third noise estimator each by an adjustment coefficient, and estimates the noise included in the first input signal by selecting the larger of the adjusted second estimated noise and the adjusted third estimated noise.
- the speech recognition system according to any one of supplementary notes 9 to 11, comprising a first voice input device that outputs an input voice as a first input signal and a second voice input device that outputs the input voice as a second input signal, wherein the speech targeted for noise removal that is input to the first voice input device is larger than the speech targeted for noise removal that is input to the second voice input device.
- a stationary noise component included in the first input signal is estimated, the first estimated noise is output, and the first input signal and the first estimated noise are used to input the first input signal.
- a first estimated speech from which a stationary noise component is removed from the signal is output, and at least the first input signal and the first estimated speech are used to obtain a stationary noise component included in the first input signal.
- a second non-stationary noise component composed of the sum is estimated, a third estimated noise is output, and is included in the first input signal using the second estimated noise and the third estimated noise.
- the stationary noise component and the second non-stationary noise component are estimated: a noise removal method characterized by outputting a second estimated speech obtained by removing the stationary noise component and the second non-stationary noise component from the first input signal.
- the second estimated noise and the third estimated noise are each multiplied by an adjustment coefficient; a mixing coefficient for mixing the two adjusted estimated noises is controlled in accordance with their magnitudes; and the stationary noise component and the second non-stationary noise component included in the first input signal are estimated by multiplying the adjusted second estimated noise and the adjusted third estimated noise by their respective mixing coefficients and adding them.
- the second estimated noise and the third estimated noise are each multiplied by an adjustment coefficient, and the noise included in the first input signal is estimated by selecting the larger of the adjusted second estimated noise and the adjusted third estimated noise.
- a stationary noise component included in the first input signal is estimated, the first estimated noise is output, and the first input is performed using the first input signal and the first estimated noise.
- a first estimated speech from which a stationary noise component is removed from the signal is output, and at least the first input signal and the first estimated speech are used to obtain a stationary noise component included in the first input signal.
- a second non-stationary noise component composed of the sum is estimated, a third estimated noise is output, and is included in the first input signal using the second estimated noise and the third estimated noise.
- the stationary noise component and the second non-stationary noise component are estimated, and the stationary noise component and the second non-stationary noise component are estimated from the first input signal.
- a second estimated speech from which a noise component is removed is output, and the second estimated speech is detected by using the second estimated noise or a normalized speech obtained by normalizing with the first estimated noise. Voice detection method.
- the second estimated noise and the third estimated noise are each multiplied by an adjustment coefficient; a mixing coefficient for mixing the two adjusted estimated noises is controlled in accordance with their magnitudes; and the stationary noise component and the second non-stationary noise component included in the first input signal are estimated by multiplying the adjusted second estimated noise and the adjusted third estimated noise by their respective mixing coefficients and adding them.
- the second estimated noise and the third estimated noise are each multiplied by an adjustment coefficient, and the noise included in the first input signal is estimated by selecting the larger of the adjusted second estimated noise and the adjusted third estimated noise.
- a stationary noise component included in the first input signal is estimated, the first estimated noise is output, and the first input is performed using the first input signal and the first estimated noise.
- a first estimated speech from which a stationary noise component is removed from the signal is output, and at least the first input signal and the first estimated speech are used to obtain a stationary noise component included in the first input signal.
- a second non-stationary noise component composed of the sum is estimated, a third estimated noise is output, and is included in the first input signal using the second estimated noise and the third estimated noise.
- the stationary noise component and the second non-stationary noise component are estimated, and the stationary noise component and the second non-stationary noise component are estimated from the first input signal.
- the second estimated speech from which the noise component is removed is output, the second estimated speech is normalized with the second estimated noise or the first estimated noise, and the speech is detected.
- a speech recognition method characterized by recognizing speech by receiving the estimated speech and the detection result.
- the second estimated noise and the third estimated noise are each multiplied by an adjustment coefficient; a mixing coefficient for mixing the two adjusted estimated noises is controlled in accordance with their magnitudes; and the stationary noise component and the second non-stationary noise component included in the first input signal are estimated by multiplying the adjusted second estimated noise and the adjusted third estimated noise by their respective mixing coefficients and adding them.
- the second estimated noise and the third estimated noise are each multiplied by an adjustment coefficient, and the noise included in the first input signal is estimated by selecting the larger of the adjusted second estimated noise and the adjusted third estimated noise.
- the stationary noise component included in the first input signal is estimated, the first estimated noise is output, and the first input is performed using the first input signal and the first estimated noise.
- a first estimated speech from which a stationary noise component is removed from the signal is output, and at least the first input signal and the first estimated speech are used to obtain a stationary noise component included in the first input signal.
- a second non-stationary noise component composed of the sum is estimated, a third estimated noise is output, and is included in the first input signal using the second estimated noise and the third estimated noise.
- the stationary noise component and the second non-stationary noise component are estimated, and the stationary noise component and the second non-stationary noise component are estimated from the first input signal.
- the second estimated speech from which the noise component is removed is output, the second estimated speech is normalized with the second estimated noise or the first estimated noise, and the speech is detected.
- the third estimated speech obtained by removing the stationary noise component from the first input signal is output using the input signal and the second estimated noise, and the third estimated speech and the detection result are received as speech.
- a speech recognition method characterized by recognizing a voice.
- the second estimated noise and the third estimated noise are each multiplied by an adjustment coefficient; a mixing coefficient for mixing the two adjusted estimated noises is controlled in accordance with their magnitudes; and the stationary noise component and the second non-stationary noise component included in the first input signal are estimated by multiplying the adjusted second estimated noise and the adjusted third estimated noise by their respective mixing coefficients and adding them.
- the second estimated noise and the third estimated noise are each multiplied by an adjustment coefficient, and the noise included in the first input signal is estimated by selecting the larger of the adjusted second estimated noise and the adjusted third estimated noise.
- the noise removal program according to appendix 29, which causes the computer to execute a process of multiplying the second estimated noise and the third estimated noise each by an adjustment coefficient, controlling a mixing coefficient for mixing the two adjusted estimated noises in accordance with their magnitudes, and estimating the stationary noise component and the second non-stationary noise component included in the first input signal by multiplying the adjusted second estimated noise and the adjusted third estimated noise by their respective mixing coefficients and then adding them.
- the noise removal program according to supplementary note 29, which causes the computer to execute a process of multiplying the second estimated noise and the third estimated noise each by an adjustment coefficient and estimating the noise included in the first input signal by selecting the one having the larger value.
- the noise removal program according to any one of supplementary note 29 to supplementary note 31, which causes the computer to execute a process of inputting the first input signal and the second input signal such that the noise removal target speech included in the first input signal is larger than the noise removal target speech included in the second input signal.
- a voice detection program for executing a process of detecting a voice using a normalized voice normalized with the first estimated noise.
- the voice detection program according to supplementary note 33, which causes the computer to execute a process of multiplying the second estimated noise and the third estimated noise each by an adjustment coefficient, controlling a mixing coefficient for mixing the two adjusted estimated noises in accordance with their magnitudes, and estimating the stationary noise component and the second non-stationary noise component included in the first input signal by multiplying the adjusted second estimated noise and the adjusted third estimated noise by their respective mixing coefficients and then adding them.
- the voice detection program according to supplementary note 33, which causes the computer to execute a process of multiplying the second estimated noise and the third estimated noise each by an adjustment coefficient and estimating the noise included in the first input signal by selecting the one having the larger value.
- the voice detection program according to any one of supplementary note 33 to supplementary note 35, which causes the computer to execute a process of inputting the first input signal and the second input signal such that the noise removal target speech included in the first input signal is larger than the noise removal target speech included in the second input signal.
- a voice recognition program for executing a process of detecting a voice using a normalized voice normalized with the first estimated noise and a process of recognizing the voice in response to the first estimated voice and the voice detection result .
- the speech recognition program according to appendix 37, which causes the computer to execute a process of multiplying the second estimated noise and the third estimated noise each by an adjustment coefficient, controlling a mixing coefficient for mixing the two adjusted estimated noises in accordance with their magnitudes, and estimating the stationary noise component and the second non-stationary noise component included in the first input signal by multiplying the adjusted second estimated noise and the adjusted third estimated noise by their respective mixing coefficients and then adding them.
- the speech recognition program according to supplementary note 37, which causes the computer to execute a process of multiplying the second estimated noise and the third estimated noise each by an adjustment coefficient and estimating the noise included in the first input signal by selecting the one having the larger value.
- the speech recognition program according to any one of supplementary note 37 to supplementary note 39, which causes the computer to execute a process of inputting the first input signal and the second input signal such that the noise removal target speech included in the first input signal is larger than the noise removal target speech included in the second input signal.
- a stationary noise component is detected from the first input signal using the process of detecting the voice using the normalized voice normalized with the first estimated noise, and the first input signal and the second estimated noise.
- a speech recognition program for recognizing speech in response to a process of outputting the removed third estimated speech and the third estimated speech and speech detection results.
- the speech recognition program according to appendix 41, which causes the computer to execute a process of multiplying the second estimated noise and the third estimated noise each by an adjustment coefficient, controlling a mixing coefficient for mixing the two adjusted estimated noises in accordance with their magnitudes, and estimating the stationary noise component and the second non-stationary noise component included in the first input signal by multiplying the adjusted second estimated noise and the adjusted third estimated noise by their respective mixing coefficients and then adding them.
- the speech recognition program according to appendix 41, which causes the computer to execute a process of multiplying the second estimated noise and the third estimated noise each by an adjustment coefficient and estimating the noise included in the first input signal by selecting the one having the larger value.
- the speech recognition program according to any one of appendix 41 to appendix 43, which causes the computer to execute a process of inputting the first input signal and the second input signal such that the noise removal target speech included in the first input signal is larger than the noise removal target speech included in the second input signal.
Abstract
Description
Hereinafter, a first embodiment of the present invention will be described with reference to the drawings.
X2(f,t)=H0(f,t)S(f,t)+Σ_{k=1}^{K}N(k,f,t) (Expression 2)
W(f,t)=S’’1(f,t)/{S’’1(f,t)+Ns’1(f,t)}
S’’1(f,t)=0.98×S’’1(f,t-1)+0.02×max[X1(f,t)-Ns’1(f,t),0]
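The two update rules above can be read as a recursively smoothed speech estimate that feeds a Wiener-style gain. The following is a minimal sketch for a single frequency bin, assuming the spectra are non-negative magnitudes or powers. The function names, the `eps` guard, the zero initial state, and the final step of multiplying the gain by X1 to obtain the estimated speech are illustrative assumptions, not taken from the source.

```python
def update_smoothed_speech(prev_s, x1, ns1, keep=0.98):
    """S''1(f,t) = 0.98*S''1(f,t-1) + 0.02*max[X1(f,t) - Ns'1(f,t), 0]."""
    return keep * prev_s + (1.0 - keep) * max(x1 - ns1, 0.0)


def wiener_gain(s_smooth, ns1, eps=1e-12):
    """W(f,t) = S''1(f,t) / {S''1(f,t) + Ns'1(f,t)}.

    eps is an assumed guard against division by zero.
    """
    return s_smooth / (s_smooth + ns1 + eps)


def denoise_bin(x1_frames, ns1_frames):
    """Run the recursion over time for one frequency bin."""
    s_smooth = 0.0  # assumed initial state
    out = []
    for x1, ns1 in zip(x1_frames, ns1_frames):
        s_smooth = update_smoothed_speech(s_smooth, x1, ns1)
        # Assumption: the estimated speech is the gain applied to the input.
        out.append(wiener_gain(s_smooth, ns1) * x1)
    return out
```

With the 0.98/0.02 weights the gain rises slowly after speech onset, which trades responsiveness for a smoother estimate.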
WI(f,t)=Ns’1(f,t)/{S’1(f,t)+Ns’1(f,t)}
or,
WI(f,t)=1-S’1(f,t)/{S’1(f,t)+Ns’1(f,t)}
H’(f,t)=ave_{t}[X1(f,t)]/ave_{t}[X2(f,t)]
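The ratio above can be sketched as a ratio of time averages taken over a window of frames. The snippet below is an illustrative Python sketch for one frequency bin; the function name, the choice of a plain arithmetic mean over the supplied frames, and the `eps` guard are assumptions, since the expression only states `ave_{t}` without fixing the window.

```python
def estimate_transfer_ratio(x1_frames, x2_frames, eps=1e-12):
    """H'(f,t) = ave_t[X1(f,t)] / ave_t[X2(f,t)] for one frequency bin.

    The averaging window is whatever the caller passes in (an assumption).
    """
    if len(x1_frames) != len(x2_frames) or not x1_frames:
        raise ValueError("need two non-empty sequences of equal length")
    mean_x1 = sum(x1_frames) / len(x1_frames)
    mean_x2 = sum(x2_frames) / len(x2_frames)
    return mean_x1 / (mean_x2 + eps)
```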
α(f,t)=0.0 for βs(f,t)Ns’2(f,t) >= βn(f,t)Nn’1(f,t)
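The condition above, together with the claims on mixing and on selecting the larger value, suggests how the two scaled noise estimates might be combined into a single estimate N’(f,t). The Python sketch below is an assumption-laden illustration, not the method as filed: it assumes Ns’2 is the second estimated noise, Nn’1 is the third estimated noise, α weights the scaled Nn’1, and α switches to 1.0 whenever the stated condition does not hold (only the α = 0.0 case appears above). All function and parameter names are hypothetical.

```python
def integrate_by_max(ns2, nn1, beta_s=1.0, beta_n=1.0):
    """Select the larger of the two scaled noise estimates."""
    return max(beta_s * ns2, beta_n * nn1)


def integrate_by_mixing(ns2, nn1, beta_s=1.0, beta_n=1.0):
    """Mix the two scaled estimates with a magnitude-controlled coefficient."""
    scaled_s = beta_s * ns2
    scaled_n = beta_n * nn1
    # Stated above: alpha = 0.0 when beta_s * Ns'2 >= beta_n * Nn'1.
    # Assumption: alpha = 1.0 otherwise (a hard switch).
    alpha = 0.0 if scaled_s >= scaled_n else 1.0
    return (1.0 - alpha) * scaled_s + alpha * scaled_n
```

With a hard switch the two functions return the same value; a smoother control of α between 0 and 1 would make the mixing variant differ from plain selection.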
W(f,t)=S’’2(f,t)/{S’’2(f,t)+N’(f,t)}
S’’2(f,t)=0.98×S’’2(f,t-1)+0.02×max[X1(f,t)-N’(f,t),0]
Hereinafter, a second embodiment of the present invention will be described with reference to the drawings.
Ns’1(f,t)=(1-γ(f,t))Ns’1(f,t)+γ(f,t)Nn’1(f,t)
Detection result = non-target speech section for Sn’(t) < Th
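The threshold rule above can be sketched in Python as follows. This is a hedged illustration only: the claims state that the estimated speech is normalized by an estimated noise, but the exact definition of Sn’(t) is not given here, so `normalized_speech_level` assumes a mean over frequency bins of the speech-to-noise ratio. Treating frames at or above Th as target speech is likewise an assumption, and all names are hypothetical.

```python
def normalized_speech_level(speech_bins, noise_bins, eps=1e-12):
    """Assumed Sn'(t): mean over frequency bins of estimated speech / estimated noise."""
    if len(speech_bins) != len(noise_bins) or not speech_bins:
        raise ValueError("need two non-empty sequences of equal length")
    ratios = [s / (n + eps) for s, n in zip(speech_bins, noise_bins)]
    return sum(ratios) / len(ratios)


def detect_speech_sections(sn_values, threshold):
    """Return one flag per frame: False = non-target speech section (Sn'(t) < Th)."""
    return [sn >= threshold for sn in sn_values]
```

Normalizing by the noise estimate keeps a fixed threshold meaningful when the background level changes, which is presumably why the normalization precedes detection.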
Hereinafter, a third embodiment of the present invention will be described with reference to the drawings.
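102 Second microphone
111 First noise estimation unit
112 Second noise estimation unit
113 Third noise estimation unit
114 Estimated noise integration unit
121 First noise removal unit
122 Second noise removal unit
123 Third noise removal unit
131 Normalization unit
132 Speech detection unit
133 Speech recognition unit
200 Terminal
201 Touch panel
300 Speaker
400 Air conditioner
500 Television
611 First noise estimation unit
612 Second noise estimation unit
621 First noise attenuation unit
622 Second noise attenuation unit
631 Speech pattern storage unit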
Claims (9)
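- A noise removal system comprising:
a first noise estimation unit that estimates a stationary noise component included in a first input signal and outputs a first estimated noise;
a first noise removal unit that, using the first input signal and the first estimated noise from the first noise estimation unit, outputs a first estimated speech obtained by removing the stationary noise component from the first input signal;
a second noise estimation unit that, using at least the first input signal and the first estimated speech from the first noise removal unit, re-estimates the stationary noise component included in the first input signal and outputs a second estimated noise;
a third noise estimation unit that, using the first input signal and a second input signal, estimates a second non-stationary noise component consisting of the sum of the stationary noise component and a non-stationary noise component included in the first input signal, and outputs a third estimated noise;
an estimated noise integration unit that, using the second estimated noise from the second noise estimation unit and the third estimated noise from the third noise estimation unit, estimates the stationary noise component and the second non-stationary noise component included in the first input signal; and
a second noise removal unit that removes the stationary noise component and the second non-stationary noise component included in the first input signal.
- The noise removal system according to claim 1, wherein the estimated noise integration unit multiplies the second estimated noise from the second noise estimation unit and the third estimated noise from the third noise estimation unit by respective adjustment coefficients, controls, according to the magnitudes of the adjusted second estimated noise and the adjusted third estimated noise, mixing coefficients for mixing the two, and estimates the stationary noise component and the second non-stationary noise component included in the first input signal by multiplying the adjusted second estimated noise and the adjusted third estimated noise by their respective mixing coefficients and adding the results.
- The noise removal system according to claim 1, wherein the estimated noise integration unit multiplies the second estimated noise from the second noise estimation unit and the third estimated noise from the third noise estimation unit by respective adjustment coefficients, and estimates the noise included in the first input signal by selecting the larger of the adjusted second estimated noise and the adjusted third estimated noise.
- The noise removal system according to any one of claims 1 to 3, further comprising a first speech input device that outputs input speech as the first input signal and a second speech input device that outputs input speech as the second input signal,
wherein the speech targeted for noise removal that is input to the first speech input device is larger than the speech targeted for noise removal that is input to the second speech input device.
- A speech detection system comprising:
a first noise estimation unit that estimates a stationary noise component included in a first input signal and outputs a first estimated noise;
a first noise removal unit that, using the first input signal and the first estimated noise from the first noise estimation unit, outputs a first estimated speech obtained by removing the stationary noise component from the first input signal;
a second noise estimation unit that, using at least the first input signal and the first estimated speech from the first noise removal unit, re-estimates the stationary noise component included in the first input signal and outputs a second estimated noise;
a third noise estimation unit that, using the first input signal and a second input signal, estimates a second non-stationary noise component consisting of the sum of the stationary noise component and a non-stationary noise component included in the first input signal, and outputs a third estimated noise;
an estimated noise integration unit that, using the second estimated noise from the second noise estimation unit and the third estimated noise from the third noise estimation unit, estimates the stationary noise component and the second non-stationary noise component included in the first input signal;
a second noise removal unit that outputs a second estimated speech obtained by removing the stationary noise component and the second non-stationary noise component from the first input signal;
a normalization unit that normalizes the second estimated speech from the second noise removal unit by the second estimated noise from the second noise estimation unit or by the first estimated noise from the first noise estimation unit; and
a speech detection unit that detects speech using the normalized speech from the normalization unit.
- A speech recognition system comprising:
a first noise estimation unit that estimates a stationary noise component included in a first input signal and outputs a first estimated noise;
a first noise removal unit that, using the first input signal and the first estimated noise from the first noise estimation unit, outputs a first estimated speech obtained by removing the stationary noise component from the first input signal;
a second noise estimation unit that, using at least the first input signal and the first estimated speech from the first noise removal unit, re-estimates the stationary noise component included in the first input signal and outputs a second estimated noise;
a third noise estimation unit that, using the first input signal and a second input signal, estimates a second non-stationary noise component consisting of the sum of the stationary noise component and a non-stationary noise component included in the first input signal, and outputs a third estimated noise;
an estimated noise integration unit that, using the second estimated noise from the second noise estimation unit and the third estimated noise from the third noise estimation unit, estimates the stationary noise component and the second non-stationary noise component included in the first input signal;
a second noise removal unit that outputs a second estimated speech obtained by removing the stationary noise component and the second non-stationary noise component from the first input signal;
a normalization unit that normalizes the second estimated speech from the second noise removal unit by the second estimated noise from the second noise estimation unit or by the first estimated noise from the first noise estimation unit;
a speech detection unit that detects speech using the normalized speech from the normalization unit; and
a speech recognition unit that recognizes speech upon receiving the first estimated speech from the first noise removal unit and the detection result from the speech detection unit.
- A speech recognition system comprising:
a first noise estimation unit that estimates a stationary noise component included in a first input signal and outputs a first estimated noise;
a first noise removal unit that, using the first input signal and the first estimated noise from the first noise estimation unit, outputs a first estimated speech obtained by removing the stationary noise component from the first input signal;
a second noise estimation unit that, using at least the first input signal and the first estimated speech from the first noise removal unit, re-estimates the stationary noise component included in the first input signal and outputs a second estimated noise;
a third noise estimation unit that, using the first input signal and a second input signal, estimates a second non-stationary noise component consisting of the sum of the stationary noise component and a non-stationary noise component included in the first input signal, and outputs a third estimated noise;
an estimated noise integration unit that, using the second estimated noise from the second noise estimation unit and the third estimated noise from the third noise estimation unit, estimates the stationary noise component and the second non-stationary noise component included in the first input signal;
a second noise removal unit that outputs a second estimated speech obtained by removing the stationary noise component and the second non-stationary noise component from the first input signal;
a normalization unit that normalizes the second estimated speech from the second noise removal unit by the second estimated noise from the second noise estimation unit or by the first estimated noise from the first noise estimation unit;
a speech detection unit that detects speech using the normalized speech from the normalization unit;
a third noise removal unit that, using the first input signal and the second estimated noise from the second noise estimation unit, outputs a third estimated speech obtained by removing the stationary noise component from the first input signal; and
a speech recognition unit that recognizes speech upon receiving the third estimated speech from the third noise removal unit and the detection result from the speech detection unit.
- A noise removal method comprising:
estimating a stationary noise component included in a first input signal and outputting a first estimated noise;
outputting, using the first input signal and the first estimated noise, a first estimated speech obtained by removing the stationary noise component from the first input signal;
re-estimating, using at least the first input signal and the first estimated speech, the stationary noise component included in the first input signal and outputting a second estimated noise;
estimating, using the first input signal and a second input signal, a second non-stationary noise component consisting of the sum of the stationary noise component and a non-stationary noise component included in the first input signal, and outputting a third estimated noise;
estimating, using the second estimated noise and the third estimated noise, the stationary noise component and the second non-stationary noise component included in the first input signal; and
removing the stationary noise component and the second non-stationary noise component included in the first input signal.
- A noise removal program for causing a computer to execute:
a process of estimating a stationary noise component included in a first input signal and outputting a first estimated noise;
a process of outputting, using the first input signal and the first estimated noise, a first estimated speech obtained by removing the stationary noise component from the first input signal;
a process of re-estimating, using at least the first input signal and the first estimated speech, the stationary noise component included in the first input signal and outputting a second estimated noise;
a process of estimating, using the first input signal and a second input signal, a second non-stationary noise component consisting of the sum of the stationary noise component and a non-stationary noise component included in the first input signal, and outputting a third estimated noise;
a process of estimating, using the second estimated noise and the third estimated noise, the stationary noise component and the second non-stationary noise component included in the first input signal; and
a process of removing the stationary noise component and the second non-stationary noise component included in the first input signal.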
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014557199A JP6265136B2 (ja) | 2013-01-17 | 2013-12-25 | Noise removal system, speech detection system, speech recognition system, noise removal method, and noise removal program |
US14/760,814 US9449616B2 (en) | 2013-01-17 | 2013-12-25 | Noise reduction system, speech detection system, speech recognition system, noise reduction method, and noise reduction program |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2013006044 | 2013-01-17 | ||
JP2013-006044 | 2013-01-17 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014112023A1 (ja) | 2014-07-24 |
Family
ID=51209149
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2013/007573 WO2014112023A1 (ja) | 2013-01-17 | 2013-12-25 | Noise removal system, speech detection system, speech recognition system, noise removal method, and noise removal program |
Country Status (3)
Country | Link |
---|---|
US (1) | US9449616B2 (ja) |
JP (1) | JP6265136B2 (ja) |
WO (1) | WO2014112023A1 (ja) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6559576B2 (ja) * | 2016-01-05 | 2019-08-14 | 株式会社東芝 | Noise suppression device, noise suppression method, and program |
GB201615538D0 (en) * | 2016-09-13 | 2016-10-26 | Nokia Technologies Oy | A method , apparatus and computer program for processing audio signals |
US10535360B1 (en) * | 2017-05-25 | 2020-01-14 | Tp Lab, Inc. | Phone stand using a plurality of directional speakers |
JP6948609B2 (ja) * | 2018-03-30 | 2021-10-13 | パナソニックIpマネジメント株式会社 | Noise reduction device |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0667691A (ja) * | 1992-08-18 | 1994-03-11 | Nec Corp | Noise removal device |
JP2000163099A (ja) * | 1998-11-25 | 2000-06-16 | Brother Ind Ltd | Noise removal device, speech recognition device, and storage medium |
JP2003195882A (ja) * | 2001-12-21 | 2003-07-09 | Fujitsu Ltd | Signal processing system and method |
JP2006163231A (ja) * | 2004-12-10 | 2006-06-22 | Internatl Business Mach Corp <Ibm> | Noise removal device, noise removal program, and noise removal method |
JP2009075536A (ja) * | 2007-08-28 | 2009-04-09 | Nippon Telegr & Teleph Corp <Ntt> | Stationarity rate calculation device, noise level estimation device, noise suppression device, methods thereof, program, and recording medium |
JP2011186384A (ja) * | 2010-03-11 | 2011-09-22 | Fujitsu Ltd | Noise estimation device, noise reduction system, noise estimation method, and program |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
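JP2007193517A (ja) | 2006-01-18 | 2007-08-02 | Sharp Corp | Mounted component check system using radio-frequency identification, mounted component check method, mounted component check program, and recording medium storing the mounted component check program |
JP5423966B2 (ja) * | 2007-08-27 | 2014-02-19 | 日本電気株式会社 | Specific signal cancellation method, specific signal cancellation device, adaptive filter coefficient update method, adaptive filter coefficient update device, and computer program |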
US8571231B2 (en) * | 2009-10-01 | 2013-10-29 | Qualcomm Incorporated | Suppressing noise in an audio signal |
-
2013
- 2013-12-25 WO PCT/JP2013/007573 patent/WO2014112023A1/ja active Application Filing
- 2013-12-25 JP JP2014557199A patent/JP6265136B2/ja active Active
- 2013-12-25 US US14/760,814 patent/US9449616B2/en active Active
Also Published As
Publication number | Publication date |
---|---|
US20150356983A1 (en) | 2015-12-10 |
JP6265136B2 (ja) | 2018-01-24 |
JPWO2014112023A1 (ja) | 2017-01-19 |
US9449616B2 (en) | 2016-09-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 13871719; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2014557199; Country of ref document: JP; Kind code of ref document: A |
| | WWE | WIPO information: entry into national phase | Ref document number: 14760814; Country of ref document: US |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: PCT application non-entry in European phase | Ref document number: 13871719; Country of ref document: EP; Kind code of ref document: A1 |