WO2012026126A1 - Sound source separation device, sound source separation method, and program - Google Patents
Sound source separation device, sound source separation method, and program
- Publication number: WO2012026126A1 (PCT application PCT/JP2011/004734)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sound source
- unit
- signal
- noise
- frequency
- Prior art date
Classifications
- G10L21/028 — Voice signal separating using properties of sound source
- G10L21/02 — Speech enhancement, e.g. noise reduction or echo cancellation
- H04R29/005 — Monitoring/testing arrangements; microphone arrays
- H04R3/00 — Circuits for transducers, loudspeakers or microphones
- H04R3/005 — Circuits for combining the signals of two or more microphones
- G10L2021/02166 — Noise filtering characterised by the method used for estimating noise; microphone arrays; beamforming
- G10L21/0232 — Noise filtering with processing in the frequency domain
- H04R2430/20 — Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic
- H04R2499/13 — Acoustic transducers and sound field adaptation in vehicles
Definitions
- The present invention relates to a sound source separation device, a sound source separation method, and a program that use a plurality of microphones to separate the sound source signal arriving from a target sound source out of a mixture of audio signals emitted from a plurality of sound sources and various acoustic signals such as environmental noise.
- An example in which such processing is particularly necessary is the automobile environment.
- With the spread of mobile phones, calls while driving are commonly made using a microphone installed at a distance from the speaker inside the car, which significantly degrades call quality.
- Likewise, when speech recognition is performed while driving,
- the utterance is made under the same conditions, so recognition performance deteriorates.
- Advances in current speech recognition technology make it possible to recover much of the performance lost to stationary noise.
- However, current speech recognition technology has difficulty coping with the degradation in recognition performance that occurs when a plurality of speakers speak simultaneously.
- Because current voice recognition technology is poor at recognizing the mixed speech of two people speaking at the same time, passengers other than the designated speaker must refrain from speaking while a voice recognition device is in use, a situation that restricts passenger behavior.
- The sound source separation device described in Patent Document 1 performs beamformer processing that attenuates sound source signals arriving from directions symmetric with respect to the straight line connecting two microphones, and calculates the beamformer outputs.
- The spectrum information of the target sound source is then extracted based on the difference between the two pieces of power spectrum information.
- When the difference between the two pieces of power spectrum information calculated after the beamformer processing is equal to or larger than a predetermined threshold,
- the difference is treated as the target sound and output as-is.
- When the difference between the two pieces of power spectrum information is less than the predetermined threshold,
- the difference is treated as noise and the output for that frequency band is set to zero. Consequently, when the sound source separation device of Patent Document 1 operates in an environment with diffuse noise whose direction of arrival is not fixed to a specific direction, such as the driving noise of an automobile, particular frequency bands are deleted wholesale, and the diffuse noise is irregularly distributed into the separation result, becoming musical noise. Musical noise consists of residual noise components that are isolated on the time and frequency axes, and is therefore heard as an unnatural and annoying sound.
- Patent Document 1 also discloses performing post-filter processing before the beamformer processing to reduce diffuse noise, stationary noise, and the like, thereby preventing the generation of musical noise after sound source separation.
- However, when the microphones are placed far apart, or when they are molded into a housing such as a mobile phone or a headset, the level difference or phase difference of the noise entering the two microphones becomes large. If the gain obtained for one microphone is then applied unchanged to the other, the target sound is excessively suppressed in some bands, or a large amount of noise remains. As a result, it becomes difficult to sufficiently prevent the generation of musical noise.
- The present invention was made to solve the above problems, and its object is to provide a sound source separation device, a sound source separation method, and a program that can sufficiently reduce the occurrence of musical noise regardless of the microphone arrangement.
- One aspect of the present invention is a sound source separation device that separates a sound source signal from a target sound source out of a mixed sound in which sound source signals emitted from a plurality of sound sources are mixed. The device includes: a first beamformer processing unit that performs a product-sum operation in the frequency domain on the output signals of a microphone pair consisting of two microphones, using mutually different first coefficients, thereby attenuating sound source signals arriving from the region opposite, across a plane intersecting the line segment connecting the two microphones, to the region containing the direction of the target sound source; a second beamformer processing unit that performs a product-sum operation in the frequency domain using second coefficients that are the complex conjugates of the respective first coefficients, thereby attenuating sound source signals arriving from the region containing the direction of the target sound source across that plane; a power calculation unit that calculates first spectrum information holding a power value for each frequency from the signal obtained by the first beamformer processing unit, and second spectrum information holding a power value for each frequency from the signal obtained by the second beamformer processing unit; and a weighting coefficient calculation unit that calculates, according to the difference between the per-frequency power values of the first and second spectrum information, a weighting coefficient for each frequency by which the signal obtained by the first beamformer processing unit is multiplied. The sound source signal from the target sound source is separated from the mixed sound based on the multiplication result of the signal obtained by the first beamformer processing unit and the weighting coefficient.
- Another aspect of the present invention is a sound source separation method executed by a sound source separation device including a first beamformer processing unit, a second beamformer processing unit, a power calculation unit, and a weighting coefficient calculation unit. The method includes: a first step in which the first beamformer processing unit performs a product-sum operation in the frequency domain, using mutually different first coefficients, on the output signals of a microphone pair consisting of two microphones receiving a mixed sound in which sound source signals emitted from a plurality of sound sources are mixed, thereby attenuating sound source signals arriving from the region opposite, across a plane intersecting the line segment connecting the two microphones, to the region containing the direction of the target sound source; a second step of attenuating sound source signals arriving from the region containing the direction of the target sound source across that plane; a third step in which the power calculation unit calculates first spectrum information holding a power value for each frequency from the signal obtained in the first step, and second spectrum information holding a power value for each frequency from the signal obtained in the second step; and a step in which the weighting coefficient calculation unit calculates a weighting coefficient for each frequency from the first spectrum information and the second spectrum information. This is the sound source separation method.
- Still another aspect of the present invention is a sound source separation program that causes a computer to execute: a first processing step of performing a product-sum operation in the frequency domain, using mutually different first coefficients, on the output signals of a microphone pair consisting of two microphones receiving a mixed sound in which sound source signals emitted from a plurality of sound sources are mixed, thereby attenuating sound source signals arriving from the region opposite, across a plane intersecting the line segment connecting the two microphones, to the region containing the direction of the target sound source; and a fourth processing step of calculating a weighting coefficient for each frequency by which the signal obtained in the first processing step is multiplied. The program separates the sound source signal from the target sound source out of the mixed sound based on the multiplication result of the signal obtained in the first processing step and the weighting coefficient.
- FIG. 5 is an enlarged view of a part of the processing result of FIG. 4. Further figures show the structure of the noise estimation unit, the structure of the noise equalizer unit, and a graph comparing near-field sound and far-field sound with respect to the output value of the beamformer 30 (microphone space …).
- FIG. 1 is a diagram showing a basic configuration of a sound source separation system according to the first embodiment.
- This system includes two microphones 10 and 11 (hereinafter simply "microphones") and a sound source separation device 1.
- The embodiment is described with two microphones, but the number of microphones is not limited to two as long as there are at least two.
- The sound source separation device 1 comprises hardware, including a CPU (not shown) that controls the whole device and executes arithmetic processing and storage devices such as a ROM, a RAM, and a hard disk drive, and software, including the programs and data stored in those storage devices. Each functional block of the sound source separation device 1 is realized by this hardware and software.
- The two microphones 10 and 11 are installed apart from each other on a plane and receive the signals emitted from the two sound sources R1 and R2. These two sound sources R1 and R2 lie in the two regions (hereinafter "the left and right of the separation surface") divided by a plane (hereinafter the "separation surface") intersecting the line segment connecting the two microphones 10 and 11, but they need not be located symmetrically with respect to the separation surface. In the present embodiment, an example is described in which the separation surface is a plane perpendicular to the plane containing the line segment connecting the two microphones 10 and 11 and passing through the midpoint of that segment.
- The sound generated from the sound source R1 is the target sound to be acquired, and the sound generated from the sound source R2 is the noise to be suppressed (the same convention applies throughout this specification). The noise is not limited to a single source; there may be several. However, the directions of the target sound and the noise are assumed to differ.
- The two sound source signals captured by the microphones 10 and 11 undergo frequency analysis per microphone output in the spectrum analysis units 20 and 21. In the beamformer unit 3, filtering is performed by the beamformers 30 and 31, whose blind spots are formed on the left and right of the separation surface, and the power calculators 40 and 41 calculate the power of each filter output.
- The beamformers 30 and 31 preferably form their blind spots at positions symmetric with respect to the separation surface, one on each side of it.
- Next, the structure of the beamformer unit 3, which consists of the beamformers 30 and 31, is described.
- The multipliers 100a, 100b, 100c, and 100d multiply their inputs by the filter coefficients w1(ω), w2(ω), w1*(ω), and w2*(ω), respectively (* denotes the complex conjugate).
- The output ds2(ω) of the beamformer 31 can be obtained by the following equation.
- By using the complex-conjugate filter coefficients, the beamformer unit 3 forms a blind spot at the position symmetric with respect to the separation surface.
- Here ω represents the angular frequency, with ω = 2πf for the frequency f.
- The power calculators 40 and 41 convert the outputs ds1(ω) and ds2(ω) of the beamformer 30 and the beamformer 31 into power spectrum information ps1(ω) and ps2(ω) according to the following formulas.
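The conjugate beamformer pair and the power calculation described above can be sketched as follows. This is a minimal numpy illustration, not the patent's own code; the function names, the per-frequency-bin array layout, and the coefficient values in the usage are assumptions.

```python
import numpy as np

def beamformer_pair(x1, x2, w1, w2):
    """Apply the two beamformers to per-frequency microphone spectra.

    x1, x2 : complex spectra of microphones 10 and 11 (one value per bin).
    w1, w2 : first filter coefficients; the second beamformer uses their
             complex conjugates, which places its blind spot at the
             position symmetric with respect to the separation surface.
    """
    ds1 = w1 * x1 + w2 * x2                     # beamformer 30 output
    ds2 = np.conj(w1) * x1 + np.conj(w2) * x2   # beamformer 31 output
    return ds1, ds2

def power_spectrum(ds):
    # ps(w) = |ds(w)|^2, the power for each frequency bin
    return np.abs(ds) ** 2
```

For example, with illustrative two-bin spectra the pair yields two signals whose powers `ps1` and `ps2` feed the weighting coefficient calculation unit 50.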
- The outputs ps1(ω) and ps2(ω) of the power calculators 40 and 41 serve as the two inputs of the weighting coefficient calculation unit 50.
- The weighting coefficient calculation unit 50 receives the power spectrum information output from the two beamformers 30 and 31 and outputs a weighting coefficient G_BSA(ω) for each frequency.
- The weighting coefficient G_BSA(ω) is a value based on the difference between the two pieces of power spectrum information. As one example, the difference between ps1(ω) and ps2(ω) is calculated for each frequency; when ps1(ω) is greater than ps2(ω), the square root of the difference ps1(ω) − ps2(ω) is divided by the square root of ps1(ω), and when ps1(ω) is less than or equal to ps2(ω), the value is taken as 0. The weighting coefficient can then be the output of a monotonically increasing function over this domain.
- In this case, the weighting coefficient G_BSA(ω) is expressed as follows.
- ds 1 ( ⁇ ) is a signal obtained by linear processing with respect to the observation signal X ( ⁇ , ⁇ 1 , ⁇ 2 ).
- G BSA ( ⁇ ) ds 1 ( ⁇ ) is a signal obtained by nonlinear processing on ds 1 ( ⁇ ).
- FIGS. 4B and 4C are examples of the spectrogram of G BSA ( ⁇ ) ds 1 ( ⁇ ).
- In the sound source separation device according to the present embodiment, a sigmoid function is used as the monotonically increasing function F(x).
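The weighting coefficient computation described above can be sketched as follows. This is an illustrative numpy version under stated assumptions: the sigmoid's slope and shift parameters are invented tuning values, not taken from the patent.

```python
import numpy as np

def sigmoid(x, slope=20.0, shift=0.3):
    # Monotonically increasing F(x); slope/shift are illustrative values
    return 1.0 / (1.0 + np.exp(-slope * (x - shift)))

def weighting_coefficient(ps1, ps2):
    """Per-frequency weighting coefficient G_BSA(w).

    Input to F: sqrt((ps1 - ps2) / ps1) where ps1 > ps2, else 0.
    """
    x = np.zeros_like(ps1)
    mask = ps1 > ps2          # only bins where the target beamformer wins
    x[mask] = np.sqrt((ps1[mask] - ps2[mask]) / ps1[mask])
    return sigmoid(x)
```

Because F is monotonically increasing, bins where ps1 greatly exceeds ps2 receive a weight near 1, while bins dominated by the opposite-side beamformer receive a weight near 0.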
- FIG. 5 is an enlarged view of a part of the spectrogram (reference numeral 5) in FIGS. 4A to 4C in the time axis direction.
- FIG. 5C shows the processing result of the sound source separation device of the present embodiment.
- When the energy of the noise component is unevenly distributed in the time direction and the frequency direction, musical noise is generated.
- In the spectrogram of FIG. 4C, the energy of the noise component is not unevenly distributed in the time and frequency directions, unlike in the input signal, showing that there is little musical noise.
- G BSA ( ⁇ ) ds 1 ( ⁇ ) is a sound source signal from a target sound source in which musical noise is sufficiently reduced.
- G BSA ( ⁇ ) ds 1 ( ⁇ ) is nonlinear processing.
- the value of G BSA ( ⁇ ) varies greatly for each frequency bin and for each frame, and tends to cause musical noise. Therefore, musical noise is reduced by adding a signal before nonlinear processing in which no musical noise is generated to the output after nonlinear processing.
- Specifically, the signal X_BSA(ω), obtained by multiplying the output ds1(ω) of the beamformer 30 by G_BSA(ω), and the output ds1(ω) of the beamformer 30 itself are summed at a predetermined ratio.
- X_S(ω), the mixture of X_BSA(ω) and the output ds1(ω) of the beamformer 30 at a fixed ratio, is expressed by the following equation.
- β_S is a weighting factor that determines the mixing ratio, and takes a value larger than 0 and smaller than 1.
- The musical noise reduction gain calculation unit 60 can comprise a subtraction unit that subtracts 1 from G_BSA(ω), a multiplication unit that multiplies the result by the weighting factor β_S, and an addition unit that adds 1. With this configuration, the gain value G_S(ω) with reduced musical noise is recalculated as the gain by which the output ds1(ω) of the beamformer 30 is multiplied.
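The subtract-1 / multiply-by-β_S / add-1 chain above amounts to G_S(ω) = 1 + β_S·(G_BSA(ω) − 1), which is algebraically the same as mixing X_BSA(ω) with ds1(ω) at ratio β_S. A minimal sketch (the value β_S = 0.4 is illustrative):

```python
import numpy as np

def musical_noise_reduction_gain(g_bsa, beta_s=0.4):
    """G_S = 1 + beta_s * (G_BSA - 1), with 0 < beta_s < 1.

    Equivalent to the mixture
        X_S = beta_s * G_BSA * ds1 + (1 - beta_s) * ds1 = G_S * ds1,
    so applying G_S to ds1 blends the nonlinear output with the
    musical-noise-free linear beamformer output.
    """
    return 1.0 + beta_s * (g_bsa - 1.0)
```

Since 0 < β_S < 1, the gain is pulled toward 1, which is exactly why G_S(ω) never falls below G_BSA(ω) in suppressed bins.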
- The signal obtained from the multiplication of the gain value G_S(ω) and the output ds1(ω) of the beamformer 30 contains less musical noise than G_BSA(ω)ds1(ω).
- However, the gain value G_S(ω) is necessarily larger than G_BSA(ω),
- so the noise component increases while the musical noise decreases. Therefore, to suppress the residual noise, a residual noise suppression gain calculation unit 110 is provided after the musical noise reduction gain calculation unit 60, and an optimum gain value is recalculated.
- The residual noise in X_S(ω), obtained by multiplying the output ds1(ω) of the beamformer 30 by the gain G_S(ω) calculated by the musical noise reduction gain calculation unit 60, includes sudden noise. Therefore, a noise estimation unit 70 (blocking matrix) and a noise equalizer unit 100, described below, are introduced into the calculation of the estimated noise used by the residual noise suppression gain calculation unit 110, so that sudden noise can also be estimated.
- The noise estimation unit 70 applies adaptive filtering to the two signals obtained by the microphones 10 and 11 and cancels the signal component from the sound source R1 (the target sound), thereby acquiring only the noise components.
- Let the signal from the sound source R1 be S(t).
- The sound from the sound source R1 reaches the microphone 10 before the sound from the sound source R2.
- The signals of sounds emitted from the other sound sources are n_j(t), and these are the noises.
- The input x1(t) of the microphone 10 and the input x2(t) of the microphone 11 are then as follows.
- The adaptive filter unit 71 shown in FIG. 6 convolves the input signal of the microphone 10 with the adaptive filter coefficients, calculating a pseudo signal that matches the target signal component obtained by the microphone 11.
- The subtraction unit 72 subtracts the pseudo signal from the signal of the microphone 11, calculating an error signal (noise signal) from which the component of the sound source R1 contained in the microphone 11 signal has been removed.
- This error signal x_ABM(t) becomes the output signal of the noise estimation unit 70.
- The adaptive filter unit 71 updates the adaptive filter coefficients from the error signal. For example, NLMS (Normalized Least Mean Square) is used for updating the coefficients H(t) of the adaptive filter.
- The update of the adaptive filter may be controlled by an external VAD (Voice Activity Detection) value or by information from the control unit 160 described later (FIGS. 6C and 6D).
- For example, the coefficients H(t) of the adaptive filter may be updated when the threshold comparison unit 74 determines that the control signal from the control unit 160 is larger than a predetermined threshold.
- The VAD value indicates whether the target voice is in a speech state or a non-speech state.
- It may be a binary On/Off value or a probability value within a certain range indicating the certainty of the speech state.
- The output x_ABM(t) is then as follows.
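The blocking-matrix noise estimation can be sketched as a time-domain NLMS loop. This is a hedged illustration: the tap count, step size, DELAY value, and the placement of the delay on the microphone-11 path are assumptions chosen so the example is self-contained, not values from the patent.

```python
import numpy as np

def nlms_noise_estimate(x1, x2, taps=16, mu=0.5, delay=4, eps=1e-8):
    """Estimate the noise x_ABM(t) by cancelling the target component.

    The adaptive filter h predicts, from the microphone-10 signal x1,
    the target component in the (delayed) microphone-11 signal x2;
    the prediction error is the noise estimate.
    """
    h = np.zeros(taps)
    x_abm = np.zeros(len(x2))
    for t in range(len(x2)):
        frame = np.zeros(taps)           # most recent taps of x1, newest first
        n = min(taps, t + 1)
        frame[:n] = x1[t::-1][:n]
        d = x2[t - delay] if t >= delay else 0.0   # delayed reference
        y = h @ frame                    # pseudo (target) signal
        e = d - y                        # error = estimated noise
        x_abm[t] = e
        h += mu * e * frame / (frame @ frame + eps)   # NLMS update
    return x_abm, h
```

When x2 carries only the target sound already present in x1, the filter converges to the pure delay and the noise estimate decays toward zero, which is the desired blocking behaviour.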
- With this, noise components arriving from directions other than the target sound direction can be estimated to some extent.
- In addition, the target sound can be suppressed robustly even when the microphones differ in gain.
- The spatial range judged to be noise can be controlled by changing the DELAY value of the filter in the delay unit 73; the directivity can thus be narrowed or widened according to the DELAY value.
- Any adaptive filter may be used as long as it is robust to differences in the gain characteristics of the microphones.
- the output of the noise estimation unit 70 is subjected to frequency analysis by the spectrum analysis unit 80, and the noise power calculation unit 90 calculates the power for each frequency bin.
- the input of the noise estimation unit 70 may be a microphone input signal after spectrum analysis.
- The amount of noise contained in X_ABM(ω), obtained by frequency analysis of the output of the noise estimation unit 70, and the amount of noise contained in the signal X_S(ω), obtained by summing at a predetermined ratio the signal X_BSA(ω) (the product of the output ds1(ω) of the beamformer 30 and the weighting coefficient G_BSA(ω)) and the output ds1(ω) of the beamformer 30, are similar in spectral shape but differ in energy. Therefore, the noise equalizer unit 100 performs a correction to match the energies of the two.
- A block diagram of the noise equalizer unit 100 is shown in the figure.
- Here, an example is described that uses the output pX_ABM(ω) of the power calculation unit 90, the output G_S(ω) of the musical noise reduction gain calculation unit 60, and the output ds1(ω) of the beamformer 30.
- The multiplication unit 101 multiplies ds1(ω) by G_S(ω), and the power calculation unit 102 obtains its power.
- In sections judged to be noise on the basis of an external VAD value or a signal from the control unit 160 described later, the smoothing units 103 and 104 perform smoothing processing on the output pX_ABM(ω) of the power calculation unit 90 and the output pX_S(ω) of the power calculation unit 102, respectively.
- The "smoothing process" averages successive data values in order to reduce the influence of data that deviates greatly from the rest.
- The smoothing processing uses a first-order IIR filter. The smoothed output pX'_ABM(ω) of the power calculation unit 90 and the smoothed output pX'_S(ω) of the power calculation unit 102 are calculated from the outputs pX_ABM(ω) and pX_S(ω) in the current processing frame and from the smoothed outputs of the power calculation units 90 and 102 in the past frame.
- Specifically, pX'_ABM(ω) and pX'_S(ω) are calculated as in the following Expression (13-1).
- A processing frame number m is introduced; the current processing frame is m and the previous processing frame is m−1.
- The processing in the smoothing unit 103 may be executed when the threshold comparison unit 105 determines that the control signal from the control unit 160 is smaller than a predetermined threshold.
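The per-frame recursion can be sketched as a standard first-order IIR update; Expression (13-1) itself is not reproduced in this text, so the smoothing constant α and the VAD gating in the usage below are illustrative assumptions.

```python
import numpy as np

def smooth(prev, current, alpha=0.9):
    """First-order IIR smoothing of a power spectrum, frame by frame:

        pX'(m, w) = alpha * pX'(m-1, w) + (1 - alpha) * pX(m, w)

    alpha is an illustrative smoothing constant.
    """
    return alpha * prev + (1.0 - alpha) * current

# Usage: the update is applied only in frames judged to be noise
frames = [np.array([1.0, 4.0]), np.array([2.0, 2.0]), np.array([1.5, 3.0])]
is_noise = [True, False, True]   # e.g. the VAD flags frame 1 as speech
px = np.zeros(2)
for p, noise in zip(frames, is_noise):
    if noise:
        px = smooth(px, p)
```

Skipping speech frames keeps the smoothed value a noise-only statistic, which is what the equalizer update below relies on.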
- The equalizer update unit 106 calculates the ratio of the smoothed outputs pX'_ABM(ω) and pX'_S(ω). That is, the output of the equalizer update unit 106 is as follows.
- The equalizer applying unit 107 calculates the estimated noise power pλ_d(ω) contained in X_S(ω) based on the output H_EQ(ω) of the equalizer update unit 106 and the output pX_ABM(ω) of the power calculation unit 90.
- pλ_d(ω) may be calculated as follows.
- In the residual noise suppression gain calculation unit 110, in order to suppress the noise component that remains when the gain value G_S(ω) is applied to the output ds1(ω) of the beamformer 30, the gain by which ds1(ω) is multiplied is recalculated. That is, based on the estimated value λ_d(ω) of the residual noise component, the residual noise suppression gain calculation unit 110 calculates a residual noise suppression gain G_T(ω) for appropriately removing the noise component contained in X_S(ω), the value obtained by applying G_S(ω) to ds1(ω).
- For such gain estimation, a Wiener filter or the MMSE-STSA method (see Non-Patent Document 1) is often used.
- Since the MMSE-STSA method assumes that the noise follows a normal distribution, sudden noise and the like may not fit its assumptions. Therefore, the present embodiment uses an estimator that suppresses sudden noise comparatively easily; however, any estimation method may be used.
- The residual noise suppression gain calculation unit 110 calculates the gain G_T(ω) as follows. First, it calculates an instantaneous a priori SNR (clean speech-to-noise ratio, S/N) derived from the a posteriori SNR ((S+N)/N).
- Next, it calculates the a priori SNR by the decision-directed approach.
- It then calculates the optimum gain value based on the a priori SNR.
- β_p(ω) in the following Expression (18) is a spectral floor value that defines the lower limit of the gain. Setting it large suppresses degradation of the target sound quality but increases the amount of residual noise; setting it small reduces the residual noise but greatly degrades the target sound quality.
- the output value of the residual noise suppression gain calculation unit 110 is expressed as follows.
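Expression (18) is not reproduced in this text, so the following is only a hedged sketch of the three-step recipe above: posterior SNR, decision-directed a priori SNR, and a floored gain. A Wiener-type gain stands in for the patent's unnamed estimator, and α and β_p are illustrative values.

```python
import numpy as np

def residual_noise_gain(px_s, lam_d, g_prev, snr_prev, alpha=0.98, beta_p=0.1):
    """Residual-noise-suppression gain G_T(w) (illustrative estimator).

    px_s     : power of X_S(w) in the current frame
    lam_d    : estimated residual noise power (from the noise equalizer)
    g_prev   : gain of the previous frame
    snr_prev : a posteriori SNR of the previous frame
    beta_p   : spectral floor, the lower limit of the gain
    """
    post_snr = px_s / np.maximum(lam_d, 1e-10)       # (S+N)/N
    inst_prior = np.maximum(post_snr - 1.0, 0.0)     # instantaneous a priori SNR
    # decision-directed a priori SNR estimate
    prior = alpha * (g_prev ** 2) * snr_prev + (1.0 - alpha) * inst_prior
    gain = prior / (1.0 + prior)                     # Wiener-type gain
    return np.maximum(gain, beta_p), post_snr
```

The floor β_p caps how far any bin can be attenuated, trading residual noise against distortion of the target sound exactly as described for Expression (18).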
- In this way, the gain value G_T(ω) by which the output ds1(ω) of the beamformer 30 is multiplied is recalculated so that both the musical noise and the residual noise are reduced.
- Note that the value of λ_d(ω) may be adjusted according to external VAD information and the value of the control signal from the control unit 160.
- the output G_BSA(ω) of the weighting coefficient calculation unit 50, the output G_S(ω) of the musical noise reduction gain calculation unit 60, or the output G_T(ω) of the residual noise suppression gain calculation unit 110 is used as the input of the gain multiplication unit 130.
- the gain multiplication unit 130 outputs a signal X_BSA(ω) based on the result of multiplying the output ds1(ω) of the beamformer 30 by the weighting coefficient G_BSA(ω), the musical noise reduction gain G_S(ω), or the residual noise suppression gain G_T(ω).
- as the value of X_BSA(ω), for example, the product of ds1(ω) and G_BSA(ω), the product of ds1(ω) and G_S(ω), or the product of ds1(ω) and G_T(ω) may be used.
- in particular, the target-sound signal obtained from the product of ds1(ω) and G_T(ω) contains very little musical noise and very few noise components.
- FIG. 8 is a diagram illustrating another configuration example of the sound source separation system according to the present embodiment. The difference from the configuration shown in FIG. 1 is that the noise estimation unit 70 is realized in the time domain in the system of FIG. 1, whereas it is realized in the frequency domain in the system of FIG. 8. The other components are the same as in the system of FIG. 1. In this configuration, the spectrum analysis unit 80 is unnecessary.
- FIG. 9 is a diagram showing a basic configuration of a sound source separation system according to the second embodiment of the present invention.
- the sound source separation system according to this embodiment is characterized by having a control unit 160.
- the control unit 160 controls the internal parameters of the noise estimation unit 70, the noise equalizer unit 100, and the residual noise suppression gain calculation unit 110 based on the weighting coefficients G_BSA(ω) over the entire frequency band.
- examples of the internal parameters include the step size of the adaptive filter, the spectral floor value β of the weighting coefficient G_BSA(ω), and the noise amount of the estimated noise.
- for example, the control unit 160 executes the following processing: it calculates the average value of the weighting coefficients G_BSA(ω) over the entire frequency band. If the average value is large, it can be determined that the speech presence probability is high. The control unit 160 therefore compares the calculated average with a predetermined threshold and controls the other blocks based on the comparison result.
- alternatively, the control unit 160 calculates a histogram of the weighting coefficients G_BSA(ω) calculated by the weighting coefficient calculation unit 50, in bins of 0.1 from 0 to 1.0. Note that when the value of G_BSA(ω) is large, the probability that speech is present is high, and when the value is small, that probability is low. The calculated histogram is multiplied by a weight table, the average value is computed and compared with a threshold, and the other blocks are controlled based on the comparison result.
- as yet another alternative, the control unit 160 calculates the histogram of G_BSA(ω) in bins of 0.1 from 0 to 1.0, counts the number of coefficients falling in the range of, for example, 0.7 to 1.0, compares that count with a threshold, and controls the other blocks based on the comparison result.
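The counting strategy above can be sketched as follows; the count threshold value and the returned histogram layout are assumptions for illustration, not values from the patent.

```python
import numpy as np

def speech_presence_decision(G_BSA, lo=0.7, hi=1.0, count_threshold=32):
    """Histogram-based control decision: count the frequency bins whose
    weighting coefficient falls in [lo, hi] and compare with a threshold."""
    # Histogram in bins of 0.1 over [0, 1.0], as the description states
    counts, _ = np.histogram(np.clip(G_BSA, 0.0, 1.0), bins=np.arange(0.0, 1.05, 0.1))
    n_high = int(np.sum((G_BSA >= lo) & (G_BSA <= hi)))
    return n_high >= count_threshold, counts
```

A True result would raise the speech-presence decision that gates the other blocks (e.g. freezing the adaptive filter update).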
- the control unit 160 may accept an output signal from at least one of the two microphones (microphones 10 and 11).
- a block diagram of the control unit 160 in this case is shown in FIG.
- the basic idea of the processing in the control unit 160 in this case is to compare the signal X_BSA(ω), based on the product of ds1(ω) and G_BSA(ω), with the output X_ABM(ω) obtained through the noise estimation unit 165 and the spectrum analysis unit 166.
- specifically, the energy comparison unit 167 compares the power spectral densities of the two outputs.
- the control unit 160 then calculates the estimated SNR D(ω) of the target sound as follows.
- FIG. 11 is a diagram illustrating an example of a basic configuration of a sound source separation system according to the third embodiment of the present invention.
- the sound source separation device 1 in the system shown in FIG. 11 includes spectrum analysis units 20 and 21, beamformers 30 and 31, power calculation units 40 and 41, a weighting coefficient calculation unit 50, a weighting coefficient multiplication unit 310, and a time waveform conversion unit 120.
- the components other than the weighting coefficient multiplication unit 310 are the same as in the other embodiments described above.
- the weighting coefficient multiplication unit 310 multiplies the signal ds 1 ( ⁇ ) obtained by the beamformer 30 by the weighting coefficient calculated by the weighting coefficient calculation unit 50.
- FIG. 12 is a diagram showing another example of the basic configuration of the sound source separation system according to the third embodiment of the present invention.
- the sound source separation device 1 in the system shown in FIG. 12 includes spectrum analysis units 20 and 21, beamformers 30 and 31, power calculation units 40 and 41, a weighting coefficient calculation unit 50, a weighting coefficient multiplication unit 310, a musical noise reduction unit 320, a residual noise suppression unit 330, a noise estimation unit 70, a spectrum analysis unit 80, a power calculation unit 90, a noise equalizer unit 100, and a time waveform conversion unit 120.
- the configuration other than the weighting coefficient multiplication unit 310, the musical noise reduction unit 320, and the residual noise suppression unit 330 is the same as the configuration in the other embodiments described above.
- the musical noise reduction unit 320 outputs a result obtained by adding the output result of the weighting coefficient multiplication unit 310 and the signal obtained from the beam former 30 at a predetermined ratio.
- the residual noise suppression unit 330 suppresses residual noise included in the output result of the musical noise reduction unit 320 based on the output result of the musical noise reduction unit 320 and the output result of the noise equalizer unit 100.
- the noise equalizer unit 100 calculates the noise component contained in the output result of the musical noise reduction unit 320, based on the output result of the musical noise reduction unit 320 and the noise component calculated by the noise estimation unit 70.
- the signal X_S(ω), formed by adding at a predetermined ratio the signal X_BSA(ω), obtained by multiplying the output ds1(ω) of the beamformer 30 by the weighting coefficient G_BSA(ω), and the output ds1(ω) of the beamformer 30, may contain sudden noise depending on the noise environment. The noise estimation unit 70 and the noise equalizer unit 100 described below are therefore introduced so that sudden noise can also be estimated.
- the sound source separation device 1 in FIG. 12 separates the sound source signal from the target sound source out of the mixed sound based on the output result of the residual noise suppression unit 330. That is, the device of FIG. 12 differs from those of the first and second embodiments in that it does not calculate the musical noise reduction gain G_S(ω) or the residual noise suppression gain G_T(ω). Even with the configuration of FIG. 12, the same effects as the sound source separation device 1 according to the first embodiment are obtained.
- FIG. 13 is a diagram showing another example of the basic configuration of the sound source separation system according to the third embodiment of the present invention.
- a control unit 160 is added to the configuration of the sound source separation device 1 in FIG.
- the function of the control unit 160 is the same as the function described in the second embodiment.
- FIG. 14 is a diagram showing a basic configuration of a sound source separation system according to the fourth embodiment of the present invention.
- the sound source separation system according to the present embodiment is characterized by having a directivity control unit 170, a target sound correction unit 180, and an arrival direction estimation unit 190.
- the directivity control unit 170 gives a delay operation to one of the microphone outputs frequency-analyzed by the spectrum analysis units 20 and 21, based on the target sound position estimated by the arrival direction estimation unit 190, so that the two sound sources R1 and R2 to be separated become as symmetric as possible with respect to the separation plane. That is, the separation plane is virtually rotated, and an optimum value of the rotation angle is calculated for each frequency band.
- the target sound correcting unit 180 corrects the frequency characteristic of the target sound output.
- FIG. 25 shows a situation in which the two sound sources R1′ (target sound) and R2′ (noise) are symmetric with respect to a separation plane rotated by θτ from the original separation plane, which intersects the line segment connecting the microphones.
- a situation equivalent to that shown in FIG. 25 can be realized by giving a constant delay amount τd to the signal acquired by one of the microphones. That is, in order to manipulate the inter-microphone phase difference and adjust the directivity, the phase rotator D(ω) is multiplied in the above equation (1).
- in the following equations, W1(ω) = W1(ω, θ1, θ2) and X(ω) = X(ω, θ1, θ2).
- the delay amount τd is calculated as follows, where d is the distance between the microphones [m] and c is the speed of sound [m/s].
- the directivity control unit 170 is provided with an optimum delay amount calculation unit 171 that sets the rotation angle θτ used when the separation plane is virtually rotated.
- the above problem is solved by calculating, for each frequency band, an optimum delay amount that satisfies the spatial sampling theorem.
- specifically, the optimum delay amount calculation unit 171 of the directivity control unit 170 determines, for each frequency, whether the delay amount given by θτ from equation (28) satisfies the spatial sampling theorem. If it does, the corresponding delay amount τd is applied by the phase rotator 172; if it does not, the delay amount τ0 is applied by the phase rotator 172.
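The per-frequency selection between τd and τ0 can be sketched as follows. The admissible-delay condition |ω·τ| ≤ π used here is an assumed stand-in for the patent's equations (28), (30), and (31), which are not reproduced in this text; treat the sketch as illustrative only.

```python
import numpy as np

def band_delay(tau_d, omega):
    """Per-frequency delay selection sketch: keep the reference delay tau_d
    where the (assumed) spatial sampling condition |omega * tau| <= pi holds,
    otherwise fall back to the largest admissible delay (playing the role of
    tau_0 in the description)."""
    limit = np.pi / omega
    return np.where(np.abs(tau_d) <= limit, tau_d, np.sign(tau_d) * limit)
```

The resulting per-band delay would then be applied as a phase rotator D(ω) = exp(−jωτ) on one microphone spectrum.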
- FIG. 16 is a diagram showing the directivity characteristics of the sound source separation device 1 according to the present embodiment. As shown in FIG. 16, applying the delay amount of equation (31) solves the problem that high-frequency components in the opposite zone, arriving from directions far outside the desired sound source separation plane, are output.
- FIG. 17 is a diagram illustrating another configuration of the directivity control unit 170.
- the delay amount calculated from equation (31) in the optimum delay amount calculation unit 171 need not be given to only one microphone input; the same total delay operation may be realized by giving half of it to each microphone input via the phase rotators 172 and 173.
- that is, a delay of τd/2 (or τ0/2) may be given to the signal acquired by one microphone and a delay of −τd/2 (or −τ0/2) to the signal acquired by the other microphone, so that a differential delay of τd (or τ0) is obtained in total.
- Target sound correction unit: another problem is that slight distortion occurs in the frequency characteristics of the target sound when BSA processing is performed by the beamformers 30 and 31 after the directivity control unit 170 narrows the directivity. In addition, the processing of equation (31) reduces the output gain. Therefore, the target sound correction unit 180 is provided to perform frequency equalization that corrects the frequency characteristics of the target sound output. In other words, since the location of the target sound is roughly fixed, correction is performed for the estimated target sound position. In this embodiment, a physical model that simply imitates the transfer function representing the propagation time and attenuation from a point sound source to each microphone is used.
- the transfer function of the microphone 10 is taken as the reference value,
- and the transfer function of the microphone 11 is expressed as a relative value with respect to the microphone 10.
- γs is the distance between the microphone 10 and the target sound,
- and θS is the direction of the target sound.
- when the weighting coefficient for the above propagation model is G_BSA(ω), the weighting coefficient G_BSA(ω) calculated by the weighting coefficient calculation unit 50 is corrected by the target sound correction unit 180 to G_BSA′(ω), represented by the following equation.
- FIG. 18 is a diagram illustrating the directivity characteristics of the sound source separation device 1 when the equalizer of the target sound correction unit 180 is designed with θS of 0 degrees and γS of 1.5 [m]. FIG. 18 confirms that there is no frequency distortion of the output signal for a sound source arriving from the 0-degree direction.
- the musical noise reduction gain calculation unit 60 receives the corrected weighting coefficient G_BSA′(ω) as input; that is, G_BSA(ω) in equation (7) and the like is replaced with G_BSA′(ω). In addition, at least one of the signals obtained by the microphones 10 and 11 may be input to the control unit 160.
- FIG. 19 is a flowchart showing an example of processing in the sound source separation system.
- frequency analysis is performed on the input signal 1 and the input signal 2 obtained by the microphones 10 and 11, respectively (steps S101 and S102).
- here, the position of the target sound may be estimated by the arrival direction estimation unit 190, the optimum delay amount may be calculated by the directivity control unit 170 based on the estimated positions of the sound sources R1 and R2, and the input signal 1 may be multiplied by the phase rotator derived from this optimum delay amount.
- the beamformers 30 and 31 perform filtering processing on the signals x 1 ( ⁇ ) and x 2 ( ⁇ ) subjected to frequency analysis in steps S101 and S102 (steps S103 and S104). Further, power is calculated by the power calculation units 40 and 41 with respect to the output of these filtering processes (steps S105 and S106). In the weighting coefficient calculation unit 50, a separation gain value G BSA ( ⁇ ) is calculated from the calculation results in steps S105 and S106 (step S107).
- the frequency characteristic of the target sound may be corrected by recalculating the weighting coefficient value G BSA ( ⁇ ) in the target sound correcting unit 180.
- the musical noise reduction gain calculation unit 60 calculates a gain value G S ( ⁇ ) that reduces the musical noise (step S108).
- control signals for controlling the noise estimation unit 70, the noise equalizer unit 100, and the residual noise suppression gain calculation unit 110 are calculated based on the weighting coefficient values G_BSA(ω) calculated in step S107 (step S109).
- noise estimation is performed in the noise estimation unit 70 (step S110). After frequency analysis is performed by the spectrum analysis unit 80 on the noise estimation result x_ABM(t) of step S110 (step S111), the power calculation unit 90 calculates the power for each frequency bin (step S112). The noise equalizer unit 100 then corrects the power of the estimated noise calculated in step S112.
- in the residual noise suppression gain calculation unit 110, the gain G_T(ω) for removing the noise component remaining when the gain value G_S(ω) calculated in step S108 is applied to the output value ds1(ω) of the beamformer 30 processed in step S103 is calculated (step S114). Note that the gain G_T(ω) is calculated based on the estimated noise component λd(ω) whose power has been corrected by the noise equalizer unit 100.
- the gain multiplication unit 130 multiplies the processing result in the beam former 30 in step S103 by the gain calculated in step S114 (step S117).
- the time waveform conversion unit 120 converts the multiplication result (target sound) in step S117 into a time domain signal (step S118).
- note that the gain calculations of steps S108 and S114 may be omitted, and noise may instead be removed from the output signal of the beamformer 30 by the musical noise reduction unit 320 and the residual noise suppression unit 330.
- Each process shown in the flowchart of FIG. 19 is roughly divided into three processes.
- the three processes are an output process of the beamformer 30 (steps S101 to S103), a gain calculation process (steps S101 to S108 and step S114), and a noise estimation process (steps S110 to S113).
- as for the gain calculation process and the noise estimation process, after the weighting coefficients are calculated in steps S101 to S107 of the gain calculation process, the process of step S108 is executed; at the same time, the process of step S109 and the noise estimation process (steps S110 to S113) are executed, and then the gain to be multiplied by the output of the beamformer 30 is determined in step S114.
- FIG. 20 is a flowchart showing details of the process in step S110 of FIG.
- first, a pseudo signal H^T(t)·x1(t) that matches the signal component from the sound source R1 is calculated (step S201). Next, in the subtraction unit 72 of FIG. 6, the pseudo signal is subtracted from the signal x2(t) of the microphone 11, whereby the error signal x_ABM(t) is calculated (step S202). Thereafter, when the control signal from the control unit 160 is larger than the predetermined threshold (step S203), the adaptive filter unit 71 updates the coefficients H(t) of the adaptive filter (step S204).
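Steps S201 to S204 can be sketched as one sample-by-sample update. The NLMS form, the step size `mu`, and the two-tap buffer are assumptions; the patent only specifies an adaptive filter whose coefficient update is gated by the control signal.

```python
import numpy as np

def abm_step(H, x1_buf, x2_now, mu=0.5, update=True):
    """One step of the adaptive blocking matrix (NLMS form assumed).

    H      : current filter coefficients H(t)
    x1_buf : recent samples of the near (target-side) microphone, newest first
    x2_now : current sample of the far microphone
    update : gate corresponding to the control-signal threshold test (S203)
    """
    y = H @ x1_buf                      # pseudo target-sound component H^T(t)*x1(t)  (S201)
    e = x2_now - y                      # error signal x_ABM(t) = noise estimate      (S202)
    if update:                          # coefficient update only when gated          (S203)
        H = H + mu * e * x1_buf / (x1_buf @ x1_buf + 1e-12)   # NLMS update          (S204)
    return H, e
```

When the target sound dominates, `update=False` would freeze the coefficients so the filter keeps canceling only the target-sound path.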
- FIG. 21 is a flowchart showing details of the process in step S113 of FIG. First, an output X S ( ⁇ ) is obtained by multiplying the output ds 1 ( ⁇ ) of the beam former 30 by the gain G S ( ⁇ ) output from the musical noise reduction gain calculation unit 60 (step S301).
- next, when the control signal from the control unit 160 is smaller than the predetermined threshold (step S302), the smoothing unit 103 in FIG. 7 executes time smoothing of the output pX_S(ω) of the power calculation unit 102,
- and the smoothing unit 104 executes time smoothing of the output pX_ABM(ω) of the power calculation unit 90 (steps S303 and S304).
- next, the equalizer update unit 106 calculates the ratio H_EQ(ω) of the processing results of steps S303 and S304 and updates the equalizer value to H_EQ(ω) (step S305). Finally, the equalizer application unit 107 calculates the estimated noise λd(ω) contained in X_S(ω) (step S306).
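Steps S303 to S306 can be sketched as follows; the smoothing constant `alpha` and the class layout are assumptions consistent with the description, and the symbol names follow the text (pX_S, pX_ABM, H_EQ, λd).

```python
import numpy as np

class NoiseEqualizer:
    """Sketch of the noise equalizer: smooth both power estimates, update the
    per-bin ratio H_EQ, and apply it to map the blocking-matrix noise power
    onto the noise power contained in X_S."""
    def __init__(self, n_bins, alpha=0.9):
        self.alpha = alpha
        self.pXS_bar = np.zeros(n_bins)     # smoothed power of X_S       (S303)
        self.pXABM_bar = np.zeros(n_bins)   # smoothed power of X_ABM     (S304)
        self.H_EQ = np.ones(n_bins)

    def update(self, pX_S, pX_ABM, noise_only=True):
        if noise_only:                      # gate: control signal below threshold (S302)
            self.pXS_bar = self.alpha * self.pXS_bar + (1 - self.alpha) * pX_S
            self.pXABM_bar = self.alpha * self.pXABM_bar + (1 - self.alpha) * pX_ABM
            self.H_EQ = self.pXS_bar / (self.pXABM_bar + 1e-12)          # (S305)
        return self.H_EQ * pX_ABM           # estimated noise lambda_d in X_S (S306)
```

Gating the update with `noise_only` keeps H_EQ from being biased by frames that contain the target sound.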
- FIG. 22 is a flowchart showing details of the process in step S114 of FIG.
- when the control signal from the control unit 160 is larger than the predetermined threshold (step S401), a process of reducing the value of λd(ω), which is the output of the noise equalizer unit 100 and the estimated value of the noise component, to, for example, 0.75 times is executed (step S402).
- a posterior SNR is calculated (step S403).
- a prior SNR is calculated (step S404).
- the residual noise suppression gain G T ( ⁇ ) is calculated (step S405).
- when the weighting coefficient calculation unit 50 calculates the gain value G_BSA(ω), the weighting coefficient may be calculated using a predetermined bias value γ(ω).
- for example, a new gain value may be calculated by adding a predetermined bias value to the denominator of the gain value G_BSA(ω).
- the addition of the bias value can be expected to improve the SNR, particularly in the low frequency range, when the gain characteristics of the microphones are uniform and the target sound exists near the microphones, as with a headset or a handset.
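The biased weighting coefficient can be sketched as follows. The base power-difference form of the gain is an assumption, since equation (35) is not reproduced in this text; only the addition of γ(ω) to the denominator follows the description.

```python
import numpy as np

def g_bsa_with_bias(ps1, ps2, gamma=0.0):
    """Assumed power-difference gain with the bias gamma in the denominator.
    For a distant sound, ps1 is small at low frequencies, so gamma dominates
    the denominator and the gain shrinks, suppressing the low range."""
    num = np.maximum(ps1 - ps2, 0.0)
    return np.sqrt(num / (ps1 + gamma + 1e-12))
```

With a close target sound ps1 is large, so the bias barely changes the gain, matching the headset/handset scenario above.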
- FIGS. 23 and 24 are graphs comparing the output values of the beamformer 30 for a close sound and a distant sound.
- (a1) to (a3) are graphs showing the output values for the close sound,
- and (b1) to (b3) are graphs showing the output values for the distant sound.
- in one configuration, the distance between the microphone 10 and the microphone 11 is 0.03 m, and the distances between the microphone 10 and the sound sources R1 and R2 are 0.06 m and 1.5 m, respectively.
- in the other configuration, the distance between the microphone 10 and the microphone 11 is 0.01 m, and the distances between the microphone 10 and the sound sources R1 and R2 are 0.02 m and 1.5 m, respectively.
- the graphs also show the value of ds1(ω) for the distant sound.
- since the target sound correction unit 180 is designed with the close sound as the target sound position, in the case of a distant sound the value of ps1(ω) decreases in the low frequency range due to the influence of the target sound correction unit 180.
- when the value of ds1(ω) is small (that is, when the value of ps1(ω) is small),
- the influence of γ(ω) becomes large. That is, since the denominator term becomes relatively larger than the numerator, G_BSA(ω) is further reduced. The low frequency range of the distant sound is therefore suppressed.
- G_BSA(ω) obtained by the above equation (35) is applied to the output value ds1(ω) of the beamformer 30, and the multiplication result X_BSA(ω) of G_BSA(ω) and ds1(ω) is calculated as follows.
- the following formula shows, as an example, the case where the sound source separation device 1 has the configuration shown in FIG. 7.
- in FIGS. 23 and 24, (a1) and (b1) are graphs showing the output ds1(ω) of the beamformer 30. (a2) and (b2) are graphs showing the output X_BSA(ω) when γ(ω) is not inserted into the denominator of equation (35), and (a3) and (b3) are graphs showing the output X_BSA(ω) when γ(ω) is inserted into the denominator of equation (35). Each figure shows that the low frequency range of the distant sound is suppressed; in other words, an effect can be expected against driving noise, whose energy is concentrated in the low frequency range.
- in the above description, the beamformer 30 constitutes a first beamformer processing unit, and the beamformer 31 constitutes a second beamformer processing unit.
- the gain multiplication unit 130 constitutes a sound source separation unit.
- the present invention can be used in any industry that requires accurate separation of sound sources, such as speech recognition devices, car navigation systems, sound collection devices, recording devices, and device control by voice commands.
Description
As a method for solving the above problems, sound source separation methods using a plurality of microphones exist. For example, the sound source separation device described in Patent Document 1 performs beamformer processing for attenuating sound source signals arriving from directions symmetric about the perpendicular to the straight line connecting two microphones, and extracts the spectral information of the target sound source based on the difference between the power spectrum information calculated for the beamformer outputs.
[First Embodiment]
FIG. 1 is a diagram showing the basic configuration of a sound source separation system according to the first embodiment. This system consists of two microphones 10 and 11 and a sound source separation device 1. The embodiments below are described with two microphones, but the number of microphones only needs to be at least two and is not limited to two.
The two sound source signals obtained by the microphones 10 and 11 are frequency-analyzed for each microphone output by the spectrum analysis units 20 and 21. In the beamformer unit 3, these frequency-analyzed signals are filtered by the beamformers 30 and 31, which form blind spots on the left and right of the separation plane, and the power calculation units 40 and 41 calculate the power of the filter outputs. The beamformers 30 and 31 preferably form the blind spots symmetrically with respect to the separation plane.
First, the configuration of the beamformer unit 3, consisting of the beamformers 30 and 31, is described with reference to FIG. 2. The signals x1(ω) and x2(ω), decomposed into frequency components by the spectrum analysis units 20 and 21, are input and multiplied in the multipliers 100a, 100b, 100c, and 100d by the filter coefficients w1(ω), w2(ω), w1*(ω), and w2*(ω) (* denotes the complex conjugate), respectively.
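The multiplier structure of FIG. 2 described above can be sketched as follows; the weight values themselves are not specified here, only the conjugate pairing of the two beamformers.

```python
import numpy as np

def beamformer_pair(x1, x2, w1, w2):
    """One frequency bin of the beamformer unit 3: ds1 uses (w1, w2) and ds2
    uses their complex conjugates, which places the two blind spots
    symmetrically about the separation plane."""
    ds1 = w1 * x1 + w2 * x2
    ds2 = np.conj(w1) * x1 + np.conj(w2) * x2
    return ds1, ds2
```

For real-valued inputs at a bin, ds2 is the complex conjugate of ds1, so the two outputs have identical power for a source on the separation plane.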
[Power calculation units]
Next, the power calculation units 40 and 41 are described with reference to FIG. 3. The power calculation units 40 and 41 convert the outputs ds1(ω) and ds2(ω) of the beamformers 30 and 31 into power spectrum information ps1(ω) and ps2(ω) by the following formulas.
The outputs ps1(ω) and ps2(ω) of the power calculation units 40 and 41 are used as the two inputs of the weighting coefficient calculation unit 50. The weighting coefficient calculation unit 50 takes the power spectrum information of the outputs of the two beamformers 30 and 31 as input and outputs a weighting coefficient G_BSA(ω) for each frequency.
Here, G_BSA(ω)ds1(ω) is considered. As shown in equation (1), ds1(ω) is a signal obtained by linear processing of the observed signal X(ω, θ1, θ2). In contrast, G_BSA(ω)ds1(ω) is a signal obtained by nonlinear processing of ds1(ω).
On the other hand, in the spectrogram of FIG. 4(c), the energy of the noise component is not unevenly distributed in the time and frequency directions as in the input signal, and it can be seen that there is little musical noise.
G_BSA(ω)ds1(ω) is a sound source signal from the target sound source in which musical noise is sufficiently reduced. However, for noise arriving from various directions, such as diffuse noise, the value of the nonlinear processing G_BSA(ω) changes greatly from frequency bin to frequency bin and from frame to frame and tends to produce musical noise. Musical noise is therefore reduced by adding, to the output after nonlinear processing, the signal before nonlinear processing, which contains no musical noise. Specifically, a signal is calculated by adding, at a predetermined ratio, the signal X_BSA(ω), obtained by multiplying the output ds1(ω) of the beamformer 30 by the output G_BSA(ω), and the output ds1(ω) of the beamformer 30.
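The mixing described above can be sketched as a single equivalent gain; the ratio value used below is an assumption for illustration.

```python
import numpy as np

def musical_noise_reduction(ds1, G_BSA, ratio=0.8):
    """Blend the nonlinearly processed signal G_BSA * ds1 with the unprocessed
    beamformer output ds1 at a fixed ratio. This is equivalent to applying the
    gain G_S = ratio * G_BSA + (1 - ratio), which never falls below G_BSA."""
    G_S = ratio * G_BSA + (1.0 - ratio)
    return G_S * ds1, G_S
```

Because G_S ≥ G_BSA in every bin, the mix smooths out the bin-to-bin gain fluctuations that cause musical noise, at the cost of passing more noise — which is why the residual noise suppression gain is recalculated afterwards.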
Since the gain value G_S(ω) is always larger than G_BSA(ω), it reduces musical noise but increases the noise component. Therefore, in order to suppress the residual noise, the residual noise suppression gain calculation unit 110 is provided after the musical noise reduction gain calculation unit 60 and an optimum gain value is recalculated.
Block diagrams of the noise estimation unit 70 are shown in FIGS. 6(a) to 6(d). The noise estimation unit 70 performs adaptive filtering on the two signals obtained by the microphones 10 and 11 and obtains only the noise component by canceling the signal component from the sound source R1, which is the target sound.
Here, let the signal from the sound source R1 be S(t). The sound from the sound source R1 reaches the microphone 10 before the sound from the sound source R2. Let the signals of sounds emitted from the other sound sources be nj(t), and regard them as noise. The input x1(t) of the microphone 10 and the input x2(t) of the microphone 11 are then expressed as follows.
Assuming that the target sound and the noise are uncorrelated, the output x_ABM(t) of the noise estimation unit 70 is calculated as follows.
The output of the noise estimation unit 70 is frequency-analyzed by the spectrum analysis unit 80, and the noise power calculation unit 90 calculates the power for each frequency bin. The input to the noise estimation unit 70 may also be the microphone input signals after spectrum analysis.
The noise amount contained in X_ABM(ω), the frequency analysis of the output of the noise estimation unit 70, and the noise amount contained in the signal X_S(ω), formed by adding at a predetermined ratio the signal X_BSA(ω), obtained by multiplying the output ds1(ω) of the beamformer 30 by the weighting coefficient G_BSA(ω), and the output ds1(ω) of the beamformer 30, are similar in spectral shape but differ in energy. The noise equalizer unit 100 therefore performs correction to match the two energy amounts.
In the residual noise suppression gain calculation unit 110, the gain to be multiplied by ds1(ω) is recalculated in order to suppress the noise component remaining when the gain value G_S(ω) is applied to the output ds1(ω) of the beamformer 30. That is, for the value X_S(ω) obtained by applying G_S(ω) to ds1(ω), a residual noise suppression gain G_T(ω), which appropriately removes the noise component contained in X_S(ω), is calculated based on the estimated value λd(ω) of the residual noise component. A Wiener filter or the MMSE-STSA method (see Non-Patent Document 1) is often used to calculate the gain. However, since the MMSE-STSA method assumes that the noise follows a normal distribution, sudden noise and the like may not fit this assumption. This embodiment therefore uses an estimator that suppresses sudden noise relatively easily, although any method may be used.
The output G_BSA(ω) of the weighting coefficient calculation unit 50, the output G_S(ω) of the musical noise reduction gain calculation unit 60, or the output G_T(ω) of the residual noise suppression gain calculation unit 110 is used as the input of the gain multiplication unit 130. The gain multiplication unit 130 outputs a signal X_BSA(ω) based on the result of multiplying the output ds1(ω) of the beamformer 30 by the weighting coefficient G_BSA(ω), the musical noise reduction gain G_S(ω), or the residual noise suppression gain G_T(ω). That is, as the value of X_BSA(ω), for example, the product of ds1(ω) and G_BSA(ω), the product of ds1(ω) and G_S(ω), or the product of ds1(ω) and G_T(ω) may be used.
In particular, the sound source signal from the target sound source obtained from the product of ds1(ω) and G_T(ω) contains extremely little musical noise and extremely few noise components.
The time waveform conversion unit 120 converts the output X_BSA(ω) of the gain multiplication unit 130 into a time-domain signal.
[Another configuration example of the sound source separation system]
FIG. 8 is a diagram showing another configuration example of the sound source separation system according to the present embodiment. The difference from the configuration of the system shown in FIG. 1 is that the noise estimation unit 70 is realized in the time domain in the system of FIG. 1, whereas it is realized in the frequency domain in the system of FIG. 8. The other components are the same as in the system of FIG. 1. In this configuration, the spectrum analysis unit 80 is unnecessary.
FIG. 9 is a diagram showing the basic configuration of a sound source separation system according to the second embodiment of the present invention. The sound source separation system according to this embodiment is characterized by having a control unit 160. The control unit 160 controls the internal parameters of the noise estimation unit 70, the noise equalizer unit 100, and the residual noise suppression gain calculation unit 110 based on the weighting coefficients G_BSA(ω) over the entire frequency band. Examples of the internal parameters include the step size of the adaptive filter, the spectral floor value β of the weighting coefficient G_BSA(ω), and the noise amount of the estimated noise.
[Third Embodiment]
(First configuration)
FIG. 11 is a diagram showing an example of the basic configuration of a sound source separation system according to the third embodiment of the present invention.
The sound source separation device 1 in the system shown in FIG. 11 includes spectrum analysis units 20 and 21, beamformers 30 and 31, power calculation units 40 and 41, a weighting coefficient calculation unit 50, a weighting coefficient multiplication unit 310, and a time waveform conversion unit 120. The components other than the weighting coefficient multiplication unit 310 are the same as in the other embodiments described above.
The weighting coefficient multiplication unit 310 multiplies the signal ds1(ω) obtained by the beamformer 30 by the weighting coefficient calculated by the weighting coefficient calculation unit 50.
FIG. 12 is a diagram showing another example of the basic configuration of a sound source separation system according to the third embodiment of the present invention.
The sound source separation device 1 in the system shown in FIG. 12 includes spectrum analysis units 20 and 21, beamformers 30 and 31, power calculation units 40 and 41, a weighting coefficient calculation unit 50, a weighting coefficient multiplication unit 310, a musical noise reduction unit 320, a residual noise suppression unit 330, a noise estimation unit 70, a spectrum analysis unit 80, a power calculation unit 90, a noise equalizer unit 100, and a time waveform conversion unit 120. The components other than the weighting coefficient multiplication unit 310, the musical noise reduction unit 320, and the residual noise suppression unit 330 are the same as in the other embodiments described above.
The residual noise suppression unit 330 suppresses the residual noise contained in the output result of the musical noise reduction unit 320, based on the output result of the musical noise reduction unit 320 and the output result of the noise equalizer unit 100.
Here, the signal X_S(ω), formed by adding at a predetermined ratio the signal X_BSA(ω), obtained by multiplying the output ds1(ω) of the beamformer 30 by the weighting coefficient G_BSA(ω), and the output ds1(ω) of the beamformer 30, may contain sudden noise depending on the noise environment. The noise estimation unit 70 and the noise equalizer unit 100 described below are therefore introduced so that sudden noise can also be estimated.
That is, the sound source separation device 1 of FIG. 12 differs from those of the first and second embodiments in that it does not calculate the musical noise reduction gain G_S(ω) or the residual noise suppression gain G_T(ω). Even with the configuration of FIG. 12, the same effects as the sound source separation device 1 according to the first embodiment are obtained.
FIG. 13 is a diagram showing another example of the basic configuration of a sound source separation system according to the third embodiment of the present invention. In the sound source separation device 1 shown in FIG. 13, a control unit 160 is added to the configuration of the device of FIG. 12. The function of the control unit 160 is the same as that described in the second embodiment.
FIG. 14 is a diagram showing the basic configuration of a sound source separation system according to the fourth embodiment of the present invention. The sound source separation system according to this embodiment is characterized by having a directivity control unit 170, a target sound correction unit 180, and an arrival direction estimation unit 190.
FIG. 25 shows a situation in which the two sound sources R1′ (target sound) and R2′ (noise) are symmetric with respect to a separation plane rotated by θτ from the original separation plane, which intersects the line segment connecting the microphones. As described in Patent Document 1, a situation equivalent to that shown in FIG. 25 can be realized by giving a constant delay amount τd to the signal acquired by one of the microphones. That is, in order to manipulate the inter-microphone phase difference and adjust the directivity, the phase rotator D(ω) is multiplied in the above equation (1). In the following equations, W1(ω) = W1(ω, θ1, θ2) and X(ω) = X(ω, θ1, θ2).
However, when array processing is performed based on phase information, the spatial sampling theorem expressed by the following formula must be satisfied.
The delays may also be given so that the above relation holds.
Another problem is that slight distortion occurs in the frequency characteristics of the target sound when BSA processing is performed by the beamformers 30 and 31 after the directivity control unit 170 narrows the directivity. In addition, the processing of equation (31) reduces the output gain. Therefore, the target sound correction unit 180 is provided to perform frequency equalization that corrects the frequency characteristics of the target sound output. That is, since the location of the target sound is roughly fixed, correction is performed for the estimated target sound position. In this embodiment, a physical model that simply imitates the transfer function representing the propagation time and attenuation from a point sound source to each microphone is used. Here, the transfer function of the microphone 10 is taken as the reference value, and the transfer function of the microphone 11 is expressed as a relative value with respect to the microphone 10. The propagation model Xm(ω) = [Xm1(ω), Xm2(ω)] of the sound reaching each microphone from the target sound position is then expressed as follows, where γs is the distance between the microphone 10 and the target sound and θS is the direction of the target sound.
The musical noise reduction gain calculation unit 60 receives this corrected weighting coefficient G_BSA′(ω) as input; that is, G_BSA(ω) in equation (7) and the like is replaced with G_BSA′(ω).
At least one of the signals obtained by the microphones 10 and 11 may also be input to the control unit 160.
FIG. 19 is a flowchart showing an example of processing in the sound source separation system.
In the spectrum analysis units 20 and 21, frequency analysis is performed on the input signal 1 and the input signal 2 obtained by the microphones 10 and 11, respectively (steps S101 and S102). Here, the position of the target sound may be estimated by the arrival direction estimation unit 190, the optimum delay amount may be calculated by the directivity control unit 170 based on the estimated positions of the sound sources R1 and R2, and the input signal 1 may be multiplied by the phase rotator derived from this optimum delay amount.
In the weighting coefficient calculation unit 50, a separation gain value G_BSA(ω) is calculated from the calculation results of steps S105 and S106 (step S107). Here, the frequency characteristics of the target sound may be corrected by recalculating the weighting coefficient value G_BSA(ω) in the target sound correction unit 180.
Finally, the time waveform conversion unit 120 converts the multiplication result (the target sound) of step S117 into a time-domain signal (step S118).
As for the gain calculation process and the noise estimation process, after the weighting coefficients are calculated in steps S101 to S107 of the gain calculation process, the process of step S108 is executed; at the same time, the process of step S109 and the noise estimation process (steps S110 to S113) are executed, and then the gain to be multiplied by the output of the beamformer 30 is determined in step S114.
FIG. 20 is a flowchart showing the details of the process in step S110 of FIG. 19. First, a pseudo signal H^T(t)·x1(t) that matches the signal component from the sound source R1 is calculated (step S201). Next, in the subtraction unit 72 of FIG. 6, the pseudo signal calculated in step S201 is subtracted from the signal x2(t) of the microphone 11, whereby the error signal x_ABM(t), which becomes the output of the noise estimation unit 70, is calculated (step S202).
Thereafter, when the control signal from the control unit 160 is larger than the predetermined threshold (step S203), the adaptive filter unit 71 updates the coefficients H(t) of the adaptive filter (step S204).
FIG. 21 is a flowchart showing the details of the process in step S113 of FIG. 19. First, the output ds1(ω) of the beamformer 30 is multiplied by the gain G_S(ω) output from the musical noise reduction gain calculation unit 60 to obtain the output X_S(ω) (step S301).
FIG. 22 is a flowchart showing the details of the process in step S114 of FIG. 19. When the control signal from the control unit 160 is larger than the predetermined threshold (step S401), a process of reducing the value of λd(ω), which is the output of the noise equalizer unit 100 and the estimated value of the noise component, to, for example, 0.75 times is executed (step S402). Next, the posterior SNR is calculated (step S403). Then, the prior SNR is calculated (step S404). Finally, the residual noise suppression gain G_T(ω) is calculated (step S405).
When the weighting coefficient calculation unit 50 calculates the gain value G_BSA(ω), the weighting coefficient may be calculated using a predetermined bias value γ(ω). For example, a new gain value may be calculated by adding a predetermined bias value to the denominator of the gain value G_BSA(ω). The addition of the bias value can be expected to improve the SNR, particularly in the low frequency range, when the gain characteristics of the microphones are uniform and the target sound exists near the microphones, as with a headset or a handset.
In the above description, the beamformer 30 constitutes a first beamformer processing unit, the beamformer 31 constitutes a second beamformer processing unit, and the gain multiplication unit 130 constitutes a sound source separation unit.
3 Beamformer unit
10, 11 Microphone
20, 21 Spectrum analysis unit
30, 31 Beamformer
40, 41 Power calculation unit
50 Weighting coefficient calculation unit
60 Musical noise reduction gain calculation unit
70 Noise estimation unit
71 Adaptive filter unit
72 Subtraction unit
73 Delay unit
74 Threshold comparison unit
80 Spectrum analysis unit
90 Power calculation unit
100 Noise equalizer unit
101 Multiplication unit
102 Power calculation unit
103, 104 Smoothing unit
105 Threshold comparison unit
106 Equalizer update unit
107 Equalizer application unit
110 Residual noise suppression gain calculation unit
120 Time waveform conversion unit
130 Gain multiplication unit
160 Control unit
161A, 161B Spectrum analysis unit
162A, 162B Beamformer
163A, 163B Power calculation unit
164 Weighting coefficient calculation unit
165 Noise estimation unit
166 Spectrum analysis unit
167 Energy comparison unit
170 Directivity control unit
171 Optimum delay amount calculation unit
172, 173 Phase rotator
180 Target sound correction unit
190 Arrival direction estimation unit
310 Weighting coefficient multiplication unit
320 Musical noise reduction unit
330 Residual noise suppression unit
Claims (12)
- A sound source separation device that separates a sound source signal emitted by a target sound source from a mixed sound in which sound source signals emitted from a plurality of sound sources are mixed, the device comprising:
a first beamformer processing unit that performs a product-sum operation in the frequency domain, using mutually different first coefficients, on each output signal of a microphone pair consisting of two microphones to which the mixed sound is input, thereby attenuating sound source signals arriving from the region on the opposite side, across a plane intersecting the line segment connecting the two microphones, of the region containing the direction of the target sound source;
a second beamformer processing unit that multiplies each output signal of the microphone pair by second coefficients that are the complex conjugates, in the frequency domain, of the mutually different first coefficients, and performs a product-sum operation on the obtained results in the frequency domain, thereby attenuating sound source signals arriving from the region containing the direction of the target sound source across the plane;
a power calculation unit that computes first spectral information having a power value for each frequency from the signal obtained by the first beamformer processing unit, and further computes second spectral information having a power value for each frequency from the signal obtained by the second beamformer processing unit;
a weighting coefficient calculation unit that calculates, according to the per-frequency difference between the power values of the first spectral information and the second spectral information, a per-frequency weighting coefficient to be multiplied with the signal obtained by the first beamformer processing unit; and
a sound source separation unit that separates the sound source signal of the target sound source from the mixed sound based on the result of multiplying the signal obtained by the first beamformer processing unit by the weighting coefficient calculated by the weighting coefficient calculation unit.
- The sound source separation device according to claim 1, further comprising a weighting coefficient multiplication unit that multiplies the signal obtained by the first beamformer processing unit by the weighting coefficient calculated by the weighting coefficient calculation unit, wherein the sound source separation unit separates the sound source signal of the target sound source from the mixed sound based on the result of adding the output of the weighting coefficient multiplication unit and the signal obtained from the first beamformer processing unit at a predetermined ratio.
- The sound source separation device according to claim 2, further comprising:
a musical noise reduction unit that outputs the result of adding the output of the weighting coefficient multiplication unit and the signal obtained from the first beamformer processing unit at a predetermined ratio;
a noise estimation unit that computes a pseudo signal matching the output signal of the microphone of the pair farther from the target sound source by applying an adaptive filter with variable filter coefficients to the output signal of the microphone of the pair closer to the target sound source, and calculates a noise component as the difference between the output signal of the farther microphone and the pseudo signal;
a noise equalizer unit that calculates the noise component contained in the output of the musical noise reduction unit, based on that output and the noise component calculated by the noise estimation unit; and
a residual noise suppression unit that suppresses residual noise contained in the output of the musical noise reduction unit, based on that output and the output of the noise equalizer unit,
wherein the sound source separation unit separates the sound source signal of the target sound source from the mixed sound based on the output of the residual noise suppression unit.
- The sound source separation device according to claim 3, further comprising a control unit that controls at least one of the noise estimation unit, the noise equalizer unit, and the residual noise suppression unit based on the per-frequency weighting coefficients.
- The sound source separation device according to claim 1, further comprising a musical noise reduction gain calculation unit that calculates a gain for adding, at a predetermined ratio, the result of multiplying the sound source signal obtained by the first beamformer processing unit by the weighting coefficient and the sound source signal obtained by the first beamformer processing unit, wherein the sound source separation unit separates the sound source signal of the target sound source from the mixed sound based on the result of multiplying the sound source signal obtained by the first beamformer processing unit by the gain calculated by the musical noise reduction gain calculation unit.
- The sound source separation device according to claim 5, further comprising:
a noise estimation unit that computes a pseudo signal matching the output signal of the microphone of the pair farther from the target sound source by applying an adaptive filter with variable filter coefficients to the output signal of the microphone of the pair closer to the target sound source, and calculates a noise component as the difference between the output signal of the farther microphone and the pseudo signal;
a noise equalizer unit that calculates, based on the result of multiplying the sound source signal obtained by the first beamformer processing unit by the gain calculated by the musical noise reduction gain calculation unit and on the noise component calculated by the noise estimation unit, the noise component contained in that multiplication result; and
a residual noise suppression gain calculation unit that calculates, based on the gain calculated by the musical noise reduction gain calculation unit and the noise component calculated by the noise equalizer unit, a gain to be multiplied with the sound source signal obtained by the first beamformer processing unit for suppressing the residual noise contained in that multiplication result,
wherein the sound source separation unit separates the sound source signal of the target sound source from the mixed sound based on the result of multiplying the sound source signal obtained by the first beamformer processing unit by the gain calculated by the residual noise suppression gain calculation unit.
- The sound source separation device according to claim 6, further comprising a control unit that controls at least one of the noise estimation unit, the noise equalizer unit, and the residual noise suppression gain calculation unit based on the per-frequency weighting coefficients.
- The sound source separation device according to any one of claims 1 to 7, further comprising:
a reference delay calculation unit that calculates, for each frequency, a reference delay to be multiplied with the output signal of at least one microphone of the pair in order to virtually move the position of that microphone; and
a directivity control unit that applies a delay to the output signal of at least one microphone of the pair for each frequency band,
wherein the directivity control unit uses the reference delay as the delay in frequency bands where the reference delay calculated by the reference delay calculation unit satisfies the spatial sampling theorem, and uses as the delay, in frequency bands where the reference delay does not satisfy the spatial sampling theorem, the optimal delay τ0 obtained by Equation (30) below.
(In Equation (30), d is the distance between the two microphones, c is the speed of sound, and ω is the angular frequency.)
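Equation (30) itself is not reproduced in this excerpt, so the sketch below only illustrates the delay-selection logic of this claim, under our own assumption (not the claim's) that a delay τ is free of spatial aliasing when ω(d + cτ)/c < π, i.e. when the virtually extended spacing stays under half a wavelength.

```python
import math

def satisfies_spatial_sampling(tau, d, c, omega):
    """Hedged check: treat the delay tau as a virtual extension of the
    microphone spacing d, and require the extended spacing to stay below
    half a wavelength.  The claim's Equation (30) defines the actual
    optimal delay tau_0; this predicate is illustrative only."""
    return omega * (d + c * tau) / c < math.pi

def choose_delay(tau_ref, tau_0, d, c, omega):
    """Use the reference delay where the condition holds, otherwise fall
    back to the optimal delay tau_0 from Equation (30)."""
    return tau_ref if satisfies_spatial_sampling(tau_ref, d, c, omega) else tau_0
```

For a 2 cm spacing, low-frequency bands pass the check and keep the reference delay, while high-frequency bands switch to τ0.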
- A sound source separation device that separates a sound source signal emitted by a target sound source from a mixed sound in which sound source signals emitted from a plurality of sound sources are mixed, the device comprising:
first beamformer processing means for multiplying each output signal of a microphone pair consisting of two microphones to which the mixed sound is input by different first coefficients and performing a product-sum operation on the obtained results in the frequency domain, thereby attenuating sound source signals arriving from the region on the opposite side, across a plane intersecting the line segment connecting the two microphones, of the region containing the direction of the target sound source;
second beamformer processing means for multiplying each output signal of the microphone pair by second coefficients that are the complex conjugates, in the frequency domain, of the different first coefficients and performing a product-sum operation on the obtained results in the frequency domain, thereby attenuating sound source signals arriving from the region containing the direction of the target sound source across the plane;
power calculation means for computing first spectral information having a power value for each frequency from the signal obtained by the first beamformer processing means, and further computing second spectral information having a power value for each frequency from the signal obtained by the second beamformer processing means;
weighting coefficient calculation means for calculating, according to the per-frequency difference between the power values of the first spectral information and the second spectral information, a per-frequency weighting coefficient to be multiplied with the signal obtained by the first beamformer processing means; and
sound source separation means for separating the sound source signal of the target sound source from the mixed sound based on the result of multiplying the signal obtained by the first beamformer processing means by the weighting coefficient calculated by the weighting coefficient calculation means.
- The sound source separation device according to claim 9, further comprising weighting coefficient multiplication means for multiplying the signal obtained by the first beamformer processing means by the weighting coefficient calculated by the weighting coefficient calculation means, wherein the sound source separation means separates the sound source signal of the target sound source from the mixed sound based on the result of adding the output of the weighting coefficient multiplication means and the signal obtained from the first beamformer processing means at a predetermined ratio.
- A sound source separation method executed by a sound source separation device having a first beamformer processing unit, a second beamformer processing unit, a power calculation unit, a weighting coefficient calculation unit, and a sound source separation unit, the method comprising:
a first step in which the first beamformer processing unit performs a product-sum operation in the frequency domain, using mutually different first coefficients, on each output signal of a microphone pair consisting of two microphones to which a mixed sound containing sound source signals emitted from a plurality of sound sources is input, thereby attenuating sound source signals arriving from the region on the opposite side, across a plane intersecting the line segment connecting the two microphones, of the region containing the direction of the target sound source;
a second step in which the second beamformer processing unit multiplies each output signal of the microphone pair by second coefficients that are the complex conjugates, in the frequency domain, of the mutually different first coefficients, and performs a product-sum operation on the obtained results in the frequency domain, thereby attenuating sound source signals arriving from the region containing the direction of the target sound source across the plane;
a third step in which the power calculation unit computes first spectral information having a power value for each frequency from the signal obtained in the first step, and further computes second spectral information having a power value for each frequency from the signal obtained in the second step;
a fourth step in which the weighting coefficient calculation unit calculates, according to the per-frequency difference between the power values of the first spectral information and the second spectral information, a per-frequency weighting coefficient to be multiplied with the signal obtained in the first step; and
a fifth step in which the sound source separation unit separates the sound source signal of the target sound source from the mixed sound based on the result of multiplying the signal obtained in the first step by the weighting coefficient calculated in the fourth step.
- A program for causing a computer to execute:
a first processing step of performing a product-sum operation in the frequency domain, using mutually different first coefficients, on each output signal of a microphone pair consisting of two microphones to which a mixed sound containing sound source signals emitted from a plurality of sound sources is input, thereby attenuating sound source signals arriving from the region on the opposite side, across a plane intersecting the line segment connecting the two microphones, of the region containing the direction of the target sound source;
a second processing step of multiplying each output signal of the microphone pair by second coefficients that are the complex conjugates, in the frequency domain, of the mutually different first coefficients, and performing a product-sum operation on the obtained results in the frequency domain, thereby attenuating sound source signals arriving from the region containing the direction of the target sound source across the plane;
a third processing step of computing first spectral information having a power value for each frequency from the signal obtained in the first processing step, and further computing second spectral information having a power value for each frequency from the signal obtained in the second processing step;
a fourth processing step of calculating, according to the per-frequency difference between the power values of the first spectral information and the second spectral information, a per-frequency weighting coefficient to be multiplied with the signal obtained in the first processing step; and
a fifth processing step of separating the sound source signal of the target sound source from the mixed sound based on the result of multiplying the signal obtained in the first processing step by the weighting coefficient calculated in the fourth processing step.
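The five processing steps of the program claim can be sketched for a single frame as follows. The weighting rule here is a simple normalized power difference standing in for the patent's actual weighting coefficient GBSA(ω), and the coefficient values in the usage note are arbitrary illustrations.

```python
import numpy as np

def separate(spec1, spec2, w1, w2):
    """Sketch of the five claimed steps per frequency bin: spec1/spec2
    are the two microphones' frequency-domain signals, w1/w2 the first
    coefficients (the second beamformer uses their complex conjugates)."""
    bf1 = w1 * spec1 + w2 * spec2                     # step 1: target-side beamformer
    bf2 = np.conj(w1) * spec1 + np.conj(w2) * spec2   # step 2: conjugate beamformer
    p1, p2 = np.abs(bf1) ** 2, np.abs(bf2) ** 2       # step 3: per-frequency power
    weight = np.maximum(p1 - p2, 0.0) / (p1 + 1e-10)  # step 4: weighting coefficient
    return weight * bf1                               # step 5: separated target signal
```

A bin where the target-side beamformer dominates passes through almost unchanged; a bin where both beamformers carry equal power (diffuse noise) is attenuated to zero.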
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
BR112012031656A BR112012031656A2 (pt) | 2010-08-25 | 2011-05-25 | dispositivo, e método de separação de fontes sonoras, e, programa |
CN2011800197387A CN103098132A (zh) | 2010-08-25 | 2011-08-25 | 声源分离装置、声源分离方法、以及程序 |
KR1020127024378A KR101339592B1 (ko) | 2010-08-25 | 2011-08-25 | 음원 분리 장치, 음원 분리 방법, 및 프로그램을 기록한 컴퓨터 판독 가능한 기록 매체 |
EP11819602.1A EP2562752A4 (en) | 2010-08-25 | 2011-08-25 | DEVICE FOR SEPARATING SOUND SOURCES, METHOD FOR SEPARATING SOUND SOURCES AND PROGRAM |
JP2012530540A JP5444472B2 (ja) | 2010-08-25 | 2011-08-25 | 音源分離装置、音源分離方法、及び、プログラム |
US13/699,421 US20130142343A1 (en) | 2010-08-25 | 2011-08-25 | Sound source separation device, sound source separation method and program |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2010-188737 | 2010-08-25 | ||
JP2010188737 | 2010-08-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2012026126A1 true WO2012026126A1 (ja) | 2012-03-01 |
Family
ID=45723148
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2011/004734 WO2012026126A1 (ja) | 2010-08-25 | 2011-08-25 | 音源分離装置、音源分離方法、及び、プログラム |
Country Status (8)
Country | Link |
---|---|
US (1) | US20130142343A1 (ja) |
EP (1) | EP2562752A4 (ja) |
JP (1) | JP5444472B2 (ja) |
KR (1) | KR101339592B1 (ja) |
CN (1) | CN103098132A (ja) |
BR (1) | BR112012031656A2 (ja) |
TW (1) | TW201222533A (ja) |
WO (1) | WO2012026126A1 (ja) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103680512A (zh) * | 2012-09-03 | 2014-03-26 | 现代摩比斯株式会社 | 车用阵列话筒的语音识别水平提升系统及其方法 |
WO2015129760A1 (ja) * | 2014-02-28 | 2015-09-03 | 日本電信電話株式会社 | 信号処理装置、方法及びプログラム |
JP2016516343A (ja) * | 2013-03-13 | 2016-06-02 | コピン コーポレーション | 雑音キャンセリングマイクロホン装置 |
JP2018164156A (ja) * | 2017-03-24 | 2018-10-18 | 沖電気工業株式会社 | 収音装置、プログラム及び方法 |
JP2020512754A (ja) * | 2017-03-20 | 2020-04-23 | ボーズ・コーポレーションBose Corporation | ノイズ低減のためのオーディオ信号処理 |
CN111863015A (zh) * | 2019-04-26 | 2020-10-30 | 北京嘀嘀无限科技发展有限公司 | 一种音频处理方法、装置、电子设备和可读存储介质 |
JP6854967B1 (ja) * | 2019-10-09 | 2021-04-07 | 三菱電機株式会社 | 雑音抑圧装置、雑音抑圧方法、及び雑音抑圧プログラム |
Families Citing this family (77)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8577678B2 (en) * | 2010-03-11 | 2013-11-05 | Honda Motor Co., Ltd. | Speech recognition system and speech recognizing method |
CN102447993A (zh) * | 2010-09-30 | 2012-05-09 | Nxp股份有限公司 | 声音场景操纵 |
JP5566846B2 (ja) * | 2010-10-15 | 2014-08-06 | 本田技研工業株式会社 | ノイズパワー推定装置及びノイズパワー推定方法並びに音声認識装置及び音声認識方法 |
JP5845760B2 (ja) * | 2011-09-15 | 2016-01-20 | ソニー株式会社 | 音声処理装置および方法、並びにプログラム |
US8712951B2 (en) * | 2011-10-13 | 2014-04-29 | National Instruments Corporation | Determination of statistical upper bound for estimate of noise power spectral density |
US8943014B2 (en) | 2011-10-13 | 2015-01-27 | National Instruments Corporation | Determination of statistical error bounds and uncertainty measures for estimates of noise power spectral density |
US10136239B1 (en) | 2012-09-26 | 2018-11-20 | Foundation For Research And Technology—Hellas (F.O.R.T.H.) | Capturing and reproducing spatial sound apparatuses, methods, and systems |
US20160210957A1 (en) * | 2015-01-16 | 2016-07-21 | Foundation For Research And Technology - Hellas (Forth) | Foreground Signal Suppression Apparatuses, Methods, and Systems |
US9955277B1 (en) | 2012-09-26 | 2018-04-24 | Foundation For Research And Technology-Hellas (F.O.R.T.H.) Institute Of Computer Science (I.C.S.) | Spatial sound characterization apparatuses, methods and systems |
US10175335B1 (en) | 2012-09-26 | 2019-01-08 | Foundation For Research And Technology-Hellas (Forth) | Direction of arrival (DOA) estimation apparatuses, methods, and systems |
US10149048B1 (en) | 2012-09-26 | 2018-12-04 | Foundation for Research and Technology—Hellas (F.O.R.T.H.) Institute of Computer Science (I.C.S.) | Direction of arrival estimation and sound source enhancement in the presence of a reflective surface apparatuses, methods, and systems |
US9257952B2 (en) | 2013-03-13 | 2016-02-09 | Kopin Corporation | Apparatuses and methods for multi-channel signal compression during desired voice activity detection |
US10306389B2 (en) | 2013-03-13 | 2019-05-28 | Kopin Corporation | Head wearable acoustic system with noise canceling microphone geometry apparatuses and methods |
AT514412A1 (de) * | 2013-03-15 | 2014-12-15 | Commend Internat Gmbh | Verfahren zur Erhöhung der Sprachverständlichkeit |
JP2014219467A (ja) * | 2013-05-02 | 2014-11-20 | ソニー株式会社 | 音信号処理装置、および音信号処理方法、並びにプログラム |
EP2819429B1 (en) * | 2013-06-28 | 2016-06-22 | GN Netcom A/S | A headset having a microphone |
ES2700246T3 (es) * | 2013-08-28 | 2019-02-14 | Dolby Laboratories Licensing Corp | Mejora paramétrica de la voz |
US9497528B2 (en) * | 2013-11-07 | 2016-11-15 | Continental Automotive Systems, Inc. | Cotalker nulling based on multi super directional beamformer |
US10176823B2 (en) | 2014-05-09 | 2019-01-08 | Apple Inc. | System and method for audio noise processing and noise reduction |
US9990939B2 (en) * | 2014-05-19 | 2018-06-05 | Nuance Communications, Inc. | Methods and apparatus for broadened beamwidth beamforming and postfiltering |
CN105100338B (zh) * | 2014-05-23 | 2018-08-10 | 联想(北京)有限公司 | 降低噪声的方法和装置 |
CN104134444B (zh) * | 2014-07-11 | 2017-03-15 | 福建星网视易信息系统有限公司 | 一种基于mmse的歌曲去伴奏方法和装置 |
DE102015203600B4 (de) * | 2014-08-22 | 2021-10-21 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | FIR-Filterkoeffizientenberechnung für Beamforming-Filter |
JP6703525B2 (ja) * | 2014-09-05 | 2020-06-03 | インターデジタル シーイー パテント ホールディングス | 音源を強調するための方法及び機器 |
EP3029671A1 (en) * | 2014-12-04 | 2016-06-08 | Thomson Licensing | Method and apparatus for enhancing sound sources |
EP3010017A1 (en) * | 2014-10-14 | 2016-04-20 | Thomson Licensing | Method and apparatus for separating speech data from background data in audio communication |
CN105702262A (zh) * | 2014-11-28 | 2016-06-22 | 上海航空电器有限公司 | 一种头戴式双麦克风语音增强方法 |
CN105989851B (zh) * | 2015-02-15 | 2021-05-07 | 杜比实验室特许公司 | 音频源分离 |
CN106157967A (zh) | 2015-04-28 | 2016-11-23 | 杜比实验室特许公司 | 脉冲噪声抑制 |
US9554207B2 (en) | 2015-04-30 | 2017-01-24 | Shure Acquisition Holdings, Inc. | Offset cartridge microphones |
US9565493B2 (en) | 2015-04-30 | 2017-02-07 | Shure Acquisition Holdings, Inc. | Array microphone system and method of assembling the same |
US9460727B1 (en) * | 2015-07-01 | 2016-10-04 | Gopro, Inc. | Audio encoder for wind and microphone noise reduction in a microphone array system |
US9613628B2 (en) | 2015-07-01 | 2017-04-04 | Gopro, Inc. | Audio decoder for wind and microphone noise reduction in a microphone array system |
US9401158B1 (en) * | 2015-09-14 | 2016-07-26 | Knowles Electronics, Llc | Microphone signal fusion |
US11631421B2 (en) * | 2015-10-18 | 2023-04-18 | Solos Technology Limited | Apparatuses and methods for enhanced speech recognition in variable environments |
CN108292508B (zh) * | 2015-12-02 | 2021-11-23 | 日本电信电话株式会社 | 空间相关矩阵估计装置、空间相关矩阵估计方法和记录介质 |
WO2017108085A1 (en) * | 2015-12-21 | 2017-06-29 | Huawei Technologies Co., Ltd. | A signal processing apparatus and method |
GB2549922A (en) * | 2016-01-27 | 2017-11-08 | Nokia Technologies Oy | Apparatus, methods and computer computer programs for encoding and decoding audio signals |
CN107404684A (zh) * | 2016-05-19 | 2017-11-28 | 华为终端(东莞)有限公司 | 一种采集声音信号的方法和装置 |
DK3253075T3 (en) * | 2016-05-30 | 2019-06-11 | Oticon As | A HEARING EQUIPMENT INCLUDING A RADIO FORM FILTER UNIT CONTAINING AN EXCHANGE UNIT |
CN107507624B (zh) * | 2016-06-14 | 2021-03-09 | 瑞昱半导体股份有限公司 | 声源分离方法与装置 |
WO2018037643A1 (ja) * | 2016-08-23 | 2018-03-01 | ソニー株式会社 | 情報処理装置、情報処理方法及びプログラム |
GB201615538D0 (en) * | 2016-09-13 | 2016-10-26 | Nokia Technologies Oy | A method , apparatus and computer program for processing audio signals |
EP3324406A1 (en) * | 2016-11-17 | 2018-05-23 | Fraunhofer Gesellschaft zur Förderung der Angewand | Apparatus and method for decomposing an audio signal using a variable threshold |
US10367948B2 (en) | 2017-01-13 | 2019-07-30 | Shure Acquisition Holdings, Inc. | Post-mixing acoustic echo cancellation systems and methods |
JP6472823B2 (ja) * | 2017-03-21 | 2019-02-20 | 株式会社東芝 | 信号処理装置、信号処理方法および属性付与装置 |
CN107135443B (zh) * | 2017-03-29 | 2020-06-23 | 联想(北京)有限公司 | 一种信号处理方法及电子设备 |
US10187721B1 (en) * | 2017-06-22 | 2019-01-22 | Amazon Technologies, Inc. | Weighing fixed and adaptive beamformers |
JP6686977B2 (ja) * | 2017-06-23 | 2020-04-22 | カシオ計算機株式会社 | 音源分離情報検出装置、ロボット、音源分離情報検出方法及びプログラム |
CN108630216B (zh) * | 2018-02-15 | 2021-08-27 | 湖北工业大学 | 一种基于双麦克风模型的mpnlms声反馈抑制方法 |
US10755728B1 (en) * | 2018-02-27 | 2020-08-25 | Amazon Technologies, Inc. | Multichannel noise cancellation using frequency domain spectrum masking |
WO2019231632A1 (en) | 2018-06-01 | 2019-12-05 | Shure Acquisition Holdings, Inc. | Pattern-forming microphone array |
US11297423B2 (en) | 2018-06-15 | 2022-04-05 | Shure Acquisition Holdings, Inc. | Endfire linear array microphone |
CN110610718B (zh) * | 2018-06-15 | 2021-10-08 | 炬芯科技股份有限公司 | 一种提取期望声源语音信号的方法及装置 |
CN110931028B (zh) * | 2018-09-19 | 2024-04-26 | 北京搜狗科技发展有限公司 | 一种语音处理方法、装置和电子设备 |
CN112889296A (zh) | 2018-09-20 | 2021-06-01 | 舒尔获得控股公司 | 用于阵列麦克风的可调整的波瓣形状 |
CN111175727B (zh) * | 2018-11-13 | 2022-05-03 | 中国科学院声学研究所 | 一种基于条件波数谱密度的宽带信号方位估计的方法 |
JP2022526761A (ja) | 2019-03-21 | 2022-05-26 | シュアー アクイジッション ホールディングス インコーポレイテッド | 阻止機能を伴うビーム形成マイクロフォンローブの自動集束、領域内自動集束、および自動配置 |
CN113841419A (zh) | 2019-03-21 | 2021-12-24 | 舒尔获得控股公司 | 天花板阵列麦克风的外壳及相关联设计特征 |
US11558693B2 (en) | 2019-03-21 | 2023-01-17 | Shure Acquisition Holdings, Inc. | Auto focus, auto focus within regions, and auto placement of beamformed microphone lobes with inhibition and voice activity detection functionality |
TW202101422A (zh) | 2019-05-23 | 2021-01-01 | 美商舒爾獲得控股公司 | 可操縱揚聲器陣列、系統及其方法 |
US11302347B2 (en) | 2019-05-31 | 2022-04-12 | Shure Acquisition Holdings, Inc. | Low latency automixer integrated with voice and noise activity detection |
CN110244260B (zh) * | 2019-06-17 | 2021-06-29 | 杭州电子科技大学 | 基于声能流矢量补偿的水下目标高精度doa估计方法 |
CN112216303B (zh) * | 2019-07-11 | 2024-07-23 | 北京声智科技有限公司 | 一种语音处理方法、装置及电子设备 |
US11297426B2 (en) | 2019-08-23 | 2022-04-05 | Shure Acquisition Holdings, Inc. | One-dimensional array microphone with improved directivity |
US12028678B2 (en) | 2019-11-01 | 2024-07-02 | Shure Acquisition Holdings, Inc. | Proximity microphone |
US11552611B2 (en) | 2020-02-07 | 2023-01-10 | Shure Acquisition Holdings, Inc. | System and method for automatic adjustment of reference gain |
CN111179960B (zh) * | 2020-03-06 | 2022-10-18 | 北京小米松果电子有限公司 | 音频信号处理方法及装置、存储介质 |
US11706562B2 (en) | 2020-05-29 | 2023-07-18 | Shure Acquisition Holdings, Inc. | Transducer steering and configuration systems and methods using a local positioning system |
US11290814B1 (en) | 2020-12-15 | 2022-03-29 | Valeo North America, Inc. | Method, apparatus, and computer-readable storage medium for modulating an audio output of a microphone array |
EP4285605A1 (en) | 2021-01-28 | 2023-12-06 | Shure Acquisition Holdings, Inc. | Hybrid audio beamforming system |
CN113362864B (zh) * | 2021-06-16 | 2022-08-02 | 北京字节跳动网络技术有限公司 | 音频信号处理的方法、装置、存储介质及电子设备 |
CN114166334B (zh) * | 2021-11-23 | 2023-06-27 | 中国直升机设计研究所 | 一种非消声风洞旋翼噪声测点的声衰减系数校准方法 |
CN113921027B (zh) * | 2021-12-14 | 2022-04-29 | 北京清微智能信息技术有限公司 | 一种基于空间特征的语音增强方法、装置及电子设备 |
CN114974199A (zh) * | 2022-05-11 | 2022-08-30 | 北京小米移动软件有限公司 | 降噪方法、装置、降噪耳机及介质 |
CN114979902B (zh) * | 2022-05-26 | 2023-01-20 | 珠海市华音电子科技有限公司 | 一种基于改进的变步长ddcs自适应算法的降噪拾音方法 |
TWI812276B (zh) * | 2022-06-13 | 2023-08-11 | 英業達股份有限公司 | 振噪影響硬碟效能的測試方法與系統 |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH10207490A (ja) * | 1997-01-22 | 1998-08-07 | Toshiba Corp | 信号処理装置 |
JP2001100800A (ja) * | 1999-09-27 | 2001-04-13 | Toshiba Corp | 雑音成分抑圧処理装置および雑音成分抑圧処理方法 |
JP2004289762A (ja) * | 2003-01-29 | 2004-10-14 | Toshiba Corp | 音声信号処理方法と装置及びプログラム |
JP2007147732A (ja) * | 2005-11-24 | 2007-06-14 | Japan Advanced Institute Of Science & Technology Hokuriku | 雑音低減システム及び雑音低減方法 |
JP4225430B2 (ja) | 2005-08-11 | 2009-02-18 | 旭化成株式会社 | 音源分離装置、音声認識装置、携帯電話機、音源分離方法、及び、プログラム |
JP2009288215A (ja) * | 2008-06-02 | 2009-12-10 | Toshiba Corp | 音響処理装置及びその方法 |
JP2010271411A (ja) * | 2009-05-19 | 2010-12-02 | Nara Institute Of Science & Technology | 雑音抑圧装置およびプログラム |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7099821B2 (en) * | 2003-09-12 | 2006-08-29 | Softmax, Inc. | Separation of target acoustic signals in a multi-transducer arrangement |
DE102006047982A1 (de) * | 2006-10-10 | 2008-04-24 | Siemens Audiologische Technik Gmbh | Verfahren zum Betreiben einer Hörfilfe, sowie Hörhilfe |
US8577677B2 (en) * | 2008-07-21 | 2013-11-05 | Samsung Electronics Co., Ltd. | Sound source separation method and system using beamforming technique |
EP2192794B1 (en) * | 2008-11-26 | 2017-10-04 | Oticon A/S | Improvements in hearing aid algorithms |
KR101761312B1 (ko) * | 2010-12-23 | 2017-07-25 | 삼성전자주식회사 | 마이크 어레이를 이용한 방향성 음원 필터링 장치 및 그 제어방법 |
JP5543023B2 (ja) * | 2011-05-24 | 2014-07-09 | 三菱電機株式会社 | 目的音強調装置およびカーナビゲーションシステム |
-
2011
- 2011-05-25 BR BR112012031656A patent/BR112012031656A2/pt not_active IP Right Cessation
- 2011-08-25 KR KR1020127024378A patent/KR101339592B1/ko active IP Right Grant
- 2011-08-25 US US13/699,421 patent/US20130142343A1/en not_active Abandoned
- 2011-08-25 WO PCT/JP2011/004734 patent/WO2012026126A1/ja active Application Filing
- 2011-08-25 CN CN2011800197387A patent/CN103098132A/zh active Pending
- 2011-08-25 TW TW100130572A patent/TW201222533A/zh unknown
- 2011-08-25 EP EP11819602.1A patent/EP2562752A4/en not_active Withdrawn
- 2011-08-25 JP JP2012530540A patent/JP5444472B2/ja not_active Expired - Fee Related
Non-Patent Citations (3)
Title |
---|
S. GUSTAFSSON; P. JAX; P. VARY: "A novel psychoacoustically motivated audio enhancement algorithm preserving background noise characteristics", IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP'98, vol. 1, 12 May 1998 (1998-05-12), pages 397 - 400, XP000854599, DOI: doi:10.1109/ICASSP.1998.674451 |
See also references of EP2562752A4 |
Y. EPHRAIM; D. MALAH: "Speech enhancement using minimum mean-square error short-time spectral amplitude estimator", IEEE TRANS ACOUST., SPEECH, SIGNAL PROCESSING, ASSP-32, vol. 6, December 1984 (1984-12-01), pages 1109 - 1121, XP002435684, DOI: doi:10.1109/TASSP.1984.1164453 |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103680512A (zh) * | 2012-09-03 | 2014-03-26 | 现代摩比斯株式会社 | 车用阵列话筒的语音识别水平提升系统及其方法 |
JP2016516343A (ja) * | 2013-03-13 | 2016-06-02 | コピン コーポレーション | 雑音キャンセリングマイクロホン装置 |
US10379386B2 (en) | 2013-03-13 | 2019-08-13 | Kopin Corporation | Noise cancelling microphone apparatus |
WO2015129760A1 (ja) * | 2014-02-28 | 2015-09-03 | 日本電信電話株式会社 | 信号処理装置、方法及びプログラム |
JPWO2015129760A1 (ja) * | 2014-02-28 | 2017-03-30 | 日本電信電話株式会社 | 信号処理装置、方法及びプログラム |
JP2020512754A (ja) * | 2017-03-20 | 2020-04-23 | ボーズ・コーポレーションBose Corporation | ノイズ低減のためのオーディオ信号処理 |
JP2018164156A (ja) * | 2017-03-24 | 2018-10-18 | 沖電気工業株式会社 | 収音装置、プログラム及び方法 |
CN111863015A (zh) * | 2019-04-26 | 2020-10-30 | 北京嘀嘀无限科技发展有限公司 | 一种音频处理方法、装置、电子设备和可读存储介质 |
JP6854967B1 (ja) * | 2019-10-09 | 2021-04-07 | 三菱電機株式会社 | 雑音抑圧装置、雑音抑圧方法、及び雑音抑圧プログラム |
WO2021070278A1 (ja) * | 2019-10-09 | 2021-04-15 | 三菱電機株式会社 | 雑音抑圧装置、雑音抑圧方法、及び雑音抑圧プログラム |
US11984132B2 (en) | 2019-10-09 | 2024-05-14 | Mitsubishi Electric Corporation | Noise suppression device, noise suppression method, and storage medium storing noise suppression program |
Also Published As
Publication number | Publication date |
---|---|
CN103098132A (zh) | 2013-05-08 |
US20130142343A1 (en) | 2013-06-06 |
KR20120123566A (ko) | 2012-11-08 |
TW201222533A (en) | 2012-06-01 |
JPWO2012026126A1 (ja) | 2013-10-28 |
EP2562752A1 (en) | 2013-02-27 |
JP5444472B2 (ja) | 2014-03-19 |
KR101339592B1 (ko) | 2013-12-10 |
EP2562752A4 (en) | 2013-10-30 |
BR112012031656A2 (pt) | 2016-11-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5444472B2 (ja) | 音源分離装置、音源分離方法、及び、プログラム | |
EP3692704B1 (en) | Spatial double-talk detector | |
JP5762956B2 (ja) | ヌル処理雑音除去を利用した雑音抑制を提供するシステム及び方法 | |
US8942976B2 (en) | Method and device for noise reduction control using microphone array | |
CN110085248B (zh) | 个人通信中降噪和回波消除时的噪声估计 | |
EP2237271B1 (en) | Method for determining a signal component for reducing noise in an input signal | |
US7464029B2 (en) | Robust separation of speech signals in a noisy environment | |
JP4225430B2 (ja) | 音源分離装置、音声認識装置、携帯電話機、音源分離方法、及び、プログラム | |
JP4496186B2 (ja) | 音源分離装置、音源分離プログラム及び音源分離方法 | |
US20210125625A1 (en) | Apparatus and method for multiple-microphone speech enhancement | |
CN101510426A (zh) | 一种噪声消除方法及系统 | |
EP3155618A1 (en) | Multi-band noise reduction system and methodology for digital audio signals | |
EP3008924B1 (en) | Method of signal processing in a hearing aid system and a hearing aid system | |
Djendi et al. | Analysis of two-sensors forward BSS structure with post-filters in the presence of coherent and incoherent noise | |
CN111128210A (zh) | 具有声学回声消除的音频信号处理 | |
US20140193000A1 (en) | Method and apparatus for generating a noise reduced audio signal using a microphone array | |
Song et al. | An integrated multi-channel approach for joint noise reduction and dereverberation | |
Priyanka et al. | Adaptive Beamforming Using Zelinski-TSNR Multichannel Postfilter for Speech Enhancement | |
EP3225037B1 (en) | Method and apparatus for generating a directional sound signal from first and second sound signals | |
Priyanka et al. | Generalized sidelobe canceller beamforming with combined postfilter and sparse NMF for speech enhancement | |
Kodrasi et al. | Curvature-based optimization of the trade-off parameter in the speech distortion weighted multichannel wiener filter | |
Cho et al. | Speech enhancement using microphone array in moving vehicle environment | |
JP2012049715A (ja) | 音源分離装置、音源分離方法、及び、プログラム | |
Prasad et al. | Two microphone technique to improve the speech intelligibility under noisy environment | |
Martın-Donas et al. | A postfiltering approach for dual-microphone smartphones |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 201180019738.7 Country of ref document: CN |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 11819602 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2012530540 Country of ref document: JP |
|
ENP | Entry into the national phase |
Ref document number: 20127024378 Country of ref document: KR Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2011819602 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 13699421 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 10417/CHENP/2012 Country of ref document: IN |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2013103344 Country of ref document: RU Kind code of ref document: A |
|
REG | Reference to national code |
Ref country code: BR Ref legal event code: B01A Ref document number: 112012031656 Country of ref document: BR |
|
ENP | Entry into the national phase |
Ref document number: 112012031656 Country of ref document: BR Kind code of ref document: A2 Effective date: 20121212 |