WO2006059806A1 - Speech recognition apparatus - Google Patents
Speech recognition apparatus
- Publication number
- WO2006059806A1 (PCT/JP2005/022601; JP2005022601W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sound source
- sound
- mask
- unit
- speech recognition
- Prior art date
Links
- 238000000926 separation method Methods 0.000 claims abstract description 64
- 230000004807 localization Effects 0.000 claims description 34
- 238000000605 extraction Methods 0.000 claims description 6
- 239000000284 extract Substances 0.000 claims description 4
- 238000001514 detection method Methods 0.000 claims description 2
- 238000013459 approach Methods 0.000 claims 1
- 230000005236 sound signal Effects 0.000 abstract description 8
- 238000000034 method Methods 0.000 description 33
- 238000001228 spectrum Methods 0.000 description 23
- 238000012546 transfer Methods 0.000 description 20
- 238000010586 diagram Methods 0.000 description 13
- 230000008569 process Effects 0.000 description 4
- 230000004044 response Effects 0.000 description 4
- 238000012545 processing Methods 0.000 description 3
- 238000006243 chemical reaction Methods 0.000 description 2
- 238000004891 communication Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 241000282412 Homo Species 0.000 description 1
- 238000013473 artificial intelligence Methods 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000014509 gene expression Effects 0.000 description 1
- 230000000873 masking effect Effects 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 239000000203 mixture Substances 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 230000003595 spectral effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
- G10L21/028—Voice signal separating using properties of sound source
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Definitions
- the present invention relates to a speech recognition apparatus.
- the present invention relates to a speech recognition device that is robust against speech that has deteriorated due to noise or input device specifications. Background art
- a speech recognition device used in a real environment receives speech that has been degraded by noise, reverberation, or the characteristics of the input device. One countermeasure against this problem is to mask the degraded portions of the spectral features.
- ASA (Auditory Scene Analysis)
- speech recognition with such masks includes a method that estimates and recognizes the original feature values of the masked portions, and a method that generates and recognizes with an acoustic model corresponding to the masked feature values. Disclosure of the invention
- the present invention proposes a speech recognition device that improves the robustness of speech recognition for speech input whose degraded feature components cannot be completely identified. Means for solving the problem
- the present invention provides a speech recognition device for recognizing speech from an acoustic signal collected from the outside.
- This device includes at least two sound detection means for detecting acoustic signals, a sound source localization unit that determines the direction of the sound source from the acoustic signals, a sound source separation unit that separates the sound of that sound source from the acoustic signals based on the sound source direction, a mask generation unit that generates a mask value according to the reliability of the separation result, a feature extraction unit that extracts feature values from the acoustic signals, and a speech recognition unit that recognizes speech from the acoustic signals by applying the mask to the feature values.
- because the mask value is generated according to the reliability of the result of separating the sound of the sound source from the acoustic signal, the robustness of speech recognition can be improved.
- the mask generation unit generates the mask value according to the degree of agreement between the result of separating the acoustic signal with a plurality of sound source separation methods different from that of the sound source separation unit and the separation result of the sound source separation unit.
- the mask generation unit generates a mask value according to the pass width, determined by the sound source direction, that is used to decide whether sounds are from the same sound source.
- when there are a plurality of sound sources, the mask generation unit generates a mask value by treating the sound source separation result as more reliable the closer a subband is to only one of the plurality of sound sources.
- FIG. 1 is a schematic diagram showing a speech recognition system including a speech recognition device according to an embodiment of the present invention.
- FIG. 2 is a block diagram of the speech recognition apparatus according to the present embodiment.
- FIG. 3 is a diagram showing the epipolar geometry of the microphones and the sound source.
- FIG. 4 is a diagram showing the relationship between the inter-microphone phase difference Δφ, the frequency f, and the sound source direction θs derived from the epipolar geometry.
- FIG. 5 is a diagram showing the relationship between the inter-microphone phase difference Δφ, the frequency f, and the sound source direction θs derived from the transfer function.
- FIG. 6 is a diagram showing the relationship between the inter-microphone sound pressure difference Δp, the frequency f, and the sound source direction θs derived from the transfer function.
- FIG. 7 shows the positional relationship between the microphone and the sound source.
- FIG. 8 is a diagram showing the change of the sound source direction θs over time.
- FIG. 9 is a diagram showing the pass width function δ(θ).
- FIG. 10 is a diagram showing the sound source direction θs and the pass band.
- FIG. 11 is a diagram showing subband selection by the phase difference Δφ in the sound source separation unit.
- FIG. 12 is a diagram showing subband selection by the sound pressure difference Δp in the sound source separation unit.
- FIG. 13 is a diagram showing the mask function using the pass width function.
- FIG. 1 is a schematic diagram showing a speech recognition system including a speech recognition device 10 according to an embodiment of the present invention.
- a housing 12 equipped with the speech recognition device 10 recognizes speech emitted by a sound source 14 in its surroundings.
- the sound source 14 is, for example, a human or a robot that emits sound as a means of communication.
- the housing 12 is, for example, a mobile robot or an electrical appliance that uses speech recognition as its interface.
- a pair of microphones 16a and 16b for collecting sound from the sound source are installed on both sides of the housing 12.
- the positions of the microphones 16a and 16b are not limited to the two sides of the housing 12; they may be installed at other positions of the housing 12. Further, the number of microphones is not limited to one pair, and one or more pairs may be installed.
- the sound emitted by the sound source 14 is collected by the housing 12 via the microphones 16.
- the collected voice is processed by the voice recognition device 10 in the housing 12.
- the voice recognition device 10 estimates the direction of the sound source 14 from which the speech was emitted and recognizes the content of the speech. The housing 12 performs a task according to the content of the speech, for example, and replies through its own speech mechanism.
- FIG. 2 is a block diagram of the speech recognition apparatus 10 according to the present embodiment.
- the plurality of microphones 16a, 16b collect sound emitted from one or more sound sources 14, and send acoustic signals containing these sounds to the speech recognition device 10.
- the sound source localization unit 21 localizes the direction θs of the sound source 14 from the acoustic signals input from the microphones 16a and 16b. When the sound source 14 or the device 10 itself is moving, the position of the localized sound source 14 is tracked over time.
- sound source localization is performed using epipolar geometry, scattering theory, or transfer functions.
- the sound source separation unit 23 uses the direction information θs of the sound source 14 obtained by the sound source localization unit 21 to separate the sound source signal from the input signal.
- sound source separation is performed by combining the inter-microphone phase difference Δφ or inter-microphone sound pressure difference Δp, obtained using the aforementioned epipolar geometry, scattering theory, or transfer function, with a pass width function simulating human auditory characteristics.
- the mask generation unit 25 generates a mask value depending on whether the separation result of the sound source separation unit 23 is reliable. To determine whether it is reliable, the input signal spectrum and sound source separation results are used. The mask takes a value between 0 and 1, and the closer to 1, the more reliable it is. Each mask value generated by the mask generation unit is applied to the feature value of the input signal used for speech recognition.
- the feature extraction unit 27 extracts the feature quantity from the spectrum of the input signal.
- the speech recognition unit 29 obtains the output probability of the feature values from an acoustic model and performs speech recognition. At that time, the output probability is adjusted by applying the mask generated by the mask generation unit 25. In this embodiment, recognition is performed using a Hidden Markov Model (HMM).
- the sound source localization unit 21 localizes the direction of the sound source 14 from the acoustic signals input from the plurality of microphones 16.
- the position of the localized sound source 14 is tracked in the time direction.
- sound source localization uses the epipolar geometry of the microphones 16 and the sound source 14 (Section 2.1), scattering theory (Section 2.2), or the transfer function (Section 2.3); one of these methods is applied.
- the sound source localization process may use other known methods such as beam forming.
- the sound source direction θs is calculated using the epipolar geometry of the microphones 16 and the sound source 14 as shown in FIG. 3.
- the distance between microphones 16a and 16b is 2b, with the midpoint between the two microphones as the origin and the vertical direction from the origin as the front.
- V represents the speed of sound
- b represents the distance between the origin and the microphone
- θ represents the angle of the sound source direction.
- subbands whose estimated sound source directions are close to one another and harmonically related are grouped, and the sound source direction θs is taken as the direction of that group. When multiple groups are selected, it is considered that there are multiple sound sources, so the direction of each sound source may be obtained. If the number of sound sources is known in advance, it is desirable to select a number of groups corresponding to the number of sound sources.
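- The following Python sketch illustrates this per-subband estimation and grouping under a simplified far-field epipolar model; the sampling rate, microphone half-distance b, speed of sound, and the arcsin relation between phase difference and direction are illustrative assumptions, not the patent's exact Equations (1) and (2).

```python
import numpy as np

def subband_directions(x1, x2, fs=16000, b=0.15, v=340.0, n_fft=512):
    """Estimate a direction for each subband from the inter-microphone phase
    difference, under a simplified far-field epipolar model.
    b is the distance from the midpoint to each microphone (spacing 2b)."""
    S1 = np.fft.rfft(x1 * np.hanning(len(x1)), n_fft)
    S2 = np.fft.rfft(x2 * np.hanning(len(x2)), n_fft)
    freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)

    dphi = np.angle(S1) - np.angle(S2)           # phase difference per subband
    dphi = (dphi + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi)

    thetas = np.full(freqs.shape, np.nan)
    nz = freqs > 0
    # Far-field assumption: path difference d = v*dphi/(2*pi*f), sin(theta) = d/(2b)
    sin_t = v * dphi[nz] / (2 * np.pi * freqs[nz] * 2 * b)
    ok = np.abs(sin_t) <= 1.0
    est = np.full(sin_t.shape, np.nan)
    est[ok] = np.degrees(np.arcsin(sin_t[ok]))
    thetas[nz] = est
    return freqs, thetas

def dominant_direction(thetas, bin_deg=10.0):
    """Group subband directions that lie close together and return the mean
    direction of the most populated group (one dominant sound source)."""
    t = thetas[np.isfinite(thetas)]
    if t.size == 0:
        return None
    hist, edges = np.histogram(t, bins=np.arange(-90.0, 90.0 + bin_deg, bin_deg))
    k = int(np.argmax(hist))
    in_bin = (t >= edges[k]) & (t <= edges[k + 1])
    return float(t[in_bin].mean())
```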
- the sound source direction θs is calculated in consideration of the waves scattered by the housing 12 on which the microphones 16 are installed.
- the housing 12 on which the microphones 16 are installed is modeled as the robot head, a sphere of radius b.
- the center of the head is the origin of the polar coordinates (r, θ, φ).
- Δp(fi) is the sound pressure difference between the two microphones.
- P1(fi) is the power of subband fi at microphone 1.
- P2(fi) is the power of subband fi at microphone 2.
- f is the frequency, v is the speed of sound, and R is the distance between the sound source and the observation point (Equation (4)).
- Vs represents the potential due to the scattered sound.
- Pn represents the Legendre function of the first kind.
- hn(1) represents the spherical Hankel function of the first kind.
- Appropriate values of θ (for example, every 5 degrees) are substituted into Equations (8) and (9) in advance to obtain the relationship between frequency fi and phase difference Δφ(θ, fi), or between frequency fi and sound pressure difference Δp(θ, fi). Then, for each subband fi, the θ for which Δφ(θ, fi) or Δp(θ, fi) is closest to the observed Δφ(fi) or Δp(fi) is taken as the sound source direction θi of that subband.
- a common method for associating phase differences and sound pressure differences with frequencies and sound source directions is to measure transfer functions.
- the transfer function is created by measuring impulse responses from various directions with the microphones 16a and 16b installed in the housing 12 (for example, a robot).
- the sound source direction is localized using this. Sound source localization using the transfer function is performed according to the following procedure.
- the obtained spectrum is divided into a plurality of frequency regions (subbands), and the phase difference Δφ(fi) of each subband fi is obtained from Equation (1).
- the sound pressure difference Δp(fi) of each subband fi is obtained from Equation (3).
- Δφ(θ, f) = arg(S1(θ, f)) − arg(S2(θ, f))   (10)
- FIG. 5 and FIG. 6 show examples of the phase difference Δφ(θ, f) and the sound pressure difference Δp(θ, f) calculated for directions θ at an arbitrary interval in the range of ±90° and arbitrary frequencies f.
- the sound source direction θs may be obtained using both Δφ(fi) and Δp(fi).
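- As an illustration of this table-lookup step, the sketch below assumes that Δφ(θ, f) and Δp(θ, f) have been precomputed offline (for example, from impulse responses measured per direction) as θ × frequency tables, and that the low/high split at 1500 Hz mentioned later in this description is used to choose between the two cues; the data layout and function names are hypothetical.

```python
import numpy as np

def localize_by_lookup(dphi_obs, dp_obs, freqs, thetas_deg,
                       dphi_table, dp_table, split_hz=1500.0):
    """Pick, for each subband, the candidate direction whose precomputed
    phase difference (low bands) or sound pressure difference (high bands)
    is closest to the observed value.

    dphi_table, dp_table: arrays of shape (n_theta, n_freq), built offline
    from impulse responses measured for each candidate direction."""
    est = np.empty(len(freqs))
    for i, f in enumerate(freqs):
        if f < split_hz:
            err = np.abs(dphi_table[:, i] - dphi_obs[i])
        else:
            err = np.abs(dp_table[:, i] - dp_obs[i])
        est[i] = thetas_deg[int(np.argmin(err))]
    return est  # per-subband direction estimates [deg]
```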
- the difference d between the distances from the sound source 14 to the microphones 16a and 16b (FIG. 7) is obtained from the cross-correlation of the input signals of the microphones 16a and 16b, and the sound source direction θs is estimated from its relationship with the microphone spacing 2b. This method is carried out according to the following procedure.
- T represents the frame length.
- x1(t) and x2(t) represent the input signals from the microphones 16, cut out with the frame length T.
- the direction θs of the sound source 14 is obtained from Equation (12) using the distance 2b between the microphones and the difference d between the distances from the sound source to each microphone.
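- A minimal sketch of this cross-correlation procedure follows; Equations (11) and (12) are not reproduced in this text, so the far-field relation sin θ = d / 2b and the parameter values used here are assumptions.

```python
import numpy as np

def direction_from_xcorr(x1, x2, fs=16000, b=0.15, v=340.0):
    """Estimate the sound source direction from the time difference of
    arrival given by the cross-correlation of the two microphone signals."""
    x1 = x1 - np.mean(x1)
    x2 = x2 - np.mean(x2)
    corr = np.correlate(x1, x2, mode="full")
    lag = int(np.argmax(corr)) - (len(x2) - 1)   # delay in samples
    d = v * lag / fs                             # path-length difference [m]
    sin_t = np.clip(d / (2.0 * b), -1.0, 1.0)    # assumed far-field relation
    return float(np.degrees(np.arcsin(sin_t)))
```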
- FIG. 8 shows the variation of the sound source direction θs over time. To track a sound source, the direction θp predicted from the trajectory of the θs values obtained so far is compared with the newly obtained θs; if the difference is smaller than a predetermined threshold, the signal is determined to be from the same sound source, and if it is larger than the threshold, it is determined not to be from the same sound source.
- for the prediction, existing time-series prediction methods such as the Kalman filter, autoregressive prediction, and HMMs are used.
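- The same-source decision can be sketched as below, with a simple linear (constant-velocity) predictor standing in for the Kalman filter, autoregressive model, or HMM; the threshold value is an assumption.

```python
def track_source(directions, threshold_deg=15.0):
    """Decide per frame whether a direction estimate belongs to the tracked
    source, using a simple linear prediction of the trajectory."""
    labels, track = [], []
    for theta in directions:
        if len(track) < 2:
            predicted = track[-1] if track else theta
        else:
            predicted = track[-1] + (track[-1] - track[-2])  # constant-velocity guess
        same = abs(theta - predicted) < threshold_deg
        labels.append(same)
        if same:
            track.append(theta)   # extend the track only with accepted frames
    return labels
```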
- the sound source separation unit 23 uses the direction information θs of the sound source 14 obtained by the sound source localization unit 21 to separate the sound source signal from the input signal.
- the method described here performs separation by combining the inter-microphone phase difference Δφ or inter-microphone sound pressure difference Δp, obtained using the above-described epipolar geometry, scattering theory, or transfer function, with a pass width function simulating human auditory characteristics.
- the sound source separation unit 23 may instead use another well-known method that uses the sound source direction and separates sound sources for each subband, such as beamforming or GSS (Geometric Source Separation).
- if sound source separation is performed in the time domain, the result is converted to the frequency domain after separation.
- sound source separation is performed by the following procedure.
- the sound source direction θs is received from the sound source localization unit 21, and the phase difference Δφ(fi) or sound pressure difference Δp(fi) of each subband fi of the spectrum of the input signal is obtained using Equation (1) or Equation (3).
- the pass width function is designed based on the human auditory characteristic that the resolution with respect to the sound source direction is high in the front direction and low toward the periphery. For example, as shown in FIG. 9, the pass width is narrow in the front direction and wide toward the periphery.
- the horizontal axis is the horizontal angle, with the front of the housing 12 at 0 [deg].
- the phase differences Δφl and Δφh corresponding to θl and θh are calculated using any of the above-mentioned epipolar geometry (Equation (2) and FIG. 4), scattering theory (Equation (8)), or the transfer function (FIG. 5).
- FIG. 11 is a graph showing an example of the relationship between the estimated phase differences and the frequency fi.
- the sound pressure differences Δpl and Δph corresponding to θl and θh are estimated using either the aforementioned scattering theory (Equation (9)) or the transfer function (FIG. 6).
- FIG. 12 is a graph showing an example of the relationship between the estimated sound pressure differences and the frequency fi.
- because the phase difference is effective for localization at low frequencies and the sound pressure difference at high frequencies, separation accuracy is increased by selecting subbands below a predetermined threshold (for example, 1500 [Hz]) using the phase difference Δφ and subbands above it using the sound pressure difference Δp.
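- The subband selection just described might look like the following sketch; the shape of the pass width function, the 1500 [Hz] threshold, and the way the band-edge values Δφl, Δφh, Δpl, Δph are supplied are illustrative assumptions (in the description above they come from Equation (2), Equations (8)/(9), or the measured transfer function).

```python
import numpy as np

def pass_width(theta_deg):
    """Pass width delta(theta): narrow in the front direction, wide toward
    the periphery (illustrative shape only)."""
    return 10.0 + 20.0 * abs(np.sin(np.radians(theta_deg)))

def select_subbands(dphi_obs, dp_obs, freqs,
                    dphi_lo, dphi_hi, dp_lo, dp_hi, split_hz=1500.0):
    """Flag each subband as belonging to the target direction (1) or not (0).
    dphi_lo/dphi_hi and dp_lo/dp_hi are the per-subband values expected at
    the pass band edges theta_s - delta(theta_s) and theta_s + delta(theta_s)."""
    flags = np.zeros(len(freqs), dtype=int)
    for i, f in enumerate(freqs):
        if f < split_hz:   # low bands: use the phase difference
            lo, hi = sorted((dphi_lo[i], dphi_hi[i]))
            flags[i] = int(lo <= dphi_obs[i] <= hi)
        else:              # high bands: use the sound pressure difference
            lo, hi = sorted((dp_lo[i], dp_hi[i]))
            flags[i] = int(lo <= dp_obs[i] <= hi)
    return flags
```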
- sound source separation may be performed using a spectrum in the mel frequency domain instead of the spectrum in the linear frequency domain described so far.
- the mel frequency is a perceptual scale of pitch, and its value roughly corresponds to the logarithm of the actual frequency.
- sound source separation in the mel frequency domain is performed according to the following procedure in which filter processing for conversion to the mel frequency is added after step 1) of the processing of the sound source separation unit 23 described above.
- as in the linear frequency case, subbands below a predetermined threshold (for example, 1500 [Hz]) may be selected using the phase difference Δφ, and subbands above it using the sound pressure difference Δp.
- the mask generation unit 25 generates a mask value depending on whether or not the separation result of the sound source separation unit 23 is reliable.
- mask generation using information from multiple sound source separation methods (Section 4.1), mask generation using the pass width function (Section 4.2), or mask generation considering the effects of multiple sound sources (Section 4.3) is applied.
- the reliability of the flag (0 or 1) set by the sound source separation unit 23 is checked, and the mask value is set in consideration of the flag value and the reliability.
- the mask takes a value between 0 and 1, and the closer to 1, the more reliable.
- the result of signal separation by a plurality of sound source separation methods is used to check whether the separation result of the sound source separation unit 23 is reliable, and a mask is generated. This process is performed according to the following procedure.
- the sound source separation unit 23 performs sound source separation using one of the following elements: i) the phase difference based on epipolar geometry, ii) the phase difference based on scattering theory, iii) the sound pressure difference based on scattering theory, iv) the phase difference based on the transfer function, or v) the sound pressure difference based on the transfer function.
- for example, when the sound source separation unit 23 uses i) the phase difference based on epipolar geometry, and v) the sound pressure difference based on the transfer function is used for the comparison, the mask values in each state are as follows.
- an appropriate threshold value may be set for the mask value converted to the mel frequency axis, and it may be converted to a binary mask that takes 1 if the threshold is exceeded and 0 otherwise.
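- One plausible reading of this agreement check is sketched below: a second, independent separation method produces its own per-subband flags, and the mask is set from the flag of the sound source separation unit 23 and its agreement with the second method. The specific mask values are illustrative assumptions; the patent's actual value table is not reproduced in this text.

```python
import numpy as np

def agreement_mask(flags_main, flags_other):
    """Per-subband mask in [0, 1], derived from the separation flag of the
    sound source separation unit and a second, independent method
    (illustrative values: flag value where the methods agree, 0.5 where
    they disagree and the flag is considered unreliable)."""
    flags_main = np.asarray(flags_main)
    flags_other = np.asarray(flags_other)
    agree = flags_main == flags_other
    return np.where(agree, flags_main.astype(float), 0.5)
```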
- This method uses the sound source direction θs and the pass width function δ(θs) to generate a mask value according to the proximity to the sound source direction. In other words, the 1 flags attached by the sound source separation unit 23 are considered more reliable the closer the subband is to the sound source direction, and the 0 flags attached by the sound source separation unit 23 are considered more reliable the farther the subband is from the sound source direction. This process is performed in the following procedure.
- a mask is generated as follows.
- the mask values obtained are subjected to mel-scale filter bank analysis and converted to the mel frequency axis to generate the mask. As described above, this step is not necessary when the sound source separation result is obtained in the mel frequency domain.
- an appropriate threshold value may be set for the mask value converted to the mel frequency axis, and it may be converted to a binary mask that takes 1 if the threshold is exceeded and 0 otherwise.
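- The conversion of a linear-frequency mask to the mel axis, followed by optional binarization, could look like the following sketch; the triangular mel filter bank is a standard construction assumed here rather than taken from the patent.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, fs):
    """Triangular mel filter bank of shape (n_mels, n_fft//2 + 1)."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / fs).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fb[i, l:c] = (np.arange(l, c) - l) / (c - l)   # rising slope
        if r > c:
            fb[i, c:r] = (r - np.arange(c, r)) / (r - c)   # falling slope
    return fb

def mask_to_mel(mask_linear, fb, threshold=None):
    """Project a per-subband mask onto the mel axis; optionally binarize."""
    weights = fb.sum(axis=1)
    mel_mask = fb @ mask_linear / np.maximum(weights, 1e-8)
    if threshold is not None:
        mel_mask = (mel_mask > threshold).astype(float)
    return mel_mask
```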
- the mask value is generated so as to reduce the reliability of subbands estimated to contain signals from two or more sound sources.
- a temporary mask of 0 is generated for subbands that fall under i) or ii), and 1 is generated otherwise.
- the mask values obtained are subjected to mel-scale filter bank analysis and converted to the mel frequency axis to generate the mask.
- this step is not necessary if the sound source separation result is obtained in the mel frequency domain.
- an appropriate threshold value may be set for the mask value converted to the mel frequency axis, and it may be converted to a binary mask that takes 1 if the threshold is exceeded and 0 otherwise.
- the feature extraction unit 27 obtains a feature amount from the spectrum of the input signal using a generally known method. This process is performed in the following procedure.
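- The extraction procedure itself is not detailed in this text; a common choice that pairs naturally with per-channel masks is the log mel spectrum, sketched below under that assumption (reusing the mel_filterbank helper from the sketch above).

```python
import numpy as np

def log_mel_features(frame, fb, n_fft=512):
    """Log mel-spectral feature vector for one frame (assumed feature type,
    chosen because a per-channel mask applies to it directly)."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n_fft)) ** 2
    mel_spec = fb @ spec                      # fb from mel_filterbank(...)
    return np.log(np.maximum(mel_spec, 1e-10))
```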
- the speech recognition unit 29 performs speech recognition using an HMM, as known in the prior art.
- the output probability f(x|S) of an ordinary continuous HMM when the feature vector x is in state S is expressed by Equation (16).
- f(x|S) = Σk P(k|S) f(x|k, S)   (16), where N is the number of mixtures in the Gaussian mixture distribution and P(k|S) represents the weight of the k-th mixture.
- xr denotes the reliable components of the feature vector, whose mask value is greater than 0, and xu denotes the unreliable components of the feature vector, whose mask value is 0.
- equation (17) can be rewritten as equation (18).
- the output probability f(x(j)|S) of the j-th component can be expressed as in Equation (19).
- x(j) is the j-th component of the feature vector, and M(j) represents the mask of that component.
- the overall output probability f(x|S) can be expressed as shown in Equation (20).
- J represents the dimension of the feature vector.
- Equation (20) can also be expressed as Equation (21).
- Speech recognition is performed using Equation (20) or Equation (21).
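- Equations (16) through (21) are only partially legible in this text; the sketch below therefore shows a standard missing-feature formulation in which each dimension's Gaussian log-likelihood is weighted by its mask value, which is consistent with, but not guaranteed identical to, the patent's Equations (20) and (21).

```python
import numpy as np

def masked_log_output_prob(x, mask, weights, means, variances):
    """Log output probability of a GMM state under soft missing-feature
    weighting: each dimension's log-likelihood is scaled by its mask.

    x, mask:          (J,)   feature vector and per-dimension mask in [0, 1]
    weights:          (N,)   mixture weights P(k|S)
    means, variances: (N, J) diagonal-covariance Gaussian parameters
    """
    # Per-dimension Gaussian log-likelihoods for every mixture component
    ll = -0.5 * (np.log(2 * np.pi * variances)
                 + (x - means) ** 2 / variances)        # shape (N, J)
    # Weight each dimension by its reliability mask, then sum over dimensions
    comp_ll = (mask * ll).sum(axis=1) + np.log(weights)
    # Log-sum-exp over the mixture components
    m = comp_ll.max()
    return float(m + np.log(np.exp(comp_ll - m).sum()))
```

- With a binary mask this amounts to dropping the unreliable dimensions of each diagonal-covariance component, which is the usual missing-feature treatment of masked features.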
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
Description
Claims
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2006546764A JP4157581B2 (ja) | 2004-12-03 | 2005-12-02 | 音声認識装置 |
EP05814282A EP1818909B1 (en) | 2004-12-03 | 2005-12-02 | Voice recognition system |
US11/792,052 US8073690B2 (en) | 2004-12-03 | 2005-12-02 | Speech recognition apparatus and method recognizing a speech from sound signals collected from outside |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US63335104P | 2004-12-03 | 2004-12-03 | |
US60/633,351 | 2004-12-03 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2006059806A1 true WO2006059806A1 (ja) | 2006-06-08 |
Family
ID=36565223
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2005/022601 WO2006059806A1 (ja) | 2004-12-03 | 2005-12-02 | 音声認識装置 |
Country Status (4)
Country | Link |
---|---|
US (1) | US8073690B2 (ja) |
EP (1) | EP1818909B1 (ja) |
JP (1) | JP4157581B2 (ja) |
WO (1) | WO2006059806A1 (ja) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2011191759A (ja) * | 2010-03-11 | 2011-09-29 | Honda Motor Co Ltd | 音声認識装置及び音声認識方法 |
JP2012088390A (ja) * | 2010-10-15 | 2012-05-10 | Honda Motor Co Ltd | 音声認識装置及び音声認識方法 |
JP2013097273A (ja) * | 2011-11-02 | 2013-05-20 | Toyota Motor Corp | 音源推定装置、方法、プログラム、及び移動体 |
JP2013250380A (ja) * | 2012-05-31 | 2013-12-12 | Yamaha Corp | 音響処理装置 |
JPWO2018207453A1 (ja) * | 2017-05-08 | 2020-03-12 | ソニー株式会社 | 情報処理装置 |
JP2022533300A (ja) * | 2019-03-10 | 2022-07-22 | カードーム テクノロジー リミテッド | キューのクラスター化を使用した音声強化 |
Families Citing this family (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7697827B2 (en) | 2005-10-17 | 2010-04-13 | Konicek Jeffrey C | User-friendlier interfaces for a camera |
JP4496186B2 (ja) * | 2006-01-23 | 2010-07-07 | 株式会社神戸製鋼所 | 音源分離装置、音源分離プログラム及び音源分離方法 |
WO2009093416A1 (ja) * | 2008-01-21 | 2009-07-30 | Panasonic Corporation | 音声信号処理装置および方法 |
KR101178801B1 (ko) * | 2008-12-09 | 2012-08-31 | 한국전자통신연구원 | 음원분리 및 음원식별을 이용한 음성인식 장치 및 방법 |
FR2950461B1 (fr) * | 2009-09-22 | 2011-10-21 | Parrot | Procede de filtrage optimise des bruits non stationnaires captes par un dispositif audio multi-microphone, notamment un dispositif telephonique "mains libres" pour vehicule automobile |
WO2011055410A1 (ja) * | 2009-11-06 | 2011-05-12 | 株式会社 東芝 | 音声認識装置 |
US8676581B2 (en) * | 2010-01-22 | 2014-03-18 | Microsoft Corporation | Speech recognition analysis via identification information |
KR101670313B1 (ko) * | 2010-01-28 | 2016-10-28 | 삼성전자주식회사 | 음원 분리를 위해 자동적으로 문턱치를 선택하는 신호 분리 시스템 및 방법 |
US20120045068A1 (en) * | 2010-08-20 | 2012-02-23 | Korea Institute Of Science And Technology | Self-fault detection system and method for microphone array and audio-based device |
JP5810903B2 (ja) * | 2011-12-27 | 2015-11-11 | 富士通株式会社 | 音声処理装置、音声処理方法及び音声処理用コンピュータプログラム |
US9210499B2 (en) | 2012-12-13 | 2015-12-08 | Cisco Technology, Inc. | Spatial interference suppression using dual-microphone arrays |
FR3011377B1 (fr) * | 2013-10-01 | 2015-11-06 | Aldebaran Robotics | Procede de localisation d'une source sonore et robot humanoide utilisant un tel procede |
WO2015057661A1 (en) * | 2013-10-14 | 2015-04-23 | The Penn State Research Foundation | System and method for automated speech recognition |
US9911416B2 (en) * | 2015-03-27 | 2018-03-06 | Qualcomm Incorporated | Controlling electronic device based on direction of speech |
JP6543843B2 (ja) * | 2015-06-18 | 2019-07-17 | 本田技研工業株式会社 | 音源分離装置、および音源分離方法 |
JP6501260B2 (ja) * | 2015-08-20 | 2019-04-17 | 本田技研工業株式会社 | 音響処理装置及び音響処理方法 |
EP3157268B1 (en) * | 2015-10-12 | 2021-06-30 | Oticon A/s | A hearing device and a hearing system configured to localize a sound source |
JP6703460B2 (ja) * | 2016-08-25 | 2020-06-03 | 本田技研工業株式会社 | 音声処理装置、音声処理方法及び音声処理プログラム |
JP6723120B2 (ja) | 2016-09-05 | 2020-07-15 | 本田技研工業株式会社 | 音響処理装置および音響処理方法 |
CN107135443B (zh) * | 2017-03-29 | 2020-06-23 | 联想(北京)有限公司 | 一种信号处理方法及电子设备 |
CN107644650B (zh) * | 2017-09-29 | 2020-06-05 | 山东大学 | 一种基于渐进串行正交化盲源分离算法的改进声源定位方法及其实现系统 |
JP7013789B2 (ja) * | 2017-10-23 | 2022-02-01 | 富士通株式会社 | 音声処理用コンピュータプログラム、音声処理装置及び音声処理方法 |
EP3704873B1 (en) | 2017-10-31 | 2022-02-23 | Widex A/S | Method of operating a hearing aid system and a hearing aid system |
CN108520756B (zh) * | 2018-03-20 | 2020-09-01 | 北京时代拓灵科技有限公司 | 一种说话人语音分离的方法及装置 |
WO2019198306A1 (ja) * | 2018-04-12 | 2019-10-17 | 日本電信電話株式会社 | 推定装置、学習装置、推定方法、学習方法及びプログラム |
WO2021226515A1 (en) | 2020-05-08 | 2021-11-11 | Nuance Communications, Inc. | System and method for data augmentation for multi-microphone signal processing |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6967455B2 (en) * | 2001-03-09 | 2005-11-22 | Japan Science And Technology Agency | Robot audiovisual system |
-
2005
- 2005-12-02 WO PCT/JP2005/022601 patent/WO2006059806A1/ja active Application Filing
- 2005-12-02 JP JP2006546764A patent/JP4157581B2/ja active Active
- 2005-12-02 EP EP05814282A patent/EP1818909B1/en active Active
- 2005-12-02 US US11/792,052 patent/US8073690B2/en active Active
Non-Patent Citations (2)
Title |
---|
YAMAMOTO S. ET AL: "Assessment of general applicability of robot audition system by recognizing three simultaneous speeches.", PROCEEDINGS OF THE 2004 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS., 28 September 2004 (2004-09-28), pages 2111 - 2116, XP002995569 * |
YAMAMOTO S. ET AL: "Evaluation of MFT-Based Interface between Sound Source Separation and ASR.", ANNUAL CONFERENCE OF THE ROBOTICS SOCIETY OF JAPAN YOKOSHU, vol. 22, 15 September 2004 (2004-09-15), pages 1C33, XP002995570 * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2011191759A (ja) * | 2010-03-11 | 2011-09-29 | Honda Motor Co Ltd | 音声認識装置及び音声認識方法 |
JP2012088390A (ja) * | 2010-10-15 | 2012-05-10 | Honda Motor Co Ltd | 音声認識装置及び音声認識方法 |
JP2013097273A (ja) * | 2011-11-02 | 2013-05-20 | Toyota Motor Corp | 音源推定装置、方法、プログラム、及び移動体 |
JP2013250380A (ja) * | 2012-05-31 | 2013-12-12 | Yamaha Corp | 音響処理装置 |
JPWO2018207453A1 (ja) * | 2017-05-08 | 2020-03-12 | ソニー株式会社 | 情報処理装置 |
JP7103353B2 (ja) | 2017-05-08 | 2022-07-20 | ソニーグループ株式会社 | 情報処理装置 |
US11468884B2 (en) | 2017-05-08 | 2022-10-11 | Sony Corporation | Method, apparatus and computer program for detecting voice uttered from a particular position |
JP2022533300A (ja) * | 2019-03-10 | 2022-07-22 | カードーム テクノロジー リミテッド | キューのクラスター化を使用した音声強化 |
JP7564117B2 (ja) | 2019-03-10 | 2024-10-08 | カードーム テクノロジー リミテッド | キューのクラスター化を使用した音声強化 |
Also Published As
Publication number | Publication date |
---|---|
US8073690B2 (en) | 2011-12-06 |
EP1818909A1 (en) | 2007-08-15 |
US20080167869A1 (en) | 2008-07-10 |
EP1818909B1 (en) | 2011-11-02 |
JPWO2006059806A1 (ja) | 2008-06-05 |
JP4157581B2 (ja) | 2008-10-01 |
EP1818909A4 (en) | 2009-10-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2006059806A1 (ja) | 音声認識装置 | |
US10901063B2 (en) | Localization algorithm for sound sources with known statistics | |
US11711648B2 (en) | Audio-based detection and tracking of emergency vehicles | |
JP4516527B2 (ja) | 音声認識装置 | |
CN112116920B (zh) | 一种说话人数未知的多通道语音分离方法 | |
Izumi et al. | Sparseness-based 2ch BSS using the EM algorithm in reverberant environment | |
Kwan et al. | An automated acoustic system to monitor and classify birds | |
US11922965B2 (en) | Direction of arrival estimation apparatus, model learning apparatus, direction of arrival estimation method, model learning method, and program | |
US20060204019A1 (en) | Acoustic signal processing apparatus, acoustic signal processing method, acoustic signal processing program, and computer-readable recording medium recording acoustic signal processing program | |
Birnie et al. | Reflection assisted sound source localization through a harmonic domain music framework | |
Al-Karawi et al. | Early reflection detection using autocorrelation to improve robustness of speaker verification in reverberant conditions | |
Traa et al. | Blind multi-channel source separation by circular-linear statistical modeling of phase differences | |
CN111243600A (zh) | 一种基于声场和场纹的语音欺骗攻击检测方法 | |
Vestman et al. | Time-varying autoregressions for speaker verification in reverberant conditions | |
Demir et al. | Improved microphone array design with statistical speaker verification | |
Zeremdini et al. | Multi-pitch estimation based on multi-scale product analysis, improved comb filter and dynamic programming | |
Habib et al. | Auditory inspired methods for localization of multiple concurrent speakers | |
Habib et al. | Improving Multiband Position-Pitch Algorithm for Localization and Tracking of Multiple Concurrent Speakers by Using a Frequency Selective Criterion. | |
Sharma et al. | Detection of various vehicles using wireless seismic sensor network | |
Chen et al. | Robust phase replication method for spatial aliasing problem in multiple sound sources localization | |
Cho et al. | Underwater radiated signal analysis in the modulation spectrogram domain | |
Zhou et al. | Replay attack anaysis based on acoustic parameters of overall voice quality | |
El Chami et al. | A phase-based dual microphone method to count and locate audio sources in reverberant rooms | |
Zhang et al. | Two microphone based direction of arrival estimation for multiple speech sources using spectral properties of speech | |
Wang | Towards Robust and Secure Audio Sensing Using Wireless Vibrometry and Deep Learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KN KP KR KZ LC LK LR LS LT LU LV LY MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2005814282 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2006546764 Country of ref document: JP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWP | Wipo information: published in national office |
Ref document number: 2005814282 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 11792052 Country of ref document: US |