WO2012176932A1 - Speech processing device, method, and program - Google Patents

Speech processing device, method, and program

Info

Publication number
WO2012176932A1
Authority
WO
WIPO (PCT)
Prior art keywords
noise
microphone
input signal
signal
sound
Prior art date
Application number
PCT/JP2012/066449
Other languages
English (en)
Japanese (ja)
Inventor
隆行 荒川
宝珠山 治
剛範 辻川
Original Assignee
日本電気株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電気株式会社
Publication of WO2012176932A1

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78Detection of presence or absence of voice signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • H04R3/005Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02165Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00Signal processing covered by H04R, not provided for in its groups
    • H04R2430/20Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic

Definitions

  • the present invention relates to a voice processing device, a voice processing method, and a voice processing program for processing a mixed signal in which desired voice and noise are mixed.
  • Patent Document 1 discloses a voice detection device that detects whether or not a target voice is input based on voice signals from voices picked up by two directional microphones. That is, based on the level difference between the two audio signals and the power ratio, the desired audio is detected regardless of the noise level.
  • An object of the present invention is to solve the above problems and provide a voice processing device, a voice processing method, and a program for accurately detecting a desired voice regardless of the intensity of the desired voice.
  • A speech processing apparatus according to the present invention includes: noise estimation means for estimating noise based on a first ratio relating to a noise source, a first microphone, and a second microphone, and on a second input signal output from the second microphone; noise suppression means for suppressing a noise signal included in a first input signal output from the first microphone based on the output of the noise estimation means; and determination means for comparing the output of the noise suppression means with a threshold having a predetermined value to determine whether or not a desired voice is present.
  • A speech processing method according to the present invention includes: estimating noise based on the first ratio relating to the noise source, the first microphone, and the second microphone, and on the second input signal output from the second microphone; suppressing the noise signal included in the first input signal output from the first microphone based on the estimated noise; and comparing the result of the noise suppression with a threshold having a predetermined value to determine whether or not a desired voice is present.
  • A speech processing program according to the present invention causes a computer to execute: noise estimation processing for estimating noise based on the first ratio relating to the noise source, the first microphone, and the second microphone, and on the second input signal output from the second microphone; noise suppression processing for suppressing the noise signal included in the first input signal output from the first microphone based on the output of the noise estimation processing; and determination processing for comparing the output of the noise suppression processing with a threshold having a predetermined value to determine whether or not a desired voice is present.
  • According to the present invention, the desired voice is detected with high accuracy regardless of the intensity of the desired voice.
  • voice input to the speech processing apparatus according to the second and third embodiments of the present invention is shown.
  • generated in the speech processing apparatus according to the second embodiment of the present invention is shown.
  • generated in the speech processing apparatus according to the second embodiment of the present invention is shown.
  • a speech processing apparatus 100 according to a first embodiment of the present invention will be described with reference to FIG.
  • the speech processing apparatus 100 includes a derivation unit 101, an integration unit 102, a subtraction unit 103, and a determination unit 104.
  • The derivation unit 101 derives an attenuation rate ratio, that is, the ratio between a first attenuation rate of the noise that is generated by the noise source and propagates to the first microphone 110 and a second attenuation rate of the noise that is generated by the noise source and propagates to the second microphone 120.
  • The integration unit 102 multiplies the second input signal output from the second microphone 120 by the attenuation rate ratio. The subtraction unit 103 then subtracts the product obtained by the integration unit 102 from the first input signal output from the first microphone 110.
  • The determination unit 104 compares the subtraction result obtained by the subtraction unit 103 with a predetermined threshold value, and determines that the desired voice exists when the subtraction result is larger than the threshold value. According to the present embodiment, the noise is suppressed in consideration of the attenuation rate of the noise propagating from the noise source to each microphone, and a desired voice is detected with high accuracy.
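  • The four units of the first embodiment can be sketched as a few lines of Python. This is a minimal illustration, not the publication's implementation: the function names, the use of scalar power values as inputs, and all numeric values are assumptions chosen for the example.

```python
def derive_attenuation_ratio(d_noise_mic1: float, d_noise_mic2: float) -> float:
    """Derivation unit 101: ratio of the noise attenuation rates at the two microphones."""
    return d_noise_mic1 / d_noise_mic2


def detect_voice(x1: float, x2: float, ratio: float, threshold: float) -> bool:
    """Units 102 to 104: multiply, subtract, and compare with a threshold."""
    product = ratio * x2         # integration unit 102 (a multiplication)
    residual = x1 - product      # subtraction unit 103
    return residual > threshold  # determination unit 104


# Hypothetical attenuation rates: the noise reaches microphone 1 at 0.25 and microphone 2 at 1.0.
ratio = derive_attenuation_ratio(0.25, 1.0)

# Noise power 4.0 only: the residual is 0, so no voice is reported.
noise_only = detect_voice(x1=0.25 * 4.0, x2=1.0 * 4.0, ratio=ratio, threshold=0.1)

# The same noise plus a voice that adds 2.0 at microphone 1 and 0.5 at microphone 2.
with_voice = detect_voice(x1=0.25 * 4.0 + 2.0, x2=1.0 * 4.0 + 0.5, ratio=ratio, threshold=0.1)

print(noise_only, with_voice)
```

  • In the noise-only case the product equals the first input exactly, so the residual is zero regardless of how loud the noise is; this is why a small fixed threshold is sufficient in the sketch.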
  • FIG. 2 shows an example of the arrangement of a microphone that generates an input signal input to the sound processing apparatus according to the present embodiment and a sound source that generates sound.
  • the sound from the two sound sources propagates to each of the two microphones.
  • desired sound is generated from the sound source 210, and noise is generated from the sound source 220.
  • a time series of the power of the sound generated from the sound source 210 is denoted as PA (t).
  • a time series of the power of the sound generated from the sound source 220 is denoted as PB (t).
  • PA (t) and PB (t) are not directly observable quantities.
  • The microphone 201 and the microphone 202 are arranged so that the distance between the microphone 201 and the sound source 210 is shorter than the distance between the microphone 202 and the sound source 210.
  • the microphone 201 generates a sound signal whose power time series is represented by P1 (t) based on the collected sound.
  • the microphone 202 generates a sound signal whose power time series is represented by P2 (t) based on the collected sound.
  • P1 (t) and P2 (t) are directly observable quantities.
  • the sound generated by the sound source 210 propagates to the microphone 201 and the microphone 202, and the power of the sound at the time of arrival at the microphone 201 and the microphone 202 is attenuated by the attenuation rates represented by dA1 and dA2, respectively.
  • the sound generated by the sound source 220 propagates to the microphone 201 and the microphone 202, and the power of the sound at the time of arrival at the microphone 201 and the microphone 202 is attenuated by attenuation factors represented by dB1 and dB2, respectively.
  • If the sound source is a point sound source, the sound power is proportional to the inverse square of the distance between the sound source and the microphone, so the attenuation rate is the inverse of the square of the distance.
  • the time series P1 (t) and P2 (t) of the sound power collected by the microphone 201 and the microphone 202 are proportional to the sum of the sound power from the sound source 210 and the sound source 220 weighted by the attenuation rate.
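  • The power difference and power ratio for the desired voice alone are written DA (t) and RA (t), and those for the noise alone are written DB (t) and RB (t):
  • $D_A(t) = P_A(t) \times (d_{A1} - d_{A2})$, $D_B(t) = P_B(t) \times (d_{B1} - d_{B2})$
  • $R_A(t) = d_{A1} / d_{A2}$, $R_B(t) = d_{B1} / d_{B2}$. Note that RA (t) and RB (t) have constant values over time.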
  • a time series D (t) of power difference in a state where noise and desired speech are generated and a time series R (t) of power ratio are represented.
  • FIG. 3 shows an example of the time change of D (t), DA (t), and DB (t).
  • The power ratio time series R (t) is a linear combination of RA (t) and RB (t):
  • $R(t) = \alpha(t) \times R_A(t) + (1 - \alpha(t)) \times R_B(t)$
  • Here, $\alpha(t) = 1 / (1 + P_B(t) / P_A(t) \times d_{B2} / d_{A2})$. If the values of PA (t) and PB (t) are non-negative, $\alpha(t)$ takes a value between 0 and 1.
  • FIG. 4 shows an example of temporal changes in R (t), RA (t), and RB (t).
  • At each time, the value of R (t) is the value that internally divides the values of RA (t) and RB (t) in the ratio $(1 - \alpha(t)) : \alpha(t)$.
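  • The relationship between the observed powers and the per-source quantities can be checked numerically. The sketch below assumes the additive model P1 = dA1·PA + dB1·PB and P2 = dA2·PA + dB2·PB with a proportionality constant of 1, and writes the mixing weight as `alpha`; the function names and all numbers are hypothetical.

```python
def mix_powers(pa, pb, d_a1, d_a2, d_b1, d_b2):
    """Observed powers at microphones 201 and 202 under the additive power model."""
    p1 = d_a1 * pa + d_b1 * pb
    p2 = d_a2 * pa + d_b2 * pb
    return p1, p2


def mixing_weight(pa, pb, d_a2, d_b2):
    """Weight of RA(t) in R(t); requires pa > 0."""
    return 1.0 / (1.0 + (pb / pa) * (d_b2 / d_a2))


# Hypothetical scene: the voice is close to microphone 201, the noise close to microphone 202.
d_a1, d_a2, d_b1, d_b2 = 1.0, 0.25, 0.25, 1.0
pa, pb = 2.0, 4.0

p1, p2 = mix_powers(pa, pb, d_a1, d_a2, d_b1, d_b2)
power_difference = p1 - p2                       # D(t)
power_ratio = p1 / p2                            # R(t)

da = pa * (d_a1 - d_a2)                          # DA(t)
db = pb * (d_b1 - d_b2)                          # DB(t)
ra = d_a1 / d_a2                                 # RA
rb = d_b1 / d_b2                                 # RB
alpha = mixing_weight(pa, pb, d_a2, d_b2)

# D(t) is the sum of the per-source differences; R(t) is the weighted mean of RA and RB.
print(power_difference, da + db)
print(power_ratio, alpha * ra + (1.0 - alpha) * rb)
```

  • With these numbers the noise term DB (t) dominates D (t), and R (t) sits much closer to RB than to RA, which illustrates why neither quantity is a reliable feature when the noise is strong.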
  • the detection efficiency depends on the conditions described below.
  • In voice detection, the presence of a desired voice is determined by comparing a feature amount with a threshold value.
  • a feature quantity having a large difference between a state where the desired sound is present and a state where the desired voice is not present is referred to as a “good” feature quantity
  • a feature quantity having a small difference is referred to as a “bad” feature quantity.
  • the following four conditions can be cited as conditions under which the power difference time series D (t) is a “bad” feature quantity.
  • Condition 1-1 The temporal variation of the time series PA (t) of the desired voice power is small. For example, the difference between the maximum value and the minimum value of PA (t) is small. At this time, the time variation of DA (t) is small.
  • Condition 1-2 The time variation of the noise power time series PB (t) is large. At this time, the time variation of DB (t) is large.
  • Condition 1-3 Desired sound is equally input to the microphone 201 and the microphone 202. At this time, dA1 is substantially equal to dA2, and DA (t) is substantially zero. Therefore, the time variation of DA (t) is small.
  • Condition 1-4 The sound signal generated from noise by the microphone 202 is much larger than the sound signal generated from noise by the microphone 201. At this time, the temporal variation of DB (t) increases.
  • Under these conditions, the temporal variation of DA (t) becomes smaller than the temporal variation of DB (t), and it is difficult to determine the threshold value. That is, as shown in FIG. 3, if the time variation of DB (t) is larger than the time variation of DA (t), it is difficult to detect DA (t) from D (t). At this time, D (t) is a “bad” feature quantity.
  • In voice detection using the power ratio time series R (t), the following two conditions can be cited as conditions under which R (t) is a “bad” feature quantity.
  • Condition 2-1 Desired sound is equally input to the microphone 201 and the microphone 202. At this time, dA1 is approximately equal to dA2, and RA (t) is approximately 1.
  • Condition 2-2 Noise is equally input to the microphone 201 and the microphone 202. At this time, dB1 is approximately equal to dB2, and RB (t) is approximately 1.
  • When RA (t) is approximately 1 and RB (t) is approximately 1, the difference between RA (t) and RB (t) becomes small, making it difficult to determine the threshold value.
  • When the distance between the microphone and the speaker's mouth, which is the sound source of the desired sound, is large, the levels of the desired sound input to the microphone 201 and the microphone 202 are close to each other. For this reason, it is difficult to determine the threshold value regardless of which of the power difference D (t) and the power ratio R (t) is used.
  • A time series of the power in which noise is suppressed from the sound signal of the microphone 201 is therefore used. This time series is denoted as E (t) and is described below.
  • The estimated value of the noise attenuation rate ratio RB (t) is denoted as Q (t).
  • If Q (t) is estimated accurately, the power time series E (t) with suppressed noise is proportional to the time series PA (t) of the desired speech power. That is, E (t) is rewritten as follows.
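  • The rewriting referred to here can be reconstructed from the definitions given earlier. The derivation below is a sketch under stated assumptions, not the publication's own equation: it assumes the additive power model with a proportionality constant of 1, the subtraction form of E (t) used in the later processing steps, and an exact estimate Q (t) = RB (t) = dB1 / dB2.

```latex
\begin{align*}
E(t) &= P_1(t) - Q(t)\,P_2(t) \\
     &= \bigl(d_{A1}P_A(t) + d_{B1}P_B(t)\bigr)
        - \frac{d_{B1}}{d_{B2}}\bigl(d_{A2}P_A(t) + d_{B2}P_B(t)\bigr) \\
     &= P_A(t)\,d_{A2}\left(\frac{d_{A1}}{d_{A2}} - \frac{d_{B1}}{d_{B2}}\right)
\end{align*}
```

  • The noise power PB (t) cancels completely, and the remaining factor is a constant that is positive exactly when dA1 / dA2 is larger than dB1 / dB2.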
  • E (t) is positive whenever the desired voice is present, provided that the ratio dA1 / dA2 of the desired voice attenuation rate is larger than the ratio dB1 / dB2 of the noise attenuation rate. For this reason, it is possible to determine the presence of a desired voice by setting the threshold value to a positive value. Since the absolute value of the threshold can be set to an arbitrarily small value, the desired voice can be detected even when its power is small.
  • E (t) does not depend on the magnitude relationship between the noises input to the two microphones.
  • The threshold value may be fixed to a positive value. Since the threshold value can be set to an arbitrarily small absolute value, the voice can be detected regardless of the desired voice level. Furthermore, since no noise term is included in the time series E (t) of the power in which noise is suppressed, voice detection is performed without depending on the magnitude of the noise.
  • FIG. 5 is a block diagram showing the configuration of the speech processing apparatus according to this embodiment.
  • The speech processing apparatus 500 includes a microphone 201, a microphone 202, a power calculation unit 503, a power calculation unit 504, a noise power ratio estimation unit 505, a noise power estimation unit 506, a noise suppression power estimation unit 507, and a threshold comparison unit 508.
  • the microphone 201 is closer to the desired sound source than the microphone 202.
  • the microphone 201 outputs a first mixed signal in which desired voice and noise are mixed.
  • the microphone 202 outputs a second mixed signal in which desired voice and noise are mixed at a mixing ratio different from that of the first mixed signal.
  • the power calculation unit 503 calculates and outputs power based on the first mixed signal.
  • the power calculation unit 504 calculates and outputs power based on the second mixed signal.
  • the noise power ratio estimation unit 505 estimates and outputs the noise power ratio based on the power of the first mixed signal and the power of the second mixed signal.
  • the noise power estimation unit 506 estimates and outputs the noise power included in the first mixed signal based on the power of the second mixed signal and the noise power ratio.
  • the noise suppression power estimation unit 507 estimates and outputs the noise suppression power based on the power of the first mixed signal and the estimated value of the noise power included in the first mixed signal.
  • the threshold value comparison unit 508 compares the noise suppression power with a preset threshold value, and determines whether or not a desired voice exists.
  • The microphone 201 acquires a first mixed signal in which desired voice and noise are mixed, and the microphone 202 acquires a second mixed signal in which desired voice and noise are mixed at a mixing ratio different from that of the first mixed signal (step S601). That is, an analog signal such as a potential difference is converted by an analog-digital (AD) converter into digital data having, for example, a quantization size of 16 bits and a sampling frequency of 44 kHz, and is output as the first mixed signal and the second mixed signal.
  • The power calculation unit 503 calculates a power time series based on the first mixed signal, and the power calculation unit 504 calculates a power time series based on the second mixed signal (step S602).
  • The power is obtained for the first or second mixed signal that is cut out in units of a short time length such as 20 milliseconds.
  • The power values of the first mixed signal and the second mixed signal at time t are denoted as P1 (t) and P2 (t), respectively.
  • As a power calculation method, for example, a process of averaging the square of the input waveform data for each sample over the number of samples in a unit time is employed. Alternatively, a process that averages the square of a calculated spectrum may be used.
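  • The time-domain power calculation of step S602 can be sketched as follows. The function name `frame_powers`, the use of non-overlapping frames, and the dropping of a trailing partial frame are assumptions made for the example; the text only specifies short frames such as 20 milliseconds and the mean of squared samples.

```python
from typing import List


def frame_powers(samples: List[float], sample_rate_hz: int = 44100,
                 frame_ms: int = 20) -> List[float]:
    """Mean of squared samples for each non-overlapping frame (one value per time t)."""
    frame_len = int(sample_rate_hz * frame_ms / 1000)  # 882 samples at 44.1 kHz and 20 ms
    powers = []
    # Step through complete frames only; a trailing partial frame is ignored.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        powers.append(sum(s * s for s in frame) / frame_len)
    return powers


# Two full frames with constant amplitude 1.0 and 2.0, followed by a partial frame.
signal = [1.0] * 882 + [2.0] * 882 + [0.5] * 100
print(frame_powers(signal))
```

  • Applying the same function to the first and second mixed signals yields P1 (t) and P2 (t) on a common frame index t.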
  • the noise power ratio estimation unit 505 estimates the ratio Q (t) of the noise power contained in the first mixed signal and the noise power contained in the second mixed signal (step S603).
  • the following methods can be considered as a method for estimating Q (t).
  • The ratio of the noise power does not depend on the value of the noise power generated by the noise source, and takes a constant value determined by the positional relationship. For this reason, the ratio of the power P1 (t) of the first mixed signal to the power P2 (t) of the second mixed signal is obtained at a plurality of times before the desired sound is generated, and the average value of the ratio is used as the estimated value of RB (t).
  • The noise power ratio Q (t) at time t may also be obtained from the noise power ratio Q (t − 1) at time t − 1, the power P1 (t) of the first mixed signal at time t, and the power P2 (t) of the second mixed signal at time t, according to the following relational expression.
  • $Q(t) = \beta \times P_1(t) / P_2(t) + (1 - \beta) \times Q(t - 1)$
  • Here, $\beta$ takes a value in the range from 0 to 1.
  • $\beta$ is substantially zero when P1 (t) / P2 (t) > Q (t − 1), and is almost 1 when P1 (t) / P2 (t) ≤ Q (t − 1).
  • As a result, the follow-up of the value of Q (t) with respect to an increase in the value of P1 (t) / P2 (t) is slow, and the follow-up with respect to a decrease is fast.
  • To estimate the noise power ratio Q (t), the same procedure as in other general noise estimation methods may also be used.
  • In that case, P1 (t) / P2 (t) is regarded as the power of an input signal in which desired speech and noise are mixed,
  • and Q (t) is regarded as the noise power estimated from that signal power using the noise estimation method.
  • As a general noise estimation method, a method of storing the minimum value of the power of the input signal over a predetermined time and outputting it as the noise power may be employed.
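  • The three ways of estimating Q (t) described for step S603 can be sketched side by side. The function names, the smoothing coefficients 0.01 and 0.9 (standing in for "substantially zero" and "almost 1"), and the window length are assumptions for illustration only.

```python
from typing import List


def q_from_noise_only(p1: List[float], p2: List[float]) -> float:
    """Average of P1/P2 over frames known to precede the desired voice."""
    ratios = [a / b for a, b in zip(p1, p2)]
    return sum(ratios) / len(ratios)


def q_asymmetric(p1: List[float], p2: List[float], q_init: float,
                 coeff_up: float = 0.01, coeff_down: float = 0.9) -> List[float]:
    """Recursive update that rises slowly and falls quickly."""
    q = q_init
    track = []
    for a, b in zip(p1, p2):
        ratio = a / b
        # Small coefficient when the ratio is above Q(t-1), large coefficient otherwise.
        coeff = coeff_up if ratio > q else coeff_down
        q = coeff * ratio + (1.0 - coeff) * q
        track.append(q)
    return track


def q_min_tracking(p1: List[float], p2: List[float], window: int = 5) -> List[float]:
    """Minimum of P1/P2 over the most recent `window` frames."""
    ratios = [a / b for a, b in zip(p1, p2)]
    return [min(ratios[max(0, i - window + 1):i + 1]) for i in range(len(ratios))]


# Hypothetical powers: the voice raises P1 in frames 2 and 3; the noise ratio is 0.25.
p1_demo = [1.0, 1.0, 3.0, 3.0, 1.0, 1.0]
p2_demo = [4.0] * 6
print(q_from_noise_only(p1_demo[:2], p2_demo[:2]))
print(q_asymmetric(p1_demo, p2_demo, q_init=0.25))
print(q_min_tracking(p1_demo, p2_demo))
```

  • In the recursive variant the estimate barely moves while the voice is active and returns toward the noise ratio as soon as the voice stops, which is the slow-rise, fast-fall behaviour the text describes.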
  • the noise power estimation unit 506 estimates the power of noise included in the first mixed signal (step S604).
  • the noise power is estimated by multiplying the power P2 (t) of the second mixed signal by the noise power ratio Q (t).
  • Because the noise is estimated from the power P2 (t) of the second mixed signal and the noise power ratio RB (t), the estimated noise has high accuracy.
  • the noise suppression power estimation unit 507 suppresses the noise included in the first mixed signal, and estimates the time series E (t) of the power with the noise suppressed (step S605).
  • A general noise removal method may also be used to estimate the time series E (t) of the power in which noise is suppressed.
  • In that case, P1 (t) is regarded as the power of an input signal in which desired speech and noise are mixed, and Q (t) × P2 (t) is regarded as the estimated noise power. The estimated noise power is then removed from the power of the input signal using the noise removal method.
  • As a general noise removal method, in addition to the subtraction process described above, a method of suppressing the noise power by multiplying the power of the input signal by a calculated noise reduction filter may be employed.
  • The threshold comparison unit 508 compares the time series E (t) of the power with suppressed noise with a preset threshold θ to determine whether or not a desired voice exists (step S606).
  • If E (t) is larger than the threshold θ, it is determined that there is a voice, and if not, it is determined that there is no voice.
  • The value of the threshold θ is set to an arbitrary value slightly larger than 0. In the time series E (t) of power with suppressed noise, the noise is almost completely removed regardless of the magnitude of the noise.
  • If the desired voice is also included in the second mixed signal, a part of the desired voice is suppressed together with the noise by the above-described processing such as subtraction.
  • However, as long as the desired sound input to the microphone 201 is even slightly larger than the desired sound input to the microphone 202, the desired sound is not completely canceled and does not disappear.
  • The presence of the desired voice is therefore detected by comparing the time series E (t) of power with suppressed noise with the threshold value θ. Further, since the value of the threshold θ does not depend on the magnitude of noise, a constant value that does not depend on noise is used. According to the speech processing method according to the present embodiment, the object of the present invention is achieved.
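  • Steps S604 to S606 can be sketched together. The sketch treats Q as a single constant for brevity, floors the subtraction at zero, and uses a gain of max(1 − N / P1, 0) for the filter variant; the function names, the flooring, and that particular gain are assumptions, since the text does not specify them.

```python
from typing import List


def estimate_noise_power(p2: List[float], q: float) -> List[float]:
    """Step S604: noise power in the first mixed signal, estimated as Q x P2(t)."""
    return [q * b for b in p2]


def suppress_by_subtraction(p1: List[float], noise: List[float]) -> List[float]:
    """Step S605, subtraction form: E(t) = P1(t) - noise(t), floored at zero."""
    return [max(a - n, 0.0) for a, n in zip(p1, noise)]


def suppress_by_gain(p1: List[float], noise: List[float]) -> List[float]:
    """Step S605, filter form: E(t) = G(t) x P1(t) with a hypothetical gain G(t)."""
    suppressed = []
    for a, n in zip(p1, noise):
        gain = max(1.0 - n / a, 0.0) if a > 0.0 else 0.0
        suppressed.append(gain * a)
    return suppressed


def detect(e: List[float], threshold: float) -> List[bool]:
    """Step S606: a frame contains voice when E(t) exceeds the threshold."""
    return [value > threshold for value in e]


# Frame 0 holds noise only; frame 1 holds the same noise plus a desired voice.
p1_frames = [1.0, 3.0]
p2_frames = [4.0, 4.5]
noise_frames = estimate_noise_power(p2_frames, q=0.25)
e_sub = suppress_by_subtraction(p1_frames, noise_frames)
e_gain = suppress_by_gain(p1_frames, noise_frames)
print(e_sub, detect(e_sub, threshold=0.1))
print(e_gain, detect(e_gain, threshold=0.1))
```

  • Part of the voice leaks into the second microphone and is subtracted along with the noise, yet the residual in the voiced frame stays well above the small threshold.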
  • The voice detection in the speech processing method according to the present embodiment described above may be performed in each of a plurality of divided frequency bands. In this case, the noise suppression power E (t) may be obtained for each frequency band and the average or sum thereof compared with a threshold value, or E (t) may be compared with a threshold value for each frequency band and the results integrated by majority vote or the like.
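  • The two ways of combining per-band results can be sketched for a single frame as follows. The function names, the strict-majority rule, and the band values are assumptions for illustration.

```python
from typing import List


def detect_by_band_mean(e_bands: List[float], threshold: float) -> bool:
    """Compare the average of the per-band noise-suppressed powers with the threshold."""
    return sum(e_bands) / len(e_bands) > threshold


def detect_by_majority(e_bands: List[float], threshold: float) -> bool:
    """Threshold each band separately, then require a strict majority of positive bands."""
    votes = sum(1 for e in e_bands if e > threshold)
    return votes > len(e_bands) / 2


# Hypothetical per-band E(t) values for one frame (four frequency bands).
bands = [0.0, 0.5, 0.8, 0.05]
print(detect_by_band_mean(bands, threshold=0.1))
print(detect_by_majority(bands, threshold=0.1))
```

  • The two rules can disagree: here the average is carried over the threshold by two strong bands, while only two of four bands vote for voice, which is not a strict majority.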
  • the speech processing apparatus 700 includes an adaptive filter 701.
  • The adaptive filter 701 receives the second mixed signal, and estimates the noise signal included in the first mixed signal from the noise signal included in the second mixed signal. That is, the adaptive filter 701 estimates the impulse response of the noise path from the second microphone 202, to which the noise contained in the second mixed signal is input, to the first microphone 201, to which the noise contained in the first mixed signal is input, and uses it to estimate a pseudo noise signal included in the first mixed signal.
  • A pseudo enhancement signal, from which the noise is estimated to have been removed, is also obtained.
  • As the adaptive filter 701, for example, an adaptive filter disclosed in Japanese Patent Laid-Open No. 08-056180 is employed.
  • the pseudo enhancement signal is input to the power calculation unit 503, and the pseudo noise signal is input to the power calculation unit 504. Based on the input signal, the same processing as in the second embodiment is performed.
  • FIG. 2 shows an arrangement of a sound source that generates sound input to the sound processing apparatus according to the present embodiment and a microphone that acquires sound.
  • the desired sound source 210 is near the microphone 201 and far from the microphone 202
  • the noise source 220 is near the microphone 202 and far from the microphone 201.
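  • The role of the adaptive filter 701 can be illustrated with a normalized LMS (NLMS) noise canceller. NLMS is a stand-in chosen for this sketch; it is not the filter of Japanese Patent Laid-Open No. 08-056180, whose details are not given here. The function name, tap count, step size, and the choice to form the pseudo enhancement signal by subtracting the pseudo noise from the first signal are all assumptions.

```python
import random
from typing import List, Tuple


def nlms_cancel(primary: List[float], reference: List[float], taps: int = 4,
                step: float = 0.5, eps: float = 1e-8) -> Tuple[List[float], List[float]]:
    """Return (pseudo_noise, pseudo_enhanced) for a primary and a reference signal."""
    weights = [0.0] * taps   # running estimate of the noise-path impulse response
    history = [0.0] * taps   # most recent reference samples, newest first
    pseudo_noise, pseudo_enhanced = [], []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        y = sum(w * h for w, h in zip(weights, history))  # estimated noise in primary
        e = d - y                                          # residual after cancellation
        norm = sum(h * h for h in history) + eps
        weights = [w + step * e * h / norm for w, h in zip(weights, history)]
        pseudo_noise.append(y)
        pseudo_enhanced.append(e)
    return pseudo_noise, pseudo_enhanced


# Hypothetical test signal: the reference is white noise, and the primary contains
# that noise passed through a two-tap path (0.5, 0.25) with no desired voice.
rng = random.Random(0)
ref = [rng.gauss(0.0, 1.0) for _ in range(2000)]
pri = [0.5 * ref[n] + (0.25 * ref[n - 1] if n > 0 else 0.0) for n in range(2000)]
noise_est, enhanced = nlms_cancel(pri, ref)
tail_power = sum(v * v for v in enhanced[-100:]) / 100
print(tail_power)
```

  • Following the text, the pseudo enhancement signal would then be passed to the power calculation unit 503 and the pseudo noise signal to the power calculation unit 504, after which processing continues as in the second embodiment.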
  • A speech processing apparatus 800 according to the fourth embodiment of the present invention will be described with reference to FIGS. As shown in FIG. 8, the speech processing apparatus according to this embodiment includes a first beamformer 801 between the first microphone 201 and the power calculation unit 503, and a second beamformer 802 between the second microphone 202 and the power calculation unit 504.
  • the first beamformer 801 calculates the sum of the first mixed signal and the second mixed signal in the time waveform region, and obtains a sum signal.
  • the second beamformer 802 calculates a difference in the time waveform region between the first mixed signal and the second mixed signal, and obtains a difference signal.
  • the sum signal is input to the power calculation unit 503, and the difference signal is input to the power calculation unit 504.
  • the subsequent processing is the same as the processing in the second embodiment.
  • FIG. 9 shows an arrangement of a sound source that generates sound input to the sound processing apparatus according to the present embodiment and a microphone that acquires sound.
  • the sound source 210 that generates the desired sound is equidistant from the microphone 201 and the microphone 202, and the sound source 220 that generates noise is close to either the microphone 201 or the microphone 202.
  • In the difference signal generated by the second beamformer 802, the signal of the desired voice is canceled and disappears, and only the noise signal remains.
  • Using this difference signal, the noise in the sum signal is suppressed.
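  • The sum and difference operations of the two beamformers can be sketched on time-domain samples. The function names and sample values are hypothetical; the scene assumes a desired voice that arrives identically at both microphones and a noise that is four times stronger in amplitude at the microphone 202.

```python
from typing import List


def sum_beamformer(x1: List[float], x2: List[float]) -> List[float]:
    """First beamformer 801: sample-wise sum of the two mixed signals."""
    return [a + b for a, b in zip(x1, x2)]


def difference_beamformer(x1: List[float], x2: List[float]) -> List[float]:
    """Second beamformer 802: sample-wise difference of the two mixed signals."""
    return [a - b for a, b in zip(x1, x2)]


voice = [1.0, -2.0, 0.5]     # identical at both microphones (equidistant source)
noise = [4.0, 8.0, -4.0]     # noise waveform as it arrives at microphone 202
mic1 = [s + 0.25 * n for s, n in zip(voice, noise)]
mic2 = [s + n for s, n in zip(voice, noise)]

sum_signal = sum_beamformer(mic1, mic2)
diff_signal = difference_beamformer(mic1, mic2)
print(sum_signal)
print(diff_signal)
```

  • The difference signal contains no voice component at all (it equals −0.75 times the noise), so it can serve as the noise reference, while the sum signal carries the voice and plays the role of the first mixed signal.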
  • In some arrangements, it is preferable that the first beamformer 801 calculates the difference signal and the second beamformer 802 calculates the sum signal.
  • In that case, the difference signal is input to the power calculation unit 503, and the sum signal is input to the power calculation unit 504.
  • the output from the first beamformer and the output from the second beamformer are corrected in consideration of the ratio of the noise attenuation rate.
  • The first beamformer 801 may perform beamforming that directs the beam in the direction in which the desired sound is generated, or the second beamformer 802 may perform beamforming that directs the beam in the direction in which noise is generated.
  • the present invention may be applied to a system composed of a plurality of devices, or may be applied to a single device.
  • The present invention can also be applied to a case where an information processing program that implements the functions of the embodiments is supplied directly or remotely to a system or apparatus. Therefore, a program installed in a computer in order to realize the functions of the present invention with the computer, a medium storing the program, and a WWW (World Wide Web) server from which the program is downloaded are also included in the scope of the present invention.
  • This application claims priority based on Japanese Patent Application No. 2011-140668, filed on June 24, 2011, the entire disclosure of which is incorporated herein.
  • the present invention can be suitably applied to a sound processing device that detects a desired sound.
  • the present invention is suitably applied to a voice processing apparatus that suppresses noise mixed in from the surrounding environment and detects the utterance of a desired voice even in a situation where the signal level of the desired voice is not high.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention relates to a speech processing device capable of detecting desired speech with excellent accuracy regardless of its intensity. The speech processing device comprises: noise estimation means that estimates noise based on a first ratio relating to a noise source, a first microphone, and a second microphone, and on a second input signal output from the second microphone; noise suppression means that suppresses a noise signal included in a first input signal output from the first microphone based on the output of the noise estimation means; and determination means that determines whether or not the desired speech is present by comparing the output of the noise suppression means with a threshold having a predetermined value.
PCT/JP2012/066449 2011-06-24 2012-06-21 Speech processing device, method, and program WO2012176932A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2011140668A JP2014194437A (ja) 2011-06-24 2011-06-24 音声処理装置、音声処理方法および音声処理プログラム
JP2011-140668 2011-06-24

Publications (1)

Publication Number Publication Date
WO2012176932A1 true WO2012176932A1 (fr) 2012-12-27

Family

ID=47422754

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2012/066449 WO2012176932A1 (fr) 2011-06-24 2012-06-21 Speech processing device, method, and program

Country Status (2)

Country Link
JP (1) JP2014194437A (fr)
WO (1) WO2012176932A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106024004A (zh) * 2016-05-11 2016-10-12 Tcl移动通信科技(宁波)有限公司 一种移动终端双麦降噪处理方法、系统及移动终端
CN107331407A (zh) * 2017-06-21 2017-11-07 深圳市泰衡诺科技有限公司 下行通话降噪方法及装置
JP2018164156A (ja) * 2017-03-24 2018-10-18 沖電気工業株式会社 収音装置、プログラム及び方法
US11395079B2 (en) * 2020-04-28 2022-07-19 Beijing Xiaomi Pinecone Electronics Co., Ltd. Method and device for detecting audio input module, and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3545691B1 (fr) * 2017-01-04 2021-11-17 Harman Becker Automotive Systems GmbH Capture sonore en champ lointain

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH03212697A (ja) * 1990-01-18 1991-09-18 Matsushita Electric Ind Co Ltd 信号処理装置
JP2005529379A (ja) * 2001-11-21 2005-09-29 アリフコム 電子的信号からノイズを除去する方法および装置
JP2009503568A (ja) * 2005-07-22 2009-01-29 ソフトマックス,インコーポレイテッド 雑音環境における音声信号の着実な分離

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ASANO: "Griffiths-Jim Type Adaptive Beamformer with Divided Structure", IEICE TECHNICAL REPORT, vol. 95, no. 587, 15 March 1996 (1996-03-15), pages 17 - 24 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106024004A (zh) * 2016-05-11 2016-10-12 Tcl移动通信科技(宁波)有限公司 一种移动终端双麦降噪处理方法、系统及移动终端
CN106024004B (zh) * 2016-05-11 2019-03-26 Tcl移动通信科技(宁波)有限公司 一种移动终端双麦降噪处理方法、系统及移动终端
JP2018164156A (ja) * 2017-03-24 2018-10-18 沖電気工業株式会社 収音装置、プログラム及び方法
CN107331407A (zh) * 2017-06-21 2017-11-07 深圳市泰衡诺科技有限公司 下行通话降噪方法及装置
CN107331407B (zh) * 2017-06-21 2020-10-16 深圳市泰衡诺科技有限公司 下行通话降噪方法及装置
US11395079B2 (en) * 2020-04-28 2022-07-19 Beijing Xiaomi Pinecone Electronics Co., Ltd. Method and device for detecting audio input module, and storage medium

Also Published As

Publication number Publication date
JP2014194437A (ja) 2014-10-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12802041

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12802041

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP