JP4568905B2

JP4568905B2 - Microphone device and speech detection device

Info

Publication number: JP4568905B2
Application number: JP2004329307A
Authority: JP
Inventors: 寧佐藤; 晃佐宗; 宏明児島
Original assignee: Kenwood KK; National Institute of Advanced Industrial Science and Technology AIST
Current assignee: Kenwood KK; National Institute of Advanced Industrial Science and Technology AIST
Priority date: 2004-11-12
Filing date: 2004-11-12
Publication date: 2010-10-27
Anticipated expiration: 2024-11-12
Also published as: JP2006139117A

Description

The present invention relates to a microphone device and a speech detection device.

In recent years, microphone devices, commonly called "mics", have found use well beyond video and broadcasting. As their applications spread, demand has grown for ever smaller, higher-performance microphones. For example, in a navigation system, accurate transfer of information between the side giving instructions and the side receiving them is important.

Microphone devices also play a major role in speech recognition, a field that has been studied intensively in recent years. For example, when a microphone device picks up sound from sources other than the speaker (so-called noise), recognition accuracy often degrades. To prevent this noise-induced degradation, microphone devices that remove noise before transmitting the voice have been proposed (see, for example, Patent Document 1).

Recognition-rate improvement processing such as noise removal requires information on the interval during which the speaker is actually speaking (the voice segment), so that the processing can be timed to the utterance. If the voice segment is acquired incorrectly, the processing confuses the state before the speaker starts speaking with the state in which the speaker has already begun, and the recognition rate does not improve as expected. As a way of acquiring voice-segment information appropriately, research on detecting the movement of a person's lips using image recognition, cameras, and the like has been carried out at ATR (Advanced Telecommunications Research Institute International) and elsewhere.
Patent Document 1: JP 2003-111186 A

As described above, detecting the movement of a person's lips by image recognition or with a camera requires equipment that is very large and expensive. The CPU resources needed to run the whole system are also large.

The present invention has been made in view of these circumstances, and its object is to easily realize a microphone device that can efficiently detect the period during which a speaker is speaking.

To achieve the above object, a microphone device according to a first aspect of the present invention comprises:
ultrasonic transmission means for transmitting ultrasonic waves to be directed at a speaker;
sound wave receiving means for receiving sound waves produced by the speaker's utterance;
reflected wave receiving means that is constituted by receiving means shared with the sound wave receiving means, is installed in the vicinity of the ultrasonic transmission means, and receives the reflected wave produced when the ultrasonic waves transmitted from the ultrasonic transmission means strike the speaker and are reflected;
utterance discrimination means for discriminating, based on the reflected wave received by the reflected wave receiving means, whether or not the speaker is speaking; and
output means for outputting sound based on the sound waves received by the sound wave receiving means only when the utterance discrimination means determines that the speaker is speaking.

Preferably, the receiving means receives a mixed wave of the sound wave and the reflected wave and converts the mixed wave into an electric signal;
the microphone device further comprises separation means for separating the electric signal of the mixed wave into a signal corresponding to the sound wave and a signal corresponding to the reflected wave; and
the utterance discrimination means discriminates whether or not the speaker is speaking based on the separated reflected wave signal.

Preferably, the separation means separates the mixed wave into a sound wave and an ultrasonic wave based on the frequencies of the components contained in the mixed wave signal.

Preferably, the utterance discrimination means comprises:
detection means for detecting the reflected wave received by the reflected wave receiving means and extracting a signal waveform; and
utterance motion detection means for discriminating, based on the signal waveform extracted by the detection means, whether or not the speaker's lips are making a movement that indicates an utterance.

Preferably, the utterance motion detection means detects the time change of the amplitude value of the signal waveform extracted by the detection means, and determines whether or not the speaker's lips are making a movement that indicates an utterance by discriminating whether or not that time change is larger than a predetermined threshold.

Preferably, the ultrasonic transmission means and the receiving means are constituted by a common dynamic microphone.

To achieve the above object, an utterance detection device according to a second aspect of the present invention comprises:
ultrasonic transmission means for transmitting ultrasonic waves to be directed at a speaker;
sound wave receiving means for receiving sound waves produced by the speaker's utterance;
reflected wave receiving means that is constituted by receiving means shared with the sound wave receiving means, is installed in the vicinity of the ultrasonic transmission means, and receives the reflected wave produced when the ultrasonic waves transmitted from the ultrasonic transmission means strike the speaker and are reflected; and
utterance discrimination means for discriminating, based on the reflected wave received by the reflected wave receiving means, whether or not the speaker is speaking.

According to the present invention, a microphone device capable of detecting a speaker's utterance period can be realized.

As shown in FIG. 1, a microphone device 1 according to an embodiment of the present invention includes a microphone unit 11, a band separation unit 12, a detection circuit 13, a differentiation circuit 14, an utterance determination unit 15, and a speaker switch 16. The microphone unit 11 consists of an ultrasonic transmission unit 111 and a sound/ultrasonic wave reception unit 112. The band separation unit 12 consists of an LPF (low-pass filter) 121 and a BPF (band-pass filter) 122. The band separation unit 12, the detection circuit 13, the differentiation circuit 14, the utterance determination unit 15, and the speaker switch 16 are housed inside a housing K.

The microphone unit 11 is the part that a speaker who wants to transmit speech using the microphone device 1 (hereinafter, speaker S) normally holds near his or her mouth, and it serves as the input port for the spoken voice.
The ultrasonic transmission unit 111 consists of a small ultrasonic oscillator such as a transducer. Once the power supply (not shown) of the microphone device 1 is turned ON, it continuously transmits ultrasonic waves of a predetermined frequency (for example, about 40 kHz) toward a predetermined external range.
The sound/ultrasonic wave reception unit 112 consists of an ECM (electret condenser microphone) or the like and picks up the voice (sound waves) uttered by the speaker S. It also receives the reflected wave produced when the ultrasonic waves emitted by the ultrasonic transmission unit 111 strike the speaker S and bounce back. On receiving the sound waves and ultrasonic waves, the sound/ultrasonic wave reception unit 112 converts them into an electric signal and supplies it to the band separation unit 12.

The band separation unit 12 separates the electric signal supplied from the sound/ultrasonic wave reception unit 112, according to frequency, into a signal corresponding to the sound wave and a signal corresponding to the ultrasonic wave. More specifically, the LPF 121 passes the sound wave and the BPF 122 passes the ultrasonic wave. That is, the LPF 121 passes only signals at or below a predetermined cutoff frequency (about 20 kHz) and attenuates signals above it. The BPF 122, on the other hand, passes only signals within a predetermined frequency range (around 40 kHz) and attenuates signals at other frequencies.
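The band separation performed by the LPF 121 and the BPF 122 can be illustrated in software. The following is a minimal sketch, not the circuit of the embodiment: it assumes the mixed signal has been digitized at 192 kHz and uses second-order digital (biquad) filters in the commonly used Audio EQ Cookbook form. The function names, the sample rate, and the Q values are illustrative assumptions; only the 20 kHz cutoff and the 40 kHz center frequency come from the description above.

```python
import math

def biquad_coeffs(kind, f0, fs, q):
    """Return normalized (b, a) coefficients for a second-order filter."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    if kind == "lowpass":
        b = [(1.0 - cos_w0) / 2.0, 1.0 - cos_w0, (1.0 - cos_w0) / 2.0]
    elif kind == "bandpass":          # unity gain at the center frequency
        b = [alpha, 0.0, -alpha]
    else:
        raise ValueError(kind)
    a0 = 1.0 + alpha
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return [x / a0 for x in b], a

def biquad_filter(x, b, a):
    """Apply the filter sample by sample (direct form I)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def separate_bands(mixed, fs=192_000):
    """Split a mixed signal into (voice band, ultrasonic band)."""
    b, a = biquad_coeffs("lowpass", 20_000, fs, 0.7071)   # stand-in for LPF 121
    voice = biquad_filter(mixed, b, a)
    b, a = biquad_coeffs("bandpass", 40_000, fs, 5.0)     # stand-in for BPF 122
    ultrasonic = biquad_filter(mixed, b, a)
    return voice, ultrasonic

def rms(x):
    """Root-mean-square level, used here only to compare signal strengths."""
    return math.sqrt(sum(v * v for v in x) / len(x))
```

Feeding a 1 kHz tone through `separate_bands` leaves it almost unchanged in the voice output and strongly attenuated in the ultrasonic output, while a 40 kHz tone behaves the other way round. A second-order low-pass only attenuates 40 kHz moderately, so a real design would likely use a steeper filter.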

The detection circuit 13 detects the signal waveform supplied from the BPF 122. It consists of a diode for half-wave rectification and a capacitor for smoothing the signal waveform.

The differentiation circuit 14 takes the signal waveform detected by the detection circuit 13 and outputs a signal corresponding to its time derivative (a signal corresponding to the time change of the amplitude value). The signal that has passed through the detection circuit 13 is derived from the reflected wave that struck the speaker S and bounced back, so the signal corresponding to its time change represents the motion of the speaker S (specifically, the movement of the lips). Passing the signal through the differentiation circuit 14 suppresses waveform components that represent body movement of the speaker S, that is, slow movement of the face forward, backward, left, or right, so that only the waveform representing lip movement is extracted.
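The detection and differentiation stages can be sketched in the same spirit. The code below is only an approximation of the diode-and-capacitor detector and the analog differentiator: it assumes a digitized ultrasonic-band signal, replaces the capacitor smoothing with a per-frame average of the half-wave-rectified samples, and approximates the time derivative by the difference between consecutive frames. The function names and the framing scheme are assumptions made for illustration.

```python
def envelope_frames(signal, frame_len):
    """Half-wave rectify, then average each frame of frame_len samples.

    Keeping only positive samples mimics the diode; averaging over a frame
    stands in for the smoothing capacitor.
    """
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        chunk = signal[start:start + frame_len]
        frames.append(sum(s for s in chunk if s > 0.0) / frame_len)
    return frames

def time_derivative(envelope, frame_period):
    """Rate of change of the envelope between consecutive frames (per second)."""
    return [(b - a) / frame_period for a, b in zip(envelope, envelope[1:])]
```

A slow drift of the envelope, such as the face moving gradually toward or away from the microphone, yields small derivative values, whereas a quick change in reflection strength yields a large one, which is the property the threshold comparison relies on.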

The utterance determination unit 15 includes a comparator or the like, as shown in the figure, and discriminates whether or not the signal supplied from the differentiation circuit 14, which corresponds to the time change of the amplitude value, exceeds a predetermined threshold. In other words, the speed of the speaker's lip movement is detected, and whether or not that speed exceeds the predetermined threshold determines whether the lip movement of the speaker S is at a level that indicates an utterance. When the unit determines that the threshold is exceeded, it outputs to the speaker switch 16 a signal of a predetermined level requesting that the switch for audio output be turned on.

The speaker switch 16 is a momentary-type switch; when it is supplied with the predetermined-level signal from the utterance determination unit 15, it sets the audio output to ON. When the speaker switch 16 does not receive the predetermined-level signal from the utterance determination unit 15, it keeps the audio output OFF (it does not switch it on). More specifically, the differentiation circuit 14 outputs a signal only when the lips move both widely and quickly. If, for example, the mouth is opened wide but slowly, or is opened wide and then held still, the change is cut off by the differentiation circuit 14 and the speaker switch 16 does not turn on.
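The comparator and the switch can be mimicked with a per-frame flag that gates the voice signal. This is a hedged sketch rather than the hardware described above: the names `utterance_flags` and `gate_audio` are hypothetical, the threshold value is left to the caller, and because the description does not say how long the output stays on after a trigger, the sketch simply gates frame by frame.

```python
def utterance_flags(derivative, threshold):
    """Comparator: True where the envelope's rate of change exceeds the threshold."""
    return [d > threshold for d in derivative]

def gate_audio(audio_frames, flags):
    """Pass a frame of voice samples only while its flag is True; mute it otherwise."""
    gated = []
    for frame, speaking in zip(audio_frames, flags):
        gated.append(list(frame) if speaking else [0.0] * len(frame))
    return gated
```

For example, `utterance_flags([0.0, 5.0, 20.0], 10.0)` flags only the last value, so only the corresponding voice frame would be passed on by `gate_audio`.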

Next, the processing performed by the microphone device 1 according to the present embodiment is described with reference to the flowchart of FIG. 2. Here it is assumed that, as shown in FIG. 3, the speaker S has picked up the microphone device 1, is gripping the housing K, and is holding the microphone unit 11 close to the mouth.

The ultrasonic transmission unit 111 in the microphone unit 11 continuously transmits ultrasonic waves of a predetermined frequency to the outside (step S101). Part of the ultrasonic waves strikes the lips of the speaker S and produces a reflected wave.

The sound/ultrasonic wave reception unit 112 receives the reflected wave produced when the ultrasonic waves transmitted in step S101 strike the speaker S (step S102). If the speaker S is saying something at that moment, the unit receives that voice (sound wave) at the same time.
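To experiment with the later processing steps without hardware, the mixed wave received in step S102 can be simulated. The sketch below rests on a simplifying assumption that is not stated in the description: the reflected wave is modeled as the 40 kHz carrier whose amplitude varies with a lip-opening value between 0 and 1. The function name, the modulation depth, and the direction of the modulation are placeholders.

```python
import math

def synthesize_mixed_wave(lip_opening, voice, fs=192_000,
                          carrier_hz=40_000, depth=0.5):
    """Return voice plus an amplitude-modulated ultrasonic reflection.

    lip_opening: per-sample values in [0, 1] (assumed model of lip motion)
    voice:       per-sample voice signal of the same length
    depth:       assumed modulation depth of the reflection strength
    """
    if len(lip_opening) != len(voice):
        raise ValueError("lip_opening and voice must have the same length")
    mixed = []
    for i, (lip, v) in enumerate(zip(lip_opening, voice)):
        carrier = math.sin(2.0 * math.pi * carrier_hz * i / fs)
        reflected = (1.0 + depth * lip) * carrier
        mixed.append(v + reflected)
    return mixed
```

With the lips held still the reflected component is a constant-amplitude carrier; changing `lip_opening` quickly produces the kind of rapid amplitude change that the detection and differentiation stages are meant to pick up.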

The sound/ultrasonic wave reception unit 112 converts the received mixed wave, which contains both the sound wave and the ultrasonic wave, into an electric signal and supplies it to the band separation unit 12. In the band separation unit 12, the LPF 121 and the BPF 122 separate the mixed wave by band (step S103). The LPF 121 passes signals at 20 kHz or below as corresponding to the sound wave, while the BPF 122 passes signals at around 40 kHz as corresponding to the ultrasonic wave. The ultrasonic signal that has passed through the BPF 122 is output to the detection circuit 13.

The detection circuit 13 detects the signal supplied through the BPF 122 by half-wave rectifying it and smoothing the waveform (step S104), and supplies the result to the differentiation circuit 14.

The differentiation circuit 14 applies processing corresponding to time differentiation (processing that detects the time change of the waveform) to the waveform detected by the detection circuit 13 (step S105), and supplies the signal waveform corresponding to the time change to the utterance determination unit 15.

The utterance determination unit 15 discriminates whether or not the level of the signal output by the differentiation circuit 14, which corresponds to the time change, is greater than a predetermined threshold (step S106).

If it determines that the level is greater than the threshold (step S106: Yes), the utterance determination unit 15 judges that the lips of the speaker S have opened widely and quickly, that is, that the speaker has started talking, supplies the predetermined-level signal to the speaker switch 16, and has the audio output switched ON (step S107). If it determines that the time change of the waveform is at or below the threshold (step S106: No), the utterance determination unit 15 supplies no signal to the speaker switch 16, and the audio output is left in its current state without being switched between ON and OFF.

Through the above processing, the microphone device 1 switches the audio output switch (speaker switch 16) automatically. That is, when the movement of the lips of the speaker S is determined to be at or above a predetermined magnitude and speed, the speaker S is regarded as speaking and the audio output is switched ON.

As described above, the microphone device 1 according to the present embodiment can detect the movement of the speaker's lips by means of ultrasonic waves, and can therefore detect the speaker's utterance period without large accessory equipment such as an image recognition device or a camera.

The present invention is not limited to the example shown in the above embodiment, and various modifications and applications are possible.
For example, in the above embodiment the microphone unit 11 is built from the ultrasonic transmission unit 111 and the sound/ultrasonic wave reception unit 112. The invention is not limited to this, however; the microphone device 1 can also be built using, for example, a dynamic microphone. In that case, the diaphragm of the dynamic microphone serves as both the sound/ultrasonic wave reception unit 112 and the ultrasonic transmission unit 111.

In the above embodiment, the sound/ultrasonic wave reception unit 112 receives both sound waves and ultrasonic waves, but a sound wave reception unit and an ultrasonic wave reception unit may instead be provided separately. When both are received at the same point (the sound/ultrasonic wave reception unit 112), as in the above embodiment, the speaker can speak without having to consider the difference in directivity between sound wave reception and ultrasonic wave reception. More specifically, the speaker S does not need to take into account a difference in directivity between an ultrasonic wave reception unit and a sound wave reception unit such as that shown in FIG. 4(a), and can simply speak while positioned in front of the reception unit, as shown in FIG. 4(b).

In the above embodiment, the microphone device 1 is described as a hand-held microphone that the speaker S holds by gripping the housing K, but its form is not limited to this. It can also be realized as a pin (lavalier) microphone worn on the speaker's chest, a hanging microphone that picks up sound from above, and so on. It may also be installed on the dashboard or elsewhere inside an automobile, facing the driver's lips. In these cases, to match the reception range of the sound waves, the ultrasonic transmission direction must be confined to a narrow range for a pin microphone or a dashboard-mounted microphone and, conversely, set to a wide range for a hanging microphone.

In the above embodiment, a threshold is set in order to judge the magnitude of the time change of the waveform, but this threshold is not limited to any particular value. This is because the way the mouth opens when speaking differs between, for example, children and adults, or Westerners and Japanese; setting a threshold suited to each category makes it possible to detect the speaker's utterance period more accurately.

The detailed circuit configurations of the detection circuit, the differentiation circuit, and so on are likewise not limited to the examples shown in the above embodiment; it is sufficient that each circuit is configured so that it can achieve its purpose.

FIG. 1 is a configuration diagram of a microphone device according to an embodiment of the present invention.
FIG. 2 is a flowchart for explaining the processing performed by the microphone device of FIG. 1.
FIG. 3 is a diagram schematically showing how the microphone device of FIG. 1 is used by a speaker.
FIG. 4 is a schematic diagram for explaining that, when the sound wave reception unit and the ultrasonic wave reception unit are one and the same, the speaker can speak without considering the difference in directivity between sound waves and ultrasonic waves.

Explanation of symbols

1 Microphone device
11 Microphone unit
111 Ultrasonic transmission unit
112 Sound/ultrasonic wave reception unit
12 Band separation unit
121 LPF
122 BPF
13 Detection circuit
14 Differentiation circuit
15 Utterance determination unit
16 Speaker switch

Claims

A microphone device comprising:
ultrasonic transmission means for transmitting ultrasonic waves to be directed at a speaker;
sound wave receiving means for receiving sound waves produced by the speaker's utterance;
reflected wave receiving means that is constituted by receiving means shared with the sound wave receiving means, is installed in the vicinity of the ultrasonic transmission means, and receives the reflected wave produced when the ultrasonic waves transmitted from the ultrasonic transmission means strike the speaker and are reflected;
utterance discrimination means for discriminating, based on the reflected wave received by the reflected wave receiving means, whether or not the speaker is speaking; and
output means for outputting sound based on the sound waves received by the sound wave receiving means only when the utterance discrimination means determines that the speaker is speaking.

The microphone device according to claim 1, wherein
the receiving means receives a mixed wave of the sound wave and the reflected wave and converts the mixed wave into an electric signal,
the microphone device further comprises separation means for separating the electric signal of the mixed wave into a signal corresponding to the sound wave and a signal corresponding to the reflected wave, and
the utterance discrimination means discriminates whether or not the speaker is speaking based on the separated reflected wave signal.

The microphone device according to claim 2, wherein the separation means separates the mixed wave into a sound wave and an ultrasonic wave based on the frequencies of the components contained in the mixed wave signal.

The microphone device according to any one of claims 1 to 3, wherein the utterance discrimination means comprises:
detection means for detecting the reflected wave received by the reflected wave receiving means and extracting a signal waveform; and
utterance motion detection means for discriminating, based on the signal waveform extracted by the detection means, whether or not the speaker's lips are making a movement that indicates an utterance.

The microphone device according to claim 4, wherein the utterance motion detection means detects the time change of the amplitude value of the signal waveform extracted by the detection means, and determines whether or not the speaker's lips are making a movement that indicates an utterance by discriminating whether or not that time change is larger than a predetermined threshold.

The microphone device according to any one of claims 1 to 5, wherein the ultrasonic transmission means and the receiving means are constituted by a common dynamic microphone.

An utterance detection device comprising:
ultrasonic transmission means for transmitting ultrasonic waves to be directed at a speaker;
sound wave receiving means for receiving sound waves produced by the speaker's utterance;
reflected wave receiving means that is constituted by receiving means shared with the sound wave receiving means, is installed in the vicinity of the ultrasonic transmission means, and receives the reflected wave produced when the ultrasonic waves transmitted from the ultrasonic transmission means strike the speaker and are reflected; and
utterance discrimination means for discriminating, based on the reflected wave received by the reflected wave receiving means, whether or not the speaker is speaking.

The utterance detection device according to claim 7, wherein
the receiving means receives a mixed wave of the sound wave and the reflected wave and converts the mixed wave into an electric signal,
the utterance detection device further comprises separation means for separating the electric signal of the mixed wave into a signal corresponding to the sound wave and a signal corresponding to the reflected wave, and
the utterance discrimination means discriminates whether or not the speaker is speaking based on the separated reflected wave signal.

The utterance detection device according to claim 7 or 8, wherein the utterance discrimination means comprises:
detection means for detecting the reflected wave received by the reflected wave receiving means and extracting a signal waveform; and
utterance motion detection means for discriminating, based on the signal waveform extracted by the detection means, whether or not the speaker's lips are making a movement that indicates an utterance.