JP4568905B2 - Microphone device and speech detection device - Google Patents


Info

Publication number
JP4568905B2
Authority
JP
Japan
Prior art keywords
wave
speaker
ultrasonic
reflected wave
signal
Prior art date
Legal status
Active
Application number
JP2004329307A
Other languages
Japanese (ja)
Other versions
JP2006139117A (en)
Inventor
寧 佐藤
晃 佐宗
宏明 児島
Current Assignee
Kenwood KK
National Institute of Advanced Industrial Science and Technology AIST
Original Assignee
Kenwood KK
National Institute of Advanced Industrial Science and Technology AIST
Priority date
2004-11-12
Filing date
2004-11-12
Publication date
2010-10-27
Application filed by Kenwood KK and National Institute of Advanced Industrial Science and Technology AIST
Priority to JP2004329307A
Publication of JP2006139117A
Application granted
Publication of JP4568905B2
Legal status: Active
Anticipated expiration

Landscapes

  • Details Of Audible-Bandwidth Transducers (AREA)
  • Circuit For Audible Band Transducer (AREA)

Description

The present invention relates to a microphone device and a speech detection device.

In recent years, microphone devices (commonly called "mics") have spread beyond video and broadcasting into a wide range of fields. As their uses expand, demand is growing for ever smaller, higher-performance microphones. In a navigation system, for example, accurate transfer of information between the party issuing instructions and the party receiving them is important.

Microphone devices also play a major role in speech recognition, a field of active research in recent years. When a microphone picks up sound from sources other than the speaker (so-called noise), recognition accuracy often degrades. To prevent this noise-induced degradation, microphone devices that remove noise before transmitting speech have been proposed (see, for example, Patent Document 1).

When performing recognition-rate improvement such as noise removal, it is necessary to acquire information on the period during which the speaker is talking (the speech segment) and to process the signal in step with the utterance. If the speech-segment information is acquired incorrectly, for example, the state before the speaker starts talking is confused with the state in which the speaker has already started, and the recognition rate does not improve as expected. As a way of acquiring speech-segment information properly, research on detecting the movement of a person's lips using image recognition, cameras, and the like has been conducted at ATR (Advanced Telecommunications Research Institute International) and elsewhere.
Patent Document 1: JP 2003-111186 A

As described above, detecting lip movement with image recognition or a camera requires bulky, expensive equipment, and the system as a whole also consumes considerable CPU resources.

The present invention was made in view of these circumstances, and its object is to provide, in a simple way, a microphone device that can efficiently detect a speaker's speech period.

To achieve the above object, a microphone device according to a first aspect of the present invention comprises:
ultrasonic transmission means for transmitting ultrasonic waves toward a speaker;
sound wave receiving means for receiving sound waves produced by the speaker's utterance;
reflected wave receiving means, constituted by a receiving means shared with the sound wave receiving means and installed near the ultrasonic transmission means, for receiving the reflected wave produced when the ultrasonic waves transmitted from the ultrasonic transmission means strike the speaker and are reflected;
utterance discrimination means for discriminating, based on the reflected wave received by the reflected wave receiving means, whether or not the speaker is speaking; and
output means for outputting sound based on the sound waves received by the sound wave receiving means only when the utterance discrimination means determines that the speaker is speaking.

Preferably, the receiving means receives a mixed wave of the sound wave and the reflected wave and converts the mixed wave into an electric signal;
the device further comprises separation means for separating the electric signal of the mixed wave into a signal corresponding to the sound wave and a signal corresponding to the reflected wave; and
the utterance discrimination means determines whether or not the speaker is speaking based on the separated reflected-wave signal.

Preferably, the separation means separates the mixed wave into sound waves and ultrasonic waves based on the frequencies of the components contained in the mixed-wave signal.

Preferably, the utterance discrimination means comprises:
detection means for detecting the reflected wave received by the reflected wave receiving means and extracting a signal waveform; and
utterance motion detection means for determining, based on the signal waveform extracted by the detection means, whether or not the speaker's lips are moving in a manner that indicates speech.

Preferably, the utterance motion detection means detects the temporal change in the amplitude of the signal waveform extracted by the detection means and determines whether this temporal change is greater than a predetermined threshold, thereby judging whether the speaker's lips are moving in a manner that indicates speech.

Preferably, the ultrasonic transmission means and the receiving means are constituted by a common dynamic microphone.

To achieve the above object, a speech detection device according to a second aspect of the present invention comprises:
ultrasonic transmission means for transmitting ultrasonic waves toward a speaker;
sound wave receiving means for receiving sound waves produced by the speaker's utterance;
reflected wave receiving means, constituted by a receiving means shared with the sound wave receiving means and installed near the ultrasonic transmission means, for receiving the reflected wave produced when the ultrasonic waves transmitted from the ultrasonic transmission means strike the speaker and are reflected; and
utterance discrimination means for discriminating, based on the reflected wave received by the reflected wave receiving means, whether or not the speaker is speaking.

Preferably, the receiving means receives a mixed wave of the sound wave and the reflected wave and converts the mixed wave into an electric signal;
the device further comprises separation means for separating the electric signal of the mixed wave into a signal corresponding to the sound wave and a signal corresponding to the reflected wave; and
the utterance discrimination means determines whether or not the speaker is speaking based on the separated reflected-wave signal.

Preferably, the utterance discrimination means comprises:
detection means for detecting the reflected wave received by the reflected wave receiving means and extracting a signal waveform; and
utterance motion detection means for determining, based on the signal waveform extracted by the detection means, whether or not the speaker's lips are moving in a manner that indicates speech.

According to the present invention, a microphone device capable of detecting a speaker's speech period can be realized.

As shown in FIG. 1, the microphone device 1 according to the embodiment of the present invention comprises a microphone unit 11, a band separation unit 12, a detection circuit 13, a differentiation circuit 14, an utterance determination unit 15, and a speaker switch 16. The microphone unit 11 consists of an ultrasonic transmission unit 111 and a sound/ultrasonic receiving unit 112. The band separation unit 12 consists of an LPF (low-pass filter) 121 and a BPF (band-pass filter) 122. The band separation unit 12, detection circuit 13, differentiation circuit 14, utterance determination unit 15, and speaker switch 16 are housed inside the housing K.

The microphone unit 11 is the part that a speaker who wants to transmit speech with the microphone device 1 (hereinafter, speaker S) normally holds near the mouth; it serves as the input port for the spoken voice.
The ultrasonic transmission unit 111 consists of a small ultrasonic oscillator such as a transducer. When the power supply (not shown) of the microphone device 1 is turned ON, it continuously emits ultrasonic waves of a predetermined frequency (for example, about 40 kHz) toward a predetermined external range.
The sound/ultrasonic receiving unit 112 consists of an ECM (electret condenser microphone) or the like and picks up the voice (sound waves) uttered by speaker S. It also receives the reflected wave produced when the ultrasonic waves emitted by the ultrasonic transmission unit 111 strike speaker S and bounce back. On receiving sound waves and ultrasonic waves, the sound/ultrasonic receiving unit 112 converts them into an electric signal and supplies it to the band separation unit 12.

The band separation unit 12 separates the electric signal supplied from the sound/ultrasonic receiving unit 112, according to frequency, into a signal corresponding to the sound waves and a signal corresponding to the ultrasonic waves. More specifically, the LPF 121 passes the sound waves and the BPF 122 passes the ultrasonic waves: the LPF 121 passes only signals at or below a predetermined cutoff frequency (about 20 kHz) and attenuates signals above it, while the BPF 122 passes only signals within a predetermined frequency range (around 40 kHz) and attenuates signals at other frequencies.
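The frequency-based split can be illustrated in software. This is a minimal sketch, not the patent's analog circuit: it assumes the ECM signal is sampled at a hypothetical 192 kHz and replaces the LPF 121 and BPF 122 with digital second-order (biquad) sections computed from the RBJ Audio EQ Cookbook formulas, each cascaded twice. Only the 20 kHz cutoff and the 40 kHz center come from the text; the Q values and all function names are assumptions.

```python
import math

FS = 192_000  # assumed sampling rate of the ECM signal; must exceed 2 x 40 kHz

def biquad_coeffs(kind, fs, f0, q):
    """Second-order section from the RBJ Audio EQ Cookbook, normalized so a0 = 1."""
    w0 = 2 * math.pi * f0 / fs
    cos_w0, alpha = math.cos(w0), math.sin(w0) / (2 * q)
    if kind == "lowpass":
        b = [(1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2]
    elif kind == "bandpass":  # constant 0 dB gain at the center frequency f0
        b = [alpha, 0.0, -alpha]
    else:
        raise ValueError(f"unknown filter kind: {kind}")
    a0 = 1 + alpha
    return [c / a0 for c in b], [-2 * cos_w0 / a0, (1 - alpha) / a0]

def biquad(x, coeffs):
    """Filter the sample list x with one biquad section (direct form I)."""
    (b0, b1, b2), (a1, a2) = coeffs
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x1, x2, y1, y2 = xn, x1, yn, y1  # shift the delay lines
        y.append(yn)
    return y

def band_separate(mixed, fs=FS):
    """Split the mixed signal into an audio band (role of LPF 121)
    and an ultrasonic band (role of BPF 122)."""
    lpf = biquad_coeffs("lowpass", fs, 20_000, 1 / math.sqrt(2))  # Q is an assumption
    bpf = biquad_coeffs("bandpass", fs, 40_000, 5.0)              # Q is an assumption
    audio = biquad(biquad(mixed, lpf), lpf)          # two sections: steeper roll-off
    ultrasonic = biquad(biquad(mixed, bpf), bpf)
    return audio, ultrasonic

def rms(x):
    """Root-mean-square level of a sample list."""
    return math.sqrt(sum(v * v for v in x) / len(x))

# Synthetic input: a 1 kHz "voice" tone plus a weaker 40 kHz "echo".
n = FS // 10  # 100 ms of samples
voice = [math.sin(2 * math.pi * 1_000 * i / FS) for i in range(n)]
echo = [0.2 * math.sin(2 * math.pi * 40_000 * i / FS) for i in range(n)]
audio, ultrasonic = band_separate([v + e for v, e in zip(voice, echo)])
```

A higher BPF Q narrows the ultrasonic pass band and rejects more of the upper voice spectrum, at the cost of a longer settling time; the values above are arbitrary illustrations.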

The detection circuit 13 detects the signal waveform supplied from the BPF 122; it consists of a diode for half-wave rectification and a capacitor for smoothing the waveform.
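An idealized software model of such a diode-and-capacitor detector is sketched below. It assumes an ideal diode (no forward voltage drop), a hypothetical 192 kHz sampling rate, and a hypothetical discharge time constant `tau` of 1 ms, chosen to be much longer than one 40 kHz carrier period (25 µs) yet much shorter than a lip movement. The function name and the test signal are illustrative, not taken from the source.

```python
import math

def diode_capacitor_detector(x, fs, tau=1e-3):
    """Idealized half-wave peak detector.

    Diode: only positive half-cycles pass (half-wave rectification).
    Capacitor: charges instantly to each new peak and otherwise discharges
    exponentially with time constant tau (smoothing). tau is an assumption."""
    decay = math.exp(-1.0 / (fs * tau))  # per-sample discharge factor
    env, out = 0.0, []
    for xn in x:
        rectified = max(xn, 0.0)
        env = max(rectified, env * decay)
        out.append(env)
    return out

FS = 192_000  # assumed sampling rate
n = FS // 10  # 100 ms of samples
# Hypothetical reflected 40 kHz wave whose amplitude halves after 50 ms.
reflected = [(1.0 if i < n // 2 else 0.5) * math.sin(2 * math.pi * 40_000 * i / FS)
             for i in range(n)]
envelope = diode_capacitor_detector(reflected, FS)
```

If `tau` is too short the output ripples at the carrier rate; if it is too long the envelope cannot follow a mouth that closes quickly.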

The differentiation circuit 14 outputs a signal corresponding to the time derivative of the waveform detected by the detection circuit 13 (that is, a signal corresponding to the temporal change in its amplitude). The signal passed by the detection circuit 13 derives from the reflected wave bounced off speaker S, so its temporal change represents the movement of speaker S, specifically the movement of the lips. Passing through the differentiation circuit 14 suppresses the waveform produced by speaker S's body movement, i.e. slow movement of the face forward, backward, left, and right, so that only the waveform indicating lip movement is extracted.
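The differentiation step can be approximated by a backward difference. The sketch assumes the detector output has already been smoothed and decimated to a hypothetical 1 kHz envelope rate (differentiating the raw high-rate envelope would mostly amplify carrier ripple), and both input envelopes are invented test signals, not measured data. It shows why a slow face movement and a quick lip opening of the same overall size produce very different outputs.

```python
def differentiate(envelope, fs):
    """Backward-difference estimate of d(envelope)/dt, in amplitude units per second."""
    return [0.0] + [(cur - prev) * fs for prev, cur in zip(envelope, envelope[1:])]

FS_ENV = 1_000  # hypothetical envelope rate after smoothing/decimating the detector output

# Invented 1 s test envelopes, each rising by about 0.1:
slow = [0.5 + 0.1 * k / FS_ENV for k in range(FS_ENV)]                       # face drifts over 1 s
fast = [0.5 + 0.1 * min(max(k - 500, 0), 20) / 20 for k in range(FS_ENV)]   # lips open in 20 ms

peak_slow = max(abs(v) for v in differentiate(slow, FS_ENV))  # roughly 0.1 per second
peak_fast = max(abs(v) for v in differentiate(fast, FS_ENV))  # roughly 5 per second
```

Because the output scales with the rate of change rather than the size of the change, a threshold placed between these two peaks separates lip motion from slow head drift.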

The utterance determination unit 15 includes a comparator or the like, as shown in the figure, and determines whether the signal supplied from the differentiation circuit 14, which corresponds to the temporal change in amplitude, exceeds a predetermined threshold. In other words, it measures how fast the speaker's lips are moving and, by checking whether this speed exceeds the threshold, decides whether the lip movement of speaker S is at a level that indicates speech. If the threshold is exceeded, it outputs a signal of a predetermined level to the speaker switch 16, requesting that the audio output switch be turned ON.

The speaker switch 16 consists of a momentary switch; when supplied with the predetermined-level signal from the utterance determination unit 15, it sets the audio output switch to ON. If it does not receive that signal, it keeps the audio output OFF (does not switch it ON). More specifically, because the differentiation circuit 14 outputs a signal only when the lips move widely and quickly, a change such as opening the mouth wide slowly, or opening it wide and then holding it still, is cut by the differentiation circuit 14, and the speaker switch 16 does not turn ON.

Next, the processing performed by the microphone device 1 according to this embodiment is described with reference to the flowchart of FIG. 2. Here it is assumed that, as shown in FIG. 3, speaker S holds the microphone device 1 by the housing K with the microphone unit 11 close to the mouth.

The ultrasonic transmission unit 111 in the microphone unit 11 continuously emits ultrasonic waves of a predetermined frequency to the outside (step S101). Part of the ultrasonic wave strikes the lips of speaker S and produces a reflected wave.

The sound/ultrasonic receiving unit 112 receives the reflected wave produced when the ultrasonic waves emitted in step S101 strike speaker S (step S102). If speaker S is saying something at that time, the voice (sound waves) is received simultaneously.

The sound/ultrasonic receiving unit 112 converts the mixed wave containing the received sound waves and ultrasonic waves into an electric signal and supplies it to the band separation unit 12, where the LPF 121 and the BPF 122 separate the mixed wave by band (step S103). The LPF 121 passes signals at 20 kHz or below as corresponding to the sound waves, while the BPF 122 passes signals around 40 kHz as corresponding to the ultrasonic waves. The ultrasonic signal that has passed through the BPF 122 is output to the detection circuit 13.

The detection circuit 13 detects the signal supplied through the BPF 122, performing half-wave rectification and waveform smoothing (step S104), and supplies the result to the differentiation circuit 14.

The differentiation circuit 14 performs processing equivalent to time differentiation (detecting the temporal change of the waveform) on the waveform detected by the detection circuit 13 (step S105), and supplies a signal waveform corresponding to that temporal change to the utterance determination unit 15.

The utterance determination unit 15 determines whether the level of the signal output by the differentiation circuit 14, which corresponds to the temporal change, is greater than a predetermined threshold (step S106).

If the level is greater than the threshold (step S106: Yes), the utterance determination unit 15 judges that speaker S's lips have opened widely and quickly, that is, that the speaker has started talking, and supplies the predetermined-level signal to the speaker switch 16 to switch the audio output ON (step S107). If the temporal change of the waveform is at or below the threshold (step S106: No), the utterance determination unit 15 supplies no signal to the speaker switch 16, so the audio output stays in its current ON or OFF state.
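Steps S106 and S107, together with the output means that passes audio only while the switch is ON, can be modeled per sample as in the sketch below. It compares the signed level against the threshold, as the text describes. The `hold_samples` hang-over is a hypothetical addition, since the source does not say how the output returns to OFF; with `hold_samples=None` the sketch simply keeps the current state on "No", as in the flowchart. All names and sample values are invented for illustration.

```python
def speaker_switch_states(rate_of_change, threshold, hold_samples=None):
    """Per-sample audio-output state (True = ON) for comparator + switch.

    Above the threshold (S106: Yes) the output is switched ON (S107);
    otherwise (S106: No) the current state is kept. hold_samples is a
    HYPOTHETICAL hang-over: if given, the output returns to OFF after more
    than that many consecutive samples without a new trigger."""
    state, quiet, states = False, 0, []
    for v in rate_of_change:
        if v > threshold:
            state, quiet = True, 0
        else:
            quiet += 1
            if hold_samples is not None and quiet > hold_samples:
                state = False
        states.append(state)
    return states

def gate_audio(audio, states):
    """Output means: pass audio samples only while the switch is ON."""
    return [a if on else 0.0 for a, on in zip(audio, states)]

rate = [0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0]   # invented differentiator output
audio = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # invented audio samples
latched = speaker_switch_states(rate, threshold=1.0)
# [False, False, True, True, True, True, True]
held = speaker_switch_states(rate, threshold=1.0, hold_samples=2)
# [False, False, True, True, True, False, False]
output = gate_audio(audio, held)
# [0.0, 0.0, 3.0, 4.0, 5.0, 0.0, 0.0]
```

Latching matches the flowchart literally; a hang-over is one common way to end a speech segment, but it is not described in the source.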

Through the above processing, the microphone device 1 automatically switches the audio output switch (speaker switch 16). That is, when the lip movement of speaker S is judged to reach at least a predetermined magnitude and speed, speaker S is regarded as speaking and the audio output switch is turned ON.

As described above, the microphone device 1 according to this embodiment can detect the movement of the speaker's lips using ultrasonic waves, so the speaker's speech period can be detected without large auxiliary equipment such as an image recognition device or a camera.

The present invention is not limited to the example shown in the above embodiment; various modifications and applications are possible.
For example, in the above embodiment the microphone unit 11 consists of the ultrasonic transmission unit 111 and the sound/ultrasonic receiving unit 112. The configuration is not limited to this, however; the microphone device 1 can also be built with, for example, a dynamic microphone. In that case, the diaphragm of the dynamic microphone serves as both the sound/ultrasonic receiving unit 112 and the ultrasonic transmission unit 111.

In the above embodiment, the sound/ultrasonic receiving unit 112 receives both sound waves and ultrasonic waves, but separate units could be provided, one for receiving sound waves and one for receiving ultrasonic waves. However, when sound waves and ultrasonic waves are received simultaneously by the same part (the sound/ultrasonic receiving unit 112), as in the above embodiment, the speaker can talk without having to account for differences in directivity between sound-wave reception and ultrasonic reception. More specifically, speaker S can simply face the receiving unit, as shown in FIG. 4(b), without taking into account the directivity difference between an ultrasonic receiving unit and a sound wave receiving unit shown in FIG. 4(a).

In the above embodiment, the microphone device 1 is described as a handheld microphone that speaker S holds by the housing K, but the form of the microphone device 1 is not limited to this. It can also be realized as a pin (clip-on) microphone worn on speaker S's chest or as a hanging microphone that picks up sound from above. It may also be installed on the dashboard or elsewhere inside an automobile, aimed at the driver's lips. To match the sound-wave pickup range in these cases, the ultrasonic emission direction must be limited to a narrow range for a pin microphone or a dashboard-mounted microphone, whereas for a hanging microphone it must be set over a wide range.

In the above embodiment, a threshold is set to judge the magnitude of the temporal change of the waveform, but the threshold is not limited to a specific value. This is because the way people open their mouths when speaking differs, for example, between children and adults or between Westerners and Japanese people; setting a threshold for each such category allows the speaker's speech period to be detected more accurately.

Likewise, the detailed configurations of the detection circuit, the differentiation circuit, and the other circuits are not limited to the examples in the above embodiment; each circuit need only be configured to achieve its purpose.

FIG. 1 is a configuration diagram of a microphone device according to an embodiment of the present invention.
FIG. 2 is a flowchart explaining the processing performed by the microphone device of FIG. 1.
FIG. 3 schematically shows the microphone device of FIG. 1 in use by a speaker.
FIG. 4 is a schematic diagram explaining that, when the sound wave receiving unit and the ultrasonic receiving unit are the same, the speaker can talk without considering the difference in directivity between sound waves and ultrasonic waves.

Explanation of Reference Numerals

1 Microphone device
11 Microphone unit
111 Ultrasonic transmission unit
112 Sound/ultrasonic receiving unit
12 Band separation unit
121 LPF
122 BPF
13 Detection circuit
14 Differentiation circuit
15 Utterance determination unit
16 Speaker switch

Claims (9)

A microphone device comprising:
ultrasonic transmission means for transmitting ultrasonic waves toward a speaker;
sound wave receiving means for receiving sound waves produced by the speaker's utterance;
reflected wave receiving means, constituted by a receiving means shared with the sound wave receiving means and installed near the ultrasonic transmission means, for receiving the reflected wave produced when the ultrasonic waves transmitted from the ultrasonic transmission means strike the speaker and are reflected;
utterance discrimination means for discriminating, based on the reflected wave received by the reflected wave receiving means, whether or not the speaker is speaking; and
output means for outputting sound based on the sound waves received by the sound wave receiving means only when the utterance discrimination means determines that the speaker is speaking.
The microphone device according to claim 1, wherein
the receiving means receives a mixed wave of the sound wave and the reflected wave and converts the mixed wave into an electric signal,
the device further comprises separation means for separating the electric signal of the mixed wave into a signal corresponding to the sound wave and a signal corresponding to the reflected wave, and
the utterance discrimination means determines whether or not the speaker is speaking based on the separated reflected-wave signal.
The microphone device according to claim 2, wherein the separation means separates the mixed wave into sound waves and ultrasonic waves based on the frequencies of the components contained in the mixed-wave signal.
The microphone device according to claim 1, 2, or 3, wherein the utterance discrimination means comprises:
detection means for detecting the reflected wave received by the reflected wave receiving means and extracting a signal waveform; and
utterance motion detection means for determining, based on the signal waveform extracted by the detection means, whether or not the speaker's lips are moving in a manner that indicates speech.
The microphone device according to claim 4, wherein the utterance motion detection means detects the temporal change in the amplitude of the signal waveform extracted by the detection means and determines whether that temporal change is greater than a predetermined threshold, thereby judging whether the speaker's lips are moving in a manner that indicates speech.
The microphone device according to any one of claims 1 to 5, wherein the ultrasonic transmission means and the receiving means are constituted by a common dynamic microphone.
A speech detection device comprising:
ultrasonic transmission means for transmitting ultrasonic waves toward a speaker;
sound wave receiving means for receiving sound waves produced by the speaker's utterance;
reflected wave receiving means, constituted by a receiving means shared with the sound wave receiving means and installed near the ultrasonic transmission means, for receiving the reflected wave produced when the ultrasonic waves transmitted from the ultrasonic transmission means strike the speaker and are reflected; and
utterance discrimination means for discriminating, based on the reflected wave received by the reflected wave receiving means, whether or not the speaker is speaking.
The speech detection device according to claim 7, wherein:
the receiving means receives a mixed wave of the sound wave and the reflected wave and converts the mixed wave into an electrical signal;
the device further comprises separating means for separating the electrical signal of the mixed wave into a signal corresponding to the sound wave and a signal corresponding to the reflected wave; and
the speech discrimination means determines, on the basis of the separated reflected-wave signal, whether the speaker is speaking.
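Claim 8 does not specify how the mixed electrical signal is separated. One simple possibility is a complementary band split: a one-pole low-pass filter keeps the audible speech band, and the residual keeps the ultrasonic reflection. The sketch below is illustrative only. The sample rate, the cutoff frequency, and the choice of a one-pole filter are all assumptions. A practical design would more likely use steeper band-pass filters centred on the ultrasonic carrier.

```python
import math
from typing import List, Sequence, Tuple


def split_mixed_signal(x: Sequence[float], fs: float, cutoff_hz: float) -> Tuple[List[float], List[float]]:
    """Split a mixed microphone signal into an audible part and an ultrasonic part.

    A one-pole low-pass filter produces the audible (speech) band, and the
    residual x - low is taken as the ultrasonic (reflected-wave) band, so the
    two outputs always sum back to the input. `fs` and `cutoff_hz` are
    assumed values, not taken from the patent.
    """
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    audio: List[float] = []
    ultra: List[float] = []
    y = 0.0
    for sample in x:
        y += alpha * (sample - y)   # one-pole low-pass: audible band
        audio.append(y)
        ultra.append(sample - y)    # complementary high-pass: ultrasonic band
    return audio, ultra


def rms(v: Sequence[float]) -> float:
    """Root-mean-square level, used to compare how much energy lands in each band."""
    return math.sqrt(sum(s * s for s in v) / len(v))
```

Because the high band is computed as the input minus the low band, the outputs are easy to sanity-check by adding them back together. The cost is a gentle 6 dB/octave roll-off, so some speech energy leaks into the ultrasonic branch. With an assumed 96 kHz sample rate, a 5 kHz cutoff, a 500 Hz "speech" tone, and a 40 kHz "reflection" tone, each tone still lands mainly in its own branch.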
The speech detection device according to claim 7 or 8, wherein the speech discrimination means comprises:
detection means for detecting the reflected wave received by the reflected wave receiving means and extracting a signal waveform; and
speech activity detecting means for determining, on the basis of the signal waveform extracted by the detection means, whether the speaker's lips are moving in a way that indicates speech.
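The detection means in claim 9 can be read as amplitude (envelope) detection of the reflected ultrasonic carrier, whose strength changes as the lips move. The sketch below assumes full-wave rectification followed by a one-pole low-pass filter. It also adds a hypothetical peak-to-peak rule to stand in for the speech activity detecting means. The smoothing bandwidth, window length, and threshold are assumptions and do not come from the patent.

```python
import math
from typing import List, Sequence


def envelope_detect(x: Sequence[float], fs: float, smooth_hz: float = 200.0) -> List[float]:
    """Recover the amplitude envelope of a reflected ultrasonic carrier.

    Full-wave rectification is followed by a one-pole low-pass filter. The
    output is scaled by pi/2 so that a steady sine of amplitude A reads
    roughly A. `smooth_hz` is an assumed bandwidth: wide enough to follow lip
    movement, narrow enough to reject the carrier.
    """
    alpha = 1.0 - math.exp(-2.0 * math.pi * smooth_hz / fs)
    env: List[float] = []
    y = 0.0
    for sample in x:
        y += alpha * (abs(sample) - y)   # rectify, then smooth
        env.append(y * math.pi / 2.0)    # mean of |sin| is 2/pi for a continuous sine
    return env


def lips_moving(env: Sequence[float], window: int, threshold: float) -> bool:
    """Hypothetical rule: report movement if the envelope's peak-to-peak swing
    within any non-overlapping window exceeds `threshold`."""
    if window < 1:
        raise ValueError("window must be >= 1")
    for start in range(0, len(env) - window + 1, window):
        chunk = env[start:start + window]
        if max(chunk) - min(chunk) > threshold:
            return True
    return False
```

For example, suppose an assumed 40 kHz carrier sampled at 96 kHz drops to half amplitude partway through. The detected envelope then steps from about 1.0 to about 0.5, and `lips_moving` reports movement. An unmodulated carrier settles to a flat envelope and does not trigger it.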
JP2004329307A 2004-11-12 2004-11-12 Microphone device and speech detection device Active JP4568905B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2004329307A JP4568905B2 (en) 2004-11-12 2004-11-12 Microphone device and speech detection device

Publications (2)

Publication Number Publication Date
JP2006139117A JP2006139117A (en) 2006-06-01
JP4568905B2 JP4568905B2 (en) 2010-10-27

Family

ID=36619980

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2004329307A Active JP4568905B2 (en) 2004-11-12 2004-11-12 Microphone device and speech detection device

Country Status (1)

Country Link
JP (1) JP4568905B2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113660536A (en) * 2021-09-28 2021-11-16 北京七维视觉科技有限公司 Subtitle display method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6338993A (en) * 1986-08-04 1988-02-19 Matsushita Electric Ind Co Ltd Voice section detector
JPH01310399A (en) * 1988-06-08 1989-12-14 Toshiba Corp Speech recognition device
JPH07307989A (en) * 1994-05-13 1995-11-21 Matsushita Electric Ind Co Ltd Voice input device
JPH08271627A (en) * 1995-03-31 1996-10-18 Hitachi Commun Syst Inc Distance measuring device between loudspeaker and microphone

Also Published As

Publication number Publication date
JP2006139117A (en) 2006-06-01

Similar Documents

Publication Publication Date Title
US10586534B1 (en) Voice-controlled device control using acoustic echo cancellation statistics
US7885818B2 (en) Controlling an apparatus based on speech
US9958950B2 (en) Detector
US9913022B2 (en) System and method of improving voice quality in a wireless headset with untethered earbuds of a mobile device
WO2010140358A1 (en) Hearing aid, hearing assistance system, walking detection method, and hearing assistance method
US11343607B2 (en) Automatic active noise reduction (ANR) control to improve user interaction
US20170263267A1 (en) System and method for performing automatic gain control using an accelerometer in a headset
US20100098266A1 (en) Multi-channel audio device
US20170365249A1 (en) System and method of performing automatic speech recognition using end-pointing markers generated using accelerometer-based voice activity detector
JP2007028610A (en) Hearing apparatus and method for operating the same
US20190369236A1 (en) Method for operating a loudspeaker unit, and loudspeaker unit
US20120197635A1 (en) Method for generating an audio signal
US6959095B2 (en) Method and apparatus for providing multiple output channels in a microphone
JP2003264883A (en) Voice processing apparatus and voice processing method
CN113314121B (en) Soundless voice recognition method, soundless voice recognition device, soundless voice recognition medium, soundless voice recognition earphone and electronic equipment
WO2003107327A1 (en) Controlling an apparatus based on speech
JP4568905B2 (en) Microphone device and speech detection device
JP3838159B2 (en) Speech recognition dialogue apparatus and program
CN108711434A (en) Vehicle noise-reduction method and device
CN113767431A (en) Speech detection
EP3195618A1 (en) A method for operating a hearing system as well as a hearing system
JP2007267331A (en) Combination microphone system for speaking voice collection
WO2011149969A2 (en) Separating voice from noise using a network of proximity filters
JPH023520B2 (en)
US20230239617A1 (en) Ear-worn device and reproduction method

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20070116

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20091225

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20100105

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20100303

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20100629

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20100723

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130820

Year of fee payment: 3

R150 Certificate of patent or registration of utility model

Ref document number: 4568905

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130820

Year of fee payment: 3

S111 Request for change of ownership or part of ownership

Free format text: JAPANESE INTERMEDIATE CODE: R313115

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130820

Year of fee payment: 3

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

S533 Written request for registration of change of name

Free format text: JAPANESE INTERMEDIATE CODE: R313533

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250