JP2002034092A - Sound pickup device

Info

Publication number: JP2002034092A
Application number: JP2000215481A
Authority: JP
Inventors: Takuya Tsuda; 拓也津田
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2000-07-17
Filing date: 2000-07-17
Publication date: 2002-01-31

Abstract

PROBLEM TO BE SOLVED: To provide a sound pickup device that identifies a sound source without performing a sound-pickup scanning operation, controls its directivity accordingly, and collects only the target sound at a high S/N ratio even in a noisy environment. SOLUTION: The sound pickup device is provided with a sound pickup part 10 with directivity, a directivity control part 30, a selection part 40 which selects a target sound source region according to a detected keyword, and a keyword detection part 60 using voice recognition means. Keywords and sound source regions are registered in advance, in association with each other, in a reference table 50. When a keyword is detected by the part 60, the associated sound source region is selected, and the directivity of the part 10 is controlled toward the selected target sound source region in synchronization with a signal from a control-timing instruction part 20.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a sound pickup device that has directivity and can control that directivity. In particular, it relates to a device that identifies a target sound source region and controls the directivity to match that region, thereby rejecting noise and picking up only the target sound at a high S/N ratio.

[0002]

[Prior art] In environments with a great deal of noise, such as a room where people gather or the inside of a car, picking up sound at a high S/N ratio has conventionally required close-talking microphones. However, providing one close-talking microphone per speaker is not only a nuisance for the speakers who use them, but is also costly and scales poorly as the system grows. If instead the sound pickup unit has directivity, and that directivity can be steered toward the sound source as needed, a high pickup S/N ratio is obtained even when the pickup unit is placed away from the speakers, and natural conversation becomes possible without the speakers being conscious of the microphone.

[0003] A typical example of a sound pickup device that has a directional pickup unit and can control its directivity is the microphone array device.

[0004] A microphone array device places multiple microphones at spatially different positions and applies signal processing to separate signals and remove noise. Microphone array devices require complex signal processing, but the development of high-performance algorithms and improvements in processor capability have made them smaller and cheaper, and they are now being applied in a wide range of fields such as hands-free telephony devices and multimedia teleconferencing systems.

[0005] For example, systems for multimedia teleconferencing have been commercialized that use a microphone array device to estimate the direction and distance of the target sound source and then perform sound pickup, noise removal, and even camera focus control toward the source direction.

[0006] As one such example, Japanese Unexamined Patent Application Publication No. H09-251299 discloses a microphone-array-input speech recognition device having a speech input unit, a frequency analysis unit, a sound source position search unit, a speech parameter extraction unit, and a speech recognition unit. The input signals from the microphone array, received by the speech input unit, are frequency-analyzed by band-pass filters in the frequency analysis unit to obtain a band-pass waveform for each frequency band and each microphone channel. From these band-pass waveforms, the sound source position search unit obtains, for each frequency band, the band-pass power distribution over sound source positions or directions, and estimates the sound source position or direction from that distribution. Based on the estimated position or direction, the speech parameter extraction unit extracts the corresponding band-pass power from the per-band power distribution as a speech parameter, which is then used for speech recognition in the speech recognition unit.

[0007] Japanese Unexamined Patent Application Publication No. H11-18192 discloses a sound pickup device having a microphone array, a microphone array main unit, a loudspeaker, a received-speech detection unit, and a directivity control unit. The signals picked up by the microphone array are processed by the microphone array main unit, and the directivity of the microphone array is steered toward the region where the target speaker is located, so that the target speech is picked up at a high S/N ratio.

[0008] These devices are characterized in that the microphone array device scans its surroundings while picking up sound, detects the angle and position at which the output of the microphone array main unit is largest, and identifies that as the position of the speaker.

[0009]

[Problem to be solved by the invention] When a microphone array device picks up sound in order to locate a sound source, various kinds of noise often interfere. If the noise is something other than human speech, it may be possible to remove it to some extent on the basis of the speech spectrum. In a teleconferencing system, however, human speech itself is present as noise: for example, speech from the receiving loudspeaker, or whispered conversation among people other than the speaker, which one does not wish to pick up. It is therefore difficult to pick up and separate the speech signal accurately, and there are limits to what can be achieved.

[0010] Normally, in a formal setting such as a meeting, the positions of the speakers are decided in advance, and the speakers basically do not move from them. Another characteristic is that who speaks is governed by the context of the conversation, for example by the chairperson calling on someone by name.

[0011] The present invention focuses on the fact that the physical positions of the speakers are fixed in this way, and on the context dependence of speaker changes. Its object is to identify the sound source region to be picked up without performing a pickup scan of the sound sources, which is susceptible to noise, to control the directivity of the microphones, and to pick up the target sound at a high S/N ratio.

[0012]

[Means for solving the problem] In the sound pickup device of the present invention, a keyword input from outside is detected as the means of identifying the target sound source region without performing a pickup scan. That is, the sound pickup device of the present invention comprises a sound pickup unit having directivity, means for controlling the directivity of the sound pickup unit, means for selecting the target sound source region to be picked up, and keyword input means. Sound source region candidates obtained in advance are associated with keywords, and when one of these keywords is input through the keyword input means, the associated sound source region is selected and the directivity of the sound pickup unit is controlled to match the selected region.

[0013] In the above sound pickup device, the present invention in particular processes the speech signal that the device is constantly capturing and uses it as the keyword input. That is, the above sound pickup device comprises means for recognizing the speech signal obtained from the sound pickup unit to obtain speech information, and means for detecting a keyword from that speech information.

[0014] Further, in the above sound pickup device, the sound source region candidates associated with keywords are stored inside the device in the form of a correspondence table. Because the keywords should be changed to suit the situation, the table is kept in a rewritable storage area.

[0015] In the above sound pickup device of the present invention, the keyword is an identifier that identifies a sound source, such as the name of a speaker.

[0016] The above sound pickup device of the present invention comprises means for detecting a silent state from the speech signal obtained from the sound pickup unit. When the silent state continues for a fixed time or longer, the device judges that the utterance has ended and, in synchronization with this, controls the directivity of the sound pickup unit to match the target sound source region, so that the speech of the next speaker is picked up at a high S/N ratio.

[0017] A sound pickup device of the present invention, in which the directivity of a directional sound pickup unit is controlled toward a target sound source region, comprises: keyword detection means for detecting, from the speech input through the sound pickup unit, a keyword that identifies the next target sound source; a keyword classification list storing data that indicate a change of target sound source; a keyword/sound-source-region correspondence table consisting of data that associate target sound sources with sound source regions in advance; a sound source region selection unit that selects a sound source region by looking up the correspondence table with the data on the target sound source detected by the keyword detection means; control timing instruction means for outputting a control start instruction signal that indicates when to start shifting the directivity of the sound pickup unit to the selected sound source region; and a directivity control unit that controls the directivity of the sound pickup unit according to the sound source region selection signal and the control start instruction signal.

[0018] Further, in the sound pickup device of the present invention, the control timing instruction means includes means for detecting a silent state from the speech signal obtained from the sound pickup unit, and outputs the control start instruction signal in synchronization with the silent state having continued for a fixed time or longer.

[0019]

[Embodiments of the invention] Embodiments of the present invention are described below with reference to the drawings.

[0020] FIG. 1 is a block diagram showing an example configuration of the signal processing section of the sound pickup device of the present invention, which designates the target sound source region by keyword detection. The signal processing section comprises a sound pickup unit 10 consisting of a microphone array 11 and a microphone array main unit 12, a control timing instruction unit 20, a directivity control unit 30, a sound source region selection unit 40, a keyword/sound-source-region correspondence table 50, a speech recognition and keyword detection unit 60, and a keyword classification list table 70.

[0021] The sound pickup unit 10, consisting of the microphone array 11 and the microphone array main unit 12, controls the multiple microphones of the microphone array so as to emphasize and extract only the sound from the target sound source region.
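The source does not say how the microphone array main unit 12 emphasizes one region. One common technique is delay-and-sum beamforming, sketched below purely as an illustration. The function name, the integer-sample delays, and the toy signals are all assumptions and do not come from the patent.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its steering delay (in samples) and average.

    channels: list of sample lists, one per microphone (same sample rate).
    delays:   non-negative integer arrival delay of the target at each microphone.
    """
    # Keep only the samples for which every delayed channel still has data.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]


# Toy input (assumed): one pulse reaching three microphones one sample apart.
mics = [
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
]
steered = delay_and_sum(mics, [0, 1, 2])    # delays matched to the source
unsteered = delay_and_sum(mics, [0, 0, 0])  # no steering applied
```

With matched delays the pulse adds coherently and keeps its full amplitude, whereas without steering it is spread over three samples at one third of that amplitude.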

[0022] The control timing instruction unit 20 monitors the speech input state. When the absence of speech input continues beyond a predetermined switching wait time, it judges that the speaker (target sound source) is changing and sends a directivity control start signal to the directivity control unit 30.

[0023] The directivity control unit 30 has a sound source region information store in which the sound source region information selected by the sound source region selection unit 40 is set as the "next designated region". When it receives the directivity control start signal from the control timing instruction unit 20, it steers the directivity of the microphone array 11 toward the sound source region set as the "next designated region".

[0024] The sound source region selection unit 40 uses the detected keyword to select the sound source region information associated with it in the keyword/sound-source-region correspondence table 50.

[0025] The keyword/sound-source-region correspondence table 50 associates the sound source identified by a detected keyword with sound source region information. The table is stored in advance in a rewritable storage area, taking into account how the target sound sources (speakers) are arranged.
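As a minimal sketch of table 50 and the lookup performed by selection unit 40, a plain dictionary is enough. The region fields (`azimuth_deg`, `distance_m`), the speaker names, and the function name are hypothetical; the source does not specify how a region is represented.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SourceRegion:
    """Hypothetical region description: bearing and distance from the array."""
    azimuth_deg: float
    distance_m: float


# Stand-in for table 50: filled in beforehand from the seating plan, rewritable.
region_table: Dict[str, SourceRegion] = {
    "Tanaka": SourceRegion(azimuth_deg=-40.0, distance_m=1.5),
    "Suzuki": SourceRegion(azimuth_deg=15.0, distance_m=2.0),
}


def select_region(keyword: str, table: Dict[str, SourceRegion]) -> Optional[SourceRegion]:
    """Stand-in for selection unit 40: return the registered region, if any."""
    return table.get(keyword)


# Rewriting the table when the seating changes.
region_table["Sato"] = SourceRegion(azimuth_deg=60.0, distance_m=1.2)
```

A keyword that is not registered simply yields `None`, which leaves it to the caller to decide whether to keep the current directivity.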

[0026] The speech recognition and keyword detection unit 60 is a means of extracting keywords from the conversation by word spotting, and includes a speech recognition processing unit.
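The idea of keeping only registered keywords from recognized speech can be illustrated with a much simplified sketch. It assumes the recognizer has already produced a list of words; real word spotting works on the audio signal itself, and the function name and sample words here are hypothetical.

```python
def spot_keywords(recognized_words, registered_keywords):
    """Return the registered keywords found in the recognized words, in spoken order."""
    registered = set(registered_keywords)
    return [word for word in recognized_words if word in registered]


# Assumed recognizer output for one utterance.
words = ["so", "Tanaka", "what", "do", "you", "think", "please"]
found = spot_keywords(words, {"Tanaka", "Suzuki", "everyone", "please"})
```

Here `found` holds only "Tanaka" and "please"; every other word is discarded before classification.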

[0027] The keyword classification list table 70 is a list used to identify the next speaker, or to detect the end of an utterance, from the keyword detected by the speech recognition and keyword detection unit 60. The next speaker is identified by entries registered in advance, such as the personal names of the attendees.

[0028] The signal processing section of the sound pickup device configured as above operates as follows. The external speech signal picked up by the microphone array 11 passes through the microphone array main unit 12, where only the sound from the target sound source region is emphasized, and is delivered to the speech recognition and keyword detection unit 60. The speech signal delivered to the unit 60 is recognized inside that unit, and only the keywords are extracted by word spotting.

[0029] The keywords extracted by the speech recognition and keyword detection unit 60 are classified, by reference to the keyword classification list table 70, into the three keyword types shown in Table 1.

[0030]

[Table 1]

[0031] The keyword classification process is described with reference to FIG. 3.

[0032] The speech recognition and keyword detection unit 60 monitors whether there is speech input (S1). When there is speech input, it sets the speech input state of the control timing instruction unit 20 to "present" (S2). The unit 60 then monitors for detection of a keyword (S3). When a keyword is detected, it refers to the keyword classification list table 70 and determines whether the detected keyword is a speaker-specifying word, such as a speaker's personal name or nickname (S4). If it is a speaker-specifying word, process 1 shown in FIG. 4 is performed (S5), and it is then determined whether speech recognition has ended (S6); if it has, processing ends. If speech recognition has not ended at step S6, the flow returns to step S1 and speech input is monitored.

[0033] If there is no speech input at step S1, the speech input state of the control timing instruction unit 20 is set to "absent" (S10), and the flow returns to step S1 to monitor speech input.

[0034] If the detected keyword is found at step S4 not to be a speaker-specifying word, it is determined whether it is a speaker-specification-cancelling word such as "everyone" (S11). If it is, process 2 is executed (S12), after which the flow moves to step S6 to check whether speech recognition has ended.

[0035] If it is determined at step S11 that the keyword is not a speaker-specification-cancelling word, it is determined whether it is a speaker-change-prompting word, that is, an explicit prompt or question such as "please go ahead" or "is it ...?" (S13). If it is, process 3 is executed (S14), after which the flow moves to step S6 to check whether speech recognition has ended.

[0036] If no keyword is detected at step S3, or if the keyword is found at step S13 not to be a speaker-change-prompting word, the flow moves to step S6 to check whether speech recognition has ended.
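The three-way classification of steps S4, S11, and S13 can be sketched as a single lookup. The flowchart tests the types one after another, but a dictionary gives the same result as long as each keyword has exactly one type. The enum, the table contents, and the English example words are illustrative assumptions standing in for table 70.

```python
from enum import Enum
from typing import Optional


class KeywordType(Enum):
    SPEAKER_SPECIFYING = "speaker-specifying"              # tested at S4
    SPECIFICATION_CANCELLING = "specification-cancelling"  # tested at S11
    CHANGE_PROMPTING = "change-prompting"                  # tested at S13


# Stand-in for keyword classification list table 70 (entries are illustrative).
CLASSIFICATION_LIST = {
    "Tanaka": KeywordType.SPEAKER_SPECIFYING,
    "Suzuki": KeywordType.SPEAKER_SPECIFYING,
    "everyone": KeywordType.SPECIFICATION_CANCELLING,
    "please": KeywordType.CHANGE_PROMPTING,
}


def classify(keyword: str) -> Optional[KeywordType]:
    """Return the keyword type, or None when the word is not registered."""
    return CLASSIFICATION_LIST.get(keyword)
```

A `None` result corresponds to falling through all three tests and going straight to step S6.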

[0037] Process 1, performed at step S5 when the keyword is a speaker-specifying word, is shown in FIG. 4. The speech recognition and keyword detection unit 60, having judged that the detected keyword is a speaker-specifying word, passes the detected keyword to the sound source region selection unit 40 (S21). The sound source region selection unit 40 selects, from the keyword/sound-source-region correspondence table 50, the sound source region information corresponding to the identified speaker (S22), and sets it as the "next designated region" of the directivity control unit 30 (S23).

[0038] In the keyword/sound-source-region correspondence table 50, the selectable sound source region candidates are associated with the detection keywords, and the table is stored in a form that can be rewritten at any time.

[0039] Process 2, performed when the keyword is judged at step S11 to be a speaker-specification-cancelling word, is shown in the flowchart of FIG. 5: via the sound source region selection unit 40, the "next designated region" in the directivity control unit 30 is reset (S30).

[0040] Process 3, performed when the keyword is judged at step S13 to be a speaker-change-prompting word, is shown in the flowchart of FIG. 6: the sound source region selection unit 40 sets the switching wait time of the control timing instruction unit 20 to "short" (S40).
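Processes 1 to 3 each change one piece of state, which the following sketch makes explicit. The `ControllerState` class, its field names, and the use of plain strings for regions are assumptions made for illustration. Leaving the stored target unchanged for an unregistered keyword is also an assumption, since the source does not cover that case.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ControllerState:
    next_region: Optional[str] = None  # stored "next" target of directivity control unit 30
    wait_time: str = "long"            # switching wait time: "long" or "short"


def process_1(state: ControllerState, keyword: str, table: Dict[str, str]) -> None:
    """S21-S23: look up the named speaker's region and store it as the next target."""
    region = table.get(keyword)
    if region is not None:  # assumption: an unregistered keyword changes nothing
        state.next_region = region


def process_2(state: ControllerState) -> None:
    """S30: clear the next target so that no single speaker is selected."""
    state.next_region = None


def process_3(state: ControllerState) -> None:
    """S40: shorten the wait so the switch follows soon after the prompt."""
    state.wait_time = "short"
```

For example, hearing a registered name and then a prompting word leaves the state holding that speaker's region with a short wait time, so the switch happens shortly after the current speaker stops.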

[0041] The control timing instruction unit 20 executes the process shown in the flowchart of FIG. 7. First, the switching wait time of the directivity control unit 30 is set to "long" (S51). The unit then monitors the elapsed time of the speech input state "absent" (S52) and determines whether the speech input state is "absent" (S53). If it is, the unit determines whether the elapsed time in the "absent" state has exceeded the set wait time (S54). If the set wait time has been exceeded, the control timing instruction unit 20 sends the directivity control start signal to the directivity control unit 30 (S55), sets the switching wait time to "long" (S56), and determines whether control timing instruction has ended (S57).

[0042] When control timing instruction has ended, this process ends.

[0043] If speech input is "present" at step S53, the flow moves to step S57 to determine whether control timing instruction has ended, and the process ends if it has.

[0044] If it is determined at step S54 that the elapsed time in the "absent" state has not exceeded the switching wait time, the flow returns to step S53 to monitor for the presence or absence of speech input.

[0045] If control timing instruction has not ended at step S57, the flow returns to step S52 to monitor the elapsed time of the speech input state "absent".

[0046] The switching wait time is set to "long" immediately after control starts and after each directivity control operation completes, but is changed to "short" when process 3 is executed.
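The silence timer of FIG. 7 can be sketched as a small class that is updated once per audio frame. The concrete wait times (2.0 s and 0.5 s), the class and method names, and the choice to restart the timer after a signal is sent are assumptions; the source only distinguishes a "long" and a "short" wait.

```python
LONG_WAIT_S = 2.0   # hypothetical value for the "long" wait
SHORT_WAIT_S = 0.5  # hypothetical value for the "short" wait


class ControlTiming:
    """Sketch of control timing instruction unit 20."""

    def __init__(self):
        self.wait_s = LONG_WAIT_S  # S51: start with the long wait
        self.silence_s = 0.0       # elapsed time with no speech input

    def prompt_detected(self):
        """Process 3: a speaker-change-prompting word shortens the wait."""
        self.wait_s = SHORT_WAIT_S

    def update(self, voice_present, dt_s):
        """Advance by one frame; return True when the start signal should be sent."""
        if voice_present:              # S53: speech present, so restart the timer
            self.silence_s = 0.0
            return False
        self.silence_s += dt_s         # S52: accumulate silence
        if self.silence_s > self.wait_s:   # S54: wait time exceeded
            self.silence_s = 0.0       # assumption: restart so the signal is not repeated
            self.wait_s = LONG_WAIT_S  # S56: back to the long wait
            return True                # S55: send directivity control start signal
        return False
```

With 0.5 s frames, the default long wait fires on the fifth silent frame, while a preceding prompt makes it fire on the second.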

[0047] The processing of the directivity control unit 30 is described with reference to the flowchart of FIG. 8. Before directivity control processing, the directivity control unit 30 resets any "next designated region" data already registered (S61), then monitors whether the directivity control start signal has been received from the control timing instruction unit 20 and continues monitoring until it is received (S62). When the directivity control start signal is received, it determines whether "next designated region" data is set (S63). If the "next designated region" is set when the signal is received, it sends the microphone array main unit 12 a control signal that steers the directivity of the microphones toward the designated region (S64). If it is determined at step S63 that the "next designated region" is not set, it sends the microphone array main unit 12 a control signal that cancels directivity control (S65).

[0048] It then determines whether directivity control is to end (S66). If so, this process ends; if not, the flow returns to step S61 and directivity control processing continues.
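The decision made in FIG. 8 can be sketched as follows. The class name, the `send` callback standing in for the link to microphone array main unit 12, and the `("steer", region)` / `("release", None)` command tuples are hypothetical.

```python
class DirectivityControl:
    """Sketch of directivity control unit 30."""

    def __init__(self, send):
        self.send = send         # callable standing in for main unit 12
        self.next_region = None  # S61: stored next target starts out reset

    def set_next(self, region):
        """Called by the selection unit when a speaker-specifying word is detected."""
        self.next_region = region

    def reset_next(self):
        """Called when a speaker-specification-cancelling word is detected."""
        self.next_region = None

    def on_start_signal(self):
        """S62-S65: act on the directivity control start signal."""
        if self.next_region is not None:            # S63: a target is set
            self.send(("steer", self.next_region))  # S64: steer toward it
        else:
            self.send(("release", None))            # S65: cancel directivity control
        self.next_region = None                     # back to S61 for the next round


commands = []
unit = DirectivityControl(commands.append)
unit.set_next("seat A")
unit.on_start_signal()  # steers toward "seat A"
unit.on_start_signal()  # nothing stored any more, so directivity is released
```

Because the stored target is cleared after every start signal, a second pause with no new name releases the directivity instead of staying on the previous speaker.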

[0049] More appropriate directivity control becomes possible by modifying the above embodiment as follows.

[0050] First, instead of a simple table that associates one keyword with one sound source region, the keyword/sound-source-region correspondence table 50 can associate sound source regions with logical expressions that combine multiple keywords, which improves the accuracy of sound source estimation.
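One way to read "logical expressions that combine multiple keywords" is as a list of rules, each pairing a predicate over the set of detected keywords with a region. The helper names, the rule contents, and the first-match policy below are assumptions for illustration only.

```python
def all_of(*keywords):
    """Predicate that holds when every listed keyword was detected (logical AND)."""
    return lambda detected: all(k in detected for k in keywords)


def any_of(*keywords):
    """Predicate that holds when at least one listed keyword was detected (logical OR)."""
    return lambda detected: any(k in detected for k in keywords)


# Hypothetical rules: two attendees share a surname and are told apart by department.
RULES = [
    (all_of("Tanaka", "sales"), "seat A"),
    (all_of("Tanaka", "engineering"), "seat B"),
    (any_of("chair", "chairperson"), "seat C"),
]


def select_region(detected, rules):
    """Return the region of the first rule whose expression holds, else None."""
    for condition, region in rules:
        if condition(detected):
            return region
    return None
```

With these rules, the surname alone selects nothing, which avoids steering toward the wrong one of two speakers who share it.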

[0051] Second, in the above embodiment the sound source region information in the directivity control unit 30 is always overwritten with the most recent value. By assigning priorities to keywords in the sound source region selection unit 40 and passing on the sound source region information only when its priority is higher than that of the information passed previously, the accuracy of sound source estimation can be improved.
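The priority rule can be sketched as a filter in front of the directivity control unit. The numeric priorities, the default priority of 0 for unlisted keywords, and the `reset` method (assumed to be called once the directivity has actually been switched) are not specified in the source.

```python
class PrioritySelector:
    """Forward a region only when its keyword outranks the last one forwarded."""

    def __init__(self, priorities):
        self.priorities = priorities  # keyword -> integer priority (higher wins)
        self.last_priority = None     # nothing forwarded yet

    def offer(self, keyword, region):
        """Return the region if it should be forwarded, otherwise None."""
        priority = self.priorities.get(keyword, 0)
        if self.last_priority is None or priority > self.last_priority:
            self.last_priority = priority
            return region
        return None

    def reset(self):
        """Assumed to run after each directivity switch, starting a new round."""
        self.last_priority = None
```

For instance, once a high-priority keyword has been forwarded, a later low-priority name in the same round is ignored rather than overwriting the stored region.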

[0052] Third, in the above embodiment the change of speaker is driven by keywords obtained from what the previous speaker said. In meetings and the like, however, there may be a facilitator, separate from the speakers, who has authority over speaker changes. A method of designating the target sound source region when such a facilitator is present is described with reference to FIG. 2.

[0053] FIG. 2 is a block diagram showing an example configuration of the signal processing section of a sound pickup device according to a second embodiment of the present invention, which designates the target sound source region by keyword detection. It is characterized in that a keyword input unit, such as a priority microphone 80 dedicated to the facilitator, and a speech selection switch 90 are added to the configuration of the signal processing section of the first embodiment.

[0054] In this embodiment, the speech input to the speech recognition and keyword detection unit 60 passes through the speech selection switch 90, which selects either the speech from the sound pickup device 10 or the speech from the facilitator's dedicated priority microphone 80.

[0055] When there is speech input from the facilitator's dedicated priority microphone 80, the speech selection switch 90 feeds that speech to the speech recognition and keyword detection unit 60 in preference to the other keyword input (the sound pickup device 10), which improves the accuracy of sound source estimation.
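A minimal sketch of the selection switch 90 follows. The source does not say how speech on the priority microphone is detected, so the RMS energy threshold, its default value, and the function names are assumptions.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame of samples (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def select_input(array_frame, priority_frame, threshold=0.01):
    """Pass the priority microphone through whenever it carries speech.

    threshold is a hypothetical energy level above which the priority
    microphone is treated as active; otherwise the array output is used.
    """
    if rms(priority_frame) > threshold:
        return priority_frame
    return array_frame
```

Under this sketch the facilitator's microphone wins whenever it is active, regardless of how loud the array output is.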

[0056]

[Effects of the invention] As described above, the sound pickup device of the present invention, which designates the target sound source region by keyword detection, identifies the sound source from a keyword and controls directivity without performing a pickup scan. It can therefore pick up only the target sound at a high S/N ratio, and is particularly effective in noisy environments.

[0057] By incorporating, as the keyword input means, a speech recognition unit that recognizes the picked-up speech signal, directivity control can be performed automatically in real time without the burden of manual input.

[0058] Further, when the device is combined with other devices, a variety of functions can be expected; for example, linking it to an imaging device such as a camera makes it possible to track and film the sound source automatically.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of the signal processing unit of a target sound source area designating sound pickup device by keyword detection according to a first embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of the signal processing unit of a target sound source area designating sound pickup device by keyword detection according to a second embodiment of the present invention.

FIG. 3 is a flowchart for explaining the operation of the voice recognition / keyword detection unit of the device of the present invention.

FIG. 4 is a flowchart detailing processing 1 in the flowchart of FIG. 3, performed when the keyword is a speaker specific word.

FIG. 5 is a flowchart detailing processing 2 in the flowchart of FIG. 3, performed when the keyword is a speaker identification cancellation instruction word.

FIG. 6 is a flowchart detailing processing 3 in the flowchart of FIG. 3, performed when the keyword is a speaker change promotion word.

FIG. 7 is a flowchart for explaining the operation of the control timing instruction unit of the device of the present invention.

FIG. 8 is a flowchart for explaining the operation of the directivity control unit of the device of the present invention.

[Explanation of symbols]

10 sound pickup unit
11 microphone array
12 microphone array main unit
20 control timing instruction unit
30 directivity control unit
40 sound source region selection unit
50 keyword-sound source region comparison table
60 voice recognition / keyword detection unit
70 keyword classification list table
80 priority microphone
90 voice selection switch

Claims

[Claims]

1. A sound pickup device comprising: a sound pickup unit having directivity; means for controlling the directivity of the sound pickup unit; means for selecting a target sound source region for sound pickup; and keyword input means, wherein a keyword and a sound source region are associated in advance, and when the keyword is input from the keyword input means, the associated sound source region is selected and the directivity of the sound pickup unit is controlled in accordance with the selected region.

2. The sound pickup device according to claim 1, wherein the keyword input means comprises: means for recognizing a voice signal obtained from the sound pickup unit to obtain voice information; and means for detecting a keyword from the voice information.

3. The sound pickup device according to claim 1, wherein the keyword input from the keyword input means is an identifier for identifying a sound source.

4. The sound pickup device according to claim 1, further comprising means for detecting a silent state from an audio signal obtained from the sound pickup unit, wherein the directivity control of the sound pickup unit is performed in synchronization with the silent state continuing for a predetermined time or longer.

5. A sound pickup device in which the directivity of a sound pickup unit having directivity is controlled toward a target sound source region, the device comprising: keyword detection means for detecting, from voice input from the sound pickup unit, a keyword specifying the next target sound source; a keyword classification list storing data indicating a change of the target sound source; a keyword-sound source region comparison table storing data in which target sound sources and sound source regions are associated in advance; sound source region selection means for selecting a sound source region by referring to the keyword-sound source region comparison table using the data on the target sound source detected by the keyword detection means; control timing instructing means for outputting a control start instruction signal that indicates the timing at which to start control shifting the directivity of the sound pickup unit to the selected sound source region; and directivity control means for controlling the directivity of the sound pickup unit in accordance with a sound source region selection signal and the control start instruction signal.

6. The sound pickup device according to claim 5, wherein the control timing instructing means includes means for detecting a silent state from an audio signal obtained from the sound pickup unit, and outputs the control start instruction signal in synchronization with the silent state continuing for a predetermined time or longer.
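The silence-synchronized timing recited in claims 4 and 6 can be sketched as a counter of consecutive silent frames. This is a minimal illustration under assumptions: the class name `ControlTimingInstructor`, the frame-based RMS silence test, the threshold, and the frame count standing in for the "predetermined time" are all hypothetical and are not taken from the source.

```python
from typing import Sequence


class ControlTimingInstructor:
    """Issues a control start instruction once silence has lasted long enough."""

    def __init__(self, silence_threshold: float = 0.01,
                 required_silent_frames: int = 3) -> None:
        self.silence_threshold = silence_threshold            # assumed RMS level
        self.required_silent_frames = required_silent_frames  # "predetermined time"
        self._silent_run = 0

    def update(self, frame: Sequence[float], region_pending: bool) -> bool:
        """Process one audio frame.

        region_pending: True while a newly selected region awaits steering.
        Returns True when the control start instruction should be output.
        """
        rms = (sum(x * x for x in frame) / len(frame)) ** 0.5 if frame else 0.0
        if rms < self.silence_threshold:
            self._silent_run += 1
        else:
            self._silent_run = 0  # any sound restarts the silence count

        if region_pending and self._silent_run >= self.required_silent_frames:
            self._silent_run = 0
            return True
        return False
```

Deferring the switch until a pause means the directivity is not moved away while the current speaker is still talking.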