JP2017208682A

JP2017208682A - Target sound collection device, target sound collection method, program, and recording media

Info

Publication number: JP2017208682A
Application number: JP2016099334A
Authority: JP
Inventors: Kazunori Kobayashi; Shoichiro Saito; Hiroaki Ito
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2016-05-18
Filing date: 2016-05-18
Publication date: 2017-11-24
Anticipated expiration: 2036-05-18
Also published as: JP6538002B2

Abstract

PROBLEM TO BE SOLVED: To provide a target sound collection device that can estimate the direction of unnecessary sound and perform directional sound collection with a direction other than the unnecessary sound direction as the sound collection direction.
SOLUTION: A target sound collection device includes: a direction estimation unit that estimates the direction of a sound source on the basis of a plurality of acoustic signals collected from a plurality of microphones; a sound generation frequency measurement unit that measures, for each direction, how frequently that direction is estimated as the sound source direction; an unnecessary sound direction estimation unit that estimates a direction as an unnecessary sound direction when the result of comparing the frequency for that direction with a predetermined threshold satisfies a predetermined condition; a sound collection direction determination unit that, when a direction different from the unnecessary sound direction is estimated as the direction of a sound source, determines that sound source direction as the sound collection direction; and a directional sound collection unit that emphasizes and collects sound in the determined sound collection direction.
SELECTED DRAWING: Figure 3

Description

The present invention relates to a target sound collection device, a target sound collection method, a program, and a recording medium for use in sound collection technology that employs a plurality of microphones.

FIG. 1 and FIG. 2 schematically show the configuration and operation of the target sound collection device 9 of Patent Document 1. As shown in FIG. 1, the target sound collection device 9 of Patent Document 1 includes a direction estimation unit 91, a sound collection direction control unit 92, a directional sound collection unit 93, and a storage unit 94, and the sound collection direction control unit 92 includes a desired direction setting unit 922 and a sound collection direction determination unit 923. The direction estimation unit 91 estimates the direction of a sound source based on a plurality of acoustic signals collected from a plurality of microphones 8-1, ..., 8-N (N is an integer of 2 or more) (S91). The direction estimation unit 91 estimates the direction of the sound source using the time differences and amplitude differences that arise between the microphones 8-1, ..., 8-N as cues. The desired direction setting unit 922 sets in advance a direction in which sound collection is desired (desired direction) or an angle range in which sound collection is desired (desired angle range) (S922). The sound collection direction determination unit 923 determines the direction of the sound source as the sound collection direction when the direction estimated in step S91 matches the preset desired direction (or desired angle range) (S923). The directional sound collection unit 93 emphasizes the sound in the sound collection direction determined in step S923 and performs directional sound collection (S93). Step S93 can be realized by a method such as the one disclosed in Patent Document 2. The acoustic signal collected in step S93 may be stored in the storage unit 94 as shown in FIG. 1, or may be output to the outside of the device.
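As an illustration of how a time difference between microphones can serve as a cue for direction, the following is a minimal sketch and not the method of Patent Document 1. It assumes a single pair of microphones, a far-field source, and a delay that has already been measured; the function name `direction_from_delay`, the parameter names, and the speed of sound of 343 m/s are assumptions introduced here.

```python
import math

def direction_from_delay(delay_sec, mic_distance_m, speed_of_sound=343.0):
    """Far-field angle of arrival in degrees for a two-microphone pair.

    0 degrees means the source is broadside (equal distance to both
    microphones); +/-90 degrees means it lies on the microphone axis.
    """
    # Path-length difference divided by microphone spacing gives sin(angle).
    ratio = speed_of_sound * delay_sec / mic_distance_m
    # Clamp to [-1, 1] so measurement noise cannot push asin out of range.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

For example, with microphones 0.1 m apart, a delay of 0.05 / 343 seconds corresponds to an angle of about 30 degrees. A single pair cannot distinguish front from back, which is one reason practical devices use more than two microphones.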

Patent Document 1: JP 2005-64968 A
Patent Document 2: JP 2009-44588 A

The method of Patent Document 1 is effective when the direction or angle range in which sound is to be collected is known in advance. When it is not known in advance, however, the method treats all directions as sound collection directions. In that case, if there is an object or a person nearby that emits unnecessary sound, that sound is also collected as if it were wanted. For example, when the method of Patent Document 1 is applied to a robot that converses using speech recognition, or to a remote controller that operates equipment using speech recognition, speech recognition may be triggered by the unnecessary sound, and the robot or remote controller may malfunction.

An object of the present invention is therefore to provide a target sound collection device that can estimate the direction of unnecessary sound and perform directional sound collection with a direction other than the unnecessary sound direction as the sound collection direction.

The target sound collection device of the present invention includes a direction estimation unit, a sound generation frequency measurement unit, an unnecessary sound direction estimation unit, a sound collection direction determination unit, and a directional sound collection unit.

The direction estimation unit estimates the direction of a sound source based on a plurality of acoustic signals collected from a plurality of microphones. The sound generation frequency measurement unit measures, for each direction, how frequently that direction is estimated as the direction of a sound source. The unnecessary sound direction estimation unit estimates a direction as an unnecessary sound direction when the result of comparing the frequency for that direction with a predetermined threshold satisfies a predetermined condition. When a direction different from the unnecessary sound direction is estimated as the direction of a sound source, the sound collection direction determination unit determines that sound source direction as the sound collection direction. The directional sound collection unit emphasizes and collects the sound in the determined sound collection direction.

According to the target sound collection device of the present invention, the direction of unnecessary sound can be estimated, and directional sound collection can be performed with a direction other than the unnecessary sound direction as the sound collection direction.

FIG. 1 is a block diagram showing the configuration of a prior-art target sound collection device.
FIG. 2 is a flowchart showing the operation of the prior-art target sound collection device.
FIG. 3 is a block diagram showing the configuration of the target sound collection device of Embodiment 1.
FIG. 4 is a flowchart showing the operation of the target sound collection device of Embodiment 1.
FIG. 5 is a block diagram showing the configuration of the target sound collection device of Embodiment 2.
FIG. 6 is a flowchart showing the synthesized speech generation and playback operation of the target sound collection device of Embodiment 2.
FIG. 7 is a flowchart showing the sound collection direction control operation of the target sound collection device of Embodiment 2.
FIG. 8 is a block diagram showing the configuration of the target sound collection device of Modification 1.
FIG. 9 is a flowchart showing the utterance detection operation of the target sound collection device of Modification 1.
FIG. 10 is a flowchart showing the sound collection direction control operation of the target sound collection device of Modification 1.
FIG. 11 is a block diagram showing the configuration of the target sound collection device of Embodiment 3.
FIG. 12 is a flowchart showing the operation of the target sound collection device of Embodiment 3.
FIG. 13 is a block diagram showing the configuration of the target sound collection device of Modification 2.
FIG. 14 is a flowchart showing the operation of the target sound collection device of Modification 2.

Embodiments of the present invention are described in detail below. Components having the same function are given the same reference number, and duplicate description is omitted.

The configuration and operation of the target sound collection device of Embodiment 1 are described below with reference to FIG. 3 and FIG. 4. As shown in FIG. 3, the target sound collection device 1 of this embodiment includes a direction estimation unit 91, a sound collection direction control unit 12, a directional sound collection unit 93, and a storage unit 94, and the sound collection direction control unit 12 includes a sound generation frequency measurement unit 121, an unnecessary sound direction estimation unit 122, and a sound collection direction determination unit 123. The direction estimation unit 91, the directional sound collection unit 93, and the storage unit 94 have the same functions as the components with the same names and numbers in the target sound collection device 9 of Patent Document 1, so their description is omitted.

The sound generation frequency measurement unit 121 measures, for each direction, how frequently that direction is estimated as the direction of a sound source (S121). In other words, it measures from which directions, and how often, sound was generated within a fixed period. Whether sound was generated can be known from the output of the direction estimation unit 91. For example, if the total time during the past T seconds for which the direction estimated by the direction estimation unit 91 was θ is A(θ) seconds, the sound generation frequency for direction θ can be obtained as the ratio D(θ) = A(θ) / T. The sound generation frequency measurement unit 121 obtains this frequency for every direction. Suppose, for example, that the noise source is a television or a loudspeaker for listening to music. Such a source is almost never silent over long periods, so sound keeps arriving at the microphones 8-1, ..., 8-N from the same direction. When such a noise source lies in direction θ, the sound generation frequency D(θ) takes a large value close to 1.
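The ratio D(θ) = A(θ) / T can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the patent: it assumes the direction estimation unit produces one estimate per fixed-length frame (an angle in degrees, or `None` when no sound source is detected) and that directions are quantized into bins. The names `sound_generation_frequency`, `frame_sec`, and `bin_deg` are hypothetical.

```python
from collections import Counter

def sound_generation_frequency(estimates, frame_sec=0.1, bin_deg=10):
    """Return {direction_bin_deg: D} with D(theta) = A(theta) / T.

    estimates: per-frame direction estimates covering the past T seconds,
    each an angle in degrees, or None when no sound source was detected.
    """
    total_sec = len(estimates) * frame_sec  # T
    if total_sec == 0:
        return {}
    # Count frames per quantized direction, ignoring frames with no source.
    counts = Counter(
        int(angle // bin_deg) * bin_deg % 360
        for angle in estimates
        if angle is not None
    )
    # A(theta) is the number of frames in the bin times the frame length.
    return {theta: n * frame_sec / total_sec for theta, n in counts.items()}
```

With nine frames estimated at 90 degrees and one frame with no source, the function reports a frequency of about 0.9 for the 90-degree bin, which is the kind of value the text associates with a continuously sounding noise source.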

The unnecessary sound direction estimation unit 122 estimates a direction as an unnecessary sound direction when the result of comparing the frequency for that direction with a predetermined threshold satisfies a predetermined condition (S122). For example, when the sound generation frequency D(θ) described above exceeds a preset threshold E (0 ≤ E ≤ 1), the unnecessary sound direction estimation unit 122 sets that direction as an unnecessary sound direction. It performs the same estimation for all directions and estimates one direction or a plurality of directions as unnecessary sound directions. The unnecessary sound direction estimation unit 122 may also treat as unnecessary sound directions all directions within a predetermined angle range determined from a direction θ_N that has been set as an unnecessary sound direction (for example, all directions in the range θ_N − Δθ to θ_N + Δθ). Here Δθ is a preset width for the unnecessary sound direction and is chosen based on the accuracy of direction estimation. For example, when the direction estimation accuracy is 10 degrees, setting Δθ to 10 degrees or more prevents unnecessary sound from being judged as wanted sound because of direction estimation error.
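The threshold comparison and the widening by Δθ can be sketched as follows. This is a hedged illustration, not the patent's implementation: it assumes directions are quantized into bins on a 360-degree circle and that the frequencies are given as a dictionary from bin angle to D(θ). The function name `unnecessary_directions` and the default values for `threshold` (E), `delta_deg` (Δθ), and `bin_deg` are hypothetical.

```python
def unnecessary_directions(freq, threshold=0.8, delta_deg=10, bin_deg=10):
    """Return the set of direction bins (degrees) treated as unnecessary sound.

    freq: {direction_bin_deg: D(theta)} as a sound generation frequency map.
    """
    blocked = set()
    steps = delta_deg // bin_deg  # number of neighboring bins on each side
    for theta, d in freq.items():
        if d > threshold:  # D(theta) exceeds the threshold E
            # Cover theta_N - delta .. theta_N + delta, wrapping at 360 degrees.
            for k in range(-steps, steps + 1):
                blocked.add((theta + k * bin_deg) % 360)
    return blocked
```

Called with `{0: 0.9, 180: 0.1}` and the defaults above, only the 0-degree bin exceeds the threshold, and the widened result is the bins at 350, 0, and 10 degrees.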

When a direction different from the unnecessary sound direction is estimated as the direction of a sound source, the sound collection direction determination unit 123 determines that sound source direction as the sound collection direction (S123).
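Step S123 amounts to a membership check against the set of unnecessary sound directions. The following is a minimal sketch under the assumption that unnecessary directions are held as a set of quantized bin angles in degrees; the function name `decide_collection_direction` and the `bin_deg` parameter are hypothetical.

```python
def decide_collection_direction(estimated_deg, blocked, bin_deg=10):
    """Return the direction to collect from, or None to collect nothing.

    estimated_deg: current output of the direction estimator (degrees or None).
    blocked: set of direction bins (degrees) estimated as unnecessary sound.
    """
    if estimated_deg is None:
        return None  # no sound source detected in this frame
    theta = int(estimated_deg // bin_deg) * bin_deg % 360
    # Only a source outside every unnecessary direction becomes the target.
    return None if theta in blocked else estimated_deg
```

The returned angle would then be handed to the directional sound collection stage; a `None` result means the current estimate is not adopted as a sound collection direction.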

The target sound collection device 1 of this embodiment estimates a sound source that emits sound continuously to be unnecessary sound and treats other sound sources as target sound. Even when a noise source such as a television or a loudspeaker for listening to music is present, it can therefore identify the target sound appropriately and collect it with emphasis.

The configuration and operation of the target sound collection device of Embodiment 2 are described below with reference to FIG. 5, FIG. 6, and FIG. 7. As shown in FIG. 5, the target sound collection device 2 of this embodiment includes a direction estimation unit 91, a sound collection direction control unit 22, a directional sound collection unit 93, a storage unit 94, an utterance control unit 24, and a speech synthesis unit 25, and the sound collection direction control unit 22 includes a sound generation frequency measurement unit 221, an unnecessary sound direction estimation unit 122, and a sound collection direction determination unit 123. Components other than the sound generation frequency measurement unit 221, the utterance control unit 24, and the speech synthesis unit 25 have the same functions as the components with the same names and numbers in the target sound collection device 1 of Embodiment 1, so their description is omitted. The target sound collection device 2 of this embodiment is assumed to be a device that converses with a user, and it provides utterance control and speech synthesis functions through the utterance control unit 24 and the speech synthesis unit 25. The utterance control unit 24 controls utterances (S24). The speech synthesis unit 25 generates and plays back synthesized speech (S25). The sound generation frequency measurement unit 221 of this embodiment measures the frequency only during the time in which synthesized speech is being played back (S221).

While synthesized speech is being played back, the user, as the conversation partner, is expected to do little more than give brief acknowledgments and is unlikely to speak frequently. If the sound generation frequency during playback of synthesized speech is high, the sound source is therefore likely to be a noise source rather than the user. Based on this assumption, the target sound collection device 2 of this embodiment achieves the same effect as Embodiment 1 even in a device that converses with a user.
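Restricting the measurement to playback time can be sketched as follows. This is an illustration under assumptions, not the patent's implementation: each frame is assumed to carry the direction estimate together with a flag saying whether synthesized speech was being played back, and the denominator is taken to be the number of playback frames in the observation window. The name `gated_frequency` and the `bin_deg` parameter are hypothetical.

```python
def gated_frequency(frames, bin_deg=10):
    """Sound generation frequency measured only while playback is active.

    frames: iterable of (angle_or_None, playing) pairs, where playing is True
    while synthesized speech is being played back.
    Returns {direction_bin_deg: fraction of playback frames in that bin}.
    """
    # Keep only the estimates made during playback.
    gated = [angle for angle, playing in frames if playing]
    if not gated:
        return {}
    counts = {}
    for angle in gated:
        if angle is not None:
            theta = int(angle // bin_deg) * bin_deg % 360
            counts[theta] = counts.get(theta, 0) + 1
    return {theta: n / len(gated) for theta, n in counts.items()}
```

A source that is heard at 30 degrees in three of four playback frames gets a frequency of 0.75, while sound heard only when the device is silent does not contribute at all.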

<Modification 1>
The configuration and operation of the target sound collection device 2a of Modification 1 are described below with reference to FIG. 8, FIG. 9, and FIG. 10. In Modification 1, the utterance control unit 24 and the speech synthesis unit 25 are replaced with an utterance detection unit 26a, and the sound generation frequency measurement unit 221 is replaced with a sound generation frequency measurement unit 221a.

The utterance detection unit 26a detects an utterance from the sound signal for loudspeaker playback (S26a). It compares the level of the sound signal for loudspeaker playback with a preset threshold and, when the level exceeds the threshold, judges that an utterance is present. The sound generation frequency measurement unit 221a measures the frequency only during the time in which an utterance is detected (S221a).
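The level comparison in step S26a can be sketched as follows. The patent only states that the signal level is compared with a preset threshold; the use of a per-frame RMS level in decibels relative to full scale, the function name `detect_utterance`, and the default threshold of -40 dB are assumptions made for this example.

```python
import math

def detect_utterance(samples, threshold_db=-40.0):
    """Return True when the RMS level of one frame exceeds the threshold.

    samples: one frame of the loudspeaker playback signal as floats
    in the range [-1.0, 1.0].
    """
    if not samples:
        return False
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return False  # digital silence has no finite level in dB
    level_db = 20.0 * math.log10(rms)
    return level_db > threshold_db
```

A frame alternating between 0.5 and -0.5 has a level of about -6 dB and is detected, whereas a frame at a constant 0.001 sits at about -60 dB and is not. In practice some hold time or smoothing would usually be added so that short gaps between words do not end the detection.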

The configuration and operation of the target sound collection device of Embodiment 3 are described below with reference to FIG. 11 and FIG. 12. As shown in FIG. 11, the target sound collection device 3 of this embodiment includes a direction estimation unit 91, a sound collection direction control unit 32, a directional sound collection unit 93, a storage unit 94, an utterance control unit 24, and a speech synthesis unit 25, and the sound collection direction control unit 32 includes a sound generation frequency measurement unit 221, an unnecessary sound direction estimation unit 122, a sound collection direction determination unit 323, and a sound generation timing measurement unit 324. Components other than the sound collection direction determination unit 323 and the sound generation timing measurement unit 324 have the same functions as the components with the same names and numbers in the target sound collection device 2 of Embodiment 2, so their description is omitted.

When playback of synthesized speech is paused, the sound generation timing measurement unit 324 measures the time from immediately after playback paused until the direction of a sound source is first estimated (S324).

When the first estimated sound source direction is different from the unnecessary sound direction and the time measured in step S324 satisfies a predetermined condition, the sound collection direction determination unit 323 determines the first estimated sound source direction as the sound collection direction (S323). Specifically, the sound collection direction determination unit 323 determines the estimated direction as the sound collection direction when, in addition to the condition that the direction estimated by the direction estimation unit 91 has not been estimated as an unnecessary sound direction by the unnecessary sound direction estimation unit 122, the time measured by the sound generation timing measurement unit 324 lies between a preset minimum value and a preset maximum value (S323). The minimum and maximum values may each be negative or positive.
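The two conditions of step S323 can be sketched as follows. This is a hedged illustration, not the patent's implementation: it assumes the unnecessary directions are held as a set of quantized bin angles and that the measured time is passed in as seconds. The function name `decide_with_timing` and the default window of -0.5 to 2.0 seconds are hypothetical; treating a negative time as sound that began shortly before playback ended is one possible reading of the statement that the limits may be negative.

```python
def decide_with_timing(estimated_deg, blocked, elapsed_sec,
                       min_sec=-0.5, max_sec=2.0, bin_deg=10):
    """Return the sound collection direction, or None if a condition fails.

    estimated_deg: first direction estimate after playback paused (or None).
    blocked: set of direction bins (degrees) estimated as unnecessary sound.
    elapsed_sec: time from the pause of playback to that first estimate.
    """
    if estimated_deg is None:
        return None
    theta = int(estimated_deg // bin_deg) * bin_deg % 360
    if theta in blocked:
        return None  # condition 1: not an unnecessary sound direction
    if not (min_sec <= elapsed_sec <= max_sec):
        return None  # condition 2: onset within the preset time window
    return estimated_deg
```

A source at 95 degrees that starts 0.4 seconds after playback pauses is accepted, while the same source starting 5 seconds later is rejected even though its direction is not an unnecessary one.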

A user who is conversing with the device can be expected to speak naturally right after playback of the dialogue speech (synthesized speech) has finished. Based on this assumption, when playback of the dialogue speech (synthesized speech) is paused, the target sound collection device 3 of this embodiment can determine whether a sound source is the user by checking whether sound generation began within a short time immediately after playback paused.

<Modification 2>
The configuration and operation of the target sound collection device 3a of Modification 2 are described below with reference to FIG. 13 and FIG. 14. In Modification 2, the utterance control unit 24 and the speech synthesis unit 25 are replaced with the utterance detection unit 26a, the sound generation frequency measurement unit 221 is replaced with the sound generation frequency measurement unit 221a, and the sound generation timing measurement unit 324 is replaced with a sound generation timing measurement unit 324a. When no utterance is detected, the sound generation timing measurement unit 324a measures the time from immediately after the utterance ceased to be detected until the direction of a sound source is first estimated (S324a). Put differently, the sound generation timing measurement unit 324a measures the time from the moment the output of the utterance detection unit 26a changes from utterance present to utterance absent until the direction of a sound source is first estimated (S324a).
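The measurement in step S324a can be sketched as follows. This is an illustration under assumptions rather than the patent's implementation: the utterance detector output and the direction estimate are assumed to be available once per fixed-length frame, and only non-negative times are measured. The names `time_to_first_estimate` and `frame_sec` are hypothetical.

```python
def time_to_first_estimate(frames, frame_sec=0.1):
    """Time from 'utterance present -> absent' to the first direction estimate.

    frames: list of (utterance_detected, angle_or_None) pairs, one per frame.
    Returns (elapsed_sec, angle), or None if no estimate follows a transition.
    """
    pause_index = None
    prev_detected = False
    for i, (detected, angle) in enumerate(frames):
        if detected:
            pause_index = None  # playback is active again, reset the timer
        elif prev_detected:
            pause_index = i  # detector output just changed to 'absent'
        if pause_index is not None and angle is not None:
            return ((i - pause_index) * frame_sec, angle)
        prev_detected = detected
    return None
```

If the detector output drops to absent at the third frame and the first direction estimate (120 degrees) arrives two frames later, the result is roughly 0.2 seconds together with that angle, which could then be checked against the preset minimum and maximum.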

<Supplementary note>
The device of the present invention has, for example as a single hardware entity, an input unit to which a keyboard or the like can be connected; an output unit to which a liquid crystal display or the like can be connected; a communication unit to which a communication device (for example, a communication cable) capable of communicating with the outside of the hardware entity can be connected; a CPU (Central Processing Unit, which may include cache memory, registers, and the like); RAM and ROM as memory; an external storage device such as a hard disk; and a bus that connects the input unit, output unit, communication unit, CPU, RAM, ROM, and external storage device so that data can be exchanged among them. If necessary, the hardware entity may also be provided with a device (drive) that can read from and write to a recording medium such as a CD-ROM. A general-purpose computer is one example of a physical entity having such hardware resources.

The external storage device of the hardware entity stores the program needed to realize the functions described above and the data needed for the processing of that program. Storage is not limited to the external storage device; for example, the program may be stored in ROM, which is a read-only storage device. Data obtained through the processing of these programs is stored as appropriate in RAM, the external storage device, or the like.

In the hardware entity, each program stored in the external storage device (or ROM or the like) and the data needed for processing each program are read into memory as needed and are interpreted, executed, and processed by the CPU as appropriate. As a result, the CPU realizes the predetermined functions (the components referred to above as "... unit", "... means", and so on).

The present invention is not limited to the embodiments described above and can be modified as appropriate without departing from the spirit of the invention. The processing described in the above embodiments need not be executed in time series in the order described; it may be executed in parallel or individually according to the processing capability of the device that executes it, or as needed.

As already described, when the processing functions of the hardware entity (the device of the present invention) described in the above embodiments are realized by a computer, the processing content of the functions that the hardware entity should have is described by a program. Executing this program on a computer realizes the processing functions of the hardware entity on that computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, a flexible disk, magnetic tape, or the like can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), CD-R (Recordable) / RW (ReWritable), or the like as the optical disc; an MO (Magneto-Optical disc) or the like as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) or the like as the semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers via a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. When executing the processing, the computer reads the program stored in its own recording medium and executes processing according to the program it has read. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it, or it may execute processing according to the received program each time a program is transferred to it from the server computer. The processing described above may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to the computer and the processing functions are realized only through execution instructions and the acquisition of results. The program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).

In this embodiment, the hardware entity is configured by executing a predetermined program on a computer, but at least part of the processing content may instead be realized in hardware.

Claims

A target sound collecting device comprising:
a direction estimation unit that estimates the direction of a sound source based on a plurality of acoustic signals collected from a plurality of microphones;
a sound generation frequency measurement unit that measures, for each direction, the frequency with which that direction is estimated as the direction of the sound source;
an unnecessary sound direction estimation unit that estimates one of the directions as an unnecessary sound direction when a comparison result between the frequency in that direction and a predetermined threshold satisfies a predetermined condition;
a sound collection direction determination unit that, when a direction different from the unnecessary sound direction is estimated as the direction of the sound source, determines the direction of the sound source that is different from the unnecessary sound direction as a sound collection direction; and
a directional sound collection unit that emphasizes and collects the sound in the determined sound collection direction.

The target sound collecting device according to claim 1, further comprising
a speech synthesis unit that generates and plays back synthesized speech,
wherein the sound generation frequency measurement unit
measures the frequency only during the time in which the synthesized speech is being played back.

The target sound collecting device according to claim 1, further comprising
an utterance detection unit that detects an utterance from a predetermined sound signal,
wherein the sound generation frequency measurement unit
measures the frequency only during the time in which the utterance is detected.

The target sound collecting device according to claim 2, further comprising
a sound generation timing measurement unit that, when playback of the synthesized speech is paused, measures the time from immediately after playback of the synthesized speech paused until the direction of the sound source is first estimated,
wherein the sound collection direction determination unit
determines the first estimated sound source direction as the sound collection direction when the first estimated sound source direction is different from the unnecessary sound direction and the measured time satisfies a predetermined condition.

The target sound collecting device according to claim 3, further comprising
a sound generation timing measurement unit that, when the utterance is not detected, measures the time from immediately after the utterance ceased to be detected until the direction of the sound source is first estimated,
wherein the sound collection direction determination unit
determines the first estimated sound source direction as the sound collection direction when the first estimated sound source direction is different from the unnecessary sound direction and the measured time satisfies a predetermined condition.

A target sound collection method executed by a target sound collecting device, the method comprising:
a step of estimating the direction of a sound source based on a plurality of acoustic signals collected from a plurality of microphones;
a step of measuring, for each direction, the frequency with which that direction is estimated as the direction of the sound source;
a step of estimating one of the directions as an unnecessary sound direction when a comparison result between the frequency in that direction and a predetermined threshold satisfies a predetermined condition;
a step of determining, when a direction different from the unnecessary sound direction is estimated as the direction of the sound source, the direction of the sound source that is different from the unnecessary sound direction as a sound collection direction; and
a step of emphasizing and collecting the sound in the determined sound collection direction.

A program for causing a computer to function as the target sound collecting device according to any one of claims 1 to 5.

A computer-readable recording medium on which the program according to claim 7 is recorded.