JP2020039048A

JP2020039048A - Voice collecting device and voice collecting method

Info

Publication number: JP2020039048A
Application number: JP2018165139A
Authority: JP
Inventors: Tsutomu Nakato (中戸 勉); Takashi Ito (伊藤 隆志)
Original assignee: Denso Ten Ltd
Current assignee: Denso Ten Ltd
Priority date: 2018-09-04
Filing date: 2018-09-04
Publication date: 2020-03-12

Abstract

PROBLEM TO BE SOLVED: To provide a voice collecting device and a voice collecting method that can select a microphone appropriately even in an environment that is easily affected by reflected sound.

SOLUTION: A voice collecting device includes: a voice acquisition unit 241 that acquires voice signals from a plurality of microphones; a first microphone identification unit 242 that identifies a microphone on the basis of the strength and/or arrival time of the voice signals obtained from the microphones; a user identification unit 243 that identifies a user corresponding to the voice signals obtained from the microphones; a second microphone identification unit 244 that identifies the microphone associated with the identified user in a user table that associates one of the plurality of microphones with each user; and a microphone selection unit 245 that, when a microphone is identified by the second microphone identification unit and the microphone identified by the first microphone identification unit does not match it, selects the microphone identified by the second microphone identification unit as the microphone used for voice recognition processing.

SELECTED DRAWING: Figure 4

Description

The present invention relates to a voice collecting device and a voice collecting method.

In recent years, information processing apparatuses have come into use that collect speech uttered by a user, identify a command from the collected speech, and execute the process corresponding to that command. With such a voice-input-capable information processing apparatus, the user can have a desired process executed without operating an input device such as a touch panel, keyboard, or mouse. For example, a user driving a vehicle can speak to a voice-input-capable car navigation system mounted in the vehicle and have it execute a desired process, such as setting a destination, without interrupting the driving operation.

Speech uttered by the user is picked up by a microphone, converted into a voice signal, and input to a voice recognition device. For the voice recognition device to execute voice recognition processing (processing that identifies a command from the input voice signal) properly, it is desirable to obtain a voice signal that faithfully corresponds to the speech uttered by the user.

To obtain a voice signal that faithfully corresponds to the user's speech, the speech may be picked up by a plurality of microphones having different directional characteristics. In that case, the voice signals obtained by the microphones are sorted and selected before being used for voice recognition processing.

Patent Document 1 describes an in-vehicle voice recognition device that detects the direction of a speaker, detects the speaker's voice, adjusts the directional characteristics so that they are enhanced in the direction of the speaker, and recognizes the adjusted voice. Such an in-vehicle voice recognition device can reduce noise other than the speaker's voice and improve the voice recognition rate.

Patent Document 1: Japanese Unexamined Patent Application Publication No. H11-219193 (JP-A-11-219193)

The voice recognition device described in Patent Document 1 adjusts the gain of each microphone according to the direction of the speaker, which it calculates from the signal levels and delay times at the outputs of the plurality of microphones.

A microphone may, however, pick up a mixture of sound that reaches it directly (direct sound) and sound that reaches it after being reflected by surrounding objects such as wall surfaces (reflected sound). The influence of reflected sound is particularly strong in a small enclosed space such as a vehicle cabin. Because reflected sound arrives later than direct sound, its presence in the mix can hinder voice recognition processing based on the direct sound.

The sound picked up by a microphone selected on the basis of signal level and delay time may be affected by reflected sound, in which case the selected microphone is not necessarily the best microphone for voice recognition processing.

Whether a microphone is susceptible to reflected sound tends to be determined by the characteristics of the speaking user's voice. For example, speech by a user with a loud voice is easily affected by reflected sound.

Accordingly, an object of the present invention is to provide a voice collecting device and a voice collecting method that can select a microphone appropriately even in a situation that is easily affected by reflected sound.

A voice collecting device according to the present invention includes: a voice acquisition unit that acquires voice signals from a plurality of microphones; a first microphone identification unit that identifies a microphone based on the strength and/or arrival time of the voice signals acquired from the microphones; a user identification unit that identifies the user corresponding to the voice signals acquired from the microphones; a second microphone identification unit that identifies the microphone associated with the identified user in a user table that associates one of the plurality of microphones with each user; and a microphone selection unit that, when a microphone is identified by the second microphone identification unit and the microphone identified by the first microphone identification unit does not match the microphone identified by the second microphone identification unit, selects the microphone identified by the second microphone identification unit as the microphone to be used for voice recognition processing.

In the voice collecting device according to the present invention, it is preferable that the microphones are arranged so as to be able to acquire voice signals corresponding to sound in the cabin of a vehicle; that the user table associates one of the plurality of microphones with each combination of a user and the output level of guidance voice into the cabin, the noise level in the cabin, the speed of the vehicle, and/or the sealed state of the cabin; and that the second microphone identification unit identifies, in the user table, the microphone associated with the identified user and with the output level of guidance voice into the cabin, the noise level in the cabin, the speed of the vehicle, and/or the sealed state of the cabin.

In the voice collecting device according to the present invention, it is also preferable that the user identification unit identifies, as the user corresponding to the voice signal, the user associated with the identifier of an information terminal wirelessly connected to the vehicle, using a terminal table that associates a user with each information terminal wirelessly connectable to the vehicle.

It is preferable that the voice collecting device according to the present invention further includes a buffer control unit that temporarily stores the acquired voice signal for each of the plurality of microphones and outputs the temporarily stored voice signal acquired by the selected microphone.

It is preferable that the voice collecting device according to the present invention further includes a learning unit that receives the result of voice recognition of the voice signal acquired by the selected microphone and, based on that result, changes the association between the identified user and a microphone in the user table.

A voice collecting method according to the present invention includes: a voice acquisition step of acquiring voice signals from a plurality of microphones; a first microphone identification step of identifying a microphone based on the strength and/or arrival time of the voice signals acquired from the microphones; a user identification step of identifying the user corresponding to the voice signals acquired from the microphones; a second microphone identification step of identifying the microphone associated with the identified user in a user table that associates one of the plurality of microphones with each user; and a microphone selection step of, when a microphone is identified in the second microphone identification step and the microphone identified in the first microphone identification step does not match the microphone identified in the second microphone identification step, selecting the microphone identified in the second microphone identification step as the microphone to be used for voice recognition processing.

According to the voice collecting device of the present invention, a microphone can be selected appropriately according to the speaking user, even in a situation that is easily affected by reflected sound.

According to the voice collecting method of the present invention, a microphone can be selected appropriately according to the speaking user, even in a situation that is easily affected by reflected sound.

FIG. 1 is a diagram illustrating an overview of the operation of the voice collecting device.

FIG. 2 is a schematic hardware diagram of a vehicle equipped with the voice collecting device.

FIG. 3(a) is an example of a terminal table, and FIG. 3(b) is an example of a user table.

FIG. 4 is a schematic diagram showing the configuration of the voice collecting device.

FIG. 5 is a processing flowchart of the voice collecting device.

The voice collecting device and the voice collecting method are described in detail below with reference to the drawings. It should be understood, however, that the present invention is not limited to the drawings or to the embodiments described below.

The voice collecting device of the present invention identifies a microphone based on the strength and/or arrival time of the voice signals acquired from the microphones. As noted above, however, whether a microphone is susceptible to reflection tends to be determined by the characteristics of the user's voice. The voice collecting device therefore stores, for each user, an associated microphone that reflects that user's susceptibility to reflection. It then identifies the user corresponding to the voice signal acquired from the microphones and identifies the microphone associated with that user. When a microphone associated with the user has been identified and it does not match the microphone identified from the strength and/or arrival time of the sound, the voice collecting device selects the microphone associated with the user as the microphone to be used for voice recognition processing. In this way, the voice collecting device can select a microphone appropriately according to the speaking user, even in a situation that is easily affected by reflected sound.

When the speaking user cannot be identified, or when no microphone is stored in association with the user (for example, because the speaker is a new user), the voice collecting device selects the microphone identified from the strength and/or arrival time of the sound. Even for a new user, the learning unit described later can newly store a microphone association that reflects the user's susceptibility to reflection, so that the microphone associated with that user can be identified from the next time onward.
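The selection rule and its fallback can be sketched in a few lines of Python. This is a minimal illustration only: the function name `select_microphone` and the use of string labels such as "3a" for microphones are assumptions made for the example, not part of the disclosure.

```python
from typing import Optional


def select_microphone(first_mic: str, second_mic: Optional[str]) -> str:
    """Pick the microphone whose signal is passed to voice recognition.

    first_mic  -- microphone identified from signal strength / arrival time
    second_mic -- microphone associated with the identified user, or None
                  when the user is unknown or has no stored association
    """
    if second_mic is not None and second_mic != first_mic:
        # The user-specific association overrides the signal-based choice.
        return second_mic
    # No association is available, or both identifications agree.
    return first_mic


print(select_microphone("3a", "3b"))  # user association differs and is used
print(select_microphone("3a", None))  # falls back to the signal-based choice
```

When both identifications agree, either branch would give the same result, so the function simply returns the signal-based microphone.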

FIG. 1 is a diagram illustrating an overview of the operation of the voice collecting device.

The vehicle 1 has a voice collecting device 2, microphones 3a to 3c, and a storage device 5. A user X boards the vehicle 1 carrying a terminal P and speaks. The voice collecting device 2 acquires the utterance of the user X from the microphones 3a to 3c.

Graph G1 plots the input level of the voice signal acquired from the microphone 3a on the vertical axis against time on the horizontal axis. Graphs G2 and G3 likewise show the input levels of the voice signals acquired from the microphones 3b and 3c.

In graph G1, series G11 represents the input level of the direct sound acquired from the microphone 3a, and series G12 represents the input level of the reflected sound acquired from the microphone 3a. Graph G1 shows the direct and reflected sound as separate series to simplify the explanation; in practice, the voice signal is acquired from the microphone 3a with these sounds combined.

The direct sound of the user's utterance reaches the microphone 3a at time t₀, with an input level of L₁. The reflected sound of the utterance reaches the microphone 3a at time t₃, with an input level of L₅.

The direct sound acquired from the microphone 3b, represented by series G21 in graph G2, reaches the microphone 3b at time t₁ with an input level of L₂. The reflected sound acquired from the microphone 3b, represented by series G22 in graph G2, reaches the microphone 3b at time t₄ with an input level of L₆.

The direct sound acquired from the microphone 3c, represented by series G31 in graph G3, reaches the microphone 3c at time t₂ with an input level of L₃. The reflected sound acquired from the microphone 3c, represented by series G32 in graph G3, reaches the microphone 3c at time t₅ with an input level of L₄.

In this way, the voice collecting device 2 identifies a microphone based on the strength and/or arrival time of the voice signal acquired from each microphone. In the example of FIG. 1, based on the strength and arrival time of the acquired voice signals, it identifies the microphone 3a, at which the voice signal is strongest and arrives earliest.

The voice collecting device 2 may also identify a microphone based on either the strength or the arrival time of the acquired voice signals alone. For example, it may identify the microphone at which the voice signal is strongest, or the microphone at which the voice signal arrives earliest.
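As an illustration of this first identification, the following Python sketch picks a microphone from per-microphone measurements. The `Observation` type, the numeric values, and the tie-breaking order (strength first, then arrival time when both criteria are enabled) are assumptions for the example; the description does not specify how the two criteria are combined.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    mic: str             # microphone label, e.g. "3a"
    level: float         # input level of the first arrival (arbitrary units)
    arrival_time: float  # arrival time of the first arrival, in seconds


def identify_first_microphone(observations: List[Observation],
                              use_level: bool = True,
                              use_time: bool = True) -> str:
    """Return the microphone with the strongest and/or earliest signal."""
    if not observations:
        raise ValueError("at least one observation is required")
    if not (use_level or use_time):
        raise ValueError("at least one criterion is required")

    def sort_key(obs: Observation):
        key = []
        if use_level:
            key.append(-obs.level)        # a higher level sorts first
        if use_time:
            key.append(obs.arrival_time)  # an earlier arrival sorts first
        return tuple(key)

    return min(observations, key=sort_key).mic


# Hypothetical values in which 3a is both strongest and earliest, as in FIG. 1.
fig1 = [
    Observation("3a", level=0.9, arrival_time=0.0010),
    Observation("3b", level=0.7, arrival_time=0.0015),
    Observation("3c", level=0.5, arrival_time=0.0020),
]
print(identify_first_microphone(fig1))
```

Passing `use_time=False` or `use_level=False` corresponds to identifying the microphone from strength alone or arrival time alone.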

Meanwhile, the voice collecting device 2 identifies the user corresponding to the voice signal acquired from the microphones. Specifically, the voice collecting device 2 first detects the identifier of the terminal P carried by the user.

The voice collecting device 2 then refers to the terminal table 51 stored in the storage device 5 and identifies the user X as the user corresponding to the terminal P.

The voice collecting device 2 further refers to the user table 52 stored in the storage device 5 and identifies the microphone 3b as the microphone associated with the user X.
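The two lookups can be pictured as a pair of dictionary accesses. The sketch below is illustrative only: representing the terminal table 51 and the user table 52 as Python dictionaries, and the function names, are assumptions made for the example.

```python
from typing import Optional

# Terminal table 51: terminal identifier -> user (example entry from FIG. 1).
TERMINAL_TABLE = {"P": "X"}

# User table 52: user -> associated microphone (example entry from FIG. 1).
USER_TABLE = {"X": "3b"}


def identify_user(terminal_id: Optional[str]) -> Optional[str]:
    """Return the user registered for the connected terminal, if any."""
    if terminal_id is None:
        return None
    return TERMINAL_TABLE.get(terminal_id)


def identify_second_microphone(user: Optional[str]) -> Optional[str]:
    """Return the microphone associated with the user, if any."""
    if user is None:
        return None
    return USER_TABLE.get(user)


user = identify_user("P")
print(user, identify_second_microphone(user))
```

Both functions return `None` when no entry exists, which corresponds to the case where the user cannot be identified or has no stored association.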

Next, the voice collecting device 2 determines whether the microphone identified from the strength and/or arrival time of the voice signal is the same as the microphone associated with the user. In the example of FIG. 1, the microphone identified from the strength and/or arrival time of the voice signal is the microphone 3a, and the microphone corresponding to the identified user X is the microphone 3b, so the voice collecting device 2 determines that the microphones do not match.

Having determined that the microphones do not match, the voice collecting device 2 selects the microphone corresponding to the identified user as the microphone to be used for voice recognition processing. In the example of FIG. 1, it selects the microphone 3b, the microphone corresponding to the identified user.

By selecting a microphone in this way, the voice collecting device can appropriately select the microphone to be used for voice recognition processing according to the speaking user, even in a situation that is easily affected by reflected sound.

FIG. 2 is a schematic hardware diagram of a vehicle equipped with the voice collecting device.

The vehicle 1 includes a voice collecting device 2, microphones 3, a wireless interface 4, a storage device 5, a voice recognition device 6, and a car navigation system 7.

The voice collecting device 2 is an information processing device that acquires voice signals from the plurality of microphones 3 and outputs the voice signal of one selected microphone to the voice recognition device 6. The detailed configuration of the voice collecting device 2 is described later.

The microphones 3 are input devices that output a voice signal, that is, an electrical signal corresponding to sound. They are connected to the voice collecting device 2 and supply voice signals to it. The microphones 3 comprise a plurality of microphones, each mounted at a different location in the passenger compartment of the vehicle.

For example, the microphone 3a is mounted near the center of the upper edge of the windshield, the microphone 3b at the top of the right B-pillar, and the microphone 3c at the top of the left B-pillar.

The wireless interface 4 is an interface used for wireless communication with external devices. It is a device that provides a Bluetooth (registered trademark) interface and, connected to the voice collecting device 2, enables communication with Bluetooth-capable external devices such as smartphones and tablet computers. The wireless interface 4 may instead use a wireless local area network standard such as IEEE 802.11ac or a near-field communication standard such as ISO/IEC 18092.

The storage device 5 is a device that stores data. It is an SSD (Solid State Drive) that uses semiconductor memory as its storage element, and it is connected to the voice collecting device 2 to exchange data with it. The storage device 5 may instead be a hard disk drive.

The storage device 5 stores a terminal table 51 and a user table 52. The details of the terminal table 51 and the user table 52 are described later.

The voice recognition device 6 converts the voice signal output by the voice collecting device 2 into commands for peripheral devices. It is implemented by a processor executing a voice recognition program loaded into main memory. The voice recognition device 6 is connected to the voice collecting device 2 and receives the voice signal from it.

The car navigation system 7 provides route guidance to a destination based on the current position of the vehicle 1. It is connected to the voice recognition device 6 via a local area network and performs operations such as setting a destination or changing the display based on commands from the voice recognition device 6.

FIG. 3(a) is an example of the terminal table, and FIG. 3(b) is an example of the user table.

The terminal table 51 stored in the storage device 5 associates a user with each information terminal that can connect wirelessly to the vehicle 1. For example, the user X is associated with the terminal P in the terminal table 51.

The storage device 5 stores the association between an information terminal and a user in the terminal table 51 according to settings made by the user.

In the user table 52 stored in the storage device 5, one of the plurality of microphones 3 is associated with each user. For example, the microphone 3b is associated with the user X in the user table 52.

In addition to the user, the user table 52 may associate one of the plurality of microphones 3 with each output level of guidance voice into the cabin of the vehicle 1, noise level in the cabin, speed of the vehicle 1, and/or sealed state of the cabin. In that case, for example, a different microphone can be assigned to the user X depending on whether the output level of the guidance voice is at or above a predetermined threshold.
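One way to picture such an extended user table is a lookup keyed on the user together with a condition flag. The sketch below is only an illustration under stated assumptions: the threshold value, the key layout, and the assignment of the microphone 3a to the high-output case are hypothetical and do not come from the description.

```python
from typing import Optional

# Hypothetical threshold for the guidance voice output level (arbitrary units).
GUIDANCE_LEVEL_THRESHOLD = 20

# Extended user table: (user, guidance level at or above threshold) -> microphone.
# The entry for the high-output case is a made-up example.
USER_TABLE = {
    ("X", False): "3b",
    ("X", True): "3a",
}


def identify_second_microphone(user: str, guidance_level: int) -> Optional[str]:
    """Look up the microphone for the user under the current cabin condition."""
    at_or_above = guidance_level >= GUIDANCE_LEVEL_THRESHOLD
    return USER_TABLE.get((user, at_or_above))


print(identify_second_microphone("X", 10))
print(identify_second_microphone("X", 25))
```

The same key layout could be extended with further fields for the noise level, vehicle speed, or sealed state of the cabin.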

The output level of guidance voice into the cabin is the level of the voice that the car navigation system 7 outputs as guidance for the occupants. A user is expected to raise the output level when the guidance voice is hard to hear. Therefore, when the user has raised the output level above normal, the cabin is presumably in a state where the guidance voice is hard to hear. By associating a microphone with each output level of the guidance voice, the voice collecting device 2 can select a microphone suited to a cabin in which the guidance voice is hard to hear.

Similarly, the noise level in the cabin, the speed of the vehicle 1, and the sealed state of the cabin are factors that can affect how the microphones 3 pick up the user's speech. By associating microphones with these factors, the voice collecting device 2 can select a microphone suited to the conditions in the cabin.

The output level of guidance voice into the cabin can be obtained from the car navigation system 7. The noise level in the cabin can be calculated from the voice signals acquired from the microphones 3. The speed of the vehicle 1 can be obtained from the travel control ECU that governs the travel of the vehicle 1. The sealed state of the cabin can be obtained from the opening degree of the door windows.
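The description does not state how the noise level is calculated from the voice signal. As one common possibility, the following Python sketch computes the RMS level of a block of samples in decibels relative to full scale. The function name, the normalization of samples to the range -1.0 to 1.0, and the choice of RMS are all assumptions made for the example.

```python
import math
from typing import Sequence


def noise_level_dbfs(samples: Sequence[float]) -> float:
    """RMS level of normalized samples (-1.0 to 1.0) in dB re full scale."""
    if not samples:
        raise ValueError("at least one sample is required")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return float("-inf")  # digital silence
    return 20.0 * math.log10(rms)


print(noise_level_dbfs([1.0, -1.0, 1.0, -1.0]))  # full-scale square wave
```

In practice such a measurement would presumably be taken while no one is speaking, so that it reflects cabin noise rather than the utterance itself.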

The storage device 5 stores the association between a user and a microphone 3 in the user table 52 according to settings made by the user. The storage device 5 also changes the association between a user and a microphone 3 through the learning process described later.

FIG. 4 is a schematic diagram showing the configuration of the voice collecting device.

The voice collecting device 2 has an input unit 21, an output unit 22, a storage unit 23, and an arithmetic unit 24. The voice collecting device 2 is mounted in the vehicle 1 as an ECU (Electronic Control Unit).

The input unit 21 is a circuit that receives signals from the microphones 3, the wireless interface 4, the storage device 5, and the like. The input unit 21 supplies the received signals to the arithmetic unit 24.

The output unit 22 is a circuit that transmits signals to the storage device 5, the voice recognition device 6, and the like. The output unit 22 transmits the signals supplied from the arithmetic unit 24 to the storage device 5, the voice recognition device 6, and the like.

The storage unit 23 is a non-volatile memory that stores information. It stores the instructions, data, and thresholds used in the computations of the arithmetic unit 24.

The storage unit 23 also has a buffer 231. The buffer 231 temporarily stores the acquired voice signal for each of the plurality of microphones 3.

The arithmetic unit 24 performs computations based on the signals supplied from the input unit 21 and outputs signals to the output unit 22. The arithmetic unit 24 has a processor that performs the computations by executing a predetermined program.

演算部２４は、音声取得部２４１と、第１マイク特定部２４２と、ユーザ特定部２４３と、第２マイク特定部２４４と、マイク選択部２４５と、バッファ制御部２４６と、学習部２４７とを有する。演算部２４が有するこれらの各部は、演算部２４が有するプロセッサ上で実行される機能モジュールである。あるいは、演算部２４が有するこれらの各部は、専用回路により実現されてもよい。 The calculation unit 24 includes a voice acquisition unit 241, a first microphone identification unit 242, a user identification unit 243, a second microphone identification unit 244, a microphone selection unit 245, a buffer control unit 246, and a learning unit 247. Have. Each of these units included in the arithmetic unit 24 is a functional module executed on a processor included in the arithmetic unit 24. Alternatively, these units included in the arithmetic unit 24 may be realized by a dedicated circuit.

The voice acquisition unit 241 acquires voice signals from the plurality of microphones 3.

The first microphone identification unit 242 identifies a microphone based on the intensity and/or arrival time of the voice signals acquired from the microphones 3.
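The following is a minimal sketch of how identification by intensity and arrival time could look. The description does not specify the criterion, so the rule used here (earliest onset above a threshold, with ties broken by the higher RMS level) is an assumption, as are the function names, the threshold value, and the representation of each microphone's signal as a list of samples.

```python
import math
from typing import Dict, Optional, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one microphone's frame (intensity)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def onset_index(samples: Sequence[float], threshold: float) -> Optional[int]:
    """Index of the first sample whose magnitude reaches the threshold
    (a simple stand-in for arrival time); None if it never does."""
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            return i
    return None


def identify_by_signal(
    frames: Dict[str, Sequence[float]], threshold: float = 0.1
) -> Optional[str]:
    """Hypothetical rule: earliest onset wins, higher RMS breaks ties."""
    best_mic = None
    best_key = None
    for mic_id, samples in frames.items():
        onset = onset_index(samples, threshold)
        if onset is None:
            continue  # this microphone did not pick up the utterance
        key = (onset, -rms(samples))
        if best_key is None or key < best_key:
            best_mic, best_key = mic_id, key
    return best_mic
```

A rule of this kind is exactly what reflected sound can mislead, which is why the device also consults the user table 52.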

The user identification unit 243 identifies the user corresponding to the voice signals acquired from the microphones 3.

The second microphone identification unit 244 identifies the microphone associated with the identified user in the user table 52, which associates one of the plurality of microphones 3 with each of a plurality of users.

The user table 52 may also associate one of the plurality of microphones 3 not only with each user but also with each output level of the guidance voice into the cabin of the vehicle 1, each noise level in the cabin, each speed of the vehicle 1, and/or each closed state of the cabin. In that case, the second microphone identification unit 244 may identify the microphone associated with these conditions.
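As an illustration only, a user table keyed by user and by cabin conditions could be looked up as sketched below. The key layout (user, noise band, cabin closed), the use of `None` as a wildcard, and the fallback from a condition-specific entry to a user-only entry are all assumptions; the description does not state how the user table 52 is structured.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key: (user, noise_band, cabin_closed); None means "any".
UserTableKey = Tuple[str, Optional[str], Optional[bool]]


def lookup_microphone(
    user_table: Dict[UserTableKey, str],
    user: str,
    noise_band: Optional[str] = None,
    cabin_closed: Optional[bool] = None,
) -> Optional[str]:
    """Return the microphone associated with the user and conditions.

    Tries the condition-specific entry first, then the user-only entry.
    Returns None when the table holds no matching association.
    """
    for key in ((user, noise_band, cabin_closed), (user, None, None)):
        if key in user_table:
            return user_table[key]
    return None
```

For example, a table holding `("alice", "high", True): "3b"` and `("alice", None, None): "3a"` would yield microphone 3b for that user in a noisy closed cabin and microphone 3a otherwise.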

When the microphone identified by the first microphone identification unit 242 does not match the microphone identified by the second microphone identification unit 244, the microphone selection unit 245 selects the microphone identified by the second microphone identification unit 244 as the microphone to be used for voice recognition processing.

The buffer control unit 246 causes the buffer 231 to temporarily store the acquired voice signals for each of the plurality of microphones 3. The buffer control unit 246 also outputs the voice signal that was acquired by the selected microphone and temporarily stored in the buffer 231.

Because the buffer control unit 246 outputs the temporarily stored voice signal, the voice collecting device 2 can also output to the voice recognition device 6, as a target of voice recognition, what the user uttered before the microphone was selected.
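The buffering behavior can be sketched as one bounded queue per microphone. This is a sketch under assumptions: the class name, the frame representation, the capacity limit, and the choice to discard the other microphones' frames once one microphone is drained are not taken from the description.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Sequence


class MicBuffers:
    """Keeps the most recent frames of every microphone separately."""

    def __init__(self, mic_ids: Iterable[str], max_frames: int = 50) -> None:
        # A deque with maxlen silently drops the oldest frame when full.
        self._buffers: Dict[str, Deque[List[float]]] = {
            mic_id: deque(maxlen=max_frames) for mic_id in mic_ids
        }

    def store(self, mic_id: str, frame: Sequence[float]) -> None:
        """Temporarily store one acquired frame for one microphone."""
        self._buffers[mic_id].append(list(frame))

    def drain(self, selected_mic: str) -> List[List[float]]:
        """Return the frames buffered for the selected microphone,
        including those captured before the selection was made,
        then clear every buffer for the next utterance."""
        frames = list(self._buffers[selected_mic])
        for buf in self._buffers.values():
            buf.clear()
        return frames
```

Because every microphone is buffered from the start of the utterance, the frames handed to the recognizer cover the speech that preceded the selection.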

The learning unit 247 receives the result of voice recognition of the voice signal acquired by the selected microphone and, based on that result, changes the association between the identified user and a microphone in the user table 52.

Because the learning unit 247 changes the associations in the user table 52 based on the voice recognition results, the voice collecting device 2 can select, for each user, a microphone suited to voice recognition.

With the configuration including the units described above, the voice collecting device 2 of the present embodiment can appropriately select the microphone to be used for voice recognition processing according to the user who is speaking, even in situations easily affected by reflected sound.

FIG. 5 is a flowchart of the processing of the voice collecting device.

First, the voice acquisition unit 241 acquires voice signals from the plurality of microphones 3 (step S11). At this time, the buffer control unit 246 causes the buffer 231 to temporarily store the acquired voice signals for each of the plurality of microphones.

Next, the first microphone identification unit 242 identifies a microphone 3 based on the intensity and/or arrival time of the voice signals acquired from the microphones 3 (step S12).

Next, the user identification unit 243 identifies the user corresponding to the voice signals acquired from the microphones 3 (step S13). If the user cannot be identified (step S13: N), the processing of the voice collecting device 2 transitions to step S18, described later.

In step S13, the user identification unit 243 first detects the identifier of a terminal carried by the user. The user identification unit 243 can detect the identifier of the terminal through the wireless interface 4. The user identification unit 243 then identifies the user associated with the detected identifier in the terminal table 51 as the user corresponding to the voice signal.

Because the voice collecting device 2 identifies the user based on the identifier of a terminal that the user normally carries, it can reliably identify the user without requiring any special action from the user.

The user identification unit 243 may instead identify the user based on the output of a weight sensor built into a seat, the output of a fingerprint sensor built into the steering wheel, or the like. In that case, the storage device 5 stores, in place of the terminal table 51, a table that associates these outputs with users.

Next, the second microphone identification unit 244 identifies the microphone associated with the identified user in the user table 52, which associates one of the plurality of microphones 3 with each user (step S14). If no microphone associated with the identified user can be identified (step S14: N), the processing of the voice collecting device 2 transitions to step S18, described later.

The user table 52 may also associate one of the plurality of microphones 3 not only with each user but also with each output level of the guidance voice into the cabin of the vehicle 1, each noise level in the cabin, each speed of the vehicle 1, and/or each closed state of the cabin. In that case, the second microphone identification unit 244 may identify the microphone associated with these conditions in step S14.

Next, the microphone selection unit 245 determines whether the microphone identified in step S12 matches the microphone identified in step S14 (step S15).

If the microphone identified in step S12 is determined to match the microphone identified in step S14 (step S15: Y), the microphone selection unit 245 selects the matching microphone as the microphone to be used for voice recognition processing (step S16).

If the microphone identified in step S12 is determined not to match the microphone identified in step S14 (step S15: N), the microphone selection unit 245 selects the microphone identified in step S14 as the microphone to be used for voice recognition processing (step S17).

If the user cannot be identified in step S13 (step S13: N), or if no microphone associated with the identified user can be identified in step S14 (step S14: N), the microphone selection unit 245 selects the microphone identified in step S12 as the microphone to be used for voice recognition processing (step S18).
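The branching of steps S13 through S18 can be summarized in a short sketch. The function name, the use of plain dictionaries for the terminal table 51 and the user table 52, and the returned step label are illustrative assumptions, not the implementation described in the embodiment.

```python
from typing import Dict, Optional, Tuple


def select_microphone(
    first_mic: str,
    terminal_id: Optional[str],
    terminal_table: Dict[str, str],
    user_table: Dict[str, str],
) -> Tuple[str, str]:
    """Return (selected microphone, step label that made the selection).

    first_mic is the microphone identified in step S12 from the
    intensity and/or arrival time of the voice signals.
    """
    # Step S13: identify the user from the detected terminal identifier.
    user = terminal_table.get(terminal_id) if terminal_id is not None else None
    if user is None:
        return first_mic, "S18"  # S13: N, fall back to the S12 result

    # Step S14: look up the microphone associated with that user.
    table_mic = user_table.get(user)
    if table_mic is None:
        return first_mic, "S18"  # S14: N, fall back to the S12 result

    # Step S15: compare the two candidates.
    if table_mic == first_mic:
        return first_mic, "S16"  # S15: Y, both agree
    return table_mic, "S17"      # S15: N, the user table takes priority
```

The signal-based result from step S12 is used only when it agrees with the user table or when the table gives no answer, so a strong reflection reaching the wrong microphone does not override a known association.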

Next, the buffer control unit 246 causes the voice signal that was acquired by the selected microphone and temporarily stored in the buffer 231 to be output to the voice recognition device 6 for voice recognition processing (step S19).

Next, the learning unit 247 receives from the voice recognition device 6 the voice recognition result for the voice signal acquired by the selected microphone. Based on that result, the learning unit 247 changes the association between the identified user and a microphone in the user table 52 (step S20).

The voice collecting device 2 executes the series of processes described above each time it detects an utterance by the user. The voice collecting device 2 may also be configured to execute step S20 each time the number of executions of steps S11 through S19 reaches a predetermined count.
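The description does not say how the learning unit 247 decides on a new association, so the following is only one possible sketch. It assumes that the recognition result can be reduced to a success flag, that success rates are tracked per user and microphone, and that the table is updated after every `update_every` utterances; the class and parameter names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


class Learner:
    """Updates the user table from accumulated recognition results."""

    def __init__(self, user_table: Dict[str, str], update_every: int = 3) -> None:
        self.user_table = user_table
        self.update_every = update_every
        # (user, mic) -> [successful recognitions, attempts]
        self._stats: DefaultDict[Tuple[str, str], List[int]] = defaultdict(
            lambda: [0, 0]
        )
        self._utterances = 0

    def record(self, user: str, mic: str, recognized: bool) -> None:
        """Record one recognition result (after step S19) and run the
        table update (step S20) once the utterance count is reached."""
        stats = self._stats[(user, mic)]
        stats[0] += int(recognized)
        stats[1] += 1
        self._utterances += 1
        if self._utterances % self.update_every == 0:
            self._update(user)

    def _update(self, user: str) -> None:
        """Associate the user with the microphone that has the highest
        observed success rate so far (assumed criterion)."""
        rates = {
            mic: ok / attempts
            for (u, mic), (ok, attempts) in self._stats.items()
            if u == user and attempts > 0
        }
        if rates:
            self.user_table[user] = max(rates, key=rates.get)
```

Deferring the update to every few utterances keeps a single misrecognition from immediately changing a user's association.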

By executing processing that includes the steps described above, the voice collecting method of the present embodiment makes it possible to appropriately select the microphone to be used for voice recognition processing according to the user who is speaking, even in situations easily affected by reflected sound.

Those skilled in the art will understand that various changes, substitutions, and modifications can be made to the invention without departing from its spirit and scope.

Reference Signs List
1 Vehicle
2 Voice collecting device
241 Voice acquisition unit
242 First microphone identification unit
243 User identification unit
244 Second microphone identification unit
245 Microphone selection unit
246 Buffer control unit
247 Learning unit
3, 3a to 3c Microphones

Claims

A voice collecting device comprising:
a voice acquisition unit that acquires voice signals from a plurality of microphones;
a first microphone identification unit that identifies a microphone based on the intensity and/or arrival time of the voice signals acquired from the microphones;
a user identification unit that identifies a user corresponding to the voice signals acquired from the microphones;
a second microphone identification unit that identifies the microphone associated with the identified user in a user table that associates one of the plurality of microphones with each user; and
a microphone selection unit that, when a microphone is identified by the second microphone identification unit and the microphone identified by the first microphone identification unit does not match the microphone identified by the second microphone identification unit, selects the microphone identified by the second microphone identification unit as the microphone to be used for voice recognition processing.

The voice collecting device according to claim 1, wherein
the microphones are arranged so as to be able to acquire voice signals corresponding to voices in the cabin of a vehicle,
the user table associates one of the plurality of microphones not only with each user but also with each output level of a guidance voice into the cabin, each noise level in the cabin, each speed of the vehicle, and/or each closed state of the cabin, and
the second microphone identification unit identifies, in the user table, the microphone associated with the identified user and with the output level of the guidance voice into the cabin, the noise level in the cabin, the speed of the vehicle, and/or the closed state of the cabin.

The voice collecting device according to claim 2, wherein the user identification unit identifies, as the user corresponding to the voice signal, the user associated with the identifier of an information terminal wirelessly connected to the vehicle in a terminal table that associates a user with each information terminal wirelessly connectable to the vehicle.

The voice collecting device according to any one of claims 1 to 3, further comprising a buffer control unit that temporarily stores the acquired voice signals for each of the plurality of microphones and outputs, for voice recognition, the voice signal that was acquired by the selected microphone and temporarily stored.

The voice collecting device according to claim 4, further comprising a learning unit that receives a result of voice recognition of the voice signal acquired by the selected microphone and, based on the result, changes the association between the identified user and a microphone in the user table.

A voice collecting method comprising:
a voice acquisition step of acquiring voice signals from a plurality of microphones;
a first microphone identification step of identifying a microphone based on the intensity and/or arrival time of the voice signals acquired from the microphones;
a user identification step of identifying a user corresponding to the voice signals acquired from the microphones;
a second microphone identification step of identifying the microphone associated with the identified user in a user table that associates one of the plurality of microphones with each user; and
a microphone selection step of, when a microphone is identified in the second microphone identification step and the microphone identified in the first microphone identification step does not match the microphone identified in the second microphone identification step, selecting the microphone identified in the second microphone identification step as the microphone to be used for voice recognition processing.