JP2021196550A

JP2021196550A - Voice recognition device, voice recognition method, program, and storage medium

Info

Publication number: JP2021196550A
Application number: JP2020104448A
Authority: JP
Inventors: 光憲田中; Mitsunori Tanaka; 裕也関口; Yuya Sekiguchi; 涼小林; Ryo Kobayashi
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2020-06-17
Filing date: 2020-06-17
Publication date: 2021-12-27

Abstract

To improve accuracy of voice recognition when there are multiple devices with voice recognition functions.

SOLUTION: A voice recognition device includes: a voice input unit for inputting a user's voice; a voice recognition unit for recognizing the user's voice input to the voice input unit; and a control unit that performs processing according to the user's voice recognized by the voice recognition unit based on whether or not another voice recognition device is playing another voice.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a voice recognition device, a voice recognition method, a program, and a recording medium.

A technique is known for cases where multiple devices with a voice recognition function, such as home appliances, exist in a limited space. Each device operates independently as that device, but for voice commands from the user the devices perform voice recognition while exchanging information with one another. This avoids misrecognition and the malfunctions it causes, and also allows noise removal and similar functions to be performed, so that appropriate device control becomes possible (see, for example, Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Publication No. 2002-182679

Consider a case where multiple devices having a voice recognition function exist in a space, the device to which the user uttered a voice command is identified based on the positional relationship of the devices, and the identified device performs voice recognition. If the identified device performs voice recognition on the user's voice while another device is playing audio, the accuracy of the voice recognition deteriorates.

The reason is as follows. In a device's own voice recognition processing, the audio played by another device would ideally be removed by comparing two signals and processing based on that comparison: the signal the device receives from its microphone, and the signal obtained when the audio that the other device outputs from its speaker is input to the device's own microphone (the reference signal).

However, in the system of Patent Document 1, for example, the audio that another device outputs from its speaker is analyzed by the information processing unit of that other device, output as noise information, and input to the device itself via the network connection unit. That is, it does not pass through the circuit preceding the other device's speaker, the other device's speaker, the acoustic propagation path from the other device to the device itself, the device's own microphone, or the circuit following that microphone. The system of Patent Document 1 therefore deviates from the ideal reference signal, and the accuracy of voice recognition deteriorates accordingly.

An object of one aspect of the present invention is to improve the accuracy of voice recognition when there are multiple devices having a voice recognition function.

A voice recognition device according to one aspect of the present invention includes: a voice input unit to which a user's voice is input; a voice recognition unit that recognizes the user's voice input to the voice input unit; and a control unit that performs processing according to the user's voice recognized by the voice recognition unit, based on whether or not another voice recognition device is playing other audio.

A voice recognition method according to one aspect of the present invention includes processing in which a user's voice is input, the input user's voice is recognized, and processing according to the user's voice recognized by the voice recognition unit is performed based on whether or not another voice recognition device is playing other audio.

A program according to one aspect of the present invention causes a computer to which a user's voice is input to execute processing in which the input user's voice is recognized and processing according to the user's voice recognized by the voice recognition unit is performed based on whether or not another voice recognition device is playing other audio.

A computer-readable recording medium according to one aspect of the present invention records a program that causes a computer to which a user's voice is input to execute processing in which the input user's voice is recognized and processing according to the user's voice recognized by the voice recognition unit is performed based on whether or not another voice recognition device is playing other audio.

An example of a configuration diagram of a system according to an embodiment.
An example of a configuration diagram of a voice recognition device according to the embodiment.
An example of a flowchart of the voice recognition method (No. 1) according to the embodiment.
An example of a flowchart of the voice recognition method (No. 1) according to the embodiment.
An example of a flowchart of the voice recognition method (No. 1) according to the embodiment.
An example of a flowchart of the voice recognition method (No. 2) according to the embodiment.
An example of a flowchart of the voice recognition method (No. 2) according to the embodiment.
An example of a flowchart of the voice recognition method (No. 2) according to the embodiment.
An example of a configuration diagram of a system according to the embodiment.
An example of a flowchart of the voice recognition method (No. 3) according to the embodiment.
An example of a flowchart of the voice recognition method (No. 3) according to the embodiment.
An example of a flowchart of the voice recognition method (No. 3) according to the embodiment.
An example of a flowchart of the voice recognition method (No. 3) according to the embodiment.

Embodiments are described below with reference to the drawings. In the drawings, identical or equivalent elements are given the same reference numerals, and duplicate descriptions are omitted.

FIG. 1 is an example of a configuration diagram of a system according to the embodiment.

The system 10 includes voice recognition devices 101-i (i = 1 to 3).

Each voice recognition device 101-i recognizes the voice of the user 201 and performs processing according to the voice recognition result. Specifically, for example, the voice recognition device 101-i detects a question uttered by the user 201 and outputs an answer to the question by voice. As another specific example, the voice recognition device 101-i detects an instruction addressed to it in an utterance of the user 201 (for example, turning the power on or off, or raising or lowering the volume) and controls itself according to that instruction (for example, turns the power on or off, or raises or lowers the volume).

The voice recognition devices 101-i are connected to a network 301 such as a LAN (Local Area Network) and can communicate with one another. The voice recognition devices 101-i may additionally be connected to an external network such as a WAN (Wide Area Network).

The voice recognition device 101-i is, for example, a television receiver, a smart speaker, a smartphone, an air conditioner, an audio device, or a computer such as a PC (Personal Computer).

The voice recognition device 101-i can also output audio. For example, it may output an answer to a question from the user 201, the audio of a television broadcast, the audio of a video received via a network, audio read from a recording medium such as a CD (Compact Disc), or audio stored in the voice recognition device 101-i.

The number of voice recognition devices 101-i in the embodiment is an example and is not limiting. Likewise, the positions of the voice recognition devices 101-i and of the user 201 in the embodiment are examples and are not limiting.

FIG. 2 is an example of a configuration diagram of the voice recognition device according to the embodiment.

FIG. 2 is used to describe the configuration of the voice recognition device 101-1. The voice recognition devices 101-2 and 101-3 have the same configuration as the voice recognition device 101-1, so a detailed description of them is omitted.

The voice recognition device 101-1 includes a microphone 111, an echo canceling unit 121, a voice recognition unit 131, a control unit 141, a storage unit 151, a communication unit 161, a voice processing unit 171, and a speaker 181.

The microphone 111 converts the sound input to it (for example, the voice uttered by the user 201 and the audio output from the speaker 181 or from the other voice recognition devices 101-j (j = 2, 3)) into an electric signal, and outputs that electric signal, which represents the input sound (the input audio signal), to the echo canceling unit 121. The microphone 111 is an example of the voice input unit.

The echo canceling unit 121 performs echo cancellation on the input audio signal and outputs the echo-cancelled audio signal to the voice recognition unit 131. Specifically, for example, it performs the echo cancellation on the input audio signal based on the output audio signal supplied by the voice processing unit 171, which corresponds to the audio output from the speaker 181.
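The source does not say which echo cancellation algorithm the echo canceling unit 121 uses. As one possible illustration, the sketch below uses a normalized least-mean-squares (NLMS) adaptive filter, a common technique for this kind of task. The function name `nlms_echo_cancel`, its parameters, and the synthetic test signal are all assumptions made for this example.

```python
import random


def nlms_echo_cancel(mic, reference, taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptively filtered copy of `reference` from `mic`.

    mic:       samples from the microphone (input audio signal)
    reference: samples sent to the speaker (output audio signal)
    Returns the residual signal after echo cancellation.
    """
    weights = [0.0] * taps   # estimated echo path (hypothetical FIR model)
    history = [0.0] * taps   # most recent reference samples, newest first
    residual = []
    for mic_sample, ref_sample in zip(mic, reference):
        history = [ref_sample] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = mic_sample - echo_estimate
        power = sum(x * x for x in history) + eps
        # NLMS update: step size normalized by the reference power.
        weights = [w + step * error * x / power
                   for w, x in zip(weights, history)]
        residual.append(error)
    return residual


# Synthetic demo: the microphone hears only the speaker output,
# attenuated to 60% and delayed by two samples.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.0, 0.0] + [0.6 * r for r in reference[:-2]]
residual = nlms_echo_cancel(mic, reference)
```

In this toy setup the residual shrinks as the filter adapts. A real device would also have to cope with the user's voice being present at the same time, which this sketch does not model.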

The echo canceling unit 121 also calculates the loudness of the sound input to the microphone 111 and outputs volume information indicating the calculated loudness to the control unit 141. Specifically, for example, it calculates a sound pressure level from the input audio signal and outputs volume information indicating that sound pressure level to the control unit 141.
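The level calculation can be sketched as follows. The source only says that a sound pressure level is calculated from the input audio signal; this example assumes a root-mean-square level in decibels relative to an arbitrary reference amplitude. A true sound pressure level would need microphone calibration, which is outside what the source describes. The name `level_db` and the `reference` parameter are hypothetical.

```python
import math


def level_db(samples, reference=1.0):
    """Return the RMS level of `samples` in dB relative to `reference`."""
    if not samples:
        raise ValueError("samples must not be empty")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return float("-inf")  # digital silence has no finite level
    return 20.0 * math.log10(rms / reference)
```

Because every device would compute the value the same way, the results can be compared across devices even without absolute calibration, provided their microphones have similar sensitivity (an assumption).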

The voice recognition unit 131 performs voice recognition on the echo-cancelled audio signal and outputs the voice recognition result to the control unit 141. The voice recognition result is, for example, text data obtained by converting the voice input to the microphone 111 into text through voice recognition.

The control unit 141 performs processing according to the voice that the user 201 utters into the microphone 111, using the voice recognition result for that voice, based on whether or not another voice recognition device is playing audio. The control unit 141 may also perform this processing based on both whether or not the device itself is playing audio and whether or not another voice recognition device is playing audio. The control unit 141 may further perform this processing based on the loudness of a specific word input to the microphone 111, the loudness of the specific word input to the other voice recognition devices 101-j, whether or not the device itself is playing audio, and whether or not another voice recognition device is playing audio.

Specifically, for example, the control unit 141 performs processing according to the question or instruction expressed by the voice input to the microphone 111 (for example, an utterance of the user 201), based on whether or not the device itself is playing audio and whether or not another voice recognition device is playing audio. As a specific example, in response to a question from the user 201 input to the microphone 111, the control unit 141 searches the Internet or the like and outputs an answer to the question based on the search results. As another example, in response to an instruction uttered by the user 201 and input to the microphone 111, and based on whether or not the device itself is playing audio and whether or not another voice recognition device is playing audio, the control unit 141 controls the voice recognition device 101-1 (for example, turns the power on or off, adjusts the volume, or executes a specific function according to the instruction) or performs processing such as sending an instruction to another voice recognition device.

The control unit 141 may also perform processing for outputting audio from the speaker 181. Specifically, for example, it may output to the voice processing unit 171 the audio signal of a video received via the network 301, of audio read from a recording medium such as a CD, or of audio stored in the voice recognition device 101-i.

The storage unit 151 stores programs, data, and the like used by the voice recognition device 101-1. The storage unit 151 is, for example, a RAM (Random Access Memory), an HDD (Hard Disk Drive), or a flash memory.

The communication unit 161 is a communication interface that is connected to the network 301, such as a LAN, and performs the data conversion required for communication. The communication unit 161 communicates with the voice recognition devices 101-2 and 101-3. The communication unit 161 may also communicate with devices other than the voice recognition devices 101-2 and 101-3.

Specifically, for example, the communication unit 161 receives, from another voice recognition device 101-j, playing information indicating that the other voice recognition device 101-j is playing audio from its speaker.

As another specific example, the communication unit 161 receives, from another voice recognition device 101-j, volume information indicating the loudness of the sound input to the microphone of that voice recognition device 101-j. The loudness is, for example, the sound pressure level of the sound.

As a further specific example, the communication unit 161 receives delegation information indicating that the device itself (that is, the voice recognition device 101-1) is entrusted with the processing in response to a spoken question or instruction from the user 201.
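The three kinds of information exchanged over the network 301 (playing information, volume information, and delegation information) could be modeled as small message types. The source does not define a wire format, so the class names, field names, and JSON encoding below are assumptions chosen only to make the idea concrete.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PlayingInfo:
    """Sender is currently playing audio from its speaker."""
    sender_id: str


@dataclass(frozen=True)
class VolumeInfo:
    """Loudness of the sound picked up by the sender's microphone."""
    sender_id: str
    level_db: float


@dataclass(frozen=True)
class DelegationInfo:
    """The target device is entrusted with handling the user's request."""
    target_id: str


# Lookup table used when decoding (hypothetical type tags).
_MESSAGE_TYPES = {cls.__name__: cls
                  for cls in (PlayingInfo, VolumeInfo, DelegationInfo)}


def encode(message):
    """Serialize a message to a JSON string with a type tag."""
    return json.dumps({"type": type(message).__name__,
                       "body": asdict(message)})


def decode(payload):
    """Rebuild a message object from a JSON string made by `encode`."""
    data = json.loads(payload)
    return _MESSAGE_TYPES[data["type"]](**data["body"])
```

For example, `decode(encode(VolumeInfo("101-2", -18.5)))` gives back an equal `VolumeInfo` object, so a receiving device can branch on the message type.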

The voice processing unit 171 outputs the audio signal corresponding to the audio to be output from the speaker 181 (the output audio signal) to the speaker 181 and the echo canceling unit 121. For example, the voice processing unit 171 outputs the audio signal corresponding to input utterance data to the speaker 181 and the echo canceling unit 121. The voice processing unit 171 may also output to the speaker 181 and the echo canceling unit 121 the audio signals input from the control unit 141, a tuner (not shown), or the like: the audio of a television broadcast, the audio of a video received via the network 301, audio read from a recording medium such as a CD, or audio stored in the voice recognition device 101-i.

The speaker 181 converts the output audio signal from the voice processing unit 171 into sound and outputs it.

FIGS. 3A to 3C are an example of a flowchart of the voice recognition method (No. 1) according to the embodiment. FIGS. 3A to 3C are used to describe the processing of the voice recognition device 101-1. The voice recognition devices 101-2 and 101-3 perform the voice recognition method (No. 1) in the same way.

In step S301, the control unit 141 determines whether or not the utterance of the user 201 contains a hot word. If it determines that the utterance contains a hot word, control proceeds to step S302. Here, a hot word (also called a wake word) is a predetermined specific word or phrase that triggers the start of a specific process or function (for example, the processing from step S302 onward in the embodiment). The hot word is stored in advance in, for example, the storage unit 151. The hot word is, for example, "OK ○○" or "Hello ○○" (where ○○ is, for example, the product name of the voice recognition device 101-i or the name of the voice assistant installed on it).

Specifically, for example, the control unit 141 determines whether or not the voice recognition result input from the voice recognition unit 131 contains the hot word. If it determines that the hot word is contained, control proceeds to step S302. If it determines that the hot word is not contained, the control unit 141 waits for the next voice recognition result to be input.
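The check in step S301 can be sketched as a substring test on the recognized text. This is a simplification made for illustration: the function name `contains_hot_word`, the case-insensitive matching, and the example phrase "ok assistant" are assumptions, not details from the source.

```python
def contains_hot_word(recognized_text, hot_words):
    """Return True if any stored hot word occurs in the recognized text.

    recognized_text: text data produced by voice recognition
    hot_words:       phrases stored in advance (e.g. in a storage unit)
    """
    normalized = recognized_text.casefold()
    # Empty strings are skipped so they never count as a match.
    return any(word and word.casefold() in normalized for word in hot_words)
```

A caller would keep feeding recognition results to this function and move on to the volume measurement only when it returns `True`.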

In step S302, the control unit 141 acquires volume information indicating the loudness of the sound input to the microphone 111. Specifically, the echo canceling unit 121 calculates the loudness of the sound input to the microphone 111 and outputs volume information indicating the calculated loudness to the control unit 141, and the control unit 141 acquires that volume information. The loudness is, for example, the sound pressure level of the sound.

In step S303, the control unit 141 determines whether or not the device itself (the voice recognition device 101-1) is playing audio. For example, the control unit 141 makes this determination based on whether or not the voice processing unit 171 is outputting an audio signal and whether or not the control unit 141 is outputting utterance data or an audio signal. If it determines that the device itself is playing audio (step S303: Yes), control proceeds to step S304. If it determines that the device itself is not playing audio (step S303: No), control proceeds to step S306.

In step S304, the control unit 141 causes the communication unit 161 to transmit, to the other voice recognition devices 101-j, playing information indicating that the device itself (the voice recognition device 101-1) is playing audio.

In step S305, the control unit 141 determines whether or not the communication unit 161 has received, from any of the other voice recognition devices 101-j, playing information indicating that the other voice recognition device 101-j is playing audio. If it determines that playing information has been received (step S305: Yes), control proceeds to step S308. If it determines that playing information has not been received (step S305: No), control proceeds to step S311.

In step S306, the control unit 141 determines whether or not the communication unit 161 has received, from any of the other voice recognition devices 101-j, playing information indicating that the other voice recognition device 101-j is playing audio. If it determines that playing information has been received (step S306: Yes), control proceeds to step S307. If it determines that playing information has not been received (step S306: No), control proceeds to step S308.

In step S307, the control unit 141 determines whether or not the communication unit 161 has received playing information from multiple other voice recognition devices 101-j. That is, the control unit 141 determines whether or not it has received multiple pieces of playing information. For example, in the system of FIG. 1, the control unit 141 determines whether or not playing information has been received from both the voice recognition device 101-2 and the voice recognition device 101-3. If it determines that playing information has been received from multiple other voice recognition devices 101-j (step S307: Yes), control proceeds to step S308. If it determines that playing information has not been received from multiple other voice recognition devices 101-j (step S307: No), the processing ends. A "No" determination in step S307 means that playing information has been received from only one other voice recognition device: the device itself (the voice recognition device 101-1) is not playing audio, and that one other voice recognition device is. In this case the device itself does not perform the processing in response to the question or instruction of the user 201 described later. Instead, that one other voice recognition device, which has detected the hot word and is playing audio, performs the processing in response to the question or instruction of the user 201.
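The branching in steps S303 to S307 can be summarized in a single function. This is only a sketch of the decision logic as described above; the function name `next_step`, the string return values, and the representation of received playing information as a list of sender IDs are assumptions made for the example.

```python
def next_step(self_playing, playing_senders):
    """Decide where control goes after steps S303 to S307.

    self_playing:    True if this device is playing audio (S303)
    playing_senders: IDs of other devices whose playing information
                     was received (hypothetical representation)
    Returns "S308" (volume comparison), "S311" (start listening),
    or "END" (leave handling to the one other playing device).
    """
    others_playing = len(set(playing_senders))
    if self_playing:
        # S304 (sending own playing information) happens here, then S305.
        return "S308" if others_playing >= 1 else "S311"
    if others_playing == 0:
        # S306: No. Nobody is playing, so compare volumes.
        return "S308"
    # S307: two or more other devices playing leads to comparison,
    # exactly one leads to the end of processing on this device.
    return "S308" if others_playing >= 2 else "END"
```

Read this way, a device starts listening directly only when it is the sole device playing audio, and it steps aside only when exactly one other device is playing and it is not.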

Before step S308 is described, the master unit in the embodiment is explained. In the embodiment, one of the voice recognition devices 101-1 to 101-3 is designated in advance as the master unit, and the storage unit 151 of that voice recognition device stores information indicating that the device itself is the master unit. The storage unit 151 of each voice recognition device that is not designated as the master unit stores information identifying the voice recognition device that is the master unit. The control unit 141 determines whether or not the device itself is the master unit, for example at step S308 or at some point before it, based on whether or not the storage unit 151 stores information indicating that the device itself is the master unit.

In step S308, if the device itself (the voice recognition device 101-1) is not the master unit, the control unit 141 causes the communication unit 161 to transmit the volume information acquired in step S302 to the master unit. If the device itself is the master unit and other voice recognition devices 101-j are transmitting volume information to it, the communication unit 161 receives the volume information from those voice recognition devices 101-j.

In step S309, if the device itself (the voice recognition device 101-1) is the master unit, the control unit 141 uses the volume information acquired in step S302 and the volume information received from the other voice recognition devices 101-j to compare the loudness of the sound input to each of the voice recognition devices 101-i, and determines which voice recognition device received the loudest sound. It then transmits, to the voice recognition device determined to have received the loudest sound, delegation information indicating that the device is entrusted with the processing in response to the spoken question or instruction from the user 201. The voice recognition device determined to have received the loudest sound receives the delegation information from the master unit. If the device itself is both the master unit and the voice recognition device determined to have received the loudest sound, the control unit 141 may actually transmit the delegation information to itself, or may treat the delegation information as received from the master unit without actually transmitting it.
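The comparison the master unit performs in step S309 amounts to picking the device with the highest reported level. A minimal sketch follows; the function name `choose_delegate` and the dictionary from device ID to level are assumptions. The source does not say how ties are resolved, so this sketch simply keeps the first device with the highest value in the dictionary's insertion order.

```python
def choose_delegate(level_by_device):
    """Return the ID of the device that reported the loudest input.

    level_by_device: mapping of device ID to reported level, including
                     the master unit's own measurement (hypothetical).
    """
    if not level_by_device:
        raise ValueError("no volume information available")
    # max() over the keys, ranked by the level each device reported.
    return max(level_by_device, key=level_by_device.get)
```

For instance, with levels `{"101-1": -30.0, "101-2": -12.5, "101-3": -20.0}` the master unit would send the delegation information to device 101-2.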

In step S310, the control unit 141 determines whether or not delegation information has been received from the master unit. If it determines that delegation information has been received, control proceeds to step S311. If it determines that delegation information has not been received, the processing ends.

In step S311, the control unit 141 starts listening for a spoken question or instruction from the user 201. Specifically, for example, the control unit 141 starts a voice assistant that detects a question or instruction from the user 201 based on the voice recognition result input from the voice recognition unit 131 and, in response to that spoken question or instruction, performs processing such as searching for information or operating the device.

In step S312, the control unit 141 determines whether a spoken question or instruction has been input by the user 201. Specifically, for example, the control unit 141 makes this determination based on the voice recognition result input from the voice recognition unit 131. If a question or instruction has been input, control proceeds to step S313. If not, the control unit 141 waits for a question or instruction to be input.

In step S313, the control unit 141 performs processing in response to the spoken question or instruction from the user 201. For example, the control unit 141 searches for an answer to the question from the user 201 and outputs the text data (utterance data) of the answer to the voice processing unit 171, and the voice processing unit 171 outputs the answer as speech from the speaker 181. As another example, the control unit 141 controls the voice recognition device 101-1 according to the spoken instruction from the user 201 (for example, turning the power on or off, adjusting the volume, or executing a specific function named in the instruction), or issues an instruction to another voice recognition device.

In step S314, the control unit 141 determines whether the spoken questions or instructions from the user 201 have ended. If they have ended, the process ends; if they have not, control returns to step S311.

In the voice recognition method (No. 1) described above, each voice recognition device 101-i that has detected the hot word runs the voice assistant, which performs processing such as searching for information or operating the device in response to the spoken question or instruction of the user 201, based on two conditions: whether the own device is playing audio, and whether it has received playing information indicating that another voice recognition device is playing audio. Depending on these two conditions, each voice recognition device 101-i that has detected the hot word performs processing according to one of the following four patterns (1) to (4).

(1) When the own device is playing audio and has received playing information (step S303: Yes, and step S305: Yes)
When the own device is playing audio and has also received playing information, it can be determined that a plurality of voice recognition devices are playing audio, so voice recognition accuracy is highest when the voice recognition device closest to the user 201 performs the recognition. The devices therefore operate as follows.

After determining that it is playing audio, the own device transmits playing information to the other voice recognition devices.

Then, after receiving playing information, the own device transmits the predetermined volume information. This allows the master unit to collect the volume information of each voice recognition device that is playing audio.

Next, the master unit sends delegation information to the voice recognition device that transmitted the largest value among the collected volume information, and each voice recognition device determines whether it has received this notification from the master unit.

A voice recognition device that has received the delegation information performs processing in response to the spoken question or instruction of the user 201, and a voice recognition device that has not received it ends processing. In this way, the master unit determines which voice recognition device is closest to the user 201, and that device performs the processing in response to the spoken question or instruction of the user 201.

(2) When the own device is playing audio and has not received playing information (step S303: Yes, and step S305: No)
After determining that it is playing audio, the own device transmits playing information to the other voice recognition devices.

In addition, because not having received playing information means that no other voice recognition device is playing audio, the own device performs the processing in response to the spoken question or instruction of the user 201.

In this way, the voice recognition device that is playing audio can perform the processing in response to the spoken question or instruction of the user 201.

(3) When the own device is not playing audio and has received playing information (step S303: No, and step S306: Yes)
When the own device is not playing audio and has received playing information, it can be determined that another voice recognition device is playing audio, so the own device ends processing without performing voice recognition.

If the own device has received playing information from more than one device, a plurality of voice recognition devices are playing audio. In that case, as in "(1) When the own device is playing audio and has received playing information", the own device transmits sound pressure information to the master unit, the master unit determines which voice recognition device is closest to the user 201, and that device performs the processing in response to the spoken question or instruction of the user 201.

(4) When the own device is not playing audio and has not received playing information (step S303: No, and step S306: No)
When the own device is not playing audio and has not received playing information, it can be determined that no voice recognition device is playing audio. In that case, as in "(1) When the own device is playing audio and has received playing information", the own device transmits sound pressure information to the master unit, the master unit sends delegation information to the voice recognition device that transmitted the largest value among the collected volume information, and each voice recognition device determines whether it has received this notification from the master unit. A voice recognition device that has received the delegation information then performs processing in response to the spoken question or instruction of the user 201, and a voice recognition device that has not received it ends processing.

By having each voice recognition device execute the processing corresponding to these four patterns, each device can operate in a manner suited to the situation.
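The four patterns can be summarized as a small decision function. This is a sketch under stated assumptions: the names `Action` and `decide` and the count parameter `playing_notices` (the number of other devices from which playing information was received) are hypothetical and do not come from the source.

```python
from enum import Enum


class Action(Enum):
    ARBITRATE = "send volume information and wait for delegation"
    RESPOND = "handle the user's request directly"
    STAND_DOWN = "end processing without voice recognition"


def decide(self_playing: bool, playing_notices: int) -> Action:
    """Map the two conditions of method No. 1 onto patterns (1) to (4)."""
    if self_playing:
        # Pattern (1): several devices are playing, so arbitrate by volume.
        # Pattern (2): only this device is playing, so it responds itself.
        return Action.ARBITRATE if playing_notices > 0 else Action.RESPOND
    if playing_notices == 0:
        # Pattern (4): nobody is playing, so arbitrate by volume.
        return Action.ARBITRATE
    if playing_notices == 1:
        # Pattern (3): exactly one other device is playing, so stand down.
        return Action.STAND_DOWN
    # Pattern (3), multiple playing devices: arbitrate as in pattern (1).
    return Action.ARBITRATE
```

For instance, `decide(False, 1)` yields `Action.STAND_DOWN`, matching the basic case of pattern (3).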

According to the voice recognition method (No. 1) of the embodiment, voice recognition accuracy can be improved by performing processing in response to the user's voice based on whether the own device is playing audio and whether it has received playing information indicating that another voice recognition device is playing audio. In detail, for example, when another voice recognition device is playing audio, the own device does not process the user's voice using voice recognition, so it does not need to remove the audio played by the other device. Instead, the device that is playing audio removes its own playback from the sound input through its microphone (echo cancellation) and performs voice recognition, which improves accuracy.

The voice recognition device 101-i may also perform the voice recognition method (No. 2) described below instead of the voice recognition method (No. 1).

FIGS. 4A to 4C are an example of a flowchart of the voice recognition method (No. 2) according to the embodiment. FIGS. 4A to 4C are described in terms of the processing of the voice recognition device 101-1. The voice recognition devices 101-2 and 101-3 perform the voice recognition method (No. 2) in the same manner.

In step S401, the control unit 141 determines whether the utterance of the user 201 contains the hot word. Specifically, for example, the control unit 141 determines whether the voice recognition result input from the voice recognition unit 131 contains the hot word. If the hot word is contained, control proceeds to step S402. If it is not, the control unit 141 waits for the next voice recognition result to be input.

In step S402, the control unit 141 acquires volume information indicating the loudness of the voice input to the microphone 111 and transmits the volume information to the other voice recognition devices 101-j. Specifically, the echo canceling unit 121 calculates the loudness of the hot word input to the microphone 111 and outputs volume information indicating the calculated loudness to the control unit 141; the control unit 141 acquires this volume information and transmits it to the other voice recognition devices 101-j. The loudness of the voice is, for example, its sound pressure level.
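As a rough illustration of a loudness measure, the sketch below computes an RMS level in decibels from a block of samples. It is an assumption for illustration only: the source does not say how the echo canceling unit 121 calculates loudness, and a true sound pressure level would additionally require microphone calibration. The function name `level_db` and the reference value are hypothetical.

```python
import math
from typing import Sequence


def level_db(samples: Sequence[float], ref: float = 1.0) -> float:
    """Return the RMS level of `samples` in dB relative to `ref`.

    With ref=1.0 and samples normalized to [-1, 1] this is a level
    relative to full scale, not a calibrated sound pressure level.
    """
    if not samples:
        raise ValueError("empty sample block")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return float("-inf")  # digital silence
    return 20.0 * math.log10(rms / ref)
```

Because every device applies the same measure to the same utterance, only the relative ordering of the values matters for the later comparisons.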

In step S403, the control unit 141 determines whether the own device (voice recognition device 101-1) is playing audio. If it is, the control unit 141 transmits, from the communication unit 161 to the other voice recognition devices 101-j, playing information indicating that the own device (voice recognition device 101-1) is playing audio.

In step S404, the control unit 141 waits for a predetermined time to receive notifications (volume information or playing information) from the other voice recognition devices 101-j. When another voice recognition device 101-j has transmitted volume information or playing information, the control unit 141 receives, via the communication unit 161, the volume information indicating the loudness of the hot word input to that device or the playing information indicating that that device is playing audio.

In step S405, the control unit 141 determines whether a notification (volume information or playing information) was received from another voice recognition device 101-j during the predetermined time of step S404. If a notification was received (step S405: Yes), control proceeds to step S406; if no notification was received (step S405: No), control proceeds to step S411.
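The bounded wait of steps S404 and S405 can be sketched with a thread-safe queue standing in for the communication unit 161. This is only an illustration: the function name `collect_notices`, the use of `queue.Queue` as the inbox, and the tuple message shapes in the usage example are assumptions, not details from the source.

```python
import queue
import time
from typing import Any, List


def collect_notices(inbox: "queue.Queue[Any]", wait_s: float) -> List[Any]:
    """Gather every notice that arrives within `wait_s` seconds (S404)."""
    deadline = time.monotonic() + wait_s
    notices: List[Any] = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            # Block only for the time left in the window.
            notices.append(inbox.get(timeout=remaining))
        except queue.Empty:
            break
    return notices


# Usage: an empty result corresponds to "S405: No".
inbox: "queue.Queue[Any]" = queue.Queue()
inbox.put(("volume", "101-2", 40.0))
inbox.put(("playing", "101-2"))
received = collect_notices(inbox, wait_s=0.05)
```

The function always waits out the full window so that late notices from slower devices are still counted, which mirrors the "waits for a predetermined time" wording.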

In step S406, the control unit 141 determines whether any other voice recognition device 101-j is playing audio. Specifically, for example, the control unit 141 determines that another voice recognition device 101-j is playing audio if it has received playing information from any of the other voice recognition devices 101-j, and determines that none is playing audio if it has received no playing information. If another device is playing audio (step S406: Yes), control proceeds to step S409; if none is (step S406: No), control proceeds to step S407.

In step S407, the control unit 141 determines whether the own device (voice recognition device 101-1) is playing audio. For example, the control unit 141 makes this determination based on whether the voice processing unit 171 is outputting an audio signal and whether the control unit 141 itself is outputting utterance data or an audio signal. If the own device is playing audio (step S407: Yes), control proceeds to step S411; if it is not (step S407: No), control proceeds to step S408.

In step S408, the control unit 141 determines whether the loudness (received volume) of the hot word input to the own device (voice recognition device 101-1) is greater than the loudness of the hot word input to the other voice recognition devices 101-j that are not playing audio. In other words, the control unit 141 determines whether there is any other voice recognition device 101-j, not playing audio, that received the hot word louder than the own device did. Specifically, for example, the control unit 141 makes this determination by comparing the volume information of the own device acquired in step S402 with the volume information received from the other voice recognition devices 101-j in step S404. If the own device's hot word loudness is greater (that is, no other non-playing device received a louder hot word) (step S408: Yes), control proceeds to step S411. If it is not greater (that is, some other non-playing device received a louder hot word) (step S408: No), control returns to step S401.

In step S409, the control unit 141 determines whether the own device (voice recognition device 101-1) is playing audio. For example, the control unit 141 makes this determination based on whether the voice processing unit 171 is outputting an audio signal and whether the control unit 141 itself is outputting utterance data or an audio signal. If the own device is playing audio (step S409: Yes), control proceeds to step S410; if it is not (step S409: No), control returns to step S401.

In step S410, the control unit 141 determines whether the loudness (received volume) of the hot word input to the own device (voice recognition device 101-1) is greater than the loudness of the hot word input to the other voice recognition devices 101-j that are playing audio. In other words, the control unit 141 determines whether there is any other voice recognition device 101-j, playing audio, that received the hot word louder than the own device did. Specifically, for example, the control unit 141 first determines which other voice recognition devices 101-j are playing audio based on whether playing information was received from them; for example, if playing information was received from the voice recognition device 101-2, the control unit 141 determines that the voice recognition device 101-2 is playing audio. The control unit 141 then compares the volume information of the own device acquired in step S402 with the volume information received in step S404 from the devices determined to be playing audio. If the own device's hot word loudness is greater (that is, no other playing device received a louder hot word) (step S410: Yes), control proceeds to step S411. If it is not greater (that is, some other playing device received a louder hot word) (step S410: No), control returns to step S401.
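The branching of steps S405 to S410 can be condensed into one predicate. This is a sketch under assumptions: `PeerNotice` merges the volume information and the playing information from one peer into a single hypothetical record, and `should_respond` returning `True` stands for "proceed to step S411" while `False` stands for "return to step S401".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PeerNotice:
    device_id: str
    volume: float   # loudness of the hot word at that peer
    playing: bool   # True if playing information was received from it


def should_respond(self_volume: float, self_playing: bool,
                   peers: List[PeerNotice]) -> bool:
    if not peers:                         # S405: No
        return True
    playing_peers = [p for p in peers if p.playing]
    if playing_peers:                     # S406: Yes
        if not self_playing:              # S409: No
            return False
        # S410: compare only against peers that are playing audio.
        return all(self_volume > p.volume for p in playing_peers)
    if self_playing:                      # S407: Yes
        return True
    # S408: no peer is playing here, so compare against all of them.
    return all(self_volume > p.volume for p in peers)
```

The comparisons are strict, following the "greater than" wording of steps S408 and S410, so with exactly equal volumes no device would respond; how ties are resolved is not stated in the source.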

In step S411, the control unit 141 starts listening for a spoken question or instruction from the user 201. Specifically, for example, the control unit 141 starts a voice assistant that detects a question or instruction from the user 201 based on the voice recognition result input from the voice recognition unit 131 and, in response to that spoken question or instruction, performs processing such as searching for information or operating the device.

In step S412, the control unit 141 determines whether a spoken question or instruction has been input by the user 201. Specifically, for example, the control unit 141 makes this determination based on the voice recognition result input from the voice recognition unit 131. If a question or instruction has been input, control proceeds to step S413. If not, control proceeds to step S414.

In step S413, the control unit 141 performs processing in response to the spoken question or instruction from the user 201. For example, the control unit 141 searches for an answer to the question from the user 201 and outputs the text data (utterance data) of the answer to the voice processing unit 171, and the voice processing unit 171 outputs the answer as speech from the speaker 181. As another example, the control unit 141 controls the voice recognition device 101-1 according to the spoken instruction from the user 201 (for example, turning the power on or off, adjusting the volume, or executing a specific function named in the instruction), or issues an instruction to another voice recognition device.

In step S414, the control unit 141 determines whether a predetermined time has elapsed since it started listening for a spoken question or instruction from the user 201 in step S411. If the predetermined time has elapsed, control returns to step S401; if it has not, control returns to step S412.
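The listening window of steps S411 to S414 can be sketched as a polling loop with a timeout. Several points are assumptions: the callback names `poll` and `handle`, the injectable `clock` and `sleep` parameters (added so the loop can be exercised deterministically), and the choice to keep checking the same window after a request has been handled, which this passage does not specify.

```python
import time
from typing import Callable, Optional


def listen_window(poll: Callable[[], Optional[str]],
                  handle: Callable[[str], None],
                  timeout_s: float,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep,
                  interval_s: float = 0.05) -> int:
    """Run the assistant until `timeout_s` has elapsed; return requests handled."""
    start = clock()                        # S411: start listening
    handled = 0
    while clock() - start < timeout_s:     # S414: predetermined time elapsed?
        request = poll()                   # S412: question or instruction input?
        if request is not None:
            handle(request)                # S413: process it
            handled += 1
        else:
            sleep(interval_s)              # avoid a busy loop while idle
    return handled                         # caller goes back to S401
```

In a real device `poll` would read from the voice recognition unit 131; here it is any callable that returns recognized text or `None`.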

According to the voice recognition method (No. 2) of the embodiment, voice recognition accuracy can be improved by performing processing in response to the user's voice based on whether the own device is playing audio and whether it has received playing information indicating that another voice recognition device is playing audio. In detail, for example, when another voice recognition device is playing audio, the own device does not process the user's voice using voice recognition, so it does not need to remove the audio played by the other device. Instead, the device that is playing audio removes its own playback from the sound input through its microphone (echo cancellation) and performs voice recognition, which improves accuracy.

The voice recognition device 101-i may also perform the voice recognition method (No. 3) described below instead of the voice recognition method (No. 1) and the voice recognition method (No. 2).

In the voice recognition method (No. 2) described above, each voice recognition device 101-i decides whether to process the user's question or instruction by giving priority to whether a device is playing audio over the comparison of received volumes.

Consequently, under the voice recognition method (No. 2), in the situation shown in FIG. 5, where the voice recognition devices 101-1 and 101-3 are not playing audio and the voice recognition device 101-2 is playing the audio of content such as a video, an utterance by the user 201 near the voice recognition device 101-1 is answered by the voice recognition device 101-2, which is farther from the user 201 than the voice recognition device 101-1. In a situation like FIG. 5, however, the user 201 is generally speaking to the voice recognition device 101-1 even though the voice recognition device 101-2 is playing audio. The voice recognition method (No. 3) makes the voice recognition device 101-1 respond in such a case.

FIGS. 6A to 6D are an example of a flowchart of the voice recognition method (No. 3) according to the embodiment. FIGS. 6A to 6D are described in terms of the processing of the voice recognition device 101-1. The voice recognition devices 101-2 and 101-3 perform the voice recognition method (No. 3) in the same manner.

In step S601, the control unit 141 determines whether the utterance of the user 201 contains the hot word. Specifically, for example, the control unit 141 determines whether the voice recognition result input from the voice recognition unit 131 contains the hot word. If the hot word is contained, control proceeds to step S602. If it is not, the control unit 141 waits for the next voice recognition result to be input.

In step S602, the control unit 141 acquires volume information indicating the loudness of the voice input to the microphone 111 and transmits the volume information to the other voice recognition devices 101-j. Specifically, the echo canceling unit 121 calculates the loudness of the hot-word voice input to the microphone 111 and outputs volume information indicating the calculated loudness to the control unit 141. The control unit 141 acquires the volume information and transmits it to the other voice recognition devices 101-j. The loudness of the voice is, for example, its sound pressure level.
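As an illustration of the volume information computed in step S602, the following minimal sketch derives a level in decibels from one block of microphone samples. It is a stand-in under stated assumptions: the function name `hotword_level_db`, the normalized sample format, and the reference value are hypothetical, and a true sound pressure level would additionally require microphone calibration, which the description does not specify.

```python
import math
from typing import Sequence


def hotword_level_db(samples: Sequence[float], reference: float = 1.0) -> float:
    """Return the RMS level of `samples` in dB relative to `reference`.

    Assumption: `samples` are normalized floats covering only the hot-word
    segment. The result is a relative level, not a calibrated SPL.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence has no finite level
    rms = math.sqrt(mean_square)
    return 20.0 * math.log10(rms / reference)
```

Because every device would compute the value in the same way, a relative level of this kind is enough for the comparisons in the later steps, provided the microphones have comparable sensitivity (also an assumption).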

In step S603, the control unit 141 determines whether its own device (voice recognition device 101-1) is playing audio. If the own device is determined to be playing audio, the control unit 141 transmits playback information, which indicates that the own device is playing audio, from the communication unit 161 to the other voice recognition devices 101-j.

In step S604, the control unit 141 waits for a predetermined time to receive notifications (volume information or playback information) from the other voice recognition devices 101-j. When another voice recognition device 101-j transmits volume information or playback information, the control unit 141 receives it via the communication unit 161.

In step S605, the control unit 141 determines whether a notification (volume information or playback information) was received from another voice recognition device 101-j during the predetermined time of step S604. If a notification is determined to have been received (step S605: Yes), control proceeds to step S606. If no notification is determined to have been received (step S605: No), control proceeds to step S615.
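The bounded wait of steps S604 and S605 can be sketched as follows. This is only an illustration: the description does not say how the communication unit 161 delivers messages, so the in-process `queue.Queue` used as an inbox, the function name `collect_notifications`, and the parameter `window_s` are all assumptions.

```python
import queue
import time
from typing import Any, List


def collect_notifications(inbox: "queue.Queue[Any]", window_s: float) -> List[Any]:
    """Gather every notification that arrives within `window_s` seconds (S604).

    An empty result corresponds to "step S605: No".
    """
    deadline = time.monotonic() + window_s
    received: List[Any] = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # the predetermined time has elapsed
        try:
            received.append(inbox.get(timeout=remaining))
        except queue.Empty:
            break  # nothing more arrived before the deadline
    return received
```

A monotonic clock is used so that the waiting window is not distorted by adjustments to the system time.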

In step S606, the control unit 141 determines whether any other voice recognition device 101-j is playing audio. If such a device is determined to exist (step S606: Yes), control proceeds to step S610. If no such device is determined to exist (step S606: No), control proceeds to step S607. Specifically, for example, the control unit 141 determines that another voice recognition device 101-j is playing audio when it has received playback information from any of the other voice recognition devices 101-j, and determines that no other voice recognition device 101-j is playing audio when it has not received playback information.

In step S607, the control unit 141 determines whether its own device (voice recognition device 101-1) is playing audio. For example, the control unit 141 makes this determination based on whether the voice processing unit 171 is outputting an audio signal and whether the control unit 141 is outputting utterance data or an audio signal. If the own device is determined to be playing audio (step S607: Yes), control proceeds to step S609. If the own device is determined not to be playing audio (step S607: No), control proceeds to step S608.

In step S608, the control unit 141 determines whether the loudness (received volume) of the hot-word voice input to its own device (voice recognition device 101-1) is greater than the loudness of the hot-word voice input to the other voice recognition devices 101-j that are not playing audio. In other words, the control unit 141 determines whether there is another voice recognition device 101-j, not playing audio, that received the hot word more loudly than the own device did. Specifically, for example, the control unit 141 makes this determination from the volume information of the own device acquired in step S602 and the volume information received from the other voice recognition devices 101-j in step S604. If the loudness at the own device is determined to be greater (that is, no other voice recognition device 101-j that is not playing audio received the hot word more loudly than the own device) (step S608: Yes), control proceeds to step S615. If the loudness at the own device is determined not to be greater (that is, there is another voice recognition device 101-j, not playing audio, that received the hot word more loudly than the own device) (step S608: No), control returns to step S601.

In step S609, the control unit 141 determines whether the greatest loudness (received volume) of the hot-word voice input to the other voice recognition devices 101-j that are not playing audio is sufficiently greater than the loudness of the hot-word voice input to its own device (voice recognition device 101-1). Specifically, for example, the control unit 141 makes this determination from the volume information of the own device acquired in step S602 and the volume information received in step S604 from the other voice recognition devices 101-j that are not playing audio. Whether another voice recognition device 101-j is playing audio is determined by whether playback information has been received from that device.

Regarding the determination of whether the loudness is sufficiently greater, for example, the control unit 141 takes value A to be the greatest of the one or more pieces of volume information received from the other voice recognition devices 101-j that are not playing audio, and value B to be the volume information of the own device plus a predetermined value. If A is greater than B, the control unit 141 determines that the greatest hot-word loudness among the other voice recognition devices 101-j that are not playing audio is sufficiently greater than the hot-word loudness at the own device (voice recognition device 101-1). Alternatively, value B may be the volume information of the own device multiplied by a predetermined value, with the same determination made when A is greater than B. If the greatest hot-word loudness among the other voice recognition devices 101-j that are not playing audio is determined to be sufficiently greater than the hot-word loudness at the own device (step S609: Yes), control returns to step S601. If it is determined not to be sufficiently greater (step S609: No), control proceeds to step S615.
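The two variants of the "sufficiently greater" test used in step S609 can be written down as a short sketch. The function names and the sample margins are hypothetical. Treating an empty set of non-playing devices as "not sufficiently greater" is likewise an assumption, since the description does not address that case.

```python
from typing import Iterable


def exceeds_by_offset(a: float, base: float, offset: float) -> bool:
    """Additive variant: A > B where B = base + predetermined value."""
    return a > base + offset


def exceeds_by_factor(a: float, base: float, factor: float) -> bool:
    """Multiplicative variant: A > B where B = base * predetermined value."""
    return a > base * factor


def idle_peer_clearly_louder(own_volume: float,
                             idle_peer_volumes: Iterable[float],
                             offset: float) -> bool:
    """S609-style check using the additive variant.

    A is the greatest volume reported by devices that are not playing audio.
    Assumption: with no such device, the check is treated as False.
    """
    volumes = list(idle_peer_volumes)
    if not volumes:
        return False
    return exceeds_by_offset(max(volumes), own_volume, offset)
```

An additive margin suits logarithmic values such as decibel levels, whereas a multiplicative margin behaves sensibly only for positive, linear-scale values.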

In step S610, the control unit 141 determines whether its own device (voice recognition device 101-1) is playing audio. For example, the control unit 141 makes this determination based on whether the voice processing unit 171 is outputting an audio signal and whether the control unit 141 is outputting utterance data or an audio signal. If the own device is determined to be playing audio (step S610: Yes), control proceeds to step S611. If the own device is determined not to be playing audio (step S610: No), control proceeds to step S613.

In step S611, the control unit 141 determines whether the loudness (received volume) of the hot-word voice input to its own device (voice recognition device 101-1) is greater than the loudness of the hot-word voice input to the other voice recognition devices 101-j that are playing audio. In other words, the control unit 141 determines whether there is another voice recognition device 101-j, playing audio, that received the hot word more loudly than the own device did. Specifically, for example, the control unit 141 determines whether another voice recognition device 101-j is playing audio based on whether playback information has been received from that device. For example, when the control unit 141 has received playback information from voice recognition device 101-2, it determines that voice recognition device 101-2 is playing audio. The control unit 141 then makes the loudness determination from the volume information of the own device acquired in step S602 and the volume information received in step S604 from the other voice recognition devices 101-j determined to be playing audio. If the loudness at the own device is determined to be greater (that is, no other voice recognition device 101-j that is playing audio received the hot word more loudly than the own device) (step S611: Yes), control proceeds to step S612. If the loudness at the own device is determined not to be greater (that is, there is another voice recognition device 101-j, playing audio, that received the hot word more loudly than the own device) (step S611: No), control returns to step S601.

In step S612, the control unit 141 determines whether the greatest loudness (received volume) of the hot-word voice input to the other voice recognition devices 101-j that are not playing audio is sufficiently greater than the loudness of the hot-word voice input to its own device (voice recognition device 101-1). Specifically, for example, the control unit 141 makes this determination from the volume information of the own device acquired in step S602 and the volume information received from the other voice recognition devices 101-j in step S604. Whether another voice recognition device 101-j is playing audio is determined by whether playback information has been received from that device. The method of determining whether the loudness is sufficiently greater is the same as that described for step S609, so a detailed description is omitted. If the greatest hot-word loudness among the other voice recognition devices 101-j that are not playing audio is determined to be sufficiently greater than the hot-word loudness at the own device (step S612: Yes), control returns to step S601. If it is determined not to be sufficiently greater (step S612: No), control proceeds to step S615.

In step S613, the control unit 141 determines whether the loudness (received volume) of the hot-word voice input to its own device (voice recognition device 101-1) is greater than the loudness of the hot-word voice input to the other voice recognition devices 101-j that are not playing audio. In other words, the control unit 141 determines whether there is another voice recognition device 101-j, not playing audio, that received the hot word more loudly than the own device did. Specifically, for example, the control unit 141 makes this determination from the volume information of the own device acquired in step S602 and the volume information received from the other voice recognition devices 101-j in step S604. If the loudness at the own device is determined to be greater (that is, no other voice recognition device 101-j that is not playing audio received the hot word more loudly than the own device) (step S613: Yes), control proceeds to step S614. If the loudness at the own device is determined not to be greater (that is, there is another voice recognition device 101-j, not playing audio, that received the hot word more loudly than the own device) (step S613: No), control returns to step S601.

In step S614, the control unit 141 determines whether the loudness (received volume) of the hot-word voice input to its own device (voice recognition device 101-1) is sufficiently greater than the greatest loudness of the hot-word voice input to the other voice recognition devices 101-j that are playing audio. Specifically, for example, the control unit 141 determines whether another voice recognition device 101-j is playing audio based on whether playback information has been received from that device. The control unit 141 then makes the loudness determination from the volume information of the own device acquired in step S602 and the volume information received in step S604 from the other voice recognition devices 101-j determined to be playing audio.

Regarding the determination of whether the loudness is sufficiently greater, for example, the control unit 141 takes value A to be the volume information of the own device, and value B to be the greatest of the one or more pieces of volume information received from the other voice recognition devices 101-j that are playing audio, plus a predetermined value. If A is greater than B, the control unit 141 determines that the hot-word loudness at the own device (voice recognition device 101-1) is sufficiently greater than the greatest hot-word loudness among the other voice recognition devices 101-j that are playing audio. Alternatively, value B may be that greatest volume information multiplied by a predetermined value, with the same determination made when A is greater than B. If the hot-word loudness at the own device is determined to be sufficiently greater than the greatest hot-word loudness among the other voice recognition devices 101-j that are playing audio (step S614: Yes), control proceeds to step S615. If it is determined not to be sufficiently greater (step S614: No), control returns to step S601.
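The branch structure of steps S605 to S614 can be condensed into a single predicate, sketched below for illustration. The names `PeerReport` and `should_listen`, the additive margin, and the handling of empty groups (a comparison against no devices counts as passed, and an empty group is never "sufficiently greater") are assumptions that the description does not spell out. A return value of `True` stands for proceeding to step S615, and `False` for returning to step S601.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PeerReport:
    volume: float   # hot-word loudness reported by the peer (S602)
    playing: bool   # True if the peer also sent playback information (S603)


def _louder_than_all(own: float, volumes: List[float]) -> bool:
    # Strict comparison; vacuously True for an empty list (assumption).
    return all(own > v for v in volumes)


def should_listen(own_volume: float, own_playing: bool,
                  peers: Iterable[PeerReport], margin: float = 6.0) -> bool:
    reports = list(peers)
    if not reports:                                   # S605: No
        return True
    playing = [p.volume for p in reports if p.playing]
    idle = [p.volume for p in reports if not p.playing]
    # S609 / S612: is the loudest non-playing peer far louder than this device?
    idle_far_louder = bool(idle) and max(idle) > own_volume + margin

    if not playing:                                   # S606: No
        if own_playing:                               # S607: Yes -> S609
            return not idle_far_louder
        return _louder_than_all(own_volume, idle)     # S608
    if own_playing:                                   # S610: Yes
        if not _louder_than_all(own_volume, playing):     # S611: No
            return False
        return not idle_far_louder                    # S612
    if not _louder_than_all(own_volume, idle):        # S613: No
        return False
    return own_volume > max(playing) + margin         # S614
```

Applied to a FIG. 5 style situation with hypothetical levels of 60 at device 101-1 (not playing), 40 at device 101-2 (playing), and 45 at device 101-3 (not playing), only the call made from the point of view of device 101-1 returns `True`.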

In step S615, the control unit 141 starts waiting for a spoken question or instruction from user 201. Specifically, for example, the control unit 141 starts a voice assistant that detects a question or instruction from user 201 based on the voice recognition result input from the voice recognition unit 131 and, in response to the spoken question or instruction, performs processing such as searching for information or operating the device.

In step S616, the control unit 141 determines whether a spoken question or instruction has been input by user 201. Specifically, for example, the control unit 141 makes this determination based on the voice recognition result input from the voice recognition unit 131. If a question or instruction is determined to have been input, control proceeds to step S617. If no question or instruction is determined to have been input, control proceeds to step S618.

In step S617, the control unit 141 performs processing in response to the spoken question or instruction from user 201. Specifically, for example, the control unit 141 searches for an answer to the question from user 201 and outputs text data (utterance data) of the answer to the voice processing unit 171, and the voice processing unit 171 outputs the answer as speech from the speaker 181. Alternatively, for example, the control unit 141 performs processing such as controlling voice recognition device 101-1 in response to the spoken instruction from user 201 (for example, turning the power on or off, adjusting the volume, or executing a specific function according to the instruction) or issuing an instruction to another voice recognition device.

In step S618, the control unit 141 determines whether a predetermined time has elapsed since it started waiting, in step S615, for a spoken question or instruction from user 201. If the predetermined time is determined to have elapsed, control returns to step S601. If it is determined not to have elapsed, control returns to step S616.
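The waiting loop formed by steps S615 to S618 might look like the sketch below. The callables `poll` and `handle`, the function name `await_command`, and the injectable `clock` are hypothetical. The description also does not state where control goes after step S617, so ending the loop at that point is an assumption.

```python
import time
from typing import Callable, Optional


def await_command(poll: Callable[[], Optional[str]],
                  handle: Callable[[str], None],
                  window_s: float,
                  clock: Callable[[], float] = time.monotonic) -> bool:
    """Wait up to `window_s` seconds for a spoken question or instruction.

    Returns True if a command was handled, False on timeout (back to S601).
    `poll` is assumed to return None while nothing has been recognized.
    """
    started = clock()                        # S615: start listening
    while True:
        command = poll()                     # S616: any question or instruction?
        if command is not None:
            handle(command)                  # S617: act on it
            return True                      # assumption: stop after handling
        if clock() - started >= window_s:    # S618: predetermined time elapsed?
            return False
```

In a real device `poll` would presumably block briefly on the voice recognition unit rather than return immediately, which would keep this loop from spinning.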

According to voice recognition method (No. 3) of the embodiment, processing according to the user's voice is performed based on whether the own device is playing audio and whether playback information, indicating that another voice recognition device is playing audio, has been received. This improves the accuracy of voice recognition.

Furthermore, according to voice recognition method (No. 3) of the embodiment, even when the own device, which is close to the user, is not playing audio while another voice recognition device far from the user is playing audio, the own device, which is close to the user and is presumably the one the user is speaking to, can perform processing according to the user's voice.

(Example of implementation by software)
The control blocks of the voice recognition device 101 (in particular, the echo canceling unit 121, the voice recognition unit 131, the control unit 141, and the voice processing unit 171) may be implemented by logic circuits (hardware) formed on an integrated circuit (IC chip) or the like, or may be implemented by software using a processor such as a CPU (Central Processing Unit). In the latter case, for example, the voice recognition device 101, being a computer, includes a CPU that executes the instructions of a program, which is software implementing each function; a ROM or storage device (referred to as a "recording medium") on which the program and various data are recorded so as to be readable by the computer (or CPU); and a RAM into which the program is loaded. The computer (or CPU) reads the program from the recording medium and executes it, thereby operating as the echo canceling unit 121, the voice recognition unit 131, the control unit 141, and the voice processing unit 171, and the object of the present invention is achieved. As the recording medium, a "non-transitory tangible medium" such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used. The program may also be supplied to the computer via any transmission medium capable of transmitting it.

Note that the present invention is not limited to the embodiment described above and can be modified. The configuration described above can be replaced with a configuration that is substantially the same, a configuration that provides the same operation and effect, or a configuration that can achieve the same object.

10 System
101 Voice recognition device
111 Microphone
121 Echo canceling unit
131 Voice recognition unit
141 Control unit
151 Storage unit
161 Communication unit
171 Voice processing unit
181 Speaker

Claims

A voice recognition device comprising:
a voice input unit to which a user's voice is input;
a voice recognition unit that recognizes the user's voice input to the voice input unit; and
a control unit that performs processing according to the user's voice recognized by the voice recognition unit, based on whether or not another voice recognition device is playing another voice.

The voice recognition device according to claim 1, wherein the control unit performs processing according to the user's voice recognized by the voice recognition unit when the other voice recognition device is not playing the other voice.

The voice recognition device according to claim 1 or 2, wherein the control unit performs processing according to the user's voice based on whether or not the own device is playing a voice and whether or not the other voice recognition device is playing the other voice.

The voice recognition device according to claim 3, wherein the control unit performs processing according to the user's voice recognized by the voice recognition unit when the own device is playing a voice and the other voice recognition device is not playing the other voice.

The voice recognition device according to claim 3, wherein the control unit performs processing according to the user's voice based on the volume of the voice of a specific word input to the voice input unit, the volume of the voice of the specific word input to the other voice recognition device, whether or not the own device is playing a voice, and whether or not the other voice recognition device is playing the other voice.

The voice recognition device according to claim 5, wherein the control unit performs processing according to the user's voice recognized by the voice recognition unit when the own device is playing a voice, the other voice recognition device is playing the other voice, and the volume of the voice of the specific word input to the voice input unit is larger than the volume of the voice of the specific word input to the other voice recognition device.

The voice recognition device according to claim 5, wherein the control unit performs processing according to the user's voice recognized by the voice recognition unit when the own device is not playing a voice, the other voice recognition device is playing the other voice, and the volume of the voice of the specific word input to the voice input unit is sufficiently larger than the volume of the voice of the specific word input to the other voice recognition device.

A voice recognition method comprising processing in which:
a user's voice is input;
the input voice of the user is recognized; and
processing according to the user's voice recognized by the voice recognition unit is performed, based on whether or not another voice recognition device is playing another voice.

A program causing a computer to execute processing in which:
a user's voice is input;
the input voice of the user is recognized; and
processing according to the user's voice recognized by the voice recognition unit is performed, based on whether or not another voice recognition device is playing another voice.

A computer-readable recording medium on which is recorded a program causing a computer to execute processing in which:
a user's voice is input;
the input voice of the user is recognized; and
processing according to the user's voice recognized by the voice recognition unit is performed, based on whether or not another voice recognition device is playing another voice.