JP2019164615A - Information processing system and information processing method

Info

Publication number: JP2019164615A
Application number: JP2018052341A
Authority: JP
Inventors: 幸司粂谷; Koji Kumeya
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2018-03-20
Filing date: 2018-03-20
Publication date: 2019-09-26
Anticipated expiration: 2038-03-20
Also published as: JP7349533B2; JP2022133293A; JP7088703B2

Abstract

To provide an information processing system in which a processing unit other than a processing unit that has transmitted audio data can receive a processing result. SOLUTION: Processing units 2a to 2c each include a voice input unit 211, a voice data generation unit 215, a voice data transmission unit 213, and a processing result data reception unit 213. The voice input unit 211 inputs a first voice indicating a specific command and a second voice specifying at least one of the plurality of processing units 2a to 2c. The voice data generation unit 215 generates voice data, and the voice data transmission unit 213 transmits the voice data to a server 3. The processing result data reception unit 213 receives processing result data from the server 3. The processing result data indicates a result of execution of processing corresponding to the specific command. Among the plurality of processing units 2a to 2c, the processing unit that has transmitted the voice data and the processing unit that is specified by the voice data receive the processing result data. SELECTED DRAWING: Figure 1

Description

本発明は、情報処理システム、及び情報処理方法に関する。 The present invention relates to an information processing system and an information processing method.

ユーザの発話に応じた検索結果をユーザに提供する情報提供システムが知られている（例えば、特許文献１参照）。特許文献１には、車載端末と、サーバと、検索エンジンとを備えた情報提供システムが開示されている。車載端末は、ユーザの発話を音声認識処理によって文字列に変換してサーバに送信する。サーバは、検索エンジンに対して、文字列に応じた検索を要求する。検索エンジンは検索結果をサーバに送信し、サーバは、検索エンジンから取得した検索結果を車載端末に送信する。 There is known an information providing system that provides a search result corresponding to a user's utterance to the user (for example, see Patent Document 1). Patent Document 1 discloses an information providing system including an in-vehicle terminal, a server, and a search engine. The in-vehicle terminal converts the user's utterance into a character string by voice recognition processing and transmits it to the server. The server requests the search engine to perform a search according to the character string. The search engine transmits the search result to the server, and the server transmits the search result acquired from the search engine to the in-vehicle terminal.

Patent Document 1: JP 2017-194850 A

しかしながら、特許文献１に開示された技術によれば、発話（音声）に対応する文字列をサーバに送信した端末のみが、検索結果（処理の結果）を受信する。したがって、発話（音声）に対応する文字列をサーバに送信した端末以外の端末は、検索結果（処理の結果）を受信することができない。 However, according to the technique disclosed in Patent Document 1, only a terminal that transmits a character string corresponding to an utterance (voice) to a server receives a search result (processing result). Therefore, terminals other than the terminal that has transmitted the character string corresponding to the speech (voice) to the server cannot receive the search result (processing result).

In view of the above problem, an object of the present invention is to provide an information processing system and an information processing method in which, in addition to the processing unit that has transmitted voice data to a server, a processing unit other than that processing unit can also receive the processing result.

The information processing system of the present invention includes a plurality of processing units. Each processing unit includes a voice input unit, a voice data generation unit, a voice data transmission unit, and at least one processing result data reception unit. The voice input unit inputs a first voice indicating a specific command and a second voice specifying at least one of the plurality of processing units. The voice data generation unit generates first voice data corresponding to the first voice and second voice data corresponding to the second voice. The voice data transmission unit transmits the first voice data and the second voice data to a server. The at least one processing result data reception unit receives processing result data from the server. The processing result data indicates an execution result of processing corresponding to the specific command. Among the plurality of processing units, the processing unit that has transmitted the first voice data and the second voice data and the processing unit specified by the second voice data receive the processing result data.

The information processing method of the present invention includes: a step of inputting a first voice indicating a specific command and a second voice specifying at least one of a plurality of processing units; a step of generating first voice data corresponding to the first voice and second voice data corresponding to the second voice; a step of transmitting the first voice data and the second voice data to a server; and a step in which, among the plurality of processing units, the processing unit that has transmitted the first voice data and the second voice data and the processing unit specified by the second voice data receive processing result data from the server. The processing result data indicates an execution result of processing corresponding to the specific command.

本発明によれば、音声データをサーバに送信した処理ユニットに加えて、音声データをサーバに送信した処理ユニット以外の処理ユニットも処理の結果を受信することができる。 According to the present invention, in addition to the processing unit that has transmitted the audio data to the server, a processing unit other than the processing unit that has transmitted the audio data to the server can also receive the processing result.

A diagram showing the configuration of the information processing system according to Embodiment 1 of the present invention.
A diagram showing the configuration of the first smart speaker according to Embodiment 1 of the present invention.
A diagram showing the configuration of the server according to Embodiment 1 of the present invention.
A diagram showing the management table according to Embodiment 1 of the present invention.
A flowchart showing the operation of the first smart speaker according to Embodiment 1 of the present invention.
A flowchart showing the operation of the server according to Embodiment 1 of the present invention.
A diagram showing the configuration of the information processing system according to Embodiment 2 of the present invention.
A diagram showing the configuration of the first smart speaker according to Embodiment 2 of the present invention.
A diagram showing the configuration of the first server according to Embodiment 2 of the present invention.
(a) is a diagram showing the first management table according to Embodiment 2 of the present invention. (b) is a diagram showing the second management table according to Embodiment 2 of the present invention.
A diagram showing the configuration of the first terminal according to Embodiment 2 of the present invention.
A flowchart showing the operation of the first server according to Embodiment 2 of the present invention.
A flowchart showing the operation of the first terminal according to Embodiment 2 of the present invention.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. However, the present invention is not limited to the following embodiments. Where descriptions would overlap, they may be omitted as appropriate. In the drawings, the same or corresponding parts are denoted by the same reference numerals, and their description is not repeated.

[Embodiment 1]
FIG. 1 is a diagram illustrating the configuration of an information processing system 1 according to Embodiment 1. As shown in FIG. 1, the information processing system 1 includes a first processing unit 2a to a third processing unit 2c and a server 3.

In the present embodiment, the first processing unit 2a includes a first smart speaker 21a. The second processing unit 2b includes a second smart speaker 21b. The third processing unit 2c includes a third smart speaker 21c. Each of the first smart speaker 21a to the third smart speaker 21c is an example of a voice input/output terminal.

Each of the first smart speaker 21a to the third smart speaker 21c communicates with the server 3 via, for example, an Internet connection. Specifically, each of the first smart speaker 21a to the third smart speaker 21c receives voice uttered by the user, converts the input voice into voice data (digital data), and transmits the voice data to the server 3.

More specifically, each of the first smart speaker 21a to the third smart speaker 21c stores data indicating an activation command. When the user utters a voice indicating the activation command, the first smart speaker 21a to the third smart speaker 21c enter a ready state. If the user utters a voice before a predetermined period elapses after the ready state is entered, the first smart speaker 21a to the third smart speaker 21c convert that voice into voice data and transmit it to the server 3.

When the server 3 receives voice data from any one of the first smart speaker 21a to the third smart speaker 21c, the server 3 determines whether the received voice data indicates a specific command. When the voice data indicates a specific command, the server 3 acquires processing result data indicating the execution result of the processing corresponding to the specific command. In the present embodiment, the processing result data is voice data. The server 3 transmits the processing result data to the smart speaker that has transmitted the voice data. In the following description, the smart speaker that has transmitted the voice data may be referred to as the "voice transmission smart speaker".

In the present embodiment, the server 3 also transmits the processing result data (voice data) to smart speakers other than the voice transmission smart speaker. Specifically, when the received voice data includes a designated keyword that designates at least one of the first smart speaker 21a to the third smart speaker 21c, the server 3 transmits the processing result data (voice data) to the voice transmission smart speaker and to the smart speaker designated by the designated keyword. In the following description, the smart speaker designated by the designated keyword may be referred to as the "designated smart speaker".

Next, the configuration of the first smart speaker 21a will be described with reference to FIGS. 1 and 2. FIG. 2 is a diagram illustrating the configuration of the first smart speaker 21a according to Embodiment 1. As shown in FIG. 2, the first smart speaker 21a includes a voice input unit 211, a voice output unit 212, a communication unit 213, a storage unit 214, and a control unit 215.

音声入力部２１１は、ユーザが発声した音声を集音して、アナログ電気信号に変換する。アナログ電気信号は、制御部２１５に入力される。音声入力部２１１は、例えば、マイクロフォンである。なお、以下の説明において、ユーザが発声した音声を「ユーザ音声」と記載する場合がある。 The voice input unit 211 collects voice uttered by the user and converts it into an analog electrical signal. The analog electrical signal is input to the control unit 215. The voice input unit 211 is, for example, a microphone. In the following description, the voice uttered by the user may be referred to as “user voice”.

The voice output unit 212 outputs sound corresponding to the voice data received from the server 3. The voice output unit 212 is, for example, a speaker.

通信部２１３は、サーバ３との間の通信を制御する。通信部２１３は、例えば、ＬＡＮ（ＬｏｃａｌＡｒｅａＮｅｔｗｏｒｋ）ボード又は無線ＬＡＮボードを備える。具体的には、通信部２１３は、音声データをサーバ３に送信する。また、通信部２１３は、サーバ３から音声データを受信する。 The communication unit 213 controls communication with the server 3. The communication unit 213 includes, for example, a LAN (Local Area Network) board or a wireless LAN board. Specifically, the communication unit 213 transmits audio data to the server 3. The communication unit 213 receives audio data from the server 3.

In the present embodiment, the communication unit 213 is an example of a voice data transmission unit. The communication unit 213 is also an example of a processing result data reception unit. Specifically, when the voice input unit 211 inputs a voice indicating a specific command, the communication unit 213 transmits voice data indicating the specific command. Furthermore, when the voice input unit 211 inputs a voice indicating the designated keyword described with reference to FIG. 1, the communication unit 213 transmits voice data indicating the designated keyword. The communication unit 213 also receives the processing result data (voice data) described with reference to FIG. 1.

The storage unit 214 includes semiconductor memory such as a RAM (Random Access Memory) and a ROM (Read Only Memory). The storage unit 214 may further include a storage device such as an HDD (Hard Disk Drive). The storage unit 214 stores a control program executed by the control unit 215. The storage unit 214 further stores the data indicating the activation command described with reference to FIG. 1.

制御部２１５は、例えばＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）、ＭＰＵ（ＭｉｃｒｏＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）、ＡＳＩＣ（ＡｐｐｌｉｃａｔｉｏｎＳｐｅｃｉｆｉｃＩｎｔｅｇｒａｔｅｄＣｉｒｃｕｉｔ）、又はＤＳＰ（ＤｉｇｉｔａｌＳｉｇｎａｌＰｒｏｃｅｓｓｏｒ）のようなプロセッサを備える。制御部２１５は、記憶部２１４に記憶された制御プログラムに基づいて、第１スマートスピーカ２１ａの動作を制御する。 The control unit 215 includes a processor such as a CPU (Central Processing Unit), an MPU (Micro Processing Unit), an ASIC (Application Specific Integrated Circuit), or a DSP (Digital Signal Processor). The control unit 215 controls the operation of the first smart speaker 21a based on the control program stored in the storage unit 214.

Specifically, the control unit 215 converts the analog electrical signal (user voice) input from the voice input unit 211 into a digital signal (voice data) and causes the communication unit 213 to transmit the digital signal. The control unit 215 also converts a digital signal (voice data) received by the communication unit 213 into an analog electrical signal and causes the voice output unit 212 to output the corresponding sound.

More specifically, when the voice input unit 211 inputs a user voice, the control unit 215 refers to the data indicating the activation command stored in the storage unit 214 and determines whether the voice data corresponding to the user voice indicates the activation command. The control unit 215 enters a ready state when the voice data corresponding to the user voice indicates the activation command. When the voice input unit 211 inputs a user voice before a predetermined period has elapsed after the ready state is entered, the control unit 215 stores voice data corresponding to that user voice in the storage unit 214. Note that the storage unit 214 stores data indicating the predetermined period. The predetermined period is, for example, 8 seconds.

本実施形態において、制御部２１５は、所定の期間が経過するまでの間、音声入力部２１１がユーザ音声を入力する度に、ユーザ音声に対応する音声データを記憶部２１４に保存する。制御部２１５は、所定の期間が経過すると、記憶部２１４に音声データが保存されているか否かを判定する。制御部２１５は、記憶部２１４に音声データが保存されている場合、記憶部２１４に保存されている音声データを通信部２１３に送信させる。 In the present embodiment, the control unit 215 saves the audio data corresponding to the user voice in the storage unit 214 every time the voice input unit 211 inputs the user voice until a predetermined period elapses. The control unit 215 determines whether audio data is stored in the storage unit 214 when a predetermined period has elapsed. When audio data is stored in the storage unit 214, the control unit 215 causes the communication unit 213 to transmit the audio data stored in the storage unit 214.

The configuration of the first smart speaker 21a has been described above with reference to FIGS. 1 and 2. The configurations of the second smart speaker 21b and the third smart speaker 21c are the same as that of the first smart speaker 21a, and their description is therefore omitted.

続いて図１及び図３を参照して、サーバ３の構成を説明する。図３は、実施形態１に係るサーバ３の構成を示す図である。図３に示すように、サーバ３は、通信部３１と、音声認識部３２と、記憶部３３と、制御部３４とを備える。 Next, the configuration of the server 3 will be described with reference to FIGS. 1 and 3. FIG. 3 is a diagram illustrating a configuration of the server 3 according to the first embodiment. As shown in FIG. 3, the server 3 includes a communication unit 31, a voice recognition unit 32, a storage unit 33, and a control unit 34.

The communication unit 31 controls communication with the first smart speaker 21a to the third smart speaker 21c. The communication unit 31 includes, for example, a LAN board or a wireless LAN board. Specifically, the communication unit 31 receives voice data from the first smart speaker 21a to the third smart speaker 21c. The communication unit 31 also transmits voice data to the first smart speaker 21a to the third smart speaker 21c.

In the present embodiment, the communication unit 31 is an example of a voice data reception unit. The communication unit 31 is also an example of a processing result data transmission unit. Specifically, the communication unit 31 receives voice data indicating a specific command. Furthermore, the communication unit 31 receives voice data indicating the designated keyword described with reference to FIG. 1. The communication unit 31 also transmits the processing result data (voice data) described with reference to FIG. 1.

音声認識部３２は、通信部３１が受信した音声データを音声認識技術によりテキスト情報（以下、「認識結果テキスト」と記載する場合がある。）に変換する。音声認識部３２は、例えば、音声認識ＬＳＩ（ＬａｒｇｅＳｃａｌｅＩｎｔｅｇｒａｔｉｏｎ）を備える。 The voice recognition unit 32 converts the voice data received by the communication unit 31 into text information (hereinafter may be referred to as “recognition result text”) by voice recognition technology. The voice recognition unit 32 includes, for example, a voice recognition LSI (Large Scale Integration).

The storage unit 33 includes semiconductor memory such as a RAM and a ROM. The storage unit 33 further includes a storage device such as an HDD. The storage unit 33 stores a control program executed by the control unit 34. The storage unit 33 further stores a management table 331. The designated keywords described with reference to FIG. 1 are registered in the management table 331.

制御部３４は、例えばＣＰＵ又はＭＰＵのようなプロセッサを備える。また、制御部３４は、記憶部３３に記憶された制御プログラムに基づいて、サーバ３の動作を制御する。 The control unit 34 includes a processor such as a CPU or MPU. Further, the control unit 34 controls the operation of the server 3 based on the control program stored in the storage unit 33.

具体的には、制御部３４は、記憶部３３に記憶されているキーワード群を参照して、認識結果テキストに特定のコマンドを示す文字列が含まれるか否かを判定する。あるいは、制御部３４は、意図推定処理により認識結果テキストを解析して、認識結果テキストに特定のコマンドを示す文字列が含まれるか否かを判定する。制御部３４が意図推定処理を実行する場合、記憶部３３は、コーパスを記憶する。制御部３４は、認識結果テキストに特定のコマンドを示す文字列が含まれる場合、特定のコマンドに対応する処理を実行して処理結果データを取得する。例えば、特定のコマンドは、検索キーワードと、検索処理の実行を促すキーワードとを示す。この場合、制御部３４は、検索キーワードに基づいて検索処理を実行し、検索結果を示すデータを取得する。 Specifically, the control unit 34 refers to the keyword group stored in the storage unit 33 and determines whether or not a character string indicating a specific command is included in the recognition result text. Alternatively, the control unit 34 analyzes the recognition result text by the intention estimation process, and determines whether or not a character string indicating a specific command is included in the recognition result text. When the control unit 34 executes intention estimation processing, the storage unit 33 stores a corpus. When the recognition result text includes a character string indicating a specific command, the control unit 34 executes processing corresponding to the specific command and acquires processing result data. For example, the specific command indicates a search keyword and a keyword that prompts execution of search processing. In this case, the control unit 34 executes a search process based on the search keyword, and acquires data indicating the search result.
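As a non-limiting illustration of the keyword-based determination described above, the following Python sketch checks a recognition result text against a keyword group and extracts the search keyword. The trigger phrases, the function name `extract_search_command`, and the rule that the search keyword is whatever follows the trigger phrase are assumptions made for illustration; the embodiment does not specify them.

```python
import re
from typing import Optional

# Hypothetical keyword group: phrases that prompt execution of search processing.
_SEARCH_PATTERN = re.compile(r"\b(?:search for|look up)\s+(.+)", re.IGNORECASE)


def extract_search_command(recognition_text: str) -> Optional[str]:
    """Return the search keyword if the text contains a search command, else None."""
    match = _SEARCH_PATTERN.search(recognition_text)
    if match is None:
        return None  # no specific command was recognized
    # Assumption: everything after the trigger phrase is the search keyword.
    keyword = match.group(1).strip(" .?!")
    return keyword or None
```

An intention estimation process using a corpus, which the description mentions as an alternative, would replace the regular expression with a trained classifier and is not sketched here.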

The control unit 34 causes the communication unit 31 to transmit the processing result data. Specifically, the control unit 34 refers to the management table 331 and determines whether the recognition result text includes a character string indicating a designated keyword. When the recognition result text does not include a character string indicating a designated keyword, the communication unit 31 transmits the processing result data to the voice transmission smart speaker. On the other hand, when the recognition result text includes a character string indicating a designated keyword, the communication unit 31 transmits the processing result data to both the voice transmission smart speaker and the designated smart speaker.
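The destination selection described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `select_recipients`, the representation of the management table as a dictionary from designated keyword to smart speaker identification information, and plain substring matching are all hypothetical and are not prescribed by the embodiment.

```python
from typing import Dict, List


def select_recipients(
    sender_id: str,
    recognition_text: str,
    keyword_to_speaker: Dict[str, str],
) -> List[str]:
    """Return the smart speakers that should receive the processing result data."""
    # The voice transmission smart speaker always receives the result.
    recipients = [sender_id]
    # Add every smart speaker whose designated keyword appears in the text.
    for keyword, speaker_id in keyword_to_speaker.items():
        if keyword in recognition_text and speaker_id not in recipients:
            recipients.append(speaker_id)
    return recipients
```

When no designated keyword appears in the text, the returned list contains only the sender, which corresponds to the first case described above. The duplicate check keeps the sender from being listed twice if the user names the sender's own keyword.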

なお、サーバ３は、他のサーバに、特定のコマンドに対応する処理の実行を要求してもよい。この場合、サーバ３は、他のサーバから処理結果データを取得（受信）する。 The server 3 may request another server to execute processing corresponding to a specific command. In this case, the server 3 acquires (receives) processing result data from another server.

Next, the management table 331 will be described with reference to FIG. 4. FIG. 4 is a diagram illustrating the management table 331 according to Embodiment 1. As shown in FIG. 4, the management table 331 has a smart speaker registration field 41 and a designated keyword registration field 42.

スマートスピーカ登録欄４１には、サーバ３との間で通信が可能なスマートスピーカを識別するスマートスピーカ識別情報が登録される。本実施形態では、スマートスピーカ登録欄４１に、第１スマートスピーカ２１ａ〜第３スマートスピーカ２１ｃを識別するスマートスピーカ識別情報が登録される。スマートスピーカ識別情報は、ユーザが任意に決定して登録する。 In the smart speaker registration field 41, smart speaker identification information for identifying a smart speaker that can communicate with the server 3 is registered. In the present embodiment, smart speaker identification information for identifying the first smart speaker 21 a to the third smart speaker 21 c is registered in the smart speaker registration field 41. The smart speaker identification information is arbitrarily determined and registered by the user.

In the designated keyword registration field 42, a keyword (designated keyword) that specifies a smart speaker registered in the smart speaker registration field 41 is registered. The designated keyword is arbitrarily determined and registered by the user. For example, the designated keyword may be the name of the place where the smart speaker is installed. In the designated keyword registration field 42 shown in FIG. 4, "point A" is registered as the designated keyword of the first smart speaker 21a. Similarly, "point B" is registered as the designated keyword of the second smart speaker 21b, and "point C" is registered as the designated keyword of the third smart speaker 21c. The management table 331 associates the smart speaker identification information with the designated keywords.
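For illustration only, the management table 331 could be represented in Python as a list of two-field entries, one field per registration field. The class name, the function name, and the identifier strings below are hypothetical; the keywords correspond to the entries "Ａ地点", "Ｂ地点", and "Ｃ地点" of FIG. 4, rendered here as "point A", "point B", and "point C".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ManagementEntry:
    speaker_id: str          # smart speaker registration field 41
    designated_keyword: str  # designated keyword registration field 42


# Identifiers are hypothetical; both fields are chosen freely by the user.
MANAGEMENT_TABLE: List[ManagementEntry] = [
    ManagementEntry("smart-speaker-21a", "point A"),
    ManagementEntry("smart-speaker-21b", "point B"),
    ManagementEntry("smart-speaker-21c", "point C"),
]


def speaker_for_keyword(keyword: str) -> Optional[str]:
    """Look up the smart speaker associated with a designated keyword."""
    for entry in MANAGEMENT_TABLE:
        if entry.designated_keyword == keyword:
            return entry.speaker_id
    return None  # keyword is not registered
```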

なお、第１スマートスピーカ２１ａ〜第３スマートスピーカ２１ｃは、音声データをサーバ３に送信する際に、自機のスマートスピーカ識別情報を送信する。スマートスピーカ識別情報は、第１スマートスピーカ２１ａ〜第３スマートスピーカ２１ｃの記憶部２１４（図２）に記憶されている。サーバ３は、音声データと共に受信したスマートスピーカ識別情報に基づいて、音声データを送信したスマートスピーカに処理結果データを送信する。 The first smart speaker 21a to the third smart speaker 21c transmit their own smart speaker identification information when transmitting audio data to the server 3. The smart speaker identification information is stored in the storage unit 214 (FIG. 2) of the first smart speaker 21a to the third smart speaker 21c. The server 3 transmits the processing result data to the smart speaker that transmitted the audio data based on the smart speaker identification information received together with the audio data.

続いて図１、図２及び図５を参照して、第１スマートスピーカ２１ａの動作を説明する。図５は、実施形態１に係る第１スマートスピーカ２１ａの動作を示すフローチャートである。図５に示す動作は、第１スマートスピーカ２１ａの音声入力部２１１がユーザ音声を入力するとスタートする。 Next, the operation of the first smart speaker 21a will be described with reference to FIG. 1, FIG. 2, and FIG. FIG. 5 is a flowchart illustrating the operation of the first smart speaker 21a according to the first embodiment. The operation shown in FIG. 5 starts when the voice input unit 211 of the first smart speaker 21a inputs user voice.

図５に示すように、音声入力部２１１がユーザ音声を入力すると、制御部２１５は、ユーザ音声に対応する音声データを生成する（ステップＳ１）。制御部２１５は、音声データを生成すると、記憶部２１４に記憶されている起動コマンドを示すデータを参照して、音声データが起動コマンドを示すか否かを判定する（ステップＳ２）。 As shown in FIG. 5, when the voice input unit 211 inputs user voice, the control unit 215 generates voice data corresponding to the user voice (step S1). When generating the voice data, the control unit 215 refers to the data indicating the activation command stored in the storage unit 214 and determines whether or not the voice data indicates the activation command (step S2).

制御部２１５は、音声データが起動コマンドを示すと判定すると（ステップＳ２のＹｅｓ）、所定の期間、レディ状態となる（ステップＳ３）。レディ状態において、制御部２１５は、音声入力部２１１がユーザ音声を入力すると、ユーザ音声に対応する音声データを記憶部２１４に保存する。制御部２１５は、所定の期間が経過すると、記憶部２１４に音声データが保存されているか否かを判定する（ステップＳ４）。 When the control unit 215 determines that the voice data indicates an activation command (Yes in step S2), the control unit 215 enters a ready state for a predetermined period (step S3). In the ready state, when the voice input unit 211 inputs user voice, the control unit 215 stores voice data corresponding to the user voice in the storage unit 214. When the predetermined period has elapsed, the control unit 215 determines whether or not audio data is stored in the storage unit 214 (step S4).

When the control unit 215 determines that voice data is stored in the storage unit 214 (Yes in step S4), the control unit 215 transmits the voice data stored in the storage unit 214 and the smart speaker identification information stored in the storage unit 214 to the server 3 (step S5), and ends the operation shown in FIG. 5.

When the control unit 215 determines that the voice data does not indicate the activation command (No in step S2), or determines that no voice data is stored in the storage unit 214 (No in step S4), the control unit 215 ends the operation shown in FIG. 5.
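The flow of steps S2 to S5 can be sketched in Python as follows. This is a simplified sketch, not the control program itself: real-time input is modeled as a list of (elapsed seconds since the ready state began, voice data) pairs, and the function name, the payload layout, and the `is_activation_command` callback are assumptions introduced for illustration. Step S1 (generation of voice data) is assumed to have already produced the byte strings passed in.

```python
from typing import Callable, List, Optional, Tuple

READY_PERIOD_SECONDS = 8.0  # example value of the predetermined period


def run_speaker_flow(
    first_voice_data: bytes,
    later_inputs: List[Tuple[float, bytes]],
    is_activation_command: Callable[[bytes], bool],
    speaker_id: str,
) -> Optional[dict]:
    """Return the payload to send to the server, or None if nothing is sent."""
    # Step S2: does the first voice data indicate the activation command?
    if not is_activation_command(first_voice_data):
        return None
    # Step S3: ready state; store voice data input within the period.
    stored = [
        data
        for elapsed, data in later_inputs
        if 0.0 <= elapsed < READY_PERIOD_SECONDS
    ]
    # Step S4: is any voice data stored?
    if not stored:
        return None
    # Step S5: send the stored voice data with the speaker's own identifier.
    return {"speaker_id": speaker_id, "voice_data": stored}
```

For example, an utterance arriving 2 seconds after activation would be included in the payload, while one arriving after 9.5 seconds would be ignored.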

The operation of the first smart speaker 21a has been described above with reference to FIGS. 1, 2, and 5. The second smart speaker 21b and the third smart speaker 21c perform the operation shown in FIG. 5 in the same manner as the first smart speaker 21a.

続いて図１、図３、図４及び図６を参照して、サーバ３の動作を説明する。図６は、実施形態１に係るサーバ３の動作を示すフローチャートである。図６に示す動作は、サーバ３の通信部３１が音声データ及びスマートスピーカ識別情報を受信するとスタートする。 Subsequently, the operation of the server 3 will be described with reference to FIGS. 1, 3, 4 and 6. FIG. 6 is a flowchart illustrating the operation of the server 3 according to the first embodiment. The operation shown in FIG. 6 starts when the communication unit 31 of the server 3 receives audio data and smart speaker identification information.

As shown in FIG. 6, when the communication unit 31 receives voice data and smart speaker identification information, the voice recognition unit 32 converts the voice data into text information to generate a recognition result text (step S11). As a result, the control unit 34 acquires the recognition result text. In addition, when the communication unit 31 receives the voice data and the smart speaker identification information, the control unit 34 stores the smart speaker identification information received by the communication unit 31 in the storage unit 33 as a transmission destination of the processing result data (voice data).

制御部３４は、認識結果テキストを取得すると、認識結果テキストから特定のコマンドを認識できるか否かを判定する（ステップＳ１２）。換言すると、制御部３４は、認識結果テキストに特定のコマンドを示す文字列が含まれるか否かを判定する。例えば、制御部３４は、記憶部３３に記憶されているキーワード群を参照して、認識結果テキストに特定のコマンドを示す文字列が含まれるか否かを判定する。あるいは、制御部３４は、記憶部３３に記憶されているコーパスを用いた意図推定処理により、認識結果テキストに特定のコマンドを示す文字列が含まれるか否かを判定する。 When acquiring the recognition result text, the control unit 34 determines whether or not a specific command can be recognized from the recognition result text (step S12). In other words, the control unit 34 determines whether or not the recognition result text includes a character string indicating a specific command. For example, the control unit 34 refers to a keyword group stored in the storage unit 33 to make this determination. Alternatively, the control unit 34 makes this determination by intention estimation processing using a corpus stored in the storage unit 33.

制御部３４は、認識結果テキストから特定のコマンドを認識できると判定すると（ステップＳ１２のＹｅｓ）、認識した特定のコマンドを記憶部３３に保存する（ステップＳ１３）。 When determining that the specific command can be recognized from the recognition result text (Yes in Step S12), the control unit 34 stores the recognized specific command in the storage unit 33 (Step S13).

制御部３４は、認識した特定のコマンドを記憶部３３に保存すると、記憶部３３に記憶されている管理テーブル３３１を参照して、認識結果テキストから指定キーワードを認識できるか否かを判定する（ステップＳ１４）。換言すると、制御部３４は、認識結果テキストに指定キーワードを示す文字列が含まれるか否かを判定する。 After storing the recognized specific command in the storage unit 33, the control unit 34 refers to the management table 331 stored in the storage unit 33 and determines whether or not a designated keyword can be recognized from the recognition result text (step S14). In other words, the control unit 34 determines whether or not the recognition result text includes a character string indicating a designated keyword.

制御部３４は、認識結果テキストから指定キーワードを認識できると判定すると（ステップＳ１４のＹｅｓ）、認識した指定キーワードに対応するスマートスピーカ識別情報を、処理結果データ（音声データ）の送信先として記憶部３３に保存する（ステップＳ１５）。 When determining that the designated keyword can be recognized from the recognition result text (Yes in step S14), the control unit 34 stores the smart speaker identification information corresponding to the recognized designated keyword in the storage unit 33 as a transmission destination of the processing result data (voice data) (step S15).

制御部３４は、スマートスピーカ識別情報を記憶部３３に保存すると、記憶部３３に保存した特定のコマンドに対応する処理結果データ（音声データ）を取得する（ステップＳ１６）。あるいは、制御部３４は、認識結果テキストから指定キーワードを認識できないと判定すると（ステップＳ１４のＮｏ）、記憶部３３に保存した特定のコマンドに対応する処理結果データ（音声データ）を取得する（ステップＳ１６）。具体的には、制御部３４は、特定のコマンドに対応する処理を実行して、処理結果データを取得する。あるいは、制御部３４は、他のサーバに、特定のコマンドに対応する処理の実行を要求して、他のサーバから処理結果データを取得する。 After storing the smart speaker identification information in the storage unit 33, the control unit 34 acquires processing result data (voice data) corresponding to the specific command stored in the storage unit 33 (step S16). Likewise, when determining that the designated keyword cannot be recognized from the recognition result text (No in step S14), the control unit 34 acquires the processing result data (voice data) corresponding to the specific command stored in the storage unit 33 (step S16). Specifically, the control unit 34 executes the processing corresponding to the specific command and acquires the processing result data. Alternatively, the control unit 34 requests another server to execute the processing corresponding to the specific command and acquires the processing result data from that server.

制御部３４は、処理結果データを取得すると、処理結果データの送信先として記憶部３３に保存したスマートスピーカ識別情報を参照して、通信部３１に処理結果データ（音声データ）を送信させ（ステップＳ１７）、図６に示す動作を終了する。詳しくは、認識結果テキストから指定キーワードを認識できた場合（ステップＳ１４のＹｅｓ）、サーバ３は、音声送信スマートスピーカと指定スマートスピーカとに処理結果データを送信する。一方、認識結果テキストから指定キーワードを認識できない場合（ステップＳ１４のＮｏ）、サーバ３は、音声送信スマートスピーカに処理結果データを送信する。 When acquiring the processing result data, the control unit 34 refers to the smart speaker identification information stored in the storage unit 33 as the transmission destination of the processing result data, causes the communication unit 31 to transmit the processing result data (voice data) (step S17), and ends the operation shown in FIG. 6. Specifically, when the designated keyword can be recognized from the recognition result text (Yes in step S14), the server 3 transmits the processing result data to the voice transmission smart speaker and the designated smart speaker. On the other hand, when the designated keyword cannot be recognized from the recognition result text (No in step S14), the server 3 transmits the processing result data only to the voice transmission smart speaker.

また、制御部３４は、認識結果テキストから特定のコマンドを認識できないと判定すると（ステップＳ１２のＮｏ）、エラーフラグをＯＮにする（ステップＳ１８）。制御部３４は、エラーフラグをＯＮにすると、エラーメッセージを示す音声データを通信部３１に送信させ（ステップＳ１７）、図６に示す動作を終了する。詳しくは、サーバ３は、音声送信スマートスピーカにエラーメッセージ（音声データ）を送信する。エラーメッセージは、コマンドを認識できない旨を示す。 If the control unit 34 determines that the specific command cannot be recognized from the recognition result text (No in step S12), the control unit 34 turns on the error flag (step S18). When the error flag is set to ON, the control unit 34 causes the communication unit 31 to transmit voice data indicating an error message (step S17), and ends the operation shown in FIG. 6. Specifically, the server 3 transmits the error message (voice data) to the voice transmission smart speaker. The error message indicates that the command cannot be recognized.
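The flow of steps S12 to S18 can be illustrated with a minimal Python sketch. It assumes speech recognition (step S11) has already produced the recognition result text, and it uses simple keyword matching rather than intention estimation. The table contents, the identifiers such as `speaker-21b`, and the names `handle_request` and `Result` are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass

# Hypothetical stand-in for management table 331: designated keyword -> speaker ID.
MANAGEMENT_TABLE = {
    "point a": "speaker-21a",
    "point b": "speaker-21b",
    "point c": "speaker-21c",
}
# Hypothetical keyword group used to recognize a specific command.
COMMAND_KEYWORDS = {"search": "SEARCH"}


@dataclass
class Result:
    destinations: list   # smart speaker IDs that receive the payload
    payload: str         # stands in for the processing result (voice data)
    error: bool = False  # mirrors the error flag of step S18


def handle_request(sender_id: str, recognized_text: str) -> Result:
    """Mirror steps S12 to S18 for an already recognized text."""
    destinations = [sender_id]  # the sending speaker is always a destination
    text = recognized_text.lower()

    # Step S12: can a specific command be recognized?
    command = next((c for kw, c in COMMAND_KEYWORDS.items() if kw in text), None)
    if command is None:
        # Step S18: error flag ON; the error message goes to the sender only.
        return Result(destinations, "error: command not recognized", error=True)

    # Steps S14 and S15: add the speaker named by a designated keyword, if any.
    for keyword, speaker_id in MANAGEMENT_TABLE.items():
        if keyword in text and speaker_id not in destinations:
            destinations.append(speaker_id)

    # Step S16: obtain the processing result (placeholder string here).
    return Result(destinations, f"result of {command}")
```

In this sketch, an utterance such as "search weather for point B" sent from `speaker-21a` yields both `speaker-21a` and `speaker-21b` as destinations, whereas the same utterance without "point B" yields only the sender.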

以上、図１〜図６を参照して、本発明の実施形態１について説明した。本実施形態によれば、音声データをサーバに送信したスマートスピーカ（処理ユニット）に加えて、音声データをサーバに送信したスマートスピーカ（処理ユニット）以外のスマートスピーカ（処理ユニット）も処理の結果を受信することができる。例えば、第１スマートスピーカ２１ａのユーザが起動コマンドを示す音声を発声した後、所定の期間内に、検索キーワードに基づく検索の実行を促す音声と、Ｂ地点を示す音声とを発声すると、検索キーワードに基づく検索結果を示す音声が、第１スマートスピーカ２１ａ及び第２スマートスピーカ２１ｂから出力される。 The first embodiment of the present invention has been described above with reference to FIGS. 1 to 6. According to the present embodiment, in addition to the smart speaker (processing unit) that transmitted the voice data to the server, a smart speaker (processing unit) other than the one that transmitted the voice data can also receive the processing result. For example, if the user of the first smart speaker 21a utters a voice indicating an activation command and then, within a predetermined period, utters a voice prompting execution of a search based on a search keyword and a voice indicating point B, a voice indicating the search result based on the search keyword is output from the first smart speaker 21a and the second smart speaker 21b.

なお、本実施形態において、サーバ３の記憶部３３は、第１スマートスピーカ２１ａ〜第３スマートスピーカ２１ｃのそれぞれの指定キーワードを記憶したが、第１スマートスピーカ２１ａ〜第３スマートスピーカ２１ｃのそれぞれの指定キーワードに加えて、第１スマートスピーカ２１ａ〜第３スマートスピーカ２１ｃの全てを指定する指定キーワードを更に記憶してもよい。例えば、第１スマートスピーカ２１ａ〜第３スマートスピーカ２１ｃの全てを指定する指定キーワードは、「オール（ＡＬＬ）」であり得る。この場合、例えば、第１スマートスピーカ２１ａのユーザが起動コマンドを示す音声を発声した後、所定の期間内に、検索キーワードに基づく検索の実行を促す音声と、「オール」を示す音声とを発声すると、検索キーワードに基づく検索結果を示す音声が、第１スマートスピーカ２１ａ〜第３スマートスピーカ２１ｃから出力される。 In the present embodiment, the storage unit 33 of the server 3 stores a designated keyword for each of the first smart speaker 21a to the third smart speaker 21c. In addition to these individual designated keywords, the storage unit 33 may further store a designated keyword that designates all of the first smart speaker 21a to the third smart speaker 21c. For example, the designated keyword that designates all of the first smart speaker 21a to the third smart speaker 21c may be "ALL". In this case, for example, if the user of the first smart speaker 21a utters a voice indicating an activation command and then, within a predetermined period, utters a voice prompting execution of a search based on a search keyword and a voice indicating "ALL", a voice indicating the search result based on the search keyword is output from the first smart speaker 21a to the third smart speaker 21c.
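As a rough illustration of this variation, the following Python sketch maps each designated keyword to a list of smart speakers, so that a single entry such as "all" can designate every speaker. The keyword strings, the speaker identifiers, and the function name `resolve_destinations` are assumptions made for illustration. Whole-word matching is also an assumption, chosen here so that "all" is not matched inside a word such as "call".

```python
import re

# Hypothetical speaker IDs for smart speakers 21a to 21c.
SPEAKERS = ["speaker-21a", "speaker-21b", "speaker-21c"]

# Hypothetical designated keywords; "all" designates every speaker.
DESIGNATED_KEYWORDS = {
    "point a": ["speaker-21a"],
    "point b": ["speaker-21b"],
    "point c": ["speaker-21c"],
    "all": list(SPEAKERS),
}


def resolve_destinations(sender_id: str, recognized_text: str) -> list:
    """Return the sender plus every speaker named by a designated keyword."""
    text = recognized_text.lower()
    destinations = [sender_id]
    for keyword, speaker_ids in DESIGNATED_KEYWORDS.items():
        # Match the keyword as a whole word or phrase only.
        if re.search(rf"\b{re.escape(keyword)}\b", text):
            for speaker_id in speaker_ids:
                if speaker_id not in destinations:
                    destinations.append(speaker_id)
    return destinations
```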

［実施形態２］ [Embodiment 2]
続いて図７〜図１２を参照して本発明の実施形態２について説明する。但し、実施形態１と異なる事項を説明し、実施形態１と同じ事項についての説明は割愛する。実施形態２は、情報処理システム１がウエブ会議システムである点で実施形態１と異なる。 Next, Embodiment 2 of the present invention will be described with reference to FIGS. 7 to 12. Only matters that differ from the first embodiment are described, and descriptions of matters that are the same as in the first embodiment are omitted. The second embodiment differs from the first embodiment in that the information processing system 1 is a web conference system.

図７は、実施形態２に係る情報処理システム１の構成を示す図である。図７に示すように、情報処理システム１（ウエブ会議システム）は、第１処理ユニット２ａ〜第３処理ユニット２ｃと、第１サーバ３と、第２サーバ４とを備える。なお、第１サーバ３は、実施形態１において説明したサーバ３に対応する。 FIG. 7 is a diagram illustrating a configuration of the information processing system 1 according to the second embodiment. As shown in FIG. 7, the information processing system 1 (web conference system) includes a first processing unit 2 a to a third processing unit 2 c, a first server 3, and a second server 4. The first server 3 corresponds to the server 3 described in the first embodiment.

本実施形態において、第１処理ユニット２ａは、第１スマートスピーカ２１ａと、第１端末２２ａと、第１表示装置２３ａとを含む。第２処理ユニット２ｂは、第２スマートスピーカ２１ｂと、第２端末２２ｂと、第２表示装置２３ｂとを含む。第３処理ユニット２ｃは、第３スマートスピーカ２１ｃと、第３端末２２ｃと、第３表示装置２３ｃとを含む。第１スマートスピーカ２１ａ〜第３スマートスピーカ２１ｃはそれぞれ音声入出力端末の一例であり、第１端末２２ａ〜第３端末２２ｃはそれぞれ情報処理端末の一例である。 In the present embodiment, the first processing unit 2a includes a first smart speaker 21a, a first terminal 22a, and a first display device 23a. The second processing unit 2b includes a second smart speaker 21b, a second terminal 22b, and a second display device 23b. The third processing unit 2c includes a third smart speaker 21c, a third terminal 22c, and a third display device 23c. The first smart speaker 21a to the third smart speaker 21c are examples of voice input / output terminals, and the first terminal 22a to the third terminal 22c are examples of information processing terminals.

また、本実施形態において、第１スマートスピーカ２１ａ及び第１表示装置２３ａは、第１端末２２ａの周辺装置であり、第２スマートスピーカ２１ｂ及び第２表示装置２３ｂは、第２端末２２ｂの周辺装置であり、第３スマートスピーカ２１ｃ及び第３表示装置２３ｃは、第３端末２２ｃの周辺装置である。 In the present embodiment, the first smart speaker 21a and the first display device 23a are peripheral devices of the first terminal 22a, and the second smart speaker 21b and the second display device 23b are peripheral devices of the second terminal 22b. The third smart speaker 21c and the third display device 23c are peripheral devices of the third terminal 22c.

本実施形態において、第１サーバ３は、例えばインターネット回線を介して、第１スマートスピーカ２１ａ〜第３スマートスピーカ２１ｃの各々との間で通信を行う。更に、第１サーバ３は、例えばインターネット回線を介して、第１端末２２ａ〜第３端末２２ｃの各々との間で通信を行う。 In the present embodiment, the first server 3 communicates with each of the first smart speaker 21a to the third smart speaker 21c via, for example, the Internet line. Furthermore, the first server 3 communicates with each of the first terminal 22a to the third terminal 22c via, for example, the Internet line.

第１サーバ３は、第１スマートスピーカ２１ａ〜第３スマートスピーカ２１ｃから受信した音声データに基づいて処理結果データを取得する。本実施形態において、処理結果データは、音声データ又は端末用コマンドである。 The first server 3 acquires processing result data based on the audio data received from the first smart speaker 21a to the third smart speaker 21c. In the present embodiment, the processing result data is voice data or a terminal command.

第１サーバ３は、処理結果データが音声データである場合、音声送信スマートスピーカに処理結果データ（音声データ）を送信する。更に、実施形態１において説明したように、第１サーバ３は、音声送信スマートスピーカ以外のスマートスピーカにも処理結果データ（音声データ）を送信する。詳しくは、第１サーバ３は、受信した音声データに指定キーワードが含まれる場合、音声送信スマートスピーカと指定スマートスピーカとに処理結果データ（音声データ）を送信する。 When the processing result data is audio data, the first server 3 transmits the processing result data (audio data) to the audio transmission smart speaker. Furthermore, as described in the first embodiment, the first server 3 transmits the processing result data (voice data) to smart speakers other than the voice transmission smart speaker. Specifically, when the specified keyword is included in the received voice data, the first server 3 transmits the processing result data (voice data) to the voice transmission smart speaker and the specified smart speaker.

第１サーバ３は、処理結果データが端末用コマンドである場合、音声送信スマートスピーカに接続している端末に処理結果データ（端末用コマンド）を送信する。以下、音声送信スマートスピーカに接続している端末を「接続端末」と記載する場合がある。 When the processing result data is a terminal command, the first server 3 transmits the processing result data (terminal command) to a terminal connected to the voice transmission smart speaker. Hereinafter, a terminal connected to the voice transmission smart speaker may be referred to as a “connection terminal”.

詳しくは、第１サーバ３は、第１スマートスピーカ２１ａ〜第３スマートスピーカ２１ｃと第１端末２２ａ〜第３端末２２ｃとの対応関係を記憶している。第１サーバ３は、第１端末２２ａ〜第３端末２２ｃから定期的に要求信号を受信する。要求信号は、第１サーバ３に対し、処理結果データ（端末用コマンド）の送信を要求する。第１サーバ３は、処理結果データが端末用コマンドである場合、接続端末から要求信号を受信すると、接続端末へ処理結果データ（端末用コマンド）を送信する。 Specifically, the first server 3 stores a correspondence relationship between the first smart speaker 21a to the third smart speaker 21c and the first terminal 22a to the third terminal 22c. The first server 3 periodically receives request signals from the first terminal 22a to the third terminal 22c. The request signal requests the first server 3 to transmit processing result data (terminal command). When the processing result data is a terminal command, the first server 3 transmits the processing result data (terminal command) to the connection terminal when receiving a request signal from the connection terminal.

更に、第１サーバ３は、接続端末以外の端末にも処理結果データ（端末用コマンド）を送信する。詳しくは、第１サーバ３は、受信した音声データに指定キーワードが含まれる場合、接続端末と、指定スマートスピーカに接続している端末とに、処理結果データ（端末用コマンド）を送信する。なお、以下の説明において、指定スマートスピーカに接続している端末を「指定端末」と記載する場合がある。 Further, the first server 3 transmits processing result data (terminal command) to terminals other than the connection terminal. Specifically, when the designated keyword is included in the received voice data, the first server 3 transmits processing result data (terminal command) to the connection terminal and the terminal connected to the designated smart speaker. In the following description, a terminal connected to a designated smart speaker may be referred to as a “designated terminal”.

第２サーバ４は、例えばインターネット回線を介して、第１端末２２ａ〜第３端末２２ｃの各々との間で通信を行うことにより、第１端末２２ａ〜第３端末２２ｃの間でウエブ会議を実行させる。具体的には、第２サーバ４は、第１端末２２ａから受信した音声データ及び撮像データを、第２端末２２ｂ及び第３端末２２ｃへ送信する。同様に、第２サーバ４は、第２端末２２ｂから受信した音声データ及び撮像データを、第１端末２２ａ及び第３端末２２ｃへ送信する。また、第２サーバ４は、第３端末２２ｃから受信した音声データ及び撮像データを、第１端末２２ａ及び第２端末２２ｂへ送信する。 The second server 4 causes a web conference to be held among the first terminal 22a to the third terminal 22c by communicating with each of them via, for example, the Internet. Specifically, the second server 4 transmits the audio data and imaging data received from the first terminal 22a to the second terminal 22b and the third terminal 22c. Similarly, the second server 4 transmits the audio data and imaging data received from the second terminal 22b to the first terminal 22a and the third terminal 22c. Further, the second server 4 transmits the audio data and imaging data received from the third terminal 22c to the first terminal 22a and the second terminal 22b.
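The relay behavior of the second server 4 amounts to forwarding each terminal's data to every other terminal. A minimal Python sketch, in which the terminal identifiers and the function name `relay` are hypothetical, could look like this:

```python
def relay(sender: str, media: bytes, terminals: list) -> dict:
    """Return a mapping of receiving terminal -> media, excluding the sender."""
    return {terminal: media for terminal in terminals if terminal != sender}
```

For example, data received from a terminal identified as `terminal-22a` would be delivered to `terminal-22b` and `terminal-22c` but not echoed back to `terminal-22a`.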

続いて図７を参照して、第１処理ユニット２ａに含まれる第１スマートスピーカ２１ａ、第１端末２２ａ、及び第１表示装置２３ａの動作について説明する。 Next, operations of the first smart speaker 21a, the first terminal 22a, and the first display device 23a included in the first processing unit 2a will be described with reference to FIG.

第１スマートスピーカ２１ａは、ユーザ音声に対応する音声データを第１端末２２ａへ送信する。また、第１スマートスピーカ２１ａは、第１端末２２ａから音声データを受信する。第１スマートスピーカ２１ａは、第１端末２２ａから受信した音声データに対応する音声を出力する。更に、第１スマートスピーカ２１ａは撮像部を備え、撮像データを第１端末２２ａへ送信する。 The first smart speaker 21a transmits voice data corresponding to the user voice to the first terminal 22a. The first smart speaker 21a receives audio data from the first terminal 22a. The first smart speaker 21a outputs a sound corresponding to the sound data received from the first terminal 22a. Further, the first smart speaker 21a includes an imaging unit, and transmits imaging data to the first terminal 22a.

また、第１スマートスピーカ２１ａは、レディ状態においても、音声データ及び撮像データを第１端末２２ａへ送信し、第１端末２２ａから音声データを受信する。更に、実施形態１において説明したように、第１スマートスピーカ２１ａがレディ状態となってから所定の期間が経過するまでの間にユーザが音声を発声すると、第１スマートスピーカ２１ａは、ユーザ音声に対応する音声データを第１サーバ３へ送信する。 Even in the ready state, the first smart speaker 21a transmits audio data and imaging data to the first terminal 22a and receives audio data from the first terminal 22a. Further, as described in the first embodiment, when the user utters a voice after the first smart speaker 21a enters the ready state and before a predetermined period elapses, the first smart speaker 21a transmits voice data corresponding to the user voice to the first server 3.

第１端末２２ａは、第１スマートスピーカ２１ａから受信した音声データ及び撮像データを第２サーバ４へ送信する。また、第１端末２２ａは、第２サーバ４から音声データ及び撮像データを受信する。第１端末２２ａは、第２サーバ４から受信した音声データを第１スマートスピーカ２１ａへ送信する。また、第１端末２２ａは、第２サーバ４から受信した撮像データを第１表示装置２３ａに出力する。第１表示装置２３ａは、第１端末２２ａから入力された撮像データに対応する映像を表示する。 The first terminal 22 a transmits the audio data and imaging data received from the first smart speaker 21 a to the second server 4. In addition, the first terminal 22 a receives audio data and imaging data from the second server 4. The first terminal 22a transmits the audio data received from the second server 4 to the first smart speaker 21a. In addition, the first terminal 22a outputs the imaging data received from the second server 4 to the first display device 23a. The first display device 23a displays a video corresponding to the imaging data input from the first terminal 22a.

更に、第１端末２２ａは、第１サーバ３へ要求信号を送信して、第１サーバ３に対し処理結果データ（端末用コマンド）の送信を要求する。第１端末２２ａは、第１サーバ３から端末用コマンドを受信すると、受信した端末用コマンドに対応する処理を実行する。 Further, the first terminal 22a transmits a request signal to the first server 3 and requests the first server 3 to transmit processing result data (terminal command). When receiving the terminal command from the first server 3, the first terminal 22 a executes a process corresponding to the received terminal command.

なお、第２処理ユニット２ｂに含まれる第２スマートスピーカ２１ｂ、第２端末２２ｂ、及び第２表示装置２３ｂの動作は、第１処理ユニット２ａに含まれる第１スマートスピーカ２１ａ、第１端末２２ａ、及び第１表示装置２３ａの動作と同様であるため、その説明は省略する。また、第３処理ユニット２ｃに含まれる第３スマートスピーカ２１ｃ、第３端末２２ｃ、及び第３表示装置２３ｃの動作は、第１処理ユニット２ａに含まれる第１スマートスピーカ２１ａ、第１端末２２ａ、及び第１表示装置２３ａの動作と同様であるため、その説明は省略する。 The operations of the second smart speaker 21b, the second terminal 22b, and the second display device 23b included in the second processing unit 2b are the same as the first smart speaker 21a, the first terminal 22a, Since the operation is the same as that of the first display device 23a, the description thereof is omitted. The operations of the third smart speaker 21c, the third terminal 22c, and the third display device 23c included in the third processing unit 2c are the same as the first smart speaker 21a, the first terminal 22a, Since the operation is the same as that of the first display device 23a, the description thereof is omitted.

続いて図７及び図８を参照して、実施形態２に係る第１スマートスピーカ２１ａの構成を説明する。図８は、実施形態２に係る第１スマートスピーカ２１ａの構成を示す図である。 Next, the configuration of the first smart speaker 21a according to the second embodiment will be described with reference to FIGS. 7 and 8. FIG. 8 is a diagram illustrating the configuration of the first smart speaker 21a according to the second embodiment.

図８に示すように、第１スマートスピーカ２１ａは、音声入力部２１１、音声出力部２１２、第１通信部２１３、記憶部２１４、制御部２１５、撮像部２１６、及び第２通信部２１７を備える。なお、第１通信部２１３は、実施形態１において説明した通信部２１３に対応する。 As shown in FIG. 8, the first smart speaker 21a includes an audio input unit 211, an audio output unit 212, a first communication unit 213, a storage unit 214, a control unit 215, an imaging unit 216, and a second communication unit 217. The first communication unit 213 corresponds to the communication unit 213 described in the first embodiment.

本実施形態において、制御部２１５は音声データ生成部の一例である。また、第１通信部２１３は音声データ送信部の一例であるとともに、第１処理結果データ受信部の一例である。 In the present embodiment, the control unit 215 is an example of an audio data generation unit. The first communication unit 213 is an example of an audio data transmission unit and an example of a first processing result data reception unit.

撮像部２１６は、第１スマートスピーカ２１ａの周辺環境を撮像して画像信号（アナログ電気信号）を出力する。例えば、撮像部２１６は、ＣＣＤ（Ｃｈａｒｇｅ−ＣｏｕｐｌｅｄＤｅｖｉｃｅ）のような撮像素子を備える。 The imaging unit 216 images the surrounding environment of the first smart speaker 21a and outputs an image signal (analog electrical signal). For example, the imaging unit 216 includes an imaging element such as a CCD (Charge-Coupled Device).

第２通信部２１７は、第１端末２２ａとの間の通信を制御する。第２通信部２１７は、例えば、Ｂｌｕｅｔｏｏｔｈ（登録商標）のような近距離無線通信規格に準じた無線通信モジュールを備える。あるいは、第２通信部２１７は、例えば、ＵＳＢ（ＵｎｉｖｅｒｓａｌＳｅｒｉａｌＢｕｓ）端子を備えるＵＳＢインターフェースであり得る。 The second communication unit 217 controls communication with the first terminal 22a. The second communication unit 217 includes a wireless communication module conforming to the short-range wireless communication standard such as Bluetooth (registered trademark). Alternatively, the second communication unit 217 can be, for example, a USB interface including a USB (Universal Serial Bus) terminal.

第２通信部２１７は、音声入力部２１１が入力した音声に対応する音声データを第１端末２２ａへ送信する。また、第２通信部２１７は、撮像部２１６から出力された画像信号に対応する撮像データを第１端末２２ａへ送信する。更に、第２通信部２１７は、第１端末２２ａから音声データを受信する。 The second communication unit 217 transmits audio data corresponding to the audio input by the audio input unit 211 to the first terminal 22a. In addition, the second communication unit 217 transmits imaging data corresponding to the image signal output from the imaging unit 216 to the first terminal 22a. Further, the second communication unit 217 receives audio data from the first terminal 22a.

制御部２１５は、音声入力部２１１から入力されたアナログ電気信号（ユーザ音声）をデジタル信号（音声データ）に変換して、第２通信部２１７にデジタル信号を送信させる。また、制御部２１５は、撮像部２１６から入力された画像信号（アナログ電気信号）をデジタル信号（撮像データ）に変換して、第２通信部２１７にデジタル信号を送信させる。 The control unit 215 converts the analog electrical signal (user voice) input from the voice input unit 211 into a digital signal (voice data), and causes the second communication unit 217 to transmit the digital signal. The control unit 215 converts the image signal (analog electrical signal) input from the imaging unit 216 into a digital signal (imaging data), and causes the second communication unit 217 to transmit the digital signal.

更に、制御部２１５は、第２通信部２１７が受信したデジタル信号（音声データ）をアナログ電気信号に変換して、音声出力部２１２に音声を出力させる。また、実施形態１と同様に、制御部２１５は、第１通信部２１３が受信したデジタル信号（音声データ）をアナログ電気信号に変換して、音声出力部２１２に音声を出力させる。したがって、本実施形態において、音声出力部２１２は、第１通信部２１３が第１サーバ３から受信した音声データに対応する音声に加えて、第２通信部２１７が第１端末２２ａから受信した音声データに対応する音声を出力する。 Further, the control unit 215 converts the digital signal (audio data) received by the second communication unit 217 into an analog electrical signal, and causes the audio output unit 212 to output audio. As in the first embodiment, the control unit 215 also converts the digital signal (audio data) received by the first communication unit 213 into an analog electrical signal, and causes the audio output unit 212 to output audio. Therefore, in the present embodiment, the audio output unit 212 outputs audio corresponding to the audio data that the second communication unit 217 received from the first terminal 22a, in addition to audio corresponding to the audio data that the first communication unit 213 received from the first server 3.

更に、制御部２１５は、レディ状態となってから所定の期間が経過する前に音声入力部２１１がユーザ音声を入力すると、ユーザ音声に対応する音声データを第２通信部２１７に送信させる一方で、ユーザ音声に対応する音声データの複製物を記憶部２１４に保存する。制御部２１５は、所定の期間が経過すると、記憶部２１４に保存されている音声データ（複製物）を第１通信部２１３に送信させる。 Furthermore, when the audio input unit 211 receives a user voice before the predetermined period has elapsed since entering the ready state, the control unit 215 causes the second communication unit 217 to transmit voice data corresponding to the user voice, while also storing a copy of that voice data in the storage unit 214. When the predetermined period has elapsed, the control unit 215 causes the first communication unit 213 to transmit the voice data (the copy) stored in the storage unit 214.
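The dual handling of voice data described here (always forward to the terminal, and additionally keep a copy for the first server during the ready window) can be sketched in Python as follows. The class name `ReadyWindowBuffer`, its methods, and the use of explicit timestamps are assumptions for illustration. In particular, leaving the ready state once the window has elapsed is an assumption of this sketch, and the lists merely record what would be sent over the two communication units.

```python
class ReadyWindowBuffer:
    """Hypothetical model of how control unit 215 routes voice data."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self.ready_since = None     # time at which the ready state began
        self.stored_copies = []     # stands in for storage unit 214
        self.sent_to_terminal = []  # via second communication unit 217
        self.sent_to_server = []    # via first communication unit 213

    def enter_ready(self, now: float) -> None:
        """Record that an activation command put the speaker in the ready state."""
        self.ready_since = now

    def _in_window(self, now: float) -> bool:
        return (self.ready_since is not None
                and now - self.ready_since < self.window_seconds)

    def on_voice(self, audio: bytes, now: float) -> None:
        # Voice data is always forwarded to the terminal for the web conference.
        self.sent_to_terminal.append(audio)
        # Within the ready window, a copy is also kept for the first server.
        if self._in_window(now):
            self.stored_copies.append(audio)

    def on_tick(self, now: float) -> None:
        # Once the window has elapsed, send any stored copies to the first server.
        if self.ready_since is None or self._in_window(now):
            return
        if self.stored_copies:
            self.sent_to_server.append(b"".join(self.stored_copies))
            self.stored_copies.clear()
        self.ready_since = None  # assumed: the ready state ends with the window
```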

以上、図７及び図８を参照して、第１スマートスピーカ２１ａの構成を説明した。なお、第２スマートスピーカ２１ｂ及び第３スマートスピーカ２１ｃの構成は、第１スマートスピーカ２１ａの構成と同様であるため、その説明は割愛する。 The configuration of the first smart speaker 21a has been described above with reference to FIGS. 7 and 8. Since the configurations of the second smart speaker 21b and the third smart speaker 21c are the same as that of the first smart speaker 21a, their description is omitted.

続いて図７及び図９を参照して、第１サーバ３の構成を説明する。図９は、実施形態２に係る第１サーバ３の構成を示す図である。図９に示すように、第１サーバ３は、通信部３１と、音声認識部３２と、記憶部３３と、制御部３４とを備える。 Next, the configuration of the first server 3 will be described with reference to FIGS. 7 and 9. FIG. 9 is a diagram illustrating a configuration of the first server 3 according to the second embodiment. As shown in FIG. 9, the first server 3 includes a communication unit 31, a voice recognition unit 32, a storage unit 33, and a control unit 34.

本実施形態において、記憶部３３は、第１管理テーブル３３１と、第２管理テーブル３３２とを記憶する。第１管理テーブル３３１は、実施形態１において説明した管理テーブル３３１に対応する。したがって、第１管理テーブル３３１には、指定キーワードが登録されている。第２管理テーブル３３２には、第１スマートスピーカ２１ａ〜第３スマートスピーカ２１ｃと第１端末２２ａ〜第３端末２２ｃとの対応関係が登録されている。 In the present embodiment, the storage unit 33 stores a first management table 331 and a second management table 332. The first management table 331 corresponds to the management table 331 described in the first embodiment. Therefore, the designated keyword is registered in the first management table 331. In the second management table 332, correspondence relationships between the first smart speaker 21a to the third smart speaker 21c and the first terminal 22a to the third terminal 22c are registered.

また、記憶部３３は、端末用記憶領域３３３を有する。本実施形態において、端末用記憶領域３３３は、第１記憶領域３３３ａと、第２記憶領域３３３ｂと、第３記憶領域３３３ｃとを含む。第１記憶領域３３３ａは、第１端末２２ａに送信する端末用コマンド（処理結果データ）を記憶する領域である。同様に、第２記憶領域３３３ｂは、第２端末２２ｂに送信する端末用コマンド（処理結果データ）を記憶する領域であり、第３記憶領域３３３ｃは、第３端末２２ｃに送信する端末用コマンド（処理結果データ）を記憶する領域である。 In addition, the storage unit 33 includes a terminal storage area 333. In the present embodiment, the terminal storage area 333 includes a first storage area 333a, a second storage area 333b, and a third storage area 333c. The first storage area 333a is an area for storing a terminal command (processing result data) to be transmitted to the first terminal 22a. Similarly, the second storage area 333b is an area for storing a terminal command (processing result data) to be transmitted to the second terminal 22b, and the third storage area 333c is an area for storing a terminal command (processing result data) to be transmitted to the third terminal 22c.

制御部３４は、通信部３１に処理結果データを送信させる。具体的には、制御部３４は、処理結果データが音声データであるか端末用コマンドであるかを判定する。制御部３４は、認識結果テキストに指定キーワードを示す文字列が含まれているか否かを示す判定結果と、処理結果データが音声データであるか端末用コマンドであるかを示す判定結果とに基づいて、処理結果データの送信先を決定する。 The control unit 34 causes the communication unit 31 to transmit processing result data. Specifically, the control unit 34 determines whether the processing result data is voice data or a terminal command. The control unit 34 determines the transmission destination of the processing result data based on two determination results: whether or not the recognition result text includes a character string indicating a designated keyword, and whether the processing result data is voice data or a terminal command.

詳しくは、処理結果データが音声データであり、認識結果テキストに指定キーワードを示す文字列が含まれていない場合、通信部３１は、実施形態１と同様に、音声送信スマートスピーカに処理結果データを送信する。また、処理結果データが音声データであり、認識結果テキストに指定キーワードを示す文字列が含まれている場合、通信部３１は、実施形態１と同様に、音声送信スマートスピーカと指定スマートスピーカとに処理結果データを送信する。 Specifically, when the processing result data is voice data and the recognition result text does not include a character string indicating a designated keyword, the communication unit 31 transmits the processing result data to the voice transmission smart speaker, as in the first embodiment. When the processing result data is voice data and the recognition result text includes a character string indicating a designated keyword, the communication unit 31 transmits the processing result data to the voice transmission smart speaker and the designated smart speaker, as in the first embodiment.

一方、処理結果データが端末用コマンドあり、認識結果テキストに指定キーワードを示す文字列が含まれていない場合、制御部３４は、第１記憶領域３３３ａ〜第３記憶領域３３３ｃのうち、接続端末に対応する記憶領域に端末用コマンドを記憶させる。通信部３１が接続端末から要求信号を受信すると、制御部３４が、接続端末に対応する記憶領域から端末用コマンドを読み出し、通信部３１が、記憶部３３から読み出された端末用コマンドを接続端末に送信する。 On the other hand, when the processing result data is a terminal command and the recognition result text does not include a character string indicating a designated keyword, the control unit 34 stores the terminal command in the storage area corresponding to the connection terminal among the first storage area 333a to the third storage area 333c. When the communication unit 31 receives a request signal from the connection terminal, the control unit 34 reads the terminal command from the storage area corresponding to the connection terminal, and the communication unit 31 transmits the terminal command read from the storage unit 33 to the connection terminal.

また、処理結果データが端末用コマンドあり、認識結果テキストに指定キーワードを示す文字列が含まれている場合、制御部３４は、第１記憶領域３３３ａ〜第３記憶領域３３３ｃのうち、接続端末に対応する記憶領域と、指定端末に対応する記憶領域とに端末用コマンドを記憶させる。通信部３１が接続端末から要求信号を受信すると、制御部３４が、接続端末に対応する記憶領域から端末用コマンドを読み出し、通信部３１が、記憶部３３から読み出された端末用コマンドを接続端末に送信する。また、通信部３１が指定端末から要求信号を受信すると、制御部３４が、指定端末に対応する記憶領域から端末用コマンドを読み出し、通信部３１が、記憶部３３から読み出された端末用コマンドを指定端末に送信する。 When the processing result data is a terminal command and the recognition result text includes a character string indicating a designated keyword, the control unit 34 stores the terminal command in both the storage area corresponding to the connection terminal and the storage area corresponding to the designated terminal among the first storage area 333a to the third storage area 333c. When the communication unit 31 receives a request signal from the connection terminal, the control unit 34 reads the terminal command from the storage area corresponding to the connection terminal, and the communication unit 31 transmits the terminal command read from the storage unit 33 to the connection terminal. Likewise, when the communication unit 31 receives a request signal from the designated terminal, the control unit 34 reads the terminal command from the storage area corresponding to the designated terminal, and the communication unit 31 transmits the terminal command read from the storage unit 33 to the designated terminal.
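The handling of terminal commands described above is a store-then-poll pattern: the first server writes the command into per-terminal storage areas and hands it over only when a terminal sends a request signal. The Python sketch below models this under several assumptions: the class name `TerminalCommandStore`, the identifiers, and the command string are hypothetical, each storage area is modeled as a FIFO queue, and a command is removed once it has been read (the specification does not state whether a command is deleted after being read).

```python
from collections import deque


class TerminalCommandStore:
    """Hypothetical model of terminal storage area 333 with per-terminal areas."""

    def __init__(self, speaker_to_terminal: dict):
        # Stands in for the speaker-to-terminal correspondence held by the server.
        self.speaker_to_terminal = dict(speaker_to_terminal)
        # One storage area (modeled as a queue) per terminal.
        self.areas = {t: deque() for t in self.speaker_to_terminal.values()}

    def store(self, sender_speaker: str, command: str, designated_speaker=None) -> None:
        """Store a command for the connection terminal and, if any, the designated terminal."""
        targets = [self.speaker_to_terminal[sender_speaker]]
        if designated_speaker is not None:
            designated_terminal = self.speaker_to_terminal[designated_speaker]
            if designated_terminal not in targets:
                targets.append(designated_terminal)
        for terminal in targets:
            self.areas[terminal].append(command)

    def on_request(self, terminal_id: str):
        """Handle a request signal: return the next stored command, or None."""
        area = self.areas[terminal_id]
        return area.popleft() if area else None
```

With this model, a command stored for a sender at `speaker-21a` with `speaker-21b` designated is returned to the terminals paired with those two speakers on their next request, while the third terminal receives nothing.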

続いて図１０（ａ）及び図１０（ｂ）を参照して、第１管理テーブル３３１及び第２管理テーブル３３２を説明する。図１０（ａ）は、実施形態２に係る第１管理テーブル３３１を示す図である。図１０（ｂ）は、実施形態２に係る第２管理テーブル３３２を示す図である。図１０（ａ）に示すように、第１管理テーブル３３１は、図４を参照して説明した管理テーブル３３１と同様に、スマートスピーカ登録欄４１と、指定キーワード登録欄４２とを有する。 Next, the first management table 331 and the second management table 332 will be described with reference to FIGS. 10 (a) and 10 (b). FIG. 10A illustrates the first management table 331 according to the second embodiment. FIG. 10B is a diagram illustrating the second management table 332 according to the second embodiment. As shown in FIG. 10A, the first management table 331 includes a smart speaker registration field 41 and a designated keyword registration field 42, as with the management table 331 described with reference to FIG.

図１０（ｂ）に示すように、第２管理テーブル３３２は、スマートスピーカ登録欄１０１と、端末登録欄１０２とを有する。スマートスピーカ登録欄１０１には、図１０（ａ）に示すスマートスピーカ登録欄４１と同様に、スマートスピーカ識別情報が登録される。端末登録欄１０２には、サーバ３との間で通信が可能な端末を識別する端末識別情報が登録される。第２管理テーブル３３２は、スマートスピーカ識別情報と端末識別情報とを関連付ける。 As shown in FIG. 10(b), the second management table 332 includes a smart speaker registration field 101 and a terminal registration field 102. In the smart speaker registration field 101, smart speaker identification information is registered in the same manner as in the smart speaker registration field 41 shown in FIG. 10(a). In the terminal registration field 102, terminal identification information for identifying a terminal that can communicate with the server 3 is registered. The second management table 332 associates smart speaker identification information with terminal identification information.

本実施形態では、端末登録欄１０２に、第１端末２２ａ〜第３端末２２ｃを識別する端末識別情報が登録される。端末識別情報は、ユーザが任意に決定して登録する。例えば、ユーザは、スマートスピーカ識別情報を登録する際に、スマートスピーカ識別情報に関連付けて端末識別情報を登録し得る。 In the present embodiment, terminal identification information for identifying the first terminal 22 a to the third terminal 22 c is registered in the terminal registration field 102. The terminal identification information is arbitrarily determined and registered by the user. For example, when registering smart speaker identification information, a user can register terminal identification information in association with smart speaker identification information.

なお、第１端末２２ａ〜第３端末２２ｃは、要求信号を第１サーバ３に送信する際に、自機の端末識別情報を送信する。第１端末２２ａ〜第３端末２２ｃはそれぞれ、自機の端末識別情報を記憶している。第１サーバ３は、要求信号と共に受信した端末識別情報に基づいて、要求信号を送信した端末に処理結果データを送信する。 The first terminal 22a to the third terminal 22c transmit their own terminal identification information when transmitting the request signal to the first server 3. Each of the first terminal 22a to the third terminal 22c stores its own terminal identification information. The first server 3 transmits the processing result data to the terminal that transmitted the request signal based on the terminal identification information received together with the request signal.

続いて図７及び図１１を参照して、第１端末２２ａの構成を説明する。図１１は、実施形態２に係る第１端末２２ａの構成を示す図である。図１１に示すように、第１端末２２ａは、第１通信部２２１、第２通信部２２２、出力部２２３、記憶部２２４、及び制御部２２５を備える。本実施形態において、第１端末２２ａは、ノート型ＰＣ（パーソナルコンピュータ）又はデスクトップ型ＰＣのような情報処理装置である。あるいは、第１端末２２ａは、タブレットＰＣ又はスマートフォンのような携帯型の情報処理装置である。 Next, the configuration of the first terminal 22a will be described with reference to FIGS. 7 and 11. FIG. 11 is a diagram illustrating the configuration of the first terminal 22a according to the second embodiment. As illustrated in FIG. 11, the first terminal 22a includes a first communication unit 221, a second communication unit 222, an output unit 223, a storage unit 224, and a control unit 225. In the present embodiment, the first terminal 22a is an information processing apparatus such as a notebook PC (personal computer) or a desktop PC. Alternatively, the first terminal 22a is a portable information processing apparatus such as a tablet PC or a smartphone.

第１通信部２２１は、第１サーバ３との間の通信を制御する。また、第１通信部２２１は、第２サーバ４との間の通信を制御する。第１通信部２２１は、例えば、ＬＡＮボード又は無線ＬＡＮボードを備える。本実施形態において、第１通信部２２１は、第２処理結果データ受信部の一例である。また、第１通信部２２１は、要求信号送信部の一例である。 The first communication unit 221 controls communication with the first server 3. The first communication unit 221 controls communication with the second server 4. The first communication unit 221 includes, for example, a LAN board or a wireless LAN board. In the present embodiment, the first communication unit 221 is an example of a second processing result data receiving unit. The first communication unit 221 is an example of a request signal transmission unit.

具体的には、第１通信部２２１は、要求信号と、図１０（ｂ）を参照して説明した端末識別情報とを第１サーバ３に送信する。また、第１通信部２２１は、第１サーバ３から処理結果データ（端末用コマンド）を受信する。 Specifically, the first communication unit 221 transmits a request signal and the terminal identification information described with reference to FIG. 10B to the first server 3. The first communication unit 221 receives processing result data (terminal command) from the first server 3.

更に、第１通信部２２１は、音声データ及び撮像データを第２サーバ４に送信する。また、第１通信部２２１は、音声データ及び撮像データを第２サーバ４から受信する。 Further, the first communication unit 221 transmits audio data and imaging data to the second server 4. Further, the first communication unit 221 receives audio data and imaging data from the second server 4.

第２通信部２２２は、第１スマートスピーカ２１ａとの間の通信を制御する。第２通信部２２２は、例えば、Ｂｌｕｅｔｏｏｔｈ（登録商標）のような近距離無線通信規格に準じた無線通信モジュールを備える。あるいは、第２通信部２２２は、例えば、ＵＳＢ端子を備えるＵＳＢインターフェースであり得る。 The second communication unit 222 controls communication with the first smart speaker 21a. The second communication unit 222 includes a wireless communication module that conforms to a near field communication standard such as Bluetooth (registered trademark). Alternatively, the second communication unit 222 may be a USB interface including a USB terminal, for example.

第２通信部２２２は、第１スマートスピーカ２１ａから音声データを受信する。また、第２通信部２２２は、第１スマートスピーカ２１ａへ音声データを送信する。更に、第２通信部２２２は、第１スマートスピーカ２１ａから撮像データを受信する。 The second communication unit 222 receives audio data from the first smart speaker 21a. The second communication unit 222 transmits audio data to the first smart speaker 21a. Further, the second communication unit 222 receives imaging data from the first smart speaker 21a.

出力部２２３は、撮像データを第１表示装置２３ａに出力する。出力部２２３は、例えば、ＨＤＭＩ（登録商標）端子又はＤｉｓｐｌａｙｐｏｒｔのようなデジタル映像インターフェースである。なお、出力部２２３は、Ｄ−ＳＵＢ端子のようなアナログ映像インターフェースであってもよい。 The output unit 223 outputs the imaging data to the first display device 23a. The output unit 223 is, for example, a digital video interface such as an HDMI (registered trademark) terminal or DisplayPort. Note that the output unit 223 may be an analog video interface such as a D-SUB terminal.

記憶部２２４は、例えばＲＡＭ及びＲＯＭのような半導体メモリーを備える。更に、記憶部２２４は、ＨＤＤのようなストレージデバイスを備える。記憶部２２４は、制御部２２５が実行する制御プログラムを記憶する。また、記憶部２２４は、図１０（ｂ）を参照して説明した端末識別情報を記憶する。本実施形態において、記憶部２２４は更に、ウエブ会議用アプリケーションソフトウエアを記憶する。 The storage unit 224 includes semiconductor memory such as a RAM and a ROM. The storage unit 224 further includes a storage device such as an HDD. The storage unit 224 stores a control program executed by the control unit 225. The storage unit 224 also stores the terminal identification information described with reference to FIG. 10B. In the present embodiment, the storage unit 224 further stores web conference application software.

制御部２２５は、例えばＣＰＵ又はＭＰＵのようなプロセッサを備える。また、制御部２２５は、記憶部２２４に記憶された制御プログラムに基づいて、第１端末２２ａの動作を制御する。 The control unit 225 includes a processor such as a CPU or MPU. In addition, the control unit 225 controls the operation of the first terminal 22a based on the control program stored in the storage unit 224.

具体的には、制御部２２５は、定期的に要求信号を生成し、第１通信部２２１を介して要求信号と端末識別情報とを第１サーバ３に送信する。例えば、制御部２２５は、３０秒ごと、又は１分ごとに、第１通信部２２１を介して要求信号と接続端末識別情報とを送信する。また、制御部２２５は、第１通信部２２１を介して第１サーバ３から処理結果データ（端末用コマンド）を受信すると、端末用コマンドに対応する処理を実行する。 Specifically, the control unit 225 periodically generates a request signal and transmits the request signal and the terminal identification information to the first server 3 via the first communication unit 221. For example, the control unit 225 transmits a request signal and connection terminal identification information via the first communication unit 221 every 30 seconds or every minute. In addition, when receiving the processing result data (terminal command) from the first server 3 via the first communication unit 221, the control unit 225 executes processing corresponding to the terminal command.

例えば、端末用コマンドは、ウエブ会議の終了を命令するコマンドであり得る。端末用コマンドがウエブ会議の終了を命令するコマンドである場合、制御部２２５は、ウエブ会議用アプリケーションソフトウエアの実行を停止する。あるいは、端末用コマンドは、印刷の実行を命令するコマンドであり得る。端末用コマンドが印刷の実行を命令するコマンドである場合、制御部２２５は、第１端末２２ａに接続されているプリンターに印刷を要求する。 For example, the terminal command may be a command for instructing the end of the web conference. When the terminal command is a command for instructing the end of the web conference, the control unit 225 stops the execution of the web conference application software. Alternatively, the terminal command may be a command for instructing execution of printing. When the terminal command is a command for instructing execution of printing, the control unit 225 requests the printer connected to the first terminal 22a to perform printing.
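
The handling of a received terminal command by the control unit 225 can be sketched, under stated assumptions, as a lookup from command to handler. The command strings `END_WEB_CONFERENCE` and `PRINT` and both function names are hypothetical; the embodiment does not define a command encoding, and the handlers here only record what a real terminal would do.

```python
from typing import Callable, Dict, List


def make_handlers(log: List[str]) -> Dict[str, Callable[[], None]]:
    """Build a hypothetical command-to-handler table for one terminal."""
    return {
        # Stand-in for stopping the web conference application software.
        "END_WEB_CONFERENCE": lambda: log.append("web conference application stopped"),
        # Stand-in for requesting printing from the connected printer.
        "PRINT": lambda: log.append("printing requested"),
    }


def execute_terminal_command(command: str, handlers: Dict[str, Callable[[], None]]) -> bool:
    """Run the handler for a terminal command; return False if it is unknown."""
    handler = handlers.get(command)
    if handler is None:
        return False
    handler()
    return True


if __name__ == "__main__":
    actions: List[str] = []
    execute_terminal_command("END_WEB_CONFERENCE", make_handlers(actions))
    print(actions)
```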

続いて図７及び図１１を参照して、第１端末２２ａの制御部２２５がウエブ会議用アプリケーションソフトウエアに基づいて実行する処理について説明する。制御部２２５は、ウエブ会議用アプリケーションソフトウエアを実行することにより、第２処理ユニット２ｂの第２端末２２ｂと第３処理ユニット２ｃの第３端末２２ｃとの間でウエブ会議を実行する。 Next, processing executed by the control unit 225 of the first terminal 22a based on the web conference application software will be described with reference to FIGS. 7 and 11. By executing the web conference application software, the control unit 225 conducts a web conference with the second terminal 22b of the second processing unit 2b and the third terminal 22c of the third processing unit 2c.

具体的には、制御部２２５は、第２通信部２２２を介して第１スマートスピーカ２１ａから受信した音声データ及び撮像データを、第１通信部２２１を介して第２サーバ４へ送信する。この結果、第２処理ユニット２ｂの第２スマートスピーカ２１ｂ、及び第３処理ユニット２ｃの第３スマートスピーカ２１ｃから、第１スマートスピーカ２１ａに入力された音声が出力される。また、第２処理ユニット２ｂの第２表示装置２３ｂ、及び第３処理ユニット２ｃの第３表示装置２３ｃが、第１スマートスピーカ２１ａによって撮像された映像を表示する。 Specifically, the control unit 225 transmits audio data and imaging data received from the first smart speaker 21 a via the second communication unit 222 to the second server 4 via the first communication unit 221. As a result, the sound input to the first smart speaker 21a is output from the second smart speaker 21b of the second processing unit 2b and the third smart speaker 21c of the third processing unit 2c. In addition, the second display device 23b of the second processing unit 2b and the third display device 23c of the third processing unit 2c display images captured by the first smart speaker 21a.

また、制御部２２５は、第１通信部２２１を介して第２サーバ４から受信した音声データを、第２通信部２２２を介して第１スマートスピーカ２１ａに送信する。この結果、第１スマートスピーカ２１ａから、第２処理ユニット２ｂの第２スマートスピーカ２１ｂによって入力された音声が出力される。また、第１スマートスピーカ２１ａから、第３処理ユニット２ｃの第３スマートスピーカ２１ｃによって入力された音声が出力される。 In addition, the control unit 225 transmits the audio data received from the second server 4 via the first communication unit 221 to the first smart speaker 21a via the second communication unit 222. As a result, the sound input by the second smart speaker 21b of the second processing unit 2b is output from the first smart speaker 21a. Likewise, the sound input by the third smart speaker 21c of the third processing unit 2c is output from the first smart speaker 21a.

また、制御部２２５は、第１通信部２２１を介して第２サーバ４から受信した撮像データを、出力部２２３を介して第１表示装置２３ａに出力する。この結果、第１表示装置２３ａが、第２処理ユニット２ｂの第２スマートスピーカ２１ｂによって撮像された映像、及び第３処理ユニット２ｃの第３スマートスピーカ２１ｃによって撮像された映像を表示する。 In addition, the control unit 225 outputs imaging data received from the second server 4 via the first communication unit 221 to the first display device 23a via the output unit 223. As a result, the first display device 23a displays the video imaged by the second smart speaker 21b of the second processing unit 2b and the video imaged by the third smart speaker 21c of the third processing unit 2c.

以上、図７及び図１１を参照して、第１端末２２ａの構成を説明した。なお、第２端末２２ｂ及び第３端末２２ｃの構成は第１端末２２ａの構成と同様であるため、その説明は省略する。 The configuration of the first terminal 22a has been described above with reference to FIGS. 7 and 11. Note that the configurations of the second terminal 22b and the third terminal 22c are the same as the configuration of the first terminal 22a, and thus description thereof is omitted.

続いて図７、図９、図１０（ａ）、図１０（ｂ）及び図１２を参照して、第１サーバ３の動作を説明する。図１２は、実施形態２に係る第１サーバ３の動作を示すフローチャートである。図１２に示す動作は、第１サーバ３の通信部３１が音声データ及びスマートスピーカ識別情報を受信するとスタートする。 Subsequently, the operation of the first server 3 will be described with reference to FIGS. 7, 9, 10 (a), 10 (b), and 12. FIG. 12 is a flowchart illustrating the operation of the first server 3 according to the second embodiment. The operation shown in FIG. 12 starts when the communication unit 31 of the first server 3 receives audio data and smart speaker identification information.

図１２に示すように、通信部３１が音声データ及びスマートスピーカ識別情報を受信すると、図６を参照して説明した動作と同様に、音声認識部３２が、音声データをテキスト情報に変換して、認識結果テキストを生成する（ステップＳ２１）。また、制御部３４は、通信部３１が音声データ及びスマートスピーカ識別情報を受信すると、処理結果データ（音声データ）の送信先として、通信部３１が受信したスマートスピーカ識別情報を記憶部３３に保存する。 As shown in FIG. 12, when the communication unit 31 receives voice data and smart speaker identification information, the voice recognition unit 32 converts the voice data into text information to generate a recognition result text, in the same manner as the operation described with reference to FIG. 6 (step S21). In addition, when the communication unit 31 receives the voice data and the smart speaker identification information, the control unit 34 stores the smart speaker identification information received by the communication unit 31 in the storage unit 33 as a transmission destination of the processing result data (voice data).

制御部３４は、認識結果テキストを取得すると、図６を参照して説明した動作と同様に、認識結果テキストから特定のコマンドを認識できるか否かを判定する（ステップＳ２２）。 When acquiring the recognition result text, the control unit 34 determines whether or not a specific command can be recognized from the recognition result text, similarly to the operation described with reference to FIG. 6 (step S22).

制御部３４は、認識結果テキストから特定のコマンドを認識できると判定すると（ステップＳ２２のＹｅｓ）、図６を参照して説明した動作と同様に、認識した特定のコマンドを記憶部３３に保存する（ステップＳ２３）。 When determining that a specific command can be recognized from the recognition result text (Yes in step S22), the control unit 34 stores the recognized specific command in the storage unit 33, in the same manner as the operation described with reference to FIG. 6 (step S23).

制御部３４は、認識した特定のコマンドを記憶部３３に保存すると、図６を参照して説明した動作と同様に、記憶部３３に記憶されている第１管理テーブル３３１を参照して、認識結果テキストから指定キーワードを認識できるか否かを判定する（ステップＳ２４）。 After storing the recognized specific command in the storage unit 33, the control unit 34 refers to the first management table 331 stored in the storage unit 33 and determines whether or not a designated keyword can be recognized from the recognition result text, in the same manner as the operation described with reference to FIG. 6 (step S24).

制御部３４は、認識結果テキストから指定キーワードを認識できると判定すると（ステップＳ２４のＹｅｓ）、図６を参照して説明した動作と同様に、認識した指定キーワードに対応するスマートスピーカ識別情報を、処理結果データ（音声データ）の送信先として記憶部３３に保存する（ステップＳ２５）。 When determining that the designated keyword can be recognized from the recognition result text (Yes in step S24), the control unit 34 stores the smart speaker identification information corresponding to the recognized designated keyword in the storage unit 33 as a transmission destination of the processing result data (voice data), in the same manner as the operation described with reference to FIG. 6 (step S25).

制御部３４は、スマートスピーカ識別情報を記憶部３３に保存すると、記憶部３３に保存した特定のコマンドが、処理結果データとして音声データを取得させるコマンドであるのか、処理結果データとして端末用コマンドを取得させるコマンドであるのかを判定する（ステップＳ２６）。あるいは、制御部３４は、認識結果テキストから指定キーワードを認識できないと判定すると（ステップＳ２４のＮｏ）、記憶部３３に保存した特定のコマンドが、処理結果データとして音声データを取得させるコマンドであるのか、処理結果データとして端末用コマンドを取得させるコマンドであるのかを判定する（ステップＳ２６）。 After storing the smart speaker identification information in the storage unit 33, the control unit 34 determines whether the specific command stored in the storage unit 33 is a command for acquiring voice data as the processing result data or a command for acquiring a terminal command as the processing result data (step S26). Likewise, when determining that the designated keyword cannot be recognized from the recognition result text (No in step S24), the control unit 34 determines whether the specific command stored in the storage unit 33 is a command for acquiring voice data as the processing result data or a command for acquiring a terminal command as the processing result data (step S26).

制御部３４は、特定のコマンドが音声データを取得させるコマンドであると判定すると（ステップＳ２６の「音声データ」）、特定のコマンドに対応する処理を実行して、音声データ（処理結果データ）を取得する（ステップＳ２７）。あるいは、制御部３４は、他のサーバに、特定のコマンドに対応する処理の実行を要求して、他のサーバから音声データ（処理結果データ）を取得する（ステップＳ２７）。 When determining that the specific command is a command for acquiring voice data (“voice data” in step S26), the control unit 34 executes processing corresponding to the specific command and acquires the voice data (processing result data) (step S27). Alternatively, the control unit 34 requests another server to execute the processing corresponding to the specific command and acquires the voice data (processing result data) from that server (step S27).

制御部３４は、音声データ（処理結果データ）を取得すると、処理結果データの送信先として記憶部３３に保存したスマートスピーカ識別情報を参照して、通信部３１に音声データ（処理結果データ）を送信させ（ステップＳ２８）、図１２に示す動作を終了する。詳しくは、認識結果テキストから指定キーワードを認識できた場合（ステップＳ２４のＹｅｓ）、第１サーバ３は、音声送信スマートスピーカと指定スマートスピーカとに音声データ（処理結果データ）を送信する。一方、認識結果テキストから指定キーワードを認識できない場合（ステップＳ２４のＮｏ）、第１サーバ３は、音声送信スマートスピーカに音声データ（処理結果データ）を送信する。 After acquiring the voice data (processing result data), the control unit 34 refers to the smart speaker identification information stored in the storage unit 33 as the transmission destination of the processing result data, causes the communication unit 31 to transmit the voice data (processing result data) (step S28), and ends the operation shown in FIG. 12. Specifically, when the designated keyword can be recognized from the recognition result text (Yes in step S24), the first server 3 transmits the voice data (processing result data) to the voice transmission smart speaker and the designated smart speaker. On the other hand, when the designated keyword cannot be recognized from the recognition result text (No in step S24), the first server 3 transmits the voice data (processing result data) to the voice transmission smart speaker.
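
The way the transmission destinations accumulate between the receipt of the voice data and step S28 can be sketched as follows. This is a minimal sketch under assumptions: the keyword strings, the identifier strings, and the shape of the first management table 331 (keyword mapped to a list of smart speaker IDs) are hypothetical, and plain substring matching stands in for whatever keyword recognition the server actually performs.

```python
# Hypothetical first management table 331: designated keyword -> smart speaker IDs.
FIRST_MANAGEMENT_TABLE = {
    "point a": ["speaker-A"],
    "point b": ["speaker-B"],
    "point c": ["speaker-C"],
    "all": ["speaker-A", "speaker-B", "speaker-C"],
}


def resolve_destinations(sender_id, recognized_text, table=FIRST_MANAGEMENT_TABLE):
    """Return the smart speaker IDs that should receive the processing result.

    The voice transmission smart speaker (sender) is always a destination.
    Speakers named by a designated keyword are added when one is recognized.
    """
    destinations = [sender_id]            # saved when the voice data is received
    text = recognized_text.lower()
    for keyword, speaker_ids in table.items():
        if keyword in text:               # step S24: Yes (naive substring match)
            for sid in speaker_ids:       # step S25: add designated speakers
                if sid not in destinations:
                    destinations.append(sid)
    return destinations                   # used as the send list in step S28


if __name__ == "__main__":
    print(resolve_destinations("speaker-A", "end the web conference point B"))
```

Keeping the sender in the list unconditionally mirrors the described behavior in which the voice transmission smart speaker receives the result whether or not a designated keyword is recognized.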

一方、制御部３４は、特定のコマンドが端末用コマンドを取得させるコマンドであると判定すると（ステップＳ２６の「コマンド」）、特定のコマンドに対応する処理を実行して、端末用コマンド（処理結果データ）を取得する（ステップＳ２９）。 On the other hand, when determining that the specific command is a command for acquiring a terminal command (“command” in step S26), the control unit 34 executes processing corresponding to the specific command and acquires the terminal command (processing result data) (step S29).

制御部３４は、端末用コマンド（処理結果データ）を取得すると、処理結果データの送信先として記憶部３３に保存したスマートスピーカ識別情報と、第２管理テーブル３３２とを参照して、第１記憶領域３３３ａ〜第３記憶領域３３３ｃのうちの少なくとも１つに端末用コマンドを記憶させる（ステップＳ３０）。 After acquiring the terminal command (processing result data), the control unit 34 refers to the smart speaker identification information stored in the storage unit 33 as the transmission destination of the processing result data and to the second management table 332, and stores the terminal command in at least one of the first storage area 333a to the third storage area 333c (step S30).

制御部３４は、端末用コマンドを端末用記憶領域３３３に記憶した後に、通信部３１が要求用信号を受信すると、端末用記憶領域３３３に記憶した端末用コマンド（処理結果データ）を通信部３１に送信させ（ステップＳ３１）、図１２に示す動作を終了する。詳しくは、認識結果テキストから指定キーワードを認識できた場合（ステップＳ２４のＹｅｓ）、第１サーバ３は、接続端末と指定端末とに端末用コマンド（処理結果データ）を送信する。一方、認識結果テキストから指定キーワードを認識できない場合（ステップＳ２４のＮｏ）、第１サーバ３は、接続端末に端末用コマンド（処理結果データ）を送信する。 After storing the terminal command in the terminal storage area 333, when the communication unit 31 receives a request signal, the control unit 34 causes the communication unit 31 to transmit the terminal command (processing result data) stored in the terminal storage area 333 (step S31), and ends the operation shown in FIG. 12. Specifically, when the designated keyword can be recognized from the recognition result text (Yes in step S24), the first server 3 transmits the terminal command (processing result data) to the connection terminal and the designated terminal. On the other hand, when the designated keyword cannot be recognized from the recognition result text (No in step S24), the first server 3 transmits the terminal command (processing result data) to the connection terminal.
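
Steps S30 and S31 amount to a per-terminal holding area that is filled by the server and emptied when the matching terminal sends a request signal. The sketch below models the first storage area 333a to the third storage area 333c as one queue per terminal. The class name, method names, and identifier strings are hypothetical, and removing a command once it has been delivered is an assumption; the embodiment does not state whether the stored command is retained.

```python
from collections import deque


class TerminalStorage:
    """Hypothetical model of the terminal storage area 333 (one area per terminal)."""

    def __init__(self, terminal_ids):
        self._areas = {tid: deque() for tid in terminal_ids}

    def store(self, terminal_ids, command):
        """Step S30: store the terminal command for each destination terminal.

        Raises KeyError if a terminal ID has no storage area.
        """
        for tid in terminal_ids:
            self._areas[tid].append(command)

    def on_request_signal(self, terminal_id):
        """Step S31: return the next stored command for this terminal, or None."""
        area = self._areas.get(terminal_id)
        if not area:                  # unknown terminal or nothing stored
            return None
        return area.popleft()         # assumption: a delivered command is removed


if __name__ == "__main__":
    storage = TerminalStorage(["terminal-A", "terminal-B", "terminal-C"])
    storage.store(["terminal-A", "terminal-B"], "END_WEB_CONFERENCE")
    print(storage.on_request_signal("terminal-B"))
```

Because the terminals initiate every exchange, this arrangement lets the server deliver commands without opening a connection to a terminal itself.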

また、制御部３４は、認識結果テキストから特定のコマンドを認識できないと判定すると（ステップＳ２２のＮｏ）、図６を参照して説明した動作と同様に、エラーフラグをＯＮにして（ステップＳ３２）、エラーメッセージを示す音声データを通信部３１に送信させ（ステップＳ２８）、図１２に示す動作を終了する。詳しくは、第１サーバ３は、音声送信スマートスピーカにエラーメッセージ（音声データ）を送信する。 When determining that a specific command cannot be recognized from the recognition result text (No in step S22), the control unit 34 sets the error flag to ON in the same manner as the operation described with reference to FIG. 6 (step S32), causes the communication unit 31 to transmit voice data indicating an error message (step S28), and ends the operation shown in FIG. 12. Specifically, the first server 3 transmits the error message (voice data) to the voice transmission smart speaker.
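
The three outcomes of the command determination in FIG. 12 (voice data, terminal command, or error) can be summarized in a small sketch. The command phrases, the result labels, and the function name are hypothetical, and substring matching is an assumed stand-in for the recognition of a specific command.

```python
# Hypothetical command phrases and the kind of processing result each one yields.
COMMAND_KINDS = {
    "weather": "voice_data",
    "end the web conference": "terminal_command",
    "print": "terminal_command",
}


def classify_command(recognized_text, kinds=COMMAND_KINDS):
    """Return (command, kind) for the recognition result text.

    kind is "voice_data" or "terminal_command" when a specific command is
    recognized (step S22: Yes, then step S26), and "error" otherwise
    (step S22: No, then step S32).
    """
    text = recognized_text.lower()
    for phrase, kind in kinds.items():
        if phrase in text:
            return phrase, kind
    return None, "error"


if __name__ == "__main__":
    print(classify_command("End the web conference point B"))
    print(classify_command("mumble"))
```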

続いて図１１及び図１３を参照して、第１端末２２ａの動作を説明する。図１３は、実施形態２に係る第１端末２２ａの動作を示すフローチャートである。図１３に示す動作は、定期的に実行される。 Next, the operation of the first terminal 22a will be described with reference to FIGS. 11 and 13. FIG. 13 is a flowchart illustrating the operation of the first terminal 22a according to the second embodiment. The operation shown in FIG. 13 is executed periodically.

詳しくは、図１３に示すように、制御部２２５は、第１通信部２２１を介して第１サーバ３に要求信号を送信する（ステップＳ４１）。制御部２２５は、要求信号を送信した後、第１通信部２２１が第１サーバ３から端末用コマンドを受信したか否かを判定する（ステップＳ４２）。制御部２２５は、第１通信部２２１が端末用コマンドを受信したと判定すると（ステップＳ４２のＹｅｓ）、受信した端末用コマンドに対応する処理を実行して（ステップＳ４３）、図１３に示す動作を終了する。あるいは、制御部２２５、第１通信部２２１が端末用コマンドを受信しないと判定すると（ステップＳ４２のＮｏ）、図１３に示す動作を終了する。 Specifically, as shown in FIG. 13, the control unit 225 transmits a request signal to the first server 3 via the first communication unit 221 (step S41). After transmitting the request signal, the control unit 225 determines whether or not the first communication unit 221 has received a terminal command from the first server 3 (step S42). When determining that the first communication unit 221 has received a terminal command (Yes in step S42), the control unit 225 executes processing corresponding to the received terminal command (step S43) and ends the operation shown in FIG. 13. When determining that the first communication unit 221 has not received a terminal command (No in step S42), the control unit 225 ends the operation shown in FIG. 13.
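
The periodic operation of FIG. 13 can be sketched as a polling loop. In this sketch `send_request` and `execute` are hypothetical callables standing in for the first communication unit 221 and for the command processing of the control unit 225; the 30-second default reflects one of the example intervals given for the request signal, and the injectable `sleep` parameter exists only so the loop can be exercised without waiting.

```python
import time


def poll_once(terminal_id, send_request, execute):
    """One pass of FIG. 13; returns True if a terminal command was executed."""
    command = send_request(terminal_id)   # step S41: request signal plus terminal ID
    if command is None:                   # step S42: No
        return False
    execute(command)                      # step S43
    return True


def run_polling(terminal_id, send_request, execute,
                interval_seconds=30.0, cycles=1, sleep=time.sleep):
    """Repeat poll_once a fixed number of times; return how many commands ran."""
    executed = 0
    for i in range(cycles):
        if poll_once(terminal_id, send_request, execute):
            executed += 1
        if i < cycles - 1:
            sleep(interval_seconds)       # wait until the next periodic request
    return executed


if __name__ == "__main__":
    pending = ["PRINT"]
    fake_server = lambda tid: pending.pop(0) if pending else None
    print(run_polling("terminal-A", fake_server, print, cycles=2, sleep=lambda s: None))
```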

なお、第２端末２２ｂの制御部２２５及び第３端末２２ｃの制御部２２５も、第１端末２２ａの制御部２２５と同様の動作を実行する。 Note that the control unit 225 of the second terminal 22b and the control unit 225 of the third terminal 22c also perform the same operation as the control unit 225 of the first terminal 22a.

以上、図７〜図１３を参照して、本発明の実施形態２について説明した。本実施形態によれば、音声データをサーバに送信した処理ユニットに加えて、音声データをサーバに送信した処理ユニット以外の処理ユニットも処理の結果を受信することができる。例えば、第１処理ユニット２ａ（第１端末２２ａ）のユーザが起動コマンドを示す音声を発声した後、所定の期間内に、ウエブ会議の終了を促す音声と、Ｂ地点を示す音声とを発声すると、第１端末２２ａ及び第２端末２２ｂがウエブ会議用アプリケーションの実行を停止する。あるいは、第１処理ユニット２ａ（第１端末２２ａ）のユーザが起動コマンドを示す音声を発声した後、所定の期間内に、ウエブ会議の終了を促す音声と、「オール」を示す音声とを発声すると、第１端末２２ａ〜第３端末２２ｃがウエブ会議用アプリケーションの実行を停止する。 The second embodiment of the present invention has been described above with reference to FIGS. 7 to 13. According to the present embodiment, in addition to the processing unit that has transmitted the voice data to the server, a processing unit other than the processing unit that has transmitted the voice data can also receive the processing result. For example, when the user of the first processing unit 2a (first terminal 22a) utters a voice indicating an activation command and then, within a predetermined period, utters a voice prompting the end of the web conference and a voice indicating point B, the first terminal 22a and the second terminal 22b stop executing the web conference application. Alternatively, when the user of the first processing unit 2a (first terminal 22a) utters a voice indicating an activation command and then, within a predetermined period, utters a voice prompting the end of the web conference and a voice indicating “all”, the first terminal 22a to the third terminal 22c stop executing the web conference application.

なお、本実施形態において、第１スマートスピーカ２１ａ〜第３スマートスピーカ２１ｃが撮像部２１６を備える構成を説明したが、第１端末２２ａ〜第３端末２２ｃが撮像部を備えてもよい。あるいは、第１端末２２ａ〜第３端末２２ｃに周辺装置としてカメラ装置が接続されてもよい。 In the present embodiment, the configuration in which the first smart speaker 21a to the third smart speaker 21c include the imaging unit 216 has been described, but the first terminal 22a to the third terminal 22c may include an imaging unit. Alternatively, a camera device may be connected as a peripheral device to the first terminal 22a to the third terminal 22c.

また、本実施形態において、情報処理システム１はウエブ会議システムであったが、情報処理システム１はテレビ会議システム又は電話会議システムであってもよい。この場合、第１端末２２ａ〜第３端末２２ｃは、ＬＡＮを介して接続される。 In the present embodiment, the information processing system 1 is a web conference system, but the information processing system 1 may be a video conference system or a telephone conference system. In this case, the first terminal 22a to the third terminal 22c are connected via a LAN.

情報処理システム１がテレビ会議システム又は電話会議システムである場合、第２サーバ４は省略され得る。また、情報処理システム１が電話会議システムである場合、第１端末２２ａ〜第３端末２２ｃは、電話会議専用のマイク／スピーカ装置であり得る。また、情報処理システム１が電話会議システムである場合、第１表示装置２３ａ〜第３表示装置２３ｃは省略され得る。 When the information processing system 1 is a video conference system or a telephone conference system, the second server 4 can be omitted. Further, when the information processing system 1 is a telephone conference system, the first terminal 22a to the third terminal 22c may be microphone / speaker devices dedicated to the telephone conference. When the information processing system 1 is a telephone conference system, the first display device 23a to the third display device 23c can be omitted.

また、本実施形態において、第１処理ユニット２ａ〜第３処理ユニット２ｃが第１端末２２ａ〜第３端末２２ｃを含む構成について説明したが、第１スマートスピーカ２１ａ〜第３スマートスピーカ２１ｃが第１端末２２ａ〜第３端末２２ｃの機能を有してもよい。この場合、第１端末２２ａ〜第３端末２２ｃは省略され得る。 Further, in the present embodiment, the configuration in which the first processing unit 2a to the third processing unit 2c include the first terminal 22a to the third terminal 22c has been described, but the first smart speaker 21a to the third smart speaker 21c may have the functions of the first terminal 22a to the third terminal 22c. In this case, the first terminal 22a to the third terminal 22c may be omitted.

以上、本発明の実施形態１、２について図面を参照しながら説明した。但し、本発明は、上記の実施形態に限られず、その要旨を逸脱しない範囲で種々の態様において実施することが可能である。 The first and second embodiments of the present invention have been described above with reference to the drawings. However, the present invention is not limited to the above-described embodiment, and can be implemented in various modes without departing from the gist thereof.

例えば、本発明による実施形態では、情報処理システム１は、３つの処理ユニットを備えたが、情報処理システム１は、２つの処理ユニット又は４つ以上の処理ユニットを備えてもよい。 For example, in the embodiment according to the present invention, the information processing system 1 includes three processing units, but the information processing system 1 may include two processing units or four or more processing units.

また、本発明による実施形態において、第１スマートスピーカ２１ａ〜第３スマートスピーカ２１ｃの制御部２１５は、所定の期間が経過した後にユーザ音声をサーバ３に送信したが、第１スマートスピーカ２１ａ〜第３スマートスピーカ２１ｃの制御部２１５は、所定の期間が経過する前にユーザ音声をサーバ３に送信してもよい。具体的には、第１スマートスピーカ２１ａ〜第３スマートスピーカ２１ｃの制御部２１５は、音声データを、所定回数（例えば、２回）、記憶部２１４に保存すると、記憶部２１４に保存した音声データをサーバ３に送信してもよい。 In the embodiments according to the present invention, the control unit 215 of each of the first smart speaker 21a to the third smart speaker 21c transmits the user voice to the server 3 after a predetermined period has elapsed, but the control unit 215 of each of the first smart speaker 21a to the third smart speaker 21c may transmit the user voice to the server 3 before the predetermined period elapses. Specifically, once the control unit 215 of each of the first smart speaker 21a to the third smart speaker 21c has stored voice data in the storage unit 214 a predetermined number of times (for example, twice), it may transmit the voice data stored in the storage unit 214 to the server 3.
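
The count-based variation described above can be sketched as a small buffer that forwards its contents as soon as the predetermined number of utterances has been stored. The class name, the `send` callable, and the byte-string payloads are hypothetical stand-ins for the storage unit 214 and the transmission to the server 3.

```python
class UtteranceBuffer:
    """Hypothetical model of count-triggered transmission by the control unit 215."""

    def __init__(self, send, required_count=2):
        self._send = send                  # stand-in for transmission to the server 3
        self._required = required_count    # predetermined number of times (e.g. 2)
        self._stored = []                  # stand-in for the storage unit 214

    def store(self, voice_data):
        """Store one utterance; transmit and clear once the count is reached.

        Returns True if this call triggered a transmission.
        """
        self._stored.append(voice_data)
        if len(self._stored) >= self._required:
            self._send(list(self._stored))
            self._stored.clear()
            return True
        return False


if __name__ == "__main__":
    sent = []
    buffer = UtteranceBuffer(sent.append)
    buffer.store(b"command utterance")
    buffer.store(b"destination utterance")
    print(len(sent))
```

With a count of two, the first stored item would typically be the voice indicating the specific command and the second the voice specifying the destination, so the server can receive both without waiting for the predetermined period to expire.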

本発明は、スマートスピーカのような音声入出力端末を使用するシステムに有用である。 The present invention is useful for a system using a voice input / output terminal such as a smart speaker.

DESCRIPTION OF SYMBOLS
１情報処理システム 1 Information processing system
２ａ第１処理ユニット 2a First processing unit
２ｂ第２処理ユニット 2b Second processing unit
２ｃ第３処理ユニット 2c Third processing unit
３サーバ 3 Server
２１ａ第１スマートスピーカ 21a First smart speaker
２１ｂ第２スマートスピーカ 21b Second smart speaker
２１ｃ第３スマートスピーカ 21c Third smart speaker
２２ａ第１端末 22a First terminal
２２ｂ第２端末 22b Second terminal
２２ｃ第３端末 22c Third terminal
３１通信部 31 Communication unit
３２音声認識部 32 Voice recognition unit
３４制御部 34 Control unit
２１１音声入力部 211 Voice input unit
２１３通信部 213 Communication unit
２１５制御部 215 Control unit
２２１第１通信部 221 First communication unit
３３３端末用記憶領域 333 Terminal storage area
３３３ａ第１記憶領域 333a First storage area
３３３ｂ第２記憶領域 333b Second storage area
３３３ｃ第３記憶領域 333c Third storage area

Claims

An information processing system comprising a plurality of processing units,
wherein each of the processing units includes:
a voice input unit that receives a first voice indicating a specific command and a second voice specifying at least one of the plurality of processing units;
a voice data generation unit that generates first voice data corresponding to the first voice and second voice data corresponding to the second voice;
a voice data transmission unit that transmits the first voice data and the second voice data to a server; and
at least one processing result data receiving unit that receives processing result data from the server,
the processing result data indicates an execution result of processing corresponding to the specific command, and
among the plurality of processing units, the processing unit that has transmitted the first voice data and the second voice data and the processing unit specified by the second voice data receive the processing result data.

The information processing system according to claim 1, wherein
the processing unit includes a voice input/output terminal, and
the voice input/output terminal includes the voice input unit, the voice data generation unit, the voice data transmission unit, and the processing result data receiving unit.

The information processing system according to claim 1, wherein
the processing unit includes a voice input/output terminal and an information processing terminal,
the voice input/output terminal includes the voice input unit, the voice data generation unit, and the voice data transmission unit, and
the information processing terminal includes the processing result data receiving unit.

The information processing system according to claim 1, wherein
the at least one processing result data receiving unit includes a first processing result data receiving unit and a second processing result data receiving unit,
the processing unit includes a voice input/output terminal and an information processing terminal,
the voice input/output terminal includes the voice input unit, the voice data generation unit, the voice data transmission unit, and the first processing result data receiving unit, and
the information processing terminal includes the second processing result data receiving unit.

The information processing system according to claim 3, wherein the information processing terminal further includes:
a control unit that generates a request signal requesting the server to transmit data; and
a request signal transmission unit that transmits the request signal to the server.

The information processing system according to any one of claims 1 to 5, further comprising the server,
wherein the server includes:
a voice data receiving unit that receives the first voice data and the second voice data;
a voice recognition unit that converts the first voice data and the second voice data into text information;
a control unit that obtains the processing result data based on the text information; and
a processing result data transmission unit that transmits the processing result data, and
the processing result data transmission unit transmits the processing result data to the processing unit that has transmitted the first voice data and the second voice data and to the processing unit specified by the second voice data.

The information processing system according to claim 6, wherein
the server has a storage area corresponding to each of the plurality of processing units, and
the control unit of the server stores the processing result data in the storage area corresponding to the processing unit that has transmitted the first voice data and the second voice data and in the storage area corresponding to the processing unit specified by the second voice data.

An information processing method comprising:
inputting a first voice indicating a specific command and a second voice specifying at least one of a plurality of processing units;
generating first voice data corresponding to the first voice and second voice data corresponding to the second voice;
transmitting the first voice data and the second voice data to a server; and
receiving processing result data from the server by the processing unit that has transmitted the first voice data and the second voice data and by the processing unit specified by the second voice data among the plurality of processing units,
wherein the processing result data indicates an execution result of processing corresponding to the specific command.