JP7465075B2

JP7465075B2 - Calculation device, recording medium, voice input device

Info

Publication number: JP7465075B2
Application number: JP2019206587A
Authority: JP
Inventors: 悠藤田; 賢知受田; 功田澤; 謙太山崎
Original assignee: Hitachi Building Systems Co Ltd
Current assignee: Hitachi Building Systems Co Ltd
Priority date: 2019-11-14
Filing date: 2019-11-14
Publication date: 2024-04-10
Anticipated expiration: 2039-11-14
Also published as: JP2021081482A

Description

The present invention relates to a computing device, a recording medium, and a voice input device.

Because of the declining birthrate, the aging population, and work-style reforms, there is demand for large numbers of low-cost robots capable of interacting with people. For a robot to provide a highly satisfying voice dialogue, it needs a fast response and a high recognition rate. One conceivable way to achieve this is to start a voice recognition process when a user speaks to the robot. However, the voice recognition process has a large language model for accurately converting spoken voice into text, and it takes several seconds to start. Consequently, if the voice recognition process is started in response to the user's speech, a waiting time occurs before recognition of the user's voice begins. The response to the user is delayed as a result, which lowers user satisfaction.
Patent Document 1 discloses a communication robot equipped with an audio output unit. The communication robot includes: an imaging unit that images a subject and generates a captured image; a target person identification means that identifies a person to be spoken to based on the captured image obtained by the imaging unit; a congestion degree determination means that determines the degree of congestion around the position where the communication robot is placed; and a speech means that performs predetermined processing according to the determination result of the congestion degree determination means and outputs, from the audio output unit, speech data addressed to the person identified by the target person identification means.

Patent Document 1: JP 2018-149625 A

The invention described in Patent Document 1 leaves room for improvement in the timing at which the voice recognition process is started.

A computing device according to a first aspect of the present invention includes: a server communication unit that communicates with a plurality of robots, each of which transmits voice information and resource control information indicating the possibility that a voice dialogue with a user will occur; a voice recognition unit capable of executing a plurality of recognition processes that convert the voice information received from the robots into text information; and a process control unit that uses the resource control information received from the robots to manage the number of running recognition processes and that starts and stops the recognition processes.
A recording medium according to a second aspect of the present invention is a computer-readable recording medium on which the following program is recorded. The program causes a computer, which has a server communication unit that communicates with a plurality of robots that each transmit voice information and resource control information indicating the possibility that a voice dialogue with a user will occur, to execute: voice recognition processing capable of executing a plurality of recognition processes that convert the voice information received from the robots into text information; and process control processing that uses the resource control information received from the robots to manage the number of running recognition processes and that starts and stops the recognition processes.
A voice input device according to a third aspect of the present invention includes: a robot communication unit capable of communicating with a computing device having a voice recognition unit capable of executing a plurality of recognition processes that convert received voice information into text information; a start request unit that, based on the distance to a user, transmits to the computing device a start request requesting that a recognition process be started; and a transmission unit that records the user's speech and transmits it to the computing device as the voice information. The voice input device further includes a distance determination unit that calculates, as a monitoring distance, the product of the startup time of the recognition process and the relative speed between the user and the voice input device. The start request unit transmits the start request when the distance between the user and the voice input device is less than or equal to the monitoring distance.

According to the present invention, the voice recognition process is started when the user approaches the robot, which shortens the time between when the user speaks and when voice recognition begins.

FIG. 1: Schematic diagram showing the entire voice recognition system
FIG. 2: Configuration diagram of a robot
FIG. 3: Diagram showing the relationship between the computing device and the input/output devices
FIG. 4: Diagram showing the configuration of the computing device
FIG. 5: Diagram showing the hardware configuration of a computer that realizes the computing device
FIG. 6: Diagram showing an example of the configuration of the voice recognition unit
FIG. 7: Diagram showing an example of the standby management table
FIG. 8: Diagram showing an example of the number management table
FIG. 9: Diagram showing an example of the allocation management table
FIG. 10: Flowchart showing the processing of the start request unit
FIG. 11: Flowchart showing details of the user detection processing of FIG. 10
FIG. 12: Flowchart showing the processing of the allocation request unit
FIG. 13: Flowchart showing the processing of the process control unit
FIG. 14: Flowchart showing the processing executed when an affirmative determination is made in step S1301 of FIG. 13
FIG. 15: Flowchart showing the processing of the process stopping unit
FIG. 16: Configuration diagram of a robot in the second embodiment
FIG. 17: Configuration diagram of a computing device in the second embodiment
FIG. 18: Flowchart showing the processing of the distance determination unit in the third embodiment

-First embodiment-
A first embodiment of a voice recognition system will be described below with reference to FIGS. 1 to 15.

FIG. 1 is a schematic diagram showing the entire voice recognition system in the first embodiment. The voice recognition system S includes a plurality of robots 200 and a computing device 400 equipped with a voice recognition unit 600, which is described in detail later. In the following, to distinguish the individual robots 200, they are given sub-numbers and referred to as robot 200-1, robot 200-2, and so on. Because any number of robots 200 greater than one suffices, FIG. 1 shows n robots, up to robot 200-n.

Each of the robots 200 communicates with the computing device 400. This communication is, for example, aggregated by a router 120 to improve communication efficiency. The router 120 and the computing device 400 are connected via a network 110. However, the router 120 and the computing device 400 may also communicate without going through the network 110.

The robots 200 and the router 120 are installed in a tenant 100. The tenant 100 is a building, a commercial facility, or the like. The computing device 400 is installed in a cloud 300 on the network. The cloud 300 is an abstract concept referring to one or more locations connected by a network. This embodiment is described on the premise of the configuration shown in FIG. 1, but the robots 200 and the computing device 400 may be at the same location, for example in the same building or the same room. The tenant 100 and the cloud 300 are merely examples.

In this embodiment, a person present near a robot 200 is called a "user." The user approaches the robot 200 and speaks, and the robot 200 acts in response to the user's utterance. More specifically, the robot 200 records the user's utterance as voice data, has the computing device 400 convert the voice data into text data, and acts in response to the utterance based on the interpretation of that text data.

FIG. 2 is a configuration diagram of the robot 200-1. Because the robots 200-1 to 200-n have the same configuration, the robot 200-1 is described here as a representative. Note that FIG. 2 shows the hardware configuration, the functional configuration, and the stored information together. As its hardware configuration, the robot 200-1 includes a communication interface unit 201, a distance sensor 202, a microphone 203, and a speaker 204. As its functional configuration, the robot 200-1 includes a motion instruction analysis unit 205, a robot motion control unit 206, a data transmission/reception unit 207, a distance determination unit 208, a start request unit 209, and an allocation request unit 210. The robot 200-1 holds sensor information 211, voice information 212, human walking speed information 213, and tenant map information 214.

The hardware configuration of the robot 200-1 will now be described. The communication interface unit 201 is a communication module that communicates with the computing device 400. The robot 200-1 uses the distance sensor 202 to acquire information on the distance to surrounding users and stores it as the sensor information 211. The distance sensor 202 is, for example, an RGB-D sensor that acquires a three-dimensional point cloud, that is, color (RGB) information with depth information added. However, the distance sensor 202 is not limited to this. Nor is the sensor limited to one built into the robot; for example, the information may be acquired using an externally installed surveillance camera as the sensor.

The microphone 203 records the voice the user speaks to the robot 200-1 and stores it as the voice information 212. The robot 200-1 communicates with the computing device 400 via the data transmission/reception unit 207, transmitting acquired data and receiving data sent from the computing device 400. The speaker 204 plays back the speech audio information sent from the computing device 400. This concludes the description of the hardware configuration of the robot 200-1.

The functional configuration of the robot 200-1 will now be described. The functions of the robot 200-1, namely the motion instruction analysis unit 205, the robot motion control unit 206, the data transmission/reception unit 207, the distance determination unit 208, the start request unit 209, and the allocation request unit 210, are realized, for example, by loading a program stored in a ROM (not shown) into a RAM and executing it. However, the program may instead be stored in a non-volatile storage device (not shown). The robot 200-1 may also have an input/output interface (not shown), and the program may be read from another device when necessary via the input/output interface and a medium usable by the robot 200-1. Here, the medium refers to, for example, a storage medium 291 that is attachable to and detachable from the input/output interface, or a communication medium 292, that is, a wired, wireless, or optical network, or a carrier wave or digital signal that propagates through such a network.

The motion instruction analysis unit 205 analyzes a motion instruction sent from the computing device 400, converts it into a format the robot 200 can interpret, and outputs it to the robot motion control unit 206. The robot motion control unit 206 performs the action specified by the received motion instruction, such as changing its facial expression or moving. The data transmission/reception unit 207 uses the communication interface unit 201 to transmit data to and receive data from the computing device 400. The distance determination unit 208 determines the monitoring distance, a distance used as a threshold in the processing described later. The monitoring distance is used as follows: startup of a recognition process 601 begins when the user comes closer to the robot 200 than the monitoring distance, so that voice recognition can start immediately when the user begins speaking.

The distance determination unit 208 calculates the monitoring distance using the process startup time, which is the time required to start a recognition process 601 and is received from the computing device 400, and the relative speed between the user and the robot 200. In this embodiment, however, the robot 200 does not move, so the human walking speed information 213 stored in advance in the robot 200 is used in place of the relative speed. For example, if the process startup time of the recognition process 601 is 5 seconds and the user's walking speed is 1 m per second, the distance determination unit 208 sets the monitoring distance to 5 m. The process startup time of the recognition process 601 may instead be taken from data stored in advance in the robot.
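As an illustration only, the calculation described above can be sketched in Python. The function name `monitoring_distance` and its parameter names are hypothetical and do not appear in the embodiment; the sketch simply multiplies the process startup time by the approach speed, which in this embodiment is the stored walking speed.

```python
def monitoring_distance(startup_time_s: float, approach_speed_mps: float) -> float:
    """Return the monitoring distance in meters.

    startup_time_s: process startup time of a recognition process, in seconds.
    approach_speed_mps: relative speed between user and robot, in m/s
        (the stored walking speed when the robot does not move).
    """
    if startup_time_s < 0 or approach_speed_mps < 0:
        raise ValueError("startup time and speed must be non-negative")
    # Distance the user covers while the recognition process is starting.
    return startup_time_s * approach_speed_mps


# Values from the example in the text: 5 s startup, 1 m/s walking speed.
print(monitoring_distance(5.0, 1.0))  # 5.0
```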

The start request unit 209 compares the monitoring distance obtained from the distance determination unit 208 with the distance from the robot 200-1 to surrounding users acquired by the distance sensor 202. When the start request unit 209 determines that a user has entered within the monitoring distance of the robot 200-1, it transmits a start request for a recognition process 601 to the computing device 400. The start request can also be regarded as information indicating the possibility that a voice dialogue with the user will occur. Furthermore, because the computing device 400 that receives the start request secures resources for starting the recognition process 601, the start request can also be called resource control information.

The start request unit 209 also transmits a stop request for the recognition process 601 to the computing device 400 when it determines that a user who was within the monitoring distance has moved farther from the robot 200-1 than the monitoring distance. The allocation request unit 210 transmits an allocation request for a recognition process 601 to the computing device 400 when the user starts speaking, and transmits a deallocation request for the recognition process 601 to the computing device 400 when the user finishes speaking. This concludes the description of the functional configuration of the robot 200-1.
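The start and stop behavior of the start request unit 209 can be pictured as a small state machine. The following Python sketch is illustrative only: the class `StartRequestUnit`, the enum `Request`, and the single-user simplification are assumptions made here, and the actual processing of the embodiment is the one given in the flowcharts of FIGS. 10 and 11.

```python
from enum import Enum
from typing import Optional


class Request(Enum):
    START = "start"  # hypothetical label for the start request
    STOP = "stop"    # hypothetical label for the stop request


class StartRequestUnit:
    """Tracks whether a single user is inside the monitoring distance."""

    def __init__(self, monitoring_distance_m: float) -> None:
        self.monitoring_distance_m = monitoring_distance_m
        self.user_inside = False

    def update(self, user_distance_m: Optional[float]) -> Optional[Request]:
        """Return the request to send for one sensor reading, or None.

        user_distance_m is None when no user is detected.
        """
        inside = (
            user_distance_m is not None
            and user_distance_m <= self.monitoring_distance_m
        )
        if inside and not self.user_inside:
            self.user_inside = True
            return Request.START  # user entered the monitoring distance
        if not inside and self.user_inside:
            self.user_inside = False
            return Request.STOP   # user moved beyond the monitoring distance
        return None               # no change, nothing to send
```

A request is produced only when the user crosses the threshold, so repeated sensor readings on the same side of the monitoring distance do not generate duplicate requests.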

The information stored in the robot 200-1 will now be described. The sensor information 211 is information on the distance to surrounding users, acquired using the distance sensor 202. The voice information 212 is a recording, acquired using the microphone 203, of the voice the user spoke to the robot 200-1.

The human walking speed information 213 is information on the speed at which a user walks and is stored in advance. Instead of only data stored in advance, newly acquired values, such as the result of analyzing the output of the distance sensor 202, may be used as the human walking speed information 213. The tenant map information 214 is three-dimensional point cloud information on fixed objects in the tenant 100, such as walls, pillars, and shelves. The tenant map information 214 is used when the start request unit 209 determines the distance between the user and the robot 200. This concludes the description of the information stored in the robot 200-1.

FIG. 3 is a diagram showing the relationship between the computing device 400 and the input/output devices. The cloud 300 is provided with a communication interface unit 301 connected to the network 110. An input device 302 and an output device 303 are connected to the network 110. The input device 302 is, for example, a keyboard, a mouse, or a touch panel. The output device 303 is, for example, a liquid crystal display. An operator can issue operation commands to the computing device 400 using the input device 302 and check the results of the computing device 400's calculations using the output device 303.

FIG. 4 is a diagram showing the configuration of the computing device 400. Note that FIG. 4 shows both the functional configuration of the computing device 400 and the information stored in it. As its functions, the computing device 400 includes a server communication unit 401, a voice recognition unit 600, a dialogue processing unit 403, a voice synthesis unit 404, an instruction unit 405, a time measurement unit 406, a process control unit 407, a process stopping unit 408, a task reception unit 409, and a task control unit 410. The computing device 400 stores dialogue definition information 412, operation scenario information 413, voice recognition processing information 414, and lifespan information 415.

The functions of the computing device 400 will now be described. The server communication unit 401 communicates with the robots 200 via the communication interface unit 301 and the network 110 and exchanges voice data and other data with them. The voice recognition unit 600 processes the data received from a robot 200 and converts the voice data into text data. The detailed configuration of the voice recognition unit 600 is described later.

The dialogue processing unit 403 uses the dialogue definition information 412 to output text that responds to the text data obtained by the voice recognition unit 600. The voice synthesis unit 404 converts the response text output by the dialogue processing unit 403 into voice data. The instruction unit 405 generates instructions that cause the robot to speak or act, in accordance with the data generated by the voice synthesis unit 404 and the operation scenario information 413. The instruction unit 405 then transmits the generated instructions to the robot 200 via the server communication unit 401.

The time measurement unit 406 measures the time from when a start instruction is issued for a recognition process 601 until that recognition process 601 is ready to begin voice recognition processing. This time is explained again later using a concrete example. The time measurement unit 406, for example, performs the measurement several times and transmits the longest measured time to the robots 200 as the process startup time. Instead of being measured by the time measurement unit 406, the process startup time may be a value measured in advance by the operator of the computing device 400 and entered from the input device 302.
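The "measure several times and keep the longest" approach can be sketched as follows. This is a minimal Python illustration, not the embodiment's implementation: the function `measure_startup_time` and the callable `start_process` are hypothetical, and `start_process` is assumed to block until the recognition process is ready to accept voice data.

```python
import time
from typing import Callable


def measure_startup_time(start_process: Callable[[], None], trials: int = 3) -> float:
    """Return the longest observed startup time in seconds over several trials.

    start_process is a hypothetical callable that starts one recognition
    process and returns only once that process is ready.
    """
    if trials < 1:
        raise ValueError("trials must be at least 1")
    longest = 0.0
    for _ in range(trials):
        began = time.perf_counter()
        start_process()
        elapsed = time.perf_counter() - began
        longest = max(longest, elapsed)  # keep the worst case
    return longest
```

Using the longest measurement rather than the average makes the resulting monitoring distance conservative, so a process that happens to start slowly is still likely to be ready when the user arrives.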

The process control unit 407 controls the recognition processes 601 in accordance with the start requests, stop requests, allocation requests, and deallocation requests for recognition processes 601 received from the robots 200. Details of the operation of the process control unit 407 are described later. The process stopping unit 408 stops any recognition process 601 that has been inactive for a predetermined time or longer. The task reception unit 409 accepts, via the input device 302, instructions from a user that are addressed to a robot 200, and the task control unit 410 executes the accepted processing. This concludes the description of the functions of the computing device 400.

The information stored in the computing device 400 will now be described. The dialogue definition information 412 consists of multiple sets of sentences for establishing a dialogue between the user and the robot 200. The dialogue definition information 412 is, for example, a combination of questions and answers, and the computing device 400 outputs the answer to a question uttered by the user. The operation scenario information 413 describes the action the robot 200 should take in each situation. For example, the operation scenario information 413 specifies that when the user makes a particular series of utterances, the robot 200 performs a particular action, such as moving and striking a predetermined pose.
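When the dialogue definition information 412 takes the form of question-and-answer pairs, the lookup can be pictured as a simple mapping. The Python sketch below is only an illustration: the sample questions and answers, the name `DIALOGUE_DEFINITIONS`, the function `respond`, and the text normalization are all assumptions made here and are not taken from the embodiment.

```python
from typing import Dict, Optional

# Hypothetical question-and-answer pairs standing in for the
# dialogue definition information 412.
DIALOGUE_DEFINITIONS: Dict[str, str] = {
    "where is the restroom": "The restroom is next to the elevator.",
    "what time do you close": "We close at 8 p.m.",
}


def respond(
    recognized_text: str,
    definitions: Dict[str, str] = DIALOGUE_DEFINITIONS,
) -> Optional[str]:
    """Return the answer registered for the recognized question, if any."""
    # Normalize case and trailing punctuation before the lookup.
    key = recognized_text.strip().lower().rstrip("?.!")
    return definitions.get(key)


print(respond("Where is the restroom?"))
```

An unregistered question yields `None` in this sketch; how the embodiment handles such a case is not stated in this section.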

The voice recognition processing information 414 stores a standby management table 700, a number management table 800, and an allocation management table 900. These tables are described later. The lifespan information 415 is a startup-time threshold that the process stopping unit 408 refers to when stopping a recognition process 601. The lifespan information 415 is registered in advance by an administrator.

FIG. 5 is a diagram showing the hardware configuration of a computer 500 that realizes the computing device 400. The computing device 400 is composed of one or more computers 500. The computer 500 includes an arithmetic unit 501 typified by a CPU (Central Processing Unit), a memory 502 such as a RAM (Random Access Memory), an input device 503, an output device 504, a memory controller 505, and an I/O (Input/Output) controller 506. The arithmetic unit 501, the memory 502, the input device 503, the output device 504, and the I/O controller 506 are connected to one another via the memory controller 505. The input device 503 is, for example, a keyboard, a mouse, or a touch panel. The output device 504 is, for example, a video graphics card connected to an external display monitor.

The computer 500 reads each program of the computing device 400 from an external storage device 508, such as an SSD or HDD, via the I/O controller 506. The functions of the computing device 400 are realized by the arithmetic unit 501 and the memory 502 working together to execute these programs. The computer 500 may also have an input/output interface (not shown), and programs may be read from another device when necessary via the input/output interface and a medium usable by the computer 500. Here, the medium refers to, for example, a storage medium 591 that is attachable to and detachable from the input/output interface, or a communication medium 592, that is, a wired, wireless, or optical network, or a carrier wave or digital signal that propagates through such a network.

FIG. 6 is a diagram showing an example of the configuration of the voice recognition unit 600. The voice recognition unit 600 includes an arbitrary number of recognition processes 601, a language model 602, a load balancer 603, and a controller 604. The controller 604 starts and stops recognition processes 601 in accordance with operation instructions from the process control unit 407 and the process stopping unit 408. "Recognition process 601" is a collective term for the processes that convert voice data received from the respective robots 200 into text data. In this embodiment, the individual recognition processes 601 are given sub-numbers and referred to as recognition process 601-1, recognition process 601-2, and so on.

The language model 602 serves as a dictionary for converting voice data into text data. The load balancer 603 refers to the allocation management table 900, described later, and forwards the voice data sent from a robot 200 to one of the recognition processes 601.

The recognition process 601 will be described in detail using the startup of the recognition process 601-1 as an example. When the controller 604 is instructed to start a computing process, it reads the voice recognition processing program from a ROM or the like, secures an area in the memory 502 and loads the program into it, mounts the language model 602, performs various initialization processes, and runs the result as the recognition process 601-1. Once the data loading and the initialization processes are complete, the recognition process 601-1 has finished starting and can be allocated to any robot 200.

When the recognition process 601-1 is allocated to one of the robots 200, it receives the voice data output by that robot 200 and executes voice recognition processing. When the controller 604 decides, through processing described later, to stop the recognition process 601-1, it releases the area of the memory 502 that was secured for the recognition process 601-1.

コントローラ６０４が認識プロセス６０１の起動指示を受けてから、新たな認識プロセス６０１の起動が完了するまでの時間が、時間測定部４０６が計測する時間である。コントローラ６０４は、再び演算プロセスの起動を指示されると、先ほどと同一の音声認識処理のプログラムを読みこみ、メモリ５０２の先ほどとは異なる領域に展開し、認識プロセス６０１－２として動作させる。 The time measured by the time measurement unit 406 is the time from when the controller 604 receives an instruction to start the recognition process 601 to when the start of the new recognition process 601 is completed. When the controller 604 is instructed to start the calculation process again, it reads the same voice recognition processing program as before, expands it in a different area of the memory 502 from before, and operates it as the recognition process 601-2.

すなわち認識プロセス６０１－１、認識プロセス６０１－２、・・などのそれぞれは、同一のプログラムコードを用いて生成された、音声認識プログラムの異なる実体である。本実施の形態では、それぞれの認識プロセス６０１には異なるロボット２００からの音声データが入力されるので、音声データが入力された後では、それぞれの認識プロセス６０１のメモリ内の情報は一致しない。 In other words, each of the recognition processes 601-1, 601-2, ... is a different instance of a voice recognition program generated using the same program code. In this embodiment, voice data from different robots 200 is input to each recognition process 601, so after the voice data is input, the information in the memory of each recognition process 601 does not match.

認識プロセス６０１－１は、１台のロボット２００から送信された音声データを処理する。認識プロセス６０１－１は、ある１台のロボット２００から送信された音声データの処理が完了すれば、別な１台のロボット２００から送信された音声データの処理は可能である。しかし認識プロセス６０１－１は、同時に２台のロボット２００から送信された音声データは処理できない。 The recognition process 601-1 processes voice data transmitted from one robot 200. Once the recognition process 601-1 has completed processing of the voice data transmitted from one robot 200, it can process the voice data transmitted from another robot 200. However, the recognition process 601-1 cannot process voice data transmitted from two robots 200 at the same time.

コントローラ６０４によるそれぞれの認識プロセス６０１の起動は、それぞれの認識プロセス６０１の「生成」とも呼べる。また、コントローラ６０４によるそれぞれの認識プロセス６０１の停止は、それぞれの認識プロセス６０１の「削除」とも呼べる。それぞれの認識プロセス６０１は、デプロイされたプログラム、インスタンス、プロセス、コンテナ、などとも呼ばれる。認識プロセス６０１は、Ｌｉｎｕｘ（登録商標）のコンテナ技術を用いて実現することもできるし、仮想マシンを用いて実現してもよい。また図６に示す構成は概念的な構成にすぎず、パブリッククラウドのSoftware-as-a-Service（以下、「ＳａａＳ」と記載する）を利用して、それぞれの認識プロセス６０１を個別のインスタンスとして起動してもよい。 The activation of each recognition process 601 by the controller 604 can also be called "creation" of each recognition process 601. Furthermore, the stopping of each recognition process 601 by the controller 604 can also be called "deletion" of each recognition process 601. Each recognition process 601 is also called a deployed program, an instance, a process, a container, etc. The recognition process 601 can be realized using Linux (registered trademark) container technology, or may be realized using a virtual machine. Furthermore, the configuration shown in FIG. 6 is merely a conceptual configuration, and each recognition process 601 may be started as an individual instance by using public cloud Software-as-a-Service (hereinafter referred to as "SaaS").
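The lifecycle described above (the controller creates independent instances from the same program, measures how long startup takes, and frees an instance when it is stopped) can be pictured with a minimal sketch. The class names, the `load_model` callback, and the use of `time.monotonic` are illustrative assumptions, not the implementation in this specification.

```python
import time
from dataclasses import dataclass, field
from itertools import count
from typing import Callable, Dict, Iterator


@dataclass
class RecognitionProcess:
    """Hypothetical stand-in for one recognition process 601-n."""
    process_id: int
    language_model: dict  # stand-in for the mounted language model 602


@dataclass
class Controller:
    """Hypothetical stand-in for controller 604."""
    load_model: Callable[[], dict]  # assumed callback that mounts the model
    processes: Dict[int, RecognitionProcess] = field(default_factory=dict)
    last_startup_seconds: float = 0.0
    _ids: Iterator[int] = field(default_factory=lambda: count(1))

    def start_process(self) -> RecognitionProcess:
        # The interval from the start instruction to startup completion is
        # what the time measurement unit 406 is described as measuring.
        began = time.monotonic()
        proc = RecognitionProcess(next(self._ids), self.load_model())
        self.processes[proc.process_id] = proc
        self.last_startup_seconds = time.monotonic() - began
        return proc

    def stop_process(self, process_id: int) -> None:
        # Dropping the reference models releasing the reserved memory area.
        del self.processes[process_id]
```

Each call to `start_process` yields a separate instance built from the same code, which mirrors the statement that processes 601-1 and 601-2 are distinct entities of one program.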

（データ）
図７は起動中のそれぞれの認識プロセス６０１を管理する待機管理テーブル７００の一例を示す図である。待機管理テーブル７００は複数のレコードから構成され、各レコードは処理ＩＤ７０１、ステータス７０２、および待機時間７０３のフィールドを有する。処理ＩＤ７０１には、それぞれの認識プロセス６０１に割り当てられるユニークなＩＤが格納される。処理ＩＤ７０１に格納する値は、たとえば認識プロセス６０１がプログラムの実行により実現される場合にそのプロセスのプロセスＩＤ番号を用いてもよい。ステータス７０２には、それぞれの認識プロセス６０１の割当状況、つまり現時点で音声認識処理に実際に利用されているか否かを示す情報が格納される。 (data)
Figure 7 is a diagram showing an example of a standby management table 700 that manages each running recognition process 601. The standby management table 700 is composed of a plurality of records, and each record has fields for a process ID 701, a status 702, and a standby time 703. The process ID 701 stores a unique ID assigned to each recognition process 601. For example, when the recognition process 601 is realized by executing a program, the value stored in the process ID 701 may be the process ID number of that process. The status 702 stores the allocation status of each recognition process 601, that is, information indicating whether or not the process is actually being used for voice recognition processing at the present time.

プロセス制御部４０７は、利用可能ないずれかの認識プロセス６０１を、あるロボット２００に占有させるために割当状態に遷移させ、あるロボット２００による音声認識処理が不要になると解除状態に遷移させる。待機時間７０３のフィールドには、それぞれの認識プロセス６０１において、ステータス７０２が待機状態に移行してからの経過時間を示す情報が格納される。図７に示す例では、経過時間そのものを待機時間７０３のフィールドに格納しているが、待機状態に遷移した時刻を格納し、必要な場合に現在時刻と待機時間７０３のフィールドの値とを比較させてもよい。 The process control unit 407 transitions one of the available recognition processes 601 to an allocated state in order to have it occupied by a certain robot 200, and transitions it to a released state when speech recognition processing by a certain robot 200 is no longer necessary. The field of standby time 703 stores information indicating the elapsed time since the status 702 transitioned to the standby state in each recognition process 601. In the example shown in FIG. 7, the elapsed time itself is stored in the field of standby time 703, but it is also possible to store the time of transition to the standby state and compare the current time with the value in the field of standby time 703 when necessary.

図８はプロセス制御部４０７が使用する数管理テーブル８００の一例を示す図である。数管理テーブル８００は、起動数８０１、割当数８０２、および予約数８０３のフィールドを有するレコードを１つのみ有する。起動数８０１のフィールドには、現在起動しているそれぞれの認識プロセス６０１の数が格納される。割当数８０２のフィールドには、ロボット２００から発行された割当要求に従い割り当てられている認識プロセス６０１の数が格納される。予約数８０３のフィールドには、まだ認識プロセス６０１が割り当てられておらず、短時間のうちに音声認識処理が必要になることが想定されるロボット２００の数が格納される。 Figure 8 is a diagram showing an example of a number management table 800 used by the process control unit 407. The number management table 800 has only one record having fields for activation number 801, allocation number 802, and reservation number 803. The activation number 801 field stores the number of each recognition process 601 that is currently activated. The allocation number 802 field stores the number of recognition processes 601 that have been allocated in accordance with an allocation request issued by the robot 200. The reservation number 803 field stores the number of robots 200 that have not yet been assigned a recognition process 601 and are expected to require voice recognition processing within a short period of time.

なお以下では、”起動数８０１のフィールドに格納される値”のことを起動数８０１と呼ぶこともあり、”割当数８０２のフィールドに格納される値”のことを割当数８０２と呼ぶこともあり、”予約数８０３のフィールドに格納される値”のことを予約数８０３と呼ぶこともある。 Note that below, the "value stored in the field of activation count 801" may be referred to as activation count 801, the "value stored in the field of allocation count 802" may be referred to as allocation count 802, and the "value stored in the field of reservation count 803" may be referred to as reservation count 803.

図９はロボットに対する認識プロセス６０１の割当情報を管理する割当管理テーブル９００の一例を示す図である。割当管理テーブル９００は複数のレコードから構成され、各レコードは、割当要求ＩＤ９０１、ロボットＩＤ９０２、および処理ＩＤ７０１のフィールドを有する。処理ＩＤ７０１のフィールドには、図７に例示した待機管理テーブル７００の処理ＩＤ７０１と同様の情報が格納されるのでここでは説明を省略する。 Figure 9 shows an example of an allocation management table 900 that manages allocation information for the recognition process 601 for a robot. The allocation management table 900 is composed of multiple records, and each record has fields for an allocation request ID 901, a robot ID 902, and a process ID 701. The process ID 701 field stores information similar to that of the process ID 701 in the waiting management table 700 illustrated in Figure 7, so a description of this will be omitted here.

割当要求ＩＤ９０１には、それぞれのロボット２００に対して発話を開始した際に発行される割当要求それぞれに設定されるユニークなＩＤが格納される。ロボットＩＤ９０２には、ロボット２００にそれぞれ設定されるユニークなＩＤが格納される。プロセス制御部４０７が割当要求を受信すると、要求元のロボット２００と割り当てる認識プロセス６０１を紐付けて割当要求ＩＤを付与し、割当管理テーブル９００に登録する。 The allocation request ID 901 stores a unique ID set for each allocation request, which is issued when a user starts speaking to a robot 200. The robot ID 902 stores a unique ID set for each robot 200. When the process control unit 407 receives an allocation request, it links the requesting robot 200 to the recognition process 601 being allocated, assigns an allocation request ID, and registers the entry in the allocation management table 900.
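The three tables of Figures 7 to 9 can be modeled as simple record types. The following is a minimal sketch under assumed field names and value types; the specification only names the fields, and the `process_for_robot` helper is a hypothetical illustration of the lookup the load balancer 603 is described as performing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StandbyRecord:
    """One row of the standby management table 700 (Figure 7)."""
    process_id: int         # process ID 701
    status: str             # status 702: "standby" or "allocated" (assumed labels)
    standby_seconds: float  # standby time 703


@dataclass
class CountRecord:
    """The single row of the number management table 800 (Figure 8)."""
    started: int    # activation number 801
    allocated: int  # allocation number 802
    reserved: int   # reservation number 803


@dataclass
class AllocationRecord:
    """One row of the allocation management table 900 (Figure 9)."""
    request_id: int  # allocation request ID 901
    robot_id: str    # robot ID 902
    process_id: int  # process ID 701


def process_for_robot(table: List[AllocationRecord], robot_id: str) -> Optional[int]:
    # Returns the process ID allocated to the robot, or None if there is none.
    for record in table:
        if record.robot_id == robot_id:
            return record.process_id
    return None
```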

（ロボットの処理を示すフローチャート）
図１０～図１２を参照してロボット２００に共通して備えられる各機能の処理を詳しく説明する。ただし図１０～図１２の説明では、特定のロボット２００であるロボット２００－１が各フローチャートを実行するとして説明を行う。 (Flowchart showing the robot's processing)
The processing of the functions common to all robots 200 is described in detail with reference to Figures 10 to 12. The description assumes that one particular robot 200, namely robot 200-1, executes each flowchart.

図１０は、ロボット２００に備えられる起動要求部２０９の処理を示すフローチャートである。起動要求部２０９は、ロボット２００－１の起動が完了すると処理を開始する。起動要求部２０９は、まずステップＳ１００１において、距離決定部２０８から監視距離の情報、たとえば「５ｍ」を取得する。次にステップＳ１００２では起動要求部２０９は、距離センサ２０２により取得された距離情報を取得する。続くステップＳ１０９１では、後述するユーザ検出処理により、ステップＳ１００１で取得した監視範囲に存在するユーザを検出する。 Figure 10 is a flowchart showing the processing of the activation request unit 209 provided in the robot 200. The activation request unit 209 starts processing when the activation of the robot 200-1 is completed. First, in step S1001, the activation request unit 209 acquires monitoring distance information, for example "5 m", from the distance determination unit 208. Next, in step S1002, the activation request unit 209 acquires distance information acquired by the distance sensor 202. In the following step S1091, a user detection process described below is performed to detect a user present in the monitoring range acquired in step S1001.

続くステップＳ１００３では起動要求部２０９は、ロボット２００－１から監視距離の範囲にユーザが存在すると判断する場合にはステップＳ１００４へ進む。起動要求部２０９は、ロボット２００－１から監視距離の範囲にユーザがいないと判断する場合にはステップＳ１００２へ戻る。ステップＳ１００４では起動要求部２０９は、演算装置４００に向けて認識プロセス６０１の起動要求を送信する。続くステップＳ１００５では起動要求部２０９は、ステップＳ１００２と同様に距離センサ２０２により取得された距離情報を取得する。続くステップＳ１０９２では起動要求部２０９は、ステップＳ１０９１と同様のユーザ検出処理を行いステップＳ１００６に進む。 In the following step S1003, if the start request unit 209 determines that a user is present within the monitoring distance range from the robot 200-1, the process proceeds to step S1004. If the start request unit 209 determines that a user is not present within the monitoring distance range from the robot 200-1, the process returns to step S1002. In step S1004, the start request unit 209 transmits a start request for the recognition process 601 to the computing device 400. In the following step S1005, the start request unit 209 acquires distance information acquired by the distance sensor 202 in the same manner as in step S1002. In the following step S1092, the start request unit 209 performs a user detection process similar to step S1091, and proceeds to step S1006.

ステップＳ１００６では起動要求部２０９は、ロボット２００－１から監視距離の範囲にユーザが存在するか否か、換言するとステップＳ１００３において監視範囲内に存在すると判断したユーザが、まだ監視範囲内にとどまっているか否かを判断する。起動要求部２０９は、ユーザが監視範囲内に存在すると判断する場合はステップＳ１００５に戻り、ユーザが監視範囲内に存在しないと判断する場合はステップＳ１００７に進む。ステップＳ１００７では起動要求部２０９は、認識プロセス６０１の停止要求を演算装置４００へ送信する。続くステップＳ１００８では起動要求部２０９は、ロボット２００－１が停止処理中であるか否かを判断し、停止処理中であると判断する場合には図１０に示す処理を終了し、停止処理中ではないと判断する場合にはステップＳ１００２に戻る。 In step S1006, the start request unit 209 determines whether or not a user is present within the monitoring distance range of the robot 200-1, in other words, whether or not the user determined to be present within the monitoring range in step S1003 is still remaining within the monitoring range. If the start request unit 209 determines that the user is present within the monitoring range, it returns to step S1005, and if it determines that the user is not present within the monitoring range, it proceeds to step S1007. In step S1007, the start request unit 209 sends a request to stop the recognition process 601 to the computing device 400. In the following step S1008, the start request unit 209 determines whether or not the robot 200-1 is in the process of stopping, and if it determines that the robot 200-1 is in the process of stopping, it ends the process shown in FIG. 10, and if it determines that the robot 200-1 is not in the process of stopping, it returns to step S1002.
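The loop of Figure 10 amounts to sending a start request when a user enters the monitoring range and a stop request when that user leaves it. The sketch below expresses this as a generator over a sequence of detection results. The function name and the string labels are assumptions for illustration, and the sensor polling and network transmission are left out.

```python
def start_request_events(presence_samples):
    """Yield "start" or "stop" for a sequence of user-detection results.

    Each sample is True when the user detection process found a user
    inside the monitoring range, and False otherwise.
    """
    user_in_range = False
    for present in presence_samples:
        if present and not user_in_range:
            yield "start"  # corresponds to S1003 -> S1004
        elif not present and user_in_range:
            yield "stop"   # corresponds to S1006 -> S1007
        user_in_range = present


# A user arrives, stays for two samples, leaves, and another user arrives.
events = list(start_request_events([False, True, True, False, False, True]))
```

Only transitions produce a request, which matches the flowchart: while the user stays in range, the loop between S1005 and S1006 sends nothing.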

図１１は、図１０のステップＳ１０９１やステップＳ１０９２に示したユーザ検出処理の詳細を示すフローチャートである。ステップＳ１１０１では起動要求部２０９は、テナント地図情報２１４の点群情報を取得する。ステップＳ１１０２では起動要求部２０９は、テナント地図情報２１４を基に現在のロボット２００－１の位置を特定する。なおロボット２００－１が移動しない場合は、固定の位置座標を用いてもよい。次にステップＳ１１０３では起動要求部２０９は、距離センサ２０２で取得した三次元点群情報から監視距離の外側に位置する点群を削除する。 Figure 11 is a flowchart showing the details of the user detection process shown in steps S1091 and S1092 of Figure 10. In step S1101, the start request unit 209 acquires point cloud information from the tenant map information 214. In step S1102, the start request unit 209 identifies the current position of the robot 200-1 based on the tenant map information 214. If the robot 200-1 does not move, fixed position coordinates may be used. Next, in step S1103, the start request unit 209 deletes point clouds located outside the monitoring distance from the three-dimensional point cloud information acquired by the distance sensor 202.

続くステップＳ１１０４では起動要求部２０９は、テナント地図情報２１４と三次元点群情報とを照らし合わせ、壁や柱などの固定物に相当する点群を削除する。ステップＳ１１０３およびステップＳ１１０４の処理により、ロボット２００－１が取得した点群内に含まれる固定物の情報が削除されるため、監視範囲内にユーザが存在する場合にはその点群のみが残る。ステップＳ１１０５では起動要求部２０９は、ステップＳ１１０４の処理後に点群が残っていると判断する場合にはステップＳ１１０６へ進み、点群が残っていないと判断する場合はステップＳ１１０７に進む。 In the following step S1104, the start request unit 209 compares the tenant map information 214 with the three-dimensional point cloud information, and deletes the point cloud corresponding to fixed objects such as walls and pillars. The processing of steps S1103 and S1104 deletes the information of fixed objects contained in the point cloud acquired by the robot 200-1, so that if a user is present within the monitoring range, only that point cloud remains. In step S1105, if the start request unit 209 determines that a point cloud remains after the processing of step S1104, it proceeds to step S1106, and if it determines that no point cloud remains, it proceeds to step S1107.

ステップＳ１１０６では起動要求部２０９は、監視範囲内にユーザを検出した旨の検出結果を出力して図１１に示す処理を終了する。ステップＳ１１０７では起動要求部２０９は、監視範囲内にユーザを検出しなかった旨の検出結果を出力して図１１に示す処理を終了する。 In step S1106, the start request unit 209 outputs a detection result indicating that a user was detected within the monitoring range, and ends the process shown in FIG. 11. In step S1107, the start request unit 209 outputs a detection result indicating that a user was not detected within the monitoring range, and ends the process shown in FIG. 11.
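The user detection process of Figure 11 can be sketched as two filters over a point cloud followed by an emptiness check. This is a simplified illustration, assuming that all points are already expressed in map coordinates and that a point counts as a fixed object when it lies within a hypothetical `tolerance` of a map point. The specification does not state how the comparison with the tenant map information 214 is carried out.

```python
import math


def detect_user(points, robot_position, monitoring_distance,
                fixed_points, tolerance=0.1):
    """Return True if any non-fixed point remains inside the monitoring range."""
    # S1103: drop points outside the monitoring distance.
    near = [p for p in points
            if math.dist(p, robot_position) <= monitoring_distance]
    # S1104: drop points that coincide with walls, pillars, and so on.
    remaining = [p for p in near
                 if all(math.dist(p, f) > tolerance for f in fixed_points)]
    # S1105 to S1107: a user is detected if anything is left.
    return len(remaining) > 0
```

A brute-force comparison against every map point is used here only for brevity; a spatial index would be the usual choice for a large map.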

図１２は割当要求部２１０の処理を示すフローチャートである。割当要求部２１０は、ユーザの発話開始を検出すると図１２に示す処理を実行する。発話の開始はたとえば、マイク２０３に入力される音の大きさがあらかじめ定めた閾値よりも大きくなったことで検出できる。ステップＳ１２０２では起動要求部２０９は、演算装置４００に対して認識プロセス６０１の割当要求を送る。ステップＳ１２０３起動要求部２０９は、ユーザの対話終了を検知したか否かを判断する。 Figure 12 is a flowchart showing the processing of the allocation request unit 210. When the allocation request unit 210 detects the start of a user's speech, it executes the processing shown in Figure 12. The start of speech can be detected, for example, when the volume of the sound input to the microphone 203 becomes larger than a predetermined threshold. In step S1202, the start request unit 209 sends an allocation request for the recognition process 601 to the computing device 400. In step S1203, the start request unit 209 determines whether or not the end of the user's dialogue has been detected.

対話終了の検知はたとえば、センサを用いてロボット２００－１の正面にいたユーザの移動を検知することや、ユーザの発話終了後に一定時間が経過してもユーザから次の発話がないことの検知で実現できる。起動要求部２０９はユーザの対話終了を検知したと判断する場合にはステップＳ１２０４に進み、ユーザの対話終了を検知していないと判断する場合にはステップＳ１２０３に留まる。ステップＳ１２０４では起動要求部２０９は、演算装置４００に対して認識プロセス６０１の割当解除要求を送信する。 The end of the dialogue can be detected, for example, by using a sensor to detect the movement of the user who was in front of the robot 200-1, or by detecting that the user does not make a next utterance even after a certain time has passed since the user finished speaking. If the start request unit 209 determines that the end of the user's dialogue has been detected, it proceeds to step S1204, and if it determines that the end of the user's dialogue has not been detected, it remains in step S1203. In step S1204, the start request unit 209 sends a request to the computing device 400 to deallocate the recognition process 601.

なおステップＳ１２０２およびＳ１２０４では、起動要求部２０９が演算装置４００に対して送信する情報には送信元であるロボット２００－１を特定可能な情報が含まれている。たとえば通信プロトコルにＴＣＰ／ＩＰを用いる場合には、ＩＰパケットのヘッダに含まれる送信元のＩＰアドレスがロボット２００－１を示すものであることにより演算装置４００は送信元のロボット２００を特定できる。また通信プロトコルに送信者を特定する情報が含まれないＣＡＮ（登録商標）などの場合には、ロボット２００－１はペイロードにロボット２００－１を示す識別子を追加する。 In steps S1202 and S1204, the information sent by the activation request unit 209 to the computing device 400 includes information that can identify the sender, robot 200-1. For example, when TCP/IP is used as the communication protocol, the computing device 400 can identify the sender, robot 200, because the sender's IP address included in the header of the IP packet indicates robot 200-1. In addition, in the case of a communication protocol such as CAN (registered trademark) that does not include information that identifies the sender, robot 200-1 adds an identifier that indicates robot 200-1 to the payload.
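As an illustration of the two points above (detecting the start of speech by a loudness threshold, and carrying the sender's identity in the payload when the protocol does not provide it), a minimal sketch follows. The JSON message layout, the field names, and the threshold value are assumptions made for this example and do not appear in the specification.

```python
import json

SPEECH_THRESHOLD = 0.2  # hypothetical normalized loudness threshold


def speech_started(loudness, threshold=SPEECH_THRESHOLD):
    # Speech is considered to start when the input exceeds the threshold.
    return loudness > threshold


def build_request(kind, robot_id):
    """Build an allocation or deallocation request that names its sender."""
    if kind not in ("allocate", "deallocate"):
        raise ValueError(f"unknown request type: {kind}")
    # The robot identifier travels in the payload, as in the CAN-style case.
    return json.dumps({"type": kind, "robot_id": robot_id})
```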

（演算装置の処理を示すフローチャート）
図１３～図１５を参照して演算装置４００が備える機能の処理を詳しく説明する。ただし図１３～図１５の説明では、割当要求、解除要求、起動要求、および停止要求を送信したロボット２００がロボット２００－１であるとして説明する。 (Flowchart showing the processing of the arithmetic device)
The processing of the functions of the computing device 400 is described in detail with reference to Figures 13 to 15. The description assumes that robot 200-1 is the robot 200 that sent the allocation request, release request, start request, and stop request.

図１３および図１４は、演算装置４００に含まれるプロセス制御部４０７の処理を示すフローチャートである。演算装置４００は、ロボット２００－１から何らかの要求、具体的には割当要求、解除要求、起動要求、および停止要求のいずれかを受信すると図１３に示す処理を開始する。 Figures 13 and 14 are flowcharts showing the processing of the process control unit 407 included in the computing device 400. The computing device 400 starts the processing shown in Figure 13 when it receives a request from the robot 200-1, specifically, an allocation request, a release request, a start request, or a stop request.

ステップＳ１３０１では演算装置４００は、受信した要求が割当要求および解除要求のいずれかであると判断すると、「Ａ」に進み図１４に示す処理を実行する。演算装置４００は、受信した要求が割当要求および解除要求のいずれでもないと判断する場合はステップＳ１３０２に進む。ステップＳ１３０２ではプロセス制御部４０７は、受信した要求が起動要求であるか否かを判断し、起動要求であると判断する場合にはステップＳ１３０３に進み、起動要求ではない、すなわち停止要求であると判断する場合はステップＳ１３０８に進む。 In step S1301, if the computing device 400 determines that the received request is either an allocation request or a release request, it proceeds to "A" and executes the process shown in FIG. 14. If the computing device 400 determines that the received request is neither an allocation request nor a release request, it proceeds to step S1302. In step S1302, the process control unit 407 determines whether the received request is a startup request, and if it is determined that it is a startup request, it proceeds to step S1303, and if it is determined that it is not a startup request, i.e., that it is a stop request, it proceeds to step S1308.

ステップＳ１３０３ではプロセス制御部４０７は、数管理テーブル８００を読み込み、次に説明する空処理数を算出する。空処理数とは、起動数８０１から割当数８０２および予約数８０３を引いた値である。図８に示す数管理テーブル８００の例では、起動数８０１が「３」、割当数８０２が「１」、予約数８０３が「１」なので、空処理数は、「３」から「２」を引いて「１」となる。続くステップＳ１３０４ではプロセス制御部４０７は、空処理数が１以上であると判断する場合はステップＳ１３０５に進み、空処理数がゼロであると判断する場合はステップＳ１３０６に進む。ステップＳ１３０５ではプロセス制御部４０７は、数管理テーブル８００の予約数８０３を「１」だけ増やして図１３に示す処理を終了する。 In step S1303, the process control unit 407 reads the number management table 800 and calculates the number of free processes, which will be described next. The number of free processes is the value obtained by subtracting the number of allocations 802 and the number of reservations 803 from the number of activations 801. In the example of the number management table 800 shown in FIG. 8, the number of activations 801 is "3", the number of allocations 802 is "1", and the number of reservations 803 is "1", so the number of free processes is "1" obtained by subtracting "2" from "3". In the following step S1304, if the process control unit 407 determines that the number of free processes is 1 or more, the process proceeds to step S1305, and if the process control unit 407 determines that the number of free processes is zero, the process proceeds to step S1306. In step S1305, the process control unit 407 increases the number of reservations 803 in the number management table 800 by "1" and ends the process shown in FIG. 13.

ステップＳ１３０６ではプロセス制御部４０７は、認識プロセス６０１を起動し、起動した認識プロセス６０１の情報を待機管理テーブル７００に登録する。続くステップＳ１３０７ではプロセス制御部４０７は、数管理テーブル８００の起動数８０１および予約数８０３をそれぞれ「１」だけ増やして図１３に示す処理を終了する。ステップＳ１３０８ではプロセス制御部４０７は、数管理テーブル８００の予約数８０３を「１」だけ減らして図１３に示す処理を終了する。 In step S1306, the process control unit 407 starts the recognition process 601 and registers information about the started recognition process 601 in the standby management table 700. In the following step S1307, the process control unit 407 increases the start number 801 and the reservation number 803 in the number management table 800 by "1", and ends the processing shown in FIG. 13. In step S1308, the process control unit 407 decreases the reservation number 803 in the number management table 800 by "1", and ends the processing shown in FIG. 13.
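The bookkeeping of steps S1303 to S1308 can be traced with a short sketch. The `Counts` class and the `launch_process` callback are hypothetical names. The arithmetic follows the text: the number of free processes is the activation number minus the allocation number and the reservation number.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    started: int    # activation number 801
    allocated: int  # allocation number 802
    reserved: int   # reservation number 803


def handle_start_request(counts, launch_process):
    free = counts.started - counts.allocated - counts.reserved  # S1303
    if free >= 1:
        counts.reserved += 1   # S1305: reserve an already running process
    else:
        launch_process()       # S1306: start a new recognition process
        counts.started += 1    # S1307
        counts.reserved += 1


def handle_stop_request(counts):
    counts.reserved -= 1       # S1308


# The example of Figure 8: 3 started, 1 allocated, 1 reserved, so 1 is free.
counts = Counts(started=3, allocated=1, reserved=1)
launched = []
handle_start_request(counts, lambda: launched.append("new"))  # uses the free one
handle_start_request(counts, lambda: launched.append("new"))  # must launch
```

After the two calls the counts are 4 started, 1 allocated, and 3 reserved, and exactly one new process has been launched.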

図１４は、図１３においてステップＳ１３０１において肯定判断される場合に実行される「Ａ」の処理を示すフローチャートである。まずステップＳ１４０２ではプロセス制御部４０７は、受信した要求が割当要求であるか否かを判断し、割当要求であると判断する場合にはステップＳ１４０３に進み、割当要求ではない、すなわち割当解除要求であると判断する場合はステップＳ１４０８に進む。ステップＳ１４０３ではプロセス制御部４０７は、待機管理テーブル７００を参照してデータを取得する。 Figure 14 is a flowchart showing the processing of "A" that is executed when a positive judgment is made in step S1301 in Figure 13. First, in step S1402, the process control unit 407 judges whether the received request is an allocation request or not, and if it is judged to be an allocation request, the process proceeds to step S1403, and if it is judged not to be an allocation request, i.e., a deallocation request, the process proceeds to step S1408. In step S1403, the process control unit 407 refers to the standby management table 700 to acquire data.

続くステップＳ１４０４ではプロセス制御部４０７は、待機管理テーブル７００の中から、ステータス７０２のフィールドの値が「待機」であり、かつ待機時間７０３のフィールドの値が最も小さい、換言すると待機している時間が最も短いレコードを選択する。そしてプロセス制御部４０７は、そのレコードのステータス７０２のフィールドの値を「割当」に変更する。続くステップＳ１４０５ではプロセス制御部４０７は、ステップＳ１４０４で選択したレコードの処理ＩＤ７０１の値で特定される認識プロセス６０１に対して、ロボット２００－１から送信された音声情報を送るようにロードバランサ６０３に指示をする。 In the next step S1404, the process control unit 407 selects a record from the standby management table 700 in which the value of the status 702 field is "standby" and the value of the standby time 703 field is the smallest, in other words, the standby time is the shortest. The process control unit 407 then changes the value of the status 702 field of that record to "allocated". In the next step S1405, the process control unit 407 instructs the load balancer 603 to send the voice information sent from the robot 200-1 to the recognition process 601 identified by the value of the process ID 701 of the record selected in step S1404.

続くステップＳ１４０６ではプロセス制御部４０７は、割当要求を送信したロボット２００－１のロボットＩＤと、選択した認識プロセス６０１のＩＤとを割当管理テーブル９００の新たなレコードを追加して記録する。最後にステップＳ１４０７においてプロセス制御部４０７は、数管理テーブル８００の割当数８０２の値を「１」増やし、予約数８０３を「１」減らして、図１４に示す処理を終了する。 In the next step S1406, the process control unit 407 adds a new record to the allocation management table 900 and records the robot ID of the robot 200-1 that sent the allocation request and the ID of the selected recognition process 601. Finally, in step S1407, the process control unit 407 increases the value of the allocation number 802 in the number management table 800 by "1" and decreases the reservation number 803 by "1", and ends the processing shown in FIG. 14.

ステップＳ１４０２において否定判断されると実行されるステップＳ１４０８ではプロセス制御部４０７は、割当管理テーブル９００において、割当解除要求を送信したロボット２００－１のロボットＩＤが含まれるレコードを特定する。続くステップＳ１４０９ではプロセス制御部４０７は、ステップＳ１４０８において特定したレコードにおけるステータス７０２のフィールドの値を「待機」に変更し、待機時間７０３のフィールドの値を「０」に変更する。続くステップＳ１４１０ではプロセス制御部４０７は、割当管理テーブル９００から該当の登録を削除する。最後にプロセス制御部４０７は、ステップＳ１４１１にて数管理テーブル８００の割当数８０２を「１」減らし、予約数８０３を「１」増して図１４に示す処理を終了する。 In step S1408, which is executed if a negative judgment is made in step S1402, the process control unit 407 identifies a record in the allocation management table 900 that includes the robot ID of the robot 200-1 that sent the release request for allocation. In the following step S1409, the process control unit 407 changes the value of the status 702 field in the record identified in step S1408 to "waiting" and changes the value of the wait time 703 field to "0". In the following step S1410, the process control unit 407 deletes the corresponding entry from the allocation management table 900. Finally, in step S1411, the process control unit 407 decrements the allocation number 802 in the number management table 800 by "1" and increments the reservation number 803 by "1", and ends the processing shown in FIG. 14.
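The allocation and deallocation paths of Figure 14 can be sketched as two functions over the tables. Dictionaries stand in for table records, and the key names are assumptions. Returning `None` when no standby process exists is also an assumption, since the text does not describe that case. The counter updates follow the text as written: deallocation moves one unit from the allocation number back to the reservation number.

```python
def allocate(standby_table, allocation_table, counts, robot_id, request_id):
    waiting = [r for r in standby_table if r["status"] == "standby"]
    if not waiting:
        return None  # case not covered by the text (assumption)
    # S1404: pick the standby process with the shortest standby time.
    record = min(waiting, key=lambda r: r["standby_seconds"])
    record["status"] = "allocated"
    # S1406: register the pairing of robot and process.
    allocation_table.append({"request_id": request_id,
                             "robot_id": robot_id,
                             "process_id": record["process_id"]})
    counts["allocated"] += 1  # S1407
    counts["reserved"] -= 1
    return record["process_id"]


def deallocate(standby_table, allocation_table, counts, robot_id):
    # S1408: find the entry for the requesting robot.
    entry = next(a for a in allocation_table if a["robot_id"] == robot_id)
    record = next(r for r in standby_table
                  if r["process_id"] == entry["process_id"])
    record["status"] = "standby"   # S1409
    record["standby_seconds"] = 0
    allocation_table.remove(entry)  # S1410
    counts["allocated"] -= 1        # S1411
    counts["reserved"] += 1
```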

図１５はプロセス停止部４０８の処理を示すフローチャートである。プロセス停止部４０８はたとえば、１０秒、１分、１０分などの所定の時間ごとに、図１５に示す処理を実行する。まずステップＳ１５０１ではプロセス停止部４０８は、待機管理テーブル７００の全レコードの待機時間７０３のフィールドの値を読み込む。続くステップＳ１５０２ではプロセス停止部４０８は、ステップＳ１５０１で読み込んだレコードのうち、待機時間７０３のフィールドの値が予め設定された閾値以上の値を有するレコードを特定する。さらにプロセス停止部４０８は、その特定したレコードに記載された処理ＩＤ７０１で特定される認識プロセス６０１を停止させる。 Figure 15 is a flowchart showing the processing of the process stopping unit 408. The process stopping unit 408 executes the processing shown in Figure 15 at predetermined intervals, such as every 10 seconds, 1 minute, or 10 minutes. First, in step S1501, the process stopping unit 408 reads the values in the wait time 703 field of all records in the wait management table 700. In the following step S1502, the process stopping unit 408 identifies, from the records read in step S1501, records in which the value of the wait time 703 field is equal to or greater than a preset threshold value. Furthermore, the process stopping unit 408 stops the recognition process 601 identified by the process ID 701 written in the identified record.

続くステップＳ１５０３ではプロセス停止部４０８は、ステップＳ１５０２において停止させた認識プロセス６０１の数だけ、数管理テーブル８００における起動数８０１の値を減らす。最後にステップＳ１５０４においてプロセス停止部４０８は、待機管理テーブル７００からステップＳ１５０２において停止した認識プロセス６０１が記載されているレコードを削除して図１５に示す処理を終了する。 In the next step S1503, the process stopping unit 408 reduces the value of the activation count 801 in the count management table 800 by the number of recognition processes 601 stopped in step S1502. Finally, in step S1504, the process stopping unit 408 deletes the record in which the recognition process 601 stopped in step S1502 is written from the standby management table 700, and ends the processing shown in FIG. 15.
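The periodic cleanup of Figure 15 can be sketched as follows. Dictionaries again stand in for table records, and `stop_process` is a hypothetical callback that halts a process by ID. The additional check that a record is in the standby state is an assumption added here for safety; the text only states the threshold comparison on the standby time.

```python
def stop_idle_processes(standby_table, counts, threshold_seconds, stop_process):
    """Stop processes that have been on standby for at least the threshold."""
    # S1501, S1502: find records whose standby time reached the threshold.
    expired_ids = [r["process_id"] for r in standby_table
                   if r["status"] == "standby"  # assumption, see lead-in
                   and r["standby_seconds"] >= threshold_seconds]
    for process_id in expired_ids:
        stop_process(process_id)
    # S1503: reduce the activation number by the number of stopped processes.
    counts["started"] -= len(expired_ids)
    # S1504: delete the records of the stopped processes, in place.
    standby_table[:] = [r for r in standby_table
                        if r["process_id"] not in expired_ids]
    return expired_ids
```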

上述した第１の実施の形態によれば、次の作用効果が得られる。
（１）演算装置４００は、ユーザとの音声対話が発生する可能性を示すリソース制御情報、および音声情報を送信する複数のロボット２００と通信するサーバ通信部４０１と、ロボット２００から受信する音声情報を文字情報に変換する認識プロセス６０１を複数実行可能な音声認識部６００と、ロボット２００から受信するリソース制御情報を用いて認識プロセス６０１の起動数を管理し、認識プロセス６０１の起動および停止を行うプロセス制御部４０７およびプロセス停止部４０８とを備える。そのため演算装置４００は、ユーザがロボット２００に近づくと認識プロセス６０１が起動されるので、ユーザが発話してから認識プロセス６０１による音声認識が開始されるまでの時間を短縮できる。 According to the above-described first embodiment, the following advantageous effects can be obtained.
(1) The computing device 400 includes: a server communication unit 401 that communicates with a plurality of robots 200, each of which transmits voice information and resource control information indicating the possibility that a voice dialogue with a user will occur; a voice recognition unit 600 that can run a plurality of recognition processes 601, each of which converts voice information received from a robot 200 into text information; and a process control unit 407 and a process stopping unit 408 that manage the number of running recognition processes 601 using the resource control information received from the robots 200, and that start and stop the recognition processes 601. Because a recognition process 601 is started when a user approaches a robot 200, the computing device 400 can shorten the time from the user's utterance to the start of voice recognition by the recognition process 601.

本実施の形態で説明したように、監視距離は人間の歩行速度を用いて決定しているので、ほとんどのケースでユーザがロボット２００の近くに到着した際には認識プロセス６０１の起動は完了している。しかし途中からユーザがロボット２００に走って近づいた場合や、ユーザが遠くからロボット２００に対して発話を行った場合には、発話が行われるタイミングでは認識プロセス６０１が起動中である可能性もある。しかしそのような場合であっても、すでに認識プロセス６０１の起動は開始されているので、ユーザが発話してから認識プロセス６０１による音声認識が開始されるまでの時間を短縮できる効果を有する。 As explained in this embodiment, the monitoring distance is determined from human walking speed, so in most cases the recognition process 601 has finished starting by the time the user arrives near the robot 200. However, if the user starts running toward the robot 200 partway through the approach, or speaks to the robot 200 from a distance, the recognition process 601 may still be starting up when the user speaks. Even in such a case, startup of the recognition process 601 is already under way, so the time from the user's utterance to the start of voice recognition by the recognition process 601 is still shortened.

なお、ユーザの存在有無にかかわらず認識プロセス６０１を予め起動しておき、音声情報を受信すると即座に音声認識処理を開始しつつ新たな認識プロセス６０１をさらに起動する手法も考えられる。しかしこの場合は、少なくとも１つの認識プロセス６０１を常にアイドル状態で起動するのでリソースの浪費となる。認識プロセス６０１を実行するコンピュータ５００が、認識プロセス６０１以外のアプリケーションにも利用している場合にはそのアプリケーションが利用可能なリソースを減少させるデメリットがある。また認識プロセス６０１がＳａａＳを利用している場合には、アイドル状態で待機させる認識プロセス６０１の利用料金が余計な支出となるデメリットがある。そのため、このようなデメリットが生じない本実施の形態の手法に優位性がある。 It is also possible to start the recognition process 601 in advance regardless of whether the user is present, and when voice information is received, start the voice recognition process immediately while also starting a new recognition process 601. In this case, however, at least one recognition process 601 is always started in an idle state, which wastes resources. If the computer 500 that executes the recognition process 601 is also used for an application other than the recognition process 601, there is a disadvantage that the resources available to that application are reduced. Also, if the recognition process 601 uses SaaS, there is a disadvantage that the usage fee for the recognition process 601 that is kept waiting in an idle state is an additional expense. Therefore, the method of this embodiment, which does not cause such disadvantages, is advantageous.

（２）リソース制御情報には、ロボット２００がユーザとの距離に基づき出力する認識プロセスの起動要求が含まれる。プロセス制御部４０７は、ロボットが送信する起動要求に基づき認識プロセス６０１を起動し、起動した認識プロセス６０１をロボット２００に割当てることでロボットが送信する音声情報を文字情報に変換する。そのため、すでに起動している認識プロセス６０１をロボット２００－１に割り当てることができるので、ユーザが監視距離のそばなどロボット２００－１から離れた位置から発話した場合にも、即座に音声認識処理を開始することができる。 (2) The resource control information includes a request to start a recognition process that the robot 200 outputs based on the distance from the user. The process control unit 407 starts the recognition process 601 based on the start request sent by the robot, and converts the voice information sent by the robot into text information by assigning the started recognition process 601 to the robot 200. Therefore, since the recognition process 601 that is already running can be assigned to the robot 200-1, voice recognition processing can be started immediately even if the user speaks from a position far away from the robot 200-1, such as near the monitoring distance.

（３）プロセス制御部４０７は、ロボット２００－１から認識プロセス６０１の割当要求を受信した場合に（図１４のＳ１４０２：ＹＥＳ）、ステップＳ１４０３～Ｓ１４０７の処理によりロボット２００－１に認識プロセス６０１を割り当てる。これは割当管理テーブル９００を用いて管理される。 (3) When the process control unit 407 receives a request to allocate the recognition process 601 from the robot 200-1 (S1402: YES in FIG. 14), it allocates the recognition process 601 to the robot 200-1 by processing steps S1403 to S1407. This is managed using the allocation management table 900.

（４）プロセス停止部４０８は、ロボット２００－１から認識プロセス６０１の割当解除要求を受信した場合に（図１４のＳ１４０２：ＮＯ）、ステップＳ１４０８～Ｓ１４１１の処理によりロボット２００－１に対する認識プロセス６０１の割当を解除する。 (4) When the process stop unit 408 receives a request to deallocate the recognition process 601 from the robot 200-1 (S1402: NO in FIG. 14), it deallocates the recognition process 601 to the robot 200-1 by processing steps S1408 to S1411.

（５）音声入力装置でもあるそれぞれのロボット２００は、受信した音声情報を文字情報に変換する認識プロセス６０１を複数実行可能な音声認識部６００を備える演算装置４００と通信可能なデータ送受信部２０７と、ユーザとの距離に基づき演算装置４００に認識プロセス６０１の起動を要求する起動要求を送信する起動要求部２０９と、ユーザの発話を録音し音声情報として演算装置４００に送信するデータ送受信部２０７と、を備える。 (5) Each robot 200, which is also a voice input device, is equipped with a data transmission/reception unit 207 capable of communicating with a calculation device 400 having a voice recognition unit 600 capable of executing multiple recognition processes 601 that convert received voice information into text information, an activation request unit 209 that transmits an activation request to the calculation device 400 to request activation of the recognition process 601 based on the distance from the user, and the data transmission/reception unit 207 that records the user's speech and transmits it to the calculation device 400 as voice information.

（６）ロボット２００－１は、認識プロセス６０１の起動時間、およびユーザとロボット２００－１との相対速度の積を監視距離として算出する距離決定部２０８を備える。ただし本実施の形態ではロボット２００は移動しないので、相対速度の代わりに、予めロボット２００に保存される人歩行速度情報２１３を用いる。起動要求部２０９は、図１０のステップＳ１０９１、Ｓ１００３、Ｓ１００４に示すように、ユーザとロボット２００－１との距離が監視距離以下の場合に起動要求を送信する。そのため、ユーザがロボット２００－１に到達するときには認識プロセス６０１の起動が完了しており、即座に音声認識が開始できる。 (6) Robot 200-1 is equipped with distance determination unit 208 that calculates the product of the start-up time of recognition process 601 and the relative speed between the user and robot 200-1 as the monitoring distance. However, in this embodiment, robot 200 does not move, so human walking speed information 213 stored in advance in robot 200 is used instead of the relative speed. As shown in steps S1091, S1003, and S1004 of FIG. 10, start-up request unit 209 transmits a start-up request when the distance between the user and robot 200-1 is equal to or less than the monitoring distance. Therefore, when the user reaches robot 200-1, start-up of recognition process 601 has been completed, and voice recognition can start immediately.

(7) As shown in steps S1092, S1006, and S1007, the activation request unit 209 of the robot 200-1 transmits a stop request to stop the recognition process 601 when the distance between the user and the robot 200-1 becomes greater than the monitoring distance. This appropriately conserves the resources that the computing device 400 uses for the recognition process 601.
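The request behavior described in effects (6) and (7) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `next_request` and the choice to send a request only when the user crosses the monitoring distance (rather than on every measurement) are assumptions made for the example.

```python
from typing import Optional


def next_request(prev_distance_m: float, distance_m: float,
                 monitoring_m: float) -> Optional[str]:
    """Decide which request, if any, to send for one tracked user.

    Hypothetical helper: returns "start" when the user comes within the
    monitoring distance, "stop" when the user moves beyond it again,
    and None when the user stays on the same side of the boundary.
    """
    was_inside = prev_distance_m <= monitoring_m
    is_inside = distance_m <= monitoring_m
    if is_inside and not was_inside:
        return "start"
    if was_inside and not is_inside:
        return "stop"
    return None
```

With a monitoring distance of 5 m, a user moving from 6 m to 4.5 m would trigger an activation request, and the same user moving back out to 6 m would trigger a stop request.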

(Variation 1)
In the first embodiment described above, each robot 200 transmits an allocation request and a deallocation request to the computing device 400. However, each robot 200 may omit at least one of the allocation request and the deallocation request. If the robot 200 does not transmit allocation requests, the computing device 400 refers to the allocation management table 900 each time it receives voice information from a robot 200 and determines whether a recognition process has been allocated to the robot 200 that transmitted it. If the computing device 400 determines that no recognition process has been allocated to that robot 200, it performs the same processing as when an allocation request is received.

If the robot 200 does not transmit deallocation requests, the computing device 400 performs the following processing. The computing device 400 refers to the allocation management table 900, identifies any robot 200 to which a recognition process 601 is allocated and which has not transmitted voice information for longer than a predetermined time, and releases the allocation of the recognition process 601 to that robot 200.

Variation 1 provides the following effects.
(8) When the process control unit 407 receives voice information from a robot 200 to which no recognition process 601 is allocated, it allocates a recognition process to that robot. Because the computing device 400 performs the allocation of the recognition process 601 on its own initiative, the processing load on the robot 200 can be reduced.

(9) When the process control unit 407 does not receive voice information for longer than a predetermined time from a robot 200 to which a recognition process 601 is allocated, it releases the allocation of the recognition process to that robot. Because the computing device 400 releases the allocation of the recognition process 601 on its own initiative, the processing load on the robot 200 can be reduced.
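The server-side behavior of Variation 1 can be sketched as below. This is a simplified illustration under stated assumptions: the class `AllocationTable` is a hypothetical stand-in for the allocation management table 900, timestamps are plain seconds, and allocation simply hands out a fresh process id instead of selecting a standby recognition process as the first embodiment does.

```python
from typing import Dict, List


class AllocationTable:
    """Hypothetical stand-in for the allocation management table 900."""

    def __init__(self, idle_timeout_s: float) -> None:
        self.idle_timeout_s = idle_timeout_s
        self.process_of: Dict[str, int] = {}       # robot id -> process id
        self.last_voice_at: Dict[str, float] = {}  # robot id -> last voice time
        self._next_process_id = 0

    def on_voice(self, robot_id: str, now: float) -> int:
        """Handle incoming voice data; allocate a process if none is assigned."""
        if robot_id not in self.process_of:
            # Same handling as if an explicit allocation request had arrived.
            self.process_of[robot_id] = self._next_process_id
            self._next_process_id += 1
        self.last_voice_at[robot_id] = now
        return self.process_of[robot_id]

    def release_idle(self, now: float) -> List[str]:
        """Release robots that have been silent for longer than the timeout."""
        idle = [r for r, t in self.last_voice_at.items()
                if now - t > self.idle_timeout_s]
        for robot_id in idle:
            del self.process_of[robot_id]
            del self.last_voice_at[robot_id]
        return idle
```

Calling `release_idle` periodically would replace the explicit deallocation request, and `on_voice` replaces the explicit allocation request.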

(Variation 2)
In the first embodiment described above, the computing device 400 allocates only one recognition process to each robot 200. However, the computing device 400 may allocate a plurality of recognition processes to each robot 200. In this case, each robot 200 transmits activation requests according to the number of users present within the monitoring distance of that robot 200. When outputting voice information to the computing device 400, the robot 200 may attach an identifier that is unique to each direction from which speech originates.

For example, when speech comes from both the left and the right, the robot 200-1 attaches the identifier "200-1L" to the voice information for the speech from the left and the identifier "200-1R" to the voice information for the speech from the right before transmitting them. The computing device 400 configures the load balancer 603 so that voice information bearing the identifiers "200-1L" and "200-1R" is distributed to different recognition processes 601. Variation 2 has the advantage that each robot 200 can respond immediately even when several users speak to it at the same time.
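The per-direction routing of Variation 2 can be illustrated with the following sketch. The class `LoadBalancer` is a hypothetical stand-in for the load balancer 603, and the helper `source_id` and the first-free assignment policy are assumptions made for the example; the source only states that the two identifiers are routed to different recognition processes.

```python
from typing import Dict, List


def source_id(robot_id: str, direction: str) -> str:
    """Build a per-direction identifier such as "200-1L"."""
    return f"{robot_id}{direction}"


class LoadBalancer:
    """Hypothetical stand-in for the load balancer 603."""

    def __init__(self, process_ids: List[int]) -> None:
        self.free = list(process_ids)     # started processes not yet assigned
        self.route: Dict[str, int] = {}   # source identifier -> process id

    def process_for(self, src: str) -> int:
        """Return the process for a source, assigning a free one on first use."""
        if src not in self.route:
            if not self.free:
                raise RuntimeError("no free recognition process")
            self.route[src] = self.free.pop(0)
        return self.route[src]
```

Because each identifier keeps its own route, speech arriving from the left and from the right of the same robot is never sent to the same recognition process.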

(Variation 3)
In the first embodiment described above, when the computing device 400 receives a stop request for the recognition process 601 from a robot 200, it only decreases the reservation number 803, as shown in step S1308 of FIG. 13, and does not stop the recognition process 601. However, the process stop unit 408 of the computing device 400 may stop a recognition process 601 when it receives a stop request from a robot 200. In this case, the process stop unit 408 preferably refers to the standby management table 700 and stops the recognition process 601 with the longest standby time 703.

Variation 3 provides the following effect.
(10) The process stop unit 408 stops a recognition process 601 when it receives a request to stop the recognition process 601 from a robot 200. Unnecessary recognition processes 601 can therefore be stopped early, saving further resources.
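The selection rule of Variation 3 (stop the process that has been on standby longest) can be sketched as follows. The dictionary argument is a hypothetical simplification of the standby management table 700, mapping a process id to its standby time 703 in seconds.

```python
from typing import Dict, Optional


def pick_process_to_stop(standby_time_s: Dict[int, float]) -> Optional[int]:
    """Return the id of the process with the longest standby time, or None.

    `standby_time_s` is a hypothetical view of the standby management
    table: process id -> seconds spent waiting without an allocation.
    """
    if not standby_time_s:
        return None
    return max(standby_time_s, key=lambda pid: standby_time_s[pid])
```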

--Second embodiment--
A second embodiment of the recognition system will be described with reference to FIGS. 16 and 17. In the following description, components that are the same as in the first embodiment are given the same reference numerals, and mainly the differences are described. Points not specifically described are the same as in the first embodiment. This embodiment differs from the first embodiment mainly in that the robot transmits sensor information as-is to the computing device. The hardware configuration of the voice recognition system in the second embodiment is the same as in the first embodiment, so its description is omitted.

FIG. 16 is a configuration diagram of the robot 200-1 in the second embodiment and corresponds to FIG. 2 of the first embodiment. The configuration shown in FIG. 16 differs from the first embodiment in that it does not include the distance determination unit 208, the activation request unit 209, or the allocation request unit 210.

FIG. 17 is a configuration diagram of the computing device 400 in the second embodiment and corresponds to FIG. 4 of the first embodiment. The configuration shown in FIG. 17 differs from the first embodiment in that it additionally includes the distance determination unit 208, the activation request unit 209, the allocation request unit 210, and the tenant map information 214.

In this embodiment, each robot 200 transmits its sensor information as-is to the computing device 400. On receiving the sensor information, the computing device 400 manages the recognition processes 601 by operating the distance determination unit 208, the activation request unit 209, and the allocation request unit 210 in the same way as in the first embodiment. In other words, because the computing device 400 in the second embodiment controls the recognition processes 601 using the received sensor information, the sensor information can be called "resource control information."

The second embodiment described above provides the following effects.
(11) The resource control information includes sensor information, which is the output of a sensor mounted on the robot. The process control unit 407 starts a recognition process 601 when it determines, using the sensor information, that a user who utters speech is present within a predetermined monitoring distance of the robot 200. The processing load on the robot 200 can therefore be reduced compared with the first embodiment, and even hardware with few resources and low computing power can be used in the recognition system.

(12) The resource control information includes sensor information, which is the output of a sensor mounted on the robot 200. The process control unit 407 stops a recognition process when it determines, using the sensor information, that a user who was within a predetermined monitoring distance of the robot 200 has moved farther away than the monitoring distance. The processing load on the robot 200 can therefore be reduced compared with the first embodiment, and even hardware with few resources and low computing power can be used in the recognition system.

(Variation of the second embodiment)
The process stop unit 408 of the robot 200-1 may stop the recognition process 601 when it determines, using the sensor information, that the distance between the user and the robot 200-1 exceeds a stop distance that is itself greater than the monitoring distance. For example, with a monitoring distance of 10 m and a stop distance of 15 m, one recognition process 601 is newly started when a user approaching the robot 200-1 from afar comes within 10 m, and one recognition process 601 is stopped when that user moves farther than 15 m away. The recognition process 601 stopped at this time is the one with the longest standby time.
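Using a stop distance larger than the monitoring distance gives the start/stop decision a hysteresis band, which the following sketch illustrates with the 10 m and 15 m values from the example above. The function name `update_tracking` and the per-user boolean state are assumptions for illustration only.

```python
from typing import Optional, Tuple


def update_tracking(inside: bool, distance_m: float,
                    monitoring_m: float = 10.0,
                    stop_m: float = 15.0) -> Tuple[bool, Optional[str]]:
    """Return (new_inside, action) for one user.

    action is "start" when the user comes within the monitoring distance,
    "stop" when a tracked user moves beyond the stop distance, else None.
    """
    if not inside and distance_m <= monitoring_m:
        return True, "start"
    if inside and distance_m > stop_m:
        return False, "stop"
    return inside, None
```

A user hovering between 10 m and 15 m causes no requests at all, so a recognition process is not repeatedly started and stopped while the user lingers near the boundary.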

--Third embodiment--
A third embodiment of the recognition system will be described with reference to FIG. 18. In the following description, components that are the same as in the first embodiment are given the same reference numerals, and mainly the differences are described. Points not specifically described are the same as in the first embodiment. This embodiment differs from the first embodiment mainly in that the robot moves.

The hardware configuration and functional configuration of the voice recognition system S are the same as in the first embodiment except for the distance determination unit 208. Therefore, only the operation of the distance determination unit 208 is described.

FIG. 18 is a flowchart showing the processing of the distance determination unit 208 in the third embodiment. The processing of the distance determination unit 208 as executed by the robot 200-1 is described below. In step S1801, the distance determination unit 208 reads the human walking speed information 213, which is stored in advance. In the following step S1802, it acquires the process startup time from the computing device 400. In the following step S1803, it calculates the moving speed of the robot 200-1 at the time the monitoring distance calculation request was issued. Various means can be used to calculate the moving speed of the robot 200-1; as one example, it can be calculated from the rotation speed of the wheels attached to the robot 200-1.

In step S1804, the distance determination unit 208 calculates the monitoring distance using the process startup time, the human walking speed, and the robot moving speed. If the process startup time is 5 seconds, the human walking speed is 1 m/s, and the robot moving speed is 0.5 m/s, the monitoring distance is (1 m/s + 0.5 m/s) × 5 s = 7.5 m. Here, the relative speed may be obtained as the vector sum of the two velocities, taking the travel directions of the user and the robot 200-1 into account; alternatively, to simplify the calculation and prevent the recognition process 601 from starting late, the sum of the two speeds may be used as the relative speed without regard to their travel directions.
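The simplified calculation in step S1804 can be written as a short function. This sketch covers only the direction-independent variant, in which the two speeds are added; the function name and the default of 0 m/s for a stationary robot (which reduces to the first embodiment) are assumptions for illustration.

```python
def monitoring_distance(startup_time_s: float,
                        human_speed_mps: float,
                        robot_speed_mps: float = 0.0) -> float:
    """Distance by which user and robot can close while a process starts up.

    Adds the two speeds without regard to direction, so the result never
    underestimates how far apart the user must be when startup begins.
    """
    return (human_speed_mps + robot_speed_mps) * startup_time_s


# Worked example from the text: (1 m/s + 0.5 m/s) x 5 s
distance_m = monitoring_distance(5.0, 1.0, 0.5)
```

Because the sum of the two speeds is at least as large as the closing speed for any pair of travel directions, this variant errs on the side of starting the recognition process early rather than late.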

The third embodiment described above provides the following effect.
(13) The robot 200-1 includes a distance determination unit 208 that calculates, as the monitoring distance, the product of the startup time of the recognition process 601 and the relative speed between the user and the robot 200-1. By taking the moving speed of each robot 200 into account, startup of the voice recognition processing can be complete by the time the user speaks, even when the robot 200 is moving.

In each of the embodiments and variations described above, the functional block configuration is merely an example. Several functional configurations shown as separate functional blocks may be integrated, and a configuration shown as a single functional block may be divided into two or more functions. Some of the functions of one functional block may also be provided by another functional block. In particular, the process control unit 407 and the process stop unit 408 may be integrated into a single functional block that has the functions of both.

In each of the embodiments and variations described above, the programs executed by the robot and the computing device are stored in a ROM (not shown), but the programs may instead be stored in a non-volatile storage area. The robot and the computing device may also include an input/output interface (not shown), and a program may be read from another device when needed via the input/output interface and a medium usable by the robot and the computing device. Here, the medium refers to, for example, a storage medium attachable to and detachable from the input/output interface, or a communication medium, that is, a wired, wireless, or optical network, or a carrier wave or digital signal propagating through such a network. Some or all of the functions realized by the programs may be realized by hardware circuits or an FPGA.

The embodiments and variations described above may be combined with one another. Although various embodiments and variations have been described above, the present invention is not limited to them. Other aspects conceivable within the scope of the technical concept of the present invention are also included within the scope of the present invention.

200... Robot
208... Distance determination unit
209... Activation request unit
210... Allocation request unit
213... Human walking speed information
400... Computing device
401... Server communication unit
402... Voice recognition unit
407... Process control unit
408... Process stop unit
601... Recognition process
700... Standby management table
800... Number management table
900... Allocation management table

Claims

1. A computing device comprising:
a server communication unit that communicates with a plurality of robots that transmit voice information and resource control information indicating the possibility of a voice dialogue with a user;
a voice recognition unit capable of executing a plurality of recognition processes for converting the voice information received from the robot into text information; and
a process control unit that uses the resource control information received from the robot to manage the number of recognition processes started, and starts and stops the recognition processes.

2. The computing device according to claim 1, wherein
the resource control information includes an activation request for the recognition process, which the robot outputs based on a distance between the robot and a user, and
the process control unit activates the recognition process based on the activation request transmitted by the robot and allocates the activated recognition process to the robot, thereby converting voice information transmitted by the robot into text information.

3. The computing device according to claim 2, wherein
the process control unit allocates the recognition process to the robot in at least one of the following cases: when a request for allocation of the recognition process is received from the robot, and when voice information is received from a robot to which the recognition process is not allocated.

4. The computing device according to claim 3, wherein
the process control unit releases the allocation of the recognition process to the robot in at least one of the following cases: when a request to release the allocation of the recognition process is received from the robot, and when no voice information is received for a predetermined period of time from the robot to which the recognition process is allocated.

5. The computing device according to claim 1, wherein
the resource control information includes sensor information that is an output of a sensor mounted on the robot, and
the process control unit starts the recognition process when it determines, using the sensor information, that a user who utters speech is present within a predetermined monitoring distance from the robot.

6. The computing device according to claim 1, wherein
the process control unit stops the recognition process when a request to stop the recognition process is received from the robot.

7. The computing device according to claim 1, wherein
the resource control information includes sensor information that is an output of a sensor mounted on the robot, and
the process control unit stops the recognition process when it determines, using the sensor information, that a user who was within a predetermined monitoring distance from the robot has moved farther away than the monitoring distance.

8. A computer-readable recording medium having recorded thereon a program that causes a computer, which has a server communication unit that communicates with a plurality of robots that transmit voice information and resource control information indicating a possibility of a voice dialogue with a user, to execute:
voice recognition processing capable of executing a plurality of recognition processes for converting the voice information received from the robot into text information; and
process control processing that manages the number of recognition processes started using the resource control information received from the robot, and starts and stops the recognition processes.

9. A voice input device comprising:
a robot communication unit capable of communicating with a computing device having a voice recognition unit capable of executing a plurality of recognition processes for converting received voice information into text information;
an activation request unit that transmits, to the computing device, an activation request requesting activation of the recognition process based on a distance to a user;
a transmitting unit that records an utterance of the user and transmits the utterance as the voice information to the computing device; and
a distance determination unit that calculates a monitoring distance by multiplying a startup time of the recognition process by a relative speed between the user and the voice input device,
wherein the activation request unit transmits the activation request when the distance between the user and the voice input device is equal to or less than the monitoring distance.

10. The voice input device according to claim 9, wherein
the activation request unit transmits a stop request to stop the recognition process when the distance between the user and the voice input device becomes greater than the monitoring distance.