JP2021064299A

JP2021064299A - Control system, terminal device, control method, and computer program

Info

Publication number: JP2021064299A
Application number: JP2019189797A
Authority: JP
Inventors: Atsushi Baba; Sichao Song; Takuya Iwamoto; Daisuke Endo; Junya Nakanishi; Itaru Kuramoto; Kohei Ogawa; Yuichiro Yoshikawa; Hiroshi Ishiguro
Original assignee: Albero Grande Co Ltd; Osaka University NUC; CyberAgent Inc
Current assignee: Albero Grande Co Ltd; Osaka University NUC; CyberAgent Inc
Priority date: 2019-10-16
Filing date: 2019-10-16
Publication date: 2021-04-22

Abstract

To allow for controlling robots and other devices more easily. SOLUTION: A control system provided herein comprises: a voice collection unit configured to collect voice uttered toward a person whom a given device is facing; a control information generation unit configured to generate, based on the collected voice, control information indicative of content of control for the given device or equipment associated with the given device; and an equipment control unit configured to control the given device or the equipment associated with the given device based on the control information. SELECTED DRAWING: Figure 1

Description

The present invention relates to a control system, a terminal device, a control method, and a computer program.

One way to control a robot is by voice. In voice-based control, the robot is controlled according to control content associated with the voice. Voice control allows a person to control the robot even from a remote location or when that person has difficulty moving. Patent Document 1 discloses such a technique, in which a robot is controlled by transmitting voice to it from a communication device such as a mobile phone.

Patent Document 1: Japanese Unexamined Patent Publication No. 2005-246502

However, the technique disclosed in Patent Document 1 controls the robot by conveying to it voice whose sole purpose is to control the robot. It is therefore difficult for the person controlling the robot to converse, through the robot, with the person the robot is facing while also operating a device such as the robot according to the content of that conversation.

In view of the above circumstances, an object of the present invention is to provide a technique that makes it easier to control a device such as a robot.

One aspect of the present invention is a control system including: a sound collecting unit that picks up voice uttered toward a person whom a predetermined device is facing; a control information generation unit that generates, based on the picked-up voice, control information indicating the content of control for the predetermined device or a device related to the predetermined device; and a device control unit that controls the predetermined device or the device related to the predetermined device based on the control information.

One aspect of the present invention is the above control system, further including a voice recognition unit that generates, based on the voice, a character string representing the voice, wherein the control information generation unit generates the control information based on the character string.

One aspect of the present invention is the above control system, further including: a display unit that displays video showing the predetermined device and the space in the direction the predetermined device is facing; and a calculation unit that, based on coordinates on the video specified by the user who utters the voice, calculates a direction for the predetermined device such that it faces the space indicated by the coordinates, wherein the device control unit controls the direction of the predetermined device based on the calculated result.

One aspect of the present invention is the above control system, wherein the device control unit causes the predetermined device or the device related to the predetermined device to move, emit light, or emit sound in accordance with the content of the voice.

One aspect of the present invention is the above control system, wherein the control information generation unit generates the control information based on the voice picked up by the sound collecting unit and on an estimator produced by machine learning on training data that associates voice with the content of control for the predetermined device or the device related to the predetermined device.

One aspect of the present invention is a terminal device including: a sound collecting unit that picks up voice uttered toward a person whom a predetermined device is facing; and a control information generation unit that generates, based on the picked-up voice, control information indicating the content of control for the predetermined device or a device related to the predetermined device.

One aspect of the present invention is a control method in which a computer performs: a sound collecting step of picking up voice uttered toward a person whom a predetermined device is facing; a control information generation step of generating, based on the picked-up voice, control information indicating the content of control for the predetermined device or a device related to the predetermined device; and a device control step of controlling the predetermined device or the device related to the predetermined device based on the control information.

One aspect of the present invention is a computer program for causing a computer to function as the above control system.

The present invention makes it easier to control a device such as a robot.

A system configuration diagram showing the system configuration of the robot control system 1.
A diagram showing a specific example of the content of control for the robot 400.
A diagram showing a specific example of state transition information for state transitions of the robot 400.
A diagram showing a specific example of the operation content of the robot 400.
A diagram showing a specific example of the robot 400.
A diagram showing a specific example of accepting coordinates that indicate the direction in which the head of the robot 400 is to face.
A diagram showing a specific example of processing for controlling the robot 400 by voice.
A diagram showing a specific example of processing for controlling the orientation of the robot 400.
A diagram showing a specific example in which the robot control system includes a display device or an actuator instead of a robot.

FIG. 1 is a system configuration diagram showing the system configuration of the robot control system 1. The robot control system 1 includes a terminal device 100, a control device 200, a relay server 300, a robot 400, a microphone/speaker 500, a camera 600, and a light emitting unit 700. The robot control system 1 controls the robot 400 and the devices related to the robot 400 based on voice input to the terminal device 100.

The terminal device 100, the control device 200, and the relay server 300 can all communicate via a network 800. The network 800 is, for example, a LAN (Local Area Network), a WAN (Wide Area Network), or the Internet. The network 800 may use wireless communication or wired communication, may be formed by combining a plurality of networks, and may be a closed communication network such as a VPN (Virtual Private Network). The network 800 is only one example of a network for realizing communication among the devices, and other configurations may be adopted. For example, communication between particular devices may use a network different from the one used for communication between other devices. Specifically, communication between the terminal device 100 and the relay server 300 may use a network different from the one used for communication between the control device 200 and the relay server 300.

In the present embodiment, the robot 400, the microphone/speaker 500, the camera 600, and the light emitting unit 700 are all connected to the control device 200, but the configuration is not limited to this. For example, the robot 400, the microphone/speaker 500, the camera 600, and the light emitting unit 700 may each be communicably connected to the control device 200 via the network 800.

The terminal device 100 is configured using an information processing device such as a personal computer, a tablet computer, or a server. The terminal device 100 implements a control instruction function for transmitting information about the control of the robot 400 to the control device 200. The control instruction function may be implemented in the terminal device 100 by hardware or by installing software. The terminal device 100 includes a communication unit 101, an input unit 102, a display unit 103, a microphone 104, a camera 105, a control content storage unit 106, and a control unit 107.

The communication unit 101 is a communication device such as a network interface. The communication unit 101 connects to the network 800 using a predetermined protocol. Under the control of the control unit 107, the communication unit 101 exchanges data with other devices via the network 800.

The input unit 102 is configured using an input device such as a keyboard, a pointing device (mouse, tablet, etc.), buttons, or a touch panel. The user operates the input unit 102 to enter instructions into the terminal device 100. The input unit 102 may instead be an interface for connecting an input device to the terminal device 100. In that case, the input unit 102 passes to the terminal device 100 the input signal that the input device generates in response to the user's input.

The display unit 103 is an image display device such as a liquid crystal display, an organic EL (electroluminescence) display, an electrophoretic display, or a CRT (cathode ray tube) display. The display unit 103 displays images under the control of the control unit 107. The display unit 103 may instead be an interface for connecting an image display device to the terminal device 100. In that case, the display unit 103 generates a video signal for displaying an image under the control of the control unit 107 and outputs the video signal to the image display device connected to it.

The microphone 104 picks up sound in the vicinity of the terminal device 100, for example voice uttered by the operator of the terminal device 100. The operator speaks, for example, to the person the robot 400 is facing. The microphone 104 generates an audio signal from the picked-up voice and outputs it to the terminal device 100. The microphone 104 may instead be an interface for connecting a sound collecting device, such as an external microphone, to the terminal device 100. In that case, the microphone 104 generates an audio signal from the voice input to the sound collecting device and outputs it to the terminal device 100. The microphone 104 is a specific example of the sound collecting unit, which picks up voice uttered toward the person whom the predetermined device is facing.

The camera 105 captures moving images of the operator of the terminal device 100 and the operator's surroundings. The camera 105 may instead be an interface for connecting an imaging device, such as a camera, to the terminal device 100. In that case, the camera 105 generates a video signal from the moving images captured by the imaging device and inputs it to the terminal device 100.

The control content storage unit 106 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device. The control content storage unit 106 stores information indicating the content of control for the robot 400. Each control content is associated with a character string, or part of a character string, generated from the voice picked up by the microphone 104.

FIG. 2 is a diagram showing a specific example of the content of control for the robot 400. The control content storage unit 106 stores recognition character strings and control contents in association with each other. A recognition character string is a character string for controlling the robot 400 or a device related to the robot 400. When the character string generated from the voice picked up by the microphone 104 contains a recognition character string, the robot 400 or the device related to the robot 400 is controlled based on the control content associated with that recognition character string. The control content indicates how the robot 400 is controlled in that case. A single control content may be associated with a plurality of recognition character strings.

In the example in the top row of FIG. 2, the recognition character string value is "おはよう, こんにちは, こんばんは, ..." ("good morning, hello, good evening, ...") and the control content value is "event:greeting". Accordingly, when the character string generated from the voice picked up by the microphone 104 contains any one of the recognition character strings "おはよう", "こんにちは", "こんばんは", and so on, the robot 400 or the device related to the robot 400 is controlled based on the control content "event:greeting".

Next, consider the case where the recognition character string is "<arbitrary text>". This entry indicates that the control content is executed whatever character string is generated from the voice picked up by the microphone 104. That is, regardless of the generated character string, the robot 400 or the device related to the robot 400 is controlled based on the control content "event:talking".

Next, consider the case where the recognition character string is "<recognition end>". This entry indicates control that is executed when voice pickup ends (that is, when character strings are no longer being generated). When the microphone 104 finishes picking up voice, the robot 400 or the device related to the robot 400 is controlled based on the control content "event:talking_final".

Next, consider the case where the recognition character string is "<3 seconds after the end of recognition>". This entry indicates control that is executed 3 seconds after voice pickup ends (that is, after character strings are no longer being generated). When 3 seconds have elapsed since the microphone 104 finished picking up voice, the robot 400 or the device related to the robot 400 is controlled based on the control content "event:silent". The control contents shown in FIG. 2 are only a specific example; the control content storage unit 106 may store more or fewer control contents.

Returning to FIG. 1, the description of the robot control system 1 continues. The control unit 107 controls the operation of each unit of the terminal device 100. The control unit 107 is configured using a processor such as a CPU (Central Processing Unit) and a RAM (Random Access Memory). When the processor executes a specific program, the control unit 107 functions as a communication control unit 171, a voice recognition unit 172, a control information generation unit 173, a video control unit 174, an angle calculation unit 175, and a state estimation unit 176.

The communication control unit 171 communicates with other devices by executing a predetermined communication program. For example, the communication control unit 171 transmits control information or angle information to the control device 200 via the relay server 300. The control information and the angle information are described later.

The voice recognition unit 172 executes voice recognition processing, which generates a character string from an audio signal. By executing this processing, the voice recognition unit 172 generates a character string from the audio signal output by the microphone 104 and outputs the generated character string to the control information generation unit 173. The voice recognition unit 172 may generate the character string using any known method.

The control information generation unit 173 generates control information based on the generated character string. The control information indicates the content of control for the robot 400 or a device related to the robot 400. Specifically, the control information includes a control content stored in the control content storage unit 106. A device related to the robot 400 is a device that, like the robot 400, is controlled based on the control information, for example the microphone/speaker 500 or the light emitting unit 700.

For example, the control information generation unit 173 searches the generated character string for the recognition character strings stored in the control content storage unit 106. When it finds a recognition character string in the generated character string, it acquires the control content associated with that recognition character string and generates control information indicating the acquired control content. The control information generation unit 173 transmits the control information to the control device 200 via the relay server 300.

The control information generation unit 173 may generate control information indicating a plurality of control contents. Consider, for example, the case where the generated character string is "おはようございます" (a polite "good morning"). The control information generation unit 173 searches "おはようございます" for the recognition character strings stored in the control content storage unit 106. In this example, it finds the recognition character string "おはよう" and therefore acquires the associated control content "event:greeting". It also matches the recognition character string "<arbitrary text>" and therefore acquires the control content "event:talking". The control information generation unit 173 then generates control information containing both "event:greeting" and "event:talking". If the generated character string ends with "おはようございます", the control information generation unit 173 may additionally acquire the control content "event:greeting_final" associated with the recognition character string "<recognition end>". In that case, it generates control information that further contains "event:greeting_final".
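The lookup described above can be sketched as a substring search over a small rule table. This is only a minimal illustration under stated assumptions, not the patented implementation: the names `CONTROL_RULES` and `generate_control_info` are hypothetical, the table holds only the FIG. 2 entries mentioned in the text, and the "<recognition end>" entry uses "event:talking_final" as given in the FIG. 2 description.

```python
# Hypothetical rule table modelled on the FIG. 2 entries described in the text.
ANY_TEXT = "<arbitrary text>"          # matches any non-empty recognized string
RECOGNITION_END = "<recognition end>"  # matches when voice pickup has ended

CONTROL_RULES = [
    (("おはよう", "こんにちは", "こんばんは"), "event:greeting"),
    ((ANY_TEXT,), "event:talking"),
    ((RECOGNITION_END,), "event:talking_final"),
]


def generate_control_info(text, recognition_ended=False):
    """Return every control content whose recognition string matches `text`."""
    control_contents = []
    for recognition_strings, control_content in CONTROL_RULES:
        for key in recognition_strings:
            if key == ANY_TEXT:
                matched = bool(text)
            elif key == RECOGNITION_END:
                matched = recognition_ended
            else:
                matched = key in text  # plain substring search
            if matched:
                control_contents.append(control_content)
                break  # one match per rule is enough
    return control_contents


if __name__ == "__main__":
    print(generate_control_info("おはようございます"))
```

Because "おはよう" is a substring of "おはようございます", the greeting rule and the arbitrary-text rule both match, so a single utterance can yield several control contents, as in the example in the text.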

When 3 seconds have elapsed since character string generation ended, the control information generation unit 173 acquires the control content "event:silent" associated with the recognition character string "<3 seconds after the end of recognition>" and generates control information containing "event:silent".
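One way to picture this timing rule is a small watcher that records when the last character string was generated and emits "event:silent" once after the delay has passed. This is a sketch under assumptions: the class name `SilenceWatcher`, its methods, and the polling style are hypothetical and are not taken from the source, which states only the 3-second delay and the resulting control content.

```python
SILENCE_DELAY_SEC = 3.0  # delay stated in the text


class SilenceWatcher:
    """Emits "event:silent" once, 3 seconds after the last recognized text."""

    def __init__(self, delay=SILENCE_DELAY_SEC):
        self.delay = delay
        self.last_text_time = None  # time of the most recent recognized text
        self.fired = False          # True once the silent event was emitted

    def on_text(self, now):
        """Call whenever the recognizer produces a character string."""
        self.last_text_time = now
        self.fired = False

    def poll(self, now):
        """Return "event:silent" when the delay has elapsed, else None."""
        if self.last_text_time is None or self.fired:
            return None
        if now - self.last_text_time >= self.delay:
            self.fired = True
            return "event:silent"
        return None


if __name__ == "__main__":
    watcher = SilenceWatcher()
    watcher.on_text(now=10.0)
    print(watcher.poll(now=12.0), watcher.poll(now=13.5))
```

Passing timestamps in explicitly, rather than reading a clock inside the class, keeps the sketch deterministic and easy to check.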

The video control unit 174 communicates with other devices by executing a predetermined video control program. For example, the video control unit 174 transmits the audio signal output by the microphone 104 to the control device 200. The video control unit 174 also receives a video signal from the control device 200 and outputs the received video signal to the display unit 103.

The angle calculation unit 175 calculates the angle of the head of the robot 400. Specifically, the user of the terminal device 100 operates the input unit 102 to specify, on the camera video displayed on the display unit 103, coordinates indicating the direction in which the head of the robot 400 should face. The camera video is the moving image captured by the camera 600. For example, the user specifies the coordinates by clicking any point on the camera video with the mouse cursor displayed on the display unit 103. The angle calculation unit 175 receives, via the input unit 102, the coordinates on the camera video output to the display unit 103. The received coordinates indicate the direction in which the head of the robot 400 is to face. The angle calculation unit 175 calculates the angle at which the robot 400 must be oriented to face the direction indicated by the received coordinates. For example, the angle calculation unit 175 calculates the three-dimensional coordinates toward which the head of the robot 400 is to face, based on the coordinates given by formula (1) below. It then calculates the angle of the head of the robot 400 from the calculated three-dimensional coordinates and transmits the calculated angle to the control device 200 as angle information.

Formula (1) calculates, from the coordinates received by the angle calculation unit 175, the three-dimensional coordinates P toward which the head of the robot 400 is to face. In formula (1), W is the width of the camera video displayed on the display unit 103, H is its height, A_x is the angle of view of the camera 600 in the X-axis direction (in radians), and A_y is the angle of view of the camera 600 in the Y-axis direction (in radians). The distance in the Z-axis direction is fixed at D_z (in cm). The three-dimensional coordinates P corresponding to coordinates (x, y) input on the video are thus given by formula (1).
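Formula (1) itself is not reproduced in this text, so the following is not the patent's formula. It is only an illustrative sketch of how the quantities W, H, A_x, A_y, and D_z could map a clicked pixel to a 3D point and then to head angles, assuming a simple pinhole camera model with the camera at the origin and looking along the Z axis. The function names and the yaw/pitch convention are assumptions.

```python
import math


def image_point_to_3d(x, y, width, height, fov_x, fov_y, depth_cm):
    """Map a pixel (x, y) to a 3D point at fixed depth (pinhole assumption)."""
    # Normalized offsets from the image centre, each in [-1, 1].
    nx = (x - width / 2.0) / (width / 2.0)
    ny = (height / 2.0 - y) / (height / 2.0)  # image y grows downward
    # At depth D_z the half-width of the view is D_z * tan(A_x / 2).
    px = depth_cm * math.tan(fov_x / 2.0) * nx
    py = depth_cm * math.tan(fov_y / 2.0) * ny
    return (px, py, depth_cm)


def head_angles(point):
    """Return (yaw, pitch) in radians for a head at the origin facing +Z."""
    px, py, pz = point
    yaw = math.atan2(px, pz)                    # left/right rotation
    pitch = math.atan2(py, math.hypot(px, pz))  # up/down rotation
    return yaw, pitch


if __name__ == "__main__":
    p = image_point_to_3d(480, 120, 640, 480, math.pi / 2, math.pi / 3, 100.0)
    print(p, head_angles(p))
```

Under these assumptions, a click at the image centre yields zero yaw and pitch, and a click on the right edge yields a yaw of half the horizontal angle of view. A real system would also have to account for the offset between the camera 600 and the robot's head, which this sketch ignores.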

The state estimation unit 176 estimates the state of the user of the terminal device 100 based on the video captured by the camera 105. For example, the state estimation unit 176 may determine whether the user of the terminal device 100 is permitted to operate it. If the user is permitted, the state estimation unit 176 may allow the terminal device 100 to accept operations from the user. If the user is not permitted, the state estimation unit 176 may prevent the terminal device 100 from accepting operations from the user. The state estimation unit 176 may also estimate the operation status based on input to the input unit 102 and the sound pickup state of the microphone 104.

The control device 200 is configured using an information processing device such as a personal computer, a tablet computer, or a server. The control device 200 implements an operation function for operating the robot 400. The operation function may be implemented in the control device 200 by hardware or by installing software. The control device 200 includes a communication unit 201, a state transition information storage unit 202, an operation content storage unit 203, and a control unit 204.

通信部２０１は、ネットワークインタフェース等の通信装置である。通信部２０１は所定のプロトコルでネットワーク８００に通信可能に接続する。通信部２０１は、制御部２０４の制御に応じてネットワーク８００を介して、他の装置との間でデータ通信する。 The communication unit 201 is a communication device such as a network interface. The communication unit 201 is communicably connected to the network 800 by a predetermined protocol. The communication unit 201 communicates data with other devices via the network 800 under the control of the control unit 204.

状態遷移情報記憶部２０２は、磁気ハードディスク装置や半導体記憶装置等の記憶装置を用いて構成される。状態遷移情報記憶部２０２は、ロボット４００の制御の状態の遷移に関する状態遷移情報を記憶する。状態遷移情報は、端末装置１００から受信した制御情報に基づいて、ロボット４００の状態を遷移させるための情報を示す。ロボット４００は、同じ制御情報を受信しても、制御状態に応じて異なる動作を行う。 The state transition information storage unit 202 is configured by using a storage device such as a magnetic hard disk device or a semiconductor storage device. The state transition information storage unit 202 stores the state transition information related to the state transition of the control of the robot 400. The state transition information indicates information for transitioning the state of the robot 400 based on the control information received from the terminal device 100. Even if the robot 400 receives the same control information, the robot 400 performs different operations depending on the control state.

図３は、ロボット４００の状態遷移に対する状態遷移情報の一具体例を示す図である。状態遷移情報では、制御情報が示す制御内容を縦一列目に表す。状態遷移情報では、ロボット４００のとりうる制御状態を横一行目に表す。図３では、状態遷移情報は、“state:waiting”と“state:talking”との２つの制御状態を有する。制御状態“state:waiting”とは、例えば、ロボット４００が制御装置２００からの動作指示を待っている状態である。制御状態が“state:waiting”である場合、ロボット４００は停止していてもよい。制御状態“state:talking”とは、例えば、マイク・スピーカー５００が端末装置１００から送信された音声信号を出力し、ロボット４００が、ロボット４００に対峙している人に対して、制御装置２００からの動作指示に応じて所定の動作を行っている状態である。 FIG. 3 is a diagram showing a specific example of the state transition information for the robot 400. In the state transition information, the first column lists the control contents indicated by the control information, and the first row lists the control states that the robot 400 can take. In FIG. 3, the state transition information has two control states, "state:waiting" and "state:talking". The control state "state:waiting" is, for example, a state in which the robot 400 is waiting for an operation instruction from the control device 200. When the control state is "state:waiting", the robot 400 may be stopped. The control state "state:talking" is, for example, a state in which the microphone / speaker 500 outputs the audio signal transmitted from the terminal device 100 and the robot 400 performs a predetermined operation toward the person facing it in response to an operation instruction from the control device 200.

以下、制御装置２００が、ロボット４００の制御状態が“state:waiting”である場合に、制御内容として“event:talking”を示す制御情報を受信した場合について説明する。この場合、ロボット４００は、制御内容“event:talking”、制御状態“state:waiting”によって示される状態制御情報に基づいて制御される。この場合、状態遷移情報記憶部２０２に記憶される状態制御情報は、“action:talking→state:talking”を示す。この状態制御情報では、“action:talking”が実行された後に、制御状態“state:talking”に遷移することを示す。“action:talking”は、ロボット４００に動作させる動作内容の識別情報を示す。このため、制御装置２００は、ロボット４００に“action:talking”によって識別される動作内容を動作させた後、ロボット４００の状態を“state:talking”に遷移させる。なお、“action:talking”等のロボットを動作させる動作内容の具体例は、後述する。このように、制御装置２００は、端末装置１００から受信した制御情報が示す制御内容と状態遷移情報記憶部２０２に記憶される状態遷移情報とに基づいて、ロボット４００の制御状態と動作内容を決定する。なお、状態遷移情報は、図３に示す態様に限定されない。例えば、状態遷移情報は、状態として“state:waiting”、“state:talking”以外の制御状態を有してもよい。例えば、状態遷移情報は、状態として“state:dancing”等の状態を有してもよい。例えば、ロボット４００の状態が“state:dancing”である場合には、ロボット４００は左右の腕を上下に駆動させる等のダンスをしているような動作を維持してもよい。 The following describes the case where the control device 200 receives control information indicating "event:talking" as the control content while the control state of the robot 400 is "state:waiting". In this case, the robot 400 is controlled based on the state control information identified by the control content "event:talking" and the control state "state:waiting". The state control information stored in the state transition information storage unit 202 for this combination is "action:talking→state:talking", which indicates that the control state transitions to "state:talking" after "action:talking" is executed. "action:talking" is identification information of an operation content to be performed by the robot 400. The control device 200 therefore causes the robot 400 to perform the operation content identified by "action:talking" and then transitions the state of the robot 400 to "state:talking". Specific examples of operation contents such as "action:talking" are described later. In this way, the control device 200 determines the control state and the operation content of the robot 400 based on the control content indicated by the control information received from the terminal device 100 and the state transition information stored in the state transition information storage unit 202. The state transition information is not limited to the form shown in FIG. 3. For example, it may have control states other than "state:waiting" and "state:talking", such as "state:dancing". When the state of the robot 400 is "state:dancing", the robot 400 may, for example, keep performing a dance-like motion such as driving its left and right arms up and down.
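
The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout, the names TRANSITIONS and step, and the fallback behavior for unknown pairs are hypothetical and are not taken from FIG. 3. Only the single entry below is stated in the description.

```python
# Hypothetical encoding of the state transition information:
# (control state, control content) -> (operation content, next control state).
# Only this one entry is given in the description; other cells of FIG. 3 are omitted.
TRANSITIONS = {
    ("state:waiting", "event:talking"): ("action:talking", "state:talking"),
}


def step(state, event):
    """Return (operation content, next control state) for a received control content.

    Assumption: a pair that is not in the table triggers no operation and
    leaves the control state unchanged.
    """
    return TRANSITIONS.get((state, event), (None, state))


action, next_state = step("state:waiting", "event:talking")
print(action, next_state)
```

Keying the table on the pair of control state and control content is what lets the same control information lead to different operations depending on the current control state.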

図１に戻って、ロボット制御システム１の説明を続ける。動作内容記憶部２０３は、磁気ハードディスク装置や半導体記憶装置等の記憶装置を用いて構成される。動作内容記憶部２０３は、ロボット４００の動作の内容を記憶する。 Returning to FIG. 1, the description of the robot control system 1 will be continued. The operation content storage unit 203 is configured by using a storage device such as a magnetic hard disk device or a semiconductor storage device. The operation content storage unit 203 stores the operation content of the robot 400.

図４は、ロボット４００の動作内容の一具体例を示す図である。動作内容記憶部２０３は、動作内容と実行内容とを対応付けて記憶する。動作内容は、ロボット４００に動作させる内容を識別するための識別情報を示す。動作内容は、動作内容記憶部２０３において一意に識別される。実行内容は、ロボット４００に所定の動作を実行させる命令の内容を示す。実行内容は、複数の動作内容を示していてもよい。 FIG. 4 is a diagram showing a specific example of the operation content of the robot 400. The operation content storage unit 203 stores the operation content and the execution content in association with each other. The operation content indicates identification information for identifying the content to be operated by the robot 400. The operation content is uniquely identified in the operation content storage unit 203. The execution content indicates the content of an instruction for causing the robot 400 to execute a predetermined operation. The execution content may indicate a plurality of operation contents.

図４に示す実行内容の具体的な内容について説明する。robot eyeColor greenは、ロボットが備える第一発光部材（両目に相当）を緑色に発光させる動作を実行させる命令である。robot motion raiseHandは、ロボットが備える左肩関節部を上まで上げる動作を実行させる命令である。sound play sound1は、マイク・スピーカー５００にラッパ口音を発音させる動作を実行させる命令である。led play animation1は、発光部７００を点滅させる動作を実行させる命令である。robot repeat eyeColor blue/offは、ロボットが備える第一発光部材を青色に点滅させる動作を実行させる命令である。robot motion mojimojiは、左腕部及び右腕部をランダムに駆動させる動作を実行させる命令である。sound play sound2は、マイク・スピーカー５００に時計口音を発音させる動作を実行させる命令である。時計口音とは、例えば“チクタク”等の音である。led play animation2は、右肩関節部を下方向に下げて、右腕関節部を閉じる、発光部を吹き出しの形状と吹き出しの内部で“・・・”の文字の形とに点灯させる動作を実行させる命令である。robot repeat eyeColor green/blueは、左目及び右目に相当する第一発光部材を緑と青とに交互に点滅を繰り返す動作を実行させる命令である。robot motion randomHeadは、ロボット４００の頭部をランダム間隔でプラスマイナス５度ずつ変更する動作を実行させる命令である。robot motion openHandは、左肩関節部と右肩関節部とを下に移動させ、右腕関節部と左腕関節部とを開く動作を実行させる命令である。robot motion homePoseは、全ての駆動部材を初期位置に戻し、発光部７００を消灯する動作を実行させる命令である。 The specific execution contents shown in FIG. 4 are as follows. robot eyeColor green is a command that causes the first light emitting member of the robot (corresponding to both eyes) to emit green light. robot motion raiseHand is a command that raises the left shoulder joint portion of the robot all the way up. sound play sound1 is a command that causes the microphone / speaker 500 to produce a trumpet mouth sound. led play animation1 is a command that blinks the light emitting unit 700. robot repeat eyeColor blue/off is a command that blinks the first light emitting member of the robot in blue. robot motion mojimoji is a command that drives the left arm portion and the right arm portion at random. sound play sound2 is a command that causes the microphone / speaker 500 to produce a clock mouth sound, for example a sound such as "tick-tock". led play animation2 is a command that lowers the right shoulder joint portion, closes the right arm joint portion, and lights the light emitting unit in the shape of a speech balloon with the characters "..." inside it. robot repeat eyeColor green/blue is a command that makes the first light emitting member corresponding to the left eye and the right eye blink alternately in green and blue, repeatedly. robot motion randomHead is a command that changes the orientation of the head of the robot 400 by plus or minus 5 degrees at random intervals. robot motion openHand is a command that lowers the left shoulder joint portion and the right shoulder joint portion and opens the right arm joint portion and the left arm joint portion. robot motion homePose is a command that returns all drive members to their initial positions and turns off the light emitting unit 700.

図４の最上段の動作内容及び実行内容に基づいて動作内容を説明する。図４の最上段の動作内容は、“action:greeting”、実行内容の値が“robot eyeColor green、robot motion raiseHand、sound play sound1、led play animation1”である。従って、図４の最上段の動作内容によると、状態遷移情報が“action:greeting”を示す場合、制御装置２００は、動作内容“action:greeting”に対応付けされた実行内容に基づいてロボット４００及びロボット４００に係る機器を制御する。具体的には、制御装置２００は、“robot eyeColor green”を実行することで、ロボット４００の第一発光部を緑に発光させる。次に、制御装置２００は、“robot motion raiseHand”を実行することで、ロボット４００の左肩関節部を上に上げる。次に、制御装置２００は、“sound play sound1”を実行することで、ロボット４００に係る機器であるマイク・スピーカー５００から、ラッパ口音を発音させる。次に、制御装置２００は、“led play animation1”を実行することで、ロボット４００に係る機器である発光部７００を点滅させる。なお、動作内容は、図４に示す態様に限定されない。例えば、実行内容は、図４に示す内容以外の実行内容を有してもよい。 The operation content is explained using the top row of FIG. 4. In the top row, the operation content is "action:greeting" and the execution content is "robot eyeColor green, robot motion raiseHand, sound play sound1, led play animation1". Accordingly, when the state transition information indicates "action:greeting", the control device 200 controls the robot 400 and the devices related to the robot 400 based on the execution content associated with the operation content "action:greeting". Specifically, the control device 200 executes "robot eyeColor green" to make the first light emitting unit of the robot 400 emit green light. Next, it executes "robot motion raiseHand" to raise the left shoulder joint portion of the robot 400. Next, it executes "sound play sound1" to make the microphone / speaker 500, which is a device related to the robot 400, produce the trumpet mouth sound. Next, it executes "led play animation1" to blink the light emitting unit 700, which is a device related to the robot 400. The operation content is not limited to the form shown in FIG. 4. For example, the execution content may include execution contents other than those shown in FIG. 4.
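
As a rough illustration of how an operation content might be expanded into its ordered execution contents, consider the Python sketch below. The names ACTIONS, TARGETS and expand are hypothetical, and routing each command by its first word is an assumption made for this example. The description only states which device each listed command acts on.

```python
# Hypothetical copy of the top row of FIG. 4: operation content -> ordered commands.
ACTIONS = {
    "action:greeting": [
        "robot eyeColor green",
        "robot motion raiseHand",
        "sound play sound1",
        "led play animation1",
    ],
}

# Assumption: the first word of a command selects the device that receives it.
TARGETS = {
    "robot": "robot 400",
    "sound": "microphone/speaker 500",
    "led": "light emitting unit 700",
}


def expand(action):
    """Return (target device, command) pairs in the order they are executed."""
    return [(TARGETS[command.split()[0]], command) for command in ACTIONS[action]]


for target, command in expand("action:greeting"):
    print(f"{target}: {command}")
```

Keeping the commands in a list preserves the execution order given for "action:greeting": eye color first, then the arm motion, the sound, and the LED animation.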

図１に戻って、ロボット制御システム１の説明を続ける。制御部２０４は、制御装置２００の各部の動作を制御する。制御部２０４は、ＣＰＵ等のプロセッサ及びＲＡＭを用いて構成される。制御部２０４は、プロセッサが特定のプログラムを実行することによって、通信制御部２４１、機器制御部２４２及び映像制御部２４３として機能する。 Returning to FIG. 1, the description of the robot control system 1 will be continued. The control unit 204 controls the operation of each unit of the control device 200. The control unit 204 is configured by using a processor such as a CPU and a RAM. The control unit 204 functions as a communication control unit 241, a device control unit 242, and a video control unit 243 when the processor executes a specific program.

通信制御部２４１は、所定の通信プログラムを実行することによって、他の装置と通信する。例えば、通信制御部２４１は、中継サーバ３００を介して端末装置１００から制御情報を受信する。また、通信制御部２４１は、端末装置１００から角度情報を受信する。 The communication control unit 241 communicates with another device by executing a predetermined communication program. For example, the communication control unit 241 receives control information from the terminal device 100 via the relay server 300. Further, the communication control unit 241 receives the angle information from the terminal device 100.

機器制御部２４２は、制御情報又は角度情報に基づいて、ロボット４００又はロボット４００に係る機器を制御する。例えば、機器制御部２４２は、動作指示をロボット４００又はロボット４００に係る機器に送信することで、これらの機器を制御する。動作指示は、制御情報又は角度情報の内容を示す。例えば、端末装置１００から制御情報を受信した場合、機器制御部２４２は、制御情報の内容を示す動作指示をロボット４００又はロボット４００に係る機器に送信する。例えば、制御情報が、“robot eyeColor green”を示す場合、機器制御部２４２は、ロボット４００に第一発光部を緑に発光させる動作指示を送信する。 The device control unit 242 controls the robot 400 or the device related to the robot 400 based on the control information or the angle information. For example, the device control unit 242 controls these devices by transmitting an operation instruction to the robot 400 or the devices related to the robot 400. The operation instruction indicates the content of the control information or the angle information. For example, when the control information is received from the terminal device 100, the device control unit 242 transmits an operation instruction indicating the content of the control information to the robot 400 or the device related to the robot 400. For example, when the control information indicates "robot eyeColor green", the device control unit 242 transmits an operation instruction for causing the first light emitting unit to emit green light to the robot 400.

また、機器制御部２４２は、端末装置１００から角度情報を受信した場合、角度情報の内容を示す動作指示をロボット４００又はロボット４００に係る機器に送信する。例えば、座標情報がＰ＝（Ｘ、Ｙ、Ｚ）である場合について説明する。この場合、機器制御部２４２は、ロボット４００の頭部を制御することで、ロボット４００の頭部が、座標Ｐ＝（Ｘ、Ｙ、Ｚ）の方向を向くようにロボット４００の頭部を制御する動作指示を送信する。 When the device control unit 242 receives angle information from the terminal device 100, it transmits an operation instruction indicating the content of the angle information to the robot 400 or the device related to the robot 400. For example, suppose the coordinate information is P = (X, Y, Z). In this case, the device control unit 242 transmits an operation instruction that controls the head of the robot 400 so that the head faces the direction of the coordinates P = (X, Y, Z).

映像制御部２４３は、所定の映像制御プログラムを実行することによって、他の装置と通信する。例えば、映像制御部２４３は、マイク・スピーカー５００によって収音された音声信号を端末装置１００に送信する。また、映像制御部２４３は、カメラ６００によって出力された映像信号を端末装置１００に送信する。また、映像制御部２４３は、端末装置１００から音声信号を受信する。映像制御部２４３は、受信した音声信号をマイク・スピーカー５００に出力する。 The video control unit 243 communicates with other devices by executing a predetermined video control program. For example, the video control unit 243 transmits the audio signal picked up by the microphone / speaker 500 to the terminal device 100. The video control unit 243 also transmits the video signal output by the camera 600 to the terminal device 100. Further, the video control unit 243 receives an audio signal from the terminal device 100 and outputs the received audio signal to the microphone / speaker 500.

中継サーバ３００は、パーソナルコンピュータ、産業用コンピュータ又はサーバ等の情報処理装置を用いて構成される。中継サーバ３００は、端末装置１００と制御装置２００との通信を中継するための中継機能が実装されている。中継機能は、ハードウェアによって中継サーバ３００に実装されてもよいし、ソフトウェアのインストールによって実装されてもよい。中継サーバ３００は、例えばＷｅｂＲＴＣ（Web Real-Time Communication）シグナリングを実行することで、端末装置１００と制御装置２００との映像信号及び音声信号の送受信を実現させる。また、中継サーバ３００は、端末装置１００から制御装置２００に制御情報又は角度情報を送信するためのＷｅｂＳｏｃｋｅｔサーバとして機能してもよい。 The relay server 300 is configured by using an information processing device such as a personal computer, an industrial computer, or a server. The relay server 300 is equipped with a relay function for relaying communication between the terminal device 100 and the control device 200. The relay function may be implemented in the relay server 300 by hardware, or may be implemented by installing software. The relay server 300 realizes transmission / reception of video signals and audio signals between the terminal device 100 and the control device 200, for example, by executing WebRTC (Web Real-Time Communication) signaling. Further, the relay server 300 may function as a WebSocket server for transmitting control information or angle information from the terminal device 100 to the control device 200.

ロボット４００は、制御装置２００によって送信された動作指示に応じて各駆動機構や発光部、スピーカー又はカメラ等のロボット４００に設けられた機能を制御して所定の動作を行う。ロボット４００は、例えば、首、肩又は腕の各関節部に設けられた駆動機構を作動して動作してもよい。ロボット４００は、例えば、肩又は脚等の各関節部に設けられた駆動機構を作動して歩行する動物型であってもよい。ロボット４００は、肩又は脚等の各関節部に設けられた駆動機構を作動して自立歩行する二足歩行等のロボットであってもよい。ロボット４００は車輪又は無限軌道で移動できるような移動型ロボットであってもよい。ロボット４００は、例えばテーブルや受付台等の板状の台の上に設置される。ロボットは所定の装置の一具体例である。 The robot 400 performs a predetermined operation by controlling functions provided in the robot 400 such as each drive mechanism, a light emitting unit, a speaker, and a camera in response to an operation instruction transmitted by the control device 200. The robot 400 may operate by operating a drive mechanism provided at each joint of the neck, shoulder, or arm, for example. The robot 400 may be, for example, an animal type that walks by operating a drive mechanism provided at each joint such as a shoulder or a leg. The robot 400 may be a robot such as a bipedal walking robot that walks independently by operating a drive mechanism provided at each joint such as a shoulder or a leg. The robot 400 may be a mobile robot capable of moving on wheels or tracks. The robot 400 is installed on a plate-shaped table such as a table or a reception table. A robot is a specific example of a predetermined device.

図５は、ロボット４００の一具体例を示す図である。ロボット４００は、頭部４１０と、左腕部４２０、右腕部４３０及び脚部４４０を備える胴体部とから構成される。頭部４１０は、マイク４１１、カメラ４１２、第一発光部４１３及び第二発光部４１４を備える。頭部４１０は、制御装置２００の機器制御部２４２によって指示される動作指示に基づいて、上下左右等の３軸方向に駆動する。ロボット４００は、例えば動作指示が示す角度の方向に頭部４１０の向きを制御する。 FIG. 5 is a diagram showing a specific example of the robot 400. The robot 400 is composed of a head portion 410 and a body portion including a left arm portion 420, a right arm portion 430, and a leg portion 440. The head 410 includes a microphone 411, a camera 412, a first light emitting unit 413, and a second light emitting unit 414. The head 410 is driven in three axial directions such as up, down, left, and right based on an operation instruction instructed by the device control unit 242 of the control device 200. The robot 400 controls the direction of the head 410 in the direction of the angle indicated by the operation instruction, for example.

マイク４１１は、自装置近傍の音声を収音する。マイク４１１は、例えばロボット４００が対峙する人によって発話された音声を収音する。マイク４１１は、収音された音声に基づいて音声信号を生成する。マイク４１１は、生成された音声信号を制御装置２００に記録してもよいし、ネットワーク８００を介して外部の装置に記録してもよい。マイク４１１は、制御装置２００の機器制御部２４２によって指示される実行内容に応じて、音声を収音してもよい。 The microphone 411 collects sound in the vicinity of its own device. The microphone 411 picks up, for example, the voice spoken by the person facing the robot 400. The microphone 411 generates a voice signal based on the picked up voice. The microphone 411 may record the generated audio signal in the control device 200, or may record it in an external device via the network 800. The microphone 411 may pick up sound according to the execution content instructed by the device control unit 242 of the control device 200.

カメラ４１２は、ロボット４００が対峙する人及びロボット４００が対峙する人の近傍の動画像又は静止画像を撮像する。カメラ４１２は、撮像された動画像又は静止画像を制御装置２００に記録してもよいし、ネットワーク８００を介して外部の装置に記録してもよい。カメラ４１２は、制御装置２００の機器制御部２４２によって指示される動作指示に応じて、動画像又は静止画像を撮像してもよい。 The camera 412 captures a moving image or a still image of the person facing the robot 400 and of that person's surroundings. The camera 412 may record the captured moving image or still image in the control device 200, or may record it in an external device via the network 800. The camera 412 may capture a moving image or a still image in response to an operation instruction issued by the device control unit 242 of the control device 200.

第一発光部４１３は、フルカラーＬＥＤ等の発光部材である。第一発光部４１３は、制御装置２００の機器制御部２４２によって指示される動作指示に応じて、所定の色で発光してもよい。例えば、第一発光部４１３は、動作指示として“robot eyeColor green”を指示された場合、緑色に発光する。なお、第一発光部４１３は、ロボット４００の右目と左目とに相当する。第一発光部４１３は、左右の目を同時に同じ色で発光してもよいし、交互に異なる色で発光してもよい。第二発光部４１４は、フルカラーＬＥＤ等の発光部材である。第二発光部４１４は、制御装置２００の機器制御部２４２によって指示される実行内容に応じて、所定の色で発光してもよい。例えば、第二発光部４１４は、制御内容として“event:talking”が指示された場合、赤色に点滅する。このように構成されることで、マイク・スピーカー５００が音声を出力している時に、第二発光部４１４を点滅させることができる。このように第二発光部４１４が発光することで、ロボット４００が、ロボット４００に対峙している人に話しかけているように見せることができる。 The first light emitting unit 413 is a light emitting member such as a full-color LED. The first light emitting unit 413 may emit light in a predetermined color in response to an operation instruction issued by the device control unit 242 of the control device 200. For example, the first light emitting unit 413 emits green light when "robot eyeColor green" is given as an operation instruction. The first light emitting unit 413 corresponds to the right eye and the left eye of the robot 400. The first light emitting unit 413 may light the left and right eyes simultaneously in the same color, or alternately in different colors. The second light emitting unit 414 is a light emitting member such as a full-color LED. The second light emitting unit 414 may emit light in a predetermined color according to the execution content instructed by the device control unit 242 of the control device 200. For example, the second light emitting unit 414 blinks in red when "event:talking" is given as the control content. With this configuration, the second light emitting unit 414 can be made to blink while the microphone / speaker 500 is outputting sound. Because the second light emitting unit 414 emits light in this way, the robot 400 can be made to appear to be talking to the person facing it.

左腕部４２０は、左肩関節部４２１と左腕関節部４２２とを備える。左肩関節部４２１は、左腕部４２０を上下左右に駆動させる駆動機構である。左肩関節部４２１は、制御装置２００の機器制御部２４２によって指示される動作指示に応じて、回転駆動する。例えば、左肩関節部４２１は、制御内容として“robot motion raiseHand”が指示された場合、左腕部４２０を上方向に移動させるように回転駆動させる。左肩関節部４２１は、回転方向に応じて左腕部４２０を上下左右に駆動させる。左腕関節部４２２は、回転方向に応じて左腕部４２０を開いたり閉じたりする。左腕関節部４２２は、制御装置２００の機器制御部２４２によって指示される実行内容に応じて、回転駆動してもよい。例えば、左腕関節部４２２は、実行内容として“robot motion openHand”が指示された場合、左腕部４２０を開くように回転駆動してもよい。 The left arm portion 420 includes a left shoulder joint portion 421 and a left arm joint portion 422. The left shoulder joint portion 421 is a drive mechanism that drives the left arm portion 420 up, down, left, and right. The left shoulder joint portion 421 is rotationally driven in response to an operation instruction instructed by the device control unit 242 of the control device 200. For example, the left shoulder joint portion 421 is rotationally driven so as to move the left arm portion 420 upward when "robot motion raiseHand" is instructed as the control content. The left shoulder joint portion 421 drives the left arm portion 420 up, down, left and right according to the direction of rotation. The left arm joint portion 422 opens and closes the left arm portion 420 according to the direction of rotation. The left arm joint portion 422 may be rotationally driven according to the execution content instructed by the device control unit 242 of the control device 200. For example, the left arm joint portion 422 may be rotationally driven so as to open the left arm portion 420 when "robot motion openHand" is instructed as the execution content.

右腕部４３０は、右肩関節部４３１と右腕関節部４３２とを備える。右腕部４３０は、左腕部４２０と同様に、制御装置２００の機器制御部２４２によって指示される動作指示に応じて、右肩関節部４３１と右腕関節部４３２とを駆動させる。 The right arm portion 430 includes a right shoulder joint portion 431 and a right arm joint portion 432. Similar to the left arm portion 420, the right arm portion 430 drives the right shoulder joint portion 431 and the right arm joint portion 432 in response to an operation instruction instructed by the device control unit 242 of the control device 200.

脚部４４０は、制御装置２００の機器制御部２４２によって指示される動作指示に応じて、ロボット４００の向きを変更したり、前後左右に移動させたりする。ロボット４００は、例えば動作情報が示す角度の方向に脚部４４０の向きを制御する。 The legs 440 change the direction of the robot 400 or move the robot 400 back and forth and left and right in response to an operation instruction instructed by the device control unit 242 of the control device 200. The robot 400 controls the orientation of the legs 440 in the direction of the angle indicated by the motion information, for example.

図１に戻って、ロボット制御システム１の説明を続ける。マイク・スピーカー５００は、マイクとスピーカーとから構成される。マイク・スピーカー５００は、自装置近傍の音声を収音する。マイク・スピーカー５００は、例えばロボット４００が対峙する人によって発話された音声を収音する。マイク・スピーカー５００は、収音された音声に基づいて音声信号を生成する。マイク・スピーカー５００は、生成された音声信号を制御装置２００に出力する。マイク・スピーカー５００は、端末装置１００から送信された音声信号を出力する。このように構成されることで、ロボット４００が、ロボット４００に対峙する人に話しかけているように見せることができる。また、マイク・スピーカー５００は、制御装置２００の機器制御部２４２によって指示される動作指示に応じて、音声を出力してもよい。例えば、マイク・スピーカー５００は、動作指示として“sound play sound1”が指示された場合、ラッパ口音を出力する。 Returning to FIG. 1, the description of the robot control system 1 continues. The microphone / speaker 500 is composed of a microphone and a speaker. The microphone / speaker 500 picks up sound in the vicinity of its own device, for example the voice uttered by the person facing the robot 400. The microphone / speaker 500 generates an audio signal based on the picked-up sound and outputs the generated audio signal to the control device 200. The microphone / speaker 500 also outputs the audio signal transmitted from the terminal device 100. With this configuration, the robot 400 can be made to appear to be talking to the person facing it. Further, the microphone / speaker 500 may output sound in response to an operation instruction issued by the device control unit 242 of the control device 200. For example, the microphone / speaker 500 outputs a trumpet mouth sound when "sound play sound1" is given as an operation instruction.

カメラ６００は、動画像の撮像装置である。カメラ６００は、例えばロボット４００の後部に設けられる。カメラ６００は、例えば、ロボット４００の後ろ姿が映るように、ロボット４００の近傍の動画像を撮像する。カメラ６００は、撮像された動画像を示す映像信号を生成し、制御装置２００に出力する。なお、カメラ６００は、ロボット４００の前後左右のいずれの位置に設けられてもよい。カメラ６００は、ロボット４００が対峙する人を撮像できるならば、どの位置に設けられてもよい。 The camera 600 is an imaging device for moving images. The camera 600 is provided, for example, behind the robot 400. The camera 600 captures a moving image of the vicinity of the robot 400 so that, for example, the back of the robot 400 is in view. The camera 600 generates a video signal representing the captured moving image and outputs it to the control device 200. The camera 600 may be provided in front of, behind, or to the left or right of the robot 400. The camera 600 may be provided at any position from which it can image the person facing the robot 400.

発光部７００は、ＬＥＤや電球等の発光部材である。発光部７００は、例えば所定の形状をしたパネルに複数の発光部材を備えてもよい。所定の形状とは、円形であってもよいし、四角形等の多角形であってもよい。発光部７００は、制御装置２００の機器制御部２４２によって指示される動作指示に応じて、発光してもよい。例えば、機器制御部２４２が動作指示として“led play animation1”を指示した場合、発光部７００は、点滅するように発光する。なお、発光部７００は、実行内容に応じて、異なる態様で発光してもよい。例えば、発光部７００が複数の発光部材を備える場合、動作指示に応じて一部の発光部材を発光させるように構成されてもよい。 The light emitting unit 700 is a light emitting member such as an LED or a light bulb. The light emitting unit 700 may include, for example, a plurality of light emitting members on a panel having a predetermined shape. The predetermined shape may be a circle or a polygon such as a quadrangle. The light emitting unit 700 may emit light in response to an operation instruction issued by the device control unit 242 of the control device 200. For example, when the device control unit 242 gives "led play animation1" as an operation instruction, the light emitting unit 700 blinks. The light emitting unit 700 may emit light in different ways depending on the execution content. For example, when the light emitting unit 700 includes a plurality of light emitting members, it may be configured so that only some of the light emitting members emit light in response to an operation instruction.

図６は、ロボット４００の頭部４１０が向く方向を示す座標を受け付ける一具体例を示す図である。図６に示す映像１３０は、カメラ６００によって撮像されるカメラ映像である。映像１３０は、表示部１０３に表示される。映像１３０は、図５に示すようなロボット４００の背面を映す。なお、映像１３０には、発光部７００とロボット４００とが映されているように描画されているが、ロボット４００は、発光部７００の前方向（奥行き方向）に設置されている。このため、ロボット４００は、発光部７００によって遮られており、表示部１０３には映らない。図６において、ロボット４００は、説明のために記載されている。 FIG. 6 is a diagram showing a specific example of accepting coordinates indicating the direction in which the head 410 of the robot 400 faces. The image 130 shown in FIG. 6 is a camera image captured by the camera 600. The image 130 is displayed on the display unit 103. The image 130 shows the back surface of the robot 400 as shown in FIG. Although the image 130 is drawn so that the light emitting unit 700 and the robot 400 are projected, the robot 400 is installed in the front direction (depth direction) of the light emitting unit 700. Therefore, the robot 400 is blocked by the light emitting unit 700 and does not appear on the display unit 103. In FIG. 6, the robot 400 is described for illustration purposes.

映像１３０は、高さＨ、幅Ｗの解像度を持つ。高さ及び幅は、カメラ６００の性能や、端末装置１００及び制御装置２００に導入されている映像ソフトウェアの性能に応じて定まる。映像１３０において、Ｘ軸方向の画角はＡ＿ｘで表される。映像１３０において、Ｙ軸方向の画角はＡ＿ｙで表される。点１３１は、映像１３０におけるＸ軸及びＹ軸の原点を示す。点１３２は、入力部１０２を介して受け付けた座標を示す。端末装置１００の操作者は、例えば、入力部１０２に基づいて、表示部１０３に表示される映像１３０の任意の座標を選択することができる。点１３２は、入力部１０２を介して選択された座標を表す。制御装置２００は、点１３２に基づいて算出される方向に頭部４１０を向くようにロボット４００を制御する。 The image 130 has a resolution of height H and width W. The height and width are determined according to the performance of the camera 600 and the performance of the video software installed in the terminal device 100 and the control device 200. In the image 130, the angle of view in the X-axis direction is represented by A_x. In the image 130, the angle of view in the Y-axis direction is represented by A_y. Point 131 indicates the origin of the X-axis and the Y-axis in the image 130. Point 132 indicates the coordinates received via the input unit 102. The operator of the terminal device 100 can select, for example, arbitrary coordinates of the image 130 displayed on the display unit 103 based on the input unit 102. Point 132 represents the coordinates selected via the input unit 102. The control device 200 controls the robot 400 so as to face the head 410 in the direction calculated based on the point 132.
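
To make the relationship between a selected image point and a head direction concrete, here is a minimal Python sketch. It is not formula (1), which is not reproduced in this passage. It is a generic pinhole-camera mapping under assumptions chosen for this example: the selected point (x, y) is measured in pixels from the top-left corner of the image, the optical axis passes through the image center, and the target lies at the fixed Z-axis distance D_z. The function name pixel_to_point is hypothetical.

```python
import math


def pixel_to_point(x, y, W, H, A_x, A_y, D_z):
    """Map an image point (x, y) to a 3D point P = (X, Y, Z) at depth D_z.

    Assumptions (not taken from formula (1)): pinhole camera, origin at the
    top-left pixel, optical axis through the image center, A_x and A_y in radians.
    """
    # Normalized offsets from the image center, each in the range [-1, 1].
    u = 2.0 * x / W - 1.0
    v = 1.0 - 2.0 * y / H  # image y grows downward, so flip the sign
    X = D_z * math.tan(A_x / 2.0) * u
    Y = D_z * math.tan(A_y / 2.0) * v
    return (X, Y, D_z)


# The image center maps to a point straight ahead of the camera.
print(pixel_to_point(320, 240, 640, 480, math.pi / 2, math.pi / 3, 100.0))
```

If the origin shown as point 131 is placed elsewhere, or if the system uses a linear angle approximation instead of the tangent form, the offsets u and v would need to be adjusted accordingly.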

図７は、音声でロボット４００を制御する処理の一具体例を示す図である。端末装置１００のマイク１０４は、端末装置１００を操作する人の音声を収音する（ステップＳ１０１）。端末装置１００を操作する人は、表示部１０３に表示されるカメラ映像に基づき、発話する。端末装置１００を操作する人は、ロボット４００が対峙する人に向けて発話する。ロボット４００が対峙する人は、カメラ６００によって撮像される。ロボット４００が対峙する人は、表示部１０３に表示される。マイク１０４は、収音された音声に基づいて音声信号を生成する。マイク１０４は、生成された音声信号を端末装置１００に出力する。端末装置１００の音声認識部１７２は、音声認識処理を実行する（ステップＳ１０２）。音声認識部１７２は、マイク１０４によって出力された音声信号に基づいて文字列を生成する。端末装置１００の制御情報生成部１７３は、生成された文字列に基づいて制御情報を生成する（ステップＳ１０３）。 FIG. 7 is a diagram showing a specific example of a process of controlling the robot 400 by voice. The microphone 104 of the terminal device 100 picks up the voice of the person who operates the terminal device 100 (step S101). The person who operates the terminal device 100 speaks based on the camera image displayed on the display unit 103. The person who operates the terminal device 100 speaks to the person whom the robot 400 faces. The person facing the robot 400 is imaged by the camera 600. The person facing the robot 400 is displayed on the display unit 103. The microphone 104 generates a voice signal based on the picked up voice. The microphone 104 outputs the generated audio signal to the terminal device 100. The voice recognition unit 172 of the terminal device 100 executes the voice recognition process (step S102). The voice recognition unit 172 generates a character string based on the voice signal output by the microphone 104. The control information generation unit 173 of the terminal device 100 generates control information based on the generated character string (step S103).

端末装置１００の映像制御部１７４は、マイク１０４によって出力された音声信号を制御装置２００に送信する（ステップＳ１０４）。また、制御装置２００の映像制御部２４３は、端末装置１００から受信した音声信号をマイク・スピーカー５００に出力する（ステップＳ１０５）。制御情報生成部１７３は、生成された制御情報を中継サーバ３００に送信する（ステップＳ１０６）。中継サーバ３００は、受信した制御情報を制御装置２００に送信する（ステップＳ１０７）。 The video control unit 174 of the terminal device 100 transmits the audio signal output by the microphone 104 to the control device 200 (step S104). Further, the video control unit 243 of the control device 200 outputs the audio signal received from the terminal device 100 to the microphone / speaker 500 (step S105). The control information generation unit 173 transmits the generated control information to the relay server 300 (step S106). The relay server 300 transmits the received control information to the control device 200 (step S107).

The device control unit 242 of the control device 200 generates an operation instruction based on the control information (step S108). The device control unit 242 transmits the operation instruction to the robot 400 (step S109). The microphone/speaker 500 outputs the audio signal received from the control device 200 as sound (step S110). By outputting this sound, the microphone/speaker 500 conveys the utterance of the person operating the terminal device 100 to the person whom the robot 400 faces. The robot 400 moves in accordance with the operation instruction (step S111). Because the robot 400 moves in step with the sound output from the microphone/speaker 500, it can hold a smooth dialogue with the person it faces.
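The flow from the recognized character string (step S103) to the operation instruction (step S108) can be illustrated with a minimal sketch. The keyword table, the `ControlInfo` type, the action names, and the `robot400` target label are all hypothetical and introduced only for illustration; this excerpt does not specify how control content is stored or matched.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical table standing in for stored control content:
# keyword found in the recognized character string -> content of control.
CONTROL_CONTENT = {
    "hello": "raise_arm",
    "thank you": "bow",
    "goodbye": "wave",
}


@dataclass(frozen=True)
class ControlInfo:
    action: str  # content of control for the robot or a related device


def generate_control_info(text: str) -> Optional[ControlInfo]:
    """Step S103 (assumed logic): derive control information from the string."""
    lowered = text.lower()
    for keyword, action in CONTROL_CONTENT.items():
        if keyword in lowered:
            return ControlInfo(action)
    return None  # no matching control content; only the audio would be relayed


def generate_operation_instruction(info: ControlInfo) -> dict:
    """Step S108 (assumed format): turn control information into an instruction."""
    return {"target": "robot400", "command": info.action}


info = generate_control_info("Hello, welcome to the shop")
instruction = generate_operation_instruction(info) if info else None
```

In this sketch an utterance without a known keyword yields no control information, so the robot would relay the voice without an accompanying motion.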

FIG. 8 is a diagram showing a specific example of the process of controlling the orientation of the robot 400. In the present embodiment, as shown in FIG. 6, the camera 600 captures a moving image showing the robot 400 from behind. The process of controlling the orientation of the robot 400 may be executed whenever coordinates on the camera image are received while the robot control system 1 is operating. The camera 600 generates a video signal of the captured moving image and outputs the video signal to the control device 200 (step S201). The video control unit 243 of the control device 200 transmits the received video signal to the terminal device 100 (step S202). The video control unit 174 of the terminal device 100 outputs the received video signal to the display unit 103 (step S203).

The angle calculation unit 175 of the terminal device 100 receives, via the input unit 102, coordinates on the camera image output to the display unit 103 (step S204). The angle calculation unit 175 calculates the angle of the head 410 of the robot 400 at which the head faces the direction indicated by the received coordinates (step S205). The angle calculation unit 175 transmits the calculated angle to the control device 200 as angle information (step S206).
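One possible way to perform the calculation in step S205 is sketched below. This excerpt does not state the formula, so the sketch assumes an ideal pinhole camera with a known field of view whose optical axis coincides with the neutral direction of the head 410, and it ignores the offset between the camera 600 and the robot 400. The function name and all parameters are hypothetical.

```python
import math


def pixel_to_head_angles(x, y, width, height, hfov_deg, vfov_deg):
    """Return (yaw, pitch) in degrees for a selected pixel.

    Assumes an ideal pinhole camera aligned with the robot's neutral head
    direction. Positive yaw points to the right of the image and positive
    pitch points up (image y grows downward).
    """
    # Focal lengths in pixels, derived from the assumed fields of view.
    fx = (width / 2) / math.tan(math.radians(hfov_deg) / 2)
    fy = (height / 2) / math.tan(math.radians(vfov_deg) / 2)
    yaw = math.degrees(math.atan((x - width / 2) / fx))
    pitch = math.degrees(math.atan((height / 2 - y) / fy))
    return yaw, pitch


# Selecting the image center leaves the head in its neutral direction.
center = pixel_to_head_angles(320, 240, 640, 480, 90.0, 60.0)
```

Under these assumptions a point on the right edge of the image maps to a yaw of half the horizontal field of view. A real system would also need to calibrate the camera and clamp the result to the head's range of motion.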

The device control unit 242 of the control device 200 generates an operation instruction based on the received angle information (step S207). The device control unit 242 transmits the operation instruction to the robot 400 (step S208). The robot 400 executes the operation in accordance with the operation instruction (step S209). Specifically, the robot 400 drives the head 410 to the angle indicated by the operation instruction.

The robot control system 1 configured in this way generates control information based on the voice uttered toward the person whom the robot 400 faces. The device control unit 242 controls the robot 400 or a device related to the robot 400 based on the control information. The user of the terminal device 100 can therefore control the robot 400 while conversing, through the robot 400, with the person whom the robot 400 faces. As a result, the robot control system 1 makes it easier to control the robot 400.

In addition, the robot control system 1 displays, on the display unit 103 of the terminal device 100, an image showing the robot 400 and the space in front of the robot 400. The angle calculation unit 175 receives coordinates on the camera image from the user of the terminal device 100 and calculates the angle of the head 410 at which the robot 400 faces the direction of the designated coordinates. The device control unit 242 controls the robot 400 based on the calculated angle. The user of the terminal device 100 can therefore change the orientation of the robot 400 simply by inputting coordinates. As a result, the robot control system 1 makes it easier to control the robot 400.

In the above-described embodiment, the robot control system 1 has been described as including the microphone/speaker 500, the camera 600, and the light emitting unit 700 as devices related to the robot 400, but the devices are not limited to these. For example, the robot control system 1 may include, as a device related to the robot 400, a display device that displays a character string representing the voice uttered by the user of the terminal device 100. The character string is generated by, for example, voice recognition processing. With this configuration, the user of the terminal device 100 can also converse, through the robot 400, with a person who has a hearing impairment.

The robot control system 1 of the above-described embodiment may be configured to include, instead of the robot 400, a display device such as a display or a drive device such as an actuator. FIG. 9 is a diagram showing specific examples in which the robot control system includes a predetermined device such as a display device or an actuator instead of the robot. FIG. 9(a) is a diagram showing a specific example in which the robot control system includes a display device instead of the robot. As shown in FIG. 9(a), the robot control system 1 includes a display device 400a instead of the robot 400. The display device 400a is a display device such as a touch panel or a display. The display device 400a displays a human image 401. The human image 401 is an image showing the figure of a person. The figure may show the whole body or only part of the body, such as the upper body. The human image 401 may be live-action footage or CG (Computer Graphics). The human image 401 moves in accordance with the operation instruction received from the control device 200, and may perform the same operations as the robot 400. The display device 400a may display a plurality of human images, or may display an image of something other than a person, such as an animal, an object, or a plant.

FIG. 9(b) is a diagram showing a specific example in which the robot control system includes an actuator instead of the robot. As shown in FIG. 9(b), the robot control system 1 includes an actuator 400b instead of the robot 400. The actuator 400b contains a motor, transmission gears, and the like. An object 402 can be placed on the actuator 400b. The object 402 is an item such as a book, food, or a tool. The actuator 400b operates in accordance with the operation instruction received from the control device 200. Specifically, the actuator 400b performs a predetermined operation such as rotating or vibrating up and down. The object 402 placed on the actuator 400b rotates or vibrates up and down in accordance with the operation of the actuator 400b. In this way, by outputting sound from the microphone/speaker 500 while operating the actuator 400b, the robot control system 1 can make the object 402 behave as if it were alive. When the object 402 is a product in a shop, this can stimulate customers' willingness to buy.

In the above-described embodiment, the control information generation unit 173 of the terminal device 100 is configured to generate control information based on the audio signal picked up by the microphone 104, but the configuration is not limited to this. For example, the control information generation unit 173 may be configured to generate control information based on a character string input via the input unit 102. In this case, the control unit 107 generates an audio signal from the input character string and transmits it to the control device 200. With this configuration, even a user of the terminal device 100 who cannot speak can converse, through the robot 400, with the person whom the robot 400 faces.

In the above-described embodiment, the robot control system 1 has been described as controlling one robot 400, but the number of robots is not limited to one. For example, the robot control system 1 may be configured to control two or more robots. In that case, the system may be configured so that the user of the terminal device 100 can specify, via the input unit 102, which robot to control. The device control unit 242 then transmits the operation instruction to the designated robot or to a device related to that robot. Alternatively, the robot control system 1 may be configured to transmit the same control information to a plurality of robots.
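The two delivery modes described above, sending to a designated robot or sending the same content to every robot, can be sketched as follows. The robot identifiers and the use of in-memory lists in place of a real transport are assumptions made purely for illustration.

```python
from typing import Dict, List, Optional


def dispatch(instruction: str,
             robots: Dict[str, List[str]],
             target: Optional[str] = None) -> List[str]:
    """Deliver an instruction to one designated robot, or to all robots.

    `robots` maps a hypothetical robot ID to the list of instructions that
    robot has received. Returns the IDs that received the instruction.
    """
    if target is None:
        recipients = list(robots)  # broadcast the same content to every robot
    elif target in robots:
        recipients = [target]  # only the robot designated by the user
    else:
        raise KeyError(f"unknown robot: {target}")
    for robot_id in recipients:
        robots[robot_id].append(instruction)
    return recipients


robots = {"robot_a": [], "robot_b": []}
dispatch("raise_arm", robots, target="robot_a")  # designated robot only
dispatch("bow", robots)                          # all robots
```

Rejecting an unknown identifier, rather than silently broadcasting, is a design choice of this sketch and is not taken from the source.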

In the above-described embodiment, the robot control system 1 generates a character string from the input audio signal and then generates control information to operate the robot 400 or the like. However, the robot control system 1 may instead be configured to operate the robot 400 or the like based on the picked-up voice and a predetermined estimator. The estimator is a learning model generated by machine learning on a plurality of learning data. The estimator has the function of generating control information based on the voice uttered toward the person whom the robot 400 faces. In this case, the control information generation unit 173 generates control information based on the voice picked up by the microphone 104 and the estimator. For example, when a voice expressing a greeting such as "good morning", "hello", or "good evening" is picked up, the control information generation unit 173 may use the estimator to generate control information that makes the robot 400 raise one arm. The estimator may be generated by, for example, the control information generation unit 173, or may be stored in the terminal device 100 in advance. The learning data associates a voice with the content of control for the robot 400 or a device related to the robot 400, and is used to generate the estimator. The machine learning may be any known method, such as SGD (Stochastic Gradient Descent), random forest, linear regression, decision tree, or CNN (Convolutional Neural Network). In the learning data, a waveform generated from the voice or a character string generated from the voice may be used instead of the voice itself.
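As a rough illustration of such an estimator, the sketch below trains a small multi-class perceptron on learning data given as character strings, which the description permits in place of raw voice. The perceptron is a stand-in chosen for brevity and is not one of the methods named above; the learning data, the action labels, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical learning data: (character string derived from the voice,
# content of control for the robot).
LEARNING_DATA = [
    ("good morning", "raise_arm"),
    ("hello", "raise_arm"),
    ("good evening", "raise_arm"),
    ("thank you very much", "bow"),
    ("thanks a lot", "bow"),
    ("see you later", "wave"),
    ("goodbye", "wave"),
]


def train_estimator(data, epochs=10):
    """Train a multi-class perceptron on bag-of-words features."""
    labels = sorted({label for _, label in data})
    weights = {label: defaultdict(float) for label in labels}

    def predict(text):
        words = text.lower().split()
        # On ties, max() returns the first label in sorted order.
        return max(labels, key=lambda lb: sum(weights[lb][w] for w in words))

    for _ in range(epochs):
        for text, label in data:
            guess = predict(text)
            if guess != label:
                # Perceptron update: reward the true label, penalize the guess.
                for w in text.lower().split():
                    weights[label][w] += 1.0
                    weights[guess][w] -= 1.0
    return predict


estimator = train_estimator(LEARNING_DATA)
control = estimator("good morning everyone")
```

With this toy data the estimator maps greetings to the arm-raising control. An utterance made up only of unseen words falls back to the first label in sorted order, so a practical system would want a confidence threshold or a "no action" class.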

The terminal device 100 may be implemented using a plurality of information processing devices communicably connected via the network 800. In this case, the functional units of the terminal device 100 may be distributed across the plurality of information processing devices. For example, the control information generation unit 173 and the state estimation unit 176 may be implemented on different information processing devices.

The control device 200 may be implemented using a plurality of information processing devices communicably connected via the network 800. In this case, the functional units of the control device 200 may be distributed across the plurality of information processing devices. For example, the device control unit 242 and the video control unit 243 may be implemented on different information processing devices.

The terminal device 100 and the control device 200 in the above-described embodiment may be realized by a computer. In that case, a program for realizing their functions may be recorded on a computer-readable recording medium, and the functions may be realized by having a computer system read and execute the program recorded on the recording medium. The term "computer system" as used here includes an OS and hardware such as peripheral devices. The term "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or to a storage device such as a hard disk built into a computer system. Furthermore, the "computer-readable recording medium" may also include a medium that holds the program dynamically for a short time, such as a communication line used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period, such as volatile memory inside a computer system serving as the server or client in that case. The program may realize only part of the functions described above, may realize those functions in combination with a program already recorded in the computer system, or may be realized using a programmable logic device such as an FPGA (Field Programmable Gate Array).

Although an embodiment of the present invention has been described in detail above with reference to the drawings, the specific configuration is not limited to this embodiment and also includes designs and the like within a range that does not depart from the gist of the present invention.

1 ... Robot control system, 100 ... Terminal device, 101 ... Communication unit, 102 ... Input unit, 103 ... Display unit, 104 ... Microphone, 105 ... Camera, 106 ... Control content storage unit, 107 ... Control unit, 171 ... Communication control unit, 172 ... Voice recognition unit, 173 ... Control information generation unit, 174 ... Video control unit, 175 ... Angle calculation unit, 176 ... State estimation unit, 200 ... Control device, 201 ... Communication unit, 202 ... State transition information storage unit, 203 ... Operation content storage unit, 204 ... Control unit, 241 ... Communication control unit, 242 ... Device control unit, 243 ... Video control unit, 300 ... Relay server, 400 ... Robot, 500 ... Microphone/speaker, 600 ... Camera, 700 ... Light emitting unit, 800 ... Network

Claims

A control system comprising:
a sound collecting unit that collects voice uttered toward a person facing a predetermined device;
a control information generation unit that generates, based on the collected voice, control information indicating the content of control for the predetermined device or a device related to the predetermined device; and
a device control unit that controls the predetermined device or the device related to the predetermined device based on the control information.

The control system according to claim 1, further comprising a voice recognition unit that generates, based on the voice, a character string representing the voice,
wherein the control information generation unit generates the control information based on the character string.

The control system according to claim 1 or 2, further comprising:
a display unit that displays an image showing the predetermined device and the space in the direction in which the predetermined device is facing; and
a calculation unit that calculates, based on coordinates on the image designated by the user who utters the voice, an orientation of the predetermined device at which it faces the space indicated by the coordinates,
wherein the device control unit controls the orientation of the predetermined device based on the calculated result.

The control system according to any one of claims 1 to 3, wherein the device control unit causes the predetermined device or the device related to the predetermined device to move, emit light, or output sound according to the content of the voice.

The control system according to claim 1, wherein the control information generation unit generates the control information based on the voice collected by the sound collecting unit and on an estimator generated by machine learning of learning data in which voice is associated with the content of control for the predetermined device or the device related to the predetermined device.

A terminal device comprising:
a sound collecting unit that collects voice uttered toward a person facing a predetermined device; and
a control information generation unit that generates, based on the collected voice, control information indicating the content of control for the predetermined device or a device related to the predetermined device.

A control method in which a computer performs:
a sound collecting step of collecting voice uttered toward a person facing a predetermined device;
a control information generation step of generating, based on the collected voice, control information indicating the content of control for the predetermined device or a device related to the predetermined device; and
a device control step of controlling the predetermined device or the device related to the predetermined device based on the control information.

A computer program for operating a computer as the control system according to any one of claims 1 to 5.