JP2020185630A

JP2020185630A - Control device, robot, control method and control program

Info

Publication number: JP2020185630A
Application number: JP2019090756A
Authority: JP
Inventors: 学永尾; Manabu Nagao; 厚太鍋嶌; Kota Nabeshima
Original assignee: Preferred Networks Inc
Current assignee: Preferred Networks Inc
Priority date: 2019-05-13
Filing date: 2019-05-13
Publication date: 2020-11-19
Also published as: WO2020230784A1

Abstract

To improve the voice detection rate of a robot that operates based on a user's voice instructions.

SOLUTION: A robot control device includes: a lip motion detection unit that detects a lip motion of a user based on acquired image data; a voice detection unit that detects voice data from acquired sound data; and an instruction unit that instructs that the sound emitted by a sound source be reduced when the lip motion detection unit detects the lip motion of the user and the voice detection unit does not detect voice data.

SELECTED DRAWING: Figure 1

Description

The present disclosure relates to a control device, a robot, a control method, and a control program.

Robots that operate based on a user's voice instructions are known. One example is a robot that, when the user speaks, moves its microphone to an appropriate position so that the user's voice is detected at an appropriate sound pressure.

However, even such a robot may fail to detect the user's voice when, for example, sounds other than the user's voice (sounds emitted by the robot itself or ambient sounds) are loud.

JP-A-2008-126329

An object of the present disclosure is to improve the voice detection rate of a robot that operates based on a user's voice instructions.

A control device according to one aspect of the present disclosure has, for example, the following configuration. That is, it has:
a lip motion detection unit that detects a user's lip motion based on acquired image data;
a voice detection unit that detects voice data from acquired sound data; and
an instruction unit that instructs that the sound emitted by a sound source be reduced when the lip motion detection unit detects the user's lip motion and the voice detection unit does not detect voice data.

A diagram showing an example of the external configuration of the robot.
A first diagram showing an example of the hardware configuration of the control device.
A first diagram showing an example of the functional configuration of the control device.
A first flowchart showing the flow of operation control processing by the control device.
A second diagram showing an example of the functional configuration of the control device.
A second flowchart showing the flow of operation control processing by the control device.
A third diagram showing an example of the functional configuration of the control device of the robot.
A third flowchart showing the flow of operation control processing by the control device.
A first diagram showing an example of the working environment of the robot.
A second diagram showing an example of the hardware configuration of the control device.
A second diagram showing an example of the working environment of the robot.
A third diagram showing an example of the working environment of the robot.
A fourth diagram showing an example of the functional configuration of the control device.
A fourth flowchart showing the flow of operation control processing by the control device.

Hereinafter, each embodiment will be described with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are given the same reference numerals, and duplicate description is omitted.

[First Embodiment]
<External configuration of robot>
First, the external configuration of the robot controlled by the control device according to the present embodiment will be described. FIG. 1 is a diagram showing an example of the external configuration of the robot.

The robot 10 operates based on a user's voice instructions. As shown in FIG. 1, the robot 10 has a camera 110, a microphone 120, speakers 130 and 131, movable parts 140 and 150 (a plurality of joints and end effectors), and a movable part 160 (a moving mechanism). The robot 10 also has a built-in control device 100. However, the control device 100 need not be built into the robot 10 and may instead be implemented on a separate computer. In that case, the separate computer exchanges information with the robot 10 by communication.

The camera 110 is an example of an imaging device (imaging unit); it captures images of the surroundings of the robot 10 and generates image data. Although the example of FIG. 1 shows the camera 110 with two image sensors, the number of image sensors is not limited to two. Also, although the example of FIG. 1 shows the camera 110 mounted on the robot 10, the camera 110 may be arranged separately from the robot 10.

The microphone 120 is an example of a sound collecting device (sound collecting unit); it detects sound propagating through air or the like and generates sound data. The sounds detected by the microphone 120 include ambient sounds around the robot 10, sounds emitted by the robot 10 itself, and the voice of the user of the robot 10. Although only one microphone 120 is shown in the example of FIG. 1, the robot 10 may have a plurality of microphones.

The speakers 130 and 131 are examples of audio output devices and output synthesized speech and the like based on audio signals generated by the control device 100.

The movable parts 140 and 150 perform gripping operations and the like in the various tasks carried out by the robot 10 (for example, a task of grasping an object and moving while holding it, thereby carrying the object to a target location, while communicating with the user). Each of the joints and end effectors included in the movable parts 140 and 150 has an actuator.

The movable part 160 performs moving operations and the like in the various tasks carried out by the robot 10. The moving mechanism included in the movable part 160 has wheels, motors, gears, belts, actuators, and the like.

The control device 100 has a control unit 102 that controls the operation of the camera 110 and the microphone 120, the output of the speakers 130 and 131, the operation of the actuators and the like of the movable parts 140 and 150, and the operation of the motors, actuators, and the like of the movable part 160. The control unit 102 causes the robot 10 to perform gripping operations, moving operations, and the like while communicating with the user.

The control device 100 also has a stop unit 101 that outputs a sound source control command to the control unit 102. The stop unit 101 outputs the sound source control command to the control unit 102 when, while the robot 10 is performing a task, the user's voice cannot be detected because of sound emitted by the speakers 130 and 131 or the movable parts 140 to 160 (sound emitted by the robot 10 itself). The stop unit 101 determines whether the robot 10 is performing a task based on a working flag.

As a result, the control unit 102 controls, for example, the output of the speakers 130 and 131, which are sound sources, and the operation of the actuators, motors, and the like of the movable parts 140 to 160, creating a state in which the user's voice is easy to detect. Consequently, the voice detection rate can be improved in the robot 10, which operates based on the user's voice instructions.

<Hardware configuration of control device>
Next, the hardware configuration of the control device 100 built into the robot 10 will be described. FIG. 2 is a first diagram showing an example of the hardware configuration of the control device.

As shown in FIG. 2, the control device 100 includes a processor 201, a main storage device 202, an auxiliary storage device 203, a device interface 204, and a communication device 205, and is implemented as a computer in which these components are connected via a bus 210.

In the example of FIG. 2, the control device 100 includes one of each component, but it may include a plurality of the same component. Also, although one control device is shown in the example of FIG. 2, a plurality of control devices may be provided, with software (for example, the control program described later) installed on them so that each control device executes a different part of the software's processing. In this case, the control devices may communicate with one another via a network interface or the like.

The processor 201 is an electronic circuit (processing circuit, processing circuitry) that includes an arithmetic unit. The processor 201 performs arithmetic processing based on data and programs input from the components in the control device 100 and outputs results and control signals to those components. Specifically, the processor 201 controls each component in the control device 100 by executing an OS (Operating System), applications, and the like. The processor 201 is not limited to any specific processing circuit as long as it can perform this processing. Here, a processing circuit may refer to one or more electronic circuits arranged on one chip, or to one or more electronic circuits arranged on two or more chips or devices. When a plurality of electronic circuits are used, they may communicate with one another by wire or wirelessly.

The main storage device 202 is a storage device that stores electronic information such as the instructions executed by the processor 201 and data. The electronic information stored in the main storage device 202 is read directly by the processor 201. The auxiliary storage device 203 is a storage device other than the main storage device 202. These storage devices mean any electronic components capable of storing electronic information and may be either memory or storage. Memory may be either volatile or non-volatile. The memory for storing electronic information in the control device 100 may be implemented by the main storage device 202 or the auxiliary storage device 203.

The device interface 204 is an interface, such as USB (Universal Serial Bus), that connects to the camera 110, the microphone 120, the speakers 130 and 131, and the movable parts 140 to 160 of the robot 10.

The communication device 205 is a communication device that communicates with various devices external to the robot 10. Via the communication device 205, the robot 10 sends commands to those external devices and controls them.

<Functional configuration of control device>
Next, the functional configuration of the control device 100 built into the robot 10 will be described. As described above, a control program is installed in the control device 100, and when the processor 201 executes the program, the control device 100 functions as the stop unit 101 and the control unit 102. Of these, the description here focuses mainly on the functions of the stop unit 101.

FIG. 3 is a first diagram showing an example of the functional configuration of the control device. As shown in FIG. 3, the stop unit 101 has a sound acquisition unit 301, a voice detection unit 302, an image acquisition unit 303, a face detection unit 304, a lip motion detection unit 305, and a determination unit 306. Each of these units functions while the stop unit 101 is receiving, from the control unit 102, a working flag indicating that the robot 10 is performing a task.

The sound acquisition unit 301 acquires the sound data generated by the microphone 120 and outputs it to the voice detection unit 302.

The voice detection unit 302 receives the sound data output by the sound acquisition unit 301 and determines whether the received sound data includes voice data. If it determines that the received sound data includes voice data, the voice detection unit 302 detects the voice data and outputs the detected voice data to the determination unit 306.

The image acquisition unit 303 acquires the image data output by the camera 110 and outputs it to the face detection unit 304.

The face detection unit 304 receives the image data output by the image acquisition unit 303, detects the face of a user facing the camera 110, and crops the image of the detected face (face image data). The face detection unit 304 then outputs the cropped face image data to the lip motion detection unit 305.

The lip motion detection unit 305 detects the user's lip motion from the lip region included in the face image data and outputs the lip motion detection result to the determination unit 306.
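The text does not say how lip motion is computed from the lip region. As one possible illustration only, the sketch below assumes that some upstream landmark detector (not shown, and not part of the source) already yields a per-frame mouth-opening ratio, and it reports lip motion when that ratio varies enough over a short window of frames. The function name, the input representation, and the `min_range` threshold are all hypothetical.

```python
from typing import Sequence


def lip_motion_detected(mouth_openings: Sequence[float], min_range: float = 0.05) -> bool:
    """Report lip motion when the mouth-opening ratio varies over the window.

    mouth_openings: hypothetical per-frame ratios (lip gap / mouth width)
    taken from consecutive face images of one user.
    min_range: assumed threshold on the spread of those ratios.
    """
    if len(mouth_openings) < 2:
        # A single frame cannot show motion.
        return False
    return max(mouth_openings) - min(mouth_openings) >= min_range
```

A mouth that opens and closes across frames, such as `[0.10, 0.30, 0.12, 0.28]`, counts as motion, while a nearly constant series does not.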

The determination unit 306 is an example of an instruction unit. It determines whether voice data has been detected by the voice detection unit 302 and whether a lip motion detection result has been output by the lip motion detection unit 305. When a lip motion detection result has been output by the lip motion detection unit 305 but no voice data has been detected by the voice detection unit 302, the determination unit 306 outputs a sound source control command to the control unit 102, thereby instructing that the sound emitted by the sound source be reduced.

When a lip motion detection result has been output by the lip motion detection unit 305 and voice data has been detected by the voice detection unit 302, the determination unit 306 outputs the detected voice data to the control unit 102.
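The rule applied by the determination unit 306 can be sketched as a small decision function. This is a minimal illustration rather than the implementation in the disclosure; the names `Output` and `determine` are hypothetical, and voice data is represented as an optional `bytes` value purely for the example.

```python
from enum import Enum, auto
from typing import Optional


class Output(Enum):
    NONE = auto()                          # nothing is sent to the control unit
    SOUND_SOURCE_CONTROL_COMMAND = auto()  # ask for the sound sources to be quieted
    VOICE_DATA = auto()                    # forward the detected voice data


def determine(lip_motion: bool, voice_data: Optional[bytes]) -> Output:
    """Decide what the determination unit sends to the control unit."""
    if lip_motion and voice_data is None:
        # The user appears to be speaking, but no voice was detected.
        return Output.SOUND_SOURCE_CONTROL_COMMAND
    if lip_motion and voice_data is not None:
        return Output.VOICE_DATA
    return Output.NONE
```

In this sketch the case without lip motion produces no output, which matches the flow of FIG. 4.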

On receiving the sound source control command output by the determination unit 306, the control unit 102 outputs an operation stop signal to the speakers 130 and 131, the movable parts 140 to 160, and the like. It thereby controls the output of the speakers 130 and 131, which are sound sources, and the operation of the actuators, motors, and the like of the movable parts 140 to 160. In this way, the control unit 102 can reduce the sound emitted by sound sources that hinder the detection of voice data and create a state in which voice data is easy to detect.

On the other hand, on receiving voice data output by the determination unit 306, the control unit 102 recognizes the user's voice instruction based on the received voice data. Based on the recognized voice instruction, the control unit 102 outputs operation signals to the camera 110, the microphone 120, the speakers 130 and 131, the movable parts 140 to 160, and the like. In this way, the control unit 102 can control the operation of the camera 110 and the microphone 120, the output of the speakers 130 and 131, and the operation of the actuators, motors, and the like of the movable parts 140 to 160 based on the user's voice instruction.

<Flow of operation control processing>
Next, the flow of operation control processing by the control device 100 will be described. FIG. 4 is a first flowchart showing the flow of operation control processing by the control device.

In step S401, the stop unit 101 determines whether the robot 10 is performing a task. If the working flag is not being received from the control unit 102, the stop unit 101 determines that the robot 10 is not performing a task (No in step S401) and ends the operation control processing.

On the other hand, if the working flag is being received from the control unit 102, the stop unit 101 determines that the robot 10 is performing a task (Yes in step S401) and proceeds to step S402.

In step S402, the image acquisition unit 303 acquires image data from the camera 110.

In step S403, the face detection unit 304 determines whether the face of a user facing the camera 110 has been detected in the acquired image data. If no user's face has been detected (No in step S403), the process returns to step S401.

On the other hand, if a user's face has been detected in step S403 (Yes in step S403), the face detection unit 304 crops the face image data and the process proceeds to step S404.

In step S404, the lip motion detection unit 305 determines whether lip motion has been detected based on the cropped face image data. If no lip motion has been detected (No in step S404), the process returns to step S401.

On the other hand, if lip motion has been detected in step S404 (Yes in step S404), the process proceeds to step S405.

If a plurality of user faces are detected in step S403, the lip motion detection unit 305 repeats the processing of step S404 once for each detected face. If lip motion is detected for even one face in step S404, the process proceeds to step S405.
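The handling of several faces can be sketched as follows. The function name and the `detect_lip_motion` callable are hypothetical stand-ins for the lip motion detection unit 305. The sketch deliberately evaluates every face before combining the results, since the text says step S404 is repeated once per detected face; stopping at the first positive result would be an optimization the text does not mention.

```python
from typing import Any, Callable, Sequence


def any_face_shows_lip_motion(
    face_images: Sequence[Any],
    detect_lip_motion: Callable[[Any], bool],
) -> bool:
    """Run step S404 once per detected face and combine the results."""
    # Evaluate all faces first (one S404 pass per face), then combine.
    results = [detect_lip_motion(face) for face in face_images]
    return any(results)
```

For example, with three detected faces of which only the second is moving its lips, the function returns `True` and the flow would proceed to step S405.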

In step S405, the sound acquisition unit 301 acquires sound data from the microphone 120.

In step S406, the voice detection unit 302 determines whether the acquired sound data includes voice data. If it determines that the acquired sound data includes voice data, the voice detection unit 302 detects the voice data (Yes in step S406). The determination unit 306 then outputs the detected voice data to the control unit 102, and the process returns to step S401.

In this case, the control unit 102 recognizes the user's voice instruction based on the detected voice data and outputs operation signals based on the recognized voice instruction to the camera 110, the microphone 120, the speakers 130 and 131, the movable parts 140 to 160, and the like. In this way, the control unit 102 can control the operation of the camera 110 and the microphone 120, the output of the speakers 130 and 131, and the operation of the actuators, motors, and the like of the movable parts 140 to 160 based on the user's voice instruction.

On the other hand, if it is determined in step S406 that the acquired sound data does not include voice data, the voice detection unit 302 does not detect voice data (No in step S406), and the process proceeds to step S407.

In step S407, the determination unit 306 outputs a sound source control command to the control unit 102. The control unit 102 then outputs an operation stop signal to the speakers 130 and 131, the movable parts 140 to 160, and the like, thereby controlling the output of the speakers 130 and 131, which are sound sources, and the operation of the movable parts 140 to 160 and the like. Specifically, the control unit 102 stops the speakers 130 and 131 from outputting synthesized speech other than responses to the user's voice instructions, stops the speakers 130 and 131 from outputting music, or stops the operation of the movable parts 140 to 160 and the like.
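Steps S401 to S407 can be summarized in one function that performs a single pass through the flowchart. This is a sketch under stated assumptions: every callable parameter is a hypothetical stand-in for the corresponding unit (image acquisition, face detection, and so on), and the returned strings are labels invented here to show which branch was taken.

```python
from typing import Any, Callable, List, Optional


def operation_control_step(
    working: bool,
    acquire_image: Callable[[], Any],
    detect_faces: Callable[[Any], List[Any]],
    detect_lip_motion: Callable[[Any], bool],
    acquire_sound: Callable[[], Any],
    detect_voice: Callable[[Any], Optional[Any]],
    forward_voice: Callable[[Any], None],
    send_sound_source_control: Callable[[], None],
) -> str:
    """Run one pass of S401 to S407 and report which branch was taken."""
    if not working:                        # S401: No, processing ends
        return "end"
    image = acquire_image()                # S402
    faces = detect_faces(image)            # S403
    if not faces:
        return "retry"                     # No face: back to S401
    if not any([detect_lip_motion(face) for face in faces]):  # S404, once per face
        return "retry"                     # No lip motion: back to S401
    sound = acquire_sound()                # S405
    voice = detect_voice(sound)            # S406
    if voice is not None:
        forward_voice(voice)               # Yes: voice data goes to the control unit
        return "voice"
    send_sound_source_control()            # S407: sound source control command
    return "sound_source_control"
```

A caller would invoke this repeatedly while the working flag is set, treating `"retry"` and `"voice"` as a return to step S401.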

Although the example of FIG. 4 shows steps S402 to S406 executed in sequence, the processing of steps S402 to S404 and the processing of steps S405 to S406 may be executed in parallel.

In that case, synchronization is performed before the determination unit 306 executes the processing of step S407, so that the determination unit 306 executes step S407 only when lip motion has been detected and voice data has not been detected.
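One way to picture the parallel variant is with two worker threads whose results are joined before the decision. The text only requires some synchronization before step S407; the use of Python's `concurrent.futures` here, and the names `image_branch` and `sound_branch`, are assumptions made for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Optional


def should_run_s407(
    image_branch: Callable[[], bool],
    sound_branch: Callable[[], Optional[Any]],
) -> bool:
    """Run both branches in parallel and decide whether S407 applies."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        lip_future = pool.submit(image_branch)    # S402 to S404
        voice_future = pool.submit(sound_branch)  # S405 to S406
        # Synchronization point: wait for both results before deciding.
        lip_motion = lip_future.result()
        voice = voice_future.result()
    # S407 runs only when lip motion was seen and no voice was detected.
    return bool(lip_motion) and voice is None
```

Waiting on both futures before evaluating the condition is what prevents the command from being issued on a partial result.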

In the example of FIG. 4, the process was described as returning to step S401 when voice data is detected in step S406. However, even when voice data is detected, if the control unit 102 determines that it cannot recognize a voice instruction from the voice data, or that the confidence of the recognition is low (for example, at or below a predetermined threshold), the same control may be performed as when no voice data is detected. In this case, the determination unit 306 may be configured to proceed to step S407 and output a sound source control command.

With this configuration, the control unit 102 can output an operation stop signal based on:
- the detection status of voice data in the sound data, or
- the recognition status of the detected voice data (for example, a speech recognition score (likelihood information)).
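The two conditions can be folded into one predicate, evaluated once lip motion has already been detected. The sketch below is illustrative only: the function name, the representation of a recognition result as optional text plus a numeric confidence, and the default threshold of 0.5 are assumptions, since the text specifies only "a predetermined threshold".

```python
from typing import Optional


def needs_sound_reduction(
    voice_detected: bool,
    recognized_instruction: Optional[str],
    confidence: float,
    threshold: float = 0.5,
) -> bool:
    """Decide whether to quiet the sound sources, given lip motion was detected."""
    if not voice_detected:
        # Detection status: no voice data was found in the sound data.
        return True
    if recognized_instruction is None:
        # Recognition status: voice was found but no instruction was recognized.
        return True
    # Recognition status: the score is at or below the threshold.
    return confidence <= threshold
```

With these assumed values, a recognized instruction scored at 0.9 is acted on normally, while one scored at 0.3 triggers the same control as a missing voice.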

In the example of FIG. 4, the control unit 102 was described as outputting an operation stop signal to the speakers 130 and 131, the movable parts 140 to 160, and the like when the determination unit 306 outputs a sound source control command in step S407. However, the control unit 102 may instead output a signal for lowering the volume to the speakers 130 and 131, or output a deceleration signal for reducing the operating speed to the movable parts 140 to 160 and the like. In this way, the control unit 102 can lower the volume of the speakers 130 and 131 or slow down the operation of the movable parts 140 to 160 and the like.
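The choice between stopping and merely quieting the sound sources can be sketched as a mapping from a mode to the signals sent. The enum, the function, the dictionary keys, and the signal names are all hypothetical labels chosen for this example; the text does not define a signal format.

```python
from enum import Enum
from typing import Dict


class ReductionMode(Enum):
    STOP = "stop"      # halt speaker output and movable parts
    REDUCE = "reduce"  # lower the volume and decelerate instead


def signals_for_sound_source_control(mode: ReductionMode) -> Dict[str, str]:
    """Map a sound source control command to per-device signals."""
    if mode is ReductionMode.STOP:
        return {
            "speakers": "operation_stop",
            "movable_parts": "operation_stop",
        }
    return {
        "speakers": "lower_volume",
        "movable_parts": "decelerate",
    }
```

Reducing rather than stopping lets the robot continue its task at a quieter level while the user's voice is being captured.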

Although omitted from the example of FIG. 4, the determination unit 306 may be configured to output detected voice data to the control unit 102 when the voice detection unit 302 detects voice data, even if no lip motion has been detected based on the face image data.

<Summary>
As is clear from the above description, the control device 100 according to the first embodiment:
- detects the user's lip motion based on acquired image data;
- detects voice data from acquired sound data;
- outputs an operation stop signal (or a signal for lowering the volume, or a deceleration signal) when the user's lip motion is detected and no voice data is detected; or
- outputs an operation stop signal (or a signal for lowering the volume, or a deceleration signal) when the user's lip motion is detected and voice data is detected, but the likelihood information obtained when recognizing the voice data is at or below a predetermined threshold; and
- reduces the sound emitted by the speakers and movable parts, which are sound sources, by controlling the output of the speakers and the operation of the movable parts and the like (that is, by stopping the speaker output or lowering the volume, and by stopping or decelerating the movable parts and the like).

As a result, the control device 100 according to the first embodiment can create a state in which voice data is easy to detect. Consequently, the control device 100 according to the first embodiment can improve the voice detection rate of a robot that operates based on a user's voice instructions.

[Second Embodiment]
In the first embodiment, the control device 100 executes, as the operation control processing, processing that creates a state in which voice data is easy to detect. In the second embodiment, the control device 100 additionally executes processing that prompts the user to utter the voice instruction in that state. The second embodiment is described below, focusing on the differences from the first embodiment.

＜制御装置の機能構成＞
はじめに、第２の実施形態に係る制御装置１００の機能構成について説明する。図５は、制御装置の機能構成の一例を示す第２の図である。図３に示した機能構成との相違点は、判定部５００の機能及び制御部５１０の機能が、図３の判定部３０６の機能及び制御部１０２の機能とは異なる点である。 <Functional configuration of control device>
First, the functional configuration of the control device 100 according to the second embodiment will be described. FIG. 5 is a second diagram showing an example of the functional configuration of the control device. The difference from the functional configuration shown in FIG. 3 is that the function of the determination unit 500 and the function of the control unit 510 are different from the function of the determination unit 306 and the function of the control unit 102 of FIG.

判定部５００は指示部の一例であり、音声検出部３０２により音声データが検出されたか否か、及び、口唇動作検出部３０５より口唇動作の検出結果が出力されたか否かを判定する。また、判定部５００は、口唇動作検出部３０５より口唇動作の検出結果が出力されたにも関わらず、音声検出部３０２により音声データが検出されていない場合に、制御部５１０に対して、音源制御命令と再発声指示とを出力する。これにより、判定部５００は、音源が発する音を低減させるよう指示するとともに、音声指示の発話を促すよう指示する。 The determination unit 500 is an example of an instruction unit, and determines whether or not voice data has been detected by the voice detection unit 302 and whether or not a lip movement detection result has been output from the lip movement detection unit 305. Further, when no voice data has been detected by the voice detection unit 302 even though a lip movement detection result has been output from the lip movement detection unit 305, the determination unit 500 outputs a sound source control command and a re-utterance instruction to the control unit 510. In this way, the determination unit 500 instructs that the sound emitted by the sound source be reduced and that the user be prompted to utter the voice instruction.

なお、判定部５００は、図３の判定部３０６同様、口唇動作検出部３０５より口唇動作の検出結果が出力され、音声検出部３０２により音声データが検出された場合には、制御部５１０に対して、音声データを出力する。 Similar to the determination unit 306 in FIG. 3, when a lip movement detection result is output from the lip movement detection unit 305 and voice data is detected by the voice detection unit 302, the determination unit 500 outputs the voice data to the control unit 510.

制御部５１０は、判定部５００により出力された音源制御命令を受け取ると、スピーカ１３０、１３１や可動部１４０〜１６０等に動作停止信号を出力する。そして、音源であるスピーカ１３０、１３１や可動部１４０〜１６０のアクチュエータ、モータ等の動作を停止させる。これにより、制御部５１０は、音声データの検出を妨げる音源が発する音を低減させ、音声データを検出しやすい状態を作り出すことができる。 When the control unit 510 receives the sound source control command output by the determination unit 500, the control unit 510 outputs an operation stop signal to the speakers 130, 131, the movable units 140 to 160, and the like. Then, the operations of the speakers 130 and 131, which are sound sources, and the actuators and motors of the movable parts 140 to 160 are stopped. As a result, the control unit 510 can reduce the sound emitted by the sound source that hinders the detection of the voice data, and can create a state in which the voice data can be easily detected.

また、制御部５１０は、判定部５００により出力された再発声指示を受け取ると、音声指示の発話を促す音声出力信号を生成し、生成した音声出力信号に基づく合成音声を、スピーカ１３０、１３１を介して出力する。音声指示の発話を促す音声出力信号とは、例えば、「もう一度話してください」といった合成音声を出力するための音声出力信号である。これにより、制御部５１０は、音声データを検出しやすい状態のもとで、ユーザに音声指示の発話を促すことができる。 Further, when the control unit 510 receives the re-utterance instruction output by the determination unit 500, the control unit 510 generates a voice output signal for prompting the utterance of a voice instruction, and outputs a synthetic voice based on the generated voice output signal via the speakers 130 and 131. The voice output signal for prompting the utterance of a voice instruction is, for example, a voice output signal for outputting a synthetic voice such as "Please speak again". As a result, the control unit 510 can prompt the user to utter a voice instruction in a state in which voice data can be easily detected.

＜動作制御処理の流れ＞
次に、第２の実施形態に係る制御装置１００による動作制御処理の流れについて説明する。図６は、制御装置による動作制御処理の流れを示す第２のフローチャートである。図４に示すフローチャートとの相違点は、ステップＳ６０１である。 <Flow of operation control processing>
Next, the flow of the operation control process by the control device 100 according to the second embodiment will be described. FIG. 6 is a second flowchart showing the flow of operation control processing by the control device. The difference from the flowchart shown in FIG. 4 is step S601.

ステップＳ６０１において、制御部５１０は、音声指示の発話を促す音声出力信号を生成し、生成した音声出力信号に基づく合成音声を、スピーカ１３０、１３１を介して出力する。 In step S601, the control unit 510 generates a voice output signal for prompting the utterance of a voice instruction, and outputs a synthetic voice based on the generated voice output signal via the speakers 130 and 131.
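
The order of operations in the second embodiment (first quiet the sound sources, then ask the user to repeat) can be sketched as follows. This is a minimal illustration, not the disclosed implementation: `quiet_then_prompt`, the `stop_actions` callables, and the `speak` callback are hypothetical stand-ins for the operation stop signals and the synthetic voice output via the speakers 130 and 131.

```python
from typing import Callable, Iterable

# Example prompt taken from the description of the voice output signal.
DEFAULT_PROMPT = "Please speak again"


def quiet_then_prompt(
    stop_actions: Iterable[Callable[[], None]],
    speak: Callable[[str], None],
    prompt: str = DEFAULT_PROMPT,
) -> None:
    """Handle a sound source control command followed by a re-utterance instruction."""
    # Sound source control command: stop every source that may mask the voice.
    for stop in stop_actions:
        stop()
    # Re-utterance instruction (step S601): prompt the user once things are quiet.
    speak(prompt)
```

Issuing the prompt only after the stop actions have run keeps the prompt itself from competing with the sounds that were masking the user's voice.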

＜まとめ＞
以上の説明から明らかなように、第２の実施形態に係る制御装置１００は、上記第１の実施形態に係る制御装置１００の構成に加えて、更に、音声指示の発話を促す構成を有する。 <Summary>
As is clear from the above description, the control device 100 according to the second embodiment has a configuration for prompting the utterance of a voice instruction in addition to the configuration of the control device 100 according to the first embodiment.

これにより、第２の実施形態に係る制御装置１００では、音声データを検出しやすい状態のもとで、ユーザの音声指示を受け取ることができる。この結果、第２の実施形態に係る制御装置１００によれば、ユーザの音声指示に基づいて動作するロボットにおいて、音声検出率を向上させることができる。 As a result, the control device 100 according to the second embodiment can receive the voice instruction of the user in a state where the voice data can be easily detected. As a result, according to the control device 100 according to the second embodiment, the voice detection rate can be improved in the robot that operates based on the voice instruction of the user.

［第３の実施形態］
上記第１の実施形態では、制御装置１００が、動作制御処理として、音声データを検出しやすい状態を作り出す処理を実行したが、第３の実施形態では、更に、制御装置１００が、当該状態のもとでユーザの音声指示の有無を判定する。以下、第３の実施形態について、上記第１の実施形態との相違点を中心に説明する。 [Third Embodiment]
In the first embodiment, the control device 100 executes, as the operation control process, a process for creating a state in which voice data can be easily detected. In the third embodiment, the control device 100 additionally determines, in that state, whether or not the user has given a voice instruction. Hereinafter, the third embodiment will be described focusing on the differences from the first embodiment.

＜制御装置の機能構成＞
はじめに、第３の実施形態に係る制御装置１００の機能構成について説明する。図７は、制御装置の機能構成の一例を示す第３の図である。図３に示した機能構成との相違点は、判定部７００の機能及び制御部７１０の機能が、図３の判定部３０６の機能及び制御部１０２の機能とは異なる点である。 <Functional configuration of control device>
First, the functional configuration of the control device 100 according to the third embodiment will be described. FIG. 7 is a third diagram showing an example of the functional configuration of the control device. The difference from the functional configuration shown in FIG. 3 is that the function of the determination unit 700 and the function of the control unit 710 are different from the function of the determination unit 306 and the function of the control unit 102 of FIG.

判定部７００は指示部の一例であり、音声検出部３０２により音声データが検出されたか否か、及び、口唇動作検出部３０５より口唇動作の検出結果が出力されたか否かを判定する。また、判定部７００は、口唇動作検出部３０５より口唇動作の検出結果が出力されたにも関わらず、音声検出部３０２により音声データが検出されていない場合に、制御部７１０に対して、音源制御命令を出力する。これにより、判定部７００は、音源が発する音を低減させるよう指示する。 The determination unit 700 is an example of an instruction unit, and determines whether or not voice data has been detected by the voice detection unit 302 and whether or not a lip movement detection result has been output from the lip movement detection unit 305. Further, when no voice data has been detected by the voice detection unit 302 even though a lip movement detection result has been output from the lip movement detection unit 305, the determination unit 700 outputs a sound source control command to the control unit 710. In this way, the determination unit 700 instructs that the sound emitted by the sound source be reduced.

更に、判定部７００は、音源制御命令を出力した後の所定時間の間に、音声検出部３０２により音声データが検出されなかった場合に、制御部７１０に動作再開指示を出力することで、音源制御命令を出力する前の動作を再開するよう指示する。また、判定部７００は、音源制御命令を出力した後の所定時間の間に、音声検出部３０２により音声データが検出された場合に、制御部７１０に音声データを出力する。 Further, when no voice data is detected by the voice detection unit 302 within a predetermined time after the sound source control command is output, the determination unit 700 outputs an operation restart instruction to the control unit 710, thereby instructing it to resume the operation that was in progress before the sound source control command was output. When voice data is detected by the voice detection unit 302 within the predetermined time after the sound source control command is output, the determination unit 700 outputs the voice data to the control unit 710.

なお、判定部７００は、図３の判定部３０６同様、口唇動作検出部３０５より口唇動作の検出結果が出力され、音声検出部３０２により音声データが検出された場合、制御部７１０に対して、音声データを出力する。 Similar to the determination unit 306 in FIG. 3, when a lip movement detection result is output from the lip movement detection unit 305 and voice data is detected by the voice detection unit 302, the determination unit 700 outputs the voice data to the control unit 710.

制御部７１０は、判定部７００により出力された音源制御命令を受け取ると、スピーカ１３０、１３１や可動部１４０〜１６０等に動作停止信号を出力する。そして、音源であるスピーカ１３０、１３１の出力や可動部１４０〜１６０のアクチュエータ、モータ等の動作を停止させる。これにより、制御部７１０は、音声データの検出を妨げる音源が発する音を低減させ、音声データを検出しやすい状態を作り出すことができる。 When the control unit 710 receives the sound source control command output by the determination unit 700, the control unit 710 outputs an operation stop signal to the speakers 130, 131, the movable units 140 to 160, and the like. Then, the outputs of the speakers 130 and 131, which are sound sources, and the actuators, motors, and the like of the movable parts 140 to 160 are stopped. As a result, the control unit 710 can reduce the sound emitted by the sound source that hinders the detection of the voice data, and can create a state in which the voice data can be easily detected.

また、制御部７１０は、スピーカ１３０、１３１や可動部１４０〜１６０等に動作停止信号を出力した後の所定時間の間に検出された音声データを受け取ると、受け取った音声データに基づいて、ユーザの音声指示を認識する。また、制御部７１０は、認識した音声指示に応じた動作信号を出力することで、カメラ１１０やマイクロフォン１２０の動作、スピーカ１３０、１３１の出力、可動部１４０〜１６０のアクチュエータ、モータ等の動作を制御する。これにより、制御部７１０は、音源制御命令を受け取る前の状態の如何によらず、受け取った後のユーザの音声指示に基づいて、カメラ１１０やマイクロフォン１２０の動作、スピーカ１３０、１３１の出力、可動部１４０〜１６０等の動作を制御することができる。 Further, when the control unit 710 receives voice data detected within a predetermined time after outputting the operation stop signal to the speakers 130, 131, the movable parts 140 to 160, and the like, the control unit 710 recognizes the user's voice instruction based on the received voice data. The control unit 710 then outputs an operation signal according to the recognized voice instruction, thereby controlling the operation of the camera 110 and the microphone 120, the output of the speakers 130 and 131, and the operation of the actuators, motors, and the like of the movable parts 140 to 160. As a result, regardless of the state before the sound source control command was received, the control unit 710 can control the operation of the camera 110 and the microphone 120, the output of the speakers 130 and 131, and the operation of the movable parts 140 to 160 and the like based on the user's voice instruction given after the command was received.

また、制御部７１０は、判定部７００により出力された動作再開指示を受け取ると、音源制御命令を受け取る前のスピーカ１３０、１３１の出力、可動部１４０〜１６０等の動作を再開するよう、動作信号を出力する。これにより、制御部７１０は、音源制御命令を受け取る前のスピーカ１３０、１３１の出力、可動部１４０〜１６０等の動作を再開させることができる。 Further, when the control unit 710 receives the operation restart instruction output by the determination unit 700, the control unit 710 outputs an operation signal so that the output of the speakers 130 and 131 and the operation of the movable parts 140 to 160 and the like that were in progress before the sound source control command was received are resumed. As a result, the control unit 710 can resume the output of the speakers 130 and 131 and the operation of the movable parts 140 to 160 and the like that were in progress before the sound source control command was received.

＜動作制御処理の流れ＞
次に、第３の実施形態に係る制御装置１００による動作制御処理の流れについて説明する。図８は、制御装置による動作制御処理の流れを示す第３のフローチャートである。図４に示すフローチャートとの相違点は、ステップＳ８０１〜Ｓ８０４である。 <Flow of operation control processing>
Next, the flow of the operation control process by the control device 100 according to the third embodiment will be described. FIG. 8 is a third flowchart showing the flow of operation control processing by the control device. The difference from the flowchart shown in FIG. 4 is steps S801 to S804.

ステップＳ８０１において、音声検出部３０２は、音声データを検出したか否かを判定する。ステップＳ８０１において、音声データを検出したと判定した場合には（ステップＳ８０１においてＹｅｓの場合には）、ステップＳ８０２に進む。 In step S801, the voice detection unit 302 determines whether or not voice data has been detected. If it is determined in step S801 that the voice data has been detected (yes in step S801), the process proceeds to step S802.

ステップＳ８０２において、判定部７００は、音声検出部３０２により検出された音声データを制御部７１０に出力する。制御部７１０は、判定部７００により出力された音声データに基づいて、ユーザの音声指示を認識し、認識した音声指示に応じた動作信号を、カメラ１１０、マイクロフォン１２０、スピーカ１３０、１３１、可動部１４０〜１６０等に出力する。そして、カメラ１１０やマイクロフォン１２０の動作、スピーカ１３０、１３１の出力、可動部１４０〜１６０等の動作を制御する。 In step S802, the determination unit 700 outputs the voice data detected by the voice detection unit 302 to the control unit 710. The control unit 710 recognizes the user's voice instruction based on the voice data output by the determination unit 700, and outputs an operation signal corresponding to the recognized voice instruction to the camera 110, the microphone 120, the speakers 130 and 131, the movable parts 140 to 160, and the like. The control unit 710 thereby controls the operation of the camera 110 and the microphone 120, the output of the speakers 130 and 131, and the operation of the movable parts 140 to 160 and the like.

一方、ステップＳ８０１において、音声データを検出していない場合には（ステップＳ８０１においてＮｏの場合には）、ステップＳ８０３に進む。 On the other hand, if the voice data is not detected in step S801 (if No in step S801), the process proceeds to step S803.

ステップＳ８０３において、判定部７００は、ステップＳ４０７において、制御部７１０がスピーカ１３０、１３１の出力、可動部１４０〜１６０等の動作を停止させてから、所定時間が経過したか否かを判定する。ステップＳ８０３において、所定時間が経過していないと判定した場合には（ステップＳ８０３においてＮｏの場合には）、ステップＳ８０１に戻る。 In step S803, the determination unit 700 determines whether or not a predetermined time has elapsed since the control unit 710 stopped the operations of the output of the speakers 130 and 131, the movable units 140 to 160, and the like in step S407. If it is determined in step S803 that the predetermined time has not elapsed (No in step S803), the process returns to step S801.

一方、ステップＳ８０３において、所定時間が経過したと判定した場合には（ステップＳ８０３においてＹｅｓの場合には）、ステップＳ８０４に進む。 On the other hand, if it is determined in step S803 that the predetermined time has elapsed (yes in step S803), the process proceeds to step S804.

ステップＳ８０４において、判定部７００は、制御部７１０に動作再開指示を出力する。また、制御部７１０は、音源制御命令を受け取る前のスピーカ１３０、１３１の出力、可動部１４０〜１６０等の動作を再開するよう、動作信号を出力する。これにより、制御部７１０は、音源制御命令を受け取る前のスピーカ１３０、１３１の出力、可動部１４０〜１６０等の動作を再開させることができる。 In step S804, the determination unit 700 outputs an operation restart instruction to the control unit 710. Further, the control unit 710 outputs an operation signal so as to restart the operations of the speakers 130 and 131 and the movable units 140 to 160 before receiving the sound source control command. As a result, the control unit 710 can restart the operations of the outputs of the speakers 130 and 131, the movable units 140 to 160, and the like before receiving the sound source control command.
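
Steps S801 to S804 amount to a polling loop with a timeout. The sketch below is one possible reading, under assumptions: `detect_voice` returns the detected voice data or `None`, `execute_instruction` and `resume_previous` are hypothetical callbacks standing in for the control unit 710, and the injectable `clock` exists only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, Optional


def wait_for_voice(
    detect_voice: Callable[[], Optional[bytes]],
    timeout_s: float,
    clock: Callable[[], float] = time.monotonic,
    poll_interval_s: float = 0.0,
) -> Optional[bytes]:
    """Poll for voice data until it arrives or the predetermined time elapses."""
    deadline = clock() + timeout_s
    while True:
        voice = detect_voice()       # S801: was voice data detected?
        if voice is not None:
            return voice             # Yes: continue with S802
        if clock() >= deadline:      # S803: has the predetermined time elapsed?
            return None              # Yes: continue with S804
        if poll_interval_s > 0:
            time.sleep(poll_interval_s)


def after_sound_reduction(
    detect_voice: Callable[[], Optional[bytes]],
    execute_instruction: Callable[[bytes], None],
    resume_previous: Callable[[], None],
    timeout_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Run the part of the flow that follows the operation stop."""
    voice = wait_for_voice(detect_voice, timeout_s, clock)
    if voice is not None:
        execute_instruction(voice)   # S802: act on the recognized instruction
        return "executed"
    resume_previous()                # S804: restart what was running before
    return "resumed"
```

`time.monotonic` is used as the default clock because it is unaffected by system clock adjustments, which matters when measuring an elapsed interval.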

＜まとめ＞
以上の説明から明らかなように、第３の実施形態に係る制御装置１００は、上記第１の実施形態に係る制御装置１００の構成に加えて、更に、
・音声データを検出しやすい状態のもとで音声指示を受け取った場合に、当該音声指示に基づいて、カメラやマイクロフォンの動作、スピーカの出力、可動部等の動作を制御する。
・音声データを検出しやすい状態にもとで音声指示を受け取らなかった場合に、音声データを検出しやすい状態を作り出す前のスピーカの出力、可動部等の動作を再開させる。 <Summary>
As is clear from the above description, the control device 100 according to the third embodiment, in addition to the configuration of the control device 100 according to the first embodiment, further:
- Controls the operation of the camera and microphone, the output of the speakers, and the operation of the movable parts and the like based on a voice instruction, when that voice instruction is received in a state where voice data is easy to detect.
- Resumes the speaker output and the operation of the movable parts and the like that were in progress before the state where voice data is easy to detect was created, when no voice instruction is received in that state.

このように、音声データを検出しやすい状態のもとで、ユーザの音声指示の有無を判定することで、第３の実施形態に係る制御装置１００では、ユーザが音声指示を行ったか否かを正しく判定することができる。この結果、第３の実施形態に係る制御装置１００によれば、ユーザの意図に反してロボットが動作するといった事態を回避することができる。 In this way, by determining the presence or absence of the user's voice instruction in a state where voice data can be easily detected, the control device 100 according to the third embodiment can correctly determine whether or not the user has given a voice instruction. As a result, according to the control device 100 according to the third embodiment, it is possible to avoid a situation in which the robot operates against the intention of the user.

［第４の実施形態］
上記第１乃至第３の実施形態では、カメラ１１０をロボット１０に配するものとして説明した。しかしながら、カメラ１１０はロボット１０以外に配してもよい。あるいは、カメラ１１０をロボット１０に配したうえで、更に、カメラ１１０以外のカメラを、ロボット１０以外に配してもよい。以下、第４の実施形態について、上記第１の実施形態との相違点を中心に説明する。 [Fourth Embodiment]
In the first to third embodiments described above, the camera 110 is described as being arranged on the robot 10. However, the camera 110 may be arranged somewhere other than on the robot 10. Alternatively, the camera 110 may be arranged on the robot 10, and a camera other than the camera 110 may additionally be arranged somewhere other than on the robot 10. Hereinafter, the fourth embodiment will be described focusing on the differences from the first embodiment.

＜ロボットの作業環境＞
はじめに、第４の実施形態に係るロボット１０の作業環境について説明する。図９は、ロボットの作業環境の一例を示す第１の図である。図９に示すように、ロボット１０が各種作業を行う居室９００には、カメラ９００＿１、９００＿２が取り付けられており、ロボット１０のユーザ（不図示）を撮影する。また、カメラ９００＿１、９００＿２により撮影された画像データは、ロボット１０に送信される。 <Robot work environment>
First, the working environment of the robot 10 according to the fourth embodiment will be described. FIG. 9 is a first diagram showing an example of the working environment of the robot. As shown in FIG. 9, cameras 900_1 and 900_2 are attached to the living room 900 in which the robot 10 performs various tasks, and a user (not shown) of the robot 10 is photographed. Further, the image data taken by the cameras 900_1 and 900_2 is transmitted to the robot 10.

これにより、第４の実施形態に係るロボット１０では、ロボット１０とは別体のカメラ９００＿１、９００＿２により撮影された画像データに基づいて、ロボット１０のユーザの顔を検出するとともに、口唇動作を検出することができる。 As a result, the robot 10 according to the fourth embodiment can detect the face of the user of the robot 10, and can detect the lip movement, based on the image data taken by the cameras 900_1 and 900_2, which are separate from the robot 10.

この結果、例えば、第４の実施形態に係るロボット１０の場合、ロボット１０に配されたカメラ１１０が、ロボット１０のユーザの方向を向いていない場合であっても、当該ユーザの顔を検出するとともに、口唇動作を検出することができる。 As a result, for example, in the case of the robot 10 according to the fourth embodiment, the face of the user is detected even when the camera 110 arranged on the robot 10 is not facing the user of the robot 10. At the same time, lip movement can be detected.

なお、第４の実施形態に係るロボット１０の場合、制御装置１００の顔検出部３０４では、画像データを受け取った際、カメラ９００＿１またはカメラ９００＿２の方向を向いたユーザの顔ではなく、カメラ１１０の方向を向いたユーザの顔を検出する。このように、ロボット１０とは別体のカメラ９００＿１、９００＿２を活用することで、ロボット１０のユーザの口唇動作を検出する可能性を高めることができる（ユーザの口唇動作の検出漏れを防ぐことができる）。 In the case of the robot 10 according to the fourth embodiment, when the face detection unit 304 of the control device 100 receives the image data, it detects the face of a user facing the direction of the camera 110, rather than the face of a user facing the direction of the camera 900_1 or the camera 900_2. In this way, by utilizing the cameras 900_1 and 900_2 that are separate from the robot 10, it is possible to increase the possibility of detecting the lip movement of the user of the robot 10 (that is, to prevent the user's lip movement from going undetected).

なお、図９の例では、ロボット１０とは別体のカメラとして２台のカメラを活用する場合について示したが、活用する別体のカメラは２台に限定されない。また、顔検出部３０４は、ロボット１０に配されたカメラ１１０により撮影された画像データと、ロボット１０とは別体のカメラ９００＿１、９００＿２等により撮影された画像データのそれぞれにおいて顔を検出するように構成してもよい。 In the example of FIG. 9, the case where two cameras are used as cameras separate from the robot 10 is shown, but the number of separate cameras to be used is not limited to two. Further, the face detection unit 304 may be configured to detect a face in each of the image data taken by the camera 110 arranged on the robot 10 and the image data taken by the cameras 900_1, 900_2, etc., which are separate from the robot 10.

また、口唇動作検出部３０５は、それぞれの画像データにおいて検出された顔画像データのうち、口唇領域が写っている顔画像データを選択して、ユーザの口唇動作を検出するように構成してもよい。そして、いずれの顔画像データにおいても、ユーザの口唇動作を検出しなかった場合に、ユーザの口唇動作を検出しなかった旨の検出結果を判定部３０６に対して出力するように構成してもよい。 Further, the lip movement detection unit 305 may be configured to select, from the face image data detected in the respective image data, the face image data in which the lip region is captured, and to detect the user's lip movement from it. The lip movement detection unit 305 may also be configured to output, to the determination unit 306, a detection result indicating that the user's lip movement was not detected when the user's lip movement is not detected in any of the face image data.
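
The multi-camera selection just described can be sketched as a filter followed by an "any" test. The `FaceImage` record, the `lips_visible` flag, and the `is_lip_moving` callback are hypothetical placeholders for the outputs of the face detection unit 304 and the internals of the lip movement detection unit 305; a real detector would work on sequences of frames rather than single records.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class FaceImage:
    camera_id: str      # e.g. "110", "900_1", "900_2"
    lips_visible: bool  # whether the lip region is captured in this face image


def detect_lip_movement(
    faces: Iterable[FaceImage],
    is_lip_moving: Callable[[FaceImage], bool],
) -> bool:
    """Return True if lip movement is found in any usable face image."""
    # Keep only the face images in which the lip region is actually visible.
    candidates = [face for face in faces if face.lips_visible]
    # Report "not detected" only when no candidate shows lip movement.
    return any(is_lip_moving(face) for face in candidates)
```

An empty candidate list yields `False`, which corresponds to reporting to the determination unit that no lip movement was detected.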

＜制御装置のハードウェア構成＞
次に、第４の実施形態に係るロボット１０に内蔵される制御装置１００のハードウェア構成について説明する。図１０は、ロボットの制御装置のハードウェア構成の一例を示す第２の図である。図２に示したハードウェア構成との相違点は、通信装置２０５が、ロボット１０とは別体のカメラ９００＿１〜９００＿ｎと通信を行う点である。 <Hardware configuration of control device>
Next, the hardware configuration of the control device 100 built in the robot 10 according to the fourth embodiment will be described. FIG. 10 is a second diagram showing an example of the hardware configuration of the robot control device. The difference from the hardware configuration shown in FIG. 2 is that the communication device 205 communicates with the cameras 900_1 to 900_n, which are separate from the robot 10.

通信装置２０５がカメラ９００＿１〜９００＿ｎと通信を行うことで、制御装置１００では、カメラ９００＿１〜９００＿ｎにより撮影され、送信された画像データを取得することができる。 When the communication device 205 communicates with the cameras 900_1 to 900_n, the control device 100 can acquire the image data captured and transmitted by the cameras 900_1 to 900_n.

＜まとめ＞
以上の説明から明らかなように、第４の実施形態に係るロボット１０は、ロボット１０が各種作業を行う居室に取り付けられたカメラ（ロボット１０とは別体のカメラ）が撮影した画像データを取得する。また、第４の実施形態に係るロボット１０は、当該カメラにより撮影された画像データ（及び、ロボット１０に搭載されたカメラにより撮影された画像データ）に基づいて、ユーザの口唇動作を検出する。 <Summary>
As is clear from the above description, the robot 10 according to the fourth embodiment acquires image data taken by a camera (a camera separate from the robot 10) attached to the living room in which the robot 10 performs various tasks. In addition, the robot 10 according to the fourth embodiment detects the user's lip movement based on the image data taken by that camera (and the image data taken by the camera mounted on the robot 10).

これにより、第４の実施形態に係るロボット１０によれば、上記第１の実施形態において説明した効果に加えて、更に、ユーザの口唇動作を検出する可能性を高めることができる。 As a result, according to the robot 10 according to the fourth embodiment, in addition to the effects described in the first embodiment, the possibility of detecting the lip movement of the user can be further increased.

［第５の実施形態］
上記第１乃至第４の実施形態では、制御部１０２、５１０、７１０が、音源制御命令に基づいて、ロボット１０のスピーカ１３０、１３１の出力、可動部１４０〜１６０等の動作を制御するものとして説明した。 [Fifth Embodiment]
In the first to fourth embodiments, the control units 102, 510, and 710 have been described as controlling the outputs of the speakers 130 and 131 of the robot 10, the operations of the movable parts 140 to 160, and the like based on the sound source control command.

しかしながら、音源制御命令に基づいて制御部１０２、５１０、７１０が音を低減させる音源は、ロボット１０のスピーカ１３０、１３１、可動部１４０〜１６０等に限定されない。例えば、ロボット１０以外の外部音源が発する音を低減させるように構成してもよい。以下、第５の実施形態について、上記第１乃至第４の実施形態との相違点を中心に説明する。 However, the sound source for which the control units 102, 510, and 710 reduce the sound based on the sound source control command is not limited to the speakers 130, 131, and the movable units 140 to 160 of the robot 10. For example, it may be configured to reduce the sound emitted by an external sound source other than the robot 10. Hereinafter, the fifth embodiment will be described focusing on the differences from the first to fourth embodiments.

＜ロボットの作業環境＞
はじめに、第５の実施形態に係るロボット１０の作業環境について説明する。図１１は、ロボットの作業環境の一例を示す第２の図である。図１１に示すように、ロボット１０が各種作業を行う居室１１００には、複数の外部音源が配されている。 <Robot work environment>
First, the working environment of the robot 10 according to the fifth embodiment will be described. FIG. 11 is a second diagram showing an example of the working environment of the robot. As shown in FIG. 11, a plurality of external sound sources are arranged in the living room 1100 in which the robot 10 performs various tasks.

具体的には、居室１１００には、オーディオ機器のスピーカ、テレビのスピーカ、エアコン、扇風機、空気清浄器、水道設備等の外部音源が配されている。 Specifically, external sound sources such as audio equipment speakers, television speakers, air conditioners, electric fans, air purifiers, and water supply facilities are arranged in the living room 1100.

このうち、オーディオ機器、テレビ、エアコンには、通信装置が配されており、ロボット１０と有線または無線を介して通信可能に接続される。 Of these, communication devices are arranged in audio equipment, televisions, and air conditioners, and are connected to the robot 10 so as to be able to communicate with each other via wire or wireless.

このため、第５の実施形態に係るロボット１０の制御装置１００では、オーディオ機器、テレビ、エアコンの各機器の動作を、当該通信装置を介して制御することができる。具体的には、第５の実施形態に係るロボット１０の制御装置１００は、オーディオ機器、テレビ、エアコンの各機器に、各機器の動作を停止させるための信号を送信する。これにより、第５の実施形態に係るロボット１０の制御装置１００は、各機器の動作を停止させ、当該各機器が発する音を低減させる。 Therefore, in the control device 100 of the robot 10 according to the fifth embodiment, the operation of each device of the audio device, the television, and the air conditioner can be controlled via the communication device. Specifically, the control device 100 of the robot 10 according to the fifth embodiment transmits a signal for stopping the operation of each device to each device of the audio device, the television, and the air conditioner. As a result, the control device 100 of the robot 10 according to the fifth embodiment stops the operation of each device and reduces the sound emitted by each device.

あるいは、第５の実施形態に係るロボット１０の制御装置１００は、オーディオ機器またはテレビに、例えば、音量を下げるための信号を送信する。これにより、第５の実施形態に係るロボット１０の制御装置１００は、オーディオ機器またはテレビが発する音を低減させる。 Alternatively, the control device 100 of the robot 10 according to the fifth embodiment transmits a signal for lowering the volume, for example, to an audio device or a television. As a result, the control device 100 of the robot 10 according to the fifth embodiment reduces the sound emitted by the audio equipment or the television.

また、第５の実施形態に係るロボット１０の制御装置１００は、エアコンに、例えば、風量を下げるための信号（あるいは、設定温度を変更するための信号）等を出力する。これにより、第５の実施形態に係るロボット１０の制御装置１００は、エアコンの風量を下げさせ（エアコンの設定温度を変更させ）、エアコンが発する音を低減させる。 Further, the control device 100 of the robot 10 according to the fifth embodiment outputs, for example, a signal for lowering the air volume (or a signal for changing the set temperature) or the like to the air conditioner. As a result, the control device 100 of the robot 10 according to the fifth embodiment lowers the air volume of the air conditioner (changes the set temperature of the air conditioner) and reduces the sound emitted by the air conditioner.

この結果、第５の実施形態に係るロボット１０の制御装置１００によれば、音声データを検出しやすい状態を作り出すことができる。 As a result, according to the control device 100 of the robot 10 according to the fifth embodiment, it is possible to create a state in which voice data can be easily detected.

また、第５の実施形態に係るロボット１０の制御装置１００では、扇風機、空気清浄器、水道設備の各機器を操作するために、可動部１４０の動作を制御する。具体的には、第５の実施形態に係るロボット１０の制御装置１００は、例えば、扇風機の動作を停止させるためのスイッチ（あるいは、風量を下げるためのスイッチ）を操作するよう、可動部１４０の動作を制御する。これにより、第５の実施形態に係るロボット１０の制御装置１００は、扇風機の動作を停止させ、扇風機が発する音を低減させる。 Further, the control device 100 of the robot 10 according to the fifth embodiment controls the operation of the movable part 140 in order to operate the electric fan, the air purifier, and the water supply facility. Specifically, the control device 100 of the robot 10 according to the fifth embodiment controls the operation of the movable part 140 so as to operate, for example, a switch for stopping the operation of the electric fan (or a switch for reducing the air volume). As a result, the control device 100 of the robot 10 according to the fifth embodiment stops the operation of the electric fan and reduces the sound emitted by the electric fan.

また、第５の実施形態に係るロボット１０の制御装置１００は、例えば、空気清浄器の動作を停止させるためのスイッチを操作するよう、可動部１４０の動作を制御する。これにより、第５の実施形態に係るロボット１０の制御装置１００は、空気清浄器の動作を停止させ、空気清浄器が発する音を低減させる。 Further, the control device 100 of the robot 10 according to the fifth embodiment controls the operation of the movable portion 140 so as to operate a switch for stopping the operation of the air purifier, for example. As a result, the control device 100 of the robot 10 according to the fifth embodiment stops the operation of the air purifier and reduces the sound emitted by the air purifier.

また、第５の実施形態に係るロボット１０の制御装置１００は、例えば、水道の蛇口をひねり、水を止める（あるいは、水量を下げる）よう、可動部１４０の動作を制御する。これにより、第５の実施形態に係るロボット１０の制御装置１００は、水を止めさせ（あるいは水量を下げさせ）、水道の蛇口から水が流れ出ることで発する音を低減させる。 Further, the control device 100 of the robot 10 according to the fifth embodiment controls the operation of the movable portion 140 so as to, for example, twist the faucet of the water supply to stop the water (or reduce the amount of water). As a result, the control device 100 of the robot 10 according to the fifth embodiment stops the water (or reduces the amount of water) and reduces the sound generated by the water flowing out from the tap.

なお、可動部１４０の動作を制御することによる外部音源の操作は、公知の方法により実現される。 The operation of the external sound source by controlling the operation of the movable portion 140 is realized by a known method.
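
The fifth embodiment distinguishes two ways of quieting an external sound source: sending a signal to devices that have a communication device, and physically operating the others with the movable part 140. A rough planning sketch follows. The `Device` record, the `networked` flag, the channel labels, and the action strings are illustrative assumptions, not identifiers from the disclosure.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Device:
    name: str
    networked: bool    # True if the device has a communication device
    quiet_action: str  # e.g. "stop", "lower volume", "lower air volume"


def plan_quieting(devices: Iterable[Device]) -> List[Tuple[str, str, str]]:
    """Return (channel, device name, action) triples for reducing external sound."""
    plan = []
    for device in devices:
        # Networked devices receive a signal; the rest are operated by hand.
        channel = "signal" if device.networked else "movable_part_140"
        plan.append((channel, device.name, device.quiet_action))
    return plan
```

For example, a television would be planned as `("signal", "television", "lower volume")`, whereas an electric fan without a communication device would be planned as `("movable_part_140", "electric fan", "stop")`.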

なお、図１１の例では、居室１１００内の異なる位置にも外部音源が配されているが、ロボット１０は、それぞれの外部音源を制御し、それぞれの外部音源が発する音を低減させてもよい。あるいは、ロボット１０は、いずれか一方の外部音源を制御し、いずれか一方の外部音源が発する音を低減させてもよい。 In the example of FIG. 11, external sound sources are arranged at different positions in the living room 1100; the robot 10 may control each of these external sound sources to reduce the sound emitted by each of them. Alternatively, the robot 10 may control only one of the external sound sources to reduce the sound emitted by that external sound source.

なお、いずれか一方の外部音源を制御するにあたっては、ロボット１０に近い方の外部音源を制御するように構成してもよい。 In controlling one of the external sound sources, the external sound source closer to the robot 10 may be controlled.

また、ロボット１０が制御する外部音源を、ロボット１０から距離ｄ以内に配された外部音源に限定してもよい。なお、距離ｄは、ロボット１０のユーザとロボット１０のマイクロフォン１２０との間の距離に応じて変更するように構成してもよい。例えば、距離ｄは、ロボット１０のユーザとロボット１０のマイクロフォン１２０との間の距離に、予め定められた係数をかけ合わせることで算出するように構成してもよい。 Further, the external sound source controlled by the robot 10 may be limited to the external sound source arranged within the distance d from the robot 10. The distance d may be changed according to the distance between the user of the robot 10 and the microphone 120 of the robot 10. For example, the distance d may be calculated by multiplying the distance between the user of the robot 10 and the microphone 120 of the robot 10 by a predetermined coefficient.
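
The distance-based selection can be illustrated with a short calculation. In this sketch the distance d is the user-to-microphone distance multiplied by a predetermined coefficient, as described above; the 2D coordinates, the `ExternalSource` record, and the function names are assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class ExternalSource:
    name: str
    x: float  # position in the room, in meters (assumed 2D layout)
    y: float


def control_distance(user_to_mic_m: float, coefficient: float) -> float:
    """d = (distance between user and microphone) x (predetermined coefficient)."""
    return user_to_mic_m * coefficient


def sources_within(
    sources: Iterable[ExternalSource],
    robot_xy: Tuple[float, float],
    d: float,
) -> List[ExternalSource]:
    """Keep only the external sound sources located within distance d of the robot."""
    rx, ry = robot_xy
    return [s for s in sources if math.hypot(s.x - rx, s.y - ry) <= d]


def nearest_source(
    sources: Iterable[ExternalSource],
    robot_xy: Tuple[float, float],
) -> Optional[ExternalSource]:
    """Pick the source closest to the robot, or None if there are no sources."""
    rx, ry = robot_xy
    return min(sources, key=lambda s: math.hypot(s.x - rx, s.y - ry), default=None)
```

Scaling d with the user-to-microphone distance means that a distant user, whose voice arrives weaker, causes more of the surrounding sources to be quieted.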

＜まとめ＞
以上の説明から明らかように、第５の実施形態に係るロボット１０は、ロボット１０以外の外部音源が発する音を低減させる。これにより、第１の実施形態に係る制御装置１００では、音声データを検出しやすい状態を作り出すことができる。この結果、第５の実施形態に係る制御装置１００によれば、ユーザの音声指示に基づいて動作するロボットにおいて、音声検出率を向上させることができる。 <Summary>
As is clear from the above description, the robot 10 according to the fifth embodiment reduces the sound emitted by an external sound source other than the robot 10. As a result, the control device 100 according to the first embodiment can create a state in which voice data can be easily detected. As a result, according to the control device 100 according to the fifth embodiment, the voice detection rate can be improved in the robot that operates based on the voice instruction of the user.

［第６の実施形態］
上記第５の実施形態では、ロボット１０以外の外部音源が発する音を直接的に低減させる場合について説明した。これに対して、第６の実施形態では、ロボット１０以外の外部音源が、リモートコントローラを介して操作可能な場合においては、リモートコントローラを操作することで、ロボット１０以外の外部音源が発する音を低減させる。以下、第６の実施形態について、上記第５の実施形態との相違点を中心に説明する。 [Sixth Embodiment]
In the fifth embodiment, the case where the sound emitted by an external sound source other than the robot 10 is directly reduced has been described. In the sixth embodiment, by contrast, when an external sound source other than the robot 10 can be operated via a remote controller, the sound emitted by that external sound source is reduced by operating the remote controller. Hereinafter, the sixth embodiment will be described focusing on the differences from the fifth embodiment.

＜ロボットの作業環境＞
はじめに、第６の実施形態に係るロボット１０の作業環境について説明する。図１２は、ロボットの作業環境の一例を示す第３の図である。図１１に示す作業環境との違いは、居室１２００の場合、外部音源遠隔操作器が配されている点である。 <Robot work environment>
First, the working environment of the robot 10 according to the sixth embodiment will be described. FIG. 12 is a third diagram showing an example of the working environment of the robot. The difference from the working environment shown in FIG. 11 is that in the case of the living room 1200, an external sound source remote controller is arranged.

In FIG. 12, the remote controller 1 is an operating device for remotely controlling an audio device, and the remote controller 2 is an operating device for remotely controlling a television. The remote controller 3 is an operating device for remotely controlling an air conditioner, and the remote controller 4 is an operating device for remotely controlling an electric fan.

The control device 100 of the robot 10 according to the sixth embodiment operates each of the audio device, the television, the air conditioner, and the electric fan via the remote controllers 1 to 4 and the like.

Specifically, the control device 100 of the robot 10 according to the sixth embodiment controls the operation of the movable portion 150 so as to operate the remote controllers 1 to 4 and the like of the audio device, the television, the air conditioner, and the electric fan.

For example, the control device 100 of the robot 10 according to the sixth embodiment controls the operation of the movable portion 150 so as to perform, on the remote controller 1, an operation for stopping the audio device or lowering its volume.

Further, for example, the control device 100 of the robot 10 according to the sixth embodiment controls the operation of the movable portion 150 so as to perform, on the remote controller 2, an operation for turning off the television or lowering its volume.

Further, for example, the control device 100 of the robot 10 according to the sixth embodiment controls the operation of the movable portion 150 so as to perform, on the remote controller 3, an operation for stopping the air conditioner, lowering its air volume, or changing its set temperature.

Further, for example, the control device 100 of the robot 10 according to the sixth embodiment controls the operation of the movable portion 150 so as to perform, on the remote controller 4, an operation for stopping the electric fan or lowering its air volume.
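
The correspondence between devices, remote controllers, and sound-reducing operations described above can be summarized as a lookup table. The following is only a minimal sketch: the names `REMOTE_OPERATIONS` and `plan_remote_operation` and the operation labels are hypothetical and do not appear in the disclosure. It selects which remote controller and operation to target, and leaves out how the movable portion 150 physically performs the operation.

```python
# Hypothetical table: device -> (remote controller number, candidate operations).
# The devices, controller numbers, and operations follow the description of FIG. 12.
REMOTE_OPERATIONS = {
    "audio device": (1, ["stop", "lower volume"]),
    "television": (2, ["turn off", "lower volume"]),
    "air conditioner": (3, ["stop", "lower air volume", "change set temperature"]),
    "electric fan": (4, ["stop", "lower air volume"]),
}


def plan_remote_operation(device, preferred=None):
    """Return (remote controller number, operation) for the given device.

    If `preferred` is one of the candidate operations it is used;
    otherwise the first candidate is chosen (an assumption of this sketch).
    """
    remote_number, operations = REMOTE_OPERATIONS[device]
    operation = preferred if preferred in operations else operations[0]
    return remote_number, operation
```

In this sketch, falling back to the first listed operation is a design choice made for illustration; the disclosure does not state which operation is preferred.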

<Summary>
As is clear from the above description, the robot 10 according to the sixth embodiment reduces the sound emitted by an external sound source other than the robot 10 by operating the external sound source remote controller. This allows the control device 100 according to the sixth embodiment to create a state in which voice data can be easily detected. As a result, the control device 100 according to the sixth embodiment can improve the voice detection rate of a robot that operates based on a user's voice instructions.

[Seventh Embodiment]
The first to sixth embodiments described the case where the robot 10 reduces the sound emitted by a sound source (including an external sound source). However, the working environment of the robot 10 may contain an external sound source whose sound the robot 10 cannot reduce, whether directly or indirectly.

In the seventh embodiment, the user is requested to operate such an external sound source in order to reduce the sound it emits. The seventh embodiment is described below, focusing on the differences from the first embodiment.

<Functional configuration of control device>
First, the functional configuration of the control device 100 according to the seventh embodiment will be described. FIG. 13 is a diagram showing an example of the functional configuration of the control device. The difference from the functional configuration shown in FIG. 3 is that the functions of the determination unit 1300 and the control unit 1310 differ from those of the determination unit 306 and the control unit 102 in FIG. 3.

The determination unit 1300 is an example of an instruction unit, and determines whether voice data has been detected by the voice detection unit 302 and whether a lip motion detection result has been output from the lip motion detection unit 305. When no voice data has been detected by the voice detection unit 302 even though a lip motion detection result has been output from the lip motion detection unit 305, the determination unit 1300 outputs a voice output command to the control unit 1310. In this way, the determination unit 1300 instructs that the user be requested to operate the external sound source.

Note that, like the determination unit 306 in FIG. 3, the determination unit 1300 outputs the voice data to the control unit 1310 when a lip motion detection result is output from the lip motion detection unit 305 and voice data is detected by the voice detection unit 302.

Upon receiving the voice output command output by the determination unit 1300, the control unit 1310 generates a voice output signal for requesting the user of the robot 10 to operate the external sound source. The control unit 1310 then outputs synthesized voice based on the generated voice output signal to the user via the speakers 130 and 131.

Examples of the voice output signal for requesting operation of the external sound source include:
・"Please turn off the TV",
・"Please pause the music",
・"I can't hear you, so please turn off the water",
and the like.

As a result, the control unit 1310 can create a state in which voice data can be easily detected.
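
The branching performed by the determination unit 1300 and the request phrases used by the control unit 1310 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation in the disclosure: the names `Decision`, `determine`, and `REQUEST_PHRASES` are hypothetical, voice data is modeled as `bytes` or `None`, and the `"none"` outcome for cases this section does not describe is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    # "voice_output_command", "voice_data", or "none" (hypothetical labels)
    kind: str
    payload: Optional[bytes] = None


def determine(lip_motion_detected: bool, voice_data: Optional[bytes]) -> Decision:
    """Mimic the determination unit 1300 for one observation."""
    if lip_motion_detected and voice_data is None:
        # Lips move but no voice data was detected: ask the user for help.
        return Decision("voice_output_command")
    if lip_motion_detected and voice_data is not None:
        # Normal case: pass the detected voice data to the control unit.
        return Decision("voice_data", voice_data)
    # Assumption: cases not described in this section produce no output.
    return Decision("none")


# Request phrases taken from the examples in the text; the keys are hypothetical.
REQUEST_PHRASES = {
    "television": "Please turn off the TV",
    "music": "Please pause the music",
    "water": "I can't hear you, so please turn off the water",
}
```

For instance, `determine(True, None)` yields the voice output command, after which a control unit could look up `REQUEST_PHRASES["television"]` and pass the phrase to a speech synthesizer.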

<Flow of operation control processing>
Next, the flow of the operation control processing by the control device 100 according to the seventh embodiment will be described. FIG. 14 is a fourth flowchart showing the flow of the operation control processing by the control device. The difference from the flowchart shown in FIG. 4 is step S1401.

In step S1401, the determination unit 1300 outputs a voice output command. Upon receiving the voice output command, the control unit 1310 generates a voice output signal for requesting the user to operate the external sound source, and outputs synthesized voice based on the generated voice output signal via the speakers 130 and 131. As a result, the control unit 1310 can reduce the sound emitted by the external sound source and create a state in which voice data can be easily detected.

<Summary>
As is clear from the above description, the control device 100 according to the seventh embodiment reduces the sound emitted by the external sound source by requesting the user to operate the external sound source. This allows the control device 100 according to the seventh embodiment to create a state in which voice data can be easily detected. As a result, the control device 100 according to the seventh embodiment can improve the voice detection rate of a robot that operates based on a user's voice instructions.

[Other Embodiments]
The above embodiments did not mention the timing at which the determination units 306, 500, 700, and 1300 output the sound source control command, but various cases are conceivable.

For example, when the start of lip motion has been detected by the lip motion detection unit 305 but the start of voice data has not been detected by the voice detection unit 302, the sound source control command may be output before the end of the lip motion is detected.

Further, suppose that the start of lip motion is detected by the lip motion detection unit 305 and the start of voice data is detected by the voice detection unit 302, but the amount of deviation between the detection position of the start of lip motion and the detection position of the start of voice data is equal to or greater than a predetermined threshold. In this case, the determination unit may output the sound source control command at the timing when the amount of deviation becomes equal to or greater than the predetermined threshold. That is, the determination unit 306 may output the sound source control command based on the amount of deviation between the detection position of the start of lip motion and the detection position of the start of voice data.

Further, suppose that the end of lip motion is detected by the lip motion detection unit 305 and the end of voice data is detected by the voice detection unit 302, but the amount of deviation between the detection position of the end of lip motion and the detection position of the end of voice data is equal to or greater than a predetermined threshold. In this case, the determination unit may output the sound source control command at the timing when the amount of deviation becomes equal to or greater than the predetermined threshold. That is, the determination unit 306 may output the sound source control command based on the amount of deviation between the detection position of the end of lip motion and the detection position of the end of voice data.
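
The deviation-based conditions above can be illustrated with a short check that applies equally to the start and the end of an utterance. This is a sketch under assumptions: the function name `deviation_exceeds_threshold` is hypothetical, and detection positions are treated as timestamps in seconds, which the disclosure does not specify.

```python
def deviation_exceeds_threshold(lip_position: float,
                                voice_position: float,
                                threshold: float) -> bool:
    """Return True when the sound source control command may be output.

    `lip_position` and `voice_position` are the detection positions of the
    start (or end) of lip motion and of voice data, assumed here to be
    timestamps in seconds. The command is warranted when their deviation is
    equal to or greater than the predetermined threshold.
    """
    return abs(lip_position - voice_position) >= threshold
```

For example, with a threshold of 0.5, a lip motion start at 1.0 and a voice data start at 1.6 would trigger the command, whereas starts at 2.0 and 2.2 would not.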

Further, in each of the above embodiments, the robot 10 has been described as having the movable portions 140 to 160, but the robot 10 may have movable portions other than the movable portions 140 to 160. Such other movable portions include, for example, a suction unit and a fan.

Further, the above embodiments did not specify the order in which the control unit reduces sounds when there are a plurality of sound sources (including external sound sources). For example, the sounds may be reduced according to a predetermined priority order. Alternatively, all of the sounds may be reduced at the same time.
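
The two alternatives for handling multiple sound sources can be sketched as a function that returns batches of sources to reduce. The name `reduction_order` and the representation of priorities as a mapping from source name to rank (smaller rank first) are assumptions made for illustration only.

```python
def reduction_order(sources, priorities=None):
    """Return a list of batches; each batch is reduced in one step.

    With `priorities` (source -> rank, smaller rank first), sources are
    reduced one at a time in priority order. Without it, all sources are
    reduced at the same time in a single batch.
    """
    if priorities is None:
        return [list(sources)]
    return [[source] for source in sorted(sources, key=lambda s: priorities[s])]
```

Calling `reduction_order(["fan", "tv"], {"tv": 0, "fan": 1})` reduces the television first and then the fan, while omitting the priorities reduces both in one step.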

In addition, the functions described in each of the above embodiments may be realized in combination with the functions described in any other embodiment.

Further, in each of the above embodiments, the functions of the control device 100 have been described as being realized by the processor 201 executing the control program. However, the functions of the control device 100 may be realized by a circuit composed of an analog circuit, a digital circuit, or a mixed analog-digital circuit. A control circuit that realizes the functions of the control device 100 may also be provided. Each circuit may be implemented using an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or the like.

Further, in each of the above embodiments, the control program may be stored in a storage medium such as a flexible disk or a CD-ROM and read and executed by a computer. The storage medium is not limited to a removable medium such as a magnetic disk or an optical disk, and may be a fixed storage medium such as a hard disk device or a memory. The processing by software may also be implemented in a circuit such as an FPGA and executed by hardware. Jobs may be executed using an accelerator such as a GPU (Graphics Processing Unit), for example.

Note that the present invention is not limited to the configurations shown here, including the configurations described in the above embodiments and combinations thereof with other elements. These points can be changed without departing from the spirit of the present invention, and can be determined appropriately according to the form of application.

10: Robot
100: Control device
101: Stop unit
102: Control unit
110: Camera
120: Microphone
130, 131: Speaker
140, 150: Movable portion
160: Movable portion
301: Sound acquisition unit
302: Voice detection unit
303: Image acquisition unit
304: Face detection unit
305: Lip motion detection unit
306: Determination unit
500: Determination unit
510: Control unit
700: Determination unit
710: Control unit
900: Living room
900_1, 900_2: Camera
1100, 1200: Living room
1300: Determination unit
1310: Control unit

Claims

A control device for a robot, comprising:
a lip motion detection unit that detects a user's lip motion based on acquired image data;
a voice detection unit that detects voice data from acquired sound data; and
an instruction unit that instructs reduction of the sound emitted by a sound source when the lip motion detection unit detects the user's lip motion and the voice detection unit does not detect voice data.

A control device for a robot, comprising:
a lip motion detection unit that detects a user's lip motion based on acquired image data;
a voice detection unit that detects voice data from acquired sound data; and
an instruction unit that instructs reduction of the sound emitted by a sound source based on the amount of deviation between the detection position of the lip motion detected by the lip motion detection unit and the detection position of the voice data detected by the voice detection unit.

A control device for a robot, comprising:
a lip motion detection unit that detects a user's lip motion based on acquired image data;
a voice detection unit that detects voice data from acquired sound data; and
an instruction unit that, when the lip motion detection unit detects the user's lip motion and the voice detection unit detects voice data, instructs reduction of the sound emitted by a sound source based on likelihood information obtained when the voice data is recognized.

The control device according to any one of claims 1 to 3, further comprising a control unit that reduces the sound emitted by the sound source by controlling the operation of a movable unit of the robot when instructed by the instruction unit.

The control device according to any one of claims 1 to 3, further comprising a control unit that reduces the sound emitted by the sound source by controlling the sound source mounted on the robot when instructed by the instruction unit.

The control device according to any one of claims 1 to 3, wherein the sound source is an external sound source separate from the robot.

The control device according to claim 6, further comprising a control unit that reduces the sound emitted by the external sound source by transmitting a signal to the external sound source when instructed by the instruction unit.

The control device according to claim 6, further comprising a control unit that reduces the sound emitted by the external sound source by causing the robot to operate the external sound source when instructed by the instruction unit.

The control device according to claim 6, further comprising a control unit that reduces the sound emitted by the external sound source by causing the robot to operate an operating device that remotely controls the external sound source when instructed by the instruction unit.

The control device according to claim 6, further comprising a control unit that reduces the sound emitted by the external sound source by outputting synthesized voice for requesting the user to operate the external sound source when instructed by the instruction unit.

The control device according to any one of claims 1 to 10, wherein, when instructing reduction of the sound emitted by the sound source, the instruction unit further instructs that the user be prompted to speak.

The control device according to claim 4, wherein, when the voice detection unit does not detect voice data within a predetermined time after the instruction unit instructs reduction of the sound emitted by the sound source, the instruction unit instructs resumption of the operation that was being performed before the sound emitted by the sound source was reduced.

The control device according to claim 4, wherein, when the voice detection unit detects voice data within a predetermined time after the instruction unit instructs reduction of the sound emitted by the sound source, the control unit controls the operation of the movable unit of the robot based on the detected voice data.

A robot comprising:
an imaging unit that acquires image data;
a sound collection unit that acquires sound data; and
the control device according to any one of claims 1 to 13.

A control method for a robot, comprising:
a lip motion detection step of detecting a user's lip motion based on acquired image data;
a voice detection step of detecting voice data from acquired sound data; and
an instruction step of instructing reduction of the sound emitted by a sound source when the user's lip motion is detected in the lip motion detection step and voice data is not detected in the voice detection step.

A control program that causes a computer to execute:
a lip motion detection step of detecting a user's lip motion based on acquired image data;
a voice detection step of detecting voice data from acquired sound data; and
an instruction step of instructing reduction of the sound emitted by a sound source when the user's lip motion is detected in the lip motion detection step and voice data is not detected in the voice detection step.