JP2019124855A

JP2019124855A - Apparatus and program and the like

Info

Publication number: JP2019124855A
Application number: JP2018006267A
Authority: JP
Inventors: Takayuki Mizuno (水野 隆之); Mikio Shimazue (島津江 幹雄); Yuichi Kajita (梶田 裕一); Yuki Shimizu (清水 勇喜); Masahiro Wada (和田 昌浩); Keizo Takahashi (高橋 圭三); Keisuke Takahashi (高橋 慶介)
Original assignee: Yupiteru Corp; Yupiteru Kagoshima Corp
Current assignee: Yupiteru Corp; Yupiteru Kagoshima Corp
Priority date: 2018-01-18
Filing date: 2018-01-18
Publication date: 2019-07-25
Anticipated expiration: 2038-01-18
Also published as: JP7408105B2; JP2022169645A; JP7130201B2; JP2024026341A

Abstract

To provide an apparatus having, for example, a function for performing communication. SOLUTION: A robot 1 has a function for outputting sound and a function for communicating with a user. The robot 1 converses with the user by using a dialogue engine, converts what the user has just said into character string data, and displays it on a touch panel part 7 serving as a display part. The user can thus visually check what they uttered, which contributes to the correction and direction of subsequent communication. SELECTED DRAWING: Figure 1

Description

The present invention relates to an apparatus, a program, and the like having, for example, a function of performing communication.

Patent Document 1 discloses a technology related to an interactive communication robot.

Japanese Unexamined Patent Application Publication No. 2011-000681 (JP 2011-000681 A)

However, conventional communication robots have the problem that they lack sufficient capability. An object of the present invention is therefore to provide an apparatus, a program, and the like having capabilities superior to the conventional art.
The object of the invention of the present application is not limited to this. The applicant also intends to acquire rights, through divisional applications, amendments, and the like, to configurations whose object is to obtain the effects produced by any part of the configurations disclosed in this specification, the drawings, and the like. For example, this specification discloses the problems obtained by reading each passage described as "can do ..." as "doing ... is a problem to be solved". Each problem is described as independent of the others, and the applicant intends to acquire rights, by divisional application, amendment, or the like, to each configuration for solving such a problem on its own. Even where a problem is only implicitly understood from the description, the applicant intends to claim part of the configurations described herein by amendment or divisional application. Problems that combine these independent problems are also disclosed.

(1) An apparatus having a function of performing communication by outputting output information to at least one of a user and another device, the apparatus preferably having a function of controlling generation of the output information for the communication or a function of controlling the timing at which the output information for the communication is output.
In this way, at least one of the user and the other device can obtain at least one of output information whose generation is controlled and output information whose timing is controlled. An apparatus superior to the conventional art can thus be provided.
As control of the generation of the output information for communication, the apparatus may, for example, generate output information according to the communication. This particularly improves convenience when the user or the other device communicates with the apparatus. Communication may be performed by outputting any kind of output information, but it is particularly preferable to store history information of past communication and to communicate based also on that history information. It is also preferable to communicate based on history information of communication with a plurality of different users or other devices. The output information may be output from a single output means, but preferably the apparatus can output from a plurality of different output means and selects one of them for output. The output means may be, for example, an audio output means, a display means, or a communication means. Communication may appeal to hearing or sight, or to other senses such as touch.

In particular, the "apparatus" preferably has a function of outputting the output information as sound. This makes it possible, for example, to control a device that operates on voice input. The apparatus preferably has an interface for communication and a determination means for carrying out communication.
The parts included in the configuration of the "apparatus" may be housed in a plurality of housings, but are preferably housed in a single housing.
The apparatus may be, for example, a system having a function of accessing a network by wire or wirelessly. In particular, it may be, for example, a smartphone, a tablet terminal, a smart speaker, or a smart camera. Its appearance is not limited, but it is preferably an apparatus that looks as though it communicates with others. A robot is particularly preferable, especially one that imitates a person or an animal, or one with some other anthropomorphic form.

The apparatus preferably has an input-side interface for communication. This may be an interface with an input device such as a keyboard, or with optical character recognition (OCR), which reads characters and converts them into data, but a voice-based input-side interface is preferable. As a voice-based interface, the apparatus preferably has, for example, a function of acquiring audio data based on an audio signal converted into an electrical signal by a microphone.
A voice-based output-side interface is preferably provided, for example a speaker device or earphones. A visual output-side interface is also preferably provided, for example a display whose content can be changed, such as a liquid crystal display (LCD), a plasma display (PDP), an organic EL display, or a cathode ray tube. Output as printed matter may also be provided. It is particularly preferable to provide an output-side interface that produces actual movement. An actuator, for example a motor, is preferably provided to produce the movement. In particular, the apparatus is preferably a robot having members that actually move, and preferably outputs the output information as movement of those members. It is especially preferable to provide all three kinds of output-side interface: voice-based, visual, and one that produces actual movement.
The "user" is, for example, a person who can handle the apparatus. There may be one user, but preferably there are several.
The "other apparatus" may be the same as or different from a specific one of the apparatuses described above in, for example, appearance and function. The other apparatus need not have a voice output function, but preferably has one. Likewise, it need not have a voice input function, but preferably has one.
The other device may be a device that cannot access a network, but is preferably a device that can, in particular one that can access the Internet.
The output information is preferably configured to cause the output means to produce a certain output. The "certain output" is, for example, a notification to the outside. Even if it does not take the form of an explicit "notification", the mere fact that something has changed as a result can be interpreted as a "notification". The "certain output" need not be intended as a notification. It may be, for example, an output of sound or light that carries some information or none, or a change in some physical quantity, movement of an object, or the like.

(2) The apparatus preferably has a display function that changes the display mode on a display unit according to the communication by voice.

When communication is performed by voice, the display mode on the display unit changes in relation to that voice communication, so the voice can be turned into something visible, which improves convenience in communication.
The "display unit" is preferably a device that displays so as to change the display mode according to the communication by voice, for example a display device such as a liquid crystal display (LCD), a plasma display (PDP), an organic EL display, or a cathode ray tube. The display unit is preferably provided in the apparatus.
"Displaying so as to change the display mode on the display unit according to the communication by voice" covers the case where the voice of the user or the other device is shown on the display unit and changes the display mode, and the case where the apparatus's own voice is also shown and changes the display mode. Only one of these may be provided, but providing both is preferable. When both are provided, either one or both may be displayed, but preferably the apparatus has both a state that displays only one and a state that displays both, and can switch between them. An example of displaying only one is the robot 1 of Embodiment 1 below while the face screen S1 is displayed. An example of displaying both is the robot 1 of Embodiment 1 below while the chat screen S2 is displayed.
As the display mode, the screen may be changed in various ways in relation to the voice. For example, an animation may be run in which an object on the display screen moves in response to the voice. The image may be varied as voice is output, or another image may be superimposed on it as the voice changes. Voice data may also be converted into character data and displayed, with the displayed characters changing from moment to moment as the voice changes.
The communication by voice is preferably controlled by the function of controlling generation of the output information or the function of controlling the timing at which the output information for the communication is output, and particularly preferably by both functions.
The function of changing the display mode on the display unit is likewise preferably controlled by the function of controlling generation of the output information or the function of controlling the timing at which the output information is output, and particularly preferably by both functions.
The same applies to (3) and the subsequent items: any configuration that outputs from the apparatus is preferably controlled by the function of controlling generation of the output information or the function of controlling the timing at which the output information is output, and particularly preferably by both functions.

(3) The change in the display mode on the display unit is preferably based only on an utterance of at least one of the user and the other device.

Because the apparatus changes the display mode on the display unit based on an utterance from the user or the other device, the user or the other device can see whether its own utterance has been recognized by the apparatus, which improves convenience in communication. It also makes for an unusual form of communication across different kinds of human interface, which feels fresh and fun.
The utterance may come directly from the user or the other device, or an indirect form, for example the utterance converted into character data, may be used. The utterance may also be converted into some corresponding information, such as another sound or a visualized pattern, and the display mode changed based on that information.
Because (3) is "based on an utterance of at least one of the user and the other device", the apparatus itself may, for example, communicate by voice.
As a configuration based on an utterance of at least one of the user and the other device, the apparatus preferably converts voice into a character string with a voice recognition function, for example, and thereby identifies the content of the utterance.

(4) The change in the display mode on the display unit is performed alternately with audio output from the apparatus.
Communication does not become one-sided, and the parties can communicate while reliably understanding each other. "Alternately" means, for example, that communication between the apparatus and the user or other device basically proceeds as a dialogue, rather than only one side speaking unilaterally.

(5) The display mode preferably includes character information into which an utterance of at least one of the user and the other device has been converted.

In this way, the user or the other device can tell specifically from the displayed character information how its own utterance was recognized by the apparatus, and can judge from that text whether communication is succeeding. The content of the utterance can also be checked visually. If the recognition is wrong, the speaker can say it again or rephrase it to steer the exchange back to correct communication.
The "character information" is, for Japanese for example, characters based on the utterance of the user or the other device, such as ordinary kanji, hiragana, and katakana. When the utterance forms a sentence, it is preferably a sentence with phrases that mix kanji, hiragana, katakana, foreign-language notation, and so on. When the utterance is in a foreign language such as English or Chinese, it is preferably displayed in the characters of that language.
For example, it is particularly preferable to provide a voice recognition function and a voice output function, display on the display unit the content of the utterance of at least one of the user and the other device after it has been recognized and converted into character information by the voice recognition function, generate reply character information based on that content, convert the reply character information into voice information with a speech synthesis function, and output it as voice.
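The flow described in the preceding paragraph (recognize, display, generate a reply, synthesize, output) can be sketched roughly as follows. This is only an illustration: `handle_utterance`, `recognize`, `generate_reply`, and `synthesize` are hypothetical names that do not appear in the specification, and a plain list stands in for the display unit.

```python
from typing import Callable, List


def handle_utterance(
    audio: bytes,
    recognize: Callable[[bytes], str],     # hypothetical speech-to-text step
    generate_reply: Callable[[str], str],  # hypothetical reply generator
    synthesize: Callable[[str], bytes],    # hypothetical text-to-speech step
    display: List[str],                    # stands in for the display unit
) -> bytes:
    """Show the recognized text, then return the reply as voice data."""
    text = recognize(audio)
    display.append(text)  # the speaker can check how they were heard
    reply_text = generate_reply(text)
    return synthesize(reply_text)
```

Passing the three steps in as callables keeps the sketch independent of any particular recognition or synthesis engine.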

(6) The characters into which the utterance has been converted are preferably displayed on the display unit with the entire content from the start to the end of the utterance shown at the same time.
Because the apparatus takes in the whole of an utterance of some length from the user or the other device, a correct response from the apparatus to that utterance can be expected. The speaker can also instantly check by eye what they said, which contributes to the correction and direction of subsequent communication.
The span from the start to the end of an utterance is preferably set taking into account how long a person can speak in one breath. The end of an utterance may be, for example, the point at which a sound level regarded as containing no speech has continued for a predetermined time. The start of an utterance may be triggered, for example, when the sound level exceeds a predetermined level, or when an abrupt change of at least a predetermined magnitude in the sound level is detected.
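One possible reading of the start and end conditions above is a simple level-threshold detector over a sequence of per-frame loudness values. The sketch below assumes that representation; the function name `find_utterance`, its parameters, and the threshold values in the usage note are illustrative and are not taken from the specification.

```python
from typing import List, Optional, Tuple


def find_utterance(
    levels: List[float],
    start_level: float,
    silence_level: float,
    silence_frames: int,
) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices of the first complete utterance.

    start: first frame whose level exceeds start_level.
    end:   exclusive index of the last voiced frame, found once
           silence_frames consecutive frames stay at or below silence_level.
    Returns None if no utterance starts or it has not ended yet.
    """
    start: Optional[int] = None
    quiet = 0
    for i, level in enumerate(levels):
        if start is None:
            if level > start_level:
                start = i
            continue
        if level <= silence_level:
            quiet += 1
            if quiet == silence_frames:
                return start, i - silence_frames + 1
        else:
            quiet = 0  # speech resumed, so the pause was not the end
    return None
```

For instance, with `start_level=0.5`, `silence_level=0.1`, and `silence_frames=3`, a single quiet frame in the middle of speech does not end the utterance; only three quiet frames in a row do.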

(7) The apparatus preferably has a function of displaying on the display unit, as character information, a dialogue history of the voice communication between the apparatus and at least one of the user and the other device.

It is easy to see how the dialogue between the apparatus and the user or other device went, which improves convenience in voice communication.
The dialogue history is preferably displayed so that it is clear which side said what. To that end, speech balloons may be used, for example, to distinguish whose utterance each piece of character information is based on. Past dialogue can preferably be checked by scrolling on the screen. To show that the exchange is a dialogue, different avatar characters may be displayed for the apparatus side and for the user or other device side. Displaying the date and time of each utterance alongside it is also useful, because it prompts the user to recall the past from the dialogue history.
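As a rough illustration of a history that records who spoke and when, the sketch below stores each turn with a speaker label and a timestamp and renders one text line per turn. The `Turn` class, the labels "user" and "device", and the line format are assumptions made for this example; balloons and avatars would be a matter for the actual display layer.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Turn:
    speaker: str         # e.g. "user" or "device" (illustrative labels)
    text: str            # the utterance as character information
    spoken_at: datetime  # when the utterance was made


def render_history(turns: List[Turn]) -> List[str]:
    """Format each turn as one line: timestamp, speaker label, text."""
    return [
        f"[{t.spoken_at:%Y-%m-%d %H:%M}] {t.speaker}: {t.text}"
        for t in turns
    ]
```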

(8) When displaying the dialogue history as character information, the apparatus preferably has a function of indicating on the display unit that the user or the other device has been replaced and the apparatus's dialogue partner has changed.

When the apparatus's dialogue partner changes, the content of the dialogue changes too. Displaying an indication that the partner has changed lets a reader who later looks at the dialogue history see that the content changes before and after that indication, so the breaks in the dialogue are apparent.
The apparatus preferably has a function of detecting the replacement of the user or the other device. The replacement may be detected from a change in voice characteristics, but a configuration that uses a camera to capture the state of surrounding people or devices is desirable, and detection based on both is particularly preferable.
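Assuming the detection step (voice characteristics, camera, or both) has already attached a speaker identifier to each turn, inserting the "partner changed" indication into the history could look like the sketch below. The function name, the identifier `"device"`, and the marker text are hypothetical.

```python
from typing import List, Optional, Tuple

MARKER = "--- conversation partner changed ---"  # illustrative wording


def with_change_markers(
    turns: List[Tuple[str, str]], device_id: str = "device"
) -> List[str]:
    """turns holds (speaker_id, text) pairs in chronological order.

    A marker line is inserted whenever the non-device speaker differs
    from the previous non-device speaker.
    """
    lines: List[str] = []
    partner: Optional[str] = None
    for speaker, text in turns:
        if speaker != device_id:
            if partner is not None and speaker != partner:
                lines.append(MARKER)
            partner = speaker
        lines.append(f"{speaker}: {text}")
    return lines
```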

(9) The apparatus preferably has a function of displaying on its display unit the status of recognition, by a voice recognition function, of the voice of at least one of the user and the other device.

For example, when the user or the other device is speaking, letting the user see that the apparatus is definitely listening lets them understand that the dialogue is proceeding smoothly.
For example, as a visual function, the display unit may change the color of the display screen or of an object on it according to the degree of voice recognition, display different objects according to the recognition status, or show a quantitative indication, for example a higher number when recognition is good. As an auditory function, the sound may be made louder or quieter, or its timbre changed, according to the degree of voice recognition.
In particular, an anthropomorphic figure is preferably shown on the display unit and its facial expression changed.
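If the recognizer reports a confidence value between 0 and 1 (an assumption; the specification only speaks of a "degree" of recognition), the mapping to a facial expression and a numeric indication might be sketched as follows. The thresholds and expression names are invented for illustration.

```python
from typing import Dict, Union


def recognition_feedback(confidence: float) -> Dict[str, Union[str, int]]:
    """Map a recognition confidence in [0, 1] to display feedback."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= 0.8:      # illustrative threshold
        expression = "smile"
    elif confidence >= 0.5:    # illustrative threshold
        expression = "neutral"
    else:
        expression = "puzzled"
    # A higher score is shown when recognition is better.
    return {"expression": expression, "score": round(confidence * 100)}
```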

(10) The apparatus preferably has a function of performing the communication by producing the voice output using the result of recognizing voice and converting it into a character string. When that result has a portion matching a character string stored in a storage means that stores in advance the correspondence between result character strings and output content, the apparatus preferably outputs by voice the output content corresponding to that character string.

When the character string obtained by recognizing the voice has a portion matching a character string stored in the storage means, and the necessary voice output can be produced entirely within the apparatus, the apparatus can respond quickly to utterances from the user or the other device. In addition, because no connection to an external server is made, connection costs are reduced.
For example, a large number of built-in scenarios may be prepared as the character strings stored in the storage means. A built-in scenario is a scripted dialogue, for example a simple greeting in which the apparatus answers "How are you?" to the user's "Good morning", or a scenario for carrying out a given process, such as "Open the settings screen" (user), "Are you sure?" (apparatus), "Yes" (user), "OK, I'll open the settings screen" (apparatus). The storage means may be, for example, a ROM or SSD inside a computer, an external SD card or microSD card, or a CD-ROM.

Here, "has a portion matching a character string" covers both the case where the result exactly matches a stored character string and the case where it matches a regular expression that allows some portions to differ. A regular expression is a language-processing technique that expresses a set of character strings as a single character string. For example, for the pattern "×××音量大きく××" ("××× turn the volume up ××"), utterances such as "ユピ坊音量大きくしてよ" ("ユピ坊, turn the volume up, will you"), "おい音量大きくして" ("Hey, turn the volume up"), and "音量大きくしてください" ("Please turn the volume up") differ in places, but because the essential part matches they are all interpreted as expressions meaning "turn the volume up". "Outputting by voice the output content corresponding to that character string" then means, for example, a voice output such as "はい、音量を大きくします" ("Yes, I'll turn the volume up") in response to the character string meaning "turn the volume up". The apparatus may also perform some process following such a voice output. For example, after saying "Yes, I'll turn the volume up", the apparatus may actually raise the volume of its own utterances in the subsequent dialogue.
The process of recognizing voice and converting it into a character string may be performed in the apparatus, or by sending the voice data to a voice recognition server connected to a network and receiving the character string it produces. Preferably both are provided, along with a function of deciding which result to use according to the communication situation and the like.
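The partial-match idea can be illustrated with Python's standard `re` module. In this sketch the table `BUILT_IN` and the function `built_in_reply` are hypothetical; the phrases are the ones given in the text, and the "×××...××" wildcard parts are expressed simply by searching for the essential part anywhere in the recognized string.

```python
import re
from typing import List, Optional, Pattern, Tuple

# Hypothetical built-in table: (pattern, spoken reply).
BUILT_IN: List[Tuple[Pattern[str], str]] = [
    # Essential part may be surrounded by any other characters.
    (re.compile(r"音量大きく"), "はい、音量を大きくします"),
    # Exact match only.
    (re.compile(r"^おはよう$"), "お元気ですか"),
]


def built_in_reply(recognized: str) -> Optional[str]:
    """Return the stored reply for the first matching pattern, else None."""
    for pattern, reply in BUILT_IN:
        if pattern.search(recognized):
            return reply
    return None
```

A `None` result corresponds to the case where no stored character string matches, which the following items handle by going to a server.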

(11) When the result of recognizing voice and converting it into a character string has no portion matching a character string stored in the storage means that stores in advance the correspondence between result character strings and output content, the apparatus preferably has a function of connecting to a server having a dialogue engine and outputting voice data.

Because the device connects to the server provided with the dialogue engine only when no part of the recognized character string matches a character string held in the storage means, the cost of connection can be reduced.
The server is preferably an apparatus that has the functions of a computer, with a storage unit and a control unit, and that is reached over, for example, an Internet connection. In the present invention it preferably includes a dialogue engine. An external server can be accessed by means of, for example, an ID, a password, or electronic authentication. The external server is preferably a cloud server. The server preferably includes a speech recognition engine that can convert speech into character string data. The converted character string data is preferably sent to the device over the Internet connection.
The function of connecting to a server provided with a dialogue engine and outputting voice data may, for example, send the recognized character string to the dialogue engine, receive from the dialogue engine a character string containing the dialogue content that corresponds to it, and convert that character string into voice data with a speech synthesis function.
The conversion of the character string into voice data may be performed by a speech synthesis engine provided in the device, but it is preferably performed by sending the character string to a speech recognition server that converts character strings into voice data and receiving the voice data corresponding to that character string returned from the server.
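The selection logic of item (11) can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `find_local_reply`, `respond`, and `ask_server` are hypothetical, the stored correspondences are modeled as a plain dictionary, and "matching part" is assumed to mean that a stored phrase occurs as a substring of the recognized text.

```python
from typing import Callable, Dict, Optional


def find_local_reply(text: str, scenarios: Dict[str, str]) -> Optional[str]:
    """Return the stored reply for the first stored phrase found in text."""
    for phrase, reply in scenarios.items():
        if phrase in text:
            return reply
    return None


def respond(text: str,
            scenarios: Dict[str, str],
            ask_server: Callable[[str], str]) -> str:
    """Answer locally when possible; contact the dialogue server otherwise."""
    local = find_local_reply(text, scenarios)
    if local is not None:
        return local          # no connection is made, so no connection cost
    return ask_server(text)   # hypothetical call to the external dialogue engine
```

The server callable is passed in as a parameter so that the connection is made only on the fallback path, which is where the cost reduction described above would come from.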

(12) Even when part of the character string obtained by recognizing speech matches a character string held in the storage means that stores, in advance, the correspondence between such result strings and output contents, the device preferably has a function of connecting to a server provided with a speech recognition engine and outputting voice data when a certain condition is satisfied.

When part of the recognized character string matches a stored character string, always producing a fixed voice output that the user can predict may dampen the user's desire to converse. Deliberately connecting to the external server in such cases allows a more human-like dialogue.
For example, suppose the user says "Hello" and the device recognizes it, and the built-in scenario would reply "Hello, how are you?". Instead of using that scenario, the device preferably sends the voice data for "Hello" to the external server and requests that the external server's dialogue engine create a reply to it. The certain condition may be, for example, once in every several occurrences, or a random timing.
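The "certain condition" of item (12) is given only by example (once in every several times, or at random). A minimal sketch of such a condition follows; the class name `BypassPolicy` and its parameters `every_n` and `probability` are assumptions made for illustration.

```python
import random


class BypassPolicy:
    """Decide whether a locally matched utterance should still go to the server."""

    def __init__(self, every_n: int = 0, probability: float = 0.0, rng=None):
        self.every_n = every_n            # 0 disables the "every Nth match" rule
        self.probability = probability    # 0.0 disables the random rule
        self.rng = rng or random.Random()
        self.match_count = 0

    def should_use_server(self) -> bool:
        """Call once per local match; True means skip the built-in scenario."""
        self.match_count += 1
        if self.every_n > 0 and self.match_count % self.every_n == 0:
            return True
        return self.rng.random() < self.probability
```

With `BypassPolicy(every_n=3)`, every third matched utterance is sent to the external dialogue engine and the others use the built-in scenario.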

(13) The device preferably has a function of detecting that, after speech recognition, the speech has broken off and a silent state has begun; a storage means that stores the voice data from the speech recognition until the silent state; and a function of connecting to a server provided with a speech recognition engine and outputting the voice data held in the storage means at the time the silent state begins.

Silence often occurs during a dialogue. Remaining connected to a server provided with an external speech recognition engine during silence incurs needless cost. By storing in the storage means the voice data from speech recognition until the silent state, and sending that data at the moment silence begins rather than in real time, the time corresponding to the silent portion is cut and cost is reduced.
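The buffering described in item (13) can be sketched as below. This is an illustration under stated assumptions: audio arrives as frames of integer samples, a frame counts as silent when its peak amplitude is below `silence_level`, and the silent state is declared after `silence_frames` consecutive silent frames. The class name `UtteranceBuffer` and the `send` callback are hypothetical.

```python
from typing import Callable, List, Sequence


class UtteranceBuffer:
    """Hold speech frames locally and hand them over once silence begins."""

    def __init__(self, send: Callable[[List[List[int]]], None],
                 silence_level: int = 500, silence_frames: int = 3):
        self.send = send
        self.silence_level = silence_level
        self.silence_frames = silence_frames
        self.frames: List[List[int]] = []
        self.quiet_run = 0

    def feed(self, frame: Sequence[int]) -> bool:
        """Add one frame; return True when a stored utterance was sent."""
        peak = max((abs(s) for s in frame), default=0)
        if peak >= self.silence_level:
            self.frames.append(list(frame))   # speech: keep it
            self.quiet_run = 0
            return False
        if not self.frames:
            return False                      # silence before any speech
        self.quiet_run += 1                   # silent frames are not stored
        if self.quiet_run >= self.silence_frames:
            self.send(self.frames)            # one short upload, no silence
            self.frames = []
            self.quiet_run = 0
            return True
        return False
```

Because silent frames are never stored, only the spoken portion is uploaded, which corresponds to the saving described above.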

(14) The device preferably has a recording function and a function of connecting to a server provided with a speech recognition engine and outputting voice data upon detecting sound at a predetermined sound pressure level.
Remaining constantly connected to a server provided with an external speech recognition engine incurs needless cost. With this arrangement the device does not connect during silence or near-silence, when no dialogue is taking place, and connects to the external server only when a dialogue is about to begin, so the cost of connection can be reduced.
The server provided with the speech recognition engine may receive recorded data for a past predetermined period that the device has already recorded and return a character string for that data. In particular, however, it is preferably of the type that receives voice data in real time, for example as streaming data, and returns a character string. Such servers are often billed on a pay-per-use basis, at so many yen per unit of voice-data reception time, so the cost can be reduced substantially.
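A level-triggered connection as in item (14) might look like the sketch below. The text speaks of a sound pressure level; as an assumption, this sketch uses the RMS level of 16-bit samples relative to full scale (dBFS) as a stand-in, since converting to an absolute sound pressure level would require microphone calibration data that the text does not give. The function names and the -40 dBFS default are hypothetical.

```python
import math
from typing import Sequence


def level_dbfs(samples: Sequence[int], full_scale: float = 32768.0) -> float:
    """RMS level of the samples in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")
    return 20.0 * math.log10(rms / full_scale)


def should_connect(samples: Sequence[int], threshold_dbfs: float = -40.0) -> bool:
    """True when the frame is loud enough to justify opening the connection."""
    return level_dbfs(samples) >= threshold_dbfs
```

The streaming connection to the recognition server would be opened only while `should_connect` is true, so billed reception time covers speech rather than silence.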

(15) When the device connects to a server provided with a speech recognition engine and outputs voice data, and the server is busy, the device preferably has a function of outputting to the user, by voice, a dialogue example selected from dialogue data held in the storage means.

When a server is busy it is usual to announce that fact. In the middle of a dialogue, however, such an announcement is abrupt and unrelated to the conversation, and may spoil it. If, instead of announcing the busy state, the device bridges the gap with a prompt such as "Could you say that again?" or a filler such as "Oh, is that so?", it can connect to the speech recognition engine in the meantime and continue an appropriate dialogue, and the dialogue does not become unnatural.

(16) When the device judges that the user's recognized utterance is too long, it preferably has a function of outputting by voice a dialogue example selected from voice data held in the storage means, without connecting to a server provided with a speech recognition engine.

If the user's utterance is too long, the speech recognition engine may misrecognize it, and an off-target reply may come back as a result. When the utterance reaches a certain number of sentences or more, the device therefore preferably rules out that possibility and resets the dialogue by selecting and voicing a backchannel response that fits almost any context, such as "Uh-huh", "Seriously?", or "Really?", which allows an appropriate dialogue to continue.
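Item (16) can be illustrated with the short sketch below. The threshold is not specified in the text, so `limit=4` sentences is an arbitrary assumption, as are the function names and the sentence-splitting rule (splitting on Western and Japanese sentence-ending punctuation).

```python
import random
import re
from typing import Callable

BACKCHANNELS = ["Uh-huh", "Seriously?", "Really?"]


def count_sentences(text: str) -> int:
    """Count non-empty pieces separated by sentence-ending punctuation."""
    parts = re.split(r"[.!?。！？]+", text)
    return len([p for p in parts if p.strip()])


def reply_or_backchannel(text: str, ask_server: Callable[[str], str],
                         limit: int = 4, rng=None) -> str:
    """Use a stored backchannel for long utterances; otherwise ask the server."""
    if count_sentences(text) >= limit:
        return (rng or random).choice(BACKCHANNELS)   # no server connection
    return ask_server(text)
```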

(17) In the communication by dialogue, when the user has missed what the device said, the device preferably outputs its immediately preceding speech again in response to a certain utterance by the user.
For example, when the user could not catch, or inadvertently missed, what the device has just said, a request such as "Say that again" or "Say it one more time" makes the device repeat its last utterance. The dialogue that was in progress can thus continue without interruption.

(18) When speech could not be recognized, the device preferably outputs speech that prompts the user to speak again.
The dialogue that was in progress can thus continue without interruption.

(19) When the recognized speech content satisfies a certain condition, the display unit is preferably made to show a certain display.
For example, when an utterance containing a predetermined word is made and recognized, the display unit shows the "certain display" that corresponds to that word. The "predetermined word" may be, for example, the user's birthday, the name of the user's child, the nickname of the device, a company name, or a particular advertising catchphrase. A function for setting in advance the correspondence between predetermined words and displays is preferably provided. Communication then extends beyond dialogue alone to include the visual, which increases the modes of communication with the device and makes communication more convenient.

(20) The device preferably has a function of moving its housing or a part connected to the housing, and the housing or the connected part preferably makes a certain movement when the recognized speech content satisfies a certain condition.
For example, when an utterance containing a predetermined word is made and recognized, the housing or the connected part makes a movement that corresponds to that word, such as a gesture. Communication then extends beyond dialogue alone to include the movement of the device, which increases the modes of communication with the device and makes communication more convenient. Combining this with the "certain display" described above is particularly preferable.

(21) The device preferably has an eye part, which is a part that the user can recognize as eyes; a user position recognition function that recognizes the position of the user; and a function of moving the eye part. As the communication, the device preferably moves the eye part so that it faces the direction of the user's position recognized by the position recognition function.

When the eye part that can be recognized as eyes faces the user, the user gets the feeling of actually talking with a person, the desire to communicate with the device increases, and the device becomes more valuable to use.
The "eye part, which is a part that the user can recognize as eyes" may be eyes displayed as objects on a display screen, or may be real, mechanically operated eyes rather than such a virtual image. The device itself may also be controlled to face the user in synchronization with the eye part. Alternatively, only the mechanism that directs the eye part toward the user may be controlled to face the user.

(22) The device preferably has a face recognition function that recognizes the user's face.
Because the faces of individual persons can be distinguished, communication can be tailored to each individual. For example, by associating each person's authenticated face with a name, the device can address a person whose face it has recognized by name during a dialogue. A configuration that conducts a dialogue tailored to the recognized person on the basis of past dialogue history is particularly preferable.

(23) The device preferably includes a display unit and a function of displaying on the display unit the status of recognition of the user's face by the face recognition function.
The user can then see, by looking at the display unit, how the device is recognizing his or her face. In particular, it is preferable to display whether face recognition is complete or the face has not yet been recognized as a person's face. If recognition is not yet complete, the user can cooperate by keeping the face still so that the device can recognize it more easily.

(24) The position recognition function is preferably a sound source direction identifying function comprising three microphones arranged at the vertices of a triangle, and an identifying unit that, on the basis of the differences in the arrival time of sound from a sound source at each of the three microphones, identifies the sound source direction running from the position obtained by projecting the sound source position onto the plane containing the triangle, along the direction perpendicular to that plane, toward a reference position located inside the region of the plane enclosed by the triangle.
The sound source direction can thus be identified with three microphones. Once the direction is identified, the device can be turned toward the user whenever the user speaks, so the user feels that he or she is communicating through dialogue.
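The text does not give the computation used by the identifying unit. One common way to obtain an in-plane direction from three arrival times is a far-field (plane-wave) model, sketched below as an assumption rather than as the method of the source. With microphone positions p_i in the plane and v the in-plane component of the unit vector pointing toward the source, the model gives (p_i - p_0) . v = -c (t_i - t_0) for i = 1, 2, a 2x2 linear system. The function name and the 343 m/s speed of sound are illustrative choices. The returned angle points from the array toward the projected source; the direction as defined in item (24), from the projected source toward the reference position, is the opposite one (add pi).

```python
import math
from typing import Sequence, Tuple

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def source_bearing(mics: Sequence[Tuple[float, float]],
                   arrival_times: Sequence[float],
                   c: float = SPEED_OF_SOUND) -> float:
    """In-plane bearing (radians) toward the source under a far-field model."""
    (x0, y0), (x1, y1), (x2, y2) = mics
    t0, t1, t2 = arrival_times
    # Rows of the system A v = b, built from differences against microphone 0.
    a11, a12 = x1 - x0, y1 - y0
    a21, a22 = x2 - x0, y2 - y0
    b1 = -c * (t1 - t0)
    b2 = -c * (t2 - t0)
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("microphones must not be collinear")
    vx = (b1 * a22 - a12 * b2) / det   # Cramer's rule
    vy = (a11 * b2 - a21 * b1) / det
    return math.atan2(vy, vx)
```

The source's elevation above the plane only scales the length of v, so the bearing is unaffected, which matches the projection onto the microphone plane described in item (24).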

(25) The device preferably includes an infrared remote control signal output unit, and the communication is preferably communication with the other device, which has an infrared remote control reception function.
Communication with that device can then be carried out easily through the infrared remote control signal output unit of the device.
For example, the device can output an infrared remote control signal to a receiving apparatus that has an infrared remote control signal receiving unit, such as a television, audio equipment, or an air conditioner, and have it execute control such as ON/OFF. In particular, the device preferably has a voice dialogue function and, in response to a spoken command phrase such as "Turn on the TV" or "Turn off the TV", controls the infrared remote control signal output unit according to that command.

(26) The device is preferably operated remotely from the other device via the Internet.
Because the device can be operated remotely from another device, the device becomes more convenient. The other device may be, for example, a smartphone. The device preferably includes a camera and a mechanism that changes the orientation of the camera. The other device can then be used to access the device and, for example, view video from the camera on the device side or change the camera orientation, so the device can be controlled without being near it. Also, for example, a smartphone may be used to turn ON a watch-over function of the device so that the smartphone is notified by e-mail when a person (or object) has moved. For watching over a sick person or a care recipient, on the premise that the person is normally moving, a notification may be sent when, for example, the person has not moved for a certain time or longer. The other device may also be, for example, a tablet terminal or a personal computer.

(27) The device preferably performs the voice output using character information sent from the other device via the Internet.
For example, a function of reading aloud the text of a received e-mail is preferably provided. If the device is set up so that e-mail from someone arrives at it, the content of the e-mail is output by voice from the device, and the user no longer needs to check his or her own terminal visually. The other device may be, for example, a smartphone, a tablet terminal, or a personal computer, or may be another such "device".

(28) The duration, the time of day, or the number of times of the voice output can preferably be changed according to the content of the character information.
For example, an e-mail saying "Did you take your medicine?" should be spoken at a fixed time. If, for example, the device speaks at a predetermined time when the subject line matches, or speaks important content twice with an interval in between, the user near the device can be made to act on the content of the e-mail reliably.

(29) The device preferably has a function of sending recognized character information to the other device via the Internet.
If speech is converted into text using the voice communication function of the device and sent to another device as character data, the user can, for example, send an e-mail without typing it by hand on his or her own terminal.

(30) The device preferably has a function of sending the recognized character string, as an input character string, to a plurality of different servers each provided with a dialogue engine that outputs an output character string corresponding to an input character string; receiving the character strings output as the output character strings from the different servers; and selecting the longest output character string for the dialogue.
The longest reply gives a stronger impression of real conversation and removes monotony from the dialogue, so the listener (user) can enjoy it.
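As a minimal sketch of item (30), the recognized string can be sent to several dialogue engines in parallel and the longest reply kept. Each engine is modeled here as a plain callable, which is an assumption; real servers would be reached over the network. The names `collect_replies` and `pick_longest` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def collect_replies(text: str,
                    engines: Sequence[Callable[[str], str]]) -> List[str]:
    """Query every engine with the same input; results keep the engine order."""
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        return list(pool.map(lambda engine: engine(text), engines))


def pick_longest(replies: Sequence[str]) -> str:
    """Return the longest reply (the first one on ties, empty if none)."""
    return max(replies, key=len, default="")
```

Length is measured in characters here; whether the source intends characters, words, or speaking time is not stated.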

(31) The device preferably has a function of sending the recognized character string, as an input character string, to a plurality of different servers each provided with a dialogue engine that outputs an output character string corresponding to an input character string; receiving the character strings output as the output character strings from the different servers; and selecting for the dialogue an output character string that ends with a question mark.
When a reply ends with a question mark, the conversation flows toward answering that question, so the conversation continues more easily and the listener (user) can enjoy the dialogue.

(32) The device preferably has a function of sending the recognized character string, as an input character string, to a plurality of different servers each provided with a dialogue engine that outputs an output character string corresponding to an input character string; receiving the character strings output as the output character strings from the different servers; and conducting the dialogue by combining the affirmative sentences first and then the interrogative sentences.
A response arranged in this way sounds as though it was thought through and carefully composed, so the user feels seriously listened to and wants to keep talking. The conversation therefore continues more easily and the listener (user) can enjoy the dialogue. The arrangement also lengthens the output and invites a reply from the listener (user).
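The ordering in item (32) can be sketched as follows. As a simplifying assumption, any reply that does not end with a question mark (half-width or full-width) is treated as an affirmative sentence; the source does not say how the two kinds are told apart. The function names are hypothetical.

```python
from typing import List, Sequence


def is_question(reply: str) -> bool:
    """True when the reply ends with a half-width or full-width question mark."""
    return reply.rstrip().endswith(("?", "？"))


def statements_then_questions(replies: Sequence[str]) -> str:
    """Join the replies so that statements come first and questions last."""
    statements: List[str] = [r for r in replies if not is_question(r)]
    questions: List[str] = [r for r in replies if is_question(r)]
    return " ".join(statements + questions)
```

For example, the replies "Do you like rain?", "It is sunny today." and "That sounds nice." would be spoken as "It is sunny today. That sounds nice. Do you like rain?", so the utterance ends by inviting an answer.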

(33) The device preferably has a function of sending the recognized character string, as an input character string, to a plurality of different servers each provided with a dialogue engine that outputs an output character string corresponding to an input character string; receiving the character strings output as the output character strings from the different servers; and conducting the dialogue by combining them so that a character string that changes the topic is placed last.
Arranged in this way, the change of topic invites the next utterance, and the dialogue continues more easily.

(34) The device preferably has a function of sending the recognized character string, as an input character string, to a plurality of different servers each provided with a dialogue engine that outputs an output character string corresponding to an input character string; receiving the character strings output as the output character strings from the different servers; and conducting the dialogue by combining them so that a friendly output character string is placed first.
Arranged in this way, the listener (user) is drawn into the dialogue more easily, and the dialogue continues more easily.

(35) The device preferably has a function of sending the recognized character string, as an input character string, to a plurality of different servers each provided with a dialogue engine that outputs an output character string corresponding to an input character string; receiving the character strings output as the output character strings from the different servers; and conducting the dialogue by combining them in random order.
Because this increases the variety of the dialogue, exactly the same response no longer comes back even when the listener (user) says the same thing, so the user does not tire of the dialogue and the dialogue continues more easily.

(36) The device preferably has a function of sending the recognized character string, as an input character string, to a plurality of different servers each provided with a dialogue engine that outputs an output character string corresponding to an input character string; receiving the character strings output as the output character strings from the different servers; and, when any of them is an output character string containing an emoticon, excluding that string from the spoken dialogue while displaying it on the display unit together with the output character strings that are used for the dialogue.
Emoticons cannot be output as speech. Deliberately showing them on the display unit, however, makes them part of the dialogue alongside the speech and creates an enjoyment not found in ordinary dialogue.
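The split between spoken and displayed replies in item (36) can be sketched as below. How emoticons are detected is not described in the source; this sketch assumes a small fixed list of emoticons, `EMOTICONS`, purely for illustration, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical sample list; a real device would need a fuller detector.
EMOTICONS = ("(^_^)", "(>_<)", "(T_T)", ":-)", ":)")


def has_emoticon(reply: str) -> bool:
    """True when the reply contains one of the known emoticons."""
    return any(e in reply for e in EMOTICONS)


def split_for_speech_and_display(
        replies: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Return (replies to speak, replies to show on the display unit)."""
    spoken = [r for r in replies if not has_emoticon(r)]
    displayed = list(replies)   # emoticon replies are shown with the spoken ones
    return spoken, displayed
```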

(37) The device preferably has a function of sending the recognized character string, as an input character string, to a plurality of different servers each provided with a dialogue engine that outputs an output character string corresponding to an input character string; receiving the character strings output as the output character strings from the different servers; and, among output character strings that contain the same character string, selecting only one and combining it with the other output character strings for the dialogue.
This is because repeating the same character string makes the dialogue tedious and strikes the listener as odd.
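One possible reading of item (37) is sketched below: a reply is dropped when, after normalization, it contains or is contained in a reply that has already been kept. The source does not define "contain the same character string" more precisely, so this rule, the normalization (lower-casing and removing whitespace), and the function names are all assumptions.

```python
from typing import List, Sequence


def normalize(reply: str) -> str:
    """Lower-case the reply and remove all whitespace."""
    return "".join(reply.lower().split())


def drop_overlapping(replies: Sequence[str]) -> List[str]:
    """Keep the first of any group of replies that repeat the same text."""
    kept: List[str] = []
    for reply in replies:
        n = normalize(reply)
        if not n:
            continue                      # ignore empty replies
        if any(n in normalize(k) or normalize(k) in n for k in kept):
            continue                      # same text already selected
        kept.append(reply)
    return kept
```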

(38) The device preferably has a function of sending the recognized character string, as an input character string, to a plurality of different servers each provided with a dialogue engine that outputs an output character string corresponding to an input character string; receiving the character strings output as the output character strings from the different servers; and converting the sentence endings of the output character strings with a sentence-ending conversion engine before combining them for the dialogue.
This is preferable because the result is a friendlier expression than the text of an ordinary dialogue engine.

(39) The device preferably has a function of sending the recognized character string, as an input character string, to a plurality of different servers each provided with a dialogue engine that outputs an output character string corresponding to an input character string; receiving the character strings output as the output character strings from the different servers; and, rather than using all of the output character strings, storing some of them in a storage means and retrieving them from the storage means for use in later dialogue.
Using the stored strings when speech recognition has failed, or when a response from the external server is slow to arrive, keeps the dialogue going without a break and contributes to a natural dialogue.
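The reserve described in item (39) can be sketched with a bounded queue. The class name `ReplyReserve`, the capacity of 20, the choice to speak the first reply and store the rest, and the default filler are assumptions made for illustration.

```python
from collections import deque
from typing import Optional, Sequence


class ReplyReserve:
    """Keep unused replies for moments when no fresh reply is available."""

    def __init__(self, maxlen: int = 20):
        self._items = deque(maxlen=maxlen)   # oldest entries drop out when full

    def use_one_store_rest(self, replies: Sequence[str]) -> Optional[str]:
        """Return the first reply for immediate use and store the others."""
        if not replies:
            return None
        first, *rest = replies
        self._items.extend(rest)
        return first

    def take(self, default: str = "Uh-huh") -> str:
        """Fetch a stored reply, e.g. after a recognition failure or timeout."""
        return self._items.popleft() if self._items else default
```

A caller would use `take()` when recognition fails or the external server is slow, so that the device still has something to say.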

(40) When servers provided with speech recognition engines are used, free servers and paid servers are preferably used in combination.
This allows, for example, heavy users of dialogue with the device in particular to save on server connection charges.

(41) The other device is a smart speaker, and the device preferably communicates with the smart speaker by outputting speech to the smart speaker.
Because a smart speaker has no display unit, using it in combination with the device makes it more convenient.
A smart speaker is, for example, a speaker that has a wireless communication connection function and a voice-operated assistant function, such as GoogleHome, AmazonEcho, or LINE Clova. A smart speaker preferably has functions for realizing various capabilities (skills). Such a function can be executed by having the device, which communicates by voice, speak to the smart speaker. When the device is to speak, a preferable configuration is one in which, for example, the user speaks the phrase that activates a skill of the smart speaker, the device recognizes that utterance and stores it as character string data, and at a certain timing the device synthesizes speech from the character string data and speaks to the smart speaker to execute the skill.

(42) The other device is a smart speaker, and the device preferably communicates with the smart speaker by outputting speech to the smart speaker.
The device activates the smart speaker, activates a skill, or makes an inquiry. A skill of the smart speaker can be executed automatically at a fixed timing or at a scheduled time without the user activating the smart speaker personally.

(43) The other device is a smart speaker, and the device preferably has an adaptation function that adapts the voice output of the smart speaker.
This is convenient when listening to the smart speaker's fixed, long answer to a question is tedious, or when the user wants to review the content quickly.

(44) The voice output of the device preferably includes a function of reading Web articles aloud.
The user can hear a Web article simply through dialogue with the device, without reading it.

(45) The certain output is preferably made at the time when execution of a predetermined processing unit of robotic process automation is completed.
With robotic process automation it is hard to tell how far the processing units have been executed. Having the device produce a "certain output" according to the execution of a processing unit makes the processing status easier to follow and increases convenience.

(46) The certain output may be a notification operation.
The notification operation lets the user know that a certain output has been made.
(47) A notification operation is performed when a computer executing robotic process automation enters a state of waiting for input from the user.
This informs the user that the computer is waiting for input and prompts the next step.

(48) The system may include a client computer that performs robotic process automation and a server computer that gives the client computer instructions to execute robotic process automation, and a notification operation may be performed when the client computer receives an instruction from the server computer.
This informs the user that an instruction has arrived from the server computer and prompts the next step.

(49) The output information may be generated so as to perform an operation of pointing in the direction of the computer that is executing robotic process automation.
This lets the user know on which computer the execution took place and prompts the user toward the next step.
(50) Different output information may be generated depending on the execution state of robotic process automation.
This makes it possible to distinguish what kind of execution has taken place.
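Items (45) to (50) amount to mapping the execution state of robotic process automation to different output information, optionally with a pointing direction. The following is a minimal sketch under stated assumptions: the state names, the message texts, and the `point_at_deg` field are all hypothetical, since the specification only says that the outputs differ by state.

```python
from enum import Enum


class RpaState(Enum):
    UNIT_COMPLETED = "unit_completed"          # a processing unit finished, as in (45)
    WAITING_FOR_INPUT = "waiting_for_input"    # computer awaits user input, as in (47)
    SERVER_INSTRUCTION = "server_instruction"  # server issued an instruction, as in (48)


# Hypothetical notification texts, one per state, so that states can be told apart (50).
NOTIFICATIONS = {
    RpaState.UNIT_COMPLETED: "A processing step has finished.",
    RpaState.WAITING_FOR_INPUT: "The computer is waiting for your input.",
    RpaState.SERVER_INSTRUCTION: "The server has sent an instruction.",
}


def build_output(state, computer_bearing_deg=None):
    """Return output information for the device as a plain dictionary.

    `computer_bearing_deg` is an assumed bearing of the computer running the
    automation; when given, the device is asked to point that way, as in (49).
    """
    output = {"speech": NOTIFICATIONS[state]}
    if computer_bearing_deg is not None:
        output["point_at_deg"] = computer_bearing_deg % 360
    return output
```

Keeping the mapping in a table makes it easy to give each execution state its own notification, which is the distinction (50) asks for.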

(51) A program for causing a computer to realize the functions of the device according to any one of (1) to (50).
Wherever the word "certain" appears, as in "a certain output", it may be read as, for example, "predetermined".
The inventions set out in (1) to (50) above may be combined in any way. For example, at least part of the configuration of at least one of the inventions from (2) onward may be added to all or part of the configuration of the invention set out in (1). In particular, an invention in which at least part of the configuration of at least one of the inventions from (2) onward is added to the invention set out in (1) is preferable. Any configurations may also be extracted from the inventions set out in (1) to (50) and combined. The applicant intends to acquire rights to inventions including these configurations. Where the text says "in the case of" or "when", the configuration is not described as being limited to that case or time; configurations outside those cases or times are also disclosed, and the applicant intends to acquire rights to them. Likewise, passages that describe an order are not limited to that order; configurations with some parts deleted or with the order changed are also disclosed, and the applicant intends to acquire rights to them.

When communicating with a user or another device, the device can generate output information suited to that communication. This makes it more convenient for the user or the other device to communicate with the device.
The effects of the invention of the present application are not limited to these. Effects produced by portions of the configurations disclosed in this specification, the drawings, and the like are also disclosed, and the applicant intends to acquire rights to configurations producing such effects through divisional applications, amendments, and the like. For example, passages in this specification stating that something "can" be done are explicit statements of effects, and there are also passages that indicate effects without using the word "can". Even where no such statement appears, there are effects that can be understood from the configuration itself.

FIG. 1 is a front view of the robot of Embodiment 1 of the present invention.
FIG. 2 is a side view of the robot of Embodiment 1.
FIG. 3 is a rear view of the robot of Embodiment 1.
FIG. 4 is a block diagram illustrating the electrical configuration of the robot.
FIG. 5 is an explanatory diagram showing an example of an expression on the face screen displayed on the face portion of the robot.
FIG. 6 is an explanatory diagram showing an example of a chat state on the chat screen displayed on the face portion of the robot.
FIG. 7 is an explanatory diagram of the standby image that has the face screen displayed on the face portion of the robot as its background.
FIG. 8 is an explanatory diagram of the standby image that has the chat screen displayed on the face portion of the robot as its background.
FIGS. 9(a) to 9(d) are explanatory diagrams of the deformation patterns of the eye objects displayed on the face portion of the robot.
FIGS. 10(a) to 10(c) are explanatory diagrams showing how the content of the user's utterance gradually appears as a character string on the face portion of the robot.
FIG. 11 is an explanatory diagram showing the eye objects moving to follow the user's face on the face screen displayed on the face portion of the robot.
FIG. 12 is an explanatory diagram of an example of a smartphone.
FIG. 13 is an explanatory diagram of the relationships among startup of the robot, wake-up mode, dialogue mode, and sleep mode.
FIG. 14 is an explanatory diagram of the relationship between the robot and a smart speaker in Embodiment 7.
FIG. 15 is an explanatory diagram of an example of a smartphone in Embodiment 9.

Embodiment 1
As shown in FIGS. 1 to 3, the robot 1, a communication robot that operates in response to human voice, has as its housing a fixed portion 2 forming the lower body and a movable portion 3 forming the upper body, which rests on the fixed portion 2. The movable portion 3 consists of a body 4 arranged adjacent to the fixed portion 2 and a head 5 supported by the body 4. The fixed portion 2 has a bowl-like appearance that opens upward, and the body 4 is a cylindrical shell whose curve continues vertically from the upper edge of the fixed portion 2. The robot 1 is widest at the junction between the fixed portion 2 and the body 4 and tapers both upward and downward from that junction. The front of the cylindrical body 4 has a large semicircular cutout. The head 5 is fitted so as to be embedded in the upper part of the body 4. The body 4 rotates horizontally relative to the fixed portion 2 (arrow in FIG. 1), and the head 5 rotates relative to the body 4 in two directions: vertically (arrow in FIG. 2) and in the left-right rotational direction (arrow in FIG. 3).

The head 5 as a whole has the shape of a spherical segment, that is, what remains of a sphere after part of it (the front) is cut off by a single plane. The cut front appears circular and forms the face portion 6 of the robot 1. A rectangular area on the surface of the face portion 6 is the touch panel unit 7, a liquid crystal display (LCD) with a touch panel function that serves as the display unit. What is displayed on the touch panel unit 7 is described later. A smoked panel covers the area of the face portion 6 around the touch panel unit 7, so that the whole face portion 6 presents a uniform dark background. Inside the head 5, an illuminance sensor 8 and a high-brightness white LED 9 are housed at the upper left and upper right positions of the face portion 6. The lens 11 of the face recognition camera 10 is placed on the face portion 6 at the center above the touch panel unit 7.

Inside the body 4, microphones 12 are arranged at three positions at the same height, offset from one another by 120 degrees: toward the front left, toward the front right, and at the rear center. Inside the fixed portion 2, a pair of left and right speaker devices 13 is arranged. Beside each speaker device 13, an opening 14 is formed for emitting the sound produced by the speaker device 13. Adjacent to the speaker openings 14 are a power switch 15, and an up switch 16 and a down switch 17 for adjusting the volume of the speaker devices 13. At the rear of the fixed portion 2 are a USB OTG (On-The-Go) terminal 18, a DC 12 V power jack 20, and a microSD card socket (reader) 19.

The robot 1 can connect to a predetermined external cloud server over an Internet connection. The cloud server provides a storage area serving as storage means for data the robot 1 needs, various engines for performing processing the robot 1 needs, and so on. In a broad sense, therefore, the robot 1 can be understood as a device that includes the software and other portions of this cloud server.

Next, the electrical configuration of the robot 1 of Embodiment 1 is described with reference to the block diagram of FIG. 4.
Connected to the controller MC, which serves as control means, are the touch panel unit 7, illuminance sensor 8, high-brightness white LED 9, face recognition camera 10, microphones 12, speaker devices 13, terminal 18, and microSD card socket 19 described above, as well as a wireless LAN device 21, a Doppler sensor 22, first to third motors 23 to 25, and the like.

The touch panel unit 7 has an input operation function whereby input is made by touching its surface. In the natural dialogue mode described later, the touch panel unit 7 can display different screens such as those of FIG. 5 and FIG. 6. As a first screen, the controller MC displays on the touch panel unit 7 the face screen S1 of FIG. 5, which governs the expression of the robot 1, in particular changes around the eyes, so that the expression can change. The face screen S1 is the screen displayed by default and shows the eye objects 27, cheek objects 28, and an elliptical region 29 of the robot 1. The eye objects 27 have several deformation patterns as animated images (FIGS. 9(a) to 9(d)). In addition, the pupil objects 27a move left and right as an animation.

As a second screen, the controller MC displays on the touch panel unit 7 the chat screen S2 of FIG. 6. Touching and sliding on the touch panel unit 7 while the face screen S1 is shown displays the chat screen S2 in place of the face screen S1. The slide operation switches the display back and forth between the face screen S1 and the chat screen S2. The chat screen S2 is described later.
In standby mode (wake-up mode), the touch panel unit 7 can display the standby image of FIG. 7 or FIG. 8. The standby images are described later.
The robot 1 also provides a settings screen, separate from these, which can be reached from the chat screen S2. When the robot 1 is started for the first time, the user opens the settings screen and enters the required initial settings, for example: registering the ID and password with the server, registering the Wi-Fi password, user registration (e.g., name, age, gender), face authentication, and setting the e-mail address to which data is to be forwarded.

The illuminance sensor 8 detects the brightness of the environment in which the robot 1 is installed. Based on the value detected by the illuminance sensor 8, the high-brightness white LED 9 is turned on automatically when there is too little light for the face recognition camera 10 to capture an image.
The microphones 12 are voice input means for acquiring the user's utterances in dialogue with the user. At the same time, because the three microphones 12 are placed at the vertices of a triangle and used simultaneously, they also serve as direction detection means that can identify the direction of a sound source from the differences in the arrival time of the sound among them. The controller MC obtains the arrival time differences from the phase differences of the electrical signals acquired by the microphones 12, and calculates the angle of the sound source relative to a reference direction from those differences. A microphone dedicated to dialogue with the user may additionally be provided, for example, on the face portion 6.
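One way to picture the angle calculation is the following sketch. It is not the method of the specification, which does not give a formula. It assumes a distant source (plane-wave model), three microphones 120 degrees apart on a circle of assumed radius, and arrival time differences that have already been extracted from the phase differences. All function names, the radius, and the speed of sound value are illustrative assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def mic_positions(radius):
    """Three microphones 120 degrees apart: front-left, front-right, rear center.

    Angles are measured counterclockwise from the reference (front) direction.
    """
    return [
        (radius * math.cos(math.radians(a)), radius * math.sin(math.radians(a)))
        for a in (60.0, 300.0, 180.0)
    ]


def simulate_delays(angle_deg, positions, c=SPEED_OF_SOUND):
    """Arrival time differences (t1 - t0, t2 - t0) for a distant source at angle_deg."""
    ux, uy = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    # A microphone lying further toward the source hears the wavefront earlier.
    t = [-(x * ux + y * uy) / c for x, y in positions]
    return (t[1] - t[0], t[2] - t[0])


def source_angle(delays, positions, c=SPEED_OF_SOUND):
    """Estimate the source bearing in degrees, in [0, 360), from two time differences."""
    (x0, y0), (x1, y1), (x2, y2) = positions
    # Plane-wave model: t_i - t_0 = -((p_i - p_0) . u) / c, solved for u by Cramer's rule.
    a11, a12 = x1 - x0, y1 - y0
    a21, a22 = x2 - x0, y2 - y0
    b1, b2 = -c * delays[0], -c * delays[1]
    det = a11 * a22 - a12 * a21
    ux = (b1 * a22 - a12 * b2) / det
    uy = (a11 * b2 - a21 * b1) / det
    return math.degrees(math.atan2(uy, ux)) % 360.0
```

Because the three microphones are not on one line, the two time differences determine the direction uniquely in the horizontal plane. The resulting angle is the kind of value the controller MC could use to turn the face portion 6 toward the speaker.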
The speaker devices 13 are voice output means through which the robot 1 speaks (outputs voice) in dialogue with the user.
The microSD card socket 19 reads and writes data on an inserted microSD card.
The wireless LAN device 21 is a device for wirelessly connecting the robot 1, which is a Wi-Fi compatible device, to the Internet. In this embodiment, it conforms to the IEEE 802.11b international standard.
The Doppler sensor 22 is a microwave sensor. It emits microwaves, compares the frequency of the reflected microwaves with that of the emitted waves, and detects whether an object (person) is moving, making use of the fact that the Doppler effect changes the frequency of the reflected wave when the object (person) is moving. It is used to detect abnormalities around the robot 1, for example, the presence of a suspicious person while the user is away.
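As a point of reference, the frequency change that such a sensor looks for follows the standard Doppler relation for a sensor whose transmitter and receiver are at the same place. The symbols are not in the original text: f_t is the emitted frequency, f_r the received frequency, v_r the speed of the object toward the sensor, and c the speed of light. The 24 GHz and 1 m/s figures in the second line are assumed values chosen only for illustration.

```latex
\begin{align}
f_d &= f_r - f_t \approx \frac{2 v_r}{c}\, f_t \qquad (v_r \ll c) \\
f_d &\approx \frac{2 \times 1\ \mathrm{m/s}}{3 \times 10^{8}\ \mathrm{m/s}} \times 24 \times 10^{9}\ \mathrm{Hz} = 160\ \mathrm{Hz}
\end{align}
```

A stationary object gives f_d = 0, so a nonzero shift between the emitted and reflected frequencies indicates movement.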
The first motor 23 is a servomotor that rotates the body 4 horizontally (arrow in FIG. 1) relative to the fixed portion 2. The second motor 24 is a servomotor that rotates the head 5 vertically (arrow in FIG. 2) relative to the body 4. The third motor 25 is a servomotor that rotates the head 5 in the left-right rotational direction (arrow in FIG. 3) relative to the body 4. When the direction from which the user is speaking has been determined using the microphones 12, the controller MC controls the first motor 23 to rotate the body 4 (movable portion 3) relative to the fixed portion 2 so that the face portion 6 squarely faces the user.

The controller MC comprises a well-known CPU, memory serving as storage means such as ROM, RAM, and an SSD, a bus, a real-time clock (RTC), and the like. Various programs for executing the functions of the robot 1 are stored in the ROM of the controller MC.
These programs include, for example: a dialogue program for controlling dialogue with the user via the microphones 12 and the speaker devices 13; a face recognition program for face recognition using the face recognition camera 10; a display variation and gesture program that controls the touch panel unit 7 and the first to third motors 23 to 25 to change the expression and motion of the robot 1 during dialogue; a screen display program that displays different screens and images on the touch panel unit 7 according to dialogue with the user and operation of the touch panel unit 7; a data transmission/reception program that handles data exchanged with other computers and smartphones, such as camera images acquired by the robot 1 and e-mails from a smartphone; an away-mode program for watching over the premises while the user is absent; and an OS for operation and management, including GUI functions, network connection functions, and process management. The RAM temporarily stores input/output data for dialogue and face recognition. Each program realizes functions such as dialogue, face recognition, and gestures by multitasking, either in cooperation with other programs or independently.

A. Operation during dialogue
In the configuration described above, the controller MC controls communication through dialogue with the user by executing the dialogue program. Face recognition, which becomes available in synchronization with dialogue becoming available, is described later under "B. Operation during face recognition".
Here, the dialogue program includes, among others:
1) a subprogram that issues a request to the cloud server with the user's utterance data (voice data) acquired from the microphones 12 and receives in response the user's utterance data converted to text (character string data) by the server-side speech recognition engine;
2) a built-in scenario subprogram that runs a built-in scenario dialogue based on the user's utterance (character string data);
3) an utterance data transfer subprogram that, when the user's utterance does not match a built-in scenario, issues another request to the cloud server with the utterance data (character string data) and has the dialogue engine create the reply data (character string data) of the robot 1 through a dialogue API (application programming interface);
4) a character string data display subprogram that displays the returned reply data (character string data) on the touch panel unit 7 serving as the display unit;
5) a voice data subprogram that converts the returned reply data (character string data) into voice data using a speech synthesis engine and outputs it from the speaker devices 13 as the utterance of the robot 1; and
6) a display mode and operation variation subprogram that varies the display on the touch panel unit 7 and the motion of the robot 1 based on the user-side and robot-side character string data.
An example of the control performed by the controller MC, mainly based on the dialogue program, is described below together with the relationships among the standby mode (wake-up mode) after startup, the natural dialogue mode, and the sleep mode. These relationships are as shown in FIG. 13.
FIGS. 5 and 6 show screens in natural dialogue mode, and FIGS. 7 and 8 show screens in standby mode. In sleep mode, these screens appear dark because the backlight of the touch panel unit 7 is turned off.
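The flow through subprograms 1) to 5) of the dialogue program can be sketched as a single function. This is only an illustration of the control flow: every callable passed in (`recognize`, `dialogue_engine`, `display`, `synthesize_and_play`) and the dictionary of built-in scenarios are hypothetical stand-ins for the cloud engines and hardware, whose real interfaces the specification does not give.

```python
def handle_utterance(audio, recognize, builtin_scenarios, dialogue_engine,
                     display, synthesize_and_play):
    """Process one user utterance and return (recognized_text, reply_text)."""
    # 1) Send the voice data to the speech recognition engine and get text back.
    text = recognize(audio)
    # 2) Try to answer from a built-in scenario first.
    reply = builtin_scenarios.get(text)
    if reply is None:
        # 3) No built-in scenario matched: ask the dialogue engine for a reply.
        reply = dialogue_engine(text)
    # 4) Show the reply as a character string on the display unit.
    display(reply)
    # 5) Convert the reply to speech and play it as the robot's utterance.
    synthesize_and_play(reply)
    return text, reply
```

Passing the engines in as arguments keeps the sketch independent of any particular server API and makes the fallback from built-in scenario to dialogue engine easy to see.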

1. Startup
The robot 1 is started by turning on the power switch 15 (process M0 in FIG. 13). The controller MC executes the boot program and then starts the OS, whereupon the OS waits for a command from the user, that is, enters the wake-up state. In this initial standby mode, the standby screen of FIG. 7 is displayed.
The description below assumes startup after the initial settings have been completed, that is, after the ID and password of the robot 1 have been registered with the cloud server, user registration has been completed, the faces of multiple users have been registered for face authentication, and the e-mail address of the smartphone has been registered in the robot 1.
[1. Effects at startup]
Thus, at startup, one screen (FIG. 7) selected from the multiple standby screens is displayed first. In other words, the same fixed screen always appears at startup. Because the eyes of the robot 1 are closed (suggesting that dialogue is not possible), the user can easily tell that the robot 1 is in standby mode.

2. Standby mode (wake-up mode)
Standby mode is the state in which dialogue with the robot 1 becomes possible once a trigger for starting natural dialogue mode occurs. It is also the state entered when a dialogue ends in natural dialogue mode, and the state from which the robot 1 falls into sleep mode if natural dialogue mode is not started within a fixed time. Sleep mode is a state in which the user cannot communicate with the robot 1 by dialogue.
Here, "natural dialogue" means that the user converses with the synthesized voice of the device (robot 1), using the built-in scenarios of the robot 1 or the dialogue engine (dialogue software) on the server. Natural dialogue mode is the state in which natural dialogue is possible.
There are multiple standby-mode screens. In this embodiment, the two shown in FIGS. 7 and 8 are provided.
FIG. 7 is the standby screen entered from the natural-dialogue-mode screen of FIG. 5 (process M2 in FIG. 13). It is also the standby screen displayed at startup by process M0 in FIG. 13. In FIG. 7, a clock layer showing the date, the day of the week, and the current time in large type is displayed, with the face screen S1 layer, in which the eyes of the robot 1 (eye objects 27) are closed, shown faintly in the background.
FIG. 8 is the standby screen entered from the natural-dialogue-mode screen of FIG. 6 (process M2 in FIG. 13). In FIG. 8, the chat screen S2 layer is shown faintly in the background of the clock layer showing the date, the day of the week, and the current time in large type. In other words, the robot 1 is in standby mode, not in natural dialogue mode.
[2. Effects of standby mode (wake-up mode)]
Because different standby screens are provided in this way, when natural dialogue mode is started from a given standby screen, the user can continue the dialogue on the screen that was in use immediately before, which is convenient. In addition, displaying a screen specific to standby mode lets the user easily tell that the robot 1 is in standby mode.
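The relationships among the modes described in this section can be summarized as a small transition table. The sketch below covers only the transitions stated above (a trigger starts natural dialogue from standby, the end of a dialogue returns to standby, and a timeout in standby leads to sleep). The enum and event names are hypothetical, and the ways of leaving sleep mode are deliberately left out because they are described separately.

```python
from enum import Enum


class Mode(Enum):
    STANDBY = "standby"                    # wake-up mode
    NATURAL_DIALOGUE = "natural_dialogue"
    SLEEP = "sleep"


class Event(Enum):
    TRIGGER = "trigger"                # a trigger for starting natural dialogue
    DIALOGUE_ENDED = "dialogue_ended"  # the dialogue has finished
    TIMEOUT = "timeout"                # no dialogue started within the fixed time


# Only the transitions described in this section are listed.
TRANSITIONS = {
    (Mode.STANDBY, Event.TRIGGER): Mode.NATURAL_DIALOGUE,
    (Mode.NATURAL_DIALOGUE, Event.DIALOGUE_ENDED): Mode.STANDBY,
    (Mode.STANDBY, Event.TIMEOUT): Mode.SLEEP,
}


def next_mode(mode, event):
    """Return the mode after `event`; unknown combinations leave the mode unchanged."""
    return TRANSITIONS.get((mode, event), mode)
```

A table-driven design keeps each transition on one line, so further transitions can be added as entries without changing `next_mode`.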

3. Starting and stopping natural dialogue mode
From the standby mode or sleep mode that follows startup, the controller MC shifts to natural dialogue mode at multiple timings, that is, on mode-transition triggers such as the following (processes M1 and M5 in FIG. 13). The triggers below are examples. In natural dialogue mode, the robot 1 also switches to the face recognition mode described under "B. Operation during face recognition" below (face recognition becomes possible).

1-1) In standby mode, when the controller MC recognizes through the microphones 12, within a fixed time, an utterance (voice) of the wake-up phrase, for example 「ねえ、ユピ坊」 ("Hey, Yupi-bo"), it uses this as the trigger to enter natural dialogue mode (process M1 in FIG. 13). It likewise enters natural dialogue mode when it determines that the touch panel unit 7 has been touched (process M1 in FIG. 13).
1-2) In sleep mode, the controller MC determines at a predetermined timing whether the touch panel unit 7 showing the standby screen has been touched. The required touch operation depends on what is displayed on the face portion 6: on the standby screen based on the face screen S1, a touch anywhere on the touch panel unit 7 is accepted, whereas on the standby screen based on the chat screen S2, a touch on the dialogue start button object 36 (described later) is required. That is, different screens are started by different operations.
When it determines that the touch panel unit 7 has been touched, the controller MC first enters standby mode (process M3 in FIG. 13), and when it then determines that another touch has occurred, it enters natural dialogue mode (process M1 in FIG. 13).
1-3) In sleep mode, the controller MC determines whether the touch panel unit 7 has been touched, as in 1-2). When it determines that a touch has occurred, the controller MC first enters standby mode (process M3 in FIG. 13). If, in this state, it recognizes through the microphones 12 an utterance (voice) of the wake-up phrase, for example 「ねえ、ユピ坊」 ("Hey, Yupi-bo"), within a fixed time, the controller MC uses this as the trigger to enter natural dialogue mode (process M1 in FIG. 13).
2) In sleep mode, the controller MC uses the RTC to determine whether a preset time has arrived, and enters natural dialogue mode at that time (process M5 in FIG. 13).
3) In sleep mode, the controller MC outputs a certain utterance (voice data) from the speaker devices 13 at random time intervals, determined, for example, by random numbers generated at a predetermined timing. That is, as a kind of soliloquy, the robot 1 outputs a voice that invites dialogue, such as "Hey, what are you doing?" or "I'm bored", enters natural dialogue mode (process M5 in FIG. 13), and prompts the user to speak.
4) In sleep mode, the controller MC uses the Doppler sensor 22 to determine whether an object (person) is moving, and enters natural dialogue mode at the timing when movement is detected (process M5 in FIG. 13).

5) In the sleep mode, when the controller MC learns of abnormal weather or a similar event such as an earthquake, it notifies the user and uses this as a trigger to start a natural dialogue. An external cloud server uses an abnormal-weather detection engine to acquire and store, at fixed times and according to fixed criteria, information on abnormal conditions including, for example, abnormal weather (such as heavy snow or typhoons), earthquakes, and lightning strikes. The fixed times may be the same for all kinds of information, or the acquisition timing may vary with the kind of event. In Embodiment 1, this information is delivered from the server to the device (controller MC) by, for example, a push-type delivery system. On acquiring the information, the controller MC enters the natural dialogue mode (process M5 in FIG. 13).
6) If, after entering the natural dialogue mode by any of 1) to 5) above, the controller MC detects no utterance from the user for a certain time, it enters the standby mode (process M2 in FIG. 13), and after a further certain time it enters the sleep mode (process M4 in FIG. 13). The lengths of these mode transition times can be changed as appropriate, for example from a terminal device or by an utterance handled as a built-in scenario.
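The mode transitions listed in 1) to 6) can be summarized as a small transition table. The following is a minimal sketch only: the event names are hypothetical labels chosen for illustration, and the M1 to M5 comments refer to the processes of FIG. 13 as cited in the text.

```python
from enum import Enum


class Mode(Enum):
    SLEEP = "sleep"
    STANDBY = "standby"
    DIALOGUE = "natural dialogue"


# (current mode, event) -> next mode. Event names are illustrative assumptions.
TRANSITIONS = {
    (Mode.STANDBY, "wake_phrase"): Mode.DIALOGUE,    # M1: start-up phrase recognized
    (Mode.STANDBY, "touch"): Mode.DIALOGUE,          # M1: touch on the touch panel
    (Mode.SLEEP, "touch"): Mode.STANDBY,             # M3: first touch wakes to standby
    (Mode.SLEEP, "scheduled_time"): Mode.DIALOGUE,   # M5: preset RTC time reached
    (Mode.SLEEP, "soliloquy"): Mode.DIALOGUE,        # M5: robot speaks at a random interval
    (Mode.SLEEP, "motion"): Mode.DIALOGUE,           # M5: Doppler sensor detects movement
    (Mode.SLEEP, "weather_alert"): Mode.DIALOGUE,    # M5: pushed abnormal-weather information
    (Mode.DIALOGUE, "timeout"): Mode.STANDBY,        # M2: no utterance for a certain time
    (Mode.STANDBY, "timeout"): Mode.SLEEP,           # M4: further time elapsed
}


def next_mode(mode, event):
    """Return the mode after `event`; unknown events leave the mode unchanged."""
    return TRANSITIONS.get((mode, event), mode)
```

For example, two successive touches starting from the sleep mode pass through the standby mode before reaching the natural dialogue mode, as in 1-2).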

"3. Effects of starting and stopping the natural dialogue mode"
Providing many different ways to start the natural dialogue mode lets the user talk with the robot 1 at a variety of times. This increases the opportunities for dialogue with the robot 1, and with them the opportunities to enjoy that dialogue, so that the user feels the benefit of owning the robot 1. In addition, when a dialogue ends, the robot first enters the standby mode and then the sleep mode, which reduces power costs.
Moreover, because the robot can go from the sleep mode straight to the natural dialogue mode screen, skipping the standby mode, dialogue can begin immediately and starts smoothly. And because the dialogue screen (FIG. 5 or FIG. 7) remains displayed as long as the dialogue continues, the user is encouraged to keep talking.

4. Built-in scenario dialogue in the natural dialogue mode
In the natural dialogue mode, more than one kind of dialogue processing is provided: dialogue using built-in scenarios, and normal dialogue using the dialogue engine on the server.
The controller MC first determines whether the utterance data (character string data) based on the user's utterance matches a built-in scenario. If it does not, the controller MC switches to dialogue that uses the dialogue engine via the cloud server (hereinafter, "normal dialogue"). To the user it appears to be one continuous dialogue with the robot 1, but internally the natural dialogue mode involves several kinds of processing.
The controller MC compares the user's utterance (character string data), produced by the speech recognition engine on the cloud server, with the text data of the built-in scenarios (scripts). In this embodiment the text data of the built-in scenarios is stored in memory. Built-in scenarios may also be added on an SD card; with an SD card, further built-in scenarios can easily be added by rewriting the card.
When the controller MC recognizes the user's utterance, it determines whether the character string data matches an expected regular expression or non-regular expression. If it matches, the script corresponding to that character string data is converted into voice data by the speech synthesis engine and output from the speaker device 13 as an utterance of the robot 1.
Many built-in scenarios are provided, ranging from simple ones, such as greetings like "Hello" or "Nice weather today, isn't it?" intended to prompt the user to speak, to scenarios that request some processing based on the user's utterance. Tables 1 to 3 show examples of such built-in scenarios. Of course, many built-in scenarios other than these are provided in practice.

If the dialogue does not proceed according to a built-in scenario, the built-in scenario dialogue ends partway. This happens, for example, in the following cases.
1) The utterance does not match the expected regular expression or non-regular expression
This is the case where the utterance fails to match the regular expression or non-regular expression, either from the start of the built-in scenario or from partway through it. It also includes the case where the utterance could not be captured correctly because the user's articulation was poor. In this case the controller MC judges the dialogue to be a normal dialogue and immediately connects to the external cloud server. From then on it sends requests containing the utterance data to the external cloud server and has the dialogue engine on the server generate reply data as character string data. The reply data is converted into voice data by the speech synthesis engine and output from the speaker device 13, and the dialogue continues.

2) The built-in scenario dialogue ends as planned
For example, the robot utters a scenario line such as "May I do XXX?", the user gives an affirmative reply such as "Yes" or "Please", and the built-in scenario finishes as planned so that the dialogue ceases; or the dialogue ceases partway through the scenario. In these cases the robot enters the standby mode after a certain time.
3) The user's utterance is negative as to whether some processing may proceed
The built-in scenario also ends when, in response to a scenario utterance such as "May I do XXX?", the user gives not an affirmative reply such as "Yes" or "Please" but a negative one such as "No" or "That was a mistake". The subsequent dialogue is the same as in 1) or 2).
On such a negative utterance, the controller MC confirms whether the processing may really be abandoned, for example by asking "Are you sure?". This makes it possible to deal with slips of the tongue and changes of mind. For example, in the power-off scenario, when the robot asks "Are you sure you don't want to turn the power off?" according to the scenario and the user answers "Yes", the same question is repeated several times (for example, three times in this embodiment), and if the answer is still "Yes" the built-in scenario dialogue ends.
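The match-first, fall-back-second flow described in this section can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: the two scenarios, their patterns, and the `cloud_dialogue` callable are hypothetical stand-ins for the stored scripts and for the request to the dialogue engine on the cloud server.

```python
import re

# Hypothetical built-in scenarios: (compiled pattern, reply script).
BUILT_IN_SCENARIOS = [
    (re.compile(r"^(hello|hi)\b", re.IGNORECASE), "Hello! Nice weather today, isn't it?"),
    (re.compile(r"\bturn off\b", re.IGNORECASE), "May I turn the power off?"),
]


def respond(utterance, cloud_dialogue):
    """Return (kind, reply) for recognized text.

    Built-in scenarios are tried first, on the device. Only when none matches
    is `cloud_dialogue` (an assumed callable standing in for the server-side
    dialogue engine) asked for a reply.
    """
    for pattern, script in BUILT_IN_SCENARIOS:
        if pattern.search(utterance):
            return "built-in", script
    return "normal", cloud_dialogue(utterance)
```

Because the cloud callable is invoked only on a miss, utterances covered by a built-in scenario need no dialogue-engine round trip.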

"Effects of using built-in scenarios"
With built-in scenarios provided in this way, not every dialogue has to be requested from the external server; matching utterances can be handled inside the device. This reduces the communication cost of connecting to the server. Because no communication time or server-side computation time is needed, the user is also spared the unnatural feeling of a conversation that stalls because the reply arrives too late. In addition, when a fixed process is to be executed, providing such a built-in scenario means the user does not have to operate the touch panel unit 7 or access the robot 1 from another terminal; the process can be executed through dialogue, which is user-friendly.

5. Requests and responses in normal dialogue
When the utterance data (character string data) does not match a built-in scenario, the controller MC connects to the cloud server, sends the utterance data (character string data) to the server again, and requests that the dialogue engine generate reply data. If the user has been authenticated, the controller MC places per-user authentication information (for example, an ID and password) at the beginning of the utterance data in the request. In that case the reply data is generated with past dialogue information taken into account. If the user has not been authenticated, the user is treated as a person not previously identified and no authentication information is sent, so past dialogue information is not taken into account. On receiving a request, the cloud server uses the dialogue API (Application Programming Interface) to have the dialogue engine generate reply data (character string data) from the utterance data (character string data), and returns it to the robot 1 (controller MC) as the response. If the user has a past dialogue history, its content is reflected in the reply data. The controller MC converts the reply data into voice data and outputs it from the speaker device 13 as an utterance of the robot 1.
As with built-in scenarios, if the user remains silent for a certain time or longer, the robot enters the standby mode.
"5. Effects of requests and responses in normal dialogue"
Unlike built-in scenarios, normal dialogue connects to an external cloud server. This allows advanced dialogue analysis, based on a far larger amount of data than built-in scenarios, to be carried out quickly, and achieves sophisticated dialogue that feels like actually talking with a person.
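The difference between a request for an authenticated user and one for an unauthenticated user can be illustrated as below. The text does not specify the wire format, so the JSON layout and the field names `auth`, `id`, `password` and `utterance` are assumptions made purely for illustration.

```python
import json


def build_dialogue_request(utterance, credentials=None):
    """Build a request body for the dialogue engine (hypothetical JSON layout).

    credentials: optional (user_id, password) tuple for an authenticated user.
    When present, the authentication information is placed ahead of the
    utterance; when absent, nothing is sent and the server cannot use history.
    """
    payload = {}
    if credentials is not None:
        user_id, password = credentials
        payload["auth"] = {"id": user_id, "password": password}
    payload["utterance"] = utterance
    return json.dumps(payload, ensure_ascii=False)
```

A server receiving a body without the `auth` field would, under this sketch, generate its reply without reference to any past dialogue history.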

6. Behavior of the robot 1 during dialogue
Whenever dialogue is taking place in the natural dialogue mode, whether through a built-in scenario or normal dialogue, the controller MC controls the behaviors described in a. to d. below.
a. Gestures of the robot 1 from start-up through the natural dialogue mode to the standby mode
The controller MC controls the first to third motors 23 to 25 at the various timings below to change the posture of the robot 1. The following are examples.
1) At start-up: if the face portion 6 of the head 5 is not facing forward, or if the head 5 is tilted, the head is moved to the forward-facing default position.
2) When the screen is touched: same as 1) (so that the face recognition camera 10 directly faces the user for face authentication).
3) When the trigger phrase "Hey, Yupibo" is uttered: same as 1) (so that the face recognition camera 10 directly faces the user for face authentication).
4) When the direction of a voice is detected: the face portion 6 of the head 5 is turned in that direction.
5) For a special emotional utterance, for example happiness: the third motor 25 is controlled to rotate the head 5 left and right (clockwise and counterclockwise) while the face portion 6 remains facing the user.
6) For a special emotional utterance, for example sadness: the second motor 24 is controlled so that the head 5 stays bowed for a while and then returns to the default position.
7) For an emotional utterance that is a greeting, such as "Good morning", "Good afternoon", "Good evening" or "Hello": the second motor 24 is controlled to make the head 5 bow.
8) For a non-special emotional utterance that is a simple affirmative expression, such as "I see", "Got it", "That's right" or "Yeah!": the second motor 24 is controlled to make the head 5 nod. In 6) to 8), the speed and duration of the second motor 24 are preferably varied so that sadness, bowing and nodding can be told apart.
9) For a non-special emotional utterance that is a simple negative expression, such as "No", "I can't" or "That's not allowed": the first motor 23 is controlled to rotate the movable part 3 left and right several times.
10) Depending on the special or non-special emotional utterance and on the dialogue, various gestures may be combined: for example, rotating the head 5 several times left and right (clockwise and counterclockwise) or back and forth, rotating the whole movable part 3 left and right, and making large rotations or small nod-like rotations.
These gestures of the robot 1 are preferably combined with the display on the touch panel unit 7 (face portion 6) described below.
"6. Behavior of the robot 1 during dialogue: Effect 1"
Having the robot 1 make gestures like these makes the user feel close to the robot 1, so that the user enjoys both talking with the robot 1 and actively interacting with it.
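The correspondence between utterance types and motors in items 5) to 9) amounts to a lookup table, sketched below. The motor numbers follow the reference numerals in the text (first motor 23, second motor 24, third motor 25); the category keys and action descriptions are hypothetical labels, and how an utterance is classified into a category is outside this sketch.

```python
# category -> (motor reference numeral, action description)
GESTURES = {
    "happy":    (25, "rotate head left and right with face toward user"),
    "sad":      (24, "hold head bowed, then return to default position"),
    "greeting": (24, "bow"),
    "affirm":   (24, "nod"),
    "negate":   (23, "rotate movable part left and right several times"),
}


def gesture_for(category):
    """Return (motor, action) for an utterance category, or None if no gesture applies."""
    return GESTURES.get(category)
```

Since the bowed hold, the bow and the nod all use the second motor 24, a fuller version would also carry speed and duration per entry so that the three remain distinguishable, as item 8) recommends.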

b. Changes in the display on the face screen S1
1) When the user is speaking
When the controller MC acquires the user's utterance from the microphone 12 and recognizes it, it displays the elliptical area 29 in blue on the face screen S1 of the touch panel unit 7, as shown in FIG. 5, and animates the area (that is, the size) of that region according to the volume of the user's speech. Specifically, the controller MC enlarges the elliptical area 29, keeping its elliptical shape, as the volume of the user's speech rises, and shrinks it, keeping its elliptical shape, as the volume falls. The cheek objects 28 are displayed in green.

The controller MC also displays the user's utterance data (character string data), returned from the cloud server, on the touch panel unit 7 in a predetermined manner.
For example, as shown in FIG. 5, when the touch panel unit 7 is showing the face screen S1 and the user says "Hello", the controller MC displays the character string "Hello" on a layer screen over the face screen S1. This display starts before the robot 1 speaks its reply to the user's utterance.
As a display mode, for example, the display is changed gradually from a transparent state to an opaque one, as in FIG. 10(a) to FIG. 10(b), until the face screen S1 behind it is completely hidden, as in FIG. 10(c). In other words, the character string appears gradually. The state of FIG. 10(c), in which only the characters are shown bright against a dark background, is held for a very short fixed time. The layer screen showing the character string is then faded out in the reverse order, FIG. 10(c) → FIG. 10(b) → FIG. 10(a), returning to the default face screen S1. All the characters of a single utterance appear at the same time and disappear at the same time. This display mode is only an example, and a different mode may be used.
"6. Behavior of the robot 1 during dialogue: Effect 2"
This lets the user see their own words on the robot 1. The user can therefore check whether the robot 1 heard them correctly, judge whether the dialogue is proceeding properly, and steer it away from odd, off-target exchanges. An off-target conversation tends to be frustrating, but seeing the recognized text reveals the reason, so the user can change the way they speak and try again.
In addition, all of the user's utterance data, whether it will be handled by a built-in scenario or by normal dialogue, is first sent to the cloud server with a request for conversion into character string data, so this preliminary conversion step is not delayed. The request to the dialogue engine for reply data is issued only after that conversion. The user's utterance can therefore always be displayed on the touch panel unit 7 before the robot 1 speaks its reply, and there is no risk of the order of the dialogue being confused.
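The fade-in, brief hold and fade-out of the text layer (FIG. 10(a) → (b) → (c) and back) can be modeled as a piecewise-linear opacity curve. This is a minimal sketch: the linear ramps and the default durations are assumptions, since the text only says that the change is gradual and that the hold is very short.

```python
def overlay_opacity(t, fade=0.4, hold=0.8):
    """Opacity of the text layer at time t (seconds since the text was received).

    0.0 means fully transparent (face screen S1 visible), 1.0 fully opaque.
    `fade` is the duration of each ramp and `hold` the time spent fully opaque;
    both values are illustrative assumptions.
    """
    total = 2 * fade + hold
    if t < 0 or t >= total:
        return 0.0
    if t < fade:                 # fading in
        return t / fade
    if t < fade + hold:          # fully shown
        return 1.0
    return (total - t) / fade    # fading out
```

The whole string shares one opacity value, which matches the statement that all characters of an utterance appear and disappear together.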

When a user's utterance is rendered as a character string, its length varies with the utterance. An utterance that is not a single word but a sentence made up of several phrases may be quite long. Even for such a long utterance, the controller MC adjusts the font size of the character string so that the whole of a single utterance is displayed on the touch panel unit 7 at once. That is, a short utterance is shown in a large font, and the longer the utterance, the relatively smaller the font.
"6. Behavior of the robot 1 during dialogue: Effect 3"
As a result, whatever the user says can be checked at a single glance, so the dialogue is unlikely to be held up until the full text has appeared, and the display is unlikely to overlap with the user's next utterance. Because a whole utterance appears at once, the entire sentence can be grasped in one go, and the user can understand it even if it is displayed only briefly. And because the character string is spread over the whole touch panel unit 7, each character can be shown large and is easy to read, so even a very short display time is enough to check it.
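One simple way to choose a font size so that a whole utterance fits on the screen is to search downward from the largest size. The sketch below assumes square glyphs (a reasonable approximation for full-width Japanese characters) and hypothetical screen dimensions and size limits; none of these values come from the text.

```python
def fit_font_size(n_chars, width_px, height_px, max_size=120, min_size=12):
    """Return the largest font size (px) at which n_chars square glyphs fit on screen.

    Each glyph is assumed to occupy size x size pixels. Falls back to min_size
    when even the smallest size cannot show the whole utterance.
    """
    for size in range(max_size, min_size - 1, -1):
        chars_per_line = width_px // size
        lines = height_px // size
        if chars_per_line * lines >= n_chars:
            return size
    return min_size
```

With these assumptions a five-character greeting on an 800 x 480 screen is drawn at the maximum size, while a 100-character sentence is drawn at half that size, giving the "shorter is larger" behavior described above.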

2) When the robot 1 is speaking
On the face screen S1 of the touch panel unit 7, as shown in FIG. 5, the elliptical area 29 is displayed in red and its area (that is, its size) is animated according to the volume of the robot's speech. Specifically, the controller MC enlarges the elliptical area 29, keeping its elliptical shape, as the output level from the speaker device 13 rises, and shrinks it, keeping its elliptical shape, as the volume falls. The cheek objects 28 are displayed in light red.
"6. Behavior of the robot 1 during dialogue: Effect 4"
In this way the display on the face screen S1 differs according to whose turn it is in the alternating dialogue between the user and the robot 1. The exchange thus alternates not only in the actual speech but also on the screen, which is entertaining and livens up the conversation.
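The volume-driven animation of the elliptical area 29, for both the user's turn (blue) and the robot's turn (red), can be sketched as a uniform scaling of the two semi-axes, which keeps the elliptical shape. The base size, the scale limits and the normalization of the volume to the range 0 to 1 are assumptions for illustration.

```python
def ellipse_for_level(level, speaker, base_axes=(60.0, 40.0),
                      min_scale=0.5, max_scale=1.5):
    """Return (color, (semi_axis_x, semi_axis_y)) for a normalized volume level.

    level:   input or output level normalized to 0.0..1.0 (clamped if outside).
    speaker: "user" gives blue; anything else is treated as the robot and gives red.
    Both axes are multiplied by the same factor, so the aspect ratio is preserved.
    """
    level = min(max(level, 0.0), 1.0)
    scale = min_scale + (max_scale - min_scale) * level
    color = "blue" if speaker == "user" else "red"
    return color, (base_axes[0] * scale, base_axes[1] * scale)
```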

c. Changes in the display on the chat screen S2
The display of the chat screen S2 of the touch panel unit 7 during dialogue is described with reference to FIGS. 6 and 8. As described above, the display switches from the face screen S1 to the chat screen S2 by user operation.
First, the configuration of the chat screen S2 is described again.
As shown in FIG. 6, a user object 31 is displayed as an avatar character toward the lower left of the chat screen S2, and a robot object 32 is likewise displayed toward the lower right, arranged so that the two face each other. Different user objects 31 are provided for each authenticated user recognized in the face recognition mode described later and for unauthenticated users, and the object displayed depends on the user currently in dialogue. In the central area, balloon objects 33 containing the dialogue of the user and of the robot 1 as character strings are displayed in chronological order. A dialogue stop button object 34 is displayed toward the upper left of the chat screen S2, and a settings button object 35 toward the upper right.

On the chat screen S2, balloon objects 33 are added moment by moment as the dialogue between the user and the robot 1 proceeds. The balloon objects 33 show the user's utterances and the robot 1's utterances, both converted into character string data, in a single column in chronological order on the chat screen S2. The most recent utterance is displayed, in synchronization with that utterance, in a new balloon object 33 at the bottom of the column of earlier balloon objects 33.
Each balloon object 33 indicates the direction of speech, so that it is clear whether it contains an utterance of the user or of the robot 1. Because the entire dialogue history cannot be shown on the screen at once, the chat screen S2 scrolls vertically so that earlier balloon objects 33 can be displayed. Unless the user scrolls back, the balloon object 33 of the most recent dialogue is always displayed.
In Embodiment 1, when a dialogue ends and the robot enters the standby mode, and the dialogue is then resumed with a different user recognized in the face recognition mode described later, "User changed" is displayed partway along the column of balloon objects 33, and the user object 31 is replaced by the one corresponding to the newly recognized user.
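The chat history with its "User changed" marker can be modeled as an append-only list. The class below is a minimal sketch with hypothetical names (`ChatLog`, `resume`, `add`, `visible`); it covers only the ordering of balloons and the marker, not rendering or scrolling gestures.

```python
class ChatLog:
    """Chronological chat history for the chat screen (illustrative model only)."""

    def __init__(self):
        self.entries = []         # (kind, speaker, text) tuples, oldest first
        self.current_user = None  # user recognized for the ongoing dialogue

    def resume(self, user):
        """Call when dialogue (re)starts with the user recognized by face recognition."""
        if self.current_user is not None and user != self.current_user:
            # A different user has taken over: record a separator in the history.
            self.entries.append(("marker", None, "User changed"))
        self.current_user = user

    def add(self, speaker, text):
        """Append a balloon; speaker is "user" or "robot"."""
        self.entries.append(("balloon", speaker, text))

    def visible(self, rows=4):
        """Entries shown when the screen is not scrolled back: the most recent ones."""
        return self.entries[-rows:]
```

For example, if one family member talks, the robot returns to standby, and another family member then starts talking, the history contains a "User changed" entry between the two conversations.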

Touching the dialogue stop button object 34 on the chat screen S2 lets the user actively interrupt the dialogue, and the robot enters the standby mode. In this case the standby screen of the chat screen S2 shown in FIG. 8 is displayed in place of FIG. 6, with a dialogue start button object 36 displayed at the position of the dialogue stop button object 34. To enter the natural dialogue mode again, the user touches the dialogue start button object 36, which returns the display to the chat screen S2 of FIG. 6.
"6. Behavior of the robot 1 during dialogue: Effect 5"
Because the user object 31 of the user and the robot object 32 of the robot 1 are arranged facing each other, with the balloon objects 33 of their dialogue lined up between them, the chat screen S2 gives a real sense of holding a conversation.
The past chat history can also be reviewed later, so the chat can be used in place of a diary. Because the history shows who had what dialogue, it is also possible to check, for example, which family members use the robot most. The "User changed" display shows where the dialogue was interrupted, so there is no confusion when reading the past history.

d. Special behaviors in normal dialogue
In normal dialogue, when the cloud server determines that the user's utterance data contains a specific word, it returns, together with the character string data, a command that causes a special behavior to be executed. On receiving the command, the controller MC executes specific behaviors such as the following, based on the screen display program, gesture program and dialogue program described above. The following control is only an example. Other behaviors may be used, and if the user's utterance gives rise to several commands, the behaviors may be performed in succession or simultaneously. The special behaviors below may be executed separately or in combination. The display on the display unit described below may be performed instead of the gestures of the robot 1 in a. of "6. Behavior of the robot 1 during dialogue" above, or may be combined with them as appropriate.

１）通常の人同士の会話で否定的な表現がユーザーから発話された場合には、ロボット１側の発話と同時にタッチパネル部７の表示を図９（ａ）の通常の目から図９（ｂ）の怒った目のオブジェクトに変化するアニメーション表示をさせる。本実施の形態では目のオブジェクトはメモリに記憶されている。
２）通常の人同士の会話で楽しくなるような表現がユーザーから発話された場合には、ロボット１側の発話と同時にタッチパネル部７の表示を図９（ａ）の通常の目から図９（ｃ）の笑った目のオブジェクトに変化するアニメーション表示をさせる。本実施の形態では目のオブジェクトはメモリに記憶されている。
３）通常の人同士の会話で悲しくなるような表現がユーザーから発話された場合には、ロボット１側の発話と同時にタッチパネル部７の表示を図９（ａ）の通常の目から図９（ｄ）の悲しそうな目のオブジェクトに変化するアニメーション表示をさせる。本実施の形態では目のオブジェクトはメモリに記憶されている。
４）ユーザーの子供の名前がユーザーから発話された場合には、ロボット１の頭部５がうなずくようなジェスチャー動作をするように第２のモータ２４を制御する。本実施の形態では目ジェスチャー用のプログラムはメモリに記憶されている。
５）ロボット１を製造している会社名がユーザーから発話された場合には、ロボット１の頭部５が前後左右に動くと同時に胴部４が固定部２に対してなんども揺動を繰り返すようなジェスチャー動作をするように第１〜第３のモータ２３〜２５を制御する。同時にその会社名のテキストデータを音声合成して、音声としてスピーカ装置１３から会社名を連呼させる。本実施の形態ではジェスチャー用のプログラムはメモリに記憶されている。
『６．対話時のロボット１の所作についてにおける効果その６』
これらのような特別な所作が行われることで、ユーザーは対話と同時にロボット１の思わぬ所作を期待することができ、ロボット１との対話を積極的に楽しむことができる。 1) When the user utters an expression that would be negative in ordinary conversation between people, the display on the touch panel unit 7 is animated, simultaneously with the utterance of the robot 1, to change from the normal eyes of FIG. 9(a) to the angry eye object of FIG. 9(b). In the present embodiment, the eye objects are stored in the memory.
2) When the user utters an expression that would be cheerful in ordinary conversation between people, the display on the touch panel unit 7 is animated, simultaneously with the utterance of the robot 1, to change from the normal eyes of FIG. 9(a) to the smiling eye object of FIG. 9(c). In the present embodiment, the eye objects are stored in the memory.
3) When the user utters an expression that would be sad in ordinary conversation between people, the display on the touch panel unit 7 is animated, simultaneously with the utterance of the robot 1, to change from the normal eyes of FIG. 9(a) to the sad eye object of FIG. 9(d). In the present embodiment, the eye objects are stored in the memory.
4) When the user utters the name of the user's child, the second motor 24 is controlled so that the head 5 of the robot 1 makes a nodding gesture. In the present embodiment, the program for the eye gesture is stored in the memory.
5) When the user utters the name of the company that manufactures the robot 1, the first to third motors 23 to 25 are controlled so that the head 5 of the robot 1 moves back and forth and left and right while the body 4 swings repeatedly relative to the fixed part 2. At the same time, the text data of the company name is speech-synthesized and the company name is called out repeatedly from the speaker device 13. In the present embodiment, the program for the gestures is stored in the memory.
[6. Actions of the robot 1 during dialogue: Effect 6]
Because such special actions are performed, the user can look forward to unexpected actions by the robot 1 during the dialogue, and can actively enjoy the dialogue with the robot 1.
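The special actions 1) to 5) above can be pictured as a lookup from a command returned by the cloud server to a display or gesture routine. The following is only a minimal sketch under stated assumptions: the command names, the `RobotStub` class, and the sequential (rather than truly simultaneous) execution are all hypothetical, since the source does not specify the command format.

```python
from dataclasses import dataclass, field


@dataclass
class RobotStub:
    """Hypothetical stand-in for the outputs driven by the controller MC; it only logs calls."""
    log: list = field(default_factory=list)

    def show_eyes(self, name):
        self.log.append(("eyes", name))

    def gesture(self, name):
        self.log.append(("gesture", name))

    def speak(self, text):
        self.log.append(("speak", text))


def on_company_name(robot, arg):
    # Action 5): swing head and body, and call out the company name.
    robot.gesture("head_and_body_swing")
    robot.speak(arg)


# Hypothetical command names; each maps to one special action.
SPECIAL_ACTIONS = {
    "NEGATIVE": lambda robot, arg: robot.show_eyes("angry"),    # FIG. 9(b)
    "CHEERFUL": lambda robot, arg: robot.show_eyes("smiling"),  # FIG. 9(c)
    "SAD": lambda robot, arg: robot.show_eyes("sad"),           # FIG. 9(d)
    "CHILD_NAME": lambda robot, arg: robot.gesture("nod"),
    "COMPANY_NAME": on_company_name,
}


def handle_response(robot, reply_text, commands):
    """Speak the reply, then run every known special action attached to the response."""
    robot.speak(reply_text)
    for name, arg in commands:
        action = SPECIAL_ACTIONS.get(name)
        if action is not None:  # unknown commands are ignored in this sketch
            action(robot, arg)
    return robot.log
```

A table of this kind keeps each action separate, which matches the statement that the actions may be executed individually or in combination when several commands arrive.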

Ｂ．顔認識時の動作内容について
１．顔認識モードの開始と停止
コントローラＭＣは顔認識プログラムを実行することによってユーザーの顔の認識及び認証をする。顔認識プログラムでは取得した画像を顔パターン認識することによって人の顔と認識し、かつ認識された顔の様々な位置を数値化して記憶することで過去に登録された顔の数値データとの一致度を判断して認証を行う。コントローラＭＣは自然対話モードと同期して顔認識モードとし、待ち受けモードから自然対話モードに移行する度に顔認証を実行する。 B. Operation during face recognition 1. Starting and stopping the face recognition mode The controller MC recognizes and authenticates the user's face by executing a face recognition program. The face recognition program recognizes a human face in the acquired image by face pattern recognition, quantifies and stores various positions of the recognized face, and performs authentication by judging the degree of agreement with the numerical data of previously registered faces. The controller MC enters the face recognition mode in synchronization with the natural dialogue mode, and executes face authentication each time the standby mode shifts to the natural dialogue mode.

顔認識モードではコントローラＭＣは顔認識用カメラ１０を使用してユーザーの顔認識を行う。具体的には、
１）顔認識用カメラ１０を起動させる。顔認識用カメラ１０に写ったユーザーの顔画像をタッチパネル部７に表示させる（顔表示モード）。つまり、ユーザーにタッチパネル部７上の自分の顔を見るように促す。これによって顔認識処理が可能となり、このようにプレビューさせることで過去に認証された人かどうかを判断できる。
２）１）で顔認識用カメラ１０が顔を撮影できず、一定時間内に顔認識ができなかった場合には、第３のモータ２５を駆動させて頭部５を上下に揺動させる。つまり顔認識用カメラ１０に縦方向をスキャンさせる。そして、そのように顔認識用カメラ１０を縦方向にスキャンさせながら第１のモータ２３を駆動させて顔認識用カメラ１０を３６０度一周回転させながら顔認識動作をさせる。
３）１）又は２）で顔認識できた場合には認証を行う。既に登録されたユーザーであれば対話において特定の認証されたユーザーのデータを利用して上記自然対話モードとする。登録されていないユーザーであれば不特定の人物として認識して上記自然対話モードとする。タッチパネル部７は顔表示モードから直前の顔画面Ｓ１（図５）かチャット画面Ｓ２（図６）のいずれかに復帰する。
４）２）において顔認識ができなかった場合には人を認識できなかったとして顔認識モードとともに自然対話モード自体を終了させて待ち受けモードとする。タッチパネル部７は顔表示モードから直前の待ち受けモードである顔画面Ｓ１の待ち受け画面（図７）かチャット画面Ｓ２の待ち受け画面（図８）のいずれかに復帰する。
『１．顔認識モードの開始と停止における効果』
対話は相手の顔を見ながら話すのが基本であるため、顔が認識できない場合には対話をさせないことで、積極的にユーザーに顔認識をさせるようにしたため、対話においてはロボット１と実際に面と向かわないと対話はできず、そのためユーザーは実際に対話をしているような感覚を得ることができる。 In the face recognition mode, the controller MC performs face recognition of the user using the face recognition camera 10. Specifically:
1) The face recognition camera 10 is activated. The user's face image captured by the face recognition camera 10 is displayed on the touch panel unit 7 (face display mode). In other words, the user is prompted to look at his or her own face on the touch panel unit 7. This enables face recognition processing, and with this preview it can be determined whether the person has been authenticated in the past.
2) If the face recognition camera 10 cannot capture a face in 1) and no face is recognized within a predetermined time, the third motor 25 is driven to swing the head 5 up and down, so that the face recognition camera 10 scans in the vertical direction. While the camera scans vertically in this way, the first motor 23 is driven to rotate the face recognition camera 10 through a full 360 degrees while the face recognition operation is performed.
3) If a face is recognized in 1) or 2), authentication is performed. If the user is already registered, the natural dialogue mode is entered using the data of that specific authenticated user in the dialogue. If the user is not registered, the user is recognized as an unspecified person and the natural dialogue mode is entered. The touch panel unit 7 returns from the face display mode to whichever of the face screen S1 (FIG. 5) and the chat screen S2 (FIG. 6) was displayed immediately before.
4) If no face is recognized in 2), it is judged that no person could be recognized, the natural dialogue mode itself is terminated together with the face recognition mode, and the standby mode is entered. The touch panel unit 7 returns from the face display mode to the standby screen that was displayed immediately before, either the standby screen of the face screen S1 (FIG. 7) or the standby screen of the chat screen S2 (FIG. 8).
[1. Starting and stopping the face recognition mode: Effect]
Because conversation is fundamentally conducted while looking at the other party's face, dialogue is not permitted when no face can be recognized, which encourages the user to have his or her face recognized. The user cannot converse without actually facing the robot 1, and therefore gets the feeling of having a real conversation.
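Steps 1) to 4) above amount to a small decision flow: try the current camera direction, scan if nothing is found, then either authenticate or fall back to standby. The sketch below illustrates that flow only; the `detect` callback, the pose tuples, and the face identifiers are hypothetical placeholders, and real face detection and motor control are not modeled.

```python
def run_face_recognition(detect, scan_poses, registered):
    """Return (next_mode, user_name) following steps 1) to 4).

    detect(pose) returns a face identifier or None; pose None means the
    camera's current direction. scan_poses is the sequence of (pan, tilt)
    poses visited during the vertical and 360-degree scan of step 2).
    """
    face = detect(None)                  # step 1): look where the camera already points
    if face is None:
        for pose in scan_poses:          # step 2): scan until a face appears
            face = detect(pose)
            if face is not None:
                break
    if face is None:
        return ("standby", None)         # step 4): nobody found, end dialogue mode
    user_name = registered.get(face)     # step 3): None means an unspecified person
    return ("natural_dialogue", user_name)
```

Returning `None` as the user name for an unregistered face mirrors the description that such a person is still admitted to the natural dialogue mode, only without user-specific data.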

２．顔認識時のロボット１の所作について
１）顔認識後においてはコントローラＭＣは顔認識用カメラ１０に画像を取得させて一定のタイミングで常時顔パターン認識を実行する。そして、顔認識用カメラ１０の画角内でユーザーの顔を認識し、画角内の所定の位置、例えば画角中央の原点にユーザーの顔の２つの目の中央位置Ｃがある状態をデフォルト位置とする。コントローラＭＣはこのデフォルト位置から中央位置Ｃがずれた場合に、そのずれ量に応じて左右いずれかのずれ方向に瞳オブジェクト２７ａが移動するようなアニメーション表示をさせる。通常、瞳オブジェクト２７ａは、例えば図９（ａ）のように目オブジェクト２７の中で白目内に楕円形状として全体が現れているが、ユーザーの顔が移動しているある状態では図１１に示すように瞳オブジェクト２７ａはあたかもその方向を見ているように一部が隠れた目オブジェクト２７として表示されることとなる。
ユーザーが動いて顔認識用カメラ１０の画角から顔が出てしまい顔認識できなくなった場合には、コントローラＭＣは第１のモータ２３を駆動させ、中央位置Ｃがずれた方向に可動部３全体を回動させて顔認識用カメラ１０を向けるよう制御する。顔認識がされた段階で第１のモータ２３の駆動を停止させる。ある程度の回動、例えば可動部３全体を４５度回動させても顔認識がされない場合には、その段階でコントローラＭＣは第１のモータ２３の駆動を停止させ、その状態で常時顔パターン認識を継続する。
『２．顔認識時のロボット１の所作についてにおける効果』
これによって、ユーザーはロボット１にいつも見られながら対話をしているような感覚になり、対話のおもしろさが増すこととなる。 2. Actions of the robot 1 during face recognition 1) After face recognition, the controller MC causes the face recognition camera 10 to acquire images and continuously executes face pattern recognition at fixed intervals. The user's face is recognized within the angle of view of the face recognition camera 10, and the default position is defined as the state in which the center position C between the two eyes of the user's face lies at a predetermined position within the angle of view, for example the origin at the center of the angle of view. When the center position C deviates from this default position, the controller MC displays an animation in which the pupil object 27a moves to the left or right, in the direction of the deviation, according to the amount of deviation. Normally the whole pupil object 27a appears as an ellipse within the white of the eye object 27, as shown for example in FIG. 9(a). In a state where the user's face has moved, however, the eye object 27 is displayed with the pupil object 27a partly hidden, as if looking in that direction, as shown in FIG. 11.
When the user moves so that the face leaves the angle of view of the face recognition camera 10 and can no longer be recognized, the controller MC drives the first motor 23 to rotate the entire movable part 3 in the direction in which the center position C shifted, so as to point the face recognition camera 10 that way. The first motor 23 is stopped once the face is recognized. If the face is still not recognized after a certain amount of rotation, for example after the entire movable part 3 has been rotated 45 degrees, the controller MC stops the first motor 23 at that point and continues face pattern recognition in that state.
[2. Actions of the robot 1 during face recognition: Effect]
As a result, the user feels as if the robot 1 is always watching during the dialogue, which makes the dialogue more enjoyable.

２）コントローラＭＣは顔認識用カメラ１０で常時顔パターン認識を実行するが、ユーザが動いていたり顔認識用カメラ１０の画角内にいなかったりする場合には顔認識ができない。そのため、顔認識状態を画面上の変化としてユーザーに報知することがよい。実施の形態１では、瞳オブジェクト２７ａ内部の瞳上での反射を表現した鎌状の反射オブジェクト２７ｂの色の濃さの変化で顔認識状態を報知するようにしている。本実施の形態１では、コントローラＭＣは認識されていない場合にはごく薄い青色で表示させ、認識中である状態ではそれより濃く、通常の顔認識されている状態では濃い青色で表示させる。
『２．顔認識時のロボット１の所作についてにおける効果その２』
これによって、ユーザーはロボット１に顔認識されているかいないかが容易にわかるため、積極的に顔認識するようにユーザーは協力するようになり、円滑な対話が進むこととなる。 2) The controller MC continuously performs face pattern recognition with the face recognition camera 10, but the face cannot be recognized when the user is moving or is outside the angle of view of the face recognition camera 10. It is therefore preferable to notify the user of the face recognition state as a change on the screen. In the first embodiment, the face recognition state is indicated by a change in the color density of the sickle-shaped reflection object 27b, which represents the reflection on the pupil inside the pupil object 27a. In the first embodiment, the controller MC displays the reflection object in a very pale blue when no face is recognized, in a darker blue while recognition is in progress, and in a dark blue when the face is recognized normally.
[2. Actions of the robot 1 during face recognition: Effect 2]
As a result, the user can easily tell whether or not the robot 1 has recognized his or her face, so the user cooperates actively in being recognized and the dialogue proceeds smoothly.
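The gaze behaviour of item 1) above (shifting the pupil object 27a according to how far the center position C has moved, and rotating up to 45 degrees when the face is lost) can be illustrated numerically. This is a sketch under assumptions: the linear mapping, the pixel units, the 5-degree step, and the sign convention (positive means the same side as the deviation in the image) are not given in the source.

```python
def pupil_offset(center_x, frame_width, max_offset_px):
    """Map the horizontal deviation of center position C to a pupil shift in pixels."""
    half = frame_width / 2
    deviation = (center_x - half) / half          # -1.0 at left edge, +1.0 at right edge
    deviation = max(-1.0, min(1.0, deviation))    # clamp if C is reported outside the frame
    return round(deviation * max_offset_px)


def search_rotation(direction, face_visible_at, step_deg=5, limit_deg=45):
    """Rotate toward `direction` (+1 or -1) until a face is seen or the limit is reached.

    face_visible_at(angle) is a hypothetical callback reporting whether a face
    is recognized with the movable part turned to `angle` degrees.
    Returns (final_angle, found).
    """
    angle = 0
    while angle < limit_deg:
        angle += step_deg
        if face_visible_at(direction * angle):
            return direction * angle, True        # stop the motor once the face is recognized
    return direction * angle, False               # give up at the limit and keep watching
```

For example, with a 640-pixel-wide frame and a maximum shift of 20 pixels, a face centered at x = 480 gives an offset of 10 pixels.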

Ｃ．留守設定時の動作について
実施の形態１では留守設定モード、つまり留守設定時に登録したｅメールに対して画像の転送が可能である。留守設定モードは本実施の形態１ではユーザーがロボット１のチャット画面Ｓ２の設定ボタンオブジェクト３５をタッチした後に表示される設定画面において設定とその解除がされる。以下では留守設定モードがされている場合のコントローラＭＣの留守設定時プログラムに基づく処理について説明する。
コントローラＭＣは、留守設定モードにおける上記の「２．自然対話モードの開始と停止」におけるＢ．の４）での待ち受けードにおいて、ドップラーセンサ２２によって物体（人）が動いていると判断すると以下のように制御する。 C. Operation in the absence setting The first embodiment provides an absence setting mode, in which images can be transferred to the e-mail address registered when the absence setting was made. In the first embodiment, the absence setting mode is set and released on the setting screen displayed after the user touches the setting button object 35 on the chat screen S2 of the robot 1. The following describes the processing that the controller MC performs, based on the absence-setting program, while the absence setting mode is active.
When, in the absence setting mode, the controller MC is in the standby mode described in 4) of B. under "2. Starting and stopping the natural dialogue mode" above and determines by means of the Doppler sensor 22 that an object (person) is moving, it performs the following control.

１）コントローラＭＣは、ロボット１の周囲になんらかの動く物体が存在することで、この状態をユーザーにｅメールによって報知をする。コントローラＭＣはインターネット回線を通じてｅメールアドレスが登録されているロボット１の近くにいないユーザー（以下、外部ユーザーとする）の端末装置、例えばスマートフォンに対してロボット１のクラウドサーバーのＵＲＬをｅメールに記載して送る。ｅメールの件名や送信文中にこの報知の意図のわかるような表現を表記をする。例えば「誰かが来ているようです」のような文章やそれを意味するようなアイコン等である。
『Ｃ．留守設定時の動作についてにおける効果』
これによって、まずｅメールが送られて来たことによって外部ユーザーはなんらかのロボット１周囲に物体（人）が動いる状態が報知されて認識することができ、この状態に対して外部ユーザーに対策をとる機会が与えられることとなる。 1) When some moving object is present around the robot 1, the controller MC notifies the user of this state by e-mail. Via the Internet, the controller MC sends an e-mail containing the URL of the cloud server of the robot 1 to the terminal device, for example a smartphone, of a user whose e-mail address is registered and who is not near the robot 1 (hereinafter, an external user). The subject line or body of the e-mail contains wording that makes the purpose of the notification clear, for example a sentence such as "It seems that someone has come" or an icon conveying the same meaning.
[C. Operation in the absence setting: Effect]
As a result, the arrival of the e-mail first informs the external user that some object (person) is moving around the robot 1, giving the external user an opportunity to take action in response.

２）コントローラＭＣは、外部ユーザーにｅメールを送信すると同時に顔認識用カメラ１０を起動させて画像を取得する。
３）コントローラＭＣは、外部ユーザーにｅメールを送信すると同時に、一定時間内にあるトリガーとなる発話があるかどうかを判断する。例えば「ただいまユピ坊」のような挨拶の発話である。この発話に基づいて自然対話モードにおけるビルトインシナリオが開始され、ユーザー（ここでは「ただいまユピ坊」を発話したロボット１の近くにいる者）に顔認証を促す。コントローラＭＣは顔認証の結果、登録されているユーザーの一人であると判断した場合に、外部ユーザーの端末装置に対して二回目となるｅメールを送信する。このｅメールはロボット１の周囲にいる者は不審者ではないという外部ユーザーに対する情報となる。つまり、二回目のｅメールは例えば家族のような関係者であることを報知するものとなる。ｅメールには登録情報に基づいて登録されているユーザーの名前を情報として件名や送信文中に表記する。尚、トリガーとして発話以外の、例えばタッチパネル部７にタッチして顔認識し、登録ユーザーであることを確認してもよい。
『Ｃ．留守設定時の動作についてにおける効果その２』
これによって、留守中に例えば子供等の家族が帰ってきた場合には、この二回目のｅメールによってその旨がわかるため、わざわざスマートフォン経由で留守中の家の様子を確認する必要がない。 2) The controller MC sends an e-mail to an external user and at the same time activates the face recognition camera 10 to acquire an image.
3) At the same time as sending the e-mail to the external user, the controller MC determines whether a trigger utterance occurs within a predetermined time, for example a greeting such as "I'm home, Yupi-bo" (「ただいまユピ坊」). Based on this utterance, a built-in scenario in the natural dialogue mode is started, prompting the user (here, the person near the robot 1 who uttered the greeting) to undergo face authentication. If, as a result of face authentication, the controller MC determines that the person is one of the registered users, it sends a second e-mail to the external user's terminal device. This e-mail tells the external user that the person near the robot 1 is not a suspicious person. That is, the second e-mail reports that the person is someone connected to the household, such as a family member. The name of the registered user, taken from the registration information, is given in the subject line or body of the e-mail. The trigger need not be an utterance: for example, touching the touch panel unit 7 may start face recognition to confirm that the person is a registered user.
[C. Operation in the absence setting: Effect 2]
As a result, when a family member such as a child comes home while the user is away, the second e-mail makes this known, so there is no need to go to the trouble of checking on the home via the smartphone.

４）１）において、ｅメールを受信した外部ユーザーは、特に３）において二回目のｅメールの送信がなかった場合に、ロボット１のＩＤとパスワードを入力してクラウドサーバーに接続し、スマートフォンのブラウザ上でクラウドサーバーが提供する顔認識用カメラ１０のカメラ画像をリアルタイムで見ることができる。二回目のｅメールの送信があってもそれは可能である。
図１２はユーザーのスマートフォン４１の一例であり、クラウドサーバーに接続後においてはタッチパネルを兼ねたその表示画面４３上に顔認識用カメラ１０の所定のカメラ画像が表示される。カメラ画像内には顔認識用カメラ１０の向きを遠隔操作するための４つの操作アイコン４４ａ〜４４ｄが表示される。外部ユーザーは操作アイコン４４ａ〜４４ｄを操作することで制御コマンドがクラウドサーバーを介してコントローラＭＣに出力され、制御コマンドに基づいて第１のモータ２３又は第３のモータ２５が駆動制御されてロボット１の頭部５と胴部４が回動して顔認識用カメラ１０の向きを変えることができる。また、録画ボタンアイコン４５にタッチすることで、録画を開始し再度タッチすることで録画を停止することができる。
また、スマートフォン４１の図示しないマイクロフォンに発話した音声データはクラウドサーバーを介してロボット１のスピーカー装置１３から音声出力され、一方でロボット１のマイクロフォン１２から発話した音声データはクラウドサーバーを介してスマートフォン４１の図示しないスピーカー装置から音声出力される。そのため、外部ユーザーは顔認識用カメラ１０の画像を見ながらロボット１近傍のユーザーとスマートフォン４１とロボット１を使用した対話をすることができる。
『Ｃ．留守設定時の動作についてにおける効果その３』
これによって、外部ユーザーは遠隔操作で顔認識用カメラ１０の向きを変えてロボット１の周囲の状況を確認することができ、例えば留守の際の自宅の安全状況をチェックすることができる。また、留守中に子供等の家族が帰ってきた場合でもこのようにスマートフォンを使用して積極的に外部から連絡することで家族を含めた他者との良好な関係に寄与する。 4) The external user who received the e-mail in 1), in particular when no second e-mail was sent in 3), can enter the ID and password of the robot 1 to connect to the cloud server and view, in real time in the smartphone's browser, the camera image of the face recognition camera 10 provided by the cloud server. This is possible even if the second e-mail was sent.
FIG. 12 shows an example of the user's smartphone 41. After connection to the cloud server, the camera image of the face recognition camera 10 is displayed on its display screen 43, which also serves as a touch panel. Four operation icons 44a to 44d for remotely controlling the direction of the face recognition camera 10 are displayed within the camera image. When the external user operates the operation icons 44a to 44d, a control command is output to the controller MC via the cloud server, and the first motor 23 or the third motor 25 is driven based on the control command so that the head 5 and the body 4 of the robot 1 rotate and the direction of the face recognition camera 10 changes. Touching the recording button icon 45 starts recording, and touching it again stops recording.
Voice data spoken into the microphone (not shown) of the smartphone 41 is output as sound from the speaker device 13 of the robot 1 via the cloud server, while voice data picked up by the microphone 12 of the robot 1 is output as sound from the speaker device (not shown) of the smartphone 41 via the cloud server. The external user can therefore converse with a user near the robot 1, using the smartphone 41 and the robot 1, while viewing the image from the face recognition camera 10.
[C. Operation in the absence setting: Effect 3]
As a result, the external user can remotely change the direction of the face recognition camera 10 to check the situation around the robot 1, and can, for example, check that the home is safe while away. In addition, when a family member such as a child comes home while the user is away, actively making contact from outside with the smartphone in this way contributes to good relationships with others, including family.
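The two-stage e-mail notification of steps 1) and 3) above can be summarized as a function that decides which messages go to the external user. This is an illustrative sketch only: the function name, the dictionary layout, the wording of the second subject line, and the example URL are assumptions, and no mail is actually sent.

```python
def absence_mails(motion_detected, trigger_occurred, authenticated_name, server_url):
    """Return the e-mails (as dicts) that would be sent to the external user."""
    mails = []
    if not motion_detected:
        return mails                      # nothing moved, so nothing is reported
    # First e-mail: something is moving near the robot; include the server URL.
    mails.append({"subject": "It seems that someone has come", "body": server_url})
    # Second e-mail: only after a trigger and a successful face authentication.
    if trigger_occurred and authenticated_name is not None:
        mails.append({
            "subject": f"{authenticated_name} has come home",  # hypothetical wording
            "body": f"Registered user: {authenticated_name}",
        })
    return mails


if __name__ == "__main__":
    # Hypothetical URL, used purely for illustration.
    for mail in absence_mails(True, True, "Hanako", "https://robot.example/live"):
        print(mail["subject"])
```

When only the first message arrives, the external user knows that the person was not authenticated and can decide whether to open the live camera view described in step 4).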

＜実施の形態１の変形例１＞
次に、実施の形態１の変形例１について説明する。
上記自然対話におけるビルトインシナリオの対話において、ユーザーの滑舌が悪かったり、他の音が混ざってしまいマイクロフォン１２から取得した音声データがビルトインシナリオの正規表現又は非正規表現に合致しない場合には、コントローラＭＣは直ちに通常対話であると判断することなくユーザーに再度の発言を促すための発話として、例えば「もう一度言って下さい」というような音声出力をさせるようにしてもよい。
コントローラＭＣは、このような促しの発話をスピーカ装置１３からさせ、これに対して一定の時間内にユーザーからビルトインシナリオに沿った正しい発話がされた場合には再びビルトインシナリオの対話として処理するようにする。一方、このような場合でもユーザーの発話がビルトインシナリオに正規表現又は非正規表現に合致しない場合に外部のクラウドサーバーに接続させるようにする。
このようにすれば、無駄に外部のクラウドサーバーに接続させるようなことがなく、ロボット１の内部のみで対話を行うことができる。 <Modification 1 of Embodiment 1>
Next, a first modification of the first embodiment will be described.
In the built-in scenario dialogue of the natural dialogue described above, the voice data acquired from the microphone 12 may fail to match a regular expression or non-regular expression of the built-in scenario, for example because the user's articulation is poor or other sounds are mixed in. In that case the controller MC need not immediately conclude that the dialogue is a normal dialogue. It may instead output a voice prompt such as "Please say that again" to ask the user to repeat the utterance.
The controller MC has the speaker device 13 output this prompt, and if the user then makes a correct utterance in line with the built-in scenario within a certain time, the utterance is again processed as built-in scenario dialogue. If the user's utterance still does not match a regular expression or non-regular expression of the built-in scenario, the controller MC connects to the external cloud server.
In this way, unnecessary connections to the external cloud server are avoided, and the dialogue can be conducted entirely inside the robot 1.
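The retry described in this modification can be sketched as a routing function that asks once more before handing the utterance to the cloud server. The sketch assumes that built-in scenario expressions can be modeled as Python regular expressions and that a hypothetical `ask_again` callback plays the prompt and returns the second utterance, or `None` if nothing is heard within the time limit.

```python
import re


def matches_builtin(text, patterns):
    """True if the utterance matches any built-in scenario pattern."""
    return any(re.search(pattern, text) for pattern in patterns)


def route(text, patterns, ask_again):
    """Return ("builtin" or "cloud", utterance_to_process)."""
    if matches_builtin(text, patterns):
        return "builtin", text
    # No match: prompt once instead of going to the cloud server right away.
    second = ask_again("Please say that again")
    if second is not None and matches_builtin(second, patterns):
        return "builtin", second
    # Still no match (or no answer in time): fall back to normal dialogue.
    return "cloud", second if second is not None else text
```

Only utterances that fail twice reach the cloud server, which is the saving in external connections that this modification aims at.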

＜実施の形態１の変形例２＞
次に、実施の形態１の変形例２について説明する。
上記自然対話においてユーザーがロボット１の言葉（発話）を聞き逃した場合、一定の時間内であれば直前のロボット１の発話を繰り返すような依頼の発話をユーザーが発話することで再度ロボット１に発話（音声出力）させるようにしてもよい。この処理はビルトインシナリオの対話でも自然対話でもいずれでも可能である。
コントローラＭＣは対話モード中において聞き逃しのトリガーとなるような発話、例えば「もう一度言って」という発話があったかどうかを音声認識する。そして、ロボット１からの発話後にユーザーから発話を繰り返す依頼があったと判断すると、直前にロボット１が発話した内容を再度音声出力する。そして、先の発話をしたことはキャンセルして、二回目の発話をもって一回目の発話として処理する。
このようにすれば、ユーザーが対話途中で聞き逃したりした場合でも対話が途切れることなく再開されることとなる。 <Modification 2 of Embodiment 1>
Next, a second modification of the first embodiment will be described.
When the user fails to catch what the robot 1 said in the natural dialogue described above, the user may, within a certain time, utter a request to repeat the immediately preceding utterance of the robot 1, causing the robot 1 to speak (output the voice) again. This processing is possible in both built-in scenario dialogue and natural dialogue.
During the dialogue mode, the controller MC uses speech recognition to detect whether there has been an utterance that signals a missed utterance, for example "Say that again". If it determines that, after an utterance by the robot 1, the user has asked for the utterance to be repeated, it outputs again the content that the robot 1 uttered immediately before. The earlier utterance is then treated as canceled, and the second utterance is processed as the first.
In this way, even if the user misses something partway through the dialogue, the dialogue resumes without breaking off.
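The repeat request can be sketched by remembering the last robot utterance together with its time and replaying it when a trigger phrase arrives within the allowed window. The class name, the trigger phrase set, and the 10-second window below are hypothetical, since the source says only "a certain time".

```python
REPEAT_TRIGGERS = {"say that again"}  # hypothetical trigger phrases


class RepeatHelper:
    def __init__(self, window_s=10.0):
        self.window_s = window_s      # assumed time limit for a repeat request
        self.last_reply = None
        self.last_time = None

    def robot_said(self, text, now):
        """Record what the robot just said and when."""
        self.last_reply, self.last_time = text, now

    def user_said(self, text, now):
        """Return the reply to repeat, or None if this is not a valid repeat request."""
        if text.strip().lower() not in REPEAT_TRIGGERS:
            return None
        if self.last_reply is None or now - self.last_time > self.window_s:
            return None
        # The repeated output takes the place of the earlier one,
        # so the time window restarts from the repeat.
        self.last_time = now
        return self.last_reply
```

Resetting the timestamp on a repeat reflects the statement that the second output is processed as if it were the first.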

＜実施の形態１の変形例３＞
次に、実施の形態１の変形例３について説明する。
上記自然対話においてロボット１がタッチパネル部７に表示させたユーザーの発話をユーザーが確認して、間違って音声認識されたことがわかった場合には、一定の時間内であればそれを指摘して正しい対話に修正することができるようにしてもよい。この処理はビルトインシナリオの対話でも自然対話でもいずれでも可能である。
コントローラＭＣは対話モード中において、ユーザーからの音声認識が間違っている旨の指摘となるトリガーとなるような発話、例えば「間違えているよ」や「違うよ。もう一度いうよ」というような発話があったかどうかを音声認識する。そして、コントローラＭＣはその発話がユーザーの発話をタッチパネル部７に表示させた後の一定時間内にあったと判断すると再度のユーザーの発話を促す音声出力をする。例えば「ごめんね。もう一度言って」という発話内容を音声出力する。
そして、
（１）ビルトインシナリオの対話の場合には直前のユーザーの発話はキャンセルされ、再度ユーザーが発話する内容が正しい発話として音声認識処理される。
（２）通常対話では上記の「間違えているよ」や「もう一度いうよ」という対話内容も外部のクラウドサーバーに発話データ（音声データ）として送信され、そのような発話も含め再度ユーザーが発話した内容で返答データの作成をリクエストする。
このようにすれば、ロボット１が間違ってユーザーの発話を認識した場合でも正しい対話に修正することができる。 <Modification 3 of Embodiment 1>
Next, a third modification of the first embodiment will be described.
When the user checks the user's own utterance displayed on the touch panel unit 7 by the robot 1 in the natural dialogue described above and finds that it was recognized incorrectly, the user may be allowed, within a certain time, to point this out and correct the dialogue. This processing is possible in both built-in scenario dialogue and natural dialogue.
During the dialogue mode, the controller MC uses speech recognition to detect whether there has been an utterance indicating that the speech recognition was wrong, for example "That's wrong" or "No, I'll say it again". If the controller MC determines that this utterance occurred within a certain time after the user's utterance was displayed on the touch panel unit 7, it outputs a voice prompt asking the user to speak again, for example "Sorry. Please say that again."
Then:
(1) In built-in scenario dialogue, the user's immediately preceding utterance is canceled, and what the user says next is processed by speech recognition as the correct utterance.
(2) In normal dialogue, utterances such as "That's wrong" or "I'll say it again" are also sent to the external cloud server as speech data (voice data), and creation of response data is requested based on what the user says again, including those utterances.
In this way, even if the robot 1 recognizes the user's utterance incorrectly, the dialogue can be corrected.

＜実施の形態１の変形例４＞
実施の形態１の構成とは異なる例えば次のような構成を採用するようにしてもよい。
（１）上記ではビルトインシナリオが実行されない場合にクラウドサーバーにリクエストして通常対話に移行するような設定であった。つまり、ビルトインシナリオが実行されるのであれば、すべてビルトインシナリオとするような設定であったが、敢えてビルトインシナリオに対応する場合でもある条件でローカルで対応をせずにクラウドサーバーにリクエストするようにしてもよい。ある条件とは例えば何回かに一回の回数や、ランダムなタイミングで実行することがよい。
これによって、ロボット１と予測されない対話をすることとなり、決まり切っていないより人間的な対話ができることができる。
（２）上記実施の形態１ではコントローラＭＣは音声認識エンジンを備えず、音声認識エンジンを備えた外部のサーバーに接続してユーザーの発話（音声データ）をテキスト化するようにしていた。それによってロボット１の負担が軽減されている。
しかし、コントローラＭＣは、メモリ内に音声認識エンジンを備えるようにし、音声認識エンジンを呼び出してマイクロフォン１２から取得したユーザーの発話データ（音声データ）を音声認識エンジンを使用して自身でテキスト化した文字列データを作成するようにしてもよい。つまり、ロボット１のコントローラＭＣは自らユーザーの音声データをテキストデータ化する能力を有していてもよい。これによって、音声認識エンジンを使用せずに文字列データを作成できることとなって、例えば内部の処理時間が短くなる。
（３）上記実施の形態１では設定画面から初期設定するようにしていたが、例えばスマートフォンのような端末装置を使用して外部からクラウドサーバー経由で登録するようにしてもよい。その方が特に端末装置を使い慣れた人には設定が容易で時間の短縮となる。
（４）第１〜の第３モータ２３〜２５はサーボモータ以外の他の駆動手段を使用するようにしてもよい。他の駆動手段とは、例えば他の形式のモータや油圧シリンダ等である。
（５）上記実施の形態１では文字列データをロボット１内部で音声合成するようにしていた。このように内部の音声合成エンジンを用いることはそのまま音声データをサーバーとやり取りするよりデータが重くなりすぎずによいが、クラウドサーバー側で対話エンジンを使用して取得した返答データ（文字列データ）を音声合成し、その音声データをロボット１にレスポンスするようにしてもよい。
（６）上記実施の形態１ではチャット画面Ｓ２の設定ボタンオブジェクト３５から設定画面に移行するような構成であったが、タッチパネル部７をスライド操作することで設定画面に移行ようにしってもよい。 <Modification 4 of Embodiment 1>
A configuration different from that of the first embodiment, such as the following, may also be adopted.
(1) In the above, a request is made to the cloud server and the dialogue shifts to normal dialogue only when no built-in scenario is executed. In other words, whenever a built-in scenario could be executed, the built-in scenario was always used. However, even in cases that a built-in scenario could handle, the robot may, under certain conditions, deliberately make a request to the cloud server instead of responding locally. Such a condition may be, for example, once every several times, or at random timing.
This leads to unpredictable dialogue with the robot 1, allowing a less formulaic, more human-like dialogue.
(2) In the first embodiment, the controller MC does not include a speech recognition engine, and connects to an external server that has one in order to convert the user's utterance (voice data) into text. This reduces the load on the robot 1.
However, the controller MC may hold a speech recognition engine in its memory and call it to convert the user's speech data (voice data) acquired from the microphone 12 into character string data by itself. In other words, the controller MC of the robot 1 may itself have the ability to convert the user's voice data into text data. Character string data can then be created without using the external speech recognition engine, which, for example, shortens the processing time.
(3) In the first embodiment, initial settings are made from the setting screen, but they may instead be registered from outside via the cloud server using a terminal device such as a smartphone. This makes setting easier and quicker, particularly for people accustomed to such terminal devices.
(4) The first to third motors 23 to 25 may be replaced by drive means other than servomotors, for example other types of motor or hydraulic cylinders.
(5) In the first embodiment, the character string data is speech-synthesized inside the robot 1. Using an internal speech synthesis engine in this way is advantageous because less data is exchanged than when voice data itself is exchanged with the server. However, the cloud server may instead speech-synthesize the response data (character string data) obtained with the dialogue engine and return the resulting voice data to the robot 1.
(6) In the first embodiment, the setting screen is reached from the setting button object 35 on the chat screen S2, but it may instead be reached by a slide operation on the touch panel unit 7.

（７）「ハ．通常対話における特別な所作」においてはビルトインシナリオにおいても同様にユーザーの発話データ内に特定の言葉が含まれていると判断した場合に特別な所作を実行させるようにしてもよい。例えばコントローラＭＣはユーザーの発話データ内に特定の言葉が含まれていると判断すると上記と同様に特別な所作をさせるように制御してもよい。
（８）「ハ．通常対話における特別な所作」においてロボット１の形状が異なれば更に異なるジェスチャーをさせるように制御してもよい。例えば、コントローラＭＣはロボット１に手や足があればそれらを駆動手段を制御して動かすようにしてもよい。
（９）顔画面Ｓ１の目オブジェクト２７のアニメーションとして、ときどき、瞬きさせるようなアニメーションを入れてもよい。例えば図９（ａ）の目オブジェクト２７の状態から図７の閉じた状態の目オブジェクト２７を挿入するような制御とすることで実行させる。そのようにすれば、ロボット１が実際に本当にこちらを見ているようなリアル感が創出されることとなりロボット１との対話をより楽しむことができる。
（１０）顔認識モードでは、登録されていないユーザーであれば不特定の人物として認識するようにしていたが、その状態から設定画面に移行して新たな別のユーザーとして認証登録するようにすると、便利である。
（１１）「Ｃ．留守設定時の動作について」において、留守設定モードをスマートフォンから設定できるようにすると便利である。
（１２）「Ｃ．留守設定時の動作について」において、コントローラＭＣは、ロボット１の周囲になんらかの動く物体が存在することで、この状態をユーザーにｅメールによって報知をするような処理をするが、逆に一定間隔で動いていることを認識し、一定時間内に動く物体がない場合にこの状態をユーザーにｅメールによって報知をするような処理を設けてもよい。
例えば、病人や介護対象者がある場合にその近くにロボット１を置くことで常に動きがあることを前提とした見守りをすることができる。 (7) The special actions of "C. Special actions in normal dialogue" may likewise be executed in a built-in scenario when it is determined that the user's speech data contains a specific word. For example, when the controller MC determines that the user's speech data contains a specific word, it may perform control to cause a special action in the same way as described above.
(8) In "C. Special actions in normal dialogue", if the robot 1 has a different shape, it may be controlled to make still other gestures. For example, if the robot 1 has hands or feet, the controller MC may control the drive means to move them.
(9) An occasional blinking animation may be added to the animation of the eye object 27 on the face screen S1. This is done, for example, by control that inserts the closed eye object 27 of FIG. 7 after the eye object 27 in the state of FIG. 9(a). This creates a realistic impression that the robot 1 is actually looking at the user, making dialogue with the robot 1 more enjoyable.
(10) In the face recognition mode, an unregistered user is recognized as an unspecified person. It is convenient if the user can move from that state to the setting screen and be registered for authentication as a new user.
(11) In "C. Operation in the absence setting", it is convenient if the absence setting mode can be set from a smartphone.
(12) In "C. Operation in the absence setting", the controller MC notifies the user by e-mail when some moving object is present around the robot 1. Conversely, processing may be provided in which the controller MC checks for movement at regular intervals and notifies the user by e-mail when no moving object is detected within a certain time.
For example, when there is a sick person or a person receiving care, placing the robot 1 nearby makes it possible to watch over that person on the premise that there is always some movement.
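Item (12) above inverts the absence-mode logic: the alert is raised when movement stops rather than when it starts. A minimal sketch of that check follows, assuming that motion events are available as timestamps in seconds and that one alert is raised per silent gap. The function name and parameters are hypothetical.

```python
def inactivity_alerts(motion_times, start, end, timeout):
    """Return the times at which a "no movement" e-mail would be sent.

    An alert is raised `timeout` seconds after the last observed movement
    whenever the gap to the next movement (or to `end`) exceeds `timeout`.
    """
    alerts = []
    last = start
    for t in sorted(motion_times) + [end]:
        if t - last > timeout:
            alerts.append(last + timeout)
        last = t
    return alerts
```

With movements at 10, 20, and 100 seconds, a 30-second timeout, and an observation period from 0 to 130 seconds, the only alert falls at 50 seconds, during the long gap between 20 and 100.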

＜実施の形態２＞
次に、実施の形態２について説明する。
上記実施の形態１のロボット１の高輝度白色ＬＥＤ９に変えて、あるいはこれに併設した赤外線ＬＥＤを備えるようにしてもよい。この際に顔認識用カメラ１０のモジュールに赤外線フィルタが備えられていれば取り外す。赤外線ＬＥＤは人には見えないため、夜間の空き巣等の侵入者があった場合に、高輝度白色ＬＥＤ９が点灯することで驚いて侵入者に逃げられてしまう可能性がある。一方、赤外線ＬＥＤであると撮影されていることがわかりにくいので侵入者は逃げず、そのため侵入の画像を確認したり、保存したりすることが可能となる。 Second Embodiment
Next, the second embodiment will be described.
Instead of the high-intensity white LED 9 of the robot 1 according to the first embodiment, or in addition to it, an infrared LED may be provided. In that case, if the module of the face recognition camera 10 has an infrared filter, the filter is removed. When an intruder such as a burglar enters at night, the lighting of the high-intensity white LED 9 may startle the intruder into fleeing. Infrared light, by contrast, is invisible to people, so the intruder is unlikely to notice being photographed and does not flee, which makes it possible to view and save images of the intrusion.

＜実施の形態２の変形例１＞
次に、実施の形態２の変形例１について説明する。
ロボット１が赤外線ＬＥＤを備えた場合に、この赤外線ＬＥＤを利用して赤外線リモコン信号受信部を備えた室内の各種装置の制御をするようにしてもよい。各種装置としては、例えばテレビ、オーディオ装置、エアコン装置等がよい。
ロボット１は赤外線リモコン信号受信部を備えた装置の赤外線リモコン信号受信部を直接見通せる場所に設置することがよい。実施の形態２の変形例１ではコントローラＭＣは顔認識用カメラ１０を使用した形状認識に関する形状認識プログラムを備えており、例えばテレビであればその形状の特徴（四角、大きい、黒い等）に基づいて認識することができる。ロボット１はユーザーの各種装置へ赤外線リモコン信号を出力するためのトリガーとしての例えば「テレビのスイッチ付けて」のような発話があると、その発話に基づいて第１のモータ２３又は第３のモータ２５を制御してロボット１を顔認識用カメラ１０を上下に顔５を首振りさせながら、３６０度回転させて周囲を撮影させ、形状認識プログラムによってテレビの形状を認識させるように動作させる。
コントローラＭＣがテレビがあると判断すると、その方向を記憶させると同時に赤外線ＬＥＤからその物体方向に赤外線リモコン信号を出力させて、テレビを動作させるＯＮ・０ＦＦ等の制御を実行させる。次のテレビについてのトリガーがあった際にはまず、その方向において形状認識を実行する。赤外線リモコン信号は単にＯＮ・０ＦＦのスイッチング制御だけではなく、例えばテレビであればチャンネルの変更、例えばエアコンであれば温度調整等にも対応するように赤外線周波数を変更して制御することが可能となる。このような細かな制御では複数種類の周波数の異なる赤外線リモコン信号が必要となるが、赤外線の周波数の設定は、例えば図１２はユーザーのスマートフォン４１を使用してサーバー経由で行うようにするとよい。
また、形状認識プログラムによって方向を探さなくとも、スマートフォン４１経由で顔認識用カメラ１０を操作してその向きを変えることで各種装置の方向を取得し、その方向を登録するようにしてもよい。 <Modification 1 of Embodiment 2>
Next, a first modification of the second embodiment will be described.
When the robot 1 includes an infrared LED, the infrared LED may be used to control various devices in the room that have an infrared remote control signal receiver. Examples of such devices are a television, an audio device, and an air conditioner.
The robot 1 is preferably installed where it has a direct line of sight to the infrared remote control signal receiver of each device. In Modification 1 of the second embodiment, the controller MC has a shape recognition program that uses the face recognition camera 10; a television, for example, can be recognized from the features of its shape (rectangular, large, black, etc.). When the user makes an utterance that serves as a trigger for outputting an infrared remote control signal to a device, such as "Turn on the TV", the robot 1 controls the first motor 23 or the third motor 25 based on that utterance, rotates through 360 degrees while swinging the face 5 (and with it the face recognition camera 10) up and down to photograph the surroundings, and has the shape recognition program recognize the shape of the television.
When the controller MC determines that a television is present, it stores the direction and at the same time has the infrared LED output an infrared remote control signal toward the object to execute control such as turning the television ON or OFF. The next time a trigger for the television occurs, shape recognition is first performed in the stored direction. The infrared remote control signal is not limited to ON/OFF switching; by changing the infrared frequency it can also handle, for example, channel changes for a television or temperature adjustment for an air conditioner. Such fine control requires several kinds of infrared remote control signals with different frequencies. The infrared frequencies may be set, for example as in FIG. 12, via the server using the user's smartphone 41.
Alternatively, instead of searching for the direction with the shape recognition program, the user may operate the face recognition camera 10 via the smartphone 41 to change its orientation, thereby acquiring the direction of each device, and register that direction.
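The "remember the direction, check it first next time" behavior can be sketched as follows. This is a minimal sketch under stated assumptions: `find_device`, the `cache` dictionary, the list of scan angles, and the `looks_like` callback (standing in for the shape recognition program) are all hypothetical names, and angles are treated as plain integers in degrees.

```python
from typing import Callable, Dict, Iterable, Optional


def find_device(
    name: str,
    cache: Dict[str, int],
    scan_angles: Iterable[int],
    looks_like: Callable[[str, int], bool],
) -> Optional[int]:
    """Return the angle at which the device is recognized, trying the stored angle first."""
    remembered = cache.get(name)
    # Probe the remembered direction first, then fall back to a full sweep.
    candidates = [] if remembered is None else [remembered]
    candidates += [a for a in scan_angles if a != remembered]
    for angle in candidates:
        if looks_like(name, angle):
            cache[name] = angle  # store (or refresh) the direction for the next trigger
            return angle
    return None
```

Registering a direction from the smartphone, as in the last paragraph above, would amount to writing the angle into `cache` directly.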

＜実施の形態３＞
次に、実施の形態３について説明する。
実施の形態３では音声認識エンジンを搭載したサーバーを使用する際の例えば対話ＡＰＩの利用等の接続に伴うランニングコストを削減することを主眼とした制御について説明する。
また、実施の形態３の対話プログラムはマイクロフォン１２から取得した音声の無音状態を検知できるサブプログラムを含んでいる。また、対話プログラムはユーザーの発話の音声データをマイクロフォン１２から取得して一旦録音し、サーバーに出力させるための録音・出力サブプログラムを含んでいる。
コントローラＭＣは発話があった場合には直ちにサーバーに接続させず、ユーザーの発話の音声データをまず一旦録音し、録音したユーザーの発話の音声データの無音状態を検出した段階で初めてサーバーに接続してその録音した音声データを出力し、対話エンジンでの返答データを作成させるようにする。このようにすれば常にサーバーに接続されているわけではなく、無音時間を含んだ長時間をサーバーに接続する必要がないため、無音の接続時間をカットすることができる。
音声認識エンジンはユーザーの発話中において１つのプロセスがそのユーザーに専有されることとなる。つまり、1つの処理に「何秒」というコンピュータとしては非常に長い時間が専有されることとなり、結果として音声認識エンジンを使用するユーザーのコストの負担が大きくなってしまうが、実施の形態３のようにすれば単位ユーザー当たりに必要なプロセスを減らすことができ、ユーザーのコスト削減に寄与する。 Embodiment 3
Next, the third embodiment will be described.
In the third embodiment, control will be described focusing on reducing running costs associated with connection such as use of the dialog API when using a server equipped with a speech recognition engine.
The dialogue program of the third embodiment includes a subprogram that can detect a silent state in the audio acquired from the microphone 12. The dialogue program also includes a recording/output subprogram that acquires the audio data of the user's utterance from the microphone 12, records it temporarily, and outputs it to the server.
When an utterance occurs, the controller MC does not connect to the server immediately. It first records the audio data of the user's utterance, and only when it detects a silent state in the recorded audio does it connect to the server, output the recorded audio data, and have the dialogue engine create the response data. The robot is therefore not connected to the server at all times, and there is no need to hold a connection for a long period that includes silence, so silent connection time can be cut.
A speech recognition engine dedicates one process to a user for as long as that user is speaking. In other words, a single operation occupies a process for several seconds, which is a very long time for a computer, and as a result the cost burden on users of the speech recognition engine grows. With the approach of the third embodiment, the number of processes required per user can be reduced, which contributes to lowering the user's cost.
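The record-first, connect-on-silence flow might look like the sketch below. It assumes audio arrives as frames of integer samples and that silence means a run of frames whose peak stays under a threshold; the function name `record_then_send`, the threshold of 500, and the three-frame run are illustrative assumptions, and `send` stands in for whatever call uploads the recording.

```python
from typing import Callable, Iterable, List, Sequence


def record_then_send(
    frames: Iterable[Sequence[int]],
    send: Callable[[List[Sequence[int]]], None],
    silence_level: int = 500,   # assumed peak amplitude below which a frame counts as silent
    silence_frames: int = 3,    # assumed run of silent frames that ends an utterance (>= 1)
) -> int:
    """Buffer an utterance locally and call send() once per utterance; return the call count."""
    buffer: List[Sequence[int]] = []
    quiet_run = 0
    connections = 0
    for frame in frames:
        loud = max((abs(s) for s in frame), default=0) >= silence_level
        if not buffer and not loud:
            continue  # nothing has been said yet, so stay offline
        buffer.append(frame)
        quiet_run = 0 if loud else quiet_run + 1
        if quiet_run >= silence_frames:
            send(buffer[:-quiet_run])  # upload the utterance without its trailing silence
            connections += 1
            buffer, quiet_run = [], 0
    return connections
```

In this sketch a recording that is still open when the input ends is not sent; a real device would keep listening rather than stop.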

＜実施の形態３の変形例１＞
次に、実施の形態３の変形例１について説明する。
実施の形態３の変形例１でも音声認識エンジンを搭載したサーバーを使用する際の接続のランニングコストを削減することを主眼とした制御について説明する。ユーザーの発話が開始されるまでにタイムラグが発生することや、ユーザーの発話待ちの状態で結局ユーザーが発話せずタイムアウトでサーバーとの接続を終了する場合があると、音声認識サーバーはプロセスを消費してしまうのでユーザーのコストがかかってしまう。
実施の形態３の変形例１のロボット１の対話プログラムは発話の音声データをマイクロフォン１２から取得して一旦録音し、サーバーに出力させるための録音・出力サブプログラムを含んでいる。また、対話プログラムには録音された音声データの音圧レベルを検出し、出力するサブプログラムを含んでいる。
コントローラＭＣは自然対話の状態で常時録音されている音声データが一定音圧以上のである場合にサーバーに接続させ、録音中のデータを追っかけ再生するようにする。つまり、無音、あるいは音声認識ができないような小さな発話を無視し、対話可能な発話があった場合だけサーバーに接続して音声データを出力し、サーバー側に音声認識エンジンで返答データを作成させるようにする。
これによって、発話待ちの無駄な接続時間をなくすことが可能となる。 <Modification 1 of Embodiment 3>
Next, a first modification of the third embodiment will be described.
Modification 1 of the third embodiment likewise describes control aimed mainly at reducing the running cost of connections when using a server equipped with a speech recognition engine. If there is a time lag before the user starts speaking, or if the robot waits for the user to speak and the user never does, so that the connection to the server ends on a timeout, the speech recognition server still consumes a process, and this adds to the user's cost.
The dialogue program of the robot 1 according to Modification 1 of the third embodiment includes a recording/output subprogram that acquires the audio data of an utterance from the microphone 12, records it temporarily, and outputs it to the server. The dialogue program also includes a subprogram that detects and outputs the sound pressure level of the recorded audio data.
In the natural-dialogue state the audio is recorded continuously. The controller MC connects to the server when the recorded audio data reaches or exceeds a certain sound pressure, and plays back the recording in progress in catch-up (time-shifted) fashion. That is, silence and utterances too quiet to be recognized are ignored; only when an utterance suitable for dialogue occurs does the robot connect to the server, output the audio data, and have the speech recognition engine on the server side create the response data.
This makes it possible to eliminate useless connection time waiting for speech.
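One way to picture the sound-pressure gate with catch-up playback is sketched below. The assumptions are stated plainly: samples are 16-bit integers, the level is measured as RMS in dB relative to full scale, and "catch-up" is modeled as a short pre-roll buffer that is sent first once the threshold is reached. The names `level_db` and `gate`, the threshold of -30 dB, and the pre-roll length are illustrative only.

```python
import math
from collections import deque
from typing import Deque, Iterable, Iterator, Sequence


def level_db(frame: Sequence[int]) -> float:
    """RMS level of a frame of 16-bit samples, in dB relative to full scale."""
    if not frame:
        return -math.inf
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20 * math.log10(rms / 32768) if rms > 0 else -math.inf


def gate(
    frames: Iterable[Sequence[int]],
    threshold_db: float = -30.0,  # assumed "certain sound pressure"
    preroll: int = 2,             # assumed number of recent frames kept for catch-up
) -> Iterator[Sequence[int]]:
    """Yield the frames that would be uploaded; nothing is yielded while the input stays quiet."""
    recent: Deque[Sequence[int]] = deque(maxlen=preroll)
    connected = False
    for frame in frames:
        if not connected and level_db(frame) >= threshold_db:
            connected = True
            yield from recent  # catch up on what was recorded just before the trigger
        if connected:
            yield frame
        else:
            recent.append(frame)
```

The sketch stays connected once triggered; ending the connection again after the utterance would follow the silence detection of the third embodiment.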

＜実施の形態４＞
次に、実施の形態４について説明する。
音声認識サーバーは高価であるため、あらかじめ十分なリソースを用意することができないことがあり、サーバーリソースに余裕がない場合、端末が音声認識サーバーに接続しようとした場合にサーバーがビジー状態であることがある。
上記通常対話においては、サーバーに発話データ（音声データ）を送信し返答データの作成をリクエストする。とサーバーは発話データに基づいて文字列データ化された返答データを作成してレスポンスする。ところが、ビジー状態であると返答データがされず、エラーになってしまうことがある。サーバーからエラーメッセージが返信されることとなる。
ロボット１のコントローラＭＣはサーバーからエラーメッセージの送信を受けた場合にサーバー接続エラーである旨の発話をユーザーにせずに、ビルトインシナリオから対話を続けられるような返信を音声出力するようにする。例えば「もう一度言って」とか「うんうん」とか「なんだっけ？」というような曖昧な返答したり、適当な相槌を返すなどしてサーバーが空くのを待つ処理をするとよい。 Fourth Preferred Embodiment
Next, the fourth embodiment will be described.
Speech recognition servers are expensive, so it is not always possible to provision sufficient resources in advance. When server resources have no headroom, the server may be busy when a terminal tries to connect to the speech recognition server.
In the normal dialogue described above, the robot sends utterance data (audio data) to the server and requests creation of response data. The server then creates response data, converted into character string data, based on the utterance data and returns it as the response. If the server is busy, however, the response data may not be returned and an error results. In that case an error message is returned from the server.
When the controller MC of the robot 1 receives an error message from the server, it does not tell the user that a server connection error has occurred. Instead, it outputs by voice a reply from the built-in scenario that lets the dialogue continue. For example, it may give a vague reply such as "Say that again", "Uh-huh", or "What was it again?", or return a suitable back-channel response, and so wait for the server to become free.
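The fallback to a built-in reply can be sketched as a thin wrapper around the server request. Everything named here is an assumption for illustration: `ServerBusyError` stands in for however the error message is surfaced, `request` stands in for the call to the server, and the filler texts are English renderings of the examples above.

```python
import random
from typing import Callable

# English renderings of the example filler replies; the built-in scenario would hold the originals.
FILLERS = ["Say that again", "Uh-huh", "What was it again?"]


class ServerBusyError(Exception):
    """Hypothetical exception raised when the server answers with an error message."""


def reply(request: Callable[[bytes], str], audio: bytes, rng=random) -> str:
    """Return the server's reply, or a vague built-in reply if the server reports an error."""
    try:
        return request(audio)
    except ServerBusyError:
        # Do not announce the error; keep the conversation going instead.
        return rng.choice(FILLERS)
```

Because the filler invites the user to speak again, the next request doubles as the retry.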

＜実施の形態４の変形例１＞
次に、実施の形態４の変形例１について説明する。
ユーザー側の発話が長すぎると、音声認識エンジンが誤認識をする可能性がある。そのため、認識したユーザーの発話が長すぎると判断した場合に、音声認識エンジンを備えるサーバーに接続することなく記憶手段に記憶された音声データから選択された対話例を音声出力する機能を備えることがよい。
ロボット１のコントローラＭＣは、ユーザーの発話データが一定以上の長さになったと判断した場合には、サーバーに接続させることなくビルトインシナリオから「うん」とか「マジ？」とか「本当ですか？」などという対話においてどのようにも取れる相づちのような発話を音声出力する。発話データは音声データのままでもよく、コントローラＭＣあるいはサーバーで文字列データに変換された後のものでもよい。
これによって的外れな言葉が返ってくることを防止し、対話を仕切り直しして改めてユーザーに対話を促すようにすることができる。 <Modification 1 of Embodiment 4>
Next, a first modification of the fourth embodiment will be described.
If the user's utterance is too long, the speech recognition engine may misrecognize it. It is therefore preferable to provide a function that, when the recognized user utterance is judged to be too long, outputs by voice a dialogue example selected from the audio data stored in the storage means, without connecting to the server equipped with the speech recognition engine.
When the controller MC of the robot 1 determines that the user's utterance data has reached or exceeded a certain length, it does not connect to the server. Instead, it outputs by voice a noncommittal back-channel utterance from the built-in scenario, such as "Uh-huh", "Seriously?", or "Really?", that can be taken any way in the dialogue. The utterance data may be the audio data as is, or the data after conversion to character string data by the controller MC or the server.
This prevents an off-target reply from coming back, resets the dialogue, and prompts the user to speak again.

＜実施の形態５＞
次に、実施の形態５について説明する。
実施の形態５では複数の音声認識エンジンを組み合わせて利用する場合について説明する。
音声認識エンジンにはローカル（つまり、ネットサーバーに接続せずに装置内で処理する場合）の音声認識エンジンと、ネットサーバーに接続してリクエストによって作成した対話データをレスポンスするクラウドの音声認識エンジンがある。ローカルにもクラウドにもそれぞれ複数種類の音声認識エンジンがあり、無料のものも有料のものもある。そのため、これら異なる音声認識エンジンを備えるサーバーを利用する際に料金が無料のサーバーと有料のサーバーをミックスして利用するようにする。
ロボット１がインターネット回線を利用して接続されるクラウドサーバーでは、対話モードにおいて、ロボット１から発話データがリクエスト発行され音声認識エンジンに返答データを作成させる際に、例えばクラウドサーバーは次のように対応することがよい。
（１）月あたり設定したある時間Ａまでは有料の対話ＡＰＩにアクセスする。
（２）月あたり設定したある時間Ａからある時間Ｂまでは有料のＡＰＩと無料の音声認識エンジンを混ぜて使う。例えば最初の連続数回の認識は有料の音声認識エンジンのサーバーを使いその後の連続した認識には無料の音声認識エンジンのサーバーを使うなどミックスして使う。
（３）月あたり設定したある時間Ｂを超えた場合、無料の音声認識エンジンのみを使う。これは一例であって、例えば月あたり設定したある時間Ａを越えた場合に直ちに無料の音声認識エンジンのみを使うような設定でもよい。
このようにすれば、有料の範囲を大きく越えずに対話をすることができる。ロボット１と接続されているクラウドサーバーがこのような処理を実行するプログラムに基づいて有料と無料とを月あたり設定した時間に基づいて計算してロボット１からのリクエスト発行を処理する。
ロボット１自体がこのような処理を実行して、リクエスト発行の際にクラウドサーバーに対して有料の対話ＡＰＩを使用するか、無料のサーバーの音声認識エンジンを使用するかの命令をするようにしてもよい。
＜実施の形態５−１＞
実施の形態５の複数の音声認識エンジンを組み合わせて利用する場合は複数の対話エンジンを組み合わせる場合についても同様である。 The Fifth Preferred Embodiment
Next, the fifth embodiment will be described.
In the fifth embodiment, a case where a plurality of speech recognition engines are combined and used will be described.
Speech recognition engines include local engines (that is, processing is done inside the device without connecting to a network server) and cloud engines, which are reached by connecting to a network server and return, as a response, dialogue data created in reply to a request. There are multiple kinds of speech recognition engine both locally and in the cloud, some free and some paid. Accordingly, when servers with these different speech recognition engines are used, free servers and paid servers are used in combination.
In dialogue mode, when the robot 1 issues a request containing utterance data and a speech recognition engine is made to create response data, the cloud server to which the robot 1 is connected over the Internet may, for example, respond as follows.
(1) Up to a certain time A set per month, the paid dialogue API is accessed.
(2) From the time A to a certain time B set per month, the paid API and a free speech recognition engine are used together. For example, the first several consecutive recognitions use the server of the paid speech recognition engine, and the subsequent consecutive recognitions use the server of a free speech recognition engine.
(3) Once the time B set per month is exceeded, only the free speech recognition engine is used. This is only an example; the setting may instead be such that only the free speech recognition engine is used as soon as the time A set per month is exceeded.
In this way, dialogue is possible without greatly exceeding the paid allowance. The cloud server connected to the robot 1 runs a program that performs this processing, decides between paid and free based on the times set per month, and processes the requests issued by the robot 1.
Alternatively, the robot 1 itself may perform this processing and, when issuing a request, instruct the cloud server whether to use the paid dialogue API or the speech recognition engine of a free server.
Embodiment 5-1
The combined use of multiple speech recognition engines described in the fifth embodiment applies equally to combining multiple dialogue engines.
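The three-tier rule with monthly times A and B can be written as a small selection function. This is a sketch under assumptions: usage is tracked in seconds, `streak` counts the consecutive recognitions already made in the current conversation, and the cutoff of three paid recognitions in the mixed tier is an arbitrary stand-in for "the first several". The name `choose_engine` is hypothetical.

```python
def choose_engine(
    used_s: float,      # paid usage so far this month, in seconds
    limit_a_s: float,   # monthly time A
    limit_b_s: float,   # monthly time B (assumed to be >= A)
    streak: int,        # consecutive recognitions already done in this conversation
    paid_streak: int = 3,  # assumed number of leading recognitions served by the paid engine
) -> str:
    """Return "paid" or "free" following tiers (1) to (3)."""
    if used_s <= limit_a_s:
        return "paid"                                   # tier (1): paid only
    if used_s <= limit_b_s:
        return "paid" if streak < paid_streak else "free"  # tier (2): mixed use
    return "free"                                       # tier (3): free only
```

Setting `limit_b_s` equal to `limit_a_s` gives the variant mentioned in (3), where the switch to the free engine happens as soon as time A is exceeded.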

＜実施の形態６＞
次に、実施の形態６について説明する。
ある１つの決まった対話エンジンを使うだけでは、返答がきまったパターンになってしまいユーザーがロボット１との会話に飽きてしまう可能性がある。そのため、実施の形態６ではこれを解消するため複数の対話エンジンの出力の結果を用いて会話に飽きないようにその結果をアレンジするための処理を説明する。
対話エンジンはクラウドの対話エンジンだけではなく、ロボット１内のローカルな対話エンジンを使用してもよい。
この処理は複数の返答データを送信されたロボット１側で行ってもよく、対話エンジンを備えたいくつものサーバーからの返答データを取得した際にクラウドサーバー側で行ってもよい。
（１）雑談対話エンジンのうち文字列の文字数の最も長い返答をしてきたエンジンの結果を出力する。
最も長い返答とすると、いかにも対話しているように感じ、対話の単調さがなくなり、ユーザーは対話を楽しむことができる。
Ａ．例えば、「腹減った」とユーザーが発話した場合に、ａ〜ｃの３つのエンジンからの回答が「ａエンジン：よく間食をしますか?」「ｂエンジン：ご飯食べてないの？」「ｃエンジン：なんか食え」である場合に、ａエンジンを採用してその返答データを出力する。
Ｂ．例えば、「今日の天気は晴れ」とユーザーが発話した場合に、ａ〜ｃの３つのエンジンからの回答が「ａエンジン：今すぐお空に行って確認してきます」「ｂエンジン：快晴っぽい？」「ｃエンジン：晴れか雨かで、その日の気分が決まることがあるよね。」である場合に、ｃエンジンを採用してその返答データを出力する。 Embodiment 6
A sixth embodiment will now be described.
If only one fixed dialogue engine is used, the replies fall into a set pattern and the user may become bored of conversing with the robot 1. To address this, the sixth embodiment describes processing that uses the outputs of multiple dialogue engines and arranges them so that the conversation does not become tiresome.
The dialogue engines may include not only cloud dialogue engines but also a local dialogue engine inside the robot 1.
This processing may be performed on the robot 1 side, after multiple pieces of response data have been sent to it, or on the cloud server side, when the cloud server has obtained response data from several servers equipped with dialogue engines.
(1) Among the chat dialogue engines, output the result of the engine that returned the reply with the largest number of characters.
Choosing the longest reply makes the exchange feel like a real conversation and removes monotony, so the user can enjoy the dialogue.
A. For example, when the user says "I'm hungry" and the replies from three engines a to c are "engine a: Do you often snack between meals?", "engine b: Haven't you eaten?", and "engine c: Eat something", engine a is adopted and its response data is output.
B. For example, when the user says "The weather is sunny today" and the replies from three engines a to c are "engine a: I'll go up to the sky right now and check", "engine b: Looks like clear skies?", and "engine c: Whether it's sunny or rainy can decide your mood for the day, can't it.", engine c is adopted and its response data is output.

（２）雑談対話エンジンのうち、語尾に「？」がついているものを最後に持ってきて出力する。このとき「？」がついている回答が複数あればそれらを連続して出力する。
語尾に疑問符がつくと、その疑問に更に答えるような話の流れになるため、会話が続きやすくなりユーザーは対話を楽しむことができる。
例えば、上記（１）Ａ．の選択肢では「よく間食をしますか?ご飯食べてないの？」と出力する。また、上記（１）Ｂ．の選択肢であれば「快晴っぽい？」と出力する。
（３）肯定文を組み合わせた後、疑問文を組み合わせて出力する。
このようにアレンジすることでいかにも考えて文章を練ったような応答になるため、ユーザーは真剣に自身の発話を聞いてもらっているような感覚となり、続けて会話をしたいと思うようになるため、会話が続きやすくなりユーザーは対話を楽しむことができる。また、出力尺をかせぐことができるとともに人への返答を求めることができる。
例えば、上記（１）Ｂ．のような返答データが取得された場合「今すぐお空に行って確認してきます。晴れか雨かで、その日の気分が決まることがあるよね。快晴っぽい？」出力する。
(2) Among the replies from the chat dialogue engines, those ending in "?" are placed last in the output. If more than one reply ends in "?", they are output consecutively.
When a reply ends with a question mark, the conversation naturally moves on to answering that question, so the dialogue continues more easily and the user can enjoy it.
For example, with the candidates in (1) A above, the output is "Do you often snack between meals? Haven't you eaten?". With the candidates in (1) B above, the output is "Looks like clear skies?".
(3) Combine the affirmative sentences first, then append the interrogative sentences, and output the result.
Arranging the replies this way produces a response that seems carefully thought out, so the user feels genuinely listened to and wants to keep talking; the conversation continues more easily and the user can enjoy the dialogue. It also lengthens the output and invites a reply from the person.
For example, when response data like that in (1) B above is obtained, the output is "I'll go up to the sky right now and check. Whether it's sunny or rainy can decide your mood for the day, can't it. Looks like clear skies?".
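Rules (1) to (3) can be sketched with a few list operations. The sketch uses the original Japanese replies from examples A and B so that the character counts match the source; the function names are hypothetical, and treating a reply as a question when it ends in "?" or "？" is an assumption about how "ending in a question mark" would be tested. Functions return lists so that punctuation between joined replies is left to the caller.

```python
from typing import List


def is_question(text: str) -> bool:
    """True if the reply ends with an ASCII or full-width question mark."""
    return text.rstrip().endswith(("?", "？"))


def longest(replies: List[str]) -> str:
    """Rule (1): the reply with the most characters."""
    return max(replies, key=len)


def questions_only(replies: List[str]) -> List[str]:
    """Rule (2), as in the source examples: the replies ending in '?', in their original order."""
    return [r for r in replies if is_question(r)]


def statements_then_questions(replies: List[str]) -> List[str]:
    """Rule (3): affirmative sentences first, interrogative sentences after them."""
    statements = [r for r in replies if not is_question(r)]
    return statements + questions_only(replies)


# Replies of engines a, b, c from examples A and B.
A = ["よく間食をしますか?", "ご飯食べてないの？", "なんか食え"]
B = ["今すぐお空に行って確認してきます", "快晴っぽい？", "晴れか雨かで、その日の気分が決まることがあるよね。"]
```

With these inputs `longest` picks engine a for A and engine c for B, matching the adopted engines stated in (1).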

（４）他よりも話題の転換をより頻繁にしてくるエンジンからの結果を、他のエンジンの結果よりも後に持ってきて出力する。
このようにアレンジすることで話題転換したことで次の発話を誘うような対話となり、対話が続きやすくなる。
Ａ．例えば、「どーもどーも」とユーザーが発話した場合に、ａ〜ｃの３つのエンジンからの回答が「ａエンジン：だょね〜」「ｂエンジン：そうですね」「ｃエンジン：野球は見たりしますか？」である場合に、ｃエンジンのデータを最後にして「だょね〜そうですね野球は見たりしますか？」と出力する。
Ｂ．例えば、「なかなか見つからないね」とユーザーが発話した場合に、ａ〜ｃの３つのエンジンからの回答が「ａエンジン：その通りですね」「ｂエンジン：あるあるー」「ｃエンジン：ご家族は何人ですか？」である場合に、ｃエンジンのデータを最後にして「その通りですねあるあるーご家族は何人ですか？」と出力する。
（５）他よりもよりフレンドリーな返答をしてくるエンジンをまず真っ先に出力して、その後に他のエンジンからの返答をくっつけて出力する。
このようにアレンジすることでユーザーが対話に引き込まれやすくなり、対話が続きやすくなる。フレンドリーかどうかは言葉（単語）に相対的な序列化をすることでどの位置に配置するかを決定することができる。
Ａ．例えば、上記（４）Ｂ．の場合ではｂエンジンの結果を最初にして「あるあるーその通りですねご家族は何人ですか？」と出力する。
Ｂ．例えば、「なるほどね」とユーザーが発話した場合に、ａ〜ｂの２つのエンジンからの回答が「ａエンジン：あら適当な相槌ですね」「ｂエンジン：うむ」である場合に、ｂエンジンのデータを最後にして「ほほほーあら適当な相槌ですねうむ」と出力する。
(4) The result from an engine that changes the topic more often than the others is placed after the results of the other engines in the output.
With this arrangement the change of topic invites the next utterance, so the dialogue continues more easily.
A. For example, when the user says "Hi, hi" and the replies from three engines a to c are "engine a: I know, right", "engine b: That's true", and "engine c: Do you watch baseball?", the data of engine c is placed last and the output is "I know, right. That's true. Do you watch baseball?".
B. For example, when the user says "It's hard to find, isn't it" and the replies from three engines a to c are "engine a: That's exactly right", "engine b: That happens a lot", and "engine c: How many people are in your family?", the data of engine c is placed last and the output is "That's exactly right. That happens a lot. How many people are in your family?".
(5) The reply from the engine that answers in a friendlier manner than the others is output first, followed by the replies from the other engines.
This arrangement draws the user into the dialogue and makes it easier to continue. Friendliness is judged by assigning words a relative ranking, which determines where each reply is placed.
A. For example, in the case of (4) B above, the result of engine b is placed first and the output is "That happens a lot. That's exactly right. How many people are in your family?".
B. For example, when the user says "I see" and the replies from two engines a and b are "engine a: My, what a perfunctory response" and "engine b: Hmm", the data of engine b is placed last and the output is "Ho ho ho. My, what a perfunctory response. Hmm."

（６）（１）〜（５）の処理を任意に組み合わせる
これによって、対話のバリエーションが増えることとなるため、ユーザーが同じ発話をした場合でもまったく同じ応答が帰ってきてしまうことがなくなり、対話に飽きることがなく対話が続きやすくなる。
例えば「老後って何」とユーザーが発話した場合に、ａ〜ｃの３つのエンジンからの回答が「ａエンジン：ちょっと待ってくださいね」「ｂエンジン：サポートは嫌いじゃないよ」「ｃエンジン：今健康でいらっしゃいますか？」である場合に、最もフレンドリーなｂエンジンを最初にし、肯定文を組み合わせた後、疑問文を組み合わせ、「サポートは嫌いじゃないよちょっと待ってくださいね今健康でいらっしゃいますか？」と出力する。
(6) The processes of (1) to (5) are combined as desired.
This increases the variety of the dialogue, so even when the user says the same thing the robot no longer returns exactly the same response; the user does not tire of the dialogue and it continues more easily.
For example, when the user says "What is old age?" and the replies from three engines a to c are "engine a: Please wait a moment", "engine b: I don't dislike support", and "engine c: Are you in good health now?", the friendliest engine b is placed first, the affirmative sentences are combined, and then the interrogative sentence is appended, so that the output is "I don't dislike support. Please wait a moment. Are you in good health now?".

（７）テキスト出力用の場合のエンジンでは、カッコや顔文字が帰ってくることがあるため、これらが帰ってきた場合には音声出力を抑制して出力する。そして画面にはそれらは表示させる。
顔文字は音声出力できないが、表示部に敢えて顔文字を表示させることで、音声と併せて対話の一部とすることで通常にはない対話のおもしろさを創出することができる。
Ａ．例えば、「腹減った」とユーザーが発話した場合に、ａ〜ｃの３つのエンジンからの回答が「ａエンジン：(わざと無視)」「ｂエンジン：こんにちはお元気ですね」「ｃエンジン：こんにちは」である場合に、ａエンジンだけは音声出力させず、タッチパネル部７（表示画面）に表示させるようにする。
Ｂ．例えば、「元々入ってる」とユーザーが発話した場合に、ａ〜ｃの３つのエンジンからの回答が「ａエンジン：あなたはよくするんですか?「ｂエンジン：(´・ω・｀)」「ｃエンジン：夜型さんですか？」である場合に、ｂだけは音声出力させず、タッチパネル部７（表示画面）に表示させるようにする。
（８）同じ文字列が含まれる返答についてはいずれか１つを出力する。
同じ文字列が繰り返されると対話がくどくなってしまうし、聞き手に違和感を覚えさせてしまうためである。
例えば、「中華」とユーザーが発話した場合に、ａ〜ｃの３つのエンジンからの回答が「ａエンジン：あらっいいですねぇ」「ｂエンジン：うん、中華です。」「ｃエンジン：中華を食べに行くんでしょうか？」である場合に、ｂエンジンとｃエンジンには「中華」の文字列があるためいずれか一方のみ出力する。例えば「あらっいいですねぇ中華を食べに行くんでしょうか？」のように出力する。
(7) Engines intended for text output sometimes return parentheses or emoticons. When such content is returned, it is withheld from the voice output and is displayed on the screen instead.
Emoticons cannot be output as voice, but deliberately showing them on the display unit makes them part of the dialogue together with the voice and creates a kind of fun not found in ordinary dialogue.
A. For example, when the user says "I'm hungry" and the replies from three engines a to c are "engine a: (deliberately ignoring)", "engine b: Hello, you seem well", and "engine c: Hello", only the reply of engine a is not output as voice and is displayed on the touch panel unit 7 (display screen).
B. For example, when the user says "It's in there from the start" and the replies from three engines a to c are "engine a: Do you do it often?", "engine b: (´・ω・｀)", and "engine c: Are you a night owl?", only the reply of engine b is not output as voice and is displayed on the touch panel unit 7 (display screen).
(8) When several replies contain the same character string, only one of them is output.
This is because repeating the same string makes the dialogue tedious and feels unnatural to the listener.
For example, when the user says "Chinese food" and the replies from three engines a to c are "engine a: Oh, that sounds nice", "engine b: Yes, it's Chinese food.", and "engine c: Are you going out for Chinese food?", the replies of engine b and engine c both contain the string "Chinese food", so only one of them is output. The output is, for example, "Oh, that sounds nice. Are you going out for Chinese food?".
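Rules (7) and (8) can be sketched as two filters applied to the list of replies. Both tests are assumptions made for illustration: a reply is treated as display-only when it is wholly enclosed in parentheses (which covers the two source examples), and duplicates are detected against a given keyword, keeping the last reply that contains it as in the source output. The function names are hypothetical.

```python
from typing import List, Tuple


def split_for_output(replies: List[str]) -> Tuple[List[str], List[str]]:
    """Rule (7): return (spoken, shown); parenthesized replies are shown on screen only."""
    spoken: List[str] = []
    shown: List[str] = []
    for reply in replies:
        text = reply.strip()
        if text.startswith(("(", "（")) and text.endswith((")", "）")):
            shown.append(reply)   # e.g. stage directions or emoticons
        else:
            spoken.append(reply)
    return spoken, shown


def drop_repeats(replies: List[str], keyword: str) -> List[str]:
    """Rule (8): keep only one reply containing keyword (here the last one)."""
    holders = [i for i, r in enumerate(replies) if keyword in r]
    keep = holders[-1] if holders else None
    return [r for i, r in enumerate(replies) if keyword not in r or i == keep]
```

Applied to the "中華" example, `drop_repeats` keeps the replies of engines a and c, which is the combination shown in the source output.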

（９）語尾変換手段、例えば語尾変換ＡＰＩを使って統一感を出すようにする。このときすべての返答について語尾変換を行ってもよいが、最後に出力する文か最初に出力する文のいずれか一方にのみ語尾変換を行うようにしてもよい。
普通の対話エンジンの文章に比べて、より親しみやすい表現となるのでよい。
例えば、あるエンジンから「ねむいな」と返答データがあった場合に語尾を語尾変換ＡＰＩによって変換させて「ねむいニャ」というように出力する。
（１０）認識失敗に備えて、複数のエンジンから得た返答のうち一部のみを音声出力に利用し、残りの返答は保持しておき、次の音声認識に失敗したときや対話システムからの返答がなかったときは、その保持しておいた返答を返すようにする。
音声認識が失敗した場合や、外部サーバーからのレスポンスがなかなか来ない場合に使用することで、対話が途切れずにつなげることができ、自然な対話に寄与する
(9) A sentence-ending conversion means, for example a sentence-ending conversion API, is used to give the replies a unified feel. The conversion may be applied to every reply, or only to either the last or the first sentence output.
This is preferable because the result is a friendlier expression than the text of an ordinary dialogue engine.
For example, when an engine returns the response data "Nemui na" ("I'm sleepy"), the sentence-ending conversion API converts the ending and the output becomes "Nemui nya".
(10) In preparation for recognition failures, only some of the replies obtained from the multiple engines are used for voice output and the rest are retained. When the next speech recognition fails, or when no reply comes from the dialogue system, a retained reply is returned.
Using this when speech recognition fails or when the response from an external server is slow in coming keeps the dialogue from breaking off and contributes to a natural dialogue.
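The retention scheme in (10) can be sketched as a small first-in, first-out reserve. The class name `ReplyReserve`, the `speak_count` parameter, and the choice to hand out held replies oldest-first are assumptions for illustration.

```python
from collections import deque
from typing import Deque, Iterable, List, Optional


class ReplyReserve:
    """Speaks some of the replies now and holds the rest for later failures (hypothetical helper)."""

    def __init__(self) -> None:
        self._held: Deque[str] = deque()

    def take_turn(self, replies: Iterable[str], speak_count: int = 1) -> List[str]:
        """Return the replies to speak now; the remaining ones are retained."""
        items = list(replies)
        self._held.extend(items[speak_count:])
        return items[:speak_count]

    def fallback(self) -> Optional[str]:
        """Return a retained reply when recognition failed or no response arrived, else None."""
        return self._held.popleft() if self._held else None
```

A real implementation would probably also discard held replies once the topic has moved on, since a stale reply can be as off-target as no reply.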

＜実施の形態７＞
図１４に示すように、実施の形態７はロボット１の近傍にスマートスピーカ５１を配置し、ロボット１とスマートスピーカ５１を組み合わせた装置（システム）である。ロボット１とスマートスピーカ５１の間隔は互いのマイクロフォンで音が拾える程度の距離であって例えば１〜２ｍ以内に隣接配置されることがよい。
スマートスピーカ５１は無線ＬＡＮ装置を内蔵し、インターネットを使用した無線通信機能、電話回線接続機能等を有しネットワークモジュールが搭載されたネットワーク端末であり、マイクロフォンとスピーカ装置を備えた一種のコンピュータでもある。スマートスピーカ５１はスマートフォンのような端末装置を利用してサーバーを介して各種初期登録（例えば、使用者の名前、住所、電話番号、メールアドレス登録、複数の音声登録、ブルトゥースによるネットワーク対応のＡＩ機器の設定等）を実行し、音声登録した使用者からの発話（命令）によってインターネットに接続してサーバーの検索エンジンを使用して所定の処理を実行し、その結果をスピーカ装置から音声情報として出力する。 Seventh Embodiment
As shown in FIG. 14, the seventh embodiment is a device (system) in which a smart speaker 51 is placed near the robot 1 and the two are combined. The robot 1 and the smart speaker 51 are spaced so that each can pick up the other's sound with its microphone; for example, they are preferably placed next to each other within 1 to 2 m.
The smart speaker 51 is a network terminal with a built-in wireless LAN device, a wireless communication function using the Internet, a telephone line connection function and the like, and an on-board network module; it is also a kind of computer equipped with a microphone and a speaker device. Using a terminal device such as a smartphone, the smart speaker 51 performs various initial registrations via a server (for example, the user's name, address, telephone number and e-mail address, registration of multiple voices, and Bluetooth setup of network-compatible AI devices). In response to an utterance (command) from a user whose voice is registered, it connects to the Internet, executes predetermined processing using the search engine of a server, and outputs the result as audio information from the speaker device.

ロボット１はスマートスピーカ５１と連携することで互いの機能を補うことができる。具体的にはロボット１とスマートスピーカ５１とを音声をインターフェースとして次のような機能を奏する。
（１）スマートスピーカ５１へのロボット１からの指示機能
イ．例えば、ユーザーがスマートスピーカのスキルを起動するフレーズを喋ったとき、ロボットはそのフレーズの音声認識結果の文字列を記憶しておき、ロボットは自らその文字列を音声合成で所定のタイミングで喋るようにする。
実施の形態７ではロボット１のコントローラＭＣは、ユーザーの発話を周波数成分を分析して個人の声を識別する声識別プログラム、ユーザーの発話を個人毎に区別して舞う頃フォン１２によって取得し、文字列データとして記憶させ、その文字列データに基づいて音声合成してスピーカ装置１３から再生させるフレーズを録音・再生プログラムを備えている。
ロボット１は、例えば発話を記憶するトリガーとなる発話、例えば「今からしゃべるから、録音して」という発話の後の言葉を記憶する機能を有している。そして、ユーザーはこの機能を利用して、スマートスピーカ５１を、起動させたりなんらかの処理をさせるような言葉を記憶させるようにする。例えば「ＯＫ、×××。照明をつけて。」のような言葉がよい。このとき、ロボット１に登録されるユーザー個人の音声は、スマートスピーカ５１に登録されるユーザー個人の声である。
そして、ロボット１に所定のタイミングで発話させるようにする。所定のタイミングで発話させる設定は、例えばスマートフォンのような端末を操作して設定登録できる。
ロ．ロボット１の「所定のタイミングでの発話」としては、例えばロボット１がなんらかの変化を検知すること、例えばタッチパネル部７へのタッチ動作や、ドップラーセンサ２２による物体（人）の検知等である。
例えばロボット１のコントローラＭＣはドップラーセンサ２２によって人を検知した場合に「ＯＫ、×××。照明をつけて。」というようにスピーカ装置１３から音声出力をさせる。それを受けてスマートスピーカ５１はネットワーク対応しているＡＩ機器である室内の照明を点灯させるように制御する。尚、制御される照明は前もってスマートスピーカ５１によって制御される対象であるように登録されている。照明以外に例えば、エアコン、テレビ、カーテンの開閉装置等をＡＩ機器とすることがよい。
By cooperating, the robot 1 and the smart speaker 51 can complement each other's functions. Specifically, the robot 1 and the smart speaker 51 provide the following functions with voice as the interface.
(1) Function of giving instructions from the robot 1 to the smart speaker 51
a. For example, when the user speaks a phrase that launches a skill of the smart speaker, the robot stores the character string of the speech recognition result for that phrase, and later speaks that character string itself by speech synthesis at a predetermined timing.
In the seventh embodiment, the controller MC of the robot 1 has a voice identification program that analyzes the frequency components of the user's utterance to identify an individual's voice, and a phrase recording/playback program that acquires the user's utterances through the microphone 12 separately for each individual, stores them as character string data, and plays them back from the speaker device 13 by speech synthesis based on that character string data.
The robot 1 has a function of storing the words that follow an utterance serving as a trigger for storing, for example "I'm going to speak now, so record it". The user uses this function to have the robot store words that start the smart speaker 51 or make it perform some processing, for example "OK, ×××. Turn on the lights." Here, the individual user voice registered in the robot 1 is the voice of the same user registered in the smart speaker 51.
The robot 1 is then made to speak at a predetermined timing. The setting for speaking at a predetermined timing can be registered by operating a terminal such as a smartphone.
b. The "predetermined timing" for an utterance by the robot 1 is, for example, when the robot 1 detects some change, such as a touch operation on the touch panel unit 7 or detection of an object (person) by the Doppler sensor 22.
For example, when the controller MC of the robot 1 detects a person with the Doppler sensor 22, it has the speaker device 13 output "OK, ×××. Turn on the lights." by voice. In response, the smart speaker 51 turns on the room lighting, which is a network-compatible AI device. The lighting to be controlled is registered in advance as a target controlled by the smart speaker 51. Besides lighting, AI devices may include, for example, an air conditioner, a television, and a curtain opening/closing device.
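The pairing of a stored phrase with a triggering event can be sketched as a lookup table. The class name `PhraseTriggers`, the event labels, and the `speak` callback (standing in for speech synthesis through the speaker device 13) are assumptions; the wake phrase is kept as the elided "×××" of the source.

```python
from typing import Callable, Dict


class PhraseTriggers:
    """Maps detected events to stored phrases and speaks them (hypothetical helper)."""

    def __init__(self, speak: Callable[[str], None]) -> None:
        self._speak = speak                 # stands in for speech synthesis output
        self._phrases: Dict[str, str] = {}  # event label -> stored character string

    def register(self, event: str, phrase: str) -> None:
        """Store the recognized phrase to be spoken when the event occurs."""
        self._phrases[event] = phrase

    def on_event(self, event: str) -> bool:
        """Speak the phrase registered for this event; return False if none is registered."""
        phrase = self._phrases.get(event)
        if phrase is None:
            return False
        self._speak(phrase)
        return True
```

For instance, registering "OK, ×××. 照明をつけて。" under a hypothetical "person_detected" event and calling `on_event("person_detected")` from the Doppler sensor handler would reproduce the lighting example above.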

（２）スマートスピーカ５１からの発話かユーザーの発話かを区別する機能
ユーザーの個人の声を識別してロボット１に設定登録することで、ロボット１がスマートスピーカ５１からの発話か、あるいはユーザーの直の発話かを区別する機能を備えることができる。これによって、ロボット１が登録されていない声であるスマートスピーカ５１からの音に反応しないように制御することができ、逆に（１）のようにスマートスピーカ５１とロボット１にそれぞれユーザーの個人の声を登録することで、スマートスピーカ５１に対してユーザーだけでなくロボット１からも指示をすることができる。このように個人の発話を区別できることでスマートスピーカ５１への音声操作を妨害しないという機能も有する。
（３）スマートスピーカ５１からの発話の表示機能
スマートスピーカ５１にユーザーが指示した発話内容をロボット１が取得して文字テキスト化し、タッチパネル部７に表示させるようにしてもよい。
ロボット１のコントローラＭＣは発話内容を取得して自身であるいはサーバーに接続して文字テキスト化するプログラムを有しているとよい。
例えば、ユーザーが「ＯＫ、×××。今日の天気を教えて。」と発話し、これに対してスマートスピーカ５１が「今日の愛知県岡崎市の天気は晴れ、最高気温１５度、降水確率は２０％です」と回答した場合に、これらのすべての対話を、例えば、ロボット１のタッチパネル部７には例えば、次のように聞き取った両者の対話が表示される。
「ユーザー：ＯＫ、×××。今日の天気を教えて。
スマートスピーカ：今日の愛知県岡崎市の天気は晴れ、最高気温１５度、降水確率は２０％です」
また、加えてロボット１のコントローラＭＣは文字テキスト化した内容を短く翻案したり要約するプログラムを有しているとよい。ロボット１は翻案したり要約した内容を音声出力又はタッチパネル部７への表示させるようにするとよい。
（４）人がいないときにロボット１がスマートスピーカ５１へ色々聞いて学習しておく機能
例えば、ロボット１がビルトインシナリオとして「明日の天気は？」とか「なにか事件はないですか」などという質問ワードを有しており、ユーザーが留守の時にロボット１のコントローラＭＣに所定のタイミングでスマートスピーカ５１を起動するフレーズと一緒に質問ワードを音声出力させるようにする（ロボット１の声はスマートスピーカ５１に登録済みとする）。このとき、コントローラＭＣはスマートスピーカ５１からの発話内容をロボット１はマイクロフォン１２によって取得して記憶しておき、所定のタイミングでその内容を音声出力させる。所定のタイミングとは、例えば所定の時間、ドップラーセンサ２２によって人を検知した際、ユーザーがロボット１に「何かニュースはないの？」というようなビルトインシナリオとしての発話を行った際等である。 (2) A function to distinguish whether the user speaks from the smart speaker 51 or the user's voice By identifying the user's voice and setting and registering the robot 1, whether the robot 1 speaks from the smart speaker 51 or the user's voice It can be equipped with a function to distinguish whether it is a direct utterance. As a result, the robot 1 can be controlled not to respond to the sound from the smart speaker 51 which is a voice not registered, and conversely, the smart speaker 51 and the robot 1 each have individual user's By registering the voice, it is possible to instruct the smart speaker 51 not only from the user but also from the robot 1. The ability to distinguish individual utterances in this manner also has a function of not disturbing the voice operation on the smart speaker 51.
(3) Function for displaying utterances exchanged with the smart speaker 51
The robot 1 may acquire the content that the user utters as an instruction to the smart speaker 51, convert it into text, and display it on the touch panel unit 7.
The controller MC of the robot 1 preferably has a program that acquires the utterance content and converts it into text, either by itself or by connecting to a server.
For example, suppose the user says "OK, ×××. Tell me today's weather." and the smart speaker 51 answers "Today's weather in Okazaki, Aichi Prefecture is sunny, with a high of 15 degrees and a 20% chance of precipitation." In that case the whole exchange, as heard by the robot 1, is displayed on the touch panel unit 7 of the robot 1, for example as follows.
"User: OK, ×××. Tell me today's weather.
Smart speaker: Today's weather in Okazaki, Aichi Prefecture is sunny, with a high of 15 degrees and a 20% chance of precipitation."
In addition, the controller MC of the robot 1 preferably has a program that condenses or summarizes the transcribed text. The robot 1 preferably outputs the condensed or summarized content by voice or displays it on the touch panel unit 7.
(4) Function by which the robot 1 asks the smart speaker 51 various questions and learns while nobody is present
For example, the robot 1 holds question words such as "What is tomorrow's weather?" or "Has anything happened?" as built-in scenarios. While the user is away, the controller MC of the robot 1 outputs a question word by voice at a predetermined timing, together with the phrase that activates the smart speaker 51 (the voice of the robot 1 is assumed to be registered with the smart speaker 51). The controller MC acquires the reply spoken by the smart speaker 51 through the microphone 12, stores it, and outputs its content by voice at a predetermined timing. The predetermined timing is, for example, a predetermined time, when the Doppler sensor 22 detects a person, or when the user says a built-in scenario phrase to the robot 1 such as "Is there any news?".
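The flow in (4) can be illustrated with a minimal Python sketch. It is only an assumption about how the controller MC might keep track of questions and stored replies; the class name, the trigger names, and the wake phrase "OK, Example." are hypothetical and do not appear in the specification, and microphone capture and speech output are left out.

```python
from dataclasses import dataclass, field

# Hypothetical names for the "predetermined timing" described in the text.
REPORT_TRIGGERS = {"scheduled_time", "person_detected", "user_asked_for_news"}


@dataclass
class AwayModeLearner:
    wake_phrase: str                      # phrase that activates the smart speaker
    questions: list                       # built-in question words
    answers: dict = field(default_factory=dict)

    def prompts(self):
        """Utterances to speak while the user is away: wake phrase plus question."""
        return [f"{self.wake_phrase} {q}" for q in self.questions]

    def store_reply(self, question, reply):
        """Remember the smart speaker's reply (already converted to text)."""
        self.answers[question] = reply

    def report(self, trigger):
        """Return the stored replies to speak when a known trigger fires."""
        if trigger not in REPORT_TRIGGERS:
            return []
        spoken = list(self.answers.values())
        self.answers.clear()              # each reply is reported once
        return spoken
```

A caller would speak each string from `prompts()`, pass the recognized reply to `store_reply()`, and speak whatever `report()` returns when, for example, the Doppler sensor detects a person.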

＜実施の形態７の変形例１＞
スマートスピーカ等、他の音声認識機器の音声操作を妨害しない機能を設定するようにしてもよい。例えば、他のスマートスピーカの起動フレーズ（音声認識開始ワード）を、の音声をロボット１が認識した場合、自身の音声出力を停止するようにしてもよい。
例えば、他のスマートスピーカであるＡ社の起動フレーズである「ＯＫ、×××」のような起動用のフレーズの音声を認識した場合に、ロボット１のコントローラＭＣはそれをマイクロフォン１２から取得し、登録済みの起動フレーズであると判断すると、自身の音声出力を一旦停止させる。
これによって音声認識機器の音声操作を妨害せずに、機能を発揮させることができる。
<Modification 1 of Embodiment 7>
A function that avoids interfering with voice operation of other voice recognition devices, such as smart speakers, may be provided. For example, when the robot 1 recognizes the spoken activation phrase (voice recognition start word) of another smart speaker, it may stop its own voice output.
For example, when an activation phrase such as "OK, ×××", the activation phrase of company A's smart speaker, is spoken, the controller MC of the robot 1 acquires it through the microphone 12 and, on determining that it is a registered activation phrase, temporarily suspends its own voice output.
This allows the function to be exhibited without disturbing the voice operation of the voice recognition device.
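As an illustration of this modification, the following Python sketch pauses the robot's own voice output when recognized text begins with a registered activation phrase. It assumes speech has already been converted to text; the class name, the normalization rule, and the phrase "OK, Example" are hypothetical.

```python
def normalize(text):
    """Lower-case the text and keep only letters and digits for comparison."""
    return "".join(ch for ch in text.casefold() if ch.isalnum())


class VoiceOutputGate:
    def __init__(self, wake_phrases):
        # Activation phrases of other devices, registered in advance.
        self.wake_phrases = [normalize(p) for p in wake_phrases]
        self.paused = False

    def on_heard(self, transcript):
        """Pause own voice output if the transcript starts with a wake phrase."""
        heard = normalize(transcript)
        if any(heard.startswith(p) for p in self.wake_phrases):
            self.paused = True
        return self.paused

    def resume(self):
        """Allow voice output again, e.g. after the other device has answered."""
        self.paused = False
```

When to call `resume()` is not stated in the text; a timeout or the end of the other device's reply are possible choices.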

＜実施の形態７の変形例２＞
ロボット１にスマートスピーカ５１のような他の音声認識機器の音声認識起動キーワードを認識し、その後のユーザーのスマートスピーカ５１への発音を認識してクラウドサーバ−にリクエストして、検索エンジン等に検索をさせて対応する回答を得ておく。
ロボット１は、スマートスピーカ５１が音声認識に失敗してしまった場合（例えば「エラーです」など）や音声認識結果に対する適切な回答を出力できない旨の音声出力（例えば「すみません」など）を認識した場合、ロボット１は自身が前もって得ておいた回答を出力する。あるいは、音声認識に失敗してしまった場合や音声認識結果に対する適切な回答を出力できない旨の音声出力を受けてから、ロボット１はクラウドサーバ−にリクエストして回答を得るようにしてもよい。 <Modification 2 of Embodiment 7>
The robot 1 recognizes the voice recognition activation keyword of another voice recognition device such as the smart speaker 51, then recognizes what the user subsequently says to the smart speaker 51, sends a request to a cloud server, has a search engine or the like perform a search, and obtains a corresponding answer in advance.
When the robot 1 recognizes that the smart speaker 51 has failed at voice recognition (for example, "Error") or has output a voice message indicating that it cannot give an appropriate answer to the recognition result (for example, "Sorry"), the robot 1 outputs the answer it obtained in advance. Alternatively, the robot 1 may send the request to the cloud server and obtain the answer only after it hears such a failure message.
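The prefetch-and-fall-back behavior can be sketched in Python as follows. This is a simplified assumption: the failure markers are taken from the two examples in the text, `lookup` stands in for the request to the cloud server and search engine, and all names are hypothetical.

```python
from typing import Callable, Optional

# Words taken from the examples in the text ("Error", "Sorry").
FAILURE_MARKERS = ("error", "sorry")


class FallbackResponder:
    def __init__(self, lookup: Callable[[str], str]):
        self.lookup = lookup              # stand-in for the cloud/search request
        self.prefetched: Optional[str] = None

    def on_user_request(self, question: str) -> None:
        """Called after the wake phrase: fetch an answer ahead of time."""
        self.prefetched = self.lookup(question)

    def on_speaker_reply(self, reply: str) -> Optional[str]:
        """Return the prefetched answer only if the smart speaker failed."""
        failed = any(marker in reply.casefold() for marker in FAILURE_MARKERS)
        answer = self.prefetched if failed else None
        self.prefetched = None
        return answer
```

The alternative described in the text, requesting the answer only after a failure is heard, would move the `lookup` call into `on_speaker_reply`.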

＜実施の形態８＞
ロボット１は、例えばユーザーの要求によってｗｅｂサイト上のニュース記事を音声で読み上げるようにしてもよい。ロボット１のコントローラＭＣはサーバー上での検索エンジンを利用したニュース記事のリクエストをし、クラウドサーバーはそのリクエストに対して、例えば登録サイトのニュースデータをテキストデータとしてレスポンスする。
ロボット１はニュースデータを音声合成して読み上げる（出力する）と同時に記事の情報源の名称も音声合成して読み上げる（出力する）。また、併せて表示画面としてのタッチパネル部７には記事の情報源のＵＲＬを表示をし、タッチパネル部７上でそのＵＲＬにタッチされたら、そのＵＲＬのページの内容をタッチパネル部７上に表示するようにする。
また、記事や記事の情報源を読み上げる場合には、それらが引用であることがわかるような表現で出力することがよい。
例えば、ＵＲＬ「https://\\\\.jp/archives/92###」の記事内容を読み上げる場合を説明する。
『「××ニュース」のサイトの記事を読み上げるよ。「・・・・・を本年１月１５日より販売する。」そうだよ。』
というように、例えば「のサイトの記事を読み上げるよ。」や「そうだよ」というような記事や記事の情報源以外を正規表現として引用であるように、聞き手にわかるように発話させ、この場合では画面表示に、例えば『https://\\\\.jp/archives/92###の情報だよ。』というようにＵＲＬを表示させる。そしてこのＵＲＬ部分にタッチすることでタッチパネル部７に読み上げた記事の内容を改めて表示させる。
「聞き手にわかるように発話」とは記事部分とそうでない部分で、例えば語調や声を変えるようにすることがよい。
このようにすれば、Ｗｅｂ記事を読まなくともロボット１の読み上げた内容を聞き取るだけでニュース内容を理解でき、場合によっては念のため目視でニュース内容を確認することもできる。
<Embodiment 8>
The robot 1 may read news articles on a web site aloud, for example at the user's request. The controller MC of the robot 1 requests news articles using a search engine on a server, and the cloud server responds to the request with, for example, news data from registered sites as text data.
The robot 1 reads the news data aloud (outputs it) by speech synthesis and, at the same time, also reads aloud the name of the article's information source. In addition, the URL of the information source is displayed on the touch panel unit 7 serving as a display screen, and when that URL is touched on the touch panel unit 7, the content of the page at that URL is displayed on the touch panel unit 7.
When an article or its information source is read aloud, it is preferable to use wording that makes clear that they are quotations.
For example, the case of reading aloud the article at the URL "https://\\\\.jp/archives/92###" is described.
"I'll read an article from the "×× News" site. "... will go on sale from January 15 of this year." So it says."
In this way, the parts other than the article and its information source, such as "I'll read an article from the ... site." and "So it says.", are spoken as set expressions so that the listener can tell that the rest is a quotation. In this case the URL is shown on the screen, for example as "This is information from https://\\\\.jp/archives/92###." Touching the URL portion displays the content of the article that was read aloud on the touch panel unit 7 again.
"Spoken so that the listener can tell" preferably means, for example, changing the tone or voice between the article portion and the other portions.
In this way, the user can understand the news simply by listening to what the robot 1 reads aloud, without reading the web article, and can also check the news visually if desired.
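A minimal Python sketch of the quotation framing follows. It assumes the article text and source name are already available as strings; the voice labels "robot" and "quote", the function name, and the example URL are hypothetical and only mark where the tone or voice would change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    text: str
    voice: str  # "robot" for the robot's own framing, "quote" for quoted text


def build_reading(source_name, article_text, url):
    """Return the speech segments and the line to show on the touch panel."""
    speech = [
        Segment(f"I'll read an article from the {source_name} site.", "robot"),
        Segment(article_text, "quote"),   # spoken in a different tone or voice
        Segment("So it says.", "robot"),
    ]
    display = f"This is information from {url}"
    return speech, display
```

A speech synthesizer would then pick a different voice or tone for segments labeled "quote", and the touch panel would make the URL in `display` touchable.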

＜実施の形態９＞
実施の形態９ではコンピュータの見えない動きをロボットのアクチュエータの動作で見せる場合について説明する。
（１）ロボティクスプロセスオートメーションについて
ロボティクスプロセスオートメーション（以下、ＲＰＡとする）は、単純なパソコン作業を自動化するソフトウェアである。ソフトウェアはサーバーに設定することもでき、ユーザーのコンピュータに設定することもできる。図１５に基づいてソフトウェアをサーバーに設定した場合であって、上記各実施の形態のロボット１をＲＰＡのネットワークに組み込んだ場合の一例について説明する。
図１５に示すように、クラウドサーバー５５とユーザー側コンピュータ５６、５７とがインターネットを使用したネットワークで接続されている。また、クラウドサーバー５５とロボット１もネットワークで接続されている。ユーザー側コンピュータ５６はＲＰＡプログラムによってクラウドサーバー５５によって制御されている。また、ロボット１にはクラウドサーバー５５によって実行されるＲＰＡのためのプログラムにおける所定の処理においてその処理がまもなく実行される、実行されている、あるいは実行された等の処理情報が報知されるようになっている。
クラウドサーバー５５はユーザー側コンピュータ５６に処理１〜処理４を順に処理させる。本実施の形態では処理1と処理２はコンピュータ５６、処理３と処理４はコンピュータ５７が実行する。もっと多くの処理を設定してもよく、処理に関わるユーザー側コンピュータ５６も１以上いくつでもよい。
処理１としては、例えばコンピュータ５６へのユーザーのアクセス・ログイン等、処理２としては、例えばコンピュータ５６内のデータに基づくリストの作成・仕分け等、処理３は、例えば処理２に続いて実行する顧客毎の請求内容の修正、処理４は、例えば処理３に続いて実行する請求書の発行である。本実施の形態９では例えば処理３ではユーザに修正のための入力を促し、その入力があって後に、次の処理４に移行するものとする。つまり、処理２の後は処理３での力が完了するまで一旦待ち受けードとなる。
クラウドサーバー５５はこれらの処理を実行する直前、処理中、処理後にそれぞれロボット１に異なる報知情報を出力し、ロボット１はその報知情報に基づいてロボット１の周囲に処理状況を報知するようにするとよい。あるいは各処理毎に一回の報知でもよい。 <Embodiment 9>
In the ninth embodiment, the case of showing the invisible motion of the computer by the motion of the actuator of the robot will be described.
(1) Robotic Process Automation
Robotic process automation (hereinafter RPA) is software that automates simple personal computer tasks. The software can be installed on a server or on the user's computer. With reference to FIG. 15, an example is described in which the software is installed on a server and the robot 1 of the above embodiments is incorporated into the RPA network.
As shown in FIG. 15, a cloud server 55 and user-side computers 56, 57 are connected by a network that uses the Internet. The cloud server 55 and the robot 1 are also connected by the network. The user-side computer 56 is controlled by the cloud server 55 through an RPA program. In addition, the robot 1 is notified of processing information for predetermined processes in the RPA program executed by the cloud server 55, such as that a process is about to be executed, is being executed, or has been executed.
The cloud server 55 causes the user-side computer 56 to execute processes 1 to 4 in order. In this embodiment, processes 1 and 2 are executed by the computer 56, and processes 3 and 4 are executed by the computer 57. More processes may be defined, and any number (one or more) of user-side computers may take part in the processing.
Process 1 is, for example, user access or login to the computer 56; process 2 is, for example, creating and sorting a list based on data in the computer 56; process 3 is, for example, correcting the billing details for each customer, executed after process 2; and process 4 is, for example, issuing invoices, executed after process 3. In the ninth embodiment, process 3, for example, prompts the user for correction input and moves on to process 4 only after that input is received. That is, after process 2 the system enters a standby mode until the input in process 3 is complete.
The cloud server 55 outputs different notification information to the robot 1 immediately before, during, and after each of these processes, and the robot 1 preferably announces the processing status to its surroundings based on that notification information. Alternatively, a single notification per process may be used.

例えば、
ａ．ロボット１がどの処理がどのような状態かを音声や音の違い、あるいは音楽等で報知する。
ｂ．表示画面上で報知する。ａ．と同時に行ってもよい。
ｃ．処理３ではユーザーの入力が必要であるため、処理３だけを報知するようにしてもよく、処理３だけを他の報知とは異なる（識別できる）報知としてもよい。
ｄ．ロボット１から他の端末装置に処理の状態を転送して報知する。
ｅ．ロボット１が処理状況がわかるような動作をする。例えば、コンピュータ５６の処理が行われていればその方向を向くように制御する。そのため、前もってロボット１に対する各コンピュータ５６、５７の方向は何らかの方向特定手段、例えば上記の形状認識プログラムを使用して認識しておくことがよい。ロボット１に例えば矢印や腕部材のような方向指示部材を設け、その指し示す方向に報知対象としてのコンピュータ５６、５７があるように動作してもよい。 For example,
a. The robot 1 announces which process is in which state by different voices or sounds, or by music or the like.
b. The status is announced on the display screen. This may be done at the same time as a.
c. Because process 3 requires user input, only process 3 may be announced, or process 3 alone may be given a notification that differs from (is distinguishable from) the other notifications.
d. The processing state is forwarded from the robot 1 to another terminal device and announced there.
e. The robot 1 moves in a way that shows the processing status. For example, while the computer 56 is processing, the robot 1 is controlled to face it. For that purpose, the direction of each computer 56, 57 relative to the robot 1 is preferably recognized beforehand by some direction-identifying means, for example the shape recognition program described above. The robot 1 may be provided with a direction-indicating member such as an arrow or an arm member and operated so that the computer 56 or 57 being announced lies in the direction it points.
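Items a. to e. can be tied together with a small Python sketch that maps a process number and phase to a notification. The process-to-computer assignment follows the embodiment (processes 1 and 2 on computer 56, processes 3 and 4 on computer 57); the sound names, dictionary keys, and function name are assumptions made for illustration.

```python
from enum import Enum


class Phase(Enum):
    BEFORE = "about to start"
    RUNNING = "running"
    DONE = "finished"


# Assignment taken from the embodiment: 1 and 2 on computer 56, 3 and 4 on 57.
PROCESS_HOST = {1: 56, 2: 56, 3: 57, 4: 57}
NEEDS_USER_INPUT = {3}  # process 3 waits for the user's correction input


def notification(process, phase):
    """Build what the robot announces and which computer it should face."""
    host = PROCESS_HOST[process]
    return {
        "message": f"Process {process} on computer {host} is {phase.value}.",
        "face_computer": host,
        # Hypothetical sound names: a distinct sound marks the process needing input.
        "sound": "alert" if process in NEEDS_USER_INPUT else "chime",
    }
```

The returned dictionary could drive the voice output, the screen display, forwarding to another terminal, and the motors that turn the robot toward the computer.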

（２）ブロックチェーンについて
ブロックチェーンは多数のコンピュータが分散して記録する仕組みである。特にパブリック型のブロックチェーンでは記録対象のデータや記録されたデータが公開される。
そこでブロックチェーンのネットワーク中にロボット１を配置し、ブロックデータが送信される前にユーザーにロボット１がお知らせするようにする。ロボット１に「待て」という命令を出力させることで（つまりデータ送信させずに待機するリクエストをする）送信を停止させるようにするとよい。
(2) Blockchain
A blockchain is a mechanism in which a large number of computers record data in a distributed manner. In public blockchains in particular, the data to be recorded and the data already recorded are made public.
Therefore, the robot 1 is placed in the blockchain network so that the robot 1 notifies the user before block data is transmitted. Transmission is preferably halted by having the robot 1 output a "wait" command (that is, a request to stand by without transmitting the data).

＜実施の形態１０＞
各実施の形態では自然対話モードについて説明したが、自然対話モードに代えて、または、自然対話モードとともに、外国語学習モードを設けるとよい。自然対話モードに加えて外国語学習モードを設けるときは、例えば自然対話モードで「外国語学習モードへ切り替え」という音声を認識したときに自然対話モードから外国語学習モードへ切替えるとよい。また「外国語学習モード」で「自然対話モードへ切り替え」という音声を認識したときに外国語学習モードから自然対話モードへ切替えるとよい。
よく外国語を習得するには外国人の友人を作るとよいなどと言われるが、そのような機会に恵まれる人は多くない。そこで外国語学習のパートナーになり得る対話システムである外国語学習機能を備えるとよい。
任意の第一言語と第二言語との連携とすることができる。以下、日本語の対話システムと英語の対話システムを連携させる構成で説明する。
基本的には英語で会話するシステムとし、会話中に日本語で「もう一回言って」などの要求を出力するとよい。さらに、英文解析Webサービスなどと連携して、「説明して」などの要求にたいして会話中の英文を日本語で解説する機能を備える。会話中に言いたいことが英語でどう言えばいいかわからないときには、「翻訳して」と要求をすると英語でどういうのかを出力する。出力された英文を読めばそのまま会話を続けることができる。自分で調べたりする必要がないので、会話が途切れることもなく、円滑な英会話学習が期待できる。
外国語学習モードでの母国語（例えば、日本人なら日本語）の音声認識エンジンと外国語（例えば英語）の音声認識エンジンはどちらもクラウド上で動作している音声認識エンジンを利用するようにしてもよいが、母国語（例えば日本語）については要求内容が定型文であること、要求に対する回答のフォーマットが決まっていることから、ローカル（例えばロボット１内）に音声認識エンジンを設けこれを利用するとよい。対話エンジンも同様である。音声合成エンジンもいずれの場所に設けてもよいが、特にローカルに設けるとよい。
マイクロフォン１２からの信号に基づく音声データを両言語の音声認識エンジンに投げると、どちらの音声認識エンジンからもなんらかの結果が返ってくる。例えば、「もう一回言って」という日本語を両方のエンジンに投げると、日本語のエンジンは「もう一回言って」というテキストデータを返し、英語のエンジンは「もう一回言って」を英語として解釈したデタラメなテキストデータを返してくる。このような場合、日本語の要求は定型文であるため、日本語のエンジンが返してきたテキストデータと要求の定型文を比較して、一致すれば日本語の要求がされたと判断し、一致しなければ英語が話されたと判断することで、英語と日本語を切り分ける処理を行なうと良い。
要求は英語で受け付けるようにしてもよいが、特に、要求をしようにも英語がわからないというケースを想定して、日本語でも要求できるようにすることが望ましいことを発明者は見出した。
「説明して」などの要求に対して会話中の英文を日本語で解説する機能は、単に英文の訳を日本語にして出力するだけでもよいが、特に英文で用いられている語句や文法の解説を出力するとよい。特にその英文で用いられている構文についての解説を出力するとよい。構文についての解説は例えば各句を頂点（ノード）して例えば各句を囲む描画をし、関連する各句の関係を示す線分等の枝（エッジ）を描画するとよい。例えばグラフ構造（特にツリー構造とするとよい）の図でタッチパネル部７に表示するとよい。
また、表示した内容を音声でスピーカ装置１３から出力するとよい。例えば、https://gigazine.net/news/20160602-foxtype-review/で解説されるような構文解析サービスのAPIをコールし、その結果を受け取って、解析結果を日本語で出力する構成とするとよい。
例えば以下のような処理と出力を行なう。
『処理英語の対話エンジンからフレーズを取得
ロボット I'm a fantastic robot.
人「もう一回言って」
ロボット I'm a fantastic robot.
人「説明して」
処理（「説明して」を認識）→構文解析APIコール→構文解析結果から日本語解説を生成
ロボット Iが主語で、amが動詞、robotが目的語になるよ。
fantasticは素晴らしいという意味の形容詞でrobotを修飾しているよ。
英文は「私は素晴らしいロボットです」という意味になるよ。
人「I don't think so.」
ロボット Don't say it! 』
話したいことが英語でわからなければ、日本語で英語での言い方を教えてくれるように要求できるので会話が途切れないという優れた効果を発揮する。日本語での要求ができない場合は、英語での言い方がわからないとき、辞書で調べたりネットで翻訳したりする必要があり、勉強しているという感じになってしまいストレスを感じる。日本語で要求できれば、ただバイリンガルと会話しているという感覚でストレスなく学習できる。語学学習は継続することがとても大事であるから、なるべく学習の際にストレスが少ないということは継続する上で極めて重要なことである。本構成によれば、継続して語学学習を行なえるロボット１を実現できる。 <Embodiment 10>
Although the natural dialogue mode has been described in each embodiment, the foreign language learning mode may be provided instead of the natural dialogue mode or in addition to the natural dialogue mode. When a foreign language learning mode is provided in addition to the natural dialogue mode, for example, it is preferable to switch from the natural dialogue mode to the foreign language learning mode when a voice "switch to foreign language learning mode" is recognized in the natural dialogue mode. In addition, it is preferable to switch from the foreign language learning mode to the natural dialogue mode when the voice "switch to the natural dialogue mode" is recognized in the "foreign language learning mode".
It is often said that it is better to make foreign friends to master foreign languages, but not many people have such an opportunity. Therefore, it is preferable to have a foreign language learning function which is a dialogue system that can be a partner in foreign language learning.
Any first language and second language can be paired. The following describes a configuration that links a Japanese dialogue system and an English dialogue system.
Basically, the system converses in English, and requests such as "Say that again" can be made in Japanese during the conversation. Furthermore, in cooperation with an English-analysis web service or the like, the system has a function of explaining the English sentence in the conversation in Japanese in response to a request such as "Explain". When the user does not know how to say something in English during the conversation, asking "Translate" makes the system output how to say it in English. By reading the output English sentence aloud, the user can continue the conversation. Because there is no need to look things up oneself, the conversation is not interrupted and smooth English conversation practice can be expected.
In the foreign language learning mode, both the native-language speech recognition engine (for example, Japanese for a Japanese user) and the foreign-language speech recognition engine (for example, English) may be engines running in the cloud. However, because native-language requests are fixed phrases and the format of the responses to them is predetermined, the native-language speech recognition engine is preferably provided locally (for example, in the robot 1). The same applies to the dialogue engine. The speech synthesis engine may also be located anywhere, but is preferably local.
When speech data based on the signal from the microphone 12 is sent to the speech recognition engines for both languages, each engine returns some result. For example, if the Japanese phrase 「もう一回言って」 ("Say that again") is sent to both engines, the Japanese engine returns the text 「もう一回言って」, while the English engine returns nonsense text produced by interpreting the phrase as English. Because Japanese requests are fixed phrases, the text returned by the Japanese engine is compared with the fixed request phrases; if it matches, a Japanese request is judged to have been made, and if not, English is judged to have been spoken. English and Japanese are preferably separated in this way.
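The language-separation rule described above can be sketched in Python as follows. It assumes both engines have already returned text; the request table uses the three Japanese requests mentioned in this embodiment, while the action labels and the punctuation stripping are assumptions.

```python
# Fixed Japanese request phrases from the embodiment, mapped to hypothetical actions.
JAPANESE_REQUESTS = {
    "もう一回言って": "repeat",
    "説明して": "explain",
    "翻訳して": "translate",
}


def classify(japanese_text, english_text):
    """Decide whether the user made a Japanese request or spoke English.

    japanese_text: result from the Japanese speech recognition engine
    english_text:  result from the English speech recognition engine
    """
    key = japanese_text.strip().rstrip("。．.!！")
    if key in JAPANESE_REQUESTS:
        # The Japanese result matches a fixed phrase: treat it as a request.
        return ("request", JAPANESE_REQUESTS[key])
    # Otherwise assume English was spoken and use the English engine's result.
    return ("english", english_text)
```

Because only an exact match against a short list is needed, this comparison is cheap enough to run locally on the robot, which fits the preference for a local Japanese engine.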
Requests may be accepted in English, but the inventors found it desirable to allow requests in Japanese as well, particularly for the case where the user wants to make a request but does not know how to say it in English.
The function of explaining the English sentence in the conversation in Japanese in response to a request such as "Explain" may simply output a Japanese translation of the English sentence, but preferably outputs an explanation of the words, phrases, and grammar used in it, and in particular an explanation of its sentence structure. For the explanation of sentence structure, for example, each phrase is drawn as a vertex (node), for instance with an outline enclosing it, and branches (edges) such as line segments are drawn to show the relationships between related phrases. The result may be displayed on the touch panel unit 7 as a diagram with a graph structure (preferably a tree structure).
The displayed content is also preferably output by voice from the speaker device 13. For example, the API of a syntax analysis service such as the one described at https://gigazine.net/news/20160602-foxtype-review/ is called, the result is received, and the analysis result is output in Japanese.
For example, the following processing and output are performed.
"Process: obtain a phrase from the English dialogue engine
Robot: I'm a fantastic robot.
Person: "Say that again" (in Japanese)
Robot: I'm a fantastic robot.
Person: "Explain" (in Japanese)
Process: (recognize "Explain") → call the syntax analysis API → generate a Japanese explanation from the syntax analysis result
Robot: "I" is the subject, "am" is the verb, and "robot" is the object.
"fantastic" is an adjective meaning wonderful, and it modifies "robot".
The English sentence means "I am a wonderful robot".
Person: "I don't think so."
Robot: Don't say it!"
If the user does not know how to say something in English, the user can ask in Japanese to be told how to say it, so the conversation is not interrupted, which is an excellent effect. If requests cannot be made in Japanese, a user who does not know how to say something in English has to look it up in a dictionary or translate it online, which feels like studying and causes stress. If requests can be made in Japanese, the user can learn without stress, with the feeling of simply talking with a bilingual person. Because continuing is very important in language learning, keeping stress during learning as low as possible is extremely important for continuing. This configuration makes it possible to realize a robot 1 with which language learning can be continued.

＜実施の形態１１＞
各実施の形態で説明した機能に加え、ロボット１の設置された室内へ人が入ってきたことを検知したとき発話する機能を設けるとよい。またロボット１の設置された室内から人が出ていくことを検知したとき発話する機能を設けるとよい。
例えば、その室内とその室内以外の場所の通路にセンサを設けて、ロボット１の設置された室内へ人が入ってきたこと、ロボット１の設置された室内から人が出ていくことを検知するとよい。特にロボット１が設置された室内に出入りするための自動ドアがある場合、センサは特に自動ドアの開閉のために人がその自動ドアに接近していることを検知するセンサを用いると良い。特に自動ドアをはさんで室外にある第一の人検知センサと、自動ドアをはさんで室内にある第二の人検知センサと、自動ドアが開いているときに自動ドアの場所にいる人を検知する第三の人検知センサセンサ（人のドアへの挟み込みを防止するためのセンサ）の少なくともいずれか２つにロボット１のコントローラMCを接続して、室内への出入りを検知するとよい。このようにすれば、ロボット１が設置された室内への出入り等を新たなセンサを設置することなく検出できる。例えば各センサの人を検知した際に立ち上がる信号のエッジを捉えて検出するとよい。
例えば、第一の人検知センサで人が検知された後、第三の人検知センサで人が検知された場合、「いらっしゃいませ」などと入ってきた人を歓迎するフレーズの音声をスピーカ装置１３から出力するとよい。例えば、第二の人検知センサで人が検知された後、第三の人検知センサで人が検知された場合、「ありがとうございました」と出ていく人に感謝するフレーズの音声をスピーカ装置１３から出力するとよい。
これらのときに第１〜第３のモータ２３〜２５を動かし、ロボット１の設置位置から予め設定した自動ドアの方を向く動作を行なうようにするとよい。なお、第一の人検知センサと第二の人検知センサとが同じ時に人を検知した場合には、第一の人検知センサを優先するとよい。このようにすれば、入ってくる人によりロボット１の存在を気づいてもらいやすくなるとともに、入ってくる人が「ありがとうございます」とロボット１にいきなり言われる違和感を軽減できる。 Embodiment 11
In addition to the functions described in each of the embodiments, it is preferable to provide a function of speaking when it is detected that a person has entered the room in which the robot 1 is installed. In addition, it is preferable to provide a function to speak when it is detected that a person leaves the room in which the robot 1 is installed.
For example, sensors are preferably provided in the passage between the room and places outside it to detect that a person has entered the room in which the robot 1 is installed and that a person is leaving it. In particular, when there is an automatic door for entering and leaving the room in which the robot 1 is installed, the sensors are preferably those that detect a person approaching the automatic door in order to open and close it. In particular, the controller MC of the robot 1 is preferably connected to at least two of the following to detect movement into and out of the room: a first person detection sensor on the outdoor side of the automatic door, a second person detection sensor on the indoor side of the automatic door, and a third person detection sensor that detects a person in the doorway while the automatic door is open (a sensor for preventing people from being caught in the door). In this way, entry into and exit from the room in which the robot 1 is installed can be detected without installing new sensors. For example, detection may be done by capturing the rising edge of each sensor's signal when it detects a person.
For example, when a person is detected by the first person detection sensor and then by the third person detection sensor, a phrase welcoming the person coming in, such as "Welcome", is preferably output by voice from the speaker device 13. When a person is detected by the second person detection sensor and then by the third person detection sensor, a phrase thanking the person leaving, such as "Thank you very much", is preferably output by voice from the speaker device 13.
At these times, the first to third motors 23 to 25 are preferably driven so that the robot 1 turns from its installed position to face the preset direction of the automatic door. If the first person detection sensor and the second person detection sensor detect a person at the same time, the first person detection sensor is preferably given priority. This makes it easier for people coming in to notice the robot 1 and reduces the awkwardness of a person coming in suddenly being told "Thank you" by the robot 1.
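The sensor-order rule can be sketched in Python as below. The sketch assumes that each rising edge arrives as a (timestamp, sensor number) pair, with 1 for the outdoor sensor, 2 for the indoor sensor, and 3 for the doorway sensor; the function name and this event format are hypothetical.

```python
def greetings(events):
    """Turn rising-edge events into greeting phrases.

    events: iterable of (timestamp, sensor), sensor 1 = outdoor side,
            2 = indoor side, 3 = doorway (anti-pinch) sensor.
    """
    phrases = []
    last_side = None   # which side sensor fired most recently (1 or 2)
    last_time = None
    # Sort by time, then sensor number, so sensor 1 is seen first on a tie.
    for timestamp, sensor in sorted(events):
        if sensor in (1, 2):
            # On simultaneous detection, sensor 1 keeps priority over sensor 2.
            if sensor == 2 and last_side == 1 and last_time == timestamp:
                continue
            last_side, last_time = sensor, timestamp
        elif sensor == 3 and last_side is not None:
            phrases.append("Welcome" if last_side == 1 else "Thank you very much")
            last_side = None
    return phrases
```

The same decision point could also start the motor motion that turns the robot toward the automatic door.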

＜その他の実施の形態＞
（１）各実施形態等においては、無線ＬＡＮ装置２１を備えることとしたが、これに代えてまたはこれとともに有線ＬＡＮ装置を備え、有線ＬＡＮネットワークに接続するようにしてもよい。有線ＬＡＮ装置はロボット１に内蔵しても、外付けとしてもよい。ＵＳＢのＯＴＧ（On-The-Go）用の端子１８に有線ＬＡＮ装置を接続する構成としてもよい。有線ＬＡＮネットワークはルーター等を介してインターネットに接続される構成とするとよい。無線ＬＡＮは環境によっては通信が安定しないまたは接続できないケースも想定されうる。例えばスマホ等、多数の無線ＬＡＮ装置が存在する場所にロボット１を設置する場合には有線ＬＡＮ装置を介してインターネットにアクセスする構成とすると望ましい。
（２）各実施形態等においては、半二重方式での人とロボット１との対話の例を示しているが、全二重方式で人とロボット１との対話を行なうようにしてもよい。例えば表１の対話の中で「・・・を開いて」と人が行った後、「本当にいいですか」の発話を行っている間も人の音声の認識を続け「取消」という音声が「本当にいいですか」の発話中に認識された場合には、発話中であれば発話を中断し、すぐに「中止しました」とロボットから発話するように構成してもよい。
（３）半二重方式での対話を行なう構成は、構成や処理を簡素化でき、コストを低減できるので特によい。しかし、ロボット１がマイクロフォン１２をオンにしたタイミングが分かりづらく、ユーザーが喋っても、ロボットが認識対象とするユーザーの音声の先頭部分、すなわち言葉の先頭部分が欠けてしまうことが多いという課題を発明者らは見出した。この課題を解決するため、コントローラＭＣがマイクをオンしたタイミングで特徴的な画面表示をおこなうとよい。これによりスムーズな会話をサポートする。コントローラＭＣがマイクをオンしたタイミングで特徴的な画面表示をおこなう態様としては、１）画面の四隅を光らせる、２）マイクのアイコンを表示させる、の少なくともいずれか一方を行なうとよく、特に１）、２）の両方とも行うと優れた効果を発揮する。
（４）第１〜第３のモータ２３〜２５を構成するモータとしては、ＤＣモータなど各種のモータとすることができるが、特にステッピングモータとするとよく、ロボット１はステッピングモータにより姿勢を制御する構成とするとよい。ステッピングモータはモータに流す電流に比例してトルクの大きさが変わる。電流をたくさん流せば大きなトルクを得られるが、発熱や電池寿命などが問題になる。そこでロボット１の静止時はその姿勢を維持するために必要な最小限の電流を流し、ロボット１が姿勢を変えるときのみ大きな電流を流すようにすると特によい。なお、サーボモータ２３〜２５のすべてについてその静止時に姿勢を維持するために必要な最小限の電流を通電するようにしてもよいが、ディテントトルク(通電しない状態でのトルク)で支持できる胴体部の第１のモータ２３は通電しないようにする一方、頭部分はディテントトルクでは負けてしまうため第２のモータ２４，第３のモータ２５については静止時もこの通電をするようするとよい。
また、静止時のトルクのまま回転させるとトルク不足で脱調が起こり上手く回転しないため、回転させる時は静止時に比べ、電流をたくさん流すようにしてトルクを上げるとよい。一方、静止時は回転時のような大きなトルクは必要ないので回転時に比べ、電流を下げるとよい。
また、ある方向に向きを変える場合、加速しながら一定速度まで上げて目的の角度が近づいたら減速して止めるという制御を行なうとよい。
また、ロボットをモータによって駆動させると機械的、電気的なノイズを発生する。このノイズが音声認識の認識率を低下させるので音声認識中はモータを停止するように制御するとよい。
本発明の範囲は，明細書に明示的に説明された構成や限定されるものではなく，本明細書に開示される本発明の様々な側面の組み合わせをも，その範囲に含むものである。本発明のうち，特許を受けようとする構成を，添付の特許請求の範囲に特定したが，現在の処は特許請求の範囲に特定されていない構成であっても，本明細書に開示される構成を，将来的に特許請求の範囲とする意思を有する。
本願発明は上述した実施の形態に記載の構成に限定されない。上述した各実施の形態や変形例の構成要素は任意に選択して組み合わせて構成するとよい。また各実施の形態や変形例の任意の構成要素と，発明を解決するための手段に記載の任意の構成要素または発明を解決するための手段に記載の任意の構成要素を具体化した構成要素とは任意に組み合わせて構成するとよい。これらについても本願の補正または分割出願等において権利取得する意思を有する。また「〜の場合」「〜のとき」という記載があったとしてもその場合やそのときに限られる構成として記載はしているものではない。これらの場合やときでない構成についても開示しているものであり、権利取得する意思を有する。また順番を伴った記載になっている箇所もこの順番に限らない。一部の箇所を削除したり、順番を入れ替えた構成についても開示しているものであり、権利取得する意思を有する。
また，意匠出願への変更出願により，全体意匠または部分意匠について権利取得する意思を有する。図面は本装置の全体を実線で描画しているが，全体意匠のみならず当該装置の一部の部分に対して請求する部分意匠も包含した図面である。例えば当該装置の一部の部材を部分意匠とすることはもちろんのこと，部材と関係なく当該装置の一部の部分を部分意匠として包含した図面である。当該装置の一部の部分としては，装置の一部の部材としても良いし，その部材の部分としても良い。全体意匠はもちろんのこと，図面の実線部分のうち任意の部分を破線部分とした部分意匠を，権利化する意思を有する。 <Other Embodiments>
(1) In each embodiment the wireless LAN device 21 is provided, but a wired LAN device may be provided instead of or together with it to connect to a wired LAN network. The wired LAN device may be built into the robot 1 or attached externally. A wired LAN device may be connected to the USB OTG (On-The-Go) terminal 18. The wired LAN network is preferably connected to the Internet via a router or the like. Depending on the environment, wireless LAN communication may be unstable or a connection may not be possible. For example, when the robot 1 is installed in a place where many wireless LAN devices such as smartphones are present, it is desirable to access the Internet via the wired LAN device.
(2) Each embodiment shows examples of half-duplex dialogue between a person and the robot 1, but the dialogue may be full duplex. For example, after the person says "Open ..." in the dialogue of Table 1, recognition of the person's voice may continue while the robot is uttering "Are you sure?"; if "Cancel" is recognized during that utterance, the robot may break off the utterance and immediately say "Cancelled".
(3) A configuration that conducts dialogue in half duplex is particularly good because it simplifies the configuration and processing and reduces cost. However, the inventors found a problem: it is hard to tell when the robot 1 has turned on the microphone 12, so that even when the user speaks, the beginning of the user's speech that the robot is to recognize, that is, the beginning of the words, is often cut off. To solve this problem, the controller MC preferably shows a distinctive screen display at the moment it turns on the microphone. This supports smooth conversation. As the distinctive screen display, at least one of 1) lighting up the four corners of the screen and 2) displaying a microphone icon is preferably performed, and doing both 1) and 2) is particularly effective.
(4) Various motors such as DC motors can be used as the first to third motors 23 to 25, but stepping motors are particularly preferable, and the robot 1 preferably controls its posture with stepping motors. The torque of a stepping motor varies in proportion to the current supplied to the motor. A large current gives a large torque, but heat generation and battery life become problems. It is therefore particularly preferable to supply only the minimum current needed to hold the posture while the robot 1 is stationary and to supply a large current only when the robot 1 changes its posture. All of the servomotors 23 to 25 may be supplied with the minimum holding current when stationary; however, it is preferable not to energize the first motor 23 of the body, which can be held by detent torque (the torque in the unenergized state), while energizing the second motor 24 and the third motor 25 even when stationary, because detent torque alone cannot hold the head.
If a motor is rotated with the stationary-state torque, the torque is insufficient, step-out occurs, and the motor does not rotate properly; therefore, when rotating, it is preferable to supply more current than when stationary to raise the torque. Conversely, because the large torque needed for rotation is not needed when stationary, the current is preferably lower than during rotation.
When turning to a given direction, it is preferable to accelerate up to a constant speed and then decelerate and stop as the target angle approaches.
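The current switching and the accelerate, cruise, decelerate control can be sketched in Python as follows. This is only an illustrative assumption: the current values, step rates, and function names are hypothetical, and a real driver would convert each step rate into a pulse interval.

```python
def coil_current(moving, held_by_detent, hold_amps=0.1, run_amps=0.5):
    """Pick the drive current: large while turning, minimal (or none) at rest."""
    if moving:
        return run_amps
    # A motor whose load is held by detent torque needs no current at rest.
    return 0.0 if held_by_detent else hold_amps


def trapezoid_profile(total_steps, start_rate, max_rate, rate_step):
    """Step rate (steps/s) for each step: ramp up, cruise, then ramp down."""
    rates = []
    for i in range(total_steps):
        ramp_up = start_rate + rate_step * i
        ramp_down = start_rate + rate_step * (total_steps - 1 - i)
        # The lowest of the three limits applies at every step.
        rates.append(min(ramp_up, ramp_down, max_rate))
    return rates
```

For a seven-step move with a start rate of 100, a maximum of 300, and an increment of 100 per step, the profile is 100, 200, 300, 300, 300, 200, 100; a shorter move never reaches the maximum and becomes a triangle.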
In addition, when the robot is driven by a motor, mechanical and electrical noises are generated. Since this noise reduces the recognition rate of speech recognition, the motor may be controlled to stop during speech recognition.
The scope of the present invention is not limited to the configuration or limitation explicitly described in the specification, and includes within the scope a combination of the various aspects of the present invention disclosed herein. Among the present inventions, the configuration to be patented is specified in the appended claims, but the present invention is disclosed in this specification even if the configuration is not specified in the claims. In the future, we will have the intention to set the scope of claims as the scope of claims.
The present invention is not limited to the configuration described in the above-described embodiment. The components of the above-described embodiments and modifications may be arbitrarily selected and combined. In addition, any component of each embodiment or modification, and any component described in the means for solving the invention or a component embodying the any component described in the means for solving the invention And may be configured in any combination. These also have the intention to acquire the right in the correction or divisional application etc. of the present application. Further, even if there is a description of "in case of""at time of", it is not described as a configuration limited in that case or at that time. We also disclose the configuration in these cases and not at times, and have the intention to acquire the right. Moreover, the part which is described with the order is not limited to this order. It also discloses the configuration in which some parts are deleted or the order is changed, and has the intention to acquire the right.
In addition, the applicant intends to acquire rights to a whole design or a partial design by converting the application to a design application. Although the drawings depict the entire device in solid lines, they cover not only the whole design but also partial designs claimed for part of the device. For example, a partial design is not limited to a member of the device, and the drawings also cover, as a partial design, any portion of the device regardless of its members. A portion of the device may be a member of the device or a part of such a member. In addition to the whole design, the applicant intends to acquire rights to partial designs in which any part of the solid-line portion of the drawings is drawn as a broken-line portion.

1 ... robot as the apparatus, 41 ... smartphone as another device, 51 ... smart speaker as another device.

Claims

An apparatus having a function of communicating by outputting output information to at least one of a user and another device,
the apparatus comprising a function of controlling generation of the output information for the communication or a function of controlling the timing of outputting the output information for the communication.

The device according to claim 1, wherein the device has a display function of causing a display mode on a display unit to be changed by the communication by voice.

The apparatus according to claim 2, wherein the change in the display mode on the display unit is based on an utterance of at least one of the user or the other device.

The apparatus according to claim 2 or 3, wherein the change in the display mode on the display unit is performed alternately with the audio output from the apparatus.

The apparatus according to any one of claims 2 to 4, wherein the display mode is character information obtained by converting an utterance of at least one of the user and the other device.

The apparatus according to any one of claims 2 to 5, wherein all characters from the start to the end of the utterance are displayed on the display unit simultaneously as the character into which the utterance is converted.

7. The apparatus according to any one of claims 2 to 6, wherein the apparatus has a function of causing the display unit to display, as character information, an interaction history of voice communication between the apparatus and at least one of the user and the other device.

The apparatus according to claim 7, wherein the apparatus has a function of, in displaying the interaction history as character information, causing the display unit to display the interaction history when the user or the other device is replaced.

The apparatus according to any one of claims 2 to 8, wherein the apparatus has a function of causing a display unit of the apparatus to display the status of recognition, by a voice recognition function, of at least one of the user and the other device.

The apparatus according to any one of claims 2 to 9, wherein the apparatus has a function of performing the communication by producing voice output using the result of recognizing speech and converting it into a character string,
and further comprises a voice output function that, when part of the resulting character string matches a character string stored in storage means in which correspondences between character strings and output contents are stored in advance, outputs the output content corresponding to that character string.

The apparatus according to any one of claims 2 to 10, comprising a function of connecting to a server provided with a dialogue engine and outputting voice data when no part of the character string obtained by speech recognition matches a character string stored in the storage means in which correspondences between character strings and output contents are stored in advance.
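
The local lookup followed by a server fallback recited in the two claims above can be sketched as follows. This is an illustrative simplification, not the claimed implementation: the names `respond`, `phrase_table`, and `ask_server` are hypothetical, and the sketch passes recognized text to the fallback callback rather than voice data.

```python
def respond(recognized, phrase_table, ask_server):
    """Return (reply, source) for a recognized utterance.

    phrase_table maps stored character strings to output contents.
    ask_server is a hypothetical callback standing in for the
    connection to a server with a dialogue engine.
    """
    for phrase, reply in phrase_table.items():
        # A match on any portion of the recognized text is enough.
        if phrase in recognized:
            return reply, "local"
    # No stored string matched: fall back to the server.
    return ask_server(recognized), "server"
```

With a table such as `{"weather": "Let me check the forecast."}`, an utterance containing "weather" is answered locally and anything else is forwarded to `ask_server`.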

The apparatus according to any one of claims 2 to 11, further comprising a function of connecting to a server provided with a voice recognition engine and outputting voice data when a certain condition is satisfied, even if part of the character string obtained by speech recognition matches a character string stored in the storage means in which correspondences between character strings and output contents are stored in advance.

The apparatus according to any one of claims 2 to 12, further comprising: a function of detecting that the voice has been interrupted after voice recognition and silence has begun; storage means for storing the voice data from the voice recognition until the silence; and a function of connecting to a server provided with a voice recognition engine and outputting the voice data stored in the storage means at the time the silence begins.
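
Buffering audio until silence is detected, as recited above, might look like the sketch below. The frame representation (lists of integer samples), the peak-amplitude test, and the default threshold values are assumptions chosen for illustration. Sending the buffered data to a server is left out.

```python
def collect_until_silence(frames, threshold=500, silent_frames=3):
    """Buffer audio frames until a run of quiet frames is seen.

    frames: iterable of frames, each a sequence of integer samples.
    A frame is "quiet" when its peak absolute amplitude is below threshold.
    Returns the buffered frames with the trailing silence removed.
    """
    buffered = []
    quiet = 0
    for frame in frames:
        buffered.append(frame)
        peak = max((abs(s) for s in frame), default=0)
        quiet = quiet + 1 if peak < threshold else 0
        if quiet >= silent_frames:
            # Silence detected: drop the quiet tail and stop buffering.
            return buffered[:-quiet]
    return buffered
```

Short pauses inside an utterance are kept because the quiet counter resets whenever a loud frame arrives.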

The apparatus according to any one of claims 2 to 13, wherein the apparatus has a recording function and, upon detecting voice at a predetermined sound pressure level, connects to a server provided with a voice recognition engine and outputs voice data.

The apparatus according to any one of claims 2 to 14, comprising a function of, when connecting to a server provided with a voice recognition engine and outputting voice data, outputting to the user by voice an example of dialogue selected from the dialogue data stored in the storage means if the server is busy.

The apparatus according to any one of claims 2 to 15, having a function of outputting by voice an example of dialogue selected from the voice data stored in the storage means, without connecting to a server provided with a voice recognition engine, when the recognized speech of the user is judged to be too long.

The apparatus according to any one of claims 2 to 16, wherein, in the communication by dialogue, when the user fails to catch the voice of the apparatus, the apparatus outputs the immediately preceding voice again based on an utterance of the user.

The apparatus according to any one of claims 2 to 17, wherein when the apparatus fails to recognize speech, the apparatus outputs speech to prompt the user to speak again.

The apparatus according to any one of claims 2 to 18, wherein the display unit is caused to display when a recognized voice content satisfies a certain condition.

The apparatus according to any one of claims 2 to 19, wherein the apparatus has a function of moving a housing or a part connected to the housing, and the housing or the part connected to the housing is moved when the recognized voice content satisfies a certain condition.

The apparatus according to any one of claims 2 to 20, wherein the apparatus includes an eye portion, which is a portion that the user can recognize as an eye, a position recognition function that recognizes the position of the user, and a function of moving the eye portion,
and has a function of, as the communication, moving the eye portion so that it faces the direction of the user's position recognized by the position recognition function.

22. A device according to any of claims 2 to 21, wherein the device comprises a face recognition function that recognizes the face of the user.

The apparatus according to claim 22, wherein the apparatus comprises a display unit, and the face recognition function comprises a function of displaying the recognition status of the user's face on the display unit.

The apparatus according to any one of claims 2 to 23, wherein the position recognition function includes a sound source direction identification function comprising:
three microphones placed at the vertices of a triangle; and
an identifying unit that identifies, based on the differences in arrival time of sound from a sound source to each of the three microphones, the sound source direction running from the position obtained by projecting the position of the sound source onto the plane containing the triangle, along the direction perpendicular to that plane, toward a reference position inside the area enclosed by the triangle.
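
As an illustration of identifying a direction from arrival-time differences at three microphones, the sketch below uses a far-field (plane-wave) approximation, which is an assumption introduced here and not a method stated in the claims. It returns the in-plane bearing of the source as seen from the array, so the direction recited above (from the projected source position toward the reference position) is the reverse of this bearing. The function name, coordinate convention, and speed of sound value are hypothetical.

```python
import math


def source_azimuth(mics, times, c=343.0):
    """Estimate the in-plane bearing of a sound source, in degrees.

    mics:  three (x, y) microphone positions in metres, not collinear.
    times: arrival times in seconds at the three microphones.
    c:     assumed speed of sound in m/s.

    Far-field model: t_i = t0 - (p_i . u) / c, where u is the in-plane
    part of the unit vector pointing from the array toward the source.
    """
    (x1, y1), (x2, y2), (x3, y3) = mics
    t1, t2, t3 = times
    # Two equations (mic 2 and mic 3 relative to mic 1) in ux, uy.
    a11, a12 = x2 - x1, y2 - y1
    a21, a22 = x3 - x1, y3 - y1
    b1 = -c * (t2 - t1)
    b2 = -c * (t3 - t1)
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("microphones must not be collinear")
    # Cramer's rule for the 2x2 system.
    ux = (b1 * a22 - a12 * b2) / det
    uy = (a11 * b2 - a21 * b1) / det
    return math.degrees(math.atan2(uy, ux))
```

Because only the ratio of the in-plane components is used, an elevated source gives the same bearing as its projection onto the microphone plane under this model.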

The apparatus according to any one of claims 2 to 24, wherein the apparatus comprises an infrared remote control signal output unit, and the communication is communication with the other device having an infrared remote control reception function.

The device according to any one of claims 2 to 25, wherein the device is remotely controlled from the other device via the Internet.

The apparatus according to any one of claims 2 to 26, wherein the apparatus performs voice output using character information transmitted from the other device via the Internet.

The apparatus according to claim 27, wherein the time of day, duration, or number of times of voice output can be changed according to the contents of the character information.

The apparatus according to any one of claims 2 to 28, wherein the apparatus has a function of transmitting voice-recognized character information to the other apparatus via the Internet.

30. The apparatus according to any one of claims 2 to 29, having a function of transmitting the voice-recognized character string, as an input character string, to a plurality of different servers each provided with a dialogue engine that outputs an output character string corresponding to the input character string,
receiving the character strings output as the output character strings from the plurality of different servers, selecting the longest output character string, and conducting the dialogue with it.

The apparatus according to any one of claims 2 to 30, having a function of transmitting the voice-recognized character string, as an input character string, to a plurality of different servers each provided with a dialogue engine that outputs an output character string corresponding to the input character string,
receiving the character strings output as the output character strings from the plurality of different servers, selecting the output character string that ends with a question mark, and conducting the dialogue with it.
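
The two selection rules recited above (longest reply, and reply ending with a question mark) can be sketched with two small helpers. The function names are hypothetical, and treating both the ASCII and the full-width question mark as a match is an assumption.

```python
def pick_longest(replies):
    """Return the longest reply; the first one wins on a tie.

    Raises ValueError if replies is empty.
    """
    return max(replies, key=len)


def pick_question(replies):
    """Return the first reply ending with a question mark, or None."""
    for reply in replies:
        # Assumption: accept ASCII "?" and full-width "？".
        if reply.rstrip().endswith(("?", "？")):
            return reply
    return None
```

Here `replies` stands for the list of output character strings already received from the different servers.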

32. The apparatus according to any one of claims 2 to 31, having a function of transmitting the voice-recognized character string, as an input character string, to a plurality of different servers each provided with a dialogue engine that outputs an output character string corresponding to the input character string,
receiving the character strings output as the output character strings from the plurality of different servers, combining affirmative sentences and combining interrogative sentences, and conducting the dialogue with the result.

33. The apparatus according to any one of claims 2 to 32, having a function of transmitting the voice-recognized character string, as an input character string, to a plurality of different servers each provided with a dialogue engine that outputs an output character string corresponding to the input character string,
receiving the character strings output as the output character strings from the plurality of different servers, combining them so that the character string that switches the topic is placed last, and conducting the dialogue with the result.

The apparatus according to any of the above claims, having a function of transmitting the voice-recognized character string, as an input character string, to a plurality of different servers each provided with a dialogue engine that outputs an output character string corresponding to the input character string,
receiving the character strings output as the output character strings from the plurality of different servers, combining them so that the friendly output character string is placed first, and conducting the dialogue with the result.

The apparatus according to any one of claims 2 to 34, having a function of transmitting the voice-recognized character string, as an input character string, to a plurality of different servers each provided with a dialogue engine that outputs an output character string corresponding to the input character string,
receiving the character strings output as the output character strings from the plurality of different servers, combining them in random order, and conducting the dialogue with the result.

36. The apparatus according to any one of claims 2 to 35, having a function of transmitting the voice-recognized character string, as an input character string, to a plurality of different servers each provided with a dialogue engine that outputs an output character string corresponding to the input character string,
receiving the character strings output as the output character strings from the plurality of different servers, and, when an output character string containing an emoticon is among them, treating the emoticon as not subject to the dialogue and displaying it on the display unit together with the output character string.

37. The apparatus according to any one of claims 2 to 36, having a function of transmitting the voice-recognized character string, as an input character string, to a plurality of different servers each provided with a dialogue engine that outputs an output character string corresponding to the input character string,
receiving the character strings output as the output character strings from the plurality of different servers, selecting only one of the output character strings that contain the same character string, combining it with the other output character strings, and conducting the dialogue with the result.
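
Keeping only one of several overlapping replies before combining them, as recited above, could be sketched as follows. The sketch treats only exact duplicates (after trimming whitespace) as "containing the same character string", which is a narrowing assumption. The function name and the separator are hypothetical.

```python
def combine_unique(replies, separator=" "):
    """Combine replies in order, keeping one copy of each duplicate."""
    seen = set()
    kept = []
    for reply in replies:
        key = reply.strip()
        # Skip empty strings and replies already kept once.
        if key and key not in seen:
            seen.add(key)
            kept.append(key)
    return separator.join(kept)
```

A stricter reading of the claim would compare shared substrings rather than whole strings, which this sketch does not attempt.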

The apparatus according to any one of claims 2 to 37, having a function of transmitting the voice-recognized character string, as an input character string, to a plurality of different servers each provided with a dialogue engine that outputs an output character string corresponding to the input character string,
receiving the character strings output as the output character strings from the plurality of different servers, converting the end of each output character string with a word conversion engine, then combining them and conducting the dialogue with the result.

The apparatus according to any one of claims 2 to 38, having a function of transmitting the voice-recognized character string, as an input character string, to a plurality of different servers each provided with a dialogue engine that outputs an output character string corresponding to the input character string,
receiving the character strings output as the output character strings from the plurality of different servers, storing some of the output character strings in the storage means rather than using all of them, and retrieving them from the storage means for use in the dialogue when the dialogue is subsequently conducted.

The apparatus according to any one of claims 2 to 39, wherein a free server and a paid server are used in combination as servers provided with a voice recognition engine.

The device according to any one of claims 2 to 40, wherein the other device is a smart speaker, and the device performs audio output to the smart speaker to communicate with the smart speaker.

42. The device according to any of the claims 2-41, wherein the other device is a smart speaker and the device provides an audio output to the smart speaker to communicate with the smart speaker.

43. A device according to any of the claims 2-42, wherein the other device is a smart speaker and the device has an adaptation function for adapting the audio output of the smart speaker.

The apparatus according to any one of claims 2 to 43, wherein the audio output of the apparatus has a function of reading web articles.

45. The apparatus according to any one of the preceding claims, wherein the output is timed to occur when execution of a predetermined processing unit of robotics process automation is completed.

46. Apparatus according to any of the preceding claims, wherein the output is a notification operation.

47. The apparatus according to any one of claims 1 to 46, wherein an alerting operation is performed when a computer in execution of robotics process automation is in a state of waiting for an input from a user.

48. The apparatus according to any of the preceding claims, comprising a client computer that performs robotics process automation and a server computer that gives the client computer an instruction to execute robotics process automation, wherein an alerting operation is performed when the client computer receives an instruction from the server computer.

49. An apparatus according to any of the preceding claims, wherein said output information is generated to perform an operation pointing in the direction of a computer running robotics process automation.

50. An apparatus according to any of the preceding claims, wherein different said output information is generated depending on the state of execution of robotics process automation.

A program for causing a computer to realize the function of the device according to any one of claims 1 to 50.