JP2019184813A

JP2019184813A - Robot and robot control program

Info

Publication number: JP2019184813A
Application number: JP2018075313A
Authority: JP
Inventors: Mitsuo Yamada (山田 光穗); Yuko Hoshino (星野 祐子); Tatsuhiko Tokida (常田 龍彦)
Original assignee: Tokai University; Smart Robotics Co Ltd
Current assignee: Tokai University; Smart Robotics Co Ltd
Priority date: 2018-04-10
Filing date: 2018-04-10
Publication date: 2019-10-24

Abstract

To provide a robot which enables conversation adjusted to a care receiver. SOLUTION: A robot 1 comprises: a speaker specification section 15 for specifying a care receiver; a parameter storage section 16 for storing parameters for each care receiver; a parameter acquisition section 17 for acquiring the parameters of the specified care receiver; a conversation selection section 20 for selecting a conversation for the care receiver; a voice synthesis section 21 for outputting the conversation by voice through a speaker S; and an operation command section 23 for commanding a driving mechanism D to perform a preset presentation operation. SELECTED DRAWING: Figure 2

Description

The present invention relates to a robot, and a robot control program, that converse with a person such as an elderly person or a care recipient.

In the nursing care field, robots are increasingly being introduced to reduce the burden on caregivers. A robot has previously been proposed that automatically and proactively provides personal assistance based on detected data about a person (see Patent Document 1). The robot described in Patent Document 1 provides personal assistance related to a person's health, exercise, or dieting activities.

Patent Document 1: JP 2014-176963 A

However, the robot described in Patent Document 1 cannot adapt its conversation to an individual such as an elderly person or a care recipient. As a result, its speech is not always easy for that person to hear, the conversation may break off, and personal assistance may not be provided adequately.

An object of the present invention is therefore to provide a robot and a robot control program that enable conversation adapted to the person.

In view of the above problem, the robot according to the present invention includes a microphone into which a person's utterance is input, a camera that photographs the person, and a speaker that outputs voice to the person, and further comprises: a question selection unit that selects a question for the person; a voice output unit that outputs, via the speaker, the question selected by the question selection unit as voice; a voice recognition unit that performs speech recognition on the person's reply to the question input from the microphone; a face image recognition unit that performs image recognition on the face image of the person captured by the camera; a person identification unit that identifies the person based on the speech recognition result of the reply or the image recognition result of the face image; a parameter storage unit that stores, as parameters for each person, high-frequency emphasis information representing the high-side frequency components of the voice to be emphasized, a question replacement rule for replacing the content of the question, and a speech rate; and a parameter acquisition unit that acquires, from the parameter storage unit, the parameters for the person identified by the person identification unit. The voice output unit replaces the content of the question according to the parameters acquired by the parameter acquisition unit and outputs the question at a frequency and speech rate that accord with those parameters.

Also in view of the above problem, the robot control program according to the present invention causes a computer to function as the robot according to the present invention.

According to the robot and the robot control program of the present invention, words and sentences that are hard for a person such as an elderly person or a care recipient to hear are replaced, and a frequency and speech rate suited to that person are used, so the robot can converse in a way adapted to that person.

An external view of a robot according to an embodiment of the present invention.
A block diagram showing the configuration of the robot of FIG. 1.
Explanatory diagrams (a) to (f) illustrating the definition of the range of motion of the upper arm in the embodiment.
A flowchart showing the operation of the robot of FIG. 2.
A flowchart showing the speaker recognition process of FIG. 4.
A flowchart showing the motion analysis process of FIG. 4.
A flowchart showing the motion analysis process of FIG. 4.
A flowchart showing the singing process of FIG. 4.
A flowchart showing the voice analysis process in the embodiment.

(Embodiment)
An embodiment of the present invention is described in detail below with reference to the drawings as appropriate. In the embodiment, identical means are given identical reference numerals, and duplicate description is omitted.

[Robot outline]
An outline of the robot 1 according to the embodiment of the present invention is described with reference to FIG. 1.
As shown in FIG. 1, the robot 1 is a humanoid care robot used at care sites to keep elderly care recipients company. Specifically, in place of a caregiver, the robot 1 converses with the care recipient and sings and does exercises together with the care recipient.

Elderly care recipients may have difficulty hearing speech at particular frequencies or rapid speech. In care facilities and rehabilitation hospitals, singing (for example, children's songs) and exercises are carried out as part of rehabilitation, but care recipients often find it hard to sing or exercise at the caregiver's pace. In such cases the caregiver sings and exercises at the care recipient's pace. The robot 1 therefore, like a human caregiver, adapts its conversation, singing, and exercises to the care recipient.

The robot 1 includes a camera C that photographs the care recipient, a microphone M that picks up the care recipient's voice, a speaker S that outputs speech and songs to the care recipient, and a control unit 10 (FIG. 2) that performs the various controls of the robot 1. Two cameras C are mounted on the front of the head of the robot 1, two microphones M on the front and rear of the top of the head, and two speakers S on the left and right of the head. The robot 1 also has movable parts such as the neck, shoulders, elbows, hip joints, knees, and ankles, which are driven by a drive mechanism D such as servo motors.
Note that FIG. 1 shows only one microphone M and one speaker S.

[Robot configuration]
The configuration of the robot 1 is described with reference to FIG. 2.
The camera C is an ordinary camera that captures face images and whole-body images of the care recipient. In this embodiment, the camera C captures a face image containing the care recipient's face region and outputs it to the face image recognition unit 11. The camera C also captures an image containing the care recipient's whole body and outputs it to the motion analysis unit 12.
The microphone M is an ordinary microphone that picks up the care recipient's voice and outputs it to the utterance recognition unit 13 and the voice analysis unit 14.
The speaker S is an ordinary speaker that outputs the synthesized speech from the speech synthesis unit 21 to the care recipient.
The drive mechanism D drives each movable part of the robot 1 according to commands from the operation command unit 23. An ordinary servo motor is one example of the drive mechanism D.

The control unit 10 includes a face image recognition unit 11, a motion analysis unit 12, an utterance recognition unit (voice recognition unit) 13, a voice analysis unit 14, a speaker identification unit (person identification unit) 15, a parameter storage unit 16, a parameter acquisition unit 17, a conversation analysis unit 18, a conversation storage unit 19, a conversation selection unit (question selection unit) 20, a speech synthesis unit (voice output unit) 21, a presentation motion storage unit 22, and an operation command unit 23.

The face image recognition unit 11 performs image recognition on the face image input from the camera C. In this embodiment, it extracts feature points of the face image by a known method and outputs them to the speaker identification unit 15 as the image recognition result.

The face authentication technology described in Reference 1 can be used for this image recognition. That technology consists of two main processes: face detection and face matching. Face detection determines the face region in the face image and then detects facial feature points to obtain the positions of the eyes, nose, mouth corners, and so on. The position and size of the face region are then normalized using those feature point positions, after which face matching is performed.
Reference 1: "How face authentication works", [online], [retrieved March 30, 2018], Internet <URL: https://jpn.nec.com/biometrics/face/technology/structure.html>

The motion analysis unit 12 analyzes the care recipient's following motion using the captured image input from the camera C. For example, using a known technique such as motion vectors, it analyzes which body part moves, the amount of motion, the motion start time, and the motion duration. It then outputs the motion analysis result to the speaker identification unit 15, the conversation selection unit 20, and the operation command unit 23.
A following motion is a motion that the care recipient performs by following the motion of the robot 1.
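
The motion start time and motion duration mentioned above could, for instance, be derived from a per-frame motion magnitude. The following is a minimal sketch under that assumption, not the method of the embodiment: the function name `motion_timing`, the `threshold` parameter, and the idea of summarizing each frame by one magnitude (for example, a mean motion-vector length) are all hypothetical.

```python
from typing import Optional, Sequence, Tuple


def motion_timing(
    magnitudes: Sequence[float], fps: float, threshold: float
) -> Optional[Tuple[float, float]]:
    """Return (start_time_s, duration_s) of the first run of frames whose
    motion magnitude exceeds threshold, or None if no frame exceeds it."""
    start = None
    for index, magnitude in enumerate(magnitudes):
        if magnitude > threshold and start is None:
            start = index  # first frame in which motion is detected
        elif magnitude <= threshold and start is not None:
            return start / fps, (index - start) / fps  # motion has stopped
    if start is None:
        return None
    # Motion was still in progress when the recording ended.
    return start / fps, (len(magnitudes) - start) / fps
```

For example, with two frames per second, `motion_timing([0, 0, 5, 6, 7, 0], fps=2, threshold=1)` reports motion starting at 1.0 s and lasting 1.5 s.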

The utterance recognition unit 13 performs speech recognition on the care recipient's voice input from the microphone M (for example, the care recipient's reply to a question). In this embodiment, it recognizes the words contained in the audio from the microphone M by a known method and outputs the word recognition result to the speaker identification unit 15 and the conversation analysis unit 18.

The voice analysis unit 14 analyzes the care recipient's speech and singing input from the microphone M. In this embodiment, it uses a known method to analyze acoustic characteristics of the audio from the microphone M, such as phonology, phonemes, and speech rate (Reference 2). It then outputs the acoustic characteristic analysis result to the speaker identification unit 15, the parameter acquisition unit 17, and the speech synthesis unit 21.

Reference 2: Hiroya Fujisaki, "Analysis, formulation and modeling of prosody", [online], [retrieved March 30, 2018], Internet <URL: http://www.gavo.t.u-tokyo.ac.jp/tokutei_pub/houkoku/model/model.pdf#search=%27%E9%9F%B3%E9%9F%BB%E3%80%81%E9%9F%B3%E7%B4%A0%E3%80%81%E7%99%BA%E8%A9%B1%E9%80%9F%E5%BA%A6+%E8%A7%A3%E6%9E%90%27>

The speaker identification unit 15 identifies the care recipient based on the results from the face image recognition unit 11, the motion analysis unit 12, the utterance recognition unit 13, and the voice analysis unit 14. A suitable identification method can be chosen in view of the computing capability of the robot 1, the number of care recipients, and so on; examples include deep learning, pattern matching, and the positional relationships between feature points. The speaker identification unit 15 then outputs a speaker identification result indicating the identified care recipient to the parameter acquisition unit 17.

When the speaker identification unit 15 cannot identify the care recipient, it obtains the care recipient's estimated age and estimated gender from the image recognition result of the face image and the acoustic characteristic analysis result. In this embodiment, it can estimate age and gender by known methods.
One method for estimating age and gender uses the IMDB-WIKI dataset (Reference 3), which can be used for the task of estimating age and gender from face images.
One method for estimating gender recognizes the formant distribution (Reference 4). Because the formant distribution of speech differs between men and women, gender can be determined by recognizing it. For example, the pitch frequency is 100 Hz to 150 Hz for a male voice and 250 Hz to 300 Hz for a female voice.
Methods for estimating age use segmental features (acoustic characteristics of the vocal tract) or prosodic features (acoustic characteristics of the sound source). Segmental features exploit the formant shift caused by the lengthening of the vocal tract with age and the drop in high-band spectral gain with age. Prosodic features measure the decline in mean fundamental frequency with age and the age-related irregularity (shimmer and jitter) of the source waveform, which is periodic for voiced sounds (Reference 5).
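
The pitch ranges quoted above can be turned into a very rough rule. The following is a minimal sketch, not the method of Reference 4: the function name `guess_gender_from_pitch` and the choice to return `None` outside both ranges are assumptions made for illustration.

```python
from typing import Optional

# Pitch ranges quoted in the description, in Hz.
MALE_PITCH_HZ = (100.0, 150.0)
FEMALE_PITCH_HZ = (250.0, 300.0)


def guess_gender_from_pitch(pitch_hz: float) -> Optional[str]:
    """Return "male", "female", or None when the pitch lies in neither range."""
    if MALE_PITCH_HZ[0] <= pitch_hz <= MALE_PITCH_HZ[1]:
        return "male"
    if FEMALE_PITCH_HZ[0] <= pitch_hz <= FEMALE_PITCH_HZ[1]:
        return "female"
    return None  # ambiguous: another cue, such as the face image, is needed
```

Returning `None` for pitches between the two ranges keeps the rule from guessing; in that case the estimate would have to come from another cue such as the face image.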

Reference 3: "IMDB-WIKI", [online], [retrieved February 14, 2018], Internet <URL: https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/>
Reference 4: "Formant distribution", [online], [retrieved February 14, 2018], Internet <URL: http://media.sys.wakayama-u.ac.jp/kawahara-lab/LOCAL/diss/diss7/S3_6.htm>
Reference 5: Nobuaki Minematsu, "Estimation of perceptual age using acoustic features of speech and improvement of its accuracy (Observing People)"

In addition, the speaker identification unit 15 may obtain the care recipient's estimated age and estimated gender from the motion analysis result of the captured image, for example by gait authentication. Gait authentication analyzes the individual characteristics of a person's walk from stride length and arm swing, and estimates age and gender by comparing the analysis with the typical changes in gait that come with aging.
The speaker identification unit 15 then outputs an age/gender estimation result indicating the estimated age and gender to the parameter acquisition unit 17.

The parameter storage unit 16 is a storage means, such as a memory, that stores the various parameters needed for conversation with care recipients. For example, an administrator of the robot 1 or a caregiver manually sets the parameters in the parameter storage unit 16 for each care recipient. The parameters include voice parameters for the speech output to the care recipient, presentation motion parameters for the motions presented to the care recipient, and singing parameters for singing with the care recipient.
The parameter storage unit 16 also stores parameters corresponding to estimated age and estimated gender, for cases where the care recipient cannot be identified.

<Voice parameters>
The parameters stored in the parameter storage unit 16 are described below.
Human hearing is said to decline gradually after the age of 20. The inner ear is lined with cells bearing tens of thousands of hairs (hair cells) that serve to transmit sound. These hairs sway in response to sound arriving from the ear canal via the eardrum, converting the sound into electrical signals. The loss of these hairs with age is thought to be the cause of age-related hearing loss.

It is also said that with age people find high frequencies harder to hear, so that sound as a whole seems muffled and indistinct. To output speech that the care recipient can hear easily, the speech synthesis unit 21 should therefore boost the frequency components whose audibility has declined.
Elderly people are also said to have difficulty hearing quiet sounds, while the perceived loudness rises abruptly above a certain volume. Simply speaking loudly into an elderly person's ear is therefore not the answer. In other words, the speech synthesis unit 21 should make quiet sounds louder and hold loud sounds back somewhat.
Furthermore, because hair cells are lost and some information can no longer be passed from the inner ear to the brain, temporal resolution declines and elderly people are said to need more time to understand what is said. The speech synthesis unit 21 should therefore output speech slowly and clearly.
Elderly people are also said to lose the ability to distinguish subtle frequency differences in sounds, which lowers their listening comprehension. For example, they have difficulty hearing the sounds of the pa, ta, ka, and sa rows of the Japanese syllabary, and sounds with high-frequency components such as "shu", "tsu", "te", "su", "ka", "hi", "sa", and "shi".
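
The advice to make quiet sounds louder while holding loud sounds back corresponds to what audio engineering calls dynamic range compression. The description does not say how the robot would do this; the following is only a sketch using a power-law curve, and the function name `compress_sample` and the default exponent of 0.5 are assumptions chosen for illustration.

```python
import math


def compress_sample(x: float, exponent: float = 0.5) -> float:
    """Power-law compression of one sample normalized to [-1.0, 1.0].

    With an exponent below 1, small amplitudes are raised strongly while
    amplitudes near full scale are left almost unchanged.
    """
    if not -1.0 <= x <= 1.0:
        raise ValueError("sample must be normalized to [-1.0, 1.0]")
    return math.copysign(abs(x) ** exponent, x)
```

With the default exponent, a quiet sample of 0.01 is raised to about 0.1 (a tenfold gain), whereas a full-scale sample of 1.0 is left at 1.0.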

Accordingly, high-band enhancement information and a speech rate suited to each care recipient are set in the parameter storage unit 16 as voice parameters, keeping points (1) to (3) below in mind. High-band enhancement information is high-frequency emphasis information indicating which high-side frequency components of the speech output by the robot 1 are to be emphasized.

(1) Speak slowly and distinctly in a slightly raised voice rather than a loud one.
(2) Articulate the pa, ta, ka, and sa rows clearly and distinctly.
(3) Taking care that vowels do not become excessively loud, put firm emphasis on the onset of each word and lengthen it slightly.
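
One conceivable way to apply a per-person high-band emphasis setting to the output speech is a first-order pre-emphasis filter. This is a sketch under that assumption only: the description does not specify how the emphasis is realized, and the function name `pre_emphasize` and the use of a single coefficient as the stored setting are hypothetical.

```python
from typing import List, Sequence


def pre_emphasize(samples: Sequence[float], coeff: float) -> List[float]:
    """Apply y[n] = x[n] - coeff * x[n-1].

    Slowly varying (low-frequency) content is attenuated and rapidly
    alternating (high-frequency) content is boosted; a larger coeff
    gives a stronger tilt toward the high band.
    """
    if not 0.0 <= coeff < 1.0:
        raise ValueError("coeff must be in [0.0, 1.0)")
    out: List[float] = []
    previous = 0.0
    for x in samples:
        out.append(x - coeff * previous)
        previous = x
    return out
```

After the first sample, a constant input is scaled by `1 - coeff` and an input that alternates in sign on every sample is scaled by `1 + coeff`, which is the high-band tilt the setting is meant to control.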

In addition, a conversation replacement rule (question replacement rule) suited to each care recipient is set in the parameter storage unit 16 as a voice parameter. Examples of conversation replacement rules are (A) to (C) below.

(A) Word replacement
In the sentence "Let's meet at 7 o'clock tomorrow night", replace the reading "shichi" (しち) of the word "7" with "nana" (なな).
Replace the word "hakichigaeru" (履き違える) with "machigaeru" (間違える, "to mistake").

(B) Sentence replacement
Rephrase the sentence so that the listener knows in advance what is about to be said. For example, replace "We plan to meet at the station square at 7 o'clock tomorrow" with "For our meeting, the time is 7 o'clock tomorrow and the place is the station square".

(C) Inserting blanks
When a word is long, insert blanks so that there is a pause for breath. For example, replace the word "karashi-sumiso" (からし酢味噌, mustard vinegar miso) with "karashi□□□sumiso" (からし□□□酢味噌). Here "□" denotes a blank inserted into the word; it becomes a silent interval when converted to synthesized speech.
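
Rules (A) and (C) above can be sketched as plain string substitution applied before synthesis. This is an illustration only: the function name `apply_replacement_rules`, the two dictionaries, and the `pause_length` parameter are hypothetical, and the Japanese strings are the examples from the description.

```python
from typing import Dict

PAUSE = "□"  # blank marker from the description; rendered as silence when synthesized


def apply_replacement_rules(
    text: str,
    word_rules: Dict[str, str],
    pause_rules: Dict[str, int],
    pause_length: int = 3,
) -> str:
    """Apply substring replacements, then split listed long words with pause markers.

    word_rules maps a hard-to-hear word or reading to its replacement.
    pause_rules maps a long word to the character index where the pause goes.
    """
    for old, new in word_rules.items():
        text = text.replace(old, new)
    for word, split_at in pause_rules.items():
        padded = word[:split_at] + PAUSE * pause_length + word[split_at:]
        text = text.replace(word, padded)
    return text
```

Because `str.replace` works on any substring, a whole-sentence rule of type (B) could be stored in `word_rules` in the same way, with the original sentence as the key and the rephrased sentence as the value.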

<Presentation motion parameters>
It is said that with age the range of motion of the joints narrows, the time needed to begin a motion lengthens, and the time for which a motion can be sustained shortens. The motion part, motion amount, motion start time, and motion duration suited to each care recipient are therefore set in the parameter storage unit 16 as presentation motion parameters.
A presentation motion is a motion that the robot 1 presents to the care recipient.

An example of how the motion part and motion amount are defined is described with reference to FIG. 3. For example, when the motion part is the upper arm, the motion amount can be expressed as an angle for each direction of movement. When the direction of movement is flexion or extension, the reference axis is the line connecting the two acromia and the movement axis is the line connecting the top of the head and the acromion. When the direction of movement is elevation or depression, the reference axis is the same, but the movement axis is the line connecting the acromion and the upper edge of the sternum.
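
Given a reference axis and a movement axis as direction vectors, the motion amount described above reduces to the angle between them. The following sketch assumes the two axes have already been obtained as 2-D vectors in a common plane (for example, from image landmarks); the function name `axis_angle_deg` and that 2-D simplification are assumptions.

```python
import math
from typing import Tuple

Vector = Tuple[float, float]


def axis_angle_deg(reference_axis: Vector, movement_axis: Vector) -> float:
    """Unsigned angle in degrees between two non-zero 2-D direction vectors."""
    dot = reference_axis[0] * movement_axis[0] + reference_axis[1] * movement_axis[1]
    cross = reference_axis[0] * movement_axis[1] - reference_axis[1] * movement_axis[0]
    # atan2 of the cross and dot products stays well conditioned for
    # near-parallel vectors, unlike acos of the normalized dot product.
    return abs(math.degrees(math.atan2(cross, dot)))
```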

The definitions of the motion part and motion amount are given in, for example, Reference 6, so further description is omitted.
Reference 6: "Definition of the range of motion of the upper arm", [online], [retrieved February 14, 2018], Internet <URL: http://www.study-channel.com/2015/06/ROM-upper-limbs.html/>

<Singing parameters>
It is said that with age high pitches become harder to hear and singing speed declines (References 7 and 8). The vocal range and singing speed suited to each care recipient are therefore set in the parameter storage unit 16 as singing parameters.
Reference 7: "On aging of the voice", [online], [retrieved March 30, 2018], Internet <URL: http://hozawa.jp/news/2011/11/post-48.html/>
Reference 8: Masaki Nishio, Seiji Niimi, "Changes in speaking fundamental frequency with aging", The Japan Journal of Logopedics and Phoniatrics 46:136-144, 2005

Returning to FIG. 2, the description of the configuration of the robot 1 continues.
When the speaker identification unit 15 has identified the care recipient, the parameter acquisition unit 17 acquires parameters from the parameter storage unit 16 based on the speaker identification result from the speaker identification unit 15.
When the speaker identification unit 15 could not identify the care recipient, the parameter acquisition unit 17 acquires parameters from the parameter storage unit 16 based on the age/gender estimation result from the speaker identification unit 15.
The parameter acquisition unit 17 then outputs, of the acquired parameters, the presentation motion parameters to the operation command unit 23 and the voice parameters and singing parameters to the speech synthesis unit 21.
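
The two-stage lookup described above (per-person parameters when the person is identified, otherwise parameters keyed by estimated age and gender) might be sketched as follows. Everything here is illustrative: the `Parameters` fields, the `age_band` helper with its threshold of 75, and the dictionary layout are assumptions, not details from the description.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Parameters:
    speech_rate: float       # relative rate, 1.0 = default (assumed unit)
    high_emphasis_db: float  # high-band boost in dB (assumed representation)


def age_band(age: int) -> str:
    """Hypothetical two-way banding; the threshold of 75 is an arbitrary example."""
    return "75+" if age >= 75 else "under75"


def acquire_parameters(
    person_id: Optional[str],
    estimate: Optional[Tuple[int, str]],
    per_person: Dict[str, Parameters],
    per_group: Dict[Tuple[str, str], Parameters],
) -> Parameters:
    """Prefer the identified person's parameters; otherwise use the age/gender group."""
    if person_id is not None and person_id in per_person:
        return per_person[person_id]
    if estimate is None:
        raise LookupError("neither an identity nor an age/gender estimate is available")
    age, gender = estimate
    return per_group[(age_band(age), gender)]
```

Keeping the per-person and per-group tables separate mirrors the description, in which a caregiver sets the per-person values manually while the group values serve only as a fallback.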

The conversation analysis unit 18 analyzes the care recipient's conversation based on the word recognition result from the utterance recognition unit 13. In this embodiment, it analyzes the specific content of the conversation by a known method and outputs the conversation analysis result to the conversation selection unit 20.
When the conversation analysis result is a preset singing request, the conversation analysis unit 18 commands the conversation selection unit 20 to sing (singing command). A singing request is an utterance indicating that the care recipient wants to sing, such as "Let's sing ○○○", where "○○○" stands for the name of a song.
When the conversation analysis result is a preset presentation motion request, the conversation analysis unit 18 commands the operation command unit 23 to perform a presentation motion (presentation motion command). A presentation motion request is an utterance indicating that the care recipient wants a presentation motion, such as "Let's do △△△", where "△△△" stands for a type of presentation motion such as calisthenics or tai chi.
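
The routing just described, in which a recognized utterance becomes a singing command, a presentation motion command, or ordinary conversation, can be sketched with two patterns. The English patterns below are hypothetical stand-ins for the preset Japanese request phrases, and the function name `route_utterance` and the destination labels are assumptions.

```python
import re
from typing import Tuple

# Hypothetical English stand-ins for the preset request phrases.
SING_PATTERN = re.compile(r"let's sing (.+)", re.IGNORECASE)
MOTION_PATTERN = re.compile(r"let's do (.+)", re.IGNORECASE)


def route_utterance(text: str) -> Tuple[str, str]:
    """Return (destination, payload) for one recognized utterance."""
    text = text.strip()
    match = SING_PATTERN.fullmatch(text)
    if match:
        # Singing request: goes to the conversation selection unit 20.
        return ("singing_command", match.group(1))
    match = MOTION_PATTERN.fullmatch(text)
    if match:
        # Presentation motion request: goes to the operation command unit 23.
        return ("presentation_motion_command", match.group(1))
    # Anything else is passed on as an ordinary conversation analysis result.
    return ("conversation", text)
```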

The conversation storage unit 19 is a storage means, such as a memory, that stores conversations (conversation programs) with the care recipient. These conversations include questions for the care recipient. The conversation storage unit 19 also stores the lyrics and scores (songs) needed for singing with the care recipient.

The conversation selection unit 20 selects a conversation stored in the conversation storage unit 19 based on the conversation analysis result from the conversation analysis unit 18. In the present embodiment, the conversation selection unit 20 uses a known method to select a conversation whose content is appropriate to the care recipient's utterance.
When a singing command is input from the conversation analysis unit 18, the conversation selection unit 20 selects the lyrics and musical score corresponding to that command from the conversation storage unit 19.
The conversation selection unit 20 then outputs the selected conversation, or the selected lyrics and musical score, to the speech synthesizer 21.

The speech synthesizer 21 outputs the conversation from the conversation selection unit 20 as speech through the speaker S. In the present embodiment, the speech synthesizer 21 first rewrites the conversation according to the speech parameter (conversation replacement rule) from the parameter storage unit 16. It then generates synthesized speech representing the conversation based on the speech parameters (high-frequency enhancement information, speech rate) and outputs the generated synthesized speech to the speaker S. Because the robot 1 thus outputs synthesized speech at a frequency and speech rate that are easy for the care recipient to hear, the care recipient can more easily continue the conversation.
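
As a rough sketch of the two stages described above (text replacement, then synthesis settings), the following assumes the replacement rules are a simple word-to-word table. The field names `replacements`, `high_boost_db` and `rate_scale` are illustrative and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SpeechParams:
    # All field names are illustrative assumptions.
    replacements: Dict[str, str] = field(default_factory=dict)  # conversation replacement rules
    high_boost_db: float = 0.0  # high-frequency enhancement
    rate_scale: float = 1.0     # 1.0 means the default speech rate


def prepare_utterance(text: str, params: SpeechParams) -> dict:
    """Rewrite the text, then bundle it with the synthesis settings."""
    # Apply longer keys first so a short rule cannot break a longer phrase.
    for old in sorted(params.replacements, key=len, reverse=True):
        text = text.replace(old, params.replacements[old])
    return {
        "text": text,
        "high_boost_db": params.high_boost_db,
        "rate_scale": params.rate_scale,
    }


params = SpeechParams({"rehabilitation": "exercise"}, high_boost_db=6.0, rate_scale=0.8)
print(prepare_utterance("Time for rehabilitation", params))
```

The returned dictionary would be handed to whatever text-to-speech engine is in use; that engine is outside the scope of this sketch.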

The speech synthesizer 21 also outputs the lyrics and musical score from the conversation selection unit 20 through the speaker S. In the present embodiment, the speech synthesizer 21 generates synthesized speech of the lyrics from the conversation selection unit 20. It then sings with the generated synthesized speech, together with the accompaniment given by the score, based on the singing parameters (vocal range, singing speed) from the parameter storage unit 16. Because the robot 1 thus sings and plays accompaniment in a vocal range and at a singing speed that make it easy for the care recipient to sing along, the care recipient can more easily continue singing.

The presentation motion storage unit 22 is a storage means, such as a memory, that stores presentation motions. A presentation motion is a motion that the robot 1 presents to the care recipient, such as gymnastics or tai chi.

The motion command unit 23 acquires a presentation motion from the presentation motion storage unit 22 and commands the acquired presentation motion to the drive mechanism D. In the present embodiment, when a presentation motion command is input from the conversation analysis unit 18, the motion command unit 23 acquires the presentation motion corresponding to that command from the presentation motion storage unit 22. The motion command unit 23 then adjusts the presentation motion based on the presentation motion parameters from the parameter storage unit 16 and the motion analysis result from the motion analysis unit 12. Because the robot 1 thus adjusts the presentation motion to the care recipient's movement, the care recipient can more easily continue exercising.

[Robot operation]
The operation of the robot 1 will be described with reference to FIG. 4.
As shown in FIG. 4, the robot 1 determines whether to perform a presentation motion. For example, the robot 1 makes this determination according to whether the conversation analysis unit 18 finds that the conversation analysis result is a presentation motion request (step S1).
When the presentation motion is not performed (No in step S1), the robot 1 determines whether to sing. For example, the robot 1 uses the conversation analysis unit 18 to determine whether the conversation analysis result is a singing request (step S2).
When singing is not performed (No in step S2), the robot 1 performs the speaker recognition process of FIG. 5 and acquires parameters (step S3).

<Speaker recognition process>
The speaker recognition process will be described with reference to FIG. 5.
As shown in FIG. 5, the camera C captures a face image of the care recipient (step S100).
The face image recognition unit 11 performs image recognition on the face image captured in step S100.
The speaker specifying unit 15 identifies the care recipient based on the image recognition result of the face image recognition unit 11 (step S101).

When the care recipient can be identified (Yes in step S101), the parameter acquisition unit 17 acquires the parameters of the care recipient identified in step S101 from the parameter storage unit 16 and ends the speaker recognition process (step S102).

When the care recipient cannot be identified (No in step S101), the speaker specifying unit 15 estimates the care recipient's age and sex based on the image recognition result of the face image (step S103).
The parameter acquisition unit 17 acquires the parameters corresponding to the age and sex estimated in step S103 from the parameter storage unit 16 and ends the speaker recognition process (step S104).
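
The branch in steps S101 to S104 amounts to a lookup with a fallback. The sketch below assumes the parameter storage is split into a per-person table and a per-group table keyed by age band and sex; the tables, the age banding at 75, and the function name are all hypothetical.

```python
from typing import Dict, Optional, Tuple

Params = Dict[str, float]


def acquire_parameters(
    person_id: Optional[str],
    estimated: Tuple[int, str],
    by_person: Dict[str, Params],
    by_group: Dict[Tuple[str, str], Params],
) -> Params:
    """Steps S101-S104: per-person parameters, else age/sex defaults."""
    if person_id is not None and person_id in by_person:  # S101 Yes, then S102
        return by_person[person_id]
    age, sex = estimated                                   # S103: estimated from the face image
    age_band = "75+" if age >= 75 else "under75"           # hypothetical banding
    return by_group[(age_band, sex)]                       # S104


by_person = {"p1": {"rate_scale": 0.9}}
by_group = {("75+", "F"): {"rate_scale": 0.8}, ("under75", "F"): {"rate_scale": 1.0}}
print(acquire_parameters(None, (80, "F"), by_person, by_group))
```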

Returning to FIG. 4, the description of the operation of the robot 1 continues.
Because the care recipient has requested neither a presentation motion nor singing, the conversation selection unit 20 selects a question for the care recipient. For example, the conversation selection unit 20 selects a question that prompts the care recipient to talk, such as "How are you feeling?"
The speech synthesizer 21 rewrites the selected question according to the speech parameter (conversation replacement rule) acquired in step S3. It then generates synthesized speech of the selected question based on the speech parameters (high-frequency enhancement information, speech rate) and outputs the generated synthesized speech to the speaker S (step S4).

The utterance recognition unit 13 recognizes the words contained in the care recipient's voice input from the microphone M and determines whether the care recipient has answered the question of step S4 (step S5).

When the care recipient answers (Yes in step S5), the conversation analysis unit 18 analyzes the care recipient's answer based on the word recognition result of step S5.
The conversation selection unit 20 selects a conversation from the conversation storage unit 19 based on the answer analyzed by the conversation analysis unit 18.
The speech synthesizer 21 rewrites the selected conversation according to the speech parameter (conversation replacement rule) acquired in step S3. It then generates synthesized speech of the selected conversation based on the speech parameters (high-frequency enhancement information, speech rate) and outputs the generated synthesized speech to the speaker S (step S6).

Thereafter, the robot 1 continues the process of step S6 until the care recipient ends the conversation, and then returns to the process of step S1.
In some cases, the care recipient cannot be identified by the speaker recognition process of FIG. 5 even though the care recipient has answered. In that case, the robot 1 performs the voice analysis process described later (FIG. 9) to identify the care recipient.

When the care recipient does not answer (No in step S5), the speaker specifying unit 15 estimates the care recipient's age and sex based on the image recognition result of the face image (step S7).

The conversation selection unit 20 selects a question for the care recipient.
The speech synthesizer 21 rewrites the selected question according to the acquired speech parameter (conversation replacement rule). Based on the age and sex estimated in step S7, it then generates synthesized speech with a further reduced speech rate and emphasized high-frequency components. Because speech becomes easier to understand when the speech rate is slowed by 20% (Reference 9), the speech synthesizer 21 lowers the speech rate by 20% (step S8).
Reference 9: "People-friendly speech rate conversion", [online], [retrieved March 30, 2018], Internet <URL: https://www.nhk.or.jp/strl/onepoint/data/wasoku.pdf#search=%27%E9%AB%98%E9%BD%A2%E8%80%85+%E8%A9%B1%E9%80%9F%27>
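
The 20% reduction in step S8 is a single multiplication. As a worked example, assuming the speech rate is expressed in syllables per second (the unit and the function name `slow_down` are assumptions), a rate of 8.0 becomes 8.0 × 0.8 = 6.4.

```python
def slow_down(rate: float, reduction: float = 0.20) -> float:
    """Return the speech rate after lowering it by the given fraction."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return rate * (1.0 - reduction)


print(round(slow_down(8.0), 2))
```

The source does not say whether the reduction compounds when step S8 is repeated; if it does, calling `slow_down` again on the result would model that.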

The speech synthesizer 21 outputs the synthesized speech of the question generated in step S8 to the speaker S (step S9).
The utterance recognition unit 13 recognizes the words contained in the care recipient's voice input from the microphone M and determines whether the care recipient has answered the question of step S9 (step S10).
When the care recipient answers (Yes in step S10), the robot 1 performs the speaker recognition process of FIG. 5 and acquires parameters (step S11).

As in step S6, the robot 1 converses using the parameters acquired in step S11 (step S12).
Thereafter, the robot 1 continues the process of step S12 until the care recipient ends the conversation, and then proceeds to the process of step S13.

The parameter acquisition unit 17 writes the care recipient's identification information (for example, the feature values of the face image) and the parameters acquired in step S11 to the parameter storage unit 16 in association with each other, and returns to the process of step S1 (step S13).

When the care recipient does not answer (No in step S10), the speaker specifying unit 15 adds 1 to the question count. The question count is the number of times the robot 1 has asked the care recipient a question, and its initial value is 0 (step S14).
The speaker specifying unit 15 determines whether the question count incremented in step S14 is less than a preset designated count. The designated count is the number of times the robot 1 is to ask the care recipient a question (step S15).
When the question count is less than the designated count (Yes in step S15), the robot 1 returns to the process of step S8.

When the question count is equal to or greater than the designated count (No in step S15), the conversation selection unit 20 selects a closing message from the conversation storage unit 19. The closing message ends the conversation with the care recipient, for example, "Unfortunately, I am unable to speak in a way that is easy for you to hear."
The speech synthesizer 21 generates synthesized speech of the closing message, outputs it to the speaker S (step S16), and returns to the process of step S1.
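
The retry logic of steps S8 to S16 can be sketched as a bounded loop. Here `ask` is a hypothetical callback that speaks the slowed question and reports whether the care recipient replied; it stands in for steps S8 to S10 and is not part of the patent.

```python
from typing import Callable


def ask_until_answered(ask: Callable[[], bool], max_questions: int) -> bool:
    """Repeat the question until it is answered or the designated count is reached."""
    question_count = 0                       # initial value 0
    while True:
        if ask():                            # S8-S10: a reply was detected
            return True                      # continue with S11-S13
        question_count += 1                  # S14
        if question_count >= max_questions:  # S15 No
            return False                     # S16: speak the closing message


replies = iter([False, False, True])
print(ask_until_answered(lambda: next(replies), max_questions=3))
```

With a designated count of 3, the question is asked at most three times in this loop before the robot gives up.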

When the presentation motion is performed (Yes in step S1), the robot 1 performs the motion analysis process of FIG. 6 and returns to the process of step S1 (step S17).
When singing is performed (Yes in step S2), the robot 1 performs the singing process of FIG. 8 and returns to the process of step S1 (step S18).

<Motion analysis process>
The motion analysis process will be described with reference to FIGS. 6 and 7.
As shown in FIG. 6, the robot 1 performs the speaker recognition process of FIG. 5 and acquires parameters (step S200).
The motion command unit 23 acquires the presentation motion requested by the care recipient from the presentation motion storage unit 22 and commands the drive mechanism D to perform it using the presentation motion parameters acquired in step S200 (step S201).
The camera C captures an image of the care recipient (step S202).

The motion analysis unit 12 analyzes the care recipient's follow-up motion using the image captured in step S202. For example, the motion analysis unit 12 obtains from the captured image a motion analysis result comprising the care recipient's moving part, motion amount, motion start time, motion duration, and so on (step S203).

The motion analysis unit 12 compares the presentation motion performed by the robot 1 with the follow-up motion performed by the care recipient and determines whether the moving parts match. For example, the motion analysis unit 12 determines whether the movable part that the robot 1 is driving corresponds to the joint that the care recipient is moving (step S204).
When the moving parts do not match (No in step S204), the motion analysis unit 12 determines whether the care recipient is seated (step S205).

When the care recipient is seated (Yes in step S205), the motion analysis unit 12 commands the sitting mode to the conversation selection unit 20 and the motion command unit 23. In the sitting mode, among the motions that are possible while the care recipient is seated, the robot 1 moves the part whose motion does not match in small increments or gives guidance by voice.
The motion command unit 23 commands the presentation motion to the drive mechanism D in the sitting mode (step S206).

When the care recipient is not seated (No in step S205), the motion analysis unit 12 commands the standing mode to the conversation selection unit 20 and the motion command unit 23. In the standing mode, among the motions that are possible while the care recipient is standing, the robot 1 moves the part whose motion does not match in small increments or gives guidance by voice.
The motion command unit 23 commands the presentation motion to the drive mechanism D in the standing mode (step S207).

In rehabilitation, for example, the care recipient basically performs the follow-up motion in time with the presentation motion of the robot 1, but the robot 1 also needs to adapt its motion to the care recipient.
The motion analysis unit 12 therefore compares the motion amounts of the presentation motion performed by the robot 1 and the follow-up motion performed by the care recipient. For example, the motion analysis unit 12 presets a motion amount threshold for each presentation motion and compares this threshold with the motion amount of the follow-up motion (step S208).

When the motion amount of the follow-up motion is smaller than the motion amount threshold ("small" in step S208), the motion analysis unit 12 commands the conversation selection unit 20 and the motion command unit 23 to have the follow-up motion amount increased. In this case, the robot 1 moves the part whose follow-up motion amount is small in small increments, or gives guidance by voice (step S209).

When the motion amount of the follow-up motion is larger than the motion amount threshold ("large" in step S208), the motion analysis unit 12 commands the conversation selection unit 20 and the motion command unit 23 to have the follow-up motion amount decreased. In this case, the robot 1 moves the part whose follow-up motion amount is large in small increments, or gives guidance by voice (step S210).

The motion analysis unit 12 compares the motion speeds of the presentation motion performed by the robot 1 and the follow-up motion performed by the care recipient. For example, the motion analysis unit 12 presets a motion speed threshold for each presentation motion and compares this threshold with the motion speed of the follow-up motion (step S211).

When the motion speed of the follow-up motion is slower than the motion speed threshold ("slow" in step S211), the motion analysis unit 12 commands the conversation selection unit 20 and the motion command unit 23 to have the follow-up motion speed increased. In this case, the robot 1 moves the part whose follow-up motion is slow in small increments, or gives guidance by voice (step S212).

When the motion speed of the follow-up motion is faster than the motion speed threshold ("fast" in step S211), the motion analysis unit 12 commands the conversation selection unit 20 and the motion command unit 23 to have the follow-up motion speed decreased. In this case, the robot 1 moves the part whose follow-up motion is fast in small increments, or gives guidance by voice (step S213).
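
The threshold comparisons of steps S208 to S213 can be sketched as below. The `tolerance` band and the guidance wording are assumptions; the source only distinguishes smaller and larger (or slower and faster) than the preset threshold.

```python
from typing import List


def compare_to_threshold(value: float, threshold: float, tolerance: float = 0.0) -> str:
    """Classify a follow-up measurement as 'low', 'high' or 'ok'."""
    if value < threshold - tolerance:
        return "low"
    if value > threshold + tolerance:
        return "high"
    return "ok"


def guidance(amount: float, speed: float, amount_thr: float, speed_thr: float) -> List[str]:
    """Map steps S208-S213 to guidance messages (wording is illustrative)."""
    messages = []
    amount_result = compare_to_threshold(amount, amount_thr)
    if amount_result == "low":
        messages.append("increase movement")  # S209
    elif amount_result == "high":
        messages.append("decrease movement")  # S210
    speed_result = compare_to_threshold(speed, speed_thr)
    if speed_result == "low":
        messages.append("move faster")        # S212
    elif speed_result == "high":
        messages.append("move slower")        # S213
    return messages


print(guidance(amount=10.0, speed=2.0, amount_thr=30.0, speed_thr=1.0))
```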

The motion analysis unit 12 determines whether this is the first adjustment of the motion start time (step S214).
When it is the first adjustment (Yes in step S214), the motion analysis unit 12 compares the motion start times of the presentation motion performed by the robot 1 and the follow-up motion performed by the care recipient.
The motion analysis unit 12 then adjusts the motion start time of the follow-up motion. That is, it commands the motion command unit 23 so that the follow-up motion starts later when it begins earlier than the presentation motion, and starts earlier when it begins later than the presentation motion. In this case, the robot 1 moves the part whose motion start time needs adjustment in small increments, or gives guidance by voice (step S215).
The motion analysis unit 12 makes no adjustment when the motion start times of the follow-up motion and the presentation motion match.

When this is not the first adjustment of the motion start time (No in step S214), the care recipient is presumed to be unable to start the motion any earlier. The motion analysis unit 12 therefore commands the motion command unit 23 to delay the motion start time of the presentation motion (step S216).
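
The two-stage start-time adjustment (guide the care recipient first, then delay the robot) might look like the following sketch. Times are assumed to be in seconds, and using the measured lag as the delay in step S216 is an assumption, since the source gives no amount.

```python
def adjust_start_time(robot_start: float, user_start: float, first_time: bool) -> dict:
    """Steps S214-S216: guide the user first, then delay the robot instead."""
    lag = user_start - robot_start  # positive when the user starts late
    if first_time:                  # S214 Yes, then S215
        if lag > 0:
            return {"action": "guide_user", "hint": "start earlier"}
        if lag < 0:
            return {"action": "guide_user", "hint": "start later"}
        return {"action": "none"}   # start times already match
    # S214 No, then S216: delay the presentation motion. The delay amount
    # (the measured lag, never negative) is an assumption.
    return {"action": "delay_robot", "delay_s": max(lag, 0.0)}


print(adjust_start_time(robot_start=0.0, user_start=0.5, first_time=False))
```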

The motion analysis unit 12 determines whether this is the first adjustment of the motion duration (step S217).
When it is the first adjustment (Yes in step S217), the motion analysis unit 12 compares the motion durations of the presentation motion performed by the robot 1 and the follow-up motion performed by the care recipient.
The motion analysis unit 12 then adjusts the motion duration of the follow-up motion. That is, it commands the motion command unit 23 so that the follow-up motion lasts longer when its duration is shorter than that of the presentation motion, and lasts for a shorter time when its duration is longer than that of the presentation motion. In this case, the robot 1 moves the part whose motion duration needs adjustment in small increments, or gives guidance by voice (step S218).
The motion analysis unit 12 makes no adjustment when the motion durations of the follow-up motion and the presentation motion match.

When this is not the first adjustment of the motion duration (No in step S217), the care recipient is presumed to be unable to move any faster. The motion analysis unit 12 therefore commands the motion command unit 23 to lengthen the motion duration of the follow-up motion (step S219).

The motion analysis unit 12 determines whether the care recipient's follow-up motion has stabilized. For example, the motion analysis unit 12 determines that the motion has stabilized when the deviation in motion amount or the motion start time falls within a motion stable range. The motion stable range is preset according to the care recipient's age and sex and the type and difficulty of the presentation motion (step S220).
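
One way to read the stability test of step S220 is as a check that recent deviations all stay inside the preset range. The sliding `window` of three samples is an assumption added for robustness; the source only requires the value to fall within the motion stable range.

```python
from typing import Sequence


def is_stable(deviations: Sequence[float], stable_range: float, window: int = 3) -> bool:
    """True when the last `window` deviations all lie within the stable range."""
    if len(deviations) < window:
        return False  # not enough samples to judge yet
    return all(abs(d) <= stable_range for d in deviations[-window:])


print(is_stable([0.5, 0.1, -0.05, 0.08], stable_range=0.1))
```

The same test applies unchanged to the singing stable range of the singing process, with vocal range or singing speed deviations as input.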

When the follow-up motion has stabilized (Yes in step S220), the parameter acquisition unit 17 writes the care recipient's identification information and the adjusted presentation motion parameters to the parameter storage unit 16 in association with each other (step S221).

The motion command unit 23 determines whether the presentation motion has ended (step S222).
When the presentation motion has not ended (No in step S222), the motion command unit 23 returns to the process of step S202.
When the presentation motion has ended (Yes in step S222), the motion command unit 23 ends the motion analysis process.

<Singing process>
The singing process will be described with reference to FIG. 8.
As shown in FIG. 8, the robot 1 performs the speaker recognition process of FIG. 5 and acquires parameters (step S300).

The conversation selection unit 20 selects the lyrics and musical score of the song requested by the care recipient from the conversation storage unit 19.
The speech synthesizer 21 sings and plays accompaniment using the singing parameters (vocal range, singing speed) acquired in step S300 (step S301).

The voice analysis unit 14 analyzes the acoustic characteristics of the care recipient's singing input from the microphone M and extracts the care recipient's vocal range and singing speed (step S302).
The speech synthesizer 21 changes the vocal range and singing speed to match the care recipient's acoustic characteristics extracted in step S302 (step S303).
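
Steps S302 and S303 can be pictured as moving the robot's key and tempo toward what was measured from the care recipient's singing. The sketch below assumes the vocal range is summarized as a key shift in semitones and the singing speed as a tempo in beats per minute; the `step` factor for gradual adaptation is also an assumption, and `step=1.0` matches the measurement immediately.

```python
def adapt_singing(current: dict, measured: dict, step: float = 0.5) -> dict:
    """Move key shift (semitones) and tempo (BPM) part of the way toward the singer."""
    return {
        "key_shift": current["key_shift"] + step * (measured["key_shift"] - current["key_shift"]),
        "tempo_bpm": current["tempo_bpm"] + step * (measured["tempo_bpm"] - current["tempo_bpm"]),
    }


robot = {"key_shift": 0.0, "tempo_bpm": 100.0}
singer = {"key_shift": -4.0, "tempo_bpm": 80.0}
print(adapt_singing(robot, singer))
```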

The voice analysis unit 14 determines whether the care recipient's singing has stabilized. For example, the voice analysis unit 14 determines that the singing has stabilized when the care recipient's vocal range or singing speed falls within a singing stable range. The singing stable range is preset according to the care recipient's age and sex and the type and difficulty of the song (step S304).

When the singing has stabilized (Yes in step S304), the parameter acquisition unit 17 writes the care recipient's identification information and the adjusted singing parameters to the parameter storage unit 16 in association with each other (step S305).

The conversation selection unit 20 determines whether the singing has ended (step S306).
When the singing has not ended (No in step S306), the conversation selection unit 20 returns to the process of step S302.
When the singing has ended (Yes in step S306), the conversation selection unit 20 ends the singing process.

<Voice analysis process>
The voice analysis process will be described with reference to FIG. 9.
As shown in FIG. 9, the microphone M acquires the care recipient's voice (step S400).
The voice analysis unit 14 analyzes the acoustic characteristics of the care recipient's voice acquired in step S400 (step S401).
The speaker specifying unit 15 identifies the care recipient (speaker) based on the acoustic characteristic analysis result of step S401 (step S402).

When the care recipient can be identified (Yes in step S402), the parameter acquisition unit 17 acquires the parameters of the care recipient identified in step S402 from the parameter storage unit 16 and ends the voice analysis process (step S403).

When the care recipient cannot be identified (No in step S402), the parameter acquisition unit 17 acquires parameters for a new care recipient. For example, the parameter acquisition unit 17 may use preset initial parameter values as the new care recipient's parameters. Alternatively, when the new care recipient has sung, the parameter acquisition unit 17 may use the parameters from the point at which the singing stabilized in step S304. The parameter acquisition unit 17 then writes the new care recipient's parameters to the parameter storage unit 16 in association with the care recipient's identification information (for example, the feature values of the face image) and ends the voice analysis process (step S404).
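
Steps S402 to S404 reduce to looking up a known speaker or registering a new one with initial values. In this sketch the store is a plain dictionary and `DEFAULT_PARAMS` holds hypothetical initial values; the patent does not specify either.

```python
from typing import Dict, Optional

# Hypothetical initial parameter values for a new care recipient.
DEFAULT_PARAMS = {"rate_scale": 1.0, "high_boost_db": 0.0}


def parameters_for_voice(speaker_id: Optional[str], new_id: str,
                         store: Dict[str, dict]) -> dict:
    """Steps S402-S404: look up a known speaker or register a new one."""
    if speaker_id is not None and speaker_id in store:  # S402 Yes, then S403
        return store[speaker_id]
    store[new_id] = dict(DEFAULT_PARAMS)                 # S404: copy the defaults, then save
    return store[new_id]


store = {"a": {"rate_scale": 0.8, "high_boost_db": 6.0}}
print(parameters_for_voice(None, "b", store))
```

Copying the defaults keeps later per-person adjustments from leaking into the shared initial values.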

[Operation and effects]
As described above, the robot 1 according to the present embodiment replaces words and sentences that are difficult for the care recipient to hear and converses in a manner suited to the care recipient, so the care recipient can more easily continue the conversation without interruption.
Furthermore, because the robot 1 moves in a manner suited to the care recipient, the care recipient can more easily continue exercising without interruption.
Furthermore, because the robot 1 sings and plays accompaniment in a manner suited to the care recipient, the care recipient can more easily continue singing without interruption.
Furthermore, because the robot 1 learns parameters suited to the care recipient, the care recipient can continue conversation, motion, and singing even more easily.
In this way, the robot 1 can keep the care recipient company in place of the caregiver.

（変形例）
以上、本発明の実施形態を詳述してきたが、本発明は前記した実施形態に限られるものではなく、本発明の要旨を逸脱しない範囲の設計変更等も含まれる。
前記した実施形態では、ロボットとの会話相手となる人物が被介護者であることとして説明したが、被介護者に限定されない。
前記した実施形態では、ロボットがスタンドアローンであることとして説明したが、クラウドにより、より高速なコンピュータを利用することもできる。
(Modifications)
Although an embodiment of the present invention has been described in detail above, the present invention is not limited to that embodiment and also covers design changes and the like that do not depart from the gist of the invention.
In the embodiment described above, the person conversing with the robot was described as a care recipient, but the person is not limited to a care recipient.
In the embodiment described above, the robot was described as stand-alone, but a faster computer can also be used via the cloud.

前記した実施形態では、被介護者が１人であることとして説明したが、複数の被介護者からなるグループにも対応できる。この場合、ロボットは、半数以上の被介護者にふさわしい発話、提示動作、歌唱をおこなってもよい。
前記した実施形態では、ロボットが歌唱及び伴奏を行うこととして説明したが、歌唱又は伴奏の何れか一方のみをおこなってもよい。例えば、ロボットは、ディスプレイを備え、伴奏を行うと共に、その伴奏に合わせて歌詞をディスプレイに表示してもよい。
前記した実施形態では、ロボットが会話、提示動作、歌唱の何れか一つのみを行うこととして説明したが、これらを組み合わせておこなってもよい。例えば、ロボットは、会話及び提示動作の組み合わせ、提示動作及び歌唱の組み合わせを行うことができる。
In the embodiment described above, a single care recipient was assumed, but the robot can also handle a group of multiple care recipients. In this case, the robot may perform utterances, presentation operations, and singing suited to at least half of the care recipients.
In the embodiment described above, the robot was described as performing both singing and accompaniment, but it may perform only one of them. For example, the robot may include a display, play the accompaniment, and show the lyrics on the display in time with the accompaniment.
In the embodiment described above, the robot was described as performing only one of conversation, presentation operation, and singing, but these may be combined. For example, the robot can combine conversation with a presentation operation, or a presentation operation with singing.
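For the group modification, the source says only that the output may suit half or more (半数以上) of the care recipients and does not say how this is decided. The sketch below shows one possible reading for a single numeric parameter. It assumes, beyond anything in the source, that each person has a stored speech rate and can follow any speech at or below that rate; the function name `group_speech_rate` is hypothetical.

```python
from typing import Sequence


def group_speech_rate(rates: Sequence[float]) -> float:
    """Pick the fastest stored rate that at least half of the group can follow.

    Assumption (not from the source): a person can follow any speech
    delivered at or below their own stored rate.
    """
    if not rates:
        raise ValueError("group must not be empty")
    ordered = sorted(rates)
    # Everyone from index len // 2 upward has a rate >= ordered[len // 2],
    # which is at least half of the group; any larger stored value would be
    # followed by fewer than half.
    return ordered[len(ordered) // 2]
```

With stored rates of 1.0, 0.9, 0.7 and 0.6, this selects 0.9, which two of the four people can follow under the stated assumption. Other parameters, such as the question replacement rule, are not numeric and would need a different selection rule, for example a majority vote.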

前記した実施形態では、ロボットを独立したハードウェアとして説明したが、本発明は、これに限定されない。例えば、本発明は、コンピュータが備えるＣＰＵ、メモリ、ハードディスク等のハードウェア資源を、前記したロボットの制御部として協調動作させるプログラムで実現することもできる。これらのプログラムは、通信回線を介して配布してもよく、ＣＤ−ＲＯＭやフラッシュメモリ等の記録媒体に書き込んで配布してもよい。 In the above-described embodiment, the robot has been described as independent hardware, but the present invention is not limited to this. For example, the present invention can also be realized by a program that causes hardware resources such as a CPU, a memory, and a hard disk included in a computer to operate cooperatively as the control unit of the robot. These programs may be distributed via a communication line, or may be distributed by writing in a recording medium such as a CD-ROM or a flash memory.

１ロボット
１０制御部
１１顔画像認識部
１２動作解析部
１３発話認識部（音声認識部）
１４音声解析部
１５話者特定部（人物特定部）
１６パラメータ記憶部
１７パラメータ取得部
１８会話解析部
１９会話記憶部
２０質問選択部（会話選択部）
２１音声合成部（音声出力部）
２２提示動作記憶部
２３動作指令部
Ｃカメラ
Ｍマイク
Ｓスピーカ
Ｄ駆動機構
DESCRIPTION OF SYMBOLS
1 Robot
10 Control unit
11 Face image recognition unit
12 Motion analysis unit
13 Utterance recognition unit (voice recognition unit)
14 Voice analysis unit
15 Speaker specifying unit (person specifying unit)
16 Parameter storage unit
17 Parameter acquisition unit
18 Conversation analysis unit
19 Conversation storage unit
20 Question selection unit (conversation selection unit)
21 Voice synthesis unit (voice output unit)
22 Presentation operation storage unit
23 Operation command unit
C Camera
M Microphone
S Speaker
D Drive mechanism

Claims

A robot having a microphone for inputting a person's speech, a camera for photographing the person, and a speaker for outputting sound to the person, the robot comprising:
a question selection unit that selects a question for the person;
a voice output unit that outputs, by voice via the speaker, the question selected by the question selection unit;
a voice recognition unit that recognizes the person's reply to the question input from the microphone;
a face image recognition unit that recognizes a face image of the person photographed by the camera;
a person specifying unit that specifies the person based on the voice recognition result of the reply or the image recognition result of the face image;
a parameter storage unit that stores, as parameters for each person, high-frequency emphasis information representing the high-frequency components of the voice to be emphasized, a question replacement rule for replacing the content of the question, and a speech speed; and
a parameter acquisition unit that acquires, from the parameter storage unit, the parameters for the person specified by the person specifying unit,
wherein the voice output unit replaces the content of the question according to the parameters acquired by the parameter acquisition unit, and outputs the question at a frequency and a speech speed according to the parameters.

The robot according to claim 1, wherein the person specifying unit, when it cannot specify the person, obtains an estimated age and an estimated sex of the person from the image recognition result of the face image,
the parameter storage unit further stores parameters corresponding to the estimated age and the estimated sex, and
the parameter acquisition unit acquires the parameters corresponding to the estimated age and the estimated sex when the person specifying unit cannot specify the person.

The robot according to claim 2, wherein, when a reply to the question output with the parameters for the estimated age and the estimated sex is input, the parameter acquisition unit writes the identification information of the person and the parameters used when the question was output to the parameter storage unit.

The robot according to any one of the preceding claims, further comprising:
a drive mechanism that drives a movable part of the robot; and
an operation command unit that commands the drive mechanism to perform a preset presentation operation,
wherein the parameter storage unit further stores, as the parameters, an operation part, an operation amount, an operation start time, and an operation duration, and the operation command unit commands the presentation operation with the operation part, the operation amount, the operation start time, and the operation duration according to the parameters acquired by the parameter acquisition unit.

The robot according to any one of claims 1 to 4, wherein the parameter storage unit further stores a vocal range and a singing speed as the parameters, and
the voice output unit sings at a vocal range and a singing speed according to the parameters acquired by the parameter acquisition unit.

A robot control program for causing a computer to function as the robot according to any one of claims 1 to 5.