JP5045519B2

JP5045519B2 - Motion generation device, robot, and motion generation method

Info

Publication number: JP5045519B2
Application number: JP2008079736A
Authority: JP
Inventors: 雅之高島
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2008-03-26
Filing date: 2008-03-26
Publication date: 2012-10-10
Anticipated expiration: 2028-03-26
Also published as: JP2009233764A

Abstract

PROBLEM TO BE SOLVED: To provide a motion generation device that adjusts the processing speed of a gesture according to utterance content.

SOLUTION: A motion generation device 20 includes: an utterance content database 2 that holds utterance content and the motion timing at which a motion pattern is to operate in accordance with the utterance content; a motion pattern database 8 in which motion data corresponding to motion patterns is registered; an utterance content generation unit 3 that selects utterance content from the utterance content database 2; a motion type determination unit 7 that determines the motion pattern needed for the explanation; a motion pattern combination generation unit 9 that extracts motion data from the motion pattern database 8 based on the determined motion pattern and combines the utterance content with the motion data; and a motion processing speed adjustment unit 10 that adjusts the playback processing speed of the motion data based on the motion timing corresponding to the utterance content.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to an apparatus and a method for adjusting the processing speed at which a robot moves.

When a robot answers a person's question, the length of the utterance varies with the content of the answer. Consequently, if the robot merely plays back predetermined motion data, the motion may fail to correspond to the utterance content, or the end of the motion may not coincide with the end of the utterance. In such cases it becomes hard for the person to tell when to ask the next question. In addition, the utterance content and the motion may not match, making the robot's response seem unnatural.

For example, Patent Document 1 discloses a technique that aligns the end timing by combining short motions and inserting periods of no motion. Patent Document 2 discloses a technique that adjusts phoneme durations and motion times and controls the motion of movable parts based on the adjusted motion times, thereby synchronizing the motion of parts that simulate the articulatory organs with the words uttered by speech synthesis and their utterance timing. Patent Document 3 discloses a technique that prepares mouth-movement trajectory parameters and a speech waveform based on a list specifying the durations of speech units, and synchronizes the generated speech with the mouth movement. Patent Document 4 discloses a technique in which, to achieve comfortable interaction between a robot and a person, the robot optimizes the appropriateness of interaction parameters (interpersonal distance, gaze time, motion start time, and motion speed) from the interpersonal distance and the orientation of the person's face. Patent Document 5 discloses a technique that receives notifications of elapsed time from a media playback unit and operates a robot at the corresponding frame.
Patent Document 1: Japanese Patent No. 3930389
Patent Document 2: JP 2001-179667 A
Patent Document 3: JP 2006-21273 A
Patent Document 4: JP 2006-247780 A
Patent Document 5: JP 2002-66156 A

However, none of the above patent documents changes the processing speed of a robot's gesture to match the utterance time, and they cannot adjust the timing at which the robot starts a gesture. Moreover, with the technique of Patent Document 1, the periods of no motion remain unnatural. For a robot that guides visitors around exhibits in a facility, the timing at which it points to an exhibit in step with the utterance content is important. The existing techniques therefore have the problem that the robot cannot point to an exhibit at the right moment during an utterance.

An object of the present invention is to provide a motion generation device and method that adjust the processing speed of a gesture according to utterance content.

One aspect of the motion generation device according to the present invention includes: an utterance content database that holds utterance content and the motion timing at which a motion pattern is to operate in accordance with the utterance content; a motion pattern database in which motion data corresponding to motion patterns is registered; an utterance content generation unit that selects utterance content from the utterance content database; a motion type determination unit that determines, according to the selected utterance content, the motion pattern needed for the explanation; a motion pattern combination generation unit that extracts motion data from the motion pattern database based on the determined motion pattern and combines the utterance content with the motion data; and a motion processing speed adjustment unit that adjusts the playback processing speed of the motion data based on the motion timing corresponding to the utterance content. The motion generation device can change the processing speed of a gesture according to the utterance content. Because the utterance content and the gesture can thereby be synchronized, the robot can give a more natural explanation.

In one aspect of the motion generation device according to the present invention, the motion processing speed adjustment unit adjusts the playback processing speed by changing the length of the processing cycle of the motion data. The motion processing speed adjustment unit preferably adjusts the playback processing speed so that the motion starts or ends at the motion timing. The motion processing speed adjustment unit also preferably adjusts the playback processing speed so that playback of the motion data ends when playback of the utterance content ends. This makes it possible to synchronize the motion with the timing of the explanation and to make the end of the motion coincide with the end of the utterance content.

Further, in one aspect of the motion generation device according to the present invention, the utterance content generation unit selects a plurality of utterance contents; the motion type determination unit determines a plurality of motion patterns according to the plurality of utterance contents; the motion pattern combination generation unit 9 joins the plurality of motion patterns and combines the plurality of utterance contents with the plurality of motion patterns; and the motion processing speed adjustment unit adjusts the playback processing speed so that the combined motion data operates appropriately at the motion timing corresponding to each of the plurality of utterance contents.

One aspect of the robot according to the present invention is a robot that moves in accordance with utterance content, and includes: an actuator; the motion generation device described above; and a motion control unit that operates the actuator based on the combination of the plurality of motion data whose motion speed has been adjusted by the motion generation device.

The motion generation method according to the present invention uses an utterance content database that holds utterance content and the motion timing at which a motion pattern is to operate in accordance with the utterance content, and a motion pattern database in which motion data corresponding to motion patterns is registered. The method selects utterance content from the utterance content database, determines the motion pattern needed for the explanation according to the selected utterance content, extracts motion data from the motion pattern database based on the determined motion pattern, and adjusts the playback processing speed of the plurality of motion data based on the motion timing corresponding to the utterance content.

According to the present invention, a motion generation device and method that adjust the processing speed of a gesture according to utterance content can be provided.

Embodiments of the present invention are described below with reference to the drawings. For clarity, the following description and the drawings are abbreviated and simplified as appropriate. In the drawings, components having the same configuration or function, and corresponding parts, are given the same reference numerals, and their description is not repeated.

(Embodiment 1)
FIG. 1 is a block diagram showing a configuration example of the robot according to Embodiment 1 of the present invention. The robot includes a speech recognition unit 1, an utterance content database 2, an utterance content generation unit (explanation content generation unit) 3, a speech synthesis unit 4, a speech playback unit 5, a speaker 6, a motion type determination unit 7, a motion pattern database 8, a motion pattern combination generation unit 9, a motion processing speed adjustment unit 10, a motion control unit 11, and an actuator 12.

The speech recognition unit 1 recognizes a spoken question.
The utterance content database 2 is a storage area that holds utterance content, the utterance time of each utterance content as a whole, and the motion timing at which a motion pattern is to operate in accordance with the utterance content. The utterance content holds, for example, sample answer sentences corresponding to question contents. FIG. 2 shows a configuration example of the utterance content database 2. The utterance content, the utterance time, and the motion timing are stored in association with an utterance content number (shown as "No." in FIG. 2).
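The record layout described for the utterance content database 2 can be pictured with a minimal sketch. The field names, the use of seconds as the time unit, and the sample entries below are assumptions made for illustration; FIG. 2 itself is not reproduced in this text.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class UtteranceEntry:
    """One row of a hypothetical utterance content database."""
    number: int                          # utterance content number ("No.")
    text: str                            # utterance content (answer sentence)
    utterance_time_s: float              # utterance time of the whole content
    motion_timings_s: Tuple[float, ...]  # motion timings measured from utterance start


# Illustrative entries only; the values are not taken from the patent.
UTTERANCE_DB: Dict[int, UtteranceEntry] = {
    1: UtteranceEntry(1, "Welcome.", 1.5, (0.5,)),
    2: UtteranceEntry(2, "The exhibit on your right is the first model.", 4.0, (1.0,)),
}


def lookup(number: int) -> UtteranceEntry:
    """Return the entry stored under the given utterance content number."""
    return UTTERANCE_DB[number]
```

Keying the table by utterance content number mirrors the way the later units pass only that number between them and look up the utterance time and motion timing when needed.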

The utterance content generation unit 3 determines the utterance content with which the robot answers (hereinafter also called the "answer sentence" or "answer content" as appropriate), based on the question content recognized by the speech recognition unit 1. The utterance content generation unit 3 is not limited to answering questions; it can also determine content to be uttered periodically, such as an announcement made at a predetermined time. When selecting a plurality of utterance contents, the utterance content generation unit 3 also specifies their order. The utterance content generation unit 3 passes the at least one selected utterance content number to the motion type determination unit 7.

The speech synthesis unit 4 synthesizes speech data from the utterance content generated by the utterance content generation unit 3.
The speech playback unit 5 plays back the speech data synthesized by the speech synthesis unit 4.
The speaker 6 outputs the sound from the speech playback unit 5 to the outside.

Based on the utterance content determined by the utterance content generation unit 3, the motion type determination unit 7 determines the motion pattern needed for the explanation (for example, a gesture that matches the utterance content at a key point of the explanation, such as a direction to be indicated). The motion type determination unit 7 passes the utterance content number and the motion pattern to the motion pattern combination generation unit 9.

The motion pattern database 8 is a storage area in which motion data corresponding to various motion patterns is registered. Each piece of motion data is stored in the motion pattern database 8 in advance in association with a motion pattern. FIG. 1 shows four kinds of motions, such as a motion pointing to the right, as examples, but the motions are not limited to these. Although not shown in FIG. 1, the motion pattern database 8 stores, as motion data for each motion pattern, information such as the motion corresponding to the motion pattern and its motion time, in association with a number that identifies the motion pattern. The motion contained in the motion data is set to run at a predetermined processing cycle. The timing at which the motion starts and ends can be adjusted by changing the processing cycle (for example, by lengthening or shortening it).
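The relation between processing cycle and motion duration can be illustrated with a minimal sketch. It assumes, as a simplification not stated in the description, that motion data is a sequence of frames and that one frame is executed per processing cycle; the function names are hypothetical.

```python
def playback_duration_s(n_frames: int, cycle_s: float) -> float:
    """Total playback time when one frame is executed per processing cycle."""
    return n_frames * cycle_s


def cycle_for_duration_s(n_frames: int, target_duration_s: float) -> float:
    """Processing cycle that makes n_frames last exactly target_duration_s."""
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    return target_duration_s / n_frames
```

Under these assumptions, 200 frames played at a 0.01 s cycle last about 2.0 s, and lengthening the cycle to 0.015 s stretches the same motion to about 3.0 s without changing the frames themselves.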

The motion pattern combination generation unit 9 extracts the appropriate motion data from the motion pattern database 8 based on the motion pattern determined by the motion type determination unit 7, and combines the utterance content number with the motion data. When a plurality of utterance contents have been selected, the motion pattern combination generation unit 9 extracts a plurality of motion data, one set for each utterance content, and joins them. It also joins a plurality of motion data when a plurality of motion patterns have been selected for a single utterance content. The motion pattern combination generation unit 9 thus generates speed-unadjusted motion data consisting of one piece of motion data or of several pieces joined together, and passes the utterance content number and the speed-unadjusted motion data to the motion processing speed adjustment unit 10.

The motion processing speed adjustment unit 10 adjusts (determines) the playback processing speed of the motion data generated by the motion pattern combination generation unit 9 (the speed-unadjusted motion data) based on the motion timing corresponding to the utterance content, and generates speed-adjusted motion data. Specifically, the motion processing speed adjustment unit 10 adjusts the playback processing speed by changing the length of the processing cycle of the motion data. By changing the length of the processing cycle of the motion data, the motion processing speed adjustment unit 10 also adjusts the playback processing speed so that playback of the motion data ends when playback of the utterance content ends.

The motion control unit 11 sends processing signals to the actuator based on the speed-adjusted motion data produced by the motion processing speed adjustment unit 10.
The actuator 12 physically moves the robot's arms, neck, torso, and legs.

In the robot shown in FIG. 1, the motion generation device 20, which adjusts the processing speed of the motion according to the utterance content, consists of at least the utterance content database 2, the motion type determination unit 7, the motion pattern database 8, the motion pattern combination generation unit 9, and the motion processing speed adjustment unit 10.

Next, the operation of the robot is described, focusing on the motion generation device 20. FIG. 3 is a flowchart showing an example of the robot's operation up to the point where it starts an explanation. FIG. 3 shows an example for a robot that attends to visitors at an exhibition venue or the like.

First, when the robot is asked a question by a visitor, the speech recognition unit 1 recognizes the spoken question (S11). The utterance content generation unit 3 interprets the question and generates the corresponding utterance content (answer sentence) using the utterance content database 2 (S12). The utterance content generation unit 3 passes at least one utterance content number corresponding to the generated utterance content to the speech synthesis unit 4 and the motion type determination unit 7. Speech synthesis and motion generation then proceed in parallel, both based on the generated utterance content.

Speech synthesis is described first. The speech synthesis unit 4 converts the utterance content into speech data (S13). The speech playback unit 5 waits for the utterance start timing and then plays back the speech (S14).

Motion generation is described next. Based on the utterance content determined by the utterance content generation unit 3, the motion type determination unit 7 classifies the meaning of the sentence and determines the motion pattern that matches the utterance content (S15). Specifically, the motion type determination unit 7 acquires the robot's current position, detects its relationship to the location of the item being explained, and determines the necessary motion pattern. For example, the motion type determination unit 7 determines that the utterance content carries two meanings (which covers both the case of two utterance content numbers and the case of a single utterance content with multiple meanings) and that it involves a motion pointing to the right and a motion pointing to the left. The motion type determination unit 7 passes the determined motion patterns and the utterance content numbers to the motion pattern combination generation unit 9.
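The description does not say how the positional relationship is turned into a motion pattern. One possible reading, given here only as a sketch, compares the bearing of the exhibit with the robot's heading in a 2D plane. The coordinate convention (counterclockwise-positive angles), the function name, and the pattern labels "point_left" and "point_right" are all assumptions.

```python
import math
from typing import Tuple


def choose_pointing_pattern(robot_xy: Tuple[float, float],
                            heading_rad: float,
                            exhibit_xy: Tuple[float, float]) -> str:
    """Pick a hypothetical pointing pattern from the robot pose and exhibit position."""
    dx = exhibit_xy[0] - robot_xy[0]
    dy = exhibit_xy[1] - robot_xy[1]
    # Bearing of the exhibit relative to the direction the robot is facing.
    relative = math.atan2(dy, dx) - heading_rad
    # A positive sine means the exhibit lies to the robot's left.
    return "point_left" if math.sin(relative) > 0 else "point_right"
```

With the robot at the origin facing along the x axis, an exhibit at (1, 1) yields "point_left" and an exhibit at (1, -1) yields "point_right".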

Next, the motion pattern combination generation unit 9 reads from the motion pattern database 8 the motion data corresponding to the motion patterns determined by the motion type determination unit 7, and combines the motion data it has read (S16). In doing so, the motion pattern combination generation unit 9 combines the motion data into one continuous motion, for example by inserting motions that connect the individual motion patterns, and generates speed-unadjusted motion data. The motion pattern combination generation unit 9 passes the utterance content number and the speed-unadjusted motion data to the motion processing speed adjustment unit 10.
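How the connecting motions are produced is not specified. As a minimal sketch, assuming each frame of motion data is a tuple of joint angles and that a connecting motion can be approximated by linear interpolation between the last pose of one pattern and the first pose of the next, the combination step might look as follows. The function name and the `bridge_frames` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple

Pose = Tuple[float, ...]  # one frame: joint angles (assumed representation)


def combine_motions(motions: Sequence[Sequence[Pose]],
                    bridge_frames: int = 2) -> List[Pose]:
    """Join motion data into one continuous sequence with interpolated bridges."""
    combined: List[Pose] = []
    for motion in motions:
        if not motion:
            continue  # skip empty motion data
        if combined and bridge_frames > 0:
            start, end = combined[-1], motion[0]
            # Insert evenly spaced poses between the two motions.
            for i in range(1, bridge_frames + 1):
                alpha = i / (bridge_frames + 1)
                combined.append(tuple(a + (b - a) * alpha
                                      for a, b in zip(start, end)))
        combined.extend(motion)
    return combined
```

The result corresponds to the speed-unadjusted motion data: the frames are in playback order, but no timing has been assigned to them yet.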

The motion processing speed adjustment unit 10 determines the playback processing speed of the speed-unadjusted motion data (S17). Specifically, the motion processing speed adjustment unit 10 searches the utterance content database 2 using the utterance content number and reads out the utterance time and the motion timing associated with that number. It then adjusts the processing cycle of the motion data based on information such as the utterance time, the motion timing, and the motion data. More specifically, the motion processing speed adjustment unit 10 adjusts the processing cycle so that the motion pattern coincides with the motion timing (for example, starts at it), or so that the motion pattern ends in accordance with the utterance time of the utterance content. In this way, the motion processing speed adjustment unit 10 adjusts the processing speed of the one or more pieces of motion data contained in the speed-unadjusted motion data, generates speed-adjusted motion data, and passes it to the motion control unit 11.
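The description gives no formula for the adjusted processing cycle. A minimal sketch of one way to meet both conditions is shown below, under these assumptions: one frame is executed per cycle, a known key frame (for example, the frame at which the hand reaches the exhibit) must be reached at the motion timing, and the last frame must finish at the end of the utterance. The function and parameter names are hypothetical.

```python
from typing import List


def per_frame_cycles(n_frames: int, key_frame: int,
                     motion_timing_s: float,
                     utterance_time_s: float) -> List[float]:
    """Return one processing cycle per frame so that the key frame is reached
    at motion_timing_s and the whole motion ends at utterance_time_s."""
    if not 0 < key_frame < n_frames:
        raise ValueError("key_frame must lie strictly inside the motion")
    if not 0 < motion_timing_s < utterance_time_s:
        raise ValueError("motion timing must fall inside the utterance")
    # Cycle used for the frames before the key frame.
    before = motion_timing_s / key_frame
    # Cycle used for the remaining frames, ending with the utterance.
    after = (utterance_time_s - motion_timing_s) / (n_frames - key_frame)
    return [before] * key_frame + [after] * (n_frames - key_frame)
```

For instance, with 100 frames, a key frame at index 40, a motion timing of 3.0 s, and an utterance time of 5.0 s, the first 40 frames run at a 0.075 s cycle and the remaining 60 frames at roughly 0.033 s, so the motion slows down before the pointing moment and speeds up afterwards.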

The motion control unit 11, in synchronization with the speech playback unit 5, plays back the motion data by operating the actuator 12 based on the speed-adjusted motion data (S19).

FIG. 4 shows an example in which the utterance content and the processing cycle of the motion data are adjusted. In FIG. 4, the utterance content generation unit 3 has generated utterance content A and utterance content B as the answer to a question. The motion type determination unit 7 has paired utterance content A with a motion pointing to the right as its motion pattern, and utterance content B with a motion pointing to the left. For this utterance content and these motion patterns, the motion pattern combination generation unit 9 adjusts the playback processing speed, for example, as follows.

During the period from T1 to T3, the motion pattern combination generation unit 9 lengthens the processing cycle of the motion data to slow down the playback processing speed, so that the motion pointing to the right points at the exhibit at timing T2. During the period from T4 to T6, the motion pattern combination generation unit 9 lengthens the processing cycle of the motion data to slow down the playback processing speed, so that the motion pointing to the left points at the exhibit at timing T5. The motion pattern combination generation unit 9 also lengthens the processing cycle of the motion data during the period from T4 to T6 to slow down the playback processing speed, so that the motion pointing to the left ends at the end timing (T8) of utterance content B.

As described above, according to this embodiment, the processing speed of a gesture can be changed according to the utterance content. A gesture can therefore be started at the desired moment within the utterance content, and the motion can be adjusted to end when the utterance content ends. As a result, the robot can respond to a question naturally, with its motion matching its utterance content.

(Other embodiments)
Embodiment 1 described the case of adjusting motion data according to the utterance content given in response to a question put to the robot. The present invention is not limited to this and can also be applied when the robot determines the utterance content itself and moves accordingly. For example, when a visitor is detected, the motion pattern (such as a bow) corresponding to the utterance content of utterance content number 1 can be adjusted. The present invention can also be applied when the robot starts explaining an exhibit at a predetermined time.

Embodiment 1 also described the case in which the utterance content generation unit 3 generates utterance content in response to a question from a person, but the invention is not limited to this. The present invention can also be applied to a motion started on receipt of an external instruction, for example an instruction, sent by a wireless signal or the like, to start explaining an exhibit. The robot can determine the utterance content based on the instruction, select a motion pattern that points to the exhibit based on its current position and other information, adjust the processing speed of the motion, and start explaining the exhibit.

Further, the utterance content database 2 holds, as the motion timing, the point at which a motion pattern is to operate in accordance with the utterance content. The motion generation device 20 can hold the timing of any point at which the utterance content and the motion pattern are easy to synchronize, for example the timing at which the motion starts, the timing at which it ends, or a timing partway through the motion.

As described above, each of the above embodiments can be applied, for example, to a robot that guides visitors around exhibits in a facility. The explanation content and the motion timing are held in advance, and the processing speed of the gesture is changed so that the robot can point to an exhibit at the appropriate moment and finish its motion at the same time as the utterance ends. The robot can thereby synchronize its utterance content with its gestures. It is therefore possible to provide a robot whose motions and explanations follow the utterance content more naturally.

The present invention is not limited to the embodiments described above. Within the scope of the present invention, a person skilled in the art can change, add to, or convert each element of the above embodiments in ways that are readily conceivable.

FIG. 1 is a block diagram showing a configuration example of the robot according to Embodiment 1 of the present invention.
FIG. 2 is a diagram showing a configuration example of the utterance content database.
FIG. 3 is a flowchart showing an example of the robot's operation up to the point where it starts an explanation.
FIG. 4 is a diagram showing an example in which the utterance content and the processing cycle of the motion data are adjusted.

Explanation of symbols

1 Speech recognition unit
2 Utterance content database
3 Utterance content generation unit
4 Speech synthesis unit
5 Speech playback unit
6 Speaker
7 Motion type determination unit
8 Motion pattern database
9 Motion pattern combination generation unit
10 Motion processing speed adjustment unit
11 Motion control unit
12 Actuator

Claims

A motion generation device comprising:
an utterance content database that holds utterance content and an operation timing at which an operation pattern is performed according to the utterance content;
an operation pattern database in which operation data corresponding to the operation pattern is registered;
an utterance content generation unit that selects utterance content from the utterance content database;
an action type determination unit that determines, according to the selected utterance content, an operation pattern necessary for explanation;
an operation pattern combination generation unit that extracts operation data from the operation pattern database based on the determined operation pattern and combines the utterance content with the operation data; and
an operation processing speed adjustment unit that adjusts the reproduction processing speed of the operation data based on the operation timing according to the utterance content,
wherein the operation processing speed adjustment unit adjusts the reproduction processing speed, by changing the length of the processing cycle of the operation data, so that the operation starts or ends at the operation timing.

A motion generation device comprising:
an utterance content database that holds utterance content and an operation timing at which an operation pattern is performed according to the utterance content;
an operation pattern database in which operation data corresponding to the operation pattern is registered;
an utterance content generation unit that selects utterance content from the utterance content database;
an action type determination unit that determines, according to the selected utterance content, an operation pattern necessary for explanation;
an operation pattern combination generation unit that extracts operation data from the operation pattern database based on the determined operation pattern and combines the utterance content with the operation data; and
an operation processing speed adjustment unit that adjusts the reproduction processing speed of the operation data based on the operation timing according to the utterance content,
wherein the operation processing speed adjustment unit adjusts the reproduction processing speed, by changing the length of the processing cycle of the operation data, so that the end of reproduction of the operation data coincides with the end of reproduction of the utterance content.

The motion generation device according to claim 1 or 2, wherein
the utterance content generation unit selects a plurality of utterance contents,
the action type determination unit determines a plurality of operation patterns according to the plurality of utterance contents,
the operation pattern combination generation unit combines the plurality of operation patterns and combines the plurality of utterance contents with the plurality of operation patterns, and
the operation processing speed adjustment unit adjusts the reproduction processing speed so that the combined operation data operates at the operation timing corresponding to each of the plurality of utterance contents.

A robot that operates according to utterance content, comprising:
an actuator;
the motion generation device according to claim 1 or 2; and
a motion control unit that operates the actuator based on a combination of the operation data adjusted by the motion generation device.

A motion generation method using an utterance content database that holds utterance content and an operation timing at which an operation pattern is performed according to the utterance content, and an operation pattern database in which operation data corresponding to the operation pattern is registered, the method comprising:
selecting utterance content from the utterance content database;
determining, according to the selected utterance content, an operation pattern necessary for explanation;
extracting operation data from the operation pattern database based on the determined operation pattern; and
adjusting the reproduction processing speed of the operation data, based on the operation timing according to the utterance content, by changing the length of the processing cycle of the operation data so that the operation starts or ends at the operation timing.

A motion generation method using an utterance content database that holds utterance content and an operation timing at which an operation pattern is performed according to the utterance content, and an operation pattern database in which operation data corresponding to the operation pattern is registered, the method comprising:
selecting utterance content from the utterance content database;
determining, according to the selected utterance content, an operation pattern necessary for explanation;
extracting operation data from the operation pattern database based on the determined operation pattern; and
adjusting the reproduction processing speed of the operation data, based on the operation timing according to the utterance content, by changing the length of the processing cycle of the operation data so that the end of reproduction of the operation data coincides with the end of reproduction of the utterance content.