JP4178777B2

JP4178777B2 - Robot apparatus, recording medium, and program

Info

Publication number: JP4178777B2
Application number: JP2001266703A
Authority: JP
Inventors: 和夫石井; 渡小野木; 亮半谷
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2001-09-04
Filing date: 2001-09-04
Publication date: 2008-11-12
Anticipated expiration: 2021-09-04
Also published as: JP2003076398A

Abstract

PROBLEM TO BE SOLVED: To give a more suitable response in accordance with speech from a user. SOLUTION: A voice recognition part 24 performs voice recognition when acquiring feature parameters of a voice collected by a microphone 11. The voice recognition part 24 informs a melody generation part 32 or the like of a prescribed keyword when detecting the prescribed keyword. A voice pitch analysis part 28 analyzes the voice of the keyword on the basis of an indication from the melody generation part 32 and extracts a pitch frequency of the voice. The voice pitch analysis part 28 selects a musical scale corresponding to the extracted pitch frequency and informs the melody generation part 32 of the musical scale. The melody generation part 32 reads out melody data stored in a melody data storage part 33 and converts the melody data on the basis of the musical scale informed from the voice pitch analysis part 28. New melody data generated by the melody generation part 32 are reproduced by a voice synthesis part 34 and are outputted from a speaker 14.

Description

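[0001]
[Technical field of the invention]
The present invention relates to a robot apparatus, a recording medium, and a program, and more particularly to a robot apparatus, a recording medium, and a program that enable a more suitable response to an utterance from a user.
[0002]
[Prior art]
In recent years, entertainment pet robots have been realized that autonomously take various actions in accordance with, for example, their surroundings and their own internal state.
[0003]
Such a pet robot, for example, emits a sound in response when the user speaks to it, or emits a sound expressing anger when the user hits it on the head. Also, when the user has not played with it much and its internal state has turned into the emotion "lonely," it emits a sound expressing loneliness (signaling that it wants to play).
[0004]
[Problems to be solved by the invention]
However, because the sound data output by such a pet robot is prepared in the robot in advance, the variety of sounds is limited by factors such as the built-in storage capacity. In other words, the output sounds become stereotyped.
[0005]
Consequently, after observing the robot's behavior for a while, the user can easily predict the next sound it will make, and communication with the pet robot loses its appeal.
[0006]
The present invention has been made in view of this situation, and enables a more suitable response to be made when the user calls to the robot.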
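[0007]
[Means for solving the problems]
A first robot apparatus of the present invention includes: storage means for storing first melody data composed of first notes; extraction means for extracting the pitch frequency of an input voice; selection means for selecting second notes based on the pitch frequency extracted by the extraction means; generation means for generating second melody data by converting the first notes constituting the first melody data stored by the storage means into the second notes selected by the selection means; and reproduction means for reproducing the second melody data generated by the generation means.
[0008]
The extraction means may extract the pitch frequency of a voice representing a predetermined keyword.
[0009]
The extraction means may extract the pitch frequency of a voice containing a vowel.
[0010]
The extraction means may extract, as the pitch frequency, the average of the pitch frequencies detected over a predetermined period of the voice.
[0011]
The generation means may generate the second melody data by shifting the second notes by a predetermined number of octaves.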
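[0013]
A program on a first recording medium of the present invention includes: a storage control step of controlling storage of first melody data composed of first notes; an extraction step of controlling extraction of the pitch frequency of an input voice; a selection step of selecting second notes based on the pitch frequency extracted in the extraction step; a generation step of generating second melody data by converting the first notes constituting the first melody data stored in the storage control step into the second notes selected in the selection step; and a reproduction control step of controlling reproduction of the second melody data generated in the generation step.
[0014]
A first program of the present invention includes: a storage control step of controlling storage of first melody data composed of first notes; an extraction step of controlling extraction of the pitch frequency of an input voice; a selection step of selecting second notes based on the pitch frequency extracted in the extraction step; a generation step of generating second melody data by converting the first notes constituting the first melody data stored in the storage control step into the second notes selected in the selection step; and a reproduction control step of controlling reproduction of the second melody data generated in the generation step.
[0015]
A second robot apparatus of the present invention includes: storage means for storing first melody data composed of first notes; management means for managing the apparatus's own internal state; selection means for selecting, when the internal state managed by the management means changes, second notes corresponding to that change; generation means for generating second melody data by converting the first notes constituting the first melody data stored by the storage means into the second notes selected by the selection means; and reproduction means for reproducing the second melody data generated by the generation means.
[0016]
The generation means may generate the second melody data by shifting the second notes by a predetermined number of octaves.
[0018]
A program on a second recording medium of the present invention includes: a storage control step of controlling storage of first melody data composed of first notes; a management step of managing the apparatus's own internal state; a selection step of selecting, when the internal state managed in the management step changes, second notes corresponding to that change; a generation step of generating second melody data by converting the first notes constituting the first melody data stored in the storage control step into the second notes selected in the selection step; and a reproduction control step of controlling reproduction of the second melody data generated in the generation step.
[0019]
A second program of the present invention includes: a storage control step of controlling storage of first melody data composed of first notes; a management step of managing the apparatus's own internal state; a selection step of selecting, when the internal state managed in the management step changes, second notes corresponding to that change; a generation step of generating second melody data by converting the first notes constituting the first melody data stored in the storage control step into the second notes selected in the selection step; and a reproduction control step of controlling reproduction of the second melody data generated in the generation step.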
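[0020]
In the first robot apparatus and program of the present invention, first melody data composed of first notes is stored, the pitch frequency of an input voice is extracted, and second notes are selected based on the extracted pitch frequency. The first notes constituting the stored first melody data are then converted into the selected second notes to generate second melody data, and the second melody data is reproduced.
[0021]
In the second robot apparatus and program of the present invention, first melody data composed of first notes is stored, the apparatus's own internal state is managed, and when the managed internal state changes, second notes corresponding to that change are selected. The first notes constituting the stored first melody data are then converted into the selected second notes to generate second melody data, and the generated second melody data is reproduced.
[0022]
[Embodiments of the invention]
FIG. 1 is a perspective view showing an example of the external configuration of a pet robot 1 to which the present invention is applied.
[0023]
As shown in the figure, the pet robot 1 is, for example, shaped like a four-legged dog: leg units 3A, 3B, 3C, and 3D are connected at the front, rear, left, and right of a body unit 2, and a head unit 4 and a tail unit 5 are connected to the front end and rear end of the body unit 2, respectively.
[0024]
The tail unit 5 extends from a base 5B provided on the top surface of the body unit 2 so that it can bend or swing freely with two degrees of freedom.
[0025]
As described in detail later, the pet robot 1 with this appearance is provided with melody data for responding when, for example, the user calls out "good morning." The pet robot 1 analyzes the user's voice, converts the prepared melody data based on the analysis result, and plays back the converted melody data. In other words, the melody data is converted and output differently for each user, or for each utterance.
[0026]
Therefore, even a single piece of melody data is converted each time into melody data that matches the user's utterance, so the pet robot 1 outputs a sound corresponding to what the user sang to it. This deepens the mutual understanding between the user and the pet robot 1, keeps the user from tiring of the interaction, and enables better communication.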
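[0027]
FIG. 2 is a block diagram showing an example of the internal configuration of the pet robot 1 of FIG. 1.
[0028]
The body unit 2 houses a controller 10 that controls the entire pet robot 1. The controller 10 basically includes a CPU (Central Processing Unit) 10A and a memory 10B that stores the programs the CPU 10A uses to control each part.
[0029]
In addition to the controller 10, the body unit 2 also houses, for example, a battery (not shown) that serves as the power source of the pet robot 1.
[0030]
As shown in FIG. 2, the head unit 4 is provided, at predetermined positions, with sensors that sense external stimuli: a microphone 11 that senses sound and corresponds to the "ears"; a camera 12 that senses light, corresponds to the "eyes," and is built from a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) image sensor or the like; and a touch sensor 13 that corresponds to the sense of touch and senses pressure and the like from the user's touch. The head unit 4 also has a speaker 14, corresponding to the "mouth" of the pet robot 1, installed at a predetermined position.
[0031]
Actuators are installed at the joints of each of the leg units 3A to 3D, at the connections between each of the leg units 3A to 3D and the body unit 2, at the connection between the head unit 4 and the body unit 2, at the connection between the tail unit 5 and the body unit 2, and so on. The actuators move each part based on instructions from the controller 10.
[0032]
In the example of FIG. 2, the leg unit 3A is provided with actuators 3AA₁ to 3AA_k, and the leg unit 3B with actuators 3BA₁ to 3BA_k. Likewise, the leg unit 3C is provided with actuators 3CA₁ to 3CA_k, and the leg unit 3D with actuators 3DA₁ to 3DA_k. Furthermore, the head unit 4 is provided with actuators 4A₁ to 4A_L, and the tail unit 5 with actuators 5A₁ and 5A₂.
[0033]
The microphone 11 in the head unit 4 collects surrounding sounds, including speech from the user, and outputs the resulting audio signal to the controller 10. The camera 12 images the surroundings and outputs the resulting image signal to the controller 10. The touch sensor 13, provided for example on top of the head unit 4, detects the pressure applied by physical actions from the user such as "stroking" or "hitting," and outputs the detection result to the controller 10 as a pressure detection signal.
[0034]
Based on the audio signal, image signal, and pressure detection signal supplied from the microphone 11, the camera 12, and the touch sensor 13, the controller 10 determines the surrounding situation and whether there are commands or actions from the user, and decides the next action of the pet robot 1 based on that determination. Based on that decision, the controller 10 drives the necessary actuators, making the pet robot 1, for example, shake the head unit 4 up, down, left, and right, move the tail unit 5, or drive each of the leg units 3A to 3D to walk.
[0035]
In addition, the controller 10 performs processing such as turning on, turning off, or blinking an LED (Light Emitting Diode) 41 (see FIG. 3) provided at the position of the "eyes" of the pet robot 1.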
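[0036]
FIG. 3 is a block diagram showing an example of the functional configuration of the controller 10 of FIG. 2. Each function shown in FIG. 3 is realized by the CPU 10A executing a control program stored in the memory 10B.
[0037]
The audio signal from the microphone 11 is supplied to an AD (analog-to-digital) conversion unit 21. The AD conversion unit 21 samples and quantizes the audio signal supplied from the microphone 11 and converts it into voice data. The voice data obtained by the AD conversion unit 21 is supplied to a voice feature analysis unit 22 and a voice pitch analysis unit 28.
[0038]
The voice feature analysis unit 22 performs, for example, MFCC (Mel Frequency Cepstrum Coefficient) analysis on the input voice data for each suitable frame, and outputs the analysis result as feature parameters (feature vectors) to a voice section detection unit 23. The voice feature analysis unit 22 can instead extract other feature parameters, such as linear prediction coefficients, cepstrum coefficients, line spectral pairs, or the power in each predetermined frequency band (filter bank output).
[0039]
The voice section detection unit 23 detects voice sections based on the feature parameters supplied from the voice feature analysis unit 22, for example according to whether power at or above a predetermined threshold has been detected continuously for a predetermined period. The voice section detection unit 23 then outputs the detected voice section information, together with the feature parameters supplied from the voice feature analysis unit 22, to a voice recognition unit 24. The voice section information detected by the voice section detection unit 23 is also supplied to the voice pitch analysis unit 28.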
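The power-threshold rule for voice sections described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes frame power is the mean square of raw samples (the source derives power from its feature parameters), and the threshold and minimum frame count are hypothetical parameters.

```python
def frame_powers(samples, frame_len):
    """Mean-square power of consecutive, non-overlapping frames."""
    return [sum(x * x for x in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def detect_voice_sections(powers, threshold, min_frames):
    """Return (start, end) frame ranges, end exclusive, where power stays
    at or above `threshold` for at least `min_frames` consecutive frames."""
    sections, start = [], None
    for i, p in enumerate(powers):
        if p >= threshold:
            if start is None:
                start = i  # a candidate section begins
        else:
            if start is not None and i - start >= min_frames:
                sections.append((start, i))
            start = None  # too short, or just closed
    if start is not None and len(powers) - start >= min_frames:
        sections.append((start, len(powers)))  # section runs to the end
    return sections
```

An isolated loud frame, such as a click, is discarded because it does not last `min_frames` frames.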
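[0040]
The voice recognition unit 24 stores an acoustic model representing acoustic features, such as the individual phonemes and syllables, of the language of the voice to be recognized; a word dictionary describing pronunciation information (phonological information) for each word to be recognized; and grammar rules describing how the words registered in the word dictionary are chained (connected). As the grammar rules, for example, rules based on a context-free grammar (CFG) or on statistical word-chain probabilities (N-gram) can be used.
[0041]
Using the feature parameters from the voice feature analysis unit 22, and referring to the acoustic model, word dictionary, and grammar rules as needed, the voice recognition unit 24 recognizes the voice input to the microphone 11 based on, for example, the continuous-distribution HMM (Hidden Markov Model) method.
[0042]
Specifically, the voice recognition unit 24 builds acoustic models of words (word models) by referring to the word dictionary and connecting acoustic models. The voice recognition unit 24 also connects several word models by referring to the grammar rules, and recognizes the voice input to the microphone 11 by the continuous-distribution HMM method, based on the word models connected in this way and the feature parameters. That is, the voice recognition unit 24 detects the sequence of word models with the highest score (likelihood) of observing the time series of feature parameters output by the voice feature analysis unit 22 (via the voice section detection unit 23), and outputs the word string corresponding to that sequence as the voice recognition result.
[0043]
In other words, for the word string corresponding to each chain of connected word models, the voice recognition unit 24 accumulates the occurrence probabilities of the feature parameters from the voice feature analysis unit 22, uses the accumulated value as a score, and outputs the word string with the highest score as the voice recognition result.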
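The final "accumulate and take the highest score" step can be sketched as below. This sketch assumes the per-frame observation probabilities for each candidate word string have already been produced by an HMM decoder (that decoding, e.g. a Viterbi search, is not shown), and it accumulates in the log domain, a common choice to avoid numeric underflow that the source does not specify. The candidate strings and probabilities are hypothetical.

```python
import math


def word_string_score(frame_probs):
    """Accumulated score of one candidate: the sum of log probabilities,
    which equals the log of the product of the per-frame probabilities."""
    return sum(math.log(p) for p in frame_probs)


def recognize(candidates):
    """Return the candidate word string whose accumulated score is highest.

    candidates: mapping of word string -> per-frame observation probabilities.
    """
    return max(candidates, key=lambda words: word_string_score(candidates[words]))


# Hypothetical scores for two commands from the text.
print(recognize({"walk": [0.5, 0.4, 0.6], "lie down": [0.2, 0.3, 0.9]}))
```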
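[0044]
For example, the voice recognition unit 24 notifies an instinct/emotion management unit 26 and an action management unit 27 of commands such as "walk," "lie down," and "chase the ball" as voice recognition information. When it recognizes a predetermined keyword, the voice recognition unit 24 also notifies a melody generation unit 32 and other units.
[0045]
The image signal captured by the camera 12 and the pressure detection signal detected by the touch sensor 13 are output to a sensor data processing unit 25. The sensor data processing unit 25 includes an image recognition unit that performs image recognition based on the image signal supplied from the camera 12, and a touch sensor input processing unit that processes the pressure detection signal supplied from the touch sensor 13 (neither is shown).
[0046]
The image recognition unit performs image recognition using the image signal from the camera 12. When, as a result, it detects for example "a red, round object" or "a plane perpendicular to the ground and at least a predetermined height," it outputs image recognition results such as "there is a ball" or "there is a wall" as image recognition information to the instinct/emotion management unit 26 and the action management unit 27.
[0047]
The touch sensor input processing unit processes the pressure detection signal from the touch sensor 13. When it detects a short pressure at or above a predetermined threshold, it recognizes "was hit (scolded)"; when it detects a long pressure below the predetermined threshold, it recognizes "was stroked (praised)." It outputs the recognition result as state recognition information to the instinct/emotion management unit 26 and the action management unit 27.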
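The hit/stroke rule above can be sketched as a small classifier. The pressure threshold and the "short" and "long" durations are hypothetical values; the source gives no numbers.

```python
def classify_touch(peak_pressure, duration_s, threshold=0.5, short_s=0.3, long_s=1.0):
    """Map one pressure event to a recognition result.

    All default thresholds are hypothetical placeholders.
    """
    if peak_pressure >= threshold and duration_s <= short_s:
        return "hit (scolded)"      # strong and brief
    if peak_pressure < threshold and duration_s >= long_s:
        return "stroked (praised)"  # gentle and sustained
    return None                     # neither pattern: no recognition result
```

Events matching neither pattern, such as a strong, long press, produce no result in this sketch.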
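[0048]
The instinct/emotion management unit 26 manages parameters representing the instincts or emotions of the pet robot 1, and outputs the state of the instincts or emotions to the action management unit 27 at predetermined timings.
[0049]
FIG. 4 schematically shows an example of the functional configuration of the instinct/emotion management unit 26 of FIG. 3. As shown in FIG. 4, the instinct/emotion management unit 26 stores and manages an emotion model 51, which expresses the emotions of the pet robot 1, and an instinct model 52, which expresses its instincts.
[0050]
The emotion model 51 represents the state (degree) of emotions such as "joy," "sadness," "anger," "surprise," "fear," and "disgust" with emotion parameters in a predetermined range (for example, 0 to 100), and changes their values based on outputs from the voice recognition unit 24 and the sensor data processing unit 25, the passage of time, and so on.
[0051]
In this example, the emotion model 51 consists of an emotion unit 51A representing "joy," an emotion unit 51B representing "sadness," an emotion unit 51C representing "anger," an emotion unit 51D representing "surprise," an emotion unit 51E representing "fear," and an emotion unit 51F representing "disgust."
[0052]
The instinct model 52 represents the state (degree) of instinctive desires such as "appetite," "desire to sleep," and "desire to exercise" with instinct parameters in a predetermined range, and changes their values based on outputs from the voice recognition unit 24 and the sensor data processing unit 25, the passage of time, and so on. The instinct model 52 also raises the "desire to exercise" parameter based on the action history, or raises the "appetite" parameter based on the remaining battery level.
[0053]
In this example, the instinct model 52 consists of an instinct unit 52A representing "desire to exercise," an instinct unit 52B representing "desire for affection," an instinct unit 52C representing "appetite," an instinct unit 52D representing "curiosity," and an instinct unit 52E representing "desire to sleep."
[0054]
By changing the parameters of the emotion units 51A to 51F and the instinct units 52A to 52E, the instinct/emotion management unit 26 represents the emotional and instinctive state of the pet robot 1 and models how that state changes.
[0055]
The parameters of the emotion units 51A to 51F and the instinct units 52A to 52E change not only in response to external inputs but also, as indicated by the arrows in the figure, through mutual influence among the units.
[0056]
For example, because the emotion unit 51A ("joy") and the emotion unit 51B ("sadness") are coupled in a mutually inhibitory way, when the user praises the robot, the instinct/emotion management unit 26 changes the expressed emotion by, for example, increasing the parameter of the emotion unit 51A and decreasing the parameter of the emotion unit 51B.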
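The 0 to 100 emotion parameters and the mutually inhibitory joy/sadness coupling can be sketched as below. The coupling table, the initial value of 50, and the inhibition gain of 0.5 are hypothetical choices made only for illustration.

```python
EMOTIONS = ("joy", "sadness", "anger", "surprise", "fear", "disgust")

# Hypothetical couplings: raising the key lowers each listed emotion.
INHIBITS = {"joy": ["sadness"], "sadness": ["joy"]}


def clamp(value, low=0.0, high=100.0):
    """Keep a parameter inside the 0 to 100 range."""
    return max(low, min(high, value))


class EmotionModel:
    def __init__(self, initial=50.0, inhibition=0.5):
        self.params = {e: initial for e in EMOTIONS}
        self.inhibition = inhibition  # hypothetical coupling gain

    def stimulate(self, emotion, amount):
        """Raise one emotion and suppress the emotions it inhibits."""
        self.params[emotion] = clamp(self.params[emotion] + amount)
        for other in INHIBITS.get(emotion, []):
            self.params[other] = clamp(self.params[other] - self.inhibition * amount)


model = EmotionModel()
model.stimulate("joy", 20)  # e.g. the user praised the robot
```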
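[0057]
Parameters change not only among the units within the emotion model 51 and among the units within the instinct model 52, but also across the two models.
[0058]
For example, as shown in the figure, the parameters of the emotion unit 51B ("sadness") and the emotion unit 51C ("anger") of the emotion model 51 change in response to changes in the parameters of the instinct unit 52B ("desire for affection") and the instinct unit 52C ("appetite") of the instinct model 52.
[0059]
Specifically, when the parameter of the instinct unit 52C ("appetite") increases, the parameters of the emotion unit 51B ("sadness") and the emotion unit 51C ("anger") of the emotion model 51 increase.
[0060]
More precisely, the instinct/emotion management unit 26 also has a growth model in addition to the emotion model 51 and the instinct model 52, and the parameters of each unit of the emotion model 51 and the instinct model 52 change with the growth stage. The growth model represents growth states (degrees) such as "infancy," "youth," "maturity," and "old age" with values in a predetermined range, and changes those values based on outputs from the voice recognition unit 24 and the sensor data processing unit 25, the passage of time, and so on.
[0061]
The instinct/emotion management unit 26 outputs the states of emotions, instincts, and so on, represented by the parameters of the emotion model, the instinct model, and the like, to the action management unit 27 as internal information.
[0062]
In addition to the recognition information supplied from the voice recognition unit 24 and the sensor data processing unit 25, the instinct/emotion management unit 26 receives from the action management unit 27 action information indicating the current or past actions of the pet robot 1, specifically the content of an action such as "walked for a long time." Even when given the same recognition information, the instinct/emotion management unit 26 generates different internal information depending on the action of the pet robot 1 indicated by the action information.
[0063]
For example, if the pet robot 1 greets the user and the user then strokes its head, action information indicating that it greeted the user and recognition information indicating that its head was stroked are supplied to the instinct/emotion management unit 26. In this case, the instinct/emotion management unit 26 increases the value of the emotion unit 51A ("joy").
[0064]
Returning to FIG. 3, the action management unit 27 decides the next action based on the information supplied from the voice recognition unit 24 and the sensor data processing unit 25, the internal information supplied from the instinct/emotion management unit 26, the passage of time, and so on, and notifies a command generation unit 29 of information representing the decided action.
[0065]
FIG. 5 is a schematic diagram showing an example of the functional configuration of the action management unit 27 of FIG. 3.
[0066]
The action management unit 27 consists of an action model library 61 and an action selection unit 62. As shown in the figure, the action model library 61 holds various action models, each associated with a preset condition (trigger).
[0067]
In the example of the figure, the action model library 61 contains a ball response action model 61A for when a ball is detected, an autonomous search action model 61B for when the ball has been lost, and an emotion expression action model 61C for when a change in the emotion model 51 described above is detected. It also contains an obstacle avoidance action model 61D for when an obstacle is detected, a fall recovery action model 61E for when a fall is detected, and a battery management action model 61F for when the remaining battery level is low.
[0068]
The action selection unit 62 refers to the information supplied from the voice recognition unit 24, the sensor data processing unit 25, and so on, the internal information supplied from the instinct/emotion management unit 26, the passage of time, and so on, and selects the next action of the pet robot 1 from the action models prepared in the action model library 61.
[0069]
The action selection unit 62 makes this selection using a finite probabilistic automaton such as that shown in FIG. 6, which decides to which state node the robot moves from the node representing the current state of the pet robot 1, based on the transition probabilities set on the arcs connecting the nodes.
[0070]
The finite probabilistic automaton shown in FIG. 6 indicates, for example, that when the current state is node 0 (NODE₀), the robot transitions to node 1 (NODE₁) with probability P₁, to node 2 (NODE₂) with probability P₂, and to node n (NODE_n) with probability P_n. It also indicates that the robot transitions to node 0 (NODE₀) with probability P_n-1, that is, does not transition to any other node.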
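[0071]
Each action model defined in the action model library 61 consists of nodes representing multiple states, and each node has a state transition table that describes the probabilities of transitioning to other nodes.
[0072]
FIG. 7 shows an example of the state transition table of node 100 (NODE₁₀₀), which belongs to a given action model defined in the action model library 61.
[0073]
In FIG. 7, the input events that serve as conditions for transitioning to another node (or to the node itself) are listed in the "input event name" column, and further conditions on each input event are listed in the "data name" and "data range" columns.
[0074]
In the example of the figure, the entry with ID "1" indicates that when information that "a ball was detected" (input event name "BALL") is reported to the action management unit 27 and the size of the detected ball is in the range "0 to 1000," the robot transitions from node 100 to node 120 with a probability of 30 percent.
[0075]
The entry with ID "2" indicates that when information that "the head was patted lightly" (input event name "PAT") is reported to the action management unit 27, the robot transitions from node 100 to node 150 with a probability of 40 percent.
[0076]
The entry with ID "3" indicates that when information that "the head was hit hard" (input event name "HIT") is reported to the action management unit 27, the robot transitions from node 100 to node 150 with a probability of 20 percent.
[0077]
The entry with ID "4" indicates that when information that "an obstacle was detected" (input event name "OBSTACLE") is reported to the action management unit 27 together with a distance to the obstacle in the range "0 to 100," the robot transitions from node 100 to node 1000 with a probability of 100 percent.
[0078]
In this example, the robot also transitions to other nodes according to the parameters of the emotion model 51 and the instinct model 52, even when there is no input from the voice recognition unit 24, the sensor data processing unit 25, and so on.
[0079]
For example, the entry with ID 5 indicates that when the parameter of the emotion unit 51A representing "joy" (JOY) in the emotion model 51 is in the range "50 to 100," the robot transitions to node 120 with a probability of 5 percent and to node 150 with a probability of 30 percent.
[0080]
The entry with ID 6 indicates that when the parameter of the emotion unit 51D representing "surprise" (SURPRISE) is in the range "50 to 100," the robot transitions to node 1000 with a probability of 15 percent. The entry with ID 7 indicates that when the parameter of the emotion unit 51B representing "sadness" (SADNESS) is in the range "50 to 100," the robot transitions to node 120 with a probability of 50 percent.
[0081]
The "output action" column of the state transition table in FIG. 7 describes the action output on transition to each node: for example, the action output on transition to node 120 is "ACTION1," on transition to node 150 it is "ACTION2," and on transition to node 1000 it is "MOVE BACK."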
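The state transition table of node 100 can be sketched as data plus a weighted random draw. The row layout is a hypothetical encoding of FIG. 7: emotion entries (IDs 5 to 7) are treated as events whose value is the emotion parameter, and the probability left over in each row keeps the robot at the current node, following the self-transition of the automaton in FIG. 6.

```python
import random

CURRENT_NODE = 100

# Hypothetical encoding of the FIG. 7 table for NODE_100:
# (input event, inclusive data range or None, {destination node: probability})
TRANSITION_TABLE = [
    ("BALL", (0, 1000), {120: 0.30}),            # ID 1: ball size
    ("PAT", None, {150: 0.40}),                  # ID 2
    ("HIT", None, {150: 0.20}),                  # ID 3
    ("OBSTACLE", (0, 100), {1000: 1.00}),        # ID 4: obstacle distance
    ("JOY", (50, 100), {120: 0.05, 150: 0.30}),  # ID 5: emotion parameter
    ("SURPRISE", (50, 100), {1000: 0.15}),       # ID 6
    ("SADNESS", (50, 100), {120: 0.50}),         # ID 7
]

OUTPUT_ACTION = {120: "ACTION1", 150: "ACTION2", 1000: "MOVE BACK"}


def next_node(event, value=None, rand=random):
    """Draw the next node for `event`; leftover probability stays at the current node."""
    for name, data_range, destinations in TRANSITION_TABLE:
        if name != event:
            continue
        if data_range is not None:
            low, high = data_range
            if value is None or not low <= value <= high:
                continue  # event seen, but its data is outside this row's range
        r = rand.random()  # uniform in [0, 1)
        cumulative = 0.0
        for node, probability in destinations.items():
            cumulative += probability
            if r < cumulative:
                return node
        return CURRENT_NODE
    return CURRENT_NODE
```

For example, `OUTPUT_ACTION[next_node("OBSTACLE", 30)]` always yields `"MOVE BACK"`, because that row's probability is 100 percent.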
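[0082]
The action selection unit 62 moves between states by referring to such state transition tables, and outputs information instructing the action set for the resulting node to the instinct/emotion management unit 26, the command generation unit 29, and so on.
[0083]
For example, if the action defined for the node reached, as managed by the action management unit 27, is "say hello," the command generation unit 29 generates an utterance command based on the instruction from the action management unit 27 and outputs it to a voice synthesis unit 34.
[0084]
If the action defined for the node reached is "move back," the command generation unit 29 generates a command to execute it, based on the instruction from the action management unit 27, and outputs it to a control unit 31. When the LED 41 at the position of the "eyes" of the pet robot 1 is to be turned on, turned off, or blinked, the command generation unit 29 generates a command instructing this and outputs it to an LED control unit 30.
[0085]
The LED control unit 30 controls the LED 41 based on the commands supplied from the command generation unit 29.
[0086]
The control unit 31 controls the actuators 3AA₁ to 5A₂ based on the commands supplied from the command generation unit 29, moving the pet robot 1 from its current posture to the next posture.
[0087]
The postures that can be reached next from the current posture are determined by the physical form of the pet robot 1, such as the shape and weight of the torso, hands, and feet and how the parts are connected, and by the mechanisms of the actuators 3AA₁ to 5A₂, such as the directions and angles through which the joints bend.
[0088]
Some postures can be reached directly from the current posture and others cannot. For example, the four-legged pet robot 1 can move directly from lying sprawled with its limbs stretched out to lying prone, but cannot move directly to standing. It needs a two-step motion: first pulling its limbs in close to its body to lie prone, then standing up. There are also postures that cannot be executed safely. For example, if the pet robot 1 tries to raise both front legs in a "banzai" while standing on four legs, it will easily fall over.
[0089]
For this reason, the control unit 31 registers in advance the postures that can be reached directly, and when a command supplied from the command generation unit 29 indicates such a posture, it drives the corresponding actuators accordingly.
[0090]
On the other hand, when a command supplied from the command generation unit 29 indicates a posture that cannot be reached directly, the control unit 31 drives the corresponding actuators so that the robot first moves to another reachable posture and then to the target posture. This prevents the pet robot 1 from forcing an unreachable posture or falling over.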
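Routing through intermediate postures can be sketched as a shortest-path search over a registry of directly reachable postures. The registry below is hypothetical, and breadth-first search is a technique swapped in for illustration; the source says only that the robot first passes through another reachable posture. Unsafe postures such as "banzai" are simply left out of the registry, so no route to them exists.

```python
from collections import deque

# Hypothetical registry: posture -> postures reachable directly from it.
DIRECT = {
    "sprawled": ["prone"],
    "prone": ["sprawled", "sitting", "standing"],
    "sitting": ["prone", "standing"],
    "standing": ["prone", "sitting"],
}


def posture_path(current, target):
    """Shortest chain of direct transitions from current to target, or None."""
    if current == target:
        return [current]
    previous = {current: None}
    queue = deque([current])
    while queue:
        posture = queue.popleft()
        for nxt in DIRECT.get(posture, []):
            if nxt in previous:
                continue  # already reached by a path at least as short
            previous[nxt] = posture
            if nxt == target:
                path = [nxt]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None
```

`posture_path("sprawled", "standing")` returns the two-step route `["sprawled", "prone", "standing"]` from the example in the text.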
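[0091]
When notified by the voice recognition unit 24 that a keyword has been recognized, the melody generation unit 32 instructs the voice pitch analysis unit 28 to analyze the pitch frequency of that keyword. Then, once the keyword voice has been analyzed and the voice pitch analysis unit 28 reports the notes selected based on the pitch frequency, the melody generation unit 32 reads predetermined melody data from a melody data storage unit 33 and converts that melody data based on the reported notes.
[0092]
That is, the melody data storage unit 33 holds multiple pieces of melody data for the sounds (melodies) output by the pet robot 1, for example in MIDI format.
[0093]
The melody generation unit 32 generates new melody data by, for example, replacing each note of the melody data with the notes supplied from the voice pitch analysis unit 28. The melody data generated by the melody generation unit 32 is output to the voice synthesis unit 34.
[0094]
When instructed by the melody generation unit 32, the voice pitch analysis unit 28 analyzes the input voice of the keyword supplied from the AD conversion unit 21 and calculates the pitch frequency at each time. For example, taking as its variable the time shift (lag) that corresponds to the pitch, the voice pitch analysis unit 28 calculates the pitch frequency of the input voice at each time from the time shift at which the autocorrelation coefficient reaches its maximum. For example, if the input voice is sampled at 16 kHz and the autocorrelation coefficient peaks at a time shift of 118 samples, the voice pitch analysis unit 28 calculates the pitch frequency as f_PITCH = 16000 / 118 ≈ 135.59 Hz.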
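The autocorrelation method described above can be sketched as follows. Two details are assumptions not stated at this point in the source: the lag search is limited to the 80 Hz to 400 Hz pitch range the robot uses, and each autocorrelation value is divided by its number of products so that longer lags are not penalized. A real analyzer would run this on short windows to obtain a pitch value "at each time."

```python
import math


def estimate_pitch(frame, fs, f_min=80.0, f_max=400.0):
    """Pitch (Hz) of one frame: fs divided by the lag that maximizes the autocorrelation."""
    lag_min = int(fs / f_max)                       # 400 Hz -> 40 samples at 16 kHz
    lag_max = min(int(fs / f_min), len(frame) - 1)  # 80 Hz -> 200 samples at 16 kHz
    best_lag, best_r = None, -math.inf
    for lag in range(lag_min, lag_max + 1):
        n = len(frame) - lag
        # Mean product at this lag (normalized autocorrelation)
        r = sum(frame[i] * frame[i + lag] for i in range(n)) / n
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag


# A tone whose period is 118 samples at 16 kHz, as in the example above.
fs = 16000
tone = [math.sin(2 * math.pi * n / 118) for n in range(2048)]
pitch = estimate_pitch(tone, fs)  # close to 16000 / 118 ≈ 135.59 Hz
```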
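[0095]
The voice pitch analysis unit 28 then extracts, from the calculated pitch frequencies, the specific pitch frequencies used to select notes, and selects the notes corresponding to the extracted pitch frequencies. For example, the voice pitch analysis unit 28 extracts several pitch frequencies, each the average over a predetermined period, and selects several notes based on them. The notes selected by the voice pitch analysis unit 28 are reported to the melody generation unit 32 and used to convert the melody data.
[0096]
When the voice synthesis unit 34 receives from the command generation unit 29 a predetermined utterance command together with text data indicating the content of the utterance, it refers to its internal voice synthesis dictionary and generates voice data corresponding to that text data, for example in WAV format.
[0097]
More specifically, this voice synthesis dictionary contains a word dictionary describing the part of speech, reading, accent, and other information for each word; generation grammar rules, such as constraints on how the words in that word dictionary may be chained; and a voice information dictionary that stores phoneme segment data as voice information.
[0098]
Based on the word dictionary and the generation grammar rules, the voice synthesis unit 34 performs text analysis (language analysis) of the input text, such as morphological analysis and syntactic analysis, and extracts the information needed for voice synthesis. This information includes, for example, prosodic information for controlling pause positions, accent, intonation, power, and so on, and phonological information representing the pronunciation of each word.
[0099]
Based on the extracted information, the voice synthesis unit 34 concatenates the necessary phoneme segment data while referring to the voice information dictionary, and then processes the waveforms of the phoneme segment data to add appropriate pauses, accents, intonation, and so on, thereby generating voice data corresponding to the supplied text data.
[0100]
The voice data generated by the voice synthesis unit 34 is converted from digital to analog in a DA conversion unit 35 and output from the speaker 14.
[0101]
When melody data is supplied from the melody generation unit 32, the voice synthesis unit 34 plays it back and outputs it from the speaker 14.
[0102]
Next, the operation of the pet robot 1 configured as above will be described.
[0103]
First, with reference to the flowchart of FIG. 8, the processing by which the pet robot 1 generates melody data using notes selected from the input voice and then plays it back will be described.
[0104]
In step S1, the voice recognition unit 24 determines whether a keyword has been detected in the voice recognition result, and waits until it determines that one has. A voice uttered by the user is collected by the microphone 11 as described above, processed, and supplied to the voice recognition unit 24.
[0105]
The voice recognition unit 24 stores multiple keywords and uses them to determine whether the input voice is a keyword. A keyword may, for example, consist of vowels (for example the sound "a"), whose frequency varies relatively little and whose pitch frequency is therefore easy to extract.
[0106]
When the voice recognition unit 24 determines in step S1 that a keyword has been detected, the process proceeds to step S2, where it notifies the action management unit 27 and the melody generation unit 32.
[0107]
In step S3, the melody generation unit 32 instructs the voice pitch analysis unit 28 to analyze the detected keyword.
[0108]
In step S4, in response to the instruction from the melody generation unit 32, the voice pitch analysis unit 28 analyzes the waveform of the keyword supplied from the AD conversion unit 21 and extracts its pitch frequencies.
[0109]
FIG. 9 illustrates how the voice pitch analysis unit 28 extracts pitch frequencies. In FIG. 9, the keyword "ra-ra-ra" (ららら) has been detected.
[0110]
The upper graph of FIG. 9 shows the waveform of the keyword "ra-ra-ra," with amplitude on the vertical axis and time on the horizontal axis. The lower graph of FIG. 9 shows the pitch frequency of the keyword "ra-ra-ra" on the vertical axis against time on the horizontal axis.
[0111]
When the waveform of the keyword "ra-ra-ra" shown in the upper part of FIG. 9 is supplied, the voice pitch analysis unit 28 calculates the pitch frequency at each time from the time shift and the sampling frequency, as described above.
[0112]
The voice pitch analysis unit 28 then, for example, divides the voice section of each "ra" sound (t₁, t₂, t₃) into three equal parts and extracts the average pitch frequency of the middle part of each. The information about these voice sections is detected and supplied by the voice section detection unit 23.
[0113]
In the example of the figure, the voice section t₁ is divided into the sections t₁₁, t₁₂, and t₁₃; likewise, t₂ is divided into t₂₁, t₂₂, and t₂₃, and t₃ into t₃₁, t₃₂, and t₃₃.
[0114]
The average pitch frequency is 131 Hz in section t₁₂, 148 Hz in section t₂₂, and 163 Hz in section t₃₂.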
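[0115]
Besides averaging the pitch frequency over the middle part of each sound of the keyword, pitch frequencies can be extracted in various other ways. For example, the average pitch frequency of each whole sound of the keyword may be extracted, or the maximum pitch frequency within a predetermined section may be extracted. Alternatively, the pitch frequency at the temporal midpoint of each sound may be extracted.
[0116]
In step S5, the voice pitch analysis unit 28 selects predetermined notes based on the pitch frequencies extracted in this way.
[0117]
FIG. 10 shows an example of the correspondence between pitch frequencies and notes managed by the voice pitch analysis unit 28. The correspondence in FIG. 10 takes "A4" to be 440 Hz and lists the notes from "C1" to "B7" together with their frequencies.
[0118]
Therefore, when the voice pitch analysis unit 28 has extracted the pitch frequencies 131 Hz, 148 Hz, and 163 Hz from the keyword "ra-ra-ra," as described with reference to FIG. 9, it selects the corresponding notes "C3, D3, E3." That is, the voice pitch analysis unit 28 selects, for each extracted pitch frequency, the note whose frequency is closest.
[0119]
Of the frequencies shown in FIG. 10, the pet robot 1 extracts pitch frequencies in the range of, for example, 80 Hz to 400 Hz, because they are easy to recognize.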
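[0120]
The notes selected by the voice pitch analysis unit 28 in this way are reported to the melody generation unit 32.
[0121]
Meanwhile, in step S6, when notified by the voice recognition unit 24 that a keyword has been detected, the action management unit 27 instructs the melody generation unit 32 to generate melody data. Together with this instruction, the melody generation unit 32 also receives identification information for the melody data to be converted.
[0122]
In step S7, the melody generation unit 32 reads from the melody data storage unit 33 the melody data corresponding to the identification information received from the action management unit 27, and generates new melody data based on the notes supplied from the voice pitch analysis unit 28.
[0123]
FIG. 11 shows an example of melody data stored in the melody data storage unit 33, written out as a musical score. Notes 1 to 9 of this melody data 1-1 are "C3, C3, E3, G3, C3, C3, E3, G3, C3(+12)." Note 9 denotes the note 12 keys above note 1 ("C3"), counted, for example, on a piano keyboard. In this way, when the melody data contains more distinct notes than the number of notes extracted from the keyword, some notes of the melody data are expressed relative to other notes.
[0124]
To generate the new melody data, the melody generation unit 32 first replaces each note with a predetermined symbol, as shown in FIG. 12. This step is used only for convenience of explanation; in practice, melody data 1-1 may be converted directly into melody data 1-3 (FIG. 13).
[0125]
For example, the melody generation unit 32 replaces "C3" in melody data 1-1 with "X," "E3" with "Y," and "G3" with "Z." As a result, melody data 1-1 becomes "X, X, Y, Z, X, X, Y, Z, X(+12)," as shown for melody data 1-2.
[0126]
The melody generation unit 32 then assigns the notes selected by the voice pitch analysis unit 28 to the symbols of melody data 1-2. FIG. 13 illustrates how new melody data is generated by assigning the notes selected by the voice pitch analysis unit 28.
[0127]
For example, when the voice pitch analysis unit 28 has selected the notes "C3, D3, E3" from the keyword "ra-ra-ra" as described above, the melody generation unit 32 assigns "C3, D3, E3" to "X, Y, Z" of melody data 1-2, respectively. As shown in the figure, the melody generation unit 32 then generates melody data 1-3, consisting of the note sequence "C3, C3, D3, E3, C3, C3, D3, E3, C3(+12)." At this point, the prepared melody data 1-1 has been converted into melody data 1-3 based on the pitch frequencies extracted from the keyword "ra-ra-ra" (notes 3 and 4, and notes 7 and 8, have each been changed to other notes).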
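The symbol substitution above (melody 1-1 to template 1-2 to melody 1-3) can be sketched as follows. The encoding of a melody as (base note, semitone offset) pairs, so that "C3(+12)" is `("C3", 12)`, is a hypothetical representation chosen for this sketch; the stored data is described only as, for example, MIDI.

```python
def fmt(melody):
    """Render [(note, offset), ...] in the document's notation, e.g. C3(+12)."""
    return ", ".join(note if offset == 0 else f"{note}({offset:+d})"
                     for note, offset in melody)


def to_template(melody):
    """Replace each distinct base note with a slot index (X=0, Y=1, Z=2, ...)
    in order of first appearance; relative offsets are kept."""
    slots, template = [], []
    for base, offset in melody:
        if base not in slots:
            slots.append(base)
        template.append((slots.index(base), offset))
    return template, slots


def convert_melody(melody, selected):
    """Fill the template slots with the notes selected from the keyword."""
    template, slots = to_template(melody)
    if len(slots) > len(selected):
        raise ValueError("melody has more distinct base notes than selected notes")
    return [(selected[slot], offset) for slot, offset in template]


MELODY_1_1 = [("C3", 0), ("C3", 0), ("E3", 0), ("G3", 0),
              ("C3", 0), ("C3", 0), ("E3", 0), ("G3", 0), ("C3", 12)]
print(fmt(convert_melody(MELODY_1_1, ["C3", "D3", "E3"])))
# C3, C3, D3, E3, C3, C3, D3, E3, C3(+12)
```

The same function reproduces the FIG. 15 example: melody 2-1, with its `("C3", -5)` note, becomes "C3, D3, E3, D3, E3, C3(-5), D3, E3, D3, E3".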
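[0128]
Furthermore, if the melody data generated as above is relatively low in pitch, the melody generation unit 32 shifts every note of that melody data up by a predetermined number of octaves to generate new melody data. Conversely, if the generated melody data is relatively high, it may of course be shifted down by a few octaves.
[0129]
FIG. 14 illustrates how the melody generation unit 32 generates new melody data by shifting each note of the melody data by, for example, three octaves.
[0130]
As shown in the figure, every note of melody data 1-3 has been shifted up by three octaves, producing melody data 1-4, consisting of the note sequence "C6, C6, D6, E6, C6, C6, D6, E6, C6(+12)." That is, melody data 1-4 has been newly generated from the prepared melody data 1-1.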
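The octave shift of FIG. 14 can be sketched as below, using a hypothetical encoding of melodies as (note name, semitone offset) pairs. Only the octave number of each base note changes, so relative offsets such as +12 carry over unchanged.

```python
import re


def shift_octaves(melody, octaves):
    """Move every (note, offset) pair by whole octaves; offsets are kept."""
    shifted = []
    for note, offset in melody:
        match = re.fullmatch(r"([A-G][#b]?)(-?\d+)", note)
        if match is None:
            raise ValueError(f"unrecognized note name: {note}")
        name, octave = match.group(1), int(match.group(2))
        shifted.append((f"{name}{octave + octaves}", offset))
    return shifted


MELODY_1_3 = [("C3", 0), ("C3", 0), ("D3", 0), ("E3", 0),
              ("C3", 0), ("C3", 0), ("D3", 0), ("E3", 0), ("C3", 12)]
MELODY_1_4 = shift_octaves(MELODY_1_3, 3)  # C6, C6, D6, E6, ..., C6(+12)
```

Because the shift only rewrites octave numbers, shifting the selected notes C3, D3, E3 up first and converting the melody afterwards gives the same result.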
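[0131]
The melody data generated by the melody generation unit 32 is output to the voice synthesis unit 34.
[0132]
In step S8, the voice synthesis unit 34 plays back the melody data supplied from the melody generation unit 32 and outputs it from the speaker 14. The duration of each output note may follow the melody data stored in advance in the melody data storage unit 33, or it may be changed based on the pitch frequencies extracted as described above, the times at which they were detected, the extent of the detection sections, and so on.
[0133]
FIG. 15 illustrates other melody data stored in the melody data storage unit 33 and its conversion.
[0134]
In FIG. 15, the note sequence of melody data 2-1 is "C3, E3, G3, E3, G3, C3(-5), E3, G3, E3, G3." When the notes "C3, D3, E3" have been selected from the keyword "ra-ra-ra" as described above, "C3, E3, G3" of melody data 2-1 are converted to "C3, D3, E3," respectively (the process of arrow ①), yielding melody data 2-2 with the note sequence "C3, D3, E3, D3, E3, C3(-5), D3, E3, D3, E3." At this point, notes 2 to 5 and notes 7 to 10 of melody data 2-1 have each been changed to other notes.
[0135]
Each note of melody data 2-2 is then shifted up by three octaves by the melody generation unit 32 (the process of arrow ②), generating melody data 2-3 with the note sequence "C6, D6, E6, D6, E6, C6(-5), D6, E6, D6, E6." This melody data 2-3 is played back and the corresponding sound is output.
[0136]
FIG. 16 illustrates yet another piece of melody data stored in the melody data storage unit 33 and its conversion.
[0137]
In FIG. 16, the note sequence of melody data 3-1 is "C3, E3, G3, C3(+12), C3, E3, G3, C3(+12), C3(-5), C3(-1)." When the notes "C3, D3, E3" have been selected from the keyword "ra-ra-ra" as described above, "C3, E3, G3" of melody data 3-1 are converted to "C3, D3, E3," respectively (the process of the first arrow in FIG. 16), yielding melody data 3-2 with the note sequence "C3, D3, E3, C3(+12), C3, D3, E3, C3(+12), C3(-5), C3(-1)." At this point, notes 2 and 3, and notes 6 and 7, of melody data 3-1 have each been changed to other notes.
[0138]
Each note of melody data 3-2 is then shifted up by three octaves by the melody generation unit 32 (the process of the second arrow in FIG. 16), generating melody data 3-3 with the note sequence "C6, D6, E6, C6(+12), C6, D6, E6, C6(+12), C6(-5), C6(-1)." This melody data 3-3 is played back and the corresponding sound is output.
[0139]
As described above, because the prepared melody data is converted based on pitch frequencies extracted from the input voice of the keyword, different sounds are output from the same melody data even when several users utter the same keyword one after another.
[0140]
As a result, the sounds output by the pet robot 1 become unpredictable, which makes communication with the pet robot 1 more engaging.
[0141]
In the above, the notes corresponding to the extracted pitch frequencies are selected, the melody data is converted based on those notes, and the result is shifted by a predetermined number of octaves. Conversely, of course, the notes corresponding to the extracted pitch frequencies may first be shifted by a predetermined number of octaves, and the prepared melody data then converted using the shifted notes.
[0142]
In this case, for example, the notes "C3, D3, E3" selected based on the pitch frequencies of the keyword "ra-ra-ra" are each shifted up by three octaves to "C6, D6, E6." Then, for example, "C3, E3, G3" of melody data 1-1 in FIG. 11 are converted to "C6, D6, E6," respectively, generating melody data 1-4, consisting of the note sequence "C6, C6, D6, E6, C6, C6, D6, E6, C6(+12)."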
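[0143]
Rather than simply being shifted by a predetermined number of octaves, the notes selected based on the pitch frequencies may also be shifted into the frequency range of the pet robot 1's own utterances.
[0144]
FIG. 17 illustrates the process of shifting notes selected based on pitch frequencies into a predetermined frequency range.
[0145]
FIG. 17A shows an example of notes selected based on pitch frequencies; as described above, the notes "C3, D3, E3" have been selected based on the keyword "ra-ra-ra." The selected notes are then shifted, as shown in FIG. 17B, so that they always fall within the standard utterance range of the pet robot 1.
[0146]
In this example, the middle selected note, "D3," is shifted to "C6."
[0147]
Therefore, when melody data is converted based on the notes "B5, C6, D6," which have been shifted into the standard utterance range of the pet robot 1 in this way, melody data 1-1 is converted into melody data consisting of the note sequence "B5, B5, C6, D6, B5, B5, C6, D6, B5(+12)." That is, the notes B5, C6, and D6 used for the conversion all fall within the pet robot 1's standard utterance range of "F5" to "F6" shown in FIG. 17.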
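The source does not state how "C3, D3, E3" become "B5, C6, D6." A constant chromatic transposition that moves D3 to C6 (+34 semitones) would give A#5, C6, D6 instead. The figure's result is, however, reproduced by moving every note the same number of white-key (diatonic) steps, so the sketch below assumes that rule and handles natural notes only. Treat it as one interpretation consistent with the example, not as the patent's stated method.

```python
import re

LETTERS = "CDEFGAB"  # natural (white-key) notes only in this sketch


def diatonic_index(note):
    """White-key steps counted from C0, e.g. C3 -> 21, D3 -> 22."""
    match = re.fullmatch(r"([A-G])(-?\d+)", note)
    if match is None:
        raise ValueError(f"sketch handles natural notes only: {note}")
    return int(match.group(2)) * 7 + LETTERS.index(match.group(1))


def from_diatonic_index(index):
    return f"{LETTERS[index % 7]}{index // 7}"


def shift_into_range(notes, center_target):
    """Move all notes by the same number of white-key steps so that the
    middle note lands on center_target (an assumed rule, see above)."""
    steps = diatonic_index(center_target) - diatonic_index(notes[len(notes) // 2])
    return [from_diatonic_index(diatonic_index(n) + steps) for n in notes]


print(shift_into_range(["C3", "D3", "E3"], "C6"))  # ['B5', 'C6', 'D6']
```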
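[0148]
By shifting the notes selected based on pitch frequencies into various ranges in this way, the sounds output by the pet robot 1 can be controlled. This prevents the pitch of the input voice from showing up directly in the sounds output by the pet robot 1. In other words, without such a shift of the frequency range, a high-pitched input voice would make the pet robot 1 output correspondingly high sounds.
[0149]
In the above, melody data is converted based on notes corresponding to the pitch frequency of the voice, but melody data can also be converted based on various other information and then output.
[0150]
Next, with reference to the flowchart of FIG. 18, the processing by which the pet robot 1 generates melody data using notes selected based on a change in emotion and then plays it back will be described.
[0151]
In step S21, the action management unit 27 determines, based on the internal information reported by the instinct/emotion management unit 26, whether a change in the emotion of the pet robot 1 has been detected, and waits until it determines that one has.
[0152]
When the action management unit 27 determines in step S21 that a change in emotion has been detected, the process proceeds to step S22, where it notifies the melody generation unit 32 of information indicating that the emotion of the pet robot 1 has changed, together with identification information for the melody data to be converted.
[0153]
In step S23, when notified by the action management unit 27 that the emotion has changed, the melody generation unit 32 selects, based on the changed emotion, the notes to be used to convert the melody data.
[0154]
FIG. 19 shows an example of the correspondence between the changed emotion of the pet robot 1 and notes; this correspondence is prepared in advance in the melody generation unit 32.
[0155]
In this example, the melody generation unit 32 selects the notes "C, E, G" when the emotion changes to "JOY," "C, D♯, G" when it changes to "SAD," and "C, C, F" when it changes to "ANGRY." It selects "C, D♯, F♯" when the emotion changes to "SURPRISE," "C, F♯, C" when it changes to "DISGUST," and "C, C♯, C" when it changes to "FEAR."
[0156]
For example, by associating "JOY" with a major chord and "SAD" with a minor chord, the pet robot 1 outputs sounds that generally feel bright when it is happy and sounds that generally feel dark and sorrowful when it is sad. In this way, the emotions of the pet robot 1 can be expressed.
[0157]
Furthermore, since raising the pitch feels natural when the emotion value is large, the pitch could, for example, be shifted by +10 when the emotion value is 10.
[0158]
In step S24, the melody generation unit 32 reads the melody data specified by the action management unit 27 from the melody data storage unit 33 and converts it based on the selected notes.
[0159]
FIG. 20 illustrates the melody data conversion process.
[0160]
FIG. 20 shows an example in which melody data 1-1 of FIG. 11 is converted after the melody generation unit 32 has been notified that the emotion has changed to "ANGRY."
[0161]
Accordingly, the melody generation unit 32 selects the notes "C, C, F" from a correspondence table such as that shown in FIG. 19 and converts melody data 1-1 based on them. For example, so that the output sounds fall within the standard utterance range of the pet robot 1, the melody generation unit 32 places the selected notes "C, C, F" at "C6, C6, F6" and converts "C3, E3, G3" of melody data 1-1 to "C6, C6, F6," respectively.
[0162]
This conversion generates melody data 1-5, consisting of the note sequence "C6, C6, C6, F6, C6, C6, C6, F6, C6(+12)."
[0163]
Similarly, when notified that the emotion of the pet robot 1 has changed to "SAD," for example, the melody generation unit 32 selects the notes "C, D♯, G" and, so that the output sounds fall within the standard utterance range of the pet robot 1, places them at "C6, D♯6, G6." The melody generation unit 32 then converts melody data 1-1 using the notes "C6, D♯6, G6," generating new melody data consisting of the note sequence "C6, C6, D♯6, G6, C6, C6, D♯6, G6, C6(+12)."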
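The emotion table of FIG. 19 and the conversion of melody 1-1 can be sketched together as below. The (note, semitone offset) pair encoding is a hypothetical representation, and for "DISGUST" and "FEAR" this sketch places the repeated C in the same octave as the first C, which the source leaves open.

```python
EMOTION_NOTES = {
    "JOY": ["C", "E", "G"],
    "SAD": ["C", "D#", "G"],
    "ANGRY": ["C", "C", "F"],
    "SURPRISE": ["C", "D#", "F#"],
    "DISGUST": ["C", "F#", "C"],
    "FEAR": ["C", "C#", "C"],
}

MELODY_1_1 = [("C3", 0), ("C3", 0), ("E3", 0), ("G3", 0),
              ("C3", 0), ("C3", 0), ("E3", 0), ("G3", 0), ("C3", 12)]


def notes_for_emotion(emotion, octave=6):
    """Place the emotion's pitch classes in one octave (octave 6 as in the text)."""
    return [f"{pitch_class}{octave}" for pitch_class in EMOTION_NOTES[emotion]]


def convert_for_emotion(melody, emotion, octave=6):
    """Map the melody's distinct base notes, in order of first appearance,
    onto the emotion's notes; relative offsets such as +12 are kept."""
    selected = notes_for_emotion(emotion, octave)
    slots = []
    for base, _ in melody:
        if base not in slots:
            slots.append(base)
    return [(selected[slots.index(base)], offset) for base, offset in melody]


melody_1_5 = convert_for_emotion(MELODY_1_1, "ANGRY")  # C6, C6, C6, F6, ..., C6(+12)
```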
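[0164]
The melody data generated in this way is supplied to the voice synthesis unit 34 and played back by it in step S25.
[0165]
As described above, prepared melody data can be converted based on various kinds of information detected by the pet robot 1 to generate new melody data. For example, the proportion of a predetermined color (light) in an image captured by the camera 12 may be extracted and notes selected based on it, or notes may be selected based on the level of external pressure detected by the touch sensor 13 and used to generate new melody data. New melody data can also be generated based on the volume of the voice or other sound collected by the microphone 11.
[0166]
This allows the pet robot 1 to output a countless variety of sounds (melodies) that the user cannot predict. It also makes it possible to reduce the number of melody data items that must be prepared in advance.
[0167]
The various processes described above may be executed not only by an animal-type robot such as that shown in FIG. 1, but also by, for example, a humanoid robot capable of bipedal walking or a virtual robot operating inside a computer.
[0168]
The series of processes described above can be executed by hardware, but it can also be executed by software. In that case, the information processing apparatus that executes the software is, for example, a personal computer such as that shown in FIG. 21.
[0169]
In FIG. 21, a CPU 71 executes various processes according to programs stored in a ROM (Read Only Memory) 72 or loaded from a storage unit 78 into a RAM (Random Access Memory) 73. The RAM 73 also stores, as appropriate, data the CPU 71 needs to execute those processes.
[0170]
The CPU 71, the ROM 72, and the RAM 73 are connected to one another via a bus 74. An input/output interface 75 is also connected to the bus 74.
[0171]
Connected to the input/output interface 75 are an input unit 76 consisting of a keyboard, a mouse, and the like; an output unit 77 consisting of a display such as a CRT (Cathode Ray Tube) or LCD (Liquid Crystal Display), a speaker, and the like; a storage unit 78 consisting of a hard disk and the like; and a communication unit 79 consisting of a modem, a terminal adapter, and the like. The communication unit 79 performs communication over a network.
[0172]
A drive 80 is also connected to the input/output interface 75 as needed, and a magnetic disk 81, an optical disk 82, a magneto-optical disk 83, a semiconductor memory 84, or the like is mounted in it as appropriate.
[0173]
When the series of processes is executed by software, the program constituting that software is installed from a network or a recording medium onto a general-purpose personal computer such as that shown in FIG. 21.
[0174]
As shown in FIG. 21, this recording medium includes not only package media distributed separately from the apparatus body to provide the program to users, such as the magnetic disk 81 (including a floppy disk), the optical disk 82 (including CD-ROM (Compact Disk-Read Only Memory) and DVD (Digital Versatile Disk)), the magneto-optical disk 83 (including MD (registered trademark) (Mini-Disk)), or the semiconductor memory 84 on which the program is recorded, but also the ROM 72 or the hard disk in the storage unit 78 on which the program is recorded and which is provided to the user already built into the apparatus body.
[0175]
In this specification, the steps describing the program recorded on the recording medium include not only processes performed in time series in the described order, but also processes executed in parallel or individually rather than necessarily in time series.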
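[0176]
[Effects of the invention]
According to the first robot apparatus and program of the present invention, a more suitable response can be made to an utterance from the user.
[0177]
According to the second robot apparatus and program of the present invention, sounds that the user cannot predict can be generated.
[Brief description of the drawings]
[FIG. 1] A perspective view showing an example of the external appearance of a pet robot to which the present invention is applied.
[FIG. 2] A block diagram showing an example of the internal configuration of the pet robot of FIG. 1.
[FIG. 3] A block diagram showing an example of the functional configuration of the pet robot of FIG. 1.
[FIG. 4] A diagram schematically showing an example of the functions of the instinct/emotion management unit of FIG. 3.
[FIG. 5] A diagram schematically showing an example of the functions of the action management unit of FIG. 3.
[FIG. 6] A diagram showing an example of a finite automaton.
[FIG. 7] A diagram showing an example of state transition probabilities.
[FIG. 8] A flowchart explaining processing of the pet robot of FIG. 1.
[FIG. 9] A diagram explaining the process of extracting pitch frequencies.
[FIG. 10] A diagram showing an example of the correspondence between notes and pitch frequencies.
[FIG. 11] A diagram showing an example of melody data.
[FIG. 12] A diagram explaining a process of converting melody data.
[FIG. 13] A diagram explaining another process of converting melody data.
[FIG. 14] A diagram explaining yet another process of converting melody data.
[FIG. 15] A diagram explaining a process of converting other melody data.
[FIG. 16] A diagram explaining a process of converting still other melody data.
[FIG. 17] A diagram explaining the shifting of notes.
[FIG. 18] A flowchart explaining other processing of the pet robot of FIG. 1.
[FIG. 19] A diagram showing an example of the correspondence between emotions and notes.
[FIG. 20] A diagram explaining a process of converting melody data.
[FIG. 21] A block diagram showing an example of a personal computer.
[Description of reference numerals]
10 controller, 11 microphone, 14 speaker, 21 AD conversion unit, 22 voice feature analysis unit, 23 voice section detection unit, 24 voice recognition unit, 26 instinct/emotion management unit, 27 action management unit, 28 voice pitch analysis unit, 32 melody generation unit, 33 melody data storage unit, 34 voice synthesis unit, 35 DA conversion unit, 81 magnetic disk, 82 optical disk, 83 magneto-optical disk, 84 semiconductor memory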
BACKGROUND OF THE INVENTION
The present invention relates to a robot apparatus. , A robot apparatus that enables a more suitable response to a recording medium and a program, in particular, according to an utterance from a user. , The present invention relates to a recording medium and a program.
[0002]
[Prior art]
In recent years, for example, an entertainment pet robot has been realized that autonomously performs various actions in accordance with the surrounding environment and its internal state.
[0003]
Such a pet robot, for example, makes a sound that responds when spoken by the user, or a sound that indicates angry when the user hits the head. ing. Further, when the user does not play much and the feeling of the internal state becomes "Sad", a sound indicating "Sad" (notifying that he wants to play) is made.
[0004]
[Problems to be solved by the invention]
However, since the sound data output from the pet robot is prepared in advance in the pet robot, there is a problem that the type is limited from the viewpoint of the built-in storage capacity. there were. That is, the output sound will be patterned.
[0005]
Therefore, if the behavior is observed for a while, the user can easily predict the sound emitted from the pet robot next, and the communication with the pet robot becomes less interesting. .
[0006]
The present invention has been made in view of such a situation, and can provide a more suitable response in response to a call from a user.
[0007]
[Means for Solving the Problems]
The first robot apparatus of the present invention is extracted by the storage means for storing the first melody data composed of the first scale, the extraction means for extracting the pitch frequency of the input voice, and the extraction means. The selection means for selecting the second scale based on the pitch frequency and the first scale constituting the first melody data stored in the storage means are converted into the second scale selected by the selection means. And generating means for generating the second melody data, and reproducing means for reproducing the second melody data generated by the generating means.
[0008]
The extracting means can extract the pitch frequency of the voice representing the predetermined keyword.
[0009]
The extraction means can extract the pitch frequency of the voice including vowels.
[0010]
The extraction means can extract an average value of pitch frequencies detected during a predetermined period of the voice as the pitch frequency.
[0011]
The generating means can generate the second melody data by shifting the second scale by a predetermined number of octaves.
[0013]
The program of the first recording medium of the present invention includes a storage control step for controlling storage of the first melody data composed of the first scale, and an extraction step for controlling extraction of the pitch frequency of the input voice. The selection step for selecting the second scale based on the pitch frequency extracted by the processing of the extraction step, and the first scale constituting the first melody data stored by the processing of the storage control step are selected. A generation step for generating the second melody data by converting to the second scale selected by the step processing, and a reproduction control step for controlling the reproduction of the second melody data generated by the generation step processing It is characterized by including.
[0014]
The first program of the present invention includes a storage control step for controlling the storage of the first melody data composed of the first scale, an extraction step for controlling the extraction of the pitch frequency of the input voice, and an extraction step. The selection step of selecting the second scale based on the pitch frequency extracted by the processing of the step, and the first scale constituting the first melody data stored by the processing of the storage control step, processing of the selection step Including a generation step of generating second melody data by converting to the second scale selected by the step, and a reproduction control step of controlling the reproduction of the second melody data generated by the processing of the generation step. Features.
[0015]
The second robot apparatus of the present invention has a storage means for storing the first melody data composed of the first scale, a management means for managing its own internal state, and an internal state managed by the management means. When changing, the selecting means for selecting the second scale corresponding to the change of the internal state, and the first scale constituting the first melody data stored in the storing means is selected by the selecting means. And generating means for generating second melody data by converting into two scales, and reproducing means for reproducing the second melody data generated by the generating means.
[0016]
The generating means can generate the second melody data by shifting the second scale by a predetermined number of octaves.
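Shifting by a whole number of octaves is a simple operation on pitch. A worked form, assuming equal temperament and MIDI note numbers m (the text specifies neither), with k the number of octaves (negative for a downward shift):

```latex
f' = 2^{k} f, \qquad m' = m + 12k, \qquad k \in \mathbb{Z}
```

For example, with k = -1, A4 (f = 440 Hz, m = 69) becomes A3 (f' = 220 Hz, m' = 57).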
[0018]
The program of the second recording medium of the present invention is characterized by including: a storage control step of controlling storage of first melody data composed of a first scale; a management step of managing its own internal state; a selection step of selecting, when the internal state managed in the management step changes, a second scale corresponding to the change in the internal state; a generation step of generating second melody data by converting the first scale constituting the first melody data stored in the storage control step into the second scale selected in the selection step; and a reproduction control step of controlling reproduction of the second melody data generated in the generation step.
[0019]
The second program of the present invention is characterized by including: a storage control step of controlling storage of first melody data composed of a first scale; a management step of managing its own internal state; a selection step of selecting, when the internal state managed in the management step changes, a second scale corresponding to the change in the internal state; a generation step of generating second melody data by converting the first scale constituting the first melody data stored in the storage control step into the second scale selected in the selection step; and a reproduction control step of controlling reproduction of the second melody data generated in the generation step.
[0020]
In the first robot apparatus, recording medium, and program of the present invention, first melody data composed of a first scale is stored, the pitch frequency of an input voice is extracted, and a second scale is selected based on the extracted pitch frequency. The first scale constituting the stored first melody data is then converted into the selected second scale to generate second melody data, and the second melody data is reproduced.
[0021]
In the second robot apparatus, recording medium, and program of the present invention, first melody data composed of a first scale is stored and the robot's own internal state is managed. When the managed internal state changes, a second scale corresponding to the change is selected. The first scale constituting the stored first melody data is then converted into the selected second scale to generate second melody data, and the generated second melody data is reproduced.
[0022]
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 is a perspective view showing an example of an external configuration of a pet robot 1 to which the present invention is applied.
[0023]
As shown in the figure, the pet robot 1 has, for example, the shape of a four-legged dog: leg units 3A, 3B, 3C, and 3D are connected at the front and rear, on the left and right sides of the body unit 2, and the head unit 4 and the tail unit 5 are connected to the front end and the rear end of the body unit 2, respectively.
[0024]
The tail unit 5 extends from a base portion 5B provided on the upper surface of the body unit 2 so that it can bend or swing with two degrees of freedom.
[0025]
As described in detail later, the pet robot 1 having such an external configuration holds melody data prepared for responding when, for example, the user says “Good morning” to it. The pet robot 1 analyzes the user's voice, converts the prepared melody data based on the analysis result, and reproduces the melody data obtained by the conversion. In other words, the melody data is converted and output differently for each user or each utterance.
[0026]
Therefore, even with only one piece of melody data, the data is converted each time into melody data that matches the user's utterance, so the pet robot 1 outputs a sound that corresponds to the user's way of speaking. This keeps the user from becoming bored with communicating with the pet robot 1 and enables more suitable communication.
[0027]
FIG. 2 is a block diagram showing an example of the internal configuration of the pet robot 1 of FIG. 1.
[0028]
The body unit 2 houses a controller 10 that controls the entire pet robot 1. The controller 10 basically comprises a CPU (Central Processing Unit) 10A and a memory 10B that stores a program with which the CPU 10A controls each unit.
[0029]
In addition to the controller 10, for example, a battery (not shown) serving as a power source for the pet robot 1 is also stored in the body unit 2.
[0030]
As shown in FIG. 2, the head unit 4 is provided, at predetermined positions, with a microphone 11 corresponding to the “ears” that sense sound and, as sensors that sense external stimuli, a camera 12 corresponding to the “eyes” that sense light, composed of a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) image sensor or the like, and a touch sensor 13 corresponding to the sense of touch, which senses the pressure applied when the user touches the robot. The head unit 4 is also provided with a speaker 14 corresponding to the “mouth” of the pet robot 1 at a predetermined position.
[0031]
Actuators are installed at the joints of the leg units 3A to 3D, at the connections between the leg units 3A to 3D and the body unit 2, at the connection between the head unit 4 and the body unit 2, at the connection between the tail unit 5 and the body unit 2, and so on. The actuators operate each unit based on instructions from the controller 10.
[0032]
In the example of FIG. 2, the leg unit 3A is provided with actuators 3AA_1 to 3AA_k, the leg unit 3B with actuators 3BA_1 to 3BA_k, the leg unit 3C with actuators 3CA_1 to 3CA_k, and the leg unit 3D with actuators 3DA_1 to 3DA_k. Further, the head unit 4 is provided with actuators 4A_1 to 4A_L, and the tail unit 5 with actuators 5A_1 and 5A_2.
[0033]
The microphone 11 installed in the head unit 4 collects surrounding sounds, including speech from the user, and outputs the resulting audio signal to the controller 10. The camera 12 images the surroundings and outputs the resulting image signal to the controller 10. The touch sensor 13 is provided, for example, on the top of the head unit 4; it detects the pressure applied by a physical action from the user such as “stroking” or “hitting”, and outputs the detection result to the controller 10 as a pressure detection signal.
[0034]
Based on the audio signal, image signal, and pressure detection signal supplied from the microphone 11, the camera 12, and the touch sensor 13, the controller 10 determines the surrounding situation, whether the user has given a command, what the user has done, and so on, and decides the next action of the pet robot 1 based on the result. Based on that decision, the controller 10 drives the necessary actuators, thereby swinging the head unit 4 up, down, left, and right, moving the tail unit 5, or driving the leg units 3A to 3D to make the pet robot 1 walk.
[0035]
In addition, the controller 10 performs processing such as turning on, turning off, or blinking an LED (Light Emitting Diode) 41 (see FIG. 3) provided at the “eye” position of the pet robot 1.
[0036]
FIG. 3 is a block diagram illustrating an example of the functional configuration of the controller 10 of FIG. 2. Each function shown in FIG. 3 is realized by the CPU 10A executing the control program stored in the memory 10B.
[0037]
The audio signal from the microphone 11 is supplied to an AD (Analog Digital) converter 21. The AD conversion unit 21 samples and quantizes the audio signal supplied from the microphone 11 and converts it into audio data. The audio data acquired by the AD conversion unit 21 is supplied to the audio feature amount analysis unit 22 and the audio pitch analysis unit 28.
[0038]
The voice feature amount analysis unit 22 performs, for example, MFCC (Mel Frequency Cepstrum Coefficient) analysis on the input audio data for each appropriate frame, and outputs the analysis result to the voice section detection unit 23 as feature parameters (a feature vector). The voice feature amount analysis unit 22 can also extract, as feature parameters, for example, linear prediction coefficients, cepstrum coefficients, line spectrum pairs, or the power in each predetermined frequency band (filter bank output).
[0039]
Based on the feature parameters supplied from the voice feature amount analysis unit 22, the voice section detection unit 23 detects voice sections, for example, according to whether power at or above a predetermined threshold has been detected continuously for a predetermined period. The voice section detection unit 23 then outputs the detected voice section information to the voice recognition unit 24 together with the feature parameters supplied from the voice feature amount analysis unit 22. The voice section information detected by the voice section detection unit 23 is also supplied to the voice pitch analysis unit 28.
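The power-threshold rule can be sketched as follows. This is a minimal illustration, assuming the per-frame power is computed as the mean squared sample value of each frame; the function names, the threshold, and the minimum run length are hypothetical choices, not values from the text.

```python
from typing import List, Optional, Sequence, Tuple


def frame_powers(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame (assumed power measure)."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_voice_sections(
    power: Sequence[float], threshold: float, min_frames: int
) -> List[Tuple[int, int]]:
    """Return (start, end) frame pairs, end exclusive, for runs in which the
    power stays at or above `threshold` for at least `min_frames` frames."""
    sections: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, p in enumerate(power):
        if p >= threshold:
            if start is None:
                start = i  # a candidate voice section begins here
        elif start is not None:
            if i - start >= min_frames:  # long enough to count as voice
                sections.append((start, i))
            start = None
    # Close a run that continues up to the last frame.
    if start is not None and len(power) - start >= min_frames:
        sections.append((start, len(power)))
    return sections
```

Requiring a minimum run length keeps short clicks or bumps above the threshold from being treated as speech.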
[0040]
The voice recognition unit 24 holds a word dictionary that describes pronunciation information (phonological information) for each word to be recognized, an acoustic model that represents the acoustic features, such as individual phonemes and syllables, of the language of the speech to be recognized, and grammar rules that describe how the words registered in the word dictionary are linked (connected). As the grammar rules, for example, rules based on a context-free grammar (CFG) or statistical word-chain probabilities (N-grams) can be used.
[0041]
The voice recognition unit 24 then recognizes the speech input to the microphone 11 with, for example, the continuous-distribution HMM (Hidden Markov Model) method, using the feature parameters from the voice feature amount analysis unit 22 and referring to the acoustic model, word dictionary, and grammar rules as necessary.
[0042]
Specifically, the voice recognition unit 24 builds an acoustic model of each word (word model) by connecting acoustic models with reference to the word dictionary. It then connects several word models with reference to the grammar rules and recognizes the speech input to the microphone 11 by the continuous-distribution HMM method, based on the connected word models and the feature parameters. That is, the voice recognition unit 24 detects the sequence of word models with the highest score (likelihood) of observing the time-series feature parameters output by the voice feature amount analysis unit 22 (via the voice section detection unit 23), and outputs the word string corresponding to that sequence as the speech recognition result.
[0043]
In other words, for the word string corresponding to each connected word model, the voice recognition unit 24 accumulates the occurrence probabilities of the feature parameters from the voice feature amount analysis unit 22, uses the accumulated value as a score, and outputs the word string with the highest score as the speech recognition result.
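The scoring idea can be illustrated with a small Viterbi search. This is a simplified sketch, not the recognizer described here: it assumes scalar features instead of MFCC vectors, one left-to-right HMM per word with a single Gaussian per state, and a fixed hypothetical self-loop probability `p_stay`. Viterbi scoring keeps only the best state path, which is one common way of accumulating log occurrence probabilities; the text does not say whether the best path or the sum over all paths is used.

```python
import math
from typing import Dict, Sequence, Tuple

WordModel = Tuple[Sequence[float], Sequence[float]]  # (state means, state variances)


def gaussian_logpdf(x: float, mean: float, var: float) -> float:
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi_score(obs: Sequence[float], model: WordModel, p_stay: float = 0.6) -> float:
    """Best-path log-likelihood under a left-to-right HMM that starts in
    state 0 and must end in the last state."""
    means, variances = model
    n = len(means)
    log_stay, log_next = math.log(p_stay), math.log(1.0 - p_stay)
    neg_inf = float("-inf")
    # score[s]: best log score of any path that ends in state s so far.
    score = [neg_inf] * n
    score[0] = gaussian_logpdf(obs[0], means[0], variances[0])
    for x in obs[1:]:
        new = [neg_inf] * n
        for s in range(n):
            best = score[s] + log_stay  # stay in state s
            if s > 0:
                best = max(best, score[s - 1] + log_next)  # advance from s - 1
            if best > neg_inf:
                new[s] = best + gaussian_logpdf(x, means[s], variances[s])
        score = new
    return score[-1]


def recognize(obs: Sequence[float], word_models: Dict[str, WordModel]) -> str:
    """Return the word whose model gives the highest score."""
    return max(word_models, key=lambda w: viterbi_score(obs, word_models[w]))
```

Working in the log domain turns the product of occurrence probabilities into a sum, which avoids numerical underflow on long utterances.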
[0044]
For example, the voice recognition unit 24 notifies the instinct / emotion management unit 26 and the behavior management unit 27 of commands such as “walk”, “lie down”, and “follow the ball” as voice recognition information. In addition, when the voice recognition unit 24 recognizes a predetermined keyword, it notifies the melody generation unit 32 and other units.
[0045]
The image signal captured by the camera 12 and the pressure detection signal detected by the touch sensor 13 are output to the sensor data processing unit 25. The sensor data processing unit 25 includes an image recognition unit that performs image recognition based on the image signal supplied from the camera 12, and a touch sensor input processing unit that processes the pressure detection signal supplied from the touch sensor 13 (neither is shown).
[0046]
The image recognition unit performs image recognition on the image signal supplied from the camera 12 and, when it detects, for example, a “red round object” or “a plane perpendicular to the ground and at least a predetermined height”, outputs image recognition results such as “there is a ball” or “there is a wall” to the instinct / emotion management unit 26 and the behavior management unit 27 as image recognition information.
[0047]
The touch sensor input processing unit processes the pressure detection signal supplied from the touch sensor 13. When it detects a pressure at or above a predetermined threshold applied for a short time, it recognizes this as “hit (scolded)”; when it detects a pressure below the predetermined threshold applied for a long time, it recognizes this as “stroked (praised)”. It outputs the recognition result to the instinct / emotion management unit 26 and the behavior management unit 27 as state recognition information.
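The two rules above can be written as a small classifier. The threshold and the short/long time boundary below are hypothetical placeholders for the “predetermined” values in the text, and the return strings simply mirror the recognition results named above.

```python
from typing import Optional

# Hypothetical values; the text only says "predetermined threshold".
PRESSURE_THRESHOLD = 0.5  # normalized pressure units (assumption)
SHORT_TIME_S = 0.3        # boundary between "short" and "long" contact (assumption)


def classify_touch(pressure: float, duration_s: float) -> Optional[str]:
    """Map one contact event from the touch sensor to a recognition result."""
    if pressure <= 0.0:
        return None  # no contact
    if pressure >= PRESSURE_THRESHOLD and duration_s < SHORT_TIME_S:
        return "hit (scolded)"  # strong and brief
    if pressure < PRESSURE_THRESHOLD and duration_s >= SHORT_TIME_S:
        return "stroked (praised)"  # gentle and sustained
    return None  # other combinations are left unclassified in this sketch
```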
[0048]
The instinct / emotion management unit 26 manages parameters representing the instinct or emotion of the pet robot 1 and outputs the state of instinct or the state of emotion to the behavior management unit 27 at a predetermined timing.
[0049]
FIG. 4 schematically illustrates an example of the functional configuration of the instinct / emotion management unit 26 of FIG. 3. As illustrated in FIG. 4, the instinct / emotion management unit 26 stores and manages an emotion model 51 expressing the emotions of the pet robot 1 and an instinct model 52 expressing its instincts.
[0050]
The emotion model 51 represents the states (degrees) of emotions such as “joy”, “sadness”, “anger”, “surprise”, “fear”, and “disgust” by emotion parameters within a predetermined range (for example, 0 to 100), and changes these values based on the outputs from the voice recognition unit 24 and the sensor data processing unit 25, the passage of time, and so on.
[0051]
In this example, the emotion model 51 is composed of an emotion unit 51A representing “joy”, an emotion unit 51B representing “sadness”, an emotion unit 51C representing “anger”, an emotion unit 51D representing “surprise”, an emotion unit 51E representing “fear”, and an emotion unit 51F representing “disgust”.
[0052]
The instinct model 52 represents the states (degrees) of instinct-based desires such as “appetite”, “desire for sleep”, and “desire for exercise” by instinct parameters within a predetermined range, and changes these values based on the output from the sensor data processing unit 25, the passage of time, and so on. The instinct model 52 also increases the parameter indicating “motivation” based on the behavior history, and increases the parameter indicating “appetite” based on the remaining battery level.
[0053]
In this example, the instinct model 52 is composed of an instinct unit 52A representing “motivation”, an instinct unit 52B representing “desire for affection”, an instinct unit 52C representing “appetite”, an instinct unit 52D representing “curiosity”, and an instinct unit 52E representing “desire for sleep”.
[0054]
The instinct / emotion management unit 26 expresses the emotions and instincts of the pet robot 1 by changing the parameters of the emotion units 51A to 51F and the instinct units 52A to 52E, thereby modeling how they change.
[0055]
Further, the parameters of the emotion units 51A to 51F and the instinct units 52A to 52E are changed not only by external input but also by mutual influence of each unit as indicated by arrows in the figure.
[0056]
For example, when the emotion unit 51A expressing “joy” and the emotion unit 51B expressing “sadness” are coupled to each other inhibitorily, and the user praises the pet robot 1, the instinct / emotion management unit 26 increases the parameter of the emotion unit 51A expressing “joy” and decreases the parameter of the emotion unit 51B expressing “sadness”, thereby changing the emotion to be expressed.
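A minimal sketch of the inhibitory coupling, assuming parameters in the 0 to 100 range mentioned for the emotion model 51 and a hypothetical linear update in which a praise event adds `gain` to “joy” and subtracts `inhibition * gain` from “sadness”; the actual update rule is not given in the text.

```python
from typing import Dict

EmotionState = Dict[str, float]


def clamp(value: float, lo: float = 0.0, hi: float = 100.0) -> float:
    """Keep a parameter inside its predetermined range."""
    return max(lo, min(hi, value))


def on_praised(emotions: EmotionState, gain: float = 10.0,
               inhibition: float = 0.5) -> EmotionState:
    """Raise "joy"; the inhibitory coupling lowers "sadness" in proportion.
    Other emotion parameters are copied unchanged."""
    updated = dict(emotions)
    updated["joy"] = clamp(emotions["joy"] + gain)
    updated["sadness"] = clamp(emotions["sadness"] - inhibition * gain)
    return updated
```

Returning a new dictionary instead of mutating the input makes it easy to compare the state before and after an event.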
[0057]
Further, parameters change not only between units within the emotion model 51 and between units within the instinct model 52, but also across the two models.
[0058]
For example, as shown in the figure, the parameters of the emotion unit 51B expressing “sadness” and the emotion unit 51C expressing “anger” in the emotion model 51 change in response to changes in the parameters of the instinct unit 52B representing “desire for affection” and the instinct unit 52C representing “appetite” in the instinct model 52.
[0059]
Specifically, when the parameter of the instinct unit 52C representing “appetite” increases, the parameters of the emotion unit 51B representing “sadness” and the emotion unit 51C representing “anger” in the emotion model 51 also increase.
[0060]
Furthermore, the instinct / emotion management unit 26 holds a growth model in addition to the emotion model 51 and the instinct model 52, and the parameters of each unit of the emotion model 51 and the instinct model 52 change according to the growth stage. The growth model represents growth states (degrees) such as “childhood”, “adolescence”, “maturity”, and “old age” by values within a predetermined range, and changes these values based on the output from the sensor data processing unit 25, the passage of time, and so on.
[0061]
The instinct / emotion management unit 26 outputs emotion, instinct, and other states represented by parameters such as an emotion model and an instinct model to the behavior management unit 27 as internal information.
[0062]
In addition to the recognition information supplied from the voice recognition unit 24 and the sensor data processing unit 25, the instinct / emotion management unit 26 is supplied by the behavior management unit 27 with behavior information indicating the current or past behavior of the pet robot 1, specifically the content of behavior such as “walked for a long time”. Even when given the same recognition information, the instinct / emotion management unit 26 generates different internal information depending on the behavior of the pet robot 1 indicated by the behavior information.
[0063]
For example, when the pet robot 1 greets the user and the user strokes its head, behavior information indicating that the robot greeted the user and recognition information indicating that its head was stroked are supplied to the instinct / emotion management unit 26. In this case, the instinct / emotion management unit 26 increases the value of the emotion unit 51A representing “joy”.
[0064]
Returning to FIG. 3, the behavior management unit 27 determines the next action based on the information supplied from the voice recognition unit 24 and the sensor data processing unit 25, the internal information supplied from the instinct / emotion management unit 26, the passage of time, and so on, and notifies the command generation unit 29 of information representing the determined action.
[0065]
FIG. 5 is a schematic diagram illustrating a functional configuration example of the behavior management unit 27 in FIG. 3.
[0066]
The behavior management unit 27 includes a behavior model library 61 and an action selection unit 62. The behavior model library 61 holds various behavior models, as shown in the figure, each corresponding to a preset condition (trigger).
[0067]
In the example shown in the figure, the behavior model library 61 contains a ball response behavior model 61A indicating the action to take when a ball is detected, an autonomous search behavior model 61B indicating the action to take when the ball has been lost, and an emotion expression behavior model 61C indicating the action to take when a change in the emotion model 51 described above is detected. It also contains an obstacle avoidance behavior model 61D indicating the action to take when an obstacle is detected, a fall recovery behavior model 61E indicating the action to take when a fall is detected, and a battery management behavior model 61F indicating the action to take when the remaining battery level is low.
[0068]
The action selection unit 62 then selects the action to take next from the behavior models prepared in the behavior model library 61, referring to the information supplied from the voice recognition unit 24, the sensor data processing unit 25, and so on, the internal information supplied from the instinct / emotion management unit 26, the passage of time, and the like.
[0069]
The action selection unit 62 determines which node to transition to from the node representing the current state of the pet robot 1 by using a finite probability automaton as shown in FIG. 6, based on the transition probabilities set on the arcs connecting the nodes.
[0070]
In the finite probability automaton shown in FIG. 6, for example, when the current state is node 0 (NODE_0), the state transitions to node 1 (NODE_1) with probability P_1, to node 2 (NODE_2) with probability P_2, and to node n (NODE_n) with probability P_n. With probability P_{n-1}, the automaton transitions to node 0 (NODE_0) itself, that is, it does not transition to any other node.
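Selecting the next node of such an automaton amounts to drawing one outcome from a categorical distribution. The sketch below assumes the outgoing probabilities of the current node, including the probability of staying put, are given as a dictionary that sums to 1; the node names and probability values are hypothetical.

```python
import random
from typing import Dict


def next_node(transitions: Dict[str, float], rng: random.Random) -> str:
    """Sample the next node; `transitions` maps node name -> probability."""
    total = sum(transitions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total}, expected 1")
    nodes = list(transitions)
    return rng.choices(nodes, weights=[transitions[n] for n in nodes], k=1)[0]


# Hypothetical outgoing probabilities of NODE_0 (staying at NODE_0 included).
FROM_NODE_0 = {"NODE_0": 0.1, "NODE_1": 0.5, "NODE_2": 0.4}
```

Passing an explicit `random.Random` instance makes the behavior reproducible when seeded, which helps when testing behavior models.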
[0071]
Each behavior model defined in the behavior model library 61 is composed of nodes representing a plurality of states, and each node has a state transition table in which the probability of transition to another node is described. Has been.
[0072]
FIG. 7 shows an example of the state transition table of node 100 (NODE_100), which belongs to a predetermined behavior model defined in the behavior model library 61.
[0073]
In FIG. 7, the input event that serves as the condition for transitioning to another node (or to the node itself) is described in the “input event name” column, and further conditions on that input event are described in the “data name” and “data range” columns.
[0074]
In the example of the figure, the entry with ID “1” indicates that when information on “ball detected” (input event name “BALL”) is notified to the behavior management unit 27 and the size of the detected ball is in the range “0 to 1000”, the state transitions from node 100 to node 120 with a probability of “30 percent”.
[0075]
The entry with ID “2” indicates that when information on “head patted” (input event name “PAT”) is notified to the behavior management unit 27, the state transitions from node 100 to node 150 with a probability of “40 percent”.
[0076]
The entry with ID “3” indicates that when information on “head hit” (input event name “HIT”) is notified to the behavior management unit 27, the state transitions from node 100 to node 150 with a probability of “20 percent”.
[0077]
The entry with ID “4” indicates that when information on “obstacle detected” (input event name “OBSTACLE”) is notified to the behavior management unit 27 together with a distance to the obstacle in the range “0 to 100”, the state transitions from node 100 to node 1000 with a probability of “100 percent”.
[0078]
In this example, transitions to other nodes also occur according to the parameters of the emotion model 51 and the instinct model 52, even when there is no input from the voice recognition unit 24, the sensor data processing unit 25, and so on.
[0079]
For example, the entry with ID 5 indicates that when the parameter of the emotion unit 51A representing “JOY” in the emotion model 51 is in the range “50 to 100”, the state transitions to node 120 with a probability of “5 percent” and to node 150 with a probability of “30 percent”.
[0080]
The entry with ID 6 indicates that when the parameter of the emotion unit 51D representing “SURPRISE” in the emotion model 51 is in the range “50 to 100”, the state transitions to node 1000 with a probability of “15 percent”. The entry with ID 7 indicates that when the parameter of the emotion unit 51B representing “SADNESS” in the emotion model 51 is in the range “50 to 100”, the state transitions to node 120 with a probability of “50 percent”.
[0081]
In the state transition table of FIG. 7, the action output upon transition to each node is described in the “output behavior” column. For example, the action output upon transition to node 120 is “ACTION 1”, the action output upon transition to node 150 is “ACTION 2”, and the action output upon transition to node 1000 is “MOVE BACK” (retreat).
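The table of FIG. 7 can be represented as data plus a lookup. This is a sketch under several assumptions not stated in the text: the emotion-parameter conditions (IDs 5 to 7) are modeled as pseudo-events named after the emotion, data ranges are inclusive, the first matching row is used, and any probability not assigned to a destination means staying at node 100 with no output action.

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Row:
    event: str                                 # "input event name" column
    data_name: Optional[str]                   # "data name" column
    data_range: Optional[Tuple[float, float]]  # "data range" column, inclusive
    transitions: Dict[int, float]              # destination node -> probability


# State transition table of node 100, transcribed from the description above.
NODE_100_TABLE: Dict[int, Row] = {
    1: Row("BALL", "SIZE", (0, 1000), {120: 0.30}),
    2: Row("PAT", None, None, {150: 0.40}),
    3: Row("HIT", None, None, {150: 0.20}),
    4: Row("OBSTACLE", "DISTANCE", (0, 100), {1000: 1.00}),
    5: Row("JOY", "JOY", (50, 100), {120: 0.05, 150: 0.30}),
    6: Row("SURPRISE", "SURPRISE", (50, 100), {1000: 0.15}),
    7: Row("SADNESS", "SADNESS", (50, 100), {120: 0.50}),
}

OUTPUT_ACTION = {120: "ACTION 1", 150: "ACTION 2", 1000: "MOVE BACK"}


def transition(table: Dict[int, Row], event: str, value: Optional[float],
               rng: random.Random, current: int = 100) -> Tuple[int, Optional[str]]:
    """Return (next node, output action) for one input event."""
    for row in table.values():
        if row.event != event:
            continue
        if row.data_range is not None:
            lo, hi = row.data_range
            if value is None or not lo <= value <= hi:
                continue  # condition on the data range not met
        r = rng.random()
        cumulative = 0.0
        for node, p in row.transitions.items():
            cumulative += p
            if r < cumulative:
                return node, OUTPUT_ACTION.get(node)
        return current, None  # assumed: leftover probability = stay put
    return current, None  # no matching row
```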
[0082]
The action selection unit 62 refers to such a state transition table to change the state, and outputs information indicating the action set in the node to the instinct / emotion management unit 26, the command generation unit 29, and the like. To do.
[0083]
For example, when the behavior management unit 27 transitions to a node whose specified action is “say hello”, the command generation unit 29 generates an utterance command based on the instruction from the behavior management unit 27 and outputs it to the voice synthesis unit 34.
[0084]
When the action specified for the destination node is “retreat”, the command generation unit 29 generates a command to execute it based on an instruction from the behavior management unit 27 and outputs the command to the control unit 31. When the LED 41 provided at the “eye” position of the pet robot 1 is to be turned on, turned off, or blinked, the command generation unit 29 generates a command instructing this and outputs it to the LED control unit 30.
[0085]
The LED control unit 30 controls the LED 41 based on the command supplied from the command generation unit 29.
[0086]
Based on the command supplied from the command generation unit 29, the control unit 31 controls the actuators 3AA_1 to 5A_2 to change the posture of the pet robot 1 from the current posture to the next posture.
[0087]
Here, the postures reachable from the current posture are determined by the physical form of the pet robot 1, such as the shape and weight of the torso, hands, and feet and the way each part is connected, and by the mechanisms of the actuators 3AA_1 to 5A_2, such as the directions and angles in which the joints bend.
[0088]
Note that some postures can be reached directly from the current posture and others cannot. For example, the four-legged pet robot 1 can transition directly from lying sprawled with its limbs stretched out to a lying-down posture, but cannot transition directly to a standing posture; this requires a two-step movement in which it first draws its limbs close to its torso into the lying-down posture and then stands up. There are also postures that cannot be executed safely. For example, if the pet robot 1, standing on four legs, tries to raise both front legs in a “banzai” pose, it easily falls over.
[0089]
For this reason, the control unit 31 registers in advance the postures to which it can transition directly. When the command supplied from the command generation unit 29 indicates a posture that can be reached directly, the control unit 31 drives the corresponding actuators based on the command.
[0090]
On the other hand, when the command supplied from the command generation unit 29 indicates a posture that cannot be reached directly, the control unit 31 drives the corresponding actuators so that the pet robot 1 first transitions to another reachable posture and then to the target posture. This prevents the pet robot 1 from forcibly attempting a posture it cannot reach, or from falling over.
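Registering direct transitions turns posture planning into a shortest-path search on a graph. The sketch below uses breadth-first search over a hypothetical registry of postures; the names and edges are illustrative, loosely following the sprawled, lying-down, and standing example above. Unsafe postures are simply never registered, so they can never be planned.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical registry: posture -> postures reachable in one direct transition.
DIRECT: Dict[str, Set[str]] = {
    "sprawled": {"lying_down"},
    "lying_down": {"sprawled", "sitting", "standing"},
    "sitting": {"lying_down", "standing"},
    "standing": {"lying_down", "sitting", "walking"},
    "walking": {"standing"},
}


def plan_postures(current: str, target: str) -> Optional[List[str]]:
    """Shortest sequence of registered direct transitions from `current`
    to `target`, or None if the target cannot be reached."""
    if current == target:
        return [current]
    prev: Dict[str, Optional[str]] = {current: None}
    queue = deque([current])
    while queue:
        posture = queue.popleft()
        for nxt in DIRECT.get(posture, set()):
            if nxt in prev:
                continue  # already visited
            prev[nxt] = posture
            if nxt == target:
                # Walk back through predecessors to rebuild the path.
                path = [nxt]
                while prev[path[-1]] is not None:
                    path.append(prev[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None
```

Breadth-first search returns a path with the fewest intermediate postures, which matches the goal of reaching the target via as few detours as possible.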
[0091]
When the melody generation unit 32 is notified from the voice recognition unit 24 that the keyword has been recognized, the melody generation unit 32 instructs the voice pitch analysis unit 28 to analyze the pitch frequency of the keyword. Then, when the voice of the keyword is analyzed and the scale selected based on the pitch frequency is notified from the voice pitch analysis unit 28, the melody generation unit 32 reads predetermined melody data from the melody data storage unit 33, The melody data is converted based on the notified scale.
[0092]
Note that the melody data storage unit 33 stores melody data for a plurality of sounds (melodies) output by the pet robot 1, for example, in MIDI format.
[0093]
For example, the melody generation unit 32 performs processing such as replacing each scale of the melody data with a scale supplied from the voice pitch analysis unit 28 to generate new melody data. The melody data generated by the melody generation unit 32 is output to the voice synthesis unit 34.
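One possible reading of “replacing each scale” is sketched below: the melody keeps its rhythm (note durations) while its pitches are replaced, in order, by the notes selected from the user's voice, cycling through them if there are fewer selected notes than melody notes. The `Note` type, the cyclic assignment, and the optional `octave_shift` parameter are assumptions for illustration; MIDI note numbers are used because the melody data is stored, for example, in MIDI format.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Note:
    pitch: int       # MIDI note number
    duration: float  # length in beats


def convert_melody(melody: Sequence[Note], selected: Sequence[int],
                   octave_shift: int = 0) -> List[Note]:
    """Replace each note's pitch with the selected notes in order (cycling),
    keeping the original durations; optionally shift by whole octaves."""
    if not selected:
        return list(melody)  # nothing selected: keep the stored melody
    return [
        Note(selected[i % len(selected)] + 12 * octave_shift, note.duration)
        for i, note in enumerate(melody)
    ]
```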
[0094]
When instructed by the melody generation unit 32, the voice pitch analysis unit 28 analyzes the input voice of the keyword supplied from the AD conversion unit 21 and calculates the pitch frequency at each time. For example, using the time shift corresponding to the pitch period as a variable, the voice pitch analysis unit 28 calculates the pitch frequency at each time of the input voice from the time shift at which the autocorrelation coefficient is maximized. For example, when the sampling frequency of the input voice is 16 kHz and the time shift that maximizes the autocorrelation coefficient is 118 samples, the voice pitch analysis unit 28 calculates the pitch frequency f_PITCH (Hz) as f_PITCH = 16000 / 118 ≈ 135.59 Hz.
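The autocorrelation method can be sketched as below. It searches lags between hypothetical pitch limits `f_min` and `f_max` (assumptions, not from the text), takes the lag with the largest autocorrelation, and returns `fs / lag`; with `fs = 16000` and a best lag of 118 this reproduces the worked example, 16000 / 118 ≈ 135.59 Hz. A practical analyzer would usually window and normalize each frame; this sketch omits that.

```python
from typing import Sequence


def autocorr_pitch(frame: Sequence[float], fs: int,
                   f_min: float = 60.0, f_max: float = 500.0) -> float:
    """Estimate pitch (Hz) as fs / lag, where lag maximizes the raw
    autocorrelation within the lag range corresponding to [f_min, f_max]."""
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(frame) - 1)
    if lag_max < lag_min:
        raise ValueError("frame too short for the requested pitch range")
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Sum of products of the frame with itself shifted by `lag` samples.
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag
```

For instance, an impulse train with a period of 118 samples at 16 kHz has nonzero autocorrelation in this lag range only at lags 118 and 236, so the estimate is 16000 / 118 ≈ 135.59 Hz.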
[0095]
Then, the voice pitch analysis unit 28 extracts a predetermined pitch frequency for selecting a scale based on the calculated pitch frequency, and selects a scale corresponding to the extracted pitch frequency. For example, the voice pitch analysis unit 28 extracts a plurality of pitch frequencies composed of an average value for a predetermined period, and selects a plurality of scales based on the extracted pitch frequencies. The scale selected by the voice pitch analysis unit 28 is notified to the melody generation unit 32 and used to convert the melody data.
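A minimal sketch of selecting scales from averaged pitch values, assuming twelve-tone equal temperament with A4 = 440 Hz and MIDI note numbers as the representation of a “scale”. The text does not specify the tuning, the averaging period, or how unvoiced frames are handled (here they are marked with 0 and skipped). Under these assumptions the 135.59 Hz example maps to MIDI note 49 (C#3).

```python
import math
from statistics import mean
from typing import List, Sequence

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def hz_to_midi(f: float) -> int:
    """Nearest equal-tempered MIDI note number (A4 = 440 Hz = 69)."""
    return round(69 + 12 * math.log2(f / 440.0))


def midi_name(m: int) -> str:
    """Note name with octave number, e.g. 69 -> "A4"."""
    return f"{NOTE_NAMES[m % 12]}{m // 12 - 1}"


def select_scales(pitch_track: Sequence[float], frames_per_period: int) -> List[int]:
    """Average the voiced pitch values in each period and map each average to a note."""
    notes: List[int] = []
    for start in range(0, len(pitch_track), frames_per_period):
        voiced = [f for f in pitch_track[start:start + frames_per_period] if f > 0]
        if voiced:
            notes.append(hz_to_midi(mean(voiced)))
    return notes
```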
[0096]
When a predetermined utterance command and text data indicating the content of the utterance are supplied from the command generation unit 29, the speech synthesizer 34 refers to a speech synthesis dictionary provided therein and corresponds to the text data. For example, voice data in the WAV format is generated.
[0097]
More specifically, the speech synthesis dictionary stores a word dictionary describing information such as the part of speech, reading, and accent of each word; grammar rules for generation that constrain how the words described in the word dictionary are chained; and a voice information dictionary that stores phoneme segment data as voice information.
[0098]
Based on the word dictionary and the grammar rules for generation, the voice synthesis unit 34 performs text analysis (language analysis), such as morphological analysis and syntax analysis, on the input text, and extracts the information needed for speech synthesis. This information includes, for example, prosody information for controlling the positions of pauses, accents, intonation, power, and so on, and phoneme information representing the pronunciation of each word.
[0099]
Based on the extracted information, the speech synthesizer 34 concatenates the necessary phoneme data while referring to the voice information dictionary, and processes the waveform of the phoneme data to add pauses, accents, intonation, and so on as appropriate, thereby generating voice data corresponding to the supplied text data.
[0100]
The voice data generated by the voice synthesizer 34 is converted from digital to analog by the DA converter 35 and output from the speaker 14.
[0101]
When the melody data is supplied from the melody generation unit 32, the voice synthesis unit 34 reproduces the melody data and outputs it from the speaker 14.
[0102]
Next, the operation of the pet robot 1 having the above configuration will be described.
[0103]
First, the process in which the pet robot 1 generates melody data using scales selected based on the input voice and reproduces it will be described with reference to the flowchart of FIG. 8.
[0104]
In step S1, the speech recognition unit 24 determines whether a keyword has been detected as a result of speech recognition, and waits until a keyword is detected. For example, the voice uttered by the user is collected by the microphone 11 as described above, processed by each of the preceding units, and then supplied to the speech recognition unit 24.
[0105]
The voice recognition unit 24 stores a plurality of keywords and determines, based on them, whether the input voice is a keyword. A keyword may consist of, for example, vowels (such as the “a” sound), whose frequency changes relatively little and from which the pitch frequency is easy to extract.
[0106]
If the speech recognition unit 24 determines in step S1 that a keyword has been detected, the speech recognition unit 24 proceeds to step S2 and notifies the behavior management unit 27 and the melody generation unit 32 of this.
[0107]
In step S3, the melody generation unit 32 instructs the voice pitch analysis unit 28 to analyze the detected keyword.
[0108]
In step S4, the voice pitch analysis unit 28 analyzes the keyword waveform supplied from the AD conversion unit 21 in accordance with an instruction from the melody generation unit 32, and extracts the pitch frequency.
[0109]
FIG. 9 is a diagram for explaining the processing by which the voice pitch analysis unit 28 extracts the pitch frequency. FIG. 9 shows an example in which “Larala” is detected as a keyword.
[0110]
The upper diagram of FIG. 9 shows the waveform of the keyword “Larala”, with the voice amplitude on the vertical axis and time on the horizontal axis. The lower diagram of FIG. 9 shows the pitch frequency of the keyword “Larala” on the vertical axis against time on the horizontal axis.
[0111]
When the waveform of the keyword “Larala” shown in the upper part of FIG. 9 is supplied, the voice pitch analysis unit 28 calculates the pitch frequency at each time from the time shift and the sampling frequency, as described above.
[0112]
Then, the voice pitch analysis unit 28, for example, divides each of the voice sections t₁, t₂, and t₃ into three equal parts and extracts the average pitch frequency of the central part of each. Information on the voice sections is detected and supplied by the voice section detection unit 23.
[0113]
In the example of the figure, voice section t₁ is divided into three equal voice sections t₁₁, t₁₂, and t₁₃; similarly, voice section t₂ is divided into t₂₁, t₂₂, and t₂₃, and voice section t₃ into t₃₁, t₃₂, and t₃₃.
[0114]
The average pitch frequency is 131 Hz in voice section t₁₂, 148 Hz in voice section t₂₂, and 163 Hz in voice section t₃₂.
[0115]
Extracting the average pitch frequency of the central part of each sound of the keyword, as described above, is only one option; the pitch frequency can be extracted in various ways. For example, the average pitch frequency over each whole sound of the keyword, the maximum pitch frequency within a predetermined section, or the pitch frequency at the central time of each sound may be extracted.
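A minimal Python sketch of the central-third averaging, with the alternative methods above as options. It assumes the pitch track of each voice section is already available as a list of per-frame values in Hz; that representation, the method names, and the sample tracks are illustrative assumptions chosen to reproduce the 131, 148, and 163 Hz example.

```python
from statistics import mean


def representative_pitch(track, method="center_third_mean"):
    """Reduce the per-frame pitch values (Hz) of one voice section to one value.

    method:
      "center_third_mean": split the section into three equal parts and
                           average the middle part (the main example above)
      "mean":              average over the whole section
      "max":               maximum pitch in the section
      "center":            pitch at the central frame
    """
    if not track:
        raise ValueError("empty pitch track")
    n = len(track)
    if method == "center_third_mean":
        # Fall back to the whole track if the section is too short to split.
        middle = track[n // 3:(2 * n) // 3] or track
        return mean(middle)
    if method == "mean":
        return mean(track)
    if method == "max":
        return max(track)
    if method == "center":
        return track[n // 2]
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    # Hypothetical pitch tracks for the three sounds of "Larala".
    sections = [
        [120, 125, 128, 130, 131, 132, 135, 140, 150],
        [140, 143, 145, 147, 148, 149, 150, 152, 155],
        [155, 158, 160, 162, 163, 164, 166, 168, 170],
    ]
    print([representative_pitch(s) for s in sections])  # [131, 148, 163]
```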
[0116]
Then, in step S5, the voice pitch analysis unit 28 selects a predetermined scale based on the pitch frequency extracted in this way.
[0117]
FIG. 10 is a diagram illustrating an example of the correspondence between pitch frequencies and scales managed by the voice pitch analysis unit 28. In the example of FIG. 10, with “A4” set to 440 Hz, the scales from “C1” to “B7” and their respective frequencies are shown.
[0118]
Therefore, when, for example, the pitch frequencies “131 Hz, 148 Hz, 163 Hz” are extracted from the keyword “Larala” as described with reference to FIG. 9, the voice pitch analysis unit 28 selects the corresponding scales “C3, D3, E3”. That is, for each extracted pitch frequency, the voice pitch analysis unit 28 selects the scale whose frequency is closest to it.
[0119]
Note that, for ease of recognition, the pet robot 1 extracts pitch frequencies only within a range of, for example, 80 Hz to 400 Hz among the frequencies shown in FIG. 10.
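The closest-frequency selection can be sketched in Python as follows, assuming an equal-tempered table from C1 to B7 with A4 = 440 Hz as in FIG. 10. MIDI note numbers (69 = A4) are used only as a convenient internal index, and returning None outside the 80 Hz to 400 Hz range is an illustrative assumption; the text only says pitch frequencies are extracted within that range.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_table(a4=440.0):
    """Equal-tempered notes C1..B7 with A4 = a4 Hz, as (name, frequency) pairs."""
    table = []
    for midi in range(24, 108):  # MIDI 24 = C1, MIDI 107 = B7
        name = f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"
        freq = a4 * 2 ** ((midi - 69) / 12)  # MIDI 69 = A4
        table.append((name, freq))
    return table


def select_scale(pitch_hz, f_min=80.0, f_max=400.0, table=None):
    """Return the note whose frequency is closest (in Hz) to pitch_hz,
    or None if the pitch lies outside the accepted range."""
    if not f_min <= pitch_hz <= f_max:
        return None
    table = table or note_table()
    name, _freq = min(table, key=lambda entry: abs(entry[1] - pitch_hz))
    return name


if __name__ == "__main__":
    print([select_scale(f) for f in (131, 148, 163)])  # ['C3', 'D3', 'E3']
```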
[0120]
The scale selected by the voice pitch analysis unit 28 in this way is notified to the melody generation unit 32.
[0121]
Meanwhile, in step S6, when notified by the voice recognition unit 24 that the keyword has been detected, the behavior management unit 27 instructs the melody generation unit 32 to generate melody data. Together with this instruction, it notifies the melody generation unit 32 of the identification information of the melody data to be converted.
[0122]
In step S7, the melody generation unit 32 reads the melody data corresponding to the identification information notified by the behavior management unit 27 from the melody data storage unit 33, and generates new melody data based on the scales supplied from the voice pitch analysis unit 28.
[0123]
FIG. 11 is a diagram showing an example of the melody data stored in the melody data storage unit 33, written on a musical staff. Scales 1 to 9 of melody data 1-1 are “C3, C3, E3, G3, C3, C3, E3, G3, C3 (+12)”, respectively. Scale 9 indicates a sound 12 keys (one octave) above scale 1 (“C3”), counted for example on a piano keyboard. In this way, when the melody data consists of at least as many kinds of scales as are extracted from the keyword, some of its scales are expressed relative to other scales.
[0124]
To generate new melody data, the melody generation unit 32 first replaces each scale with a predetermined character, as shown in FIG. 12. This step is introduced for convenience of explanation; in practice, melody data 1-1 may be converted directly into melody data 1-3 (FIG. 13).
[0125]
For example, the melody generation unit 32 replaces “C3” with “X”, “E3” with “Y”, and “G3” with “Z” in melody data 1-1. As a result, melody data 1-1 becomes “X, X, Y, Z, X, X, Y, Z, X (+12)”, as shown in melody data 1-2.
[0126]
The melody generation unit 32 then associates the scales selected by the voice pitch analysis unit 28 with the characters of melody data 1-2. FIG. 13 is a diagram illustrating how new melody data is generated by this association.
[0127]
For example, when the scales “C3, D3, E3” are selected by the voice pitch analysis unit 28 based on the keyword “Larala” as described above, the melody generation unit 32 associates “C3”, “D3”, and “E3” with “X”, “Y”, and “Z”, respectively. The melody generation unit 32 then generates melody data 1-3, consisting of the scale sequence “C3, C3, D3, E3, C3, C3, D3, E3, C3 (+12)”, as shown in the figure. That is, at this point, melody data 1-1 prepared in advance has been converted into melody data 1-3 based on the pitch frequencies extracted from the keyword “Larala” (scales 3 and 4 and scales 7 and 8 have each been converted to other scales).
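The two-stage substitution of FIGS. 12 and 13 can be sketched in Python as below. The representation of a melody as (note, semitone offset) pairs, the rule that symbols are assigned to distinct notes in order of first appearance, and all names are assumptions chosen to reproduce melody data 1-2 and 1-3; the text does not specify a data format.

```python
def distinct_notes(melody):
    """Distinct base notes in order of first appearance."""
    seen = []
    for note, _offset in melody:
        if note not in seen:
            seen.append(note)
    return seen


def convert_melody(melody, new_scales):
    """Replace the k-th distinct base note with new_scales[k]; keep relative offsets."""
    notes = distinct_notes(melody)
    if len(notes) > len(new_scales):
        raise ValueError("melody uses more distinct notes than scales supplied")
    mapping = dict(zip(notes, new_scales))
    return [(mapping[note], offset) for note, offset in melody]


def render(melody):
    """Format as in the text, e.g. 'C3 (+12)' for a note 12 semitones up."""
    return ", ".join(f"{note} ({offset:+d})" if offset else note
                     for note, offset in melody)


# Melody data 1-1: "C3, C3, E3, G3, C3, C3, E3, G3, C3 (+12)"
MELODY_1_1 = [("C3", 0), ("C3", 0), ("E3", 0), ("G3", 0),
              ("C3", 0), ("C3", 0), ("E3", 0), ("G3", 0), ("C3", 12)]

if __name__ == "__main__":
    print(render(convert_melody(MELODY_1_1, "XYZ")))               # melody data 1-2
    print(render(convert_melody(MELODY_1_1, ["C3", "D3", "E3"])))  # melody data 1-3
```

Keeping the (+12) note as an offset from C3, rather than as a separate C4, is what lets a three-note melody carry four pitches while only three scales are extracted from the keyword.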
[0128]
Further, when the melody data generated as described above is relatively low in pitch, the melody generation unit 32 generates new melody data by shifting each sound of the generated melody data up by a predetermined number of octaves. Conversely, when the generated melody data is relatively high in pitch, it may be shifted down by a predetermined number of octaves.
[0129]
FIG. 14 is a diagram for explaining the processing of the melody generation unit 32 for generating new melody data by shifting each scale of the melody data by, for example, 3 octaves.
[0130]
As shown in the figure, each scale of melody data 1-3 is shifted up by 3 octaves, generating melody data 1-4, which consists of the scale sequence “C6, C6, D6, E6, C6, C6, D6, E6, C6 (+12)”. That is, melody data 1-4 is newly generated from melody data 1-1 prepared in advance.
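The octave shift of FIG. 14 amounts to adding a fixed number to the octave digit of every base note while leaving relative offsets such as (+12) untouched. A minimal Python sketch, assuming the same (note, semitone offset) representation as a hypothetical encoding and note names with non-negative octave numbers:

```python
def shift_octaves(note, octaves):
    """Shift a note name such as 'C3' or 'D#3' by whole octaves.

    Assumes a non-negative octave number at the end of the name.
    """
    name = note.rstrip("0123456789")
    octave = int(note[len(name):])
    return f"{name}{octave + octaves}"


def shift_melody(melody, octaves):
    """Shift every base note; relative offsets such as (+12) are unchanged."""
    return [(shift_octaves(note, octaves), offset) for note, offset in melody]


def render(melody):
    """Format as in the text, e.g. 'C6 (+12)'."""
    return ", ".join(f"{note} ({offset:+d})" if offset else note
                     for note, offset in melody)


# Melody data 1-3: "C3, C3, D3, E3, C3, C3, D3, E3, C3 (+12)"
MELODY_1_3 = [("C3", 0), ("C3", 0), ("D3", 0), ("E3", 0),
              ("C3", 0), ("C3", 0), ("D3", 0), ("E3", 0), ("C3", 12)]

if __name__ == "__main__":
    # Melody data 1-4: shift up by 3 octaves.
    print(render(shift_melody(MELODY_1_3, 3)))
```

A negative `octaves` value covers the case, mentioned above, of lowering melody data that comes out relatively high.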
[0131]
The melody data generated by the melody generation unit 32 is output to the voice synthesis unit 34.
[0132]
In step S8, the speech synthesizer 34 reproduces the melody data supplied from the melody generation unit 32 and outputs it from the speaker 14. The duration of each output sound may be the same as in the melody data stored in advance in the melody data storage unit 33, or it may be converted based on, for example, the extracted pitch frequency, the time at which the pitch frequency was detected, or the extent of the detection section.
[0133]
FIG. 15 is a diagram for explaining other melody data stored in the melody data storage unit 33 and its conversion process.
[0134]
In FIG. 15, the sound sequence of melody data 2-1 is “C3, E3, G3, E3, G3, C3 (−5), E3, G3, E3, G3”. When the scales “C3, D3, E3” are selected from the keyword “Larala” as described above, “C3, E3, G3” in melody data 2-1 are converted to “C3, D3, E3”, respectively (process of arrow (1)), yielding melody data 2-2, which consists of the sound sequence “C3, D3, E3, D3, E3, C3 (−5), D3, E3, D3, E3”. That is, at this point, scales 2 to 5 and scales 7 to 10 of melody data 2-1 have each been converted to other scales.
[0135]
Further, the melody generation unit 32 shifts each sound of melody data 2-2 up by 3 octaves (process of arrow (2)), generating melody data 2-3, which consists of the sound sequence “C6, D6, E6, D6, E6, C6 (−5), D6, E6, D6, E6”. Melody data 2-3 is then reproduced, and the corresponding sound is output.
[0136]
FIG. 16 is a diagram for explaining still another melody data stored in the melody data storage unit 33 and its conversion process.
[0137]
In FIG. 16, the sound sequence of melody data 3-1 is “C3, E3, G3, C3 (+12), C3, E3, G3, C3 (+12), C3 (−5), C3 (−1)”. When the scales “C3, D3, E3” are selected from the keyword “Larala” as described above, “C3, E3, G3” in melody data 3-1 are converted to “C3, D3, E3”, respectively (process of arrow (1)), yielding melody data 3-2, which consists of the sound sequence “C3, D3, E3, C3 (+12), C3, D3, E3, C3 (+12), C3 (−5), C3 (−1)”. That is, at this point, scales 2 and 3 and scales 6 and 7 of melody data 3-1 have each been converted to other scales.
[0138]
The melody generation unit 32 shifts each sound of melody data 3-2 up by 3 octaves (process of arrow (2)), generating melody data 3-3, which consists of the sound sequence “C6, D6, E6, C6 (+12), C6, D6, E6, C6 (+12), C6 (−5), C6 (−1)”. Melody data 3-3 is then reproduced, and the corresponding sound is output.
[0139]
As described above, melody data prepared in advance is converted based on the pitch frequency extracted from the input voice of the keyword, so even when several users utter the same keyword, different sounds are output from the same melody data.
[0140]
Therefore, since the sound output from the pet robot 1 becomes unpredictable, the communication with the pet robot 1 becomes more interesting.
[0141]
In the above, the scales corresponding to the extracted pitch frequencies are selected, and the melody data obtained by conversion based on those scales is shifted by a predetermined number of octaves. Alternatively, the scales corresponding to the pitch frequencies may first be selected and shifted by a predetermined number of octaves, and the shifted scales then used to convert the melody data prepared in advance.
[0142]
In this case, for example, the scales “C3, D3, E3” selected based on the pitch frequencies of the keyword “Larala” are shifted up by 3 octaves to become “C6, D6, E6”. Then, for example, “C3, E3, G3” of melody data 1-1 in FIG. 11 are converted to “C6, D6, E6”, respectively, generating melody data 1-4, “C6, C6, D6, E6, C6, C6, D6, E6, C6 (+12)”.
[0143]
Further, the scales selected based on the pitch frequency may be shifted not only by a predetermined number of octaves but also into the frequency range in which the pet robot 1 utters sounds.
[0144]
FIG. 17 is a diagram illustrating a process of transitioning a scale selected based on the pitch frequency to a predetermined frequency range.
[0145]
FIG. 17A is a diagram illustrating an example of scales selected based on the pitch frequency. As described above, the scales “C3, D3, E3” are selected based on the keyword “Larala”. These selected scales are then shifted, as shown in FIG. 17B, so that they always fall within the standard utterance range of the pet robot 1.
[0146]
In this example, the central scale “D3” of the selected scales is moved to “C6”.
[0147]
When the melody data is converted based on the scales “B5, C6, D6” thus shifted into the standard utterance range of the pet robot 1, melody data 1-1 is converted into the melody data “B5, B5, C6, D6, B5, B5, C6, D6, B5 (+12)”. That is, the scales of the converted melody data all lie within the standard utterance range of the pet robot 1, “F5” to “F6”, shown in FIG. 17.
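The figures in this example are not a uniform semitone transposition: moving D3 to C6 is +34 semitones, which would take C3 to A#5 rather than B5. They are, however, consistent with shifting every note by the same number of letter-name (white-key) steps. The Python sketch below assumes that reading; it handles only natural notes, and the function names and the F5 to F6 range check are illustrative.

```python
LETTERS = "CDEFGAB"


def step_index(note):
    """Position of a natural note on the white keys, e.g. 'C0' -> 0, 'D3' -> 22."""
    letter, digits = note[0], note[1:]
    if letter not in LETTERS or not digits.isdigit():
        raise ValueError(f"only natural notes are supported: {note}")
    return int(digits) * 7 + LETTERS.index(letter)


def from_step_index(index):
    """Inverse of step_index."""
    return f"{LETTERS[index % 7]}{index // 7}"


def move_center_to(scales, target):
    """Shift all notes by the same number of white-key steps so that the
    central note of `scales` lands on `target` (assumed interpretation)."""
    center = scales[len(scales) // 2]
    delta = step_index(target) - step_index(center)
    return [from_step_index(step_index(note) + delta) for note in scales]


def within(note, low="F5", high="F6"):
    """True if the note lies in the utterance range low..high (inclusive)."""
    return step_index(low) <= step_index(note) <= step_index(high)


if __name__ == "__main__":
    shifted = move_center_to(["C3", "D3", "E3"], "C6")
    print(shifted)                       # ['B5', 'C6', 'D6']
    print(all(within(n) for n in shifted))  # True
```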
[0148]
By shifting the scales selected based on the pitch frequency into various ranges in this way, the sound output from the pet robot 1 can be controlled. This prevents the pitch of the input voice from being reflected directly in the sound output by the pet robot 1. Without such a frequency-range shift, a high-pitched input voice would cause the pet robot 1 to output a correspondingly high-pitched sound.
[0149]
In the above description, the melody data is converted based on scales corresponding to the pitch frequency of the voice; however, the melody data can also be converted based on various other information and output.
[0150]
Next, the process in which the pet robot 1 generates melody data using scales selected based on a change of emotion and reproduces it will be described with reference to the flowchart of FIG. 18.
[0151]
In step S21, the behavior management unit 27 determines, based on the internal information notified from the instinct/emotion management unit 26, whether a change in the emotion of the pet robot 1 has been detected, and waits until such a change is detected.
[0152]
If the behavior management unit 27 determines in step S21 that a change in emotion has been detected, the process proceeds to step S22, where it notifies the melody generation unit 32 that the emotion of the pet robot 1 has changed, together with the identification information of the melody data to be converted.
[0153]
In step S23, the melody generation unit 32 selects a scale to be used for converting the melody data based on the changed emotion when it is notified from the behavior management unit 27 that the emotion has changed.
[0154]
FIG. 19 is a diagram showing an example of correspondence between the changed emotion of the pet robot 1 and the scale, and such correspondence is prepared in advance in the melody generation unit 32.
[0155]
In this example, when the emotion changes to “JOY”, the melody generation unit 32 selects the scale of “C, E, G”, and the emotion changes to “SAD (sadness)”. When the scale of “C, D #, G” is selected and the emotion changes to “ANGRY”, the scale of “C, C, F” is selected. The melody generation unit 32 selects the scale of “C, D #, F #” when the emotion changes to “SURPRISE (surprise)”, and when the emotion changes to “DISGUST (disgust)” The scale of “C, F #, C” is selected, and when the emotion changes to “FEAR”, the scale of “C, C #, C” is selected.
[0156]
For example, if major-key sounds are associated with “JOY” and minor-key sounds with “SAD (sadness)”, a generally bright sound is output when the pet robot 1 is happy, and a generally dark, sad sound is output when it is sad. In this way, the emotion of the pet robot 1 can be expressed.
[0157]
Furthermore, since it is natural to raise the pitch when the emotion value is large, the pitch may be shifted by +10 when, for example, the emotion value is 10.
[0158]
In step S24, the melody generation unit 32 reads out the melody data instructed from the behavior management unit 27 from the melody data storage unit 33, and converts the melody data based on the selected scale.
[0159]
FIG. 20 is a diagram for explaining melody data conversion processing.
[0160]
FIG. 20 shows an example in which melody data 1-1 shown in FIG. 11 is converted after the melody generation unit 32 has been notified that the emotion has changed to “ANGRY (angry)”.
[0161]
Accordingly, the melody generation unit 32 selects the scales “C, C, F” from the correspondence table shown in FIG. 19 and converts melody data 1-1 based on them. For example, the melody generation unit 32 sets the selected scales “C, C, F” to “C6, C6, F6” so that the output sound falls within the standard utterance range of the pet robot 1, and converts “C3, E3, G3” of melody data 1-1 to “C6, C6, F6”, respectively.
[0162]
Then, melody data 1-5 including a sequence of sounds “C6, C6, C6, F6, C6, C6, C6, F6, C6 (+12)” is generated by the above-described conversion.
[0163]
Similarly, when notified, for example, that the emotion of the pet robot 1 has changed to “SAD (sadness)”, the melody generation unit 32 selects the scales “C, D#, G” and sets them to “C6, D#6, G6” so that the output sound falls within the standard utterance range of the pet robot 1. The melody generation unit 32 then converts melody data 1-1 using “C6, D#6, G6” and generates new melody data consisting of the sequence “C6, C6, D#6, G6, C6, C6, D#6, G6, C6 (+12)”.
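The emotion-driven conversion can be sketched in Python as a lookup table plus the same substitution used for voice-derived scales. The table is transcribed from the correspondence listed for FIG. 19; placing the notes in octave 6 follows the ANGRY and SAD examples. The (note, semitone offset) melody encoding, the first-appearance mapping rule, and all names are assumptions, and a repeated “C” in a table entry is treated literally as the same C.

```python
# Emotion -> scales, as listed for FIG. 19.
EMOTION_SCALES = {
    "JOY": ["C", "E", "G"],
    "SAD": ["C", "D#", "G"],
    "ANGRY": ["C", "C", "F"],
    "SURPRISE": ["C", "D#", "F#"],
    "DISGUST": ["C", "F#", "C"],
    "FEAR": ["C", "C#", "C"],
}


def scales_for(emotion, octave=6):
    """Place the emotion's note names in the given octave, e.g. 'D#' -> 'D#6'."""
    return [f"{name}{octave}" for name in EMOTION_SCALES[emotion]]


def convert_melody(melody, new_scales):
    """Replace the k-th distinct base note (first-appearance order) with new_scales[k]."""
    distinct = []
    for note, _offset in melody:
        if note not in distinct:
            distinct.append(note)
    if len(distinct) > len(new_scales):
        raise ValueError("melody uses more distinct notes than scales supplied")
    mapping = dict(zip(distinct, new_scales))
    return [(mapping[note], offset) for note, offset in melody]


def render(melody):
    return ", ".join(f"{note} ({offset:+d})" if offset else note
                     for note, offset in melody)


# Melody data 1-1: "C3, C3, E3, G3, C3, C3, E3, G3, C3 (+12)"
MELODY_1_1 = [("C3", 0), ("C3", 0), ("E3", 0), ("G3", 0),
              ("C3", 0), ("C3", 0), ("E3", 0), ("G3", 0), ("C3", 12)]

if __name__ == "__main__":
    print(render(convert_melody(MELODY_1_1, scales_for("ANGRY"))))  # melody data 1-5
    print(render(convert_melody(MELODY_1_1, scales_for("SAD"))))
```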
[0164]
The melody data generated in this way is supplied to the speech synthesizer 34 and is reproduced by the speech synthesizer 34 in step S25.
[0165]
As described above, based on various information detected by the pet robot 1, melody data prepared in advance can be converted and new melody data can be generated. For example, a ratio of a predetermined color (light) included in an image captured by the camera 12 may be extracted, and a scale may be selected based on the ratio, or an external pressure detected by the touch sensor 13 A musical scale may be selected on the basis of the level and new melody data may be generated using the scale. Also, new melody data can be generated based on the volume of the sound collected by the microphone 11.
[0166]
This allows the pet robot 1 to output countless patterns of sounds (melodies) that the user cannot predict. In addition, the amount of melody data that must be prepared in advance can be reduced.
[0167]
The various processes described above may be executed not only by an animal-type robot as shown in FIG. 1 but also by, for example, a humanoid robot capable of bipedal walking or a virtual robot acting within a computer.
[0168]
The series of processes described above can be executed by hardware, but can also be executed by software. In this case, the information processing apparatus that executes the software is configured by a personal computer as shown in FIG. 21, for example.
[0169]
In FIG. 21, the CPU 71 executes various processes according to a program stored in a ROM (Read Only Memory) 72 or a program loaded from a storage unit 78 into a RAM (Random Access Memory) 73. The RAM 73 also appropriately stores data necessary for the CPU 71 to execute various processes.
[0170]
The CPU 71, ROM 72, and RAM 73 are connected to each other via a bus 74. An input / output interface 75 is also connected to the bus 74.
[0171]
Connected to the input/output interface 75 are an input unit 76 including a keyboard and a mouse; an output unit 77 including a display such as a CRT (Cathode Ray Tube) or an LCD (Liquid Crystal Display) and a speaker; a storage unit 78 including a hard disk; and a communication unit 79 including a modem, a terminal adapter, and the like. The communication unit 79 performs communication processing via a network.
[0172]
A drive 80 is connected to the input / output interface 75 as necessary, and a magnetic disk 81, an optical disk 82, a magneto-optical disk 83, a semiconductor memory 84, or the like is appropriately mounted.
[0173]
When the series of processes is executed by software, a program constituting the software is installed from a network or a recording medium into a general-purpose personal computer as shown in FIG. 21.
[0174]
As shown in FIG. 21, the recording medium may consist of package media that are distributed separately from the apparatus main body to provide the program to the user and on which the program is recorded, such as a magnetic disk 81 (including a floppy disk), an optical disk 82 (including a CD-ROM (Compact Disk-Read Only Memory) and a DVD (Digital Versatile Disk)), a magneto-optical disk 83 (including an MD (registered trademark) (Mini-Disk)), or a semiconductor memory 84. It may also consist of the ROM 72 on which the program is recorded, or the hard disk included in the storage unit 78, provided to the user preinstalled in the apparatus main body.
[0175]
In this specification, the steps describing the program recorded on the recording medium include not only processing performed in time series in the described order but also processing that is not necessarily performed in time series and is instead executed in parallel or individually.
[0176]
【Effects of the Invention】
According to the first robot apparatus and the program of the present invention, a more suitable response can be made in accordance with an utterance from the user.
[0177]
According to the second robot apparatus and the program of the present invention, sounds that the user cannot predict can be generated.
[Brief description of the drawings]
FIG. 1 is a perspective view showing an example of the appearance of a pet robot to which the present invention is applied.
FIG. 2 is a block diagram showing an example of the internal configuration of the pet robot of FIG. 1.
FIG. 3 is a block diagram illustrating an example of the functional configuration of the pet robot of FIG. 1.
FIG. 4 is a diagram schematically illustrating an example of the function of the instinct/emotion management unit in FIG. 3.
FIG. 5 is a diagram schematically illustrating an example of the function of the behavior management unit in FIG. 3.
FIG. 6 is a diagram illustrating an example of a finite automaton.
FIG. 7 is a diagram illustrating an example of state transition probabilities.
FIG. 8 is a flowchart for explaining processing of the pet robot of FIG. 1.
FIG. 9 is a diagram illustrating processing for extracting a pitch frequency.
FIG. 10 is a diagram illustrating an example of correspondence between musical scales and pitch frequencies.
FIG. 11 is a diagram illustrating an example of melody data.
FIG. 12 is a diagram for explaining processing for converting melody data.
FIG. 13 is a diagram illustrating another process for converting melody data.
FIG. 14 is a diagram illustrating still another process for converting melody data.
FIG. 15 is a diagram illustrating a process for converting other melody data.
FIG. 16 is a diagram illustrating a process for converting still other melody data.
FIG. 17 is a diagram for explaining musical scale transition.
FIG. 18 is a flowchart for explaining another process of the pet robot of FIG. 1.
FIG. 19 is a diagram illustrating an example of correspondence between emotions and musical scales.
FIG. 20 is a diagram for explaining processing for converting melody data.
FIG. 21 is a block diagram illustrating an example of a personal computer.
[Explanation of symbols]
10 controller, 11 microphone, 14 speaker, 21 AD conversion unit, 22 speech feature analysis unit, 23 speech segment detection unit, 24 speech recognition unit, 26 instinct/emotion management unit, 27 behavior management unit, 28 speech pitch analysis unit, 32 melody generation unit, 33 melody data storage unit, 34 speech synthesis unit, 35 DA conversion unit, 81 magnetic disk, 82 optical disk, 83 magneto-optical disk, 84 semiconductor memory

Claims

1. A robot apparatus comprising:
storage means for storing first melody data composed of a first scale;
extraction means for extracting the pitch frequency of an input voice;
selection means for selecting a second scale based on the pitch frequency extracted by the extraction means;
generation means for generating second melody data by converting the first scale constituting the first melody data stored in the storage means into the second scale selected by the selection means; and
reproduction means for reproducing the second melody data generated by the generation means.

2. The robot apparatus according to claim 1, wherein the extraction means extracts the pitch frequency of the voice representing a predetermined keyword.

3. The robot apparatus according to claim 1, wherein the extraction means extracts the pitch frequency of the voice including vowels.

4. The robot apparatus according to claim 1, wherein the extraction means extracts, as the pitch frequency, an average value of pitch frequencies detected during a predetermined period of the voice.

5. The robot apparatus according to claim 1, wherein the generation means generates the second melody data by further shifting the second scale by a predetermined number of octaves.

6. A recording medium on which a computer-readable program is recorded, the program comprising:
a storage control step of controlling storage of first melody data composed of a first scale;
an extraction step of extracting the pitch frequency of an input voice;
a selection step of selecting a second scale based on the pitch frequency extracted by the processing of the extraction step;
a generation step of generating second melody data by converting the first scale constituting the first melody data stored by the processing of the storage control step into the second scale selected by the processing of the selection step; and
a reproduction control step of controlling reproduction of the second melody data generated by the processing of the generation step.

7. A program that causes a computer to execute processing comprising:
a storage control step of controlling storage of first melody data composed of a first scale;
an extraction step of extracting the pitch frequency of an input voice;
a selection step of selecting a second scale based on the pitch frequency extracted by the processing of the extraction step;
a generation step of generating second melody data by converting the first scale constituting the first melody data stored by the processing of the storage control step into the second scale selected by the processing of the selection step; and
a reproduction control step of controlling reproduction of the second melody data generated by the processing of the generation step.

8. A robot apparatus comprising:
storage means for storing first melody data composed of a first scale;
management means for managing its own internal state;
selection means for selecting, when the internal state managed by the management means changes, a second scale corresponding to the change in the internal state;
generation means for generating second melody data by converting the first scale constituting the first melody data stored in the storage means into the second scale selected by the selection means; and
reproduction means for reproducing the second melody data generated by the generation means.

9. The robot apparatus according to claim 8, wherein the generation means generates the second melody data by further shifting the second scale by a predetermined number of octaves.

10. A recording medium on which a computer-readable program is recorded, the program comprising:
a storage control step of controlling storage of first melody data composed of a first scale;
a management step of managing its own internal state;
a selection step of selecting, when the internal state managed by the processing of the management step changes, a second scale corresponding to the change in the internal state;
a generation step of generating second melody data by converting the first scale constituting the first melody data stored by the processing of the storage control step into the second scale selected by the processing of the selection step; and
a reproduction control step of controlling reproduction of the second melody data generated by the processing of the generation step.

11. A program that causes a computer to execute processing comprising:
a storage control step of controlling storage of first melody data composed of a first scale;
a management step of managing its own internal state;
a selection step of selecting, when the internal state managed by the processing of the management step changes, a second scale corresponding to the change in the internal state;
a generation step of generating second melody data by converting the first scale constituting the first melody data stored by the processing of the storage control step into the second scale selected by the processing of the selection step; and
a reproduction control step of controlling reproduction of the second melody data generated by the processing of the generation step.