JP2000181896A

JP2000181896A - Learning type interaction device

Info

Publication number: JP2000181896A
Application number: JP10352707A
Authority: JP
Inventors: Kazuo Ishii; 和夫石井
Original assignee: ATR Media Integration and Communication Research Laboratories
Current assignee: ATR Media Integration and Communication Research Laboratories
Priority date: 1998-12-11
Filing date: 1998-12-11
Publication date: 2000-06-30

Abstract

PROBLEM TO BE SOLVED: To provide a learning-type interaction device capable of learning the association between perception and action while interacting autonomously. SOLUTION: A classification unit 8 classifies speech patterns input from a microphone 1, which serves as the perceptual sensor, while dynamically generating classes. An association unit 9 associates the classes with one another. A sufficiency calculation unit 10 calculates the sufficiency. An expectation generation unit 11 searches for expectation classes based on these. An activity calculation unit 12 decides whether to act spontaneously, and an utterance search unit 13 searches for the class to be uttered. Speech, as the optimal action pattern, is output from a speaker 2, which serves as the actuator, through an utterance generation unit 14 and a waveform generation unit 15.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a learning-type interaction device, and more particularly to one that autonomously perceives the state of the external world and takes an optimal action so as to interact spontaneously, and that can further learn the optimal action through that interaction.

[0002]

2. Description of the Related Art

Conventionally, there are interactive devices that decode the meaning of a speaker's utterance and generate an optimal response from that meaning.

[0003]

PROBLEMS TO BE SOLVED BY THE INVENTION

In conventional interactive devices, however, the classes of patterns to be perceived are given in advance at design time, and the optimal behavior pattern for each class is likewise given in advance. That is, the classification of patterns into classes is fixed by the designer beforehand, and the device cannot respond to anything outside those classes.

[0004] As for spontaneous behavior, a conventional interactive device produces only system-initiated utterances that ask about empty slots for which its knowledge is missing.

[0005] Furthermore, in systems that interact with humans, the emotional acts needed to lend the system a human-like or lifelike quality and to form empathic interaction have been generated from a so-called emotion model, which is described as transitions between states such as anger and sadness, and the mapping from emotions to acts has also been described in advance.

[0006] The present invention has been made to solve these problems. Its object is to provide a learning-type interaction device that autonomously acquires optimal action selection through interaction and can generate spontaneous acts and emotional acts.

[0007]

MEANS FOR SOLVING THE PROBLEMS

The learning-type interaction device according to claim 1 comprises: sensing means for perceiving the external world; classification means for classifying perceived patterns while dynamically generating classes; means for associating the classified classes with one another and changing the association weights so as to select the optimal output pattern for a perceived pattern; and an actuator for outputting that optimal output pattern to the external world.

[0008] The learning-type interaction device according to claim 2 is the device according to claim 1, further comprising expectation means. The expectation means includes means for expecting the pattern to be perceived, means for calculating the deviation between the expected pattern and the pattern actually perceived by the sensing means, and means for selecting a strategy for generating expectations on the basis of an internal state derived from that deviation.

[0009] The learning-type interaction device according to claim 3 is the device according to claim 1 or claim 2, wherein the sensing means includes a microphone that perceives the external world and speech analysis means that analyzes speech and outputs the resulting parameters to the classification means, and the actuator includes a speaker that outputs to the external world and speech generation means that generates speech as the optimal output pattern. Through spoken dialogue, the device generates a system of a specific language.

[0010] The learning-type interaction device according to claim 4 is the device according to claim 3, wherein the speech analysis means analyzes the power of the speech, and the speech generation means generates highly natural speech by smoothly varying the power and frequency of the analyzed speech.

[0011] The learning-type interaction device according to claim 5 is the device according to claim 3, wherein the speech analysis means analyzes both speech perceived from the external world and speech generated by the speech generation means.

[0012]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[Embodiment 1] A learning-type interaction device according to Embodiment 1 of the present invention will be described with reference to FIG. 1. FIG. 1 is a block diagram outlining the overall configuration of the device. Referring to FIG. 1, the learning-type interaction device includes a microphone 1, a speaker 2, an A/D conversion unit 3, a power calculation unit 4, a speech section detection unit 5, a normalization unit 6, a distance calculation unit 7, a classification unit 8, and an association unit 9.

[0013] The learning-type interaction device according to Embodiment 1 is a spoken dialogue system with the microphone 1 as its perceptual sensor and the speaker 2 as its behavioral actuator. The speech signal input from the microphone 1 is digitized by the A/D conversion unit 3 at 16 kHz and 16 bits.

[0014] The power calculation unit 4 calculates the dialogue speech power every 512 samples (one frame shift). With frame length L, the dialogue speech power logp is given by Equation (1).

[0015]

(Equation 1)

[0016] For example, the calculation uses a frame length of L = 512 or L = 768. To remove fluctuations in the background sound, the power is raised to a minimum (floor) level using Equation (2).

[0017]

(Equation 2)
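
The power calculation described above can be sketched as follows. Equations (1) and (2) are not reproduced in this text, so the sketch assumes a conventional mean-square log power in dB and a simple floor at the background level (30 dB is the example value the description gives for backgroundlevel). The function name `frame_log_power` and its parameters are illustrative and do not come from the patent.

```python
import math


def frame_log_power(samples, background_level_db=30.0):
    """Log power of one frame in dB, floored at a background level.

    Assumed form (the patent's own equations are not available here):
        logp = 10 * log10(mean(x^2))
        logp = max(logp, background_level_db)
    """
    if not samples:
        raise ValueError("frame must contain at least one sample")
    mean_square = sum(x * x for x in samples) / len(samples)
    if mean_square <= 0.0:
        # A completely silent frame has no finite log power; return the floor.
        return background_level_db
    logp = 10.0 * math.log10(mean_square)
    return max(logp, background_level_db)
```

With this form, a frame of constant amplitude 1000 gives 60 dB, and any frame quieter than the floor is reported as 30 dB, which keeps small background fluctuations from reaching the later stages.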

[0018] For example, backgroundlevel = 30 (dB). The speech section detection unit 5 obtains the average power for each frame and judges that speech has been input when the difference from it exceeds a fixed value. The average power avepow is updated using Equation (3).

[0019]

(Equation 3)

[0020] Note that this update applies while speech is in the off state. While speech is in the on state, to reflect habituation to long utterances, the average power avepow is obtained using Equation (4) from the speech on-time (spcnt) and the habituation time constant MAX_UTTER_CNT_ARTICULATE.

[0021]

(Equation 4)

[0022] The state is switched on when the value (logp - avepow) has remained above a predetermined threshold THON for a predetermined number of frames CNTON, and switched off when it has remained below a predetermined threshold THOFF for a predetermined number of frames CNTOFF.
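
The on/off decision with THON, CNTON, THOFF, and CNTOFF amounts to a two-threshold state machine with run-length counters. The following is a minimal sketch, assuming the per-frame differences (logp - avepow) have already been computed; in the device itself avepow is updated differently in the on and off states, which this sketch leaves out. The function and parameter names are illustrative.

```python
def detect_speech_states(diffs, th_on, cnt_on, th_off, cnt_off):
    """Return one boolean per frame (True = speech on).

    diffs   -- sequence of (logp - avepow) values, one per frame
    th_on   -- THON: the difference must stay above this to switch on
    cnt_on  -- CNTON: number of consecutive frames required to switch on
    th_off  -- THOFF: the difference must stay below this to switch off
    cnt_off -- CNTOFF: number of consecutive frames required to switch off
    """
    state = False  # start in the off state
    run = 0        # length of the current qualifying run
    states = []
    for d in diffs:
        if not state:
            run = run + 1 if d > th_on else 0
            if run >= cnt_on:
                state, run = True, 0
        else:
            run = run + 1 if d < th_off else 0
            if run >= cnt_off:
                state, run = False, 0
        states.append(state)
    return states
```

Using separate thresholds and counts for the two directions gives hysteresis: a brief dip inside an utterance does not end the speech section, and a brief spike in silence does not start one.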

[0023] Along with the external speech input, the speech section detection unit 5 also receives the power data of the system's own utterances. As a result, the system's own utterance data and the externally input utterance data are segmented (divided into speech sections) in an equivalent way, and the subsequent processing can treat them equivalently as well.

[0024] The speech section detection unit 5 outputs the on/off state of the speech input and, when the speech turns off, the time-series pattern of the speech power. External speech data is tagged with the flag ch = 0 and internally generated utterances with the flag ch = 1 to distinguish them.

[0025] When the microphone 1 and the speaker 2 are close together, the system's own utterance enters the microphone 1. To avoid this, the power contributed by the system's utterance is subtracted when the power is calculated. With logp(t) denoting the dialogue power of frame t and OutPower(t) the system's utterance power, the dialogue power logp(t) is given by Equation (5).

[0026]

(Equation 5)

[0027] Here, the coefficients a(i) and the order N represent the equivalent transfer characteristic from output to input. For example, a(1) = 1.0, a(2) = 1.0, and N = 2.

[0028] The normalization unit 6 cuts an utterance that is too long down to the longest length the system accepts, applying a weighting to the final part, and pads the end of an utterance that is too short with zeros to bring it up to the shortest length the system accepts.

[0029] The distance calculation unit 7 calculates the distance between the input speech pattern and the pattern classes held inside the system, and finds the class at minimum distance. The pattern classes held by the system are described with reference to FIG. 2, which shows the hierarchical structure used when classifying patterns. Referring to FIG. 2, the learning-type interaction device has classes organized in a plurality of layers, and the classes form a tree structure. FIG. 2 shows a three-layer example.

[0030] The first layer N1 contains classes C1 for classification by time (hereinafter, time classes). The second layer N2 contains classes C2 for coarse classification of patterns (hereinafter, major classes). The third layer N3 contains classes C3 for finer classification of patterns (hereinafter, minor classes). Each class in the second layer (major classes) and the third layer (minor classes) has a class representative pattern that represents the class.

[0031] A pattern is first classified by time class C1, that is, by pattern length. For example, 16 time classes C1 are prepared, each spanning 4 frames: lengths of 12 to 15 frames fall in time class C1(0), lengths of 16 to 19 frames in time class C1(1), and so on up to lengths of 72 to 75 frames in time class C1(15). Patterns outside this range have already been normalized by the normalization unit 6.
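
The mapping from pattern length to time class in this example reduces to integer division. A minimal sketch using the example values (16 classes of 4 frames each, starting at 12 frames); the function name is illustrative.

```python
def time_class_index(length, first=12, width=4, count=16):
    """Return the index i of time class C1(i) for a pattern of `length` frames.

    With the example values, lengths 12..15 map to 0, 16..19 to 1,
    and 72..75 to 15.
    """
    if not first <= length < first + width * count:
        raise ValueError("length outside the time class range; normalize first")
    return (length - first) // width
```

The range check reflects the statement that out-of-range lengths are handled by the normalization unit before classification.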

[0032] After the pattern has been assigned to a time class C1, its distance to the major classes C2 of the second layer N2 is calculated. With x(t) denoting the input pattern and c(t) the class representative pattern, the distance D is given by Equation (6).

[0033]

(Equation 6)

[0034] Here, L denotes the pattern length. Any shortfall in the length of the input pattern is padded with zeros. Using Equation (6), the input pattern is assigned to the major class C2 at minimum distance within the second layer. The distance calculation is then repeated for the minor classes C3 of the third layer to find the minor class C3 at minimum distance. The classification error at this point is denoted Dmin.
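
The nearest-class search can be sketched as follows. Equation (6) is not reproduced in this text, so the distance here is assumed to be the mean absolute difference over the pattern length, with the shorter pattern zero-padded; the actual metric in the patent may differ. The names `pattern_distance` and `nearest_class` are illustrative.

```python
def pattern_distance(x, c):
    """Mean absolute difference between two patterns (assumed metric).

    The shorter pattern is zero-padded to the length of the longer one.
    """
    length = max(len(x), len(c))
    if length == 0:
        return 0.0
    xp = list(x) + [0.0] * (length - len(x))
    cp = list(c) + [0.0] * (length - len(c))
    return sum(abs(a - b) for a, b in zip(xp, cp)) / length


def nearest_class(x, representatives):
    """Return (class name, distance) of the closest representative pattern.

    representatives -- dict mapping a class name to its representative pattern
    """
    if not representatives:
        raise ValueError("no classes to compare against")
    best = min(representatives, key=lambda name: pattern_distance(x, representatives[name]))
    return best, pattern_distance(x, representatives[best])
```

Applied first to the major classes and then to the minor classes under the chosen major class, the distance returned by the second call plays the role of Dmin.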

[0035] Referring to FIG. 1, the classification unit 8 generates a new class when the classification error Dmin exceeds the class generation criterion for major classes or for minor classes, and otherwise assigns the pattern to the class at minimum distance.

[0036] Specifically, when the classification error Dmin is larger than the distance MjD, the criterion for generating a major class, a new major class whose representative pattern is the input pattern is generated, and beneath it a minor class whose representative pattern is the input pattern is generated. The input pattern is then assigned to this class.

[0037] When the classification error Dmin is smaller than the distance MjD, it is compared with the distance MrD, the criterion for generating a minor class. If Dmin is larger than MrD, a new minor class is generated and the input pattern is assigned to it.

[0038] When a new minor class is generated, an association link pointing to the class itself is generated. In addition, each minor class stores a cumulative classification error, to which the classification error is added each time an input pattern is assigned to that class. A new minor class is also generated when the cumulative classification error exceeds a value MrCumD, and when the number of entries in the minor class's association list exceeds a value MRCLASS_COMPLEXITY. The thresholds (criterion values) given here are not fixed and can be changed according to the state of learning.
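
The class generation rules above can be collected into one decision function. This is a sketch of the decision order only: the parameters `mj_d`, `mr_d`, `mr_cum_d`, and `max_links` stand for MjD, MrD, MrCumD, and MRCLASS_COMPLEXITY, the returned labels are hypothetical, and whether the cumulative error is compared before or after adding the current error is not stated in the text (here the caller passes whatever value it wants tested).

```python
def classification_action(d_min, mj_d, mr_d,
                          cum_error=0.0, mr_cum_d=float("inf"),
                          link_count=0, max_links=float("inf")):
    """Decide what to do with an input pattern whose classification error is d_min."""
    if d_min > mj_d:
        # Too far from every major class: new major class plus a minor class under it.
        return "new_major_and_minor"
    if d_min > mr_d:
        # Close enough to a major class, but not to any of its minor classes.
        return "new_minor"
    if cum_error > mr_cum_d:
        # The nearest minor class has accumulated too much classification error.
        return "new_minor"
    if link_count > max_links:
        # The nearest minor class has too many association links.
        return "new_minor"
    return "assign_to_nearest"
```

Because the thresholds are ordinary arguments, they can be varied as learning progresses, in line with the remark that the criterion values are not fixed.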

[0039] The association unit 9 associates input utterance classes with one another. Each minor class has a weighted link to itself and weighted links to other minor classes. These associations link utterance acts to perceptions.

[0040] The association of utterance acts with perceptions is described with reference to FIG. 3, which illustrates this association. Referring to FIG. 3, utterances and the causal relations between them are treated as associations between classes, and links are generated accordingly. FIG. 3 shows the relation between class 2 and class 3 for the user's utterances and class 3 held by the system. The symbol interval denotes the interval between utterances. For example, when the user's utterance is classified into class 3 as shown in FIG. 3, a link from class 2 to class 3 is generated or, if it already exists, its weight is adjusted.

[0041] Specifically, class 2 is searched for a link to class 3. If there is no such link, a new one is generated. The weight RelEnergy (hereinafter RE) of the new link is obtained from the interval between utterances interval, the maximum time for association TRel, and the update coefficient RelAlfa according to Equations (7) and (8).

[0042]

(Equation 7)

[0043]

(Equation 8)

[0044] For the links to the other classes, Equation (9) is used.

[0045]

(Equation 9)

[0046] Reducing the weights RE with Equation (9) in this way normalizes the sum of the weights over all links to 1.0.

[0047] When a new class is generated, a link to the class itself is generated with a weight of 1.0. When the link already exists, the weight RE of every link is set according to Equation (10).

[0048]

(Equation 10)

[0049] In addition, Equation (11) is applied to the link to class 3.

[0050]

(Equation 11)

[0051] In this way the weight is increased while the weights remain normalized. When a new class is generated, a link to itself (a self-link) is generated. Because the class at the end of the self-link is then selected as the optimal act, the system can produce a mimicking response to an external input.
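
Equations (7) to (11) are not reproduced in this text, so the exact update rule is unknown. The following sketch only illustrates the two properties the description does state: the targeted link gains weight, and the outgoing weights of a class keep summing to 1.0. The shrink-and-add rule, the linear dependence on the interval, and the names `link_delta` and `reinforce_link` are all assumptions.

```python
def link_delta(interval, t_rel, rel_alfa):
    """Assumed update size: largest for short intervals, zero at or beyond TRel."""
    return rel_alfa * max(0.0, 1.0 - interval / t_rel)


def reinforce_link(weights, target, delta):
    """Return new outgoing link weights with `target` reinforced by `delta`.

    weights -- dict mapping a destination class to its weight RE (summing to 1.0)
    Every existing weight is scaled by (1 - delta) and delta is added to the
    target, so the total stays at 1.0 (assumed rule, not the patent's equations).
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    updated = {dest: w * (1.0 - delta) for dest, w in weights.items()}
    updated[target] = updated.get(target, 0.0) + delta
    return updated
```

Starting from a fresh class with only a self-link of weight 1.0, repeated reinforcement of a link to another class gradually shifts weight away from the self-link, which matches the described progression from mimicking responses toward learned ones.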

[0052] When the association from class 2 to class 3 is generated, a reverse link from class 3 to class 2 is generated at the same time, so that the reverse causal relation is also represented. In this case, the value TRatio mentioned above is multiplied by the value InverseFactor. For example, InverseFactor = 0.1; when the reverse causal relation is not to be represented, InverseFactor = 0.0.

[0053] Referring to FIG. 1, the learning-type interaction device further includes a sufficiency calculation unit 10, an expectation generation unit 11, an activity calculation unit 12, an utterance search unit 13, an utterance generation unit 14, and a waveform generation unit 15.

[0054] The sufficiency calculation unit 10 calculates, as internal parameters, the sufficiency F and its temporal average, the average sufficiency Fave. These play an important role in selecting the strategy for action selection. The sufficiency F is calculated from the deviation between the generated expectation and the actual input, and is defined by Equation (12).

[0055]

(Equation 12)

[0056] Here, the value M represents the degree of match between the input class and the expectation, the value L the deviation between the input class and the expectation, the value Pg the deviation in pattern-similarity distance of the input class, and the value Tg the time lag until the input corresponding to the expectation arrives.

[0057] In practice, the elements are summed with weights chosen according to the range of each element's values. The sufficiency F is a one-dimensional value in this example, but it may be extended to multiple dimensions as an internal space.
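
A weighted sum of this kind might look like the sketch below. Equation (12) is not reproduced in this text; the signs are an assumption based on the roles given to the terms (M measures agreement, while L, Pg, and Tg measure deviations), and the weights and the function name are hypothetical.

```python
def sufficiency(m, l, pg, tg, w_m=1.0, w_l=1.0, w_pg=1.0, w_tg=1.0):
    """One-dimensional sufficiency F as a weighted sum (assumed signs).

    m  -- M, degree of match between input class and expectation (raises F)
    l  -- L, deviation between input class and expectation (lowers F)
    pg -- Pg, pattern-distance deviation of the input class (lowers F)
    tg -- Tg, time lag until the expected input arrives (lowers F)
    """
    return w_m * m - w_l * l - w_pg * pg - w_tg * tg
```

Under this convention F is positive when expectations are being met and negative when deviations dominate, which is the reading the later strategy selection relies on.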

[0058] When there is an input event, that is, an utterance from outside or from the system itself, the classification error Dmin for the input class is assigned to the value Pg. Next, the expectations generated by the expectation generation unit 11 (described later) are searched for the input class. If the input class is among the expectations, then with CE denoting the expectation probability of that class, the value M is updated using Equation (13) and the value L using Equation (14).

[0059]

(Equation 13)

[0060]

(Equation 14)

[0061] If the input class is not among the expectations, the value M is adjusted using Equation (15) and the value L using Equation (16).

[0062]

(Equation 15)

[0063]

(Equation 16)

[0064] So that the values M and L do not diverge, their maximum values are limited using the values Mdec and Linc. The sufficiency F is then calculated from these values using Equation (12).

[0065] The sufficiency F is also updated every frame even when there is no event. The value Tg is increased according to Equation (17) for as long as no utterance is input.

[0066]

(Equation 17)

[0067] When an utterance is input, Tg is set to 0. The values other than Tg are decreased by the weight of each factor, as shown in Equations (18), (19), and (20), so that the deviations decay over time.

[0068]

(Equation 18)

[0069]

(Equation 19)

[0070]

(Equation 20)

[0071] The average sufficiency Fave, the temporal average of the sufficiency F, is calculated by Equation (21), for example.

[0072]

(Equation 21)

[0073] The degree of emotion Emotive used in the activity calculation unit 12 (described later) is calculated according to Equation (22).

[0074]

(Equation 22)

[0075] The degree of emotion Emotive is calculated as the sum of the factors that arouse emotion and is used as the criterion for initiating action. The average sufficiency Fave is a temporally smoothed parameter that corresponds to mood in terms of emotion. In Embodiment 1, because the value Tg is treated as an independent source of action, the average sufficiency Fave is not included in the calculation of the degree of emotion Emotive.

[0076] When an utterance event occurs, the expectation generation unit 11 generates expectations according to the values of the sufficiency F and the average sufficiency Fave. These strategies serve as the strategies for both expectation and action, and give rise to emotional behavior.

[0077] There are two kinds of expectation: expectations in response to an externally input utterance and expectations in response to the system's own utterance. An expectation in response to an externally input utterance is an expectation of the act the system should perform next. An expectation in response to the system's own utterance is an expectation of the class the external user will utter next.

[0078] To generate expectations, probabilities are calculated along the association links starting from the origin class (the input class), and a number of expectation classes suited to the situation is searched for. The processing of the expectation generation unit 11 is described with reference to FIG. 4, which illustrates the calculation for generating expectations and shows the links among classes 1 to 10. In FIG. 4, classes other than class 1 may also have self-links, but these are omitted from the drawing.

[0079] Referring to FIG. 4, classes 1 to 10 have expectation probabilities CE1 to CE10 and are connected to other classes or to themselves through links with the weights RE1 to RE10 generated by association.

[0080] For example, class 1 has four primary links, connecting it to class 1, class 2, class 3, and class 4. When expectations are calculated starting from class 1, which has the expectation probability CE1, the new expectation probability for each of these is calculated as follows. For example, the expectation probability CE1 is calculated according to Equation (23).

[0081]

(Equation 23)

[0082] The expectation probability CE2 for class 2 is calculated according to Equation (24).

[0083]

(Equation 24)

[0084] Here, the coefficient alfa is the weight given to the current expectation probability CE and the coefficient beta is the weight given to the new link. Note that alfa + beta = 1, and the larger beta is, the more readily the expectation shifts.

[0085] The new expectation probabilities CE are calculated and sorted in descending order, and the determined number of classes is selected from the top. The selected probabilities are then normalized so that the expectation probabilities CE sum to 1. When there are few links, the calculation extends to secondary links, searching first from the primary links with the largest expectation probability CE. In FIG. 4 the region enclosed by the broken line is the calculation range, and the calculation extends to the secondary links of class 3. The expectation calculation is thus carried out for a number of classes bounded by the number of expectation probabilities CE or by a limit on the link order.

[0086] The expectation probability CE of every class outside this range is set to 0. Because the expectation is calculated only for the required number of classes, the amount of computation stays small even as the number of classes grows. Moreover, by adaptively setting the trade-off between the amount of computation, which is proportional to the specified number, and the response of the system, the system can be given a character that is more reactive or more deliberate.
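
The selection steps described above (update, sort, keep the top classes, normalize, zero elsewhere) can be sketched as follows for primary links only. Equations (23) and (24) are not reproduced in this text, so the update `alfa * CE + beta * RE` is an assumed form consistent with the stated roles of alfa and beta; the function name, argument layout, and the omission of secondary links are simplifications.

```python
def generate_expectations(ce, links, origin, k, alfa=0.5):
    """Return the k most probable expectation classes, normalized to sum to 1.

    ce     -- dict mapping a class to its current expectation probability CE
    links  -- dict mapping a class to {destination class: link weight RE}
    origin -- the class the calculation starts from (the input class)
    Classes missing from the result have expectation probability 0.
    """
    beta = 1.0 - alfa  # alfa + beta = 1, as stated in the description
    scores = {}
    for dest, re_weight in links.get(origin, {}).items():
        # Assumed update: blend the current CE with the weight of the link.
        scores[dest] = alfa * ce.get(dest, 0.0) + beta * re_weight
    top = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(value for _, value in top)
    if total <= 0.0:
        return {}
    return {dest: value / total for dest, value in top}
```

The parameter `k` is the knob behind the trade-off the description mentions: a small `k` gives a fast, reactive system, a larger `k` a slower, more deliberate one.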

[0087] Selecting the expectation generation strategy according to the internal state, namely the sufficiency F and the average sufficiency Fave, gives a character to the expectation and to the act selected along with it. As one example, the action strategy is selected from the combination of F and Fave as in the following cases.

[0088] When F ≥ 0 and Fave ≥ 0, both the current sufficiency and the time-averaged sufficiency are non-negative, which indicates that the expectations are working very well. In this state the range of expectations can safely be narrowed, which lets the system respond quickly.

[0089] When F ≥ 0 and Fave < 0, the expectations had not been working for a while but the current expectation is working, so expectations are generated over a wider range.

[0090] When the satisfaction degree F < 0 and the average satisfaction degree Fave ≥ 0, expectations had been succeeding for a while, but the current expectation has missed. The expectation is not changed and the previous expectation is maintained as it is. This corresponds to giving weight to the system's own internal expectation, and results in behavior that gives an impression of self-assertive anger.

[0091] When the satisfaction degree F < 0 and the average satisfaction degree Fave < 0, the expectation for an utterance input from outside is generated so as to strengthen the weight of the class that was input from outside. That class then comes to be selected by the utterance search unit 13, so the system imitates its partner. The behavior therefore appears obedient and expresses a state close to sadness. In the expectation for the system's own utterance, the expectation is generated with the system's own class excluded so that the same utterance is not repeated, which produces variety in behavior. Here the average satisfaction degree Fave can also be regarded as a parameter expressing comfort or discomfort.
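The four cases above amount to a lookup on the signs of F and Fave. The following is a minimal sketch of that selection, not the patented implementation; the names `Strategy` and `select_strategy` and the strategy labels are hypothetical and introduced only for illustration.

```python
from enum import Enum


class Strategy(Enum):
    """Hypothetical labels for the four expectation strategies."""
    NARROW = "narrow"    # F >= 0, Fave >= 0: narrow the range, respond quickly
    BROAD = "broad"      # F >= 0, Fave <  0: generate expectations broadly
    KEEP = "keep"        # F <  0, Fave >= 0: keep the previous expectation
    IMITATE = "imitate"  # F <  0, Fave <  0: weight the partner's class


def select_strategy(f: float, f_ave: float) -> Strategy:
    """Select a strategy from the signs of F and Fave."""
    if f >= 0:
        return Strategy.NARROW if f_ave >= 0 else Strategy.BROAD
    return Strategy.KEEP if f_ave >= 0 else Strategy.IMITATE
```

For example, `select_strategy(-0.2, 0.4)` returns `Strategy.KEEP`, the case described as self-assertive.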

[0092] Referring to FIG. 1, the activity calculation unit 12 decides, from external events and expectations, whether the system should act. The result is set as the truth value of the flag Action.

[0093] First, when an external event is input, the flag Action is set to true. This corresponds to the system always responding whenever there is some input. Next, when the absolute value of the average satisfaction degree Fave exceeds a certain value, the flag Action is set to true. As described above, the average satisfaction degree Fave is a temporally smooth parameter that corresponds to mood in emotional terms. This rule means that the system acts spontaneously when things are going well, when the deviation has become large, or otherwise depending on the situation. Furthermore, when the emotion degree Emotive exceeds a certain value, the flag Action is set to true. This corresponds to reacting when an event occurs in the short term.

[0094] The flag Action is also set to true when the value Tg exceeds a certain value. This means that the system prompts its partner to act when there has been no reaction from the partner for a certain time or longer. Once the flag Action has become true because of the average satisfaction degree Fave or the emotion degree Emotive, the thresholds that determine the flag Action are raised for a while so that the flag does not immediately become true a second time. The raised thresholds are then decreased over time.
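The rules for the flag Action can be sketched as follows. This is an illustration under stated assumptions, not the method of the specification: the class name, the numeric thresholds, the additive threshold boost, and the multiplicative decay are all hypothetical, since the text only says that the thresholds are raised and then reduced over time.

```python
from dataclasses import dataclass


@dataclass
class ActivityCalculator:
    """Hypothetical sketch of the activity calculation unit 12."""
    fave_threshold: float = 0.5     # assumed base threshold for |Fave|
    emotive_threshold: float = 0.5  # assumed base threshold for Emotive
    tg_threshold: float = 5.0       # assumed limit on Tg (time without a response)
    boost: float = 1.0              # assumed amount added to thresholds after a trigger
    decay: float = 0.9              # assumed per-step decay of the added amount
    extra: float = 0.0              # current amount added to the thresholds

    def step(self, external_event: bool, f_ave: float,
             emotive: float, tg: float) -> bool:
        """Return the flag Action for one time step."""
        by_mood = abs(f_ave) > self.fave_threshold + self.extra
        by_emotion = emotive > self.emotive_threshold + self.extra
        action = (external_event or by_mood or by_emotion
                  or tg > self.tg_threshold)
        # Let the raised threshold fall back over time.
        self.extra *= self.decay
        # Raise the thresholds again if Fave or Emotive caused the trigger.
        if by_mood or by_emotion:
            self.extra = self.boost
        return action
```

With these assumed values, a sustained Fave of 0.8 triggers an action once, is then suppressed while the raised threshold decays, and triggers again later.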

[0095] The utterance search unit 13 searches for the class to be uttered when the activity calculation unit 12 has set the flag Action to true. Specifically, it searches the expected classes for the class with the largest expected probability CE.
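The search for the class with the largest expected probability CE can be sketched as a simple maximum over the expected classes. The function name and the dictionary representation of CE are assumptions made for illustration; classes whose CE is 0 are treated here as not expected.

```python
from typing import Dict, Optional


def search_utterance_class(expected_ce: Dict[str, float]) -> Optional[str]:
    """Return the expected class with the largest CE, or None if none is expected."""
    # Only classes with a non-zero expected probability count as expected.
    candidates = {name: ce for name, ce in expected_ce.items() if ce > 0.0}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```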

[0096] The utterance generation unit 14 generates a frame-by-frame time series of voice power from the pattern of the class to be uttered. This output is input to the voice section detection unit 5.

[0097] The waveform generation unit 15 generates a waveform from the voice power data. As an example, an algorithm for generating a waveform from the power data p(1), ..., p(N) of N consecutive frames is described below. For each frame, waveform sample data are generated for one frame shift (FRSHIFT) only. To make the waveform sound natural, the frequency is increased in proportion to the increase in power, and both the power and the frequency are varied smoothly. As an example, the waveform is generated as the sum of components at 1, 2, and 3 times the frequency of a sine wave.

[0098] First, the value DeltaPower is calculated from the power data p(i) (1 ≤ i ≤ N) of a frame according to equation (25).

[0099]

[Equation 25]

[0100] The frequency Freq(i) of a frame is calculated according to equation (26).

[0101]

[Equation 26]

[0102] Here, BaseFreq is a constant indicating the reference frequency, and freq_sense is a constant indicating how strongly the power data p(i) determines the variation in frequency.

[0103] To make the change in frequency smooth, the frame frequency Freq(i) is smoothed using equation (27).

[0104]

[Equation 27]

[0105] Next, the value DeltaFreq is obtained according to equation (28), using the result of equation (27).

[0106]

[Equation 28]

[0107] With FS denoting the sampling frequency, the waveform sample x(t) (where t = 0, ..., FRSHIFT-1) is obtained by equation (31) from the phase phase(t) given by equation (29) and the amplitude amp(t) given by equation (30).

[0108]

[Equation 29]

[0109]

[Equation 30]

[0110]

[Equation 31]
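Equations (25) to (31) are not reproduced in this text, so the following is only a minimal sketch of the kind of synthesis the description outlines, not the formulas of the specification. It assumes that the frequency rises linearly with power (BaseFreq plus freq_sense times p(i)), that power and frequency are interpolated linearly within each frame, and that the three harmonics are weighted equally. The frame-to-frame smoothing of equation (27) is omitted, and all constant values are hypothetical.

```python
import math

FS = 16000         # sampling frequency FS in Hz (assumed value)
FRSHIFT = 160      # samples generated per frame (assumed value)
BASE_FREQ = 200.0  # BaseFreq, reference frequency in Hz (assumed value)
FREQ_SENSE = 2.0   # freq_sense, Hz per unit of power (assumed value)


def synthesize(power):
    """Return waveform samples for the frame powers p(1), ..., p(N)."""
    # Assumed mapping: frequency increases in proportion to power.
    freq = [BASE_FREQ + FREQ_SENSE * p for p in power]
    samples = []
    phase = 0.0
    for i in range(len(power)):
        nxt = min(i + 1, len(power) - 1)
        # Per-sample increments so power and frequency change smoothly.
        delta_power = (power[nxt] - power[i]) / FRSHIFT
        delta_freq = (freq[nxt] - freq[i]) / FRSHIFT
        for t in range(FRSHIFT):
            amp = power[i] + delta_power * t
            f = freq[i] + delta_freq * t
            # Accumulate phase so the waveform stays continuous across frames.
            phase += 2.0 * math.pi * f / FS
            # Sum of the 1x, 2x and 3x sine components, equally weighted.
            x = amp * sum(math.sin(k * phase) for k in (1, 2, 3)) / 3.0
            samples.append(x)
    return samples
```

Calling `synthesize([0.0, 1.0, 0.5])` yields three frames of FRSHIFT samples each, with the amplitude ramping up and then down.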

[0111] The waveform can also be generated by referring to a waveform table. Based on the above description, the processing of the learning-type interaction device according to Embodiment 1 of the present invention is described with reference to FIGS. 5 and 6, which are flowcharts for explaining that processing.

[0112] Referring to FIGS. 5 and 6, in step S1 the power calculation unit 4 calculates the power of the input voice for one frame. In step S2, the voice section detection unit 5 detects voice sections for both the input voice and the system's own utterance. In step S3, it is determined whether an utterance has been detected. If no utterance has been detected, processing moves to step S9, described later.

[0113] In the following step S4, the normalization unit 6 normalizes the utterance. In step S5, the distance calculation unit 7 finds the distance between the utterance pattern and the class closest to it.

[0114] In the following step S6, the classification unit 8 compares the obtained distance with the class generation criterion. If the distance is larger than the criterion, processing moves to step S7 and a new class is generated. If the distance is smaller than the criterion, or after a new class has been generated, processing moves to step S8, where the association unit 9 associates the utterance classes.
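Steps S5 to S7 can be sketched as a nearest-class search followed by a threshold test. This is a sketch under assumptions: the Euclidean distance, the representation of each class by a single reference pattern, and the function name `classify` are all hypothetical and are not taken from this part of the description.

```python
import math
from typing import List, Sequence, Tuple


def classify(pattern: Sequence[float], classes: List[List[float]],
             criterion: float) -> Tuple[int, bool]:
    """Return (class index, whether a new class was generated)."""
    # S5: find the nearest class and its distance (Euclidean, by assumption).
    best_index, best_dist = -1, math.inf
    for index, reference in enumerate(classes):
        dist = math.dist(pattern, reference)
        if dist < best_dist:
            best_index, best_dist = index, dist
    # S6/S7: generate a new class when the distance exceeds the criterion.
    if best_dist > criterion:
        classes.append(list(pattern))
        return len(classes) - 1, True
    return best_index, False
```

Starting from an empty class list, the first pattern always generates a class, which matches the idea of classifying while dynamically generating classes.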

[0115] In the following step S9, the satisfaction degree calculation unit 10 calculates (updates) the satisfaction degree. In step S10, if there has been an utterance event, processing moves to step S11, where the expectation generation unit 11 generates an expectation. If there has been no utterance event, or after the expectation has been generated, processing moves to step S12, where the activity calculation unit 12 calculates the activity (the flag Action). In step S13, it is determined whether the flag Action is true. If it is true, processing moves to step S14, where the utterance search unit 13 searches for the class to utter and the system starts its own utterance.

[0116] If the flag Action is not true, or after the utterance search, it is determined in step S15 whether the system's own utterance has finished. If it has not finished, processing moves to step S16, where a waveform is generated through the utterance generation unit 14 and the waveform generation unit 15 and one frame of it is output from the speaker 2. After step S15 or step S16, processing returns to step S1.
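The branching of steps S1 to S16 within one frame can be traced with a small function. This is an illustrative sketch only: the function name and its boolean parameters are hypothetical stand-ins for the decisions made in steps S3, S6, S10, S13, and S15, and the actual processing of each unit is not modeled.

```python
from typing import List


def steps_for_frame(utterance_detected: bool, new_class_needed: bool,
                    utterance_event: bool, action: bool,
                    own_utterance_finished: bool) -> List[str]:
    """Return the sequence of flowchart steps executed for one frame."""
    steps = ["S1", "S2", "S3"]          # power, voice section, detection test
    if utterance_detected:
        steps += ["S4", "S5", "S6"]     # normalize, distance, compare
        if new_class_needed:
            steps.append("S7")          # generate a new class
        steps.append("S8")              # associate utterance classes
    steps += ["S9", "S10"]              # update satisfaction, event test
    if utterance_event:
        steps.append("S11")             # generate expectation
    steps += ["S12", "S13"]             # activity calculation, Action test
    if action:
        steps.append("S14")             # search class, start own utterance
    steps.append("S15")                 # own utterance finished?
    if not own_utterance_finished:
        steps.append("S16")             # output one frame of waveform
    return steps                        # processing then returns to S1
```

When no utterance is detected and nothing triggers an action, the frame reduces to S1, S2, S3, S9, S10, S12, S13, and S15.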

[0117] The parameters shown in Embodiment 1 of the present invention are only examples, and the invention is not limited to them; they can be changed as appropriate in specific implementations. The sensors handled are also not limited to a microphone for voice: a camera or a tactile sensor may be used to perceive multidimensional sensory data, and actions may be generated with a display, a robot arm, or the like.

[0118] The embodiments disclosed here should be considered illustrative in all respects and not restrictive. The scope of the present invention is indicated by the claims rather than by the description above, and is intended to include all modifications within the meaning and scope equivalent to the claims.

[0119]

[Effects of the Invention] As described above, according to the present invention, an interaction system can be configured that, without the designer providing in advance the percepts and the method of generating actions in response to them, acquires patterns of perception and action while interacting autonomously and learns the association between perception and action.

[0120] In addition, the mechanism for generating spontaneous actions allows the system to interact spontaneously and thereby to learn further. Moreover, because the strategy for generating spontaneous actions produces behavior similar to behavior based on human emotion, the system is easy for humans to understand when it interacts with them.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the overall configuration of the learning-type interaction device according to Embodiment 1 of the present invention.

FIG. 2 is a diagram showing the hierarchical structure used when classifying pattern classes.

FIG. 3 is a diagram for explaining how utterance actions are associated with perceptions.

FIG. 4 is a conceptual diagram for explaining the calculation used to generate expectations.

FIG. 5 is a flowchart for explaining the processing of the learning-type interaction device according to Embodiment 1 of the present invention.

FIG. 6 is a flowchart for explaining the processing of the learning-type interaction device according to Embodiment 1 of the present invention.

[Explanation of symbols]

1 microphone
2 speaker
3 A/D conversion unit
4 power calculation unit
5 voice section detection unit
6 normalization unit
7 distance calculation unit
8 classification unit
9 association unit
10 satisfaction degree calculation unit
11 expectation generation unit
12 activity calculation unit
13 utterance search unit
14 utterance generation unit
15 waveform generation unit


[Procedure amendment]

[Submission date] September 28, 1999

[Procedure amendment 1]

[Document to be amended] Specification

[Item to be amended] Claim 2

[Method of amendment] Change

[Content of amendment]

[Procedure amendment 2]

[Document to be amended] Specification

[Item to be amended] 0008

[Method of amendment] Change

[Content of amendment]

[0008] The learning-type interaction device according to claim 2 is the learning-type interaction device according to claim 1, further comprising expectation means, wherein the expectation means includes: means for calculating the probability of the class into which the pattern perceived next by the sensing means will be classified; means for calculating, based on the calculated probability, the deviation between the class into which the next perceived pattern is expected to be classified and the class to which the pattern actually perceived by the sensing means belongs; and means for selecting, based on an internal state derived from the deviation, a strategy for calculating the probability of the class into which the next perceived pattern will be classified.

Continued from the front page. F-terms (reference): 5H004 GA26 HA20 HB15 JB18 JB19 KD63 LA17 9A001 BB04 DD13 EE05 FF10 FZ03 GZ05 HH05 HH16 HH18 HH21 JZ06 KZ32

Claims

[Claims]

1. A learning-type interaction device comprising: sensing means for perceiving the outside world; classifying means for classifying a perceived pattern while dynamically generating classes; means for associating the classified classes with one another and changing weights so as to select an optimal output pattern for the perceived pattern; and an actuator for outputting the optimal output pattern for the perceived pattern to the outside world.

2. The learning-type interaction device according to claim 1, further comprising expectation means, wherein the expectation means includes: means for expecting the pattern to be perceived; means for calculating a deviation between the expected pattern and the pattern actually perceived by the sensing means; and means for selecting a strategy for generating the expectation based on an internal state derived from the deviation.

3. The learning-type interaction device according to claim 1 or 2, wherein the sensing means includes a microphone that perceives the outside world and voice analysis means that analyzes a voice and outputs parameters obtained from the analysis to the classifying means; the actuator includes a speaker that outputs to the outside world and voice generation means that generates a voice as the optimal output pattern; and a system of a specific language is generated through dialogue by voice.

4. The learning-type interaction device according to claim 3, wherein the voice analysis means analyzes the power of the voice, and the voice generation means generates a highly natural voice by smoothly varying the power and frequency of the analyzed voice.

5. The learning-type interaction device according to claim 3, wherein the voice analysis means analyzes both the voice perceived from the outside world and the voice generated by the voice generation means.