JP3164346B2

JP3164346B2 - Voice Recognition Interactive Doll Toy

Info

Publication number: JP3164346B2
Application number: JP32900999A
Authority: JP
Inventors: サンソウルキム; ジュウヒョンリュウ; ウォンイルカン; ヨンジョンパク; ウンジャキム; サッボンクオン; チェキョンイー; キョンチェチー; テイシクパン; チュイョンハン
Original assignee: 株式会社韓国エキシス
Priority date: 1999-05-10
Filing date: 1999-11-19
Publication date: 2001-05-08
Anticipated expiration: 2019-11-19
Also published as: KR100332966B1; KR19990068379A; JP2000325669A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a voice-recognition interactive doll toy and, more particularly, to a voice-recognition interactive doll toy in which a voice recognition system is installed inside the doll so that it can hold entertaining spoken conversations with a user, and to a control method thereof.

[0002]

[Description of the Related Art] In general, children tend to learn life skills through play and toys that interest them, and close contact with such toys amounts to imitation learning that prepares them for the real world. Most of this imitation learning takes place through dolls. The child invents a scenario for the imitation learning and, following that scenario, elicits suitable responses from the doll; that is, the child becomes absorbed in the imitation learning by carrying on an entertaining two-way interaction of suitable speech and actions.

[0003] Education through toys has thus long been passed down as something close to children's lives. Recently, research on talking dolls that promise such an educational effect has become active, and attempts to build more advanced dolls continue unabated.

[0004] Most conventional doll toys of this kind have a touch sensor at some position on the doll. When a child actuates the touch sensor, the doll utters short, unconnected sentences recorded on a magnetic recording medium (magnetic tape) or a semiconductor recording medium (IC memory), for example, "Hello, I am Mickey. Who are you? What are you doing?" The doll also performs two or three stereotyped simple movements, such as raising an arm or moving its head, which do no more than satisfy a passing curiosity.

[0005]

[Problems to Be Solved by the Invention] Such a conventional doll toy therefore speaks only isolated, simple sentences: operating the touch sensor plays back recorded sentences that follow no scenario, which can arouse a passing curiosity. The child soon grows bored, however, and the period during which the child actually plays with the doll is short, so the educational effect is low.

[0006] Moreover, the sentences spoken by a conventional doll toy do not form an interactive scenario but are merely a string of unconnected sentences that lack realism, so their educational effect also gradually declines.

[0007] The present invention was devised to solve these problems. That is, an object of the present invention is to provide a voice-recognition interactive doll toy, and a control method thereof, that recognizes the voice of a user such as a child and can carry on a continuous conversation following at least one scenario in accordance with the child's way of thinking and behaving.

[0008] A first object of the present invention is, basically, to enable voice output suited to the topic, to create and record as scenarios the behavior patterns a child may exhibit, and to enable two-way conversation with the doll according to an arbitrarily set situation. A second object is to provide a device in which, in order to steer a conversation with a child into diverse scenarios, speech is compressed with voice compression software and recorded in a ROM so that it can be retrieved promptly when needed and, even within a single topic, a reply can be made at once according to the selectable situations, giving prompt voice output. A third object is to understand speech input from an unspecified number of people by training on the voices of many speakers with a speaker-independent voice recognition technique, so that the doll responds reasonably. A fourth object is to make it possible, based on a study of the noise produced when the doll is touched or stroked, for software to process that noise appropriately and distinguish ambient noise from the child's voice. A fifth object is to provide four contact switches so that, when the doll assumes a particular posture or the child touches a predetermined part of the doll, that is, when there is contact between the child and the doll, a suitable voice response arouses the child's interest. A sixth object is to implement hardware such that the system understands the input voice signal, interprets it appropriately, and responds appropriately in real time, retrieving and outputting from a pre-recorded database realistic content (scenarios) of the kind a person would respond with.

[0009] Accordingly, to satisfy these diverse functions and performance requirements by means of advanced software and advanced circuit fabrication technology, the present invention adds to the doll a voice decoder, a voice recognition unit, a system controller, a dialog manager, and various other auxiliary functions that can arouse interest. Its object is to provide a voice-recognition interactive doll toy that satisfies the external and internal requirements of a doll toy intended mainly for children and that has speaker-independent, artificial-intelligence, and interactive capabilities so as to achieve an educational effect through language (language education and play education).

[0010]

[Means for Solving the Problems] To achieve the above objects, the present invention provides a voice-recognition interactive doll toy in which a doll body, formed in a shape that blends human and animal features, contains a first memory unit (33) that stores compressed voice data obtained by compressing the digital voice signal streams of a large number of sentences at a predetermined compression ratio, and a second memory unit (35) that provides a working area for recognizing a user's voice signal input from outside, the doll toy comprising: a voice input/output unit (37) that converts at least one sentence of the user's voice, recorded in the second memory unit (35), into an electrical voice signal and outputs it, and that plays the decompressed voice signal audibly to the user; a circular buffer (51) that temporarily stores the user's digital voice signal output frame by frame from the voice input/output unit (37); a voice recognition unit (53) that segments the digital voice signal stored in the circular buffer (51) into words for recognition, using the voice recognition constants of the compressed data stored in the first memory unit (33), and recognizes the user's voice with the Viterbi algorithm; a dialog manager (55) that selects at least one response sentence in the first memory unit (33) so that the content of the voice recognized by the voice recognition unit (53) corresponds to a predetermined scenario; a voice decoder (57) that decompresses and restores the compressed voice data of the first memory unit (33) selected by the dialog manager (55); an A/D-D/A converter (47), provided between the voice decoder (57) and the voice input/output unit (37), that converts between analog and digital voice signals; and a memory controller (63), provided between the second memory unit (35) and a list controller (61), that transfers data of the first memory unit (33) to the second memory unit (35).
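
The recognition unit (53) is said to use the Viterbi algorithm, but the text does not describe the underlying model. As a rough illustration only, the following Python sketch runs Viterbi decoding over a toy two-state hidden Markov model. The state names, observation symbols, and probabilities are invented for the example and are not taken from the patent.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (log-probability, state path) of the most likely state sequence."""
    # best[s]: log-probability of the best path that ends in state s so far
    best = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    back = []  # one dict per later time step: state -> best predecessor
    for obs in observations[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            p = max(states, key=lambda q: prev[q] + log_trans[q][s])
            best[s] = prev[p] + log_trans[p][s] + log_emit[s][obs]
            pointers[s] = p
        back.append(pointers)
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):  # follow the back-pointers to the start
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path

def logs(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: logs(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}

# Hypothetical toy model: two states emitting coarse loudness symbols.
STATES = ["silence", "voiced"]
START = logs({"silence": 0.6, "voiced": 0.4})
TRANS = logs({"silence": {"silence": 0.7, "voiced": 0.3},
              "voiced": {"silence": 0.4, "voiced": 0.6}})
EMIT = logs({"silence": {"low": 0.5, "mid": 0.4, "high": 0.1},
             "voiced": {"low": 0.1, "mid": 0.3, "high": 0.6}})

log_p, path = viterbi(["low", "mid", "high"], STATES, START, TRANS, EMIT)
print(path, math.exp(log_p))
```

Working in log-probabilities avoids numerical underflow on long observation sequences, which matters more for real speech frames than for this three-symbol toy input.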

[0011] With this configuration, voice output suited to the topic becomes possible, the behavior patterns a child may exhibit are created and recorded as scenarios, and two-way conversation with the doll becomes possible according to an arbitrarily set situation. For example, to steer a conversation with a child into diverse scenarios, speech is compressed with voice compression software and recorded in the first memory unit (33), so that it can be retrieved promptly when needed and, even within a single topic, questions can be answered at once according to the selectable situations. In addition, the hardware of this doll toy is implemented so that the system understands the input voice signal, interprets it appropriately, and responds appropriately in real time, and realistic content (scenarios) of the kind a person would respond with can be retrieved from a pre-recorded database and output.

[0012]

[0013]

[0014]

[0015] Preferably, a list controller (61) is provided between the voice recognition unit (53) and the first memory unit (33), and between the dialog manager (55) and the first memory unit (33); the list controller retrieves the compressed voice data and the voice recognition constants of the compressed data from the first memory unit (33) and transfers the voice recognition data to the second memory unit (35).

[0016] The voice recognition unit (53) comprises: a voice recognition calculation unit (71) that removes predetermined noise from the frame-by-frame digital voice signal stored in the circular buffer (51), using the voice recognition constants of the first memory unit (33), and calculates a characteristic value for one spoken character as a feature vector; a zero-crossing rate unit (73) that detects zero crossings in the sampled values of the digital voice signal; an energy calculation unit (75) that calculates the energy at the zero crossings in order to make the zero-crossing detection by the zero-crossing rate unit (73) more reliable; a unit voice detection unit (77) that detects the endpoint data of any one word in the continuous digital voice signal on the basis of the output signals of the zero-crossing rate unit (73) and the energy calculation unit (75); a preprocessor (79) that segments the signal word by word into words for recognition on the basis of the feature vector data from the voice recognition calculation unit (71) and the endpoint data from the unit voice detection unit (77); and the second memory unit (35), which provides an area in which the compressed voice data of the first memory unit (33) corresponding to the words segmented by the preprocessor (79) is retrieved by the list controller (61) and processed with the Viterbi algorithm.
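
The text names zero-crossing detection and energy calculation as the inputs to endpoint detection but does not give thresholds or a decision rule. The following Python sketch shows one simple way such a detector could work, assuming fixed-length frames and a rule that treats a frame as speech when its energy is high enough and its zero-crossing count is not excessive. The function names, frame length, and thresholds are all hypothetical.

```python
import math

def frame_features(frame):
    """Return (zero-crossing count, mean energy) for one frame of samples."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    energy = sum(x * x for x in frame) / len(frame)
    return crossings, energy

def find_endpoints(samples, frame_len=80, energy_min=0.01, zc_max=40):
    """Return (start, end) sample indices of the first voiced run, or None."""
    start = end = None
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        crossings, energy = frame_features(samples[i:i + frame_len])
        # Assumed rule: enough energy and not too many sign changes.
        voiced = energy >= energy_min and crossings <= zc_max
        if voiced and start is None:
            start = i
        if voiced:
            end = i + frame_len
        elif start is not None:
            break  # first unvoiced frame after the run closes the word
    return None if start is None else (start, end)

# Demo: 160 silent samples, a 240-sample tone, then 160 silent samples.
tone = [0.5 * math.sin(2 * math.pi * n / 16) for n in range(240)]
signal = [0.0] * 160 + tone + [0.0] * 160
endpoints = find_endpoints(signal)
print(endpoints)
```

Combining the two measures is a common heuristic: energy alone can miss quiet consonants, while the zero-crossing count alone reacts to low-level hiss, so each is used to cross-check the other.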

[0017] With this configuration, speech input from an unspecified number of people can be understood: the voices of many speakers are learned by a speaker-independent voice recognition technique, so that the doll responds reasonably.

[0018] Preferably, the doll toy further includes contact switches (T1, T2, T3, T4) embedded in several regions of the doll body, namely its back, nose, mouth, and buttocks, to notify the voice decoder (57) of contact by the user.

[0019] Preferably, when the user touches one of the contact switches (T1, T2, T3, T4) on the mouth, nose, back, and buttocks, the appropriate corresponding voice data is retrieved from the dialog manager (55) and the first memory unit (33), decompressed and restored to actual speech by the voice decoder (57), and then played audibly to the user through the voice input/output unit (37).

[0020] The voice input/output unit (37) comprises: a first microphone (39) that converts the user's voice together with external noise into an electrical signal and outputs it to the circular buffer (51); a second microphone (41) that converts the external noise into an electrical signal and outputs it to the circular buffer (51); and a power amplifier unit (45) that amplifies the power of the voice signal decompressed and restored by the voice decoder (57) and plays it audibly to the user through a speaker (43).

[0021] Preferably, an A/D-D/A converter (47) is provided which, between the circular buffer (51) and the first and second microphones (39, 41), converts the output signals of the first and second microphones (39, 41) to digital form and which, between the voice decoder (57) and the power amplifier unit (45), converts the digital voice signal decompressed and restored by the voice decoder (57) to analog form.

[0022] Thus, with the four contact switches (T1, T2, T3, T4), when the doll assumes a particular posture or the child touches a predetermined part of the doll, that is, when there is contact between the child and the doll, a suitable voice response can arouse the child's interest.

[0023] Preferably, a volume control unit (49) is provided between the A/D-D/A converter (47) and the power amplifier unit (45) to adjust the output level of the power amplifier unit (45) in response to the user's volume commands ("louder" and "quieter").

[0024]

[0025]

[0026]

[Embodiments of the Invention] The voice-recognition interactive doll toy of the present invention is described in detail below with reference to the accompanying drawings. FIG. 1 is a front view of the doll toy of the present invention; FIG. 2 is a side view of the doll toy; FIG. 3 is a system block diagram of the voice-recognition interactive doll toy; FIG. 4 is a flow diagram following the processing order of FIG. 3; FIG. 5 is a block diagram of the voice recognition unit implemented as an ASIC; and FIG. 6 is a flowchart of the operation of the voice-recognition interactive doll toy.

[0027] As shown in FIGS. 1 and 2, the doll toy of the present invention has the form of a sewn (stuffed) doll covered with an outer skin, and the interiors of the head and torso consist of a structure with a rigid frame (not shown) to protect the circuitry.

[0028] The illustrated example has the overall form of a fairy resembling a human. The torso consists of an abdomen and chest 1, two hands 2, 3 each with four fingers, and two arms 8, 9. The lower body consists of two legs 4, 5, two feet 6, 7 each with four toes, and buttocks and a tail 17. The face consists of a mouth 10, two ears 11, 12, hair 16, and two eyes 14, 15.

[0029] As shown in the side view of FIG. 2, the neck 19 connecting the head and torso is made of a bendable, flexible material, which simplifies the wiring between the circuitry in the head and the torso. The doll toy of the present invention has an attractive appearance, is covered with a material that can protect the internal circuitry, and is shaped as a soft, lifelike doll for children.

[0030] The doll toy has contact switches at four locations that trigger a response from the doll when touched: the nose (T1), the mouth (T2), the back (T3), and the buttocks (T4). The contact switches T1, T2, T3, T4 are built so that contact is sensed easily. They are highly sensitive and are mounted beneath the skin of the sewn doll; to trigger a voice response on contact, each feeds an active-high signal directly to the control unit (ASIC; a custom semiconductor microprocessor). In particular, the contact switch T4 on the buttocks senses whether the doll is standing or sitting and triggers a matching response.

[0031] For example, the response to the buttocks contact switch T4 is "Shall we rest together?" when the doll is lying face down and "I want to play" when it is standing. Touching the mouth contact switch T2 produces "Delicious!", and releasing it produces "I'm starving." Touching the back contact switch T3 produces "Who is it?", and touching the nose contact switch T1 produces "That tickles, ah, ah."

[0032] As shown in FIG. 3, the system of the doll toy comprises a voice processing control unit (ASIC, application-specific IC) 30, which includes a circular buffer 51, a voice recognition unit 53, a voice decoder 57, an A/D-D/A converter 47, and a memory controller 63; a first memory unit (ROM) 33; a second memory unit 35; and a voice input/output unit 37. The first memory unit 33 stores compressed voice data obtained by compressing the digital voice signal streams of a large number of sentences at a predetermined compression ratio. The second memory unit 35 provides a working area for recognizing the child's voice signal input from outside. The voice processing control unit 30 uses the working area of the second memory unit 35 to determine the interactive response appropriate to the child's voice signal, and decompresses and restores the compressed voice data for that response from the first memory unit 33. The voice input/output unit 37 converts at least one sentence of the child's speech into an electrical voice signal and outputs it to the voice processing control unit 30, and plays the amplified voice signal from the voice processing control unit 30 audibly to the child.

[0033] As shown in FIG. 4, the voice input/output unit 37 comprises a first microphone 39 that converts the child's voice, together with noise generated at the doll's skin, into an electrical signal and outputs it to the circular buffer 51; a second microphone 41 that converts the noise generated at the doll's skin into an electrical signal and outputs it to the circular buffer 51; and a power amplifier unit 45 that amplifies the power of the voice signal decompressed and restored by the voice decoder 57 and plays it audibly to the child through a speaker 43. An A/D-D/A converter 47 is provided which, between the circular buffer 51 and the first and second microphones 39, 41, converts the output signals of the microphones to digital form and which, between the voice decoder 57 and the power amplifier unit 45, converts the digital voice signal decompressed and restored by the voice decoder 57 to analog form. The speaker 43 plays to the user (the child) the compressed voice data stored in the first memory unit 33 after it has been processed through the prescribed steps.

[0034] A volume control unit 49, which adjusts the output level of the power amplifier unit 45 to raise the loudness of the sound actually produced by the speaker 43, is connected between the A/D-D/A converter 47 and the power amplifier unit 45. For example, when the child issues a volume command (such as "louder" or "quieter") through the first microphone 39 in order to set the desired volume, and the command is input via the A/D-D/A converter 47, the volume control unit 49 controls the power amplifier unit 45 so that the speaker 43 produces sound at the volume corresponding to the command. As a result, the output level and gain of the power amplifier unit 45 are determined by the unmute (unMute) signal from the system controller 59 of the voice processing control unit 30 and by the output signal of the volume control unit 49.
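
The paragraph above says the amplifier gain depends on two inputs, the unmute signal and the volume control output, without giving levels or step sizes. A minimal Python sketch of that relationship follows. The class name, the nine discrete levels, and the English command strings "louder" and "quieter" are assumptions made for illustration.

```python
class VolumeControl:
    """Tracks a volume level and derives the amplifier gain from it."""

    def __init__(self, level=4, max_level=8):
        self.level = level          # hypothetical starting level
        self.max_level = max_level  # hypothetical number of steps

    def apply(self, command):
        """Update the level for a recognized command; ignore anything else."""
        if command == "louder":
            self.level = min(self.level + 1, self.max_level)
        elif command == "quieter":
            self.level = max(self.level - 1, 0)
        return self.level

    def gain(self, unmute):
        """Amplifier gain in [0, 1]; zero while the unmute signal is inactive."""
        return self.level / self.max_level if unmute else 0.0


volume = VolumeControl()
volume.apply("louder")
print(volume.gain(unmute=True), volume.gain(unmute=False))
```

Clamping the level at both ends keeps repeated commands from driving the gain outside the range the amplifier can use.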

[0035] The first and second microphones 39, 41 of the voice input/output unit 37 together provide noise removal. The first microphone 39 receives a signal in which voice and noise are mixed, while the second microphone 41 receives a pure noise signal when the doll is touched by the user or is affected by ambient noise. To reduce noise by exploiting the correlation (function) between the noise in the two signals, the voice processing control unit (ASIC) 30 correlates the voice-plus-noise signal from the first microphone 39 with the pure noise signal from the second microphone 41 and removes only the noise component. Based on experimental results, the first and second microphones 39, 41 are mounted in the doll's two ears 11, 12. In particular, one of the first and second microphones 39, 41 is a small stereo microphone, chosen to be sensitive and highly directional in the voice frequency band.
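
The patent states only that the two microphone signals are correlated to remove the noise component. One simple way to realize that idea, shown here as a hedged Python sketch, is to estimate how strongly the reference noise appears in the primary signal from their zero-lag cross-correlation and subtract that scaled copy. The function name and the synthetic test signals are hypothetical, and a real device would more likely need an adaptive filter that also handles delay between the microphones.

```python
import math
import random

def cancel_noise(primary, reference):
    """Subtract the part of `primary` that is correlated with `reference`."""
    cross = sum(p * r for p, r in zip(primary, reference))  # zero-lag cross-correlation
    power = sum(r * r for r in reference)                   # energy of the reference
    coupling = cross / power if power else 0.0              # least-squares scale factor
    cleaned = [p - coupling * r for p, r in zip(primary, reference)]
    return cleaned, coupling

# Synthetic demo: a sine "voice" plus scaled noise on the first microphone,
# and the same noise alone on the second microphone.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
voice = [math.sin(2 * math.pi * n / 50) for n in range(2000)]
primary = [v + 0.8 * x for v, x in zip(voice, noise)]
cleaned, coupling = cancel_noise(primary, noise)
print(round(coupling, 2))
```

The estimate works because the voice is nearly uncorrelated with the noise, so the cross-correlation is dominated by the noise that both microphones share.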

[0036] As shown in FIG. 4, the contact switches T1, T2, T3, T4 are connected directly to the voice processing control unit 30. The circular buffer 51 temporarily stores the child's digital voice signal input to the voice input/output unit 37, that is, the sampled voice signal digitized frame by frame by the A/D-D/A converter 47. The voice recognition unit 53 segments the digital voice signal stored in the circular buffer 51 into words for recognition, using the voice recognition constants of the compressed data stored in the first memory unit 33, and recognizes the meaning of the child's speech with the Viterbi algorithm. The dialog manager 55 selects one of the many scenarios along which the content of the speech recognized by the voice recognition unit 53 may develop and retrieves from the first memory unit 33 at least one sentence of compressed voice data corresponding to the selected scenario. The voice decoder 57 decompresses and restores the compressed voice data retrieved by the dialog manager 55 and outputs it to the voice input/output unit 37. A system controller 59 is also provided, which outputs appropriate control signals to the first memory unit 33, the second memory unit 35, the volume control unit 49, the A/D-D/A converter 47, and the power amplifier unit 45.
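
The circular buffer 51 is described as a temporary store for framed samples between the converter and the recognizer. As an illustration of the data structure only, the Python sketch below keeps a fixed number of frames and overwrites the oldest one when full. The class name, capacity, and overwrite policy are assumptions; the patent does not say what happens on overflow.

```python
class FrameRingBuffer:
    """Fixed-capacity FIFO of audio frames that overwrites the oldest frame."""

    def __init__(self, capacity):
        self._slots = [None] * capacity
        self._head = 0    # index of the oldest stored frame
        self._count = 0   # number of frames currently stored

    def push(self, frame):
        """Store a frame, dropping the oldest one if the buffer is full."""
        tail = (self._head + self._count) % len(self._slots)
        self._slots[tail] = frame
        if self._count == len(self._slots):
            self._head = (self._head + 1) % len(self._slots)
        else:
            self._count += 1

    def pop(self):
        """Remove and return the oldest frame, or None if the buffer is empty."""
        if self._count == 0:
            return None
        frame = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        self._count -= 1
        return frame

    def __len__(self):
        return self._count


buffer = FrameRingBuffer(capacity=3)
for frame_id in (1, 2, 3, 4):
    buffer.push(frame_id)   # the fourth push overwrites frame 1
print(len(buffer))
```

A fixed-size ring suits a small embedded device because memory use is constant and the producer (the converter) never blocks waiting for the consumer (the recognizer).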

[0037] When the child touches one of the contact switches T1, T2, T3, T4 on the doll's nose, mouth, back, or buttocks, the voice processing control unit 30 retrieves the appropriate corresponding compressed voice data from the dialog manager 55 and the first memory unit 33, has the voice decoder 57 decompress and restore it to actual speech, and then plays it audibly to the child through the speaker 43 of the voice input/output unit 37.
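
The switch-to-speech behavior described here, together with the example phrases given earlier for each switch, amounts to a lookup from a switch event to a stored response. The Python sketch below models only that lookup. The event names and the English wording of the phrases are illustrative renderings, and the real device would return an address of compressed voice data in the first memory unit rather than a string.

```python
# Hypothetical (switch, event) keys; phrases are English renderings of the
# examples in the description.
RESPONSES = {
    ("T1", "touch"): "That tickles, ah, ah.",
    ("T2", "touch"): "Delicious!",
    ("T2", "release"): "I'm starving.",
    ("T3", "touch"): "Who is it?",
    ("T4", "lying_face_down"): "Shall we rest together?",
    ("T4", "standing"): "I want to play.",
}

def respond(switch, event):
    """Return the response phrase for a switch event, or None if unmapped."""
    return RESPONSES.get((switch, event))


print(respond("T2", "touch"))
```

Keying on both the switch and the event lets one switch produce different responses, as the mouth switch does on touch and on release.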

[0038] The first memory unit 33 stores, in compressed form, the speech and music for the many sentences of the scenarios, a large amount of story data, voice recognition constants, and restoration data for voice decoding. The device used has a large capacity of 4 Mbytes or more and stores data in units of one word (16 bits), for a total of 2 Mwords. The stored contents are listed in Table 1 below.
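
The capacity figures above can be cross-checked with a few lines of Python, assuming binary megabytes (1 Mbyte = 1024 × 1024 bytes); the variable names are illustrative.

```python
BITS_PER_WORD = 16
capacity_bytes = 4 * 1024 * 1024                      # 4 Mbytes (binary megabytes assumed)
capacity_words = capacity_bytes * 8 // BITS_PER_WORD  # number of 16-bit words
address_bits = (capacity_words - 1).bit_length()      # address lines needed for word addressing
print(capacity_words, address_bits)
```

Under that assumption, 4 Mbytes holds 2 × 1024 × 1024 sixteen-bit words, which agrees with the stated total of 2 Mwords.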

[0039]

[Table 1] Breakdown of the information recorded in the first memory unit

[0040] The second memory unit 35 is a device that holds the processing program for handling the child's speech and the speech of the response sentences and that is used for internal data signal processing. It has an area for the block list and a working area for voice recognition preprocessing, and has a predetermined data storage capacity. The list controller 61 not only retrieves data from the second memory unit 35 but also retrieves the compressed voice data from the first memory unit 33 and outputs it to the voice decoder 57.

【００４１】ここで、前記第２メモリ部３５と前記音声
処理制御部３０のリストコントローラ６１との間には、
相互間のデータ伝送のため、メモリコントローラ６３が
設けられている。特に、このメモリコントローラ６３は
前記圧縮データを読み取って前記第２メモリ部３５に出
力するように構成されている。A memory controller 63 is provided between the second memory unit 35 and the list controller 61 of the audio processing control unit 30 to transfer data between them. In particular, the memory controller 63 is configured to read the compressed data and output it to the second memory unit 35.

【００４２】一方、電源供給部６５は、３〜２４Ｖの電
圧変動範囲にある任意の電圧を３．３Ｖの一定電圧に維
持し、基本的に直列連結された乾電池３個（４．５Ｖ）
の電圧を使用するが、その他の電源を使用することも可
能である。このほかの動作のために必要な要素として、
第２メモリ部３５クロック用２４．５４６MHzのクロッ
ク発生部６７と、３２．７６８kHzのタイマー６９など
の前記構成要素を動作させるのに必須のものであるの
で、その説明を省略する。Meanwhile, the power supply unit 65 regulates any input voltage in the range of 3 to 24 V to a constant 3.3 V. It basically runs from three dry batteries connected in series (4.5 V), but other power sources can also be used. The other elements needed for operation, such as the 24.546 MHz clock generator 67 that clocks the second memory unit 35 and the 32.768 kHz timer 69, are simply required to operate the components described above, so their description is omitted.

【００４３】前記音声認識部５３は、図５に示すよう
に、前記サーキュラバッファ５１に記録されたフレーム
単位のデジタル音声信号から、前記第１メモリ部３３の
音声認識用定数によって所定の雑音を除去させ、１文字
に対する固有値を特徴ベクトルデータとして算出する音
声認識算出部７１と、前記デジタル音声信号のサンプリ
ング値から０点を検出するゼロクロシングレート７３
と、前記ゼロクロシングレート７３での０点検出に対す
る信頼性を向上させるため、前記０点に対するエネルギ
ーを算出するエネルギー算出部７５と、前記ゼロクロシ
ングレート７３と前記エネルギー算出部７５の出力信号
に基づいて、連続的なデジタル音声信号のうち、どの１
単語の端点データを検出する単位音声検出部７７と、前
記音声認識算出部７１の特徴ベクトルデータと前記単位
音声検出部７７の端点データに基づいて、１単語ずつ音
声認識用単語に区分する前処理器７９と、前記前処理器
７９で区分された単語に該当する第１メモリ部３３の音
声圧縮データを前記リストコントローラ６１により取り
出しビタビアルゴリズムで演算する第２メモリ部３５と
を含んでいる。ここで、前記音声認識算出部７１と前記
サーキュラバッファ５１との間には、前記サーキュラバ
ッファ５１のデジタル音声信号をより速やかに処理する
ために周波数増幅するプレエンファシス８１とが設けら
れている。As shown in FIG. 5, the voice recognition unit 53 includes: a voice recognition calculation unit 71 that removes predetermined noise from the frame-by-frame digital audio signal recorded in the circular buffer 51, using the voice recognition constants of the first memory unit 33, and calculates an eigenvalue for one character as feature vector data; a zero-crossing rate 73 that detects zero points from the sampled values of the digital audio signal; an energy calculation unit 75 that calculates the energy at each zero point to make the zero-point detection of the zero-crossing rate 73 more reliable; a unit voice detection unit 77 that detects the endpoint data of any one word in the continuous digital audio signal, based on the output signals of the zero-crossing rate 73 and the energy calculation unit 75; a preprocessor 79 that divides the signal, word by word, into words for voice recognition, based on the feature vector data of the voice recognition calculation unit 71 and the endpoint data of the unit voice detection unit 77; and the second memory unit 35, in which the compressed audio data of the first memory unit 33 corresponding to the words divided by the preprocessor 79 is extracted by the list controller 61 and processed with the Viterbi algorithm. A pre-emphasis 81, which performs frequency amplification so that the digital audio signal of the circular buffer 51 can be processed more quickly, is provided between the voice recognition calculation unit 71 and the circular buffer 51.
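
As a rough illustration of how a zero-crossing count and a frame energy can be combined to find word endpoints, the following Python sketch may help. It is not the ASIC logic of the patent: the function names, the 80-sample frame length (10 ms at 8 kHz), and both thresholds are assumptions chosen only for this example.

```python
def frame_features(frame):
    """Return (zero_crossings, mean_energy) for one frame of PCM samples."""
    # A zero crossing is counted whenever consecutive samples change sign.
    zc = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    energy = sum(s * s for s in frame) / len(frame)
    return zc, energy


def find_endpoints(samples, frame_len=80, energy_thresh=1000.0, zc_thresh=25):
    """Return (start_frame, end_frame) of the first speech run, or None.

    A frame counts as speech if its energy is high, or if it has many
    zero crossings and at least a little energy (a crude test for
    noise-like consonants). All thresholds are illustrative assumptions.
    """
    flags = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        zc, energy = frame_features(samples[i:i + frame_len])
        flags.append(energy >= energy_thresh or
                     (zc >= zc_thresh and energy >= energy_thresh / 10))
    if True not in flags:
        return None
    start = flags.index(True)
    end = start
    # Extend the run while the following frames are still speech.
    while end + 1 < len(flags) and flags[end + 1]:
        end += 1
    return start, end
```

Using the energy as a second check on the zero-crossing count reflects the role the text assigns to the energy calculation unit 75, namely making the zero-point detection more reliable.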

【００４４】より詳しく説明すると、前記音声認識部５
３の計算の流れとモジュールの構成は、二つのモジュー
ル群からなっており、ビタビアルゴリズムと単位音声検
出アルゴリズムをASIC半導体で集積化させた多くのサー
バー−モジュールから構成されている。More specifically, the calculation flow and module configuration of the voice recognition unit 53 consist of two module groups, made up of many server modules in which the Viterbi algorithm and the unit voice detection algorithm are integrated on an ASIC.

【００４５】まず、ビタビアルゴリズムは、４才から１
０才の年齢層の子供のおもちゃに使用し得るように、Ｈ
ＭＭ（Hidden Markov Model）を用いるビタビアルゴリ
ズムを使用して一つのチップから構成されている。ま
た、ビタビアルゴリズムを実行する過程で生ずる多くの
可変データを処理し得るよう、外部第２メモリ部３５
（１６Ｍバイト）で収容し得るブロックリスト構造を使
用しており、全体的に約１Ｍバイトの第２メモリ部３５
の領域で動作するように構成されている。ＨＭＭ学習方
法は、使用者が異なっても、信頼性が向上されるように
し、即ち、話者独立型認識となるようにし、音素単位の
認識をするものである。First, so that it can be used in toys for children aged 4 to 10, the Viterbi algorithm, which uses an HMM (Hidden Markov Model), is implemented on a single chip. To handle the large amount of variable data generated while the Viterbi algorithm runs, a block list structure that fits in the external second memory unit 35 (16 Mbytes) is used, and the whole is configured to operate within a region of about 1 Mbyte of the second memory unit 35. The HMM learning method improves reliability even when the user changes, that is, it provides speaker-independent recognition, and it performs recognition in phoneme units.
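
The patent does not give the HMM topology or its parameters. Purely to illustrate the Viterbi search it names, here is a minimal log-domain sketch over a toy discrete HMM. The function names, the state labels, and the probability tables in the usage note are all hypothetical.

```python
import math


def log_table(probs):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: log_table(v) if isinstance(v, dict) else math.log(v)
            for k, v in probs.items()}


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for an observation sequence."""
    # score[s] = best log-probability of any state path ending in s so far
    score = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + log_trans[p][s])
            score[s] = prev[best] + log_trans[best][s] + log_emit[s][o]
            ptr[s] = best
        back.append(ptr)
    # Trace the back-pointers from the best final state.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], path
```

For example, with two states `a` and `b` that prefer the symbols `x` and `y` respectively and tend to persist, the observation sequence `x, x, y, y` decodes to the path `a, a, b, b`. Working in log-probabilities avoids numeric underflow on long sequences, which matters on small fixed-memory hardware of the kind described here.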

【００４６】図３、図４及び図５を参照して、前述した
各要素の動作を簡略に説明すると次のようである。ま
ず、二つのマイクロフォン３９，４１は音声信号を受け
て電気信号に変換し、前述したＡ／Ｄ・Ｄ／Ａコンバー
タ（Codec）４７のアナログ音声信号変換装置に送る。
この際に、入力された二つの音声信号は、雑音除去のた
め、互いに独立した形態で音声処理制御部３０に伝達さ
れて、コリレイションを行う。前記音声処理制御部３０
では、特別な状況がない限り、Ａ／Ｄ・Ｄ／Ａコンバー
タ４７に制御信号（データ入力準備信号）を送出して、
収容準備状態になったことを知らせ、Ａ／Ｄ・Ｄ／Ａコ
ンバータ４７で補間（Interpolation）のためにｘ２５
６ＦＳの値である２．０４８MHｚを使用し、その同期周
波数（SYNC Frequency）は８kHzであり、音声認識部５
３で音声の認識を向上させるサンプリング率で適用され
るようにする。特に、前記８kHzサンプリング率は、前
記音声処理制御部３０の音声認識部５３において、認識
アルゴリズムに対する重要な処理基準となっている。一
方、入力された音声信号はＡ／Ｄ・Ｄ／Ａコンバータ４
７でＡ／Ｄ変換されて音声処理制御部３０に送られ、第
１及び第２マイクロフォン３９，４１を通じて独立デー
タとして入力されて，前記コリレイション演算によるノ
イズがフィルタリングされる。With reference to FIGS. 3, 4, and 5, the operation of each element described above is briefly as follows. First, the two microphones 39 and 41 receive the audio signal, convert it into an electrical signal, and send it to the analog audio signal conversion device, the A/D-D/A converter (codec) 47 described above. The two input audio signals are passed to the audio processing control unit 30 independently of each other so that they can be correlated for noise removal. Unless there is a special situation, the audio processing control unit 30 sends a control signal (data input ready signal) to the A/D-D/A converter 47 to announce that it is ready to receive. For interpolation, the A/D-D/A converter 47 uses 2.048 MHz, which is x256 FS; its synchronization frequency (SYNC frequency) is 8 kHz, so that a sampling rate that improves recognition is applied in the voice recognition unit 53. This 8 kHz sampling rate in particular is an important processing reference for the recognition algorithm in the voice recognition unit 53 of the audio processing control unit 30. The input audio signal is A/D-converted by the A/D-D/A converter 47 and sent to the audio processing control unit 30; because it arrives as independent data through the first and second microphones 39 and 41, noise is filtered out by the correlation operation.
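
Two small checks may make the numbers in this paragraph concrete. The first function confirms that 256 times the 8 kHz sampling rate gives the stated 2.048 MHz. The second computes a normalized correlation coefficient between two microphone channels. The patent does not say how its correlation step filters noise, so the second function only illustrates the measure itself; all names are assumptions.

```python
import math

SAMPLE_RATE_HZ = 8_000   # SYNC frequency stated in the text
OVERSAMPLING = 256       # the "x256 FS" interpolation factor


def codec_clock_hz(sample_rate_hz=SAMPLE_RATE_HZ, oversampling=OVERSAMPLING):
    """Interpolation clock as a multiple of the sampling rate."""
    return sample_rate_hz * oversampling


def normalized_correlation(x, y):
    """Correlation coefficient of two equal-length sample lists, in [-1, 1].

    Returns 0.0 if either channel is constant (zero variance).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0
```

A coefficient near 1 means the two channels carry essentially the same waveform, while a value near 0 means they share little. How the doll turns such a measure into noise suppression is not described in the source.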

【００４７】このように、雑音の除去されたデジタル音
声サンプリング信号は、サーキュラバッファ５１でフレ
ーム単位で一時的に記録されてから、プレエンファシス
８１と音声認識算出部７１から一つ一つの使用者の音声
に対する固有値が特徴ベクトルとして算出され、各々の
単語の端点を検出するため、ゼロクロシングレート７３
及びエネルギー算出部７５と単位音声検出部７７をほぼ
同時に経ることになり、これらのそれぞれは前処理器７
９で単語ずつ音声認識用単語に区分される。すると、リ
ストコントローラ６１から前記前処理器７９からの音声
認識用単語に該当する第１メモリ部３３の圧縮データを
取り出すと、第２メモリ部３５にこれらのデータとビタ
ビアルゴリズムを移して、認識のための演算動作を行っ
て分析する。The digital audio sampling signal from which noise has been removed is temporarily recorded, frame by frame, in the circular buffer 51. The pre-emphasis 81 and the voice recognition calculation unit 71 then calculate an eigenvalue for each user utterance as a feature vector while, at almost the same time, the signal passes through the zero-crossing rate 73, the energy calculation unit 75, and the unit voice detection unit 77 to detect the endpoints of each word. The preprocessor 79 divides these results, word by word, into words for voice recognition. The list controller 61 then extracts the compressed data of the first memory unit 33 that corresponds to the words from the preprocessor 79, this data and the Viterbi algorithm are transferred to the second memory unit 35, and the recognition computation and analysis are carried out there.

【００４８】より詳しく説明すると、実際に、その動作
は８kbpsでサンプリングされた音声信号→前処理（音声
特徴検出）→音声検出→音声認識の段階からなってい
る。前処理は、Power、Hamming Window、プレエンファ
シスなどの計算段階を経た後、リアルエフエフティー
（RealFFT）をしたスペクトル結果に対してメル（Mel）
スケールのケプストラム（Cepstrum）を計算する。これ
とは別に、音声をゼロクロシングレート７３とエネルギ
ー算出部７５で計算して、音声の始点と端点を検出す
る。このような二つの音声検出結果に基づいて、音声認
識の開始及び終了あるいはリセットの有無を決定し、メ
ルスケールケプストラム係数列とＨＭＭに対してビタビ
アルゴリズムを用いて遂に音声を認識する。勿論、この
ような数多い計算をするために必要な定数は第１メモリ
部３３に記録されてから、必要となるたびに取り出され
て使用される。また、必要な値を計算してから取り出す
作業のため、第２メモリ部３５を使用し、そのデータ計
算の膨大性のため、リストコントローラ６１を用いてい
る。ここで、音声認識及び圧縮のため、音声の端点検出
は認識率と圧縮率を高めるのに使用される単位音声検出
部７７でなされる。More specifically, the operation actually consists of the following stages: audio signal sampled at 8 kbps → preprocessing (voice feature detection) → voice detection → voice recognition. After calculation steps such as Power, Hamming Window, and pre-emphasis, the preprocessing computes the mel-scale cepstrum from the spectrum produced by a real FFT (RealFFT). Separately, the zero-crossing rate 73 and the energy calculation unit 75 process the audio to detect its start point and endpoint. Based on these two voice detection results, it is decided whether to start, end, or reset voice recognition, and the voice is finally recognized by applying the Viterbi algorithm to the sequence of mel-scale cepstrum coefficients and the HMM. The constants needed for these many calculations are recorded in the first memory unit 33 and retrieved whenever they are needed. The second memory unit 35 is used for computing the required values and then retrieving them, and the list controller 61 is used because of the sheer volume of data calculation. For both voice recognition and compression, the endpoint detection is performed by the unit voice detection unit 77, which serves to raise the recognition rate and the compression rate.
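
The preprocessing chain named above (pre-emphasis, Hamming window, FFT, mel-scale cepstrum) can be sketched in plain Python as follows. This is a textbook-style illustration, not the patent's fixed-point implementation: the pre-emphasis coefficient 0.97, the filter count, the number of coefficients, and every function name are assumptions, and a naive DFT stands in for the real FFT to keep the example dependency-free.

```python
import cmath
import math


def pre_emphasis(x, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; boosts the higher frequencies."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def hamming(n_len):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_len - 1))
            for n in range(n_len)]


def power_spectrum(frame):
    """Power of DFT bins 0..N/2 (naive O(N^2) DFT, for clarity only)."""
    n_len = len(frame)
    return [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_len)
                    for n in range(n_len))) ** 2 / n_len
            for k in range(n_len // 2 + 1)]


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * f / sample_rate) for f in edges]
    bank = []
    for i in range(1, n_filters + 1):
        lo, mid, hi = bins[i - 1], bins[i], bins[i + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):          # rising edge
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):          # falling edge
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def mel_cepstrum(frame, sample_rate=8000, n_filters=12, n_ceps=8):
    """Mel-scale cepstral coefficients for one frame of samples."""
    windowed = [s * w for s, w in zip(pre_emphasis(frame), hamming(len(frame)))]
    spectrum = power_spectrum(windowed)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Floor the filter energies so the logarithm is always defined.
    log_e = [math.log(max(sum(w * p for w, p in zip(row, spectrum)), 1e-10))
             for row in bank]
    # A DCT-II of the log filterbank energies yields the cepstrum.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for c in range(n_ceps)]
```

Calling `mel_cepstrum` on each frame from the circular buffer would produce the kind of coefficient sequence that the text feeds to the Viterbi search.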

【００４９】ところで、ゼロクロシングレート７３とエ
ネルギー算出部７５は実験室又は比較的静かな室内で高
い効率と的中率を表すが、僅かな騒音にも反応するスピ
ーチ端点検出では根本的な問題点を有していることが事
実であるので、メルケプストラムと共に動作されなけれ
ばならない。Although the zero-crossing rate 73 and the energy calculation unit 75 are efficient and accurate in a laboratory or a relatively quiet room, speech endpoint detection that reacts to even slight noise has a fundamental problem, so they must be operated together with the mel cepstrum.

【００５０】即ち、音声、雑音、無音の混ぜ合わせられ
たサンプリング信号をエネルギー検出部、ゼロクロシン
グレート、メルスケールケプストラムを求めて単位音声
検出部に入力すると、音声（雑音混合）部分が出力され
る。このように、二つのモジュールから出た結果は前処
理器７９に送られて、音声信号を認識することになる。That is, when the energy, the zero-crossing rate, and the mel-scale cepstrum are obtained from a sampled signal in which voice, noise, and silence are mixed, and are input to the unit voice detection unit, the voice (noise-mixed) portion is output. The results from the two modules are then sent to the preprocessor 79, and the audio signal is recognized.

【００５１】このように、音声認識部５３で使用者、つ
まり子供の音声が認識されると、ダイアログマネージャ
５５は、その認識された音声を多数のパターンに分けら
れたシナリオのいずれか一つを選択し、その選択された
一つのシナリオによる応答音声の圧縮データを前記リス
トコントローラ６１と前記第１メモリ部３３から取り出
して音声デコーダ５７に伝達する。When the voice recognition unit 53 has recognized the voice of the user, that is, the child, the dialog manager 55 selects for the recognized voice one of the scenarios, which are divided into many patterns, retrieves the compressed response-audio data for the selected scenario from the first memory unit 33 through the list controller 61, and passes it to the audio decoder 57.

【００５２】次いで、音声デコーダ５７は、前記第１メ
モリ部３３の圧縮データを所定のデコーディング過程に
より伸張させてデジタル音声信号に復元し、音声入出力
部３７を通じて話者である子供に聞かせる。この際に、
前記音声デコーダ５７と前記音声入出力部３７との間に
はＡ／Ｄ・Ｄ／Ａコンバータ４７が備えられているた
め、デコーディングされた前記デジタル音声信号はアナ
ログに変化されて実際音声として発生される。Next, the audio decoder 57 expands the compressed data from the first memory unit 33 through a predetermined decoding process, restores it to a digital audio signal, and plays it to the child, who is the speaker, through the audio input/output unit 37. Because the A/D-D/A converter 47 is provided between the audio decoder 57 and the audio input/output unit 37, the decoded digital audio signal is converted to analog and produced as actual sound.

【００５３】ここで、ボリュームを調節するため、音声
入出力部３７を通じて、子供の音声から、「音を大きく
しろ」という命令が入力される場合、この命令はＡ／Ｄ
・Ｄ／Ａコンバータ４７を介して音声処理制御部３０に
入力されて認識される。すると、音声デコーダ５７のボ
リューム制御信号に応じて、所定の利得が決定されたボ
リューム制御部４９は前記Ａ／Ｄ・Ｄ／Ａコンバータ４
７に出力される前記アナログ音声信号を通常の増幅利得
値より大きくして子供の耳に聞かせるように、前記電力
増幅部４５を制御する。To adjust the volume, when the command "make it louder" is input in the child's voice through the audio input/output unit 37, the command is passed through the A/D-D/A converter 47 to the audio processing control unit 30 and recognized. The volume control unit 49, whose gain is set according to the volume control signal from the audio decoder 57, then controls the power amplification unit 45 so that the analog audio signal output to the A/D-D/A converter 47 is amplified above the normal gain value and played to the child.

【００５４】前述したような本発明による音声認識対話
型人形おもちゃは、図６に示すような制御方法により操
作する。まず、人形が反応し得る段階は３段階（バッテ
リオン、タイムシグナル、接触スイッチモード）に区分
される（１３０，１３１，１３２）。仮に、電源が供給
されるか、電源の電池が取り替えられると、予め決めら
れた挨拶のことば、つまり「こんにちは、私はサラで
す。あなたは誰ですか。」という挨拶の言葉が出力され
（１３０）、使用者が録音しておいたメッセージ、つま
り朝、昼、夕によってそれぞれ相違したメッセージの挨
拶の言葉、又は設定されたシナリオによる言葉が音声と
して出力され（タイマーモード、メッセージモード）
（１３１）、接触スイッチＴ１，Ｔ２，Ｔ３，Ｔ４によ
り、挨拶の言葉に対応する音声が出力されることもある
（１３２）。The voice recognition interactive doll toy according to the present invention described above operates according to the control method shown in FIG. 6. First, the doll can react in three stages (battery on, time signal, and contact switch mode) (130, 131, 132). When power is supplied or the batteries are replaced, a predetermined greeting, "Hello, I am Sarah. Who are you?", is output (130). A message recorded by the user, that is, a greeting that differs for morning, noon, and evening, or words from a set scenario, may be output as audio (timer mode, message mode) (131). Audio corresponding to a greeting may also be output in response to the contact switches T1, T2, T3, and T4 (132).

【００５５】この流れ図によると、初期化により挨拶の
言葉を出力した後（１３３）、子供の対話用音声を認識
するため、音声信号を待つ（１３４）。仮に、子供が応
答しない待機時間が長くなると（１３７）、時間を終了
し（１４４）、まず、待機中であることを知らせる擬声
語、歌モード、話モード、遊びモード（病院ごっこ、ま
まごと、市場遊び、ティーパーティ遊び）の中の任意の
状況を付与するか、又はこれらのモードに対する案内音
声を出力する（１４５）。この際に、待機時間は約１０
秒程度である。このような過程が最小限３回以上である
と、セーブモード状態となるが（１４７）、この状態に
ないときは、続けて子供の音声を待つ（１４６）。According to this flowchart, after the greeting is output on initialization (133), the doll waits for an audio signal in order to recognize the child's conversational voice (134). If the child does not respond and the waiting time runs long (137), the wait is timed out (144), and the doll first either presents an arbitrary one of the following situations (an onomatopoeic sound indicating that it is waiting, song mode, talking mode, or a play mode such as playing hospital, playing house, playing market, or tea party) or outputs guidance audio for these modes (145). The waiting time is about 10 seconds. Once this has happened at least three times, the doll enters save mode (147); otherwise it continues to wait for the child's voice (146).
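
The waiting behaviour in this part of the flowchart can be sketched as a small loop. This is only an illustration under stated assumptions: the function name, the return values, and the idea of feeding in a list where `None` stands for one roughly 10-second timeout are all hypothetical. The limit of three timeouts before save mode comes from the text.

```python
def wait_for_child(responses, max_timeouts=3):
    """Simulate steps 134 to 147 of the flowchart.

    `responses` is an iterable in which each item is either a recognized
    utterance (a string) or None, meaning one ~10 s wait expired.
    Returns a (state, utterance, timeouts) tuple.
    """
    timeouts = 0
    for heard in responses:
        if heard is not None:
            return "heard", heard, timeouts
        # Timeout (144): the doll would now play a prompt or guidance (145).
        timeouts += 1
        if timeouts >= max_timeouts:
            return "save_mode", None, timeouts   # step 147
        # Otherwise keep waiting for the child's voice (146).
    return "waiting", None, timeouts
```

For instance, `wait_for_child([None, "hello"])` reports that the child was heard after one timeout, whereas three consecutive `None` entries lead to save mode.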

【００５６】次いで、子供が望む音声反応をすると、す
ぐ、挨拶の言葉を出力し、上述した各遊びに関する所望
の遊びモード（１３６，１３８）に動く（１４１）。仮
に、応答がないか、認識が不可能（未認識）である場合
は、決められた方式の質問が繰り返される（１４３）。
所望の遊びモードが認識されたときは、すぐ、開始を知
らせる音声が出力され（１４８）、遊びが開始される。
この段階を経た後、多様なパターンによって、使用者と
人形は続けて進行することができる（１４９，１５０，
１５１，１５３，１５４）。仮に、対話中に認識が不可
能であっても、人が相手の言葉を認識し得ない場合に類
似した行動を見せるように、決定アルゴリズムを作って
計算した後、再び問い合わせるか、意図的にパターン上
で可能な応答をするか、再び遊びを進行させるかを決定
する（１５３，１５７）。これは人形に内蔵した処理方
式に依存する。多様なパターンを経て遊びが終わると、
使用者は再びほかの遊びをするか、止めるかを選択すべ
きである（１５５，１５６，１５８）。このような遊び
方式が４種存在し、可能なパターンの数は約３，０００
種に至る。Next, when the child gives the desired voice response, the doll immediately outputs a greeting and moves (141) to the desired play mode (136, 138) among the plays described above. If there is no response, or recognition fails (unrecognized), a question of a predetermined form is repeated (143). When the desired play mode is recognized, audio announcing the start is output immediately (148) and the play begins. After this stage, the user and the doll can carry on through a variety of patterns (149, 150, 151, 153, 154). Even if recognition fails during the dialogue, a decision algorithm is run so that the doll behaves much as a person does when unable to catch what the other party said, and decides whether to ask again, to deliberately give a response that is possible within the pattern, or to continue the play (153, 157). This depends on the processing method built into the doll. When the play ends after passing through the various patterns, the user must choose whether to play something else or to stop (155, 156, 158). There are four such play types, and the number of possible patterns reaches about 3,000.

【００５７】使用者にもっと興味を与え、実用性を強調
するため、タイマーモードとメッセージ録音及び再生モ
ードがある。すべての調整は、人形の内部に記録された
音声のみによってなされ、使用者はセッティングの正確
性を出力される音声により判断し得るようになってい
る。To make the toy more interesting and more practical for the user, there are a timer mode and a message recording and playback mode. All adjustments are made solely by means of the audio recorded inside the doll, and the user can judge from the output audio whether a setting is correct.

【００５８】タイマーモードは四つの接触スイッチによ
る３ビットの信号を用い、七つのセッティングモード、
八つのモード、時刻調整モードがある。また、電力の供
給が中断されて、使用者がセッティングしたデータが消
えるときに備えて、デフォルトセッティング機能があ
る。セッティングモードは、一度押すとセッティング状
態がオンとなり、二度押すとオフとなる。また、５秒以
上何の操作もしないと、自動的に元の状態を記録したま
までオフされる。セッティング可能なモードとしてはノ
ーアクションモード、時間調整モード、起床時間、朝食
時間、昼食時間、昼寝時間、夕食時間、就寝時間を知ら
せる八つのモードがあり、各モードで、使用者の音声メ
ッセージが決められた時刻に出力されるようにセッティ
ングすることができる。この際に、時刻調整モードは、
分単位にセッティング可能であり、修正された後、音声
で結果を知らせる。もちろん、このような機能の一部の
みを動作させたいときは、セッティングモードを順次押
して、オン／オフさせることができる。仮に、中途に実
行を止めても、自動的に既存に実行された値を記録し、
オフされる。モード調整スイッチを一度押すと、時刻修
正を知らせる音声が出力され、もう一度押すと、起床時
間をセッティングするモードとなり、残りも同一形式で
なされる。The timer mode uses a 3-bit signal from the four contact switches and has seven setting modes, eight modes, and a time adjustment mode. There is also a default setting function in case the power supply is interrupted and the data set by the user is lost. In a setting mode, one press turns the setting on and a second press turns it off. If nothing is operated for 5 seconds or more, the mode turns off automatically with the original state kept. Eight modes can be set: no-action mode, time adjustment mode, and modes announcing wake-up time, breakfast time, lunch time, nap time, dinner time, and bedtime. In each mode, the user's voice message can be set to be output at the set time. The time adjustment mode can be set in units of minutes and, after a change, announces the result by voice. To use only some of these functions, the setting modes can be stepped through in order and turned on or off. Even if the operation is abandoned partway, the values entered so far are automatically recorded before the mode turns off. Pressing the mode adjustment switch once outputs audio announcing time correction; pressing it again enters the mode for setting the wake-up time, and the remaining modes follow in the same way.
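
A 3-bit signal can distinguish exactly the eight modes listed above. The sketch below shows one way to model that, with the caveat that the bit-to-mode assignment, the wrap-around after the last mode, and all names are assumptions; the source gives only the list of modes and the order in which presses step through them.

```python
from enum import IntEnum


class TimerMode(IntEnum):
    """The eight settable modes, numbered in the order the text lists them."""
    NO_ACTION = 0
    TIME_ADJUST = 1
    WAKE_UP = 2
    BREAKFAST = 3
    LUNCH = 4
    NAP = 5
    DINNER = 6
    BEDTIME = 7


def decode_mode(bits):
    """Map a 3-bit tuple such as (0, 1, 0), MSB first, to a TimerMode."""
    b2, b1, b0 = bits
    return TimerMode(b2 << 2 | b1 << 1 | b0)


def press_mode_switch(mode):
    """Step to the next mode; wrapping after BEDTIME is an assumption."""
    return TimerMode((mode + 1) % len(TimerMode))
```

Starting from `NO_ACTION`, one press gives `TIME_ADJUST` and a second gives `WAKE_UP`, which matches the sequence the paragraph describes for the mode adjustment switch.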

【００５９】メッセージ録音（再生）モードを用いて使
用者の音声を４分程度録音することができ、それ以上の
録音をすると、自動的に中断される。一方、八つのモー
ドは所望の音声を七つの使用者セッティング時間に合わ
せて音声を再生させることができる。１０秒間に調整が
行われないと、自動的にオフされる。仮に、中間に実行
を止めても、自動的に実行された値を記録しオフされ
る。In message recording (playback) mode, about 4 minutes of the user's voice can be recorded; recording beyond that is cut off automatically. The eight modes can play back the desired audio at the seven user-set times. If no adjustment is made for 10 seconds, the mode turns off automatically. Even if the operation is abandoned partway, the values entered so far are automatically recorded before the mode turns off.

【００６０】圧縮された音声データを音声信号に変換す
る音声デコーダ５７は第１メモリ部３３に記録された圧
縮音声情報（１４４ビット／２４０サンプル１６kHz
サンプルデータ）を与えられたアルゴリズムでデコー
ディングするモジュールで、総１４個のサブモジュール
から構成され、次の順に従って進行される。第２メモリ
部３５に初期値を付与し（set init）、第１メモリ部３
３の圧縮データを読み取り（rd dat）、monotone、lsf
intを用いてＬＳＦ処理をした後、con gain，stoch c
w，adapl cw，adap2.cw，lsf pc，lp syn，post fitの
処理モジュール経て、音声信号が出力される。The audio decoder 57, which converts compressed audio data into an audio signal, is a module that decodes the compressed audio information recorded in the first memory unit 33 (144 bits per 240 samples, 16 kHz sample data) with a given algorithm. It consists of 14 submodules in total and proceeds in the following order: an initial value is assigned to the second memory unit 35 (set init); the compressed data of the first memory unit 33 is read (rd dat); LSF processing is performed using monotone and lsf int; and the audio signal is then output after passing through the con gain, stoch cw, adapl cw, adap2.cw, lsf pc, lp syn, and post fit processing modules.
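
The frame figures quoted for the decoder (144 bits per 240 samples) imply a frame duration, a bit rate, and a compression ratio. The short sketch below works them out. It takes the 16 kHz figure from this paragraph as the default sample rate, although other paragraphs mention 8 kHz, so the rate is left as a parameter. The 16-bit sample width and the function name are assumptions.

```python
def frame_stats(bits_per_frame=144, samples_per_frame=240,
                sample_rate_hz=16_000, bits_per_sample=16):
    """Return (frame_ms, bitrate_bps, compression_ratio) for one coded frame."""
    frame_ms = 1000.0 * samples_per_frame / sample_rate_hz
    bitrate_bps = bits_per_frame * sample_rate_hz / samples_per_frame
    # Ratio of uncompressed PCM bits to coded bits for the same frame.
    ratio = samples_per_frame * bits_per_sample / bits_per_frame
    return frame_ms, bitrate_bps, ratio
```

With the defaults this gives 15 ms frames at 9,600 bit/s; with `sample_rate_hz=8_000` it gives 30 ms frames at 4,800 bit/s. The compression ratio of about 26.7 to 1 does not depend on the sample rate, only on the assumed 16-bit sample width.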

【００６１】[0061]

【発明の効果】上述したように、本発明の音声認識対話
型人形おもちゃは、音声認識手段と音声発声手段から構
成されたシステムと、シナリオ展開を可能にするダイア
ログマネージャとを結合させて対話用に構成したもので
あり、子供のための人形であることを考慮して機械構造
のシステムを縫製した人形内に入れて、この人形おもち
ゃと遊びたいという欲求を誘発し、かつ言語の教育的効
果を高める、等の優れた効果がある。As described above, the voice recognition interactive doll toy of the present invention combines, for dialogue, a system made up of voice recognition means and voice utterance means with a dialog manager that enables scenarios to unfold. Because it is a doll for children, the mechanical system is placed inside a sewn doll, which has excellent effects such as stimulating the desire to play with the doll toy and enhancing the educational effect on language.

[Brief description of the drawings]

【図１】本発明の人形おもちゃを示す正面図である。FIG. 1 is a front view showing a doll toy of the present invention.

【図２】本発明の人形おもちゃを示す側面図である。FIG. 2 is a side view showing the doll toy of the present invention.

【図３】本発明の音声認識対話型人形おもちゃを示すシ
ステムブロック図である。FIG. 3 is a system block diagram showing a voice recognition interactive doll toy of the present invention.

【図４】図３の処理順序による流れ図である。FIG. 4 is a flowchart following the processing order of FIG. 3.

【図５】本発明のＡＳＩＣ化された音声認識部のブロッ
ク図である。FIG. 5 is a block diagram of an ASIC-based voice recognition unit of the present invention.

【図６】本発明の音声認識対話型人形おもちゃの動作を
示すフローチャート図である。FIG. 6 is a flowchart showing the operation of the voice recognition interactive doll toy of the present invention.

[Explanation of symbols]

３０音声処理制御部３３第１メモリ部３５第２メモリ部３７音声入出力部３９第１マイクロフォン４１第２マイクロフォン４３スピーカ４５電力増幅部４７Ａ／Ｄ・Ｄ／Ａコンバータ４９ボリューム制御部５１サーキュラバッファ５３音声認識部５５ダイアログマネージャ５７音声デコーダ５９システムコントローラ６１リストコントローラ６３メモリコントローラ６５電源供給部６７クロック発生部６９タイマー７１音声認識算出部７３ゼロクロシングレート７５エネルギー算出部７７単位音声検出部７９前処理器８１プレエンファシスＴ１，Ｔ２，Ｔ３，Ｔ４接触スイッチ Reference signs list: 30 audio processing control unit; 33 first memory unit; 35 second memory unit; 37 audio input/output unit; 39 first microphone; 41 second microphone; 43 speaker; 45 power amplification unit; 47 A/D-D/A converter; 49 volume control unit; 51 circular buffer; 53 voice recognition unit; 55 dialog manager; 57 audio decoder; 59 system controller; 61 list controller; 63 memory controller; 65 power supply unit; 67 clock generator; 69 timer; 71 voice recognition calculation unit; 73 zero-crossing rate; 75 energy calculation unit; 77 unit voice detection unit; 79 preprocessor; 81 pre-emphasis; T1, T2, T3, T4 contact switches

フロントページの続き (72)発明者パクヨンジョン大韓民国京畿道波洲市内面白連里28 (72)発明者キムウンジャ大韓民国ソウル市馬浦区桃花洞現代２次アパート208−1201 (72)発明者クオンサッボン大韓民国ソウル市江西区藤村洞公アパート805−402 (72)発明者イーチェキョン大韓民国全羅南道光洲広域市西区花正１洞786−22 (72)発明者チーキョンチェ大韓民国慶尚南道馬山市廻円区廻円２洞 509−11 (72)発明者パンテイシク大韓民国京畿道南陽洲市錦谷洞310−１ (72)発明者ハンチュイョン大韓民国京畿道高陽市徳陽区幸新１洞ヘッピッマウル1904棟902号 (56)参考文献特開平10−179941（ＪＰ，Ａ) 特開平９−326856（ＪＰ，Ａ) (58)調査した分野(Int.Cl.⁷，ＤＢ名) A63H 1/00 - 37/00 G10K 15/04 G10L 15/00 G10L 15/28 Continuing on the front page (72) Inventor Park Young-jeong 28, Baek-ri-ri, Hwasu-si, Gyeonggi-do, Republic of Korea (72) Inventor Kim Unjah Apartment 208-1201, Taekwha-dong, Mapo-gu, Seoul, South Korea 208-201 (72) Inventor Kuon Sapbon 805-402, Fujimura-dong, Gangseo-gu, Seoul, Republic of Korea (72) Inventor E Che-kyon 786-22, Hansa 1-dong, West-gu, Gwangju, Jeollanam-do, Republic of Korea (72) Inventor Chi Kyungchea, Masan, Gyeongsangnam-do, Republic of Korea 509-11, Kaiyuan 2-dong, Ichimaki-en-ku, Japan (72) Inventor Pan Tae-sik 310-1 Kinsuk-dong, Namyangsu-si, Gyeonggi-do, Republic of Korea (72) Inventor Han Chu-yeon Heppima-ul, Koshin-dong, Deok-yang, Goyang-si, Gyeonggi-do, Korea No. 902, 1904 (56) References JP-A-10-179941 (JP, A) JP-A-9-326856 (JP, A) (58) Fields investigated (Int. Cl. ⁷ , DB name) A63H 1 / 00-37/00 G10K 15/04 G10L 15/00 G10L 15/28

Claims

(57) [Claims]

1. A voice recognition interactive doll toy formed in the shape of a person, an animal, or a mixture of the two, having a first memory unit (33) in which digital audio signal streams of many sentences are recorded compressed at a predetermined compression rate, and a second memory unit (35) provided with a calculation area for recognizing a user's audio signal input from outside, the doll toy comprising: an audio input/output unit (37) that converts at least one spoken sentence of the user into an electrical audio signal and outputs it, and that lets the user audibly hear an expanded and restored audio signal; a circular buffer (51) that temporarily records, frame by frame, the digital audio signal output from the audio input/output unit (37); a voice recognition unit (53) that divides the digital audio signal recorded in the circular buffer (51) into words for voice recognition using the voice recognition constants of the compressed data in the first memory unit (33), and recognizes the user's voice using the Viterbi algorithm; a dialog manager (55) that selects at least one response sentence from the first memory unit (33) so that the content of the voice recognized by the voice recognition unit (53) corresponds to a predetermined scenario; an audio decoder (57) that expands and restores the compressed audio data of the first memory unit (33) selected by the dialog manager (55); an A/D-D/A converter (47) that is provided between the audio decoder (57) and the audio input/output unit (37) and converts either of an analog audio signal and a digital audio signal into the other; and a memory controller (63) that is provided between the second memory unit (35) and the list controller (61) and transfers data of the first memory unit (33) to the second memory unit (35).

2. The voice recognition interactive doll toy according to claim 1, wherein a list controller (61) is provided between the voice recognition unit (53) and dialog manager (55) on one side and the first memory unit (33) on the other, the list controller (61) extracting the compressed audio data and the voice recognition constants of the compressed data from the first memory unit (33) and transferring them to the second memory unit (35).

3. The voice recognition interactive doll toy according to claim 1, wherein the voice recognition unit (53) comprises: a voice recognition calculation unit (71) that removes predetermined noise from the frame-by-frame digital audio signal recorded in the circular buffer (51), using the voice recognition constants of the first memory unit (33), and calculates an eigenvalue for one character of speech as a feature vector; a zero-crossing rate (73) that detects zero points from the sampled values of the digital audio signal; an energy calculation unit (75) that calculates the energy at each zero point to make the zero-point detection of the zero-crossing rate (73) more reliable; a unit voice detection unit (77) that detects the endpoint data of any one word in the continuous digital audio signal, based on the output signals of the zero-crossing rate (73) and the energy calculation unit (75); a preprocessor (79) that divides the signal, word by word, into words for voice recognition, based on the feature vector data of the voice recognition calculation unit (71) and the endpoint data of the unit voice detection unit (77); and a second memory unit (35) that provides an area in which the compressed audio data of the first memory unit (33) corresponding to the words divided by the preprocessor (79) is extracted by the list controller (61) and processed with the Viterbi algorithm.

4. The voice recognition interactive doll toy according to claim 1, further comprising contact switches (T1, T2, T3, T4) that are embedded in a plurality of regions of the doll body, namely the back, nose, mouth, and buttocks, and that inform the audio decoder (57) of the user's contact.

5. The voice recognition interactive doll toy according to claim 4, wherein, when the user touches the contact switches (T1, T2, T3, T4) provided on the mouth, nose, back, and buttocks, the dialog manager (55) extracts the corresponding audio data from the first memory unit (33), the audio decoder (57) expands and restores it to actual audio, and the audio is played audibly to the user through the audio input/output unit (37).

6. The voice recognition interactive doll toy according to claim 1, wherein the audio input/output unit (37) comprises: a first microphone (39) that converts the user's voice and external noise into an electrical signal and outputs it to the circular buffer (51); a second microphone (41) that converts external noise into an electrical signal and outputs it to the circular buffer (51); and a power amplification unit (45) that power-amplifies the audio signal expanded and restored by the audio decoder (57) so that the user hears it audibly through a speaker (43).

7. The voice recognition interactive doll toy according to claim 6, wherein an A/D-D/A converter (47) is provided that, between the circular buffer (51) and the first and second microphones (39, 41), converts the output signals of the first and second microphones (39, 41) to digital and, between the audio decoder (57) and the power amplification unit (45), converts the digital audio signal expanded and restored by the audio decoder (57) to analog.

8. The voice recognition interactive doll toy according to claim 7, wherein a volume control unit (49) is provided between the A/D-D/A converter (47) and the power amplification unit (45), the volume control unit (49) adjusting the output intensity of the power amplification unit (45) according to the user's volume adjustment command.