JP2004301942A

JP2004301942A - Speech recognition device, conversation device, and robot toy

Info

Publication number: JP2004301942A
Application number: JP2003092400A
Authority: JP
Inventors: Yuji Sawajiri; 雄二澤尻
Original assignee: Bandai Co Ltd
Current assignee: Bandai Co Ltd
Priority date: 2003-03-28
Filing date: 2003-03-28
Publication date: 2004-10-28

Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition device and the like that recognizes specific words spoken during a certain time range of the day, automatically sets the time at which a speech associated with the recognized words is to be voiced, and voices that speech at the set time.
SOLUTION: The device comprises a control means equipped with a CPU, a clock transmitter, etc.; a rewritable storage means such as a nonvolatile ROM; a speech input means for inputting speech; and a speech recognizing means that recognizes the speech input through the speech input means as words. When the speech recognizing means recognizes specified words, the time is acquired from the clock signal of the clock transmitter. If the time of recognition falls within a previously set time range, a speech predetermined for the specified words is voiced at a specified time on the day that is a predetermined number of days after the day of recognition.
COPYRIGHT: (C)2005, JPO&NCIPI

Description

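[0001]
[Technical Field of the Invention]
The present invention relates to a speech recognition device, a conversation device having the speech recognition device, and a robot toy having the speech recognition device or the conversation device.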
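[0002]
[Prior Art]
A "small electronic device having an audio output function" (Patent Document 1), which produces a predetermined audio output corresponding to the time and calendar information when its lid is opened or closed, is conventionally known. The technique described in Patent Document 1 relates to a device that outputs calendar information, time information, and the like by voice when the lid is opened or closed.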
[0003]
[Patent Document 1]
JP-A-57-93444
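[0004]
[Problems to Be Solved by the Invention]
The technique described in Patent Document 1 can announce the time or utter phrases such as "Good morning" and "Hello" at a set time or when the lid is opened. However, the settings and triggers for the utterance depend on user operation: unless the user configures or accesses the device, no announcement by the predetermined voice is made.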
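[0005]
The present technology was devised in view of the above. Its object is to provide a speech recognition device and the like that recognizes actions the user performs without being conscious of them, or specific words uttered during a certain time of day such as "I'm off" (ittekimasu), automatically sets the date and time at which a voice associated with the recognized words should be uttered, and utters that voice at the set date and time.
A further object is to provide a conversation device that can converse with the user by voice using the speech recognition device.
A further object is to provide a robot toy in which the speech recognition device or the conversation device is built into a housing of a predetermined form.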
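[0006]
[Means for Solving the Problems]
To solve the above problems, the present invention has the following configuration. The speech recognition device according to claim 1 comprises:
control means including a CPU, a clock transmitter, and the like;
rewritable storage means such as a nonvolatile ROM;
voice input means for inputting voice; and
voice recognition means for recognizing the voice input through the voice input means as words,
wherein, when the voice recognition means recognizes a predetermined word, the time is acquired based on the clock signal of the clock transmitter, and
when the time at which the predetermined word was recognized falls within a preset time range, a predetermined voice associated in advance with the predetermined word is uttered at a predetermined time on the day that is a preset number of days after the day on which the word was recognized.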
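[0007]
To solve the above problems, the present invention also has the following configuration. The invention according to claim 2 is the speech recognition device according to claim 1,
wherein the predetermined time is the same as the time at which the predetermined word was recognized, or a time set on the basis of that time.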
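The scheduling rule of claims 1 and 2 can be sketched as follows. This is a minimal illustration only, not the patented implementation: the function name `schedule_utterance`, its parameters, and the half-open treatment of the time range are assumptions introduced here.

```python
from datetime import datetime, time, timedelta


def schedule_utterance(recognized_at, window_start, window_end,
                       days_ahead, offset=timedelta(0)):
    """Return when the associated voice should be uttered, or None.

    recognized_at: datetime at which the predetermined word was recognized.
    window_start, window_end: preset time range (assumed half-open here).
    days_ahead: preset number of days after the day of recognition.
    offset: optional shift, for a time "set on the basis of" the
            recognition time (claim 2); zero means the same time of day.
    """
    if not (window_start <= recognized_at.time() < window_end):
        return None  # outside the preset range: nothing is scheduled
    return recognized_at + timedelta(days=days_ahead) + offset


heard = datetime(2003, 3, 28, 7, 30)
print(schedule_utterance(heard, time(6, 0), time(23, 0), days_ahead=7))
```

With `days_ahead=7` this matches the weekly behaviour of the embodiment; other values correspond to the daily or two-day variants the description mentions later.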
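[0008]
To solve the above problems, the present invention also has the following configuration. The conversation device according to claim 3
has the speech recognition device according to claim 1 or 2, and has a conversation processing program that is executed in response to words recognized within the preset time range.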
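[0009]
To solve the above problems, the present invention also has the following configuration. In the robot toy according to claim 4,
the speech recognition device according to claim 1 or 2 is built into a housing whose outer shape imitates a human, an animal, an animation character, or the like.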
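[0010]
To solve the above problems, the present invention also has the following configuration. In the robot toy according to claim 5,
the conversation device according to claim 3 is built into a housing whose outer shape imitates a human, an animal, an animation character, or the like.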
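[0011]
[Embodiments of the Invention]
As one embodiment of the speech recognition device according to the present invention, a robot toy that can converse with a person by voice is described below.
FIG. 1 is a block diagram showing the electrical configuration of the conversation system, drive system, and so on of the robot toy 1 of this embodiment. The robot toy 1 has control means (a control circuit) built around a CPU 11 that controls the devices described below. In addition to the CPU 11, the control means has a clock transmitter (not shown). The clock transmitter outputs clock pulses to the CPU 11, and the CPU 11 performs its control based on those pulses.
The robot toy 1 also has a voice recognition IC 13, a ROM 15, a flash ROM 17, and the like, and is provided with switches, sensors, and other input means for these devices and the control means.
The CPU 11 is an arithmetic processor that controls the voice recognition IC 13, ROM 15, flash ROM 17, and other devices, issuing commands to each device mainly according to a program stored in the ROM 15. The CPU 11 also manages time based on the pulse signal output by the clock transmitter (current-time calculation, timer, day-of-week management, and so on).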
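[0012]
The voice recognition IC 13 is voice recognition means that can recognize, as predetermined words, voice input through the microphones provided in the robot toy 1 as voice input means (three microphones MR, MC, and ML in this embodiment). This embodiment uses the RSC-300-DIE manufactured by Sensory as the voice recognition IC 13.
The three microphones MR, MC, and ML are built into the right side of the head, the face, and the left side of the head of the housing of the robot toy 1, respectively. By determining which microphone receives the largest input (sound level), the robot toy can judge from which direction the voice is coming. Besides voice recognition, the voice recognition IC 13 sits between the CPU 11 and the ROM 15 and flash ROM 17 and, following commands (signals) from the CPU 11, handles the input and output of commands and information to and from the ROM 15 and flash ROM 17.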
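The direction judgment from the three microphones can be sketched as picking the microphone with the largest input level. The function name, the dictionary input, and the direction labels are assumptions for illustration; the source does not say how levels are measured or how ties are handled.

```python
# Microphone positions as given in the description.
MIC_DIRECTION = {"MR": "right", "MC": "front", "ML": "left"}


def voice_direction(levels):
    """Return the direction of the loudest microphone.

    levels: dict mapping "MR", "MC", "ML" to a measured input level
            (hypothetical units; larger means louder).
    """
    loudest = max(levels, key=levels.get)
    return MIC_DIRECTION[loudest]


print(voice_direction({"MR": 0.2, "MC": 0.1, "ML": 0.7}))
```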
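The ROM 15 is a memory IC that stores the program for operating the CPU 11, the data for the voice output through the speaker SP, and the like. The flash ROM 17 is a nonvolatile memory IC for storing information specific to the person using the robot toy 1 (hereinafter, the "user") and various information temporarily needed for control.
As means for inputting instructions and signals to the CPU 11, the robot toy 1 has a push switch SW1 (hereinafter the "sleep switch SW1" or "tail switch SW1") that moves the robot toy 1 into the standby mode described later, or from the standby mode into another mode also described later, and other switches (including sensors) SW2 to SW7 that detect the user approaching or touching the robot toy 1, the state of the robot toy 1 (tilted, upright, etc.), or the conditions around it. Of these, SW2 and SW3 are push switches that detect the user pressing on the robot toy 1, and SW4 to SW7 are sensors and the like that detect the state of the robot toy 1 (tilted, upright, etc.) or its surrounding environment.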
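[0013]
Three light-emitting diodes (LED1 to LED3), a speaker SP, a liquid crystal display LCD, three servo motors (SM1 to SM3), and four DC motors (M1 to M4) are also provided, each driven by a drive signal from the CPU 11.
The servo motors SM1 to SM3 and the DC motors M1 to M4 each have a motor driver and, on commands from the CPU 11, rotate through a predetermined angle or for a predetermined time to make the robot toy 1 travel (walk), turn its head, and so on.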
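[0014]
In this embodiment, the housing 3 that forms the exterior of the robot toy 1 is shaped to imitate an animation character and consists of members forming the head, torso, arms, and legs. The CPU 11, voice recognition IC 13, ROM 15, flash ROM 17, and so on are mounted on a circuit board and housed in the housing 3 together with a battery serving as the power supply. The housing may instead imitate a human, an animal, or the like.
The head is attached to the torso so that it can turn left and right, and is turned within a predetermined angle by the servo motor SM1. Likewise, the left and right arms are attached to the sides of the torso so that they can swing up and down, and are turned or rotated through a predetermined angle by the servo motors SM2 and SM3.
Drive wheels (not shown) are provided on the underside of the torso. They are rotated by the DC motors and can move the housing 3 forward, backward, left, and right.
Further, three light-emitting diodes LED1, LED2, and LED3, which light or blink in response to signals from the CPU 11, and a liquid crystal display LCD for displaying images and the like are provided at visible positions in the torso or head of the housing 3, and a speaker SP that outputs the voice generated by the voice recognition IC 13 is provided at a position on the face corresponding to the mouth.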
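[0015]
The sleep switch SW1 is a push switch. When the user presses it, the circuit closes and the CPU 11 detects the press.
Specifically, the robot toy 1 has a member corresponding to a tail that contains the sleep switch SW1; gripping or pressing the tail closes the circuit of the sleep switch SW1.
The switches SW2 and SW3 are built into the left and right arms (hands), respectively. When the user, for example, grips the hand of the robot toy 1, SW2 or SW3 is pressed and the CPU 11 can detect that action.
The other switches (including sensors) SW4 to SW7 include, for example, a switch provided on the head that detects a stroking motion (an optical sensor that detects the motion from the change in the amount of incoming light), a tilt sensor that detects the inclination of the robot toy 1, an optical sensor that detects ambient brightness, and other sensors that can detect various user actions toward the robot toy 1 or its surrounding environment. Each is placed at a suitable position on the housing 3.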
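[0016]
The ROM 15 is a memory that stores various data: the program executed by the CPU 11 or the voice recognition IC 13, and voice-generation data such as dialogue line data and speech synthesis data. FIG. 2(a) shows the memory map of the ROM 15.
The program stored in the ROM 15 is written by combining commands such as "T, R, ..." shown in FIG. 3 and compiled so that the CPU 11 can execute it. FIGS. 4, 5, and 6 show examples of the command syntax and the processing of the commands in FIG. 3. The program mainly lets the CPU 11 drive the voice recognition IC 13: under its control the voice recognition IC 13 recognizes voice input to the microphones MR, MC, and ML as words, generates voice from the voice-generation data, and inputs and outputs the associated data to and from the ROM 15 and flash ROM 17.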
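[0017]
In addition to the program, the ROM 15 stores voice-generation data, which comprises dialogue line data and speech synthesis data. Dialogue line data describes the lines shown in the tables of FIGS. 7 and 8; from it and the speech synthesis data, the voice recognition IC 13 generates a given voice and outputs it through the speaker SP.
Each line to be spoken is generated by arranging short phrases, stored against the "subdivision numbers" shown in FIG. 8, in the combination of subdivision numbers defined for each dialogue line number in FIG. 7. For example, the line defined for "dialogue line number 1" in FIG. 7 is the phrases for subdivision numbers 1, 2, 3, and 4 in FIG. 8 arranged in order, which yields the spoken line "Hajimemashite!, boku, ○○emon, desu." ("Nice to meet you! I am ○○emon.").
Each line in FIG. 7 also has silent intervals, during which no voice is uttered, set between its phrases. These intervals make the voice produced by the voice recognition IC 13 easy to listen to, as if a person were speaking normally.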
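The way a line is assembled from subdivision numbers and silent intervals can be sketched as two lookup tables. The table contents below are illustrative: the four phrases come from the example in the text, while the interval lengths and all names (`PHRASES`, `LINES`, `build_line`) are assumptions, since FIGS. 7 and 8 are not reproduced here.

```python
# Stand-in for FIG. 8: subdivision number -> short phrase.
PHRASES = {1: "Hajimemashite!", 2: "boku,", 3: "○○emon,", 4: "desu."}

# Stand-in for FIG. 7: line number -> list of
# (subdivision number, silent interval in seconds after the phrase).
# The interval values are made up for illustration.
LINES = {1: [(1, 0.3), (2, 0.1), (3, 0.1), (4, 0.0)]}


def build_line(line_no):
    """Return the (phrase, pause) sequence to feed to the synthesizer."""
    return [(PHRASES[sub], pause) for sub, pause in LINES[line_no]]


def line_text(line_no):
    """Return the line as plain text, ignoring the pauses."""
    return " ".join(phrase for phrase, _ in build_line(line_no))


print(line_text(1))
```

Storing phrases once and referencing them by number lets many lines share the same audio fragments, which suits a small ROM.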
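[0018]
The flash ROM 17 is rewritable storage means (nonvolatile memory) that stores data obtained by executing the commands and personal information entered by the user, such as age and date of birth. FIG. 2(b) shows the memory map of the flash ROM 17.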
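[0019]
Next, the program executed in the robot toy 1 and the control it performs are described. FIG. 9 is a conceptual diagram of the overall program structure, and FIG. 10 is a main flowchart of the principal part of the conversation processing included in it.
First, the overall structure. Control by the program can be divided broadly into three modes: sleep mode (M1), standby mode (M2), and system panel mode (M3). Each is outlined below.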
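[0020]
Sleep mode (M1) is the mode in which the main functions of the robot toy 1 are stopped. It is used when the user is not playing with the robot toy 1 (for example, while the user is asleep): the toy performs no major operations, and its functions are reduced to save power.
The toy enters this mode when the user operates the sleep switch SW1 (built into the tail in this example), or when there has been no input to the switches SW2 to SW7 and no input to the microphones (MR, MC, ML) for a fixed period (for example, 36 hours).
In this mode the robot toy 1 does not accept input even if the user speaks to it (voice input), shakes the housing 3 (tilt sensor input), or grips its hand (input to SW2 and SW3). Only detection of the sleep switch SW1 in the tail and timekeeping remain active. When the sleep switch SW1 is operated, the robot toy 1 enters standby mode (M2), described next; when it is operated again, the toy returns to sleep mode (M1).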
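[0021]
Standby mode (M2) is the mode in which the robot toy 1 can detect user operation of or input to the switches and sensors (SW1 to SW7, etc.) and sound input from outside through the microphones (hereinafter, an "approach"). Sound detection in standby mode (M2) is not speech recognition, in which a person's speech is recognized as words; it only detects that some sound has been input to the microphones (MR, MC, ML).
Unless it has been forced into sleep mode (M1) by a press of the sleep switch SW1, or has entered sleep mode (M1) after 36 hours without an approach, the robot toy 1 is in either standby mode (M2) or system panel mode (M3).
The transition from standby mode (M2) to system panel mode (M3) is triggered by an approach. In addition, when a specific time described later (an approach time) arrives, that is, when the time calculated from the clock information matches the specific time, the toy automatically utters a predetermined voice and then moves to system panel mode (M3).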
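The mode transitions described so far can be summarized as a small state machine. This is a sketch under assumptions: the event labels and the function `next_mode` are hypothetical, and the return from system panel mode to standby mode after a period without voice input, described later in the text, is left out.

```python
from enum import Enum


class Mode(Enum):
    SLEEP = "M1"
    STANDBY = "M2"
    SYSTEM_PANEL = "M3"


IDLE_LIMIT_HOURS = 36  # example value given in the text


def next_mode(mode, event, idle_hours=0):
    """Return the next mode for one event.

    event is one of the hypothetical labels "sleep_switch",
    "approach", "approach_time", or "none".
    """
    if event == "sleep_switch":
        # SW1 wakes the toy from sleep and forces it to sleep otherwise.
        return Mode.STANDBY if mode is Mode.SLEEP else Mode.SLEEP
    if mode is Mode.SLEEP:
        return Mode.SLEEP  # every other input is ignored while asleep
    if idle_hours >= IDLE_LIMIT_HOURS:
        return Mode.SLEEP  # no approach for the fixed period
    if mode is Mode.STANDBY and event in ("approach", "approach_time"):
        return Mode.SYSTEM_PANEL
    return mode


print(next_mode(Mode.STANDBY, "approach"))
```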
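[0022]
System panel mode (M3) is entered from standby mode (M2) by an approach. In it, the robot toy 1 recognizes input voice as words and outputs by voice the words corresponding to what it recognized (voice-generation data stored in the ROM 15, comprising dialogue line data and speech synthesis data), so that the user and the robot toy 1 can hold a spoken conversation.
The processing performed in system panel mode (M3) differs depending on, among other things, the time of the approach, as described later.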
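[0023]
Flags are now explained. In this description, a flag means data (present (1) or absent (0)) in a specific storage area managed mainly by the CPU 11.
This embodiment has several flags corresponding to the state of the robot toy 1. The main ones are the "welcome-back flag" (okaeri flag) and the "good-morning flag" (ohayo flag), and the "have-a-good-day output flag" and "have-a-good-day time flag" (itterasshai flags) described later.
The state of a flag is represented by 1 or 0. A flag with the value 1 is generally said to be set (raised); with the value 0 it is said to be not set, or cleared.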
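[0024]
Using the conceptual diagram of FIG. 9, the control in the initial state, from sleep mode (M1) to standby mode (M2), is described.
When the main power switch PSW is turned on from the as-purchased (factory) state or after an all-reset, "Initial Setting Flow No. 1" is executed.
The details of this process are omitted, but in it the current date, the time, the user's birthday, and so on are entered under a program written with the commands described above. The settings are made with the liquid crystal display LCD housed in the back of the head of the robot toy 1 and predetermined operation switches; the user makes them interactively, following what the LCD shows.
After that, the user's utterances of predetermined words are registered. The user speaks words such as "pocket", "I'm off" (ittekimasu), "walk", "dance", "○○emon", "gadget", "dorayaki", "mouse", "yes" (hai), "no" (iie), and a pocket open/close keyword, interactively; this improves the accuracy of the speech recognition performed during conversation processing. Each voice input is stored at a predetermined address in the flash ROM 17.
When "Initial Setting Flow No. 1" ends, processing moves to system panel mode (M3). From then on, standby mode (M2) and system panel mode (M3) alternate unless the toy moves to sleep mode (M1), for example because the sleep switch SW1 is operated.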
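[0025]
When the power is turned on with the initial settings complete, the toy starts in sleep mode (M1). In sleep mode (M1), unless the sleep switch SW1 in the tail, which serves as the start switch, is operated, only minimal functions such as timekeeping and retention of stored information remain, and the robot toy 1 outwardly does nothing. This minimizes power consumption at night and when the toy is not in use.
When the sleep switch SW1 in the tail is operated during this "sleep" control, the toy enters standby mode (M2).
On entering standby mode (M2), the toy checks its various states, such as the calendar, the time, and whether the pocket is closed, and checks for a voltage drop to determine whether the battery is exhausted. After these steps, it detects approaches to the robot toy 1 (mere sound detection, input to any switch, voice input from the user speaking, and so on); when an approach occurs, it performs processing such as "Standby Dialogue 1 No. 3", described in detail later, and then moves to system panel mode (M3).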
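[0026]
Next, the mode cycle between standby mode (M2) and system panel mode (M3) is outlined using the flowchart of FIG. 10 and the control table of FIG. 11.
The basic case of an approach (S1) is described first. When an approach (S1) is made to the robot toy 1 in standby mode (M2), or when a predetermined approach time arrives (approach times are described later), the robot toy 1 moves to system panel mode (M3).
An approach (S1) is judged to have occurred when input is detected at a switch or sensor on the arms, the housing 3, or elsewhere, or when the microphones (MR, MC, ML) detect some sound. In other words, input at a switch or the like means that the user has spoken to or touched the robot toy 1, and the robot toy 1 detects that action.
When there is an approach (S1), the CPU 11 checks whether the welcome-back flag is set (S2). In this embodiment, a set welcome-back flag means that the robot toy 1 is waiting for a user who has gone out to return; a cleared flag means that processing proceeds on the basis that the user is home (has not gone out).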
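[0027]
If the welcome-back flag is set (flag = 1) (S3), the voice "Welcome ba-ack!" is uttered after the approach is detected (S4), and the flag is updated (flag = 0) (S5). The time of the approach is then checked (S6), and a voice from the fifth to eighth patterns described later is uttered accordingly (S7).
The fifth to eighth patterns are selected by the time (t) from when the welcome-back flag was set until the next approach. In this example the ranges are: under 3 hours, 3 hours to under 10 hours, 10 hours to under 14 hours, and 14 hours or more. This reproduces what happens in a real home: the robot toy 1 is pleased if the user comes back soon and cross if the user is late.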
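The choice among the fifth to eighth patterns is a simple threshold lookup on the elapsed time. The function name is an assumption; the thresholds are the ones stated in the text.

```python
def welcome_back_pattern(elapsed_hours):
    """Map hours since the welcome-back flag was set to pattern 5 to 8."""
    if elapsed_hours < 3:
        return 5
    if elapsed_hours < 10:
        return 6
    if elapsed_hours < 14:
        return 7
    return 8


# The description later gives an approach 8 hours after leaving as an
# example of the sixth pattern.
print(welcome_back_pattern(8))
```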
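[0028]
If the welcome-back flag is not set (flag = 0) when the approach (S1) occurs (S8), the CPU 11 checks whether the good-morning flag is set (S9). A set good-morning flag means, as detailed later, that the toy is waiting for an approach after the user gets up in the morning; a cleared flag means that an approach has already been made that morning (after 6:00).
If the good-morning flag is set (S10), the time h is checked (S11). If h is between 6:00 and 23:00 (S12), "Good morning" is uttered (S13) and the good-morning flag is updated (flag = 0) (S14). The time h is then stored in the flash ROM 17, together with the day of the week, as a new or updated "good-morning approach time" (S15).
The toy is then set so that, if no physical approach is made between 6:00 a.m. and the stored approach time ((1) good-morning approach time) on the same day of the following week, it moves automatically from standby mode (M2) to system panel mode (M3). That is, when the stored good-morning approach time arrives on the same weekday of the following week and the other flag conditions are met, the robot toy 1 says "Good morning" and waits for the user to speak.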
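[0029]
If the check of the time h (S11) shows that h is between 23:00 and 6:00 (S16), a voice other than "Good morning", such as "What, what?" or "What's the matter at this hour?", is uttered to suggest that it is still sleeping time (S17), and the toy moves to system panel mode (M3). Thus the voice the robot toy 1 utters in response to an approach changes with the time band of the approach (in this example, 6:00 a.m. to 11:00 p.m. versus 11:00 p.m. to 6:00 a.m.).
If the check of the good-morning flag (S9) shows that the flag is not set, a voice such as "Whoa! What?" or "Hee hee... what is it?" is uttered (S17), and the toy moves to system panel mode (M3).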
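Steps S1 to S17 of the approach handling can be sketched as one function over the two flags. The dictionary layout, the key names, and the treatment of 6:00 as inside and 23:00 as outside the daytime band are assumptions; the text states the two bands with overlapping endpoints.

```python
def handle_approach(flags, hour):
    """Follow steps S2 to S17 for one approach.

    flags: dict with boolean "welcome_back" and "good_morning" entries;
           it is updated in place.
    hour:  hour of day (0 to 23) at which the approach occurred.
    Returns the first utterance.
    """
    if flags["welcome_back"]:                      # S2, S3
        flags["welcome_back"] = False              # S5
        return "Welcome back"                      # S4 (a pattern 5-8 line follows)
    if flags["good_morning"]:                      # S9, S10
        if 6 <= hour < 23:                         # S11, S12
            flags["good_morning"] = False          # S14
            flags["good_morning_hour"] = hour      # S15: stored approach time
            return "Good morning"                  # S13
        return "What's the matter at this hour?"   # S16, S17
    return "Whoa! What?"                           # S17


state = {"welcome_back": False, "good_morning": True}
print(handle_approach(state, 7), state)
```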
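[0030]
As described, when an approach occurs or a predetermined approach time arrives, the welcome-back flag and the good-morning flag are checked, the corresponding processing is performed, and the robot toy 1 moves to system panel mode (M3).
The main control in system panel mode (M3) is to return to standby mode (M2) if no voice is input within 5 minutes of entering the mode and, when voice is input, to recognize it as words with the voice recognition IC and perform conversation processing according to the result. In particular, if the user's voice is recognized as the predetermined phrase "I'm off", the time of recognition is stored ((2) "I'm off" ("time to leave, isn't it?") approach time) and the welcome-back flag is set. That is, when the specific phrase "I'm off" is recognized, the "(2) I'm off approach time" is stored or updated in the flash ROM 17 together with the day of the week. At the same time on the same weekday of the following week, the stored approach time is used to consult the welcome-back flag of the robot toy: if the flag is 0 ("I'm off" has not been recognized, that is, the user has not yet left), the toy gives a spoken reminder such as "Isn't it time to leave?".
If a phrase other than specific ones such as "I'm off" is recognized in system panel mode (M3), conversation processing corresponding to the recognition result is performed.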
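[0031]
Approach times are explained using the table of FIG. 12. As noted above, this embodiment stores two approach times that are updated weekly: approach time (1) (the "good-morning approach time"), stored when the user's voice is recognized as "Good morning", and approach time (2) (the "I'm off (time to leave) approach time"), stored when it is recognized as "I'm off".
Each approach time is stored in a predetermined area of the flash ROM 17, and on the same weekday of the following week (the same weekday one week later) the processing defined as "Approach Flow No. 2" is performed according to the stored time.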
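Weekly approach times can be sketched as one storage slot per kind and weekday, checked against the current time. The class, the kind label `"leave"`, the minute-level comparison, and the in-memory dictionary standing in for the flash ROM area are all assumptions for illustration.

```python
from datetime import datetime, timedelta


class ApproachTimes:
    """One slot per (kind, weekday), standing in for the flash ROM area."""

    def __init__(self):
        self.slots = {}

    def record(self, kind, when):
        # Store or overwrite this weekday's slot, truncated to the minute.
        key = (kind, when.weekday())
        self.slots[key] = when.replace(second=0, microsecond=0)

    def is_due(self, kind, now):
        # Due exactly one week after the stored moment.
        stored = self.slots.get((kind, now.weekday()))
        if stored is None:
            return False
        current = now.replace(second=0, microsecond=0)
        return current == stored + timedelta(days=7)


def should_remind(times, now, welcome_back_flag):
    """Remind the user to leave only if "I'm off" has not been heard yet."""
    return times.is_due("leave", now) and not welcome_back_flag


times = ApproachTimes()
times.record("leave", datetime(2003, 3, 28, 7, 45, 10))
print(should_remind(times, datetime(2003, 4, 4, 7, 45), welcome_back_flag=False))
```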
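[0032]
In this embodiment, four control methods are defined as "Approach Flow No. 2".
The first approach flow is shown in FIG. 13. It corresponds to the good-morning approach time and starts when the weekday and time stored as that approach time arrive.
When processing starts (S20), one of three lines is chosen at random and output: (1) "Good morning! It's morning, wake up! Wake up!", (2) "Good morning! It's time, wake up! Wake up!", or (3) "Good morning! Come on! Wake up! Wake up!" (step S21). The toy then waits for the user's voice. If the user inputs voice in this state (S23), the toy chooses and utters one of (1) "What a lovely morning", (2) "You got up properly.", or (3) "I'm going back to sleep..." (S24). It then lowers the good-morning flag (the "good morning" output flag), keeps the good-morning approach time (the "good morning" time flag) (S25), and ends the processing (S26).
If, after the voice is uttered in step S21, no user voice is detected for a fixed time (5 minutes in this example) (S27), the toy chooses and utters one of (1) "It's morning, but... let's sleep...", (2) "A good long sleep is a fine thing!", or (3) "Zzz... I'm talking in my sleep." (S28). It then sets the good-morning flag (the "good morning" output flag), resets the good-morning approach time (the "good morning" time flag) (S29), and ends the processing (S30).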
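The first approach flow (S20 to S30) can be sketched as follows. The function and key names are assumptions, the lines are English renderings of the examples in the text, and the 5-minute wait is reduced to a boolean argument.

```python
import random

WAKE_UP = [
    "Good morning! It's morning, wake up! Wake up!",
    "Good morning! It's time, wake up! Wake up!",
    "Good morning! Come on! Wake up! Wake up!",
]
ANSWERED = ["What a lovely morning", "You got up properly.",
            "I'm going back to sleep..."]
UNANSWERED = ["It's morning, but... let's sleep...",
              "A good long sleep is a fine thing!",
              "Zzz... I'm talking in my sleep."]


def morning_flow(user_spoke, rng=random):
    """Return (wake-up line, follow-up line, resulting flag changes).

    user_spoke: True if voice was detected within the waiting time.
    """
    wake = rng.choice(WAKE_UP)                         # S21
    if user_spoke:                                     # S23
        follow = rng.choice(ANSWERED)                  # S24
        flags = {"good_morning_flag": False,           # S25: lower the flag,
                 "approach_time_kept": True}           #      keep the time
    else:                                              # S27
        follow = rng.choice(UNANSWERED)                # S28
        flags = {"good_morning_flag": True,            # S29: set the flag,
                 "approach_time_kept": False}          #      reset the time
    return wake, follow, flags


print(morning_flow(user_spoke=True))
```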
In addition, the conversation processing shown in FIGS. 14, 15, 16, and 17 is performed as the second to fourth approach flows.
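[0033]
Next, the "Standby Dialogue 1 No. 3" processing is described.
When an approach is made to the robot toy 1 in standby mode (M2), this control outputs a line that differs according to whether the good-morning flag is set and according to the time of the approach, and then moves to system panel mode (M3).
As FIG. 18 shows, the processing is divided into eight patterns by the time of the approach, and the patterns fall into two further groups according to whether the approach comes before or after recognition of the phrase "Have a good day" (itterasshai) from the user (see FIG. 11).
The first pattern is "between 6:00 and 23:00 with no flag set"; the second is "between 23:00 and 6:00", regardless of flags; the third is "from 6:00 until the time at which the good-morning flag is set"; and the fourth is "from the time at which the good-morning flag is set until 11:00". Here, "the time at which the good-morning flag is set" means the time stored in the flash ROM 17 on the same weekday one week earlier; it is rewritten every week. During the first week after initial setup, or when the robot toy 1 could not recognize "Good morning" at any point in a day (it was not input), no time is stored and the good-morning flag is not set on the same weekday of the following week.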
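[0034]
FIG. 19 lists the voices uttered in the first to fourth patterns. In the first pattern, for example, when any approach is made to the robot toy 1, a line is chosen at random from dialogue line numbers (1) 82, (2) 1019, (3) 83, (4) 84, and (5) 85 and output; if (1) is chosen, the line "Mumble mumble..." is uttered and the robot toy 1 then moves to system panel mode (M3). When the line is output, the phrase data defined by the subdivision numbers (FIG. 6) corresponding to the chosen dialogue line number, as shown in FIG. 7, are combined into one line.
If there is no voice input from the user after the line is output ("no voice" in FIG. 11), the toy waits about 5 minutes, outputs dialogue line (1) 86 or (2) 28 at random, and moves to standby mode (M2).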
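[0035]
In system panel mode (M3), if the user inputs further voice after a line has been output, the following processing is performed according to its content.
When voice is input to a microphone, the CPU 11 determines from the result of processing by the voice recognition IC 13 whether the phrase "I'm off" was input. If the voice is not recognized as "I'm off", it is treated as other voice input, and processing branches to "Anniversary Dialogue 1 No. 5 DX Version" shown in FIG. 9. Processing from that point on is not described in this specification.
If the voice is recognized as "I'm off", the voice recognition IC 13 outputs "Have a good day", the welcome-back flag is set at the same time, and the toy moves to standby mode.
As noted above, the time of the first approach after 6:00 a.m. and the time at which the welcome-back flag was set are stored every day, on a weekly basis. A storage area for one week is reserved, and after the first week has passed each time is overwritten as each weekday comes around. If there was no approach, no new time is stored and the time data is erased.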
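[0036]
After outputting "Have a good day" and setting the welcome-back flag as described, the toy moves to standby mode (M2). In this state the robot toy 1 is waiting for the user who has gone out to return.
If any approach occurs in standby mode (M2) with the welcome-back flag set, the CPU 11 checks whether the flag is set and, since it is, utters "Welcome ba-ack", then branches the utterance of the next line according to the time elapsed since "I'm off" was recognized (that is, from the time the welcome-back flag was set to the time sound was next detected). The lines follow the fifth to eighth patterns shown in FIG. 18 (see also FIG. 11), with the content shown in FIG. 20.
That is, the voice follows the fifth pattern shown in FIG. 12 when less than 3 hours have elapsed since "Have a good day" was uttered, the sixth pattern when 3 hours to under 10 hours have elapsed, the seventh pattern when 10 hours to under 14 hours have elapsed, and the eighth pattern when 14 hours or more have elapsed. Line selection and voice output are handled as described above. For example, if an approach occurs 8 hours after "I'm off" was recognized and the welcome-back flag set, the robot toy 1 follows the sixth pattern shown in FIG. 12, chooses at random between the two lines (1) "Welcome back!" and (2) "Good work today!", and utters the voice through the speaker SP from the voice data for the chosen dialogue line number.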
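[0037]
As described, when an approach occurs in standby mode (M2) with the welcome-back flag set, "Welcome ba-ack" is uttered, followed by a line from the fifth to eighth patterns. If the user then inputs no voice for a fixed time (about 5 minutes), the toy utters a voice such as (1) "Could it be a burglar?" or (2) "Hee hee... you think no one is back yet...", and returns to standby mode (M2).
If voice is input within 5 minutes of "Welcome ba-ack" being uttered (while in system panel mode), the voice conversation processes for talking with the user are performed. These are "Anniversary Dialogue 1 No. 5 DX Version", "Anniversary Dialogue 2 No. 5 Single Version", "Seasonal Dialogue No. 6", "Standby Dialogue 2 No. 4", "Time-of-Day Conversation Flow No. 7", "Dora Conversation Flow No. 8", and so on. Their details are omitted, but after this series of conversation processes the toy returns to system panel mode.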
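[0038]
In the embodiment above, the approach times are updated weekly, but when the invention is applied to other toy devices they may be updated daily or every two days. The update period may also be set freely by the user.
Furthermore, the approach time need not be the same time as the approach on the same weekday of the previous week; it may be, say, 10 minutes earlier, or otherwise modified from the stored approach time under a fixed condition.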
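[0039]
For time and week management, calendar information may be stored so that the current date and time are known as year, month, day, weekday, hour, and minute. Alternatively, the elapsed time from a given moment may be counted from the clock information of the clock transmitter and reset every week. That is, by treating the first day as running from the start of counting to 24 hours, the second day from 24 to 48 hours, and so on, one week of information can be managed without calendar information.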
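The calendar-free alternative amounts to integer arithmetic on an elapsed-time counter that wraps every week. The sketch below assumes a counter in seconds and a zero-based day index; both are illustrative choices, not details from the source.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY


def day_slot(elapsed_seconds):
    """Return (day index 0 to 6, seconds into that day).

    elapsed_seconds counts from the moment counting started; the counter
    is treated as wrapping once a week, so no calendar is needed.
    """
    wrapped = elapsed_seconds % SECONDS_PER_WEEK
    return divmod(wrapped, SECONDS_PER_DAY)


# 25 hours after the start falls one hour into the second day (index 1).
print(day_slot(25 * 3600))
```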
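[0040]
A robot toy incorporating the speech recognition device or the conversation device can be given an exterior that imitates, for example, an animation character. In that case the voice uttered can be made the same as that of the voice actor in the animation as actually broadcast, which gives the user a sense of closeness, as if the character were really there. Further, by driving the built-in motors and the like to move the parts, movements characteristic of the broadcast animation character can be reproduced along with the speech.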
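[0041]
[Example]
Next, as an example of the program processing executed when an approach time arrives, the conversation processing for (2) the "I'm off (time to leave) approach time" (see FIG. 9) is described. FIG. 14 shows the conversation processing (No. 002) of "Approach Flow No. 2" that is performed when (2) the "time to leave approach time" arrives.
When the approach time arrives, conversation processing (No. 002) starts (S40), and "Isn't it time to leave?" is output by voice (S41). This voice is defined as dialogue line number 31 and is uttered according to that definition. Hereinafter, where the flowchart shows a dialogue line number, a line chosen at random from the several prepared for it is output.
After "Isn't it time to leave?" is uttered, the CPU 11 waits for voice input from the user and varies the subsequent processing according to whether or not voice is input within a predetermined time.
If "yes" or an equivalent phrase is recognized in this state (S42), one of the following two processes (S43 or S44) is selected with a probability of one half. If S43 is selected, "Off you go, then! Have a good day!" is output and the toy waits for the next voice input. If any voice is input in this state (S45), one line chosen at random from (1) "Do your best today!", (2) "Take care!", and so on is uttered (S46); the have-a-good-day output flag is lowered, the welcome-back flag is set, the have-a-good-day time flag is kept (S47), and conversation processing (No. 002) ends (S48).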
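[0042]
If no voice is recognized (S49) after "Off you go, then! Have a good day!" is output (S43), a voice such as (1) "Huh? Gone already..." or (2) "Leaving without a goodbye, how sloppy!" is uttered (S50), the processing of S47 is performed, and conversation processing (No. 002) ends (S48).
If, after a phrase equivalent to "yes" is recognized (S42), processing moves with a probability of one half to S44, the conversation processing described later (No. 003 or No. 004) is performed, after which "Off you go, then! Have a good day!" is output (S43).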
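[0043]
If the user inputs "no" or an equivalent (S50) after "Isn't it time to leave?" is output (S41), the next voice output (S51) is performed: one line chosen at random from (1) "My mistake.", (2) "Oh, was that so?", and so on is output, and processing continues. In the next step, the have-a-good-day output flag is left set, the have-a-good-day time flag is reset, the time at which "I'm off" is next recognized is stored (S52), and conversation processing (No. 002) ends (S53).
If, after "Isn't it time to leave?" is output (S41), voice other than "yes", "no", or their equivalents is input (S54), a voice such as (1) "Get ready early" or (2) "You'd better hurry..." is output (S55), processing moves to S52, and conversation processing (No. 002) ends (S53).
If no voice is input (S56) after "Isn't it time to leave?" is output (S41), a voice such as (1) "No plans, then..." is output, processing moves to S57, and conversation processing (No. 002) ends (S53). This is the overall flow of conversation processing (No. 002).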
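[0044]
Next, the "yes/no" conversation processing performed within the conversation program is described. It is a programming method for advancing spoken conversation with the user, and is the processing performed after conversation processing (No. 002) starts.
It is intended to provide a conversation device in which, despite a simple program structure, the same words are seldom repeated in conversation with the user and the conversation can be rich.
Its basic flow is as follows. After a specific question, prompt, or remark is put to the user, the voice group for the next utterance (the word set described later) is determined according to whether "yes" or an equivalent, or "no" or an equivalent, was input; one line is then selected from the several stored for that group and uttered. If something other than "yes", "no", or their equivalents is input, or nothing is input, the corresponding voice output is made, and the conversation then proceeds with one phrase selected from several according to "yes", "no", and so on.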
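[0045]
In other words, the "yes/no" conversation processing advances (branches) the conversation according to whether what the user said is affirmative or negative.
Whether the input voice is affirmative or negative is judged from meaning numbers assigned in advance to the words expected as input. A meaning number is a number or other identifier assigned according to the group (category) to which a word belongs, the words being grouped by whether they are affirmative, negative, or of some other sense.
As one concrete, non-limiting example, the program in this example has a data table with the content shown in FIG. 21. Recognized words with an affirmative sense, "yes" (hai), "yeah" (un), "done" (dekita), and "all done" (dekita de yo), are defined as meaning number 1, and negative words, "no" (iie), "not yet" (mada), and "not yet, no" (mada da yo), as meaning number 2. When voice input during a conversation program corresponds to a word with a defined meaning number, the conversation proceeds according to that number.
Because the expected input changes with the question or prompt put to the user, meaning numbers are defined for words separately for each conversation program that asks a specific question (for each program name (program number) in the table of FIG. 21).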
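[0046]
The way meaning numbers advance the conversation is illustrated with conversation processing (No. 002) described above.
That processing corresponds to the program named "Time to leave, isn't it" in the table of FIG. 21; its program number is No. 2-002. For program No. 2-002, several recognition words are registered as word set "3". These are the words expected in reply to the question or prompt, and each has a meaning number defined for it.
For the voice input that follows the output of "Isn't it time to leave?" (S41) after the processing starts (S40): if "yes" is input, its meaning number is 1, so processing moves to the next step (S42 or S44) on the basis of meaning number 1; if "no" is input, its meaning number is 2, so processing moves to the next step (S50) on the basis of meaning number 2. A word other than "yes" or "no" is handled in the same way; "yeah", for instance, is defined as meaning number 1 and is processed accordingly.
If voice is input but it is an unexpected word with no meaning number defined, it is handled as a case in which only voice was recognized (S54 above).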
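Branching on meaning numbers can be sketched as a per-program lookup table plus a pool of replies for each outcome. The table below is a stand-in for FIG. 21, which is not reproduced here: the three words for word set 3 are the ones named in the text, the replies are English renderings of its examples, and the labels `"other"` and `"silence"` and all function names are assumptions.

```python
import random

# word set number -> {recognized word: meaning number}
# 1 = affirmative, 2 = negative.
WORD_SETS = {3: {"hai": 1, "un": 1, "iie": 2}}

# Reply pools for "Isn't it time to leave?" (conversation No. 002).
REPLIES = {
    1: ["Off you go, then! Have a good day!"],
    2: ["My mistake.", "Oh, was that so?"],
    "other": ["Get ready early", "You'd better hurry..."],
    "silence": ["No plans, then..."],
}


def classify(word_set, heard):
    """Return the meaning number, "other" for an unlisted word,
    or "silence" when nothing was heard (heard is None)."""
    if heard is None:
        return "silence"
    return WORD_SETS[word_set].get(heard, "other")


def reply(word_set, heard, rng=random):
    """Pick one reply at random from the pool for the classified input."""
    return rng.choice(REPLIES[classify(word_set, heard)])


print(classify(3, "un"), reply(3, "iie"))
```

Because several words share one meaning number and each outcome has several candidate replies, a small table yields varied exchanges, which is the effect the description attributes to this scheme.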
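[0047]
As described, conversation processing in this example stores, as a word set for each conversation process, several expected replies (words) to the question or prompt, and selects and performs the next processing according to the meaning number of the input word.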
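[0048]
Other examples of the conversation program are now briefly described.
FIG. 15 shows conversation processing (No. 003), which is one of the processes selected with a probability of one half at S44 of the conversation processing above.
Here too, as FIG. 21 shows, word set "5" provides the recognition words expected in reply to the robot toy 1's question "Are you ready?": "yes" (hai), "yeah" (un), "done" (dekita), "all done" (dekita de yo), "no" (iie), "not yet" (mada), and "not yet, no" (mada da yo). As the figure also shows, a meaning number is defined for each recognition word.
In conversation processing (No. 003), when "yes", "yeah", "I won't", or the like is input in reply to the robot toy 1's question "Are you ready?", the next processing corresponding to the meaning number is performed.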
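[0049]
FIG. 16 shows conversation processing (No. 004), which is also one of the processes selected with a probability of one half at S44 of the conversation processing above.
Here too, as FIG. 21 shows, word set "6" provides the recognition words expected in reply to the robot toy 1's question "You won't be late, will you?": "yes" (hai), "yeah" (un), "I won't" (shinai), "I won't, no" (shinai yo), "no" (iie), "I will" (shichau), and "I will, yes" (shichau yo). As the figure also shows, a meaning number is defined for each recognition word.
In conversation processing (No. 004), when "yes", "yeah", "I won't", or the like is input in reply to the robot toy 1's question "You won't be late, will you?", the next processing corresponding to the meaning number is performed.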
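[0050]
FIG. 17 shows conversation processing (No. 005).
It is performed when a predetermined approach time arrives in standby mode (M2) shown in FIG. 9.
Here too, as FIG. 21 shows, word set "7" provides the recognition words expected in reply to the robot toy 1's question "May I go to sleep?": "yes" (hai), "yeah" (un), "OK" (ii), "sure" (ii yo), "no" (iie), "no way" (dame), and "no, you can't" (dame da yo). As the figure also shows, a meaning number is defined for each recognition word.
In this conversation processing (No. 004), when "yes", "yeah", "sure", or the like is input in reply to the robot toy 1's question "May I go to sleep?", the voice output processing corresponding to the meaning number is performed.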
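[0051]
As described, the conversation processing of this example stores, as recognition words, the words expected in answer to a specific question or prompt and divides them by meaning number into categories such as affirmative and negative, so that the conversation proceeds according to the category to which a recognized word belongs, that is, its meaning number. Because the conversation branches on the meaning number, and one phrase is selected from several prepared voices at each branch, the same exchange is seldom repeated despite the simple program structure, and the user can keep using the toy without tiring of it.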
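[0052]
[Effects of the Invention]
The speech recognition device according to the present invention described above has the following effects. With a simple configuration, it can produce spoken output suited to the user's regular routine of getting up in the morning, leaving for work or school, coming home, and going to bed. If one tried to grasp the user's routine rigorously and respond to it, it would be necessary to collect, over a long period, the times at which utterances such as "Good morning", "I'm off", and "I'm home" were recognized and to compute the pattern statistically. Yet however rigorously the behaviour pattern is computed, it cannot cope with behaviour that is irregular in the first place. Conversation processing based on the user's behaviour one week earlier therefore differs little in effect from processing based on detailed statistics. Moreover, when the speech recognition device of the invention is applied to a robot toy, some mismatch with the user's actual behaviour can make the toy more amusing as a plaything. The invention thus needs no large-scale hardware of the kind required to collect large amounts of data, and can sustain amusing toy conversation with minimal hardware.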
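[0053]
According to the speech recognition device of the present invention, voice is output in response to the user getting up in the morning, leaving for work or school, coming home, and going to bed. The user therefore does not need to set up a scheduler for schedule management, such as setting the alarm time on an alarm clock in order to get up in the morning. From voice input that accompanies those actions and is not made for the purpose of schedule management, the device can store the user's behaviour pattern and output suitable voice to match it. When the invention is applied to a robot toy, the user's behaviour pattern is stored as the user plays by saying "Good morning", "I'm off", and so on to the toy, without the user being particularly aware of it, and suitable voice is output to match the user's behaviour. The voice thus comes unexpectedly, so the user can keep using the toy without tiring of it.
Further, although not exact, the voice is given at timings that roughly match the life and behaviour pattern of most users, other than people with extremely irregular lives. The user feels that the toy remembers his or her actions, can treat the robot toy with affection, and can go on using it over a long period.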
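[0054]
The conversation device according to the present invention can vary the content of the conversation with the time of day: in the morning, conversation suited to the situation before leaving for work and the like; at night, conversation suited to the situation, such as showing appreciation to the user who has come home.
Further, the "yes/no" conversation processing provides rich conversation despite a simple program structure.
The robot toy according to the present invention, by taking a form such as that of an animation character, can be used with a sense of closeness. In particular, if it imitates an animation character actually broadcast on television or the like, the same voice as in the broadcast can be used, which heightens that closeness. In addition, by driving the hands, feet, head, and so on with motors and the like, the toy can move while talking and reproduce movements and utterances characteristic of the animation character.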
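[Brief Description of the Drawings]
[FIG. 1] A block diagram showing the electrical configuration of the speech recognition device according to this embodiment.
[FIG. 2] An address map showing the configuration of the storage means used in the speech recognition device according to this embodiment.
[FIG. 3] An example of the commands used in the program for the speech recognition device according to this embodiment.
[FIG. 4] A table explaining an example of the processing of the commands used in the program of the speech recognition device according to this embodiment.
[FIG. 5] A table explaining another example of the processing of the commands used in the program of the speech recognition device according to this embodiment.
[FIG. 6] A table explaining yet another example of the processing of the commands used in the program of the speech recognition device according to this embodiment.
[FIG. 7] A table showing the data table of the lines uttered by the speech recognition device according to this embodiment.
[FIG. 8] A table showing the data of the phrases that make up the lines uttered by the speech recognition device according to this embodiment.
[FIG. 9] An overall flowchart explaining the overall structure of the program processing of the speech recognition device according to this embodiment.
[FIG. 10] A flowchart of the main part of the program processing of the speech recognition device according to this embodiment.
[FIG. 11] A table explaining the states of the speech recognition device according to this embodiment.
[FIG. 12] A table explaining the times stored in the speech recognition device according to this embodiment.
[FIG. 13] An example of a flowchart of program processing executed in the speech recognition device according to this embodiment.
[FIG. 14] An example of a flowchart of program processing executed in the speech recognition device according to this embodiment.
[FIG. 15] An example of a flowchart of program processing executed in the speech recognition device according to this embodiment.
[FIG. 16] An example of a flowchart of program processing executed in the speech recognition device according to this embodiment.
[FIG. 17] An example of a flowchart of program processing executed in the speech recognition device according to this embodiment.
[FIG. 18] A table outlining the program processing executed in the speech recognition device according to this embodiment.
[FIG. 19] A table explaining an example of the lines uttered by the speech recognition device according to this embodiment.
[FIG. 20] A table explaining another example of the lines uttered by the speech recognition device according to this embodiment.
[FIG. 21] A table explaining the word sets in the speech recognition device according to this embodiment.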
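[Description of Reference Signs]
MR: right microphone
MC: center microphone
ML: left microphone
SM1: servo motor
SM2: servo motor
SM3: servo motor
DC1: DC motor
DC2: DC motor
DC3: DC motor
DC4: DC motor
LED1: light-emitting diode
LED2: light-emitting diode
LED3: light-emitting diode
SP: speaker
SW1: push switch
SW2: push switch
SW3: push switch
PSW: power switch
1: robot toy
3: toy body (housing)
11: CPU
13: voice recognition IC
15: ROM
17: flash ROM
[0001]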
[Technical Field of the Invention]
The present invention relates to a speech recognition device, a conversation device having the speech recognition device, and a robot toy having the speech recognition device or the conversation device.
[0002]
[Prior art]
A "small electronic device having an audio output function" (Patent Document 1), which produces a predetermined audio output corresponding to the time and calendar information when its lid is opened or closed, is conventionally known. The technique described in Patent Document 1 relates to a device that outputs calendar information, time information, and the like by voice when the lid is opened or closed.
[0003]
[Patent Document 1]
JP-A-57-93444
[0004]
[Problems to be solved by the invention]
The technique described in Patent Document 1 can announce the time or utter phrases such as "Good morning" and "Hello" at a set time or when the lid is opened. However, the settings and triggers for the utterance depend on user operation: unless the user configures or accesses the device, no announcement by the predetermined voice is made.
[0005]
The present technology was devised in view of the above. Its object is to provide a speech recognition device and the like that recognizes actions the user performs without being conscious of them, or specific words uttered during a certain time of day such as "I'm off", automatically sets the date and time at which a voice associated with the recognized words should be uttered, and utters that voice at the set date and time.
It is another object of the present invention to provide a conversation device capable of having a conversation with a user by voice using the speech recognition device.
Another object of the present invention is to provide a robot toy in which the voice recognition device or the conversation device is built in a casing of a predetermined form.
[0006]
[Means for Solving the Problems]
In order to solve the above problems, the present invention has the following configuration. That is, the voice recognition device according to claim 1 is
Control means including a CPU, a clock transmitter, and the like;
Rewritable storage means such as a non-volatile ROM,
Voice input means for inputting voice;
Voice recognition means for recognizing the voice input by the voice input means as words,
When a predetermined word is recognized by the voice recognition unit, a time is acquired based on a clock signal of the clock transmitter,
When the time at which the predetermined word was recognized falls within a preset time range, a predetermined voice associated in advance with the predetermined word is uttered at a predetermined time on the day that is a preset number of days after the day on which the word was recognized.
[0007]
Further, in order to solve the above problem, the present invention has the following configuration. That is, the invention according to claim 2 is the speech recognition device according to claim 1,
The predetermined time is the same time as the time at which the predetermined word is recognized or a time set by the time.
[0008]
Further, in order to solve the above problem, the present invention has the following configuration. That is, the conversation device according to claim 3 is
It has the speech recognition device according to claim 1 or 2, and has a conversation processing program that is executed in response to words recognized within the preset time range.
[0009]
Further, in order to solve the above problem, the present invention has the following configuration. That is, the robot toy according to claim 4 is:
The voice recognition device according to claim 1 or 2 is built in a housing having an outer shape imitating a human, an animal, an animation character, or the like.
[0010]
Further, in order to solve the above problem, the present invention has the following configuration. That is, the robot toy according to claim 5 is:
The conversation device according to claim 3 is built in a housing having an outer shape imitating a human, an animal, an animation character, or the like.
[0011]
[Embodiments of the Invention]
Hereinafter, as one embodiment of the voice recognition device according to the present invention, a robot toy capable of talking with humans by voice will be described.
FIG. 1 is a block diagram illustrating an electrical configuration of a conversation system, a drive system, and the like of the robot toy 1 according to the present embodiment. The robot toy 1 has a control means (control circuit) mainly composed of the CPU 11 and the like for controlling various devices described below. The control means has a clock transmitter (not shown) in addition to the CPU 11. The clock transmitter outputs a predetermined clock pulse to the CPU 11, and the CPU 11 performs predetermined control based on the clock pulse.
In addition, the robot toy 1 has a voice recognition IC 13, a ROM 15, a flash ROM 17, and the like, and has various means such as switches and sensors as input means for these devices and the control means.
The CPU 11 is an arithmetic processor, which controls the voice recognition IC 13, the ROM 15, the flash ROM 17, and other devices, and mainly issues instructions to each device in accordance with a program stored in the ROM 15. Further, the CPU 11 has a function of managing time based on a pulse signal output from the clock transmitter (current time calculation function / timer function, day of week management function, etc.).
[0012]
The voice recognition IC 13 is voice recognition means that can recognize, as predetermined words, voice input through the microphones provided in the robot toy 1 as voice input means (three microphones MR, MC, and ML in the present embodiment). In the present embodiment, the RSC-300-DIE manufactured by Sensory is adopted as the voice recognition IC 13.
The three microphones MR, MC, and ML are incorporated in the right side of the head, the face, and the left side of the head of the housing constituting the robot toy 1, respectively. By determining which microphone receives the largest input (sound volume), the robot toy can judge from which direction the input voice is being uttered. In addition to the voice recognition function, the voice recognition IC 13 sits between the CPU 11 and the ROM 15 and flash ROM 17 and, in accordance with commands (signals) from the CPU 11, inputs and outputs commands and information to and from the ROM 15 and the flash ROM 17.
The ROM 15 is a memory IC that stores a program for operating the CPU 11, voice generated via the speaker SP (data for voice utterance), and the like. The flash ROM 17 is a non-volatile memory IC for storing information unique to a person who uses the robot toy 1 (hereinafter, “user”) and various kinds of information temporarily necessary for control.
As means for inputting instructions and signals to the CPU 11, the robot toy 1 has a push switch SW1 (hereinafter referred to as the "sleep switch SW1" or the "tail switch SW1") for shifting the robot toy 1 to a standby mode described later or from the standby mode to another mode also described later, and other switches SW2 to SW7 (including sensors) that detect the user's approach to or contact with the robot toy 1, the state of the robot toy 1 (inclined, upright, etc.), or the situation around the robot toy 1. Of these, the switches SW2 and SW3 are push switches for detecting that the user has pushed the robot toy 1 or the like, and the switches SW4 to SW7 are sensors for detecting the state of the robot toy 1 (inclined, upright, etc.) or the environment around the robot toy 1.
[0013]
Further, three light emitting diodes (LED1 to LED3), a speaker SP, a liquid crystal display device LCD, three servo motors (SM1 to SM3), and four DC motors (M1 to M4) are provided.
Each of the servo motors SM1 to SM3 and the DC motors M1 to M4 has a motor driver for driving the motor and rotates by a predetermined angle or for a predetermined time based on a command from the CPU 11, thereby making the robot toy 1 travel (walk) and rotating its head.
[0014]
In the present embodiment, the housing 3 constituting the appearance of the robot toy 1 is formed in a shape imitating an animation character and is made up of members constituting a head, a body, arms, and legs. The CPU 11, the voice recognition IC 13, the ROM 15, the flash ROM 17, and the like are mounted on a predetermined circuit board and housed in the housing 3 together with a battery serving as a power supply. The housing may instead imitate a human, an animal, or the like.
The head is attached to the body so as to turn left and right and is turned within a predetermined angle by the servo motor SM1. Similarly, the left and right arms are attached to the left and right sides of the body so as to rotate in the vertical direction and are rotated by a predetermined angle by the servo motors SM2 and SM3.
Further, a power wheel (not shown) is provided on the bottom of the body, and the power wheel is rotated by the DC motor so that the housing 3 can move back and forth and left and right.
Further, the robot toy 1 includes three light emitting diodes LED1, LED2, and LED3 that are turned on and off by a signal from the CPU 11 and a liquid crystal display device LCD that performs predetermined image display and the like, each provided at a predetermined position so as to be visible. A speaker SP that outputs the voice generated by the voice recognition IC 13 is provided at a predetermined position on the face of the head (a portion corresponding to the mouth).
[0015]
The sleep switch SW1 is a push switch, and the circuit is closed by being pushed by the user, and the push on the sleep switch SW1 is detected by the CPU 11.
Specifically, the robot toy 1 is provided with a member corresponding to a "tail" that incorporates the sleep switch SW1, and the circuit of the sleep switch SW1 is closed by grasping or pressing the tail.
The switches SW2 and SW3 are built into the left and right arms (hands), respectively. When the user holds a hand of the robot toy 1, for example, the switch SW2 or SW3 is pressed, and the CPU 11 can detect that the hand has been held.
The other switches (including sensors) SW4 to SW7 include, for example, a switch provided on the head for detecting that the head has been stroked (an optical sensor that detects the stroking motion from a change in the amount of incident light), a tilt sensor for detecting the inclination of the robot toy 1, an optical sensor for detecting the brightness of the surroundings, and other sensors that can detect various user operations on the robot toy 1 and the environment around the robot toy 1. These are provided at appropriate positions on the housing 3.
[0016]
The ROM 15 is a memory that stores various data and the like, and stores a program executed by the CPU 11 or the voice recognition IC 13 and data for voice generation such as speech data and voice synthesis data. FIG. 2A shows a memory map of the ROM 15.
The program stored in the ROM 15 is described by combining the commands shown in FIG. 3, such as "T, R...". FIGS. 4, 5, and 6 show examples of the description format of these commands and their processing content. The program mainly lets the CPU 11 drive the voice recognition IC 13. Under the program, the voice recognition IC 13 recognizes voice input to the microphones MR, MC, and ML as words, generates voice based on predetermined voice generation data, and inputs and outputs data to and from the ROM 15 and the flash ROM 17 in accordance with these processes.
[0017]
The ROM 15 stores voice generation data in addition to the program, and the voice generation data includes speech data and voice synthesis data. The speech data is data on the lines of dialogue shown in the tables of FIGS. 7 and 8. Based on the speech data and the voice synthesis data, the voice recognition IC 13 generates a predetermined voice, which is output from the speaker SP.
The words output as voice are generated by arranging short phrases, each stored under a "subdivision number" shown in FIG. 8, in the combination of subdivision numbers defined for each dialogue number shown in FIG. 7. For example, the line defined as "dialogue No. 1" in FIG. 7 is defined as the phrases with subdivision numbers "1", "2", "3", and "4" shown in FIG. 8. According to this definition, a line such as "Hello! ..." is generated and output as voice.
For each dialogue number shown in FIG. 7, a non-speech time (interval) during which no voice is uttered is also set between the words and phrases of the line. Setting such non-speech times gives the voice generated by the voice recognition IC 13 a tone that is easy to hear, as if a person were talking normally.
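The composition of a line from subdivision numbers and non-speech intervals might look like the following sketch. The table contents are invented for illustration (only the first phrase "Hello!" and the subdivision numbers 1 to 4 appear in the text), and the names `PHRASES`, `DIALOGUES`, `build_dialogue`, and `speak` are hypothetical.

```python
import time

# Hypothetical stand-in for FIG. 8: subdivision number -> short phrase.
PHRASES = {1: "Hello!", 2: "I am", 3: "your", 4: "robot."}

# Hypothetical stand-in for FIG. 7: dialogue number -> list of
# (subdivision number, non-speech interval in seconds after the phrase).
DIALOGUES = {1: [(1, 0.3), (2, 0.1), (3, 0.1), (4, 0.0)]}


def build_dialogue(number):
    """Resolve a dialogue number into (phrase, pause) pairs, in order."""
    return [(PHRASES[sub], pause) for sub, pause in DIALOGUES[number]]


def speak(number, say=print, sleep=time.sleep):
    """Utter each phrase, then stay silent for its configured interval."""
    for phrase, pause in build_dialogue(number):
        say(phrase)
        sleep(pause)
```

Calling `speak(1)` would print the four phrases in order with short pauses between them. Passing a different `say` callable is one way to route the phrases to a real audio back end.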
[0018]
The flash ROM 17 is a rewritable storage means (non-volatile memory) that stores data acquired as a result of executing the commands and personal information entered by the user, such as age and date of birth. FIG. 2B shows a memory map of the flash ROM 17.
[0019]
Next, the program executed in the robot toy 1 and the contents of the program control will be described. FIG. 9 is a conceptual diagram showing the entire configuration of the program, and FIG. 10 is a main flowchart showing a main part of the conversation process included in the conceptual diagram.
First, the overall configuration will be described. Control by a program can be roughly divided into three modes. That is, a sleep mode (M1), a standby mode (M2), and a system panel mode (M3). The outline of each mode is as follows.
[0020]
The sleep mode (M1) is a mode in which the main functions of the robot toy 1 are stopped. It reduces the functions of the robot toy 1 to save power by not performing the main operations while the user is not playing with the robot toy 1 (for example, while the user is sleeping).
The robot toy 1 shifts to this mode when the user operates the sleep switch SW1 (built into the tail in this embodiment) or when there has been no input to the switches SW2 to SW7 and the microphones (MR, MC, ML) for a predetermined time (for example, 36 hours).
In this mode, the robot toy 1 does not accept input even if the user speaks to it (voice input), shakes the housing 3 (input to the tilt sensor), or holds its hand (input to the switches SW2 and SW3). Only the detection of an operation of the sleep switch SW1 built into the tail and the timekeeping function remain active. When the sleep switch SW1 in the tail is operated, the robot toy 1 enters the standby mode (M2) described below. When the sleep switch SW1 is operated again, the robot toy 1 returns to the sleep mode (M1).
[0021]
The standby mode (M2) is a mode in which the robot toy 1 can detect a user's operation of or input to a switch or sensor (SW1 to SW7 and the like) and external sound input to a microphone (hereinafter referred to as an "approach"). The sound detection performed in the standby mode (M2) is not voice recognition, which recognizes a voice spoken by a person as words, but merely detects that some sound has been input to a microphone (MR, MC, ML).
The robot toy 1 is in either the standby mode (M2) or the system panel mode (M3), except when it is forcibly put into the sleep mode (M1) by pressing the sleep switch SW1 or enters the sleep mode (M1) because there has been no approach for 36 hours.
The transition from the standby mode (M2) to the system panel mode (M3) is triggered by the approach described above. In addition, when a specific time (approach time) described later arrives, that is, when the time calculated from the clock information matches the specific time, the robot toy 1 automatically leaves the standby mode (M2), utters a predetermined voice, and then shifts to the system panel mode (M3).
[0022]
The system panel mode (M3) is the mode entered from the standby mode (M2) by the approach described above. In this mode, the robot toy 1 recognizes input voice as words and outputs the words stored in the ROM 15 (voice generation data including speech data and voice synthesis data) that correspond to the recognized words, so that the user and the robot toy 1 can hold a voice conversation.
Further, the processing performed in the system panel mode (M3) differs depending on the time at which the approach is made, as described later.
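The three modes and the transitions described in this section can be summarized as a small state machine. This is only a sketch under stated assumptions: the event names are hypothetical labels, and the sleep switch is modelled as toggling between sleep and the other modes, which is one reading of the description.

```python
from enum import Enum


class Mode(Enum):
    SLEEP = "M1"
    STANDBY = "M2"
    SYSTEM_PANEL = "M3"


def next_mode(mode, event):
    """Return the mode that follows `mode` when `event` occurs.

    Hypothetical event names: "sleep_switch", "approach", "approach_time",
    "idle_36h" (no approach for 36 hours), "no_voice_5min".
    """
    if event == "sleep_switch":
        # The tail switch wakes the toy from sleep, or forces it to sleep.
        return Mode.STANDBY if mode is Mode.SLEEP else Mode.SLEEP
    if mode is Mode.SLEEP:
        return Mode.SLEEP  # every other input is ignored while asleep
    if event == "idle_36h":
        return Mode.SLEEP
    if mode is Mode.STANDBY and event in ("approach", "approach_time"):
        return Mode.SYSTEM_PANEL
    if mode is Mode.SYSTEM_PANEL and event == "no_voice_5min":
        return Mode.STANDBY
    return mode


print(next_mode(Mode.STANDBY, "approach"))
```

Keeping the transition logic in one pure function makes it straightforward to check each rule of the description in isolation.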
[0023]
Next, the flags will be described. In the present embodiment, a flag means data (present (1) or absent (0)) in a specific storage area managed mainly by the CPU 11.
In the present embodiment, a plurality of flags corresponding to the state of the robot toy 1 are provided. The main ones are the "return flag" and the "good morning flag", together with a "return output flag" and a "return time flag" described later.
The presence or absence of a flag is represented by "1" or "0". In general, when a flag is "1" it is said to be raised (set), and when it is "0" it is said to be not raised or cleared.
[0024]
The contents of the control from the sleep mode (M1) to the standby mode (M2) in the initial state will be described with reference to the conceptual diagram shown in FIG.
When the main power supply PSW is turned on from the state at the time of purchase (shipment) or from a fully reset state, the process "initial setting flow No. 1" is performed.
Details of the initial setting are omitted here, but in this process the current date and time, the user's birthday, and the like are entered according to the program described by the commands mentioned above. The user makes these settings interactively, following the content displayed on the liquid crystal display device LCD housed in the back of the robot toy 1 and using predetermined operation switches.
After the above processing, the user's own utterances of predetermined words are registered. In this process, words such as "pocket", "I'm off" ("itte kimasu"), "walk", "dance", "xx-emon", "tool", "dorayaki", "mouse", "yes", and "no", as well as the keywords for opening and closing the pocket, are input interactively. This improves the accuracy of the voice recognition performed during the conversation process. Each input voice is stored at a predetermined address in the flash ROM 17.
When "initial setting flow No. 1" is completed, the processing shifts to the system panel mode (M3). Thereafter, the standby mode (M2) and the system panel mode (M3) alternate unless the robot toy 1 enters the sleep mode (M1) through an operation of the sleep switch SW1 or the like.
[0025]
When the power is turned on with the initial setting already completed, the robot toy 1 starts in the sleep mode (M1). In the sleep mode (M1), unless the sleep switch SW1 provided in the tail, which serves as the start-up switch, is operated, only minimal functions such as the clock function and retention of stored information remain active, and all other operations are suppressed. This minimizes power consumption at night or when the robot toy 1 is not in use.
When the sleep switch SW1 provided in the tail is operated during the sleep control, the robot toy 1 enters the standby mode (M2).
On entering the standby mode (M2), the robot toy 1 checks its various states, such as the calendar, the time, and whether the pocket is closed, and checks for a voltage drop to see whether the battery has run down. After these checks, it watches for an approach to the robot toy 1 (simple sound detection, input to a switch, voice input from the user talking to it, and so on). When there is an approach, it performs processes such as "standby dialogue 1 No. 3", described in detail later, and then shifts to the system panel mode (M3).
[0026]
Next, an outline of the mode cycle from the standby mode (M2) to the system panel mode (M3) and from the system panel mode (M3) back to the standby mode (M2) will be described with reference to the flowchart of FIG. 10 and the control table.
The basic operation when there is an approach (S1) is as follows. In the standby mode (M2), when there is an approach (S1) to the robot toy 1 or when a predetermined approach time (described later) arrives, the robot toy 1 shifts to the system panel mode (M3).
An approach (S1) is judged to have occurred when an input to a switch or sensor provided on the arms, the housing 3, or the like is detected, or when some sound is detected by a microphone (MR, MC, ML). In other words, an input to a switch or the like means that the user has spoken to or touched the robot toy 1 and that the robot toy 1 has detected it.
When there is an approach (S1), the CPU 11 checks whether the "return flag" is set (S2). In the present embodiment, a set "return flag" means that the robot toy 1 is waiting for a user who has gone out to return, and a cleared "return flag" means that the user is at home (has not gone out).
[0027]
When the "return flag" is set (return flag = 1) (S3), a voice saying "Welcome home" is uttered after the approach is detected (S4), and the "return flag" is updated to "0" (S5). After this update, the time at which the approach was made is checked (S6), and a voice from the fifth to eighth patterns described later is uttered according to that time (S7).
The fifth to eighth patterns are selected according to the time (t) from when the "return flag" was set to when the next approach is made. In the present embodiment, they are defined as less than 3 hours, 3 hours or more and less than 10 hours, 10 hours or more and less than 14 hours, and 14 hours or more after the "return flag" was set. This reproduces what happens in an actual home: the robot toy 1 is delighted when the user comes back soon after going out and gets angry when the user comes back late.
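The selection among the fifth to eighth patterns reduces to a threshold lookup on the elapsed time. The sketch below uses the hour boundaries given in the text; the function name and the use of fractional hours are assumptions.

```python
def return_pattern(elapsed_hours):
    """Map the time since the return flag was set to pattern 5, 6, 7, or 8."""
    if elapsed_hours < 3:
        return 5
    if elapsed_hours < 10:
        return 6
    if elapsed_hours < 14:
        return 7
    return 8


# An approach 8 hours after the user left falls in the sixth pattern.
print(return_pattern(8))
```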
[0028]
If the "return flag" is not set when the approach (S1) is made (return flag = 0) (S8), the CPU 11 checks whether the "good morning flag" is set (S9). As described in detail later, a set "good morning flag" means that the robot toy 1 is waiting for the user's first approach after waking up in the morning, and a cleared "good morning flag" means that an approach has already been made that morning (after 6:00).
When the "good morning flag" is set (S10), the time h is checked (S11). If the time h is between 6:00 and 23:00 (S12), a voice saying "Good morning" is uttered (S13), and the "good morning flag" is updated (good morning flag = 0) (S14). The time h is then stored in the flash ROM 17 together with the day of the week, as new or updated information, as the "good morning" approach time (S15).
On the same day of the following week, if no physical approach has been made between 6:00 am and the stored approach time ((1) good morning approach time), the robot toy 1 automatically shifts from the standby mode (M2) to the system panel mode (M3). That is, when the stored "good morning approach time" arrives on the same day of the following week, the robot toy 1 utters "Good morning" if the other flag conditions are satisfied and waits for the user to speak.
[0029]
If the check of the time h (S11) shows that it is between 23:00 and 6:00 (S16), a voice different from "Good morning" that reflects the fact that it is still sleeping time, such as "What, what? What happened at such a time?", is uttered (S17), and the robot toy 1 shifts to the system panel mode (M3). In this way, the voice that the robot toy 1 utters in response to an approach changes with the time zone in which the approach is made (in this embodiment, from 6:00 am to before 11:00 pm, or from 11:00 pm to before 6:00 am).
Also, if the check (S9) shows that the "good morning flag" is not set, a voice such as "Wow! What?" is uttered (S17), and the robot toy 1 shifts to the system panel mode (M3).
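The flag checks performed on an approach (S2 to S17) can be sketched as follows. The dictionary keys and the returned labels are hypothetical names, the hour is treated as an integer from 0 to 23, and the storing of the approach time (S15) and the follow-up fifth to eighth pattern voice (S6, S7) are left out for brevity.

```python
def on_approach(flags, hour):
    """Decide which greeting is uttered for an approach in standby mode.

    `flags` is a dict with boolean entries "return" and "good_morning";
    it is updated in place the way steps S5 and S14 update the flags.
    """
    if flags["return"]:
        flags["return"] = False          # S5: the user has come back
        return "welcome_home"            # S4
    if flags["good_morning"] and 6 <= hour < 23:
        flags["good_morning"] = False    # S14
        return "good_morning"            # S13
    # Night-time approach, or the morning greeting was already given (S17).
    return "surprised"


state = {"return": False, "good_morning": True}
print(on_approach(state, 7), state)
```

After the call, the robot toy would shift to the system panel mode regardless of which greeting was chosen.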
[0030]
As described above, when there is any approach or when a predetermined approach time arrives, the predetermined processing is performed after the "return flag" and the "good morning flag" are checked, and the robot toy 1 shifts to the system panel mode (M3).
The main control in the system panel mode (M3) is as follows. If no voice is input within 5 minutes after the shift to this mode, the robot toy 1 returns to the standby mode (M2). If a voice is input, the voice recognition IC recognizes it as words, and a predetermined conversation process is performed according to the recognition result. In particular, if the user's voice input is recognized as the predetermined words "I'm off" ("itte kimasu"), the time at which the words were recognized is stored ((2) "I'm off" (time to go out) approach time), and the "return flag" is set. That is, when the specific words "I'm off" are recognized, the flash ROM 17 stores or updates the (2) "I'm off" approach time together with the day of the week. At the same time on the same day of the following week, the state of the "return flag" is checked, and if the return flag = 0 (the words "I'm off" have not been recognized, that is, the user has not yet gone out), the stored approach time is used to give a voice notification such as "It's time to go out".
Further, in the system panel mode (M3), when words other than specific words such as "I'm off" are recognized, a predetermined conversation process is performed according to the recognition result.
[0031]
The approach time will be described with reference to the table shown in FIG. In the present embodiment, as described above, two approach times, each updated weekly, are stored: the approach time (1) ("good morning approach time"), stored when the voice input by the user is recognized as the words "good morning", and the approach time (2) ("I'm off" (time to go out) approach time), stored when the words "I'm off" ("itte kimasu") are recognized.
Each approach time is stored in a predetermined area of the flash ROM 17, and on the same day of the following week (the same day of the week one week later) the process "approach flow No. 2" is performed according to the stored approach time.
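Storing one approach time per day of the week and checking it a week later could be modelled as below. This is a sketch, not the memory layout of the flash ROM 17: the class name, the slot names, and the minute-level comparison are assumptions.

```python
from datetime import datetime


class ApproachTimes:
    """Keep one (hour, minute) slot per weekday for each approach kind."""

    def __init__(self):
        # Index 0 is Monday, following datetime.weekday().
        self.slots = {"good_morning": [None] * 7, "going_out": [None] * 7}

    def record(self, kind, when):
        """Store or overwrite the time for the weekday of `when`."""
        self.slots[kind][when.weekday()] = (when.hour, when.minute)

    def clear(self, kind, weekday):
        """Erase the stored time, as when no approach occurred that day."""
        self.slots[kind][weekday] = None

    def due(self, kind, now):
        """True when `now` matches the time stored for its weekday."""
        return self.slots[kind][now.weekday()] == (now.hour, now.minute)


times = ApproachTimes()
times.record("going_out", datetime(2003, 3, 28, 7, 45))   # a Friday
print(times.due("going_out", datetime(2003, 4, 4, 7, 45)))  # next Friday
```

Because each weekday has its own slot, recording a time on one Friday automatically replaces the value stored the Friday before.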
[0032]
In the present embodiment, "approach flow No. 2" comprises four types of control.
The first approach flow is as shown in FIG. The first approach flow corresponds to “good morning approach time”, and the process starts when the day and time stored as the approach time arrive.
When the process starts (S20), one voice is randomly selected from three lines, (1) "Good morning! Wake up in the morning! Wake up!", (2) "Good morning! It's time to get up! Wake up!", and (3) "Good morning! Get up! Get up!", and output (S21). After the voice is output, the robot toy 1 waits for the user to utter a voice. If the user inputs a voice in this state (S23), one of (1) "It is a pleasant morning", (2) "I got up properly", and (3) "I have another sleep ..." is selected and uttered (S24). Next, the "good morning flag" ("good morning" output flag) is lowered, the good morning approach time ("good morning" time flag) is maintained (S25), and the process ends (S26).
If no voice from the user is detected for a certain period (5 minutes in this embodiment) after the voice is uttered in step S21 (S27), one of (1) "Morning ... Let's go to bed ...", (2) "It's good to sleep slowly!", and (3) "Gu This is a nap" is selected and uttered (S28). Next, the "good morning flag" ("good morning" output flag) is set, the "good morning approach time" ("good morning" time flag) is reset (S29), and the process ends (S30).
In addition, conversation processing shown in FIGS. 14, 15, 16, and 17 is performed as second to fourth approach flows.
[0033]
Next, the "standby dialogue 1 No. 3" process will be described.
When an approach is made to the robot toy 1 in the standby mode (M2), this process outputs a different line depending on whether the "good morning flag" is set and on the time at which the approach is made, and then shifts to the system panel mode (M3).
As shown in FIG. 18, the process is divided into eight patterns according to the time at which the approach is made, and the patterns fall into two groups depending on whether the approach comes before or after the words "I'm off" ("itte kimasu") are recognized from the user (see FIG. 11).
The first pattern is "from 6:00 to 23:00 when no flag is raised", the second pattern is "from 23:00 to 6:00", the third pattern is "from 6:00 to the time at which the good morning flag is set", and the fourth pattern is "from the time at which the good morning flag is set to 11:00". The "time at which the good morning flag is set" is the time stored in the flash ROM 17 on the same day one week earlier and can be rewritten every week. No time is stored in the first week after the initial setting, or on a day when the robot toy 1 fails to recognize the words "good morning" (when they are not input); in the latter case, the "good morning flag" is not set on the same day of the following week.
[0034]
FIG. 19 shows a list of the voices uttered in the first to fourth patterns. For example, in the case of the first pattern, when there is some approach to the robot toy 1, one of the dialogue numbers (1) 82, (2) 1019, (3) 83, (4) 84, and (5) 85 is randomly selected from the list and output. When (1) is selected, the line "Munyanya" is uttered, and the robot toy 1 then shifts to the system panel mode (M3). When a line is output as voice, the phrase data defined by the subdivision numbers (FIG. 6) corresponding to the selected dialogue number are combined, as shown in FIG. 7, and output as one voice.
If the user inputs no voice after the voice is output ("No voice" in FIG. 11), the robot toy 1 waits for about 5 minutes, randomly outputs one of the dialogue numbers (1) 86 and (2) 28, and then shifts to the standby mode (M2).
[0035]
Further, in the system panel mode (M3), if the user inputs a voice after the line is output, the following processing is performed according to the content of that voice.
When a voice is input to a microphone, the CPU 11 determines from the result of processing by the voice recognition IC 13 whether the words "I'm off" ("itte kimasu") were input. If the voice is not recognized as "I'm off", it is treated as some other voice input, and the processing branches to "anniversary dialogue 1 No. 5DX version". Description of the processing after that branch is omitted in this specification.
If the input voice is recognized as "I'm off", the voice recognition IC 13 outputs the send-off voice "See you later", the "return flag" is set at the same time, and the robot toy 1 shifts to the standby mode.
Further, as described above, the time of the first approach after 6:00 in the morning and the time at which the "return flag" was set are stored for each day of the week. That is, a storage area for one week is secured, and after the first week has elapsed, each time is overwritten whenever the corresponding day of the week comes around. If there is no approach, the time data is erased without a new time being stored.
[0036]
As described above, after the send-off voice "See you later" is output and the return flag is set, the robot toy 1 shifts to the standby mode (M2). In this state, the robot toy 1 is waiting for the user who has gone out to return.
If there is any approach in the standby mode (M2) while the return flag is set, the CPU 11 confirms that the return flag is set and a voice saying "Welcome home" is uttered. The line to be uttered next is then selected according to the time elapsed from when "I'm off" ("itte kimasu") was recognized (when the return flag was set) to the present, that is, to when the next sound is detected. The lines are those shown in FIG. 20, according to the fifth to eighth patterns shown in FIG. 18 (see also FIG. 11).
Specifically, a voice is uttered according to the fifth pattern shown in FIG. 12 when the elapsed time since the send-off voice was uttered is less than 3 hours, the sixth pattern when it is 3 hours or more and less than 10 hours, the seventh pattern when it is 10 hours or more and less than 14 hours, and the eighth pattern when it is 14 hours or more. The line is selected and output as voice in the same manner as described above. For example, if there is any approach 8 hours after the specific words "I'm off" were recognized and the return flag was set, the robot toy 1 follows the sixth pattern shown in FIG. 12: one of the two dialogue numbers (1) "Welcome home!" and (2) "Thank you!" is selected at random, and the voice is uttered through the speaker SP based on the voice data corresponding to that dialogue number.
[0037]
As described above, when there is an approach in the standby mode (M2) and the return flag is set, a voice saying "Welcome home" is uttered, and then a line from the fifth to eighth patterns is output as voice. If there is no voice input from the user for a certain period (about 5 minutes) after that, a line such as (1) "Is it a thief?" or (2) "Uh ..." is uttered, and the robot toy 1 returns to the standby mode (M2).
If there is a voice input within 5 minutes after the voice "Welcome home!" is uttered (while in the system panel mode), the voice conversation processes for conversing with the user are performed. The conversation processing after the voice input includes "anniversary dialogue 1 No. 5DX version", "anniversary dialogue 2 No. 5 single-shot version", "seasonal dialogue No. 6", "standby dialogue 2 No. 4", "time-based conversation flow No. 7", "Dora conversation flow No. 8", and so on. A detailed description of each process is omitted, but after a series of conversation processes is performed, the robot toy 1 returns to the system panel mode.
[0038]
In the embodiment described above, the predetermined approach time is updated weekly. When the present invention is applied to other toy devices, however, it may be updated daily or every two days. The update period may also be set freely by the user.
Further, the approach time need not be identical to the time of the approach on the same day of the previous week. It may be shifted relative to the stored approach time, for example to 10 minutes earlier, or changed under certain conditions.
[0039]
For time and week management, calendar information may be stored so that the current date and time can be recognized as year, month, day, day of the week, hour, and minute. Alternatively, the time elapsed from a predetermined starting point may be counted based on the clock information from the clock transmitter and reset every week. That is, the period from the start of counting to 24 hours is managed as the first day, the period from 24 hours to 48 hours as the second day, and so on.
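The counter-based alternative to a calendar can be illustrated with a short sketch. It assumes the elapsed time is available as a whole number of seconds since counting began; the function and constant names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY


def position_in_week(elapsed_seconds):
    """Convert an elapsed-seconds count into (day 1..7, hour, minute).

    The count wraps every week, so 0 to 24 hours is day 1, 24 to 48 hours
    is day 2, and the day after day 7 is day 1 again.
    """
    within_week = elapsed_seconds % SECONDS_PER_WEEK
    day_index, rest = divmod(within_week, SECONDS_PER_DAY)
    return day_index + 1, rest // 3600, (rest % 3600) // 60


# 25 hours after counting started: second day, 01:00.
print(position_in_week(25 * 3600))
```

The modulo operation plays the role of the weekly reset, so no calendar data is needed to compare "the same time on the same day of the following week".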
[0040]
Further, in the case of the robot toy incorporating the voice recognition device or the conversation device, the external shape thereof can be, for example, a form imitating an animation character. In this case, the voice to be uttered can be made the same as the voice of the voice actor of the animation actually being broadcast. By doing so, it is possible to give the user a sense of closeness as if the animation character is actually near. Further, by driving a built-in motor or the like to operate each unit, it is possible to reproduce an operation peculiar to the animation character that is broadcast together with the utterance of words.
[0041]
【Example】
Next, as an example of the program processing executed when an approach time arrives, the conversation processing performed when the (2) "I'm off" (time to go out) approach time (see FIG. 9) arrives will be described. FIG. 14 shows the conversation processing (No. 002) of "approach flow No. 2" that is performed when this approach time arrives.
When the predetermined approach time arrives, the conversation process (No. 002) starts (S40). When the conversation process starts, the voice "It's time to go out?" is output (S41). This voice is defined as dialogue number 31 and is uttered according to that definition. Hereinafter, where a dialogue number is indicated in a flowchart, a line randomly selected from a plurality of prepared lines is output.
After uttering "It's time to go out?", the CPU 11 waits for a voice input from the user, and the subsequent processing differs depending on whether a voice is input within a predetermined time and on what is input.
In this state, when "yes" or a word equivalent to "yes" is recognized (S42), one of the following two processes (S43 or S44) is selected with a probability of 1/2 each and performed. When the process of S43 is selected, the voice "That's it! Come on!" is output, and the next voice input is awaited. If any voice is input in this state (S45), one voice randomly selected from lines such as (1) "Keep working for one day today" and (2) "Be careful!" is uttered (S46), the "Welcome" output flag is lowered, the return flag is set, and the "Welcome" time flag is maintained (S47), and the conversation process (No. 002) ends (S48).
[0042]
If no voice is recognized (S49) after the voice "That's it! Come on!" is output, a voice such as (1) ... or (2) "Say no greeting, there is no sloppiness" is output (S50), the process of S47 is performed, and the conversation process (No. 002) ends (S48).
Further, when the process moves to S44 with a probability of 1/2 after a word corresponding to "yes" is recognized (S42), the conversation process (No. 003 or No. 004) described later is performed. After that process, the voice "That's it! Come on!" is output (S43).
[0043]
If, after the voice is output (S41), the user inputs a voice corresponding to "No" (S50), the next voice output process (S51) is performed: one voice selected at random from several lines such as (1) "Is it my misunderstanding?" and (2) "Oh, that's right." is output, and the flow moves to the next process. In the next process, the "Welcome" output flag remains set, the "Welcome" time flag is reset, and the time at which the next "coming" voice is recognized is stored (S52). The conversation process (No. 002) then ends (S53).
Also, when a voice other than a word meaning "yes" or "no" is input after the voice "Is it time to go out?" is output (S41) (S54), a voice such as (1) "prepare early" or (2) "must be early" is output (S55), the flow moves to S52, and the conversation process (No. 002) ends (S53).
If no voice is input (S56) after the voice "Is it time to go out?" is output (S41), a voice such as (1) "scheduled pear and ..." is output, the flow moves to S57, and the conversation process (No. 002) ends (S53). The above is the flow of the series of processes in the conversation process (No. 002).
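The branching of the conversation process (No. 002) described above can be traced with a short sketch. It is not the program of the embodiment: the function name, the dictionary keys for the flags, and the initial flag values are assumptions made for illustration, and the step labels simply follow the text as written (including the label S50, which the text uses on two different branches).

```python
import random


def conversation_002(first_reply, second_voice_heard, rng=random):
    """Trace conversation process No. 002 and return (steps, flags).

    first_reply: "yes", "no", "other", or None when no voice is input.
    second_voice_heard: whether any voice follows "That's it! Come on!".
    """
    steps = ["S40", "S41"]  # start, then ask "It's time to go out?"
    # Assumed initial flag values; the description does not state them.
    flags = {"welcome_output": True, "return_flag": False, "welcome_time": True}

    if first_reply == "yes":
        steps.append("S42")
        if rng.random() < 0.5:
            # Detour through conversation No. 003 or No. 004 before S43.
            steps.append("S44")
        steps.append("S43")  # "That's it! Come on!"
        steps += ["S45", "S46"] if second_voice_heard else ["S49", "S50"]
        # S47: lower the "Welcome" output flag, set the return flag,
        # and leave the "Welcome" time flag as it is.
        flags["welcome_output"] = False
        flags["return_flag"] = True
        steps += ["S47", "S48"]
    elif first_reply is None:
        # No voice input: output a line, move to S57, and end.
        steps += ["S56", "S57", "S53"]
    else:
        steps += ["S50", "S51"] if first_reply == "no" else ["S54", "S55"]
        # S52: the "Welcome" output flag stays set and the time flag is reset.
        # Storing the next recognition time is omitted from this sketch.
        flags["welcome_time"] = False
        steps += ["S52", "S53"]
    return steps, flags
```

For example, `conversation_002("no", False)` follows S40, S41, S50, S51, S52, S53, while a "yes" reply ends with S47 and S48 regardless of which of the two equally likely branches is taken.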
[0044]
Next, the "yes / no" conversation processing performed in the conversation program processing will be described. The "yes / no" conversation process is a program processing method for advancing a voice conversation with the user, and is performed after a conversation process such as the conversation process (No. 002) has started.
The purpose of this conversation process is to provide a conversation device in which, despite the simple structure of the program, the same words are rarely repeated during conversation with the user, so that the content of the conversation is enriched.
The basic processing flow is as follows. After the device asks the user a specific question or speaks to the user, the voice group (the word set described later) from which the next voice is uttered is determined according to whether a voice meaning "yes" or a voice meaning "no" is input. One voice is then selected from the plurality of voices stored in that voice group and is uttered. If a word other than one meaning "yes" or "no" is input, or if no voice is input, a voice corresponding to that case is output. In every case, one line selected from a plurality of lines is uttered, and the conversation proceeds.
[0045]
That is, in the "yes / no" conversation process, when the user speaks a word, the conversation process branches according to whether the word is affirmative or negative.
Whether the input voice is affirmative or negative is determined from a meaning number assigned in advance to each word that is expected to be input. Words are divided into several groups (categories) according to whether they are affirmative, negative, or have another meaning, and the meaning number is a number or other identification code indicating the category to which the input word belongs.
Although the present invention is not limited to this embodiment, a specific example is as follows. In this embodiment, the program holds a data table with the contents shown in FIG. 21. If the recognized voice is an affirmative word such as "Yes", "Yeah", "Done", or "Done", the meaning number is defined as 1; if it is a negative word such as "No", "Not yet", or "Not yet", the meaning number is defined as 2. In the present embodiment, if the voice input during a conversation processing program corresponds to a word for which a specific meaning number is defined, the conversation processing advances according to that meaning number.
Further, since the expected input voice changes depending on the question or inquiry put to the user, a word set is defined for each conversation processing program (identified by the program processing name and program number shown in the table of FIG. 21) that makes a specific inquiry.
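A lookup of the meaning number from a per-program word set might look like the sketch below. The actual contents of FIG. 21 are not reproduced here: the word set numbers 3 and 5 and the meaning numbers 1 and 2 come from the description, while the membership of word set 3, the lower-casing of input, and the names `WORD_SETS` and `meaning_number` are assumptions made for illustration.

```python
AFFIRMATIVE = 1  # meaning number for affirmative words
NEGATIVE = 2     # meaning number for negative words

# Hypothetical excerpt of a word-set table: word set number -> {word: meaning number}.
WORD_SETS = {
    3: {"yes": AFFIRMATIVE, "yeah": AFFIRMATIVE, "no": NEGATIVE},
    5: {
        "yes": AFFIRMATIVE,
        "done": AFFIRMATIVE,
        "no": NEGATIVE,
        "not yet": NEGATIVE,
    },
}


def meaning_number(word_set, recognized):
    """Return the meaning number of a recognized word, or None if the
    word is not registered in the given word set."""
    return WORD_SETS[word_set].get(recognized.strip().lower())
```

Keeping one small table per conversation program means the branching code only has to compare meaning numbers, not individual words, which matches the simple program structure the description aims for.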
[0046]
An example of the progress of the conversation process based on the meaning number will be described with reference to the conversation process (No. 002) described above.
This conversation process corresponds to the program named "Time to go out" in the table shown in the figure. Its program number is No. 2-002. In the conversation process of program number No. 2-002, a plurality of recognition words are registered as word set "3". The recognition words are the words expected in advance as replies to the question. A meaning number is defined for each recognition word.
After the conversation process starts (S40) and the voice "It's time to go out?" is output (S41), a voice such as "Yes" or "No" is expected as the next voice input. If "yes" is input, the meaning number is "1", so the process moves to the next processing (S42 or S44) based on meaning number "1". If "no" is input, the meaning number is "2", so the process moves to the next processing (S50) based on meaning number "2". Even for a word other than "yes" or "no" itself, meaning number "1" is defined if the word has an affirmative meaning, so processing proceeds according to the meaning number.
If a voice is input but the word is one for which no expected meaning number is defined, the process for the case where only a voice is recognized is performed (S54).
[0047]
As described above, in the conversation process according to the present embodiment, a plurality of expected replies (words) to a question are stored as a word set corresponding to a specific conversation process, and the next process to be performed is selected and executed according to the meaning number of the input word.
[0048]
Next, another example of the conversation processing program will be briefly described.
FIG. 15 shows a conversation process (No. 003), which is one of the processes selected with a half probability in the process S44 in the conversation process.
Also in this conversation process, as shown in FIG. 21, a plurality of recognition words ("Yes", "Y...", "Done", "Done", "No", "Not yet", "Not yet") are registered as word set "5". As shown in the figure, a meaning number is defined for each recognition word.
In the conversation process (No. 003), the robot toy 1 asks a question such as "Is the preparation ready?", and when "Yes", "No", or the like is input, the subsequent processing is performed according to the predetermined meaning number.
[0049]
FIG. 16 shows a conversation process (No. 004), which is one of the processes selected with a half probability in the process S44 in the conversation process.
Also in this conversation process, as shown in FIG. 21, a plurality of recognition words ("yes", "un", "No", "No", "No", "No", "No", "No") are registered as word set "6". As shown in the figure, a meaning number is defined for each recognition word.
In the conversation process (No. 004), the robot toy 1 asks a question such as "Do not be late", and when "Yes", "No", or the like is input, the subsequent processing is performed according to the predetermined meaning number.
[0050]
FIG. 17 shows a conversation process (No. 005).
This is a process performed when a predetermined approach time has been reached in the standby mode (M2) shown in FIG.
Also in this conversation process, as shown in FIG. 21, a plurality of recognition words ("yes", "yeah", "Good", "good", "no", "no good", "no good") expected in reply to the question "may I sleep?" are registered as word set "7". As shown in the figure, a meaning number is defined for each recognition word.
In the conversation process (No. 005), the robot toy 1 asks a question such as "Do you want to sleep?", and when "Yes", "Good", or the like is input, audio output processing is performed according to the predetermined meaning number.
[0051]
As described above, the conversation process according to the present embodiment stores the words expected as answers to a specific question as recognition words and classifies them as affirmative or negative by meaning number, so that the conversation proceeds according to the category to which the recognized word belongs, that is, its meaning number. Because the conversation branches according to the meaning number and one line is selected from a plurality of prepared voices at the branch destination, the same conversation is rarely repeated with the user even though the program configuration is simple, and the user can continue using the device without getting tired of it.
[0052]
【Effect of the Invention】
The speech recognition device according to the present invention described above has the following effects. With a simple configuration, appropriate voice conversation output can be made in accordance with the user's regular actions of getting up in the morning, going to work or school, returning home, and going to sleep. If one tried to grasp the user's regular behavior strictly and perform the corresponding voice conversation output, it would be necessary to acquire, over a long period, the times at which voices such as "good morning", "coming", and "now" were recognized, and to calculate the times by statistical processing. However, even if the user's behavior pattern were calculated this strictly, irregular behavior of the user could not be handled. Therefore, performing the conversation processing based on the user's behavior information obtained one week earlier produces little difference in effect compared with collecting detailed statistics. Further, if the voice recognition device of the present invention is applied to a robot toy, even output that does not match the user's actual behavior can be an entertaining, toy-like element of the robot toy. In this way, large-scale hardware resources for acquiring a large amount of data are unnecessary, and an interesting conversation can be established as a toy with a minimum of hardware.
[0053]
Further, the voice recognition device of the present invention outputs voice in response to the user's actions of getting up in the morning, going to work or school, returning home, and going to sleep. The user therefore does not need to configure a scheduler for schedule management, such as setting the wake-up time on an alarm clock by himself or herself. From voice input that accompanies these actions and is not particularly intended for schedule management, the device can store the user's behavior pattern and output an appropriate voice in accordance with that pattern. Also, if the present invention is applied to a robot toy, the user's behavior pattern can be stored while the user plays with the robot toy by saying things such as "good morning" or "coming" to it, and an appropriate voice can be output in accordance with the user's actions. Because the voice is uttered when the user does not expect it, the user can continue using the toy without getting tired of it.
Furthermore, although the timing is not exact, most users, except those with extremely irregular life patterns, are notified by voice at a timing that matches their own approximate life and behavior pattern. Because the robot toy can remember the user's actions, the user feels close to it and can continue using it for a long time.
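The way a recognized word leads to a later utterance, as set out in the claims of this document (recognition within a preset time zone, then utterance at the corresponding time a predetermined number of days later), can be sketched as follows. The function name, the example time zone of 06:00 to 09:00, and the default of seven days (echoing the "one week" mentioned above) are assumptions made for illustration only.

```python
from datetime import datetime, time, timedelta


def schedule_utterance(recognized_at, window_start, window_end, days_later=7):
    """Return the date and time at which the associated voice should be
    uttered, or None if the word was recognized outside the preset window.

    The utterance is scheduled for the same clock time as the recognition,
    `days_later` days after the day of recognition.
    """
    if not (window_start <= recognized_at.time() <= window_end):
        return None
    return recognized_at + timedelta(days=days_later)


# Example: a word recognized at 07:30 inside an assumed 06:00-09:00 window.
planned = schedule_utterance(datetime(2003, 3, 28, 7, 30), time(6, 0), time(9, 0))
```

No statistics are accumulated in this sketch: a single recognition is enough to set the next utterance time, which is consistent with the minimal-hardware approach described above.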
[0054]
Further, in the conversation device according to the present invention, the content of the conversation can be varied depending on the time period in which the conversation takes place. That is, in the morning, a conversation suited to a situation such as the time before going to work is held, and at night, a conversation suited to the situation of a user who has come home can be held.
Further, by performing the “yes / no” conversation process, there is an effect that a rich conversation can be provided despite the simple program configuration.
Further, the robot toy according to the present invention can be used with a sense of closeness because it takes a form imitating an animation character. In particular, if it imitates an animation character that is actually being broadcast on television, the voice can be made the same as the voice used in the broadcast. In addition, by driving the hands, feet, head, and the like with a motor or the like, the robot toy can move while talking and reproduce actions and utterances unique to the animation character.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating an electrical configuration of a speech recognition device according to an embodiment.
FIG. 2 is an address map showing a configuration of a storage unit used in the speech recognition device according to the present embodiment.
FIG. 3 is an example of a command used in a program used in the speech recognition device according to the present embodiment.
FIG. 4 is an explanatory table for explaining an example of processing contents of a command used in a program of the speech recognition device according to the present embodiment.
FIG. 5 is an explanatory table for explaining another example of the processing contents of the command used in the program of the speech recognition device according to the present embodiment.
FIG. 6 is an explanatory table for explaining still another example of the processing contents of the command used in the program of the voice recognition device according to the present embodiment.
FIG. 7 is a table showing a data table of dialogue uttered by the voice recognition device according to the present embodiment.
FIG. 8 is a table showing data of phrases constituting words spoken by the speech recognition apparatus according to the present embodiment.
FIG. 9 is an overall flowchart for explaining the overall configuration of the program processing of the speech recognition device according to the present embodiment.
FIG. 10 is a flowchart related to a main part of a program process of the speech recognition device according to the present embodiment.
FIG. 11 is an explanatory table for explaining a state of the voice recognition device according to the present embodiment.
FIG. 12 is an explanatory table for explaining times stored in the speech recognition device according to the present embodiment.
FIG. 13 is an example of a flowchart of a program process executed in the speech recognition device according to the present embodiment.
FIG. 14 is an example of a flowchart of a program process executed in the speech recognition device according to the present embodiment.
FIG. 15 is an example of a flowchart of a program process executed in the speech recognition device according to the present embodiment.
FIG. 16 is an example of a flowchart of a program process executed in the speech recognition device according to the present embodiment.
FIG. 17 is an example of a flowchart of a program process executed in the speech recognition device according to the present embodiment.
FIG. 18 is a table for describing an outline of a program process executed in the speech recognition device according to the present embodiment.
FIG. 19 is a table for explaining an example of dialogue uttered in the voice recognition device according to the present embodiment.
FIG. 20 is a table for explaining another example of dialogue uttered in the speech recognition device according to the present embodiment.
FIG. 21 is a table for explaining a word set in the speech recognition device according to the present embodiment.
[Explanation of symbols]
MR Right microphone
MC Central microphone
ML Left microphone
SM1 servo motor
SM2 servo motor
SM3 servo motor
DC1 DC motor
DC2 DC motor
DC3 DC motor
DC4 DC motor
LED1 light emitting diode
LED2 light emitting diode
LED3 light emitting diode
SP audio speaker
SW1 push switch
SW2 push switch
SW3 push switch
PSW power switch
1 Robot toy
3 Toy body (housing)
11 CPU
13 Voice recognition IC
15 ROM
17 Flash ROM

Claims

1. A speech recognition device comprising:
control means including a CPU, a clock transmitter, and the like;
rewritable storage means such as a non-volatile ROM;
voice input means for inputting voice; and
voice recognition means for recognizing the voice input by the voice input means as words,
wherein, when a predetermined word is recognized by the voice recognition means, a time is acquired based on a clock signal of the clock transmitter, and
when the time at which the predetermined word is recognized is within a preset time zone, a predetermined voice corresponding to the predetermined word is uttered at a predetermined time on a day on which a predetermined number of days have elapsed from the day on which the predetermined word was recognized.

2. The voice recognition device according to claim 1, wherein the predetermined time is the same time as the time at which the predetermined word was recognized, or a time set based on that time.

3. A conversation device, comprising the speech recognition device according to claim 1 or 2, further comprising a conversation processing program executed according to a word recognized within the preset time period.

4. A robot toy comprising the voice recognition device according to claim 1 or 2 incorporated in a housing having an outer shape imitating a human, an animal, an animation character, or the like.

5. A robot toy, wherein the conversation device according to claim 3 is incorporated in a housing having an outer shape imitating a human, an animal, an animation character, or the like.