JP4284989B2

JP4284989B2 - Speech analyzer

Info

Publication number: JP4284989B2
Application number: JP2002362855A
Authority: JP
Inventors: 保雄黒木; 敬介殿村
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2002-12-13
Filing date: 2002-12-13
Publication date: 2009-06-24
Anticipated expiration: 2022-12-13
Also published as: CN1506936A; CN1261922C; JP2004191872A

Description

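[0001]
[Technical Field of the Invention]
The present invention relates to a voice analyzer that analyzes the voices of animals.
[0002]
[Prior Art]
For people who keep animals such as dogs and cats as pets, the pet is a member of the family. Such owners want to share emotions and intentions with the pet and communicate with it as they would with another person.
[0003]
In recent years, advances in voice analysis technology, particularly voiceprint analysis, have made it possible to discriminate what corresponds to emotions and intentions in animal calls (hereinafter simply "emotions"). For example, the voices of animals such as pets and livestock are analyzed to obtain a pattern representing their features (for example, a sonograph). The animal's emotion is then discriminated by comparing this pattern with reference voice patterns prepared in advance through ethological analysis.
[0004]
Building on such voice analysis technology, one proposed system takes as input the sound of an animal's call and video of the animal's movements (gestures). It discriminates the animal's emotion by comparing this input with voice and movement data analyzed in advance from an ethological standpoint. It then displays the result as text or images that humans can understand (see, for example, Patent Document 1; corresponding to all claims).
[0005]
[Patent Document 1]
Japanese Patent Laid-Open No. 10-3479
[0006]
[Problems to be Solved by the Invention]
With such technology, an owner can learn the animal's emotions to some extent, and can understand and respond when the animal wants something. However, the prior art conveys emotions in one direction only, from animal to human. Conveying emotions from human to animal is not supported, so it was hard to say that owner and pet were truly communicating.
[0007]
The present invention has been made in view of the above problem. Its object is to support two-way emotional communication between humans and animals.
[0008]
[Means for Solving the Problems]
To solve the above problem, the voice analyzer of the invention according to claim 1 has the following configuration.
[0011]
As used herein, "animal voice" means an animal's call. "Human language" means human speech, or text and images (such as words) whose meaning a human can understand. "Animal language" means a sound pattern that enables communication within the same genus or group.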
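[0022]
The voice analyzer according to the invention of claim 1
is a voice analyzer having a voice analyzer main body, characterized by comprising:
first voice input means for inputting an animal's voice (for example, the voice input unit 10 and CPU 20 in FIG. 3, and step S102 in FIG. 12);
first voice analysis means for analyzing the voice input by the first voice input means (for example, the voice analysis unit 12 and CPU 20 in FIG. 3, step S108 in FIG. 12, and steps S202 to S216 in FIG. 13);
first output means for displaying, as human language, text or an image corresponding to the analysis result of the first voice analysis means (for example, the display unit 52 and CPU 20 in FIG. 3, step S112 in FIG. 12, and steps S304 and S306 in FIG. 14);
second voice input means for inputting the user's voice after the first output means has displayed the human language, so that the user can respond to the displayed content (for example, the voice input unit 10 and CPU 20 in FIG. 3, and step S102 in FIG. 12);
second voice analysis means for analyzing the voice input by the second voice input means (for example, the voice analysis unit 12 and CPU 20 in FIG. 3, step S108 in FIG. 12, and steps S202 to S216 in FIG. 13);
second output means for outputting a sound corresponding to the analysis result of the second voice analysis means, as an animal language that the animal can understand (for example, the sound output unit 50 and CPU 20 in FIG. 3, step S116 in FIG. 12, and steps S404 and S406 in FIG. 15);
first registration means for registering the user's voiceprint (for example, the ROM 40 in FIG. 3 and the voiceprint data 410 in FIG. 8);
second registration means for registering speech in human language having predetermined meaning content (for example, the ROM 40 in FIG. 3, the human-to-animal language conversion TBL 418 in FIG. 8, and the registered voice data 418f in FIG. 10);
determination means for determining whether the voice input by the second voice input means matches the voiceprint registered by the first registration means (for example, the control unit 120 in FIG. 1, the CPU 20 in FIG. 3, and steps S416 to S418 in FIG. 15); and
fourth output means for outputting, in human language, the voice registered by the second registration means when the determination means determines that they do not match (for example, the speaker unit 104 in FIG. 1, the CPU 20 and sound output unit 50 in FIG. 3, and steps S418 to S420 and S424 in FIG. 15).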
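[0023]
According to the invention of claim 1, when the user's voice differs from the pre-registered voiceprint, a pre-registered human voice can be output in human language. This gives a training effect: the animal hears a human voice corresponding to the user's emotion and becomes accustomed to it. Suppose the voiceprint is that of the person the animal knows best (for example, its owner), and the second registration means registers speech uttered by that person. Then, when the user is someone else, the animal hears the voice of the person it knows best and is calmed.
[0024]
The invention according to claim 2 is the voice analyzer according to claim 1. It further comprises attachment means for attaching the voice analyzer main body to the user's body (for example, the wristband unit 110 in FIG. 1 and the clip unit 212 in FIG. 24).
[0025]
The invention according to claim 2 provides the same effects as the invention according to claim 1. In addition, the voice analyzer can be worn on the body and carried. The user therefore does not need to take it out of a bag or the like each time it is used, which improves usability.
[0026]
The invention according to claim 3 is the voice analyzer according to claim 1 or 2. It further comprises storage means for storing the voice input by the first voice input means in association with the time at which that voice was input (for example, the CPU 20 and RAM 30 in FIG. 3, the history data 322 in FIG. 7, and step S117 in FIG. 12).
[0027]
The invention according to claim 4 is the voice analyzer according to claim 3. It further comprises fifth output means for outputting the voice and time stored by the storage means in association with each other (for example, the monitor unit 106 in FIG. 1, the CPU 20 and display unit 52 in FIG. 3, and step S128 in FIG. 12).
[0028]
According to the invention of claim 3, the history of exchanges between the animal and the human can be stored.
According to the invention of claim 4, the history of exchanges between the animal and the human is output so that it can be analyzed and used.
[0029]
The invention according to claim 5 is the voice analyzer according to any one of claims 1 to 4. It further comprises vibration means for generating a predetermined vibration corresponding to the analysis result of the first voice analysis means (for example, the vibrator 112 in FIG. 1, the CPU 20 and vibration unit 54 in FIG. 3, and the vibration pattern TBL 420 in FIG. 11).
[0030]
The invention according to claim 5 provides the same effects as the invention according to any one of claims 1 to 4. In addition, it generates a predetermined vibration corresponding to the analysis result of the first voice analysis means, so the user can feel the result. The user therefore does not need to read human-language text or the like to learn the result, which improves usability and allows smoother communication. Visually or hearing impaired users can also learn the analysis result.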
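[0031]
[Embodiments of the Invention]
[First Embodiment]
Next, a first embodiment of a voice analyzer to which the present invention is applied is described with reference to FIGS. 1 to 21. In this embodiment the animal is a dog, but the invention is not limited to dogs. Other animals such as cats, dolphins, or parrots may also be used.
[0032]
[Description of Configuration]
FIG. 1 shows an example of the appearance of a wristwatch-type voice analyzer to which the present invention is applied. As shown in FIG. 1(a), the wristwatch-type voice analyzer 100 as a whole looks like a conventional wristwatch. It includes:
- a microphone unit 102 for inputting animal and human voices;
- a speaker unit 104 for outputting sound;
- a monitor unit 106 for displaying text and images;
- a key operation unit 108 for inputting various operations;
- a wristband unit 110 for attaching the wristwatch-type voice analyzer 100 to the body or the like when it is carried;
- a vibrator 112;
- a data communication unit 114 for wireless communication with external devices;
- a control unit 120 that controls the wristwatch-type voice analyzer 100 as a whole; and
- a power supply unit (not shown).
[0033]
The microphone unit 102 is a sound-collecting device realized by, for example, a microphone. The figure shows a single microphone unit, but there may be several. The microphone may also be detachable, connected to the main body by a cable, and attachable with a clip or the like.
[0034]
The speaker unit 104 is a sound output device realized by, for example, a loudspeaker. In this embodiment the device may output high-frequency sound outside the range of human hearing, so the speaker unit 104 must be able to output sound in that range.
[0035]
The monitor unit 106 is display output means realized by a display element such as an LCD (Liquid Crystal Display) or ELD (Electroluminescent Display), together with a backlight, drive circuit, and the like. Under the control of the control unit 120, it can display characters (text), figures, images, and the like. The figure shows a single monitor unit 106, but several may be provided.
[0036]
The key operation unit 108 is input means realized by, for example, button switches, levers, or dials. In this embodiment, as shown in FIG. 1(b), it includes an up key 108a, a down key 108c, a select key 108b, and a cancel key 108d. Different combinations of press duration and press order let the user input, for example, selections from menus, confirm and cancel operations, and calls to predetermined functions. The number of keys is not limited to the above and may be set as appropriate.
[0037]
The wristband unit 110 is means for attaching the device to the body, belongings, or the like when the user carries it. Besides a band like that of a wristwatch, it may be, for example, a clip, a string, a chain, a hook-and-loop fastener such as Magic Tape (registered trademark), or a magnet.
[0038]
The vibrator 112 is a small vibration device. In this embodiment, under the control of the control unit 120, it vibrates in a pattern corresponding to the emotion contained in the voice of the dog 2. By feeling the different vibration patterns, the user 4 can learn the emotions and intentions of the dog 2 without looking at the monitor unit 106. This also makes the device usable by visually or hearing impaired people.
[0039]
The data communication unit 114 exchanges data by wireless communication with an external device such as a personal computer. It is realized by, for example, a communication module conforming to a standard such as Bluetooth (registered trademark) or IrDA, or by a jack terminal for wired communication.
[0040]
The control unit 120 includes a CPU (Central Processing Unit), various IC memories, a crystal oscillator, and the like. The CPU reads programs stored in the IC memories and executes them to control the wristwatch-type voice analyzer 100 as a whole. Using the crystal oscillator or the like, the control unit 120 can also make the wristwatch-type voice analyzer 100 work as a wristwatch.
[0041]
FIG. 2 is a conceptual diagram showing an example of how the device is used in this embodiment. As shown in the figure, the user 4 wears the wristwatch-type voice analyzer 100 with the wristband unit 110, for example on the wrist, and carries it during use. Carrying it like a wristwatch removes the inconvenience of carrying a separate device and of taking it out of a bag each time it is used.
[0042]
The wristwatch-type voice analyzer 100 captures (detects) the voices of the user 4 and of the pet dog 2 near the user 4, and supports two-way communication between them. When the microphone unit 102 captures the voice of the dog 2, the device analyzes it to discriminate the emotion of the dog 2. It then displays on the monitor unit 106 text or a figure (human language) that the user 4 can understand. Conversely, when it captures the voice of the user 4, it analyzes the voice to discriminate the emotion of the user 4. It then outputs from the speaker unit 104 a sound (animal language) that the dog 2 can understand.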
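[0043]
[Description of Functional Blocks]
FIG. 3 is a functional block diagram showing an example of the functional configuration of this embodiment.
As shown in the figure, the wristwatch-type voice analyzer 100 includes a voice input unit 10, a voice analysis unit 12, a key input unit 14, a voice analysis ROM (Read Only Memory) 16, a CPU 20, a RAM (Random Access Memory) 30, a ROM 40, a sound output unit 50, a display unit 52, a vibration unit 54, a communication unit 60, and a system bus 90.
[0044]
The voice input unit 10 receives the voices of the dog 2 and the user 4 and outputs a voice signal to the voice analysis unit 12. In FIG. 1, the microphone unit 102 corresponds to this unit.
[0045]
The voice analysis unit 12 analyzes the voice signal input from the voice input unit 10. More specifically, it performs, for example:
- removal of noise components from the voice signal;
- A/D conversion of the voice signal into voice data of a predetermined format;
- patterning of the voice data for feature extraction; and
- comparison with reference voice patterns registered in advance.
These processes can be realized by, for example, an A/D converter, filter circuits, and an arithmetic integrated circuit such as a DSP (Digital Signal Processor). Some or all of these functions may instead be realized in software, by reading and executing programs and data stored in the voice analysis ROM 16. In FIG. 1, the voice analysis unit 12 is mounted in the control unit 120.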
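The A/D conversion and noise removal steps above can be sketched in software as follows. This is a minimal illustration, not the device's circuitry. It assumes samples normalized to [-1.0, 1.0] and an idealized quantizer. A simple moving-average filter stands in for the unspecified noise-removal filter. `quantize` and `moving_average` are hypothetical names.

```python
from typing import List, Sequence


def quantize(samples: Sequence[float], bits: int = 8) -> List[int]:
    """Idealized A/D step: map samples in [-1.0, 1.0] to signed integer codes."""
    max_code = 2 ** (bits - 1) - 1
    codes = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip out-of-range input instead of wrapping
        codes.append(round(s * max_code))
    return codes


def moving_average(codes: Sequence[int], width: int = 3) -> List[float]:
    """Causal moving-average (low-pass) filter standing in for noise removal."""
    if width < 1:
        raise ValueError("width must be at least 1")
    out = []
    for i in range(len(codes)):
        window = codes[max(0, i - width + 1): i + 1]  # shorter window at the start
        out.append(sum(window) / len(window))
    return out
```

A real device would probably use a better-designed filter. The moving average is used here only because it is short and its effect (smoothing sample-to-sample jitter) is easy to see.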
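[0046]
The voice analysis ROM 16 stores programs and data used in the processes of the voice analysis unit 12, which refers to them. In FIG. 1, the voice analysis ROM 16 is mounted in the control unit 120.
[0047]
FIG. 4 shows an example of the contents of the voice analysis ROM 16 in this embodiment. As shown in the figure, it stores, for example:
- a voice analysis program 162, which implements the processes of the voice analysis unit 12 in software; and
- animal reference voice patterns 164 and human reference voice patterns 166, the reference data against which voices input from the voice input unit 10 are compared.
[0048]
FIG. 5 shows an example of the data structure of the animal reference voice patterns 164 in this embodiment. As shown in the figure, the animal reference voice patterns 164 are prepared in advance for each type of animal (animal attribute code). Each entry associates three items:
- an animal attribute code 164a indicating the species to which it applies;
- an emotion identification code 164b classifying the animal's emotion; and
- a reference voice pattern 164c of the voice (call) that expresses that emotion in animal language.
The reference voice pattern 164c is, for example, sonograph data. Animal language means a sound pattern that enables communication within the same genus or group.
[0049]
The animal reference voice patterns 164 are obtained by statistical methods and analyzed ethologically. The emotion contained in a voice input from the voice input unit 10 is discriminated in three steps. First, the animal reference voice patterns 164 for the target animal are found using the animal attribute code 164a. Next, the voice data of the input voice is patterned. Finally, the device judges whether it matches a reference voice pattern 164c.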
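The lookup-and-match step can be illustrated with a small table keyed by attribute code. The sketch assumes each reference voice pattern 164c has been reduced to a fixed-length feature vector. It also assumes "matching" means the nearest pattern lies within a Euclidean-distance threshold. The description does not specify the similarity measure, and `ReferencePattern`, `match_emotion`, and the threshold are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class ReferencePattern:
    attribute_code: str       # cf. animal attribute code 164a (e.g. a species)
    emotion_code: int         # cf. emotion identification code 164b
    pattern: Sequence[float]  # cf. reference voice pattern 164c, simplified to a vector


def match_emotion(features: Sequence[float], table: List[ReferencePattern],
                  attribute_code: str, threshold: float) -> Optional[int]:
    """Return the emotion code of the nearest pattern for this animal type, or None."""
    best_code, best_dist = None, math.inf
    for ref in table:
        if ref.attribute_code != attribute_code:
            continue  # consider only patterns prepared for the target animal
        dist = math.dist(features, ref.pattern)
        if dist < best_dist:
            best_code, best_dist = ref.emotion_code, dist
    # A match exists only if the nearest pattern is close enough.
    return best_code if best_dist <= threshold else None
```

Returning `None` when nothing is close enough mirrors the "no match" branch that the voice analysis processing relies on.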
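[0050]
The human reference voice patterns 166 are reference information for discriminating the emotions contained in the voice of the user 4. They are prepared in advance for each class of person to whom they apply. Here, human attributes are a classification by parameters such as language, sex, and age.
[0051]
As shown in FIG. 6, for example, each entry of the human reference voice patterns 166 includes:
- a human attribute code 166a indicating the applicable human attributes;
- an emotion identification code 166b classifying the human emotion; and
- a corresponding reference voice pattern 166c of a human voice.
[0052]
The reference voice pattern 166c is either a characteristic voice pattern obtained and analyzed statistically, or the voice pattern produced when a word expressing an emotion is pronounced. It is, for example, sonograph data. The emotion of the user 4 is therefore discriminated in the same three steps. First, the human reference voice patterns 166 whose human attribute code 166a matches the user 4 are found. Next, the voice data of the input voice is patterned. Finally, the device judges whether it matches a reference voice pattern 166c. The human reference voice patterns 166 are not limited to the above data. They may include any other data needed for discrimination, such as decision values for speaking rate or loudness, and such data may also be used in the match decision.
[0053]
The key input unit 14 is realized by, for example, button switches, levers, dials, a touch panel, or a trackpad. It accepts operations and outputs operation signals to the CPU 20. In FIG. 1, the key operation unit 108 corresponds to this unit.
[0054]
The CPU 20 is mounted in the control unit 120 in FIG. 1. It executes various processes and controls all the blocks as a whole.
[0055]
The RAM 30 is an IC memory in which the CPU 20 and the voice analysis unit 12 temporarily store programs and data. In FIG. 1 it is mounted in the control unit 120.
[0056]
FIG. 7 shows an example of the contents of the RAM 30 in this embodiment. As shown in the figure, it stores, for example, an animal name 302 holding the name of the dog 2, an animal attribute code 304, a human attribute code 306, timekeeping data 308, voice data 310, voice input time data 312, a voice identification flag 314, an emotion identification code 316, a high-frequency mode flag 318, a tactile mode flag 320, and history data 322.
[0057]
The animal name 302 is the name of the dog 2, and the animal attribute code 304 indicates the type of the dog 2. The user 4 registers both before use. The animal name 302 is displayed on the monitor unit 106 in the human-language output processing and elsewhere (described later), and strengthens the sense of closeness between the dog 2 and the user 4.
[0058]
The human attribute code 306 indicates the attributes of the user 4 (for example, language, sex, and age). The user 4 registers it before use.
[0059]
The timekeeping data 308 indicates the current date and time. By referring to it, the wristwatch-type voice analyzer 100 can also work as a clock or timer.
[0060]
The voice data 310 is the digital data produced when the voice analysis unit 12 converts a voice input from the voice input unit 10. In this embodiment it is stored as waveform data, but another format such as a sonograph may be used. The time at which the source voice was input is stored in the voice input time data 312.
[0061]
The voice identification flag 314 and emotion identification code 316 hold the results of the analysis of the voice data 310 by the voice analysis unit 12. The voice identification flag 314 indicates whether the voice data is an animal voice or a human voice. The emotion identification code 316 holds the emotion identification code 164b or 166b discriminated by matching against the reference voice pattern 164c or 166c.
[0062]
The high-frequency mode flag 318 is used in the animal-language output processing described later, when the emotion of the user 4 is output in animal language from the speaker unit 104. It specifies whether to also output a high-frequency sound that humans cannot hear but the dog 2 can. When the animal is a dog, for example, the high-frequency sound corresponds to the range of a so-called "dog whistle".
[0063]
The tactile mode flag 320 is used in the human-language output processing described later, when text or figures expressing the emotion in the voice of the dog 2 are displayed on the monitor unit 106. It specifies whether the vibrator 112 also vibrates.
[0064]
The history data 322 is a history of voice input and output. Each record associates a voice input time 322a, a voice identification flag 322b, and an emotion identification code 322c. By referring to the history data 322, one can therefore see when, by whom (the dog 2 or the user 4), and with what emotion each exchange took place.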
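The history data 322 maps naturally onto a small record type. In the sketch below, the field names and helpers (`HistoryEntry`, `append_history`, `entries_by`) are hypothetical. The flag values 1 (animal) and 0 (human) follow the voice identification flag convention used in the voice analysis processing.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

ANIMAL, HUMAN = 1, 0  # voice identification flag values used in the description


@dataclass
class HistoryEntry:
    input_time: datetime  # cf. voice input time 322a
    voice_flag: int       # cf. voice identification flag 322b
    emotion_code: int     # cf. emotion identification code 322c


def append_history(history: List[HistoryEntry], when: datetime,
                   flag: int, code: int) -> None:
    """Record one exchange (the update performed at step S117)."""
    history.append(HistoryEntry(when, flag, code))


def entries_by(history: List[HistoryEntry], flag: int) -> List[HistoryEntry]:
    """List when, and with which emotion code, one party (dog or user) spoke."""
    return [e for e in history if e.voice_flag == flag]
```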
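[0065]
The ROM 40 stores the programs and data with which the CPU 20 implements its various functions.
[0066]
FIG. 8 shows an example of the contents of the ROM 40 in this embodiment. As shown in the figure, the programs include:
- a system program 400;
- a human-language output program 402, which executes human-language output processing: it outputs text, figures, and the like (human language) that the user 4 (human) can understand, based on the analysis of the voice of the dog 2 (animal);
- an animal-language output program 404, which executes animal-language output processing: it outputs sounds that the dog 2 can understand, based on the analysis of the voice of the user 4 and the like;
- a mode switching program 406, which executes the processing for switching between modes; and
- a history output program 408, which executes history display processing based on the history data 322.
[0067]
The data include voiceprint data 410 used to verify the identity of the user 4, clock display data 412 for showing a clock on the monitor unit 106, screen frame data 414 holding the information needed for the various screens, an animal-to-human language conversion TBL (table) 416, a human-to-animal language conversion TBL (table) 418, and a vibration pattern TBL (table) 420.
[0068]
The voiceprint data 410 is the voiceprint of a person the dog 2 is familiar with day to day, for example its owner. It is, for example, collected and stored in advance by the manufacturer of the wristwatch-type voice analyzer. The voiceprint data 410 need not be stored in the ROM 40; the user 4 may instead register it in the RAM 30.
[0069]
The animal-to-human language conversion TBL 416 associates emotions of the dog 2 with human language. The human-to-animal language conversion TBL 418 associates emotions of the user 4 with animal language. Both serve as dictionary data.
[0070]
FIG. 9 shows an example of the data structure of the animal-to-human language conversion TBL 416 in this embodiment. As shown in the figure, each entry associates three items:
- an emotion identification code 416a, discriminated by the voice analysis unit 12 from the voice of the dog 2;
- corresponding text data 416b that a human can understand; and
- image data 416c for displaying an image of the animal.
The image data 416c may be a still image or a moving image for displaying an animation or the like.
[0071]
FIG. 10 shows an example of the data structure of the human-to-animal language conversion TBL 418 in this embodiment. As shown in the figure, each entry associates six items:
- an emotion identification code 418a, discriminated by the voice analysis unit 12 from the voice of the user 4;
- corresponding text data 418b that a human can understand;
- image data 418c for displaying an image of a person;
- synthesized sound data 418d, an artificially synthesized call of the animal (in this case, a dog);
- high-frequency sound data 418e outside the range of human hearing; and
- registered voice data 418f, a voice of the user 4 registered in advance.
The image data 418c may be a still image or a moving image for displaying an animation or the like.
[0072]
As shown in FIG. 11, for example, the vibration pattern TBL 420 associates an emotion identification code 420a with a vibration pattern 420b. By referring to it, the device can make the vibrator 112 vibrate in the vibration pattern 420b that corresponds to the emotion identification code 420a.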
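The description does not say how a vibration pattern 420b is encoded. As one possibility, the sketch below assumes each pattern is a list of (on, off) durations in milliseconds. It expands the pattern into the time intervals during which the vibrator runs. The table contents and the function name are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical vibration pattern table: emotion code -> list of (on_ms, off_ms) pulses.
VIBRATION_PATTERNS: Dict[int, List[Tuple[int, int]]] = {
    1: [(100, 100), (100, 100)],  # e.g. two short buzzes
    2: [(400, 0)],                # e.g. one long buzz
}


def vibration_schedule(emotion_code: int,
                       table: Dict[int, List[Tuple[int, int]]] = VIBRATION_PATTERNS
                       ) -> List[Tuple[int, int]]:
    """Return (start_ms, stop_ms) intervals during which the vibrator should run."""
    schedule, t = [], 0
    for on_ms, off_ms in table.get(emotion_code, []):  # unknown code: no vibration
        schedule.append((t, t + on_ms))
        t += on_ms + off_ms
    return schedule
```

Expressing patterns as pulse lists keeps the table compact and makes rhythms easy to tell apart by feel, which is the point of the tactile mode.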
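[0073]
The sound output unit 50 is realized by, for example, a loudspeaker, and outputs sound. In FIG. 1, the speaker unit 104 corresponds to this unit.
[0074]
The display unit 52 is realized by, for example, a display element such as an LCD, ELD, or PDP, and displays images. In FIG. 1, the monitor unit 106 corresponds to this unit.
[0075]
The vibration unit 54 is realized by, for example, a vibrator or similar oscillator, and generates vibration. In FIG. 1, the vibrator 112 corresponds to this unit.
[0076]
The communication unit 60 sends and receives data by wireless communication with external devices. It is realized by, for example, a Bluetooth (registered trademark) or IrDA module, or by a jack for a wired communication cable with its control circuit. In FIG. 1, the data communication unit 114 corresponds to this unit. Information such as the protocol stack the communication unit 60 uses is stored in the ROM 40 (not shown) and read out as needed.
[0077]
[Description of Processing]
Next, the flow of processing in this embodiment is described with reference to FIGS. 12 to 23.
[0078]
FIG. 12 is a flowchart of the main processing in this embodiment. As shown in the figure, when the voice input unit 10 detects voice input (step S102), the voice analysis unit 12 applies A/D conversion and filtering to the voice signal from the voice input unit 10. This converts the signal into voice data 310 in a format suitable for voice analysis (step S104).
Next, the current timekeeping data 308 is stored as the voice input time data 312, associated with the voice data 310 (step S106). Voice analysis processing of the voice data 310 is then executed (step S108).
[0079]
FIG. 13 is a flowchart of the voice analysis processing in this embodiment. As shown in the figure, the voice analysis unit 12 first reads the stored voice data 310 (step S202) and matches it against the animal reference voice patterns 164 (step S204). That is, it patterns the voice data 310 into a sonograph and compares it with the reference voice patterns 164c. If any pattern has closely similar features, it judges that a match exists.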
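A sonograph is a time-frequency plot, so the patterning step can be approximated by a short-time Fourier transform. The sketch below computes a magnitude spectrogram using a naive DFT and a Hann window. The frame and hop sizes are arbitrary assumptions, `sonograph` is a hypothetical name, and a practical implementation would use an FFT instead of the O(n²) DFT shown here.

```python
import cmath
import math
from typing import List, Sequence


def sonograph(samples: Sequence[float], frame: int = 8, hop: int = 4) -> List[List[float]]:
    """Magnitude spectrogram: one row per frame, one column per bin 0..frame//2."""
    if frame < 2 or hop < 1:
        raise ValueError("frame must be >= 2 and hop >= 1")
    rows = []
    for start in range(0, len(samples) - frame + 1, hop):
        chunk = samples[start:start + frame]
        # Hann window tapers the frame edges to reduce spectral leakage.
        windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * n / (frame - 1)))
                    for n, x in enumerate(chunk)]
        row = []
        for k in range(frame // 2 + 1):  # non-negative frequencies only
            acc = sum(w * cmath.exp(-2j * math.pi * k * n / frame)
                      for n, w in enumerate(windowed))
            row.append(abs(acc))
        rows.append(row)
    return rows
```

For a pure tone, every row peaks at the tone's frequency bin. This peak structure is the kind of feature that the comparison with reference patterns would rely on.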
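[0080]
If there is a match among the animal reference voice patterns 164 (step S206; YES), the voice analysis unit 12 stores "1" (animal voice) in the voice identification flag 314. It also stores the emotion identification code 164b of the matched reference voice pattern 164c in the emotion identification code 316 in the RAM 30 (step S208). It then ends the voice analysis processing and returns to the flow of FIG. 12.
[0081]
If there is no match among the animal reference voice patterns 164 (step S206; NO), matching against the human reference voice patterns 166 is executed (step S210).
[0082]
If there is a match among the human reference voice patterns 166 (step S212; YES), the voice analysis unit 12 stores "0" (human voice) in the voice identification flag 314. It also stores the emotion identification code 166b of the matched reference voice pattern 166c in the emotion identification code 316 in the RAM 30 (step S214). It then ends the voice analysis processing and returns to the flow of FIG. 12.
[0083]
If there is no match among the human reference voice patterns 166 either (step S212; NO), the voice analysis unit 12 stores "0" in both the voice identification flag 314 and the emotion identification code 316 in the RAM 30 (step S216). It then ends the voice analysis processing and returns to the flow of FIG. 12.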
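Steps S204 to S216 form a two-stage cascade that returns the pair (voice identification flag, emotion identification code). This sketch assumes the two matchers are supplied as callables that return an emotion code or `None`. It also assumes valid emotion codes are non-zero, since "0" is reserved for the unidentified case. All names are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

ANIMAL_FLAG, HUMAN_FLAG, UNKNOWN_CODE = 1, 0, 0

# A matcher returns an emotion code, or None when no reference pattern matches.
Matcher = Callable[[Sequence[float]], Optional[int]]


def analyze_voice(features: Sequence[float], match_animal: Matcher,
                  match_human: Matcher) -> Tuple[int, int]:
    """Return (voice identification flag, emotion identification code)."""
    code = match_animal(features)        # S204
    if code is not None:                 # S206: YES
        return ANIMAL_FLAG, code         # S208
    code = match_human(features)         # S210
    if code is not None:                 # S212: YES
        return HUMAN_FLAG, code          # S214
    return HUMAN_FLAG, UNKNOWN_CODE      # S216: flag "0" and code "0"
```

Animal patterns are tried first, exactly as in FIG. 13. A voice that matches both tables is therefore treated as the dog's.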
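[0084]
When the voice analysis processing has ended and the flow of FIG. 12 resumes, the CPU 20 refers to the voice identification flag 314 and the emotion identification code 316.
The CPU 20 then branches on these values:
- If the voice identification flag 314 is "1", the voice of the dog 2 (an animal) was input (step S110; YES), and human-language output processing is executed (step S112).
- If the voice identification flag 314 is "0" and the emotion identification code 316 is not "0", the voice of the user 4 (a human) was input (step S114; YES), and animal-language output processing is executed (step S116).
- If both the voice identification flag 314 and the emotion identification code 316 are "0", the voice could be identified neither as an animal voice nor as a human voice (step S114; NO). Neither human-language output processing nor animal-language output processing is executed.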
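The branch at steps S110 and S114 is a three-way dispatch on the analysis result. The sketch is illustrative only. The returned strings are hypothetical labels for the processing steps.

```python
def dispatch(voice_flag: int, emotion_code: int) -> str:
    """Select the processing that follows voice analysis in FIG. 12."""
    if voice_flag == 1:                        # S110: YES, the dog's voice
        return "human_language_output"         # S112
    if voice_flag == 0 and emotion_code != 0:  # S114: YES, the user's voice
        return "animal_language_output"        # S116
    return "no_output"                         # S114: NO, voice not identified
```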
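[0085]
FIG. 14 is a flowchart of the human-language output processing in this embodiment. As shown in the figure, the CPU 20 first refers to the screen frame data 414 and has the display unit 52 show the frame for human-language output (step S302).
[0086]
Next, it refers to the emotion identification code 316 in the RAM 30 (step S304). From the animal-to-human language conversion TBL 416 it reads the text data 416b and image data 416c for that code, and displays them at predetermined positions on the human-language output screen (step S306).
[0087]
Next, it reads the voice data 310 and displays it at a predetermined position on the human-language output screen (step S308). It then reads the voice input time data 312 and displays the date and time at which the voice was input (step S310).
[0088]
Next, the CPU 20 refers to the tactile mode flag 320. If the flag is "1", that is, tactile mode is "ON" (step S312; YES), the CPU 20 reads from the vibration pattern TBL 420 the vibration pattern 420b for the emotion identification code 316 read earlier. It controls the vibration unit 54 to vibrate according to that pattern (step S314), ends the human-language output processing, and returns to the flow of FIG. 12. On returning to FIG. 12, the CPU 20 updates the history data 322 (step S117).
[0089]
FIG. 18 shows an example of a screen in the human-language output processing in this embodiment. On the human-language output screen 5, the title display 5a indicates that this is a message from the dog 2 to the user 4. Including the animal name 302 (the pet's name), as in "A message has arrived from (Taro)", gives the user 4 a stronger sense of closeness.
[0090]
The text data 416b and image data 416c read from the animal-to-human language conversion TBL 416 for the emotion identification code 316 are shown in the text display area 5b and image display area 5c, respectively. Preferably, the text display area 5b is drawn as a speech balloon coming from the image display area 5c.
[0091]
The voice data 310 is shown as a graph in the voice data display area 5d, either as a waveform or in another format such as a sonograph. Showing the voice data 310 helps the user 4 learn to read the features of the graph, such as its shape. Eventually the user 4 may understand the emotions and intentions of the dog 2 from the graph alone, without reading the text in area 5b. The shape of the graph carries subtler emotions and intentions than the emotion identification codes. Once the user 4 can read it, the user can understand the dog 2 in finer detail than the code classification allows.
The time at which the voice was input is shown in the date/time display area 5e, for example at the bottom of the screen.
[0092]
FIG. 15 is a flowchart of the animal-language output processing in this embodiment. As shown in the figure, the CPU 20 first refers to the screen frame data 414 and has the display unit 52 show the frame for animal-language output (step S402).
[0093]
Next, it refers to the emotion identification code 316 in the RAM 30 (step S404). From the human-to-animal language conversion TBL 418 it reads the text data 418b and image data 418c for that code, and displays them at predetermined positions on the animal-language output screen (step S406).
[0094]
Next, it reads the voice data 310 and displays it as a graph at a predetermined position on the animal-language output screen (step S408). It then reads the voice input time data 312 and displays the date and time at which the voice was input (step S410).
[0095]
Next, the CPU 20 refers to the high-frequency mode flag 318. If the flag is "1", that is, high-frequency mode is "ON" (step S412; YES), the CPU 20 reads from the human-to-animal language conversion TBL 418 the high-frequency sound data 418e for the emotion identification code 316 referred to earlier. It outputs that sound from the sound output unit 50 (step S414).
[0096]
Next, the voice data 310 is compared with the voiceprint data 410 (step S416) to determine whether they match (step S418).
[0097]
If the voice data 310 matches the voiceprint data 410, the speaker is judged to be the registered person (step S418; YES). The synthesized sound data 418d for the emotion identification code 316 is then read from the human-to-animal language conversion TBL 418 (step S422) and output from the sound output unit 50 (step S424).
[0098]
If the voice data 310 does not match the voiceprint data 410 (step S418; NO), the registered voice data 418f for the emotion identification code 316 is read from the human-to-animal language conversion TBL 418 (step S420) and output from the sound output unit 50 (step S424). When the user 4 is not the owner, this lets the dog 2 hear the voice of a person it knows well. This eases the dog's tension and wariness, so communication goes more smoothly even when the dog 2 is not used to the user 4.
[0099]
After the synthesized sound data 418d or registered voice data 418f has been output from the sound output unit 50, the animal-language output processing ends and the flow returns to FIG. 12. On returning to FIG. 12, the CPU 20 updates the history data 322 (step S117).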
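Steps S412 to S424 decide which sound clips reach the sound output unit 50. The sketch below assumes the voiceprint comparison (steps S416 to S418) has already produced a boolean. Each row of the human-to-animal language conversion TBL 418 is represented by clip names. The optional `always_synthesize` parameter models the alternative flow described in paragraph [0127]. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AnimalLanguageEntry:
    high_freq_sound: str   # cf. high-frequency sound data 418e
    synthesized_call: str  # cf. synthesized sound data 418d
    registered_voice: str  # cf. registered voice data 418f


def animal_language_outputs(code: int, table: Dict[int, AnimalLanguageEntry],
                            high_freq_mode: bool, voiceprint_matches: bool,
                            always_synthesize: bool = False) -> List[str]:
    """Return, in playback order, the clips sent to the sound output unit."""
    entry = table[code]
    clips = []
    if high_freq_mode:                        # S412: YES
        clips.append(entry.high_freq_sound)   # S414
    if voiceprint_matches:                    # S418: YES
        clips.append(entry.synthesized_call)  # S422, S424
    else:                                     # S418: NO
        if always_synthesize:                 # alternative flow of [0127]
            clips.append(entry.synthesized_call)
        clips.append(entry.registered_voice)  # S420, S424
    return clips
```

Keeping the voiceprint decision outside this function separates the speaker-verification technique, which the description leaves open, from the output-selection logic it does specify.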
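[0100]
FIG. 19 shows an example of a screen in the animal-language output processing in this embodiment. On the animal-language output screen 6, the title display 6a indicates that this is a message to the dog 2. Including the animal name 302 (the pet's name), as in "Enter a message for (Taro)", gives the user 4 a stronger sense of closeness.
[0101]
The text data 418b and image data 418c read from the human-to-animal language conversion TBL 418 for the emotion identification code 316 are shown in the text display area 6b and image display area 6c, respectively. As in the figure, the text display area 6b is preferably drawn as a speech balloon coming from the image display area 6c.
[0102]
The voice data 310 is shown as a graph in the voice data display area 6d. The time at which the voice was input is shown in the date/time display area 6e at the bottom of the screen.
[0103]
In the flow of FIG. 12, when, for example, the up key 108a or the down key 108c is held down for a relatively long predetermined time (step S118; YES), the CPU 20 executes key input processing (step S120).
[0104]
FIG. 16 is a flowchart of the key input processing in this embodiment. As shown in the figure, the CPU 20 first refers to the screen frame data 414 and has the display unit 52 show the frame for key input (step S502). On the key input screen, for example, the human-to-animal language conversion TBL 418 is consulted and the contents of the text data 418b are shown as selectable buttons.
[0105]
The user 4 highlights the button with the desired content using the up key 108a or the down key 108c, and presses the select key 108b to confirm it (step S504).
[0106]
Once the selection is confirmed, the CPU 20 takes from the human-to-animal language conversion TBL 418 the emotion identification code 418a for the selected content and stores it in the RAM 30 (step S506). It then ends the key input processing and returns to the flow of FIG. 12, where it next executes animal-language output processing.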
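Key input processing amounts to moving a cursor over the buttons and returning the emotion identification code 418a of the confirmed one. The sketch makes three assumptions that the description does not settle: keys arrive as a list of strings, the cursor stops at the ends of the list, and the cancel key aborts without selecting. The function name is hypothetical.

```python
from typing import List, Optional, Tuple


def key_input(options: List[Tuple[int, str]], keys: List[str]) -> Optional[int]:
    """Move a cursor over (emotion code, text) buttons and return the confirmed code.

    Returns None if there are no options, the user cancels, or nothing is confirmed.
    """
    if not options:
        return None
    cursor = 0
    for key in keys:
        if key == "up":
            cursor = max(0, cursor - 1)                 # stop at the first button
        elif key == "down":
            cursor = min(len(options) - 1, cursor + 1)  # stop at the last button
        elif key == "select":
            return options[cursor][0]                   # S504 confirm, S506 store
        elif key == "cancel":
            return None
    return None
```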
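[0107]
FIG. 20 shows an example of a screen in the key input processing in this embodiment. On the key input screen 7, the title display 7a indicates that this is a message to the dog 2.
[0108]
Selection buttons 7b, each labeled with text data 418b read from the human-to-animal language conversion TBL 418, are displayed. If not all selection buttons 7b fit on the screen at once, the list can be scrolled. The currently selected button 7b is shown, for example, in reverse video.
[0109]
The screen 7 also shows a select button 7c and a cancel button 7d. Each is shown in reverse video when the select key 108b or the cancel key 108d, respectively, is pressed, so the user 4 can see that the key was input.
[0110]
In the flow of FIG. 12, when, for example, the select key 108b is held down for a relatively long predetermined time (step S122; YES), the CPU 20 executes mode switching processing (step S124).
[0111]
FIG. 17 is a flowchart of the mode switching processing in this embodiment. As shown in the figure, the CPU 20 first refers to the screen frame data 414 and has the display unit 52 show the frame for mode switching (step S602).
[0112]
Next, if an operation to switch high-frequency mode is input (step S604; YES), the CPU 20 toggles the high-frequency mode flag 318 (step S606). If an operation to switch tactile mode is input (step S608; YES), the CPU 20 toggles the tactile mode flag 320 (step S610). When a predetermined end operation is input (step S612; YES), mode switching processing ends and the flow returns to FIG. 12.
[0113]
FIG. 21 shows an example of a screen in the mode switching processing in this embodiment. On the mode switching screen 8, the title display 8a indicates that mode switching is in progress. The screen shows an ON/OFF display 8b for high-frequency mode and an ON/OFF display 8c for tactile mode. Pressing the up key 108a or down key 108c selects displays 8b and 8c in turn. Pressing the select key 108b while a display is selected inputs a switching operation for that mode, and the CPU 20 toggles it between ON and OFF. Pressing the cancel key 108d inputs the end operation for mode switching processing.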
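The mode switching screen can be modelled as a two-item menu whose select key toggles the highlighted flag. The sketch assumes the highlight wraps around when moving past either end, which is one reading of "selected in turn". The flag names and the function name are hypothetical.

```python
from typing import Dict, List


def mode_switch(flags: Dict[str, bool], keys: List[str]) -> Dict[str, bool]:
    """Up/down moves between the mode displays, select toggles, cancel ends."""
    items = ["high_frequency", "tactile"]  # cf. ON/OFF displays 8b and 8c
    flags = dict(flags)                    # work on a copy; the caller's dict is unchanged
    cursor = 0
    for key in keys:
        if key == "up":
            cursor = (cursor - 1) % len(items)
        elif key == "down":
            cursor = (cursor + 1) % len(items)
        elif key == "select":
            flags[items[cursor]] = not flags[items[cursor]]  # S606 / S610
        elif key == "cancel":
            break                                            # S612: end operation
    return flags
```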
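[0114]
In the flow of FIG. 12, when, for example, the cancel key 108d is held down for a relatively long predetermined time (step S126; YES), the CPU 20 executes history display processing (step S128).
[0115]
FIG. 22 shows an example of a screen in the history display processing in this embodiment. As shown in the figure, the history display processing refers to the history data 322 and shows a history display area 9a. Each record shows, for example, a time 9b, an icon 9c indicating whether the voice was that of the dog 2 or the user 4, and content 9d. The icon 9c is chosen from the voice identification flag 322b. For the content 9d, the voice identification flag 322b and emotion identification code 322c select text data 416b or 418b from the animal-to-human language conversion TBL 416 or the human-to-animal language conversion TBL 418, and that text is displayed.
[0116]
If the history display area 9a does not fit on the screen at once, it can be scrolled with the up key 108a and down key 108c. In this case, a bar 9e preferably shows which part of the day (24 hours) the displayed history falls in.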
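Rendering the history lines and the 24-hour bar 9e could look like the following sketch. It assumes history records are (time, voice flag, emotion code) tuples, and that dictionaries stand in for TBL 416 and TBL 418. By default the bar has one cell per hour. The output format and the names are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def history_lines(entries: List[Tuple[datetime, int, int]],
                  animal_text: Dict[int, str], human_text: Dict[int, str]) -> List[str]:
    """Render records as 'HH:MM icon text' lines (cf. time 9b, icon 9c, content 9d)."""
    lines = []
    for when, flag, code in entries:
        # Flag 1 = dog, so look up TBL 416 text; otherwise TBL 418 text.
        icon, texts = ("DOG", animal_text) if flag == 1 else ("YOU", human_text)
        lines.append(f"{when:%H:%M} {icon} {texts.get(code, '?')}")
    return lines


def day_bar(first: datetime, last: datetime, width: int = 24) -> str:
    """Mark the part of the 24-hour day covered by the visible records (cf. bar 9e)."""
    start = first.hour * width // 24
    end = last.hour * width // 24
    return "".join("#" if start <= i <= end else "-" for i in range(width))
```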
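[0117]
By viewing this history, the user 4 can, for example, better understand the personality and habits of the dog 2 and notice changes in its physical condition.
[0118]
In FIG. 12, the CPU 20 shows a clock screen 3 on the display unit 52 (step S130), for example as in FIG. 23, in the following cases:
- no voice is input (step S102; NO), or a voice is input but cannot be identified as either an animal voice or a human voice (step S114; NO); and
- no specific key operation is input (step S118; NO, then step S122; NO, then step S126; NO).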
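The long-press checks in steps S118, S122, and S126, with the clock screen as the fallback, can be sketched as a single selector. The 1000 ms threshold is a hypothetical value for "a relatively long predetermined time", and the returned labels are illustrative.

```python
from typing import Optional


def select_screen(key: Optional[str], held_ms: int, threshold_ms: int = 1000) -> str:
    """Map a key press to the next processing step of FIG. 12 (S118 to S130)."""
    if key is not None and held_ms >= threshold_ms:
        if key in ("up", "down"):
            return "key_input"        # S118: YES -> S120
        if key == "select":
            return "mode_switch"      # S122: YES -> S124
        if key == "cancel":
            return "history_display"  # S126: YES -> S128
    return "clock_screen"             # no long press -> S130
```

Using press duration to separate these commands from ordinary menu navigation lets four keys cover all functions, in line with the description of the key operation unit 108.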
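[0119]
The clock screen 3 shows, for example, an analog clock 3a, the date 3b, and the day of the week 3c. The user 4 can therefore use the wristwatch-type voice analyzer 100 both as a tool for communicating with the dog 2 and as a wristwatch.
[0120]
[Second Embodiment]
Next, a second embodiment of a voice analyzer to which the present invention is applied is described. This embodiment can be realized with basically the same configuration as the first embodiment. The same components have the same reference numerals, and their description is omitted.
[0121]
FIG. 24 shows an example of the appearance of a dog-leash-type voice analyzer 200 in this embodiment. As shown in the figure, the dog-leash-type voice analyzer 200 has a leash 202 for walking the dog 2, which a reel 204 can pay out and wind up. At the tip of the leash 202 are a metal fitting 206, which attaches the leash 202 to the collar 207 of the dog 2 and detaches it, and a microphone unit 102. The user 4 holds the main body 208 or attaches it to a belt or the like with the clip unit 212.
[0122]
A signal line 210 running inside the leash 202 connects the microphone unit 102 to the control unit 120 and power supply built into the main body 208. Placing the microphone unit 102 at the tip of the leash 202 allows more efficient sound collection, even where sound disperses easily, such as outdoors.
[0123]
Alternatively, the dog-leash-type voice analyzer 200 may send the voice signal collected by the microphone unit 102, via the data communication unit 114, to a wristwatch-type voice analyzer 100 worn by the user 4. In that case, the dog-leash-type voice analyzer 200 can omit the voice analysis unit 12, the voice analysis ROM 16, the display unit 52, and the vibration unit 54. It uses those of the wristwatch-type voice analyzer 100 worn by the user 4 instead.
[0124]
Embodiments of the present invention have been described above, but the invention is not limited to them. Components may be changed, added, or removed as appropriate without departing from the spirit of the invention.
[0125]
For example, the voice analyzer may be realized as a personal computer, a PDA (Personal Digital Assistant), or a multifunction mobile phone.
[0126]
The voice analysis unit 12 may be implemented in software on the CPU 20, and the voice analysis ROM 16 may be the same memory as the ROM 40. A touch panel on the display surface of the monitor unit 106 may also serve as the key input unit 14.
[0127]
In the animal-language output processing, the flow may also be as follows. The synthesized sound data 418d is output regardless of whether the voice data of the user 4 matches the voiceprint data 410. The registered voice data 418f is then additionally output when the voice data does not match the voiceprint data 410.
[0128]
[Effects of the Invention]
The present invention provides the following effects.
[0130]
According to the invention of claim 1, the user's voice is compared with a pre-registered voiceprint, and if they differ, a pre-registered human voice can be output in human language. Outputting a human voice corresponding to the user's emotion gives a training effect, in which the animal hears human voices and becomes accustomed to them. Furthermore, if the voiceprint is that of the person the animal knows best (for example, its owner), the animal can be calmed even when the user is someone else.
[0131]
According to the invention of claim 2, the voice analyzer can be worn on the body and therefore carried during use. There is no need to take it out of a bag or the like each time it is used, which improves usability.
[0132]
According to the inventions of claims 3 and 4, the chronological history of exchanges between the animal and the human is stored, so that this history can be analyzed and used.
[0133]
According to the invention of claim 5, a predetermined vibration corresponding to the analysis result of the animal's voice lets the user feel the result. The user does not need to read human-language text or the like, which improves usability and allows smoother communication. Visually or hearing impaired users can also learn the analysis result.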
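[Brief Description of the Drawings]
[FIG. 1] A diagram showing an example of the appearance of the wristwatch-type voice analyzer according to the first embodiment.
[FIG. 2] A conceptual diagram showing an example of how the wristwatch-type voice analyzer is used.
[FIG. 3] A functional block diagram showing an example of the functional configuration.
[FIG. 4] A diagram showing an example of the contents of the voice analysis ROM.
[FIG. 5] A diagram showing an example of the data structure of the animal reference voice patterns.
[FIG. 6] A diagram showing an example of the data structure of the human reference voice patterns.
[FIG. 7] A diagram showing an example of the contents of the RAM.
[FIG. 8] A diagram showing an example of the contents of the ROM.
[FIG. 9] A diagram showing an example of the data structure of the animal-to-human language conversion TBL.
[FIG. 10] A diagram showing an example of the data structure of the human-to-animal language conversion TBL.
[FIG. 11] A diagram showing an example of the data structure of the vibration pattern TBL.
[FIG. 12] A flowchart of the main processing.
[FIG. 13] A flowchart of the voice analysis processing.
[FIG. 14] A flowchart of the human-language output processing.
[FIG. 15] A flowchart of the animal-language output processing.
[FIG. 16] A flowchart of the key input processing.
[FIG. 17] A flowchart of the mode switching processing.
[FIG. 18] A diagram showing an example of a screen in the human-language output processing.
[FIG. 19] A diagram showing an example of a screen in the animal-language output processing.
[FIG. 20] A diagram showing an example of a screen in the key input processing.
[FIG. 21] A diagram showing an example of a screen in the mode switching processing.
[FIG. 22] A diagram showing an example of a screen in the history display processing.
[FIG. 23] A diagram showing an example of the clock display screen.
[FIG. 24] A diagram showing an example of the appearance of the dog-leash-type voice analyzer according to the second embodiment.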
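[Explanation of Reference Numerals]
2 Dog
4 User
10 Voice input unit
12 Voice analysis unit
14 Key input unit
16 Voice analysis ROM
162 Voice analysis program
164 Animal reference voice pattern
166 Human reference voice pattern
20 CPU
30 RAM
304 Animal attribute code
306 Human attribute code
310 Voice data
312 Voice input time data
314 Voice identification flag
316 Emotion identification code
318 High-frequency mode flag
320 Tactile mode flag
322 History data
40 ROM
402 Human-language output program
404 Animal-language output program
406 Mode switching program
408 History output program
410 Voiceprint data
416 Animal-to-human language conversion TBL (table)
418 Human-to-animal language conversion TBL (table)
420 Vibration pattern TBL (table)
50 Sound output unit
52 Display unit
54 Vibration unit
60 Communication unit
100 Wristwatch-type voice analyzer
102 Microphone unit
104 Speaker unit
106 Monitor unit
108 Key operation unit
110 Wristband unit
112 Vibrator
114 Data communication unit
120 Control unit
200 Dog-leash-type voice analyzer
202 Leash
204 Reel
206 Metal fitting
208 Main body
210 Signal line
212 Clip unit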
BACKGROUND OF THE INVENTION
  The present invention analyzes animal speechSpeech analyzerAbout.
[0002]
[Prior art]
For those who keep animals such as dogs and cats as pets, pets are members of the family, and there is a desire to communicate and communicate with each other in the same way as humans.
[0003]
In recent years, with advances in speech analysis technology, particularly voiceprint analysis, it has become possible to discriminate emotions and intentions included in animal calls (hereinafter simply referred to as “emotions”). For example, a voice (such as a sonograph) obtained by voice analysis of voices generated by animals such as pets and domestic animals is obtained. Then, the emotion of the animal is discriminated by comparison with a reference voice pattern prepared in advance and analyzed in terms of animal behavior.
[0004]
Based on such voice analysis technology, for example, input the voice of the animal's cry and the video of the animal's movement (gesture) and compare it with the voice and movement data analyzed in advance in animal behavior It has also been proposed to discriminate animal emotions and display them as characters or images that can be understood by humans (see, for example, Patent Document 1; corresponding to all claims).
[0005]
[Patent Document 1]
Japanese Patent Laid-Open No. 10-3479
[0006]
[Problems to be solved by the invention]
According to such a technique, the owner can know the emotion of the animal to some extent, and when there is a request from the animal, it can understand and respond to it. However, the transmission of emotions realized by the prior art is only one way from animals to humans, and the transmission of emotions from humans to animals is not supported. Therefore, it was difficult to say that the owner and the pet were communicating.
[0007]
  The present invention has been made in view of the above-mentioned problems, and the object of the present invention is humans and animals.WithTo support two-way emotional communication.
[0008]
[Means for Solving the Problems]
  In order to solve the above-mentioned problem, the speech analysis apparatus according to claim 1Has the following configuration.
[0011]
As used herein, “animal voice” means crying. “Human language” means text or an image such as a human voice or a word that allows a human to understand its meaning. The term “animal language” means a sound pattern that enables communication within the same genus or group.
[0022]
  Claim1The speech analysis apparatus according to the invention described in
  In the voice analysis device provided with the voice analysis device body,
  First voice input means for inputting animal voice (for example, voice input unit 10 in FIG. 3, CPU 20, step S102 in FIG. 12);
  A first voice analysis means for analyzing the voice input by the first voice input means (for example, the voice analysis unit 12 of FIG. 3, the CPU 20, step S108 of FIG. 12, steps S202 to S216 of FIG. 13);
  First output means (for example, the sound output unit 50 of FIG. 3, the CPU 20, step S116 of FIG. 12, step S304 of FIG. 14), which displays and outputs characters or images corresponding to the analysis result by the first voice analysis means as human language. , S306)
  After the human language is displayed and output by the first output means, a second voice input means (for example, the voice input unit 10 in FIG. CPU 20, step S102 in FIG. 12,
  A second voice analysis means for analyzing the voice input by the second voice input means (for example, the voice analysis unit 12 of FIG. 3, the CPU 20, step S108 of FIG. 12, steps S202 to S216 of FIG. 13);
  The second output means (for example, the sound output unit 50 of FIG. 3, the CPU 20, step S112 of FIG. 12, FIG. 15) outputs the voice corresponding to the analysis result by the second voice analysis means as an animal language understandable by the animal. Steps S404 and S406),
  First registration means for registering a user's voiceprint (for example, ROM 40 in FIG. 3, voiceprint data 410 in FIG. 8);
  Second registration means (for example, ROM 40 in FIG. 3, human language / animal language conversion TBL 418 in FIG. 8, registered voice data 418f in FIG. 10) for registering speech in human language having a predetermined meaning, and the second voice input means Determining means (for example, the control unit 120 in FIG. 1, the CPU 20 in FIG. 3, and the steps S416 to S418 in FIG. 15) for determining whether or not the voice input by the first registration means coincides with the voiceprint registered by the first registration means. )When,
  Fourth output means (for example, the speaker unit 104 in FIG. 1, the CPU 20 in FIG. 3, the sound output) that outputs the voice registered by the second registration means in human language when it is determined by the determination means that they do not match. Unit 50, (steps S418 to S420, S424) of FIG.
  It is characterized by providing.
[0023]
  Claim1In the invention described inAccording to the userIs different from a pre-registered voiceprint, it is possible to output a pre-registered human voice in human language. Therefore, it is possible to obtain a training effect that allows the animal to hear and hear the human voice corresponding to the emotion of the user. In addition, when the voiceprint is the voiceprint of the person who is most familiar with the animal (for example, the owner) and the voice uttered by the person is registered by the second voice registration means, if the user is different from the person, By listening to the voice of the most familiar person, the effect of calming the animal can be obtained.
[0024]
  Claim2The invention described in claim1The voice analyzer describedIn the voice analyzer main bodyIt is characterized by comprising mounting means (for example, the wristband unit 110 in FIG. 1 and the clip unit 212 in FIG. 24) for mounting on the user's body.
[0025]
  Claim2According to the invention described in claim1While having the same effect as the described invention, the voice analyzer can be worn on the body and carried. Therefore, it is not necessary to take out from the back or the like at the time of using the speech analysis apparatus, and the usability is improved.
[0026]
  Claim3The invention described in claim1 or claim 2In the voice analysis device describedLeaveStorage means for storing the voice input by the first voice input means and the time when the voice is input by the first voice input means (for example, the CPU 20, RAM 30, FIG. 7 history data in FIG. 7) 322, step S117 of FIG. 12).
[0027]
  Claims4Invention described inIs, Claims3In the voice analyzer described inLeaveAnd fifth output means (for example, the monitor unit 106 in FIG. 1, the CPU 20 in FIG. 3, the display unit 52, and step S128 in FIG. 12) for outputting the voice and time stored in the storage unit in association with each other.It is characterized by.
[0028]
  Claim3According to the invention described in (1), it is possible to store a history of exchanges between animals and humans.
  Claim4Invention described inAccording toBy outputting a history of exchanges between animals and humans, the history can be analyzed and used.
[0029]
  Claim5The invention described in claim 14The voice analysis device according to any one ofInExcitation means (for example, the vibrator 112 in FIG. 1, the CPU 20 in FIG. 3, the excitation unit 54, and the vibration pattern TBL 420 in FIG. 11) that generates a predetermined vibration corresponding to the analysis result by the first voice analysis means is provided. It is characterized by providing.
[0030]
  Claim5According to the invention described in claim 1,4In addition to the same effects as the invention described in any one of the above, it is possible to generate a predetermined vibration corresponding to the analysis result by the first voice analysis means and notify the user of the analysis result by bodily sensation. Therefore, the user does not need to read the human language text or the like on the analysis result, so that the usability is improved and smoother communication can be achieved. Moreover, it becomes possible to know the analysis result even for a visually handicapped person or a hearing handicapped person.
[0031]
DETAILED DESCRIPTION OF THE INVENTION
[First Embodiment]
Next, a first embodiment of a speech analyzer to which the present invention is applied will be described with reference to FIGS. In this embodiment, the animal is described as a dog, but the present invention is not limited to this. For example, other animals such as cats, dolphins, and parrots may be used.
[0032]
[Description of configuration]
FIG. 1 is a diagram showing an example of the appearance of a wristwatch type speech analyzer to which the present invention is applied. As shown in FIG. 1 (a), the external appearance of the wristwatch type speech analysis apparatus 100 has the same form as a conventional wristwatch as a whole. Then, a microphone unit 102 for inputting animal and human voices, a speaker unit 104 for outputting sound, a monitor unit 106 for displaying and outputting text and images, a key operation unit 108 for inputting various operations, and the arm A wristband unit 110 for wearing on a human body or the like when carrying the hour meter voice analyzer 100, a vibrator 112, a data communication unit 114 for performing wireless communication with an external device, and a wristwatch type voice analyzer 100. A control unit 120 for integrated control and a power supply unit (not shown) are provided.
[0033]
The microphone unit 102 is a sound collection device and is realized by, for example, a microphone. In the figure, a single unit is provided, but there may be a plurality of units, or a configuration in which a cable that is detachably separated and connected to the main body can be attached by a clip or the like.
[0034]
The speaker unit 104 is a sound output device and is realized by, for example, a speaker. In this embodiment, since there is a case where high-frequency sound outside the audible range for humans is output, the speaker unit 104 is configured to output sound in the high-frequency range.
[0035]
The monitor unit 106 is a display output unit realized by a display element such as an LCD (Liquid Crystal Display) and an ELD (Electronic Luminescent Display), a backlight, a driving circuit, and the like. The monitor unit 106 can display characters (text), figures, images, and the like under the control of the control unit 120. In the figure, the monitor unit 106 is singular, but may be configured to include a plurality.
[0036]
The key operation unit 108 is input means realized by, for example, a button switch, a lever, a dial, or the like. In the present embodiment, as shown in FIG. 1B, an up direction key 108a, a down direction key 108c, a selection key 108b, and a cancel key 108d are provided. For example, a selection operation from a plurality of menus, a determination and cancellation operation, a calling operation of a predetermined function, and the like can be input depending on a combination of a key pressing time and a pressing order. The number of key operation units 108 is not limited to the above, and may be set as appropriate.
[0037]
The wristband unit 110 is a means for the user to attach and equip to the body and belongings when carrying it. For example, in addition to the wristband of a wristwatch, a clip, string, chain, magic tape ( Registered trademark), a magnet, or the like.
[0038]
The vibrator 112 is a small vibration device. In the present embodiment, under the control of the control unit 120, it vibrates in a pattern corresponding to the emotion contained in the voice of the dog 2. By feeling these vibration patterns, the user 4 can recognize the emotion and intention of the dog 2 without looking at the monitor unit 106, and the apparatus can also be used by visually impaired and hearing impaired persons.
[0039]
The data communication unit 114 transmits and receives data by wireless communication with an external device such as a personal computer. It is realized by, for example, a communication module conforming to a standard such as Bluetooth (registered trademark) or IrDA, a jack terminal, or the like.
[0040]
The control unit 120 includes a CPU (Central Processing Unit), various IC memories, a crystal oscillator, and the like. The CPU reads and executes programs stored in the IC memory to control the wristwatch type speech analysis apparatus 100 in an integrated manner. Using the crystal oscillator or the like, the wristwatch type speech analysis apparatus 100 can also function as a wristwatch.
[0041]
FIG. 2 is a conceptual diagram showing an example of how the apparatus is used in the present embodiment. As shown in the figure, the user 4 carries the wristwatch type voice analysis device 100 by wearing it, for example, on the wrist with the wristband unit 110. Carrying the apparatus like a wristwatch removes the inconvenience of carrying a separate device and of taking it out of a bag each time it is used.
[0042]
Worn by the user 4, the wristwatch type voice analysis apparatus 100 captures (detects) the voices of both the user 4 and the pet dog 2 and supports bidirectional communication between them. When the microphone unit 102 captures the voice of the dog 2, the voice is analyzed to discriminate the emotion of the dog 2, and the monitor unit 106 displays text or a figure (human language) that the user 4 can understand. Conversely, when the voice of the user 4 is captured, it is analyzed to determine the emotion of the user 4, and the speaker unit 104 outputs a sound (animal language) that the dog 2 can understand.
[0043]
[Description of functional block]
FIG. 3 is a functional block diagram illustrating an example of a functional configuration according to the present embodiment.
As shown in the figure, the wristwatch type voice analysis apparatus 100 includes a voice input unit 10, a voice analysis unit 12, a key input unit 14, a voice analysis ROM (Read Only Memory) 16, a CPU 20, a RAM (Random Access Memory) 30, a ROM 40, a sound output unit 50, a display unit 52, a vibration unit 54, a communication unit 60, and a system bus 90.
[0044]
The voice input unit 10 inputs the voice of the dog 2 or the user 4 and outputs a voice signal to the voice analysis unit 12. In FIG. 1, the microphone unit 102 corresponds to this.
[0045]
The voice analysis unit 12 performs voice analysis on the voice signal input from the voice input unit 10. More specifically, it executes, for example, removal of noise components contained in the voice signal, A/D conversion of the voice signal into voice data of a predetermined format, patterning to extract features of the voice data, and comparison with reference voice patterns registered in advance. These processes can be realized by, for example, an A/D converter, a filter circuit, or an arithmetic integrated circuit such as a DSP (Digital Signal Processor). Alternatively, some or all of these functions may be implemented in software, by reading programs and data stored in the voice analysis ROM 16 and performing arithmetic processing. In FIG. 1, the voice analysis unit 12 is mounted on the control unit 120.
[0046]
The voice analysis ROM 16 stores programs and data used for various processes of the voice analysis unit 12 and is referred to by the voice analysis unit 12. In FIG. 1, the voice analysis ROM 16 is mounted on the control unit 120.
[0047]
FIG. 4 is a diagram showing an example of the contents stored in the voice analysis ROM 16 in the present embodiment. As shown in the figure, the voice analysis ROM 16 stores, for example, a voice analysis program 162 that realizes the various processes of the voice analysis unit 12 by arithmetic processing, and an animal reference voice pattern 164 and a human reference voice pattern 166, which serve as references for comparison with the voice input from the voice input unit 10.
[0048]
FIG. 5 is a diagram showing an example of the data configuration of the animal reference voice pattern 164 in the present embodiment. As shown in the figure, the animal reference voice pattern 164 is prepared in advance for each type of animal (animal attribute code). It stores, in association with one another, an animal attribute code 164a indicating the animal species to which it applies, an emotion identification code 164b that classifies animal emotions, and a reference voice pattern 164c of the voice (cry) corresponding to the animal language that conveys that emotion. The reference voice pattern 164c is, for example, sonograph data. Animal language means a sound pattern that enables communication within the same genus or group.
[0049]
The animal reference voice pattern 164 is information obtained by statistical methods and analyzed from the standpoint of animal behavior. The emotion of the animal contained in an input voice can be discriminated by searching for the animal reference voice pattern 164 that fits the target animal on the basis of the animal attribute code 164a, patterning the voice data of the voice input from the voice input unit 10, and determining whether it matches the reference voice pattern 164c.
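The lookup-and-match step can be sketched as follows. This is a hypothetical illustration rather than the method of this specification: the sonograph is replaced by a short feature vector, "approximation of the pattern feature" is modeled as cosine similarity at or above an assumed threshold of 0.9, and the class and function names are invented for the example.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class AnimalReferencePattern:
    animal_attribute_code: str      # 164a, e.g. "DOG" (hypothetical value)
    emotion_code: int               # 164b
    features: Tuple[float, ...]     # stand-in for the sonograph data 164c


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def match_animal(features: Sequence[float], table, attribute_code: str,
                 threshold: float = 0.9) -> Optional[int]:
    """Return the emotion code of the closest pattern for this species,
    or None if no pattern is similar enough (the threshold is an assumption)."""
    candidates = [p for p in table if p.animal_attribute_code == attribute_code]
    best = max(candidates,
               key=lambda p: cosine_similarity(features, p.features),
               default=None)
    if best is None or cosine_similarity(features, best.features) < threshold:
        return None
    return best.emotion_code
```

Filtering by species first mirrors the search on the animal attribute code 164a; the closest remaining pattern is accepted only if it clears the threshold, so an unfamiliar sound yields no match.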
[0050]
The human reference voice pattern 166 is information serving as a reference for discriminating emotions included in the voice of the user 4, and is prepared in advance according to the applied human attribute. The human attribute mentioned here is a classification using, for example, language classification, gender, age, and the like as parameters.
[0051]
For example, as shown in FIG. 6, the human reference voice pattern 166 includes a human attribute code 166a indicating the human attribute to which it applies, an emotion identification code 166b that classifies human emotions, and a corresponding human reference voice pattern 166c.
[0052]
The reference voice pattern 166c is either a characteristic voice pattern obtained and analyzed statistically or the voice pattern produced when a word expressing an emotion is pronounced, and is, for example, sonograph data. Therefore, the emotion of the user 4 contained in an input voice can be determined by searching for the human reference voice pattern 166 whose human attribute code 166a fits the user 4, patterning the voice data of the voice input from the voice input unit 10, and determining whether it matches the reference voice pattern 166c. The data included in the human reference voice pattern 166 is not limited to the above; it may also include, as appropriate, other data needed for discrimination, such as judgment values for the speed of pronunciation of words or the strength of speech, and these may be used in determining a match.
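As a hypothetical sketch of how optional judgment values might narrow a match, the code below assumes that pattern comparison against 166c has already produced a recognized keyword (a simplification) and treats speaking rate and loudness as minimum thresholds. The names, attribute codes, and threshold values are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class HumanReferencePattern:
    human_attribute_code: str             # 166a, e.g. "en-F" (hypothetical)
    emotion_code: int                     # 166b
    keyword: str                          # stand-in for reference pattern 166c
    min_rate: Optional[float] = None      # optional judgment value, syllables/s
    min_level_db: Optional[float] = None  # optional judgment value, loudness


def match_human(keyword: str, rate: float, level_db: float,
                table: Iterable[HumanReferencePattern],
                attribute_code: str) -> Optional[int]:
    """Return the first emotion code whose pattern and judgment values all fit."""
    for p in table:
        if p.human_attribute_code != attribute_code or p.keyword != keyword:
            continue
        if p.min_rate is not None and rate < p.min_rate:
            continue  # spoken too slowly for this pattern
        if p.min_level_db is not None and level_db < p.min_level_db:
            continue  # spoken too softly for this pattern
        return p.emotion_code
    return None
```

A loudness threshold lets the same word map to an emotion only when it is said forcefully, which is one plausible use of the "strength of speech" judgment value.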
[0053]
The key input unit 14 is realized by, for example, a button switch, a lever, a dial, a touch panel, a track pad, and the like, and inputs an operation and outputs an operation signal to the CPU 20. In FIG. 1, the key operation unit 108 corresponds to this.
[0054]
The CPU 20 is mounted on the control unit 120 in FIG. 1 and executes various processes by controlling each block in an integrated manner through arithmetic processing.
[0055]
The RAM 30 is an IC memory in which the CPU 20 and the voice analysis unit 12 temporarily store programs and data, and is mounted on the control unit 120 in FIG. 1.
[0056]
FIG. 7 is a diagram showing an example of contents stored in the RAM 30 in the present embodiment. As shown in the figure, the RAM 30 stores, for example, an animal name 302 holding the name of the dog 2, an animal attribute code 304, a human attribute code 306, time data 308, voice data 310, voice input time data 312, a voice identification flag 314, an emotion identification code 316, a high frequency mode flag 318, a sensation mode flag 320, and history data 322.
[0057]
The animal name 302 is information indicating the name of the dog 2, and the animal attribute code 304 is information indicating the type of the dog 2. Both are registered by the user 4 before use. The animal name 302 is displayed on the monitor unit 106 in a human language output process or the like, which will be described later, and functions to increase the intimacy between the dog 2 and the user 4.
[0058]
The human attribute code 306 is information indicating an attribute (for example, language type, sex, age, etc.) of the user 4 and is registered by the user 4 before use.
[0059]
The time data 308 is information indicating the current date and time. By referring to the time data 308, the wristwatch type speech analysis apparatus 100 can also function as a clock or a timer.
[0060]
The voice data 310 is digital data obtained by converting the voice input from the voice input unit 10 by the voice analysis unit 12. In the present embodiment, the waveform data is stored, but other data formats such as a sonograph may be used. The time when the voice that is the source of the voice data 310 is input is stored in the voice input time data 312.
[0061]
The voice identification flag 314 and the emotion identification code 316 store the result of voice analysis of the voice data 310 by the voice analysis unit 12. The voice identification flag 314 is information indicating whether the voice data is animal voice or human voice. The emotion identification code 316 stores the emotion identification code 164b or 166b determined by matching with the reference voice pattern 164c or 166c.
[0062]
The high-frequency mode flag 318 is information that sets whether, in the animal language output process described later (in which the emotion of the user 4 is determined and a sound in animal language is output from the speaker unit 104), a high-frequency sound that the dog 2 can hear but humans cannot is also output. For example, when the animal is a dog, the high-frequency sound corresponds to a sound in the range produced by a so-called "dog whistle".
[0063]
The bodily sensation mode flag 320 is information that sets whether, in the human language output process described later (in which the emotion contained in the voice of the dog 2 is discriminated and text or a figure that the user 4 can understand is displayed on the monitor unit 106), the vibrator 112 is also made to vibrate.
[0064]
The history data 322 is a history of voice inputs and outputs, in which a voice input time 322a, a voice identification flag 322b, and an emotion identification code 322c are stored in association with one another. By referring to the history data 322, it is therefore possible to know when, and by whom (the dog 2 or the user 4), which emotions were expressed.
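A minimal sketch of part of the RAM 30 work area from FIG. 7, assuming Python dataclasses as the representation. The class and method names are hypothetical, and only the fields needed to show how the history data 322 accumulates (voice input time, voice identification flag, emotion identification code) are modeled.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

ANIMAL, HUMAN = 1, 0  # values of the voice identification flag (314 / 322b)


@dataclass
class HistoryEntry:
    voice_input_time: datetime  # 322a
    voice_flag: int             # 322b: 1 = dog 2, 0 = user 4
    emotion_code: int           # 322c


@dataclass
class WorkArea:
    """Hypothetical subset of the RAM 30 contents shown in FIG. 7."""
    animal_name: str                                            # 302
    high_frequency_mode: bool = False                           # 318
    sensation_mode: bool = False                                # 320
    history: List[HistoryEntry] = field(default_factory=list)   # 322

    def record(self, when: datetime, voice_flag: int, emotion_code: int) -> None:
        """Append one exchange to the history data."""
        self.history.append(HistoryEntry(when, voice_flag, emotion_code))

    def exchanges_between(self, start: datetime, end: datetime) -> List[HistoryEntry]:
        """Return entries whose input time falls in [start, end)."""
        return [h for h in self.history if start <= h.voice_input_time < end]
```

Storing the flag and code rather than the display text keeps each entry small; the text can be looked up from the conversion tables whenever the history is shown.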
[0065]
The ROM 40 stores programs and data for causing the CPU 20 to realize various functions through arithmetic processing.
[0066]
FIG. 8 is a diagram showing an example of contents stored in the ROM 40 in the present embodiment. As shown in the figure, the ROM 40 stores the following programs: a system program 400; a human language output program 402 for executing the human language output process, which outputs text or figures (human language) that the user 4 (a human) can understand based on the result of analyzing the voice of the dog 2 (an animal); an animal language output program 404 for executing the animal language output process, which outputs a sound that the dog 2 can understand based on the result of analyzing the voice of the user 4; a mode switching program 406 for executing processing related to switching the various modes; and a history output program 408 for executing the history display process based on the history data 322.
[0067]
As data, the ROM 40 stores voiceprint data 410 used to identify the user 4, clock display data 412 for displaying a clock on the monitor unit 106, screen frame data 414 holding the information needed for the various screen displays, an animal language human language conversion TBL (table) 416, a human language animal language conversion TBL (table) 418, and a vibration pattern TBL (table) 420.
[0068]
The voiceprint data 410 is the voiceprint of a person the dog 2 is familiar with in daily life, for example the owner, and is collected and stored in advance by, for example, the manufacturer of the wristwatch type voice analyzer. The voiceprint data 410 need not be stored in the ROM 40; it may instead be registered in the RAM 30 by the user 4.
[0069]
The animal language human language conversion TBL 416 stores emotions of the dog 2 in association with human language, and the human language animal language conversion TBL 418 stores emotions of the user 4 in association with animal language. Both tables correspond to dictionary data.
[0070]
FIG. 9 is a diagram illustrating an example of a data configuration of the animal language human language conversion TBL 416 according to the present embodiment. As shown in the figure, the animal language human language conversion TBL 416 stores, in association with one another, an emotion identification code 416a that the voice analysis unit 12 has determined by analyzing the voice of the dog 2, text data 416b that humans can understand, and image data 416c for displaying an image of the animal. The image data 416c may be still image information or moving image information for displaying an animation or the like.
[0071]
FIG. 10 is a diagram illustrating an example of a data configuration of the human-animal language conversion TBL 418 according to the present embodiment. As shown in the figure, the human-animal language conversion TBL 418 stores, in association with one another, an emotion identification code 418a that the voice analysis unit 12 has determined by analyzing the voice of the user 4, corresponding human-readable text data 418b, image data 418c for displaying an image of a human, synthesized sound data 418d obtained by artificially synthesizing the voice of an animal (in this case, a dog), high-frequency sound data 418e outside the human audible range, and registered voice data 418f, which is the voice of the user 4 registered in advance. The image data 418c may be still image information or moving image information for displaying an animation or the like.
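The two conversion tables behave like dictionaries keyed by emotion identification code. The sketch below assumes that representation; the emotion codes, texts, and file names are hypothetical placeholders, not values from the specification.

```python
from typing import NamedTuple, Optional, Tuple


class HumanToAnimalEntry(NamedTuple):
    text: str        # 418b: text shown to the user
    image: str       # 418c: image of a human
    synth: str       # 418d: synthesized dog voice
    high_freq: str   # 418e: sound above the human audible range
    registered: str  # 418f: pre-registered human voice


# TBL 416: emotion identification code 416a -> (text 416b, image 416c).
# Codes and file names are hypothetical.
ANIMAL_TO_HUMAN = {
    1: ("I'm hungry!", "dog_hungry.png"),
    2: ("Let's play!", "dog_play.png"),
}

# TBL 418: emotion identification code 418a -> entry holding 418b to 418f.
HUMAN_TO_ANIMAL = {
    10: HumanToAnimalEntry("Good dog!", "owner_smile.png",
                           "bark_praise.wav", "hf_praise.wav",
                           "owner_good_dog.wav"),
}


def to_human_language(code: int) -> Optional[Tuple[str, str]]:
    """Look up the text and image for a dog's emotion code (None if unknown)."""
    return ANIMAL_TO_HUMAN.get(code)


def to_animal_language(code: int) -> Optional[HumanToAnimalEntry]:
    """Look up the sounds and display data for a user's emotion code."""
    return HUMAN_TO_ANIMAL.get(code)
```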
[0072]
For example, as shown in FIG. 11, the vibration pattern TBL 420 stores an emotion identification code 420a and a vibration pattern 420b in association with each other. By referring to the vibration pattern TBL 420, the vibrator 112 can be made to vibrate in the vibration pattern 420b corresponding to the emotion identification code 420a.
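The specification does not define how a vibration pattern 420b is encoded. As an assumption for illustration, the sketch below stores each pattern as a list of (on, off) durations in milliseconds and expands it into the intervals during which the vibrator is driven. The codes and durations are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of TBL 420: emotion code 420a -> pattern 420b,
# where each pattern is a list of (on_ms, off_ms) pairs.
VIBRATION_PATTERNS: Dict[int, List[Tuple[int, int]]] = {
    1: [(200, 100), (200, 100)],  # e.g. two short pulses
    2: [(600, 200)],              # e.g. one long pulse
}


def vibration_schedule(emotion_code: int) -> List[Tuple[int, int]]:
    """Return (start_ms, stop_ms) intervals during which the vibrator is on.
    An unknown emotion code produces no vibration."""
    t = 0
    intervals = []
    for on_ms, off_ms in VIBRATION_PATTERNS.get(emotion_code, []):
        intervals.append((t, t + on_ms))
        t += on_ms + off_ms
    return intervals
```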
[0073]
The sound output unit 50 is realized by, for example, a speaker and outputs sound. The speaker unit 104 in FIG. 1 corresponds to this.
[0074]
The display unit 52 is realized by, for example, a display element such as an LCD, ELD, or PDP, and displays and outputs an image. The monitor unit 106 in FIG. 1 corresponds to this.
[0075]
The vibration unit 54 is realized by, for example, a small vibration device and generates vibration. The vibrator 112 in FIG. 1 corresponds to this.
[0076]
The communication unit 60 is a transmission/reception means for performing wireless communication with an external device. It is realized by, for example, a module such as Bluetooth (registered trademark) or IrDA, a jack for a wired communication cable, a control circuit, or the like. The data communication unit 114 in FIG. 1 corresponds to this. Information used by the communication unit 60 during communication, such as a protocol stack, is stored in the ROM 40 (not shown) and read out and used as appropriate.
[0077]
[Description of processing]
Next, the flow of processing in the present embodiment will be described with reference to FIGS.
[0078]
FIG. 12 is a flowchart for explaining the main processing flow in the present embodiment. As shown in the figure, when the voice input unit 10 detects a voice input (step S102), the voice analysis unit 12 performs A/D conversion and filter processing on the voice signal input from the voice input unit 10 and converts it into voice data 310 in a format suitable for voice analysis (step S104).
Next, the time data 308 at that moment is stored as the voice input time data 312 in association with the voice data 310 (step S106), and voice analysis processing of the voice data 310 is executed (step S108).
[0079]
FIG. 13 is a flowchart for explaining the flow of the voice analysis processing in the present embodiment. As shown in the figure, the voice analysis unit 12 first reads the stored voice data 310 (step S202) and matches it against the animal reference voice pattern 164 (step S204). That is, sonograph data obtained by patterning the voice data 310 is compared with the reference voice pattern 164c, and if their pattern features approximate each other, a match is determined.
[0080]
If a match is found in the animal reference voice pattern 164 (step S206; YES), the voice analysis unit 12 stores "1", indicating an animal voice, in the voice identification flag 314, stores the emotion identification code 164b corresponding to the matched reference voice pattern 164c in the emotion identification code 316 of the RAM 30 (step S208), ends the voice analysis process, and returns to the flow of FIG. 12.
[0081]
If no match is found in the animal reference voice pattern 164 (step S206; NO), matching against the human reference voice pattern 166 is executed (step S210).
[0082]
If a match is found in the human reference voice pattern 166 (step S212; YES), the voice analysis unit 12 stores "0", indicating a human voice, in the voice identification flag 314, stores the emotion identification code 166b corresponding to the matched reference voice pattern 166c in the emotion identification code 316 of the RAM 30 (step S214), ends the voice analysis process, and returns to the flow of FIG. 12.
[0083]
If no match is found in the human reference voice pattern 166 either (step S212; NO), the voice analysis unit 12 stores "0" in the voice identification flag 314 and "0" in the emotion identification code 316 of the RAM 30 (step S216), ends the voice analysis process, and returns to the flow of FIG. 12.
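The order of steps S202 to S216 (animal patterns first, then human patterns, then a "no match" result) can be sketched as a small function. The matchers are passed in as callables so the sketch does not depend on any particular pattern-comparison method; it assumes real emotion codes are nonzero, since "0" is used here to mean that nothing matched.

```python
from typing import Callable, Optional, Tuple

ANIMAL, HUMAN = 1, 0  # voice identification flag 314
NO_EMOTION = 0        # emotion identification code 316 when nothing matched

Matcher = Callable[[object], Optional[int]]


def analyze_voice(voice_data: object, match_animal: Matcher,
                  match_human: Matcher) -> Tuple[int, int]:
    """Return (voice identification flag, emotion identification code)."""
    code = match_animal(voice_data)   # S204: compare with animal patterns
    if code is not None:              # S206: YES
        return ANIMAL, code           # S208
    code = match_human(voice_data)    # S210: compare with human patterns
    if code is not None:              # S212: YES
        return HUMAN, code            # S214
    return HUMAN, NO_EMOTION          # S216: flag "0" and code "0"
```

Note that a human voice and an unrecognized sound share flag "0"; they are told apart only by the emotion code.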
[0084]
When the voice analysis process ends and the flow returns to FIG. 12, the CPU 20 refers to the voice identification flag 314 and the emotion identification code 316.
When the voice identification flag 314 is "1", that is, when the voice of the dog 2 (an animal) has been input (step S110; YES), the human language output process is executed (step S112). When the voice identification flag 314 is "0" and the emotion identification code 316 is not "0", that is, when the voice of the user 4 (a human) has been input (step S114; YES), the animal language output process is executed (step S116). When the voice identification flag 314 is "0" and the emotion identification code 316 is also "0", that is, when the voice could be discriminated as neither an animal voice nor a human voice (step S114; NO), the flow proceeds to neither the human language output process nor the animal language output process.
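The branch that follows voice analysis can be sketched as below. The returned strings are hypothetical labels for the two output processes; the logic simply restates the three cases above in terms of the flag and the code.

```python
from typing import Optional


def route(voice_flag: int, emotion_code: int) -> Optional[str]:
    """Decide which output process follows voice analysis (steps S110 to S116)."""
    if voice_flag == 1:
        return "human_language_output"   # dog's voice -> text/image for the user
    if voice_flag == 0 and emotion_code != 0:
        return "animal_language_output"  # user's voice -> sound for the dog
    return None                          # neither voice was discriminated
```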
[0085]
FIG. 14 is a flowchart for explaining the flow of the human language output process in the present embodiment. As shown in the figure, the CPU 20 first refers to the screen frame data 414 and causes the display unit 52 to display a frame for human language output (step S302).
[0086]
Next, the emotion identification code 316 in the RAM 30 is referred to (step S304), and the text data 416b and image data 416c corresponding to it are read from the animal language human language conversion TBL 416 and displayed at predetermined positions on the human language output screen (step S306).
[0087]
Next, the voice data 310 is read out and displayed at a predetermined position on the human language output screen (step S308), and the voice input time data 312 is read out and the date and time when the voice was input are displayed (step S310).
[0088]
Next, the CPU 20 refers to the sensation mode flag 320. When the flag is "1", that is, when the sensation mode is set to "ON" (step S312; YES), the CPU 20 reads from the vibration pattern TBL 420 the vibration pattern 420b corresponding to the emotion identification code 316 read earlier, and controls the vibration unit 54 according to that pattern to generate vibration (step S314). The human language output process then ends and the flow returns to FIG. 12, where the CPU 20 updates the history data 322 (step S117).
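Steps S302 to S314 can be summarized as building the screen contents and, only in sensation mode, choosing a vibration pattern. In this hypothetical sketch the screen is a plain dictionary, the tables are passed in as mappings, and the title wording follows the example given for FIG. 18; none of these names come from the specification.

```python
from datetime import datetime
from typing import Dict, List, Optional, Sequence, Tuple


def human_language_output(emotion_code: int, voice_samples: Sequence[int],
                          input_time: datetime,
                          table_416: Dict[int, Tuple[str, str]],
                          vibration_tbl_420: Dict[int, List[Tuple[int, int]]],
                          sensation_mode: bool, animal_name: str):
    """Return (screen contents, vibration pattern or None)."""
    text, image = table_416[emotion_code]                   # S304-S306
    screen = {
        "title": f"A message from {animal_name} has arrived",
        "text": text,
        "image": image,
        "waveform": list(voice_samples),                    # S308
        "datetime": input_time.strftime("%Y-%m-%d %H:%M"),  # S310
    }
    vibration: Optional[List[Tuple[int, int]]] = None
    if sensation_mode:                                       # S312: YES
        vibration = vibration_tbl_420.get(emotion_code)     # S314
    return screen, vibration
```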
[0089]
FIG. 18 is a diagram illustrating an example of a screen in the human language output process according to the present embodiment. In the human language output screen 5, the title display 5a indicates a message from the dog 2 to the user 4. At this time, for example, the user 4 is given a sense of intimacy by including the animal name 302 (the name of the pet) in the display like “a message from (Taro) has arrived”.
[0090]
The text data 416b and image data 416c read from the animal language human language conversion TBL 416 for the emotion identification code 316 are displayed in the text display unit 5b and the image display unit 5c, respectively. It is preferable to display the text display unit 5b as, for example, a speech balloon coming from the image display unit 5c.
[0091]
The voice data 310 is displayed as a graph in the voice data display unit 5d, either as waveform data or in another format such as a sonograph. Displaying the voice data 310 here lets the user 4 gradually develop a sense for reading the characteristics of the display (the shape of the graph and so on), until the user can understand the emotions and intentions of the dog 2 just by looking at the graph, without reading the text in the text display unit 5b. Because the shape of the graph carries finer shades of emotion and intention, a user 4 who has learned to read it can understand the dog 2 in more detail than the classification by emotion identification code allows.
The date and time display unit 5e displays the time when the voice is input, for example, at the bottom of the screen.
[0092]
FIG. 15 is a flowchart for explaining the flow of the animal language output process according to the present embodiment. As shown in the figure, the CPU 20 first refers to the screen frame data 414 and displays a frame for animal language output on the display unit 52 (step S402).
[0093]
Next, the emotion identification code 316 in the RAM 30 is referred to (step S404), and the text data 418b and image data 418c corresponding to it are read from the human-animal language conversion TBL 418 and displayed at predetermined positions on the animal language output screen (step S406).
[0094]
Next, the voice data 310 is read, the voice data is displayed in a graph at a predetermined position in the animal language output screen (step S408), and the voice input time data 312 is read to display the date and time when the voice was input ( Step S410).
[0095]
Next, the CPU 20 refers to the high frequency mode flag 318. When the flag is "1", that is, when the high frequency mode is set to "ON" (step S412; YES), the CPU 20 reads from the human-animal language conversion TBL 418 the high frequency sound data 418e corresponding to the emotion identification code 316 referred to earlier, and outputs it from the sound output unit 50 (step S414).
[0096]
Next, the voice data 310 is checked against the voiceprint data 410 (step S416), and it is determined whether or not they match (step S418).
[0097]
When the voice data 310 is determined to match the voiceprint data 410, that is, to be the voice of the person whose voiceprint is registered (step S418; YES), the synthesized sound data 418d corresponding to the emotion identification code 316 is read from the human-animal language conversion TBL 418 (step S422) and output from the sound output unit 50 (step S424).
[0098]
When the voice data 310 is determined not to match the voiceprint data 410 (step S418; NO), the registered voice data 418f corresponding to the emotion identification code 316 is read from the human-animal language conversion TBL 418 (step S420) and output from the sound output unit 50 (step S424). Outputting the registered voice data 418f means that, when the user 4 is someone other than the owner, the dog 2 hears the voice of a person it is familiar with, which relieves its tension and wariness, so that communication proceeds more smoothly even with a user the dog does not know well.
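The sound selection in steps S412 to S424 reduces to two independent decisions: whether to add the high-frequency sound, and which voice to play depending on the voiceprint check. The sketch assumes the TBL 418 entry is a dictionary with hypothetical keys, and that the voiceprint comparison itself has already been done and is passed in as a boolean.

```python
from typing import Dict, List


def select_animal_language_sounds(entry: Dict[str, str],
                                  high_frequency_mode: bool,
                                  voiceprint_matches: bool) -> List[str]:
    """Return the sounds to play, in order (steps S412 to S424).

    entry: keys "synth" (418d), "high_freq" (418e), "registered" (418f).
    voiceprint_matches: result of checking voice data 310 against voiceprint data 410.
    """
    sounds = []
    if high_frequency_mode:                # S412: YES
        sounds.append(entry["high_freq"])  # S414
    if voiceprint_matches:                 # S418: YES, registered person is speaking
        sounds.append(entry["synth"])      # S422
    else:                                  # S418: NO, someone else is speaking
        sounds.append(entry["registered"]) # S420: play the familiar voice
    return sounds
```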
[0099]
When the synthesized sound data 418d or the registered voice data 418f has been output from the sound output unit 50, the animal language output process ends and the flow returns to FIG. 12, where the CPU 20 updates the history data 322 (step S117).
[0100]
FIG. 19 is a diagram illustrating an example of a screen in the animal language output process according to the present embodiment. On the animal language output screen 6, the title display 6a indicates a message to the dog 2. At this time, for example, the user 4 is given a sense of intimacy by including the animal name 302 (the name of the pet) in the display like “I will input a message to (Taro)”.
[0101]
The text data 418b and image data 418c read from the human-animal language conversion TBL 418 for the emotion identification code 316 are displayed in the text display unit 6b and the image display unit 6c, respectively. As shown in FIG. 19, it is preferable to display the text display unit 6b as, for example, a speech balloon coming from the image display unit 6c.
[0102]
The voice data 310 is displayed as a graph on the voice data display unit 6d, and the time when the voice is input is displayed on the date / time display unit 6e at the bottom of the screen.
[0103]
In the flow of FIG. 12, for example, when the up key 108a or the down key 108c is pressed for a relatively long time (step S118; YES), the CPU 20 executes a key input process (step S120).
[0104]
FIG. 16 is a flowchart for explaining the flow of key input processing in the present embodiment. As shown in the figure, the CPU 20 first refers to the screen frame data 414 and displays a key input frame on the display unit 52 (step S502). On the key input screen, for example, the contents of the text data 418b in the human-animal language conversion TBL 418 are displayed as selectable buttons (step S116).
[0105]
The user 4 moves to the button with the desired content using the up direction key 108a or the down direction key 108c and presses the selection key 108b to confirm the selection (step S504).
[0106]
When a selection is confirmed, the CPU 20 selects the emotion identification code 418a corresponding to the selected content from the human-animal language conversion TBL 418 and stores it in the RAM 30 (step S506). The key input process then ends and the flow returns to FIG. 12, where the CPU 20 next executes the animal language output process.
[0107]
FIG. 20 is a diagram illustrating an example of a screen in the key input process according to the present embodiment. In the screen 7 for key input, the title display 7a indicates that the message is for the dog 2.
[0108]
A selection button 7b containing the text data 418b read from the human-animal language conversion TBL 418 is displayed. When all the selection buttons 7b cannot be displayed at a time, they are displayed in a scrollable manner. In addition, the selection button 7b that is currently selected is displayed in reverse video, for example.
[0109]
A selection button 7c and a cancel button 7d are also displayed on the screen 7. When the selection key 108b or the cancel key 108d is pressed, the corresponding button is highlighted to notify the user 4 visually that the key has been input.
[0110]
In the flow of FIG. 12, when, for example, the selection key 108b is held down for a relatively long predetermined time (step S122; YES), the CPU 20 executes a mode switching process (step S124).
[0111]
FIG. 17 is a flowchart for explaining the flow of the mode switching process in the present embodiment. As shown in the figure, the CPU 20 first refers to the screen frame data 414 to display a frame for mode switching on the display unit 52 (step S602).
[0112]
Next, when an operation to switch the high frequency mode is input (step S604; YES), the CPU 20 toggles the high frequency mode flag 318 (step S606). When an operation to switch the sensation mode is input (step S608; YES), the CPU 20 toggles the sensation mode flag 320 (step S610). When a predetermined end operation is input (step S612; YES), the mode switching process ends and the flow returns to FIG. 12.
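The mode switching loop can be sketched as applying a sequence of operations to the two flags until the end operation arrives. The operation names are hypothetical labels for the key inputs described for FIG. 21; the function works on a copy so the caller's flags are replaced only by the returned result.

```python
from typing import Dict, Iterable


def mode_switching(events: Iterable[str], flags: Dict[str, bool]) -> Dict[str, bool]:
    """Apply operations to the mode flags (steps S604 to S612).

    events: "toggle_high_freq", "toggle_sensation", or "end" (hypothetical names).
    flags: {"high_frequency": bool (318), "sensation": bool (320)}.
    """
    flags = dict(flags)  # work on a copy
    for ev in events:
        if ev == "toggle_high_freq":    # S604: YES -> S606
            flags["high_frequency"] = not flags["high_frequency"]
        elif ev == "toggle_sensation":  # S608: YES -> S610
            flags["sensation"] = not flags["sensation"]
        elif ev == "end":               # S612: YES
            break
    return flags
```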
[0113]
FIG. 21 is a diagram showing an example of a screen in the mode switching process in the present embodiment. In the mode switching screen 8, the title display 8a indicates that the mode switching process is being executed. The screen shows a high frequency mode ON/OFF display 8b and a bodily sensation mode ON/OFF display 8c, which are selected in turn with the up key 108a or the down key 108c. Pressing the selection key 108b while a display is selected inputs the corresponding mode switching operation, and the CPU 20 toggles that mode between ON and OFF. Pressing the cancel key 108d inputs the end operation of the mode switching process.
[0114]
In the flow of FIG. 12, when, for example, the cancel key 108d is held down for a relatively long predetermined time (step S126; YES), the CPU 20 executes a history display process (step S128).
[0115]
FIG. 22 is a diagram illustrating an example of a screen in the history display process according to the present embodiment. As shown in the figure, the history display process displays a history display unit 9a with reference to the history data 322, showing, for example, a time 9b, an icon 9c indicating whether the voice was that of the dog 2 or the user 4, and a content 9d. The icon 9c is displayed based on the voice identification flag 322b. For the content 9d, the text data 416b or 418b is read from the animal language human language conversion TBL 416 or the human language animal language conversion TBL 418, based on the voice identification flag 322b and the emotion identification code 322c, and displayed.
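Building the rows of the history display unit 9a amounts to choosing a table by the voice identification flag and looking up the text by emotion code. In this sketch the history is a list of (time, flag, code) tuples, the tables map codes to tuples whose first element is the text, and the bracketed icon strings are hypothetical stand-ins for the icon 9c.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def history_lines(history: Iterable[Tuple[datetime, int, int]],
                  table_416: Dict[int, tuple],
                  table_418: Dict[int, tuple]) -> List[str]:
    """Format history entries as display rows: time 9b, icon 9c, content 9d."""
    rows = []
    for when, voice_flag, code in history:
        if voice_flag == 1:  # dog 2: text from TBL 416
            icon, text = "[dog]", table_416[code][0]
        else:                # user 4: text from TBL 418
            icon, text = "[user]", table_418[code][0]
        rows.append(f"{when:%H:%M} {icon} {text}")
    return rows
```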
[0116]
If the whole history display unit 9a cannot be shown on the screen at once, it is made scrollable with the up direction key 108a or the down direction key 108c. In that case, a bar 9e preferably indicates which part of the day (24 hours) the currently displayed history corresponds to.
[0117]
By viewing this history display, the user 4 can learn, for example, the personality and habits of the dog 2 and notice changes in its physical condition.
[0118]
In FIG. 12, when no voice is input (step S102; NO), or when a voice is input but is discriminated as neither an animal voice nor a human voice (step S114; NO), and no specific key operation is input (step S118; NO → step S112; NO → step S126; NO), the CPU 20 displays the clock screen 3 on the display unit 52, for example as shown in FIG. 23 (step S130).
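The fall-through to the clock screen can be sketched as one pass of a simplified main loop. This is an assumption-laden illustration, not the flow of FIG. 12 itself: the classifier is passed in as a callable, the key events are hypothetical names, and only the history display (step S128) and clock screen (step S130) outcomes are tied to step numbers given in the text.

```python
from typing import Callable, Optional


def main_step(voice: Optional[bytes],
              classify: Callable[[bytes], Optional[str]],
              key_event: Optional[str]) -> str:
    """Return the process to run for one pass of a simplified main loop (hypothetical)."""
    if voice is not None:
        kind = classify(voice)  # 'animal', 'human' or None (neither discriminated)
        if kind == "animal":
            return "human_language_output"   # show text/image to the user
        if kind == "human":
            return "animal_language_output"  # play animal language to the dog
    # No voice, or a voice that was neither animal nor human: check key operations.
    if key_event == "key_input":
        return "key_input_process"
    if key_event == "mode_switch":
        return "mode_switching_process"
    if key_event == "cancel_long_press":
        return "history_display"             # step S128
    return "clock_screen"                    # step S130
```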
[0119]
On the clock screen 3, for example, an analog clock 3a, a date 3b, and a day of the week 3c are displayed. The user 4 can therefore use the wristwatch type voice analysis apparatus 100 both as an ordinary wristwatch and as a tool for communicating with the dog 2.
[0120]
[Second Embodiment]
Next, a second embodiment of the speech analysis apparatus to which the present invention is applied will be described. This embodiment can be realized with basically the same configuration as the first embodiment; the same components are denoted by the same reference numerals, and their description is omitted.
[0121]
FIG. 24 is a diagram showing an example of the appearance of the dog lead type speech analysis apparatus 200 in the present embodiment. As shown in the figure, the dog lead type speech analysis apparatus 200 includes a lead (leash) 202, used when the dog 2 is taken for a walk, which can be pulled out from and wound onto a reel 204. At the tip of the lead 202 are provided a metal fitting 206 for attaching the lead 202 to and detaching it from the collar 207 of the dog 2, and the microphone unit 102. The user 4 uses the apparatus by gripping the main body 208 or by attaching it to a belt or the like with the clip unit 212.
[0122]
The microphone unit 102 is connected to a control unit 120 and a power source built into the main body 208 by a signal line 210 running through the lead 202. Providing the microphone unit 102 at the tip of the lead 202, close to the dog 2, allows sound to be collected more efficiently even under conditions where sound tends to diffuse, such as outdoors.
[0123]
The dog lead type voice analysis apparatus 200 may also be configured to transmit the voice signal collected by the microphone unit 102 through the data communication unit 114 to the wristwatch type voice analysis apparatus 100 worn by the user 4. In this case, the voice analysis unit 12, the voice analysis ROM 16, the display unit 52, and the vibration unit 54 can be omitted from the dog lead type voice analysis apparatus 200, and those of the wristwatch type voice analysis apparatus 100 worn by the user 4 are used instead.
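The division of labor in this variant (the lead-side unit only captures and forwards audio, the wristwatch does the analysis) can be sketched with an in-process queue standing in for the wireless link. Everything here is hypothetical: the class names, the use of `queue.Queue` as the transport, and the `analyze` callable that stands in for the voice analysis unit 12.

```python
import queue
from typing import Callable, List


class LeadUnit:
    """Hypothetical dog lead unit with no local analysis: it only forwards audio."""

    def __init__(self, link: queue.Queue):
        self.link = link  # stands in for the link via the data communication unit 114

    def on_audio(self, frame: bytes) -> None:
        self.link.put(frame)  # forward the collected voice signal unchanged


class WristwatchUnit:
    """Hypothetical wristwatch side that analyzes whatever the lead unit sends."""

    def __init__(self, link: queue.Queue, analyze: Callable[[bytes], object]):
        self.link = link
        self.analyze = analyze  # stands in for the voice analysis unit 12

    def poll(self) -> List[object]:
        results = []
        while True:
            try:
                frame = self.link.get_nowait()
            except queue.Empty:
                return results  # no more pending frames
            results.append(self.analyze(frame))
```

In a real device the queue would be replaced by the radio link; keeping the analysis on the wristwatch is what allows the lead-side components listed above to be omitted.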
[0124]
Although embodiments to which the present invention is applied have been described above, the application of the present invention is not limited to these embodiments; components may be modified, added, or deleted as appropriate without departing from the spirit of the invention.
[0125]
For example, the voice analysis device may be realized as a personal computer, a PDA (Personal Digital Assistant), or a multi-function phone.
[0126]
The voice analysis unit 12 may be implemented by arithmetic processing in the CPU 20, and the voice analysis ROM 16 may be combined with the ROM 40. The key input unit 14 may also be implemented as a touch panel provided on the display surface of the monitor unit 106.
[0127]
In the animal language output process, a flow may also be used in which the synthesized speech data 418d is output whenever speech data of the user 4 is output, regardless of the result of collation with the voiceprint data 410, and the data 418f is additionally output when the speech data does not match the voiceprint data 410.
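This alternative flow can be sketched as follows. The specification does not say how the voiceprint collation is performed; cosine similarity between fixed-length feature vectors and the 0.9 threshold are stand-ins chosen only for illustration, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def select_animal_language_output(features: Sequence[float],
                                  registered: Sequence[float],
                                  speech_418d: str, data_418f: str,
                                  threshold: float = 0.9) -> List[str]:
    """Always output 418d; additionally output 418f when the voiceprint does not match."""
    outputs = [speech_418d]  # output regardless of the collation result
    if cosine_similarity(features, registered) < threshold:
        outputs.append(data_418f)  # voice differs from voiceprint data 410
    return outputs
```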
[0128]
[Effects of the Invention]
  According to the present invention, the following effects can be achieved.
[0130]
  According to the invention of claim 1, the voice of the user is compared with a pre-registered voiceprint, and if they differ, a pre-registered human voice can be output in human language. Therefore, by outputting to the animal a human voice corresponding to the user's emotion, a training effect can be obtained in which the animal hears the human voice and becomes familiar with it. Further, if the registered voiceprint is that of the person the animal is most familiar with (for example, its owner), the animal can be calmed even when the user is someone else.
[0131]
  According to the invention of claim 2, the voice analysis apparatus can be worn on the body and carried. There is therefore no need to take it out of a bag or the like each time it is used, which improves usability.
[0132]
  According to the inventions of claims 3 and 4, the temporal history of the exchanges between the animal and the human is stored, so that this history can be analyzed and used.
[0133]
  According to the invention of claim 5, a predetermined vibration corresponding to the analysis result of the animal's voice is generated so that the user perceives the analysis result through bodily sensation. The user therefore does not need to read the human language text or the like, which improves usability and allows smoother communication. Moreover, even a visually or hearing impaired person can learn the analysis result.
[Brief description of the drawings]
FIG. 1 is a diagram showing an example of the appearance of a wristwatch type speech analysis apparatus according to a first embodiment.
FIG. 2 is a conceptual diagram showing an example of how to use a wristwatch type voice analysis apparatus.
FIG. 3 is a functional block diagram illustrating an example of a functional configuration.
FIG. 4 is a diagram showing an example of contents stored in a voice analysis ROM.
FIG. 5 is a diagram illustrating an example of a data configuration of an animal reference voice pattern.
FIG. 6 is a diagram illustrating an example of a data configuration of a human reference voice pattern.
FIG. 7 is a diagram showing an example of contents stored in a RAM.
FIG. 8 is a diagram showing an example of contents stored in a ROM.
FIG. 9 is a diagram showing an example of the data configuration of the animal language/human language conversion TBL.
FIG. 10 is a diagram showing an example of the data structure of the human-animal language conversion TBL.
FIG. 11 is a diagram illustrating an example of a data configuration of the vibration pattern TBL.
FIG. 12 is a flowchart illustrating the main processing flow.
FIG. 13 is a flowchart illustrating the flow of voice analysis processing.
FIG. 14 is a flowchart illustrating the flow of human language output processing.
FIG. 15 is a flowchart illustrating the flow of animal language output processing.
FIG. 16 is a flowchart illustrating the flow of key input processing.
FIG. 17 is a flowchart illustrating the flow of mode switching processing.
FIG. 18 is a diagram showing an example of a screen in the human language output process.
FIG. 19 is a diagram showing an example of a screen in animal language output processing.
FIG. 20 is a diagram showing an example of a screen in key input processing.
FIG. 21 is a diagram showing an example of a screen in mode switching processing.
FIG. 22 is a diagram showing an example of a screen in history display processing.
FIG. 23 is a diagram showing an example of a clock display screen.
FIG. 24 is a diagram showing an example of the appearance of a dog lead type speech analyzer according to the second embodiment.
[Explanation of symbols]
2 Dog
4 User
10 Voice input part
12 Speech analysis unit
14 Key input section
16 ROM for voice analysis
162 Speech analysis program
164 Animal reference voice pattern
166 Human reference voice pattern
20 CPU
30 RAM
304 Animal attribute code
306 Human attribute code
310 Audio data
312 Voice input time data
314 Voice identification flag
316 Emotion identification code
318 High-frequency mode flag
320 Bodily sensation mode flag
322 History data
40 ROM
402 Animal language output program
404 Human language output program
406 Mode switching program
408 History output processing program
410 Voiceprint data
416 Animal language human language conversion TBL (table)
418 Human-animal language conversion TBL (table)
420 Vibration pattern TBL (table)
50 Sound output section
52 Display
54 Vibration unit
60 Communication unit
100 Wristwatch type speech analyzer
102 Microphone part
104 Speaker section
106 Monitor unit
108 Key operation unit
110 Wristband
112 Vibrator
114 Data communication unit
120 Control unit
200 Dog lead type voice analyzer
202 Lead
204 Reel
206 Metal fitting
208 Main body
210 Signal line
212 Clip part

Claims

1. A speech analysis apparatus provided with a speech analysis apparatus main body, comprising:
first voice input means for inputting the voice of an animal;
first voice analysis means for analyzing the voice input by the first voice input means;
first output means for displaying and outputting, as human language, characters or images corresponding to the analysis result of the first voice analysis means;
second voice input means for inputting, after the human language has been displayed and output by the first output means, the voice of a user responding to the displayed content;
second voice analysis means for analyzing the voice input by the second voice input means;
second output means for outputting, as an animal language understandable by the animal, a voice corresponding to the analysis result of the second voice analysis means;
first registration means for registering a voiceprint of the user;
second registration means for registering speech in human language having a predetermined meaning;
determination means for determining whether the voice input by the second voice input means matches the voiceprint registered by the first registration means; and
fourth output means for outputting, in human language, the speech registered by the second registration means when the determination means determines that they do not match.

2. The speech analysis apparatus according to claim 1, further comprising mounting means for mounting the speech analysis apparatus main body on the user's body.

3. The speech analysis apparatus according to claim 1, further comprising storage means for storing the voice input by the first voice input means in association with the time at which that voice was input by the first voice input means.

4. The speech analysis apparatus according to claim 3, further comprising fifth output means for outputting the voice stored in the storage means in association with the time.

5. The speech analysis apparatus according to any one of claims 1 to 4, further comprising vibration means for generating a predetermined vibration corresponding to the analysis result of the first voice analysis means.