JP2004275220A

JP2004275220A - Utterance meter, method of measuring quantity of utterance, program and recording medium

Info

Publication number: JP2004275220A
Application number: JP2003067160A
Authority: JP
Inventors: Koji Yamamoto; 浩司山本; Eiji Noguchi; 栄治野口
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2003-03-12
Filing date: 2003-03-12
Publication date: 2004-10-07

Abstract

PROBLEM TO BE SOLVED: To provide an utterance meter, and a method of measuring the quantity of utterance, that can measure the content and circumstances of a user's utterances while excluding utterances the user does not intend.

SOLUTION: The utterance meter comprises sound input means 1 for inputting sound; voice recognition means 2 for performing speech recognition on the sound signal input by the sound input means 1; and utterance quantity measuring means 3. The utterance quantity measuring means 3 comprises a classification database 6, in which a plurality of words are classified in advance into a plurality of categories according to a prescribed criterion; classification means 4 for classifying the recognition result from the voice recognition means by referring to the classification database 6; and counter means 5 for counting the classification results from the classification means 4 by category as the quantity of utterance.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

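[0001]
[Technical Field of the Invention]
The present invention relates to an utterance meter and a method of measuring the quantity of utterance.
[0002]
[Prior Art]
In recent years, as the social environment has shifted toward what is called a "stress society", interest in mental health has grown. A person's level of linguistic activity is used as a measure for assessing mental health and conditions such as dementia and autism, and serves as a reference in psychotherapy.
[0003]
A conventional device for evaluating linguistic activity converts a subject's vocal sounds into an electrical signal with a microphone and integrates, over the measurement period, the time during which the signal is present, the signal magnitude, or both. This yields objective data on how much the subject spoke during the measurement period (see, for example, Patent Document 1).
[0004]
[Patent Document 1]
Japanese Unexamined Patent Application Publication No. H5-56951
[0005]
[Problems to Be Solved by the Invention]
However, because the prior art counts the user's amount of speech simply from the intensity level of the audio signal, sounds the user does not intend as speech, such as throat-clearing, may also be counted. Its reliability is therefore inadequate for purposes such as measuring linguistic activity.
[0006]
Moreover, it could not provide information needed for psychiatric treatment, such as what was actually said or how much the user spoke about a particular word.
[0007]
Furthermore, it could not measure the circumstances of an utterance, such as whether the user was speaking to someone, talking to themselves, or speaking in anger, and therefore could not be used to evaluate psychotherapy concerned with interpersonal relationships.
[0008]
In view of these problems of the conventional speech-amount meter, an object of the present invention is to provide an utterance meter and an utterance measurement method that exclude utterances the user does not intend and that can also measure what the user says and the circumstances in which it is said.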
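[0009]
[Means for Solving the Problems]
To achieve the above object, a first aspect of the present invention is an utterance meter comprising: sound input means (1) for inputting sound; voice recognition means (2) for performing speech recognition on the sound signal input by the sound input means; and utterance amount measuring means (3) for measuring the amount of utterance based on the result from the voice recognition means.
[0010]
A second aspect is the utterance meter of the first aspect, wherein the voice recognition means has a registered voice database (202) in which a plurality of sounds not intended by the user and a plurality of voices the user is expected to utter are recorded in advance in association with the words they represent, and recognizes the sound signal using the records in the registered voice database.
[0011]
A third aspect is the utterance meter of the first aspect, wherein the voice recognition means comprises at least: voice detection means (201) for detecting a voice waveform in the sound signal; a registered voice database (202) in which the user's voice waveforms are recorded in advance in association with the words they represent; comparison operation means (203) for sequentially comparing each voice waveform registered in the registered voice database with the voice waveform input from the voice detection means and calculating a similarity; and recognition means (204) for outputting, as the recognition result, the word corresponding to the voice waveform with the highest similarity. The utterance amount measuring means comprises: a classification database (6, 6') in which a plurality of words are classified in advance into a plurality of categories according to a predetermined criterion; classification means (4) for classifying words input from the recognition means by referring to the classification database; and counter means (5) for counting the results from the classification means for each category as the amount of utterance.
[0012]
A fourth aspect is the utterance meter of the third aspect, further comprising emotion determination means (407) for determining emotion based on the result from the voice detection means or the recognition means, wherein the classification means classifies the result from the voice recognition means taking into account the result from the emotion determination means.
[0013]
A fifth aspect is the utterance meter of the fourth aspect, wherein the emotion determination means comprises: power measuring means (401) for measuring the magnitude of the voice waveform input from the voice detection means; and/or speed measuring means (402) for measuring and outputting an evaluation value of speech rate by frequency-analyzing the voice waveform input from the voice detection means; and/or emotion classification means (404) for classifying and outputting the emotion expressed by a word input from the recognition means; and emotion specifying means (403) for identifying the user's emotion using the result from the power measuring means, the speed measuring means, or the emotion classification means.
[0014]
A sixth aspect is the utterance meter of the fourth aspect, further comprising physiological measurement means (601) for measuring the user's physiological information, wherein the emotion determination means performs emotion determination and outputs the result when the physiological information changes by a predetermined threshold or more, and, when it does not, outputs the previous emotion determination result without performing emotion determination.
[0015]
A seventh aspect is the utterance meter of the sixth aspect, wherein the physiological measurement means comprises: infrared emitting means (701) for irradiating the user's blood vessels with infrared light; infrared receiving means (702) for receiving the infrared light reflected by the blood vessels and outputting a signal proportional to its intensity; pulse wave detection means (703) for constructing and outputting a pulse wave from the signal input from the infrared receiving means; and heart rate counting means (704) for calculating and outputting the heart rate per unit time from the pulse wave input from the pulse wave detection means.
[0016]
An eighth aspect is the utterance meter of the first aspect, further comprising human body detection means (801) for detecting whether a person is in front of the user, wherein the utterance amount measuring means measures the amount of utterance taking into account the human detection information from the human body detection means.
[0017]
A ninth aspect is the utterance meter of the eighth aspect, wherein the human detection means comprises: a pyroelectric sensor (901) for detecting radiant heat from an object; fluctuation detection means (902) for measuring the amount of change in the temperature distribution detected by the pyroelectric sensor; and human body determination means (903) for determining that the object is a human when the result of the fluctuation detection means shows a change equal to or greater than a predetermined criterion.
[0018]
A tenth aspect is an utterance measurement method comprising: a sound input step of inputting sound; a voice recognition step of performing speech recognition on the sound signal input in the sound input step; and an utterance amount measuring step of measuring the amount of utterance based on the result of the voice recognition step.
[0019]
An eleventh aspect is a program for causing a computer to function as the following means of the utterance meter of the first, fourth, seventh, or ninth aspect: the voice recognition means for recognizing the sound signal input by the sound input means; the utterance amount measuring means for measuring the amount of utterance based on the voice recognition means; the emotion determination means for determining emotion based on the result from the voice detection means or the recognition means; the pulse wave detection means for constructing and outputting a pulse wave from the signal input from the infrared receiving means; the heart rate counting means for calculating and outputting the heart rate per unit time from the pulse wave input from the pulse wave detection means; the fluctuation detection means for measuring the amount of change in the temperature distribution detected by the pyroelectric sensor; and the human body determination means for determining that an object is a human when the result of the fluctuation detection means shows a change equal to or greater than a predetermined criterion.
[0020]
A twelfth aspect is a computer-usable recording medium carrying the program of the eleventh aspect.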
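[0021]
[Embodiments of the Invention]
Embodiments of the present invention are described below with reference to the drawings.
[0022]
(Embodiment 1)
FIG. 1 is a block diagram of the utterance meter according to Embodiment 1. The utterance meter has sound input means 1 for inputting the user's voice, voice recognition means 2 for converting the voice signal into text, and utterance amount measuring means 3 for counting the converted text to measure the amount of utterance.
[0023]
FIG. 2 is a block diagram of utterance amount measuring means 3. As shown, it has a classification database 6 that stores text classified into categories in advance, and classification means 4 that classifies text according to the classification stored in classification database 6. It also has counter means 5 that counts the number of text items recorded in each category as the user's amount of utterance and reports the counts to the user or others.
[0024]
FIG. 3 is a block diagram of voice recognition means 2. As shown, it has voice detection means 201, which extracts from the audio signal waveform the intervals in which the user actually spoke and outputs the waveform of each interval as a voice waveform. Hereinafter, the sound signal input from sound input means 1, including noise, is called the "audio signal waveform", and the portion obtained from it by removing noise and keeping only the part corresponding to actual vocalization is called the "voice waveform".
[0025]
Voice recognition means 2 further has a registered voice database 202, which holds voice waveforms for words the user is expected to say, and comparison operation means 203, which compares the input voice waveform with each voice waveform registered in registered voice database 202 and computes a similarity for each. It also has recognition means 204, which finds the voice waveform with the highest similarity, looks up the corresponding word in registered voice database 202, and converts the input into text.
[0026]
An utterance measurement method using the utterance meter of Embodiment 1 configured as above is described below.
[0027]
First, sound input means 1 converts the sound input by the user into an audio signal waveform and outputs it to voice recognition means 2. Next, voice detection means 201 extracts the portions of the audio signal waveform that satisfy a predetermined criterion as intervals in which the user actually spoke, and outputs the waveform of each interval to comparison operation means 203 as a voice waveform. One example criterion is that the signal power in the frequency band of 1 kHz or less, generally regarded as the human voice band, is at or above a certain level. The criterion may also be applied only after physical vocalization has been detected by a vibration sensor or myoelectric sensor attached to the user's throat or jaw.
[0028]
Registered voice database 202 holds, registered in advance, voice waveforms for words the user is expected to say, each associated with its word. Comparison operation means 203 compares the voice waveform input from voice detection means 201 with each voice waveform registered in registered voice database 202 in turn, calculates a similarity for each registered waveform, and outputs the similarities to recognition means 204.
[0029]
The comparison may use, for example, the sum of the differences in power at each frequency after the two voice waveforms have been frequency-analyzed, for instance by a Fourier transform. Alternatively, after frequency analysis, cepstral or mel-cepstral features obtained by a further polar-coordinate conversion may be compared by DP matching that accounts for temporal stretching and compression. To make the comparison more efficient, registered voice database 202 may store, in place of the voice waveforms themselves, the comparison elements used by comparison operation means 203, for example the power component at each frequency.
[0030]
The voice waveforms registered in registered voice database 202 also include waveforms of vocalizations the user does not intend as speech, such as throat-clearing and groaning, registered with the corresponding word "unintended vocalization". This makes it possible to distinguish vocalizations not intended as speech.
[0031]
Next, recognition means 204 finds, among the similarities input from comparison operation means 203 for each voice waveform, the waveform with the highest similarity, and converts the voice waveform to text by looking up the corresponding word in registered voice database 202. If the similarities do not differ significantly from one another, the input is judged to be noise and is not converted to text. Alternatively, it may be converted to the text "noise".
[0032]
Next, classification means 4 classifies the words input from recognition means 204 during the period of use according to a predetermined category classification such as that shown in FIG. 4. A category groups words that express the same content; for example, "Ohayo gozaimasu." and "Ohayo-san." (both meaning "good morning") are grouped into a single category, "morning greeting".
[0033]
Finally, counter means 5 counts, as the user's amount of utterance, the number of words recorded in each category by classification means 4 during the period of use, and presents the count for each category to the user, a doctor, or others.
[0034]
This makes it possible to distinguish utterances the user did not intend and to measure the amount of utterance for each type of content, enabling evaluation of the user's true level of linguistic activity as needed for psychotherapy.
[0035]
In Embodiment 1, the voice recognition means of the present invention converts voice waveforms into words using registered voice waveforms and the words they represent, such as "Ohayo" and "Ohayo gozaimasu". Alternatively, recognition may be performed on individual words such as "hana" (flower) and "saku" (bloom), or on single syllables such as "a" and "e", and the words formed from the results may then be classified.
[0036]
Sound other than voice, i.e. noise, is also input to the sound input means of the present invention, and "voice" as used in the present invention also covers unintended vocalizations such as coughing, as well as characters.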
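[0037]
(Embodiment 2)
FIG. 5 is a block diagram of the utterance meter according to Embodiment 2. Its basic configuration is the same as that of Embodiment 1, but it differs in having emotion determination means 407, which adds the emotion at the time of utterance to the classification. The description focuses on this difference. Components identical to those of Embodiment 1 carry the same reference numerals.
[0038]
Emotion determination means 407 of Embodiment 2 has power measuring means 401 for measuring the frequency power of the voice waveform, and speed measuring means 402 for evaluating the speed of the voice waveform from its frequency content. It also has an emotion classification database 405, which stores words in advance in association with the emotions they express, and emotion classification means 404, which classifies emotion according to emotion classification database 405.
[0039]
It further has an emotion specifying database 406, which associates combinations of frequency power, speed, and emotion-classification results with the emotion determination derived from them, and emotion specifying means 403, which identifies the emotion according to emotion specifying database 406.
[0040]
Whereas the utterance meter of Embodiment 1 has classification database 6, in which words are associated with their categories, the utterance meter of Embodiment 2 has classification database 6', in which the emotion at the time each word was uttered is additionally recorded.
[0041]
An utterance measurement method using the utterance meter of Embodiment 2 configured as above is described below.
[0042]
First, as in Embodiment 1, voice detection means 201 extracts from the audio signal waveform input from sound input means 1 the intervals in which the user actually spoke, and outputs the waveform of each interval to power measuring means 401 as a voice waveform. Power measuring means 401 then measures the frequency power of the input voice waveform, determines the input volume level, and outputs it to emotion specifying means 403.
[0043]
Likewise, the voice waveform extracted and output by voice detection means 201 is frequency-analyzed, for example by a Fourier transform, and the speech rate is evaluated from the power of each frequency component within the human voice band on the assumption that, "for a constant pitch, speech is fast when the overall distribution is weighted toward high-frequency components and slow when it is weighted toward low-frequency components." As the evaluation criterion, the user's standard speaking rate is measured in advance and its frequency-component distribution is taken as the reference value 0; the evaluation value of the speech rate is determined by comparison against this reference and output to emotion specifying means 403. The value may be defined as positive when high-frequency components dominate and speech is judged fast, or, conversely, as negative.
[0044]
Emotion classification means 404 classifies each word converted to text by voice recognition means 2 according to the emotion it expresses, such as anger, sadness, or joy, based on the pre-registered emotion classification database 405, and outputs the result to emotion specifying means 403. FIG. 6 shows an example of emotion classification database 405, in which words are registered together with their corresponding emotions.
[0045]
Next, emotion specifying means 403 identifies the emotion at the time of utterance, such as neutral, joy, anger, or sadness, from the input volume level output by power measuring means 401, the speech-rate evaluation value output by speed measuring means 402, and the emotion classification output by emotion classification means 404, according to emotion specifying database 406, and outputs the result to classification means 4. Emotion specifying database 406 stores, for example, the decision table shown in FIG. 7. Under this table, "anger" is determined only when voice recognition means 2 has recognized a word expressing anger, the input volume level is measured as loud or normal, and the speaking rate is judged fast or normal.
[0046]
Classification means 4 then classifies each word by attaching the emotion classification result to the categories held in classification database 6'. FIG. 8 shows an example of classification database 6' in Embodiment 2; as shown, the amount of utterance can be measured with each category further subdivided by emotion.
[0047]
Finally, counter means 5 counts, as the user's amount of utterance, the number of words recorded by classification means 4 for each category and each emotion classification during the period of use, and presents these counts to the user, a doctor, or others.
[0048]
This makes it possible to measure the amount of utterance for each emotional state of the user at the time of speaking, giving doctors and caregivers more detailed information usable in psychotherapy.
[0049]
In Embodiment 2, the emotion determination means of the present invention corresponds to emotion determination means 407, which consists of power measuring means 401, speed measuring means 402, emotion classification means 404, emotion specifying means 403, and emotion specifying database 406. However, it may instead include any two, or any one, of power measuring means 401, speed measuring means 402, and emotion classification means 404 and determine emotion from their results alone; in short, any means that determines emotion from the sound signal suffices.
[0050]
In Embodiment 2, the emotion classification means of the present invention consists of emotion classification means 404 and emotion classification database 405, but emotion classification database 405 may be incorporated into emotion classification means 404.
[0051]
Likewise, the emotion specifying means of the present invention consists of emotion specifying means 403 and emotion specifying database 406 in Embodiment 2, but emotion specifying database 406 may be incorporated into emotion specifying means 403.
[0052]
The emotion classification means of the present invention may also serve as the emotion specifying means.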
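[0053]
(Embodiment 3)
FIG. 9 is a block diagram of the utterance meter according to Embodiment 3. Its basic configuration is the same as that of Embodiment 2, but it differs in measuring the user's physiological information and taking changes in that information into account when determining emotion. The description focuses on this difference. Components identical to those of Embodiment 2 carry the same reference numerals.
[0054]
The utterance meter of Embodiment 3 further includes physiological measurement means 601 for measuring the user's physiological information. FIG. 10 shows its configuration. As shown, physiological measurement means 601 has infrared emitting means 701, which irradiates a blood vessel with infrared light, and infrared receiving means 702, which outputs a signal corresponding to the intensity of the infrared light reflected by the blood vessel and incident on it. As shown in FIG. 10, infrared emitting means 701 and infrared receiving means 702 are mounted in a clip or ring structure so that the infrared light irradiates blood vessels in the user's earlobe or finger and the reflected light enters infrared receiving means 702. Physiological measurement means 601 also has pulse wave detection means 703, which constructs and outputs a pulse wave from the signal input from infrared receiving means 702, and heart rate counting means 704, which calculates the heart rate per unit time from the pulse wave input from pulse wave detection means 703 and outputs it to emotion specifying means 403.
[0055]
An utterance measurement method using the utterance meter configured as above is described below.
[0056]
First, infrared emitting means 701 and infrared receiving means 702 are attached to the user's earlobe, finger, or the like by the clip or ring structure shown in FIG. 10. Infrared light emitted from infrared emitting means 701 irradiates blood vessels in the earlobe or finger, and the reflected light enters infrared receiving means 702, which outputs a signal proportional to the incident infrared intensity to pulse wave detection means 703.
[0057]
The infrared intensity incident on infrared receiving means 702 is inversely proportional to the absorbance of the blood vessel, and the vessel contracts and dilates with each heartbeat, changing its absorbance accordingly. Pulse wave detection means 703 reconstructs the cardiac pulse wave from these changes in absorbance and outputs it to heart rate counting means 704.
[0058]
Heart rate counting means 704 then determines the duration of one pulse-wave cycle from the pulse wave data, calculates the heart rate per unit time, and outputs it to emotion specifying means 403.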
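Heart rate counting means 704 can be sketched as peak picking on the pulse wave followed by converting the mean peak-to-peak period T (in seconds) into beats per minute, HR = 60 / T. This is a minimal illustration, not the patent's implementation: the local-maximum detector (peaks must also exceed the signal mean) and the function name `heart_rate_bpm` are assumptions, and a practical reflective infrared sensor would likely need filtering before this step.

```python
def heart_rate_bpm(pulse_wave, sample_rate):
    """Estimate heart rate (beats/min) from the mean spacing of pulse-wave peaks."""
    if len(pulse_wave) < 3:
        return None
    mean = sum(pulse_wave) / len(pulse_wave)
    # Hypothetical peak rule: a local maximum that also lies above the mean level.
    peaks = [
        i
        for i in range(1, len(pulse_wave) - 1)
        if pulse_wave[i] > mean
        and pulse_wave[i] > pulse_wave[i - 1]
        and pulse_wave[i] >= pulse_wave[i + 1]
    ]
    if len(peaks) < 2:
        return None  # fewer than two beats: the period cannot be measured
    period_s = (peaks[-1] - peaks[0]) / (len(peaks) - 1) / sample_rate
    return 60.0 / period_s
```

For example, a pulse wave sampled at 50 Hz whose peaks are 50 samples apart has a 1 s period and yields 60 beats per minute.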
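[0059]
Next, emotion specifying means 403 evaluates the input from physiological measurement means 601 on the assumption that "when a change in emotion occurs, a change in physiological information also occurs." If the change is equal to or greater than a predetermined criterion, it determines the emotion at the time of utterance, such as neutral, joy, anger, or sadness, from the results of power measuring means 401, speed measuring means 402, and emotion classification means 404 according to, for example, the decision table of FIG. 7, and outputs the result to classification means 4.
[0060]
If the change in physiological information is below the criterion, the previous emotion determination result is output to classification means 4. Classification means 4 records each category with the emotion classification result attached, and counter means 5 counts, as the user's amount of utterance, the number of words recorded by classification means 4 for each category and each emotion classification during the period of use, and presents these counts to the user, a doctor, or others.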
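The gating in [0059] and [0060] (re-run emotion determination only when the physiological signal changes by at least a threshold, otherwise reuse the previous result) and the per-category, per-emotion tally can be combined in one small class. This is a sketch under assumptions: the 10 bpm threshold, measuring the change against the heart rate at the last determination, the initial "neutral" state, and the class name `GatedEmotionCounter` are all hypothetical; `determine_emotion` stands for any emotion determination routine, such as a FIG. 7 style table.

```python
from collections import Counter


class GatedEmotionCounter:
    """Re-determine emotion only when heart rate changes by at least threshold_bpm."""

    def __init__(self, determine_emotion, threshold_bpm=10.0):
        self.determine_emotion = determine_emotion  # any callable returning an emotion label
        self.threshold_bpm = threshold_bpm          # stand-in for the "predetermined criterion"
        self.reference_hr = None                    # heart rate at the last determination
        self.emotion = "neutral"                    # assumed initial result
        self.counts = Counter()                     # (category, emotion) -> number of words

    def process(self, category, heart_rate, *features):
        """Record one classified word and return the emotion attached to it."""
        if self.reference_hr is None or abs(heart_rate - self.reference_hr) >= self.threshold_bpm:
            self.emotion = self.determine_emotion(*features)
            self.reference_hr = heart_rate
        # Otherwise the previous determination result is reused unchanged.
        self.counts[(category, self.emotion)] += 1
        return self.emotion
```

Comparing against the heart rate at the last determination, rather than the previous sample, means that a slow drift still triggers a new determination once it accumulates past the threshold.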
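[0061]
This allows the user's utterances to be evaluated more accurately as amounts of utterance for each emotional state at the time of speaking.
[0062]
Although heart rate is used as the physiological information in Embodiment 3, perspiration, body temperature, blink rate, or the like may be detected instead.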
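[0063]
(Embodiment 4)
FIG. 11 is a block diagram of the utterance meter according to Embodiment 4. Its basic configuration is the same as that of Embodiment 1, but it differs in having human body detection means 801, which detects, while the user is speaking, whether the user is speaking to a person. The description focuses on this difference. Components identical to those of Embodiment 1 carry the same reference numerals.
[0064]
FIG. 12 shows the configuration of human body detection means 801. As shown, it has a pyroelectric sensor 901 that detects radiant heat from an object, fluctuation detection means 902 that measures the amount of change in the temperature distribution detected by pyroelectric sensor 901, and human body determination means 903 that determines from the amount of change whether the object is a human.
[0065]
Whereas the utterance meter of Embodiment 1 has classification database 6, in which words are associated with their categories, the utterance meter of Embodiment 4 has classification database 6", which additionally records whether a person was detected, i.e. whether the user was speaking to someone when the word was uttered.
[0066]
An utterance measurement method using the utterance meter configured as above is described below.
[0067]
Pyroelectric sensor 901 is mounted facing forward from the user, as shown in FIG. 12; it detects, by the pyroelectric effect, the infrared radiation emitted as radiant heat by objects and outputs its intensity.
[0068]
Next, fluctuation detection means 902 determines, from the intensity input from pyroelectric sensor 901, the area of the temperature distribution lying between 34 and 40 °C, the normal range of human body temperature, and outputs the change in this area per unit time to human body determination means 903 as the amount of change in the temperature distribution.
[0069]
Human body determination means 903 holds in advance a reference range for the amount of change in temperature distribution observed when people talk face to face under normal conditions. When the input from fluctuation detection means 902 falls within this reference range, it determines that the object in front of the user is a human body and outputs the result to classification means 4.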
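Fluctuation detection means 902 and human body determination means 903 can be sketched as follows, assuming the sensor output has already been converted into a small grid of temperatures per frame. A single pyroelectric element does not produce an image, so this grid, the pixel area, the frame interval, and the reference band are all hypothetical inputs, as are the function names.

```python
def warm_area(frame, pixel_area=1.0, low_c=34.0, high_c=40.0):
    """Area covered by pixels in the 34-40 degC range of normal human body temperature."""
    return pixel_area * sum(1 for row in frame for temp in row if low_c <= temp <= high_c)


def person_in_front(prev_frame, frame, dt_s, reference_band):
    """True if the warm area's rate of change lies within the face-to-face reference band."""
    change_per_s = abs(warm_area(frame) - warm_area(prev_frame)) / dt_s
    low, high = reference_band
    return low <= change_per_s <= high
```

Because the judgment is based on the rate of change rather than on temperature alone, a perfectly still warm object produces a change of 0 and falls below any positive reference band; so does, in this sketch, a person who stays completely motionless.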
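[0070]
Classification means 4 then classifies each word by attaching the result of human body detection means 801 to the categories held in classification database 6". FIG. 13 shows an example of classification database 6" in Embodiment 4: each word within a category is annotated with whether a person was detected.
[0071]
Finally, counter means 5 counts, as the user's amount of utterance, the number of words recorded by classification means 4 for each category during the period of use, separately for when a person was in front of the user and when no one was, and presents to the user, a doctor, or others the counts for each category split by whether the speech was directed at a person.
[0072]
This makes it possible to measure the amount of utterance separately for speech directed at others and for other speech, yielding information on mental activity related to interpersonal relationships and giving doctors and caregivers information useful for treating social anxiety and interpersonal stress, which have recently increased sharply.
[0073]
Although Embodiment 4 uses a pyroelectric sensor as the human body detection means of the present invention, an optical sensor such as a CCD sensor may be used instead; in short, any human detection means that determines whether a person is in front of the user suffices.
[0074]
The program of the present invention is a program that causes a computer to execute the functions of all or some of the means (or devices, elements, etc.) of the utterance meter of the present invention described above, and that operates in cooperation with the computer.
[0075]
The recording medium of the present invention is a recording medium carrying a program that causes a computer to execute all or some of the functions of all or some of the means (or devices, elements, etc.) of the utterance meter of the present invention described above; it is computer-readable, and the program read from it executes those functions in cooperation with the computer.
[0076]
In one form of use, the program of the present invention may be recorded on a computer-readable recording medium and operate in cooperation with a computer.
[0077]
In another form of use, the program of the present invention may be transmitted through a transmission medium, read by a computer, and operate in cooperation with the computer.
[0078]
Recording media include ROM and the like; transmission media include transmission media such as the Internet, as well as light, radio waves, and sound waves.
[0079]
The computer of the present invention described above is not limited to pure hardware such as a CPU and may include firmware, an OS, and peripheral devices.
[0080]
As described above, the configuration of the present invention may be implemented in software or in hardware.
[0081]
[Effects of the Invention]
As is clear from the above description, the present invention can provide an utterance meter capable of distinguishing utterances not intended by the user.
[0082]
It can also provide an utterance meter that measures the amount of utterance for each emotional state, and that measures it separately for when a person is present and when no one is.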
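[Brief Description of the Drawings]
[FIG. 1] Block diagram of the utterance meter according to Embodiment 1 of the present invention
[FIG. 2] Block diagram of the utterance meter according to Embodiment 1 of the present invention
[FIG. 3] Configuration diagram of the voice recognition means according to Embodiment 1 of the present invention
[FIG. 4] Diagram showing an example of category classification in the classification database according to Embodiment 1 of the present invention
[FIG. 5] Block diagram of the utterance meter according to Embodiment 2 of the present invention
[FIG. 6] Diagram showing an example of emotion classification in the emotion classification database according to Embodiment 2 of the present invention
[FIG. 7] Diagram showing an example of the emotion identification table in the emotion specifying database according to Embodiment 2 of the present invention
[FIG. 8] Diagram showing an example of category classification in the classification database according to Embodiment 2 of the present invention
[FIG. 9] Block diagram of the utterance meter according to Embodiment 3 of the present invention
[FIG. 10] Configuration diagram of the physiological measurement means according to Embodiment 3 of the present invention
[FIG. 11] Block diagram of the utterance meter according to Embodiment 4 of the present invention
[FIG. 12] Configuration diagram of the human body detection means according to Embodiment 4 of the present invention
[FIG. 13] Diagram showing an example of category classification in the classification database according to Embodiment 4 of the present invention
[Explanation of Reference Numerals]
1 Sound input means
2 Voice recognition means
3 Utterance amount measuring means
4 Classification means
5 Counter means
6 Classification database

[0001]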
TECHNICAL FIELD OF THE INVENTION
The present invention relates to an utterance meter and a method of measuring the amount of utterance.
[0002]
[Prior art]
In recent years, as the social environment has shifted toward what is called a "stress society", interest in mental health has grown. A person's level of linguistic activity is used as a measure for assessing mental health and conditions such as dementia and autism, and serves as a reference in psychotherapy.
[0003]
A conventional device for evaluating linguistic activity uses a microphone to convert the subject's vocal sounds into an electrical signal and integrates, over the measurement period, the time during which the signal is present, the signal magnitude, or both. This yields, as objective data, the amount the subject spoke within the measurement period (see, for example, Patent Document 1).
[0004]
[Patent Document 1]
JP-A-5-56951
[0005]
[Problems to be solved by the invention]
However, because the related art counts the user's amount of speech simply from the intensity level of the voice signal, sounds the user does not intend as speech, such as coughing or throat-clearing, may also be counted. This makes it unreliable for purposes such as measuring linguistic activity.
[0006]
In addition, it was not possible to obtain information necessary for mental treatment, such as the actual utterance content and the utterance amount for a specific word.
[0007]
Furthermore, it was not possible to measure the circumstances of an utterance, such as whether the user was speaking to another person, talking to themselves, or speaking in anger, so the device could not be used to evaluate psychotherapy concerned with interpersonal relationships.
[0008]
In view of these problems of the conventional speech-amount meter, the present invention aims to provide an utterance meter and an utterance measurement method that exclude utterances not intended by the user and that can also measure what the user says and the circumstances in which it is said.
[0009]
[Means for Solving the Problems]
To achieve the above object, a first aspect of the present invention is an utterance meter comprising sound input means (1) for inputting sound, voice recognition means (2) for performing speech recognition on the sound signal input by the sound input means, and utterance amount measuring means (3) for measuring the amount of utterance based on the result from the voice recognition means.
[0010]
A second aspect of the present invention is the utterance meter of the first aspect, wherein the voice recognition means has a registered voice database (202) in which a plurality of sounds not intended by the user and a plurality of voices the user is expected to utter are recorded in advance in association with the words they represent, and recognizes the sound signal using the records in the registered voice database.
[0011]
A third aspect of the present invention is the utterance meter of the first aspect, wherein the voice recognition means comprises at least: voice detection means (201) for detecting a voice waveform in the sound signal; a registered voice database (202) in which the user's voice waveforms are recorded in advance in association with the words they represent; comparison operation means (203) for sequentially comparing each voice waveform registered in the registered voice database with the voice waveform input from the voice detection means and calculating a similarity; and recognition means (204) for outputting, as the recognition result, the word corresponding to the voice waveform with the highest similarity. The utterance amount measuring means comprises: a classification database (6, 6') in which a plurality of words are classified in advance into a plurality of categories according to a predetermined criterion; classification means (4) for classifying words input from the recognition means by referring to the classification database; and counter means (5) for counting the results from the classification means for each category as the amount of utterance.
[0012]
A fourth aspect of the present invention is the utterance meter of the third aspect, further comprising emotion determination means (407) for determining emotion based on the result from the voice detection means or the recognition means, wherein the classification means classifies the result from the voice recognition means taking into account the result from the emotion determination means.
[0013]
A fifth aspect of the present invention is the utterance meter of the fourth aspect, wherein the emotion determination means comprises: power measuring means (401) for measuring the magnitude of the voice waveform input from the voice detection means; and/or speed measuring means (402) for measuring and outputting an evaluation value of the speech rate by frequency-analyzing the voice waveform input from the voice detection means; and/or emotion classification means (404) for classifying and outputting the emotion expressed by a word input from the recognition means; and emotion specifying means (403) for identifying the user's emotion using the result from the power measuring means, the speed measuring means, or the emotion classification means.
[0014]
A sixth aspect of the present invention is the utterance meter of the fourth aspect, further comprising physiological measurement means (601) for measuring the user's physiological information, wherein the emotion determination means performs emotion determination and outputs the result when the physiological information changes by a predetermined threshold or more, and, when it does not, outputs the previous emotion determination result without performing emotion determination.
[0015]
A seventh aspect of the present invention is the utterance meter of the sixth aspect, wherein the physiological measurement means comprises: infrared emitting means (701) for irradiating the user's blood vessels with infrared light; infrared receiving means (702) for receiving the infrared light reflected by the blood vessels and outputting a signal proportional to its intensity; pulse wave detection means (703) for constructing and outputting a pulse wave from the signal input from the infrared receiving means; and heart rate counting means (704) for calculating and outputting the heart rate per unit time from the pulse wave input from the pulse wave detection means.
[0016]
An eighth aspect of the present invention is the utterance meter of the first aspect, further comprising human body detection means (801) for detecting whether a person is in front of the user, wherein the utterance amount measuring means measures the amount of utterance taking into account the human detection information from the human body detection means.
[0017]
A ninth aspect of the present invention is the utterance meter of the eighth aspect, wherein the human detection means comprises a pyroelectric sensor (901) for detecting radiant heat from an object, fluctuation detection means (902) for measuring the amount of change in the temperature distribution detected by the pyroelectric sensor, and human body determination means (903) for determining that the object is a human when the result of the fluctuation detection means shows a change equal to or greater than a predetermined criterion.
[0018]
A tenth aspect of the present invention is an utterance measurement method comprising a sound input step of inputting sound, a voice recognition step of performing speech recognition on the sound signal input in the sound input step, and an utterance amount measuring step of measuring the amount of utterance based on the result of the voice recognition step.
[0019]
An eleventh aspect of the present invention is a program for causing a computer to function as the following means of the utterance meter of the first, fourth, seventh, or ninth aspect: the voice recognition means for recognizing the sound signal input by the sound input means; the utterance amount measuring means for measuring the amount of utterance based on the voice recognition means; the emotion determination means for determining emotion based on the result from the voice detection means or the recognition means; the pulse wave detection means for constructing and outputting a pulse wave from the signal input from the infrared receiving means; the heart rate counting means for calculating and outputting the heart rate per unit time from the pulse wave input from the pulse wave detection means; the fluctuation detection means for measuring the amount of change in the temperature distribution detected by the pyroelectric sensor; and the human body determination means for determining that an object is a human when the result of the fluctuation detection means shows a change equal to or greater than a predetermined criterion.
[0020]
According to a twelfth aspect of the present invention, there is provided a recording medium carrying the program according to the eleventh aspect of the present invention, which is a recording medium usable by a computer.
[0021]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0022]
(Embodiment 1)
FIG. 1 is a block diagram of the utterance meter according to the first embodiment. The utterance meter according to the first embodiment includes a sound input unit 1 for inputting a user's voice. The apparatus further includes voice recognition means 2 for converting a voice signal into characters, and utterance amount measurement means 3 for counting the number of converted characters and measuring the amount of utterance.
[0023]
FIG. 2 is a block diagram of the utterance amount measuring means 3. As shown in the figure, the utterance amount measuring means 3 has a classification database 6 that stores characters classified into categories in advance, and a classification unit 4 that classifies characters according to the classification stored in the classification database 6. It further has counter means 5 that counts the number of characters recorded for each category as the user's amount of utterance and notifies the user or others.
[0024]
FIG. 3 is a block diagram of the voice recognition means 2. As shown in the figure, the voice recognition means 2 has a voice detection unit 201 that cuts out, from the audio signal waveform, the sections in which the user actually spoke and outputs the signal waveform of each section as a voice waveform. Hereinafter, the sound signal input from the sound input means 1, including noise, is referred to as the "audio signal waveform", and the portion obtained from it by removing noise and cutting out only the part corresponding to actual vocalization is referred to as the "voice waveform".
[0025]
The voice recognition means 2 further has a registered voice database 202, in which voice waveforms for words the user is expected to say are registered, and a comparison operation unit 203 that compares the input voice waveform with the voice waveforms registered in the registered voice database 202 and calculates a similarity for each registered waveform. It also has a recognition unit 204 that detects the voice waveform with the highest similarity, determines the corresponding word from the registered voice database 202, and performs character conversion.
[0026]
An utterance amount measuring method using the utterance meter according to the first embodiment having the above-described configuration will be described below.
[0027]
First, the sound input unit 1 converts the sound input by the user into an audio signal waveform and outputs it to the voice recognition unit 2. Next, the voice detection unit 201 cuts out the portions of the audio signal waveform that satisfy a certain criterion as sections actually uttered by the user, and outputs the waveform of each section to the comparison operation unit 203 as a voice waveform. One such criterion is, for example, that the signal power in the frequency band of 1 kHz or less, which is generally regarded as the human voice band, is at or above a certain level. The criterion may also be applied only after physical vocalization has been detected by a vibration sensor or myoelectric sensor attached to the user's throat or jaw.
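The section-extraction criterion above (power at or below 1 kHz exceeding a fixed level) can be sketched as a frame-by-frame check. This is a minimal illustration under stated assumptions, not the patent's implementation: the frame length, the power threshold, and the helper names `band_power` and `detect_voice_intervals` are hypothetical, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def band_power(frame, sample_rate, max_hz=1000.0):
    """Mean power of the DFT bins from 0 Hz up to max_hz (naive DFT; fine for short frames)."""
    n = len(frame)
    total, bins = 0.0, 0
    for k in range(n // 2 + 1):
        if k * sample_rate / n > max_hz:
            break
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        total += abs(coeff) ** 2 / n
        bins += 1
    return total / bins if bins else 0.0


def detect_voice_intervals(signal, sample_rate, frame_len=160, threshold=1.0):
    """Return (start, end) sample ranges of consecutive frames above the band-power threshold."""
    intervals, start = [], None
    n_frames = len(signal) // frame_len
    for f in range(n_frames):
        i = f * frame_len
        active = band_power(signal[i:i + frame_len], sample_rate) >= threshold
        if active and start is None:
            start = i  # a voice section begins
        elif not active and start is not None:
            intervals.append((start, i))  # the section ends at this frame boundary
            start = None
    if start is not None:
        intervals.append((start, n_frames * frame_len))
    return intervals
```

With 8 kHz audio the default 160-sample frame is 20 ms; a unit-amplitude 300 Hz tone passes the check, while a 3 kHz tone, outside the band, does not.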
[0028]
The registered voice database 202 stores, registered in advance, voice waveforms for words the user is expected to say, each associated with its word. The comparison operation unit 203 compares the voice waveform input from the voice detection unit 201 with each voice waveform registered in the registered voice database 202 in turn, calculates a similarity for each registered waveform, and outputs the similarities to the recognition unit 204.
[0029]
As the comparison method, the two voice waveforms may be compared by the sum of the differences in power at each frequency after frequency analysis such as a Fourier transform. Alternatively, after frequency analysis, cepstral or mel-cepstral features obtained by a further polar-coordinate conversion may be compared by DP matching that accounts for temporal stretching and compression. To make the comparison more efficient, the registered voice database 202 may store, in place of the voice waveforms themselves, the comparison elements used by the comparison operation unit 203, for example the power component at each frequency.
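The two comparison options above can be illustrated as follows: a per-frequency power difference, and DP matching implemented here as standard dynamic time warping (DTW) over sequences of feature vectors. The feature vectors (for instance, cepstral coefficients per frame), the Euclidean frame distance, and the mapping `similarity = 1 / (1 + distance)` are assumptions made for the sketch; the text does not specify them.

```python
import math


def spectral_distance(power_a, power_b):
    """Sum of absolute per-frequency power differences between two spectra."""
    return sum(abs(a - b) for a, b in zip(power_a, power_b))


def dtw_distance(seq_a, seq_b):
    """DP matching (dynamic time warping) over two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])  # Euclidean frame distance
            # Allow stretching (repeat a frame of either sequence) or a diagonal step.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def similarity(distance):
    """Map a distance to a similarity in (0, 1]; higher means more alike."""
    return 1.0 / (1.0 + distance)
```

The DTW recurrence is what makes the comparison tolerant of speaking-rate differences: a word spoken twice as slowly, with each frame repeated, still aligns at zero cost with its registered template.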
[0030]
In addition, the voice waveforms registered in the registered voice database 202 include waveforms of vocalizations the user does not intend as speech, such as throat-clearing and groaning, registered with the corresponding word "unintended utterance". This makes it possible to distinguish vocalizations not intended as speech.
[0031]
Subsequently, the recognizing unit 204 detects a speech waveform having the highest similarity among the similarities of the respective speech waveforms input from the comparison operation unit 203, and determines a corresponding word from the registered speech database 202. Then, character conversion of the audio waveform is performed. If there is no large difference between the similarities, the input voice is determined to be noise and no character conversion is performed. Alternatively, the character may be converted to “noise”.
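The decision rule of the recognition unit 204, including the "unintended utterance" entry and the noise rejection described above, might look like the sketch below. The text does not quantify "no large difference", so the `margin` test on the spread between the best and worst similarity is a hypothetical choice, and returning `None` stands for "not converted to text".

```python
UNINTENDED = "unintended utterance"


def recognize(similarities, margin=0.1):
    """Return the registered word with the highest similarity, or None if judged to be noise.

    similarities: dict mapping each registered word (including UNINTENDED) to its score.
    margin: hypothetical minimum spread between the best and worst score.
    """
    if not similarities:
        return None
    best = max(similarities, key=similarities.get)
    if similarities[best] - min(similarities.values()) < margin:
        return None  # no large difference between scores: treat the input as noise
    return best
```

An input that best matches the registered throat-clearing or groaning waveforms comes back as `UNINTENDED`, so later stages can drop it explicitly instead of counting it as speech.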
[0032]
Next, the classification unit 4 classifies the words input from the recognition unit 204 during the period of use according to a predetermined category classification such as that shown in FIG. 4. A category groups words with the same meaning; for example, "Ohayo gozaimasu." and "Ohayo-san." (both meaning "good morning") are grouped into a single category, "morning greeting".
[0033]
Finally, the counter unit 5 counts the number of words recorded for each category by the classifying unit 4 during the use period as the amount of utterance of the user, and indicates a numerical value for each category to the user, a doctor, or the like.
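Classification by the unit 4 and counting by the counter unit 5 reduce to a lookup followed by a per-category tally. In this sketch, the dictionary standing in for classification database 6, its romanized entries, and the choice to skip words that have no category are illustrative assumptions.

```python
from collections import Counter

# Hypothetical excerpt standing in for classification database 6 (cf. FIG. 4).
CLASSIFICATION_DB = {
    "Ohayo gozaimasu": "morning greeting",
    "Ohayo-san": "morning greeting",
    "Oyasuminasai": "bedtime greeting",
}
EXCLUDED = {None, "unintended utterance"}  # noise and unintended sounds are never counted


def count_by_category(recognized_words, db=CLASSIFICATION_DB):
    """Tally recognized words per category as the user's amount of utterance."""
    counts = Counter()
    for word in recognized_words:
        if word in EXCLUDED:
            continue
        category = db.get(word)
        if category is not None:  # words missing from the database are skipped
            counts[category] += 1
    return counts
```

For example, the sequence "Ohayo gozaimasu", noise, "Ohayo-san", an unintended utterance, "Oyasuminasai" yields two morning greetings and one bedtime greeting.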
[0034]
This makes it possible to distinguish utterances not intended by the user and to measure the amount of utterance for each type of content, enabling evaluation of the user's true level of linguistic activity as needed for psychotherapy.
[0035]
In the first embodiment, the voice recognition unit of the present invention converts a voice waveform into a word using registered voice waveforms and the words they represent, such as "Ohayo" and "Ohayo gozaimasu". Alternatively, speech recognition may be performed on individual words such as "hana" (flower) and "saku" (bloom), or on single syllables such as "a" and "e", and the words formed from the results may then be classified.
[0036]
Sound other than voice, i.e. noise, is also input to the sound input means of the present invention, and "voice" as used in the present invention also covers unintended vocalizations such as coughing, as well as characters.
[0037]
(Embodiment 2)
FIG. 5 is a block diagram of the utterance meter according to the second embodiment. Its basic configuration is the same as that of the first embodiment, but it differs in additionally having an emotion determination unit 407 that adds the emotion at the time of utterance to the classification. The description therefore focuses on this difference. Components identical to those of the first embodiment are denoted by the same reference numerals.
[0038]
The emotion determination unit 407 of the utterance meter according to the second embodiment includes a power measurement unit 401 that measures the frequency power of a speech waveform. Further, there is provided a speed measuring means 402 for evaluating the speed of the voice waveform from the frequency of the voice waveform. Further, the apparatus includes an emotion classification database 405 in which words and emotions represented by the words are previously associated and stored, and an emotion classification unit 404 for performing emotion classification according to the emotion classification database 405.
[0039]
Further, it has an emotion identification database 406 in which the results of the frequency power, speed, and emotion classification and the emotion determination result using them are associated with each other, and an emotion identification unit 403 that performs emotion identification according to the emotion identification database 406.
[0040]
Whereas the utterance meter of the first embodiment has the classification database 6, in which words are associated with their categories, the utterance meter of the second embodiment has a classification database 6' in which the emotion at the time each word was uttered is additionally recorded.
[0041]
An utterance amount measuring method using the utterance amount meter according to the second embodiment having the above configuration will be described below.
[0042]
First, as in the first embodiment, the voice detection unit 201 cuts out, from the audio signal waveform input from the sound input unit 1, the sections actually uttered by the user, and outputs the signal waveform of each section to the power measurement unit 401 as a voice waveform. The power measurement unit 401 then measures the frequency power of the input voice waveform, determines the input volume level, and outputs it to the emotion specifying means 403.
[0043]
Likewise, the speech waveform cut out by the voice detection unit 201 is passed to the speed measurement unit 402, which subjects it to frequency analysis such as a Fourier transform. The utterance speed is evaluated on the assumption that it is high when the overall distribution contains many high-frequency components and low when low-frequency components dominate. As the evaluation criterion, the user's standard utterance speed is measured in advance and taken as the reference value 0; the evaluation value of the utterance speed is obtained by comparing the measured frequency-component distribution with this reference and is output to the emotion identification unit 403. The sign convention is arbitrary: a fast utterance speed (many high-frequency components) may be expressed as either a positive or a negative value.
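The speed heuristic above can be sketched as follows, assuming a mono segment of floating-point samples and a known sample rate. The sketch measures the fraction of spectral power above a cutoff frequency and subtracts the same fraction measured beforehand on the user's standard speech, so the baseline maps to 0 and a positive value means faster (one of the two sign conventions the text allows). The 1000 Hz cutoff and the naive DFT are illustrative assumptions; a practical implementation would use an FFT library.

```python
import cmath
import math
from typing import Sequence


def power_spectrum(samples: Sequence[float]) -> list[float]:
    """Power of DFT bins 0..N//2 via a naive O(N^2) DFT (adequate for short frames)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def high_band_ratio(samples: Sequence[float], sample_rate: float, cutoff_hz: float) -> float:
    """Fraction of non-DC spectral power at or above cutoff_hz."""
    n = len(samples)
    total = high = 0.0
    for k, power in enumerate(power_spectrum(samples)):
        if k == 0:
            continue  # skip the DC bin, which carries no speed information
        total += power
        if k * sample_rate / n >= cutoff_hz:
            high += power
    return high / total if total > 0 else 0.0


def speed_evaluation(samples: Sequence[float], sample_rate: float,
                     baseline_ratio: float, cutoff_hz: float = 1000.0) -> float:
    """0 at the user's standard speed; > 0 faster, < 0 slower (hypothetical convention)."""
    return high_band_ratio(samples, sample_rate, cutoff_hz) - baseline_ratio
```

`baseline_ratio` would be the `high_band_ratio` value recorded from the user's standard speech during calibration.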
[0044]
The emotion classification unit 404 classifies the words converted by the voice recognition unit 2 into words expressing anger, sadness, joy, and so on according to the pre-registered emotion classification database 405, and outputs the result to the emotion identification unit 403. FIG. 6 shows an example of the emotion classification database 405; as shown there, words are registered together with the emotions they correspond to.
[0045]
Next, the emotion identification unit 403 identifies the emotion at the time of utterance, such as normal, joy, anger, or sadness, according to the emotion identification database 406, based on the input volume level output from the power measurement unit 401, the utterance-speed evaluation value output from the speed measurement unit 402, and the emotion classification output from the emotion classification unit 404, and outputs the result to the classification unit 4. The emotion identification database 406 stores, for example, the determination table shown in FIG. 7. Under this table, "anger" is determined only when the voice recognition unit 2 has recognized a word expressing anger, the input volume level is judged to be large or normal, and the utterance speed is judged to be fast or normal.
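The emotion classification database 405 and the FIG. 7 determination table can be sketched together as plain lookup tables. Only the "anger" row below follows the text (anger word, volume large or normal, speed fast or normal); the example words, the "sadness" and "joy" rows, the fallback to "normal", and the ±0.1 tolerance that turns the speed evaluation value into a label are all hypothetical. The speed value is assumed to use the positive-means-faster convention.

```python
from typing import Optional

# Emotion classification database 405: word -> emotion (hypothetical entries;
# the contents of FIG. 6 are not reproduced in the text).
EMOTION_WORDS = {
    "furious": "anger", "annoying": "anger",
    "sad": "sadness", "lonely": "sadness",
    "happy": "joy", "wonderful": "joy",
}

# Emotion identification database 406: emotion -> (allowed volume levels, allowed speed levels).
DETERMINATION_TABLE = {
    "anger": ({"large", "normal"}, {"fast", "normal"}),    # as stated for FIG. 7
    "sadness": ({"small", "normal"}, {"slow", "normal"}),  # assumption
    "joy": ({"large", "normal"}, {"fast", "normal"}),      # assumption
}


def speed_label(evaluation: float, tolerance: float = 0.1) -> str:
    """Map the speed evaluation value (positive = faster) to a coarse label."""
    if evaluation > tolerance:
        return "fast"
    if evaluation < -tolerance:
        return "slow"
    return "normal"


def identify_emotion(word: str, volume: str, speed_eval: float) -> str:
    """Confirm the word's emotion only if volume and speed are consistent with it."""
    candidate: Optional[str] = EMOTION_WORDS.get(word)
    if candidate is None:
        return "normal"
    volumes, speeds = DETERMINATION_TABLE[candidate]
    if volume in volumes and speed_label(speed_eval) in speeds:
        return candidate
    return "normal"
```

Requiring the acoustic cues to agree with the word's emotion is what lets the table reject, for example, an angry word spoken softly and slowly.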
[0046]
The classification unit 4 then performs classification by adding the emotion classification result to each category held in the classification database 6′. FIG. 8 shows an example of the classification database 6′ according to the second embodiment. As shown there, each category is further subdivided by emotion, so that the utterance amount can be measured per emotion within a category.
[0047]
Finally, the counter unit 5 counts, as the user's utterance amount, the number of words recorded by the classification unit 4 for each category and each emotion classification during the use period, and presents the numerical values for each category and each emotion classification to the user, a doctor, or others.
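The classification and counting steps amount to a tally keyed by category and emotion. In this minimal sketch the word-to-category mapping stands in for the category part of classification database 6′, and its categories are hypothetical (FIG. 4 and FIG. 8 are not reproduced). Each event is a recognized word paired with its emotion result; words without a category are ignored.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical stand-in for the category part of classification database 6'.
CATEGORY_OF = {"rice": "food", "bread": "food", "mother": "family", "son": "family"}


def count_utterances(events: Iterable[Tuple[str, str]]) -> Counter:
    """Tally (category, emotion) pairs over the use period; unknown words are skipped."""
    counts: Counter = Counter()
    for word, emotion in events:
        category = CATEGORY_OF.get(word)
        if category is not None:
            counts[(category, emotion)] += 1
    return counts
```

The same structure, with a person-present flag in place of the emotion, would serve the classification database 6″ of the fourth embodiment.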
[0048]
This makes it possible to measure the user's utterance amount for each emotional state at the time of speaking, and to provide doctors and caregivers with more detailed information usable in psychotherapy.
[0049]
In the second embodiment, the emotion determination unit of the present invention consists of the power measurement unit 401, the speed measurement unit 402, the emotion classification unit 404, the emotion identification unit 403, and the emotion identification database 406. However, any one or more of the power measurement unit 401, the speed measurement unit 402, and the emotion classification unit 404 may be selected and the emotion determination performed using only their results; in short, any emotion determination unit that determines the emotion conveyed by the sound signal will do.
[0050]
In the second embodiment, the emotion classification unit of the present invention includes the emotion classification unit 404 and the emotion classification database 405. However, the emotion classification database 405 may be incorporated in the emotion classification unit 404.
[0051]
In the second embodiment, the emotion identification unit of the present invention consists of the emotion identification unit 403 and the emotion identification database 406. However, the emotion identification database 406 may be incorporated in the emotion identification unit 403.
[0052]
Further, the emotion classification means of the present invention may also serve as the emotion identification means.
[0053]
(Embodiment 3)
FIG. 9 is a block diagram of the utterance meter according to the third embodiment. The utterance meter according to the third embodiment has the same basic configuration as that of the second embodiment, except that it measures the user's physiological information and makes the emotion determination taking changes in that information into account. The description therefore focuses on this difference. Components identical to those of the second embodiment are denoted by the same reference numerals.
[0054]
The utterance meter according to the third embodiment further includes a physiological measurement unit 601 that measures the user's physiological information. FIG. 10 shows the configuration of the physiological measurement unit 601. As shown there, the physiological measurement unit 601 has an infrared light emitting unit 701 that irradiates a blood vessel with infrared light, and an infrared light receiving unit 702 that receives the infrared light reflected by the blood vessel and outputs a signal corresponding to its intensity. The infrared light emitting unit 701 and the infrared light receiving unit 702 are mounted on a clip or ring-shaped structure so that the light irradiates a blood vessel in the user's earlobe or finger and the reflected light enters the infrared light receiving unit 702. The physiological measurement unit 601 also has a pulse wave detection unit 703, which constructs a pulse wave from the signal input from the infrared light receiving unit 702 and outputs it, and a heart rate counting unit 704, which calculates the heart rate per unit time from the pulse wave input from the pulse wave detection unit 703 and outputs it to the emotion identification unit 403.
[0055]
An utterance amount measuring method using the utterance meter configured as described above will be described below.
[0056]
First, as shown in FIG. 10, the infrared light emitting unit 701 and the infrared light receiving unit 702 are attached to the user's earlobe or finger with a clip or ring-shaped structure. Infrared light emitted from the infrared light emitting unit 701 is directed into the earlobe or finger, and the reflected light enters the infrared light receiving unit 702, which outputs a signal proportional to the intensity of the incident infrared light to the pulse wave detection unit 703.
[0057]
The intensity of the infrared light entering the infrared light receiving unit 702 decreases as the absorbance of the blood vessel increases, and because the blood vessel contracts and expands with each heartbeat, its absorbance changes accordingly. The pulse wave detection unit 703 reconstructs the cardiac pulse wave from this change in absorbance and outputs it to the heart rate counting unit 704.
[0058]
The heart rate counting unit 704 then determines the period of one pulse-wave cycle from the pulse wave data, calculates the heart rate per unit time, and outputs it to the emotion identification unit 403.
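The pulse-wave and heart-rate steps can be sketched as below, assuming the receiver signal is sampled at a fixed rate. Because more blood means more absorption and therefore less reflected light, the sketch inverts the zero-centred intensity to obtain a pulse wave, takes its positive local maxima as beats, and converts the mean beat-to-beat interval into beats per minute. The inversion and the simple peak detector are assumptions; practical devices usually add band-pass filtering and more robust peak picking.

```python
from typing import List, Sequence


def pulse_wave(intensity: Sequence[float]) -> List[float]:
    """Zero-centre and invert the received IR intensity so peaks track blood volume."""
    mean = sum(intensity) / len(intensity)
    return [mean - x for x in intensity]


def peak_indices(wave: Sequence[float]) -> List[int]:
    """Indices of positive local maxima (a deliberately simple beat detector)."""
    return [i for i in range(1, len(wave) - 1)
            if wave[i] > 0 and wave[i] > wave[i - 1] and wave[i] >= wave[i + 1]]


def heart_rate_bpm(intensity: Sequence[float], sample_rate_hz: float) -> float:
    """Heart rate per minute from the mean period of one pulse-wave cycle."""
    peaks = peak_indices(pulse_wave(intensity))
    if len(peaks) < 2:
        raise ValueError("need at least two pulse peaks")
    # Mean peak-to-peak spacing in samples, converted to seconds.
    period_s = (peaks[-1] - peaks[0]) / (len(peaks) - 1) / sample_rate_hz
    return 60.0 / period_s
```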
[0059]
Next, the emotion identification unit 403 evaluates the input from the physiological measurement unit 601 on the assumption that "a change in emotion is accompanied by a change in physiological information." When the change in the physiological information exceeds a predetermined reference, it determines the emotion at the time of utterance, such as normal, joy, anger, or sadness, according to the determination table shown in FIG. 7, based on the results of the power measurement unit 401, the speed measurement unit 402, and the emotion classification unit 404, and outputs the result to the classification unit 4.
[0060]
If the change in the physiological information is equal to or less than that reference, the previous emotion determination result is output to the classification unit 4 instead. The classification unit 4 adds the emotion classification result to each of the held categories and records it, and the counter unit 5 counts, as the user's utterance amount, the words recorded by the classification unit 4 for each category and each emotion classification during the use period, and presents the numerical values for each category and each emotion classification to the user, a doctor, or others.
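The gating described in this paragraph (and in claim 6) can be sketched as a small stateful wrapper. The text does not say which earlier value the change is measured against; this sketch assumes it is the physiological value at the most recent determination, so that slow drift eventually triggers a re-evaluation, and it follows claim 6 in re-determining when the change is equal to or greater than the threshold. The class name and the callable interface are hypothetical.

```python
from typing import Callable, Optional


class GatedEmotionJudge:
    """Re-run the emotion determination only when the physiological value has
    changed by at least `threshold`; otherwise repeat the previous result."""

    def __init__(self, threshold: float, initial: str = "normal") -> None:
        self.threshold = threshold
        self.reference: Optional[float] = None  # value at the last determination
        self.last_emotion = initial

    def update(self, physio_value: float, determine: Callable[[], str]) -> str:
        if self.reference is None or abs(physio_value - self.reference) >= self.threshold:
            self.last_emotion = determine()  # full power/speed/word determination
            self.reference = physio_value
        return self.last_emotion
```

Here `physio_value` would be the heart rate from the heart rate counting unit 704, and `determine` would wrap the FIG. 7 table lookup.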
[0061]
This makes it possible to more accurately evaluate the utterance of the user as the amount of utterance for each emotional state at the time of utterance.
[0062]
Although the heart rate is used as the physiological information of the present invention in the third embodiment, the amount of perspiration, body temperature, blink frequency, or the like may be detected instead.
[0063]
(Embodiment 4)
FIG. 11 is a block diagram of the utterance meter according to the fourth embodiment. The utterance meter according to the fourth embodiment has the same basic configuration as that of the first embodiment, but differs in that it includes a human body detection unit 801 that detects, at the time of utterance, whether the user is speaking to a person. The description therefore focuses on this difference. Components identical to those of the first embodiment are denoted by the same reference numerals.
[0064]
FIG. 12 shows the configuration of the human body detection unit 801. As shown there, the human body detection unit 801 has a pyroelectric sensor 901 that detects radiant heat from an object, a fluctuation detection unit 902 that measures the amount of change in the temperature distribution detected by the pyroelectric sensor 901, and a human body determination unit 903 that judges from that amount of change whether the object is a person.
[0065]
Furthermore, whereas the utterance meter according to the first embodiment has the classification database 6, in which words are associated with their categories, the utterance meter according to the fourth embodiment has a classification database 6″ in which, for each uttered word, whether a person was detected (that is, whether the user was speaking to a person) is additionally recorded.
[0066]
An utterance amount measuring method using the utterance meter configured as described above will be described below.
[0067]
The pyroelectric sensor 901 is mounted so as to face forward from the user, as shown in FIG. 12; it detects, by the pyroelectric effect, the infrared radiation emitted as radiant heat by the object and outputs its intensity.
[0068]
Next, from the intensity input from the pyroelectric sensor 901, the fluctuation detection unit 902 obtains the area of the region whose temperature lies within the normal human body temperature range of 34 °C to 40 °C, calculates the change in that area per unit time, and outputs it to the human body determination unit 903 as the amount of change in the temperature distribution.
[0069]
The human body determination unit 903 holds a reference range for the amount of change in the temperature distribution observed when a person is talking face to face under normal conditions. If the input from the fluctuation detection unit 902 falls within this reference range, the object in front is determined to be a human body, and the result is output to the classification unit 4.
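The fluctuation detection and human body determination steps can be sketched as follows. The sketch assumes an array sensor whose output has already been converted to per-pixel temperatures in °C; the frame format, the pixel area, and the reference band values are hypothetical, and only the 34 °C to 40 °C range comes from the text.

```python
import math
from typing import Sequence

BODY_MIN_C, BODY_MAX_C = 34.0, 40.0  # human body temperature range from the text

Frame = Sequence[Sequence[float]]  # hypothetical: per-pixel temperatures in °C


def body_area(frame: Frame, pixel_area: float = 1.0) -> float:
    """Area of the region whose temperature lies within 34 to 40 °C."""
    hits = sum(1 for row in frame for t in row if BODY_MIN_C <= t <= BODY_MAX_C)
    return hits * pixel_area


def area_change_rate(prev: Frame, cur: Frame, dt_s: float, pixel_area: float = 1.0) -> float:
    """Fluctuation detection: magnitude of the change in that area per second."""
    return abs(body_area(cur, pixel_area) - body_area(prev, pixel_area)) / dt_s


def is_person_in_front(rate: float, ref_low: float, ref_high: float = math.inf) -> bool:
    """Human body determination: the rate must lie in the face-to-face reference band."""
    return ref_low <= rate <= ref_high
```

Claim 9 phrases the test as a change "equal to or more than a predetermined reference"; leaving `ref_high` at its `math.inf` default reproduces that one-sided form.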
[0070]
Next, the classification unit 4 performs classification by adding the result of the human body detection unit 801 to each category held in the classification database 6″. FIG. 13 shows an example of the classification database 6″ in the fourth embodiment; as shown there, whether a person was detected is added to each word in a category.
[0071]
Finally, the counter unit 5 counts, as the user's utterance amount, the number of words recorded by the classification unit 4 during the use period for each category, separately for utterances made with and without a person in front, and presents to the user, a doctor, or others the numerical values for each category and for whether the utterance was interpersonal.
[0072]
As a result, the user's utterance amount can be measured separately for utterances directed at a person and those that are not, yielding information on mental activity related to interpersonal relationships. This gives doctors and caregivers information useful for treating social phobia and social stress.
[0073]
In the fourth embodiment, a pyroelectric sensor is used as the human body detection unit of the present invention. However, an optical sensor such as a CCD sensor may be used instead; in short, any person detection unit that determines whether a person is in front of the user will do.
[0074]
Note that the program of the present invention is a program that causes a computer to execute the functions of all or some of the means (or devices, elements, and the like) of the utterance meter of the present invention, and that operates in cooperation with the computer.
[0075]
Further, the recording medium of the present invention is a computer-readable recording medium carrying a program that causes a computer to execute all or some of the functions of all or some of the means (or devices, elements, and the like) of the utterance meter of the present invention; the program, once read, executes those functions in cooperation with the computer.
[0076]
One usage form of the program of the present invention may be a form in which the program is recorded on a computer-readable recording medium and operates in cooperation with the computer.
[0077]
One use form of the program of the present invention may be a form in which the program is transmitted through a transmission medium, read by a computer, and operates in cooperation with the computer.
[0078]
Recording media include ROM and the like, and transmission media include the Internet, light, radio waves, sound waves, and the like.
[0079]
Further, the computer of the present invention described above is not limited to pure hardware such as a CPU, but may include firmware, an OS, and peripheral devices.
[0080]
Note that, as described above, the configuration of the present invention may be realized by software or hardware.
[0081]
[Effects of the Invention]
As is clear from the above description, it is possible to provide an utterance meter capable of distinguishing utterances not intended by the user.
[0082]
Further, it is possible to provide an utterance meter that measures the utterance amount for each emotional state, and that measures the utterance amount separately for utterances made when a person is present.
[Brief description of the drawings]
FIG. 1 is a block diagram of an utterance meter according to Embodiment 1 of the present invention.
FIG. 2 is a block diagram of an utterance meter according to the first embodiment of the present invention.
FIG. 3 is a configuration diagram of a voice recognition unit according to the first embodiment of the present invention.
FIG. 4 is a diagram showing an example of category classification in the classification database according to the first embodiment of the present invention.
FIG. 5 is a block diagram of an utterance meter according to Embodiment 2 of the present invention.
FIG. 6 is a diagram showing an example of emotion classification in an emotion classification database according to Embodiment 2 of the present invention.
FIG. 7 is a diagram showing an example of an emotion identification table in an emotion identification database according to Embodiment 2 of the present invention.
FIG. 8 is a diagram showing an example of category classification in the classification database according to the second embodiment of the present invention.
FIG. 9 is a block diagram of an utterance meter according to Embodiment 3 of the present invention.
FIG. 10 is a configuration diagram of a physiological measurement unit according to Embodiment 3 of the present invention.
FIG. 11 is a block diagram of an utterance meter according to Embodiment 4 of the present invention.
FIG. 12 is a configuration diagram of a human body detection unit according to a fourth embodiment of the present invention.
FIG. 13 is a diagram showing an example of category classification in the classification database according to the fourth embodiment of the present invention.
[Explanation of symbols]
1 sound input means
2 voice recognition means
3 utterance amount measurement means
4 classification means
5 counter means
6 classification database

Claims

An utterance meter comprising:
sound input means for inputting sound;
voice recognition means for recognizing the voice in the sound signal input by the sound input means; and
utterance amount measurement means for measuring an utterance amount based on the result from the voice recognition means.

The utterance meter according to claim 1, wherein the voice recognition means
has a registered voice database in which a plurality of voices not intended by the user and a plurality of voices the user is predicted to utter are recorded in advance in association with the words they represent, and
recognizes the sound signal by referring to the registered voice database.

The utterance meter according to claim 1, wherein the voice recognition means has:
voice detection means for detecting a speech waveform from the sound signal;
a registered voice database in which the user's speech waveforms are recorded in advance in association with the words they represent;
comparison arithmetic means for sequentially comparing the speech waveforms registered in the registered voice database with the speech waveform input from the voice detection means and calculating their similarity; and
recognition means for outputting, as the recognition result, the word corresponding to the speech waveform with the highest similarity;
and wherein the utterance amount measurement means has:
a classification database in which a plurality of words are classified in advance into a plurality of categories according to predetermined criteria;
classification means for classifying the words input from the recognition means by referring to the classification database; and
counter means for counting the result from the classification means as an utterance amount for each category.

The utterance meter according to claim 3, further comprising emotion determination means for performing emotion determination based on the result from the voice detection means or the recognition means,
wherein the classification means classifies the result from the voice recognition means taking the result from the emotion determination means into account.

The utterance meter according to claim 4, wherein the emotion determination means has at least one of:
power measurement means for measuring the magnitude of the speech waveform input from the voice detection means;
speed measurement means for measuring and outputting an evaluation value of the utterance speed by frequency analysis of the speech waveform input from the voice detection means; and
emotion classification means for classifying and outputting the emotion expressed by the words input from the recognition means;
and further has emotion identification means for identifying the user's emotion using the result from the power measurement means, the speed measurement means, or the emotion classification means.

The utterance meter according to claim 4, further comprising physiological measurement means for measuring physiological information of the user,
wherein the emotion determination means performs an emotion determination and outputs the emotion determination result when a change equal to or greater than a predetermined threshold occurs in the physiological information, and outputs the previous emotion determination result when no such change occurs.

The utterance meter according to claim 6, wherein the physiological measurement means has:
infrared light emitting means for irradiating the user's blood vessel with infrared light;
infrared light receiving means for receiving the infrared light reflected by the blood vessel and outputting a signal proportional to its intensity;
pulse wave detection means for constructing a pulse wave from the signal input from the infrared light receiving means and outputting the pulse wave; and
heart rate counting means for calculating and outputting the heart rate per unit time from the pulse wave input from the pulse wave detection means.

The utterance meter according to claim 1, further comprising human body detection means for detecting whether a person is in front of the user,
wherein the utterance amount measurement means measures the utterance amount taking the human body detection information from the human body detection means into account.

The utterance meter according to claim 8, wherein the human body detection means has:
a pyroelectric sensor for detecting radiant heat from an object;
fluctuation detection means for measuring the amount of change in the temperature distribution detected by the pyroelectric sensor; and
human body determination means for determining that the object is a person when the result from the fluctuation detection means shows a change equal to or greater than a predetermined reference.

An utterance amount measuring method comprising:
a sound input step of inputting sound;
a voice recognition step of recognizing the voice in the sound signal input in the sound input step; and
an utterance amount measuring step of measuring an utterance amount based on the result from the voice recognition step.

A program for causing a computer to function as the following means of the utterance meter according to claim 1, 4, 7, or 9:
the voice recognition means for recognizing the voice in the sound signal input by the sound input means;
the utterance amount measurement means for measuring an utterance amount based on the result from the voice recognition means;
the emotion determination means for performing emotion determination based on the result from the voice detection means or the recognition means;
the pulse wave detection means for constructing a pulse wave from the signal input from the infrared light receiving means and outputting the pulse wave;
the heart rate counting means for calculating and outputting the heart rate per unit time from the pulse wave input from the pulse wave detection means;
the fluctuation detection means for measuring the amount of change in the temperature distribution detected by the pyroelectric sensor; and
the human body determination means for determining that the object is a person when the result from the fluctuation detection means shows a change equal to or greater than a predetermined reference.

A computer-readable recording medium carrying the program according to claim 11.