JP3670180B2

JP3670180B2 - hearing aid

Info

Publication number: JP3670180B2
Application number: JP33845899A
Authority: JP
Inventors: 俊彦大場
Original assignee: 有限会社ジーエムアンドエム
Priority date: 1999-02-16
Filing date: 1999-11-29
Publication date: 2005-07-13
Anticipated expiration: 2019-11-29
Also published as: JP2000308198A

Description

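【０００１】
【Technical Field of the Invention】
The present invention relates to a hearing aid that processes and converts speech detected by a microphone or the like into a form that a hearing-impaired person can readily understand and presents it, and to a hearing aid that processes, converts, and outputs speech uttered by a person with a speech or language disorder, or speech produced with an auxiliary device or means used to compensate for such a disorder (for example, speech production substitutes used after laryngectomy).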
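【０００２】
【Prior Art】
Hearing aids have conventionally used either air conduction or bone conduction. By form, they include box-type, behind-the-ear, CROS (Contra-lateral Routing of Signal), and in-the-ear hearing aids. By processing method, they divide into analog and digital hearing aids. According to Kodera, there are also large units for group use (desktop training hearing aids, group training hearing aids) and small units for personal use (see Kodera K, 補聴器の選択と評価, 図説耳鼻咽喉科 new approach, メジカルビュー, 39, 1996).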
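【０００３】
A digital hearing aid first generates digital data by A/D (analog/digital) conversion of the speech detected by the microphone. It then analyzes the input digital data by decomposing it into a frequency spectrum, for example by Fourier transform, and calculates an amplification factor for each frequency band on the basis of the perceived loudness of the sound. The data amplified in each band is passed through a digital filter and D/A converted, and the resulting sound is output to the user's ear. In this way, the digital hearing aid let the user hear the speaker's voice with little noise.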
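The band-wise amplification described above can be illustrated with a minimal sketch. It assumes a single short frame, a naive DFT in place of a real-time FFT, and a hypothetical `band_gains_db` table standing in for gains derived from the user's hearing profile; none of these names come from the patent.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (O(n^2)), adequate for a short frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT; returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def amplify_bands(frame, sample_rate, band_gains_db):
    """Apply a gain in dB to each frequency band of one frame.

    band_gains_db is a list of (low_hz, high_hz, gain_db) tuples
    (hypothetical fitting data, not taken from the patent).
    """
    n = len(frame)
    spectrum = dft(frame)
    for k in range(n):
        # Bin k and its mirror bin n-k carry the same physical frequency.
        freq = min(k, n - k) * sample_rate / n
        for low, high, gain_db in band_gains_db:
            if low <= freq < high:
                spectrum[k] *= 10 ** (gain_db / 20)  # dB to amplitude ratio
                break
    return idft(spectrum)
```

Applying the same gain to a bin and its mirror keeps the spectrum conjugate-symmetric, so the reconstructed frame stays real-valued.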
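【０００４】
Conventionally, a person with a voice disorder resulting from, for example, laryngectomy loses the normal phonation mechanism based on vocal-fold vibration and has difficulty producing speech.
【０００５】
Speech production substitutes used after laryngectomy to date can be broadly divided, by the nature of the vibrating body serving as the sound source, into methods that use artificial materials such as a rubber membrane (whistle-type artificial larynx) or a buzzer (electric artificial larynx, transcutaneous or implanted) and methods that use the hypopharyngeal or esophageal mucosa (esophageal speech, tracheoesophageal fistula speech, and tracheoesophageal fistula speech using voice prostheses). Other reported substitutes include methods using the electromyogram produced when the lips move, speech training devices that apply various speech processing techniques for people whose speech is impaired by hearing loss, methods using a palatograph, and methods using an intraoral vibrator.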
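【０００６】
【Problems to Be Solved by the Invention】
However, because the digital hearing aid described above merely amplifies the digital data in each frequency band, the microphone picks up ambient sound indiscriminately and noise is reproduced as is, leaving the user with discomfort. Compared with analog hearing aids, there was no substantial improvement in various hearing tests. Moreover, conventional digital hearing aids did not adapt the processing of detected speech to the hearing-impaired person's physical condition, usage conditions, and purpose of use.
【０００７】
Accordingly, an object of the present invention is to provide a hearing aid that can present speech recognition results in accordance with the user's physical condition, usage conditions, and purpose of use, and can present those results with little noise.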
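【０００８】
A problem common to the speech production substitutes described above is that, because the sound is not produced by the vocal-fold vibration the person had in the normal state before laryngectomy, the quality of the generated voice is poor and far removed from the voice the person originally had.
【０００９】
The present invention has been proposed in view of these circumstances. Its object is to provide a hearing aid that enables people with speech and language disorders caused by laryngectomy, resection of the tongue or floor of the mouth, articulation disorders, and the like to speak with a natural voice, either the voice they originally had or one converted as desired, and that outputs external speech to the user so that natural conversation is possible.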
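【００１０】
【Means for Solving the Problems】
A hearing aid according to the present invention that solves the above problems comprises: acoustic-electric conversion means for detecting speech uttered by a user having a speech or language disorder and/or external speech and generating a speech signal; speech recognition means for performing speech recognition on the basis of the speech signal from the acoustic-electric conversion means; storage means for storing speech data; speech information generation means for combining the speech data stored in the storage means on the basis of the recognition result from the speech recognition means and generating speech information representing the speech to be output; user speech output means for converting the speech information generated by the speech information generation means into speech and outputting it externally; and external speech output means for outputting to the user, as the external speech, the recognition result obtained by the speech recognition means.
【００１１】
Such a hearing aid outputs external speech to the user and also outputs the speech uttered with the disorder to the user who uttered it.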
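【００１２】
A hearing aid according to the present invention also comprises: acoustic-electric conversion means for detecting external speech and generating a speech signal; recognition means for performing speech recognition using the speech signal from the acoustic-electric conversion means; conversion means for converting the recognition result from the recognition means so as to change its content in accordance with the user's physical condition, usage conditions, and purpose of use; output control means for generating a control signal that causes the recognition result from the recognition means and/or the recognition result converted by the conversion means to be output; and output means for outputting, on the basis of the control signal generated by the output control means, the recognition result from the recognition means and/or the recognition result converted by the conversion means as speech information, thereby presenting that speech information to the user.
【００１３】
In such a hearing aid, the conversion means changes the content of the recognition result, which changes the output and presents the user with the speech or other output as modified by the conversion means. The conversion method can thus be changed freely according to the user's physical condition, usage conditions, and purpose of use when the recognition result is presented.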
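【００１４】
【Embodiments of the Invention】
Embodiments of the present invention are described in detail below with reference to the drawings.
【００１５】
The present invention is applied to, for example, a hearing aid 1 configured as shown in FIG. 1 and FIG. 2. As shown in FIG. 1, the hearing aid 1 is a portable device in which a head-mounted display (HMD) 2 and a computer unit 3, which performs speech recognition, generation of speech information, and the like, are connected by an optical fiber cable 4. The computer unit 3 is attached to a support 5 worn, for example, at the user's waist. It is driven by power supplied from a battery 6 attached to the support 5, and it in turn drives the HMD 2.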
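【００１６】
The HMD 2 comprises a display unit 7 placed in front of the user's eyes, a user microphone 8 that detects the user's speech, a speech output unit 9 that outputs speech to the user, a support 5 that holds these parts in place on the user's head, and an external microphone 11 that detects external speech and other sounds.
【００１７】
The display unit 7 is placed in front of the user's eyes and displays, for example, the semantic content of speech detected by the user microphone 8 and/or the external microphone 11 described later. In response to commands from the computer unit 3, the display unit 7 may display other information in addition to the semantic content of the speech.
【００１８】
The user microphone 8 is placed near the user's mouth and detects speech uttered by the user. It converts the user's speech into an electrical signal and outputs it to the computer unit 3.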
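【００１９】
The external microphone 11 is provided on the side of the disc-shaped speech output unit 9. It detects external speech, converts it into an electrical signal, and outputs it to the computer unit 3.
【００２０】
Regardless of where they are placed, the user microphone 8 and the external microphone 11 may, according to the user's operation, be any of various microphones: a bone-conduction microphone, the microphone of an ultra-compact integrated transmitter/receiver unit that picks up both air-conducted and bone-conducted sound (made by Nippon Telegraph and Telephone Corporation), an omnidirectional microphone, a unidirectional (for example, super-directional) microphone, a bidirectional microphone, a dynamic microphone, a condenser (electret) microphone, a zoom microphone, a stereo microphone, an MS stereo microphone, a wireless microphone, a ceramic microphone, a magnetic microphone, or a microphone array. A magnetic earphone can be used as the earphone. An echo canceller or the like may be used as the sound pickup technique and as the transmission technique for these microphones. The microphones 8 and 11 may also be used with a conventionally adopted gain adjuster, sound adjuster, and output control device (maximum output power control type, automatic recruitment control compression type, and so on).
【００２１】
Furthermore, the user microphone 8 and the external microphone 11 need not be provided separately as shown in FIG. 1; they may be constructed as a single unit.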
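【００２２】
The support 5 is made of an elastic material such as a shape-memory alloy and can be fixed to the user's head, so that the display unit 7, the user microphone 8, and the speech output unit 9 can be held in their prescribed positions. The support 5 shown in FIG. 1 is one example, in which a support member runs from the user's forehead to the back of the head to position the display unit 7 and the other parts. It may of course be a so-called headphone-type support, and a speech output unit 9 may be provided for each ear.
【００２３】
The computer unit 3 is attached to a support 5 worn, for example, at the user's waist. As shown in FIG. 2, the computer unit 3 receives the electrical signals generated by, for example, the microphones 8 and 11. It comprises a recording medium storing a program for processing the electrical signals, a CPU (Central Processing Unit) that performs speech recognition and speech information generation according to the program stored on the recording medium, and so on. The computer unit 3 need not be worn at the waist; it may be integrated with the HMD 2 on the head.
【００２４】
On the basis of the electrical signal generated from the speech detected by the user microphone 8 and/or the external microphone 11, the computer unit 3 starts the program stored on the recording medium and has the CPU perform speech recognition to obtain a recognition result. In this way the computer unit 3 obtains the content of the speech detected by the user microphone 8 and/or the external microphone 11.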
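【００２５】
Next, the electrical configuration of the hearing aid 1 to which the present invention is applied is described with reference to FIG. 2. The hearing aid 1 comprises: a microphone 21, corresponding to the microphones 8 and 11 above, which detects speech and generates a speech signal; a signal processing unit 22, included in the computer unit 3, which receives the speech signal generated by the microphone 21 and performs speech recognition; a speech information generation unit 23, included in the computer unit 3, which generates speech information on the basis of the recognition result from the signal processing unit 22; a storage unit 24, included in the computer unit 3, which stores speech data whose content is read by the signal processing unit 22 and the speech information generation unit 23; a speaker unit 25, corresponding to the speech output unit 9 above, which outputs speech using the speech information from the speech information generation unit 23; and a display section 26, corresponding to the display unit 7 above, which displays the content represented by the speech information from the speech information generation unit 23.
【００２６】
The microphone 21 detects speech from the user, uttered for example with a speech production substitute after laryngectomy, or external speech, and generates a speech signal based on that speech. The microphone 21 then outputs the generated speech signal to the signal processing unit 22.
【００２７】
The microphone 21 is placed near the user's mouth and detects speech uttered by the user. It also detects external speech and generates a speech signal. In the following description, the microphone that detects the user's speech is called the user microphone 8 as above, the microphone that detects external speech is called the external microphone 11 as above, and the two are referred to collectively as the microphone 21.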
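【００２８】
The speech production substitutes mentioned above are mechanisms such as an artificial larynx (electric or whistle type), esophageal speech, and the various voice reconstruction procedures.
【００２９】
The signal processing unit 22 performs speech recognition using the speech signal from the microphone 21. It does so by executing a speech recognition program stored, for example, in an internal memory. Specifically, the signal processing unit 22 refers to speech data that was generated by sampling the user's speech and stored in the storage unit 24, and recognizes the speech signal from the microphone 21 as language. As a result, the signal processing unit 22 generates a recognition result corresponding to the speech signal from the microphone 21.
【００３０】
The speech recognition performed by the signal processing unit 22 can be classified by the type of speech to be recognized and by the target speaker. Classified by type of speech, there are isolated word recognition and continuous speech recognition. Continuous speech recognition further comprises continuous word recognition, sentence speech recognition, conversational speech recognition, and speech understanding. Classified by target speaker, there are speaker-independent, speaker-dependent, and speaker-adaptive types. Recognition techniques available to the signal processing unit 22 include dynamic programming (DP) matching, methods based on speech features, and methods based on hidden Markov models (HMM).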
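Of the recognition techniques listed above, DP matching is the simplest to illustrate. The following minimal sketch assumes one-dimensional feature sequences and a hypothetical dictionary of stored word templates; a real recognizer would compare multi-dimensional spectral feature vectors.

```python
def dtw_distance(a, b):
    """Dynamic-programming (DTW) distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances
                                     cost[i][j - 1],      # b advances
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def recognize_word(features, templates):
    """Return the label of the stored template closest to the input.

    templates maps a word label to its reference sequence (hypothetical data).
    """
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))
```

Because the alignment may repeat frames, an utterance spoken more slowly than its template can still match it with zero distance.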
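【００３１】
The signal processing unit 22 also performs speaker recognition using the input speech. It generates a speaker recognition result by extracting features of the voice of the person speaking to the user or by using the frequency characteristics of the voice, and outputs the result to the speech information generation unit 23. The signal processing unit 22 performs speaker-independent recognition using methods based on features that vary little between speakers, the multi-template method, or statistical methods. Speaker adaptation can be based on normalization of individual differences, on correspondences between the speech data of different speakers, on updating model parameters, or on speaker selection. The signal processing unit 22 performs the above speech recognition in accordance with the user's physical condition, usage conditions, and purpose of use.
【００３２】
Here, the user's physical condition means the degree of the user's hearing loss or language disorder and the like. The usage conditions mean the environment in which the user uses the hearing aid 1 (indoors, outdoors, in noise) and the like. The purpose of use means the user's aim in using the hearing aid 1, that is, improving recognition or making speech easier for the user to understand, for example when talking with familiar people, talking with many unspecified people, attending a musical performance (opera, enka), listening to a lecture, or talking with a person who has a language disorder.
【００３３】
The signal processing unit 22 also has a function for storing and learning the speech input to the microphone 21. Specifically, it retains the waveform data of speech detected by the microphone 21 and uses it in later speech recognition, which further improves recognition. With this learning function, the signal processing unit 22 can also make its output more accurate.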
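【００３４】
The storage unit 24 stores data representing the speech models that the signal processing unit 22 compares with the speech waveform generated by detecting the input speech when it recognizes that speech. The storage unit 24 also stores, as speech data, data obtained by sampling in advance, for example, the user's voice produced by vocal-fold vibration before laryngectomy, or a voice the user wishes to have output.
【００３５】
The speech information generation unit 23 generates speech information using the recognition result from the signal processing unit 22 and the speech data representing the user's voice stored in the storage unit 24. In doing so, it combines the speech data stored in the storage unit 24 according to the recognition result and processes and converts the recognition result. The speech information generation unit 23 uses a built-in CPU and a speech information generation program for this purpose.
【００３６】
The speech information generation unit 23 also analyzes the speech using the recognition result and reconstructs the speech data according to the analyzed content, thereby generating speech information representing the speech. It then outputs the generated speech information to the speaker unit 25 and the display section 26.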
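【００３７】
Furthermore, the speech information generation unit 23 generates speech information by processing, converting, synthesizing, or otherwise transforming the recognition result from the signal processing unit 22 in accordance with the user's physical condition, usage conditions, and purpose of use. It also performs, on the recognition result and/or the processed recognition result, the processing needed to present the speech detected by the microphone 21 to the user.
【００３８】
The speech information generation unit 23 may also modify the speech information generated from the recognition result to produce new speech information. In this case it adds words that are easier for the user to understand, on the basis of the user's physical condition, usage conditions, and purpose of use, and thereby further improves the user's recognition of the speech.
【００３９】
When outputting speech information to the display section 26, the speech information generation unit 23 outputs the semantic content of the speech as an image. For example, when speech from the user, from the person speaking to the user, or from outside is input and the recognition result from the signal processing unit 22 denotes an object, the unit outputs image data representing that object to the display section 26 for display.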
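【００４０】
In response to the recognition result from the signal processing unit 22, the speech information generation unit 23 also outputs again speech information previously output to the speaker unit 25 or the display section 26. When, after outputting speech information, it determines that a recognition result has been input that represents an utterance by the user or the user's conversation partner asking to hear it again, it outputs that speech information to the speaker unit 25 or the display section 26 once more. The unit may also output previously output speech information again on the basis of a speaker recognition result obtained, for example, by extracting features of the partner's voice or from the frequency characteristics of the voice. It may further do so by conducting a spoken dialogue using an artificial intelligence function.
【００４１】
The speech information generation unit 23 may switch whether to perform this re-output according to an operation input command from an operation input unit 28. That is, the user decides whether re-output is performed by operating the operation input unit 28, which serves as a switch.
【００４２】
When outputting speech information again, the speech information generation unit 23 selects whether to output the previously output speech information or different speech information, according to the operation input signal from the operation input unit 28 received via the signal processing unit 22.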
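【００４３】
The display section 26 displays the speech represented by the speech information generated by the speech information generation unit 23, images captured by a camera mechanism 29, and so on.
【００４４】
The operation input unit 28 may be a switch, a keyboard, a mouse, or the like, and generates an operation input signal when operated by the user.
【００４５】
In such a hearing aid 1, the signal processing unit 22 performs speech recognition on the speech detected by the microphone 21, and the speech information generation unit 23 starts a program on the basis of the recognition result, so that processing suited to the user can be performed. The hearing aid 1 outputs the speech from the microphone 21 to the speaker unit 25 and also displays it on the display section 26, which improves the user's recognition of the speech. Several reports support the hypothesis that human communication is inherently multimodal: the McGurk effect, in which presenting contradictory phonological information simultaneously to vision and hearing causes the listener to hear a phoneme different from either (see McGurk H and MacDonald J: Hearing lips and seeing voices, Nature 264, 746-8, 1976); the report that infants have already acquired the correspondence between auditory speech information and visual mouth-shape information (Kuhl PK et al. Human processing of auditory-visual information in speech perception. ICSLP'94 S11.4, Yokohama, 1994); the influence of vision on the perceived direction of a sound source (the ventriloquism effect); and reports that humans unconsciously learn and distinguish whether something is a sound source (see Saitou H and Mori T: 視覚認知と聴覚認知, Ohmsha, 119-20, 1999). These findings mean that vision influences hearing, so displaying recognition results and the like on the display section 26 supplements the speech information and improves the user's recognition of speech. With the hearing aid 1, the semantic content of speech can be conveyed to the conversation partner not only by speech but also through images shown on the display section 26, making dialogue possible.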
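【００４６】
Furthermore, the hearing aid 1 can change the semantic content displayed on the display section 26 and the content of the speech output from the speaker unit 25 according to the result of recognizing the speech detected by the user microphone 8 and/or the external microphone 11, which further improves the user's recognition of speech. By having the speech information generation unit 23 execute a program that changes the speech recognition processing according to the physical condition (degree of hearing loss and so on), usage conditions, and purpose of use, the hearing aid 1 can display semantic information that the user finds easy to understand and so improve recognition further.
【００４７】
The speaker unit 25 outputs the speech generated by the speech information generation unit 23. It may output speech from the user toward the conversation partner, or it may output the speech uttered by the user toward the user's own ear. A speaker unit 25 that outputs toward the user's ear may use a dynamic or electrostatic (condenser) transducer, and may take the form of headphones (open-air, closed, or in-the-ear types such as canal type). The speaker unit 25 may also be the speaker of a conventional hearing aid, loudspeaker, or sound collector, and a speaker unit 25 that outputs speech from the user toward the conversation partner may be a conventional loudspeaker device.
【００４８】
The speaker unit 25 may also output sound in antiphase to the sound it outputs on the basis of the speech information. This removes the noise component contained in the speech output from the speaker unit 25, so that speech with little noise is output to the user and/or the user's conversation partner.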
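The antiphase principle can be shown with a small sketch. It assumes an idealized case in which a sample-aligned estimate of the noise is available (the hypothetical `noise_estimate` argument); the patent does not say how such an estimate would be obtained.

```python
def antiphase(signal):
    """Return the phase-inverted copy of a sampled signal."""
    return [-s for s in signal]

def cancel_noise(mixed, noise_estimate):
    """Superimpose the antiphase noise estimate on the mixed signal.

    Cancellation is exact only when the estimate equals the actual noise
    sample for sample; any mismatch leaves a residual.
    """
    inverted = antiphase(noise_estimate)
    return [m + i for m, i in zip(mixed, inverted)]
```

In practice the quality of cancellation depends on how well the noise estimate tracks the real noise in amplitude and timing.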
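【００４９】
The hearing aid 1 also has a communication circuit 27 connected to an external communication network. Through a communication network such as the telephone, mobile telephone, Internet, radio, or satellite communication, the communication circuit 27 receives, for example, speech uttered by a person with a speech or language disorder. The communication circuit 27 inputs external speech or data representing speech to the signal processing unit 22, and outputs the speech information generated by the speech information generation unit 23 to the external network.
【００５０】
The communication circuit 27 may also cause teletext or text radio broadcasts to be displayed on the display section 26 via the signal processing unit 22 and the speech information generation unit 23. In this case the communication circuit 27 has a tuner function for receiving teletext and the like and receives the data the user wants.
【００５１】
With the hearing aid 1 configured in this way, even when speech produced after laryngectomy with an electric artificial larynx is input to the microphone 21, the signal processing unit 22 recognizes the speech and the speech information generation unit 23 generates speech information for output using the speech data sampled before laryngectomy and stored in the storage unit 24. The speaker unit 25 can therefore output a voice that approximates the user's voice before laryngectomy.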
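【００５２】
The above description of the hearing aid 1 took as an example the case where the microphone 21 detects the speech of a person who has undergone laryngectomy, but the speech may instead come from a person with an articulation disorder, one of the speech disorders caused by hearing impairment. In this case the hearing aid 1 stores the disordered speech in the storage unit 24 as speech data. When the speaker utters something, the signal processing unit 22 performs speech recognition with reference to that speaker's stored speech data, and the speech information generation unit 23 combines speech data according to the recognition result to generate speech information. The speaker unit 25 can thus output speech free of the disorder, and the display section 26 can display the speech content based on the speech information.
【００５３】
The hearing aid 1 can therefore display on the display section 26 the speech that, for example, a laryngectomee produces with a speech production substitute, allowing unnatural speech to be corrected.
【００５４】
Furthermore, a person with an articulation disorder caused by hearing impairment receives no feedback for phonation, so that, for example, "kyou wa" (今日は) comes out as "kyonwaa". By performing the processing described above, the hearing aid 1 can correct this to the normal "kyou wa" and output it from the speaker unit 25.
【００５５】
Because the hearing aid 1 has the display section 26, it can output the speaker's voice from the speaker unit 25 as normal speech and also display the content of the speech, providing a system well suited to language training for people with speech disorders or hearing loss.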
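【００５６】
Next, various examples are described that can be applied when the speech information generation unit 23 processes and converts the recognition result from the signal processing unit 22 to generate speech information, and when it combines speech data. The conversion processing and so on are not limited to the examples given below.
【００５７】
When converting the recognition result from the signal processing unit 22, the speech information generation unit 23 may use artificial intelligence techniques to process and convert the recognition result into speech information, for example by means of a spoken dialogue system. Elderly people with reduced hearing in particular sometimes ask their conversation partner to repeat what was said. By processing and converting the recognition result with such a system, the hearing aid 1 and the user can hold a dialogue in which the user obtains the previously stored content of what the partner said. This improves the user's recognition of the speech and saves the trouble of asking again.
【００５８】
Such a system can be realized with a spoken dialogue system with facial expressions, which is a multimodal dialogue system. A multimodal dialogue system combines, as modalities, technical elements such as direct manipulation and pen gesture input using a pointing device and tablet, text input, speech input/output such as speech recognition, virtual reality techniques that use human vision, hearing, touch, and force sensation, and nonverbal modality techniques. The speech information generation unit 23 uses each modality as a means of supplementing linguistic information, as contextual information for the dialogue (or a means of supplementing it), and as a means of reducing the user's cognitive load or psychological resistance. A gesture interface may be used as the nonverbal interface. In that case, gesture measurement with wearable sensors requires gesture tracking and may use glove-type devices or magnetic or optical position measurement, while non-contact gesture measurement may use video in which markers are analyzed stereoscopically or 3D reconstruction.
【００５９】
Details of this multimodal dialogue system are given in Nagao K and Takeuchi A, Speech dialogue with facial displays: Multimodal human-computer conversation. Proc 32nd Ann Meeting of the Association for Computational Linguistics, 102-9, Morgan Kaufmann Publishers, 1994, and in Takeuchi A and Nagao K, Communicative facial displays as a new conversational modality. Proc ACM/IFIP Conf on Human Factors in Computing Systems (INTERCHI'93), 187-93, ACM Press, 1993.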
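【００６０】
As a spoken dialogue system using such an artificial intelligence function, a system can be used in which the speech detected by the microphone 21 undergoes A/D conversion, acoustic analysis, and vector quantization in the signal processing unit 22, after which a speech recognition module generates the best word-level hypotheses with the highest scores. Here the speech information generation unit 23 estimates phonemes from the vector quantization codes using phoneme models based on hidden Markov models (HMM) and generates a word string. It converts the generated word string into a semantic representation with a syntactic and semantic analysis module. In doing so, it performs syntactic analysis with a unification grammar and then resolves ambiguity using a frame-based knowledge base and a case base (sentence patterns obtained by analyzing example sentences). Once the semantic content of the utterance is determined, a plan recognition module recognizes the user's intention. This is based on a model of the user's beliefs, which is dynamically revised and extended as the dialogue proceeds, and on plans concerning the goals of the dialogue. In the course of recognizing the intention, topic management, resolution of pronoun anaphora, completion of ellipses, and so on are carried out. A module that generates a cooperative response based on the user's intention is then started. This module generates an utterance by filling previously prepared utterance-pattern templates with information about the response obtained from domain knowledge. The response is turned into speech by a speech synthesis module. The processing performed by the signal processing unit 22 and the speech information generation unit 23 can also be realized by the processing described in, for example, (Nagao K, A preferential constraint satisfaction technique for natural language analysis. Proc 10th European Conf on Artificial Intelligence, 523-7, John Wiley & Sons, 1992), (Tanaka H, Natural language processing and its applications, 330-5, 1999, 電子情報通信学会編, コロナ社), and (Nagao K, Abduction and dynamic preference in plan-based dialogue understanding. Proc 13th Int Joint Conf on Artificial Intelligence, 1186-92, Morgan Kaufmann Publishers, 1993).
【００６１】
As processing that uses the artificial intelligence function, the speech information generation unit 23 also personifies the system: from speech recognition, syntactic and semantic analysis, and plan recognition, it adjusts facial expression parameters and shows facial animation on the display section 26. Using these visual means reduces the user's cognitive load and psychological resistance toward spoken dialogue. For this processing, the speech information generation unit 23 can use the processing described in FACS (Facial Action Coding System; Ekman P and Friesen WV, Facial Action Coding System. Consulting Psychologists Press, Palo Alto, Calif, 1978).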
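【００６２】
The speech information generation unit 23 may also be a spoken dialogue computer system (see Nakano M et al, 柔軟な話者交代を行う音声対話システムDUG-1, 言語処理学会第５回年次大会論文集, 161-4, 1999), that is, an artificial intelligence system using speech and images that combines an incremental understanding method for spoken language (Incremental Utterance Understanding: Nakano M, Understanding unsegmented user utterances in real-time spoken dialogue systems. Proc of the 37th Ann Meeting of the Association for Computational Linguistics, 200-7) with an incremental generation method whose content can be changed as it proceeds (Incremental Utterance Production: Dohsaka K and Shimazu A, A computational model of incremental utterance production in task-oriented dialogues. Proc of the 16th Int Conf on Computational Linguistics, 304-9, 1996; Dohsaka K and Shimazu A, System architecture for spoken utterance production in collaborative dialogue. Working Notes of IJCAI 1997 Workshop on Collaboration, Cooperation and Conflict in Dialogue Systems, 1997; and Dohsaka K et al, 複数の対話ドメインにおける協調的対話原則の分析, 電子情報通信学会技術研究報告 NLC-97-58, 25-32, 1998). In this case the understanding and response processes of the speech information generation unit 23 run in parallel. The unit also uses the ISTAR protocol (see Hirasawa J, Implementation of coordinative nodding behavior on spoken dialogue systems, ICSLP-98, 2347-50, 1998) to send word candidates to the language processing section incrementally while speech recognition is in progress.
【００６３】
That is, by using the techniques employed in the spoken dialogue system DUG-1, the hearing aid 1 recognizes speech from the user and/or from outside in units of a given amount of data (a phrase, for example) and generates speech information. The speech information generation unit 23 can stop and start speech recognition and speech information generation at any time in response to speech from the user and/or from outside, which makes processing efficient. Because the hearing aid 1 can control speech recognition and speech information generation according to the user's speech, it can also handle changes of speaker flexibly. In other words, when speech from the user and/or from outside is detected while speech information is being generated, the hearing aid can change its processing, for example by changing the content of the speech information presented to the user.
【００６４】
The speech information generation unit 23 may also use keyword spotting to understand the user's free utterances (Takabayashi Y, 音声自由対話システム TOSBURG II -使用者中心のマルチモーダルインターフェースの実現に向けて-. 信学論 vol J77-D-II No.8, 1417-28, 1994).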
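【００６５】
The speech information generation unit 23 may perform conversion that handles, for example, accent before outputting speech information. In this case it converts the speech information as needed so that the strength of the accent changes for particular pronunciations.
【００６６】
When synthesizing speech data, the speech information generation unit 23 may generate speech information using rule-based speech synthesis where speech of arbitrary content must be synthesized, speech synthesis with variable-length units to obtain smooth speech, prosody control to obtain natural speech, and voice quality conversion to give the speech individuality. This can be realized by applying the techniques described in, for example, the book "自動翻訳電話", ed. ATR国際電気通信基礎技術研究所, pp. 177-209, オーム社, 1994.
【００６７】
High-quality speech can also be synthesized using vocoder processing, for example by applying the speech analysis, transformation, and synthesis method STRAIGHT (speech transformation and representation based on adaptive interpolation of weighted spectrogram) (see Maeda N et al, Voice Conversion with STRAIGHT. TECHNICAL REPORT OF IEICE, EA98-9, 31-6, 1998).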
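【００６８】
Furthermore, by using text-to-speech synthesis, which creates speech from text information, the speech information generation unit 23 can adjust the information about the content of what is said (phonological information) and the information about pitch and loudness (prosodic information) to the pitch that is easiest for the hearing-impaired person to hear, in line with the characteristics of that person's hearing loss. It also performs conversion of speech features such as speech rate conversion (voice speed converting) and frequency compression. In addition, it applies to the speech information frequency band expansion, which adjusts the band of the output speech, speech enhancement, and the like. Band expansion and speech enhancement can be realized with the techniques shown in, for example, Abe M, "Speech Modification Methods for Fundamental Frequency, Duration and Speaker Individuality," TECHNICAL REPORT OF IEICE, SP93-137, 69-75, 1994. As noted above, besides the case where the signal processing unit 22 and the speech information generation unit 23 perform speech recognition and then process and convert the recognition result, only the above processing may be performed before output to the speaker unit 25. The hearing aid 1 may output the recognition result and/or the result of only the above processing simultaneously or with a time difference. It may also output different content from the recognition result and/or the result of only the above processing on the right and left channels of the speaker unit 25 or the display section 26.
【００６９】
The speech information generation unit 23 need not be limited to understanding the language from the speech using the recognition result and constructing speech information from the speech data using the understood language; it may also process and convert, as needed, the language understood on the basis of the recognition result. That is, while constructing the speech information, it may perform speech rate conversion, which changes the speed at which the speech information is output to the speaker unit 25. This speech rate conversion is performed by selecting a speech rate appropriate to the user's condition.
【００７０】
Depending on the recognition result, the speech information generation unit 23 may also perform translation, for example converting Japanese speech information into English speech information for output, and in combination with the communication function this can be applied to an automatic translation telephone. It may further perform automatic abstracting, for example converting "United States of America" into the summary "USA" before outputting the speech information.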
【００７１】
Other automatic summarization processing that the speech information generation unit 23 may perform includes generation-based processing, which picks out cue expressions in a text that are likely to be useful for a summary and generates readable sentences from them (see McKeown K and Radev DR, Generating Summaries of Multiple News Articles. In Proc of 18th Ann Int ACM SIGIR Conf on Res and Development in Information Retrieval, 74-82, 1995, and Hovy E, Automated Discourse Generation using Discourse Structure Relations, Artificial Intelligence, 63, 341-85, 1993), and extraction-based processing, which treats a summary as a "clipping" and frames the problem so that objective evaluation is possible (see Kupiec J et al, A Trainable Document Summarizer, In Proc of 14th Ann Int ACM SIGIR Conf on Res and Development in Information Retrieval, 68-73, 1995; Miike S et al, A Full-text Retrieval System with a Dynamic Abstract Generation Function. Proc of 17th Ann Int ACM SIGIR Conference on Res and Development in Information Retrieval, 152-9, 1994; and Edmundson HP, New Method in Automatic Extracting. J of the ACM, 16, 264-85, 1969). The speech information generation unit 23 can also use the method described in Nakazawa M et al, Text summary generation system from spontaneous speech, 日本音響学会講演論文集 1-6-1, 1-2, 1998, in which important keywords are extracted using the Partial Matching Method and Incremental Reference Interval-Free continuous DP, and words are recognized using the Incremental Path Method.
【００７２】
Depending on the recognition result, the speech information generation unit 23 may also exercise control so that particular phonemes, vowels, consonants, accents, and the like are deleted, or so that a buzzer sound, yawn, cough, monotone sound, or the like is output together with the speech information in place of the speech. In this case it applies to the speech information processing that implements the methods described in Warren RM, Perceptual Restoration of Missing Speech Sounds, Science vol. 167, 392-393, 1970, and Warren RM and Obusek CJ, "Speech perception and phonemic restoration," Perception and Psychophysics vol. 9, 358-362, 1971.
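【００７３】
The speech information generation unit 23 may also use the recognition result to convert the sound quality to a horn tone before outputting the speech information. The horn tone is what a sound-collecting tube produces: speech in the band below about 2000 Hz is amplified with a gain of about 15 dB. In other words, the horn tone is the sound quality output by a technique that reproduces deep bass using tube resonance. The speech information generation unit 23 converts the speech information to a sound approximating the quality output with, for example, the acoustic wave guide technique made public in US PATENT 4628528. Here the speech information generation unit 23 may, for example, apply a filter that passes only low frequencies before outputting the speech information, or may apply various filters that pass only speech in given frequency bands, for example by using SUVAG (Systeme Universel Verbo-tonal d'Audition-Guberina) equipment.
【００７４】
When the speech information generation unit 23 judges that music has been input to the microphone 21, it may convert the speech information so that musical notes or colors are shown on the display section 26. To make the rhythm of the sound perceptible, it may also convert the speech information so that a signal blinks in time with the converted rhythm and show this on the display section 26.
【００７５】
When the speech information generation unit 23 judges that an alert sound such as an alarm has been input to the microphone 21, it may convert the speech information so that the display section 26 indicates that the alarm or the like was detected by the microphone 21, or so that the speaker unit 25 outputs a message explaining the alarm. For example, on hearing an ambulance siren or an emergency bell, the hearing aid not only displays this but also outputs "It's an ambulance" or "There's a fire" loudly from the speaker unit 25 and shows an image representing an ambulance or fire on the display section 26. The emergency can thus be conveyed to a hearing-impaired person and the worst outcome avoided.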
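【００７６】
The speech information generation unit 23 may also have a function for storing the conversion and synthesis processing it has performed in the past. This allows it to carry out learning that automatically improves that processing and so raise the efficiency of conversion and synthesis.
【００７７】
The signal processing unit 22 and the speech information generation unit 23 are not limited to generating a recognition result and speech information only for the speaker's voice and presenting them on the speaker unit 25 and/or the display unit 7; they may, for example, perform speech recognition only for particular noises. In short, the signal processing unit 22 and the speech information generation unit 23 perform speech recognition on the input sound and convert the recognition result according to the user's physical condition, usage conditions, and purpose of use, so that speech information is generated and output in a form the user finds easy to understand.
【００７８】
The above description of the hearing aid 1 covered the case where the speech information generation unit 23 generates and outputs speech information by combining speech data sampled in advance and stored in the storage unit 24. The speech information generation unit 23 may, however, include a speech data conversion section that converts the stored speech data when the data are combined to generate speech information. A hearing aid 1 with such a speech data conversion section can, for example, change the quality of the voice output from the speaker unit 25.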
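【００７９】
The above description also covered the case where speech data obtained by sampling the user's voice before laryngectomy is stored in the storage unit 24, but the storage unit 24 may hold several sets of sampled speech data rather than one. For example, it may store speech data sampled from the voice uttered before laryngectomy together with speech data approximating that voice, speech data of an entirely different voice quality, or speech data from which the pre-laryngectomy speech data can easily be generated. When several sets of speech data are stored in the storage unit 24, the speech information generation unit 23 may relate them to one another, for example by relational expressions, and use them selectively to generate speech information.
【００８０】
The hearing aid 1 described above generates and outputs speech information by synthesizing the speech data sampled and stored in the storage unit 24. The speech information generation unit 23 may instead apply vocoder processing to the speech information generated by synthesizing the stored speech data, converting it into a voice of a quality different from that of the sampled and stored speech data before output. In this case the speech information generation unit 23 applies STRAIGHT as an example of vocoder processing.
【００８１】
The signal processing unit 22 may also perform speaker recognition on the input speech and generate a recognition result for each speaker. It may then present information about each speaker to the user by outputting it, together with the recognition result, to the speaker unit 25 or the display section 26.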
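【００８２】
Speaker recognition in the hearing aid 1 may be based on vector quantization (Soong FK and Rosenberg AE, On the use of instantaneous and transitional spectral information in speaker recognition. Proc of ICASSP'86, 877-80, 1986). In speaker recognition using vector quantization, a preparatory step extracts parameters representing spectral features from the training speech data of each registered speaker and clusters them to create a codebook. The method assumes that the speaker's characteristics are reflected in the resulting codebook. At recognition time, the input speech is vector-quantized with the codebook of every registered speaker, and the quantization distortion (spectral error) is calculated over the whole input. Speaker identification or verification is then decided from these results.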
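The recognition step of the vector quantization method can be sketched as follows. The sketch assumes that each registered speaker's codebook has already been built by clustering (that step is omitted) and uses squared Euclidean distance as the distortion measure; the function names and the `codebooks` dictionary are illustrative assumptions.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def quantization_distortion(vectors, codebook):
    """Average distance from each input vector to its nearest codeword."""
    total = sum(min(squared_distance(v, c) for c in codebook) for v in vectors)
    return total / len(vectors)

def identify_speaker(vectors, codebooks):
    """Return the registered speaker whose codebook gives the lowest distortion.

    codebooks maps a speaker name to a list of codewords (hypothetical data).
    """
    return min(codebooks,
               key=lambda name: quantization_distortion(vectors, codebooks[name]))
```

For verification rather than identification, the distortion for the claimed speaker would instead be compared with a threshold.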
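【００８３】
Speaker recognition in the hearing aid 1 may also be based on HMMs (Zheng YC and Yuan BZ, Text-dependent speaker identification using circular hidden Markov models. Proc of ICASSP'88, 580-2, 1988). In this method, a preparatory step creates an HMM from the training speech data of each registered speaker. The method assumes that the speaker's characteristics are reflected in the transition probabilities between states and the output probabilities of symbols. At recognition time, the likelihood of the input speech is calculated under the HMM of every registered speaker and a decision is made. As the HMM structure, an ergodic HMM may be used instead of a left-to-right model.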
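The likelihood computation in the HMM method can be illustrated with the forward algorithm. This is a minimal sketch for discrete-output HMMs, assuming each speaker model is given as a tuple of start probabilities, a transition matrix, and an emission matrix; training the models is out of scope, and the names are hypothetical.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete-output HMM, by the forward algorithm.

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability that state s outputs symbol o
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for symbol in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][symbol]
                 for s in range(n)]
    return sum(alpha)

def identify_speaker_hmm(obs, models):
    """Return the speaker whose HMM assigns the observation the highest likelihood.

    models maps a speaker name to a (start, trans, emit) tuple.
    """
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))
```

For long observation sequences the products underflow, so a practical implementation would work with scaled or log probabilities.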
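【００８４】
The hearing aid 1 can also translate and output the speech input through the microphone 21 by performing the speech recognition (ATRSPREC), speech synthesis (CHATR), and language translation (TDMT) used in the ATR-MATRIX system (ATR音声翻訳通信研究所; see Takezawa T et al, ATR-MATRIX: A spontaneous speech translation system between English and Japanese. ATR J2, 29-33, June 1999).
【００８５】
The speech recognition (ATRSPREC) performs large-vocabulary continuous speech recognition, using speech recognition tools to build the acoustic and language models needed for recognition and to handle the steps from signal processing to search. The processing is organized as a self-contained set of tools that can be combined with one another. Speaker-independent recognition may be performed.
【００８６】
The speech synthesis (CHATR) selects, from a large number of speech units stored in a database in advance, the units best suited to the sentence to be output and joins them to synthesize speech, so smooth speech can be output. It can synthesize a voice resembling the speaker's by using the speech data closest to the speaker's voice. When performing this synthesis, the speech information generation unit 23 may judge from the input speech whether the speaker is male or female and synthesize with a corresponding voice.
【００８７】
The language translation (TDMT) judges sentence structure and uses dialogue examples to handle varied expressions, such as the casual expressions characteristic of conversation. Even when the microphone 21 fails to pick up part of an utterance, it performs partial translation, translating as much as it can, so that even when a whole sentence cannot be translated accurately, much of what the speaker wants to convey reaches the other party.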
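【００８８】
When performing the above speech recognition, speech synthesis, and language translation, the hearing aid can connect via the communication circuit 27 to a communication device such as a mobile telephone for two-way dialogue.
【００８９】
A hearing aid 1 that performs the above speech recognition, speech synthesis, and language translation offers, for example: use of a bidirectional Japanese-English speech translation system; near-real-time recognition, translation, and synthesis; full-duplex dialogue with no need to tell the system when speech begins; high-quality recognition, translation, and synthesis of natural utterances; and recognition even when the speech contains fillers such as "ano" and "eto" or somewhat casual expressions.
【００９０】
In the above speech recognition (ATRSPREC), the speech information generation unit 23 not only judges sentence structure on the basis of the recognition result from the signal processing unit 22 but also uses dialogue examples to generate speech information that handles varied expressions, such as the casual expressions characteristic of conversation. Even when the microphone 21 fails to pick up part of a conversation, the speech information generation unit 23 generates speech information for as much as it can. Thus, even when it cannot generate accurate speech information for a whole sentence, much of what the speaker wants to convey reaches the other party. In doing so, the speech information generation unit 23 may generate the speech information by translation (the partial translation function).
【００９１】
In the above speech synthesis (CHATR), the speech information generation unit 23 selects, from the speech data of a large number of speech units stored in a database in advance, the units best suited to the sentence to be output, joins them, and synthesizes speech to generate speech information. It thereby generates speech information for outputting smooth speech. It may also synthesize a voice resembling the speaker's by using the speech data closest to the speaker's voice, or judge from the input speech whether the speaker is male or female and synthesize with a corresponding voice.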
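【００９２】
The speech information generation unit 23 may also extract only the sound of a particular source from the speech picked up by the microphone 21 and output it to the speaker unit 25 and/or the display section 26. The hearing aid 1 can thereby artificially reproduce the cocktail party phenomenon, in which the sound of one particular source is picked out and heard from a mixture of sounds arriving from several sources.
【００９３】
The speech information generation unit 23 may also correct mishearings when generating speech information, using a method that corrects recognition results containing errors by means of phonologically similar examples (Ishikawa K and Sumida E: A computer recovering its own misheard - Guessing the original sentence from a recognition result based on familiar expressions - ATR J 37, 10-11, 1999). In this case the speech information generation unit 23 processes the result according to the user's physical condition, usage conditions, and purpose of use and converts it into a form that is easy for the user to understand.
【００９４】
The above description of the hearing aid 1 covered speech recognition and speech generation for speech detected by the microphone 21, but the hearing aid may include an operation input unit 28 operated by the user or another person, and the signal processing unit 22 may convert the data entered on the operation input unit 28 into speech and/or images. The operation input unit 28 may, for example, be worn on the user's finger, generate data by detecting finger movement, and output the data to the signal processing unit 22.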
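【００９５】
The hearing aid 1 may also have a character and/or image data generation mechanism in which, for example, the user draws characters and/or images by touching a liquid crystal screen or the like with a pen, and character and/or image data are generated from the image obtained by capturing the pen's trajectory. The hearing aid 1 has the signal processing unit 22 and the speech information generation unit 23 recognize, convert, and otherwise process the generated character and/or image data and then outputs the result.
【００９６】
Furthermore, the hearing aid 1 is not limited to having the signal processing unit 22 perform speech recognition using only the speech from the microphone 21 and the like. It may, for example, perform speech recognition using detection signals from a nasal sound sensor, an expiratory airflow sensor, and a neck vibration sensor worn by the user and/or a person other than the user, together with the signal from the microphone 21 and the like. By using these sensors in addition to the microphone 21, the hearing aid 1 can further raise the recognition rate of the signal processing unit 22.
【００９７】
The hearing aid 1 may also include, as shown in FIG. 2, a camera mechanism 29 that captures moving and still images with, for example, a digital camera equipped with autofocus and zoom functions, and may show those images on the display section 26. The camera mechanism 29 may, for example, be mounted integrally with the display unit 7 of FIG. 1. A digital camera may be used as the camera mechanism 29.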
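【００９８】
The camera mechanism 29 of the hearing aid 1 may also have an eyeglass function that applies image conversion to the captured image, distorting or enlarging it to suit the user's eyesight, astigmatism, or other visual condition, before it is shown on the display section 26.
【００９９】
Such a hearing aid 1 shows the captured image on the display section 26 by way of, for example, a signal processing circuit consisting of a CPU and the like connected to the camera mechanism 29. By presenting the user with, for example, an image of the speaker captured by the camera mechanism 29, the hearing aid 1 improves the user's recognition. The hearing aid 1 may also output the captured image to an external network via the communication circuit 27, and may receive images captured by a camera mechanism 29 from the external network and show them on the display section 26 via the communication circuit 27, the signal processing circuit, and so on.
【０１００】
Furthermore, in the hearing aid 1 the signal processing unit 22 may perform face recognition and object recognition on the captured image of the speaker, and the result may be shown on the display section 26 via the speech information generation unit 23. The hearing aid 1 thereby presents the user with the lips, facial expression, and overall demeanor of the person imaged and improves the user's speech recognition.
【０１０１】
Methods for extracting the individual characteristics of a face and identifying the person in face recognition using the imaging function include, but are not limited to, the following.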
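【０１０２】
One feature representation for identification by matching grayscale images is the so-called M feature: the pattern is turned into a mosaic, and the average intensity of the pixels in each block is taken as the block's representative value, so that the grayscale image is compressed into a low-dimensional vector. Another is the KL feature, a representation of grayscale face images in which the orthogonal basis images obtained by applying the Karhunen-Loeve (KL) expansion to a sample set of face images are called eigenfaces, and an arbitrary face image is described by a low-dimensional feature vector made up of the coefficients of its expansion in those eigenfaces. There is also identification by the KF feature, a low-dimensional feature spectrum obtained by first converting the matching pattern of the KL feature, which is based on dimensionality reduction by KL expansion of a set of face images, into a Fourier spectrum and then, as with the KL feature, reducing the dimensionality by KL expansion of the sample set. These methods can be used for face image recognition. Recognizing a face with them gives the computer personal identification information about who the conversation partner is, so the user obtains information about the partner and recognition of the speech information increases. Such processing is described in 小杉信: "ニューラルネットを用いた顔画像の識別と特徴抽出", 情処学ＣＶ研報, 73-2 (1991-07); Turk MA and Pentland AP, Face recognition using eigenfaces. Proc CVPR, 586-91 (1991-06); Akamatsu S et al, Robust face identification by pattern matching based on KL expansion of the Fourier spectrum, 信学論 vol J76-D-II No.7, 1363-73, 1993; and Edwards GJ et al, Learning to identify and track faces in image sequences, Proc of FG '98, 260-5, 1998.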
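The mosaic-based M feature described above reduces to block averaging. The sketch below assumes the grayscale image is a list of rows of pixel intensities and that `block` (a name chosen here for illustration) is the side length of each mosaic tile.

```python
def mosaic_feature(image, block):
    """Compress a grayscale image into a low-dimensional vector.

    The image is split into block x block tiles and each tile is replaced
    by its average intensity; tiles are read row by row. Tiles at the
    right and bottom edges may be smaller if the size is not a multiple
    of block.
    """
    rows, cols = len(image), len(image[0])
    feature = []
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            tile = [image[y][x]
                    for y in range(r, min(r + block, rows))
                    for x in range(c, min(c + block, cols))]
            feature.append(sum(tile) / len(tile))
    return feature
```

Two faces can then be compared by the distance between their feature vectors, which is far cheaper than comparing the images pixel by pixel.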
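【０１０３】
For object recognition, the hearing aid 1 prepares a mosaic of the pattern representing each object and identifies objects by matching against the image actually captured. It then tracks a matched object by detecting its motion vector. This increases recognition of the speech information generated from the sound emitted by the object. The object recognition can adopt the technique used in Ubiquitous Talker, proposed by Sony CSL (Nagao K and Rekimoto J, Ubiquitous Talker: Spoken language interaction with real world objects. Proc 14th IJCAI-95, 1284-90, Morgan Kaufmann Publishers, 1995).
【０１０４】
The hearing aid 1 may also capture a still image when a shutter is pressed, like a digital still camera. The camera mechanism 29 may also generate moving images and output them to the signal processing unit 22. The MPEG (Moving Picture Experts Group) format, for example, is used as the signal format when the camera mechanism 29 captures moving images. Furthermore, the camera mechanism 29 of the hearing aid 1 can capture three-dimensional images, imaging the speaker or the speaker's lips and showing them on the display section 26, which further improves the user's recognition.
【０１０５】
By recording and playing back the speech uttered by the user, the speech uttered by the other party, and/or images of the scene, such a hearing aid 1 allows review in language learning and so aids that learning.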
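【０１０６】
With the hearing aid 1, enlarging the image or otherwise processing it before display on the display section 26 lets the user confirm who the other party is and grasp the overall atmosphere, which improves the accuracy of listening. It also makes lip reading possible, which raises recognition.
【０１０７】
The hearing aid 1 may also have, for example, a switch mechanism that lets the user control whether the speech detected by the microphone 21 is output by the speaker unit 25, whether images and the like captured by the camera mechanism 29 are output by the display section 26, or whether both speech and images are output. When operated by the user, the switch mechanism controls the output from the speech information generation unit 23.
【０１０８】
As an example, the switch mechanism may detect speech from the user and/or someone other than the user and switch accordingly: on detecting the spoken word "音声" (speech), it switches so that the speech detected by the microphone 21 is output by the speaker unit 25; on detecting the spoken word "画像" (image), it switches so that images and the like captured by the camera mechanism 29 are output by the display section 26; and on detecting "音声、画像" (speech, image), it switches so that both speech and images are output. The hearing aid may thus have a switch control mechanism based on speech recognition. A gesture interface may also be used to provide a switch control system based on gesture recognition.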
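The speech-operated switch can be pictured as a lookup from the recognized command word to the set of active outputs. This sketch assumes the recognizer has already produced the command as a text string; the function name and the tuple layout are illustrative choices, not taken from the patent.

```python
def select_outputs(recognized):
    """Map a recognized command word to (speaker_on, display_on).

    Returns None when the recognized text is not a switch command, so the
    caller can leave the current output setting unchanged.
    """
    commands = {
        "音声": (True, False),       # "speech": speaker unit only
        "画像": (False, True),       # "image": display only
        "音声、画像": (True, True),   # "speech, image": both outputs
    }
    return commands.get(recognized)
```

For example, `select_outputs("画像")` returns `(False, True)`, while ordinary conversational input returns `None`.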
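【０１０９】
The switch mechanism may also have a function for changing the conditions under which the camera mechanism 29 captures images, by switching parameters of the camera mechanism 29 such as the zoom state.
【０１１０】
Next, various examples of mechanisms for outputting the speech information created by the speech information generation unit 23 in the hearing aid 1 are described. The present invention is of course not limited to the output mechanisms described below.
【０１１１】
In the hearing aid 1, the mechanism for outputting speech information is not limited to the speaker unit 25 and the display section 26; it may, for example, use bone conduction or skin stimulation. The mechanism may, for example, attach a small magnet to the eardrum or the like and vibrate the magnet, or transmit the signal to the cochlea through bone.
【０１１２】
Such a hearing aid 1 may, for example, have a pressure plate to which the signal obtained through conversion by the speech information generation unit 23 is output, or may use a tactile compensation technique such as a tactile aid based on skin stimulation. By using such bone vibration or skin stimulation techniques, the signal from the speech information generation unit 23 can be conveyed to the user. A hearing aid 1 using skin stimulation has a vibrator array for the tactile aid to which the speech information from the speech information generation unit 23 is input, and the speech otherwise output from the speaker unit 25 may be output through the tactile aid and the vibrator array.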
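【０１１３】
The above description of the hearing aid 1 gave an example of processing in which speech information is output as sound, but the recognition result may instead be presented to the user through, for example, an artificial middle ear. That is, the hearing aid 1 may present the speech information to the user as an electrical signal via a coil and a vibrator.
【０１１４】
The hearing aid 1 may also have a cochlear implant mechanism and present the recognition result to the user through the cochlear implant. That is, the hearing aid 1 may supply the speech information as an electrical signal to a cochlear implant system consisting of, for example, implanted electrodes and a speech processor, and so present it to the user.
【０１１５】
The hearing aid 1 may also have an auditory brainstem implant mechanism and present the speech information to the user through the auditory brainstem implant. That is, the hearing aid 1 may supply the speech information as an electrical signal to a brainstem implant system consisting of, for example, implanted electrodes and a speech processor, and so present it to the user.
【０１１６】
Depending on the user's physical condition, usage conditions, and purpose of use, for example for a hearing-impaired person who can perceive sound in the ultrasonic band, the hearing aid 1 may modulate and convert the recognition result and the processed recognition result into ultrasonic-band sound and output it as speech information. The hearing aid 1 may also generate a signal in the ultrasonic frequency band using an ultrasonic output mechanism (bone conduction ultrasound) and output it to the user via an ultrasonic vibrator or the like.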
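【０１１７】
The hearing aid 1 may also have a bone conduction unit, a system in which the contact of a headphone is placed against the tragus to produce bone conduction and the vibration of the tragus and the inner wall of the ear canal also becomes air-conducted sound, and may use that unit to present speech information to the user. As this bone conduction unit, ライブホン (made by Nippon Telegraph and Telephone Corporation), a headphone system for hearing-impaired people, can be used.
【０１１８】
The hearing aid 1 has been described as having several output means such as the speaker unit 25 and the display section 26; these output means may be used in combination, or each may be used alone. The hearing aid 1 may also output speech using the conventional hearing aid function of changing the sound pressure level of the speech input to the microphone 21 while presenting the recognition result through the other output means described above.
【０１１９】
The hearing aid 1 may also have a switch mechanism by which the speech information generation unit 23 controls whether the outputs from the speaker unit 25 and/or the display section 26 are produced simultaneously or with a time difference, and a switch mechanism that controls whether the output is produced several times or only once.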
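【０１２０】
The hearing aid 1 has been described with the example shown in FIG. 2, but it may instead have a CPU that performs a first process, applying the various processing and conversion described above to the input speech and displaying the result on the display section 26; a CPU that performs a second process, applying the various processing and conversion described above to the input speech and outputting the result to the speaker unit 25; and a CPU that performs a third process, displaying images captured by the camera mechanism 29.
【０１２１】
In such a hearing aid 1, the CPUs may operate independently to perform and output the first process or the second process; the CPUs may operate simultaneously to perform and output the first, second, and third processes; or the CPUs for the first and second, the first and third, or the second and third processes may operate simultaneously to produce output.
【０１２２】
The speech information generation unit 23 may also control the hearing aid 1 so that the outputs from the various output mechanisms described above are produced simultaneously or with a time difference, according to the user's physical condition, usage conditions, and purpose of use.
【０１２３】
Furthermore, the hearing aid 1 may have several CPUs, with at least one of the first to third processes described above performed by one CPU and the remaining processes performed by other CPUs.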
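【０１２４】
For example, in the hearing aid 1, one CPU may convert the input speech into text data and output it to the display section 26 while another CPU applies vocoder processing, for example the speech analysis and synthesis method STRAIGHT, to the input speech and outputs the result to the speaker unit 25. Alternatively, one CPU may convert the input speech into text data while another CPU applies STRAIGHT or similar processing to the same input speech for output to the speaker unit 25. That is, the hearing aid 1 may use different CPUs to apply different processing to the signal output to the speaker unit 25 and the signal output to the display section 26.
【０１２５】
Furthermore, the hearing aid 1 may have a CPU that performs the various processing and conversion described above and outputs the result to the various output mechanisms described above, and may also output the speech input to the microphone 21 without processing or conversion.
【０１２６】
Furthermore, the hearing aid 1 may have separate CPUs for one of the various kinds of processing and conversion described above and for the others.
【０１２７】
Furthermore, while the speech information generation unit 23 of the hearing aid 1 converts the recognition result, the processed recognition result, captured images, and so on as described above, the hearing aid may also, as in conventional devices, amplify the electrical signal obtained by detecting speech, apply tone adjustment, gain adjustment, compression adjustment, and the like, and output the result to the speaker unit 25.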
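【０１２８】
In the hearing aid 1, the processing described above may be carried out by combining, in the processing performed by the signal processing unit 22 and the speech information generation unit 23, Fourier transform and vocoder processing (STRAIGHT and so on), for example.
【０１２９】
The hearing aid 1 to which the present invention is applied has been described as a small type for personal use, but the invention may also be used in large units for group use (desktop training hearing aids and group training hearing aids).
【０１３０】
Visual displays include the HMD and the head-coupled display. Examples that can be used are: binocular HMDs (those that present a parallax image to each eye to allow stereoscopic viewing, and those that present the same image to both eyes to give an apparently large screen); monocular HMDs; see-through HMDs; displays with visual assistance or visual enhancement functions; eyeglass-type binoculars with autofocus that use a visual filter; systems that use a contact lens at the eyepiece; retinal projection types (Virtual Retinal Display, Retinal projection display, and intermediate retinal projection types); HMDs with gaze input (product name HAQ-200, 島津製作所); displays mounted on parts other than the head (neck, shoulder, face, eye, arm, hand, and so on); stereoscopic displays (projection-type object-oriented displays, for example the head-mounted projector: Inami M et al., Head-mounted projector (II) - implementation, Proc 4th Ann Conf of Virtual Reality Society of Japan, 59-62, 1999; link-type stereoscopic displays); and large-screen displays (spatial immersive displays), for example omnimax, CAVE (see Cruz-Neira C et al. Surrounded-screen projection-based virtual reality: The design and implementation of the CAVE, Proc of SIGGRAPH'93, 135-42, 1993), CABIN, a CAVE-type stereoscopic image display (Hirose M et al. 電子情報通信学会論文誌 Vol J81-D-II No.5, 888-96, 1998), compact ultra-wide field-of-view displays that combine the features of projection displays such as CAVE and of HMDs (Endo T et al. Ultra wide field of view compact display. Proc 4th Ann Conf of Virtual Reality Society of Japan, 55-58, 1999), and arch screens.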
【０１３１】
特に大画面のディスプレイのものは大型補聴器として用いるときに使用してもよい。また、上述した補聴器１では、音の再現方法としてバイノーラル方式（３次元音響システムはHead-Related Transfer Functionを用いた空間音源定位システムを用いる：例 Convolvotron & Acoustetron II（Crystal River Engineering）,ダイナミック型ドライバユニットとエレクトレットマイクロフォンを使用した補聴器TE-H50（Sony））を使用してもよく、実際と近い音場をつくったり、トランスオーラル方式（トラッキング機能付きのトランスオーラル方式が３次元映像再現におけるＣＡＶＥに対応する）を用いたりするものは主に大型の補聴器システムの場合に用いるのが好ましい。
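[0132]
Furthermore, the HMD 2 described above may include a three-dimensional position detection sensor at the top of the head. In a hearing aid 1 equipped with such an HMD 2, the display can be changed in accordance with the movement of the user's head.
[0133]
A hearing aid 1 that uses augmented reality (AR) includes sensors relating to the user's movements and generates an AR space using the information detected by the sensors together with the voice information detected by the microphone 21 and generated by the voice information generation unit 23. By cooperatively using a virtual reality (VR) system, which consists of various sensor systems, a system that integrates the VR-forming systems, and a display system, the voice information generation unit 23 can appropriately superimpose a VR space on the real space and thereby create an AR space that enhances the sense of reality. As a result, when a visual display is used, the hearing aid 1 lets the user take in information from the image in front of the face without greatly shifting the line of sight each time information arrives. The image is not merely in front of the eyes; its information is accepted naturally, as if it were really there, so that visual information can be received in a natural state. The following systems are available for implementing this.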
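[0134]
As shown in FIG. 3, to form an AR space, such a hearing aid 1 has a 3D graphics accelerator for generating virtual-environment images mounted inside the voice information generation unit 23 so that stereoscopic viewing of computer graphics is possible, and it is further equipped with a wireless communication system. To acquire information on the user's position and posture, a small gyro sensor (e.g., Datatec GU-3011) on the head and an acceleration sensor (e.g., Datatec GU-3012) at the user's waist are connected to the hearing aid 1 as the sensor 31. The AR space can be realized by a system in which the information from the sensor 31 is processed by the voice information generation unit 23 and then by scan converters 32a and 32b corresponding to the user's right and left eyes before the image is sent to the display unit 26 (see Ban Y et al., Manual-less operation with wearable augmented reality system, Proc 3rd Ann Conf of Virtual Reality Society of Japan, 313-4, 1998).
[0135]
In addition, in the hearing aid 1, the AR space can be strengthened by cooperatively using, in addition to the sensor 31, a situation recognition system (e.g., Ubiquitous Talker (Sony CSL)) and the other systems that form a VR system, namely the various sensor systems described below, the system that integrates the VR-forming systems, and the display system, together with the hearing aid 1. Multimodality can then be used to supplement the voice information.
[0136]
Such a VR or similar space is formed as follows: the user first sends information about himself or herself to the sensor 31, that information is sent to the system that integrates the VR-forming systems, and information is then sent from the display system to the user.
[0137]
The following devices can be used as the sensor 31 (information input system).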
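[0138]
In particular, devices for capturing human body movement or acting on space include: optical three-dimensional position sensors (ExpertVision HiRES & Face Tracker (MotionAnalysis)); magnetic three-dimensional position sensors (InsideTrack (Polhemus), 3SPACE system (Polhemus), Bird (Ascension Tech.)); mechanical 3D digitizers (MicroScribe 3D Extra (Immersion)); magnetic 3D digitizers (Model 350 (Polhemus)); sonic 3D digitizers (Sonic Digitizer (Science Accessories)); optical 3D scanners (3D Laser Scanner (Astex)); biosensors, which measure electricity in the body, such as Cyber Finger (NTT Human Interface Laboratories); glove-type devices (DataGlove (VPL Res), Super Glove (Nissho Electronics), Cyber Glove (Virtual Tech)); force feedback devices (Haptic Master (Nissho Electronics), PHANToM (SensAble Devices)); 3D mice (Space Controller (Logitech)); gaze sensors (eye movement analyzer (ATR Auditory and Visual Perception Research Laboratories)); systems for measuring whole-body movement (DataSuit (VPL Res)); motion capture systems (HiRES (Motion Analysis)); acceleration sensors (three-dimensional semiconductor acceleration sensor (NEC)); and HMDs with a gaze input function.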
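[0139]
To realize AR, tactile displays, tactile-pressure displays, and force displays that make use of the sense of touch are available in addition to the display unit 26. A tactile display conveys speech through touch, and adding touch to hearing makes it possible to improve speech recognition. Usable tactile displays include, for example, vibrator arrays (Optacon, tactile mouse, tactile vocoder, etc.) and tactile pin arrays (paperless braille, etc.). Others include water jets, air jets, PHANToM (SensAble Devices), and Haptic Master (Nissho Electronics). Specifically, the hearing aid 1 displays a VR keyboard in the VR space, and the processing in the signal processing unit 22 and the voice information generation unit 23 is controlled by the VR keyboard or a VR switch. This eliminates the need to prepare a physical keyboard or reach for a switch, makes operation easier for the user, and gives a wearing feel close to that of a hearing aid that is simply worn on the ear.
[0140]
As a vestibular display, a system (motion bed) that uses washout and washback to express a wide variety of accelerations even with a device of narrow range of motion can be used.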
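[0141]
Systems that integrate VR systems include, without limitation, the following: systems supplied as C or C++ libraries that support display and its database, device input, interference calculation, event management, and the like, where the user programs the application portion using the library; and systems that require no user programming, in which the database and event settings are made with application tools and the VR simulation is run directly. The individual systems of the hearing aid 1 may be connected by communication links. A broadband communication channel may be used to transmit the situation while preserving a high sense of presence.
The hearing aid 1 may also use the following techniques from the field of 3D computer graphics. The concept is to faithfully present as images what can happen in reality, to create unreal spaces, and to present as images even what is actually impossible. In 3D computer graphics this is made possible by the following modeling, rendering, and animation techniques. Modeling techniques for building complex and precise models include wireframe modeling, surface modeling, solid modeling, Bezier curves, B-spline curves, NURBS curves, Boolean operations, free-form deformation, free-form modeling, particles, sweeps, fillets, lofting, and metaballs. Rendering techniques for adding texture and shading in pursuit of realistic objects include shading, texture mapping, rendering algorithms, motion blur, anti-aliasing, and depth cueing. Animation techniques for moving the created models and simulating the real world include the key frame method, inverse kinematics, morphing, shrink-wrap animation, and the alpha channel. For sound rendering, the technique described in "Takala T, Computer Graphics (Proc SIGGRAPH 1992) Vol 26, No 2, 211-20" may be used.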
[0142]
Examples of such systems that integrate VR systems include Division Inc.'s systems (VR runtime software [dVS], VR space construction software [dVISE], VR development library [VC Toolkit]), SENSE8's WorldToolKit and WorldUp, Superscape's VRT, and Solidray's RealMaster. For generating VR without a model, the method described in "Hirose M et al., A study of image editing technology for synthetic sensation, Proc ICAT'94, 63-70, 1994" may be used.
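[0143]
In this embodiment, a portable hearing aid 1 in which the HMD 2 and the computer unit 3 are connected by the optical fiber cable 4 has been described, but the link between the HMD 2 and the computer unit 3 may be wireless, with information exchanged between them by radio, an infrared signal transmission method, or the like. Furthermore, not only may the link between the HMD 2 and the computer unit 3 be wireless, but the functions performed by the units shown in FIG. 2 may also be divided among a plurality of devices with wireless links between the devices, and at least the computer unit 3 may exchange information with the HMD 2 without being worn by the user. The functions performed by the units shown in FIG. 2 may likewise be divided among a plurality of wirelessly linked devices according to the user's physical condition, usage state, and purpose of use. The hearing aid 1 can thereby reduce the weight and volume of the devices worn by the user, increase the user's freedom of movement, and further improve the user's recognition.
[0144]
In the hearing aid 1, the processing performed by the signal processing unit 22 and the voice information generation unit 23 may be controlled, upgraded, repaired, and so on via the communication circuit 27. The hearing aid 1 can thereby receive repair, control, adjustment, and the like through the communication circuit 27 by way of the visual display, the auditory display, and so on.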
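[0145]
Since the hearing aid 1 to which the present invention is applied can present synthesized speech to the user by displaying it, it is applicable to, for example, office work (as a wearable computer), communications (application to automatic translation telephones, etc.), occupational medicine (mental health, etc.), medical settings (use in hearing tests), foreign language learning, speech training, entertainment (video games), personal home theaters, watching concerts and games, program production (animation, live-action video, news, music production), underwater use (conversation underwater while diving, etc.), intelligence and military activities, work under adverse conditions such as noise (construction sites, factories, etc.), sports (races of cars, yachts, and the like, adventures in the mountains or at sea, and communication and exchange of information between athletes or between athletes and coaches during matches and practice), work in outer space, transportation (pilots of spacecraft and airplanes), car navigation systems, various simulation tasks using VR and AR (telesurgery (microsurgery), etc.), education, training, internal medicine treatment, treatment of injuries and illnesses, politics, travel, shopping, marketing, advertising, religion, design, VR and AR using Fish-tank VR displays, autostereoscopic systems, telexistence visual systems, and the like in amusement parks and elsewhere, applications of telexistence and R-Cube, and reception work over the telephone or the Internet. It can be used not only by persons with speech and language disorders but also in a wide range of fields such as communication for seriously ill patients and persons with severe physical disabilities, and nursing-care schools.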
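[0146]
[Effects of the Invention]
As described in detail above, the hearing aid according to the present invention combines pre-stored voice data on the basis of the recognition result obtained by detecting speech from a person with a speech and language disorder, converts the resulting voice information into speech, and outputs it to the outside, and can also output external speech to the user. Persons who have speech and language disorders due to laryngectomy, resection of the tongue and floor of the mouth, articulation disorders, and the like can therefore speak in a natural voice, either the voice they originally had or one freely converted, and the user's hearing can be supplemented by outputting external speech to the user.
[0147]
The hearing aid according to the present invention includes conversion means that processes and converts the recognition result from the recognition means so as to change its content according to the user's physical condition, usage state, and purpose of use. It can therefore present the result of speech recognition according to the user's physical condition, usage state, and purpose of use, and can present the recognition result with little noise.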
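[Brief Description of the Drawings]
[FIG. 1] A block diagram showing an example of the external appearance of a hearing aid to which the present invention is applied.
[FIG. 2] A block diagram showing the configuration of a hearing aid to which the present invention is applied.
[FIG. 3] A block diagram showing a configuration for creating an AR space with a hearing aid to which the present invention is applied.
[Explanation of Reference Numerals]
1 voice generation device, 2 head-mounted display, 3 computer unit, 7 display unit, 8 user microphone, 11 external microphone, 21 microphone, 23 voice information generation unit, 24 storage unit, 25 speaker unit, 26 display unit
[0001]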
[Technical Field of the Invention]
The present invention relates to a hearing aid that processes and converts speech detected by a microphone or the like into a form that a hearing-impaired person can easily understand and presents it, and to a hearing aid that processes, converts, and outputs speech uttered by a person with a speech and language disorder, or speech produced by auxiliary devices or means used to correct such a disorder (for example, speech production substitutes after laryngectomy).
[0002]
[Prior art]
Air conduction and bone conduction methods have conventionally been used for hearing aids. Types of hearing aids include box-type, behind-the-ear, CROS (Contra-lateral Routing of Signal), and in-the-ear hearing aids. Classified by processing method, there are analog hearing aids and digital hearing aids. In addition, according to Kodera's report, hearing aids include large ones used by groups (desktop training hearing aids, group training hearing aids) and small ones for personal use (see Kodera K, Selection and evaluation of hearing aids, Illustrated Otolaryngology new approach, Medical View, 39, 1996).
[0003]
A digital hearing aid first generates digital data by performing A/D (analog/digital) conversion on the speech detected by a microphone. It then analyzes the input digital data by decomposing it into a frequency spectrum, for example by Fourier transform, and calculates a gain for each frequency band based on the perceived loudness of the sound. The digital data amplified for each frequency band is passed through a digital filter, D/A converted, and output again as sound to the user's ear. In this way, the digital hearing aid let the user hear the speaker's voice with little noise.
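The per-band amplification described above can be sketched as follows. This is a minimal illustration in Python and not the implementation of any particular hearing aid: the function names, the band edges, and the gain values are assumptions, and fixed gains stand in for the loudness-based gain calculation. A direct (slow) discrete Fourier transform is used so that the sketch needs no external library.

```python
import cmath
import math


def dft(x):
    # Direct discrete Fourier transform, O(n^2); fine for a short frame.
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    # Inverse transform; the input is conjugate-symmetric, so keep the real part.
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def apply_band_gains(frame, sample_rate, bands):
    # bands: list of (low_hz, high_hz, gain); values here are hypothetical.
    n = len(frame)
    spectrum = dft(frame)
    for k in range(n):
        # Bins above n/2 mirror the negative frequencies, so fold them back.
        freq = min(k, n - k) * sample_rate / n
        for low, high, gain in bands:
            if low <= freq < high:
                spectrum[k] *= gain
                break
    return idft(spectrum)
```

With a sample rate of 8000 Hz and a 64-sample frame, a 1000 Hz tone falls in a band defined as 500 to 2000 Hz and is scaled by that band's gain, while components in other bands are scaled separately.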
[0004]
Conventionally, a person who has a voice disorder due to, for example, laryngectomy loses the normal phonation mechanism based on vocal cord vibration and has difficulty producing speech.
[0005]
To date, substitute speech methods after laryngectomy can be broadly divided, according to the nature of the vibrating body serving as the sound source, into methods that use artificial materials such as a rubber membrane (whistle-type artificial larynx) or a buzzer (electric artificial larynx (transcutaneous type, implanted type)), and methods that use the hypopharyngeal or esophageal mucosa (esophageal speech, tracheoesophageal fistula speech, and tracheoesophageal fistula speech using voice prostheses). Other substitute speech methods that have been reported include those using the electromyogram produced when the lips are moved, speech training devices that use various speech processing techniques for persons whose speech is impaired by hearing loss, those using a palatograph, and those using a vibrator in the oral cavity.
[0006]
[Problems to be solved by the invention]
However, because the digital hearing aid described above merely amplifies the digital data for each frequency band, the microphone picks up surrounding sound indiscriminately and the noise is reproduced as it is, leaving the user with a feeling of discomfort; compared with analog hearing aids, there was no significant improvement in various hearing tests. Moreover, conventional digital hearing aids did not adapt the processing of the detected speech to the physical condition, usage state, and purpose of use of the hearing-impaired person.
[0007]
Therefore, an object of the present invention is to provide a hearing aid that can present the result of speech recognition according to the user's physical condition, usage state, and purpose of use, and can present the recognition result with little noise.
[0008]
In addition, a problem common to the substitute speech methods described above is that, because the speech is not produced by the vocal cord vibration of the person's own normal state before laryngectomy, the quality of the generated speech is poor and far removed from the voice the person originally produced.
[0009]
The present invention has been proposed in view of the circumstances described above, and its object is to provide a hearing aid that enables persons with speech and language disorders due to laryngectomy, resection of the tongue and floor of the mouth, articulation disorders, and the like to speak in a natural voice, either the voice they originally had or one freely converted, and that outputs external speech to the user so that natural conversation is possible.
[0010]
[Means for Solving the Problems]
A hearing aid according to the present invention that solves the above problems includes: acoustoelectric conversion means that detects speech uttered by a user with a speech and language disorder and/or external speech and generates a speech signal; speech recognition means that performs speech recognition processing based on the speech signal from the acoustoelectric conversion means; storage means that stores voice data; voice information generation means that generates voice information indicating the speech to be output by combining the voice data stored in the storage means based on the recognition result from the speech recognition means; user voice output means that converts the voice information generated by the voice information generation means into speech and outputs it to the outside; and external voice output means that outputs to the user, as speech from the outside, the recognition result recognized by the speech recognition means.
[0011]
Such a hearing aid outputs speech from the outside to the user and also outputs speech uttered with a disorder to the user.
[0012]
A hearing aid according to the present invention includes: acoustoelectric conversion means that detects external speech and generates a speech signal; recognition means that performs speech recognition processing using the speech signal from the acoustoelectric conversion means; conversion means that processes and converts the recognition result from the recognition means so as to change its content according to the user's physical condition, usage state, and purpose of use; output control means that generates a control signal for outputting the recognition result from the recognition means and/or the recognition result converted by the conversion means; and output means that, based on the control signal generated by the output control means, outputs the recognition result from the recognition means and/or the voice information that is the recognition result converted by the conversion means, and presents it to the user.
[0013]
Such a hearing aid changes the output result by changing the content of the recognition result by the conversion means, and presents the voice or the like changed by the conversion means to the user. According to such a hearing aid, the conversion method is freely changed according to the user's physical condition, usage state, and usage purpose, and the recognition result is presented.
[0014]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[0015]
The present invention is applied, for example, to a hearing aid 1 configured as shown in FIGS. 1 and 2. As shown in FIG. 1, the hearing aid 1 is a portable type in which a head-mounted display (HMD) 2 and a computer unit 3 that performs speech recognition, generation of voice information, and the like are connected by an optical fiber cable 4. The computer unit 3 is attached to a support unit 5 worn, for example, at the user's waist, is driven by power supplied from a battery 6 attached to the support unit 5, and also drives the HMD 2.
[0016]
The HMD 2 includes a display unit 7 disposed in front of the user, a user microphone 8 that detects speech from the user, a voice output unit 9 that outputs speech to the user, a support unit 5 that supports these units so that they are positioned on the user's head, and an external microphone 11 that detects speech from the outside.
[0017]
The display unit 7 is arranged in front of the user and displays, for example, the meaning content of the sound detected by the user microphone 8 and / or the external microphone 11 described later. The display unit 7 may display not only the meaning content of the above-mentioned sound but also other information in response to a command from the computer unit 3.
[0018]
The user microphone 8 is disposed in the vicinity of the user's mouth, and detects the voice uttered by the user. The user microphone 8 converts voice from the user into an electrical signal and outputs the electrical signal to the computer unit 3.
[0019]
The external microphone 11 is provided on the side surface of the disc-shaped voice output unit 9. The external microphone 11 detects speech from the outside, converts it into an electrical signal, and outputs it to the computer unit 3.
[0020]
The user microphone 8 and the external microphone 11 are not limited to the positions described above, and various microphones may be used according to the user's operation: bone conduction microphones, ultra-compact integrated transmitter-receiver microphones that pick up both air-conducted and bone-conducted sound (manufactured by Nippon Telegraph and Telephone Corporation), omnidirectional microphones, unidirectional (superdirective, etc.) microphones, bidirectional microphones, dynamic microphones, condenser microphones (electret microphones), zoom microphones, stereo microphones, MS stereo microphones, wireless microphones, ceramic microphones, magnetic microphones, and microphone arrays. A magnetic earphone can be used as the earphone. An echo canceller or the like may be used as the sound collection technique or transmission technique for these microphones. Conventionally used gain adjusters, tone adjusters, and output control devices (maximum output power control type, automatic recruitment control compression type, etc.) may also be applied to the microphones 8 and 11.
[0021]
Further, the user microphone 8 and the external microphone 11 need not be provided separately as in the example shown in FIG. 1 and may instead be configured integrally.
[0022]
The support unit 5 is made of, for example, an elastic material such as a shape memory alloy and can be fixed to the user's head so that the display unit 7, the user microphone 8, and the voice output unit 9 can be placed at predetermined positions. The support unit 5 shown in FIG. 1 has been described as an example in which a support member runs from the user's forehead to the back of the head to place the display unit 7 and the like at predetermined positions, but it goes without saying that a support unit of another type may be used, and the voice output unit 9 may be provided for both ears.
[0023]
The computer unit 3 is attached, for example, to the support unit 5 worn at the user's waist. As shown in FIG. 2, electrical signals detected and generated by, for example, the microphones 8 and 11 are input to the computer unit 3. The computer unit 3 includes a recording medium storing a program for processing the electrical signals, a CPU (Central Processing Unit) that performs speech recognition and voice information generation according to the program stored in the recording medium, and the like. The computer unit 3 need not be worn at the waist and may instead be integrated with the HMD 2 on the head.
[0024]
The computer unit 3 starts the program stored in the recording medium in response to the electrical signal generated from the speech detected by the user microphone 8 and/or the external microphone 11, and the CPU performs speech recognition processing to obtain a recognition result. The computer unit 3 thereby obtains, by means of the CPU, the content of the speech detected by the user microphone 8 and/or the external microphone 11.
[0025]
Next, the electrical configuration of the hearing aid 1 to which the present invention is applied will be described with reference to FIG. 2. The hearing aid 1 includes: a microphone 21, corresponding to the microphones 8 and 11, that detects speech and generates a speech signal; a signal processing unit 22, included in the computer unit 3, that receives the speech signal generated by the microphone 21 and performs speech recognition processing; a voice information generation unit 23, included in the computer unit 3, that generates voice information based on the recognition result from the signal processing unit 22; a storage unit 24, included in the computer unit 3, that stores voice data and whose contents are read by the signal processing unit 22 and the voice information generation unit 23; a speaker unit 25, corresponding to the voice output unit 9, that outputs speech using the voice information from the voice information generation unit 23; and a display unit 26, corresponding to the display unit 7, that displays the content indicated by the voice information using the voice information from the voice information generation unit 23.
[0026]
The microphone 21 detects, for example, speech uttered by the user using a substitute speech method after laryngectomy, or speech from the outside, and generates a speech signal based on that speech. The microphone 21 outputs the generated speech signal to the signal processing unit 22.
[0027]
The microphone 21 is disposed near the user's mouth and detects the speech uttered by the user. The microphone 21 also detects speech from the outside and generates a speech signal. In the following description, the microphone that detects the user's speech is referred to as the user microphone 8 as described above, the microphone that detects speech from the outside is referred to as the external microphone 11 as described above, and the two are collectively referred to simply as the microphone 21.
[0028]
Examples of the substitute speech method include the artificial larynx (electric type, whistle type), esophageal speech, and mechanisms realized by various voice reconstruction techniques.
[0029]
The signal processing unit 22 performs voice recognition processing using the voice signal from the microphone 21. For example, the signal processing unit 22 performs the speech recognition processing by performing processing according to a program for performing speech recognition processing stored in a memory provided inside. Specifically, the signal processing unit 22 performs processing for recognizing the audio signal from the microphone 21 as a language by referring to the audio data generated by sampling the user's audio and stored in the storage unit 24. As a result, the signal processing unit 22 generates a recognition result according to the sound signal from the microphone 21.
[0030]
The speech recognition processing performed by the signal processing unit 22 can be classified, for example, by the speech to be recognized and by the target speaker. Classified by the speech to be recognized, there are isolated word recognition and continuous speech recognition. Continuous speech recognition further includes continuous word recognition, sentence speech recognition, conversational speech recognition, and speech understanding. Classified by target speaker, there are the speaker-independent type, the speaker-dependent type, and the speaker-adaptive type. Speech recognition methods used by the signal processing unit 22 include methods using dynamic programming matching, methods using speech features, and methods using a hidden Markov model (HMM).
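Of the recognition methods listed above, dynamic programming matching can be illustrated with a short sketch. The following Python example is only an assumed, simplified form for isolated word recognition: each utterance is reduced to a one-dimensional sequence of feature values, the stored templates and the function names `dtw_distance` and `recognize` are hypothetical, and a practical recognizer would use multidimensional spectral features.

```python
def dtw_distance(a, b):
    # Dynamic programming (time warping) distance between two sequences.
    # cost[i][j] is the best accumulated distance aligning a[:i] with b[:j].
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Allow a step in either sequence alone or in both together,
            # which absorbs differences in speaking rate.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


def recognize(features, templates):
    # Return the word whose stored template is closest to the input.
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))
```

Because the alignment may repeat elements of either sequence, an utterance spoken more slowly than its template can still match it closely, which is the property that makes this kind of matching suitable for comparing input speech with stored voice data.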
[0031]
Further, the signal processing unit 22 performs speaker recognition using the input speech. In doing so, it generates a speaker recognition result by, for example, extracting features of the voice of the speaker talking with the user or using the frequency characteristics of that voice, and outputs the result to the voice information generation unit 23. The signal processing unit 22 performs speaker-independent recognition by a method using features that vary little between speakers, a multi-template method, or a statistical method. Speaker adaptation includes normalization of individual differences, mapping of speech data between speakers, updating of model parameters, and speaker selection. The signal processing unit 22 performs the above speech recognition according to the user's physical condition, usage state, and purpose of use.
[0032]
Here, the user's physical condition means the degree of the user's hearing loss or speech and language disorder; the usage state means the environment in which the user uses the hearing aid 1 (indoors, outdoors, in noise, etc.); and the purpose of use means the user's purpose in using the hearing aid 1, that is, improving recognition or making speech easier for the user to understand, for example talking with a large number of people, listening to music (opera, enka), listening to lectures, or talking with a person with a speech and language disorder.
[0033]
The signal processing unit 22 has a function of storing and learning the voice input to the microphone 21. Specifically, the signal processing unit 22 retains voice waveform data detected by the microphone 21 and uses it for later voice recognition processing. Thereby, the signal processing unit 22 further improves voice recognition. Further, the signal processing unit 22 can provide a learning function so that the output result is accurate.
[0034]
The storage unit 24 stores data representing the speech models that the signal processing unit 22 compares with the speech waveform generated by detecting the input speech when it recognizes that speech. The storage unit 24 also stores, as voice data, for example the voice that a user uttered before laryngectomy, when the user still had a phonation mechanism based on vocal cord vibration, and data obtained by sampling in advance a voice that the user wishes to have output.
[0035]
The voice information generation unit 23 generates voice information using the recognition result from the signal processing unit 22 and the voice data indicating the user's voice stored in the storage unit 24. At this time, the voice information generation unit 23 combines the voice data stored in the storage unit 24 according to the recognition result, and processes and converts the recognition result to generate voice information. At this time, the voice information generation unit 23 generates voice information using a built-in CPU and a voice information generation program.
[0036]
In addition, the voice information generation unit 23 analyzes the speech using the recognition result and reconstructs the voice data according to the content of the analyzed speech, thereby generating voice information representing the speech. The voice information generation unit 23 then outputs the generated voice information to the speaker unit 25 and the display unit 26.
[0037]
Further, the voice information generation unit 23 processes, converts, synthesizes, or otherwise transforms the recognition result from the signal processing unit 22 according to the user's physical condition, usage state, and purpose of use to generate voice information. The voice information generation unit 23 also performs processing for presenting the speech detected by the microphone 21 to the user, using the recognition result and/or the processed recognition result.
[0038]
Furthermore, the voice information generation unit 23 may generate new voice information by modifying the voice information generated from the recognition result. At this time, the voice information generating unit 23 further improves the recognition of the user's voice by adding words that are easier for the user to understand based on the user's physical condition, usage state, and usage purpose.
[0039]
Furthermore, when outputting voice information to the display unit 26, the voice information generation unit 23 outputs the meaning of the speech as an image. For example, when speech from the user, from a speaker talking with the user, or from the outside is input and the signal processing unit 22 supplies a recognition result indicating an object, the voice information generation unit 23 outputs image data showing that object to the display unit 26 for display.
[0040]
Furthermore, the voice information generation unit 23 outputs again voice information that was previously output to the speaker unit 25 or the display unit 26, according to the recognition result from the signal processing unit 22. When, after outputting voice information, the voice information generation unit 23 determines that a recognition result has been input that indicates speech uttered because the user or a speaker wants to hear the information again, it outputs the voice information previously output to the speaker unit 25 or the display unit 26 once more. The voice information generation unit 23 may also output the previously output voice information again on the basis of a speaker recognition result obtained, for example, by extracting features of the speaker's voice or from the frequency characteristics of the voice. Further, the voice information generation unit 23 may conduct a spoken dialogue using an artificial intelligence function and thereby output again the voice information previously output to the speaker unit 25 or the display unit 26.
[0041]
Furthermore, the voice information generation unit 23 may switch whether or not to perform the re-output processing according to an operation input command from the operation input unit 28. That is, the user decides whether re-output is performed by operating the operation input unit 28, which thus serves as a switch.
[0042]
In addition, when outputting voice information again, the voice information generation unit 23 selects whether to output the previously output voice information or voice information different from it, according to the operation input signal from the operation input unit 28 input via the signal processing unit 22.
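The re-output behavior described in the preceding paragraphs can be sketched as a small controller. This Python sketch is an assumption made for illustration only: the class name `ReOutputController`, the example repeat phrases, and the use of plain strings for recognition results and voice information are all hypothetical, and the `enabled` flag stands in for the switch provided by the operation input unit 28.

```python
class ReOutputController:
    def __init__(self, repeat_phrases=("pardon", "say that again")):
        # Recognition results treated as a request to hear the output again
        # (hypothetical examples).
        self.repeat_phrases = set(repeat_phrases)
        # Corresponds to the on/off switch of the operation input unit 28.
        self.enabled = True
        # Voice information most recently sent to the speaker or display.
        self.last_output = None

    def set_enabled(self, enabled):
        self.enabled = enabled

    def handle(self, recognition_result, voice_information):
        # Re-output the previous voice information when a repeat request is
        # recognized; otherwise output and remember the new voice information.
        if (self.enabled
                and recognition_result in self.repeat_phrases
                and self.last_output is not None):
            return self.last_output
        self.last_output = voice_information
        return voice_information
```

When re-output is switched off, a repeat phrase is treated like any other utterance and its own voice information is output instead of the stored one.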
[0043]
The display unit 26 displays the voice indicated by the voice information generated by the voice information generation unit 23, the image captured by the camera mechanism 29, and the like.
[0044]
The operation input unit 28 may be a switch, a keyboard, a mouse, or the like, and generates an operation input signal when operated by a user.
[0045]
In such a hearing aid 1, the signal processing unit 22 performs speech recognition on the speech detected by the microphone 21, and the voice information generation unit 23 starts a program based on the recognition result, so that processing suited to the user can be performed. Because the hearing aid 1 thereby outputs the speech from the microphone 21 to the speaker unit 25 and also displays it on the display unit 26, the user's recognition of speech can be improved. This is supported by the McGurk effect (see McGurk H and MacDonald J: Hearing lips and seeing voices, Nature 264, 746-8, 1976), by the report that infants have already acquired the correspondence between auditory speech information and visual mouth-shape information (Kuhl PK et al., Human processing of auditory-visual information in speech perception, ICSLP'94 S11.4, Yokohama, 1994), by reports that vision affects the perceived direction of a sound source (the ventriloquism effect), and by reports that humans unconsciously learn whether or not something is a sound source, all of which suggest that human communication is inherently multimodal (see Saitou H and Mori T: Visual and auditory cognition, Ohmsha, 119-20, 1999). These findings indicate that vision influences hearing, and displaying the recognition result and the like on the display unit 26 supplements the voice information and improves the user's recognition of speech. With the hearing aid 1, the meaning of the speech can be conveyed to the speaker not only by voice but also through images displayed on the display unit 26, enabling dialogue.
[0046]
Furthermore, according to the hearing aid 1, the meaning of the speech displayed on the display unit 26 and the content of the speech output from the speaker unit 25 can be changed according to the result of recognizing the speech detected by the user microphone 8 and/or the external microphone 11, so the user's recognition of speech can be further improved. Thus, by having the voice information generation unit 23 execute a program that changes the speech recognition processing, the hearing aid 1 changes the recognition processing according to the physical condition (degree of hearing loss, etc.), usage state, and purpose of use, and can further improve recognition by displaying semantic information about the speech that the user can easily understand.
[0047]
The speaker unit 25 outputs the speech generated by the voice information generation unit 23. The speaker unit 25 may, for example, output the user's speech toward the person the user is talking to, or may output the speech uttered by the user toward the user's own ear. The speaker unit 25 that outputs speech toward the user's ear may use a dynamic or electrostatic (condenser type, electrostatic type) conversion method, and may take the form of headphones (open-air type, closed type, or in-the-ear type such as canal type). The speaker unit 25 may also be the speaker of a conventional hearing aid, loudspeaker, or sound collector, and the speaker unit 25 that outputs the user's speech toward the other party may be a conventionally used speaker device.
[0048]
Further, the speaker unit 25 may output a sound whose phase is opposite to that of the sound output based on the voice information. This removes the noise component contained in the sound output from the speaker unit 25, so that sound with less noise is output to the user and/or to the speaker.
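The cancellation principle behind opposite-phase output can be illustrated with a minimal sketch. It assumes, purely for illustration, that an estimate of the noise component is available as a list of samples; the names `invert_phase` and `mix` are hypothetical and are not taken from the hearing aid 1.

```python
def invert_phase(samples):
    """Return the same waveform with opposite phase (every sample negated)."""
    return [-s for s in samples]


def mix(a, b):
    """Add two equally long signals sample by sample."""
    return [x + y for x, y in zip(a, b)]


# Illustrative data only: a short "voice" signal and an assumed noise estimate.
speech = [0.5, -0.25, 0.125, 0.0]
noise = [0.1, 0.2, -0.3, 0.05]

noisy = mix(speech, noise)                  # what would be output without cancellation
cleaned = mix(noisy, invert_phase(noise))   # adding the opposite-phase noise cancels it
```

In practice the noise estimate is never exact, so the cancellation is only partial; the sketch shows the ideal case.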
[0049]
The hearing aid 1 also includes a communication circuit 27 connected to an external communication network. The communication circuit 27 receives, for example, voice uttered by a person with a speech-language disorder via a communication network such as a telephone, a mobile phone, the Internet, wireless communication, or satellite communication. The communication circuit 27 passes the external voice and data representing that voice to the signal processing unit 22, and outputs the voice information generated by the voice information generation unit 23 to the external network.
[0050]
In addition, the communication circuit 27 may cause text broadcasts, text radio, and the like to be displayed on the display unit 26 via the signal processing unit 22 and the voice information generation unit 23. In this case, the communication circuit 27 has a tuner function for receiving text broadcasts and the like, and receives the data desired by the user.
[0051]
With the hearing aid 1 configured as described above, even when voice uttered using an electric artificial larynx after laryngectomy is input to the microphone 21, the signal processing unit 22 recognizes the voice, and the voice information generation unit 23 generates voice information representing the voice to be output by using the voice data, stored in the storage unit 24, that was sampled before the laryngectomy. The speaker unit 25 can therefore output voice that approximates the user's voice before laryngectomy.
[0052]
The above description of the hearing aid 1 to which the present invention is applied dealt with the case where the voice of a laryngectomized person is detected by the microphone 21, but the hearing aid 1 may also be used where voice is detected from a person with an articulation disorder, which is one of the speech-language disorders caused by hearing impairment. In this case, the hearing aid 1 stores the voice of the person with the speech-language disorder in the storage unit 24 as voice data. When that person speaks, the signal processing unit 22 performs voice recognition processing with reference to the voice data of the speaker stored in the storage unit 24, and the voice information generation unit 23 generates voice information by combining voice data according to the recognition result. The voice content based on the voice information can be displayed on the display unit 26.
[0053]
Therefore, with this hearing aid 1, displaying on the display unit 26 the voice that a laryngectomized person produces by a substitute speech method, for example, allows the unnatural voice to be corrected.
[0054]
Furthermore, a person with an articulation disorder caused by hearing impairment, for example, receives no auditory feedback on his or her own utterances, so that the utterance "Kyowa" ("today") comes out as "Konwaa". By performing the processing described above, the hearing aid 1 can correct this to the normal utterance "Kyowa" ("today") and output it from the speaker unit 25.
[0055]
Further, since the hearing aid 1 includes the display unit 26, it can output the speaker's voice from the speaker unit 25 as normal voice and also display the content of the speaker's voice, thereby providing a system suitable for language training and learning.
[0056]
Next, various examples applicable to the processing in which the voice information generation unit 23 processes and converts the recognition result from the signal processing unit 22 to generate voice information, and to the processing of combining voice data, will be described. The conversion processing and the like are not limited to the examples described below.
[0057]
When converting the recognition result from the signal processing unit 22, the voice information generation unit 23 may process and convert the recognition result using artificial intelligence techniques to generate voice information. The voice information generation unit 23 uses, for example, a voice dialogue system. An elderly person whose hearing has declined may have to ask the other party to repeat what was said. By converting the recognition result with such a system, the hearing aid 1 and the user can converse, the user can obtain the previously stored information on what the other party said, the user's recognition of the voice is improved, and the trouble of asking again is saved.
[0058]
Such a system can be realized by using a voice dialogue system with facial expressions, which is a multimodal dialogue system. This multimodal dialogue system combines, as modalities, direct input and pen gesture technology (input technologies that use a pointing device or a tablet), text input technology, voice input/output technology such as voice recognition, virtual reality technology that uses human vision, hearing, touch, and force sensation, and nonverbal modality technology. In this case, the voice information generation unit 23 uses each modality as a means of supplementing language information, as contextual information for the dialogue (or a means of supplementing it), and as a means of reducing the user's cognitive burden or psychological resistance. A gesture interface may be used as the nonverbal interface. In that case, gesture tracking is required to measure gestures for the gesture interface using worn sensors; glove-type devices or magnetic or optical position measurement are used for this, and a method based on 3D reconstruction may also be used.
[0059]
Details of this multimodal dialogue system are given in Nagao K and Takeuchi A, Speech dialogue with facial displays: Multimodal human-computer conversation. Proc. 32nd Ann Meeting of the Association for Computational Linguistics, 102-9, Morgan Kaufmann Publishers, 1994, and in Takeuchi A and Nagao K, Communicative facial displays as a new conversational modality. Proc ACM/IFIP Conf on Human Factors in Computing Systems (INTERCHI'93), 187-93, ACM Press, 1993.
[0060]
As a voice dialogue system using such an artificial intelligence function, any system can be used in which the voice detected by the microphone 21 undergoes A/D conversion, acoustic analysis, and vector quantization in the signal processing unit 22, after which the voice recognition module generates the best word-level hypotheses, that is, those with the highest scores. Here, the voice information generation unit 23 estimates phonemes from the vector quantization codes using phoneme models based on hidden Markov models (HMMs) and generates a word string. The voice information generation unit 23 converts the generated word string into a semantic representation with the syntactic/semantic analysis module. In doing so, it performs syntactic analysis using a unification grammar and then resolves ambiguity using a frame-based knowledge base and a case base (sentence patterns obtained by analyzing example sentences). After the meaning of the utterance is determined, the plan recognition module recognizes the user's intention. This is based on a model of the user's beliefs, which is dynamically revised and extended as the dialogue progresses, and on plans concerning the goal of the dialogue. In the course of recognizing the intention, topic management, pronoun anaphora resolution, ellipsis completion, and the like are performed. A module that generates a cooperative response based on the user's intention is then started. This module generates an utterance by embedding information about the response, obtained from domain knowledge, into a template utterance pattern prepared in advance. The response is converted to voice by the voice synthesis module. The processing performed by the signal processing unit 22 and the voice information generation unit 23 can also be realized by the processing described in the literature (Nagao K, A preferential constraint satisfaction technique for natural language analysis. Proc 10th European Conf on Artificial Intelligence, 523-7, John Wiley & Sons, 1992), (Tanaka H, Natural language processing and its applications, 330-5, 1999, edited by the Institute of Electronics, Information and Communication Engineers, Corona), and (Nagao K, Abduction and dynamic preference in plan-based dialogue understanding. Proc 13th Int Joint Conf on Artificial Intelligence, 1186-92, Morgan Kaufmann Publishers, 1993).
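The step of estimating phonemes from vector quantization codes with an HMM can be sketched with Viterbi decoding. This is a minimal illustration only: the two phoneme states, the three VQ code indices, and all probabilities below are invented for the example, and the description does not state which decoding algorithm the voice information generation unit 23 uses.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state (phoneme) sequence for a sequence of VQ codes."""
    # best[t][s]: probability of the most likely path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Choose the predecessor that maximizes the path probability into s.
            prev = max(states, key=lambda p: best[t - 1][p] * trans_p[p][s])
            best[t][s] = best[t - 1][prev] * trans_p[prev][s] * emit_p[s][observations[t]]
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path


# Hypothetical toy model: phonemes "k" and "a", VQ codes 0..2.
states = ["k", "a"]
start_p = {"k": 0.6, "a": 0.4}
trans_p = {"k": {"k": 0.5, "a": 0.5}, "a": {"k": 0.3, "a": 0.7}}
emit_p = {"k": {0: 0.7, 1: 0.2, 2: 0.1}, "a": {0: 0.1, 1: 0.3, 2: 0.6}}

phonemes = viterbi([0, 0, 2, 2], states, start_p, trans_p, emit_p)
```

Real recognizers work with log probabilities to avoid numerical underflow on long utterances; plain probabilities are used here only to keep the sketch short.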
[0061]
In addition, as processing using the artificial intelligence function, the voice information generation unit 23 personifies the system and uses the display unit 26 to adjust facial expression parameters and facial animation through voice recognition, syntactic/semantic analysis, and plan recognition, thereby using visual means to reduce the user's cognitive burden and psychological resistance toward voice dialogue. The voice information generation unit 23 may also perform the processing described in FACS (Facial Action Coding System; Ekman P and Friesen WV, Facial Action Coding System. Consulting Psychologists Press, Palo Alto, Calif, 1978).
[0062]
Furthermore, the voice information generation unit 23 is an artificial intelligence system using voice and images, in the manner of a spoken dialogue computer system (reference: Nakano M et al, Spoken dialogue system DUG-1, which performs flexible speaker change, Proc. of the 5th Annual Conference of the Language Processing Society of Japan, 161-4, 1999), that uses incremental utterance understanding (Nakano M, Understanding unsegmented user utterances in real-time spoken dialogue systems. Proc of the 37th Ann Meeting of the Association for Computational Linguistics, 200-7) and incremental utterance production (Dohsaka K and Shimazu A, A computational model of incremental utterance production in task-oriented dialogues. Proc of the 16th Int Conf on Computational Linguistics, 304-9, 1996; Dohsaka K and Shimazu A, System architecture for spoken utterance production in collaborative dialogue. Working Notes of IJCAI 1997 Workshop on Collaboration, Cooperation and Conflict in Dialogue Systems, 1997; and Dohsaka K et al, Analysis of collaborative dialogue principles in multiple dialogue domains, IEICE Technical Report NLC-97-58, 25-32, 1998). Here, the voice information generation unit 23 runs the understanding process and the response process in parallel. In addition, the voice information generation unit 23 uses the ISTAR protocol (see Hirasawa J, Implementation of coordinative nodding behavior on spoken dialogue systems, ICSLP-98, 2347-50, 1998) to send word candidates sequentially to the language processing unit while voice recognition is in progress.
[0063]
That is, by using the technology of the spoken dialogue system DUG-1, the hearing aid 1 recognizes voice from the user and/or from outside in units of a predetermined amount of data (for example, a sentence) and generates voice information. In the voice information generation unit 23, the voice recognition processing and the voice information generation processing can be stopped and started at any time according to the voice from the user and/or from outside, which makes the processing efficient. Further, since the hearing aid 1 can control the voice recognition processing and the voice information generation processing according to the user's voice, speaker changes can be handled flexibly. In other words, the hearing aid 1 can detect voice from the user and/or from outside while voice information is being generated, change the processing, and change the content of the voice information presented to the user.
[0064]
Furthermore, the voice information generation unit 23 may perform processing for understanding the user's free speech using keyword spotting (Takabayashi Y, Free-speech voice dialogue system TOSBURG II: toward the realization of a user-centered multimodal interface. IEICE Transactions vol. J77-D-II, No. 8, 1417-28, 1994).
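The idea of keyword spotting, picking task-relevant words out of unconstrained speech and ignoring everything else, can be sketched at the text level as follows. This is a simplified illustration that assumes a word sequence has already been produced by recognition; systems such as the one cited spot keywords in the acoustic signal itself. The function name `spot_keywords` and the example vocabulary are hypothetical.

```python
def spot_keywords(recognized_words, keywords):
    """Return (position, keyword) pairs for every keyword found in the word sequence."""
    hits = []
    for position, word in enumerate(recognized_words):
        normalized = word.lower().strip(".,!?")
        if normalized in keywords:
            hits.append((position, normalized))
    return hits


# Free speech with fillers; only the task vocabulary is picked out.
utterance = "um I would like to uh reserve a room for tomorrow".split()
vocabulary = {"reserve", "room", "tomorrow"}
hits = spot_keywords(utterance, vocabulary)
```

Fillers and words outside the vocabulary are simply skipped, which is what makes the approach tolerant of spontaneous phrasing.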
[0065]
The voice information generation unit 23 may also perform conversion processing that adds accents or the like and output the resulting voice information. In doing so, it converts the voice information so as to vary the strength of the accent on particular sounds as needed, and outputs it.
[0066]
When synthesizing voice data, the voice information generation unit 23 may generate voice information by performing rule-based voice synthesis for synthesizing voice of arbitrary content, voice synthesis using variable-length units for synthesizing smooth voice, prosody control for synthesizing natural voice, and voice quality conversion for giving the voice individuality. This can be realized, for example, by applying the technology described in the book "Automatic Translation Telephone", ATR International Telecommunications Research Institute, pp. 177-209, Ohmsha, 1994.
[0067]
High-quality voice can also be synthesized using vocoder processing. For example, this can be realized by applying the speech analysis and synthesis method STRAIGHT (speech transformation and representation based on adaptive interpolation of weighted spectrogram) (reference: Maeda N et al, Voice Conversion with STRAIGHT. TECHNICAL REPORT OF IEICE, EA98-9, 31-6, 1998).
[0068]
Further, by using text-to-speech synthesis technology, which generates voice from text information, the voice information generation unit 23 can adjust the information on the content of speech (phonological information) and the information on pitch and loudness (prosodic information) to suit the characteristics of the hearing-impaired person's hearing loss, and also performs voice feature conversion processing such as speech rate conversion (voice speed converting) and frequency compression. The voice information is also subjected to frequency band extension processing, which adjusts the bandwidth of the output voice, to voice enhancement processing, and the like. The band extension and voice enhancement processing can be realized, for example, using the technology described in Abe M, "Speech Modification Methods for Fundamental Frequency, Duration and Speaker Individuality," TECHNICAL REPORT OF IEICE, SP93-137, 69-75, 1994. Instead of performing voice recognition processing in the signal processing unit 22 and the voice information generation unit 23 and then processing and converting the recognition result as described above, the hearing aid 1 may perform only the conversion processing described here and output the result to the speaker unit 25. The hearing aid 1 may also output the recognition result and/or the result of performing only this conversion processing, either simultaneously or with a time difference. Further, the hearing aid 1 may output the recognition result and/or the result of performing only this conversion processing as different content on the right and left channels of the speaker unit 25 or the display unit 26.
[0069]
Furthermore, the voice information generation unit 23 need not only understand language from the voice using the recognition result and compose voice information from the voice data using the understood language; it may also process and convert the language understood from the recognition result as needed. For example, the voice information generation unit 23 may perform speech rate conversion processing that changes the speed at which the voice information is output to the speaker unit 25, selecting an appropriate speech rate according to the user's condition.
[0070]
Furthermore, the voice information generation unit 23 may perform translation processing that, for example, converts Japanese voice information into English voice information according to the recognition result and outputs it; this can also be applied to an automatic translation telephone. The voice information generation unit 23 may also perform automatic summarization (automatic abstracting), for example summarizing "United States of America" as "USA", and output the resulting voice information.
[0071]
Other automatic summarization processing performed by the voice information generation unit 23 includes, for example, generation-oriented processing, which picks out from the text cue expressions that appear useful for summarization and generates readable sentences based on them (see McKeown K and Radev DR, Generating Summaries of Multiple News Articles. In Proc of 18th Ann Int ACM SIGIR Conf on Res and Development in Information Retrieval, 74-82, 1995, and Hovy E, Automated Discourse Generation using Discourse Structure Relations, Artificial Intelligence, 63, 341-85, 1993), and extraction-oriented processing, which treats summarization as "cutting out" passages and frames the problem so that objective evaluation is possible (see Kupiec J et al, A Trainable Document Summarizer, In Proc of 14th Ann Int ACM SIGIR Conf on Res and Development in Information Retrieval, 68-73, 1995; Miike S et al, A Full-text Retrieval System with a Dynamic Abstract Generation Function. Proc of 17th Ann Int ACM SIGIR Conference on Res and Development in Information Retrieval, 152-9, 1994; and Edmundson HP, New Methods in Automatic Extracting, J of the ACM, 16, 264-85, 1969). Further, the voice information generation unit 23 can use, for example, the method described in Nakazawa M et al, Text summary generation system from spontaneous speech, Acoustical Society of Japan Proceedings 1-6-1, 1-2, 1998 (extracting important keywords using the Partial Matching Method and Incremental Reference Interval-Free continuous DP, and recognizing words using the Incremental Path Method).
[0072]
Furthermore, the voice information generation unit 23 may perform control so that, for specific phonemes, vowels, consonants, accents, and the like, a buzzer sound, yawning sound, coughing sound, monotonous sound, or the like is output together with the voice information instead of the sound simply being erased from the output. In doing so, the voice information generation unit 23 applies to the voice information the processing described, for example, in Warren RM, Perceptual Restoration of Missing Speech Sounds, Science vol. 167, 392-393, 1970, and in Warren RM and Obusek CJ, Speech perception and phonemic restoration, Perception and Psychophysics vol. 9, 358-362, 1971.
[0073]
Furthermore, the voice information generation unit 23 may use the recognition result to convert the sound quality into a horn tone and output the voice information. The horn tone is the tone obtained with a sound collecting tube, in which the band at or below about 2000 Hz is amplified by about 15 dB. That is, the horn tone is the sound quality produced by a technique that reproduces deep bass using tube resonance. The voice information generation unit 23 converts the voice into a sound quality approximating this, using for example the acoustic waveguide technology known from US Patent 4628528, and outputs the voice information. The voice information generation unit 23 may also output the voice information after filter processing that passes only low sounds, for example, or may use a SUVAG (Systeme Universal Verbo-tonal d'Audition-Guberina) device, for example, to apply various kinds of filter processing that pass only sound in predetermined frequency bands and then output the voice information.
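The horn-tone characteristic described above (roughly 15 dB of gain at or below about 2000 Hz) can be approximated digitally by scaling the low-frequency bins of a signal's spectrum. The sketch below is an assumption-laden illustration, not the acoustic waveguide technique of the cited patent: it uses a plain O(n²) DFT from the standard library for self-containment, applies an idealized hard cutoff, and the names `horn_tone`, `cutoff_hz`, and `boost_db` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (fine for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each output sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def horn_tone(samples, sample_rate, cutoff_hz=2000.0, boost_db=15.0):
    """Boost every component at or below cutoff_hz by boost_db decibels."""
    n = len(samples)
    gain = 10 ** (boost_db / 20)          # 15 dB is an amplitude factor of about 5.6
    spectrum = dft(samples)
    for k in range(n):
        # Bins above n/2 mirror the negative frequencies, so fold them back.
        freq = min(k, n - k) * sample_rate / n
        if freq <= cutoff_hz:
            spectrum[k] *= gain
    return idft(spectrum)


def peak(samples):
    """Largest absolute sample value."""
    return max(abs(v) for v in samples)
```

A practical implementation would use an FFT or a low-shelf filter with a gradual transition rather than a hard cutoff, which can cause audible ringing.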
[0074]
Furthermore, when it determines that music has been input to the microphone 21, for example, the voice information generation unit 23 may convert the voice information so that musical notes or colors are displayed on the display unit 26. The voice information generation unit 23 may also convert the voice information so that a signal blinks in time with the rhythm of the voice, making the rhythm understandable, and display it on the display unit 26.
[0075]
Furthermore, when it determines that a signal tone such as an alarm has been input to the microphone 21, for example, the voice information generation unit 23 may convert the voice information so that the display unit 26 indicates that an alarm or the like has been detected, or may output the content of the alarm to the speaker unit 25. For example, when an emergency bell or the siren of an ambulance is heard, the hearing aid 1 not only displays that fact but also outputs it from the speaker unit 25, and can display an image representing an ambulance or a fire on the display unit 26, so that the emergency is conveyed to the hearing-impaired person and the worst situation can be avoided.
[0076]
Furthermore, the voice information generation unit 23 may have a function of storing the conversion and synthesis processing performed in the past. This enables the voice information generation unit 23 to perform learning processing that automatically improves on the conversion and synthesis processing performed in the past, improving the processing efficiency of the conversion and synthesis processing.
[0077]
Furthermore, the signal processing unit 22 and the voice information generation unit 23 are not limited to generating a recognition result only for the speaker's voice, generating voice information, and presenting it to the user via the speaker unit 25 and/or the display unit 7; for example, they may perform recognition only for specific noise. In short, the signal processing unit 22 and the voice information generation unit 23 perform recognition processing on the input sound and convert the recognition result according to the user's physical condition, usage state, and purpose of use, thereby generating and outputting voice information in a form that is easy for the user to understand.
[0078]
Furthermore, the above description of the hearing aid 1 to which the present invention is applied dealt with an example in which the voice information generation unit 23 generates and outputs voice information by combining voice data sampled in advance and stored in the storage unit 24. The voice information generation unit 23 may, however, include a voice data conversion unit that applies conversion processing to the stored voice data when generating voice information by combining the voice data stored in the storage unit 24. A hearing aid 1 including such a voice data conversion unit can, for example, change the sound quality of the voice output from the speaker unit 25.
[0079]
Furthermore, the above description of the hearing aid 1 to which the present invention is applied dealt with an example in which voice data obtained by sampling the user's voice in advance, before laryngectomy, is stored in the storage unit 24. The storage unit 24 may, however, store not just one set of voice data but a plurality of sets sampled in advance. That is, the storage unit 24 may store, for example, voice data obtained by sampling in advance the voice produced before laryngectomy and voice data approximating that voice, and may also store voice data of an entirely different sound quality or voice data from which the pre-laryngectomy voice data can easily be generated. When a plurality of sets of voice data are stored in the storage unit 24 in this way, the voice information generation unit 23 may relate the sets of voice data to one another using, for example, a relational expression, and generate voice information by using the voice data selectively.
[0080]
Moreover, although the hearing aid 1 described above generates and outputs voice information by synthesizing the voice data sampled and stored in the storage unit 24, the voice information generation unit 23 may apply vocoder processing to the voice information generated by synthesizing the voice data stored in the storage unit 24, thereby converting it into voice of a sound quality different from that of the sampled and stored voice data, and output it. In this case, the voice information generation unit 23 applies STRAIGHT, for example, as the vocoder processing.
[0081]
Furthermore, the signal processing unit 22 may perform speaker recognition processing on the input voice and generate a recognition result for each speaker. The signal processing unit 22 may then present information on each speaker to the user by outputting it, together with the recognition result, to the speaker unit 25 or the display unit 26.
[0082]
When the hearing aid 1 performs speaker recognition, vector quantization may be used (reference: Soong FK and Rosenberg AE, On the use of instantaneous and transitional spectral information in speaker recognition. Proc of ICASSP '86, 877-80, 1986). In speaker recognition using vector quantization, as preparation, parameters representing spectral features are extracted from training voice data for each registered speaker, and a codebook is created by clustering these parameters. The method based on vector quantization assumes that the characteristics of the speaker are reflected in the codebook so created. At recognition time, the input voice is vector-quantized with the codebook of every registered speaker, and the quantization distortion (spectral error) is calculated over the entire input voice. Speaker identification and verification decisions are made from this result.
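The codebook procedure just described can be sketched in a few functions: cluster each registered speaker's feature vectors into a codebook, then pick the speaker whose codebook quantizes the input with the least average distortion. This is an illustrative sketch under stated assumptions: plain k-means with a deterministic start stands in for the clustering step, squared Euclidean distance stands in for the spectral error, the two-dimensional "features" are made up, and all function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, codebook):
    """Index of the codebook entry closest to vec."""
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def train_codebook(vectors, size, iterations=10):
    """Cluster training vectors into `size` centroids (requires len(vectors) >= size)."""
    step = max(1, len(vectors) // size)
    codebook = [list(vectors[i * step]) for i in range(size)]   # deterministic start
    for _ in range(iterations):
        clusters = [[] for _ in range(size)]
        for v in vectors:
            clusters[nearest(v, codebook)].append(v)
        for i, members in enumerate(clusters):
            if members:   # keep the old centroid if a cluster becomes empty
                dim = len(members[0])
                codebook[i] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return codebook


def distortion(vectors, codebook):
    """Average quantization distortion of the whole input against one codebook."""
    return sum(sq_dist(v, codebook[nearest(v, codebook)]) for v in vectors) / len(vectors)


def identify(vectors, codebooks):
    """Identification: the registered speaker with the smallest distortion."""
    return min(codebooks, key=lambda name: distortion(vectors, codebooks[name]))


def verify(vectors, codebook, threshold):
    """Verification: accept the claimed speaker if distortion is below a threshold."""
    return distortion(vectors, codebook) < threshold


# Made-up 2-D features for two registered speakers.
codebooks = {
    "A": train_codebook([(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (1.1, 0.9)], size=2),
    "B": train_codebook([(5.0, 5.0), (5.2, 4.9), (6.0, 6.1), (5.9, 6.0)], size=2),
}
```

Calling `identify(features, codebooks)` on the feature vectors of an input utterance returns the name of the best-matching registered speaker, while `verify` answers the yes/no question for a single claimed identity.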
[0083]
When the hearing aid 1 performs speaker recognition, a method based on HMMs (reference: Zheng YC and Yuan BZ, Text-dependent speaker identification using circular hidden Markov models. Proc of ICASSP '88, 580-2, 1988) may also be used. In this method, as preparation, an HMM is created from the training voice data of each registered speaker. The HMM-based method assumes that the characteristics of the speaker are reflected in the state transition probabilities and the symbol output probabilities. At the speaker recognition stage, the likelihood of the input voice is calculated under the HMM of every registered speaker, and the decision is made from these likelihoods. As the HMM structure, an ergodic HMM may be used in place of a left-to-right model.
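The likelihood comparison in the HMM-based method can be sketched with the forward algorithm. The example below is a toy illustration: each registered speaker is given a hand-written two-state ergodic HMM over two discrete symbols (every state can reach every other state), whereas real models would be trained from the speaker's voice data. The model values and function names are invented for the example.

```python
def forward_likelihood(model, observations):
    """Probability of the observation sequence under a discrete HMM (forward algorithm)."""
    start, trans, emit = model["start"], model["trans"], model["emit"]
    n = len(start)
    # alpha[s]: probability of the observations so far, ending in state s
    alpha = [start[s] * emit[s][observations[0]] for s in range(n)]
    for obs in observations[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][obs]
                 for s in range(n)]
    return sum(alpha)


def identify_speaker(models, observations):
    """Return the registered speaker whose HMM gives the highest likelihood."""
    return max(models, key=lambda name: forward_likelihood(models[name], observations))


# Hypothetical ergodic models: speaker A mostly emits symbol 0, speaker B symbol 1.
models = {
    "A": {"start": [0.5, 0.5],
          "trans": [[0.6, 0.4], [0.4, 0.6]],
          "emit": [[0.8, 0.2], [0.7, 0.3]]},
    "B": {"start": [0.5, 0.5],
          "trans": [[0.6, 0.4], [0.4, 0.6]],
          "emit": [[0.2, 0.8], [0.3, 0.7]]},
}
```

A left-to-right model would differ only in its transition matrix, which would forbid transitions back to earlier states.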
[0084]
Furthermore, the hearing aid 1 can translate and output voice input through the microphone 21 by performing the voice recognition (ATRSPREC), voice synthesis (CHATR), and language translation (TDMT) used in the ATR-MATRIX system (made by ATR Speech Translation and Communications Research Institute; reference: Takezawa T et al, ATR-MATRIX: A spontaneous speech translation system between English and Japanese. ATR J 2, 29-33, June 1999).
[0085]
In the voice recognition (ATRSPREC), large-vocabulary continuous speech recognition is performed, and a set of speech recognition tools handles the construction of the acoustic models and language models required for recognition as well as the steps from signal processing to search. In this voice recognition, each process is completed as a tool within the tool set, and the tools can be combined. Speaker-independent voice recognition may also be performed.
[0086]
In the voice synthesis (CHATR), the units best suited to the sentence to be output are selected from a large number of voice units stored in advance in a database and concatenated to synthesize voice, so that smooth voice can be output. In this voice synthesis, voice resembling the speaker's voice can be synthesized by using the voice data closest to the speaker's voice. When performing this voice synthesis, the voice information generation unit 23 may also determine from the input voice whether the speaker is male or female and synthesize voice accordingly.
[0087]
In the language translation (TDMT), translation is performed using dialogue examples to handle the processing that determines the structure of a sentence as well as the varied expressions, such as colloquial expressions, that are peculiar to dialogue. Moreover, in this language translation, even if there are parts the microphone 21 failed to pick up, partial translation is performed on whatever can be translated, so that even when the whole sentence cannot be translated accurately, much of what the speaker wants to convey reaches the other party.
[0088]
Further, when performing the above voice recognition, voice synthesis, and language translation, the hearing aid 1 can carry out two-way dialogue with a communication device such as a mobile phone via the communication circuit 27.
[0089]
In a hearing aid 1 that performs such voice recognition, voice synthesis, and language translation, for example as a bidirectional speech translation system, recognition, translation, and synthesis take place in almost real time, there is no need to give an instruction to start speaking, high-quality recognition, translation, and synthesis of natural utterances are possible, and recognition remains possible even when the speech contains fillers such as "Ao" and "Ut" or somewhat casual expressions.
[0090]
Furthermore, in the voice recognition (ATRSPREC), the voice information generation unit 23 not only determines the structure of the sentence based on the recognition result from the signal processing unit 22 but also uses dialogue examples to generate voice information that handles the varied expressions, such as colloquial expressions, peculiar to dialogue. Even when part of the conversation could not be picked up by the microphone 21, the voice information generation unit 23 generates voice information for as much as it can. Thus, even when it cannot accurately generate voice information for the whole sentence, the voice information generation unit 23 conveys much of what the speaker wants to convey. In doing so, the voice information generation unit 23 may generate the voice information by performing translation processing (partial translation function).
[0091]
Also, in the voice synthesis (CHATR), the voice information generation unit 23 selects from the large amount of voice data stored in advance in a database the units best suited to the sentence to be output, concatenates them, and synthesizes voice to generate voice information. The voice information generation unit 23 thereby generates voice information for outputting smooth voice. The voice information generation unit 23 may also synthesize voice resembling the speaker's voice by using the voice data closest to the speaker's voice, or may determine from the input voice whether the speaker is male or female and generate voice information by synthesizing a corresponding voice.
[0092]
Furthermore, the voice information generation unit 23 may extract only the sound of a specific sound source from the sound picked up by the microphone 21 and output it to the speaker unit 25 and/or the display unit 26. The hearing aid 1 can thereby artificially produce the cocktail party phenomenon, in which only the sound of a specific source is picked out and heard from a mixture of sounds arriving from several sources.
[0093]
Furthermore, the speech information generation unit 23 may generate speech information after correcting mishearings, using a method that corrects a recognition result containing errors by means of phonologically similar examples (Ishikawa K, Sumida E: A computer recovering its own misheard - Guessing the original sentence from a recognition result based on familiar expressions. ATR J 37, 10-11, 1999). At this time, the speech information generation unit 23 performs processing according to the user's physical condition, usage state, and purpose of use, and converts the result into a form that is easy for the user to understand.
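The example-based correction described above can be illustrated with a minimal sketch. It is not the method of the cited paper: plain string similarity from Python's standard `difflib` module stands in for phonological similarity, and the example sentences, the function name `correct_with_examples`, and the cutoff value are all assumptions made for illustration.

```python
import difflib

# Hypothetical example sentences; a real system would hold a large example database.
EXAMPLES = [
    "where is the station",
    "what time is it",
    "nice to meet you",
]

def correct_with_examples(recognized, examples=EXAMPLES, cutoff=0.6):
    """Return the stored example most similar to the recognized text.

    String similarity is used here as a stand-in for phonological similarity.
    If no example is similar enough, the recognized text is returned unchanged.
    """
    matches = difflib.get_close_matches(recognized, examples, n=1, cutoff=cutoff)
    return matches[0] if matches else recognized
```

With these assumed examples, a misrecognized input such as "wear is the station" would be replaced by the stored sentence "where is the station", while an input that resembles no example is passed through as is.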
[0094]
In the description of the hearing aid 1 above, an example in which voice recognition processing and voice generation processing are performed on the voice detected by the microphone 21 has been described. However, the hearing aid 1 may include an operation input unit 28 operated by the user or the like, and the data input to the operation input unit 28 may be converted by the signal processing unit 22 into sound and/or images. Further, the operation input unit 28 may be attached to the user's finger, for example, and may generate data by detecting the movement of the finger and output the data to the signal processing unit 22.
[0095]
Further, the hearing aid 1 may be provided with a character and/or image data generation mechanism in which, for example, the user draws characters and/or images by touching a liquid crystal screen or the like with a pen, and character and/or image data are generated from an image obtained by capturing the trajectory. The hearing aid 1 outputs the generated character and/or image data after processing such as recognition and conversion by the signal processing unit 22 and the voice information generation unit 23.
[0096]
Furthermore, the hearing aid 1 described above is not limited to performing speech recognition in the signal processing unit 22 using only the sound from the microphone 21 or the like. For example, speech recognition may be performed using detection signals from a nasal sound sensor, a breath flow sensor, and a neck vibration sensor worn by the user and/or a person other than the user, together with the signal from the microphone 21 or the like. By using these sensors in addition to the microphone 21, the hearing aid 1 can further improve the recognition rate of the signal processing unit 22.
[0097]
Further, as shown in the figure, the hearing aid 1 may include a camera mechanism 29 that captures moving images, still images, and the like with a digital camera equipped with, for example, an autofocus function and a zoom function. This camera mechanism 29 may be mounted integrally with the display unit 7. A digital camera may be used as the camera mechanism 29.
[0098]
In addition, the camera mechanism 29 provided in the hearing aid 1 may have an eyeglass function: it performs image conversion that distorts or enlarges the captured image according to the user's visual acuity, astigmatism, or the like, and the result is displayed on the display unit 26.
[0099]
Such a hearing aid 1 displays the image captured by the camera mechanism 29 on the display unit 26 via, for example, a signal processing circuit including a CPU. The hearing aid 1 improves the user's recognition by presenting, for example, an image of the speaker captured by the camera mechanism 29 to the user. Further, the hearing aid 1 may output the captured image to an external network via the communication circuit 27, and may also receive an image captured by a camera mechanism 29 from the external network and display it on the display unit 26 via the communication circuit 27 and the signal processing circuit.
[0100]
Further, in the hearing aid 1, the signal processing unit 22 may perform face recognition processing and object recognition processing on an image of the speaker and display the result on the display unit 26 via the voice information generation unit 23. The hearing aid 1 thereby improves the user's recognition of speech by presenting the speaker's lips, facial expression, overall atmosphere, and the like to the user.
[0101]
There are the following methods for extracting a personality feature of a face and performing personal recognition in face recognition using an imaging function, but the method is not limited to these.
[0102]
A mosaic pattern is one feature representation used for identification by grayscale image matching: the image is divided into blocks, the average pixel density of each block is taken as that block's representative value, and the grayscale image is thereby compressed into a low-dimensional vector. This is called the M feature. There is also a feature representation of grayscale face images called the KI feature. The orthogonal basis images obtained by applying the Karhunen-Loeve (KL) expansion to a sample set of face images are called eigenfaces, and an arbitrary face image is described by a low-dimensional feature vector made up of the coefficients of its expansion in those eigenfaces. A further method performs identification using the KF feature: the matching pattern based on the KI feature, obtained by dimension compression through KL expansion of the face image set, is first converted to a Fourier spectrum, and KL expansion is then applied to the sample set in the same way as for the KI feature, yielding a low-dimensional feature spectrum. These methods can be used for face image recognition. Performing face recognition with them gives the computer personal identification information about who the interlocutor is, so the user obtains information about the interlocutor and recognition of the voice information improves. Such processing is described in "Shin Kosugi: "Identification and Feature Extraction of Facial Images Using Neural Networks", Information Processing CV Research Bulletin, 73-2 (1991-07)", "Turk MA and Pentland AP, Face recognition using eigenface. Proc CVPR, 586-91 (1991-06)", "Akamatsu S et al, Robust face identification by pattern matching based on KL expansion of the Fourier Spectrum II No. 7, 1363-73, 1993", and "Edwards GJ et al, Learning to identify and track faces in image sequences, Proc of FG '98, 260-5, 1998".
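As a rough illustration of the M feature described above, the sketch below averages pixel values over square blocks to form a low-dimensional vector and then picks the nearest stored vector. The function names, the block size, the nested-list image format, and the squared Euclidean distance used for matching are assumptions for illustration; the cited literature may use a different matching rule.

```python
def mosaic_feature(image, block):
    """Compress a grayscale image (list of rows) into block averages.

    Each block x block tile is replaced by its mean pixel value, and the
    means are returned as a flat vector, tile by tile, row by row.
    """
    rows, cols = len(image), len(image[0])
    feature = []
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            tile = [image[i][j]
                    for i in range(r, min(r + block, rows))
                    for j in range(c, min(c + block, cols))]
            feature.append(sum(tile) / len(tile))
    return feature

def nearest_face(feature, gallery):
    """Return the name in `gallery` whose stored vector is closest to `feature`.

    `gallery` maps a (hypothetical) person name to a mosaic feature vector.
    Squared Euclidean distance is an assumed matching criterion.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(gallery, key=lambda name: dist(feature, gallery[name]))
```

A 4x4 image with a block size of 2 is reduced to a 4-element vector, which shows how the mosaic compresses the image before matching.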
[0103]
In this hearing aid 1, when performing object recognition, a pattern representing an object is converted to a mosaic, and the object is identified by matching it against the actually captured image. The hearing aid 1 then tracks the object by detecting the motion vector of the matched object. This improves recognition of the voice information generated from the sound emitted by the object. For this object recognition processing, the technique used in Ubiquitous Talker, proposed by Sony CSL, can be adopted (Nagao K and Rekimoto J, Ubiquitous Talker: Spoken language interaction with real world objects. Proc 14th IJCAI-95, 1284-90, Morgan Kaufmann Publishers, 1995).
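The matching and tracking steps can be sketched as follows, under stated assumptions. The template is located in each frame by exhaustive search with the sum of absolute differences (an assumed matching score; the text does not specify one), and the motion vector is the displacement between the two matched positions. The function names and the nested-list frame format are hypothetical.

```python
def find_template(frame, template):
    """Return (row, col) of the best template match in the frame.

    The score is the sum of absolute differences (SAD); lower is better.
    Both arguments are grayscale images given as lists of rows.
    """
    th, tw = len(template), len(template[0])
    best_score, best_pos = None, None
    for r in range(len(frame) - th + 1):
        for c in range(len(frame[0]) - tw + 1):
            sad = sum(abs(frame[r + i][c + j] - template[i][j])
                      for i in range(th) for j in range(tw))
            if best_score is None or sad < best_score:
                best_score, best_pos = sad, (r, c)
    return best_pos

def motion_vector(prev_frame, next_frame, template):
    """Displacement (d_row, d_col) of the matched object between two frames."""
    r0, c0 = find_template(prev_frame, template)
    r1, c1 = find_template(next_frame, template)
    return (r1 - r0, c1 - c0)
```

In practice the frames and the template could first be reduced to mosaics, as in the text, so that the search runs on far fewer values.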
[0104]
Further, the hearing aid 1 may capture a still image when a shutter is pressed, like a digital still camera. The camera mechanism 29 may also generate a moving image and output it to the signal processing unit 22. As the signal format for capturing moving images with the camera mechanism 29, for example, the MPEG (Moving Picture Experts Group) format or the like is used. Furthermore, the camera mechanism 29 provided in the hearing aid 1 may capture a three-dimensional image of the speaker and the speaker's lips and display it on the display unit 26, which can further improve the user's recognition.
[0105]
Such a hearing aid 1 can be useful for review in language learning and the like, by recording and playing back the user's own speech, the other party's speech, and/or images of the scene.
[0106]
Further, by enlarging the image and displaying it on the display unit 26, the hearing aid 1 lets the user confirm the other party and grasp the overall atmosphere, which improves the accuracy of listening to speech and also makes it possible to raise recognition through lip reading.
[0107]
Furthermore, the hearing aid 1 may be provided with, for example, a switch mechanism that lets the user control whether the sound detected by the microphone 21 is output by the speaker unit 25, whether an image such as one captured by the camera mechanism 29 is output by the display unit 26, or whether both sound and image are output. In this case, the switch mechanism, operated by the user, controls the output from the voice information generation unit 23.
[0108]
Further, as an example, the switch mechanism may be a switch control mechanism based on speech recognition. It detects the voice of the user and/or a person other than the user and, for example, switches so that the sound detected by the microphone 21 is output by the speaker unit 25 when the word "voice" is detected, so that the image captured by the camera mechanism 29 is output by the display unit 26 when the word "image" is detected, and so that both sound and image are output when the words "sound, image" are detected. Alternatively, a switch control system based on gesture recognition through a gesture interface may be used.
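A minimal sketch of such a voice-controlled switch is shown below. The keywords are the ones given in the example above; the `Output` flag type, the `switch_output` function, and the rule that unrecognized words leave the current setting unchanged are assumptions for illustration.

```python
from enum import Flag

class Output(Flag):
    """Hypothetical output selection: speaker unit 25 and/or display unit 26."""
    NONE = 0
    SPEAKER = 1
    DISPLAY = 2

# Keyword-to-output mapping taken from the example in the text.
COMMANDS = {
    "voice": Output.SPEAKER,
    "image": Output.DISPLAY,
    "sound, image": Output.SPEAKER | Output.DISPLAY,
}

def switch_output(recognized, current):
    """Return the new output selection for a recognized utterance.

    Utterances that are not switch keywords leave the selection unchanged
    (an assumed policy; the text does not say what happens otherwise).
    """
    key = recognized.strip().lower()
    return COMMANDS.get(key, current)
```

For instance, `switch_output("image", Output.SPEAKER)` would select the display only, and an ordinary utterance would keep whatever output was already active.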
[0109]
Furthermore, this switch mechanism may have a function of switching the state when the camera mechanism 29 captures an image by switching parameters such as the zoom state of the camera mechanism 29.
[0110]
Next, in the hearing aid 1, various examples of a mechanism for outputting sound information created by the sound information generating unit 23 will be described. It is needless to say that the present invention is not limited to the output mechanism described below.
[0111]
That is, in this hearing aid 1, the mechanism for outputting audio information is not limited to the speaker unit 25 and the display unit 26, and may be one utilizing, for example, bone conduction or skin stimulation. The mechanism for outputting the audio information may be, for example, a mechanism in which a small magnet is attached to the eardrum etc. to vibrate the magnet, or a signal is transmitted to the cochlea through the bone.
[0112]
Such a hearing aid 1 may, for example, include a crush plate and output the signal obtained through conversion by the audio information generation unit 23 to the crush plate, or it may use a tactile compensation technology such as a tactile aid (Tactile Aid) based on skin stimulation. By using such technologies based on bone vibration, skin stimulation, or the like, the signal from the audio information generation unit 23 can be transmitted to the user. The hearing aid 1 using skin stimulation includes a tactile aid transducer array to which the audio information from the audio information generation unit 23 is input, and the sound that would otherwise be output from the speaker unit 25 may be output via the tactile aid and the transducer array.
[0113]
In the description of the hearing aid 1 above, an example of processing in which audio information is output as sound has been described. However, the present invention is not limited to this, and the recognition result may be presented to the user using, for example, an artificial middle ear. That is, the hearing aid 1 may present audio information to the user as an electrical signal via a coil and a vibrator.
[0114]
Furthermore, the hearing aid 1 may be provided with a cochlear implant mechanism and present a recognition result to the user through the cochlear implant. In other words, the hearing aid 1 may supply audio information as an electrical signal to a cochlear implant system including, for example, an embedded electrode, a speech processor, etc., and present it to the user.
[0115]
Furthermore, this hearing aid 1 may be provided with a brainstem implant (Auditory Brainstem Implant) mechanism and present audio information to the user by an auditory brainstem implant. That is, the hearing aid 1 may supply audio information as an electrical signal to a brainstem implant system including, for example, an implanted electrode, a speech processor, and the like, and present it to the user.
[0116]
Furthermore, according to the user's physical condition, usage state, and purpose of use, the hearing aid 1 may, for example for a hearing-impaired person who can perceive sound in the ultrasonic band, modulate, process, and convert the recognition result and the processed and converted recognition result into sound in the ultrasonic band and output it as voice information. Furthermore, the hearing aid 1 may generate a signal in the ultrasonic frequency band using an ultrasonic output mechanism (bone conduction ultrasound) and output it to the user via an ultrasonic transducer or the like.
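One way to picture the modulation into the ultrasonic band is amplitude modulation of an ultrasonic carrier, sketched below. The text does not say which modulation scheme is used, so the choice of double-sideband AM, the function name `am_modulate`, and the modulation depth are all assumptions.

```python
import math

def am_modulate(samples, sample_rate, carrier_hz, depth=0.5):
    """Amplitude-modulate audio samples onto a carrier.

    Computes y[n] = (1 + depth * x[n]) * cos(2 * pi * carrier_hz * n / sample_rate).
    `samples` are expected to lie in [-1, 1]. The carrier must be below the
    Nyquist frequency (sample_rate / 2) to be representable.
    """
    if carrier_hz >= sample_rate / 2:
        raise ValueError("carrier must be below the Nyquist frequency")
    return [(1.0 + depth * x) * math.cos(2.0 * math.pi * carrier_hz * n / sample_rate)
            for n, x in enumerate(samples)]
```

With, say, a 30 kHz carrier at a 96 kHz sample rate (illustrative values only), the speech envelope rides on a carrier above the normal audible range.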
[0117]
Furthermore, the hearing aid 1 includes a bone conduction unit which is a system in which a headphone contact is placed on the tragus, bone conduction is performed, and vibrations of the tragus and the inner wall of the ear canal become air conduction sound. It may be used to present audio information to the user. As this bone conduction unit, a live phone (manufactured by Nippon Telegraph and Telephone Corporation) which is a headphone system for the hearing impaired can be used.
[0118]
Furthermore, although the hearing aid 1 has been described with an example provided with several output means, such as the speaker unit 25 and the display unit 26, these output means may be used in combination, or each may be used to output independently. Moreover, the hearing aid 1 may output sound using the function of a conventional hearing aid that changes the sound pressure level of the sound input to the microphone 21, while presenting the recognition result through the other output means mentioned above.
[0119]
Furthermore, the hearing aid 1 is provided with a switch mechanism that is controlled by the audio information generation unit 23 so that the output result output from the speaker unit 25 and / or the display unit 26 is output simultaneously or with a time difference. Alternatively, a switch mechanism for controlling whether to output the output result a plurality of times or to output the output result only once may be provided.
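The two controls described above (simultaneous versus time-shifted output, and single versus repeated output) can be sketched as a small scheduling function. The function name, the channel labels, and the parameters `delay`, `repeats`, and `repeat_interval` are hypothetical and serve only to illustrate the idea.

```python
def schedule_outputs(channels, delay=0.0, repeats=1, repeat_interval=1.0):
    """Return a time-ordered list of (time_in_seconds, channel) output events.

    delay == 0 gives simultaneous output on all channels; delay > 0 starts
    each successive channel `delay` seconds after the previous one.
    repeats > 1 presents the same result again every `repeat_interval` seconds.
    """
    events = []
    for rep in range(repeats):
        for idx, channel in enumerate(channels):
            events.append((rep * repeat_interval + idx * delay, channel))
    # Stable sort by time only, so simultaneous events keep channel order.
    return sorted(events, key=lambda event: event[0])
```

Calling it with `["speaker", "display"]` and the defaults yields both outputs at time 0, whereas a nonzero `delay` shows the text on the display slightly after the sound is played.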
[0120]
The hearing aid 1 has been described using the example shown in FIG. 2. However, it may instead be equipped with a CPU that performs a first process of applying the above-described processing and conversion to the input sound and displaying the result on the display unit 26, a CPU that performs a second process of applying the above-described processing and conversion to the input sound and outputting the result to the speaker unit 25, and a CPU that performs a third process of displaying an image captured by the camera mechanism 29.
[0121]
Such a hearing aid 1 may operate the CPUs for the respective processes independently to perform and output the first process or the second process, or may operate the CPUs simultaneously to perform and output the first, second, and third processes. Furthermore, the CPUs that perform the first and second processes, the first and third processes, or the second and third processes may be operated simultaneously to produce output.
[0122]
Furthermore, the hearing aid 1 may perform control so that the output results from the various output mechanisms described above are output simultaneously or with a time difference, according to the user's physical condition, usage state, and purpose of use.
[0123]
Further, the hearing aid 1 may include a plurality of CPUs, with one CPU performing at least one of the first to third processes described above and another CPU performing the remaining processes.
[0124]
For example, in the hearing aid 1, one CPU may perform processing (text to speech synthesis) that converts the input speech into character data and outputs it to the display unit 26. Alternatively, one CPU may convert the input speech into character data while another CPU performs STRAIGHT processing on the same input speech and outputs the result to the speaker unit 25, or another CPU may perform vocoder processing on the input speech, for example processing using the speech analysis/synthesis method STRAIGHT, and output the result to the speaker unit 25. In other words, the hearing aid 1 may be configured so that different CPUs perform different processing for the signal output to the speaker unit 25 and the signal output to the display unit 26.
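The idea of running the display path and the speaker path on separate processors can be sketched with two workers running concurrently. This is only an illustration: threads from Python's standard `concurrent.futures` module stand in for the separate CPUs, and the two processing functions are hypothetical placeholders (a simple gain replaces the STRAIGHT processing named in the text).

```python
from concurrent.futures import ThreadPoolExecutor

def to_display_text(recognized_words):
    """Hypothetical stand-in for the character-data path to the display unit 26."""
    return " ".join(recognized_words)

def to_speaker_samples(samples, gain=2.0):
    """Hypothetical stand-in for the speech path to the speaker unit 25.

    The text names STRAIGHT vocoder processing; plain gain is used here instead.
    """
    return [gain * s for s in samples]

def process_frame(recognized_words, samples):
    """Run both paths concurrently and return (display_text, speaker_samples)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        text_future = pool.submit(to_display_text, recognized_words)
        audio_future = pool.submit(to_speaker_samples, samples)
        return text_future.result(), audio_future.result()
```

Keeping the two paths independent means that a slow conversion on one output does not have to delay the other, which is one plausible motivation for the multi-CPU arrangement.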
[0125]
Further, the hearing aid 1 may have a CPU that performs the above-described processing and conversion and outputs the results to the various output mechanisms described above, and may also output the sound input to the microphone 21 without performing the conversion processing.
[0126]
Further, the hearing aid 1 may include a CPU for performing one of the various processing conversion processes described above and a CPU for performing another processing conversion process.
[0127]
Furthermore, while the speech information generation unit 23 converts the recognition result, the processed and converted recognition result, the captured image, and the like as described above, the hearing aid 1 may also, as in a conventional hearing aid, amplify the electric signal obtained by detecting the sound, perform sound quality adjustment, gain adjustment, compression adjustment, and the like, and output the result to the speaker unit 25.
[0128]
In the hearing aid 1, the processing performed by the signal processing unit 22 and the voice information generation unit 23 may be carried out by combining, for example, the Fourier transform and vocoder processing (STRAIGHT etc.).
[0129]
The hearing aid 1 to which the present invention is applied has been described as a small hearing aid for personal use. However, it may also be used as a large hearing aid for groups (a desktop training hearing aid or group training hearing aid).
[0130]
Examples of visual displays include HMDs and head-coupled displays. Specific examples that can be used are: binocular HMDs (those that present a parallax image to each of the left and right eyes to enable stereoscopic viewing, and those that present the same image to both eyes to give an apparently large screen); monocular HMDs; see-through HMDs; displays with visual assistance or visual enhancement functions; eyeglass-type binocular telescopes with an autofocus function and a visual filter; systems using a contact lens for the eyepiece; retina projection types (Virtual Retinal Display, Retinal Projection Display, retina projection intermediate type); HMDs with a line-of-sight input function (product name HAQ-200 (Shimadzu Corporation)); displays mounted on parts of the body other than the head (neck, shoulder, face, eyes, arms, hands, etc.); stereoscopic displays (projection-type object-oriented displays, e.g., head-mounted projector: Iinami M et al., Head-mounted projector (II) -implementation. Proc 4th Ann Conf of Virtual Reality Society of Japan, 59-62, 1999); and spatial immersive displays (e.g., omnimax; CAVE (Cruz-Neira C et al., Surround-screen projection-based virtual reality: The design and implementation of the CAVE. Proc of SIGGRAPH '93, 135-42, 1993); CABIN (Hirose M et al., Transactions of the Institute of Electronics, Information and Communication Engineers, Vol J81-D-II, No. 5, 888-96, 1998); a compact ultra-wide field of view display that combines the features of a projection display such as CAVE and an HMD (Endo T et al., Ultra field of view compact display. Proc 4th Ann Conf of Virtual Reality Society of Japan, 55-58, 1999); and arch screens).
[0131]
In particular, a large-screen display may be used for a large hearing aid. In the hearing aid 1 described above, a binaural method may be used as the sound reproduction method (the 3D acoustic system uses a spatial sound source localization system based on head-related transfer functions, e.g., Convolvotron & Acoustetron II (Crystal River Engineering), or the hearing aid TE-H50 (Sony), which uses a dynamic driver unit and an electret microphone). To create a sound field close to the real one, a transaural method is preferably used, mainly in large hearing aid systems (a transaural method with a tracking function corresponds to CAVE in 3D image reproduction).
[0132]
Furthermore, the HMD 2 described above may include a three-dimensional position detection sensor on the top of the head. In the hearing aid 1 equipped with such an HMD 2, the screen display can be changed in accordance with the movement of the user's head.
[0133]
The hearing aid 1 using augmented reality (AR) includes a sensor for the user's movement and generates an AR space using the information detected by the sensor together with the voice information detected by the microphone 21 and generated by the voice information generation unit 23. By cooperatively using a virtual reality (VR) system, consisting of a display system and a system that integrates the various sensor systems and the VR forming system, the voice information generation unit 23 can superimpose the VR space on the real space as appropriate and create an AR space that enhances the sense of reality. As a result, when the hearing aid 1 uses a visual display, information from an image such as the face can be received in a natural state, as if the image were actually in front of the user, without the user having to shift the line of sight greatly each time information arrives. The following systems can be used to achieve this.
[0134]
As shown in FIG. 3, in order to form the AR space, such a hearing aid 1 has a 3D graphics accelerator for generating virtual environment images installed in the audio information generation unit 23, giving a configuration capable of stereoscopic viewing, and is further equipped with a wireless communication system. To obtain information on the position and posture of the user, a small gyro sensor (e.g., Data Tech GU-3011) is attached to the head as the sensor 31, and an acceleration sensor (e.g., Data Tech GU-3012) is attached to the user's waist. The information from the sensor 31 is processed by the audio information generation unit 23 and then by the scan converters 32a and 32b corresponding to the user's right and left eyes, and the resulting images are sent to the display unit 26 (see Ban Y et al, Manual-less operation with wearable augmented reality system. Proc 3rd Ann Conf of Virtual Reality Society of Japan, 313-4, 1998).
[0135]
Further, in this hearing aid 1, in addition to the sensor 31, a situation recognition system (for example, Ubiquitous Talker (Sony CSL)) and the following sensor systems, which are other systems forming the VR system, may be used cooperatively with the system that integrates the VR forming system, the display system, and the hearing aid 1. This enhances the AR space and allows voice information to be supplemented through multimodality.
[0136]
To form such a VR or similar space, the user first sends information to the sensor 31, the information is passed to the system that integrates the VR forming system, and information is then returned to the user from the display system.
[0137]
The sensor 31 (information input system) includes the following devices.
[0138]
Specifically, these include: a 3D optical position sensor (ExpertVision HiRES & Face Tracker (Motion Analysis)); a 3D magnetic position sensor (InsideTrack (Polhemus), 3SPACE system (Polhemus), Bird (Ascension Tech.)); a mechanical 3D digitizer (MicroScribe 3D Extra (Immersion)); a magnetic 3D digitizer (Model 350 (Polhemus)); an acoustic 3D digitizer (Sonic Digitizer (Science Accessories)); an optical 3D scanner (3D Laser Scanner (ASTEX)); a biosensor that measures electrical signals inside the body (Cyber Finger (NTT Human Interface Laboratories)); glove-type devices (DataGlove (VPL Res), Super Glove (Nissho Electronics), Cyber Glove (Virtual Tech)); force feedback devices (Haptic Master (Nissho Electronics), PHANToM (SensAble Devices)); a 3D mouse (Space Controller (Logitech)); an eye sensor (eye movement analyzer (ATR Audio Visual Laboratory)); a whole-body motion measurement system (DataSuit (VPL Res)); a motion capture system (HiRES (Motion Analysis)); acceleration sensors (three-dimensional semiconductor acceleration sensor (NEC)); and HMDs with a line-of-sight input function.
[0139]
To realize AR, not only the display unit 26 but also a tactile display, a tactile pressure display, or a force display, all of which use the sense of touch, may be used. A tactile display conveys sound through touch, so that sound can be recognized through touch as well as hearing. As the tactile display, for example, a vibrator array (Optacon, tactile mouse, tactile vocoder, etc.) or a tactile pin array (paperless braille, etc.) can be used. Others include water jets, air jets, PHANToM (SensAble Devices), and Haptic Master (Nissho Electronics). Specifically, the hearing aid 1 displays a VR keyboard in the VR space and controls the processing in the signal processing unit 22 and the audio information generation unit 23 through the VR keyboard or a VR switch. This eliminates the need to carry a keyboard or reach for a switch, making operation easier for the user and giving a wearing feel close to that of a hearing aid simply worn on the ear.
[0140]
As a vestibular display, a system (motion bed) that can express a variety of accelerations even in a device with a narrow operating range, by means of washout and washback, can be used.
[0141]
The systems that integrate the VR system include, but are not limited to, the following. One type is provided as a C or C++ library that supports display and its database, device input, interference calculation, event management, and the like, with the application portion programmed by the user using the library. Another type handles the database and events without requiring user programming and executes the VR simulation as is. Moreover, the systems making up the hearing aid 1 may be connected to one another by communication, and a broadband communication path may be used to transmit the situation with a high sense of presence. Further, the hearing aid 1 may use the following techniques from the field of 3D computer graphics. The concept is to faithfully present as an image what can happen in reality, or to create an unreal space and present as an image what is actually impossible. 3D computer graphics offers the following modeling, rendering, and animation techniques. Modeling techniques for creating complex and precise models include wireframe modeling, surface modeling, solid modeling, Bezier curves, B-spline curves, NURBS curves, Boolean operations, free-form deformation, free-form modeling, particles, sweep, fillet, lofting, and metaballs. Rendering techniques, for pursuing realistic objects with texture and shadow, include shading, texture mapping, rendering algorithms, motion blur, anti-aliasing, and depth cueing. Animation techniques, for moving the created model and simulating the real world, include the key frame method, inverse kinematics, morphing, shrink wrap animation, and the alpha channel. Further, for sound rendering, the technique described in "Takala T, Computer Graphics (Proc SIGGRAPH 1992) Vol 26, No 2, 211-20" may be used.
[0142]
As a system for integrating such VR systems, for example, the systems of Division Inc. (VR runtime software [dVS], VR space construction software [dVISE], VR development library [VC Toolkit]) or SENSE8 (WorldToolKit, WorldUp) may be used, as may the method described in "Hirose M et al. A study of image editing technology for synthetic sensation. Proc ICAT '94, 63-70, 1994".
[0143]
In the present embodiment, the portable hearing aid 1 in which the HMD 2 and the computer unit 3 are connected by the optical fiber cable 4 has been described. However, the link between the HMD 2 and the computer unit 3 may be wireless, with information transmitted and received between them by radio, by a signal transmission method using infrared rays, or the like. Furthermore, in this hearing aid 1, in addition to making the link between the HMD 2 and the computer unit 3 wireless, the functions performed by the units shown in the figure may be divided among a plurality of devices connected wirelessly, so that information is sent to and received from the HMD 2 without the user wearing at least the computer unit 3. Furthermore, according to the user's physical condition, usage state, and purpose of use, the functions performed by the units shown in FIG. 2 may be divided among a plurality of devices that are connected wirelessly. As a result, the hearing aid 1 can reduce the weight and volume of the device worn by the user, increase the user's freedom of movement, and further improve the user's recognition.
[0144]
In the hearing aid 1, the control, version upgrade, repair, and the like of the processing performed by the signal processing unit 22 and the audio information generation unit 23 may be performed via the communication circuit 27. Accordingly, the hearing aid 1 can receive repair, control, adjustment, and the like through the communication circuit 27 through a visual display, an auditory display, and the like.
[0145]
In addition, since the hearing aid 1 to which the present invention is applied can present synthesized speech to the user as a display, it can be used in a wide range of fields, for example: office work (as a wearable computer); communication (application to automatic translation telephones, etc.); occupational medicine (mental health, etc.); medical practice (hearing tests); foreign language learning; language training; entertainment (video games); personal home theater; watching concerts and games; program production (animation, live-action video, news, music production); underwater use (conversation during diving, etc.); intelligence activities; military operations; work under adverse conditions such as noise (construction sites, factories, etc.); sports (races of automobiles, yachts, etc., adventures in the mountains and at sea, and communication and information exchange between players and between players and coaches during games and practice); work in space; transportation (pilots of ships and airplanes); car navigation systems; various simulation work using VR and AR (teleoperated surgery (microsurgery), etc.); education; training; medical care; treatment of illness; politics; travel; shopping; marketing; advertising; religion; design; and amusement parks. For VR and AR, a fish-tank VR display, an autostereoscopic system, a tele-existence visual system, or the like may be used. The hearing aid 1 can also be applied to telephone and Internet services, and supports communication not only for people with speech and language disabilities but also for severely ill patients, severely disabled people, nursing schools, and so on.
[0146]
[Effects of the invention]
As described above in detail, the hearing aid according to the present invention synthesizes speech by combining speech data stored in advance, based on the recognition result obtained from speech detected from a person with a speech and language disorder, and outputs that speech to the outside, while also outputting external speech to the user. People with speech and language disorders caused by laryngectomy, resection of the tongue or floor of the mouth, articulation disorders, and the like can therefore speak in a natural voice that is their own original voice or one freely converted from it, and the user's hearing can be supplemented by outputting external speech to the user.
[0147]
The hearing aid according to the present invention also includes conversion means for processing and converting the recognition result from the recognition means so as to change its content in accordance with the user's physical condition, usage state, and purpose of use. It can therefore present the result of speech recognition in a manner suited to the user's physical condition, usage state, and purpose of use, and can present the recognition result with little noise.
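The flow from recognition through conversion to output control can be sketched as follows. This is only an illustrative sketch and not the implementation disclosed here: the names `UserProfile`, `recognize`, `convert`, `output_control`, and `run_pipeline`, the filler-word list, and the profile values are all assumptions made for the example, and the recognition step is a placeholder that treats its input as already-recognized text.

```python
from dataclasses import dataclass


@dataclass
class UserProfile:
    # All three fields and their values are hypothetical labels for this sketch.
    physical_condition: str  # e.g. "severe_hearing_loss" or "mild_hearing_loss"
    usage_state: str         # e.g. "conversation"
    purpose: str             # e.g. "summary" or "verbatim"


# Hypothetical list of words dropped when the user wants a summary.
FILLERS = {"um", "uh", "well"}


def recognize(audio_signal: str) -> str:
    """Placeholder for the recognition means: the 'signal' is already text."""
    return audio_signal.strip()


def convert(recognized: str, profile: UserProfile) -> str:
    """Stand-in for the conversion means: change the content per the profile."""
    words = recognized.split()
    if profile.purpose == "summary":
        # Drop filler words so the meaning is easier to grasp.
        words = [w for w in words if w.lower().strip(",.") not in FILLERS]
    return " ".join(words)


def output_control(recognized: str, converted: str, profile: UserProfile):
    """Stand-in for the output control means: choose what goes to which output."""
    if profile.physical_condition == "severe_hearing_loss":
        # Present the converted result visually.
        return [("display", converted)]
    # Otherwise present the converted result as speech and keep the
    # unconverted recognition result available on the display.
    return [("speaker", converted), ("display", recognized)]


def run_pipeline(audio_signal: str, profile: UserProfile):
    recognized = recognize(audio_signal)
    converted = convert(recognized, profile)
    return output_control(recognized, converted, profile)


if __name__ == "__main__":
    profile = UserProfile("severe_hearing_loss", "conversation", "summary")
    for channel, payload in run_pipeline("um, the meeting starts, uh, at noon", profile):
        print(channel, payload)
```

Keeping the conversion step separate from the output-control step mirrors the division into conversion means and output control means described above, so that either can be changed per user without touching the other.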
[Brief description of the drawings]
FIG. 1 is a block diagram showing an example of the appearance of a hearing aid to which the present invention is applied.
FIG. 2 is a block diagram showing a configuration of a hearing aid to which the present invention is applied.
FIG. 3 is a block diagram showing a configuration for creating an AR space with a hearing aid to which the present invention is applied.
[Explanation of symbols]
1 Speech generation apparatus, 2 Head mounted display, 3 Computer unit, 7 Display unit, 8 User's microphone, 11 External microphone, 21 Microphone, 23 Speech information generation unit, 24 Storage unit, 25 Speaker unit, 26 Display unit

Claims

A hearing aid that summarizes and presents detected speech for a hearing-impaired person, comprising: acoustoelectric conversion means for detecting external sound and generating an audio signal; recognition means for performing speech recognition processing using the audio signal from the acoustoelectric conversion means; conversion means for summarizing the speech information that is the recognition result so that the user can easily understand the meaning of the recognition result from the recognition means; output control means for generating a control signal for outputting the recognition result from the recognition means and/or the recognition result converted by the conversion means; and output means for outputting, on the basis of the control signal generated by the output control means, speech information that is the recognition result from the recognition means and/or the recognition result converted by the conversion means, and presenting the speech to the user, wherein the output control means performs control, on the basis of the recognition result from the recognition means, so that the recognition result from the recognition means and/or the recognition result processed and converted by the conversion means is output again from the output means.

A hearing aid that summarizes and presents detected speech for a hearing-impaired person, comprising: acoustoelectric conversion means for detecting external sound and generating an audio signal; recognition means for performing speech recognition processing using the audio signal from the acoustoelectric conversion means; conversion means for summarizing the speech information that is the recognition result so that the user can easily understand the meaning of the recognition result from the recognition means; output control means for generating a control signal for outputting the recognition result from the recognition means and/or the recognition result converted by the conversion means; and output means for outputting, on the basis of the control signal generated by the output control means, speech information that is the recognition result from the recognition means and/or the recognition result converted by the conversion means, and presenting the speech to the user, wherein the output means comprises a cochlear implant mechanism, and the output control means generates the control signal so that the recognition result and/or the converted recognition result is output as an electric signal.

A hearing aid that summarizes and presents detected speech for a hearing-impaired person, comprising: acoustoelectric conversion means for detecting external sound and generating an audio signal; recognition means for performing speech recognition processing using the audio signal from the acoustoelectric conversion means; conversion means for summarizing the speech information that is the recognition result so that the user can easily understand the meaning of the recognition result from the recognition means; output control means for generating a control signal for outputting the recognition result from the recognition means and/or the recognition result converted by the conversion means; and output means for outputting, on the basis of the control signal generated by the output control means, speech information that is the recognition result from the recognition means and/or the recognition result converted by the conversion means, and presenting the speech to the user, wherein the output means comprises a pressing plate (圧挺板), and the output control means generates the control signal so that the recognition result and/or the converted recognition result is output to the pressing plate as vibration.

A hearing aid that summarizes and presents detected speech for a hearing-impaired person, comprising: acoustoelectric conversion means for detecting external sound and generating an audio signal; recognition means for performing speech recognition processing using the audio signal from the acoustoelectric conversion means; conversion means for summarizing the speech information that is the recognition result so that the user can easily understand the meaning of the recognition result from the recognition means; output control means for generating a control signal for outputting the recognition result from the recognition means and/or the recognition result converted by the conversion means; and output means for outputting, on the basis of the control signal generated by the output control means, speech information that is the recognition result from the recognition means and/or the recognition result converted by the conversion means, and presenting the speech to the user, wherein the output means comprises an artificial middle ear mechanism, and the output control means generates the control signal so that the recognition result and/or the converted recognition result is output as an electric signal.

A hearing aid that summarizes and presents detected speech for a hearing-impaired person, comprising: acoustoelectric conversion means for detecting external sound and generating an audio signal; recognition means for performing speech recognition processing using the audio signal from the acoustoelectric conversion means; conversion means for summarizing the speech information that is the recognition result so that the user can easily understand the meaning of the recognition result from the recognition means; output control means for generating a control signal for outputting the recognition result from the recognition means and/or the recognition result converted by the conversion means; and output means for outputting, on the basis of the control signal generated by the output control means, speech information that is the recognition result from the recognition means and/or the recognition result converted by the conversion means, and presenting the speech to the user, wherein the output means comprises an ultrasonic output mechanism, and the output control means generates the control signal so that the recognition result and/or the converted recognition result is output to the ultrasonic output mechanism as an electric signal.

A hearing aid that summarizes and presents detected speech for a hearing-impaired person, comprising: acoustoelectric conversion means for detecting external sound and generating an audio signal; recognition means for performing speech recognition processing using the audio signal from the acoustoelectric conversion means; conversion means for summarizing the speech information that is the recognition result so that the user can easily understand the meaning of the recognition result from the recognition means; output control means for generating a control signal for outputting the recognition result from the recognition means and/or the recognition result converted by the conversion means; and output means for outputting, on the basis of the control signal generated by the output control means, speech information that is the recognition result from the recognition means and/or the recognition result converted by the conversion means, and presenting the speech to the user, wherein the output means comprises a transducer array for a tactile aid, and the output control means generates the control signal so that the recognition result and/or the converted recognition result is output to the transducer array as an electric signal.

A hearing aid that summarizes and presents detected speech for a hearing-impaired person, comprising: acoustoelectric conversion means for detecting external sound and generating an audio signal; recognition means for performing speech recognition processing using the audio signal from the acoustoelectric conversion means; conversion means for summarizing the speech information that is the recognition result so that the user can easily understand the meaning of the recognition result from the recognition means; output control means for generating a control signal for outputting the recognition result from the recognition means and/or the recognition result converted by the conversion means; and output means for outputting, on the basis of the control signal generated by the output control means, speech information that is the recognition result from the recognition means and/or the recognition result converted by the conversion means, and presenting the speech to the user, wherein the output means comprises an auditory brainstem implant mechanism, and the output control means generates the control signal so that the recognition result and/or the converted recognition result is output to the auditory brainstem implant mechanism as an electric signal.

A hearing aid that summarizes and presents detected speech for a hearing-impaired person, comprising: acoustoelectric conversion means for detecting external sound and generating an audio signal; recognition means for performing speech recognition processing using the audio signal from the acoustoelectric conversion means; conversion means for summarizing the speech information that is the recognition result so that the user can easily understand the meaning of the recognition result from the recognition means; output control means for generating a control signal for outputting the recognition result from the recognition means and/or the recognition result converted by the conversion means; and output means for outputting, on the basis of the control signal generated by the output control means, speech information that is the recognition result from the recognition means and/or the recognition result converted by the conversion means, and presenting the speech to the user, wherein the acoustoelectric conversion means, the recognition means, the output control means, and the output means are realized by a plurality of devices, the plurality of devices are connected to one another by a wireless communication network, and at least speech information is transmitted and received among them.

A hearing aid that summarizes and presents detected speech for a hearing-impaired person, comprising: acoustoelectric conversion means for detecting external sound and generating an audio signal; recognition means for performing speech recognition processing using the audio signal from the acoustoelectric conversion means; conversion means for summarizing the speech information that is the recognition result so that the user can easily understand the meaning of the recognition result from the recognition means; output control means for generating a control signal for outputting the recognition result from the recognition means and/or the recognition result converted by the conversion means; and output means for outputting, on the basis of the control signal generated by the output control means, speech information that is the recognition result from the recognition means and/or the recognition result converted by the conversion means, and presenting the speech to the user, wherein the acoustoelectric conversion means generates the audio signal using an electric signal generated from speech detected from the outside.

A hearing aid that summarizes and presents detected speech for a hearing-impaired person, comprising: acoustoelectric conversion means for detecting external sound and generating an audio signal; recognition means for performing speech recognition processing using the audio signal from the acoustoelectric conversion means; conversion means for summarizing the speech information that is the recognition result so that the user can easily understand the meaning of the recognition result from the recognition means; output control means for generating a control signal for outputting the recognition result from the recognition means and/or the recognition result converted by the conversion means; output means for outputting, on the basis of the control signal generated by the output control means, speech information that is the recognition result from the recognition means and/or the recognition result converted by the conversion means, and presenting the speech to the user; a display unit; and means for outputting, to the display unit, the speech information that is the recognition result from the recognition means and/or the recognition result converted by the conversion means.