JP4115723B2 - Text search device by voice input

Info

Publication number: JP4115723B2
Application number: JP2002073850A
Authority: JP
Inventors: Atsushi Fujii; Katunobu Itou; Tetsuya Ishikawa
Original assignee: Japan Science and Technology Agency; National Institute of Advanced Industrial Science and Technology AIST; National Institute of Japan Science and Technology Agency
Current assignee: Japan Science and Technology Agency; National Institute of Advanced Industrial Science and Technology AIST; National Institute of Japan Science and Technology Agency
Priority date: 2002-03-18
Filing date: 2002-03-18
Publication date: 2008-07-09
Anticipated expiration: 2022-03-18
Also published as: JP2003271629A

Description

【０００１】
【発明の属する技術分野】
本発明は、音声入力によるテキスト検索装置に関するものである。
【０００２】
【従来の技術】
従来、このような技術分野の参考文献としては、以下に示されるようなものがあった。
【０００３】
〔１〕Ｊ．Ｂａｒｎｅｔｔ，Ｓ．Ａｎｄｅｒｓｏｎ，Ｊ．Ｂｒｏｇｌｉｏ，Ｍ．Ｓｉｎｇｈ，Ｒ．Ｈｕｄｓｏｎ，ａｎｄＳ．Ｗ．Ｋｕｏ．Ｅｘｐｅｒｉｍｅｎｔｓｉｎｓｐｏｋｅｎｑｕｅｒｉｅｓｆｏｒｄｏｃｕｍｅｎｔｒｅｔｒｉｅｖａｌ．ＩｎＰｒｏｃｅｅｄｉｎｇｓｏｆＥｕｒｏｓｐｃｅｃｈ，９７，ｐｐ．１３２３−１３２６，１９９７．
〔２〕ＦａｂｉｏＣｒｅｓｔａｎｉ．Ｗｏｒｄｒｅｃｏｇｎｉｔｉｏｎｅｒｒｏｒｓａｎｄｒｅｌｅｖａｎｃｅｆｅｅｄｂａｃｋｉｎｓｐｏｋｅｎｑｕｅｒｙｐｒｏｃｅｓｓｉｎｇ．ＩｎＰｒｏｃｅｅｄｉｎｇｓｏｆｔｈｅＦｏｕｒｔｈＩｎｔｅｒｎａｔｉｏｎａｌＣｏｎｆｅｒｅｎｃｅｏｎＦｌｅｘｉｂｌｅＱｕｅｒｙＡｎｓｗｒｉｎｇＳｙｓｔｅｍｓ，ｐｐ．２６７−２８１，２０００．
〔３〕ＡｔｓｕｓｈｉＦｕｊｉｉ，ＫａｔｕｎｏｂｕＩｔｏｕ，ａｎｄＴｅｔｓｕｙａＩｓｈｉｋａｗａ．Ｓｐｅｅｃｈ−ｄｒｉｖｅｎｔｅｘｔｒｅｔｒｉｅｖａｌ：ＵｓｉｎｇｔａｒｇｅｔＩＲｃｏｌｌｅｃｔｉｏｎｓｆｏｒｓｔａｔｉｓｔｉｃａｌｌａｎｇｕａｇｅｍｏｄｅｌａｄａｐｔａｔｉｏｎｉｎｓｐｅｅｃｈｒｅｃｏｇｎｉｔｉｏｎ．ＩｎＩｎｆｏｒｍａｔｉｏｎＲｅｔｒｉｅｖａｌＴｅｃｈｎｉｑｕｅｓｆｏｒＳｐｅｅｃｈＡｐｐｌｉｃａｔｉｏｎｓ，ｐｐ．９４−１０４，Ｓｐｒｉｎｇｅｒ，２００２．
〔４〕ＪｏｈｎＳ．Ｇａｒｏｆｏｌｏ，ＥｌｌｅｎＭ．Ｖｏｏｒｈｅｅｓ，ＶｉｎｃｅｎｔＭ．Ｓｔａｎｆｏｒｄ，ａｎｄＫａｒｅｎＳｐａｒｃｋＪｏｎｅｓ．Ｔｒｅｃ−６１９９７ｓｐｏｋｅｎｄｏｃｕｍｅｎｔｒｅｔｒｉｅｖａｌｔｒａｃｋｏｖｅｒｖｉｅｗａｎｄｒｅｓｕｌｔｓ．ＩｎＰｒｏｃｅｅｄｉｎｇｓｏｆｔｈｅ６ｔｈＴｅｘｔＲＥｔｒｉｅｖａｌＣｏｎｆｅｒｅｎｃｅ，ｐｐ．８３−９１，１９９７．
〔５〕ＫａｔｕｎｏｂｕＩｔｏｕ，ＡｔｓｕｓｈｉＦｕｊｉｉ，ａｎｄＴｅｔｓｕｙａＩｓｈｉｋａｗａ．Ｌａｎｇｕａｇｅｍｏｄｅｌｉｎｇｆｏｒｍｕｌｔｉ−ｄｏｍａｉｎｓｐｅｅｃｈ−ｄｒｉｖｅｎｔｅｘｔｒｅｔｒｉｅｖａｌ．ＩｎＩＥＥＥＡｕｔｏｍａｔｉｃＳｐｅｅｃｈＲｅｃｏｇｎｉｔｉｏｎａｎｄＵｎｄｅｒｓｔａｎｄｉｎｇＷｏｒｋｓｈｏｐ，２００１．
〔６〕ＫａｔｕｎｏｂｕＩｔｏｕ，ＭｉｋｉｏＹａｍａｍｏｔｏ，ＫａｚｕｙａＴａｋｅｄａ，ＴｏｓｈｉｙｕｋｉＴａｋｅｚａｗａ，ＴａｔｓｕｏＭａｔｓｕｏｋａ，ＴｅｔｓｕｎｏｒｉＫｏｂａｙａｓｈｉ，ａｎｄＫｉｙｏｈｉｒｏＳｈｉｋａｎｏ．ＪＮＡＳ：Ｊａｐａｎｅｓｅｓｐｅｅｃｈｃｏｒｐｕｓｆｏｒｌａｒｇｅｖｏｃａｂｕｌａｒｙｃｏｎｔｉｎｕｏｕｓｓｐｅｅｃｈｒｅｃｏｇｎｉｔｉｏｎｒｅｓｅａｒｃｈ．ＪｏｕｒｎａｌｏｆＡｃｏｕｓｔｉｃＳｏｃｉｅｔｙｏｆＪａｐａｎ，Ｖｏｌ．２０，Ｎｏ．３，ｐｐ．１９９−２０６，１９９９．

【０００４】
〔８〕Ｋ．Ｌ．ＫｗｏｋａｎｄＭ．Ｃｈａｎ．Ｉｍｐｒｏｖｉｎｇｔｗｏ−ｓｔａｇｅａｄ−ｈｏｃｒｅｔｒｉｅｖａｌｆｏｒｓｈｏｒｔｑｕｅｒｉｅｓ．ＩｎＰｒｏｃｅｅｄｉｎｇｓｏｆｔｈｅ２１ｓｔＡｎｎｕａｌＩｎｔｅｒｎａｔｉｏｎａｌＡＣＭＳＩＧＩＲＣｏｎｆｅｒｅｎｃｅｏｎＲｅｓｅａｒｃｈａｎｄＤｅｖｅｌｏｐｍｅｎｔｉｎＩｎｆｏｒｍａｔｉｏｎＲｅｔｒｅｖａｌ，ｐｐ．２５０−２５６，１９９８．〔９〕ＤｏｕｇｌａｓＢ．ＰａｕｌａｎｄＪａｎｅｔＭ．Ｂａｋｅｒ．ＴｈｅｄｅｓｉｇｎｆｏｒｔｈｅＷａｌｌＳｔｒｅｅｔＪｏｕｒｎａｌ−ｂａｓｅｄＣＳＲｃｏｒｐｕｓ．ＩｎＰｒｏｃｅｅｄｉｎｇｓｏｆＤＡＲＰＡＳｐｅｅｃｈ＆ＮａｔｕｒａｌＬａｎｇｕａｇｅＷｏｒｋｓｈｏｐ，ｐｐ．３５７−３６２，１９９２．
〔１０〕Ｓ．ＥＲｏｂｅｒｔｓｏｎａｎｄＳ．Ｗａｌｋｅｒ．Ｓｏｍｅｓｉｍｐｌｅｅｆｆｅｃｔｉｖｅａｐｐｒｏｘｉｍａｔｉｏｎｓｔｏｔｈｅ２−ｐｏｉｓｓｏｎｍｏｄｅｌｆｏｒｐｒｏｂａｂｉｌｉｓｔｉｃｗｅｉｇｈｔｅｄｒｅｔｒｉｅｖａｌ．ＩｎＰｒｏｃｅｅｄｉｎｇｓｏｆｔｈｅ１７ｔｈＡｎｎｕａｌＩｎｔｅｒｎａｔｉｏｎａｌＡＣＭＳＩＧＩＲＣｏｎｆｅｒｅｎｃｅｏｎＲｅｓｅａｒｃｈａｎｄＤｅｖｅｌｏｐｍｅｎｔｉｎＩｎｆｏｒｍａｔｉｏｎ．Ｒｅｔｒｉｅｖａｌ，ｐｐ．２３２−２４１，１９９４．
〔１１〕ＨｅｒｍａｎＪ．Ｍ．ＳｔｅｅｎｅｋｅｎａｎｄＤａｖｉｄＡ．ｖａｎＬｅｅｕｗｅｎ．Ｍｕｌｔｉｌｉｎｇｕａｌａｓｓｅｓｓｍｅｎｔｏｆｓｐｅａｋｅｒｉｎｄｅｐｅｎｄｅｎｔｌａｒｇｅｖｏｃａｂｕｌａｒｙｓｐｅｅｃｈ−ｒｅｃｏｇｎｉｔｉｏｎｓｙｓｔｅｍｓ：ＴｈｅＳＱＡＬＥ−ｐｒｏｊｅｃｔ．ＩｎＰｒｏｃｅｅｄｉｎｇｓｏｆＥｕｒｏｓｐｅｅｃｈ９５，ｐｐ．１２７１−１２７４，１９９５．

【０００５】
〔１３〕ＳｔｅｖｅＹｏｕｎｇ．Ａｒｅｖｉｅｗｏｆｌａｒｇｅ−ｖｏｃａｂｕｌａｒｙｃｏｎｔｉｎｕｏｕｓ−ｓｐｅｅｃｈｒｅｃｏｇｎｉｔｉｏｎ．ＩＥＥＥＳｉｇｎａｌＰｒｏｃｅｓｓｉｎｇＭａｇａｚｉｎｅ，ｐｐ．４５−５７，Ｓｅｐｔｅｍｂｅｒ１９９６．
〔１４〕伊藤克亘，田中和世．被覆率を重視した大語彙連続音声認識用統計的言語モデル．日本音響学会論文集，ｐｐ．６５−６６，Ｍａｒｃｈ１９９９．〔１５〕鹿野清宏、伊藤克亘、河原達也、武田一哉、山本幹雄（編）、音声認識システム、オーム社、２００１．
近年の音声認識技術は、ある程度内容が整理されている発話に対しては実用的な認識精度を達成できるようになっており、様々な応用が考えられる。また、情報検索の分野では音声認識を採り入れた研究も数多く行われている。これらの研究は目的に応じて「音声データの検索」と「音声による検索」の２つに大別される。前者は、ＴＲＥＣのＳｐｏｋｅｎＤｏｃｕｍｅｎｔＲｅｔｒｉｅｖａｌ（ＳＤＲ）トラック（参考文献〔４〕）で放送音声データを対象にしたテストコレクションが整備されていることを背景にして盛んに研究が行われ、既に実用レベルに達している（参考文献〔７〕）。
【０００６】
【発明が解決しようとする課題】
それに対して、音声による検索は、カーナビゲーションシステムやコールセンターのようにキーボード入力を前提としないアプリケーションを支える重要な基盤技術であるにも拘らず、音声データ検索に比べて研究事例は少ない。
【０００７】
また、従来の研究では既存の音声認識とテキスト検索システムが単純に接続されているだけであり、音声認識誤りによって検索精度が顕著に低下する（参考文献〔１〕，〔２〕）。
【０００８】
そこで、本願発明者らは、検索対象のコレクションを用いて音声認識用の言語モデルを作成し、音声認識と検索精度の両方を向上させることができる手法を提案した（参考文献〔３〕，〔５〕）。
【０００９】
しかし、音声入力型の検索システムでは、未知語（システム辞書未登録語）の問題がある。近年の情報検索システムは、古典的な統制語彙型システムとは異なり、検索対象テキスト中の任意の語による検索を可能とする。索引のサイズが数１００万のオーダーに達することは珍しくない。
【００１０】
助詞などの機能語などは不要語として索引から除外されるものの、これらが検索キーワードとして利用されることは稀であるため、事実上、語彙制限はないと考えてよい。
【００１１】
他方において、近年の音声認識システムでは語彙サイズ（辞書登録語数）が制限される。これは、ハードウェアに関する制約や統計モデルの学習効率が主な原因であるため（参考文献〔１３〕）、登録語数を増やすという単純な方法では解決が困難である。多くの言語において、語彙サイズは高々数万語に制限されており（参考文献〔６〕，〔９〕，〔１１〕）、実用的な検索システムの検索サイズに比べると極端に小さい。
【００１２】
また、統計的な音声認識では、機能語などの高頻出語ほど高い精度で認識されるのに対して、情報検索では特定の文書にしか出現しない低頻度語ほど効果的な索引語になりやすい。すなわち、ユーザ発話中の効果的な検索キーワードほど誤認識されやすいという矛盾が生じる。
【００１３】
以上をまとめると、音声入力型の検索システムにおいて「未知語問題」は本質的に不可避であり、何らかの積極的な解決策が必要である。
【００１４】
本発明は、上記状況に鑑みて、音声認識でカバーできない単語を検索用の索引語によって自動的に補完することにより、音声発話の誤認識をなくし、検索精度の向上を図り得る音声入力によるテキスト検索装置を提供することを目的とする。
【００１５】
【課題を解決するための手段】
本発明は、上記目的を達成するために、
〔１〕音声入力によるテキスト検索装置において、音声認識部（１）と、書き起こし部（５）と、テキスト検索部（６）と、検索対象テキストコレクション（７）と、未知語補完部（９）と、補完された検索要求部（１０）と、検索結果の出力部（８）とを備え、ユーザが検索要求を発話すると、前記音声認識部（１）が、音声認識用の辞書に登録されていない未知語を含むユーザの発話を前記書き起こし部（５）で書き起こし、前記検索対象テキストコレクション（７）の索引語から、検出された未知語と音韻的に等価な語もしくは類似する語を探索してユーザの発話中の未知語を自動的に前記未知語補完部（９）で補完し、前記補完された検索要求部（１０）で補完された検索要求を用いて前記テキスト検索部（６）で再検索を行い、前記検索結果の出力部（８）から最終的な検索結果を得ることを特徴とする。
【００１６】
〔２〕上記〔１〕記載の音声入力によるテキスト検索装置において、前記音声認識部（１）は、音響モデル（２）、辞書（３）及び言語モデル（４）を有することを特徴とする。
【００１７】
【発明の実施の形態】
以下、本発明の実施の形態を詳細に説明する。
【００１８】
図１は本発明の実施例を示す音声入力型テキスト検索システムの構成図、図２は本発明の実施例を示す音声入力型テキスト検索フローチャートである。
【００１９】
図１において、ＢはユーザＡによる検索要求の発話、１は音声認識部、２は音響モデル、３は辞書、４は言語モデル、５は書き起こし部、６はテキスト検索部、７は検索対象テキストコレクション、８は検索結果の出力部、９は未知語補完部、１０は補完された検索要求部である。
【００２０】
このように、本発明の音声入力型テキスト検索システムは、音声認識部１、テキスト検索部６、未知語補完部９の３つのモジュールで構成されている。現在は日本語を対象に実装されているものの、本発明で提案する手法は言語の種類を問わない。
【００２１】
以下、図１を参照しながら、この音声入力型テキスト検索システムの処理について説明する。
【００２２】
まず、ユーザＡが検索要求の発話Ｂをすると、音声認識部１が辞書３、音響モデル２、言語モデル４を用いてユーザＡの発話Ｂの書き起こしを生成する。本発明のシステムでは、日本語ディクテーションツールキット（参考文献〔１５〕）で提供されている音声認識部１と音響モデル２を利用した。しかし、ユーザ発話Ｂ中に含まれる未知語を検出するために、辞書３と言語モデル４は独自に作成して利用した（参考文献〔１４〕）。
【００２３】
具体的には毎日新聞ＣＤ−ＲＯＭ１０年分（１９９１−２０００）の記事を「茶筌」で形態素解析し、高頻度語２０，０００語を抽出して辞書を構成した。通常は、辞書中の単語Ｎグラムなどによって言語モデル４を作成する。しかし、これでは辞書未登録語は認識できない。そこで、辞書に登録されなかった約３０万語（異なり数）を音節単位に分割し、単語と音節を併用してトライグラムを作成した。音節は異なりで７００件あった。
【００２４】
すなわち、本発明のシステムの言語モデル４において、辞書未登録語は音節の組み合わせとしてモデル化されている。その結果、辞書未登録語は単語としては認識されないものの、音節単位でカタカナ列として書き起こしされる。また、当該言語モデル４は通常の統計的Ｎグラムなので、既存のデコーダを拡張せずに利用できる。そこで、音韻系列の認識を別途必要とする手法（参考文献〔１２〕）とは異なる。
【００２５】
「オレンジやグレープフルーツなどの柑橘系果物の輸入に関する記事」という発話を例にとると、
『オレンジや／グレープラチナガノ／などの／カンキツケイ／果物の輸入に関する記事』のように「グレープフルーツ」や「柑橘系」が未知語として検出される（ここでは未知語部分をスラッシュで括っている）。
【００２６】
なお、「柑橘系」のように未知語箇所の検出と音韻列の特定に成功する場合や「グレープフルーツ」のように音韻列の特定は不完全でも未知語箇所の検出に成功する場合がある。いずれの場合も、未知語に対する正しい語を推定することができれば、音声認識精度が向上し、結果として検索精度も向上する。
【００２７】
本発明のシステムのユーザは、検索対象のテキストコレクションから何らかの情報を引き出したいという意図を持って発話を行う。言い替えれば、ユーザの発話はテキストコレクション中の情報に関連したものである可能性が高い。そこで、上記「グレープラチナガノ」や「カンキツケイ」に対応する正しい語がコレクション中に含まれていると考えることは自然な発想である。
【００２８】
直感的には、検索対象テキストコレクションの索引語から、検出された未知語と音韻的に等価な語もしくは類似する語を探索してユーザ発話中の未知語を補完すればよい。しかし、音韻的に「類似する」語の探索（すなわち、音韻列の部分一致による探索）を大規模な索引に対して行うことは効率が悪く、実時間処理には耐えない。
【００２９】
そこで、まず、ユーザ発話中で単語として認識された部分だけを用いて初期検索を実行し、ユーザの検索要求に関連する文書を選択的に取得する。テキスト検索には確率型の「Ｏｋａｐｉ法」（参考文献〔１０〕）を用いた。当該手法は、与えられた検索要求に対するスコアを各文書に対して計算し、スコアが高い順番に文書を出力する。本発明のシステムでは、対象テキストを「茶筌」で形態素解析して名詞を索引語として抽出し、単語単位で索引付けを行って転置ファイルを事前に作成する。
【００３０】
次に、初期検索で得られた文書から、検出された未知語に対応する語を探索し、未知語と置き換えることで検索要求を補完する。具体的な方法については後述する。
【００３１】
最後に、補完された検索要求を用いて再検索を行い、最終的な検索結果が得られる。
【００３２】
上記の手法は、初期検索の結果を用いて最終的な検索精度を向上させるという点において、情報検索で用いられる検索要求の拡張（ｑｕｅｒｙｅｘｐａｎｓｉｏｎ）やローカルフィードバックに類似している（参考文献〔８〕）。
【００３３】
しかし、これらは検索精度を向上させることに主眼が置かれ、ユーザが意図しない索引語を追加する可能性がある。それに対して本手法は、ユーザの発話を正しく認識することを目的としている点が異なる。これは「自分が発話（意図）した通りに検索が行われている」という安心感をユーザに与える上で重要である。
【００３４】
次に、未知語の自動補完について説明する。
【００３５】
本発明のシステムの特長は、音声認識で検出された未知語の音韻系列を、初期検索で取得された上位文書中の索引語に対応付けることによって単語として正しく認識する点にある。この処理を「未知語の補完」と呼ぶことにする。
【００３６】
同音意義語のために、一つの音韻系列が複数の単語に対応することがある（例えば「河川」と「架線」）。また、未知語の音韻系列は誤って検出されることがあるため、補完対象の音韻系列一つに対して、音韻的に類似する複数の索引語を考慮する必要がある。すなわち、未知語の自動補完では、複数の候補から適切な索引語を選択するための曖昧性解消が必要である。
【００３７】
そこで、選択されるべき索引語が満たす条件について検討し、以下に示す３つの基準を設定した。
【００３８】
（１）補完対象の未知語との音韻的な類似度が高い（完全一致すれば類似度は最大となる）。
【００３９】
（２）上位文書における出現頻度が高い。
【００４０】
（３）より上位の文書に出現する。
【００４１】
これらを確率論的な枠組みで定式化すると、未知語補完は、式（１）で計算されるスコアを最大化するｔを選択することに相当する。
【００４２】
【数１】

【００４３】
ここで、Ｄ_qは検索要求ｑによって初期検索された上位文書の集合である。Ｐ（ω｜ｔ）はｔが音韻的にωと等価である確率、Ｐ（ｔ｜ｄ）は上位文書の一つｄから索引語を無作為に選んだ場合に、それがｔである確率、Ｐ（ｄ｜ｑ）は検索要求ｑによって文書ｄが検索される確率である。これらのパラメタは、上記３つの基準（１）〜（３）に順番に対応している。
【００４４】
しかし、実際にはＰ（ω｜ｔ）やＰ（ｄ｜ｑ）の確率値を正確に推定することは難しい。また、音韻的な類似度（上記、第１の基準）が他の基準よりもかなり強い制約になることが経験的に分かっている。そこで、予備実験の結果に基づいて、式（１）を式（２）のように近似する。
【００４５】
【数２】

【００４６】
ここで、Ｐ（ω｜ｔ）はｔとωが共有する音韻数とωに含まれる音韻総数の比率によって計算する。具体的には、ＤＰマッチングによってｔとωを音韻単位で比較し、両者に共通して含まれる音韻列を特定する。Ｐ（ｔ｜ｄ）はｄにおけるｔの相対頻度で計算する。Ｐ（ｄ｜ｑ）としてＯｋａｐｉ法で計算される文書ｄのスコアで代用する。また、Ｐ（ｄ｜ｑ）とＰ（ｄ｜ｑ）のｌｏｇを用いることで、これら２つの影響力が相対的に小さくなるように制御している。
【００４７】
以上の方法は、索引付けの手法に依存しない点に注意が必要である。言い替えれば、索引語ｔの単位として、文字、単語、複合語など文書中に現れる任意の文字列を対象とすることができる。
【００４８】
初期検索によって文書数を制限しても、索引語数は膨大なものになる場合がある。特に、ＤＰマッチングによる音韻単位の比較は実時間応答を低下させる要因となる。
【００４９】
また、上位文書中の索引語の多くは、補完対象の未知語と音韻的に全く類似しないため、これらのノイズを早期に排除できれば、計算効率の向上が期待できる。
【００５０】
通常のテキスト検索に用いられる索引（本発明のシステムでは転置ファイル）は、入力されたキーワードとの完全一致によって、該当する項目を効率良く検索できる。しかし、未知語補完用索引では、入力された音韻列に対して、部分一致を許容しながら、ある程度類似した項目だけを効率良く特定できなければならない。
【００５１】
本発明のシステムで用いる未知語検出の傾向を調査した結果、検出された未知語と、それに対応する正しい索引語は、前方もしくは後方で一致していることが多く、両端が一致せずに語中のみが一致することは少ない。そこで、未知語補完用の索引を以下の手順で事前に作成した。
【００５２】
まず、コレクション中の全文書を「茶筌」で形態素解析し、単語表記とカナ表記を抽出する。次に、カナ表記を規則によって音韻系列に変換する（規則数１４３）。最後に、音韻系列の前方と後方から任意長の部分列を抽出して、前方／後方部分一致探索が可能な索引を編成する。
【００５３】
このとき、単語一つと単語バイグラムを併用して索引を作成することで「弥生／時代」や「オゾン／ホール」のように２単語で構成される複合語にも対応した。
【００５４】
原理的には、３単語以上で構成される長い複合語も扱うことができる。しかし、未知語の長さに比例して探索時間がかかるため、現在は２単語までとしている。また、現状の音声認識では機能語のような高頻度語は既知語として正しく認識されやすいため、長い単語列（例えば「情報検索の応用分野」）が一つにまとまった未知語として検出されることは稀である。
【００５５】
本発明で実装したシステムを評価するために、ＩＲＥＸの日本語検索コレクションを用いて実験を行った。当コレクションは、毎日新聞１９９４−１９９５年（記事総数２１１，８５３件）を対象にした検索課題３０件と各課題に対する正解記事ＩＤで構成されている。検索課題の例を以下に示す。
【００５６】
＜ＴＯＰＩＣ＞＜ＴＯＰＩＣ−ＩＤ＞１０１０＜／ＴＯＰＩＣ−ＩＤ＞
＜ＤＥＳＣＲＩＰＴＩＯＮ＞柑橘類の輸入＜／ＤＥＳＣＲＩＰＴＩＯＮ＞
＜ＮＡＲＲＡＡＴＩＶＥ＞オレンジ、レモン、グレープフルーツなどの柑橘系果物の日本への輸入の記事、政府の市場開放や輸入による日本生産地の影響、値段への影響や消費者の反応などの記事を含む。＜／ＮＡＲＲＡＴＩＶＥ＞＜／ＴＯＰＩＣ＞
さらに、４名の話者（男女各２名）に＜ＮＡＲＲＡＴＩＶＥ＞フィールドを読み上げてもらい、合計１２０件の音声発話データを作成して実験に利用した。初期検索、再検索ともに上位３００件を出力した。
【００５７】
まず、未知語の検出と補完に関する評価を行った。３０件の検索要求（＜ＮＡＲＲＡＴＩＶＥ＞のみ）に含まれる単語は、のべ数で約４００語あり、１４単語（異なりで１３単語）が音声認識用辞書に登録されていなかった。
【００５８】
未知語検出の再現率と精度はそれぞれ７１．４％と２２．６％であった。本発明のシステムは未知語を網羅的に特定する傾向があることが分かる。さらに、未知語の補完精度を調べた結果、３６．２％であった。ここでは、辞書登録語が未知語として誤検出されても、補完処理によって正しい索引語に対応付けられた場合は正解と判定した。正しく補完された未知語と、索引語の例を以下に示す。
【００５９】
グレープラチナガノ／グレープフルーツ
ヤヨイチタ／弥生時代
ニククライス／ニックプライス
ベンピ／便秘
次に、検索精度への影響を調べるために、以下の異なる検索手法（システム）を比較した。
【００６０】
（ｉ）テキスト入力型検索システム
（ii）高頻度語２０，０００語のみを含む言語モデルを音声認識に使用した音声入力型検索システム
（iii ）検出した未知語は補完しないシステム
（iv）本発明のシステム（未知語の検出・補完を併用）
システム（iv）が本発明で提案するシステムに相当する。システム（ii）は未知語音節をモデル化していないため、未知語の検出と補完を行わない点を除けば、本発明のシステムと同じである。
【００６１】
各システムの平均適合率（％）を以下に示す。
【００６２】
【表１】

【００６３】
本発明のシステムの精度はテキスト検索には及ばないものの、約８７％を再現している。また、全ての話者に対してそれ以外の音声入力型システム〔（ii）と（iii ）〕の検索精度を向上させた。システム（iii ）と（iv）を比較することで未知語補完の効果が分かり、システム（ii）と（iv）を比較することで、未知語の検出と補完を併用した提案手法の有効性が分かる。
【００６４】
しかし、精度の向上はそれほど大きくなかった。今回の実験では未知語が本質的に少なかったため、全体的な差異が大きくならなかった。また、未知語を人工的に作るような不自然な実験設定は避けた。未知語問題がより深刻な対象（例えば、技術文書やウェブページ）について、今後さらなる評価実験を行う予定である。
【００６５】
システム（ii）に比べて、本発明のシステム（iv）の精度が顕著に低下した課題を分析した結果、初期検索の上位文書に正しい索引語を含まれているにも拘らず、式（２）のスコアで適切に選択されなかった事例が大半を占めた。例えば「制度」が未知語検出によって「センド」と誤認識されたために「鮮度」のように音韻的に等価な別の語が選択されてしまった。文書中の索引語頻度や文書順位などとのバランスについて今後検討が必要である。また、未知語音節をモデル化したために、辞書登録語を誤認識し、検索精度が低下した事例が若干あった。
【００６６】
最後に、オンライン処理のＣＰＵ時間を測定した。未知語の検出は、通常の統計的音声認識の枠組み内で行われるため、それに伴う付加的なＣＰＵ時間は発生しない。補完に要したＣＰＵ時間は未知語あたり平均３．５秒だった（ＣＰＵとしてＡＭＤＡｔｈｌｏｎＭＰ１９００＋を使用）。依然として改善の余地はあるものの、ほぼ実時間で動作すると考えて良い。
【００６７】
音声入力型の検索システムでは、音声認識と検索における語彙サイズの不整合は不可避である。本発明は、単語と音節を併用した言語モデルによってユーザ発話中の未知語を検出し、検索対象文書中の索引語によって適切に補完する手法を提案した。例としては、新聞記事を対象にした実験の結果、本発明は実時間で動作し、既存の手法を上回る検索精度を実現することができた。また、上述のように、技術文書やウェブページの検索では未知語問題がより深刻になるが、このような場合に、本発明の実用的効果は著大である。
【００６８】
なお、本発明は上記実施例に限定されるものではなく、本発明の趣旨に基づいて種々の変形が可能であり、これらを本発明の範囲から排除するものではない。
【００６９】
【発明の効果】
以上、詳細に説明したように、本発明によれば、音声認識でカバーできない単語を検索用の索引語によって自動的に補完することにより、音声発話の誤認識をなくし、検索精度の向上を図ることができる。
【図面の簡単な説明】
【図１】本発明の実施例を示す音声入力型テキスト検索システムの構成図である。
【図２】本発明の実施例を示す音声入力型テキスト検索フローチャートである。
【符号の説明】
Ａユーザ
Ｂ検索要求の発話
１音声認識部
２音響モデル
３辞書
４言語モデル
５書き起こし部
６テキスト検索部
７検索対象テキストコレクション
８検索結果の出力部
９未知語補完部
１０補完された検索要求部

[0001]
[Technical field of the invention]
The present invention relates to a text search device using voice input.
[0002]
[Prior art]
References in this technical field include the following.
[0003]
[1] J. Barnett, S. Anderson, J. Broglio, M. Singh, R. Hudson, and S. W. Kuo. Experiments in spoken queries for document retrieval. In Proceedings of Eurospeech 97, pp. 1323-1326, 1997.
[2] Fabio Crestani. Word recognition errors and relevance feedback in spoken query processing. In Proceedings of the Fourth International Conference on Flexible Query Answering Systems, pp. 267-281, 2000.
[3] Atsushi Fujii, Katunobu Itou, and Tetsuya Ishikawa. Speech-driven text retrieval: Using target IR collections for statistical language model adaptation in speech recognition. In Information Retrieval Techniques for Speech Applications, pp. 94-104, Springer, 2002.
[4] John S. Garofolo, Ellen M. Voorhees, Vincent M. Stanford, and Karen Sparck Jones. TREC-6 1997 spoken document retrieval track overview and results. In Proceedings of the 6th Text REtrieval Conference, pp. 83-91, 1997.
[5] Katunobu Itou, Atsushi Fujii, and Tetsuya Ishikawa. Language modeling for multi-domain speech-driven text retrieval. In IEEE Automatic Speech Recognition and Understanding Workshop, 2001.
[6] Katunobu Itou, Mikio Yamamoto, Kazuya Takeda, Toshiyuki Takezawa, Tatsuo Matsuoka, Tetsunori Kobayashi, and Kiyohiro Shikano. JNAS: Japanese speech corpus for large vocabulary continuous speech recognition research. Journal of Acoustic Society of Japan, Vol. 20, No. 3, pp. 199-206, 1999.

[0004]
[8] K. L. Kwok and M. Chan. Improving two-stage ad-hoc retrieval for short queries. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 250-256, 1998.
[9] Douglas B. Paul and Janet M. Baker. The design for the Wall Street Journal-based CSR corpus. In Proceedings of DARPA Speech & Natural Language Workshop, pp. 357-362, 1992.
[10] S. E. Robertson and S. Walker. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 232-241, 1994.
[11] Herman J. M. Steeneken and David A. van Leeuwen. Multi-lingual assessment of speaker independent large vocabulary speech-recognition systems: The SQALE-project. In Proceedings of Eurospeech 95, pp. 1271-1274, 1995.

[0005]
[13] Steve Young. A review of large-vocabulary continuous-speech recognition. IEEE Signal Processing Magazine, pp. 45-57, September 1996.
[14] Katunobu Itou and Kazuyo Tanaka. A coverage-oriented statistical language model for large vocabulary continuous speech recognition. Proceedings of the Acoustical Society of Japan, pp. 65-66, March 1999.
[15] Kiyohiro Shikano, Katunobu Itou, Tatsuya Kawahara, Kazuya Takeda, and Mikio Yamamoto (eds.). Speech Recognition Systems. Ohmsha, 2001.
Recent speech recognition technology achieves practical recognition accuracy for utterances whose content is reasonably well organized, and a variety of applications are conceivable. In the field of information retrieval, many studies have also incorporated speech recognition. Depending on their purpose, these studies fall broadly into two categories: “retrieval of speech data” and “retrieval by speech”. The former has been studied actively, helped by the broadcast speech test collections prepared for the Spoken Document Retrieval (SDR) track of TREC (reference [4]), and has already reached a practical level (reference [7]).
[0006]
[Problems to be solved by the invention]
By contrast, although retrieval by speech is an important underlying technology for applications that do not assume keyboard input, such as car navigation systems and call centers, there are fewer studies of it than of speech data retrieval.
[0007]
Moreover, previous studies simply connect an existing speech recognizer to a text retrieval system, so speech recognition errors markedly reduce retrieval accuracy (references [1], [2]).
[0008]
The present inventors therefore proposed a method that builds the language model for speech recognition from the collection to be searched, improving both speech recognition accuracy and retrieval accuracy (references [3], [5]).
[0009]
However, speech-input retrieval systems suffer from the problem of unknown words (words not registered in the system dictionary). Unlike classical controlled-vocabulary systems, recent information retrieval systems allow retrieval by any word in the target text. It is not unusual for the index to contain on the order of millions of terms.
[0010]
Function words such as particles are excluded from the index as stop words, but because they are rarely used as search keywords, the vocabulary can be regarded as effectively unrestricted.
[0011]
In recent speech recognition systems, on the other hand, the vocabulary size (the number of words registered in the dictionary) is limited. Because this stems mainly from hardware constraints and from the training efficiency of statistical models (reference [13]), it is hard to solve simply by registering more words. In many languages the vocabulary is limited to a few tens of thousands of words at most (references [6], [9], [11]), which is extremely small compared with the index size of a practical retrieval system.
[0012]
In addition, statistical speech recognition recognizes high-frequency words such as function words most accurately, whereas in information retrieval low-frequency words that appear only in particular documents tend to be the most effective index terms. This creates a contradiction: the more effective a search keyword in the user's utterance is, the more likely it is to be misrecognized.
[0013]
In summary, the “unknown word problem” is inherently unavoidable in speech-input retrieval systems, and a proactive solution is needed.
[0014]
In view of the above, an object of the present invention is to provide a text search device using voice input that automatically completes words that speech recognition cannot cover by using index terms from the search index, thereby eliminating misrecognition of spoken utterances and improving retrieval accuracy.
[0015]
[Means for Solving the Problems]
To achieve the above object, the present invention provides the following.
[1] A text search device using voice input, comprising a speech recognition unit (1), a transcription unit (5), a text search unit (6), a search target text collection (7), an unknown word completion unit (9), a completed search request unit (10), and a search result output unit (8), wherein, when a user utters a search request, the speech recognition unit (1) transcribes, in the transcription unit (5), the user's utterance including unknown words not registered in the speech recognition dictionary; the unknown word completion unit (9) automatically completes each unknown word in the user's utterance by searching the index terms of the search target text collection (7) for a word that is phonologically equivalent or similar to the detected unknown word; the text search unit (6) performs a second search using the search request completed in the completed search request unit (10); and the final search result is obtained from the search result output unit (8).
[0016]
[2] The text search device using voice input according to [1] above, wherein the speech recognition unit (1) includes an acoustic model (2), a dictionary (3), and a language model (4).
[0017]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail.
[0018]
FIG. 1 is a block diagram of a speech input type text search system showing an embodiment of the present invention, and FIG. 2 is a speech input type text search flowchart showing an embodiment of the present invention.
[0019]
In FIG. 1, B is the utterance of a search request by user A, 1 is a speech recognition unit, 2 is an acoustic model, 3 is a dictionary, 4 is a language model, 5 is a transcription unit, 6 is a text search unit, 7 is a search target text collection, 8 is a search result output unit, 9 is an unknown word completion unit, and 10 is a completed search request unit.
[0020]
Thus, the speech-input text search system of the present invention consists of three modules: the speech recognition unit 1, the text search unit 6, and the unknown word completion unit 9. Although it is currently implemented for Japanese, the proposed method does not depend on the language.
[0021]
The processing of this speech-input text search system is described below with reference to FIG. 1.
[0022]
First, when user A utters a search request B, the speech recognition unit 1 generates a transcription of the utterance B using the dictionary 3, the acoustic model 2, and the language model 4. The system of the present invention uses the speech recognition unit 1 and acoustic model 2 provided in the Japanese Dictation Toolkit (reference [15]). However, to detect unknown words in the user utterance B, the dictionary 3 and language model 4 were built specifically for this system (reference [14]).
[0023]
Specifically, ten years of Mainichi Shimbun CD-ROM articles (1991-2000) were morphologically analyzed with ChaSen (「茶筌」), and the 20,000 most frequent words were extracted to form the dictionary. Normally, the language model 4 is built from N-grams over the words in the dictionary, but such a model cannot recognize words outside the dictionary. Therefore, the roughly 300,000 words (distinct types) not registered in the dictionary were split into syllables, and trigrams were built over words and syllables together. There were 700 distinct syllables.
[0024]
That is, in the language model 4 of the system of the present invention, words outside the dictionary are modeled as combinations of syllables. As a result, such a word is not recognized as a word but is transcribed syllable by syllable as a katakana string. Because the language model 4 is an ordinary statistical N-gram, it can be used with an existing decoder without modification. In this respect it differs from methods that separately require phoneme-sequence recognition (reference [12]).
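The corpus preparation described above can be sketched as follows. This is a minimal illustration, assuming each training sentence is already available as (word, syllable list) pairs from a morphological analyzer; the function names, the toy corpus, and the sentence-boundary symbols are hypothetical and not taken from the patent.

```python
from collections import Counter

def build_vocabulary(sentences, size):
    """Return the `size` most frequent words as the recognition dictionary."""
    counts = Counter(word for sent in sentences for word, _ in sent)
    return {word for word, _ in counts.most_common(size)}

def to_hybrid_tokens(sentence, vocabulary):
    """Keep dictionary words; expand every other word into its syllables."""
    tokens = []
    for word, syllables in sentence:
        if word in vocabulary:
            tokens.append(word)
        else:
            tokens.extend(syllables)
    return tokens

def count_trigrams(token_lists):
    """Count trigrams over the mixed word/syllable token stream."""
    counts = Counter()
    for tokens in token_lists:
        padded = ["<s>", "<s>"] + tokens + ["</s>"]  # assumed boundary symbols
        for i in range(len(padded) - 2):
            counts[tuple(padded[i:i + 3])] += 1
    return counts

# Toy corpus (hypothetical): each word is paired with its syllable reading.
corpus = [
    [("果物", ["ク", "ダ", "モ", "ノ"]), ("の", ["ノ"]), ("輸入", ["ユ", "ニュ", "ウ"])],
    [("柑橘系", ["カ", "ン", "キ", "ツ", "ケ", "イ"]), ("の", ["ノ"]), ("果物", ["ク", "ダ", "モ", "ノ"])],
]
vocab = build_vocabulary(corpus, size=2)          # the patent uses 20,000
hybrid = [to_hybrid_tokens(s, vocab) for s in corpus]
trigrams = count_trigrams(hybrid)
```

Because out-of-dictionary words become ordinary tokens (syllables) in the same N-gram, a standard decoder can emit them without any special handling, which is the property the paragraph above relies on.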
[0025]
Taking the utterance 「オレンジやグレープフルーツなどの柑橘系果物の輸入に関する記事」 (“articles on imports of citrus fruits such as oranges and grapefruit”) as an example, the transcription is 『オレンジや／グレープラチナガノ／などの／カンキツケイ／果物の輸入に関する記事』, so that 「グレープフルーツ」 (grapefruit) and 「柑橘系」 (citrus) are detected as unknown words (the unknown word portions are delimited by slashes here).
[0026]
Note that in some cases both the detection of the unknown word position and the identification of its phoneme sequence succeed, as with 「柑橘系」 (transcribed correctly as カンキツケイ), while in others the position is detected even though the phoneme sequence is only partly correct, as with 「グレープフルーツ」 (transcribed as グレープラチナガノ). In either case, if the correct word for the unknown word can be estimated, speech recognition accuracy improves and, as a result, retrieval accuracy improves as well.
[0027]
Users of the system of the present invention speak with the intention of extracting some information from the text collection being searched. In other words, the user's utterance is likely to relate to information in the collection. It is therefore natural to assume that the correct words corresponding to 「グレープラチナガノ」 and 「カンキツケイ」 above are contained in the collection.
[0028]
Intuitively, it would suffice to search the index terms of the search target text collection for a word that is phonologically equivalent or similar to the detected unknown word and use it to complete the unknown word in the user's utterance. However, searching a large index for phonologically “similar” words (that is, searching by partial match of phoneme sequences) is inefficient and too slow for real-time processing.
[0029]
Therefore, an initial search is first run using only the parts of the user's utterance that were recognized as words, selectively retrieving documents related to the user's search request. The probabilistic Okapi method (reference [10]) was used for text retrieval. This method computes, for each document, a score with respect to the given search request and outputs the documents in descending order of score. In the system of the present invention, the target text is morphologically analyzed with ChaSen, nouns are extracted as index terms, and an inverted file indexed by word is built in advance.
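As an illustration of scoring documents from an inverted file, the sketch below uses the widely known BM25 form of Okapi weighting. The text only cites the Okapi method by reference, so the exact formula, the parameters `k1` and `b`, and all function names here are assumptions rather than the patent's implementation.

```python
import math
from collections import Counter, defaultdict

def build_inverted_index(docs):
    """Map each index term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, terms in docs.items():
        for term, tf in Counter(terms).items():
            index[term][doc_id] = tf
    return index

def bm25_search(query_terms, docs, index, k1=1.2, b=0.75, top_n=300):
    """Rank documents by a BM25-style score, highest first."""
    n_docs = len(docs)
    avg_len = sum(len(terms) for terms in docs.values()) / n_docs
    scores = defaultdict(float)
    for term in query_terms:
        postings = index.get(term, {})
        df = len(postings)
        if df == 0:
            continue  # term not in the collection
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in postings.items():
            norm = tf + k1 * (1 - b + b * len(docs[doc_id]) / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Toy collection (hypothetical): documents as lists of noun index terms.
docs = {
    "d1": ["オレンジ", "輸入", "果物"],
    "d2": ["果物", "価格"],
    "d3": ["選挙", "制度"],
}
index = build_inverted_index(docs)
ranking = bm25_search(["果物", "輸入"], docs, index)
```

Here `ranking` lists `d1` ahead of `d2`, and `d3` does not appear because it shares no term with the query.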
[0030]
Next, the search request is completed by searching the documents obtained by the initial search for words corresponding to the detected unknown words and substituting them for the unknown words. The specific method is described later.
[0031]
Finally, a re-search is performed using the supplemented search request, and a final search result is obtained.
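The three steps above (initial search with known words, completion of unknown words, second search) can be summarized in a short sketch. The transcript representation as (token, is_unknown) pairs and the `search` and `complete` callables are hypothetical stand-ins; the toy implementations exist only to make the example runnable.

```python
def speech_driven_search(transcript, search, complete, top_n=300):
    """Two-pass retrieval over a transcript of (token, is_unknown) pairs."""
    # Pass 1: query with the tokens that were recognized as dictionary words.
    known = [tok for tok, is_unknown in transcript if not is_unknown]
    initial = search(known)[:top_n]
    # Replace each unknown token using evidence from the top-ranked documents.
    completed = [
        complete(tok, initial) if is_unknown else tok
        for tok, is_unknown in transcript
    ]
    # Pass 2: query again with the completed search request.
    return completed, search(completed)[:top_n]

def toy_search(terms):
    """Hypothetical retrieval: rank documents by number of matching terms."""
    docs = {"d1": {"果物", "輸入", "グレープフルーツ"}, "d2": {"果物"}}
    hits = [(d, len(words & set(terms))) for d, words in docs.items()]
    return [d for d, n in sorted(hits, key=lambda h: -h[1]) if n > 0]

def toy_complete(unknown, top_docs):
    """Hypothetical completion: a fixed lookup instead of real scoring."""
    return {"グレープラチナガノ": "グレープフルーツ"}.get(unknown, unknown)

transcript = [("グレープラチナガノ", True), ("果物", False), ("輸入", False)]
query, results = speech_driven_search(transcript, toy_search, toy_complete)
```

Passing the retrieval and completion steps in as functions keeps the control flow visible without committing to any particular retrieval model or scoring rule.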
[0032]
The above method resembles query expansion and local feedback as used in information retrieval (reference [8]) in that it uses the results of an initial search to improve the final retrieval accuracy.
[0033]
However, those techniques focus on improving retrieval accuracy and may add index terms that the user did not intend. The present method differs in that its aim is to recognize the user's utterance correctly. This matters for giving the user the assurance that “the search is being carried out exactly as I said (intended)”.
[0034]
Next, automatic completion of unknown words will be described.
[0035]
The distinguishing feature of the system of the present invention is that the phoneme sequence of an unknown word detected by speech recognition is recognized correctly as a word by mapping it to an index term in the top-ranked documents obtained by the initial search. This process is called “unknown word completion”.
[0036]
Because of homophones, one phoneme sequence may correspond to several words (for example, 「河川」 (river) and 「架線」 (overhead wire), both read kasen). In addition, since the phoneme sequence of an unknown word may itself be detected incorrectly, several phonologically similar index terms must be considered for each phoneme sequence to be completed. In other words, automatic completion of unknown words requires disambiguation to select the appropriate index term from multiple candidates.
[0037]
The conditions that the selected index term should satisfy were therefore examined, and the following three criteria were set.
[0038]
(1) Its phonological similarity to the unknown word to be completed is high (the similarity is maximal for an exact match).
[0039]
(2) Its frequency of occurrence in the top-ranked documents is high.
[0040]
(3) It appears in higher-ranked documents.
[0041]
When these are formulated in a probabilistic framework, unknown word completion corresponds to selecting t that maximizes the score calculated by Equation (1).
[0042]
[Expression 1]
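The equation itself is not reproduced in this text. Taking ω to be the detected phoneme sequence and t a candidate index term, one form consistent with the three criteria above and with the term definitions that follow would be (a reconstruction, not the published figure):

```latex
\hat{t} = \arg\max_{t} \sum_{d \in D_q} P(\omega \mid t) \cdot P(t \mid d) \cdot P(d \mid q) \tag{1}
```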

[0043]
Here, D_q is the set of top-ranked documents retrieved by the initial search for search request q. P(ω|t) is the probability that t is phonologically equivalent to ω; P(t|d) is the probability that an index term chosen at random from a top-ranked document d is t; and P(d|q) is the probability that document d is retrieved by search request q. These parameters correspond, in order, to criteria (1) to (3) above.
[0044]
However, in practice it is difficult to estimate the values of P(ω|t) and P(d|q) accurately. It is also known empirically that phonological similarity (the first criterion above) is a considerably stronger constraint than the other criteria. Based on the results of preliminary experiments, Equation (1) is therefore approximated by Equation (2).
[0045]
[Expression 2]
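The equation itself is not reproduced in this text. Given that the logarithm is said to damp the influence of the document-side factors while the phonological similarity acts as the dominant constraint, one plausible form is the following (a reconstruction under that assumption, not the published figure):

```latex
\mathrm{score}(t) = P(\omega \mid t) \cdot \log\Bigl( \sum_{d \in D_q} P(t \mid d) \cdot P(d \mid q) \Bigr) \tag{2}
```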

[0046]
Here, P(ω|t) is computed as the ratio of the number of phonemes shared by t and ω to the total number of phonemes in ω. Specifically, t and ω are compared phoneme by phoneme using DP matching to identify the phoneme sequence common to both. P(t|d) is computed as the relative frequency of t in d. The score of document d computed by the Okapi method is used in place of P(d|q). In addition, taking the logarithm of P(t|d) and P(d|q) keeps the influence of these two factors relatively small.
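The quantities described above can be sketched in code. This is a minimal illustration under several assumptions: DP matching is realized as a longest-common-subsequence alignment, phonemes are approximated by single characters of a romanized string, and the score combines the similarity with the logarithm of the summed document evidence. None of the names below come from the patent.

```python
import math

def shared_phonemes(t, w):
    """Longest common subsequence length of two phoneme sequences (DP)."""
    table = [[0] * (len(w) + 1) for _ in range(len(t) + 1)]
    for i, a in enumerate(t, 1):
        for j, b in enumerate(w, 1):
            if a == b:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(t)][len(w)]

def phonetic_similarity(t, w):
    """Shared phonemes divided by the number of phonemes in w."""
    return shared_phonemes(t, w) / len(w) if w else 0.0

def completion_score(t_phonemes, w_phonemes, term, top_docs):
    """top_docs: list of (retrieval score, {index term: frequency}) pairs."""
    evidence = 0.0
    for doc_score, term_counts in top_docs:
        total = sum(term_counts.values())
        if total:
            # relative frequency of the term in d, weighted by d's score
            evidence += term_counts.get(term, 0) / total * doc_score
    if evidence <= 0:
        return float("-inf")  # term absent from every top-ranked document
    # Note: the product is only a sensible ranking when evidence > 1.
    return phonetic_similarity(t_phonemes, w_phonemes) * math.log(evidence)

# Hypothetical top-ranked documents with made-up retrieval scores.
top_docs = [
    (12.0, {"グレープフルーツ": 2, "輸入": 2}),
    (8.0, {"グレープフルーツ": 1, "価格": 3}),
]
score = completion_score("kasen", "kasen", "グレープフルーツ", top_docs)
```

In this toy case the phoneme strings match exactly, so the similarity is 1 and the score reduces to the logarithm of the evidence term (0.5 × 12 + 0.25 × 8 = 8).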
[0047]
It should be noted that the above method does not depend on the indexing method. In other words, as a unit of the index word t, an arbitrary character string appearing in the document such as a character, a word, or a compound word can be targeted.
[0048]
Even if the number of documents is limited by the initial search, the number of index words may be enormous. In particular, the comparison of phoneme units by DP matching is a factor that reduces the real-time response.
[0049]
In addition, most of the index terms in the top-ranked documents bear no phonological resemblance at all to the unknown word to be completed, so eliminating this noise early can be expected to improve computational efficiency.
[0050]
The index used for ordinary text retrieval (an inverted file in the system of the present invention) can efficiently retrieve matching entries by exact match with the input keyword. An index for unknown word completion, however, must efficiently identify only those entries that are reasonably similar to an input phoneme sequence while allowing partial matches.
[0051]
An investigation of the tendencies of the unknown word detection used in the system of the present invention showed that a detected unknown word and its correct index term usually agree at the beginning or at the end; it is rare for them to agree only in the middle with both ends differing. The index for unknown word completion was therefore built in advance by the following procedure.
[0052]
First, all documents in the collection are morphologically analyzed with ChaSen to extract the surface form and kana reading of each word. Next, the kana reading is converted into a phoneme sequence by rule (143 rules). Finally, substrings of arbitrary length are extracted from the beginning and end of each phoneme sequence, and an index that supports prefix and suffix partial-match search is built.
[0053]
Here, the index was built from both single words and word bigrams, so that compounds consisting of two words, such as 「弥生／時代」 (Yayoi period) and 「オゾン／ホール」 (ozone hole), are also covered.
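The index construction just described might look like the following sketch. It assumes each word already comes with a phoneme string (romanized here, one character per phoneme, as a simplification of the rule-based kana conversion), and it introduces a minimum affix length `min_len` that the text does not specify. All names are illustrative.

```python
from collections import defaultdict

def with_bigrams(words):
    """Add adjacent-word pairs so two-word compounds are indexed too."""
    pairs = [(a[0] + b[0], a[1] + b[1]) for a, b in zip(words, words[1:])]
    return list(words) + pairs

def build_affix_index(entries, min_len=2):
    """entries: iterable of (surface form, phoneme string) pairs."""
    prefix_index, suffix_index = defaultdict(set), defaultdict(set)
    for surface, phonemes in entries:
        for n in range(min_len, len(phonemes) + 1):
            prefix_index[phonemes[:n]].add(surface)   # leading substring
            suffix_index[phonemes[-n:]].add(surface)  # trailing substring
    return prefix_index, suffix_index

def candidates(unknown, prefix_index, suffix_index, min_len=2):
    """Index terms sharing a prefix or suffix of at least min_len phonemes."""
    found = set()
    for n in range(min_len, len(unknown) + 1):
        found |= prefix_index.get(unknown[:n], set())
        found |= suffix_index.get(unknown[-n:], set())
    return found

# Toy word sequence (hypothetical) in document order.
words = [("弥生", "yayoi"), ("時代", "jidai"), ("便秘", "benpi")]
prefix_index, suffix_index = build_affix_index(with_bigrams(words))
hits = candidates("yayoichita", prefix_index, suffix_index)
```

For the misrecognized sequence `yayoichita`, the lookup returns 「弥生」 and the bigram 「弥生時代」 through their shared prefix, leaving only these few candidates for the more expensive DP comparison.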
[0054]
In principle, longer compounds of three or more words can also be handled. However, because search time grows in proportion to the length of the unknown word, the current limit is two words. Moreover, since high-frequency words such as function words tend to be recognized correctly as known words by current speech recognition, a long word sequence (for example, 「情報検索の応用分野」, “application fields of information retrieval”) is rarely detected as a single unknown word.
[0055]
In order to evaluate the system implemented in the present invention, an experiment was conducted using the Japanese search collection of IREX. This collection is composed of 30 search tasks for the Mainichi Shimbun 1994-1995 (total number of articles 211,853) and correct article IDs for each task. Examples of search tasks are shown below.
[0056]
<TOPIC><TOPIC-ID>1010</TOPIC-ID>
<DESCRIPTION>Import of citrus fruits</DESCRIPTION>
<NARRATIVE>Articles on imports into Japan of citrus fruits such as oranges, lemons, and grapefruit. Includes articles on the government's market opening, the impact of imports on Japanese producing regions, the effect on prices, and consumer reactions.</NARRATIVE></TOPIC>
In addition, four speakers (two men and two women) read the <NARRATIVE> field aloud, producing a total of 120 speech utterances used in the experiment. Both the initial search and the second search output the top 300 documents.
[0057]
First, unknown word detection and completion were evaluated. The 30 search requests (<NARRATIVE> fields only) contained about 400 word tokens in total, of which 14 (13 distinct words) were not registered in the speech recognition dictionary.
[0058]
The recall and precision of unknown word detection were 71.4% and 22.6%, respectively. That is, the system of the present invention tends to detect unknown words exhaustively, at the cost of many false detections. The completion accuracy for unknown words was 36.2%. Here, even when a dictionary-registered word is erroneously detected as an unknown word, the result is counted as correct if the completion processing associates it with the correct index word. Examples of correctly completed unknown words and the corresponding index words are shown below.
[0059]
Gray Platinum Gano / Grapefruit
Yayoichita / Yayoi Period
Nikkurisu / Nickprice
Bempi / Constipation
Next, in order to investigate the influence on search accuracy, the following different search methods (systems) were compared.
[0060]
(i) Text input type search system
(ii) Speech input type search system using a language model containing only 20,000 high-frequency words for speech recognition
(iii) A system that does not complete detected unknown words
(iv) The system of the present invention (combined detection and completion of unknown words)
System (iv) is the system proposed in the present invention. System (ii) does not model unknown-word syllables; it is otherwise the same as the system of the present invention, except that unknown words are neither detected nor completed.
[0061]
The average precision (%) of each system is shown below.
[0062]
[Table 1]

[0063]
Although the accuracy of the system of the present invention does not reach that of text search, it attains about 87% of it. In addition, it improved on the search accuracy of the other speech input systems [(ii) and (iii)] for all speakers. Comparing systems (iii) and (iv) shows the effect of unknown word completion. Comparing systems (ii) and (iv) shows the effectiveness of the proposed method, which combines unknown word detection and completion.
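For reference, the average precision of a single search task can be computed from a ranked result list and the set of correct article IDs. The sketch below assumes the common non-interpolated definition (the mean of the precision values at the rank of each correct article, divided by the total number of correct articles); the text does not state which variant was used, and the document IDs are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """Non-interpolated average precision of one ranked list (assumed definition)."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank  # precision at the rank of this correct article
    # Correct articles that were never retrieved contribute zero.
    return total / len(relevant)

# Hypothetical ranked output and correct-answer set for one task.
ap = average_precision(["d3", "d1", "d7", "d2"], {"d1", "d2", "d9"})
```

In this toy case the correct articles appear at ranks 2 and 4, giving precisions 1/2 and 2/4, and one correct article is missed, so the value is (0.5 + 0.5) / 3. Averaging this quantity over all tasks gives a per-system figure of the kind reported per system.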
[0064]
However, the improvement in accuracy was not large. In this experiment there were few unknown words to begin with, so the overall difference remained small. We also avoided unnatural experimental settings that create unknown words artificially. Further evaluation experiments will be conducted on targets for which the unknown word problem is more serious (for example, technical documents and web pages).
[0065]
Analysis of the cases in which the accuracy of the system (iv) of the present invention was markedly lower than that of system (ii) showed that, in the majority of them, the index word was not properly selected by the score of expression (2). For example, because “system” was misrecognized as “send” by unknown word detection, another phonologically equivalent word such as “freshness” was selected. The balance between the frequency of an index word in the documents and the document rank needs further investigation. In addition, because unknown-word syllables were modeled, there were some cases in which dictionary-registered words were misrecognized and the search accuracy decreased.
[0066]
Finally, CPU time for online processing was measured. The detection of unknown words is performed within the normal statistical speech recognition framework, so there is no additional CPU time associated with it. The CPU time required for complementation was an average of 3.5 seconds per unknown word (using AMD Athlon MP 1900+ as the CPU). Although there is still room for improvement, it can be assumed that it will operate almost in real time.
[0067]
In a speech input type search system, a vocabulary size mismatch between speech recognition and search is inevitable. The present invention proposes a method of detecting an unknown word in a user's utterance using a language model that uses both words and syllables, and of appropriately completing it with an index word from the search target documents. In experiments on newspaper articles, the present invention operated almost in real time and achieved a search accuracy higher than that of existing methods. Further, as described above, the unknown word problem becomes more serious in the search of technical documents and web pages. In such cases, the practical effect of the present invention is remarkable.
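The completion step summarized above can be illustrated with a short sketch. Everything here is an assumption made for illustration: phoneme sequences are written as plain strings with one character per phoneme, similarity is measured by edit distance rather than by the scoring expression of the embodiment, and the function names, the threshold, and the sample index words are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme strings."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]

def complete_unknown_word(unknown_phonemes, index_words, max_distance=2):
    """Return the index word closest to the detected unknown word, or None.

    index_words: dict mapping an index word to its phoneme string.
    max_distance: hypothetical cutoff beyond which no completion is made.
    """
    best_word, best_dist = None, max_distance + 1
    for word, phonemes in index_words.items():
        d = edit_distance(unknown_phonemes, phonemes)
        if d < best_dist:
            best_word, best_dist = word, d
    return best_word

# Hypothetical index words with one-character-per-phoneme strings.
index_words = {"便秘": "beNpi", "鮮度": "seNdo", "制度": "seido"}
completed = complete_unknown_word("bempi", index_words)
```

The completed word would then replace the unknown word in the transcribed request before the re-search. A pure phonological distance like this one ignores index-word frequency and document rank, which the analysis above identifies as factors still to be balanced.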
[0068]
The present invention is not limited to the above embodiment. Various modifications are possible based on the spirit of the present invention, and these are not excluded from the scope of the present invention.
[0069]
[Effect of the invention]
As described above in detail, according to the present invention, words that cannot be covered by speech recognition are automatically completed with search index words, so that misrecognition of spoken utterances can be eliminated and search accuracy can be improved.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of a speech input type text search system showing an embodiment of the present invention.
FIG. 2 is a speech input type text search flowchart showing an embodiment of the present invention.
[Explanation of symbols]
A: user
B: utterance of search request
1: speech recognition unit
2: acoustic model
3: dictionary
4: language model
5: transcription unit
6: text search unit
7: search target text collection
8: search result output unit
9: unknown word completion unit
10: supplemented search request unit

Claims

(A) a speech recognition unit (1);
(B) a transcription unit (5);
(C) a text search unit (6);
(D) a search target text collection (7);
(E) an unknown word completion unit (9);
(F) a supplemented search request unit (10);
(G) a search result output unit (8);
(H) wherein, when the user utters a search request, the speech recognition unit (1) transcribes, by means of the transcription unit (5), the user's utterance including an unknown word that is not registered in the dictionary for speech recognition; a word that is phonologically equivalent or similar to the detected unknown word is searched for among the index words of the search target text collection (7); the unknown word in the user's utterance is automatically completed by the unknown word completion unit (9); the text search unit (6) performs a re-search using the search request supplemented by the supplemented search request unit (10); and the final search result is output from the search result output unit (8). A text search device by voice input, characterized by obtaining a search result in this manner.

The text search device by voice input according to claim 1, wherein the speech recognition unit (1) has an acoustic model (2), a dictionary (3), and a language model (4).