JP2004348552A - Voice document search device, method, and program

Info

Publication number: JP2004348552A
Application number: JP2003146299A
Authority: JP
Inventors: Takaaki Hasegawa; 隆明長谷川; Yoshihiko Hayashi; 林　　良彦
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2003-05-23
Filing date: 2003-05-23
Publication date: 2004-12-09

Abstract

PROBLEM TO BE SOLVED: To enrich the speech recognition dictionary in order to improve voice document search accuracy.
SOLUTION: Correct answers for speech recognition errors are obtained statistically from speech recognition results and the corresponding text. At search time, each voice document is expanded with the correct answers corresponding to its recognition errors, which increases its similarity to the query.
COPYRIGHT: (C)2005, JPO&NCIPI

Description

[0001]
[Technical field of the invention]
The present invention relates to a voice document search system in which a user searches video-recorded or audio-recorded voice documents for a desired voice document by entering keywords, and more particularly to a voice document search system that calculates the similarity between a voice document and the user's query while taking speech recognition errors into account.
[0002]
[Prior art]
As the accuracy of speech recognition technology has improved, research and development has been carried out on applying conventional text document search techniques to multimedia documents, by running speech recognition on the audio in those documents and converting it to text. For example, in Patent Document 1, keywords for search are extracted by performing speech recognition on audio information.
[0003]
For reasons of recognition accuracy and processing speed, the dictionary used for speech recognition has a limited vocabulary, chosen with the application field and likely word combinations in mind. A word that is not in the dictionary cannot be recognized and is output as a different word. Consequently, a search index built from a collection of voice documents contains at most as many words as are stored in the speech recognition dictionary. Users, on the other hand, do not take the recognizer's vocabulary limit into account when searching, and may enter any keyword.
[0004]
[Patent Document 1]
Japanese Patent Application Laid-Open No. 2001-229180
[Non-Patent Document 1]
"Dynamic Programming Algorithm (DPA) for Edit-Distance", http://www.csse.monash.edu.au?/~lloyd/tildeAlgDS/Dynamic/Edit/
[Non-Patent Document 2]
Iwanami Lecture Series on Software Science 15, Natural Language Processing, edited by Makoto Nagao, Chapter 11
[0005]
[Problems to be solved by the invention]
As described above, because the recognition vocabulary is limited, mismatches arise between the words in the index built from the voice document collection and the words in the keywords the user enters, which makes it difficult for the user to find the desired voice document.
[0006]
To solve this mismatch problem, the present invention aims to let the user find the desired voice document by building document vectors from correct-word candidates, which are obtained by statistically determining, for each word in the speech recognition result of a voice document, which word it was actually a misrecognition of.
[0007]
[Means for solving the problems]
To solve the above problem, the voice document search device or method of the present invention statistically obtains the correct answers for speech recognition errors from speech recognition results and the corresponding text, and at search time expands each voice document with the correct answers corresponding to its recognition errors, thereby increasing its similarity to the query and improving search accuracy. This statistical verification of word errors can be carried out, for example, by the administrator of the voice document search device as one step of maintenance. For example, the administrator prepares a text containing many words that are not in the existing dictionary and inputs it to the device both as speech data and as text data. Using the configuration or procedure described below, the device collates the input speech data with the text data and verifies the word errors. In this way the device learns automatically as it performs many verifications and can steadily improve search accuracy. This makes it possible for the user to find the desired voice document.
[0008]
That is, a first aspect of the present invention is a voice document search device comprising: a speech recognition unit that outputs a speech recognition result for a voice document; a word association unit that extracts the correspondence between words of the speech recognition result and words of a text transcribed from the speech; a word pair table composed of a list of pairs, each consisting of a speech recognition word that, in the extracted correspondence, was correctly recognized or was a substitution error, and the text word corresponding to it; a word pair storage unit that selects, from the correspondence extracted by the word association unit, the correctly recognized speech recognition words and the speech recognition words that are substitution errors and, for each selected word, updates the occurrence frequency of the speech recognition word and its corresponding text word if that pair is already stored in the word pair table, and otherwise newly stores the pair; a word candidate expansion unit that, using the speech recognition words as keys, extracts the text words stored in the word pair table and adds them to the speech recognition correct-word candidates of the voice document; a document vector generation unit that creates a document vector based on the set of correct-word candidates; a query input unit that receives a user's search query; a query vector generation unit that generates a query vector based on the search query; a similarity calculation unit that calculates the similarity between the query vector and the document vector; and an output unit that outputs, as search results, the names of the voice documents indicated by the document vectors in descending order of similarity (claim 1).
[0009]
A second aspect of the present invention is a voice document search method comprising the steps of: outputting a speech recognition result for a voice document; extracting the correspondence between words of the speech recognition result and words of a text transcribed from the speech; constructing a table from a list of pairs, each consisting of a speech recognition word that, in the extracted correspondence, was correctly recognized or was a substitution error, and the text word corresponding to it; selecting, from the extracted correspondence, the correctly recognized speech recognition words and the speech recognition words that are substitution errors and, for each selected word, updating the occurrence frequency of the speech recognition word and its corresponding text word if that pair is already stored in the table, and otherwise newly storing the pair; using the speech recognition words as keys, extracting the text words stored in the table and adding them to the speech recognition correct-word candidates of the voice document; creating a document vector based on the set of correct-word candidates; receiving a user's search query; generating a query vector based on the search query; calculating the similarity between the query vector and the document vector; and outputting, as search results, the names of the voice documents indicated by the document vectors in descending order of similarity (claim 2).
[0010]
A third aspect of the present invention is a program that, when installed on an information processing device, causes that device to implement: a speech recognition function that outputs a speech recognition result for a voice document; a word association function that extracts the correspondence between words of the speech recognition result and words of a text transcribed from the speech; a function corresponding to a word pair table composed of a list of pairs, each consisting of a speech recognition word that, in the extracted correspondence, was correctly recognized or was a substitution error, and the text word corresponding to it; a word pair storage function that selects, from the correspondence extracted by the word association function, the correctly recognized speech recognition words and the speech recognition words that are substitution errors and, for each selected word, updates the occurrence frequency of the speech recognition word and its corresponding text word if that pair is already stored in the word pair table, and otherwise newly stores the pair; a word candidate expansion function that, using the speech recognition words as keys, extracts the text words stored in the word pair table and adds them to the speech recognition correct-word candidates of the voice document; a document vector creation function that creates a document vector based on the set of correct-word candidates; a query input function that receives a user's search query; a query vector generation function that generates a query vector based on the search query; a similarity calculation function that calculates the similarity between the query vector and the document vector; and an output function that outputs, as search results, the names of the voice documents indicated by the document vectors in descending order of similarity (claim 3).
[0011]
A fourth aspect of the present invention is a recording medium, readable by the information processing device, on which the program of the present invention is recorded (claim 4). With the program recorded on this recording medium, the information processing device can install the program from the medium. Alternatively, the program can be installed on the information processing device directly over a network from a server that holds it.
[0012]
In this way, an information processing device can be used to realize a voice document search device or method that lets the user find the desired voice document by building document vectors from correct-word candidates, which are obtained by statistically determining, for each word in the speech recognition result of a voice document, which word it was actually a misrecognition of.
[0013]
[Embodiments of the invention]
An embodiment of the present invention will be described with reference to FIGS. 1 to 5. FIG. 1 is a block diagram of the voice document search device 1 of this embodiment. FIG. 2 shows the result of recognizing speech input to the speech recognition unit 2. FIG. 3 shows the result produced by the word association unit 3 when it aligns the word sequence obtained by morphological analysis of the text transcribed from the actual utterance with the word sequence of the speech recognition result. FIG. 4 shows the speech recognition words and text-corresponding word lists, obtained from all the speech data and their transcribed texts, that are stored in the word pair table 6. FIG. 5 shows the correspondence among correct-word candidates, simple weights, and frequency-based weights in the word candidate expansion unit 5.
[0014]
This embodiment is a voice document search device 1 which, as shown in FIG. 1, comprises: a speech recognition unit 2 that outputs a speech recognition result for a voice document; a word association unit 3 that extracts the correspondence between words of the speech recognition result and words of a text transcribed from the speech (hereinafter, text-corresponding words); a word pair table 6 composed of a list of pairs, each consisting of a speech recognition word that, in the extracted correspondence, was correctly recognized or was a substitution error, and the text-corresponding word for it; a word pair storage unit 4 that selects, from the correspondence extracted by the word association unit 3, the correctly recognized speech recognition words and the speech recognition words that are substitution errors and, for each selected word, updates the occurrence frequency of the speech recognition word and its text-corresponding word if that pair is already stored in the word pair table 6, and otherwise newly stores the pair; a word candidate expansion unit 5 that, using the speech recognition words as keys, extracts the text-corresponding words stored in the word pair table 6 and adds them to the speech recognition correct-word candidates of the voice document; a document vector generation unit 7 that creates a document vector based on the set of correct-word candidates; a query input unit 8 that receives a user's search query; a query vector generation unit 9 that generates a query vector based on the search query; a similarity calculation unit 10 that calculates the similarity between the query vector and the document vector; and an output unit 11 that outputs, as search results, the names of the voice documents indicated by the document vectors in descending order of similarity (claim 1).
[0015]
The present invention can be realized as a program that, when installed on a general-purpose information processing device, causes that device to implement functions corresponding to the voice document search device 1 (claim 3). The program is installed on the information processing device either from a recording medium on which it is recorded (claim 4) or via a communication line, and thereby causes the device to implement functions corresponding to the speech recognition unit 2, word association unit 3, word pair storage unit 4, word candidate expansion unit 5, word pair table 6, document vector generation unit 7, query input unit 8, query vector generation unit 9, similarity calculation unit 10, and output unit 11, respectively.
[0016]
The operation of the voice document search device 1 of this embodiment is described below using a concrete example (claim 2). As shown in FIG. 2, suppose the actual utterance was "ＩＴベンチャーの中谷製作所の田中祐市部長は、新プロジェクトのシリウス・ダッシュの概要を発表した" ("Yuichi Tanaka, a department manager at the IT venture Nakatani Seisakusho, announced the outline of the new project Sirius Dash"), but that because of speech recognition errors it was transcribed as "ＩＴベンチャーのなかったり製作所の田中唯一部長は、新プロジェクトのシリウス・ダッシュの概要を発表した". In this output, "中谷" (Nakatani) has been recognized as "なかったり" and "祐市" (Yuichi) as "唯一".
[0017]
As shown in FIG. 3, the alignment can be computed by dynamic programming so as to minimize the edit distance, using for example the similarity between the reading (pronunciation) strings of the words (see, for example, Non-Patent Document 1). After alignment, the written forms of each aligned pair are compared to determine whether the recognition was correct or an error. Besides substitution errors, in which the written forms differ, errors include insertion errors, in which speech recognition outputs an extra word, and deletion errors, in which a word of the text is dropped.
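The alignment step can be illustrated with a minimal sketch. It is not the implementation of the embodiment: the names `align`, `substitution_cost`, and `char_edit_distance` are hypothetical, the unit insertion and deletion costs and the normalized reading distance are assumptions, and the readings are supplied by hand rather than by a morphological analyzer.

```python
from typing import List, Optional, Tuple

# A word is (written form, reading). Readings are supplied by hand here;
# in practice they would come from a morphological analyzer (assumption).
Word = Tuple[str, str]
Aligned = Tuple[Optional[str], Optional[str], str]


def char_edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def substitution_cost(rec: Word, ref: Word) -> float:
    """0.0 for identical readings, up to 1.0 for completely different ones."""
    longest = max(len(rec[1]), len(ref[1]), 1)
    return char_edit_distance(rec[1], ref[1]) / longest


def align(recognized: List[Word], reference: List[Word]) -> List[Aligned]:
    """Align two word sequences by minimum edit distance and label each step."""
    n, m = len(recognized), len(reference)
    # cost[i][j]: cheapest alignment of recognized[:i] with reference[:j]
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = float(i)
    for j in range(1, m + 1):
        cost[0][j] = float(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + substitution_cost(recognized[i - 1], reference[j - 1]),
                cost[i - 1][j] + 1.0,  # extra recognized word (insertion error)
                cost[i][j - 1] + 1.0,  # dropped text word (deletion error)
            )
    # Trace back from the end to recover the aligned pairs.
    result: List[Aligned] = []
    i, j = n, m
    while i > 0 or j > 0:
        diagonal = None
        if i > 0 and j > 0:
            diagonal = cost[i - 1][j - 1] + substitution_cost(recognized[i - 1], reference[j - 1])
        if diagonal is not None and cost[i][j] == diagonal:
            rec, ref = recognized[i - 1][0], reference[j - 1][0]
            result.append((rec, ref, "correct" if rec == ref else "substitution"))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1.0:
            result.append((recognized[i - 1][0], None, "insertion"))
            i -= 1
        else:
            result.append((None, reference[j - 1][0], "deletion"))
            j -= 1
    return result[::-1]


# Fragment of the example utterance, with hand-written readings.
recognized = [("田中", "たなか"), ("唯一", "ゆいいつ"), ("部長", "ぶちょう")]
reference = [("田中", "たなか"), ("祐市", "ゆういち"), ("部長", "ぶちょう")]
for pair in align(recognized, reference):
    print(pair)
```

On this fragment the sketch pairs "唯一" with "祐市" as a substitution error and labels the other two words as correct, because the two readings are closer to each other than an insertion plus a deletion would cost.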
[0018]
The word pair storage unit 4 takes the words judged to be correct recognitions or substitution errors, together with their corresponding text words, and stores each speech recognition word and its corresponding text word in the word pair table 6. If the speech recognition word is not yet in the word pair table 6, it is added as a new entry. If it is already present, the occurrence frequencies in its text-corresponding word list are updated. The speech recognition unit 2, the word association unit 3, and the word pair storage unit 4 are run over all the prepared speech data and transcribed texts to update the word pair table 6. FIG. 4 shows the speech recognition words and text-corresponding word lists, obtained from all the speech data and their transcribed texts, that are stored in the word pair table 6. The numbers in parentheses are frequencies.
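As a rough sketch of this update rule, the word pair table can be modeled as a nested dictionary that maps each speech recognition word to its text-corresponding words and their frequencies. The function name `update_word_pair_table`, the triple format of the aligned input, and the sample data are assumptions for illustration; the actual contents of FIG. 4 are not reproduced here.

```python
from typing import Dict, Iterable, Optional, Tuple

# speech recognition word -> {text-corresponding word -> frequency}
WordPairTable = Dict[str, Dict[str, int]]
Aligned = Tuple[Optional[str], Optional[str], str]


def update_word_pair_table(table: WordPairTable, aligned: Iterable[Aligned]) -> None:
    """Store or count every correct or substitution pair; skip the rest."""
    for recognized, text_word, label in aligned:
        if label not in ("correct", "substitution"):
            continue  # insertion and deletion errors have no word pair to store
        entry = table.setdefault(recognized, {})      # new recognition word: add it
        entry[text_word] = entry.get(text_word, 0) + 1  # known pair: bump frequency


# Hypothetical alignment results from two training utterances.
table: WordPairTable = {}
update_word_pair_table(table, [
    ("田中", "田中", "correct"),
    ("唯一", "祐市", "substitution"),
    ("えー", None, "insertion"),
])
update_word_pair_table(table, [
    ("唯一", "唯一", "correct"),
    ("唯一", "祐市", "substitution"),
])
print(table)
```

Keeping correct recognitions in the same table as substitution errors means that each recognized word also lists itself as a candidate, weighted by how often it was in fact right.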
[0019]
Next, the method for searching voice documents is described. Established techniques such as the vector space model can be used as the concrete search method (see, for example, Non-Patent Document 2). The key point of the vector space model is computing the similarity between the document vector of each document to be searched and the query vector of the user's search query. The following describes how a document vector is created when the search target is a voice document. A newly collected voice document is passed through the speech recognition unit 2, which outputs each recognized word. Using each output word as a key, the word candidate expansion unit 5 retrieves the text-corresponding word list and its frequencies from the speech recognition words and text-corresponding word lists stored in the word pair table 6. The document vector generation unit 7 generates the document vector from the retrieved word lists.
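The lookup performed by the word candidate expansion unit might be sketched as follows. The function name `expand_candidates`, the dictionary layout of the table, and the sample frequencies are assumptions; a recognized word with no entry in the table simply yields no candidates.

```python
from typing import Dict, List, Tuple


def expand_candidates(
    recognized_words: List[str],
    table: Dict[str, Dict[str, int]],
) -> List[Tuple[str, List[Tuple[str, int]]]]:
    """For each recognized word, fetch its text-corresponding words and frequencies."""
    expanded = []
    for word in recognized_words:
        # Most frequent candidates first; ties broken by the word itself.
        candidates = sorted(table.get(word, {}).items(), key=lambda kv: (-kv[1], kv[0]))
        expanded.append((word, candidates))
    return expanded


# Hypothetical table contents (not the values of FIG. 4).
table = {"唯一": {"唯一": 3, "祐市": 1}, "田中": {"田中": 5}}
for word, candidates in expand_candidates(["田中", "唯一"], table):
    print(word, candidates)
```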
[0020]
(Equation 1)

With t the total number of words appearing in the word lists and Vi the word vectors, the voice document Dr can be expressed as in Equation 1. The weight coefficient [External character 1] may simply be 1 if the word appears in a word list and 0 if it does not, or it may be determined from the word's share of the occurrence frequency within the word list. Alternatively, the product of that frequency share and the occurrence frequency of the speech recognition word may be used, or the product of that value and the reciprocal of the document frequency of the speech recognition word. For example, if FIG. 2 is the recognition result for a new voice document, the weights are as shown in FIG. 5. Words that do not appear in the correct-word lists all have weight 0. This processing is carried out for every voice document to be searched, so that a document vector is obtained for each document in advance.
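The four weighting options described above can be compared side by side in a small sketch. The function name `document_vector` and the scheme names are hypothetical, the vector is stored sparsely as a dictionary so that absent words implicitly have weight 0, and summing the contributions when one candidate is reached from several recognized words is an assumption, since the text does not say how such overlaps are combined.

```python
from collections import Counter
from typing import Dict, List, Optional

SCHEMES = ("binary", "ratio", "ratio_tf", "ratio_tf_idf")  # hypothetical names


def document_vector(
    recognized_words: List[str],
    table: Dict[str, Dict[str, int]],
    scheme: str = "ratio",
    document_frequency: Optional[Dict[str, int]] = None,
) -> Dict[str, float]:
    """Build a sparse document vector over correct-word candidates."""
    if scheme not in SCHEMES:
        raise ValueError(f"unknown scheme: {scheme}")
    if scheme == "ratio_tf_idf" and document_frequency is None:
        raise ValueError("ratio_tf_idf needs document_frequency")
    term_frequency = Counter(recognized_words)
    vector: Dict[str, float] = {}
    for recognized, tf in term_frequency.items():
        candidates = table.get(recognized, {})
        total = sum(candidates.values())
        for text_word, freq in candidates.items():
            if scheme == "binary":
                vector[text_word] = 1.0  # 1 if the word appears in any word list
                continue
            weight = freq / total  # share of the frequency within the word list
            if scheme in ("ratio_tf", "ratio_tf_idf"):
                weight *= tf  # times occurrences of the recognized word
            if scheme == "ratio_tf_idf":
                weight /= document_frequency[recognized]  # times 1 / document frequency
            # Assumption: contributions from different recognized words are summed.
            vector[text_word] = vector.get(text_word, 0.0) + weight
    return vector


# Hypothetical table contents (not the values of FIG. 4 or FIG. 5).
table = {"唯一": {"唯一": 3, "祐市": 1}}
print(document_vector(["唯一"], table, "binary"))
print(document_vector(["唯一"], table, "ratio"))
```

With these made-up frequencies, the binary scheme gives both candidates weight 1, while the ratio scheme gives "唯一" three quarters of the weight and "祐市" one quarter, so a document is still reachable through the corrected word but ranks lower than one in which that word was recognized directly.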
[0021]
[External character 1]

Next, search execution is described. The query input unit 8 receives a query from the user, the query vector generation unit 9 performs morphological analysis on it, and a query vector is created from the resulting words. The query vector can be expressed in the same way as the document vector.
[0022]
(Equation 2)

The weight coefficient [External character 2] is 1 if the word appears in the query and 0 if it does not. For example, if the query is "中谷製作所田中祐市" (Nakatani Seisakusho, Yuichi Tanaka), morphological analysis yields the words "中谷", "製作所", "田中", and "祐市". These words are given weight 1, and all other words are given weight 0.
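The images for Equation 1 and Equation 2 and for the two external characters are not reproduced in this text. Based only on the definitions given (t words, word vectors Vi, one weight coefficient per word), a form consistent with the description is the usual vector-space expansion below. The symbols $w_{ri}$ and $w_{qi}$ for the document and query weight coefficients, and $Q$ for the query vector, are assumptions made here for illustration and may differ from the notation in the original figures.

```latex
D_r = \sum_{i=1}^{t} w_{ri} V_i ,
\qquad
Q = \sum_{i=1}^{t} w_{qi} V_i ,
\qquad
w_{qi} =
\begin{cases}
1 & \text{if word } i \text{ appears in the query} \\
0 & \text{otherwise}
\end{cases}
```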
[0023]
[External character 2]

The similarity calculation unit 10 calculates the similarity between the query vector and every document vector. In the query example above, the similarity between the query vector and the document vector of FIG. 2 is the sum, over all words, of the products of the word's weights in the two vectors, so, taking into account the weights of "中谷", "製作所", "田中", and "祐市", the similarity is
[0024]
(Equation 3)

This similarity calculation is performed for all document vectors. The output unit 11 outputs the voice documents in descending order of similarity between the query vector and the document vector, up to a predetermined number.
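The similarity calculation and ranked output can be sketched as below. The function names `similarity` and `search`, the document names, and all weights are hypothetical; in particular the numbers are not the values of FIG. 5 or Equation 3, which are not reproduced in this text. Query words are assumed to be already segmented, standing in for the morphological analysis step.

```python
from typing import Dict, List, Tuple


def similarity(query_vector: Dict[str, float], document_vector: Dict[str, float]) -> float:
    """Sum over all words of the product of the query and document weights."""
    return sum(w * document_vector.get(word, 0.0) for word, w in query_vector.items())


def search(
    query_words: List[str],
    document_vectors: Dict[str, Dict[str, float]],
    top_n: int = 10,
) -> List[Tuple[str, float]]:
    """Rank documents by similarity and return at most top_n (name, score) pairs."""
    query_vector = {word: 1.0 for word in query_words}  # weight 1 for query words
    scored = [(name, similarity(query_vector, vec)) for name, vec in document_vectors.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))  # highest similarity first
    return scored[:top_n]


# Hypothetical document vectors with made-up weights.
document_vectors = {
    "doc_a": {"中谷": 0.5, "製作所": 1.0, "田中": 1.0, "祐市": 0.25},
    "doc_b": {"製作所": 1.0},
}
print(search(["中谷", "製作所", "田中", "祐市"], document_vectors, top_n=2))
```

In this made-up case "doc_a" scores 2.75 and "doc_b" scores 1.0. The first document gains credit for "中谷" and "祐市" only because expansion added them as correct-word candidates, which is the effect the embodiment relies on.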
[0025]
[Effects of the invention]
As described above, the voice document search system of the present invention addresses the mismatch between the user's search query and voice documents that is caused by speech recognition errors. By building document vectors from correct-word candidates obtained statistically for the speech recognition words in each voice document, it raises the similarity to user queries containing words that do not appear among the speech recognition words, and thereby improves search recall.
[Brief description of the drawings]
FIG. 1 is a block diagram of the voice document search device of the embodiment.
FIG. 2 shows the result of recognizing speech input to the speech recognition unit.
FIG. 3 shows the result produced by the word association unit when it aligns the word sequence obtained by morphological analysis of the text transcribed from the actual utterance with the word sequence of the speech recognition result.
FIG. 4 shows the speech recognition words and text-corresponding word lists, obtained from all the speech data and their transcribed texts, that are stored in the word pair table.
FIG. 5 shows the correspondence among correct-word candidates, simple weights, and frequency-based weights in the word candidate expansion unit.
[Explanation of symbols]
1 voice document search device
2 speech recognition unit
3 word association unit
4 word pair storage unit
5 word candidate expansion unit
6 word pair table
7 document vector generation unit
8 query input unit
9 query vector generation unit
10 similarity calculation unit
11 output unit

Claims

1. A voice document search device comprising:
a speech recognition unit that outputs a speech recognition result for a voice document;
a word association unit that extracts the correspondence between words of the speech recognition result and words of a text transcribed from the speech;
a word pair table composed of a list of pairs, each consisting of a speech recognition word that, in the extracted correspondence, was correctly recognized or was a substitution error, and the text word corresponding to it;
a word pair storage unit that selects, from the correspondence extracted by the word association unit, the correctly recognized speech recognition words and the speech recognition words that are substitution errors and, for each selected word, updates the occurrence frequency of the speech recognition word and its corresponding text word if that pair is already stored in the word pair table, and otherwise newly stores the pair;
a word candidate expansion unit that, using the speech recognition words as keys, extracts the text words stored in the word pair table and adds them to the speech recognition correct-word candidates of the voice document;
a document vector generation unit that creates a document vector based on the set of correct-word candidates;
a query input unit that receives a user's search query;
a query vector generation unit that generates a query vector based on the search query;
a similarity calculation unit that calculates the similarity between the query vector and the document vector; and
an output unit that outputs, as search results, the names of the voice documents indicated by the document vectors in descending order of similarity.

2. A voice document search method comprising the steps of:
outputting a speech recognition result for a voice document;
extracting the correspondence between words of the speech recognition result and words of a text transcribed from the speech;
constructing a table from a list of pairs, each consisting of a speech recognition word that, in the extracted correspondence, was correctly recognized or was a substitution error, and the text word corresponding to it;
selecting, from the extracted correspondence, the correctly recognized speech recognition words and the speech recognition words that are substitution errors and, for each selected word, updating the occurrence frequency of the speech recognition word and its corresponding text word if that pair is already stored in the table, and otherwise newly storing the pair;
using the speech recognition words as keys, extracting the text words stored in the table and adding them to the speech recognition correct-word candidates of the voice document;
creating a document vector based on the set of correct-word candidates;
receiving a user's search query;
generating a query vector based on the search query;
calculating the similarity between the query vector and the document vector; and
outputting, as search results, the names of the voice documents indicated by the document vectors in descending order of similarity.

3. A program that, when installed on an information processing device, causes the information processing device to implement:
a speech recognition function that outputs a speech recognition result for a voice document;
a word association function that extracts the correspondence between words of the speech recognition result and words of a text transcribed from the speech;
a function corresponding to a word pair table composed of a list of pairs, each consisting of a speech recognition word that, in the extracted correspondence, was correctly recognized or was a substitution error, and the text word corresponding to it;
a word pair storage function that selects, from the correspondence extracted by the word association function, the correctly recognized speech recognition words and the speech recognition words that are substitution errors and, for each selected word, updates the occurrence frequency of the speech recognition word and its corresponding text word if that pair is already stored in the word pair table, and otherwise newly stores the pair;
a word candidate expansion function that, using the speech recognition words as keys, extracts the text words stored in the word pair table and adds them to the speech recognition correct-word candidates of the voice document;
a document vector creation function that creates a document vector based on the set of correct-word candidates;
a query input function that receives a user's search query;
a query vector generation function that generates a query vector based on the search query;
a similarity calculation function that calculates the similarity between the query vector and the document vector; and
an output function that outputs, as search results, the names of the voice documents indicated by the document vectors in descending order of similarity.

4. A recording medium readable by the information processing device, on which the program according to claim 3 is recorded.