JP2009271117A

JP2009271117A - Voice retrieval device and voice retrieval method

Info

Publication number: JP2009271117A
Application number: JP2008118815A
Authority: JP
Inventors: Yoichi Fujii; 洋一藤井; Yohei Okato; 洋平岡登; Tomohiro Iwasaki; 知弘岩崎
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2008-04-30
Filing date: 2008-04-30
Publication date: 2009-11-19
Anticipated expiration: 2028-04-30
Also published as: JP5004863B2

Abstract

PROBLEM TO BE SOLVED: To provide a voice retrieval device and a voice retrieval method, wherein voice recognition accuracy is improved, and a recognition result that is easily understood by a user is presented. SOLUTION: A voice recognition section 4 performs voice recognition on an input voice, by referring to a sound reference pattern data base 2 and a word dictionary 3 for recognition. A data base retrieval section 9 obtains a retrieval result corresponding to the recognition result by referring to a data base 8 for retrieval, and stores it with a retrieval score which indicates a similarity degree with the recognition result, in a retrieval result data storage section 10. A recognition result correction section 11 collates a word included in the retrieval result to a node network through the recognition result, and performs correction by replacing it with a similar word, and performs ordering of the retrieval result by correcting the retrieval score based on the corrected recognition result. A candidate presenting section 12 presents the retrieval result of a retrieval score order, and the corrected recognition result corresponding to each retrieval result. COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a voice search device and a voice search method that present search results which are retrieved on the basis of a speech recognition result and therefore contain ambiguity.

Compared with keyboard or touch-panel input, voice input lets even a beginner enter text quickly, and it can be used when the eyes or hands are occupied with another task. In recent years, full-text search and name search by voice have been studied by combining large-vocabulary continuous speech recognition with database search. Such systems must take into account that speech recognition has a limited recognizable vocabulary and that recognition results contain recognition errors.

As a specific measure, Patent Document 1, for example, discloses a search device that, in order to handle unknown words not registered in the speech recognition dictionary, performs speech recognition with a language model that combines words with subwords (syllables), which are shorter than words and come in fewer types. The device first runs an initial search of the search target documents using only the parts of the user utterance that were recognized as words, and obtains search results. Next, to fill in the unknown words (the parts of the utterance that were not recognized as words), it determines keywords in the search results that match or resemble the subword chain (syllable string). It then searches the search target documents again using the keywords with the unknown words filled in, thereby improving search accuracy.

Patent Document 1: JP 2003-271629 A

Because conventional voice search devices and voice search methods are configured as described above, they can improve the accuracy of document search by voice while allowing for the ambiguity of speech recognition caused by vocabulary limits and misrecognition. The search device disclosed in Patent Document 1 performs speech recognition with a limited set of words and syllables, searches documents using the recognized words, and matches the recognized syllable strings against keywords in the search results. When the user utters a passage of text, searching the search target documents by keyword is likely to yield document search results of reasonable accuracy.
However, when the user utterance is a facility name, such as the name of a large-scale facility, speech recognition can fail because no keyword is contained in the search results or because the utterance is misrecognized as a similar keyword, so the desired facility name cannot be retrieved.

The present invention has been made to solve the above problem. It performs a search that allows for the ambiguity of speech recognition, then uses the information in the search results to verify and correct the speech recognition result. Its objects are to improve recognition accuracy and to present recognition results that the user can easily understand.

The voice search device according to the present invention comprises: a speech recognition unit that outputs, as a recognition result, a word string corresponding to input speech; a search unit that divides the recognition result into subwords, which are units smaller than words, uses them as search keys, searches search target documents having an index delimited in subword units, and outputs search results; a recognition result correction unit that corrects the recognition result on the basis of the search results; and a candidate presentation unit that presents at least one of the search results and the recognition result corrected by the recognition result correction unit.

According to the present invention, a word string corresponding to input speech is taken as the recognition result; the recognition result is divided into subwords, which are units smaller than words, and used as search keys; search target documents having an index delimited in subword units are searched to obtain search results; the recognition result is corrected on the basis of the search results; and at least one of the search results and the corrected recognition result is presented. This improves speech recognition accuracy and makes it possible to present recognition results that the user can easily understand.

Embodiment 1.
FIG. 1 is a block diagram showing the configuration of a voice search device according to Embodiment 1 of the present invention. The voice search device 1 shown in FIG. 1 comprises: an acoustic standard pattern database 2 that stores acoustic features for each minimum unit of speech used in speech recognition; a recognition word dictionary 3 consisting of a word definition dictionary and a word connection dictionary; a speech recognition unit 4 that performs speech recognition, consisting of speech analysis, collation, and search processing, on input speech; a database search unit 9 that obtains search results by consulting a search database 8 with the recognition result; a search result data storage unit 10 that stores the search results as a list; a recognition result correction unit 11 that corrects the recognition result using the search results in the search result data storage unit 10; and a candidate presentation unit 12 that presents the search results and the corrected recognition result.

The voice search device 1 is intended for searches, such as large-scale facility name search, in which the search targets are short word strings yet abbreviated expressions are commonly accepted as input speech. The database search unit 9 searches the search database 8 using, as search keys, a set of subwords such as phonemes, which are units smaller than words. Unlike a conventional voice search device, the voice search device 1 includes the database search unit 9, which exploits subword-unit segmentation, the recognition result correction unit 11, and the candidate presentation unit 12. The database search unit 9 searches the search database 8 with a set of subwords smaller than words as the search key. The recognition result correction unit 11 collates the obtained search results with the speech recognition result and corrects the ranking of the search results on the basis of the resulting corrected recognition result candidates. The candidate presentation unit 12 then presents them to the user.

The acoustic standard pattern database 2 shown in FIG. 1 stores, for each minimum unit of speech recognition such as a phoneme, an acoustic pattern made up of spectral and temporal features. FIG. 2 is an explanatory diagram showing the structure of the hidden Markov model (HMM) used in the voice search device according to Embodiment 1 of the present invention, and gives an example of an HMM topology. As the acoustic standard pattern, a phoneme-unit hidden Markov model with three states arranged in time order is used, for example, as shown in FIG. 2. Each state 21 has a self-loop arc 22 and no backward arc. The acoustic features for each state are represented by an 8-component Gaussian mixture distribution that retains covariances. The parameters of the acoustic standard patterns are estimated in advance from training speech data of many speakers.
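The topology described above (three states in time order, a self-loop on each state, no backward arcs) can be written down as a transition matrix. The following is a minimal sketch only: the function name, the extra exit column, and the self-loop probability of 0.6 are assumptions chosen for illustration, not values from the patent, where the parameters are estimated from training speech.

```python
def left_to_right_transitions(n_states=3, self_loop=0.6):
    """Return an n_states x (n_states + 1) transition matrix.

    Column n_states is a non-emitting exit state. self_loop is a
    hypothetical probability, not a trained value.
    """
    matrix = []
    for i in range(n_states):
        row = [0.0] * (n_states + 1)
        row[i] = self_loop            # self-loop arc on state i
        row[i + 1] = 1.0 - self_loop  # forward arc to the next state or exit
        matrix.append(row)
    return matrix


transitions = left_to_right_transitions()
```

Every entry below the diagonal stays zero, which is what "no backward arc" means in matrix form.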

The recognition word dictionary 3 is used to describe the input speech to be recognized as a combination of acoustic standard patterns. Specifically, it consists of a word definition dictionary, which describes the linguistic output units to be recognized and their constraints in terms of acoustic standard patterns, and a word connection dictionary, which describes the connections between words.
For large-scale facility name search, it is not realistic in terms of memory and processing speed for the voice search device 1 to register and process every word in the recognition word dictionary 3. The voice search device 1 therefore uses a recognition word dictionary 3 that holds typical words as words and expresses all other morphemes as subwords. FIG. 3 is an explanatory diagram showing an example of the recognition word dictionary of the voice search device according to Embodiment 1 of the present invention. In the example subword and word definition dictionary shown in FIG. 3, each word or subword is paired with a chain of acoustic standard patterns.
FIG. 4 is also an explanatory diagram showing an example of the recognition word dictionary of the voice search device according to Embodiment 1 of the present invention. In the example word connection dictionary shown in FIG. 4, each combination of three chained words or subwords (a trigram) is paired with the probability of that chain.

The speech recognition unit 4 shown in FIG. 1 has a speech analysis unit 5 that performs speech analysis, a collation unit 6 that performs collation, and a search unit 7 that performs search processing. It collates the input speech against the acoustic standard pattern database 2 and the recognition word dictionary 3 and outputs, as the recognition result, the combination of words with a high score, where the score expresses the degree of match.

The speech analysis unit 5 converts the input speech into acoustic features suitable for speech recognition. To compute the features, the speech analysis unit 5, for example, A/D-converts the input speech at a 16 kHz sampling frequency with 16-bit resolution, obtains a power spectrum with a 256-point Fourier transform at 10 ms frame intervals, converts the amplitude axis and the frequency axis to logarithmic scales, and then applies an inverse Fourier transform. The 12-dimensional mel cepstrum computed in this way and its 12-dimensional first-order regression coefficients along the time direction, 24 dimensions in total, are used as the acoustic features.
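The last step above, appending first-order regression coefficients along the time direction to the 12 cepstral coefficients, can be sketched as follows. This is an illustration under stated assumptions: the function names, the regression window of two frames on each side, and the edge handling (repeating the first and last frames) are not specified in the patent, and the cepstra are taken as already computed.

```python
def regression_deltas(frames, window=2):
    """First-order regression coefficients over time for each dimension.

    frames: list of equal-length feature vectors, one per 10 ms frame.
    window: frames used on each side (an assumed value).
    """
    n = len(frames)
    denom = 2 * sum(k * k for k in range(1, window + 1))
    deltas = []
    for t in range(n):
        row = []
        for d in range(len(frames[t])):
            num = 0.0
            for k in range(1, window + 1):
                nxt = frames[min(t + k, n - 1)][d]  # clamp at the last frame
                prv = frames[max(t - k, 0)][d]      # clamp at the first frame
                num += k * (nxt - prv)
            row.append(num / denom)
        deltas.append(row)
    return deltas


def add_deltas(frames, window=2):
    """Concatenate each static vector with its regression coefficients."""
    return [c + d for c, d in zip(frames, regression_deltas(frames, window))]


# Dummy 12-dimensional "cepstra" that rise by 1.0 per frame.
static = [[float(t)] * 12 for t in range(6)]
features = add_deltas(static)
```

With 12 static dimensions, each output vector has 24 dimensions, matching the feature size given in the text.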

The collation unit 6 collates the acoustic features computed by the speech analysis unit 5 against the acoustic standard patterns stored in the acoustic standard pattern database 2 and computes a score that expresses the degree of match.
The search unit 7 consults the recognition word dictionary 3 for the correspondence between words or subwords and acoustic standard patterns and for the combinations of words or subwords. Based on the connection relationships between the words of the input speech, it searches for recognition candidates with a high cumulative score over the whole utterance and outputs the word string of the recognition result. The search unit 7 may output several top-scoring candidates as recognition results, or may output the recognition results together with their scores.

The detailed method of speech recognition performed by the speech recognition unit 4 is described in "Fundamentals of Speech Recognition (Vols. 1 and 2)", Lawrence Rabiner and Biing-Hwang Juang, Japanese translation supervised by Sadaoki Furui, NTT Advanced Technology Corporation.

The search database 8 stores the word strings to be searched, such as facility names. FIG. 5 is an explanatory diagram showing an example of the search database of the voice search device according to Embodiment 1 of the present invention. As the example in FIG. 5 shows, the search database 8 holds at least the facility name and reading information divided into words.
The search database 8 normally has a search index created in advance to make searching efficient. Methods for retrieving information from a database and for building an index are described in "Information Retrieval Algorithms", Kenji Kita, Kazuhiko Tsuda, and Masami Shishibori, Kyoritsu Shuppan Co., Ltd. In this embodiment, an index delimited into subwords is created in the search database 8 in advance, so that the database search unit 9 can search on any subword.

The database search unit 9 consults the search database 8 and obtains search results corresponding to the recognition result output by the speech recognition unit 4. To allow for the ambiguity of speech recognition, the database search unit 9 searches the search database 8 using all subwords of the recognition result as search keys and obtains candidate facility names. It then compares all the subwords used in the search with the subwords contained in each candidate facility name and scores the search results according to the number of matching subwords.
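The scoring step described above can be sketched as counting shared subwords. The sketch below is an assumption-laden illustration: the function names, the romanized subword segmentation of the four facility readings, and the normalization by the longer sequence are all hypothetical, so the scores it produces are not the search scores of FIG. 6.

```python
from collections import Counter


def subword_score(query_subwords, entry_subwords):
    """Share of subwords in common, counted with multiplicity.

    Normalizing by the longer sequence is an assumption; the patent only
    says the score is based on the number of matching subwords.
    """
    common = Counter(query_subwords) & Counter(entry_subwords)
    matched = sum(common.values())
    return matched / max(len(query_subwords), len(entry_subwords))


def search(query_subwords, database, top_n=4):
    """Return up to top_n (score, entry_id) pairs, best score first."""
    scored = [(subword_score(query_subwords, subs), entry_id)
              for entry_id, subs in database.items()]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top_n]


# Recognition result split into romanized subwords (hypothetical segmentation).
query = ["ma", "ru", "kyuu", "doo", "bu", "tsu", "a", "i", "te", "ru"]
database = {
    1: ["ma", "ru", "kyoo", "doo", "sho", "te", "n", "mu", "tsu", "a", "i", "te", "n"],
    8: ["ma", "ru", "kyuu", "doo"],
    9: ["ma", "ru", "kyuu", "doo", "sho", "te", "n"],
    10: ["ma", "ru", "kyuu", "doo", "chi", "ryoo", "i", "n"],
}
results = search(query, database)
```

A production system would look up each subword in the prebuilt subword index rather than scanning every entry; the linear scan here only keeps the example short.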

FIG. 6 is an explanatory diagram showing an example of the search results of the voice search device according to Embodiment 1 of the present invention. The database search unit 9 outputs search results containing the facility name, reading information, and search score, as shown in FIG. 6, to the search result data storage unit 10 and the recognition result correction unit 11. The ID included in each search result is assigned to identify the facility name and does not change throughout the processing of the voice search device 1.
The search result data storage unit 10 stores the search results output by the database search unit 9 as a list.

The recognition result correction unit 11 consults the search result data storage unit 10, collates the list of search results with the recognition result obtained by the speech recognition unit 4, corrects the words contained in the recognition result, and outputs the corrected recognition result to the candidate presentation unit 12. It also corrects the ranking of the search result list by correcting, on the basis of the corrected recognition result, the search scores of the search results stored in the search result data storage unit 10.

The candidate presentation unit 12 presents to the user the search results of predetermined rank stored in the search result data storage unit 10 and, at the same time, the corrected recognition result. As a means of presentation, the candidate presentation unit 12 has, for example, a monitor screen on which it displays the search results and the corrected recognition result.

Next, the operation of the voice search device 1 is described. FIG. 7 is a flowchart showing the operation of the voice search device according to Embodiment 1 of the present invention. Taking facility name search as an example, the following explains how recognition accuracy is improved using the speech recognition result and the database search results, and how the recognition result presented to the user is generated. To keep the explanation simple, assume that the user utters 「マルキョードームツアイテン」 (Marukyōdō Mutsuai-ten) with the intention of searching for a facility name.
Assume also that an index delimited into subwords has been created in advance in the search database 8 shown in FIG. 1, so that the database search unit 9 can search on any subword.

In step ST1 shown in FIG. 7, the speech recognition unit 4 first recognizes the input speech and outputs the recognition result that serves as input to the database search unit 9. Here, for the input speech 「マルキョードームツアイテン」 (Marukyōdō Mutsuai-ten), the speech recognition unit 4 obtains and outputs the recognition result 「マルキュードーブツアイテル」 (Marukyūdōbutsuaiteru).

In step ST2, the database search unit 9 consults the search database 8, outputs search results, and has them stored in the search result data storage unit 10.
The database search unit 9 decomposes the recognition result "Marukyūdōbutsuaiteru" into the subwords 「マ」 (ma), 「ル」 (ru), 「キュー」 (kyū), 「ドー」 (dō), 「ブ」 (bu), 「ツ」 (tsu), 「ア」 (a), 「イ」 (i), 「テ」 (te), and 「ル」 (ru). It then uses each subword as a search key to search the search database 8, which holds the search target data shown in FIG. 5. One possible search technique is the vector space model used in document retrieval, applied with subwords in place of the words normally used as search terms.
Assume that searching the search database 8 of FIG. 5 with "Marukyūdōbutsuaiteru" as the key yields the facility-name search results shown in FIG. 6, each with a search score. Of these search results, the database search unit 9 outputs the top N (here N = 4) facility names, ID = 8, 1, 9, 10, as the search results.

In step ST3, the recognition result correction unit 11 first builds a network representation from the top N search results. FIG. 8 is an explanatory diagram showing an example of the network structure created by the recognition result correction unit of the voice search device according to Embodiment 1 of the present invention. This network representation accepts paraphrased expressions of the words (the units separated by "|" in the reading information) contained in each of the facility names of search results ID = 8, 1, 9, 10. The network has two characteristic features: for each word 31 it generates ambiguity networks 32 and 33, structures that expand in advance the possible substitutions by easily confused subwords, and it generates a filler element 34 that lets any subword pass.
Here the network allows all words to be connected in any order, but it may be modified as appropriate to reflect language constraints. A score may also be assigned to each arc connecting words.

In step ST4, the recognition result correction unit 11 passes the recognition result "Marukyūdōbutsuaiteru" from the speech recognition unit 4 through the network representation to collate it, and creates corrected recognition result candidates.
FIG. 9 is an explanatory diagram showing an example of the corrected recognition result candidates created by the recognition result correction unit of the voice search device according to Embodiment 1 of the present invention. Each corrected recognition result candidate 41 shown in FIG. 9 is created as data divided into words by "|". A part that matches none of the words making up the network representation and instead passes through the filler element becomes a pseudo-word enclosed in "(" and ")".
For example, among the corrected recognition result candidates 41 shown in FIG. 9, "Marukyūdō | (Butsua) | Shoten" is created as follows: the part "Marukyūdō" of the recognition result "Marukyūdōbutsuaiteru" passes through the ambiguity network of the word "Marukyūdō" in the network representation, the part "Butsua" passes through the filler element three times, and the part "Iteru" passes through the ambiguity network of the word "Shoten".
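The collation in step ST4 can be approximated with a small dynamic program that segments the recognized subword sequence into known words or filler. This is a sketch of a swapped-in technique, not the patent's network: plain edit distance stands in for the pre-expanded ambiguity networks, and the function names, the filler cost of 0.9, the distance limit of 2, and the romanized subwords are all assumptions for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two subword sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def align(recognized, vocabulary, filler_cost=0.9, max_dist=2):
    """Segment recognized subwords into vocabulary words or filler.

    Returns (total_cost, segments); each segment is (word_name or None,
    subwords). None marks one subword that passed through the filler.
    """
    n = len(recognized)
    inf = float("inf")
    best = [(inf, None)] * (n + 1)  # (cost, (start, word_name))
    best[0] = (0.0, None)
    for i in range(n):
        cost_i = best[i][0]
        if cost_i == inf:
            continue
        # Filler element: consume exactly one arbitrary subword.
        if cost_i + filler_cost < best[i + 1][0]:
            best[i + 1] = (cost_i + filler_cost, (i, None))
        # Fuzzy word match: any span close enough to a vocabulary word.
        for name, subs in vocabulary.items():
            for j in range(i + 1, n + 1):
                dist = edit_distance(recognized[i:j], subs)
                if dist <= max_dist and cost_i + dist < best[j][0]:
                    best[j] = (cost_i + dist, (i, name))
    segments, pos = [], n
    while pos > 0:  # follow back-pointers from the end
        start, name = best[pos][1]
        segments.append((name, recognized[start:pos]))
        pos = start
    segments.reverse()
    return best[n][0], segments


recognized = ["ma", "ru", "kyuu", "doo", "bu", "tsu", "a", "i", "te", "ru"]
vocabulary = {"marukyuudoo": ["ma", "ru", "kyuu", "doo"], "shoten": ["sho", "te", "n"]}
cost, segments = align(recognized, vocabulary)
```

With these assumed costs the best path matches "marukyuudoo" exactly, sends "bu", "tsu", and "a" through the filler one subword at a time, and absorbs "i te ru" as a fuzzy match for "shoten", which mirrors the candidate "Marukyūdō | (Butsua) | Shoten" discussed above.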

Each corrected recognition result candidate is given a correction score that expresses the similarity between the recognition result "Marukyūdōbutsuaiteru" and that candidate. The recognition result correction unit 11 computes the correction score from the subword similarity between the recognition result and the candidate, so that a higher degree of match gives a higher score, and lowers the score further when a large proportion of the candidate's subwords passed through filler elements.
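The patent does not give a formula for the correction score, so the following is only one possible reading of the two properties stated above: similarity raises the score, and a high filler proportion lowers it. The function name, the use of Python's `difflib.SequenceMatcher` ratio as the similarity measure, and the filler weight of 0.5 are all assumptions.

```python
from difflib import SequenceMatcher


def correction_score(recognized, candidate_words, filler_weight=0.5):
    """Hypothetical correction score in [0, 1].

    candidate_words: list of (subwords, is_filler) pairs, one per word or
    pseudo-word of the corrected recognition result candidate.
    """
    flat = [s for subs, _ in candidate_words for s in subs]
    if not flat:
        return 0.0
    # Subword-level similarity between recognition result and candidate.
    similarity = SequenceMatcher(None, recognized, flat).ratio()
    # Proportion of the candidate's subwords that came through the filler.
    filler_count = sum(len(subs) for subs, is_filler in candidate_words if is_filler)
    filler_ratio = filler_count / len(flat)
    return similarity * (1.0 - filler_weight * filler_ratio)


recognized = ["ma", "ru", "kyuu", "doo", "bu", "tsu", "a", "i", "te", "ru"]
# "Marukyuudoo | (Butsuaiteru)": everything after the first word is filler.
mostly_filler = [(["ma", "ru", "kyuu", "doo"], False),
                 (["bu", "tsu", "a", "i", "te", "ru"], True)]
# "Marukyuudoo | (Butsua) | Shoten": less filler, but "Shoten" differs.
with_shoten = [(["ma", "ru", "kyuu", "doo"], False),
               (["bu", "tsu", "a"], True),
               (["sho", "te", "n"], False)]
score_filler = correction_score(recognized, mostly_filler)
score_shoten = correction_score(recognized, with_shoten)
```

The filler weight controls how strongly unexplained subwords are penalized; a real system would tune it together with the search-score parameters.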

Next, the recognition result correction unit 11 compares the corrected recognition result candidates with the search results stored in the search result data storage unit 10 and corrects the search scores in the search result list.
For example, the recognition result correction unit 11 corrects each search score according to the following formula, using the numbers of missing and excess words in the corrected recognition result candidate relative to the number of words in the search result, together with the correction score of the candidate.
Corrected search score = search score × {1 − (number of missing words / number of words in search result) × α
 − (number of excess words / number of words in search result) × β
 − (1 − correction score) × γ}

For example, setting (α, β, γ) = (0.05, 0.20, 0.03) tolerates missing words as far as possible, makes excess utterances harder to accept, and also takes some account of the distance from the original recognition result.
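The correction formula translates directly into code. In this sketch the function and parameter names are assumptions, the middle parameter is read as β = 0.20, and the input values in the example call are hypothetical rather than taken from FIG. 6 or FIG. 9.

```python
def corrected_search_score(search_score, n_result_words, n_missing, n_excess,
                           correction_score, alpha=0.05, beta=0.20, gamma=0.03):
    """Apply the corrected-search-score formula to one search result.

    n_result_words: number of words in the search result.
    n_missing / n_excess: words the candidate lacks or adds relative to it.
    """
    penalty = ((n_missing / n_result_words) * alpha
               + (n_excess / n_result_words) * beta
               + (1.0 - correction_score) * gamma)
    return search_score * (1.0 - penalty)


# Hypothetical inputs: a 4-word search result, 2 words left unspoken,
# no excess words, correction score 0.9.
example = corrected_search_score(1.0, 4, 2, 0, 0.9)

# One missing word costs less than one excess word, as the text intends.
one_missing = corrected_search_score(1.0, 4, 1, 0, 1.0)
one_excess = corrected_search_score(1.0, 4, 0, 1, 1.0)
```

Because α is a quarter of β, a user who abbreviates a long facility name is penalized far less than one whose utterance contains words absent from the facility name.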

Correcting the search scores of the search results and selecting a corrected recognition result from the candidates according to the corrected search score gives the following. Here a corrected search score is computed for each of the top M candidates by correction score (for example, M = 5), and the candidate with the highest corrected search score among the M is selected. For search result ID = 1, "Marukyōdō | Shoten | Mutsuai | Ten", the candidate "Marukyōdō | Mutsuai | Ten" among the corrected recognition result candidates 41 is selected as the best corrected recognition result, with a corrected search score of 0.828. For ID = 8, "Marukyūdō", the candidate "Marukyūdō | (Butsuaiteru)" is selected with a corrected search score of 0.668. For ID = 9, "Marukyūdō | Shoten", the candidate "Marukyūdō | (Butsua) | Shoten" is selected with a corrected search score of 0.731. For ID = 10, "Marukyūdō | Chiryōin", the candidate "Marukyūdō | (Butsuaiteru)" is selected with a corrected search score of 0.708.

In this way, the recognition result correction unit 11 corrects the search scores stored in the search result data storage unit 10 and, according to the corrected search scores, reorders the top N search results to ID = 1, 9, 10, 8.
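The reordering can be checked with the corrected search scores stated in the text (0.828, 0.668, 0.731, and 0.708 for ID = 1, 8, 9, and 10). The variable names below are illustrative only.

```python
# Corrected search scores per search result ID, as given in the description.
corrected_scores = {1: 0.828, 8: 0.668, 9: 0.731, 10: 0.708}

# Sort IDs by corrected search score, highest first.
ranked_ids = sorted(corrected_scores, key=corrected_scores.get, reverse=True)
print(ranked_ids)  # prints [1, 9, 10, 8]
```

ID = 8, which ranked first by the uncorrected search score, drops to last because its best candidate consists mostly of filler.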

In step ST5, the candidate presentation unit 12 displays the search results on a screen (not shown). It presents the search results in order of corrected search score and also presents the corrected recognition result selected for each search result.

FIG. 10 is an explanatory diagram showing an example of what the candidate presentation unit of the voice search device according to Embodiment 1 of the present invention presents. On the display screen 51 shown in FIG. 10, the list 53 of search results stored in the search result data storage unit 10 is presented in order of corrected search score. The corrected recognition result corresponding to the currently selected search result is also presented.
At the top of the search result list 53 is "○教堂書店六会店" (Marukyōdō Shoten Mutsuai-ten), which has the highest corrected search score. The recognition result display frame 52 shows the corrected recognition result "Marukyōdō Mutsuai-ten" corresponding to the selected "○教堂書店六会店".
Because "Shoten", which the user did not utter, is not included in the recognition result, the recognition result shown in the display frame 52 is easy for the user to understand.

When the user moves the selection in the search result list 53 on the display screen 51 presented by the candidate presentation unit 12 and selects "○久堂書店", the candidate presentation unit 12 switches from the display screen 51 to the display screen 54 in step ST6. On the display screen 54, "○久堂書店" in the search result list 56 is selected, and the recognition result display frame 55 is updated to match the selected item, showing "マルキュードー（ブツア）ショテン" (Marukyūdō (Butsua) Shoten). The part enclosed in "（" and "）" has no corresponding part in the search result, which indicates that the recognition result contains a portion the system could not resolve.

As described above, according to Embodiment 1, the voice search device 1 comprises: the speech recognition unit 4, which outputs a facility name corresponding to the input speech as a recognition result; the database search unit 9, which uses keys obtained by dividing the recognition result into subword units to search the search database 8, which stores facility names together with an index segmented into subwords (units smaller than words), and outputs highly similar facility names as search results; the recognition result correction unit 11, which collates the recognition result against a network representation whose elements are the words contained in the search results and outputs a corrected recognition result; and the candidate presentation unit 12, which presents the search results and the corresponding corrected recognition results. The device can therefore perform a search that allows for the ambiguity of the recognition by the speech recognition unit 4, and can use the search results to verify and correct the recognition result. This improves recognition accuracy for the input speech and makes it possible to present a recognition result that is easy for the user to understand.

Also according to Embodiment 1, the recognition result correction unit 11 of the voice search device 1 corrects, on the basis of the corrected recognition results, the ranking of the top N search results among those ranked by similarity to the recognition result, and the candidate presentation unit 12 presents the search results according to the corrected ranking. This increases the likelihood that the user obtains the desired result. In addition, the corrected recognition result is consistent with the search result, so the device can show the user clearly how it recognized the utterance.

Furthermore, the corrected recognition result presented by the candidate presentation unit 12 has been corrected on the basis of the search result. Even when the first utterance is not sufficient to identify the facility name being searched for, it is therefore easy to build an interface that implements a narrowing search by reusing the recognition result and having the user utter additional keywords.

In Embodiment 1, the candidate presentation unit 12 presents the search result and the corrected recognition result together, but it may be configured to present only one of them.

In Embodiment 1, word boundaries are registered in advance in the reading information of the search database 8, but the database search unit 9 may instead split words automatically, as needed, when it obtains the recognition result.
If the reading information in the search database 8 admits several candidate word boundaries, the candidates may be recorded side by side so that the database search unit 9 collates them against the recognition result.

In Embodiment 1, the recognition result correction unit 11 creates a single network representation containing all of the top N search results, but it may instead create a network representation for each search result and obtain the corrected recognition result candidates and correction scores from it.

In Embodiment 1, the recognition result correction unit 11 builds the elements of the network representation in units of words, but they may instead be built in units of morphemes.

Embodiment 2.
FIG. 11 is a block diagram showing the configuration of the voice search device according to Embodiment 2 of the present invention. The voice search device 1a of Embodiment 2 replaces the recognition result correction unit 11 of Embodiment 1 with a recognition result correction unit 61, which is lightweight because its processing is simple and which corrects the recognition result by focusing on the continuity of subword sequences. The rest of the configuration is identical to that of the voice search device 1 of Embodiment 1, so a detailed description is omitted.
The recognition result correction unit 61 collates the search result and the recognition result, both divided into subword units, subword by subword. It then replaces each word of the recognition result that contains a subword sequence matching the search result with the search-result word that contains that subword sequence.

Next, the operation of the voice search device 1a is described. FIG. 12 is a flowchart showing the operation of the voice search device according to Embodiment 2 of the present invention. As in Embodiment 1, the user is assumed to utter "マルキョードームツアイテン" (Marukyōdō Mutsuai Ten) with the intention of searching for a facility name.

Steps ST11 and ST12 shown in FIG. 12 are the same processes as steps ST1 and ST2 shown in FIG. 7, and the search results shown in FIG. 6 are stored in the search result data storage unit 10 of the voice search device 1a.
In the following step ST13, the recognition result correction unit 61 takes the search results out of the search result data storage unit 10 one at a time and assigns serial numbers to their subwords. Here, consider the case where the recognition result correction unit 61 obtains "マルキョードーショテンムツアイテン" (ID = 1) as the search result and matches it against the recognition result.
FIG. 13 is an explanatory diagram of the correction processing performed by the recognition result correction unit of the voice search device according to Embodiment 2 of the present invention. The recognition result correction unit 61 divides the search result taken from the search result data storage unit 10 into subword units such as phonemes or syllables, here "マ", "ル", "キョー", "ドー", "ショ", "テ", "ン", "ム", "ツ", "ア", "イ", "テ", "ン" (ma, ru, kyō, dō, sho, te, n, mu, tsu, a, i, te, n), and assigns serial numbers in order from the beginning to obtain the numbered search result 71.

In step ST14, the recognition result correction unit 61 assigns to each subword of the recognition result "マルキュードーブツアイテル" (Marukyūdō Butsuaiteru) the number of every identical subword in the numbered search result 71, producing the number-assigned recognition result 72. The subword "マ" in the number-assigned recognition result 72 is identical to "マ = 1" in the numbered search result 71, so the recognition result correction unit 61 assigns it "1". The subword "キュー" has no corresponding subword in the numbered search result 71, so no number is assigned. The subword "テ" is identical to two subwords in the numbered search result 71, "テ = 6" and "テ = 12", so the recognition result correction unit 61 assigns it both "6" and "12".
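The numbering in steps ST13 and ST14 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and the segmentation of the recognition result into subwords is an assumption that follows the same syllable-like scheme the description uses for the search result.

```python
def number_subwords(subwords):
    """Pair each subword with a 1-based serial number (numbered search result 71)."""
    return list(enumerate(subwords, start=1))


def assign_numbers(recognized, numbered):
    """For each recognized subword, list the numbers of all identical search-result subwords."""
    positions = {}
    for number, subword in numbered:
        positions.setdefault(subword, []).append(number)
    # Copy each list so that repeated subwords do not share one list object.
    return [(sw, list(positions.get(sw, []))) for sw in recognized]


search_result = ["マ", "ル", "キョー", "ドー", "ショ", "テ", "ン",
                 "ム", "ツ", "ア", "イ", "テ", "ン"]
# Assumed segmentation of the recognition result "マルキュードーブツアイテル".
recognition = ["マ", "ル", "キュー", "ドー", "ブ", "ツ", "ア", "イ", "テ", "ル"]

assigned = assign_numbers(recognition, number_subwords(search_result))
for subword, numbers in assigned:
    print(subword, numbers)
```

Under these assumptions "キュー" and "ブ" receive no number and "テ" receives both 6 and 12, in line with the number-assigned recognition result 72 described above.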

In step ST15, the recognition result correction unit 61 extracts runs of consecutive numbers from the number-assigned recognition result 72, longest run first. Taking continuity into account, the runs found in the number-assigned recognition result 72 are "1, 2", "4", "9, 10, 11, 12", "6", and "2". The recognition result correction unit 61 first takes the longest of these, "9, 10, 11, 12". Because "6" belongs to the subword at the same position as "12", it is excluded from the remaining candidates. Next, the recognition result correction unit 61 takes "1, 2", the second-longest run that does not include "9" to "12". The runs "4" and "2" remain, but "2" has already been taken, so the recognition result correction unit 61 takes "4".
Thus, in step ST15, the runs "9, 10, 11, 12", "1, 2", and "4", shown in double-lined boxes, are obtained from the number-assigned recognition result 72.
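The run extraction of step ST15 can be sketched as a greedy selection. This is an illustrative reading of the description rather than a definitive implementation: the function names are hypothetical, the input list is the per-position number assignment from step ST14 written out by hand, and breaking ties between equally long runs by order of appearance is an assumption.

```python
def consecutive_runs(assigned):
    """Find maximal runs whose numbers increase by 1 at each following position."""
    runs = []
    for pos, numbers in enumerate(assigned):
        for n in numbers:
            # Skip numbers that merely continue a run started at an earlier position.
            if pos > 0 and n - 1 in assigned[pos - 1]:
                continue
            run, p = [n], pos + 1
            while p < len(assigned) and run[-1] + 1 in assigned[p]:
                run.append(run[-1] + 1)
                p += 1
            runs.append((pos, run))
    return runs


def select_runs(runs):
    """Keep runs longest first, skipping any that reuse a position or a number."""
    used_pos, used_num, chosen = set(), set(), []
    for start, run in sorted(runs, key=lambda r: -len(r[1])):
        span = set(range(start, start + len(run)))
        if span & used_pos or set(run) & used_num:
            continue
        used_pos |= span
        used_num |= set(run)
        chosen.append(run)
    return chosen


# Numbers assigned to マ, ル, キュー, ドー, ブ, ツ, ア, イ, テ, ル in step ST14.
assigned = [[1], [2], [], [4], [], [9], [10], [11], [6, 12], [2]]
runs = consecutive_runs(assigned)
chosen = select_runs(runs)
print(chosen)  # [[9, 10, 11, 12], [1, 2], [4]]
```

Here "6" is dropped because its position is already covered by "12", and the trailing "2" is dropped because the number 2 has already been taken, as in the description.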

In step ST16, the recognition result correction unit 61 evaluates the correspondence between the consecutively numbered subword sequences and the words of the numbered search result 71, longest run first, and creates the corrected recognition result 73.
First, the recognition result correction unit 61 associates the subword sequence assigned the longest run, "9, 10, 11, 12", with the words "ムツアイ｜テン" (Mutsuai | Ten) of the numbered search result 71 that carry the same numbers, and compares them. It then corrects "ブ" in "ブツアイ" of the number-assigned recognition result 72 to "ム", and "ル" in "テル" to "ン".
Second, the recognition result correction unit 61 associates the subword sequence assigned the run "1, 2" with the word "マルキョードー" (Marukyōdō) of the numbered search result 71 that carries the same numbers, and compares them. It further judges that "ドー = 4" in the number-assigned recognition result 72, which matches one of the subwords of "マルキョードー", also corresponds to that word. As a result, it corrects "キュー" in "マルキュードー" of the number-assigned recognition result 72 to "キョー". In this way, the recognition result correction unit 61 replaces each word of the number-assigned recognition result 72 that contains a consecutively numbered subword sequence with the word of the numbered search result 71 that carries the same numbers.
Note that for the word "ショテン" (Shoten) in the numbered search result 71, the number-assigned recognition result 72 contains no subword with the same number.
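The word-level outcome of step ST16 for this example can be sketched as below. This is a deliberate simplification under stated assumptions: it keeps every search-result word that owns at least one selected subword number, in search-result order, and it does not handle leftover recognition subwords. The function name and the data layout are hypothetical.

```python
def corrected_words(words, chosen_runs):
    """Keep the search-result words that own at least one selected subword number."""
    selected = {n for run in chosen_runs for n in run}
    kept, next_number = [], 1
    for word, subwords in words:
        # Serial numbers occupied by this word in the numbered search result.
        numbers = range(next_number, next_number + len(subwords))
        next_number += len(subwords)
        if selected.intersection(numbers):
            kept.append(word)
    return kept


# Words of the ID = 1 search result with their subwords (numbers 1 to 13).
words = [
    ("マルキョードー", ["マ", "ル", "キョー", "ドー"]),
    ("ショテン", ["ショ", "テ", "ン"]),
    ("ムツアイ", ["ム", "ツ", "ア", "イ"]),
    ("テン", ["テ", "ン"]),
]
result = corrected_words(words, [[9, 10, 11, 12], [1, 2], [4]])
print("｜".join(result))  # マルキョードー｜ムツアイ｜テン
```

"ショテン" is dropped because none of its numbers (5 to 7) were selected, which corresponds to the note above that the recognition result has no subword with the same number.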

In step ST17, any subword sequence of the number-assigned recognition result 72 that does not exist in the numbered search result 71 is difficult to identify as a word, so the recognition result correction unit 61 leaves it in the corrected recognition result 73 as the subwords of the recognition result. In the example shown in FIG. 13, every subword of the recognition result has been identified as part of a word of the search result, so the recognition result correction unit 61 performs no processing in step ST17 and proceeds to the next process.

Once the recognition result correction unit 61 has corrected the recognition result for every search result stored in the search result data storage unit 10, the candidate presentation unit 12 presents the search results and the corrected recognition results in the following steps ST18 and ST19. Steps ST18 and ST19 are the same processes as steps ST5 and ST6 shown in FIG. 7, so their description is omitted.

As described above, according to Embodiment 2, the voice search device 1a is configured so that the recognition result correction unit 61 divides the search result and the recognition result into subword units and collates them, and outputs a corrected recognition result in which each word of the recognition result containing a subword sequence that matches the search result is replaced with the search-result word containing that subword sequence, and so that the candidate presentation unit 12 presents the search results and the corresponding corrected recognition results. As in Embodiment 1, this makes it possible to present, together with the search result, a corrected recognition result that is easy for the user to understand.

In the voice search device 1a of Embodiment 2, the recognition result correction unit 61 may hold in advance pairs of subwords that are easily confused during speech recognition, together with information on how easily each pair is confused. An example of an easily confused pair is "キョー" (kyō) and "キュー" (kyū); the confusability information for this pair, that is, the probability of confusion, is taken to be 0.2.
In step ST14 (FIG. 12), in which numbers are assigned to the recognition result, the recognition result correction unit 61 assigns weighted numbers on the basis of the confusability information whenever an easily confused subword is present, and evaluates the continuity of subword sequences using those weights. For example, for "キュー" in the number-assigned recognition result 72 shown in FIG. 13, the recognition result correction unit 61 assigns the number "3", the same as "キョー = 3", with a weight of "0.2".
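The weighted variant of step ST14 can be sketched as follows. The confusion table holds only the single pair and probability given in the description; the function name, the table layout, and the weight of 1.0 for exact matches are assumptions made for illustration.

```python
# Easily confused subword pairs with their confusion probability (from the description).
CONFUSABLE = {frozenset(("キョー", "キュー")): 0.2}


def assign_weighted(recognized, search_subwords):
    """Assign (number, weight) pairs: exact matches get 1.0 (assumed), confusable pairs their probability."""
    assigned = []
    for sw in recognized:
        matches = []
        for number, target in enumerate(search_subwords, start=1):
            if sw == target:
                matches.append((number, 1.0))
            else:
                weight = CONFUSABLE.get(frozenset((sw, target)))
                if weight is not None:
                    matches.append((number, weight))
        assigned.append((sw, matches))
    return assigned


search_result = ["マ", "ル", "キョー", "ドー", "ショ", "テ", "ン",
                 "ム", "ツ", "ア", "イ", "テ", "ン"]
weighted = assign_weighted(["マ", "ル", "キュー", "ドー"], search_result)
print(weighted[2])  # ('キュー', [(3, 0.2)])
```

With this assignment the sequence "マ", "ル", "キュー", "ドー" carries the numbers 1, 2, 3, 4, so a continuity check that accepts weighted matches can treat it as one run instead of the separate runs "1, 2" and "4".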

The correction method of the recognition result correction unit 61 of Embodiment 2 is simple to process and is an algorithm that emphasizes the continuity of subword sequences, so it can also be applied as post-processing to each corrected recognition result among the corrected recognition result candidates created by the recognition result correction unit 11 of Embodiment 1. In that configuration, the recognition result correction unit first, as described in Embodiment 1, collates the recognition result against the network representation, replaces words in the recognition result with similar words of the search result to create a corrected recognition result, and corrects the ranking of the search results. Then, as described in Embodiment 2, it collates the search result and the corrected recognition result subword by subword and replaces each word of the recognition result that contains a subword sequence matching the search result with the search-result word containing that subword sequence.
With this configuration, the voice search device can create a corrected recognition result that takes word sequences into account more fully.

Embodiments 1 and 2 have been described using a voice search device for Japanese as an example, but the target language is not limited. For other languages as well, a voice search device can be configured using phonemes, phoneme sequences, or other units smaller than words.

FIG. 1 is a block diagram showing the configuration of the voice search device according to Embodiment 1 of the present invention.
FIG. 2 is an explanatory diagram showing the structure of the hidden Markov model used in the voice search device according to Embodiment 1 of the present invention.
FIG. 3 is an explanatory diagram showing an example of the recognition word dictionary of the voice search device according to Embodiment 1 of the present invention.
FIG. 4 is an explanatory diagram showing an example of the recognition word dictionary of the voice search device according to Embodiment 1 of the present invention.
FIG. 5 is an explanatory diagram showing an example of the search database of the voice search device according to Embodiment 1 of the present invention.
FIG. 6 is an explanatory diagram showing an example of the search results of the voice search device according to Embodiment 1 of the present invention.
FIG. 7 is a flowchart showing the operation of the voice search device according to Embodiment 1 of the present invention.
FIG. 8 is an explanatory diagram showing an example of the network structure created by the recognition result correction unit of the voice search device according to Embodiment 1 of the present invention.
FIG. 9 is an explanatory diagram showing an example of the corrected recognition result candidates created by the recognition result correction unit of the voice search device according to Embodiment 1 of the present invention.
FIG. 10 is an explanatory diagram showing a presentation example of the candidate presentation unit of the voice search device according to Embodiment 1 of the present invention.
FIG. 11 is a block diagram showing the configuration of the voice search device according to Embodiment 2 of the present invention.
FIG. 12 is a flowchart showing the operation of the voice search device according to Embodiment 2 of the present invention.
FIG. 13 is an explanatory diagram of the correction processing performed by the recognition result correction unit of the voice search device according to Embodiment 2 of the present invention.

Explanation of symbols

1, 1a: voice search device; 2: acoustic standard pattern database; 3: recognition word dictionary; 4: speech recognition unit; 5: speech analysis unit; 6: collation unit; 7: search unit; 8: search database; 9: database search unit; 10: search result data storage unit; 11: recognition result correction unit; 12: candidate presentation unit; 21: state; 22: self-loop arc; 31: word; 32, 33: ambiguity network; 34: filler element; 41: corrected recognition result candidates; 51: display screen; 52: recognition result display frame; 53: search result list; 61: recognition result correction unit; 71: numbered search result; 72: number-assigned recognition result; 73: corrected recognition result.

Claims

A voice search device comprising:
a speech recognition unit that outputs a word string corresponding to input speech as a recognition result;
a search unit that divides the recognition result into subwords, which are units smaller than words, uses them as search keys to search documents to be searched that include an index divided into subword units, and outputs a search result;
a recognition result correction unit that corrects the recognition result based on the search result; and
a candidate presentation unit that presents at least one of the search result and the recognition result corrected by the recognition result correction unit.

The voice search device according to claim 1, wherein the recognition result correction unit collates, word by word, the search result divided into subword units against the recognition result, and performs correction by replacing a word included in the recognition result with a similar word of the search result.

The voice search device according to claim 1, wherein the search unit outputs a plurality of search results,
the recognition result correction unit calculates a search score according to the number of subwords that match between the recognition result and each search result, ranks the plurality of search results, collates word by word, for a predetermined number of top-ranked search results among them, those search results divided into subword units against the recognition result, performs correction by replacing a word included in the recognition result with a similar word of those search results, recalculates the search score of each search result using the corrected recognition result, and corrects the ranking of the plurality of search results, and
the candidate presentation unit presents the plurality of search results according to the ranking given by the recognition result correction unit.

The voice search device according to claim 1, wherein the recognition result correction unit collates, subword by subword, the search result divided into subword units against the recognition result, and performs correction by replacing a word of the recognition result that contains a subword sequence matching the search result with the word of the search result that contains that subword sequence.

The voice search device according to claim 1, wherein the search unit outputs a plurality of search results, and
the recognition result correction unit calculates a search score according to the number of subwords that match between the recognition result and each search result, ranks the plurality of search results, collates word by word, for a predetermined number of top-ranked search results among them, those search results divided into subword units against the recognition result, performs correction by replacing a word included in the recognition result with a similar word of those search results, recalculates the search score of each search result using the corrected recognition result, corrects the ranking of the plurality of search results, then collates each search result against the corrected recognition result subword by subword, and performs correction by replacing a word of the recognition result that contains a subword sequence matching the search result with the word of that search result that contains the subword sequence.

A voice search method comprising:
a speech recognition step of outputting a word string corresponding to input speech as a recognition result;
a search step of dividing the recognition result into subwords, which are units smaller than words, using them as search keys to search documents to be searched that include an index divided into subword units, and outputting a search result;
a recognition result correction step of correcting the recognition result based on the search result; and
a candidate presentation step of presenting at least one of the search result and the recognition result corrected in the recognition result correction step.