JP2012022251A

JP2012022251A - Speech retrieval interface device and speech input retrieval method

Info

Publication number: JP2012022251A
Application number: JP2010161779A
Authority: JP
Inventors: Hiroyuki Washino (鷲野 浩之); Hirotaka Goi (伍井 啓恭); Kazuhiro Komatsu (小松 一宏)
Original assignee: Mitsubishi Electric Corp; Mitsubishi Electric Building Techno Service Co Ltd
Current assignee: Mitsubishi Electric Corp; Mitsubishi Electric Building Solutions Corp
Priority date: 2010-07-16
Filing date: 2010-07-16
Publication date: 2012-02-02
Anticipated expiration: 2030-07-16
Also published as: JP5542559B2

Abstract

PROBLEM TO BE SOLVED: To provide a speech retrieval interface device that lets a user efficiently narrow down retrieval results when retrieving specific information from a database by using speech recognition.
SOLUTION: Retrieval means 105 retrieves a word or word string of a speech recognition result from a retrieval database 104 and outputs the retrieval result and the number of retrieval candidates. When correction target word selection means 109 selects a correction target word, correction candidate generation means 111 matches the correction target word against the words in reading and syllable storage means 110 to generate correction candidates in word units, and acquires the number of retrieval candidates for each correction candidate. Correction candidate display means 112 displays the correction candidates and the numbers of retrieval candidates obtained by the correction candidate generation means 111.

Description

The present invention relates to a voice search interface device and a voice input search method for searching a specific database using speech recognition. In particular, it relates to a device and method that provide a function allowing the user to easily narrow down search results and a function allowing the user to easily and quickly correct a speech recognition result when desired.

With the spread of the Internet and the diversification and growing complexity of the functions of electronic devices, there are more and more occasions to enter a search keyword and retrieve specific information from a large database. On a personal computer, search keywords are typically entered with a keyboard. For cases where a keyboard cannot be used, or where the search keyword has many characters, a technique has been disclosed for entering the search keyword by voice using speech recognition (see, for example, Patent Document 1).
However, speech recognition has the inherent problem that the recognition rate varies with the usage environment and with individual differences between users, so misrecognition occurs. As a result, the database search may return results that the user did not intend at all. To address this, Patent Document 1 discloses a speech recognition apparatus in which the user selects words from the speech recognition result to narrow down the database search results.

Methods for raising the recognition rate and suppressing misrecognition include tuning the recognition parameters for each user and restricting the recognition vocabulary according to the situation. However, when the system is expected to be used by an unspecified large number of users and must also cover a large vocabulary, as in facility name search for car navigation, these measures do not fundamentally solve the problem. It is therefore extremely important to provide an interface that lets the user correct the recognition result simply and quickly when misrecognition occurs. Many interfaces for correcting speech recognition results have been proposed for this purpose.

For example, Patent Document 2 discloses a speech recognition apparatus that displays a list of words serving as correction candidates together with the recognition result, so that the user can make a correction simply by selecting the desired word from the list. The technique described in Patent Document 2 uses a confusion network: a word graph based on the voice input is divided into a plurality of word segments by acoustic clustering, and for each word segment the words with a high competing probability are generated as correction candidates.

Patent Document 1: JP 2006-195576 A
Patent Document 2: JP 2006-146008 A

However, when the user selects words from the speech recognition result to narrow down the search results, the user has no way of knowing which word will narrow the results efficiently. The user therefore ends up repeating a trial-and-error cycle of selecting a word, checking the search results, and selecting further words if necessary. The same applies when the user wants to correct the speech recognition result: because the user cannot tell how far a correction will narrow the search results, the user repeats a trial-and-error cycle of correcting a word, checking the search results, and correcting further words if necessary.

The present invention has been made to solve the problems described above. Its object is to obtain a voice search interface device and a voice input search method with which a user who uses speech recognition to narrow down specific information from a database can narrow down the search results efficiently.

The voice search interface device according to the present invention comprises: word output means for outputting a word or word string as the recognition result of a voice input; search means for, given an arbitrary word or word string, searching a specific database and outputting the search results and the number of search candidates for that word or word string; word dictionary storage means in which word information is registered; correction candidate generation means for matching a word output by the word output means against the words registered in the word dictionary storage means to generate correction candidates in word units; correction candidate search candidate number acquisition means for acquiring, via the search means, the number of search candidates for each correction candidate generated by the correction candidate generation means; and correction candidate output means for outputting the correction candidates generated by the correction candidate generation means together with the number of search candidates acquired for each of them by the correction candidate search candidate number acquisition means.

The voice search interface device of the present invention generates correction candidates in word units for the speech recognition result and outputs the number of search candidates for each correction candidate. As a result, a user who uses speech recognition to narrow down specific information from a database can narrow down the search results efficiently.

FIG. 1 is a block diagram showing the voice search interface device of Embodiment 1 of the present invention.
FIG. 2 is an explanatory diagram showing search results in the voice search interface device of Embodiment 1.
FIG. 3 is an explanatory diagram of the case where the search results are narrowed down with one word in the voice search interface device of Embodiment 1.
FIG. 4 is an explanatory diagram of the case where the search results are narrowed down with a plurality of words in the voice search interface device of Embodiment 1.
FIG. 5 is a flowchart showing the correction candidate generation process in the voice search interface device of Embodiment 1.
FIG. 6 is an explanatory diagram showing the information held in the reading/syllable storage means of the voice search interface device of Embodiment 1.
FIG. 7 is an explanatory diagram showing the display of correction candidates in the voice search interface device of Embodiment 1.

Embodiment 1.
FIG. 1 shows the configuration of the voice search interface device according to Embodiment 1 of the present invention, which is described below.
The voice input means 101 consists of a voice input device such as a microphone and an AD converter. When the user inputs speech, it converts the analog voice signal into a digital voice signal that a computer can process. The speech recognition dictionary storage means 102 is a storage device that holds the recognition dictionary (language model) required for speech recognition. The speech recognition means (word output means) 103 takes the digital voice signal as input, recognizes the speech by referring to the speech recognition dictionary storage means 102, and outputs one or more word strings as the speech recognition result. The search database (specific database) 104 is a storage device that holds the facility names, personal names, and other items to be searched. The search means 105 acquires from the search database 104 the search results that contain any one of the words in the word string output by the speech recognition means 103, together with the number of search result candidates. At the same time, for each word of the speech recognition result taken individually, it acquires the number of search result candidates that contain that word.

The speech recognition result display means 106 uses a display device such as an LCD to present to the user, at the same time, the word string of the speech recognition result and the number of search result candidates for each word output by the search means 105. The narrowing word selection means 107 accepts, when the user wants to narrow down the search results by selecting one of the words of the speech recognition result, an operation of selecting that word with an input device such as a mouse or touch panel, and outputs the selected word when the user performs the selection. A word selected for this purpose is hereinafter called a narrowing word. When the user has selected a narrowing word through the narrowing word selection means 107, the search result narrowing means 108 acquires, from the search results obtained by the search means 105, only those search results that contain the narrowing word, together with their number. The correction target word selection means 109 accepts, when the user wants to correct one of the words of the speech recognition result, an operation of selecting the word to be corrected with an input device such as a mouse or touch panel, and outputs the correction target word when the user performs the selection.

The reading/syllable storage means (word dictionary storage means) 110 is a storage device that holds the notation, reading information, and syllable information of the words to be recognized. The correction candidate generation means 111 generates and outputs, as correction candidates, words that are highly similar to the correction target word output by the correction target word selection means 109. It generates these candidates using the word-level reading information and syllable information held in the reading/syllable storage means 110. At the same time, for each correction candidate, it acquires from the search means 105 and outputs the number of search result candidates, that is, how many search results would remain if the user selected that candidate (this is its function as the correction candidate search candidate number acquisition means and the correction candidate output means). The correction candidate display means 112 uses a display device such as an LCD to show the user, at the same time, the correction candidates and the numbers of search result candidates output by the correction candidate generation means 111. The correction candidate selection means 113 accepts an operation in which the user selects the intended correction candidate with an input device such as a mouse or touch panel, and outputs the selected correction candidate when the user performs the selection. The correction execution means 114 takes the correction candidate selected by the user as input, updates the recognition result already presented to the user, and presents the corrected result to the user again.

The processing flow of the voice search interface device configured as described above is explained below with concrete examples.
Suppose the user, intending to enter "三菱電機株式会社" (Mitsubishi Electric Corporation) by voice, utters "Mitsubishi Denki Kabushiki Gaisha".
First, the voice input means 101 converts the uttered analog voice signal into a digital voice signal. Next, the speech recognition means 103 takes the converted digital voice signal as input, recognizes the speech by referring to the speech recognition dictionary storage means 102, and outputs the word string of the speech recognition result. Any speech recognition method may be used, including the known methods described in Non-Patent Documents 2, 3, and 4 below. One possibility is to convert the digital voice signal into acoustic features and search for recognition candidates based on an acoustic score for the basic units of speech recognition, such as phonemes, and a language score based on a language model.

Non-Patent Document 2: Kiyohiro Shikano, Katsunobu Ito, Tatsuya Kawahara, Kazuya Takeda, Mikio Yamamoto, "Speech Recognition System" (音声認識システム), Ohmsha, Ltd., May 15, 2001.
Non-Patent Document 3: Kenji Kita, Jun'ichi Tsujii, "Probabilistic Language Models" (確率的言語モデル), University of Tokyo Press, November 25, 1999.
Non-Patent Document 4: Seiichi Nakagawa, "Speech Recognition by Probabilistic Models" (確率モデルによる音声認識), The Institute of Electronics, Information and Communication Engineers, July 1, 1988.

When the speech recognition means 103 outputs the word string of the speech recognition result (word output step), the search means 105 acquires from the search database 104 the search results that contain any one of the words of the speech recognition result, together with the number of search result candidates (OR search). For example, if the speech recognition result for the utterance above is "三菱/電機/株式/会社" (Mitsubishi / Denki / Kabushiki / Gaisha, where "/" marks a word boundary), it acquires the search results that contain any one of the four words "三菱", "電機", "株式", and "会社", together with their number (search step). Search results would include, for example, "三菱地所" (Mitsubishi Jisho), "大磯電機株式会社" (Oiso Denki Kabushiki Gaisha), and "株式証券取引所" (Kabushiki Shoken Torihikijo). In addition, for each word w_i of the speech recognition result (i = 1, 2, ..., n, where n is the number of words), the search means 105 acquires the search results that contain that word and their number n_i. In this example, if the search database 104 contains 52 entries that include "三菱" and w_1 = "三菱", then n_1 = 52.
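The OR search and the per-word counts n_i described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation in the patent: the function name `or_search`, plain substring matching as the meaning of "contains", and the last two database entries are all hypothetical.

```python
from typing import Dict, List, Tuple


def or_search(database: List[str], words: List[str]) -> Tuple[List[str], Dict[str, int]]:
    """Return the entries containing at least one word, and the hit count n_i per word.

    Assumption: "contains" is modelled as plain substring matching.
    """
    # OR search: keep an entry if any recognized word occurs in it.
    hits = [entry for entry in database if any(w in entry for w in words)]
    # Per-word count n_i: number of entries that contain word w_i.
    counts = {w: sum(1 for entry in database if w in entry) for w in words}
    return hits, counts


# The first three entries come from the example in the text; the rest are made up.
database = ["三菱地所", "大磯電機株式会社", "株式証券取引所", "三菱電機株式会社", "東京駅"]
hits, counts = or_search(database, ["三菱", "電機", "株式", "会社"])
print(len(hits), counts)
```

The counts are computed against the whole database here; since every entry containing a given word is also an OR-search hit, counting within `hits` would give the same numbers.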

Next, the speech recognition result display means 106 presents the speech recognition result to the user on a display device such as an LCD, using a layout that makes the word boundaries visible, as shown in FIG. 2. At the same time, it displays the number of search result candidates n_i acquired by the search means 105 for each word next to the notation of that word. Although the number of search result candidates for each word is displayed here, the display may instead only imply the number, for example by changing the size or color of each word according to how many candidates it has.

Once the speech recognition result display means 106 has displayed the speech recognition result and the numbers of search result candidates, the user can narrow down the search results by selecting a word of the displayed result. The narrowing word selection means 107 accepts the operation in which the user selects the narrowing word with an input device such as a mouse or touch panel. When the user selects a narrowing word, the narrowing word selection means 107 detects the selection and outputs the selected narrowing word.

Next, the search result narrowing means 108 narrows the search results down to those that contain the word output by the narrowing word selection means 107, and displays the narrowed results to the user as shown in FIG. 3. A word that is currently selected as a narrowing word should preferably be displayed so that the user can recognize it, for example with a different background color. For example, if the user selects "三菱" in the example above, the search results containing any of "三菱", "電機", "株式", and "会社" are narrowed down to those that contain "三菱". More than one narrowing word can be selected, in which case the results are narrowed down to those that contain all of the selected narrowing words (AND search). For example, if "三菱" and "電機" are selected as narrowing words, the results are narrowed down to those containing both "三菱" and "電機", as shown in FIG. 4.

The search result narrowing means 108 also calculates, for each word other than the current narrowing words, how many search results would remain if that word were selected next, and updates the number of search result candidates shown for each word, as in FIG. 3. For example, suppose only "三菱" is currently selected as a narrowing word. If selecting "電機" next would narrow the results to 15 through an AND search on "三菱" and "電機", the number 15 is displayed next to "電機". The same calculation is performed and displayed for the other words, "株式" and "会社". Because the user can see at a glance how many results each possible next operation would leave, the search results can be narrowed down efficiently.
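The AND narrowing and the "count if selected next" preview can be sketched as below. The function names `narrow` and `next_counts`, substring matching, and the sample entries are assumptions for illustration only.

```python
from typing import Dict, List


def narrow(results: List[str], selected: List[str]) -> List[str]:
    """AND search: keep only results that contain every selected narrowing word."""
    return [r for r in results if all(w in r for w in selected)]


def next_counts(results: List[str], words: List[str], selected: List[str]) -> Dict[str, int]:
    """For each word not yet selected, count the results left if it were selected next."""
    return {
        w: len(narrow(results, selected + [w]))
        for w in words
        if w not in selected
    }


# Hypothetical OR-search results for the recognized words below.
results = ["三菱地所", "大磯電機株式会社", "株式証券取引所", "三菱電機株式会社", "三菱電機ビル"]
words = ["三菱", "電機", "株式", "会社"]

after_first = narrow(results, ["三菱"])            # results containing "三菱"
preview = next_counts(results, words, ["三菱"])    # counts to show next to the other words
after_second = narrow(results, ["三菱", "電機"])   # results containing both words
```

Recomputing each preview count from the full result list keeps the sketch short; an implementation working on a large database would more likely count within the already narrowed set.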

When the user wants to correct a word of the speech recognition result, the correction target word selection means 109 accepts the operation in which the user selects the correction target word with an input device such as a mouse or touch panel.
When the user selects a correction target word, the correction target word selection means 109 detects the selection and outputs the selected correction target word. The operation for selecting a narrowing word and the operation for selecting a correction target word must be distinguishable from each other. Different operations are therefore assigned to them: for example, with a mouse, a left click selects a narrowing word and a right click selects a correction target word; with a touch panel, a single tap selects a narrowing word and a double tap selects a correction target word.

For example, suppose the user, intending to enter "三菱電機株式会社" by voice, utters "Mitsubishi Denki Kabushiki Gaisha" and the speech recognition result is "三井/電機/株式/会社" (Mitsui / Denki / Kabushiki / Gaisha). To correct "三井" (Mitsui) in the recognition result to "三菱" (Mitsubishi), the user selects "三井" as the correction target word. The processing performed by the correction candidate generation means 111 when a correction target word has been selected through the correction target word selection means 109 (correction candidate generation step, correction candidate search candidate number acquisition step, and correction candidate output step) is described in detail below, following the flowchart of FIG. 5.

Taking the correction target word output by the correction target word selection means 109 as input, the correction candidate generation means 111 first acquires the reading and syllable information of the correction target word from the reading information and syllable information stored in the reading/syllable storage means 110 (step ST101). The reading/syllable storage means 110 preferably stores, in the form shown in FIG. 6, the notation of each word obtained by splitting the recognition target phrases with a word segmentation method such as morphological analysis, together with the reading and syllable information corresponding to each notation. The correction candidate generation means 111 searches the reading/syllable storage means 110 for the correction target word and acquires the corresponding reading and syllable information. For example, if the correction target word is "三井", it searches the notations in the reading/syllable storage means 110 for "三井" and acquires its reading "ミツイ" and its syllables "mi-cu-i".
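One way to picture the reading/syllable store and the lookup in step ST101 is the sketch below. The names `WordEntry`, `STORE`, and `lookup` are hypothetical, and the syllable string given for "三菱" is an assumed transcription; only the "三井" entry follows the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class WordEntry:
    notation: str                # written form, e.g. "三井"
    reading: str                 # katakana reading, e.g. "ミツイ"
    syllables: Tuple[str, ...]   # syllable sequence, e.g. ("mi", "cu", "i")


# Store keyed by notation. The "三菱" syllables are an assumed transcription.
STORE: Dict[str, WordEntry] = {
    entry.notation: entry
    for entry in (
        WordEntry("三井", "ミツイ", ("mi", "cu", "i")),
        WordEntry("三菱", "ミツビシ", ("mi", "cu", "bi", "si")),
    )
}


def lookup(notation: str) -> Optional[WordEntry]:
    """Step ST101: fetch the reading and syllables of the correction target word."""
    return STORE.get(notation)


target = lookup("三井")
```

Keying the store by notation makes the ST101 lookup a single dictionary access; a real dictionary would also have to handle notations with more than one reading, which this sketch ignores.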

Next, an arbitrary word is selected from the reading/syllable storage means 110 (step ST102), and the similarity between the reading of the correction target word and the reading of the word selected in step ST102 is calculated (step ST103). Any known method may be used to calculate similarity from word readings. One example is the edit distance (Levenshtein distance), which defines the distance between two words as the minimum number of operations (insertion, deletion, substitution) needed to edit one word into the other. For example, editing "ミツイ" (mitsui) into "ミツビシ" (mitsubishi) proceeds as follows:
"ミツイ"
"ミツビ" (substitute ビ for イ)
"ミツビシ" (insert シ)
At least two operations are needed, so the edit distance between "ミツイ" and "ミツビシ" is 2. The smaller the edit distance, the greater the similarity of the readings can be considered to be, so the reciprocal of the edit distance can be used as the similarity between words. The similarity based on this reading information is hereinafter called the reading similarity.
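The edit distance calculation and the reciprocal-based reading similarity can be sketched as below. This is a standard dynamic-programming Levenshtein distance, not code from the patent. Two assumptions are made: each katakana character counts as one unit (after NFC normalization), and a distance of 0, which the text does not cover, is mapped to infinity.

```python
import math
import unicodedata


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    # Normalize so that a voiced kana such as "ビ" is a single code point.
    a = unicodedata.normalize("NFC", a)
    b = unicodedata.normalize("NFC", b)
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def reading_similarity(a: str, b: str) -> float:
    """Reciprocal of the edit distance; identical readings give infinity (assumption)."""
    d = edit_distance(a, b)
    return 1.0 / d if d > 0 else math.inf


print(edit_distance("ミツイ", "ミツビシ"))       # 2, as in the worked example
print(reading_similarity("ミツイ", "ミツビシ"))  # 0.5
```

Only two rows of the distance table are kept at a time, so memory use grows with the length of one reading rather than with the product of both lengths.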

Next, the correction candidate generation means 111 calculates the similarity between the syllables of the correction target word and the syllables of the word selected in step ST102 (step ST104). A known method can be used to calculate similarity from word syllables. One example, described in Non-Patent Document 5 below, calculates the mutual confusion probabilities of partial syllable strings from the statistical tendencies of recognition errors, and obtains the similarity of the whole word as the logarithm of the product of the confusion probabilities of all its partial syllable strings. The similarity based on this syllable information is hereinafter called the acoustic similarity.
Non-Patent Document 5: Abe et al., "Large-scale continuous speech recognition by a two-stage search method using a probabilistic model of recognition error tendencies", Journal of the IEICE, Vol. J83-D-II, No. 12, pp. 2545-2553, 2000.
Steps ST102 to ST104 are repeated for all the words stored in the reading/syllable storage means 110 (step ST105).
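The idea of scoring a word pair by the logarithm of a product of confusion probabilities can be illustrated with the simplified sketch below. It is not the method of Non-Patent Document 5. The confusion table, the floor probability for unseen pairs, and the restriction to syllable sequences of equal length (so that no alignment is needed) are all assumptions made to keep the example short.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical confusion probabilities P(candidate syllable | target syllable).
CONFUSION: Dict[Tuple[str, str], float] = {
    ("mi", "mi"): 0.9,
    ("cu", "cu"): 0.8,
    ("cu", "zu"): 0.1,
    ("i", "i"): 0.9,
}
FLOOR = 1e-4  # assumed probability for syllable pairs missing from the table


def acoustic_similarity(target: Sequence[str], candidate: Sequence[str]) -> float:
    """Log of the product of per-syllable confusion probabilities.

    The log of a product equals the sum of the logs. The result is at most 0,
    and values closer to 0 mean the two words are more easily confused.
    """
    if len(target) != len(candidate):
        raise ValueError("this sketch only handles sequences of equal length")
    return sum(math.log(CONFUSION.get(pair, FLOOR)) for pair in zip(target, candidate))


same = acoustic_similarity(["mi", "cu", "i"], ["mi", "cu", "i"])      # ミツイ vs ミツイ
confused = acoustic_similarity(["mi", "cu", "i"], ["mi", "zu", "i"])  # ミツイ vs ミズイ
```

Handling words with different numbers of syllables, such as "mi-cu-i" against a four-syllable candidate, would require an alignment step with insertion and deletion probabilities, which is left out here.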

Once the reading similarity r_i and the acoustic similarity a_i with respect to the correction target word have been obtained for every word i stored in the reading/syllable storage means 110, the correction candidate generation means 111 normalizes them by dividing each by the sum of the reading similarities and the sum of the acoustic similarities, respectively (n being the total number of words). It then computes the weighted sum of the two normalized similarities as the inter-word similarity s_i, as in the following equation (step ST106).

\[ s_i = \alpha \, \frac{r_i}{\sum_{j=1}^{n} r_j} + (1 - \alpha) \, \frac{a_i}{\sum_{j=1}^{n} a_j} \]

In the above equation, α is a weight that determines how much importance is given to the reading similarity relative to the acoustic similarity when calculating the word similarity. α can be set freely according to the environment in which speech recognition is used. With α = 0 only the acoustic similarity is used, and with α = 1 only the reading similarity is used.
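The normalization and weighting of step ST106 can be sketched as below, assuming the weighted sum has the form s_i = α·r_i/Σr_j + (1 − α)·a_i/Σa_j, which is what the description of the normalization and of the cases α = 0 and α = 1 implies. The function name `word_similarities` and the sample scores are hypothetical, and the sketch assumes positive scores with non-zero sums.

```python
from typing import List, Sequence


def word_similarities(reading: Sequence[float],
                      acoustic: Sequence[float],
                      alpha: float) -> List[float]:
    """Combine normalized reading and acoustic similarities with weight alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(reading) != len(acoustic):
        raise ValueError("both score lists must cover the same words")
    r_total = sum(reading)   # normalizer for the reading similarities
    a_total = sum(acoustic)  # normalizer for the acoustic similarities
    return [alpha * r / r_total + (1.0 - alpha) * a / a_total
            for r, a in zip(reading, acoustic)]


# Hypothetical scores for three dictionary words.
r = [1.0, 1.0, 0.5]
a = [0.5, 0.3, 0.2]
balanced = word_similarities(r, a, alpha=0.5)
reading_only = word_similarities(r, a, alpha=1.0)
acoustic_only = word_similarities(r, a, alpha=0.0)
```

Because each normalized list sums to 1, the combined scores also sum to 1 for any α in [0, 1], so changing α shifts weight between the two measures without changing the overall scale.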

In this way, the correction candidate generation means 111 calculates the inter-word similarity between the correction target word selected by the user and every word stored in the reading/syllable storage means 110, sorts the words in descending order of inter-word similarity, and generates the top m words as correction candidates (step ST107). The number m is arbitrary. For example, when m = 3 and the correction target word is "Mitsui", words with high inter-word similarity such as "Mitsuishi", "Mizui", and "Mitsubishi" are selected as correction candidates.
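Steps ST106 and ST107 can be sketched as follows. The sketch assumes that the weighted sum takes the form α × (normalized r_i) + (1 − α) × (normalized a_i), which is consistent with the description of α above, and that both similarity lists hold non-negative scores with non-zero sums. The function names and the similarity values are hypothetical.

```python
def inter_word_similarity(reading, acoustic, alpha):
    """Weighted sum of the normalized reading and acoustic similarities (ST106)."""
    r_total = sum(reading)   # assumed non-zero
    a_total = sum(acoustic)  # assumed non-zero
    return [
        alpha * r / r_total + (1 - alpha) * a / a_total
        for r, a in zip(reading, acoustic)
    ]

def top_candidates(words, reading, acoustic, alpha=0.5, m=3):
    """Sort by inter-word similarity, descending, and keep the top m (ST107)."""
    scores = inter_word_similarity(reading, acoustic, alpha)
    ranked = sorted(zip(words, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:m]

# Illustrative (made-up) similarities to the correction target word "Mitsui".
words = ["Mitsuishi", "Mizui", "Mitsubishi", "Tanaka"]
reading = [0.8, 0.7, 0.6, 0.1]
acoustic = [0.7, 0.6, 0.5, 0.2]
candidates = top_candidates(words, reading, acoustic, alpha=0.5, m=3)
```

With α = 0 the ranking depends only on `acoustic`, and with α = 1 only on `reading`, matching the two limiting cases described for α.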

Next, the correction candidate generation means 111 acquires, for each correction candidate, the number of search candidates that would result if that correction candidate were selected (step ST108). For example, for the correction candidate "Mitsubishi", it acquires the number of search result candidates, which indicates how many of the search results output by the search means 105 contain "Mitsubishi". Finally, the correction candidate generation means 111 outputs the correction candidates and the number of search result candidates corresponding to each correction candidate.
The correction candidate display means 112 uses a display device such as an LCD to display the correction candidates and the numbers of search result candidates output from the correction candidate generation means 111 to the user simultaneously, as shown in FIG. 7. It is desirable to lay out the display so that the higher the similarity of a correction candidate, the closer it appears to the correction target word.
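The counting in step ST108 can be sketched as below. The sketch assumes that each search result is a plain text string and that "contains" means substring matching; the function name and the sample entries are hypothetical.

```python
def candidate_counts(search_results, correction_candidates):
    """For each correction candidate, count the search results that contain it."""
    return {
        candidate: sum(1 for entry in search_results if candidate in entry)
        for candidate in correction_candidates
    }

# Illustrative (made-up) search results output by the search means.
search_results = [
    "Mitsubishi Electric Tokyo",
    "Mitsui Building Tokyo",
    "Mitsubishi Building Osaka",
]
counts = candidate_counts(search_results, ["Mitsuishi", "Mizui", "Mitsubishi"])
# counts == {"Mitsuishi": 0, "Mizui": 0, "Mitsubishi": 2}
```

Displaying these counts next to each candidate lets the user see in advance how far a given correction would narrow the results.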

When the correction candidate display means 112 has displayed the correction candidates and the numbers of search result candidates, the correction candidate selection means 113 accepts an operation in which the user selects the intended correction candidate using an input device such as a mouse or a touch panel. When the user performs the selection operation, the selected correction candidate is output.

Finally, the correction execution means 114 takes the correction candidate selected by the user as input and corrects the speech recognition result that is already displayed. For example, in FIG. 7, when the correction candidate "Mitsubishi" is selected for the correction target word "Mitsui", the correction target word "Mitsui" is replaced with the correction candidate "Mitsubishi" and the result is displayed as shown in FIG. 2. When the correction execution means 114 has executed the correction, the search means 105 executes the search again using the corrected words. In addition, when processing that uses the speech recognition result follows in a later stage, the correction execution means 114 notifies the appropriate destination that the recognition result has been corrected.
The above processing is repeated until the search result desired by the user is obtained.
The above is the processing flow of the voice search interface device according to the present invention.

In the first embodiment described above, the case where the speech recognition result is a word string composed of a plurality of words has been described, but the invention is equally applicable when the speech recognition result is only a single word. However, when there is only a single word, the OR search and the AND search are not performed.

As described above, the voice search interface device of the first embodiment comprises: word output means for outputting a word or a word string as a recognition result for voice input; search means for searching a specific database when given an arbitrary word or word string and outputting a search result for the word or word string and the number of search candidates; word dictionary storage means in which word information is registered; correction candidate generation means for matching the word output by the word output means against the words registered in the word dictionary storage means and generating correction candidates in units of words; correction candidate search candidate number acquisition means for acquiring, via the search means, the number of search candidates for each correction candidate generated by the correction candidate generation means; and correction candidate output means for outputting the correction candidates generated by the correction candidate generation means together with the number of search candidates corresponding to each correction candidate acquired by the correction candidate search candidate number acquisition means. Therefore, when the user uses speech recognition to narrow down specific information from a database, the search results can be narrowed down efficiently.

Further, according to the voice search interface device of the first embodiment, when the word output means outputs a plurality of words, the search means searches the database with an OR search for entries containing any of those words. The user can therefore easily see what search results are obtained from the speech recognition result.

Further, the voice search interface device of the first embodiment comprises search result narrowing means which, when a plurality of words are output by the word output means and two or more of those words are selected as narrowing words, narrows the search results of the search means to only the AND search results containing all of the selected words. Efficient narrowing can therefore be performed.
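The OR search and the AND narrowing described above can be sketched as follows. The sketch assumes that database entries are plain text strings and that matching is by substring; the function names and the sample database are hypothetical.

```python
def or_search(database, words):
    """Entries that contain at least one of the recognized words."""
    return [entry for entry in database if any(w in entry for w in words)]

def and_narrow(results, selected_words):
    """Keep only the results that contain every selected narrowing word."""
    return [entry for entry in results if all(w in entry for w in selected_words)]

# Illustrative (made-up) database and recognition result.
database = [
    "Mitsubishi Electric Tokyo",
    "Mitsubishi Building Osaka",
    "Mitsui Building Tokyo",
    "Sato Shoten Nagoya",
]
recognized = ["Mitsubishi", "Tokyo"]
hits = or_search(database, recognized)                # three entries match
narrowed = and_narrow(hits, ["Mitsubishi", "Tokyo"])  # one entry remains
```

Starting from the broad OR result and narrowing with AND only on the words the user selects means a misrecognized word does not remove the intended entry from the initial result list.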

Further, according to the voice search interface device of the first embodiment, when a correction target word is designated among the words output by the word output means, the correction candidate generation means uses both the similarity between the syllables of the correction target word and the syllables of a correction candidate and the similarity between the reading of the correction target word and the reading of the correction candidate, and uses the weighted sum of the two similarities as the overall similarity when generating correction candidates. Appropriate correction candidates can therefore be generated.

Further, the voice search interface device of the first embodiment comprises: search result narrowing means which, when a plurality of words are output by the word output means and two or more of those words are selected as narrowing words, acquires the number of search candidates that would result from an AND search on the selected words and an unselected word; and voice recognition result display means for displaying the number of search candidates acquired by the search result narrowing means. The user can therefore narrow down the search results efficiently.
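The count shown for each unselected word can be sketched as below, under the assumption that it is the number of results remaining if that word were added to the current AND selection. Substring matching, the function name, and the sample data are hypothetical.

```python
def preview_counts(results, selected, unselected):
    """For each unselected word, count results containing it and all selected words."""
    def contains_all(entry, words):
        return all(w in entry for w in words)

    return {
        word: sum(1 for entry in results if contains_all(entry, selected + [word]))
        for word in unselected
    }

# Illustrative (made-up) search results.
results = [
    "Mitsubishi Electric Tokyo",
    "Mitsubishi Building Tokyo",
    "Mitsubishi Building Osaka",
    "Mitsui Building Tokyo",
]
preview = preview_counts(results, ["Mitsubishi"], ["Tokyo", "Osaka"])
# preview == {"Tokyo": 2, "Osaka": 1}
```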

Further, the voice input search method of the first embodiment comprises: a word output step of outputting a word or a word string as a recognition result for voice input; a search step of searching a specific database when given an arbitrary word or word string and outputting a search result for the word or word string and the number of search candidates; a correction candidate generation step of matching the word output in the word output step against the words registered in the word dictionary storage means and generating correction candidates in units of words; a correction candidate search candidate number acquisition step of acquiring, by means of the search step, the number of search candidates for each correction candidate generated in the correction candidate generation step; and a correction candidate output step of outputting the correction candidates generated in the correction candidate generation step together with the number of search candidates corresponding to each correction candidate acquired in the correction candidate search candidate number acquisition step. Therefore, when the user uses speech recognition to narrow down specific information from a database, the user can narrow down the search results efficiently.

101 voice input means, 102 voice recognition dictionary storage means, 103 voice recognition means, 104 search database, 105 search means, 106 voice recognition result display means, 107 narrowing word selection means, 108 search result narrowing means, 109 correction target word selection means, 110 reading/syllable storage means, 111 correction candidate generation means, 112 correction candidate display means, 113 correction candidate selection means, 114 correction execution means.

Claims

A voice search interface device comprising:
word output means for outputting a word or a word string as a recognition result for voice input;
search means for searching a specific database when given an arbitrary word or word string and outputting a search result for the word or word string and the number of search candidates;
word dictionary storage means in which word information is registered;
correction candidate generation means for matching the word output by the word output means against the words registered in the word dictionary storage means and generating correction candidates in units of words;
correction candidate search candidate number acquisition means for acquiring, via the search means, the number of search candidates for each correction candidate generated by the correction candidate generation means; and
correction candidate output means for outputting the correction candidates generated by the correction candidate generation means together with the number of search candidates corresponding to each correction candidate acquired by the correction candidate search candidate number acquisition means.

The voice search interface device according to claim 1, wherein, when searching the database, the search means performs an OR search for entries containing any of a plurality of words when the word output means outputs a plurality of words.

The voice search interface device according to claim 1, further comprising search result narrowing means for narrowing the search results of the search means to only the AND search results containing all of the selected words when a plurality of words are output by the word output means and two or more of those words are selected as narrowing words.

The voice search interface device according to claim 1, wherein, when a correction target word is designated among the words output by the word output means, the correction candidate generation means uses both the similarity between the syllables of the correction target word and the syllables of a correction candidate and the similarity between the reading of the correction target word and the reading of the correction candidate, and uses the weighted sum of the two similarities as the overall similarity when generating correction candidates.

The voice search interface device according to claim 1, further comprising:
search result narrowing means for acquiring, when a plurality of words are output by the word output means and two or more of those words are selected as narrowing words, the number of search candidates that would result from an AND search on the selected words and an unselected word; and
voice recognition result display means for displaying the number of search candidates acquired by the search result narrowing means.

A voice input search method comprising:
a word output step of outputting a word or a word string as a recognition result for voice input;
a search step of searching a specific database when given an arbitrary word or word string and outputting a search result for the word or word string and the number of search candidates;
a correction candidate generation step of matching the word output in the word output step against the words registered in the word dictionary storage means and generating correction candidates in units of words;
a correction candidate search candidate number acquisition step of acquiring, by means of the search step, the number of search candidates for each correction candidate generated in the correction candidate generation step; and
a correction candidate output step of outputting the correction candidates generated in the correction candidate generation step together with the number of search candidates corresponding to each correction candidate acquired in the correction candidate search candidate number acquisition step.