JP2005084829A

JP2005084829A - Information retrieving method, device, and program, and program recording medium

Info

Publication number: JP2005084829A
Application number: JP2003314538A
Authority: JP
Inventors: Kenichi Kumagai; 建一熊谷; Akira Tsuruta; 彰鶴田; Toshio Akaha; 俊夫赤羽; Yoichiro Hachiman; 洋一郎八幡
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2003-09-05
Filing date: 2003-09-05
Publication date: 2005-03-31

Abstract

PROBLEM TO BE SOLVED: To provide an information retrieval method and device capable of retrieving information using a plurality of speech recognition results.

SOLUTION: This method for retrieving, by voice, information from a plurality of pieces of information supplied by a server via a communication line includes: a step in which a user inputs, by voice, the information to be retrieved into a terminal; a step in which the server receives from the terminal the voice data relating to the input information; a recognition candidate acquisition step of analyzing the content of the voice data received in the receiving step using an acoustic model and a language model to acquire a plurality of recognition candidates; a step of retrieving information using at least two of the acquired recognition candidates; a step of transmitting the plurality of pieces of information acquired in the information retrieving step to the terminal; and a step of displaying at least one of the plurality of pieces of information transmitted in the transmitting step.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to an information search method, an information search device, an information search program, and a program recording medium.

In recent years, mobile terminals such as mobile phones have come into wide use, and lighter and smaller terminals are desired in order to improve portability. As mobile terminals become smaller, input devices such as keyboards must be made smaller or eliminated.

Meanwhile, a wide variety of information is provided on the Internet. Using a browser, which is an application for accessing information, a user can reach the necessary information from a mobile terminal at any location. However, entering text such as a URL (Uniform Resource Locator) or search conditions on a mobile terminal with a small keyboard, or with no keyboard at all, is cumbersome.

For this reason, attempts have been made to enter search requests by voice (see, for example, Patent Document 1). That document discloses a voice search method that prefetches and downloads the web pages linked from the displayed web page, down to a previously specified number of levels, extracts words from all of the downloaded web pages to create a voice search dictionary, and performs a voice search for those words.

JP 2002-297372 A (Claims)

However, the method described in that document obtains a single recognition result from the input speech and searches for web pages using that result. Consequently, when the speech is misrecognized, the user does not obtain the desired search results and is left dissatisfied.

In addition, the method described in that document creates the speech recognition language model and the speech recognition dictionary solely from text such as articles. A language model created in this way holds only information such as whether a given sentence can be recognized, or with what probability a given word string occurs in the text. As a result, a user who browses web pages without a clear purpose cannot fully enjoy them.

For a user who browses web pages without a clear purpose, on the other hand, what matters is whether the web pages are enjoyable. For such a user, even if the input speech is misrecognized and a web page is retrieved with a keyword other than the intended one, there is no problem as long as that web page is enjoyable.

The present invention has been made in view of the above problems, and its object is to provide an information search method and an information search device that can search for information using a plurality of speech recognition results.

Another object of the present invention is to provide an information search method and an information search device that can search for information by adding a further condition to a speech-recognized keyword.

To achieve the above objects, the information search method of the present invention is a method of searching by voice for information among a plurality of pieces of information provided by a server over a communication line, and comprises: (1) an input step in which a user inputs, by voice, the information to be searched for into a terminal; (2) a reception step in which the server receives, from the terminal, voice data relating to the input information; (3) a recognition candidate acquisition step of analyzing the content of the voice data received in the reception step using an acoustic model and a language model to obtain a plurality of recognition candidates; (4) an information search step of searching for information using at least two of the obtained recognition candidates; (5) a transmission step of transmitting the plurality of pieces of information obtained in the information search step to the terminal; and (6) an information display step of displaying at least one of the plurality of pieces of information transmitted in the transmission step.

With this configuration, information is searched for using at least two of the plurality of recognition candidates obtained by speech recognition. Using multiple recognition candidates raises the probability that the candidates include a recognition result close to the user's intention. As a result, the possibility of misrecognizing the input speech can be reduced.

Furthermore, because at least two of the obtained recognition candidates are used, information can also be searched for with a recognition result that does not necessarily match the keyword the user intended. That is, when the user searches without a clear purpose, information can be retrieved using an unexpected recognition result.
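The use of at least two recognition candidates in the information search step can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration only: the `INDEX` dictionary stands in for the server-side information store, the example keywords and URLs are invented, and `search_with_candidates` is a hypothetical helper rather than the claimed implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical inverted index: keyword -> URLs of pages containing it.
INDEX: Dict[str, List[str]] = {
    "sakura": ["http://example.com/sakura-viewing", "http://example.com/sakura-recipes"],
    "sakana": ["http://example.com/fish-market"],
}


def search_with_candidates(
    candidates: List[Tuple[str, float]],
    index: Dict[str, List[str]],
    top_n: int = 2,
) -> List[Tuple[str, str, float]]:
    """Search with the top_n recognition candidates instead of only the best one.

    candidates: (recognized word, recognition score) pairs from the recognizer.
    Returns (url, candidate word, recognition score) for every hit.
    """
    if top_n < 2:
        raise ValueError("the method uses at least two recognition candidates")
    # Keep the top_n candidates by recognition score, best first.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_n]
    results: List[Tuple[str, str, float]] = []
    for word, score in ranked:
        for url in index.get(word, []):
            results.append((url, word, score))
    return results
```

If the user said "sakana" but the recognizer ranked "sakura" first, the second candidate still brings the intended page into the result list, while the first candidate supplies the unexpected pages mentioned above.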

In the information search method of the present invention, the terminal may perform a voice analysis step of parameterizing the input speech. Speech recognition is normally performed by the server. The voice analysis step, however, requires no special speech recognition operation, so this configuration does not place an excessive load on the terminal.

The information search method of the present invention may also comprise a speech recognition candidate acquisition step in which the terminal recognizes the content of the voice data using an acoustic model and a language model to obtain a plurality of recognition candidates, and a reception step in which the server receives the obtained recognition candidates from the terminal.

The information search method may further comprise a usage score acquisition step of obtaining a usage score for each of the plurality of pieces of information obtained in the information search step, and in the information display step at least one of the displayed pieces of information may be displayed in a manner that reflects its usage score.

With this configuration, a usage score is obtained for each piece of information retrieved using the speech recognition results. Because the information is displayed in a manner that reflects the usage scores, the user can reach frequently used websites. In particular, a user who searches without a clear purpose can reach popular websites through an unexpected keyword and is therefore not annoyed by misrecognition.

At least one of the displayed pieces of information may be displayed in a manner that reflects the sum of the recognition score of the search candidate used for the search and the usage score. With this configuration, the displayed information reflects the sum of the recognition score of the search candidate used for the search and the usage score. The user can therefore browse, in order, information that was recognized with high confidence and that is also frequently used.

The information search method may further comprise a weighting step of assigning a weight to each of the recognition score of the search candidate used for the search and the usage score. With this configuration, the user can change the relative weighting of the recognition score and the usage score according to the purpose of the search.

The information search method of the present invention may also be a method of searching by voice for information among a plurality of pieces of information provided by a server over a communication line, comprising: (1) an input step in which a user inputs, by voice, the information to be searched for into a terminal; (2) a reception step in which the server receives, from the terminal, voice data relating to the input information; (3) a speech recognition candidate acquisition step of recognizing the content of the voice data received in the reception step using an acoustic model and a language model to obtain a plurality of recognition candidates; (4) a recognition candidate display step of displaying at least two of the obtained recognition candidates on the terminal; (5) a recognition candidate selection step in which the user selects one recognition candidate from the at least two displayed recognition candidates; and (6) an information search step of searching for information using the recognition candidate selected in the recognition candidate selection step.

With this configuration, the user can select one recognition candidate from the plurality of displayed recognition candidates, so information can be searched for using a recognition candidate that suits the user's purpose. In addition, the user can select a keyword of interest even if it is a recognition result that was not anticipated when speaking, which allows freer information retrieval.

The information search device of the present invention is a device that searches by voice for information among a plurality of pieces of information provided by a server over a communication line, and comprises: (1) input means by which a user inputs, by voice, the information to be searched for into a terminal; (2) reception means by which the server receives, from the terminal, voice data relating to the input information; (3) speech recognition candidate acquisition means for recognizing the content of the voice data received by the reception means using an acoustic model and a language model to obtain a plurality of recognition candidates; (4) information search means for searching for information using at least two of the obtained recognition candidates; (5) transmission means for transmitting the plurality of pieces of information obtained by the information search means to the terminal; and (6) information display means for displaying, on the terminal, at least one of the plurality of pieces of information transmitted by the transmission means.

The information search program of the present invention is a program for searching by voice for information among a plurality of pieces of information provided by a server over a communication line, and causes a computer to execute: (1) an input step in which a user inputs, by voice, the information to be searched for into a terminal; (2) a reception step in which the server receives, from the terminal, voice data relating to the input information; (3) a speech recognition candidate acquisition step of recognizing the content of the voice data received in the reception step using an acoustic model and a language model to obtain a plurality of recognition candidates; (4) an information search step of searching for information using at least two of the obtained recognition candidates; (5) a transmission step of transmitting the plurality of pieces of information obtained in the information search step to the terminal; and (6) an information display step of displaying, on the terminal, at least one of the plurality of pieces of information transmitted in the transmission step.

The information search program may be recorded on a computer-readable recording medium.

The information search method of the present invention searches for information using at least two recognition results from among the plurality of recognition candidates obtained by analyzing the input speech. As a result, misrecognition is reduced and information closer to the user's intention can be retrieved.

The present invention can also provide an information search method and an information search device that obtain a usage score for each of the plurality of pieces of information obtained and search for information by taking that usage score into account in addition to the speech-recognized keyword.

The best mode for carrying out the present invention is described below with reference to the drawings. The present invention is not limited to these embodiments.

(Example 1)
FIG. 1 is a block diagram showing a system configuration for realizing an information search method according to an embodiment of the present invention. In FIG. 1, the terminal 100 and the server 200 are connected via a communication line 300.

The terminal 100 is provided with a voice input unit 101, a display unit 102, and a weight input unit 103. Speech entered through the voice input unit 101, such as a microphone, is converted into a digital signal by the AD conversion unit 104. The digitized voice data is sent from the transmission control unit 105 to the server 200 via the communication line 300. When the user wishes to apply weighting to the information search, the user enters the desired weights through the weight input unit 103.

The term "terminal" as used in the present invention is not limited to what is commonly called a "mobile phone". It also covers communication terminals that have a telephone function and an Internet function, such as PHS handsets and next-generation mobile phones, as well as, for example, mobile computers and car navigation systems.

Examples of the communication line include public lines, analog telephone lines for LANs, ISDN (Integrated Services Digital Network), DSL, Ethernet (registered trademark), optical fiber lines, PHS, mobile phones (circuit-switched connection, packet connection), wireless LAN (local area network), fixed microwave links, and satellite communication links.

The server 200 includes a reception control unit 211 that receives information from the terminal 100, a voice analysis unit 212 that analyzes the received voice data, a speech recognition candidate acquisition unit 213 that performs speech recognition on the analyzed voice data, an acoustic model table and language model table 223 used for speech recognition, a word dictionary 224 used for speech recognition, an information candidate acquisition unit 214 that acquires information candidates, a transmission control unit 226 that transmits the acquired information candidates to the terminal, an information analysis unit 220 that extracts the necessary information from web pages on the Internet 400, and an information database 221 that stores data on the analyzed information. The word dictionary 224 for speech recognition is created by the information analysis unit 220 from the information database 221, which stores the character information obtained from web pages on the Internet. The usage score is obtained by the information analysis unit 220 by analyzing web pages on the Internet 400.

In this embodiment, when the reception control unit 211 of the server 200 receives voice data from the terminal 100, the voice analysis unit 212 analyzes it. Specifically, feature parameters are extracted by, for example, cepstrum analysis, and the voice data is converted into MFCC (Mel-Frequency Cepstrum Coefficient) parameters.

The speech recognition candidate acquisition unit 213 applies acoustic model processing and language model processing to the parameterized voice data, using the acoustic model and the language model. Because both kinds of processing are performed on the server in this embodiment, a sufficiently large dictionary and grammar can be prepared on the server in advance, and the waiting time that would be needed to download the dictionary and grammar to the terminal is eliminated, so recognition can be performed quickly.

Moreover, because the acoustic model processing and the language model processing are performed by the server 200, the terminal 100 does not need to hold acoustic or language information. As a result, accurate recognition can be performed efficiently even when the terminal has few available resources and insufficient processing power.

The speech recognition candidate acquisition unit 213 uses the acoustic model to extract phonological information (phoneme information) from the feature parameters. An HMM (Hidden Markov Model), for example, can be used as the acoustic model. The phonological information can be, for example, a sequence of phoneme candidates and their likelihoods.

The extracted phonological information then undergoes language-level recognition processing using the language model together with the word dictionary 224. This recognition processing yields a plurality of recognition result candidates.

The language model is created as follows. First, in the information analysis unit 220, a subset of the WWW (World Wide Web) that users are considered to need is created manually. Next, words for speech recognition are collected from the text of the Web pages in that subset. Specifically, the text of each Web page is extracted with the HTML tags removed, and the extracted text is recorded in the information table 221. The extracted text is then subjected to morphological analysis to obtain noun phrases, and the noun phrases are registered in the word dictionary 224. When hyperlinks are extracted from the HTML text, the link destination indicated by <A href="URL">...</A> is followed, and words for speech recognition are collected there and registered in the word dictionary 224 in the same way. Collecting the speech recognition words from the text of Web pages in this manner avoids recognizing words that would be pointless to search for, which reduces the amount of computation during recognition. In the example of the figure, the word dictionary 224 is provided separately from the information database 221, but the word dictionary may instead be included in the information database 221.
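The tag removal and hyperlink extraction described above can be sketched with Python's standard `html.parser` module. This is only an illustrative sketch: the class name `PageTextExtractor` and the helper `extract_text_and_links` are hypothetical, skipping `<script>` and `<style>` content is an added assumption, and the morphological analysis that yields noun phrases is not shown because it depends on an external analyzer.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class PageTextExtractor(HTMLParser):
    """Collects the visible text and the link targets of one Web page."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: List[str] = []
        self.links: List[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> works too.
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or style content.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_text_and_links(html: str) -> Tuple[str, List[str]]:
    """Return (text with tags removed, list of hyperlink targets)."""
    parser = PageTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links
```

The returned link list is what a crawler would follow to collect words from linked pages, and the returned text is what would be passed to the morphological analyzer.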

There is no particular limit on the number of words for speech recognition. When words are collected from the WWW, however, the number of registered words becomes very large, and if there are too many words to recognize, speech recognition performance deteriorates significantly and large amounts of CPU and memory are consumed during recognition. For this reason, a word that is identical to a word already registered in the recognition word dictionary 224 is not registered again. When a word registered in the word dictionary 224 is used more than once on a Web page, the number of uses is counted. It is particularly preferable to limit the number of speech recognition words that are registered, for example by registering the 100,000 most frequently used words in the word dictionary and treating all other words as unknown words.
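The frequency-based limit on the dictionary size can be sketched as follows. The function names, the `UNKNOWN` marker, and the sample words are assumptions introduced for illustration; the description itself only states that duplicates are not re-registered, that uses are counted, and that words outside the most frequent set are treated as unknown.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

UNKNOWN = "<unk>"  # hypothetical marker for words outside the dictionary


def build_word_dictionary(
    noun_phrases: Iterable[str], max_words: int
) -> Tuple[Set[str], Counter]:
    """Count each noun phrase once per use and keep the max_words most frequent."""
    counts = Counter(noun_phrases)  # a repeated phrase raises its count, not the entry count
    dictionary = {word for word, _ in counts.most_common(max_words)}
    return dictionary, counts


def map_to_dictionary(words: Iterable[str], dictionary: Set[str]) -> List[str]:
    """Replace every word that is not registered with the unknown-word marker."""
    return [w if w in dictionary else UNKNOWN for w in words]
```

With `max_words=100000` this corresponds to the example given above; a much smaller value is convenient for experimenting.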

For each Web page from which speech recognition words were collected, the information database 221 records the URL (Uniform Resource Locator) of the Web page (character string), the title of the Web page (character string), the noun phrases on the Web page that were registered in the word dictionary (character strings), the usage frequency of those noun phrases (numeric array), the total number of noun phrases registered in the word dictionary 224 (numeric value), and so on.
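One possible in-memory shape for such a record is sketched below. The class name `WebPageRecord` and its field names are hypothetical, and counting the "total number" as the number of distinct registered noun phrases is an assumption, since the description does not say whether repeated uses are included.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class WebPageRecord:
    """One hypothetical entry of the information database for a Web page."""

    url: str    # URL of the Web page (character string)
    title: str  # title of the Web page (character string)
    # Registered noun phrase -> number of times it is used on the page.
    noun_phrase_counts: Dict[str, int] = field(default_factory=dict)

    def add_noun_phrase(self, phrase: str) -> None:
        """Register a phrase once and count every further use."""
        self.noun_phrase_counts[phrase] = self.noun_phrase_counts.get(phrase, 0) + 1

    @property
    def total_noun_phrases(self) -> int:
        """Number of distinct registered noun phrases (assumed interpretation)."""
        return len(self.noun_phrase_counts)
```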

(Usage)
The usage score of each piece of information is obtained as follows.
Whether a given Web page is being used can be judged from the following values.
1) The number of accesses to the Web page
2) The number of other Web pages that link to the Web page

The number of accesses in 1) is the most preferable usage score. In practice, however, not every Web page counts its accesses, so method 1) can be used only for Web pages that do.

Method 2) is used for Web pages that do not count accesses. When Web page A links to Web page B, the link is regarded as a vote by Web page A in support of Web page B. In other words, Web pages are ranked according to the number of links they receive.

First, a subset of the WWW is created manually or with a text-matching search engine such as AltaVista. Next, each Web page is classified as an "authority" or a "hub". An authority is a page that is frequently linked from pages that have high value as hubs, and a hub is a page that gathers hyperlinks to authorities. The hub weight and the authority weight of each page are then obtained by the Iterative Algorithm. A score for a given page is obtained from these weights and is used as its usage score. The Iterative Algorithm is described in detail in Jon M. Kleinberg, "Authoritative Sources in a Hyperlinked Environment", Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 1998. A Web page score obtained in this way does not reflect popularity directly, but it is considered to reflect popularity to some extent.
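The hub and authority weights can be computed with a short iteration in the style of Kleinberg's algorithm. The sketch below is a minimal version under stated assumptions: the function name `hits`, the fixed iteration count, and the Euclidean normalization are illustrative choices, and the description does not specify how the two weights are combined into a usage score (taking the authority weight is one possible reading).

```python
import math
from typing import Dict, List, Tuple


def _normalize(weights: Dict[str, float]) -> Dict[str, float]:
    """Scale the weights to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in weights.values()))
    if norm == 0:
        return dict(weights)
    return {page: v / norm for page, v in weights.items()}


def hits(
    links: Dict[str, List[str]], iterations: int = 50
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (hub weights, authority weights) for a link graph.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority weight: sum of the hub weights of the pages linking to it.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub weight: sum of the authority weights of the pages it links to.
        new_hub = {p: sum(new_auth[d] for d in links.get(p, [])) for p in pages}
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return hub, auth
```

For a graph in which pages A and B both link to page C, page C receives all of the authority weight and A and B share the hub weight equally.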

Either of methods 1) and 2) may be used, and the choice between them can be made at design time according to the circumstances. The sum of the scores obtained by methods 1) and 2) may also be used as the usage score. The usage scores collected by these methods are recorded with the information on each Web page in the information table 221.

(Weighting)
When information is searched for in a manner that reflects the usage score, the speech recognition score and the usage score are each weighted. Weighting is performed by multiplying the speech recognition score and the usage score by the respective weight values set by the user and adding the two products. The weight for the speech recognition score is a positive value greater than 0, and the weight for the usage score is likewise a positive value greater than 0. Weighting lets the user choose which of the two scores to emphasize when searching for information. In particular, by changing the weights, the usage score can be made to influence the search even when it is small compared with the speech recognition score. When no weighting is performed, the weight of the speech recognition score and the weight of the usage score are both 1. In this specification, the sum of the recognition score and the usage score, or the sum of the two scores after each has been multiplied by its weight, is sometimes called the total score.
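The total score described above is a weighted sum, which can be written as total = w_r × (recognition score) + w_u × (usage score) with both weights greater than 0. The following sketch ranks search hits by that value; the `Hit` type, the function names, and the sample scores are hypothetical, and the two scores are assumed to be on comparable scales.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """A hypothetical search hit carrying both scores."""

    url: str
    recognition_score: float  # score of the recognition candidate that found the page
    usage_score: float        # usage score of the page


def total_score(hit: Hit, w_recognition: float = 1.0, w_usage: float = 1.0) -> float:
    """Weighted sum of the two scores; both weights default to 1 (no weighting)."""
    if w_recognition <= 0 or w_usage <= 0:
        raise ValueError("weights must be positive values greater than 0")
    return w_recognition * hit.recognition_score + w_usage * hit.usage_score


def rank_hits(
    hits: List[Hit], w_recognition: float = 1.0, w_usage: float = 1.0
) -> List[Hit]:
    """Order the hits by descending total score."""
    return sorted(
        hits,
        key=lambda h: total_score(h, w_recognition, w_usage),
        reverse=True,
    )
```

Raising `w_usage` moves a frequently used page ahead of a page that merely matched the best recognition candidate, which is the effect the weight input unit 103 is meant to give the user.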

By the above method, the information candidate acquisition unit 214 searches for information using at least two of the plurality of obtained recognition candidates. Because the search uses a plurality of recognition candidates, erroneous recognition can be reduced. In addition, a user who searches without a clear purpose can find information that is enjoyable to browse. The retrieved information is sent to the terminal 100 and displayed.

Alternatively, the recognition candidates obtained by the above method may be transmitted from the transmission control unit 226 to the terminal 100 via the communication line 300. The reception control unit 106 of the terminal 100 receives the recognition candidates, and the display unit 102 displays at least two of them. The user selects, from the displayed recognition candidates, the one to be used for the search. The selected recognition candidate is sent to the server 200, and the information candidate acquisition unit 214 searches for information based on it.

(Operation 1)
Next, the information search method of the present invention will be described in detail with reference to FIGS. 2 and 3. FIG. 2 is a flowchart showing an information search operation according to the present invention. In this example, the recognition candidates are not weighted. FIG. 3 shows an example of an actual information search according to the present invention. FIG. 3A shows the top two recognition candidates obtained by speech recognition. FIG. 3B shows the results of the information search performed with each recognition candidate.

[Step S101]
In step S101, speech input for obtaining recognition candidates is started. Before speech input starts, the output buffer is initialized.

[Step S102 to Step S103]
In step S102, the input speech is recognized using the acoustic model. Next, in step S103, the word dictionary is checked for recognized character strings corresponding to the recognized speech. If no recognized character string exists, the process proceeds to step S110; if one exists, the process proceeds to step S104.

[Step S104 to Step S105]
In step S104, if one or more recognized character strings exist, they are output and displayed. Specifically, recognition candidates such as those in FIG. 3A are displayed. When there are a plurality of recognized character strings, for example, the two with the highest recognition scores are displayed on the display unit of the terminal. The user selects the recognition candidate to be used for the search (step S105).

[Step S106]
In step S106, the information database is searched based on the selected recognition candidate. The search checks whether there is information that matches the keyword corresponding to the recognition candidate. The match with the keyword need not be exact: if a character string contained in the recognized character string is found, it is regarded as a match.
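The partial matching of step S106 can be sketched as a substring test. The sketch below is one possible reading, assumed for illustration: an entry matches when any of its keywords occurs inside the recognized character string. The names `is_match` and `search_database` and the two database entries are hypothetical and do not come from the specification.

```python
def is_match(recognized, keywords):
    """Return True if any non-empty keyword occurs inside the recognized string."""
    return any(keyword and keyword in recognized for keyword in keywords)


def search_database(recognized, database):
    """Return the entries whose keywords partially match the recognized string."""
    return [entry for entry in database if is_match(recognized, entry["keywords"])]


# Hypothetical database entries, for illustration only.
database = [
    {"title": "House design collection", "keywords": ["house", "design"]},
    {"title": "Train timetable", "keywords": ["train", "timetable"]},
]
hits = search_database("house design", database)
```

An empty result from `search_database` corresponds to the branch in which no matching information exists.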

[Step S107]
In step S107, it is determined whether there is information that matches the keyword corresponding to the recognition candidate. If there is none, the process proceeds to step S110; if there is, the process proceeds to step S108.

[Step S108]
In step S108, the total score in the output buffer is updated with the usage frequency of the keyword.

[Step S109]
The output buffer is sorted in order of total score and output. The output information is sent from the transmission control unit to the terminal. The terminal receives the information sent from the server and shows it on the display unit. As shown in FIG. 3B, the information retrieved for the selected speech recognition candidate is displayed in order of usage score.

[Step S110]
If no recognized character string exists, or if no information matches the keyword corresponding to the recognition candidate, an error message stating that no recognized character string or information exists is output (step S110).
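The control flow of steps S101 to S110 can be summarized in a short Python sketch. It is an illustration under stated assumptions, not the flowchart of FIG. 2 itself: the function `run_operation_1`, the `choose` and `search` callbacks, and the use of `LookupError` for the error message are all hypothetical. Step S108 is simplified to storing hits whose usage score is already known.

```python
def run_operation_1(candidates, choose, search):
    """Sketch of steps S101 to S110 (hypothetical names throughout).

    candidates: list of (text, recognition_score) pairs from the recognizer.
    choose:     callback that returns one text from the displayed candidates.
    search:     callback that returns a list of hits for a text; each hit is
                a dict with a "usage_score" entry.
    """
    output_buffer = []                                      # S101: initialize buffer
    if not candidates:                                      # S103: nothing recognized
        raise LookupError("no recognized character string")  # S110: error message
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    displayed = [text for text, _score in ranked[:2]]       # S104: show the top two
    selected = choose(displayed)                            # S105: user selection
    hits = search(selected)                                 # S106: search database
    if not hits:                                            # S107: no match found
        raise LookupError("no matching information")        # S110: error message
    output_buffer.extend(hits)                              # S108 (simplified)
    output_buffer.sort(key=lambda h: h["usage_score"], reverse=True)  # S109
    return output_buffer
```

Passing the selection as a callback keeps the sketch independent of how the terminal presents the candidates to the user.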

In this operation, for example, when the user utters “house design” and “picture design” and “house design” are displayed, in that order, as recognition results, the user can refer to the recognition results and change the search target. This allows a highly flexible information search for a user who searches without a clear purpose.

(Operation 2)
Next, the information search method of the present invention will be described in detail with reference to FIG. 4. FIG. 4 is a flowchart showing an information search operation according to the present invention. In this example, the recognition candidates are not weighted, and the information search is performed without first selecting one specific recognition candidate from the plurality of recognition candidates obtained.
[Step S201 to Step S203]
Steps S201 to S203 are the same as steps S101 to S103 of Operation 1. Before speech input starts, the output buffer is initialized.

[Step S204 to Step S207]
Steps S204 to S207 are basically the same as steps S106 to S109 of Operation 1. They differ from Operation 1 in that, when a plurality of recognition candidates are obtained, information is searched for each recognition candidate.
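The difference from Operation 1, searching once per recognition candidate rather than once for a selected candidate, can be sketched as a loop. The function name `search_each_candidate`, the `search` callback, and the dictionary layout of the results are assumptions made for this illustration.

```python
def search_each_candidate(candidates, search):
    """Search once per recognition candidate and keep the hits per candidate.

    candidates: list of recognized strings.
    search:     callback returning a list of hits (dicts with "usage_score").
    Returns a dict mapping each candidate that produced hits to its hits,
    sorted by usage score in descending order.
    """
    results = {}
    for text in candidates:
        hits = search(text)
        if hits:  # candidates without any hit are simply left out
            results[text] = sorted(hits, key=lambda h: h["usage_score"], reverse=True)
    return results
```

Keeping the results grouped by candidate means the later selection step only has to look up the chosen candidate; no second search is needed.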

[Step S208 to Step S210]
In step S208, the recognition results are displayed. Specifically, recognition candidates such as those in FIG. 3A are shown. In step S209, the user selects the recognition candidate for which information is wanted. In step S210, the information retrieved for the selected recognition candidate is displayed in order of usage score, as shown in FIG. 3B.

[Step S211]
If no recognized character string exists, or if no information matches the keyword corresponding to the recognition candidate, an error message stating that no recognized character string or information exists is output (step S211).

According to this operation, when there are a plurality of recognition candidates, the user can view search results based on each of them. In the example of FIG. 3, when the user utters “house design” and “picture design” and “house design” are displayed, in that order, as recognition results, information can be retrieved from either recognition candidate. A user who searches without a clear purpose can therefore retrieve a wide range of information.

(Operation 3)
Next, the information search method of the present invention will be described in detail with reference to FIGS. 5 and 6. FIG. 5 is a flowchart showing an information search operation according to the present invention. In this example, the recognition candidates are weighted. FIG. 6 illustrates the process by which the retrieved information is weighted. FIG. 6A shows the top two recognition candidates obtained by speech recognition. FIG. 6B shows the results of the information search performed with each recognition candidate. FIG. 6C shows the result of weighting each piece of retrieved information.

[Step S301]
In step S301, the weight value for the recognition score and the weight value for the usage score are input. Before weight input starts, the output buffer is initialized.

[Step S302 to Step S307]
Steps S302 to S307 are the same as steps S201 to S206 of Operation 2.

[Step S308]
In step S308, weighting is performed. As shown in FIG. 6A, the speech recognition candidates are the same as in Operations 1 and 2. As shown in FIG. 6B, the information obtained from each recognition candidate is also the same as in Operations 1 and 2. For each piece of information obtained, the recognition score multiplied by the recognition score weight is added to the usage score of that information multiplied by the usage score weight. The resulting sum is the total score. In the example of FIG. 6C, the recognition score weight and the usage score weight are both 1.

[Step S309 to Step S310]
In step S309, the weighted information is output and the recognition results are displayed. Specifically, recognition candidates such as those in FIG. 3A are shown. In step S310, the weighted information is displayed in descending order of total score, in the manner shown in FIG. 3B.

[Step S311]
If no recognized character string exists, or if no information matches the keyword corresponding to the recognition candidate, an error message stating that no recognized character string or information exists is output (step S311).

In this operation, as shown in FIG. 6A, even when the recognition score of the recognition candidate “house design” is lower than that of “picture design”, reflecting the usage score allows “house design collection” to be displayed at the top, as shown in FIG. 6C. That is, even when misrecognition occurs, reflecting the usage score allows information retrieved from lower-ranked recognition candidates to be viewed at the same time. Moreover, if unexpected information retrieved with the misrecognized “picture design” turns out to be useful, the user can make use of it as well. In the example of FIG. 6C, information such as “wallpaper design” is obtained, which is related to “house design” but could not be obtained by searching with “house design”. Integrating the recognition score and the usage score in this way yields a system that is easier to use and more enjoyable.
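The re-ranking effect described for Operation 3 can be illustrated with a small Python sketch. The numeric scores below are hypothetical and are not taken from FIG. 6; only the candidate strings and page titles come from the text. The function name `rank_by_total_score` and the data layout are likewise assumptions.

```python
def rank_by_total_score(hits_by_candidate, recognition_scores, w_rec=1.0, w_use=1.0):
    """Merge the hits of all candidates and sort them by total score.

    total = w_rec * recognition score of the candidate
          + w_use * usage score of the retrieved information
    """
    ranked = []
    for candidate, hits in hits_by_candidate.items():
        for hit in hits:
            total = w_rec * recognition_scores[candidate] + w_use * hit["usage_score"]
            ranked.append((hit["title"], total))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical scores: "picture design" is recognized with the higher score,
# but the page found via "house design" is used more often.
recognition_scores = {"picture design": 60, "house design": 50}
hits_by_candidate = {
    "picture design": [{"title": "Wallpaper design", "usage_score": 20}],
    "house design": [{"title": "House design collection", "usage_score": 40}],
}
ranking = rank_by_total_score(hits_by_candidate, recognition_scores)
```

With both weights at 1, the totals under these assumed numbers are 90 for “House design collection” and 80 for “Wallpaper design”, so the page from the lower-ranked recognition candidate is listed first.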

(Example 2)
FIG. 7 is a block diagram showing a system configuration for realizing an information search method according to another example of the present invention.

As shown in FIG. 7, in this example the server 200 does not perform all of the recognition processing: the terminal 100 performs processing up to the extraction of feature parameters, and the server 200 performs the subsequent language-level recognition. That is, this example differs from Example 1 only in that the terminal 100 is provided with a voice analysis unit 107 in place of the voice analysis unit 212 in the server 200 of FIG. 1.

Thus, in this example, the terminal 100 performs processing up to the extraction of the feature parameters obtained by voice analysis. The information needed for speech recognition can therefore be transmitted with a small amount of communication and without placing a heavy load on the terminal.

(Modified Example)
In the information search device of the present invention, not only the voice analysis unit but also the speech recognition candidate acquisition unit may be provided in the terminal. In that case, the necessary acoustic model and language model can be downloaded from the server.

Alternatively, of the processing of the speech recognition candidate acquisition unit, the part up to phoneme-level extraction may be performed on the terminal. In that case, the dictionary and grammar need only be prepared on the server, so accurate recognition is possible even when the terminal has few available resources and insufficient processing capability. In addition, because phoneme information is sent to the server over the communication line, the uttered speech waveform data itself need not be sent. This keeps down the cost of the equipment needed for recognition processing on the server and makes it easy to incorporate the speech recognition function into the server.

The functions of the information search device in each of the above embodiments are realized by an information processing program recorded on a program recording medium. The program recording medium in each of the above embodiments is a program medium consisting of a ROM (read-only memory) provided separately from a RAM (random access memory). Alternatively, it may be a program medium that is loaded into an external auxiliary recording device and read out. In either case, the program reading means that reads the information processing program from the program medium may access the program medium directly to read the program, or may download the program to a program storage area (not shown) provided in the RAM and read it by accessing that area. A download program for downloading from the program medium to the program storage area of the RAM is assumed to be stored in the main device in advance.
Here, the program medium is a medium that is separable from the main body and carries the program in a fixed manner. It includes tape media such as magnetic tape and cassette tape; disk media, including magnetic disks such as flexible disks and hard disks and optical disks such as CD (compact disc)-ROM, MO (magneto-optical) disks, MD (MiniDisc), and DVD (digital versatile disc); card media such as IC (integrated circuit) cards and optical cards; and semiconductor memories such as mask ROM, EPROM (ultraviolet-erasable ROM), EEPROM (electrically erasable ROM), and flash ROM.
The information search device in the above embodiments also includes a modem and can be connected to a communication network including the Internet. In this case, the program medium may be a medium that carries the program in a fluid manner, for example by downloading it from the communication network. In that case, a download program for downloading from the communication network is assumed to be stored in the main device in advance or to be installed from another recording medium.
What is recorded on the recording medium is not limited to a program; data can also be recorded.

FIG. 1 is a block diagram showing a system configuration for realizing an information search method according to an example of the present invention.
FIG. 2 is a flowchart showing an information search operation according to the present invention.
FIG. 3 shows an example of an actual information search according to the present invention. FIG. 3A shows the top two recognition candidates obtained by speech recognition. FIG. 3B shows the results of the information search performed with each recognition candidate.
FIG. 4 is a flowchart showing an information search operation according to the present invention.
FIG. 5 is a flowchart showing an information search operation according to the present invention.
FIG. 6 illustrates the process by which the retrieved information is weighted. FIG. 6A shows the top two recognition candidates obtained by speech recognition. FIG. 6B shows the results of the information search performed with each recognition candidate. FIG. 6C shows the result of weighting each piece of retrieved information.
FIG. 7 is a block diagram showing a system configuration for realizing an information search method according to another example of the present invention.

Explanation of symbols

100 Terminal
101 Voice input unit
102 Display unit
103 Weight input unit
104 AD conversion unit
105 Transmission control unit
106 Reception control unit
107 Voice analysis unit
200 Server
211 Reception control unit
212 Voice analysis unit
213 Speech recognition candidate acquisition unit
214 Information candidate acquisition unit
220 Information analysis unit
221 Information database
224 Word dictionary
225 Acoustic model table and language model table
226 Transmission control unit
227 Communication line
300 Communication line
400 Internet

Claims

A method for retrieving information by voice from a plurality of pieces of information provided by a server through a communication line, the method comprising:
an input step of inputting, by voice, information that a user requests to search for into a terminal;
a receiving step in which the server receives voice data relating to the input information from the terminal;
a speech recognition candidate acquisition step of recognizing the content of the voice data received in the receiving step using an acoustic model and a language model to obtain a plurality of recognition candidates;
an information search step of searching for information using at least two of the plurality of obtained recognition candidates;
a transmission step of transmitting a plurality of pieces of information obtained in the information search step to the terminal; and
an information display step of displaying at least one of the plurality of pieces of information transmitted in the transmission step.

The information search method according to claim 1, wherein the terminal includes a voice analysis step of parameterizing the input voice.

A method for retrieving information by voice from a plurality of pieces of information provided by a server through a communication line, the method comprising:
an input step of inputting, by voice, information that a user requests to search for into a terminal;
a speech recognition candidate acquisition step of recognizing the content of voice data relating to the input information using an acoustic model and a language model to obtain a plurality of recognition candidates;
a receiving step in which the server receives the obtained recognition candidates from the terminal;
an information search step of searching for information using at least two of the plurality of recognition candidates received in the receiving step;
a transmission step of transmitting a plurality of pieces of information obtained in the information search step to the terminal; and
an information display step of displaying at least one of the plurality of pieces of information transmitted in the transmission step.

The information search method according to any one of claims 1 to 3, further comprising a usage score acquisition step of obtaining a usage score for each of the plurality of pieces of information obtained in the information search step,
wherein, in the information display step, at least one of the plurality of pieces of information is displayed with the usage score reflected.

The information search method according to claim 4, wherein at least one of the plurality of pieces of displayed information is displayed by reflecting a sum of a recognition score of a search candidate used for the search and the usage score.

The information search method according to claim 5, further comprising a weighting step of weighting each of the recognition score of the search candidate used for the search and the usage score.

A method for retrieving information by voice from a plurality of pieces of information provided by a server through a communication line, the method comprising:
an input step of inputting, by voice, information that a user requests to search for into a terminal;
a receiving step in which the server receives voice data relating to the input information from the terminal;
a speech recognition candidate acquisition step of recognizing the content of the voice data received in the receiving step using an acoustic model and a language model to obtain a plurality of recognition candidates;
a recognition candidate display step of displaying, on the terminal, at least two of the plurality of obtained recognition candidates;
a recognition candidate selection step in which the user selects one recognition candidate from the at least two displayed recognition candidates; and
an information search step of searching for information using the recognition candidate selected in the recognition candidate selection step.

An information retrieval device for retrieving information by voice from a plurality of pieces of information provided by a server through a communication line, the device comprising:
input means for inputting, by voice, information that a user requests to search for into a terminal;
receiving means by which the server receives voice data relating to the input information from the terminal;
speech recognition candidate acquisition means for recognizing the content of the voice data received by the receiving means using an acoustic model and a language model to obtain a plurality of recognition candidates;
information search means for searching for information using at least two of the plurality of obtained recognition candidates;
transmitting means for transmitting a plurality of pieces of information obtained by the information search means to the terminal; and
information display means for displaying at least one of the plurality of pieces of information transmitted by the transmitting means.

An information retrieval program for retrieving information by voice from a plurality of pieces of information provided by a server through a communication line, the program causing a computer to execute:
an input step of inputting, by voice, information that a user requests to search for into a terminal;
a receiving step in which the server receives voice data relating to the input information from the terminal;
a speech recognition candidate acquisition step of recognizing the content of the voice data received in the receiving step using an acoustic model and a language model to obtain a plurality of recognition candidates;
an information search step of searching for information using at least two of the plurality of obtained recognition candidates;
a transmission step of transmitting a plurality of pieces of information obtained in the information search step to the terminal; and
an information display step of displaying at least one of the plurality of pieces of information transmitted in the transmission step.

A computer-readable recording medium on which the information search program according to claim 8 is recorded.