JP5559911B1

JP5559911B1 - Information retrieval apparatus and program

Info

Publication number: JP5559911B1
Application number: JP2013126730A
Authority: JP
Inventors: 博隆尾曲; 正亨菅野
Original assignee: SoftBank Mobile Corp
Current assignee: SoftBank Corp
Priority date: 2013-06-17
Filing date: 2013-06-17
Publication date: 2014-07-23
Anticipated expiration: 2033-06-17
Also published as: JP2015001881A

Abstract

[Problem] A technique is desired for reducing the rate at which an information search system gives a response that does not match the question.
[Solution] Provided is a program that causes a computer to function as: a query reception unit that receives a query input by a user; a first score acquisition unit that acquires, for each of a plurality of response contents retrieved using a first search algorithm, a first score for the query; a second score acquisition unit that acquires, for each of a plurality of response contents retrieved using a second search algorithm different from the first search algorithm, a second score for the query; a response content determination unit that determines the response content for the query based on the plurality of first scores acquired by the first score acquisition unit and the plurality of second scores acquired by the second score acquisition unit; and a response content output unit that outputs the response content determined by the response content determination unit.
[Selected drawing] Figure 2

Description

The present invention relates to an information search apparatus and a program.

Conventionally, an answering device has been known that, in response to a question from a user, outputs answer content retrieved from a plurality of answer contents registered in advance (see, for example, Patent Document 1).
[Prior art documents]
[Patent documents]
[Patent Document 1] Japanese Patent Application Laid-Open No. 2011-060218

A technique is desired that improves the likelihood of giving a response that matches a query received from a user.

In a first aspect of the present invention, there is provided a program that causes a computer to function as: a query reception unit that receives a query input by a user; a first score acquisition unit that acquires, for each of a plurality of response contents retrieved using a first search algorithm, a first score for the query; a second score acquisition unit that acquires, for each of a plurality of response contents retrieved using a second search algorithm different from the first search algorithm, a second score for the query; a response content determination unit that determines the response content for the query based on the plurality of first scores acquired by the first score acquisition unit and the plurality of second scores acquired by the second score acquisition unit; and a response content output unit that outputs the response content determined by the response content determination unit.

The program may further cause the computer to function as: a response content table reference unit that refers to a response content table in which a plurality of response contents are registered; a phrase score table reference unit that refers to a phrase score table in which a plurality of phrases and a score associated with each of the phrases are registered; a phrase extraction unit that extracts a plurality of phrases from the query received by the query reception unit; and a first score calculation unit that calculates, for each of the response contents registered in the response content table, the first score of the response content by acquiring from the phrase score table the scores of those phrases, among the phrases extracted by the phrase extraction unit, that are included in the response content, and multiplying or adding them. The first score acquisition unit may acquire the first score calculated by the first score calculation unit.

In the response content table, a plurality of response contents and a plurality of phrases associated with each of the response contents may be registered. The first score calculation unit may calculate, for each of the response contents registered in the response content table, the first score of the response content by acquiring from the phrase score table the scores of those phrases, among the phrases extracted by the phrase extraction unit, that are included in the response content or in its associated phrases, and multiplying or adding them. In the above program, the first score calculation unit may additionally use the IDF (Inverse Document Frequency) method to calculate the first score.

The program may further cause the computer to function as a second score calculation unit that calculates, for each of the response contents registered in the response content table, the second score of the response content by calculating, using at least one of the TF (Term Frequency) method and the IDF method, the scores of those phrases, among the phrases extracted by the phrase extraction unit, that are included in the response content, and adding them. The second score acquisition unit may acquire the second score calculated by the second score calculation unit. In the above program, the second score calculation unit may calculate, for each of the response contents registered in the response content table, the second score of the response content by calculating, using at least one of the TF method and the IDF method, the scores of those phrases, among the phrases extracted by the phrase extraction unit, that are included in the response content or in its associated phrases, and adding them. In the above program, the second score calculation unit may calculate the second score further based on at least one of: a weighting for the phrases extracted by the phrase extraction unit; a weighting that depends on whether a phrase is included in the response content or in the phrases associated with that response content; a weighting for the length of the response content; and a weighting for the number of distinct phrases, among the phrases extracted by the phrase extraction unit, that are included in the response content and its corresponding phrases.

In the above program, the response content determination unit may normalize the first score of each of the response contents retrieved using the first search algorithm by a first sum obtained by adding those first scores, and may normalize the second score of each of the response contents retrieved using the second search algorithm by a second sum obtained by adding those second scores. In the above program, the response content determination unit may determine the response content for the query based on the plurality of first scores to which a reliability of the first search algorithm has been applied and the plurality of second scores to which a reliability of the second search algorithm has been applied. In the above program, the response content determination unit may apply, to the plurality of first scores, the number of response contents retrieved using the first search algorithm, and may apply, to the plurality of second scores, the number of response contents retrieved using the second search algorithm. In the above program, the response content determination unit may raise the plurality of first scores as the number of response contents retrieved using the first search algorithm increases, and may raise the plurality of second scores as the number of response contents retrieved using the second search algorithm increases.

In a second aspect of the present invention, there is provided an information search apparatus comprising: a query reception unit that receives a query input by a user; a first score acquisition unit that acquires, for each of a plurality of response contents retrieved using a first search algorithm, a first score for the query; a second score acquisition unit that acquires, for each of a plurality of response contents retrieved using a second search algorithm different from the first search algorithm, a second score for the query; a response content determination unit that determines the response content for the query based on the plurality of first scores acquired by the first score acquisition unit and the plurality of second scores acquired by the second score acquisition unit; and a response content output unit that outputs the response content determined by the response content determination unit.

Note that the above summary of the invention does not enumerate all of the necessary features of the present invention. Sub-combinations of these feature groups may also constitute inventions.

An example of the communication environment of the information search system is schematically shown.
The functional configuration of the information search system is schematically shown.
An example of the content table is schematically shown.
An example of the phrase score table is schematically shown.
An example of search result scores is schematically shown.
An example of evaluation result scores is schematically shown.

Hereinafter, the present invention will be described through embodiments of the invention, but the following embodiments do not limit the invention according to the claims. In addition, not all combinations of the features described in the embodiments are essential to the solution provided by the invention.

FIG. 1 schematically shows an example of the communication environment of the information search system 100. The information search system 100 may be an example of a computer that executes various processes under the program according to the present embodiment. In the present embodiment, the information search system 100 searches for response content for a query received from the user 10 and outputs it to the user 10.

FIG. 1 shows an example in which the information search system 100 receives queries and outputs response content via the communication terminal 30. Query reception and response output by the information search system 100 are not limited to this arrangement: the system may receive a query directly from the user 10 and output the response content directly. The communication terminal 30 may also function as the information search system 100.

The user 10 enters a query into the communication terminal 30 by, for example, text input or voice input. A query entered as text is transmitted to the information search system 100 by the communication terminal 30. A query entered by voice is converted into text by speech recognition technology and transmitted to the information search system 100. The speech recognition process may be executed by at least one of the communication terminal 30, the voice processing server 40 connected to the network 20, and the information search system 100.

The information search system 100 searches for response content for the received query and outputs the identified response content to the user 10. For example, the information search system 100 causes the communication terminal 30 to output the response content as a display or as audio. The information search system 100 may also provide display output or audio output directly to the user 10. The response content is converted into audio data by speech synthesis technology. The speech synthesis process may be executed by at least one of the information search system 100, the voice processing server 40, and the communication terminal 30.

Through the processing flow described above, the information search system 100 outputs response content for a query. To reduce the rate of responses that do not match the query, the information search system 100 of the present embodiment identifies the response content based on the results of separate searches performed with a plurality of types of search algorithms.

FIG. 2 schematically shows the functional configuration of the information search system 100. The description here takes as an example the case where the information search system 100 identifies response content based on the search results of two types of search algorithms: word condition search and full-text search. The search algorithms used by the information search system 100 are not limited to these, and other search algorithms may be used. The response content may also be identified based on the search results of three or more types of search algorithms.

The information search system 100 includes a scenario execution engine 200, a word condition search engine 300, a full-text search engine 400, and an evaluation engine 500. The scenario execution engine 200 includes a query reception unit 202, a morphological analysis unit 204, a morpheme dictionary 205, an NG word filter 206, an NG word dictionary 207, a synonym expansion unit 208, a synonym dictionary 209, a score acquisition unit 210, a response content determination unit 216, and a response content output unit 218. The word condition search engine 300 includes a word condition search unit 302 and a phrase score table 310. The full-text search engine 400 includes a full-text search unit 402. The evaluation engine 500 includes an evaluation unit 502.

The query reception unit 202 receives a query input by the user. The query reception unit 202 receives a text or voice query from, for example, the communication terminal 30 or the voice processing server 40. When it receives a voice query, the query reception unit 202 may convert the query into text by executing speech recognition processing. The query reception unit 202 may also receive a query directly via an input device such as a microphone or a keyboard.

The morphological analysis unit 204 performs morphological analysis on the query using the morpheme dictionary 205 and extracts a plurality of phrases. A phrase here may be at least one of a single word and a multi-word phrase. The morpheme dictionary 205 may include well-known morpheme data and may also include phrases registered by an administrator or the like of the information search system 100. The morpheme dictionary 205 may be editable by the administrator or the like of the information search system 100. The morphological analysis unit 204 may extract phrases in units of the longest match with the phrases registered in the morpheme dictionary 205.

The NG word filter 206 removes NG words registered in the NG word dictionary 207 from the phrases extracted by the morphological analysis unit 204. The NG word dictionary 207 may include NG words registered by an administrator or the like of the information search system 100. Frequently used words that do not help to distinguish one response content from another are registered as NG words, for example. The NG word dictionary 207 may be editable by the administrator or the like of the information search system 100.

The synonym expansion unit 208 uses the synonym dictionary 209 to expand the phrases that were extracted by the morphological analysis unit 204 and not removed by the NG word filter 206 into their synonyms. The synonym dictionary 209 may include well-known synonym data and may also include synonyms registered by an administrator or the like of the information search system 100. The synonym dictionary 209 may be editable by the administrator or the like of the information search system 100.
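The query preprocessing chain described above (phrase extraction, NG word removal, synonym expansion) can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: real morphological analysis does far more than greedy dictionary matching, and the function names, the set-based dictionary, and the dict-based synonym table are all assumptions made for the example.

```python
def longest_match_tokenize(text, dictionary):
    """Greedy longest-match segmentation against a set of known phrases (assumed stand-in for morphological analysis)."""
    max_len = max((len(p) for p in dictionary), default=0)
    phrases, i = [], 0
    while i < len(text):
        # Try the longest candidate first so "data plan" wins over "data".
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                phrases.append(candidate)
                i += size
                break
        else:
            i += 1  # no dictionary phrase starts here; skip one character
    return phrases


def remove_ng_words(phrases, ng_words):
    """Drop phrases registered as NG words."""
    return [p for p in phrases if p not in ng_words]


def expand_synonyms(phrases, synonyms):
    """Append registered synonyms after each phrase, without duplicates."""
    expanded = []
    for phrase in phrases:
        for candidate in [phrase, *synonyms.get(phrase, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def preprocess(query, dictionary, ng_words, synonyms):
    """Run the three steps in the order described in the text."""
    phrases = longest_match_tokenize(query, dictionary)
    return expand_synonyms(remove_ng_words(phrases, ng_words), synonyms)
```

For instance, with a dictionary containing "the", "data", "data plan", and "price", the query "the data plan price" is segmented with "data plan" as a single phrase; "the" can then be dropped as an NG word and "price" expanded with a registered synonym.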

The score acquisition unit 210 includes a first score acquisition unit 212 and a second score acquisition unit 214. The first score acquisition unit 212 acquires, for each of a plurality of response contents retrieved by the word condition search engine 300, a score for the query (which may be referred to as a first score). The first score acquisition unit 212 may transmit the phrases received from the synonym expansion unit 208 to the word condition search unit 302 and receive from the word condition search unit 302 the first score of each retrieved response content.

The word condition search unit 302 searches for response content for the query by referring to the content table 280 and the phrase score table 310. A plurality of response contents are registered in the content table 280. The response contents may be registered in advance by an administrator or the like of the information search system 100. The content table 280 may be an example of a response content table.

A plurality of phrases and a score associated with each of the phrases are registered in the phrase score table 310. The phrases and their scores may be registered in advance by an administrator or the like of the information search system 100.

For each of the response contents registered in the content table 280, the word condition search unit 302 may calculate the first score of the response content by acquiring from the phrase score table 310 the scores of those phrases, among the phrases received from the first score acquisition unit 212, that are included in the response content, and multiplying or adding them. The word condition search unit 302 may then adopt as search results the response contents whose calculated first score is greater than a predetermined threshold.

The predetermined threshold may be 0, or may be a value set arbitrarily by an administrator or the like of the information search system 100. The search algorithm used by the word condition search unit 302 may be an example of the first search algorithm.
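The word condition search described above can be illustrated with a short sketch. It assumes the content table is a mapping from a content ID to response text, the phrase score table is a mapping from phrase to score, and "included in the response content" is tested by simple substring matching; these representations and the names `first_score` and `word_condition_search` are hypothetical.

```python
import math


def first_score(query_phrases, response_text, phrase_scores, mode="add"):
    """Combine the table scores of query phrases found in the response text."""
    matched = [
        phrase_scores[p]
        for p in query_phrases
        if p in phrase_scores and p in response_text  # substring match is an assumption
    ]
    if not matched:
        return 0.0
    return math.prod(matched) if mode == "multiply" else sum(matched)


def word_condition_search(query_phrases, content_table, phrase_scores,
                          threshold=0.0, mode="add"):
    """Return {content_id: first_score} for contents scoring above the threshold."""
    results = {}
    for content_id, text in content_table.items():
        score = first_score(query_phrases, text, phrase_scores, mode)
        if score > threshold:
            results[content_id] = score
    return results
```

With the default threshold of 0, a response content that contains none of the scored query phrases is simply left out of the search results.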

The second score acquisition unit 214 acquires, for each of a plurality of response contents retrieved by the full-text search engine 400, a score for the query (which may be referred to as a second score). The second score acquisition unit 214 may transmit the phrases received from the synonym expansion unit 208 to the full-text search unit 402 and receive from the full-text search unit 402 the second score of each retrieved response content.

The full-text search unit 402 searches for response content for the query by referring to the content table 280. The description here takes as an example the case where the word condition search engine 300 and the full-text search engine 400 refer to the same content table 280, but each of the two engines may instead hold its own copy of the content table 280. LUCENE, a full-text search engine managed by the Apache project, may be adopted as the full-text search engine 400.

For each of the response contents registered in the content table 280, the full-text search unit 402 may calculate the second score of the response content by calculating, using at least one of the TF method and the IDF method, scores for those phrases, among the phrases acquired from the second score acquisition unit 214, that are included in the response content, and adding them.

The full-text search unit 402 may then adopt as search results the response contents whose calculated second score is greater than a predetermined threshold. The predetermined threshold may be 0, or may be a value set arbitrarily by an administrator or the like of the information search system 100. The search algorithm used by the full-text search unit 402 may be an example of the second search algorithm.
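A TF/IDF-based second score of the kind described above might be sketched as follows. The text does not fix a formula, so the smoothed IDF used here, `log((1 + N) / (1 + df)) + 1`, is one common textbook variant chosen as an assumption; it is not Lucene's scoring formula. The function names and the dict-based content table are also hypothetical.

```python
import math


def idf(phrase, content_table):
    """Smoothed inverse document frequency over the content table (assumed variant)."""
    n_docs = len(content_table)
    doc_freq = sum(1 for text in content_table.values() if phrase in text)
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1.0


def second_score(query_phrases, text, content_table):
    """Sum TF * IDF over the query phrases that occur in the response text."""
    total = 0.0
    for phrase in query_phrases:
        tf = text.count(phrase)  # raw term frequency by substring count
        if tf:
            total += tf * idf(phrase, content_table)
    return total


def full_text_search(query_phrases, content_table, threshold=0.0):
    """Return {content_id: second_score} for contents scoring above the threshold."""
    scored = {
        content_id: second_score(query_phrases, text, content_table)
        for content_id, text in content_table.items()
    }
    return {cid: s for cid, s in scored.items() if s > threshold}
```

A phrase that appears in few response contents gets a larger IDF, so it contributes more to the second score than a phrase that appears almost everywhere.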

The response content determination unit 216 determines the response content for the query based on the plurality of first scores acquired by the first score acquisition unit 212 and the plurality of second scores acquired by the second score acquisition unit 214. The response content determination unit 216 may have the evaluation unit 502 evaluate the response contents retrieved by the word condition search unit 302 together with their first scores and the response contents retrieved by the full-text search unit 402 together with their second scores. Here, evaluation by the evaluation unit 502 may mean calculating a final score for each of the response contents based on the first scores and the second scores.

The evaluation unit 502 calculates a score for each of the response contents for the query based on the plurality of first scores and the plurality of second scores. When a given response content is included in only one of the search results of the word condition search unit 302 and the search results of the full-text search unit 402, the evaluation unit 502 adopts the score from that one result. When a given response content is included in both the search results of the word condition search unit 302 and the search results of the full-text search unit 402, the evaluation unit 502 may compute a weighted sum of the corresponding first score and second score.

For example, the evaluation unit 502 applies the reliability of the first search algorithm to the first score, applies the reliability of the second search algorithm to the second score, and adds the two. Applying a reliability means, for example, multiplying by the reliability. This makes it possible to provide search results that emphasize the results of the more reliable search algorithm while still taking into account the results of the less reliable one.

Applying a reliability may be any calculation in which a higher reliability yields a higher score; for example, it may be multiplication by a coefficient whose magnitude corresponds to the magnitude of the reliability. The reliability of the first search algorithm and the reliability of the second search algorithm may be set in advance by an administrator or the like of the information search system 100. For example, the reliability of the second search algorithm used by the full-text search unit 402 may be set higher than the reliability of the first search algorithm used by the word condition search unit 302. The two reliabilities may be set so that they sum to 1.0.
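The combination rule described above (adopt the single available score, or take a reliability-weighted sum when both engines return the content) can be sketched as below. The reliability values 0.4 and 0.6 are purely illustrative, chosen only so that they sum to 1.0 with the full-text algorithm weighted higher; adopting a single-engine score without applying a reliability is one reading of the text and is labeled as an assumption in the code.

```python
def combine_scores(first_scores, second_scores,
                   reliability_first=0.4, reliability_second=0.6):
    """Merge per-content scores from two search algorithms into final scores."""
    combined = {}
    for content_id in first_scores.keys() | second_scores.keys():
        in_first = content_id in first_scores
        in_second = content_id in second_scores
        if in_first and in_second:
            # Found by both engines: reliability-weighted sum.
            combined[content_id] = (
                reliability_first * first_scores[content_id]
                + reliability_second * second_scores[content_id]
            )
        elif in_first:
            # Found by one engine only: adopt that score as-is (assumption).
            combined[content_id] = first_scores[content_id]
        else:
            combined[content_id] = second_scores[content_id]
    return combined
```

Because the reliabilities are plain parameters, an administrator-style configuration could shift the balance toward whichever algorithm performs better for a given set of response contents.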

The evaluation unit 502 may also normalize the first scores and the second scores according to the ranking within the search results. For example, the evaluation unit 502 normalizes the first score of each response content retrieved by the word condition search unit 302 by a first sum obtained by adding those first scores. Likewise, for example, the evaluation unit 502 normalizes the second score of each response content retrieved by the full-text search unit 402 by a second sum obtained by adding those second scores. This absorbs the difference in score scale between the first search algorithm and the second search algorithm.
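Normalization by the sum of scores can be written in a few lines. The sketch below assumes each engine's results are a mapping from content ID to score; the name `normalize_by_sum` and the handling of an all-zero result set are assumptions.

```python
def normalize_by_sum(scores):
    """Divide each score by the sum of all scores from the same search algorithm."""
    total = sum(scores.values())
    if total == 0:
        return dict(scores)  # nothing to normalize; avoid division by zero
    return {content_id: score / total for content_id, score in scores.items()}
```

After this step both engines' scores lie between 0 and 1 and sum to 1 per engine, so a raw first score of 40.0 and a raw second score of 3.0 become comparable if each is the same share of its engine's total.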

The evaluation unit 502 may also apply the number of response contents retrieved by the word condition search unit 302 to the first scores, and may apply the number of response contents retrieved by the full-text search unit 402 to the second scores. Applying the number of retrieved response contents means, for example, multiplying the score by that number. This can compensate, for example, for the drop in score values that occurs when scores are normalized and the number of search results is large. Applying the number of retrieved response contents may be any calculation in which a larger number of retrieved response contents yields a higher score; for example, it may be multiplication by a coefficient whose magnitude corresponds to that number.
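The compensation described above can be illustrated by multiplying sum-normalized scores by the number of results. The sketch assumes the input scores are already normalized so that they sum to 1; the function name is hypothetical, and plain multiplication by the count is only the simplest of the calculations the text allows.

```python
def apply_result_count(normalized_scores):
    """Multiply each normalized score by the number of retrieved response contents."""
    count = len(normalized_scores)
    return {cid: score * count for cid, score in normalized_scores.items()}
```

With sum-normalized input, the average score after this step is 1.0 regardless of how many results an engine returned, so an engine that returns four equally good results (0.25 each) is no longer penalized relative to one that returns two (0.5 each).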

The evaluation unit 502 may transmit, as the search result, the response content having the highest score among the plurality of response contents for which scores were calculated to the response content determination unit 216. Alternatively, the evaluation unit 502 may transmit an arbitrary number of response contents to the response content determination unit 216 in descending order of score, or may transmit all of the response contents for which scores were calculated.
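The three transmission options described above (best result only, top N, or everything) can be sketched as follows. This is a minimal illustration, not the patented implementation; the function name `select_top` and the sample scores (borrowed from the FIG. 6 example later in the description) are assumptions made for this sketch.

```python
def select_top(scores, n=None):
    """Return (id, score) pairs in descending score order.

    n=None returns every scored response content; n=1 returns only the best.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked if n is None else ranked[:n]


# Hypothetical input: three evaluation result scores keyed by response content ID.
scores = {"0001": 72.0, "0003": 138.0, "0005": 105.0}

best = select_top(scores, 1)      # highest-scoring response content only
top_two = select_top(scores, 2)   # an arbitrary number, in descending order
everything = select_top(scores)   # all scored response contents
```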

The response content output unit 218 outputs the response content for the query determined by the response content determination unit 216. The response content output unit 218 may output, as the search result, only the response content having the highest score among the plurality of response contents, or may output an arbitrary number of response contents in descending order of score.

As described above, the information search system 100 according to the present embodiment obtains a plurality of search results by applying a plurality of search algorithms to the content table 280, and identifies the response content based on those search results. This makes it possible to provide search results that exploit the strengths of the individual algorithms while compensating for their weaknesses. In particular, because the reliabilities of the first search algorithm and the second search algorithm are editable, the system can easily be tuned to the situation, for example for search targets that favor the first search algorithm or search targets that favor the second search algorithm.

FIG. 3 schematically shows an example of the content table 280. The content table 280 may include a title, a body, and registered phrases for each of a plurality of IDs. The title and the body may be an example of the response content. For example, when the response content output unit 218 outputs the response content of ID: 0001, it outputs "Services covered by unlimited packet (パケットし放題): usage charges (communication charges) for mail and web used within Japan are covered by 'unlimited packet'." The registered phrases may be phrases registered for each ID by an administrator or the like of the information search system 100. The administrator or the like registers, for example, phrases related to the response content of each ID.

FIG. 4 schematically shows an example of the phrase score table 310. A plurality of phrases and a score for each of the phrases are registered in the phrase score table 310. The phrases and their scores may be registered by an administrator or the like of the information search system 100.

The phrases registered in the phrase score table 310 may also be registered in the morpheme dictionary 205. As a result, when the query contains "unlimited packet" (パケットし放題), for example, the morphological analysis unit 204 can extract "unlimited packet" as a single phrase without splitting it into "packet" (パケット) and "unlimited" (放題). Alternatively, when the query contains "unlimited packet", the morphological analysis unit 204 may extract all of "unlimited packet", "packet", and "unlimited". The phrase score table 310 may further include synonyms for each of the phrases. In the example of FIG. 4, "billed amount" (請求額) and "amount" (金額) are registered as synonyms of the phrase "money" (お金).
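The effect of registering a compound phrase in the dictionary can be illustrated with a greedy longest-match extractor. This is only a sketch of the idea: the description does not specify how the morphological analysis unit 204 works internally, and real morphological analyzers are considerably more sophisticated. The function name `extract_phrases` and the dictionary contents are assumptions.

```python
def extract_phrases(text, dictionary):
    """Greedy longest-match extraction of dictionary phrases from text.

    At each position the longest dictionary phrase is preferred, so a
    registered compound is not split into its shorter components.
    """
    phrases = []
    max_len = max((len(p) for p in dictionary), default=0)
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                phrases.append(candidate)
                i += length
                break
        else:
            i += 1  # no dictionary phrase starts here; skip one character
    return phrases


# Hypothetical dictionary that contains both the compound and its parts.
dictionary = {"定額", "パケット", "放題", "パケットし放題", "サービス", "対象"}
query = "定額で使えるパケットし放題のサービスができる対象は"
phrases = extract_phrases(query, dictionary)
```

Because the compound is in the dictionary, the extractor returns it whole instead of returning "パケット" and "放題" separately.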

As described above, the word condition search unit 302 calculates the first scores of a plurality of response contents for the query by referring to the content table 280 and the phrase score table 310. The calculation of the first score of the response content of ID: 0001 is described below with a concrete example, for the case where the query "What is covered by the unlimited packet service available at a flat rate" (定額で使えるパケットし放題のサービスができる対象は) is received. The word condition search unit 302 may or may not use the registered phrases of the content table 280 when calculating the first score; the case where the registered phrases are used is described here.

First, suppose that the morphological analysis unit 204, the NG word filter 206, and the synonym expansion unit 208 extract, for example, "flat rate" (定額), "unlimited packet" (パケットし放題), "service" (サービス), and "target" (対象) from the query. The word condition search unit 302 identifies which of the extracted phrases are included in the response content and the registered phrases of ID: 0001. Here, all four phrases are identified. The word condition search unit 302 then acquires the score corresponding to each phrase from the phrase score table 310 and multiplies or adds the scores. Multiplying 3.0, 10.0, 3.0, and 2.0 yields 180.0; adding them yields 18.0. In this way, the word condition search unit 302 calculates the first score by referring to the content table 280 and the phrase score table 310.
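The first score calculation in this example can be sketched as below. The pairing of each phrase with its score (taken in the order the phrases and scores are listed), the function name `first_score`, and the representation of the response content as a set of phrases are assumptions for illustration. With these four values the product is 180.0 and the sum is 18.0.

```python
import math

# Assumed mapping: the scores 3.0, 10.0, 3.0, 2.0 paired with the phrases
# in the order they are listed in the description.
PHRASE_SCORES = {"定額": 3.0, "パケットし放題": 10.0, "サービス": 3.0, "対象": 2.0}


def first_score(query_phrases, content_phrases, phrase_scores, mode="multiply"):
    """Combine the table scores of query phrases that occur in the content.

    content_phrases stands for the phrases found in the response content
    and its registered phrases (a hypothetical pre-tokenized set).
    """
    values = [
        phrase_scores[p]
        for p in query_phrases
        if p in content_phrases and p in phrase_scores
    ]
    if not values:
        return 0.0  # no matching phrase: the content is not a hit
    return math.prod(values) if mode == "multiply" else sum(values)


query_phrases = ["定額", "パケットし放題", "サービス", "対象"]
content_0001 = {"定額", "パケットし放題", "サービス", "対象", "メール"}

product_score = first_score(query_phrases, content_0001, PHRASE_SCORES, "multiply")
sum_score = first_score(query_phrases, content_0001, PHRASE_SCORES, "add")
```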

The word condition search unit 302 may additionally use the IDF method to calculate the first score. That is, the word condition search unit 302 may calculate the first score taking into account how many of all the response contents registered in the content table 280 include the phrase in question. For example, when there are 1000 response contents in total and 9 of them include "flat rate" (定額), the score 3.0 of "flat rate" may be multiplied by log(total number of response contents / (number of response contents including "flat rate" + 1)) = log(1000 / (9 + 1)) = 2.
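The IDF weighting in this example works out as follows. The logarithm base is not stated in the description; base 10 is assumed here because it is the base for which log(1000 / 10) equals 2. The function name `idf` is likewise an assumption.

```python
import math


def idf(total_contents, contents_with_phrase):
    """IDF factor as in the example: log10(N / (n + 1)).

    The +1 in the denominator avoids division by zero for unseen phrases.
    """
    return math.log10(total_contents / (contents_with_phrase + 1))


factor = idf(1000, 9)      # log10(1000 / 10)
weighted = 3.0 * factor    # phrase score of "定額" scaled by its IDF factor
```

A phrase that appears in few response contents gets a larger factor, so rare phrases contribute more to the first score than common ones.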

The full-text search unit 402 may calculate the second score by at least one of the TF method and the IDF method. For example, when only the TF method is used, the second score may be calculated by computing the frequency of occurrence of each of "flat rate" (定額), "unlimited packet" (パケットし放題), "service" (サービス), and "target" (対象) in the response content of ID: 0001 and adding the frequencies together.
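A TF-only second score can be sketched as a sum of occurrence counts. Treating the "frequency of occurrence" as a raw count is an assumption (a normalized frequency would also fit the description), and the token list below is hypothetical rather than the actual content of ID: 0001.

```python
from collections import Counter


def tf_score(query_phrases, content_tokens):
    """Sum, over the query phrases, of how often each occurs in the content."""
    counts = Counter(content_tokens)
    return float(sum(counts[p] for p in query_phrases))


query_phrases = ["定額", "パケットし放題", "サービス", "対象"]
# Hypothetical tokenization of a response content.
content_tokens = ["パケットし放題", "対象", "サービス", "パケットし放題", "対象"]

score = tf_score(query_phrases, content_tokens)  # 0 + 2 + 1 + 2
```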

The full-text search unit 402 may further calculate the second score by applying various weightings. For example, the full-text search unit 402 may weight the phrases extracted from the query. After the phrases are extracted from the query, they may be presented to the user, and the phrases may be weighted according to weights entered by the user. This makes it possible to provide search results that emphasize the phrases the user considers more important. When Lucene is adopted as the full-text search engine 400, this weighting may be realized by the GetBoost function.

The full-text search unit 402 may also weight the search fields. For example, the full-text search unit 402 may calculate the score with different weights depending on whether a phrase is included in the response content of the content table 280 or in the registered phrases. The full-text search unit 402 may assign a heavier weight when the phrase is included in the registered phrases than when it is included in the response content. When Lucene is adopted as the full-text search engine 400, this weighting may be realized by the GetBoost function.

The full-text search unit 402 may also weight by the length of the response content. For example, the full-text search unit 402 calculates the second score so that the shorter the response content, the higher the score when a phrase is included. When Lucene is adopted as the full-text search engine 400, this weighting may be realized by the LengthNorm function.

The full-text search unit 402 may also apply a weighting corresponding to the number of kinds of phrases, among the phrases extracted from the query by the morphological analysis unit 204, that are included in the response content and the registered phrases. For example, the full-text search unit 402 calculates the second score so that the second score becomes higher as more kinds of phrases are included. When Lucene is adopted as the full-text search engine 400, this weighting may be realized by the coord function.
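The four weightings described above (query phrase weights, field weights, length weighting, and the number of matched phrase kinds) can be combined as in the following sketch. It is written in plain Python and does not call Lucene; the formulas `1 / sqrt(length)` and `matched / total` are modeled loosely on Lucene's classic similarity and are assumptions here, as are all function names, field names, and sample data.

```python
import math


def length_norm(num_tokens):
    """Shorter contents get a larger factor (assumed form: 1 / sqrt(length))."""
    return 1.0 / math.sqrt(num_tokens) if num_tokens > 0 else 0.0


def coord(matched_kinds, query_kinds):
    """Fraction of distinct query phrases that were found."""
    return matched_kinds / query_kinds if query_kinds > 0 else 0.0


def second_score(query_boosts, fields, field_boosts):
    """query_boosts: phrase -> weight; fields: field name -> token list;
    field_boosts: field name -> weight (defaults to 1.0)."""
    total = 0.0
    matched = set()
    for name, tokens in fields.items():
        norm = length_norm(len(tokens))
        for phrase, q_boost in query_boosts.items():
            tf = tokens.count(phrase)
            if tf:
                matched.add(phrase)
                total += tf * q_boost * field_boosts.get(name, 1.0) * norm
    return total * coord(len(matched), len(query_boosts))


# Hypothetical data: the user weights "パケットし放題" twice as heavily, and
# registered phrases count twice as much as the body.
query_boosts = {"定額": 1.0, "パケットし放題": 2.0}
fields = {
    "body": ["パケットし放題", "対象", "サービス", "対象"],
    "registered": ["パケットし放題"],
}
field_boosts = {"body": 1.0, "registered": 2.0}

score = second_score(query_boosts, fields, field_boosts)
```

In this sample the body contributes 1.0 and the registered phrases contribute 4.0, and because only one of the two query phrases matched, the total is halved.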

FIG. 5 schematically shows an example of search result scores and normalized search result scores. The word condition search result score 350 in FIG. 5 contains four response contents and their scores: 24.0 for ID: 0001, 10.0 for ID: 0002, 4.0 for ID: 0003, and 2.0 for ID: 0004, for a total score of 40.0.

The evaluation unit 502 calculates the word condition normalized score 352 by normalizing each score by the total score of 40.0. For ID: 0001, 24.0 / 40.0 × 100 = 60.0 is calculated; in the same way, 25.0, 10.0, and 5.0 are calculated for ID: 0002, ID: 0003, and ID: 0004, respectively.

The full-text search result score 450 contains six response contents and their scores: 12.0 for ID: 0003, 10.0 for ID: 0005, 9.0 for ID: 0006, 6.0 for ID: 0004, 2.0 for ID: 0007, and 1.0 for ID: 0008, for a total score of 40.0.

The evaluation unit 502 calculates the full-text search normalized score 452 by normalizing each score by the total score of 40.0. For ID: 0003, 12.0 / 40.0 × 100 = 30.0 is calculated; in the same way, 25.0, 22.5, 15.0, 5.0, and 2.5 are calculated for ID: 0005, ID: 0006, ID: 0004, ID: 0007, and ID: 0008, respectively.
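The normalization of FIG. 5 can be reproduced with a few lines of code. The function name `normalize` and the dictionary representation are assumptions; the input values are the ones given in the description.

```python
def normalize(scores):
    """Scale each score to a percentage of the sum of all scores."""
    total = sum(scores.values())
    if total == 0:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: value / total * 100 for doc_id, value in scores.items()}


# Word condition search result score 350 and full-text search result score 450.
word_condition = {"0001": 24.0, "0002": 10.0, "0003": 4.0, "0004": 2.0}
full_text = {"0003": 12.0, "0005": 10.0, "0006": 9.0,
             "0004": 6.0, "0007": 2.0, "0008": 1.0}

word_condition_norm = normalize(word_condition)  # corresponds to score 352
full_text_norm = normalize(full_text)            # corresponds to score 452
```

After normalization both result sets sum to 100, so scores from the two algorithms become comparable even if their raw scales differ.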

FIG. 6 schematically shows an example of evaluation result scores. It shows the evaluation result score 550 generated by the evaluation unit 502 based on the search result scores shown in FIG. 5. The evaluation result score 550 contains a score for each response content ID. For the purpose of explanation, the example in FIG. 6 also shows the formula used to calculate each score. Because ID: 0003 appears in both the word condition normalized score 352 and the full-text search normalized score 452, the evaluation unit 502 calculates the score of ID: 0003 from both its score in the word condition normalized score 352 and its score in the full-text search normalized score 452.

This example illustrates the case where the evaluation unit 502 calculates each score by multiplying the normalized score by the reliability of the search algorithm and by the number of search results. Here, the reliability of the search by the word condition search engine 300 is set to 0.3, and the reliability of the search by the full-text search engine 400 is set to 0.7.

For ID: 0003, the evaluation unit 502 multiplies the score 10.0 in the word condition normalized score 352 by the search algorithm reliability 0.3 and the number of search results 4, obtaining 12.0. The evaluation unit 502 also multiplies the score 30.0 in the full-text search normalized score 452 by the search algorithm reliability 0.7 and the number of search results 6, obtaining 126.0. The evaluation unit 502 then adds 12.0 and 126.0 to obtain a score of 138.0 for ID: 0003.

Because ID: 0005 appears only in the full-text search normalized score 452, the evaluation unit 502 multiplies its score 25.0 in the full-text search normalized score 452 by the search algorithm reliability 0.7 and the number of search results 6, obtaining 105.0. The other response contents are handled in the same way: 94.5 is calculated for ID: 0006, 72.0 for ID: 0001, 69.0 for ID: 0004, 30.0 for ID: 0002, 21.0 for ID: 0007, and 10.5 for ID: 0008.
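The evaluation result scores of FIG. 6 can be reproduced as follows. The function name `combine` is an assumption; the normalized scores, the reliabilities 0.3 and 0.7, and the result counts 4 and 6 are the values stated in the description. Each normalized score is multiplied by its algorithm's reliability and result count, and contributions for the same ID are added.

```python
def combine(norm_first, norm_second, reliability_first, reliability_second):
    """Weight each normalized score by reliability and result count, then
    add the contributions of both algorithms per response content ID."""
    combined = {}
    for scores, reliability in ((norm_first, reliability_first),
                                (norm_second, reliability_second)):
        count = len(scores)  # number of search results for this algorithm
        for doc_id, value in scores.items():
            combined[doc_id] = combined.get(doc_id, 0.0) + value * reliability * count
    return combined


# Word condition normalized score 352 and full-text search normalized score 452.
word_condition_norm = {"0001": 60.0, "0002": 25.0, "0003": 10.0, "0004": 5.0}
full_text_norm = {"0003": 30.0, "0005": 25.0, "0006": 22.5,
                  "0004": 15.0, "0007": 5.0, "0008": 2.5}

evaluation = combine(word_condition_norm, full_text_norm, 0.3, 0.7)
ranking = sorted(evaluation, key=evaluation.get, reverse=True)
```

With these inputs ID: 0003, which both algorithms found, ranks first even though it was only third in the word condition results.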

In this way, by calculating the score of the response content for the query based on both the search by the word condition search unit 302 and the search by the full-text search unit 402, it is possible to provide results that exploit the strengths of each search while compensating for their weaknesses.

In the above description, each unit of the information search system 100 may be realized by hardware, by software, or by a combination of hardware and software. For example, the scenario execution engine 200, the word condition search engine 300, the full-text search engine 400, and the evaluation engine 500 may each be realized by different hardware and software.

For example, a computer may function as part of the information search system 100 when a program is executed on the information search system 100. The program may be stored in a computer-readable medium or in a storage device connected to a network. The information search system 100 may be realized by starting software or a program that defines the operation of each unit of the information search system 100 on an information processing apparatus of a general configuration that includes a data processing device having a CPU, ROM, RAM, a communication interface, and the like, together with an input device, an output device, and a storage device.

A program that is installed in a computer and causes the computer to function as part of the information search system 100 according to the present embodiment includes modules that define the operation of each unit of the information search system 100. These programs or modules act on the CPU or the like to cause the computer to function as each unit of the information search system 100. When read by the computer, the information processing described in these programs functions as concrete means in which the software and the various hardware resources described above cooperate. By using these concrete means to compute or process information according to the intended use of the computer in the present embodiment, a specific measurement device suited to that use can be constructed. The information search system 100 may be an example of an information processing apparatus.

Although the present invention has been described above using an embodiment, the technical scope of the present invention is not limited to the scope described in the embodiment. It will be apparent to those skilled in the art that various modifications or improvements can be made to the embodiment. It is apparent from the claims that forms incorporating such modifications or improvements can also fall within the technical scope of the present invention.

It should be noted that the operations, procedures, steps, stages, and other processes of the apparatus, system, program, and method shown in the claims, the description, and the drawings can be executed in any order, unless the order is explicitly indicated by expressions such as "before" or "prior to", or unless the output of an earlier process is used in a later process. Even where an operation flow in the claims, the description, or the drawings is described using "first", "next", and the like for convenience, this does not mean that the flow must be carried out in that order.

10 user, 20 network, 30 communication terminal, 40 voice processing server, 100 information search system, 200 scenario execution engine, 202 query reception unit, 204 morphological analysis unit, 205 morpheme dictionary, 206 NG word filter, 207 NG word dictionary, 208 synonym expansion unit, 209 synonym dictionary, 210 score acquisition unit, 212 first score acquisition unit, 214 second score acquisition unit, 216 response content determination unit, 218 response content output unit, 280 content table, 300 word condition search engine, 302 word condition search unit, 310 phrase score table, 350 word condition search result score, 352 word condition normalized score, 400 full-text search engine, 402 full-text search unit, 450 full-text search result score, 452 full-text search normalized score, 500 evaluation engine, 502 evaluation unit

Claims

A program for causing a computer to function as:
A response content table reference unit that refers to a response content table in which a plurality of response contents and a plurality of phrases associated with each of the plurality of response contents are registered;
A phrase score table reference unit that refers to a phrase score table in which a plurality of phrases and a score associated with each of the plurality of phrases are registered;
A query reception unit that receives a query input by a user;
A phrase extraction unit that extracts a plurality of phrases from the query received by the query reception unit;
A first score calculation unit that, using a first search algorithm, calculates, for each of the plurality of response contents registered in the response content table, a first score of the response content for the query by acquiring from the phrase score table the scores of those phrases, among the plurality of phrases extracted by the phrase extraction unit, that are included in the response content, and multiplying or adding the acquired scores;
A first score acquisition unit that acquires the first score calculated by the first score calculation unit for each of a plurality of response contents searched using the first search algorithm;
A second score calculation unit that, using a second search algorithm different from the first search algorithm, calculates, for each of the plurality of response contents registered in the response content table, a second score of the response content for the query by calculating, by at least one of a TF (Term Frequency) method and an IDF method, and adding the scores of those phrases, among the plurality of phrases extracted by the phrase extraction unit, that are included in the response content and in the plurality of phrases associated with the response content, the second score calculation unit calculating the second score based on a weighting for the number of kinds of phrases, among the plurality of phrases extracted by the phrase extraction unit, that are included in the response content and in the plurality of phrases associated with the response content;
A second score acquisition unit that acquires the second score calculated by the second score calculation unit for each of a plurality of response contents searched using the second search algorithm;
A response content determination unit that determines the response content for the query based on the plurality of first scores acquired by the first score acquisition unit and the plurality of second scores acquired by the second score acquisition unit; and
A response content output unit that outputs the response content determined by the response content determination unit.

The program according to claim 1, wherein the first score calculation unit calculates, for each of the plurality of response contents registered in the response content table, the first score of the response content by acquiring from the phrase score table the scores of those phrases, among the plurality of phrases extracted by the phrase extraction unit, that are included in the response content and in the plurality of phrases associated with the response content, and multiplying or adding the acquired scores.

The program according to claim 1 or 2, wherein the first score calculation unit further uses an IDF (Inverse Document Frequency) method to calculate the first score.

The program according to any one of claims 1 to 3, wherein the response content determination unit:
Normalizes the first score of each of the plurality of response contents by a first addition value obtained by adding the first scores of the plurality of response contents searched using the first search algorithm; and
Normalizes the second score of each of the plurality of response contents by a second addition value obtained by adding the second scores of the plurality of response contents searched using the second search algorithm.

The program according to any one of claims 1 to 4, wherein the response content determination unit:
Applies the number of the plurality of response contents searched using the first search algorithm to the plurality of first scores; and
Applies the number of the plurality of response contents searched using the second search algorithm to the plurality of second scores.

A program for causing a computer to function as:
A query reception unit that receives a query input by a user;
A first score acquisition unit that acquires a first score for the query for each of a plurality of response contents searched using a first search algorithm;
A second score acquisition unit that acquires a second score for the query for each of a plurality of response contents searched using a second search algorithm different from the first search algorithm;
A response content determination unit that determines the response content for the query based on the plurality of first scores acquired by the first score acquisition unit and the plurality of second scores acquired by the second score acquisition unit, the response content determination unit normalizing the first score of each of the plurality of response contents by a first addition value obtained by adding the first scores of the plurality of response contents searched using the first search algorithm, multiplying the normalized first scores by the number of the plurality of response contents searched using the first search algorithm, normalizing the second score of each of the plurality of response contents by a second addition value obtained by adding the second scores of the plurality of response contents searched using the second search algorithm, and multiplying the normalized second scores by the number of the plurality of response contents searched using the second search algorithm; and
A response content output unit that outputs the response content determined by the response content determination unit.

A response content table reference unit that refers to a response content table in which a plurality of response contents and a plurality of words associated with each of the plurality of response contents are registered;
A phrase score table reference unit that refers to a phrase score table in which a plurality of phrases and scores associated with each of the plurality of phrases are registered;
A query receiving unit for receiving a query input by a user;
A phrase extraction unit that extracts a plurality of phrases from the query received by the query reception unit;
Using the first search algorithm, for each of the plurality of response contents registered in the response content table, out of a plurality of words extracted by the word / phrase extraction unit, scores of a plurality of words / phrases included in the response content are calculated. A first score calculation unit that calculates a first score for the query of the response content by acquiring from the phrase score table and multiplying or adding;
A first score acquisition unit that acquires the first score calculated by the first score calculation unit for each of a plurality of response contents searched using the first search algorithm;
A second score calculation unit that, using a second search algorithm different from the first search algorithm, calculates, for each of the plurality of response contents registered in the response content table, a second score of the response content for the query by calculating, by at least one of a TF (Term Frequency) method and an IDF method, and adding the scores of those phrases, among the plurality of phrases extracted by the phrase extraction unit, that are included in the response content and in the plurality of phrases associated with the response content, the second score calculation unit calculating the second score based on a weighting that depends on whether a phrase extracted by the phrase extraction unit is included in the response content or in the plurality of phrases associated with the response content, and on a weighting for the number of kinds of phrases, among the plurality of phrases extracted by the phrase extraction unit, that are included in the response content and in the plurality of phrases associated with the response content;
A second score acquisition unit that acquires the second score calculated by the second score calculation unit for each of a plurality of response contents searched using the second search algorithm;
A response content determination unit that normalizes the first score of each of the plurality of response contents by a first addition value obtained by adding the first scores of the plurality of response contents searched using the first search algorithm, multiplies the normalized first scores by the number of the plurality of response contents searched using the first search algorithm, normalizes the second score of each of the plurality of response contents by a second addition value obtained by adding the second scores of the plurality of response contents searched using the second search algorithm, multiplies the normalized second scores by the number of the plurality of response contents searched using the second search algorithm, and determines the response content for the query based on the plurality of first scores to which a reliability set in advance for the first search algorithm is applied and the plurality of second scores to which a reliability set in advance for the second search algorithm is applied;
An information search device comprising: a response content output unit that outputs the response content determined by the response content determination unit.