JP7396190B2

JP7396190B2 - Extraction program, extraction method and extraction device

Info

Publication number: JP7396190B2
Application number: JP2020080846A
Authority: JP
Inventors: 俊秀宮城; 淳真工藤; 幸太山越; 航太穴田; 大紀塙; 那美加江原
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2020-04-30
Filing date: 2020-04-30
Publication date: 2023-12-12
Anticipated expiration: 2040-04-30
Also published as: JP2021174484A

Description

The present invention relates to an extraction program, an extraction method, and an extraction device.

Conventionally, a technique is known that searches content using words obtained by segmenting a search keyword through morphological analysis.

Patent Document 1: Japanese Laid-open Patent Publication No. 2015-138351

However, the conventional technique has a problem in that a search keyword may not be segmented appropriately.

For example, in the conventional technique, morphological analysis is performed based on words registered in a dictionary. Consider a case where the word "Fujitsu Ltd." (富士通株式会社) is registered in the dictionary but the word "Fujitsu" (富士通) is not. If the user then enters the search keyword "Fujitsu" and performs a search, documents containing the word "Fujitsu Ltd." may not be found, even though the user entered the abbreviation "Fujitsu" intending such documents to appear in the search results.

One aspect aims to segment search keywords appropriately.

In one aspect, an extraction program causes a computer to execute a process of identifying, using data, for each of a plurality of segmentation patterns for an unsegmented search keyword, the search rank of the correct-answer knowledge in the results of a search performed with the resulting segmented search keyword. The data shows the correspondence between the unsegmented search keyword and the correct-answer knowledge for that keyword. The extraction program also causes the computer to execute a process of extracting, from among the plurality of segmentation patterns, a segmentation pattern whose identified search rank satisfies a predetermined condition, as the segmentation pattern to apply to the unsegmented search keyword.

According to one aspect, search keywords can be segmented appropriately.

FIG. 1 is a diagram showing an example configuration of the extraction device.
FIG. 2 is a diagram showing an example data structure of dictionary information.
FIG. 3 is a diagram illustrating extraction of segmentation patterns.
FIG. 4 is a flowchart showing the flow of the extraction process.
FIG. 5 is a diagram illustrating an example hardware configuration.

Embodiments of the extraction program, extraction method, and extraction device according to the present invention are described in detail below with reference to the drawings. The present invention is not limited to these embodiments, and the embodiments may be combined as appropriate to the extent that they do not contradict each other.

The configuration of the extraction device according to the embodiment is described with reference to FIG. 1, which shows an example configuration of the extraction device. The extraction device 10 receives a search keyword entered by a user, and outputs dictionary-format data in which segmentation patterns for search keywords are registered.

As shown in FIG. 1, the extraction device 10 includes an input unit 11, an output unit 12, a storage unit 13, and a control unit 14. The input unit 11 is an interface for inputting data; for example, it receives data via input devices such as a mouse and a keyboard. The output unit 12 is an interface for outputting data; for example, it outputs data to an output device such as a display.

The storage unit 13 is an example of a storage device, such as a hard disk or memory, that stores data, programs executed by the control unit 14, and the like. The storage unit 13 holds correct-answer knowledge information 131, dictionary information 132, and a content DB 133.

The correct-answer knowledge information 131 records the rank (rank_base) of the correct-answer knowledge. The correct-answer knowledge is, for example, the item among the search results for a user-entered search keyword that best matches the user's intention.

For example, suppose a user enters the search keyword "入出力設計書" (input/output design document) to find desired content in a group of content, and the desired content appears third in the search results. In this case, the correct-answer knowledge information 131 is information indicating that, for the search keyword "入出力設計書", the desired content ranked third.
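As a minimal sketch, assuming the correct-answer knowledge information 131 is kept as one record per unsegmented keyword (the description does not specify a storage layout; apart from rank_base, the class and field names below are hypothetical), the example above could be represented as follows. The file name is taken from the worked example later in this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrectAnswerKnowledge:
    """One hypothetical record of correct-answer knowledge information 131."""
    keyword: str          # unsegmented search keyword entered by the user
    correct_content: str  # ID of the content that best matches the user's intention
    rank_base: int        # 1-based rank of that content for the unsegmented keyword


# Records indexed by unsegmented keyword so later steps can look them up.
knowledge_info: dict[str, CorrectAnswerKnowledge] = {
    "入出力設計書": CorrectAnswerKnowledge(
        keyword="入出力設計書",
        correct_content="入出力画面設計書.xlsx",
        rank_base=3,  # the desired file appeared third
    ),
}
```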

The dictionary information 132 is used for morphological analysis. FIG. 2 is a diagram showing an example data structure of the dictionary information. As shown in FIG. 2, the dictionary information 132 contains an unsegmented search keyword and the corresponding segmented search keyword. In the example of FIG. 2, the unsegmented search keyword is "入出力設計書" (input/output design document) and the segmented search keyword is "［入出力］，［設計書］" ([input/output], [design document]).
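A minimal sketch of the dictionary information 132 as an in-memory mapping, assuming ASCII brackets and commas in place of the full-width "［］，" shown in FIG. 2 (the helper names are hypothetical):

```python
def format_segmentation(elements: list[str]) -> str:
    """Render elements in the "[a],[b]" style used for dictionary information 132."""
    return ",".join(f"[{e}]" for e in elements)


def parse_segmentation(text: str) -> list[str]:
    """Inverse of format_segmentation; assumes elements contain no '[', ']' or ','."""
    return [part.strip()[1:-1] for part in text.split(",")]


# Unsegmented search keyword -> segmented search keyword (list of elements).
dictionary_info: dict[str, list[str]] = {
    "入出力設計書": ["入出力", "設計書"],
}
```

Keeping the elements as a list and formatting only on output means the representation can change without touching the lookup, which matches the note that the representation format is not limited to one form.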

The content DB 133 stores content, such as document files, text files, and spreadsheet files.

Returning to FIG. 1, the control unit 14 is implemented by, for example, a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a GPU (Graphics Processing Unit) executing a program stored in an internal storage device, using RAM as a work area. The control unit 14 may also be implemented by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). The control unit 14 includes an analysis unit 141, a search unit 142, an identification unit 143, and an extraction unit 144.

The analysis unit 141 performs morphological analysis on an unsegmented search keyword, which is, for example, entered by the user via the input unit 11. Through morphological analysis, the analysis unit 141 decomposes the unsegmented search keyword into elements according to a given segmentation pattern. In this embodiment, an element is either a single morpheme or a character string formed by concatenating multiple morphemes.
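To make "element" concrete, here is a minimal sketch, assuming a segmentation pattern is encoded as one cut/keep flag per boundary between adjacent morphemes. This encoding and the name apply_pattern are hypothetical, and the morphemes 入 / 出力 / 設計 / 書 are an assumed analysis of 入出力設計書, not the output of any specific analyzer.

```python
def apply_pattern(morphemes: list[str], cuts: tuple[bool, ...]) -> list[str]:
    """Group consecutive morphemes into elements.

    cuts[i] is True when the keyword is split between morphemes[i] and
    morphemes[i + 1]; False keeps them concatenated in one element.
    """
    if len(cuts) != len(morphemes) - 1:
        raise ValueError("need exactly one cut flag per boundary")
    elements = [morphemes[0]]
    for morpheme, cut in zip(morphemes[1:], cuts):
        if cut:
            elements.append(morpheme)  # start a new element
        else:
            elements[-1] += morpheme   # concatenate onto the current element
    return elements


# Keep 入+出力 and 設計+書 together, cut between them.
print(apply_pattern(["入", "出力", "設計", "書"], (False, True, False)))
```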

The search unit 142 searches the content stored in the content DB 133 using either the unsegmented search keyword or a segmented search keyword, and ranks the content obtained as search results. For example, the search unit 142 assigns higher ranks to content with greater similarity to the search keyword.
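The description leaves the similarity measure unspecified. The sketch below assumes a simple stand-in: a content item matches only if its text contains every element (a space acting as AND, as described for the worked example), and its score is the total number of occurrences of the elements. The function name and scoring are hypothetical.

```python
def search(query_elements: list[str], contents: dict[str, str]) -> list[str]:
    """Return IDs of content containing every element (AND), best match first.

    Stand-in similarity: total occurrences of the query elements in the text.
    """
    hits = []
    for content_id, text in contents.items():
        if all(e in text for e in query_elements):
            score = sum(text.count(e) for e in query_elements)
            hits.append((score, content_id))
    # Greater similarity -> higher (numerically smaller) rank; ties broken by ID.
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [content_id for _, content_id in hits]
```

With this substring-based stand-in, the unsegmented keyword 入出力設計書 matches no content whose text is 入出力画面設計書, while the segmented keyword 入出力 設計書 does, which mirrors the motivation for segmenting.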

Using data that shows the correspondence between an unsegmented search keyword and the correct-answer knowledge for that keyword, the identification unit 143 identifies, for each of a plurality of segmentation patterns for the unsegmented search keyword, the search rank of the correct-answer knowledge in the results of a search performed with the resulting segmented search keyword.
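Identifying the rank amounts to locating the correct-answer content in each ranked result list. A minimal sketch, with search_fn standing in for whatever search the search unit 142 performs (all names are hypothetical):

```python
from typing import Callable, Iterable, Optional


def rank_of(correct_content: str, ranked_results: list[str]) -> Optional[int]:
    """1-based search rank of the correct-answer content, or None if absent."""
    try:
        return ranked_results.index(correct_content) + 1
    except ValueError:
        return None


def ranks_per_pattern(
    correct_content: str,
    patterns: Iterable[tuple[str, ...]],
    search_fn: Callable[[list[str]], list[str]],
) -> dict[tuple[str, ...], Optional[int]]:
    """Map each segmentation pattern (tuple of elements) to the rank of the
    correct-answer content when searching with that pattern."""
    return {p: rank_of(correct_content, search_fn(list(p))) for p in patterns}
```

Returning None rather than a large sentinel keeps "not found" distinct from "found at a low rank", so the later condition can treat it explicitly.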

The extraction unit 144 extracts, from among the plurality of segmentation patterns, a segmentation pattern whose identified search rank satisfies a predetermined condition, as the segmentation pattern to apply to the unsegmented search keyword. The extraction unit 144 also registers the information obtained by segmenting the unsegmented search keyword with the extracted pattern in the storage unit 13 as dictionary-format data, that is, as the dictionary information 132.
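The description leaves the "predetermined condition" open. The sketch below assumes it is a predicate on the rank, picks the best-ranked qualifying pattern (as step S104 describes), and registers it in a plain-dictionary stand-in for the dictionary information 132. The function and parameter names are hypothetical.

```python
from typing import Callable, Optional

Pattern = tuple[str, ...]


def extract_and_register(
    keyword: str,
    ranks: dict[Pattern, Optional[int]],
    condition: Callable[[int], bool],
    dictionary_info: dict[str, list[str]],
) -> Optional[Pattern]:
    """Among patterns whose rank satisfies `condition`, choose the best-ranked
    one and register it under the unsegmented keyword."""
    candidates = [(r, p) for p, r in ranks.items() if r is not None and condition(r)]
    if not candidates:
        return None  # nothing qualifies; leave the dictionary unchanged
    _, best_pattern = min(candidates)  # smallest rank number = highest rank
    dictionary_info[keyword] = list(best_pattern)
    return best_pattern
```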

FIG. 3 is a diagram illustrating extraction of segmentation patterns. First, as shown in FIG. 3, the analysis unit 141 performs morphological analysis on the unsegmented search keyword using a plurality of segmentation patterns. Here, assume the unsegmented search keyword is w_1 + w_2 + w_3 + w_4, where w_1 to w_4 are words and "+" is an operator denoting string concatenation. Segmentation patterns include, for example, a pattern that separates all of w_1, w_2, w_3, and w_4, and a pattern that keeps w_1 and w_2 joined.
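Assume a segmentation pattern may independently cut or keep each boundary between adjacent words. The description only names two example patterns, and its worked example compares four. Under that assumption, the number of candidate patterns for a keyword of n words, and the two patterns named above, are:

```latex
% n words w_1, ..., w_n have n - 1 boundaries; each boundary is either cut or kept.
N(n) = 2^{\,n-1}, \qquad N(4) = 2^{3} = 8

% The two example patterns for w_1 + w_2 + w_3 + w_4:
w_1 \mid w_2 \mid w_3 \mid w_4 \quad (\text{all boundaries cut}), \qquad
(w_1 + w_2) \mid w_3 \mid w_4 \quad (w_1, w_2 \text{ kept joined})
```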

The search unit 142 then searches content with each segmented search keyword corresponding to each segmentation pattern, and the identification unit 143 identifies the segmentation pattern that gives the highest search accuracy.

The extraction unit 144 registers the segmentation pattern identified by the identification unit 143 in the dictionary information 132. For example, the extraction unit 144 extracts, from among the plurality of segmentation patterns, a pattern whose identified search rank is higher than the search rank of the correct-answer knowledge in the results of a search with the unsegmented search keyword, as the segmentation pattern to apply to the unsegmented search keyword.
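The example condition above (a strictly higher rank than with the unsegmented keyword) can be sketched as a predicate. The ranks used are those of the worked example in this description: the unsegmented keyword ranks the target 3rd, and the four patterns rank it 10th, 6th, 3rd, and 2nd. The helper name is hypothetical.

```python
from typing import Callable


def better_than_baseline(rank_base: int) -> Callable[[int], bool]:
    """Condition: the segmented search must rank the correct-answer knowledge
    strictly higher, i.e. at a numerically smaller rank, than rank_base."""
    return lambda rank: rank < rank_base


condition = better_than_baseline(3)  # baseline: unsegmented keyword ranked 3rd
example_ranks = {"pattern 1": 10, "pattern 2": 6, "pattern 3": 3, "pattern 4": 2}
qualifying = [name for name, rank in example_ranks.items() if condition(rank)]
print(qualifying)  # ['pattern 4']: pattern 3 only ties the baseline
```

Using a strict comparison means a pattern that merely matches the unsegmented result is not registered, so the dictionary only grows when segmentation actually helps.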

A more specific example follows. Suppose the user wants the file "入出力画面設計書.xlsx" (input/output screen design document) as a search result and enters the search keyword "入出力設計書" (input/output design document). Searching with "入出力設計書" as-is ranks "入出力画面設計書.xlsx" third in the search results. The search keyword "入出力設計書" and the information that "入出力画面設計書.xlsx" ranked third are stored as the correct-answer knowledge information 131.

First, the analysis unit 141 decomposes "入出力設計書" into four segmentation patterns, each dividing the keyword into elements at different positions; one of them is "入出力 設計書". Here, a space is a delimiter and denotes an AND condition in the search.

The search unit 142 then performs a search with the search keyword of each pattern. With the first, second, and third patterns, "入出力画面設計書.xlsx" ranks 10th, 6th, and 3rd in the search results, respectively; with the fourth pattern, "入出力 設計書", it ranks 2nd.

The identification unit 143 then identifies the segmentation pattern with the highest rank, namely the one corresponding to "入出力 設計書". The extraction unit 144 expresses this pattern as "［入出力］，［設計書］" ([input/output], [design document]) and adds it to the dictionary information 132. The representation format of a segmentation pattern is not limited to this one.

For search keywords already registered in the dictionary information 132, the analysis unit 141 performs morphological analysis according to the dictionary information 132. That is, when the segmented search keyword corresponding to the unsegmented search keyword "入出力設計書" has been added to the dictionary information 132 as in FIG. 2, the analysis unit 141 decomposes "入出力設計書" into "入出力 設計書", and the search unit 142 performs a search with the search keyword "入出力 設計書".
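A minimal sketch of this dictionary-first behavior, assuming the dictionary information 132 is a plain mapping and that morphological_analyzer is any callable that segments unregistered keywords (both are hypothetical stand-ins):

```python
from typing import Callable


def analyze(
    keyword: str,
    dictionary_info: dict[str, list[str]],
    morphological_analyzer: Callable[[str], list[str]],
) -> list[str]:
    """Segment a keyword, preferring a segmentation registered in
    dictionary_info and falling back to ordinary morphological analysis."""
    registered = dictionary_info.get(keyword)
    if registered is not None:
        return list(registered)  # copy so callers cannot mutate the dictionary
    return morphological_analyzer(keyword)


def search_query(elements: list[str]) -> str:
    """Join elements with spaces; in this description a space acts as AND."""
    return " ".join(elements)


dictionary_info = {"入出力設計書": ["入出力", "設計書"]}
print(search_query(analyze("入出力設計書", dictionary_info, lambda s: [s])))
```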

FIG. 4 is a flowchart showing the flow of the extraction process. As shown in FIG. 4, the analysis unit 141 first decomposes a search keyword into elements by morphological analysis (step S101). Next, the search unit 142 performs a search for each segmentation pattern of the morphological analysis (step S102). The identification unit 143 then identifies the rank of the correct-answer knowledge in the search results (step S103). Finally, the extraction unit 144 extracts the segmentation pattern with the highest rank and registers it in the dictionary (step S104).
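Steps S101 to S104 can be combined in one self-contained sketch. It makes several assumptions not stated in the description: the morphemes are supplied by hand rather than by a real analyzer, all 2^(n-1) groupings are tried, the AND/occurrence-count search is a stand-in for the search unit 142, and the condition is the strict improvement over the unsegmented keyword described for the extraction unit 144 (with "not found" treated as worse than any rank). All names are hypothetical.

```python
from itertools import product
from typing import Optional


def all_patterns(morphemes: list[str]) -> list[tuple[str, ...]]:
    """S101: every grouping of consecutive morphemes into elements."""
    patterns = []
    for cuts in product((False, True), repeat=len(morphemes) - 1):
        elements = [morphemes[0]]
        for m, cut in zip(morphemes[1:], cuts):
            if cut:
                elements.append(m)
            else:
                elements[-1] += m
        patterns.append(tuple(elements))
    return patterns


def search(elements: list[str], contents: dict[str, str]) -> list[str]:
    """S102: AND search; score = occurrences of the elements (stand-in similarity)."""
    hits = [(-sum(t.count(e) for e in elements), cid)
            for cid, t in contents.items() if all(e in t for e in elements)]
    return [cid for _, cid in sorted(hits)]


def rank_of(target: str, results: list[str]) -> Optional[int]:
    """S103: 1-based rank of the correct-answer content, None if missing."""
    return results.index(target) + 1 if target in results else None


def extract(keyword, morphemes, target, contents, dictionary_info):
    """S104: register the best pattern if it beats the unsegmented search."""
    base = rank_of(target, search([keyword], contents))
    best = None
    for pattern in all_patterns(morphemes):
        r = rank_of(target, search(list(pattern), contents))
        if r is None or (base is not None and r >= base):
            continue  # not found, or no better than the unsegmented keyword
        if best is None or r < best[0]:
            best = (r, pattern)
    if best is not None:
        dictionary_info[keyword] = list(best[1])
    return best


contents = {"入出力画面設計書.xlsx": "入出力画面設計書", "設計メモ.txt": "設計 メモ 書式"}
dictionary_info: dict[str, list[str]] = {}
print(extract("入出力設計書", ["入", "出力", "設計", "書"],
              "入出力画面設計書.xlsx", contents, dictionary_info))
print(dictionary_info)
```

With this toy content, the unsegmented keyword finds nothing, and the first pattern that ranks the target file first, 入出力 設計書, is the one registered.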

As described above, the identification unit 143 uses data showing the correspondence between an unsegmented search keyword and the correct-answer knowledge for that keyword to identify, for each of a plurality of segmentation patterns for the unsegmented search keyword, the search rank of the correct-answer knowledge in the results of a search with the segmented search keyword. The extraction unit 144 extracts, from among the plurality of segmentation patterns, a pattern whose identified search rank satisfies a predetermined condition, as the segmentation pattern to apply to the unsegmented search keyword. In this way, the extraction device 10 extracts segmentation patterns that actually yielded good search accuracy, so according to this embodiment, search keywords can be segmented appropriately.

The extraction unit 144 extracts, from among the plurality of segmentation patterns, a pattern whose identified search rank is higher than the search rank of the correct-answer knowledge in the results of a search with the unsegmented search keyword, as the segmentation pattern to apply to the unsegmented search keyword. As a result, only segmentation patterns that improve search accuracy over the unsegmented search keyword are extracted.

The extraction unit 144 registers, in the storage unit as dictionary-format data, the information obtained by segmenting the unsegmented search keyword with the extracted segmentation pattern. Holding segmentation patterns that improve search accuracy as a dictionary in this way enables highly accurate searches.

Unless otherwise specified, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings may be changed as desired. The specific examples, distributions, numerical values, and the like described in the embodiment are merely examples and may be changed as desired.

Each component of each device shown in the drawings is functionally conceptual and need not be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to that illustrated: all or part of the devices may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. Furthermore, all or any part of each processing function performed by each device may be implemented by a CPU and a program analyzed and executed by the CPU, or as hardware using wired logic.

FIG. 5 is a diagram illustrating an example hardware configuration. As shown in FIG. 5, the extraction device 10 includes a communication interface 10a, an HDD (Hard Disk Drive) 10b, a memory 10c, and a processor 10d. The units shown in FIG. 5 are interconnected by a bus or the like.

The communication interface 10a is, for example, a network interface card, and communicates with other servers. The HDD 10b stores the programs and DBs that operate the functions shown in FIG. 1.

The processor 10d is a hardware circuit that reads, from the HDD 10b or the like, a program that executes the same processing as each processing unit shown in FIG. 1 and loads it into the memory 10c, thereby running a process that executes each function described with reference to FIG. 1 and elsewhere. That is, this process performs the same functions as the processing units of the extraction device 10. Specifically, the processor 10d reads, from the HDD 10b or the like, a program having the same functions as the analysis unit 141, the search unit 142, the identification unit 143, and the extraction unit 144, and then executes a process that performs the same processing as the analysis unit 141, the search unit 142, the identification unit 143, the extraction unit 144, and so on.

In this way, the extraction device 10 operates as an information processing device that executes the extraction method by reading and executing the program. The extraction device 10 can also realize the same functions as in the above embodiment by reading the program from a recording medium with a medium reading device and executing it. The program referred to in the other embodiments is not limited to execution by the extraction device 10; for example, the present invention can be applied similarly when another computer or server executes the program, or when they execute it in cooperation.

This program can be distributed via a network such as the Internet. It can also be recorded on a computer-readable recording medium such as a hard disk, flexible disk (FD), CD-ROM, MO (Magneto-Optical disk), or DVD (Digital Versatile Disc), and executed by a computer reading it from the recording medium.

10 Extraction device
11 Input unit
12 Output unit
13 Storage unit
14 Control unit
131 Correct-answer knowledge information
132 Dictionary information
133 Content DB
141 Analysis unit
142 Search unit
143 Identification unit
144 Extraction unit

Claims

1. An extraction program that causes a computer to execute a process comprising:
identifying, using data showing a correspondence between an unsegmented search keyword and correct-answer knowledge for the unsegmented search keyword, for each of a plurality of segmentation patterns for the unsegmented search keyword, a search rank of the correct-answer knowledge in search results obtained when a search is performed using the segmented search keyword; and
extracting, from among the plurality of segmentation patterns, a segmentation pattern whose identified search rank satisfies a predetermined condition, as a segmentation pattern to be applied to the unsegmented search keyword.

2. The extraction program according to claim 1, wherein the extracting extracts, from among the plurality of segmentation patterns, a segmentation pattern whose identified search rank is higher than the search rank of the correct-answer knowledge in search results obtained when a search is performed using the unsegmented search keyword, as the segmentation pattern to be applied to the unsegmented search keyword.

3. The extraction program according to claim 1 or 2, causing the computer to further execute a process of registering, in a storage unit as dictionary-format data, information obtained by segmenting the unsegmented search keyword with the segmentation pattern extracted by the extracting.

4. An extraction method in which a computer executes a process comprising:
identifying, using data showing a correspondence between an unsegmented search keyword and correct-answer knowledge for the unsegmented search keyword, for each of a plurality of segmentation patterns for the unsegmented search keyword, a search rank of the correct-answer knowledge in search results obtained when a search is performed using the segmented search keyword; and
extracting, from among the plurality of segmentation patterns, a segmentation pattern whose identified search rank satisfies a predetermined condition, as a segmentation pattern to be applied to the unsegmented search keyword.

5. An extraction device comprising:
an identification unit that identifies, using data showing a correspondence between an unsegmented search keyword and correct-answer knowledge for the unsegmented search keyword, for each of a plurality of segmentation patterns for the unsegmented search keyword, a search rank of the correct-answer knowledge in search results obtained when a search is performed using the segmented search keyword; and
an extraction unit that extracts, from among the plurality of segmentation patterns, a segmentation pattern whose identified search rank satisfies a predetermined condition, as a segmentation pattern to be applied to the unsegmented search keyword.