JP5046170B2

JP5046170B2 - SEARCH SYSTEM, SEARCH METHOD, REPORT SYSTEM, REPORT METHOD, AND PROGRAM

Info

Publication number: JP5046170B2
Application number: JP2010111458A
Authority: JP
Inventors: 浩野美山; 大介宅間
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2010-05-13
Filing date: 2010-05-13
Publication date: 2012-10-10
Anticipated expiration: 2024-07-13
Also published as: JP2010211821A

Description

The present invention relates to a search system, a search method, a report system, a report method, and a program. In particular, the present invention relates to a search system, search method, report system, report method, and program that search a plurality of document data for relevant document data and report that the number of document data having a specific concept is increasing.

Search systems that retrieve, from a plurality of document data, the document data containing the content specified by an input search sentence have long been studied. Some of these systems aim to reflect the search intent and retrieve appropriate documents even when the search sentence itself does not appear verbatim in a document. Such a system can serve, for example, as the base technology of a support system at a product manufacturer. Inquiries about products received by a call center, together with the answers to them, are transcribed as text document data into a call log database, and this database is then used to answer new inquiries appropriately (see Non-Patent Document 5).

As examples of such search systems, systems have been proposed that take ambiguity into account when extracting content-word keywords from the search sentence or document data (see Non-Patent Documents 1, 2, and 6). To make searches more accurate, systems have also been proposed that incorporate the meanings expressed by function words into the keywords (see Non-Patent Document 5). Other proposed systems consider not only whether a keyword appears in the search sentence or document data but also the dependency relations between words (see Non-Patent Document 4 and Patent Documents 1 and 2). As a system that outputs answers to question sentences, a system that learns from examples of correct answers to questions has also been proposed (see Non-Patent Document 3).

It is also important for companies to build trust with their customers and to keep improving product quality and customer support. Companies therefore want to detect problems with their products and services early, and call center call logs are expected to serve as a means of finding such problems.

Non-Patent Document 7 proposes a method for detecting problems from information that accumulates sequentially in this way. As an example of such a method, a system has been proposed that detects a problem by identifying the portions of a document stream in which the interval between successive documents about a specific keyword becomes short (see Non-Patent Document 8). Several refinements have also been proposed. One takes the number of postings per unit time into account in this determination (see Non-Patent Document 9). Another issues a warning when the number of occurrences of a specific topic exceeds a threshold (see Non-Patent Document 10). A third detects an increase in keyword frequency in order to extract rapidly rising topics (see Non-Patent Document 11). A system that performs predictive analysis using cases of known defects in products and the like has also been proposed (see Non-Patent Document 12).
[Prior Art Documents]
[Patent Document 1] Japanese Patent Application Laid-Open No. 11-259524
[Patent Document 2] Japanese Patent No. 3266586
[Non-Patent Document 1] JUSTSYSTEM, "What is ConceptBase technology?", [online], July 30, 2003, JUSTSYSTEM, [retrieved June 30, 2004], Internet <URL: http://www.justsystem.co.jp/km/whats/search_q_104.html>
[Non-Patent Document 2] NRI, "About the service (NRI Cyber Patent)", [online], [retrieved June 30, 2004], Internet <URL: http://www.patent.ne.jp/01gaiyo/s-point/06.html>
[Non-Patent Document 3] Sasaki et al., "SAIQA-II: a learning-based question answering system using SVM", IPSJ Journal, Vol. 45, No. 02, 2004
[Non-Patent Document 4] Matsumura et al., "Evaluation of an information retrieval method using dependency relations between words", IPSJ Journal, Vol. 41, No. SIG01-003, 2000
[Non-Patent Document 5] T. Nasukawa and T. Nagano, "Text analysis and knowledge mining system", IBM Systems Journal, Vol. 40, No. 4, 2001
[Non-Patent Document 6] Autonomy, "Conceptual Search", [online], [retrieved June 30, 2004], Internet <URL: http://www.autonomy.com/c/content/Products/IDOL/f/Conceptual_Search>
[Non-Patent Document 7] T. Fawcett and F. Provost, "Activity monitoring: Noticing interesting changes in behavior", In Proc. Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 53-62, 1999
[Non-Patent Document 8] Jon Kleinberg, "Bursty and hierarchical structure in streams", In Proc. The 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002
[Non-Patent Document 9] Toshiaki Fujiki, Tomoyuki Nanno, Yasuhiro Suzuki, Manabu Okumura, "Discovery of bursts in a document stream", IPSJ SIG Technical Report, 2004-NL-160, pp. 85-92
[Non-Patent Document 10] Kenji Yamanishi, "Text mining and NLP business", [online], NEC, [retrieved June 30, 2004], Internet <URL: http://it.jeita.or.jp/eltech/committee/knowledge/PDF/2003/Yamanishi.pdf>
[Non-Patent Document 11] Nomura Research Institute, "What is True Teller?", [online], [retrieved June 30, 2004], Internet <URL: http://www.trueteller.net/about/index.shtml>
[Non-Patent Document 12] JUSTSYSTEM, "Alize", [online], [retrieved June 30, 2004], Internet <URL: http://www.justsystem.co.jp/km/ssm>

In this field, staff who receive a call should be able to enter the content of the inquiry as a search sentence and efficiently retrieve the document data that match the search intent.

Consider a search system that handles ambiguity in keyword extraction but treats only content words as keywords. From the search sentence "does not recognize the hard disk", it extracts only "hard disk" and "recognize". The intent "does not recognize" is lost, so document data stating that the hard disk is recognized are retrieved as well.

When function words are also taken into account, the same search sentence yields "hard disk" and "recognize [negation]", so the intent "does not recognize" is preserved. However, retrieval still depends only on whether the specified keywords appear somewhere in a document. As a result, document data stating "the CD-ROM cannot be recognized, but the hard disk is recognized" are retrieved.
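A toy bag-of-keywords matcher reproduces both failure modes described above. The sketch below illustrates the problem only and is not the patent's method. Keywords are extracted by hand rather than by a real morphological analyzer, negation is marked with a hypothetical `[neg]` suffix, and a document matches if it contains every query keyword anywhere.

```python
# Toy keyword-set matcher (hypothetical). Keywords are hand-extracted here;
# a real system would obtain them from morphological analysis.

def matches(query_keywords: set[str], doc_keywords: set[str]) -> bool:
    """A document matches if every query keyword occurs somewhere in it."""
    return query_keywords <= doc_keywords

# Query: "does not recognize the hard disk"
query_content_only = {"hard disk", "recognize"}        # negation lost
query_with_negation = {"hard disk", "recognize[neg]"}  # negation kept

# Document A: "the hard disk is recognized"
doc_a = {"hard disk", "recognize"}
# Document B: "the CD-ROM cannot be recognized, but the hard disk is recognized"
doc_b = {"CD-ROM", "recognize[neg]", "hard disk", "recognize"}

print(matches(query_content_only, doc_a))   # True: wrong hit, negation dropped
print(matches(query_with_negation, doc_a))  # False: negation respected
print(matches(query_with_negation, doc_b))  # True: wrong hit, since the
                                            # negation belongs to the CD-ROM
```

The second wrong hit occurs because set membership ignores which word the negation attaches to.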

Even when dependency relations between words are considered, it is difficult to match the many ways of expressing the same search intent, such as "the hard disk cannot be recognized" and "the hard disk cannot be seen". The reason is that expanding each word to its synonyms and analyzing the search sentence semantically still cannot properly distinguish expressions (word combinations) that are used only in specific situations, such as "the hard disk cannot be seen".

Furthermore, when call center call logs are used to find problems, word-based processing has difficulty isolating individual problems, because few words express each individual problem. It is also impossible to tell what problem lies behind a keyword whose frequency has been reported as increasing. In addition, calls about every kind of problem tend to increase for a new product, and this makes it difficult to detect a specific problem early.

Accordingly, an object of the present invention is to provide a search system, a search method, a report system, a report method, and a program that can solve the above problems. This object is achieved by the combinations of features described in the independent claims. The dependent claims define further advantageous specific examples of the present invention.

According to a first aspect of the present invention, there is provided a search system that searches a plurality of document data for document data containing the content specified by an input search sentence. The search system comprises: a document database that stores the plurality of document data; a concept database that stores a plurality of predetermined concepts in a hierarchical structure in which a concept that includes another concept is placed in the hierarchy above that concept; a document data concept extraction unit that extracts, based on the keywords contained in each document data, a document concept, that is, the concept corresponding to that document data; a search sentence concept extraction unit that extracts, based on the keywords contained in the search sentence, a search sentence concept, that is, the concept corresponding to the search sentence; a concept search unit that searches the plurality of document data for document data for which the search sentence concept lies above or below the document concept in the hierarchy; and a search result output unit that outputs the document data found by the concept search unit as the document data containing the content specified by the search sentence. A search method, a program, and a recording medium relating to the search system are also provided.
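The core matching rule of the first aspect is that a document is retrieved when its concept lies above or below the search sentence concept in the hierarchy. The embodiment in S540 also accepts an identical concept. A minimal sketch of this rule follows. It assumes that concepts are written as slash-separated paths, as in FIG. 2, and that concept extraction has already been done. `concept_search` and the sample documents are hypothetical.

```python
# Minimal sketch of hierarchy-aware concept matching (hypothetical names).
# Concepts are path strings such as "/defect/hardware/hard disk".

def is_ancestor_or_self(a: str, b: str) -> bool:
    """True if concept a equals b or lies above b in the hierarchy."""
    a_parts = a.strip("/").split("/")
    b_parts = b.strip("/").split("/")
    return b_parts[:len(a_parts)] == a_parts

def concept_matches(query: str, doc: str) -> bool:
    """Match when one concept is the same as, above, or below the other."""
    return is_ancestor_or_self(query, doc) or is_ancestor_or_self(doc, query)

def concept_search(query_concept: str, doc_concepts: dict[str, set[str]]) -> list[str]:
    """Return the sorted ids of documents having at least one matching concept."""
    return sorted(
        doc_id
        for doc_id, concepts in doc_concepts.items()
        if any(concept_matches(query_concept, c) for c in concepts)
    )

docs = {
    "d1": {"/defect/hardware/hard disk"},
    "d2": {"/defect/hardware"},
    "d3": {"/defect/hardware/display/flicker"},
}
print(concept_search("/defect/hardware/hard disk", docs))  # ['d1', 'd2']
```

Paths are compared component by component, so "/defect/hardware/hard disk" and "/defect/hardware/display" are siblings and do not match.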

According to a second aspect of the present invention, there is provided a report system to which a plurality of document data are input sequentially. The report system comprises: a document database that sequentially stores the input document data; a concept database that stores a plurality of predetermined concepts in a hierarchical structure in which a concept that includes another concept is placed in the hierarchy above that concept; a document data concept extraction unit that extracts, based on the keywords contained in each document data, a document concept, that is, the concept corresponding to that document data; a concept ratio calculation unit that calculates, for each concept, the ratio of the number of document data corresponding to that concept to the number of document data in the document database; a relative frequency calculation unit that calculates, for each concept, a relative frequency indicating the magnitude of the ratio calculated by the concept ratio calculation unit relative to a reference ratio for that concept; a frequent concept selection unit that selects, from the plurality of concepts, the concepts whose relative frequency is equal to or greater than a predetermined threshold; a priority concept selection unit that selects either a first concept selected by the frequent concept selection unit or a second concept above the first concept in the hierarchy, based on the relative frequencies of the first and second concepts; and a notification unit that notifies the user that the relative frequency of whichever of the first and second concepts the priority concept selection unit selected has become high. A report method, a program, and a recording medium relating to the report system are also provided.
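The report pipeline of the second aspect can be sketched as follows: concept ratios, then relative frequencies, then threshold selection, then a choice between a concept and its parent. Several details are assumptions and do not come from the text. A document is assumed to count toward a concept if it has that concept or one below it, since a lower concept implies the upper one. The reference ratios are assumed to come from an earlier period. The "second concept" is taken to be the immediate parent. The priority rule is also assumed: report the parent when it rose at least as much as the child. All function names are hypothetical.

```python
# Hypothetical sketch of the report steps. The counting convention, the
# reference ratios, and the parent-vs-child rule are assumptions.

def ancestors_or_self(concept: str) -> list[str]:
    """'/a/b/c' -> ['/a', '/a/b', '/a/b/c']."""
    parts = concept.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]

def concept_ratios(docs: list[set[str]]) -> dict[str, float]:
    """Share of documents corresponding to each concept (assumed: a document
    corresponds to a concept if it has that concept or one below it)."""
    counts: dict[str, int] = {}
    for concepts in docs:
        implied = {a for c in concepts for a in ancestors_or_self(c)}
        for a in implied:
            counts[a] = counts.get(a, 0) + 1
    return {c: n / len(docs) for c, n in counts.items()}

def concepts_to_report(docs: list[set[str]], reference: dict[str, float],
                       threshold: float) -> set[str]:
    ratios = concept_ratios(docs)
    # Relative frequency = current ratio / reference ratio for that concept.
    rel = {c: r / reference[c] for c, r in ratios.items() if c in reference}
    frequent = {c for c, f in rel.items() if f >= threshold}
    reported = set()
    for child in frequent:
        parent = child.rsplit("/", 1)[0]  # immediate parent ('' at a root)
        # Assumed rule: if the parent rose at least as much, the rise is
        # broad (e.g. a new product), so report the parent; else the child.
        if parent in rel and rel[parent] >= rel[child]:
            reported.add(parent)
        else:
            reported.add(child)
    return reported

docs = ([{"/defect/hardware/hard disk"}] * 4
        + [{"/defect/hardware/display"}] * 2
        + [{"/defect/software"}] * 4)
reference = {"/defect": 1.0, "/defect/hardware": 0.5,
             "/defect/hardware/hard disk": 0.1, "/defect/hardware/display": 0.2}
print(concepts_to_report(docs, reference, threshold=2.0))
# {'/defect/hardware/hard disk'}
```

Here only the hard disk concept rose sharply (4x), while its parent rose modestly (1.2x), so the specific concept is reported instead of "/defect/hardware".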

The above summary does not list all of the necessary features of the present invention. Sub-combinations of these groups of features may also constitute inventions.

According to the present invention, document data can be searched in a way that properly reflects the content of the search sentence, and the occurrence of problems can be properly detected from sequentially added document data.

The present invention is described below by way of embodiments. The following embodiments do not limit the invention according to the claims, and not every combination of features described in the embodiments is necessarily essential to the solution provided by the invention.

FIG. 1 shows the configuration of a search system 10 according to an embodiment of the present invention. The search system 10 searches a plurality of document data for the document data containing the content specified by a search sentence that a user or other person enters. In this embodiment, as an example, the search system 10 searches document data in which product inquiries received by a call center, and the answers to them, have been transcribed as text.

The search system 10 includes a document DB 100, a concept DB 105 (concept database), a product DB 106 (product database), a component DB 107 (component database), a dictionary DB 110 (dictionary database), a synonym DB 115 (synonym database), a document data normalization unit 120, a concept extraction rule DB 125, a document data concept extraction unit 130, a search index DB 135 (search index database), a search sentence normalization unit 140, a search sentence concept extraction unit 145, a concept search unit 150, a concept selection support unit 155, and a search result output unit 160.

The document DB 100 stores a plurality of document data. In this embodiment, the document DB 100 stores, for each of a plurality of product defects, document data describing that defect, for example the content of an inquiry from a user of the product together with the answer to it. Document data accumulate sequentially each time an inquiry is made and answered.

The concept DB 105 stores a plurality of predetermined concepts in a hierarchical structure in which a concept that includes another concept is placed in the hierarchy above that concept. Here, a concept is a unit of information, defined in advance by the manufacturer or user of the search system 10, that systematically classifies the meaning of the text handled by the search system 10. The concept DB 105 according to this embodiment stores a plurality of concepts that identify a plurality of product defects.

The product DB 106 stores, as a hierarchical structure, the inclusion relationships among the product names of a plurality of products. The component DB 107 stores, as a hierarchical structure, the inclusion relationships among the components of a product. The dictionary DB 110 stores a dictionary that describes the parts of speech, canonical forms, and other properties of words. The synonym DB 115 stores correspondences between predetermined phrases and the keywords that are synonyms of those phrases.

The document data normalization unit 120 uses the dictionary DB 110 to perform morphological analysis and syntactic analysis of each document data stored in the document DB 100. The document data normalization unit 120 also uses the synonym DB 115 to normalize each document data by replacing the phrases it contains with the keywords that are synonyms of those phrases.
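The synonym-based normalization step can be sketched as a longest-match phrase replacement over an already tokenized text. This is an assumption about one reasonable implementation and is not taken from the patent. `SYNONYMS` is a hypothetical stand-in for the synonym DB 115, and the tokens are assumed to come from the morphological analysis, which is not implemented here.

```python
# Hypothetical stand-in for the synonym DB 115: phrase (token tuple) ->
# canonical keyword. Tokenization is assumed to have been done already.
SYNONYMS: dict[tuple[str, ...], str] = {
    ("HDD",): "hard disk",
    ("hard", "drive"): "hard disk",
    ("fixed", "disk"): "hard disk",
}
MAX_LEN = max(len(phrase) for phrase in SYNONYMS)

def normalize(tokens: list[str]) -> list[str]:
    """Replace synonym phrases with canonical keywords, longest match first."""
    out: list[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase that could start at position i first.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in SYNONYMS:
                out.append(SYNONYMS[phrase])
                i += n
                break
        else:
            # No synonym phrase starts here: keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out

print(normalize(["the", "hard", "drive", "is", "not", "recognized"]))
# ['the', 'hard disk', 'is', 'not', 'recognized']
```

Longest-match-first prevents a shorter entry from consuming part of a longer phrase.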

The concept extraction rule DB 125 stores concept extraction rules, each of which pairs one or more keywords with a concept that represents the meaning of those keywords. The document data concept extraction unit 130 extracts, based on the keywords contained in each document data, a document concept, that is, the concept corresponding to that document data. Specifically, the document data concept extraction unit 130 according to this embodiment applies the concept extraction rules stored in the concept extraction rule DB 125 to the one or more keywords contained in the document data and converts them into the corresponding concept. The search index DB 135 stores, for each document data, the correspondence between that document data and the document concept that the document data concept extraction unit 130 extracted from it.
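A minimal sketch of rule-based concept extraction and the resulting search index follows, under stated assumptions. Each hypothetical rule fires when all of its keywords occur in a normalized document, and the index maps each extracted concept to the ids of the documents that have it. The rule contents and function names are invented for illustration. The sketch checks keyword co-occurrence only and ignores syntactic structure.

```python
# Hypothetical concept extraction rules (stand-ins for DB 125). A rule fires
# when all of its keywords occur; syntactic structure is ignored here.
RULES: list[tuple[frozenset[str], str]] = [
    (frozenset({"hard disk", "recognize[neg]"}), "/defect/hardware/hard disk"),
    (frozenset({"display", "flicker"}), "/defect/hardware/display/flicker"),
    (frozenset({"fan", "noise"}), "/defect/hardware/noise"),
]

def extract_concepts(keywords: set[str]) -> set[str]:
    """Return the concept of every rule whose keywords all appear."""
    return {concept for required, concept in RULES if required <= keywords}

def build_index(docs: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each extracted concept to the ids of documents that have it."""
    index: dict[str, set[str]] = {}
    for doc_id, keywords in docs.items():
        for concept in extract_concepts(keywords):
            index.setdefault(concept, set()).add(doc_id)
    return index

docs = {
    "d1": {"hard disk", "recognize[neg]"},
    "d2": {"display", "flicker"},
    "d3": {"hard disk", "recognize"},  # no rule fires
}
print(build_index(docs))
# {'/defect/hardware/hard disk': {'d1'}, '/defect/hardware/display/flicker': {'d2'}}
```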

Like the document data normalization unit 120, the search sentence normalization unit 140 performs morphological analysis and syntactic analysis of the search sentence and normalizes it. Like the document data concept extraction unit 130, the search sentence concept extraction unit 145 extracts, based on the keywords contained in the search sentence, a search sentence concept, that is, the concept corresponding to the search sentence.

The concept search unit 150 uses the search index DB 135 to search the plurality of document data for document data having a document concept that corresponds to the search sentence concept. Specifically, the concept search unit 150 searches for document data for which the search sentence concept lies above or below the document concept in the hierarchy. Based on instructions from the user of the search system 10, the concept selection support unit 155 helps generalize the search sentence concept to a higher-level concept and/or specialize it to a lower-level concept.
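The generalize/specialize support can be sketched with two small helpers. `broaden` moves the search concept one level up, and `narrow` lists the concepts one level down so that the user can pick one. The text does not specify how the concept selection support unit 155 presents choices, so the helpers and the small hierarchy excerpt are hypothetical.

```python
from typing import Optional

# Hypothetical excerpt of the FIG. 2 defect hierarchy as path strings.
HIERARCHY = [
    "/defect",
    "/defect/hardware",
    "/defect/hardware/hard disk",
    "/defect/hardware/display",
    "/defect/hardware/display/flicker",
    "/defect/hardware/noise",
]

def broaden(concept: str) -> Optional[str]:
    """Parent concept (one level up), or None at the root."""
    parent = concept.rsplit("/", 1)[0]
    return parent or None

def narrow(concept: str) -> list[str]:
    """Concepts exactly one level below, offered to the user as choices."""
    depth = concept.count("/")
    return [c for c in HIERARCHY
            if c.startswith(concept + "/") and c.count("/") == depth + 1]

print(broaden("/defect/hardware/hard disk"))  # /defect/hardware
print(narrow("/defect/hardware"))
# ['/defect/hardware/hard disk', '/defect/hardware/display', '/defect/hardware/noise']
```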

The search result output unit 160 outputs the document data found by the concept search unit 150 as the document data containing the content specified by the search sentence.

With the search system 10 described above, the inclusion relationships among concepts are defined systematically in advance as a hierarchical structure, and document data are searched with the inclusion relationship between the search sentence concept and the document concepts taken into account. Both the search sentence and the document data can therefore be mapped appropriately to concepts, and the search properly reflects the content of the search sentence.

FIG. 2 shows an example of the defect concept hierarchy stored in the concept DB 105 according to the embodiment of the present invention.
As an example, the concept DB 105 according to this embodiment stores a hierarchical structure in which a concept identifying an object or component is placed in an upper hierarchy, and concepts describing the state or other properties of that object or component are placed in lower hierarchies.

The defect concept hierarchy systematically organizes the concepts that identify the defects that may occur in a product. It uses a hierarchical structure in which a concept that semantically includes another concept is placed above it. For example, the node "hardware" in the hierarchy of FIG. 2 is the concept meaning "there is a defect in the hardware" and is written "/defect/hardware". The node "hard disk" is the concept meaning "there is a defect in the hard disk (which is part of the hardware)" and is written "/defect/hardware/hard disk".

In this embodiment, a lower-level concept is semantically included in the concept above it. For example, "/defect/hardware/hard disk", meaning "there is a defect in the hard disk (part of the hardware)", is one form of "/defect/hardware", meaning "there is a defect in the hardware". The two are in an inclusion relationship: whenever the lower-level concept holds, the upper-level concept also holds. Similarly, "/defect/hardware/display/flicker", meaning "the display (part of the hardware) has a flickering defect", is one form of "/defect/hardware/display", meaning "there is a defect in the display (part of the hardware)", and is included in that higher-level concept.
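The inclusion rule ("if a lower concept holds, every concept above it holds") can be made concrete by storing part of the FIG. 2 hierarchy as a child-to-parent map and walking upward. The map below is a hypothetical excerpt, and a path-independent parent map is only one possible storage format for the concept DB 105.

```python
# Hypothetical excerpt of the FIG. 2 defect hierarchy as a child -> parent
# map; None marks the root.
PARENT = {
    "/defect": None,
    "/defect/hardware": "/defect",
    "/defect/hardware/hard disk": "/defect/hardware",
    "/defect/hardware/display": "/defect/hardware",
    "/defect/hardware/display/flicker": "/defect/hardware/display",
    "/defect/hardware/noise": "/defect/hardware",
}

def implied(concept: str) -> list[str]:
    """The concept plus every ancestor: all of them hold if the concept holds."""
    chain = []
    node = concept
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain

print(implied("/defect/hardware/display/flicker"))
# ['/defect/hardware/display/flicker', '/defect/hardware/display', '/defect/hardware', '/defect']
```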

In this embodiment, the concept DB 105 also stores the concepts in a hierarchical structure in which a concept indicating that a product component is defective (for example "/defect/hardware") has lower-level concepts of two kinds. One kind indicates the defective state of that component (for example "/defect/hardware/noise"). The other kind indicates a sub-component of that component (for example "/defect/hardware/hard disk").

FIG. 3 shows an example of the product concept hierarchy stored in the product DB 106 according to the embodiment of the present invention.
The product concept hierarchy systematically organizes, as a hierarchical structure, the inclusion relationships among the concepts that represent the product names of a plurality of products. As with the defect concept hierarchy, this embodiment uses a hierarchical structure in which a concept that includes another concept is placed above it. For example, "/product/PC product/notebook/series A" in FIG. 3 is a concept representing a generic product name that includes "/product/PC product/notebook/series A/model A1" and "/product/PC product/notebook/series A/model A2".

The product concept hierarchy described above may be stored in the concept DB 105 as a hierarchical structure separate from the defect hierarchy. In this case, the concept DB 105 stores each of the plurality of concepts as a node in one of a plurality of mutually distinct hierarchical structures (a first hierarchical structure, a second hierarchical structure, and so on).

FIG. 4 shows an example of the component concept hierarchy stored in the component DB 107 according to the embodiment of the present invention.
The component concept hierarchy systematically organizes, as a hierarchical structure, the inclusion relationships among the concepts that represent the components of a product. As with the defect concept hierarchy, this embodiment uses a hierarchical structure in which a concept that includes another concept is placed above it. For example, "component/hardware" in FIG. 4 is a concept representing a component that includes "component/hardware/hard disk", "component/hardware/CPU", "component/hardware/CD drive", and "component/hardware/keyboard".

The component concept hierarchy described above may also be stored in the concept DB 105 as a hierarchical structure separate from the defect hierarchy.

FIG. 5 shows the operation flow of the search system 10 according to the embodiment of the present invention.
First, the document data normalization unit 120 uses the dictionary DB 110 to perform text analysis, such as morphological analysis and syntactic analysis, of each document data stored in the document DB 100 (S500). Next, the document data normalization unit 120 normalizes the document data by replacing each phrase extracted by the text analysis with the keyword corresponding to the standard written form of that phrase (S505). If the dictionary records a concept that is uniquely determined by the phrase, the document data normalization unit 120 uses that concept for the normalization.

Next, the document data concept extraction unit 130 extracts the concept of each document data (its document concept) based on the keywords contained in it (S510). When the concept DB 105 stores a plurality of concept hierarchies, the document data concept extraction unit 130 extracts, for each document data, the document concepts belonging to each of those hierarchies. For example, when the concept DB 105 stores each concept as a node of either a first or a second hierarchical structure, the document data concept extraction unit 130 may extract, for each document data, a first document concept belonging to the first hierarchical structure and a second document concept belonging to the second hierarchical structure. Based on the keywords contained in the document data, the document data concept extraction unit 130 may further extract a document concept indicating a product name described in the document data and/or a document concept indicating a component described in it. The document data concept extraction unit 130 then creates a search index in which the document concepts of each document data are attached to the information identifying that document data, and stores the index in the search index DB 135 (S515).
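When concepts belong to several hierarchies (defect, product, component), the index built in S515 can keep them apart. The sketch below assumes that the first path segment names the hierarchy a concept belongs to. That naming convention, the function names, and the sample data are hypothetical.

```python
def hierarchy_of(concept: str) -> str:
    """Assumed convention: the first path segment names the hierarchy."""
    return concept.strip("/").split("/")[0]

def build_search_index(doc_concepts: dict[str, set[str]]) -> dict[str, dict[str, set[str]]]:
    """hierarchy name -> concept -> ids of documents carrying that concept."""
    index: dict[str, dict[str, set[str]]] = {}
    for doc_id, concepts in doc_concepts.items():
        for concept in concepts:
            postings = index.setdefault(hierarchy_of(concept), {})
            postings.setdefault(concept, set()).add(doc_id)
    return index

# Built once, before any search sentence is entered (S515).
doc_concepts = {
    "d1": {"/defect/hardware/hard disk",
           "/product/PC product/notebook/series A/model A1",
           "component/hardware/hard disk"},
    "d2": {"/defect/hardware/display/flicker",
           "/product/PC product/notebook/series A/model A2"},
}
index = build_search_index(doc_concepts)
print(sorted(index))  # ['component', 'defect', 'product']
```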

When a search sentence is input (S520), the search sentence normalization unit 140 performs text analysis of the search sentence in the same manner as the document data normalization unit 120 (S525). Next, the search sentence normalization unit 140 normalizes the search sentence by replacing each phrase it contains with a keyword that is a synonym of that phrase (S530).

Next, the search sentence concept extraction unit 145 extracts search sentence concepts from the search sentence in the same manner as the document data concept extraction unit 130 (S535). In the present embodiment, the search sentence concept extraction unit 145 extracts the search sentence concept corresponding to the search sentence that the user entered in S520 to search for product defects. The search sentence concept extraction unit 145 may further extract, based on the keywords in the search sentence, a search sentence concept indicating a component described in the search sentence and/or a search sentence concept indicating a product name described in the search sentence.

Next, the concept search unit 150 performs a concept search of the document data based on the search sentence concepts extracted from the search sentence and the document concepts extracted from the document data (S540). More specifically, the concept search unit 150 selects a document data as corresponding to the search sentence when its document concept is identical to the search sentence concept, or when its document concept is a higher-level or lower-level concept of the search sentence concept. In doing so, when the search sentence concept is a higher-level or lower-level concept of a document concept that was stored in the search index DB 135 before the search sentence was input, the concept search unit 150 outputs the document data corresponding to that document concept as a search result. The concept search unit 150 can therefore search using document concepts that have already been extracted, which is faster than extracting document concepts from every document data at each search.
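The matching rule of S540 (identical concept, or one concept above or below the other in the same hierarchy) can be expressed with path prefixes. A minimal sketch, assuming concepts are written as slash-separated paths such as "/defect/hardware/hard disk" and that the index maps document IDs to sets of path tuples; the function names are hypothetical.

```python
def parse_concept(path):
    """Turn '/defect/hardware/hard disk' into ('defect', 'hardware', 'hard disk')."""
    return tuple(part for part in path.split("/") if part)


def is_ancestor(a, b):
    """True if concept a lies strictly above concept b in the same hierarchy."""
    return len(a) < len(b) and b[:len(a)] == a


def concepts_related(search_concept, doc_concept):
    """Identical, or one concept is a higher-level concept of the other."""
    return (search_concept == doc_concept
            or is_ancestor(search_concept, doc_concept)
            or is_ancestor(doc_concept, search_concept))


def concept_search(search_concept, index):
    """Return IDs of documents having at least one related document concept (S540)."""
    return sorted(doc_id for doc_id, concepts in index.items()
                  if any(concepts_related(search_concept, c) for c in concepts))


index = {
    "d1": {parse_concept("/defect/hardware/hard disk")},
    "d2": {parse_concept("/defect/hardware")},
    "d3": {parse_concept("/defect/software")},
}
```

Here a query for "/defect/hardware/hard disk" retrieves both d1 (identical) and d2 (higher-level document concept), but not d3.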

Next, the search result output unit 160 outputs the document data found by the concept search unit 150 as the search result (S545). In the present embodiment, the search result output unit 160 outputs the document data found by the concept search unit 150 as document data describing the product defect that the user entered.

When the search system 10 next receives a search sentence, it returns to S520. When new document data is added to the document DB 100, the search system 10 returns to S500, extracts the document concepts from that document data, and stores them in the search index DB 135.

With the search system 10 described above, the inclusion relationships between concepts are systematically defined in advance by predetermined hierarchical structures, and document data can be searched while taking into account the inclusion relationship between the search sentence concept and the document concept. The search sentence and the document data can thus be mapped to appropriate concepts, so that the search faithfully reflects the content of the search sentence. This capability is especially useful when a limited set of concepts must be defined precisely and searched accurately, for example when inquiries about products and the answers to them are stored in a database and used to respond to new inquiries.

FIG. 6 shows an example of the normalization rules stored in the synonym DB 115 according to the embodiment of the present invention.
The synonym DB 115 stores normalization rules such as the one illustrated in FIG. 6 in order to normalize search sentences and document data. The rule in FIG. 6 normalizes the expressions 電源を切る (literally "cut the power") and 電源を落とす (literally "drop the power"), both meaning "turn off the power", by replacing the verbs 切る ("cut") and 落とす ("drop") with their synonym keyword 遮断（する） ("shut off"). When the syntactic analysis of document data or a search sentence detects an expression whose subject is "power" and whose predicate is "cut" or "drop", the document data normalization unit 120 and the search sentence normalization unit 140 replace the predicate with "shut off". Beyond replacing single words, the document data normalization unit 120 and the search sentence normalization unit 140 may also normalize whole expressions according to normalization rules, for example replacing the idiom 腹を立てる ("take offense") with 怒る ("get angry"), or 激怒する ("be furious") with 非常に怒る ("get very angry"). The document data concept extraction unit 130 then extracts document concepts from the normalized document data, and the search sentence concept extraction unit 145 extracts search sentence concepts from the normalized search sentence.
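The FIG. 6 rule conditions the replacement on both the subject and the predicate of a parsed clause. The sketch below shows that idea on (subject, predicate) pairs already produced by syntactic analysis, using English glosses. It is a minimal sketch under stated assumptions: the rule encoding `NORMALIZATION_RULES` and the function `normalize_clause` are hypothetical, and a real system would operate on the parser's dependency structure rather than on bare pairs.

```python
# Hypothetical encoding of the FIG. 6 rule: when the subject is "power",
# the predicates "cut" and "drop" are both normalized to "shut off".
NORMALIZATION_RULES = [
    {"subject": "power", "predicates": {"cut", "drop"}, "keyword": "shut off"},
]


def normalize_clause(subject, predicate, rules=NORMALIZATION_RULES):
    """Return (subject, predicate), replacing the predicate if a rule applies."""
    for rule in rules:
        if subject == rule["subject"] and predicate in rule["predicates"]:
            return subject, rule["keyword"]
    return subject, predicate
```

Conditioning on the subject keeps the rule from rewriting "drop" in unrelated clauses, such as dropping a device.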

By normalizing synonyms at the phrase level before the concept search, the accuracy of retrieving document data corresponding to a search sentence can be improved.

FIG. 7 shows an example of the concept extraction rules stored in the concept extraction rule DB 125 according to the embodiment of the present invention.
The concept extraction rule DB 125 stores concept extraction rules such as the one illustrated in FIG. 7 in order to predefine the concepts to be extracted from search sentences and document data. A concept extraction rule converts one or more keywords in the syntax tree, based on the sentence structure (dependency relations and so on) obtained by text analysis, into a concept expressing the meaning of those keywords. In FIG. 7, the rule extracts the concept "/defect/hardware/hard disk" from the sentence "the hard disk cannot be recognized", based on the keywords "hard disk" and "recognize" extracted from that sentence and the negation attribute on the dependency of "recognize" (hitei="1", where hitei means "negation").

When the document data contains the one or more keywords of any concept extraction rule stored in the concept extraction rule DB 125, the document data concept extraction unit 130 extracts the concept of that rule as a document concept. Similarly, when the search sentence contains the one or more keywords of any concept extraction rule stored in the concept extraction rule DB 125, the search sentence concept extraction unit 145 extracts the concept of that rule as a search sentence concept.
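A rule of the FIG. 7 kind fires when all of its keywords appear and the required keywords carry negation. The sketch below assumes the text analysis has already reduced a sentence to a mapping from each keyword to its negation flag (the hitei attribute). `ConceptRule`, `RULES`, and `extract_concepts` are hypothetical names, and only the FIG. 7 defect rule comes from the source; the component and product rules are illustrative additions.

```python
from typing import NamedTuple


class ConceptRule(NamedTuple):
    keywords: frozenset   # keywords that must all appear in the sentence
    negated: frozenset    # keywords whose dependency must carry negation (hitei="1")
    concept: str          # concept emitted when the rule fires


RULES = [
    ConceptRule(frozenset({"hard disk", "recognize"}), frozenset({"recognize"}),
                "/defect/hardware/hard disk"),
    ConceptRule(frozenset({"hard disk"}), frozenset(), "/component/hardware/hard disk"),
    ConceptRule(frozenset({"model a1"}), frozenset(),
                "/product/PC product/notebook/series A/model A1"),
]


def extract_concepts(tokens, rules=RULES):
    """tokens maps keyword -> negation flag; return the concept of every rule that fires."""
    found = []
    for rule in rules:
        if rule.keywords.issubset(tokens) and all(tokens[k] for k in rule.negated):
            found.append(rule.concept)
    return found
```

For a sentence like "Model A1 does not recognize the hard disk", the input `{"model a1": False, "hard disk": False, "recognize": True}` fires all three rules, one concept per hierarchy.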

Concept extraction rules may also be defined using not only keywords, dependencies, and attributes but also generic terms such as "problem" or "request", and the document data concept extraction unit 130 and the search sentence concept extraction unit 145 may extract concepts based on such rules. For example, the document data concept extraction unit 130 and the search sentence concept extraction unit 145 may extract the concept "/defect/hardware/hard disk" from "a problem with the hard disk".
The concept extraction rule DB 125 may further store concept extraction rules not only for the defect concept hierarchy but also for the product and component concept hierarchies.

Through the above processing, the document data concept extraction unit 130 and the search sentence concept extraction unit 145 can extract three concepts, "defect/hardware/hard disk", "product/PC product/notebook/series A/model A1", and "component/hardware/hard disk", from the sentence "the Notebook Series A Model A1 does not recognize the hard disk".

Thus, the search system 10 of the present embodiment allows corresponding concepts to be defined in advance according to one or more keywords and their dependency relations. Natural language sentences can thereby be converted into concepts systematized for the application field of the search system 10.

FIG. 8 shows the configuration of the concept search unit 150 according to the embodiment of the present invention. The concept search unit 150 includes a same concept output unit 800, a superordinate concept acquisition unit 810, a generalized concept output unit 820, a subordinate concept acquisition unit 830, and a specialized concept output unit 840.

When a search sentence concept matches a document concept and the search sentence concept is not to be specialized, the same concept output unit 800 outputs that document data to the search result output unit 160 as a search result. When the search sentence concept does not match the document concept, the superordinate concept acquisition unit 810 acquires a search sentence superordinate concept, that is, a concept at a higher level of the hierarchy than the search sentence concept. When the search sentence superordinate concept matches the document concept, the generalized concept output unit 820 outputs that document data as a search result. When the same document data can still be retrieved after replacing the search sentence concept with a search sentence subordinate concept, that is, a concept at a lower level of the hierarchy, the subordinate concept acquisition unit 830 replaces the search sentence concept with that subordinate concept. The specialized concept output unit 840 outputs the document data whose document concept matches the search sentence subordinate concept as a search result.

FIG. 9 shows the operation flow of the concept search unit 150 according to the embodiment of the present invention.
First, the concept search unit 150 receives the one or more search sentence concepts extracted from the search sentence. For each document data, the concept search unit 150 also receives the one or more document concepts extracted from that document data. When a search sentence concept matches a document concept (S900: Yes), the same concept output unit 800 advances the process to S940. Then, provided that the search sentence concept cannot be replaced with a search sentence subordinate concept (S945: No), the same concept output unit 800 outputs that document data to the search result output unit 160 as a search result (S910). When a plurality of search sentence concepts and a plurality of document concepts have been extracted for a plurality of hierarchies, the same concept output unit 800 outputs a document data as a search result only if every search sentence concept is identical to one of its document concepts. For example, when the three search sentence concepts "defect/hardware/hard disk", "product/PC product/notebook/series A/model A1", and "component/hardware/hard disk" are extracted, the same concept output unit 800 outputs as search results the document data that have all three concepts among their document concepts.
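With several hierarchies, the identical-concept test of S900 is an AND condition: every search sentence concept must appear among the document's concepts. A minimal sketch, assuming the index maps document IDs to sets of concept path strings; `same_concept_match` is a hypothetical name, and the S945 check for a possible specialization is left out.

```python
def same_concept_match(search_concepts, index):
    """Return IDs of documents whose concepts include every search sentence concept.

    index maps a document ID to the set of its document concepts (path strings).
    """
    wanted = set(search_concepts)
    return sorted(doc_id for doc_id, concepts in index.items() if wanted <= set(concepts))


index = {
    "d1": {"defect/hardware/hard disk",
           "product/PC product/notebook/series A/model A1",
           "component/hardware/hard disk"},
    "d2": {"defect/hardware/hard disk", "component/hardware/hard disk"},
}
query = ["defect/hardware/hard disk",
         "product/PC product/notebook/series A/model A1",
         "component/hardware/hard disk"]
```

With all three concepts only d1 qualifies; dropping the product concept would also admit d2.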

On the other hand, when the search sentence concept does not match the document concept (S900: No), the superordinate concept acquisition unit 810 acquires a search sentence superordinate concept (S920). When a plurality of search sentence concepts (for example, a first and a second search sentence concept) have been extracted and they are not identical to the first and second document concepts respectively, the superordinate concept acquisition unit 810 acquires a first search sentence superordinate concept at a higher level than the first search sentence concept and a second search sentence superordinate concept at a higher level than the second search sentence concept.

When there is a search sentence concept belonging to the defect concept hierarchy, which indicates that a component is defective or describes the state of a component's defect, the superordinate concept acquisition unit 810 of the present embodiment acquires a higher-level concept of it as one of the search sentence superordinate concepts. Likewise, when there is a search sentence concept belonging to the component concept hierarchy, it acquires a higher-level concept of that component concept as one of the search sentence superordinate concepts, and when there is a search sentence concept belonging to the product concept hierarchy, it acquires a higher-level concept of that product-name concept as one of the search sentence superordinate concepts.

For example, when the three search sentence concepts "defect/hardware/hard disk", "product/PC product/notebook/series A/model A1", and "component/hardware/hard disk" are extracted, the superordinate concept acquisition unit 810 acquires the three search sentence superordinate concepts "defect/hardware", "product/PC product/notebook/series A", and "component/hardware" from the concept DB 105, the product DB 106, and the component DB 107. As a result, when the search sentence is generalized by one level using these superordinate concepts, the following three generalized forms of the search sentence are obtained.

(1) The combination of the first search sentence superordinate concept "defect/hardware", the search sentence concept "product/PC product/notebook/series A/model A1", and the search sentence concept "component/hardware/hard disk"
For example, when the document data concept extraction unit 130 extracts, from the keywords in the document data, a document concept indicating that a component is defective, and the search sentence concept extraction unit 145 extracts, from the keywords in the search sentence, a search sentence concept indicating that part of that component is defective, the superordinate concept acquisition unit 810 acquires, as the search sentence superordinate concept, the higher-level concept indicating that the component is defective.

Similarly, when the document data concept extraction unit 130 extracts, from the keywords in the document data, a document concept indicating that a component is defective, and the search sentence concept extraction unit 145 extracts, from the keywords in the search sentence, a search sentence concept describing the state of that component's defect, the superordinate concept acquisition unit 810 acquires, as the search sentence superordinate concept, the higher-level concept indicating that the component is defective. The search result output unit 160 can then output, as search results, the document data whose document concept, indicating that the component is defective, matches the search sentence superordinate concept.

(2) The combination of the search sentence concept "defect/hardware/hard disk", the second search sentence superordinate concept "product/PC product/notebook/series A", and the search sentence concept "component/hardware/hard disk"
For example, when the document data concept extraction unit 130 extracts, from the keywords in the document data, a document concept indicating a product name, and the search sentence concept extraction unit 145 extracts, from the keywords in the search sentence, a search sentence concept indicating a product name at a lower level than that product name, the superordinate concept acquisition unit 810 acquires, as the search sentence superordinate concept, the concept corresponding to the higher-level product name.

(3) The combination of the search sentence concept "defect/hardware/hard disk", the search sentence concept "product/PC product/notebook/series A/model A1", and the third search sentence superordinate concept "component/hardware"
For example, when the document data concept extraction unit 130 extracts, from the keywords in the document data, a document concept indicating a component, and the search sentence concept extraction unit 145 extracts, from the keywords in the search sentence, a search sentence concept indicating a sub-component of that component, the superordinate concept acquisition unit 810 acquires, as the search sentence superordinate concept, the concept corresponding to the higher-level component.

The superordinate concept acquisition unit 810 may also acquire, as the search sentence superordinate concept, a concept located several levels above the search sentence concept. In this case, the superordinate concept acquisition unit 810 may successively replace the search sentence concept with higher-level concepts until the search sentence superordinate concept matches a document concept, and when the superordinate concept at some level matches (S930: Yes), decide to use that search sentence superordinate concept. When several combinations of search sentence concepts and/or search sentence superordinate concepts are obtained for a search sentence, the generalized concept output unit 820 selects an appropriate combination (S935).
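The multi-level generalization of S920 and S930 amounts to trimming the concept path one segment at a time until it equals some stored document concept. A minimal sketch, assuming concepts are path tuples and that "matches" means exact equality, as in the paragraph above; `generalize_until_match` is a hypothetical name.

```python
def generalize_until_match(search_concept, document_concepts):
    """Climb from search_concept toward the root until it equals a document concept.

    Returns the matching (possibly generalized) concept, or None when no level
    of the path matches any document concept.
    """
    concept = tuple(search_concept)
    while concept:
        if concept in document_concepts:
            return concept        # S930: Yes, use this superordinate concept
        concept = concept[:-1]    # S920: replace with the next higher-level concept
    return None


document_concepts = {
    ("product", "PC product", "notebook", "series A"),
    ("product", "PC product", "notebook", "series B", "model B1"),
}
```

A query for model A3 of series A, for which no document exists, climbs one level and stops at ("product", "PC product", "notebook", "series A").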

In this processing, the generalized concept output unit 820 selects the search sentence superordinate concepts that retrieve document data with a larger amount of information. For example, let the first document data be those for which the first search sentence superordinate concept equals the first document concept and the second search sentence concept equals the second document concept, and let the second document data be those for which the first search sentence concept equals the first document concept and the second search sentence superordinate concept equals the second document concept. When the number of first document data is smaller than the number of second document data, the generalized concept output unit 820 outputs the first document data as the search result. The generalized concept output unit 820 can thereby select the document data to be output more appropriately.
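The selection in S935 can be read as: among the one-level generalizations that retrieve anything, prefer the one that retrieves the fewest documents, as in the first-versus-second example above. The sketch below follows that reading only. Treating "fewest non-zero hits" as the measure of information content is an assumption drawn from the example, exact equality is used for matching, and all names are hypothetical.

```python
def generalization_candidates(search_concepts):
    """Yield combinations in which exactly one concept is lifted by one level."""
    for i, concept in enumerate(search_concepts):
        if len(concept) > 1:
            lifted = list(search_concepts)
            lifted[i] = concept[:-1]
            yield tuple(lifted)


def count_matches(combo, index):
    """Number of documents having every concept in the combination."""
    return sum(1 for concepts in index.values() if set(combo) <= concepts)


def pick_generalization(search_concepts, index):
    """Among candidates with at least one hit, pick the one with the fewest hits (S935)."""
    scored = [(count_matches(c, index), c) for c in generalization_candidates(search_concepts)]
    scored = [(n, c) for n, c in scored if n > 0]
    return min(scored, key=lambda pair: pair[0])[1] if scored else None


search_concepts = (("defect", "hardware", "hard disk"), ("product", "notebook", "series A", "model A1"))
index = {
    "d1": {("defect", "hardware"), ("product", "notebook", "series A", "model A1")},
    "d2": {("defect", "hardware", "hard disk"), ("product", "notebook", "series A")},
    "d3": {("defect", "hardware", "hard disk"), ("product", "notebook", "series A")},
}
```

Lifting the defect concept retrieves one document and lifting the product concept retrieves two, so the sketch keeps the narrower first combination; ties go to the first candidate generated.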

Next, the subordinate concept acquisition unit 830 acquires a search sentence subordinate concept, that is, a lower-level concept of the search sentence concept from S900 or of the search sentence concept obtained in S935 (S940). If the search sentence concept obtained in S935 has no search sentence subordinate concept (S945: No), the generalized concept output unit 820 outputs the document data whose document concept matches the search sentence superordinate concept (see S930) to the search result output unit 160 as a search result (S910).
When a plurality of search sentence superordinate concepts have been acquired, the generalized concept output unit 820 replaces at least one search sentence concept with its higher-level superordinate concept and outputs as search results the document data whose document concepts match all of the resulting search sentence concepts and superordinate concepts. That is, for example, when a first and a second search sentence superordinate concept have been acquired, the generalized concept output unit 820 raises at least one of the first and second search sentence concepts to its higher-level concept and then outputs as search results the document data having a document concept that matches the first search sentence concept and a document concept that matches the second search sentence concept.

On the other hand, when a search sentence subordinate concept exists (S945: Yes), the subordinate concept acquisition unit 830 checks whether every document data having a document concept identical to the search sentence concept also has a document concept identical to the search sentence subordinate concept, a lower-level concept of the search sentence concept. If so (S950: Yes), it replaces the search sentence concept with that search sentence subordinate concept and returns to S940. The subordinate concept acquisition unit 830 then repeats S940 and S945 to specialize the search sentence concept further.

In this way, the subordinate concept acquisition unit 830 successively replaces the search sentence concept with lower-level concepts until the above condition no longer holds (S950: No). The subordinate concept acquisition unit 830 can thus select, as the search sentence subordinate concept, a concept located several levels below the search sentence concept, and can therefore choose a search sentence concept suited to the document concepts of the document data stored in the search index DB 135.
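The specialization loop of S940 to S950 descends one level whenever every matching document shares a single child concept. A minimal sketch under two assumptions: a document "has" a concept when one of its concept paths equals it or lies below it (the reading used in the FIG. 10 example), and the loop stops when no single child is shared by all matches. `covers` and `specialize` are hypothetical names.

```python
def covers(concept, doc_concepts):
    """True if some document concept equals `concept` or lies below it."""
    n = len(concept)
    return any(dc[:n] == concept for dc in doc_concepts)


def specialize(concept, index):
    """Descend while every matching document shares one child concept (S940-S950)."""
    concept = tuple(concept)
    while True:
        matching = [dcs for dcs in index.values() if covers(concept, dcs)]
        if not matching:
            return concept
        n = len(concept)
        children = {dc[:n + 1] for dcs in matching for dc in dcs
                    if len(dc) > n and dc[:n] == concept}
        # S950: specialize only if exactly one child is shared by all matches.
        shared = [ch for ch in children if all(covers(ch, dcs) for dcs in matching)]
        if len(shared) != 1:
            return concept        # S950: No
        concept = shared[0]       # S950: Yes, replace and repeat S940


index = {f"d{i}": {("PC", "series A", "model A2")} for i in range(5)}
index["d5"] = {("PC", "series B", "model B1")}
```

Starting from ("PC", "series A"), all five matches share model A2, so the concept is specialized to it; starting from ("PC",) the matches split between series A and series B, so nothing changes.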

The specialized concept output unit 840 then outputs the document data whose document concept matches the search sentence subordinate concept (S950: No) as search results (S910).

With the concept search unit 150 described above, the search sentence concept is generalized or specialized according to the search results, so that the target document data can be retrieved appropriately.

If, as a result of the above processing, none of the obtained search sentence concepts has a lower-level concept, the search result output unit 160 displays, in S545 of FIG. 5, a list of the document data whose document concepts match the search sentence concepts. On the other hand, when any search sentence concept has two or more subordinate concepts, the concept selection support unit 155 may display to the user the number of document data that would be retrieved if the search sentence concept were replaced with each of those subordinate concepts, and let the user select one of them. Alternatively, the search result output unit 160 may select one of the subordinate concepts automatically based on the number of document data retrieved, using a criterion such as reducing entropy.
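The counts that the concept selection support unit 155 shows for each subordinate concept can be computed with a single pass over the index. The source does not define the entropy criterion for automatic selection, so this sketch stops at the per-child counts. It assumes path-tuple concepts, and `child_counts` is a hypothetical name.

```python
from collections import Counter


def child_counts(concept, index):
    """Number of documents retrieved if `concept` were replaced by each child concept."""
    n = len(concept)
    counts = Counter()
    for doc_concepts in index.values():
        # A document counts once per child, even if several of its concepts lie below it.
        children = {dc[:n + 1] for dc in doc_concepts if len(dc) > n and dc[:n] == concept}
        counts.update(children)
    return counts


index = {
    "d1": {("series A", "model A1", "rev 1")},
    "d2": {("series A", "model A2")},
    "d3": {("series A", "model A1"), ("series A", "model A2")},
}
```

Using a set per document keeps d1 from being counted twice for model A1 if it carried several concepts below that model.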

FIG. 10 shows an example of generalization and specialization by the concept search unit 150 according to the embodiment of the present invention.
When no document data has a document concept matching the search sentence concept, the superordinate concept acquisition unit 810 generalizes the search sentence concept as shown in S920 of FIG. 9. For example, if the search sentence concept in the figure is "…/series A/model A3", the same concept output unit 800 cannot find any document data having a document concept identical to "…/series A/model A3". The superordinate concept acquisition unit 810 therefore generalizes the search sentence concept "…/series A/model A3" by replacing it with the higher-level search sentence superordinate concept "…/series A".

This generalization retrieves five document data having a document concept identical to the search sentence superordinate concept. In the case shown in the figure, the number of document data corresponding to "…/series A" equals the number corresponding to "…/series A/model A2". This shows that every document data having the same document concept as the generalized search sentence concept (that is, the search sentence superordinate concept) also has the same document concept as the search sentence subordinate concept "…/series A/model A2". When all document data having the same document concept as the search sentence concept also have the same document concept as a search sentence subordinate concept, the subordinate concept acquisition unit 830 specializes the search sentence concept by replacing it with that subordinate concept. The subordinate concept acquisition unit 830 can thus specialize the search sentence concept unambiguously.
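The FIG. 10 walk-through, generalizing model A3 to series A and then specializing to model A2, can be reproduced end to end on toy data. This is a minimal sketch, assuming the elided "…" prefix is simply dropped, that a document "has" a concept when one of its concept paths equals it or lies below it, and that the five documents and the series B document are invented to match the figure's counts. `generalize_then_specialize` is a hypothetical name.

```python
def covers(concept, doc_concepts):
    """True if some document concept equals `concept` or lies below it."""
    n = len(concept)
    return any(dc[:n] == concept for dc in doc_concepts)


def matching_docs(concept, index):
    return [doc_id for doc_id, dcs in index.items() if covers(concept, dcs)]


def generalize_then_specialize(concept, index):
    concept = tuple(concept)
    # Generalize (S920): climb until at least one document is retrieved.
    while concept and not matching_docs(concept, index):
        concept = concept[:-1]
    if not concept:
        return None
    # Specialize (S940-S950): descend while every hit shares exactly one child.
    while True:
        docs = [index[d] for d in matching_docs(concept, index)]
        n = len(concept)
        children = {dc[:n + 1] for dcs in docs for dc in dcs
                    if len(dc) > n and dc[:n] == concept}
        shared = [ch for ch in children if all(covers(ch, dcs) for dcs in docs)]
        if len(shared) != 1:
            return concept
        concept = shared[0]


index = {f"d{i}": {("series A", "model A2")} for i in range(5)}
index["d5"] = {("series B", "model B1")}
```

Generalization finds five documents under series A, and because all five sit under model A2, specialization lands on ("series A", "model A2"), mirroring the figure.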

When a plurality of search sentence concepts are extracted, the lower-level concept acquisition unit 830 may specialize one or more of them by replacing them with their subordinate concepts. It does so only if the same document data can still be retrieved after the replacement.

FIG. 11 shows an example of a display screen 1100 of the search system 10 according to the embodiment of the present invention.
The display screen 1100 includes a search sentence input screen 1110, a concept operation screen 1130, and a search result output screen 1160. The user of the search system 10 enters a search sentence on the search sentence input screen 1110. The search sentence normalization unit 140 has the user enter the product model name and the search sentence on this screen. It receives the instruction to start the search when the search button is pressed. When the concept search unit 150 performs generalization or specialization, the search sentence input screen 1110 reports it, for example as “Series A model A3 has been generalized to Series A.”

The concept operation screen 1130 displays each search sentence concept extracted from the search sentence, as instructed by the concept selection support unit 155. When a plurality of search sentence concepts are extracted, the concept selection support unit 155 shows their relationship (AND condition or OR condition) on the concept operation screen 1130. The concept selection support unit 155 may also display, for each search sentence concept, the number (frequency) of document data that have the same document concept.

The user can operate on each displayed concept:
- Pressing a search sentence concept's delete button instructs the concept selection support unit 155 to remove that search sentence concept from the search conditions, and the unit does so.
- Pressing the superordinate concept button instructs the unit to replace a search sentence concept with its search sentence superordinate concept at a higher level of the hierarchy, and the unit makes that replacement.

By displaying the search sentence concepts on the concept operation screen 1130 and accepting these operations, the concept selection support unit 155 helps the user generalize and/or specialize the search sentence concepts.

The search result output screen 1160 displays the search results output by the search result output unit 160.

As described above, the search system 10 searches document data based on concepts organized in a hierarchical structure, so the results appropriately reflect the content of the search sentence. The user of the search system 10 can carry out the search process efficiently via the display screen 1100.
The search system 10 is not limited to searching product inquiries and answers. It can also serve as a technical information search system that stores various kinds of technical information as document data and searches them based on a search sentence. For example, the search system 10 may store information about various drugs as document data. It can then retrieve document data that conceptually matches a search sentence such as “a protein that causes cancer cells to proliferate.”

FIG. 12 shows the configuration of the reporting system 20 according to the embodiment of the present invention. The reporting system 20 extracts the document concepts of each piece of document data as it is input. When the frequency of a specific document concept reaches or exceeds a predetermined value, it notifies the user. In this embodiment, the reporting system 20 works on document data transcribed from product inquiries received at a call center. It extracts the document concepts that indicate defects from that data. When the frequency of a specific document concept reaches or exceeds the predetermined value, it notifies the user that the corresponding defect is occurring frequently in the product.

The reporting system 20 includes the following components:
- a document DB 100, a concept DB 105, a product DB 106, a component DB 107, a dictionary DB 110, and a synonym DB 115;
- a document data normalization unit 120, a concept extraction rule DB 125, a document data concept extraction unit 130, and a search index DB 135;
- a concept ratio calculation unit 1200 and a relative frequency calculation unit 1210;
- a frequent concept selection unit 1220 and a priority concept selection unit 1230;
- a reference frequency calculation unit 1240 and a notification unit 1250.

The members from the document DB 100 through the search index DB 135 have substantially the same functions and configurations as the identically numbered members shown in FIG. 1. Their description is therefore omitted below except where they differ.

The document DB 100 stores the input document data one item after another. In this embodiment, it stores, for each of a plurality of products, document data that describe the defects of that product. The concept DB 105 stores a plurality of concepts that identify product defects, organized in the hierarchical structure illustrated in FIG. 2.

The concept ratio calculation unit 1200 uses the document concepts stored in the search index DB 135. For each concept, it calculates the number of document data that correspond to that concept as a ratio of the number of document data in the document DB 100. The concept ratio calculation unit 1200 has an all-product concept ratio calculation unit 1203 and a specific product concept ratio calculation unit 1206. The all-product concept ratio calculation unit 1203 calculates this ratio over the plurality of products being compared. For example, suppose the document DB 100 stores 1000 pieces of document data and 35 of them have the document concept “/defect/hardware/hard disk”. The ratio is then 3.5% (35/1000).

The specific product concept ratio calculation unit 1206 works on at least one product, namely a product that is a candidate for a report of frequent defects. For each concept, it calculates that product's document data corresponding to the concept as a ratio of all of that product's document data. For example, suppose the document DB 100 stores 100 pieces of document data for the product “/product/PC product/notebook/series A/model A2” and 10 of them have the document concept “/defect/hardware/hard disk”. The ratio is then 10% (10/100).

The relative frequency calculation unit 1210 calculates a relative frequency for each concept. The relative frequency shows how large the ratio calculated by the specific product concept ratio calculation unit 1206 is compared with the reference ratio for that concept. In this embodiment, the relative frequency calculation unit 1210 uses the ratio calculated by the all-product concept ratio calculation unit 1203 as the reference ratio. In the example above, the relative frequency of the document concept “/defect/hardware/hard disk” for the product “/product/PC product/notebook/series A/model A2” is therefore about 2.9 (10%/3.5%).
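The worked numbers above can be checked with a few lines of Python. The helper `concept_ratio` is a hypothetical name used only for this illustration. The counts (35 of 1000 documents overall, 10 of 100 for series A/model A2) are the ones given in the text.

```python
def concept_ratio(n_with_concept: int, n_docs: int) -> float:
    """Share of documents that carry a given concept."""
    return n_with_concept / n_docs


r_all = concept_ratio(35, 1000)  # all compared products: 3.5 %
r = concept_ratio(10, 100)       # target product (series A / model A2): 10 %
rr = r / r_all                   # relative frequency of "/defect/hardware/hard disk"

print(f"R_all = {r_all:.1%}, R = {r:.1%}, RR = {rr:.1f}")
# R_all = 3.5%, R = 10.0%, RR = 2.9
```

The unrounded relative frequency is 10/3.5 ≈ 2.857, which the text rounds to 2.9.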

The frequent concept selection unit 1220 selects, from the plurality of concepts, those whose relative frequency is equal to or higher than a predetermined threshold. The priority concept selection unit 1230 then compares two candidates: a first concept selected by the frequent concept selection unit 1220, and a second concept above it in the hierarchy. Based on the relative frequencies of the two, it selects one of them. Among the selected concepts that stand in a superordinate/subordinate relationship, the priority concept selection unit 1230 thus chooses the appropriate level of the hierarchy to report.

The reference frequency calculation unit 1240 calculates a reference frequency that is used to decide which concept to report. The notification unit 1250 notifies the user that a relative frequency has become high. This is the relative frequency of whichever concept, first or second, the priority concept selection unit 1230 selected.

The reporting system 20 described above handles the case where document data corresponding to a specific concept is input frequently. It selects an appropriate level of the concept hierarchy and reports to the user that the concept is occurring frequently. Inquiries to the call center are registered one after another as document data in the document DB 100. By using this database, the system can detect and report early that a specific defect is occurring frequently for a certain product.

The reporting system 20 described above may be provided as part of the search system 10 shown in FIG. 1. For example, the search system 10 shown in FIG. 1 may further include the following units shown in FIG. 12: the concept ratio calculation unit 1200, the relative frequency calculation unit 1210, the frequent concept selection unit 1220, the priority concept selection unit 1230, the reference frequency calculation unit 1240, and the notification unit 1250.

FIG. 13 shows an operation flow of the reporting system 20 according to the embodiment of the present invention. Steps in FIG. 13 that carry the same step numbers as in FIG. 5 perform substantially the same operations as in FIG. 5. Their description is omitted below except where they differ.

First, each time document data is input, the reporting system 20 processes it in four steps: text analysis (S500), normalization (S505), concept extraction (S510), and search index creation (S515).

Next, the concept ratio calculation unit 1200 calculates a ratio for each concept or set of concepts (S1340). The ratio is the number of document data corresponding to that concept or set, divided by the number of document data in the document DB 100. More specifically, the all-product concept ratio calculation unit 1203 calculates this ratio $R_{\mathrm{all}}$ over all products using equation (1) below. The specific product concept ratio calculation unit 1206 calculates the ratio $R$ for the product targeted for defect reporting using equation (2) below.

$$R_{\mathrm{all}} = \frac{\#(A_{\mathrm{all}} \cap X)}{\#A_{\mathrm{all}}} \qquad (1)$$
$$R = \frac{\#(A \cap X)}{\#A} \qquad (2)$$
Here, $A_{\mathrm{all}}$ denotes all products and $A$ the product targeted for defect reporting. $X$ is the concept or set of concepts corresponding to the defect, and $\#C$ is the number of document data corresponding to concept $C$. "All products" means the plurality of products that contribute to the reference ratio, on which the relative frequency calculation unit 1210 bases the relative frequency. Take the product concept hierarchy illustrated in FIG. 3, and a level in it at which the defects that occur are considered to follow substantially the same tendency. The reporting system 20 may use the plurality of products under that level as the products contributing to the reference ratio. For example, it may use “series A”, “series B”, “series C”, and so on, which are located below “product/PC product/notebook” in FIG. 3.
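Equations (1) and (2) can be sketched over simple document records. This sketch makes several assumptions not stated in the source:
- each document carries one product concept path and a set of defect concepts;
- "belongs to product $A$" means the product path equals $A$ or lies below it;
- $X$ is matched as a subset of the document's concepts.

The `Doc` record and the `in_scope` and `ratio` helpers are hypothetical names. The synthetic data reproduces the counts used in the text (35 of 1000 overall, 10 of 100 for model A2).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    product: str                       # product concept path (assumed one per document)
    concepts: frozenset = frozenset()  # defect concepts extracted for the document


def in_scope(doc: Doc, product: str) -> bool:
    """True if the document's product equals `product` or lies below it."""
    return doc.product == product or doc.product.startswith(product + "/")


def ratio(docs, product: str, x: frozenset) -> float:
    """#(A ∩ X) / #A: equation (2), or equation (1) when `product` spans all products."""
    scope = [d for d in docs if in_scope(d, product)]
    if not scope:
        return 0.0
    return sum(1 for d in scope if x <= d.concepts) / len(scope)


NB = "/product/PC product/notebook"
HDD = frozenset({"/defect/hardware/hard disk"})
docs = ([Doc(NB + "/series A/model A2", HDD)] * 10
        + [Doc(NB + "/series A/model A2")] * 90
        + [Doc(NB + "/series B/model B1", HDD)] * 25
        + [Doc(NB + "/series C/model C1")] * 875)

r_all = ratio(docs, NB, HDD)                     # equation (1): 35 / 1000
r = ratio(docs, NB + "/series A/model A2", HDD)  # equation (2): 10 / 100
print(r_all, r, r / r_all)
```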

The concept DB 105 may store each concept as a node in one of several hierarchical structures, such as a defect concept hierarchy, a product concept hierarchy, and a component concept hierarchy. In that case, the document data concept extraction unit 130 may extract, for each piece of document data, a plurality of document concepts, one set per hierarchical structure (S510). For example, suppose the concept DB 105 stores each concept as a node of either a first or a second hierarchical structure. The document data concept extraction unit 130 may then extract, for each piece of document data, a first document concept from the first hierarchical structure and a second document concept from the second.

In this case, the concept ratio calculation unit 1200 calculates three pairs of ratios, each for all products and for the target product. All ratios are taken relative to the number of document data in the document DB 100:
- first ratios $R1_{\mathrm{all}}$ and $R1$, for the document data corresponding to a concept of the first hierarchical structure;
- second ratios $R2_{\mathrm{all}}$ and $R2$, for the document data corresponding to a concept of the second hierarchical structure;
- third ratios $R1\&2_{\mathrm{all}}$ and $R1\&2$, for the document data corresponding to a combination of a concept of the first hierarchical structure and a concept of the second.
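A minimal sketch of the three ratios follows. It assumes that each document in a product scope is represented as a set of concept paths drawn from both hierarchies, here a component hierarchy and a problem hierarchy as in the fan/noise example later in this section. The helper `three_ratios` and the document counts are hypothetical and for illustration only.

```python
def three_ratios(docs, x1, x2):
    """Return (R1, R2, R1&2) for one product scope.

    docs -- list of concept sets, one per document in the scope
    x1   -- concept from the first hierarchy (e.g. a component)
    x2   -- concept from the second hierarchy (e.g. a problem)
    """
    n = len(docs)
    if n == 0:
        return 0.0, 0.0, 0.0
    n1 = sum(x1 in d for d in docs)
    n2 = sum(x2 in d for d in docs)
    n12 = sum(x1 in d and x2 in d for d in docs)
    return n1 / n, n2 / n, n12 / n


FAN, NOISE = "/component/hardware/fan", "/problem/hardware/noise"
target = [{FAN, NOISE}] * 6 + [{FAN}] * 4 + [set()] * 90                     # 100 docs
others = [{FAN, NOISE}] * 4 + [{FAN}] * 36 + [{NOISE}] * 20 + [set()] * 840  # 900 docs
all_docs = target + others                                                   # 1000 docs

print(three_ratios(all_docs, FAN, NOISE))  # (R1_all, R2_all, R1&2_all)
print(three_ratios(target, FAN, NOISE))    # (R1, R2, R1&2)
```

With these counts the reference ratios are (0.05, 0.03, 0.01) and the target ratios are (0.10, 0.06, 0.06).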

Next, the relative frequency calculation unit 1210 calculates, for each concept, the relative frequency $RR = R / R_{\mathrm{all}}$ (S1350). It shows how large the ratio calculated by the specific product concept ratio calculation unit 1206 is compared with the reference ratio for that concept. In this embodiment, the relative frequency calculation unit 1210 uses the ratio calculated by the all-product concept ratio calculation unit 1203 as the reference ratio. The relative frequency therefore shows how much larger the ratio for the product targeted for defect reporting is than the average ratio over all products.

When a set of a first document concept and a second document concept is extracted for the document data, the relative frequency calculation unit 1210 calculates three relative frequencies:
- a first relative frequency $RR1 = R1 / R1_{\mathrm{all}}$: the first ratio $R1$ relative to the reference ratio $R1_{\mathrm{all}}$ for the concept $X1$ of the first hierarchical structure;
- a second relative frequency $RR2 = R2 / R2_{\mathrm{all}}$: the second ratio $R2$ relative to the reference ratio $R2_{\mathrm{all}}$ for the concept $X2$ of the second hierarchical structure;
- a third relative frequency $RR1\&2 = R1\&2 / R1\&2_{\mathrm{all}}$: the third ratio $R1\&2$ relative to the reference ratio $R1\&2_{\mathrm{all}}$ for the concept $X1\&2$, which combines the concepts of the first and second hierarchical structures.
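Given the three ratio pairs, the three relative frequencies are an element-wise division. The following is a minimal sketch. The helper name `relative_frequencies` is hypothetical. Returning NaN when a reference ratio is zero is an assumption of this sketch, since the source does not say how that case is handled.

```python
import math


def relative_frequencies(target, reference):
    """RR = R / R_all, element-wise over (R1, R2, R1&2) tuples.

    Returns NaN where the reference ratio is 0 (the concept never occurs at all).
    """
    return tuple(r / r_all if r_all else math.nan for r, r_all in zip(target, reference))


# Ratios assumed for illustration (target product vs. all compared products).
rr1, rr2, rr12 = relative_frequencies((0.10, 0.06, 0.06), (0.05, 0.03, 0.01))
print(rr1, rr2, rr12)  # roughly 2.0, 2.0 and 6.0
```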

The relative frequency calculation unit 1210 may also correct a product's relative frequency for a concept downward when that product has few document data corresponding to the concept, compared with when it has many. More specifically, it may use as the relative frequency the lower bound of the confidence interval obtained by interval estimation with a confidence coefficient of 80%. This stops the relative frequency calculation unit 1210 from reporting a defect when there are too few samples to establish that the defect is occurring frequently.
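The source does not say which interval estimator is used. The sketch below is one possible reading, not the patent's method. It treats the reference ratio $R_{\mathrm{all}}$ as fixed, because it is estimated from many more documents. It replaces the target product's ratio $R$ with the lower end of a two-sided 80% Wilson score interval before dividing. The names `wilson_lower` and `conservative_rr` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_lower(k: int, n: int, confidence: float = 0.80) -> float:
    """Lower end of the two-sided Wilson score interval for the proportion k/n."""
    if n == 0:
        return 0.0
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.2816 for 80 %
    p = k / n
    centre = p + z * z / (2 * n)
    margin = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)


def conservative_rr(k: int, n: int, r_all: float, confidence: float = 0.80) -> float:
    """Relative frequency using the interval's lower bound instead of k/n."""
    return wilson_lower(k, n, confidence) / r_all


# Same 10 % share for the target product, but backed by 10 vs. 100 documents.
print(conservative_rr(1, 10, 0.035))    # well below the point estimate of about 2.86
print(conservative_rr(10, 100, 0.035))  # closer to 2.86
```

With few documents the lower bound drops sharply. A product with only a handful of complaints therefore does not cross the reporting threshold, which is the behaviour the paragraph above describes.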

Next, the frequent concept selection unit 1220 selects, from the plurality of concepts, those whose relative frequency is equal to or higher than a predetermined threshold (S1360). More specifically, it selects the concepts whose relative frequency reaches the threshold for at least one product. Several sets of a first-hierarchy concept and a second-hierarchy concept may be extracted for the document data. In that case, the frequent concept selection unit 1220 selects, from among these combinations, the sets whose relative frequency is equal to or higher than the threshold.
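Step S1360 amounts to a filter over per-product relative frequencies. In this minimal sketch, a combination of concepts is represented as a `frozenset` key alongside single-concept string keys. The helper name `frequent_concepts`, the threshold of 2.0, and the relative frequency values are all hypothetical.

```python
def frequent_concepts(rr_by_product, threshold):
    """Concepts (or concept combinations) whose RR reaches `threshold` for >= 1 product.

    rr_by_product -- {product: {concept or frozenset of concepts: relative frequency}}
    """
    return {concept
            for rr in rr_by_product.values()
            for concept, value in rr.items()
            if value >= threshold}


FAN_NOISE = frozenset({"/component/hardware/fan", "/problem/hardware/noise"})
rr_by_product = {  # hypothetical relative frequencies
    "model A2": {"/defect/hardware/hard disk": 2.86, "/defect/software": 0.9},
    "model B1": {"/defect/hardware/hard disk": 0.8, FAN_NOISE: 6.0},
}
print(frequent_concepts(rr_by_product, threshold=2.0))
```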

Next, the reference frequency calculation unit 1240 handles the case where document data corresponding to a combination of two or more document concepts has a relative frequency equal to or higher than the threshold. It calculates a frequency that serves as the criterion for a choice (S1370): report the combination as it is, or report it after generalizing or specializing it.

More specifically, the reference frequency calculation unit 1240 must decide whether to report the set of the first and second document concepts or only the first document concept. As the reference frequency for this decision, it calculates $RR1\&2_{\mathrm{base}}$, the relative frequency that would be expected if the first and second document concepts were independent events. It computes this value from the relative frequency $RR1$ of the first document concept and the relative frequency $RR2$ of the second, using equation (3):
$$RR1\&2_{\mathrm{base}} = RR1 \times RR2 \times \frac{\#(X1 \cap A_{\mathrm{all}}) \times \#(X2 \cap A_{\mathrm{all}})}{\#(X1 \cap X2 \cap A_{\mathrm{all}}) \times \#A_{\mathrm{all}}} \qquad (3)$$

Next, the priority concept selection unit 1230 compares a concept selected by the frequent concept selection unit 1220 with a concept above it in the hierarchy (S1380). Based on the relative frequencies of the two, it selects one of them. It may take this higher-level concept from among the concepts that the frequent concept selection unit 1220 selected. In S1380, when the relative frequency of a specific concept is equal to or higher than the threshold, the priority concept selection unit 1230 thus decides whether to report that concept or a concept above it in the hierarchy.

More specifically, there are two candidate concepts. Concept X consists of a set of one or more document concepts {X1, X2, …, Xn}. Concept Y, which is more detailed than X, consists of a set of one or more document concepts {Y1, Y2, …, Ym}. The unit decides which of them to report as described in (1) or (2) below. Here, "concept Y is more detailed than concept X" means the following: for every Xi, some Yj is either the same concept as Xi or a concept below Xi in some concept hierarchy. Equivalently, concept X is said to be more general than concept Y.
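The "more detailed than" relation can be expressed directly over "/"-separated concept paths. This is a minimal sketch that assumes "same or lower level in a hierarchy" means path equality or path-prefix containment. The function names are hypothetical. The examples are the concepts used in cases (1) and (2) below.

```python
def is_same_or_below(y: str, x: str) -> bool:
    """True if concept path y equals x or lies below x in the same hierarchy."""
    return y == x or y.startswith(x.rstrip("/") + "/")


def more_detailed(Y, X) -> bool:
    """Concept Y is more detailed than X: every Xi has some Yj equal to it or below it."""
    return all(any(is_same_or_below(yj, xi) for yj in Y) for xi in X)


POINTING = "/defect/hardware/input device/pointing device"
MOUSE = POINTING + "/mouse"
FAN, NOISE = "/component/hardware/fan", "/problem/hardware/noise"

print(more_detailed({MOUSE}, {POINTING}))   # case (1): True
print(more_detailed({FAN, NOISE}, {FAN}))   # case (2): True
print(more_detailed({FAN}, {FAN, NOISE}))   # X has an extra concept: False
```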

(1) When X = {X1, X2, …, Xn}, Y = {Y1, Y2, …, Yn}, and every Xk is the same concept as Yk, or a concept above Yk, in some concept hierarchy
For example, concept X is “/defect/hardware/input device/pointing device” (= {X1}) and concept Y is “/defect/hardware/input device/pointing device/mouse” (= {Y1}).

In this case, the priority concept selection unit 1230 considers the concept Y (= {Y1, Y2, …, Yn}) selected by the frequent concept selection unit 1220 and its higher-level concept X (= {X1, X2, …, Xn}). It determines whether they satisfy equation (4):
$$(\text{relative frequency of } Y) > \alpha \times (\text{relative frequency of } X) \qquad (4)$$
where $\alpha$ is a predetermined ratio, for example a value of about 1.5 to 2.

The priority concept selection unit 1230 then applies equation (4). If the relative frequency of concept Y is more than the predetermined ratio α times that of concept X, it selects concept Y; otherwise it selects concept X. Two cases follow:
- If the higher-level concept X has a low relative frequency and concept Y a high one, the unit reports concept Y.
- If the relative frequency of the higher-level concept X is itself reasonably high, the unit rolls concept Y up into concept X for reporting.
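Rule (1) reduces to a single comparison. In this minimal sketch, the function name `choose_rule1`, the default α = 1.5 (the low end of the 1.5 to 2 range mentioned above), and the relative frequency values in the usage lines are all assumptions for illustration.

```python
def choose_rule1(rr_y: float, rr_x: float, alpha: float = 1.5) -> str:
    """Rule (1), equation (4): keep the detailed concept Y only if it stands out from X."""
    return "Y" if rr_y > alpha * rr_x else "X"


# Hypothetical values: Y = ".../pointing device/mouse", X = ".../pointing device".
print(choose_rule1(rr_y=4.0, rr_x=1.5))  # mouse clearly stands out: "Y"
print(choose_rule1(rr_y=3.0, rr_x=2.5))  # pointing devices in general are high: "X"
```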

(2) When X = {X1, X2, …, Xn}, Y = {Y1, Y2, …, Yn, …, Ym}, and every Xk (k = 1, …, n) is the same concept as Yk, or a concept above Yk, in some concept hierarchy
For example, concept X is “/component/hardware/fan” (= {X1}) and concept Y is “/component/hardware/fan” (= Y1) && “/problem/hardware/noise” (= Y2). Here, “&&” indicates that concept Y is the AND condition of concepts Y1 and Y2.

In this case, the priority concept selection unit 1230 determines whether the concept Y selected by the frequent concept selection unit 1220 and its higher-level concept X satisfy equation (5). Here, Z is the set of concepts by which Y differs from X when differences of level within the same concept hierarchy are ignored. That is, Z = {Yn+1, …, Ym}, or {Y2} in the example above.
$$(\text{relative frequency of } Y) > \alpha \times RR1\&2_{\mathrm{base}} \qquad (5)$$
where $RR1\&2_{\mathrm{base}}$ is the relative frequency of Y that would be expected if X and Z were independent events.

The priority concept selection unit 1230 then compares two values:
- $RR1\&2$, the relative frequency of concept Y;
- $RR1\&2_{\mathrm{base}}$, the relative frequency of Y calculated on the assumption that concepts X and Z are independent events.

If $RR1\&2$ is more than α times $RR1\&2_{\mathrm{base}}$, the unit selects concept Y, the combination of concepts X and Z. Otherwise it selects concept X. In this way, concept Y is reported when it is likely to be caused by the combination of concepts X and Z.
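Rule (2) combines the independence baseline of equation (3) with the comparison of equation (5). The sketch below is hypothetical and self-contained: the function name `choose_rule2`, the default α = 1.5, and all counts and relative frequencies are assumptions. X plays the role of X1 (fan) and Z the role of X2 (noise).

```python
def choose_rule2(rr1, rr2, rr12, n1_all, n2_all, n12_all, n_all, alpha=1.5):
    """Rule (2), equation (5): report Y = X && Z only if RR1&2 beats alpha x baseline.

    rr1, rr2 -- relative frequencies of X (e.g. fan) and Z (e.g. noise)
    rr12     -- relative frequency of the combination Y
    Returns ("Y" or "X", baseline), where the baseline is RR1&2_base from equation (3).
    """
    base = rr1 * rr2 * (n1_all * n2_all) / (n12_all * n_all)
    return ("Y" if rr12 > alpha * base else "X"), base


# Hypothetical counts over all products: 1000 docs, 50 fan, 30 noise, 10 both.
print(choose_rule2(2.0, 2.0, 6.0, 50, 30, 10, 1000))  # combination stands out: ('Y', ~0.6)
print(choose_rule2(2.0, 2.0, 0.8, 50, 30, 10, 1000))  # no excess co-occurrence: ('X', ~0.6)
```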

The priority concept selection unit 1230 may also obtain a concept X at an even higher level by applying both (1) and (2) to concept Y.

Next, the notification unit 1250 notifies the user of the search system 10 of a frequent defect (S1390). The defect is the one corresponding to whichever concept, Y or X, the priority concept selection unit 1230 selected, and it is occurring frequently in at least one product. In case (2) above, the notification states that the relative frequency of the selected concept has become high. That concept is either concept Y (the combination of concepts X and Z) or concept X.

Then, when new document data is added to the document DB 100, the reporting system 20 returns the process to S500 (S1395). Alternatively, the reporting system 20 may perform the above processing at predetermined intervals, such as once a week.

As described above, the reporting system 20 extracts the document concepts of each piece of document data as it is input. It then reports the concepts selected by the frequent concept selection unit 1220 and the priority concept selection unit 1230. This lets it notify the user when the frequency of a specific document concept or set of document concepts reaches or exceeds a predetermined value. As a result, product defects can be reported early, for example based on the number of inquiries about a product received at a call center.

FIG. 14 shows an example of the hardware configuration of a computer 1900 according to this embodiment. The computer 1900 has three sections:
- a CPU peripheral section with a CPU 2000, a RAM 2020, a graphic controller 2075, and a display device 2080, interconnected by a host controller 2082;
- an input/output section with a communication interface 2030, a hard disk drive 2040, and a CD-ROM drive 2060, connected to the host controller 2082 by an input/output controller 2084;
- a legacy input/output section with a ROM 2010, a flexible disk drive 2050, and an input/output chip 2070, connected to the input/output controller 2084.

The host controller 2082 connects the RAM 2020 to the CPU 2000 and the graphic controller 2075, which access the RAM 2020 at a high transfer rate. The CPU 2000 operates on the basis of programs stored in the ROM 2010 and the RAM 2020 and controls each unit. The graphic controller 2075 acquires image data that the CPU 2000 or another component generates in a frame buffer provided in the RAM 2020, and displays it on the display device 2080. Alternatively, the graphic controller 2075 may internally include a frame buffer that stores the image data generated by the CPU 2000 or another component.

The input/output controller 2084 connects the host controller 2082 to the communication interface 2030, the hard disk drive 2040, and the CD-ROM drive 2060, which are relatively high-speed input/output devices. The communication interface 2030 communicates with other devices via a network. The hard disk drive 2040 stores programs and data used by the CPU 2000 in the computer 1900. The CD-ROM drive 2060 reads a program or data from the CD-ROM 2095 and provides it to the hard disk drive 2040 via the RAM 2020.

The input/output controller 2084 is also connected to the ROM 2010 and to relatively low-speed input/output devices, namely the flexible disk drive 2050 and the input/output chip 2070. The ROM 2010 stores a boot program that the computer 1900 executes at startup, programs that depend on the hardware of the computer 1900, and the like. The flexible disk drive 2050 reads a program or data from the flexible disk 2090 and provides it to the hard disk drive 2040 via the RAM 2020. The input/output chip 2070 connects the flexible disk drive 2050, and also connects various input/output devices via, for example, a parallel port, a serial port, a keyboard port, and a mouse port.

A program provided to the hard disk drive 2040 via the RAM 2020 is stored on a recording medium such as the flexible disk 2090, the CD-ROM 2095, or an IC card, and is supplied by the user. The program is read from the recording medium, installed on the hard disk drive 2040 in the computer 1900 via the RAM 2020, and executed by the CPU 2000.

The search program, which is installed in the computer 1900 and causes the computer 1900 to function as the search system 10, comprises: a document DB management module that manages the document DB 100; a concept DB management module that manages the concept DB 105; a product DB management module that manages the product DB 106; a component DB management module that manages the component DB 107; a dictionary DB module that manages the dictionary DB 110; a synonym DB module that manages the synonym DB 115; a document data normalization module; a concept extraction rule DB module that manages the concept extraction rule DB 125; a document data concept extraction module; a search index DB module that manages the search index DB 135; a search sentence normalization module; a search sentence concept extraction module; a concept search module; a concept selection support module; and a search result output module.
These programs or modules act on the CPU 2000 and other components to cause the computer 1900 to function as the document DB 100, the concept DB 105, the product DB 106, the component DB 107, the dictionary DB 110, the synonym DB 115, the document data normalization unit 120, the concept extraction rule DB 125, the document data concept extraction unit 130, the search index DB 135, the search sentence normalization unit 140, the search sentence concept extraction unit 145, the concept search unit 150, the concept selection support unit 155, and the search result output unit 160, respectively.

The report program, which is installed in the computer 1900 and causes the computer 1900 to function as the reporting system 20, comprises: a document DB management module that manages the document DB 100; a concept DB management module that manages the concept DB 105; a product DB management module that manages the product DB 106; a component DB management module that manages the component DB 107; a dictionary DB module that manages the dictionary DB 110; a synonym DB module that manages the synonym DB 115; a document data normalization module; a concept extraction rule DB module that manages the concept extraction rule DB 125; a document data concept extraction module; a search index DB module that manages the search index DB 135; a concept ratio calculation module having an all-product concept ratio calculation module and a specific product concept ratio calculation module; a relative frequency calculation module; a frequent concept selection module; a priority concept selection module; a reference frequency calculation module; and a notification module.
These programs or modules act on the CPU 2000 and other components to cause the computer 1900 to function as the document DB 100, the concept DB 105, the product DB 106, the component DB 107, the dictionary DB 110, the synonym DB 115, the document data normalization unit 120, the concept extraction rule DB 125, the document data concept extraction unit 130, the search index DB 135, the concept ratio calculation unit 1200 having the all-product concept ratio calculation unit 1203 and the specific product concept ratio calculation unit 1206, the relative frequency calculation unit 1210, the frequent concept selection unit 1220, the priority concept selection unit 1230, the reference frequency calculation unit 1240, and the notification unit 1250, respectively.

The programs or modules described above may be stored on an external storage medium. Besides the flexible disk 2090 and the CD-ROM 2095, usable storage media include optical recording media such as DVDs and CDs, magneto-optical recording media such as MOs, tape media, and semiconductor memories such as IC cards. A storage device such as a hard disk or RAM provided in a server system connected to a dedicated communication network or the Internet may also be used as the recording medium, and the program may be provided to the computer 1900 via the network.

Although the present invention has been described above by way of an embodiment, the technical scope of the present invention is not limited to the scope described in that embodiment. It will be apparent to those skilled in the art that various modifications or improvements can be made to the embodiment. It is apparent from the claims that embodiments incorporating such modifications or improvements can also fall within the technical scope of the present invention.

FIG. 1 shows the configuration of a search system 10 according to an embodiment of the present invention.
FIG. 2 shows an example of the concept hierarchy of defects stored in the concept DB 105 according to the embodiment of the present invention.
FIG. 3 shows an example of the concept hierarchy of products stored in the product DB 106 according to the embodiment of the present invention.
FIG. 4 shows an example of the concept hierarchy of components stored in the component DB 107 according to the embodiment of the present invention.
FIG. 5 shows the operation flow of the search system 10 according to the embodiment of the present invention.
FIG. 6 shows an example of the normalization rules stored in the synonym DB 115 according to the embodiment of the present invention.
FIG. 7 shows an example of the concept extraction rules stored in the concept extraction rule DB 125 according to the embodiment of the present invention.
FIG. 8 shows the configuration of the concept search unit 150 according to the embodiment of the present invention.
FIG. 9 shows the operation flow of the concept search unit 150 according to the embodiment of the present invention.
FIG. 10 shows an example of generalization and specialization by the concept search unit 150 according to the embodiment of the present invention.
FIG. 11 shows an example of the display screen 1100 of the search system 10 according to the embodiment of the present invention.
FIG. 12 shows the configuration of a reporting system 20 according to the embodiment of the present invention.
FIG. 13 shows the operation flow of the reporting system 20 according to the embodiment of the present invention.
FIG. 14 shows an example of the hardware configuration of a computer 1900 according to the embodiment of the present invention.

10 Search system
20 Reporting system
100 Document DB
105 Concept DB
106 Product DB
107 Component DB
110 Dictionary DB
115 Synonym DB
120 Document data normalization unit
125 Concept extraction rule DB
130 Document data concept extraction unit
135 Search index DB
140 Search sentence normalization unit
145 Search sentence concept extraction unit
150 Concept search unit
155 Concept selection support unit
160 Search result output unit
800 Same concept output unit
810 Superordinate concept acquisition unit
820 Generalized concept output unit
830 Subordinate concept acquisition unit
840 Specialized concept output unit
1100 Display screen
1110 Search sentence input screen
1130 Concept operation screen
1160 Search result output screen
1200 Concept ratio calculation unit
1203 All-product concept ratio calculation unit
1206 Specific product concept ratio calculation unit
1210 Relative frequency calculation unit
1220 Frequent concept selection unit
1230 Priority concept selection unit
1240 Reference frequency calculation unit
1250 Notification unit
1900 Computer
2000 CPU
2010 ROM
2020 RAM
2030 Communication interface
2040 Hard disk drive
2050 Flexible disk drive
2060 CD-ROM drive
2070 Input/output chip
2075 Graphic controller
2080 Display device
2082 Host controller
2084 Input/output controller
2090 Flexible disk
2095 CD-ROM

Claims

A document database for sequentially storing input document data;
A concept database that stores a plurality of predetermined concepts in a hierarchical structure in which a concept that includes a given concept occupies a higher level than the given concept;
A document data concept extraction unit that extracts, based on keywords included in each piece of document data, a document concept that is the concept corresponding to that document data;
A concept ratio calculation unit that calculates, for each concept, the ratio of the number of document data corresponding to that concept to the number of document data in the document database;
A relative frequency calculation unit that calculates, for each concept, a relative frequency indicating the magnitude of the ratio calculated by the concept ratio calculation unit relative to a reference ratio corresponding to that concept;
A frequent concept selection unit that selects, from among the plurality of concepts, each concept whose relative frequency is equal to or higher than a predetermined threshold;
A priority concept selection unit that selects either a first concept selected by the frequent concept selection unit or a second concept in the hierarchy above the first concept, based on the relative frequencies of the first concept and the second concept;
A reporting system comprising: a notification unit that notifies a user that the relative frequency of the concept selected by the priority concept selection unit, out of the first concept and the second concept, is high.

The document database stores, for each of a plurality of products, document data indicating the contents of defects in that product;
The concept database stores the plurality of concepts for identifying a plurality of defects in a product;
The document data concept extraction unit extracts, based on keywords included in each piece of document data, a document concept that is the concept corresponding to that document data;
The concept ratio calculation unit has:
an all-product concept ratio calculation unit that calculates, for each concept, the ratio of the number of document data corresponding to that concept to the number of document data for the plurality of products; and
a specific product concept ratio calculation unit that calculates, for each concept, the ratio of the number of document data for at least one product that correspond to that concept to the number of document data for the at least one product;
The relative frequency calculation unit calculates the relative frequency indicating the magnitude of the ratio calculated by the specific product concept ratio calculation unit relative to the ratio calculated by the all-product concept ratio calculation unit;
The frequent concept selection unit selects, from among the plurality of concepts, each concept whose relative frequency for the at least one product is equal to or higher than the predetermined threshold;
The priority concept selection unit selects either the first concept selected by the frequent concept selection unit or the second concept in the hierarchy above the first concept, based on the relative frequencies of the first concept and the second concept;
The notification unit notifies the user of the reporting system that the at least one product frequently has the defect corresponding to the concept selected by the priority concept selection unit, out of the first concept and the second concept.
The reporting system according to claim 1.
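Claim 2 compares each product against the whole product line rather than against an external baseline. A minimal sketch of that comparison, under the simplifying assumption that every document carries exactly one (product, defect concept) pair, might look as follows; the function names, product labels, and threshold value are illustrative only.

```python
from collections import Counter


def product_relative_frequencies(docs):
    """docs: list of (product, concept) pairs, one pair per document.

    Returns {(product, concept): specific-product ratio / all-product ratio}.
    """
    total = len(docs)
    concept_counts = Counter(concept for _, concept in docs)  # all products together
    product_totals = Counter(product for product, _ in docs)  # documents per product
    pair_counts = Counter(docs)                               # documents per (product, concept)
    rel = {}
    for (product, concept), n in pair_counts.items():
        specific_ratio = n / product_totals[product]
        overall_ratio = concept_counts[concept] / total
        rel[(product, concept)] = specific_ratio / overall_ratio
    return rel


def frequent_pairs(rel, threshold):
    """(product, concept) pairs whose relative frequency meets the threshold, sorted."""
    return sorted(pair for pair, value in rel.items() if value >= threshold)


if __name__ == "__main__":
    docs = [("A", "overheat")] * 3 + [("A", "noise")] + [("B", "noise")] * 4
    rel = product_relative_frequencies(docs)
    print(frequent_pairs(rel, threshold=1.5))  # [('A', 'overheat'), ('B', 'noise')]
```

A value of 2.0 for product A and `overheat` means that A's share of overheating reports is twice the share across all products.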

The reporting system according to claim 1 or 2, wherein the priority concept selection unit selects the first concept when the relative frequency of the first concept is larger than the relative frequency of the second concept by a predetermined ratio or more, and selects the second concept otherwise.
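One way to read the selection rule in claim 3 is as a multiplicative margin: the narrower concept is reported only if its relative frequency is at least some factor larger than that of its parent; otherwise the broader parent concept is reported. The sketch below uses that reading. The parameter `min_ratio` and its default of 1.5 are assumptions, since the claim states only that the ratio is predetermined.

```python
def select_priority(concept, parent, rel_freq, min_ratio=1.5):
    """Choose between a frequent concept and the concept one level above it.

    Returns the concept itself if its relative frequency is at least
    min_ratio times the parent's; otherwise returns the parent.
    A concept with no parent (or no parent statistic) is returned unchanged.
    """
    upper = parent.get(concept)
    if upper is None or upper not in rel_freq:
        return concept
    if rel_freq[concept] >= min_ratio * rel_freq[upper]:
        return concept
    return upper


if __name__ == "__main__":
    parent = {"battery_swelling": "battery_defect", "battery_drain": "battery_defect"}
    rel = {"battery_defect": 2.4, "battery_swelling": 5.0, "battery_drain": 2.6}
    print(select_priority("battery_swelling", parent, rel))  # battery_swelling
    print(select_priority("battery_drain", parent, rel))     # battery_defect
```

In the second call, the slightly elevated `battery_drain` rate is largely explained by the elevated rate of battery defects as a whole, so the broader concept is reported instead.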

The concept database stores each of the plurality of concepts as a node of a first hierarchical structure or a second hierarchical structure;
The document data concept extraction unit extracts, for each piece of document data, a first document concept belonging to the first hierarchical structure and a second document concept belonging to the second hierarchical structure;
The concept ratio calculation unit calculates, relative to the number of document data in the document database, a first ratio of the number of document data corresponding to a concept of the first hierarchical structure, a second ratio of the number of document data corresponding to a concept of the second hierarchical structure, and a third ratio of the number of document data corresponding to a combination of a concept of the first hierarchical structure and a concept of the second hierarchical structure;
The relative frequency calculation unit calculates a first relative frequency indicating the magnitude of the first ratio relative to a reference ratio corresponding to the concept of the first hierarchical structure, a second relative frequency indicating the magnitude of the second ratio relative to a reference ratio corresponding to the concept of the second hierarchical structure, and a third relative frequency indicating the magnitude of the third ratio relative to a reference ratio corresponding to the combination of the concept of the first hierarchical structure and the concept of the second hierarchical structure;
The frequent concept selection unit selects a third concept of the first hierarchical structure and a fourth concept of the second hierarchical structure whose combination has a relative frequency equal to or higher than the predetermined threshold;
The reporting system further comprises a reference frequency calculation unit that calculates, based on the first relative frequency for the third concept and the second relative frequency for the fourth concept, a calculated value of the third relative frequency on the assumption that the third concept and the fourth concept are independent events;
The priority concept selection unit selects the combination of the third concept and the fourth concept when the third relative frequency is larger than the calculated value of the third relative frequency for independent events by a predetermined ratio or more, and selects the third concept otherwise;
The notification unit notifies the user that the relative frequency of the combination of the third concept and the fourth concept, or of the third concept, whichever the priority concept selection unit selected, is high.
The reporting system according to any one of claims 1 to 3.
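Claim 4 does not spell out how the reference frequency calculation unit computes the value expected under independence. If both the observed ratios and the reference ratios factorize for independent concepts, that is, P(A and B) = P(A)P(B) and likewise for the reference, then the expected third relative frequency is simply the product of the first and second relative frequencies. The sketch below assumes exactly that; the function names, `min_ratio`, and the counts are hypothetical.

```python
def relative_frequency(count, total, reference_ratio):
    """Observed share of documents divided by its reference share."""
    return (count / total) / reference_ratio


def select_combination(total, n_a, n_b, n_ab, ref_a, ref_b, ref_ab, min_ratio=1.5):
    """Report the pair (A, B) only if it co-occurs clearly more than independence predicts.

    n_a, n_b: documents with concept A (first hierarchy) / concept B (second hierarchy).
    n_ab: documents with both. ref_*: reference ratios for the same events.
    Returns "combination" or "concept_a" (concept A reported on its own).
    """
    r1 = relative_frequency(n_a, total, ref_a)
    r2 = relative_frequency(n_b, total, ref_b)
    r3 = relative_frequency(n_ab, total, ref_ab)
    expected_r3 = r1 * r2  # value of r3 if A and B were independent (assumption)
    return "combination" if r3 >= min_ratio * expected_r3 else "concept_a"


if __name__ == "__main__":
    # 8 of 100 documents mention both concepts: far more than independence predicts.
    print(select_combination(100, 20, 10, 8, 0.1, 0.1, 0.01))  # combination
    # Only 2 of 100 co-occur: consistent with independence, so report A alone.
    print(select_combination(100, 20, 10, 2, 0.1, 0.1, 0.01))  # concept_a
```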

  A search sentence concept extraction unit that extracts, based on keywords included in an input search sentence, a search sentence concept that is the concept corresponding to the search sentence;
  A concept search unit that searches the plurality of document data for document data for which the search sentence concept is a concept in a higher or lower level of the hierarchy than the document concept;
  A search result output unit that outputs the document data found by the concept search unit as document data including the content specified by the search sentence;
  The reporting system according to claim 1, further comprising the above units.

  Further comprising a concept extraction rule database that stores concept extraction rules, each including a set of one or more keywords and the concept indicating the semantic content of those keywords,
  wherein, when the search sentence contains the one or more keywords included in any of the concept extraction rules, the search sentence concept extraction unit extracts the concept included in that rule as the search sentence concept;
  The reporting system according to claim 5.

When the document data contains the one or more keywords included in any of the concept extraction rules, the document data concept extraction unit extracts the concept included in that rule as the document concept;
The reporting system according to claim 6.
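Claims 6 and 7 apply the same concept extraction rules to search sentences and to document data. A minimal sketch follows, assuming each rule is a set of keywords plus a concept and that a rule fires when all of its keywords appear among the whitespace-separated tokens of an already normalized text. Japanese text would need a morphological analyzer in place of `str.split`. The rule format and concept labels are hypothetical.

```python
from typing import Iterable

# A rule pairs a set of keywords with the concept they express (hypothetical format).
Rule = tuple[frozenset[str], str]


def extract_concepts(text: str, rules: Iterable[Rule]) -> set[str]:
    """Return every concept whose rule keywords all occur in the text."""
    tokens = set(text.lower().split())
    return {concept for keywords, concept in rules if keywords <= tokens}


RULES: list[Rule] = [
    (frozenset({"battery", "swells"}), "battery_swelling"),
    (frozenset({"screen", "flickers"}), "screen_flicker"),
    (frozenset({"battery"}), "battery_defect"),
]

if __name__ == "__main__":
    # The same function serves both a search sentence and a stored document.
    print(sorted(extract_concepts("the battery swells after charging", RULES)))
    # ['battery_defect', 'battery_swelling']
```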

  Further comprising a search index database that stores, for each piece of document data, the correspondence between the document data and the document concept extracted from it by the document data concept extraction unit,
  wherein, when the search sentence concept is a concept in a higher or lower level of the hierarchy than a document concept stored in the search index database before the search sentence was input, the concept search unit outputs the document data corresponding to that document concept as a search result;
  The reporting system according to any one of claims 5 to 7.
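The search index of claim 8 lets document concepts be extracted once, before any query arrives, so that at query time only the hierarchy needs to be consulted. The sketch below assumes a child-to-parent map for the concept hierarchy and treats an exact match as a hit alongside the higher-level and lower-level matches the claim names. All identifiers are hypothetical.

```python
from collections import defaultdict


def ancestors(concept, parent):
    """All concepts above `concept` in the hierarchy (child -> parent map, assumed acyclic)."""
    found = set()
    concept = parent.get(concept)
    while concept is not None:
        found.add(concept)
        concept = parent.get(concept)
    return found


def build_index(doc_concepts):
    """Search index: document concept -> ids of documents carrying it (built before any query)."""
    index = defaultdict(set)
    for doc_id, concepts in doc_concepts.items():
        for concept in concepts:
            index[concept].add(doc_id)
    return index


def concept_search(query, index, parent):
    """Ids of documents whose concept equals, lies above, or lies below the query concept."""
    query_ancestors = ancestors(query, parent)
    hits = set()
    for doc_concept, doc_ids in index.items():
        if (doc_concept == query
                or doc_concept in query_ancestors             # query is more specific than the document
                or query in ancestors(doc_concept, parent)):  # query is more general than the document
            hits |= doc_ids
    return hits


if __name__ == "__main__":
    parent = {"battery_swelling": "battery_defect", "battery_drain": "battery_defect",
              "screen_flicker": "display_defect"}
    index = build_index({1: {"battery_swelling"}, 2: {"battery_drain"},
                         3: {"screen_flicker"}, 4: {"battery_defect"}})
    print(sorted(concept_search("battery_defect", index, parent)))    # [1, 2, 4]
    print(sorted(concept_search("battery_swelling", index, parent)))  # [1, 4]
```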

  Further comprising:
  a synonym database that stores correspondences between predetermined phrases and the keywords that are synonyms of those phrases;
  a document data normalization unit that normalizes each piece of document data by replacing the phrases it contains with the keywords that are synonyms of those phrases; and
  a search sentence normalization unit that normalizes the search sentence by replacing the phrases it contains with the keywords that are synonyms of those phrases;
  wherein the document data concept extraction unit extracts the document concept from the normalized document data, and
  the search sentence concept extraction unit extracts the search sentence concept from the normalized search sentence;
  The reporting system according to any one of claims 5 to 8.
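Synonym normalization must be applied identically to documents and to search sentences so that both sides end up using the same keywords. The sketch below assumes the synonym DB is a simple phrase-to-keyword dictionary. It replaces all phrases in a single regular-expression pass, trying longer phrases first, so a keyword produced by one replacement is never rewritten again. Plain substring matching is used because Japanese text has no word separators; for English text one might add word boundaries. The dictionary entries are illustrative.

```python
import re


def make_normalizer(synonyms):
    """Build a function that replaces each phrase in `synonyms` with its keyword.

    Longer phrases are tried first, and the whole text is rewritten in one
    pass, so replacements never cascade into each other.
    """
    if not synonyms:
        return lambda text: text
    pattern = re.compile("|".join(
        re.escape(phrase) for phrase in sorted(synonyms, key=len, reverse=True)))
    return lambda text: pattern.sub(lambda m: synonyms[m.group(0)], text)


SYNONYMS = {"cell pack": "battery", "accumulator": "battery",
            "display": "screen", "blinks": "flickers"}

if __name__ == "__main__":
    normalize = make_normalizer(SYNONYMS)
    # The same normalizer is used for stored documents and for search sentences.
    print(normalize("the cell pack swells and the display blinks"))
    # the battery swells and the screen flickers
```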

  The concept search unit has:
  a superordinate concept acquisition unit that, when the search sentence concept does not match the document concept, acquires a search sentence superordinate concept, which is a concept in a higher level of the hierarchy than the search sentence concept; and
  a generalized concept output unit that outputs the document data as a search result when the search sentence superordinate concept matches the document concept;
  The reporting system according to any one of claims 5 to 9.
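A minimal sketch of the generalization step: when no document carries the search sentence concept itself, the search retries with the concept above it. The claim describes a single superordinate concept; the loop that keeps climbing until some document matches is an assumption added for illustration, as are the names.

```python
def generalized_search(query, doc_concepts, parent):
    """Return (matching document ids, concept actually used for matching).

    doc_concepts: doc id -> set of document concepts.
    parent: child -> parent map of the concept hierarchy.
    """
    concept = query
    while concept is not None:
        hits = {d for d, concepts in doc_concepts.items() if concept in concepts}
        if hits:
            return hits, concept
        concept = parent.get(concept)  # fall back to the superordinate concept
    return set(), None


if __name__ == "__main__":
    parent = {"battery_swelling": "battery_defect", "battery_defect": "defect"}
    docs = {1: {"battery_defect"}, 2: {"screen_flicker"}}
    print(generalized_search("battery_swelling", docs, parent))  # ({1}, 'battery_defect')
```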

  The concept database stores each of the plurality of concepts as a node of a first hierarchical structure or a second hierarchical structure;
  The document data concept extraction unit extracts, for each piece of document data, a first document concept belonging to the first hierarchical structure and a second document concept belonging to the second hierarchical structure;
  The search sentence concept extraction unit extracts, for the search sentence, a first search sentence concept belonging to the first hierarchical structure and a second search sentence concept belonging to the second hierarchical structure;
  When the first search sentence concept and the second search sentence concept do not match the first document concept and the second document concept, respectively, the superordinate concept acquisition unit acquires a first search sentence superordinate concept in a higher level of the hierarchy than the first search sentence concept and a second search sentence superordinate concept in a higher level of the hierarchy than the second search sentence concept;
  The generalized concept output unit outputs first document data as search results when the number of first document data, for which the first search sentence superordinate concept matches the first document concept and the second search sentence concept matches the second document concept, is smaller than the number of second document data, for which the first search sentence concept matches the first document concept and the second search sentence superordinate concept matches the second document concept;
  The reporting system according to claim 10.
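With two hierarchies, the search could generalize either the first concept or the second. Claim 11 prefers the generalization that yields fewer documents, which keeps the result as narrow as possible. The sketch below follows that rule. Letting a tie go to the second candidate mirrors the strict "smaller" in the claim; skipping an empty candidate set, and all names and labels, are assumptions made for illustration.

```python
def two_axis_search(q1, q2, docs, parent1, parent2):
    """docs: doc id -> (concept in hierarchy 1, concept in hierarchy 2).

    Tries an exact match first. Otherwise generalizes one axis at a time
    and returns the smaller non-empty candidate set.
    """
    exact = {d for d, (c1, c2) in docs.items() if c1 == q1 and c2 == q2}
    if exact:
        return exact
    s1, s2 = parent1.get(q1), parent2.get(q2)
    # First candidate: generalize the first axis only.
    first = {d for d, (c1, c2) in docs.items() if s1 is not None and c1 == s1 and c2 == q2}
    # Second candidate: generalize the second axis only.
    second = {d for d, (c1, c2) in docs.items() if s2 is not None and c1 == q1 and c2 == s2}
    if first and (not second or len(first) < len(second)):
        return first
    return second


if __name__ == "__main__":
    symptom_parent = {"swelling": "deformation"}
    component_parent = {"battery": "power_unit"}
    docs = {1: ("deformation", "battery"), 2: ("swelling", "power_unit"),
            3: ("swelling", "power_unit"), 4: ("noise", "charger")}
    print(two_axis_search("swelling", "battery", docs, symptom_parent, component_parent))  # {1}
```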

  The concept search unit has:
  a subordinate concept acquisition unit that, when all of the document data whose document concept matches the search sentence concept also have, as a document concept, the same search sentence subordinate concept, which is a concept in a lower level of the hierarchy than the search sentence concept, replaces the search sentence concept with the search sentence subordinate concept; and
  a specialized concept output unit that outputs, as a search result, the document data whose document concept matches the search sentence subordinate concept;
  The reporting system according to claim 5.
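The specialization of claim 12 is the mirror image of generalization: if every document that matches the search sentence concept also carries one particular lower-level concept, the query can be narrowed to that concept without losing any result. The sketch assumes that each document's concept set already contains the ancestors of its extracted concepts. It repeats the narrowing step until no single child covers all matches; that repetition and the names are illustrative additions.

```python
from collections import defaultdict


def specialize(query, doc_concepts, parent):
    """Narrow the query while every matching document shares one child concept.

    doc_concepts: doc id -> set of concepts, assumed to include all ancestors
    of each extracted concept. parent: child -> parent map.
    Returns (final concept, matching document ids).
    """
    children = defaultdict(list)
    for child, upper in parent.items():
        children[upper].append(child)
    matches = {d for d, concepts in doc_concepts.items() if query in concepts}
    narrowed = True
    while matches and narrowed:
        narrowed = False
        for child in children[query]:
            if all(child in doc_concepts[d] for d in matches):
                query, narrowed = child, True
                matches = {d for d, concepts in doc_concepts.items() if child in concepts}
                break
    return query, matches


if __name__ == "__main__":
    parent = {"battery_swelling": "battery_defect", "battery_drain": "battery_defect",
              "battery_defect": "defect"}
    docs = {1: {"defect", "battery_defect", "battery_swelling"},
            2: {"defect", "battery_defect", "battery_swelling"},
            3: {"defect", "screen_flicker"}}
    print(specialize("battery_defect", docs, parent))  # ('battery_swelling', {1, 2})
```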

  The concept database stores the plurality of concepts for identifying a plurality of defects in a product;
  The document database stores, for each of the plurality of defects, the document data indicating the contents of that defect;
  The search sentence concept extraction unit extracts the search sentence concept corresponding to a search sentence, input by a user, for searching for defects in the product;
  The search result output unit outputs the document data searched by the concept search unit as the document data indicating the content of the defect of the product input by the user. Or reporting system.

  The concept database stores the plurality of concepts in a hierarchical structure in which concepts indicating the failure state of a component of the product are placed below the concept indicating that the component is defective;
  The document data concept extraction unit extracts, based on keywords included in the document data, the document concept indicating that one component is defective;
  The search sentence concept extraction unit extracts, based on keywords included in the search sentence, the search sentence concept indicating the failure state of the one component;
  The concept search unit has:
  a superordinate concept acquisition unit that acquires, as a search sentence superordinate concept, the concept indicating that the one component is defective, which is in a higher level of the hierarchy than the search sentence concept; and
  a generalized concept output unit that outputs, as a search result, the document data having a document concept that matches the search sentence superordinate concept, namely the concept indicating that the one component is defective;
  The reporting system according to claim 13.

  Further comprising a component database that stores the inclusion relationships among the components of the product in a hierarchical structure,
  wherein the document data concept extraction unit further extracts, based on keywords included in the document data, the document concept indicating the component described in the document data;
  The search sentence concept extraction unit further extracts, based on keywords included in the search sentence, the search sentence concept indicating the component described in the search sentence;
  The superordinate concept acquisition unit acquires the concept in a higher level of the hierarchy than a first search sentence concept indicating that the component is defective or indicating the failure state of the component, and the concept in a higher level of the hierarchy than a second search sentence concept indicating the component;
  When at least one of the first search sentence concept and the second search sentence concept has been replaced by its superordinate concept, the generalized concept output unit outputs, as search results, the document data having both a document concept that matches the first search sentence concept and a document concept that matches the second search sentence concept;
  The reporting system according to claim 14.

  Further comprising a product database that stores the inclusion relationships among the product names of the plurality of products in a hierarchical structure,
  wherein the document data concept extraction unit further extracts, based on keywords included in the document data, the document concept indicating the product name described in the document data;
  The search sentence concept extraction unit further extracts, based on keywords included in the search sentence, the search sentence concept indicating the product name described in the search sentence;
  The superordinate concept acquisition unit acquires the concept in a higher level of the hierarchy than a first search sentence concept indicating that the component is defective or indicating the failure state of the component, and the concept in a higher level of the hierarchy than a second search sentence concept indicating the product name;
  When at least one of the first search sentence concept and the second search sentence concept has been replaced by its superordinate concept, the generalized concept output unit outputs, as search results, the document data having both a document concept that matches the first search sentence concept and a document concept that matches the second search sentence concept;
  The reporting system according to claim 14 or 15.

A reporting method in a reporting system to which a plurality of document data are sequentially input, the method comprising:
a document database storage step of sequentially storing input document data;
a concept database storage step of storing a plurality of predetermined concepts in a hierarchical structure in which a concept that includes a given concept occupies a higher level than the given concept;
a document data concept extraction step of extracting, based on keywords included in each piece of document data, a document concept that is the concept corresponding to that document data;
a concept ratio calculation step of calculating, for each concept, the ratio of the number of document data corresponding to that concept to the number of document data stored in the document database storage step;
a relative frequency calculation step of calculating, for each concept, a relative frequency indicating the magnitude of the ratio calculated in the concept ratio calculation step relative to a reference ratio corresponding to that concept;
a frequent concept selection step of selecting, from among the plurality of concepts, each concept whose relative frequency is equal to or higher than a predetermined threshold;
a priority concept selection step of selecting either a first concept selected in the frequent concept selection step or a second concept in the hierarchy above the first concept, based on the relative frequencies of the first concept and the second concept; and
a notification step of notifying a user that the relative frequency of the concept selected in the priority concept selection step, out of the first concept and the second concept, is high.

  A search sentence concept extraction step of extracting a search sentence concept, which is the concept corresponding to an input search sentence, based on a keyword included in the search sentence;
  A concept search step of searching the plurality of items of document data for document data for which the search sentence concept lies in a hierarchy level above or below the document concept; and
  A search result output step of outputting the document data found in the concept search step as the document data including the content specified by the search sentence.
  The reporting method according to claim 17, further comprising the above steps.
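The concept search of claim 18 can likewise be sketched in Python. The keyword dictionary `KEYWORD_TO_CONCEPT`, the hierarchy `PARENT`, and the whitespace tokenization are hypothetical stand-ins for the concept database and keyword extraction, and "above or below" is assumed to mean any ancestor or descendant, not only a direct parent or child. Following the claim wording literally, this step does not return a document whose concept is identical to the search sentence concept.

```python
from typing import Dict, List, Optional, Set

# Hypothetical concept hierarchy (concept -> higher-level concept).
PARENT: Dict[str, Optional[str]] = {
    "battery": "power",
    "charger": "power",
    "power": None,
    "screen": None,
}

# Hypothetical keyword dictionary standing in for keyword-based extraction.
KEYWORD_TO_CONCEPT: Dict[str, str] = {
    "battery": "battery",
    "charge": "charger",
    "power": "power",
    "display": "screen",
}


def ancestors(concept: str) -> Set[str]:
    """All concepts strictly above the given concept in the hierarchy."""
    result: Set[str] = set()
    current = PARENT.get(concept)
    while current is not None:
        result.add(current)
        current = PARENT.get(current)
    return result


def extract_concepts(text: str) -> Set[str]:
    """Map keywords found in the text to concepts (naive whitespace split)."""
    return {KEYWORD_TO_CONCEPT[w] for w in text.lower().split()
            if w in KEYWORD_TO_CONCEPT}


def related(query_concept: str, doc_concept: str) -> bool:
    """True if the query concept lies above or below the document concept."""
    return (query_concept in ancestors(doc_concept)
            or doc_concept in ancestors(query_concept))


def concept_search(query: str, documents: List[str]) -> List[str]:
    """Return the documents whose concept is hierarchically related to the query."""
    query_concepts = extract_concepts(query)
    return [doc for doc in documents
            if any(related(qc, dc)
                   for qc in query_concepts
                   for dc in extract_concepts(doc))]


if __name__ == "__main__":
    docs = ["the battery drains fast", "charge cable is loose",
            "display flickers", "power supply broken"]
    print(concept_search("battery died", docs))  # ['power supply broken']
```

Here the search sentence "battery died" maps to `battery` and retrieves the document about the power supply (`power`, one level above), but not the document whose concept is `battery` itself.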

A program for a reporting system into which a plurality of items of document data are sequentially input,
the program causing the reporting system to function as:
A document database that sequentially stores the input document data;
A concept database that stores a plurality of predetermined concepts in a hierarchical structure in which any concept that encompasses a given concept is placed in the hierarchy level above that concept;
A document data concept extraction unit that extracts, for each item of document data, a document concept, which is the concept corresponding to that document data, based on a keyword included in the document data;
A concept ratio calculation unit that calculates, for each of the concepts, the ratio of the number of items of document data corresponding to that concept to the number of items of document data in the document database;
A relative frequency calculation unit that calculates, for each of the concepts, a relative frequency indicating the magnitude of the ratio calculated by the concept ratio calculation unit relative to a reference ratio corresponding to that concept;
A frequent concept selection unit that selects, from among the plurality of concepts, each concept whose relative frequency is equal to or higher than a predetermined threshold;
A priority concept selection unit that selects one of a first concept selected by the frequent concept selection unit and a second concept in the hierarchy level above the first concept, based on the relative frequencies of the first concept and the second concept; and
A notification unit that notifies a user that the relative frequency of whichever of the first concept and the second concept was selected by the priority concept selection unit is high.

  The program further causes the reporting system to function as:
  A search sentence concept extraction unit that extracts a search sentence concept, which is the concept corresponding to an input search sentence, based on a keyword included in the search sentence;
  A concept search unit that searches the plurality of items of document data for document data for which the search sentence concept lies in a hierarchy level above or below the document concept; and
  A search result output unit that outputs the document data found by the concept search unit as the document data including the content specified by the search sentence.
  The program according to claim 19.