JP2010224833A

JP2010224833A - Ontology generation device and method

Info

Publication number: JP2010224833A
Application number: JP2009070959A
Authority: JP
Inventors: Shinichi Nagano; 伸一長野; Masumi Inaba; 真純稲葉; Yumiko Shimogoori; 祐美子下郡; Takayuki Iida; 貴之飯田; Masanori Hattori; 正典服部
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2009-03-23
Filing date: 2009-03-23
Publication date: 2010-10-07
Anticipated expiration: 2029-03-23
Also published as: JP5430989B2

Abstract

PROBLEM TO BE SOLVED: To provide an ontology generation device and method that allow a user to easily identify inconsistencies between the concepts constituting a generated ontology. SOLUTION: An acquisition unit 45 obtains document data. A first extraction unit 50 extracts a pattern from the document data and stores it in a pattern information storage part 36. The pattern shows a dependency between other character strings and first and second character strings, which are obtained by replacing with variables the words representing a concept pair (a set of concepts stored in a concept information storage part 31) in the character strings of a sentence in which the concept pair co-occurs. A second extraction unit 55 extracts a new concept pair from the document data using the pattern stored in the pattern information storage part 36, and stores it in the concept information storage part 31. A generation unit 60 generates an ontology using a plurality of concept pairs stored in the concept information storage part 31, a determination unit 65 determines the presence or absence of inconsistency between the concepts constituting the ontology, and an output unit 20 outputs the determination result together with the ontology. COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to an ontology generation device and method.

Ontology technology used in information processing devices and the like has long been known. An "ontology" is a kind of dictionary that systematically organizes the concepts that words carry. In an ontology, the positions of concepts relative to one another express the relative meaning between them, and using this semantic information has made advanced knowledge processing possible.

In recent years, work has been under way on techniques that build an ontology by extracting concepts and the relationships between concepts from large volumes of document data.

One such technique applies natural language processing to document data, extracts concepts and relationships using part-of-speech information, syntactic information, semantic information, or the like (for example, character string patterns or syntax tree patterns), and builds an ontology. Another technique takes the feature words and the linguistic information of sentences (part-of-speech information, syntactic information, semantic information, and so on) obtained by applying natural language processing to document data, uses them as features to extract sets of concepts that stand in a correct relationship, and builds an ontology (see Non-Patent Documents 1 and 2).

Patrick Pantel, Marco Pennacchiotti, "Espresso: Leveraging Generic Patterns for Automatically Harvesting Semantic Relations", In Proceedings of Conference on Computational Linguistics / Association for Computational Linguistics (Coling/ACL-06), pp. 113-120, 2006
Fabian M. Suchanek, Georgiana Ifrim, Gerhard Weikum, "LEILA: Learning to Extract Information by Linguistic Analysis", In Proceedings of the 2nd Workshop on Ontology Learning and Population, pp. 18-25, 2006

However, with conventional techniques such as those described above, the extracted sets of concepts may include sets in which a contradiction arises, for example concepts that refer to each other mutually, even though the relationship between the concepts is itself correct. The user may also be unable to tell that such a contradiction has arisen.

The present invention has been made in view of the above circumstances, and its object is to provide an ontology generation device and method that allow a user to easily identify contradictions between the concepts constituting a generated ontology.

To solve the problems described above and achieve the object, an ontology generation device according to one aspect of the present invention comprises: an acquisition unit that acquires document data; a concept information storage unit that stores a concept pair, which is a set of vocabulary concepts; a first extraction unit that extracts, from the document data, a pattern indicating a dependency relationship between other character strings and first and second character strings, the first and second character strings being obtained by replacing with variables each of the words representing the concepts of the concept pair in the character strings of a sentence in which the concept pair co-occurs; a pattern information storage unit that stores the extracted pattern; a second extraction unit that uses the pattern stored in the pattern information storage unit to extract a new concept pair from the document data and stores it in the concept information storage unit; a generation unit that uses a plurality of the concept pairs stored in the concept information storage unit to generate an ontology in which the relationships between concepts are organized; a determination unit that determines whether there is a contradiction between the concepts constituting the ontology; and an output unit that outputs the determination result together with the ontology.

An ontology generation method according to another aspect of the present invention includes: an acquisition step in which an acquisition unit acquires document data; a first extraction step in which a first extraction unit extracts, from the document data, a pattern indicating a dependency relationship between other character strings and first and second character strings, and stores the pattern in a pattern information storage unit, the first and second character strings being obtained by replacing with variables each of the words representing the concepts of a concept pair in the character strings of a sentence in which the concept pair co-occurs, the concept pair being a set of vocabulary concepts stored in a concept information storage unit; a second extraction step in which a second extraction unit uses the pattern stored in the pattern information storage unit to extract a new concept pair from the document data and stores it in the concept information storage unit; a generation step in which a generation unit uses a plurality of the concept pairs stored in the concept information storage unit to generate an ontology in which the relationships between concepts are organized; a determination step in which a determination unit determines whether there is a contradiction between the concepts constituting the ontology; and an output control step in which an output control unit causes an output unit to output the determination result together with the ontology.

According to the present invention, a user can easily identify contradictions between the concepts constituting a generated ontology.

FIG. 1 is a diagram illustrating an example of an ontology.
FIG. 2 is a block diagram illustrating an example of the configuration of the ontology generation device according to the present embodiment.
FIG. 3 is a diagram illustrating an example of concept information stored in the extracted concept information storage unit of the present embodiment.
FIG. 4 is a diagram illustrating an example of concept information stored in the user concept information storage unit of the present embodiment.
FIG. 5 is a diagram illustrating an example of pattern information stored in the extracted pattern information storage unit of the present embodiment.
FIG. 6 is a diagram illustrating an example of pattern information stored in the user pattern information storage unit of the present embodiment.
FIG. 7 is a diagram illustrating an example of syntax tree information according to the present embodiment.
FIG. 8 is a diagram illustrating an example of a syntax tree pattern according to the present embodiment.
FIG. 9 is a diagram illustrating an example of a state in which new pattern information is stored in the extracted pattern information storage unit of the present embodiment.
FIG. 10 is a diagram illustrating an example of a syntax tree pattern according to the present embodiment.
FIG. 11 is a diagram illustrating an example of syntax tree information of a sentence having a syntax tree pattern according to the present embodiment.
FIG. 12 is a diagram illustrating an example of a state in which new concept information is stored in the extracted concept information storage unit of the present embodiment.
FIG. 13 is a diagram illustrating an example of the ontology of the present embodiment.
FIG. 14 is a diagram illustrating an example of an output form of the ontology according to the present embodiment.
FIG. 15 is a diagram illustrating an example of a state in which the contradiction determination result is stored in the extracted concept information storage unit of the present embodiment.
FIG. 16 is a diagram illustrating an example of a state in which the contradiction determination result is stored in the user concept information storage unit of the present embodiment.
FIG. 17 is a flowchart illustrating an example of the flow of processing performed by the ontology generation device according to the present embodiment.
FIG. 18 is a flowchart illustrating an example of the flow of the pattern information extraction process according to the present embodiment.
FIG. 19 is a diagram illustrating an example of a vector used for machine learning of syntax tree patterns.
FIG. 20 is a diagram illustrating an example of a concept pair and a sentence in which the concept pair co-occurs.
FIG. 21 is a diagram illustrating an example of a vector used for machine learning of syntax tree patterns.
FIG. 22 is a diagram illustrating an example of the concept of the classifier according to the present embodiment.
FIG. 23 is a diagram illustrating an example of the classifier according to the present embodiment.
FIG. 24 is a flowchart illustrating an example of the flow of the concept information extraction process according to the present embodiment.
FIG. 25 is a diagram illustrating an example of testing the correlation of a concept pair.
FIG. 26 is a flowchart illustrating an example of the flow of the ontology generation process according to the present embodiment.
FIG. 27 is a flowchart illustrating an example of the flow of the contradiction determination process according to the present embodiment.
FIG. 28 is a flowchart illustrating an example of the flow of the registration process according to the present embodiment.
FIG. 29 is a flowchart illustrating an example of the flow of the comparison process according to the present embodiment.

The best mode for carrying out the ontology generation device and method according to the present invention is described in detail below with reference to the accompanying drawings.

First, an overview of ontologies is given.

An "ontology" is a systematic organization of the concepts of words. Two kinds of "concept" are mainly used, classes and instances, although concepts are not limited to these. A "class" is the classification name of a concept, and an "instance" is a concrete example of a concept.

In an ontology, the positions of the concepts placed in it express the relative meaning between those concepts. In general, the relationship between concepts is one of a superordinate-subordinate relationship (is-a relationship), a whole-part relationship (part-of relationship), an instance relationship (instance-of relationship), or the like.

FIG. 1 is a diagram illustrating an example of an ontology. In the example shown in FIG. 1, the ontology consists of seven classes and three instances; the classes represent vehicle type categories of automobiles, and the instances represent model years of automobiles. In the example shown in FIG. 1, the relationships between classes are superordinate-subordinate relationships, and the relationships between classes and instances are instance relationships.

An ontology can be expressed using, for example, OWL, which is an ontology description language, but the means of expression is not limited to this.

Next, the configuration of the ontology generation device according to the present embodiment is described.

FIG. 2 is a block diagram illustrating an example of the configuration of the ontology generation device 1 according to the present embodiment. As shown in FIG. 2, the ontology generation device 1 includes an input unit 10, an output unit 20, a storage unit 30, a reception unit 40, an acquisition unit 45, a pattern extraction unit 50, a concept extraction unit 55, a generation unit 60, a determination unit 65, an output control unit 70, a registration unit 75, and a comparison unit 80.

The input unit 10 is used to input various operations, such as an acquisition operation that instructs acquisition of a document, and can be implemented by an existing input device such as a keyboard, a mouse, or a touch panel.

The output unit 20 outputs, on instruction from the output control unit 70 described later, the ontology generated by the generation unit 60 described later, the determination result of the determination unit 65 described later, and so on. It can be implemented by an existing display device such as a CRT display, a liquid crystal display, a plasma display, an organic EL display, or a touch panel display. The output unit 20 may instead be implemented by an existing printing device such as a printer, or by a combination of these.

The storage unit 30 stores the information used in the various processes performed by the ontology generation device 1. It can be implemented by an existing storage medium capable of magnetic, electrical, or optical storage, such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), a memory card, an optical disk, or a RAM (Random Access Memory). The storage unit 30 includes a concept information storage unit 31 and a pattern information storage unit 36.

The concept information storage unit 31 stores concept information, which includes concept pairs (sets of concepts) and related data. It includes an extracted concept information storage unit 32, which stores concept information including the concept pairs extracted by the concept extraction unit 55 described later, and a user concept information storage unit 33, which stores concept information registered by the user.

FIG. 3 is a diagram illustrating an example of the concept information stored in the extracted concept information storage unit 32, and FIG. 4 is a diagram illustrating an example of the concept information stored in the user concept information storage unit 33. In the examples shown in FIGS. 3 and 4, the concept information includes a concept pair (concept 1 and concept 2), a relation, a label (an example of a first label), and a contradiction. The "concept pair" is information indicating a pair of vocabulary items that represent concepts, which are the building blocks of the ontology. The "relation" is information indicating the relationship between the concepts of the pair. The "label" is information indicating whether the relationship between the concepts of the pair is correct: "positive example" if the relationship is correct and "negative example" if it is not. The "contradiction" is information indicating whether a contradiction such as a mutual reference or a circular reference arises between the concepts of the pair: "present" if there is a contradiction and "absent" if there is none.
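As a minimal sketch of how one row of this concept information could be represented in code, the following uses a Python dataclass. The class name `ConceptInfo`, the field names, the use of `None` for values not yet assigned, and the sample values are all assumptions made for illustration; the specification only names the four items (concept pair, relation, label, contradiction).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConceptInfo:
    """One row of concept information (hypothetical representation)."""
    concept1: str                         # first vocabulary item of the concept pair
    concept2: str                         # second vocabulary item of the concept pair
    relation: str                         # relationship between the two concepts
    label: Optional[bool] = None          # True: positive example, False: negative example
    contradiction: Optional[bool] = None  # True: present, False: absent, None: not yet judged


# Illustrative values only; they are not taken from FIG. 3 or FIG. 4.
record = ConceptInfo("Corolla", "sedan", "vehicle type", label=True, contradiction=False)
print(record)
```

Keeping the label and the contradiction flag as separate fields mirrors the text: a pair can have a correct relationship (positive label) and still be flagged as contradictory.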

The pattern information storage unit 36 stores pattern information, which includes syntax tree patterns indicating the dependency relationships between character strings, and related data. It includes an extracted pattern information storage unit 37, which stores pattern information including the syntax tree patterns extracted by the pattern extraction unit 50 described later, and a user pattern information storage unit 38, which stores pattern information registered by the user.

FIG. 5 is a diagram illustrating an example of the pattern information stored in the extracted pattern information storage unit 37, and FIG. 6 is a diagram illustrating an example of the pattern information stored in the user pattern information storage unit 38. In the examples shown in FIGS. 5 and 6, the pattern information includes a syntax tree pattern, a relation, and a label (an example of a second label). The "syntax tree pattern" is information indicating the dependency relationships between character strings. The "relation" is information indicating the relationship between the concepts of the concept pairs extracted by the syntax tree pattern. The "label" is information indicating whether the pattern is effective for extracting concept pairs: "positive example" if it is effective and "negative example" if it is not.

The reception unit 40 receives the various operations input through the input unit 10. Specifically, the reception unit 40 receives an acquisition operation, an operation to register concept information in the user concept information storage unit 33, an operation to register pattern information in the user pattern information storage unit 38, an operation to compare concept information between the extracted concept information storage unit 32 and the user concept information storage unit 33, an operation to compare pattern information between the extracted pattern information storage unit 37 and the user pattern information storage unit 38, and the like.

The acquisition unit 45 acquires document data. Specifically, the acquisition unit 45 acquires document data when the reception unit 40 receives an acquisition operation. The document data may be stored in the storage unit 30 in advance and acquired from there by the acquisition unit 45, or the acquisition unit 45 may acquire document data stored in an external device (not shown) such as a server.

The document data acquired by the acquisition unit 45 is a collection of document data written about a specific domain; examples include patent documents, technical documents, sales documents, and business documents. The format of the document data acquired by the acquisition unit 45 is not limited, and it may be a structured, semi-structured, or unstructured document.

The pattern extraction unit 50 (an example of a first extraction unit) extracts a pattern from the document data acquired by the acquisition unit 45. The pattern indicates a dependency relationship between other character strings and first and second character strings, which are obtained by replacing with variables each of the words representing the concepts of a concept pair stored in the concept information storage unit 31, in the character strings of a sentence in which that concept pair co-occurs.

Specifically, the pattern extraction unit 50 reads concept information from the extracted concept information storage unit 32 or the user concept information storage unit 33 and searches for document data in which the concept pair of the read concept information co-occurs. The pattern extraction unit 50 then parses the retrieved document data and extracts each sentence in which the concept pair co-occurs as syntax tree information.

FIG. 7 is a diagram illustrating an example of the syntax tree information extracted by the pattern extraction unit 50 using the concept information 113 shown in FIG. 4. The syntax tree information shown in FIG. 7 was generated by parsing a sentence composed of character strings 121 to 125, and the character strings 121 to 125 form the syntax tree. The syntax tree information shown in FIG. 7 indicates the dependency relationships among the character strings 121 to 125 of the syntax tree.

The pattern extraction unit 50 then deletes unnecessary character strings from the extracted syntax tree information, replaces the character strings corresponding to the concept information with variables to form a syntax tree pattern, and stores the syntax tree pattern, together with a relation and a label, in the extracted concept information storage unit 32.

In deleting unnecessary character strings, the pattern extraction unit 50 deletes every character string other than those on the shortest path on which the character strings corresponding to the concept information appear and those directly connected to such character strings. The relation stored in the extracted concept information storage unit 32 is the information indicated by the relation of the concept pair used to extract the syntax tree pattern; the label stored in the extracted concept information storage unit 32 is described later.

FIG. 8 is a diagram illustrating an example of the syntax tree pattern of the syntax tree information shown in FIG. 7. In the example shown in FIG. 8, the character string 121 has been deleted from the syntax tree information shown in FIG. 7, and the character strings 122 and 124, which correspond to the concept pair of the concept information 113, have been replaced with character strings 132 and 134, each of which contains a variable.
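The pruning and variable substitution described above can be sketched as follows. This is an illustrative sketch, not the implementation in the specification: the representation of the syntax tree as a `chunks` map (id to text) and a `heads` map (id to the id it depends on), the variable names `X` and `Y`, and the English sample sentence are all assumptions. The sketch also assumes that the two concept words appear in different chunks of a connected tree.

```python
from collections import deque


def extract_pattern(chunks, heads, concept_pair):
    """Prune a dependency tree and replace the concept words with variables.

    chunks: {chunk id: text}, heads: {chunk id: id of the chunk it depends on, or None}.
    Returns {chunk id: text} for the chunks that remain in the pattern.
    """
    # Build an undirected adjacency map from the dependency links.
    adj = {i: set() for i in chunks}
    for child, head in heads.items():
        if head is not None:
            adj[child].add(head)
            adj[head].add(child)

    # Locate the chunk that contains each concept word.
    src = next(i for i, t in chunks.items() if concept_pair[0] in t)
    dst = next(i for i, t in chunks.items() if concept_pair[1] in t)

    # Breadth-first search gives the shortest path between the two chunks.
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    path = set()
    node = dst
    while node is not None:
        path.add(node)
        node = prev[node]

    # Keep the shortest path plus chunks directly connected to a concept chunk.
    keep = path | adj[src] | adj[dst]
    variables = {src: (concept_pair[0], "X"), dst: (concept_pair[1], "Y")}
    pattern = {}
    for i in sorted(keep):
        text = chunks[i]
        if i in variables:
            word, var = variables[i]
            text = text.replace(word, var)
        pattern[i] = text
    return pattern


# Hypothetical five-chunk sentence; chunk 1 plays the role of the deleted string.
chunks = {1: "Reportedly,", 2: "the Corolla,", 3: "a type of", 4: "sedan,", 5: "sells well."}
heads = {1: 5, 2: 3, 3: 4, 4: 5, 5: None}
pattern = extract_pattern(chunks, heads, ("Corolla", "sedan"))
print(pattern)
```

In this sample, chunk 1 is neither on the path between the two concept chunks nor directly connected to either of them, so it is dropped, while chunk 5 survives because it is directly connected to the second concept chunk.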

FIG. 9 is a diagram illustrating an example of a state in which pattern information including the syntax tree pattern shown in FIG. 8 has been newly stored in the extracted pattern information storage unit 37. In the example shown in FIG. 9, the syntax tree pattern composed of character strings 132, 123, 134, and 125 is stored in the extracted pattern information storage unit 37, together with a relation and a label, as pattern information 141. Because the concept information used to extract the syntax tree pattern shown in FIG. 8 is the concept information 113 shown in FIG. 4, the relation of the pattern information 141 is "vehicle type".

The concept extraction unit 55 (an example of a second extraction unit) uses the syntax tree patterns stored in the pattern information storage unit 36 to extract new concept pairs from the document data and stores them in the concept information storage unit 31.

Specifically, the concept extraction unit 55 reads pattern information from the extracted pattern information storage unit 37 or the user pattern information storage unit 38, and extracts sentences having the syntax tree pattern of the read pattern information from the document data acquired by the acquisition unit 45.

FIG. 10 is a diagram illustrating an example of the syntax tree pattern of the pattern information 117 shown in FIG. 6. The syntax tree pattern shown in FIG. 10 is composed of character strings 153 to 155.

FIG. 11 is a diagram illustrating an example of the syntax tree information of a sentence having the syntax tree pattern shown in FIG. 10. The syntax tree information shown in FIG. 11 is that of a sentence composed of character strings 161 to 165, and the character strings 163 to 165 correspond to the character strings 153 to 155, respectively, of the syntax tree pattern shown in FIG. 10. The sentence composed of the character strings 161 to 165 shown in FIG. 11 is therefore extracted by the concept extraction unit 55.
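To illustrate how a stored pattern can pick out a matching sentence and bind its variables, here is a simplified sketch. It assumes, purely for illustration, that the pattern is a dependency chain in which each chunk depends on the next, that variables are written `{X}` and `{Y}`, and that each variable stands for a single word. The function names and the sample sentence are hypothetical, and the correlation test applied to candidate pairs is not modeled here.

```python
import re


def template_to_regex(template):
    """Turn a chunk template such as '{X} is' into a compiled regular expression."""
    parts = re.split(r"(\{X\}|\{Y\})", template)
    out = []
    for part in parts:
        if part == "{X}":
            out.append(r"(?P<X>\w+)")    # variable for the first concept
        elif part == "{Y}":
            out.append(r"(?P<Y>\w+)")    # variable for the second concept
        else:
            out.append(re.escape(part))  # literal text must match exactly
    return re.compile("".join(out))


def find_concept_pairs(pattern, chunks, heads):
    """Find chains of chunks that match the pattern and return the (X, Y) bindings."""
    regexes = [template_to_regex(t) for t in pattern]
    pairs = []
    for start in chunks:
        node, bindings = start, {}
        for rx in regexes:
            if node is None:
                break                    # the sentence ended before the pattern did
            match = rx.fullmatch(chunks[node])
            if match is None:
                break                    # this chunk does not fit the pattern
            bindings.update(match.groupdict())
            node = heads[node]           # follow the dependency link
        else:
            if "X" in bindings and "Y" in bindings:
                pairs.append((bindings["X"], bindings["Y"]))
    return pairs


# Hypothetical pattern (three chunks) and sentence (five chunks, two of them extra).
pattern = ["{X} is", "a type of {Y}", "that sells well."]
chunks = {1: "Reportedly,", 2: "this year", 3: "Prius is",
          4: "a type of hybrid", 5: "that sells well."}
heads = {1: 5, 2: 5, 3: 4, 4: 5, 5: None}
pairs = find_concept_pairs(pattern, chunks, heads)
print(pairs)
```

As in FIG. 11, the sentence contains chunks that are not part of the pattern; they do not prevent the match because the search starts from every chunk and only follows the dependency links that the pattern requires.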

The concept extraction unit 55 then tests the correlation of the vocabulary items that contain the variable parts of the syntax tree pattern of the extracted sentence. If they are correlated, it extracts the character strings corresponding to the variable parts of the syntax tree pattern as a concept pair and stores the pair, together with a relation and a label, in the extracted concept information storage unit 32.

FIG. 12 is a diagram illustrating an example of a state in which concept information including the concept pair extracted from the sentence shown in FIG. 11 has been newly stored in the extracted concept information storage unit 32. In the example shown in FIG. 12, the concept pair, which is the pair of text items contained in the character strings 163 and 164 shown in FIG. 11, is stored in the extracted concept information storage unit 32, together with a relation, a label, and a contradiction, as concept information 171. Because the pattern information used to extract the concept pair is the pattern information 117 shown in FIG. 6, the relation of the concept information 171 is "vehicle type". The label and the contradiction are described later.

生成部６０は、概念情報記憶部３１に記憶されている複数の概念ペアを用いてオントロジーを生成する。具体的には、生成部６０は、抽出概念情報記憶部３２又はユーザ概念情報記憶部３３から概念情報を読み出し、読み出した概念情報の概念ペアのそれぞれの名称を付与したノードとともに、各ノード間のリンクを作成する。 The generation unit 60 generates an ontology using a plurality of concept pairs stored in the concept information storage unit 31. Specifically, the generation unit 60 reads the concept information from the extracted concept information storage unit 32 or the user concept information storage unit 33, creates nodes named after each concept of the concept pair in the read concept information, and creates links between the nodes.

図１３は、図４に示す概念情報１１３、１１４及び、図１２に示す概念情報１１１の概念ペアを用いて生成されたオントロジーの一例を示す図である。図１３に示すオントロジーは、ノード１８１〜１８３により構成されるオントロジーであり、ノード１８１及びノード１８２間ではリンクが巡回している。 FIG. 13 is a diagram illustrating an example of an ontology generated using the concept pairs of the concept information 113 and 114 illustrated in FIG. 4 and the concept information 111 illustrated in FIG. 12. The ontology shown in FIG. 13 is composed of nodes 181 to 183, and the links form a cycle between the node 181 and the node 182.

判定部６５は、生成部６０により生成されたオントロジーを構成する概念間の矛盾の有無を判定する。具体的には、判定部６５は、生成部６０により生成されたオントロジーを構成する概念間に巡回が生じる場合に、当該概念間に矛盾があると判定する。そして、判定部６５は、判定結果である矛盾の有無をオントロジーの生成に用いられた概念ペアに対応付けて抽出概念情報記憶部３２又はユーザ概念情報記憶部３３に記憶させる。 The determination unit 65 determines whether there is a contradiction between concepts constituting the ontology generated by the generation unit 60. Specifically, the determination unit 65 determines that there is a contradiction between concepts when a cycle occurs between the concepts constituting the ontology generated by the generation unit 60. Then, the determination unit 65 causes the extracted concept information storage unit 32 or the user concept information storage unit 33 to store the presence or absence of a contradiction as a determination result in association with the concept pair used for the ontology generation.

図１３に示す例では、ノード１８１及びノード１８２間ではリンクが巡回しているため、判定部６５は、図１４に示すようにノード１８１及びノード１８２の生成元の概念ペアに矛盾があると判定する。 In the example shown in FIG. 13, since the links form a cycle between the node 181 and the node 182, the determination unit 65 determines that there is a contradiction in the concept pairs from which the node 181 and the node 182 were generated, as shown in FIG. 14.

図１５は、図１４に示すオントロジーの矛盾判定により、矛盾の有無が抽出概念情報記憶部３２に記憶された状態の一例を示す図であり、図１６は、図１４に示すオントロジーの矛盾判定により、矛盾の有無がユーザ概念情報記憶部３３に記憶された状態の一例を示す図である。 FIG. 15 is a diagram illustrating an example of a state in which the presence or absence of a contradiction is stored in the extracted concept information storage unit 32 as a result of the contradiction determination for the ontology illustrated in FIG. 14. FIG. 16 is a diagram illustrating an example of a state in which the presence or absence of a contradiction is stored in the user concept information storage unit 33 as a result of the contradiction determination for the ontology illustrated in FIG. 14.

図１３に示すオントロジーでは、ノード１８１及びノード１８２の生成元の概念ペアに矛盾があると判定されるため、図１５に示す抽出概念情報記憶部３２の概念情報１１１、及び図１６に示すユーザ概念情報記憶部３３の概念情報１１４には、矛盾ありが設定される。 In the ontology shown in FIG. 13, it is determined that there is a contradiction in the concept pairs from which the node 181 and the node 182 were generated. Therefore, "contradiction present" is set in the concept information 111 of the extracted concept information storage unit 32 shown in FIG. 15 and in the concept information 114 of the user concept information storage unit 33 shown in FIG. 16.

出力制御部７０は、生成部６０により生成されたオントロジーとともに判定部６５の判定結果を出力部２０に出力させる。例えば、出力制御部７０は、図１４に示すように、矛盾が生じる部分を出力部２０に強調表示させる。また出力制御部７０は、後述の比較部８０の比較結果を出力部２０に出力させる。 The output control unit 70 causes the output unit 20 to output the determination result of the determination unit 65 together with the ontology generated by the generation unit 60. For example, as illustrated in FIG. 14, the output control unit 70 causes the output unit 20 to highlight a portion where a contradiction occurs. Further, the output control unit 70 causes the output unit 20 to output a comparison result of the comparison unit 80 described later.

登録部７５は、受付部４０により登録操作が受け付けられると、受け付けられた概念ペアをユーザ概念情報記憶部３３に登録したり、受け付けられた構文木パターンをユーザパターン情報記憶部３８に登録する。 When a registration operation is received by the receiving unit 40, the registration unit 75 registers the received concept pair in the user concept information storage unit 33 or registers the received syntax tree pattern in the user pattern information storage unit 38.

比較部８０は、受付部４０により比較操作が受け付けられると、抽出概念情報記憶部３２及びユーザ概念情報記憶部３３の双方に記憶された同一の概念ペアに対するラベルの比較や、抽出パターン情報記憶部３７及びユーザパターン情報記憶部３８の双方に記憶された同一の構文木パターンに対するラベルの比較を行う。 When a comparison operation is received by the receiving unit 40, the comparison unit 80 compares the labels of the same concept pair stored in both the extracted concept information storage unit 32 and the user concept information storage unit 33, or compares the labels of the same syntax tree pattern stored in both the extracted pattern information storage unit 37 and the user pattern information storage unit 38.

次に、本実施の形態のオントロジー生成装置の動作について説明する。 Next, the operation of the ontology generation device according to the present embodiment will be described.

図１７は、本実施の形態のオントロジー生成装置１で行われるオントロジー生成処理の手順の流れの一例を示すフローチャートである。 FIG. 17 is a flowchart illustrating an example of a flow of an ontology generation process performed by the ontology generation apparatus 1 according to the present embodiment.

まず、取得部４５は、受付部４０に取得操作が受け付けられると、文書データ群を取得する（ステップＳ１０）。 First, when the receiving unit 40 receives an acquisition operation, the acquisition unit 45 acquires a document data group (step S10).

続いて、パターン抽出部５０は、取得部４５により取得された文書から、概念情報記憶部３１に記憶されている概念情報の概念ペアが共起する文の構文木情報であって、前記概念ペアの概念それぞれを変数に置き換えた構文木と他の構文木との依存関係を示す構文木パターンを抽出して、パターン情報記憶部３６に記憶させるパターン情報抽出処理を行う（ステップＳ２０）。なお、パターン情報抽出処理の詳細は後述する。 Subsequently, the pattern extraction unit 50 performs a pattern information extraction process (step S20). In this process, it extracts a syntax tree pattern from the documents acquired by the acquisition unit 45 and stores the pattern in the pattern information storage unit 36. The syntax tree pattern is syntax tree information of a sentence in which a concept pair of the concept information stored in the concept information storage unit 31 co-occurs, and it indicates the dependency relationship between the syntax trees in which each concept of the concept pair is replaced with a variable and the other syntax trees. Details of the pattern information extraction process will be described later.

続いて、概念抽出部５５は、パターン情報記憶部３６に記憶されている構文木パターンを用いて、取得部４５により取得された文書データから新たな概念ペアを抽出し、概念情報記憶部３１に記憶させる概念情報抽出処理を行う（ステップＳ３０）。なお、概念情報抽出処理の詳細は後述する。 Subsequently, the concept extraction unit 55 extracts a new concept pair from the document data acquired by the acquisition unit 45 using the syntax tree pattern stored in the pattern information storage unit 36, and stores it in the concept information storage unit 31. A concept information extraction process is performed (step S30). Details of the concept information extraction process will be described later.

続いて、概念抽出部５５により新たな概念ペアを含む概念情報が概念情報記憶部３１に記憶され、概念情報記憶部３１の概念情報数が増加した場合には（ステップＳ４０でＹｅｓ）、生成部６０は、概念情報記憶部３１に記憶されている概念情報の概念ペアを用いて、オントロジー生成処理を行う（ステップＳ５０）。なお、オントロジー生成処理の詳細は後述する。 Subsequently, when the concept information including the new concept pair is stored in the concept information storage unit 31 by the concept extraction unit 55 and the number of concept information in the concept information storage unit 31 increases (Yes in step S40), the generation unit 60 performs an ontology generation process using the concept pair of concept information stored in the concept information storage unit 31 (step S50). Details of the ontology generation process will be described later.

続いて、判定部６５は、生成部６０により生成されたオントロジーを構成する概念間の矛盾の有無を判定する矛盾判定処理を行う（ステップＳ６０）。なお、矛盾判定処理の詳細は後述する。 Subsequently, the determination unit 65 performs a contradiction determination process for determining whether there is a contradiction between concepts constituting the ontology generated by the generation unit 60 (step S60). Details of the contradiction determination process will be described later.

続いて、出力制御部７０は、生成部６０により生成されたオントロジーとともに、判定部６５の判定結果を出力部２０に出力させる（ステップＳ７０）。 Subsequently, the output control unit 70 causes the output unit 20 to output the determination result of the determination unit 65 together with the ontology generated by the generation unit 60 (step S70).

図１８は、図１７のステップＳ２０に示すパターン情報抽出処理の手順の流れの一例を示すフローチャートである。 FIG. 18 is a flowchart showing an example of the flow of the pattern information extraction process shown in step S20 of FIG. 17.

まず、パターン抽出部５０は、抽出概念情報記憶部３２又はユーザ概念情報記憶部３３から読み出す概念情報を選択する（ステップＳ２０２）。 First, the pattern extraction unit 50 selects concept information to be read from the extracted concept information storage unit 32 or the user concept information storage unit 33 (step S202).

この際、パターン抽出部５０は、抽出概念情報記憶部３２及びユーザ概念情報記憶部３３に同一の概念ペアを有する概念情報が記憶され、両概念ペアのラベルが異なる場合には、ユーザ概念情報記憶部３３に記憶された概念情報を選択する。 At this time, if concept information having the same concept pair is stored in both the extracted concept information storage unit 32 and the user concept information storage unit 33 and the labels of the two concept pairs differ, the pattern extraction unit 50 selects the concept information stored in the user concept information storage unit 33.

例えば、図３に示す抽出概念情報記憶部３２に記憶された概念情報１１２と、図４に示すユーザ概念情報記憶部３３に記憶された概念情報１１３とは、概念ペアが同一であり、ラベルが互いに異なるため、パターン抽出部５０は、ユーザ概念情報記憶部３３に記憶された概念情報１１３を選択する。 For example, the concept information 112 stored in the extracted concept information storage unit 32 shown in FIG. 3 and the concept information 113 stored in the user concept information storage unit 33 shown in FIG. 4 have the same concept pair but different labels. Therefore, the pattern extraction unit 50 selects the concept information 113 stored in the user concept information storage unit 33.
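
The selection rule described above can be sketched as follows. This is only an illustrative sketch: the dictionary layout, the function name `select_concept_info`, and the placeholder concept names are assumptions introduced here and do not appear in the embodiment.

```python
def select_concept_info(extracted, user):
    """Merge two stores keyed by concept pair.

    Hypothetical layout: each store maps a (concept, concept) tuple to its label.
    When the same pair exists in both stores, the user-registered entry is chosen.
    """
    selected = dict(extracted)
    selected.update(user)  # entries from the user store override extracted ones
    return selected


# Placeholder data: the same pair carries different labels in the two stores.
extracted_store = {("concept A", "concept B"): "negative",
                   ("concept C", "concept D"): "positive"}
user_store = {("concept A", "concept B"): "positive"}

selected = select_concept_info(extracted_store, user_store)
```

When the labels of a shared pair agree, either entry gives the same result, so overriding unconditionally is equivalent to the rule in the text for the purpose of this sketch.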

続いて、パターン抽出部５０は、選択した概念情報を読み出し、読み出した概念情報の概念ペアが共起する文書データを、取得部４５により取得された文書データの中から検索する（ステップＳ２０４）。 Subsequently, the pattern extraction unit 50 reads the selected concept information and searches the document data acquired by the acquisition unit 45 for document data in which the concept pairs of the read concept information co-occur (step S204).

続いて、パターン抽出部５０は、検索した文書データを構文解析し、概念ペアが共起する文を構文木情報として抽出する（ステップＳ２０６）。図７に示す例では、パターン抽出部５０は、図４に示す概念情報１１３を用いて、文字列１２１〜１２５から構成される文を構文木情報として抽出している。 Subsequently, the pattern extraction unit 50 parses the retrieved document data, and extracts a sentence in which concept pairs co-occur as syntax tree information (step S206). In the example shown in FIG. 7, the pattern extraction unit 50 uses the concept information 113 shown in FIG. 4 to extract sentences composed of character strings 121 to 125 as syntax tree information.

続いて、パターン抽出部５０は、抽出した文の構文木情報から不要な文字列を削除し、概念情報に対応する文字列を変数に置き換えた構文木パターンを、新たに抽出概念情報記憶部３２に格納する（ステップＳ２０８）。 Subsequently, the pattern extraction unit 50 deletes unnecessary character strings from the syntax tree information of the extracted sentence, and newly stores, in the extracted concept information storage unit 32, the syntax tree pattern obtained by replacing the character strings corresponding to the concept information with variables (step S208).

図８に示す例では、パターン抽出部５０は、図７に示す構文木情報から文字列１２１を削除し、概念情報１１３の概念ペアに対応する文字列１２２、１２４を、それぞれ変数を含む文字列１３２、１３４に置き換えている。 In the example illustrated in FIG. 8, the pattern extraction unit 50 deletes the character string 121 from the syntax tree information illustrated in FIG. 7, and character strings 122 and 124 corresponding to the concept pairs in the concept information 113 are character strings including variables, respectively. 132 and 134 are replaced.
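
As a rough illustration of step S208, the sketch below drops an unneeded character string and replaces the two concepts with variables. It treats the sentence as a flat list of chunks and ignores the dependency structure that the real syntax tree carries. The function name `generalize`, the variable names `X` and `Y`, and the sample chunks are all hypothetical.

```python
def generalize(chunks, concept_pair, drop=()):
    """Replace each concept with a variable and drop unneeded chunks."""
    variables = {concept_pair[0]: "X", concept_pair[1]: "Y"}
    pattern = []
    for chunk in chunks:
        if chunk in drop:
            continue  # corresponds to deleting an unnecessary character string
        for concept, var in variables.items():
            chunk = chunk.replace(concept, var)
        pattern.append(chunk)
    return pattern


# Made-up sentence chunks in which the pair ("alpha", "beta") co-occurs.
chunks = ["recently", "alpha is", "a popular", "kind of beta", "they say"]
pattern = generalize(chunks, ("alpha", "beta"), drop={"recently"})
```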

続いて、パターン抽出部５０は、抽出した構文木情報を構成する文字列（部分木）、抽出に用いた概念ペア、及び概念ペアのラベルを素性として構文木パターンを機械学習することで、構文木パターンのラベルの付与に用いる分類器を生成する（ステップＳ２１０）。なお、機械学習の手法としては、例えばＳＶＭなどが挙げられるが、これに限定されるものではない。 Subsequently, the pattern extraction unit 50 performs machine learning on the syntax tree patterns, using as features the character strings (subtrees) constituting the extracted syntax tree information, the concept pair used for the extraction, and the label of the concept pair. This generates a classifier used for assigning labels to syntax tree patterns (step S210). An example of the machine learning method is an SVM, but the method is not limited to this.

ここで、図１９〜図２３を参照しながら、分類器について説明する。 Here, the classifier will be described with reference to FIGS. 19 to 23.

図１９は、構文木パターンの機械学習に用いるベクトルの一例を示す図である。図１９に示す例では、図７に示す構文木情報を構成する文字列１２１〜１２５、図８に示す構文木パターンの抽出に用いた図４に示す概念情報１１３の概念ペア、及び概念情報１１３の正例を示すラベルを素性としたベクトルを生成している。 FIG. 19 is a diagram illustrating an example of a vector used for machine learning of a syntax tree pattern. In the example shown in FIG. 19, a vector is generated whose features are the character strings 121 to 125 constituting the syntax tree information shown in FIG. 7, the concept pair of the concept information 113 shown in FIG. 4 that was used for extracting the syntax tree pattern shown in FIG. 8, and the label of the concept information 113 indicating a positive example.

具体的には、図１９に示す例では、概念情報１１３の正例を示すラベル、文字列１２１〜１２５、概念情報１１３の概念ペア、文字列１２１〜１２５及び概念情報１１３の概念ペアの係り受け関係のそれぞれに対して、ベクトルを生成している。 Specifically, in the example illustrated in FIG. 19, a vector is generated for each of the following: the label of the concept information 113 indicating a positive example, the character strings 121 to 125, the concept pair of the concept information 113, and the dependency relations among the character strings 121 to 125 and the concept pair of the concept information 113.

図２０は、概念ペアと、概念ペアが共起する文の一例を示す図である。図２０に示す例では、概念ペア２１１〜２１５のそれぞれに文２２１〜２２５が共起する。 FIG. 20 is a diagram illustrating an example of concept pairs and the sentences in which the concept pairs co-occur. In the example illustrated in FIG. 20, the concept pairs 211 to 215 co-occur in the sentences 221 to 225, respectively.

図２１は、図２０に示す概念ペア及び当該概念ペアが共起する文から生成したベクトル集合の一例を示す図である。なお、図２１に示す例では、表２２７が表２２６の右列に続くものであり、表２２８が表２２７の右列に続くものである。 FIG. 21 is a diagram illustrating an example of a vector set generated from the concept pair illustrated in FIG. 20 and a sentence in which the concept pair co-occurs. In the example shown in FIG. 21, the table 227 follows the right column of the table 226, and the table 228 continues after the right column of the table 227.

図２２は、図２１に示すベクトル集合を機械学習することで生成される分類器の概念の一例を示す図であり、図２３は、図２１に示すベクトル集合を機械学習することで生成される分類器の一例を示す図である。なお、図２３に示す例では、表２４２が表２４１の右列に続くものであり、表２４３が表２４２の右列に続くものである。 FIG. 22 is a diagram illustrating an example of the concept of a classifier generated by machine learning of the vector set shown in FIG. 21, and FIG. 23 is generated by machine learning of the vector set shown in FIG. It is a figure which shows an example of a classifier. In the example illustrated in FIG. 23, the table 242 continues to the right column of the table 241, and the table 243 continues to the right column of the table 242.

図２２に示す例では、ベクトル空間２３０上に、「正例」、「負例」のベクトルが存在しており、境界曲線２３３により負例のベクトル空間２３１と正例のベクトル空間２３２に分けられている。従って、ベクトル集合を図２２に示す分類器に入力することにより、構文木パターンに付与するラベルが正例であるか負例であるかを決定することができる。 In the example shown in FIG. 22, “positive example” and “negative example” vectors exist on the vector space 230, and are divided into a negative example vector space 231 and a positive example vector space 232 by a boundary curve 233. ing. Therefore, by inputting the vector set to the classifier shown in FIG. 22, it is possible to determine whether the label to be given to the syntax tree pattern is a positive example or a negative example.
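
To make the idea of separating positive and negative vectors concrete, the following sketch trains a simple perceptron on a few binary feature vectors. Note that this is a stand-in, not the SVM mentioned as an example in the text, and it learns a straight boundary rather than the boundary curve 233 of FIG. 22. The feature vectors and function names are invented for illustration.

```python
def train_perceptron(samples, epochs=20):
    """samples: list of (feature_vector, label), label is +1 (positive) or -1 (negative)."""
    dim = len(samples[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified (or on the boundary): adjust the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b


def classify(model, x):
    """Return the label for vector x according to the side of the boundary it falls on."""
    w, b = model
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "positive" if score > 0 else "negative"


# Invented training vectors; each position stands for the presence of one feature.
training = [([1, 1, 0], 1), ([1, 0, 1], 1), ([0, 0, 1], -1), ([0, 1, 0], -1)]
model = train_perceptron(training)
labels = [classify(model, x) for x, _ in training]
```

A kernel SVM would be needed to obtain a curved boundary such as the one in the figure; the perceptron is used here only because it fits in a few self-contained lines.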

図１８に戻り、パターン抽出部５０は、抽出概念情報記憶部３２及びユーザ概念情報記憶部３３に記憶された全ての選択可能な概念情報を処理するまで、ステップＳ２０２〜ステップＳ２１０の処理を繰り返す（ステップＳ２１２でＮｏ）。 Returning to FIG. 18, the pattern extraction unit 50 repeats the processing from step S202 to step S210 until all selectable concept information stored in the extracted concept information storage unit 32 and the user concept information storage unit 33 has been processed (No in step S212).

そして、全ての選択可能な概念情報を処理した場合には（ステップＳ２１２でＹｅｓ）、パターン抽出部５０は、抽出した構文木パターンを選択して、分類器に入力する（ステップＳ２１４）。 If all selectable conceptual information has been processed (Yes in step S212), the pattern extraction unit 50 selects the extracted syntax tree pattern and inputs it to the classifier (step S214).

続いて、パターン抽出部５０は、分類器から、入力した構文木パターンに対応付ける分類ラベルを獲得し、構文木パターンに対応付けて抽出パターン情報記憶部３７及びユーザパターン情報記憶部３８に格納する（ステップＳ２１６）。例えば、図８に示す構文木パターンを図２３に示す分類器に入力した場合、いずれの文字列（構文木）も正例となるため、図８に示す構文木パターンには正例のラベルが対応付けられる。 Subsequently, the pattern extraction unit 50 acquires, from the classifier, a classification label to be associated with the input syntax tree pattern, and stores the label in the extracted pattern information storage unit 37 and the user pattern information storage unit 38 in association with the syntax tree pattern (step S216). For example, when the syntax tree pattern shown in FIG. 8 is input to the classifier shown in FIG. 23, every character string (syntax tree) is a positive example, so a positive example label is associated with the syntax tree pattern shown in FIG. 8.

続いて、パターン抽出部５０は、抽出した全ての構文木パターンを処理するまで、ステップＳ２１４〜ステップＳ２１６の処理を繰り返す（ステップＳ２１８でＮｏ）。そして、抽出した全ての構文木パターンを処理した場合には（ステップＳ２１８でＹｅｓ）、パターン抽出部５０は、処理を終了する。 Subsequently, the pattern extraction unit 50 repeats the processing from step S214 to step S216 until all the extracted syntax tree patterns are processed (No in step S218). If all the extracted syntax tree patterns have been processed (Yes in step S218), the pattern extraction unit 50 ends the process.

図２４は、図１７のステップＳ３０に示す概念情報抽出処理の手順の流れの一例を示すフローチャートである。 FIG. 24 is a flowchart illustrating an example of the flow of the concept information extraction process shown in step S30 of FIG. 17.

まず、概念抽出部５５は、抽出パターン情報記憶部３７又はユーザパターン情報記憶部３８から読み出すパターン情報を選択する（ステップＳ３０２）。 First, the concept extraction unit 55 selects pattern information to be read from the extracted pattern information storage unit 37 or the user pattern information storage unit 38 (step S302).

この際、概念抽出部５５は、抽出パターン情報記憶部３７及びユーザパターン情報記憶部３８に同一の構文木パターンを有するパターン情報が記憶され、両構文木パターンのラベルが異なる場合には、ユーザパターン情報記憶部３８に記憶されたパターン情報を選択する。 At this time, if pattern information having the same syntax tree pattern is stored in both the extracted pattern information storage unit 37 and the user pattern information storage unit 38 and the labels of the two syntax tree patterns differ, the concept extraction unit 55 selects the pattern information stored in the user pattern information storage unit 38.

例えば、図５に示す抽出パターン情報記憶部３７に記憶されたパターン情報１１６と、図６に示すユーザパターン情報記憶部３８に記憶されたパターン情報１１７とは、構文木パターンが同一であり、ラベルが互いに異なるため、概念抽出部５５は、ユーザパターン情報記憶部３８に記憶されたパターン情報１１７を選択する。 For example, the pattern information 116 stored in the extracted pattern information storage unit 37 shown in FIG. 5 and the pattern information 117 stored in the user pattern information storage unit 38 shown in FIG. 6 have the same syntax tree pattern but different labels. Therefore, the concept extraction unit 55 selects the pattern information 117 stored in the user pattern information storage unit 38.

続いて、概念抽出部５５は、選択したパターン情報を読み出し、読み出したパターン情報の構文木パターンが正例であるか否かを確認する（ステップＳ３０４）。 Subsequently, the concept extraction unit 55 reads the selected pattern information and checks whether or not the syntax tree pattern of the read pattern information is a positive example (step S304).

続いて、概念抽出部５５は、選択した構文木パターンが正例である場合（ステップＳ３０４でＹｅｓ）には、取得部４５により取得された文書データの中から、当該構文木パターンを有する文を抽出する（ステップＳ３０６）。図１１に示す例では、概念抽出部５５は、図１０に示す構文木情報を有する文字列１６１〜１６５から構成される文を抽出する。なお、選択した構文木パターンが負例である場合（ステップＳ３０４でＮｏ）には、概念抽出部５５は、新たなパターン情報を選択する。 Subsequently, when the selected syntax tree pattern is a positive example (Yes in step S304), the concept extraction unit 55 extracts sentences having the syntax tree pattern from the document data acquired by the acquisition unit 45 (step S306). In the example illustrated in FIG. 11, the concept extraction unit 55 extracts the sentence composed of the character strings 161 to 165, which has the syntax tree information illustrated in FIG. 10. When the selected syntax tree pattern is a negative example (No in step S304), the concept extraction unit 55 selects new pattern information.

続いて、概念抽出部５５は、構文木パターンの変数に相当する語彙のペアを概念情報として抽出する（ステップＳ３０８）。図１１に示す例では、概念抽出部５５は、文字列１６３及び文字列１６４に含まれる文字のペアを概念ペアとして抽出する。 Subsequently, the concept extraction unit 55 extracts vocabulary pairs corresponding to variables of the syntax tree pattern as concept information (step S308). In the example illustrated in FIG. 11, the concept extraction unit 55 extracts a character pair included in the character string 163 and the character string 164 as a concept pair.
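
Steps S306 and S308 can be pictured with the following simplified sketch, which matches a flat token pattern against plain sentences by means of a regular expression and returns the words bound to the variables. The embodiment matches syntax tree patterns rather than flat strings, so this is an approximation under that assumption; the function names, the variables `X` and `Y`, and the sample sentences are hypothetical.

```python
import re


def pattern_to_regex(tokens):
    """Build a regex from pattern tokens; 'X' and 'Y' become named capture groups."""
    parts = []
    for tok in tokens:
        if tok == "X":
            parts.append(r"(?P<x>\w+)")
        elif tok == "Y":
            parts.append(r"(?P<y>\w+)")
        else:
            parts.append(re.escape(tok))
    return re.compile(r"\s+".join(parts))


def extract_pairs(pattern_tokens, sentences):
    """Return the (X, Y) word pairs from every sentence that matches the pattern."""
    regex = pattern_to_regex(pattern_tokens)
    pairs = []
    for sentence in sentences:
        match = regex.search(sentence)
        if match:
            pairs.append((match.group("x"), match.group("y")))
    return pairs


sentences = ["gamma is a kind of delta", "nothing matches here"]
pairs = extract_pairs(["X", "is", "a", "kind", "of", "Y"], sentences)
```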

続いて、概念抽出部５５は、抽出した概念情報を統計的に検定する（ステップＳ３１０）。なお、概念情報の統計的検定には、例えば、カイ二乗検定などを用いることができるがこれに限定されるものではない。 Subsequently, the concept extraction unit 55 statistically tests the extracted concept information (step S310). Note that, for example, a chi-square test can be used for the statistical test of the concept information, but the present invention is not limited to this.

図２５は、図１２の概念情報１７１の概念ペアの相関性をカイ二乗検定で検定した例を示す図である。図２５に示す例では、文書データに対する概念情報１７１の概念ペアそれぞれの出現頻度及び共起頻度のカウント結果を示しており、概念ペアの双方が出現する場合が共起頻度を示している。 FIG. 25 is a diagram showing an example in which the correlation between the concept pairs in the concept information 171 in FIG. 12 is tested by chi-square test. In the example shown in FIG. 25, the appearance frequency and the co-occurrence frequency of each concept pair of the concept information 171 for the document data are shown, and the co-occurrence frequency is shown when both concept pairs appear.

そして、概念ペアそれぞれの出現に関する独立性を判定することで、概念ペアそれぞれの相関性を統計的に判定できる。なお、独立性の判定には、統計解析でよく用いられる検定手法の１つであるカイ二乗検定を用いることができるが、これに限定されるものではない。ここで、帰無仮説として「概念ペアは互いに独立である」とし，対立仮説を「概念ペアは互いに独立ではない」と定める。 The correlation between the concepts of a concept pair can then be determined statistically by testing the independence of their occurrences. Independence can be tested using the chi-square test, which is one of the test methods often used in statistical analysis, but the test is not limited to this. Here, the null hypothesis is "the concepts of the pair are independent of each other" and the alternative hypothesis is "the concepts of the pair are not independent of each other".

図２５に示す例では、カイ二乗統計量は６．２０７１、ｐ値は０．０１３となる。つまり、自由度１のカイ二乗分布において、カイ二乗統計量６．２０７１が起こる確率は０．０１３となる。そして、ｐ値が有意水準０．０５よりも小さいため、帰無仮説は棄却され、概念ペアは互いに独立ではなく、統計的には相関性があるとみなされる。 In the example shown in FIG. 25, the chi-square statistic is 6.2071 and the p-value is 0.013. That is, in the chi-square distribution with one degree of freedom, the probability of obtaining a chi-square statistic of 6.2071 or larger is 0.013. Since the p-value is smaller than the significance level of 0.05, the null hypothesis is rejected, and the concepts of the pair are regarded as not independent of each other, that is, as statistically correlated.

このため、図２５に示す例では、概念抽出部５５は、概念ペアに対して正例のラベルを付与する。なお、カイ二乗検定の結果，互いに独立であると判定された場合は、概念抽出部５５は、概念ペアに対して負例のラベルを付与する。 For this reason, in the example shown in FIG. 25, the concept extraction unit 55 assigns a positive example label to the concept pair. When the chi-square test determines that the concepts are independent of each other, the concept extraction unit 55 assigns a negative example label to the concept pair.
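
The test described above can be reproduced with a short calculation. The sketch below computes the chi-square statistic of a 2x2 occurrence table and the p-value for one degree of freedom, for which the upper tail probability equals `erfc(sqrt(chi2 / 2))`. The counts are invented for illustration and are not the counts of FIG. 25; the function names are likewise assumptions.

```python
import math


def chi_square_2x2(both, only_a, only_b, neither):
    """Chi-square statistic for a 2x2 table of occurrence counts of concepts A and B."""
    n = both + only_a + only_b + neither
    row = (both + only_a, only_b + neither)   # A present, A absent
    col = (both + only_b, only_a + neither)   # B present, B absent
    observed = ((both, only_a), (only_b, neither))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n    # count expected under independence
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def p_value_df1(chi2):
    """Upper tail probability of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Invented counts: documents containing both concepts, only one of them, or neither.
chi2 = chi_square_2x2(both=30, only_a=20, only_b=20, neither=30)
p = p_value_df1(chi2)
label = "positive" if p < 0.05 else "negative"  # 0.05 is the significance level in the text
```

Applying `p_value_df1` to the statistic 6.2071 quoted in the text gives a value of roughly 0.013, which agrees with the stated p-value.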

図２４に戻り、概念抽出部５５は、概念ペアに相関性がある場合には（ステップＳ３１２でＹｅｓ）、概念ペアに正例を示すラベルを対応付けて抽出概念情報記憶部３２又はユーザ概念情報記憶部３３に格納する（ステップＳ３１４）。一方、概念抽出部５５は、概念ペアに相関性がない場合には（ステップＳ３１２でＮｏ）、概念ペアに負例を示すラベルを対応付けて抽出概念情報記憶部３２又はユーザ概念情報記憶部３３に格納する（ステップＳ３１６）。 Returning to FIG. 24, when the concept pair has a correlation (Yes in step S312), the concept extraction unit 55 associates a label indicating a positive example with the concept pair and stores it in the extracted concept information storage unit 32 or the user concept information storage unit 33 (step S314). On the other hand, when the concept pair has no correlation (No in step S312), the concept extraction unit 55 associates a label indicating a negative example with the concept pair and stores it in the extracted concept information storage unit 32 or the user concept information storage unit 33 (step S316).

続いて、概念抽出部５５は、全ての選択可能なパターン情報を処理するまで、ステップＳ３０２〜ステップＳ３１６の処理を繰り返す（ステップＳ３１８でＮｏ）。そして、全ての選択可能なパターン情報を処理した場合には（ステップＳ３１８でＹｅｓ）、概念抽出部５５は、処理を終了する。 Subsequently, the concept extraction unit 55 repeats the processing from step S302 to step S316 until all selectable pattern information is processed (No in step S318). When all the selectable pattern information has been processed (Yes in step S318), the concept extraction unit 55 ends the process.

図２６は、図１７のステップＳ５０に示すオントロジー生成処理の手順の流れの一例を示すフローチャートである。 FIG. 26 is a flowchart illustrating an example of the flow of the ontology generation process shown in step S50 of FIG. 17.

まず、生成部６０は、抽出概念情報記憶部３２又はユーザ概念情報記憶部３３から概念情報を選択する（ステップＳ５０２）。 First, the generation unit 60 selects concept information from the extracted concept information storage unit 32 or the user concept information storage unit 33 (step S502).

続いて、生成部６０は、選択した概念情報の概念ペアが正例であるか否かを確認する（ステップＳ５０３）。 Subsequently, the generation unit 60 confirms whether the concept pair of the selected concept information is a positive example (Step S503).

続いて、生成部６０は、選択した概念ペアが正例である場合には（ステップＳ５０３でＹｅｓ）、概念ペアのそれぞれの名称を付与したノードとともに、各ノード間のリンクを作成する（ステップＳ５０４）。図１３に示す例では、生成部６０は、ノード１８１〜１８３により構成されるオントロジーを生成している。 Subsequently, when the selected concept pair is a positive example (Yes in step S503), the generation unit 60 creates a link between the nodes together with the node to which the name of the concept pair is assigned (step S504). ). In the example illustrated in FIG. 13, the generation unit 60 generates an ontology including nodes 181 to 183.

続いて、生成部６０は、全ての概念情報を処理するまで、ステップＳ５０２〜ステップＳ５０４の処理を繰り返す（ステップＳ５０６でＮｏ）。そして、全ての概念情報を処理した場合には（ステップＳ５０６でＹｅｓ）、生成部６０は、処理を終了する。 Subsequently, the generation unit 60 repeats the processing from step S502 to step S504 until all the conceptual information is processed (No in step S506). If all the conceptual information has been processed (Yes in step S506), the generation unit 60 ends the process.
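
The loop of steps S502 to S506 can be sketched as follows. The record layout (a dictionary with `pair`, `relation`, and `label` keys), the direction of each link, and the placeholder concept names are assumptions made for this illustration only.

```python
def build_ontology(concept_infos):
    """Create nodes and links from the concept pairs labeled as positive examples."""
    nodes = set()
    links = []
    for info in concept_infos:
        if info["label"] != "positive":   # step S503: skip negative examples
            continue
        source, target = info["pair"]
        nodes.update((source, target))    # step S504: one node per concept name
        links.append((source, target, info["relation"]))
    return nodes, links


# Placeholder concept information records.
concept_infos = [
    {"pair": ("concept A", "concept B"), "relation": "vehicle type", "label": "positive"},
    {"pair": ("concept B", "concept C"), "relation": "vehicle type", "label": "negative"},
]
nodes, links = build_ontology(concept_infos)
```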

図２７は、図１７のステップＳ６０に示す矛盾判定処理の手順の流れの一例を示すフローチャートである。 FIG. 27 is a flowchart showing an example of the flow of the contradiction determination process shown in step S60 of FIG. 17.

まず、判定部６５は、生成部６０により生成されたオントロジーからノードを選択する（ステップＳ６０２）。 First, the determination unit 65 selects a node from the ontology generated by the generation unit 60 (step S602).

続いて、判定部６５は、選択したノードを開始点として、オントロジー上で深さ優先探索を行う（ステップＳ６０４）。 Subsequently, the determination unit 65 performs a depth-first search on the ontology using the selected node as a starting point (step S604).

続いて、判定部６５は、深さ優先探索の結果、訪問済みのノードにたどり着いたか否かを確認する（ステップＳ６０６）。 Subsequently, the determination unit 65 confirms whether or not the visited node has been reached as a result of the depth-first search (step S606).

続いて、訪問済みのノードにたどり着いた場合には（ステップＳ６０６でＹｅｓ）、判定部６５は、深さ優先探索で訪問した全ての訪問済みのノードの生成元となった概念ペアに矛盾ありを対応付けて抽出概念情報記憶部３２又はユーザ概念情報記憶部３３に格納する（ステップＳ６０８）。なお、深さ優先探索中に訪問済みのノードにたどり着かなかった場合には（ステップＳ６０６でＮｏ）、判定部６５は、ステップＳ６０８の処理を行わない。図１４に示す例では、ノード１８１及びノード１８２間ではリンクが巡回しているため、判定部６５は、ノード１８１及びノード１８２の生成元の概念ペアに矛盾があると判定する。 Subsequently, when the visited node is reached (Yes in step S606), the determination unit 65 determines that there is a contradiction in the concept pairs that are the generation sources of all visited nodes visited in the depth-first search. The associated information is stored in the extracted concept information storage unit 32 or the user concept information storage unit 33 (step S608). If the visited node is not reached during the depth-first search (No in step S606), the determination unit 65 does not perform the process in step S608. In the example illustrated in FIG. 14, since the link circulates between the node 181 and the node 182, the determination unit 65 determines that there is a contradiction in the concept pair of the generation source of the node 181 and the node 182.

続いて、判定部６５は、全てのノードを処理するまで、ステップＳ６０２〜ステップＳ６０８の処理を繰り返す（ステップＳ６１０でＮｏ）。そして、全てのノードを処理した場合には（ステップＳ６１０でＹｅｓ）、判定部６５は、処理を終了する。 Subsequently, the determination unit 65 repeats the processes in steps S602 to S608 until all the nodes are processed (No in step S610). If all nodes have been processed (Yes in step S610), the determination unit 65 ends the process.
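
A minimal sketch of the cycle check follows. It interprets "arriving at a visited node" as arriving at a node that is still on the current search path, which is an assumption made here so that two separate routes to the same node are not reported as a cycle. The node names mirror the reference numerals of FIG. 13, but the link directions are assumed, and the function name is hypothetical.

```python
def find_cycle_nodes(links):
    """Return the nodes lying on cycles encountered by a depth-first search."""
    graph = {}
    for src, dst in links:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    contradictory = set()
    finished = set()

    def dfs(node, path):
        path.append(node)
        for nxt in graph[node]:
            if nxt in path:
                # Reached a node on the current path: everything from it onward is a cycle.
                contradictory.update(path[path.index(nxt):])
            elif nxt not in finished:
                dfs(nxt, path)
        path.pop()
        finished.add(node)

    for start in graph:               # steps S602 to S610: try each node as a start point
        if start not in finished:
            dfs(start, [])
    return contradictory


# Assumed link directions among the three nodes of FIG. 13.
links = [("181", "182"), ("182", "181"), ("181", "183")]
cycle_nodes = find_cycle_nodes(links)
```

In a full implementation, the concept pairs from which the returned nodes were generated would then be marked as contradictory in the corresponding storage unit. The recursive form is adequate for small ontologies; a very deep graph would call for an explicit stack.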

図２８は、本実施の形態のオントロジー生成装置１で行われるユーザ概念情報記憶部３３への概念情報の登録処理の手順の流れの一例を示すフローチャートである。なお、図２８に示す例では、概念情報の登録処理について説明するが、パターン情報についても同様の手法で登録することができる。 FIG. 28 is a flowchart illustrating an example of a flow of a procedure for registering concept information in the user concept information storage unit 33 performed by the ontology generation device 1 according to the present embodiment. In the example shown in FIG. 28, the concept information registration process will be described, but pattern information can also be registered by the same method.

まず、受付部４０は、入力部１０から概念情報の登録操作の入力を受け付ける（ステップＳ８０２）。 First, the receiving unit 40 receives an input of a concept information registration operation from the input unit 10 (step S802).

続いて、登録部７５は、受け付けた概念情報をユーザ概念情報記憶部３３へ登録（格納）する（ステップＳ８０４）。 Subsequently, the registration unit 75 registers (stores) the received concept information in the user concept information storage unit 33 (step S804).

図２９は、本実施の形態のオントロジー生成装置１で行われる概念情報の比較処理の手順の流れの一例を示すフローチャートである。なお、図２９に示す例では、概念情報の比較処理について説明するが、パターン情報についても同様の手法で比較することができる。 FIG. 29 is a flowchart illustrating an example of a flow of a conceptual information comparison process performed by the ontology generation device 1 according to the present embodiment. In the example shown in FIG. 29, the concept information comparison process will be described, but pattern information can also be compared using the same method.

まず、受付部４０は、入力部１０から概念情報の比較操作の入力を受け付ける（ステップＳ９０２）。 First, the receiving unit 40 receives an input of a conceptual information comparison operation from the input unit 10 (step S902).

続いて、比較部８０は、受付部４０により比較操作の入力が受け付けられると、抽出概念情報記憶部３２及びユーザ概念情報記憶部３３の双方に記憶されている同一の概念ペアを有する概念情報を読み出す（ステップＳ９０４）。 Subsequently, when the comparison unit 80 receives the input of the comparison operation, the comparison unit 80 displays the concept information having the same concept pair stored in both the extracted concept information storage unit 32 and the user concept information storage unit 33. Read (step S904).

続いて、比較部８０は、読み出した概念情報のラベルを比較する（ステップＳ９０６）。 Subsequently, the comparison unit 80 compares the labels of the read concept information (step S906).

そして、出力制御部７０は、ラベルの不一致により互いの概念情報が一致しない場合には（ステップＳ９０８でＮｏ）、互いの概念情報が一致しない旨を出力部２０に出力させる（ステップＳ９１０）。なお、ラベルの一致により互いの概念情報が一致する場合には（ステップＳ９０８でＹｅｓ）、出力制御部７０は、ステップＳ９１０に示す処理を行わない。 If the concept information does not match due to label mismatch (No in step S908), the output control unit 70 causes the output unit 20 to output that the concept information does not match (step S910). Note that if the conceptual information matches due to matching of labels (Yes in step S908), the output control unit 70 does not perform the process shown in step S910.

続いて、比較部８０は、抽出概念情報記憶部３２及びユーザ概念情報記憶部３３の双方に記憶されている同一の概念ペアを全て処理するまで、ステップＳ９０２〜ステップＳ９１０の処理を繰り返す（ステップＳ９１２でＮｏ）。 Subsequently, the comparison unit 80 repeats the processing from step S902 to step S910 until all the same concept pairs stored in both the extracted concept information storage unit 32 and the user concept information storage unit 33 are processed (step S912). No).

そして、同一の概念ペアを全て処理した場合には（ステップＳ９１２でＹｅｓ）、比較部８０は、処理を終了する。 If all the same concept pairs have been processed (Yes in step S912), the comparison unit 80 ends the process.
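
The comparison of steps S904 to S912 amounts to looking up the concept pairs present in both stores and reporting those whose labels differ. The sketch below assumes each store is a dictionary from concept pair to label; this layout, the function name, and the placeholder data are hypothetical.

```python
def find_label_mismatches(extracted, user):
    """Return (pair, extracted_label, user_label) for shared pairs whose labels differ."""
    mismatches = []
    for pair in sorted(extracted.keys() & user.keys()):  # pairs stored in both units
        if extracted[pair] != user[pair]:                # step S908: labels do not match
            mismatches.append((pair, extracted[pair], user[pair]))
    return mismatches


extracted_store = {("concept A", "concept B"): "negative",
                   ("concept C", "concept D"): "positive"}
user_store = {("concept A", "concept B"): "positive",
              ("concept C", "concept D"): "positive"}
mismatches = find_label_mismatches(extracted_store, user_store)
```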

このように本実施の形態では、オントロジーを生成して出力するだけでなく、オントロジーを構成する概念間の矛盾の有無まで判定して出力しているため、生成されたオントロジーを構成する概念間の矛盾をユーザが容易に判別することができる。 As described above, in this embodiment, not only the ontology is generated and output, but also whether there is a contradiction between concepts constituting the ontology is determined and output. The user can easily discriminate the contradiction.

また本実施の形態では、オントロジーを構成する概念間の矛盾の有無の判定結果を、オントロジーを構成する概念に対応づけて記憶するため、この情報を用いれば、オントロジーを構成する概念間の矛盾の有無を容易に判別可能なオントロジーを構築することができる。 In this embodiment, the determination result of the existence of contradiction between the concepts constituting the ontology is stored in association with the concept constituting the ontology. An ontology that can easily determine the presence or absence can be constructed.

また本実施の形態では、概念ペアを用いて構文木パターンを抽出するとともに、抽出した構文木パターンを用いて概念ペアを抽出することを繰り返し行うため、概念及び概念の関係の抽出漏れを減らし，幅広く獲得することができる。 In the present embodiment, extracting syntax tree patterns using concept pairs and extracting concept pairs using the extracted syntax tree patterns are performed repeatedly. This reduces omissions in the extraction of concepts and of relations between concepts, so that a wide range of them can be acquired.

また本実施の形態では、概念ペア及び構文木パターンにそれぞれラベルを対応付けているため、正しい関係にない概念ペアや、概念ペアの抽出に適さない構文木パターンを排除することができる。 Further, in the present embodiment, since labels are associated with concept pairs and syntax tree patterns, concept pairs that are not in a correct relationship and syntax tree patterns that are not suitable for extracting concept pairs can be excluded.

In the present embodiment, concept pairs and syntax tree patterns can not only be extracted by the ontology generation apparatus itself but can also be registered by the user.

The ontology generation apparatus 1 of the present embodiment has a hardware configuration that includes a control device such as a CPU (Central Processing Unit), storage devices such as a ROM (Read Only Memory) and a RAM (Random Access Memory), a display device such as a liquid crystal display, input devices such as a keyboard and a mouse, and a communication I/F that connects to a network to perform communication.

The present invention is not limited to the above embodiment as it stands; at the implementation stage, it can be embodied with the constituent elements modified without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the above embodiment. For example, some constituent elements may be deleted from the full set shown in the embodiment. Furthermore, constituent elements from different embodiments may be combined as appropriate.

DESCRIPTION OF SYMBOLS
1 Ontology generation apparatus
10 Input unit
20 Output unit
30 Storage unit
31 Concept information storage unit
32 Extracted concept information storage unit
33 User concept information storage unit
36 Pattern information storage unit
37 Extracted pattern information storage unit
38 User pattern information storage unit
40 Reception unit
45 Acquisition unit
50 Pattern extraction unit
55 Concept extraction unit
60 Generation unit
65 Determination unit
70 Output control unit
75 Registration unit
80 Comparison unit
111, 112, 113, 114, 171 Concept information
116, 117, 141 Pattern information
121 to 125, 132, 134, 153 to 155, 161 to 165 Character string
181 to 183 Node
211 to 215 Concept pair
221 to 225 Sentence
230 Vector space
231 Negative example vector space
232 Positive example vector space
233 Boundary curve

Claims

An ontology generation apparatus comprising:
an acquisition unit that acquires document data;
a concept information storage unit that stores a concept pair, which is a set of vocabulary concepts;
a first extraction unit that extracts, from the document data, a pattern indicating the dependency relationship between first and second character strings and other character strings in a sentence in which the concept pair co-occurs, the first and second character strings being obtained by replacing each vocabulary item representing a concept of the concept pair with a variable;
a pattern information storage unit that stores the extracted pattern;
a second extraction unit that extracts a new concept pair from the document data using the pattern stored in the pattern information storage unit and stores the new concept pair in the concept information storage unit;
a generation unit that generates an ontology organizing the relationships between concepts using the plurality of concept pairs stored in the concept information storage unit;
a determination unit that determines whether there is a contradiction between the concepts constituting the ontology; and
an output unit that outputs the determination result together with the ontology.

The ontology generation apparatus according to claim 1, wherein the determination unit determines that there is a contradiction between the concepts when a cycle occurs between the concepts constituting the ontology.

The ontology generation apparatus according to claim 1, wherein the determination unit stores the determination result in the concept information storage unit in association with the concept pair.

The concept information storage unit further stores, in association with the concept pair, a first label indicating whether the relationship of the concept pair is correct,
the pattern information storage unit further stores, in association with the pattern, a second label indicating whether the pattern is effective for extracting the new concept pair,
the first extraction unit generates the second label of the extracted pattern by performing machine learning that uses, as features, the linguistic information constituting the extracted pattern, the concept pair used for the extraction, and the first label of that concept pair, and stores the second label in the pattern information storage unit in association with the extracted pattern,
the second extraction unit extracts the new concept pair from the document data using, among the patterns stored in the pattern information storage unit, a pattern whose second label indicates validity, and stores the new concept pair in the concept information storage unit, and
the generation unit generates the ontology using, among the plurality of concept pairs stored in the concept information storage unit, the concept pairs whose first label indicates validity. The ontology generation device according to any one of?

The ontology generation apparatus according to claim 4, wherein
the concept information storage unit includes a first concept information storage unit that stores the new concept pair extracted by the second extraction unit, and a second concept information storage unit that stores the concept pair registered by a user,
the pattern information storage unit includes a first pattern information storage unit that stores the pattern extracted by the first extraction unit, and a second pattern information storage unit that stores the pattern registered by a user,
the first extraction unit extracts the pattern from the document data using the concept pairs stored in the first concept information storage unit and the second concept information storage unit, and stores the pattern in the first concept information storage unit, and
the second extraction unit extracts the new concept pair from the document data using the patterns stored in the first pattern information storage unit and the second pattern information storage unit, and stores the new concept pair in the first concept information storage unit.

A reception unit that accepts input of at least one of the concept pair and the pattern; and
a registration unit that registers the accepted concept pair in the second concept information storage unit and registers the accepted pattern in the second pattern information storage unit. The ontology generation device described.

The ontology generation apparatus according to claim 5, further comprising a comparison unit that performs at least one of a comparison of the first labels of an identical concept pair stored in both the first concept information storage unit and the second concept information storage unit, and a comparison of the second labels of an identical pattern stored in both the first pattern information storage unit and the second pattern information storage unit,
wherein the output unit further outputs the comparison result.

An ontology generation method comprising:
an acquisition step in which an acquisition unit acquires document data;
a first extraction step in which a first extraction unit extracts, from the document data, a pattern indicating the dependency relationship between first and second character strings and other character strings in a sentence in which a concept pair co-occurs, the concept pair being a set of vocabulary concepts stored in a concept information storage unit and the first and second character strings being obtained by replacing each vocabulary item representing a concept of the concept pair with a variable, and stores the pattern in a pattern information storage unit;
a second extraction step in which a second extraction unit extracts a new concept pair from the document data using the pattern stored in the pattern information storage unit and stores the new concept pair in the concept information storage unit;
a generation step of generating an ontology that organizes the relationships between concepts using the plurality of concept pairs stored in the concept information storage unit;
a determination step of determining whether there is a contradiction between the concepts constituting the ontology; and
an output control step of causing an output unit to output the determination result together with the ontology.