JP2009140113A

JP2009140113A - Dictionary editing device, dictionary editing method, and computer program

Info

Publication number: JP2009140113A
Application number: JP2007314122A
Authority: JP
Inventors: Yasuhide Miura; 康秀三浦; Motoyuki Takaai; 基行鷹合; Hiroshi Masuichi; 博増市
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2007-12-05
Filing date: 2007-12-05
Publication date: 2009-06-25

Abstract

PROBLEM TO BE SOLVED: To provide a device and a method for enriching ontology labels.

SOLUTION: A dictionary editing device updates dictionary data, e.g., an ontology dictionary, that records, for each class representing a concept of existence, a label as property information and inter-class relation information as attribute information. A constraint condition serving as a text corpus search condition is determined from the label information of classes related to the class to be updated. The text corpus is searched according to the constraint condition to extract related sentences containing label candidates for the class to be updated. A related character string serving as a label candidate is then extracted from the related sentences according to the constraint condition, and an editing process sets the extracted related character string as a label of the class to be updated. This configuration allows ontology labels to be enriched efficiently.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a dictionary editing device, a dictionary editing method, and a computer program. More specifically, it relates to a dictionary editing device, a dictionary editing method, and a computer program for constructing and updating an ontology, which is a kind of dictionary of concepts representing existence (including the vocabulary belonging to each concept) and which describes the word relationships between superordinate and subordinate concepts.

Extracting the terms used in data processing from natural language documents, for example to set search keys for database searches or index entries for a term dictionary, is a technique required in many data processing fields.

A collection of text data is called a text corpus. Term extraction from the documents in a text corpus, that is, the extraction of terms as meaningful linguistic units, has long been studied. For example, for an ordinary sentence such as [a car runs on the road], applying a general morphological analysis system makes it possible to extract the morphemes [car], [road], and [run]. A morphological analysis system is known as a system that applies a predetermined morphological analysis dictionary, segments text into morphemes, the smallest units of meaning, on the basis of the words registered in the dictionary, and identifies their parts of speech.

Thesauri and ontologies are dictionaries that systematically organize terms used, for example, in data retrieval. A thesaurus is a dictionary that systematically groups terms with similar meanings. An ontology denotes a knowledge classification system together with a set of inference rules; it is a kind of dictionary of concepts representing existence and is structured as a dictionary describing the word relationships between superordinate and subordinate concepts. It registers, for example, the superordinate and subordinate concepts of a given concept, the terms (labels) that express the concept, and relation information between concepts. An ontology generally has a class hierarchy in which the classes corresponding to concepts are nodes and the connections between the nodes are defined. Specifically, an ontology describes classes corresponding to [concepts], properties as property information of the classes, and attributes as attribute information of the classes.

Prior art disclosing the construction and updating of thesauri and ontologies includes, for example, Patent Document 1 (Japanese Patent Laid-Open No. 2001-14166) and Patent Document 2 (Japanese Patent Laid-Open No. 2006-309446).

Patent Document 1 (Japanese Patent Laid-Open No. 2001-14166) discloses a configuration that, when a plurality of ontologies exist, automatically associates nodes across the ontologies by using the similarity of node attributes (node name, node type, node definition), and a configuration that, for data expressed in various formats such as XML, RDF, and DTD, prepares node conversion rules corresponding to each data schema and performs the association.

Patent Document 2 (Japanese Patent Laid-Open No. 2006-309446) presents a configuration for making ontology updating more efficient in a hierarchical database that constitutes an ontology. When a class defined for a [concept] registered in the ontology is added, similar classes are extracted by using the similarity of attributes (detail fields of a class and its attributes) with existing classes and are presented to the user for confirmation. When the attributes of an existing class are edited, a past editing history held in advance is used to present candidate attribute values to the user for selection.

Further, Non-Patent Document 1 (Extending a thesaurus by classifying words. (In Proceedings of the ACL/EACL Workshop on Automatic Information Extraction and Building of Lexical Semantic Resources, 1997.)) discloses a technique in which a feature vector is created in advance for each word in a thesaurus from its co-occurring words; when a new word is added to the thesaurus, a feature vector is created for the new word in the same way, and the place to add it is determined from the similarity of its feature vector to those of the existing words.

As described above, several documents disclose systems for constructing or updating a thesaurus or an ontology. However, none of this prior art shows a configuration for enriching the properties (property information) that an existing ontology lacks. For example, Patent Document 2 supports class editing, but because the support relies on past editing history, it is not suited to enriching properties that take class-specific values, such as labels. To link text data and the like with an ontology and perform advanced knowledge processing, it is important that the labels representing the presence of a class in text are set appropriately. This is because, in an ontology, a class represents a concept, whereas in text a label appears as an instance of that concept. Enriching the labels of the classes set in an ontology therefore increases the usefulness of the ontology.

Patent Document 1: Japanese Patent Laid-Open No. 2001-14166
Patent Document 2: Japanese Patent Laid-Open No. 2006-309446
Non-Patent Document 1: Extending a thesaurus by classifying words. (In Proceedings of the ACL/EACL Workshop on Automatic Information Extraction and Building of Lexical Semantic Resources, 1997.)

An object of the present invention is to provide a dictionary editing device, a dictionary editing method, and a computer program that enrich labels, which are one kind of property information (property) of the classes set in an ontology.

For example, in a given ontology, how fully the properties are populated may differ from class to class. This is especially noticeable when an ontology created in another language (such as English) is converted into Japanese. Even otherwise, a bias in the knowledge of the ontology builder can bias the label notation of individual classes, leaving a particular class short of labels. The present invention makes it possible to enrich the labels of such a class by using the labels of related classes together with a text corpus that contains, for example, text highly relevant to the ontology. Although the present invention is described in terms of enriching class labels, the enrichment technique described here can also be applied to other properties whose nature is similar to that of labels.

A first aspect of the present invention is
a dictionary editing device that executes update processing of dictionary data in which concepts representing existence are set as classes and which records data including labels, which are property information associated with the classes, and inter-class relation information, which is attribute information associated with the classes, the dictionary editing device comprising:
a related class extraction unit that extracts, from the dictionary data, classes related to an update target class;
a constraint condition extraction unit that determines a constraint condition serving as a text corpus search condition on the basis of label information corresponding to the related classes;
a related sentence extraction unit that searches the text corpus according to the constraint condition and extracts related sentences containing label candidates for the update target class;
a related character string extraction unit that extracts, according to the constraint condition, a related character string serving as a label candidate for the update target class from the related sentences extracted by the related sentence extraction unit; and
an edit processing unit that sets the related character string extracted by the related character string extraction unit as a label of the update target class.
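The data flow between the five units of the first aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed in the specification: the function `update_labels`, the three strategy callables passed to it, plain substring search over the corpus, and the English toy data are all hypothetical choices made for the example.

```python
import re
from typing import Callable, Dict, List, Set


def update_labels(
    target: str,
    labels: Dict[str, Set[str]],
    corpus: List[str],
    extract_related: Callable[[str], List[str]],
    derive_constraint: Callable[[List[str]], str],
    extract_strings: Callable[[str, str], List[str]],
) -> List[str]:
    """Run the five units in order and return the labels added to `target`."""
    # 1. Related class extraction unit.
    related = extract_related(target)
    # 2. Constraint condition extraction unit: uses the labels of related classes.
    related_labels = sorted(l for c in related for l in labels[c])
    constraint = derive_constraint(related_labels)
    # 3. Related sentence extraction unit (assumption: plain substring search).
    sentences = [s for s in corpus if constraint in s]
    # 4. Related character string extraction unit.
    candidates: List[str] = []
    for sentence in sentences:
        for text in extract_strings(sentence, constraint):
            if text not in candidates:
                candidates.append(text)
    # 5. Edit processing unit: set new candidates as labels of the target class.
    added = [c for c in candidates if c not in labels[target]]
    labels[target].update(added)
    return added


# Hypothetical toy data: the target class has no label yet.
labels = {
    "LungCancer": {"lung cancer"},
    "GastricCancer": {"gastric cancer", "stomach cancer"},
    "ColorectalCancer": set(),
}
corpus = [
    "Screening for colorectal cancer is recommended after fifty.",
    "The weather was fine on the day of the examination.",
]


def siblings(target: str) -> List[str]:
    # Toy strategy: every other class is treated as a sibling.
    return [c for c in labels if c != target]


def shared_word(related_labels: List[str]) -> str:
    # Toy strategy: a word that occurs in every related label.
    words = set.intersection(*(set(l.split()) for l in related_labels))
    return sorted(words)[0]


def word_before(sentence: str, constraint: str) -> List[str]:
    # Toy strategy: the constraint together with the word preceding it.
    return re.findall(r"\w+ " + re.escape(constraint), sentence)


added = update_labels("ColorectalCancer", labels, corpus,
                      siblings, shared_word, word_before)
print(added)  # ['colorectal cancer']
```

Passing the step strategies in as parameters keeps the sketch neutral about how each unit is realized; the embodiments that follow narrow down those choices.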

Furthermore, in one embodiment of the dictionary editing device of the present invention, the dictionary editing device further includes a related character string presentation unit that presents the related character strings extracted by the related character string extraction unit to an editing operator, and the edit processing unit sets a selected related character string as a label of the update target class in response to the editing operator's input of selection information for the related character strings.

Furthermore, in one embodiment of the dictionary editing device of the present invention, the dictionary data is dictionary data in which a plurality of classes are arranged in a hierarchy; the related class extraction unit extracts, as related classes of the update target class, at least one of a superclass (parent class), a subclass (child class), and a sibling class; and the constraint condition extraction unit determines the constraint condition on the basis of label information corresponding to two related classes selected from among the superclass (parent class), subclasses (child classes), and sibling classes.
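As a rough illustration of extracting the three kinds of related class from a hierarchy, the sketch below assumes a single-parent hierarchy stored as a child-to-parent mapping. The function name `related_classes`, the `parent_of` mapping, and the class names are hypothetical; the specification does not prescribe this representation.

```python
from typing import Dict, List, Optional


def related_classes(
    target: str, parent_of: Dict[str, Optional[str]]
) -> Dict[str, List[str]]:
    """Return the superclass, subclasses and sibling classes of `target`."""
    parent = parent_of.get(target)
    # Superclass (parent class): at most one in this single-parent sketch.
    supers = [parent] if parent is not None else []
    # Subclasses (child classes): classes whose parent is the target.
    subs = sorted(c for c, p in parent_of.items() if p == target)
    # Sibling classes: other classes that share the target's parent.
    sibs = sorted(
        c for c, p in parent_of.items()
        if parent is not None and p == parent and c != target
    )
    return {"super": supers, "sub": subs, "sibling": sibs}


# Hypothetical hierarchy; None marks the root class.
parent_of = {
    "Disease": None,
    "Cancer": "Disease",
    "Infection": "Disease",
    "LungCancer": "Cancer",
    "GastricCancer": "Cancer",
}

print(related_classes("Cancer", parent_of))
# {'super': ['Disease'], 'sub': ['GastricCancer', 'LungCancer'], 'sibling': ['Infection']}
```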

Furthermore, in one embodiment of the dictionary editing device of the present invention, the constraint condition extraction unit determines a character string as a positive constraint condition and a character string as a negative constraint condition, and the related sentence extraction unit extracts from the text corpus text that contains a character string of the same kind as the string determined as the positive constraint condition and does not contain a character string of the same kind as the string determined as the negative constraint condition.
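The filtering with positive and negative constraint conditions might look like the following sketch. It assumes that "contains a character string of the same kind" can be approximated by exact substring matching, that there is one positive string and any number of negative strings, and that the corpus is a list of sentences; the function name and the sample values are hypothetical.

```python
from typing import Iterable, List


def extract_related_sentences(
    corpus: Iterable[str], positive: str, negatives: Iterable[str]
) -> List[str]:
    """Keep sentences that contain `positive` and none of `negatives`."""
    negatives = list(negatives)
    return [
        sentence
        for sentence in corpus
        if positive in sentence and not any(n in sentence for n in negatives)
    ]


# Hypothetical corpus and constraint values.
corpus = [
    "Colorectal cancer screening is recommended.",
    "Smoking raises the risk of lung cancer.",
    "The clinic opens at nine.",
]
result = extract_related_sentences(corpus, "cancer", ["lung cancer"])
print(result)  # ['Colorectal cancer screening is recommended.']
```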

Furthermore, in one embodiment of the dictionary editing device of the present invention, the constraint condition extraction unit determines, as the positive constraint condition, the longest common part of the label information corresponding to two related classes selected from among the superclass (parent class), subclasses (child classes), and sibling classes extracted as related classes of the update target class.
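One way to compute the longest common part of two labels is sketched below. It reads "longest common part" as the longest common contiguous substring, which is an assumption, and uses a standard dynamic-programming table; the function name and the sample labels are hypothetical.

```python
def longest_common_part(a: str, b: str) -> str:
    """Return the longest contiguous substring shared by `a` and `b`.

    On ties, the substring that ends earliest in `a` is returned.
    """
    best_len, best_end = 0, 0
    # prev[j] holds the length of the common suffix of a[:i-1] and b[:j].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


# Hypothetical labels of two sibling classes.
print(longest_common_part("肺がん", "胃がん"))  # がん
```

For English labels such as "lung cancer" and "gastric cancer" the same function returns " cancer" including the leading space, so a practical version would probably trim or tokenize the result.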

Furthermore, in one embodiment of the dictionary editing device of the present invention, the constraint condition extraction unit determines the final constraint conditions by deleting any negative constraint condition that is contained in the positive constraint condition extracted with reference to the labels of the related classes of the update target class.
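The pruning step can be illustrated with a short sketch. It rests on one interpretation of the text: a negative constraint string that is itself a substring of the positive constraint string would reject every sentence the positive constraint accepts, so it is dropped. The function name and the sample strings are hypothetical.

```python
from typing import Iterable, List, Tuple


def finalize_constraints(
    positive: str, negatives: Iterable[str]
) -> Tuple[str, List[str]]:
    """Drop negative constraints that are contained in the positive one."""
    kept = [n for n in negatives if n not in positive]
    return positive, kept


# "carcinoma" is contained in the positive constraint, so it is removed.
print(finalize_constraints("cell carcinoma", ["carcinoma", "squamous"]))
# ('cell carcinoma', ['squamous'])
```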

Furthermore, in one embodiment of the dictionary editing device of the present invention, the related character string extraction unit extracts, from the related sentences extracted by the related sentence extraction unit, a continuous character string containing the character string of the positive constraint condition as a related character string serving as a label candidate for the update target class.

Furthermore, in one embodiment of the dictionary editing device of the present invention, the related character string extraction unit performs morphological analysis of the related sentences extracted by the related sentence extraction unit and extracts a run of consecutive nouns containing the character string of the positive constraint condition as a related character string serving as a label candidate for the update target class.
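The noun-run extraction can be sketched as follows, assuming the sentence has already been segmented and part-of-speech tagged by some morphological analyzer. The `(surface, part_of_speech)` token format, the tag name `"noun"`, and the function name are hypothetical; no particular analyzer or tag set is implied.

```python
from typing import Iterable, List, Tuple

Token = Tuple[str, str]  # (surface form, part-of-speech tag)


def noun_runs_containing(
    tokens: Iterable[Token], positive: str, joiner: str = ""
) -> List[str]:
    """Join consecutive nouns and keep the runs that contain `positive`."""
    runs: List[str] = []
    current: List[str] = []
    # A sentinel non-noun token flushes a run that ends the sentence.
    for surface, pos in list(tokens) + [("", "EOS")]:
        if pos == "noun":
            current.append(surface)
            continue
        if current:
            text = joiner.join(current)
            if positive in text:
                runs.append(text)
            current = []
    return runs


# Hypothetical analyzer output for a sentence about colorectal cancer screening.
tokens = [
    ("大腸", "noun"), ("がん", "noun"), ("の", "particle"),
    ("検診", "noun"), ("を", "particle"), ("受ける", "verb"),
]
print(noun_runs_containing(tokens, "がん"))  # ['大腸がん']
```

For space-delimited languages, `joiner=" "` would rebuild the run with spaces between the nouns.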

Furthermore, in one embodiment of the dictionary editing device of the present invention, the dictionary is an ontology dictionary composed of registration information for classes having a hierarchical structure.

Furthermore, a second aspect of the present invention is
a dictionary editing method for executing dictionary data editing processing in an information processing device, wherein
the dictionary data to be edited is dictionary data in which concepts representing existence are set as classes and which records data including labels, which are property information associated with the classes, and inter-class relation information, which is attribute information associated with the classes, the dictionary editing method comprising:
a related class extraction step in which a related class extraction unit extracts, from the dictionary data, classes related to an update target class;
a constraint condition extraction step in which a constraint condition extraction unit determines a constraint condition serving as a text corpus search condition on the basis of label information corresponding to the related classes;
a related sentence extraction step in which a related sentence extraction unit searches the text corpus according to the constraint condition and extracts related sentences containing label candidates for the update target class;
a related character string extraction step in which a related character string extraction unit extracts, according to the constraint condition, a related character string serving as a label candidate for the update target class from the related sentences extracted by the related sentence extraction unit; and
an edit processing step in which an edit processing unit sets the related character string extracted in the related character string extraction step as a label of the update target class.

Furthermore, in one embodiment of the dictionary editing method of the present invention, the dictionary editing method further includes a related character string presentation step in which a related character string presentation unit presents the related character strings extracted in the related character string extraction step to an editing operator, and the edit processing step sets a selected related character string as a label of the update target class in response to the editing operator's input of selection information for the related character strings.

Furthermore, in one embodiment of the dictionary editing method of the present invention, the dictionary data is dictionary data in which a plurality of classes are arranged in a hierarchy; the related class extraction step extracts, as related classes of the update target class, at least one of a superclass (parent class), a subclass (child class), and a sibling class; and the constraint condition extraction step determines the constraint condition on the basis of label information corresponding to two related classes selected from among the superclass (parent class), subclasses (child classes), and sibling classes.

Furthermore, in one embodiment of the dictionary editing method of the present invention, the constraint condition extraction step determines a character string as a positive constraint condition and a character string as a negative constraint condition, and the related sentence extraction step extracts from the text corpus text that contains a character string of the same kind as the string determined as the positive constraint condition and does not contain a character string of the same kind as the string determined as the negative constraint condition.

Furthermore, in one embodiment of the dictionary editing method of the present invention, the constraint condition extraction step determines, as the positive constraint condition, the longest common part of the label information corresponding to two related classes selected from among the superclass (parent class), subclasses (child classes), and sibling classes extracted as related classes of the update target class.

Furthermore, in one embodiment of the dictionary editing method of the present invention, the constraint condition extraction step determines the final constraint conditions by deleting any negative constraint condition that is contained in the positive constraint condition extracted with reference to the labels of the related classes of the update target class.

Furthermore, in one embodiment of the dictionary editing method of the present invention, the related character string extraction step extracts, from the related sentences extracted in the related sentence extraction step, a continuous character string containing the character string of the positive constraint condition as a related character string serving as a label candidate for the update target class.

Furthermore, in one embodiment of the dictionary editing method of the present invention, the related character string extraction step performs morphological analysis of the related sentences extracted in the related sentence extraction step and extracts a run of consecutive nouns containing the character string of the positive constraint condition as a related character string serving as a label candidate for the update target class.

Furthermore, in one embodiment of the dictionary editing method of the present invention, the dictionary is an ontology dictionary composed of registration information for classes having a hierarchical structure.

Furthermore, a third aspect of the present invention is
a computer program that causes an information processing device to execute dictionary data editing processing, wherein
the dictionary data to be edited is dictionary data in which concepts representing existence are set as classes and which records data including labels, which are property information associated with the classes, and inter-class relation information, which is attribute information associated with the classes, the computer program comprising:
a related class extraction step of causing a related class extraction unit to extract, from the dictionary data, classes related to an update target class;
a constraint condition extraction step of causing a constraint condition extraction unit to determine a constraint condition serving as a text corpus search condition on the basis of label information corresponding to the related classes;
a related sentence extraction step of causing a related sentence extraction unit to search the text corpus according to the constraint condition and extract related sentences containing label candidates for the update target class;
a related character string extraction step of causing a related character string extraction unit to extract, according to the constraint condition, a related character string serving as a label candidate for the update target class from the related sentences extracted by the related sentence extraction unit; and
an edit processing step of causing an edit processing unit to set the related character string extracted in the related character string extraction step as a label of the update target class.

The computer program of the present invention is, for example, a computer program that can be provided, via a storage medium or a communication medium that supplies it in a computer-readable format, to a general-purpose computer system capable of executing various program codes. Providing such a program in a computer-readable format realizes processing according to the program on the computer system.

Other objects, features, and advantages of the present invention will become apparent from the more detailed description based on the embodiments of the present invention described later and the accompanying drawings. In this specification, a system is a logical collection of a plurality of devices, and the constituent devices are not necessarily housed in the same casing.

According to the configuration of one embodiment of the present invention, in a configuration that updates dictionary data, for example an ontology dictionary, containing, for each class representing a concept of existence, labels as property information and inter-class relation information as attribute information, a constraint condition serving as a text corpus search condition is determined by using the label information of the classes related to the update target class. The text corpus is searched according to the constraint condition to extract related sentences containing label candidates for the update target class. A related character string serving as a label candidate is then extracted from the related sentences according to the constraint condition, and an editing process sets the extracted related character string as a label of the update target class. This configuration makes it possible to enrich ontology labels efficiently.

A dictionary editing device, a dictionary editing method, and a computer program according to an embodiment of the present invention are described in detail below with reference to the drawings.

FIG. 1 shows the configuration of a dictionary editing device according to an embodiment of the present invention. The dictionary editing device selects a [class] to be processed from the dictionary data, in this example an ontology, stored in the ontology storage unit 111 serving as a dictionary storage unit, that is, a [class] corresponding to a concept representing existence registered in the ontology. It then extracts related classes stored in the ontology storage unit 111 and updates the labels recorded as property information of the [class] being processed, for example by adding a new label.

As shown in FIG. 1, a dictionary editing device 100 according to an embodiment of the present invention includes a class selection unit 101, a related class extraction unit 102, a constraint condition extraction unit 103, a related sentence extraction unit 104, a related character string extraction unit 105, a related character string presentation unit 106, and an ontology edit processing unit 107. It also uses an ontology storage unit 111, which stores the ontology to be edited, and a text corpus storage unit 112, which stores the text corpus used by the related sentence extraction unit 104.

Before the processing performed with the dictionary editing device 100 is described, the ontology stored as dictionary data in the ontology storage unit 111, which serves as a dictionary storage unit, and the text corpus stored in the text corpus storage unit 112 are described with reference to FIGS. 2 to 4.

First, the ontology stored in the ontology storage unit 111 is described with reference to FIG. 2. As described above, an ontology is a kind of dictionary of concepts representing existence and is structured as a dictionary describing the relationships between superordinate and subordinate concepts. For example, it has a hierarchical structure in which the relationships between the superordinate and subordinate concepts of a given concept are defined as connections between nodes. In an ontology, the classes corresponding to [concepts] and the properties serving as property information of the classes are described as registration information.

図２は、オントロジー記述言語として知られるＯＷＬに従って記述された１つの登録語の情報構成例を示している。オントロジー記憶部１１１には、図２に示すような登録情報が、登録語である概念に対応するクラス各々について記録されている。 FIG. 2 shows an information configuration example of one registered word described according to OWL known as an ontology description language. In the ontology storage unit 111, registration information as shown in FIG. 2 is recorded for each class corresponding to a concept that is a registered word.

図２に示すように、オントロジーには、例えば以下の情報、すなわち、
（ａ）［概念］に対応するクラス、
（ｂ）クラスの性質情報としてのプロパティ、
（ｃ）クラスの属性情報としてのアトリビュート（他のクラスとの関係情報、例えば親子関係など）、
これらの情報が登録される。 As shown in FIG. 2, the ontology includes, for example, the following information:
(A) a class corresponding to [concept],
(B) properties as class property information;
(C) Attributes as class attribute information (relationship information with other classes, such as parent-child relationships),
These pieces of information are registered.

［クラス］は、登録された存在を表す概念名を記録するフィールドであり、［性質情報（プロパティ）］はラベル等のクラスの情報が記録される。さらに、クラスの属性情報（アトリビュート）として他のクラスとの関係情報も記録される。 [Class] is a field for recording a concept name representing registered existence, and [Property information (property)] records information on a class such as a label. Furthermore, relation information with other classes is also recorded as class attribute information (attributes).

他のクラスとの関係について、図３を参照して説明する。オントロジーは、概念に対応するクラス単位で、図２に示す登録情報を記録しているとともに、各クラスを１つのノードとして、クラス間の関係をノード間の接続関係として識別可能な構成を持つ。クラス間の関係として、大きく分類すると３つの関係が定義される。図３に示すクラスＡ２０１を注目クラスとして、クラスＡ２０１とのクラス関係について説明する。 The relationship with other classes will be described with reference to FIG. The ontology records the registration information shown in FIG. 2 in units of classes corresponding to concepts, and has a configuration that allows each class to be identified as one node and the relationship between classes to be identified as a connection relationship between nodes. As relations between classes, three relations are defined when roughly classified. The class relationship with the class A 201 will be described with the class A 201 shown in FIG. 3 as the class of interest.

まず、クラスＰ２１１は、クラスＡ２０１に相当する概念の上位概念のクラスである。
このクラスＰ２１１は、クラスＡ２０１のスーパークラス（親クラス）となる。 First, the class P211 is a superordinate concept class corresponding to the class A201.
This class P211 is a superclass (parent class) of class A201.

また、クラスＡ２０１は、クラスＡ２０１に相当する概念の下位概念のクラスとして、クラスＳ２２１、クラスＴ２２２を有する。
このクラスＳ２２１、クラスＴ２２２は、クラスＡ２０１のサブクラス（子クラス）となる。 The class A 201 includes a class S 221 and a class T 222 as subordinate concept classes corresponding to the class A 201.
The class S221 and the class T222 are subclasses (child classes) of the class A201.

さらに、クラスＰ２１１の下位概念として登録されたクラスには、クラスＡ２０１の他、クラスＢ２１２、クラスＣ２１３がある。クラスＢ２１２、クラスＣ２１３は、クラスＡ２０１と同じスーパークラス（親クラス）＝クラスＰ２１１を有する。
このクラスＢ２１２、クラスＣ２１３は、クラスＡ２０１のシブリングクラス（兄弟クラス）となる。 Further, classes registered as subordinate concepts of class P211 include class B212 and class C213 in addition to class A201. Class B212 and class C213 have the same superclass (parent class) = class P211 as class A201.
The class B212 and the class C213 are sibling classes (sibling classes) of the class A201.

このように、オントロジーのクラス関係としては、
（ａ）スーパークラス（親クラス）
（ｂ）サブクラス（子クラス）
（ｃ）シブリングクラス（兄弟クラス）
これらのクラス関係が定義される。さらに、サブクラスのサブクラス（孫クラス）等についても定義される。 In this way, the ontology class relationship is
(A) Super class (parent class)
(B) Subclass (child class)
(C) Sibling class (sibling class)
These class relationships are defined. Furthermore, a subclass of a subclass (grandchild class) and the like are also defined.

本発明の辞書編集装置では、オントロジー記憶部１１１に格納されたある１つのクラスを処理対象として選択し、さらに、オントロジー記憶部１１１に格納された関連クラスを抽出して、処理対象［クラス］の性質情報（プロパティ）として記録されたラベルの更新、例えば新たなラベルを追加する処理を行いラベルの充実化を実行する。 In the dictionary editing apparatus of the present invention, one class stored in the ontology storage unit 111 is selected as the processing target, its related classes are extracted from the ontology storage unit 111, and the labels recorded as property information of the target class are updated, for example by adding new labels, so that the labels are enriched.

次に、図４を参照してテキストコーパス記憶部１１２に格納されるテキストコーパスの例について説明する。テキストコーパス記憶部１１２は様々なテキストを格納したデータベースである。図４に示すように、各テキストには識別子としてのＩＤが設定されている。 Next, an example of a text corpus stored in the text corpus storage unit 112 will be described with reference to FIG. The text corpus storage unit 112 is a database that stores various texts. As shown in FIG. 4, an ID as an identifier is set for each text.

以下、図１に示す辞書編集装置１００を利用した具体的処理例について説明する。なお、本実施例では、医療分野の用語を格納したオントロジー記憶部と、医療分野のテキストを集めたテキストデータベースを利用した処理例について説明する。 Hereinafter, a specific processing example using the dictionary editing apparatus 100 shown in FIG. 1 will be described. In the present embodiment, an example of processing using an ontology storage unit that stores medical field terms and a text database that collects medical field texts will be described.

まず、クラス選択部１０１において、ユーザ（オントロジー編集者）から、オントロジー記憶部１１１に格納されたオントロジーから、ラベルの充実化を行いたいクラスの指定を入力する。
指定クラスを［クラスＡ］とする。
例えば処理対象として指定されたクラスＡは、図２を参照して説明したと同様の登録情報を持つ。 First, in the class selection unit 101, a user (ontology editor) inputs a designation of a class to be enriched from the ontology stored in the ontology storage unit 111.
The designated class is [Class A].
For example, the class A designated as the processing target has registration information similar to that described with reference to FIG.

関連クラス抽出部１０２は、クラスＡの登録情報から、クラスＡの関連クラスとして、スーパークラス（親クラス）、サブクラス（子クラス）、シブリングクラス（兄弟クラス）を判別し、判別した関連クラスを、オントロジー記憶部１１１から抽出する。 From the registration information of class A, the related class extraction unit 102 identifies the superclass (parent class), subclasses (child classes), and sibling classes of class A as its related classes, and extracts the identified related classes from the ontology storage unit 111.

例えば、図２に示すＯＷＬで記述されたオントロジーでは、スーパークラスはクラスＡの"ＳｕｂＣｌａｓｓＯｆ"プロパティを参照することにより抽出できる。サブクラスについても、全体のスーパークラス構造を抽出していれば、指定したクラスをスーパークラスに持つクラスとして抽出できる。また、シブリングクラスについても、クラスＡのスーパークラスのサブクラスとして抽出できる。なお、抽出する関連クラスとしては、サブクラスのサブクラス（孫クラス）等をさらに抽出してもよい。 For example, in the ontology described in OWL shown in FIG. 2, the superclass can be extracted by referring to the “SubClassOf” property of class A. Once the overall superclass structure has been extracted, the subclasses can be extracted as the classes that have the specified class as their superclass. The sibling classes can likewise be extracted as the subclasses of the superclass of class A. Subclasses of subclasses (grandchild classes) and the like may also be extracted as related classes.
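
The lookup described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the ontology is reduced to a hypothetical dictionary `sub_class_of` mapping each class name to a single parent, which stands in for the "SubClassOf" property, and the function name `related_classes` is not taken from the source. The class names follow FIG. 3.

```python
from collections import defaultdict


def related_classes(sub_class_of, target):
    """Return the superclass, subclasses and sibling classes of `target`.

    `sub_class_of` is a hypothetical stand-in for the "SubClassOf"
    property: it maps a class name to the name of its single parent.
    """
    # Invert the child -> parent map once so subclasses can be looked up.
    children = defaultdict(list)
    for cls, parent in sub_class_of.items():
        children[parent].append(cls)

    parent = sub_class_of.get(target)
    supers = [parent] if parent is not None else []
    subs = sorted(children[target])
    # Siblings are the other subclasses of the target's superclass.
    siblings = []
    if parent is not None:
        siblings = sorted(c for c in children[parent] if c != target)
    return {"super": supers, "sub": subs, "sibling": siblings}


# Class layout of FIG. 3: P is the parent of A, B, C; A is the parent of S, T.
hierarchy = {"A": "P", "B": "P", "C": "P", "S": "A", "T": "A"}
result = related_classes(hierarchy, "A")
print(result)
```

Grandchild classes could be collected by calling the same function on each subclass in turn.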

次に、制約条件抽出部１０３は、クラスＡの関連クラスとして抽出された関連クラスのラベルから、テキストコーパス記憶部１１２のテキストコーパスを検索するための制約条件を決定する。 Next, the constraint condition extraction unit 103 determines a constraint condition for searching the text corpus in the text corpus storage unit 112 from the label of the related class extracted as the related class of class A.

制約条件とは、
テキストコーパス記憶部１１２のテキストコーパスを検索する際の制約条件であり、［正の制約条件］と［負の制約条件］を決定する。［正の制約条件］と［負の制約条件］は、関連文抽出部１０４において、テキストコーパス記憶部１１２のテキストコーパスを検索する際に、以下のように利用される。
関連文抽出部１０４のテキスト検索処理＝テキストコーパス記憶部１１２のテキストコーパスから、正の制約条件として決定した文字と同種の文字列を含み、負の制約条件として決定した文字と同種の文字列を含まないテキストを抽出する。 The constraint conditions are
conditions applied when searching the text corpus in the text corpus storage unit 112; a [positive constraint] and a [negative constraint] are determined. They are used as follows when the related sentence extraction unit 104 searches the text corpus in the text corpus storage unit 112.
Text search processing of the related sentence extraction unit 104 = extract, from the text corpus in the text corpus storage unit 112, text that contains a character string of the same type as the characters determined as the positive constraint and does not contain a character string of the same type as the characters determined as the negative constraint.

制約条件抽出部１０３は、クラスＡの関連クラスとして抽出された関連クラスのラベルから、テキストコーパス記憶部１１２のテキストコーパスを検索するための制約条件をクラスＡとのクラス関係に応じて異なる態様で決定する。
以下では、
（ａ）シブリング（兄弟）クラスとシブリング（兄弟）クラスとのラベル比較による制約条件抽出
（ｂ）スーパー（親）クラスとシブリング（兄弟）クラスとのラベル比較による制約条件抽出
（ｃ）スーパー（親）クラスとサブ（子）クラスとのラベル比較による制約条件抽出
（ｄ）サブ（子）クラスとサブ（子）クラスとのラベル比較による制約条件抽出 From the labels of the related classes extracted for class A, the constraint condition extraction unit 103 determines the constraint conditions for searching the text corpus in the text corpus storage unit 112, in a manner that differs according to the class relationship with class A.
Below,
(a) Constraint extraction by label comparison between a sibling class and a sibling class
(b) Constraint extraction by label comparison between the super (parent) class and a sibling class
(c) Constraint extraction by label comparison between the super (parent) class and a sub (child) class
(d) Constraint extraction by label comparison between a sub (child) class and a sub (child) class

これらの４パターンの処理による例を説明する。なお、以下に説明する制約条件決定パターンは一例であり、以下に説明する処理以外に、抽出された関連クラスの間で、同様の他のパターンを設定して制約条件を定める構成としてもよい。以下、上記（ａ）〜（ｄ）の各処理について説明する。 An example of processing of these four patterns will be described. Note that the constraint condition determination pattern described below is merely an example, and other than the processing described below, a configuration may be adopted in which a constraint condition is set by setting another similar pattern between the extracted related classes. Hereinafter, each of the processes (a) to (d) will be described.

（ａ）シブリング（兄弟）クラスとシブリング（兄弟）クラスとのラベル比較による制約条件抽出
まず、ラベル充実化の処理対象とするクラスＡに対応する２つのシブリング（兄弟）クラスとシブリング（兄弟）クラスとのラベル比較による制約条件抽出処理例について、図５を参照して説明する。 (a) Constraint extraction by label comparison between a sibling class and a sibling class
First, an example of constraint extraction by comparing the labels of two sibling classes of class A, the class targeted for label enrichment, will be described with reference to FIG. 5.

シブリング（兄弟）クラスとシブリング（兄弟）クラスとのラベル比較による制約条件抽出処理においては、ラベル充実化の処理対象とするクラスＡに対応する２つのシブリングクラスのラベルを比較して、最長の共通部分を正の制約条件として抽出する。また、共通部分を含むラベルの差分を負の制約条件として抽出する。 In the constraint extraction process by comparing the sibling class with the sibling class, the labels of the two sibling classes corresponding to the class A to be processed for label enrichment are compared, and the longest common Extract the part as a positive constraint. Further, a difference between labels including the common part is extracted as a negative constraint condition.

例えば図５に示すようにラベル充実化の処理対象とするクラスＡ３０１に、スーパー（親）クラス３０２を共通とするシブリング（兄弟）クラスとして、シブリングクラス１，３０３とシブリングクラス２，３０４がある。 For example, as shown in FIG. 5, there are sibling classes 1 and 303 and sibling classes 2 and 304 as sibling (sibling) classes that share the super (parent) class 302 in the class A 301 to be processed for label enhancement.

このような設定において、シブリングクラス１，３０３とシブリングクラス２，３０４それぞれの性質情報としてのラベルを抽出し、ラベルの比較を行う。図５に示す例では、
シブリングクラス１，３０３には、
＊肺がん
＊肺癌
これらのラベルが登録され。
シブリングクラス２，３０４には、
＊乳がん
＊乳癌
これらのラベルが登録されている。 In this setting, the labels recorded as property information of sibling class 1 (303) and sibling class 2 (304) are extracted and compared. In the example shown in FIG. 5,
sibling class 1 (303) has the labels
* 肺がん (lung cancer, "cancer" written in kana)
* 肺癌 (lung cancer, "cancer" written in kanji)
registered, and sibling class 2 (304) has the labels
* 乳がん (breast cancer, "cancer" written in kana)
* 乳癌 (breast cancer, "cancer" written in kanji)
registered.

これら２つのシブリングクラスのラベルを比較して、最長の共通部分を正の制約条件として抽出する。また、共通部分を含むラベルの差分を負の制約条件として抽出する。図５に示す例では、
正の制約条件として"がん"、"癌"
これらのラベルを抽出し、
負の制約条件として"肺"および"乳"
これらのラベルを抽出する。 The labels of these two sibling classes are compared, and the longest common part is extracted as a positive constraint. The remainder of each label that contains the common part is extracted as a negative constraint. In the example shown in FIG. 5,
"がん" and "癌" (both meaning "cancer") are extracted as positive constraints, and
"肺" (lung) and "乳" (breast) are extracted as negative constraints.
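
The sibling-to-sibling comparison of pattern (a) can be sketched as below. This is a minimal illustration, not the implementation of the constraint condition extraction unit 103: the function names are hypothetical, labels are compared pairwise, and the "difference" of a label is assumed to be what remains after removing one occurrence of the common part.

```python
def longest_common_substring(a, b):
    """Return the longest contiguous substring shared by a and b ("" if none)."""
    for size in range(min(len(a), len(b)), 0, -1):
        for start in range(len(a) - size + 1):
            piece = a[start:start + size]
            if piece in b:
                return piece
    return ""


def sibling_constraints(labels_1, labels_2):
    """Pattern (a): derive positive and negative constraints from two siblings."""
    positive, negative = set(), set()
    for x in labels_1:
        for y in labels_2:
            common = longest_common_substring(x, y)
            if not common:
                continue  # this pair of labels shares nothing
            positive.add(common)
            # Assumption: the difference is the label minus the common part.
            for label in (x, y):
                rest = label.replace(common, "", 1)
                if rest:
                    negative.add(rest)
    return positive, negative


# Labels of sibling class 1 (303) and sibling class 2 (304) in FIG. 5.
pos, neg = sibling_constraints(["肺がん", "肺癌"], ["乳がん", "乳癌"])
print(pos, neg)
```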

（ｂ）スーパー（親）クラスとシブリング（兄弟）クラスとのラベル比較による制約条件抽出
次に、ラベル充実化の処理対象とするクラスＡに対応するスーパー（親）クラスとシブリング（兄弟）クラスとのラベル比較による制約条件抽出処理例について、図６を参照して説明する。 (b) Constraint extraction by label comparison between the super (parent) class and a sibling class
Next, an example of constraint extraction by comparing the labels of the super (parent) class and a sibling class of class A, the class targeted for label enrichment, will be described with reference to FIG. 6.

スーパー（親）クラスとシブリング（兄弟）クラスのラベルを比較による制約条件抽出処理においては、これらのクラスのラベルを比較して、最長の共通部分を正の制約条件として抽出する。また、共通部分を含むシブリングクラスのラベルの差分を負の制約条件として抽出する。 In the constraint condition extraction process by comparing the labels of the super (parent) class and sibling (sibling) class, the labels of these classes are compared, and the longest common part is extracted as a positive constraint condition. Also, the difference between sibling class labels including the common part is extracted as a negative constraint.

例えば図６に示すようにラベル充実化の処理対象とするクラスＡ３１１に、スーパー（親）クラス３１２とシブリング（兄弟）クラス３１３がある。 For example, as shown in FIG. 6, there are a super (parent) class 312 and a sibling (sibling) class 313 in the class A 311 to be processed for label enhancement.

このような設定において、スーパー（親）クラス３１２と、シブリング（兄弟）クラス３１３それぞれの性質情報としてのラベルを抽出し、ラベルの比較を行う。図６に示す例では、
スーパー（親）クラス３１２には、
＊がん
＊癌
＊癌腫
これらのラベルが登録され。
シブリング（兄弟）クラス３１３には、
＊肺がん
＊肺癌
これらのラベルが登録されている。 In this setting, the labels recorded as property information of the super (parent) class 312 and the sibling class 313 are extracted and compared. In the example shown in FIG. 6,
the super (parent) class 312 has the labels
* がん (cancer, written in kana)
* 癌 (cancer, written in kanji)
* 癌腫 (carcinoma)
registered, and the sibling class 313 has the labels
* 肺がん (lung cancer, "cancer" written in kana)
* 肺癌 (lung cancer, "cancer" written in kanji)
registered.

これら２つのクラスのラベルを比較して、最長の共通部分を正の制約条件として抽出する。また、共通部分を含むラベルの差分を負の制約条件として抽出する。図６に示す例では、
正の制約条件として"がん"、"癌"
これらのラベルを抽出し、
負の制約条件として"肺"
これらのラベルを抽出する。 The labels of these two classes are compared, and the longest common part is extracted as a positive constraint. The remainder of each sibling class label that contains the common part is extracted as a negative constraint. In the example shown in FIG. 6,
"がん" and "癌" (both meaning "cancer") are extracted as positive constraints, and
"肺" (lung) is extracted as the negative constraint.

（ｃ）スーパー（親）クラスとサブ（子）クラスとのラベル比較による制約条件抽出
次に、ラベル充実化の処理対象とするクラスＡに対応するスーパー（親）クラスとサブ（子）クラスとのラベル比較による制約条件抽出処理例について、図７を参照して説明する。 (c) Constraint extraction by label comparison between the super (parent) class and a sub (child) class
Next, an example of constraint extraction by comparing the labels of the super (parent) class and a sub (child) class of class A, the class targeted for label enrichment, will be described with reference to FIG. 7.

スーパー（親）クラスとサブ（子）クラスとのラベル比較による制約条件抽出処理においては、これらのクラスのラベルを比較して、最長の共通部分を正の制約条件として抽出し、さらに、共通部分を含むスーパークラスのラベルも正の制約条件として抽出する。負の制約条件は抽出しない。 In the constraint extraction process based on label comparison between the super (parent) class and sub (child) class, the labels of these classes are compared, the longest common part is extracted as a positive constraint, and the common part is further extracted. Superclass labels that contain are also extracted as positive constraints. Negative constraints are not extracted.

例えば図７に示すようにラベル充実化の処理対象とするクラスＡ３２１に、スーパー（親）クラス３２２とサブ（子）クラス３２３がある。 For example, as shown in FIG. 7, the class A 321 to be processed for label enhancement includes a super (parent) class 322 and a sub (child) class 323.

このような設定において、スーパー（親）クラス３２２と、サブ（子）クラス３２３それぞれの性質情報としてのラベルを抽出し、ラベルの比較を行う。図７に示す例では、
スーパー（親）クラス３２２には、
＊腫
＊腫瘍
＊腫瘤
＊異常増殖
これらのラベルが登録され。
サブ（子）クラス３２３には、
＊中皮腫
これらのラベルが登録されている。 In this setting, the labels recorded as property information of the super (parent) class 322 and the sub (child) class 323 are extracted and compared. In the example shown in FIG. 7,
the super (parent) class 322 has the labels
* 腫 (swelling; the "-oma" suffix)
* 腫瘍 (tumor)
* 腫瘤 (mass)
* 異常増殖 (abnormal growth)
registered, and the sub (child) class 323 has the label
* 中皮腫 (mesothelioma)
registered.

これら２つのクラスのラベルを比較して、最長の共通部分を正の制約条件として抽出し、さらに、共通部分を含むスーパークラスのラベルも正の制約条件として抽出する。負の制約条件は抽出しない。図７に示す例では、
正の制約条件として"腫"、"腫瘍"、"腫瘤"
これらのラベルを抽出する。 The labels of these two classes are compared, the longest common part is extracted as a positive constraint, and the superclass labels that contain the common part are also extracted as positive constraints. No negative constraint is extracted. In the example shown in FIG. 7,
"腫", "腫瘍", and "腫瘤" are extracted as positive constraints.
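
The two patterns that involve the super (parent) class, (b) and (c), can be sketched together as follows. The function names are hypothetical, and the handling of the "difference" (removing one occurrence of the common part from a sibling label) is an assumption made for illustration.

```python
def longest_common_substring(a, b):
    """Return the longest contiguous substring shared by a and b ("" if none)."""
    for size in range(min(len(a), len(b)), 0, -1):
        for start in range(len(a) - size + 1):
            piece = a[start:start + size]
            if piece in b:
                return piece
    return ""


def common_parts(labels_x, labels_y):
    """Collect the longest common part of every label pair that shares one."""
    parts = set()
    for x in labels_x:
        for y in labels_y:
            common = longest_common_substring(x, y)
            if common:
                parts.add(common)
    return parts


def parent_sibling_constraints(parent_labels, sibling_labels):
    """Pattern (b): negatives come only from the sibling class labels."""
    positive = common_parts(parent_labels, sibling_labels)
    negative = set()
    for label in sibling_labels:
        for common in positive:
            if common in label:
                rest = label.replace(common, "", 1)
                if rest:
                    negative.add(rest)
    return positive, negative


def parent_child_constraints(parent_labels, child_labels):
    """Pattern (c): positives only, plus parent labels containing a common part."""
    commons = common_parts(parent_labels, child_labels)
    containing = {p for p in parent_labels if any(c in p for c in commons)}
    return commons | containing


# FIG. 6: super class 312 and sibling class 313.
pos_b, neg_b = parent_sibling_constraints(["がん", "癌", "癌腫"], ["肺がん", "肺癌"])
# FIG. 7: super class 322 and sub class 323.
pos_c = parent_child_constraints(["腫", "腫瘍", "腫瘤", "異常増殖"], ["中皮腫"])
print(pos_b, neg_b, pos_c)
```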

（ｄ）サブ（子）クラスとサブ（子）クラスとのラベル比較による制約条件抽出
次に、ラベル充実化の処理対象とするクラスＡに対応する２つのサブ（子）クラスのラベル比較による制約条件抽出処理例について、図８を参照して説明する。 (d) Constraint extraction by label comparison between a sub (child) class and a sub (child) class
Next, an example of constraint extraction by comparing the labels of two sub (child) classes of class A, the class targeted for label enrichment, will be described with reference to FIG. 8.

サブ（子）クラスとサブ（子）クラスのラベルを比較による制約条件抽出処理においては、これらのクラスのラベルを比較して、最長の共通部分を正の制約条件として抽出する。負の制約条件は抽出しない。 In the constraint condition extraction process by comparing the labels of the sub (child) class and the sub (child) class, the labels of these classes are compared, and the longest common part is extracted as a positive constraint condition. Negative constraints are not extracted.

例えば図８に示すようにラベル充実化の処理対象とするクラスＡ３３１に、サブ（子）クラス３３２とサブ（子）クラス３３３がある。 For example, as shown in FIG. 8, the class A 331 to be processed for label enhancement includes a sub (child) class 332 and a sub (child) class 333.

このような設定において、サブ（子）クラス３３２と、サブ（子）クラス３３３それぞれの性質情報としてのラベルを抽出し、ラベルの比較を行う。図８に示す例では、
サブ（子）クラス３３２には、
＊がん
＊癌
＊癌腫
＊異常増殖
これらのラベルが登録され。
サブ（子）クラス３３３には、
＊アデノーマ
＊腺腫
これらのラベルが登録されている。 In this setting, the labels recorded as property information of the sub (child) class 332 and the sub (child) class 333 are extracted and compared. In the example shown in FIG. 8,
the sub (child) class 332 has the labels
* がん (cancer, written in kana)
* 癌 (cancer, written in kanji)
* 癌腫 (carcinoma)
* 異常増殖 (abnormal growth)
registered, and the sub (child) class 333 has the labels
* アデノーマ (adenoma, written in katakana)
* 腺腫 (adenoma, written in kanji)
registered.

これら２つのクラスのラベルを比較して、最長の共通部分を正の制約条件として抽出する。負の制約条件は抽出しない。図８に示す例では、
正の制約条件として"腫"
このラベルを抽出する。 The labels of these two classes are compared, and the longest common part is extracted as a positive constraint. No negative constraint is extracted. In the example shown in FIG. 8,
"腫" (the "-oma" suffix) is extracted as the positive constraint.

以上が、図１に示す制約条件抽出部１０３が実行する制約条件抽出処理である。上述したように制約条件抽出部１０３は、クラスＡの関連クラスとして抽出された関連クラスのラベルから、テキストコーパス記憶部１１２のテキストコーパスを検索するための制約条件を決定する。 The above is the constraint condition extraction processing executed by the constraint condition extraction unit 103 shown in FIG. As described above, the constraint condition extraction unit 103 determines the constraint condition for searching the text corpus in the text corpus storage unit 112 from the label of the related class extracted as the related class of class A.

さらに、制約条件抽出部１０３は、全ての関連クラス間から上述した処理により制約条件を抽出した後、抽出された正負の制約条件を利用してさらに最終的な制約条件を決定する。
具体的には、正の制約条件に内包される負の制約条件は削除する処理を行う。
例えば、
正の制約条件として"腫瘍"が抽出され、
負の制約条件として"腫"が抽出されていた場合、
負の制約条件"腫"は削除される。 Further, the constraint condition extraction unit 103 extracts a constraint condition from among all the related classes by the above-described process, and further determines a final constraint condition by using the extracted positive / negative constraint condition.
Specifically, any negative constraint that is contained in a positive constraint is deleted.
For example,
if "腫瘍" (tumor) has been extracted as a positive constraint and
"腫" has been extracted as a negative constraint,
the negative constraint "腫" is deleted.
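
The final pruning step can be sketched in a few lines. The function name `prune_negative` is hypothetical, and "contained in" is read here as a plain substring test.

```python
def prune_negative(positive, negative):
    """Drop every negative constraint that is a substring of a positive one."""
    return {n for n in negative if not any(n in p for p in positive)}


# "腫" is contained in the positive constraint "腫瘍", so it is removed;
# "肺" is not contained in any positive constraint and is kept.
remaining = prune_negative({"腫瘍"}, {"腫", "肺"})
print(remaining)
```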

このようにして正負の制約条件が決定した後、関連文抽出部１０４が、テキストコーパス記憶部１１２からテキスト検索を行う。
関連文抽出部１０４は、正の制約条件を含み負の制約条件を含まない文の集合を抽出する。 After the positive / negative constraint conditions are determined in this way, the related sentence extraction unit 104 performs a text search from the text corpus storage unit 112.
The related sentence extraction unit 104 extracts a set of sentences that include a positive constraint condition and does not include a negative constraint condition.

例えば、正の制約条件として"がん"、"癌"があり、負の制約条件として"肺"、"乳"がある場合、"…食道癌と思われ…"のような文は抽出されるが、"…肺癌の可能性が…"のような文は抽出しない。 For example, when the positive constraints are "がん" and "癌" (both meaning "cancer") and the negative constraints are "肺" (lung) and "乳" (breast), a sentence such as "…食道癌と思われ…" ("… thought to be esophageal cancer …") is extracted, but a sentence such as "…肺癌の可能性が…" ("… possibility of lung cancer …") is not.
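
The filtering performed by the related sentence extraction unit 104 can be sketched as below, assuming the corpus is a plain list of sentences and that a constraint matches when it occurs as a substring. The function name and the sample sentences (completed from the fragments in the example above) are illustrative only.

```python
def extract_related_sentences(sentences, positive, negative):
    """Keep sentences with at least one positive and no negative constraint."""
    return [
        s for s in sentences
        if any(p in s for p in positive) and not any(n in s for n in negative)
    ]


corpus = ["食道癌と思われる", "肺癌の可能性がある", "異常なし"]
hits = extract_related_sentences(corpus, {"がん", "癌"}, {"肺", "乳"})
print(hits)
```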

関連文抽出部１０４が、正の制約条件と負の制約条件を利用してテキストコーパス記憶部１１２から抽出したテキストは関連文字列抽出部１０５に入力される。 The text extracted by the related sentence extraction unit 104 from the text corpus storage unit 112 using the positive constraint condition and the negative constraint condition is input to the related character string extraction unit 105.

関連文字列抽出部１０５は、関連文抽出部１０４が、テキストコーパス記憶部１１２から抽出した文の集合から正の制約条件等を利用して関連文字列を抽出する。この関連文字列抽出処理は、例えば以下の２つの方法（方法ａ），（方法ｂ）のいずれか、またはその組み合わせを用いて実行することが可能である。 The related character string extraction unit 105 extracts a related character string from the set of sentences extracted by the related sentence extraction unit 104 from the text corpus storage unit 112 using a positive constraint condition or the like. This related character string extraction process can be executed using, for example, one of the following two methods (method a) and (method b), or a combination thereof.

（方法ａ）正の制約条件と同種の文字を含む連続する文字列を抽出する
抽出した文の集合から、正の制約条件と同種の文字を含む連続する文字列を抽出する方法による関連文字列抽出処理について説明する。 (Method a) Extracting a contiguous character string containing characters of the same type as the positive constraint
The related character string extraction process that extracts, from the set of extracted sentences, contiguous character strings containing characters of the same type as the positive constraint will be described.

テキストコーパス記憶部１１２から抽出した文の集合から、
正の制約条件と同種の文字を、先頭文字として含む文字列を選択し、その文字列を含む連続する文字列を抽出する
同様に、正の制約条件と同種の文字を、末尾文字として含む文字列を選択し、その文字列を含む連続する文字列を抽出する From a set of sentences extracted from the text corpus storage unit 112,
a character string that has, as its first character, a character of the same type as the positive constraint is selected, and the contiguous character string containing it is extracted.
Likewise, a character string that has, as its last character, a character of the same type as the positive constraint is selected, and the contiguous character string containing it is extracted.

例えば、
正の制約条件として"腫瘍"があり、
文の集合として"…や梗塞、腫瘍性病巣を…"、"膀胱腫瘍を見ている…"があれば、
"腫瘍性病巣"と"膀胱腫瘍"
これらの文字列が抽出される。
なお、文字種としては漢字、平仮名、片仮名、英数字、記号等を用いる。 For example,
if the positive constraint is "腫瘍" (tumor) and
the set of sentences contains "…や梗塞、腫瘍性病巣を…" and "膀胱腫瘍を見ている…",
"腫瘍性病巣" (neoplastic lesion) and "膀胱腫瘍" (bladder tumor)
are the character strings extracted.
The character types used are kanji, hiragana, katakana, alphanumeric characters, symbols, and the like.
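
Method a can be sketched as follows. This is an illustration under stated assumptions: character types are decided from basic Unicode block ranges, the run is extended leftward using the type of the constraint's first character and rightward using the type of its last character, and the function names are hypothetical.

```python
def char_type(ch):
    """Classify a character by basic Unicode block (a simplification)."""
    code = ord(ch)
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if ch.isalnum():
        return "alnum"
    return "symbol"


def expand_same_type(sentence, constraint):
    """Return each occurrence of `constraint` grown to the same-type run."""
    results = []
    start = sentence.find(constraint)
    while start != -1:
        end = start + len(constraint)
        left, right = start, end
        # Grow leftward while characters match the type of the first character.
        while left > 0 and char_type(sentence[left - 1]) == char_type(constraint[0]):
            left -= 1
        # Grow rightward while characters match the type of the last character.
        while right < len(sentence) and char_type(sentence[right]) == char_type(constraint[-1]):
            right += 1
        results.append(sentence[left:right])
        start = sentence.find(constraint, end)
    return results


found = expand_same_type("…や梗塞、腫瘍性病巣を…", "腫瘍") + expand_same_type("膀胱腫瘍を見ている…", "腫瘍")
print(found)
```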

（方法ｂ）文に対して形態素解析を行い正の制約条件を含む形態素の連なりを抽出する
テキストコーパス記憶部１１２から抽出した文の集合に対して、例えば形態素解析アルゴリズムとして知られている茶筌（ｈｔｔｐ：／／ｃｈａｓｅｎ．ｎａｉｓｔ．ｊｐ／ｈｉｋｉ／ＣｈａＳｅｎ／）等を用いて形態素解析を行い、抽出文を分かち書きにし、正の制約条件と同種の語を含む連続する名詞の集合を関連文字列として抽出する。 (Method b) Performing morphological analysis on a sentence and extracting a run of morphemes containing the positive constraint
The set of sentences extracted from the text corpus storage unit 112 is morphologically analyzed using, for example, ChaSen (茶筌, http://chasen.naist.jp/hiki/ChaSen/), a known morphological analyzer; the extracted sentences are segmented into words, and runs of consecutive nouns containing a word of the same kind as the positive constraint are extracted as related character strings.

例えば、正の制約条件として"腫瘍"があり、
テキストコーパス記憶部１１２から抽出し、形態素解析によって分かち書きにされた文として
"…／や（助詞）／梗塞（名詞）／、（記号）／腫瘍性（名詞）／病巣（名詞）／を（助詞）／…"、"膀胱（名詞）／腫瘍（名詞）／を（助詞）／見（動詞）／て（助詞）／いる（動詞）／…"があれば、
正の制約条件と同種の語を含む連続する名詞の集合として、
"腫瘍性病巣"と"膀胱腫瘍"が関連文字列として抽出される。 For example, a positive constraint is "tumor"
and the sentences extracted from the text corpus storage unit 112 and segmented by morphological analysis are
"…／や（particle）／梗塞（noun）／、（symbol）／腫瘍性（noun）／病巣（noun）／を（particle）／…" and "膀胱（noun）／腫瘍（noun）／を（particle）／見（verb）／て（particle）／いる（verb）／…"; then,
as runs of consecutive nouns containing a word of the same kind as the positive constraint,
"腫瘍性病巣" (neoplastic lesion) and "膀胱腫瘍" (bladder tumor) are extracted as related character strings.
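
Method b can be sketched as below. No morphological analyzer is called: the input is assumed to be already segmented into hypothetical `(surface, part_of_speech)` pairs, with `"名詞"` marking nouns, as an analyzer such as ChaSen might be post-processed to produce. The function name is illustrative.

```python
def noun_runs_with_constraint(morphemes, constraint):
    """Join consecutive nouns and keep the runs that contain `constraint`."""
    results, run = [], []
    # A sentinel non-noun entry flushes a noun run that ends the sentence.
    for surface, pos in list(morphemes) + [("", "END")]:
        if pos == "名詞":
            run.append(surface)
            continue
        if run and any(constraint in m for m in run):
            results.append("".join(run))
        run = []
    return results


sentence_1 = [("や", "助詞"), ("梗塞", "名詞"), ("、", "記号"),
              ("腫瘍性", "名詞"), ("病巣", "名詞"), ("を", "助詞")]
sentence_2 = [("膀胱", "名詞"), ("腫瘍", "名詞"), ("を", "助詞"),
              ("見", "動詞"), ("て", "助詞"), ("いる", "動詞")]
runs = noun_runs_with_constraint(sentence_1, "腫瘍") + noun_runs_with_constraint(sentence_2, "腫瘍")
print(runs)
```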

なお、正の制約条件が現れる文が多いと抽出される関連文字列の数も多くなりがちなので、関連文字列のランキングを行い、上位ｎ（ｎは任意の数）を抽出してもよい。例えば、テキストコーパス内での関連文字列のエントロピーを求め、その値の高い上位ｎを取ることができる。なお、文字列のエントロピーを求める手法は一般的に知られており、文献「北研二．確率的言語モデル．東京大学出版会，１９９９」等に記載された方法を適用可能である。 When many sentences contain a positive constraint, the number of extracted related character strings tends to grow, so the related character strings may be ranked and only the top n (n being an arbitrary number) extracted. For example, the entropy of each related character string within the text corpus can be computed and the n strings with the highest values taken. Methods for computing the entropy of a character string are generally known; for example, the method described in Kenji Kita, "確率的言語モデル" (Probabilistic Language Models), University of Tokyo Press, 1999, can be applied.
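
The text does not define which entropy is computed, so the following sketch should be read as one possible interpretation rather than the method of the cited reference. It assumes the entropy of the distribution of characters that immediately follow each occurrence of a candidate in the corpus (often called right branching entropy); the function names and the sample corpus are hypothetical.

```python
import math
from collections import Counter


def right_branching_entropy(corpus, candidate):
    """Shannon entropy (bits) of the character following `candidate`."""
    followers = Counter()
    for text in corpus:
        pos = text.find(candidate)
        while pos != -1:
            nxt = pos + len(candidate)
            # "<EOS>" marks a candidate that ends the text.
            followers[text[nxt] if nxt < len(text) else "<EOS>"] += 1
            pos = text.find(candidate, pos + 1)
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in followers.values())


def top_n(corpus, candidates, n):
    """Rank candidates by entropy, highest first, and return (string, score)."""
    scored = [(c, right_branching_entropy(corpus, c)) for c in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


corpus = ["膀胱腫瘍を見ている", "膀胱腫瘍が疑われる",
          "腫瘍性病巣を認める", "腫瘍性病巣を切除"]
ranking = top_n(corpus, ["腫瘍性病巣", "膀胱腫瘍"], 1)
print(ranking)
```

The returned score could be shown next to each candidate when the ranking and score are presented to the ontology editor.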

以上の処理によって、関連文字列抽出部１０５は、関連文抽出部１０４が、テキストコーパス記憶部１１２から抽出した文の集合から正の制約条件等を利用して関連文字列を抽出する。抽出された関連文字列は、関連文字列提示部１０６に入力され、抽出された関連文字列がディスプレイなどを利用してユーザ（オントロジー編集者）に提示される。ディスプレイ出力の他、音声出力なども適用可能であり、ユーザへの提示方法は特に問わない。また、関連文字列のランキングを行っている場合には、その順位とスコアを併せて提示してもよい。 With the above processing, the related character string extraction unit 105 extracts a related character string from the set of sentences extracted by the related sentence extraction unit 104 from the text corpus storage unit 112 using a positive constraint condition or the like. The extracted related character string is input to the related character string presentation unit 106, and the extracted related character string is presented to the user (ontology editor) using a display or the like. In addition to display output, audio output and the like can be applied, and the presentation method to the user is not particularly limited. Moreover, when ranking of related character strings is performed, the ranking and score may be presented together.

オントロジー編集処理部１０７は、ラベルの充実化を行いたいクラスＡの性質情報としてのラベルの編集を実行する処理部である。例えば、ユーザ（オントロジー編集者）が、ラベルの充実化を行いたいクラスＡの性質情報としてのラベルとして、提示された関連文字列が妥当であるかを判定し、妥当であると判定した場合は、その関連文字列をクラスＡのラベルとして追記する処理を行う。 The ontology editing processing unit 107 is a processing unit that executes editing of the label as the property information of the class A for which the label is to be enriched. For example, when the user (ontology editor) determines whether the presented related character string is valid as a label as the property information of class A to be enriched, and determines that it is valid Then, the related character string is additionally written as a class A label.

なお、ユーザ（オントロジー編集者）の介在なしに処理を実行する構成としてもよく、その場合は、関連文字列抽出部１０５において抽出された関連文字列を、クラスＡのラベルとして追加する処理を実行する構成とすればよい。この場合、図１に示す点線の矢印に従った処理となり、関連文字列提示部１０６は省略可能となる。 The processing may also be configured to run without intervention by the user (ontology editor). In that case, the related character strings extracted by the related character string extraction unit 105 are added directly as labels of class A. The processing then follows the dotted arrow shown in FIG. 1, and the related character string presentation unit 106 can be omitted.

次に、図９に示すフローチャートを参照して、図１に示す辞書編集装置１００を利用した処理シーケンスについて説明する。 Next, a processing sequence using the dictionary editing apparatus 100 shown in FIG. 1 will be described with reference to the flowchart shown in FIG.

まず、ステップＳ１０１において、オントロジー記憶部１１１に格納されたオントロジーから、ラベルの充実化を行いたいクラスを選択する。これは、図１に示すクラス選択部１０１の処理であり、例えば、ユーザ（オントロジー編集者）による選択として行ってもよいし、あるいはオントロジー記憶部１１１に格納されたオントロジーから、順次、クラスを選択する構成としてもよい。処理対象クラス［クラスＡ］は、図２を参照して説明したと同様の登録情報を持つ。 First, in step S101, the class whose labels are to be enriched is selected from the ontology stored in the ontology storage unit 111. This is the processing of the class selection unit 101 shown in FIG. 1; the selection may be made by the user (ontology editor), or classes may be selected one after another from the ontology stored in the ontology storage unit 111. The processing target class [class A] has registration information similar to that described with reference to FIG. 2.

次に、ステップＳ１０２において、処理対象クラス［クラスＡ］の関連クラスを抽出する。これは図１に示す関連クラス抽出部１０２の処理であり、クラスＡの登録情報から、クラスＡの関連クラスとして、スーパークラス（親クラス）、サブクラス（子クラス）、シブリングクラス（兄弟クラス）を判別し、判別した関連クラスを、オントロジー記憶部１１１から抽出する。 Next, in step S102, a related class of the processing target class [class A] is extracted. This is the processing of the related class extraction unit 102 shown in FIG. 1, and the super class (parent class), subclass (child class), sibling class (sibling class) are selected as the related class of class A from the registration information of class A. The determined related class is extracted from the ontology storage unit 111.

次に、ステップＳ１０３において、処理対象クラス［クラスＡ］のラベルに適用する関連文字列を抽出するために利用するテキストコーパスの検索条件である制約条件を抽出する。この処理は、図１に示す制約条件抽出部１０３の処理である。 Next, in step S103, a constraint condition that is a search condition for a text corpus used to extract a related character string applied to the label of the processing target class [class A] is extracted. This process is the process of the constraint condition extraction unit 103 shown in FIG.

制約条件は、前述したように、テキストコーパス記憶部１１２のテキストコーパスを検索する際の制約条件であり、［正の制約条件］と［負の制約条件］がある。テキストコーパスを検索する際に利用され、正の制約条件として決定した文字と同種の文字列を含み、負の制約条件として決定した文字と同種の文字列を含まないテキストの抽出が行われることになる。 As described above, the constraint conditions are conditions applied when searching the text corpus in the text corpus storage unit 112, and consist of a [positive constraint] and a [negative constraint]. They are used when the text corpus is searched, so that text is extracted which contains a character string of the same type as the characters determined as the positive constraint and does not contain a character string of the same type as the characters determined as the negative constraint.

ステップＳ１０３の制約条件抽出処理は、先に説明したように、処理対象クラス［クラスＡ］の関連クラスのクラス関係に応じて異なる態様で決定する。すなわち、先に図５〜図８を参照して説明したように以下の処理態様によって制約条件を決定する。 As described above, the constraint condition extraction process in step S103 is determined in a different manner according to the class relationship of the related classes of the processing target class [class A]. That is, as described above with reference to FIGS. 5 to 8, the constraint condition is determined by the following processing mode.

（ａ）シブリング（兄弟）クラスとシブリング（兄弟）クラスとのラベル比較による制約条件抽出
２つのシブリングクラスのラベルを比較して、最長の共通部分を正の制約条件として抽出する。また、共通部分を含むラベルの差分を負の制約条件として抽出する。 (a) Constraint extraction by label comparison between a sibling class and a sibling class
The labels of two sibling classes are compared, and the longest common part is extracted as a positive constraint. The remainder of each label that contains the common part is extracted as a negative constraint.

（ｂ）スーパー（親）クラスとシブリング（兄弟）クラスとのラベル比較による制約条件抽出
これらのクラスのラベルを比較して、最長の共通部分を正の制約条件として抽出する。また、共通部分を含むシブリングクラスのラベルの差分を負の制約条件として抽出する。 (b) Constraint extraction by label comparison between the super (parent) class and a sibling class
The labels of these classes are compared, and the longest common part is extracted as a positive constraint. The remainder of each sibling class label that contains the common part is extracted as a negative constraint.

（ｃ）スーパー（親）クラスとサブ（子）クラスとのラベル比較による制約条件抽出
これらのクラスのラベルを比較して、最長の共通部分を正の制約条件として抽出し、さらに、共通部分を含むスーパークラスのラベルも正の制約条件として抽出する。負の制約条件は抽出しない。 (c) Constraint extraction by label comparison between the super (parent) class and a sub (child) class
The labels of these classes are compared, the longest common part is extracted as a positive constraint, and the superclass labels that contain the common part are also extracted as positive constraints. No negative constraint is extracted.

（ｄ）サブ（子）クラスとサブ（子）クラスとのラベル比較による制約条件抽出
これらのクラスのラベルを比較して、最長の共通部分を正の制約条件として抽出する。負の制約条件は抽出しない。 (d) Constraint extraction by label comparison between a sub (child) class and a sub (child) class
The labels of these classes are compared, and the longest common part is extracted as a positive constraint. No negative constraint is extracted.

さらに、ステップＳ１０３では、全ての関連クラス間からの制約条件抽出後、正の制約条件に内包される負の制約条件は削除して、最終的な制約条件を決定する。
例えば、
正の制約条件として"腫瘍"が抽出され、
負の制約条件として"腫"が抽出されていた場合、
負の制約条件"腫"は削除される。 Further, in step S103, after constraint conditions have been extracted from all pairs of related classes, any negative constraint that is contained in a positive constraint is deleted, and the final constraint conditions are determined.
For example,
if "腫瘍" ("tumor") has been extracted as a positive constraint
and "腫" (a substring of "腫瘍") has been extracted as a negative constraint,
the negative constraint "腫" is deleted.
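The pruning in step S103 amounts to a substring check. The following is a minimal sketch, assuming that constraints are held as sets of strings; the function name is hypothetical.

```python
def finalize_constraints(positives, negatives):
    """Drop every negative constraint that occurs inside some positive constraint."""
    kept = {n for n in negatives if not any(n in p for p in positives)}
    return set(positives), kept


# The negative "腫" is contained in the positive "腫瘍", so it is removed;
# "胃" (a hypothetical extra negative) is kept.
positives, negatives = finalize_constraints({"腫瘍"}, {"腫", "胃"})
```

Without this step, a negative constraint such as "腫" would exclude every sentence containing "腫瘍", and the search would return nothing.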

このようにして正負の制約条件が決定した後、ステップＳ１０４において、決定した制約条件を適用してテキストコーパス記憶部１１２からテキスト検索を行う。この処理は、図１に示す関連文抽出部１０４の処理である。
関連文抽出部１０４は、正の制約条件を含み負の制約条件を含まない文の集合をテキストコーパス記憶部１１２から抽出する。 After the positive and negative constraint conditions are determined in this way, in step S104, text search is performed from the text corpus storage unit 112 by applying the determined constraint conditions. This process is the process of the related sentence extraction unit 104 shown in FIG.
The related sentence extraction unit 104 extracts from the text corpus storage unit 112 the set of sentences that include the positive constraints and do not include the negative constraints.
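The filtering in step S104 can be sketched as below. The text does not say whether a sentence must contain every positive constraint or only one of them; this sketch assumes all of them are required. The corpus is modeled as a plain list of sentences, and the function name and sample sentences are hypothetical.

```python
def extract_related_sentences(corpus, positives, negatives):
    """Keep sentences containing all positives and none of the negatives."""
    return [
        sentence
        for sentence in corpus
        if all(p in sentence for p in positives)
        and not any(n in sentence for n in negatives)
    ]


corpus = [
    "患者に悪性腫瘍が認められた。",
    "胃腫瘍の切除を行った。",
    "経過は良好である。",
]
related = extract_related_sentences(corpus, {"腫瘍"}, {"胃"})
```

Here only the first sentence survives: the second contains the negative constraint "胃", and the third lacks the positive constraint.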

次に、ステップＳ１０５において、テキストコーパス記憶部１１２から抽出した文の集合から、クラスＡのラベルとして追加する候補となる関連文字列を抽出する処理が実行される。この処理は図１に示す関連文字列抽出部１０５の処理である。 Next, in step S105, a process of extracting a related character string that is a candidate to be added as a class A label from the set of sentences extracted from the text corpus storage unit 112 is executed. This process is the process of the related character string extraction unit 105 shown in FIG.

関連文字列抽出部１０５は、先に説明したように、以下の（方法ａ）、（方法ｂ）、これらの方法のいずれか、または組み合わせにより、クラスＡのラベルとして追加する候補となる関連文字列を抽出する。
（方法ａ）正の制約条件と同種の文字を含む連続する文字列を抽出する
（方法ｂ）文に対して形態素解析を行い正の制約条件を含む形態素の連なりを抽出する。 As described above, the related character string extraction unit 105 extracts related character strings that are candidates to be added as labels of class A by either of the following (Method a) and (Method b), or by a combination of them.
(Method a) Extract a continuous character string containing characters of the same type as the positive constraint.
(Method b) Perform morphological analysis on the sentence and extract a sequence of morphemes that contains the positive constraint.
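Method a can be illustrated by growing each occurrence of the positive constraint outward for as long as the neighboring characters are of the same type (for example, kanji). In this sketch the character type is approximated by the first word of the Unicode character name, which is an assumption of this example and a fairly crude classifier; the function names are hypothetical. Method b is not shown because it depends on an external morphological analyzer.

```python
import unicodedata


def char_type(ch):
    """Rough character class: first word of the Unicode name (e.g. CJK, HIRAGANA)."""
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def extract_same_type_runs(sentence, positive):
    """Return runs of same-type characters around each occurrence of positive."""
    if not positive:
        return []
    types = {char_type(c) for c in positive}
    results = []
    start = sentence.find(positive)
    while start != -1:
        left = start
        while left > 0 and char_type(sentence[left - 1]) in types:
            left -= 1
        right = start + len(positive)
        while right < len(sentence) and char_type(sentence[right]) in types:
            right += 1
        results.append(sentence[left:right])
        start = sentence.find(positive, start + 1)
    return results


candidates = extract_same_type_runs("患者に悪性腫瘍が認められた。", "腫瘍")
```

In this hypothetical sentence the kanji run around "腫瘍" is bounded by the hiragana "に" and "が", so the candidate is "悪性腫瘍".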

なお、前述したように、正の制約条件が現れる文が多いと抽出される関連文字列の数も多くなりがちなので、関連文字列のランキングを行い、上位ｎ（ｎは任意の数）を抽出してもよい。 As mentioned above, when the positive constraint appears in many sentences, the number of extracted related character strings also tends to be large. The related character strings may therefore be ranked and only the top n (n being an arbitrary number) extracted.
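A ranking step of this kind might look like the following. The scoring method is not specified in this passage, so the sketch simply assumes occurrence frequency as the score; the function name and sample strings are hypothetical.

```python
from collections import Counter


def rank_candidates(strings, n):
    """Return the n most frequent candidate strings with their counts."""
    return Counter(strings).most_common(n)


ranked = rank_candidates(["悪性腫瘍", "良性腫瘍", "悪性腫瘍"], 1)
```

The returned (string, count) pairs could also serve as the rank and score shown to the ontology editor.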

次に、ステップＳ１０６において、抽出した関連文字列がディスプレイなどを利用してユーザ（オントロジー編集者）に提示される。これは、図１に示す関連文字列提示部１０６の処理である。関連文字列のランキングを行っている場合には、その順位とスコアを併せて提示してもよい。 Next, in step S106, the extracted related character string is presented to the user (ontology editor) using a display or the like. This is the process of the related character string presentation unit 106 shown in FIG. When ranking related character strings, the ranking and score may be presented together.

最後に、ステップＳ１０７において、提示された関連文字列を、ラベルの充実化を行いたいクラスＡの性質情報として追記するラベル編集を実行する。この処理は、図１に示すオントロジー編集処理部１０７の処理である。 Finally, in step S107, label editing is performed in which the presented related character string is additionally written as property information of class A to be enriched. This process is the process of the ontology editing processing unit 107 shown in FIG.

例えば、ユーザ（オントロジー編集者）が、ラベルの充実化を行いたいクラスＡの性質情報としてのラベルとして、提示された関連文字列が妥当であるかを判定し、妥当であると判定した場合は、その関連文字列をクラスＡのラベルとして追記する処理を行う。 For example, the user (ontology editor) judges whether each presented related character string is appropriate as a label, that is, as property information of class A whose labels are to be enriched. If the string is judged appropriate, it is added as a label of class A.

なお、前述したように、ユーザ（オントロジー編集者）の介在なしに処理を実行する構成としてもよく、その場合は、ステップＳ１０５の関連文字列抽出処理において抽出された関連文字列を、そのままクラスＡのラベルとして追加する処理を実行する。この場合、ステップＳ１０６の関連文字列提示処理は省略される。 As described above, the process may be executed without the intervention of the user (ontology editor). In this case, the related character string extracted in the related character string extraction process in step S105 is directly used as the class A. Execute the process of adding as a label. In this case, the related character string presentation process in step S106 is omitted.
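The two editing modes in step S107 (with and without the ontology editor) can be sketched with an optional confirmation callback. This is an assumption-laden illustration: the label list, the function name, and the callback are hypothetical stand-ins for the ontology storage and the presentation step.

```python
def add_labels(labels, candidates, confirm=None):
    """Append candidate strings to a class's label list.

    confirm is an optional callable that stands in for the editor's judgment;
    when it is None, every candidate is added without user intervention.
    """
    for candidate in candidates:
        if candidate in labels:
            continue  # already registered as a label
        if confirm is None or confirm(candidate):
            labels.append(candidate)
    return labels


# Interactive mode: the editor (simulated here) rejects one candidate.
reviewed = add_labels(["腫瘍"], ["悪性腫瘍", "腫瘍", "良性腫瘍"],
                      confirm=lambda c: c != "良性腫瘍")
# Automatic mode: step S106 is skipped and all candidates are added.
automatic = add_labels(["腫瘍"], ["悪性腫瘍", "腫瘍", "良性腫瘍"])
```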

最後に、上述した処理を実行する辞書編集装置を構成する情報処理装置のハードウェア構成例について、図１０を参照して説明する。辞書編集装置を構成する情報処理装置は、ハードウェアとしては例えばＰＣによって実現可能であり、上述した処理を実行するプログラムを実行させることによってデータ検索および表示データの生成、出力が可能である。ＣＰＵ（Central Processing Unit）７０１は、ＯＳ（Operating System)に対応する処理や、上述の実施例において説明したクラス選択処理、関連クラス抽出処理、制約条件抽出処理、関連文抽出処理、文字列抽出処理、関連文字列提示処理、オントロジー編集処理などを実行する。これらの処理は、各情報処理装置のＲＯＭ、ハードディスクなどのデータ記憶部に格納されたコンピュータ・プログラムに従って実行される。 Finally, a hardware configuration example of an information processing apparatus that constitutes the dictionary editing device executing the above-described processing will be described with reference to FIG. 10. The information processing apparatus that constitutes the dictionary editing device can be realized in hardware by, for example, a PC, and performs data search and the generation and output of display data by running a program that executes the above-described processing. A CPU (Central Processing Unit) 701 executes processing corresponding to the OS (Operating System), as well as the class selection processing, related class extraction processing, constraint condition extraction processing, related sentence extraction processing, character string extraction processing, related character string presentation processing, ontology editing processing, and the like described in the above embodiment. These processes are executed according to a computer program stored in a data storage unit such as a ROM or a hard disk of each information processing apparatus.

ＲＯＭ（Read Only Memory）７０２は、ＣＰＵ７０１が使用するプログラムや演算パラメータ等を格納する。ＲＡＭ（Random Access Memory）７０３は、ＣＰＵ７０１の実行において使用するプログラムや、その実行において適宜変化するパラメータ等を格納する。これらはＣＰＵバスなどから構成されるホストバス７０４により相互に接続されている。 A ROM (Read Only Memory) 702 stores programs used by the CPU 701, calculation parameters, and the like. A RAM (Random Access Memory) 703 stores programs used in the execution of the CPU 701, parameters that change as appropriate during the execution, and the like. These are connected to each other via a host bus 704 including a CPU bus.

ホストバス７０４は、ブリッジ７０５を介して、ＰＣＩ(Peripheral Component Interconnect/Interface)バスなどの外部バス７０６に接続されている。キーボード７０８、ポインティングデバイス７０９は、ユーザにより操作される入力デバイスである。ディスプレイ７１０は、液晶表示装置またはＣＲＴ（Cathode Ray Tube）などから成り、各種情報をテキストやイメージで表示する。 The host bus 704 is connected to an external bus 706 such as a PCI (Peripheral Component Interconnect / Interface) bus via a bridge 705. A keyboard 708 and a pointing device 709 are input devices operated by the user. The display 710 is composed of a liquid crystal display device, a CRT (Cathode Ray Tube), or the like, and displays various types of information as text and images.

ＨＤＤ（Hard Disk Drive）７１１は、ハードディスクを内蔵し、ハードディスクを駆動し、ＣＰＵ７０１によって実行するプログラムや情報を記録または再生させる。ハードディスクは、例えば、オントロジー、テキストコーパス、辞書などの格納手段などに利用され、さらに、データ処理プログラム等、各種コンピュータ・プログラムが格納される。 An HDD (Hard Disk Drive) 711 has a built-in hard disk, drives the hard disk, and records or reproduces programs executed by the CPU 701 and other information. The hard disk is used, for example, as storage for the ontology, the text corpus, and dictionaries, and also stores various computer programs such as data processing programs.

ドライブ７１２は、装着されている磁気ディスク、光ディスク、光磁気ディスク、または半導体メモリ等のリムーバブル記録媒体７２１に記録されているデータまたはプログラムを読み出して、そのデータまたはプログラムを、インタフェース７０７、外部バス７０６、ブリッジ７０５、およびホストバス７０４を介して接続されているＲＡＭ７０３に供給する。 The drive 712 reads data or a program recorded on a mounted removable recording medium 721, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, and supplies the data or program to the RAM 703, which is connected via the interface 707, the external bus 706, the bridge 705, and the host bus 704.

接続ポート７１４は、外部接続機器７２２を接続するポートであり、ＵＳＢ，ＩＥＥＥ１３９４等の接続部を持つ。接続ポート７１４は、インタフェース７０７、および外部バス７０６、ブリッジ７０５、ホストバス７０４等を介してＣＰＵ７０１等に接続されている。通信部７１５は、ネットワークに接続され、例えば外部のデータベース８０１との通信によりデータ検索を実行する。 The connection port 714 is a port for connecting the external connection device 722 and has a connection unit such as USB or IEEE1394. The connection port 714 is connected to the CPU 701 and the like via the interface 707, the external bus 706, the bridge 705, the host bus 704, and the like. The communication unit 715 is connected to a network and executes data search by communicating with an external database 801, for example.

なお、図１０に示す情報処理装置のハードウェア構成例は、ＰＣを適用して構成した装置の一例であり、本発明の辞書編集装置は、図１０に示す構成に限らず、上述した実施例において説明した処理を実行可能な構成であればよい。 The hardware configuration example of the information processing apparatus shown in FIG. 10 is one example of an apparatus built on a PC. The dictionary editing device of the present invention is not limited to the configuration shown in FIG. 10; any configuration capable of executing the processing described in the above embodiment may be used.

以上、特定の実施例を参照しながら、本発明について詳解してきた。しかしながら、本発明の要旨を逸脱しない範囲で当業者が実施例の修正や代用を成し得ることは自明である。すなわち、例示という形態で本発明を開示してきたのであり、限定的に解釈されるべきではない。本発明の要旨を判断するためには、特許請求の範囲の欄を参酌すべきである。 The present invention has been described in detail above with reference to specific embodiments. However, it is obvious that those skilled in the art can make modifications and substitutions of the embodiments without departing from the gist of the present invention. In other words, the present invention has been disclosed in the form of exemplification, and should not be interpreted in a limited manner. In order to determine the gist of the present invention, the claims should be taken into consideration.

また、明細書中において説明した一連の処理はハードウェア、またはソフトウェア、あるいは両者の複合構成によって実行することが可能である。ソフトウェアによる処理を実行する場合は、処理シーケンスを記録したプログラムを、専用のハードウェアに組み込まれたコンピュータ内のメモリにインストールして実行させるか、あるいは、各種処理が実行可能な汎用コンピュータにプログラムをインストールして実行させることが可能である。例えば、プログラムは記録媒体に予め記録しておくことができる。記録媒体からコンピュータにインストールする他、ＬＡＮ（ＬｏｃａｌＡｒｅａＮｅｔｗｏｒｋ）、インターネットといったネットワークを介してプログラムを受信し、内蔵するハードディスク等の記録媒体にインストールすることができる。 The series of processing described in the specification can be executed by hardware, software, or a combined configuration of both. When executing processing by software, the program recording the processing sequence is installed in a memory in a computer incorporated in dedicated hardware and executed, or the program is executed on a general-purpose computer capable of executing various processing. It can be installed and run. For example, the program can be recorded in advance on a recording medium. In addition to being installed on a computer from a recording medium, the program can be received via a network such as a LAN (Local Area Network) or the Internet, and installed on a recording medium such as a built-in hard disk.

なお、明細書に記載された各種の処理は、記載に従って時系列に実行されるのみならず、処理を実行する装置の処理能力あるいは必要に応じて並列的にあるいは個別に実行されてもよい。また、本明細書においてシステムとは、複数の装置の論理的集合構成であり、各構成の装置が同一筐体内にあるものには限らない。 Note that the various processes described in the specification are not only executed in time series according to the description, but may be executed in parallel or individually according to the processing capability of the apparatus that executes the processes or as necessary. Further, in this specification, the system is a logical set configuration of a plurality of devices, and the devices of each configuration are not limited to being in the same casing.

以上、説明したように、本発明の一実施例の構成によれば、存在を表す概念としてのクラス単位で性質情報であるラベルと、属性情報であるクラス間の関係情報を含む辞書データ、例えばオントロジー辞書の更新処理を実行する構成において、更新対象クラスの関連クラスのラベル情報を利用してテキストコーパスの検索条件としての制約条件を決定し、制約条件に従ってテキストコーパスの検索を行ない更新対象クラスのラベル候補を含む関連文を抽出し、さらに制約条件に従って関連文から更新対象クラスのラベル候補となる関連文字列を抽出して抽出した関連文字列を更新対象クラスのラベルとして設定する編集処理を行なう。本構成によりオントロジーのラベルの充実化を効率的に実行することが可能となる。 As described above, the configuration of one embodiment of the present invention executes update processing of dictionary data, for example an ontology dictionary, that includes, for each class as a concept representing existence, a label that is property information and inter-class relationship information that is attribute information. In this configuration, a constraint condition serving as a text corpus search condition is determined using the label information of the classes related to the update target class, the text corpus is searched according to the constraint condition, and related sentences including label candidates of the update target class are extracted. Further, related character strings that are label candidates of the update target class are extracted from the related sentences according to the constraint condition, and editing processing sets the extracted related character strings as labels of the update target class. This configuration makes it possible to enrich ontology labels efficiently.

本発明の一実施例に係る辞書編集装置の構成例について示すブロック図である。 A block diagram showing a configuration example of a dictionary editing device according to an embodiment of the present invention.
本発明の一実施例に係る辞書編集装置において利用するオントロジーの例について説明する図である。 A diagram explaining an example of an ontology used in the dictionary editing device according to an embodiment of the present invention.
本発明の一実施例に係る辞書編集装置において利用するオントロジーの例について説明する図である。 A diagram explaining an example of an ontology used in the dictionary editing device according to an embodiment of the present invention.
本発明の一実施例に係る辞書編集装置において利用するテキストコーパスの例について説明する図である。 A diagram explaining an example of a text corpus used in the dictionary editing device according to an embodiment of the present invention.
本発明の一実施例に係る辞書編集装置において実行する制約条件の抽出処理例について説明する図である。 A diagram explaining an example of the constraint condition extraction processing executed in the dictionary editing device according to an embodiment of the present invention.
本発明の一実施例に係る辞書編集装置において実行する制約条件の抽出処理例について説明する図である。 A diagram explaining an example of the constraint condition extraction processing executed in the dictionary editing device according to an embodiment of the present invention.
本発明の一実施例に係る辞書編集装置において実行する制約条件の抽出処理例について説明する図である。 A diagram explaining an example of the constraint condition extraction processing executed in the dictionary editing device according to an embodiment of the present invention.
本発明の一実施例に係る辞書編集装置において実行する制約条件の抽出処理例について説明する図である。 A diagram explaining an example of the constraint condition extraction processing executed in the dictionary editing device according to an embodiment of the present invention.
本発明の一実施例に係る辞書編集装置において実行する処理シーケンスについて説明するフローチャートを示す図である。 A flowchart explaining the processing sequence executed in the dictionary editing device according to an embodiment of the present invention.
本発明の一実施形態に係る辞書編集装置を構成する情報処理装置のハードウェア構成例について説明する図である。 A diagram explaining a hardware configuration example of an information processing apparatus that constitutes the dictionary editing device according to an embodiment of the present invention.

Explanation of symbols

１００辞書編集装置
１０１クラス選択部
１０２関連クラス抽出部
１０３制約条件抽出部
１０４関連文抽出部
１０５関連文字列抽出部
１０６関連文字列提示部
１０７オントロジー編集処理部
１１１オントロジー記憶部
１１２テキストコーパス記憶部
２０１クラスＡ
２１１〜２１３クラス
２２１〜２２２クラス
３０１クラスＡ
３０２スーパークラス
３０３，３０４シブリングクラス
３１１クラスＡ
３１２スーパークラス
３１３シブリングクラス
３２１クラスＡ
３２２スーパークラス
３２３サブクラス
３３１クラスＡ
３３２，３３３サブクラス
７０１ＣＰＵ(Central Processing Unit)
７０２ＲＯＭ（Read-Only-Memory）
７０３ＲＡＭ（Random Access Memory）
７０４ホストバス
７０５ブリッジ
７０６外部バス
７０７インタフェース
７０８キーボード
７０９ポインティングデバイス
７１０ディスプレイ
７１１ＨＤＤ（Hard Disk Drive）
７１２ドライブ
７１４接続ポート
７１５通信部
７２１リムーバブル記録媒体
７２２外部接続機器
８０１データベース

100 Dictionary editing device
101 Class selection unit
102 Related class extraction unit
103 Constraint condition extraction unit
104 Related sentence extraction unit
105 Related character string extraction unit
106 Related character string presentation unit
107 Ontology editing processing unit
111 Ontology storage unit
112 Text corpus storage unit
201 Class A
211 to 213 Classes
221 to 222 Classes
301 Class A
302 Superclass
303, 304 Sibling classes
311 Class A
312 Superclass
313 Sibling class
321 Class A
322 Superclass
323 Subclass
331 Class A
332, 333 Subclasses
701 CPU (Central Processing Unit)
702 ROM (Read Only Memory)
703 RAM (Random Access Memory)
704 Host bus
705 Bridge
706 External bus
707 Interface
708 Keyboard
709 Pointing device
710 Display
711 HDD (Hard Disk Drive)
712 Drive
714 Connection port
715 Communication unit
721 Removable recording medium
722 Externally connected device
801 Database

Claims

A dictionary editing device that executes update processing of dictionary data in which a concept representing existence is set as a class and which includes data containing class-specific property information and data containing class-specific attribute information, the device comprising:
a related class extraction unit that extracts a related class of the update target class from the dictionary data;
a constraint condition extraction unit that determines a constraint condition as a search condition of a text corpus based on label information corresponding to the related class;
a related sentence extraction unit that searches the text corpus according to the constraint condition and extracts a related sentence including a label candidate of the update target class;
a related character string extraction unit that, in accordance with the constraint condition, extracts a related character string that is a label candidate of the update target class from the related sentence extracted by the related sentence extraction unit; and
an edit processing unit that sets the related character string extracted by the related character string extraction unit as a label of the update target class.

The dictionary editing device according to claim 1, further comprising
a related character string presentation unit that presents the related character string extracted by the related character string extraction unit to an editing operator, wherein
the edit processing unit sets the selected related character string as a label of the update target class in response to input of selection information of the related character string by the editing operator.

The dictionary editing device according to claim 1, wherein
the dictionary data is dictionary data in which a plurality of classes are set as a hierarchical structure,
the related class extraction unit extracts, as a related class of the update target class, at least one of a superclass (parent class), a subclass (child class), and a sibling class, and
the constraint condition extraction unit determines the constraint condition based on label information corresponding to two related classes selected from the superclass (parent class), the subclass (child class), and the sibling class.

The dictionary editing device according to claim 3, wherein
the constraint condition extraction unit determines a character string as a positive constraint and a character string as a negative constraint, and
the related sentence extraction unit extracts from the text corpus text that includes a character string of the same type as the characters determined as the positive constraint and does not include a character string of the same type as the characters determined as the negative constraint.

The dictionary editing device according to claim 4, wherein
the constraint condition extraction unit determines, as a positive constraint, the longest common part of the label information corresponding to two related classes selected from the superclass (parent class), subclass (child class), and sibling class extracted as related classes of the update target class.

The dictionary editing device according to claim 5, wherein
the constraint condition extraction unit determines the final constraint condition by deleting any negative constraint that is contained in a positive constraint extracted by referring to the labels of the related classes of the update target class.

The dictionary editing device according to claim 4, wherein
the related character string extraction unit extracts, from the related sentence extracted by the related sentence extraction unit, a continuous character string including the character string of the positive constraint as a related character string that is a label candidate of the update target class.

The dictionary editing device according to claim 4, wherein
the related character string extraction unit performs morphological analysis of the related sentence extracted by the related sentence extraction unit and extracts a continuous sequence of nouns including the character string of the positive constraint as a related character string that is a label candidate of the update target class.

The dictionary editing apparatus according to claim 1, wherein the dictionary is an ontology dictionary configured by registration information of a class having a hierarchical structure.

A dictionary editing method for executing dictionary data editing processing in an information processing device, wherein
the dictionary data to be edited is dictionary data in which a concept representing existence is set as a class and which includes data containing a label that is property information corresponding to a class and data containing inter-class relationship information that is attribute information corresponding to a class, the method comprising:
a related class extraction step in which a related class extraction unit extracts a related class of the update target class from the dictionary data;
a constraint condition extraction step in which a constraint condition extraction unit determines a constraint condition as a text corpus search condition based on label information corresponding to the related class;
a related sentence extraction step in which a related sentence extraction unit searches the text corpus according to the constraint condition and extracts a related sentence including a label candidate of the update target class;
a related character string extraction step in which a related character string extraction unit extracts, according to the constraint condition, a related character string that is a label candidate of the update target class from the related sentence extracted in the related sentence extraction step; and
an edit processing step in which an edit processing unit sets the related character string extracted in the related character string extraction step as a label of the update target class.

The dictionary editing method according to claim 10, further comprising
a related character string presentation step in which a related character string presentation unit presents the related character string extracted in the related character string extraction step to an editing operator, wherein
in the edit processing step, the selected related character string is set as a label of the update target class in response to input of selection information of the related character string by the editing operator.

The dictionary editing method according to claim 10, wherein
the dictionary data is dictionary data in which a plurality of classes are set as a hierarchical structure,
the related class extraction step extracts, as a related class of the update target class, at least one of a superclass (parent class), a subclass (child class), and a sibling class, and
the constraint condition extraction step determines the constraint condition based on label information corresponding to two related classes selected from the superclass (parent class), the subclass (child class), and the sibling class.

The dictionary editing method according to claim 12, wherein
the constraint condition extraction step determines a character string as a positive constraint and a character string as a negative constraint, and
the related sentence extraction step extracts from the text corpus text that includes a character string of the same type as the characters determined as the positive constraint and does not include a character string of the same type as the characters determined as the negative constraint.

The dictionary editing method according to claim 13, wherein
the constraint condition extraction step determines, as a positive constraint, the longest common part of the label information corresponding to two related classes selected from the superclass (parent class), subclass (child class), and sibling class extracted as related classes of the update target class.

The dictionary editing method according to claim 14, wherein
the constraint condition extraction step determines the final constraint condition by deleting any negative constraint that is contained in a positive constraint extracted by referring to the labels of the related classes of the update target class.

The dictionary editing method according to claim 13, wherein
the related character string extraction step extracts, from the related sentence extracted in the related sentence extraction step, a continuous character string including the character string of the positive constraint as a related character string that is a label candidate of the update target class.

The dictionary editing method according to claim 13, wherein
the related character string extraction step performs morphological analysis of the related sentence extracted in the related sentence extraction step and extracts a continuous sequence of nouns including the character string of the positive constraint as a related character string that is a label candidate of the update target class.

The dictionary editing method according to claim 10, wherein the dictionary is an ontology dictionary configured by registration information of a class having a hierarchical structure.

A computer program for causing an information processing apparatus to execute dictionary data editing processing, wherein
the dictionary data to be edited is dictionary data in which a concept representing existence is set as a class and which includes data containing a label that is property information corresponding to a class and data containing inter-class relationship information that is attribute information corresponding to a class, the program causing execution of:
a related class extraction step of causing a related class extraction unit to extract a related class of the update target class from the dictionary data;
a constraint condition extraction step of causing a constraint condition extraction unit to determine a constraint condition as a text corpus search condition based on label information corresponding to the related class;
a related sentence extraction step of causing a related sentence extraction unit to search the text corpus according to the constraint condition and extract a related sentence including a label candidate of the update target class;
a related character string extraction step of causing a related character string extraction unit to extract, in accordance with the constraint condition, a related character string that is a label candidate of the update target class from the related sentence extracted in the related sentence extraction step; and
an edit processing step of causing an edit processing unit to set the related character string extracted in the related character string extraction step as a label of the update target class.