JP2005157823A

JP2005157823A - Knowledge base system, inter-word meaning relation determination method in the same system and computer program

Info

Publication number: JP2005157823A
Application number: JP2003396561A
Authority: JP
Inventors: Chikara Kurosawa; 主税黒沢; Shigeki Hino; 滋樹日野; Koji Ito; 浩二伊藤; Yukihisa Nishizawa; 幸久西澤
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2003-11-27
Filing date: 2003-11-27
Publication date: 2005-06-16

Abstract

PROBLEM TO BE SOLVED: To uniquely specify a path between concepts, and to reduce the workload of attaching data that represents semantic structure to an electronic document.

SOLUTION: A document classification processing unit 2 performs morphological analysis on an electronic document fetched through a document input interface 11, looks up the position in a concept dictionary of each morpheme obtained by the analysis, selects two of the morphemes, and extracts the paths of concept relation lines between them. When multiple paths are found, the appropriate path is selected according to weighting factors assigned according to the strength of the connections between concepts. Concept relation data following the selected path is then generated and output externally through a document output interface 12.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to a knowledge base system that takes in an electronic document, classifies it based on the meaning and concepts it carries, and generates output that matches it semantically and conceptually; to a method for determining semantic relations between words in that system; and to a computer program therefor.

Many methods have been proposed for inferring the semantic content of an electronic document using a concept dictionary (also called an ontology dictionary), which describes the relations between the concepts of words as a tree or network structure. For example, to extract information flexibly from documents in diverse fields according to the field of the document and the purpose and requirements of the user, one proposed method uses the semantic relations between vocabulary items in an ontology dictionary to supplement relations that do not appear in the document, expands the document into word relation sequences for a central word that directly expresses the user's interest, and converts each word relation sequence into relations consisting of attribute names and attribute values, based on partial word relations and on how words are used in the article as a whole (see Patent Document 1).

To provide users with diverse services efficiently, another proposed method, when providing content in response to a natural-language inquiry from a user, conceptualizes knowledge about the content using an ontology dictionary and interactively assists the user's inquiry in order to provide the content (see Patent Document 2).

Furthermore, to obtain a vocabulary acquisition method that can acquire parts of speech and lexical concepts even for words not registered in a dictionary, a method has been proposed that performs morphological analysis of input text by referring to a basic word dictionary and a term dictionary, then refers to a term pattern dictionary to match the words appearing in the morphological analysis result against the words listed in the term pattern dictionary and extracts a term when they match, and then refers to an ontology dictionary so that a term is also extracted when a word appearing in the morphological analysis result is a subordinate concept of the concept of a word listed in the term pattern dictionary (see Patent Document 3). Here, a "morpheme" is the smallest character string that would lose its meaning if divided further, and "morphological analysis" means decomposing a sentence into morphemes.

Patent Document 1: Japanese Unexamined Patent Publication No. 2000-207407 (paragraphs 0007 to 0011, FIG. 1)
Patent Document 2: Japanese Unexamined Patent Publication No. 2002-197107 (paragraphs 0008 and 0009, FIG. 1)
Patent Document 3: Japanese Unexamined Patent Publication No. 2001-67356 (paragraphs 0007 to 0019, FIG. 1)

Most of the patent documents cited above infer the semantic content of a document using a tree-structured concept dictionary that describes superordinate and subordinate relations between nouns (so-called has-a and is-a relations). As the concept relations of a conventional tree-structured concept dictionary in FIG. 7 show, the path connecting two concepts A and B (the thick line in the figure) can be determined uniquely without any special logical technique.

However, a conceptual system is generally not a simple tree but a network, as shown in FIG. 8. That is, in addition to superordinate and subordinate links, there are links such as equivalence, opposition, object relations, and causal relations, whose correspondences differ from the directionality and inclusion of superordinate and subordinate links. In FIG. 8, concept C has two superordinate concepts, E and F, and concept D has two superordinate concepts, F and G; in addition, concepts C and D are linked to each other by a relation, such as equivalence, that is neither superordinate nor subordinate.

The difference between a tree structure and a network structure lies in whether the superordinate concept of a concept is uniquely determined or whether several may exist. In a network-structured concept dictionary, one concept may have multiple superordinate concepts or other concept relations, so there may be multiple paths of concept relation lines connecting a given pair of concepts. In the example of FIG. 9, the paths between concept H and concept D include those shown by the solid line and by the dash-dot line, and other paths are also possible. Here, a connection between concepts, including superordinate and subordinate ones, is called a concept relation, and a line representing such a connection is called a concept relation line. In this situation, none of the patent documents cited above can uniquely specify the path between the two concepts.

Meanwhile, one way to express the semantic content of an electronic document is a description in metadata, that is, information that explains the content of text, images, audio, and other information (content). Conventionally, attaching metadata to electronic documents has been left to large-scale manual labor, which places a very heavy burden on the workers (document registrants), and improvement through automation has been desired.

The present invention has been made in view of the above circumstances. Its object is to provide a knowledge base system, a method for determining semantic relations between words in that system, and a computer program therefor that use a network-structured concept dictionary covering the concept relations of linguistics to automatically determine the semantic structure of an electronic document in a manner closer to natural language, thereby making it possible to uniquely specify the path between concepts, reducing the workload of attaching data representing semantic structure to an electronic document, and allowing electronic documents to be used effectively without being buried because of missed search hits. Here, "semantic structure" means a structure that represents the semantic connections between morphemes obtained by semantic analysis (described later).

To solve the above problems, the present invention is a knowledge base system comprising an arithmetic unit that takes in an electronic document, classifies it based on the meaning and concepts it carries, and generates output that matches it semantically and conceptually, the system comprising: a document input interface unit that takes in the electronic document; a document classification processing unit, built on the arithmetic unit, that performs morphological analysis of the electronic document and, when reference to a concept dictionary finds multiple relations for the concept relation between morphemes obtained by the morphological analysis, generates data representing the concept relation between the morphemes according to weighting factors assigned according to the strength of the concept relations; and a document output interface unit that outputs the data representing the concept relation between the morphemes.

In the present invention, the document classification processing unit comprises: a morphological analysis unit that decomposes the electronic document into morphemes by morphological analysis; a morpheme entry position analysis unit that looks up the position of each morpheme in the concept dictionary; a concept relation analysis unit that selects any two of the morphemes, extracts the paths of concept relation lines between the two, and, when multiple paths are found, selects a path according to weighting factors assigned according to the strength of the connections between concepts; and a concept relation data generation unit that generates concept relation data following the selected path.

In the present invention, the concept relation analysis unit takes the reciprocal of each weighting factor as the distance of the corresponding concept relation line, sums the distances along each path connecting the two target concepts, and extracts the path with the smallest sum as the path of concept relation lines.
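The selection rule in the preceding paragraph can be written compactly as follows. The notation is an assumption introduced here for illustration and does not appear in the publication: w(e) is the weighting factor of concept relation line e, d(e) is its distance, and P(a, b) is the set of paths found between the two target concepts a and b.

```latex
d(e) = \frac{1}{w(e)}, \qquad
p^{*} = \operatorname*{arg\,min}_{p \in P(a,b)} \; \sum_{e \in p} d(e)
```

Because the distance is the reciprocal of the weight, a strongly connected pair of concepts contributes a short distance, so the path with the smallest sum favors strong connections.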

The present invention further comprises a syntactic analysis unit that performs syntactic analysis, which analyzes the grammatical structure of a sentence, to select the combinations of morphemes for which concept relations should be extracted, and supplies those combinations to the concept relation analysis unit.

To solve the above problems, the present invention is also a method for determining semantic relations between words in a knowledge base system that takes in an electronic document, classifies it based on the meaning and concepts it carries, and generates output that matches it semantically and conceptually, the method comprising the steps of: performing morphological analysis of the electronic document; looking up the position in a concept dictionary of each morpheme obtained by the morphological analysis; selecting any two of the morphemes, extracting the paths of concept relation lines between the two, and, when multiple paths are found, selecting a path according to weighting factors assigned according to the strength of the connections between concepts; and generating concept relation data following the selected path.

To solve the above problems, the present invention is also a computer program for use in a knowledge base system that takes in an electronic document, classifies it based on the meaning and concepts it carries, and generates search results that match it semantically and conceptually, the program causing a computer to execute: a process of performing morphological analysis of the electronic document; a process of looking up the position in a concept dictionary of each morpheme obtained by the morphological analysis; a process of selecting any two of the morphemes, extracting the paths of concept relation lines between the two, and, when multiple paths are found, selecting a path according to weighting factors assigned according to the strength of the connections between concepts; and a process of generating concept relation data following the selected path and storing it in a storage device.

According to the present invention, the document classification processing unit performs morphological analysis of an electronic document and, when reference to the concept dictionary finds multiple relations for the concept relation between morphemes obtained by that analysis, generates data representing the concept relation between the morphemes according to weighting factors assigned according to the strength of the relations between concepts. The semantic structure of the electronic document can thus be determined automatically in a manner closer to natural language, which makes it possible to uniquely specify the path between concepts and reduces the workload of attaching data representing semantic structure to an electronic document. A knowledge base system can therefore be provided in which electronic documents are used effectively without being buried because of missed search hits.

According to the present invention, the concept relation analysis unit of the document classification processing unit extracts the paths of concept relation lines between the two selected morphemes and, when multiple paths are found, selects a path according to weighting factors assigned according to the strength of the connections between concepts, for example on the basis of co-occurrence relations. The concept relation data generation unit then generates concept relation data following the selected path, which reduces the workload of attaching data representing semantic structure to an electronic document. Here, "co-occurrence" means that one word and another word appear in the same document, and a "co-occurrence relation" is the degree of closeness between one word and another as determined by the frequency of that co-occurrence.
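As a minimal sketch of how co-occurrence frequencies of the kind described above might be counted, the following Python fragment counts, for each unordered word pair, the number of documents that contain both words. The function name `cooccurrence_counts` and the sample documents are hypothetical, and the publication does not specify how frequencies are turned into weighting factors.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each unordered word pair, how many documents contain both words."""
    counts = Counter()
    for words in documents:
        # Use a sorted set so each pair is counted once per document
        # and always appears in the same (alphabetical) order.
        for pair in combinations(sorted(set(words)), 2):
            counts[pair] += 1
    return counts


# Hypothetical sample data: each document is a list of already-extracted words.
docs = [
    ["council", "assembly", "shimbashi"],
    ["council", "assembly"],
    ["assembly", "shimbashi"],
]
counts = cooccurrence_counts(docs)
```

One possible design choice, which is an assumption here, would be to use these raw counts directly as weighting factors, so that frequently co-occurring words are treated as strongly connected.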

According to the present invention, the concept relation analysis unit obtains each distance as the reciprocal of the weighting factor, sums the distances along each path connecting the two target concepts, and extracts the path with the smallest sum as the path of concept relation lines. The semantic structure of the electronic document can thus be determined automatically in a manner closer to natural language, and data representing semantic structure can be attached to an electronic document easily.

According to the present invention, selecting by syntactic analysis the combinations of morphemes for which concept relations should be extracted makes it possible to select only morphemes that are in a dependency relation with each other, which reduces the processing load on the computer for finding concept relations between morphemes.

FIG. 1 is a block diagram showing an embodiment of the knowledge base system of the present invention. The knowledge base system comprises a document classification processing unit 2, built on an arithmetic unit, that performs morphological analysis of an electronic document and, when reference to a concept dictionary finds multiple relations for the concept relation between morphemes obtained by that analysis, generates data representing the concept relation between the morphemes according to weighting factors assigned according to the strength of the relations between concepts. The system is connected to external systems via a document input interface unit 11, which takes in electronic documents, or a document output interface unit 12, which outputs the data representing the concept relations between morphemes.

The document classification processing unit 2 consists of a morphological analysis unit 21, a morpheme entry position analysis unit 22, a syntactic analysis unit 23, a concept relation analysis unit 24, a concept relation data generation unit 25, a morpheme dictionary 26, a concept dictionary 27, and a concept relation data storage DB (database) 28.

The morphological analysis unit 21 performs morphological analysis, with reference to the morpheme dictionary 26, on the electronic document taken in via the document input interface unit 11, decomposes the document into morphemes, and supplies them to the morpheme entry position analysis unit 22.

The morpheme entry position analysis unit 22 looks up the position in the concept dictionary 27 of each morpheme supplied from the morphological analysis unit 21 and supplies the result to the syntactic analysis unit 23 or, when the syntactic analysis unit 23 is omitted, to the concept relation analysis unit 24.

The syntactic analysis unit 23 performs well-known syntactic analysis to select the combinations of morphemes for which concept relations should be extracted and supplies those combinations to the concept relation analysis unit 24. It is not an essential component here and may be omitted.

The concept relation analysis unit 24 selects any two of the morphemes analyzed by the morphological analysis unit 21 and extracts the paths of concept relation lines between the two. When multiple paths are found, it selects a path according to weighting factors assigned according to the strength of the connections between concepts and supplies it to the concept relation data generation unit 25. To do so, the concept relation analysis unit 24 takes the reciprocal of each weighting factor as the distance of the corresponding concept relation line, sums the distances along each path connecting the two target concepts, and extracts the path with the smallest sum as the path of concept relation lines.

The concept relation data generation unit 25 generates concept relation data following the selected path of concept relation lines and stores it in the concept relation data storage DB 28.

FIGS. 2 and 3 are flowcharts referred to in explaining the operation of the knowledge base system of the present invention; they also show the processing procedure of the computer program of the present invention. FIGS. 4 to 6 are conceptual diagrams referred to as an aid to understanding that operation. They show, respectively, an example of syntactic analysis (FIG. 4), an image of the connections between concepts in the concept dictionary 27 (FIG. 5), and an example of path selection as a result of concept relation analysis (FIG. 6). The operation of the knowledge base system shown in FIG. 1 is described in detail below with reference to FIGS. 2 to 6.

As a precondition, the concept dictionary 27 used in the present invention holds the concept relations of natural-language linguistics, and each concept relation carries a weighting factor corresponding to the strength of the relation, for example a weight based on linguistic co-occurrence relations.

First, an electronic document to which semantic data is to be attached is taken in from another system via the document input interface unit 11 (S201). The morphological analysis unit 21 then performs well-known morphological analysis on the document with reference to the morpheme dictionary 26, decomposing it into morphemes (the smallest units of word formation) and identifying the properties of each morpheme, and supplies the result to the morpheme entry position analysis unit 22 (S202). The electronic document is assumed to consist of one or more sentences.

The morpheme entry position analysis unit 22 selects one of those sentences and looks up the positions in the concept dictionary 27 of the morphemes contained in it (S203).

At this point, the syntactic analysis unit 23 may be used so that only morphemes in a dependency relation with each other are selected. As is well known, syntactic analysis uses a grammar to judge whether a morphologically analyzed sentence is a correct sentence and, if it is, yields a network structure as the parse result; semantic analysis is then performed to select the correct path from that structure. Here, "semantic analysis" means extracting the semantic and conceptual system from a parsed sentence and analyzing the conceptual composition of the sentence.

FIG. 4 shows an example in which syntactic analysis is performed to select the combinations of morphemes for which concept relations should be extracted. When the sentence to be parsed is 「協議会の総会が新橋で開かれた」 ("The general assembly of the council was held in Shimbashi"), the combinations of morphemes that require semantic analysis are "council" and "general assembly", "general assembly" and "was held", and "Shimbashi" and "was held". Combinations such as "council" and "Shimbashi" or "general assembly" and "Shimbashi" have no syntactic connection, so omitting semantic analysis for them does not affect the overall result. Performing syntactic analysis at this point therefore makes it possible to select the combinations of morphemes to be analyzed semantically, which reduces the processing needed to find concept relations between morphemes.
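The pair selection in the FIG. 4 example can be sketched as follows. This is only an illustration under stated assumptions: the `dependencies` list stands in for the output of a syntactic analyzer, which the publication does not specify, and the English glosses are used in place of the Japanese morphemes.

```python
from itertools import combinations

# Morphemes of the example sentence, given as English glosses.
morphemes = ["council", "general assembly", "Shimbashi", "was held"]

# Hypothetical parser output: (modifier, head) pairs in a dependency relation.
dependencies = [
    ("council", "general assembly"),
    ("general assembly", "was held"),
    ("Shimbashi", "was held"),
]

# Without syntactic analysis, every unordered pair would be analyzed.
all_pairs = list(combinations(morphemes, 2))

# With syntactic analysis, only pairs in a dependency relation are kept.
selected = {frozenset(pair) for pair in dependencies}
skipped = [pair for pair in all_pairs if frozenset(pair) not in selected]
```

For this four-morpheme sentence, the number of pairs passed to concept relation analysis drops from six to three.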

Next, the concept relation analysis unit 24 selects any two of the morphemes analyzed by the morphological analysis unit 21 (S204) and, from the positions of the two selected morphemes in the concept dictionary 27, extracts the paths of concept relation lines between them (S205).

FIG. 5 shows an image of the connections between concepts in the concept dictionary 27. The terms shown alongside the concept relation lines (superordinate/subordinate, nominative, object) indicate the type of concept relation, and the numbers indicate the weighting factor of the concept relation. As described above, each connection between concepts in the concept dictionary 27, that is, each concept relation line, is given a weighting factor corresponding to the strength of the relation. One concrete option is weighting based on co-occurrence relations, where a co-occurrence relation means the relation between words that appear together in one sentence; the co-occurrence frequency may be used directly as the weight of the concept relation line. Weights assigned on the basis of linguistic concept relations other than co-occurrence may also be used, for example deep case relations, which are the semantic relations that a verb in a sentence has with noun concepts.
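One way to hold typed, weighted concept relation lines like those in FIG. 5 is sketched below. The class name `ConceptRelation`, the field names, and the sample concepts and weights are all hypothetical; the publication does not describe the internal data format of the concept dictionary 27.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptRelation:
    """One concept relation line: two concepts, a relation type, and a weight."""
    source: str
    target: str
    kind: str      # e.g. "superordinate-subordinate", "nominative", "object"
    weight: float  # larger means a stronger connection


def build_adjacency(relations):
    """Index relations by concept so the lines touching a concept can be listed."""
    adjacency = {}
    for relation in relations:
        # Store each line under both endpoints so it can be followed either way.
        adjacency.setdefault(relation.source, []).append((relation.target, relation))
        adjacency.setdefault(relation.target, []).append((relation.source, relation))
    return adjacency


# Hypothetical sample entries, not taken from FIG. 5.
relations = [
    ConceptRelation("assembly", "meeting", "superordinate-subordinate", 3.0),
    ConceptRelation("assembly", "hold", "object", 2.0),
]
adjacency = build_adjacency(relations)
```

Storing each line under both endpoints is an assumption that paths may traverse a relation in either direction, which is consistent with the network structure described above but is not stated explicitly.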

It is then determined whether more than one path was extracted (S206). If so (Yes in S206), the concept relation analysis unit 24 calculates the "distance" of each path by the following procedure and selects the path with the smallest distance.

Here, the reciprocal of the weighting factor is taken as the "distance" of a concept relation line, the distances are summed along each path connecting the two target concepts, and the path with the smallest sum is selected (S207). If more than one path has the smallest sum, all of them may be selected, or any one of them may be selected.

FIG. 6 shows an example of path selection. Here, the co-occurrence weighting factor is 2 between concepts A and C, C and D, and D and B, and 1 between concepts A and E and between E and B. The total distance of path A-C-D-B, that is, the sum of the reciprocals of its weighting factors, is therefore 1/2 + 1/2 + 1/2 = 3/2, and the total distance of path A-E-B is 1/1 + 1/1 = 2. Since the former is smaller than the latter, path A-C-D-B is selected.
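The FIG. 6 calculation can be reproduced with a short script. This is a sketch under stated assumptions rather than the method of the publication: it enumerates all simple paths by exhaustive search, which suits a small example, and the names `WEIGHTS`, `simple_paths`, and `distance` are hypothetical.

```python
from fractions import Fraction

# Weighting factors of the concept relation lines in the FIG. 6 example.
WEIGHTS = {
    ("A", "C"): 2, ("C", "D"): 2, ("D", "B"): 2,
    ("A", "E"): 1, ("E", "B"): 1,
}


def neighbors(node):
    """Yield concepts joined to `node` by a concept relation line (undirected)."""
    for u, v in WEIGHTS:
        if u == node:
            yield v
        elif v == node:
            yield u


def weight(u, v):
    """Look up the weight of the line between u and v in either direction."""
    return WEIGHTS[(u, v)] if (u, v) in WEIGHTS else WEIGHTS[(v, u)]


def simple_paths(start, goal, path=None):
    """Enumerate every path from start to goal that visits no concept twice."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for nxt in neighbors(start):
        if nxt not in path:
            yield from simple_paths(nxt, goal, path)


def distance(path):
    """Sum of reciprocal weights along the path."""
    return sum(Fraction(1, weight(u, v)) for u, v in zip(path, path[1:]))


paths = list(simple_paths("A", "B"))
best = min(paths, key=distance)
print(best, distance(best))  # ['A', 'C', 'D', 'B'] 3/2
```

Note that `min` returns only the first minimal path; the publication allows either all tied paths or any one of them to be selected. For a large concept dictionary, a shortest-path algorithm such as Dijkstra's would be a natural substitute for exhaustive enumeration, since all distances are positive.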

When only a single path exists (No in S206), the processing of S207 described above is skipped and the processing of S208 is executed. In S208, the conceptual relationship data generation unit 25 generates data representing the conceptual relationship for the morphemes and stores it in the conceptual relationship data storage DB 28. A concrete example of the data format is RDF (Resource Description Framework), a description format standardized by the W3C (World Wide Web Consortium). The format is, of course, not limited to RDF.
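The description names RDF as one possible format but does not give a vocabulary or serialization. As one possible sketch, the following Python code writes each hop of a selected path as an N-Triples statement. The `example.org` namespaces, the relation name `cooccurs`, and the function name `to_ntriples` are all hypothetical.

```python
BASE = "http://example.org/concept/"    # hypothetical concept namespace
REL = "http://example.org/relation/"    # hypothetical relation namespace

def to_ntriples(path, relation_types):
    """Serialize each hop of a concept path as one N-Triples statement."""
    lines = []
    hops = zip(path, path[1:])
    for (subj, obj), rel in zip(hops, relation_types):
        lines.append(f"<{BASE}{subj}> <{REL}{rel}> <{BASE}{obj}> .")
    return "\n".join(lines)

rdf_text = to_ntriples(["A", "C", "D", "B"], ["cooccurs"] * 3)
print(rdf_text)
```

Each output line is a subject, predicate, object triple terminated by a period, so the result can be loaded by any RDF tool that accepts N-Triples.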

The series of processes (S205 to S208) performed by the conceptual relationship analysis unit 24 is repeated for every combination of morphemes in the sentence (or every combination of morphemes in a dependency relationship) (S209). All conceptual relationship data extracted for the sentence are then combined into one (S210), and the series of processes (S203 to S210) is repeated for every sentence in the electronic document (S211).
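The nested repetition described here can be sketched as two loops. In this Python sketch, `analyze_pair` is a hypothetical callable standing in for steps S205 to S208, and the convention that it returns `None` when no relation is found is an assumption; the input is assumed to be sentences already split into morphemes.

```python
from itertools import combinations

def analyze_document(sentences, analyze_pair):
    """Apply analyze_pair to every morpheme pair of every sentence."""
    document_data = []
    for morphemes in sentences:                   # once per sentence (S211)
        sentence_data = []
        for a, b in combinations(morphemes, 2):   # once per pair (S209)
            record = analyze_pair(a, b)           # stands in for S205 to S208
            if record is not None:
                sentence_data.append(record)
        document_data.append(sentence_data)       # gathered per sentence (S210)
    return document_data

# Stub analysis that simply records the pair it was given.
result = analyze_document([["x", "y", "z"], ["p", "q"]], lambda a, b: (a, b))
print(result)  # [[('x', 'y'), ('x', 'z'), ('y', 'z')], [('p', 'q')]]
```

Restricting the inner loop to pairs in a dependency relationship, as the text permits, would reduce the number of pairs from quadratic in the sentence length to roughly linear.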

After all analysis processing is complete, the conceptual relationship analysis unit 24 activates the conceptual relationship data generation unit 25, which generates data representing the conceptual relationships for the morphemes and, as necessary, outputs the data to an external system (not shown) via the document output interface 12 (S212).

As described above, the present invention makes it possible to build a knowledge base system that supports sharing, distribution, and retrieval of information based not only on the surface text of an electronic document but also on its semantic content. In addition, because the semantic content can be determined automatically from the electronic document and attached as semantic data, the burden on the operator who registers documents is greatly reduced.

Furthermore, using a network-structured concept dictionary that covers the conceptual relationships of linguistics absorbs the variation in wording between document authors and makes it possible to attach semantically consistent semantic data. This consistency in turn enables improved search performance and automatic document classification.

The knowledge base system of the present invention can also be realized by recording, on a computer-readable recording medium, the procedures executed by the document classification processing unit 2 shown in FIG. 1 and by each of its constituent units (the morphological analysis unit 21, the morpheme placement position analysis unit 22, the syntax analysis unit 23, the conceptual relationship analysis unit 24, and the conceptual relationship data generation unit 25), and by having a computer system read and execute the program recorded on that medium. The computer system referred to here includes an OS (Operating System) and hardware such as peripheral devices.

FIG. 1 is a block diagram showing an embodiment of the knowledge base system of the present invention.
FIG. 2 is a flowchart referred to in explaining the operation of the knowledge base system of the present invention.
FIG. 3 is a flowchart referred to in explaining the operation of the knowledge base system of the present invention.
FIG. 4 is a conceptual diagram of operation referred to in order to aid understanding of the operation of the knowledge base system of the present invention.
FIG. 5 is a conceptual diagram of operation referred to in order to aid understanding of the operation of the knowledge base system of the present invention.
FIG. 6 is a conceptual diagram of operation referred to in order to aid understanding of the operation of the knowledge base system of the present invention.
FIG. 7 is a diagram referred to in explaining conceptual relationships in a conventional tree-structured concept dictionary.
FIG. 8 is a diagram referred to in explaining conceptual relationships in a conventional network-structured concept dictionary.
FIG. 9 is a diagram referred to in explaining paths of conceptual relationship lines in a conventional network-structured concept dictionary.

Explanation of symbols

11 Document input interface unit
12 Document output interface unit
2 Document classification processing unit
21 Morphological analysis unit
22 Morpheme placement position analysis unit
23 Syntax analysis unit
24 Conceptual relationship analysis unit
25 Conceptual relationship data generation unit
26 Morpheme dictionary
27 Concept dictionary
28 Conceptual relationship data storage DB

Claims

A knowledge base system including an arithmetic device that takes in an electronic document, classifies it based on the meaning and concepts of the electronic document, and generates an output that matches the meaning and concepts, the knowledge base system comprising:
a document input interface unit that takes in the electronic document;
a document classification processing unit, constructed in the arithmetic device, that performs morphological analysis of the electronic document, refers to a concept dictionary stored in a storage device, and, when a plurality of relationships are confirmed, generates data representing the conceptual relationship between the morphemes according to weighting factors assigned according to the strength of the conceptual relationships; and
a document output interface unit that outputs the data representing the conceptual relationship between the morphemes.

The knowledge base system according to claim 1, wherein the document classification processing unit comprises:
a morphological analysis unit that decomposes the electronic document into morphemes by morphological analysis;
a morpheme placement position analysis unit that searches for the placement position of each morpheme in the concept dictionary;
a conceptual relationship analysis unit that selects any two of the morphemes, extracts the paths of conceptual relationship lines between the two morphemes, and, when a plurality of extracted paths are confirmed, extracts a path of the conceptual relationship line according to the weighting factors assigned according to the strength of the connections between concepts; and
a conceptual relationship data generation unit that generates conceptual relationship data according to the path of the conceptual relationship line.

The knowledge base system according to claim 2, wherein the conceptual relationship analysis unit takes the reciprocal of the weighting factor as the distance of a conceptual relationship line, sums the distances along each path connecting the two target concepts, and extracts the path with the minimum sum as the path of the conceptual relationship line.

The knowledge base system according to claim 2 or 3, further comprising a syntax analysis unit that selects, by syntactic analysis, the combinations of morphemes for which conceptual relationships are to be extracted and supplies them to the conceptual relationship analysis unit or the morpheme placement position analysis unit.

A method for determining semantic relationships between words in the knowledge base system according to claim 1, which takes in an electronic document, classifies it based on the meaning and concepts of the electronic document, and generates an output that matches the meaning and concepts, the method comprising:
a step of performing morphological analysis of the electronic document;
a step of searching for the placement position, in the concept dictionary, of each morpheme obtained by the morphological analysis;
a step of selecting any two of the morphemes, extracting the paths of conceptual relationship lines between the two morphemes, and, when a plurality of extracted paths are confirmed, extracting a path of the conceptual relationship line according to the weighting factors assigned according to the strength of the connections between concepts; and
a step of generating semantic relationship data according to the path of the conceptual relationship line.

A computer program used in the knowledge base system according to claim 1, which takes in an electronic document, classifies it based on the meaning and concepts of the electronic document, and generates an output that matches the meaning and concepts, the computer program causing a computer to execute:
a process of performing morphological analysis of the electronic document;
a process of searching for the placement position, in the concept dictionary, of each morpheme obtained by the morphological analysis;
a process of selecting any two of the morphemes, extracting the paths of conceptual relationship lines between the two morphemes, and, when a plurality of extracted paths are confirmed, extracting a path of the conceptual relationship line according to the weighting factors assigned according to the strength of the connections between concepts; and
a process of generating conceptual relationship data according to the path of the conceptual relationship line.