JP4892868B2 - Inter-set relationship determination program and inter-set relationship determination device

Info

Publication number: JP4892868B2
Application number: JP2005148584A
Authority: JP
Inventors: 忠星合
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2005-05-20
Filing date: 2005-05-20
Publication date: 2012-03-07
Anticipated expiration: 2025-05-20
Also published as: JP2006323785A

Description

The present invention relates to an inter-set relationship determination program and an inter-set relationship determination device that determine the relationship between sets whose elements are numerical data, and in particular to a program and device that can appropriately discover such relationships.

In recent years, when companies or organizations merge or are integrated, the question of how to integrate the information they hold and make it interoperable has become increasingly important for the proper management and operation of intellectual assets.

However, when the classification systems that sort information into categories differ, it is difficult to find pairs of categories containing similar information so that the information can be integrated. Specifically, because of differences in category names, category granularity, or the way the classification hierarchy is organized, simply comparing category names cannot appropriately identify the corresponding categories across classification systems.

FIG. 11 is a diagram explaining how categories differ between classification systems. As shown in FIG. 11, in classification system 1 the category "Sports" has the subcategories "American football" and "football". In classification system 2, the category "Sports" has the subcategories "Soccer" and "Football".

Here, the word "Football" can be used broadly to mean either "Soccer" or "American football". Consequently, the category "Football" in classification system 1 may in fact mean "Soccer", while the category "Football" in classification system 2 may in fact mean "American football". In such cases, categories cannot be matched appropriately by simply comparing their names.

For this reason, a technique has been disclosed that matches categories not by simply comparing category names but by extracting text data from the categories of the two classification systems, generating feature vectors from that text, and comparing the vectors (see, for example, Patent Document 1).

Japanese Unexamined Patent Application Publication No. 2005-63332 (JP 2005-63332 A)

However, the conventional techniques typified by the above patent document match categories by applying morphological analysis, feature-word extraction, and similar processing to text data. When the data belonging to a category is numerical, making the match is difficult.

When the data belonging to a category is numerical, it is difficult to characterize the category by its individual values, and examining the individual values alone rarely reveals the correspondence between categories.

Therefore, when there are multiple categories each consisting of a set of numerical data, how to appropriately discover the relationships among them has become an important issue for the proper management and operation of intellectual assets.

The present invention has been made to solve the above problems of the prior art, and its object is to provide an inter-set relationship determination program and an inter-set relationship determination device that can appropriately discover the relationship between sets whose elements are numerical data.

To solve the above problems and achieve the object, the present invention is an inter-set relationship determination program that determines the relationship between sets whose elements are numerical data. The program causes a computer to execute a statistic calculation procedure that calculates statistics on the distribution of the numerical data, a statistic storage procedure that stores the statistics calculated by the statistic calculation procedure, and a relationship determination procedure that determines the relationship between the sets based on the statistics stored by the statistic storage procedure.

Further, in the above invention, the relationship determination procedure determines whether any one of an equivalence relation, an equal-value relation, a sibling relation, or a superordinate-subordinate hierarchical relation exists between the sets.

Further, in the above invention, each set is a set associated with an attribute of a given concept.

Further, in the above invention, the program causes the computer to additionally execute a unit adjustment procedure that adjusts the units of the numerical data and, when the units are incompatible, determines that there is no relationship between the sets. When the units are compatible, the statistic calculation procedure calculates statistics on the distribution of the numerical data whose units have been adjusted by the unit adjustment procedure.

Further, in the above invention, the relationship determination procedure determines the relationship between sets by comparing the separation between the distributions of the sets' numerical data with a predetermined threshold.

Further, in the above invention, when the two distributions of the sets' numerical data do not overlap, the relationship determination procedure determines the relationship between the sets by checking whether the difference in the number of digits between the maximum value of the numerical data in the lower-valued distribution and the minimum value of the numerical data in the higher-valued distribution is greater than three digits.
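The digit-count check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and counting the digits of the integer part is an assumed reading of "number of digits", which the text does not define precisely.

```python
def digit_count(value: float) -> int:
    """Digits in the integer part of |value| (assumed definition of 'number of digits')."""
    return len(str(int(abs(value))))


def may_be_related(max_of_lower: float, min_of_upper: float, limit: int = 3) -> bool:
    """Return False when the digit gap between two non-overlapping ranges exceeds `limit`.

    max_of_lower: largest value in the lower-valued distribution.
    min_of_upper: smallest value in the higher-valued distribution.
    """
    gap = digit_count(min_of_upper) - digit_count(max_of_lower)
    return gap <= limit
```

With these assumptions, a lower range ending at 700 and an upper range starting at 400,000 differ by three digits and remain candidates, while 9 versus 50,000 differ by four digits and are treated as unrelated.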

Further, in the above invention, the relationship determination procedure determines the relationship between sets by testing the equality of the mean values of the numerical data, based on the statistics stored by the statistic storage procedure.

Further, in the above invention, the relationship determination procedure sets the significance level of the test of equality of the mean values of the numerical data within the range of 0.01 to 0.3.
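As an illustration of testing equality of means from stored statistics alone, the sketch below uses a large-sample two-sided z-test. The choice of test is an assumption made for this example, since this passage does not name a specific test; the function name and default significance level are likewise hypothetical. Only the allowed range of 0.01 to 0.3 comes from the text.

```python
from math import sqrt
from statistics import NormalDist


def means_look_equal(n_a, mean_a, var_a, n_b, mean_b, var_b, alpha=0.05):
    """Two-sided z-test on summary statistics (assumed test, not taken from the source).

    Returns True when the hypothesis of equal means is NOT rejected at level alpha.
    """
    if not 0.01 <= alpha <= 0.3:
        raise ValueError("significance level must lie in [0.01, 0.3]")
    standard_error = sqrt(var_a / n_a + var_b / n_b)
    if standard_error == 0.0:
        # Degenerate case: both distributions have zero variance.
        return mean_a == mean_b
    z = (mean_a - mean_b) / standard_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return p_value >= alpha
```

Because the function takes only the count, mean, and variance of each set, it can run against the stored statistics without re-reading the raw numerical data.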

Further, in the above invention, the relationship determination procedure determines the relationship between sets by testing the equality of the variances of the numerical data, based on the statistics stored by the statistic storage procedure.

Further, in the above invention, the relationship determination procedure sets the significance level of the test of equality of the variances of the numerical data within the range of 0.01 to 0.3.

The present invention is also an inter-set relationship determination method that determines the relationship between sets whose elements are numerical data. The method includes a statistic calculation step that calculates statistics on the distribution of the numerical data, a statistic storage step that stores the statistics calculated in the statistic calculation step, and a relationship determination step that determines the relationship between the sets based on the statistics stored in the statistic storage step.

According to the present invention, statistics on the distribution of the numerical data are calculated and stored, and the relationship between sets is determined based on the stored statistics. The relationship between sets whose elements are numerical data can therefore be appropriately discovered from the statistics of their distributions.

Further, according to the present invention, it is determined whether any one of an equivalence relation, an equal-value relation, a sibling relation, or a superordinate-subordinate hierarchical relation exists between the sets, so the relationship between sets can be analyzed in detail.

Further, according to the present invention, each set is a set associated with an attribute of a given concept. When a class attribute in an ontology or the like consists of a set of numerical data, the relationship between attributes can therefore be appropriately discovered.

Further, according to the present invention, the units of the numerical data are adjusted; when the units are incompatible, it is determined that there is no relationship between the sets, and when the units are compatible, statistics are calculated on the distribution of the unit-adjusted numerical data. Checking unit compatibility therefore makes it possible to appropriately detect cases in which the sets are unrelated.

Further, according to the present invention, the relationship between sets is determined by comparing the separation between the distributions of the sets' numerical data with a predetermined threshold. Examining the separation between distributions therefore makes it possible to appropriately detect cases in which the sets are unrelated.

Further, according to the present invention, when the two distributions of the sets' numerical data do not overlap, the relationship between the sets is determined by checking whether the difference in the number of digits between the maximum value in the lower-valued distribution and the minimum value in the higher-valued distribution is greater than three digits. Because the great majority of related sets differ by three digits or fewer, the relationship between sets can be appropriately discovered.

Further, according to the present invention, the relationship between sets is determined by testing the equality of the mean values of the numerical data based on the statistics, so whether an equivalence relation or an equal-value relation exists between the sets can be appropriately discovered.

Further, according to the present invention, the significance level of the test of equality of the mean values is set within the range of 0.01 to 0.3. Setting the significance level with the error of this test in mind allows the relationship between sets to be appropriately discovered.

Further, according to the present invention, the relationship between sets is determined by testing the equality of the variances of the numerical data based on the statistics, so whether an equivalence relation, an equal-value relation, or a sibling relation exists between the sets can be appropriately discovered.

Further, according to the present invention, the significance level of the test of equality of the variances is set within the range of 0.01 to 0.3. Setting the significance level with the error of this test in mind allows the relationship between sets to be appropriately discovered.

Preferred embodiments of the inter-set relationship determination program and the inter-set relationship determination device according to the present invention are described in detail below with reference to the accompanying drawings.

First, the concept of the inter-set relationship determination processing according to the present invention is described. FIG. 1 is a diagram explaining this concept. FIG. 1 shows a case in which the relationship between categories in two information systems, A and B, is determined.

Here, an information system is a classification system made up of categories that have a superordinate-subordinate hierarchical relationship. A category, as treated here, is a concept representing a set whose elements are numerical data.

As shown in FIG. 1, the numerical data belonging to a category includes information such as a numerical value, a multiplier, and a unit. In the example of FIG. 1, the numerical data of the category "赤" (red) in information system A includes the numerical value "X" and the unit "Hz" (hertz), while the numerical data of the category "Red" in information system B includes the numerical value "Y", the unit multiplier "k" (kilo), and the unit "Hz".

In this embodiment, statistical features are extracted from the probability distributions of the numerical data belonging to categories of different information systems, and these features are compared to determine whether an equivalence relation, an equal-value relation, a sibling relation, or a hierarchical relation exists between the categories.

Here, an equivalence relation is a relation in which the categories are identical. An equal-value relation is a relation in which the categories are not identical but are synonymous. A sibling relation is a relation in which the categories share the same superordinate concept. A hierarchical relation is a relation between a superordinate concept and a subordinate concept. The equivalence relation and the equal-value relation are together also called a similarity relation.

When statistical features are extracted from the probability distribution of numerical data, the units of the numerical data are first aligned. FIG. 2 is a diagram showing an example of probability distributions of numerical data that have units.

In the example of FIG. 2, the unit "kHz" of information system B contains the multiplier "k", so the unit "Hz" of information system A and the unit "kHz" of information system B differ. In this embodiment, therefore, the units are aligned to a predetermined reference unit, "Hz", before the statistical features of the probability distributions are compared.

In this way, relationships between categories and the like that consist of numerical data can be detected automatically. When companies or organizations merge or are integrated, this makes it easier to integrate the information they hold and reduces the cost of the work.

Although the information system here is a classification system that sorts information into categories, it may instead be an ontology system that defines classes having a superordinate-subordinate hierarchical relationship together with the attributes of those classes, or a database system made up of tables and the fields they contain.

In an ontology system, the category described above corresponds to a class consisting of a set of numerical data or to an attribute consisting of a set of numerical data. In a database system, the category corresponds to a field consisting of a set of numerical data.

FIG. 3 is a diagram showing an example of comparing the relationship between class attributes. In FIG. 3, the "child" class and the "boy" class each have an age attribute, and numerical data is attached to each age attribute.

Statistical features are then extracted from the probability distributions of the numerical data and compared to determine whether an equivalence relation, an equal-value relation, or a sibling relation exists between the "age" attributes.

In addition, when an equivalence relation or an equal-value relation is found between the "age" attributes and a containment relation is found between other attributes, it can be determined that the classes to which the attributes belong have a superordinate-subordinate hierarchical relationship.

FIG. 4 is a diagram explaining the processing that determines a superordinate-subordinate hierarchical relationship. As shown in FIG. 4, the "child" class has a gender attribute indicating "male" or "female", and the "boy" class has the gender attribute "male".

A containment relation exists between these gender distributions, but this alone is not enough to determine what relationship holds between the classes to which the attributes belong. The age attributes of the "child" class and the "boy" class are therefore compared.

Specifically, statistical features are extracted from the probability distributions of the numerical data attached to the age attributes and compared to determine whether an equivalence relation or an equal-value relation exists between the age attributes. When such a relation exists between the age attributes and a containment relation exists between other attributes, it is determined that the classes to which those attributes belong have a superordinate-subordinate hierarchical relationship.
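The decision rule in the preceding paragraph can be sketched as below. This is only an illustration under stated assumptions: the function name and return strings are hypothetical, the result of the age-attribute comparison is passed in as a boolean, and "containment" is modeled as a proper-subset test on the observed values of the other attribute.

```python
def classify_class_relation(age_similar: bool, values_a: set, values_b: set) -> str:
    """Infer a class-level hierarchy from attribute-level findings.

    age_similar: True when the numeric (age) attributes were judged to be in an
                 equivalence or equal-value relation by a separate statistical step.
    values_a, values_b: observed values of another attribute (e.g. gender) in
                 class A and class B. Proper-subset containment is an assumption.
    """
    if not age_similar:
        return "undetermined"
    if values_b < values_a:
        return "A is superordinate to B"
    if values_a < values_b:
        return "B is superordinate to A"
    return "undetermined"
```

For the example of FIG. 4, calling the function with the gender values {"male", "female"} for "child" and {"male"} for "boy" places "child" above "boy", provided the age attributes were judged similar.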

Next, the functional configuration of the inter-set relationship determination device according to this embodiment is described. FIG. 5 is a diagram explaining this functional configuration.

As shown in FIG. 5, the inter-set relationship determination device has an input unit 10, a display unit 11, a storage unit 12, a unit conversion unit 13, a statistic calculation unit 14, a range separation analysis unit 15, a mean equality analysis unit 16, a variance equality analysis unit 17, an inter-category relationship determination unit 18, and a control unit 19.

The input unit 10 is an input device such as a keyboard or mouse. The display unit 11 is a display device such as a monitor. The storage unit 12 is a storage device such as a hard disk drive.

The storage unit 12 stores unit conversion data 12a, first information system data 12b, second information system data 12c, first probability distribution data 12d, and second probability distribution data 12e.

The unit conversion data 12a stores, for the numerical data of each category, information such as the reference unit, other units that may be used for the numerical data, and unit multipliers. The unit conversion data 12a is used to align units to the reference unit when the numerical data of different categories use different units.

The first information system data 12b and the second information system data 12c are the data of the two different information systems to which the categories being compared belong. Each contains the hierarchical relationships of the categories in its information system, information on the numerical data belonging to the categories, and so on.

The first probability distribution data 12d and the second probability distribution data 12e are the statistics of the probability distributions obtained from the numerical data belonging to the categories of the two different information systems.

The unit conversion unit 13 converts units to the reference unit when the numerical data of the categories use different units.

Here, a unit is the yardstick of numerical data. Only when a unit is attached to a measurable physical quantity (height, weight, monetary amount, speed, the capacity of a milk bottle, and so on) is the "world of words" connected to the "real world" (for example, a height of 162 cm or an amount of 2 million yen).

A unit is therefore an essential attribute of numerical data. In analytical terms, the unit corresponds to a coordinate axis and the numerical value corresponds to a coordinate on that axis. The real values of the numerical data are distributed along the axis corresponding to the unit, which makes it possible to compare their magnitudes quantitatively.

Consequently, comparing the numerical data of two categories requires a common coordinate axis, that is, a common unit. Numerical data that do not share a coordinate axis cannot be compared.

However, even when the unit names attached to the numerical data differ, the data can be compared if the units measure the same attribute. For example, length has several units, such as "m" (meter), "km" (kilometer), and "cm" (centimeter), and the data become comparable once the difference in conversion factors (unit multipliers) such as "k" (kilo) and "c" (centi) is taken into account.

In this embodiment, therefore, the unit conversion unit 13 converts the unit of each numerical category to a reference unit. The reference unit is defined in advance, for example "m" for the "length" attribute and "g" for the "weight" attribute.

Specifically, the unit conversion unit 13 reads the first information system data 12b and the second information system data 12c from the storage unit 12 and obtains the category name, the unit name and unit multiplier attached to the numerical data in the category, and the real values of all numerical data belonging to the category.

For example, when the units of the two numerical categories are "m" and "cm", the unit conversion unit 13 uses "m" as it is, because it is the reference unit, and converts "cm" to the reference unit "m" by multiplying the numerical data in "cm" by 0.01, the unit multiplier of "c".

Letting $U$ be the unit name, $M$ the unit multiplier, $x$ the real value, and $U^{*}$ the reference unit name, the above processing can be expressed in general as

$x \mid U \;\Rightarrow\; x \cdot M \mid U^{*}$

Here, let the names of the two categories be "A" and "B", their unit names be $U_A$ and $U_B$, and their real values be $x_1, \ldots, x_m$ and $y_1, \ldots, y_n$, respectively. The normalized real values of the two categories are then $x_1 \cdot M_A, \ldots, x_m \cdot M_A$ and $y_1 \cdot M_B, \ldots, y_n \cdot M_B$, and the unit names become $U_A^{*}$ and $U_B^{*}$.
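The conversion x | U ⇒ x·M | U* can be sketched as a table lookup followed by a multiplication. The table contents, the dictionary layout, and the function names below are assumptions made for illustration; they stand in for the unit conversion data 12a, whose actual format is not given here.

```python
# Hypothetical stand-in for the unit conversion data 12a:
# unit name -> (attribute it measures, multiplier M to the reference unit).
UNIT_TABLE = {
    "m": ("length", 1.0),
    "cm": ("length", 0.01),
    "km": ("length", 1000.0),
    "Hz": ("frequency", 1.0),
    "kHz": ("frequency", 1000.0),
}
# Reference unit U* per attribute (assumed values).
REFERENCE_UNIT = {"length": "m", "frequency": "Hz"}


def to_reference(values, unit):
    """Apply x | U => x*M | U* to every value of one category."""
    attribute, multiplier = UNIT_TABLE[unit]
    return [x * multiplier for x in values], REFERENCE_UNIT[attribute]


def normalize_pair(values_a, unit_a, values_b, unit_b):
    """Normalize two categories, or return None when their units are incompatible."""
    if UNIT_TABLE[unit_a][0] != UNIT_TABLE[unit_b][0]:
        return None  # different attributes: treated as unrelated
    return to_reference(values_a, unit_a), to_reference(values_b, unit_b)
```

Returning None for units that measure different attributes mirrors the rule that sets with incompatible units are judged to have no relationship, so no statistics need to be computed for them.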

The statistic calculation unit 14 calculates the statistics of the probability distributions of the numerical data of the two categories. Specifically, using the following equations (1) to (6), it calculates statistics such as the data counts $n_A$ and $n_B$, the mean values $\mu_A$ and $\mu_B$, and the variances $\sigma_A^2$ and $\sigma_B^2$ of the probability distributions of the numerical data of the two categories "A" and "B".

Here, the known data counts $m$ and $n$ are used directly as the data counts $n_A$ and $n_B$. The statistic calculation unit 14 then stores the statistics of the two probability distributions in the storage unit 12 as the first probability distribution data 12d and the second probability distribution data 12e.
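A minimal sketch of computing the count, mean, and variance for one category follows. The use of the population variance is an assumption, since equations (1) to (6) are not reproduced in this text, and the function name and the dictionary used as a stand-in for the stored probability distribution data are hypothetical.

```python
from statistics import fmean, pvariance


def distribution_statistics(values):
    """Return count, mean, and variance for one category's normalized values.

    Population variance (divide by n) is assumed; a sample variance
    (divide by n - 1) would use statistics.variance instead.
    """
    mean = fmean(values)
    return {
        "n": len(values),
        "mean": mean,
        "variance": pvariance(values, mu=mean),
    }
```

The returned dictionary plays the role of one stored record, so later comparison steps can work from these three numbers without revisiting the raw values.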

The range separation analysis unit 15 analyzes the separation between the value ranges of the two probability distributions. For example, if categories "A" and "B" are in an equivalence relation or an equal-value relation, the value ranges of their probability distributions should overlap to a considerable extent, as with the probability distributions of the categories "red" and "Red" shown in FIG. 2.

したがって、カテゴリ間に同値関係あるいは等値関係がある場合には、確率分布の値域が重なることが必要条件となる。 Therefore, when there is an equivalence relationship or an equivalence relationship between categories, it is a necessary condition that the range of probability distributions overlap.

また、２つのカテゴリ間に兄弟関係がある場合には、２つの確率分布の値域は近い位置にあるか、あるいは、（わずかではあるが）重なっていることが考えられる。そのため、２つの確率分布が兄弟関係にあるためには、それぞれの値域が（わずかだが）重なっているか、あるいは、近くに分布することが必要条件となる。 In addition, when there is a sibling relationship between two categories, it is conceivable that the two probability distribution ranges are close to each other or overlap (although slightly). Therefore, in order for the two probability distributions to have a sibling relationship, it is necessary that the respective value ranges overlap (although slightly) or be distributed close to each other.

図６は、兄弟関係にあるカテゴリの確率分布の一例を示す図である。図６の例では、「赤」という周波数のカテゴリと「青」という周波数のカテゴリとが「周波数」という共通概念を親概念とした兄弟関係にある場合を示している。このように、「赤」という周波数のカテゴリおよび「青」という周波数のカテゴリの確率分布の値域は近くに分布する。 FIG. 6 is a diagram illustrating an example of a probability distribution of categories in a sibling relationship. The example of FIG. 6 shows a case where the frequency category “red” and the frequency category “blue” are in a sibling relationship with the common concept “frequency” as a parent concept. As described above, the value ranges of the probability distributions of the frequency category “red” and the frequency category “blue” are distributed close to each other.

このような値域の重なりを調べるために、値域隔たり分析部１５は、カテゴリＡの確率分布Ｘ（Ａ）の上限値（最大値）ｘ₊および下限値（最小値）ｘ_-、カテゴリＢの確率分布Ｘ（Ｂ）の上限値ｙ₊および下限値ｙ_-の情報を取得して、以下の条件が成り立つか否かを調べる。 To check for such an overlap of value ranges, the value range gap analysis unit 15 obtains the upper limit (maximum value) x₊ and the lower limit (minimum value) x₋ of the probability distribution X(A) of category A, and the upper limit y₊ and the lower limit y₋ of the probability distribution X(B) of category B, and checks whether the following condition holds.

（ｘ_-≦ｙ_-＜ｘ₊）∨（ｘ_-＜ｙ₊≦ｘ₊） (x₋ ≦ y₋ < x₊) ∨ (x₋ < y₊ ≦ x₊)
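The overlap condition above can be transcribed directly. The following Python sketch is a literal reading of that condition; the function name `ranges_overlap` and its parameter names are hypothetical.

```python
def ranges_overlap(x_min, x_max, y_min, y_max):
    """Literal transcription of (x- <= y- < x+) or (x- < y+ <= x+).

    x_min/x_max are the lower/upper limits of category A's range,
    y_min/y_max those of category B's range.
    """
    return (x_min <= y_min < x_max) or (x_min < y_max <= x_max)


# Category B starts inside category A's range.
partial = ranges_overlap(1.0, 5.0, 3.0, 8.0)
# Category B lies entirely above category A's range.
disjoint = ranges_overlap(1.0, 5.0, 6.0, 8.0)
# Category B's range strictly contains category A's range.
containing = ranges_overlap(2.0, 3.0, 1.0, 4.0)
```

Read literally, the condition is not satisfied when the range of category B strictly contains the range of category A (the `containing` case), because neither endpoint of B falls inside A. An implementation that wants to treat that case as an overlap could, as one possible design choice, also evaluate the condition with the two categories swapped.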

また、値域隔たり分析部１５は、２つの確率分布の値域が近いか否かを以下の条件式（７）〜（１８）により判定する。なお、以下の式において、Ｘは、カテゴリ「Ａ」の値域であり、Ｙは、カテゴリ「Ｂ」の値域である。 In addition, the value range analysis unit 15 determines whether or not the value ranges of the two probability distributions are close according to the following conditional expressions (7) to (18). In the following expression, X is a range of category “A”, and Y is a range of category “B”.

ここで、Ｏ₁は、値域が近いか否かを判定するための閾値である。式（７）〜（１８）に示すように、ログスケールの上限値および下限値の比がＯ₁よりも小さければ、２つの確率分布の値域は近いと判定され、それぞれの確率分布に対応するカテゴリには兄弟関係がある可能性があると判定される。 Here, O₁ is a threshold for determining whether the value ranges are close. As shown in formulas (7) to (18), if the ratio of the upper and lower limits on a log scale is smaller than O₁, the value ranges of the two probability distributions are determined to be close, and the categories corresponding to the respective probability distributions are determined to possibly have a sibling relationship.

逆に、ログスケールの上限値および下限値の比がＯ₁よりも大きければ、２つの確率分布の値域は遠いと判定され、それぞれの確率分布に対応するカテゴリには同値関係や等値関係あるいは兄弟関係が無いものと判定される。 Conversely, if the ratio of the upper and lower limits on a log scale is larger than O₁, the value ranges of the two probability distributions are determined to be far apart, and the categories corresponding to the respective probability distributions are determined to have no equivalence relationship, equal-value relationship, or sibling relationship.

また、Ｏ₁の値は３以下に設定される。これは、大多数の兄弟関係にあるカテゴリの数値データは、３桁程度の数値の桁数の違いに収まるためである。 The value of O₁ is set to 3 or less. This is because the numerical data of the great majority of categories in a sibling relationship differ by no more than about three orders of magnitude.
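Formulas (7) to (18) are not reproduced in this text, so the closeness test can only be sketched under assumptions. The Python sketch below follows the description in Supplementary note 6: for two non-overlapping ranges, it compares the maximum of the lower range with the minimum of the upper range in orders of magnitude. It assumes strictly positive values; how the original formulas treat zero or negative values is not known from this text. The names `range_gap_log10` and `ranges_are_near` are hypothetical.

```python
import math


def range_gap_log10(x_min, x_max, y_min, y_max):
    """Gap between two non-overlapping positive ranges, in orders of magnitude."""
    if min(x_min, y_min) <= 0:
        raise ValueError("this sketch only handles strictly positive values")
    # Order the two ranges so that the first one is the range lying lower.
    (_, low_max), (high_min, _) = sorted([(x_min, x_max), (y_min, y_max)])
    return math.log10(high_min) - math.log10(low_max)


def ranges_are_near(x_min, x_max, y_min, y_max, o1=3.0):
    """True if the gap is within the threshold O1 (3 or less per the text)."""
    return abs(range_gap_log10(x_min, x_max, y_min, y_max)) <= o1


gap = range_gap_log10(1.0, 10.0, 1000.0, 5000.0)      # two orders of magnitude
near = ranges_are_near(1.0, 10.0, 1000.0, 5000.0)
far = ranges_are_near(1.0, 10.0, 1.0e6, 1.0e7)        # five orders of magnitude
```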

平均値同一性分析部１６は、２つの確率分布の平均値が同一性を有するか否かを検定する処理をおこなう。具体的には、平均値同一性分析部１６は、ウェルチの検定により平均値の同一性の検定をおこなう。ウェルチの検定は、確率分布の等分散性があらかじめ仮定できない場合に用いられる検定方法である。 The mean value identity analysis unit 16 tests whether the mean values of the two probability distributions are identical. Specifically, the mean value identity analysis unit 16 tests the identity of the mean values using Welch's test. Welch's test is a test method used when equality of the variances of the probability distributions cannot be assumed in advance.

まず、平均値同一性分析部１６は、仮説の設定をおこなう。すなわち、平均値同一性分析部１６は、有意水準をａ％に、帰無仮説Ｈ０をμ_A＝μ_Bに、対立仮説Ｈ１をμ_A≠μ_Bに設定する。そして、平均値同一性分析部１６は、以下の式（１９）および（２０）によりｔ値ｔ₀および自由度νを算出する。 First, the average value identity analysis unit 16 sets a hypothesis. That is, the average value identity analysis unit 16 sets the significance level to a%, the null hypothesis H0 to μ _A = μ _B , and the alternative hypothesis H1 to μ _A ≠ μ _B. Then, the average value identity analysis unit 16 calculates the t value t ₀ and the degree of freedom ν by the following equations (19) and (20).

そして、平均値同一性分析部１６は、ｔ分布関数の両側ａ％点の値ｔ_1-a/2（ν）を算出する。図７は、ｔ分布関数の両側ａ％点の値ｔ_1-a/2（ν）について説明する図である。図７に示すように、ここでは、有意水準がａ％であるので、上側ａ／２％の点の値、すなわち、ｔ^* _a（ν）＝ｔ_1-a/2（ν）を算出する。 Then, the mean value identity analysis unit 16 calculates the value t_{1−a/2}(ν) of the two-sided a% point of the t distribution function. FIG. 7 is a diagram explaining the value t_{1−a/2}(ν) of the two-sided a% point of the t distribution function. As shown in FIG. 7, since the significance level here is a%, the value of the upper a/2% point, that is, t^*_a(ν) = t_{1−a/2}(ν), is calculated.

その後、平均値同一性分析部１６は、ｔ値の絶対値｜ｔ₀｜とｔ_1-a/2（ν）との間の大小関係を調べ、帰無仮説Ｈ０に有意差があるか否かを調べる。 Thereafter, the mean value identity analysis unit 16 compares the absolute value |t₀| of the t value with t_{1−a/2}(ν) and examines whether there is a significant difference with respect to the null hypothesis H0.

具体的には、平均値同一性分析部１６は、｜ｔ₀｜≧ｔ_1-a/2（ν）である場合には、有意差があると判定し、帰無仮説Ｈ０を棄却して対立仮説Ｈ１を採用する。｜ｔ₀｜＜ｔ_1-a/2（ν）である場合には、平均値同一性分析部１６は、有意差がないと判定し、帰無仮説Ｈ０を採用して対立仮説Ｈ１を棄却する。 Specifically, when |t₀| ≧ t_{1−a/2}(ν), the mean value identity analysis unit 16 determines that there is a significant difference, rejects the null hypothesis H0, and adopts the alternative hypothesis H1. When |t₀| < t_{1−a/2}(ν), the mean value identity analysis unit 16 determines that there is no significant difference, adopts the null hypothesis H0, and rejects the alternative hypothesis H1.

このようにして、帰無仮説Ｈ０が採択されれば、２つの確率分布の平均値の同一性は否定されず、それぞれの確率分布に対応する２つのカテゴリ「Ａ」および「Ｂ」間に同値関係あるいは等値関係がある可能性が生じる。 In this way, if the null hypothesis H0 is adopted, the identity of the mean values of the two probability distributions is not denied, and it becomes possible that an equivalence relationship or an equal-value relationship exists between the two categories “A” and “B” corresponding to the respective probability distributions.

対立仮説Ｈ１が採択されれば、２つの確率分布の平均値の同一性は否定され、それぞれの確率分布に対応する２つのカテゴリ「Ａ」および「Ｂ」間に同値関係あるいは等値関係がある可能性も否定される。 If the alternative hypothesis H1 is adopted, the identity of the mean values of the two probability distributions is denied, and the possibility that an equivalence relationship or an equal-value relationship exists between the two categories “A” and “B” corresponding to the respective probability distributions is also denied.
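Formulas (19) and (20) are not reproduced in this text. The Python sketch below uses the standard textbook form of Welch's t statistic and the Welch-Satterthwaite degrees of freedom, on the assumption that this is what the formulas express. The function names are hypothetical, and the critical value t_{1−a/2}(ν) is passed in by the caller (for example from a t table or a statistics library) rather than computed here.

```python
import math


def welch_t(mean_a, var_a, n_a, mean_b, var_b, n_b):
    """Return (t0, nu): Welch's t statistic and its approximate degrees of freedom."""
    se_a = var_a / n_a  # squared standard error of the mean of A
    se_b = var_b / n_b  # squared standard error of the mean of B
    t0 = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    nu = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t0, nu


def means_differ(t0, critical_value):
    """Reject H0 (mu_A = mu_B) when |t0| >= t_{1-a/2}(nu)."""
    return abs(t0) >= critical_value


t0, nu = welch_t(mean_a=10.0, var_a=4.0, n_a=16, mean_b=12.0, var_b=9.0, n_b=9)
# 2.18 is roughly the two-sided 5% point of the t distribution for about 12
# degrees of freedom; it is an illustrative value supplied by the caller.
rejected = means_differ(t0, critical_value=2.18)
```

The sketch does not guard against both variances being zero, which would make the denominator zero.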

分散同一性分析部１７は、２つの確率分布の分散が同一性を有するか否かを検定する処理をおこなう。具体的には、分散同一性分析部１７は、Ｆ検定により分散の同一性の検定をおこなう。 The variance identity analysis unit 17 performs processing to test whether or not the variances of the two probability distributions are identical. Specifically, the variance identity analysis unit 17 tests the identity of variance by F test.

まず、分散同一性分析部１７は、仮説の設定をおこなう。すなわち、分散同一性分析部１７は、有意水準をｂ％に、帰無仮説Ｈ０をσ_A ²＝σ_B ²に、対立仮説Ｈ１をσ_A ²≠σ_B ²に設定する。 First, the variance identity analysis unit 17 sets a hypothesis. That is, the variance identity analysis unit 17 sets the significance level to b%, the null hypothesis H0 to σ _A ² = σ _B ² , and the alternative hypothesis H1 to σ _A ² ≠ σ _B ² .

そして、分散同一性分析部１７は、統計量算出部１４により算出された分散σ_A ²およびσ_B ²の情報を記憶部１２に記憶された第１確率分布データおよび第２確率分布データから読み出し、そのうち大きい方をＶ₁に、小さい方をＶ₂に設定する。 Then, the variance identity analysis unit 17 reads the information of the variances σ _A ² and σ _B ² calculated by the statistic calculation unit 14 from the first probability distribution data and the second probability distribution data stored in the storage unit 12. The larger one is set to V ₁ and the smaller one is set to V ₂ .

また、分散同一性分析部１７は、第１確率分布データおよび第２確率分布データからデータ数ｎ_Aおよびｎ_Bの情報を読み出し、標本数ｎ₁およびｎ₂として設定する。そして、分散同一性分析部１７は、以下の式（２１）および（２２）により自由度ν₁，ν₂および分散比Ｆ₀（＞１）を算出する。 In addition, the variance identity analysis unit 17 reads information on the data numbers n _A and n _B from the first probability distribution data and the second probability distribution data, and sets them as the sample numbers n ₁ and n ₂ . Then, the variance identity analysis unit 17 calculates the degrees of freedom ν ₁ and ν ₂ and the variance ratio F ₀ (> 1) by the following equations (21) and (22).

そして、分散同一性分析部１７は、Ｆ分布関数の上側確率ｂ／２％点の値Ｆ_1-b/2（ν₁，ν₂）を算出する。図８は、Ｆ分布関数の上側確率ｂ／２％点の値Ｆ_1-b/2（ν₁，ν₂）について説明する図である。図８に示すように、ここでは、有意水準がｂ％であるので、上側ｂ／２％の点の値、すなわち、Ｆ^* _b（ν₁，ν₂）＝Ｆ_1-b/2（ν₁，ν₂）を算出する。 Then, the variance identity analysis unit 17 calculates the value F_{1−b/2}(ν₁, ν₂) of the upper-probability b/2% point of the F distribution function. FIG. 8 is a diagram explaining the value F_{1−b/2}(ν₁, ν₂) of the upper-probability b/2% point of the F distribution function. As shown in FIG. 8, since the significance level here is b%, the value of the upper b/2% point, that is, F^*_b(ν₁, ν₂) = F_{1−b/2}(ν₁, ν₂), is calculated.

その後、分散同一性分析部１７は、分散比Ｆ₀とＦ_1-b/2（ν₁，ν₂）との間の大小関係を調べ、帰無仮説Ｈ０に有意差があるか否かを調べる。 Thereafter, the variance identity analysis unit 17 compares the variance ratio F₀ with F_{1−b/2}(ν₁, ν₂) and examines whether there is a significant difference with respect to the null hypothesis H0.

具体的には、分散同一性分析部１７は、Ｆ₀≧Ｆ_1-b/2（ν₁，ν₂）である場合には、有意差があると判定し、帰無仮説Ｈ０を棄却して対立仮説Ｈ１を採用する。Ｆ₀＜Ｆ_1-b/2（ν₁，ν₂）である場合には、分散同一性分析部１７は、有意差がないと判定し、帰無仮説Ｈ０を採用して対立仮説Ｈ１を棄却する。 Specifically, when F₀ ≧ F_{1−b/2}(ν₁, ν₂), the variance identity analysis unit 17 determines that there is a significant difference, rejects the null hypothesis H0, and adopts the alternative hypothesis H1. When F₀ < F_{1−b/2}(ν₁, ν₂), the variance identity analysis unit 17 determines that there is no significant difference, adopts the null hypothesis H0, and rejects the alternative hypothesis H1.

このようにして、帰無仮説Ｈ０が採択されれば、２つの確率分布の分散の同一性は否定されない。一方、対立仮説Ｈ１が採択されれば、２つの確率分布の分散の同一性は否定される。 Thus, if the null hypothesis H0 is adopted, the identity of the variance of the two probability distributions cannot be denied. On the other hand, if the alternative hypothesis H1 is adopted, the identity of the variance of the two probability distributions is denied.
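Formulas (21) and (22) are not reproduced in this text. The Python sketch below uses the usual form of the F test, F₀ = V₁ / V₂ with ν₁ = n₁ − 1 and ν₂ = n₂ − 1, and assumes that n₁ is the sample count of the distribution with the larger variance V₁. The function names are hypothetical, and the critical value F_{1−b/2}(ν₁, ν₂) is supplied by the caller.

```python
def variance_ratio(var_a, n_a, var_b, n_b):
    """Return (f0, nu1, nu2) with the larger variance in the numerator."""
    if var_a >= var_b:
        v1, n1, v2, n2 = var_a, n_a, var_b, n_b
    else:
        v1, n1, v2, n2 = var_b, n_b, var_a, n_a
    # f0 >= 1 by construction; a zero smaller variance is not handled here.
    return v1 / v2, n1 - 1, n2 - 1


def variances_differ(f0, critical_value):
    """Reject H0 (sigma_A^2 = sigma_B^2) when F0 >= F_{1-b/2}(nu1, nu2)."""
    return f0 >= critical_value


f0, nu1, nu2 = variance_ratio(var_a=4.0, n_a=16, var_b=9.0, n_b=9)
```

Here category “B” has the larger variance, so its variance becomes V₁ and its sample count determines ν₁.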

カテゴリ間関連性判定部１８は、単位換算部１３、値域隔たり分析部１５、平均値同一性分析部１６、分散同一性分析部１７により分析された結果に基づいて、２つの数値データのカテゴリ間に同値関係、等値関係、兄弟関係あるいは上下階層関係があるか否かを判定する処理をおこなう。 Based on the results of the analyses by the unit conversion unit 13, the value range gap analysis unit 15, the mean value identity analysis unit 16, and the variance identity analysis unit 17, the inter-category relationship determination unit 18 determines whether an equivalence relationship, an equal-value relationship, a sibling relationship, or an upper/lower hierarchical relationship exists between the two categories of numerical data.

具体的には、カテゴリ間関連性判定部１８は、単位換算部１３により数値データの単位が基準単位に換算された場合に、２つのカテゴリ「Ａ」および「Ｂ」の基準単位名Ｕ_A ^*およびＵ_B ^*が等しいか否かを判定し、基準単位名Ｕ_A ^*およびＵ_B ^*が等しくない場合には、２つのカテゴリ間に関係は無いものと判定し、その結果を出力する。 Specifically, when the unit conversion unit 13 has converted the units of the numerical data into reference units, the inter-category relationship determination unit 18 determines whether the reference unit names U_A^* and U_B^* of the two categories “A” and “B” are equal. If the reference unit names U_A^* and U_B^* are not equal, it determines that there is no relationship between the two categories and outputs the result.

また、カテゴリ間関連性判定部１８は、２つのカテゴリの基準単位名が一致し、数値データの２つの確率分布の値域が重なっており、かつ、平均値および分散が同一であると判定された場合には、２つのカテゴリ間に同値関係があると判定し、その結果を出力する。 In addition, the inter-category relevance determination unit 18 determines that the reference unit names of the two categories match, the two probability distribution ranges of the numerical data overlap, and the average value and variance are the same. In this case, it is determined that there is an equivalence relationship between the two categories, and the result is output.

また、カテゴリ間関連性判定部１８は、２つのカテゴリの基準単位名が一致し、数値データの２つの確率分布の値域が重なっており、分散の同一性は否定されたが平均値が同一であると判定された場合には、２つのカテゴリ間に等値関係があると判定し、その結果を出力する。 In addition, when the reference unit names of the two categories match, the value ranges of the two probability distributions of the numerical data overlap, and the identity of the variances has been denied but the mean values have been determined to be identical, the inter-category relationship determination unit 18 determines that there is an equal-value relationship between the two categories and outputs the result.

また、カテゴリ間関連性判定部１８は、２つのカテゴリの基準単位名が一致し、数値データの２つの確率分布の値域が重なっており、平均値の同一性は否定されたが分散が同一であると判定された場合には、２つのカテゴリ間に兄弟関係があると判定し、その結果を出力する。 In addition, the inter-category relevance determination unit 18 matches the reference unit names of the two categories, the two probability distribution ranges of the numerical data are overlapped, and the mean value identity is denied but the variance is the same. If it is determined that there is a sibling relationship between the two categories, the result is output.

また、カテゴリ間関連性判定部１８は、２つのカテゴリの基準単位名が一致し、数値データの２つの確率分布の値域が重なってはいないが式（７）〜（１８）により値域が近い関係にあると判定され、かつ、分散が同一であると判定された場合には、２つのカテゴリ間に兄弟関係があると判定し、その結果を出力する。 Further, the inter-category relevance determination unit 18 matches the reference unit names of the two categories and does not overlap the ranges of the two probability distributions of the numerical data, but is close to the range of values according to the equations (7) to (18). If it is determined that the distribution is the same, it is determined that there is a sibling relationship between the two categories, and the result is output.

それ以外の場合には、カテゴリ間関連性判定部１８は、２つのカテゴリ間に関係は無いものと判定し、その結果を出力する。 In other cases, the inter-category relationship determination unit 18 determines that there is no relationship between the two categories, and outputs the result.
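The decision rules described above for the inter-category relationship determination unit 18 can be summarized as a small decision function. The Python sketch below is one way to encode them; the enum and function names are hypothetical, and the inputs are assumed to be the outcomes of the unit comparison, the value range analysis, and the two identity tests.

```python
from enum import Enum


class RangeRelation(Enum):
    OVERLAP = "overlap"  # value ranges overlap
    NEAR = "near"        # no overlap, but the gap is within O1
    FAR = "far"          # no overlap, and the gap exceeds O1


class Relation(Enum):
    EQUIVALENCE = "equivalence"  # same mean and same variance
    EQUAL_VALUE = "equal-value"  # same mean, different variance
    SIBLING = "sibling"
    NONE = "none"


def judge(units_match, range_relation, means_same, variances_same):
    """Apply the rules stated in the text to the analysis results."""
    if not units_match:
        return Relation.NONE
    if range_relation is RangeRelation.OVERLAP:
        if means_same:
            return Relation.EQUIVALENCE if variances_same else Relation.EQUAL_VALUE
        return Relation.SIBLING if variances_same else Relation.NONE
    if range_relation is RangeRelation.NEAR and variances_same:
        return Relation.SIBLING
    return Relation.NONE


result = judge(True, RangeRelation.OVERLAP, means_same=True, variances_same=False)
```

For ranges that are near but not overlapping, the outcome of the mean test is ignored, which matches the text: only the variance test decides between a sibling relationship and no relationship in that case.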

制御部１９は、集合間関連性判定装置を全体制御する制御部であり、各機能部間のデータの授受などを司る。 The control unit 19 is a control unit that controls the inter-group relevance determination device as a whole, and controls data exchange between the functional units.

つぎに、本実施例に係る集合間関連性判定処理の処理手順について説明する。図９−１および図９−２は、本実施例に係る集合間関連性判定処理の処理手順を説明するフローチャート（１）および（２）である。 Next, a processing procedure of inter-set relevance determination processing according to the present embodiment will be described. FIGS. 9A and 9B are flowcharts (1) and (2) for explaining the processing procedure of the inter-set relevance determination processing according to the present embodiment.

図９−１に示すように、まず、集合間関連性判定装置の単位換算部１３は、記憶部１２から単位換算データ１２ａ、第１情報体系データ１２ｂ、第２情報体系データ１２ｃを読み込む（ステップＳ１０１）。 As shown in FIG. 9A, first, the unit conversion unit 13 of the inter-group relevance determination device reads unit conversion data 12a, first information system data 12b, and second information system data 12c from the storage unit 12 (steps). S101).

そして、単位換算部１３は、２つの情報体系に属し、関連性があるか否かを判定するカテゴリの数値データの単位を基準単位に調整する（ステップＳ１０２）。その後、カテゴリ間関連性判定部１８は、調整後の単位が一致するか否かを調べる（ステップＳ１０３）。 And the unit conversion part 13 adjusts the unit of the numerical data of the category which belongs to two information systems, and determines whether it is relevant or not to a reference unit (step S102). Thereafter, the inter-category relationship determination unit 18 checks whether or not the adjusted units match (step S103).

単位が一致しない場合には（ステップＳ１０３，Ｎｏ）、図９−２に示されるように、カテゴリ間関連性判定部１８は、２つのカテゴリは無関係と判定し（ステップＳ１１７）、判定結果を表示部１１に出力して（ステップＳ１１８）、この集合間関連性判定処理を終了する。 When the units do not match (No at Step S103), as shown in FIG. 9B, the inter-category relationship determination unit 18 determines that the two categories are irrelevant (Step S117), and displays the determination result. This is output to the unit 11 (step S118), and the inter-set relevance determination process is terminated.

単位が一致した場合には（ステップＳ１０３，Ｙｅｓ）、図９−１に示されるように、平均値同一性分析部１６および分散同一性分析部１７は、有意水準ａおよびｂを設定し、さらに、その有意水準に対応する分位点を求めて有意値域を算出する（ステップＳ１０４）。 When the units match (step S103, Yes), as shown in FIG. 9A, the mean value identity analysis unit 16 and the variance identity analysis unit 17 set the significance levels a and b, and Then, a quantile corresponding to the significance level is obtained to calculate a significant value range (step S104).

その後、統計量算出部１４は、２つの確率分布の数値データに係るデータ数、平均値および分散を、式（１）〜（６）を用いて算出し、記憶部１２に第１確率分布データ１２ｄおよび第２確率分布データ１２ｅとして記憶する（ステップＳ１０５）。 Thereafter, the statistic calculation unit 14 calculates the number of data, the average value, and the variance relating to the numerical data of the two probability distributions using the formulas (1) to (6), and stores the first probability distribution data in the storage unit 12. 12d and the second probability distribution data 12e are stored (step S105).

そして、図９−２に示されるように、値域隔たり分析部１５は、２種類の数値データの確率分布の値域が重なっているか、確率分布の隔たりの絶対値がＯ₁以下であるか、あるいは、確率分布の隔たりの絶対値がＯ₁より大きいかを調べる（ステップＳ１０６）。 Then, as shown in FIG. 9-2, the value range analysis unit 15 determines whether the value ranges of the probability distributions of the two types of numerical data overlap, the absolute value of the probability distribution interval is O ₁ or less, or Then, it is examined whether the absolute value of the probability distribution interval is larger than O ₁ (step S106).

そして、２種類の数値データの確率分布の値域が重なっている場合には（ステップＳ１０６，「重なる」）、平均値同一性分析部１６は、２種類の数値データの平均値の同一性検定を実行し（ステップＳ１０７）、平均値が同一であるか否かを調べる（ステップＳ１０８）。 If the value ranges of the probability distributions of the two types of numerical data overlap (step S106, “overlap”), the average value identity analysis unit 16 performs an identity test of the average values of the two types of numerical data. It is executed (step S107), and it is checked whether the average values are the same (step S108).

平均値が同一である場合には（ステップＳ１０８，Ｙｅｓ）、分散同一性分析部１７は、２種類の数値データの分散の同一性検定を実行し（ステップＳ１０９）、分散が同一であるか否かを調べる（ステップＳ１１０）。 If the average values are the same (step S108, Yes), the variance identity analysis unit 17 performs an identity test of the variance of the two types of numerical data (step S109), and whether the variances are the same. (Step S110).

分散が同一である場合には（ステップＳ１１０，Ｙｅｓ）、カテゴリ間関連性判定部１８は、２つのカテゴリは同一概念であると判定し（ステップＳ１１１）、判定結果を表示部１１に出力して（ステップＳ１１３）、この集合間関連性判定処理を終了する。 If the variances are identical (step S110, Yes), the inter-category relationship determination unit 18 determines that the two categories are the same concept (step S111), outputs the determination result to the display unit 11 (step S113), and ends this inter-set relationship determination process.

ステップＳ１１０において分散が同一でない場合には（ステップＳ１１０，Ｎｏ）、カテゴリ間関連性判定部１８は、２つのカテゴリは等値概念であると判定し（ステップＳ１１２）、判定結果を表示部１１に出力して（ステップＳ１１３）、この集合間関連性判定処理を終了する。 If the variances are not identical in step S110 (step S110, No), the inter-category relationship determination unit 18 determines that the two categories are equal-value concepts (step S112), outputs the determination result to the display unit 11 (step S113), and ends this inter-set relationship determination process.

ステップＳ１０８において平均値が同一でない場合（ステップＳ１０８，Ｎｏ）、あるいは、ステップＳ１０６において２種類の数値データに係る確率分布の隔たりの絶対値がＯ₁以下である場合には（ステップＳ１０６，「｜隔たり｜≦Ｏ₁」）、分散同一性分析部１７は、２種類の数値データの分散の同一性検定を実行し（ステップＳ１１４）、分散が同一であるか否かを調べる（ステップＳ１１５）。 If the mean values are not identical in step S108 (step S108, No), or if the absolute value of the gap between the probability distributions of the two types of numerical data is O₁ or less in step S106 (step S106, “|gap| ≦ O₁”), the variance identity analysis unit 17 performs an identity test on the variances of the two types of numerical data (step S114) and checks whether the variances are identical (step S115).

分散が同一である場合には（ステップＳ１１５，Ｙｅｓ）、カテゴリ間関連性判定部１８は、２つのカテゴリは兄弟概念であると判定し（ステップＳ１１６）、判定結果を表示部１１に出力して（ステップＳ１１３）、この集合間関連性判定処理を終了する。 When the variances are the same (step S115, Yes), the inter-category relationship determination unit 18 determines that the two categories are sibling concepts (step S116), and outputs the determination result to the display unit 11. (Step S113), this inter-set relevance determination process is terminated.

ステップＳ１１５において分散が同一でない場合（ステップＳ１１５，Ｎｏ）、あるいは、ステップＳ１０６において２種類の数値データに係る確率分布の隔たりの絶対値がＯ₁より大きい場合には（ステップＳ１０６，「｜隔たり｜＞Ｏ₁」）、分散同一性分析部１７は、２つのカテゴリは無関係であると判定し（ステップＳ１１７）、判定結果を表示部１１に出力して（ステップＳ１１３）、この集合間関連性判定処理を終了する。 If the variances are not identical in step S115 (step S115, No), or if the absolute value of the gap between the probability distributions of the two types of numerical data is greater than O₁ in step S106 (step S106, “|gap| > O₁”), the variance identity analysis unit 17 determines that the two categories are unrelated (step S117), outputs the determination result to the display unit 11 (step S113), and ends this inter-set relationship determination process.

なお、ここでは、カテゴリ間に同値関係、等値関係あるいは兄弟関係があるか否かを判定することとしたが、図４で説明したようにクラスが数値データからなる属性をもつ場合に、属性間の関連性からクラス間に階層関係があるか否かを判定してもよい。 Here, it is determined whether an equivalence relationship, an equal-value relationship, or a sibling relationship exists between categories. However, when classes have attributes consisting of numerical data, as described with reference to FIG. 4, whether a hierarchical relationship exists between the classes may be determined from the relationships between the attributes.

この場合、カテゴリ間関連性判定部１８は、上述したような方法にしたがって、属性間に同値関係あるいは等値関係があるか否かを調べる。また、カテゴリ間関連性判定部１８は、その他の属性に含有関係があるか否かを調べる。 In this case, the inter-category relationship determination unit 18 checks, according to the method described above, whether an equivalence relationship or an equal-value relationship exists between the attributes. The inter-category relationship determination unit 18 also checks whether a containment relationship exists between the other attributes.

そして、カテゴリ間関連性判定部１８は、数値データからなる属性に同値関係あるいは等値関係があり、他の属性間に含有関係がある場合に、それらの属性が属するクラス間に上位下位の階層関係があると判定する。 Then, when the attributes consisting of numerical data have an equivalence relationship or an equal-value relationship and a containment relationship exists between the other attributes, the inter-category relationship determination unit 18 determines that an upper/lower hierarchical relationship exists between the classes to which those attributes belong.

また、カテゴリ間関連性判定部１８は、他の属性間に同値関係がみられた場合には、数値データからなる属性間の関係が同値関係であるか等値関係であるかに応じて、クラス間に同値関係あるいは等値関係があると判定する。 When an equivalence relationship is found between the other attributes, the inter-category relationship determination unit 18 determines that an equivalence relationship or an equal-value relationship exists between the classes, depending on whether the relationship between the attributes consisting of numerical data is an equivalence relationship or an equal-value relationship.

さらに、カテゴリ間関連性判定部１８は、他の属性間に等値関係がみられ、かつ、数値データからなる属性間の関係が同値関係あるいは等値関係であった場合には、クラス間に等値関係があると判定する。 Furthermore, when an equal-value relationship is found between the other attributes and the relationship between the attributes consisting of numerical data is an equivalence relationship or an equal-value relationship, the inter-category relationship determination unit 18 determines that an equal-value relationship exists between the classes.
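The class-level rules above can be written as a small lookup. The Python sketch below is an illustration only: the string labels and the function name `judge_classes` are hypothetical, and returning "none" for combinations the text does not mention is an assumption.

```python
def judge_classes(numeric_relation, other_relation):
    """Combine the relation between numeric attributes with the relation
    between the other attributes to judge the relation between two classes.

    numeric_relation: "equivalence", "equal-value", or anything else
    other_relation:   "containment", "equivalence", "equal-value", or anything else
    """
    if numeric_relation not in ("equivalence", "equal-value"):
        return "none"  # assumption: the text does not cover this case
    if other_relation == "containment":
        return "hierarchical"    # upper/lower hierarchical relationship
    if other_relation == "equivalence":
        return numeric_relation  # follows the numeric attributes' relation
    if other_relation == "equal-value":
        return "equal-value"
    return "none"                # assumption: combination not covered by the text


class_relation = judge_classes("equal-value", "containment")
```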

ところで、上記実施例で説明した各種の処理は、あらかじめ用意されたプログラムをコンピュータで実行することによって実現することができる。そこで、以下では、図１０を用いて、上記各種処理を実現するプログラムを実行するコンピュータの一例について説明する。 Incidentally, the various processes described in the above embodiment can be realized by executing a program prepared in advance on a computer. Therefore, an example of a computer that executes a program realizing the various processes is described below with reference to FIG. 10.

図１０は、図５に示した集合間関連性判定装置となるコンピュータのハードウェア構成を示す図である。 FIG. 10 is a diagram illustrating the hardware configuration of a computer serving as the inter-set relationship determination device shown in FIG. 5.

このコンピュータは、ユーザからのデータの入力を受け付ける入力装置１００、表示装置１０１、ＲＡＭ（Random Access Memory）１０２、ＲＯＭ（Read Only Memory）１０３、各種プログラムを記録した記録媒体からプログラムを読み取る媒体読取装置１０４、ネットワークを介して他のコンピュータとの間でデータの授受をおこなうネットワークインターフェース１０５、ＣＰＵ（Central Processing Unit）１０６およびＨＤＤ（Hard Disk Drive）１０７をバス１０８で接続して構成される。 The computer includes an input device 100 that receives input of data from a user, a display device 101, a RAM (Random Access Memory) 102, a ROM (Read Only Memory) 103, and a medium reading device that reads a program from a recording medium in which various programs are recorded. 104, a network interface 105 that exchanges data with other computers via a network, a CPU (Central Processing Unit) 106, and an HDD (Hard Disk Drive) 107 are connected by a bus 108.

そして、ＨＤＤ１０７には、集合間関連性判定装置の機能と同様の機能を発揮するプログラム、すなわち、集合間関連性判定プログラム１０７ｂが記憶されている。ここで、集合間関連性判定プログラム１０７ｂについては、適宜分散して記憶することとしてもよい。 The HDD 107 stores a program that provides the same functions as the inter-set relationship determination device, that is, the inter-set relationship determination program 107b. The inter-set relationship determination program 107b may be stored in a distributed manner as appropriate.

そして、ＣＰＵ１０６が、集合間関連性判定プログラム１０７ｂをＨＤＤ１０７から読み出して実行することにより、集合間関連性判定プロセス１０６ａとして機能するようになる。 Then, the CPU 106 reads out and executes the inter-group relationship determination program 107b from the HDD 107, thereby functioning as the inter-group relationship determination process 106a.

この集合間関連性判定プロセス１０６ａは、図５に示した単位換算部１３、統計量算出部１４、値域隔たり分析部１５、平均値同一性分析部１６、分散同一性分析部１７、カテゴリ間関連性判定部１８および制御部１９に対応する。 This inter-set relationship determination process 106a corresponds to the unit conversion unit 13, the statistic calculation unit 14, the value range gap analysis unit 15, the mean value identity analysis unit 16, the variance identity analysis unit 17, the inter-category relationship determination unit 18, and the control unit 19 shown in FIG. 5.

また、ＨＤＤ１０７には、各種データ１０７ａが記憶される。この各種データ１０７ａは、図５に示した記憶部１２に記憶される単位換算データ１２ａ、第１情報体系データ１２ｂ、第２情報体系データ１２ｃ、第１確率分布データ１２ｄおよび第２確率分布データ１２ｅに対応する。 The HDD 107 also stores various data 107a. The various data 107a correspond to the unit conversion data 12a, the first information system data 12b, the second information system data 12c, the first probability distribution data 12d, and the second probability distribution data 12e stored in the storage unit 12 shown in FIG. 5.

そして、ＣＰＵ１０６は、各種データ１０７ａをＨＤＤ１０７に記憶するとともに、各種データ１０７ａをＨＤＤ１０７から読み出してＲＡＭ１０２に格納し、ＲＡＭ１０２に格納された各種データ１０２ａに基づいてデータ処理を実行する。 The CPU 106 stores various data 107 a in the HDD 107, reads the various data 107 a from the HDD 107, stores it in the RAM 102, and executes data processing based on the various data 102 a stored in the RAM 102.

ところで、集合間関連性判定プログラム１０７ｂは、必ずしも最初からＨＤＤ１０７に記憶させておく必要はない。 By the way, the inter-set relationship determination program 107b does not necessarily need to be stored in the HDD 107 from the beginning.

たとえば、コンピュータに挿入されるフレキシブルディスク（ＦＤ）、ＣＤ−ＲＯＭ、ＭＯディスク、ＤＶＤディスク、光磁気ディスク、ＩＣカードなどの「可搬用の物理媒体」、または、コンピュータの内外に備えられるハードディスクドライブ（ＨＤＤ）などの「固定用の物理媒体」、さらには、公衆回線、インターネット、ＬＡＮ、ＷＡＮなどを介してコンピュータに接続される「他のコンピュータ（またはサーバ）」などに各プログラムを記憶しておき、コンピュータがこれらから各プログラムを読み出して実行するようにしてもよい。 For example, a “portable physical medium” such as a flexible disk (FD), a CD-ROM, an MO disk, a DVD disk, a magneto-optical disk, and an IC card inserted into a computer, or a hard disk drive (inside and outside the computer) Each program is stored in a “fixed physical medium” such as an HDD), and “another computer (or server)” connected to the computer via a public line, the Internet, a LAN, a WAN, or the like. The computer may read and execute each program from these.

さて、これまで本発明の実施例について説明したが、本発明は上述した実施例以外にも、特許請求の範囲に記載した技術的思想の範囲内において種々の異なる実施例にて実施されてもよいものである。 Although an embodiment of the present invention has been described so far, the present invention may also be implemented in various different embodiments other than the one described above, within the scope of the technical idea described in the claims.

たとえば、上記実施例では、集合間関連性判定装置が、カテゴリ間に同値関係、等値関係、兄弟関係あるいは上下階層関係があるか否かを判定することとしたが、集合間関連性判定装置は、それらの関係のうちの１つの関係がカテゴリ間にあるか否かを判定するだけであってもよい。 For example, in the above embodiment, the inter-set relationship determination device determines whether an equivalence relationship, an equal-value relationship, a sibling relationship, or an upper/lower hierarchical relationship exists between categories. However, the inter-set relationship determination device may instead determine only whether one of those relationships exists between the categories.

また、本実施例において説明した各処理のうち、自動的におこなわれるものとして説明した処理の全部または一部を手動的におこなうこともでき、あるいは、手動的におこなわれるものとして説明した処理の全部または一部を公知の方法で自動的におこなうこともできる。 In addition, among the processes described in this embodiment, all or part of the processes described as being performed automatically can be performed manually, or the processes described as being performed manually can be performed. All or a part can be automatically performed by a known method.

この他、上記文書中や図面中で示した処理手順、制御手順、具体的名称、各種のデータやパラメータを含む情報については、特記する場合を除いて任意に変更することができる。 In addition, the processing procedure, control procedure, specific name, and information including various data and parameters shown in the above-described document and drawings can be arbitrarily changed unless otherwise specified.

また、図示した集合間関連性判定装置の各構成要素は機能概念的なものであり、必ずしも物理的に図示のように構成されていることを要しない。 Each component of the illustrated inter-set relationship determination apparatus is functionally conceptual and does not necessarily need to be physically configured as illustrated.

すなわち、集合間関連性判定装置の分散・統合の具体的形態は図示のものに限られず、その全部または一部を、各種の負荷や使用状況などに応じて、任意の単位で機能的または物理的に分散・統合して構成することができる。 In other words, the specific form of distribution / integration of the inter-set relationship determination device is not limited to the one shown in the figure, and all or a part thereof can be functionally or physically processed in an arbitrary unit according to various loads or usage conditions. Can be distributed and integrated.

さらに、集合間関連性判定装置にて行なわれる各処理機能は、その全部または任意の一部が、ＣＰＵおよび当該ＣＰＵにて解析実行されるプログラムにて実現され、あるいは、ワイヤードロジックによるハードウェアとして実現され得る。 Further, each processing function performed in the inter-set relationship determination apparatus is realized by a CPU and a program that is analyzed and executed by the CPU, or as hardware by wired logic. Can be realized.

（付記１）数値データを要素とする集合間の関連性を判定する集合間関連性判定プログラムであって、
前記数値データの分布に係る統計量を算出する統計量算出手順と、
前記統計量算出手順により算出された統計量を記憶する統計量記憶手順と、
前記統計量記憶手順に記憶された統計量に基づいて集合間の関連性を判定する関連性判定手順と、
をコンピュータに実行させることを特徴とする集合間関連性判定プログラム。 (Supplementary note 1) A program for determining relevance between sets for determining relevance between sets having numerical data as elements,
A statistic calculation procedure for calculating a statistic related to the distribution of the numerical data;
A statistic storage procedure for storing a statistic calculated by the statistic calculation procedure;
A relevance determination procedure for determining relevance between sets based on the statistics stored in the statistics storage procedure;
A program for determining the relationship between sets, which causes a computer to execute.

（付記２）前記関連性判定手順は、同値関係、等値関係、兄弟関係あるいは上下階層関係のうちのいずれか１つの関係が集合間にあるか否かを判定することを特徴とする付記１に記載の集合間関連性判定プログラム。 (Supplementary note 2) The inter-set relationship determination program according to supplementary note 1, wherein the relevance determination procedure determines whether any one of an equivalence relationship, an equal-value relationship, a sibling relationship, or an upper/lower hierarchical relationship exists between the sets.

（付記３）前記集合は、所定の概念の属性に係る集合であることを特徴とする付記１または２に記載の集合間関連性判定プログラム。 (Additional remark 3) The said set is a set which concerns on the attribute of a predetermined concept, The relevance determination program between sets as described in Additional remark 1 or 2 characterized by the above-mentioned.

（付記４）前記数値データの単位を調整し、単位が整合しない場合には、前記集合間には関連性がないものと判定する単位調整手順をさらにコンピュータに実行させ、前記統計量算出手順は、前記単位が整合する場合に、前記単位調整手順により単位が調整された数値データの分布に係る統計量を算出することを特徴とする付記１、２または３に記載の集合間関連性判定プログラム。 (Supplementary Note 4) If the units of the numerical data are adjusted and the units do not match, the computer further executes a unit adjustment procedure for determining that there is no relationship between the sets, and the statistical calculation procedure includes: 4. The inter-group relevance determination program according to appendix 1, 2, or 3, wherein, when the units match, a statistic relating to a distribution of numerical data whose units are adjusted by the unit adjustment procedure is calculated. .

(Supplementary note 5) The inter-set relationship determination program according to any one of Supplementary notes 1 to 4, wherein the relationship determination procedure determines the relationship between the sets by comparing the separation between the distributions of the numerical data of the sets with a predetermined threshold.

(Supplementary note 6) The inter-set relationship determination program according to Supplementary note 5, wherein, when the two distributions of the numerical data of the sets do not overlap, the relationship determination procedure determines the relationship between the sets by checking whether the difference in the number of digits between the maximum value of the numerical data in the distribution with the smaller values and the minimum value of the numerical data in the distribution with the larger values is greater than three digits.
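The digit comparison of Supplementary note 6 can be sketched as follows. The function names are hypothetical, and the sketch assumes that "number of digits" means the number of decimal digits in the integer part of the value; the note itself does not define it.

```python
from typing import Optional, Sequence


def integer_digits(x: float) -> int:
    """Number of decimal digits in the integer part of |x| (assumed definition)."""
    return len(str(int(abs(x))))


def digit_gap(set_a: Sequence[float], set_b: Sequence[float]) -> Optional[int]:
    """Digit-count difference between two non-overlapping value ranges.

    Returns None when the ranges overlap, since the check of
    Supplementary note 6 applies only to non-overlapping distributions.
    """
    low, high = sorted((set_a, set_b), key=min)
    if max(low) >= min(high):
        return None  # ranges overlap
    return integer_digits(min(high)) - integer_digits(max(low))


def gap_exceeds_three_digits(set_a: Sequence[float], set_b: Sequence[float]) -> bool:
    """True when the ranges are disjoint and differ by more than three digits."""
    gap = digit_gap(set_a, set_b)
    return gap is not None and gap > 3
```

For example, a set whose largest value is 950 (three digits) and a set whose smallest value is 2,500,000 (seven digits) differ by four digits, so the check returns `True`.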

(Supplementary note 7) The inter-set relationship determination program according to any one of Supplementary notes 1 to 6, wherein the relationship determination procedure determines the relationship between the sets by testing the identity of the mean values of the numerical data based on the statistic stored by the statistic storage procedure.

(Supplementary note 8) The inter-set relationship determination program according to Supplementary note 7, wherein the relationship determination procedure sets the significance level of the test of the identity of the mean values of the numerical data within a range of 0.01 to 0.3.
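A test of the identity of mean values that uses only stored statistics, as in Supplementary notes 7 and 8, can be sketched as follows. The pooled two-sample t statistic is an assumption made for this sketch, since the notes do not name a specific test; the `Stats` class and function names are hypothetical. The critical value, the two-sided a% point t_{1-a/2}(ν), is supplied by the caller, for example from a t-distribution table.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Stats:
    """Stored statistics of one set (hypothetical container)."""
    n: int        # number of numerical data items
    mean: float   # sample mean
    var: float    # unbiased sample variance


def pooled_t(a: Stats, b: Stats) -> Tuple[float, int]:
    """Return the pooled two-sample t statistic and its degrees of freedom."""
    nu = a.n + b.n - 2
    pooled_var = ((a.n - 1) * a.var + (b.n - 1) * b.var) / nu
    t = (a.mean - b.mean) / math.sqrt(pooled_var * (1 / a.n + 1 / b.n))
    return t, nu


def means_identical(a: Stats, b: Stats, t_critical: float) -> bool:
    """True when identity of the means is not rejected at the chosen level."""
    t, _ = pooled_t(a, b)
    return abs(t) <= t_critical
```

With a significance level a = 0.05, which lies inside the 0.01 to 0.3 range of Supplementary note 8, and ν = 18, the tabulated two-sided point is about 2.101, so one would call `means_identical(a, b, 2.101)` for two sets of ten items each.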

(Supplementary note 9) The inter-set relationship determination program according to any one of Supplementary notes 1 to 8, wherein the relationship determination procedure determines the relationship between the sets by testing the identity of the variances of the numerical data based on the statistic stored by the statistic storage procedure.

(Supplementary note 10) The inter-set relationship determination program according to Supplementary note 9, wherein the relationship determination procedure sets the significance level of the test of the identity of the variances of the numerical data within a range of 0.01 to 0.3.
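A test of the identity of variances from stored statistics, as in Supplementary notes 9 and 10, can be sketched as follows. The variance-ratio F statistic with the larger variance in the numerator is an assumption made for this sketch, and the function names are hypothetical. The critical value, the upper b/2 point F_{1-b/2}(ν₁, ν₂), is supplied by the caller, for example from an F-distribution table. Both variances are assumed to be positive.

```python
from typing import Tuple


def variance_ratio(var_a: float, n_a: int, var_b: float, n_b: int) -> Tuple[float, int, int]:
    """Return (F, nu1, nu2) with the larger variance in the numerator."""
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1


def variances_identical(
    var_a: float, n_a: int, var_b: float, n_b: int, f_critical: float
) -> bool:
    """True when identity of the variances is not rejected at the chosen level."""
    f, _, _ = variance_ratio(var_a, n_a, var_b, n_b)
    return f <= f_critical
```

With b = 0.05 and ν₁ = ν₂ = 9, the tabulated value F_{0.975}(9, 9) is about 4.03, so variances of 4.0 and 6.0 from two sets of ten items each (F = 1.5) would be treated as identical.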

(Supplementary note 11) An inter-set relationship determination method for determining the relationship between sets whose elements are numerical data, the method comprising:
a statistic calculation step of calculating a statistic relating to the distribution of the numerical data;
a statistic storage step of storing the statistic calculated in the statistic calculation step; and
a relationship determination step of determining the relationship between the sets based on the statistic stored in the statistic storage step.

As described above, the inter-set relationship determination program and the inter-set relationship determination apparatus according to the present invention are useful for inter-set relationship determination systems that need to appropriately discover the relationship between sets whose elements are numerical data.

A diagram explaining the concept of the inter-set relationship determination process according to the present invention.
A diagram showing an example of probability distributions of numerical data having units.
A diagram showing an example of comparing the relationships between attributes of classes.
A diagram explaining the determination process for the superordinate-subordinate hierarchical relation.
A diagram explaining the functional configuration of the inter-set relationship determination apparatus according to the present embodiment.
A diagram showing an example of probability distributions of categories in a sibling relation.
A diagram explaining the value $t_{1-a/2}(\nu)$ of the two-sided a% point of the t distribution function.
A diagram explaining the value $F_{1-b/2}(\nu_1, \nu_2)$ of the upper-probability b/2% point of the F distribution function.
Flowchart (1) explaining the processing procedure of the inter-set relationship determination process according to the present embodiment.
Flowchart (2) explaining the processing procedure of the inter-set relationship determination process according to the present embodiment.
A diagram showing the hardware configuration of a computer serving as the inter-set relationship determination apparatus shown in FIG. 5.
A diagram explaining the differences in categories between different classification systems.

Explanation of symbols

10 Input unit
11 Display unit
12 Storage unit
12a Unit conversion data
12b First information system data
12c Second information system data
12d First probability distribution data
12e Second probability distribution data
13 Unit conversion unit
14 Statistic calculation unit
15 Value range separation analysis unit
16 Mean value identity analysis unit
17 Variance identity analysis unit
18 Inter-category relationship determination unit
19 Control unit

Claims

An inter-set relationship determination program for determining the relationship between sets whose elements are numerical data, the program causing a computer to execute:
a statistic calculation procedure for calculating a statistic relating to the distribution of the numerical data;
a statistic storage procedure for storing the statistic calculated by the statistic calculation procedure; and
a relationship determination procedure for determining the relationship between the sets based on the statistic stored by the statistic storage procedure, and for determining that the sets are sibling concepts when the ranges of the probability distributions of the numerical data of the sets overlap, the mean values of those probability distributions are not identical, the absolute value of the separation between the ranges of those probability distributions is less than or equal to a predetermined threshold, and the variances of those probability distributions are identical.

The inter-set relationship determination program according to claim 1, further causing the computer to execute a unit adjustment procedure for adjusting the units of the numerical data and for determining that there is no relationship between the sets when the units do not match, wherein, when the units match, the statistic calculation procedure calculates a statistic relating to the distribution of the numerical data whose units have been adjusted by the unit adjustment procedure.

The inter-set relationship determination program according to claim 1 or 2, wherein the relationship determination procedure determines the relationship between the sets by testing the identity of the mean values of the numerical data based on the statistic stored by the statistic storage procedure.

The inter-set relationship determination program according to any one of claims 1 to 3, wherein the relationship determination procedure determines the relationship between the sets by testing the identity of the variances of the numerical data based on the statistic stored by the statistic storage procedure.

An inter-set relationship determination apparatus for determining the relationship between sets whose elements are numerical data, the apparatus comprising:
statistic calculation means for calculating a statistic relating to the distribution of the numerical data;
statistic storage means for storing the statistic calculated by the statistic calculation means; and
relationship determination means for determining the relationship between the sets based on the statistic stored in the statistic storage means, and for determining that the sets are sibling concepts when the ranges of the probability distributions of the numerical data of the sets overlap, the mean values of those probability distributions are not identical, the absolute value of the separation between the ranges of those probability distributions is less than or equal to a predetermined threshold, and the variances of those probability distributions are identical.