JP5871842B2

JP5871842B2 - Information visualization apparatus, method, and program

Info

Publication number: JP5871842B2
Application number: JP2013042202A
Authority: JP
Inventors: Kohei Mori (森 皓平); Takayuki Nakamura (中村 隆幸); Yutaka Arakawa (荒川 豊)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-03-04
Filing date: 2013-03-04
Publication date: 2016-03-01
Anticipated expiration: 2033-03-04
Also published as: JP2014170410A

Description

The present invention relates to an information visualization apparatus, method, and program for drawing a set of data.

There are several systems for recording data sets, such as RDB (relational database), CouchDB, and uTupleSpace. These serve as databases or file systems that record data such as sensor readings and make the recorded data searchable (see, for example, Non-Patent Document 1).
Among them, uTupleSpace and CouchDB are called schemaless databases: they manage sets of highly diverse data, that is, heterogeneous multi-attribute data whose items differ in the kinds and numbers of attributes they carry. For example, uTupleSpace manages data in the uTuple format, which freely expresses attribute names and their corresponding attribute values as one or more "key = value" pairs. Here, "key = value" is synonymous with "attribute name = attribute value". For example, expressing a piece of data as "sensor ID = 1, time = 3, temperature = 20" represents three attribute names (sensor ID, time, and temperature) together with their corresponding attribute values.
Separately, the trie is a data structure used, for example, as an index structure for a set of character strings (see, for example, Non-Patent Document 2). A trie is a tree-structured data structure.
In a trie, a character is assigned to each node of the tree, and a character string is represented by the sequence of characters appearing on the path from the root node to a leaf node.
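The trie described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the cited references: the names `TrieNode`, `insert`, `contains`, and `all_words` are hypothetical, each node's character is stored as the key under which it hangs from its parent, and an `is_end` flag marks where a stored string ends so that a string that is a prefix of another can also be stored.

```python
class TrieNode:
    """One trie node; its character is the key under which it hangs from its parent."""
    def __init__(self):
        self.children = {}      # character -> TrieNode
        self.is_end = False     # True if a stored string ends at this node

def insert(root, word):
    """Store `word` as a path of characters starting at the root."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end = True

def contains(root, word):
    """True only if `word` was stored (not merely a prefix of a stored string)."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_end

def all_words(node, prefix=""):
    """Recover the stored strings by walking every path from `node`."""
    words = [prefix] if node.is_end else []
    for ch in sorted(node.children):
        words.extend(all_words(node.children[ch], prefix + ch))
    return words

root = TrieNode()
for w in ["tea", "ten", "to"]:
    insert(root, w)
print(all_words(root))  # ['tea', 'ten', 'to']
```

Strings that share a prefix ("tea" and "ten") share the nodes for that prefix, which is what makes the trie compact as a string index.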

Non-Patent Document 1: T. Nakamura, K. Kashiwagi, Y. Arakawa, and M. Nakamura, "Design and implementation of new uTupleSpace enabling storage and retrieval of large amount of schema-less sensor data," 2nd Intl. Workshop on Enablers for Ubiquitous and Context-Aware Services on Sensor Networks (EUCASS2011), pp. 414-420, July 2011.
Non-Patent Document 2: E. Fredkin, "Trie memory," Communications of the ACM, Vol. 3, No. 9, pp. 490-500, 1960.

Understanding what data are recorded in a database or file system is useful both for using the data and for maintaining the system. For example, it helps when considering how to use the recorded data, when designing queries such as SQL statements, or when deciding which data to delete once growing data volumes put pressure on storage capacity.
However, particularly when a wide variety of data is recorded, as in uTupleSpace, the number of data types grows as the data grow, and it becomes difficult to tell from a plain listing of the data what kinds of data have been recorded.
In such cases, it is desirable to have a simple representation that gives an overview of what kinds of data exist, that is, which combinations of attribute names occur in the recorded data and which attribute names occur frequently.
One conventional approach is the trie.
In a trie, a character is assigned to each node of the tree, and a character string is represented by the sequence of characters on the path from the root node to a leaf node, so that the tree as a whole represents a set of character strings.
However, a character assigned to a node carries no meaning on its own, and because the order in which characters are assigned is significant, a trie does not treat its contents as unordered combinations.
For this reason, a trie cannot be used to give an overview of the set of combinations of attribute names, which are themselves meaningful character strings.
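The order-sensitivity problem can be seen with a small sketch. It assumes, hypothetically, that each attribute name is inserted as one trie node in the order it appears in a record; the helper names `build_trie` and `count_sequences` are illustrative only. Two records with the same attribute names in different orders become two distinct trie paths, although as combinations they are identical.

```python
END = object()  # sentinel key marking the end of a stored sequence

def build_trie(sequences):
    """Nested-dict trie over sequences of attribute names (one name per node)."""
    root = {}
    for seq in sequences:
        node = root
        for name in seq:
            node = node.setdefault(name, {})
        node[END] = True
    return root

def count_sequences(node):
    """Number of distinct stored sequences = number of END markers below `node`."""
    return sum(1 if key is END else count_sequences(child)
               for key, child in node.items())

records = [("subject", "date", "switch"), ("date", "switch", "subject")]
trie = build_trie(records)
print(count_sequences(trie))                  # 2: the trie keeps both orderings apart
print(len({frozenset(r) for r in records}))   # 1: as combinations they are the same
```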

The present invention has been made in view of the above circumstances, and its object is to provide an information visualization apparatus, method, and program that make it easy to grasp a wide variety of data.

One aspect of the present invention for achieving the above object comprises: generating means for generating, from a set of data each containing one or more attribute names, at least appearance patterns, each being the combination of attribute names contained in one piece of data; and constructing means for constructing, from the list of appearance patterns, an appearance pattern tree in which each node of the tree structure contains zero or more attribute names and the set of attribute names appearing exactly once along the path from the root node to a leaf node represents an appearance pattern. The constructing means performs a recombination process that extracts, in the appearance pattern tree, at least one grouping target attribute name contained in common in a group of one or more nodes sharing a common parent node, and groups (factors out) the grouping target attribute name into a single node.

That is, according to the present invention, an appearance pattern list, which lists the appearance patterns of the attribute names contained in each piece of data, is generated from a set of data each containing at least one attribute name and recorded in a database or file system such as uTupleSpace. From this list, an appearance pattern tree is constructed in which the set of attribute names appearing exactly once along the path from the root node to a leaf node represents an appearance pattern. A diverse set of data can thus be drawn as a tree structure that makes it easy to see what kinds of data exist and which combinations of attribute names the recorded data contain.
Furthermore, according to the present invention, a recombination process is executed repeatedly in which at least one grouping target attribute name shared by a group of one or more nodes with a common parent node is extracted from the appearance pattern tree and grouped into a single node. As a result, even when the sets of attribute names held by the nodes are not entirely identical, the shared part can be grouped out, reducing the total number of attribute names held by all nodes. In other words, the appearance pattern list can be drawn as a simpler tree structure.
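One grouping step of the recombination process can be sketched as follows. This is only an illustration under stated assumptions: the `Node` class and the functions `group_out`, `total_names`, and `patterns` are hypothetical names, the grouping target names are assumed to be already chosen, and the candidate selection and evaluation of FIGS. 9 and 10 are not reproduced. A sibling whose names are all factored out is kept as an empty node (an assumption), marking a pattern that ends at the new node.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    names: set                                   # zero or more attribute names
    children: list = field(default_factory=list)

def group_out(siblings, target):
    """Move every sibling holding all names in `target` under one new node that
    holds `target`, removing those names from the moved siblings (mutates them)."""
    target = set(target)
    hit = [n for n in siblings if target <= n.names]
    rest = [n for n in siblings if not target <= n.names]
    for n in hit:
        n.names -= target        # may become empty: the pattern ends at the new node
    return rest + [Node(target, hit)]

def total_names(nodes):
    """Total number of attribute names held by the nodes and all their descendants."""
    return sum(len(n.names) + total_names(n.children) for n in nodes)

def patterns(nodes, prefix=frozenset()):
    """Union of the names along each root-to-leaf path."""
    out = set()
    for n in nodes:
        path = prefix | n.names
        out |= patterns(n.children, path) if n.children else {path}
    return out

siblings = [Node({"subject", "date", "temp"}),
            Node({"subject", "date", "switch"}),
            Node({"id", "temp"})]
before_total, before_patterns = total_names(siblings), patterns(siblings)
grouped = group_out(siblings, {"subject", "date"})
print(before_total, total_names(grouped))      # 8 6
print(patterns(grouped) == before_patterns)    # True
```

The last line checks the property the text relies on: grouping reduces the total number of stored attribute names while every root-to-leaf path still spells out an original appearance pattern.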

Furthermore, according to the present invention, the attribute names appearing in the appearance pattern list and their appearance frequencies are calculated, and attribute names with higher appearance frequencies are preferentially selected as grouping target attribute name candidates. Attribute names shared by many nodes can therefore be grouped out, which reduces the total number of attribute names held by the nodes more than grouping by infrequent attribute names would, so the appearance pattern list can be drawn as a simpler tree structure. In addition, because frequent attribute names are classified first among the attribute names appearing once along each root-to-leaf path, nodes closer to the root node contain the attribute names that occur more often in the appearance pattern list. It is therefore easy to see from the appearance pattern tree which attribute names occur frequently.

Furthermore, according to the present invention, allowing each node to hold several attribute names, rather than exactly one, reduces the number of nodes in the appearance pattern tree, so the appearance pattern list can be drawn as a simpler tree structure.
Furthermore, according to the present invention, when a group of one or more nodes with a common parent node has several grouping target attribute name candidates with the same appearance frequency, the following total is calculated for each candidate: the number of attribute names in the common attribute name group (the candidate together with every other attribute name present in all nodes that contain the candidate), plus, for each node, the number of its attribute names excluding the common attribute name group. The common attribute name group of the candidate with the smallest total is then used as the grouping target attribute names. This further reduces the total number of attribute names held by the nodes, so the appearance pattern list can be drawn as a still simpler tree structure.
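The tie-breaking rule can be sketched as below. This is one reading of the total described above, not the procedure of FIG. 10 (which is not detailed in this section): nodes are modeled as plain sets of attribute names, the function names are hypothetical, nodes that do not contain the common group are assumed to keep all their names, and the tied frequencies in the example are assumed.

```python
def common_group(candidate, nodes):
    """The candidate plus every other name present in all nodes containing it."""
    containing = [n for n in nodes if candidate in n]
    group = set(containing[0])
    for n in containing[1:]:
        group &= n
    return group

def total_after(group, nodes):
    """Names in the sibling group after `group` is moved into one new node:
    the group counted once, plus each node's remaining names."""
    return len(group) + sum(len(n - group) if group <= n else len(n) for n in nodes)

def choose_target(candidates, nodes):
    """Among equally frequent candidates, return the common group with the smallest total."""
    best = min(sorted(candidates),
               key=lambda c: total_after(common_group(c, nodes), nodes))
    return common_group(best, nodes)

nodes = [{"subject", "date", "temp"}, {"subject", "date", "switch"},
         {"id", "positionx"}, {"id", "magnitude"}]
# Assume "subject" and "id" tie in the appearance frequency list.
print(total_after(common_group("subject", nodes), nodes))  # 8
print(total_after(common_group("id", nodes), nodes))       # 9
print(sorted(choose_target({"subject", "id"}, nodes)))     # ['date', 'subject']
```

Grouping "subject" pulls "date" along with it because every node containing "subject" also contains "date", which is why that candidate yields the smaller total.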

FIG. 1 is a block diagram showing the information visualization apparatus according to the present embodiment.
FIG. 2 is a diagram showing a specific example of the set of data stored in the data storage unit.
FIG. 3 is a flowchart showing the operation of the information visualization apparatus.
FIG. 4 is a flowchart showing the appearance pattern generation process in the appearance pattern generation unit.
FIG. 5 is a diagram showing a specific example of the result of the appearance pattern generation process.
FIG. 6 is a flowchart showing the tree structure construction process in the tree structure construction unit.
FIG. 7 is a diagram showing a specific example of the appearance frequency list.
FIG. 8 is a diagram showing a specific example of the initial appearance pattern tree.
FIG. 9 is a flowchart showing details of the recombination process for the appearance pattern tree.
FIG. 10 is a flowchart showing details of the evaluation process.
FIG. 11 is a diagram showing the first stage of the recombination process for the appearance pattern tree.
FIG. 12 is a diagram showing the second stage of the recombination process for the appearance pattern tree.
FIG. 13 is a diagram showing the third stage of the recombination process for the appearance pattern tree.
FIG. 14 is a diagram showing the fourth stage of the recombination process for the appearance pattern tree.
FIG. 15 is a diagram showing the final appearance pattern tree.
FIG. 16 is a diagram showing an example in which nodes indicating the number of occurrences of each appearance pattern are added to the appearance pattern tree.

Hereinafter, an information visualization apparatus, method, and program according to an embodiment of the present disclosure will be described in detail with reference to the drawings. In the following embodiment, parts given the same reference numerals perform the same operations, and repeated descriptions of them are omitted.
The information visualization apparatus according to the present embodiment will be described with reference to the block diagram of FIG. 1.
The information visualization apparatus 100 according to the present embodiment includes a data storage unit 101, an appearance pattern generation unit 102, a tree structure construction unit 103, and a display unit 104.

The data storage unit 101 stores data each containing one or more attribute names. An attribute name is the name of an attribute, such as temperature, humidity, current, voltage, fluid flow rate, substance concentration, brightness, noise, position, or acceleration. The data will be described later with reference to FIG. 2.
The appearance pattern generation unit 102 extracts, from the set of data stored in the data storage unit 101, the combination of attribute names contained in each piece of data, and generates for each piece of data an appearance pattern, which is that combination of attribute names. The appearance pattern generation unit 102 then generates an appearance pattern list, which is a list of the appearance patterns.

The tree structure construction unit 103 creates an appearance frequency list from the appearance pattern list, constructs an initial appearance pattern tree from the appearance pattern list, and constructs the appearance pattern tree from the initial appearance pattern tree and the appearance frequency list. The appearance pattern tree is a tree structure in which the set of attribute names appearing exactly once along the path from the root node to a leaf node represents an appearance pattern.
The display unit 104 displays the appearance pattern tree.

Next, the tree structure construction unit 103 will be described in detail.
The tree structure construction unit 103 includes a list creation unit 105 and a recombination unit 106.
The list creation unit 105 calculates, from the appearance pattern list, the appearance frequency of each attribute name in the list and creates the appearance frequency list.
The recombination unit 106 constructs the initial appearance pattern tree from the appearance pattern list. The initial appearance pattern tree is a tree structure whose root node has one child node per appearance pattern in the list, each child containing all of the attribute names of its appearance pattern. The recombination unit 106 then refers to the appearance frequency list and repeatedly performs the recombination process, starting with the child nodes of the root node of the initial appearance pattern tree as the processing target node group. The appearance pattern tree is constructed in this way.

Next, an example of the set of data stored in the data storage unit 101 will be described with reference to FIG. 2.
In the present embodiment, an attribute name 201 and its corresponding attribute value 202 together form one attribute 203, and a set of one or more attributes 203 is called one piece of data 204. An attribute value 202 is the value corresponding to the respective attribute name 201.

In the present embodiment, data are described in a format consisting of one or more attributes, that is, "attribute name = attribute value" pairs. For example, in FIG. 2, eight pieces of data 204, labeled (a), (i), (u), (e), (o), (ka), (ki), and (ku) (katakana ア through ク in the figure), are stored. As one example, data (e) is described as "subject=project_A, date=602, switch=on". That is, data (e) contains the three attribute names "subject", "date", and "switch", together with the attribute values corresponding to them.
Although in the present embodiment the data are described in a format consisting of one or more "attribute name = attribute value" pairs, the format is not limited to this; a format consisting of attribute names only may also be used. Likewise, although the "attribute name = attribute value" pairs are separated by commas, spaces or other symbols may be used instead, for example a format such as "attribute name 1 / attribute name 2 / attribute name 3 / ...". A format containing information other than attribute names and values may also be used, for example one in which three-item sets are repeated, such as "attribute name 1, attribute value type 1 (int, float, string, etc.), attribute value 1, attribute name 2, attribute value type 2 (int, float, string, etc.), attribute value 2, ...".
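Extracting attribute names from such records can be sketched as below. This is a minimal sketch, assuming records arrive as single strings; `attribute_names` and its `pair_sep`/`kv_sep` parameters are hypothetical, and the typed three-item format mentioned above would need a different parser.

```python
def attribute_names(record, pair_sep=",", kv_sep="="):
    """Return the attribute names of one record, in order of appearance.

    Items such as "date=602" yield "date"; items without `kv_sep` are taken
    as bare attribute names, which covers the names-only format."""
    names = []
    for item in record.split(pair_sep):
        item = item.strip()
        if item:                                  # skip empty items, e.g. a trailing separator
            names.append(item.split(kv_sep, 1)[0].strip())
    return names

print(attribute_names("subject=project_A, date=602, switch=on"))
# ['subject', 'date', 'switch']
print(attribute_names("subject/date/switch", pair_sep="/"))
# ['subject', 'date', 'switch']
```

Splitting on the first `kv_sep` only keeps values that themselves contain "=" from affecting the attribute name.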

The data are not limited to the above examples; data obtained via the Web or the Internet, for example, may also be used, as long as they can be expressed in the "attribute name = attribute value" format or by attribute names alone. Although FIG. 2 shows data containing two to six attribute names, a piece of data may contain any number of attribute names, as long as it contains at least one. Likewise, although eight pieces of data are stored in the data storage unit 101 in this example, more data may be stored.

Next, the operation of the information visualization apparatus 100 will be described with reference to the flowchart of FIG. 3.
In step S301, the appearance pattern generation unit 102 generates the appearance pattern list from the set of data stored in the data storage unit 101. The appearance pattern generation process in the appearance pattern generation unit 102 will be described in detail later with reference to FIG. 4.
In step S302, the tree structure construction unit 103 constructs the appearance pattern tree from the appearance pattern list. The tree structure construction process in the tree structure construction unit 103 will be described in detail later with reference to FIGS. 6, 9, and 10.
In step S303, the display unit 104 draws the appearance pattern tree. This completes the operation of the information visualization apparatus 100.

Next, the appearance pattern generation process performed by the appearance pattern generation unit 102 in step S301 will be described with reference to the flowchart of FIG. 4.
In step S401, the set of data stored in the data storage unit 101 as of the time step S301 is executed is received.
In step S402, one piece of the data received in step S401 is selected, and the attribute names it contains are extracted.
In step S403, an appearance pattern is generated from the one or more attribute names extracted in step S402.
In step S404, the appearance pattern generated in step S403 is compared with the appearance patterns in the appearance pattern list to determine whether an identical appearance pattern already exists in the list. If it does, the process proceeds to step S406; if not, it proceeds to step S405. When comparing appearance patterns, the order in which the attribute names appear in the data is ignored: any two patterns with the same combination of attribute names are treated as the same appearance pattern. For example, "subject, date, switch, id" and "date, id, subject, switch" are the same appearance pattern. For the first piece of data processed, the appearance pattern list is still empty, so there is nothing to compare against and its appearance pattern is simply added to the list.
The appearance pattern list may also be generated by receiving the data one at a time from the set of data stored in the data storage unit 101.
If a piece of data contains the same attribute name more than once, the duplicates can be merged into one.

In step S405, the new appearance pattern is added to the appearance pattern list.
In step S406, it is determined whether appearance patterns have been generated for all the data. If so, the appearance pattern generation process ends; if not, that is, if some data remain for which no appearance pattern has been generated, the process returns to step S402 and the same processing is repeated.
This completes the appearance pattern generation process.
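Steps S402 to S406 can be sketched as a single loop. This is a minimal sketch, assuming each record has already been reduced to a list of attribute names; `generate_patterns` is a hypothetical name, and using a `frozenset` both ignores name order and merges duplicate names, as the text requires. Only data (a), (i), (u), and (e), whose attribute names are quoted in this section, are used below.

```python
def generate_patterns(records):
    """Build the appearance pattern list from records given as lists of attribute names."""
    pattern_list = []
    seen = set()
    for names in records:                 # S402: take one piece of data
        pattern = frozenset(names)        # S403: its combination of attribute names
        if pattern not in seen:           # S404: is an identical pattern already listed?
            seen.add(pattern)
            pattern_list.append(pattern)  # S405: add the new pattern
    return pattern_list                   # S406: all data processed

records = [
    ["subject", "date", "positionx", "positiony", "id", "temp"],   # data (a)
    ["temp", "subject", "date", "positionx", "positiony", "id"],   # data (i)
    ["subject", "id", "positionx", "positiony", "magnitude"],      # data (u)
    ["subject", "date", "switch"],                                 # data (e)
]
print(len(generate_patterns(records)))  # 3: data (i) repeats the pattern of data (a)
```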

Next, a specific example of the appearance pattern generation process will be described with reference to FIG. 5.
FIG. 5 shows an example in which an appearance pattern list 500 is generated from the set of data shown in FIG. 2.
Specifically, the set of data as of the time the appearance pattern generation process is performed, that is, the set of data shown in FIG. 2, is first received. Next, data (a) is selected from the set in FIG. 2, the attribute names "subject, date, positionx, positiony, id, temp" are extracted from it, and the combination of these attribute names is generated as an appearance pattern. It is then determined whether an identical appearance pattern exists in the appearance pattern list; since this is the first piece of data processed, the pattern is stored in the list as appearance pattern 501 (A).
Because appearance patterns have not yet been generated for all the data, the attribute names "temp, subject, date, positionx, positiony, id" are likewise extracted from the next piece of data, (i), and an appearance pattern is generated. This pattern has the same combination of attribute names as appearance pattern 501 (A), generated from data (a), so it is not added to the list.
Because appearance patterns have still not been generated for all the data, the attribute names "subject, id, positionx, positiony, magnitude" are extracted from data (u) in FIG. 2 and an appearance pattern is generated. Since no identical pattern exists in the list, "subject, id, positionx, positiony, magnitude" is added as appearance pattern 501 (B). The same processing is performed for all the data. As a result, as shown in FIG. 5, appearance pattern 501 (C) is generated from data (e), 501 (D) from data (o), 501 (E) from data (ka), 501 (F) from data (ki), and 501 (G) from data (ku), and each is added to the appearance pattern list.

Although this example generates the appearance patterns in the order in which data (a) through (ku) are listed in FIG. 2, other orders, such as the reverse order, may be used. The appearance patterns may also be generated while keeping the appearance pattern list being built sorted by the number of attribute names, which makes comparisons between appearance patterns efficient. For example, an appearance pattern with five attribute names can only be identical to another appearance pattern with five attribute names, and keeping the list sorted by the number of attribute names makes those patterns easy to find.
Likewise, the appearance patterns may be generated while sorting the attribute names of each pattern in the list in dictionary order. Since this fixes a unique order for the attribute names of each appearance pattern, appearance patterns can easily be compared with one another.
In the present embodiment, the attribute names in an appearance pattern are separated by commas, but spaces, other symbols, and the like may be used instead.
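Both optimizations can be combined in a short sketch. The class and function names (`PatternList`, `canonical`) are hypothetical. In Python a hash set of `frozenset`s would already give constant-time duplicate checks; this sketch instead mirrors the text: names sorted in dictionary order give a unique key, and patterns are bucketed by size so a new pattern is compared only with patterns of the same size.

```python
from collections import defaultdict

def canonical(names):
    """Attribute names in dictionary order, duplicates removed: a unique key per combination."""
    return tuple(sorted(set(names)))

class PatternList:
    """Appearance pattern list bucketed by the number of attribute names."""

    def __init__(self):
        self.by_size = defaultdict(list)

    def add(self, names):
        """Add the pattern if it is new; return True if it was added."""
        key = canonical(names)
        bucket = self.by_size[len(key)]
        if key in bucket:          # linear scan, but only within one size class
            return False
        bucket.append(key)
        return True

    def patterns(self):
        """All patterns, ordered by number of attribute names."""
        return [p for size in sorted(self.by_size) for p in self.by_size[size]]

plist = PatternList()
print(plist.add(["subject", "date", "switch"]))   # True
print(plist.add(["switch", "subject", "date"]))   # False: same combination
print(plist.add(["subject", "id"]))               # True
print(plist.patterns())
# [('id', 'subject'), ('date', 'subject', 'switch')]
```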

Next, the tree structure construction process performed by the tree structure construction unit 103 in step S302 will be described with reference to the flowchart of FIG. 6.
In step S601, the list creation unit 105 extracts the attribute names from the appearance pattern list, calculates the appearance frequency of each attribute name in the list, and creates the appearance frequency list.
In step S602, the recombination unit 106 constructs the initial appearance pattern tree from the appearance pattern list.

In step S603, the recombination unit 106 refers to the appearance frequency list and repeatedly executes the recombination process on the initial appearance pattern tree to construct the appearance pattern tree.
Next, a specific example of the appearance frequency list created in step S601 will be described with reference to FIG. 7.
The appearance frequency list 700 is a table that associates each attribute name 701 with its appearance frequency 702. The appearance frequency 702 of an attribute name can be calculated by counting how many times that attribute name occurs in the appearance pattern list, that is, in how many distinct appearance patterns it occurs, not in how many pieces of data. The appearance frequency list 700 in FIG. 7 is calculated from the appearance pattern list in FIG. 5. For example, the attribute name 701 "subject" occurs five times in that list and is therefore stored with appearance frequency 702 "5", and "date" occurs three times and is stored with appearance frequency 702 "3".
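The appearance frequency list can be sketched with `collections.Counter`. The function name `appearance_frequency` is hypothetical, and the example uses only patterns 501 (A) to (C), whose attribute names are given in this section; the FIG. 7 values ("subject" 5, "date" 3) also depend on patterns (D) to (G), which are not reproduced here.

```python
from collections import Counter

def appearance_frequency(pattern_list):
    """For each attribute name, the number of appearance patterns in the list that contain it."""
    return Counter(name for pattern in pattern_list for name in set(pattern))

pattern_list = [
    {"subject", "date", "positionx", "positiony", "id", "temp"},   # 501 (A)
    {"subject", "id", "positionx", "positiony", "magnitude"},      # 501 (B)
    {"subject", "date", "switch"},                                 # 501 (C)
]
freq = appearance_frequency(pattern_list)
print(freq["subject"], freq["date"], freq["switch"])   # 3 2 1
```

Because the input is the deduplicated pattern list, data (a) and (i), which share a pattern, contribute to each count only once.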

Next, a specific example of the initial appearance pattern tree constructed in step S602 will be described with reference to FIG. 8.
The initial appearance pattern tree 800 shown in FIG. 8 is constructed from the appearance pattern list in FIG. 5: its parent node is the root node 801, and its child nodes are the nodes 802 to 808, each of which contains all the attribute names of one appearance pattern in the list. Appearance patterns (A) to (G) in FIG. 5 correspond to the child nodes 802 to 808, respectively. Here, the root node 801 is an empty node, but the root node may instead be marked with a symbol or the like.
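Step S602 can be sketched as follows, assuming a simple hypothetical `Node` class whose attribute names are held in a `frozenset` (an empty set can also serve as the φ node used later). The two patterns in the usage lines are placeholders, not data from the figures.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node holding zero or more attribute names (hypothetical representation)."""
    attrs: frozenset = frozenset()
    children: list = field(default_factory=list)


def initial_tree(patterns):
    """Step S602: an empty root node with one child node per appearance pattern."""
    return Node(frozenset(), [Node(frozenset(p)) for p in patterns])


root = initial_tree([{"subject", "date"}, {"object", "status"}])
print(len(root.children), [sorted(c.attrs) for c in root.children])
```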

Next, the details of the recombination process in step S603 will be described with reference to the flowchart of FIG. 9. The processing from step S901 to step S907 below is referred to as the recombination process 900.
In the first pass, the child node group of the root node in the initial appearance pattern tree is the processing target node group. A processing target node group is a group of one or more child nodes that share a common parent node and are subjected to the recombination process.
In step S901, it is determined whether a grouping operation can be performed on the processing target node group. If it can, the process proceeds to step S902. If it cannot, tree structure information, namely the attribute names contained in each node of the processing target node group and the relationships between the nodes, is returned.
Whether the grouping operation is possible may be determined by checking whether any attribute name is shared among the nodes of the processing target node group: if such a common attribute name exists, the grouping operation is possible; otherwise, it is not.
In step S902, the appearance frequency list is consulted, and among the attribute names contained in the processing target node group, the attribute name with the highest appearance frequency is extracted as a grouping target attribute name candidate. If several attribute names share the highest appearance frequency, each of them becomes a grouping target attribute name candidate.
In step S903, the evaluation process is performed for each grouping target attribute name candidate. The evaluation process will be described later with reference to FIG. 10.
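The checks in steps S901 and S902 can be sketched in Python as below. Two points are assumptions: an attribute name counts as "common" when at least two nodes of the group contain it, and, following the worked example later in this section (which picks candidates from the attribute names common among the nodes), only such shared names are considered as candidates. Candidates are returned in alphabetical order, which happens to match the evaluation order used in the worked examples. All function names are hypothetical.

```python
from collections import Counter


def shared_names(group):
    """Attribute names contained in at least two nodes of the group."""
    counts = Counter(name for node in group for name in node)
    return {name for name, n in counts.items() if n >= 2}


def can_group(group):
    """Step S901: grouping is possible if some attribute name is shared."""
    return bool(shared_names(group))


def candidates(group, freq):
    """Step S902: shared names with the highest global appearance frequency."""
    shared = shared_names(group)
    if not shared:
        return []
    top = max(freq[name] for name in shared)
    return sorted(name for name in shared if freq[name] == top)


# Nodes 807 and 808 (group N-2 in the worked example) with a subset of FIG. 7's frequencies.
n2 = [frozenset({"object", "status", "gate_num"}), frozenset({"object", "status"})]
freq = Counter({"subject": 5, "date": 3, "object": 2, "status": 2, "gate_num": 1})
print(can_group(n2), candidates(n2, freq))
```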

In step S904, the grouping target attribute name is determined from the results of the evaluation process. When there are several grouping target attribute name candidates, the grouping target attribute name may be determined from the results of the evaluation process in step S903. In the present embodiment, the candidate with the smallest evaluation value is chosen, and the common attribute name group obtained when that candidate was evaluated becomes the grouping target attribute name. The common attribute name group will be described later.
In step S905, the recombination process 900 is executed with the nodes of the processing target node group that do not contain the grouping target attribute name as the new processing target node group. That is, the entire process shown in FIG. 9 is called recursively.
In step S906, a grouping operation that factors out the grouping target attribute name is performed on the nodes of the processing target node group that contain it. Specifically, one child node is generated for each such node, and the attribute names other than the grouping target attribute name are moved to that child node. If a node contains no attribute name other than the grouping target attribute name, its child node contains no attribute name at all; such a node is denoted by the symbol φ. The nodes that now contain only the grouping target attribute name are then merged into one node, and each generated child node becomes a child of the merged node. Although the present embodiment uses the symbol φ to denote a node with no attribute names, another symbol may be used instead, or the node may simply be left empty. Alternatively, a common parent node may be generated for the nodes that contain the grouping target attribute name, and the grouping target attribute name may be moved to that parent node.
In step S907, the recombination process 900 is executed with the child node group of the node containing only the grouping target attribute name, that is, the children of the node merged in step S906, as the new processing target node group. That is, the entire process shown in FIG. 9 is again called recursively.
When step S907 is completed, the tree structure information recombined by the recombination process is returned. By this procedure, the recombination process is executed recursively and the appearance pattern tree is constructed.
In the present embodiment, the recombination process is repeated by recursive calls. Alternatively, each time the recombination process is executed, the tree may be searched for a node group on which the grouping operation is possible, that is, a group of child nodes that share a parent node and in which some attribute name is common to the nodes, and the recombination process may be repeated until no such node group remains.
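The whole recombination process 900 (steps S901 to S907, together with the evaluation of FIG. 10) can be sketched as one recursive Python function that takes the attribute-name sets of a processing target node group and returns the rebuilt sibling nodes. Assumptions: `Node`, `evaluate`, `recombine`, and `show` are hypothetical names; candidates are limited to names shared by at least two nodes and evaluated in alphabetical order; on ties the candidate evaluated first is kept; an empty set stands for the φ node. The pattern data is reconstructed from the counts quoted in the worked example, not copied from FIG. 5.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    attrs: frozenset                                 # empty frozenset plays the role of φ
    children: list = field(default_factory=list)


def evaluate(x, group):
    """FIG. 10: (evaluation value, common attribute name group) for candidate x."""
    holders = [s for s in group if x in s]
    common = frozenset.intersection(*holders)        # x plus every Y shared by all holders
    outside = sum(len(s - common) for s in group)    # names left outside the common group
    return len(common) + outside, common


def recombine(group, freq):
    """Recombination process 900 on a list of attribute-name sets; returns sibling nodes."""
    counts = Counter(name for s in group for name in s)
    shared = sorted(name for name, n in counts.items() if n >= 2)
    if not shared:                                   # S901: no grouping possible
        return [Node(s) for s in group]
    top = max(freq[name] for name in shared)
    best_value, best_common = None, None
    for x in (name for name in shared if freq[name] == top):  # S902-S904
        value, common = evaluate(x, group)
        if best_value is None or value < best_value:  # strict '<': first candidate wins ties
            best_value, best_common = value, common
    inside = [s for s in group if best_common <= s]
    outside = [s for s in group if not best_common <= s]
    merged = Node(best_common, recombine([s - best_common for s in inside], freq))  # S906, S907
    return [merged] + recombine(outside, freq)       # S905


PATTERNS = [
    frozenset({"subject", "positionx", "positiony", "id", "date", "temp"}),
    frozenset({"subject", "positionx", "positiony", "id", "magnitude"}),
    frozenset({"subject", "date", "switch"}),
    frozenset({"subject", "positionx", "positiony", "post_num"}),
    frozenset({"subject", "date", "id", "humidity"}),
    frozenset({"object", "status", "gate_num"}),
    frozenset({"object", "status"}),
]
FREQ = Counter(name for p in PATTERNS for name in p)  # appearance frequency list (S601)
root = Node(frozenset(), recombine(PATTERNS, FREQ))   # root of the appearance pattern tree


def show(node, depth=0):
    """Print the tree, one node per line; φ marks an empty non-root node."""
    label = ", ".join(sorted(node.attrs)) or ("(root)" if depth == 0 else "φ")
    print("  " * depth + label)
    for child in node.children:
        show(child, depth + 1)


show(root)
```

With this reconstructed data, the sketch places “subject” and “object, status” directly below the root and produces a node “date, temp” below “positionx, positiony” and “id”, which agrees with the description of node 1501 in this section.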

Next, the evaluation process in step S903 will be described with reference to the flowchart of FIG. 10.
In step S1001, for the grouping target attribute name candidate X being evaluated, the attribute names Y other than X that are present in every node of the processing target node group containing X are extracted. There may be no such attribute name Y, or there may be several.
In step S1002, the evaluation value of the candidate X is calculated as the sum of the number of attribute names in the common attribute name group (X together with the attribute names Y) and the number of attribute names outside the common attribute name group contained in the nodes of the processing target node group. When step S1002 is completed, the evaluation value is returned, and the evaluation process ends.
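Steps S1001 and S1002 can be summarized as a formula. The notation is introduced here for illustration only: N is the processing target node group, each node is identified with the set of attribute names it contains, N_X is the set of nodes containing the candidate X, C_X is the common attribute name group (X together with all attribute names Y), and E(X) is the evaluation value. Step S904 then picks the candidate with the smallest E(X).

```latex
N_X = \{\, n \in N : X \in n \,\}, \qquad
C_X = \bigcap_{n \in N_X} n, \qquad
E(X) = |C_X| + \sum_{n \in N} \lvert n \setminus C_X \rvert
```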

In the present embodiment, the recombination process is applied first to the nodes of the processing target node group that do not contain the grouping target attribute name, but it may instead be applied first to the nodes that do contain it.
Next, a specific example of the recombination process of the appearance pattern tree shown in FIG. 9 and the evaluation process shown in FIG. 10 will be described with reference to the progression shown in FIGS. 11 to 14.
FIG. 11 shows an example of extracting grouping target attribute name candidates from the initial appearance pattern tree. Since the child node group of the root node in the initial appearance pattern tree is the first processing target node group, it is first determined whether the grouping operation can be performed on this group. Here, attribute names such as “subject” and “date” are shared among nodes of the processing target node group, so the grouping operation is determined to be possible.
Next, among the attribute names shared among the nodes of the processing target node group, the attribute name with the highest appearance frequency is extracted. Referring to the appearance frequency list in FIG. 7, the attribute name “subject” has the highest appearance frequency, 5, so “subject” is extracted as the grouping target attribute name candidate.

Next, the evaluation process is performed on the grouping target attribute name candidate “subject”.
First, the attribute names other than “subject” that are present in every node of the processing target node group containing “subject” are extracted. Here, there is no such attribute name.
Next, the evaluation value of the candidate “subject” is calculated. The common attribute name group consists of “subject” plus the attribute names other than “subject” shared by every node containing it, so it contains 1 + 0 = 1 attribute name. The numbers of attribute names outside the common attribute name group in the child nodes 802 to 808 are 5, 4, 2, 3, 3, 3, and 2, respectively, for a total of 22. The evaluation value of the candidate “subject” is therefore 1 + 22 = 23.
Here, “subject” is the only grouping target attribute name candidate and therefore the one with the smallest evaluation value, so the common attribute name group obtained when evaluating it, “subject”, is determined to be the grouping target attribute name.
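Writing C for the common attribute name group and E for the evaluation value (notation introduced here for illustration), the calculation for “subject” at the top level reads as follows, using the per-node counts for nodes 802 to 808 given above:

```latex
C_{\mathrm{subject}} = \{\mathrm{subject}\}, \qquad
E(\mathrm{subject}) = 1 + (5 + 4 + 2 + 3 + 3 + 3 + 2) = 1 + 22 = 23
```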

As a result, the processing target node group of the initial appearance pattern tree in FIG. 11 can be divided into the node group N-1, whose nodes contain the grouping target attribute name “subject”, and the node group N-2, whose nodes do not.

Next, the recombination process is performed with the node group that does not contain the grouping target attribute name “subject”, namely the node group N-2, as the new processing target node group. Here, the attribute names “object” and “status” are shared by several nodes of the processing target node group, so the grouping operation is determined to be possible. Next, among the attribute names shared among the nodes of the processing target node group, the one with the highest appearance frequency is extracted. Referring to the appearance frequency list in FIG. 7, “object” and “status” both have the highest appearance frequency, 2, so both are extracted as grouping target attribute name candidates, and the evaluation process is performed on each. First, for the candidate “object”, the attribute names other than “object” that are present in every node of the processing target node group containing “object” are extracted; this yields “status”. The common attribute name group therefore consists of “object” and “status”, that is, 1 + 1 = 2 attribute names. The only attribute name outside the common attribute name group in the processing target node group is “gate_num” in the child node 807, so the evaluation value of the candidate “object” is 2 + 1 = 3.
Similarly, for the candidate “status”, the attribute name other than “status” present in every node containing “status” is “object”. Again, the only attribute name outside the common attribute name group is “gate_num” in the child node 807, so the evaluation value of the candidate “status” is also 2 + 1 = 3.
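For node group N-2 (node 807 containing “object, status, gate_num” and node 808 containing “object, status”), both candidates lead to the same common attribute name group C, which is why their evaluation values E tie (notation introduced here for illustration):

```latex
C_{\mathrm{object}} = C_{\mathrm{status}} = \{\mathrm{object}, \mathrm{status}\}, \qquad
E(\mathrm{object}) = E(\mathrm{status}) = 2 + 1 = 3
```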

Next, the common attribute name group obtained when evaluating the candidate with the smallest evaluation value becomes the grouping target attribute name; in this case, however, both candidates have the same, minimum evaluation value. In the present embodiment, when several grouping target attribute name candidates share the smallest evaluation value, the candidate evaluated first is selected. That is, the candidate “object” is selected, and its common attribute name group “object, status” is determined to be the grouping target attribute name.
Although the present embodiment selects the candidate evaluated first when several candidates share the smallest evaluation value, the choice is not limited to this: the candidate whose common attribute name group contains the most attribute names may be selected instead. If several candidates still remain tied, any one of them may be selected.

Next, the grouping operation for the grouping target attribute name “object, status” is performed. Specifically, one child node is first generated for each node of the processing target node group that contains “object, status”, namely the nodes 807 and 808 in FIG. 11, and the attribute names other than the grouping target attribute name are moved to the respective child nodes. As a result, as shown in FIG. 12, node 807 is split into a node 1201-1 containing only “object, status” and its child node 1202, and node 808 is split into a node 1201-2 containing only “object, status” and its child node 1203. Since node 1203 contains no attribute name, it is indicated by the symbol φ.
Next, the nodes 1201-1 and 1201-2, which contain only the grouping target attribute name “object, status”, are merged into one node 1301, and the nodes 1202 and 1203 become child nodes of node 1301. The result is the tree structure shown in FIG. 13. The recombination process (step S900) is then performed with the child node group of node 1301 in the processing target node group N-2, that is, the node group N-2-c consisting of the nodes 1202 and 1203, as the new processing target node group. Because no attribute name is shared among the nodes of N-2-c, the grouping operation is determined to be impossible, and the recombination process (step S900) for N-2-c ends. As a result, the recombination process (step S900) for the processing target node group N-2 also ends.
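The grouping operation of step S906 on its own can be sketched as below, using plain Python sets; `group_out` is a hypothetical helper. It returns the merged node's attribute names, one child set per grouped node (an empty set plays the role of the φ node), and the nodes that were left untouched.

```python
def group_out(nodes, target):
    """Step S906: factor `target` out of every node that contains all of it.

    Returns (merged, children, untouched): the attribute names of the merged
    node, one child set per grouped node (empty set = φ), and the nodes that
    did not contain the target.
    """
    target = frozenset(target)
    grouped = [n for n in nodes if target <= n]
    untouched = [n for n in nodes if not target <= n]
    children = [n - target for n in grouped]
    return target, children, untouched


# Nodes 807 and 808 from group N-2.
merged, children, untouched = group_out(
    [frozenset({"object", "status", "gate_num"}), frozenset({"object", "status"})],
    {"object", "status"},
)
print(sorted(merged), [sorted(c) for c in children], untouched)
```

Here the two children correspond to nodes 1202 (“gate_num”) and 1203 (φ) below the merged node 1301.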

Next, the grouping operation is performed on the processing target node group of the initial appearance pattern tree. Since the grouping target attribute name has been determined to be “subject”, one child node is generated for each node of the processing target node group that contains it, namely the nodes 802 to 806 in FIG. 11, and the attribute names other than the grouping target attribute name are moved to the respective child nodes. The nodes that now contain only the grouping target attribute name “subject” are then merged into one node 1401, and the nodes 1402 to 1406 become child nodes of node 1401. The result is shown in FIG. 14.

Next, the recombination process is performed with the child node group of node 1401, which contains only the grouping target attribute name “subject”, namely the node group N-1-c consisting of the nodes 1402 to 1406, as the new processing target node group.
In this case the grouping operation is possible, and the attribute names with the highest appearance frequency, “date”, “id”, “positionx”, and “positiony”, are each evaluated as grouping target attribute name candidates.
For example, when the evaluation value of the candidate “positionx” is calculated, the attribute name other than “positionx” present in every node of N-1-c containing “positionx” is “positiony”. The common attribute name group therefore consists of the two attribute names “positionx, positiony”, and the numbers of attribute names outside the common attribute name group in the nodes 1402 to 1406 are, in order, 3 (“date, id, temp”), 2 (“id, magnitude”), 2 (“date, switch”), 1 (“post_num”), and 3 (“date, id, humidity”).

The total is therefore 11, and the evaluation value of the candidate “positionx” is 2 + 11 = 13. Calculating the evaluation values of the candidates “date”, “id”, and “positiony” in the same way gives 15, 15, and 13, respectively. As a result, among the candidates with the smallest evaluation value, the one evaluated first, “positionx”, is selected, and its common attribute name group “positionx, positiony” is determined to be the grouping target attribute name.
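The four evaluation values for N-1-c can be checked with a short Python sketch of steps S1001 and S1002. The node contents come from the lists above; which of the nodes 1402 to 1406 also contain “positionx, positiony” is inferred from the per-node counts quoted in this section, so treat the data as a reconstruction. `evaluate` and `N1C` are hypothetical names.

```python
def evaluate(x, group):
    """Steps S1001-S1002: (evaluation value, common attribute name group) for candidate x."""
    holders = [s for s in group if x in s]
    common = frozenset.intersection(*holders)       # x and every Y shared by all holders
    outside = sum(len(s - common) for s in group)   # names not in the common group
    return len(common) + outside, common


# Node group N-1-c (nodes 1402-1406), reconstructed from the counts in the text.
N1C = [
    frozenset({"positionx", "positiony", "date", "id", "temp"}),   # 1402
    frozenset({"positionx", "positiony", "id", "magnitude"}),      # 1403
    frozenset({"date", "switch"}),                                 # 1404
    frozenset({"positionx", "positiony", "post_num"}),             # 1405
    frozenset({"date", "id", "humidity"}),                         # 1406
]

values = {x: evaluate(x, N1C)[0] for x in ["date", "id", "positionx", "positiony"]}
best = min(values, key=values.get)  # min() keeps the first key among equal minima
print(values, best, sorted(evaluate(best, N1C)[1]))
```

Evaluating the candidates in alphabetical order and letting `min` keep the first of equal values mirrors the "evaluated first" tie-break described above.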

Thereafter, in the same way, the appearance pattern tree can be constructed by recursively executing the recombination process on the initial appearance pattern tree until no further grouping operation is possible. FIG. 15 shows the final appearance pattern tree generated by the tree structure building unit 103 in the present embodiment.
Performing the recombination process in this way yields a tree structure that represents the appearance pattern list more compactly, and drawing this tree makes it easy to see which attribute names the data in the data set contain.
Furthermore, by performing the recombination process with attribute names of high appearance frequency given priority as grouping target attribute name candidates, the total number of attribute names contained in the nodes of the appearance pattern tree is reduced more effectively, so the appearance pattern list can be drawn as a simpler tree structure. In addition, attribute names closer to the root node have higher appearance frequencies, so it is easy to see which attribute names occur most often in the appearance pattern list.

In the present embodiment, attribute names with different appearance frequencies may be contained in one node, as in node 1501 shown in FIG. 15. Alternatively, a node containing attribute names with different appearance frequencies may be split so that no node contains attribute names with different appearance frequencies.
For example, node 1501 contains the attribute name “date”, whose appearance frequency is 3, and the attribute name “temp”, whose appearance frequency is 1, so it may be split into a node containing “date” and a child node containing “temp”. This makes the differences in appearance frequency between attribute names easier to read from the appearance pattern tree.
If one or more attribute names are common to all appearance patterns in the appearance pattern list, the tree structure may be drawn with the root node omitted.
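One way to perform this split is sketched below. It assumes the attribute names are ordered by descending appearance frequency and that each distinct frequency level becomes one node in a parent-to-child chain; the text only describes the two-level case of node 1501, so the chain ordering for more levels is an assumption, and `split_by_frequency` is a hypothetical helper.

```python
from itertools import groupby


def split_by_frequency(attrs, freq):
    """Split one node's attribute names into a parent-to-child chain of nodes,
    one node per appearance-frequency level, most frequent first."""
    ordered = sorted(attrs, key=lambda name: (-freq[name], name))
    return [set(level) for _, level in groupby(ordered, key=lambda name: freq[name])]


# Node 1501: "date" (frequency 3) and "temp" (frequency 1).
chain = split_by_frequency({"date", "temp"}, {"date": 3, "temp": 1})
print(chain)
```

In a full tree, any children of the original node would be attached below the last node of the chain.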

Alternatively, the number of occurrences of each appearance pattern in the data set may be counted, and nodes representing these occurrence counts may be added to the appearance pattern tree when it is drawn. The occurrence count indicates how many times data having each appearance pattern appears in the data set stored in the data storage unit, that is, how many such data items are stored.
FIG. 16 shows an example in which nodes representing the occurrence count of each appearance pattern are added to the appearance pattern tree.
For example, node 1601, labeled “2”, is added at the end of the leftmost branch. This shows that the appearance pattern “subject, positionx, positiony, id, date, temp” occurs twice in the data set. Empty nodes that contain no attribute name (φ nodes) may also be omitted from the drawing.
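Counting occurrences takes a single pass over the stored data. The sketch assumes each data item is a Python dict of “attribute name = attribute value” pairs (as in the uTuple format described earlier) and that its appearance pattern is the set of its keys; the records and the name `pattern_counts` are made up for illustration and are not taken from the figures.

```python
from collections import Counter


def pattern_counts(records):
    """Occurrence count of each appearance pattern (the set of attribute names)."""
    return Counter(frozenset(record) for record in records)


# Hypothetical sample data, not taken from the patent's figures.
records = [
    {"subject": "s1", "positionx": 1, "positiony": 2, "id": 7, "date": "d1", "temp": 20},
    {"subject": "s2", "positionx": 3, "positiony": 4, "id": 8, "date": "d2", "temp": 21},
    {"object": "gate", "status": "open"},
]
counts = pattern_counts(records)
for pattern, n in counts.items():
    print(sorted(pattern), n)
```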

Next, a display example of the display unit 104 will be described.
The display unit 104 draws, for example, the appearance pattern tree shown in FIG. 15 on a display device such as a monitor. In the present embodiment, the appearance pattern tree is drawn as a two-dimensional tree structure, but it may instead be drawn as a three-dimensional tree structure using a technique such as Cone Trees (see, for example, George G. Robertson, Jock D. Mackinlay, Stuart K. Card, "Cone Trees: Animated 3D Visualizations of Hierarchical Information," Proceedings of ACM Conference on Human Factors in Computing Systems (CHI'91), ACM Press, 1991, pp. 189-194). The parent-child relationships between the nodes of the tree may also be drawn using a data jewel box, a Venn diagram, or a similar technique that represents them as containment between two-dimensional polygons or closed curves (see, for example, Takayuki Ito, Yasumasa Kajinaga, Yuko Ikehata, "Data Jewel Box: A Graphics Showcase for Large-Scale Hierarchical Data," Information Processing Society of Japan, Graphics & CAD Study Group, 2001-CG-104, pp. 65-70, 2001).
Although not shown, the tree may also be drawn on a paper medium or the like by a printer connected to the information visualization apparatus 100 shown in FIG. 1. Furthermore, the drawing image of the appearance pattern tree and its structure information, that is, the connection information between nodes and the attribute names contained in each node, may be sent to a client computer connected to the information visualization apparatus 100 via a network. Such display processing may use an existing drawing library, so a detailed description is omitted here.

According to the information visualization apparatus of the present embodiment described above, appearance patterns, each indicating a combination of attribute names, are generated from a set of diverse data, and an appearance pattern tree that draws the list of appearance patterns as a tree structure is constructed. This gives an overview of which combinations of attribute names the recorded data contain, so a diverse data set can be grasped easily.
Further, because one node can hold a plurality of attribute names, the number of nodes in the appearance pattern tree can be reduced, yielding a simplified tree structure.
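As a rough illustration of the generation step, the sketch below parses records written as comma-separated "key=value" pairs (a simplified, hypothetical stand-in for the uTuple format, whose exact syntax is not reproduced here), takes each record's set of attribute names as its appearance pattern, keeps the distinct patterns as the appearance pattern list, and counts in how many patterns each attribute name occurs, which is one reading of the appearance frequency list. The function names and the alphabetical tie-breaking are assumptions.

```python
from collections import Counter


def parse_record(text: str) -> dict[str, str]:
    """Parse 'name=value, name=value, ...' into {attribute name: attribute value}."""
    record = {}
    for item in text.split(","):
        name, _, value = item.partition("=")
        record[name.strip()] = value.strip()
    return record


def appearance_patterns(records: list[dict[str, str]]) -> list[frozenset[str]]:
    """Return the distinct attribute-name combinations, in order of first appearance."""
    patterns: list[frozenset[str]] = []
    for record in records:
        pattern = frozenset(record)  # attribute names only; values are ignored
        if pattern not in patterns:
            patterns.append(pattern)
    return patterns


def appearance_frequency_list(patterns: list[frozenset[str]]) -> list[tuple[str, int]]:
    """Count how many appearance patterns contain each attribute name, most frequent first."""
    counts = Counter(name for pattern in patterns for name in pattern)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    data = [
        "sensor ID=1, time=3, temperature=20",
        "sensor ID=2, time=4, temperature=21",
        "sensor ID=3, time=5, humidity=40",
    ]
    patterns = appearance_patterns([parse_record(line) for line in data])
    print(len(patterns))  # 2
    print(appearance_frequency_list(patterns))
    # [('sensor ID', 2), ('time', 2), ('humidity', 1), ('temperature', 1)]
```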

Furthermore, among one or more nodes having a common parent node, attribute names that appear in common in those nodes are grouped (factored out) into a single node. Because grouping is possible even when the child nodes share only some of their attribute names rather than all of them, the appearance pattern list can be drawn as an even simpler tree structure.
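The grouping step can be pictured with the minimal sketch below. It assumes a simple tree in which every node holds a set of attribute names and the appearance patterns are the unions of names along root-to-leaf paths; the `Node` class and the `group_out`, `patterns`, and `total_names` helpers are hypothetical. Children of one parent that contain the target name are moved under a new node holding only that name, and the name is removed from each of them, so partial overlap is enough. Nodes left empty are kept, consistent with nodes holding zero attribute names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node holding zero or more attribute names."""
    names: set[str]
    children: list[Node] = field(default_factory=list)


def patterns(node: Node, above: frozenset[str] = frozenset()) -> set[frozenset[str]]:
    """Appearance patterns represented by the tree: name sets along root-to-leaf paths."""
    here = above | node.names
    if not node.children:
        return {here}
    result: set[frozenset[str]] = set()
    for child in node.children:
        result |= patterns(child, here)
    return result


def total_names(node: Node) -> int:
    """Total number of attribute names stored in all nodes of the subtree."""
    return len(node.names) + sum(total_names(child) for child in node.children)


def group_out(parent: Node, target: str) -> bool:
    """Move every child of `parent` containing `target` under one new node {target}.

    Returns False and leaves the tree unchanged if fewer than two children share `target`.
    """
    matching = [child for child in parent.children if target in child.names]
    if len(matching) < 2:
        return False
    rest = [child for child in parent.children if target not in child.names]
    for child in matching:
        child.names.discard(target)
    parent.children = rest + [Node({target}, matching)]
    return True


if __name__ == "__main__":
    root = Node(set(), [
        Node({"sensor ID", "time", "temperature"}),
        Node({"sensor ID", "time", "humidity"}),
        Node({"sensor ID", "light"}),
    ])
    before = patterns(root)
    print(total_names(root))  # 8
    group_out(root, "sensor ID")
    print(total_names(root), patterns(root) == before)  # 6 True
```

The represented patterns stay the same while the stored name count drops, which is the simplification the paragraph above describes.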
In addition, by performing the recombination process with attribute names of higher appearance frequency preferentially used as the grouping target attribute name, the total number of attribute names contained in the nodes of the appearance pattern tree can be reduced, and the appearance pattern list can be drawn as a simpler tree structure. Moreover, because nodes closer to the root node contain attribute names of higher appearance frequency, it is easy to grasp which attribute names appear most often in the appearance pattern list.
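Frequency-prioritized recombination, repeated over the whole tree, might look like the sketch below. Assumptions: the starting tree is flat (one child of the root per appearance pattern), appearance frequency means the number of appearance patterns containing a name, ties are broken alphabetically rather than by an evaluation value, and a name is grouped only when at least two siblings share it. `Node`, `group_out`, `frequency_order`, and `recombine` are hypothetical names.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node holding zero or more attribute names."""
    names: set[str]
    children: list[Node] = field(default_factory=list)


def group_out(parent: Node, target: str) -> bool:
    """Move the children of `parent` that contain `target` under one new node {target}."""
    matching = [c for c in parent.children if target in c.names]
    if len(matching) < 2:
        return False
    rest = [c for c in parent.children if target not in c.names]
    for c in matching:
        c.names.discard(target)
    parent.children = rest + [Node({target}, matching)]
    return True


def frequency_order(patterns: list[set[str]]) -> list[str]:
    """Attribute names sorted by how many appearance patterns contain them (ties: by name)."""
    counts = Counter(name for p in patterns for name in p)
    return sorted(counts, key=lambda name: (-counts[name], name))


def recombine(node: Node, order: list[str]) -> None:
    """Group shared names at this level in frequency order, then recurse into the children."""
    for name in order:
        group_out(node, name)
    for child in node.children:
        recombine(child, order)


def total_names(node: Node) -> int:
    """Total number of attribute names stored in all nodes of the subtree."""
    return len(node.names) + sum(total_names(c) for c in node.children)


if __name__ == "__main__":
    pattern_list = [{"sensor ID", "time", "temperature"},
                    {"sensor ID", "time", "humidity"},
                    {"sensor ID", "light"}]
    root = Node(set(), [Node(set(p)) for p in pattern_list])  # flat initial tree
    recombine(root, frequency_order(pattern_list))
    print(total_names(root))  # 5: each attribute name is now stored once
```

In the result, "sensor ID" (in three patterns) sits directly under the root and "time" (in two) one level below it, matching the observation that names nearer the root appear more often.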
Furthermore, when one or more nodes having a common parent node offer a plurality of grouping target attribute name candidates, the candidate with the smallest evaluation value is selected, and the common attribute name group obtained when that candidate was evaluated is used as the grouping target attribute name. This reduces the total number of attribute names contained in the nodes of the appearance pattern tree, so the appearance pattern list can be drawn as a simpler tree structure.
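The tie-breaking rule can be sketched as follows, using the evaluation value defined in claim 5. This is one interpretation, not a confirmed implementation: sibling nodes are modelled as plain sets of attribute names, the common attribute name group of a candidate is taken to be the names shared by every sibling containing that candidate (claim 4), and the evaluation value adds the size of that group to the number of names every sibling would still hold afterwards. `evaluate` and `choose_candidate` are hypothetical names.

```python
def evaluate(siblings: list[frozenset[str]], candidate: str) -> tuple[int, frozenset[str]]:
    """Return (evaluation value, common attribute name group) for grouping `candidate`."""
    containing = [s for s in siblings if candidate in s]
    if not containing:
        raise ValueError(f"no sibling contains {candidate!r}")
    # Names held by every sibling that contains the candidate (the candidate included).
    common = frozenset.intersection(*containing)
    value = len(common)  # names stored once in the new grouped node
    for s in siblings:
        # Grouped siblings keep only what remains after removing the common group.
        value += len(s - common) if candidate in s else len(s)
    return value, common


def choose_candidate(siblings: list[frozenset[str]],
                     tied: list[str]) -> tuple[str, frozenset[str]]:
    """Among equally frequent candidates, pick the one with the smallest evaluation value."""
    best = min(tied, key=lambda name: (evaluate(siblings, name)[0], name))
    return best, evaluate(siblings, best)[1]


if __name__ == "__main__":
    siblings = [
        frozenset({"sensor ID", "time", "temperature"}),
        frozenset({"sensor ID", "time", "temperature", "humidity"}),
        frozenset({"camera ID", "image"}),
        frozenset({"camera ID", "video"}),
    ]
    print(evaluate(siblings, "sensor ID")[0], evaluate(siblings, "camera ID")[0])  # 8 10
    print(choose_candidate(siblings, ["sensor ID", "camera ID"])[0])  # sensor ID
```

Here both candidates appear in two siblings, but grouping "sensor ID" also pulls "time" and "temperature" into the shared node, so its evaluation value is lower.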
The information visualization apparatus according to the present embodiment can also be realized by a computer and a program; the program can be recorded on a recording medium or provided through a network.

In short, the present invention is not limited to the above embodiment as it is; at the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the embodiment. For example, some constituent elements may be omitted from all those shown in the embodiment. Furthermore, constituent elements of different embodiments may be combined as appropriate.

DESCRIPTION OF SYMBOLS 100 ... information visualization apparatus, 101 ... data storage unit, 102 ... appearance pattern generation unit, 103 ... tree structure construction unit, 104 ... display unit, 105 ... list creation unit, 106 ... recombination unit, 201, 701 ... attribute name, 202 ... attribute value, 203 ... attribute, 204 ... data, 500 ... appearance pattern list, 501 ... appearance pattern, 700 ... appearance frequency list, 702 ... appearance frequency, 800 ... initial appearance pattern tree, 801 ... root node, 802 to 808, 1201-1, 1201-2, 1202, 1203, 1301, 1401 to 1406, 1501, 1601 ... nodes.

Claims

An information visualization apparatus comprising:
generation means for generating, from a set of data each including one or more attribute names, at least an appearance pattern that is the combination of attribute names included in one piece of data; and
construction means for constructing, from the list of appearance patterns, an appearance pattern tree in which each node of the tree structure includes zero or more attribute names and the set of attribute names, each appearing once on a path from the root node to a leaf node, represents an appearance pattern,
wherein the construction means performs a recombination process of extracting at least one grouping target attribute name that is commonly included in one or more nodes having a common parent node in the appearance pattern tree, and grouping the grouping target attribute name into a single node.

The information visualization apparatus according to claim 1, wherein the construction means repeatedly executes the recombination process.

The information visualization apparatus according to claim 1, wherein the construction means creates an appearance frequency list in which each attribute name included in the appearance patterns is associated with its appearance frequency over all the appearance patterns, and
the construction means comprises recombination means for constructing the appearance pattern tree by executing, based on the appearance frequency list, the recombination process with attribute names of higher appearance frequency preferentially used as the grouping target attribute name.

The information visualization apparatus according to any one of claims 1 to 3, wherein, in the node group containing the grouping target attribute name, the construction means extracts the attribute names contained together with the grouping target attribute name, and groups into a single node, as a new grouping target attribute name, a common attribute name group that combines the grouping target attribute name with the attribute names contained together with it.

The information visualization apparatus according to claim 4, wherein, when there are a plurality of grouping target attribute names having the same appearance frequency, the recombination means calculates, for each grouping target attribute name, an evaluation value that is the sum of the number of attribute names included in the common attribute name group and the number of attribute names included in each node excluding the common attribute name group, and groups into a single node, as the grouping target attribute name, the common attribute name group of the grouping target attribute name having the smallest evaluation value.

The information visualization apparatus according to any one of claims 1 to 5, wherein the data is in a data format including at least one or more pairs, each consisting of an attribute name and a value corresponding to the attribute name.

The information visualization apparatus according to claim 1, wherein the data is uTuple format data.

An information visualization method executed by an information visualization apparatus comprising generation means and construction means, the method comprising:
the generation means generating, from a set of data each including one or more attribute names, at least an appearance pattern that is the combination of attribute names included in one piece of data; and
the construction means constructing, from the list of appearance patterns, an appearance pattern tree in which each node of the tree structure includes zero or more attribute names and the set of attribute names, each appearing once on a path from the root node to a leaf node, represents an appearance pattern, extracting at least one grouping target attribute name that is commonly included in one or more nodes having a common parent node in the appearance pattern tree, and performing a recombination process of grouping the grouping target attribute name into a single node.

A program for causing a computer to function as each means of the information visualization apparatus according to any one of claims 1 to 7.