JP3636682B2

JP3636682B2 - Data analysis apparatus and method

Info

Publication number: JP3636682B2
Application number: JP2001265457A
Authority: JP
Inventors: 時彦丹羽; 眞弓比嘉; 一義田中; 健二藤川
Original assignee: 時彦丹羽; 眞弓比嘉; 株式会社日立システムアンドサービス
Priority date: 2001-09-03
Filing date: 2001-09-03
Publication date: 2005-04-06
Anticipated expiration: 2021-09-03
Also published as: JP2003075202A

Description

【０００１】
【発明の技術分野】
本発明は、多変量データを分析するデータ分析装置に関する。特に、多変量データを、ユーザ操作に従って視覚的に分析することのできるデータ分析装置に関する。
【０００２】
【従来の技術および課題】
大量のデータを入力して、データ分析を行うツールとしてデータマイニングツールが知られている。このようなデータマイニングツールを用いてデータ分析を行うことにより、そのデータにおける傾向やデータ間の関係等といった複雑な分析情報を抽出することができる。ユーザは、この分析情報に基づいて、判定や予測などの意志決定を行う。
【０００３】
例えば、このようなデータマイニングツールにおいては、多変量データをクラスタ分類し、各クラスタにおけるデータ変量の影響度を様々な角度から視覚的に表示させることができる。
【０００４】
しかしながら、この場合のクラスタ分類は、ソフトウェアによってブラックボックス的に行われるため、ユーザにクラスタ分類にかかる専門的な知識がなければ、分析結果として表示された内容を理解することはできない。すなわち、多変量データのどの変量に着目することによって得られた結果であるのかを理解することは容易ではない。
【０００５】
また、他の従来技術として、図１４に示すようなレーダーチャートによる表示方法がある。レーダーチャートとは、多変量データの各変量を中心から放射状に伸びる数値軸上にプロットし、各プロット点を線で結ぶことにより１つの多変量データを表現したものである。したがって、複数の線が表示された場合に、ほぼ同形状を構成する線同士が、１つのクラスタであると判別できる。
【０００６】
しかしながら、レーダーチャートに表示する多変量データの数が増加すると、多変量データを表現する線の重なり部分が増加するため、クラスタの判別が困難になってしまう。
【０００７】
本発明は、このような課題を解決するためになされたものであり、多変量データを視覚的に提示して、ユーザ自身の操作においてデータのクラスタ分類を行うことのできるデータ分析装置を提供することを目的とする。さらに、クラスタ分類を行った多変量データに基づいて、当該クラスタをより明瞭に表示することのできるデータ分析装置を提供することを目的とする。
【０００８】
【課題を解決するための手段および発明の効果】
(1)(2)(3)この発明にかかるデータ分析装置、データ分析装置をコンピュータを用いて実現するためのプログラムまたはデータ分析装置をコンピュータを用いて実現するためのプログラムを記録した記録媒体においては、与えられた多変量データを構成する各変量データの値およびその重みに基づいて、当該多変量データを座標平面または座標空間に作図する作図手段と、前記各変量データの重みを操作する重み操作手段と、を備えており、前記重み操作手段によって重みを操作することにより、座標平面または座標空間上の各変量データの作図状態を変更して、複数の多変量データを視覚的にグループ化して把握することを容易にすることを特徴としている。
【０００９】
したがって、多変量データにおいて特定の変量の重みを変化させる操作を行うことができ、この操作に応じて当該変量が多変量データ全体に与える影響を作図状態の変更から確認することができる。これにより、多変量データのグループ化を視覚的に行うことができる。
【００１０】
(4)この発明にかかるデータ分析装置、データ分析装置をコンピュータを用いて実現するためのプログラムまたはデータ分析装置をコンピュータを用いて実現するためのプログラムを記録した記録媒体においては、作図手段は、すでに作図されている多変量データにおける重みを用いて、新たに与えられた多変量データを座標平面または座標空間に作図することを特徴としている。
【００１１】
したがって、重み操作を行って視覚的にグループ化された複数の多変量データと、グループが未知である新たに与えられた多変量データを視覚的に比較することができる。これにより、当該新たに与えられた多変量データにかかるグループ属性の判断を容易にすることができる。
【００１２】
(5)この発明にかかるデータ分析装置、データ分析装置をコンピュータを用いて実現するためのプログラムまたはデータ分析装置をコンピュータを用いて実現するためのプログラムを記録した記録媒体においては、作図手段は、多変量データを座標平面または座標空間に作図する際の作図要素として、各変量データの値をベクトル角度に対応付け、さらに、その重みをベクトル長に対応付けて、各変量データにかかるベクトルの連結によって得られる最終到達点を代表点として作図することを特徴としている。
【００１３】
したがって、多変量データにおける各変量データの値およびその重みに応じて決定される代表点を、座標平面または座標空間に作図することができる。すなわち、１つの代表点が表示されている位置が、一つの多変量データを表していることになる。
【００１４】
(6)この発明にかかるデータ分析装置、データ分析装置をコンピュータを用いて実現するためのプログラムまたはデータ分析装置をコンピュータを用いて実現するためのプログラムを記録した記録媒体においては、作図手段は、与えられた多変量データを作図する座標平面として、当該多変量データの代表点を所定の半円内に作図する星座グラフを用いることを特徴としている。
【００１５】
したがって、多変量データの最終到達点としての代表点が表示される位置は、星座グラフ上の半円内に限られるので、視覚的にグループ化を把握しやすい。
【００１６】
(7)この発明にかかるデータ分析装置、データ分析装置をコンピュータを用いて実現するためのプログラムまたはデータ分析装置をコンピュータを用いて実現するためのプログラムを記録した記録媒体においては、重み操作手段は、与えられた多変量データがどのグループに属するのかを示すグループ情報を取得して、各グループにおける多変量データの代表点の平均点と各代表点との距離、および全多変量データの代表点同士の距離に基づいて、各代表点により構成される各グループが明瞭に分かれて表示されるように、各変量データの重みを決定することを特徴としている。
【００１７】
したがって、与えられた多変量データの作図状態において、当該多変量データの代表点が明瞭なグループを構成していなくても、各変量データの重みを調整して明瞭なグループ表示を行うことができる。さらに、決定された各変量データの重みにより、そのグループ化状態における各変量の影響度をより明確にすることができる。
【００１８】
(8)この発明にかかるデータ分析装置、データ分析装置をコンピュータを用いて実現するためのプログラムまたはデータ分析装置をコンピュータを用いて実現するためのプログラムを記録した記録媒体においては、重み操作手段は、多変量データの作図状態を受けたユーザによって与えられる情報に基づいて、各変量データの重みを決定することを特徴としている。
【００１９】
したがって、与えられた多変量データの作図状態において、ユーザが自由に各変量の重みを入力することができる。これにより、注目している変量が、多変量データ全体に与える影響を視覚的に理解することができる。また、作図状態にある多変量データの代表点を、微調整して表示させることができる。
【００２０】
(9)この発明にかかるデータ分析装置、データ分析装置をコンピュータを用いて実現するためのプログラムまたはデータ分析装置をコンピュータを用いて実現するためのプログラムを記録した記録媒体においては、作図手段によって作図された各多変量データの作図状態に基づいて、複数の多変量データを選択することにより、選択された多変量データをグループ化して出力するグループ化手段を備えていることを特徴としている。
【００２１】
したがって、与えられた多変量データが属するグループを、作図状態に基づいて、視覚的に決定することができる。例えば、ある代表点の近傍に表示されている代表点を同じグループに属するものとして決定することができる。
【００２２】
(10)この発明にかかるデータ分析装置、データ分析装置をコンピュータを用いて実現するためのプログラムまたはデータ分析装置をコンピュータを用いて実現するためのプログラムを記録した記録媒体においては、グループ化手段は、前記作図手段によって作図された各多変量データの作図状態を視覚的に確認したユーザが、当該作図状態に基づいて選択した複数の多変量データをグループ化するものであることを特徴としている。
【００２３】
したがって、ユーザは、ディスプレイなどに表示された作図状態を見て、マウスなどのポインティングデバイス等を使用して、多変量データをグループ分けすることができる。
【００２４】
(11)この発明にかかるデータ分析装置、データ分析装置をコンピュータを用いて実現するためのプログラムまたはデータ分析装置をコンピュータを用いて実現するためのプログラムを記録した記録媒体においては、グループ化手段は、複数の多変量データが選択される際に、選択された多変量データと他の多変量データとを色分けして表示することを特徴としている。
【００２５】
したがって、グループ化を行う際において、選択している代表点を明確に認識することができ、グループ化作業を行いやすい。
【００２６】
(12)この発明にかかるデータ分析方法においては、与えられた多変量データを構成する各変量データの値およびその重みに基づいて、当該多変量データを座標平面または座標空間に作図する作図手段と、前記各変量データの重みを操作する重み操作手段と、を備えており、前記重み操作手段によって重みを操作することにより、座標平面または座標空間上の各変量データの作図状態を変更して、複数の多変量データを視覚的にグループ化して把握することを容易にすることを特徴としている。
【００２７】
したがって、多変量データにおいて注目すべき変量の重みを変化させる操作を行うことができ、この操作に応じて当該変量が多変量データ全体に与える影響を作図状態の変更から確認することができる。これにより、多変量データのグループ化を視覚的に行うことができる。
【００２８】
(13)この発明にかかるデータ分析方法においては、作図手段は、すでに視覚的にグループ化して把握することが容易な状態で作図されている多変量データにおける重みを用いて、新たに与えられた多変量データの作図を行い、当該新たに与えられた多変量データにかかるグループ属性の判断を容易にすることを特徴としている。
【００２９】
したがって、重み操作を行って視覚的にグループ化された複数の多変量データと、グループが未知である新たに与えられた多変量データを視覚的に比較することができる。これにより、当該新たに与えられた多変量データにかかるグループ属性の判断を容易にすることができる。
【００３０】
この明細書で用いられる用語については、次のように定義する。
【００３１】
「重み」とは、多変量データの各変量において、当該変量の他の変量に対する相対的な重要度をいう。
【００３２】
「グループ化」とは、複数のデータをまとめた集合を形成することをいう。実施形態では、作図された多変量データの代表点に基づき、代表点が互いに近傍に表示されている多変量データを集合としてグループ化している。
【００３３】
【発明の実施の形態】
以下、本発明における実施形態について、図面を参照して説明する。
【００３４】
１．第１の実施形態
１−１．機能ブロック図
図１は、本発明にかかるデータ分析装置１の構成を示す機能ブロック図である。
【００３５】
この図において、データ分析装置１は、多変量データを入力する入力手段２と、
与えられた多変量データを構成する各変量データの値およびその重みに基づいて、当該多変量データを座標平面または座標空間に作図する作図手段３と、前記各変量データの重みを操作する重み操作手段４と、作図手段によって作図された座標平面または座標空間を出力する出力手段５、を備えている。さらに、作図手段によって作図された各多変量データの作図状態に基づいて、複数の多変量データを選択することにより、選択された多変量データをグループ化して出力するグループ化手段６を備えている。
【００３６】
入力手段２は、ユーザが多変量データを入力するコンピュータのキーボード等である。出力手段は、作図手段３によって座標平面または座標空間に作図された情報を、ユーザが視覚的に認識できるようにするディスプレイ装置等である。
【００３７】
１−２．ハードウェア構成
図２に、データ分析装置１のハードウェア構成図を示す。この装置は、ＣＰＵ２０、メモリ２１、ディスプレイ２２、ハードディスク２３（記憶装置）、キーボード／マウス２４、ＣＤ−ＲＯＭドライブ２５を備えている。
【００３８】
なお、ハードディスク２３には、データ分析のためのプログラムなどが記録されている。このプログラムは、ＣＤ−ＲＯＭドライブ２５を介して、プログラムが記憶されたＣＤ−ＲＯＭ２６から読み出されてハードディスク２３にインストールされたものである。ＣＤ−ＲＯＭ２６以外の読み取り可能な記録媒体から、ハードディスクにインストールさせるようにしてもよい。さらに、通信回線を用いて当該プログラムをダウンロードするようにしてもよい。
【００３９】
１−３．フローチャート
図３は、当該データ分析装置１にかかるデータ分析プログラムのフローチャートである。
【００４０】
ユーザがデータ分析装置１のプログラムを起動すると、ディスプレイ２２にはデータ入力画面が表示される（ステップＳ３０１）。このデータ入力画面は、本実施形態の初期画面であって、データ分析の対象となる多変量データの入力を要求するものである。この入力画面は、多変量データを特定するためのデータＩＤを行として、当該データＩＤの持つ属性（変量）を列として入力するテーブル形式の構造をなしている。
【００４１】
図４は、植物のアヤメの観測データを入力データとしてデータ入力画面に入力した場合の例を示しており、「がくの長さ」４１、「がくの幅」４２、「花びらの長さ」４３、「花びらの幅」４４を属性としている。
【００４２】
なお、本実施形態では、多変量データを画面に直接入力するようにしているが、他のコンピュータ装置等で作成した多変量データを読み込むようにしてもよい。この場合は、図４に示す「ファイル読込」４５ボタンを押下して対象のデータが記録されているファイルを指定すればよい。
【００４３】
データ入力を終えたユーザが、「作図」ボタン４６を押下すると、データ分析装置１は、入力された多変量データをメモリ２１またはハードディスク２３に読み込む（ステップＳ３０２）。
【００４４】
多変量データを読み込んだデータ分析装置１は、当該多変量データを構成する各変量データの値およびその重みに基づいて、座標平面に作図を行う（ステップＳ３０３）。なお、この場合の座標平面には、各多変量データの代表点を所定の半円内に表示する星座グラフを用いる。星座グラフとは、多変量データの各変量によって得られるそれぞれの代表点を、天空高く星のように散らばっているように表示することのできるグラフである。
【００４５】
図５は、ステップＳ３０３の作図処理の詳細を示すフローチャートである。データ分析装置１のＣＰＵ２０は、メモリ２１またはハードディスク２３に読み込んだ全ての多変量データから、各変量毎の最大値および最小値を求める（ステップＳ５０１）。例えば、図４の多変量データにおいては、「がくの長さ」４１、「がくの幅」４２、「花びらの長さ」４３、「花びらの幅」４４毎に、それぞれの最大値ｘ⁽²⁾ _jおよび最小値ｘ⁽¹⁾ _jが決定される。なお、ｊは変量の番号を表し、例えば、「がくの長さ」４１を表す変量では、ｊ＝１である。
【００４６】
次に、データ分析装置１のＣＰＵ２０は、多変量データを１レコードずつメモリ２１に読み込み（ステップＳ５０２）、上記で求めた最大値ｘ⁽²⁾ _jおよび最小値ｘ⁽¹⁾ _jを用いて、ｉ番目のレコードのｊ番目の変量ｘ_ijに対応する角度θ_ijを求める（ステップＳ５０３）。なお、当該角度θ_ijは数式１に示すように、変量ｘ_ijの値によって一値に決定される。
【００４７】
【数１】

次に、読み込んだ１レコードの変量ｘ_ijに対応するベクトルを（ｃｏｓθ_ij，ｓｉｎθ_ij）とし、このベクトルに対して重みｗ_jを付加する。但し、初期表示の場合、重みｗ_jは「１」に固定して処理するものとする。
【００４８】
例えば、「がくの長さ」４１、「がくの幅」４２、「花びらの長さ」４３、「花びらの幅」４４の変量をそれぞれ、ｘ₁₁、ｘ₁₂、ｘ₁₃、ｘ₁₄とすると、これに対応する角度がそれぞれ、θ₁₁、θ₁₂、θ₁₃、θ₁₄と定まる。
【００４９】
したがって、図６のＡに示すように、ベクトルｘ₁₁とｘ軸のなす角は、θ₁₁と定まり、重みｗ_j（＝１）をベクトル長とするベクトルｘ₁₁の終点を決定する。さらに、ベクトルｘ₁₂とｘ軸のなす角をθ₁₂とし、ベクトルｘ₁₁の終点を始点とするベクトルｘ₁₂の終点を決定する。
【００５０】
同様に、ベクトルｘ₁₃、ｘ₁₄の終点を決定し、ベクトルｘ₁₄の終点をプロットする。すなわち、ベクトルｘ₁₁、ｘ₁₂、ｘ₁₃、ｘ₁₄を連結した最終到達点が「★０１」である。
【００５１】
また、１レコードにおける各変量の連結ベクトルは、数式２のように定めることができる。
【００５２】
【数２】

したがって、数式２で定まるベクトルｘ_iを連結したベクトルの最終到達点をして、半径１の半円の原点を始点とする星座グラフにプロットする（ステップＳ５０４）。
【００５３】
なお、図６に示すように、各ベクトルの軌跡を星座グラフ上に表示してもよい。この場合、該当するレコードを構成する各変量のベクトルをより明瞭に表現できる。すなわち、代表点「★０１」をプロットする際に、各変量がどのような影響を及ぼしているのかがより明瞭になる。
【００５４】
さらに、データ分析装置１のＣＰＵ２０は、読み込んだ多変量データが、最終レコードであるか否かのチェックを行い（ステップＳ５０５）、全てのレコードの連結ベクトルの最終到達点がプロットされるまで同様の処理を行う。
【００５５】
図６のＢは、データＩＤ「０１」〜「１０」の多変量データをプロットした場合の例である。この図は、重みｗ_jが「１」の場合における多変量データのグループ状態を示している。図６のＢの右端（１〜２時方向）に示されているように、データＩＤの代表点「★０１」、「★０３」、「★０９」は、同一グループであると考えられる。
【００５６】
しかしながら、代表点「★０２」や「★１０」は、どのグループに属するのかは、不明瞭である。そこで、重み付与手段により、星座グラフ上における各データの表示位置を修正し、各データがどのようなグループを構成するのかを調整する。すなわち、初期表示の段階において、全ての変量に一律の重み（「１」）を与えた状態から、この重みをユーザが操作することにより、多変量データの各変量データによって決定される連結ベクトルの最終到達点である代表点を調整する。
【００５７】
全ての多変量データのレコードを、ディスプレイにプロットすると、データ分析装置１のＣＰＵ２０は、図７に示す重み操作画面７１を表示する。この重み操作画面７１は、変量毎に重みを変更できるスライダ７２を備えている。ユーザが、このスライダ７２をマウスでドラッグしながら上下に操作することで、該当する変量にかかる重みｗ_jを調整することができる。
【００５８】
ユーザによって重みが変更されると（ステップＳ３０４、ＹＥＳ）、データ分析装置１のＣＰＵ２０は、該当する変量の重みｗ_jの変更を行い（ステップＳ３０５）、再び作図処理を行う（ステップＳ３０３）。したがって、ユーザは、作図処理の結果を確認することで、重み操作を行った変量の重みがどの程度影響を与えているのかを視覚的に知ることができる。
【００５９】
例えば、「がくの長さ」４１の重みｗ₁が変更された場合、図６のＡに示す代表点「★０１」においては、ベクトルｘ₁₁の長さがｗ₁倍に変更されベクトルｘ₁₁の終点が移動する。したがって、ベクトルｘ₁₂の始点が移動するため、最終到達点である代表点「★０１」のプロット位置が移動する。
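The effect of a weight change described above can be sketched as follows. This is a minimal illustration, assuming each variable contributes a vector of length w_j at angle θ_j and that the representative point is the end of the chained vectors; the function name `end_point` and the angles are hypothetical, not taken from FIG. 6.

```python
import math

def end_point(angles, weights):
    """End point of the chained vectors (length = weight, direction = angle)."""
    x = sum(w * math.cos(t) for t, w in zip(angles, weights))
    y = sum(w * math.sin(t) for t, w in zip(angles, weights))
    return (x, y)

# Hypothetical angles for two variables of one record.
angles = [0.0, math.pi / 2]

before = end_point(angles, [1.0, 1.0])  # initial display: all weights are 1
after = end_point(angles, [2.0, 1.0])   # the first weight raised from 1 to 2

# The representative point moves by (change in weight) x (unit vector of variable 1).
shift = (after[0] - before[0], after[1] - before[1])
```

Because only the first vector grows, every later vector keeps its direction and length and is simply translated, which is why the final point moves along the direction of the changed variable.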
【００６０】
このように、重み操作と作図処理の結果確認を繰り返すことにより、複数の多変量データを、ユーザ任意の重み操作を行いつつ、視覚的な方法によりグループ化することができる。
【００６１】
図８に、重みを上手く操作して、グループ化した例を示す。重み操作画面８２において、ユーザは、「がくの長さ」、「がくの幅」に比べて、「花びらの長さ」、「花びらの幅」の重みを高く設定している。これにより、グラフ表示画面８１に示すように、多変量データのグループが３つ形成されている。すなわち、点線で囲まれた複数のプロット点（★部分）のかたまりが、それぞれのグループを示している。
【００６２】
これにより、ユーザは、アヤメの品種を３つのグループに分類することができ、このグループ化に最も影響を及ぼす要因となった変量は、「花びらの幅」であったと推測することができる。
【００６３】
２．第２の実施形態
第１の実施形態においては、ユーザが重み操作を行うことにより、複数の多変量データを視覚的にグループ化する例を示した。しかし、本実施形態においては、第１の実施形態や既知の情報等により、グループ化が終了している多変量データを入力して、星座グラフを表示させる方法について説明する。
【００６４】
例えば、星座グラフ上に表示させる際において、各データが構成するグループがより明瞭になるように、入力した多変量データにかかる各変量データの重みの最適化を行う。これにより、各変量毎の当該グループ化状態への影響度を知ることができる。
【００６５】
２−１．機能ブロック図、ハードウェア構成
第２の実施形態における機能ブロック図、ハードウェア構成は、第１の実施形態の場合と同様である。
【００６６】
２−２．フローチャート
図９は、当該データ分析装置１にかかるデータ分析プログラムのフローチャートである。
【００６７】
ユーザがデータ分析装置１のプログラムを起動すると、ディスプレイ２２にはデータ入力画面が表示され（ステップＳ９０１）、ユーザは、分析対象となる植物のアヤメの観測データを入力する。
図１０に、データ入力画面１００の例を示す。ここでは、入力項目として、「データＩＤ」１０１、「グループＩＤ」１０２、「がくの長さ」１０３、「がくの幅」１０４、「花びらの長さ」１０５、「花びらの幅」１０６が表示されている。なお、グループＩＤは、それぞれの観測データが、どの種のアヤメに属するのかを示すものである。
【００６８】
また、グループＩＤが不明な場合には、データ入力画面１００で入力せずに、星座グラフ上に表示された代表点を選択することで、グループ化が可能である。
【００６９】
観測データの入力を終えたユーザが「作図」ボタン１０７を押下すると、データ分析装置１のＣＰＵ２０は、多変量データである当該観測データを、メモリ２１またはハードディスク２３に読み込む（ステップＳ９０２）。
【００７０】
多変量データを読み込んだデータ分析装置１のＣＰＵ２０は、当該多変量データを構成する各変量データの値およびその重みに基づいて、座標平面に作図を行う（ステップＳ９０３）。
【００７１】
なお、前記ステップＳ９０３の作図処理の詳細は、第１の実施形態に示した図５のフローチャートと同様である。
【００７２】
図１１のＡに、ステップＳ９０３の作図処理において表示される星座グラフを示す。また、この状態から重み操作画面の重みスライダを操作した結果得られる星座グラフを図１１のＢに示す（ステップＳ９０３〜９０５の繰り返し）。
【００７３】
ユーザは、図１１のＢの星座グラフ上の各代表点に対して、グループ化手段を用いて、視覚的にグループ化を行うことができる。例えば、グループＩＤ入力欄１１４にグループＩＤを入力し、マウスのドラッグ等を行うことにより、同一グループとする代表点を選択する。図１１のＢにおいては、円１１１、円１１２、円１１３でそれぞれ囲まれた代表点（★印）がグループ化されている。また、入力したグループＩＤは、グループ化情報として、図１０に示したデータ入力画面１００のグループＩＤ列に反映される（ステップＳ９０６）。
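The selection step above can be sketched as a simple hit test. The source only says that the user enters a group ID and selects points by dragging the mouse; the circular selection region, the function name `assign_group`, and the coordinates below are assumptions made for illustration.

```python
import math

def assign_group(points, groups, center, radius, group_id):
    """Give group_id to every representative point inside the selection circle."""
    for data_id, (x, y) in points.items():
        if math.hypot(x - center[0], y - center[1]) <= radius:
            groups[data_id] = group_id
    return groups

# Hypothetical representative points on the constellation graph.
points = {"01": (0.50, 0.50), "02": (0.52, 0.48), "03": (-0.40, 0.30)}

# The user types group ID "A" and drags a circle around the two points on the right.
groups = assign_group(points, {}, center=(0.5, 0.5), radius=0.1, group_id="A")
```

The resulting mapping from data ID to group ID corresponds to what the text describes as being written back to the group ID column of the data input screen.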
【００７４】
なお、図１１のＢに示す代表点（★印）をマウスでドラッグして選択する場合において、選択中の代表点（★印）の色を変えて、他の代表点（★印）と区別できる様にしてもよい。例えば、図１１のＢにおいては、円１１１の代表点を塗りつぶし、円１１２の代表点を白抜きにし、円１１３の代表点を斜線としている。
【００７５】
代表点のグループ化が完了すると、ユーザは、図１１のＢに示す最適化ボタン１１５を押下する。なお、観測データのグループＩＤが予め入力されている場合には、図１０に示すデータ入力画面の作図ボタン１０７を押下すればよい。
【００７６】
指令を受けて、データ分析装置１のＣＰＵ２０は、重みの最適化処理を行う（ステップＳ９０７）。なお、図１２に、重みの最適化処理の詳細フローチャートを示す。
【００７７】
この処理において、データ分析装置１のＣＰＵ２０は、グループ毎に平均点と呼ぶ基準点を求める（ステップＳ１２０１）。なお、この平均点の求め方を以下に示す。
【００７８】
【数３】

である場合において、
【数４】

であり、
【数５】

である。
【００７９】
このとき、グループの個数をｎとし、ｊ番目のグループのｉ番目の代表点の連結ベクトルの終点を
【数６】

と定める。
【００８０】
この場合、
【数７】

と求まる。
【００８１】
次に、データ分析装置１のＣＰＵ２０は、代表点全体の平均点を求める（ステップＳ１２０２）。なお、この平均点の求め方を以下に示す。
【００８２】
【数８】

と求まる。
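Equations 3 to 8 are not reproduced in this text, so the following is only a sketch of steps S1201 and S1202 under the assumption that each "average point" is the arithmetic mean of the representative-point coordinates. The names `mean_point` and `group_means` and the coordinates are hypothetical.

```python
def mean_point(points):
    """Arithmetic mean of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def group_means(points, groups):
    """Mean point of each group (step S1201)."""
    members = {}
    for data_id, p in points.items():
        members.setdefault(groups[data_id], []).append(p)
    return {g: mean_point(ps) for g, ps in members.items()}

# Hypothetical representative points and group IDs.
points = {"01": (0.0, 0.0), "02": (2.0, 0.0), "03": (1.0, 3.0)}
groups = {"01": "A", "02": "A", "03": "B"}

per_group = group_means(points, groups)      # step S1201
overall = mean_point(list(points.values()))  # step S1202
```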
【００８３】
次に、データ分析装置１のＣＰＵ２０は、グループ内の各代表点と当該グループの平均点との距離の和を求める（ステップＳ１２０３）。なお、この距離の和Ｖａｒ_j（ｊ＝１，２，・・・ｎ）の求め方を以下に示す。
【００８４】
数式７により、
【数９】

としたとき、
【数１０】

である。
【００８５】
次に、データ分析装置１のＣＰＵ２０は、全体の各代表点と全体の平均点との距離の和を求める（ステップＳ１２０４）。なお、この距離の和Ｖａｒの求め方を以下に示す。
【００８６】
数式８により、
【数１１】

としたとき、
【数１２】

である。
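Equations 9 to 12 are likewise missing from this text. The sketch below follows the wording of steps S1203 and S1204, assuming that "distance" means Euclidean distance and that the sums are not squared; the function names and coordinates are hypothetical.

```python
import math

def mean_point(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def distance_sum(points):
    """Sum of Euclidean distances from each point to the mean point of the set."""
    cx, cy = mean_point(points)
    return sum(math.hypot(x - cx, y - cy) for x, y in points)

def within_group_sums(points, groups):
    """Var_j for each group j (step S1203)."""
    members = {}
    for data_id, p in points.items():
        members.setdefault(groups[data_id], []).append(p)
    return {g: distance_sum(ps) for g, ps in members.items()}

# Hypothetical representative points and group IDs.
points = {"01": (0.0, 0.0), "02": (2.0, 0.0), "03": (1.0, 3.0)}
groups = {"01": "A", "02": "A", "03": "B"}

var_j = within_group_sums(points, groups)        # step S1203
var_all = distance_sum(list(points.values()))    # step S1204
```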
【００８７】
ここで、グループ内における代表点の散らばりを小さくし、グループ間の散らばりを大きくするためには、前記Ｖａｒ_jを小さくし、前記Ｖａｒを大きくすればよい。ここで、判定値Ｊを次のように定める。
【００８８】
【数１３】

さらに、全ての重み値の組み合わせについてＪを求める（ステップＳ１２０５）。
【００８９】
すなわち、前記判定値Ｊを最小にする重み値の組み合わせを求めるために、全ての重み値の組み合わせについて処理するまで（ステップＳ１２０７、ＮＯ）、各変量毎の重み値の組み合わせを変更して（ステップＳ１２０６）、判定値Ｊを求める処理であるステップＳ１２０１〜ステップＳ１２０５を繰り返す。
【００９０】
データ分析装置１のＣＰＵ２０は、全ての重み値の組み合わせについて上記の処理を実行すると（ステップＳ１２０７、ＹＥＳ）、判定値Ｊを最小にする時の重み値の組み合わせを採用して、作図処理を行う（ステップＳ１２０８）。なお、この作図処理は、図５に示すフローチャートと同様の処理で行われる。
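The exhaustive search of steps S1205 to S1208 can be sketched as below. Equation 13 is not reproduced in this text, so the criterion used here, J = (sum of all Var_j) / Var, is an assumption chosen only because the text says that Var_j should be small, Var large, and J minimized. The candidate weight values, the normalization by the total weight, and all names and angles are also hypothetical.

```python
import itertools
import math

def plot(angles_by_id, weights):
    """Representative points, normalized by the total weight (an assumption)."""
    total = sum(weights)
    return {
        data_id: (
            sum(w * math.cos(t) for t, w in zip(angles, weights)) / total,
            sum(w * math.sin(t) for t, w in zip(angles, weights)) / total,
        )
        for data_id, angles in angles_by_id.items()
    }

def distance_sum(points):
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sum(math.hypot(x - cx, y - cy) for x, y in points)

def criterion(points, groups):
    """Assumed J: within-group distance sums divided by the overall distance sum."""
    members = {}
    for data_id, p in points.items():
        members.setdefault(groups[data_id], []).append(p)
    within = sum(distance_sum(ps) for ps in members.values())
    overall = distance_sum(list(points.values()))
    return within / overall if overall > 0 else math.inf

def best_weights(angles_by_id, groups, candidates):
    """Try every combination of candidate weights and keep the one with minimum J."""
    n_vars = len(next(iter(angles_by_id.values())))
    best_j, best_w = math.inf, None
    for weights in itertools.product(candidates, repeat=n_vars):  # S1206, S1207
        j = criterion(plot(angles_by_id, weights), groups)        # S1201 to S1205
        if j < best_j:
            best_j, best_w = j, weights
    return best_j, best_w

# Hypothetical angles: variable 1 separates groups A and B, variable 2 does not.
angles_by_id = {"01": (0.1, 0.2), "02": (0.1, 2.8), "03": (3.0, 0.3), "04": (3.0, 2.9)}
groups = {"01": "A", "02": "A", "03": "B", "04": "B"}
best_j, best_w = best_weights(angles_by_id, groups, candidates=(1.0, 4.0))
```

In this toy input the search favors a large weight on the first variable, which matches the interpretation in the text that a large optimized weight marks a variable that contributes strongly to the grouping. The number of combinations grows exponentially with the number of variables, so a real candidate grid would need to stay small.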
【００９１】
図１３に、上記の「重みの最適化処理」を行った場合におけるグラフ表示画面１３１および重み操作画面１３２の例を示す。
【００９２】
グラフ表示画面１３１においては、図１１のＢに示すグラフ表示画面に比べて、グループを構成する代表点の表示位置が、よりグループが明瞭になる位置に移動している。また、グループ内においては、代表点同士の距離が小さくなる位置に移動している。
【００９３】
この時の、重み操作画面１３２に示された重み値が、前記「重みの最適化処理」において採用された、重み値の組み合わせである。例えば、今回のグループ分けに大きく寄与している変量は、その値が「４．５」である「花びらの長さ」であったと解釈することができる。
【００９４】
このように、「重みの最適化処理」を行うことにより、グループを構成する要因となる変量がいずれであるのかを調べることができる。
【００９５】
３．その他
上記実施形態においては、説明の都合上、数十件のデータを用いてデータ分析を行ったが、実際のデータ分析では、観測対象に応じたデータ件数でデータ分析をおこなうことが望ましい。
【００９６】
上記実施形態においては、与えられた多変量データのすべてについて、グループ化を行うことを前提としている。しかし、グループ化が完了した状態の重みを用いて、新たに与えられた多変量データの代表点を、既にグループ化された作図状態に重ねて表示するようにしてもよい。これにより、新たに与えられた多変量データが何れのグループに属するのかを、容易に視覚的に判断することができる。なお、このようなグループ化の判断方法は、病気の診断や機械の故障診断などのように様々な分野において応用することができる。
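Overlaying a new record on an already grouped plot could look like the sketch below. It reuses the per-variable ranges and weights of the grouped data, as the text describes. The clamping of out-of-range values, the normalization by the total weight, and the `nearest_group` helper are assumptions added for illustration; the source itself leaves the judgment to the user's eye.

```python
import math

def place_new_record(values, ranges, weights):
    """Plot a new record using the ranges and weights of the grouped data."""
    total = sum(weights)
    x = y = 0.0
    for value, (lo, hi), w in zip(values, ranges, weights):
        clipped = min(max(value, lo), hi)  # assumption: clamp values outside the known range
        theta = math.pi * (clipped - lo) / (hi - lo) if hi > lo else 0.0
        x += w * math.cos(theta)
        y += w * math.sin(theta)
    return (x / total, y / total)

def nearest_group(point, group_means):
    """Hypothetical aid: the group whose mean point is closest to the new point."""
    return min(
        group_means,
        key=lambda g: math.hypot(point[0] - group_means[g][0], point[1] - group_means[g][1]),
    )

# Hypothetical ranges, weights, and group mean points from an earlier grouping.
ranges = [(0.0, 10.0), (0.0, 10.0)]
weights = [3.0, 1.0]
group_means = {"A": (0.6, 0.1), "B": (-0.6, 0.1)}

new_point = place_new_record((0.0, 12.0), ranges, weights)
suggested = nearest_group(new_point, group_means)
```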
【００９７】
第２の実施形態においては、多変量データがグループ分けされていることを前提条件として、「重みの最適化処理」を行っている。しかしながら、大量の多変量データを分析する必要がある場合には、全てのデータをグループ化するのは困難である。
【００９８】
そこで、大量の多変量データのうち、少量の多変量データをサンプルとして抽出する。さらに、これらのデータについてのみ第２の実施形態によってグループ化し、「重みの最適化処理」を行う。これにより、当該サンプルを最も明瞭にグループ化するための重み値の組み合わせが求まる。
【００９９】
次に、上記で求めた重み値の組み合わせを、大量の多変量データに適用して星座グラフを表示させる。
【０１００】
この場合、もし、サンプルについて適切なグループ化が行われていれば、大量の多変量データを入力した場合であっても、星座グラフ内でプロットされる代表点は、明瞭なグループを構成して表示されることになる。なお、明瞭なグループが表示されない場合は、明瞭になるまで、サンプルの抽出から再び同じ処理を繰り返せばよい。
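The sample extraction step can be sketched as follows. The source does not say how the sample is chosen; uniform random sampling with a fixed seed is an assumption, and `draw_sample` is a hypothetical helper.

```python
import random

def draw_sample(data_ids, size, seed=0):
    """Pick `size` data IDs uniformly at random; the seed makes the draw repeatable."""
    rng = random.Random(seed)
    return sorted(rng.sample(list(data_ids), size))

# Hypothetical data IDs "01" to "10".
all_ids = [f"{i:02d}" for i in range(1, 11)]
sample_ids = draw_sample(all_ids, size=3)

# If the full plot is not clearly grouped, draw again with another seed and repeat.
retry_ids = draw_sample(all_ids, size=3, seed=1)
```

Only the sampled records would then be grouped and passed through the weight optimization, and the resulting weights applied to the full data set.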
【０１０１】
このように、本発明は、グループ情報が未知である多変量データのデータ分析においても適用することができる。
【０１０２】
上記実施形態においては、各変量データの値をベクトル角度に対応付け、さらに、その変量の重みをベクトル長に対応付けて、各変量データ毎のベクトルを連結することにより、代表点のプロット位置を決定したが、逆に、変量の重みをベクトル角度に対応付け、各変量データの値をベクトル長に対応付けるようにしてもよい。
【０１０３】
また、各変量データおよびその変量の重みを、他の作図要素に対応付けてもよい。例えば、各変量データをＸ座標、その変量の重みをＹ座標に対応付けたＸＹ平面座標として表現してもよい。また、円形状、四角形、その他の形状の座標平面または３次元の座標空間等として表現してもよい。
【図面の簡単な説明】
【図１】この発明の一実施形態におけるデータ分析装置の機能ブロック図を示す図である。
【図２】この発明の一実施形態におけるデータ分析装置のハードウェア構成図を示す例である。
【図３】第１の実施形態の「データ分析プログラム」におけるフローチャートを示す図である。
【図４】第１の実施形態の「データ入力画面」におけるデータ分析装置のディスプレイを示す図である。
【図５】「作図処理」におけるフローチャートを示す図である。
【図６】第１の実施形態の「グラフ表示画面」におけるデータ分析装置のディスプレイを示す図である。
【図７】「重み操作画面」におけるデータ分析装置のディスプレイを示す図である。
【図８】第１の実施形態の「グラフ表示画面」および「重み操作画面」におけるデータ分析装置のディスプレイを示す図である。
【図９】第２の実施形態の「データ分析プログラム」におけるフローチャートを示す図である。
【図１０】第２の実施形態の「データ入力画面」におけるデータ分析装置のディスプレイを示す図である。
【図１１】第２の実施形態の「グラフ表示画面」におけるデータ分析装置のディスプレイを示す図である。
【図１２】「重みの最適化処理」におけるフローチャートを示す図である。
【図１３】第２の実施形態の「グラフ表示画面」および「重み操作画面」におけるデータ分析装置のディスプレイを示す図である。
【図１４】従来技術のレーダーチャートの例を示す図である。
【符号の説明】
１・・・データ分析装置
２・・・入力手段
３・・・作図手段
４・・・重み操作手段
５・・・出力手段
６・・・グループ化手段
[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a data analysis apparatus for analyzing multivariate data. In particular, the present invention relates to a data analysis apparatus capable of visually analyzing multivariate data according to a user operation.
[0002]
[Prior art and problems]
A data mining tool is known as a tool that takes a large amount of data as input and analyzes it. Performing data analysis with such a tool makes it possible to extract complex analysis information, such as trends in the data and relationships among the data. Based on this analysis information, the user makes decisions such as judgments and predictions.
[0003]
For example, in such a data mining tool, multivariate data can be classified into clusters, and the influence of the data variables in each cluster can be visually displayed from various angles.
[0004]
However, because this cluster classification is performed by software as a black box, the user cannot understand the displayed analysis result without specialized knowledge of cluster classification. That is, it is not easy to understand which variables of the multivariate data the result was obtained by focusing on.
[0005]
As another conventional technique, there is a display method using a radar chart as shown in FIG. 14. A radar chart represents one multivariate data item by plotting each of its variables on numerical axes extending radially from the center and connecting the plotted points with a line. Therefore, when a plurality of lines are displayed, lines forming substantially the same shape can be identified as one cluster.
[0006]
However, when the number of multivariate data displayed on the radar chart increases, the overlapping portion of the lines representing the multivariate data increases, so that it becomes difficult to discriminate clusters.
[0007]
The present invention has been made to solve these problems, and an object thereof is to provide a data analysis apparatus that presents multivariate data visually and allows the user to classify the data into clusters through the user's own operations. A further object is to provide a data analysis apparatus capable of displaying the clusters more clearly based on multivariate data that has been classified into clusters.
[0008]
[Means for Solving the Problems and Effects of the Invention]
(1) (2) (3) The data analysis apparatus according to the present invention, the program for realizing the data analysis apparatus using a computer, or the recording medium recording such a program comprises plotting means for plotting given multivariate data on a coordinate plane or in a coordinate space based on the value of each variable data constituting the multivariate data and its weight, and weight operation means for manipulating the weight of each variable data. By manipulating the weights with the weight operation means, the plotted state of each variable data on the coordinate plane or in the coordinate space is changed, which makes it easy to visually group and grasp a plurality of multivariate data.
[0009]
Therefore, an operation for changing the weight of a specific variable in the multivariate data can be performed, and the influence of the variable on the entire multivariate data can be confirmed from the change in the drawing state according to this operation. Thereby, grouping of multivariate data can be performed visually.
[0010]
(4) In the data analysis apparatus according to the present invention, the program for realizing the data analysis apparatus using a computer, or the recording medium recording such a program, the plotting means plots newly given multivariate data on the coordinate plane or in the coordinate space using the weights of the multivariate data that has already been plotted.
[0011]
Therefore, it is possible to visually compare a plurality of multivariate data visually grouped by performing a weight operation and newly provided multivariate data whose group is unknown. Thereby, it is possible to easily determine the group attribute related to the newly given multivariate data.
[0012]
(5) In the data analysis apparatus according to the present invention, the program for realizing the data analysis apparatus using a computer, or the recording medium recording such a program, the plotting means, as plotting elements for plotting multivariate data on the coordinate plane or in the coordinate space, associates the value of each variable data with a vector angle and its weight with a vector length, and plots, as a representative point, the final arrival point obtained by concatenating the vectors of the variable data.
[0013]
Therefore, the representative point determined according to the value of each variable data and the weight thereof in the multivariate data can be plotted on the coordinate plane or the coordinate space. That is, the position where one representative point is displayed represents one multivariate data.
[0014]
(6) In the data analysis apparatus according to the present invention, the program for realizing the data analysis apparatus using a computer, or the recording medium recording such a program, the plotting means uses, as the coordinate plane on which the given multivariate data is plotted, a constellation graph in which the representative points of the multivariate data are plotted within a predetermined semicircle.
[0015]
Therefore, since the position where the representative point as the final arrival point of the multivariate data is displayed is limited to the semicircle on the constellation graph, it is easy to visually grasp the grouping.
[0016]
(7) In the data analysis apparatus according to the present invention, the program for realizing the data analysis apparatus using a computer, or the recording medium recording such a program, the weight operation means acquires group information indicating which group each given multivariate data belongs to, and determines the weight of each variable data so that the groups formed by the representative points are displayed clearly separated. The determination is based on the distance between each representative point and the average point of the representative points of the multivariate data in each group, and on the distances between the representative points of all the multivariate data.
[0017]
Therefore, even if the representative points of the given multivariate data do not form clear groups in the plotted state, the weights of the variable data can be adjusted to display clear groups. Further, the determined weights clarify the degree of influence of each variable on that grouping.
[0018]
(8) In the data analysis apparatus according to the present invention, the program for realizing the data analysis apparatus using a computer, or the recording medium recording such a program, the weight operation means determines the weight of each variable data based on information given by a user who has viewed the plotted state of the multivariate data.
[0019]
Therefore, the user can freely input the weight of each variable while viewing the plotted state of the given multivariate data. This makes it possible to understand visually how the variable of interest influences the multivariate data as a whole. In addition, the representative points of the plotted multivariate data can be finely adjusted and displayed.
[0020]
(9) The data analysis apparatus according to the present invention, the program for realizing the data analysis apparatus using a computer, or the recording medium recording such a program comprises grouping means for grouping and outputting selected multivariate data when a plurality of multivariate data are selected based on the plotted state of each multivariate data plotted by the plotting means.
[0021]
Therefore, the group to which the given multivariate data belongs can be visually determined based on the drawing state. For example, representative points displayed in the vicinity of a certain representative point can be determined as belonging to the same group.
[0022]
(10) In the data analysis apparatus according to the present invention, the program for realizing the data analysis apparatus using a computer, or the recording medium recording such a program, the grouping means groups a plurality of multivariate data that a user, having visually confirmed the plotted state of each multivariate data plotted by the plotting means, has selected based on that plotted state.
[0023]
Accordingly, the user can group the multivariate data by using a pointing device such as a mouse while viewing the drawing state displayed on the display or the like.
[0024]
(11) In the data analysis apparatus according to the present invention, the program for realizing the data analysis apparatus using a computer, or the recording medium recording such a program, when a plurality of multivariate data are being selected, the grouping means displays the selected multivariate data in a color different from that of the other multivariate data.
[0025]
Therefore, when performing grouping, the selected representative point can be clearly recognized, and the grouping operation can be easily performed.
[0026]
(12) The data analysis method according to the present invention comprises plotting means for plotting given multivariate data on a coordinate plane or in a coordinate space based on the value of each variable data constituting the multivariate data and its weight, and weight operation means for manipulating the weight of each variable data. By manipulating the weights with the weight operation means, the plotted state of each variable data on the coordinate plane or in the coordinate space is changed, which makes it easy to visually group and grasp a plurality of multivariate data.
[0027]
Therefore, it is possible to perform an operation of changing the weight of the variable to be noted in the multivariate data, and according to this operation, the influence of the variable on the entire multivariate data can be confirmed from the change of the drawing state. Thereby, grouping of multivariate data can be performed visually.
[0028]
(13) In the data analysis method according to the present invention, the plotting means plots newly given multivariate data using the weights of multivariate data that has already been plotted in a state in which it is easy to group and grasp visually, thereby making it easy to determine the group attribute of the newly given multivariate data.
[0029]
Therefore, it is possible to visually compare a plurality of multivariate data visually grouped by performing a weight operation and newly provided multivariate data whose group is unknown. Thereby, it is possible to easily determine the group attribute related to the newly given multivariate data.
[0030]
The terms used in this specification are defined as follows.
[0031]
“Weight” refers to the relative importance of each variable in the multivariate data relative to other variables.
[0032]
“Grouping” means forming a set of a plurality of data. In the embodiment, based on the representative points of the plotted multivariate data, the multivariate data in which the representative points are displayed near each other is grouped as a set.
[0033]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described below with reference to the drawings.
[0034]
1. First embodiment
1-1. Functional block diagram
FIG. 1 is a functional block diagram showing a configuration of a data analysis apparatus 1 according to the present invention.
[0035]
In this figure, the data analysis apparatus 1 includes input means 2 for inputting multivariate data; plotting means 3 for plotting the multivariate data on a coordinate plane or in a coordinate space based on the value of each variable data constituting the given multivariate data and its weight; weight operation means 4 for manipulating the weight of each variable data; and output means 5 for outputting the coordinate plane or coordinate space plotted by the plotting means. It further includes grouping means 6 for grouping and outputting selected multivariate data when a plurality of multivariate data are selected based on the plotted state of each multivariate data plotted by the plotting means.
[0036]
The input means 2 is, for example, a computer keyboard with which the user inputs multivariate data. The output means 5 is, for example, a display device that lets the user visually recognize the information plotted on the coordinate plane or in the coordinate space by the plotting means 3.
[0037]
1-2. Hardware configuration
FIG. 2 shows a hardware configuration diagram of the data analysis apparatus 1. This apparatus includes a CPU 20, a memory 21, a display 22, a hard disk 23 (storage device), a keyboard / mouse 24, and a CD-ROM drive 25.
[0038]
The hard disk 23 stores a program for data analysis and the like. This program has been read, via the CD-ROM drive 25, from the CD-ROM 26 on which the program is stored and installed on the hard disk 23. The program may instead be installed on the hard disk from a readable recording medium other than the CD-ROM 26. It may also be downloaded over a communication line.
[0039]
1-3. Flowchart
FIG. 3 is a flowchart of a data analysis program according to the data analysis apparatus 1.
[0040]
When the user starts the program of the data analysis device 1, a data input screen is displayed on the display 22 (step S301). This data input screen is an initial screen of this embodiment, and requests input of multivariate data to be subjected to data analysis. This input screen has a table format structure in which data IDs for specifying multivariate data are input as rows and attributes (variables) of the data IDs are input as columns.
[0041]
FIG. 4 shows an example in which observation data of iris plants is entered on the data input screen as the input data, with “sepal length” 41, “sepal width” 42, “petal length” 43, and “petal width” 44 as the attributes.
[0042]
In the present embodiment, multivariate data is directly input to the screen, but multivariate data created by another computer device or the like may be read. In this case, a “file read” 45 button shown in FIG. 4 may be pressed to designate a file in which target data is recorded.
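A minimal sketch of reading such a file is shown below. The source does not specify the file format, so a CSV layout whose first column is the data ID and whose remaining columns are numeric variables is assumed; the function name `read_records`, the column names, and the sample values are hypothetical.

```python
import csv
import io

def read_records(text):
    """Parse CSV text: first column is the data ID, the rest are numeric variables."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    records = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        records[row[0]] = [float(v) for v in row[1:]]
    return header[1:], records

# Hypothetical file contents (not the values shown in FIG. 4).
sample = (
    "id,sepal_length,sepal_width,petal_length,petal_width\n"
    "01,5.1,3.5,1.4,0.2\n"
    "02,7.0,3.2,4.7,1.4\n"
)
variables, records = read_records(sample)
```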
[0043]
When the user who has finished inputting data presses the “plot” button 46, the data analysis apparatus 1 reads the input multivariate data into the memory 21 or the hard disk 23 (step S302).
[0044]
Having read the multivariate data, the data analysis apparatus 1 plots the data on a coordinate plane based on the value of each variable data constituting the multivariate data and its weight (step S303). Here, a constellation graph, which displays the representative point of each multivariate data within a predetermined semicircle, is used as the coordinate plane. A constellation graph is a graph that displays the representative points obtained from the variables of the multivariate data as if they were stars scattered high in the sky.
[0045]
FIG. 5 is a flowchart showing details of the plotting process in step S303. The CPU 20 of the data analysis apparatus 1 obtains the maximum value and the minimum value of each variable from all the multivariate data read into the memory 21 or the hard disk 23 (step S501). For example, for the multivariate data in FIG. 4, the maximum value $x^{(2)}_j$ and the minimum value $x^{(1)}_j$ are determined for each of “sepal length” 41, “sepal width” 42, “petal length” 43, and “petal width” 44. Here, j is the index of the variable; for example, j = 1 for the variable representing “sepal length” 41.
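Step S501 amounts to a column-wise minimum and maximum. The sketch below assumes each record is a tuple of numeric variables in a fixed order; `column_ranges` is a hypothetical name and the values are illustrative, not those of FIG. 4.

```python
def column_ranges(records):
    """Return one (minimum, maximum) pair per variable over all records (step S501)."""
    columns = list(zip(*records))
    return [(min(col), max(col)) for col in columns]

# Illustrative records: (sepal length, sepal width, petal length, petal width).
records = [
    (5.1, 3.5, 1.4, 0.2),
    (7.0, 3.2, 4.7, 1.4),
    (6.3, 3.3, 6.0, 2.5),
]
ranges = column_ranges(records)
```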
[0046]
Next, the CPU 20 of the data analysis apparatus 1 reads the multivariate data into the memory 21 one record at a time (step S502) and, using the maximum value $x^{(2)}_j$ and the minimum value $x^{(1)}_j$ obtained above, obtains the angle $\theta_{ij}$ corresponding to the j-th variable $x_{ij}$ of the i-th record (step S503). As shown in Equation 1, the angle $\theta_{ij}$ is uniquely determined by the value of the variable $x_{ij}$.
[0047]
[Expression 1]

Next, the vector corresponding to the variable $x_{ij}$ of the record that was read is taken to be $(\cos\theta_{ij}, \sin\theta_{ij})$, and the weight $w_j$ is applied to this vector. In the initial display, however, the weight $w_j$ is fixed to “1”.
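Equation 1 is not reproduced in this text. The sketch below assumes the usual min-max form, in which a value between the minimum and maximum of its variable is mapped linearly onto an angle between 0 and π, so that every vector points into the upper half plane. The function names and the handling of a constant variable are assumptions.

```python
import math

def angle(value, lo, hi):
    """Map a value in [lo, hi] linearly onto [0, pi] (assumed form of Equation 1)."""
    if hi == lo:
        return 0.0  # assumption: a variable with no spread gets angle 0
    return math.pi * (value - lo) / (hi - lo)

def weighted_vector(theta, weight=1.0):
    """Vector (cos theta, sin theta) scaled by the weight; the weight is 1 initially."""
    return (weight * math.cos(theta), weight * math.sin(theta))

theta = angle(5.0, 0.0, 10.0)          # the midpoint of the range maps to pi / 2
vx, vy = weighted_vector(theta, 2.0)   # a weight of 2 doubles the vector length
```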
[0048]
For example, if the variables “sepal length” 41, “sepal width” 42, “petal length” 43, and “petal width” 44 are denoted $x_{11}$, $x_{12}$, $x_{13}$, and $x_{14}$, respectively, the corresponding angles are determined as $\theta_{11}$, $\theta_{12}$, $\theta_{13}$, and $\theta_{14}$.
[0049]
Therefore, as shown in A of FIG. 6, the angle between the vector $x_{11}$ and the x axis is determined as $\theta_{11}$, and the end point of the vector $x_{11}$, whose length is the weight $w_j$ (= 1), is determined. Further, the angle between the vector $x_{12}$ and the x axis is set to $\theta_{12}$, and the end point of the vector $x_{12}$, which starts at the end point of the vector $x_{11}$, is determined.
[0050]
Similarly, the end points of the vectors $x_{13}$ and $x_{14}$ are determined, and the end point of the vector $x_{14}$ is plotted. That is, the final arrival point obtained by concatenating the vectors $x_{11}$, $x_{12}$, $x_{13}$, and $x_{14}$ is “★01”.
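The chaining of vectors can be sketched as a running sum. The function name `trajectory` and the angles are hypothetical; the vector length is taken to be the weight, as the text states.

```python
import math

def trajectory(angles, weights):
    """End points of the chained vectors, in order, starting from the origin."""
    x, y = 0.0, 0.0
    points = []
    for theta, w in zip(angles, weights):
        x += w * math.cos(theta)  # each vector starts where the previous one ended
        y += w * math.sin(theta)
        points.append((x, y))
    return points

# Hypothetical angles for two variables, both with weight 1.
path = trajectory([0.0, math.pi / 2], [1.0, 1.0])
final_point = path[-1]  # the representative point of the record
```

Keeping the intermediate end points, rather than only the last one, is what allows the trajectory of each vector to be drawn on the graph.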
[0051]
The concatenated vector of the variables in one record can be determined as in Equation 2.
[0052]
[Equation 2]

Therefore, the vector x_i determined by Equation 2 is plotted on a constellation graph, a semicircle of radius 1, starting from its origin (step S504).
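The plotting steps S501 to S504 can be illustrated by the following minimal sketch. Equations 1 and 2 are not reproduced in this text, so the sketch assumes a linear mapping of each value onto the range 0 to π and assumes that the concatenated vector is divided by the total weight so that the end point stays inside the semicircle of radius 1. The function names and the sample records are hypothetical.

```python
import math

def column_ranges(records):
    """Step S501: (minimum, maximum) of each variable over all records."""
    return [(min(col), max(col)) for col in zip(*records)]

def angles(record, ranges):
    """Step S503, ASSUMED form of Equation 1: linear mapping onto [0, pi]."""
    result = []
    for x, (lo, hi) in zip(record, ranges):
        span = hi - lo
        result.append(math.pi * (x - lo) / span if span else 0.0)
    return result

def representative_point(record, ranges, weights):
    """ASSUMED form of Equation 2: weighted vectors joined head to tail,
    divided by the total weight (an assumption, not taken from the text)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("the total weight must be positive")
    thetas = angles(record, ranges)
    px = sum(w * math.cos(t) for w, t in zip(weights, thetas)) / total
    py = sum(w * math.sin(t) for w, t in zip(weights, thetas)) / total
    return px, py

# Hypothetical records: sepal length, sepal width, petal length, petal width
records = [
    (5.1, 3.5, 1.4, 0.2),
    (7.0, 3.2, 4.7, 1.4),
    (6.3, 3.3, 6.0, 2.5),
]
ranges = column_ranges(records)
# Initial display: every weight is fixed at 1
points = [representative_point(r, ranges, [1, 1, 1, 1]) for r in records]
```

Each entry of `points` is the plot position of one representative point (step S504). If the total-weight division is dropped, the length of each vector equals its weight exactly, at the cost of end points that can leave the unit semicircle.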
[0053]
In addition, as shown in FIG. 6, the trajectory of each vector may be displayed on the constellation graph. In this case, the vector of each variable constituting the corresponding record is expressed more clearly. That is, it becomes clearer what influence each variable has on the plotted position of the representative point “★ 01”.
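The trajectory display can be sketched as a running sum of the weighted vectors. This is only an illustration: the function name and the sample angles are hypothetical, and no normalization of the vector lengths is applied here.

```python
import math
from itertools import accumulate

def trajectory(thetas, weights):
    """Points visited when the weighted vectors are joined head to tail,
    starting from the origin. The last point is the representative point."""
    xs = accumulate(w * math.cos(t) for w, t in zip(weights, thetas))
    ys = accumulate(w * math.sin(t) for w, t in zip(weights, thetas))
    return [(0.0, 0.0)] + list(zip(xs, ys))

# Hypothetical angles for three variables, with weights 1, 1 and 2
path = trajectory([0.0, math.pi / 2, math.pi / 2], [1.0, 1.0, 2.0])
```

Drawing line segments between consecutive entries of `path` gives the trajectory of one record.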
[0054]
Furthermore, the CPU 20 of the data analysis apparatus 1 checks whether or not the multivariate data just read is the last record (step S505), and the same processing is repeated until the final arrival points of the concatenated vectors of all the records have been plotted.
[0055]
B of FIG. 6 is an example in which the multivariate data with data IDs “01” to “10” are plotted. This figure shows the grouping state of the multivariate data when every weight w_j is “1”. As shown at the right end (1 to 2 o'clock direction) of B in FIG. 6, the representative points “★ 01”, “★ 03”, and “★ 09” are considered to belong to the same group.
[0056]
However, it is unclear to which group the representative points “★ 02” and “★ 10” belong. Therefore, the display position of each data on the constellation graph is corrected with the weight operation means to adjust which group each data forms. That is, at the initial display stage a uniform weight (“1”) is given to all the variables, and the user then operates the weights to adjust the representative point, which is the final arrival point of the concatenated vector determined by the variable data of each multivariate data.
[0057]
When all the multivariate data records have been plotted on the display, the CPU 20 of the data analysis apparatus 1 displays the weight operation screen 71 shown in FIG. The weight operation screen 71 includes a slider 72 for changing the weight of each variable. When the user drags a slider 72 up or down with the mouse, the weight w_j applied to the corresponding variable is adjusted.
[0058]
When the weight is changed by the user (step S304, YES), the CPU 20 of the data analysis apparatus 1 changes the weight w_j of the corresponding variable (step S305) and performs the drawing process again (step S303). By checking the result of the drawing process, the user can therefore see how much influence the weight of the operated variable has.
[0059]
For example, when the weight w₁ of “sepal length” 41 is changed, the length of the vector x₁₁ at the representative point “★ 01” shown in FIG. becomes w₁, and the end point of the vector x₁₁ moves. Consequently, the starting point of the vector x₁₂ and of each subsequent vector also moves, and so the plot position of the representative point “★ 01”, which is the final arrival point, moves.
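The effect of changing a single weight can be checked numerically with the following sketch. It assumes that the vector length equals the weight without normalization, and the angles are hypothetical values chosen for illustration.

```python
import math

def end_point(thetas, weights):
    """Final arrival point of the head-to-tail weighted vectors."""
    x = sum(w * math.cos(t) for w, t in zip(weights, thetas))
    y = sum(w * math.sin(t) for w, t in zip(weights, thetas))
    return x, y

# Hypothetical angles for the four variables of one record
thetas = [0.0, math.pi / 2, math.pi / 3, math.pi]

before = end_point(thetas, [1, 1, 1, 1])   # initial display
after = end_point(thetas, [3, 1, 1, 1])    # only the first weight is changed
shift = (after[0] - before[0], after[1] - before[1])
```

Under this assumption the representative point shifts by the change in weight multiplied by (cos θ, sin θ) of the operated variable, so a variable whose angles differ strongly between records separates those records more as its weight grows.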
[0060]
In this way, by repeatedly operating the weights and checking the result of the drawing process, the user can group a plurality of multivariate data visually while weighting the variables as desired.
[0061]
FIG. 8 shows an example in which the weights have been operated so that the data are successfully grouped. On the weight operation screen 82, the user has set the weights of “petal length” and “petal width” higher than those of “sepal length” and “sepal width”. As a result, as shown in the graph display screen 81, the multivariate data form three groups; each set of plot points (★) surrounded by a dotted line is one group.
[0062]
Thus, the user can classify the iris varieties into three groups, and it can be inferred that the variable that has the most influence on the grouping is the “petal width”.
[0063]
2. Second embodiment
In the first embodiment, an example was described in which the user visually groups a plurality of multivariate data by operating the weights. In the present embodiment, a method is described in which a constellation graph is displayed from multivariate data that have already been grouped, whether by the first embodiment, by known information, or otherwise.
[0064]
For example, when displaying on the constellation graph, the weight of each variable data applied to the input multivariate data is optimized so that the group constituted by each data becomes clearer. Thereby, it is possible to know the degree of influence on the grouping state for each variable.
[0065]
2-1. Functional block diagram, hardware configuration
The functional block diagram and hardware configuration in the second embodiment are the same as those in the first embodiment.
[0066]
2-2. Flowchart
FIG. 9 is a flowchart of the data analysis program of the data analysis apparatus 1.
[0067]
When the user starts the program of the data analysis apparatus 1, a data input screen is displayed on the display 22 (step S901), and the user inputs observation data of the irises to be analyzed.
FIG. 10 shows an example of the data input screen 100. Here, “data ID” 101, “group ID” 102, “sepal length” 103, “sepal width” 104, “petal length” 105, and “petal width” 106 are displayed as input items. The group ID indicates to which variety of iris each observation data belongs.
[0068]
When the group ID is unknown, grouping is possible by selecting a representative point displayed on the constellation graph without inputting it on the data input screen 100.
[0069]
When the user who has finished inputting observation data presses the “plot” button 107, the CPU 20 of the data analysis apparatus 1 reads the observation data, which is multivariate data, into the memory 21 or the hard disk 23 (step S902).
[0070]
The CPU 20 of the data analysis apparatus 1 that has read the multivariate data plots the data on a coordinate plane based on the value of each variable data constituting the multivariate data and its weight (step S903).
[0071]
The details of the drawing process in step S903 are the same as those in the flowchart of FIG. 5 described in the first embodiment.
[0072]
FIG. 11A shows the constellation graph displayed by the drawing process of step S903. FIG. 11B shows the constellation graph obtained by operating the weight sliders on the weight operation screen from this state (repetition of steps S903 to S905).
[0073]
The user can visually group each representative point on the constellation graph in B of FIG. 11 using grouping means. For example, by inputting a group ID in the group ID input field 114 and dragging the mouse, representative points for the same group are selected. In B of FIG. 11, representative points (marked by ★) surrounded by a circle 111, a circle 112, and a circle 113 are grouped. The input group ID is reflected as grouping information in the group ID column of the data input screen 100 shown in FIG. 10 (step S906).
[0074]
When representative points (★) shown in B of FIG. 11 are selected by dragging with the mouse, the color of the selected representative points may be changed so that they can be distinguished from the other representative points. For example, in FIG. 11B, the representative points in the circle 111 are filled in, those in the circle 112 are outlined, and those in the circle 113 are hatched.
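The selection of representative points for one group can be sketched as a simple containment test. The text only states that points are selected by dragging the mouse; treating the selection region as a circle with a center and radius, and the names and coordinates used here, are assumptions made for illustration.

```python
import math

def select_in_circle(points, center, radius):
    """Return the data IDs whose representative point lies inside the circle."""
    return [data_id for data_id, p in points.items()
            if math.dist(p, center) <= radius]

# Hypothetical representative points keyed by data ID
points = {"01": (0.90, 0.20), "03": (0.85, 0.25), "02": (0.10, 0.60)}

group_ids = {}
# The user has entered group ID "G1" and dragged a circle around two points
for data_id in select_in_circle(points, center=(0.90, 0.20), radius=0.1):
    group_ids[data_id] = "G1"
```

The resulting `group_ids` mapping corresponds to the grouping information that is written back to the group ID column in step S906.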
[0075]
When the grouping of representative points is completed, the user presses the optimization button 115 shown in FIG. If the group IDs of the observation data have been input in advance, the “plot” button 107 on the data input screen shown in FIG. 10 may be pressed instead.
[0076]
In response to the instruction, the CPU 20 of the data analysis apparatus 1 performs weight optimization processing (step S907). FIG. 12 is a detailed flowchart of the weight optimization process.
[0077]
In this process, the CPU 20 of the data analysis apparatus 1 first obtains a reference point, called an average point, for each group (step S1201). The method of obtaining this average point is shown below.
[0078]
[Equation 3]

In the case
[Equation 4]

And
[Equation 5]

It is.
[0079]
At this time, the number of groups is n, and the end point of the connected vector of the i-th representative point of the j-th group is
[Equation 6]

It is determined.
[0080]
In this case,
[Equation 7]

It is obtained.
[0081]
Next, the CPU 20 of the data analysis apparatus 1 obtains the average point of all the representative points (step S1202). The method of obtaining this average point is shown below.
[0082]
[Equation 8]

It is obtained.
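Steps S1201 and S1202 can be illustrated as follows. Equations 3 to 8 are not reproduced in this text, so the sketch assumes that an average point is the arithmetic mean of the coordinates of the representative points. The group labels and coordinates are hypothetical.

```python
def mean_point(points):
    """Arithmetic mean of a list of (x, y) points (assumed definition)."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

# Hypothetical representative points, already grouped
groups = {
    "A": [(0.0, 0.0), (2.0, 0.0)],
    "B": [(0.0, 4.0), (2.0, 4.0)],
}

# Step S1201: average point of each group
group_means = {g: mean_point(pts) for g, pts in groups.items()}
# Step S1202: average point of all representative points
overall_mean = mean_point([p for pts in groups.values() for p in pts])
```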
[0083]
Next, the CPU 20 of the data analysis apparatus 1 obtains, for each group, the sum of the distances between the representative points in the group and the average point of the group (step S1203). The method of obtaining this sum of distances Var_j (j = 1, 2, ..., n) is shown below.
[0084]
From Equation 7,
[Equation 9]

When
[Equation 10]

It is.
[0085]
Next, the CPU 20 of the data analysis apparatus 1 calculates the sum of the distances between all the representative points and the overall average point (step S1204). The method of obtaining this sum of distances Var is shown below.
[0086]
From Equation 8,
[Equation 11]

When
[Equation 12]

It is.
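The sums of distances of steps S1203 and S1204 can be sketched as below. Equations 9 to 12 are not reproduced in this text; the sketch follows the wording literally and sums Euclidean distances, although a sum of squared distances would be an equally plausible reading of the name Var. The data are hypothetical.

```python
import math

def mean_point(points):
    """Arithmetic mean of a list of (x, y) points (assumed definition)."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def distance_sum(points, center):
    """Sum of Euclidean distances from each point to the given center."""
    return sum(math.dist(p, center) for p in points)

groups = {
    "A": [(0.0, 0.0), (2.0, 0.0)],
    "B": [(0.0, 4.0), (2.0, 4.0)],
}
all_points = [p for pts in groups.values() for p in pts]

# Step S1203: Var_j for each group
var_within = {g: distance_sum(pts, mean_point(pts)) for g, pts in groups.items()}
# Step S1204: Var over all representative points
var_total = distance_sum(all_points, mean_point(all_points))
```

In this example each group is tight (Var_j = 2) while the groups are far apart, so Var is much larger than either Var_j.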
[0087]
Here, in order to reduce the dispersion of the representative points within each group and increase the dispersion between the groups, Var_j should be reduced and Var should be increased. The judgment value J is therefore defined as follows.
[0088]
[Equation 13]

Further, J is obtained for all combinations of weight values (step S1205).
[0089]
That is, in order to obtain the combination of weight values that minimizes the judgment value J, the combination of weight values for the variables is changed (step S1206) and steps S1201 to S1205, which are the processes for obtaining the judgment value J, are repeated until all combinations of weight values have been processed (NO in step S1207).
[0090]
When the CPU 20 of the data analysis apparatus 1 has executed the above processing for all combinations of weight values (YES in step S1207), it adopts the combination of weight values that minimizes the judgment value J and performs the drawing process (step S1208). This drawing process is the same as the process of the flowchart shown in FIG.
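The exhaustive search of steps S1201 to S1208 can be sketched as follows. Equation 13 is not reproduced in this text, so the sketch assumes that J is the sum of the Var_j divided by Var, which becomes smaller as the groups tighten and separate. The normalization by total weight, the candidate weight values, and the sample angles are likewise assumptions.

```python
import itertools
import math

def end_point(thetas, weights):
    """Representative point: weighted vectors joined head to tail,
    divided by the total weight (assumed normalization)."""
    total = sum(weights)
    return (sum(w * math.cos(t) for w, t in zip(weights, thetas)) / total,
            sum(w * math.sin(t) for w, t in zip(weights, thetas)) / total)

def mean_point(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def distance_sum(points, center):
    return sum(math.dist(p, center) for p in points)

def judgment_value(angle_rows, labels, weights):
    """ASSUMED form of J: (sum of Var_j) / Var. Smaller is better."""
    points = [end_point(t, weights) for t in angle_rows]
    groups = {}
    for p, g in zip(points, labels):
        groups.setdefault(g, []).append(p)
    within = sum(distance_sum(pts, mean_point(pts)) for pts in groups.values())
    return within / distance_sum(points, mean_point(points))

def best_weights(angle_rows, labels, candidates):
    """Try every combination of candidate weights and keep the minimizer of J."""
    n_vars = len(angle_rows[0])
    return min(itertools.product(candidates, repeat=n_vars),
               key=lambda w: judgment_value(angle_rows, labels, w))

# Hypothetical angles: variable 0 separates the groups, variable 1 does not
angle_rows = [(0.2, 0.5), (0.3, 2.5), (2.8, 0.6), (2.9, 2.4)]
labels = ["A", "A", "B", "B"]
best = best_weights(angle_rows, labels, candidates=(1, 2, 4))
```

The search gives the largest candidate weight to the variable that separates the groups and the smallest to the other. The number of combinations grows exponentially with the number of variables, so the candidate list has to stay short for this approach to be practical.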
[0091]
FIG. 13 shows an example of the graph display screen 131 and the weight operation screen 132 when the above-described “weight optimization processing” is performed.
[0092]
In the graph display screen 131, the representative points constituting each group have moved to positions where the groups are clearer than in the graph display screen shown in FIG. 11B. Within each group, the representative points have moved to positions where the distances between them are smaller.
[0093]
The weight values displayed on the weight operation screen 132 at this time are the combination of weight values adopted by the “weight optimization process”. For example, the variable that contributes most to the current grouping can be interpreted to be “petal length”, whose weight value is “4.5”.
[0094]
In this way, by performing the “weight optimization process”, it is possible to check which variable is a factor constituting the group.
[0095]
3. Other
In the above embodiment, for the convenience of explanation, data analysis is performed using several tens of data. However, in actual data analysis, it is desirable to perform data analysis with the number of data corresponding to the observation target.
[0096]
In the above embodiment, it is assumed that grouping is performed on all of the given multivariate data. However, the representative points of newly given multivariate data may be plotted, using the weights in effect when the grouping was completed, so that they are superimposed on the already grouped drawing state. This makes it easy to determine visually to which group the newly given multivariate data belong. Such a group determination method can be applied in various fields, such as disease diagnosis and machine failure diagnosis.
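Plotting new data with the existing weights can be sketched as below. The text describes a visual determination by the user; the nearest-average-point rule used here is a programmatic stand-in for that judgment, not something stated in the text. The stored weights, group average points, and angles are hypothetical.

```python
import math

def end_point(thetas, weights):
    """Representative point under the stored weights (assumed normalization)."""
    total = sum(weights)
    return (sum(w * math.cos(t) for w, t in zip(weights, thetas)) / total,
            sum(w * math.sin(t) for w, t in zip(weights, thetas)) / total)

def nearest_group(point, group_means):
    """Stand-in for the visual judgment: the group with the closest average point."""
    return min(group_means, key=lambda g: math.dist(point, group_means[g]))

# Hypothetical state saved when the grouping was completed
stored_weights = [1.0, 1.0, 4.5, 2.0]
group_means = {"A": (0.8, 0.3), "B": (-0.7, 0.4)}

# Hypothetical angles of a newly given record, computed with the original ranges
new_thetas = [0.3, 0.5, 0.2, 0.4]
new_point = end_point(new_thetas, stored_weights)
assigned = nearest_group(new_point, group_means)
```

The angles of the new record should be computed with the same minimum and maximum values as the already plotted data so that the new point is comparable with the existing ones.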
[0097]
In the second embodiment, “weight optimization processing” is performed on the precondition that multivariate data is grouped. However, when it is necessary to analyze a large amount of multivariate data, it is difficult to group all the data.
[0098]
Therefore, a small amount of multivariate data is extracted as a sample from a large amount of multivariate data. Furthermore, only these data are grouped according to the second embodiment, and “weight optimization processing” is performed. Thereby, a combination of weight values for most clearly grouping the samples is obtained.
[0099]
Next, the constellation graph is displayed by applying the combination of the weight values obtained above to a large amount of multivariate data.
[0100]
In this case, if the sample has been grouped properly, the representative points plotted on the constellation graph are displayed as clear groups even when the large amount of multivariate data is input. If clear groups are not displayed, the same process may be repeated from the sample extraction until they are.
[0101]
Thus, the present invention can also be applied to data analysis of multivariate data whose group information is unknown.
[0102]
In the above embodiment, the value of each variable data is associated with the vector angle, the weight of the variable is associated with the vector length, and the vector for each variable data is connected to thereby obtain the plot position of the representative point. However, conversely, the variable weight may be associated with the vector angle, and the value of each variable data may be associated with the vector length.
[0103]
Each variable data and the weight of the variable may also be associated with other drawing elements. For example, they may be expressed on an XY coordinate plane in which the value of each variable data corresponds to the X coordinate and the weight of the variable corresponds to the Y coordinate. Further, a circular or square coordinate plane, a coordinate plane of another shape, a three-dimensional coordinate space, or the like may be used.
[Brief description of the drawings]
FIG. 1 is a functional block diagram of a data analysis apparatus according to an embodiment of the present invention.
FIG. 2 is an example showing a hardware configuration diagram of a data analysis apparatus according to an embodiment of the present invention.
FIG. 3 is a diagram showing a flowchart in a “data analysis program” of the first embodiment.
FIG. 4 is a diagram showing a display of the data analysis device on the “data input screen” of the first embodiment.
FIG. 5 is a diagram illustrating a flowchart in “plotting processing”.
FIG. 6 is a diagram showing a display of the data analysis apparatus in the “graph display screen” of the first embodiment.
FIG. 7 is a diagram showing a display of the data analysis device in a “weight operation screen”.
FIG. 8 is a diagram showing a display of the data analysis device in the “graph display screen” and the “weight operation screen” of the first embodiment.
FIG. 9 is a diagram showing a flowchart in a “data analysis program” of the second embodiment.
FIG. 10 is a diagram showing a display of the data analysis device on the “data input screen” of the second embodiment.
FIG. 11 is a diagram showing a display of the data analysis apparatus in the “graph display screen” of the second embodiment.
FIG. 12 is a flowchart of “weight optimization processing”.
FIG. 13 is a diagram showing a display of the data analysis device in the “graph display screen” and the “weight operation screen” of the second embodiment.
FIG. 14 is a diagram showing an example of a radar chart according to the prior art.
[Explanation of symbols]
1 ... Data analysis apparatus
2 ... Input means
3 ... Drawing means
4 ... Weight operation means
5 ... Output means
6 ... Grouping means

Claims

A data analysis apparatus for analyzing data based on a plurality of multivariate data, comprising:
plotting means for plotting the multivariate data on a coordinate plane or in a coordinate space based on the value of each variable data constituting the given multivariate data and its weight; and
weight operation means for operating the weight of each of the variable data,
wherein operating the weights with the weight operation means changes the drawing state of each variable data on the coordinate plane or in the coordinate space, making it easy to visually group and grasp the plurality of multivariate data.

A program for realizing, using a computer, a data analysis apparatus that analyzes data based on a plurality of multivariate data, the data analysis apparatus comprising:
plotting means for plotting the multivariate data on a coordinate plane or in a coordinate space based on the value of each variable data constituting the given multivariate data and its weight; and
weight operation means for operating the weight of each of the variable data,
wherein operating the weights with the weight operation means changes the drawing state of each variable data on the coordinate plane or in the coordinate space, making it easy to visually group and grasp the plurality of multivariate data.

A recording medium on which is recorded a program for realizing, using a computer, a data analysis apparatus that analyzes data based on a plurality of multivariate data, the data analysis apparatus comprising:
plotting means for plotting the multivariate data on a coordinate plane or in a coordinate space based on the value of each variable data constituting the given multivariate data and its weight; and
weight operation means for operating the weight of each of the variable data,
wherein operating the weights with the weight operation means changes the drawing state of each variable data on the coordinate plane or in the coordinate space, making it easy to visually group and grasp the plurality of multivariate data.

A data analysis apparatus according to any one of claims 1 to 3, a program for realizing the data analysis apparatus using a computer or a recording medium recording a program for realizing the data analysis apparatus using a computer,
The plotting means plots newly given multivariate data on a coordinate plane or a coordinate space using weights in already plotted multivariate data.

A data analysis apparatus according to any one of claims 1 to 4, a program for realizing the data analysis apparatus using a computer or a recording medium recording a program for realizing the data analysis apparatus using a computer,
When plotting multivariate data on the coordinate plane or in the coordinate space, the plotting means associates, as plotting elements, the value of each variable data with a vector angle and the weight of each variable data with a vector length, and plots the final arrival point obtained by concatenating the vectors as a representative point.

A data analysis apparatus according to any one of claims 1 to 5, a program for realizing the data analysis apparatus using a computer or a recording medium recording a program for realizing the data analysis apparatus using a computer,
The plotting means uses a constellation graph that plots representative points of the multivariate data in a predetermined semicircle as a coordinate plane for plotting the given multivariate data.

A data analysis apparatus according to any one of claims 1 to 6, a program for realizing the data analysis apparatus using a computer or a recording medium recording a program for realizing the data analysis apparatus using a computer,
The weight operation means acquires group information indicating to which group the given multivariate data belongs, and determines the weight of each variable data, based on the distance between each representative point and the average point of the representative points of the multivariate data in each group and on the distances among the representative points of all the multivariate data, so that the groups constituted by the representative points are displayed clearly separated.

A data analysis apparatus according to any one of claims 1 to 6, a program for realizing the data analysis apparatus using a computer or a recording medium recording a program for realizing the data analysis apparatus using a computer,
The weight operation means determines the weight of each variable data based on information given by a user who has been presented with the drawing state of the multivariate data.

A data analysis apparatus according to any one of claims 1 to 8, a program for realizing the data analysis apparatus using a computer or a recording medium recording a program for realizing the data analysis apparatus using a computer,
The apparatus, program, or recording medium further comprises grouping means for selecting a plurality of multivariate data based on the plotting state of each multivariate data plotted by the plotting means, and for grouping and outputting the selected multivariate data.

A data analysis apparatus according to claim 9, a program for realizing the data analysis apparatus using a computer or a recording medium recording a program for realizing the data analysis apparatus using a computer,
The grouping means groups a plurality of multivariate data selected, based on the drawing state, by a user who has visually checked the drawing state of each multivariate data drawn by the drawing means.

A data analysis apparatus according to claim 10, a program for realizing the data analysis apparatus using a computer, or a recording medium recording a program for realizing the data analysis apparatus using a computer,
The grouping means displays the selected multivariate data and other multivariate data in different colors when a plurality of multivariate data is selected.

A data analysis method for analyzing data based on a plurality of multivariate data, using:
plotting means for plotting the multivariate data on a coordinate plane or in a coordinate space based on the value of each variable data constituting the given multivariate data and its weight; and
weight operation means for operating the weight of each of the variable data,
wherein operating the weights with the weight operation means changes the drawing state of each variable data on the coordinate plane or in the coordinate space, making it easy to visually group and grasp the plurality of multivariate data.

The data analysis method of claim 12,
The plotting means plots newly given multivariate data using the weights of the multivariate data that have already been plotted in a state in which they are easy to group and grasp visually, thereby making it easy to determine the group to which the newly given multivariate data belong.