JP2007226639A

JP2007226639A - Multivariate data discrimination device

Info

Publication number: JP2007226639A
Application number: JP2006048558A
Authority: JP
Inventors: Norio Hirai; 規郎平井
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2006-02-24
Filing date: 2006-02-24
Publication date: 2007-09-06

Abstract

PROBLEM TO BE SOLVED: To provide a multivariate data discrimination device that easily and quickly identifies the factors that strongly affect an observed phenomenon, without assuming a distribution for the data of the factor items.

SOLUTION: The multivariate data discrimination device has: data collection means 101-104 that store multivariate data to which attribute information has been added; data group extraction means 105-108 that, when given an extraction condition specifying two factor items for one state item corresponding to a factor, extract from the multivariate data the multivariate data groups corresponding to the state item and factor items that satisfy the condition; discriminability determination means 109 that decides whether the state item can be discriminated by the factor items by mapping the multivariate data groups two-dimensionally, rotating them, and minimizing the mixed region of the groups when they are projected onto one factor item; and display means 111-113 that calculate the degree of influence of the factor items on the state item from the result of the discriminability determination means 109 and display it.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a multivariate data discrimination device applied to failure prediction for devices or facilities, prediction of human behavior, and the like, and in particular to a multivariate data discrimination device that determines whether multivariate data collected from a device or facility can be discriminated by a specified item.

In general, when predicting when a device or facility will fail, it is important to know the factors related to phenomena such as its state of deterioration.
One criterion for identifying the factors behind the state of a device or facility is to set two states in advance, a deteriorated state and a non-deteriorated state, and to examine the relationship between these reference states and various items of collected data; where a relationship exists, those items are judged to be influencing factors.

As a means of determining whether such a relationship exists, if the data of the items considered as possible influencing factors can be separated into two groups according to the two states (deteriorated and non-deteriorated), the selected factor is judged to have an effect; if the data cannot be separated into two groups, the selected factor is judged to have no effect.

In other words, by discriminating certain data according to several states, the factors affecting deterioration are identified and the time at which the device or facility will deteriorate is estimated.
The most common determination method in such a multivariate data discrimination device is to plot the data of the items considered as influencing factors and to judge visually whether the plotted points split into two groups according to the two states.

However, when a very large number of data items is collected, the number of item combinations that could be factors becomes enormous, and it is difficult to examine them all visually.
In view of this problem, the application of discriminant analysis, as used in statistics, has been proposed.
Conventional multivariate data discrimination devices that apply discriminant analysis determine which of two known, previously discriminated groups a newly collected sample is closer to. The Mahalanobis distance is commonly used as the closeness criterion, and several methods have been proposed (see, for example, Patent Document 1 and Patent Document 2).

Patent Document 1 discloses a method for pattern recognition in which the similarity of a pattern to each class is calculated using a discriminant function that expresses a similarity measure for that class, and the class into which the pattern is discriminated is selected.
Patent Document 2 discloses a method for classifying multivariate data in which the classes themselves are first grouped into several larger categories and the above discriminant analysis technique is then applied, improving discrimination accuracy.

Patent Document 1: JP-A-8-106295 (pages 4-5, FIG. 2)
Patent Document 2: JP-A-2004-213316 (pages 5-7, FIG. 3)

In both Patent Document 1 and Patent Document 2, the conventional multivariate data discrimination devices aim to discriminate states such as deterioration with higher accuracy. However, because they presuppose that the discrimination result for each state is already known, they use a known discriminant function, and so they cannot derive a discriminant function for real data for which it is not known whether the data can be discriminated by a given state.

Furthermore, the Mahalanobis distance used as the discrimination criterion in Patent Document 1 assumes that the data in each group are normally distributed. With real collected data the normality assumption is often not valid; in particular, when no correlation is seen between the two items considered as influencing factors, the data are spread uniformly and the Mahalanobis distance cannot be applied.

The present invention has been made to solve the problems described above. Its object is to obtain a multivariate data discrimination device that, for purposes such as predicting the deterioration of facilities and equipment or predicting human behavior, can easily and quickly identify which factors strongly affect an observed phenomenon (factor), without assuming a distribution for the factor-item data.
A further object, in order to achieve the above, is to obtain a multivariate data discrimination device that divides the collected influencing-factor item data into multivariate data groups according to state, exhaustively determines for every pair of groups whether the data can be discriminated, and identifies the item with the highest discrimination degree (a quantitative value that is highest when the groups can be completely separated) as the influencing factor.

The multivariate data discrimination device according to the present invention determines whether data collected by sensors or by manual measurement can be discriminated by items designated as influencing factors for one factor. It comprises: data collection means that collect data, add attribute information including the data's ID information, and store the result in a database as multivariate data; data group extraction means that, when an extraction condition specifying two factor items for one state item corresponding to the factor is set, extract from the multivariate data the multivariate data groups corresponding to the state item and factor items that satisfy the condition; discriminability determination means that determine whether the state item can be discriminated by the factor items by mapping the multivariate data groups two-dimensionally, rotating them, and minimizing the mixed region of the groups when projected onto one factor item; and display means that calculate the degree of influence of the factor items on the state item from the result of the discriminability determination means and display it.

According to the present invention, the factors that strongly affect an observed phenomenon can be identified easily and quickly without assuming a distribution for the factor-item data.

Embodiment 1.
FIG. 1 is a block diagram, in flowchart form, showing the overall configuration of the multivariate data discrimination device according to Embodiment 1 of the present invention; it illustrates an example configuration for identifying the factors that influence the failure occurrence of parts in a certain facility.
In FIG. 1, the facility or equipment to be monitored (not shown) is fitted with a plurality of measuring instruments 100. The measuring instruments 100 comprise various sensors; they measure state quantities related to each part of the facility or equipment and output them as data including the monitored state and influencing factors.

The multivariate data discrimination device according to Embodiment 1 comprises a collected data collection unit 101, an attribute information collection unit 102, data combination means 103, multivariate data storage means (database DB) 104, data extraction means 105 that cooperates with condition input means 105A operated by the user, extracted data storage means 107, state-specific classification creation means 108, discriminability determination means 109 that cooperates with function input means 109A, an end determination block 110 for the discriminability determination means 109, factor influence calculation means 111, and factor influence display means 112 having a display 113.

As indicated by the broken-line arrows, feature extraction means 106 and feature storage means 106A are provided between the data extraction means 105 and the state-specific classification creation means 108 when necessary (described later).
The feature extraction means 106 and feature storage means 106A constitute variable compression means. When the data include many variables, they apply singular value decomposition to the multivariate data, extract only the features, compress them to two variables, and send them to the state-specific classification creation means 108.
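
A minimal Python sketch of the variable compression step, assuming NumPy is available and that rows are samples and columns are influencing-factor items. The source says only that singular value decomposition is used to extract features and compress the data to two variables; the centring and scaling below, and the name `compress_to_two_variables`, are assumptions for illustration.

```python
import numpy as np


def compress_to_two_variables(data: np.ndarray) -> np.ndarray:
    """Compress an (n_samples, n_items) matrix to two variables via SVD.

    Assumption: each column is centred and scaled to unit standard deviation
    first, so items in different units (years, temperature, vibration) are
    comparable.
    """
    centred = data - data.mean(axis=0)
    std = centred.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    scaled = centred / std
    # Thin SVD: scaled = U @ diag(s) @ Vt, singular values in descending order
    _, _, vt = np.linalg.svd(scaled, full_matrices=False)
    # Coordinates of each sample on the two leading right-singular vectors
    return scaled @ vt[:2].T
```

For example, a 20 x 5 matrix of factor-item readings becomes a 20 x 2 matrix whose columns could then serve as the X and Y axes of a two-dimensional mapping.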

The collected data collection unit 101, attribute information collection unit 102, data combination means 103, and multivariate data storage means 104 constitute data collection means that store, as a database, multivariate data to which attribute information (described later) has been added.
FIG. 2 is an explanatory diagram showing examples of the data formats in the data collection units 101 and 102.
FIG. 2(a) shows an example of the record format stored in the collected data collection unit 101; the collected data include the failure occurrence rate, shaft temperature, oil temperature, and vibration value of the parts associated with each equipment ID (or sensor ID).
FIG. 2(b) shows an example of the record format stored in advance in the attribute information collection unit 102; the attribute information includes the installation location and installation year associated with each equipment ID.

FIG. 3 is an explanatory diagram showing an example table of multivariate data in the multivariate data storage means 104 (database).
In FIG. 3, the multivariate data table stores, for each equipment ID "1, 2, 3, ...", the data of FIG. 2(a) (failure occurrence rate of each part, shaft temperature, vibration value, oil temperature, etc.) in association with the attribute information of FIG. 2(b) (years elapsed, etc.).

Collected data (hereinafter simply "data") detected by the plurality of measuring instruments 100 (or by manual measurement) are stored in the collected data collection unit 101.
As shown in FIG. 2(a), the data include, together with the time of collection, the equipment ID (corresponding to the collection location, etc.) and state quantities of certain parts in the equipment (for example, influencing factors such as the failure occurrence rate and the shaft temperature, oil temperature, and shaft vibration value during operation).
The attribute information stored in advance in the attribute information collection unit 102 includes, as shown in FIG. 2(b), the equipment ID (or sensor ID) and influencing factors such as the installation location and installation year.

The data combination means 103 reads the data in the collected data collection unit 101, reads the attribute information in the attribute information collection unit 102, adds it to the data, and stores the result in the multivariate data storage means 104 as multivariate data for each equipment ID, as shown in FIG. 3.
That is, fixed attribute information for each facility, such as its installation location and installation date, is stored per equipment ID in the attribute information collection unit 102, just as the data in the collected data collection unit 101 are; the two kinds of data in units 101 and 102 are merged on equipment ID by the data combination means 103 and stored in the multivariate data DB 104.
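
The merge on equipment ID can be pictured as an inner join. The sketch below assumes records are plain dictionaries with a hypothetical key `equipment_id`; the other field names are also illustrative, and dropping collected records that have no attribute entry is an assumption the source does not address.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def merge_by_equipment_id(collected: List[Record], attributes: List[Record]) -> List[Record]:
    """Attach the fixed per-equipment attributes to every collected record."""
    # Index the attribute table by equipment ID for constant-time lookup
    attrs_by_id = {row["equipment_id"]: row for row in attributes}
    merged = []
    for row in collected:
        attrs = attrs_by_id.get(row["equipment_id"])
        if attrs is None:
            continue  # assumption: records without attribute information are skipped
        # Collected values win if a key appears in both tables
        merged.append({**attrs, **row})
    return merged
```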

The data extraction means 105, feature extraction means 106, feature storage means 106A, extracted data storage means 107, and state-specific classification creation means 108 constitute the data group extraction means. When selection information is set through the condition input means 105A according to the user's request, that is, an extraction condition (measurement date, equipment ID, items to extract, etc.) specifying two factor items for one state item, the data group extraction means extracts from the multivariate data in the multivariate data storage means 104 the multivariate data groups corresponding to the state item and factor items that satisfy the extraction condition.
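
A sketch of the extraction step under stated assumptions: records are dictionaries, the extraction condition (measurement date, equipment ID, and so on) is supplied as a predicate, and `extract_group` is a hypothetical name. It keeps only the designated state item and the two factor items, mirroring the "one state item, two factor items" condition described above.

```python
from typing import Any, Callable, Dict, List, Sequence

Record = Dict[str, Any]


def extract_group(
    records: List[Record],
    state_item: str,
    factor_items: Sequence[str],
    condition: Callable[[Record], bool] = lambda r: True,
) -> List[Record]:
    """Return records satisfying `condition`, reduced to the state item and two factor items."""
    if len(factor_items) != 2:
        raise ValueError("exactly two factor items must be specified")
    wanted = (state_item, *factor_items)
    return [
        {key: rec[key] for key in wanted}
        for rec in records
        # skip records that fail the condition or lack one of the requested items
        if condition(rec) and all(key in rec for key in wanted)
    ]
```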

The discriminability determination means 109 (for example, linear discriminability determination means based on a linear discriminant function) rotates the set of multivariate data groups extracted by the data group extraction means around the origin (the center of the discrimination line described later) and minimizes the mixed region (described later) of the groups when projected onto one factor item, thereby determining whether the state item can be linearly discriminated by the factor items.
The factor influence calculation means 111, factor influence display means 112, and display 113 constitute display means that calculate the degree of influence of the factor items on the state item from the result of the discriminability determination means 109 and display it.

The multivariate data groups extracted from the multivariate data storage means 104 by the data extraction means 105, based on the extraction condition from the condition input means 105A, are sent to the state-specific classification creation means 108 via the extracted data storage means 107.
The state-specific classification creation means 108 classifies the influencing-factor item data into a plurality of multivariate data groups according to the extraction condition (the item designated as the state item, the classification method, etc.) specified by the user through the condition input means 105A.

For example, when the "failure occurrence rate" of a certain part of the equipment is designated as the state and is to be classified into three multivariate data groups, high (60% or more), medium (20% to 60%), and low (20% or less), the state-specific classification creation means 108 classifies the data of the designated influencing-factor items, such as installation location, shaft temperature, oil temperature, and vibration value, into the three multivariate data groups.
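
The three-way split of the failure occurrence rate can be sketched as follows. The stated ranges share their endpoints (20% appears in both "medium" and "low", 60% in both "high" and "medium"), so this sketch assumes, without support from the source, that exactly 60% counts as high and exactly 20% as low; the key `failure_rate` and the percentage scale are also assumptions.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def classify_by_state(
    records: List[Record],
    state_item: str = "failure_rate",
    high: float = 60.0,
    low: float = 20.0,
) -> Dict[str, List[Record]]:
    """Split records into high / medium / low groups by the value of the state item."""
    groups: Dict[str, List[Record]] = {"high": [], "medium": [], "low": []}
    for rec in records:
        value = rec[state_item]
        if value >= high:        # assumption: the 60% boundary belongs to "high"
            groups["high"].append(rec)
        elif value <= low:       # assumption: the 20% boundary belongs to "low"
            groups["low"].append(rec)
        else:
            groups["medium"].append(rec)
    return groups
```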

Next, the state-specific classification creation means 108 sends the classified multivariate data groups, two at a time, to the discriminability determination means 109.
The discriminability determination means 109 performs its determination based on a selection condition from the function input means 109A according to the user's request (whether the multivariate data groups are to be separated by a linear or a nonlinear discriminant function). Here, as described later, a linear discriminant function is used.
If the selection condition from the function input means 109A is linear, no further user input is needed; if it is nonlinear, the order and coefficients of the function (for example, y = x^3) must also be specified.

For each pair of multivariate data groups from the state-specific classification creation means 108, the discriminability determination means 109 determines whether discrimination according to the user input is possible and calculates a discrimination degree from the result.
Next, the end determination block 110 checks whether the discriminability determination means 109 has processed all groups (that is, whether the exhaustive round of determinations is complete). If not all groups have been processed (NO), the discriminability determination means 109 continues its determination processing.
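
The exhaustive, two-groups-at-a-time evaluation amounts to iterating over every unordered pair of groups, with the end of the loop playing the role of the end determination block 110. In this sketch the per-pair judgement is passed in as a function, so it does not depend on whether a linear or nonlinear discriminant is chosen; `evaluate_all_pairs` and `judge` are hypothetical names.

```python
from itertools import combinations
from typing import Any, Callable, Dict, Hashable, List, Tuple


def evaluate_all_pairs(
    groups: Dict[Hashable, List[Any]],
    judge: Callable[[List[Any], List[Any]], Any],
) -> Dict[Tuple[Hashable, Hashable], Any]:
    """Apply `judge` to every unordered pair of groups, in insertion order."""
    results = {}
    for (name_a, data_a), (name_b, data_b) in combinations(groups.items(), 2):
        results[(name_a, name_b)] = judge(data_a, data_b)
    return results  # all pairs processed: the "YES" branch of the end determination
```

With the three failure-rate groups (high, medium, low) this gives three comparisons; in general, k groups give k(k-1)/2.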

If, on the other hand, the end determination block 110 determines that all groups have been processed (YES), the factor influence calculation means 111 calculates the degree of influence of each factor on the selected state, based on the discrimination degrees calculated for the multivariate data groups.
The calculated factor influence is shown on the display 113 via the factor influence display means 112.
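
The description does not give the formula used by the factor influence calculation means 111. Purely as a hypothetical illustration, assuming each group pair has a discrimination degree on a 0-100 scale where 0 means complete separation (the definition given later in this description), one simple choice is to score each factor-item pair as 100 minus its mean degree and rank the pairs from most to least influential.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

FactorPair = Tuple[str, str]


def rank_factor_influence(
    degrees: Dict[FactorPair, Sequence[float]],
) -> List[Tuple[FactorPair, float]]:
    """Hypothetical aggregation: influence = 100 - mean discrimination degree over all group pairs."""
    influence = {pair: 100.0 - mean(values) for pair, values in degrees.items()}
    # Highest influence (best separation on average) first
    return sorted(influence.items(), key=lambda item: item[1], reverse=True)
```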

Next, the determination processing of the discriminability determination means 109 in FIG. 1 is described in detail with reference to FIG. 4.
FIG. 4 is an explanatory diagram showing two-dimensional mappings of multivariate data groups.
Here, taking a high or low part failure occurrence rate as the factor, FIG. 4(a) shows a case in which the factor can be discriminated and FIG. 4(b) a case in which it cannot.

In FIG. 4(a), the horizontal axis (attribute information) is the number of years elapsed and the vertical axis (state quantity) is the vibration value; in FIG. 4(b), the horizontal axis (attribute information) is the installation location and the vertical axis (state quantity) is the oil temperature.
In FIGS. 4(a) and 4(b), the plot points "□" correspond to equipment with a low part failure occurrence rate, and the plot points "△" correspond to equipment with a high failure occurrence rate.
For example, as shown in FIG. 4(a), the detected vibration value tends to increase as the failure occurrence rate or the number of years elapsed increases.

First, the discriminability determination means 109 maps onto a plane, as in FIGS. 4(a) and 4(b), the data "vibration value" and "oil temperature" assumed to be influencing-factor data (state quantities) other than the "failure occurrence rate" selected as the state.
In this case, because two items were selected as the combination of influencing factors, each mapping is two-dimensional: in FIG. 4(a) the X axis is the years elapsed and the Y axis the vibration value, and in FIG. 4(b) the X axis is the installation location (for example, its elevation) and the Y axis the oil temperature.
The two multivariate data groups compared are the data "□" for a low failure occurrence rate and the data "△" for a high failure occurrence rate.

In FIG. 4(a), the data "□" with a low failure occurrence rate and the data "△" with a high failure occurrence rate are clearly separated by the discrimination line 403 (two-dot chain line), so in the case of FIG. 4(a) the two multivariate data groups are judged to allow linear discrimination of the failure occurrence rate (the state item corresponding to the factor).
That is, the item combination "years elapsed and vibration value" in FIG. 4(a) shows different tendencies depending on the "failure occurrence rate", and is therefore considered a factor that affects whether the failure occurrence rate is high or low.

In FIG. 4(b), on the other hand, the data "□" with a low failure occurrence rate and the data "△" with a high failure occurrence rate are intermixed and no discrimination line exists, so the two multivariate data groups are judged not to allow linear discrimination of the factor (high or low failure occurrence rate).

If, as in FIGS. 4(a) and 4(b), the data are plotted separately as "□" and "△" according to the "failure occurrence rate" and shown on the display 113, a person can judge visually whether the multivariate data groups are linearly discriminable.

Next, with reference to FIG. 5, the processing used to discriminate automatically, rather than by individual visual judgment, is described.
Like FIGS. 4(a) and 4(b), FIGS. 5(a) and 5(b) show multivariate data groups for years elapsed and vibration value, and for installation location and oil temperature; in this case, however, both FIGS. 5(a) and 5(b) show data groups for which the factor (high or low failure occurrence rate) can be linearly discriminated.

As shown in FIG. 5(a), if the discrimination line 501 is parallel to the vertical axis (Y axis) and separates the failure occurrence rate with 100% accuracy, the failure occurrence rate can be discriminated using only the item on the horizontal axis (X axis), the years elapsed.
To discriminate high and low failure occurrence rates automatically rather than visually, it is therefore sufficient to project the data "□" corresponding to a low failure occurrence rate and the data "△" corresponding to a high failure occurrence rate onto the X axis as regions 503 and 504, respectively, and to check whether the region on the X axis where the two kinds of data are mixed is no larger than a predetermined value (the largest value at which the factor can still be regarded as discriminable).

On the other hand, as shown in FIG. 5(b), if the discrimination line 502 is parallel to the X axis and separates the failure occurrence rate with 100% accuracy, the failure occurrence rate can be discriminated using only the item on the Y axis (oil temperature).
In that case it is sufficient to project the data "□" corresponding to a low failure occurrence rate and the data "△" corresponding to a high failure occurrence rate onto the Y axis as regions 505 and 506, respectively, and to check whether the mixed region on the Y axis is no larger than the predetermined value.
In the examples of FIGS. 5(a) and 5(b), the data are completely separated by the discrimination lines 501 and 502, so there is no mixed region and the factor can be regarded as discriminable.
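
A sketch of the axis-projection check, assuming that each group's projected "region" is the interval between its smallest and largest coordinate on the axis, so the mixed region is the overlap of the two intervals. Points are `(x, y)` tuples, `axis` is 0 for X and 1 for Y, and both function names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def projected_mixed_length(group_a: Sequence[Point], group_b: Sequence[Point], axis: int) -> float:
    """Length of the interval on `axis` where the projections of both groups overlap."""
    a = [p[axis] for p in group_a]
    b = [p[axis] for p in group_b]
    lower = max(min(a), min(b))
    upper = min(max(a), max(b))
    return max(0.0, upper - lower)  # 0 when the projected ranges are disjoint or just touch


def separable_on_axis(group_a: Sequence[Point], group_b: Sequence[Point], axis: int,
                      max_mixed: float = 0.0) -> bool:
    """True if the mixed region on `axis` is no larger than the predetermined value."""
    return projected_mixed_length(group_a, group_b, axis) <= max_mixed
```

In the FIG. 5(a) situation the X-axis check would pass while the Y-axis check need not, and vice versa for FIG. 5(b).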

Next, with reference to FIG. 6, the rotation processing is described for the case in which the discrimination line 601 is not parallel to the X or Y axis and the factor can be almost, but not 100%, discriminated.
FIG. 6 is an explanatory diagram showing two-dimensional mappings of multivariate data groups as before, with the X axis (years elapsed) and the Y axis (vibration value) as the influencing factors for a high or low failure occurrence rate.

FIG. 6(a) shows a state in which the discrimination line 601 is not parallel to the X or Y axis and the factor (high or low failure occurrence rate) can be almost, but not 100%, discriminated.
FIGS. 6(b) and 6(c) show the multivariate data groups of FIG. 6(a) rotated successively clockwise about the center of the discrimination line 601 as the origin.

In the state of FIG. 6(a), the two multivariate data groups, the data "□" with a low failure occurrence rate and the data "△" with a high failure occurrence rate, are separated by the discrimination line 601, a linear function with a negative slope.
When the data of the multivariate data groups are projected onto the X axis, there are therefore regions 602 and 603 in which only one group's data are present and a region 604 in which the data of both groups are mixed.

As shown in FIGS. 6B and 6C, the data group of FIG. 6A is rotated clockwise as a single whole, a little at a time (for example, in 10-degree steps), about the center of the discrimination line 601 as the origin, and in each state the data are projected onto the X axis.
As the rotation angle increases from FIG. 6A through FIG. 6B to FIG. 6C, the mixed regions 604, 614, and 624 become progressively smaller, and when the discrimination line 601 becomes parallel to the Y axis, as in FIG. 6C, the mixed region 624 between the two groups' data regions projected onto the X axis reaches its minimum.
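The rotate-and-project loop can be sketched as follows. This assumes the rotation center is supplied by the caller (for example the midpoint of the discrimination line), the step is a whole number of degrees, and the mixed region is measured by its width on the X axis using min/max intervals. `rotate_clockwise`, `mixed_width_on_x`, `best_rotation` and the sample points are hypothetical.

```python
import math


def rotate_clockwise(points, angle_deg, center):
    """Rotate (x, y) points clockwise by angle_deg about center."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    cx, cy = center
    rotated = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        rotated.append((cx + dx * c + dy * s, cy - dx * s + dy * c))
    return rotated


def mixed_width_on_x(group_a, group_b):
    """Width of the overlap of the two groups' X intervals (0 if disjoint)."""
    xa = [x for x, _ in group_a]
    xb = [x for x, _ in group_b]
    return max(0.0, min(max(xa), max(xb)) - max(min(xa), min(xb)))


def best_rotation(group_a, group_b, center, step_deg=10):
    """Sweep 0..90 degrees; return (angle, mixed width) with the smallest width."""
    best = None
    for angle in range(0, 91, step_deg):
        width = mixed_width_on_x(rotate_clockwise(group_a, angle, center),
                                 rotate_clockwise(group_b, angle, center))
        if best is None or width < best[1]:
            best = (angle, width)
    return best


# Two groups split by the line x + y = 0 (negative slope), centered on the origin.
low = [(-1.0, 0.9), (0.9, -1.0)]
high = [(1.0, -0.9), (-0.9, 1.0)]
print(best_rotation(low, high, center=(0.0, 0.0), step_deg=15))  # (45, 0.0)
```

A 45-degree clockwise rotation makes the line x + y = 0 parallel to the Y axis, so the mixed width drops to zero. With 10-degree steps the sweep only reaches 40 and 50 degrees, both of which leave a small nonzero mixed width.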

As described above, the data of the two groups are rotated and projected onto the X axis or the Y axis, the state in which the "ratio of the mixed area to the entire distribution area" is smallest is found, and the influencing factors in that state are regarded as being at their most discriminable.
If this ratio of the mixed area is defined as the "discrimination degree", then when the mixed area is "0" and the data can be 100% discriminated, as in FIGS. 5A and 5B, the discrimination degree is "0".
Conversely, when the data of the two groups are completely mixed (their data regions overlap 100%), as in FIG. 4B, the discrimination degree is "100".
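The definition can be written directly as a function of the two groups' intervals on the projection axis. This sketch assumes each region is an interval (lo, hi) and takes "the entire distribution area" to be the span from the leftmost to the rightmost edge of the two regions; `discrimination_degree` is a hypothetical name.

```python
def discrimination_degree(span_a, span_b):
    """Percentage of the combined distribution region covered by the mixed region.

    0 means the two groups are fully separated; 100 means their regions coincide.
    """
    mixed = max(0.0, min(span_a[1], span_b[1]) - max(span_a[0], span_b[0]))
    total = max(span_a[1], span_b[1]) - min(span_a[0], span_b[0])
    if total == 0:
        return 100.0  # both groups collapse onto the same single value
    return 100.0 * mixed / total


print(discrimination_degree((0, 1), (2, 3)))   # 0.0   (separated, as in FIG. 5)
print(discrimination_degree((0, 4), (0, 4)))   # 100.0 (fully mixed, as in FIG. 4B)
print(discrimination_degree((0, 6), (4, 10)))  # 20.0
```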

Therefore, as in FIG. 6C, if the discrimination degree (ratio) of the minimized mixed region 624 is equal to or less than a predetermined value (for example, 20%), the factor can be discriminated using the extracted multivariate data.
On the other hand, if the discrimination degree of the minimized mixed region 624 exceeds the predetermined value, the factor cannot be discriminated using the extracted multivariate data.
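The decision rule reduces to taking the minimum over the rotation sweep and comparing it with the threshold. A minimal sketch, assuming the discrimination degree has already been computed for each angle; the 20% default mirrors the example value above, and `judge` and the input values are hypothetical. A degree exactly equal to the threshold is treated as discriminable ("equal to or less than").

```python
def judge(degrees_by_angle, threshold=20.0):
    """Pick the rotation with the smallest discrimination degree and decide.

    degrees_by_angle maps a rotation angle in degrees to the discrimination
    degree (0-100) measured at that angle. Returns
    (best_angle, minimum_degree, discriminable).
    """
    best_angle = min(degrees_by_angle, key=degrees_by_angle.get)
    minimum = degrees_by_angle[best_angle]
    return best_angle, minimum, minimum <= threshold


# Made-up degrees for a sweep in 10-degree steps.
print(judge({0: 60.0, 10: 35.0, 20: 12.5, 30: 18.0}))  # (20, 12.5, True)
print(judge({0: 80.0, 10: 45.0}))                       # (10, 45.0, False)
```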

In FIG. 6, the data are rotated about the center of the discrimination line 601, but the origin of the two-dimensional mapping (the intersection of the X and Y axes) may instead be used as the center of rotation.
The smaller the step size of the rotation angle of the multivariate data group, the more closely the rotated data can be aligned with the factor, enabling more accurate discrimination.
Wherever the discrimination line lies, rotating it by at most 90 degrees always makes it parallel to the X axis or the Y axis, so a maximum rotation of 90 degrees is sufficient.
For the data region of each influencing factor, the sample mean ± 2σ is used, to allow for outliers and noise.
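Both points of this paragraph can be sketched in a few lines. This assumes σ is the sample standard deviation (`statistics.stdev`; the description does not say whether the sample or population value is meant) and that the discrimination line has a finite slope. `data_region` and `clockwise_angle_to_axis` are hypothetical names.

```python
import math
import statistics


def data_region(values, k=2.0):
    """Region of one group on an axis: sample mean +/- k standard deviations."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (needs >= 2 values)
    return m - k * sd, m + k * sd


def clockwise_angle_to_axis(slope):
    """Clockwise rotation (degrees, 0 <= angle < 90) that makes a line of the
    given finite slope parallel to an axis, and the axis it becomes parallel to."""
    theta = math.degrees(math.atan(slope))  # line direction, -90 < theta < 90
    if slope < 0:
        return theta + 90.0, "Y"  # becomes vertical: project the data onto X
    return theta, "X"             # becomes horizontal: project the data onto Y


print(data_region([1, 2, 3, 4, 5]))   # roughly (-0.162, 6.162)
print(clockwise_angle_to_axis(-1.0))  # about 45 degrees, parallel to Y
```

A negative-slope line such as 601 becomes parallel to the Y axis, and a positive-slope line becomes parallel to the X axis, in both cases within a rotation of less than 90 degrees.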

Furthermore, the case where two factor items (elapsed years and vibration value) affect whether the failure occurrence rate is high or low has been described here. When there are three or more influencing-factor items (for example, elapsed years, vibration value, and oil temperature), the discrimination degree (ratio of the mixed area) can be calculated for every pair of items in round-robin fashion, which makes it possible to identify the influencing factors automatically among any number of multivariate items.
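A round-robin over all pairs can be sketched with `itertools.combinations`. Assumptions: each record is a tuple of factor-item values; the two groups are the records with a low and a high failure occurrence rate; regions are min/max intervals (not ± 2σ) to keep the sketch short; rotation is about the origin; and at each angle the data are projected onto both the X axis and the Y axis, which together cover every line orientation within a 0 to 90 degree sweep. Function names and sample values are hypothetical.

```python
import math
from itertools import combinations


def _degree(a, b):
    """Discrimination degree (0-100) of two lists of projected values."""
    mixed = max(0.0, min(max(a), max(b)) - max(min(a), min(b)))
    total = max(max(a), max(b)) - min(min(a), min(b))
    return 100.0 if total == 0 else 100.0 * mixed / total


def min_degree(points_a, points_b, step_deg=10):
    """Smallest degree over clockwise rotations 0..90 deg, projecting on X and Y."""
    best = 100.0
    for angle in range(0, 91, step_deg):
        t = math.radians(angle)
        c, s = math.cos(t), math.sin(t)
        for px, py in ((c, s), (-s, c)):  # rotated X axis, rotated Y axis
            a = [px * x + py * y for x, y in points_a]
            b = [px * x + py * y for x, y in points_b]
            best = min(best, _degree(a, b))
    return best


def rank_factor_pairs(rows_a, rows_b, names, step_deg=10):
    """Evaluate every pair of factor items; most discriminable pairs first."""
    results = []
    for i, j in combinations(range(len(names)), 2):
        pa = [(r[i], r[j]) for r in rows_a]
        pb = [(r[i], r[j]) for r in rows_b]
        results.append(((names[i], names[j]), min_degree(pa, pb, step_deg)))
    return sorted(results, key=lambda item: item[1])


names = ["years", "vibration", "oil_temp"]
rows_low = [(1, 5, 40), (2, 6, 42), (3, 4, 41)]   # low failure occurrence rate
rows_high = [(7, 5, 40), (8, 6, 42), (9, 4, 41)]  # high failure occurrence rate
for pair, degree in rank_factor_pairs(rows_low, rows_high, names):
    print(pair, round(degree, 1))
```

In this made-up data only the elapsed years differ between the groups, so both pairs containing "years" reach a degree of 0, while the vibration/oil temperature pair stays at 100 because both groups share identical values there.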

As described above, the multivariate data discriminating apparatus according to Embodiment 1 of the present invention determines whether data collected by the measuring instrument 100 comprising sensors (or by manual measurement) can be discriminated by items designated as influencing factors for one factor. The apparatus includes: data collection means 101 to 104 that collect the data, add to it attribute information corresponding to the ID information contained in the data, and store the result in a database as multivariate data; data group extraction means 105 to 108 that, when an extraction condition designating two factor items for one state item corresponding to the factor is set, extract from the multivariate data a multivariate data group corresponding to the state item and factor items that satisfy the extraction condition; discriminability determination means 109 that two-dimensionally maps and rotates the multivariate data group and determines whether the state item can be discriminated by the factor items by minimizing the mixed region of the group when it is projected onto one factor item; and display means 111 to 113 that calculate the degree of influence of the factor items on the state item from the determination result of the discriminability determination means 109 and display it.

The data collection means includes a collected data collection unit 101 that accumulates the data, an attribute information collection unit 102 that stores the attribute information, data combining means 103 that combines the data and the attribute information into multivariate data, and multivariate data storage means 104 serving as a database that stores the multivariate data. The attribute information includes the collection time or collection location of the data.
The discriminability determination means 109 calculates the ratio of the mixed area to the distribution area of the multivariate data group as the discrimination degree, and determines that the state item can be discriminated by the factor items when the discrimination degree of the minimized mixed area is equal to or less than a predetermined value.
This eliminates the need to define a discriminant function, so whether the multivariate data can be discriminated is easy to determine even when the data distribution is not normal and the Mahalanobis distance is difficult to introduce, as in FIGS. 4A and 4B.
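The combining step performed by the data combining means 103 amounts to a join on the ID. A minimal sketch, assuming records are dicts with an "id" key and attribute information is keyed by the same ID; skipping records without attributes is an assumption, and the field names and values are hypothetical because the record formats of FIGS. 2 and 3 are not reproduced here.

```python
def combine(collected, attributes):
    """Attach attribute information to each collected record by its ID.

    collected:  list of dicts, each with an "id" key plus measured values.
    attributes: dict mapping an ID to a dict of attributes
                (for example collection time and collection place).
    Records whose ID has no attribute entry are skipped.
    """
    merged = []
    for record in collected:
        attr = attributes.get(record["id"])
        if attr is None:
            continue
        merged.append({**record, **attr})
    return merged


collected = [
    {"id": "P1", "vibration": 0.8},
    {"id": "P2", "vibration": 1.4},
    {"id": "P9", "vibration": 0.1},  # no attribute entry: dropped
]
attributes = {
    "P1": {"time": "t1", "place": "site A"},
    "P2": {"time": "t2", "place": "site B"},
}
print(combine(collected, attributes))
```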

When three or more items can be considered as factor items for the state item, the discriminability determination means 109 evaluates the factor items two at a time.
Therefore, even with three or more multivariate items, whether the multivariate data can be discriminated can be determined by evaluating pairs of items through two-dimensional mapping.

Embodiment 2.
In Embodiment 1, the case where a linear discriminant function is selected with the function input means 109A (see FIG. 1) was taken as an example, and the multivariate data were determined to be discriminable when the data could be separated by a discrimination line. However, even when a nonlinear discriminant function (such as a curve) is selected with the function input means 109A, the discriminability determination means 109 can determine discriminability in the same way.
Next, Embodiment 2 of the present invention, in which a nonlinear curve or the like is selected as the discriminant function, will be described with reference to FIG. 7.

FIG. 7 is an explanatory diagram illustrating the discrimination method according to Embodiment 2 of the present invention and, as before, shows a two-dimensional mapping of a multivariate data group. Here a cubic function is assumed as the nonlinear discriminant function, but the same applies when any other discriminant function is used.
When linear discrimination is performed as described above (see FIG. 6), it suffices simply to rotate the data group little by little; when nonlinear discrimination is performed as in Embodiment 2, however, the assumed nonlinear discriminant function must be input from the function input means 109A.

As shown in FIG. 7, when the discriminant function is a cubic function 702, it is approximated by three discrimination lines 703, 704, and 705, each corresponding to one of the regions 706, 707, and 708.
In this case, the upward and downward convex portions of the cubic function 702 and the points at which its derivative is "0" are found and the function is divided into three parts; the regions 706, 707, and 708 delimited by the intersections of the three discrimination lines 703, 704, and 705, each obtained from the corresponding derivative, are then rotated individually.
The subsequent method of rotating the data and evaluating the discrimination degree is the same as in Embodiment 1.
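The splitting step can be sketched for a cubic a x³ + b x² + c x + d under several stated assumptions: the approximation is restricted to a finite X range; the range is cut at the points where the derivative is zero (rather than at the intersections of neighboring lines, as a simplification); and each segment's line is the tangent at the segment midpoint, since the description does not say at which point each slope is taken. All function names are hypothetical.

```python
import math


def critical_points(a, b, c):
    """Real x where the derivative 3a x^2 + 2b x + c of a x^3 + b x^2 + c x + d is 0."""
    if a == 0:
        raise ValueError("not a cubic: a must be non-zero")
    A, B, C = 3 * a, 2 * b, c
    disc = B * B - 4 * A * C
    if disc <= 0:
        return []  # no local maximum/minimum: the cubic is monotonic
    r = math.sqrt(disc)
    return sorted([(-B - r) / (2 * A), (-B + r) / (2 * A)])


def piecewise_lines(coeffs, x_min, x_max):
    """Approximate a cubic on [x_min, x_max] by one straight line per segment.

    Segments are cut where the derivative is zero; each segment's line is
    the tangent at the segment midpoint. Returns (lo, hi, slope, intercept).
    """
    a, b, c, d = coeffs

    def f(x):
        return ((a * x + b) * x + c) * x + d

    def df(x):
        return (3 * a * x + 2 * b) * x + c

    cuts = [x for x in critical_points(a, b, c) if x_min < x < x_max]
    bounds = [x_min] + cuts + [x_max]
    lines = []
    for lo, hi in zip(bounds, bounds[1:]):
        mid = (lo + hi) / 2
        slope = df(mid)
        lines.append((lo, hi, slope, f(mid) - slope * mid))
    return lines


def segment_index(x, lines):
    """Index of the segment (region) whose X range contains x."""
    for k, (lo, hi, _, _) in enumerate(lines):
        if lo <= x <= hi:
            return k
    raise ValueError("x is outside the approximated range")


lines = piecewise_lines((1, 0, -3, 0), -3, 3)  # y = x^3 - 3x on [-3, 3]
for line in lines:
    print(line)  # slopes 9, -3, 9 with intercepts 16, 0, -16
```

Each data point can then be assigned to a region with `segment_index` and that region rotated by the angle that makes its own line parallel to an axis, after which the Embodiment 1 procedure applies.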

As described above, according to Embodiment 2 of the present invention, even when the discriminant function is nonlinear, the discriminability determination means 109 divides it into a finite number of regions 706 to 708 and approximates it linearly in each region, so whether discrimination is possible can be determined easily even for a complicated nonlinear discriminant function.

Embodiment 3.
In Embodiments 1 and 2, raw multivariate data are the target of determination by the discriminability determination means 109. However, when there are so many candidate influencing-factor items that evaluating every pair in round-robin fashion is impractical, the feature quantity extraction means 106 and the feature quantity storage means 106A, which use singular value decomposition, may be inserted as indicated by the broken-line arrows in FIG. 1, so that the dimensionality is compressed (for example, to two dimensions) before the determination process is executed.
With this configuration, influencing factors can be found easily even in a very large volume of multivariate data.

The feature quantity extraction means 106 may also be used whenever three or more items are candidate influencing factors (factor items for the state item).
That is, the data group extraction means includes the feature quantity extraction means 106 (variable compression means), and when three or more items are candidate factor items for the state item, the feature quantity extraction means 106 applies singular value decomposition to the factor items, extracts only the feature quantities, and compresses them into two variables.
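A minimal NumPy sketch of the compression to two variables. Centering each factor-item column before the decomposition is an assumption (the description does not mention it), and the two compressed variables are taken to be the scores on the two right singular vectors with the largest singular values; `compress_to_two` and the sample matrix are hypothetical.

```python
import numpy as np


def compress_to_two(X):
    """Compress an (n_samples, n_factor_items) array to two variables via SVD.

    Columns are mean-centered first (an assumption), and each sample is
    projected onto the two right singular vectors with the largest
    singular values.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:2]              # shape (2, n_factor_items)
    scores = centered @ components.T  # shape (n_samples, 2)
    return scores, components, singular_values


# Three factor items; the second is exactly twice the first.
X = [[1, 2, 3], [2, 4, 7], [3, 6, 8], [4, 8, 12]]
scores, components, singular_values = compress_to_two(X)
print(scores.shape, singular_values)
```

The two columns of `scores` can then be mapped in two dimensions and evaluated with the rotation procedure of Embodiment 1. Because the second column here is redundant, the third singular value is numerically zero and the two retained variables reproduce the centered data.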

FIG. 1 is a block diagram, partly in flowchart form, showing the overall configuration of the multivariate data discriminating apparatus according to Embodiments 1 to 3 of the present invention.
FIG. 2 is an explanatory diagram showing the recording formats of (a) the collected data and (b) the attribute information according to Embodiments 1 to 3 of the present invention.
FIG. 3 is an explanatory diagram showing the recording format of the multivariate data according to Embodiments 1 to 3 of the present invention.
FIG. 4 is an explanatory diagram showing a two-dimensional mapping of a multivariate data group according to Embodiment 1 of the present invention.
FIG. 5 is an explanatory diagram showing a two-dimensional mapping of a multivariate data group according to Embodiment 1 of the present invention.
FIG. 6 is an explanatory diagram showing the rotation process applied to the two-dimensional mapping of a multivariate data group according to Embodiment 1 of the present invention.
FIG. 7 is an explanatory diagram showing a nonlinear discriminant function on the two-dimensional mapping of a multivariate data group according to Embodiment 2 of the present invention.

Explanation of symbols

100 measuring instrument, 101 collected data collection unit, 102 attribute information collection unit, 103 data combining means, 104 multivariate data storage means, 105 data extraction means, 105A condition input means, 106 feature quantity extraction means, 106A feature quantity storage means (variable compression means), 107 extracted data storage means, 108 state-specific classification creation means, 109 discriminability determination means, 109A function input means, 111 factor influence calculation means, 112 factor influence display means, 113 display, 403, 501, 502, 601, 703, 704, 705 discrimination lines, 604, 614, 624 mixed areas, 706, 707, 708 regions.

Claims

A multivariate data discriminating apparatus that determines whether data collected by a sensor or by manual measurement can be discriminated by items designated as influencing factors for one factor, the apparatus comprising:
data collection means for collecting the data, adding to the data attribute information corresponding to ID information contained in the data, and storing the result in a database as multivariate data;
data group extraction means for extracting from the multivariate data, when an extraction condition designating two factor items for one state item corresponding to the factor is set, a multivariate data group corresponding to the state item and the factor items that satisfy the extraction condition;
discriminability determination means for determining whether the state item can be discriminated by the factor items by two-dimensionally mapping and rotating the multivariate data group and minimizing the mixed area of the multivariate data group when it is projected onto one factor item; and
display means for calculating the degree of influence of the factor items on the state item from the determination result of the discriminability determination means and displaying it.

The multivariate data discriminating apparatus according to claim 1, wherein the discriminability determination means calculates the ratio of the mixed area to the distribution area of the multivariate data group as a discrimination degree and, when the discrimination degree of the minimized mixed area is equal to or less than a predetermined value, determines that the state item can be discriminated by the factor items.

The multivariate data discriminating apparatus according to claim 1, wherein, when three or more items can be considered as the factor items for the state item, the discriminability determination means evaluates the factor items two at a time.

The multivariate data discriminating apparatus according to claim 3, wherein the data group extraction means includes variable compression means, and, when three or more items can be considered as the factor items for the state item, the variable compression means applies singular value decomposition to the factor items, extracts only the feature quantities, and compresses them into two variables.

The multivariate data discriminating apparatus according to any one of claims 1 to 4, wherein the discriminability determination means includes function input means for inputting a linear discriminant function and uses the discriminant function to determine whether the state item can be discriminated by the factor items.

The multivariate data discriminating apparatus according to claim 1, wherein the discriminability determination means includes function input means for inputting a nonlinear discriminant function, divides the discriminant function into a plurality of finitely divided regions, approximates it linearly in each region, and thereby determines whether the state item can be discriminated by the factor items.

The multivariate data discriminating apparatus according to any one of claims 1 to 6, wherein the data collection means includes:
a collected data collection unit that accumulates the data;
an attribute information collection unit that stores the attribute information;
data combining means for combining the data and the attribute information into multivariate data; and
multivariate data storage means serving as a database that stores the multivariate data,
and the attribute information includes the collection time or collection location of the data.