JP7230654B2

JP7230654B2 - Data analysis method, data analysis apparatus, and computer program

Info

Publication number: JP7230654B2
Application number: JP2019074580A
Authority: JP
Inventors: 伊弦宮嵜; 隆道岩田
Original assignee: Toyota Central R&D Labs Inc
Current assignee: Toyota Central R&D Labs Inc
Priority date: 2019-04-10
Filing date: 2019-04-10
Publication date: 2023-03-01
Anticipated expiration: 2039-04-10
Also published as: JP2020173581A

Description

The present invention relates to a data analysis method, a data analysis device, and a computer program.

The raw materials from which products are made vary in their characteristics (for example, particle size). Methods of analyzing data that contain such variation are conventionally known (see, for example, Patent Document 1). The technique described in Patent Document 1 uses the graphical lasso to evaluate log data recorded when an exposure apparatus performs predetermined control processing. Non-Patent Document 1 describes a method of evaluating the degree of association between characteristics in data.

Patent Document 1: JP 2017-167310 A

Non-Patent Document 1: Kazunori Yamaguchi and Motohisa Hirono, supervised by Masahiko Munechika, "Introduction to SEM Causal Analysis: JUSE-StatWorks Official Text (Practical Series 6)", Nikkagiren Publishing Co., Ltd., July 2012

The relationships among the various data collected in a manufacturing process may depend on the production line. For this reason, such data are analyzed after being divided by production line (hereinafter also called "stratification"). However, stratification reduces the number of samples even for data that do not depend on the production line, such as raw material lot data. A degree of association evaluated from so few samples may deviate greatly from its true value. In addition, when a correlation coefficient is used as the degree of association, a high correlation coefficient is sometimes obtained by chance. If a Gaussian graphical model is applied in such a case, the variance-covariance matrix or correlation coefficient matrix may fail to be positive definite, so the degree of association cannot be evaluated.
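The loss of positive definiteness with few samples can be illustrated with a minimal sketch. It relies on a standard fact rather than on anything stated in the source: a correlation matrix computed from n samples has rank at most n - 1, so with three samples and three variables it is singular. The measurement values and the helper names `pearson`, `corr_matrix`, and `det3` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def corr_matrix(columns):
    """Correlation coefficient matrix for a list of equal-length columns."""
    p = len(columns)
    return [[1.0 if i == j else pearson(columns[i], columns[j])
             for j in range(p)] for i in range(p)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Hypothetical values: three variables measured on only three samples,
# as happens after stratifying a small data set by production line.
columns = [
    [6.2, 5.8, 6.5],
    [1.1, 1.4, 0.9],
    [3.2, 3.0, 3.9],
]
R = corr_matrix(columns)

# The determinant is zero up to rounding, so R is not positive definite
# and cannot be inverted for a Gaussian graphical model.
singular = abs(det3(R)) < 1e-9
print(singular)
```

Pooling the production-line-independent variables over all lots, as the invention proposes, raises the sample count behind those matrix entries.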

The present invention has been made to solve at least part of the problems described above, and its object is to correctly evaluate the degree of association between characteristics of products manufactured from the same raw material lot on different production lines.

The present invention has been made to solve at least part of the problems described above, and can be realized in the following form. A data analysis method comprising: an acquisition step of acquiring a data set that is data collected in a manufacturing process and that includes non-dependent data, which does not depend on a specific process to be analyzed within the manufacturing process, and dependent data, which depends on the specific process; and an evaluation step of evaluating relationships in the data set using all of the non-dependent data and the dependent data corresponding to the specific process, wherein the evaluation of the relationships is performed by inputting into a Gaussian graphical model a matrix in which, within a variance-covariance matrix or correlation coefficient matrix obtained with the data set divided for each specific process, the elements corresponding to the non-dependent data are taken from a variance-covariance matrix or correlation coefficient matrix obtained without dividing the data set for each specific process. The present invention can also be realized in the following forms.

(1) According to one form of the present invention, a data analysis method is provided. This data analysis method comprises: a step of acquiring a data set that is data collected in a manufacturing process and that includes non-dependent data, which does not depend on a specific process to be analyzed within the manufacturing process, and dependent data, which depends on the specific process; and a step of evaluating relationships in the data set using all of the non-dependent data and the dependent data corresponding to the specific process.

With this configuration, the relationships in the data set are evaluated using all of the non-dependent data, which does not depend on the specific process, together with the dependent data corresponding to the specific process (that is, of the data divided for each specific process, the data corresponding to that process). Compared with dividing the entire data set for each specific process, this limits the reduction in the number of samples of non-dependent data. It also suppresses the problems that the reduction caused (the degree of association deviating from its true value, and the variance-covariance matrix or correlation coefficient matrix failing to be positive definite when a Gaussian graphical model is applied), so the degree of association can be evaluated correctly.

(2) In the data analysis method of the above form, the specific process may be a production line that implements the manufacturing process.
With this configuration, because the specific process is a production line, relationships can be evaluated in a data set that includes non-dependent data from before processing on the production line and dependent data from after processing.

(3) In the data analysis method of the above form, the non-dependent data may be data representing various characteristics of each raw material used in the manufacturing process, and the dependent data may be data representing at least one of various characteristics of the processed product acquired in each step of the manufacturing process and the processing conditions for processing each raw material.
With this configuration, the relationships among the characteristics of the raw materials can be evaluated as non-dependent data, and at least one of the relationships among the characteristics of the processed product and the relationships among the processing conditions in the manufacturing process can be evaluated as dependent data.

(4) In the data analysis method of the above form, the evaluation of the relationships may be performed by inputting into a Gaussian graphical model a matrix in which, within a variance-covariance matrix or correlation coefficient matrix obtained with the data set divided for each specific process, the elements corresponding to the non-dependent data are taken from a variance-covariance matrix or correlation coefficient matrix obtained without dividing the data set for each specific process.
With this configuration, a variance-covariance matrix or correlation coefficient matrix is used to evaluate the relationships in the data set. Within the variance-covariance matrix or correlation coefficient matrix of each divided data set, the elements corresponding to the non-dependent data are replaced with the variance-covariance matrix or correlation coefficient matrix of the undivided non-dependent data. As a result, the relationships can be computed without the errors that chance high correlation coefficients between non-dependent variables cause in a Gaussian graphical model, and the relationships among the variables can be evaluated as a network.
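The replacement described in form (4) can be written as a block-matrix substitution. The notation here is an assumption introduced only for illustration and does not appear in the source: $S$ indexes the non-dependent variables, $D$ the dependent variables, $R^{(k)}$ is the correlation coefficient matrix (or variance-covariance matrix) computed from the samples of specific process $k$ alone, and $R^{\mathrm{all}}_{SS}$ is computed from the non-dependent data of the undivided data set.

```latex
R^{(k)} =
\begin{pmatrix}
R^{(k)}_{SS} & R^{(k)}_{SD} \\
R^{(k)}_{DS} & R^{(k)}_{DD}
\end{pmatrix}
\quad\longrightarrow\quad
\tilde{R}^{(k)} =
\begin{pmatrix}
R^{\mathrm{all}}_{SS} & R^{(k)}_{SD} \\
R^{(k)}_{DS} & R^{(k)}_{DD}
\end{pmatrix}
```

Only the $SS$ block changes; the blocks that involve dependent variables still come from the samples of process $k$. The matrix $\tilde{R}^{(k)}$ is what is input to the Gaussian graphical model.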

The present invention can be realized in various forms, for example as a data analysis device, a data analysis system, a data analysis method, a method of controlling a data analysis device, a computer program for executing these devices and methods, a server device for distributing the computer program, or a non-transitory storage medium storing the computer program.

FIG. 1 is a block diagram of a data analysis device as one embodiment of the present invention.
FIG. 2 is an example of a data set including raw material data and product data.
FIG. 3 is an example of a correlation coefficient matrix calculated from the raw material data.
FIG. 4 is a data set corresponding to a production line.
FIG. 5 is a data set corresponding to a production line.
FIG. 6 is part of a correlation coefficient matrix calculated from the data set corresponding to a production line.
FIG. 7 is a correlation coefficient matrix in which part of the correlation coefficient matrix shown in FIG. 6 has been replaced.
FIG. 8 is a flowchart of the data analysis method in the present embodiment.
FIG. 9 is a flowchart of the data analysis method in the second embodiment.

<First Embodiment>
FIG. 1 is a block diagram of a data analysis device 10 as one embodiment of the present invention. As shown in FIG. 1, in a process in which raw material powders A and B are processed on each of production lines L1 and L2 and manufactured into products, the data analysis device 10 acquires raw material data DTS representing various characteristics of the raw material powders A and B, and product data DTL1 and DTL2 representing various characteristics of the products. The data analysis device 10 evaluates the relationships among the characteristics in the acquired data.

The data analysis device 10 includes a CPU (Central Processing Unit) 1, a ROM (Read Only Memory) 2, a RAM (Random Access Memory) 3, and a storage unit 9. The CPU 1 functions as an acquisition unit 4 and an evaluation unit 5 by loading a computer program stored in the ROM 2 into the RAM 3 and executing it. The storage unit 9 stores various data such as the raw material data DTS and the product data DTL1 and DTL2. The storage unit 9 is configured with a hard disk drive (HDD) or the like.

In the manufacturing process of the present embodiment, as shown in FIG. 1, a raw material obtained by adding a binder to two types of raw material powder, A and B, is split between production line L1 and production line L2. On each of production lines L1 and L2, the raw material is stirred, compression-molded into the shape of a gear, sintered at a predetermined temperature for a predetermined time, and shipped as a product. The acquisition unit 4 acquires the raw material data DTS and the product data DTL1 and DTL2 collected in the manufacturing process. The raw material data DTS is data from before the binder is added, and is non-dependent data that does not depend on production lines L1 and L2, which are the analysis targets of the present embodiment. The product data DTL1 and DTL2, on the other hand, are dependent data that depend on production lines L1 and L2. The raw material data DTS and the product data DTL1 and DTL2 may be entered through an input unit such as a keyboard or mouse (not shown), or may be acquired by wired or wireless communication. The following describes the case in which the relationships in the collected data are evaluated for each of production lines L1 and L2. Production lines L1 and L2 therefore correspond to the specific process.

FIG. 2 is an example of a data set DS including the raw material data DTS and the product data DTL1 and DTL2. FIG. 2 shows, as a table, the characteristics associated with six product IDs. For product ID 1, for example, the raw material lot ID is a, the purity of raw material powder A is 6.2, the moisture content of raw material powder A is 1.1, the production line is L1, the rotation speed of the stirring blade during stirring is 3.2, the weight before compression is 5.3, and the hardness of the product is 10.5. These values are expressed relative to normalized reference values and therefore have no units. In other embodiments, the unnormalized measured values themselves may be used as the data.

As shown in FIG. 2, product IDs 1 to 3 are manufactured on production line L1, and product IDs 4 to 6 are manufactured on production line L2. The purity and the moisture content of raw material powder A shown in FIG. 2 are non-dependent data that do not depend on production lines L1 and L2. The rotation speed of the stirring blade, the weight before compression, and the hardness, on the other hand, are dependent data that depend on production lines L1 and L2. Each of the product data DTL1 and DTL2, as dependent data, represents various characteristics obtained in each step of the manufacturing process and the processing conditions for processing each raw material. Other non-dependent data in the present embodiment include the particle sizes of raw material powders A and B. Other dependent data in the present embodiment include the stirring blade torque, compression load, sintering temperature, and sintering time as processing conditions, and the weight after compression and the weight after sintering as characteristics of the processed product.

The evaluation unit 5 evaluates the relationships among the characteristics in the data set DS shown in FIG. 2 using all of the raw material data DTS together with the product data DTL1 and DTL2. Using all of the raw material data DTS, which does not depend on production lines L1 and L2, the evaluation unit 5 evaluates the relationships among the characteristics of each of raw material powders A and B. The evaluation unit 5 of the present embodiment calculates a correlation coefficient matrix from the raw material data DTS shown in FIG. 2.

FIG. 3 is an example of a correlation coefficient matrix calculated from the raw material data DTS. In FIG. 3, the correlation coefficient between the purity of raw material powder A and the moisture content of raw material powder A is 0.7. This correlation coefficient matrix is derived from the six raw material lots IDa to IDf in FIG. 2. Next, the evaluation unit 5 divides the data set DS of FIG. 2 by production line, into production line L1 and production line L2.
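As a minimal sketch of this step, the correlation coefficient matrix of the raw material data can be computed over all six lots before any division by production line. Only the values for lot a (purity 6.2, moisture content 1.1) come from the text; the other five lots are hypothetical, so the result does not reproduce the 0.7 of FIG. 3. The names `pearson`, `raw_lots`, and `R_raw` are illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Raw material data DTS for lots a to f. Lot "a" uses the values quoted
# for product ID 1; lots "b" to "f" are hypothetical placeholders.
raw_lots = {
    "a": {"purity": 6.2, "moisture": 1.1},
    "b": {"purity": 5.8, "moisture": 1.4},
    "c": {"purity": 6.5, "moisture": 0.9},
    "d": {"purity": 6.0, "moisture": 1.2},
    "e": {"purity": 6.4, "moisture": 1.0},
    "f": {"purity": 5.9, "moisture": 1.5},
}

names = ["purity", "moisture"]
cols = {k: [lot[k] for lot in raw_lots.values()] for k in names}

# Correlation coefficient matrix over all six lots (no stratification).
R_raw = [[1.0 if a == b else pearson(cols[a], cols[b]) for b in names]
         for a in names]
```

Because every lot contributes, each entry of `R_raw` rests on six samples instead of the three available per production line.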

FIG. 4 is the data set DS1 corresponding to production line L1. FIG. 5 is the data set DS2 corresponding to production line L2. The data set DS1 shown in FIG. 4 consists of raw material data DTS1 for the three raw material lots IDa to IDc processed on production line L1, and the product data DTL1. The data set DS2 shown in FIG. 5 consists of raw material data DTS2 for the three raw material lots IDd to IDf processed on production line L2, and the product data DTL2.

The evaluation unit 5 calculates a correlation coefficient matrix for each of the divided data sets DS1 and DS2 shown in FIGS. 4 and 5. FIG. 6 is part of the correlation coefficient matrix calculated from the data set DS1 corresponding to production line L1. The correlation coefficient matrix shown in FIG. 6 is calculated from the characteristics of product IDs 1 to 3. The values in range RG1, enclosed by the thick line in FIG. 6, are therefore calculated from the characteristics of only the three raw material lots IDa to IDc. The values in range RG1 are calculated from non-dependent data that does not depend on production line L1.

FIG. 7 is a correlation coefficient matrix in which part of the correlation coefficient matrix shown in FIG. 6 has been replaced. After calculating the correlation coefficient matrix shown in FIG. 6, the evaluation unit 5 replaces the values in range RG1, which corresponds to the non-dependent data, with the correlation coefficient matrix calculated from the raw material data DTS (FIG. 3). The values in range RG1, enclosed by the thick line in FIG. 7, are therefore calculated from the characteristics of all six raw material lots IDa to IDf. The evaluation unit 5 takes the correlation coefficient matrix of FIG. 7, in which range RG1 has been replaced, as input and evaluates the degree of association among the characteristics using a Gaussian graphical model.
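The replacement of range RG1 can be sketched as a sub-block copy. The function name `replace_block`, the variable order, and all matrix values other than the pooled coefficient 0.7 (taken from FIG. 3 as quoted in the text) are hypothetical.

```python
def replace_block(per_line, pooled, idx):
    """Return a copy of `per_line` whose sub-block at rows/columns `idx`
    is taken from `pooled`.

    idx[a] is the row/column in `per_line` of the a-th variable of `pooled`.
    The input matrices are left unmodified.
    """
    out = [row[:] for row in per_line]
    for a, i in enumerate(idx):
        for b, j in enumerate(idx):
            out[i][j] = pooled[a][b]
    return out

# Variable order (assumed): purity, moisture, hardness.
# Hypothetical per-line matrix from the three samples of production line L1;
# the 0.95 stands in for a correlation that is high by chance.
R_line1 = [
    [1.0, 0.95, 0.4],
    [0.95, 1.0, 0.3],
    [0.4, 0.3, 1.0],
]

# Pooled raw material matrix over all six lots (0.7 as in FIG. 3).
R_pooled = [
    [1.0, 0.7],
    [0.7, 1.0],
]

# Range RG1 corresponds to the purity and moisture rows/columns.
R_replaced = replace_block(R_line1, R_pooled, idx=[0, 1])
```

Entries that involve a dependent variable (here, hardness) are left as computed from the per-line data.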

FIG. 8 is a flowchart of the data analysis method in the present embodiment. In the data analysis method, the acquisition unit 4 first acquires the data set DS including the raw material data DTS and the product data DTL1 and DTL2 (step S1). The evaluation unit 5 calculates the correlation coefficient matrix shown in FIG. 3 from the raw material data DTS, which does not depend on production lines L1 and L2 (step S2). The evaluation unit 5 divides the data set DS of FIG. 2 into the data set DS1 of production line L1 and the data set DS2 of production line L2 (step S3). The evaluation unit 5 calculates a correlation coefficient matrix from each of the data sets DS1 and DS2 of production lines L1 and L2 (step S4). In each of the correlation coefficient matrices corresponding to the data sets DS1 and DS2, the evaluation unit 5 replaces range RG1 (FIG. 6), which corresponds to the non-dependent data, with the correlation coefficient matrix (FIG. 3) obtained without dividing the data set by production line (step S5). The evaluation unit 5 inputs the replaced correlation coefficient matrix (FIG. 7) into a Gaussian graphical model to evaluate the relationships among the characteristics (step S6), and the data analysis method ends. Step S1 corresponds to the acquisition step, and steps S2 to S6 correspond to the evaluation step.
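The text does not say how the Gaussian graphical model in step S6 is fitted. As a stand-in under that assumption, the sketch below computes unregularized partial correlations from the inverse of a correlation coefficient matrix, which is the quantity a Gaussian graphical model uses to decide whether two variables are directly linked. It is not the graphical lasso of Patent Document 1, and the function names and input matrix are hypothetical.

```python
import math

def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partial_correlations(corr):
    """Partial correlation of each pair given all other variables:
    rho_ij = -P_ij / sqrt(P_ii * P_jj), where P is the inverse of `corr`."""
    prec = invert(corr)
    n = len(corr)
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(n)] for i in range(n)]

# Hypothetical correlation matrix in which variables 0 and 2 are related
# only through variable 1 (0.25 = 0.5 * 0.5).
R = [
    [1.0, 0.5, 0.25],
    [0.5, 1.0, 0.5],
    [0.25, 0.5, 1.0],
]
P = partial_correlations(R)
# P[0][2] is zero up to rounding: no direct edge between variables 0 and 2.
```

If the input matrix is singular, `invert` raises `ValueError`; if it is not positive definite, the square root may fail. These are the failure modes the description attributes to stratified small samples.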

As described above, in the data analysis device 10 of the present embodiment, the acquisition unit 4 acquires the raw material data DTS and the product data DTL1 and DTL2 collected in the manufacturing process. The evaluation unit 5 evaluates the relationships in the data set DS using all of the raw material data DTS, before it is divided by production line, together with the product data DTL1 and DTL2 of each production line L1 and L2. In other words, as the non-dependent data used in evaluating the relationships, all of the raw material data DTS, which does not depend on production lines L1 and L2, is used in place of the raw material data DTS1 and DTS2 divided by production line. Compared with dividing the entire data set DS into the data sets DS1 and DS2 by production line, this limits the reduction in the number of samples of the raw material data DTS, which is non-dependent data. The problems that arose from a reduced number of samples, such as the evaluated relationships deviating from their true values, are thereby suppressed, and the relationships can be evaluated correctly.

In the data analysis device 10 of the present embodiment, the data are also separated, with production lines L1 and L2 as the reference, into the raw material data DTS, which is non-dependent data, and the product data DTL1 and DTL2, which are dependent data. Relationships can therefore be evaluated in the data set DS, which includes non-dependent data from before processing on the production line and dependent data from after processing.

In the data analysis device 10 of the present embodiment, the raw material data DTS, which is non-dependent data, represents the characteristics of the raw materials. The product data DTL1 and DTL2, which are dependent data, include data representing the processing conditions on production lines L1 and L2 and data representing the characteristics of the products after processing on production lines L1 and L2. The data analysis device 10 of the present embodiment can therefore evaluate the relationships among the characteristics of the raw materials, the relationships among the characteristics of the processed products, and the relationships among the processing conditions on production lines L1 and L2.

In the data analysis device 10 of the present embodiment, a correlation coefficient matrix is used to evaluate the relationships in the data set DS. In the correlation coefficient matrices of the divided data sets DS1 and DS2, the evaluation unit 5 replaces the values in range RG1, which corresponds to the non-dependent data, with the correlation coefficient matrix calculated from the raw material data DTS. The relationships can thus be computed without the errors that chance high correlation coefficients between non-dependent variables cause in a Gaussian graphical model. This also makes it less likely that the correlation coefficient matrix fails to be positive definite when the Gaussian graphical model is applied. The data analysis device 10 of the present embodiment can therefore evaluate the relationships among the variables as a network.

<Second Embodiment>
FIG. 9 is a flowchart of the data analysis method in the second embodiment. The data analysis device that carries out the data analysis method of the second embodiment is the same as the data analysis device 10 of the first embodiment. The evaluation in the data analysis method of the second embodiment differs from the evaluation of the first embodiment, which uses a Gaussian graphical model with a correlation coefficient matrix as input. The following describes the parts of the data analysis method that differ from the first embodiment and omits the description of the configuration that is the same.

As shown in FIG. 9, in the data analysis method of the second embodiment, the acquisition unit 4 first acquires the data set DS (FIG. 2) including the raw material data DTS and the product data DTL1 and DTL2 (step S11). The evaluation unit 5 calculates correlation coefficients from the raw material data DTS, which does not depend on production lines L1 and L2 (step S12). The evaluation unit 5 divides the data set DS into the data set DS1 of production line L1 and the data set DS2 of production line L2 (step S13). From each of the data sets DS1 and DS2 of production lines L1 and L2, the evaluation unit 5 calculates the correlation coefficients among the characteristics in the product data DTL1 and DTL2, and the correlation coefficients between the characteristics in the product data DTL1 and DTL2 and the characteristics in the raw material data (step S14).
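Steps S12 to S14 can be sketched as a table of pairwise correlation coefficients in which pairs of raw material characteristics use every sample and any pair involving a product characteristic uses only the samples of one production line. The data values, the column names, and the function `pairwise_correlations` are hypothetical; only the split of product IDs 1 to 3 onto L1 and 4 to 6 onto L2 follows the text.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pairwise_correlations(rows, raw_keys, product_keys, line):
    """Correlations for one production line.

    Raw-raw pairs are computed from all rows (step S12); pairs involving
    a product variable are computed from the rows of `line` only (step S14).
    """
    line_rows = [r for r in rows if r["line"] == line]  # step S13
    keys = raw_keys + product_keys
    result = {}
    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            both_raw = a in raw_keys and b in raw_keys
            src = rows if both_raw else line_rows
            result[(a, b)] = pearson([r[a] for r in src], [r[b] for r in src])
    return result

# Hypothetical data set DS (values are placeholders, not those of FIG. 2).
rows = [
    {"line": "L1", "purity": 6.2, "moisture": 1.1, "hardness": 10.5},
    {"line": "L1", "purity": 5.8, "moisture": 1.4, "hardness": 9.8},
    {"line": "L1", "purity": 6.5, "moisture": 0.9, "hardness": 11.2},
    {"line": "L2", "purity": 6.0, "moisture": 1.2, "hardness": 10.1},
    {"line": "L2", "purity": 6.4, "moisture": 1.0, "hardness": 10.9},
    {"line": "L2", "purity": 5.9, "moisture": 1.5, "hardness": 9.6},
]
corr_L1 = pairwise_correlations(rows, ["purity", "moisture"], ["hardness"], "L1")
```

Unlike the first embodiment, no matrix is assembled or inverted here, so positive definiteness is not required.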

In the data analysis method of the second embodiment, the evaluation unit 5 uses the raw material data DTS and the product data DTL1 and DTL2 to evaluate the relationships in each of the data sets DS1 and DS2. As in the first embodiment, the data analysis device 10 of the second embodiment can therefore evaluate more accurately, in each of the data sets DS1 and DS2 divided by production line, the relationships among the characteristics in the raw material data DTS.

<Modifications of the Embodiments>
The present invention is not limited to the embodiments described above and can be carried out in various forms without departing from its gist. For example, the following modifications are possible.

The first and second embodiments describe the data analysis device 10 that carries out the data analysis method as an example, but the data analysis device 10 can be modified in various ways. For example, the data analysis device 10 may include a microphone that accepts voice input, a monitor that displays the evaluation results as an image, an output device that outputs the evaluation results as a log, and a speaker that outputs the evaluation results as audio.

In the first embodiment, the relationships are evaluated by inputting a correlation coefficient matrix into a Gaussian graphical model, but a variance-covariance matrix may be input instead. In that case, the relationships among the characteristics in the raw material data DTS are also expressed as a variance-covariance matrix. The evaluation unit 5 also need not input the calculated correlation coefficient matrix into a Gaussian graphical model; it may calculate the correlation coefficient matrix and then end the data analysis method (FIG. 8).

In the first embodiment, the relevance of the data set DS, which includes the raw material data DTS independent of the production lines L1 and L2 and the product data DTL1 and DTL2 dependent on the production lines L1 and L2, was evaluated, but the relevance of data sets with other combinations may be evaluated. For example, when raw materials with a plurality of different raw material data, delivered from different material manufacturers, are processed on the production line L1, each of the plurality of raw material data is independent data that does not depend on the production line L1. In this case, for example, the evaluation unit 5 may calculate a correlation coefficient matrix representing the relationships among the plurality of independent data corresponding to the respective raw material lots.

The specific process that separates the independent data from the dependent data may be a process other than the production lines L1 and L2 that implement the manufacturing process. For example, it may be a process obtained by dividing the production lines L1 and L2 into individual steps such as stirring and sintering. As another example, the relevance of the collected data may be evaluated for each raw material lot. In this case, when the raw material lot IDs differ, the processing is performed on the assumption that the specific processes differ.
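
Switching the specific process from the production line to the raw material lot amounts to changing the key used for stratification. A minimal sketch follows, assuming each collected record is a dictionary; the function name `stratify` and the field names `line` and `lot` are hypothetical and do not appear in the embodiments.

```python
from collections import defaultdict


def stratify(records, key):
    """Group records by the value stored under `key`.

    Records whose values for `key` differ are treated as belonging
    to different specific processes.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return dict(groups)


records = [
    {"product": "ID1", "line": "L1", "lot": "IDa"},
    {"product": "ID2", "line": "L1", "lot": "IDb"},
    {"product": "ID3", "line": "L2", "lot": "IDa"},
]
by_line = stratify(records, "line")  # stratification by production line
by_lot = stratify(records, "lot")    # stratification by raw material lot ID
```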

The dependent data and the independent data may be data other than data representing various characteristics. For example, the raw material data DTS may be the supplier, purchase price, and purchase date of the raw material. Likewise, the product data DTL1 and DTL2 may be measured values, evaluation values, an index representing the difference between a measured value and an evaluation value, a sales price, and the like.

The present aspect has been described above based on the embodiments and modifications, but the embodiments described above are intended to facilitate understanding of the present aspect and do not limit it. The present aspect may be changed and improved without departing from its spirit and the scope of the claims, and the present aspect includes equivalents thereof. In addition, any technical feature that is not described as essential in this specification may be omitted as appropriate.

1 ... CPU
2 ... ROM
3 ... RAM
4 ... Acquisition unit
5 ... Evaluation unit
9 ... Storage unit
10 ... Data analysis device
A, B ... Raw material powder
DS, DS1, DS2 ... Data set
DTL1, DTL2 ... Product data
DTS, DTS1, DTS2 ... Raw material data
IDa to IDf ... Raw material lot
ID1 to ID6 ... Product
L1, L2 ... Production line
RG1 ... Range

Claims

A data analysis method comprising:
an acquisition step of acquiring a data set that is data collected in a manufacturing process and that includes independent data that does not depend on a specific process to be analyzed in the manufacturing process and dependent data that depends on the specific process; and
an evaluation step of evaluating the relevance of the data set using all of the independent data and the dependent data corresponding to the specific process,
wherein the evaluation of the relevance is performed by inputting, into a Gaussian graphical model, a matrix in which a variance-covariance matrix or correlation coefficient matrix obtained without dividing the data set is used for the elements corresponding to the independent data in a variance-covariance matrix or correlation coefficient matrix obtained with the data set divided for each specific process.

The data analysis method according to claim 1,
wherein the specific process is a manufacturing line that implements the manufacturing process.

The data analysis method according to claim 2,
wherein the independent data is data representing various characteristics of each raw material used in the manufacturing process, and
the dependent data is data representing at least one of various characteristics of the processed product obtained in each step of the manufacturing process and processing conditions for processing each raw material.

A data analysis device comprising:
an acquisition unit that acquires a data set that is data collected in a manufacturing process and that includes independent data that does not depend on a specific process to be analyzed in the manufacturing process and dependent data that depends on the specific process; and
an evaluation unit that evaluates the relevance of the data set using all of the independent data and the dependent data corresponding to the specific process,
wherein the evaluation of the relevance is performed by inputting, into a Gaussian graphical model, a matrix in which a variance-covariance matrix or correlation coefficient matrix obtained without dividing the data set is used for the elements corresponding to the independent data in a variance-covariance matrix or correlation coefficient matrix obtained with the data set divided for each specific process.

A computer program causing a computer to realize:
an acquisition function of acquiring a data set that is data collected in a manufacturing process and that includes independent data that does not depend on a specific process to be analyzed in the manufacturing process and dependent data that depends on the specific process; and
an evaluation function of evaluating the relevance of the data set using all of the independent data and the dependent data corresponding to the specific process,
wherein the evaluation of the relevance is performed by inputting, into a Gaussian graphical model, a matrix in which a variance-covariance matrix or correlation coefficient matrix obtained without dividing the data set is used for the elements corresponding to the independent data in a variance-covariance matrix or correlation coefficient matrix obtained with the data set divided for each specific process.