JP2003142361A

JP2003142361A - Method for analyzing data

Info

Publication number: JP2003142361A
Application number: JP2001338185A
Authority: JP
Inventors: Hidetaka Tsuda; 英隆津田; Eidai Shirai; 英大白井
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2001-11-02
Filing date: 2001-11-02
Publication date: 2003-05-16
Anticipated expiration: 2021-11-02
Also published as: JP4866520B2

Abstract

PROBLEM TO BE SOLVED: To provide a data analysis method that extracts fault causes and the like efficiently and automatically, without depending on an engineer's subjectivity. SOLUTION: The method comprises a step of selecting and extracting the data to be analyzed from an original data group, such as yield values and various measured values (step S1), and a step of extracting one or more data distribution features from the extracted data (step S2). It further comprises a step of selecting, from those features, a data distribution feature quantity to be analyzed and performing data mining, such as regression tree analysis, with that feature quantity as the objective variable (step S3). When the regression tree analysis has been completed for the extracted data distribution features (step S4), the analysis result is output and confirmed by the engineer (step S5), who then makes a decision (step S6).

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a data analysis method that grasps the relationships among data handled widely in industry and extracts significant results that lead to industrially advantageous outcomes. It further relates to a data analysis method that extracts knowledge and information that are difficult to discern when attention is paid only to the data values under analysis or their averages. It also relates to a data analysis method and a data analysis apparatus that evaluate the accuracy and the like of analysis results.

[0002] For example, the invention relates to a data analysis method for grasping how yield varies on the basis of equipment usage history, test results, design information, various measurement data, and the like acquired in a semiconductor manufacturing process, and thereby extracting conditions favorable to improving yield. In particular, it relates to a data analysis method and a data analysis apparatus that automatically and quantitatively extract and recognize not only the original data stored in a computer system and their averages but also the data distribution features obtained by editing the original data, and that extract and evaluate causes of low yield in semiconductors and the like on the basis of those feature quantities.

[0003] The invention also relates to a data analysis method and a data analysis apparatus that evaluate the accuracy and the like of data analysis results in order to obtain more efficient and reliable analysis results in cases where a plurality of explanatory variables are confounded with one another (that is, are no longer independent), making significant differences difficult to extract.

[0004]

2. Description of the Related Art

The following discussion takes yield analysis of semiconductor data as an example. In particular, when the analysis results are to serve as reference data for deciding on measures to improve quality and productivity, as in process data analysis, their accuracy and reliability are important; the present inventors have already filed an application on this point (Japanese Patent Application No. 2001-127534). To find the causes of yield loss as quickly as possible and take countermeasures, data analysis is performed on equipment history, test results, design information, various measurement data, and the like to find the factors that affect yield and further factors that affect those factors.

[0005] In data analysis, the quantity to be analyzed, such as the yield value, is called the objective variable, and the equipment history, test results, design information, various measurement data, and the like that may be causes of the objective variable are called explanatory variables. Various statistical methods are applied in this analysis. Applying data mining as one of them makes it possible to extract valuable information and regularities that are hard to discern from large volumes of diverse data.

[0006] To analyze the causes of failure in semiconductor devices, it is important to analyze the collected data from many angles on a scientific basis and to extract as many significant differences as possible. For this purpose, the values of the original data stored in a computer system and their averages have conventionally been widely used. However, it is sometimes difficult to extract failure causes and the like from a complexly intertwined group of original data. In such cases, if a characteristic data distribution exists in the measurement results, yields, or the like of chips within a wafer or of wafers within a lot, analysis of the defect data can sometimes proceed on the basis of that distribution.

[0007]

Problems to Be Solved by the Invention

In conventional computer systems, however, original data such as yield values and electrical characteristic values are stored, but characteristic data distributions spanning multiple chips within a wafer or multiple wafers within a lot are rarely stored. The engineer therefore has to edit the original data and obtain the data distribution using statistical analysis tools, charting tools, and the like. The engineer then has to interpret the aggregated data and its trends in light of personal experience and know-how. It is therefore difficult to grasp objectively the feature quantities of the distribution of a large volume of original data. In addition, an analysis based on feature quantities that reflect the engineer's subjectivity in this way does not yield accurate results.

[0008] Conventionally, moreover, an engineer looks at the data distribution obtained with statistical analysis tools, charting tools, and the like and expresses its features as discrete values: for example, whether a certain feature is "present" or "absent", whether a feature is "increasing" or "decreasing", whether a feature "has" or "does not have" a periodicity of 2, or whether it "has" or "does not have" a periodicity of 3. Information about degree is therefore lost, such as how strongly a feature is present (or absent) or how strongly it tends to increase (or decrease). Furthermore, when a feature has, for example, some degree of periodicity 2 and some degree of periodicity 3 at the same time, only the stronger periodicity is recognized.

[0009] Moreover, when the various test results, measurement results, and their combinations are all taken into account, the number of conceivable combinations of data distribution features becomes enormous, and investigating all of them is extremely difficult. In addition, the failure cause corresponding to an extracted data distribution feature is not necessarily known, and identifying an unknown failure cause requires considerable experience and know-how.

[0010] In practice, applying data mining to, for example, yield analysis of semiconductor data does not always work well. In fields such as finance and retail distribution, the number of records is enormous, running to millions, while the number of explanatory variables is a few dozen at most, so highly accurate analysis results have been obtained. In semiconductor process data analysis, by contrast, the number of records is small, at most about 200 lots for a given product type, yet the number of explanatory variables (equipment history, in-process inspection values, and so on) reaches several hundred. The explanatory variables are then no longer independent of one another, and simply running data mining may not give reliable results. This is explained briefly below, taking yield analysis of semiconductor data as an example.

[0011] In process data analysis where the explanatory variables (for example, LSI manufacturing process data) are numerous relative to the number of data records (for example, the number of lots), the explanatory variables often become confounded with one another (that is, are no longer independent), and the problem often cannot be narrowed down sufficiently from statistically significant differences. Even when data mining (regression tree analysis and the like) is applied, if this problem is present, considerable effort is needed to confirm the accuracy of the analysis results and the range over which they can be trusted.

[0012] FIG. 40 shows the relationship between the flow of lots and abnormal manufacturing equipment. A white circle "○" indicates normal equipment 101, and a black circle "●" indicates abnormal equipment 102. Arrows indicate the flow of lots. Analysis of equipment-to-equipment differences in LSI manufacturing data extracts, from the data on the equipment used for each lot in each process, which equipment in which manufacturing process affects yield the most.

[0013] FIG. 41 shows a conventional yield distribution by equipment for one process (a box-and-whisker plot). For each manufacturing process, the yield values of the lots are displayed as a box-and-whisker plot for each piece of equipment used, and the processes are checked one by one to identify the process and the equipment for which the difference is most pronounced.

[0014] With the number of processes now in the hundreds, however, this approach takes a great deal of labor, and a judgment is hard to reach when the differences are not clear or the conditions are intricately intertwined. A data mining method based on regression tree analysis is effective for dealing with this; it splits the equipment into a group for which the value of the objective variable is high and a group for which it is low. However, when lots are run with the equipment used by each lot fixed, as in FIG. 42, the abnormal equipment 102 indicated by the black circle "●" sometimes cannot be identified uniquely. That is, when the independence among the explanatory variables is low, the variable for which bisecting the set gives a large significant difference does not necessarily have a truly large significant difference.
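The situation of FIG. 42 can be illustrated with a small Python sketch. The lot data, the machine names, and the helper `yield_gap` are hypothetical and are not taken from the specification. Each lot always passes through the same pair of machines, so the machine used in process 1 and the machine used in process 2 carry exactly the same information.

```python
from statistics import mean

# Hypothetical lots: (machine in process 1, machine in process 2, yield %).
# Every lot that uses A1 also uses B1, and every lot that uses A2 also uses B2.
lots = [
    ("A1", "B1", 92.0), ("A1", "B1", 90.0), ("A1", "B1", 91.0),
    ("A2", "B2", 75.0), ("A2", "B2", 78.0), ("A2", "B2", 74.0),
]

def yield_gap(lots, column):
    """Gap between the best and worst mean yield across the machines of one process."""
    groups = {}
    for lot in lots:
        groups.setdefault(lot[column], []).append(lot[2])
    means = [mean(values) for values in groups.values()]
    return max(means) - min(means)

gap_process1 = yield_gap(lots, 0)
gap_process2 = yield_gap(lots, 1)
print(gap_process1, gap_process2)
```

Both processes show the same yield gap, so the data alone cannot say whether A2 or B2 is the abnormal machine.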

[0015] The above concerns confounding among the equipment used in the individual semiconductor manufacturing processes, but the same applies to confounding among the sets produced by bisection in a regression tree analysis. That is, the same can be said of the sets consisting, for each process, of the equipment group giving high yield and the equipment group giving low yield. This confounding of bisected sets arises in the same way when the explanatory variables are continuous values.

[0016] The present invention has been made in view of the above problems. Its object is to provide a data analysis method that edits original data to extract data distribution feature quantities such as various statistics, recognizes them objectively, and uses them to extract failure causes and the like automatically. A further object of the invention may be to provide a data analysis method and a data analysis apparatus capable of making clear the degree of confounding among a plurality of explanatory variables.

[0017]

Means for Solving the Problems

To achieve the above object, the present invention is characterized in that the various data distribution features present in the original data group stored in a computer system are automatically and quantitatively evaluated and extracted, and the extracted feature quantities are selected in turn and analyzed, so that the factors that gave rise to each feature quantity are automatically and quantitatively evaluated and extracted. According to the invention, a great deal of information, such as trends in the data distribution, characteristic patterns, and relationships among data, is extracted from the original data. Relationships and significant differences that were previously buried in a wide variety of data and hard to discern are thus extracted efficiently and quantitatively on a scientific basis.

[0018] To make clear the degree of confounding among a plurality of explanatory variables, a data analysis method is also provided that comprises a step of preparing data on the explanatory variables and the objective variable, a step of calculating a degree of confounding and/or a degree of independence among the explanatory variables from that data, and a step of performing data mining using the degree of confounding and/or the degree of independence. Calculating the degree of confounding and/or independence among the explanatory variables gives a clear picture of how strongly they are confounded. If regression tree analysis is performed on this basis, the degree of confounding of the explanatory variables can be evaluated quantitatively from the bisection of the set produced by the regression tree analysis. It then becomes possible to identify the explanatory variables that warrant attention because they are confounded with the problematic explanatory variable, namely the one showing a large significant difference at the first split of the regression tree.
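This excerpt does not state how the degree of confounding is calculated. Purely as a hypothetical illustration (an assumption, not the formula of the invention), the Python sketch below compares the bisections that two explanatory variables induce on the same lots and reports the fraction of lots placed on the same side. The function name `agreement_degree` and the 0/1 encoding of the two sides are invented for this example.

```python
def agreement_degree(split_a, split_b):
    """Fraction of lots that two bisections place on the same side.

    Each argument is a list of 0/1 labels, one per lot, in the same lot order.
    The result ranges from 0.5 (no agreement beyond chance for balanced splits)
    to 1.0 (the two variables split the lots identically).
    """
    if len(split_a) != len(split_b) or not split_a:
        raise ValueError("both splits must cover the same non-empty set of lots")
    same = sum(a == b for a, b in zip(split_a, split_b))
    fraction = same / len(split_a)
    # A bisection with its two labels swapped is still the same bisection.
    return max(fraction, 1.0 - fraction)

# Hypothetical bisections of four lots by three explanatory variables.
first_split = [1, 1, 0, 0]
identical_up_to_labels = [0, 0, 1, 1]
unrelated = [1, 0, 1, 0]

print(agreement_degree(first_split, identical_up_to_labels))
print(agreement_degree(first_split, unrelated))
```

Under this assumed measure, a variable whose value is close to 1.0 against the variable chosen at the first split would be one of the explanatory variables that warrant attention.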

[0019]

BEST MODE FOR CARRYING OUT THE INVENTION

Embodiments 1 and 2 of the present invention are described in detail below with reference to the drawings.

[0020] (Embodiment 1) FIG. 1 shows an example of the hardware configuration of a computer system used to carry out the data analysis method according to Embodiment 1 of the present invention. As shown in FIG. 1, the computer system comprises an input device 1, a central processing unit 2, an output device 3, and a storage device 4.

[0021] FIG. 2 is a block diagram showing an example of the functional configuration of a data analysis apparatus realized by the computer system of FIG. 1. As shown in FIG. 2, the data analysis apparatus has an original data group 42 made up of a database 41 containing a plurality of original data items. The database 41 is built in the storage device 4 of the computer system shown in FIG. 1.

[0022] The data analysis apparatus also comprises: means 21 for quantitatively evaluating and extracting one or more data distribution features present in the original data group 42; means 22 for selecting, from the one or more extracted data distribution feature quantities, the feature quantity to be analyzed; means 23 for performing data mining, by regression tree analysis or a similar method, with the selected data distribution feature quantity as the objective variable and extracting a rule file 24 describing the features, regularities, and the like latent in the data distribution; and an analysis tool group 27, including a statistical analysis component 25 and a chart creation component 26, for analyzing the distribution features of the original data using the extracted rule file 24.

[0023] Means 21, 22, and 23 and the analysis tool group 27 are each realized by the central processing unit 2 executing a program that performs the corresponding processing. The extracted rule file 24 is stored in the storage device 4 and is output by the output device 3, such as a display or a printer. A decision 5 is made on the basis of the analysis results from the analysis tool group 27.

[0024] The means 23 for extracting the rule file 24 also performs data mining on the original data in the original data group 42, on the data distribution features extracted by the means 21, and on the analysis results from the analysis tool group 27. Likewise, the analysis tool group 27 also analyzes the original data in the original data group 42, the data distribution features extracted by the means 21, and its own output. The analysis results from the analysis tool group 27 are fed back to the means 22 for selecting the data distribution feature quantity to be analyzed and to the original data group 42. The output of the means 21 for extracting data distribution features is also fed back to the original data group 42.

[0025] FIG. 3 shows an example in which the data distribution feature quantities extracted by, for example, the means 21 of FIG. 2 are output in CSV format. Each feature quantity is obtained independently for each record and is therefore handled independently. Because the feature quantities are output automatically in CSV format, as shown for example in FIG. 3, a meaningful analysis can be performed efficiently for each one. The data used as feature quantities are not limited to the original data values and their averages; they may also be the maximum, minimum, range, standard deviation, and so on of the original data. The periodicity of the data, the similarity to a particular model, and the like may also be used as feature quantities.
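As a minimal sketch of such CSV output, the following Python code computes a few of the feature quantities named above for each lot and writes one record per lot. The lot identifiers, wafer yields, column names, and the use of the population standard deviation are assumptions made for illustration; the actual column layout of FIG. 3 is not reproduced here.

```python
import csv
import io
from statistics import mean, pstdev

# Hypothetical wafer yields (%) keyed by lot identifier.
lots = {
    "LOT001": [90.0, 92.0, 88.0, 94.0],
    "LOT002": [80.0, 95.0, 70.0, 99.0],
}

COLUMNS = ("mean", "max", "min", "range", "stdev")

def lot_features(values):
    """Feature quantities of one lot, each a continuous value."""
    return {
        "mean": mean(values),
        "max": max(values),
        "min": min(values),
        "range": max(values) - min(values),
        "stdev": pstdev(values),  # population standard deviation (assumed)
    }

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["lot", *COLUMNS])
for lot_id, values in lots.items():
    features = lot_features(values)
    writer.writerow([lot_id] + [round(features[name], 3) for name in COLUMNS])

csv_text = buffer.getvalue()
print(csv_text)
```

Each row is computed from one lot only, which matches the statement that the feature quantities are obtained independently for each record.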

[0026] Various features can be extracted depending on the structure of the data group concerned. The processing that determines which feature quantities are extracted for a given purpose may be built into the program in advance, or a file defining the feature quantities to be extracted may be prepared and read by the program. Every feature quantity is defined not as a discrete value but as a continuous value expressing how strong the feature is. Because no information is lost through discretization, as happened conventionally, better analysis results can be expected.

[0027] As an example of data distribution feature quantities, the features of the within-lot data distribution in yield analysis of semiconductor data are described. FIG. 4 is a table listing information that focuses on the variation of wafer attribute values. Here the independent variable is the wafer number, and the dependent variable is original data such as yield, category yield, or various measured values.

[0028] Although not limiting, the example shown in FIG. 4 defines the following 16 feature items:
(1) center of the overall data distribution
(2) spread of the data
(3) correlation of the data with wafer number
(4) y-intercept of a first-order (linear) approximation
(5) slope of the data with respect to wafer number
(6) strength of period 2 (wafers)
(7) strength of period 3 (wafers)
(8) strongest period within the lot
(9) difference in mean between first-half and second-half wafers
(10) difference in spread between first-half and second-half wafers
(11) difference in correlation between first-half and second-half wafers
(12) difference in linear-approximation y-intercept between first-half and second-half wafers
(13) difference in slope between first-half and second-half wafers
(14) strength of period 2 (wafers) in the second-half lot
(15) strength of period 3 (wafers) in the second-half lot
(16) strongest period in the second-half lot
The feature quantity of each feature item is obtained lot by lot.

[0029] The 16 feature items defined here are briefly described. Feature quantity (1) is the mean of the yields, measured values, or the like of all wafers in the same lot. Feature quantity (2) is the standard deviation of the yields, measured values, or the like of all wafers in the same lot. Feature quantity (3) is the correlation coefficient between the wafer number and the yield, measured value, or the like of the wafers in the same lot; how this correlation coefficient is calculated is decided in advance in light of the object and purpose of the analysis. Feature quantity (4) is the y-intercept a obtained when, with the wafer number of the wafers in the same lot as x and the yield, measured value, or the like as y, the relation between x and y is approximated by the linear expression y = b·x + a.
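Feature quantities (1) to (4) can be sketched in Python as follows. This is one possible reading under stated assumptions: wafer numbers run 1, 2, ..., n in list order, the standard deviation is the population form, the correlation is the Pearson coefficient, and the line y = b·x + a is fitted by ordinary least squares. The function name `basic_lot_features` is hypothetical, and the text itself leaves the correlation method to be decided per analysis.

```python
from statistics import mean, pstdev

def basic_lot_features(values):
    """Feature quantities (1)-(4) for one lot with at least two wafers."""
    x = list(range(1, len(values) + 1))  # wafer numbers 1..n (assumed)
    mean_x, mean_y = mean(x), mean(values)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in values)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, values))
    slope = sxy / sxx  # b in y = b*x + a
    return {
        "center": mean_y,                                   # (1)
        "spread": pstdev(values),                           # (2)
        "correlation": sxy / (sxx * syy) ** 0.5 if syy > 0 else 0.0,  # (3)
        "intercept": mean_y - slope * mean_x,               # (4), a in y = b*x + a
    }

# Hypothetical lot whose measured value rises by 2 per wafer.
features = basic_lot_features([10.0, 12.0, 14.0, 16.0])
print(features)
```

For this perfectly linear example the fitted line is y = 2·x + 8, so the intercept is 8 and the correlation is 1.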

[0030] Feature quantity (5) is the population regression coefficient (the slope b) obtained when, with the wafer number of the wafers in the same lot as x and the yield, measured value, or the like as y, the relation between x and y is approximated by the linear expression y = b·x + a. Feature quantity (6) is the ratio between the variance of the yields, measured values, or the like of all wafers in the same lot and the variance of those values for the group of wafers numbered 1, 3, 5, ... or the group numbered 2, 4, 6, .... Feature quantity (7) is the ratio between the variance of the yields, measured values, or the like of all wafers in the same lot and the variance of those values for the group of wafers numbered 1, 4, 7, ..., the group numbered 2, 5, 8, ..., or the group numbered 3, 6, 9, .... Feature quantity (8) is, for the wafers in the same lot, the period whose variance ratio is largest among the variance ratios for periods 2 and 3 obtained as in (6) and (7) and the variance ratios for periods 4, 5, ... obtained in the same way.
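The period-strength feature quantities (6) to (8) can be sketched in Python as below. The text does not fix which subgroup's variance enters the ratio or the direction of the ratio, so this sketch makes two assumptions: it divides the variance of all wafers by the average of the variances within each phase group, so that a strong periodic pattern gives a large value, and it uses population variances. The names `period_strength` and `strongest_period` are hypothetical.

```python
from statistics import mean, pvariance

def period_strength(values, period):
    """Variance of all wafers divided by the mean within-phase-group variance."""
    # For period 2 the groups are wafers 1, 3, 5, ... and 2, 4, 6, ...
    groups = [values[start::period] for start in range(period)]
    within = mean(pvariance(group) for group in groups)
    total = pvariance(values)
    return float("inf") if within == 0 else total / within

def strongest_period(values, candidates=(2, 3, 4, 5)):
    """Feature quantity (8): the candidate period with the largest ratio."""
    return max(candidates, key=lambda p: period_strength(values, p))

# Hypothetical lot in which odd-numbered wafers yield about 90% and
# even-numbered wafers about 70%.
yields = [90, 70, 91, 71, 89, 69, 90, 70, 91, 71, 89, 69]
print(period_strength(yields, 2), period_strength(yields, 3))
print(strongest_period(yields, candidates=(2, 3)))
```

Because each strength is a continuous value, a lot can show some degree of period 2 and some degree of period 3 at the same time, which is the information that a discrete "has / does not have" judgment loses.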

[0031] Feature quantity (9) is obtained by dividing all the wafers in the same lot (for example, 50) into a first half (for example, 25) and a second half (for example, 25); it is the difference between the mean of the yields, measured values, or the like of the first-half wafer group and that of the second-half wafer group. The wafers are divided into first and second halves in this way because their equipment histories in the semiconductor manufacturing process differ. Feature quantity (10) is the difference between the standard deviation of the yields, measured values, or the like of the first-half wafer group and that of the second-half wafer group. Feature quantity (11) is the difference between the correlation coefficient of the first-half wafer group and that of the second-half wafer group. Feature quantity (12) is the difference between the linear-approximation y-intercept of the first-half wafer group and that of the second-half wafer group.
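The first-half versus second-half comparison can be sketched in Python for feature quantities (9) and (10). The function name `half_differences` is hypothetical, the population standard deviation is assumed, and for an odd number of wafers this sketch arbitrarily puts the extra wafer in the second half; the text does not specify that case.

```python
from statistics import mean, pstdev

def half_differences(values):
    """Differences (first half minus second half) in mean and spread for one lot."""
    half = len(values) // 2
    first, second = values[:half], values[half:]
    return {
        "mean_diff": mean(first) - mean(second),        # feature quantity (9)
        "spread_diff": pstdev(first) - pstdev(second),  # feature quantity (10)
    }

# Hypothetical lot of four wafers in wafer-number order.
print(half_differences([90.0, 92.0, 80.0, 84.0]))
```

Feature quantities (11) and (12) would follow the same pattern, applying the correlation and intercept calculations to each half separately and taking the difference.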

[0032] Feature quantity (13) is the difference between the population regression coefficient of the first-half wafer group and that of the second-half wafer group. Feature quantity (14) is the variance ratio for period 2, as in (6), for the second-half lot group. Feature quantity (15) is the variance ratio for period 3, as in (7), for the second-half lot group. Feature quantity (16) is the period with the largest variance ratio, as in (8), for the second-half lot group. The strength of period 2 (wafers), the strength of period 3 (wafers), and the strongest period may likewise be defined for the first-half lot group, as in (14) to (16). The feature items are not limited to those illustrated here; various feature items are defined according to the object and purpose of the analysis.

[0033] Defining and analyzing feature items such as these can make it possible to extract a significant difference attributable to the equipment used even when no such difference can be extracted by analyzing only original data values, such as yield values, or their averages, as was done conventionally. As an example, FIG. 5 focuses on the spread of the wafer yield values within each lot (corresponding to (2) above) for a number of lots.

[0034] In the example shown in FIG. 5, lot group 6, which used machine No. 21, 22, 24, or 25 in process 1 (left of the dash-dot line in FIG. 5), and lot group 7, which used machine No. 28 (right of the dash-dot line in FIG. 5), have almost the same average wafer yield and almost the same overall distribution (lot-to-lot variation). Consequently, no clear significant difference is found when the analysis uses the wafer yield values or their averages. In contrast, when attention is paid to a feature quantity of the data distribution, namely the variation of wafer yield within each lot, a clear significant difference is found between the two lot groups 6 and 7. The item of interest is not limited to item (2); it may be item (1), any of items (3) to (16), or some other item.
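The point of FIG. 5 can be illustrated with a minimal sketch. The wafer yields below are made-up numbers chosen only for illustration (they are not data from the patent), and the function name `lot_features` is hypothetical. Both lots have the same average yield, so an analysis of averages sees no difference, while the within-lot standard deviation separates them clearly.

```python
from statistics import mean, stdev

# Made-up wafer yields (percent) for two lots; illustrative values only.
lot_a = [78, 80, 82, 79, 81]    # small within-lot variation
lot_b = [60, 100, 70, 90, 80]   # large within-lot variation

def lot_features(yields):
    """Return the average (feature 1) and sample standard deviation (feature 2) of one lot."""
    return {"ave": mean(yields), "s": stdev(yields)}

features_a = lot_features(lot_a)
features_b = lot_features(lot_b)

print("average:", features_a["ave"], features_b["ave"])
print("std dev:", round(features_a["s"], 2), round(features_b["s"], 2))
```

Whether the patent uses the sample or the population standard deviation is not stated in this section; the sample form is assumed here.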

[0035] As described above, each data distribution feature quantity exists as an attribute value of each lot, so the means 22 for selecting a data distribution feature quantity selects each feature quantity in turn as the target variable. The means 23 for performing data mining and extracting the rule file 24 then performs a regression tree analysis with each feature quantity in turn as the target variable. This makes it possible to identify the factors that gave rise to each data distribution feature quantity, and more failure factors can be extracted than with conventional analysis methods. Because the sequential selection of feature quantities and the regression tree analysis are executed automatically by a program, the engineer does not need to decide which feature quantity to select as the target variable, and the analysis can be carried out efficiently. This is especially useful when it is unclear what should be analyzed.
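The sequential selection described in [0035] can be sketched as a simple loop. This is a minimal sketch under stated assumptions: the record layout (`equipment`, `features`) and the function names are hypothetical, and `mean_by_equipment` is only a stand-in for the regression tree analysis, which the patent performs with its own engine.

```python
def analyze_each_feature(records, feature_names, analyze):
    """Run `analyze` once per feature, using that feature as the target variable."""
    results = {}
    for name in feature_names:
        # Pair the explanatory value (equipment name) with the current target value.
        pairs = [(rec["equipment"], rec["features"][name]) for rec in records]
        results[name] = analyze(pairs)
    return results

def mean_by_equipment(pairs):
    """Stand-in analysis: average target value per equipment name."""
    groups = {}
    for equipment, value in pairs:
        groups.setdefault(equipment, []).append(value)
    return {equipment: sum(v) / len(v) for equipment, v in groups.items()}

# Made-up per-lot records; illustrative values only.
records = [
    {"equipment": "PM1", "features": {"VT_N2_ave": 0.80, "VT_N2_s": 0.08}},
    {"equipment": "PM2", "features": {"VT_N2_ave": 0.78, "VT_N2_s": 0.22}},
    {"equipment": "PM2", "features": {"VT_N2_ave": 0.82, "VT_N2_s": 0.26}},
]
results = analyze_each_feature(records, ["VT_N2_ave", "VT_N2_s"], mean_by_equipment)
print(results)
```

Passing the analysis as a callback keeps the loop independent of the particular data mining method, which matches the text's remark that the engineer does not choose the target variable by hand.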

[0036] Furthermore, even when a plurality of feature patterns appear together, for example when the same record shows both a period-2 (wafers) strength and a period-3 (wafers) strength, both features can be evaluated. No information is lost, and the analysis result reflects the actual situation more closely.

[0037] Next, the flow of the data analysis method according to the first embodiment of the present invention is described. FIG. 6 is a flowchart outlining an example of the data analysis method according to the present invention. As shown in FIG. 6, when the data analysis method starts, the data to be analyzed, for example yield values and various measured values, are first selected and extracted from the original data group 42 (step S1). One or more data distribution features are then extracted from the extracted data (step S2).

[0038] A data distribution feature quantity to be analyzed is then selected, and data mining such as a regression tree analysis is performed with it as the target variable (step S3). When the regression tree analysis has been completed for all data distribution features extracted in step S2 (step S4), the analysis results are output and the engineer checks them (step S5). The engineer then makes a decision based on the analysis results (step S6).

[0039] Next, to make the features of the present invention clearer, a data analysis method using data distribution feature quantities is described with a specific example. In general, even among wafers in the same lot, the per-wafer yield and electrical characteristic values differ from one wafer number to another, and these values show various variation patterns. Yield values and electrical characteristic values are stored per wafer. In the first embodiment, therefore, the variation pattern of yield values and the like with respect to wafer number can be analyzed as a data distribution feature across a plurality of lots. The example here performs a multifaceted analysis of the threshold voltage VT_N2 of a substitute N-channel transistor for testing (hereinafter simply VT_N2), an important electrical characteristic that strongly affects product performance. It is assumed that the history of equipment used in each manufacturing process affects the yield.

[0040] FIG. 7 is a characteristic diagram showing the relationship between yield and VT_N2; at first glance the two appear unrelated. FIG. 8 is a histogram of the VT_N2 data obtained from all wafers, and FIG. 9 is a box-and-whisker plot of all VT_N2 data by wafer number. It is difficult to extract a statistically significant difference from the results shown in these figures.

[0041] FIG. 10 shows an example of the result of a regression tree analysis in which the target variable is the average VT_N2 of each lot and the explanatory variables are the names of the equipment used in each process. FIG. 11 shows an example of the evaluation statistics list, which gives reliability information for this regression tree analysis. According to this result, as shown in FIG. 10, the split found most significant for the variation in VT_N2 is whether machine No. 11 or No. 13, or instead machine No. 12, 14, 17, or 18, was used as the second wiring_equipment. FIG. 12 is a box-and-whisker plot of all VT_N2 data by the machine used as the second wiring_equipment, but no notable significant difference is seen in it. The evaluation statistics list is output together with the regression tree diagram and is described later.

[0042] FIG. 13 shows an example of the result of a regression tree analysis in which the target variable is the VT_N2 value of each wafer and the explanatory variables are the names of the equipment used in each process. FIG. 14 shows an example of the evaluation statistics list for this regression tree analysis. According to this result, as shown in FIG. 13, the split found most significant for the variation in VT_N2 is whether machine No. 11, or instead machine No. 12 or No. 13, was used as the 2CON process_equipment. FIG. 15 is a box-and-whisker plot of all VT_N2 data by the machine used as the 2CON process_equipment, but no notable significant difference is seen in this figure either.

[0043] In contrast, the failure factors can be clarified by extracting data distribution features of VT_N2 and analyzing them as follows. FIG. 16 is a table showing, in CSV format, an example of a file that defines the feature quantities of VT_N2 for each lot. This file is output by the means 21 for extracting data distribution features in the apparatus shown in FIG. 2.

[0044] FIG. 17 shows histograms of the various within-lot distribution feature quantities of VT_N2, based on the CSV data shown in FIG. 16. Of the 16 feature items (1) to (16) described with reference to FIG. 4, the following 12 are extracted for VT_N2: (1) the average (VT_N2_ave); (2) the standard deviation (VT_N2_s); (3) the correlation coefficient with respect to wafer number (VT_N2_r); (4) the y-axis intercept of the first-order approximation (VT_N2_a); (5) the population regression coefficient (VT_N2_b); (6) the periodicity at a wafer-number interval of 2 (VT_N2_2); (7) the periodicity at a wafer-number interval of 3 (VT_N2_3); (9) the difference in average between the first-half and second-half wafers (VT_N2_ave_d); (10) the difference in standard deviation between the first-half and second-half wafers (VT_N2_s_d); (11) the difference in correlation coefficient between the first-half and second-half wafers (VT_N2_r_d); (12) the difference in y-axis intercept of the first-order approximation between the first-half and second-half wafers (VT_N2_a_d); and (13) the difference in population regression coefficient between the first-half and second-half wafers (VT_N2_b_d).
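As a rough illustration of how such a per-lot feature record could be computed, the following sketch derives the features named above that do not depend on periodicity. It rests on several assumptions that the text here does not settle: the standard deviation is the sample form, the first-order approximation is an ordinary least-squares line of the value against wafer number, and each difference is taken as first half minus second half. The function names are hypothetical. The periodicity features (6) and (7) are omitted because their variance-ratio definition lies outside this section.

```python
from math import sqrt
from statistics import mean, stdev

def linear_fit(xs, ys):
    """Least-squares line y = a + b*x; returns (intercept a, slope b, correlation r)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    r = sxy / sqrt(sxx * syy) if syy > 0 else 0.0
    return a, b, r

def basic_features(wafer_nos, values):
    """Features (1) to (5) for one group of wafers."""
    a, b, r = linear_fit(wafer_nos, values)
    return {"ave": mean(values), "s": stdev(values), "r": r, "a": a, "b": b}

def lot_features(wafer_nos, values, prefix="VT_N2"):
    """Whole-lot features plus first-half minus second-half differences (suffix _d)."""
    half = len(values) // 2
    whole = basic_features(wafer_nos, values)
    first = basic_features(wafer_nos[:half], values[:half])
    second = basic_features(wafer_nos[half:], values[half:])
    out = {f"{prefix}_{key}": val for key, val in whole.items()}
    out.update({f"{prefix}_{key}_d": first[key] - second[key] for key in whole})
    return out

# Made-up lot of six wafers with a perfectly linear trend; illustrative only.
example = lot_features([1, 2, 3, 4, 5, 6], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(sorted(example))
```

Each half must contain at least two wafers with distinct wafer numbers for the fit and the standard deviation to be defined.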

[0045] FIG. 17 shows that every feature quantity varies considerably. Therefore, by performing a regression tree analysis with each feature quantity as the target variable, the factors that caused a significant difference in that feature quantity, that is, the failure factors and the like, can be analyzed.

[0046] To obtain analysis results efficiently with the data distribution features as the object of analysis, a file that defines, for each lot, the name of the equipment used in each process and the extracted feature quantities is created as input data for the regression tree analysis, as shown in FIG. 18. This file is obtained by joining, on the same lot number, the rule file 24 (see FIG. 19), which defines the name of the equipment used in each process and the lot yield as input data for analyzing the factors of yield variation by regression tree analysis, with the file shown in FIG. 16.
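The join on lot number can be sketched as follows. The column names (`lot`, `Field_Ox`, `yield`, `VT_N2_s`) and the function name `join_by_lot` are hypothetical, and the sketch assumes an inner join, that is, lots missing from either file are dropped; the patent does not say how unmatched lots are handled.

```python
def join_by_lot(equipment_rows, feature_rows, key="lot"):
    """Combine equipment-history rows and feature rows that share a lot number."""
    features_by_lot = {row[key]: row for row in feature_rows}
    joined = []
    for row in equipment_rows:
        match = features_by_lot.get(row[key])
        if match is not None:          # assumed inner join: skip unmatched lots
            joined.append({**row, **match})
    return joined

# Made-up rows; illustrative only.
equipment_rows = [
    {"lot": "L001", "Field_Ox": "PM1", "yield": 81.0},
    {"lot": "L002", "Field_Ox": "PM2", "yield": 79.5},
    {"lot": "L003", "Field_Ox": "PM3", "yield": 80.2},
]
feature_rows = [
    {"lot": "L001", "VT_N2_s": 0.08},
    {"lot": "L002", "VT_N2_s": 0.24},
]
joined = join_by_lot(equipment_rows, feature_rows)
print(joined)
```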

[0047] FIG. 20 is a regression tree diagram showing the result of a regression tree analysis performed, on the basis of the file shown in FIG. 18, to extract the factors behind the variation of VT_N2 within each lot. The target variable is the standard deviation (2) described above (VT_N2_s), and the explanatory variables are the names of the equipment used in each process. FIG. 21 shows an example of the evaluation statistics list for this regression tree analysis.

[0048] According to the regression tree diagram of FIG. 20, the split found most significant for the variation in the standard deviation of VT_N2 (VT_N2_s) is whether machine PM1 or PM3, or instead machine PM2, was used as the Field_Ox process_equipment. This result is judged to be highly reliable because, in the evaluation statistics list, the S ratio and t value of the first-ranked Field_Ox process_equipment (S ratio = 0.3767, t = 3.081) differ clearly from those of the second and later candidates, such as the second wiring_equipment and the DRY process_equipment (S ratio > 0.43, t < 2.2).

[0049] To confirm this, FIG. 22 shows box-and-whisker plots of the distribution of VT_N2 values for each machine used in the Field_Ox process. In FIG. 22, a clear significant difference is seen between machines PM1 and PM3 on the one hand and machine PM2 on the other. This confirms the effectiveness of the method of the present invention, which performs the analysis using distribution features of the original data. The evaluation statistics list, the S ratio, and the t value are described later.

[0050] FIG. 23 is a histogram showing the distribution of VT_N2 for all wafers processed on machine PM1 or PM3 in the Field_Ox process, which was identified as the problem process by the regression tree analysis shown in FIGS. 21 and 22. FIGS. 24 to 26 are histograms each showing the distribution of VT_N2 for the wafers of a different single lot processed on machine PM1 or PM3 in the Field_Ox process. FIG. 27 is a histogram showing the distribution of VT_N2 for all wafers processed on machine PM2 in the Field_Ox process, and FIGS. 28 to 30 are histograms each showing the distribution of VT_N2 for the wafers of a different single lot processed on machine PM2 in the Field_Ox process.

[0051] As shown in FIGS. 23 and 27, the average VT_N2 of all wafers processed on machine PM1 or PM3 (μ = 0.8560) is roughly the same as the average VT_N2 of all wafers processed on machine PM2 (μ = 0.7302). It is therefore difficult to extract a significant difference with the conventional analysis based on averages.

[0052] However, comparing the standard deviation of VT_N2 for all wafers processed on machine PM1 or PM3 (σ = 0.0835) with that for all wafers processed on machine PM2 (σ = 0.2351) shows a clear significant difference. Thus, by focusing on data distribution features such as the variation of the original data, as in the first embodiment, a significant difference that could not be extracted by analyzing the original data alone can be newly extracted as a failure factor.

[0053] When machine PM2 was actually investigated in detail on the basis of the above analysis results, its in-furnace temperature distribution was found to vary more than that of machines PM1 and PM3. This was further traced to thermocouple deterioration, and the periodic inspection method was optimized. Incidentally, a regression tree analysis with the lot yield as the target variable and the names of the equipment used in each process as the explanatory variables did not reveal that machine PM2 was a cause of reduced yield. In other words, a low-yield factor that did not appear clearly in the yield values was revealed by the method of the present invention, which analyzes the factors that cause significant differences in quantities such as the within-lot standard deviation of an electrical characteristic value. In the first embodiment, everything from editing the accumulated data and running the regression tree analysis to quantitatively evaluating the results by a proprietary method is executed automatically.

[0054] FIG. 31 is a regression tree diagram showing the result of a regression tree analysis performed on the basis of the file shown in FIG. 18, with the periodicity at a wafer-number interval of 2 described in (6) above (VT_N2_2) as the target variable and the names of the equipment used in each process as the explanatory variables. FIG. 32 shows an example of the evaluation statistics list for this regression tree analysis. According to the regression tree diagram of FIG. 31, the split found most significant for the within-lot variation of VT_N2 having a periodicity of 2 is whether machine F7, or instead machine F5, F6, F8, or F9, was used as the F diffusion process_equipment. Lots processed on machine F7 show a periodicity of 2 that is about 50% stronger.

[0055] To confirm this, FIG. 33 shows box-and-whisker plots of the distribution of the period-2 value (VT_N2_2) for each machine used in the F diffusion process. In FIG. 33, a clear significant difference is seen between machine F7 and machines F5, F6, F8, and F9. The periodicity of 2 cannot be seen in the box-and-whisker plot of FIG. 9, which shows all VT_N2 data by wafer number. This example also confirms the effectiveness of the method of the present invention, which performs the analysis using distribution features of the original data.

[0056] FIGS. 34 to 36 are histograms each showing the within-lot variation of VT_N2 for the wafers of a different single lot processed on machine F7 in the F diffusion process, which was identified as the problem process by the regression tree analysis shown in FIGS. 31 and 32. FIGS. 37 to 39 are histograms each showing the within-lot variation of VT_N2 for the wafers of a different single lot processed on machine F5, F6, F8, or F9 in the F diffusion process. From the above analysis results, the factor behind the within-lot variation of VT_N2 was extracted, and attention was drawn to the equipment of the F diffusion process, in which wafers are processed alternately. In fact, one of its two chambers was found to generate many particles.

[0057] In the first embodiment, the regression tree analysis is performed automatically by selecting each extracted feature quantity in turn as the target variable while keeping the explanatory variables the same, so that the factors governing each feature quantity are extracted for each one. In particular, when it is not clear what should be analyzed, all conceivable feature quantities are extracted and a regression tree analysis is run with each of them as the target variable. Since this yields various analysis results as described above, the item judged to have the largest significant difference among them is taken as a candidate countermeasure for improving the yield. In this way, many significant differences that were not easily extracted by conventional analysis methods are extracted efficiently.

[0058] The regression tree analysis and the evaluation statistics list are now described, beginning with a brief description of regression tree analysis. A regression tree analysis operates on a set of records, each consisting of explanatory variables that represent a plurality of attributes and a target variable that is affected by them, and identifies the attribute and attribute value that most strongly affect the target variable. The means 23 for performing data mining and extracting the rule file 24 (the regression tree analysis engine) outputs rules that describe the features and regularities of the data.

[0059] The regression tree analysis proceeds by repeatedly dividing a set into two on the basis of the parameter value (attribute value) of an explanatory variable (attribute). At each division, let S0 be the sum of squares of the target variable before the division, and let S1 and S2 be the sums of squares of the target variable in the two sets after the division. The explanatory variable and parameter value used to divide the records are chosen so that ΔS given by equation (1) is maximized.

[0060] ΔS = S0 − (S1 + S2) ... (1)
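The search for the split that maximizes ΔS in equation (1) can be sketched as below. This is a minimal sketch, not the patent's engine: it assumes categorical explanatory variables (equipment names), takes the sum of squares to be the sum of squared deviations from the set mean, and tries every two-way grouping of the category values by brute force, which is only practical for a handful of machines per process. The names `sum_sq` and `best_split` and the record fields are hypothetical.

```python
from itertools import combinations

def sum_sq(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(records, attributes, target):
    """Return (delta_s, attribute, left_values) maximizing delta_s = S0 - (S1 + S2)."""
    s0 = sum_sq([r[target] for r in records])
    best = (0.0, None, None)
    for attr in attributes:
        levels = sorted({r[attr] for r in records})
        for size in range(1, len(levels)):
            for left in combinations(levels, size):
                s1 = sum_sq([r[target] for r in records if r[attr] in left])
                s2 = sum_sq([r[target] for r in records if r[attr] not in left])
                delta_s = s0 - (s1 + s2)
                if delta_s > best[0]:
                    best = (delta_s, attr, set(left))
    return best

# Made-up per-lot records; illustrative values only.
records = [
    {"Field_Ox": "PM1", "DRY": "D1", "VT_N2_s": 0.08},
    {"Field_Ox": "PM3", "DRY": "D2", "VT_N2_s": 0.09},
    {"Field_Ox": "PM2", "DRY": "D1", "VT_N2_s": 0.23},
    {"Field_Ox": "PM2", "DRY": "D2", "VT_N2_s": 0.24},
]
delta_s, attribute, left_values = best_split(records, ["Field_Ox", "DRY"], "VT_N2_s")
print(attribute, left_values, round(delta_s, 4))
```

Applying `best_split` again to each of the two resulting subsets would grow the tree one level at a time, as the description of repeated two-way division suggests.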

[0061] The explanatory variable and parameter value obtained here correspond to a branch point in the regression tree. The same processing is then repeated for each of the divided sets to examine the influence of the explanatory variables on the target variable. This is the generally well-known regression tree analysis technique. To understand in more detail how clear-cut each division is, the following parameters (a) to (d) are used, in addition to ΔS, as a quantitative evaluation of the regression tree analysis result for a plurality of top-ranked division candidates. These parameters are output as the evaluation statistics list.

[0062] (a) S ratio: the reduction rate of the sum of squares due to the division, that is, a parameter indicating how much the division reduced the sum of squares. The smaller this value, the greater the effect of the division and the more clear-cut the division, so the significant difference is larger.

[0063] S ratio = ((S1 + S2) / 2) / S0 ... (2)
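Equation (2) can be evaluated directly once the three sums of squares are known. The sketch below assumes, as in the usual regression tree formulation, that each sum of squares is the sum of squared deviations from the mean of its own set; the function names and sample values are hypothetical.

```python
def sum_sq(values):
    """Sum of squared deviations from the mean of `values`."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def s_ratio(parent, left, right):
    """S ratio = ((S1 + S2) / 2) / S0, following equation (2)."""
    s0 = sum_sq(parent)
    if s0 == 0:
        raise ValueError("the target variable does not vary before the division")
    return ((sum_sq(left) + sum_sq(right)) / 2) / s0

# Made-up target values; illustrative only.
left = [0.08, 0.09]
right = [0.23, 0.24]
ratio = s_ratio(left + right, left, right)
print(round(ratio, 6))
```

Under this assumption S1 + S2 can never exceed S0, so the S ratio lies between 0 and 0.5: a value near 0 indicates a clear-cut division and a value near 0.5 indicates that the division explains almost nothing. The values quoted in [0048] (0.3767 versus more than 0.43) fall within that range.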

[0064] (b) t value: the regression tree analysis engine divides a set into two, and the t value is used to test the difference between the means (/X1, /X2) of the two divided sets. Here, "/" denotes an overline. The statistical t-test serves as a criterion for a significant difference between the means of the target variable in the divided sets. If the degrees of freedom, that is, the numbers of data points, are the same, a larger t means that the set has been divided more clearly and that the significant difference is larger.

[0065] If there is no significant difference between the variances of the divided sets, the t value is obtained from equation (3) below; if there is a significant difference between the variances, the t value is obtained from equation (4). Here, N1 and N2 are the numbers of elements in divided sets 1 and 2, respectively, /X1 and /X2 are the means of the respective sets after the division, and S1 and S2 are the sums of squares of the target variable in the respective sets after the division.

[0066]

[Equation 1]

[0067]

[Equation 2]
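The images for equations (3) and (4) are not reproduced in this text. As a hedged reconstruction only, assuming the textbook two-sample statistics (pooled variance when the variances do not differ, Welch's form when they do) written with the symbols defined in [0065], and writing the mean /X as an overbar, they would read as follows. The published drawings may differ in detail, for example by taking the absolute value of the numerator.

```latex
% Equation (3): variances of the two sets not significantly different (pooled form, assumed)
t = \frac{\bar{X}_1 - \bar{X}_2}
         {\sqrt{\left(\dfrac{1}{N_1} + \dfrac{1}{N_2}\right)\dfrac{S_1 + S_2}{N_1 + N_2 - 2}}}
\tag{3}

% Equation (4): variances of the two sets significantly different (Welch form, assumed)
t = \frac{\bar{X}_1 - \bar{X}_2}
         {\sqrt{\dfrac{S_1}{N_1 (N_1 - 1)} + \dfrac{S_2}{N_2 (N_2 - 1)}}}
\tag{4}
```

In both forms S1 and S2 are sums of squared deviations, so S1 / (N1 − 1) and S2 / (N2 − 1) are the sample variances of the two sets.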

[0068] (c) Difference between the means of the target variable in the divided sets: the larger this value, the larger the significant difference.

[0069] (d) Number of data points in each divided set: the smaller the difference between the two, the smaller the influence of abnormal values (noise).
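Parameters (b) to (d) for one candidate division could be computed as in the following sketch. Because the equation images for the t value are not available in this text, the code assumes the textbook pooled-variance form when the variances are treated as equal and Welch's form otherwise, and it reports the absolute value of t; these choices, the function name `split_statistics`, and the sample values are all assumptions.

```python
from math import sqrt

def split_statistics(left, right, equal_variance=True):
    """Return the t value, mean difference, and set sizes for a two-way division."""
    n1, n2 = len(left), len(right)
    m1, m2 = sum(left) / n1, sum(right) / n2
    s1 = sum((v - m1) ** 2 for v in left)     # sum of squares of set 1
    s2 = sum((v - m2) ** 2 for v in right)    # sum of squares of set 2
    if equal_variance:
        # Pooled-variance standard error (assumed form of equation (3)).
        se = sqrt((s1 + s2) / (n1 + n2 - 2) * (1 / n1 + 1 / n2))
    else:
        # Welch standard error (assumed form of equation (4)).
        se = sqrt(s1 / (n1 * (n1 - 1)) + s2 / (n2 * (n2 - 1)))
    mean_diff = abs(m1 - m2)
    return {"t": mean_diff / se, "mean_diff": mean_diff, "n": (n1, n2)}

# Made-up target values for the two divided sets; illustrative only.
stats = split_statistics([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(stats)
```

Each set needs at least two data points and the sets must not both be constant, otherwise the standard error is undefined or zero.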

[0070] According to the first embodiment described above, the analysis is not limited to the original data and their averages as in the conventional approach. Various data distribution features present in the original data group, such as the variation of the original data and within-lot variation patterns, are extracted, and each feature quantity is selected in turn as the target variable and analyzed. The factors that gave rise to each feature quantity are thereby evaluated and extracted automatically and quantitatively, so that the data are examined from more angles and more information is extracted. As a result, relationships and significant differences that were previously buried in a wide variety of data and difficult to identify can be extracted objectively, without depending on the engineer's subjectivity, and efficiently and quantitatively.

[0071] In addition, because the whole series of steps from extracting the feature quantities to extracting their factors is performed automatically in the first embodiment, making the necessary settings in advance allows the state of variation in a semiconductor manufacturing line or the like, and its causes, to be monitored continuously and automatically.

[0072] The present invention is not limited to the first embodiment described above and has a wide range of application. For example, when a new product type is being ramped up, there are many causes of deterioration, and low-yield lots occur frequently, investigating the causal process not only with the original data and their averages but also with the data distribution features within lots and wafers makes it possible to find hidden causes and to narrow down the causes.

[0073] (Second Embodiment) FIG. 56 shows an example of the functional configuration of a data analysis apparatus that incorporates data mining according to the second embodiment of the present invention. The data mining unit 1703 extracts features and regularities hidden in the data, on the basis of the individual original data extracted from each database 1702 in the original data group 1701, and creates a rule file 1704. The analysis tool group 1705 includes a statistical analysis component 1706, a chart creation component 1707, and the like, and analyzes the individual original data extracted from the databases 1702 on the basis of the rule file 1704.

[0074] The analysis result is fed back to the analysis tool group 1705 and the data mining unit 1703. The data mining unit 1703 performs data mining on the basis of the analysis result of the analysis tool group 1705 and the original data group 1701. The analysis tool group 1705 performs its analysis on the basis of the rule file 1704, the individual original data extracted from the databases 1702, and its own analysis results. The decision-making (unit) 1708 makes decisions on the basis of the analysis result of the analysis tool group 1705.

[0075] When data mining is applied to yield data analysis, the data mining results are used to decide on measures for improving the yield, to judge whether a measure should be implemented, and to predict the effect of a measure. This requires quantitative evaluation and accuracy of the data mining results.

【００７６】 Among the discriminant tree analyses, which are one method of data mining, regression tree analysis is particularly effective. One advantage of regression tree analysis is that the result is output as easy-to-understand rules, which are expressed in a general-purpose language or in a database language such as SQL. It is therefore possible to make effective use of the reliability and accuracy of these results, and to make more effective decisions or take actions (that is, countermeasures) based on them.

【００７７】 FIG. 43 shows the format of example data used as input to the regression tree analysis. Each record corresponds to one wafer number, and each record has the device used 411 in each manufacturing process, the electrical characteristic data 412, and the wafer yield 413. The explanatory variables 401 are the device used 411, the electrical characteristic data 412, and the like. The objective variable 402 is the yield 413. For example, suppose that the device used 411 and the electrical characteristic data 412 affect the yield. FIGS. 44 and 45 show the regression tree diagram and the list of evaluation statistics that result from the regression tree analysis of this data.

【００７８】 FIG. 44 is a regression tree diagram showing the result of the regression tree analysis. The root node n0 is split into nodes n1 and n2. Node n1 is split into nodes n3 and n4. Node n2 is split into nodes n5 and n6. Node n6 is split into nodes n7 and n8.

【００７９】 FIG. 45 shows the evaluation statistics of the explanatory variables at the first two-way split. For example, for the whole set, the mean Ave of the objective variable is 75, the standard deviation s is 12, and the number of data N is 1000. Lists 601 to 604 each show, from left to right: the rank by significant difference, the S ratio, the t value, the difference between the objective-variable means of the split sets, the number of data in each split set, the attribute name (explanatory variable) of the split, and the attribute values (parameter values) of the two split sets together with the magnitude relation of their objective variable. Lists 601 to 604 are candidates for grouping according to the value of ΔS given by equation (1) for the attribute (explanatory variable) being split, and are arranged in descending order of significant difference (ΔS). FIG. 44 shows node n0 split into nodes n1 and n2 based on the first candidate 601.
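
As a rough sketch of how candidates such as lists 601 to 604 might be scored and ranked: the code below assumes that ΔS is the reduction in the sum of squared deviations of the objective variable produced by a split (a common regression tree criterion; equation (1), defined elsewhere in this document, remains authoritative), and that the t value is a pooled-variance two-sample t statistic. The function names and the sample yields are hypothetical.

```python
from math import sqrt
from statistics import mean


def sum_sq(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def evaluate_split(y_a, y_b):
    """Return illustrative statistics for one candidate two-way split."""
    n_a, n_b = len(y_a), len(y_b)
    within = sum_sq(y_a) + sum_sq(y_b)
    # Assumed criterion: drop in the sum of squares caused by the split.
    delta_s = sum_sq(list(y_a) + list(y_b)) - within
    # Assumed t value: pooled-variance two-sample t statistic.
    # (Undefined when both groups have zero variance.)
    pooled_var = within / (n_a + n_b - 2)
    mean_diff = mean(y_b) - mean(y_a)
    t_value = mean_diff / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return {"delta_s": delta_s, "t": t_value,
            "mean_diff": mean_diff, "n": (n_a, n_b)}


def rank_candidates(candidates):
    """Sort {name: (y_a, y_b)} by delta_s, largest first."""
    scored = {name: evaluate_split(a, b) for name, (a, b) in candidates.items()}
    return sorted(scored.items(), key=lambda kv: kv[1]["delta_s"], reverse=True)


# Hypothetical yields for two candidate splits of the same six wafers.
candidates = {
    "process_A": ([60, 62, 64], [80, 82, 84]),
    "process_B": ([60, 80, 62], [82, 64, 84]),
}
ranked = rank_candidates(candidates)
print([name for name, _ in ranked])
```

The candidate with the largest assumed ΔS comes first, mirroring the descending order in which the text says the lists are arranged.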

【００８０】 When the set n0 of all wafers in FIG. 44 is split into two sets n1 and n2 based on the evaluation value ΔS of equation (1), the factor that most affects the yield is whether AM1 or AM2 is used in process A, and the latter gives the better yield. Repeating the same set splitting on each of the resulting sets yields this regression tree diagram. For the wafer group that used AM2 in process A and CM2 in process C, the condition in which the electrical characteristic data RSP is 90 or less is the most effective (the yield is high).
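
Since the text notes that regression tree results can be expressed as rules in a database language such as SQL, the path described above (AM2 in process A, CM2 in process C, RSP of 90 or less) could be rendered along the following lines. This is only a sketch: the table name `wafer_data` and the column names `process_a`, `process_c`, `rsp`, and `yield_pct` are hypothetical, and the string is built by simple concatenation, which is suitable for illustration but not for untrusted input.

```python
def rule_to_sql(conditions, table="wafer_data", target="yield_pct"):
    """Build a SELECT that averages `target` over records matching a rule."""
    where = " AND ".join(conditions)
    return f"SELECT AVG({target}) FROM {table} WHERE {where}"


# One root-to-leaf path of the tree, written as a list of conditions.
rule = ["process_a = 'AM2'", "process_c = 'CM2'", "rsp <= 90"]
print(rule_to_sql(rule))
```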

【００８１】 FIG. 46 is equivalent to FIG. 44 and shows the correlation of the yield of the split wafer sets with the devices used in specific processes and with the electrical characteristic data. The higher an explanatory variable appears in the regression tree diagram of FIG. 44, the greater its influence on the objective variable. The average yield of all wafers is 74.8%, but when the wafers are divided into several sets according to the devices used and the electrical characteristic data, such features and regularities emerge. Regression tree analysis extracts them automatically, which provides clues for yield analysis.

【００８２】 In the regression tree diagram of FIG. 44, the top two levels are both due to differences in the device used, so in the analysis using all wafers it is the device difference, including compound conditions, that strongly affects the yield. The electrical characteristic data appear to have little effect. However, FIGS. 44 and 46 show that, for the wafer group that used AM2 in process A and CM2 in process C, RSP has the greatest effect on the yield.

【００８３】 Next, an example of calculating the two-division confounding degree and the two-division independence degree is described. In regression tree analysis, the degree of confounding (the state of confounding, that is, the degree of non-independence) of each set split performed to find the explanatory variable most significant for the objective variable is grasped statistically, and other explanatory variables that are confounded with an explanatory variable found to have a large significant difference are identified. The method of calculating the two-division confounding degree and the two-division independence degree is described with reference to FIG. 47.

【００８４】 First, among the explanatory variables, the one whose degree of confounding is to be evaluated is taken as the reference explanatory variable 801.

【００８５】 Second, a table is constructed in which each record holds "L" or "H" as the data value for each explanatory variable. Here, H means that the record belongs to the set whose objective variable takes the higher value when the set is split into two in the regression tree analysis, and L means that it belongs to the set whose objective variable takes the lower value. At the time of the two-way split, L or H is determined for each explanatory variable of every record.

【００８６】 Third, as an evaluation value of how well the L and H values of each comparison explanatory variable 802 agree with those of the reference explanatory variable 801, let Na be the number of records whose L/H values agree and N be the total number of records, and define the two-division confounding degree DEP by equation (5). DEP ranges from -1 to 1: it is 1 if the two variables are completely confounded, 0 if they are not confounded at all, and -1 if they are inversely confounded.

【００８７】 DEP = (2 × Na / N) − 1 ... (5)

【００８８】 Further, based on the two-division confounding degree DEP, the two-division independence degree IND is defined by equation (6). IND ranges from 0 to 1: it is 1 if the two variables are completely independent and 0 if they are not independent at all.

【００８９】 IND = 1 − |DEP| ... (6)

【００９０】 Fourth, the two-division confounding degree DEP and the two-division independence degree IND are obtained between one reference explanatory variable 801 and each of the other explanatory variables 802, and are used as evaluation measures between explanatory variables. Any explanatory variable may be chosen as the reference, but in terms of usefulness it is effective to choose one that the regression tree analysis found to have a large significant difference with respect to the objective variable, particularly in the set split at the top level.

【００９１】 Fifth, by obtaining the two-division confounding degree DEP and the two-division independence degree IND, it is possible to evaluate quantitatively how much the assignment of each comparison explanatory variable 802 to the L and H sets differs from that of the reference explanatory variable 801.
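
The five steps above can be sketched as follows, under stated assumptions: each record is a dictionary, the explanatory variable is categorical, and the two-way split is given as the set of attribute values on one side. The function names, the `yield` key, and the sample records are hypothetical; only equations (5) and (6) come from the text.

```python
from statistics import mean


def label_low_high(records, variable, group_a, target="yield"):
    """Label each record 'H' or 'L' for one explanatory variable.

    group_a holds the attribute values on one side of the two-way split;
    every other value falls on the other side. Both sides must be non-empty.
    """
    side_a = [r[target] for r in records if r[variable] in group_a]
    side_b = [r[target] for r in records if r[variable] not in group_a]
    a_is_high = mean(side_a) > mean(side_b)
    # A record is 'H' when it sits on whichever side has the higher mean.
    return ["H" if (r[variable] in group_a) == a_is_high else "L"
            for r in records]


def two_division_dep(reference, comparison):
    """Equation (5): DEP = 2 * Na / N - 1, Na = records whose labels agree."""
    if len(reference) != len(comparison) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    n_agree = sum(1 for r, c in zip(reference, comparison) if r == c)
    return 2 * n_agree / len(reference) - 1


def two_division_ind(dep):
    """Equation (6): IND = 1 - |DEP|."""
    return 1 - abs(dep)


# Hypothetical records: four wafers with the device used at one step.
records = [
    {"ST3": "S3M1", "yield": 80},
    {"ST3": "S3M2", "yield": 60},
    {"ST3": "S3M4", "yield": 85},
    {"ST3": "S3M3", "yield": 55},
]
ref_labels = label_low_high(records, "ST3", {"S3M2", "S3M3"})
print(ref_labels)
print(two_division_dep(ref_labels, ["H", "L", "L", "H"]))
```

Identical label columns give DEP = 1, fully inverted columns give DEP = -1, and columns that agree on exactly half of the records give DEP = 0, which matches the range described for equation (5).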

【００９２】 By obtaining the two-division confounding degree and/or the two-division independence degree, the degree of confounding of explanatory variables can be evaluated quantitatively on the basis of the two-way set splits of the regression tree analysis. In combination with regression tree analysis, it becomes possible to extract automatically other explanatory variables that are confounded with an explanatory variable found by the regression tree analysis to have a large significant difference.

【００９３】 The two-division confounding degree can be evaluated for any explanatory variable subject to the regression tree analysis. In terms of effectiveness, however, one statistically grasps how strongly an explanatory variable ranked high among the first split candidates in FIG. 45 (that is, the reference explanatory variable, which appears in the evaluation statistics list) is confounded with any other explanatory variable, and extracts the noteworthy explanatory variables that are confounded with an explanatory variable having a large significant difference. The explanatory variable whose confounding with the reference explanatory variable 801 is to be analyzed is taken as the comparison explanatory variable 802, and both are selected from the evaluation statistics list of FIG. 44. A calculation example of the two-division confounding degree and the two-division independence degree is described with reference to FIG. 47.

【００９４】 FIG. 47 shows, along the horizontal axis, the wafer number 803, the comparison explanatory variables 802, the reference explanatory variable 801, and the yield 804, and, along the vertical axis, the high-yield group 811 of the reference explanatory variable, the low-yield group 812 of the reference explanatory variable, the calculation formula 813 for the two-division confounding degree, the two-division confounding degree 814, and the two-division independence degree 815.

【００９５】 From the top candidate items in FIG. 45 (the evaluation statistics list), the item to serve as the basis of comparison is chosen as the reference explanatory variable 801. In FIG. 47, ST3 is the reference explanatory variable 801. The other explanatory variables are the comparison explanatory variables 802; in FIG. 47 these are ST1, ST2, and WET2. Each comparison explanatory variable 802 is compared with the reference explanatory variable 801. For the explanatory variables ST1, ST2, ST3, and WET2, "L" of the low-yield group is shown hatched and "H" of the high-yield group is shown unhatched.

【００９６】 ST3, the reference explanatory variable 801, can be divided by its attribute value into the high-yield group 811 of the reference explanatory variable and the low-yield group 812 of the reference explanatory variable. The high-yield group 811 is a set of 10, and the low-yield group 812 is also a set of 10.

【００９７】 Next, for each explanatory variable, the number of lots in its two split groups (high yield and low yield) that coincide with the same group of the reference explanatory variable is counted and called Na. For example, for ST1, which is a comparison explanatory variable 802, 10 members of its high-yield group fall in the high-yield group 811 of the reference explanatory variable, and 2 members of its low-yield group fall in the low-yield group 812 of the reference explanatory variable. That is, the number of records for which the comparison explanatory variable ST1 and the reference explanatory variable ST3 belong to the same group is Na = 10 + 2 = 12.

【００９８】 The expression obtained by substituting this Na into equation (5) is shown as the calculation formula 813 for the two-division confounding degree. Here, the number of data N is 20. The result of this calculation is shown as the two-division confounding degree 814. The value obtained from equation (6) is shown as the two-division independence degree 815. The two-division confounding degree 814 and the two-division independence degree 815 are shown below each column of FIG. 47.
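
Using only the counts stated in the text for ST1 (Na = 12, N = 20), equations (5) and (6) would evaluate as follows. This is a worked restatement of the arithmetic, not a reading of the values printed in FIG. 47.

```latex
\mathrm{DEP}_{\mathrm{ST1}} = \frac{2 \times 12}{20} - 1 = 0.2,
\qquad
\mathrm{IND}_{\mathrm{ST1}} = 1 - \lvert 0.2 \rvert = 0.8
```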

【００９９】 There are three basic ways to use the two-division confounding degree and the two-division independence degree. Explanatory variables that were previously difficult to discriminate can be obtained as quantitative information, as follows.

【０１００】 (1) Confirming the range of significant explanatory variables: The candidates confounded with highly significant candidates are identified, and these are also judged to be significant explanatory variables. There is no particular threshold for the degree of confounding, but it can be judged by comparison with the values of the other explanatory variables. In addition, when a candidate that need not be considered on technical grounds is ranked high, the candidates confounded with it can be identified. Furthermore, meaningless candidates can be removed and the analysis run again for confirmation.

【０１０１】 (2) Confirming highly independent candidates and applying them: The independence of every candidate from the other candidates is checked. If a candidate is sufficiently independent of the others, it becomes clear that the yield difference due to this candidate exists independently of the other candidates. Furthermore, the same discriminant tree analysis is performed on each of the split groups of this candidate and the results are compared. If both give similar results, the analysis results can be regarded as highly reliable. Conversely, if the results differ, either there is an explanatory variable that affects the yield in a compound condition with the candidate thought to be independent, or the results are being swayed by anomalous data (for example, because the number of data is small).

【０１０２】 (3) Discriminant tree analysis of confounded candidates: When a candidate considered important is confounded with the first-branch candidate, it is unlikely to appear in the branches below the first branch. In that case, the data are divided according to the split groups of another, highly independent candidate, discriminant tree analysis is performed, and the results obtained under each split group are compared. If the results are similar, the important candidate cannot be distinguished from the first candidate, but the analysis itself is considered highly reliable. Conversely, if the candidate considered important appears and the results differ, this result should also be taken into account, and further data analysis capable of distinguishing the important candidate from the first candidate is considered necessary.

【０１０３】 Next, a case is described in which regression tree analysis is performed with the device history and the electrical characteristic values as explanatory variables and the wafer yield as the objective variable, and the two-division confounding degree and the two-division independence degree are obtained for the top 12 candidates of the first branch of the regression tree analysis result.

【０１０４】 FIGS. 48 and 49 show the regression tree diagram and the evaluation statistics list obtained in the second embodiment. In FIG. 48, node n900 is split into nodes n901 to n914. FIG. 49 shows the evaluation statistics of the top 12 explanatory variables at the first two-way split. This gives twelve candidates 1001 to 1012 for the set branch.

【０１０５】 FIG. 50 shows the two-division confounding degree 1111 and the two-division independence degree 1112 obtained when ST1, which is listed as the top first candidate 1001 in FIG. 49, is taken as the reference explanatory variable 1101 and the other explanatory variables in the evaluation statistics list are taken as the comparison explanatory variables 1102.

【０１０６】 FIG. 51 shows the two-division confounding degree 1211 and the two-division independence degree 1212 obtained when ST3, which is listed as the third candidate 1003 for the set branch in FIG. 49, is taken as the reference explanatory variable 1201 and the other explanatory variables in the evaluation statistics list are taken as the comparison explanatory variables 1202.

【０１０７】 In FIG. 50, the explanatory variables whose two-division confounding degree exceeds 0.75 are ST2, ST4, ST5, ST6, ST10, and WET2. These do not appear in the regression tree diagram of FIG. 48, but they may be factors that strongly affect the yield. Conversely, FIG. 51 shows that ST3 has a high two-division independence degree.
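
A minimal sketch of the screening step described above, assuming the confounding degrees against the reference variable have already been computed. The 0.75 cut-off is the figure quoted in the text for this example rather than a general criterion, and the DEP values and the name `WET5` below are invented for illustration; they are not the values of FIG. 50.

```python
def confounded_with_reference(dep_by_variable, threshold=0.75):
    """Return the variables whose DEP against the reference exceeds threshold."""
    return sorted(name for name, dep in dep_by_variable.items()
                  if dep > threshold)


# Hypothetical DEP values against reference ST1 (not taken from FIG. 50).
dep_vs_st1 = {"ST2": 0.90, "ST3": 0.05, "ST4": 0.80, "WET2": 0.78, "WET5": 0.40}
flagged = confounded_with_reference(dep_vs_st1)
print(flagged)
```

Variables flagged this way are candidates that may not appear in the tree itself yet may still deserve attention, which is the reading the text gives for ST2, ST4, ST5, ST6, ST10, and WET2.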

【０１０８】 FIG. 51 shows the two-division confounding degree 1211 and the two-division independence degree 1212 between ST3, which FIG. 50 indicated to have a high two-division independence degree and which is taken as the reference explanatory variable, and the other 11 explanatory variables. It shows that ST3 is highly independent of every other explanatory variable.

【０１０９】 FIGS. 52 and 53 show the two-division confounding degree, the two-division independence degree, and their averages among the top 12 explanatory variables that the regression tree analysis of FIG. 49 found to have a large significant difference, so that the relationships among the explanatory variables can be grasped at a glance. The bottom row of FIG. 52 shows the average 1301 of the two-division confounding degree, and the bottom row of FIG. 53 shows the average 1401 of the two-division independence degree.
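
Tables of the kind described for FIGS. 52 and 53 could be assembled roughly as follows. This sketch assumes that every explanatory variable already has an L/H label list of equal length, and that each average is taken over the other variables only (excluding a variable's comparison with itself); how the figures actually compute their averages is not stated here, so that choice is an assumption. All names and labels are hypothetical.

```python
def dep(a, b):
    """Two-division confounding degree: 2 * Na / N - 1."""
    return 2 * sum(x == y for x, y in zip(a, b)) / len(a) - 1


def pairwise_tables(labels):
    """Return (dep_table, ind_table, dep_avg, ind_avg) for {name: L/H list}.

    Needs at least two variables, since averages exclude the self-pair.
    """
    names = list(labels)
    dep_table = {r: {c: dep(labels[r], labels[c]) for c in names if c != r}
                 for r in names}
    ind_table = {r: {c: 1 - abs(v) for c, v in row.items()}
                 for r, row in dep_table.items()}
    dep_avg = {r: sum(row.values()) / len(row) for r, row in dep_table.items()}
    ind_avg = {r: sum(row.values()) / len(row) for r, row in ind_table.items()}
    return dep_table, ind_table, dep_avg, ind_avg


# Hypothetical L/H labels for three explanatory variables over four wafers.
labels = {
    "ST1": ["H", "H", "L", "L"],
    "ST2": ["H", "H", "L", "L"],
    "ST3": ["H", "L", "H", "L"],
}
dep_table, ind_table, dep_avg, ind_avg = pairwise_tables(labels)
print(dep_avg)
print(ind_avg)
```

A variable with a high average independence degree, like the hypothetical ST3 here, is the kind of candidate the text suggests using to split the data for a separate analysis.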

【０１１０】 Next, since the difference in the device used at ST3 was found to affect the yield independently of the other explanatory variables, the wafers are divided into the group processed by the ST3 devices that give poor yield (defective wafer group: S3M2 and S3M3 used) and the group processed by the ST3 devices that give good yield (good wafer group: S3M1 and S3M4 used), and regression tree analysis is performed on each group separately. The resulting regression tree diagrams are shown in FIGS. 54 and 55.

【０１１１】 FIG. 54 is a regression tree diagram showing the result of the regression tree analysis of the defective wafer group, and consists of nodes n1500 to n1506. FIG. 55 is a regression tree diagram showing the result of the regression tree analysis of the good wafer group, and consists of nodes n1600 to n1606.

【０１１２】 The first branch for the defective wafer group in FIG. 54 is the same as that for all wafers in FIG. 48. Considering also that the defective wafer group at the top level of the regression tree diagram of FIG. 48 is small (n = 39), the result is inferred to be strongly influenced by wafers whose yield is extremely poor compared with the others, which is one reason the analysis is difficult. In the good wafer group of FIG. 55, a factor that had been obscured by the defective devices in the ST3 process has newly come to light.

【０１１３】 According to the second embodiment, the degree of confounding among explanatory variables can be grasped more clearly by using the two-division confounding degree and the two-division independence degree. In combination with regression tree analysis, it becomes possible to identify the noteworthy explanatory variables that are confounded with the problem explanatory variable showing a large significant difference at the first branch of the regression tree.

【０１１４】 Furthermore, by applying the grouping given by a highly independent explanatory variable and performing regression tree analysis again, the accuracy (reliability) and the efficiency of the regression tree analysis are improved, and more detailed analysis becomes possible.

【０１１５】 The embodiments described above can be realized by a computer executing a program. Means for supplying the program to a computer, for example a recording medium such as a CD-ROM on which the program is recorded, or a transmission medium such as the Internet that transmits the program, can also be applied as embodiments of the present invention. The above program, recording medium, and transmission medium fall within the scope of the present invention.

【０１１６】 The above embodiments are merely examples of how the present invention may be embodied, and the technical scope of the present invention should not be construed as limited by them. That is, the present invention can be implemented in various forms without departing from its technical idea or its main features.

【０１１７】 (Supplementary Note 1) A data analysis method comprising: a step of editing original data values and quantitatively evaluating and extracting one or more data distribution feature quantities present in the original data group; a step of selecting an arbitrary data distribution feature quantity from among the extracted data distribution feature quantities and performing analysis; and a step of making a decision based on the obtained analysis result.

【０１１８】 (Supplementary Note 2) A data analysis method comprising: a step of editing original data values and quantitatively evaluating and extracting two or more data distribution feature quantities present in the original data group; a step of sequentially selecting each of the extracted data distribution feature quantities and performing analysis; and a step of making a decision based on the obtained analysis result.

【０１１９】 (Supplementary Note 3) The data analysis method according to Supplementary Note 1 or 2, wherein the data distribution feature quantity is expressed as a continuous value representing the degree of the feature.

【０１２０】 (Supplementary Note 4) The data analysis method according to any one of Supplementary Notes 1 to 3, wherein, for each record, the data distribution feature quantities are independent of one another.

【０１２１】 (Supplementary Note 5) The data analysis method according to any one of Supplementary Notes 1 to 4, wherein the analysis is performed by data mining with the data distribution feature quantity as the objective variable.

【０１２２】 (Supplementary Note 6) The data analysis method according to Supplementary Note 5, wherein each data distribution feature quantity is stored in a file for each record, the same data distribution feature quantity is sequentially selected from the file for some or all of the records and used as the objective variable, and regression tree analysis is performed.

【０１２３】 (Supplementary Note 7) The data analysis method according to any one of Supplementary Notes 1 to 6, wherein the steps are performed automatically by executing, on a computer system, software configured to perform them in sequence.

【０１２４】 (Supplementary Note 8) The data analysis method according to any one of Supplementary Notes 1 to 7, wherein one of the data distribution feature quantities is the value a of the y-axis intercept obtained when, with x as the position in the order of arrangement of the original data and y as the original data value, the relationship between x and y is approximated by the linear expression y = b·x + a.

【０１２５】 (Supplementary Note 9) The data analysis method according to any one of Supplementary Notes 1 to 7, wherein one of the data distribution feature quantities is the value b of the slope obtained when, with x as the position in the order of arrangement of the original data and y as the original data value, the relationship between x and y is approximated by the linear expression y = b·x + a.

【０１２６】 (Supplementary Note 10) The data analysis method according to any one of Supplementary Notes 1 to 7, wherein one of the data distribution feature quantities is the strength of a specific periodicity of the original data values with respect to the order of arrangement of the original data.

【０１２７】 (Supplementary Note 11) The data analysis method according to any one of Supplementary Notes 1 to 7, wherein one of the data distribution feature quantities is a value indicating the strongest periodicity of the original data values with respect to the order of arrangement of the original data.

【０１２８】 (Supplementary Note 12) A data analysis method comprising: (a) a step of preparing data results of explanatory variables and an objective variable; (b) a step of calculating the degree of confounding and/or the degree of independence among a plurality of explanatory variables based on the data results; and (c) a step of performing data mining using the degree of confounding and/or the degree of independence.

【０１２９】 (Supplementary Note 13) The data analysis method according to Supplementary Note 12, wherein step (b) calculates the degree of confounding and/or the degree of independence in units of the sets split into two by regression tree analysis.

【０１３０】 (Supplementary Note 14) The data analysis method according to Supplementary Note 13, wherein step (b) selects a plurality of explanatory variables that cause splits with a large significant difference in the regression tree analysis, and calculates the degree of confounding and/or the degree of independence among the plurality of explanatory variables.

【０１３１】 (Supplementary Note 15) The data analysis method according to Supplementary Note 14, wherein step (b), when calculating the degree of confounding and/or the degree of independence between a reference explanatory variable and the other explanatory variables, calculates it based on the proportion of agreement and disagreement of data between the explanatory variables within each set split into two by regression tree analysis.

[0132] (Supplementary Note 16) The data analysis method according to Supplementary Note 15, wherein in step (c) data mining is performed by selecting or discarding explanatory variables based on the degree of confounding and/or the degree of independence.
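One way to read the selection step in Supplementary Note 16 is as a simple filter over explanatory variables. The sketch below assumes that each variable already has a confounding score in the range 0 to 1 relative to a reference explanatory variable. The threshold value, the function name, and the variable names are hypothetical, and the text does not state what selection rule is actually used.

```python
from typing import Dict, List, Tuple


def split_by_confounding(
    confounding: Dict[str, float], threshold: float = 0.8
) -> Tuple[List[str], List[str]]:
    """Separate explanatory variables into those kept for data mining and
    those set aside because they are too confounded with the reference."""
    kept = [name for name, degree in confounding.items() if degree < threshold]
    set_aside = [name for name, degree in confounding.items() if degree >= threshold]
    return kept, set_aside


# Hypothetical confounding scores against one reference explanatory variable.
scores = {"F_diffusion": 0.75, "second_wiring": 0.95, "2CON": 0.10}
kept, set_aside = split_by_confounding(scores)
print(kept, set_aside)
```

Variables that are set aside are not necessarily irrelevant. They simply cannot be distinguished from the reference variable with the available records, so an engineer may still want to review them.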

[0133] (Supplementary Note 17) A data analysis device comprising: calculation means for calculating a degree of confounding and/or a degree of independence between a plurality of explanatory variables based on data results for explanatory variables and an objective variable; and data mining means for performing data mining using the degree of confounding and/or the degree of independence.

[0134] (Supplementary Note 18) The data analysis device according to Supplementary Note 17, wherein the calculation means calculates the degree of confounding and/or the degree of independence for each of the sets obtained by a two-way split in regression tree analysis.

[0135] (Supplementary Note 19) The data analysis device according to Supplementary Note 18, wherein the calculation means selects, by regression tree analysis, a plurality of explanatory variables responsible for splits with a large significant difference, and calculates the degree of confounding and/or the degree of independence between the selected explanatory variables.

[0136] (Supplementary Note 20) The data analysis device according to Supplementary Note 19, wherein, when calculating the degree of confounding and/or the degree of independence between a reference explanatory variable and each of the other explanatory variables, the calculation means bases the calculation on the proportions of agreement and disagreement between the explanatory variables' data within each of the sets obtained by the two-way split in regression tree analysis.

[0137] (Supplementary Note 21) The data analysis device according to Supplementary Note 20, wherein the data mining means performs data mining by selecting or discarding explanatory variables based on the degree of confounding and/or the degree of independence.

[0138] (Supplementary Note 22) A computer-readable recording medium storing a program for causing a computer to execute: (a) a procedure of preparing data results for explanatory variables and an objective variable; (b) a procedure of calculating a degree of confounding and/or a degree of independence between a plurality of explanatory variables based on the data results; and (c) a procedure of performing data mining using the degree of confounding and/or the degree of independence.

[0139]

[Effects of the Invention] According to the present invention, the various data distribution features present in the original data group accumulated in a computer system are extracted, and each feature quantity is selected in turn and analyzed, so that the factors giving rise to each feature quantity are evaluated and extracted automatically and quantitatively. The data can therefore be viewed from more angles, and more information (trends, characteristic patterns, relationships between data, and so on) can be extracted. As a result, relationships and significant differences that were previously buried in a wide variety of data and hard to discern can be extracted objectively, without depending on the engineer's subjectivity, and also efficiently and quantitatively.
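As a rough illustration of extracting several distribution feature quantities per lot and then taking each one in turn as an analysis target, the Python sketch below computes a mean, a standard deviation, and a periodicity value for one lot. The periodicity measure used here (a lag-2 autocorrelation over wafer order) is an assumption made for the sketch, and the function names and sample values are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def lag_autocorrelation(values: Sequence[float], lag: int) -> float:
    """Autocorrelation of the values at the given lag; 0.0 for a flat series."""
    m = mean(values)
    denominator = sum((v - m) ** 2 for v in values)
    if denominator == 0:
        return 0.0
    numerator = sum(
        (values[i] - m) * (values[i + lag] - m) for i in range(len(values) - lag)
    )
    return numerator / denominator


def lot_features(values_by_wafer: Sequence[float]) -> Dict[str, float]:
    """Distribution feature quantities for one lot, ordered by wafer number."""
    return {
        "mean": mean(values_by_wafer),
        "std": pstdev(values_by_wafer),
        # Assumed stand-in for "periodicity at wafer-number interval 2".
        "period2": lag_autocorrelation(values_by_wafer, 2),
    }


# Hypothetical measurements for one lot that alternate from wafer to wafer.
features = lot_features([1.0, 2.0, 1.0, 2.0, 1.0, 2.0])
for name in features:  # each feature quantity is selected in turn
    print(name, features[name])
```

In this toy lot the periodicity score is large even though the mean alone looks unremarkable, which is the kind of information that averages by themselves would hide.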

[Brief description of drawings]

FIG. 1 is a diagram showing an example of a computer system used in Embodiment 1 of the present invention.

FIG. 2 is a block diagram showing an example of the functional configuration of a data analysis device implemented by the computer system configured as shown in FIG. 1.

FIG. 3 is a table showing, in CSV format, the feature quantities extracted by data distribution feature extraction in Embodiment 1 of the present invention.

FIG. 4 is a table showing information that focuses on variation in wafer attribute values as a feature of the within-lot data distribution when yield analysis of semiconductor data is performed in Embodiment 1 of the present invention.

FIG. 5 is a diagram showing, for a plurality of lots, how the yield values of the wafers within each lot vary.

FIG. 6 is a flowchart outlining an example of the data analysis method according to Embodiment 1 of the present invention.

FIG. 7 is a characteristic diagram showing, as a specific example, the relationship between yield and VT_N2.

FIG. 8 is a diagram showing, as a specific example, a histogram of the VT_N2 data obtained from all wafers.

FIG. 9 is a diagram showing, as a specific example, a box-and-whisker plot of all VT_N2 data displayed by wafer number.

FIG. 10 is a diagram showing, as a specific example, the result of a regression tree analysis in which the objective variable is the mean value of VT_N2 in each lot and the explanatory variables are the names of the devices used in each process.

FIG. 11 is a diagram showing an example of the list of evaluation statistics for the regression tree analysis result shown in FIG. 10.

FIG. 12 is a diagram showing, as a specific example, a box-and-whisker plot of all VT_N2 data displayed by the name of the device used for "second wiring_device".

FIG. 13 is a diagram showing, as a specific example, the result of a regression tree analysis in which the objective variable is the VT_N2 value of each wafer and the explanatory variables are the names of the devices used in each process.

FIG. 14 is a diagram showing an example of the list of evaluation statistics for the regression tree analysis result shown in FIG. 13.

FIG. 15 is a diagram showing, as a specific example, a box-and-whisker plot of all VT_N2 data displayed by the name of the device used for "2CON process_device".

FIG. 16 is a table showing, as a specific example, a file that defines each feature quantity of VT_N2 for each lot.

FIG. 17 is a diagram showing, as a specific example, histograms of the feature quantities of the within-lot distribution of VT_N2.

FIG. 18 is a table showing, as a specific example, a file for performing regression tree analysis on the feature quantities of the within-lot distribution of VT_N2.

FIG. 19 is a table showing, as a specific example, an input file for analyzing the factors of yield variation by regression tree analysis.

FIG. 20 is a diagram showing, as a specific example, the result of a regression tree analysis in which the objective variable is the standard deviation of VT_N2 in each lot and the explanatory variables are the names of the devices used in each process.

FIG. 21 is a diagram showing an example of the list of evaluation statistics for the regression tree analysis result shown in FIG. 20.

FIG. 22 is a diagram showing, as a specific example, a box-and-whisker plot of all VT_N2 data displayed by the name of the device used for "Field_Ox process_device".

FIG. 23 is a diagram showing, as a specific example, a histogram of VT_N2 for all wafers processed with unit PM1 or unit PM3 in the Field_Ox process.

FIG. 24 is a diagram showing, as a specific example, a histogram of VT_N2 for one lot of wafers processed with unit PM1 or unit PM3 in the Field_Ox process.

FIG. 25 is a diagram showing, as a specific example, a histogram of VT_N2 for one lot of wafers processed with unit PM1 or unit PM3 in the Field_Ox process.

FIG. 26 is a diagram showing, as a specific example, a histogram of VT_N2 for one lot of wafers processed with unit PM1 or unit PM3 in the Field_Ox process.

FIG. 27 is a diagram showing, as a specific example, a histogram of VT_N2 for all wafers processed with unit PM2 in the Field_Ox process.

FIG. 28 is a diagram showing, as a specific example, a histogram of VT_N2 for one lot of wafers processed with unit PM2 in the Field_Ox process.

FIG. 29 is a diagram showing, as a specific example, a histogram of VT_N2 for one lot of wafers processed with unit PM2 in the Field_Ox process.

FIG. 30 is a diagram showing, as a specific example, a histogram of VT_N2 for one lot of wafers processed with unit PM2 in the Field_Ox process.

FIG. 31 is a diagram showing, as a specific example, the result of a regression tree analysis in which the objective variable is the periodicity value at a wafer-number interval of 2 and the explanatory variables are the names of the devices used in each process.

FIG. 32 is a diagram showing an example of the list of evaluation statistics for the regression tree analysis result shown in FIG. 31.

FIG. 33 is a diagram showing, as a specific example, a box-and-whisker plot of the periodicity value at a wafer-number interval of 2 displayed by the name of the device used for "F diffusion process_device".

FIG. 34 is a diagram showing, as a specific example, the within-lot variation of VT_N2 for one lot of wafers processed with unit F7 in the F diffusion process.

FIG. 35 is a diagram showing, as a specific example, the within-lot variation of VT_N2 for one lot of wafers processed with unit F7 in the F diffusion process.

FIG. 36 is a diagram showing, as a specific example, the within-lot variation of VT_N2 for one lot of wafers processed with unit F7 in the F diffusion process.

FIG. 37 is a diagram showing, as a specific example, the within-lot variation of VT_N2 for one lot of wafers processed with unit F5, F6, F8, or F9 in the F diffusion process.

FIG. 38 is a diagram showing, as a specific example, the within-lot variation of VT_N2 for one lot of wafers processed with unit F5, F6, F8, or F9 in the F diffusion process.

FIG. 39 is a diagram showing, as a specific example, the within-lot variation of VT_N2 for one lot of wafers processed with unit F5, F6, F8, or F9 in the F diffusion process.

FIG. 40 is a diagram showing the relationship between the flow of lots and an abnormal manufacturing device.

FIG. 41 is a diagram showing the yield distribution by device in a certain process according to the prior art.

FIG. 42 is a diagram showing the relationship between the flow of lots and the confounding of abnormal manufacturing devices.

FIG. 43 is a diagram showing an example of regression tree analysis input data.

FIG. 44 is a diagram showing an example of a regression tree.

FIG. 45 is a diagram showing an example of a list of evaluation statistics.

FIG. 46 is a diagram showing the relationship among the manufacturing devices used, the electrical characteristic data, and the yield values.

FIG. 47 is a diagram showing a calculation example of the two-division confounding degree and the two-division independence degree.

FIG. 48 is a diagram showing an example of a regression tree.

FIG. 49 is a diagram showing an example of a list of evaluation statistics.

FIG. 50 is a diagram showing the degree of confounding and the degree of independence between each explanatory variable and the first-candidate explanatory variable.

FIG. 51 is a diagram showing the degree of confounding and the degree of independence between each explanatory variable and the third-candidate explanatory variable.

FIG. 52 is a diagram showing the degree of confounding between all pairs of candidates and the averages thereof.

FIG. 53 is a diagram showing the degree of independence between all pairs of candidates and the averages thereof.

FIG. 54 is a regression tree diagram showing the result of a regression tree analysis on a group of defective wafers.

FIG. 55 is a regression tree diagram showing the result of a regression tree analysis on a group of good wafers.

FIG. 56 is a diagram showing an example of the functional configuration of a data analysis device.

[Explanation of symbols]

21 Means for extracting data distribution features
22 Means for selecting the feature quantity to be analyzed
23 Means for performing data mining
24 Rule file
25 Statistical analysis component
26 Chart creation component
27 Analysis tool group
41 Database
42 Original data group
101 Normal device
102 Abnormal device
401 Explanatory variable
402 Objective variable
411 Device used
412 Electrical characteristic data
413 Yield
801 Reference explanatory variable
802 Comparison explanatory variable
803 Wafer number
804 Yield
811 High-yield group of the reference explanatory variable
812 Low-yield group of the reference explanatory variable
813 Formula for calculating the two-division confounding degree
814 Two-division confounding degree
815 Two-division independence degree
1701 Original data group
1702 Database
1703 Data mining unit
1704 Rule file
1705 Analysis tool group
1706 Statistical analysis component
1707 Chart creation component
1708 Decision-making unit

Continuation of front page: (72) Inventor: Eidai Shirai (白井英大), c/o Fujitsu LSI Technology Ltd., 3-2-1 Sakado, Takatsu-ku, Kawasaki-shi, Kanagawa. F-term (reference): 5B056 BB61

Claims


1. A data analysis method comprising: a step of editing original data values to quantitatively evaluate and extract one or more data distribution feature quantities present in an original data group; a step of selecting an arbitrary data distribution feature quantity from the extracted data distribution feature quantities and performing an analysis; and a step of making a decision based on the obtained analysis result.

2. A data analysis method comprising: a step of editing original data values to quantitatively evaluate and extract two or more data distribution feature quantities present in an original data group; a step of sequentially selecting each of the extracted data distribution feature quantities and performing an analysis; and a step of making a decision based on the obtained analysis results.

3. The data analysis method according to claim 1, wherein the data distribution feature quantity is represented by a continuous value indicating the degree of the feature.

4. The data analysis method according to any one of claims 1 to 3, wherein the respective data distribution feature quantities for each record are independent of each other.

5. The data analysis method according to claim 1, wherein analysis is performed by data mining using the data distribution feature quantity as an objective variable.

6. The data analysis method according to claim 5, wherein the individual data distribution feature quantities are stored in a file for each record, and regression tree analysis is performed by sequentially selecting from the file, as the objective variable, the same data distribution feature quantity for some or all of the records.

7. The data analysis method according to any one of claims 1 to 6, wherein each of the steps is performed automatically by executing, on a computer system, software configured to carry out the steps in sequence.
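As an informal illustration of claims 5 and 6, the Python sketch below takes one data distribution feature quantity per record as the objective variable and searches for the single two-way split that most reduces its sum of squared deviations. This is a simplified stand-in for a full regression tree analysis: it tests only "one device versus all others" splits and stops after the first split. The function names and the sample values are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple, Union

Record = Dict[str, Union[str, float]]


def sum_of_squares(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (0.0 for an empty group)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_first_split(
    records: List[Record], objective: str, explanatory: Sequence[str]
) -> Optional[Tuple[str, str, float]]:
    """Return (variable, device, reduction in sum of squares) for the best
    'device versus the rest' split, or None if no split is possible."""
    total = sum_of_squares([float(r[objective]) for r in records])
    best: Optional[Tuple[str, str, float]] = None
    for variable in explanatory:
        for device in sorted({str(r[variable]) for r in records}):
            inside = [float(r[objective]) for r in records if r[variable] == device]
            outside = [float(r[objective]) for r in records if r[variable] != device]
            if not inside or not outside:
                continue
            gain = total - sum_of_squares(inside) - sum_of_squares(outside)
            if best is None or gain > best[2]:
                best = (variable, device, gain)
    return best


# Hypothetical per-lot records: device used in two processes plus one feature quantity.
lots: List[Record] = [
    {"Field_Ox": "PM1", "F_diffusion": "F5", "std": 0.10},
    {"Field_Ox": "PM1", "F_diffusion": "F7", "std": 0.12},
    {"Field_Ox": "PM2", "F_diffusion": "F5", "std": 0.30},
    {"Field_Ox": "PM2", "F_diffusion": "F7", "std": 0.32},
]
print(best_first_split(lots, "std", ["Field_Ox", "F_diffusion"]))
```

Running the same search once per feature quantity (mean, standard deviation, periodicity, and so on) corresponds to the sequential selection of the objective variable described in claim 6.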