JP2005532671A

JP2005532671A - Manufacturing data analysis method and apparatus

Info

Publication number: JP2005532671A
Application number: JP2003517801A
Authority: JP
Inventors: Shawn B. Smith; Brian P. Grigsby; Hung J. Pham; Tony L. Davis; Manjunath S. Yedatore; William R. Clements III
Original assignee: Applied Materials Inc
Current assignee: Applied Materials Inc
Priority date: 2001-07-30
Filing date: 2002-07-29
Publication date: 2005-10-27
Anticipated expiration: 2022-07-29
Also published as: JP4446231B2

Abstract

A method of data mining information obtained at an integrated circuit manufacturing factory ("factory") comprises: (a) collecting data from one or more systems, tools, and databases that generate data in the factory or collect data from the factory; (b) formatting the data and storing the formatted data in a source database; (c) extracting portions of the data for use in data mining in accordance with a user-specified configuration file; (d) data mining the extracted portions of the data in response to a user-specified analysis configuration file; (e) storing the results of the data mining in a results database; and (f) providing access to the results.

Description

One or more embodiments of the present invention relate to a method and apparatus for analyzing information obtained in, for example and without limitation, an integrated circuit ("IC") manufacturing or assembly plant (hereinafter a "semiconductor manufacturing plant" or "factory").

FIG. 1 shows the yield analysis tool infrastructure that exists in a prior-art integrated circuit (hereinafter "IC") manufacturing or assembly plant (hereinafter a "semiconductor manufacturing plant" or "factory"). As shown in FIG. 1, mask shop 1000 produces reticles. As further shown in FIG. 1, work-in-progress tracking system 1020 (hereinafter "WIP tracking system 1020") tracks wafers as they progress through the various processing steps used in the factory to manufacture (and test) ICs on wafers or substrates. In this specification, the terms "wafer" and "substrate" are used interchangeably and refer to all types of semiconductor wafers or substrates, including, for example and without limitation, glass substrates. WIP tracking system 1020 tracks wafers through, for example and without limitation, implant tool 1030, diffusion/oxidation/deposition tool 1040, chemical mechanical planarization tool 1050 (hereinafter "CMP tool 1050"), resist coating tool 1060 (for example and without limitation, a tool for coating photoresist), stepper tool 1070, developer tool 1080, etch/clean tool 1090, laser test tool 1100, parametric test tool 1110, wafer sort tool 1120, and final test tool 1130. These tools represent most of the tools used in a factory that produces ICs. They are, however, merely examples and do not limit the present invention.
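The WIP tracking role described above can be pictured as a per-wafer event log recording which tool ran which step, and when. The following is a minimal sketch under stated assumptions: the `WipEvent` and `WipTracker` names, the step names, and the tool identifiers are all hypothetical, and a production WIP system would persist events in a database rather than in memory.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class WipEvent:
    """One processing step for one wafer (hypothetical record layout)."""
    wafer_id: str
    step: str          # process step, e.g. "implant" or "cmp"
    tool_id: str       # hypothetical tool identifier
    timestamp: float   # seconds since an arbitrary epoch


class WipTracker:
    """In-memory per-wafer event log; a real system would persist this."""

    def __init__(self) -> None:
        self._events = defaultdict(list)

    def record(self, event: WipEvent) -> None:
        self._events[event.wafer_id].append(event)

    def history(self, wafer_id: str) -> List[WipEvent]:
        """Return the wafer's events in processing (timestamp) order."""
        return sorted(self._events.get(wafer_id, []), key=lambda e: e.timestamp)

    def tool_for_step(self, wafer_id: str, step: str) -> Optional[str]:
        """Return the tool that first ran `step` on the wafer, or None."""
        for event in self.history(wafer_id):
            if event.step == step:
                return event.tool_id
        return None


tracker = WipTracker()
tracker.record(WipEvent("W01", "cmp", "CMP-B", 20.0))
tracker.record(WipEvent("W01", "implant", "IMP-A", 10.0))
print([e.step for e in tracker.history("W01")])  # ['implant', 'cmp']
```

Keeping the tool identity per step is what later makes tool-level correlation (which tool processed a bad wafer) possible.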

As further shown in FIG. 1, the factory includes numerous systems for obtaining tool-level measurements and for automating various processes. For example, as shown in FIG. 1, the tool-level measurement and automation systems include tool database 1210, which enables tool-level measurement and automation tasks such as processing tool management (e.g., process recipe management) and the collection and analysis of tool sensor measurement data. For example, and purely by way of illustration, PC server 1230 downloads process recipe data to the tools (through recipe module 1233), receives tool sensor measurement data from the tool sensors (through sensor module 1235), and stores the process recipe data and the tool sensor measurement data in, for example, tool database 1210.
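A minimal sketch of the recipe-download and sensor-collection flow just described, assuming an in-memory store. `ToolDatabase`, `PcServer`, and their methods are hypothetical stand-ins for tool database 1210, PC server 1230, and modules 1233/1235; they are not an actual interface of those systems.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToolDatabase:
    """Hypothetical in-memory stand-in for a tool database such as 1210."""
    # (tool_id, recipe_name) -> recipe parameters
    recipes: Dict[Tuple[str, str], dict] = field(default_factory=dict)
    # (tool_id, wafer_id, sensor_name, time_s, value) rows
    readings: List[tuple] = field(default_factory=list)


class PcServer:
    """Hypothetical sketch of the PC server role: push recipes, pull sensor data."""

    def __init__(self, db: ToolDatabase) -> None:
        self.db = db
        self.active_recipe: Dict[str, str] = {}  # tool_id -> last recipe downloaded

    def download_recipe(self, tool_id: str, name: str, params: dict) -> None:
        # Stands in for the recipe-module path: store the recipe and mark it active.
        self.db.recipes[(tool_id, name)] = dict(params)
        self.active_recipe[tool_id] = name

    def receive_sensor_data(self, tool_id: str, wafer_id: str, samples) -> int:
        # Stands in for the sensor-module path; samples are (sensor, time_s, value).
        rows = [(tool_id, wafer_id, s, t, v) for s, t, v in samples]
        self.db.readings.extend(rows)
        return len(rows)
```

Storing recipes and sensor readings side by side, keyed by tool and wafer, is what allows sensor behavior to be related back to the recipe that was running.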

As further shown in FIG. 1, the factory includes numerous process measurement tools, for example, defect measurement tools 1260 and 1261, reticle defect measurement tool 1265, overlay defect measurement tool 1267, defect review tool 1270 (hereinafter "DRT 1270"), critical dimension measurement tool 1280 (hereinafter "CD measurement tool 1280"), and voltage contrast measurement tool 1290. These process measurement tools are driven by process evaluation tool 1300.

As further shown in FIG. 1, application-specific analysis tools drive certain process measurement tools. For example, defect manager tool 1310 analyzes data generated by defect measurement tools 1260 and 1261; reticle analysis tool 1320 analyzes data generated by reticle defect measurement tool 1265; CD analysis tool 1340 analyzes data generated by CD measurement tool 1280; and testware tool 1350 analyzes data generated by laser test tool 1100, parametric test tool 1110, wafer sort tool 1120, and final test tool 1130.

As further shown in FIG. 1, database tracking/correlation tools obtain data from one or more of the application-specific analysis tools over a communications network. For example, statistical analysis tool 1400 obtains data from, for example, defect manager tool 1310, CD analysis tool 1340, and testware tool 1350, and stores the data in relational database 1410.

Finally, yield management methodologies are applied to the data stored in data extraction database 1420. This data is extracted from WIP tracking system 1020 and tool database 1210 over a communications network.

In the prior art, the yield management systems used in factories suffer from many problems. FIG. 2 shows a prior-art process used in factories, hereinafter referred to as end-of-line monitoring. End-of-line monitoring is a process that uses a "trailing indicator" feedback loop. For example, as shown by box 2000 of FIG. 2, a trailing indicator such as, for example and without limitation, low yield, low quality, and/or low device speed is identified. Next, in box 2010, the "bad lot" metrics (i.e., the measurements associated with the wafer lot that produced the trailing indicator) are compared against the metric specifications (hereinafter "specs"). If the metrics are "out of spec," the process proceeds to box 2030, where action is taken on the "out of spec" event and feedback for correcting the "out of spec" condition is provided to the process control engineer. If, on the other hand, the metrics are "in spec," the process proceeds to box 2020, where the plant's historical knowledge of the failure is analyzed. If the problem has already been identified, the process proceeds to box 2040; if it has not (i.e., no prior knowledge exists), the process proceeds to box 2050. In box 2040, action is taken on the lot or tool comments associated with the previously identified problem, and feedback is provided to the process control engineer to take the same type of action as was taken before. In box 2050, it is determined whether the tool or device processing history data correlates with the failure. If a correlation is found, the process proceeds to box 2060; if not, it proceeds to box 2070. In box 2060, the "bad" tool or device process is "fixed," and feedback is provided to the process control engineer. In box 2070, a factory maintenance job is performed.
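The decision logic of FIG. 2 can be sketched as a single function. This is an illustrative reading of the flowchart, assuming the spec is a simple [low, high] range and that the plant-knowledge lookup (box 2020) and the history correlation check (box 2050) have already been reduced to inputs; the function name and parameters are hypothetical.

```python
from typing import Optional, Tuple


def end_of_line_triage(metric: float, spec_low: float, spec_high: float,
                       known_issue: Optional[str],
                       history_correlates: bool) -> Tuple[int, str]:
    """Walk one lot flagged in box 2000 through the FIG. 2 decision boxes.

    Returns (box number reached, action taken there).
    """
    # Box 2010: compare the "bad lot" metric against its spec.
    if not spec_low <= metric <= spec_high:
        return 2030, "act on out-of-spec event; feed back to process control engineer"
    # Box 2020: in spec, so consult plant knowledge of past failures.
    if known_issue is not None:
        return 2040, f"repeat prior action for known issue: {known_issue}"
    # Box 2050: no prior knowledge; does tool/device history correlate with the failure?
    if history_correlates:
        return 2060, "fix the bad tool or device process; feed back to engineer"
    return 2070, "perform factory maintenance job"


print(end_of_line_triage(2.0, 0.0, 4.0, None, True))
```

Writing the loop out this way makes its weakness visible: every branch starts only after a trailing indicator has already appeared.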

Several problems are associated with the end-of-line monitoring process described above. For example: (a) low yield is often caused by several problems at once; (b) "spec" limits are often reached for reasons that have not been theoretically established; (c) knowledge of past product failure history is often not documented, or, even when it is, the documentation is not widely distributed; (d) data and data access are fragmented; and (e) working hypotheses must be generated before a correlation analysis can be performed, the number of possible correlations is extremely large, and the resources available for performing correlation analyses are limited.

For example, a typical engineering process for data feedback and problem fixing includes the following steps: (a) define the problem (typically about 1 day); (b) select key analysis variables, such as yield percentage or defect percentage (about 1 day); (c) form hypotheses about anomalies in the selected key analysis variables (about 1 day); (d) rank the hypotheses using various "gut-feel" methods (about 1 day); (e) develop an experimental strategy and test plan (about 1 day); (f) run the experiments and collect data (about 15 days); (g) fit a model (about 1 day); (h) diagnose the model (about 1 day); (i) interpret the model (about 1 day); and (j) run a confirmation test to verify the improvement (about 20 days), or, if no improvement is seen, run the next experiment starting again from step (c) (typically five iterations in all). As a result, the typical time to fix a single problem is about 7 months.
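The quoted figure of about 7 months can be reproduced with simple arithmetic, assuming that each of the five iterations repeats steps (c) through (j), including the 20-day confirmation test, and assuming 30 days per month. Both assumptions are a reading of the text rather than something it states explicitly.

```python
# Step durations in days, as listed above for the prior-art process.
SETUP_DAYS = {"a_define_problem": 1, "b_select_variables": 1}
ITERATION_DAYS = {
    "c_form_hypotheses": 1, "d_rank_hypotheses": 1, "e_plan_experiment": 1,
    "f_run_experiment": 15, "g_fit_model": 1, "h_diagnose_model": 1,
    "i_interpret_model": 1, "j_confirmation_test": 20,
}
ITERATIONS = 5        # assumption: each pass repeats (c) through (j)
DAYS_PER_MONTH = 30   # assumption: rough month length

per_iteration = sum(ITERATION_DAYS.values())                 # 41 days
total_days = sum(SETUP_DAYS.values()) + ITERATIONS * per_iteration
print(total_days, round(total_days / DAYS_PER_MONTH, 1))     # 207 6.9
```

The experiment and confirmation steps (35 of every 41 days per pass) dominate, which is why cutting the number of iterations matters more than speeding up the one-day steps.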

As line widths shrink and newer technologies and materials (e.g., copper metallization and new low-k dielectric films) are used to manufacture ICs, reducing defects (both process-induced defects and contamination) becomes increasingly critical. The time it takes to root-cause a defect is the key to eliminating it. These problems become no easier with the move to 300 mm wafers. Thus, with many of these factors converging at once, yield ramping is becoming a major obstacle.

In addition to the problems described above, semiconductor factories face a further problem: they invest large amounts of capital in defect detection equipment and defect data management software as part of their effort to monitor defects and keep reducing defect density. The current state of the art in defect data management software includes developing one or more of the following deliverables: (a) defect trends (e.g., Paretos by defect type and size); (b) wafer-level defect-versus-yield charts; and (c) ad hoc and manual kill ratios by defect type and size. The main flaw of each of these deliverables is that the user must know in advance what he or she wants to plot. Because the volume of data is so large, however, the probability that the user will find the root cause is low. Furthermore, even if a chart were generated for every variable, the number of charts would be so enormous that it would be practically impossible for the user to analyze them one by one.
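For reference, deliverables (a) and (c) above can be sketched as follows. The size binning and the kill-ratio definition used here (1 minus the ratio of the yield of dice with a defect to the yield of defect-free dice) are assumptions; kill ratio is defined in more than one way in practice, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def defect_pareto(defects: Iterable[Tuple[str, str]]) -> List[tuple]:
    """Defect counts by (type, size bin), largest first, with cumulative percent."""
    counts = Counter(defects)
    total = sum(counts.values())
    if total == 0:
        return []
    rows, running = [], 0
    for key, n in counts.most_common():
        running += n
        rows.append((key, n, 100.0 * running / total))
    return rows


def kill_ratio(yield_defective_dice: float, yield_clean_dice: float) -> float:
    """One common definition: 1 - Y(dice with the defect) / Y(defect-free dice)."""
    return 1.0 - yield_defective_dice / yield_clean_dice


defects = [("particle", "S"), ("scratch", "L"), ("particle", "S"),
           ("residue", "M"), ("particle", "S"), ("scratch", "L")]
for (dtype, size), n, cum in defect_pareto(defects):
    print(f"{dtype:9s}{size:>2s}{n:>3d}{cum:7.1f}%")
print(kill_ratio(0.45, 0.90))
```

Each call answers only the question it was asked (one grouping, one defect class), which illustrates the point above: the user must already know what to plot.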

In addition to the problems described above, most of the data used in a semiconductor factory is "indirect metrology data." "Indirect metrology" in this context refers to data collected through indirect metrics that are assumed to be related, in a predictable way, to the manufacturing process in the factory. For example, after metal lines are patterned on an IC, a critical dimension scanning electron microscope ("CD-SEM") can be used to measure the width of the metal lines at various locations on a given set of wafers. A business value can be assigned to the metrology infrastructure in a semiconductor factory; that value depends on how quickly metrology measurements can be turned into actionable information that stops a process that is "going bad" in the factory. In practice, however, indirect measurements are associated with an enormous number of potential problems, and they often lack an explicit relationship that points to an "actionable" factory processing tool or processing tool state. Because there is no explicit relationship between the processing tool set and most indirect measurements in a semiconductor factory, a considerable investment in engineering staffing infrastructure is required, and considerable "scrap" material costs result from the unpredictable time needed to establish causal relationships in the data.

In addition to metrology, substantial capital has been invested over the last few years in developing data extraction systems designed to record the operating state of semiconductor wafer processing tools while wafers are being processed. Although time-based process tool data is now available for at least some of the processing tools in at least some factories, the use of this data to optimize processing tool performance for the ICs being produced has been limited. The reason is a disconnect between how processing tool time data is represented and how IC performance data is represented. For example, a data measurement for an IC is invariably associated with a given batch of wafers (hereinafter a "lot"), a given wafer, or a given subset of the ICs on a wafer. Data measurements from processing tool time data, on the other hand, are represented as discrete operating states within the processing tool at specific points in time during wafer processing. For example, if a processing tool has an isolated processing chamber, the chamber pressure is recorded every millisecond while a given wafer remains in that chamber. In this example, the chamber pressure data for any wafer is recorded as a series of 1000 unique measurements. Because IC data metrics are single discrete measurements, this data format cannot be "merged" into an analysis table with a given IC data metric. The difficulty of "merging" processing tool time data with discrete data metrics limits the use of processing tool time data as a means of optimizing factory efficiency.
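One common way to bridge this gap, offered here only as an illustrative assumption and not necessarily the approach the present invention takes, is to collapse each wafer's time trace into a few per-wafer summary statistics, which can then be joined to per-wafer metrics. The function names and the choice of mean, maximum, and population standard deviation as features are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def summarize_trace(samples: List[float]) -> Dict[str, float]:
    """Collapse one wafer's time-based sensor trace into discrete features."""
    return {"mean": mean(samples), "max": max(samples), "std": pstdev(samples)}


def merge_trace_with_metrics(traces: Dict[str, List[float]],
                             wafer_metrics: Dict[str, dict]) -> List[dict]:
    """Inner-join per-wafer trace summaries with per-wafer IC metrics.

    traces: {wafer_id: [chamber pressure samples, ...]}
    wafer_metrics: {wafer_id: {"yield": ...}}
    Returns one flat row per wafer present in both inputs.
    """
    rows = []
    for wafer_id in sorted(traces.keys() & wafer_metrics.keys()):
        features = summarize_trace(traces[wafer_id])
        row = {"wafer_id": wafer_id}
        row.update({f"pressure_{k}": v for k, v in features.items()})
        row.update(wafer_metrics[wafer_id])
        rows.append(row)
    return rows


traces = {"W1": [10.0, 12.0, 14.0], "W2": [10.0, 10.0, 10.0], "W3": [9.0]}
metrics = {"W1": {"yield": 0.82}, "W2": {"yield": 0.95}}
for row in merge_trace_with_metrics(traces, metrics):
    print(row)
```

The trade-off is that any summary discards information (for example, the timing of a brief pressure spike), so the choice of features determines which tool behaviors can later be correlated with yield.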

In addition to the problems described above, another problem involves the use of relational databases to store the data generated in a factory. It arises because relational databases, such as, for example and without limitation, ORACLE and SQL Server, must organize and reference data according to defined or assigned relationships between data elements. To use them, users of these relational database technologies (e.g., programmers) supply a schema that predefines how each data element relates to the other data elements. After the database is created, its application users query the information it contains on the basis of those pre-established relationships. Prior-art relational databases have two inherent problems that cause difficulty when they are used in a factory. The first is that the user (e.g., the programmer) must be thoroughly familiar with the data before creating a specific schema (i.e., the relationships and database tables) for the data being modeled. The schema essentially implements controls that safeguard the relationships between data elements. Both the software that places data into the database and the application software that retrieves data from it must use the schema relationship between any two data elements in the database. The second is that, although relational databases have excellent TPS (transaction processing) ratings for small data transactions (e.g., banking or airline ticket sales), they perform poorly when generating the large data sets needed to support decision support systems such as data warehousing and data mining, which are required in particular for improving yield in a factory.
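The first problem can be illustrated with Python's built-in `sqlite3` module, used here as a stand-in for a relational database such as ORACLE or SQL Server. The table and column names are hypothetical; the point is only that data and queries must follow a schema fixed in advance, and that a new kind of measurement cannot be stored until the schema itself is altered.

```python
import sqlite3

# A fixed schema must be designed before any data is loaded (hypothetical tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lot   (lot_id TEXT PRIMARY KEY, product TEXT);
    CREATE TABLE wafer (wafer_id TEXT PRIMARY KEY,
                        lot_id   TEXT REFERENCES lot(lot_id),
                        yield    REAL);
""")
conn.execute("INSERT INTO lot VALUES ('L1', 'P100')")
conn.executemany("INSERT INTO wafer VALUES (?, ?, ?)",
                 [("W1", "L1", 0.91), ("W2", "L1", 0.87)])

# Queries rely on the relationship the schema encodes (wafer.lot_id -> lot.lot_id).
rows = conn.execute("""
    SELECT l.product, COUNT(*), AVG(w.yield)
    FROM wafer w JOIN lot l ON w.lot_id = l.lot_id
    GROUP BY l.product
""").fetchall()
print(rows)

# A new kind of measurement (e.g. a per-wafer CD reading) has nowhere to go...
try:
    conn.execute("INSERT INTO wafer (wafer_id, lot_id, yield, cd_nm) "
                 "VALUES ('W3', 'L1', 0.90, 45.2)")
except sqlite3.OperationalError as exc:
    print("schema change required:", exc)

# ...until the schema itself is changed.
conn.execute("ALTER TABLE wafer ADD COLUMN cd_nm REAL")
```

In a factory that adds new sensors and metrology steps frequently, each such change ripples into both the loading software and every application that queries the tables.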

In addition to the problems described above, problems result from the prior-art data analysis algorithms used in the semiconductor manufacturing industry to quantify production yield problems. These algorithms include linear regression analysis and the manual application of decision tree data mining methods. They have two basic problems: (a) a given data set almost always contains more than one yield-impacting problem, yet these algorithms are best suited to finding "one" answer rather than quantifying the separate set of yield-impacting problems in a given factory; and (b) they cannot fully automate the analysis "handoff," because linear regression requires the variable categories to be prepared and defined manually before the analysis, and decision tree data mining requires a "human user" to define the target variable of the analysis as well as various parameters for the analysis itself.

In addition to the problems described above, another problem arises from data mining very large data sets. For example, in the prior art, a very large data set can be data mined only after some level of domain knowledge (i.e., knowledge of which fields in a data stream represent "interesting" information) has been used to filter the data set so as to reduce the size and number of variables in the data being analyzed. Once the reduced data set has been generated, it is mined against known analysis techniques/models, with experts defining a value system (i.e., defining what is important) and surmising the "good questions" that should drive the analysis system. To make this approach effective, the tools are typically configured manually and operated by the people who will ultimately evaluate the results. These are usually the same people who are responsible for the process being evaluated, because collecting the data and formulating the right questions for mining the data set require industry expertise, or more precisely, knowledge of the specific process. Making these industry experts responsible for the required data mining and correlation tasks uses their time inefficiently, and because the data mining process is driven largely by manual intervention, the results obtained are inconsistent from one process to the next. Ultimately, even when the effort succeeds, most of the "benefit" is lost or diminished. For example, manually processing and analyzing the data is time-consuming and expensive in labor-hours and equipment, and if results are not obtained quickly enough, there is not enough time left to implement the changes that were discovered.

In addition to the problems described above, there is another problem. A significant part of yield improvement and factory efficiency monitoring efforts has been concentrated on correlating end-of-line functional test data, in-line parametric data, and in-line metrology data with the specific factory processing tools used to manufacture the ICs. Performing these correlations requires determining the relationship between every column of factory processing tool data (with the processing tool data presented as categorical attributes) and a specified numeric column of data. A good correlation is defined as one in which a particular category in a processing tool column (i.e., a categorical column) correlates with an undesirable range of values in the selected numeric column (called the dependent variable, or "DV"). The purpose of such an analysis is to identify the category (e.g., a factory processing tool) suspected of causing the undesirable DV readings and to remove it from the factory process flow until engineers can confirm that the tool is operating correctly. Because semiconductor factory databases contain an enormous number of tools and "tool-like" categorical data, it is difficult to isolate a bad processing tool using manual spreadsheet search techniques (called "commonality studies"). Despite this limitation, techniques exist in the semiconductor industry for detecting bad processing tools or categorical process data. One example is performing a lot commonality analysis. This technique, however, requires prior knowledge of the particular process layer and can be time-consuming if the user does not fully understand the nature of the failure. Another technique is to use advanced data mining algorithms such as neural networks or decision trees. Although these techniques are effective, they are difficult to set up because data mining requires extensive domain expertise. Furthermore, these data mining algorithms are known to be slow because of the large algorithmic overhead that such comprehensive data analysis techniques require. With the analysis techniques described above, users typically spend more time trying to identify the problem through basic or complex analysis than they spend actually fixing the bad processing tool once it has been found.
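A minimal lot commonality scan of the kind described above might look like the following sketch. It assumes records are dicts with categorical tool columns and a numeric DV, and that "undesirable" means a DV below a fixed threshold; the function name, column names, and threshold are hypothetical, and a real analysis would also test statistical significance rather than rank raw rates.

```python
from collections import defaultdict
from typing import Dict, List


def commonality_scan(records: List[Dict], tool_columns: List[str],
                     dv: str, bad_threshold: float) -> List[tuple]:
    """Rank (tool column, tool) pairs by how often they co-occur with bad DV values.

    Returns (column, tool, bad_rate, overall_bad_rate, n), worst bad_rate first.
    """
    is_bad = [r[dv] < bad_threshold for r in records]
    overall = sum(is_bad) / len(records)
    results = []
    for col in tool_columns:
        groups = defaultdict(list)          # tool id -> list of bad/good flags
        for record, bad in zip(records, is_bad):
            groups[record[col]].append(bad)
        for tool, flags in groups.items():
            results.append((col, tool, sum(flags) / len(flags), overall, len(flags)))
    return sorted(results, key=lambda t: t[2], reverse=True)


lots = [
    {"etch": "E1", "cmp": "C1", "yield": 0.92},
    {"etch": "E2", "cmp": "C1", "yield": 0.61},
    {"etch": "E2", "cmp": "C2", "yield": 0.58},
    {"etch": "E1", "cmp": "C2", "yield": 0.90},
]
for row in commonality_scan(lots, ["etch", "cmp"], "yield", 0.80):
    print(row)
```

Here every low-yield lot passed through etch tool E2, so it rises to the top, while the CMP tools split evenly and sit at the overall bad rate. With thousands of tool columns, this kind of exhaustive scan is what a manual spreadsheet study cannot keep up with.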

Finally, in addition to the problems described above, there is yet another problem. When searching for correlations in large data sets, data mining algorithms such as neural networks, rule induction searches, and decision trees are often more desirable than ordinary linear statistics. However, these algorithms have several limitations when they are used to analyze large data sets on low-cost hardware platforms such as Windows 2000 servers. Chief among them are the random access memory usage and heavy CPU load these techniques require. Neural network analysis of a large semiconductor manufacturing data set (e.g., larger than 40 MB) often takes several hours or more and can even exceed the 2 GB RAM limit of the Windows 2000 operating system. Furthermore, even rule induction or decision tree analysis of such large data sets (which does not necessarily exceed the RAM limit for a single Windows process) can take several hours to complete.

There is a need in the art to overcome one or more of the problems described above.

One or more embodiments of the present invention advantageously satisfy the above-described needs in the art. In particular, one embodiment of the present invention is a method of data mining information obtained at an integrated circuit manufacturing factory ("factory"). The method comprises: (a) collecting data from the factory through one or more systems, tools, and databases that generate data in the factory or collect data from the factory; (b) formatting the data and storing the formatted data in a source database; (c) extracting portions of the data for use in data mining in accordance with a user-specified configuration file; (d) data mining the extracted portions of the data in accordance with a user-specified configuration file; (e) storing the results of the data mining in a results database; and (f) providing access to the results.
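Steps (a) through (f) can be sketched as a small pipeline. Everything here is hypothetical: the configuration files are shown as Python dicts, in-memory lists stand in for the source and results databases, and the mining step is a toy grouped mean rather than the neural network, rule induction, or multivariate techniques contemplated by the invention.

```python
from statistics import mean

# Hypothetical user-specified configuration files, shown as dicts for brevity.
EXTRACT_CONFIG = {"fields": ["etch_tool", "yield"]}
ANALYSIS_CONFIG = {"target": "yield", "group_by": "etch_tool"}


def collect(sources):
    """(a) Gather raw records from every factory source (each a list of dicts)."""
    return [record for source in sources for record in source]


def format_records(raw):
    """(b) Normalize field names to lowercase and yields to float."""
    formatted = []
    for record in raw:
        row = {key.lower(): value for key, value in record.items()}
        row["yield"] = float(row["yield"])
        formatted.append(row)
    return formatted


def extract(source_db, config):
    """(c) Keep only the fields named in the extraction configuration."""
    return [{name: row[name] for name in config["fields"]} for row in source_db]


def mine(rows, config):
    """(d) Toy mining step: mean target per group, lowest (worst) first."""
    groups = {}
    for row in rows:
        groups.setdefault(row[config["group_by"]], []).append(row[config["target"]])
    return sorted(((g, mean(v)) for g, v in groups.items()), key=lambda kv: kv[1])


def run_pipeline(sources):
    source_db = format_records(collect(sources))                           # (a), (b)
    extracted = extract(source_db, EXTRACT_CONFIG)                         # (c)
    results_db = {"yield_by_etch_tool": mine(extracted, ANALYSIS_CONFIG)}  # (d), (e)
    return results_db                                                      # (f)


wip_feed = [{"Wafer_ID": "W1", "Etch_Tool": "E1", "Yield": "0.93"},
            {"Wafer_ID": "W2", "Etch_Tool": "E2", "Yield": "0.64"}]
test_feed = [{"wafer_id": "W3", "etch_tool": "E2", "yield": 0.70}]
print(run_pipeline([wip_feed, test_feed]))
```

Driving (c) and (d) from configuration rather than code is what lets the same pipeline be rerun unattended on new data, which is the point of steps (c) and (d) being user-specified.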

One or more embodiments of the present invention can improve yield by providing one or more of the following:
(a) an integrated circuit ("IC") manufacturing factory ("semiconductor factory" or "factory") data feed, established through multi-format data file streaming;
(b) a database that indexes, for example, 10,000 measurements in, for example but not limited to, an Oracle file system (such as, but not limited to, a hybrid database);
(c) a decision analysis data feed with rapid export of multiple data sets for analysis;
(d) unassisted analysis automation, with automatically generated queries posed using a "data value system";
(e) multiple data mining techniques such as, but not limited to, neural networks, rule induction, and multivariate statistics;
(f) a visualization tool with multiple follow-on statistics for classifying findings; and
(g) an application service provider ("ASP") for an end-to-end web delivery system that provides rapid deployment.

Using one or more of these embodiments, the typical engineering process for data feedback and problem fixing will include the following steps:
(a) automated problem definition (typical time: about 0 days);
(b) monitoring of all key analysis variables, such as yield percentage and defect percentage (about 0 days);
(c) forming hypotheses about all key analysis variable anomalies (about 0 days);
(d) ranking the hypotheses using statistical confidence levels and fixability criteria (about 1 day). The fixability criteria are supplied instructions, for example in a configuration file that may be based on experience, that specify how to score or rate the hypotheses. They may include, for example but not limited to, weightings for some artificial intelligence. Note that fixability criteria for categorical data such as tool data differ from those for numerical data such as probe data;
(e) developing an experimental strategy and test plan (about 1 day);
(f) running the experiments and collecting data (about 15 days);
(g) fitting a model (about 1 day);
(h) diagnosing the model (about 1 day);
(i) interpreting the model (about 1 day); and
(j) confirmation testing to verify the improvement without iteration (about 20 days).

As a result, the typical time to fix one problem is about 1.5 months.

FIG. 3 shows a factory data analysis system 3000 constructed in accordance with one or more embodiments of the present invention. It also shows the automated flow of data from raw, unformatted input to data mining results when one or more embodiments of the present invention apply data mining to an IC manufacturing process.

In accordance with one or more embodiments, each step of the analysis process is automated, as is the flow from one phase of the analysis process to the next. This can greatly reduce or eliminate the drawbacks of manually data mining the process and manually turning the data mining results into process improvements.

Further, in accordance with one or more further embodiments, users or clients are given access for data analysis setup. They can also view results through a commonly available existing interface such as an Internet web browser. An application service provider ("ASP") system distribution method (i.e., a web-based data transfer method known in the art) is a preferred way to implement such a web browser interface.

The embodiments of the factory data analysis system shown in FIG. 3 can be used in either of two ways:
- by a single factory that performs data collection and analysis on data from one or more factory sites; or
- by several companies, each performing data collection and analysis on data from one or more of its own factory sites.

Further, in one or more of these embodiments, the data may be isolated by account-management security. In that case, the users or clients performing setup and/or viewing results may be different users or clients from different departments of the same company, or from different departments of different companies.

In one or more embodiments of the present invention:
(a) data are automatically retrieved, processed, and formatted so that data mining tools can work with them;
(b) a value system is applied and queries are generated automatically so that the data mining tools return relevant results; and
(c) results are automatically reported and remotely accessible so that corrective action based on them can be taken quickly.

As shown in FIG. 3, ASP data transfer module 3010 is a data collection process or module. It acquires different types of data from any one or more of the different types of data sources in the factory, such as, but not limited to:
(a) lot equipment history data from an MES ("manufacturing execution system");
(b) data from equipment interfaces;
(c) processing tool recipes and processing tool test programs from factory-provided data sources; and
(d) for example but not limited to, probe test data, E-test (electrical test) data, defect measurement data, remote diagnostic data collection, and post-processing data from factory-provided data sources.

In accordance with one or more embodiments, ASP data transfer module 3010 accepts and/or collects data transmitted in a format dictated by the customer and/or the tool. The data may come from, for example but not limited to, a customer data collection database (centralized or otherwise) that stores the raw data output of tools and/or direct data sources. Such data acceptance or collection can be performed on a schedule or on demand. The data can also be encrypted, or transmitted as FTP files (for example, like secure e-mail) over a secure network such as the customer's intranet. In accordance with one embodiment, ASP data transfer module 3010 is a software application running on a PC server. It is coded in C++, Perl, and Visual Basic according to any one of many methods known in the art.

For example, typical commonly available data include the following. (A wafer lot is typically 25 wafers that normally travel together in a cassette during processing.)
(a) WIP (work-in-progress) information, typically about 12,000 items/lot, typically accessed by process engineers;
(b) equipment interface data, such as raw processing tool data, typically about 120,000 items/lot (note that, conventionally, no one typically accessed equipment interface information);
(c) process metrology information, typically about 1,000 items/lot, typically accessed by process engineers;
(d) defect information, typically about 1,000 items/lot, typically accessed by yield engineers;
(e) E-test (electrical test) information, typically about 10,000 items/lot, typically accessed by device engineers; and
(f) sort (classification) information with data logs and bitmaps, typically about 2,000 items/lot, typically accessed by product engineers.

These data can be rolled up to a total of about 136,000 unique measurements per wafer.

As further shown in FIG. 3, data conversion module 3020 converts and/or translates the raw data received by ASP data transfer module 3010 into a key/column/data format, according to any one of many methods known in the art. It then stores the converted data in self-adaptive database 3030.

The data conversion performed by data conversion module 3020 includes:
- sorting of the raw data;
- merge processing, such as but not limited to:
  - factory/test lot ID conversion (useful, for example, for foundries);
  - wafer ID conversion (for example, Sleuth and Scribe IDs); and
  - wafer/reticle/die coordinate normalization and conversion (which depends, for example, on whether a notch or a wafer reference point measurement is used for coordinate normalization); and
- data specifications, such as but not limited to:
  - spec limits for E-test data;
  - bin probe data (for example, an end-of-line probe test may have 10 to 100 failure modes);
  - metrology data; and
  - calculated data types such as lot, wafer, region, and layer data.

In accordance with one embodiment, data conversion module 3020 is a software application running on a PC server. It is coded in Oracle Dynamic PL/SQL and Perl according to any one of many methods known in the art.

In accordance with one such embodiment, the conversion uses one generic set of translation programs, according to any one of many methods known in the art, to convert raw files into "well-formatted", industry-neutral files. The data formats are "generalized", so only a few formats are used no matter how many source formats the data arrive in. In accordance with one or more embodiments, the converted files retain the "level" information present in the raw data, so that later processes can "roll up" low-granularity data into high-granularity data. The converted files contain no industry-specific information. Once the raw data are in this format, they are supplied to self-adaptive database 3030 for storage.

In accordance with one or more embodiments of the present invention, the generic file format for input data is defined by a leveling scheme of Widget ID, Where?, When?, What?, and Value. For a semiconductor factory, for example, these are defined as follows:
- Widget ID is identified by one or more of lot ID, wafer ID, slot ID, reticle ID, die ID, and sub-die x,y Cartesian coordinates.
- Where? is identified by one or more of the process flow/assembly line manufacturing step and sub-step.
- When? is identified by one or more measurement dates/times.
- What? is identified by one or more of the measurement name (for example, but not limited to, "yield"), measurement type/category, and wafer sort.
- Value is, for example but not limited to, a yield of 51.4%.

Using such an embodiment, any plant data can be represented.
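As an illustration only, the Widget ID / Where? / When? / What? / Value scheme can be modeled as one record type. The class and field names below (`WidgetId`, `Measurement`, `level()`) are hypothetical, not taken from the source. The measurement type/category and wafer sort parts of What? are omitted for brevity. `level()` reports the finest populated Widget ID level, which is the "level" information that later roll-up steps would rely on.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class WidgetId:
    """Hypothetical hierarchical Widget ID: coarse levels first, finer levels optional."""
    lot_id: str
    wafer_id: Optional[str] = None
    slot_id: Optional[int] = None
    reticle_id: Optional[str] = None
    die_id: Optional[str] = None
    sub_die_xy: Optional[Tuple[float, float]] = None

    LEVELS = ("lot_id", "wafer_id", "slot_id", "reticle_id", "die_id", "sub_die_xy")

    def level(self) -> str:
        """Return the name of the finest populated level (used later for roll-up)."""
        finest = "lot_id"
        for name in self.LEVELS:
            if getattr(self, name) is not None:
                finest = name
        return finest


@dataclass(frozen=True)
class Measurement:
    widget: WidgetId   # Widget ID: which physical object was measured
    where: str         # Where?: process step / sub-step
    when: datetime     # When?: measurement date/time
    what: str          # What?: measurement name, e.g. "yield"
    value: float       # Value, e.g. 51.4 (percent yield)
```

For example, `Measurement(WidgetId("LOT123", wafer_id="W07"), "ETEST/final", datetime(2001, 7, 30, 12, 0), "yield", 51.4)` describes a wafer-level yield value.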

In accordance with one or more embodiments of the present invention, data conversion module 3020 generically translates new types of data collected by ASP data transfer module 3010. Specifically, data conversion module 3020 creates an "on-the-fly" database "handshake", for example a hash code for data access, so that the new data can be stored in self-adaptive database 3030. Finally, in accordance with one embodiment, data are stored in self-adaptive database 3030 as they arrive at factory analysis system 3000.
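The source does not say how the hash code is formed. The sketch below is a hypothetical take on the "handshake": a stable access key is derived from the (What?, Where?) pair the first time a new data type is seen, and reused afterwards. The choice of SHA-1, the 12-character key length, and the `TypeRegistry` name are all assumptions.

```python
import hashlib


class TypeRegistry:
    """Hypothetical 'handshake' table: one stable access key per new (What?, Where?) data type."""

    def __init__(self):
        self._keys = {}  # (what, where) -> hash key

    @staticmethod
    def hash_code(what, where, length=12):
        """Deterministic key derived from the data type's name and location (SHA-1 is an assumption)."""
        digest = hashlib.sha1(f"{what}|{where}".encode("utf-8")).hexdigest()
        return digest[:length]

    def handshake(self, what, where):
        """Return the access key for this data type, registering it on first sight."""
        key = (what, where)
        if key not in self._keys:
            self._keys[key] = self.hash_code(what, where)
        return self._keys[key]

    def __len__(self):
        return len(self._keys)
```

Because the key depends only on the names, any loader that sees the same data type computes the same key. No central counter is needed.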

In accordance with one or more embodiments of the present invention, ASP data transfer module 3010 includes a module that collects processing tool sensor data from a SmartSys™ database. (The SmartSys™ application, available from Applied Materials, Inc., collects, analyzes, and stores data such as sensor data from processing tools in a factory.) Further, data conversion module 3020 includes a module that converts SmartSys™ processing tool sensor data into data sets prepared for data mining by master loader module 3050 and master builder module 3060.

In accordance with one or more embodiments of the present invention, a data conversion algorithm makes it possible to use time-based data from individual processing (i.e., factory or assembly line) tools. These data establish a "direct" link between metrology data metrics and existing non-optimal factory conditions (i.e., factory states).

A key part of this algorithm is the method of converting time-based operating-state data into key IC-specific statistics. These operating-state data are generated by a processing (factory or assembly line) tool during wafer processing. Data brain engine module 3080 analyzes the resulting statistics, in the manner described below, to perform automated data mining fault detection analysis. In accordance with one or more embodiments, the following steps are performed to translate such time-based processing tool data:
a. creating a configuration file (using the user interface described below) that specifies the digitization granularity of the generic time-based data format; and
b. using the configuration file, translating time-based processing tool data into the time-based data file format. ASP data transfer module 3010 receives these data in various file formats, such as but not limited to ASCII data.

The following is one example of a format definition for the generic time-based data file format. Advantageously, in accordance with these embodiments, not every data field needs to be complete for a file to be considered "valuable". Instead, as described below, some data fields can be populated later by a "post-processing" data filing routine that communicates with the semiconductor manufacturing execution system (MES) host.

<BEGINNING OF HEADER>
[PRODUCTID CODE]
[LOTID CODE] (italic)
[PARENT LOTID CODE]
[WAFERID CODE]
[SLOTID CODE]
[WIP CODE]
[WIP SUB-MODULE]
[WIP SUB-MODULE-STEP]
[TRACKIN DATE] (italic)
[TRACKOUT DATE] (italic)
[PROCESS TOOLID]
[PROCESS TOOL RECIPE USED]
<END OF HEADER>
<BEGINNING OF DATA> (italic)
<BEGINNING OF PARAMETER> (italic)
[PARAMETER ENGLISH NAME] (italic)
[PARAMETER NUMBER] (italic)
[DATA COLLECTION START TIME] (italic)
[DATA COLLECTION END TIME] (italic)
time increment 1, data value 1 (italic)
time increment 2, data value 2 (italic)
time increment 3, data value 3 (italic)
...
<END OF PARAMETER> (italic)
<BEGINNING OF PARAMETER> (italic)
[PARAMETER ENGLISH NAME] (italic)
[PARAMETERID NUMBER] (italic)
[DATA COLLECTION START TIME] (italic)
[DATA COLLECTION END TIME] (italic)
time increment 1, data value 1 (italic)
time increment 2, data value 2 (italic)
time increment 3, data value 3 (italic)
...
<END OF PARAMETER> (italic)
<END OF DATA> (italic)

In this example, the items marked (italic) are required so that the file contents can be properly merged with IC data metrics.
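The listing gives only field names, not a concrete on-disk syntax. The parser below is therefore a sketch under stated assumptions:
- each header or parameter field is written as `FIELD NAME = value`;
- each sample is written as `time, value`;
- the required field sets stand in for the items marked (italic).

It rejects a file whose required fields are missing, which is the condition the format places on merging with IC data metrics. All names here are hypothetical.

```python
# Fields marked (italic) in the format definition: required for merging with IC data metrics.
REQUIRED_HEADER = {"LOTID CODE", "TRACKIN DATE", "TRACKOUT DATE"}
REQUIRED_PARAM = {"PARAMETER ENGLISH NAME", "PARAMETER NUMBER",
                  "DATA COLLECTION START TIME", "DATA COLLECTION END TIME"}


def parse_time_based_file(text):
    """Parse one generic time-based data file into (header dict, list of parameter blocks)."""
    header = {}
    parameters = []
    current = None  # parameter block being read, or None outside one
    for raw in text.splitlines():
        line = raw.strip()
        if line == "<BEGINNING OF PARAMETER>":
            current = {"fields": {}, "samples": []}
        elif line == "<END OF PARAMETER>":
            missing = REQUIRED_PARAM - set(current["fields"])
            if missing:
                raise ValueError(f"parameter block missing {sorted(missing)}")
            parameters.append(current)
            current = None
        elif not line or line.startswith("<"):
            continue  # blank lines and other section markers carry no data
        elif "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            (header if current is None else current["fields"])[key] = value
        elif current is not None:
            t, v = (float(part) for part in line.split(","))  # "time increment, data value"
            current["samples"].append((t, v))
    missing = REQUIRED_HEADER - set(header)
    if missing:
        # Non-required fields may be filled in later by post-processing against the MES host.
        raise ValueError(f"header missing {sorted(missing)}")
    return header, parameters
```

Optional header fields such as `PRODUCTID CODE` are simply stored if present. This leaves room for the post-processing routine to populate them later.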

As noted above, in accordance with one or more embodiments of the present invention, the configuration file for time-based data conversion specifies the granularity with which the time-based data are represented as wafer statistics. In such embodiments, the configuration file can also specify which raw time-based data formats it processes, and one or more options for archiving the raw data files. The following is one example of a configuration file.

<BEGINNING OF HEADER>

[FILE EXTENSIONS APPLICABLE TO THIS CONFIG FILE]
[RAW DATA ARCHIVE FILE <Y OR N>]
[CREATE IMAGE ARCHIVE FILES <NUMBER OF FILES / PARAMETER>]
[IMAGE ARCHIVE FILE RESOLUTION]

<END OF HEADER>

<BEGINNING OF ANALYSIS HEADER>

[GLOBAL GRAPH STATS <ON / OFF>, N SEGMENTS]
[XAXIS TIME STATS <ON / OFF>, N SEGMENTS]
[YAXIS PARAMETER STATS <ON / OFF>, N SEGMENTS]

<END OF ANALYSIS HEADER>

The configuration file parameters above are described below.

File extensions: This line lists the file extensions and/or naming-convention keywords that select which raw generic time-based data files are converted using the parameters defined in this configuration file.

Raw data archive file: This line specifies whether an archived copy of the original data is to be retained. When this option is used, the file is compressed and stored in an archive directory structure.
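A minimal sketch of the raw-data archive option is shown below. It assumes gzip compression, an archive layout with one subdirectory per file extension, and a hypothetical `archive_raw_file` helper; the source says only that the file is compressed and stored in an archive directory structure.

```python
import gzip
import shutil
from pathlib import Path


def archive_raw_file(raw_path, archive_root):
    """Compress one raw time-based data file into the archive tree and return the archive path."""
    raw = Path(raw_path)
    # Hypothetical layout: one archive subdirectory per raw file extension.
    dest_dir = Path(archive_root) / raw.suffix.lstrip(".")
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / (raw.name + ".gz")
    with raw.open("rb") as src, gzip.open(dest, "wb") as out:
        shutil.copyfileobj(src, out)  # stream, so multi-gigabyte files need not fit in memory
    return dest
```

Streaming with `shutil.copyfileobj` matters here, because the raw files can grow by 10 to 20 Gbytes per month per tool.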

Create image archive files: This line specifies whether the raw time-based data file is to be graphed in a standard x-y format. Graphing stores an "original" view of the data. The full contents of the raw data files can then be archived and quickly retrieved without repeated plotting. (These files can be large, adding a total of 10 to 20 Gbytes per month for a single processing tool.) The number-of-images setting allows multiple snapshots of various key regions of the x-y data plot to be stored, so "zoomed-in" views of the data are also available.

Image archive file resolution: This line defines the level of standard image compression applied to the x-y graphs captured by the image archive file option.

Global graph statistics: This line specifies that global statistics are to be generated for all file formats processed by this configuration file. How these statistics are generated is described below.

X-axis time graph statistics: This line directs the system to generate statistics over defined X-axis time ranges for all file formats processed by this configuration file. How these statistics are generated is described below.

Percent data graph statistics: This line directs the system to generate percent data statistics for all file formats processed by this configuration file. How these statistics are generated is described below.
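The following is a hedged sketch of reading such a configuration file into a typed object. The concrete syntax (`KEY = value`, with statistics lines such as `ON, 10`) and the short key names are assumptions, as are all class and function names. It also assumes that the third analysis line, `YAXIS PARAMETER STATS`, is the one that controls the percent data statistics, following the order of the descriptions above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StatsOption:
    enabled: bool = False
    segments: int = 0


@dataclass
class ConversionConfig:
    extensions: List[str] = field(default_factory=list)
    archive_raw: bool = False
    image_files_per_parameter: int = 0
    image_resolution: str = ""
    global_stats: StatsOption = field(default_factory=StatsOption)
    xaxis_time_stats: StatsOption = field(default_factory=StatsOption)
    percent_data_stats: StatsOption = field(default_factory=StatsOption)

    def applies_to(self, filename: str) -> bool:
        """True if this configuration file should be used to convert the given raw file."""
        return any(filename.endswith(ext) for ext in self.extensions)


def _stats(value: str) -> StatsOption:
    """Parse 'ON, 10' or 'OFF' into a StatsOption."""
    state, _, n = value.partition(",")
    return StatsOption(state.strip().upper() == "ON", int(n) if n.strip() else 0)


def parse_config(text: str) -> ConversionConfig:
    cfg = ConversionConfig()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("<") or "=" not in line:
            continue  # skip section markers and blank lines
        key, value = (s.strip() for s in line.split("=", 1))
        key = key.upper()
        if key == "FILE EXTENSIONS":
            cfg.extensions = [e.strip() for e in value.split(",") if e.strip()]
        elif key == "RAW DATA ARCHIVE FILE":
            cfg.archive_raw = value.upper() == "Y"
        elif key == "CREATE IMAGE ARCHIVE FILES":
            cfg.image_files_per_parameter = int(value)
        elif key == "IMAGE ARCHIVE FILE RESOLUTION":
            cfg.image_resolution = value
        elif key == "GLOBAL GRAPH STATS":
            cfg.global_stats = _stats(value)
        elif key == "XAXIS TIME STATS":
            cfg.xaxis_time_stats = _stats(value)
        elif key == "YAXIS PARAMETER STATS":
            cfg.percent_data_stats = _stats(value)
    return cfg
```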

In accordance with one or more of these embodiments of the present invention, the following statistics, also called X-axis time graph statistics, are generated for each time-based data graph on a per-parameter basis. For a given time-based data set and a given parameter, the data are divided into the number of segments defined in the configuration file. The X-axis time graph segments are formed by taking the full width of the x-axis (from the minimum x value to the maximum x value) and dividing it into N equal increments of the x-axis range. Statistics are generated and recorded for each segment.

To see how this works, refer first to FIG. 5, which shows an example of raw time-based data: a graph of processing tool beam current as a function of time. FIG. 6 shows how the raw time-based data of FIG. 5 are divided into segments. FIG. 7 shows the raw time-based data corresponding to segment 1 of FIG. 6.

The following are typical segment statistics (for example, 10 statistics per segment, for each of the N segments):
1. Area within the segment
2. Mean Y-axis value of the data in the segment
3. Standard deviation of the Y-axis values of the data in the segment
4. Slope of the segment
5. Minimum Y-axis value of the segment
6. Maximum Y-axis value of the segment
7. Percent change in the Y-axis mean from the preceding segment
8. Percent change in the Y-axis mean from the next segment
9. Percent change in the Y-axis standard deviation from the preceding segment
10. Percent change in the Y-axis standard deviation from the next segment

FIG. 8 shows an example of the relationship between the Y range within segment 7 and bin_S. Using this information, a process engineer can adjust the recipe (the processing tool settings) so that the segment 7 Y range falls within the range associated with low bin_S failures.

According to one embodiment of the present invention, the following 29 statistics are global statistics calculated from data that have not been Tukey-cleaned:
1. Total area under the curve
2. Number of Y-axis slope changes of 10% or more
3. X-axis 95% data width (i.e., starting from the center of the data and moving left and right until 95% of the data have been picked up)
4. Y-axis mean of the 95% X-axis data width
5. Y-axis standard deviation of the 95% X-axis data width
6. Y-axis range of the 95% X-axis data width
7. X-axis 95% area under the curve
8. X-axis leftmost 2.5% data width
9. X-axis leftmost 2.5% area under the curve
10. X-axis rightmost 2.5% data width
11. X-axis rightmost 2.5% area under the curve
12. X-axis 90% data width (starting from the center of the data and moving left and right until 90% have been picked up)
13. Y-axis mean of the 90% X-axis data width
14. Y-axis standard deviation of the 90% X-axis data width
15. Y-axis range of the 90% X-axis data width
16. X-axis 90% area under the curve
17. X-axis leftmost 5% data width
18. X-axis leftmost 5% area under the curve
19. X-axis rightmost 5% data width
20. X-axis rightmost 5% area under the curve
21. X-axis 75% data width (starting from the center of the data and moving left and right until 75% have been picked up)
22. Y-axis mean of the 75% X-axis data width
23. Y-axis standard deviation of the 75% X-axis data width
24. Y-axis range of the 75% X-axis data width
25. X-axis 75% area under the curve
26. X-axis leftmost 12.5% data width
27. X-axis leftmost 12.5% area under the curve
28. X-axis rightmost 12.5% data width
29. X-axis rightmost 12.5% area under the curve

The percentages used in the above embodiment are the common values 90, 95, 75, and so on. In further embodiments, these percentages can be changed to intermediate values, for example when there is interest in somewhat widening the range of the "core" of the data.

In a further embodiment, the global statistics described above are calculated using 5000% Tukey data cleaning. In yet another embodiment, they are calculated using 500% Tukey data cleaning.
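Tukey data cleaning is commonly done with Tukey's fences: discard values outside [Q1 - k·IQR, Q3 + k·IQR]. The text does not say how "500%" or "5000%" maps to k. The sketch below assumes the percentage scales the interquartile range, so 150% gives the conventional k = 1.5 and 5000% gives k = 50. The function name `tukey_clean` is hypothetical.

```python
from statistics import quantiles


def tukey_clean(values, percent=150.0):
    """Drop values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: the percentage scales the IQR, so 150% gives the conventional
    k = 1.5, 500% gives k = 5, and 5000% gives k = 50.
    """
    k = percent / 100.0
    q1, _, q3 = quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]
```

A larger percentage widens the fences. Under this assumption, 5000% cleaning removes only gross outliers, such as mis-keyed or sentinel values.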

According to one embodiment of the present invention, the percent-data statistics are the same as the 10 statistics described for the X-axis time-graph statistics. The two differ only in how the segments are defined. For X-axis time statistics, the segments are N equal parts of the X axis. For percent-data statistics, each segment is defined by the percentage of the data it contains, so the segment width on the X axis varies. For example, if percent-data segmentation is turned "on" with 10 segments, the first segment is the first 10% of the data (the leftmost 10% of the data points, when the X axis is used as the reference).
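The difference between the two segmentations can be shown with a small Python sketch, assuming the data is a list of (x, y) pairs. One function cuts the X range into equal widths, as for X-axis time statistics. The other cuts the sorted points into equal counts, as for percent-data statistics. Both function names are hypothetical.

```python
def equal_x_segments(points, n_segments):
    """X-axis time statistics: split the x range into n equal-width segments."""
    pts = sorted(points)
    x0, x1 = pts[0][0], pts[-1][0]
    width = (x1 - x0) / n_segments
    segments = [[] for _ in range(n_segments)]
    for x, y in pts:
        idx = int((x - x0) / width) if width > 0 else 0
        segments[min(idx, n_segments - 1)].append((x, y))  # clamp the right edge
    return segments


def percent_data_segments(points, n_segments):
    """Percent-data statistics: each segment holds an equal share of the points."""
    pts = sorted(points)
    n = len(pts)
    bounds = [round(i * n / n_segments) for i in range(n_segments + 1)]
    return [pts[bounds[i]:bounds[i + 1]] for i in range(n_segments)]
```

Consider skewed data: nine points between x = 0 and 8 and one point at x = 100. Equal-width halves put nine points in the first segment. Percent-data halves put five points in each, so the first percent-data segment is much narrower on the X axis.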

As further shown in FIG. 3, the master loader module 3040 retrieves formatted data from the self-adaptive database 3030 (for example, data files 3035) and loads it into the intelligence base 3050. This retrieval is triggered either by a time-generated event or by a data-arrival event. The intelligence base 3050 is implemented as an Oracle relational database, as is well known in the art. According to a further embodiment of the present invention, data "trickles" in from the factory. As it does, the master loader module 3040 polls a directory in the self-adaptive database 3030 to determine whether enough data has arrived to be retrieved and transferred to the intelligence base 3050.
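The polling behavior can be pictured with the following minimal sketch. It assumes that "enough data" means a byte threshold over the regular files in one directory. The source gives neither the criterion nor the polling interval, so the function names, threshold, and interval are all hypothetical.

```python
import os
import time


def bytes_waiting(directory):
    """Total size in bytes of the regular files currently in `directory`."""
    total = 0
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            total += os.path.getsize(path)
    return total


def poll_until_ready(directory, threshold_bytes, interval_s=5.0, max_polls=12):
    """Return True once at least threshold_bytes have trickled in, else False."""
    for _ in range(max_polls):
        if bytes_waiting(directory) >= threshold_bytes:
            return True
        time.sleep(interval_s)
    return False
```

A real loader would then hand the accumulated files to the load step and clear or archive the directory. That part is omitted here.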

According to one or more embodiments of the present invention, the master loader module 3040 and the intelligence base 3050 provide a method and apparatus for managing, referencing, and extracting large volumes of unstructured relational data. According to one or more embodiments, the intelligence base 3050 is a hybrid database with two components: an intelligence-base relational database component and an intelligence-base file system component. In one such embodiment, the relational database component (for example, its schema) uses a hash index algorithm to create access keys to discrete data stored in a distributed file base. Advantageously, this allows unstructured raw data to be converted quickly into a formal structure. This bypasses the limitations of commercial database products and exploits the speed gained by storing structured files in disk arrays.

According to one embodiment of the present invention, the first step in designing and setting up the intelligence base 3050 is to define the applicable levels of discrete data measurement that may exist. However, according to one or more of these embodiments, it is not necessary to predict how many levels a given kind of discrete data has before starting to build the intelligence base 3050 for it. Instead, at some point in the intelligence base 3050, one need only define the relationship between a new level and an earlier level. The new level can be either a sub-level or a super-level. Consider the following example. A common level in a factory is a collection of wafers, which can be indexed as level 1 in the intelligence base 3050. Each individual wafer in that collection can then be indexed as level 2. Any particular subgrouping of chips on a wafer can then be indexed as level 3, or as several levels depending on the consistency of the subgrouping categories. Advantageously, this flexibility allows any data type to be stored in the intelligence base 3050. The only condition is that the data type's properties can be indexed to the lowest existing level of granularity that applies to it.
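The idea that levels need not be fixed in advance can be sketched as a parent map. Each new level is declared only relative to an existing one, either below it or above it. Level numbers are computed on demand, so inserting a super-level renumbers everything beneath it. The class and method names are hypothetical and do not describe the actual intelligence base.

```python
class LevelHierarchy:
    """Level relationships defined incrementally, one new level at a time."""

    def __init__(self, top_level):
        self.parent = {top_level: None}  # level name -> parent level name

    def add_sub_level(self, name, parent):
        """Add a finer-grained level below an existing one (e.g. wafer below a wafer collection)."""
        if parent not in self.parent:
            raise KeyError(f"unknown level {parent!r}")
        self.parent[name] = parent

    def add_super_level(self, name, child):
        """Insert a coarser-grained level directly above an existing one."""
        if child not in self.parent:
            raise KeyError(f"unknown level {child!r}")
        self.parent[name] = self.parent[child]  # inherit the child's old parent
        self.parent[child] = name

    def level_number(self, name):
        """1 for the top level, 2 for the next, and so on (computed on demand)."""
        number = 1
        while self.parent[name] is not None:
            name = self.parent[name]
            number += 1
        return number
```

In the example from the text, a wafer collection is level 1, a wafer is level 2, and a chip subgroup is level 3. Adding a coarser grouping later shifts each of those down by one without reloading any data.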

Advantageously, according to one or more embodiments of the present invention, loading data into the intelligence base 3050 is easier than loading data into a conventional relational database. Each new data type need only be rewritten in a format that relates the discrete manufacturing levels to the data measurements (or data history) for the IDs at those levels. In a factory, for example, a given data file is rewritten into lines that contain the "level 1" ID, the "level 2" ID, and so on. The measurements recorded for that combination of wafer levels follow on the same line. This property of the intelligence base 3050 allows any applicable data to be loaded without defining a specific relational database schema.
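The rewrite into level-ID lines can be sketched in Python, assuming each input record is a dictionary that holds its level IDs alongside its measurements. The pipe-separated layout and the function name are illustrative assumptions, not the format used by the system.

```python
def to_level_lines(records, level_fields, sep="|"):
    """Rewrite records as lines: level-1 ID, level-2 ID, ..., measurement name, value."""
    lines = []
    for record in records:
        ids = [str(record[field]) for field in level_fields]
        for name, value in record.items():
            if name in level_fields:
                continue  # level IDs are already at the front of the line
            lines.append(sep.join(ids + [name, str(value)]))
    return lines
```

Each output line names the level IDs it belongs to. A new measurement type therefore needs no schema change, only new lines.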

Advantageously, according to one or more embodiments of the present invention, the intelligence base 3050 is designed to support automated data analysis jobs (described in detail below) and to output large data sets. It does so by using a hash join algorithm to accumulate and combine large amounts of data quickly. In a conventional relational database design, outputting such large data sets usually requires large table joins within the database. As is well known, producing large data sets through relational table joins keeps the CPU very busy. Advantageously, this is not the case for the hash join algorithm used to output large data sets from the intelligence base 3050.
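For reference, the following is a textbook in-memory hash join in Python. It builds a hash table on one input and probes it with the other, so each row is touched roughly once instead of once per candidate pair. This illustrates the general technique only. The rows, field names, and function name are hypothetical.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Inner join of two lists of dict rows on `key` using build and probe phases."""
    buckets = defaultdict(list)
    for row in left:                       # build phase: hash the left input
        buckets[row[key]].append(row)
    joined = []
    for row in right:                      # probe phase: look up each right row
        for match in buckets.get(row[key], []):
            joined.append({**match, **row})
    return joined
```

Building on the smaller input keeps the hash table small. In this case that is the per-wafer yield table rather than a large sensor history.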

FIG. 4 shows the logical data flow of a method, according to one embodiment of the present invention, for structuring unstructured data events into the intelligence base 3050. As shown in FIG. 4, at box 4010, factory data is retrieved from the factory data warehouse 4000. This factory data can take any of many different forms and can come from any of many different sources. Examples include, without limitation, historical data from a database and real-time data from process tool monitoring equipment such as sensors. The unformatted data is then supplied to the data parser 4020. The manner and frequency of retrieving data from the factory data warehouse 4000 do not affect the behavior of the data parser 4020, the database loader 4040, or the intelligence base 3050.

The data parser 4020 then outputs a formatted data stream 4030 in a format that the database loader 4040 can accept. This concerns only formatting, i.e., how the data is laid out; it does not introduce any "knowledge" about the data. The database loader 4040 then reads the formatted data stream 4030. It uses a hash index algorithm to generate index keys between the data elements and their locations in the file system 4050. For example, and without limitation, the hash index algorithm uses the data level IDs of the data elements to generate the index keys. The data is then stored in the file system 4050 for future reference and use. The hash index keys that reference the file system 4050 are stored in the relational database 4060. In one or more alternative embodiments of the present invention, the intelligence base 3050 is created by loading the data into indexed tables, partitioned by level, in an Oracle 9i data mart.
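The loader step of FIG. 4 can be sketched with SQLite standing in for the relational database 4060 and a directory tree standing in for the file system 4050. The index key is derived from the data level IDs, as the text describes. The choice of SHA-1, the JSON file layout, and the table and function names are all assumptions made for illustration.

```python
import hashlib
import json
import os
import sqlite3


def index_key(level_ids):
    """Hash index key derived from the data level IDs (SHA-1 is an assumption)."""
    return hashlib.sha1("/".join(level_ids).encode("utf-8")).hexdigest()


def create_index_table(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS kb_index ("
                 "key TEXT PRIMARY KEY, level_ids TEXT, path TEXT)")


def load_record(conn, root, level_ids, measurements):
    """Store measurements in the file system and their key in the relational index."""
    key = index_key(level_ids)
    path = os.path.join(root, key[:2], key + ".json")  # fan out by key prefix
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(measurements, f)
    conn.execute("INSERT OR REPLACE INTO kb_index VALUES (?, ?, ?)",
                 (key, "/".join(level_ids), path))
    conn.commit()
    return key


def fetch_record(conn, level_ids):
    """Look up the key in the relational index, then read the file it points to."""
    row = conn.execute("SELECT path FROM kb_index WHERE key = ?",
                       (index_key(level_ids),)).fetchone()
    if row is None:
        return None
    with open(row[0], encoding="utf-8") as f:
        return json.load(f)
```

The relational side holds only small fixed-size keys and paths. The bulky measurement data stays in plain files, which reflects the hybrid split described above.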

Returning to FIG. 3, the master builder module 3060 accesses the intelligence base 3050. Using a configuration file generated with the user editing and configuration file interface module 3055, it builds data structures for use as input to the data mining procedures (described below). The user editing and configuration file interface module 3055 lets the user create configuration files containing the configuration data used by the master builder module 3060. For example, the master builder module 3060 obtains data specified by the configuration file from the intelligence base 3050. This might be, for example and without limitation, data of a particular type within a particular range of parameter values. It combines that data with other data from the intelligence base 3050 specified by the configuration file, for example data of another particular type within another particular range of values. To do this, the intelligence-base relational database component of the intelligence base 3050 references the intelligence-base file system component. This allows different data levels to be merged quickly into a "vector cache" of information, which is converted into data for use in data mining. The configuration file lets the user define new relationships using the hash index, thereby creating a new "vector cache" of information. This vector cache is converted into data for use in data mining in the manner described below, and that data is referred to below as a "hypercube." According to one or more embodiments of the present invention, the master builder module 3060 is a software application that runs on a PC server. It is coded in Oracle Dynamic PL/SQL and Perl according to any one of many methods known in the art.

In operation, the master builder module 3060 receives and/or extracts a hypercube definition using the configuration file. It then uses the hypercube definition to create a vector cache definition. Following the vector cache definition, the master builder module 3060 creates a vector cache of information in three steps:
(a) using the hash index keys, it retrieves from the intelligence-base relational database component of the intelligence base 3050 the list of files and data elements identified or specified by the vector cache definition;
(b) it retrieves the corresponding files from the intelligence-base file system component of the intelligence base 3050;
(c) it populates the vector cache with the data elements identified in the vector cache definition.
The master builder module 3060 then generates hypercubes from the vector cache information using the hypercube definition, in the manner described below. Each hypercube is assigned an ID. The ID identifies analysis results as the hypercube moves through the factory data analysis system 3000, and clients use it when reviewing those results. The master builder module 3060 includes submodules that perform the following tasks:
- build the hypercubes;
- clean the hypercube data by removing data that would adversely affect the data mining results, according to any one of many methods known in the art;
- combine hypercubes so that many different variables can be analyzed;
- convert bin and parametric data into a form suitable for data mining (for example and without limitation, by converting event-driven data into binned data).
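Steps (a) to (c) and the hypercube step can be pictured with plain dictionaries. An `index` dictionary stands in for the relational component, and a `files` dictionary stands in for the file system component. The definition format, the function names, and the "drop incomplete rows" cleaning rule are hypothetical simplifications. The actual definitions and cleaning criteria come from the user's configuration file.

```python
def build_vector_cache(index, files, definition):
    """Steps (a)-(c): resolve keys, read the stored records, keep the listed elements.

    index: level-ID tuple -> file name (stands in for the relational component)
    files: file name -> dict of data elements (stands in for the file system)
    definition: {"keys": [level-ID tuples], "elements": [element names]}
    """
    cache = {}
    for level_ids in definition["keys"]:
        file_name = index.get(level_ids)       # (a) resolve the hash index
        if file_name is None:
            continue                           # nothing loaded for this key
        record = files[file_name]              # (b) read the stored record
        cache[level_ids] = {e: record.get(e) for e in definition["elements"]}  # (c)
    return cache


def to_hypercube(cache, elements):
    """Flatten the cache into rows (level IDs + values), dropping incomplete rows."""
    rows = []
    for level_ids in sorted(cache):
        values = cache[level_ids]
        if any(values[e] is None for e in elements):
            continue  # crude cleaning: skip rows with missing elements
        rows.append(list(level_ids) + [values[e] for e in elements])
    return rows
```

Keeping the cache keyed by level IDs means rows from different data types line up by wafer, or by whatever level the keys name, before being flattened.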

According to one or more embodiments of the present invention, the master builder module 3060 includes a data cleaner and scrubber, for example Perl and C++ software applications built according to any one of many methods known in the art. This data cleaning can be performed according to criteria written in the configuration file, or on an ad hoc basis upon receipt of user input.

In one or more embodiments of the present invention, the master builder module 3060 exports spreadsheets to the user in various file formats. These include, for example and without limitation, SAS (a data analysis tool known in the art), .jmp (JMP, a tool known in the art for visualizing and analyzing x-y data), .xls (Microsoft Excel spreadsheets), and .txt (plain text files). In one or more embodiments, the master builder module 3060 also includes a module that receives a user-generated hypercube as input and forwards the vector cache to the Data Brain engine module 3080 for analysis.

According to one or more embodiments of the present invention, the data conversion module 3020, the master loader module 3040, and the master builder module 3060 continuously update, respectively, the self-adaptive database 3030, the intelligence base 3050, and the data output for data mining.

As further shown in FIG. 3, the web commander module 3070 forwards the data output from the master builder module 3060 to the Data Brain engine module 3080 for analysis. Once a data set file formatted by the master builder module 3060 is available for data mining, the automated data mining process analyzes the data set against an analysis template. During this analysis it monitors the variables that the template designates as relevant, to be maximized or minimized, and also takes into account the relative importance of those variables. The Data Brain engine module 3080 includes the user editing and configuration file interface module 3055. That module in turn includes an analysis configuration preparation and template builder module, which provides a user interface for setting user-defined configuration parameter values and building data mining automation files for use with the Data Brain engine module 3080. The Data Brain engine module 3080 performs the automated data mining process using two kinds of information: the statistical properties of the variables and their relative contributions within a self-learning neural network. Criteria for forming relevant questions are generated from the statistical distribution and magnitude of each variable's contribution. That contribution may be to a given defined "important" variable (per analysis template) or, for example and without limitation, to the structure of a self-organizing neural network map ("SOM"). These criteria are then presented to the data mining algorithms, drawn from a wide range, that are best suited to the particular type of a given data set.

According to one or more embodiments of the present invention, the Data Brain engine module 3080 performs flexible, automated, iterative data mining on large unknown data sets. It explores statistical comparisons within those data sets so as to achieve "hands-off" operation. Such algorithmic flexibility is especially useful when the data being explored consists of both numeric and categorical attributes. Examples of algorithms needed to fully explore such data include, without limitation, specialized analysis of variance (ANOVA) techniques that can relate categorical and numeric data. Fully exploring the statistical comparisons in such data typically requires more than one algorithm. Data of this kind is found in modern discrete manufacturing processes such as semiconductor manufacturing, circuit board assembly, and flat panel display manufacturing.
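The source does not describe the "specialized" ANOVA techniques. As a baseline, the standard one-way ANOVA F statistic relates a categorical attribute, such as a process tool ID, to a numeric response, such as a parametric test value. This is the plain textbook statistic, not the patent's technique, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def one_way_anova_f(categories, values):
    """F statistic for a numeric response grouped by a categorical attribute."""
    groups = defaultdict(list)
    for category, value in zip(categories, values):
        groups[category].append(value)
    k, n = len(groups), len(values)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand_mean = mean(values)
    group_means = {c: mean(g) for c, g in groups.items()}
    ss_between = sum(len(g) * (group_means[c] - grand_mean) ** 2
                     for c, g in groups.items())
    ss_within = sum((v - group_means[c]) ** 2
                    for c, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else float("nan")
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F suggests that the response differs by tool more than within-tool noise would explain. A p-value would come from the F distribution with (k - 1, n - k) degrees of freedom, which this sketch leaves out.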

The Data Brain engine module 3080 includes a data mining software application, referred to below as the Data Brain Command Center application. This application uses the analysis templates contained in the configuration files and data sets to perform data mining analyses. According to one or more embodiments of the present invention, the Data Brain Command Center application calls Data Brain modules to apply one or more of the following data mining algorithms:
- SOM (a data mining algorithm known in the art);
- rule induction ("RI", a data mining algorithm known in the art);
- MahaCu (a data mining algorithm, described below, that correlates numeric data with categorical or attribute data, for example and without limitation a process tool ID);
- inverse MahaCu (a data mining algorithm, described below, that correlates categorical or attribute data, for example and without limitation a process tool ID, with numeric data);
- multi-level analysis automation, in which data mining is performed with SOM and the SOM output is then used to perform data mining with (a) RI and (b) MahaCu;
- Pigin (an inventive data mining algorithm described below);
- Defect Brain (an inventive data mining algorithm described in detail below);
- Selden (a predictive-model data mining algorithm known in the art).

According to one or more embodiments of the present invention, the Data Brain Command Center application uses a central control application that enables the use of multiple data mining algorithms and statistical methods. Specifically, the central control application according to one or more of these embodiments allows the results of one data mining analysis to be fed as input to subsequent branched analyses or runs. The Data Brain Command Center application thus provides an automated, flexible mechanism for exploring data. The logic and depth of the analysis are governed by user-configurable system configuration files. This enables unbounded data exploration that is not limited by the number or type of analysis iterations.

In its most general form, the factory data analysis system 3000 analyzes data received from multiple factories, which need not all be owned or controlled by the same legal entity. As a result, different data sets can be analyzed simultaneously in parallel data mining analysis runs and reported to different users. The same applies when the received data comes from a single factory, i.e., one owned or controlled by a single legal entity: different groups within that entity can have different data sets analyzed simultaneously in parallel data mining analysis runs. In these cases, the data mining analysis runs are performed efficiently in parallel on a server farm. According to one or more of these embodiments of the present invention, the Data Brain engine module 3080 acts as an automated command center and includes the following components:
(a) the Data Brain Command Center application (a branch-analysis decision and control application), which calls the Data Brain modules and further includes:
  (i) a Data Brain Command Center queue manager (built according to any one of many methods known in the art), comprising a set of distributed slave queues in the server farm, one of which is configured as the master queue;
  (ii) a Data Brain Command Center load balancer application (built according to any one of many methods known in the art) that balances distribution and job load across the server farm;
  (iii) a Data Brain Command Center account manager application that makes it possible to create, manage, and monitor the status of customer accounts and their associated analysis results;
(b) the user editing and configuration file interface module 3055 (built according to any one of many methods known in the art), which lets the user supply, in the configuration files, the analysis template information used for data mining.

According to this embodiment, the Data Brain Command Center application has two main responsibilities: managing the data mining job queues and automatically distributing jobs to an array of networked Windows servers, or server farm. The Data Brain Command Center application interfaces with the user editing and configuration file interface module 3055 to receive input for system configuration parameters. According to one or more of these embodiments, a data mining job is defined as a set of analysis runs made up of multiple data sets and analysis algorithms. Jobs are managed by the Data Brain Command Center queue manager application, a master queue manager that sits above the individual server slave queues. The master queue manager logically distributes the data mining jobs, which are executed by the Data Brain modules, among the available servers so that the jobs can run simultaneously. The Data Brain Command Center application collects the results of the branched analysis runs. Where necessary, it then feeds them into subsequent runs as directed by the job's configuration file.

The Data Brain Command Center application also controls load balancing across the server farm. Balancing helps make efficient, controlled use of the server resources available in the farm. Appropriate load balancing is achieved by monitoring two things in real time: the server queues of the individual servers in the farm and other relative run-time status information. This monitoring is done according to any one of many methods known in the art.
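The source leaves the balancing method to known techniques. One common choice is greedy least-loaded assignment: sort the runs by estimated cost, largest first, and give each run to the server with the smallest accumulated load, tracked in a heap. The run costs, server names, and function name below are hypothetical, and a live system would use the monitored queue state rather than static estimates.

```python
import heapq


def assign_runs(runs, servers):
    """Assign (run_id, estimated_cost) pairs to servers, largest run first,
    always to the currently least-loaded server."""
    heap = [(0.0, name) for name in servers]      # (current load, server name)
    heapq.heapify(heap)
    plan = {name: [] for name in servers}
    for run_id, cost in sorted(runs, key=lambda r: r[1], reverse=True):
        load, name = heapq.heappop(heap)          # least-loaded server
        plan[name].append(run_id)
        heapq.heappush(heap, (load + cost, name))
    return plan
```

The largest-first ordering is the classic longest-processing-time heuristic. It is simple and usually keeps the final loads close together.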

According to one or more of these embodiments of the present invention, the Data Brain Command Center account manager application enables the creation, management, and status monitoring of customer accounts for automated analyses, according to any one of many methods known in the art. The management and status communications provide control feedback to the Data Brain Command Center queue manager application and the Data Brain Command Center load balancer application.

According to one or more embodiments of the present invention, a data mining analysis can proceed in the following steps:
1. Analyze numeric data to find clusters of data that appear to show a correlation. This step can include several data mining steps that try to analyze the data using the various types of data that could provide such correlations. It is driven by the types of data specified in the configuration file.
2. Analyze the correlated data to determine which parametric data may be associated with the clusters. This step can include several data mining steps that try to analyze the data using the various types of data that could provide such associations. It is also driven by the types of data specified in the configuration file.
3. Analyze the parametric data against categorical data to determine which process tools may correlate with the associated parametric data. This step can include several data mining steps that try to analyze the data using the various types of process tools that could provide such correlations.
4. Analyze process tool sensor data against categorical data to determine which aspects of a process tool may be failing. This step can include several data mining steps that try to analyze the data using the various types of sensor data that could provide such correlations.
According to this embodiment, the hierarchy of data mining analysis techniques uses SOM, followed by rule induction, followed by ANOVA, followed by statistical methods.

FIG. 9 shows, by way of example, a three-level branched data mining run. As shown in FIG. 9, the DataBrain command center application, under the direction of the analysis template portion of the user-generated configuration file, performs a SOM data mining analysis that clusters numerical data relating to, for example and without limitation, yield (where yield is defined, for example and without limitation, in relation to the speed of the ICs manufactured in the factory). Next, as further shown in FIG. 9, the DataBrain command center application, under the direction of the user-generated analysis template, (a) performs map matching analysis (described below) on the SOM output to carry out cluster matching when the clusters relate to parametric data such as, for example and without limitation, electrical test results, and (b) performs rule induction data mining analysis on the SOM output to generate rule descriptions of the clusters when they relate to such parametric data. Next, as further shown in FIG. 9, the application (a) performs reverse MahaCu and/or ANOVA data mining analysis on the rule induction output to correlate categorical data with numerical data when the data relate to, for example and without limitation, processing tool settings for metrology measurements made in the processing tool, and (b) performs MahaCu and/or ANOVA data mining analysis on the map matching output to correlate numerical data with categorical data when the data relate to, for example and without limitation, processing tools for sensor measurements.

FIG. 10 illustrates the distributed queuing performed by the DataBrain command center application according to one or more embodiments of the present invention. FIG. 11 shows the analysis template user interface portion of a user editing and configuration file interface module 3055 implemented in accordance with one or more embodiments of the present invention. FIG. 12 shows the analysis template portion of a configuration file implemented in accordance with one or more embodiments of the present invention.

According to one or more embodiments of the present invention, an algorithm referred to below as "map matching" uses SOM to achieve automated, focused analysis (i.e., to define problem statements automatically). That is, according to one or more embodiments of the present invention, the SOM creates a map of clusters of wafers that have similar parameters. For example, if such a map is created for each parameter in the data set, the maps can be used to determine how many distinct yield problems exist for a given product at a given time. The maps can also be used to define good "questions" to pose for further data mining analysis.

Because the nature of self-organized maps makes it possible to automate the analysis, users of the inventive SOM map matching technique need only maintain a list of the "relevant" variable name tags in the factory to achieve fully "hands-off" automation. The SOM analysis automatically organizes the data and identifies separate, dominant (i.e., impactful) data clusters that represent different "factory problems" within the data set. This SOM clustering, combined with the map matching algorithm described below, makes it possible to describe each "interesting" variable, cluster by cluster, in terms of any historical data known to impact that variable's behavior. Thus, by using SOM combined with the map matching algorithm, a factory can address many problems that impact yield (or other important problems) with a fully automated, "hands-off" analysis technique.

Before SOM analysis of a data set can be performed, a self-organized map must be generated for each column in the data set. To generate these maps, a hyperpyramid cube structure such as the one shown in FIG. 13 is constructed. The hyperpyramid cube shown in FIG. 13 has four layers. According to one or more embodiments of the present invention, every hyperpyramid cube grows so that each layer is 2^n × 2^n, where n is the 0-based layer number. Each layer of the pyramid is a hypercube, and each hypercube holds one such 2^n × 2^n slice for every column in the data set. The layer shown in FIG. 14 is layer 2 (0-based) of a 16-column data set. According to one or more of these embodiments, each step deeper into the hyperpyramid cube makes the hypercube wider (2^n × 2^n), while the depth of the hypercube stays constant at the number of columns in the data set.
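As a rough illustration of the growth rule just described, the short Python sketch below lists the shape of each layer. The function name and the (rows, cols, depth) tuple layout are illustrative assumptions; the sketch only encodes the stated 2^n × 2^n width and the column-count depth.

```python
def hyperpyramid_layer_shapes(num_columns, num_layers):
    """Return (rows, cols, depth) for layers 0..num_layers-1 of a hyperpyramid cube.

    Layer n is a 2^n x 2^n grid of neurons; every layer holds one such grid per
    data-set column, so the depth stays equal to the column count.
    """
    if num_columns < 1 or num_layers < 1:
        raise ValueError("num_columns and num_layers must be positive")
    return [(2 ** n, 2 ** n, num_columns) for n in range(num_layers)]
```

For a 16-column data set with four layers, as in FIGS. 13 and 14, layer 2 comes out as 4 × 4 × 16.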

FIG. 15 shows a hypercube layer (a self-organized map) extracted from the hypercube at the second layer of the hyperpyramid cube. As shown in FIG. 15, the neurons (i.e., cells) in each layer approximate the actual records in that column. Each step deeper into the pyramid makes the hypercube larger and increases the number of neurons in it, which converge toward the actual values of the records in the column that each layer of the data cube represents. Because of memory constraints and the computation time involved, it is impractical or infeasible to grow the pyramid until the neurons converge to the actual values they represent. Instead, according to one or more embodiments of the present invention, the pyramid grows until a threshold is reached or until a predetermined maximum depth is reached. Then, according to one or more embodiments of the present invention, SOM analysis is performed on the last layered cube generated by the pyramid.

After an SOM has been generated for each column of the data set, the following steps are performed to achieve automated map matching data analysis.

I. Generate snapshots (iterative):
Given a numerical dependent variable ("DV"), i.e., a data column, locate the neural map in the data cube that this DV refers to. Using this neural map, generate all possible color-region combinations that delineate three regions: high (hill), low (pond), and middle. Every cell on the neural map falls within one of these regions. To keep the explanation of these embodiments simple, green is assigned to the high region, blue to the middle region, and red to the low region. As a first step, determine the delta by which the thresholds are moved at each interval to produce the color-region snapshots that serve as the basis for automated map matching analysis. Note that two threshold markers must be moved to obtain all snapshot combinations: one marking the threshold of the low region and one marking the threshold of the high region. By varying these two markers in steps of delta, all the desired snapshot combinations can be generated.

The delta value is calculated as [delta = (percent of data distribution, a user-configured value) × 2 sigma]. The high and low markers are then both moved to the mean of the data in this column. In this initial state, every cell in the neural map falls into either the green or the red region. Next, the low marker is moved left by delta. All cells are then scanned and assigned colors by the following rules. If the cell value satisfies [(mean − 1.25 sigma) < cell value < low marker], the cell is colored red. If it satisfies [high marker < cell value < (mean + 1.25 sigma)], the cell is colored green. If it satisfies [low marker < cell value < high marker], the cell is colored blue.
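The delta and coloring rules above can be sketched in Python as follows. This is a minimal sketch assuming sigma is the population standard deviation of the column (the text does not say which estimator is used) and that `percent` is given as a fraction (0.1 for 10%); `compute_delta` and `classify_cell` are hypothetical names. Read literally, the strict inequalities leave cells beyond mean ± 1.25 sigma (or exactly on a marker) uncolored, so the sketch returns None for them.

```python
from statistics import pstdev


def compute_delta(column, percent):
    """Step size for moving the markers: (fraction of the distribution) x 2 sigma."""
    return percent * 2 * pstdev(column)


def classify_cell(value, mu, sigma, low_marker, high_marker):
    """Return the color region of one neural-map cell under the rules above.

    "red" = low (pond) region, "green" = high (hill) region, "blue" = middle.
    None means the value lies in no band (e.g. beyond mu +/- 1.25 sigma).
    """
    if mu - 1.25 * sigma < value < low_marker:
        return "red"
    if high_marker < value < mu + 1.25 * sigma:
        return "green"
    if low_marker < value < high_marker:
        return "blue"
    return None
```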

In each of these snapshots (iterations), all the high and low regions are tagged and the automated SOM analysis (described below) is performed. The low marker is then moved left by delta and another snapshot is created; again, all high and low regions are tagged and the automated SOM analysis is performed. This continues until the low marker falls below (mean − 1.25 sigma). At that point, the low marker is reset to its initial position, the high marker is advanced right by delta, and the process repeats. This continues until the high marker exceeds (mean + 1.25 sigma). The procedure is shown in the pseudocode below.
Set High Marker = mean of the column data.
Set Low Marker = mean of the column data.
Set Delta = (percent of data distribution, a user-configured value) * 2 sigma.
Set High Iterator = High Marker.

Keep looping while (High Iterator < mean + 1.25 sigma)
Begin Loop
    Set Low Iterator = Low Marker.
    Keep looping while (Low Iterator > mean - 1.25 sigma)
    Begin Loop
        Go through each cell and color-code it using the procedure above,
        with the High Iterator and Low Iterator as the threshold values.
        Perform the automated map matching analysis (see the next section) on this snapshot.
        Set Low Iterator = Low Iterator - Delta.
    End Loop
    Set High Iterator = High Iterator + Delta.
End Loop
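A runnable Python version of this sweep, following the prose description (the low marker is reset each time the high marker advances), might look like the following. It assumes sigma is the population standard deviation and `percent` is a fraction; the `analyze` callback is a hypothetical placeholder for the automated map matching step, and `snapshot_thresholds`, `color_map`, and `run_snapshots` are illustrative names.

```python
from statistics import mean, pstdev


def snapshot_thresholds(column, percent):
    """Yield (low_marker, high_marker) pairs in sweep order."""
    mu = mean(column)
    sigma = pstdev(column)
    delta = percent * 2 * sigma
    if delta <= 0:
        raise ValueError("percent and the column spread must both be positive")
    high = mu
    while high < mu + 1.25 * sigma:
        low = mu  # reset the low marker each time the high marker advances
        while low > mu - 1.25 * sigma:
            yield low, high
            low -= delta
        high += delta


def color_map(cells, mu, sigma, low, high):
    """Color every cell of a 2-D neural map: red = low, green = high, blue = middle."""
    def color(value):
        if mu - 1.25 * sigma < value < low:
            return "red"
        if high < value < mu + 1.25 * sigma:
            return "green"
        if low < value < high:
            return "blue"
        return None  # outside every band under the strict inequalities
    return [[color(v) for v in row] for row in cells]


def run_snapshots(cells, column, percent, analyze):
    """Color one snapshot per threshold pair and pass it to `analyze`."""
    mu, sigma = mean(column), pstdev(column)
    return [analyze(low, high, color_map(cells, mu, sigma, low, high))
            for low, high in snapshot_thresholds(column, percent)]
```

For a column [1, 2, 3, 4, 5] and percent = 0.25, the sweep visits three low positions for each of three high positions, i.e., nine snapshots.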

FIG. 16 shows a self-organized map with high, low, and middle regions, in which the high and low cluster regions have been tagged for the subsequent automated map matching analysis.

II. Automated map matching analysis of snapshots (iterative)
Each of the three-color-region snapshots generated in step I is analyzed as follows. The user specifies a region of interest: either the pond (low) region or the hill (high) region of the selected DV (column) neural map. This region of interest is called the source region, and the opposite region is called the target region. Obtaining an automated SOM ranking of the other independent variable ("IV") maps, i.e., the columns in the data cube other than the DV column, relies on the fact that each row (record) of the data set projects to the same cell position through every layer of the data cube. Thus, if row 22 of the data set is located at row 10, column 40 of a given DV's neural map, then cell location (10, 40) contains row 22 of the data set in every other IV neural map as well. Specifically, FIG. 17 shows the projection of a cell through a hypercube. As is apparent from FIG. 17, a "best fit" record is established so that, when projected through each layer of the hypercube, it best fits the predicted value for that layer. Simply put, the goal is to analyze the records that make up the source and target regions and to determine how different they are from each other. Because the records in each group are the same across all the neural maps, each neural map can be ranked by how much the source group differs from the target group. This score is then used to rank the neural maps from highest to lowest. A high score means that the two groups in a neural map are very different from each other; conversely, a low score means that they are very similar. The goal, therefore, is to find the IV neural maps with the greatest difference between the two groups. The steps used to achieve this goal are as follows.
a. Rank the source clusters from highest to lowest by impacted score. The impacted score of each cluster is calculated as [impacted score = (actual column mean − neural map mean) × (number of unique records in the cluster) ÷ (total records in the column)].
b. Starting with the highest-ranked source cluster, tag its neighboring target clusters using the following criteria. Each criterion is weighted accordingly, and the score actually assigned is the weighted average of the criteria.
1. How close the target cluster is to the source cluster. This is calculated as the centroid distance from the target cluster to the source cluster (the centroid cell is the cell at the center of a cluster). After the two centroid cells are determined, the distance between them is calculated using the Pythagorean theorem.
2. Number of unique records in the cluster.
3. The mean of the cluster's peripheral cells compared with the mean of the cells surrounding it.
This yields a one-to-many relationship: one source cluster is associated with its many neighboring target clusters.
c. Label all records in the source cluster as population 1 and all records in the target cluster as population 2. These labels are used to determine how different the two groups are, as follows.
d. Use a scoring function to calculate a "score" for the IV from population 1 and population 2. Scoring functions include, for example and without limitation, a modified T test scoring function, a color contrast scoring function, and an IV impact scoring function.
The “modified T test scoring function” is performed as follows:
The modified T test is based on a regular T test that compares the two population groups. The difference is that, after the T test score is calculated, the final score is obtained by multiplying it by a reduction ratio. That is,
Modified T test = (reduction ratio) × T test
The reduction ratio is calculated by counting the records in the target population that are greater than the mean of the source population. This count is then subtracted from the number of records in the target population that are less than the mean of the source population. Finally, the absolute value of that difference is divided by the total number of records in the target population. That is,
Reduction ratio = |(number of target records less than the source mean) − (number of target records greater than the source mean)| ÷ (total number of records in the target region)
This score is stored so that the IV neural maps can be ranked later.
The “color contrast scoring function” is performed as follows:
The color contrast between population 1 and population 2 on the IV neural map is compared.
The “IV impact scoring function” is performed as follows:
The color contrast determined as described above is multiplied by an impact score based on the DV neural map.
e. Repeat step d for each IV neural map in the hypercube.
f. Rank the IV neural maps by modified T test score. If the modified T test score approaches zero before all IVs have been used, or before a user-specified threshold is reached, the remaining IV neural maps are ranked using the ordinary T test score.
g. Store the top percentage of IV neural maps specified by the user configuration settings.
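The scoring in steps a through f can be sketched in Python as below. This is an illustration under stated assumptions: the text says only "a regular T test," so the Welch two-sample t statistic and the use of its absolute value are assumptions, as is the small `eps` used to decide that a modified score has "approached zero." All function names are hypothetical, and `welch_t` fails with ZeroDivisionError if both populations have zero variance.

```python
import math
from statistics import mean, variance


def impacted_score(column_mean, map_mean, unique_records_in_cluster, total_records):
    """Step a: (actual column mean - neural map mean) x unique records / total records."""
    return (column_mean - map_mean) * unique_records_in_cluster / total_records


def centroid_distance(source_centroid, target_centroid):
    """Step b.1: Pythagorean distance between two (row, col) centroid cells."""
    return math.hypot(source_centroid[0] - target_centroid[0],
                      source_centroid[1] - target_centroid[1])


def reduction_ratio(source, target):
    """|#target below source mean - #target above source mean| / #target."""
    m = mean(source)
    below = sum(1 for t in target if t < m)
    above = sum(1 for t in target if t > m)
    return abs(below - above) / len(target)


def welch_t(a, b):
    """Two-sample t statistic (Welch form); each sample needs >= 2 values."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def modified_t_score(source, target):
    """Step d: reduction ratio x |t| for population 1 (source) vs. 2 (target)."""
    return reduction_ratio(source, target) * abs(welch_t(source, target))


def rank_iv_maps(populations, eps=1e-9):
    """Step f: rank IVs by modified score; IVs whose modified score is ~0 are
    ranked afterwards by the ordinary |t| score.

    `populations` maps an IV name to (population_1_values, population_2_values).
    """
    scored = {name: (modified_t_score(s, t), abs(welch_t(s, t)))
              for name, (s, t) in populations.items()}
    primary = sorted((n for n in scored if scored[n][0] > eps),
                     key=lambda n: scored[n][0], reverse=True)
    fallback = sorted((n for n in scored if scored[n][0] <= eps),
                      key=lambda n: scored[n][1], reverse=True)
    return primary + fallback
```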

III. Generate the results and send them to other analysis methods
Select the top X% of IVs (those with the highest total scores; X is specified by the user in the configuration file). According to one or more embodiments of the present invention, the following automated results are generated for the user to review for each winning snapshot.
a. The winning IV neural maps are displayed. Each independent-variable SOM map is shown as a background map with the dependent variable's "hill" and "pond" clusters outlined on top, in a distinct outline color and with clear cluster labels. The map legend shows the three distinct colors (e.g., green, red, blue) together with the actual values of the color boundary thresholds.
b. The actual results of the run for this particular winning DV, i.e., how the IVs ranked against one another for the given selected DV.
c. A smaller data set is written that contains only the records making up the source and target regions. This smaller data set is the basis for further analysis by other data analysis methods. For example, to obtain an automated "question," this smaller data set, with the appropriate regions from the map matching run outlined, is fed back into the rule induction data analysis engine. These regions form the "question" that the rule induction analysis explains. Rule induction generates rules describing interactions between variables that have statistical validity; it searches the database for the hypotheses that best fit the generated question.
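Step III's selection of the top X% of IVs and the reduced data set handed to rule induction might be sketched as follows. Assumptions: total scores are held in a plain dict keyed by IV name, X% is rounded up so that at least one IV is kept, records are keyed by a record ID, and the population labels 1 and 2 from step c of section II serve as the class to be explained. Function names are hypothetical.

```python
import math


def select_top_ivs(total_scores, top_percent):
    """Return IV names whose total scores fall in the top `top_percent` percent."""
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be in (0, 100]")
    ranked = sorted(total_scores, key=total_scores.get, reverse=True)
    keep = max(1, math.ceil(len(ranked) * top_percent / 100))
    return ranked[:keep]


def reduced_dataset(records, source_ids, target_ids):
    """Keep only source/target records, each tagged with population 1 or 2."""
    subset = []
    for record_id, fields in records.items():
        if record_id in source_ids:
            subset.append({**fields, "population": 1})
        elif record_id in target_ids:
            subset.append({**fields, "population": 2})
    return subset
```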

IV. Repeat steps I through III for all DVs:
Repeat steps I through III for every DV specified by the user in the configuration file. Perform general housekeeping tasks, prepare the report of the automated map matching results, and send the results of these runs to other data analysis methods.

According to one or more embodiments of the present invention, the DataBrain module includes an inventive data mining algorithm application referred to below as "Pigin." For a targeted numerical variable, Pigin determines which other numerical variables in the data set contribute to (i.e., correlate with) that target variable. Pigin does not analyze categorical data (in that sense its scope is narrower than that of some other data mining algorithms), but it performs its analysis faster, and with more efficient memory use, than other standard data mining algorithms. The algorithm works on the targeted variable (i.e., the variable the data mining exercise is meant to explain, referred to below as the dependent variable, or "DV") and proceeds as follows.
Step 1: Treat the numerical distribution of the DV as a series of categories, based on a user-configurable parameter that determines how much data is placed in each category. Step 1 is shown in FIG. 18, which illustrates defining "virtual" categories from a numerical distribution.
Step 2: After the DV groups (or splits) have been defined in step 1, calculate a series of confidence distribution circles for each DV category, based on the data of each other numerical variable in the data set (referred to below as an independent variable, or "IV") that falls in that category.
Step 3: Based on the overall spread of each IV's confidence circles, assign that variable a diameter score and a gap score, which analysts later use to determine which IVs correlate most highly with the "targeted" DV. A high diameter score or gap score often indicates that the DV and IV are more strongly correlated. Steps 2 and 3 are shown in FIG. 19, which illustrates the calculation of these scores as [gap score = sum of all gaps (regions not inside any circle)] and [diameter score = average diameter of the three circles for the DV], where the DV categories are based on the numerical distribution of that DV. In summary, FIG. 19 is a confidence plot in which each diamond represents a population and the end points of each diamond define a circle plotted on the right side of the figure (these circles are referred to as "95% confidence circles").
Step 4: Iterate. After scores have been assigned to all IVs based on the DV definition from step 1, the DV is redefined to change the split definition slightly, and the scores of all IVs are recalculated for the new DV category definition. This refinement of the DV category definition continues until the number of iterations specified by the user in the analysis template is reached.
Step 5: Overall score. When all iterations are complete, there is a series of IV rankings based on the various DV definitions described in steps 1 and 4. These lists are merged to form a "master-ranked" list of the IVs most highly correlated with the target DV. Three factors are taken into account when calculating the master score of a given IV: the magnitude of its gap score, the magnitude of its diameter score, and the number of times the IV appears in the series of DV scoring lists. These three factors, combined with some basic "junk result" exclusion criteria, form the list of the IVs most highly correlated with a given targeted DV. This is shown in FIG. 20.
Although one or more of these embodiments have been described using a gap score and a diameter score for each IV encountered, embodiments of the present invention are not limited to these types of scores; further embodiments use other scoring functions to calculate the score of an IV.
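The Pigin steps can be sketched in Python as below. Several details are assumptions not fixed by the text: categories are equal-count rank splits; each 95% confidence circle is approximated by the normal interval mean ± 1.96·s/√n; the "slight redefinition" of step 4 is modeled by varying the number of categories; and the diameter score is taken as the top-to-bottom span of all circles, although the FIG. 19 description also phrases it as an average circle diameter. The "junk result" exclusion criteria are not specified and are omitted. All names are hypothetical.

```python
import math
from statistics import mean, stdev


def virtual_categories(dv, n_categories):
    """Step 1: label each record with one of n equal-count 'virtual' DV categories."""
    order = sorted(range(len(dv)), key=lambda i: dv[i])
    labels = [0] * len(dv)
    for rank, i in enumerate(order):
        labels[i] = rank * n_categories // len(dv)
    return labels


def confidence_interval(values, z=1.96):
    """Step 2: normal-approximation 95% interval for the mean (one 'circle')."""
    half_width = z * stdev(values) / math.sqrt(len(values))
    m = mean(values)
    return m - half_width, m + half_width


def gap_and_span(intervals):
    """Step 3: gap = uncovered length between intervals; span = lowest to highest."""
    ordered = sorted(intervals)
    gap, reach = 0.0, ordered[0][1]
    for low, high in ordered[1:]:
        if low > reach:
            gap += low - reach
        reach = max(reach, high)
    return gap, reach - ordered[0][0]


def pigin_scores(dv, ivs, category_counts=(3, 4, 5)):
    """Steps 2-5: score every IV under several DV splits and merge the results.

    Returns (iv_name, [total_gap, total_span, times_scored]) sorted best first.
    """
    totals = {name: [0.0, 0.0, 0] for name in ivs}
    for n in category_counts:  # step 4: each n is one redefinition of the DV split
        labels = virtual_categories(dv, n)
        for name, values in ivs.items():
            groups = [[v for v, c in zip(values, labels) if c == k] for k in range(n)]
            if any(len(g) < 2 for g in groups):
                continue  # too few records to form a confidence interval
            gap, span = gap_and_span([confidence_interval(g) for g in groups])
            totals[name][0] += gap
            totals[name][1] += span
            totals[name][2] += 1
    return sorted(totals.items(), key=lambda item: tuple(item[1]), reverse=True)
```

Sorting on the summed gap score first, then the summed span, then the appearance count is one simple way to merge the three factors named in step 5; the actual weighting is not given in the text.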

According to one or more embodiments of the present invention, the DataBrain module includes a correlation application (MahaCu) that correlates numerical data with categorical or attribute data (for example and without limitation, processing tool IDs). This application provides (a) fast statistical output ranked on qualitative rules, (b) ranked scoring based on the diameter score and/or gap score, (c) a scoring threshold used to exclude under-represented tool IDs, (d) the ability to select how many top "findings" are displayed, and (e) the ability to perform a reverse run, in which the "findings" (tool IDs) become the dependent variables and the parameters (numbers) they affect are displayed.

FIG. 21 shows an example of a subset of the data matrix that is input to the DataBrain module correlation application described above. In this example, end-of-line probe data (BIN) is shown on a per-lot basis, together with the processing tool ID (Eq_Id) and processing time (Trackin). Similar data matrices can be created on a wafer, site (reticle), or die basis.

FIG. 22 shows an example of a number (bin) versus category (tool ID) run. With the bin (number) as the dependent variable, the DataBrain module correlation application creates a similar plot for each Eq_Id in the data matrix. The width of each diamond in the left panel represents the number of lots run through the tool, and the diameter of each circle in the right panel represents the 95% confidence level.

To sort the many plots, the gap space between circles (i.e., the regions not enclosed by any circle) and the total distance from the top of the uppermost circle to the bottom of the lowest circle are used as part of the formulas for calculating what are called the "gap score" and the "diameter score." The DataBrain module correlation application sorts the plots in order of importance based on user-selectable relative weights that prioritize the score types.
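A minimal sketch of the weighted sort follows, assuming the user-selectable relative weights combine the two scores linearly (the text does not give the formula). `rank_plots` and its parameters are hypothetical names, and each plot is assumed to carry an already computed (gap score, diameter score) pair.

```python
def rank_plots(plots, gap_weight=0.5, diameter_weight=0.5):
    """Sort plot names by a user-weighted combination of gap and diameter scores.

    `plots` maps a plot name (e.g. an Eq_Id) to a (gap_score, diameter_score) pair.
    """
    def combined(item):
        gap, diameter = item[1]
        return gap_weight * gap + diameter_weight * diameter
    return [name for name, _ in sorted(plots.items(), key=combined, reverse=True)]
```

Setting one weight to zero ranks purely on the other score type, which is one way a user could "prioritize" a score type.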

According to another aspect of this embodiment of the invention, the data brain module correlation application described above applies a scoring threshold. Typically, many processing tools are available for a particular IC process layer, but only a subset of them is used on a regular basis. Rarely used tools often skew the data and can introduce unwanted noise during data processing. The correlation application can therefore use a user-defined scoring value to filter out under-represented tools before analysis. For example, if the scoring threshold is set to 90, XTOOL3 is filtered out, because XTOOL1 and XTOOL2, of the three tools shown in FIG. 23, together account for more than 90% of the lots.
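One possible reading of the scoring threshold is sketched below: tools are sorted by lot count and kept until their cumulative share of lots reaches the threshold percentage. The exact rule is not given in the text, so this cumulative-coverage interpretation, the function name, and the lot counts are assumptions; with these hypothetical counts it reproduces the FIG. 23 outcome in which XTOOL3 is dropped.

```python
from collections import Counter


def filter_underrepresented(tool_ids, threshold_pct=90.0):
    """Return the tool IDs kept under a cumulative-coverage scoring threshold.

    tool_ids: one entry per lot (the tool each lot ran on).
    Tools are taken from most to least used until they cover at least
    threshold_pct percent of all lots; the remaining tools are dropped.
    """
    counts = Counter(tool_ids)
    total = sum(counts.values())
    kept, covered = [], 0
    for tool, n_lots in counts.most_common():
        if covered * 100.0 >= threshold_pct * total:
            break  # threshold already reached; the rest are under-represented
        kept.append(tool)
        covered += n_lots
    return kept


lots = ["XTOOL1"] * 50 + ["XTOOL2"] * 45 + ["XTOOL3"] * 5  # hypothetical counts
print(filter_underrepresented(lots, threshold_pct=90))
```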

In accordance with one or more embodiments of the present invention, the data brain module correlation application described above provides a “number of top scores” option. With this feature, the user can set the maximum number of results displayed for each dependent variable. Thus, although the correlation application analyzes all independent variables, only as many plots as specified in the “number of top scores” field are displayed.
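Limiting the display to the top N findings amounts to a partial sort of each dependent variable's scores. A minimal sketch using Python's `heapq.nlargest`, with a hypothetical data layout, function name, and scores:

```python
import heapq


def top_findings(scores_by_dv, n_top):
    """Keep only the n_top highest-scoring independent variables per DV.

    scores_by_dv: {dependent_variable: {independent_variable: score}}.
    Every independent variable has already been scored; this step only
    limits what is displayed (hypothetical data layout).
    """
    return {
        dv: heapq.nlargest(n_top, scores.items(), key=lambda item: item[1])
        for dv, scores in scores_by_dv.items()
    }


scores = {"BIN1": {"Eq_Id_A": 4.8, "Eq_Id_B": 1.0, "Eq_Id_C": 3.2}}
print(top_findings(scores, n_top=2))
```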

In accordance with one or more embodiments of the present invention, the data brain module correlation application described above also performs a reverse run (reverse MahaCu), which makes a category (for example, but not limited to, a tool ID) the dependent variable and displays, in order of importance, the numeric parameters affected by that category (for example, but not limited to, bin, electrical test, and metrology data). The importance (score) is computed in the same way as in a number-versus-tool-ID run. These runs can be “daisy-chained”, so that tool IDs found during a normal run automatically become the dependent variables of a reverse run.
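The daisy-chained reverse run can be sketched as below. The patent computes the reverse-run score the same way as the forward run; to keep the example self-contained, a hypothetical stand-in score (the spread of per-category means) is used instead, and the function names, column names, and data are invented for illustration.

```python
from statistics import mean


def spread_of_group_means(categories, values):
    """Hypothetical stand-in score: max minus min of the per-category means."""
    groups = {}
    for category, value in zip(categories, values):
        groups.setdefault(category, []).append(value)
    group_means = [mean(v) for v in groups.values()]
    return max(group_means) - min(group_means)


def reverse_run(matrix, category_col, numeric_cols, score=spread_of_group_means):
    """Make category_col the dependent variable; rank numeric columns by score."""
    ranked = [(col, score(matrix[category_col], matrix[col])) for col in numeric_cols]
    return sorted(ranked, key=lambda kv: kv[1], reverse=True)


def daisy_chain(matrix, findings, numeric_cols):
    """Feed each tool-ID column found by a normal run into a reverse run."""
    return {col: reverse_run(matrix, col, numeric_cols) for col in findings}


matrix = {
    "Eq_Id_Etch": ["A", "A", "B", "B"],
    "BIN1": [90, 92, 70, 72],
    "Vt": [0.50, 0.52, 0.51, 0.49],
}
print(daisy_chain(matrix, ["Eq_Id_Etch"], ["BIN1", "Vt"]))
```

Passing a different `score` callable lets the same chaining logic reuse whatever scoring the forward run applies.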

In accordance with one or more embodiments, the present invention includes an application referred to as the defect brain module, which ranks defect issues based on a scoring technique. Before this analysis can be performed, however, the defect data must be formatted by the data conversion module 3020, as described below. FIG. 24 shows an example of a defect data file generated by, for example, a defect inspection tool or defect review tool in the factory. Such a file typically contains, for each defect on the wafer, its x and y coordinates, its x and y die coordinates, its size, its defect-type classification code, and image information. In accordance with one or more embodiments, the data conversion module 3020 converts this defect data file into a matrix of die-level sizing, classification (e.g., defect type), and defect density. FIG. 25 shows an example of a data matrix created by the data conversion algorithm. According to one embodiment of the present invention, the defect brain module is an automated defect data mining fault detection application that ranks defect issues based on a scoring technique. With this application, the impact of a particular size bin or defect type is quantified using a parameter referred to below as the “kill ratio”, defined as follows:
Kill ratio = (number of defective dies with the defect type) ÷ (total number of dies with the defect type)
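The die-level conversion and kill ratio can be sketched as follows. The record layout (wafer, die x, die y, defect type) is a hypothetical subset of the FIG. 24 fields, `die_is_good` stands in for the functional/nonfunctional die information in the matrix, and a die counts as “with the defect type” if it has at least one such defect; all names and data are illustrative.

```python
from collections import Counter, defaultdict


def die_level_matrix(defects):
    """Convert per-defect records into per-die counts by defect type.

    defects: iterable of (wafer_id, die_x, die_y, defect_type) records.
    Returns {(wafer_id, die_x, die_y): Counter({defect_type: count})}.
    """
    matrix = defaultdict(Counter)
    for wafer_id, die_x, die_y, defect_type in defects:
        matrix[(wafer_id, die_x, die_y)][defect_type] += 1
    return matrix


def kill_ratio(matrix, die_is_good, defect_type):
    """Bad dies with the defect type divided by all dies with the defect type."""
    dies_with_type = [die for die, counts in matrix.items() if counts[defect_type] > 0]
    if not dies_with_type:
        return 0.0
    bad_with_type = sum(1 for die in dies_with_type if not die_is_good[die])
    return bad_with_type / len(dies_with_type)


defects = [("W1", 1, 1, "microgauge"), ("W1", 1, 1, "microgauge"),
           ("W1", 2, 1, "microgauge"), ("W1", 3, 1, "particle"),
           ("W1", 4, 1, "microgauge")]
die_is_good = {("W1", 1, 1): False, ("W1", 2, 1): True, ("W1", 3, 1): False,
               ("W1", 4, 1): False, ("W1", 5, 1): True}
matrix = die_level_matrix(defects)
print(kill_ratio(matrix, die_is_good, "microgauge"))
```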

Another parameter that can also be used is % loss, defined as follows:
% loss = (number of defective dies with the defect type) ÷ (total number of defective dies)

A defective die, as used in the definitions above, is a nonfunctional die.
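% loss can be computed from the same kind of hypothetical inputs. The sketch below multiplies the ratio by 100 to express it as a percentage and treats every die marked nonfunctional in `die_is_good` as a defective die; the record layout and names are assumptions.

```python
def percent_loss(defects, die_is_good, defect_type):
    """Share of all defective dies that carry at least one defect of defect_type.

    defects: iterable of (wafer_id, die_x, die_y, defect_type) records.
    die_is_good: {(wafer_id, die_x, die_y): bool} for every tested die.
    """
    dies_with_type = {(w, x, y) for w, x, y, t in defects if t == defect_type}
    bad_dies = {die for die, good in die_is_good.items() if not good}
    if not bad_dies:
        return 0.0
    return 100.0 * len(dies_with_type & bad_dies) / len(bad_dies)


defects = [("W1", 1, 1, "microgauge"), ("W1", 1, 1, "microgauge"),
           ("W1", 2, 1, "microgauge"), ("W1", 3, 1, "particle"),
           ("W1", 4, 1, "microgauge")]
die_is_good = {("W1", 1, 1): False, ("W1", 2, 1): True, ("W1", 3, 1): False,
               ("W1", 4, 1): False, ("W1", 5, 1): True}
print(percent_loss(defects, die_is_good, "microgauge"))
```

Unlike the kill ratio, the denominator here is fixed (all defective dies), so the % loss values of mutually exclusive causes can be compared directly.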

FIG. 26 shows a typical output of the defect brain module application. In FIG. 26, the number of dies containing a particular defect type (microgauge in this example) is plotted against the number of defects of that type on the die. Because the data matrix contains functional (good) and nonfunctional (bad) die information, it is easy to determine which dies containing a given defect type are good and which are bad. FIG. 26 therefore plots the good and bad die frequencies, and graphs the ratio of bad dies to the total number of dies containing the defect (i.e., the kill ratio). The slopes of the graph segments are extracted and compared with the segment slopes from all other plots generated by the defect brain module application, and the plots are ranked from the highest slope to the lowest. The graph with the highest slope indicates the greatest impact on yield and is therefore the most valuable to yield enhancement engineers.

An important feature of these plots is the ability of the defect brain module application to adjust the maximum number of “number of defects” bins on the x-axis. Without this capability, the slope ranking would be distorted when a die carries an abnormal number of defects, as with nuisance or false defects.
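The slope ranking and the bin cap can be sketched together. The text does not say how the segment slope is extracted, so an ordinary least-squares fit of kill ratio against the capped defect count is assumed here; the input layout, function names, and data are hypothetical.

```python
from collections import defaultdict


def kill_ratio_curve(dies, max_bin):
    """Kill ratio per "number of defects" bin for one defect type.

    dies: list of (n_defects_of_type, is_good) for every die that has at
    least one defect of this type. Counts above max_bin are folded into
    the last bin so that a die with an abnormal number of (e.g. nuisance)
    defects cannot dominate the slope.
    """
    good, bad = defaultdict(int), defaultdict(int)
    for n_defects, is_good in dies:
        k = min(n_defects, max_bin)
        if is_good:
            good[k] += 1
        else:
            bad[k] += 1
    bins = sorted(set(good) | set(bad))
    return [(k, bad[k] / (good[k] + bad[k])) for k in bins]


def least_squares_slope(points):
    """Ordinary least-squares slope of y on x; 0.0 if fewer than two points."""
    if len(points) < 2:
        return 0.0
    mx = sum(x for x, _ in points) / len(points)
    my = sum(y for _, y in points) / len(points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return sxy / sxx


def rank_defect_types(per_type, max_bin=5):
    """Rank defect types from the steepest to the flattest kill-ratio slope."""
    slopes = {t: least_squares_slope(kill_ratio_curve(dies, max_bin))
              for t, dies in per_type.items()}
    return sorted(slopes.items(), key=lambda kv: kv[1], reverse=True)


per_type = {
    "microgauge": [(1, True), (1, True), (1, False), (2, False), (2, True),
                   (3, False), (40, False)],
    "particle": [(1, True), (1, False), (2, True), (2, False)],
}
print(rank_defect_types(per_type, max_bin=3))
```

With `max_bin=3`, the die carrying 40 microgauge defects is folded into the last bin instead of stretching the x-axis, which would otherwise flatten the fitted slope.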

In accordance with one or more embodiments of the present invention, the data brain module uses utilities such as a data cleaner, a data conversion program, and a data filter (each, for example, a Perl and C++ software application written in any of a number of ways known in the art). Data cleaning, data conversion, and/or data filtering can be performed according to the criteria described in the configuration file section, or according to ad hoc criteria entered by the user. In accordance with one or more embodiments of the present invention, the data brain module is a software application that runs on a PC server and is coded in C++ and SOM in any of a number of ways known in the art.

In accordance with one or more embodiments of the present invention, the output of the data brain engine module 3080 is the results database 3090, implemented as a Microsoft FoxPro™ database. Further, in accordance with one or more embodiments, the WEB commander module 3070 is secure ftp transmission software, written in any of a number of ways known in the art, that users or clients can use to send data to the data brain engine module 3080 for analysis.

The results of the data mining process described above are often expressed as Boolean rules that answer questions posed to the data mining algorithm (as in rule induction), or as a relative ranking or statistical contribution of variables that are targeted or designated as “important” by a template in the configuration file. Depending on which data mining algorithm is used and on the data type of the “results” it produces (i.e., numeric or categorical variables), each automated data mining analysis run can be accompanied by a predefined, user-definable set of statistical output graphs. In accordance with one or more embodiments of the present invention, this automated output can also be accompanied by the “raw” data matrix used for the first data mining pass and/or by a smaller “results” data set containing only the data columns that make up the “results” of the complete data mining process. After the automated data mining analysis run is complete, all of this information is stored in the results database 3090.

Results Distribution: As further shown in FIG. 3, in accordance with one or more embodiments of the present invention, the WEB visualization module 3100 accesses the results database 3090 created by the data brain engine module 3080 and runs a graphics and analysis engine 3110 that generates, for example, but not limited to, HTML reports stored in the WEB server database 3120. In accordance with one or more embodiments, a user can access the WEB server database 3120, for example, but not limited to, with a web browser on a PC, so that reports can be delivered in any of a number of ways known in the art. The WEB visualization module 3100 provides repeated reporting of results, browser-enabled charts and reports, generation of PowerPoint files for export, creation and modification of configuration files, account management, e-mail notification of results, and multi-user access for information sharing. Further, the WEB visualization module 3100 enables multiple users to create online collaborative Microsoft PowerPoint (and/or Word) documents that multiple users with sufficient security access can view and modify. In accordance with one or more embodiments, the WEB visualization module 3100 is a software application that runs on a PC server and is coded using Java Applets, Microsoft Active Server Pages (ASP) code, and XML.
Each of the following modules of the WEB visualization module 3100 can be, for example, a software application that runs on a PC server and is coded in Microsoft ASP web code in any of a number of ways known in the art. An administration module allows new users to be set up (for example, but not limited to, by specifying secured access to various system functions) and enables user privileges (including, for example, but not limited to, access to data analysis results and configuration file setup). A job viewer module allows users to view analysis results and create reports. A charting module allows users to create ad hoc charts with their web browsers. A join and cube module allows users to combine data sets before data mining and/or hypercube formation. A filter module allows users to filter the data collected in a hypercube, according to user-specified criteria, before data mining is performed on it. An online data tool module allows users to perform ad hoc data mining with their web browsers. In accordance with one or more embodiments, a user can also write a configuration file that causes the WEB visualization module 3100 to prepare statistical process control (“SPC”) charts, allowing the user to track a given data matrix with a web browser.
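The text does not say how the SPC charts compute their limits. A common textbook choice is a Shewhart chart with the center line at the baseline mean and limits at ±3σ; the sketch below uses the sample standard deviation for brevity (individuals charts often estimate σ from the average moving range instead). The function names and data are hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (LCL, center line, UCL) estimated from a baseline sample."""
    center = mean(baseline)
    sigma = stdev(baseline)
    return center - k * sigma, center, center + k * sigma


def out_of_control(series, limits):
    """Indices of points falling outside the control limits."""
    lcl, _center, ucl = limits
    return [i for i, value in enumerate(series) if value < lcl or value > ucl]


baseline = [90.0, 92.0, 91.0, 89.0, 93.0, 91.0]  # hypothetical lot yields
limits = control_limits(baseline)
print(limits, out_of_control([91.0, 96.0, 88.0, 85.0], limits))
```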

Those skilled in the art will appreciate that the foregoing description is merely illustrative and is not intended to be exhaustive or to limit the invention to the precise forms described. For example, although certain dimensions have been described, they are merely examples: various designs can be implemented using the embodiments described above, and the actual dimensions of those designs will be determined by circuit requirements.

FIG. 1 illustrates a yield analysis tool infrastructure that exists in a prior art integrated circuit (“IC”) manufacturing or assembly plant (“semiconductor factory” or “factory”).
FIG. 2 illustrates a prior art process used in a factory (referred to herein as end-of-line monitoring).
FIG. 3 illustrates a factory data analysis system constructed in accordance with one or more embodiments of the present invention, and the automated flow of data from raw, unformatted input to data mining results when those embodiments are applied to an IC manufacturing process.
FIG. 4 illustrates the logical data flow of a method for intelligently structuring unstructured data events in accordance with one or more embodiments of the present invention.
FIG. 5 illustrates an example of raw time-based data, specifically a graph of processing tool beam current as a function of time.
FIG. 6 illustrates how the raw time-based data shown in FIG. 5 is divided into segments.
FIG. 7 illustrates the raw time-based data associated with segment 1 of FIG. 6.
FIG. 8 illustrates an example of the dependence of the Y range in segment 7 on bin_S.
FIG. 9 illustrates a three-level branched data mining run.
FIG. 10 illustrates the distribution queue creation performed by the data brain command center application in accordance with one or more embodiments of the present invention.
FIG. 11 illustrates the analysis template user interface portion of a user edit and configuration file interface module constructed in accordance with one or more embodiments of the present invention.
FIG. 12 illustrates the analysis template portion of a configuration file constructed in accordance with one or more embodiments of the present invention.
FIG. 13 illustrates a hyper-pyramid cube structure.
FIG. 14 illustrates a hyper-pyramid cube with one layer highlighted.
FIG. 15 illustrates a hypercube layer (self-organizing map) from a hypercube extracted from the second layer of the hyper-pyramid cube.
FIG. 16 illustrates a self-organizing map having high, low, and central regions, in which the high-cluster and low-cluster regions are each tagged for future automated map matching analysis.
FIG. 17 illustrates cell projection through a hypercube.
FIG. 18 illustrates defining a “virtual” category from a numerical distribution.
FIG. 19 illustrates the calculation of the gap score (gap score = sum of all gaps not within any circle) and the diameter score (diameter score = mean DV diameter of the three circles) when the DV categories are based on the numerical distribution of the DV.
FIG. 20 illustrates the calculation of a master score for a given IV, taking into account three factors: the magnitude of the gap score, the magnitude of the diameter score, and the number of times the IV appears across the series of DV scoring lists.
FIG. 21 illustrates an example of a subset of a data matrix that is input to the data brain module.
FIG. 22 illustrates an example of a number (bin) versus category (tool ID) run.
FIG. 23 illustrates the use of a scoring threshold for three tools.
FIG. 24 illustrates an example of a defect data file generated by a defect inspection tool or defect review tool in a factory.
FIG. 25 illustrates an example of a data matrix created by the data conversion algorithm.
FIG. 26 illustrates a typical output of the defect brain module.

Claims

A method of data mining information obtained at an integrated circuit manufacturing factory (“factory”),
(A) collecting data from one or more of systems, tools, and databases that generate data in the factory or collect data from the factory;
(B) formatting the data and storing the formatted data in a source database;
(C) extracting a portion of the data for use in data mining according to a configuration file specified by the user;
(D) data mining the extracted portion of the data in response to an analysis configuration file specified by the user;
(E) storing the result of the data mining in a result database;
(F) providing access to the results;
A method comprising the steps of:

The method of claim 1, wherein collecting the data comprises collecting in real time through a network on demand or on a scheduled basis.

The method of claim 1, wherein providing access comprises providing access through a network.

The method of claim 3, wherein providing access through the network includes providing access using a browser.

The method of claim 1, wherein collecting the data occurs in real time, and formatting the data and storing the formatted data in a source database occur in real time.

The method of claim 1, wherein collecting the data includes collecting data transmitted in a customer-specified and/or tool-specified format.

The method of claim 1, wherein collecting the data comprises collecting encrypted data.

The step of formatting the data includes formatting the data according to a leveling scheme comprising a widget ID level, a “Where?” level, a “When?” level, a “What?” level, and a value level.

The method of claim 8, wherein the widget ID is identified by one or more of a lot ID, a wafer ID, a slot ID, a reticle ID, a die ID, and sub-die x, y Cartesian coordinates.

The method of claim 8, wherein “Where?” is identified by one or more of process flow/assembly line manufacturing steps and substeps.

The method of claim 8, wherein “When?” is identified by one or more measurement dates/times.

The method of claim 1, wherein the formatting step includes converting time-based operational state data, generated by a processing tool in the factory during wafer processing, into key integrated circuit specific statistics according to a configuration file.

The method of claim 1, wherein the storing step further includes extracting data from the source database and storing the extracted data in a hybrid database comprising a relational database component and a file system component.

The method of claim 13, wherein the relational database component uses a hash index algorithm to create access keys to discrete data stored in the file system component.

The method of claim 14, wherein the extracting step comprises using a hash and join algorithm to accumulate data from the hybrid database.

The method of claim 14, wherein the extracting step includes obtaining a hypercube definition using the configuration file, creating a vector cache definition using the hypercube definition, and creating a vector cache of information.

The method of claim 16, wherein creating the vector cache of information comprises: (a) retrieving, from the relational database component using a hash index key, a list of files and data elements identified by the vector cache definition; (b) retrieving the files from the file system component; and (c) grouping the vector cache with the data elements identified in the vector cache definition.

The method of claim 17, wherein the extracting step further comprises generating a hypercube from the vector cache information using the hypercube definition.

The method of claim 1, wherein the data mining step comprises using one or more of self-organizing map data mining, rule-induction data mining, data mining that correlates numeric data with categorical or attribute data, and data mining that correlates categorical or attribute data with numeric data.

The method of claim 1, wherein the data mining step comprises: performing self-organizing map (SOM) data mining to form clusters; performing map matching analysis on the output of the SOM data mining; generating rule descriptions of the clusters by rule-induction data mining of the output of the SOM data mining analysis; correlating categorical data with numeric data of the output from the rule-induction data mining; and correlating numeric data with categorical data of the output from the map matching data mining.

The method of claim 20, wherein the SOM data mining step automatically organizes the data and identifies separate, dominant data clusters that represent different “factory issues” within a data set, and wherein, in the map matching analysis, a “variable” is described on a cluster-by-cluster basis by historical data known to impact the behavior of the “interesting” variable.