KR102182678B1

KR102182678B1 - Method and appratus for predicting fault pattern using multi-classifier based on feature selection method in semiconductor manufacturing process

Info

Publication number: KR102182678B1
Application number: KR1020190004476A
Authority: KR
Inventors: 한영신
Original assignee: 인하대학교 산학협력단
Priority date: 2019-01-14
Filing date: 2019-01-14
Publication date: 2020-11-24
Also published as: KR20200088012A

Abstract

According to various embodiments, an apparatus and method for predicting defect patterns in a semiconductor manufacturing process, using multiple classifiers built on feature selection techniques, may be configured to collect a data set from a semiconductor being manufactured, select a plurality of features from the data set, predict whether the semiconductor is defective with a plurality of classifiers based on those features, and determine whether the semiconductor is defective by combining the prediction results output from the classifiers.

Description

A device and method for predicting a defect pattern using a multi-classifier according to a feature selection technique in the semiconductor manufacturing process {METHOD AND APPRATUS FOR PREDICTING FAULT PATTERN USING MULTI-CLASSIFIER BASED ON FEATURE SELECTION METHOD IN SEMICONDUCTOR MANUFACTURING PROCESS}

Various embodiments relate to an apparatus and method for predicting defect patterns in a semiconductor manufacturing process using multiple classifiers based on feature selection techniques.

Semiconductor manufacturing proceeds in the order of the FAB (wafer fabrication) process, the probe test process, the assembly process, and the package test process. The FAB process builds up layers on the wafer surface to form hundreds of chips. The wafer probe test process, performed after the FAB process, applies electrical stimuli to the chips on the wafer, checks whether they function normally, and determines pass/fail. Current practice predicts semiconductor yield by focusing on the FAB and probe test processes.

However, as semiconductor manufacturing technology advances and the number of chips on a wafer grows, this testing consumes more and more time and cost. Research is therefore needed in the semiconductor industry to reduce time and cost by predicting the final test yield. A complex wafer fabrication process can introduce defects that cause production of the final product to fail. Fault detection and classification techniques are therefore needed in the manufacturing process, and predicting defect patterns before the product is finally produced can improve the quality and reliability of semiconductors.

According to various embodiments, an apparatus for predicting defect patterns in a semiconductor manufacturing process using multiple classifiers based on feature selection techniques may include a data collection unit that collects a data set from a semiconductor being manufactured, a plurality of feature selection units that select a plurality of features from the data set, a plurality of classifiers that predict whether the semiconductor is defective based on the features, and a determination unit that determines whether the semiconductor is defective by combining the prediction results output from the classifiers.

According to various embodiments, a method for predicting defect patterns in a semiconductor manufacturing process using multiple classifiers based on feature selection techniques may include collecting a data set from a semiconductor being manufactured, selecting a plurality of features from the data set, predicting whether the semiconductor is defective based on the features using a plurality of classifiers, and determining whether the semiconductor is defective by combining the prediction results output from the classifiers.

According to various embodiments, the defect pattern prediction apparatus can predict defect patterns more effectively by using a plurality of classifiers. That is, because the apparatus predicts whether a semiconductor is defective with a plurality of classifiers and determines the final result by combining the prediction results output from those classifiers, the accuracy and reliability of defect pattern prediction can be improved.

FIG. 1 is a diagram illustrating an apparatus for predicting a defect pattern according to various embodiments.
FIG. 2 is a diagram illustrating a method for predicting a defect pattern according to various embodiments.
FIG. 3 is a diagram illustrating the data preprocessing step of FIG. 2.

Hereinafter, various embodiments of this document are described with reference to the accompanying drawings.

The various embodiments of this document and the terms used in them are not intended to limit the technology described here to a specific embodiment, and should be understood to include various modifications, equivalents, and/or substitutes of the embodiments. In the description of the drawings, similar reference numerals may be used for similar elements. A singular expression may include the plural unless the context clearly indicates otherwise. In this document, expressions such as "A or B", "at least one of A and/or B", "A, B or C", and "at least one of A, B and/or C" may include all possible combinations of the listed items. Expressions such as "first" and "second" may modify the corresponding elements regardless of order or importance; they are used only to distinguish one element from another and do not limit the elements. When an element (e.g., a first element) is said to be "(functionally or communicatively) connected" or "coupled" to another element (e.g., a second element), it may be connected to the other element directly or through yet another element (e.g., a third element).

The term "module" used in this document includes a unit composed of hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic block, component, or circuit. A module may be an integrally formed component, or a minimum unit, or part of one, that performs one or more functions. For example, a module may be implemented as an application-specific integrated circuit (ASIC).

Defect pattern prediction in semiconductor manufacturing is important because applying machine learning and data mining classification techniques to a complex manufacturing process can reduce time and cost. Many studies have addressed it. A.M. Ison proposed a decision-tree-based defect pattern prediction model that detected faults in a plasma etch tool by analyzing data from five sensors: optical emission data acquired from the plasma was converted into discrete data through data analysis, and a decision tree was applied to build a classification model. The researcher He proposed a defect pattern prediction method based on the k-nearest neighbor (KNN) rule; because semiconductor process data contain features that are unnecessary for fault detection, the KNN rule was used to remove them and obtain the main features. Tafazzoli proposed a pattern verification process using SVMs, combining three different SVMs (SVM-Linear, SVM-RBF, and SVM-Poly) into a single combined SVM classifier; combining different classifiers into one yielded a low classification error rate. Kittisak proposed the MeanDiff method for selecting among the 590 features collected by sensors in the FAB process, and built classifiers from the selected features using a decision tree and boosting to improve defect pattern prediction performance.

Because defect pattern prediction strongly affects cost and quality, it requires highly accurate classifiers. Both the way features are selected and the way several classifiers are combined therefore matter. Various embodiments propose a method that builds classifiers according to several feature selection techniques and combines the output data of those classifiers using DS (Dempster-Shafer) theory.

Feature selection techniques extract the necessary features according to the characteristics of each algorithm, but they differ in how they pick patterns that are useful for classification. In other words, the performance of a feature selection technique varies widely with the input data, because some features discriminate well only on particular data. Predicting defect patterns therefore calls for a combination of several feature selection techniques. When several classifiers are trained on the outputs of different feature selection techniques, their predictions for the same input differ, so a method for combining them is needed. Several techniques exist for combining output information into a final decision. DS expresses the degree of confidence as an interval and, as in probability theory, combines information over a set of mutually exclusive hypotheses. This makes it possible to provide a final prediction from the data output by several classifiers.

DS (Dempster-Shafer theory) is a mathematical theory, proposed by Arthur Dempster and Glenn Shafer, for handling uncertain and imprecise problems. Using belief values and plausibility values over a data set, DS provides effective tools such as evidential intervals. DS expresses the degree of confidence as an interval and, as in probability theory, sets up a set of mutually exclusive hypotheses. The set of objects is called the environment and is denoted θ. θ may have several elements, as in θ = {θ₁, θ₂, θ₃, ..., θ_n}, in which case the number of its subsets is 2^n. When θ has only one element, it is called a frame of discernment. The collection of the 2^n subsets is called the power set and is denoted Θ. The degree to which a subset in Θ is supported by a piece of evidence is given by the basic probability assignment function m, which can be expressed as in [Equation 1] below. As in [Equation 2] below, m maps the empty set to a probability of 0, and the values of m over all subsets in Θ sum to 1.
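The equations referenced here are not reproduced in this text. As a sketch only, under the standard Dempster-Shafer formulation and with the notation used above (Θ for the power set of θ, A for a subset, which is a symbol introduced here for illustration), [Equation 1] and [Equation 2] would plausibly read:

```latex
m : \Theta \to [0, 1]
\qquad\text{and}\qquad
m(\emptyset) = 0, \quad \sum_{A \in \Theta} m(A) = 1
```

For the two-hypothesis case used later (θ = {Pass, Fail}), Θ has 2² = 4 members: ∅, {Pass}, {Fail}, and {Pass, Fail}.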

Given the evidence, the belief value Belief(H) for an arbitrary hypothesis H is given by [Equation 3] below.
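As a hedged reconstruction, the standard Dempster-Shafer definition of belief sums the mass of every subset contained in the hypothesis; whether the patent's [Equation 3] uses exactly this typography is an assumption:

```latex
\mathrm{Bel}(H) = \sum_{A \subseteq H} m(A)
```

For a singleton hypothesis such as H = {Fail}, the only non-empty subset is H itself, so Bel({Fail}) = m({Fail}).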

As in [Equation 4] below, the degree of confidence is determined by the reliability of the given evidence and the influence of the overall environment, and the ratio of this degree is denoted e.

Here, r is a value between 0 and 1 (0 ≤ r ≤ 1); r = 0 means true and r = 1 means false. DS computes a new belief value by fusing different pieces of evidence. The fusion of evidence can therefore be expressed as in [Equation 5] below; under the condition given with that equation, the belief value of the fusion of the two pieces of evidence is 0.

DS expresses the confidence measure for H not as the single value Bel(H) but as an interval [Bel(H), Pls(H)], called the evidential interval. Plausibility (Pls) is the extent to which the hypothesis is not refuted by the evidence (the interval that remains once the true and false intervals are excluded) and represents the maximum degree to which the hypothesis can be trusted. Bel ranges from 0 to 1 (the range between false and true), and Pls can be defined as in [Equation 6] below, also taking values in [0, 1]. Like belief values, plausibility values obtained from multiple pieces of evidence can be fused.
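For orientation, the textbook forms of evidence fusion (Dempster's rule of combination) and of plausibility are sketched below. They are offered as the likely content of [Equation 5] and [Equation 6], not as a transcription: the symbols A, B, and K are introduced here, and whether the patent normalizes by 1 − K is an assumption.

```latex
(m_1 \oplus m_2)(H) = \frac{1}{1 - K} \sum_{A \cap B = H} m_1(A)\, m_2(B),
\qquad
K = \sum_{A \cap B = \emptyset} m_1(A)\, m_2(B),
\qquad H \neq \emptyset

\mathrm{Pls}(H) = 1 - \mathrm{Bel}(\bar{H}) = \sum_{A \cap H \neq \emptyset} m(A)
```

Pairs of subsets with an empty intersection contribute nothing to any hypothesis; their product mass is collected in the conflict term K. Since every subset of H also intersects H, Bel(H) ≤ Pls(H), which is what makes [Bel(H), Pls(H)] a well-formed interval.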

FIG. 1 is a diagram illustrating an apparatus for predicting a defect pattern according to various embodiments.

Referring to FIG. 1, the defect pattern prediction apparatus 100 according to various embodiments may include a data collection unit 110, a plurality of feature selection units 120, a plurality of classifiers 130, and a determination unit 140.

The data collection unit 110 may collect a data set from a semiconductor being manufactured.

The feature selection units 120 may select, from the data set, a plurality of features related to defect patterns. The feature selection units 120 correspond one-to-one to the classifiers 130 and extract the defect-related features for them. Each feature selection unit 120 may use its own feature selection method. For example, the feature selection methods may include at least one of Correlation-based Feature Selection (CFS), Symmetrical Uncertainty (SU), Information Gain (IG), or Combination Features (CF). CFS analyzes the correlation between two variables using Pearson's correlation coefficient and extracts the highest-ranked correlated features. SU measures the symmetric uncertainty of a feature with respect to another set of attributes and uses it to evaluate the worth of the feature. IG selects features by comparing the average amount of information between the target class and the input features, and is also the criterion used in the C4.5 decision tree. CF selects the features that the other feature selection methods have in common.
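To make the IG and SU criteria concrete, here is a minimal sketch for discrete-valued features. It uses the standard definitions IG = H(class) − H(class | feature) and SU = 2·IG / (H(feature) + H(class)); the function names and the toy data are illustrative assumptions, not taken from the patent, and continuous sensor readings would first have to be discretized.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in Counter(values).values())


def information_gain(feature, labels):
    """IG(class; feature) = H(class) - H(class | feature)."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional


def symmetrical_uncertainty(feature, labels):
    """SU = 2 * IG / (H(feature) + H(class)), a normalized IG in [0, 1]."""
    denom = entropy(feature) + entropy(labels)
    return 0.0 if denom == 0 else 2.0 * information_gain(feature, labels) / denom


# Toy data (hypothetical): 1 = fail, -1 = pass, as in the SECOM label encoding.
labels = [1, 1, -1, -1]
informative = ["hi", "hi", "lo", "lo"]  # separates the two classes perfectly
useless = ["hi", "lo", "hi", "lo"]      # carries no information about the class

print(information_gain(informative, labels), symmetrical_uncertainty(informative, labels))
print(information_gain(useless, labels), symmetrical_uncertainty(useless, labels))
```

Ranking features by either score and keeping the top entries gives one feature subset per method. Because SU is a rescaled IG, the two rankings tend to agree closely.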

The classifiers 130 may classify the semiconductor based on the features selected by the feature selection units 120. Each classifier 130 may predict probability values for pass and fail. For example, the classifiers 130 may include at least one of Naive Bayesian (NB), Decision Tree C4.5, Support Vector Machine (SVM), Back Propagation Network (BPN), or Random Forest (RF).

The determination unit 140 may determine the state of the semiconductor by combining the prediction results of the classifiers 130. Specifically, it may combine the prediction results of the classifiers 130 using DS (Dempster-Shafer) theory to decide pass/fail. For example, the determination unit 140 may compare the combined probability values for pass and fail and select the outcome with the higher value.

FIG. 2 is a diagram illustrating a method for predicting a defect pattern according to various embodiments.

Referring to FIG. 2, in step 210 the defect pattern prediction apparatus 100 may collect a data set from a semiconductor being manufactured. For example, the data set may be the SECOM (SEmiCOnductor Manufacturing) data set: FAB data collected from semiconductors through 590 sensors in the manufacturing process, comprising 1567 records and 590 features. Of the 1567 records, 104 are fail (defect pattern) records, encoded as 1, and 1463 are pass (normal pattern) records, encoded as -1.

In step 220, the defect pattern prediction apparatus 100 may preprocess the data set, extracting a plurality of defect-related features for the plurality of classifiers 130. For example, it may extract the features related to defect patterns from the 590 features of the SECOM data set. The pass/fail imbalance makes the SECOM data set very difficult to analyze accurately, so accurate pass/fail classification depends on extracting the defect-related features from among the 590. To this end, the apparatus 100 may preprocess the 1567 records and 590 features of the SECOM data set before analysis.

FIG. 3 is a diagram illustrating the data preprocessing step of FIG. 2.

Referring to FIG. 3, in step 310 the defect pattern prediction apparatus 100 may clean the data set. Data cleaning removes unnecessary data from the data set. For example, the apparatus 100 may clean the data according to two rules. Under the first rule, it removes records that contain missing values. Under the second rule, it removes features whose data contain missing values, are 'Not available', or consist of a single value only. As a result of data cleaning, 48 records may be removed, leaving 1,519 records, and 281 features may be removed, leaving 309 features.
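The two cleaning rules can be sketched as follows. The text does not say in which order the rules are applied or how many missing values a feature may contain before it is dropped, so both choices here are assumptions: features are filtered first using a hypothetical `max_missing_ratio` threshold, and any record that still contains a missing value is then removed. Missing values are represented as `None`.

```python
def clean(records, max_missing_ratio=0.5):
    """Apply the two cleaning rules to a list of equal-length rows.

    `max_missing_ratio` is a hypothetical parameter: a feature is dropped when
    more than this fraction of its values is missing, or when all of its
    present values are identical (single-value feature).
    """
    n_rows = len(records)
    n_cols = len(records[0])
    keep_cols = []
    for j in range(n_cols):
        present = [row[j] for row in records if row[j] is not None]
        missing_ratio = 1 - len(present) / n_rows
        if missing_ratio > max_missing_ratio:
            continue  # rule 2: feature dominated by missing values
        if len(set(present)) <= 1:
            continue  # rule 2: feature with a single value
        keep_cols.append(j)
    # Rule 1: drop records that still contain a missing value.
    projected = [[row[j] for j in keep_cols] for row in records]
    cleaned = [row for row in projected if None not in row]
    return cleaned, keep_cols


# Hypothetical 4-record, 4-feature example.
rows = [
    [1.0, 5.0, None, 0.3],
    [2.0, 5.0, None, None],
    [3.0, 5.0, None, 0.1],
    [4.0, 5.0, 7.0, 0.2],
]
cleaned_rows, kept = clean(rows)
print(kept)          # indices of the surviving features
print(cleaned_rows)  # surviving records, restricted to those features
```

In this toy input the constant second feature and the mostly missing third feature are dropped, after which one record with a remaining gap is removed.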

In step 320, the defect pattern prediction apparatus 100 may select a plurality of features from the data set, based on the result of data cleaning and using a plurality of feature selection methods. For example, it may extract, from the 309 features that remain after data cleaning, those that can distinguish pass from fail. As one example, the apparatus 100 may use four feature selection methods, three of which are in common use. The feature selection methods may include at least one of Correlation-based Feature Selection (CFS), Symmetrical Uncertainty (SU), Information Gain (IG), or Combination Features (CF). CFS analyzes the correlation between two variables using Pearson's correlation coefficient and extracts the highest-ranked correlated features. SU measures the symmetric uncertainty of a feature with respect to another set of attributes and uses it to evaluate the worth of the feature. IG selects features by comparing the average amount of information between the target class and the input features, and is also the criterion used in the C4.5 decision tree. CF selects the features that the results of the other feature selection methods have in common. In this way, the apparatus 100 may select the features shown in [Table 1] below from the 1519 records and 309 features that remain after data cleaning. Because SU and IG select the same features, either one of the two may be used. With CF, the apparatus 100 may select the 12 features that CFS, SU, and IG have in common.
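The CF step amounts to a set intersection over the feature lists produced by the other methods. The sketch below illustrates this with made-up feature names; the actual features in [Table 1] are not reproduced here.

```python
def combination_features(*selections):
    """Return the features that appear in every selection, in sorted order."""
    common = set(selections[0])
    for selection in selections[1:]:
        common &= set(selection)
    return sorted(common)


# Hypothetical outputs of three selectors (feature names are placeholders).
cfs_selected = ["f021", "f059", "f103", "f348"]
su_selected = ["f059", "f103", "f210", "f348"]
ig_selected = ["f059", "f103", "f210", "f348"]

cf_selected = combination_features(cfs_selected, su_selected, ig_selected)
print(cf_selected)
```

A feature survives only if every method ranks it as relevant, so CF yields the smallest and most conservative subset.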

In step 330, the defect pattern prediction apparatus 100 may scale the features selected from the data set. Scaling may be needed because the value ranges of the individual features differ. Here, scaling normalizes each set of values to the range [0, 1]. When each classifier is then trained, this reduces bias toward particular features and can improve execution speed. For example, the apparatus 100 may scale the features based on [Equation 7] below.

Here, x denotes an input value, x' denotes the normalized value, and Average(X), Min(X), and Max(X) denote the mean, minimum, and maximum of the input data X for the corresponding feature.
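[Equation 7] itself is not reproduced in this text. Two common normalizations are consistent with the symbols listed above, and the sketch below shows both without claiming which one the patent uses: mean normalization, x' = (x − Average(X)) / (Max(X) − Min(X)), which uses all three quantities but yields values in [-1, 1], and min-max scaling, x' = (x − Min(X)) / (Max(X) − Min(X)), which yields the stated [0, 1] range. The function names are illustrative.

```python
def mean_normalize(values):
    """x' = (x - mean) / (max - min); results lie in [-1, 1] and average to 0."""
    span = max(values) - min(values)
    if span == 0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    mean = sum(values) / len(values)
    return [(x - mean) / span for x in values]


def min_max_scale(values):
    """x' = (x - min) / (max - min); results lie in [0, 1]."""
    lo = min(values)
    span = max(values) - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(x - lo) / span for x in values]


sensor = [2.0, 4.0, 6.0]  # hypothetical readings of one feature
print(mean_normalize(sensor))
print(min_max_scale(sensor))
```

Either transform is applied per feature, so that a sensor reporting values in the thousands does not dominate one reporting fractions.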

In step 340, the defect pattern prediction apparatus 100 may perform oversampling. In semiconductor manufacturing, the amount of data on defect patterns is generally very small. A classifier trained on such data therefore tends to predict only the majority class (normal patterns) and fails to predict the minority class (defect patterns), so the minority class needs to be oversampled. For example, the data may be split 70%/30% for the experiment. The 70% portion of the SECOM data consists of 1051 records, of which 1009 are normal patterns and 42 are defect patterns, a ratio of about 24:1. For this reason, the apparatus 100 may generate additional defect pattern data using SMOTE (synthetic minority over-sampling technique). For example, the apparatus 100 may oversample the data so that the ratio of normal patterns to defect patterns becomes 5:5.
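The core idea of SMOTE is to create each synthetic minority sample on the line segment between a real minority sample and one of its nearest minority neighbours. The following is a simplified sketch of that idea, not the patent's implementation; the function name, the default `k`, and the toy points are assumptions.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority neighbours (squared Euclidean)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical scaled defect samples with two features each.
defects = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_defects = smote(defects, n_new=5, k=2)
print(len(new_defects))
```

With the counts given above, reaching a 1:1 ratio from 1009 normal and 42 defect records would require 1009 − 42 = 967 synthetic defect samples. Oversampling is normally applied to the training portion only, so that evaluation still reflects the real class distribution.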

In step 230, the defect pattern prediction apparatus 100 may classify the semiconductor being manufactured using the plurality of classifiers 130. As a DS (Dempster-Shafer)-based multi-classifier, the apparatus 100 trains a plurality of classifiers, each corresponding to one feature selection method, and fuses the results they predict. For example, the classifiers may include at least one of Naive Bayesian (NB), Decision Tree C4.5, Support Vector Machine (SVM), Back Propagation Network (BPN), or Random Forest (RF). For example, the apparatus 100 may combine the prediction results of three classifiers 130 using DS. As in [Equation 8] below, when the prediction result of the first classifier 130 is m₁, that of the second classifier 130 is m₂, and that of the third classifier 130 is m₃, the apparatus 100 may combine them into m₄ as in [Equation 9] below. The apparatus 100 may then take whichever of m₄(Pass) and m₄(Fail) has the higher probability as the final output.
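The fusion of three classifier outputs can be sketched with Dempster's rule of combination. This sketch assumes each classifier assigns mass only to the two singleton hypotheses Pass and Fail (no mass on the whole set), in which case the rule reduces to multiplying the agreeing masses and renormalizing by the non-conflicting total. [Equation 8] and [Equation 9] are not reproduced in this text, so the probability values below are hypothetical.

```python
from functools import reduce

HYPOTHESES = ("Pass", "Fail")


def combine(m1, m2):
    """Dempster's rule for two mass functions defined on the singletons only."""
    agree = {h: m1[h] * m2[h] for h in HYPOTHESES}
    total = sum(agree.values())  # equals 1 - K, where K is the conflict mass
    if total == 0:
        raise ValueError("the two sources are in total conflict")
    return {h: value / total for h, value in agree.items()}


# Hypothetical pass/fail probabilities from three classifiers.
m1 = {"Pass": 0.6, "Fail": 0.4}
m2 = {"Pass": 0.7, "Fail": 0.3}
m3 = {"Pass": 0.4, "Fail": 0.6}

m4 = reduce(combine, [m1, m2, m3])
decision = max(m4, key=m4.get)
print(m4, decision)
```

Here the agreeing products are 0.6 × 0.7 × 0.4 = 0.168 for Pass and 0.4 × 0.3 × 0.6 = 0.072 for Fail, so after normalization m₄(Pass) = 0.7 and the final decision is Pass even though the third classifier leaned toward Fail.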

Experiments were conducted to evaluate the defect pattern prediction apparatus 100 according to various embodiments. The performance of the apparatus 100 was evaluated on the SECOM data set. All 1519 records were evaluated with 5-fold cross-validation. Performance was assessed with a confusion matrix, and sensitivity, specificity, and accuracy were computed for each model.
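
The evaluation measures can be sketched as follows. The sketch assumes that Fail is treated as the positive class, so that sensitivity measures how many defective patterns are caught; the text does not state this convention explicitly. The helper names and the contiguous, unshuffled fold split are also assumptions.

```python
def confusion_counts(y_true, y_pred, positive="Fail"):
    """Return (tp, fn, fp, tn) with `positive` as the positive class."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def metrics(y_true, y_pred, positive="Fail"):
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(y_true),
    }


def k_fold_indices(n, k=5):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


# A classifier that always answers "Pass" on a 1009:42 split.
y_true = ["Pass"] * 1009 + ["Fail"] * 42
y_pred = ["Pass"] * len(y_true)
always_pass = metrics(y_true, y_pred)
fold_sizes = [len(test) for _, test in k_fold_indices(1519, k=5)]
print(always_pass, fold_sizes)
```

The always-Pass example reaches about 96% accuracy with zero sensitivity, which is why accuracy alone is not a sufficient criterion on imbalanced data of this kind.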

First, the performance of the classifiers 130 was evaluated for each feature selection method. The classifiers 130 used were Naive Bayesian (NB), Support Vector Machine (SVM), Logistic Regression (LR), Back Propagation Network (BPN), Random Forest (RF), decision tree C4.5 (C4.5), and Bayesian Network (BN). The performance measured for the classifiers 130 under CFS, under SU and IG, and under CF is shown in [Table 2], [Table 3], and [Table 4] below, respectively.

According to [Table 2], the best classifier 130 under CFS reached a sensitivity of 17.20% and a specificity of 89.97%. Because SVM is a classifier that fits a decision boundary, it fails to form a boundary that separates the minority class (fail) and is trained to get only pass right. LR relies on a cost function and is therefore driven by overall accuracy, so LR cannot predict the minority class (fail) either. Under CFS, NB scores higher than the other models when sensitivity and specificity are considered together. According to [Table 3], BN performs best among the classifiers 130 under SU and IG, and according to [Table 4], NB performs best under CF. Under SU and IG, NB has very high sensitivity but very low specificity, which makes it unsuitable for predicting pass. NB and BN both rest on Bayes' theorem and compute posterior probabilities. This suggests that probability-based classification using Bayes' theorem is the best fit for the semiconductor manufacturing process.

Next, Dempster-Shafer (DS) combination was applied to test combinations of the plurality of classifiers 130. NB was used with CFS and with CF, and BN was used with SU and IG. The experimental results of the DS-based multi-classifier are shown in [Table 5] below.

According to [Table 5], using a plurality of classifiers 130 yields lower accuracy than using a single classifier. However, sensitivity is higher in most cases, which makes the combination useful for predicting defective patterns. High accuracy combined with very low sensitivity indicates that a model struggles to predict the minority class. Combining the CFS model with the SU and IG model gave the highest accuracy, 81.29%. Combining the SU and IG model with the CF model gave the highest sensitivity, 48.54%, but the lowest accuracy, 36.59%. For predicting defective patterns, which form the minority class, the gap between sensitivity and specificity is taken into account. On that basis, the model that combines all of CFS, SU and IG, and CF performs best, since both measures reach a satisfactory level. In other words, the more feature selection methods are combined, the better the performance that can be secured.

The defect pattern prediction apparatus 100 according to various embodiments may include a data collection unit 110 that collects a data set from a manufactured semiconductor, a plurality of feature selection units 120 that select a plurality of features from the data set, a plurality of classifiers 130 that predict whether the semiconductor is defective based on the features, and a determination unit 140 that combines the prediction results output from the classifiers 130 to determine whether the semiconductor is defective.

According to various embodiments, the feature selection units 120 may each use one of a plurality of feature selection methods to select features related to a defective pattern.

According to various embodiments, the classifiers 130 may be generated to correspond to the respective feature selection units 120 and may each predict, based on the feature selection methods, whether the semiconductor is defective.

According to various embodiments, the feature selection methods may include at least one of Correlation-based Feature Selection (CFS), Symmetrical Uncertainty (SU), Information Gain (IG), or Combination Features (CF).
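
Two of the named methods, Information Gain and Symmetrical Uncertainty, can be sketched with their usual textbook definitions for discrete values: IG(Y; X) = H(Y) - H(Y | X) and SU(X, Y) = 2 · IG(Y; X) / (H(X) + H(Y)). These definitions are assumed here and may differ in detail from the formulas used elsewhere in this document. Continuous sensor readings would have to be discretized first, and the toy feature values are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature, label):
    """IG(label; feature) = H(label) - H(label | feature)."""
    n = len(label)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [l for f, l in zip(feature, label) if f == value]
        conditional += (count / n) * entropy(subset)
    return entropy(label) - conditional


def symmetrical_uncertainty(feature, label):
    """SU = 2 * IG / (H(feature) + H(label)), which lies in [0, 1]."""
    denom = entropy(feature) + entropy(label)
    return 2.0 * information_gain(feature, label) / denom if denom else 0.0


label = ["Fail", "Fail", "Pass", "Pass"]
informative = ["hi", "hi", "lo", "lo"]   # determines the label completely
irrelevant = ["a", "b", "a", "b"]        # carries no information about the label

su_informative = symmetrical_uncertainty(informative, label)
su_irrelevant = symmetrical_uncertainty(irrelevant, label)
print(su_informative, su_irrelevant)
```

Ranking features by such a score and keeping the top-scoring ones is one common way to turn the measure into a selection method; the cut-off is a design choice that the text here does not specify.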

According to various embodiments, the determination unit 140 may combine the prediction results output from the classifiers 130 on the basis of Dempster-Shafer.

A defect pattern prediction method according to various embodiments may include collecting a data set from a manufactured semiconductor, selecting a plurality of features from the data set, predicting whether the semiconductor is defective based on the features by using a plurality of classifiers 130, and determining whether the semiconductor is defective by combining the prediction results output from the classifiers 130.
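
The flow of these steps can be sketched structurally as below. Every name here is hypothetical: `Branch` pairs one feature selection result with its classifier, `threshold_classifier` is a stand-in for a trained model, and the sensor names `s1` and `s2` are invented. The determination step assumes singleton-only masses combined with the standard Dempster rule, and assigns a tie to Fail, a case the text leaves open.

```python
from typing import Callable, Dict, List, Sequence

Record = Dict[str, float]
Mass = Dict[str, float]  # {"Pass": p, "Fail": 1 - p}


def threshold_classifier(limit: float) -> Callable[[List[float]], Mass]:
    """Stand-in for a trained classifier: leans to Fail when the feature sum is large."""
    def classify(values: List[float]) -> Mass:
        p_fail = 0.8 if sum(values) > limit else 0.1
        return {"Pass": 1.0 - p_fail, "Fail": p_fail}
    return classify


class Branch:
    """One feature selection unit paired with its own classifier."""

    def __init__(self, method: str, features: Sequence[str],
                 classifier: Callable[[List[float]], Mass]):
        self.method = method
        self.features = list(features)
        self.classifier = classifier

    def predict(self, record: Record) -> Mass:
        return self.classifier([record[f] for f in self.features])


def determine(masses: Sequence[Mass]) -> str:
    """Combine the per-branch masses and pick the stronger outcome."""
    pass_product = fail_product = 1.0
    for m in masses:
        pass_product *= m["Pass"]
        fail_product *= m["Fail"]
    # Normalizing by (pass_product + fail_product) does not change the comparison.
    return "Pass" if pass_product > fail_product else "Fail"


branches = [
    Branch("CFS", ["s1"], threshold_classifier(1.0)),
    Branch("SU or IG", ["s2"], threshold_classifier(0.5)),
    Branch("CF", ["s1", "s2"], threshold_classifier(1.5)),
]


def predict(record: Record) -> str:
    return determine([b.predict(record) for b in branches])


print(predict({"s1": 1.2, "s2": 0.1}), predict({"s1": 1.2, "s2": 0.9}))
```

Keeping each feature subset with its own classifier mirrors the one-classifier-per-selection-method arrangement described above, so a branch can be added or removed without touching the determination step.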

According to various embodiments, the defect pattern prediction apparatus 100 can predict defective patterns more effectively by using a plurality of classifiers 130. Because the apparatus 100 predicts whether the semiconductor is defective with a plurality of classifiers 130 and combines the prediction results output from the classifiers 130 to determine whether the semiconductor is defective, the accuracy and reliability of defect pattern prediction can be improved.

Although various embodiments of this document have been described, various modifications are possible without departing from the scope of those embodiments. The scope of the various embodiments of this document should therefore not be limited to the described embodiments, but should be defined by the claims set out below and by their equivalents.

Claims

A defect pattern prediction apparatus using a multi-classifier according to a feature selection technique in a semiconductor manufacturing process, the apparatus comprising:
a data collection unit that collects a data set from a manufactured semiconductor;
a plurality of feature selection units that select a plurality of features from the data set;
a plurality of classifiers that predict whether the semiconductor is defective based on the features; and
a determination unit that combines prediction results output from the classifiers to determine whether the semiconductor is defective,
wherein the feature selection units
each select the features related to a defective pattern by using a respective one of a plurality of feature selection methods,
wherein the classifiers
are generated to correspond to the respective feature selection units and each predict, based on the feature selection methods, whether the semiconductor is defective,
wherein the feature selection methods
include at least one of Correlation-based Feature Selection (CFS), Combination Features (CF), and Symmetrical Uncertainty (SU) or Information Gain (IG),
wherein, to select the respective features,
one of the feature selection units uses the CF,
another of the feature selection units uses the CFS, and
yet another of the feature selection units uses either the SU or the IG,
wherein the features
are selected from the data set after data cleaning that removes unnecessary data has been performed on the data set, and
are scaled to generalize the data size of each of the features,
wherein the prediction result output from one of the classifiers includes
m₁(Pass), representing the probability that the semiconductor is normal, and m₁(Fail), representing the probability that the semiconductor is defective,
wherein the prediction result output from another of the classifiers includes
m₂(Pass), representing the probability that the semiconductor is normal, and m₂(Fail), representing the probability that the semiconductor is defective,
wherein the prediction result output from yet another of the classifiers includes
m₃(Pass), representing the probability that the semiconductor is normal, and m₃(Fail), representing the probability that the semiconductor is defective, and
wherein the determination unit
combines the prediction results output from the classifiers into m₄(Pass), representing the probability that the semiconductor is normal, and m₄(Fail), representing the probability that the semiconductor is defective, as shown in the following equation,

When m₄(Pass) is higher than m₄(Fail), the semiconductor is determined to be normal, and when m₄(Fail) is higher than m₄(Pass), the semiconductor is determined to be defective.

(Deleted)

The apparatus of claim 1, wherein the determination unit
combines the prediction results output from the classifiers based on Dempster-Shafer.

A method for predicting a defect pattern using a multi-classifier according to a feature selection technique in a semiconductor manufacturing process, the method comprising:
collecting, by a data collection unit, a data set from a manufactured semiconductor;
selecting, by a plurality of feature selection units, a plurality of features from the data set by using a plurality of feature selection methods, respectively;
predicting, by a plurality of classifiers, whether the semiconductor is defective based on the features; and
determining whether the semiconductor is defective by combining prediction results output from the classifiers,
wherein the classifiers
are generated to correspond to the respective feature selection units and each predict, based on the feature selection methods, whether the semiconductor is defective,
wherein the feature selection methods
include at least one of Correlation-based Feature Selection (CFS), Combination Features (CF), and Symmetrical Uncertainty (SU) or Information Gain (IG),
wherein, to select the respective features,
one of the feature selection units uses the CF,
another of the feature selection units uses the CFS, and
yet another of the feature selection units uses either the SU or the IG,
wherein the features
are selected from the data set after data cleaning that removes unnecessary data has been performed on the data set, and
are scaled to generalize the data size of each of the features,
wherein the prediction result output from one of the classifiers includes
m₁(Pass), representing the probability that the semiconductor is normal, and m₁(Fail), representing the probability that the semiconductor is defective,
wherein the prediction result output from another of the classifiers includes
m₂(Pass), representing the probability that the semiconductor is normal, and m₂(Fail), representing the probability that the semiconductor is defective,
wherein the prediction result output from yet another of the classifiers includes
m₃(Pass), representing the probability that the semiconductor is normal, and m₃(Fail), representing the probability that the semiconductor is defective, and
wherein the determining of whether the semiconductor is defective comprises:
combining the prediction results output from the classifiers into m₄(Pass), representing the probability that the semiconductor is normal, and m₄(Fail), representing the probability that the semiconductor is defective, as shown in the following equation;

determining that the semiconductor is normal if m₄(Pass) is higher than m₄(Fail); and
determining that the semiconductor is defective if m₄(Fail) is higher than m₄(Pass).