KR20230016343A

KR20230016343A - Automatic Analysis System for Quality Data Based on Machine Learning

Info

Publication number: KR20230016343A
Application number: KR1020210097711A
Authority: KR
Inventors: 박준형
Original assignee: 현대모비스 주식회사
Priority date: 2021-07-26
Filing date: 2021-07-26
Publication date: 2023-02-02

Abstract

Provided are a quality data analysis system and a method thereof. To reduce quality cost by shortening the time required for product quality analysis and by reducing the occurrence of defects, a machine learning-based inference model is trained on product quality data, the quality data are analyzed with the inference model to provide an analysis report, and process features are adjusted using the inference model as a simulator.

Description

Automatic Analysis System for Quality Data Based on Machine Learning

The present disclosure relates to a machine learning-based system for automatically analyzing quality data. More specifically, it relates to a quality data analysis system and method that train an AI-based inference model on accumulated quality data for a product, analyze the quality data with the inference model to provide an analysis report, and adjust process factors (process features) using the inference model as a simulator.

The content described below merely provides background information related to the present invention and does not constitute prior art.

Conventional quality systems accumulate quality data on process factors (process features or process parameters) generated during production, as well as field claim data generated during sales, yet make almost no use of them. To reduce quality cost, it is necessary to analyze the correlation between the accumulated field claim data and the process factors, identify the process factors that cause defects, and reduce defects by adjusting the values of those factors.

Quality data analysis using machine learning has recently been attempted sporadically, but it takes an average of two to three months per process, and there is a shortage of data analysis specialists who can extend the analyzed results to other processes. In addition, quality data analysis is rarely finished in a single pass: whenever product specifications or production conditions change, the quality data must be reanalyzed and the results applied on the shop floor.

Meanwhile, quality data collected during production are a valuable asset for defect cause analysis, process improvement, and the resulting reduction in quality cost. In the collected quality data, however, process factor values are very often biased, and biased process factors can make quality analysis based on the data difficult. One cause of such bias is unsystematic management of process factors.

In general, a process factor can be adjusted within the range set by the quality control criteria. However, because shop-floor personnel must change and manage process factors by hand, a factor is often held at a single fixed value. For example, since factor values are changed only at the discretion of the person in charge, a particular factor may never be changed and may end up being managed as a single value. In such a case, analysis of the quality data is itself impossible.

Therefore, an effective approach is needed that resolves process factor bias so that easily analyzable quality data are accumulated, analyzes the accumulated quality data to identify the process factors that cause defects, and reduces defects by adjusting the values of those factors.

An object of the present disclosure is to provide a quality data analysis system and method that, in order to reduce quality cost by reducing both the time required for product quality analysis and the occurrence of defects, train a machine learning-based inference model on product quality data, analyze the quality data with the inference model to provide an analysis report, and adjust process factors (process features) using the inference model as a simulator.

According to an embodiment of the present disclosure, there is provided a method, performed by a computing device, for improving quality control criteria for a product, the method comprising: acquiring quality data and field claims for the product, wherein the quality data are collected for a plurality of process factors (process features) applied or arising in the production process of the product, and each field claim indicates whether the product is good or defective; analyzing a first influence between the plurality of process factors and the field claims; for a biased process factor, among the plurality of process factors, whose first influence does not fall within a predetermined top ratio, expanding the existing management range of the biased process factor; recollecting quality data for the plurality of process factors based on the expanded management range; analyzing, on the recollected quality data, a second influence between the plurality of process factors and the field claims; and, when the first influence or the second influence falls within the predetermined top ratio, subdividing the data of the corresponding process factor and then recollecting it.
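The decision logic of this method can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the influence scores, the factor names, the 50% top ratio, and the rule of widening a range by 20% of its width are all hypothetical, and every factor outside the top ratio is treated as a candidate for range expansion. The recollection of data on the production line is outside the scope of the sketch.

```python
def split_by_influence(influence, top_ratio):
    """Rank factors by influence and split them into (top, remaining)."""
    ranked = sorted(influence, key=influence.get, reverse=True)
    n_top = max(1, round(len(ranked) * top_ratio))
    return ranked[:n_top], ranked[n_top:]


def expand_range(low, high, factor=0.2):
    """Widen a management range on both sides by `factor` of its width.

    The 20% default is an assumed value, not one taken from the text.
    """
    margin = (high - low) * factor
    return low - margin, high + margin


# Hypothetical first-influence scores and current management ranges.
influence = {"torque": 0.45, "press_depth": 0.30,
             "oil_temp": 0.02, "grease_mass": 0.01}
ranges = {"oil_temp": (60.0, 62.0), "grease_mass": (4.9, 5.1)}

top, biased = split_by_influence(influence, top_ratio=0.5)

# Factors outside the top ratio get a wider range before recollection;
# factors inside it would instead be recollected at a finer granularity.
new_ranges = {name: expand_range(*ranges[name]) for name in biased}
```

After recollection, the same split would be applied to the second influence scores to decide which factors to subdivide.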

According to another embodiment of the present disclosure, there is provided an apparatus for improving quality control criteria, comprising: an input unit that acquires quality data and field claims for a product, wherein the quality data are collected for a plurality of process factors (process features) applied or arising in the production process of the product, and each field claim indicates whether the product is good or defective; an influence analysis unit that analyzes a first influence between the plurality of process factors and the field claims; a management range adjustment unit that, for a biased process factor, among the plurality of process factors, whose first influence does not fall within a predetermined top ratio, expands the existing management range of the biased process factor; a data recollection unit that recollects quality data for the plurality of process factors based on the expanded management range; and a data subdivision collection unit that, when the first influence falls within the predetermined top ratio, subdivides the data of the corresponding process factor and then recollects it.

According to yet another embodiment of the present disclosure, there is provided a computer program stored on a computer-readable recording medium to execute each step of the method for improving quality control criteria.

As described above, the present embodiment provides a quality data analysis system and method that train a machine learning-based inference model on accumulated quality data for a product and analyze collected quality data with the inference model to provide an analysis report, thereby reducing quality cost by shortening the time required for product quality analysis.

In addition, the present embodiment provides a quality data analysis system and method that adjust process factors using the inference model as a simulator, thereby making it possible to reduce product defects.

In addition, by providing a quality data analysis system and method that analyze collected quality data with the inference model to provide an analysis report and adjust process factors using the inference model as a simulator, the present embodiment establishes an MLaaS (Machine Learning as a Service) environment in which field personnel who are not data analysis specialists can perform quality analysis on products, enabling field-led management and analysis of quality data.

In addition, the present embodiment provides a quality data analysis system and method that accumulate quality data after resolving process factor bias through improved quality control criteria, thereby reducing imbalance in the quality data and increasing the efficiency of quality data analysis.

FIG. 1 is a schematic illustration of a quality data analysis system according to an embodiment of the present disclosure.
FIG. 2 is an exemplary diagram illustrating the components of an analysis report according to an embodiment of the present disclosure.
FIG. 3 schematically illustrates additional components of a simulator according to an embodiment of the present disclosure.
FIG. 4 is an exemplary view of a UI for selecting process factors according to an embodiment of the present disclosure.
FIG. 5 is an exemplary view of a UI for showing process factor importance according to an embodiment of the present disclosure.
FIG. 6 is an exemplary view of a UI for showing an analysis result according to an embodiment of the present disclosure.
FIG. 7 is an exemplary view of a UI for adjusting process factors according to an embodiment of the present disclosure.
FIG. 8 schematically illustrates additional components used for training an inference model according to an embodiment of the present disclosure.
FIG. 9 is a flowchart of a preprocessing process for quality data according to an embodiment of the present disclosure.
FIG. 10 is a flowchart of a process factor selection process according to an embodiment of the present disclosure.
FIG. 11 is a flowchart illustrating a training process for a machine learning model according to another embodiment of the present disclosure.
FIG. 12 is a flowchart of a quality data analysis method according to an embodiment of the present disclosure.
FIG. 13 is a flowchart of a method of changing quality control criteria based on a simulator according to an embodiment of the present disclosure.
FIG. 14 is a flowchart of a method for training an inference model according to an embodiment of the present disclosure.
FIG. 15 is a schematic configuration diagram of an apparatus for improving quality control criteria for process factors according to an embodiment of the present disclosure.
FIG. 16 is a flowchart of a method for improving quality control criteria for process factors according to an embodiment of the present disclosure.
FIG. 17 is a flowchart of a process of applying an analysis system according to an embodiment of the present disclosure to a gearbox.
FIG. 18 is an exemplary diagram illustrating the feature importance of process factors of a gearbox according to an embodiment of the present disclosure.
FIG. 19 is an exemplary diagram illustrating a T-test according to an embodiment of the present disclosure.

Hereinafter, embodiments of the present invention are described in detail with reference to exemplary drawings. In assigning reference numerals to the components in each drawing, the same components are given the same numerals wherever possible, even when they appear in different drawings. In describing the embodiments, detailed descriptions of related known configurations or functions are omitted where they would obscure the gist of the embodiments.

In describing the components of the embodiments, terms such as first, second, A, B, (a), and (b) may be used. These terms serve only to distinguish one component from another and do not limit the nature, sequence, or order of the components. Throughout the specification, when a part is said to 'include' or 'comprise' a component, this means that it may further include other components, not that it excludes them, unless specifically stated otherwise. In addition, terms such as '... unit' and 'module' used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, software, or a combination of hardware and software.

The detailed description set forth below in conjunction with the accompanying drawings is intended to describe exemplary embodiments of the present invention and is not intended to represent the only embodiments in which the present invention may be practiced.

This embodiment discloses a machine learning-based system for automatically analyzing quality data. More specifically, to reduce quality cost by reducing both the time required for product quality analysis and the occurrence of defects, it provides a quality data analysis system and method that train a machine learning-based inference model on accumulated quality data, analyze the quality data with the inference model to provide an analysis report, and adjust process factors (process features) using the inference model as a simulator.

In the following description, because a machine learning-based quality analysis service can be provided to a user (for example, field or shop-floor personnel), the service that the quality data analysis system according to this embodiment can provide to the user is referred to as MLaaS (Machine Learning as a Service).

FIG. 1 is a schematic illustration of a quality data analysis system according to an embodiment of the present disclosure.

The quality data analysis system 100 (hereinafter, the 'analysis system') according to this embodiment trains a machine learning-based inference model on accumulated quality data for a product, analyzes the quality data with the inference model to provide an analysis report, and adjusts process factors using the inference model as a simulator. The analysis system 100 includes all or some of an input unit 102, a data preprocessing unit 104, a determination unit 106, and a data visualization unit 108.

The components included in the analysis system 100 according to this embodiment are not necessarily limited to these. For example, the analysis system 100 may further include a UI unit 110 to make MLaaS more convenient for the user. The analysis system 100 may also further include a training unit 112 for training the inference model included in the determination unit 106, or may be implemented so as to interoperate with an external training unit.

The illustration in FIG. 1 is an exemplary configuration according to this embodiment. Various implementations that include other components or other connections between components are possible, depending on the form of the input unit, the operation of the data preprocessing unit, the structure and operation of the inference model included in the determination unit, the operation of the quality data analysis unit, the structure and operation of the training unit, and the configuration of the UI unit.

The input unit 102 acquires quality data for a product. Here, the product may be a vehicle component such as a gearbox. The quality data may be collected for a plurality of process factors applied or arising in the production process of the product.

The process factors may include all or some of the following: input factors for adjusting the production process of the product, intermediate output factors formed partway through the production process, and output factors generated as a result of the production process.

Meanwhile, the process factors input for quality data analysis may be the key process factors selected during prior training of the inference model. The selection of these key process factors is described together with the training process for the inference model.

The input unit 102 may set a data type for each process factor used as an input to the inference model. The data types of process factors may include a numerical type, expressed as numbers, and a categorical type, expressed as characters. Another data type is a time type, which carries information on when the data were collected, but it may be removed during key process factor selection in the training process.

Meanwhile, the quality data may include a factor (for example, whether a field claim occurred for the product) that can be used as a target output (that is, an analysis label) for analyzing the performance of the inference model. The input unit 102 sets the factor used as the target output as the target factor.

The data preprocessing unit 104 performs an encoding process appropriate to the data type of each process factor and sets data missing from the collection process to appropriate values.

The data preprocessing unit 104 may perform an encoding process that converts categorical process factors into embedding values suitable for the inference model.

An example of categorical data is the target factor indicating whether a field claim occurred for the product. The encoding of the target factor represents, for example, a product with no field claim as 0 and a product with a field claim as 1. Encoding the target factor may therefore be a process of generating the analysis labels for quality analysis based on the inference model.

The data preprocessing unit 104 may also set values for process factors that are missing from the collection process. For example, a missing numerical process factor may be set to the median, and a missing categorical process factor may be set to the mode.
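As a minimal sketch of these two preprocessing steps, the Python below encodes the target factor as 0/1 and fills missing values with the median or the mode. The function names, the use of `None` to mark a missing value, and the sample columns are assumptions made for illustration only.

```python
from statistics import median, mode


def encode_target(claim_flags):
    """Encode the target factor: 0 = no field claim, 1 = field claim."""
    return [1 if flag else 0 for flag in claim_flags]


def impute(column, kind):
    """Fill missing entries (None) with the median or the mode.

    kind is "numerical" or "categorical"; both names are hypothetical.
    """
    present = [v for v in column if v is not None]
    fill = median(present) if kind == "numerical" else mode(present)
    return [fill if v is None else v for v in column]


# Hypothetical columns with one missing value each.
torque = [10.0, None, 12.0, 14.0]
line = ["A", "B", None, "B"]

torque_filled = impute(torque, "numerical")    # [10.0, 12.0, 12.0, 14.0]
line_filled = impute(line, "categorical")      # ['A', 'B', 'B', 'B']
labels = encode_target([False, True, False])   # [0, 1, 0]
```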

The determination unit 106 includes the inference model and uses it, based on the plurality of preprocessed process factors, to generate a determination result as to whether the product is good (OK) or defective (No Good: NG). Here, the determination result may be a probability that the product is good or defective.

A determination result of defective may indicate that a field claim has occurred for the product. Accordingly, a determination result of good indicates that no field claim occurs for the product.

The inference model is implemented as a machine learning model. It may be a model implementing one of four tree-based machine learning algorithms that perform well on quality data: decision tree, random forest, XGBoost (Extreme Gradient Boosting), or LightGBM (Light Gradient Boosting Machine). Through the training process, the training unit 112 may select, as the inference model, the best-performing model among the models to which each of the four algorithms is applied. The training process for selecting the inference model is described later.
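The selection step can be illustrated with a short Python sketch that scores each candidate on held-out data and keeps the best one. The candidates here are hypothetical stand-in functions rather than trained decision tree, random forest, XGBoost, or LightGBM models, and accuracy at a 0.5 threshold is an assumed metric; the text does not state which metric the training unit uses.

```python
def accuracy(predict, rows, labels, threshold=0.5):
    """Fraction of rows whose thresholded NG probability matches the label."""
    hits = sum((predict(row) >= threshold) == bool(label)
               for row, label in zip(rows, labels))
    return hits / len(labels)


def select_best(candidates, rows, labels):
    """Score every candidate model and return the best name with all scores."""
    scores = {name: accuracy(fn, rows, labels)
              for name, fn in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical validation data: label 1 means a field claim occurred.
rows = [{"torque": 10}, {"torque": 11}, {"torque": 13}, {"torque": 15}]
labels = [0, 0, 1, 1]

# Stand-ins for trained models; each returns an NG probability.
candidates = {
    "model_a": lambda row: 1.0 if row["torque"] > 12 else 0.0,
    "model_b": lambda row: 1.0 if row["torque"] > 14 else 0.0,
}

best_name, scores = select_best(candidates, rows, labels)
```

In practice each entry in `candidates` would wrap the prediction method of a fitted model, and the score would be computed on data not used for training.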

The data visualization unit 108 generates an analysis report on the product quality analysis, or on the training results of the inference model, based on the plurality of process factors, the analysis labels, and the determination results.

FIG. 2 is an exemplary diagram illustrating the components of an analysis report according to an embodiment of the present disclosure.

To show, both comprehensively and in detail, the effect of each process factor on the determination result (good or defective product), the analysis report provided by the data visualization unit 108 may include all or some of an analysis data summary 202, process factor importance 204, a per-factor data distribution 206, and an analysis result 208.

The analysis data summary 202 presents overall information on the process factors that make up the quality data. This overall information may include the data type, mode, minimum, maximum, mean, standard deviation, and so on. The analysis data summary 202 may be provided as a result of product quality analysis or of training the inference model.
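A per-factor summary of this kind could be computed as in the sketch below. The dictionary layout and the factor name are hypothetical, and the sample standard deviation is an assumed choice, since the text does not say whether the sample or population form is used.

```python
from statistics import mean, mode, stdev


def summarize(name, values):
    """Build one summary row for a process factor."""
    numeric = all(isinstance(v, (int, float)) for v in values)
    summary = {
        "factor": name,
        "type": "numerical" if numeric else "categorical",
        "mode": mode(values),
    }
    if numeric:
        # Range and spread statistics only make sense for numerical factors.
        summary.update(min=min(values), max=max(values),
                       mean=mean(values), std=stdev(values))
    return summary


torque_summary = summarize("torque", [10.0, 12.0, 12.0, 14.0])
line_summary = summarize("line", ["A", "B", "B"])
```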

The process factor importance 204 shows the feature importance of each process factor, so that the effect of each process factor on the determination result can be checked. The process factor importance 204 may be provided as a result of training the inference model. Feature importance is an output of tree-based machine learning algorithms and is described in detail later.

The per-factor data distribution 206 shows the distribution of the relationship between each process factor and the determination result, or between each process factor and the analysis label.

The analysis result 208 presents a performance analysis of the inference model based on the determination results and the analysis labels. The analysis result 208 may be provided as a result of product quality analysis or of training the inference model. The analysis result 208 is described later.

The analysis report may be used when changing the quality control criteria to derive new quality control criteria. An analysis report may also be generated to check the characteristics of quality data collected from production under the new quality control criteria.

Meanwhile, the determination unit 106 may use the inference model as a simulator for deriving new quality control criteria.

After an adjusted value is set for a particular process factor, the determination unit 106 inputs the adjusted process factor into the simulator to generate a simulated determination result. Using the adjusted process factor and the corresponding determination result, new quality control criteria for the process factor may be generated in a direction that reduces product defects.
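The idea of using the inference model as a simulator can be sketched as a sweep over candidate values of one factor. In the Python below, `defect_probability` is a hypothetical stand-in for the trained inference model, and the factor names, baseline, and candidate values are invented for illustration.

```python
import math


def defect_probability(factors):
    """Hypothetical stand-in for the trained inference model (NG probability)."""
    z = 0.8 * (factors["torque"] - 12.0) ** 2 - 2.0
    return 1.0 / (1.0 + math.exp(-z))


def sweep(model, baseline, name, candidate_values):
    """Simulate each candidate value of one factor, holding the rest fixed."""
    results = []
    for value in candidate_values:
        adjusted = dict(baseline, **{name: value})  # copy, then override
        results.append((value, model(adjusted)))
    return results


baseline = {"torque": 14.0, "line": "A"}
results = sweep(defect_probability, baseline, "torque",
                [10.0, 11.0, 12.0, 13.0, 14.0])

# The value with the lowest simulated NG probability is a candidate
# center for a new management range of this factor.
best_value, best_prob = min(results, key=lambda pair: pair[1])
```

With this stand-in model the sweep picks 12.0; a real model would also need the chosen value checked against what the process can physically achieve.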

Meanwhile, the simulator may include additional components for the user's convenience. In the following description, therefore, the simulator denotes a system that includes the inference model and these additional components.

FIG. 3 schematically illustrates additional components of a simulator according to an embodiment of the present disclosure.

For selecting and adjusting process factors and for providing determination results, the simulator includes all or some of a process factor adjustment unit 302, a determination result output unit 304, an important factor output unit 306, and a criteria application unit 308.

The process factor adjustment unit 302 selects adjustment process factors from among the key process factors and adjusts their values. As described above, the feature importance and the per-factor data distribution 206 provided by the analysis report may be used to select the adjustment process factors.

Meanwhile, the process factor adjustment unit 302 may select an input factor, as described above, as an adjustment process factor.

선정된 조정 공정인자가 범주형인 경우, 체크박스(check box)를 이용하여, 사용자가 원하는 범주가 선택될 수 있다. 숫자형인 경우, 슬라이더(slider)를 이용하여, 공정인자의 값이 조절될 수 있다. 체크박스 해제에 따라 공정인자를 시뮬레이션에서 제외할 수 있으므로, 단일 공정인자에 대한 시뮬레이션도 수행될 수 있다. 이때, 추론 모델로서 XGBoost 기반 모델이 채택된 경우, 제외된 공정인자는 기설정된 값으로 설정되며, 다른 알고리즘 기반 모델이 채택된 경우, 공정인자의 데이터 유형에 따라 최빈값 또는 중앙값으로 설정될 수 있다. If the selected adjustment process factor is a categorical type, a user-desired category may be selected using a check box. In the case of numeric type, the value of the process factor can be adjusted using a slider. Since the process factor can be excluded from the simulation by unchecking the checkbox, simulation for a single process factor can also be performed. At this time, when the XGBoost-based model is adopted as the inference model, the excluded process factor is set to a preset value, and when another algorithm-based model is adopted, the mode or median value may be set according to the data type of the process factor.

한편, 공정인자값의 조정 시, 해당 공정인자에 대한 T-테스트 결과를 참조하여, 제품에 대한 불량 분포를 최소화할 수 있도록 공정인자값을 조정할 수 있다. Meanwhile, when adjusting the process factor value, the process factor value may be adjusted to minimize the distribution of defects in the product by referring to the T-test result for the corresponding process factor.

The decision result output unit 304 provides the decision result obtained when the adjustment process factor is input to the simulator. As described above, the decision result is a probability value that the product is good or defective.

한편, 사용자는 조정 공정인자의 값을 변경하여 입력한 후, 판정 결과를 확인함으로써, 조정 공정인자의 불량 발생에 대한 영향을 확인할 수 있다. On the other hand, the user can check the effect of the adjusted process factor on the occurrence of defects by changing and inputting the value of the adjusted process factor and then checking the judgment result.

The important factor output unit 306 provides the feature importance of the process factors being used by the simulator. Here, the feature importance generated while training the inference model used as the simulator is reused.

기준적용부(308)는, 조정 공정인자의 값에 대한 판정 결과를 기반으로 조정 공정인자에 대한 최적 인자값을 선정하고, 이를 기반으로 조정 공정인자에 대한 품질관리 기준을 변경한다. The criterion application unit 308 selects an optimal factor value for the adjusted process factor based on the result of determining the value of the adjusted process factor, and changes the quality control standard for the adjusted process factor based on this.

이상에서 설명한 바와 같이 본 실시예에 따르면, 추론 모델을 시뮬레이터로 이용하여 공정인자를 조정하는 분석 시스템을 제공함으로써, 제품 불량 발생을 감소시키는 것이 가능해지는 효과가 있다. As described above, according to the present embodiment, by providing an analysis system for adjusting process factors using an inference model as a simulator, it is possible to reduce product defects.

The UI unit 110 connects the MLaaS provided by the analysis system 100 to the user by obtaining inputs related to the analysis system 100 from the user or by presenting outputs generated by the analysis system 100 on a display. Through the UI unit 110, user input may be provided to the analysis system 100 using means such as a mouse and a keyboard.

FIG. 4 is an exemplary view of a UI for selecting process factors according to an embodiment of the present disclosure.

UI부(110)는, 도 4에 예시된 바와 같이, 데이터 분석에 적용되는 품질데이터 중에서 공정인자를 선택하기 위한 체크박스(check box)를 포함한다. 체크박스가 선택된 공정인자에 대해, 공정인자의 유형이 추가로 입력될 수 있으며, 서술(description)은 공정인자의 유형에 따른 제약 사항을 나타낸다. As illustrated in FIG. 4 , the UI unit 110 includes a check box for selecting a process factor from quality data applied to data analysis. For the process factor whose checkbox is selected, the type of process factor can be additionally input, and the description indicates constraints according to the type of process factor.

In addition, the UI illustrated in FIG. 4 may also be used as check boxes for selecting the process factors used for training the inference model.

한편, UI부(110)는, 공정인자 중에서 타겟 인자를 설정하기 위한 입력 인터페이스를 포함한다. Meanwhile, the UI unit 110 includes an input interface for setting a target factor among process factors.

UI부(110)는, 분석데이터 요약(202), 공정인자 중요도(204), 공정인자별 데이터 분포(206), 또는 분석결과(208)를 포함하는 분석리포트를 디스플레이 상에 제공한다. 예컨대, UI부(110)는, 도 5에 예시된 바와 같이, 공정인자에 대한 특성 중요도를 제공할 수 있다. The UI unit 110 provides an analysis report including an analysis data summary 202, process factor importance 204, data distribution by process factor 206, or analysis result 208 on a display. For example, as illustrated in FIG. 5 , the UI unit 110 may provide feature importance for process factors.

In addition, the UI unit 110 may provide an analysis result 208 based on the decision result of the inference model.

As illustrated in FIG. 6, the analysis result 208 may include the accuracy, precision, recall, and F1 score of the good-or-defective determination for the product.

Here, accuracy is the proportion of good-or-defective predictions that match the GT (ground truth, that is, the correct answer or label). Precision is the proportion of products predicted as defective whose GT is also defective, and recall is the proportion of products whose GT is defective that are predicted as defective. The F1 score is the harmonic mean of precision and recall.
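The four metrics defined above can be sketched in a few lines of Python. This is only an illustration: the function name `classification_metrics`, the encoding of "defective" as 1, and the sample labels are assumptions made for the example and do not come from the disclosure.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1, treating `positive` as the defective class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # defective, predicted defective
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # good, predicted defective
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # defective, predicted good
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical ground truth (GT) and predictions: 1 = defective, 0 = good.
ground_truth = [1, 0, 1, 1, 0, 0, 0, 1]
predicted = [1, 0, 0, 1, 1, 0, 0, 1]
metrics = classification_metrics(ground_truth, predicted)
```

With these sample labels there are three true positives, one false positive, and one false negative, so all four metrics evaluate to 0.75.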

한편, 도 6의 예시에서, 머신 러닝 모델 식별자는, 머신 러닝 모델이 구현하고 있는 알고리즘, 즉 결정 트리, 랜덤 포레스트, XGBoost, 또는 LightGBM 중의 하나를 나타낸다. Meanwhile, in the example of FIG. 6 , the machine learning model identifier represents one of algorithms implemented by the machine learning model, that is, decision tree, random forest, XGBoost, or LightGBM.

The UI unit 110 provides, on a display, the training results for the models to which each of the four machine learning algorithms is applied. The analysis result 208 as illustrated in FIG. 6 may also be used as the training result for each algorithm-based model. In addition, the training result may include the runtime, which is the time required to train the inference model.

UI부(110)는, 도 7에 예시된 바와 같이, 시뮬레이터의 이용 시, 공정인자 조정부(302)와 관련된 입력을 획득하기 위한 체크박스(check box)를 포함한다. 체크박스가 선택된 공정인자에 대해, 공정인자값이 데이터 유형에 따라 조정될 수 있다. 또한, UI부(110)는, 판정결과 출력부(304) 및 중요인자 출력부(306)와 관련된 결과를 디스플레이 상에 제공한다. As illustrated in FIG. 7 , the UI unit 110 includes a check box for obtaining an input related to the process factor adjustment unit 302 when using a simulator. For the process factor whose checkbox is selected, the process factor value can be adjusted according to the data type. In addition, the UI unit 110 provides results related to the decision result output unit 304 and the significant factor output unit 306 on a display.

The UI unit 110 may provide a matrix-type heatmap used for correlation analysis between process factors.

In addition, the UI unit 110 provides an input interface for obtaining preset values (for example, a preset value indicating the number of main process factors) that serve as the basis for various determinations in the MLaaS.

The interfaces supported by the UI unit 110 are not limited to those described above, and further interfaces for connecting the MLaaS to the user may be added as needed.

트레이닝부(112)는 학습용 품질데이터 및 해당되는 레이블을 이용하여 추론 모델에 대한 트레이닝을 수행한다. The training unit 112 performs training on an inference model using quality data for learning and corresponding labels.

전술한 바와 같이, 추론 모델은 머신 러닝 모델 형태로 구현되는데, 결정 트리, 랜덤 포레스트, XGBoost, 또는 LightGBM 과 같은 4 가지 머신 러닝 알고리즘 중 하나가 구현된 모델일 수 있다.As described above, the inference model is implemented in the form of a machine learning model, and may be a model implemented with one of four machine learning algorithms such as decision tree, random forest, XGBoost, or LightGBM.

결정 트리는 특정한 기준(예컨대, 수치형 공정인자의 특정값, 또는 범주형 공정인자의 범주 등)에 따라 데이터를 구분하는 모델이다. 결정 트리에서 분기는, 분기에 이용되는 공정인자에 의한 정보 이득(information gain)이 최대화되는 방향으로 수행되며, 이를 결정 트리에 대한 트레이닝이라 한다. A decision tree is a model that classifies data according to a specific criterion (eg, a specific value of a numerical process factor or a category of a categorical process factor). Branching in a decision tree is performed in a direction in which information gain by a process factor used for branching is maximized, and this is called training for a decision tree.

When two leaf nodes are generated by branching a root node on one process factor, the information gain can be calculated by subtracting the information of the two leaf nodes from the information of the root node. The labels are used in calculating the information gain. Because the branched leaf nodes are in a more ordered state, the information of the two leaf nodes cannot be greater than that of the root node. Therefore, the information gain is always greater than or equal to 0. Entropy or Gini impurity may be used as the information measure.
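As a minimal sketch of this calculation, the following Python computes an entropy-based information gain for a single split. Weighting each leaf's entropy by its share of the samples is the usual convention and is an assumption here, since the text only says the leaf information is subtracted from the root information. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent node minus the size-weighted entropy of its two leaves."""
    n = len(parent)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - weighted


# A split that separates good (0) and defective (1) perfectly removes all uncertainty.
perfect_gain = information_gain([0, 0, 0, 1, 1, 1], [0, 0, 0], [1, 1, 1])

# A split whose leaves have the same class mix as the parent gains nothing.
useless_gain = information_gain([0, 1, 0, 1], [0, 1], [0, 1])
```

The first split yields a gain of 1 bit and the second a gain of 0, which are the two extremes of the non-negative range described above.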

A random forest is an ensemble model based on multiple decision trees; it aggregates the decisions of the multiple decision trees (for example, by majority vote for a classification model and by averaging for a regression model) to generate the final output. Each decision tree included in the random forest may be trained in the same way as a single decision tree. A characteristic of the random forest is that it allows sampling with replacement (bootstrap) among the training data sets used to train the individual decision trees. The sampling with replacement and the aggregation of decisions from multiple decision trees are together referred to as bagging (bootstrap + aggregation).

XGBoost와 LightGBM은 모두 GBM(Gradient Boosting Model) 계열의 알고리즘이다. GBM은 부스팅(boosting) 계열의 앙상블 알고리즘이다. 여기서, 부스팅이란 다수의 약한 분류기(weak classifier)를 순차적으로 생성(즉, 트레이닝)한 후, 이들을 결합하여 강한 분류기(strong classifier)를 생성하는 과정이다. 예컨대, 3 개의 약한 분류기 A, B, C에 대해, 분류기 A를 생성하고, 그 정보를 바탕으로 분류기 B를 생성하며, 다시 그 정보를 바탕으로 분류기 C를 생성한 후, 최종적으로 분류기들을 모두 결합하여 강한 분류기를 만들 수 있다. 이러한 부스팅 과정에서, GBM은 전단의 약한 모델로부터 산출된 부의 경사도(negative gradient)를 기반으로 다음 단의 약한 모델을 생성한다.Both XGBoost and LightGBM are GBM (Gradient Boosting Model) family algorithms. GBM is an ensemble algorithm based on boosting. Here, boosting is a process of sequentially generating (ie, training) a plurality of weak classifiers and then combining them to generate a strong classifier. For example, for three weak classifiers A, B, and C, classifier A is created, classifier B is created based on that information, classifier C is created again based on that information, and finally all classifiers are combined. Thus, a strong classifier can be created. In this boosting process, GBM creates a weak model of the next stage based on the negative gradient calculated from the weak model of the front stage.

The XGBoost algorithm is a GBM-family algorithm for training an ensemble model whose weak classifiers are implemented as decision trees. By including a regularization term in the loss function used for training, the XGBoost algorithm has the advantage of helping to prevent overfitting, which is a weakness of GBM.

LightGBM 알고리즘도, 약한 분류기가 결정 트리로 구현된 앙상블 모델을 학습하기 위한 GBM 계열의 알고리즘이다. LightGBM 알고리즘은, GBM 계열의 알고리즘들의 느린 학습 속도를 개선하기 위해, 레벨중심(level-wise)가 아닌, 리프중심(leaf-wise)으로 트리 분기를 수행한다. LightGBM 알고리즘은, 너무 적은 수의 데이터를 이용하면 과적합 문제를 발생시키므로, 대용량의 데이터 처리에 적합한 것으로 알려져 있다. The LightGBM algorithm is also a GBM-based algorithm for learning an ensemble model in which a weak classifier is implemented as a decision tree. The LightGBM algorithm performs tree branching leaf-wise, not level-wise, in order to improve the slow learning speed of GBM-based algorithms. The LightGBM algorithm is known to be suitable for large-volume data processing because it causes an overfitting problem when too few data are used.

4 가지 머신 러닝 알고리즘은 모두 결정 트리 기반으로 동작하므로, 학습의 결과물로서 분기에 이용되는 공정인자에 대한 특성 중요도를 생성할 수 있다. Since all four machine learning algorithms operate based on decision trees, they can generate feature importance for process factors used for branching as a result of learning.

하나의 공정인자에 대한 특성 중요도는, (복수의) 결정 트리에 의한 총 정보 이득에 대한, 하나의 공정인자가 생성한 총 정보 이득의 비율이다. 즉, 학습된 결정 트리가 생성한 총 정보 이득 중에, 하나의 공정인자에 따른 모든 분기들이 기여한 정도를 나타낸다. 특성 중요도가 높을수록, 해당되는 공정인자는, 추론 모델이 판정 결과를 생성함에 있어서, 기여하는 바가 높은 것으로 판단된다. The feature importance for one process factor is the ratio of the total information gain generated by one process factor to the total information gain by (multiple) decision trees. That is, it represents the contribution of all branches according to one process factor to the total information gain generated by the learned decision tree. The higher the importance of the feature, the higher the contribution of the corresponding process factor when the inference model generates the decision result.
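The ratio described above can be sketched as follows. The input format (a flat list of `(factor, gain)` pairs, one per branch across all trees), the function name, and the factor names are assumptions made for illustration; real libraries expose feature importance through their own interfaces.

```python
from collections import defaultdict


def feature_importance(splits):
    """Share of the total information gain contributed by each process factor.

    `splits` is a list of (factor_name, information_gain) pairs, one per branch
    across all decision trees of the trained model.
    """
    totals = defaultdict(float)
    for factor, gain in splits:
        totals[factor] += gain
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {factor: 0.0 for factor in totals}
    return {factor: gain / grand_total for factor, gain in totals.items()}


# Hypothetical branches recorded during training.
splits = [
    ("press_temperature", 0.30),
    ("torque", 0.10),
    ("press_temperature", 0.05),
    ("line_id", 0.05),
]
importance = feature_importance(splits)
```

Here `press_temperature` accounts for 0.35 of the total gain of 0.50, giving it an importance of 0.7, and the importances of all factors sum to 1.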

Because this feature importance can be used when adjusting the quality control criterion for a specific process factor, the present embodiment uses one of the decision tree, random forest, XGBoost, or LightGBM algorithms described above as the machine learning algorithm for the inference model.

Using the training process, the training unit 112 may select, as the inference model, the best-performing model among the models to which each of the four machine learning algorithms is applied. After selecting the algorithm for the inference model, the training unit 112 presents the training results for each of the models implementing the four machine learning algorithms as the basis for the selection.

Hereinafter, the training process of the inference model performed by the training unit 112 will be described using the examples of FIGS. 8 to 11.

FIG. 8 schematically illustrates additional components used for training an inference model according to an embodiment of the present disclosure.

To train the inference model, the training unit 112 may use, in addition to the input unit 102, all or some of a data preprocessing unit 104, a process factor selection unit 806, a data balancing unit 808, and four machine learning models 810 (hereinafter used interchangeably with 'four models'). Here, the four models 810 are the models to which each of the four machine learning algorithms described above is applied.

입력부(102)는, 트레이닝에 사용하기 위해, 제품에 대한 품질데이터를 획득한다. 품질데이터는 제품의 생산과정에(서) 적용되거나 발생하는, 복수의 공정인자에 대해 수집될 수 있다. The input unit 102 obtains quality data about the product for use in training. Quality data can be collected for a plurality of process factors applied or occurring in the production process of a product.

입력부(102)는, 트레이닝에 사용되는 공정인자에 대해 데이터 유형을 설정할 수 있다. 여기서 공정인자의 데이터 유형은, 수치로 표현되는 숫자형, 문자로 표현되는 범주형, 및 데이터가 수집된 시간 정보를 포함하는 시간형을 포함할 수 있다. The input unit 102 may set data types for process factors used for training. Here, the data type of the process factor may include a numeric type expressed as a numerical value, a categorical type expressed as a character, and a time type including time information at which data was collected.

한편, 품질데이터는, 추론 모델의 트레이닝 과정에서 타겟 출력(즉, 학습용 레이블)으로 이용될 수 있는 인자(예컨대, 제품에 대한 필드클레임 발생 유무)를 포함할 수 있다. 입력부(102)는 타겟 출력으로 이용되는 인자를 타겟 인자로 설정한다. On the other hand, the quality data may include a factor that can be used as a target output (ie, a label for learning) in the training process of an inference model (eg, whether a field claim has occurred for a product). The input unit 102 sets a factor used as a target output as a target factor.

공정인자에 대한 범주, 및 타겟 인자는, 전술한 바와 같은, UI부(110)를 이용하여 설정될 수 있다. The category and target factor for the process factor may be set using the UI unit 110 as described above.

데이터 전처리부(104)는 공정인자의 데이터 유형별로 적절한 인코딩 과정을 수행하고, 수집과정에서 발생한 누락 데이터를 적절한 값으로 설정한다. The data pre-processing unit 104 performs an appropriate encoding process for each data type of process factor, and sets missing data generated in the collection process to an appropriate value.

FIG. 9 is a flowchart of a preprocessing process for quality data according to an embodiment of the present disclosure.

데이터 전처리부(104)는 공정인자의 데이터 유형을 확인한다(S900).The data pre-processing unit 104 checks the data type of the process factor (S900).

데이터 전처리부(104)는, 데이터 유형이 숫자형 데이터인지를 확인하여(S902), 아닌 경우, 범주형 데이터인지를 확인한다(S904). The data pre-processing unit 104 checks whether the data type is numeric data (S902), and if not, checks whether it is categorical data (S904).

The data preprocessing unit 104 removes time-type data, which is neither numeric nor categorical (S906). Because the time at which quality data was collected is judged to have little correlation with whether the product is good or defective, time-type process factors are removed from the quality data used for training.

범주형 데이터인 경우, 데이터 전처리부(104)는 추론 모델에 적합한 임베딩 값으로 변환하는 인코딩 과정을 수행한다(S908).In the case of categorical data, the data pre-processing unit 104 performs an encoding process of converting into an embedding value suitable for the inference model (S908).

범주형 데이터의 예로는, 제품에 대한 필드클레임 발생 유무를 나타내는 타겟 인자를 들 수 있다. 타겟 인자에 대한 인코딩 과정은, 예컨대, 제품에 대한 필드클레임이 발생하지 않은 경우를 0, 필드클레임이 발생한 경우를 1로 나타낸다. 따라서, 이러한 타겟 인자에 대한 인코딩은, 추론 모델의 트레이닝을 위한 학습용 레이블을 생성하는 과정일 수 있다. An example of categorical data is a target factor indicating whether a field claim has occurred for a product. In the encoding process for the target factor, for example, a case where no field claim for a product has occurred is represented by 0, and a case where a field claim has occurred is represented by 1. Accordingly, encoding of the target factor may be a process of generating a learning label for training an inference model.

숫자형 데이터, 및 인코딩된 범주형 데이터에 대해, 데이터 전처리부(104)는 수집과정에서 발생한 누락 데이터를 처리한다(S910). 이때, 범주형 데이터는 최빈값으로, 숫자형 데이터는 중앙값으로 설정될 수 있다. 한편, 누락이 심한 공정인자의 경우, 추론 모델의 트레이닝에 방해가 될 수 있다. 따라서, 데이터 전처리부(104)는, 누락률이 기설정된 비율(예컨대, 80 %)보다 큰 공정인자를 트레이닝 과정에서 제거할 수 있다. For numeric data and encoded categorical data, the data pre-processing unit 104 processes missing data generated in the collection process (S910). In this case, the categorical data may be set as the most frequent value, and the numeric data may be set as the median value. On the other hand, in the case of a process factor with severe omission, it may interfere with the training of the inference model. Accordingly, the data pre-processing unit 104 may remove a process factor having a higher omission rate than a predetermined rate (eg, 80%) in the training process.
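The missing-data handling of step S910 can be sketched as below. The column-oriented data layout, the function name `impute_columns`, and the sample values are assumptions for illustration. For brevity the sketch fills categorical values before any encoding, whereas the text applies S910 to already encoded categorical data. Every column is assumed to be non-empty.

```python
from collections import Counter
from statistics import median


def impute_columns(columns, types, max_missing_rate=0.8):
    """Fill missing values (None) per process factor and drop sparsely populated factors.

    columns: dict mapping factor name -> list of values, None marking missing data.
    types:   dict mapping factor name -> "numeric" or "categorical".
    """
    cleaned = {}
    for name, values in columns.items():
        missing = sum(v is None for v in values)
        if missing / len(values) > max_missing_rate:
            continue  # missing rate above the preset ratio: exclude from training
        present = [v for v in values if v is not None]
        if types[name] == "numeric":
            fill = median(present)  # numeric factors: median
        else:
            fill = Counter(present).most_common(1)[0][0]  # categorical factors: mode
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned


# Hypothetical quality data with gaps.
columns = {
    "torque": [1.0, None, 3.0, 5.0, None],
    "line": ["A", "B", None, "A", "A"],
    "sensor_x": [None, None, None, None, None],
}
types = {"torque": "numeric", "line": "categorical", "sensor_x": "numeric"}
cleaned = impute_columns(columns, types)
```

In this example `torque` is filled with its median 3.0, `line` with its mode "A", and `sensor_x` is dropped because all of its values are missing.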

품질데이터에 포함된 공정인자의 개수는, 대상 제품에 따라 수십에서 수백 개일 수 있다. 공정인자 선정부(806)는, 품질데이터에 포함된 다수의 공정인자로부터 타겟 인자에 영향력이 높은 주요 공정인자를 선별한다. 선별된 주요 공정인자를 사용함으로써, 추론 모델의 복잡도, 및 학습 소요 시간이 감소될 수 있다. The number of process factors included in the quality data may range from tens to hundreds depending on the target product. The process factor selection unit 806 selects a major process factor having a high influence on the target factor from a plurality of process factors included in the quality data. By using the selected key process factors, the complexity and learning time of the inference model can be reduced.

FIG. 10 is a flowchart of a process factor selection process according to an embodiment of the present disclosure.

공정인자 선정부(806)는, 데이터 전처리부(104)에 의해 전처리된 품질데이터를 획득한다(S1000). The process factor selection unit 806 obtains the quality data preprocessed by the data preprocessing unit 104 (S1000).

공정인자 선정부(806)는, 공정인자의 개수가 기설정된 개수(도 10의 예시에서는 20 개) 이하인지 확인한다(S1002), 공정인자의 개수가 기설정된 개수 이하인 경우, 공정인자 선정부(806)는 공정인자 선정과정을 생략할 수 있다. The process factor selection unit 806 checks whether the number of process factors is less than or equal to a predetermined number (20 in the example of FIG. 10) (S1002). If the number of process factors is less than or equal to the predetermined number, the process factor selection unit ( 806) can omit the process factor selection process.

If the number of process factors is greater than the preset number, the process factor selection unit 806 may perform the steps for determining the main process factors (S1004 to S1008) and select main process factors so that their number is less than or equal to the preset number.

먼저, 공정인자 선정부(806)는 품질데이터에 포함된 공정인자에 대하 T-테스트를 수행한다(S1004). First, the process factor selector 806 performs a T-test on process factors included in the quality data (S1004).

Here, the T-test checks statistical significance by comparing, for each process factor, the two distributions for good and defective products. When the difference between the two distributions is significant, the process factor selection unit 806 determines that the process factor can affect the occurrence of defects and selects it as a main process factor.
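As a rough sketch of such a comparison, the following Python computes a two-sample t statistic for one process factor. The disclosure does not say which variant of the T-test is used; Welch's unequal-variance form is an assumption here, as are the function name and the sample measurements.

```python
import math
from statistics import mean, variance


def welch_t(good, defective):
    """Welch's t statistic and degrees of freedom for two independent samples."""
    va = variance(good) / len(good)            # squared standard error, good products
    vb = variance(defective) / len(defective)  # squared standard error, defective products
    t = (mean(good) - mean(defective)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(good) - 1) + vb ** 2 / (len(defective) - 1))
    return t, df


# Hypothetical measurements of one process factor, split by label.
good = [10.1, 9.9, 10.0, 10.2, 9.8]
defective = [11.0, 11.2, 10.9, 11.1, 10.8]
t_stat, dof = welch_t(good, defective)

# 2.306 is the two-sided 5 % critical value of the t distribution for 8 degrees of freedom.
is_significant = abs(t_stat) > 2.306
```

For these samples the statistic is about -10 with 8 degrees of freedom, so the factor would pass the test and be kept as a candidate main process factor.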

공정인자 선정부(806)는, T-테스트를 통과한 공정인자의 개수가 기설정된 개수 이하인 경우, 잔여 과정(S1006 및 S1008)을 생략하고, T-테스트를 통과한 공정인자를 최종적인 주요 공정인자로 선별할 수 있다. The process factor selection unit 806, when the number of process factors that have passed the T-test is less than or equal to the preset number, skips the remaining processes (S1006 and S1008), and selects the process factors that have passed the T-test as the final main process. It can be selected as a factor.

공정인자 선정부(806)는, T-테스트를 통과한 공정인자에 대해, 이들 간의 정보 이득을 비교한다(S1006). 정보 이득이 높은 순으로, 기설정된 개수(예컨대, 20 개)의 공정인자가 선별될 수 있다. 여기서 정보 이득은, 전술한 바와 같이, 제품의 정상 또는 불량에 대한 정보로부터, 하나의 공정인자에 의한 분기 후의 정상 또는 불량에 대한 정보를 감산하여 생성할 수 있다.The process factor selector 806 compares information gains between process factors that have passed the T-test (S1006). A predetermined number (eg, 20) of process factors may be selected in order of information gain. As described above, the information gain may be generated by subtracting the information on the normality or defect after branching by one process factor from the information on the normality or defect of the product.

공정인자 선정부(806)는, 정보 이득 순으로 선별된 공정인자 간의 상관관계를 분석한다(S1008). 전술한 바와 같이, 공정인자는 제품 생산과정의 입력인자, 중간출력인자, 또는 출력인자일 수 있으므로, 정보 이득 순으로 선별된 공정인자 간에 상관관계가 존재할 수 있다. 이때, 복수의 공정인자 대해, 두 공정인자 간의 상관관계는 상관계수(correlation coefficient)로 나타내는데, 상관계수는 두 공정인자의 공분산(covariance)을 두 공정인자의 표준편차의 곱으로 나눈 값이다. 한편, 상관계수는 매트릭스 형태의 히트맵(heatmap) 상에 표현될 수 있다. The process factor selection unit 806 analyzes the correlation between the process factors selected in the order of information gain (S1008). As described above, since the process factor may be an input factor, an intermediate output factor, or an output factor of the product production process, a correlation may exist between the process factors selected in the order of information gain. At this time, for a plurality of process factors, the correlation between the two process factors is represented by a correlation coefficient, which is a value obtained by dividing the covariance of the two process factors by the product of the standard deviations of the two process factors. Meanwhile, the correlation coefficient may be expressed on a heatmap in the form of a matrix.
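The correlation coefficient described above (covariance divided by the product of the standard deviations) can be sketched as follows. The function names and sample columns are hypothetical, and each column is assumed to be non-constant so that its standard deviation is not zero.

```python
import math


def pearson(x, y):
    """Correlation coefficient: covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Nested dict of pairwise coefficients, i.e. the values a heatmap would display."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical process factors.
columns = {
    "oven_temp": [1.0, 2.0, 3.0, 4.0],
    "thickness": [2.0, 4.0, 6.0, 8.0],
    "cooling_time": [8.0, 6.0, 4.0, 2.0],
}
matrix = correlation_matrix(columns)
```

In this example `oven_temp` and `thickness` are perfectly positively correlated (coefficient 1), while `oven_temp` and `cooling_time` are perfectly negatively correlated (coefficient -1).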

공정인자 선정부(806)는 선별된 공정인자 간의 상관관계를 분석하여, 상관계수가 기설정된 기준치보다 큰 경우를 확인한다. 공정인자 선정부(806)는, 상관계수가 기설정된 기준치보다 큰 두 공정인자에 대해, 출력인자, 중간출력인자, 및 입력인자 순으로 제거한다. 예컨대, 상관관계가 존재하는 두 공정인자 각각이 출력인자 및 입력인자인 경우, 출력인자를 제거한다. 한편, 상관계수가 기설정된 기준치보다 큰 두 공정인자가 같은 종류인 경우, 정보 이득이 높은 공정인자를 선별한다. The process factor selection unit 806 analyzes the correlation between the selected process factors and identifies a case where the correlation coefficient is greater than a preset reference value. The process factor selection unit 806 removes, in the order of output factors, intermediate output factors, and input factors, two process factors having a correlation coefficient greater than a predetermined reference value. For example, when two process factors having a correlation are an output factor and an input factor, respectively, the output factor is removed. Meanwhile, when two process factors having a higher correlation coefficient than a predetermined reference value are of the same type, a process factor having a high information gain is selected.
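The removal rule above can be sketched as follows. The data structures, the threshold of 0.9, and the factor names are assumptions for illustration. The sketch also compares the absolute value of the coefficient with the threshold so that strong negative correlations are caught; the text only says "greater than a preset reference value".

```python
# Lower rank is removed first: output, then intermediate output, then input factors.
REMOVAL_RANK = {"output": 0, "intermediate": 1, "input": 2}


def drop_correlated(factors, corr, threshold=0.9):
    """Return the factor names that survive correlation-based filtering.

    factors: dict name -> {"kind": "input" | "intermediate" | "output", "gain": float}
    corr:    dict frozenset({name_a, name_b}) -> correlation coefficient
    """
    kept = set(factors)
    # Handle the most strongly correlated pairs first.
    for pair, r in sorted(corr.items(), key=lambda item: -abs(item[1])):
        a, b = sorted(pair)
        if abs(r) <= threshold or a not in kept or b not in kept:
            continue
        kind_a, kind_b = factors[a]["kind"], factors[b]["kind"]
        if kind_a != kind_b:
            loser = a if REMOVAL_RANK[kind_a] < REMOVAL_RANK[kind_b] else b
        else:
            # Same kind: keep the factor with the higher information gain.
            loser = a if factors[a]["gain"] < factors[b]["gain"] else b
        kept.discard(loser)
    return kept


# Hypothetical factors and pairwise correlations.
factors = {
    "oven_temp": {"kind": "input", "gain": 0.30},
    "final_thickness": {"kind": "output", "gain": 0.28},
    "cure_time": {"kind": "input", "gain": 0.10},
    "line_speed": {"kind": "input", "gain": 0.12},
}
corr = {
    frozenset({"oven_temp", "final_thickness"}): 0.95,
    frozenset({"cure_time", "line_speed"}): -0.93,
    frozenset({"oven_temp", "cure_time"}): 0.20,
}
kept = drop_correlated(factors, corr)
```

Here the output factor `final_thickness` is removed in favor of the input factor `oven_temp`, and of the two correlated input factors the one with the lower information gain, `cure_time`, is removed.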

상관관계에 기반하는 공정인자 선별을 이용하여, 공정인자 선정부(806)는 공정인자 간에 존재하는 다중공선성(multicollinearity)를 제거할 수 있다. Using process factor selection based on correlation, the process factor selector 806 may remove multicollinearity existing between process factors.

Meanwhile, when the number of selected process factors falls below the preset number because process factors were removed by the correlation analysis, the process factor selection unit 806 may select additional process factors in order of information gain.

전술한 바와 같은 T-테스트, 정보 이득 비교, 및 상관관계 분석을 기반으로, 공정인자 선정부(806)는 최종적인 주요 공정인자를 선별할 수 있다.Based on the T-test, information gain comparison, and correlation analysis as described above, the process factor selector 806 may select a final major process factor.

Meanwhile, quality data may be heavily imbalanced, with far less defective-product data than good-product data. For example, depending on the product, the ratio can be as severe as several thousand to one. Because such imbalance can lead to biased training of a machine learning algorithm-based model, data balancing based on augmentation of the defective-product data may be required.

데이터 균형화부(808)는 불량 데이터에 대한 데이터 균형화(data balancing)을 수행한다. 데이터 균형화부(808)는 불량 데이터를 업샘플링(upsampling)하여 불량 데이터의 개수를 증강시킴으로써, 불량 데이터와 양품 데이터 간의 균형을 성취한다. 예컨대, 데이터 균형화부(808)는 kNN(k Nearest Neighbors) 모델 기법을 이용하여 데이터 분포 내에서 유사한 데이터를 생성할 수 있다. The data balancing unit 808 performs data balancing on bad data. The data balancing unit 808 upsamples bad data to increase the number of bad data, thereby achieving a balance between bad data and good data. For example, the data balancing unit 808 may generate similar data within a data distribution using a k Nearest Neighbors (kNN) model technique.

여기서, kNN 모델 기법은, 새로운 데이터가 주어지면, 그 주변(이웃) k 개의 데이터를 살펴본 후, 더 많은 데이터가 포함되어 있는 범주로 분류하는 방식이다. 따라서, k 개 중에 과반 이상의 불량 데이터를 포함하는 주변에서 새로운 데이터를 생성함으로써, 데이터 균형화부(808)는 불량 데이터의 개수를 증강시킬 수 있다. Here, the kNN model technique is a method of classifying into a category that includes more data after examining k data around (neighbors) when new data is given. Accordingly, the data balancing unit 808 may increase the number of defective data by generating new data in the vicinity including more than half of the defective data among the k pieces.
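One possible reading of this kNN-based upsampling is sketched below. The text does not specify how the new data points are generated; interpolating between a defective sample and one of its defective neighbors (similar in spirit to SMOTE) is an assumption, as are the function name, parameters, and sample points.

```python
import math
import random


def knn_upsample(good, defective, k=3, n_new=4, seed=0):
    """Create synthetic defective samples near defective points whose k nearest
    neighbours are mostly defective."""
    rng = random.Random(seed)
    points = [(p, 0) for p in good] + [(p, 1) for p in defective]
    seeds = []
    for i, (p, label) in enumerate(points):
        if label != 1:
            continue
        others = points[:i] + points[i + 1:]
        neighbours = sorted(others, key=lambda item: math.dist(p, item[0]))[:k]
        bad = [q for q, lab in neighbours if lab == 1]
        if 2 * len(bad) > k:  # more than half of the k neighbours are defective
            seeds.append((p, bad))
    synthetic = []
    while seeds and len(synthetic) < n_new:
        p, bad = rng.choice(seeds)
        q = rng.choice(bad)
        w = rng.random()
        # New point on the segment between p and one of its defective neighbours.
        synthetic.append(tuple(a + w * (b - a) for a, b in zip(p, q)))
    return synthetic


# Hypothetical two-dimensional quality data.
good = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3)]
defective = [(5.0, 5.0), (5.2, 5.1), (5.1, 4.9), (4.9, 5.2)]
new_defects = knn_upsample(good, defective)
```

Because every synthetic point lies between two existing defective samples, the new data stays inside the region already occupied by defective products.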

After preprocessing the quality data, selecting the main process factors, and balancing the data, the training unit 112 trains the four machine learning models 810 based on the decision tree, random forest, XGBoost, and LightGBM algorithms described above, and then selects the best-performing one as the inference model.

먼저, 트레이닝부(112)는 균형화된 품질데이터를 학습용 데이터 및 검증용 데이터로 분할한다. 예컨대, 80 %의 품질데이터가 학습용 데이터로, 잔여 20 %의 품질데이터가 검증용 데이터로 이용될 수 있다. First, the training unit 112 divides the balanced quality data into learning data and verification data. For example, 80% of quality data may be used as training data, and the remaining 20% of quality data may be used as verification data.

트레이닝부(112)는 학습용 데이터 및 학습용 레이블을 기반으로 4 개의 머신 러닝 모델(810)에 대한 트레이닝을 수행한다. 각 모델은 결정 트리 기반으로 구현되므로, 트리 내 각 분기에서의 정보 이득을 최대화하는 방향으로 트레이닝이 수행될 수 있다.The training unit 112 performs training on four machine learning models 810 based on training data and training labels. Since each model is implemented based on a decision tree, training can be performed in a direction that maximizes the information gain at each branch in the tree.

트레이닝부(112)는 검증용 데이터를 기반으로 4 개의 머신 러닝 모델(810)에 대한 교차 검증을 수행하여, 4 개의 머신 러닝 모델(810)에 대한 트레이닝 성능을 저장한다.The training unit 112 performs cross-validation on the four machine learning models 810 based on data for verification, and stores training performance of the four machine learning models 810 .

Hyperparameters for training include, for example, the maximum depth (max-depth) and the leaf limit (leaf-limit); the maximum depth indicates the maximum value for tree branching, and the leaf limit indicates the limit value for leaves.

특히, 트레이닝부(112)는, 4 개의 모델(810)에 대한 트레이닝 과정에서 최대 깊이를 적절히 조절함으로써, 과적합을 방지하는 데 중점을 둔다. In particular, the training unit 112 focuses on preventing overfitting by appropriately adjusting the maximum depth in the training process for the four models 810 .

After training of the four models 810 is complete, the training unit 112 compares the performance of the four models 810 and selects the inference model. As illustrated in FIG. 6, the performance of a trained model includes the accuracy, precision, recall, and F1 score, which are based on the training labels and the decision results generated by each machine learning model. In addition, the performance of the trained model may include the runtime, which is the time required for training.

트레이닝부(112)는, F1 스코어가 가장 높은 모델을 최종적인 추론 모델로 선정한다. 다만, 사용자는 최종 모델의 선정 시, 불량을 감소시키는 것이 목적이면 리콜을, 가성불량을 감소시키는 것이 목적이면 정밀도를 선정기준으로 이용할 수 있다.The training unit 112 selects a model having the highest F1 score as a final inference model. However, when selecting a final model, the user may use recall as a selection criterion if the purpose is to reduce defects or precision if the purpose is to reduce false defects.
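As a non-limiting illustration, the performance measures and the selection rule described above may be sketched as follows. The function names, the dictionary layout of the stored performance, and the convention that label 1 denotes a defective product are assumptions made only for this sketch and do not appear in the disclosure.

```python
from typing import Dict, List


def classification_metrics(labels: List[int], predictions: List[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, with 1 (defective) as the positive class."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def select_inference_model(performance: Dict[str, Dict[str, float]],
                           criterion: str = "f1") -> str:
    """Return the name of the model that scores highest on the chosen criterion.

    criterion is "f1" by default; "recall" or "precision" may be passed when the
    user prefers to reduce missed defects or false defects, respectively.
    """
    return max(performance, key=lambda name: performance[name][criterion])
```

Passing `criterion="recall"` or `criterion="precision"` corresponds to the user-selectable criteria mentioned above; the default follows the F1-based selection.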

FIG. 11 is a flowchart illustrating a training process for machine learning models according to another embodiment of the present disclosure.

The training unit 112 splits the balanced quality data into training data and validation data (S1100).

The training unit 112 trains one machine learning model based on the training data and the training labels (S1102). Since each model is implemented on the basis of decision trees, training may proceed so as to maximize the information gain at each branch of the tree.

The training unit 112 performs cross-validation on the trained machine learning model based on the validation data (S1104) and then stores the performance of the model as the training result (S1106).

In training machine learning models, one important consideration is the trade-off between the training time and the model performance achieved. For field personnel who are not skilled in data analysis to use the analysis system 100, it may be appropriate to keep the training time to around 2 to 3 hours, so a training time of this order may serve as the criterion in the trade-off. To meet this criterion, each machine learning model may be trained once with hyperparameters preset to suit the quality data, and the optimal model may then be selected by comparing the cross-validation performance of the four machine learning models 810.

From an optimization standpoint, the optimal model should be selected by comparing performance across models after the hyperparameters of each model have been tuned. In this embodiment, however, the hyperparameters have been set empirically to values suited to the imbalanced nature of the quality data, so the training time can be minimized by performing cross-validation after a single round of training and comparing model performance.

In particular, the training unit 112 focuses on preventing overfitting by appropriately setting the maximum depth while training the four models 810.

The training unit 112 checks whether all four models 810 have been trained (S1110) and, if untrained models remain, continues training and validating them (S1102 to S1106).

After the four models 810 have been trained, the training unit 112 compares the performance of the four models and selects the inference model (S1112).

The training unit 112 performs hyperparameter optimization on the selected inference model (S1114).

For the inference model trained with the training-time reduction scheme described above, the training unit 112 improves performance by adjusting the hyperparameters within an appropriate range. Grid search is a representative method, but because it checks performance for every hyperparameter setting, it has the drawback of taking a long time.

To improve on this, the training unit 112 according to this embodiment may adjust the hyperparameters using random search. In random search, the hyperparameters are set at random and the performance of the inference model is checked, and this random setting and performance check may be repeated a preset number of times. The training unit 112 may optimize the hyperparameters by finding the setting that yields the best performance.

In another embodiment of the present disclosure, while performing the random search for the preset number of times, if the inference model satisfies a preset performance for a given hyperparameter setting, the training unit 112 may select that setting as the optimal hyperparameters and terminate the random search.
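By way of example only, the random search with a preset number of trials and the optional early termination described above may be sketched as follows. The names `random_search`, `evaluate`, `space`, `n_trials`, and `target_score` are hypothetical, and `evaluate` stands in for training and cross-validating the inference model with a given hyperparameter setting; none of these names is taken from the disclosure.

```python
import random
from typing import Callable, Dict, Optional, Sequence, Tuple

SearchSpace = Dict[str, Sequence[int]]


def random_search(evaluate: Callable[[Dict[str, int]], float],
                  space: SearchSpace,
                  n_trials: int,
                  target_score: Optional[float] = None,
                  seed: int = 0) -> Tuple[Dict[str, int], float]:
    """Sample hyperparameter settings at random and keep the best-scoring one.

    If target_score is given, the search stops at the first setting whose score
    reaches it (the early-termination variant); otherwise all n_trials are run.
    """
    rng = random.Random(seed)
    best_params: Dict[str, int] = {}
    best_score = float("-inf")
    for _ in range(n_trials):
        # Draw one candidate value per hyperparameter, independently.
        params = {name: rng.choice(list(values)) for name, values in space.items()}
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
        if target_score is not None and score >= target_score:
            break
    return best_params, best_score
```

A search space such as `{"max_depth": range(2, 11), "leaf_limit": [8, 16, 32]}` would correspond to the hyperparameters named earlier; the value ranges here are illustrative only.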

As described above, according to this embodiment, applying hyperparameter optimization only to the inference model and performing that optimization by random search make it possible to reduce the training time of the inference model to a minimum.

The device (not shown) on which the analysis system 100 according to this embodiment is installed may be a programmable computer and includes at least one communication interface capable of connecting to a server (not shown).

The training of the inference model described above may be performed on the device on which the analysis system 100 is installed, using the computing power of that device.

Alternatively, the training of the inference model described above may be performed on the server. The training unit of the server may train a machine learning model having the same structure as the inference model that is a component of the analysis system 100 installed on the device. The server transfers the parameters of the trained machine learning model to the device through the communication interface connected to the device, and the analysis system 100 may set the parameters of the inference model using the received parameters. The parameters of the inference model may also be set at the time the analysis system 100 is installed on the device.

FIG. 12 is a flowchart of a quality data analysis method according to an embodiment of the present disclosure.

The analysis system 100 acquires quality data for a product (S1200). The quality data may be collected for a plurality of process factors that are applied or arise during production of the product. The process factors input for quality data analysis may be the key process factors selected during prior training of the inference model.

The analysis system 100 may set a data type, such as categorical or numeric, for each process factor used as an input to the inference model. Here, the analysis system 100 may use the UI unit 110 to obtain from the user the inputs required for analysis (e.g., the quality data and the types of the process factors).

Meanwhile, the analysis system 100 sets the process factor used as the target output as the target factor.

The analysis system 100 preprocesses the quality data (S1202).

For categorical process factors, an encoding process may be performed that converts them into embedding values suitable for the inference model.

An example of categorical data is the target factor indicating whether a field claim has occurred for the product. Encoding this target factor may therefore be the process of generating analysis labels for quality analysis based on the inference model.

The analysis system 100 may also fill in process-factor values that were missed during collection. For example, a missing numeric process factor may be set to the median value, and a missing categorical process factor may be set to the mode value.
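The missing-value handling described above may, for example, be sketched as follows. This is a minimal sketch that assumes a column is held as a Python list with `None` marking a missing entry; the function name `impute_column` and the `data_type` strings are hypothetical and chosen only for illustration.

```python
from collections import Counter
from statistics import median
from typing import List, Optional, Union

Value = Union[float, str]


def impute_column(values: List[Optional[Value]], data_type: str) -> List[Value]:
    """Fill missing entries (None) with the median (numeric) or the mode (categorical)."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if data_type == "numeric":
        fill: Value = median(observed)
    elif data_type == "categorical":
        # most_common(1) returns the single most frequent category.
        fill = Counter(observed).most_common(1)[0][0]
    else:
        raise ValueError(f"unknown data type: {data_type}")
    return [fill if v is None else v for v in values]
```

The median is less sensitive to outlying measurements than the mean, which is one plausible reason for preferring it for numeric process factors.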

Based on the plurality of preprocessed process factors, the analysis system 100 uses the inference model to generate a decision result on whether the product is good or defective (S1204). Here, the decision result may be a probability that the product is good or defective.

The inference model is implemented as a machine learning model and may be a model implementing one of four tree-based machine learning algorithms: decision tree, random forest, XGBoost, or LightGBM. Through the training process, the training unit 112 may select, as the inference model, the best-performing model among the models to which each of the four machine learning algorithms is applied.

Based on the plurality of process factors, the analysis labels, and the decision results, the analysis system 100 generates an analysis report on product quality (S1206). To show, both overall and in detail, the influence of each process factor on the decision result (good or defective product), the analysis report may include all or some of an analysis data summary 202, process factor importance 204, data distribution by process factor 206, and analysis results 208.

The analysis system 100 provides the analysis report to the user through the UI unit 110 (S1208).

FIG. 13 is a flowchart of a method of changing a quality control criterion based on a simulator according to an embodiment of the present disclosure.

The simulator selects adjustment process factors whose values are to be adjusted and obtains adjusted factor values for them (S1300). Using the UI unit 110 as illustrated in FIG. 7, the simulator may select the adjustment process factors and then obtain their adjusted values from the user.

Checkboxes may be used to select the adjustment process factors. For a process factor whose checkbox is selected, the factor value may be adjusted according to its data type.

If the selected process factor is categorical, the category desired by the user may be chosen with a checkbox. If it is numeric, the value of the process factor may be adjusted with a slider.

In addition, the value of a process factor may be adjusted with reference to the T-test result for that factor so as to minimize the distribution of defects in the product.

The adjustment process factors may be all or some of the key process factors selected during prior training of the inference model.

The input factors described above may also be selected as adjustment process factors.

Using the inference model, the simulator generates the probability that the product is good or defective based on the adjustment process factors (S1302). As described above, the decision result generated by the inference model may be a probability that the product is good or defective.

The simulator checks whether the defect probability is below a preset reference probability (S1304). If the defect probability is equal to or greater than the reference probability, the simulator obtains newly adjusted process factors and repeats the simulation (S1300 to S1304).

If the defect probability is below the reference probability, the simulator selects the adjusted factor values as the optimal factor values for the adjustment process factors (S1306).

The simulator changes the quality control criteria for the adjustment process factors based on the optimal factor values (S1308). The changed quality control criteria may be applied to subsequent production of the product.
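The loop of steps S1300 to S1306 may be illustrated, without limitation, by the following sketch. Here `predict_defect_probability` stands in for the trained inference model, and `candidate_adjustments` stands in for the sequence of adjusted values that the user would supply through the UI unit 110; both names, and the dictionary representation of the process factors, are assumptions of this sketch.

```python
from typing import Callable, Dict, Iterable, Optional

Factors = Dict[str, float]


def find_optimal_adjustment(predict_defect_probability: Callable[[Factors], float],
                            baseline: Factors,
                            candidate_adjustments: Iterable[Factors],
                            reference_probability: float) -> Optional[Factors]:
    """Try adjusted factor values until the predicted defect probability is low enough.

    Returns the first adjustment whose defect probability falls below the
    reference probability, or None if no candidate qualifies.
    """
    for adjustment in candidate_adjustments:                 # S1300: obtain adjusted values
        adjusted = {**baseline, **adjustment}                # override only the adjusted factors
        probability = predict_defect_probability(adjusted)   # S1302: run the inference model
        if probability < reference_probability:              # S1304: compare with the reference
            return adjustment                                # S1306: optimal factor values
    return None
```

The returned adjustment would then serve as the basis for changing the quality control criteria in step S1308.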

FIG. 14 is a flowchart of a method of training an inference model according to an embodiment of the present disclosure.

The training unit 112 acquires quality data for a product for use in training the inference model (S1400). The quality data may be collected for a plurality of process factors that are applied or arise during production of the product.

The training unit 112 may set a data type for each process factor used in training.

Meanwhile, the quality data may include a factor that can be used as the target output (i.e., the training label) when training the inference model (e.g., whether a field claim has occurred for the product). The training unit 112 sets the factor used as the target output as the target factor.

The categories of the process factors and the target factor may be set using the UI unit 110, as described above.

The training unit 112 preprocesses the quality data (S1402).

The training unit 112 may perform an encoding process that converts categorical process factors into embedding values suitable for the inference model. The training unit 112 may also fill in process-factor values that were missed during collection. For example, a missing numeric process factor may be set to the median value, and a missing categorical process factor may be set to the mode value.

Encoding the target factor, which is categorical data, may be the process of generating the training labels for training the inference model.

The training unit 112 selects, from the many process factors included in the quality data, the key process factors that strongly influence the target factor (S1404).

If the number of process factors exceeds a preset number, the training unit 112 may perform the key-process-factor calculation described above to narrow the key process factors down to the preset number or fewer. All or some of a T-test, a comparison of information gain, and correlation analysis may be performed as that calculation.

For the key process factors, the training unit 112 performs data balancing on the defect data (S1406). The training unit 112 may upsample the defect data to increase the number of defect samples, thereby achieving a balance between the defect data and the good-product data.
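As one possible illustration of the data balancing in step S1406, the sketch below upsamples the defect data by random duplication until the two classes are equal in size. Random duplication is an assumption of this sketch; the passage above does not specify which upsampling technique is used. The row layout and the name `upsample_defects` are likewise hypothetical.

```python
import random
from typing import List, Tuple

Row = Tuple[List[float], int]  # (process-factor values, label), where 1 = defective


def upsample_defects(rows: List[Row], seed: int = 0) -> List[Row]:
    """Duplicate randomly chosen defective rows until both classes have equal counts."""
    rng = random.Random(seed)
    good = [row for row in rows if row[1] == 0]
    bad = [row for row in rows if row[1] == 1]
    if not bad or len(bad) >= len(good):
        # Nothing to draw from, or the defect class is not the minority.
        return list(rows)
    extra = [rng.choice(bad) for _ in range(len(good) - len(bad))]
    return list(rows) + extra
```

Because the duplicated rows are exact copies, balancing of this kind is normally applied before the split into training and validation data is evaluated with care, since copies of one defect sample may land on both sides of the split.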

The training unit 112 trains the four machine learning models 810 (S1408).

The training unit 112 splits the balanced quality data into training data and validation data and then trains each of the four machine learning models 810 based on the training data and the training labels. The training unit 112 also performs cross-validation on the trained machine learning models based on the validation data and then stores the performance of each model.

The training unit 112 may focus on preventing overfitting by appropriately adjusting the maximum depth while training the four models 810.

After the four models 810 have been trained, the training unit 112 selects the optimal model as the inference model by comparing the performance of the four models 810 (S1410). The training unit 112 may select the model with the highest F1 score as the final inference model.

To address the problem of bias in the process factors included in the quality data, the analysis system 100 according to this embodiment uses a scheme that improves the quality control criteria of the process factors.

Hereinafter, the scheme performed by the analysis system 100 for improving the quality control criteria of the process factors is described with reference to FIGS. 15 and 16.

FIG. 15 is a schematic configuration diagram of an apparatus for improving quality control criteria for process factors according to an embodiment of the present disclosure.

The quality control criterion improvement apparatus according to this embodiment is included in the analysis system 100 and adjusts the quality control criteria of low-influence process factors based on the degree of influence between the process factors of a product and field claims (hereinafter, 'influence'). The apparatus may include all or some of an input unit 102, an influence analysis unit 1504, a control range adjustment unit 1506, a data re-collection unit 1508, and a subdivision collection unit 1510.

The input unit 102 acquires quality data and field claims for the product. The quality data may be collected for a plurality of process factors that are applied or arise during production of the product. Meanwhile, a field claim may indicate whether the product is good or defective and may later be set as the target factor for training.

The influence analysis unit 1504 analyzes the influence between the process factors included in the quality data and the field claims. The methods used in the key-process-factor selection described above (e.g., T-test, calculation of information gain, and correlation analysis) may be used to analyze the influence. Based on this influence analysis, the influence analysis unit 1504 may arrange the process factors in order of influence.

Specifically, the influence analysis unit 1504 uses a T-test to select process factors that are statistically significant with respect to whether the product is good or defective. The influence analysis unit 1504 may compare the information gain of the selected process factors and arrange them in descending order of information gain. The influence analysis unit 1504 analyzes the correlation among the arranged process factors and, for any two process factors whose correlation coefficient exceeds a preset threshold, removes one of them (e.g., the one with the lower information gain) from the arrangement. This is because conflicting adjustment results may arise if the control ranges of two highly correlated process factors are both adjusted. The order of influence may therefore be the descending order of information gain after statistical significance and correlation have been taken into account.
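The ordering and correlation-based pruning described above may be sketched, purely as an example, as follows. The sketch assumes that the T-test screening has already been applied and that an information gain value is available for each remaining factor; the use of the Pearson correlation coefficient, the function names, and the data layout are assumptions of this sketch rather than details of the disclosure.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def rank_by_influence(columns: Dict[str, List[float]],
                      info_gain: Dict[str, float],
                      corr_threshold: float) -> List[str]:
    """Order factors by descending information gain, dropping correlated duplicates.

    A factor is dropped when its absolute correlation with an already kept
    (higher-gain) factor exceeds corr_threshold.
    """
    ordered = sorted(columns, key=lambda name: info_gain[name], reverse=True)
    kept: List[str] = []
    for name in ordered:
        if all(abs(pearson(columns[name], columns[k])) <= corr_threshold for k in kept):
            kept.append(name)
    return kept
```

Processing the factors in descending order of information gain ensures that, of any highly correlated pair, the factor with the lower information gain is the one removed.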

For biased process factors whose influence does not fall within the top 20%, the control range adjustment unit 1506 widens the control range. To widen the control range of a biased process factor, the analysis system 100 may lower the lower limit or raise the upper limit of the existing control range, thereby widening the range of data collected during production.

The data re-collection unit 1508 re-collects the quality data based on the widened control range. The data re-collection unit 1508 may re-collect and store the quality data using a storage device included in the analysis system 100 or in the server. Depending on the characteristics of the production process or of the process factors, this re-collection may take days, weeks, or months or longer.

Meanwhile, after the control range has been adjusted, the influence analysis unit 1504 may analyze the influence between the process factors included in the re-collected quality data and the field claims. Based on the influence analysis, the process factors may be rearranged in order of influence. For biased process factors whose influence still does not fall within the top 20% in the re-collected quality data, the influence analysis unit 1504 may keep the existing control range.

For process factors whose influence is within the top 20% in the input quality data or the re-collected quality data, the subdivision collection unit 1510 subdivides the data within the control range and then re-collects it. Subdividing and re-collecting the data within the control range allows the quality data to be distributed evenly across the control range.
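The split between top-20% factors and biased factors, together with the widening of the control range, may be pictured with the following sketch. The widening ratio, the choice to move both limits symmetrically, and the rounding of the top-20% count with `ceil` are all assumptions introduced for illustration; the description above only states that the lower limit may be lowered or the upper limit raised.

```python
from math import ceil
from typing import Dict, List, Tuple

Range = Tuple[float, float]  # (lower limit, upper limit) of a control range


def adjust_control_ranges(ranked_factors: List[str],
                          ranges: Dict[str, Range],
                          widen_ratio: float = 0.1,
                          top_fraction: float = 0.2) -> Tuple[List[str], Dict[str, Range]]:
    """Widen the control range of every factor outside the top fraction by influence.

    ranked_factors must already be sorted from highest to lowest influence.
    Returns the top-influence factors and the (possibly widened) control ranges.
    """
    n_top = ceil(len(ranked_factors) * top_fraction)
    top = ranked_factors[:n_top]
    adjusted = dict(ranges)
    for name in ranked_factors[n_top:]:
        low, high = ranges[name]
        margin = (high - low) * widen_ratio
        adjusted[name] = (low - margin, high + margin)
    return top, adjusted
```

The factors returned in `top` would be the candidates for subdivided re-collection, while the widened ranges would govern the re-collection of data for the biased factors.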

FIG. 16 is a flowchart of a method of improving quality control criteria of process factors according to an embodiment of the present disclosure.

The analysis system 100 analyzes the influence between the process factors included in the quality data and the field claims (S1600). The methods used in the key-process-factor selection described above (e.g., T-test, calculation of information gain, and correlation analysis) may be used to analyze the influence. Based on the influence analysis, the analysis system 100 may arrange the process factors in order of influence. Here, the order of influence may be the descending order of information gain after statistical significance and correlation have been taken into account.

The analysis system 100 checks whether the influence of each process factor is within the top 20% (S1602).

For biased process factors whose influence does not fall within the top 20%, the analysis system 100 widens the control range (S1604). To widen the control range of a biased process factor, the analysis system 100 may lower the lower limit or raise the upper limit of the existing control range, thereby widening the range of data collected during production.

The analysis system 100 re-collects the quality data based on the widened control range (S1606). The analysis system 100 may re-collect and store the quality data using a storage device included in the analysis system 100 or in the server.

For the process factors re-collected with the widened control range, the analysis system 100 analyzes the influence between the process factors and the field claims (S1608). Based on the influence analysis, the analysis system 100 may rearrange the process factors in order of influence.

The analysis system 100 checks whether the influence of each process factor is within the top 20% (S1610).

For biased process factors whose influence does not fall within the top 20%, the analysis system 100 keeps the existing control range (S1612).

Meanwhile, for process factors whose influence is within the top 20% (S1602 and S1610), including biased process factors whose control range has been widened, the analysis system 100 subdivides the data within the control range and then re-collects it (S1614). Subdividing and re-collecting the data within the control range allows the quality data to be distributed evenly across the control range.

As described above, according to this embodiment, providing an analysis system that subdivides and re-collects the data within the control range makes it possible to reduce the effect of bias in the process factors and to increase the efficiency of quality data analysis.

본 실시예에 있어서, 전술한 바와 같이, 품질 분석의 대상이 되는 제품은 기어박스와 같은, 차량에 포함되는 부품일 수 있다. 복합 시스템인 차량 전체에 비해, 기어박스는, 본 실시예에 따른 분석 시스템(100)을 기반으로 품질 분석을 수행하기에 적절한 규모의 시스템이다. 즉, 머신 러닝 기반의 추론 모델은, 트레이닝 과정을 이용하여, 기어박스에 대한 다수의 공정인자와 필드클레임(즉, 제품의 양 또는 불량) 간의 인과 관계를 모델링할 수 있다. 또한, 트레이닝된 추론 모델을 이용하여, 특정한 공정인자에 대한 품질관리 기준을 조정함으로써, 기어박스의 불량 발생률을 감소시킬 수 있다. In this embodiment, as described above, a product to be analyzed for quality may be a part included in a vehicle, such as a gearbox. Compared to the entire vehicle, which is a complex system, the gearbox is a system of an appropriate scale to perform quality analysis based on the analysis system 100 according to the present embodiment. That is, the machine learning-based reasoning model may model a causal relationship between a plurality of process factors and field claims (ie, product quantity or defect) for the gearbox by using a training process. In addition, by using the trained inference model, by adjusting the quality control criteria for specific process factors, it is possible to reduce the rate of defective gearboxes.

한편, 기어박스에 대한 품질데이터를 구성하는 공정인자로는, 피니언 플러그 체결 토크, lock ring 압입깊이, 피니언 그리스 도포량, lock ring 코킹량, 피니언 플러그 LDVT 높이, 코킹하중, 4점 베어링 압입깊이, rack bar 하중(LH 방향), rack bar 하중(RH 방향), 요크 압입하중 등을 들 수 있다. 본 실시예는 기어박스와 같은 제품에 대한 품질분석에 대한 발명이므로, 기어박스의 공정인자에 대해 자세한 설명을 생략한다. On the other hand, the process factors constituting the quality data for the gearbox include pinion plug fastening torque, lock ring press-in depth, pinion grease application amount, lock ring caulking amount, pinion plug LDVT height, caulking load, 4-point bearing press-in depth, rack These include bar load (LH direction), rack bar load (RH direction), and yoke press-in load. Since this embodiment is an invention for quality analysis for a product such as a gearbox, a detailed description of the process factors of the gearbox will be omitted.

Hereinafter, a case in which the analysis system 100 according to the present embodiment is applied to quality analysis of a gearbox is described.

FIG. 17 is a flowchart of a process of applying an analysis system according to an embodiment of the present disclosure to a gearbox.

First, the process of selecting the inference model used for quality analysis of the gearbox is described.

The analysis system 100 acquires quality data for the gearbox (S1700).

In general, unlike field claim data, which is managed in the quality system, process factor data is accumulated and managed separately in a manufacturing execution system (MES), so the two data sets must be integrated for quality analysis. The two data sets can be integrated on the basis of the product identifier (ID) of the gearbox, and two approaches may be used.

The first approach classifies and integrates the process factor data according to the type of field claim. Because a gearbox is subject to various field claims, such as vibration, noise, and breakage, each type can be treated as a class for data analysis. This approach has the advantage of enabling detailed root-cause analysis for each type of field claim, but the analysis results become biased when little data is available for a given type.

The second approach integrates the process factor data by labeling it as either good or defective according to whether a field claim has occurred, regardless of the claim type. This approach has the advantages that classifying the data is simple and takes little time, and that a general analysis remains possible even when little data exists for some types of field claim. In this embodiment, the analysis system 100 acquires quality data integrated according to the second approach, but is not necessarily limited thereto. In another embodiment according to the present disclosure, the inference model may be trained to infer whether a field claim of each class has occurred.
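The second approach can be illustrated with a minimal sketch. It assumes, purely for illustration, that the MES records are available as a mapping from product ID to process factor values and that the field claims are a list of records carrying a product ID; the function name `integrate_quality_data`, the field names, and all sample values are hypothetical and do not come from the disclosure.

```python
from typing import Dict, List, Set


def integrate_quality_data(
    mes_records: Dict[str, Dict[str, float]],
    claimed_ids: Set[str],
) -> List[dict]:
    """Join process factor records with field claims by product ID.

    Each product receives a binary label: 1 (defective) if any field claim
    exists for its ID, regardless of the claim type, otherwise 0 (good).
    """
    rows = []
    for product_id, factors in mes_records.items():
        row = {"product_id": product_id, **factors}
        row["defective"] = 1 if product_id in claimed_ids else 0
        rows.append(row)
    return rows


# Hypothetical sample data; the values are illustrative only.
mes = {
    "GB-001": {"lock_ring_press_depth": 2.10, "caulking_load": 15.2},
    "GB-002": {"lock_ring_press_depth": 2.35, "caulking_load": 14.8},
    "GB-003": {"lock_ring_press_depth": 2.05, "caulking_load": 15.0},
}
claims = [
    {"product_id": "GB-002", "type": "noise"},
    {"product_id": "GB-002", "type": "vibration"},
]

# The claim type is deliberately ignored: only the presence of a claim matters.
claimed = {claim["product_id"] for claim in claims}
dataset = integrate_quality_data(mes, claimed)
```

In this sketch a product with two claims of different types still yields a single row labeled defective, which reflects why the second approach tolerates sparse per-type claim data.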

Meanwhile, as described above, the integrated quality data may be used for training the inference model.

The analysis system 100 may set a data type for each process factor used for training. Here, the data types of a process factor may include a numeric type expressed as numbers, a categorical type expressed as characters, and a time type containing information on the time at which the data was collected.

Meanwhile, the integrated quality data includes a factor that can be used as the target output (i.e., the training label) when training the inference model (e.g., whether a field claim has occurred for the gearbox). The analysis system 100 sets the factor used as the target output as the target factor.

As illustrated in FIG. 9, the analysis system 100 performs preprocessing on the quality data (S1702).

Because the time at which quality data was collected is judged to have little correlation with whether the product is good or defective, time-type process factors are removed from the quality data used for training.

For categorical data, the analysis system 100 performs an encoding process that converts the data into embedding values suitable for the inference model.

An example of categorical data is the target factor indicating whether a field claim has occurred for a product. Encoding this target factor is the process of generating the training labels for the inference model.

For numeric data and encoded categorical data, the analysis system 100 handles data that went missing during collection. Here, missing categorical data may be set to the mode, and missing numeric data to the median. Meanwhile, a process factor with a high proportion of missing data may hinder training of the inference model. Accordingly, the analysis system 100 may exclude from training any process factor whose missing rate exceeds a predetermined ratio.
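The missing-data handling described above can be sketched as follows. The sketch assumes a column-oriented table in which a missing entry is represented by `None`; the function name `impute_missing`, the 0.5 default for the missing-rate limit, and the sample columns are assumptions made for illustration, since the disclosure only states that the limit is predetermined.

```python
from statistics import median, mode
from typing import Dict, List, Optional, Set


def impute_missing(
    table: Dict[str, List[Optional[object]]],
    categorical: Set[str],
    max_missing_ratio: float = 0.5,  # assumed value; the source only says "predetermined"
) -> Dict[str, list]:
    """Drop sparse process factors, then fill gaps with the mode or the median."""
    cleaned = {}
    for name, values in table.items():
        if not values:
            continue
        observed = [v for v in values if v is not None]
        missing_ratio = 1 - len(observed) / len(values)
        if not observed or missing_ratio > max_missing_ratio:
            continue  # too many gaps: exclude this factor from training
        # Categorical factors take the most frequent value, numeric ones the median.
        fill = mode(observed) if name in categorical else median(observed)
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned


# Hypothetical columns; None marks a value lost during collection.
table = {
    "torque": [10.0, None, 12.0, 11.0],
    "line": ["A", "B", None, "A"],
    "grease": [None, None, None, 0.4],
}
result = impute_missing(table, categorical={"line"})
```

Here `torque` and `line` are completed, while `grease` is dropped because three of its four values are missing.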

Meanwhile, when there are more than 20 process factors, the analysis system 100 may perform a process of selecting key process factors for training, as illustrated in FIG. 10. In the gearbox example of FIG. 17, this selection process is omitted because the quality data of the gearbox comprised 20 or fewer process factors.

The analysis system 100 performs data balancing on the process factors (S1704).

The analysis system 100 upsamples the defective data to increase the number of defective samples, thereby achieving a balance between defective data and good data. For example, the analysis system 100 may use a kNN model technique to generate similar data within the data distribution.
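One way to realize kNN-based upsampling is to interpolate between a defective sample and one of its nearest defective neighbors, in the manner of SMOTE. The disclosure does not specify the exact procedure, so the interpolation scheme, the function name `knn_upsample`, the value of `k`, and the sample points below are all assumptions.

```python
import math
import random
from typing import List, Sequence


def knn_upsample(
    minority: List[Sequence[float]],
    n_new: int,
    k: int = 3,
    seed: int = 0,
) -> List[List[float]]:
    """Create n_new synthetic samples between minority points and their neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + t * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical defective samples (two process factors each).
defective = [[2.0, 15.0], [2.2, 15.4], [2.1, 14.9], [2.4, 15.8]]
n_good = 10  # assumed number of good samples to balance against

new_points = knn_upsample(defective, n_new=n_good - len(defective))
balanced_defective = defective + new_points
```

Because every synthetic point lies on a segment between two existing defective samples, the generated data stays within the range already covered by the defective distribution.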

The analysis system 100 trains four machine learning models 810, and then selects the optimal model and determines it to be the inference model (S1706).

As illustrated in FIG. 11, the analysis system 100 may train the four machine learning models 810 using the quality data and training labels related to the gearbox, and select the inference model based on the learning performance of the four machine learning models 810.

In this embodiment, as a result of training the four machine learning models 810, a model implementing the random forest algorithm was selected as the inference model. During training, the hyperparameters were optimized, with a focus on the maximum depth, to achieve optimal performance for the inference model.

Meanwhile, the algorithms in common use recently are boosting algorithms such as XGBoost or LightGBM. However, because quality data is characterized by severe class imbalance and overfitting is prevented by adjusting the maximum depth, a bagging algorithm such as random forest may produce better results.

Hereinafter, the process of adjusting the process factors of the gearbox using the inference model is described.

The analysis system 100 selects the process factors whose quality control criteria are to be adjusted (S1708).

The analysis system 100 analyzes the influence of each process factor on whether a field claim occurs (S1730 to S1734), and selects the process factors with a large influence.

The analysis system 100 compares the feature importances of the process factors (S1730).

The analysis system 100 makes a first selection of process factors by comparing the feature importances generated during training by the inference model implementing the random forest algorithm.

FIG. 18 is an exemplary diagram showing the feature importances of the process factors of a gearbox according to an embodiment of the present disclosure.

In the example of FIG. 18, 'worst' indicates that the feature importance of a process factor exceeds a predetermined reference value, and that the factor therefore has a large effect on the occurrence of field claims. Meanwhile, as described above, these feature importances may be provided to the user through the UI unit 110 as part of the analysis report.

The analysis system 100 performs a T-test on the process factors (S1732). For process factors with high feature importance, the analysis system 100 performs a T-test that verifies the significance of the factor with respect to the good/defective distribution of the gearbox, and thereby makes a second selection of process factors.

FIG. 19 is an exemplary diagram showing a T-test according to an embodiment of the present disclosure.

Here, the gearbox process factor used as the T-test example is 'lock ring press-in depth', and the test result indicates significance. Meanwhile, as described above, the per-factor data distributions on which the T-test is based may be provided to the user through the UI unit 110 as part of the analysis report.
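The significance check can be sketched as a two-sample test comparing the values of one process factor in good products against its values in defective products. The disclosure does not state which variant of the T-test is used; the sketch below assumes Welch's unequal-variance form and, to stay within the standard library, approximates the p-value with a normal distribution, which is reasonable only for large samples. The function name `welch_t_test` and the sample measurements are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance
from typing import Sequence, Tuple


def welch_t_test(good: Sequence[float], bad: Sequence[float]) -> Tuple[float, float, float]:
    """Return (t statistic, Welch degrees of freedom, approximate two-sided p-value)."""
    n1, n2 = len(good), len(bad)
    # Squared standard errors of the two sample means.
    v1, v2 = variance(good) / n1, variance(bad) / n2
    t = (mean(good) - mean(bad)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # Assumption: normal approximation instead of the exact t distribution.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p


# Hypothetical press-in depth measurements for good and defective gearboxes.
good_depth = [2.00, 2.02, 1.98, 2.01, 1.99, 2.00]
bad_depth = [2.20, 2.25, 2.18, 2.22, 2.21, 2.24]

t_stat, dof, p_value = welch_t_test(good_depth, bad_depth)
significant = p_value < 0.05  # assumed significance level
```

A factor would pass the second selection when `significant` is true, that is, when its distribution differs between good and defective products by more than chance would explain.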

The analysis system 100 checks the correlations among the process factors (S1734).

After checking the correlations among the process factors that passed the T-test, the analysis system 100 selects, when the correlation coefficient between two process factors is greater than a predetermined reference value, the process factor with the higher feature importance.
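The correlation check can be sketched as a greedy filter that visits the process factors in descending order of feature importance and skips any factor that is strongly correlated with one already kept. The use of the Pearson coefficient, the comparison of its absolute value, the 0.8 threshold, and all names and sample values below are assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_correlated(
    data: Dict[str, Sequence[float]],
    importance: Dict[str, float],
    threshold: float = 0.8,  # assumed reference value
) -> List[str]:
    """Of each strongly correlated pair, keep only the more important factor."""
    kept: List[str] = []
    for name in sorted(data, key=lambda n: importance[n], reverse=True):
        if all(abs(pearson(data[name], data[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical measurements and feature importances.
data = {
    "press_depth": [2.0, 2.1, 2.2, 2.3, 2.4],
    "caulking_amount": [4.1, 4.2, 4.4, 4.6, 4.8],
    "grease": [0.5, 0.3, 0.6, 0.2, 0.5],
}
importance = {"press_depth": 0.40, "caulking_amount": 0.35, "grease": 0.25}

selected = drop_correlated(data, importance)
```

In this example `caulking_amount` rises almost linearly with `press_depth`, so only the more important of the two is retained.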

In conclusion, the process factors illustrated in FIG. 18 are those determined, through the influence analysis described above, to have a large effect on the occurrence of gearbox defects. In addition, all of them are adjustable process factors (i.e., input factors).

The analysis system 100 changes the quality control criteria for the selected process factors (S1710).

Based on the distribution of a process factor, its quality control criterion may be changed so as to minimize the defect distribution of the gearbox within the adjustable management range. Here, the parameters defining the distribution of the process factor may be estimated in advance and then used in changing the quality control criterion.

Table 1 shows the quality control criteria for the process factors of the gearbox before and after the change.

For example, the quality control criterion for the lock ring press-in depth may be changed, based on a T-test result such as that illustrated in FIG. 19, so as to minimize the defect distribution of the gearbox.

Using the inference model as a simulator, the analysis system 100 may generate probability values for whether the gearbox is good or defective under the changed quality control criteria, and thereby check whether the criteria have been changed appropriately. For example, when the probability that the gearbox is defective is greater than or equal to a reference probability, the analysis system 100 may obtain a quality control criterion and generate the determination result again, thereby repeatedly checking whether the quality control criterion has been changed appropriately.
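The repeated check described above can be sketched as a loop that proposes a criterion, queries the simulator, and stops once the predicted defect probability falls below the reference probability. The disclosure does not say how each new criterion is obtained; the sketch assumes, for illustration only, that the criterion is a range tightened symmetrically by a fixed step, and it replaces the trained inference model with a toy function. All names and numbers are hypothetical.

```python
from typing import Callable, Tuple

Range = Tuple[float, float]


def tune_qc_range(
    predict_defect_prob: Callable[[Range], float],
    initial: Range,
    reference_prob: float,
    step: float,
    max_iter: int = 50,
) -> Tuple[Range, float]:
    """Tighten a quality control range until the simulated defect probability is acceptable."""
    low, high = initial
    prob = predict_defect_prob((low, high))
    for _ in range(max_iter):
        # Stop when the criterion is acceptable or cannot be tightened further.
        if prob < reference_prob or high - low <= 2 * step:
            break
        low, high = low + step, high - step  # assumed update rule
        prob = predict_defect_prob((low, high))
    return (low, high), prob


def toy_simulator(qc_range: Range) -> float:
    """Stand-in for the inference model: a wider range admits more defects."""
    return min(1.0, 0.1 * (qc_range[1] - qc_range[0]))


final_range, final_prob = tune_qc_range(
    toy_simulator, initial=(1.0, 3.0), reference_prob=0.13, step=0.1
)
```

In practice the callable passed as `predict_defect_prob` would wrap the trained inference model, so that no gearbox needs to be produced to evaluate a candidate criterion.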

For the process factors shown in Table 1, applying the changed quality control criteria to gearbox production was expected to reduce the defect rate attributable to each process factor by at least 10% and at most 90%. In practice, when the changed quality control criteria for process factors such as lock ring caulking amount, pinion plug LVDT height, and 4-point bearing press-in depth were applied to gearbox production, a reduction in the defect rate attributable to those process factors was confirmed.

In the above description, the inference model was used to change the quality control criteria, but it may also be used for quality analysis. For example, the inference model may be used to check the characteristics of quality data collected in subsequent production after new quality control criteria have been applied to gearbox production.

Each flowchart according to the present embodiment describes its processes as being executed sequentially, but the embodiment is not necessarily limited thereto. In other words, the processes shown in a flowchart may be executed in a different order, or one or more of them may be executed in parallel, so the flowcharts are not limited to a chronological order.

Various implementations of the systems and techniques described herein may be realized in digital electronic circuitry, integrated circuits, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation as one or more computer programs executable on a programmable system. The programmable system includes at least one programmable processor (which may be a special-purpose or general-purpose processor) coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. Computer programs (also known as programs, software, software applications, or code) contain instructions for the programmable processor and are stored on a "computer-readable recording medium".

The computer-readable recording medium includes all kinds of recording devices in which data readable by a computer system is stored. Such a computer-readable recording medium may be a non-volatile or non-transitory medium such as a ROM, CD-ROM, magnetic tape, floppy disk, memory card, hard disk, magneto-optical disk, or storage device. In addition, the computer-readable recording medium may be distributed across computer systems connected through a network, so that computer-readable code is stored and executed in a distributed manner.

The above description merely illustrates the technical idea of the present embodiment, and those of ordinary skill in the art to which the present embodiment pertains will be able to make various modifications and variations without departing from its essential characteristics. Accordingly, the present embodiments are intended to explain, not to limit, the technical idea of the present embodiment, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present embodiment should be construed according to the claims below, and all technical ideas within the scope equivalent thereto should be construed as falling within the scope of rights of the present embodiment.

100: quality data analysis system
102: input unit 104: data preprocessing unit
106: determination unit 108: data visualization unit
110: UI unit 112: training unit
202: analysis data summary 204: process factor importance
206: data distribution by process factor
208: analysis result
302: process factor adjustment unit 304: determination result output unit
306: important factor output unit 308: criterion application unit
806: process factor selection unit 808: data balancing unit
1504: impact analysis unit 1506: management range adjustment unit
1508: data recollection unit 1510: data subdivision collection unit

Claims

A method for improving quality control criteria for a product, performed by a computing device, the method comprising:
acquiring quality data and field claims for the product, wherein the quality data is collected for a plurality of process factors (process features) applied or occurring in the production process of the product, and the field claims indicate whether the product is good or defective;
analyzing a first influence between the plurality of process factors and the field claims;
expanding, for a biased process factor whose first influence does not fall within a predetermined upper ratio among the plurality of process factors, an existing management range of the biased process factor;
recollecting quality data for the plurality of process factors based on the expanded management range;
analyzing, for the recollected quality data, a second influence between the plurality of process factors and the field claims; and
subdividing and recollecting the data of a process factor when the first influence or the second influence of that process factor falls within the predetermined upper ratio.

The method of claim 1, further comprising:
maintaining, for a biased process factor whose second influence does not fall within the predetermined upper ratio, the management range of the biased process factor at the existing management range.

The method of claim 1, wherein analyzing the first influence comprises:
performing, for the plurality of process factors, a T-test that verifies significance with respect to the distribution of the field claims;
comparing information gains of the process factors that passed the T-test and generating an array ordered from highest information gain, wherein the information gain is generated by subtracting the information on good or defective status after branching on the process factor from the information on good or defective status of the product; and
analyzing correlations among the process factors having high information gain and, for two process factors whose correlation coefficient is greater than a predetermined reference value, removing one of them from the array.

The method of claim 1, wherein analyzing the first influence and analyzing the second influence are performed in the same manner.

The method of claim 1, wherein expanding the existing management range comprises further lowering the lower limit of the existing management range or further raising its upper limit.

A device for improving quality control criteria, comprising:
an input unit that acquires quality data and field claims for a product, wherein the quality data is collected for a plurality of process factors (process features) applied or occurring in the production process of the product, and the field claims indicate whether the product is good or defective;
an impact analysis unit that analyzes a first influence between the plurality of process factors and the field claims;
a management range adjustment unit that expands, for a biased process factor whose first influence does not fall within a predetermined upper ratio among the plurality of process factors, an existing management range of the biased process factor;
a data recollection unit that recollects quality data for the plurality of process factors based on the expanded management range; and
a data subdivision collection unit that subdivides and recollects the data of a process factor when the first influence of that process factor falls within the predetermined upper ratio.

The device of claim 6, wherein the impact analysis unit analyzes, for the recollected quality data, a second influence between the plurality of process factors and the field claims.

The device of claim 6, wherein the impact analysis unit maintains, for a biased process factor whose second influence does not fall within the predetermined upper ratio, the management range of the biased process factor at the existing management range.

The device of claim 6, wherein the impact analysis unit analyzes the first influence or the second influence by performing, for the plurality of process factors, a T-test that verifies significance with respect to the distribution of the field claims, comparing information gains of the process factors that passed the T-test and generating an array ordered from highest information gain, analyzing correlations among the process factors having high information gain, and, for two process factors whose correlation coefficient is greater than a predetermined reference value, removing one of them from the array.

A computer program stored in a computer-readable recording medium to execute each step included in the method for improving quality control standards according to any one of claims 1 to 5.